Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to scale a Logging Infrastructure
Search
Paul Stack
June 03, 2015
Technology
0
180
How to scale a Logging Infrastructure
Logging infrastructure using ELK + Kafka
Paul Stack
June 03, 2015
Tweet
Share
More Decks by Paul Stack
See All by Paul Stack
Infrastructure as Software
stack72
0
67
Mirror, Mirror on the way, what is the vainest metric of them all?
stack72
1
2.3k
Continuously Delivering Infrastructure to the Cloud
stack72
0
180
DevOops 2016
stack72
0
120
The Quest for Infrastructure Management 2.0
stack72
0
140
The Biggest Trick Consultants Ever Pulled was Telling The World Continuous Delivery is Easy
stack72
1
120
The Transition from Product to Infrastructure
stack72
0
61
Continuous Delivery - the missing parts
stack72
0
950
Windows: Having its ass kicked by puppet and powershell
stack72
0
130
Other Decks in Technology
See All in Technology
大規模PaaSにおける監視基盤の構築と効率化の道のり
lycorptech_jp
PRO
0
170
Digitization部 紹介資料
sansan33
PRO
1
3.8k
プラットフォームとしての Datadog / Datadog as Platforms
aoto
PRO
1
330
大手企業のAIツール導入の壁を越えて:サイバーエージェントのCursor活用戦略
gunta
6
760
TypeScript と歩む OpenAPI の discriminator / OpenAPI discriminator with TypeScript
kaminashi
1
150
プロジェクトマネジメント実践論|現役エンジニアが語る!~チームでモノづくりをする時のコツとは?~
mixi_engineers
PRO
3
180
“新卒らしさ”を脱ぎ捨てて 〜1年を経て学んだこと〜
rebase_engineering
0
130
Cloud Run を解剖して コンテナ監視を考える / Breaking Down Cloud Run to Rethink Container Monitoring
aoto
PRO
0
110
Redmineの意外と知らない便利機能 (Redmine 6.0対応版)
vividtone
0
1.1k
AIコードエディタは開発を変えるか?Cursorをチームに導入して1ヶ月経った本音
ota1022
1
690
MCP で繋ぐ Figma とデザインシステム〜LLM を使った UI 実装のリアル〜
kimuson
2
1.3k
Oracle Base Database Service 技術詳細
oracle4engineer
PRO
8
65k
Featured
See All Featured
Navigating Team Friction
lara
186
15k
Git: the NoSQL Database
bkeepers
PRO
430
65k
Build your cross-platform service in a week with App Engine
jlugia
231
18k
Optimising Largest Contentful Paint
csswizardry
37
3.3k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
48
5.4k
The Pragmatic Product Professional
lauravandoore
35
6.7k
Building Better People: How to give real-time feedback that sticks.
wjessup
368
19k
Done Done
chrislema
184
16k
4 Signs Your Business is Dying
shpigford
183
22k
How GitHub (no longer) Works
holman
314
140k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
233
17k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
30
2.4k
Transcript
How do you scale a logging infrastructure to accept a
billion messages a day? Paul Stack http://twitter.com/stack72 mail:
[email protected]
About Me Infrastructure Engineer for a cool startup :) Reformed
ASP.NET / C# Developer DevOps Extremist Conference Junkie
Background Project was to replace the legacy ‘logging solution’
Iteration 0: A Developer created a single box with the
ELK all in 1 jar
Time to make it production ready now
None
Iteration 1: Using Redis as the input mechanism for LogStash
None
None
Enter Apache Kafka
“Kafka is a distributed publish- subscribe messaging system that is
designed to be fast, scalable, and durable” Source: Cloudera Blog
Introduction to Kafka • Kafka is made up of ‘topics’,
‘producers’, ‘consumers’ and ‘brokers’ • Communication is via TCP • Backed by Zookeeper
Kafka Topics Source: http://kafka.apache.org/documentation.html
Kafka Producers • Producers are responsible to chose what topic
to publish data to • The producer is responsible for choosing a partition to write to • Can be handled round robin or partition functions
Kafka Consumers • Consumption can be done via: • queuing
• pub-sub
Kafka Consumers • Kafka consumer group • Strong ordering
Kafka Consumers • Strong ordering
https://github.com/opentable/puppet-exhibitor
None
Iteration 2 Introduction of Kafka
None
None
Iteration 3 Further ‘Improvements’ to the cluster layout
None
The Numbers • Logs kept in ES for 30 days
then archived • 12 billion documents active in ES • ES space was about 25 - 30TB in EBS volumes • Average Doc Size ~ 1.2KB • V-Day 2015: ~750M docs collected without failure
What about metrics and monitoring?
Monitoring - Nagios • Alerts on • ES Cluster •
zK and Kafka Nodes • Logstash / Redis nodes
None
https://github.com/stack72/nagios-elasticsearch
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
Metrics - ElasticSearch
None
None
None
Visibility Rocks!
None
So what would I do differently?
Questions?
Paul Stack @stack72