Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to scale a Logging Infrastructure
Search
Paul Stack
June 03, 2015
Technology
0
180
How to scale a Logging Infrastructure
Logging infrastructure using ELK + Kafka
Paul Stack
June 03, 2015
Tweet
Share
More Decks by Paul Stack
See All by Paul Stack
Infrastructure as Software
stack72
0
72
Mirror, Mirror on the way, what is the vainest metric of them all?
stack72
1
2.3k
Continuously Delivering Infrastructure to the Cloud
stack72
0
190
DevOops 2016
stack72
0
120
The Quest for Infrastructure Management 2.0
stack72
0
140
The Biggest Trick Consultants Ever Pulled was Telling The World Continuous Delivery is Easy
stack72
1
120
The Transition from Product to Infrastructure
stack72
0
62
Continuous Delivery - the missing parts
stack72
0
960
Windows: Having its ass kicked by puppet and powershell
stack72
0
140
Other Decks in Technology
See All in Technology
いかにして命令の入れ替わりについて心配するのをやめ、メモリモデルを愛するようになったか(改)
nullpo_head
7
2.6k
生成AI時代におけるAI・機械学習技術を用いたプロダクト開発の深化と進化 #BetAIDay
layerx
PRO
1
1.2k
Agent Development Kitで始める生成 AI エージェント実践開発
danishi
0
150
【CEDEC2025】『Shadowverse: Worlds Beyond』二度目のDCG開発でゲームをリデザインする~遊びやすさと競技性の両立~
cygames
PRO
1
370
金融サービスにおける高速な価値提供とAIの役割 #BetAIDay
layerx
PRO
1
830
10年以上続くプロダクトで今取り組んでること、取り組もうとしていること
sansantech
PRO
2
110
テストを実行してSorbetのsigを書こう!
sansantech
PRO
1
100
Lambda management with ecspresso and Terraform
ijin
2
160
Foundation Model × VisionKit で実現するローカル OCR
sansantech
PRO
1
370
AIに目を奪われすぎて、周りの困っている人間が見えなくなっていませんか?
cap120
1
640
Jamf Connect ZTNAとMDMで実現! 金融ベンチャーにおける「デバイストラスト」実例と軌跡 / Kyash Device Trust
rela1470
1
200
アカデミーキャンプ 2025 SuuuuuuMMeR「燃えろ!!ロボコン」 / Academy Camp 2025 SuuuuuuMMeR "Burn the Spirit, Robocon!!" DAY 1
ks91
PRO
0
150
Featured
See All Featured
Git: the NoSQL Database
bkeepers
PRO
431
65k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
667
120k
YesSQL, Process and Tooling at Scale
rocio
173
14k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
26k
jQuery: Nuts, Bolts and Bling
dougneiner
63
7.8k
Scaling GitHub
holman
461
140k
Making Projects Easy
brettharned
117
6.3k
BBQ
matthewcrist
89
9.8k
Thoughts on Productivity
jonyablonski
69
4.8k
The Invisible Side of Design
smashingmag
301
51k
Testing 201, or: Great Expectations
jmmastey
45
7.6k
Designing for humans not robots
tammielis
253
25k
Transcript
How do you scale a logging infrastructure to accept a
billion messages a day? Paul Stack http://twitter.com/stack72 mail:
[email protected]
About Me Infrastructure Engineer for a cool startup :) Reformed
ASP.NET / C# Developer DevOps Extremist Conference Junkie
Background Project was to replace the legacy ‘logging solution’
Iteration 0: A Developer created a single box with the
ELK all in 1 jar
Time to make it production ready now
None
Iteration 1: Using Redis as the input mechanism for LogStash
None
None
Enter Apache Kafka
“Kafka is a distributed publish- subscribe messaging system that is
designed to be fast, scalable, and durable” Source: Cloudera Blog
Introduction to Kafka • Kafka is made up of ‘topics’,
‘producers’, ‘consumers’ and ‘brokers’ • Communication is via TCP • Backed by Zookeeper
Kafka Topics Source: http://kafka.apache.org/documentation.html
Kafka Producers • Producers are responsible to chose what topic
to publish data to • The producer is responsible for choosing a partition to write to • Can be handled round robin or partition functions
Kafka Consumers • Consumption can be done via: • queuing
• pub-sub
Kafka Consumers • Kafka consumer group • Strong ordering
Kafka Consumers • Strong ordering
https://github.com/opentable/puppet-exhibitor
None
Iteration 2 Introduction of Kafka
None
None
Iteration 3 Further ‘Improvements’ to the cluster layout
None
The Numbers • Logs kept in ES for 30 days
then archived • 12 billion documents active in ES • ES space was about 25 - 30TB in EBS volumes • Average Doc Size ~ 1.2KB • V-Day 2015: ~750M docs collected without failure
What about metrics and monitoring?
Monitoring - Nagios • Alerts on • ES Cluster •
zK and Kafka Nodes • Logstash / Redis nodes
None
https://github.com/stack72/nagios-elasticsearch
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
Metrics - ElasticSearch
None
None
None
Visibility Rocks!
None
So what would I do differently?
Questions?
Paul Stack @stack72