How to scale a Logging Infrastructure
Paul Stack
June 03, 2015
Logging infrastructure using ELK + Kafka
Transcript
How do you scale a logging infrastructure to accept a billion messages a day?
Paul Stack
http://twitter.com/stack72
mail: [email protected]
About Me
• Infrastructure Engineer for a cool startup :)
• Reformed ASP.NET / C# Developer
• DevOps Extremist
• Conference Junkie
Background
• Project was to replace the legacy ‘logging solution’
Iteration 0: A developer created a single box running ELK from the all-in-one jar
Time to make it production ready now
Iteration 1: Using Redis as the input mechanism for Logstash
Enter Apache Kafka
“Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable”
Source: Cloudera Blog
Introduction to Kafka
• Kafka is made up of ‘topics’, ‘producers’, ‘consumers’ and ‘brokers’
• Communication is via TCP
• Backed by Zookeeper
Kafka Topics
Source: http://kafka.apache.org/documentation.html
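As a rough illustration of the topic model, the sketch below treats a topic as a set of append-only partitions, where each message gets a sequential offset within its own partition. The `Topic` class, the topic name and the sample log lines are all hypothetical; this is a toy in-memory stand-in, not a Kafka client.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy in-memory stand-in for a Kafka topic (hypothetical, not a real client)."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        # One append-only list per partition.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def append(self, partition: int, message: str) -> int:
        """Append a message and return its offset within that partition."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def read(self, partition: int, offset: int) -> list:
        """Return every message in the partition from `offset` onwards."""
        return self.partitions[partition][offset:]


topic = Topic("app-logs", num_partitions=2)
first = topic.append(0, "GET /health 200")
second = topic.append(0, "GET /login 500")
other = topic.append(1, "POST /order 201")
print(first, second, other)   # offsets are counted per partition
print(topic.read(0, 1))       # re-read partition 0 starting at offset 1
```

Because offsets are counted per partition, the first message written to partition 1 gets offset 0 even though two messages already exist in partition 0.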
Kafka Producers
• Producers are responsible for choosing which topic to publish data to
• The producer is also responsible for choosing which partition to write to
• Partition choice can be round robin or made by a partition function
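The two partition strategies on this slide can be sketched as follows. Both helper functions are hypothetical names, and `zlib.crc32` is an assumption chosen only because it is deterministic across runs; it is not the hash Kafka's own clients use.

```python
import itertools
import zlib


def round_robin_partitioner(num_partitions: int):
    """Return a function that cycles through partitions, ignoring the key."""
    counter = itertools.cycle(range(num_partitions))
    return lambda key: next(counter)


def keyed_partitioner(num_partitions: int):
    """Return a function that maps the same key to the same partition."""
    # crc32 is an assumption for illustration, not Kafka's hash function.
    return lambda key: zlib.crc32(key.encode("utf-8")) % num_partitions


rr = round_robin_partitioner(3)
by_host = keyed_partitioner(3)

rr_choices = [rr("web-01") for _ in range(4)]
print(rr_choices)                                # spreads load evenly
print(by_host("web-01") == by_host("web-01"))    # same key, same partition
```

Round robin balances load across partitions, while a key-based function keeps all messages for one key (here a hypothetical host name) in a single partition, which matters when their relative order needs to be preserved.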
Kafka Consumers
• Consumption can be done via:
  • queuing
  • pub-sub
• Kafka consumer group
• Strong ordering
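A minimal sketch of how consumer groups give both behaviours: within one group each partition has a single owner, so the group's members share the work like a queue, while separate groups each see every partition, which gives pub-sub. Ordering is only guaranteed within a partition. The group names, consumer names and the simple modulo assignment are hypothetical; Kafka's own assignment strategy may differ.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over one group's consumers, one owner per partition."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]
groups = {
    "indexers": ["logstash-a", "logstash-b"],   # hypothetical group sharing the load
    "archiver": ["archive-1"],                  # hypothetical second group
}

plan = {
    name: assign_partitions(partitions, members)
    for name, members in groups.items()
}
for name, assignment in plan.items():
    print(name, assignment)
```

Adding a consumer to the "indexers" group would reduce each member's share of partitions, while the "archiver" group continues to receive every message independently.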
https://github.com/opentable/puppet-exhibitor
Iteration 2: Introduction of Kafka
Iteration 3: Further ‘Improvements’ to the cluster layout
The Numbers
• Logs kept in ES for 30 days then archived
• 12 billion documents active in ES
• ES space was about 25-30TB in EBS volumes
• Average Doc Size ~ 1.2KB
• V-Day 2015: ~750M docs collected without failure
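A back-of-the-envelope check of these figures, assuming intake is steady across the 30-day window and using decimal units (1 TB = 10^9 KB). The inputs come from the slide and the talk title; the steady-rate assumption and the variable names are mine.

```python
SECONDS_PER_DAY = 24 * 60 * 60

active_docs = 12_000_000_000     # from the slide
retention_days = 30              # from the slide
avg_doc_kb = 1.2                 # from the slide
target_per_day = 1_000_000_000   # from the talk title

# Assumes documents arrive at a steady rate over the retention window.
avg_docs_per_day = active_docs / retention_days
raw_tb = active_docs * avg_doc_kb / 1_000_000_000   # KB -> TB, decimal units
target_per_second = target_per_day / SECONDS_PER_DAY

print(f"average intake: {avg_docs_per_day:,.0f} docs/day")
print(f"raw payload:    {raw_tb:.1f} TB")
print(f"1B/day target:  {target_per_second:,.0f} msgs/sec")
```

Under those assumptions the average works out to about 400M docs/day and roughly 14.4 TB of raw document payload, and a billion messages a day is about 11,574 messages per second. The slides do not break down the difference between that raw figure and the 25-30TB of ES space.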
What about metrics and monitoring?
Monitoring - Nagios
• Alerts on
  • ES Cluster
  • Zookeeper and Kafka nodes
  • Logstash / Redis nodes
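As one illustration of an ES cluster alert, the sketch below maps the `status` field of an Elasticsearch `_cluster/health` response to the standard Nagios plugin exit codes. It is not the code from any particular plugin: the function name and sample response are hypothetical, and treating yellow as a warning is a policy assumption.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed policy: yellow warns, red is critical.
STATUS_TO_EXIT = {"green": OK, "yellow": WARNING, "red": CRITICAL}


def check_cluster_health(health: dict) -> tuple:
    """Map a parsed _cluster/health response to (exit_code, message)."""
    status = health.get("status")
    code = STATUS_TO_EXIT.get(status, UNKNOWN)
    label = ["OK", "WARNING", "CRITICAL", "UNKNOWN"][code]
    return code, f"{label} - cluster status is {status}"


# Hypothetical response body; a real check would fetch it over HTTP.
sample = {"cluster_name": "logging", "status": "yellow"}
code, message = check_cluster_health(sample)
print(code, message)
```

A real plugin would print the message and call `sys.exit(code)` so that Nagios can read the result from the process exit status.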
https://github.com/stack72/nagios-elasticsearch
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
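The core number an offset monitor tracks is consumer lag: for each partition, the newest offset in the log minus the offset the consumer group has committed. The sketch below shows only that arithmetic. The function name and the offsets are hypothetical, and it is not taken from KafkaOffsetMonitor.

```python
def consumer_lag(log_end_offsets: dict, committed_offsets: dict) -> dict:
    """Return per-partition lag: newest offset minus the group's committed offset."""
    # Simplification: a partition with no committed offset counts from 0.
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


# Hypothetical offsets for a three-partition topic.
log_end = {0: 1500, 1: 1480, 2: 1510}
committed = {0: 1500, 1: 1200, 2: 1505}

lag = consumer_lag(log_end, committed)
print(lag, "total:", sum(lag.values()))
```

A lag that keeps growing on one partition suggests its consumer is falling behind the producers, which is the kind of signal worth graphing and alerting on.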
Metrics - ElasticSearch
Visibility Rocks!
So what would I do differently?
Questions?
Paul Stack @stack72