Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to scale a Logging Infrastructure
Search
Paul Stack
June 03, 2015
Technology
0
180
How to scale a Logging Infrastructure
Logging infrastructure using ELK + Kafka
Paul Stack
June 03, 2015
Tweet
Share
More Decks by Paul Stack
See All by Paul Stack
Infrastructure as Software
stack72
0
72
Mirror, Mirror on the way, what is the vainest metric of them all?
stack72
1
2.3k
Continuously Delivering Infrastructure to the Cloud
stack72
0
190
DevOops 2016
stack72
0
120
The Quest for Infrastructure Management 2.0
stack72
0
140
The Biggest Trick Consultants Ever Pulled was Telling The World Continuous Delivery is Easy
stack72
1
120
The Transition from Product to Infrastructure
stack72
0
61
Continuous Delivery - the missing parts
stack72
0
950
Windows: Having its ass kicked by puppet and powershell
stack72
0
130
Other Decks in Technology
See All in Technology
本当にわかりやすいAIエージェント入門
segavvy
8
4.5k
AWS 怖い話 WAF編 @fillz_noh #AWSStartup #AWSStartup_Kansai
fillznoh
0
140
Amplify Gen2から知るAWS CDK Toolkit Libraryの使い方/How to use the AWS CDK Toolkit Library as known from Amplify Gen2
fossamagna
1
380
Introduction to Sansan, inc / Sansan Global Development Center, Inc.
sansan33
PRO
0
2.7k
Introduction to Bill One Development Engineer
sansan33
PRO
0
270
【あのMCPって、どんな処理してるの?】 AWS CDKでの開発で便利なAWS MCP Servers特集
yoshimi0227
6
1k
BEYOND THE RAG🚀 ~とりあえずRAG?を超えていけ! 本当に使えるAIエージェント&生成AIプロダクトを目指して~ / BEYOND-THE-RAG-Toward Practical-GenerativeAI-Products-AOAI-DevDay-2025
jnymyk
4
170
AWS Well-Architected から考えるオブザーバビリティの勘所 / Considering the Essentials of Observability from AWS Well-Architected
sms_tech
1
710
アクセスピークを制するオートスケール再設計: 障害を乗り越えKEDAで実現したリソース管理の最適化
myamashii
1
800
助けて! XからWaylandに移行しないと新しいGNOMEが使えなくなっちゃう 2025-07-12
nobutomurata
2
210
Deep Security Conference 2025:生成AI時代のセキュリティ監視 /dsc2025-genai-secmon
mizutani
4
3.4k
セキュアなAI活用のためのLiteLLMの可能性
tk3fftk
1
450
Featured
See All Featured
Imperfection Machines: The Place of Print at Facebook
scottboms
267
13k
Bash Introduction
62gerente
613
210k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.7k
The Invisible Side of Design
smashingmag
301
51k
Building Applications with DynamoDB
mza
95
6.5k
The Straight Up "How To Draw Better" Workshop
denniskardys
235
140k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
30
2.2k
Code Reviewing Like a Champion
maltzj
524
40k
Art, The Web, and Tiny UX
lynnandtonic
300
21k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
181
54k
Side Projects
sachag
455
43k
Intergalactic Javascript Robots from Outer Space
tanoku
271
27k
Transcript
How do you scale a logging infrastructure to accept a
billion messages a day? Paul Stack http://twitter.com/stack72 mail:
[email protected]
About Me Infrastructure Engineer for a cool startup :) Reformed
ASP.NET / C# Developer DevOps Extremist Conference Junkie
Background Project was to replace the legacy ‘logging solution’
Iteration 0: A Developer created a single box with the
ELK all in 1 jar
Time to make it production ready now
None
Iteration 1: Using Redis as the input mechanism for LogStash
None
None
Enter Apache Kafka
“Kafka is a distributed publish- subscribe messaging system that is
designed to be fast, scalable, and durable” Source: Cloudera Blog
Introduction to Kafka • Kafka is made up of ‘topics’,
‘producers’, ‘consumers’ and ‘brokers’ • Communication is via TCP • Backed by Zookeeper
Kafka Topics Source: http://kafka.apache.org/documentation.html
Kafka Producers • Producers are responsible to chose what topic
to publish data to • The producer is responsible for choosing a partition to write to • Can be handled round robin or partition functions
Kafka Consumers • Consumption can be done via: • queuing
• pub-sub
Kafka Consumers • Kafka consumer group • Strong ordering
Kafka Consumers • Strong ordering
https://github.com/opentable/puppet-exhibitor
None
Iteration 2 Introduction of Kafka
None
None
Iteration 3 Further ‘Improvements’ to the cluster layout
None
The Numbers • Logs kept in ES for 30 days
then archived • 12 billion documents active in ES • ES space was about 25 - 30TB in EBS volumes • Average Doc Size ~ 1.2KB • V-Day 2015: ~750M docs collected without failure
What about metrics and monitoring?
Monitoring - Nagios • Alerts on • ES Cluster •
zK and Kafka Nodes • Logstash / Redis nodes
None
https://github.com/stack72/nagios-elasticsearch
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
Metrics - ElasticSearch
None
None
None
Visibility Rocks!
None
So what would I do differently?
Questions?
Paul Stack @stack72