Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to scale a Logging Infrastructure
Search
Paul Stack
June 03, 2015
Technology
0
170
How to scale a Logging Infrastructure
Logging infrastructure using ELK + Kafka
Paul Stack
June 03, 2015
Tweet
Share
More Decks by Paul Stack
See All by Paul Stack
Infrastructure as Software
stack72
0
61
Mirror, Mirror on the way, what is the vainest metric of them all?
stack72
1
2.3k
Continuously Delivering Infrastructure to the Cloud
stack72
0
170
DevOops 2016
stack72
0
110
The Quest for Infrastructure Management 2.0
stack72
0
130
The Biggest Trick Consultants Ever Pulled was Telling The World Continuous Delivery is Easy
stack72
1
100
The Transition from Product to Infrastructure
stack72
0
58
Continuous Delivery - the missing parts
stack72
0
920
Windows: Having its ass kicked by puppet and powershell
stack72
0
120
Other Decks in Technology
See All in Technology
【Λ(らむだ)】アップデート機能振り返りΛ編 / PADjp20250127
lambda
0
100
20250122_FinJAWS
takuyay0ne
2
340
Skip Skip Run Run Run ♫
temoki
0
320
企業テックブログにおける執筆ネタの考え方・見つけ方・広げ方 / How to Think of, Find, and Expand Writing Topics for Corporate Tech Blogs
honyanya
0
680
サーバレスの未来〜The Key to Simplifying Everything〜
kawaji_scratch
2
330
FinJAWS_reinvent2024_recap_database
asahihidehiko
2
310
Microsoft Ignite 2024 最新情報!Microsoft 365 Agents SDK 概要 / Microsoft Ignite 2024 latest news Microsoft 365 Agents SDK overview
karamem0
0
160
バクラクの組織とアーキテクチャ(要約)2025/01版
shkomine
4
500
実践している探索的テストの進め方 #jasstnano
makky_tyuyan
1
130
スクラムマスターの活動と組織からの期待のズレへの対応 / Dealing with the gap between Scrum Master activities and organizational expectations
pauli
2
980
2週に1度のビッグバンリリースをデイリーリリース化するまでの苦悩 ~急成長するスタートアップのリアルな裏側~
kworkdev
PRO
8
5.6k
プロダクト開発、インフラ、コーポレート、そしてAIとの共通言語としての Terraform / Terraform as a Common Language for Product Development, Infrastructure, Corporate Engineering, and AI
yuyatakeyama
6
1.4k
Featured
See All Featured
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
98
18k
VelocityConf: Rendering Performance Case Studies
addyosmani
327
24k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
45
2.3k
Scaling GitHub
holman
459
140k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
3
260
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
11
890
Statistics for Hackers
jakevdp
797
220k
Documentation Writing (for coders)
carmenintech
67
4.6k
Visualization
eitanlees
146
15k
YesSQL, Process and Tooling at Scale
rocio
170
14k
GraphQLの誤解/rethinking-graphql
sonatard
68
10k
The MySQL Ecosystem @ GitHub 2015
samlambert
250
12k
Transcript
How do you scale a logging infrastructure to accept a
billion messages a day? Paul Stack http://twitter.com/stack72 mail:
[email protected]
About Me Infrastructure Engineer for a cool startup :) Reformed
ASP.NET / C# Developer DevOps Extremist Conference Junkie
Background Project was to replace the legacy ‘logging solution’
Iteration 0: A Developer created a single box with the
ELK all in 1 jar
Time to make it production ready now
None
Iteration 1: Using Redis as the input mechanism for LogStash
None
None
Enter Apache Kafka
“Kafka is a distributed publish- subscribe messaging system that is
designed to be fast, scalable, and durable” Source: Cloudera Blog
Introduction to Kafka • Kafka is made up of ‘topics’,
‘producers’, ‘consumers’ and ‘brokers’ • Communication is via TCP • Backed by Zookeeper
Kafka Topics Source: http://kafka.apache.org/documentation.html
Kafka Producers • Producers are responsible to chose what topic
to publish data to • The producer is responsible for choosing a partition to write to • Can be handled round robin or partition functions
Kafka Consumers • Consumption can be done via: • queuing
• pub-sub
Kafka Consumers • Kafka consumer group • Strong ordering
Kafka Consumers • Strong ordering
https://github.com/opentable/puppet-exhibitor
None
Iteration 2 Introduction of Kafka
None
None
Iteration 3 Further ‘Improvements’ to the cluster layout
None
The Numbers • Logs kept in ES for 30 days
then archived • 12 billion documents active in ES • ES space was about 25 - 30TB in EBS volumes • Average Doc Size ~ 1.2KB • V-Day 2015: ~750M docs collected without failure
What about metrics and monitoring?
Monitoring - Nagios • Alerts on • ES Cluster •
zK and Kafka Nodes • Logstash / Redis nodes
None
https://github.com/stack72/nagios-elasticsearch
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
Metrics - ElasticSearch
None
None
None
Visibility Rocks!
None
So what would I do differently?
Questions?
Paul Stack @stack72