Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to scale a Logging Infrastructure
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Paul Stack
June 03, 2015
Technology
210
0
Share
How to scale a Logging Infrastructure
Logging infrastructure using ELK + Kafka
Paul Stack
June 03, 2015
More Decks by Paul Stack
See All by Paul Stack
Infrastructure as Software
stack72
0
100
Mirror, Mirror on the way, what is the vainest metric of them all?
stack72
1
2.4k
Continuously Delivering Infrastructure to the Cloud
stack72
0
240
DevOops 2016
stack72
0
130
The Quest for Infrastructure Management 2.0
stack72
0
170
The Biggest Trick Consultants Ever Pulled was Telling The World Continuous Delivery is Easy
stack72
1
150
The Transition from Product to Infrastructure
stack72
0
91
Continuous Delivery - the missing parts
stack72
0
1k
Windows: Having its ass kicked by puppet and powershell
stack72
0
160
Other Decks in Technology
See All in Technology
エージェント時代の UIとAPI、CLI戦略
coincheck_recruit
0
140
AI와 협업하는 조직으로의 여정
arawn
0
590
知ってた?JavaScriptの"正しさ"を検証するテストが5万以上もあること(Test262)
riyaamemiya
1
150
Percolatorを廃止し、マルチ検索サービスへ刷新した話 / Search Engineering Tech Talk 2026 Spring
visional_engineering_and_design
0
320
GitHub Copilot CLI と VS Code Agent Mode の使い分け
tomokusaba
0
140
20260428_Product Management Summit_Loglass_JoeHirose
loglassjoe
4
7.1k
ボトムアップ限界を越える - 20チームを束る "Drive Map" / Beyond Bottom-Up: A 'Drive Map' for 20 Teams
kaonavi
0
120
AWS Transform CustomでIaCコードを自由自在に変換しよう
duelist2020jp
0
240
ServiceによるKubernetes通信制御ーClusterIPを例に
miku01
1
140
ハーネスエンジニアリング入門
knishioka
0
120
多角的な視点から見たAGI
terisuke
0
120
エンタープライズの厳格な制約を開発者に意識させない:クラウドネイティブ開発基盤設計/cloudnative-kaigi-golden-path
mhrtech
0
120
Featured
See All Featured
The browser strikes back
jonoalderson
0
1k
SERP Conf. Vienna - Web Accessibility: Optimizing for Inclusivity and SEO
sarafernandez
2
1.4k
The Cult of Friendly URLs
andyhume
79
6.9k
How to train your dragon (web standard)
notwaldorf
97
6.6k
From Legacy to Launchpad: Building Startup-Ready Communities
dugsong
0
200
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
35
3.4k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
31
3.2k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
49
9.9k
HU Berlin: Industrial-Strength Natural Language Processing with spaCy and Prodigy
inesmontani
PRO
0
370
Ten Tips & Tricks for a 🌱 transition
stuffmc
0
110
Code Reviewing Like a Champion
maltzj
528
40k
The Impact of AI in SEO - AI Overviews June 2024 Edition
aleyda
5
1k
Transcript
How do you scale a logging infrastructure to accept a
billion messages a day? Paul Stack http://twitter.com/stack72 mail:
[email protected]
About Me Infrastructure Engineer for a cool startup :) Reformed
ASP.NET / C# Developer DevOps Extremist Conference Junkie
Background Project was to replace the legacy ‘logging solution’
Iteration 0: A Developer created a single box with the
ELK all in 1 jar
Time to make it production ready now
None
Iteration 1: Using Redis as the input mechanism for LogStash
None
None
Enter Apache Kafka
“Kafka is a distributed publish- subscribe messaging system that is
designed to be fast, scalable, and durable” Source: Cloudera Blog
Introduction to Kafka • Kafka is made up of ‘topics’,
‘producers’, ‘consumers’ and ‘brokers’ • Communication is via TCP • Backed by Zookeeper
Kafka Topics Source: http://kafka.apache.org/documentation.html
Kafka Producers • Producers are responsible to chose what topic
to publish data to • The producer is responsible for choosing a partition to write to • Can be handled round robin or partition functions
Kafka Consumers • Consumption can be done via: • queuing
• pub-sub
Kafka Consumers • Kafka consumer group • Strong ordering
Kafka Consumers • Strong ordering
https://github.com/opentable/puppet-exhibitor
None
Iteration 2 Introduction of Kafka
None
None
Iteration 3 Further ‘Improvements’ to the cluster layout
None
The Numbers • Logs kept in ES for 30 days
then archived • 12 billion documents active in ES • ES space was about 25 - 30TB in EBS volumes • Average Doc Size ~ 1.2KB • V-Day 2015: ~750M docs collected without failure
What about metrics and monitoring?
Monitoring - Nagios • Alerts on • ES Cluster •
zK and Kafka Nodes • Logstash / Redis nodes
None
https://github.com/stack72/nagios-elasticsearch
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
Metrics - ElasticSearch
None
None
None
Visibility Rocks!
None
So what would I do differently?
Questions?
Paul Stack @stack72