Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to scale a Logging Infrastructure
Search
Paul Stack
June 03, 2015
Technology
0
200
How to scale a Logging Infrastructure
Logging infrastructure using ELK + Kafka
Paul Stack
June 03, 2015
Tweet
Share
More Decks by Paul Stack
See All by Paul Stack
Infrastructure as Software
stack72
0
91
Mirror, Mirror on the way, what is the vainest metric of them all?
stack72
1
2.4k
Continuously Delivering Infrastructure to the Cloud
stack72
0
220
DevOops 2016
stack72
0
130
The Quest for Infrastructure Management 2.0
stack72
0
160
The Biggest Trick Consultants Ever Pulled was Telling The World Continuous Delivery is Easy
stack72
1
140
The Transition from Product to Infrastructure
stack72
0
83
Continuous Delivery - the missing parts
stack72
0
990
Windows: Having its ass kicked by puppet and powershell
stack72
0
160
Other Decks in Technology
See All in Technology
Claude Cowork Plugins を読む - Skills駆動型業務エージェント設計の実像と構造
knishioka
1
180
Exadata Fleet Update
oracle4engineer
PRO
0
1.3k
AIで 浮いた時間で 何をする? 2026春 #devsumi
konifar
16
3.4k
AIエンジニア Devin と歩む、自律型運用プロセスの構築
a2ito
0
260
Contract One Engineering Unit 紹介資料
sansan33
PRO
0
14k
バクラクにおける Document Understanding の挑戦:書類の「読取」から「意思決定」へ / document-understanding-in-bakuraku-2026
yuya4
0
150
【5分でわかる】セーフィー エンジニア向け会社紹介
safie_recruit
0
43k
Databricksアシスタントが自分で考えて動く時代に! エージェントモード体験もくもく会
taka_aki
0
180
インシデント対応入門
grimoh
7
5.4k
Databricks (と気合い)で頑張るAI Agent 運用
kameitomohiro
0
340
LINEアプリ開発のための Claude Code活用基盤の構築
lycorptech_jp
PRO
1
1.1k
ローカルでLLMを使ってみよう
kosmosebi
0
200
Featured
See All Featured
The Director’s Chair: Orchestrating AI for Truly Effective Learning
tmiket
1
110
SERP Conf. Vienna - Web Accessibility: Optimizing for Inclusivity and SEO
sarafernandez
1
1.3k
Dominate Local Search Results - an insider guide to GBP, reviews, and Local SEO
greggifford
PRO
0
92
Imperfection Machines: The Place of Print at Facebook
scottboms
269
14k
Digital Ethics as a Driver of Design Innovation
axbom
PRO
1
200
Un-Boring Meetings
codingconduct
0
220
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
360
30k
Build your cross-platform service in a week with App Engine
jlugia
234
18k
We Are The Robots
honzajavorek
0
190
The SEO Collaboration Effect
kristinabergwall1
0
380
What Being in a Rock Band Can Teach Us About Real World SEO
427marketing
0
180
How to Grow Your eCommerce with AI & Automation
katarinadahlin
PRO
1
130
Transcript
How do you scale a logging infrastructure to accept a
billion messages a day? Paul Stack http://twitter.com/stack72 mail:
[email protected]
About Me Infrastructure Engineer for a cool startup :) Reformed
ASP.NET / C# Developer DevOps Extremist Conference Junkie
Background Project was to replace the legacy ‘logging solution’
Iteration 0: A Developer created a single box with the
ELK all in 1 jar
Time to make it production ready now
None
Iteration 1: Using Redis as the input mechanism for LogStash
None
None
Enter Apache Kafka
“Kafka is a distributed publish- subscribe messaging system that is
designed to be fast, scalable, and durable” Source: Cloudera Blog
Introduction to Kafka • Kafka is made up of ‘topics’,
‘producers’, ‘consumers’ and ‘brokers’ • Communication is via TCP • Backed by Zookeeper
Kafka Topics Source: http://kafka.apache.org/documentation.html
Kafka Producers • Producers are responsible to chose what topic
to publish data to • The producer is responsible for choosing a partition to write to • Can be handled round robin or partition functions
Kafka Consumers • Consumption can be done via: • queuing
• pub-sub
Kafka Consumers • Kafka consumer group • Strong ordering
Kafka Consumers • Strong ordering
https://github.com/opentable/puppet-exhibitor
None
Iteration 2 Introduction of Kafka
None
None
Iteration 3 Further ‘Improvements’ to the cluster layout
None
The Numbers • Logs kept in ES for 30 days
then archived • 12 billion documents active in ES • ES space was about 25 - 30TB in EBS volumes • Average Doc Size ~ 1.2KB • V-Day 2015: ~750M docs collected without failure
What about metrics and monitoring?
Monitoring - Nagios • Alerts on • ES Cluster •
zK and Kafka Nodes • Logstash / Redis nodes
None
https://github.com/stack72/nagios-elasticsearch
Metrics - Kafka Offset Monitor
https://github.com/opentable/KafkaOffsetMonitor
Metrics - ElasticSearch
None
None
None
Visibility Rocks!
None
So what would I do differently?
Questions?
Paul Stack @stack72