Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Servers are doomed to fail
Search
JBD
May 17, 2019
Technology
1.6k
3
Share
Servers are doomed to fail
JBD
May 17, 2019
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.2k
eBPF in Microservices Observability
rakyll
1
1.8k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.6k
Are you ready for production?
rakyll
8
2.9k
Serverless Containers
rakyll
1
290
Critical Path Analysis
rakyll
0
700
Monitoring and Debugging Containers
rakyll
2
1.1k
CPDD
rakyll
0
4.3k
Other Decks in Technology
See All in Technology
MCPサーバーを中核としたAIエージェント開発と業務自動化/nikkei-tech-talk-45
nikkei_engineer_recruiting
0
100
Google Cloud Next '26 の裏でこっそりリリースされたCloud Number Registry & Cloud Hub コスト分析 を試してみた
hikaru1001
0
130
Oracle Cloud Infrastructure:2026年4月度サービス・アップデート
oracle4engineer
PRO
0
220
コードや知識を組み込む / Incorporate Code and Knowledge
ks91
PRO
0
180
【技術書典20】OpenFOAM(自宅で深める流体解析)流れと熱移動(2)
kamakiri1225
0
340
Modernizing Your HCL Connections Experience: Visual Report to chain, Profile Enhancements, and AI Integration
wannesrams
0
230
AI時代のガードレールとしてのAPIガバナンス
nagix
0
340
AIでAIをテストする - 音声AIエージェントの品質保証戦略
morix1500
1
160
AIが盛んな時代に 技術記事を書き始めて起きた私の中での小さな変化
peintangos
0
330
Oracle AI Database@AWS:サービス概要のご紹介
oracle4engineer
PRO
4
2.4k
アクセシビリティはすべての人のもの
tomokusaba
0
170
MySQL 9.7がやってきた ~これまでのあらすじと基本情報~ @ 日本MySQLユーザ会会2026年04月 / mysql97-yattekita
sakaik
0
140
Featured
See All Featured
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
287
14k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
162
16k
The Illustrated Children's Guide to Kubernetes
chrisshort
51
52k
End of SEO as We Know It (SMX Advanced Version)
ipullrank
3
4.1k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
35
2.4k
Building Better People: How to give real-time feedback that sticks.
wjessup
370
20k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
34
2.7k
Building an army of robots
kneath
306
46k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
141
35k
WCS-LA-2024
lcolladotor
0
560
Leadership Guide Workshop - DevTernity 2021
reverentgeek
1
270
The Anti-SEO Checklist Checklist. Pubcon Cyber Week
ryanjones
0
130
Transcript
Servers are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Serverless is also doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Systems are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should
just work.” Said no one ever.
Failure is expected. Yes, it is.
None
@rakyll monitoring debugging postmortem
Monitoring is about saying if something is broken.
“99.99% of the requests should return in 100ms.”
@rakyll
@rakyll
Debugging
Debugging is collaborative.
Debugging comes in flavors. Logs Traces Metrics ...
Postmortems
Postmortems
Postmortems
Blameless? Focus on identifying problems.
Collaboration Design for collaboration.
Design for failure Set SLOs, plan for instrumentation, plan for
debugging.
Cross-stack debugging Accountability across stack with high cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation Jump from monitoring/debugging data to data.
On-call debugging Jump from distributed tracing data to on-call information.
who to page?
Dynamic collection Capability to enable more collection in production when
needed.
Continuous collection Continuously collect signals, generate fleet-wide analysis reports.
Introspection Introspection pages provided from the services.
@rakyll monitoring debugging postmortem
Thank you Jaana B. Dogan Google
[email protected]