Servers are doomed to fail
JBD
May 17, 2019
Transcript
Servers are doomed to fail
Jaana B. Dogan
[email protected]
@rakyll
Serverless is also doomed to fail
Systems are doomed to fail
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should just work.” Said no one ever.
Failure is expected. Yes, it is.
Monitoring, debugging, postmortem
Monitoring is about telling whether something is broken.
“99.99% of the requests should return in 100ms.”
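A minimal sketch of how an objective like the one quoted above could be checked against a window of request latencies. The function names and the sample window are hypothetical and are not taken from the talk.

```python
def fraction_within(latencies_ms, threshold_ms):
    """Return the fraction of requests that finished within threshold_ms."""
    if not latencies_ms:
        raise ValueError("no requests in the window")
    fast = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return fast / len(latencies_ms)


def meets_latency_slo(latencies_ms, threshold_ms=100.0, target=0.9999):
    """True if at least `target` of the requests returned within threshold_ms."""
    return fraction_within(latencies_ms, threshold_ms) >= target


# Hypothetical window: 10,000 requests, 2 of them slower than 100 ms.
window = [20.0] * 9998 + [250.0, 400.0]
print(fraction_within(window, 100.0))  # 0.9998
print(meets_latency_slo(window))       # False: 99.98% is below 99.99%
```

With a 99.99% target, even two slow requests in ten thousand are enough to miss the objective, which is why the target and the measurement window need to be chosen together.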
Debugging
Debugging is collaborative.
Debugging comes in flavors: logs, traces, metrics, ...
Postmortems
Blameless? Focus on identifying problems.
Collaboration: Design for collaboration.
Design for failure: Set SLOs, plan for instrumentation, plan for debugging.
Cross-stack debugging: Accountability across the stack with high-cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation: Jump from monitoring/debugging data to data.
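One common way to make such jumps possible is to stamp every log line with the ID of the trace it belongs to, so a trace can lead to its logs and back. The sketch below is an illustration using only the Python standard library. The `current_trace_id` variable, the logger name, and the randomly generated ID are assumptions for the example; a real tracing library would supply the ID.

```python
import contextvars
import io
import logging
import uuid

# Hypothetical per-request trace ID; a real tracing library would supply it.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the current trace ID onto every log record."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True


def build_logger(stream):
    """Create a logger whose lines are prefixed with the trace ID."""
    logger = logging.getLogger("correlation-sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("trace_id=%(trace_id)s %(message)s"))
    handler.addFilter(TraceIdFilter())
    logger.handlers = [handler]
    return logger


def handle_request(logger):
    """Simulate one request and return the trace ID it ran under."""
    trace_id = uuid.uuid4().hex
    token = current_trace_id.set(trace_id)
    try:
        logger.info("request started")
        logger.info("request finished")
    finally:
        current_trace_id.reset(token)
    return trace_id


output = io.StringIO()
log = build_logger(output)
trace = handle_request(log)
# Every line written during the request carries the same trace ID.
print(output.getvalue())
```

Searching the logs for the printed trace ID returns exactly the lines produced while that request was in flight.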
On-call debugging: Jump from distributed tracing data to on-call information. Who to page?
Dynamic collection: Capability to enable more collection in production when needed.
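As a rough illustration of dynamic collection, the sketch below uses a sampler whose rate can be raised while the process is running. The `DynamicSampler` class and its method names are hypothetical; how the new rate reaches a production process (a config push, an admin endpoint, a flag service) is left out.

```python
import random
import threading


class DynamicSampler:
    """Decides whether to collect a signal; the rate can change at runtime."""

    def __init__(self, rate=0.01, rng=None):
        self._lock = threading.Lock()
        self._rate = rate
        self._rng = rng or random.Random()

    def set_rate(self, rate):
        """Change the fraction of events that get collected."""
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0 and 1")
        with self._lock:
            self._rate = rate

    def should_collect(self):
        """Return True for roughly `rate` of the calls."""
        with self._lock:
            return self._rng.random() < self._rate


sampler = DynamicSampler(rate=0.0)  # collection effectively off
before = sum(sampler.should_collect() for _ in range(1000))
sampler.set_rate(1.0)               # turned up during an incident
after = sum(sampler.should_collect() for _ in range(1000))
print(before, after)  # 0 1000
```

Keeping the default rate low limits overhead in normal operation, while the setter lets an operator trade overhead for detail only when it is needed.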
Continuous collection: Continuously collect signals, generate fleet-wide analysis reports.
Introspection: Introspection pages provided by the services.
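A minimal sketch of what an introspection page could look like: the service itself exposes its internal state over HTTP. The `/debug/status` path and the counters are assumptions made for the example, not something the talk specifies. The example starts a local server on a free port, fetches the page once, and shuts down.

```python
import http.client
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

START_TIME = time.time()
STATS = {"requests_served": 0}  # hypothetical in-process counters
STATS_LOCK = threading.Lock()


class IntrospectionHandler(BaseHTTPRequestHandler):
    """Serves a JSON snapshot of the process state."""

    def do_GET(self):
        if self.path != "/debug/status":  # hypothetical path
            self.send_error(404)
            return
        with STATS_LOCK:
            STATS["requests_served"] += 1
            snapshot = dict(STATS)
        snapshot["uptime_seconds"] = round(time.time() - START_TIME, 3)
        body = json.dumps(snapshot).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def fetch_status():
    """Start the server on a free local port and read the page once."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), IntrospectionHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/debug/status")
        data = json.loads(conn.getresponse().read())
        conn.close()
        return data
    finally:
        server.shutdown()
        server.server_close()


status = fetch_status()
print(status)
```

Because the page is served by the process it describes, it reflects live state without any external collection pipeline being involved.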
Monitoring, debugging, postmortem
Thank you. Jaana B. Dogan, Google
[email protected]