Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Servers are doomed to fail
Search
JBD
May 17, 2019
Technology
1.6k
3
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
Servers are doomed to fail
JBD
May 17, 2019
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.2k
eBPF in Microservices Observability
rakyll
1
1.8k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.7k
Are you ready for production?
rakyll
8
3k
Serverless Containers
rakyll
1
290
Critical Path Analysis
rakyll
0
700
Monitoring and Debugging Containers
rakyll
2
1.2k
CPDD
rakyll
0
4.3k
Other Decks in Technology
See All in Technology
AI駆動開発を通して感じた、 AI時代のデザイナーの役割変化
whisaiyo
3
2.2k
プロダクト開発から業務改善コンサルまで。事業全体へ「染み出す」ことで広がるエンジニアの可能性
ham0215
0
130
AIの性能が向上しても未解決な組織の重大問題は何か?/An Unsolved Organizational Problem in the Age of AI
moriyuya
4
680
SONiCで構築・運用する生成AI向けパブリッククラウドネットワーク ~実装編~
sonic
0
220
中期計画、2回作ってみた ~業務委託と正社員、両方の視点から~
demaecan
1
880
【セミナー資料】Claude Code をセキュアに使うための考え方と設定の勘どころ / Claude Code Webinar 20260616
masahirokawahara
2
350
Bucharest Tech Week 2026 - Reinventing testing practices in the AI era
edeandrea
PRO
1
160
2026TECHFRESH畢業分享會 - Lightning Talk - 資料也要 CI/CD? 用 Airbyte 自動化資料同步
line_developers_tw
PRO
0
1.1k
手塩にかけりゃいいってもんじゃない
ming_ayami
0
590
Chainlitで作るお手軽チャットUI
ynt0485
0
260
Oracle AI Database@Azure:サービス概要のご紹介
oracle4engineer
PRO
6
2k
Bedrock AgentCore RuntimeでAuth0 Changelog調査AIをアップグレードした話
t5u8a5a
1
160
Featured
See All Featured
How to audit for AI Accessibility on your Front & Back End
davetheseo
0
430
svc-hook: hooking system calls on ARM64 by binary rewriting
retrage
2
300
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
3.4k
Efficient Content Optimization with Google Search Console & Apps Script
katarinadahlin
PRO
1
620
Primal Persuasion: How to Engage the Brain for Learning That Lasts
tmiket
0
370
Exploring anti-patterns in Rails
aemeredith
3
410
Automating Front-end Workflow
addyosmani
1370
210k
Mozcon NYC 2025: Stop Losing SEO Traffic
samtorres
1
250
Unsuck your backbone
ammeep
672
58k
Designing for Performance
lara
611
70k
SEO Brein meetup: CTRL+C is not how to scale international SEO
lindahogenes
1
2.7k
The Curious Case for Waylosing
cassininazir
1
390
Transcript
Servers are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Serverless is also doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Systems are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should
just work.” Said no one ever.
Failure is expected. Yes, it is.
None
@rakyll monitoring debugging postmortem
Monitoring is about saying if something is broken.
“99.99% of the requests should return in 100ms.”
@rakyll
@rakyll
Debugging
Debugging is collaborative.
Debugging comes in flavors. Logs Traces Metrics ...
Postmortems
Postmortems
Postmortems
Blameless? Focus on identifying problems.
Collaboration Design for collaboration.
Design for failure Set SLOs, plan for instrumentation, plan for
debugging.
Cross-stack debugging Accountability across stack with high cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation Jump from monitoring/debugging data to data.
On-call debugging Jump from distributed tracing data to on-call information.
who to page?
Dynamic collection Capability to enable more collection in production when
needed.
Continuous collection Continuously collect signals, generate fleet-wide analysis reports.
Introspection Introspection pages provided from the services.
@rakyll monitoring debugging postmortem
Thank you Jaana B. Dogan Google
[email protected]