Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Servers are doomed to fail
Search
JBD
May 17, 2019
Technology
3
1.6k
Servers are doomed to fail
JBD
May 17, 2019
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.2k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.6k
Are you ready for production?
rakyll
8
2.9k
Serverless Containers
rakyll
1
270
Critical Path Analysis
rakyll
0
670
Monitoring and Debugging Containers
rakyll
2
1.1k
CPDD
rakyll
0
4.2k
Other Decks in Technology
See All in Technology
Lambda Durable FunctionsでStep Functionsの代わりはできるのかを試してみた
smt7174
2
140
GitHub Copilot CLI 現状確認会議
torumakabe
12
4.7k
re:Inventで出たインフラエンジニアが嬉しかったアップデート
nagisa53
4
210
ファシリテーション勉強中 その場に何が求められるかを考えるようになるまで / 20260123 Naoki Takahashi
shift_evolve
PRO
3
380
Oracle Cloud Infrastructure:2026年1月度サービス・アップデート
oracle4engineer
PRO
0
150
【Oracle Cloud ウェビナー】ランサムウェアが突く「侵入の隙」とバックアップの「死角」 ~ 過去の教訓に学ぶ — 侵入前提の防御とデータ保護 ~
oracle4engineer
PRO
2
220
EventBridge API Destination × AgentCore Runtimeで実現するLambdaレスなイベント駆動エージェント
har1101
7
260
人はいかにして 確率的な挙動を 受け入れていくのか
vaaaaanquish
4
2.6k
AI Agent Agentic Workflow の可観測性 / Observability of AI Agent Agentic Workflow
yuzujoe
7
2.3k
クラウドセキュリティの進化 — AWSの20年を振り返る
kei4eva4
0
160
Security Hub と出会ってから 1年半が過ぎました
rch850
0
180
プロダクトエンジニアこそ必要なPMスキル 〜デリバリー力を最大化し、価値を届け続けるために〜
layerx
PRO
0
130
Featured
See All Featured
How to train your dragon (web standard)
notwaldorf
97
6.5k
Deep Space Network (abreviated)
tonyrice
0
36
The Anti-SEO Checklist Checklist. Pubcon Cyber Week
ryanjones
0
50
B2B Lead Gen: Tactics, Traps & Triumph
marketingsoph
0
46
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
25
1.7k
Public Speaking Without Barfing On Your Shoes - THAT 2023
reverentgeek
1
290
BBQ
matthewcrist
89
10k
Navigating Algorithm Shifts & AI Overviews - #SMXNext
aleyda
0
1.1k
Un-Boring Meetings
codingconduct
0
190
Agile Leadership in an Agile Organization
kimpetersen
PRO
0
75
Typedesign – Prime Four
hannesfritz
42
2.9k
Intergalactic Javascript Robots from Outer Space
tanoku
273
27k
Transcript
Servers are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Serverless is also doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Systems are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should
just work.” Said no one ever.
Failure is expected. Yes, it is.
None
@rakyll monitoring debugging postmortem
Monitoring is about saying if something is broken.
“99.99% of the requests should return in 100ms.”
@rakyll
@rakyll
Debugging
Debugging is collaborative.
Debugging comes in flavors. Logs Traces Metrics ...
Postmortems
Postmortems
Postmortems
Blameless? Focus on identifying problems.
Collaboration Design for collaboration.
Design for failure Set SLOs, plan for instrumentation, plan for
debugging.
Cross-stack debugging Accountability across stack with high cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation Jump from monitoring/debugging data to data.
On-call debugging Jump from distributed tracing data to on-call information.
who to page?
Dynamic collection Capability to enable more collection in production when
needed.
Continuous collection Continuously collect signals, generate fleet-wide analysis reports.
Introspection Introspection pages provided from the services.
@rakyll monitoring debugging postmortem
Thank you Jaana B. Dogan Google
[email protected]