Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Servers are doomed to fail
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
JBD
May 17, 2019
Technology
3
1.6k
Servers are doomed to fail
JBD
May 17, 2019
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.2k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.6k
Are you ready for production?
rakyll
8
2.9k
Serverless Containers
rakyll
1
280
Critical Path Analysis
rakyll
0
690
Monitoring and Debugging Containers
rakyll
2
1.1k
CPDD
rakyll
0
4.3k
Other Decks in Technology
See All in Technology
Lookerの最新バージョンv26.2がやばい話
waiwai2111
1
130
Data Hubグループ 紹介資料
sansan33
PRO
0
2.8k
【PyCon mini Shizuoka 2026】生成AI時代に画像処理やオーディオ処理のノードエディターを作る理由
kazuhitotakahashi
0
150
Agent Ready になるためにデータ基盤チームが今年やること / How We're Making Our Data Platform Agent-Ready
zaimy
0
180
20260222ねこIoTLT ねこIoTLTをふりかえる
poropinai1966
0
250
俺の失敗を乗り越えろ!メーカーの開発現場での失敗談と乗り越え方 ~ゆるゆるチームリーダー編~
spiddle
0
350
Sansan Engineering Unit 紹介資料
sansan33
PRO
1
4k
器用貧乏が強みになるまで ~「なんでもやる」が導いたエンジニアとしての現在地~
kakehashi
PRO
5
590
三菱UFJ銀行におけるエンタープライズAI駆動開発のリアル / Enterprise AI_Driven Development at MUFG Bank: The Real Story
muit
10
19k
opsmethod第1回_アラート調査の自動化にむけて
yamatook
0
310
競争優位を生み出す戦略的内製開発の実践技法
masuda220
PRO
2
470
社内でAWS BuilderCards体験会を立ち上げ、得られた気づき / 20260225 Masaki Okuda
shift_evolve
PRO
1
130
Featured
See All Featured
Optimizing for Happiness
mojombo
379
71k
The Invisible Side of Design
smashingmag
302
51k
WCS-LA-2024
lcolladotor
0
470
Lightning talk: Run Django tests with GitHub Actions
sabderemane
0
130
The SEO identity crisis: Don't let AI make you average
varn
0
400
Building a Scalable Design System with Sketch
lauravandoore
463
34k
From Legacy to Launchpad: Building Startup-Ready Communities
dugsong
0
160
Everyday Curiosity
cassininazir
0
140
Building a A Zero-Code AI SEO Workflow
portentint
PRO
0
360
How to Grow Your eCommerce with AI & Automation
katarinadahlin
PRO
1
130
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
34
2.6k
Between Models and Reality
mayunak
1
210
Transcript
Servers are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Serverless is also doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Systems are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should
just work.” Said no one ever.
Failure is expected. Yes, it is.
None
@rakyll monitoring debugging postmortem
Monitoring is about saying if something is broken.
“99.99% of the requests should return in 100ms.”
@rakyll
@rakyll
Debugging
Debugging is collaborative.
Debugging comes in flavors. Logs Traces Metrics ...
Postmortems
Postmortems
Postmortems
Blameless? Focus on identifying problems.
Collaboration Design for collaboration.
Design for failure Set SLOs, plan for instrumentation, plan for
debugging.
Cross-stack debugging Accountability across stack with high cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation Jump from monitoring/debugging data to data.
On-call debugging Jump from distributed tracing data to on-call information.
who to page?
Dynamic collection Capability to enable more collection in production when
needed.
Continuous collection Continuously collect signals, generate fleet-wide analysis reports.
Introspection Introspection pages provided from the services.
@rakyll monitoring debugging postmortem
Thank you Jaana B. Dogan Google
[email protected]