Servers are doomed to fail
JBD
May 17, 2019
Servers are doomed to fail
Jaana B. Dogan
@rakyll
Serverless is also doomed to fail
Systems are doomed to fail
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should just work.” Said no one ever.
Failure is expected. Yes, it is.
Monitoring, debugging, postmortem
Monitoring tells you whether something is broken.
“99.99% of the requests should return in 100ms.”
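A minimal sketch of checking a window of requests against the SLO quoted above, assuming per-request latencies are available as an in-memory list in milliseconds. Production monitoring systems usually evaluate this over a rolling time window from histograms or counters; the `LatencySLO` class and its method names are illustrative, not from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencySLO:
    """A latency SLO such as "99.99% of requests return within 100 ms"."""

    target: float        # required fraction of fast requests, e.g. 0.9999
    threshold_ms: float  # latency bound for a request to count as good

    def compliance(self, latencies_ms):
        """Fraction of observed requests that finished within the threshold."""
        if not latencies_ms:
            return 1.0  # no traffic in the window: nothing violated the SLO
        good = sum(1 for ms in latencies_ms if ms <= self.threshold_ms)
        return good / len(latencies_ms)

    def is_met(self, latencies_ms):
        """True if the observed compliance reaches the target."""
        return self.compliance(latencies_ms) >= self.target


slo = LatencySLO(target=0.9999, threshold_ms=100.0)

# 10,000 requests with a single slow one sit exactly at the 99.99% target.
window = [20.0] * 9_999 + [250.0]
print(slo.compliance(window), slo.is_met(window))
```

One more slow request in the same window drops compliance to 99.98% and the SLO is missed, which is the signal monitoring is there to raise.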
Debugging
Debugging is collaborative.
Debugging comes in flavors: logs, traces, metrics, ...
Postmortems
Blameless?
Focus on identifying problems.
Collaboration
Design for collaboration.
Design for failure
Set SLOs, plan for instrumentation, and plan for debugging.
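Setting an SLO also fixes how much failure is acceptable. A small sketch of a time-based error budget, assuming the objective is measured over a fixed window; the 30-day window and the `error_budget` helper are illustrative choices, not from the talk.

```python
from datetime import timedelta


def error_budget(target: float, window: timedelta) -> timedelta:
    """Time a service may spend out of SLO within `window`.

    Assumes a time-based objective: the service must be good for
    `target` of the window, so the remainder is the budget for failure.
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    return window * (1.0 - target)


print(error_budget(0.9999, timedelta(days=30)))
```

Read as a time-based target, 99.99% over 30 days leaves roughly 4 minutes 19 seconds of failure before the SLO is missed, which is a concrete number to plan instrumentation and debugging around.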
Cross-stack debugging
Accountability across the stack with high-cardinality data.
speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation
Jump from monitoring/debugging data to related data.
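One common way to make that jump possible is to stamp every log line with the ID of the trace it belongs to, so a log entry leads straight to the matching distributed trace. A minimal sketch using Python's standard `logging` and `contextvars`, assuming the trace ID is generated locally; in a real service a tracing library would own the ID, and the `checkout` logger name and `trace_id` field are hypothetical.

```python
import contextvars
import io
import logging
import uuid

# Hypothetical: the active request's trace ID. A tracing library would
# normally manage this; here we set it ourselves for illustration.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIDFilter(logging.Filter):
    """Copies the active trace ID onto every log record."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True


def make_logger(stream):
    logger = logging.getLogger("checkout")
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceIDFilter())
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger):
    """Simulates one request: set a trace ID, log, then restore the context."""
    token = current_trace_id.set(uuid.uuid4().hex)
    try:
        logger.warning("payment backend slow")
        return current_trace_id.get()
    finally:
        current_trace_id.reset(token)


stream = io.StringIO()
log = make_logger(stream)
trace_id = handle_request(log)
print(stream.getvalue(), end="")
```

With the same ID recorded in logs and traces, searching either store for it lands on the other.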
On-call debugging
Jump from distributed tracing data to on-call information: who to page?
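A sketch of answering "who to page?" from a single trace, under stated assumptions: spans carry a service name and an error flag, a hypothetical `ONCALL` table maps services to rotations, and the deepest failing spans are treated as the likely origin. That heuristic is our assumption, not a method described in the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    service: str
    error: bool


# Hypothetical ownership directory; in practice this would come from a
# service registry or the paging system itself.
ONCALL = {
    "frontend": "web-oncall",
    "checkout": "payments-oncall",
    "inventory": "storage-oncall",
}


def who_to_page(spans):
    """Return on-call rotations for failing spans with no failing children.

    An error that propagates up the call chain marks every ancestor span,
    so failing spans without failing children are the likeliest origin.
    """
    failing_parents = {s.parent_id for s in spans if s.error and s.parent_id}
    origins = [s for s in spans if s.error and s.span_id not in failing_parents]
    return sorted({ONCALL.get(s.service, "unowned") for s in origins})


trace = [
    Span("a", None, "frontend", error=True),
    Span("b", "a", "checkout", error=True),
    Span("c", "b", "inventory", error=True),
    Span("d", "a", "checkout", error=False),
]
print(who_to_page(trace))
```

Here the failure surfaces at the frontend, but the trace points at inventory, so the storage rotation is paged instead of the team that merely saw the error.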
Dynamic collection
The ability to enable more collection in production when needed.
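A minimal sketch of dynamic collection, assuming detailed diagnostics are gated by a sampling rate that operators can raise at runtime and lower again once the investigation is over. The `DynamicSampler` class is hypothetical; how `set_rate` gets triggered (an admin endpoint, a config push) is left out.

```python
import random
import threading


class DynamicSampler:
    """Decides per request whether to record detailed diagnostics.

    The rate can change while the process is running, so collection can be
    turned up in production during an incident without a redeploy.
    """

    def __init__(self, rate=0.001, rng=None):
        self._lock = threading.Lock()
        self._rate = rate
        self._rng = rng if rng is not None else random.Random()

    def set_rate(self, rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be in [0, 1]")
        with self._lock:
            self._rate = rate

    def should_sample(self):
        # random() is in [0, 1), so rate 0.0 never samples and 1.0 always does.
        with self._lock:
            return self._rng.random() < self._rate


sampler = DynamicSampler(rate=0.0)
sampler.set_rate(1.0)  # e.g. while debugging a live incident
print(sampler.should_sample())
```

Keeping the default rate low bounds the steady-state cost; the extra detail is paid for only while someone is looking.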
Continuous collection
Continuously collect signals and generate fleet-wide analysis reports.
Introspection
Introspection pages served by the services themselves.
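A minimal sketch of an introspection page using only Python's standard library: the service serves a JSON snapshot of its own state on a local HTTP endpoint. The `/statusz` path, the counters in `STATS`, and the helper names are hypothetical; Go services often get similar pages from the standard `net/http/pprof` and `expvar` packages.

```python
import json
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

START_TIME = time.monotonic()

# Hypothetical counters; the rest of the service would update them under STATS_LOCK.
STATS = {"requests_total": 0, "errors_total": 0}
STATS_LOCK = threading.Lock()


class IntrospectionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/statusz":
            self.send_error(404)
            return
        with STATS_LOCK:
            snapshot = dict(STATS)
        snapshot["uptime_seconds"] = round(time.monotonic() - START_TIME, 1)
        payload = json.dumps(snapshot).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence per-request access logs in this example


def serve_introspection(port=0):
    """Serve introspection pages on localhost from a background thread.

    port=0 lets the OS pick a free port; read it from server.server_address.
    """
    server = ThreadingHTTPServer(("127.0.0.1", port), IntrospectionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_statusz(server):
    """Fetch and decode the /statusz page of a running server."""
    host, port = server.server_address[:2]
    with urllib.request.urlopen(f"http://{host}:{port}/statusz", timeout=5) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    srv = serve_introspection()
    print(fetch_statusz(srv))
    srv.shutdown()
    srv.server_close()
```

Binding to 127.0.0.1 keeps the page off the public network; exposing it more widely is a deployment decision this sketch does not make.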
Thank you
Jaana B. Dogan, Google