Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Servers are doomed to fail
Search
JBD
May 17, 2019
Technology
3
1.5k
Servers are doomed to fail
JBD
May 17, 2019
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.6k
Are you ready for production?
rakyll
8
2.8k
Serverless Containers
rakyll
1
250
Critical Path Analysis
rakyll
0
610
Monitoring and Debugging Containers
rakyll
2
1.1k
CPDD
rakyll
0
4.2k
Other Decks in Technology
See All in Technology
低レイヤソフトウェア技術者が YouTuberとして食っていこうとした話
sat
PRO
6
4.8k
Four Keysから始める信頼性の改善 - SRE NEXT 2025
ozakikota
0
420
Introduction to Bill One Development Engineer
sansan33
PRO
0
270
研究開発部メンバーの働き⽅ / Sansan R&D Profile
sansan33
PRO
3
18k
エンジニアリングマネージャー“お悩み相談”パネルセッション
ar_tama
1
110
SREのためのeBPF活用ステップアップガイド
egmc
2
1.3k
アクセスピークを制するオートスケール再設計: 障害を乗り越えKEDAで実現したリソース管理の最適化
myamashii
1
710
LLM拡張解体新書/llm-extension-deep-dive
oracle4engineer
PRO
24
6.6k
AWS CDK 入門ガイド これだけは知っておきたいヒント集
anank
5
780
毎晩の 負荷試験自動実行による効果
recruitengineers
PRO
5
180
Transformerを用いたアイテム間の 相互影響を考慮したレコメンドリスト生成
recruitengineers
PRO
2
490
助けて! XからWaylandに移行しないと新しいGNOMEが使えなくなっちゃう 2025-07-12
nobutomurata
2
200
Featured
See All Featured
Large-scale JavaScript Application Architecture
addyosmani
512
110k
RailsConf 2023
tenderlove
30
1.1k
Facilitating Awesome Meetings
lara
54
6.5k
Adopting Sorbet at Scale
ufuk
77
9.5k
Become a Pro
speakerdeck
PRO
29
5.4k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
181
54k
The Cult of Friendly URLs
andyhume
79
6.5k
Scaling GitHub
holman
460
140k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
26k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
53
2.9k
How STYLIGHT went responsive
nonsquared
100
5.6k
Transcript
Servers are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Serverless is also doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Systems are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should
just work.” Said no one ever.
Failure is expected. Yes, it is.
None
@rakyll monitoring debugging postmortem
Monitoring is about saying if something is broken.
“99.99% of the requests should return in 100ms.”
@rakyll
@rakyll
Debugging
Debugging is collaborative.
Debugging comes in flavors. Logs Traces Metrics ...
Postmortems
Postmortems
Postmortems
Blameless? Focus on identifying problems.
Collaboration Design for collaboration.
Design for failure Set SLOs, plan for instrumentation, plan for
debugging.
Cross-stack debugging Accountability across stack with high cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation Jump from monitoring/debugging data to data.
On-call debugging Jump from distributed tracing data to on-call information.
who to page?
Dynamic collection Capability to enable more collection in production when
needed.
Continuous collection Continuously collect signals, generate fleet-wide analysis reports.
Introspection Introspection pages provided from the services.
@rakyll monitoring debugging postmortem
Thank you Jaana B. Dogan Google
[email protected]