Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Servers are doomed to fail
Search
JBD
May 17, 2019
Technology
3
1.5k
Servers are doomed to fail
JBD
May 17, 2019
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.6k
Are you ready for production?
rakyll
8
2.8k
Serverless Containers
rakyll
1
250
Critical Path Analysis
rakyll
0
600
Monitoring and Debugging Containers
rakyll
2
1.1k
CPDD
rakyll
0
4.2k
Other Decks in Technology
See All in Technology
Claude Code どこまでも/ Claude Code Everywhere
nwiizo
53
31k
Liquid Glass革新とSwiftUI/UIKit進化
fumiyasac0921
0
110
標準技術と独自システムで作る「つらくない」SaaS アカウント管理 / Effortless SaaS Account Management with Standard Technologies & Custom Systems
yuyatakeyama
2
990
Oracle Audit Vault and Database Firewall 20 概要
oracle4engineer
PRO
2
1.6k
知識を整理して未来を作る 〜SKDとAI協業への助走〜
yosh1995
0
130
Observability в PHP без боли. Олег Мифле, тимлид Altenar
lamodatech
0
270
TechLION vol.41~MySQLユーザ会のほうから来ました / techlion41_mysql
sakaik
0
140
25分で解説する「最小権限の原則」を実現するための AWS「ポリシー」大全 / 20250625-aws-summit-aws-policy
opelab
6
680
20250625 Snowflake Summit 2025活用事例 レポート / Nowcast Snowflake Summit 2025 Case Study Report
kkuv
1
170
OAuth/OpenID Connectで実現するMCPのセキュアなアクセス管理
kuralab
5
810
(非公式) AWS Summit Japan と 海浜幕張 の歩き方 2025年版
coosuke
PRO
1
320
Definition of Done
kawaguti
PRO
6
460
Featured
See All Featured
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
15
1.5k
Intergalactic Javascript Robots from Outer Space
tanoku
271
27k
[RailsConf 2023] Rails as a piece of cake
palkan
55
5.6k
Making the Leap to Tech Lead
cromwellryan
134
9.3k
Java REST API Framework Comparison - PWX 2021
mraible
31
8.6k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
8
660
Adopting Sorbet at Scale
ufuk
77
9.4k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
331
22k
Code Reviewing Like a Champion
maltzj
524
40k
Why Our Code Smells
bkeepers
PRO
337
57k
Building a Modern Day E-commerce SEO Strategy
aleyda
41
7.3k
Rails Girls Zürich Keynote
gr2m
94
14k
Transcript
Servers are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Serverless is also doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Systems are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should
just work.” Said no one ever.
Failure is expected. Yes, it is.
None
@rakyll monitoring debugging postmortem
Monitoring is about saying if something is broken.
“99.99% of the requests should return in 100ms.”
@rakyll
@rakyll
Debugging
Debugging is collaborative.
Debugging comes in flavors. Logs Traces Metrics ...
Postmortems
Postmortems
Postmortems
Blameless? Focus on identifying problems.
Collaboration Design for collaboration.
Design for failure Set SLOs, plan for instrumentation, plan for
debugging.
Cross-stack debugging Accountability across stack with high cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation Jump from monitoring/debugging data to data.
On-call debugging Jump from distributed tracing data to on-call information.
who to page?
Dynamic collection Capability to enable more collection in production when
needed.
Continuous collection Continuously collect signals, generate fleet-wide analysis reports.
Introspection Introspection pages provided from the services.
@rakyll monitoring debugging postmortem
Thank you Jaana B. Dogan Google
[email protected]