Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Servers are doomed to fail
Search
JBD
May 17, 2019
Technology
3
1.5k
Servers are doomed to fail
JBD
May 17, 2019
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.8k
Debugging Code Generation in Go
rakyll
5
1.5k
Are you ready for production?
rakyll
8
2.7k
Serverless Containers
rakyll
1
240
Critical Path Analysis
rakyll
0
550
Monitoring and Debugging Containers
rakyll
2
1.1k
CPDD
rakyll
0
4.1k
Other Decks in Technology
See All in Technology
RECRUIT TECH CONFERENCE 2025 プレイベント【高橋】
recruitengineers
PRO
0
160
ユーザーストーリーマッピングから始めるアジャイルチームと並走するQA / Starting QA with User Story Mapping
katawara
0
210
明日からできる!技術的負債の返済を加速するための実践ガイド~『ホットペッパービューティー』の事例をもとに~
recruitengineers
PRO
3
410
Tech Blogを書きやすい環境づくり
lycorptech_jp
PRO
1
240
Data-centric AI入門第6章:Data-centric AIの実践例
x_ttyszk
1
410
オブザーバビリティの観点でみるAWS / AWS from observability perspective
ymotongpoo
8
1.5k
Classmethod AI Talks(CATs) #17 司会進行スライド(2025.02.19) / classmethod-ai-talks-aka-cats_moderator-slides_vol17_2025-02-19
shinyaa31
0
120
Cloud Spanner 導入で実現した快適な開発と運用について
colopl
1
700
7日間でハッキングをはじめる本をはじめてみませんか?_ITエンジニア本大賞2025
nomizone
2
1.8k
Goで作って学ぶWebSocket
ryuichi1208
2
1.3k
運用しているアプリケーションのDBのリプレイスをやってみた
miura55
1
730
Amazon S3 Tablesと外部分析基盤連携について / Amazon S3 Tables and External Data Analytics Platform
nttcom
0
140
Featured
See All Featured
A Philosophy of Restraint
colly
203
16k
A Modern Web Designer's Workflow
chriscoyier
693
190k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
175
51k
Thoughts on Productivity
jonyablonski
69
4.5k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
30
4.6k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
30
2.2k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
3.7k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
49
2.3k
Being A Developer After 40
akosma
89
590k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
160
15k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
The MySQL Ecosystem @ GitHub 2015
samlambert
250
12k
Transcript
Servers are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Serverless is also doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Systems are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should
just work.” Said no one ever.
Failure is expected. Yes, it is.
None
@rakyll monitoring debugging postmortem
Monitoring is about saying if something is broken.
“99.99% of the requests should return in 100ms.”
@rakyll
@rakyll
Debugging
Debugging is collaborative.
Debugging comes in flavors. Logs Traces Metrics ...
Postmortems
Postmortems
Postmortems
Blameless? Focus on identifying problems.
Collaboration Design for collaboration.
Design for failure Set SLOs, plan for instrumentation, plan for
debugging.
Cross-stack debugging Accountability across stack with high cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation Jump from monitoring/debugging data to data.
On-call debugging Jump from distributed tracing data to on-call information.
who to page?
Dynamic collection Capability to enable more collection in production when
needed.
Continuous collection Continuously collect signals, generate fleet-wide analysis reports.
Introspection Introspection pages provided from the services.
@rakyll monitoring debugging postmortem
Thank you Jaana B. Dogan Google
[email protected]