production: an owner's manual
Igor Wiedler
April 23, 2018
from exec(ut) 2018
Transcript
production: an owner's manual
hello!
broken computers
getting sidetracked now, so sorry*
* not sorry
back to serious business
!
a production system is a system that serves real users
the goal of operations is to ensure services are reliable
in order to provide a good user experience
failure
app
app, linux kernel, cpu, dram, disk, network, power supply, switches, load balancer, dns, submarine cables, routers, fiber
app, linux kernel, the cloud
• cosmic rays
• disk failure
• power outages
• software bugs
• ...
entropy
capacity
cascading failure
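The slide names the failure mode without spelling it out. As a rough sketch of the usual mechanism: when one server fails, its share of the load shifts to the survivors, which can push them past capacity in turn. The function name, the even load spreading, and the numbers below are illustrative assumptions, not taken from the talk.

```python
def simulate_cascade(total_load, capacities, first_failure):
    """Return the indices of servers that fail, in the order they fail.

    Assumption: load is spread evenly across all servers that are still alive.
    """
    alive = [i for i in range(len(capacities)) if i != first_failure]
    failed = [first_failure]
    while alive:
        per_server = total_load / len(alive)
        overloaded = [i for i in alive if per_server > capacities[i]]
        if not overloaded:
            break  # the survivors can absorb the load
        # Overloaded servers drop out, shifting their load onto the rest.
        failed.extend(overloaded)
        alive = [i for i in alive if i not in overloaded]
    return failed


# With headroom, losing one server is absorbed by the other three.
print(simulate_cascade(300, [100, 100, 100, 100], first_failure=0))
# Without headroom, the same single failure takes everything down.
print(simulate_cascade(330, [100, 100, 100, 100], first_failure=0))
```

The two calls differ only in how much spare capacity the fleet has, which is why capacity and cascading failure sit next to each other here.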
system design
redundancy
scale
[diagram with nodes labeled p1, p2, m1, m2, m3, c1, c2]
data storage
protocols
monitoring
many components, many req/s
measure all the things?
✅ ⏱
golden signals
• latency
• traffic
• errors
• saturation
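A minimal sketch of how the four signals might be derived from one time window of request records. The `Request` type, the field names, the choice of the 99th percentile for latency, and the use of in-flight requests as the saturation measure are all assumptions made for illustration; the slides do not prescribe any of them.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float  # how long the request took
    ok: bool            # False if the request failed


def golden_signals(requests, window_s, in_flight, max_in_flight):
    """Summarize one time window of requests into the four golden signals."""
    durations = sorted(r.duration_ms for r in requests)
    n = len(durations)
    if n:
        # Nearest-rank 99th percentile: rank = ceil(0.99 * n), in integers.
        rank = (99 * n + 99) // 100
        p99 = durations[rank - 1]
    else:
        p99 = 0.0
    errors = sum(1 for r in requests if not r.ok)
    return {
        "latency_p99_ms": p99,                       # latency
        "traffic_rps": n / window_s,                 # traffic
        "error_rate": errors / n if n else 0.0,      # errors
        "saturation": in_flight / max_in_flight,     # saturation
    }


window = [Request(10, True)] * 98 + [Request(200, False), Request(500, True)]
print(golden_signals(window, window_s=10, in_flight=30, max_in_flight=40))
```

Saturation is the least standardized of the four; CPU, memory, or queue depth would be equally reasonable stand-ins for the in-flight ratio used here.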
0 - 50 [1620]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (74.55%)
50 - 100 [ 447]: ∎∎∎∎∎∎∎∎∎∎ (20.57%)
100 - 150 [ 49]: ∎ (2.25%)
150 - 200 [ 15]: (0.69%)
200 - 250 [ 15]: (0.69%)
250 - 300 [ 10]: (0.46%)
300 - 350 [ 6]: (0.28%)
350 - 400 [ 1]: (0.05%)
400 - 450 [ 0]: (0.00%)
450 - 500 [ 4]: (0.18%)
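A latency histogram in this shape can be sketched in a few lines. This is not the tool that produced the slide; the function name, the bar scaling, and the choice to clamp values at or above the top edge into the last bucket are assumptions, and latencies are assumed to be non-negative.

```python
def latency_histogram(latencies_ms, bucket_ms=50, max_ms=500, width=50):
    """Return one text line per fixed-width latency bucket."""
    n_buckets = max_ms // bucket_ms
    counts = [0] * n_buckets
    for ms in latencies_ms:
        # Assumption: values at or above max_ms land in the last bucket.
        idx = min(int(ms // bucket_ms), n_buckets - 1)
        counts[idx] += 1
    total = len(latencies_ms)
    lines = []
    for i, count in enumerate(counts):
        share = count / total if total else 0.0
        bar = "∎" * int(share * width)  # bar length proportional to share
        lo, hi = i * bucket_ms, (i + 1) * bucket_ms
        lines.append(f"{lo:3d} - {hi:3d} [{count:4d}]: {bar} ({share:.2%})")
    return lines


for line in latency_histogram([12, 31, 48, 75, 140, 460]):
    print(line)
```

Unlike a single average, the buckets keep the long tail visible, which is where the slow requests hide.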
saturation traffic latency errors
humans
oops, deleted the database
bad human!
why does this button even exist?
app, linux kernel, cpu, dram, disk, network, power supply, switches, load balancer, dns, submarine cables, routers, fiber, humans
epic failure is almost always systemic
failure
recap
• a production system serves real users
• users like things that work and are fast
• epic failure is almost always systemic
thx @igorwhilefalse