Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Observability_at_Google_--_OSCON.pdf
Search
JBD
July 23, 2018
Programming
1
250
Observability_at_Google_--_OSCON.pdf
JBD
July 23, 2018
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.6k
Are you ready for production?
rakyll
8
2.8k
Servers are doomed to fail
rakyll
3
1.5k
Serverless Containers
rakyll
1
260
Critical Path Analysis
rakyll
0
620
Monitoring and Debugging Containers
rakyll
2
1.1k
Other Decks in Programming
See All in Programming
AIエージェント開発、DevOps and LLMOps
ymd65536
1
340
Dart 参戦!!静的型付き言語界の隠れた実力者
kno3a87
0
210
パスタの技術
yusukebe
1
400
技術的負債で信頼性が限界だったWordPress運用をShifterで完全復活させた話
rvirus0817
1
2.1k
CEDEC2025 長期運営ゲームをあと10年続けるための0から始める自動テスト ~4000項目を50%自動化し、月1→毎日実行にした3年間~
akatsukigames_tech
0
150
管你要 trace 什麼、bpftrace 用下去就對了 — COSCUP 2025
shunghsiyu
0
470
[FEConf 2025] 모노레포 절망편, 14개 레포로 부활하기까지 걸린 1년
mmmaxkim
0
990
UbieのAIパートナーを支えるコンテキストエンジニアリング実践
syucream
2
700
レガシープロジェクトで最大限AIの恩恵を受けられるようClaude Codeを利用する
tk1351
2
1.2k
AIコーディングAgentとの向き合い方
eycjur
0
220
物語を動かす行動"量" #エンジニアニメ
konifar
14
5.4k
新世界の理解
koriym
0
140
Featured
See All Featured
Java REST API Framework Comparison - PWX 2021
mraible
33
8.8k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
jQuery: Nuts, Bolts and Bling
dougneiner
64
7.9k
Six Lessons from altMBA
skipperchong
28
4k
Building Flexible Design Systems
yeseniaperezcruz
328
39k
A better future with KSS
kneath
239
17k
RailsConf 2023
tenderlove
30
1.2k
Statistics for Hackers
jakevdp
799
220k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
126
53k
Large-scale JavaScript Application Architecture
addyosmani
512
110k
Automating Front-end Workflow
addyosmani
1370
200k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
46
7.6k
Transcript
Observability at Google JBD, Google (@rakyll)
@rakyll History Long history of distributed systems 10ks of different
services built by 100s of teams Many backends/analysis tools invented here ™
@rakyll
@rakyll 100% availability (is a lie)
“ @rakyll A service is available if users cannot tell
there is an outage.
“ @rakyll Google Load Balancers are available if users cannot
tell there is an outage.
@rakyll Principled way of saying what level of downtime is
acceptable. • Error rate • Latency expectations SLOs
@rakyll An observable system tells more than its availability.
@rakyll Context, status, expectations, debuggability
@rakyll How? Observe by collecting signals Export them to analysis
tools Correlate and analyze to find root cause
@rakyll
@rakyll
@rakyll
@rakyll
@rakyll This is hard Must have integrations for web, RPC,
and storage clients Must support all languages Must be context aware (e.g. canary vs prod) Must support many analysis tools Developers need to add custom instrumentation
@rakyll This is too hard!
@rakyll Borg Stubby Census
opencensus.io
@rakyll
@rakyll
@rakyll
@rakyll
@rakyll Z-Pages • Allows processes report their own dashboards. •
Z-Pages have no sampling.
@rakyll Try! import “go.opencensus.io/plugin/ocgrpc” s := grpc.NewServer(grpc.StatsHandler(&ocgrpc.ServerHandler{})) if err :=
s.Serve(lis); err != nil { log.Fatalf("Failed to serve: %v", err) }
@rakyll import ( “go.opencensus.io/stats/view” “go.opencensus.io/trace” “contrib.go.opencensus.io/exporter/stackdriver” ) exporter, err :=
stackdriver.NewExporter(stackdriver.Options{ … }) if err != nil { log.Fatal(err) } view.RegisterExporter(exporter) trace.RegisterExporter(exporter)
@rakyll
@rakyll
@rakyll Roadmap Stable libraries in 8+ languages Exporter daemon Cluster-wide
Z-Pages Smart sampling Exemplars Framework, database, MQ integrations
opencensus.io
Thank you! opencensus.io JBD, Google
[email protected]
@rakyll