Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Observability_at_Google_--_OSCON.pdf
Search
JBD
July 23, 2018
Programming
1
230
Observability_at_Google_--_OSCON.pdf
JBD
July 23, 2018
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.8k
Debugging Code Generation in Go
rakyll
5
1.5k
Are you ready for production?
rakyll
8
2.7k
Servers are doomed to fail
rakyll
3
1.5k
Serverless Containers
rakyll
1
240
Critical Path Analysis
rakyll
0
550
Monitoring and Debugging Containers
rakyll
2
1.1k
Other Decks in Programming
See All in Programming
苦しいTiDBへの移行を乗り越えて快適な運用を目指す
leveragestech
0
620
『GO』アプリ バックエンドサーバのコスト削減
mot_techtalk
0
140
Java Webフレームワークの現状 / java web framework at burikaigi
kishida
9
2.2k
なぜイベント駆動が必要なのか - CQRS/ESで解く複雑系システムの課題 -
j5ik2o
11
3.9k
Immutable ActiveRecord
megane42
0
140
GoとPHPのインターフェイスの違い
shimabox
2
190
Introduction to kotlinx.rpc
arawn
0
700
Rubyで始める関数型ドメインモデリング
shogo_tksk
0
110
コミュニティ駆動 AWS CDK ライブラリ「Open Constructs Library」 / community-cdk-library
gotok365
2
140
『GO』アプリ データ基盤のログ収集システムコスト削減
mot_techtalk
0
120
パスキーのすべて ── 導入・UX設計・実装の紹介 / 20250213 パスキー開発者の集い
kuralab
3
790
JavaScriptツール群「UnJS」を5分で一気に駆け巡る!
k1tikurisu
9
1.8k
Featured
See All Featured
Building a Modern Day E-commerce SEO Strategy
aleyda
38
7.1k
How to Think Like a Performance Engineer
csswizardry
22
1.3k
A Philosophy of Restraint
colly
203
16k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
40
2k
Build your cross-platform service in a week with App Engine
jlugia
229
18k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
114
50k
Fontdeck: Realign not Redesign
paulrobertlloyd
83
5.4k
VelocityConf: Rendering Performance Case Studies
addyosmani
328
24k
jQuery: Nuts, Bolts and Bling
dougneiner
63
7.6k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
3.7k
Building Applications with DynamoDB
mza
93
6.2k
Bash Introduction
62gerente
611
210k
Transcript
Observability at Google JBD, Google (@rakyll)
@rakyll History Long history of distributed systems 10ks of different
services built by 100s of teams Many backends/analysis tools invented here ™
@rakyll
@rakyll 100% availability (is a lie)
“ @rakyll A service is available if users cannot tell
there is an outage.
“ @rakyll Google Load Balancers are available if users cannot
tell there is an outage.
@rakyll Principled way of saying what level of downtime is
acceptable. • Error rate • Latency expectations SLOs
@rakyll An observable system tells more than its availability.
@rakyll Context, status, expectations, debuggability
@rakyll How? Observe by collecting signals Export them to analysis
tools Correlate and analyze to find root cause
@rakyll
@rakyll
@rakyll
@rakyll
@rakyll This is hard Must have integrations for web, RPC,
and storage clients Must support all languages Must be context aware (e.g. canary vs prod) Must support many analysis tools Developers need to add custom instrumentation
@rakyll This is too hard!
@rakyll Borg Stubby Census
opencensus.io
@rakyll
@rakyll
@rakyll
@rakyll
@rakyll Z-Pages • Allows processes report their own dashboards. •
Z-Pages have no sampling.
@rakyll Try! import “go.opencensus.io/plugin/ocgrpc” s := grpc.NewServer(grpc.StatsHandler(&ocgrpc.ServerHandler{})) if err :=
s.Serve(lis); err != nil { log.Fatalf("Failed to serve: %v", err) }
@rakyll import ( “go.opencensus.io/stats/view” “go.opencensus.io/trace” “contrib.go.opencensus.io/exporter/stackdriver” ) exporter, err :=
stackdriver.NewExporter(stackdriver.Options{ … }) if err != nil { log.Fatal(err) } view.RegisterExporter(exporter) trace.RegisterExporter(exporter)
@rakyll
@rakyll
@rakyll Roadmap Stable libraries in 8+ languages Exporter daemon Cluster-wide
Z-Pages Smart sampling Exemplars Framework, database, MQ integrations
opencensus.io
Thank you! opencensus.io JBD, Google
[email protected]
@rakyll