RPC Metrics at Google
JBD
August 09, 2018
Transcript
RPC Metrics at Google JBD, Google (@rakyll)
gRPC Metrics at Google JBD, Google (@rakyll)
Request Metrics at Google JBD, Google (@rakyll)
"100% is the wrong reliability target for basically everything."
-- Benjamin Treynor Sloss, VP of Engineering, Google
"A service is available if users cannot tell that there was an outage."
SLOs: a principled way of saying what level of downtime is acceptable.
• Error rate
• Latency expectations
[Diagram: Analytics frontend server, Authentication, Reporting, Users, ..., Spanner, Blob Store]
Questions infra teams want to ask:
• Are we meeting the SLO for the other team?
• What's the impact of a product on infra?
• How much do we need to scale up if product grows 10%?
High-Cardinality: breaking down the metrics data...
Query the collected data in various ways:
• Latency distribution for RPCs originated at Google Analytics.
• Requests that took more than 100ms for customer #123.
• Compare the request latency initiated at web vs mobile frontend.
[Diagram: Analytics frontend server, Authentication, Reporting, Users, ..., Spanner, Blob Store; requests carry the tags originator=analytics; ...]
[Chart: Blob store read errors by originator]
Dynamically choose aggregation (split between recording and aggregation)
Exemplars
/rpcz and /statz
http://server:7777/debug/rpcz
Export? Monarch, Prometheus, and more.
import "cloud.google.com/go/pubsub"
Thank you! JBD, Google
@rakyll