RPC Metrics at Google
JBD
August 09, 2018
Transcript
RPC Metrics at Google JBD, Google (@rakyll)
gRPC Metrics at Google JBD, Google (@rakyll)
Request Metrics at Google JBD, Google (@rakyll)
"100% is the wrong reliability target for basically everything."
-- Benjamin Treynor Sloss, VP of Engineering, Google
"A service is available if users cannot tell that there was an outage."
SLOs: a principled way of saying what level of downtime is acceptable.
• Error rate
• Latency expectations
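The two SLO dimensions on this slide can be checked mechanically against observed traffic. The Go sketch below is illustrative only: the `SLO` and `Request` types, the nearest-rank percentile method, and the example thresholds are assumptions, not Google's tooling.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// Request is one observed RPC outcome (illustrative fields, not a real API).
type Request struct {
	Latency time.Duration
	Err     bool
}

// SLO is a hypothetical objective built from the two dimensions on the slide:
// an error-rate ceiling and a latency target at a chosen percentile.
type SLO struct {
	MaxErrorRate  float64       // e.g. 0.001 allows 1 failure per 1000 requests
	Percentile    float64       // e.g. 0.99 for the 99th percentile
	LatencyTarget time.Duration // latency the percentile must not exceed
}

// Evaluate returns the observed error rate, the latency at s.Percentile
// (nearest-rank method), and whether both are within the objective.
func (s SLO) Evaluate(reqs []Request) (errRate float64, pLatency time.Duration, ok bool) {
	if len(reqs) == 0 {
		return 0, 0, true // no traffic: nothing violated
	}
	lat := make([]time.Duration, 0, len(reqs))
	errs := 0
	for _, r := range reqs {
		if r.Err {
			errs++
		}
		lat = append(lat, r.Latency)
	}
	errRate = float64(errs) / float64(len(reqs))

	sort.Slice(lat, func(i, j int) bool { return lat[i] < lat[j] })
	// Nearest rank: the smallest sample that covers Percentile of all samples.
	idx := int(math.Ceil(s.Percentile*float64(len(lat)))) - 1
	if idx < 0 {
		idx = 0
	}
	if idx >= len(lat) {
		idx = len(lat) - 1
	}
	pLatency = lat[idx]

	ok = errRate <= s.MaxErrorRate && pLatency <= s.LatencyTarget
	return errRate, pLatency, ok
}

func main() {
	slo := SLO{MaxErrorRate: 0.01, Percentile: 0.99, LatencyTarget: 100 * time.Millisecond}
	var reqs []Request
	for i := 1; i <= 10; i++ {
		reqs = append(reqs, Request{Latency: time.Duration(i*10) * time.Millisecond})
	}
	reqs[9].Err = true // 1 failure in 10 requests
	errRate, p99, ok := slo.Evaluate(reqs)
	fmt.Printf("error rate=%.2f p99=%v meets SLO=%v\n", errRate, p99, ok)
}
```

Here the latency half of the objective holds, but a 10% error rate blows through a 1% ceiling, so the SLO as a whole is missed.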
[Diagram: service graph with Users, Analytics frontend server, Authentication, Reporting, ..., Spanner, and Blob Store]
Questions infra teams want to ask:
• Are we meeting the SLO for the other team?
• What's the impact of a product on infra?
• How much do we need to scale up if the product grows 10%?
High cardinality: breaking down the metrics data...
Query the collected data in various ways:
• Latency distribution for RPCs originated at Google Analytics.
• Requests that took more than 100ms for customer #123.
• Compare request latency initiated at the web vs. mobile frontend.
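Queries like these only work if the tags stay attached to each measurement instead of being aggregated away at record time. A minimal Go sketch, assuming an in-memory slice and hypothetical tag names (`Originator`, `Customer`, `Frontend`); a real backend would index and store this data very differently.

```go
package main

import (
	"fmt"
	"time"
)

// Measurement is one recorded RPC with its tags kept alongside the value, so
// high-cardinality tags such as a customer ID stay queryable after the fact.
// The tag names are illustrative assumptions.
type Measurement struct {
	Originator string // service where the request chain started, e.g. "analytics"
	Customer   int    // customer ID (high cardinality)
	Frontend   string // "web" or "mobile"
	Latency    time.Duration
}

// Filter returns the measurements for which pred is true.
func Filter(ms []Measurement, pred func(Measurement) bool) []Measurement {
	var out []Measurement
	for _, m := range ms {
		if pred(m) {
			out = append(out, m)
		}
	}
	return out
}

// MeanLatency averages the latencies of ms; it returns 0 for an empty slice.
func MeanLatency(ms []Measurement) time.Duration {
	if len(ms) == 0 {
		return 0
	}
	var sum time.Duration
	for _, m := range ms {
		sum += m.Latency
	}
	return sum / time.Duration(len(ms))
}

func main() {
	data := []Measurement{
		{Originator: "analytics", Customer: 123, Frontend: "web", Latency: 40 * time.Millisecond},
		{Originator: "analytics", Customer: 123, Frontend: "mobile", Latency: 150 * time.Millisecond},
		{Originator: "reporting", Customer: 456, Frontend: "web", Latency: 80 * time.Millisecond},
		{Originator: "analytics", Customer: 456, Frontend: "mobile", Latency: 120 * time.Millisecond},
	}

	// Latency distribution (here: the raw samples) for RPCs originated at Analytics.
	for _, m := range Filter(data, func(m Measurement) bool { return m.Originator == "analytics" }) {
		fmt.Println("analytics:", m.Latency)
	}

	// Requests that took more than 100ms for customer #123.
	slow := Filter(data, func(m Measurement) bool {
		return m.Customer == 123 && m.Latency > 100*time.Millisecond
	})
	fmt.Println("slow requests for customer 123:", len(slow))

	// Compare request latency initiated at the web vs. mobile frontend.
	web := Filter(data, func(m Measurement) bool { return m.Frontend == "web" })
	mobile := Filter(data, func(m Measurement) bool { return m.Frontend == "mobile" })
	fmt.Println("mean web:", MeanLatency(web), "mean mobile:", MeanLatency(mobile))
}
```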
[Diagram: the same service graph, with requests tagged originator=analytics; ...]
Blob store read errors by originator
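Once each read carries an originator tag, "errors by originator" reduces to a counter keyed by that tag. A minimal, concurrency-safe Go sketch; `ErrorCounter` and its methods are hypothetical names, and a real system would export these counts rather than print them.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// ErrorCounter counts read errors per originator tag. It is a hypothetical
// in-process counter; a real system would export these counts periodically.
type ErrorCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func NewErrorCounter() *ErrorCounter {
	return &ErrorCounter{counts: make(map[string]int)}
}

// Inc records one error attributed to originator. Safe for concurrent use.
func (c *ErrorCounter) Inc(originator string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[originator]++
}

// Count returns the number of errors recorded for originator.
func (c *ErrorCounter) Count(originator string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[originator]
}

// Originators returns every originator seen so far, sorted for stable output.
func (c *ErrorCounter) Originators() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make([]string, 0, len(c.counts))
	for o := range c.counts {
		out = append(out, o)
	}
	sort.Strings(out)
	return out
}

func main() {
	c := NewErrorCounter()
	var wg sync.WaitGroup
	// Simulated failed blob store reads, tagged with the request's originator.
	for _, o := range []string{"analytics", "analytics", "reporting", "analytics"} {
		wg.Add(1)
		go func(o string) {
			defer wg.Done()
			c.Inc(o)
		}(o)
	}
	wg.Wait()
	for _, o := range c.Originators() {
		fmt.Printf("%s: %d read errors\n", o, c.Count(o))
	}
}
```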
Dynamically choose the aggregation (recording is decoupled from aggregation)
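One way to read "split between recording and aggregation": instrumented code only records raw values, and the aggregation (count, sum, distribution) is picked separately, even after the fact. The Go sketch below keeps raw values in memory to make the split obvious; all type names are hypothetical and this is not a description of any specific library.

```go
package main

import (
	"fmt"
	"sort"
)

// Aggregation summarizes raw values. Which aggregation to apply is chosen
// separately from the code that records values.
type Aggregation interface {
	Aggregate(values []float64) string
}

// Count reports how many values were recorded.
type Count struct{}

func (Count) Aggregate(v []float64) string { return fmt.Sprintf("count=%d", len(v)) }

// Sum adds up the recorded values.
type Sum struct{}

func (Sum) Aggregate(v []float64) string {
	total := 0.0
	for _, x := range v {
		total += x
	}
	return fmt.Sprintf("sum=%g", total)
}

// Distribution counts values into buckets with inclusive upper Bounds
// (sorted ascending); the final bucket holds values above the last bound.
type Distribution struct{ Bounds []float64 }

func (d Distribution) Buckets(v []float64) []int {
	counts := make([]int, len(d.Bounds)+1)
	for _, x := range v {
		counts[sort.SearchFloat64s(d.Bounds, x)]++ // index of first bound >= x
	}
	return counts
}

func (d Distribution) Aggregate(v []float64) string {
	return fmt.Sprintf("buckets=%v", d.Buckets(v))
}

// Recorder only stores raw values per measure; it knows nothing about aggregation.
type Recorder struct{ values map[string][]float64 }

func NewRecorder() *Recorder { return &Recorder{values: make(map[string][]float64)} }

func (r *Recorder) Record(measure string, v float64) {
	r.values[measure] = append(r.values[measure], v)
}

// View applies an aggregation picked at read time to one measure.
func (r *Recorder) View(measure string, agg Aggregation) string {
	return agg.Aggregate(r.values[measure])
}

func main() {
	r := NewRecorder()
	for _, ms := range []float64{12, 48, 95, 230} {
		r.Record("rpc_latency_ms", ms)
	}
	// The same recorded data, viewed through three different aggregations.
	fmt.Println(r.View("rpc_latency_ms", Count{}))
	fmt.Println(r.View("rpc_latency_ms", Sum{}))
	fmt.Println(r.View("rpc_latency_ms", Distribution{Bounds: []float64{50, 100}}))
}
```

Keeping every raw value trades memory for flexibility. A leaner variant would aggregate at record time against whichever aggregations are currently selected, while still keeping the instrumented code unaware of them.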
Exemplars
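An exemplar links an aggregated bucket back to one concrete request, typically via a trace ID, so an outlier in a latency histogram can be inspected individually. A minimal Go sketch; the last-write-wins policy, the bucket bounds, and the trace IDs are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Exemplar is one concrete sample kept alongside a histogram bucket.
type Exemplar struct {
	Value   float64
	TraceID string // hypothetical trace identifier
}

// Histogram buckets values by inclusive upper Bounds and keeps the most
// recent exemplar per bucket.
type Histogram struct {
	Bounds    []float64
	Counts    []int
	Exemplars []*Exemplar // nil where no sample has landed yet
}

func NewHistogram(bounds []float64) *Histogram {
	return &Histogram{
		Bounds:    bounds,
		Counts:    make([]int, len(bounds)+1),
		Exemplars: make([]*Exemplar, len(bounds)+1),
	}
}

// Observe counts v and stores it as its bucket's exemplar (last write wins).
func (h *Histogram) Observe(v float64, traceID string) {
	i := sort.SearchFloat64s(h.Bounds, v)
	h.Counts[i]++
	h.Exemplars[i] = &Exemplar{Value: v, TraceID: traceID}
}

func main() {
	h := NewHistogram([]float64{100, 500}) // latency bounds in ms
	h.Observe(42, "trace-a")
	h.Observe(87, "trace-b")
	h.Observe(1200, "trace-c")
	for i, c := range h.Counts {
		if e := h.Exemplars[i]; e != nil {
			fmt.Printf("bucket %d: count=%d exemplar=%s (%.0fms)\n", i, c, e.TraceID, e.Value)
		} else {
			fmt.Printf("bucket %d: count=%d\n", i, c)
		}
	}
}
```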
/rpcz and /statz
http://server:7777/debug/rpcz
Export? Monarch, Prometheus, and more.
import "cloud.google.com/go/pubsub"
Thank you! JBD, Google
[email protected]
@rakyll