RPC Metrics at Google
JBD
August 09, 2018
Transcript
RPC Metrics at Google JBD, Google (@rakyll)
gRPC Metrics at Google JBD, Google (@rakyll)
Request Metrics at Google JBD, Google (@rakyll)
@rakyll "100% is the wrong reliability target for basically everything."
-- Benjamin Treynor Sloss, VP of Engineering, Google
@rakyll "A service is available if users cannot tell that
there was an outage."
@rakyll SLOs: a principled way of saying what level of downtime is acceptable.
• Error rate
• Latency expectations
@rakyll [Diagram of services: Users, Analytics frontend server, Authentication, Reporting, ..., Spanner, Blob Store]
@rakyll Questions infra teams want to ask:
• Are we meeting the SLO for the other team?
• What’s the impact of a product on infra?
• How much do we need to scale up if product grows 10%?
@rakyll High-Cardinality: breaking down the metrics data...
@rakyll Query the collected data in various ways:
• Latency distribution for RPCs originated at Google Analytics.
• Requests that took more than 100ms for the customer #123.
• Compare the request latency initiated at web vs mobile frontend.
@rakyll [Diagram of services: Users, Analytics frontend server, Authentication, Reporting, ..., Spanner, Blob Store; requests carry the tag originator=analytics; ...]
@rakyll Blob store read errors by originator
@rakyll Dynamically choose aggregation (split between recording and aggregation)
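One way to picture the split between recording and aggregation: instrumented code only records tagged measurements, and the choice of grouping keys or histogram buckets is made separately, later. The sketch below keeps raw measurements in memory purely for illustration, which a production library would not do. `Recorder`, `CountBy`, and `Histogram` are hypothetical names, and the type is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Measurement is one recorded latency together with its tags.
type Measurement struct {
	Tags      map[string]string
	LatencyMs float64
}

// Recorder only stores what happened; it knows nothing about how the
// data will be aggregated.
type Recorder struct {
	data []Measurement
}

// Record is the only call instrumented code needs to make.
func (r *Recorder) Record(tags map[string]string, latencyMs float64) {
	r.data = append(r.data, Measurement{Tags: tags, LatencyMs: latencyMs})
}

// CountBy aggregates as a count, grouped by whichever tag keys the
// caller picks at query time.
func (r *Recorder) CountBy(keys ...string) map[string]int {
	out := make(map[string]int)
	for _, m := range r.data {
		parts := make([]string, len(keys))
		for i, k := range keys {
			parts[i] = k + "=" + m.Tags[k]
		}
		out[strings.Join(parts, ",")]++
	}
	return out
}

// Histogram aggregates the same data as a latency distribution. bounds
// must be sorted ascending; bucket i counts values in (bounds[i-1], bounds[i]],
// and the final bucket counts everything above the last bound.
func (r *Recorder) Histogram(bounds []float64) []int {
	counts := make([]int, len(bounds)+1)
	for _, m := range r.data {
		counts[sort.SearchFloat64s(bounds, m.LatencyMs)]++
	}
	return counts
}

func main() {
	r := &Recorder{}
	r.Record(map[string]string{"originator": "analytics", "method": "Read"}, 20)
	r.Record(map[string]string{"originator": "analytics", "method": "Read"}, 120)
	r.Record(map[string]string{"originator": "reporting", "method": "Read"}, 100)

	// Two different aggregations over the same recordings.
	fmt.Println(r.CountBy("originator"))
	fmt.Println(r.Histogram([]float64{50, 100}))
}
```

Because the aggregation is chosen after the fact, adding a new breakdown does not require touching the code that records.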
@rakyll Exemplars
@rakyll /rpcz and /statz
@rakyll http://server:7777/debug/rpcz
@rakyll Export? Monarch, Prometheus, and more.
@rakyll import "cloud.google.com/go/pubsub"
@rakyll +
Thank you! JBD, Google
[email protected]
@rakyll