Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
RPC Metrics at Google
Search
JBD
August 09, 2018
Programming
630
2
Share
RPC Metrics at Google
JBD
August 09, 2018
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.2k
eBPF in Microservices Observability
rakyll
1
1.8k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.7k
Are you ready for production?
rakyll
8
2.9k
Servers are doomed to fail
rakyll
3
1.6k
Serverless Containers
rakyll
1
290
Critical Path Analysis
rakyll
0
700
Monitoring and Debugging Containers
rakyll
2
1.1k
Other Decks in Programming
See All in Programming
Modding RubyKaigi for Myself
yui_knk
0
650
Inside Stream API
skrb
1
250
TypeSpec で繋ぐ複数プロダクトの型安全
maroon8021
1
260
1人1案件のプロダクトエンジニア時代に、"プロセス監督"としてチャレンジしたこと
non0113
0
340
関係性から理解する"同一性"の型用語たち
pvcresin
2
600
AI 時代のソフトウェア設計の学び方
masuda220
PRO
28
11k
New "Type" system on PicoRuby
pocke
1
290
要はバランスからの卒業 #yumemi_grow
kajitack
0
200
JavaDoc 再入門
nagise
0
190
TSKaigi2026-静的解析への投資がAI時代のコード品質を支える ── カスタムESLintルールの設計と運用
hayatokudou
6
1.3k
開発体験を左右するライブラリの API 設計 - GraphQL スキーマ構築ライブラリから考える #tskaigi
izumin5210
2
1.2k
Oxcを導入して開発体験が向上した話
yug1224
4
230
Featured
See All Featured
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
55k
Git: the NoSQL Database
bkeepers
PRO
432
67k
Marketing Yourself as an Engineer | Alaka | Gurzu
gurzu
0
210
Testing 201, or: Great Expectations
jmmastey
46
8.2k
Primal Persuasion: How to Engage the Brain for Learning That Lasts
tmiket
0
350
Jess Joyce - The Pitfalls of Following Frameworks
techseoconnect
PRO
1
160
Navigating the Design Leadership Dip - Product Design Week Design Leaders+ Conference 2024
apolaine
1
330
Paper Plane
katiecoart
PRO
1
50k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
287
14k
Lessons Learnt from Crawling 1000+ Websites
charlesmeaden
PRO
1
1.3k
Crafting Experiences
bethany
1
160
The World Runs on Bad Software
bkeepers
PRO
72
12k
Transcript
RPC Metrics at Google JBD, Google (@rakyll)
gRPC Metrics at Google JBD, Google (@rakyll)
Request Metrics at Google JBD, Google (@rakyll)
@rakyll "100% is the wrong reliability target for basically everything."
-- Benjamin Treynor Sloss, VP of Engineering, Google
@rakyll "A service is available if users cannot tell that
there was an outage."
@rakyll Principled way of saying what level of downtime is
acceptable. • Error rate • Latency expectations SLOs
@rakyll Analytics frontend server Authentication Reporting Users ... Spanner Blob
Store
@rakyll Questions infra teams want to ask: • Are we
meeting the SLO for the other team? • What’s the impact of a product on infra? • How much do we need to scale up if product grows 10%?
@rakyll High-Cardinality Breaking down the metrics data...
@rakyll Query the collected data in various ways: • Latency
distribution for RPCs originated at Google Analytics. • Requests take took more than 100ms for the customer #123. • Compare the request latency initiated at web vs mobile frontend.
@rakyll Analytics frontend server Authentication Reporting Users ... Spanner Blob
Store originator=analytics; ...
@rakyll Blob store read errors by originator
@rakyll Dynamically choose aggregation (split between recording and aggregation)
@rakyll Exemplars
@rakyll /rpz and /statz
@rakyll http://server:7777/debug/rpcz
@rakyll Export? Monarch, Prometheus, and more.
@rakyll import “cloud.google.com/go/pubsub”
@rakyll +
Thank you! JBD, Google
[email protected]
@rakyll