RPC Metrics at Google
JBD
August 09, 2018
Transcript
RPC Metrics at Google JBD, Google (@rakyll)
gRPC Metrics at Google JBD, Google (@rakyll)
Request Metrics at Google JBD, Google (@rakyll)
@rakyll "100% is the wrong reliability target for basically everything."
-- Benjamin Treynor Sloss, VP of Engineering, Google
@rakyll "A service is available if users cannot tell that there was an outage."
@rakyll SLOs: a principled way of saying what level of downtime is acceptable.
• Error rate
• Latency expectations
@rakyll [Diagram] Analytics frontend server Authentication Reporting Users ... Spanner Blob Store
@rakyll Questions infra teams want to ask:
• Are we meeting the SLO for the other team?
• What's the impact of a product on infra?
• How much do we need to scale up if the product grows 10%?
@rakyll High-Cardinality: breaking down the metrics data...
@rakyll Query the collected data in various ways:
• Latency distribution for RPCs that originated at Google Analytics.
• Requests that took more than 100ms for customer #123.
• Compare the latency of requests initiated at the web vs. mobile frontend.
@rakyll [Diagram] Analytics frontend server Authentication Reporting Users ... Spanner Blob Store originator=analytics; ...
@rakyll Blob store read errors by originator
@rakyll Dynamically choose aggregation (split between recording and aggregation)
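The split between recording and aggregation can be sketched as follows: instrumented code only records a tagged measurement, while the choice of how to aggregate (which tag to group by, count or sum) is made separately and can change at runtime. This is a simplified illustration, not the API of any particular library; `Registry`, `View`, `Aggregation`, and their methods are hypothetical names.

```go
package main

import "fmt"

// Aggregation names how a view folds measurements together.
type Aggregation int

const (
	Count Aggregation = iota // number of measurements per group
	Sum                      // sum of measured values per group
)

// View is one aggregation of the recorded data, grouped by a single tag key.
type View struct {
	TagKey string
	Agg    Aggregation
	Rows   map[string]float64 // tag value -> aggregated result
}

// Registry fans each recorded measurement out to the registered views.
type Registry struct {
	views []*View
}

// Register adds a view at runtime. It only sees measurements recorded
// after registration.
func (r *Registry) Register(tagKey string, agg Aggregation) *View {
	v := &View{TagKey: tagKey, Agg: agg, Rows: map[string]float64{}}
	r.views = append(r.views, v)
	return v
}

// Record is what instrumented code calls. It says nothing about aggregation.
// A measurement missing a view's tag key is grouped under the empty string.
func (r *Registry) Record(tags map[string]string, value float64) {
	for _, v := range r.views {
		group := tags[v.TagKey]
		switch v.Agg {
		case Count:
			v.Rows[group]++
		case Sum:
			v.Rows[group] += value
		}
	}
}

func main() {
	reg := &Registry{}
	errorsByOriginator := reg.Register("originator", Count)

	reg.Record(map[string]string{"originator": "analytics", "method": "Read"}, 12)
	reg.Record(map[string]string{"originator": "analytics", "method": "Write"}, 30)

	// Later, someone decides they also want bytes summed by method.
	bytesByMethod := reg.Register("method", Sum)
	reg.Record(map[string]string{"originator": "reporting", "method": "Read"}, 8)

	fmt.Println(errorsByOriginator.Rows)
	fmt.Println(bytesByMethod.Rows)
}
```

The design point is that the recording call site never changes when a new breakdown is needed; only the set of registered views does.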
@rakyll Exemplars
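An exemplar is a concrete sample, typically a trace ID, kept alongside an aggregated histogram bucket so that an interesting bucket can be linked back to one real request. The sketch below is an assumption about how this could be modeled, not the representation used in the talk; `Histogram`, `Bucket`, `Exemplar`, and the bucket bounds are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Exemplar is one concrete observation kept alongside an aggregated bucket.
type Exemplar struct {
	Value   float64
	TraceID string
}

// Bucket holds a count and the most recent exemplar that landed in it.
type Bucket struct {
	Count    int
	Exemplar *Exemplar
}

// Histogram has one bucket per upper bound plus a final overflow bucket.
type Histogram struct {
	Bounds  []float64 // sorted inclusive upper bounds
	Buckets []Bucket  // len(Bounds)+1 entries
}

// NewHistogram copies and sorts the bounds and allocates the buckets.
func NewHistogram(bounds ...float64) *Histogram {
	b := append([]float64(nil), bounds...)
	sort.Float64s(b)
	return &Histogram{Bounds: b, Buckets: make([]Bucket, len(b)+1)}
}

// Observe adds v to its bucket. If a trace ID is available, the bucket's
// exemplar is replaced with this observation.
func (h *Histogram) Observe(v float64, traceID string) {
	// Index of the first bound >= v, or len(Bounds) for the overflow bucket.
	i := sort.SearchFloat64s(h.Bounds, v)
	h.Buckets[i].Count++
	if traceID != "" {
		h.Buckets[i].Exemplar = &Exemplar{Value: v, TraceID: traceID}
	}
}

func main() {
	h := NewHistogram(10, 100) // latency buckets in milliseconds
	h.Observe(5, "trace-a")
	h.Observe(50, "trace-b")
	h.Observe(500, "")        // untraced request: counted, no exemplar
	h.Observe(700, "trace-c") // slow request with a trace to inspect

	for i, b := range h.Buckets {
		if b.Exemplar != nil {
			fmt.Printf("bucket %d: count=%d exemplar=%s (%.0fms)\n",
				i, b.Count, b.Exemplar.TraceID, b.Exemplar.Value)
		} else {
			fmt.Printf("bucket %d: count=%d\n", i, b.Count)
		}
	}
}
```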
@rakyll /rpcz and /statz
@rakyll http://server:7777/debug/rpcz
@rakyll Export? Monarch, Prometheus, and more.
@rakyll import "cloud.google.com/go/pubsub"
Thank you! JBD, Google
@rakyll