RPC Metrics at Google
JBD
August 09, 2018
Transcript
RPC Metrics at Google JBD, Google (@rakyll)
gRPC Metrics at Google JBD, Google (@rakyll)
Request Metrics at Google JBD, Google (@rakyll)
"100% is the wrong reliability target for basically everything." -- Benjamin Treynor Sloss, VP of Engineering, Google
"A service is available if users cannot tell that there was an outage."
SLOs: a principled way of saying what level of downtime is acceptable.
• Error rate
• Latency expectations
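The two SLO dimensions above can be checked mechanically. What follows is a minimal sketch in Go, not something shown in the talk: the `SLO` and `Window` types, the 0.1% error allowance, the 100ms threshold and the 1% slow-request allowance are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// SLO is a hypothetical objective with the two dimensions from the slide:
// an error rate bound and a latency expectation.
type SLO struct {
	MaxErrorsPerMillion int64         // failed requests allowed per million
	LatencyThreshold    time.Duration // slower requests count as "slow"
	MaxSlowPerMillion   int64         // slow requests allowed per million
}

// Window summarizes the requests seen in one measurement window.
type Window struct {
	Total, Errors, Slow int64
}

// Observe adds one finished request to the window.
func (w *Window) Observe(s SLO, latency time.Duration, failed bool) {
	w.Total++
	if failed {
		w.Errors++
	}
	if latency > s.LatencyThreshold {
		w.Slow++
	}
}

// Met reports whether the window stays within both allowances.
// Integer arithmetic avoids floating-point rounding at the boundary.
func (s SLO) Met(w Window) bool {
	errOK := w.Errors*1000000 <= s.MaxErrorsPerMillion*w.Total
	slowOK := w.Slow*1000000 <= s.MaxSlowPerMillion*w.Total
	return errOK && slowOK
}

// ErrorBudgetLeft returns how many more requests may fail in this window
// (negative once the budget is overspent).
func (s SLO) ErrorBudgetLeft(w Window) int64 {
	return s.MaxErrorsPerMillion*w.Total/1000000 - w.Errors
}

// exampleSLO uses assumed numbers: 99.9% success, 99% under 100ms.
var exampleSLO = SLO{
	MaxErrorsPerMillion: 1000,
	LatencyThreshold:    100 * time.Millisecond,
	MaxSlowPerMillion:   10000,
}

func main() {
	w := Window{Total: 1999998, Errors: 1500, Slow: 11999}
	w.Observe(exampleSLO, 20*time.Millisecond, false)
	w.Observe(exampleSLO, 250*time.Millisecond, false)
	fmt.Println("SLO met:", exampleSLO.Met(w))
	fmt.Println("failures still allowed:", exampleSLO.ErrorBudgetLeft(w))
}
```

Expressing the target as a budget of allowed failures, rather than as a percentage to maximize, matches the point of the quote above: the remaining budget is what tells a team how much downtime is still acceptable.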
[Diagram: Analytics frontend server, Authentication, Reporting, Users, ..., Spanner, Blob Store]
Questions infra teams want to ask:
• Are we meeting the SLO for the other team?
• What’s the impact of a product on infra?
• How much do we need to scale up if the product grows 10%?
High-Cardinality: breaking down the metrics data...
Query the collected data in various ways:
• Latency distribution for RPCs originating at Google Analytics.
• Requests that took more than 100ms for the customer #123.
• Compare the request latency initiated at web vs mobile frontend.
[Diagram: Analytics frontend server, Authentication, Reporting, Users, ..., Spanner, Blob Store; requests are tagged originator=analytics; ...]
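One way to picture the `originator=analytics; ...` label travelling from the frontend down to the blob store is to carry tags in the request context and serialize them on every hop. The sketch below is an assumption-laden illustration, not Google's mechanism: `WithTag`, `TagsFrom`, `Encode`, `Decode` and the three service functions are hypothetical names, and the "k=v; k=v" wire form is only modelled on the slide text.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"strings"
)

// tagsKey is the unexported context key under which tags travel.
type tagsKey struct{}

// WithTag returns a context carrying one more key/value pair.
// The map is copied so a child never mutates its parent's tags.
func WithTag(ctx context.Context, key, value string) context.Context {
	old := TagsFrom(ctx)
	tags := make(map[string]string, len(old)+1)
	for k, v := range old {
		tags[k] = v
	}
	tags[key] = value
	return context.WithValue(ctx, tagsKey{}, tags)
}

// TagsFrom returns the tags stored in ctx, or nil if there are none.
func TagsFrom(ctx context.Context) map[string]string {
	tags, _ := ctx.Value(tagsKey{}).(map[string]string)
	return tags
}

// Encode serializes tags as "k=v; k=v", sorted for a stable wire form.
func Encode(tags map[string]string) string {
	pairs := make([]string, 0, len(tags))
	for k, v := range tags {
		pairs = append(pairs, k+"="+v)
	}
	sort.Strings(pairs)
	return strings.Join(pairs, "; ")
}

// Decode parses a header produced by Encode back into the context.
func Decode(ctx context.Context, header string) context.Context {
	for _, pair := range strings.Split(header, ";") {
		kv := strings.SplitN(strings.TrimSpace(pair), "=", 2)
		if len(kv) == 2 {
			ctx = WithTag(ctx, kv[0], kv[1])
		}
	}
	return ctx
}

// blobStoreRead stands in for the storage backend: it reads the
// originator it would attach to its own metrics.
func blobStoreRead(header string) string {
	ctx := Decode(context.Background(), header)
	return TagsFrom(ctx)["originator"]
}

// reporting is a middle service that adds nothing and forwards the tags.
func reporting(header string) string {
	ctx := Decode(context.Background(), header)
	return blobStoreRead(Encode(TagsFrom(ctx)))
}

// analyticsFrontend tags the request once, at the edge.
func analyticsFrontend() string {
	ctx := WithTag(context.Background(), "originator", "analytics")
	return reporting(Encode(TagsFrom(ctx)))
}

func main() {
	fmt.Println("blob store saw originator:", analyticsFrontend())
}
```

The design point is that only the edge service sets the tag; every service in between forwards it untouched, so a backend several hops away can still attribute its load to the product that caused it.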
Blob store read errors by originator
Dynamically choose aggregation (split between recording and aggregation)
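The split between recording and aggregation can be sketched as follows: instrumented code only records tagged measurements, and the question of how to group them is decided later by a separate view. This is a minimal illustration under assumptions; `Measurement`, `View`, `Count`, `Sum`, the measure name `blobstore/read_errors` and the sample data are all hypothetical and do not come from the talk.

```go
package main

import "fmt"

// Measurement is one recorded data point with its tags, before any
// aggregation has been chosen.
type Measurement struct {
	Name  string
	Value float64
	Tags  map[string]string
}

// View is an aggregation picked independently of the recording site:
// which measure to read, which tag to group by, how to combine values.
type View struct {
	Measure string
	GroupBy string
	Combine func(acc, v float64) float64
}

// Count ignores the value and counts data points.
func Count(acc, _ float64) float64 { return acc + 1 }

// Sum adds up the recorded values.
func Sum(acc, v float64) float64 { return acc + v }

// Aggregate applies the view to the recorded measurements.
func (v View) Aggregate(ms []Measurement) map[string]float64 {
	out := make(map[string]float64)
	for _, m := range ms {
		if m.Name != v.Measure {
			continue
		}
		key := m.Tags[v.GroupBy]
		out[key] = v.Combine(out[key], m.Value)
	}
	return out
}

// sampleData is made-up input: three read errors and one latency point.
func sampleData() []Measurement {
	return []Measurement{
		{"blobstore/read_errors", 1, map[string]string{"originator": "analytics", "client": "web"}},
		{"blobstore/read_errors", 1, map[string]string{"originator": "analytics", "client": "mobile"}},
		{"blobstore/read_errors", 1, map[string]string{"originator": "reporting", "client": "web"}},
		{"blobstore/read_latency_ms", 12, map[string]string{"originator": "analytics", "client": "web"}},
	}
}

func main() {
	data := sampleData()
	byOriginator := View{"blobstore/read_errors", "originator", Count}
	byClient := View{"blobstore/read_errors", "client", Count}
	fmt.Println("errors by originator:", byOriginator.Aggregate(data))
	fmt.Println("errors by client:", byClient.Aggregate(data))
}
```

Because the recording side never names a grouping, a new breakdown (here, by client instead of by originator) needs only a new view and no change to the instrumented code.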
Exemplars
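The slide gives only the term, so the following is a sketch of the general idea as it is commonly understood: an exemplar is a concrete sample, such as a trace ID, kept alongside an aggregated bucket so that someone looking at a latency histogram can jump to one real request that landed there. The `Histogram` and `Exemplar` types, the bucket bounds and the trace IDs are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Exemplar is one concrete sample kept next to an aggregated bucket.
type Exemplar struct {
	Value   float64
	TraceID string
}

// Histogram counts values per bucket and remembers the most recent
// exemplar seen in each bucket.
type Histogram struct {
	Bounds    []float64   // ascending upper bounds (inclusive)
	Counts    []int64     // len(Bounds)+1; the last bucket is overflow
	Exemplars []*Exemplar // same length as Counts; nil if bucket is empty
}

// NewHistogram builds an empty histogram for the given upper bounds.
func NewHistogram(bounds ...float64) *Histogram {
	return &Histogram{
		Bounds:    bounds,
		Counts:    make([]int64, len(bounds)+1),
		Exemplars: make([]*Exemplar, len(bounds)+1),
	}
}

// Record adds a value and keeps its trace ID as the bucket's exemplar.
func (h *Histogram) Record(v float64, traceID string) {
	// Index of the first bound >= v, or len(Bounds) for overflow.
	i := sort.SearchFloat64s(h.Bounds, v)
	h.Counts[i]++
	h.Exemplars[i] = &Exemplar{Value: v, TraceID: traceID}
}

func main() {
	h := NewHistogram(10, 100, 1000) // latency buckets in ms (assumed)
	h.Record(5, "trace-a")
	h.Record(250, "trace-b")
	h.Record(300, "trace-c")
	h.Record(5000, "trace-d")
	for i, c := range h.Counts {
		if ex := h.Exemplars[i]; ex != nil {
			fmt.Printf("bucket %d: count=%d exemplar=%s (%.0fms)\n", i, c, ex.TraceID, ex.Value)
		}
	}
}
```

Keeping only the latest exemplar per bucket is one simple policy; a real system might sample differently.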
/rpcz and /statz
http://server:7777/debug/rpcz
Export? Monarch, Prometheus, and more.
import "cloud.google.com/go/pubsub"
+
Thank you! JBD, Google
@rakyll