
Handling High Cardinality in Observability

I gave this talk at the "Grafana and Friends" meetup in Bengaluru on 16th September.

Meetup link - https://www.meetup.com/grafana-and-friends-bengaluru/events/295400707/

Points discussed about handling high cardinality:

- Relabel
- Drop Labels
- Split Metrics
- Streaming Aggregations - https://docs.last9.io/docs/streaming-aggregations
- Scale-Out
- Moving high-cardinality metrics to a separate lake
- Better workflows, such as isolation and controls
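
Scale-Out is listed without further detail. One common way to scale out a metrics backend is to shard series across nodes by hashing each series' full label set, similar in spirit to Prometheus' `hashmod` relabel action. The sketch below assumes that approach; `shard_for` and the series are hypothetical illustrations, not part of Last9's product or Prometheus.

```python
import hashlib
from collections import Counter


def shard_for(labels: dict[str, str], shards: int) -> int:
    """Pick a shard for a series by hashing its full label set.

    Labels are sorted first, so the same series always lands on the same shard
    regardless of the order in which its labels were reported.
    """
    key = ";".join(f"{name}={value}" for name, value in sorted(labels.items()))
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shards


# Hypothetical series: 250 campaigns x 4 channels = 1000 series over 4 shards.
series = [
    {"__name__": "notifications_sent_total", "campaign": f"c{i}", "channel": ch}
    for i in range(250)
    for ch in ("sms", "email", "push", "whatsapp")
]
load = Counter(shard_for(s, shards=4) for s in series)
print(sorted(load.items()))  # (shard, series count) pairs; counts sum to 1000
```

Plain modulo hashing reshuffles most series whenever a shard is added; real systems often use consistent hashing to limit that movement, which this sketch does not attempt.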

Learn more about how we solve cardinality challenges in monitoring and observability systems at https://last9.io/levitate-tsdb/


Prathamesh Sonpatki

September 17, 2023

Transcript

  1. There is a company! Who haz customers. Customers haz campaigns.
     - Each campaign → sends notifications
     - Each campaign haz multiple destinations
     - Each campaign is deployed in a region
  2. 👔 Business Ask: What is the performance of each campaign in the ap-south-1 region for the whatsapp channel? 🤔
  3. 🙂 Customer Success Ask: What is the performance of all campaigns of customer Acme Inc? 🤔
  4. Differing questions from different personas
     - Business cares about tenancy, campaigns, device types, geos and SLAs.
     - Product cares about tenants and channels.
     - Application developers care about services, SLOs and performance.
     - Infrastructure engineers care about infrastructure provisioning per instance and spend per tenant/campaign/channel.
  5. Differing questions from different personas
     - Business
     - Product
     - Application
     - Infrastructure

     → Data increases!
  6. But who can answer all of these questions?
     - Business
     - Product
     - Application
     - Infrastructure

     → Monitoring Systems
  7. Why Metrics?
     - Aggregated
     - Cheaper
     - Can answer all the questions, from Infra to Product to Business
     - Real time
     - Monitoring instead of debugging
     - Trend analysis
     - High-level overview of your subsystems
  8. High Cardinality
     - Cardinality of a metric exceeding a safe limit
     - Exploding labels
     - Each metric has its own cardinality
  9. High Cardinality
     - Cardinality of a metric exceeding a safe limit
     - Exploding labels
     - Each metric has its own cardinality

     That's 69120 combinations!! Some call these Active TimeSeries: each one is sent at every reporting interval.
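
A metric's cardinality is the number of distinct label-value combinations it reports; the worst case is the product of the number of distinct values of each label. A minimal sketch, assuming hypothetical label names and counts (the slide does not say how its 69120 figure breaks down):

```python
from math import prod


def worst_case_cardinality(values_per_label: dict[str, int]) -> int:
    """Upper bound on a metric's series: every label value can pair with every other."""
    return prod(values_per_label.values())


def active_series(samples: list[dict[str, str]]) -> int:
    """Actual cardinality: distinct label sets reported in one reporting interval."""
    return len({frozenset(labels.items()) for labels in samples})


# Hypothetical label value counts for the campaign example (not the slide's breakdown).
labels = {"customer": 20, "campaign": 50, "channel": 4, "region": 6}
print(worst_case_cardinality(labels))  # 24000

interval = [
    {"campaign": "c1", "channel": "whatsapp", "region": "ap-south-1"},
    {"campaign": "c1", "channel": "sms", "region": "ap-south-1"},
    {"campaign": "c1", "channel": "whatsapp", "region": "ap-south-1"},  # same series again
]
print(active_series(interval))  # 2
```

Adding one more label multiplies the bound by that label's value count, which is why a single new label can make cardinality "explode".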
  10. Why is cardinality relevant?
      - Answers to questions from Business, Product, App and Infrastructure
      - More answers lead to more questions
      - Real-time information
      - Labels can pack insights
      - Aggregation
  11. Cost of High Cardinality
      - 💸 Money
      - 😥 Toil
      - 📈 Increased resources
      - 🔥 Burn
      - ❌ Lack of answers
  12. Cardinality needs to be handled
      - Legit growth
      - Cardinality spikes due to some incorrect change
  13. Cardinality needs to be handled, but where?
      [Diagram: phases Instrumentation, Ingestion, Storage and Querying, also labelled Alerting, Dashboards and SLOs. Handling options shown: Inflight Aggregation, Cardinality Limiters, Usage/Unused filters, Cardinality Isolation, Cardinality Lakes, Rollups, Scale Out, Retention, Data Tiering. Annotations: "Too Early & Too Involved", "Best phase to handle cardinality", "Too Late & Too expensive".]
  14. Handling High Cardinality: Relabel
      - Can be used in case of legit growth
      - During the instrumentation phase
      - Affects developers
      - Affects SREs
      - May 🔥 resources at the agent
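
Relabeling rewrites label values before they are stored, so a high-cardinality value can be collapsed into a coarser one while still answering the question. A minimal sketch of the shape of Prometheus' `replace` action, assuming a hypothetical `path` label that embeds campaign IDs; `relabel_replace` is illustrative only, and in Prometheus itself this is configured in YAML under `relabel_configs` or `metric_relabel_configs`:

```python
import re


def relabel_replace(labels, source_labels, regex, target_label, replacement, separator=";"):
    """Sketch of the shape of Prometheus' `replace` relabel action (not its real engine).

    The values of `source_labels` are joined with `separator`, and `regex` must match
    the whole joined string. On a match, `replacement` is expanded (Python group
    syntax here; Prometheus uses $1) and written to `target_label`.
    With no match, the labels are returned unchanged.
    """
    joined = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, joined)
    if match is None:
        return dict(labels)
    relabeled = dict(labels)
    relabeled[target_label] = match.expand(replacement)
    return relabeled


# Hypothetical metric: one series per campaign ID embedded in the URL path.
series = [{"__name__": "http_requests_total", "path": f"/campaigns/{i}/stats"} for i in range(1000)]
collapsed = [
    relabel_replace(s, ["path"], r"/campaigns/[0-9]+/(.*)", "path", r"/campaigns/:id/\1")
    for s in series
]
print(len({frozenset(s.items()) for s in collapsed}))  # 1
```

Every sample pays for a regex match, which is plausibly the agent-side resource cost the slide warns about.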
  15. Handling High Cardinality: Drop Labels
      - Can be used in case the growth is not legit
      - During the instrumentation phase
      - Affects developers
      - Affects SREs
      - Removes the ability to get answers 😨
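
Dropping a label removes it from every series, so all series that differed only by that label collapse together. A minimal sketch modelled on Prometheus' `labeldrop` action, assuming hypothetical campaign and channel labels; `labeldrop` and `cardinality` here are illustrative helpers:

```python
import re


def labeldrop(labels: dict[str, str], regex: str) -> dict[str, str]:
    """Sketch of Prometheus' `labeldrop`: remove labels whose *name* fully matches `regex`."""
    pattern = re.compile(regex)
    return {name: value for name, value in labels.items() if not pattern.fullmatch(name)}


def cardinality(series: list[dict[str, str]]) -> int:
    """Number of distinct label sets."""
    return len({frozenset(s.items()) for s in series})


# Hypothetical: 100 campaigns x 3 channels.
series = [
    {"__name__": "notifications_sent_total", "campaign": f"c{i}", "channel": ch}
    for i in range(100)
    for ch in ("sms", "email", "whatsapp")
]
dropped = [labeldrop(s, "campaign") for s in series]
print(cardinality(series), cardinality(dropped))  # 300 3
```

The per-campaign question can no longer be answered, and the 100 series that now share each label set are not summed for you, so this fits labels that carry no answers worth keeping.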
  16. Handling High Cardinality: Split Metrics
      - Can be used in case the growth is legit
      - During the instrumentation phase
      - Affects developers
      - Affects SREs
      - Affects queries and dashboards 😥
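
The talk does not spell out a splitting scheme. One common reading is to replace a single metric that carries every label with several metrics that each carry a subset, which turns a product of label counts into a sum. A minimal sketch under that assumption, with hypothetical metric names and value counts:

```python
from math import prod


def series_count(metrics: dict[str, tuple[str, ...]], values: dict[str, int]) -> int:
    """Worst-case total series: per metric, the product of its labels' distinct value counts."""
    return sum(prod(values[label] for label in labels) for labels in metrics.values())


values = {"campaign": 500, "region": 6, "channel": 4}  # hypothetical counts

one_metric = {"notifications_sent_total": ("campaign", "region", "channel")}
split = {
    "campaign_notifications_sent_total": ("campaign",),
    "region_channel_notifications_sent_total": ("region", "channel"),
}
print(series_count(one_metric, values))  # 12000
print(series_count(split, values))       # 524
```

The catch matches the slide: no single metric can now answer a campaign-by-channel question, so queries and dashboards have to change.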
  17. Controls and Workflows to handle High Cardinality
      [Diagram: phases Instrumentation, Ingestion, Storage and Querying, also labelled Alerting, Dashboards and SLOs. Handling options shown: Inflight Aggregation, Cardinality Limiters, Usage/Unused filters, Cardinality Isolation, Cardinality Lakes, Rollups, Scale Out, Retention, Data Tiering. Annotations: "Too Early & Too Involved", "Best phase to handle cardinality", "Too Late & Too expensive".]
  18. Handling High Cardinality: Streaming Aggregation
      - Can be used in case the growth is legit
      - Before the ingestion phase
      - No performance penalty
      - Native PromQL support is a bonus
      - Timestamp awareness is a big plus
      - Affects queries and dashboards
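
Streaming aggregation combines samples on the way in, so only the aggregated series are stored. The sketch below is a hypothetical illustration, not Last9's implementation (the streaming-aggregations docs linked in the description cover theirs). It assumes each sample value is a per-interval delta so that summing is meaningful, and it windows on the sample's own timestamp to show what timestamp awareness buys:

```python
from collections import defaultdict


def stream_aggregate(samples, keep, window_s=60):
    """Sum samples by the `keep` labels within fixed time windows, before storage.

    Each sample is (timestamp_s, labels, value), with `value` assumed to be a
    per-interval delta (e.g. notifications sent since the last report). Windows are
    keyed on the sample's own timestamp, not on arrival order, so a late sample
    still lands in the window it belongs to. Labels not in `keep` are dropped.
    """
    totals = defaultdict(float)
    for ts, labels, value in samples:
        window_start = ts - ts % window_s
        kept = tuple((name, labels[name]) for name in sorted(keep) if name in labels)
        totals[(window_start, kept)] += value
    return dict(totals)


base = {"campaign": "c1", "region": "ap-south-1"}
samples = [
    (0, {**base, "instance": "a"}, 5),
    (10, {**base, "instance": "b"}, 3),
    (65, {**base, "instance": "a"}, 2),
    (20, {**base, "instance": "c"}, 1),  # arrives late, belongs to the first window
]
result = stream_aggregate(samples, keep=["campaign", "region"])
key = (("campaign", "c1"), ("region", "ap-south-1"))
print(result[(0, key)], result[(60, key)])  # 9.0 2.0
```

The per-instance series never reach storage; only one series per campaign and region does, which is also why queries and dashboards must target the aggregated labels.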
  19. Handling High Cardinality: Isolation
      - Can be used in case the growth is legit
      - On demand
      - No performance penalty
      - Keeps everything else going
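
The slide describes isolation only by its effects, and the cardinality diagram lists Cardinality Limiters and Cardinality Isolation as options. One way to get "keeps everything else going" is a per-metric cap on distinct series. `CardinalityLimiter` is a hypothetical sketch, not a description of how Levitate isolates metrics:

```python
from collections import defaultdict
from typing import Optional


class CardinalityLimiter:
    """Hypothetical per-metric cap on distinct series.

    Series already admitted keep flowing. Once a metric reaches its limit, *new*
    series for that metric are rejected, so one exploding metric cannot starve others.
    """

    def __init__(self, default_limit: int, overrides: Optional[dict[str, int]] = None):
        self.default_limit = default_limit
        self.overrides = overrides or {}
        self.seen: dict[str, set] = defaultdict(set)

    def admit(self, labels: dict[str, str]) -> bool:
        metric = labels["__name__"]
        series_id = frozenset(labels.items())
        known = self.seen[metric]
        if series_id in known:
            return True
        if len(known) >= self.overrides.get(metric, self.default_limit):
            return False
        known.add(series_id)
        return True


limiter = CardinalityLimiter(default_limit=2)
print([limiter.admit({"__name__": "campaign_sent_total", "campaign": f"c{i}"}) for i in range(4)])
# [True, True, False, False]
print(limiter.admit({"__name__": "campaign_sent_total", "campaign": "c0"}))  # True, already known
print(limiter.admit({"__name__": "http_requests_total", "path": "/"}))       # True, other metric unaffected
```

Per-metric `overrides` are what would make this "on demand": raising one metric's limit for legit growth without touching the rest.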
  20. Handling High Cardinality: Separate Lakes
      - Can be used in case the growth is legit
      - On demand
      - No performance penalty
      - Moves high-cardinality metrics to new storage
      - No change in queries, dashboards or ingestion
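
"No change in queries, dashboards or ingestion" implies a routing layer that hides where a metric lives. A minimal in-memory sketch of that idea; `LakeRouter` and its two dict-based "lakes" are hypothetical stand-ins for real storage backends:

```python
class LakeRouter:
    """Hypothetical router that keeps selected metrics in a separate store (lake).

    Writers and queriers only use metric names; the router decides which lake holds
    a metric, so moving a metric does not change how it is written or queried.
    """

    def __init__(self) -> None:
        self.high_cardinality: set[str] = set()
        self.default_lake: dict[frozenset, list] = {}
        self.hc_lake: dict[frozenset, list] = {}

    def _lake_for(self, metric: str) -> dict:
        return self.hc_lake if metric in self.high_cardinality else self.default_lake

    def write(self, labels: dict[str, str], ts: float, value: float) -> None:
        lake = self._lake_for(labels["__name__"])
        lake.setdefault(frozenset(labels.items()), []).append((ts, value))

    def query(self, metric: str) -> dict:
        lake = self._lake_for(metric)
        return {key: points for key, points in lake.items() if ("__name__", metric) in key}

    def move(self, metric: str) -> None:
        """Mark `metric` as high cardinality and migrate its existing series."""
        self.high_cardinality.add(metric)
        for key in [k for k in self.default_lake if ("__name__", metric) in k]:
            self.hc_lake[key] = self.default_lake.pop(key)


router = LakeRouter()
router.write({"__name__": "campaign_sent_total", "campaign": "c1"}, 0, 1.0)
router.write({"__name__": "http_requests_total", "path": "/"}, 0, 2.0)
before = router.query("campaign_sent_total")
router.move("campaign_sent_total")
print(router.query("campaign_sent_total") == before)  # True
print(len(router.default_lake), len(router.hc_lake))   # 1 1
```

The query call is identical before and after the move; only the router's internal mapping changes.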
  21. Handling High Cardinality
      - Relabel
      - Drop Labels
      - Split Metrics
      - Streaming Aggregations
      - Scale Out
      - Moving high-cardinality metrics to a separate lake
      - Better workflows, such as isolation and controls
  22. Handling High Cardinality
      - Everything is a tradeoff.
      - Having control over which options to choose is better.
      - Handling legit growth is necessary.
      - Cost trumps all. No free lunch.
      - It is possible to answer questions from product, business, app and infra teams using metrics monitoring systems.
  23. Once cardinality is not a problem, once knowledge is not a problem, what would you do with it?
  24. But who can answer all of these questions?
      - Business
      - Product
      - Application
      - Infrastructure

      → Monitoring Systems