Practical Monitoring for Knative Serving / KubeCon + CloudNativeCon Japan 2025

Practical Monitoring for Knative Serving
Kazuki Higashiguchi, Autify

Who am I?
Kazuki Higashiguchi
• Senior Site Reliability Engineer @ Autify
• Autify: AI Platform for Software Quality Assurance
• End-user of Knative Serving for our ML workloads in production
• /in/hgsgtk

Why Monitoring Knative Matters
✅ Knative’s magic:
• Serverless autoscaling, out of the box
🚨 But in production…
• Is autoscaling fast enough?
• How many warm pods?
• Enough nodes for scale?
Gateway errors hurt user experience.
Practical monitoring watches from the user’s perspective, correlates with scaling metrics, and alerts early. But there’s little community guidance on this, so we’re sharing our lessons learned!

Knative Serving Architecture
Request-driven pod scaling: spins up pods on demand in response to incoming requests.
Key Components:
• Activator
• Autoscaler
• Controller
• Webhook
• Ingress gateway - Istio, Kourier
Knative supports Prometheus and OpenTelemetry Collector for collecting metrics.

Monitor from the user’s perspective
User experience: API-level monitors
Key component - Activator:
• queues incoming requests and forwards them
• triggers the autoscaler to bring scaled-to-zero services back online
Key Metrics:
• request_count - The number of requests
• request_latencies - The response time in milliseconds for successfully routed requests

Alerting on Bad Gateway Errors in Knative
502 errors often happen when scaling can’t keep up with traffic. Activator metrics detect issues early.
⚠ App-level error reporting won’t detect these!
💡 Optionally monitor all 5xx errors.
Alert threshold: activator_request_count{response_code="502"}
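A minimal sketch of how a threshold on the 502 counter could be evaluated, assuming activator_request_count is a cumulative counter scraped at regular intervals. The names CounterSample, per_second_rate, and should_alert_on_502 are hypothetical, and the threshold is illustrative only; the slides do not give a concrete number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """One scrape of a cumulative counter, e.g. the 502 series of activator_request_count."""
    timestamp_s: float
    value: float


def per_second_rate(earlier: CounterSample, later: CounterSample) -> float:
    """Average per-second increase of the counter between two scrapes."""
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = later.value - earlier.value
    if delta < 0:
        # The counter went backwards, so assume it was reset (for example
        # after a restart) and count only what accumulated since the reset.
        delta = later.value
    return delta / elapsed


def should_alert_on_502(earlier: CounterSample, later: CounterSample,
                        max_502_per_second: float) -> bool:
    """True when the 502 rate exceeds the (assumed) alert threshold."""
    return per_second_rate(earlier, later) > max_502_per_second
```

For example, 30 new 502 responses over 60 seconds is a rate of 0.5 per second, which would fire with a threshold of 0.1 but not with a threshold of 1.0. The same function works for all 5xx errors if the samples come from the summed 5xx series.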

API-level Dashboard
• Request volume (request_count)
  ◦ Total
  ◦ By Service
  ◦ By Response Code
• Success Rate (request_count)
  ◦ Non-5xx / total
• Response Time (request_latencies)
  ◦ By Service
  ◦ By Response Code
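The success-rate panel (non-5xx / total) can be sketched as a small calculation over request counts grouped by response code. The function name success_rate and the input shape are assumptions for illustration; in practice the counts would come from request_count broken down by its response code label.

```python
from typing import Mapping, Optional


def success_rate(requests_by_code: Mapping[int, float]) -> Optional[float]:
    """Share of requests whose response code is not 5xx.

    Returns None when there was no traffic, so a dashboard can show
    "no data" instead of a misleading 0% or 100%.
    """
    total = sum(requests_by_code.values())
    if total == 0:
        return None
    non_5xx = sum(
        count for code, count in requests_by_code.items()
        if not 500 <= code <= 599
    )
    return non_5xx / total
```

Note that 4xx responses count as successes here, matching the "non-5xx / total" definition on the slide.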

Monitoring Autoscaler Efficiency
Key Component - Autoscaler:
• scales Knative services based on configuration, metrics, and incoming requests
Key Metrics:
• pending_pods – Pods currently pending
• requested_pods, actual_pods, not_ready_pods, terminating_pods – Pods in various lifecycle states
• excess_burst_capacity – Overserved burst capacity (buffer for scale)

Alerting on Potential Scaling Issues
⚠ High pending ratio may indicate cluster capacity issues, e.g., insufficient allocatable nodes.
Alert threshold: autoscaler_pending_pods / “total_pods”
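The pending ratio can be sketched as below. The slide leaves “total_pods” in quotes, which suggests a derived value rather than a single metric, so this sketch takes it as an input and makes no assumption about how it is computed. The function names and the 0.2 default threshold are hypothetical.

```python
def pending_ratio(pending_pods: float, total_pods: float) -> float:
    """Fraction of pods that are pending; 0.0 when there are no pods at all."""
    if total_pods <= 0:
        # A service scaled to zero has nothing pending, so report 0.0
        # rather than dividing by zero.
        return 0.0
    return pending_pods / total_pods


def capacity_alert(pending_pods: float, total_pods: float,
                   max_ratio: float = 0.2) -> bool:
    """True when the pending ratio exceeds the (assumed) threshold."""
    return pending_ratio(pending_pods, total_pods) > max_ratio
```

With 3 pending pods out of 10, the ratio is 0.3 and the alert fires at the illustrative 0.2 threshold. A ratio that stays high over several minutes is more telling than a brief spike during a normal scale-up, so a real alert would likely also require a sustained duration.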

Scaler-level Dashboard
• Pod Counts (*_pods)
  ◦ requested | actual | not_ready | pending | terminating
• Concurrency
  ◦ Requested (activator.request_concurrency)
  ◦ Observed (excess_burst_capacity)

Thank you!
Kazuki Higashiguchi, Autify

References
• Architecture
  ◦ Knative Serving Architecture - https://knative.dev/docs/serving/architecture/
  ◦ Knative Serving Autoscaling System - https://github.com/knative/serving/blob/main/docs/scaling/SYSTEM.md
• Observability
  ◦ Collecting metrics in Knative - https://knative.dev/docs/serving/observability/metrics/collecting-metrics/#import-grafana-dashboards
  ◦ Knative Serving metrics - https://knative.dev/docs/serving/observability/metrics/serving-metrics/
  ◦ Grafana Dashboards - https://github.com/knative-extensions/monitoring/tree/main/grafana
