On the path to full Observability with OSS (and launch of Loki)
KubeCon 2018 presentation on how to instrument an app with Prometheus and Jaeger, how to debug an app, and Grafana's new log aggregation solution: Loki.
running Prometheus on Kubernetes and set of configs for all exporters you need to get Kubernetes metrics
• https://github.com/grafana/jsonnet-libs/tree/master/prometheus-ksonnet
  Our configs for running Prometheus, Alertmanager, and Grafana together
• https://github.com/kubernetes-monitoring/kubernetes-mixin
  Joint project to unify and improve common alerts for Kubernetes
because the p99 latency shot up from <10ms to >700ms
• The RED method dashboard is an ideal entry point to see the health of the system
• Notice also the DB error rates, luckily not bubbling up to the user
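The three RED signals (request Rate, Errors, Duration) can be illustrated with a small stand-alone calculation. This is only a sketch: in the talk these numbers come from Prometheus queries rendered in a dashboard, and the `Request` record, the `red_summary` helper, and the "status >= 500 counts as an error" rule are assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one handled request."""
    duration_ms: float
    status: int


def percentile(values, q):
    """Nearest-rank percentile, with q given as a whole number in 1..100."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def red_summary(requests, window_seconds):
    """Rate, error ratio and p99 duration over one observation window."""
    if not requests:
        raise ValueError("no requests in window")
    errors = sum(1 for r in requests if r.status >= 500)  # assumed error rule
    return {
        "rate_per_s": len(requests) / window_seconds,
        "error_ratio": errors / len(requests),
        "p99_ms": percentile([r.duration_ms for r in requests], 99),
    }


if __name__ == "__main__":
    # 98 fast requests and 2 slow failures within a 10 second window.
    sample = [Request(5.0, 200)] * 98 + [Request(750.0, 500)] * 2
    print(red_summary(sample, window_seconds=10))
```

Even with only 2% of requests affected, the p99 in this sample jumps to the slow value, which is why a latency percentile panel makes such a regression visible quickly.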
using Jaeger
• App is spending lots of time even though the DB request returned quickly
• Root cause: backoff period was too high
• Idea for fix: lower backoff period
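To make the backoff finding concrete, here is a minimal sketch of a retry loop with a capped exponential backoff. It is not the demo app's code: the function names, the default delays, and the choice of `ConnectionError` as the retryable error are all assumptions. The point is that the base delay and the cap are explicit parameters, so a wait that is "too high" is easy to spot and lower.

```python
import time


def backoff_delays(base_s, factor, max_s, retries):
    """Yield the wait before each retry: base, base*factor, ... capped at max_s."""
    delay = base_s
    for _ in range(retries):
        yield min(delay, max_s)
        delay *= factor


def call_with_retry(fn, base_s=0.05, factor=2.0, max_s=0.5, retries=4,
                    sleep=time.sleep):
    """Call fn, retrying on ConnectionError with capped exponential backoff.

    The sleep function is injectable so the waits can be observed in tests.
    """
    delays = backoff_delays(base_s, factor, max_s, retries)
    while True:
        try:
            return fn()
        except ConnectionError:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the last error
            sleep(delay)
```

With these assumed defaults the worst case adds 0.05 + 0.1 + 0.2 + 0.4 = 0.75 seconds of waiting, which is the kind of figure to compare against the idle gaps a trace shows between spans.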
using the graph legend
• Ad-hoc stats on structured log fields
• Root cause found: “Too many open connections”
• Idea for fix: more DB replicas, or connection pooling
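The connection pooling idea can be sketched as a fixed-size pool that hands out already-open connections instead of opening a new one per request. This is an illustration under assumptions, not the fix applied in the talk: `ConnectionPool` and the `connect` factory are hypothetical names, and a real application would more likely rely on the pooling built into its database driver.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: at most `size` connections are ever opened."""

    def __init__(self, connect, size):
        # `connect` is a hypothetical zero-argument factory returning a connection.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        """Borrow a connection, blocking until one is free.

        Raises queue.Empty if `timeout` seconds pass with none available.
        """
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back
```

Callers wrap each query in `with pool.connection() as conn:`. Because the pool size bounds the number of open connections, a burst of requests waits for a free connection rather than pushing the database past its connection limit.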
• Prometheus-style stream selector
• Regexp filtering by the backend
• Simple UI:
  ◦ no paging
  ◦ return and render 1000 rows by default
  ◦ Use the power of Cmd+F
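The query model on this slide (select streams by label, filter lines by regexp, return a capped number of rows) can be sketched in a few lines. This is a toy in-memory model and not Loki's implementation or API: `parse_selector`, `stream_matches`, and `query` are hypothetical helpers, escaped quotes in label values are not handled, and the anchoring rules (label regexps fully anchored as in Prometheus, line filter unanchored) are assumptions of the sketch.

```python
import re

# One matcher such as job="app" or level=~"err.*"
MATCHER = re.compile(r'\s*([A-Za-z_][A-Za-z0-9_]*)\s*(=~|!~|!=|=)\s*"([^"]*)"\s*')


def parse_selector(selector):
    """Parse '{name="value", other=~"re"}' into (name, op, value) tuples."""
    text = selector.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("selector must be wrapped in braces")
    body, pos, matchers = text[1:-1], 0, []
    while pos < len(body):
        m = MATCHER.match(body, pos)
        if m is None:
            raise ValueError(f"bad matcher at offset {pos}")
        matchers.append(m.groups())
        pos = m.end()
        if pos < len(body):
            if body[pos] != ",":
                raise ValueError(f"expected ',' at offset {pos}")
            pos += 1
    return matchers


def stream_matches(labels, matchers):
    """True if a stream's label set satisfies every matcher."""
    for name, op, value in matchers:
        actual = labels.get(name, "")
        if op in ("=", "!="):
            hit = actual == value
        else:
            hit = re.fullmatch(value, actual) is not None
        if hit == op.startswith("!"):  # negated ops want a miss, others a hit
            return False
    return True


def query(streams, selector, line_regex=None, limit=1000):
    """Return up to `limit` lines from matching streams, optionally filtered."""
    matchers = parse_selector(selector)
    pattern = re.compile(line_regex) if line_regex else None
    rows = []
    for labels, lines in streams:
        if not stream_matches(labels, matchers):
            continue
        for line in lines:
            if pattern is None or pattern.search(line):
                rows.append(line)
                if len(rows) >= limit:
                    return rows
    return rows
```

For example, `query(streams, '{job="app"}', "error")` would return the lines containing "error" from streams labelled `job="app"`, where `streams` is a list of `(labels_dict, lines)` pairs.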
In-browser line parsing support for JSON and logfmt
• Ad-hoc stats across returned results (up to 1000 rows by default)
• Coming soon: ad-hoc graphs based on parsed numbers
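The parsing and ad-hoc stats described above can be approximated as follows. Grafana does this in the browser; the Python below is only a sketch of the idea, with `parse_line` and `field_stats` as hypothetical helpers and a simplified logfmt grammar (key=value pairs, optional double quotes around values).

```python
import json
import re
from collections import Counter

# key=value or key="quoted value" (simplified logfmt)
LOGFMT_PAIR = re.compile(r'([^\s=]+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_line(line):
    """Parse one log line as JSON if it is a JSON object, else as logfmt."""
    text = line.strip()
    if text.startswith("{"):
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            # Normalise values to strings so they can be counted.
            return {k: v if isinstance(v, str) else json.dumps(v)
                    for k, v in obj.items()}
    fields = {}
    for key, raw in LOGFMT_PAIR.findall(text):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def field_stats(lines, field, limit=1000):
    """Count the values of `field` across the first `limit` returned lines."""
    counts = Counter()
    for line in lines[:limit]:
        parsed = parse_line(line)
        if field in parsed:
            counts[parsed[field]] += 1
    return counts.most_common()
```

Calling `field_stats(lines, "level")` on a page of results gives a value distribution for the `level` field, the sort of summary that helps surface a message like “Too many open connections” as the dominant error.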
To enable, edit the Grafana config.ini file:
  [explore]
  enabled = true
Explore will be released in Grafana v6.0 (Feb 2019)
Loki can be used today
Feedback welcome: @davkals or [email protected]