

Prometheus as exposition format for eBPF programs running on Kubernetes

Nowadays almost every application exposes its metrics via an HTTP endpoint readable by Prometheus. Recently the exposition format was included in the OpenMetrics standard of the CNCF. Nevertheless, this very common pattern by definition only exposes metrics about the specific application being observed.

This talk presents the idea, and a reference implementation, of a slightly different use case: using eBPF programs as a source of information to expose and collect kernel and application probes via a Prometheus endpoint.

Leonardo Di Donato

September 19, 2019

Transcript

  1. Prometheus as exposition format for eBPF programs running on Kubernetes

    Leonardo Di Donato. Open Source Software Engineer @ Sysdig. 2019.09.19 - DevOpsDays - Istanbul, Turkey
  2. whoami Leonardo Di Donato. Maintainer of Falco. Creator of kubectl-trace,

    kube-bpf, kubectl-dig, and go-syslog. Reach out to me @leodido on Twitter & GitHub.
  3. Deal with just a few instead of thousands of them.

    Aggregate events at kernel level @leodido
  4. What if we could exploit Prometheus (or OpenMetrics) awesomeness without

    having to individually instrument every application we want to monitor? Can we avoid clogging our applications, thanks to eBPF superpowers? eBPF superpowers @leodido
  5. What eBPF is You can now write mini programs that

    run on events like disk I/O, executed in a safe register-based VM using a custom 64-bit RISC instruction set in the kernel. The in-kernel verifier refuses to load eBPF programs with invalid pointer dereferences, calls exceeding the maximum stack, or loops without an upper bound. It imposes a stable Application Binary Interface (ABI). Even more amazing than BPF 🚀 A core part of the Linux kernel. @leodido extended because it’s not just packets anymore
  6. load compile @leodido BPF_MAP_CREATE BPF_MAP_LOOKUP_ELEM BPF_MAP_UPDATE_ELEM BPF_MAP_DELETE_ELEM BPF_MAP_GET_NEXT_KEY http://bit.ly/bpf_map_types 📎

    BPF_PROG_TYPE_SOCKET_FILTER BPF_PROG_TYPE_KPROBE BPF_PROG_TYPE_TRACEPOINT BPF_PROG_TYPE_RAW_TRACEPOINT BPF_PROG_TYPE_XDP BPF_PROG_TYPE_PERF_EVENT BPF_PROG_TYPE_CGROUP_SKB BPF_PROG_TYPE_CGROUP_SOCK BPF_PROG_TYPE_SOCK_OPS BPF_PROG_TYPE_SK_SKB BPF_PROG_TYPE_SK_MSG BPF_PROG_TYPE_SCHED_CLS BPF_PROG_TYPE_SCHED_ACT 📎 http://bit.ly/bpf_prog_types man 2 bpf man 8 tc-bpf How does eBPF work? user-space kernel BPF source BPF ELF bpf() verifier BPF Maps Maps data kprobe uprobe static tracepoint perf events XDP socket filter
  7. Why use eBPF at all to trace userspace processes? Advantages: • fully programmable • event driven • can trace everything

    in a system • not limited to a specific application • unified tracing interface for both kernel and userspace • {k,u}probes, (dtrace) tracepoints and so on are also used by other tools • minimal (negligible) performance impact • attaches JIT-compiled native instruction code • no long suspensions of execution Disadvantages: • requires a fairly recent kernel • definitely not for debugging • no knowledge of the calling higher-level language implementation • not fully running in user space • kernel-user context switch (usually negligible) when eBPF instruments a user process • still not as portable as other tracers • the VM is primarily developed in the Linux kernel (work-in-progress ports elsewhere, btw)
  8. @leodido Count packets by protocol Count sys_enter_write by process ID

    macro to generate sections inside the object file (later interpreted by the ELF BPF loader)
  9. Just use a sidecar container • A sidecar container sharing

    the process namespace • An image with the eBPF loader + eBPF program in it • Not a very generic approach, but it does the job! 🤔 @leodido
  10. github.com/bpftools/kube-bpf 🔗 What about loading any eBPF program from its ELF

    using a Kubernetes CRD? 🤯 Grab metrics via eBPF and expose them through a Prometheus endpoint. Something more generic? @leodido
  11. 📎 http://bit.ly/k8s_crd An extension of the K8s API that lets

    you store and retrieve structured data. Custom resources 📎 http://bit.ly/k8s_shared_informers The actual control loop that watches the shared state using the workqueue. Shared informers 📎 http://bit.ly/k8s_custom_controllers A controller declares the desired state of your resource and continuously tries to match it with the actual state. Controllers Customize all the things
  12. @leodido BPF runner bpf() syscall eBPF program ... user-space kernel

    eBPF map eBPF program ... BPF runner bpf() syscall eBPF program ... user-space kernel eBPF map eBPF program BPF CRD Here’s the evil plan :9387/metrics :9387/metrics
  13. @leodido Compile and inspect This is important because it communicates to

    the loader the currently running kernel version! A tricky and controversial legal thing about licenses... The bpf_prog_load() wrapper also has a license parameter to provide the license that applies to the eBPF program being loaded. Not a GPL-compatible license? The kernel won’t load your eBPF! Exceptions apply... eBPF Maps
  14. @leodido ip-10-12-0-136.ec2.internal:9387/metrics

    # HELP test_packets No. of packets per protocol (key), node
    # TYPE test_packets counter
    test_packets{key="00001",node="127.0.0.1"} 8     # <- ICMP
    test_packets{key="00002",node="127.0.0.1"} 1     # <- IGMP
    test_packets{key="00006",node="127.0.0.1"} 551   # <- TCP
    test_packets{key="00008",node="127.0.0.1"} 1     # <- EGP
    test_packets{key="00017",node="127.0.0.1"} 15930 # <- UDP
    test_packets{key="00089",node="127.0.0.1"} 9     # <- OSPF
    test_packets{key="00233",node="127.0.0.1"} 1     # <- ?
    # EOF
    It is a WIP project but already open source! 🎺 Check the protocol numbers 🔗 Check it out @ gh:bpftools/kube-bpf 🔗
  15. @leodido ip-10-12-0-122.ec2.internal:9387/metrics

    # HELP test_dummy No. sys_enter_write calls per PID (key), node
    # TYPE test_dummy counter
    test_dummy{key="00001",node="127.0.0.1"} 8
    test_dummy{key="00295",node="127.0.0.1"} 1
    test_dummy{key="01278",node="127.0.0.1"} 1158
    test_dummy{key="04690",node="127.0.0.1"} 209
    test_dummy{key="04691",node="127.0.0.1"} 889
    # EOF
    It is a WIP project but already open source! 🎺 Check it out @ gh:bpftools/kube-bpf 🔗
  16. @leodido It is a WIP project but already open source!

    🎺 Contributions are welcome! 🎊 Check it out @ gh:bpftools/kube-bpf 🔗
  17. kubectl-trace More eBPF + Kubernetes? Runs a bpftrace program (from a file)

    Ctrl-C tells the program to plot the results using hist() The output histogram Maps
  18. @leodido Key takeaways • Prometheus exposition format is here to stay, given

    how simple it is 📊 • OpenMetrics will introduce improvements on such giant shoulders 📈 • We cannot monitor and observe everything from inside our applications 🎯 • We might want to have a look at the orchestrator (context) our apps live and die in 🕸 • Kubernetes can be extended to achieve such a level of integration 🔌 • ELF is cool 🧝 • We look for better tools (eBPF) to grab our metrics and even more 🔮 • Almost nullify the footprint ⚡ • Enable a wider range of available data 🌊 • Do not touch our applications directly 👻 • There is a PoC doing some magic at github.com/bpftools/kube-bpf 🧞
  19. Acronyms & Abbreviations In case you wonder:

    ABI: Application Binary Interface
    BPF: Berkeley Packet Filter
    CRD: Custom Resource Definition (Kubernetes)
    eBPF: extended Berkeley Packet Filter
    ELF: Executable and Linkable Format
    RISC: Reduced Instruction Set Computer
    VM: Virtual Machine
  20. Thanks. Reach out to me @leodido on Twitter & GitHub! SEE

    Y’ALL AROUND AT KUBECON Slides here.