⚫ eBPF is a revolutionary technology, but its internals are difficult to understand.
⚫ Using eBPF programs in container environments is especially complicated.
⚫ This session helps you operate eBPF programs in your container-based production system.
me
⚫ Kenta Tada
⚫ Project Manager @ Toyota Motor Corporation
⚫ I’m researching and developing both server-side and automotive systems.
  ✓ In particular, I’m trying to integrate eBPF technologies into our systems.
⚫ I’m a member of our open source program office.
⚫ Recent activities
  ⚫ Reviewer of 入門 eBPF ("Introduction to eBPF")
  ⚫ Cloud Native Community Japan Organizer
    ✓ CNCF Cloud Native Community Japan
What is eBPF
⚫ eBPF is used to safely and efficiently extend the capabilities of the kernel without requiring changes to kernel source code or loading kernel modules.
Source: What is eBPF? – eBPF
is possible
⚫ Networking
  ⚫ Speed packet processing without leaving kernel space. Add additional protocol parsers and easily program any forwarding logic to meet changing requirements.
⚫ Observability
  ⚫ Collection and in-kernel aggregation of custom metrics with generation of visibility events and data structures from a wide range of possible sources without having to export samples.
⚫ Tracing & Profiling
  ⚫ Attach eBPF programs to trace points as well as kernel and user application probe points giving powerful introspection abilities and unique insights to troubleshoot system performance problems.
⚫ Security
  ⚫ Combine seeing and understanding all system calls with a packet and socket-level view of all networking to create security systems operating on more context with a better level of control.
Source: eBPF - Introduction, Tutorials & Community Resources
with eBPF
⚫ Security
  ⚫ If one is going to run code in the kernel space, it’s going to have access to a lot of capabilities that normal programs on computers don’t get.
⚫ Performance tradeoffs
  ⚫ Doing too many things with eBPF may end up eating the gains.
⚫ Co-existence
  ⚫ eBPF tools will have to work in combination with other software.
⚫ Deep kernel expertise
  ⚫ Programming eBPF effectively requires deep kernel expertise.
⚫ Too much data
⚫ Interoperability
Source: The_State_of_eBPF.pdf (linuxfoundation.org)
for operating eBPF programs in prod
⚫ Confirm kernel facilities for eBPF
  ⚫ The facilities available to eBPF depend on the kernel version and architecture.
  ⚫ For example, eBPF tracing programs (fentry/fexit/fmod_ret/lsm) were not supported on arm64 until ftrace direct call support was introduced (v6.4).
    ✓ https://lore.kernel.org/bpf/[email protected]/
⚫ Observe eBPF utilization in prod
  ⚫ If you load eBPF programs in prod, observe not only the applications but also the eBPF programs themselves.
⚫ Understand Linux kernel internals for eBPF
  ⚫ Ex1. Memory leak in bpffs
  ⚫ Ex2. The behavior of bpf_send_signal
  ⚫ Ex3. uprobes in a separated mount namespace
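The arm64 example above can be turned into a quick pre-flight check. The following is a minimal sketch, assuming only the v6.4 threshold stated on the slide; the function names are hypothetical, and a distribution kernel may backport the feature, so a version comparison is a hint rather than proof.

```python
import re

# Assumption taken from the slide: ftrace direct call support on arm64
# arrived in v6.4. Distribution kernels may backport it.
ARM64_FENTRY_MIN_VERSION = (6, 4)


def parse_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def arm64_release_has_fentry(release: str) -> bool:
    """Return True if an arm64 kernel release is at least v6.4."""
    return parse_release(release) >= ARM64_FENTRY_MIN_VERSION
```

On a real host, the release string could come from `platform.release()`, and the check would apply only when `platform.machine()` reports `aarch64`.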
Confirm kernel facilities for eBPF
⚫ Kernel configuration for eBPF features
  ⚫ bcc/docs/kernel_config.md at master · iovisor/bcc · GitHub
⚫ The list of program types supported in each kernel version
  ⚫ bcc/docs/kernel-versions.md at master · iovisor/bcc · GitHub
⚫ The list of program types and supported helper functions
  ⚫ bcc/docs/kernel-versions.md at master · iovisor/bcc · GitHub
⚫ How to inspect eBPF programs in your system on the fly
  ⚫ Use bpftool
  ⚫ In particular, bpftool-feature shows the eBPF-related parameters of the running kernel.
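Checking a kernel configuration against the options a tool needs can be sketched as follows. This is a minimal illustration, not part of bcc or bpftool: the function name and the sample text are hypothetical, and the configuration text is assumed to come from a file such as /boot/config-<release> or a decompressed /proc/config.gz.

```python
def missing_config_options(config_text: str, required: list[str]) -> list[str]:
    """Return the required options that are not set to 'y' or 'm'."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Lines such as "# CONFIG_FOO is not set" are comments: skip them.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    return [option for option in required if option not in enabled]


# Hypothetical sample; a real configuration file is much longer.
SAMPLE_CONFIG = """\
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
# CONFIG_BPF_KPROBE_OVERRIDE is not set
CONFIG_CGROUP_BPF=y
"""

print(missing_config_options(
    SAMPLE_CONFIG,
    ["CONFIG_BPF_SYSCALL", "CONFIG_CGROUP_BPF", "CONFIG_BPF_KPROBE_OVERRIDE"],
))
```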
tradeoffs
⚫ Decide which kernel facilities are actually needed for eBPF
  ⚫ Modify the kernel parameters
    ✓ Ex. /proc/sys/net/core/bpf_jit_harden
  ⚫ Check bpf_override_return() (CONFIG_BPF_KPROBE_OVERRIDE)
    ✓ Use case: chaos engineering tools
  ⚫ Restrict bpf_probe_write_user() using LSM Lockdown
    ✓ bpf_probe_write_user() can overwrite user memory.
⚫ We probably cannot disable the configurations that most eBPF-based tools (especially systemd) depend on.
  ⚫ CONFIG_BPF_SYSCALL
  ⚫ CONFIG_CGROUP_BPF
⚫ If a facility is experimental, we can disable it.
  ⚫ CONFIG_BPFILTER
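As one concrete case, the value of /proc/sys/net/core/bpf_jit_harden can be read and interpreted with a few lines. This is a minimal sketch with hypothetical function names; the meanings of 0, 1 and 2 follow the kernel's sysctl documentation.

```python
from pathlib import Path

JIT_HARDEN_PATH = Path("/proc/sys/net/core/bpf_jit_harden")

# Meanings as described in the kernel's net sysctl documentation.
JIT_HARDEN_MEANING = {
    0: "JIT hardening disabled",
    1: "JIT hardening enabled for unprivileged users only",
    2: "JIT hardening enabled for all users",
}


def describe_jit_harden(raw_value: str) -> str:
    """Translate the sysctl file contents into a readable description."""
    value = int(raw_value.strip())
    return JIT_HARDEN_MEANING.get(value, f"unknown value {value}")


def current_jit_harden() -> str:
    """Read the running kernel's setting (Linux only)."""
    return describe_jit_harden(JIT_HARDEN_PATH.read_text())
```

Hardening trades some JIT performance for protection against JIT spraying, which is why it sits among the parameters to decide deliberately.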
Observe eBPF utilization in prod (1/3)
⚫ bpftool is useful for inspecting your system and its eBPF programs.
⚫ If you use systemd, you will already see some eBPF programs.
⚫ List eBPF programs attached to tracing facilities
  ⚫ # bpftool perf
⚫ List eBPF programs attached to all cgroups
  ⚫ # bpftool cgroup tree
      /sys/fs/cgroup/system.slice/systemd-oomd.service
          13 cgroup_inet_ingress multi
          12 cgroup_inet_egress multi
          11 cgroup_device multi
      /sys/fs/cgroup/system.slice/systemd-resolved.service
          14 cgroup_device multi
      /sys/fs/cgroup/system.slice/systemd-timesyncd.service
          15 cgroup_device multi
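For monitoring, the text output of `bpftool cgroup tree` can be turned into a data structure. The following is a minimal sketch that assumes the layout shown on this slide (a cgroup path line followed by "ID attach-type flags" lines); the function name is hypothetical, and the plain-text layout may differ between bpftool versions.

```python
def parse_cgroup_tree(output: str) -> dict[str, list[tuple[int, str]]]:
    """Map each cgroup path to its (program ID, attach type) pairs."""
    programs: dict[str, list[tuple[int, str]]] = {}
    current_path = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("/"):
            # A new cgroup path starts a new group of programs.
            current_path = fields[0]
            programs[current_path] = []
        elif fields[0].isdigit() and current_path is not None and len(fields) >= 2:
            programs[current_path].append((int(fields[0]), fields[1]))
        # Anything else (for example a column header) is ignored.
    return programs


SLIDE_OUTPUT = """\
/sys/fs/cgroup/system.slice/systemd-oomd.service
    13 cgroup_inet_ingress multi
    12 cgroup_inet_egress multi
    11 cgroup_device multi
/sys/fs/cgroup/system.slice/systemd-resolved.service
    14 cgroup_device multi
/sys/fs/cgroup/system.slice/systemd-timesyncd.service
    15 cgroup_device multi
"""

for path, attached in parse_cgroup_tree(SLIDE_OUTPUT).items():
    print(path, attached)
```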
Observe eBPF utilization in prod (3/3)
⚫ A. Cilium
⚫ Some tools name their BPF programs with a common prefix.
  ✓ Ex. Cilium: cil_
⚫ bpftool is useful, but we need more information about each eBPF program.
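The prefix convention can be used to guess which tool loaded a program. The sketch below assumes that `bpftool -j prog show` prints a JSON array of objects that may carry a "name" field; the sample data, the function name and the "unknown" bucket are hypothetical, and only the cil_ prefix comes from the slide.

```python
import json
from collections import defaultdict

# Only the prefix named on the slide; extend with prefixes you have verified.
KNOWN_PREFIXES = {"cil_": "Cilium"}


def group_programs_by_owner(prog_show_json: str) -> dict[str, list[str]]:
    """Group program names by the tool suggested by their name prefix."""
    owners: dict[str, list[str]] = defaultdict(list)
    for prog in json.loads(prog_show_json):
        name = prog.get("name", "<unnamed>")
        owner = "unknown"
        for prefix, tool in KNOWN_PREFIXES.items():
            if name.startswith(prefix):
                owner = tool
                break
        owners[owner].append(name)
    return dict(owners)


# Hypothetical sample in the assumed shape of `bpftool -j prog show` output.
SAMPLE_JSON = json.dumps([
    {"id": 101, "type": "sched_cls", "name": "cil_from_container"},
    {"id": 11, "type": "cgroup_device", "name": "sd_devices"},
    {"id": 12, "type": "cgroup_skb"},
])

print(group_programs_by_owner(SAMPLE_JSON))
```

A name prefix is only a convention, so programs in the "unknown" bucket still need other evidence, such as the owning process or the pinned path.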
Memory leak in BPFFS
⚫ BPFFS: BPF File System
  ⚫ A user space process can pin a BPF program or map in BPFFS.
⚫ We hit the following memory leak issue in BPFFS when we tried OpenTelemetry Auto Instrumentation using eBPF.
  ⚫ Call the Cleanup method of bpffs to remove the bpf fs after instrumen… by RonFed · Pull Request #347 · open-telemetry/opentelemetry-go-instrumentation · GitHub
⚫ You can show the pinned paths in BPFFS.
  ✓ # bpftool prog show --bpffs
⚫ But how can we detect a memory leak in other BPFFS instances?
  ⚫ Ex1. Dedicated BPFFS instance
    ✓ See https://lpc.events/event/11/contributions/933/
  ⚫ Ex2. BPF token will use BPFFS inside each mount namespace.
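One way to start looking for other BPFFS instances is to list the mounts of type bpf that a given process can see. The following is a minimal sketch that parses text in the /proc/<pid>/mountinfo format described in proc(5); the function name and the sample lines are hypothetical.

```python
def bpffs_mount_points(mountinfo_text: str) -> list[str]:
    """Return the mount points whose filesystem type is 'bpf'."""
    mount_points = []
    for line in mountinfo_text.splitlines():
        # Fields before " - " describe the mount; the first field after it
        # is the filesystem type.
        before, separator, after = line.partition(" - ")
        if not separator:
            continue
        before_fields = before.split()
        after_fields = after.split()
        if len(before_fields) >= 5 and after_fields and after_fields[0] == "bpf":
            mount_points.append(before_fields[4])  # fifth field: mount point
    return mount_points


# Hypothetical sample lines in mountinfo format.
SAMPLE_MOUNTINFO = """\
25 1 0:23 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime shared:9 - bpf bpf rw,mode=700
26 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw
300 26 0:60 / /run/example/bpf rw,relatime - bpf none rw
"""

print(bpffs_mount_points(SAMPLE_MOUNTINFO))
```

Reading /proc/<pid>/mountinfo for a process inside a container shows the mounts of that process's mount namespace, which is where a dedicated BPFFS instance would appear.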
The behavior of bpf_send_signal (1/2)
⚫ bpf_send_signal(), one of the BPF helper functions, sends a signal from kernel space.
⚫ This function is used for security observability.
  ⚫ For example, Tetragon tries to kill malicious processes by sending a SIGKILL using bpf_send_signal() synchronously.
[Figure: Malicious Process → Kernel (Attack); Kernel → Malicious Process (SIGKILL via bpf_send_signal() from your eBPF program)]
The behavior of bpf_send_signal (2/2)
⚫ Q. When I tried to stop linkat(2) with bpf_send_signal(), the process was killed but the new link file was still created. Why?
⚫ A. The kernel checks for pending signals just before returning to user space.
  ⚫ Some kernel code paths check for a fatal signal earlier by calling fatal_signal_pending().
    ✓ For example, generic_perform_write() calls fatal_signal_pending() while copying data into the page cache and stops if a fatal signal is pending.
  ⚫ But whether a system call does this depends on the kernel-side implementation.
  ⚫ In this case, the process is killed only after linkat(2) has completed.
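The ordering described above can be made concrete with a toy model. This is a deliberately simplified simulation in plain Python, not kernel code: the function and variable names are hypothetical, and it only contrasts a system call that never checks for a fatal signal with one that does.

```python
def simulate_syscall(checks_fatal_signal: bool) -> tuple[bool, bool]:
    """Toy model of one system call; returns (process_alive, side_effect_done)."""
    side_effect_done = False

    # 1. An eBPF program attached at syscall entry "sends" SIGKILL.
    #    In this model that only marks the signal as pending.
    fatal_signal_pending = True

    # 2. The body of the system call runs.
    if checks_fatal_signal and fatal_signal_pending:
        pass  # bails out early, like code that calls fatal_signal_pending()
    else:
        side_effect_done = True  # e.g. the new link is created

    # 3. On the way back to user space the pending signal is acted on.
    process_alive = not fatal_signal_pending
    return process_alive, side_effect_done


print(simulate_syscall(checks_fatal_signal=False))
print(simulate_syscall(checks_fatal_signal=True))
```

In both cases the process ends up killed; the difference is whether the side effect of the system call has already happened by then.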
uprobes in a separated mount namespace
⚫ Some libbpf-based tools could not register uprobes correctly in a container environment.
  ⚫ libbpf-tools/gethostlatency: Resolve the path of libc for different namespaces by KentaTada · Pull Request #4785 · iovisor/bcc · GitHub
  ⚫ libbpf-tools: support to find symbols in different mount namespace by ethercflow · Pull Request #4854 · iovisor/bcc · GitHub
⚫ To register a uprobe, you need:
  ⚫ The inode of the target binary file
  ⚫ The offset in the target binary file
⚫ Because the path of the target binary differs between mount namespaces, the tool cannot resolve it correctly and the uprobe cannot be registered in the kernel.
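One common way to reach a file as a containerized process sees it is the /proc/<pid>/root link, which the kernel resolves inside that process's mount namespace. The following is a minimal sketch with hypothetical function names; it is not claimed to be the exact approach taken in the pull requests listed above.

```python
import os


def path_via_proc_root(pid: int, path_in_target: str) -> str:
    """Build a host-side path to a file as seen by the process `pid`."""
    if not path_in_target.startswith("/"):
        raise ValueError("expected an absolute path inside the target namespace")
    return f"/proc/{pid}/root{path_in_target}"


def inode_of(path: str) -> int:
    """Return the inode number that identifies the file for a uprobe."""
    return os.stat(path).st_ino


# Hypothetical PID and library path, for illustration only.
print(path_via_proc_root(1234, "/usr/lib/libc.so.6"))
```

With the host-side path in hand, a tool can open the correct binary to look up the symbol offset and let the kernel identify the file by inode.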
challenges for us
⚫ First of all, we want to know the capabilities of eBPF for our use cases.
⚫ From the perspective of our systems, we should consider
  ⚫ arm64 support
  ⚫ Security
  ⚫ Deployment
  ⚫ Licensing
  ⚫ Without Kubernetes …
takeaways
⚫ Deep kernel knowledge is important for detecting and preventing problems.
⚫ To integrate eBPF-based technologies into existing systems, we need extensive knowledge of not only kernel space but also user space.
⚫ Collaboration among diverse companies is essential to improve eBPF technologies.