
eBPF-based Process Lifecycle Monitoring

- Introduction to Tetragon Implementation -

Yuki Nakamura

March 15, 2025

Transcript

  1. eBPF-based Process Lifecycle Monitoring - Introduction to Tetragon Implementation -

    Yuki Nakamura March 15, 2025 Cloud Native Community Japan - eBPF Japan Meetup #3
  2. Whoami: Yuki Nakamura 👨‍💻

    Platform Engineer (ex-IBM Group, Mapbox)
    Tech Blog
    - Container & Kubernetes related tools (ArgoCD, Buildkit, etc.)
    - eBPF (Tetragon, Aya)
    🐝 My Journey with eBPF
    - Motivated by Documentary: Unlocking the Kernel
      - The story of eBPF code being merged into the Linux Kernel codebase
      - The growing popularity of eBPF
      - Notable quote: "This is like putting Javascript into the kernel." - Brendan Gregg
    - Tetragon
      - Contributing to the Tetragon Project
      - Tetragon-mini: Learning eBPF by rewriting Tetragon in Rust
  3. Thoughts 🤔

    It’s impressive, but what is it useful for?
    What is Tetragon designed to do?
    How is Process Lifecycle Monitoring implemented?
    What kernel event hooks are being used?
    What code is written on the eBPF side and the user space side?
  4. Agenda

    1. Overview of Tetragon and Use Cases
       - Runtime Enforcement
       - Security Observability
       - Analysis of collected Process Lifecycle data
    2. Process Lifecycle Monitoring Mechanism
       - Linux prerequisite knowledge
       - Tetragon code explanation
  5. Questions 🙋‍♀️🙋‍♂️

    Are you familiar with the Tetragon project?
    Have you used Tetragon before?
    Are you using Tetragon in a production environment?
  6. Tetragon: Overview

    - CNCF project, subproject of Cilium
    - Written in C (eBPF) and Go (user space), similar to Cilium
    - v1.0 released in November 2023
      - 2023-11-01: v1.0
      - 2024-04-29: v1.1
      - 2024-09-05: v1.2
      - 2024-12-13: v1.3 (the code I’ll introduce today is from v1.3)
    In one sentence, Tetragon is an eBPF-based Security Observability & Runtime Enforcement tool.
  7. Tetragon: Runtime Enforcement

    A mechanism to instantly control syscalls that match certain rules within kernel space.
    Tracing Policy (rules): defines the kernel events to trace and the actions to take when conditions are met.
    - Example 1: Kill processes that call sys_write to write to /etc/passwd, except those with PID 0 or 1
    - Example 2: Prohibit execution of specific binary files
    Using eBPF allows processing to be completed without transferring events to user space.
    This approach enables low-latency and reliable security policy enforcement.
    [Diagram: Tracing Policy, Tetra CLI, Tetragon Agent (sets up eBPF Programs/Maps), eBPF Programs and eBPF Maps on kernel/syscall events, Kill / Override applied to the Process]
  8. Tetragon: Security Observability

    Real-time observation and analysis of security-related events in the kernel.
    Event examples: File Access, TCP Connection Event, Process Lifecycle (execution/termination), etc.
    eBPF programs detect events and transfer them to the user space Tetragon Agent via eBPF Maps.
    Any collector or storage/analytics tool can be used for storage and analysis.
    [Diagram: Tracing Policy, Tetra CLI, Tetragon Agent (sets up eBPF Programs/Maps), eBPF Programs and eBPF Maps on kernel/syscall events, tetragon.log, Collector (fluentd, optl), Storage/Analytics Tool (Grafana Loki, S3, Athena)]
  9. Analysis 1: Finding Processes with Elevated Privileges

    Searching for processes with CAP_SYS_ADMIN. [1]
    [1] Search events in tetragon.log (JSONL) using DuckDB.
  10. Analysis 2: Detecting Suspicious Shell Execution

    Searching for processes executed from shells.
    Access paths can also be understood by recursively searching the parent processes of the shell.
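
The recursive parent search mentioned above can be illustrated with a small C sketch. It is only a model: the record layout (`id`, `parent_id`, `binary`) and the function names are hypothetical and far simpler than real Tetragon events, and the talk itself runs this kind of query with DuckDB rather than C.

```c
#include <stddef.h>

/* Hypothetical, simplified process record; real Tetragon events carry more fields. */
struct proc_rec {
    int id;             /* record ID (hypothetical) */
    int parent_id;      /* ID of the parent record, or -1 if unknown */
    const char *binary; /* path of the executed binary */
};

/* Linear lookup of a record by ID; returns NULL when it is not present. */
static const struct proc_rec *find_rec(const struct proc_rec *recs, size_t n, int id)
{
    for (size_t i = 0; i < n; i++)
        if (recs[i].id == id)
            return &recs[i];
    return NULL;
}

/* Write the ancestors of `id` (nearest first) into out[], up to `max` entries.
 * Returns the number written. The `max` bound also stops a malformed, cyclic chain. */
static size_t ancestor_chain(const struct proc_rec *recs, size_t n, int id,
                             const struct proc_rec **out, size_t max)
{
    size_t count = 0;
    const struct proc_rec *cur = find_rec(recs, n, id);

    while (cur && count < max) {
        const struct proc_rec *parent = find_rec(recs, n, cur->parent_id);
        if (!parent)
            break;
        out[count++] = parent;
        cur = parent;
    }
    return count;
}
```

Given records for sshd, a bash started by sshd, and a curl started by that bash, calling `ancestor_chain` on the curl record yields bash and then sshd, which is the access path in question.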
  11. Agenda

    1. Overview of Tetragon and Use Cases
       - Runtime Enforcement
       - Security Observability
       - Analysis of collected Process Lifecycle data
    2. Process Lifecycle Monitoring Mechanism
       - Linux prerequisite knowledge
         - Process data structure
         - TGID and PID
         - Process Management Syscalls
       - Tetragon code explanation
  12. Linux Basics: task_struct

    task_struct is the data structure in the Linux kernel that manages each process (or thread).
    Linux: include/linux/sched.h

        struct task_struct {
            pid_t pid;
            pid_t tgid;
            char comm[TASK_COMM_LEN];   // command name of the process
            struct nsproxy *nsproxy;    // namespaces
            struct mm_struct *mm;       // pointer to the user-space memory management information used by the process
            ...

    Tetragon collects process information from this task_struct.
    eBPF helper function bpf_get_current_task(): gets a pointer to the task_struct for the current process (thread).

        // The helper returns a u64, so cast it to a task_struct pointer.
        struct task_struct *task = (struct task_struct *)bpf_get_current_task();
        get_namespaces(&event->ns, task); // Get namespace information from task_struct. get_namespace is a function defined wit
  13. Linux: TGID and PID

    TGID is the process identifier. PID is the thread identifier.
    ⚠️ PID is not a process identifier ⚠️

    Single Thread:
        task_struct  tgid 100  pid 100  comm binary_1
    Multi Thread:
        task_struct  tgid 200  pid 200  comm binary_2
        task_struct  tgid 200  pid 201  comm binary_2
        task_struct  tgid 200  pid 202  comm binary_2

    Tetragon monitors events at the process level. Creation/deletion of threads (multithreading) within a process is ignored.
    eBPF helper function bpf_get_current_pid_tgid(): gets pid and tgid.

        u64 pid_tgid = bpf_get_current_pid_tgid();
        u32 tgid = pid_tgid >> 32;        // Get TGID from upper 32 bits
        u32 pid = pid_tgid & 0xFFFFFFFF;  // Get PID from lower 32 bits
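
As a user-space illustration of the bit layout above, the following C sketch packs and unpacks a 64-bit value the same way. The helper names (`pack_pid_tgid`, `tgid_of`, `pid_of`, `is_group_leader`) are hypothetical and belong to neither the eBPF API nor Tetragon.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack a TGID/PID pair the way bpf_get_current_pid_tgid() lays it out:
 * TGID in the upper 32 bits, PID (thread ID) in the lower 32 bits. */
static uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
    return ((uint64_t)tgid << 32) | pid;
}

static uint32_t tgid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

static uint32_t pid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid & 0xFFFFFFFFu);
}

/* The thread group leader is the task whose PID equals its TGID. */
static bool is_group_leader(uint64_t pid_tgid)
{
    return tgid_of(pid_tgid) == pid_of(pid_tgid);
}
```

With the multi-thread example from the slide, the task with tgid 200 and pid 201 is not the group leader, while the task with tgid 200 and pid 200 is.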
  14. Linux: Process Management Syscall

    Processes are created, executed, and terminated through the following steps:
    1. The parent process creates a child process by calling fork(), clone(), or similar syscalls.
    2. The child process executes a program by calling execve() or similar syscalls.
    3. When execution is complete, the child process terminates by calling exit() or similar syscalls.
    [Diagram: the Parent calls fork() and wait(); the Child calls execve() and exit()]
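
The three steps can be sketched in user-space C with the standard POSIX calls. The function name `run_child` is made up for this example; it is a minimal sketch with basic error handling only.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run `path` with `argv` in a child process and return its exit status,
 * or -1 if the child could not be created or did not exit normally. */
static int run_child(const char *path, char *const argv[])
{
    pid_t child = fork();                 /* 1. creation */
    if (child < 0)
        return -1;

    if (child == 0) {
        execve(path, argv, environ);      /* 2. execution: replaces the child's program image */
        _exit(127);                       /* reached only if execve() failed */
    }

    int status = 0;
    if (waitpid(child, &status, 0) < 0)   /* the parent waits for 3. termination */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, running `/bin/sh` with the arguments `sh -c "exit 3"` makes `run_child` return 3. Each call goes through one creation, one execution, and one termination, which are the three points Tetragon hooks.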
  15. Example: Syscalls During ls Command Execution

    The syscalls traced when bash executes the ls command are as follows.

    Terminal1 (bash, PID=23167):

        ls -la

    Terminal2:

        strace -fp 23167 2>&1 | grep -e clone -e execve -e exit
        clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 23895 attached
        [pid 23895] execve("/usr/bin/ls", ["ls", "--color=auto", "-la"], 0xbb5be6b20cc0 /* 24 vars */) = 0
        [pid 23895] exit_group(0) = ?
        [pid 23895] +++ exited with 0 +++

    The following syscalls were called:
    1. Creation: clone()
    2. Execution: execve()
    3. Termination: exit_group()
  16. Agenda

    1. Overview of Tetragon and Use Cases
       - Runtime Enforcement
       - Security Observability
       - Analysis of collected Process Lifecycle data
    2. Process Lifecycle Monitoring Mechanism
       - Linux prerequisite knowledge
         - Process data structure
         - TGID and PID
         - Process Management Syscalls
       - Tetragon code explanation
         - Fork
         - Execve
         - Exit
  17. Tetragon: Process Lifecycle Monitoring - Overview

    Attach eBPF programs that create Process Lifecycle Events to hooks of each Process Management Syscall.
    The eBPF programs transfer the created events to user space via eBPF Maps.
    [Diagram: in kernel space, fork-, execve-, and exit-related syscalls trigger the eBPF programs for creating the Clone, Execve, and Exit Events; the events reach the Tetragon Agent in user space via an eBPF Map]
  18. Fork: eBPF Program and Hook Point

    The eBPF Program with section name kprobe/wake_up_new_task is attached to the kprobe of wake_up_new_task.
    [Diagram: fork-related syscalls -> kprobe wake_up_new_task -> event_wake_up_new_task -> tcpmon_map (perf_event_array) -> Tetragon Agent in user space]

    Tetragon UserSpace: base.go

        Fork = program.Builder(
            "bpf_fork.o",              // the name of the BPF object file
            "wake_up_new_task",        // the hook point
            "kprobe/wake_up_new_task", // the program section name
            "kprobe_pid_clear",        // the name of pin
            "kprobe",                  // the type of BPF program
        ).SetPolicy(basePolicy)
  19. Fork: Hook Point (wake_up_new_task)

    wake_up_new_task is a function called within kernel_clone(), which is the main routine of fork.
    Linux: kernel/fork.c -> kernel_clone()

        pid_t kernel_clone(struct kernel_clone_args *args)
        {
            struct task_struct *p;
            ...
            wake_up_new_task(p);
            ...
  20. Fork: eBPF Program

    Assembles the Clone Event and writes it to the eBPF Map tcpmon_map.
    Tetragon eBPF: bpf_fork.c

        __attribute__((section("kprobe/wake_up_new_task"), used)) int
        BPF_KPROBE(event_wake_up_new_task, struct task_struct *task)
        {
            struct msg_clone_event msg;
            ...
            perf_event_output_metric(ctx, MSG_OP_CLONE, &tcpmon_map,
                                     BPF_F_CURRENT_CPU, &msg, msg_size); // Write msg_clone_event to tcpmon_map

    Mechanism to ignore thread creation and deletion within processes:
    Since wake_up_new_task is also called when creating a thread, the program checks whether a Clone Event has already been created for the same TGID, and creates a new one only if none exists yet.

        curr = execve_map_get(tgid);
        if (curr->key.ktime != 0) // Check whether the event for the tgid has already been created.
            return 0;
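
The TGID check described above can be modeled in plain user-space C. This is a sketch of the idea only, not Tetragon's code: the fixed-size table stands in for the map behind `execve_map_get`, and the names `tracked`, `slot_for`, and `on_wake_up_new_task` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 64   /* arbitrary capacity for this sketch */

struct tracked {
    uint32_t tgid;
    uint64_t ktime;      /* 0 means "no Clone Event created yet" */
};

static struct tracked table[MAX_TRACKED];

/* Return the slot already holding `tgid`, or claim a free one.
 * Returns NULL when the table is full. */
static struct tracked *slot_for(uint32_t tgid)
{
    for (size_t i = 0; i < MAX_TRACKED; i++)
        if (table[i].ktime != 0 && table[i].tgid == tgid)
            return &table[i];
    for (size_t i = 0; i < MAX_TRACKED; i++)
        if (table[i].ktime == 0) {
            table[i].tgid = tgid;
            return &table[i];
        }
    return NULL;
}

/* Called for every new task, thread or process. `now` must be non-zero.
 * Returns true only for the first task seen in a thread group. */
static bool on_wake_up_new_task(uint32_t tgid, uint64_t now)
{
    struct tracked *curr = slot_for(tgid);

    if (!curr)
        return false;        /* no room: nothing is reported */
    if (curr->ktime != 0)
        return false;        /* an event for this TGID already exists */
    curr->ktime = now;       /* remember that the event was created */
    return true;
}
```

The first call for TGID 200 returns true; later calls for the same TGID, which correspond to threads created inside that process, return false.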
  21. Execve

    [Diagram: in kernel space, fork-, execve-, and exit-related syscalls trigger the eBPF programs for creating the Clone, Execve, and Exit Events; the events reach the Tetragon Agent in user space via an eBPF Map]
  22. Execve: eBPF Program and Hook Point

    The eBPF Program with section name tracepoint/sys_execve is attached to the tracepoint sched/sched_process_exec.
    [Diagram: execve-related syscalls -> tracepoint sched_process_exec -> event_execve -> (Tail Call) execve_rate -> (Tail Call) execve_send -> tcpmon_map (perf_event_array) -> Tetragon Agent in user space]

    Tetragon UserSpace: base.go

        Execve = program.Builder(
            config.ExecObj(),           // the name of the BPF object file
            "sched/sched_process_exec", // the hook point
            "tracepoint/sys_execve",    // the program section name
            "event_execve",             // the name of pin
            "execve",                   // the type of BPF program
        ).SetPolicy(basePolicy)
  23. Execve: Hook Point (sched/sched_process_exec)

    The tracepoint sched/sched_process_exec is triggered when a process executes a new program.
    Mechanism to ignore thread creation and deletion within processes:
    When a thread is created within a process, sched/sched_process_exec is not triggered.

    Reference: When writing eBPF Programs for tracepoints, it’s important to check the format of the data available in that tracepoint. This can be confirmed using the following command.

        cat /sys/kernel/debug/tracing/events/sched/sched_process_exec/format
        name: sched_process_exec
        ID: 267
        format:
            field:unsigned short common_type;          offset:0;  size:2; signed:0;
            field:unsigned char common_flags;          offset:2;  size:1; signed:0;
            field:unsigned char common_preempt_count;  offset:3;  size:1; signed:0;
            field:int common_pid;                      offset:4;  size:4; signed:1;

            field:__data_loc char[] filename;          offset:8;  size:4; signed:0;
            field:pid_t pid;                           offset:12; size:4; signed:1;
            field:pid_t old_pid;                       offset:16; size:4; signed:1;

        print fmt: "filename=%s pid=%d old_pid=%d", __get_str(filename), REC->pid, REC->old_pid
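
The `__data_loc char[] filename` field in the format above is a 32-bit descriptor, not the string itself: the lower 16 bits hold the offset of the data from the start of the record and the upper 16 bits hold its length. The following user-space C sketch decodes it from a raw record buffer, assuming the field offsets printed above. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define FILENAME_LOC_OFFSET 8   /* offset of the __data_loc field, per the format file */

/* Read the 32-bit __data_loc descriptor from a raw tracepoint record. */
static uint32_t filename_loc(const uint8_t *record)
{
    uint32_t loc;
    memcpy(&loc, record + FILENAME_LOC_OFFSET, sizeof(loc)); /* avoids unaligned access */
    return loc;
}

/* Lower 16 bits: offset of the string from the start of the record. */
static const char *exec_filename(const uint8_t *record)
{
    return (const char *)record + (filename_loc(record) & 0xFFFF);
}

/* Upper 16 bits: length of the string data. */
static uint16_t exec_filename_len(const uint8_t *record)
{
    return (uint16_t)(filename_loc(record) >> 16);
}
```

This is the same masking with `0xFFFF` that an eBPF program applies to the `__data_loc_filename` member of the tracepoint context to locate the filename.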
  24. Execve: eBPF Program

    Assembles the Execve Event (msg_execve_event) and writes it to the eBPF Map tcpmon_map.
    Tetragon eBPF: bpf_execve_event.c

    event_execve:

        __attribute__((section("tracepoint/sys_execve"), used)) int
        event_execve(struct trace_event_raw_sched_process_exec *ctx)
        {
            struct task_struct *task = (struct task_struct *)get_current_task();
            char *filename = (char *)ctx + (_(ctx->__data_loc_filename) & 0xFFFF); // Use __data_loc_filename in ctx
            struct msg_execve_event *event;
            ...

    execve_send:

        __attribute__((section("tracepoint"), used)) int
        execve_send(void *ctx __arg_ctx)
        {
            ...
            // Write msg_execve_event to tcpmon_map
            perf_event_output_metric(ctx, MSG_OP_EXECVE, &tcpmon_map, BPF_F_CURRENT_CPU, event, size);
  25. Execve: Tail Call

    Execve Event processing uses three sequential eBPF Programs connected by Tail Calls:
    1. event_execve: Assembles the Execve Event
    2. execve_rate: Suppresses monitoring when a large number of events occur per cgroup (event throttling)
    3. execve_send: Writes the Execve Event to the eBPF Map
    [Diagram: execve-related syscalls -> tracepoint sched_process_exec -> event_execve -> (Tail Call) execve_rate -> (Tail Call) execve_send -> tcpmon_map (perf_event_array) -> Tetragon Agent in user space]

    Benefits of introducing Tail Calls:
    - Separation of logic
    - Avoiding the eBPF Verifier’s program size limitation [1]
    - Reduction of stack usage (maximum 512 bytes)

    [1] Until v5.2, the instruction limit was 4k and the complexity limit was 128k. Afterwards, these limits were raised to 1M.
  26. Tips: Data Sharing Between Tail Calls

    Data cannot be passed when making a Tail Call.
    As a solution, eBPF Maps are used to share Event data between eBPF Programs.
    [Diagram: event_execve, execve_rate, and execve_send, connected by Tail Calls, each read and write a PerCpuArray (one slot per CPU, CPU:1 ... CPU:n) used as storage for sharing state; execve_send writes to tcpmon_map (perf_event_array) for the Tetragon Agent in user space]

    Tetragon eBPF: process.h

        struct {
            __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
            __uint(max_entries, 1);
            __type(key, __u32);
            __type(value, struct msg_execve_event);
        } execve_msg_heap_map SEC(".maps");
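
The pattern above, where stages chained by tail calls share state through a per-CPU slot, can be modeled in user-space C. This is a simulation under loud assumptions: function pointers stand in for a program array map, a plain array stands in for the per-CPU map, and `RATE_BUDGET` is a placeholder check rather than Tetragon's throttling rule. All names are hypothetical. Unlike a real tail call, the model returns -1 when the target is missing instead of falling through.

```c
#include <stdint.h>
#include <string.h>

#define NCPU 4            /* arbitrary CPU count for this sketch */
#define RATE_BUDGET 100   /* placeholder budget, not Tetragon's throttling rule */

struct exec_event {
    char filename[64];
    uint32_t tgid;
    int sent;             /* set by the final stage */
};

/* One scratch slot per CPU, standing in for a per-CPU array with max_entries = 1. */
static struct exec_event heap[NCPU];
static unsigned seen[NCPU];

enum { STAGE_RATE, STAGE_SEND, STAGE_COUNT };
typedef int (*stage_fn)(int cpu);
static stage_fn prog_array[STAGE_COUNT];   /* stands in for a program array map */

/* A tail call carries no arguments besides the context; here that is just `cpu`. */
static int tail_call(int cpu, int idx)
{
    return prog_array[idx] ? prog_array[idx](cpu) : -1;
}

static int stage_send(int cpu)
{
    heap[cpu].sent = 1;                    /* where the event would be written out */
    return 0;
}

static int stage_rate(int cpu)
{
    if (++seen[cpu] > RATE_BUDGET)
        return 0;                          /* drop the event: too many seen */
    return tail_call(cpu, STAGE_SEND);
}

static int stage_event(int cpu, const char *filename, uint32_t tgid)
{
    struct exec_event *ev = &heap[cpu];    /* state lives in the shared slot */

    memset(ev, 0, sizeof(*ev));
    strncpy(ev->filename, filename, sizeof(ev->filename) - 1);
    ev->tgid = tgid;
    return tail_call(cpu, STAGE_RATE);
}

static void setup_stages(void)
{
    prog_array[STAGE_RATE] = stage_rate;
    prog_array[STAGE_SEND] = stage_send;
}
```

After `setup_stages()`, calling `stage_event(1, "/usr/bin/ls", 23895)` fills the slot for CPU 1, and the later stages find the event there without receiving it as an argument. Using one slot per CPU means programs running on different CPUs do not overwrite each other's event.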
  27. Exit

    [Diagram: in kernel space, fork-, execve-, and exit-related syscalls trigger the eBPF programs for creating the Clone, Execve, and Exit Events; the events reach the Tetragon Agent in user space via an eBPF Map]
  28. Exit: eBPF Program and Hook Point

    The eBPF Program with section name kprobe/acct_process is attached to the kprobe of acct_process.
    [Diagram: exit-related syscalls -> kprobe acct_process -> event_exit_acct_process -> tcpmon_map (perf_event_array) -> Tetragon Agent in user space]

    Tetragon UserSpace: base.go

        Exit = program.Builder(
            "bpf_exit.o",          // the name of the BPF object file
            "acct_process",        // the hook point
            "kprobe/acct_process", // the program section name
            "event_exit",          // the name of pin
            "kprobe",              // the type of BPF program
        ).SetPolicy(basePolicy)
  29. Exit: Hook Point (acct_process)

    The acct_process function is called within do_exit() when a Thread Group is removed.
    Linux: kernel/exit.c -> do_exit()

        void __noreturn do_exit(long code)
        {
            ...
            if (group_dead)
                acct_process();
            ...

    Mechanism to ignore thread creation and deletion within processes:
    acct_process runs only once when a process terminates.
    For kernels without acct_process, disassociate_ctty is used instead.

    Reference: Previously, the tracepoint sched/sched_process_exit or the kprobe kprobe/__put_task_struct was used.
    - tetragon: Switch exit tracepoint to __put_task_struct kprobe #558
    - tetragon: Hook exit sensor on acct_process #1509
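
Why hooking at the `group_dead` branch yields exactly one event per process can be shown with a tiny user-space model. In the kernel, `do_exit()` computes `group_dead` by decrementing the thread group's live-thread counter and testing for zero; the struct and function names below are hypothetical stand-ins.

```c
#include <stdbool.h>

/* Hypothetical model of a thread group. */
struct thread_group {
    int live;          /* number of threads still alive */
    int exit_events;   /* how many times the "acct_process" point was reached */
};

/* Models do_exit() for one thread of the group. */
static void thread_exit(struct thread_group *g)
{
    /* Kernel equivalent: group_dead = atomic_dec_and_test(&tsk->signal->live); */
    bool group_dead = (--g->live == 0);

    if (group_dead)
        g->exit_events++;   /* stands in for acct_process(), where the kprobe fires */
}
```

For a group with three live threads, the first two calls to `thread_exit` leave `exit_events` at 0 and the third sets it to 1, so thread exits inside a process never produce an Exit Event of their own.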
  30. Exit: eBPF Program

    Assembles the Exit Event (msg_exit) and writes it to the eBPF Map tcpmon_map.

    Tetragon eBPF: bpf_exit.c (kprobe/acct_process section)

        __attribute__((section("kprobe/acct_process"), used)) int
        event_exit_acct_process(struct pt_regs *ctx)
        {
            __u64 pid_tgid = get_current_pid_tgid();

            event_exit_send(ctx, pid_tgid >> 32);
            return 0;
        }

    Tetragon eBPF: bpf_exit.h

        FUNC_INLINE void event_exit_send(void *ctx, __u32 tgid)
        {
            struct msg_exit *exit;
            ...
            exit->info.tid = tgid;
            ...
            perf_event_output_metric(ctx, MSG_OP_EXIT, &tcpmon_map, BPF_F_CURRENT_CPU, exit, size); // Write msg_exit to tcpmon_map
  31. Process Lifecycle Monitoring - Detailed Implementation

    Process Lifecycle Monitoring is achieved using 3 hook points, 5 eBPF programs, and multiple eBPF maps.
    [Diagram: fork-related syscalls -> kprobe wake_up_new_task -> event_wake_up_new_task; execve-related syscalls -> tracepoint sched_process_exec -> event_execve -> (Tail Call) execve_rate -> (Tail Call) execve_send; exit-related syscalls -> kprobe acct_process -> event_exit_acct_process; all programs write to tcpmon_map (perf_event_array), which is read in user space by the Tetragon Agent, with the Tetra CLI alongside]
  32. Wrap up

    1. Explained Tetragon’s Runtime Enforcement and Security Observability
    2. Explained Linux fundamentals (task_struct, TGID and PID, Process-related Syscalls)
    3. Introduced portions of Tetragon/Kernel code (Hook Points and eBPF Programs for Fork/Execve/Exit)
    4. Presented eBPF tips (Tracepoint Data Format, Tail Call, data sharing using Per-CPU Maps)