Tracing the Containers (mainly about eBPF)
KONDO Uchio
November 28, 2019
Presented @ CNDK 2019
Transcript
Tracing the Containers
audit, falco, ... and eBPF!
Uchio Kondo @ GMO Pepabo, Inc. #CNDK2019
Image from pixabay: https://pixabay.com/images/id-984050/
Señor-Principal Engineer @ GMO Pepabo, Inc. Uchio Kondo https://blog.udzura.jp/ @udzura
Technical department, Dev Productivity/R&D Team Chair on CNDJ at Fukuoka, 2019.04 Systems programmer wannabe Duolingo freak (Emerald League)
JapanContainerDays 2018.12 •CRIU
CNDF 2019 Spring
CNDT 2019 summer •cgroup v2 & PSI
Interested in:
•Container features in Linux Kernel (namespace, cgroup, capability, ...)
•System calls
•Kernel programming interfaces
•eBPF (<= New!!)
•The most favorite struct: struct task_struct
Today
ToC
•Rough overview of Container tracing (5m~)
•Introduction to eBPF
•Comparison to existing tracers
•Kernel events (~ 5m)
•Use cases with some DEMO (~ 10m)
Tracing Your containers
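Why tracing?
•Tracing serves the following purposes:
•Logging: understand what is happening inside a complex application
•Audit/security: emitting the necessary trace logs lets you investigate afterwards when something unexpected happens; it can also detect unauthorized access
•Debugging/performance: dig into details that plain application logs do not reveal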
What to trace? Kubernetes/ API Host Linux Per-Container Apps (Networking)
Methodology
Kubernetes audit - orchestrator
Falco / sysdig - host, containers
Falco as an audit tool
•Audits a variety of things based on rules.
•File operations, processes, syslog...
•ref: Wazuh/OSSec https://wazuh.com/
•Also ships audit rules specialized for containers
•trusted_images, falco_sensitive_mount_images, ... https://github.com/falcosecurity/falco/blob/dev/rules/falco_rules.yaml
Falco internal
•The main source of the audited information is a kernel module.
•sysdig(~0.6), falco-probe(0.6~)
•> The kernel modules are actually built from the same source code
•eBPF can now also be used internally
•https://sysdig.com/blog/sysdig-and-falco-now-powered-by-ebpf/
eBPF?
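“Berkeley Packet Filter”
•Originally a paper on a packet-filtering technique (classic BPF, 1993)
•Went on to serve as the engine inside tcpdump
•Beyond packet filtering: also came to be used by seccomp
•Major changes from Linux 3.14 (2014) onward brought it close to its current form (extended BPF)
Refs: “Introduction to Berkeley Packet Filter (BPF) (1)” https://www.atmarkit.co.jp/ait/articles/1811/21/news010.html http://www.tcpdump.org/papers/bpf-usenix93.pdf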
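eBPF overview
•Build BPF bytecode (it can be produced in various ways)
•The kernel verifies it and JIT-compiles it as needed
•The program collects kernel events
•There are in-kernel aggregation structures called BPF maps (very fast)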
From: https://www.atmarkit.co.jp/ait/articles/1811/21/news010_2.html
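Tools
•bpftrace(8) - a general-purpose tracer that uses eBPF internally
•Describe what to trace in a script that closely resembles the DTrace language
•BCC - a library for writing programs that wrap eBPF functionality
•Python, Lua, C++
•Ruby implementation - RbBCC (by the speaker)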
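Existing Linux tracers
Tool | Ability | Key sys call | Invasivity
gdb | Step execution of programs, stopping on signals, etc. | ptrace(2) | Large
strace | Tracing system calls | ptrace(2) | Large
perf | Aggregating and visualizing performance counters, etc. | perf_event_open(2) | Medium
bpftrace/BCC | Aggregating and visualizing all kinds of kernel events | bpf(2) | Smaller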
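Comparison to gdb/strace
•For both gdb and strace the key system call is ptrace(2)
•By design, they have to stop the program once
•Precisely because the program is stopped, more invasive operations become possible, such as updating registers
Ref: “Introduction to the ptrace system call” https://itchyny.hatenablog.com/entry/2017/07/31/090000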
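Comparison to perf
•perf can collect much of the same information that eBPF can, such as tracepoints
•However, aggregation has a non-negligible overhead (the “observer effect”): for example, calling perf_event_open(2) for each probe and aggregating in userland
•eBPF can filter and aggregate (eBPF maps) inside the kernel, which is close to DTrace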
eBPF and Kernel events
eBPF event source http://www.brendangregg.com/blog/2019-07-15/bpf-performance-tools-book.html
Important source for tracing
•perf, ftrace, and eBPF use the same event sources
Ref: “How perf and ftrace work” http://mmi.hatenablog.com/entry/2018/03/04/052249
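tracepoint
•The Linux kernel has built-in hook points for tracing the various events that occur inside it.
•These are called tracepoints. (Example shown: events emitted when the kernel's random number facility is used.)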
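kprobe
•With tracepoints you can basically trace only the hook points that kernel developers prepared in advance.
•To trace calls to a specific kernel function yourself, use kprobe. Note that the functions differ between kernel versions and architectures.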
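uprobe
•Lets the kernel side follow the behavior of a user-space program
•A uprobe event has to be registered per binary (more precisely, reportedly per inode of the executable file).
•For example, you register a function that is visible in the binary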
USDT
•User Statically Defined Tracepoint
•Lets you place probes at arbitrary points in a user program and use them with little overhead. (Internally they appear to be implemented as uprobes.)
Others
•Hardware and software counters such as those used by perf can also be handled from eBPF.
•According to the bpftrace manual, there are hardware, software, and memory watchpoint providers
“Raw” usage of tracefs
•Kernel tracing is possible without eBPF, via tracefs (the same thing that is visible from debugfs, exposing only a more limited set of features)
Ref: “Kernel tracing for myself, part 1” https://udzura.hatenablog.jp/entry/2019/09/02/174801
echo "p:myprobe1 $sym" >> /sys/kernel/debug/tracing/kprobe_events
Ref: “Preparing for container debugging with ftrace” https://speakerdeck.com/kentatada/container-debug-using-ftrace
Trace of ping calling connect
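The echo into kprobe_events shown on the tracefs slide can be sketched as a small Python helper. This is only an illustration under assumptions: the function names (kprobe_definition, kprobe_removal, register) are hypothetical, the "r:" prefix for return probes and the "-:" prefix for removal follow the kernel's kprobe-based event tracing syntax, and the path is the one quoted on the slide.

```python
import re

# Path quoted on the slide; writing to it needs root and a mounted debugfs/tracefs.
KPROBE_EVENTS = "/sys/kernel/debug/tracing/kprobe_events"

_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_event(event):
    if not _NAME_RE.match(event):
        raise ValueError(f"invalid event name: {event!r}")


def kprobe_definition(event, symbol, is_return=False):
    """Build one definition line, e.g. 'p:myprobe1 vfs_read' (hypothetical helper)."""
    _check_event(event)
    if not symbol or any(c.isspace() for c in symbol):
        raise ValueError(f"invalid symbol: {symbol!r}")
    kind = "r" if is_return else "p"  # p = entry probe, r = return probe
    return f"{kind}:{event} {symbol}"


def kprobe_removal(event):
    """Build the line that removes a previously registered probe."""
    _check_event(event)
    return f"-:{event}"


def register(line, path=KPROBE_EVENTS):
    """Append one line to kprobe_events, like `echo "..." >> kprobe_events`."""
    with open(path, "a") as f:
        f.write(line + "\n")
```

Usage would look like register(kprobe_definition("myprobe1", "vfs_read")), followed later by register(kprobe_removal("myprobe1")) to clean up.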
OK, what is good with containers?
eBPF use case
•Debugging HOST Linux itself
•Syscalls or kernel functions around containers
•Runtime performance
•bpftrace result to Prometheus for monitoring
•Tracing events per container
•Cgroup v2 with eBPF
•Tracee by Aqua Security
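Tracing kernel on containers
•Containers use a variety of kernel features, so eBPF lets you debug and measure those kernel features themselves.
•For example: `ip netns add/del`
•Internally it calls the kernel functions copy_net_ns/cleanup_net
•Depending on the kernel version these take a lock internally, so you want to investigate things such as the performance impact → use eBPF!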
Demo (1)
Reference
•“Linux Kernel: observing the stuck state caused by holding rtnl_mutex for a long time”
•https://hiboma.hatenadiary.jp/entry/2019/10/29/123455
•(As an aside, thanks to hiboma I learned how to use /proc/$pid/stack and wchan.)
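Tracing Runtime
•Measured the following (with Haconiwa, the speaker's own container runtime):
•Time from container runtime start until execve
•Time from container runtime start until the container starts to listen
•A combination of USDT and tracepoints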
bpftrace script
bpftrace → Prometheus
•Wrote a tool called bt2prom.
•Converts the JSON format emitted by bpftrace into a format Prometheus can read.
•Place the output as-is in the Textfile exporter directory and it can be plotted
•Intended to be run periodically from cron or similar (in the manner of sar)
“Format bpftrace JSON into prometheus-compat textfile” https://github.com/udzura/mruby-bin-bt2prom
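The conversion step can be sketched in Python as follows. This is not the bt2prom implementation (which is an mruby tool); it is a minimal illustration assuming that bpftrace's JSON output is one JSON object per line and that map dumps look like {"type": "map", "data": {"@name": {...}}}. The function name to_prometheus, the "bpftrace" metric prefix, and the "key" label are all hypothetical choices.

```python
import json
import re


def _metric_name(map_name, prefix="bpftrace"):
    # bpftrace map names start with '@'; keep only characters valid in metric names.
    body = re.sub(r"[^a-zA-Z0-9_]", "_", map_name.lstrip("@")) or "default"
    return f"{prefix}_{body}"


def _escape(value):
    # Escape backslash, double quote, and newline inside a label value.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_prometheus(json_lines, prefix="bpftrace"):
    """Turn bpftrace JSON lines into Prometheus text-format sample lines."""
    out = []
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue
        record = json.loads(raw)
        if record.get("type") != "map":
            continue  # ignore non-map records such as attached_probes
        for map_name, value in record["data"].items():
            name = _metric_name(map_name, prefix)
            if isinstance(value, dict):
                # Keyed map: one sample per key, exposed as a label.
                for key, count in value.items():
                    if isinstance(count, (int, float)):
                        out.append(f'{name}{{key="{_escape(str(key))}"}} {count}')
            elif isinstance(value, (int, float)):
                out.append(f"{name} {value}")
    return "\n".join(out) + "\n" if out else ""
```

Writing the returned text to a .prom file in the Textfile exporter directory, for example from a cron job, would match the workflow described on the slide.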
A rough example of tracking vfs_read
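CGroup v2 x eBPF
•cgroup-specific BPF helper functions tell you which cgroup the running thread belongs to: BPF_FUNC_get_current_cgroup_id and others
•Only usable on a very recent kernel... but convenient
•Makes it easy to trace, per container, things such as which files get opened
•e.g. snooping the files an Apache HTTPD container opens on each request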
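A helper such as BPF_FUNC_get_current_cgroup_id only yields a number, so user space still has to map it back to a container. A minimal Python sketch of that lookup, assuming (this is not stated in the slides) that on cgroup v2 the ID equals the inode number of the cgroup directory and that cgroupfs is mounted at /sys/fs/cgroup; cgroup_id and index_cgroups are hypothetical names.

```python
import os


def cgroup_id(path):
    """Return the ID of a cgroup directory, assumed here to be its inode number."""
    return os.stat(path).st_ino


def index_cgroups(root="/sys/fs/cgroup"):
    """Walk a cgroup v2 hierarchy and map cgroup ID -> directory path."""
    index = {}
    for dirpath, _dirnames, _filenames in os.walk(root):
        index[cgroup_id(dirpath)] = dirpath
    return index
```

With such an index, an ID reported by a BPF program could be resolved with index.get(reported_id); cgroups created after the walk would need a fresh scan.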
Demo (2) No time left, so please come and talk to me!
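Tracee
•A container tracer implementation that uses eBPF throughout
•Internally resolves PID → Namespace, among other things
•bpftrace/BCC are general-purpose, so there are high hopes for its specialized features
https://blog.aquasec.com/ebpf-tracing-containers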
Conclusion
Happy publishing!
We’re moving to cgroup v2
•Moby's cgroup v2 support P/R (WIP)
•systemd making v2 the default (from 243)
What is new in cgroup v2 (Reprise)
•Unified Hierarchy
•CGroup-aware OOM Killer
•nsdelegate and better cgroup namespace
•PSI - Pressure Stall Information
•BPF helper for cgroup v2 (such as BPF_FUNC_get_current_cgroup_id, ...)
It should be “per-container”
Host-wide:
•Load Average
•Memory usage
•psutils, top, vmstat...
•netstat, iostat
•syslog, auditd
•perf
Per-Container:
•Cgroup stat
•PSI (especially)
•eBPF (per container)
•USDT, syscalls...
•sysdig/falco
•perf --cgroup
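Per-container PSI can be read straight from the cgroup v2 directory. A minimal Python sketch, assuming the pressure files are named cpu.pressure, memory.pressure and io.pressure and contain lines of the form "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"; parse_psi and read_cgroup_psi are hypothetical helper names, and the example cgroup path in the usage note is illustrative only.

```python
import os


def parse_psi(text):
    """Parse PSI text into {'some': {'avg10': 0.0, ..., 'total': 0}, 'full': {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        values = {}
        for pair in pairs:
            key, _, raw = pair.partition("=")
            # 'total' is a microsecond counter; the avg* fields are percentages.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


def read_cgroup_psi(cgroup_dir, resource="memory"):
    """Read <resource>.pressure from one cgroup directory."""
    with open(os.path.join(cgroup_dir, f"{resource}.pressure")) as f:
        return parse_psi(f.read())
```

For example, read_cgroup_psi("/sys/fs/cgroup/mycontainer", "io")["some"]["avg10"] would give the 10-second I/O pressure average for that one container instead of a host-wide figure.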
Understand new features to use new tools in a better way