Tracing the Containers (mainly about eBPF)
KONDO Uchio
November 28, 2019
Presented @ CNDK 2019
Transcript
Tracing the Containers: audit, falco, ... and eBPF! Uchio Kondo @ GMO Pepabo, Inc. #CNDK2019 (Image from pixabay: https://pixabay.com/images/id-984050/)
Señor-Principal Engineer @ GMO Pepabo, Inc. Uchio Kondo https://blog.udzura.jp/ @udzura
Technical Department, Dev Productivity/R&D Team •Chair of CNDJ at Fukuoka, 2019.04 •Systems programmer wannabe •Duolingo freak (Emerald League)
JapanContainerDays 2018.12 •CRIU
CNDF 2019 Spring
CNDT 2019 summer •cgroup v2 & PSI
Interested in: •Container features in the Linux kernel (namespace, cgroup, capability, ...) •System calls •Kernel programming interfaces •eBPF (<= New!!) •Favorite struct: struct task_struct
Today
ToC •Rough overview of container tracing (5m~) •Introduction to eBPF •Comparison to existing tracers •Kernel events (~5m) •Use cases with some demos (~10m)
Tracing your containers
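Why tracing? •Tracing serves purposes like the following •Logging: understand what is happening inside a complex application •Audit/security: by emitting the necessary trace logs, you can investigate after the fact when something unexpected happens; it can sometimes detect unauthorized access, too •Debugging/performance: dig into things that plain application logs cannot reveal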
What to trace? •Kubernetes API •Host Linux •Per-container apps •(Networking)
Methodology
Kubernetes audit - orchestrator
Falco / sysdig - host, containers
Falco as an audit tool •Rule-based auditing of various things •File operations, processes, syslog... •ref: Wazuh/OSSEC https://wazuh.com/ •Audit rules specialized for containers •trusted_images, falco_sensitive_mount_images, ... https://github.com/falcosecurity/falco/blob/dev/rules/falco_rules.yaml
Falco internals •The audited information mainly comes from a kernel module •sysdig (~0.6), falco-probe (0.6~) •> The kernel modules are actually built from the same source code •eBPF can now be used for part of this as well •https://sysdig.com/blog/sysdig-and-falco-now-powered-by-ebpf/
eBPF?
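“Berkeley Packet Filter” •Originally a paper on a packet-filtering technique (classic BPF, 1993) •Served as the core of tcpdump •Beyond packet filtering: it came to be used by seccomp as well •Major changes starting with Linux 3.14 (2014) brought it close to its current form (extended BPF) “Introduction to Berkeley Packet Filter (BPF) (1)” https://www.atmarkit.co.jp/ait/articles/1811/21/news010.html http://www.tcpdump.org/papers/bpf-usenix93.pdf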
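eBPF overview •Build BPF bytecode (it can be produced in various ways) •The kernel verifies it and JIT-compiles it as needed •The program collects kernel events •There are in-kernel aggregation structures called BPF maps (very fast)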
From: https://www.atmarkit.co.jp/ait/articles/1811/21/news010_2.html
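The pipeline on this slide (bytecode generation, in-kernel verification and JIT, event collection, aggregation in a BPF map) can be sketched with BCC's Python bindings. This is a minimal illustration, not code from the talk. It assumes BCC is installed and that `main()` is called as root. The map name `counts` and the `syscalls:sys_enter_execve` tracepoint are arbitrary choices.

```python
import time

# Restricted C; BCC compiles it to BPF bytecode with LLVM. On load, the
# kernel verifier checks it and the JIT compiles it if JIT is enabled.
BPF_SOURCE = r"""
struct key_t {
    char comm[16];  /* TASK_COMM_LEN */
};

/* In-kernel aggregation: process name -> number of execve calls. */
BPF_HASH(counts, struct key_t, u64);

TRACEPOINT_PROBE(syscalls, sys_enter_execve) {
    struct key_t key = {};
    bpf_get_current_comm(&key.comm, sizeof(key.comm));
    counts.increment(key);
    return 0;
}
"""


def main(interval_s: float = 5.0) -> None:
    from bcc import BPF  # imported lazily: needs BCC installed and root

    b = BPF(text=BPF_SOURCE)  # TRACEPOINT_PROBE is attached automatically
    time.sleep(interval_s)
    # Only the aggregated map is read from user space, not every event.
    for key, value in sorted(b["counts"].items(), key=lambda kv: -kv[1].value):
        print(f"{key.comm.decode(errors='replace'):16} {value.value}")
```

Only the per-name totals leave the kernel when the map is read. This is the in-kernel aggregation the slide refers to.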
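Tools •bpftrace(8) - a general-purpose tracer that uses eBPF internally •You describe what to trace in a script language that closely resembles DTrace's •BCC - a library for building programs that wrap eBPF features •Python, Lua, C++ •Ruby implementation - RbBCC (my own)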
Existing Linux tracers
Tool | Ability | Key syscall | Invasivity
gdb | Step through a program, stop it on signals, etc. | ptrace(2) | Large
strace | Trace system calls | ptrace(2) | Large
perf | Aggregate and visualize performance counters, etc. | perf_event_open(2) | Medium
bpftrace/BCC | Aggregate and visualize any kernel event | bpf(2) | Smaller
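Comparison to gdb/strace •For both gdb and strace, the key system call is ptrace(2) •By design, the target program has to be stopped •Precisely because it is stopped, you can reach deeper into the program's behavior, e.g. by rewriting registers “Introduction to the ptrace system call”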
https://itchyny.hatenablog.com/entry/2017/07/31/090000
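Comparison to perf •perf can obtain much of the same information as eBPF, such as tracepoints •However, aggregation (e.g. calling perf_event_open(2) per probe and aggregating in userland) has non-negligible overhead: the “observer effect” •eBPF can filter and aggregate (eBPF maps) inside the kernel, which makes it close to DTrace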
eBPF and Kernel events
eBPF event source http://www.brendangregg.com/blog/2019-07-15/bpf-performance-tools-book.html
Important sources for tracing •perf, ftrace and eBPF use the same event sources “How perf and ftrace work”
http://mmi.hatenablog.com/entry/2018/03/04/052249
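tracepoint •The Linux kernel has built-in hook points for tracing various events that happen inside it •These are called tracepoints. Example: events fired when the kernel's random-number facility is used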
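To see which tracepoints a kernel actually offers, you can read the `available_events` file in tracefs. It lists one `subsystem:event` pair per line. The small sketch below assumes tracefs is mounted at `/sys/kernel/tracing` or under debugfs at `/sys/kernel/debug/tracing`; reading it normally requires root. The helper names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

TRACEFS_CANDIDATES = (Path("/sys/kernel/tracing"), Path("/sys/kernel/debug/tracing"))


def parse_available_events(text: str) -> dict[str, list[str]]:
    """Group 'subsystem:event' lines from available_events by subsystem."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        subsystem, event = line.split(":", 1)
        groups[subsystem].append(event)
    return {name: sorted(events) for name, events in groups.items()}


def list_tracepoints(subsystem: str) -> list[str]:
    """Return the tracepoints of one subsystem, e.g. 'random' (usually needs root)."""
    for root in TRACEFS_CANDIDATES:
        path = root / "available_events"
        if path.exists():
            return parse_available_events(path.read_text()).get(subsystem, [])
    raise FileNotFoundError("tracefs does not seem to be mounted")
```

For example, `list_tracepoints("random")` returns the random-number subsystem's events on kernels that define them.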
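kprobe •tracepoints basically only let you trace the hook points that kernel developers prepared in advance •When you want to trace calls to a specific kernel function yourself, use kprobe. Note that kernel functions differ between kernel versions and architectures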
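uprobe •Lets the kernel follow the behavior of user-space programs •uprobe events must be registered per binary (more precisely, per inode of the executable file) •For example, you register a function that is visible as a symbol in the binary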
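Here is a uprobe sketch with BCC. It assumes BCC is installed and `main()` runs as root. `attach_uprobe(name=..., sym=...)` resolves the symbol inside the given binary or library (here libc, written as `"c"`). The probe is therefore tied to that file rather than to one process. Counting `strlen` calls per PID is only an illustrative choice.

```python
import time

BPF_SOURCE = r"""
#include <uapi/linux/ptrace.h>

/* Count calls per process ID (the tgid, i.e. the user-visible PID). */
BPF_HASH(calls, u32, u64);

int count_call(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    calls.increment(pid);
    return 0;
}
"""


def main(library: str = "c", symbol: str = "strlen", interval_s: float = 5.0) -> None:
    from bcc import BPF  # requires BCC and root

    b = BPF(text=BPF_SOURCE)
    # Registered against the library file, so every process running it is traced.
    b.attach_uprobe(name=library, sym=symbol, fn_name="count_call")
    time.sleep(interval_s)
    for pid, count in sorted(b["calls"].items(), key=lambda kv: -kv[1].value):
        print(f"pid={pid.value} {symbol} calls={count.value}")
```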
USDT •User Statically Defined Tracepoint •Place probes at arbitrary points in a user program and use them with little overhead (under the hood, they appear to be implemented as uprobes)
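USDT probes can be enabled from BCC through its `USDT` class. This sketch rests on stated assumptions. The process `pid` must contain a USDT probe. The probe name `my_probe` and its integer first argument are hypothetical placeholders, not probes from any real program. It also needs BCC and root.

```python
import time

BPF_SOURCE = r"""
#include <uapi/linux/ptrace.h>

/* Count probe hits, keyed by the probe's first argument. */
BPF_HASH(hits, u64, u64);

int on_probe(struct pt_regs *ctx) {
    u64 arg1 = 0;
    bpf_usdt_readarg(1, ctx, &arg1);  /* USDT arguments are 1-indexed */
    hits.increment(arg1);
    return 0;
}
"""


def main(pid: int, probe: str = "my_probe", interval_s: float = 5.0) -> None:
    from bcc import BPF, USDT  # requires BCC and root

    usdt = USDT(pid=pid)  # locate the USDT notes in the target process
    usdt.enable_probe(probe=probe, fn_name="on_probe")  # hypothetical probe name
    b = BPF(text=BPF_SOURCE, usdt_contexts=[usdt])
    time.sleep(interval_s)
    for arg, count in b["hits"].items():
        print(f"arg1={arg.value} hits={count.value}")
```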
Others •Hardware/software counters, like those perf uses, can also be handled from eBPF •The bpftrace manual lists hardware, software and memory watchpoint providers
“Raw” usage of tracefs •Through tracefs, kernel tracing is possible even without eBPF (the same interface is visible from debugfs; it exposes only more limited features) “Kernel tracing for myself, part 1” https://udzura.hatenablog.jp/entry/2019/09/02/174801
echo "p:myprobe1 $sym" >> \
  /sys/kernel/debug/tracing/kprobe_events
“Preparing for container debugging with ftrace” https://speakerdeck.com/kentatada/container-debug-using-ftrace
A trace of ping calling connect
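The `echo` into `kprobe_events` shown on the tracefs slide can also be driven from a script. This minimal Python sketch assumes root privileges and tracefs at `/sys/kernel/debug/tracing`, as on the slide; the helper names are hypothetical. Definitions follow the kernel's kprobe_events syntax: `p:EVENT SYMBOL` for an entry probe, `r:EVENT SYMBOL` for a return probe, and `-:EVENT` to delete. New events appear under `events/kprobes/` by default.

```python
from pathlib import Path

TRACING = Path("/sys/kernel/debug/tracing")


def kprobe_definition(event: str, symbol: str, retprobe: bool = False) -> str:
    """Build one kprobe_events line, e.g. 'p:myprobe1 do_sys_open'."""
    kind = "r" if retprobe else "p"
    return f"{kind}:{event} {symbol}"


def add_kprobe(event: str, symbol: str, retprobe: bool = False) -> None:
    """Register the probe and enable it (requires root)."""
    # Open in append mode: truncating kprobe_events would clear existing probes.
    with open(TRACING / "kprobe_events", "a") as f:
        f.write(kprobe_definition(event, symbol, retprobe) + "\n")
    (TRACING / "events" / "kprobes" / event / "enable").write_text("1\n")


def remove_kprobe(event: str) -> None:
    """Disable the probe, then delete it with a '-:EVENT' line."""
    (TRACING / "events" / "kprobes" / event / "enable").write_text("0\n")
    with open(TRACING / "kprobe_events", "a") as f:
        f.write(f"-:{event}\n")
```

For instance, calling `add_kprobe("myprobe1", "do_sys_open")` and then reading `trace_pipe` shows each hit. `do_sys_open` is just an example symbol and may not exist on every kernel version.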
OK, what does this give us for containers?
eBPF use cases •Debugging the HOST Linux itself •Syscalls or kernel functions around containers •Runtime performance •bpftrace results to Prometheus for monitoring •Tracing events per container •cgroup v2 with eBPF •Tracee by Aqua Security
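Tracing the kernel under containers •Containers use many kernel features, so eBPF lets you debug and measure those kernel features themselves •Example: `ip netns add/del` •Internally it calls the kernel functions copy_net_ns/cleanup_net •Depending on the kernel version, these in turn take locks internally, so we want to examine the performance impact → with eBPF!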
Demo (1)
Reference •“Linux Kernel: observing the state where rtnl_mutex is held for a long time” •https://hiboma.hatenadiary.jp/entry/2019/10/29/123455 •(Aside: thanks to hiboma, I learned how to use /proc/$pid/stack and wchan)
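You can measure how long `copy_net_ns` takes while running `ip netns add/del` with a kprobe/kretprobe pair in BCC. This sketch is an illustration, not the demo from the talk. It assumes BCC, root, and that `copy_net_ns` is probeable on the running kernel. Entry timestamps are stored per thread in a hash map, and latencies are aggregated into a log2 histogram inside the kernel.

```python
import time

BPF_SOURCE = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);   /* thread id -> entry timestamp (ns) */
BPF_HISTOGRAM(dist);         /* log2 histogram of latency in microseconds */

int trace_entry(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();  /* lower 32 bits = thread id */
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp == 0)
        return 0;  /* entry missed, e.g. probe attached mid-call */
    u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    dist.increment(bpf_log2l(delta_us));
    start.delete(&tid);
    return 0;
}
"""


def main(func: str = "copy_net_ns", interval_s: float = 30.0) -> None:
    from bcc import BPF  # requires BCC and root

    b = BPF(text=BPF_SOURCE)
    b.attach_kprobe(event=func, fn_name="trace_entry")
    b.attach_kretprobe(event=func, fn_name="trace_return")
    print(f"Tracing {func}; run `ip netns add/del` in another shell...")
    time.sleep(interval_s)
    b["dist"].print_log2_hist("usecs")
```

Calling `main("cleanup_net")` probes the cleanup side the same way.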
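Tracing the runtime •Measured the following with Haconiwa (my own container runtime) •Time from container runtime startup to execve •Time from container runtime startup until the container calls listen •A combination of USDT and tracepoints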
bpftrace script
bpftrace → Prometheus •I wrote a tool called bt2prom •It converts the JSON format bpftrace emits into a Prometheus-compatible format •Put the output as-is into the Textfile exporter's directory and it can be plotted •Intended to run periodically via cron or similar (think sar) “Format bpftrace JSON into prometheus-compat textfile” https://github.com/udzura/mruby-bin-bt2prom
A rough example tracking vfs_read
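The idea behind bt2prom (which is written in mruby) can be sketched in Python. This is not the tool's code. It assumes bpftrace's `-f json` output emits map dumps as lines like `{"type": "map", "data": {"@reads": {"bash": 10}}}`. The `bpftrace_` prefix, the `key` label name and the gauge type are arbitrary choices. The file is written atomically so the textfile collector never reads a half-written file.

```python
import json
import os
import re
import tempfile

_INVALID = re.compile(r"[^a-zA-Z0-9_:]")


def metric_name(map_name: str, prefix: str = "bpftrace_") -> str:
    """'@reads' -> 'bpftrace_reads', restricted to the Prometheus name charset."""
    return prefix + _INVALID.sub("_", map_name.lstrip("@"))


def escape_label(value: str) -> str:
    """Escape a label value per the Prometheus text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def convert(json_lines: str, label: str = "key") -> str:
    """Convert bpftrace JSON map records into Prometheus text format."""
    out: list[str] = []
    for line in json_lines.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("type") != "map":
            continue  # this sketch skips hist/stats/printf records
        for map_name, data in record["data"].items():
            name = metric_name(map_name)
            out.append(f"# TYPE {name} gauge")
            if isinstance(data, dict):  # keyed map, e.g. @reads[comm] = count
                for key, value in sorted(data.items()):
                    out.append(f'{name}{{{label}="{escape_label(str(key))}"}} {value}')
            else:  # scalar map, e.g. @count = 42
                out.append(f"{name} {data}")
    return "\n".join(out) + "\n"


def write_textfile(text: str, path: str) -> None:
    """Write to a temp file in the same directory, then rename atomically."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.replace(tmp, path)
```

When run from cron, the output file should end in `.prom` so that node_exporter's textfile collector picks it up. The collector ignores the `.tmp` file in the meantime.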
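cgroup v2 x eBPF •cgroup-specific BPF helpers tell you which cgroup the running thread belongs to (BPF_FUNC_get_current_cgroup_id and others) •They require a very new kernel... but they are handy •They make it easy to trace, per container, things like which files get opened •e.g. snooping on the files an Apache HTTPD container opens for each request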
Demo (2): I'm short on time, but please come and ask me about it!
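Here is a sketch of per-container file-open snooping with BCC. It assumes BCC, root, and a kernel that provides `bpf_get_current_cgroup_id()`, which appeared around Linux 4.18. To turn IDs into readable names, the hypothetical helper `cgroup_id_map` walks the cgroup v2 mount and keys each directory by its inode number. As far as I know, the cgroup ID equals that inode number on recent kernels (5.5 and later). Older kernels may mix a generation number into the ID, and in that case the raw ID is printed instead.

```python
import os
import time

BPF_SOURCE = r"""
struct key_t {
    u64 cgroup_id;
    char comm[16];
};

/* (cgroup id, process name) -> number of openat calls */
BPF_HASH(opens, struct key_t, u64);

TRACEPOINT_PROBE(syscalls, sys_enter_openat) {
    struct key_t key = {};
    key.cgroup_id = bpf_get_current_cgroup_id();
    bpf_get_current_comm(&key.comm, sizeof(key.comm));
    opens.increment(key);
    return 0;
}
"""


def cgroup_id_map(root: str = "/sys/fs/cgroup") -> dict[int, str]:
    """Map directory inode numbers under a cgroup v2 mount to cgroup paths."""
    mapping: dict[int, str] = {}
    for dirpath, _dirnames, _filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        mapping[os.stat(dirpath).st_ino] = "/" if rel == "." else "/" + rel
    return mapping


def main(interval_s: float = 10.0) -> None:
    from bcc import BPF  # requires BCC and root

    b = BPF(text=BPF_SOURCE)
    time.sleep(interval_s)
    paths = cgroup_id_map()
    for key, count in sorted(b["opens"].items(), key=lambda kv: -kv[1].value):
        cgroup = paths.get(key.cgroup_id, f"id={key.cgroup_id}")
        print(f"{cgroup:60} {key.comm.decode(errors='replace'):16} {count.value}")
```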
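Tracee •A container tracer implementation that uses eBPF throughout •Internally it resolves PID → namespace, etc. •bpftrace/BCC are general-purpose, so I look forward to its container-specific features https://blog.aquasec.com/ebpf-tracing-containers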
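Tracee's own implementation is not shown here, but the PID → namespace relationship it resolves can be illustrated from user space. Each entry in `/proc/<pid>/ns` is a symlink such as `pid:[4026531836]`. The number identifies the namespace, so two processes in the same container share it. This is a minimal sketch; reading other users' processes needs root.

```python
import os
import re

_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(link: str) -> tuple[str, int]:
    """'pid:[4026531836]' -> ('pid', 4026531836)."""
    m = _NS_LINK.match(link)
    if m is None:
        raise ValueError(f"unexpected namespace link: {link!r}")
    return m.group(1), int(m.group(2))


def namespaces_of(pid: int) -> dict[str, int]:
    """Return /proc/<pid>/ns entry name -> namespace inode number."""
    ns_dir = f"/proc/{pid}/ns"
    result: dict[str, int] = {}
    for entry in sorted(os.listdir(ns_dir)):
        try:
            _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        except (OSError, ValueError):
            continue  # not permitted, or an unexpected link format
        result[entry] = inode
    return result
```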
Conclusion
Happy publishing!
We’re moving to cgroup v2 •Moby's cgroup v2 support P/R (WIP) •systemd makes v2 the default (from 243)
What is new in cgroup v2 (Reprise) •Unified Hierarchy •cgroup-aware OOM Killer •nsdelegate and better cgroup namespace •PSI - Pressure Stall Information •BPF helpers for cgroup v2 (such as BPF_FUNC_get_current_cgroup_id, ...)
It should be “per-container”
Host-wide: •Load Average •Memory usage •psutils, top, vmstat... •netstat, iostat •syslog, auditd •perf
Per-Container: •cgroup stat •PSI (especially) •eBPF (per container) •USDT, syscalls... •sysdig/falco •perf --cgroup
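Per-container PSI, listed above, is exposed by cgroup v2 as `cpu.pressure`, `memory.pressure` and `io.pressure` in each cgroup directory. These files use the same `some`/`full` line format as `/proc/pressure/*`. The small parsing sketch below leaves the cgroup path to the caller, because it depends on the container runtime.

```python
from pathlib import Path


def parse_pressure(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI output, e.g. 'some avg10=0.00 avg60=0.00 avg300=0.00 total=0'."""
    result: dict[str, dict[str, float]] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, fields = parts[0], parts[1:]  # kind is 'some' or 'full'
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def container_pressure(cgroup_dir: str, resource: str = "cpu") -> dict[str, dict[str, float]]:
    """Read <cgroup_dir>/<resource>.pressure on cgroup v2 (cpu, memory or io)."""
    return parse_pressure((Path(cgroup_dir) / f"{resource}.pressure").read_text())
```

The avg fields are percentages of time stalled over 10, 60 and 300 seconds; `total` is the cumulative stall time in microseconds.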
Understand new features to use new tools in a better way