$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
#29 “I’m Not Dead Yet! The Role of the Operatin...
Search
cafenero_777
June 19, 2023
Technology
0
140
#29 “I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era”
HotOS '19
https://dl.acm.org/doi/10.1145/3317550.3321422
cafenero_777
June 19, 2023
Tweet
Share
More Decks by cafenero_777
See All by cafenero_777
#51 “Empowering Azure Storage with RDMA”
cafenero_777
3
510
#49 “Gray Failure: The Achilles’ Heel of Cloud-Scale Systems”
cafenero_777
2
120
#50 “Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction”
cafenero_777
0
140
#33 “Destroying networks for fun (and profit)”
cafenero_777
0
98
#34 “MTPSA: Multi-Tenant Programmable Switches”
cafenero_777
0
71
#37 “Bluebird: High-performance SDN for Bare-metal Cloud Services”
cafenero_777
1
140
#39 “Profiling a warehouse-scale computer”
cafenero_777
0
54
#23 “VFP: A Virtual Switch Platform for Host SDN in the Public Cloud”
cafenero_777
0
250
#24 “Ananta: Cloud Scale Load Balancing”
cafenero_777
0
300
Other Decks in Technology
See All in Technology
事業部のプロジェクト進行と開発チームの改善の “時間軸" のすり合わせ
konifar
9
2.8k
mablでリグレッションテストをデイリー実行するまで #mablExperience
bengo4com
0
470
安いGPUレンタルサービスについて
aratako
1
1.6k
Oracle Cloud Infrastructure:2025年11月度サービス・アップデート
oracle4engineer
PRO
1
110
Databricksによるエージェント構築
taka_aki
1
120
“決まらない”NSM設計への処方箋 〜ビットキーにおける現実的な指標デザイン事例〜 / A Prescription for "Stuck" NSM Design: Bitkey’s Practical Case Study
bitkey
PRO
1
340
オープンデータの内製化から分かったGISデータを巡る行政の課題
naokim84
2
1.3k
Capture Checking / Separation Checking 入門
tanishiking
0
110
HIG学習用スライド
yuukiw00w
0
110
Uncertainty in the LLM era - Science, more than scale
gaelvaroquaux
0
470
モバイルゲーム開発におけるエージェント技術活用への試行錯誤 ~開発効率化へのアプローチの紹介と未来に向けた展望~
qualiarts
0
280
なぜ使われないのか?──定量×定性で見極める本当のボトルネック
kakehashi
PRO
1
760
Featured
See All Featured
Testing 201, or: Great Expectations
jmmastey
46
7.8k
Faster Mobile Websites
deanohume
310
31k
Embracing the Ebb and Flow
colly
88
4.9k
The World Runs on Bad Software
bkeepers
PRO
72
12k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
16
1.8k
Building Better People: How to give real-time feedback that sticks.
wjessup
370
20k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
31
2.7k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.3k
The Straight Up "How To Draw Better" Workshop
denniskardys
239
140k
For a Future-Friendly Web
brad_frost
180
10k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
46
7.8k
Transcript
Research Paper Introduction #29 “I’m Not Dead Yet! The Role
of the Operating System in a Kernel-Bypass Era” ௨ࢉ#84 @cafenero_777 2021/10/14 1
Agenda • ରจ • ֓ཁͱಡ͏ͱͨ͠ཧ༝ 1. Introduction 2. Kernel-Bypass Accelerators
in the Datacenter 3. Evolving the Datacenter OS for Kernel Bypass 4. The Demikernel 5. Future Work 6. Related Work 7. CONCLUSION 2
ରจ • I’m Not Dead Yet! The Role of the
Operating System in a Kernel-Bypass Era • Irene Zhang, Jing Liu, Amanda Austin, Michael Lowell Roberts, Anirudh Badam • Microsoft Research, University of Wisconsin, University of Texas • HotOS '19 • https://dl.acm.org/doi/10.1145/3317550.3321422 3
֓ཁͱಡ͏ͱͨ͠ཧ༝ • ֓ཁ • DCNW༻్ͰͷOS”ऴᖼ (demise)”͍ͯ͠Δʁʂ • RDMA/DPDKߴ͕ͩநԽΛࡴ͢ • ৽͍͠I/OநԽ:
DemikernelͷఏҊ • ಡ͏ͱͨ͠ཧ༝ • NWߴԽͷͲͷํʁ • ۙະདྷͷ: library OS? • ΩϟονʔͳtitleͩͬͨͷͰ 4
1. Introduction 5 • աڈ10ͷαʔόI/OߴԽ V.S. CPUੑೳ • TCP-o ff
l oad, SmartNIC/SR-IOV, Comp./Enc./ML on FPGA • kernel bypassٕज़ͰI/OΦʔόʔϔουΛݮ • ػೳఏڙ͢Δ͕ɺநԽϨΠϠʔ͕ແ͍ • ʢྫɿsocket, fi le, pipeʣ • ࢄϝϞϦɾࢄετϨʔδ w/ RDMA • systemΛHWʹ߹ΘͤͯΧελϚΠζ -> େมʂ • OSΛͲ͏ม͑Δ͖͔ɻ৽OSΞʔΩςΫνϟDemikernelͰઃܭٞ͠
2. Kernel-Bypass Accelerators in the Datacenter 6 • Kernel Bypass
• KernelΦʔόʔϔουۃখͰ࠷ͷύέοτసૹΛࢦ͢ • I/Fػೳଘࡏ͠ͳ͍ • ϓϩάϥϚ͕OSಉͷػೳՃɾσόΠεຖʹػೳՃ • ྫɿ • DPDK: جຊతͳI/OσόΠεػೳΛنఆ • Arrakis: HWԾԽٕज़(SR-IOV)Ͱ࣮ • RDMA: verbs I/Frdmacm I/F (~socket)༷͋Δ͕ɾɾɾ • FPGA: ԿͰͰ͖Δ͕࣮༻ੑuse case࣍ୈɾɾɾ https://www.dpdk.org/wp-content/uploads/sites/35/2017/04/DPDK-India2017-RamiaJain-ArchitectureRoadmap.pdf
3. Evolving the Datacenter OS for Kernel Bypass 7 •
UserۭؒͰͷIʗOॲཧ࠷దԽ • طଘLibrary OSʹڞ༗ɾଟॏԽͷΈ͕͋Δʢ͕ॏ͍ʣ • ಁաతϝϞϦ֬อʢi.e. DDIO, NIC<->LLCʣ͕ແ͍ͷͰ࠶࣮ • ޮతͳநԽ • I/Oॲཧ͕͔ͬͨࠒͷઃܭʢͷ໊ʣV.S. ݱɿRedisreadͰ2us • நԽͱੑೳͷڱؒ • طଘPOSIX APIҡ࣋ߋʹΦʔόʔϔου͕͔͔Δ • طଘLibrary OSͱͷػೳͷҧ͍ • طଘɿkernel I/FσόΠεػೳ͕ۉҰͰ͋Δલఏ • ࠓճɿKernel-Bypass framework (HWͱͷSW/kernelͷ”ྑ͍ͱ͜औΓ”తͳʣΛೖΕ͍ͨ https://www.dpdk.org/wp-content/uploads/sites/35/2017/04/DPDK-India2017-RamiaJain-ArchitectureRoadmap.pdf
4. The Demikernel (1/3) 8 • Architecture: C/D pathͷ •
C: network/ fi le open, ͯ͘ྑ͍ -> طଘKernel • D: network/storage/memoryͷread/write -> LibOS + accelerator • I/O queueͱͯ͠நԽ • HWී௨queueΛར༻ -> ͜ΕΛͦͷ··நԽ • atomic data unitͱͯ͠ѻ͑Δʢ༨ͳͪൃੜ͠ͳ͍ʣ • σόΠεʹґଘ͠ͳ͍ߴϨϕϧநԽKernel-BypassϨΠϠʔ
4. The Demikernel (2/3) 9 • Syscall interface • C:
socket(): queue descriptorΛฦ͢ (not fi le descriptor) • C: packet typeͰ fi lter(): BPF frameworkΛఆ • C: merge(): I/OΩϡʔͷϚʔδ • C: sort(): ༏ઌʹԠͯ͡I/OΩϡʔΛ͏ • C: map(): P4తͳෳࡶͳpktॲཧ࣮Ͱ͖ͦ͏ • D: push/pop • ૢ࡞ൣғͷࢦఆ • non-blockingॲཧ. wait_*()Ͱfetch
4. The Demikernel (3/3) 10 • qtoken: Ұͭͷqૢ࡞ຖʹݻ༗ • epollΛվળͰ͖Δ
• wait_*()͕σʔλΛฦ͢->ଞͷsyscallݺͣʹʢۭৼΓʣࡁΉ • pop completion: pop͕ྃͨ͠Βthread͕ى͖Δɻbusy pollingཁΒͳ͍ • zero copy: • 1. ಁաతϝϞϦ֬อɿLibOS͕IOMMUϝϞϦొΛߦ͏ • 2. ΞϓϦέʔγϣϯͱI/OσόΠεؒͰͷڞ༗ϝϞϦͷௐΛͳΔ͘ݮΒ͢ • Free protect: ΞϓϦόοϑΝ։์໋ྩ -> LibOS͕I/Oऴྃ·Ͱ͔ͬͯΒ։์ • ʢैདྷಉ༷ʣॻ͖ࠐΈอޢແ͠ -> όοϑΝมߋʢwriteʣI/Oͭඞཁ͋Γ • DCར༻Ͱແ͠ɺͱ͍͏ओுɻྫɿRedisͰput requestຖʹόοϑΝׂޙɺσʔλߏମͰͦͷϙΠϯλʹࢦఆ
5. Future Work • OS Design • ಛఆΞΫηϥϨʔλͷෆػೳΛLibOSͰิʢDPDKͳΒNWελοΫશൠʣ • ΞΫηϥϨʔλͷछྨ͕ଟ͍߹શ෦LibOS͕ίʔυΛ࣋ͭʢʂʣLibOSͱɾɾɾʁ
• Network Protocols • I/OͷΑΓ൚༻తͳdata unitׂΛࢦ͢ɻ • طଘͷϑϨʔϜϫʔΫʢTCPHTTPSͳͲʣͳΒड৴ଆͰ࠶ߏͰ͖Δ͕ɺ൚༻ੑ੍͕ݶ͞Εͯ͠·͏ • File System and Storage • طଘFS (ext4ͳͲ)ΛLibOS (γϯάϧΞϓϦέʔγϣϯ)Ͱ͏ʹΦʔόʔϔου͕େ͖͗͢ • ΞΫηϥϨʔλʹదͨ͠FS? 11
6. Related Work • OS • Arraakis, IXΑΓநͷߴ͍I/F • ϢʔβϨϕϧͷOS֦ுͰHWӅṭ
-> NWελοΫͳͲͷOSػೳແ͍ • I/O Accelerated System • POSIX I/Fʹҡ࣋ͰඇޮԽɻྫɿmTCPͩͱDPDKΑΓlatency͔͔Δ • NW/TCPॲཧΛPMD/NICͰΔɺ੍͘͠ޚΛOS͔Β֎͢ํʢQUICͳͲʣ • I/O Accelerated Application • RDMAΛͬͨϦϞʔτϝϞϦͷϨΠςϯγʔΞϓϦέγϣʔϯ 12
7. Conclusion • I/Oੑೳେ෯্ʹ͍ͭͨ͘ΊʹKernel-bypass acceleratorsΛ͏ • Kernel-bypassͷͨΊOS/kernelͷػೳ͕͑ͳ͍ɺI/OநԽ͕ग़དྷͳ͍ • ্هͷΪϟοϓΛຒΊΒΕΔLibOSΛઃܭ͠ɺI/OநԽΛٞ 13
ࡾߦ·ͱΊ 14 • I/Oੑೳେ෯্ʹ͍ͭͨ͘ΊʹKernel-bypass acceleratorsΛ͏ • Kernel-bypassͷͨΊOS/kernelͷػೳ͕͑ͳ͍ɺI/OநԽ͕ग़དྷͳ͍ • ্هͷΪϟοϓΛຒΊΒΕΔLibOSΛઃܭ͠ɺI/OநԽΛٞ
EoP 15