cafenero_777
June 19, 2023
Technology
#29 “I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era”
HotOS '19
https://dl.acm.org/doi/10.1145/3317550.3321422
Transcript
Research Paper Introduction #29 “I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era”
Cumulative #84, @cafenero_777, 2021/10/14
Agenda
• Target paper
• Overview and why I read it
  1. Introduction
  2. Kernel-Bypass Accelerators in the Datacenter
  3. Evolving the Datacenter OS for Kernel Bypass
  4. The Demikernel
  5. Future Work
  6. Related Work
  7. Conclusion
Target paper
• I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era
• Irene Zhang, Jing Liu, Amanda Austin, Michael Lowell Roberts, Anirudh Badam
• Microsoft Research, University of Wisconsin, University of Texas
• HotOS '19
• https://dl.acm.org/doi/10.1145/3317550.3321422
Overview and why I read it
• Overview
  • Is the OS heading for its “demise” in datacenter networking?!
  • RDMA/DPDK are fast, but they kill abstraction
  • Proposes a new I/O abstraction: Demikernel
• Why I read it
  • Which direction is network acceleration heading?
  • A near-future topic: library OS?
  • The title was catchy
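1. Introduction
• Server I/O speedups over the past 10 years vs. CPU performance
  • TCP offload, SmartNIC/SR-IOV, compression/encryption/ML on FPGA
• Kernel-bypass techniques reduce I/O overhead
  • They provide functionality but no abstraction layer
  • (e.g. socket, file, pipe)
• Distributed memory and distributed storage with RDMA
  • The system has to be customized to fit the HW -> hard!
• How should the OS change? The design is discussed through a new OS architecture, Demikernel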
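2. Kernel-Bypass Accelerators in the Datacenter
• Kernel bypass
  • Aims for the fastest possible packet forwarding with minimal kernel overhead
  • No common interface or OS functionality is provided
  • Programmers add OS-equivalent functionality themselves, device by device
• Examples:
  • DPDK: defines basic I/O device functions
  • Arrakis: implemented with HW virtualization (SR-IOV)
  • RDMA: there are specifications for the verbs interface and the rdmacm interface (~socket), but...
  • FPGA: can do anything, but practicality depends on the use case...
https://www.dpdk.org/wp-content/uploads/sites/35/2017/04/DPDK-India2017-RamiaJain-ArchitectureRoadmap.pdf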
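3. Evolving the Datacenter OS for Kernel Bypass
• Optimizing I/O processing in user space
  • Existing library OSes have mechanisms for sharing and multiplexing (but they are heavy)
  • There is no transparent memory allocation (i.e. DDIO, NIC<->LLC), so it has to be reimplemented
• Efficient abstractions
  • Designs left over from the days when I/O was slow vs. today: a Redis read takes 2 us
• Caught between abstraction and performance
  • Both keeping and changing the existing POSIX API incur overhead
• Differences from existing library OSes
  • Existing: assume that kernel interfaces and device features are uniform
  • This work: wants to bring in a kernel-bypass framework (a “best of both worlds” between HW and SW/kernel)
https://www.dpdk.org/wp-content/uploads/sites/35/2017/04/DPDK-India2017-RamiaJain-ArchitectureRoadmap.pdf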
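4. The Demikernel (1/3)
• Architecture: separate the control (C) path and the data (D) path
  • C: network/file open, allowed to be slow -> existing kernel
  • D: read/write on network/storage/memory -> LibOS + accelerator
• Abstract I/O as queues
  • HW normally uses queues -> abstract them as they are
  • Data can be handled as atomic data units (no unnecessary waiting occurs)
• A device-independent, high-level abstraction layer for kernel bypass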
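
The difference between a byte stream and a queue of atomic data units can be sketched as follows. This is a minimal in-memory illustration, not Demikernel code: `ByteStream` and `IOQueue` are hypothetical names, and no real network or accelerator is involved.

```python
from collections import deque


class ByteStream:
    """POSIX-style stream: a read returns whatever bytes have arrived so far."""

    def __init__(self):
        self._buf = bytearray()

    def write(self, data):
        self._buf += data

    def read(self, n):
        out = bytes(self._buf[:n])
        del self._buf[:n]
        return out


class IOQueue:
    """Queue abstraction (hypothetical): each pop yields one whole data unit."""

    def __init__(self):
        self._units = deque()

    def push(self, unit):
        self._units.append(bytes(unit))

    def pop(self):
        # None means "nothing complete yet"; a partial unit is never returned.
        return self._units.popleft() if self._units else None


# A request that arrives in two pieces on a stream is seen half-finished.
stream = ByteStream()
stream.write(b"GET k")
partial = stream.read(64)

# On the queue, the application only ever sees the complete request.
queue = IOQueue()
queue.push(b"GET key1")
whole = queue.pop()
```

With the stream, the application has to buffer and re-parse until the request is complete; with the queue, a wake-up always corresponds to a usable unit of work.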
4. The Demikernel (2/3)
• Syscall interface
  • C: socket(): returns a queue descriptor (not a file descriptor)
  • C: filter() by packet type: assumes a BPF-like framework
  • C: merge(): merges I/O queues
  • C: sort(): reorders an I/O queue according to priority
  • C: map(): complex, P4-like packet processing looks implementable
  • D: push/pop
    • Specify the range to operate on
    • Non-blocking; fetch the result with wait_*()
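
The push/pop/wait flow listed above can be sketched with a toy, in-memory library OS. Everything here is an assumption for illustration: `ToyLibOS`, its method signatures, and the loopback behaviour (a push feeds the same queue a pop reads from) are not the paper's API, and `wait` raises instead of blocking.

```python
import itertools
from collections import deque


class ToyLibOS:
    """In-memory stand-in for a library OS with a queue-based I/O API."""

    def __init__(self):
        self._qds = itertools.count(1)
        self._qts = itertools.count(1)
        self._queues = {}  # queue descriptor -> deque of data units
        self._ops = {}     # qtoken -> ("push", None) or ("pop", qd)

    def socket(self):
        """Control path: create a queue and return its queue descriptor."""
        qd = next(self._qds)
        self._queues[qd] = deque()
        return qd

    def push(self, qd, unit):
        """Data path: enqueue one data unit and return a qtoken immediately."""
        self._queues[qd].append(bytes(unit))
        qt = next(self._qts)
        self._ops[qt] = ("push", None)
        return qt

    def pop(self, qd):
        """Data path: request one data unit and return a qtoken immediately."""
        qt = next(self._qts)
        self._ops[qt] = ("pop", qd)
        return qt

    def wait(self, qt):
        """Complete the operation named by qt; returns data for a pop, None for a push."""
        kind, qd = self._ops.pop(qt)
        if kind == "push":
            return None
        queue = self._queues[qd]
        if not queue:
            # A real implementation would block; the toy re-registers the op and raises.
            self._ops[qt] = (kind, qd)
            raise BlockingIOError("no data unit available yet")
        return queue.popleft()


libos = ToyLibOS()
qd = libos.socket()
pop_token = libos.pop(qd)                        # issued before any data exists
push_token = libos.push(qd, b"PUT key1 value1")
libos.wait(push_token)
message = libos.wait(pop_token)                  # the data comes back from wait itself
```

Each operation gets its own token, so the caller waits on a specific push or pop rather than on a descriptor that merely became ready.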
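4. The Demikernel (3/3)
• qtoken: unique to each individual queue operation
• Can improve on epoll
  • wait_*() returns the data itself -> no need to call another syscall afterwards (and no wasted wake-ups)
  • pop completion: the thread wakes up when the pop completes; busy polling is not needed
• Zero copy:
  • 1. Transparent memory allocation: the LibOS registers memory with the IOMMU
  • 2. Minimize the coordination of shared memory between the application and the I/O device
  • Free protection: the application issues a buffer free -> the LibOS waits until the I/O finishes before releasing it
  • (As before) no write protection -> the application must wait for the I/O before modifying (writing) the buffer
  • The paper argues this is not a problem for datacenter use. Example: Redis allocates a buffer for each put request and then points to it from its data structure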
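
Free protection as described above amounts to deferring the release of a buffer while I/O is still using it. The following is a minimal sketch under that reading; `Buffer`, `ToyBufferManager`, and the `start_io`/`complete_io` hooks are hypothetical names, and, as on the slide, nothing here protects the buffer against writes.

```python
class Buffer:
    """Application buffer tracked by the (hypothetical) library OS."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.inflight = 0            # I/O operations still using this buffer
        self.free_requested = False
        self.released = False


class ToyBufferManager:
    """Defers release of a buffer until all in-flight I/O on it has completed."""

    def start_io(self, buf):
        if buf.released:
            raise ValueError("buffer already released")
        buf.inflight += 1

    def complete_io(self, buf):
        buf.inflight -= 1
        self._release_if_idle(buf)

    def free(self, buf):
        # The application may call this at any time, even in the middle of an I/O.
        buf.free_requested = True
        self._release_if_idle(buf)

    @staticmethod
    def _release_if_idle(buf):
        if buf.free_requested and buf.inflight == 0:
            buf.released = True


manager = ToyBufferManager()
buf = Buffer(64)
manager.start_io(buf)                # e.g. a zero-copy push is using the buffer
manager.free(buf)                    # application is done with it
released_during_io = buf.released    # still held by the manager
manager.complete_io(buf)
released_after_io = buf.released     # released once the I/O finished
```

The application can free right after issuing the push, which is what makes zero-copy usable without tracking completion by hand.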
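5. Future Work
• OS Design
  • The LibOS fills in functionality that a given accelerator lacks (for DPDK, the entire network stack)
  • With many kinds of accelerators, the LibOS ends up carrying code for all of them (!). What is a LibOS, then...?
• Network Protocols
  • Aim for a more general way of splitting I/O into data units
  • With existing frameworks (TCP, HTTPS, etc.) the receiver can reconstruct the units, but generality is limited
• File System and Storage
  • Using an existing FS (ext4, etc.) in a LibOS (single application) carries too much overhead
  • A file system suited to accelerators?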
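6. Related Work
• OS
  • A higher-level interface than Arrakis and IX
  • User-level OS extensions hide the HW -> no OS functionality such as a network stack
• I/O Accelerated System
  • Keeping the POSIX interface makes them inefficient. Example: mTCP has higher latency than DPDK
  • Do network/TCP processing in the PMD/NIC, or move control out of the OS (e.g. QUIC)
• I/O Accelerated Application
  • Low-latency applications that use remote memory over RDMA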
7. Conclusion (three-line summary)
• Kernel-bypass accelerators are used to keep up with the large gains in I/O performance
• Because of kernel bypass, OS/kernel functionality is unavailable and I/O cannot be abstracted
• The paper designs a LibOS that can fill this gap and discusses I/O abstractions
EoP