#29 “I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era”

cafenero_777

June 19, 2023

Transcript

  1. Research Paper Introduction #29 “I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era” (overall #84) @cafenero_777 2021/10/14
  2. Agenda
    • Target paper
    • Overview and why I chose to read it
    1. Introduction
    2. Kernel-Bypass Accelerators in the Datacenter
    3. Evolving the Datacenter OS for Kernel Bypass
    4. The Demikernel
    5. Future Work
    6. Related Work
    7. Conclusion
  3. Target paper
    • I’m Not Dead Yet! The Role of the Operating System in a Kernel-Bypass Era
    • Irene Zhang, Jing Liu, Amanda Austin, Michael Lowell Roberts, Anirudh Badam
    • Microsoft Research, University of Wisconsin, University of Texas
    • HotOS '19
    • https://dl.acm.org/doi/10.1145/3317550.3321422
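  4. Overview and why I chose to read it
    • Overview
      • Is the OS facing its "demise" in datacenter networking?!
      • RDMA and DPDK are fast, but they kill abstraction
      • Proposes a new I/O abstraction: the Demikernel
    • Why I chose to read it
      • Which direction is network acceleration heading?
      • A near-future topic: library OS?
      • The title was catchy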
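  5. 1. Introduction
    • Server I/O speedups over the past 10 years vs. CPU performance
      • TCP offload, SmartNIC/SR-IOV, compression/encryption/ML on FPGA
    • Kernel-bypass techniques cut I/O overhead
      • They provide functionality, but no abstraction layer
      • (e.g. socket, file, pipe)
    • Distributed memory and distributed storage with RDMA
      • The system has to be customized to fit the HW -> a lot of work!
    • How should the OS change? The paper discusses the design through a new OS architecture, the Demikernel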
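  6. 2. Kernel-Bypass Accelerators in the Datacenter
    • Kernel bypass
      • Aims for the fastest possible packet forwarding with minimal kernel overhead
      • No OS interfaces or features are provided
      • Programmers must add OS-equivalent functionality themselves, and again for each device
    • Examples:
      • DPDK: defines basic I/O device functions
      • Arrakis: implemented with HW virtualization (SR-IOV, etc.)
      • RDMA: the verbs and rdmacm (socket-like) interfaces are specified, but...
      • FPGA: can do anything, but practicality depends on the use case...
    https://www.dpdk.org/wp-content/uploads/sites/35/2017/04/DPDK-India2017-RamiaJain-ArchitectureRoadmap.pdf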
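  7. 3. Evolving the Datacenter OS for Kernel Bypass
    • Optimizing I/O processing in user space
      • Existing library OSes have sharing and multiplexing mechanisms (but they are heavyweight)
      • Transparent memory allocation (i.e. DDIO, NIC<->LLC) and the like are missing, so they must be reimplemented
    • Efficient abstractions
      • Designs left over from when I/O took a long time vs. today: a Redis read takes 2 us
    • Caught between abstraction and performance
      • Keeping the existing POSIX API adds even more overhead
    • How this differs from existing library OSes
      • Existing: assume that kernel interfaces and device features are uniform
      • This work: wants a kernel-bypass framework (a "best of both worlds" between HW and SW/kernel)
    https://www.dpdk.org/wp-content/uploads/sites/35/2017/04/DPDK-India2017-RamiaJain-ArchitectureRoadmap.pdf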
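  8. 4. The Demikernel (1/3)
    • Architecture: separate the control (C) and data (D) paths
      • C: network/file open, which can afford to be slow -> existing kernel
      • D: read/write to network/storage/memory -> LibOS + accelerator
    • Abstract I/O as queues
      • HW normally uses queues -> abstract them as they are
      • Each element can be handled as an atomic data unit (no extra waiting occurs)
    • A device-independent, high-level kernel-bypass abstraction layer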
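The "atomic data unit" point on this slide can be illustrated with a small sketch. It is not Demikernel code: `IOQueue` is a hypothetical in-memory stand-in for a device-backed queue, used only to show that a pop returns one whole pushed unit or nothing, never a partial one (unlike a byte stream).

```python
from collections import deque


class IOQueue:
    """Hypothetical in-memory stand-in for a device-backed I/O queue."""

    def __init__(self):
        self._units = deque()

    def push(self, unit: bytes) -> None:
        # Each push enqueues one whole, indivisible data unit.
        self._units.append(bytes(unit))

    def pop(self):
        # Return one complete unit, or None if the queue is empty.
        # A partial unit is never returned.
        return self._units.popleft() if self._units else None


q = IOQueue()
q.push(b"SET k1 v1")
q.push(b"GET k1")

first = q.pop()    # the first request, intact
second = q.pop()   # the second request, intact
empty = q.pop()    # nothing left in the queue
```

Because unit boundaries survive the queue, the consumer does not need to loop on reads to reassemble a request.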
  9. 4. The Demikernel (2/3)
    • Syscall interface
      • C: socket(): returns a queue descriptor (not a file descriptor)
      • C: filter() by packet type, etc.: assumes a BPF-like framework
      • C: merge(): merges I/O queues
      • C: sort(): uses I/O queues according to priority
      • C: map(): looks like it could implement complex, P4-style packet processing
      • D: push/pop
        • Specify the range to operate on
        • Non-blocking; fetch the result with wait_*()
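To make the call flow on this slide concrete, here is a toy model, assuming a single-threaded, in-memory libOS. `ToyLibOS` and its method bodies are hypothetical. Only the names socket, push, pop and wait mirror the slide, and their real signatures are not given there. filter(), merge(), sort() and map() are left out.

```python
from collections import deque
from itertools import count


class ToyLibOS:
    """Hypothetical in-memory model of the queue-descriptor call flow."""

    def __init__(self):
        self._queues = {}          # queue descriptor -> deque of data units
        self._pending = {}         # qtoken -> (operation, qd, payload)
        self._next_qd = count(1)
        self._next_qt = count(100)

    def socket(self) -> int:
        # Control path: returns a queue descriptor, not a file descriptor.
        qd = next(self._next_qd)
        self._queues[qd] = deque()
        return qd

    def push(self, qd: int, data: bytes) -> int:
        # Data path: non-blocking; only records the request.
        qt = next(self._next_qt)
        self._pending[qt] = ("push", qd, data)
        return qt

    def pop(self, qd: int) -> int:
        # Data path: non-blocking; the data is fetched later via wait().
        qt = next(self._next_qt)
        self._pending[qt] = ("pop", qd, None)
        return qt

    def wait(self, qt: int):
        # Completes the operation named by the token. A real wait would
        # block on an empty queue; this toy returns None instead.
        op, qd, data = self._pending.pop(qt)
        if op == "push":
            self._queues[qd].append(data)
            return None
        return self._queues[qd].popleft() if self._queues[qd] else None


libos = ToyLibOS()
qd = libos.socket()
t_push = libos.push(qd, b"hello")
t_pop = libos.pop(qd)
libos.wait(t_push)
received = libos.wait(t_pop)   # wait hands back the popped data directly
```

In this sketch every push or pop returns a distinct token immediately, and the payload only moves when the matching wait is called.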
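  10. 4. The Demikernel (3/3)
    • qtoken: unique to each individual queue operation
    • Can improve on epoll
      • wait_*() returns the data directly -> no need to call another syscall (which might come up empty)
      • pop completion: the thread wakes up once the pop completes; no busy polling needed
    • Zero copy:
      • 1. Transparent memory allocation: the LibOS performs IOMMU memory registration
      • 2. Reduce coordination of shared memory between the application and I/O devices as much as possible
    • Free protection: when the application frees a buffer -> the LibOS waits until the I/O finishes before releasing it
      • (As before) there is no write protection -> modifying (writing to) a buffer requires waiting for the I/O
      • The authors argue this is not a problem for datacenter use. E.g. Redis allocates a buffer for each put request and then stores a pointer to it in its data structure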
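The free-protection rule above can be sketched as simple per-buffer bookkeeping. `ToyBufferManager`, its methods and the buffer id `"buf-a"` are hypothetical names chosen for illustration; the slide does not describe how the LibOS actually tracks in-flight I/O.

```python
class ToyBufferManager:
    """Hypothetical libOS-side tracker that defers frees until I/O completes."""

    def __init__(self):
        self._inflight = {}           # buffer id -> number of in-flight I/Os
        self._free_requested = set()  # buffers the app freed while I/O was pending
        self.released = []            # buffers actually released, in order

    def start_io(self, buf_id: str) -> None:
        self._inflight[buf_id] = self._inflight.get(buf_id, 0) + 1

    def app_free(self, buf_id: str) -> None:
        if self._inflight.get(buf_id, 0) > 0:
            # I/O still uses the buffer: remember the request, release later.
            self._free_requested.add(buf_id)
        else:
            self.released.append(buf_id)

    def complete_io(self, buf_id: str) -> None:
        self._inflight[buf_id] -= 1
        if self._inflight[buf_id] == 0:
            del self._inflight[buf_id]
            if buf_id in self._free_requested:
                self._free_requested.discard(buf_id)
                self.released.append(buf_id)


mgr = ToyBufferManager()
mgr.start_io("buf-a")
mgr.app_free("buf-a")                  # application frees while I/O is pending
released_before = list(mgr.released)   # still empty: the release is deferred
mgr.complete_io("buf-a")
released_after = list(mgr.released)    # now the buffer has been released
```

The sketch covers frees only. As the slide notes, writes to a buffer with pending I/O are not protected, so the application still has to wait before modifying it.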
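  11. 5. Future Work
    • OS Design
      • The LibOS fills in features that a given accelerator lacks (for DPDK, the entire network stack)
      • With many kinds of accelerators, the LibOS ends up carrying code for all of them (!) So what is a LibOS...?
    • Network Protocols
      • Aim for a more general way of splitting I/O into data units
      • With existing frameworks (TCP, HTTPS, etc.) the receiver can reassemble the units, but generality is limited
    • File System and Storage
      • Using an existing FS (ext4, etc.) in a LibOS (single application) carries too much overhead
      • An FS suited to accelerators?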
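  12. 6. Related Work
    • OS
      • A higher-level interface than Arrakis and IX
      • User-level OS extensions hide the HW -> but offer no OS functionality such as a network stack
    • I/O Accelerated System
      • Keeping the POSIX interface makes things inefficient. E.g. mTCP has higher latency than DPDK
      • Move network/TCP processing into the PMD/NIC, or take congestion control out of the OS (QUIC, etc.)
    • I/O Accelerated Application
      • Low-latency applications that access remote memory via RDMA, etc.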