cafenero_777
June 22, 2023
#37 “Bluebird: High-performance SDN for Bare-metal Cloud Services”
NSDI 2022
https://www.usenix.org/conference/nsdi22/presentation/arumugam
Transcript
Research Paper Introduction #37 “Bluebird: High-performance SDN for Bare-metal Cloud Services”
Cumulative #101, @cafenero_777, 2022/06/09
Agenda
• Target paper
• Summary and why I read it
1. Introduction
2. Background
3. Design Goals and Rationale
4. System Design
5. Performance
6. Operationalization and Experiences
7. Related Work
8. Conclusions and Future Work
Target paper
• Bluebird: High-performance SDN for Bare-metal Cloud Services
  • Manikandan Arumugam, et al.
  • Arista, Intel, Microsoft
  • NSDI 2022
  • https://www.usenix.org/conference/nsdi22/presentation/arumugam
  • Previously introduced in the NSDI 2022 Recap session
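Bluebird: High-performance SDN for Bare-metal Cloud Services (Arista, Intel, Microsoft)
• Provides the virtual network for Azure bare-metal cloud services using P4 switches
• NetApp, Cray, SAP
• 100 Gbps, two years in operation
• Japanese-language explainer article
(Excerpt from the previous slide deck)
Summary and why I read it
• Summary
  • Connects the network for Azure bare-metal cloud services using P4 switches
  • Designed with availability in mind; achieves 100 Gb/s line rate at <1 µs latency
  • Shares experience from more than two years in production
• Why I read it
  • A P4 use case in the cloud
  • Interested in the challenges and how they were solved (design, etc.)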
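1. Introduction
• SDN: the data plane is implemented on the end-host (hypervisor) side
  • OvS, DPDK, ASIC, FPGA, SmartNIC
• Customers are considering moving their in-house systems to the cloud
  • They rely on (dedicated) appliances (NetApp, Cray, SAP, and HPC)
• An SDN stack cannot be installed on bare-metal cloud services / HWaaS!
• ToR-based SDN solution: Bluebird
  • Assumes Barefoot Tofino ToRs or SmartToRs
  • <1 µs, 100 Gbps, and millions of conntrack entries (e.g., for NAT)
  • Control plane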
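2. Background
• Everything is done in the hypervisor, so it is simple.
• Doing it in software is hard, and the agent consumes resources.
• Higher performance while keeping scalability and programmability.
• Not well suited to bare metal (too complex; would it require modifying VFP?).
• The ToR can do the complex work on behalf of the bare-metal servers.
• This time: VRFs (per-customer network partitioning) and per-VRF CA-PA mappings (VXLAN static routes).
• The various routing/tunneling functions are implemented in P4.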
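3. Design Goals and Rationale
1. Programmability: an SDN stack equivalent to VFP. Requirements change over time, but this must be maintained.
2. Scalability: ToR memory capacity is the bottleneck, so a cache system was developed.
3. Latency and Throughput: use a programmable ASIC.
4. High availability: Bluebird was designed with redundancy.
5. Multitenancy support: a mandatory functional requirement.
6. Minimal overhead on host resources: zero overhead; bare-metal performance is delivered as-is.
7. Seamless integration: achieved by Bluebird alone, without changes on the bare-metal side.
8. External network access: NAT is supported so that bare-metal servers can reach the internet.
9. Interoperability: works with the existing SDN stack and operates transparently.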
4. System Design (1/5): Packet flow
# Baremetal -> VM
• VLAN 400 -> VRF/VNI 20500
• The destination MAC is rewritten at the ToR
• VXLAN tunnel between ToR and VFP
# VM -> Baremetal
• VXLAN tunnel between VFP and ToR
• VRF/VNI 20500 -> VLAN 400
• The destination MAC is resolved at the ToR
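
The two directions of the packet flow can be sketched in a few lines of Python. This is only an illustration of the VLAN/VNI translation and CA-PA lookup described on the slide, not Bluebird's P4 code: the `Packet` type, the table names, and every address are hypothetical, and only the VLAN 400 / VNI 20500 pair comes from the slide. MAC rewriting is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tables; only VLAN 400 <-> VNI 20500 is taken from the slide.
VLAN_TO_VNI = {400: 20500}
VNI_TO_VLAN = {vni: vlan for vlan, vni in VLAN_TO_VNI.items()}

# Per-VNI CA -> PA mapping (customer address -> physical address of the host
# running VFP). The addresses are made-up documentation values.
CA_TO_PA = {20500: {"10.0.0.5": "192.0.2.10"}}


@dataclass
class Packet:
    dst_ip: str                       # customer address (CA) of the destination
    vlan: Optional[int] = None        # set on the bare-metal side
    vni: Optional[int] = None         # set inside the VXLAN tunnel
    outer_dst: Optional[str] = None   # underlay (PA) destination when encapsulated


def baremetal_to_vm(pkt: Packet) -> Packet:
    """VLAN-tagged packet from bare metal -> VXLAN packet toward VFP."""
    vni = VLAN_TO_VNI[pkt.vlan]
    pa = CA_TO_PA[vni][pkt.dst_ip]
    return Packet(dst_ip=pkt.dst_ip, vni=vni, outer_dst=pa)


def vm_to_baremetal(pkt: Packet) -> Packet:
    """VXLAN packet from VFP -> VLAN-tagged packet toward bare metal."""
    vlan = VNI_TO_VLAN[pkt.vni]
    return Packet(dst_ip=pkt.dst_ip, vlan=vlan)


if __name__ == "__main__":
    print(baremetal_to_vm(Packet(dst_ip="10.0.0.5", vlan=400)))
    print(vm_to_baremetal(Packet(dst_ip="10.0.1.7", vni=20500)))
```

The dictionary lookups stand in for the match-action tables on the switch; the number of entries in the `CA_TO_PA` equivalent is what the following slides try to maximize.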
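4. System Design (2/5): Overview
• Trade-off among device cost, memory (FIB), and NPU/ASIC features
  • Core router: expensive, large capacity, feature-rich
  • Bluebird: inexpensive, moderate capacity, feature-rich (custom-built)
  • Met NetApp's requirements (240 Gbps, <4 ms) with a 6.4 Tbps ToR
• The P4 pipeline design took effort
  • Goal: maximize the number of CA-PA mappings, which are represented in the VTEP (VXLAN Tunnel Endpoint) table
  • Shrank Tofino's IPv4/v6 unicast FIB and grew the VTEP table from 16K to 192K entries
  • Enough? No. It was enough at launch, but ...
  • By caching the mapping information, more than 192K entries can now be handled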
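4. System Design (3/5): P4 platform/pipeline
• Adopted Tofino-1
  • 6.4 Tbps, 12 stages, 256 x 25G SerDes, quad-core 2.2 GHz CPU, on the Arista 7170
  • Meets the 192K CA-to-PA mapping requirement
• P4 pipeline
  • With a straightforward implementation, the CA-to-PA table size cannot be secured when IPv6 is used in the underlay
  • A custom P4 pipeline solves this
• Switching the ToR profile switches to a different P4 program
• The destination MAC for BM -> VFP traffic is deployed as a static route on the BM side
• https://github.com/navybhatia/p4-vxlanencapdecap/blob/main/switch-vxlan.p4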
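4. System Design (4/5): Route cache
• The 192K CA-PA mapping limit started to look like a bottleneck
  • Option 1: use Tofino2 (1.5M CA-PA mappings)
  • Option 2: build a cache mechanism
    • Mappings that actually carry traffic are kept in hardware (Tofino) as much as possible
    • Mappings are aged per route with LRU and evicted to software (CPU)
• Scaled up to about 1M mappings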
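
A minimal sketch of the hardware/software split described above, assuming a plain LRU policy. The class and method names are hypothetical and this is not the authors' implementation: a small "hardware" table holds recently used mappings, everything else lives in a "software" table, and a lookup that hits in software promotes the entry and evicts the least recently used hardware entry.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TwoTierRouteCache:
    """Toy model: a small HW table backed by a large SW table, with LRU eviction."""

    def __init__(self, hw_capacity: int) -> None:
        if hw_capacity < 1:
            raise ValueError("hw_capacity must be at least 1")
        self.hw_capacity = hw_capacity
        # CA -> PA; ordered from least to most recently used.
        self.hw: "OrderedDict[str, str]" = OrderedDict()
        self.sw: Dict[str, str] = {}

    def add(self, ca: str, pa: str) -> None:
        # Assumption: new mappings start in software until traffic uses them.
        self.sw[ca] = pa

    def lookup(self, ca: str) -> Optional[str]:
        if ca in self.hw:
            self.hw.move_to_end(ca)      # mark as most recently used
            return self.hw[ca]
        if ca in self.sw:
            pa = self.sw.pop(ca)         # slow path hit: promote to hardware
            self._install_hw(ca, pa)
            return pa
        return None

    def _install_hw(self, ca: str, pa: str) -> None:
        if len(self.hw) >= self.hw_capacity:
            old_ca, old_pa = self.hw.popitem(last=False)  # evict the LRU entry
            self.sw[old_ca] = old_pa                      # move it back to software
        self.hw[ca] = pa


if __name__ == "__main__":
    cache = TwoTierRouteCache(hw_capacity=2)
    for name in ("a", "b", "c"):
        cache.add(name, "pa-" + name)
    for name in ("a", "b", "c"):
        cache.lookup(name)
    print(list(cache.hw), sorted(cache.sw))
```

With a hardware capacity of 2, looking up `a`, `b`, and `c` in turn leaves `b` and `c` in the hardware table and moves `a` back to software, which is the behavior the slide attributes to LRU aging.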
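4. System Design (5/5): Control plane and policy
• Provisioned from an external service (Bluebird Service, BBS)
• BBS: builds the goal state and pushes it
  • DAL: command sequence -> JSON-RPC -> EOS CLI
  • Computes the config difference against the target and reconciles it
  • Each configuration element is applied atomically, and the configuration is versioned
  • Keeps multiple ToRs consistent
• There is a BBS per AZ; a single BBS can also support multiple AZs.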
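
The goal-state reconciliation step can be illustrated as a diff between two dictionaries. This is a sketch under assumptions, not the BBS or DAL API: the `(vrf, ca) -> pa` shape, the function names, and the operation tuples are all hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]       # (VRF name, customer address)
Mapping = Dict[Key, str]    # key -> physical address
Op = Tuple[str, Key, str]   # ("add" | "update" | "delete", key, physical address)


def reconcile(goal: Mapping, current: Mapping) -> List[Op]:
    """Return the operations that turn `current` into `goal`."""
    ops: List[Op] = []
    # Entries present on the device but absent from the goal state are removed.
    for key in sorted(current.keys() - goal.keys()):
        ops.append(("delete", key, current[key]))
    # Entries missing or different on the device are added or updated.
    for key in sorted(goal.keys()):
        if key not in current:
            ops.append(("add", key, goal[key]))
        elif current[key] != goal[key]:
            ops.append(("update", key, goal[key]))
    return ops


def apply_ops(current: Mapping, ops: List[Op]) -> Mapping:
    """Apply the operations to a copy of `current` (stand-in for pushing config)."""
    result = dict(current)
    for op, key, value in ops:
        if op == "delete":
            result.pop(key, None)
        else:
            result[key] = value
    return result


if __name__ == "__main__":
    goal = {("vrf-a", "10.0.0.1"): "192.0.2.1", ("vrf-a", "10.0.0.2"): "192.0.2.9"}
    current = {("vrf-a", "10.0.0.2"): "192.0.2.2", ("vrf-a", "10.0.0.3"): "192.0.2.3"}
    for op in reconcile(goal, current):
        print(op)
```

Running `reconcile` again after the operations are applied returns an empty list, which makes the push idempotent and safe to retry after a failure.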
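5. Performance (1/3)
• SDN ToRs have been used in more than 42 Azure datacenters over the past two years
  • Thousands of bare-metal servers (including Cray ClusterStor and NetApp Files) are in operation
  • The route cache has not been triggered yet (it will probably kick in about a year from now)
  • 40 Gbps NIC, Xeon E5-2673 v4 (2.3 GHz) on Windows Server 2019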
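5. Performance (2/3)
• SDN ToR snake test
  • Nearly 100 Gbps at <1 µs
  • Fits bandwidth- and latency-sensitive BM workloads
  • Power efficiency is unchanged from existing ToRs
• Route cache latency
  • 8 µs of added delay
  • SFE forwarding delay plus the delay of moving entries from software to hardware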
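5. Performance (3/3)
• Route cache validation
  • In production data, only about 25% of mappings carry "active" traffic
  • The remaining 75% can be moved to software (CPU)
  • So more than 192K PA-CA entries can be used
  • Routes are classified into buckets by age
  • How aggressively entries are moved is tunable
(Figure: percentage of active mappings resident in hardware (Tofino))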
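
The age-bucket classification can be sketched as follows. The bucket edges, route names, and the `min_bucket` knob are hypothetical; the slide only says that routes are bucketed by age and that the aggressiveness of eviction is tunable.

```python
from bisect import bisect_right
from typing import Dict, List


def bucket_by_age(idle_seconds: Dict[str, float], edges: List[float]) -> Dict[int, List[str]]:
    """Group routes into buckets by idle time; bucket 0 holds the most recently used."""
    buckets: Dict[int, List[str]] = {i: [] for i in range(len(edges) + 1)}
    for route, idle in sorted(idle_seconds.items()):
        # bisect_right returns how many edges are <= idle, i.e. the bucket index.
        buckets[bisect_right(edges, idle)].append(route)
    return buckets


def eviction_candidates(
    idle_seconds: Dict[str, float], edges: List[float], min_bucket: int
) -> List[str]:
    """Routes in bucket `min_bucket` or older; a lower `min_bucket` evicts more aggressively."""
    buckets = bucket_by_age(idle_seconds, edges)
    return [route for b in sorted(buckets) if b >= min_bucket for route in buckets[b]]


if __name__ == "__main__":
    idle = {"r1": 5, "r2": 90, "r3": 4000, "r4": 60}
    print(bucket_by_age(idle, edges=[60, 3600]))
    print(eviction_candidates(idle, edges=[60, 3600], min_bucket=1))
```

Raising `min_bucket` keeps more idle routes in hardware, while lowering it frees hardware entries sooner at the cost of more software-path lookups.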
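6. Lessons Learned (1/2)
• Packet mirroring: mirror packets to the ToR CPU to debug in production
• Reconfigurable ASIC: made it possible to develop features, such as the route cache, that could not be built any other way
• ASIC emulators: sped up development; flows can be verified by sending packets through the emulator
• Control-plane tests using the ToR image: used in testing
• 64-bit OS: much more memory is usable, so more route cache entries can be held
• Limited control-plane functionality: only adding and removing VRFs and mappings; maintenance is left to other frameworks
• Tuning processing to scale: queues and batching
Reference: https://t.co/KEWgX8pfuj
(Points the presenter found interesting)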
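6. Lessons Learned (2/2)
• ToR redundancy (MLAG) simplifies deploying and maintaining BBS
• Need for reconciliation:
  • While restoring a stale configuration to the correct one (the recovery process), errors must be fixed and consistency re-established.
  • Configuration is added or removed based on the difference from the pushed configuration to keep it consistent. The same applies on fail-over.
• Stateful reconciliation: BBS started with a stateless model, but processing took too long, so it was changed. State is guaranteed through versioning and similar means.
• Safety valves increase operational load:
  • Until the route cache became available, the number of mappings per customer was limited (for safety, but the limit was too strict)
  • Limits had to be raised on demand. Even after raising them, usage did not actually grow that much.
• ToR OS images are re-imaged rather than patched. This is simpler and easier to manage, and it also improves service quality.
• The ToR OS is an ordinary Linux OS: "ordinary" tools such as tcpdump and iperf work, and certificate renewal and Docker containers can be used the same way as on servers
(Points the presenter found interesting)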
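7. Related Work
• OpenNF, Embark, ClickOS, NFV-based, serverless-NF-based, middlebox-based, and OpenFlow-based approaches
  • Do not meet the Azure bare-metal service requirements (bandwidth, latency)
• SmartNICs cannot be used for these requirements
• Switch + server designs -> high power consumption
• Resource limits of programmable switches
  • Caching, upgrading to Tofino-2, expanding switch memory
• SDN is not only about multi-tenancy: FBOSS, B4, EgressEngineering, Jupiter, Robotron, Espresso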
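Conclusions and Future Work
• Design, implementation, and operational experience of Bluebird
  • An SDN ToR system for Azure bare-metal cloud services
  • Two years in operation with (demanding) NetApp, Cray, and SAP workloads
  • Programmable ASIC plus a custom-built cache mechanism
  • Planned: improve the caching algorithm and support more diverse workloads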
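Key takeaways
• Azure bare-metal services (NetApp, etc.) are covered by VLAN/VXLAN translation on P4 ToRs
• The hardware capacity shortage is solved with a cache (in software)
• Shares two years of operation, performance (100 Gb/s line rate at <1 µs latency), and experience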
EoP