Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
#37 “Bluebird: High-performance SDN for Bare-me...
Search
cafenero_777
June 22, 2023
Technology
1
130
#37 “Bluebird: High-performance SDN for Bare-metal Cloud Services”
NSDI 2022
https://www.usenix.org/conference/nsdi22/presentation/arumugam
cafenero_777
June 22, 2023
Tweet
Share
More Decks by cafenero_777
See All by cafenero_777
#51 “Empowering Azure Storage with RDMA”
cafenero_777
3
470
#49 “Gray Failure: The Achilles’ Heel of Cloud-Scale Systems”
cafenero_777
2
120
#50 “Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction”
cafenero_777
0
120
#33 “Destroying networks for fun (and profit)”
cafenero_777
0
94
#34 “MTPSA: Multi-Tenant Programmable Switches”
cafenero_777
0
60
#39 “Profiling a warehouse-scale computer”
cafenero_777
0
46
#23 “VFP: A Virtual Switch Platform for Host SDN in the Public Cloud”
cafenero_777
0
230
#24 “Ananta: Cloud Scale Load Balancing”
cafenero_777
0
260
#25 “Swift: Delay is Simple and Effective for Congestion Control in the Datacenter”
cafenero_777
0
160
Other Decks in Technology
See All in Technology
強化されたAmazon Location Serviceによる新機能と開発者体験
dayjournal
3
260
MUITにおける開発プロセスモダナイズの取り組みと開発生産性可視化の取り組みについて / Modernize the Development Process and Visualize Development Productivity at MUIT
muit
0
240
論文紹介:LLMDet (CVPR2025 Highlight)
tattaka
0
240
タイミーのデータモデリング事例と今後のチャレンジ
ttccddtoki
4
1.6k
高速なプロダクト開発を実現、創業期から掲げるエンタープライズアーキテクチャ
kawauso
1
280
整頓のジレンマとの戦い〜Tidy First?で振り返る事業とキャリアの歩み〜/Fighting the tidiness dilemma〜Business and Career Milestones Reflected on in Tidy First?〜
bitkey
0
430
CursorによるPMO業務の代替 / Automating PMO Tasks with Cursor
motoyoshi_kakaku
2
800
製造業からパッケージ製品まで、あらゆる領域をカバー!生成AIを利用したテストシナリオ生成 / 20250627 Suguru Ishii
shift_evolve
PRO
1
160
FOSS4G 2025 KANSAI QGISで点群データをいろいろしてみた
kou_kita
0
250
LangChain Interrupt & LangChain Ambassadors meetingレポート
os1ma
2
220
Yamla: Rustでつくるリアルタイム性を追求した機械学習基盤 / Yamla: A Rust-Based Machine Learning Platform Pursuing Real-Time Capabilities
lycorptech_jp
PRO
4
170
登壇ネタの見つけ方 / How to find talk topics
pinkumohikan
6
600
Featured
See All Featured
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Stop Working from a Prison Cell
hatefulcrawdad
270
20k
Bash Introduction
62gerente
614
210k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
17
950
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
45
7.5k
The Art of Programming - Codeland 2020
erikaheidi
54
13k
Code Review Best Practice
trishagee
69
18k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
229
22k
It's Worth the Effort
3n
185
28k
Making Projects Easy
brettharned
116
6.3k
Designing for Performance
lara
609
69k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
281
13k
Transcript
Research Paper Introduction #37 “Bluebird: High-performance SDN for Bare-metal Cloud
Services” ௨ࢉ#101 @cafenero_777 2022/06/09 1
Agenda •ରจ •֓ཁͱಡ͏ͱͨ͠ཧ༝ 1. Introduction 2. Background 3. Design Goals
and Rationale 4. System Design 5. Performance 6. Operationalization and Experiences 7. Related Work 8. Conclusions and Future Work 2
ରจ •Bluebird: High-performance SDN for Bare-metal Cloud Services • Manikandan
Arumugam1, et al • Arista1, Intel2, Microsoft3 • NSDI 2022 • https://www.usenix.org/conference/nsdi22/presentation/arumugam • ઌͷNSDI 2022 RecapճͰհͨ͠ͷ 3
Bluebird: High-performance SDN for Bare-metal Cloud Services Arista, Intel, Microsoft
• AzureͷϕΞϝλϧɾΫϥυαʔϏε༻ͷԾNWΛP4SWͰ·͔ͳ͏ • Netapp, Cray, SAP • 100Gbps, 2ӡ༻ • ຊޠղઆهࣄ લճͷεϥΠυΑΓൈਮ
֓ཁͱಡ͏ͱͨ͠ཧ༝ •֓ཁ • AzureͷϕΞϝλϧɾΫϥυαʔϏε༻ͷNWΛP4SWͰ͏·͘ܨ͙ • Մ༻ੑΛߟྀͨ͠ઃܭͰɺ<1us latencyͰ100Gb/s line-rateग़ͤΔ • ೋҎ্Քಇͨ͠ܦݧͷհ
•ಡ͏ͱͨ͠ཧ༝ • ΫϥυͰͷP4 use case • ՝ͱͦͷղܾํ๏ʢઃܭͳͲʣ͕ؾʹͳΔ 5
1. Introduction •SDN, Τϯυϗετଆ (HV)ͰD-plane࣮ • OvS, DPDK, ASIC, FGPA,
SmartNIC •ࣗࣾγεςϜͷΫϥυҠߦͷݕ౼ • ʢઐ༻ʣΞϓϥΠΞϯεΛ͍ͬͯΔʢNetApp, Cray, SAP, and HPCʣ •ϕΞϝλϧΫϥυαʔϏε/HWaaSSDNελοΫΛೖΕΒΕͳ͍ʂ •ToRϕʔεͷSDNιϦϡʔγϣϯ: Bluebird • Barefoot To fi noͷToRSmartToRΛར༻ఆ • 1<us, 100Gbps, NAT༻ͳͲͷඦສͷconntrackͷ࣮ݱ • ίϯτϩʔϧϓϨʔϯ 6
2. Background 7 HVͰશ෦ΔͷͰγϯϓϧɻ SWͰΔͷେมɻagent͕Ϧιʔε͏ɻ scalability/programmabilityΛҡ࣋͠ͳ͕ΒߴੑೳԽɻ ϕΞϝλϧʹ͋·Γద͞ͳ͍ɻʢෳࡶա͗ΔɻVFPվʁʣ ϕΞϝλϧͷΘΓʹToRͰෳࡶͳ͜ͱ͕Ͱ͖Δɻ ࠓճVRF(ސ٬ຖͷNWׂ)ͱVRFຖͷCA-PA mapping
(VxLAN static route) ֤छrouting/tunnelingॲཧΛP4Ͱ࣮ɻ
3. Design Goals and Rationale 1. Programmability: VFPͱಉͳSDNελοΫɻ࣌ͱͱʹཁ͕݅มΘ͍͕ͬͯ͘ҡ࣋͢Δඞཁ͋Γɻ 2. Scalability:
ToRͷϝϞϦ༰ྔ͕ϘτϧωοΫͷͨΊɺΩϟογϡγεςϜΛ։ൃɻ 3. Latency and Throughput: Programmable ASICΛར༻ɻ 4. High availability: BluebirdઃܭΛͨ͠ɻ 5. Multitenancy support: ඞਢͳػೳཁ݅ɻ 6. Minimal overhead on host resources: θϩʹͳΔɻϕΞϝλϧੑೳͦͷ··ग़ͤΔɻ 7. Seamless integration: ϕΞϝλϧଆΛมߋͤͣʹɺBluebird͚ͩͰ࣮ݱɻ 8. External network access: ϕΞϝλϧ͕Πϯλʔωοτͱܨ͛ΔΑ͏ʹNATΛαϙʔτɻ 9. Interoperability: طଘͷSDNελοΫͱ࿈ܞ͠ಁաతͳಈ࡞Λ࣮ݱɻ 8
4. System Design (1/5) ύέοτͷྲྀΕ # Baremetal -> VM •
VLAN 400 -> VRF/VNI 20500 • ѼઌMACΛToRͰม • ToR/VFPؒVXLANτϯωϧ 9 # VM -> Baremetal • VFP/ToRؒVXLANτϯωϧ • VRF/VNI 20500 -> VLAN 400 • ѼઌMACΛToRͰղܾ
4. System Design (2/5) ֓ཁ •σόΠείετɾϝϞϦʢFIBʣɾNPU/ASICػೳͷτϨʔυΦϑ • ίΞϧʔλ: ߴ͍ɾେ༰ྔɾଟػೳ •
Bluebird: ͍҆ɾͦΕͳΓͷྔɾଟػೳʢࣗ࡞ʣ • NetAppͷཁ݅ʢ240Gbps, <4msʣΛ6.4TbpsͳToRΛͬͯղܾ •P4ύΠϓϥΠϯઃܭʹۤ࿑ • VTEP (VXLAN Tunnel Endpoint) tableͰදݱ͞ΕΔCA-PAϚοϐϯάΛ࠷େԽ͍ͨ͠ • To fi noͷIPv4/v6 unicast FIBΛॖখ͠ɺVTEP tableΛ16K -> 192Kʹ૿ͨ͠ • ेʁ -> NO, ։࢝ॳे͕ͩͬͨɺɺɺ • mappingใΛΩϟογϡͤ͞ɺ192KΤϯτϦҎ্Λ͚͞ΔΑ͏ʹͳͬͨ 10
4. System Design (3/5) P4 Platform/pipeline •To fi no-1ͷ࠾༻ •
6.4Tbps, 12stage, 256*25G SerDes, Quad-core 2.2Ghz CPU on Arista 7170 • 192K CA-to-PA mappingཁ݅ΛΫϦΞ •P4 Pipelineͷ • ૉͳ࣮ͩͱΞϯμʔϨΠʹIPv6Λ͏߹CA-to-PAαΠζ֬อෆՄ • ΧελϜP4ύΠϓϥΠϯΛ͏͜ͱͰ͜ΕΛղܾ •ToRͷϓϩϑΝΠϧΛΓସ͑Δ͜ͱͰɺҟͳΔP4ϓϩάϥϜʹΓସ͑ •BM->VFPͷѼઌMACBMଆͰstatic routeͱͯ͠deploy •https://github.com/navybhatia/p4-vxlanencapdecap/blob/main/switch-vxlan.p4 11
4. System Design (4/5) route cache •192K CA-PA mappingͷϘτϧωοΫ͕ݟ͖͑ͯͨ •
ղܾҊ1: To fi no2 (1.5M CA-PA mapping)Λ͏ • ղܾҊ2: cacheػߏΛ࡞Δ • ࣮ࡍʹ௨৴ͨ͠ΒͳΔ͘HW (To fi no)͏ • LRU age/routeͰSW (CPU)ʹୀආ •1Mఔ·Ͱ૿ͤͨ 12
4. System Design (5/5) C-plane & policy •֎෦αʔϏε(Bluebird Service) ͔ΒϓϩϏδϣχϯά͢Δ
•BBS: goal-stateΛ࡞ͬͯpush͢Δ • DAL: ίϚϯυγʔέϯε->JSON-RPC->EOS CLI • λʔήοτͱͷcon fi gࠩΛܭࢉͯ͠reconciliation͢Δ • ֤ߏཁૉΞτϛοΫॲཧɺߏόʔδϣϯཧ͞ΕΔ • ཧToRʢෳʣͷҰ؏ੑରԠ •BBSAZ͝ͱʹ͋ΔɻҰͭͷBBSෳAZαϙʔτՄೳɻ 13
5. Performance (1/3) •AzureͰաڈ2Ͱ42Ҏ্ͷDCͰSDN-ToRར༻ • ઍنͷϕΞϝλϧαʔόʢCray ClusterStor, and NetApp FilesؚΉʣ͕Քಇ
• route cache·ͩൃಈͤͣʢҰޙ͙Β͍ʹൃಈͦ͠͏ʣ • 40Gbps NIC, Xeon E5-2673 v4 (2.3GHz) on Windows Server 2019 14
5. Performance (2/3) •SDN ToR εωʔΫςετ • <1usͰ΄΅100Gbps • ଳҬɾϨΠςϯγʹහײͳBMϫʔΫϩʔυʹ߹͍ͬͯΔ
• ిྗޮطଘͷToRͱมΘΒͣ •route cacheͷԆ • 8usԆ • SFEసૹԆͱSFW->HWΤϯτϦҠಈԆ 15
5. Performance (3/3) •route cacheͷݕূ • ࣮Քಇͷσʔλతʹ~25%ఔ͕”active”ͳ௨৴ • 75%SW (CPU)ʹҠߦՄೳ
• ͭ·Γ192K PA-CAΤϯτϦҎ্͕ར༻Մೳ • route͝ͱʹageͰbucketྨ • ͲͷఔੵۃతʹҠಈ͍͔ͤͨ͞νϡʔχϯάՄೳ 16 HW(To fi no)ʹ͍ͬͯΔactiveͳmapping(%)
6. Lessons Learned (1/2) •packet mirroring: ToR CPUͰϛϥʔϦϯάͯ͠ຊ൪Ͱσόοά •Re-con fi
gurable ASIC: route cacheػߏͳͲɺʢଞͷํ๏ͰͰ͖ͳ͔ͬͨʣػೳΛ։ൃͰ͖ͨ •ASIC emulators: ։ൃͷߴԽɻύέοτྲྀͯ͠ϑϩʔݕূςετՄೳɻ •ToR imageΛͬͨC−planeςετ: ςετͰ׆༻ •64bit OS: ϝϞϦ͍ͬͺ͍͑Δ-> route cacheΤϯτϦΛଟ͘ར༻Ͱ͖Δ •C-planeͷػೳ੍ݶ: VRF/mappingՃɾআͷΈɻϝϯςφϯεଞͷϑϨʔϜϫʔΫʹͤΔ •نʹԠͨ͡ॲཧௐ: Ωϡʔͱόονॲཧ 17 ࢀߟ: https://t.co/KEWgX8pfuj ղઆऀͷ ؾʹͳΔ
6. Lessons Learned (2/2) •ToRԽʢMLAGʣʹΑΔBBSಋೖɾҡ࣋ͷ؆қԽ •Reconciliationͷඞཁੑɿ • ݹ͍ઃఆ͔Βਖ਼͍͠ઃఆʹ͢ʢ෮ݩϓϩηεʣͷதͰΤϥʔΛमਖ਼ͯ͠߹ੑΛऔΔඞཁ͋Γɻ • ೖઃఆͱͷࠩΛߟྀͯ͠ઃఆՃɾআΛߦ͍ɺ߹ੑΛอͭɻfail-over࣌ಉ༷ɻ
•Stateful Reconciliation: BBS࠷ॳstatelessϞσϧ͕ͩͬͨɺॲཧʹֻ͕͔࣌ؒΓա͗ͨͷมߋɻόʔδϣϯཧͳͲͰstate୲อ •҆શห͕ӡ༻ͷ૿ՃΛҾ͖ى͜͢ɿ • route cache͕͑ΔΑ͏ʹͳΔ·Ͱɺސ٬༻ͷmappingΛ੍ݶͨ͠ʢ҆શͷͨΊɻ͕ɺ੍ݶ͕͗ͨ͢ʣ • ্ݶΛΦϯσϚϯυͰ্͛Δඞཁ͋Γɻ੍ݶΛ্࣮͛ͯࡍͦ͜·Ͱ૿͑ͳ͔ͬͨ •ToR OS imagepatchΛͯΔͷͰͳ͘ম͖͢ɻ͜ͷํ͕ཧ͕୯७͔ͭ༰қɺαʔϏε্࣭ •ToR OSී௨ͷlinux OS, tcpdumpiperfͳͲ”ී௨ͷ”πʔϧ͕͑ɺূ໌ॻͷߋ৽dockerίϯςφαʔόͱಉ͡Α͏ʹར༻Ͱ͖Δ 18 ղઆऀͷ ؾʹͳΔ
7. Related Work •OpenNF, Embark, ClickOS, NFVܥ, Serverless NFܥ, middle-boxܥ,
OpenFlowܥ • Azure bare-metalαʔϏεཁ݅ʢଳҬɾԆʣʹ߹Θͳ͍ •SmartNICࠓճͷཁ݅ʹ͑ͳ͍ •εΠον+αʔόߏ -> ফඅిྗ͕ߴ͍ •ϓϩάϥϚϒϧεΠονͷϦιʔε੍ݶ • ΩϟογϡɾTo fi no-2ͷupgrade, εΠονͷϝϞϦ֦ு •SDNmulti-tenancy͚ͩͷͷͰͳ͍: FBOSS, B4, EgressEngineering, Jupiter, Robotron, Espresso 19
Conclusions and Future Work •Bluebirdͷઃܭɾ࣮ɾܦݧ • Azure ϕΞϝλϧΫϥυαʔϏε༻ͷSDN ToRγεςϜ •
Neap, Cray, SAPͷʢݫ͍͠ʣϫʔΫϩʔυͰ2ؒӡ༻ • ϓϩάϥϚϒϧASIC + ࣗ࡞ͷΩϟογϡػߏ • ΩϟογϡΞϧΰϦζϜվળଟ༷ͳϫʔΫϩʔυʹରԠ༧ఆ 20
Key takeaways •AzureϕΞϝλϧαʔϏεʢNetappͳͲʣΛP4 ToRͷVLAN/VXLANมͰΧόʔ •HW༰ྔෆΩϟογϡʢSWͰͷʣͰղܾ •2ӡ༻ɺੑೳ(<1us latencyͰ100Gb/s line-rate)ܦݧΛڞ༗ 21
EoP 22