Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
#37 “Bluebird: High-performance SDN for Bare-me...
Search
cafenero_777
June 22, 2023
Technology
1
120
#37 “Bluebird: High-performance SDN for Bare-metal Cloud Services”
NSDI 2022
https://www.usenix.org/conference/nsdi22/presentation/arumugam
cafenero_777
June 22, 2023
Tweet
Share
More Decks by cafenero_777
See All by cafenero_777
#51 “Empowering Azure Storage with RDMA”
cafenero_777
3
460
#49 “Gray Failure: The Achilles’ Heel of Cloud-Scale Systems”
cafenero_777
2
110
#50 “Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction”
cafenero_777
0
110
#33 “Destroying networks for fun (and profit)”
cafenero_777
0
91
#34 “MTPSA: Multi-Tenant Programmable Switches”
cafenero_777
0
54
#39 “Profiling a warehouse-scale computer”
cafenero_777
0
40
#23 “VFP: A Virtual Switch Platform for Host SDN in the Public Cloud”
cafenero_777
0
220
#24 “Ananta: Cloud Scale Load Balancing”
cafenero_777
0
240
#25 “Swift: Delay is Simple and Effective for Congestion Control in the Datacenter”
cafenero_777
0
150
Other Decks in Technology
See All in Technology
Рекомендации с нуля: как мы в Lamoda превратили главную страницу в ключевую точку входа для персонализированного шоппинга. Данил Комаров, Data Scientist, Lamoda Tech
lamodatech
0
820
Oracle Cloud Infrastructure:2025年4月度サービス・アップデート
oracle4engineer
PRO
0
200
【Λ(らむだ)】最近のアプデ情報 / RPALT20250422
lambda
0
130
AndroidアプリエンジニアもMCPを触ろう
kgmyshin
1
320
OpsJAWS34_CloudTrailLake_for_Organizations
hiashisan
0
170
Building App Extensions equivalents on Android (maybe?)
atsushieno
1
110
Writing Ruby Scripts with TypeProf
mame
0
410
コードや知識を組み込む / Incorporating Codes and Knowledge
ks91
PRO
0
130
C++26アップデート 2025-03
faithandbrave
0
1.1k
読んで学ぶ Amplify Gen2 / Amplify と CDK の関係を紐解く #jawsug_tokyo
tacck
PRO
1
270
生成AIのユースケースをとにかく集めてまるっと学ぶ!/ all about generative ai usecases
gakumura
2
280
「経験の点」の位置を意識したキャリア形成 / Career development with an awareness of the “point of experience” position
pauli
4
110
Featured
See All Featured
Reflections from 52 weeks, 52 projects
jeffersonlam
349
20k
StorybookのUI Testing Handbookを読んだ
zakiyama
29
5.7k
Six Lessons from altMBA
skipperchong
28
3.7k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
52
2.4k
Making Projects Easy
brettharned
116
6.1k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
8
680
The Invisible Side of Design
smashingmag
299
50k
Typedesign – Prime Four
hannesfritz
41
2.6k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
119
51k
Measuring & Analyzing Core Web Vitals
bluesmoon
7
400
VelocityConf: Rendering Performance Case Studies
addyosmani
329
24k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Transcript
Research Paper Introduction #37 “Bluebird: High-performance SDN for Bare-metal Cloud
Services” ௨ࢉ#101 @cafenero_777 2022/06/09 1
Agenda •ରจ •֓ཁͱಡ͏ͱͨ͠ཧ༝ 1. Introduction 2. Background 3. Design Goals
and Rationale 4. System Design 5. Performance 6. Operationalization and Experiences 7. Related Work 8. Conclusions and Future Work 2
ରจ •Bluebird: High-performance SDN for Bare-metal Cloud Services • Manikandan
Arumugam1, et al • Arista1, Intel2, Microsoft3 • NSDI 2022 • https://www.usenix.org/conference/nsdi22/presentation/arumugam • ઌͷNSDI 2022 RecapճͰհͨ͠ͷ 3
Bluebird: High-performance SDN for Bare-metal Cloud Services Arista, Intel, Microsoft
• AzureͷϕΞϝλϧɾΫϥυαʔϏε༻ͷԾNWΛP4SWͰ·͔ͳ͏ • Netapp, Cray, SAP • 100Gbps, 2ӡ༻ • ຊޠղઆهࣄ લճͷεϥΠυΑΓൈਮ
֓ཁͱಡ͏ͱͨ͠ཧ༝ •֓ཁ • AzureͷϕΞϝλϧɾΫϥυαʔϏε༻ͷNWΛP4SWͰ͏·͘ܨ͙ • Մ༻ੑΛߟྀͨ͠ઃܭͰɺ<1us latencyͰ100Gb/s line-rateग़ͤΔ • ೋҎ্Քಇͨ͠ܦݧͷհ
•ಡ͏ͱͨ͠ཧ༝ • ΫϥυͰͷP4 use case • ՝ͱͦͷղܾํ๏ʢઃܭͳͲʣ͕ؾʹͳΔ 5
1. Introduction •SDN, Τϯυϗετଆ (HV)ͰD-plane࣮ • OvS, DPDK, ASIC, FGPA,
SmartNIC •ࣗࣾγεςϜͷΫϥυҠߦͷݕ౼ • ʢઐ༻ʣΞϓϥΠΞϯεΛ͍ͬͯΔʢNetApp, Cray, SAP, and HPCʣ •ϕΞϝλϧΫϥυαʔϏε/HWaaSSDNελοΫΛೖΕΒΕͳ͍ʂ •ToRϕʔεͷSDNιϦϡʔγϣϯ: Bluebird • Barefoot To fi noͷToRSmartToRΛར༻ఆ • 1<us, 100Gbps, NAT༻ͳͲͷඦສͷconntrackͷ࣮ݱ • ίϯτϩʔϧϓϨʔϯ 6
2. Background 7 HVͰશ෦ΔͷͰγϯϓϧɻ SWͰΔͷେมɻagent͕Ϧιʔε͏ɻ scalability/programmabilityΛҡ࣋͠ͳ͕ΒߴੑೳԽɻ ϕΞϝλϧʹ͋·Γద͞ͳ͍ɻʢෳࡶա͗ΔɻVFPվʁʣ ϕΞϝλϧͷΘΓʹToRͰෳࡶͳ͜ͱ͕Ͱ͖Δɻ ࠓճVRF(ސ٬ຖͷNWׂ)ͱVRFຖͷCA-PA mapping
(VxLAN static route) ֤छrouting/tunnelingॲཧΛP4Ͱ࣮ɻ
3. Design Goals and Rationale 1. Programmability: VFPͱಉͳSDNελοΫɻ࣌ͱͱʹཁ͕݅มΘ͍͕ͬͯ͘ҡ࣋͢Δඞཁ͋Γɻ 2. Scalability:
ToRͷϝϞϦ༰ྔ͕ϘτϧωοΫͷͨΊɺΩϟογϡγεςϜΛ։ൃɻ 3. Latency and Throughput: Programmable ASICΛར༻ɻ 4. High availability: BluebirdઃܭΛͨ͠ɻ 5. Multitenancy support: ඞਢͳػೳཁ݅ɻ 6. Minimal overhead on host resources: θϩʹͳΔɻϕΞϝλϧੑೳͦͷ··ग़ͤΔɻ 7. Seamless integration: ϕΞϝλϧଆΛมߋͤͣʹɺBluebird͚ͩͰ࣮ݱɻ 8. External network access: ϕΞϝλϧ͕Πϯλʔωοτͱܨ͛ΔΑ͏ʹNATΛαϙʔτɻ 9. Interoperability: طଘͷSDNελοΫͱ࿈ܞ͠ಁաతͳಈ࡞Λ࣮ݱɻ 8
4. System Design (1/5) ύέοτͷྲྀΕ # Baremetal -> VM •
VLAN 400 -> VRF/VNI 20500 • ѼઌMACΛToRͰม • ToR/VFPؒVXLANτϯωϧ 9 # VM -> Baremetal • VFP/ToRؒVXLANτϯωϧ • VRF/VNI 20500 -> VLAN 400 • ѼઌMACΛToRͰղܾ
4. System Design (2/5) ֓ཁ •σόΠείετɾϝϞϦʢFIBʣɾNPU/ASICػೳͷτϨʔυΦϑ • ίΞϧʔλ: ߴ͍ɾେ༰ྔɾଟػೳ •
Bluebird: ͍҆ɾͦΕͳΓͷྔɾଟػೳʢࣗ࡞ʣ • NetAppͷཁ݅ʢ240Gbps, <4msʣΛ6.4TbpsͳToRΛͬͯղܾ •P4ύΠϓϥΠϯઃܭʹۤ࿑ • VTEP (VXLAN Tunnel Endpoint) tableͰදݱ͞ΕΔCA-PAϚοϐϯάΛ࠷େԽ͍ͨ͠ • To fi noͷIPv4/v6 unicast FIBΛॖখ͠ɺVTEP tableΛ16K -> 192Kʹ૿ͨ͠ • ेʁ -> NO, ։࢝ॳे͕ͩͬͨɺɺɺ • mappingใΛΩϟογϡͤ͞ɺ192KΤϯτϦҎ্Λ͚͞ΔΑ͏ʹͳͬͨ 10
4. System Design (3/5) P4 Platform/pipeline •To fi no-1ͷ࠾༻ •
6.4Tbps, 12stage, 256*25G SerDes, Quad-core 2.2Ghz CPU on Arista 7170 • 192K CA-to-PA mappingཁ݅ΛΫϦΞ •P4 Pipelineͷ • ૉͳ࣮ͩͱΞϯμʔϨΠʹIPv6Λ͏߹CA-to-PAαΠζ֬อෆՄ • ΧελϜP4ύΠϓϥΠϯΛ͏͜ͱͰ͜ΕΛղܾ •ToRͷϓϩϑΝΠϧΛΓସ͑Δ͜ͱͰɺҟͳΔP4ϓϩάϥϜʹΓସ͑ •BM->VFPͷѼઌMACBMଆͰstatic routeͱͯ͠deploy •https://github.com/navybhatia/p4-vxlanencapdecap/blob/main/switch-vxlan.p4 11
4. System Design (4/5) route cache •192K CA-PA mappingͷϘτϧωοΫ͕ݟ͖͑ͯͨ •
ղܾҊ1: To fi no2 (1.5M CA-PA mapping)Λ͏ • ղܾҊ2: cacheػߏΛ࡞Δ • ࣮ࡍʹ௨৴ͨ͠ΒͳΔ͘HW (To fi no)͏ • LRU age/routeͰSW (CPU)ʹୀආ •1Mఔ·Ͱ૿ͤͨ 12
4. System Design (5/5) C-plane & policy •֎෦αʔϏε(Bluebird Service) ͔ΒϓϩϏδϣχϯά͢Δ
•BBS: goal-stateΛ࡞ͬͯpush͢Δ • DAL: ίϚϯυγʔέϯε->JSON-RPC->EOS CLI • λʔήοτͱͷcon fi gࠩΛܭࢉͯ͠reconciliation͢Δ • ֤ߏཁૉΞτϛοΫॲཧɺߏόʔδϣϯཧ͞ΕΔ • ཧToRʢෳʣͷҰ؏ੑରԠ •BBSAZ͝ͱʹ͋ΔɻҰͭͷBBSෳAZαϙʔτՄೳɻ 13
5. Performance (1/3) •AzureͰաڈ2Ͱ42Ҏ্ͷDCͰSDN-ToRར༻ • ઍنͷϕΞϝλϧαʔόʢCray ClusterStor, and NetApp FilesؚΉʣ͕Քಇ
• route cache·ͩൃಈͤͣʢҰޙ͙Β͍ʹൃಈͦ͠͏ʣ • 40Gbps NIC, Xeon E5-2673 v4 (2.3GHz) on Windows Server 2019 14
5. Performance (2/3) •SDN ToR εωʔΫςετ • <1usͰ΄΅100Gbps • ଳҬɾϨΠςϯγʹහײͳBMϫʔΫϩʔυʹ߹͍ͬͯΔ
• ిྗޮطଘͷToRͱมΘΒͣ •route cacheͷԆ • 8usԆ • SFEసૹԆͱSFW->HWΤϯτϦҠಈԆ 15
5. Performance (3/3) •route cacheͷݕূ • ࣮Քಇͷσʔλతʹ~25%ఔ͕”active”ͳ௨৴ • 75%SW (CPU)ʹҠߦՄೳ
• ͭ·Γ192K PA-CAΤϯτϦҎ্͕ར༻Մೳ • route͝ͱʹageͰbucketྨ • ͲͷఔੵۃతʹҠಈ͍͔ͤͨ͞νϡʔχϯάՄೳ 16 HW(To fi no)ʹ͍ͬͯΔactiveͳmapping(%)
6. Lessons Learned (1/2) •packet mirroring: ToR CPUͰϛϥʔϦϯάͯ͠ຊ൪Ͱσόοά •Re-con fi
gurable ASIC: route cacheػߏͳͲɺʢଞͷํ๏ͰͰ͖ͳ͔ͬͨʣػೳΛ։ൃͰ͖ͨ •ASIC emulators: ։ൃͷߴԽɻύέοτྲྀͯ͠ϑϩʔݕূςετՄೳɻ •ToR imageΛͬͨC−planeςετ: ςετͰ׆༻ •64bit OS: ϝϞϦ͍ͬͺ͍͑Δ-> route cacheΤϯτϦΛଟ͘ར༻Ͱ͖Δ •C-planeͷػೳ੍ݶ: VRF/mappingՃɾআͷΈɻϝϯςφϯεଞͷϑϨʔϜϫʔΫʹͤΔ •نʹԠͨ͡ॲཧௐ: Ωϡʔͱόονॲཧ 17 ࢀߟ: https://t.co/KEWgX8pfuj ղઆऀͷ ؾʹͳΔ
6. Lessons Learned (2/2) •ToRԽʢMLAGʣʹΑΔBBSಋೖɾҡ࣋ͷ؆қԽ •Reconciliationͷඞཁੑɿ • ݹ͍ઃఆ͔Βਖ਼͍͠ઃఆʹ͢ʢ෮ݩϓϩηεʣͷதͰΤϥʔΛमਖ਼ͯ͠߹ੑΛऔΔඞཁ͋Γɻ • ೖઃఆͱͷࠩΛߟྀͯ͠ઃఆՃɾআΛߦ͍ɺ߹ੑΛอͭɻfail-over࣌ಉ༷ɻ
•Stateful Reconciliation: BBS࠷ॳstatelessϞσϧ͕ͩͬͨɺॲཧʹֻ͕͔࣌ؒΓա͗ͨͷมߋɻόʔδϣϯཧͳͲͰstate୲อ •҆શห͕ӡ༻ͷ૿ՃΛҾ͖ى͜͢ɿ • route cache͕͑ΔΑ͏ʹͳΔ·Ͱɺސ٬༻ͷmappingΛ੍ݶͨ͠ʢ҆શͷͨΊɻ͕ɺ੍ݶ͕͗ͨ͢ʣ • ্ݶΛΦϯσϚϯυͰ্͛Δඞཁ͋Γɻ੍ݶΛ্࣮͛ͯࡍͦ͜·Ͱ૿͑ͳ͔ͬͨ •ToR OS imagepatchΛͯΔͷͰͳ͘ম͖͢ɻ͜ͷํ͕ཧ͕୯७͔ͭ༰қɺαʔϏε্࣭ •ToR OSී௨ͷlinux OS, tcpdumpiperfͳͲ”ී௨ͷ”πʔϧ͕͑ɺূ໌ॻͷߋ৽dockerίϯςφαʔόͱಉ͡Α͏ʹར༻Ͱ͖Δ 18 ղઆऀͷ ؾʹͳΔ
7. Related Work •OpenNF, Embark, ClickOS, NFVܥ, Serverless NFܥ, middle-boxܥ,
OpenFlowܥ • Azure bare-metalαʔϏεཁ݅ʢଳҬɾԆʣʹ߹Θͳ͍ •SmartNICࠓճͷཁ݅ʹ͑ͳ͍ •εΠον+αʔόߏ -> ফඅిྗ͕ߴ͍ •ϓϩάϥϚϒϧεΠονͷϦιʔε੍ݶ • ΩϟογϡɾTo fi no-2ͷupgrade, εΠονͷϝϞϦ֦ு •SDNmulti-tenancy͚ͩͷͷͰͳ͍: FBOSS, B4, EgressEngineering, Jupiter, Robotron, Espresso 19
Conclusions and Future Work •Bluebirdͷઃܭɾ࣮ɾܦݧ • Azure ϕΞϝλϧΫϥυαʔϏε༻ͷSDN ToRγεςϜ •
Neap, Cray, SAPͷʢݫ͍͠ʣϫʔΫϩʔυͰ2ؒӡ༻ • ϓϩάϥϚϒϧASIC + ࣗ࡞ͷΩϟογϡػߏ • ΩϟογϡΞϧΰϦζϜվળଟ༷ͳϫʔΫϩʔυʹରԠ༧ఆ 20
Key takeaways •AzureϕΞϝλϧαʔϏεʢNetappͳͲʣΛP4 ToRͷVLAN/VXLANมͰΧόʔ •HW༰ྔෆΩϟογϡʢSWͰͷʣͰղܾ •2ӡ༻ɺੑೳ(<1us latencyͰ100Gb/s line-rate)ܦݧΛڞ༗ 21
EoP 22