#37 “Bluebird: High-performance SDN for Bare-metal Cloud Services”

cafenero_777

June 22, 2023
Transcript

  1. Agenda
     • Target paper
     • Overview and why I chose to read it
     1. Introduction 2. Background 3. Design Goals and Rationale 4. System Design 5. Performance 6. Operationalization and Experiences 7. Related Work 8. Conclusions and Future Work
  2. Target paper
     • Bluebird: High-performance SDN for Bare-metal Cloud Services
     • Manikandan Arumugam1, et al.
     • Arista1, Intel2, Microsoft3
     • NSDI 2022
     • https://www.usenix.org/conference/nsdi22/presentation/arumugam
     • Introduced at the recent NSDI 2022 recap session
  3. Bluebird: High-performance SDN for Bare-metal Cloud Services (Arista, Intel, Microsoft)
     • Serves the virtual network for Azure's bare-metal cloud services with P4 switches
     • Customers: NetApp, Cray, SAP
     • 100 Gbps, two years in production
     • Japanese commentary article; excerpted from my previous slide deck
  4. 1. Introduction
     • SDN with the D-plane implemented on the end-host (hypervisor) side
       • OvS, DPDK, ASIC, FPGA, SmartNIC
     • Customers considering migrating their in-house systems to the cloud
       • They rely on (dedicated) appliances, etc. (NetApp, Cray, SAP, and HPC)
     • Bare-metal cloud services / HWaaS cannot host the SDN stack on the server!
     • A ToR-based SDN solution: Bluebird
       • Assumes ToRs or SmartToRs built on Barefoot Tofino
       • <1 us latency, 100 Gbps, and millions of conntrack entries (e.g., for NAT)
       • Control plane
  5. 3. Design Goals and Rationale
     1. Programmability: an SDN stack equivalent to VFP; requirements change over time, so it must stay maintainable.
     2. Scalability: the ToR's memory capacity is the bottleneck, so a cache system was developed.
     3. Latency and throughput: use a programmable ASIC.
     4. High availability: Bluebird is also designed with redundancy.
     5. Multitenancy support: a mandatory functional requirement.
     6. Minimal overhead on host resources: reduced to zero; bare-metal performance is delivered as-is.
     7. Seamless integration: achieved by Bluebird alone, with no changes on the bare-metal side.
     8. External network access: NAT is supported so bare-metal servers can reach the Internet directly.
     9. Interoperability: works transparently with the existing SDN stack.
  6. 4. System Design (1/5): packet flow (a mapping-lookup sketch follows below)
     # Baremetal -> VM
     • VLAN 400 -> VRF/VNI 20500
     • Destination MAC rewritten at the ToR
     • VXLAN tunnel between ToR and VFP
     # VM -> Baremetal
     • VXLAN tunnel between VFP and ToR
     • VRF/VNI 20500 -> VLAN 400
     • Destination MAC resolved at the ToR
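A minimal Python sketch of the baremetal-to-VM forwarding decision, assuming a CA-to-PA mapping table keyed by (VNI, customer address). The VLAN 400 -> VNI 20500 mapping is from the slide; all other names, addresses, and structures are illustrative, not the paper's actual data model.

```python
from dataclasses import dataclass

@dataclass
class VxlanAction:
    vni: int           # VXLAN network identifier for the overlay
    underlay_ip: str   # PA: physical (underlay) address of the destination host
    inner_dmac: str    # rewritten destination MAC for the inner frame

# Illustrative CA-to-PA mapping table: (vni, customer_ip) -> encap action.
# On the real ToR this lives in the Tofino VTEP table; these values are made up.
CA_TO_PA = {
    (20500, "10.0.0.5"): VxlanAction(20500, "100.64.1.7", "00:11:22:33:44:55"),
}

def vlan_to_vni(vlan: int) -> int:
    # The tenant VLAN on the bare-metal port maps to a VRF/VNI (400 -> 20500 in the deck).
    return {400: 20500}[vlan]

def forward_bm_to_vm(vlan: int, dst_ca: str) -> str:
    vni = vlan_to_vni(vlan)
    action = CA_TO_PA.get((vni, dst_ca))
    if action is None:
        return "miss: punt to control plane / route cache"
    return (f"VXLAN encap: VNI {action.vni}, outer dst {action.underlay_ip}, "
            f"inner dmac {action.inner_dmac}")

print(forward_bm_to_vm(400, "10.0.0.5"))
```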
  7. 4. System Design (2/5): overview
     • Trade-off between device cost, memory (FIB), and NPU/ASIC features
       • Core router: expensive, large capacity, feature-rich
       • Bluebird: cheap, moderate capacity, feature-rich (self-built)
     • NetApp's requirements (240 Gbps, <4 ms) met with a 6.4 Tbps ToR
     • Designing the P4 pipeline was hard
       • Goal: maximize the number of CA-PA mappings held in the VTEP (VXLAN Tunnel Endpoint) table
       • Shrank Tofino's IPv4/v6 unicast FIB to grow the VTEP table from 16K to 192K entries (see the budget sketch below)
       • Enough? -> No; it was enough at launch, but...
       • Caching the mapping state let the system handle more than 192K entries
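A toy sketch of the resource reallocation: with a fixed match-table budget, every entry taken from the unicast FIB can be handed to the VTEP table. Only the 16K -> 192K VTEP growth is from the deck; the total budget and FIB sizes below are invented purely for illustration.

```python
# Toy model of repartitioning a fixed table budget between FIB and VTEP.
# Only the 16K -> 192K VTEP growth is from the deck; other numbers are made up.
TOTAL_ENTRIES = 256_000  # hypothetical fixed SRAM budget (illustrative)

default_profile  = {"unicast_fib": 240_000, "vtep":  16_000}
bluebird_profile = {"unicast_fib":  64_000, "vtep": 192_000}

for name, prof in [("default", default_profile), ("bluebird", bluebird_profile)]:
    assert sum(prof.values()) == TOTAL_ENTRIES  # same chip, same budget
    print(f"{name}: FIB={prof['unicast_fib']:,} VTEP={prof['vtep']:,}")
```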
  8. 4. System Design (3/5): P4 platform/pipeline
     • Adopted Tofino-1
       • 6.4 Tbps, 12 stages, 256*25G SerDes, quad-core 2.2 GHz CPU on Arista 7170
       • Clears the 192K CA-to-PA mapping requirement
     • P4 pipeline tricks
       • A naive implementation cannot fit the CA-to-PA table when the underlay uses IPv6
       • Solved with a custom P4 pipeline
     • Switching the ToR's profile switches it to a different P4 program
     • The BM->VFP destination MAC is deployed as a static route on the BM side
     • https://github.com/navybhatia/p4-vxlanencapdecap/blob/main/switch-vxlan.p4
  9. 4. System Design (4/5): route cache (a cache sketch follows below)
     • The 192K CA-PA mapping limit emerged as a bottleneck
     • Option 1: use Tofino-2 (1.5M CA-PA mappings)
     • Option 2: build a cache mechanism
       • Keep actively communicating routes in HW (Tofino) as much as possible
       • Evict to SW (CPU) based on LRU age per route
     • Scaled to roughly 1M mappings
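A minimal sketch of the cache idea, assuming an LRU policy over a fixed HW capacity with overflow served in software. This is my reading of the mechanism described on the slide, not the paper's implementation; the tiny capacity stands in for the 192K-entry table.

```python
from collections import OrderedDict

HW_CAPACITY = 4  # tiny stand-in for the ToR's 192K-entry VTEP table

class RouteCache:
    """Keep hot CA-PA mappings in HW; demote the least recently used to SW."""
    def __init__(self):
        self.hw = OrderedDict()  # mappings installed in the Tofino tables
        self.sw = {}             # overflow mappings served by the switch CPU

    def lookup(self, ca):
        if ca in self.hw:                  # HW hit: fast path, refresh recency
            self.hw.move_to_end(ca)
            return self.hw[ca], "hw"
        if ca in self.sw:                  # SW hit: forward on CPU, promote to HW
            pa = self.sw.pop(ca)
            self._install(ca, pa)
            return pa, "sw (promoted)"
        return None, "miss"

    def insert(self, ca, pa):
        self._install(ca, pa)

    def _install(self, ca, pa):
        if len(self.hw) >= HW_CAPACITY:    # demote the oldest route to SW
            old_ca, old_pa = self.hw.popitem(last=False)
            self.sw[old_ca] = old_pa
        self.hw[ca] = pa

cache = RouteCache()
for i in range(6):                         # insert more routes than HW can hold
    cache.insert(f"10.0.0.{i}", f"100.64.0.{i}")
print(cache.lookup("10.0.0.0"))            # demoted earlier: served from SW, then promoted
```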
 10. 4. System Design (5/5): C-plane & policy
     • Provisioned from an external service (Bluebird Service)
     • BBS: builds a goal state and pushes it
       • DAL: command sequence -> JSON-RPC -> EOS CLI
       • Computes the config diff against the target and reconciles it (see the diff sketch below)
       • Each configuration element is applied atomically; configurations are versioned
       • Handles consistency across the (multi-device) logical ToR
     • There is a BBS per AZ; a single BBS can also serve multiple AZs
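A minimal sketch of goal-state reconciliation, assuming config is modeled as a set of keyed entries; computing adds and removes as set differences is my illustration of the idea, not the actual BBS/DAL code.

```python
# Illustrative goal-state reconciliation: diff the desired config against the
# device's current config, then emit only the adds/updates and removes.
def reconcile(desired: dict, current: dict) -> list[str]:
    ops = []
    for key, value in desired.items():
        if current.get(key) != value:
            ops.append(f"add/update {key} -> {value}")
    for key in current.keys() - desired.keys():
        ops.append(f"remove {key}")
    return ops

desired = {("vrf", 20500): "vlan 400", ("mapping", "10.0.0.5"): "100.64.1.7"}
current = {("vrf", 20500): "vlan 400", ("mapping", "10.0.0.9"): "100.64.9.9"}
for op in reconcile(desired, current):
    print(op)  # adds the missing mapping, removes the stale one
```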
 11. 5. Performance (1/3)
     • SDN ToRs deployed in 42+ Azure DCs over the past two years
     • Thousands of bare-metal servers in production (including Cray ClusterStor and NetApp Files)
     • The route cache has not yet kicked in (likely about a year out)
     • Test setup: 40 Gbps NIC, Xeon E5-2673 v4 (2.3 GHz) on Windows Server 2019
 12. 5. Performance (2/3)
     • SDN ToR snake test
       • Nearly 100 Gbps at <1 us
       • A good fit for bandwidth- and latency-sensitive BM workloads
       • Power efficiency on par with conventional ToRs
     • Route cache latency
       • 8 us of added delay
       • SFE forwarding delay plus the SW->HW entry-migration delay
 13. 5. Performance (3/3)
     • Route cache validation
       • In production data, only ~25% of mappings are "active"
       • 75% can be offloaded to SW (CPU)
       • Hence more than 192K PA-CA entries become usable
     • Routes are bucketed by age (a bucketing sketch follows below)
       • How aggressively to migrate them is tunable
     [Figure: percentage of active mappings resident in HW (Tofino)]
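A toy sketch of age-based bucketing for demotion decisions. The bucket boundaries and the aggressiveness knob are invented for illustration; the deck only says routes are bucketed by age and the migration aggressiveness is tunable.

```python
import time

# Illustrative age buckets (seconds); the real boundaries are not in the deck.
BUCKETS = [(60, "hot"), (600, "warm"), (float("inf"), "cold")]

def bucket_of(last_seen: float, now: float) -> str:
    age = now - last_seen
    for limit, name in BUCKETS:
        if age < limit:
            return name
    return "cold"

def demotion_candidates(routes: dict[str, float], aggressiveness: str) -> list[str]:
    # aggressiveness="cold" demotes only cold routes; "warm" also demotes warm ones.
    now = time.time()
    demote = {"cold": {"cold"}, "warm": {"cold", "warm"}}[aggressiveness]
    return [ca for ca, seen in routes.items() if bucket_of(seen, now) in demote]

routes = {"10.0.0.1": time.time() - 5, "10.0.0.2": time.time() - 3600}
print(demotion_candidates(routes, "cold"))  # only the hour-old route is demoted
```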
 14. 6. Lessons Learned (1/2)
     • Packet mirroring: mirror at the ToR CPU to debug in production
     • Reconfigurable ASIC: enabled features that were not possible otherwise, such as the route cache mechanism
     • ASIC emulators: faster development; can also inject packets for flow verification and testing
     • C-plane testing with the ToR image: leveraged for tests
     • 64-bit OS: all memory becomes usable -> more route cache entries available
     • Restricted C-plane features: only VRF/mapping add and delete; maintenance is left to other frameworks
     • Processing tuned to scale: queuing and batching
     Reference: https://t.co/KEWgX8pfuj (presenter's notes)
 15. 6. Lessons Learned (2/2)
     • ToR redundancy (MLAG) simplified BBS deployment and upkeep
     • Need for reconciliation:
       • While restoring from an old config to the correct one, errors must be fixed to keep the state consistent
       • Apply adds/deletes based on the diff against the pushed config to preserve consistency; the same applies on fail-over
     • Stateful reconciliation: BBS started as a stateless model, but processing took too long, so it was changed; state is guaranteed via versioning and the like
     • Safety valves increase operational toil:
       • Until the route cache was ready, per-customer mapping counts were capped (for safety, but the cap was too low)
       • Limits had to be raised on demand; even after raising them, actual usage never grew that far
     • ToR OS images are re-imaged rather than patched; this is simpler and easier to manage and improves service quality
     • The ToR OS is an ordinary Linux OS: "ordinary" tools such as tcpdump and iperf work, and certificate renewal and Docker containers work just as on servers
     (Presenter's notes)
 16. 7. Related Work
     • OpenNF, Embark, ClickOS, NFV variants, serverless NFs, middle-boxes, OpenFlow-based systems
       • None meet the Azure bare-metal service requirements (high bandwidth, low latency)
     • SmartNICs do not fit this use case
     • Switch + server designs -> high power consumption
     • Programmable-switch resource limits
       • Caching, upgrading to Tofino-2, expanding switch memory
     • SDN is not only about multi-tenancy: FBOSS, B4, Egress Engineering, Jupiter, Robotron, Espresso
 17. Conclusions and Future Work
     • Bluebird's design, implementation, and experience
       • An SDN ToR system for Azure's bare-metal cloud services
       • Two years in production under the (demanding) workloads of NetApp, Cray, and SAP
       • Programmable ASIC + a self-built cache mechanism
     • Future work: improved cache algorithms and support for more diverse workloads