Ultra Accelerator Link Consortium - IT Press Tour #62 June 2025

The IT Press Tour

June 04, 2025

Transcript

  1. Presenters

    • Kurtis Bowman, UALink Consortium Chair, AMD
    • Nathan Kalyanasundharam, UALink Consortium Technical Task Force Co-Chair, AMD

    5/30/2025, Ultra Accelerator Link 2025
  2. Advancing AI Across Data Centers

    ▪ AI models continue to grow, requiring more compute and memory to efficiently execute training and inference on large models
    ▪ The industry needs an open solution that enables efficient distribution of models across many accelerators within a pod
    ▪ Large inference models will require scale-up of tens to hundreds of accelerators in pods
    ▪ Large training models will require scale-up and scale-out from hundreds to tens of thousands of accelerators by connecting multiple pods
  3. Ultra Accelerator Link Timeline

    ▪ May 2024: Promoter Group press release
    ▪ October 2024: UALink membership forms posted to website
    ▪ April 2025: UALink 200G 1.0 Specification
  4. UALink Creates the Scale-up Pod

    ▪ High performance
      ▪ Up to 800 Gbps per port, scalable ports per accelerator, up to 1,024 accelerators
    ▪ Low latency
      ▪ Optimized protocol, transaction, link & physical
    ▪ Low power
      ▪ The simplified UALink stack leads to lower power solutions
    ▪ Low die area
      ▪ Optimized data layer and transaction layer saves significant die area

    Fabric by pod size (diagram labels; the scale-out side is labeled both "UEC" and "Ethernet Scale-Out"):
    ▪ 1 RACK : UALink
    ▪ 2 RACKS : UALink
    ▪ 3-4 RACKS : UALink or UEC (Ethernet)
    ▪ > 4 RACKS : UEC (Ethernet)

    The UALink 1.0 focus is to deliver optimized scale-up solutions with single-tier switching.
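
The rack-count guidance on this slide can be written as a small lookup. This is only a sketch of the slide's rule of thumb: the function names are hypothetical, and the per-accelerator bandwidth helper simply multiplies the slide's "up to 800 Gbps per port" figure by a port count chosen by the caller.

```python
PORT_GBPS = 800  # "up to 800 Gbps per port" from the slide


def fabric_for_racks(racks: int) -> str:
    """Return the fabric the slide suggests for a pod spanning `racks` racks."""
    if racks < 1:
        raise ValueError("a pod spans at least one rack")
    if racks <= 2:
        return "UALink"
    if racks <= 4:
        return "UALink or Ethernet (UEC)"
    return "Ethernet (UEC)"


def accelerator_bandwidth_gbps(ports: int) -> int:
    """Peak bandwidth per accelerator for a hypothetical number of ports."""
    if ports < 1:
        raise ValueError("an accelerator needs at least one port")
    return ports * PORT_GBPS
```

For example, `fabric_for_racks(3)` returns the overlap case where either fabric is listed as an option.
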
  5. UALink 200G 1.0 Specification

    ▪ The UALink interconnect enables accelerator-to-accelerator communication
    ▪ The initial focus is sharing memory among accelerators
    ▪ Direct load, store, and atomic operations between accelerators (e.g., GPUs)
    ▪ Low-latency, high-bandwidth fabric for hundreds of accelerators in a pod (up to 1K)
    ▪ Simple load/store/atomics semantics with software coherency
    ▪ The initial UALink specification taps into the experience of the Promoters in developing and deploying a broad range of accelerators, and is seeded with the proven Infinity Fabric protocol
  6. UALink 200G 1.0 Benefits

    ▪ Performance, Power & Efficiency
      ▪ Low-latency, high-bandwidth interconnect for hundreds of accelerators in a pod
      ▪ Features the same raw speed as Ethernet with the latency of PCIe® switches
      ▪ Enables a highly efficient switch design that reduces power and complexity with small packets, fixed FLIT sizes, ID-based routing, and overall simplicity
      ▪ Significantly smaller die area for link stack, lowering power and acquisition costs
      ▪ Increased bandwidth efficiency further enables lower TCO
    ▪ Open and Standardized
      ▪ UALink harnesses the innovation of member companies to drive leading-edge features into the specification and interoperable products to the market
      ▪ Leverages ubiquitous Ethernet infrastructure
        ▪ Cables, connectors, retimers, management software, and more
  7. UALink Stack Features & Goals

    ▪ Standard Ethernet Physical
    ▪ UALink DL
    ▪ UALink TL
    ▪ UALink Protocol
  8. UALink Protocol Interface (UPLI)

    • Simple symmetric interface protocol
      • Request
      • Request Data
      • Read Response + Data
      • Write Response
    • Originator interface sends requests to other accelerators and receives responses
    • Completer interface receives requests from other accelerators and returns responses
    • Src/Dst Identifier (ID) based routing
    • Provisioned to enable multiple address spaces
    • Same-address ordering for Requests; Completions unordered

    Diagram label: 1x4b, 2x2b or 4x1b
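
The originator/completer split and the ordering rule on this slide can be illustrated with a toy model. This is a sketch under stated assumptions, not the UPLI definition: the `Request`, `Response`, `Completer`, and `deliver` names are hypothetical, memory is a plain dictionary, and the fabric is modeled as a function that may reorder traffic across different addresses while keeping requests to the same address in issue order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    src_id: int          # originator ID
    dst_id: int          # completer ID, used for routing
    op: str              # "read" or "write"
    addr: int
    data: bytes = b""    # request data (writes only)


@dataclass(frozen=True)
class Response:
    src_id: int
    dst_id: int
    op: str              # "read" (carries data) or "write"
    addr: int
    data: bytes = b""


class Completer:
    """Receives requests addressed to its ID and returns responses."""

    def __init__(self, accel_id: int):
        self.accel_id = accel_id
        self.memory = {}

    def handle(self, req: Request) -> Response:
        if req.dst_id != self.accel_id:
            raise ValueError("request routed to the wrong completer")
        if req.op == "write":
            self.memory[req.addr] = req.data
            return Response(self.accel_id, req.src_id, "write", req.addr)
        if req.op == "read":
            data = self.memory.get(req.addr, b"")
            return Response(self.accel_id, req.src_id, "read", req.addr, data)
        raise ValueError(f"unsupported op: {req.op}")


def deliver(requests, completers):
    """Route by destination ID, keeping issue order only per (dst, addr)."""
    queues = defaultdict(deque)
    for req in requests:
        queues[(req.dst_id, req.addr)].append(req)
    responses = []
    # Drain queues in reverse key order to stress that ordering across
    # different addresses (and of completions) is not guaranteed here.
    for key in sorted(queues, reverse=True):
        for req in queues[key]:
            responses.append(completers[req.dst_id].handle(req))
    return responses
```

In this model a read that follows a write to the same address always observes that write, while the relative order of responses for different addresses is left unspecified.
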
  9. Transaction Layer (TL)

    ▪ TL Flit organized as sixteen 4-byte Sectors
    ▪ TL Flit is also divided into Upper and Lower 32-byte Half Flits
    ▪ Control half-flit is used for
      ▪ Requests, read responses, write responses, flow control and NOP indication
    ▪ Data uses half & full Flits
      ▪ Read response data, write data and byte mask, atomic operand data and byte mask
    ▪ Requests & responses may be compressed
      ▪ Uncompressed Requests = 16B
      ▪ Compressed Requests = 8B
      ▪ Uncompressed Responses = 8B
      ▪ Compressed Responses = 4B

    Figure: example flit packings with efficiencies of 95.2% and 92.3% (note: for illustration)
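
The sizes on this slide imply a simple byte budget. The sketch below derives the 64-byte flit and 32-byte half-flit from the sector count, then computes how many requests or responses would fit in one control half-flit by raw byte count. That last step is an assumption for illustration: it ignores any flow-control, NOP, or other fields the control half-flit carries, so the results are upper bounds, and the names are hypothetical.

```python
SECTOR_BYTES = 4
SECTORS_PER_FLIT = 16
TL_FLIT_BYTES = SECTOR_BYTES * SECTORS_PER_FLIT  # 64 bytes
HALF_FLIT_BYTES = TL_FLIT_BYTES // 2             # 32 bytes

# (kind, compressed) -> size in bytes, as listed on the slide
MESSAGE_BYTES = {
    ("request", False): 16,
    ("request", True): 8,
    ("response", False): 8,
    ("response", True): 4,
}


def max_messages_per_half_flit(kind: str, compressed: bool) -> int:
    """Upper bound by byte count only; real control fields are ignored."""
    return HALF_FLIT_BYTES // MESSAGE_BYTES[(kind, compressed)]
```

Under this simplification, compression doubles the number of requests or responses that a half-flit could hold.
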
  10. Data Link Layer (DL) – 640B

    ▪ 640 Byte DL FLIT
      ▪ Flit Header = 3 Bytes
      ▪ Segment Hdr = 5 Bytes
      ▪ CRC = 4 Bytes
      ▪ Efficiency = 628/640 = 98.125%
    ▪ FEC Code Word = 680 Bytes
      ▪ Higher signaling rate (212.5 GT/s) to cover the FEC overhead

    Simplified view for illustration.
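
The arithmetic behind these figures can be checked directly. The sketch below recomputes the DL flit efficiency from the header, segment header, and CRC sizes, and shows that scaling a nominal 200G lane by the ratio of the 680-byte FEC code word to the 640-byte flit gives the slide's 212.5 figure. Treating one code word as carrying exactly one DL flit, and 200 as the nominal per-lane rate, are assumptions made for this illustration.

```python
DL_FLIT_BYTES = 640
FLIT_HEADER_BYTES = 3
SEGMENT_HEADER_BYTES = 5
CRC_BYTES = 4
FEC_CODEWORD_BYTES = 680
NOMINAL_LANE_RATE_G = 200  # assumed nominal per-lane rate ("UALink 200G")

# Bytes left for TL traffic after DL overhead: 640 - 3 - 5 - 4 = 628
payload_bytes = (
    DL_FLIT_BYTES - FLIT_HEADER_BYTES - SEGMENT_HEADER_BYTES - CRC_BYTES
)
dl_efficiency = payload_bytes / DL_FLIT_BYTES

# FEC expands each 640-byte flit to a 680-byte code word (factor 1.0625),
# so the lane must signal faster to sustain the nominal rate.
fec_expansion = FEC_CODEWORD_BYTES / DL_FLIT_BYTES
signaling_rate_g = NOMINAL_LANE_RATE_G * FEC_CODEWORD_BYTES / DL_FLIT_BYTES

print(f"payload bytes:  {payload_bytes}")
print(f"DL efficiency:  {dl_efficiency:.5%}")
print(f"FEC expansion:  {fec_expansion}")
print(f"signaling rate: {signaling_rate_g}")
```
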
  11. Scale-up POD

    ▪ Single-tier switches
    ▪ Number of switch planes scaled with bandwidth per accelerator
    ▪ Number of accelerators per POD is limited by lanes per switch
    ▪ POD may be configured as many virtual PODs
      ▪ Virtual POD reconfiguration does not impact other virtual PODs
      ▪ Error in one virtual POD does not impact another
      ▪ Error recovery expected to be contained to a virtual POD through Port or Station Reset
    ▪ Internal switch errors may impact the entire POD and require an application restart
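
The two sizing rules on this slide can be sketched as a small calculation. The topology here is an assumption for illustration: one switch per plane, each accelerator port attached to a different plane, and every switch giving one port to each accelerator. All parameter values and names are hypothetical; the slide does not give lane counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodPlan:
    max_accelerators: int  # bounded by the ports one switch can expose
    switch_planes: int     # one plane per accelerator port


def plan_single_tier_pod(
    switch_lanes: int, lanes_per_port: int, ports_per_accelerator: int
) -> PodPlan:
    """Size a single-tier pod under the assumptions stated above."""
    if min(switch_lanes, lanes_per_port, ports_per_accelerator) < 1:
        raise ValueError("all parameters must be positive")
    return PodPlan(
        max_accelerators=switch_lanes // lanes_per_port,
        switch_planes=ports_per_accelerator,
    )
```

With a hypothetical 512-lane switch, 4 lanes per port, and 8 ports per accelerator, `plan_single_tier_pod(512, 4, 8)` yields 128 accelerators across 8 switch planes. Adding ports per accelerator adds planes (bandwidth), while the accelerator count stays capped by the switch.
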
  12. Data Flow

    ▪ Accelerators finely interleave (256B) memory channels
      ▪ Maximizes bandwidth to local and peer GPU memory
    ▪ Load/store/atomic memory accesses use small packets
    ▪ Application may communicate with multiple peers simultaneously
    ▪ TL packs requests and responses into the same FLIT
      ▪ Requests and responses to many destinations may be packed together
      ▪ Reduces latency and area
    ▪ TL is a lightweight implementation consuming ~0.3 mm² in N3 technology
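
Fine-grained interleaving at 256 bytes means consecutive 256-byte blocks of an address range land on different memory channels, which is why accesses naturally become small packets. The sketch below assumes a plain round-robin mapping and a caller-supplied channel count; real accelerators may hash addresses differently, and the function names are hypothetical.

```python
INTERLEAVE_BYTES = 256  # interleave granularity from the slide


def channel_for_address(addr: int, num_channels: int) -> int:
    """Round-robin channel selection (an assumed mapping, not the spec's)."""
    return (addr // INTERLEAVE_BYTES) % num_channels


def split_access(addr: int, length: int, num_channels: int):
    """Split a contiguous access into (channel, start, size) pieces
    that never cross a 256-byte interleave boundary."""
    pieces = []
    end = addr + length
    while addr < end:
        boundary = (addr // INTERLEAVE_BYTES + 1) * INTERLEAVE_BYTES
        size = min(boundary, end) - addr
        pieces.append((channel_for_address(addr, num_channels), addr, size))
        addr += size
    return pieces
```

For instance, a 100-byte access starting at address 200 with 4 channels splits into a 56-byte piece on channel 0 and a 44-byte piece on channel 1.
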
  13. Switch & Cluster Management

    • Flexible management models for switches
      • Ethernet-like appliance model
      • Lightweight PCIe-like switch model
    • Common workflows/APIs
    • Leverage industry specifications
      • OCP, CPER, etc.
      • For telemetry, accelerator management, RAS, etc.
  14. In Progress

    ▪ 128G DL/PL Specification
      ▪ Expected release: July 2025
    ▪ In-Network Collectives (INC) Specification
      ▪ Expected release: Dec 2025
    ▪ 128G & 200G UCIe PHY Chiplet Specification
      ▪ Under investigation
  15. Summary

    ▪ UALink addresses industry demand for a scale-up fabric empowering efficient, scalable AI applications
      ▪ Facilitates direct load/store for AI accelerators
      ▪ Open industry standard enables advanced models across multiple AI accelerators
      ▪ Advances large AI model training & inference
    ▪ UALink enables an efficient, low-latency and high-bandwidth interconnect across hundreds of accelerators within a few racks
    ▪ The UALink 200G 1.0 Specification is available for download at: www.ualinkconsortium.org

    Thank you!!