DAOS - IT Press Tour #61 April 2025

The IT Press Tour

April 02, 2025

Transcript

  1. Agenda • DAOS Foundation • Project History & Vision •

    Technical Overview • Software Ecosystem • Roadmap • Deployments & Performance
  2. Mission • The DAOS Foundation exists to ◦ Maintain DAOS

    as an open source project independent of any one organization ◦ Foster the developer and user communities around DAOS ◦ Guide the direction of the overall DAOS project ◦ Promote the use of DAOS • Governing Board ◦ Defines budget and approves expenses ◦ Oversees efforts of other subcommittees ◦ Approves roadmap provided by TSC ◦ Votes on matters as needed
  3. Meetings • Governing Board ◦ Weekly meeting on Wednesday ◦

    Currently open only to Board members • Technical Steering Committee ◦ Weekly on rotating schedule ▪ Monday ▪ Wednesday ◦ Working Groups - rotating schedule
  4. How to Join

    • Two-step process for any organization ◦ Join the Linux Foundation (at any level) ◦ Join the DAOS Foundation • https://daos.io/how-to-join-the-daos-foundation • DAOS Foundation ◦ 3 levels with 5 fees

    DAOS Foundation Membership Level | Annual Fees
    Premier | 25,000 USD
    Premier for LF Associate Members | 15,000 USD
    General | 15,000 USD
    General for LF Associate Members | 6,000 USD
    Associate for LF Associate Members | 0 USD
  5. DAOS Foundation Levels • Premier Membership ◦ Each Premier Member

    can appoint a voting member to the DAOS Foundation’s Governing Board, its Outreach Committee, and to any other committee that the DAOS Foundation may establish (including the TSC). • General Membership ◦ The group of all General Members annually elects up to three voting representatives to the DAOS Foundation’s Governing Board (depending on the number of General Members). ◦ Each General Member can appoint a non-voting member to the DAOS Foundation’s Outreach Committee. • Associate Membership ◦ The Associate Members can participate in the activities of the DAOS Foundation, but have no seat on the Governing Board and no voting rights.
  6. 2024 Expense Summary

    Area | Budget (USD) | Actual Spend | Description
    Community Engagement | 27,500 | 0 | DUG Event(s) and press releases
    Legal | 11,000 | 0 | Trademarks and filings
    Board Operations | 23,750 | 23,750 | LF project management (prorated)
    Development | 18,400 | 3,200 | Cloud/Hosting/Tools, Community travel, CI/CD
    General & Administrative | 8,100 | 10,350 | LF fee on membership revenue (9%)
  7. 2024 Achievements and 2025 Goals 2024 • Added VDURA to

    Foundation • Completed transfer of DAOS assets from Intel to Foundation • Completed charters for foundation and TSC • Regular TSC meetings including collaboration to align v2.6 • DUG’24! 2025 • Recruiting new members • Update website and promotional materials • Complete trademark of DAOS • Release DAOS v2.8 ◦ First community release • Event Planning ◦ In-person DUG event ◦ Virtual DUG event ◦ Continued presence at conferences
  8. TSC Structure • Voting Members ◦ Argonne: Kevin Harms ◦

    Google: corwin ◦ HPE: Lance Evans ◦ Intel: Allison Goodman ◦ Vdura: Brian Mueller ◦ TSC Chair: Johann Lombardi • Meet weekly (public) with rotating schedule ◦ Members distributed across US, EU, China and Australia
  9. TSC Scope • Define community roadmap (2.8+) ◦ Gather contributions

    from all community members ◦ Publish roadmap on https://daos.io • Produce community releases (2.8+) ◦ Track progress, review Jira tickets & test results ◦ Tag release and sign/distribute packages ◦ Provide Docker images • Organize DAOS development ◦ Simplify contributions ◦ Organize gatekeeping (members, responsibilities, process) ◦ Document contribution process
  10. TSC Scope • Community test infrastructure ◦ Goal: artifacts and

    logs available to all contributors ◦ Expand coverage ▪ ARM/AMD ▪ More fabrics ▪ More Linux distributions ▪ Cloud environments ▪ Focus on pmem-less mode • Working groups ◦ Open to anyone ◦ Forums for DAOS users/administrators/contributors to exchange ◦ Rotating schedule
  11. DAOS History (timeline, 2012 to 2025)

    • Programs: Fast Forward Storage & I/O; Extreme Scale Storage & I/O; ECP Pathforward; Coral NRE
    • Prototype over Lustre - Build over ZFS OSD - DAOS API over Lustre
    • Standalone prototype - OS-bypass - Persistent memory via PMDK - Replication & self healing
    • DAOS embedded on FPGA - Disaggregated I/O - Monitoring - NVMe SSD support via SPDK
    • DAOS Productization for Aurora - Hardening - 10+ new features - Support for extra AI/Big data frameworks
    • Intel acquires Whamcloud
    • Releases: v0.1, v0.2, v0.3, v0.4, v0.5, v1.0, v1.2, v2.0, v2.2, v2.4, v2.6
    • Milestones: Intel offers L3 support; Intel discontinues Optane; PMEM-less support; IO500 #1; 11 systems in IO500 top 22; Aurora breaks 8TiB/s; Aurora breaks 20TiB/s; First DAOS ARM system; 4 systems in Prod IO500 top 7 (2 in top 2); DAOS Foundation Inception; v2.6.3; Aurora in Production; Parallelstore GA
  12. DAOS: Nextgen Open Storage Platform

    • Platform for innovation • Files, blocks, objects and more • Full end-to-end userspace • Flexible built-in data protection ◦ EC/replication with self-healing • Flexible network layer • Efficient single server ◦ O(100)GB/s and O(1M) IOPS per server • Highly scalable ◦ TB/s and billions of IOPS of aggregated performance ◦ O(1M) client processes • Time to first byte in O(10) μs
    [Diagram labels: AI/Analytics/Scientific Workflow; GPGPU; CPU; Admin; Compute Instances; RDMA; Files; Blocks; Objects; AI Frameworks; HPC I/O Middleware; Big data Frameworks; libdaos; UCX/Libfabric; DAOS Control Plane; DAOS Engine; DAOS Instances; RPC]
  13. DAOS Design Fundamentals • No read-modify-write on I/O path (use

    versioning) • No locking/DLM (use MVCC) • No client tracking or client recovery • No centralized (meta)data server • No global object table • Non-blocking I/O processing (futures & promises) • Serializable distributed transactions • Built-in multi-tenancy • User snapshot
    [Slide grouping labels: Scalability & Performance; High IOPS; Unique Capabilities]
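
As a rough illustration of the first two points (versioned writes instead of read-modify-write, and MVCC instead of locking), here is a minimal single-process sketch. It is not DAOS code: the `VersionedStore` class, its methods and the integer epochs are hypothetical names chosen for this example only.

```python
import bisect
from collections import defaultdict


class VersionedStore:
    """Toy multi-version key-value store (illustrative only, not the DAOS API)."""

    def __init__(self):
        # key -> parallel lists: sorted epochs and the values written at them
        self._epochs = defaultdict(list)
        self._values = defaultdict(list)
        self._next_epoch = 1

    def write(self, key, value):
        """Append a new version; existing versions are never modified."""
        epoch = self._next_epoch
        self._next_epoch += 1
        self._epochs[key].append(epoch)  # epochs only grow, so the list stays sorted
        self._values[key].append(value)
        return epoch

    def read(self, key, epoch=None):
        """Return the newest value written at or before `epoch` (latest if None)."""
        epochs = self._epochs.get(key, [])
        if not epochs:
            return None
        if epoch is None:
            return self._values[key][-1]
        idx = bisect.bisect_right(epochs, epoch)
        return self._values[key][idx - 1] if idx else None


store = VersionedStore()
e1 = store.write("a", "v1")
e2 = store.write("a", "v2")
snapshot_value = store.read("a", epoch=e1)  # reader pinned to e1 still sees "v1"
latest_value = store.read("a")              # unpinned reader sees "v2"
```

Because a write only appends a version, a reader pinned to an older epoch never has to wait for a writer, which is the property the slide attributes to versioning and MVCC.
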
  14. Storage Pooling - Multi-tenancy

    Pool | Tenant | Capacity | Bandwidth | IOPS
    Pool 1 | Apollo | 100PB | 20TB/s | 200M
    Pool 2 | Gemini | 10PB | 2TB/s | 20M
    Pool 3 | Mercury | 30TB | 80GB/s | 2M
    [Diagram labels: DAOS System; Apollo Tenant, Gemini Tenant, Mercury Tenant; Dataset 1, Dataset 2, Dataset 3, Dataset 4; Engine #1, Engine #2, Engine #3 … Engine #n]
  15. Dataset Management

    • New data model to unwind 30+ years of file-based management • Introduce notion of dataset • Basic unit of storage • Datasets have a type • POSIX datasets can include trillions of files/directories • Advanced dataset query capabilities • Unit of snapshots • ACLs/IAM
    [Diagram: three example datasets: POSIX Dataset (root, dirs, files), Python Dataset (objects), KV Dataset (key/value pairs)]
  16. Object Interface

    • No object create/destroy • No size, permission/ACLs or attributes • Sharded and erasure-coded/replicated • Algorithmic object placement • Very short Time To First Byte (TTFB)
    [Diagram: Middleware/Framework View (e.g. POSIX Dataset with root, dirs, files) mapped to the DAOS Layout View (objects in a DAOS Container, each with a 128-bit object identifier). Labels: Array; Multi-dimensional Array; Key-value Store; Multi-level Key-value Store]
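
To make "algorithmic object placement" concrete, the sketch below derives an object's shard locations purely from its 128-bit identifier and the list of targets, with no lookup table. It uses rendezvous (highest-random-weight) hashing as a stand-in; the slides do not describe DAOS's actual placement algorithm, and the `place` function and the `engine-N` target names are hypothetical.

```python
import hashlib


def place(object_id: int, targets: list[str], shards: int) -> list[str]:
    """Pick `shards` distinct targets for a 128-bit object ID by rendezvous hashing."""
    if not 0 <= object_id < 2**128:
        raise ValueError("object_id must fit in 128 bits")
    if shards > len(targets):
        raise ValueError("not enough targets for the requested shard count")
    oid_bytes = object_id.to_bytes(16, "big")

    def score(target: str) -> int:
        digest = hashlib.sha256(oid_bytes + target.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # Every client computes the same ranking from the same inputs,
    # so no global object table has to be consulted.
    return sorted(targets, key=score, reverse=True)[:shards]


targets = [f"engine-{i}" for i in range(8)]
oid = 0x1234_5678_9ABC_DEF0_1234_5678_9ABC_DEF0
layout = place(oid, targets, shards=3)
```

Any client holding the same target list arrives at the same layout independently, which is one way to avoid both a global object table and a metadata round trip before the first byte.
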
  17. Software Ecosystem

    • Compute: AI/Analytics/Scientific Workflow on GPGPU/CPU Compute Instances
    • Generic I/O Middleware/frameworks: POSIX I/O / “Files” (FUSE & Interception); S3 (Radosgw); Block (NVMe-oF, SPDK DAOS bdev); Python (pydaos); Hadoop (Connector); MPI-IO (DAOS ROMIO); HDF5 (DAOS VOL); PyTorch; TensorFlow
    • Domain-specific data models under development in co-design with partners: SEGY; FDB; ROOT; DAQ
    • libdfs (Parallel Filesystem)
    • libdaos (key-value-array interface): Native array; Native key-value
    • RDMA (UCX/Libfabric)
  18. POSIX Support & Interception

    1. Userspace DFS library with API like POSIX ◦ Requires application changes ◦ Low latency & high concurrency ◦ No caching
    2. DFUSE daemon to support POSIX API ◦ No application changes ◦ VFS mount point & high latency ◦ Caching by Linux kernel
    3. DFUSE + Interception library ◦ No application changes ◦ 2 flavors using LD_PRELOAD ◦ libioil ▪ (f)read/write interception ▪ Metadata via dfuse ◦ libpil4dfs ▪ Data & metadata interception ▪ Aims to deliver the same performance as #1 w/o any application change ▪ Mmap & binary execution via fuse
    [Diagram labels: Application/Framework; Interception Library (libpil4dfs, libioil); DFS - DAOS Filesystem (libdfs); DAOS Library (libdaos); dfuse; Single process address space; Kernel bypass; System calls; Linux Kernel; RPC; RDMA; DAOS Storage Engine; Data & metadata; Data; paths numbered 1, 2, 3, 3a, 3b]
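
Option 3 relies on the standard `LD_PRELOAD` mechanism of the dynamic loader, so the application binary stays untouched and only its environment changes. A minimal sketch of launching a command that way follows; the helper names are hypothetical, and the library path is an assumed install location that may differ on a given system.

```python
import os
import subprocess


def intercepted_env(library_path: str, base_env=None) -> dict:
    """Return a copy of the environment with `library_path` prepended to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    env["LD_PRELOAD"] = f"{library_path}:{existing}" if existing else library_path
    return env


def run_intercepted(command, library_path):
    """Run `command` unmodified, with the interception library preloaded."""
    return subprocess.run(command, env=intercepted_env(library_path), check=True)


# Assumed install location; adjust to wherever the DAOS client libraries live.
PIL4DFS = "/usr/lib64/libpil4dfs.so"
env = intercepted_env(PIL4DFS, base_env={"PATH": "/usr/bin"})
```

A call such as `run_intercepted(["ls", "/mnt/dfuse"], PIL4DFS)` would then run `ls` against a dfuse mount (the mount point here is hypothetical) with data and metadata calls intercepted in userspace.
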
  19. PyTorch DAOS Modules

    • Collaboration between Enakta Labs and Google • DataLoader and Checkpoint modules ◦ Support for both iterable and map-style datasets ◦ High parallelism using several DAOS event queues ◦ Parallel namespace scanning using dfs anchor API
    [Diagram labels: torch_api.py; pytorch.utils.*; torch_shim.c; DAOS Filesystem (libdfs)]
    Time to scan 1.1M files: Regular scan 291s; Optimized scan 32s
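
The scan-time gap comes from scanning the namespace in parallel rather than one directory at a time. The sketch below shows the general idea on a local filesystem with a thread pool, scanning every directory of a tree level concurrently. It is only an analogy: it does not use the dfs anchor API or any DAOS call, and `scan_tree` and `_scan_dir` are hypothetical helper names.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def _scan_dir(path: str):
    """List one directory, returning (file paths, subdirectory paths)."""
    found_files, found_dirs = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                found_dirs.append(entry.path)
            else:
                found_files.append(entry.path)
    return found_files, found_dirs


def scan_tree(root: str, workers: int = 8) -> list[str]:
    """Return all file paths under `root`, scanning each tree level in parallel."""
    files = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = [root]
        while pending:
            # All directories discovered at this level are listed concurrently.
            results = list(pool.map(_scan_dir, pending))
            pending = []
            for found_files, found_dirs in results:
                files.extend(found_files)
                pending.extend(found_dirs)
    return sorted(files)
```

For the figures on the slide, 291s versus 32s is roughly a 9x reduction in scan time for 1.1M files.
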
  20. DAOS Community Roadmap

    Color coding schema: Committed (or released) release/features • In-planning release/features • Future possible release/features

    DAOS 2.6 (Intel Release), Jul’24
    • OS Packages: Leap 15.5; RHEL/Rocky/Alma 8.8/9.2
    • Networking: Change provider w/o reformat; MD duplicate RPC detection
    • Features: Non-PMem support phase 1; libpil4dfs; Intel VMD hotplug; Delayed rebuild
    • Tech preview: Distributed consistency checker (CR)
    • UX Improvements: Improved version interoperability

    DAOS 2.8 (DAOS Foundation Release), Q4’25
    • OS Packages: Leap 15.6; RHEL/Rocky/Alma 8.10/9.4
    • Networking: DOCA-OFED support
    • Features: Optimized object placement; Mount POSIX snapshots RO; Client telemetry; Incremental rebuild/reintegration; Catastrophic recovery and distributed consistency checker; Fault domains beyond servers
    • Tech preview: Non-PMem support phase 2; PyTorch data loader; Rolling upgrade preparation
    • UX Improvements: Reintegration of all pools; daos pool listing

    DAOS 3.0 (DAOS Foundation Release), Q2’26
    • OS Packages: Leap 15.7 (x86_64); RHEL/Rocky/Alma 8.10/9.x (x86_64); RHEL/Rocky/Alma 9.x client (ARM64); Ubuntu 22.04 client (x86_64/ARM64)
    • Features: Non-PMem support phase 2; SSD hotplug & LED without VMD
    • Tech preview: Rolling upgrade; WORM containers phase 1; Multi-provider support; flock support; SSD encryption support via SED

    DAOS 3.x, Future Releases (DAOS Foundation Release), H2’26+
    • OS Packages: Leap 15.7 (x86_64); RHEL/Rocky/Alma 8.10/9.x (x86_64); RHEL/Rocky/Alma 9.x (ARM64/x86_64); Ubuntu 24.04 client (x86_64/ARM64)
    • Features: Pool resizing; Inline compression; Inline encryption; Inline deduplication; Middleware consistency checker; Progressive layout; Pipeline API; SQL support with predicate pushdown; Distributed transactions; Pool/container freeze; CXL SSD support / QLC; Tiered container phase 1; Support for multiple DAOS systems; multi-NIC support per engine/process; hardlinks support in libdfs; network multipath support; Container parking/serialization
  21. Aurora DAOS System • 1024x DAOS Storage nodes ◦ 2x

    Xeon 5320 CPUs (ICX) ◦ 512GB DRAM ◦ 8TB Optane Persistent Memory 200 ◦ 244TB NVMe SSDs ◦ 2x HPE Slingshot NICs • Supported data protection schemes ◦ No data protection ◦ All EC flavors: 2+1, 2+2, 4+1, 4+2, 8+1, 8+2, 16+1 and 16+2 ◦ N-way replication • Usable DAOS capacity ◦ between 220PB and 249PB depending on redundancy level chosen
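
The usable-capacity range can be sanity-checked with simple arithmetic. The sketch below assumes the 244TB of NVMe is per storage node, uses decimal units, and ignores metadata and any other overhead, so it is a naive estimate and not how DAOS reports capacity.

```python
NODES = 1024
NVME_TB_PER_NODE = 244  # from the slide; assumed to be per node, decimal TB

raw_pb = NODES * NVME_TB_PER_NODE / 1000  # 249.856 PB of raw NVMe


def ec_usable_pb(data_cells: int, parity_cells: int) -> float:
    """Usable capacity if all data were stored as EC data_cells+parity_cells."""
    return raw_pb * data_cells / (data_cells + parity_cells)


schemes = [(2, 1), (2, 2), (4, 1), (4, 2), (8, 1), (8, 2), (16, 1), (16, 2)]
usable = {f"{d}+{p}": round(ec_usable_pb(d, p), 1) for d, p in schemes}
# e.g. usable["16+1"] is about 235.2 PB and usable["16+2"] about 222.1 PB
```

Under these assumptions the raw figure lines up with the slide's 249PB upper bound, and 8+1 or 16+2 lands near the 220PB lower bound. Narrower EC layouts such as 2+2 would fall well below that range in this naive model.
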
  22. Aurora IO500 Run

    Features | Values
    Number of MPI tasks/processes | 63k
    Number of DAOS servers | 642
    Number of DAOS engines | 1284
    Largest Pool | 160PiB
    Largest file | 8.5PiB
    Total number of files | 177 billion
    Number of files in a single directory | 33 billion
  23. SuperMUC NG System SuperMUC NG Phase 2 DAOS • 42x

    Lenovo Storage nodes ◦ 2x Xeon 8352Y CPUs (ICX) ◦ 512GB DRAM ◦ 8x 3.84TB NVMe SSDs ◦ 2x HDR IB NICs ◦ 2TB Optane Persistent Memory 200 • 90x Client nodes
  24. SuperMUC NG System Comparison

    SuperMUC NG Phase 2 DAOS • 42x Lenovo Storage nodes ◦ 2x Xeon 8352Y CPUs (ICX) ◦ 512GB DRAM ◦ 8x 3.84TB NVMe SSDs ◦ 2x HDR IB NICs ◦ 2TB Optane Persistent Memory 200 • 90x Client nodes
    IRIS MSKCC WekaIO • 54x Dell Storage nodes ◦ 2x Xeon 5317 CPUs (ICX) ◦ 256GB DRAM ◦ 8x 15TB NVMe SSDs ◦ 2x HDR IB NICs • 261x Client nodes
    Source: https://io500.org/submissions/configuration/719 https://io500.org/submissions/view/683
  25. SuperMUC NG Performance Comparison

    [Chart: SuperMUC NG Phase 2 DAOS vs. IRIS MSKCC WekaIO] Source: https://io500.org/submissions/configuration/719 https://io500.org/submissions/view/683
  26. SuperMUC NG Performance Comparison

    [Chart: SuperMUC NG Phase 2 DAOS vs. IRIS MSKCC WekaIO] Source: https://io500.org/submissions/configuration/719 https://io500.org/submissions/view/683
  27. IO500 Per-server Performance (production list)

    [Charts comparing DAOS (Aurora), DAOS (LRZ), Lustre and Weka] Source: https://io500.org
  28. Resources

    • Foundation website: https://daos.io/ • GitHub: https://github.com/daos-stack/daos • Online doc: https://docs.daos.io • Mailing list & Slack: https://daos.groups.io • YouTube channel: http://video.daos.io • Virtual DAOS User Group on May 22, 2025: https://daos.io/event/virtual-dug-25