
I have come to bury the BIOS, not to open it: The need for holistic systems

Bryan Cantrill
September 19, 2022


Talk given at OSFC 2022 on September 19, 2022 in Gothenburg, Sweden. Video: https://vimeo.com/756050840



Transcript

  1. I have come to bury the
    BIOS, not to open it
    The need for holistic systems
    Bryan Cantrill
    Oxide Computer Company


  2. OXIDE
    In the beginning…
    • In the beginning, computing systems were holistic: hardware and
    software were designed together, to work with one another
    • System software was delivered with the computer – but often had to be
    newly developed for a new machine
    • Requiring new software for new hardware created schedule delays, viz.
    OS/360 and The Mythical Man-Month
    • With the advent of Unix, system software became portable – it could be
    ported rather than developed de novo for each new computer…


  3. OXIDE
    Unix spreads – and feuds
    • The portability of Unix accelerated the minicomputer and workstation
    revolutions, with each manufacturer having its own variants
    • The systems of this era remained broadly holistic: the hardware and
    software were (broadly) designed with the other in mind
    • …but despite the original ethos of Unix, the variants themselves
    remained entirely proprietary – and the differences between them ignited
    the Unix Wars of the 1980s and 1990s


  4. OXIDE
    Elsewhere, homebrew computing
    • With the rise of the microcomputer, computing became much more
    broadly available in the 1970s – but a nearly absurd variety of
    hardware made software standardization challenging
    • The hardware-specific half of CP/M – the dominant microcomputer OS
    of the 1970s – was the Basic Input Output System, and could be
    delivered separately
    • This gave rise to hardware vendors delivering ROMs that contained
    platform enablement code roughly standardized as a “System BIOS”


  5. OXIDE
    The IBM PC era
    • With the emergence of the IBM PC – and its de facto standardization by
    Compaq – the system software/BIOS split became irreconcilable
    • Essential hardware enabling-software was driven into the BIOS
    • The BIOS interface became what system software bound to – it became
    the definition of “compatibility”
    • Worse, the software components on both sides of the BIOS/OS divide
    were nearly exclusively proprietary, serving to harden the boundary


  6. OXIDE
    It gets worse: SMM
    • In order to be able to implement system software functionality delivered by
    the hardware – e.g., laptop suspend and resume – system management
    mode was invented
    • SMM allows effectively arbitrary, hidden code execution at arbitrary times
    without even allowing system software awareness
    • This is the opposite of a holistic system: it is one that has been
    deliberately and perniciously divided!


  7. OXIDE
    EFI/UEFI
    • All of this might have been fine had x86 remained relegated to personal
    computing…
    • …but Intel and AMD out-executed the RISC vendors in the 2000s,
    forcing PC constructs into the server space
    • Starting with (ill-fated) Itanium, Intel introduced EFI in an attempt to
    modernize…


  8. OXIDE
    UEFI: What might have been
    Source: Beyond BIOS: Developing with the Unified Extensible Firmware Interface


  9. OXIDE
    UEFI: What happened instead
    • While its goals were laudable, UEFI was overconstrained
    • In particular, the need for legacy and Windows compatibility required
    UEFI to support all past abstractions
    • UEFI has become the worst of all worlds: complicated, proprietary
    software that remains at once isolated from – yet also still entirely
    entangled with! – system software
    • UEFI has become so entangled with lowest-level platform enablement
    that non-UEFI platforms are effectively impossible


  10. OXIDE
    It gets worse, again: Hidden cores
    • A dividend of Moore’s Law: formerly discrete components were
    increasingly pulled first into large ASICs – and then pulled on-die into a
    system-on-a-chip
    • Especially as I/O was brought directly into the die, CPUs developed an
    increasing number of non-architectural cores to manage it
    • But these cores are hidden to system software – the operating system
    is being confined to an increasingly narrow slice of the true hardware
    capabilities of the system…


  11. OXIDE
    …which is not lost on everyone!
    Timothy Roscoe, OSDI 2021 Keynote, It's Time for Operating Systems to Rediscover Hardware


  12. OXIDE
    The battle for non-architectural cores
    • Roscoe (rightfully) calls this a “security catastrophe”
    • The non-architectural cores are – on x86 CPUs anyway – entirely
    proprietary, with all of the concomitant problems; that the system is
    “open source” is increasingly a myth
    • Roscoe correctly identifies the problem, but understates the severity:
    this isn’t a retreat of Linux – it is a resurgence of proprietary operating
    systems, wrapping themselves in firmware


  13. OXIDE
    Is an open source BIOS the answer?
    • An open source BIOS is certainly valuable and laudable – but if history is
    any guide, it is also not sustainable
    • The problem is not (merely) the proprietary BIOS – it is the ubiquity of the
    abstraction that splits our stack into open and proprietary halves
    • The presence of a deeply proprietary platform enablement layer allows
    for wildly complicated SoCs to have vast, undocumented elements – the
    implementation of the firmware has become the documentation!
    • We need a different model


  14. OXIDE
    The need for (a return to) holistic systems
    • The platform enablement boundary as we know it today is largely
    vestigial – it serves to create abstractions that are broadly unnecessary
    • We need systems that obliterate these boundaries – that are rather
    holistic systems in which software and hardware are co-designed
    • Resetting system state over the course of booting is not holistic!
    • Holistic systems require us to be willing to take up Roscoe’s challenge
    and adopt SoC specificity in our operating systems


  15. OXIDE
    Oxide’s approach
    • At Oxide, we are taking a from-scratch, rack-scale approach to
    server-side computing, with AMD Milan-based sleds of our own design
    • We do not have a traditional BMC, but rather a fit-to-purpose service
    processor (an STM32H753) and RoT (LPC55S28), both running our own
    (Rust-based, open source) OS, Hubris (see Cliff Biffle’s OSFC 2021 talk!)
    • Our approach is holistic but open
    • Could we develop a truly holistic system on x86?


  16. OXIDE
    Aside: AMD Details
    • On AMD, the Platform Security Processor (PSP) is a non-architectural
    core that executes proprietary software to perform system initialization –
    including DRAM training
    • The system management controller (the SP in our case) puts the PSP
    payload into SPI flash and brings the CPU out of reset
    • The PSP will perform its initialization and eventually vector into host
    software executing on the bootstrap core (BSC)
    • Historically, post-PSP initialization was done by AMD’s AGESA firmware –
    which makes a holistic system impossible
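
The sequence on this slide can be read as a small state machine. The Rust sketch below is only an illustration of that ordering: the `BootStep` enum, its variant names, and the `actor` labels are hypothetical and do not come from AMD or Oxide code.

```rust
// Illustrative only: the enum and its names are assumptions made for this
// sketch, not identifiers from any AMD or Oxide source.

/// The boot steps listed on the slide, in order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BootStep {
    /// The service processor places the PSP payload in SPI flash.
    SpWritesPspPayload,
    /// The service processor brings the CPU out of reset.
    SpReleasesReset,
    /// The PSP runs its initialization, including DRAM training.
    PspInitializes,
    /// The PSP vectors into host software on the bootstrap core.
    HostSoftwareOnBsc,
}

impl BootStep {
    /// The step that follows this one, or `None` at the end of the sequence.
    fn next(self) -> Option<BootStep> {
        match self {
            BootStep::SpWritesPspPayload => Some(BootStep::SpReleasesReset),
            BootStep::SpReleasesReset => Some(BootStep::PspInitializes),
            BootStep::PspInitializes => Some(BootStep::HostSoftwareOnBsc),
            BootStep::HostSoftwareOnBsc => None,
        }
    }

    /// Which component is doing the work at this step.
    fn actor(self) -> &'static str {
        match self {
            BootStep::SpWritesPspPayload | BootStep::SpReleasesReset => "SP",
            BootStep::PspInitializes => "PSP",
            BootStep::HostSoftwareOnBsc => "BSC",
        }
    }
}

/// Walk the sequence from the first step to the last.
fn boot_sequence() -> Vec<BootStep> {
    let mut steps = Vec::new();
    let mut current = Some(BootStep::SpWritesPspPayload);
    while let Some(step) = current {
        steps.push(step);
        current = step.next();
    }
    steps
}

fn main() {
    for step in boot_sequence() {
        println!("{:>3}: {:?}", step.actor(), step);
    }
}
```

In this model host software only appears at the final step; everything before it runs on cores that the operating system does not see.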


  17. OXIDE
    Challenge #1: Initialization
    • To implement holistic boot, system software must perform the activities
    historically done by AGESA
    • Modern CPUs are very complicated! Post-PSP initialization includes
    configuring I/O interconnects, core complexes, etc.
    • For AMD Milan, this specifically includes DXIO engine configuration,
    NBIO PCIe strapping, hotplug configuration
    • The software that implements this level of initialization has
    historically been written by the CPU vendor; these interfaces are not
    always documented thoroughly – if at all!


  18. OXIDE
    Challenge #2: Boot Phasing
    • The payload that boots from the PSP is size-constrained to ~13 MB
    • Stage-based approaches (e.g., oreboot + LinuxBoot) use Linux drivers
    to load (and execute) a production kernel
    • This necessitates a pseudo-reset of the system – as well as the creation
    or emulation of an interface (e.g., ACPI) to pass system state to later
    stages
    • We instead adopt a phase-based approach whereby part of the
    system is loaded from SPI NOR and is able to load the remainder from
    SSDs – but the system is never discarded
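
The difference between the two approaches can be shown with a toy model. The Rust sketch below is a simplification under stated assumptions: `System`, `HandoffTable`, the device name `m2-ssd`, and the idea of counting device initializations are all hypothetical, chosen only to show that a staged boot rebuilds state behind a handoff interface while a phased boot keeps extending one system.

```rust
// Toy model only: every type, function, and device name here is hypothetical.

/// Everything a running system has loaded and set up so far.
#[derive(Debug, Default)]
struct System {
    modules: Vec<String>,
    devices: Vec<String>,
}

impl System {
    fn load(&mut self, module: &str) {
        self.modules.push(module.to_string());
    }

    /// Initialize a device and record that the work was done in `init_log`.
    fn init_device(&mut self, name: &str, init_log: &mut Vec<String>) {
        init_log.push(format!("init {name}"));
        self.devices.push(name.to_string());
    }
}

/// Stand-in for an ACPI-like interface that carries state between stages.
struct HandoffTable {
    device_names: Vec<String>,
}

/// Stage-based boot: the first stage is discarded and only the table survives.
fn staged_boot(init_log: &mut Vec<String>) -> System {
    let mut first = System::default();
    first.load("first-stage kernel");
    first.init_device("m2-ssd", init_log);

    let table = HandoffTable { device_names: first.devices.clone() };
    drop(first); // pseudo-reset: the first stage's state is gone

    // The later stage starts from scratch and repeats the device setup.
    let mut production = System::default();
    production.load("production kernel");
    for name in &table.device_names {
        production.init_device(name, init_log);
    }
    production
}

/// Phase-based boot: one system keeps running and loads the rest of itself.
fn phased_boot(init_log: &mut Vec<String>) -> System {
    let mut system = System::default();
    system.load("phase 1 (from SPI NOR)");
    system.init_device("m2-ssd", init_log);
    system.load("phase 2 (from SSD)");
    system
}

fn main() {
    let mut staged_log = Vec::new();
    let staged = staged_boot(&mut staged_log);
    let mut phased_log = Vec::new();
    let phased = phased_boot(&mut phased_log);
    println!("staged: {:?}, {} device inits", staged.modules, staged_log.len());
    println!("phased: {:?}, {} device inits", phased.modules, phased_log.len());
}
```

In the staged path the handoff table is the only thing that crosses the boundary, so the later stage has to redo device setup; in the phased path there is no boundary to cross.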


  19. OXIDE
    Holistic booting!
    • Helios is our illumos derivative that includes the Oxide bhyve-based
    hypervisor – and runs our rack-wide control plane
    • We have holistic Helios booting on our EVT compute sleds, including all
    necessary functionality for platform initialization (I/O, SMP, etc.)
    • Phased boot has enough in SPI to be able to import ZFS pools from M.2
    devices
    • Helios – along with all Oxide-authored software – will be open source
    when we ship our first racks at the end of the year!


  20. OXIDE
    Towards holistic systems
    • Holistic systems have clear advantages in terms of reliability, security,
    observability, manageability, sustainability, etc.
    • Based on our experience to date, holistic systems are challenging to
    implement but emphatically attainable
    • Documentation from microprocessor vendors is essential; they
    have much to gain by encouraging more software on their platforms!
    • Oxide may represent the first open, holistic server-side system in the
    post-PC x86 era – but it is unlikely to be the last!
