were holistic: hardware and software were designed together, to work with one another • System software was delivered with the computer – but often had to be newly developed for a new machine • Requiring new software for new hardware created schedule delays, viz. OS/360 and The Mythical Man-Month • With the advent of Unix, system software became portable – it could be ported rather than developed de novo for each new computer…
Unix accelerated the minicomputer and workstation revolutions, with each manufacturer having its own variants • The systems of this era remained broadly holistic: the hardware and software were (broadly) designed with the other in mind • …but despite the original ethos of Unix, the variants themselves remained entirely proprietary – and the differences between them ignited the Unix Wars of the 1980s and 1990s
microcomputer, computing became much more broadly available in the 1970s – but a nearly absurd variety of hardware made software standardization challenging • The hardware-specific half of CP/M – the dominant microcomputer OS of the 1970s – was the Basic Input Output System, and could be delivered separately • This gave rise to hardware vendors delivering ROMs that contained platform enablement code roughly standardized as a “System BIOS”
the IBM PC – and its de facto standardization by Compaq – the system software/BIOS split became irreconcilable • Essential hardware-enabling software was driven into the BIOS • The BIOS interface became what system software bound to – it became the definition of “compatibility” • Worse, the software components on both sides of the BIOS/OS divide were nearly exclusively proprietary, serving to harden the boundary
able to implement system software functionality delivered by the hardware – e.g., laptop suspend and resume – system management mode was invented • SMM allows effectively arbitrary, hidden code execution at arbitrary times, without system software even being aware of it • This is the opposite of a holistic system: it is one that has been deliberately and perniciously divided!
had x86 remained relegated to personal computing… • …but Intel and AMD out-executed the RISC vendors in the 2000s, forcing PC constructs into the server space • Starting with (ill-fated) Itanium, Intel introduced EFI in an attempt to modernize…
laudable, UEFI was overconstrained • In particular, the need for legacy and Windows compatibility required UEFI to support all past abstractions • UEFI has become the worst of all worlds: complicated, proprietary software that remains at once isolated from – yet also still entirely entangled with! – system software • UEFI has become so entangled with lowest-level platform enablement that non-UEFI platforms are effectively impossible
of Moore’s Law: formerly discrete components were increasingly pulled first into large ASICs – and then pulled on-die into a system-on-a-chip • Especially as I/O was brought directly into the die, CPUs developed an increasing number of non-architectural cores to manage it • But these cores are hidden from system software – the operating system is being confined to an increasingly narrow slice of the true hardware capabilities of the system…
this a “security catastrophe” • The non-architectural cores are – on x86 CPUs anyway – entirely proprietary, with all of the concomitant problems; that the system is “open source” is increasingly a myth • Roscoe correctly identifies the problem, but understates the severity: this isn’t a retreat of Linux – it is a resurgence of proprietary operating systems, wrapping themselves in firmware
open source BIOS is certainly valuable and laudable – but if history is any guide, it is also not sustainable • The problem is not (merely) the proprietary BIOS – it is the ubiquity of the abstraction that splits our stack into open and proprietary halves • The presence of a deeply proprietary platform enablement layer allows for wildly complicated SoCs to have vast, undocumented elements – the implementation of the firmware has become the documentation! • We need a different model
• The platform enablement boundary as we know it today is largely vestigial – it serves to create abstractions that are broadly unnecessary • We need systems that obliterate these boundaries – that are rather holistic systems in which software and hardware are co-designed • Resetting system state over the course of booting is not holistic! • Holistic systems require us to be willing to take up Roscoe’s challenge and adopt SoC specificity in our operating systems
from-scratch, rack-scale approach to server-side computing, with AMD Milan-based sleds of our own design • We do not have a traditional BMC, but rather a fit-to-purpose service processor (an STM32H753) and RoT (LPC55S28), both running our own (Rust-based, open source) OS, Hubris (see Cliff Biffle’s OSFC 2021 talk!) • Our approach is holistic but open • Could we develop a truly holistic system on x86?
Processor (PSP) is a non-architectural core that executes proprietary software to perform system initialization – including DRAM training • System management controller (SP in our case) puts the PSP payload into SPI flash and brings the CPU out of reset • The PSP will perform its initialization and eventually vector into host software executing on the bootstrap core (BSC) • Historically, post-PSP initialization was done by AMD’s AGESA firmware – which makes a holistic system impossible
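The handoff sequence on this slide can be sketched as a small state machine. This is a toy model only: the names `BootState`, `BootEvent`, `step`, and `run` are hypothetical and do not correspond to any AMD or Oxide interface. It simply encodes the ordering stated above (SP writes the payload, SP releases reset, PSP initializes, PSP vectors to the host on the BSC) and rejects events that arrive out of order.

```rust
/// Hypothetical model of the boot handoff; names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BootState {
    /// CPU held in reset; the SP has not yet written the PSP payload.
    InReset,
    /// The SP has placed the PSP payload in SPI flash.
    PayloadInFlash,
    /// The SP has released reset; the PSP is initializing (e.g. DRAM training).
    PspInitializing,
    /// The PSP has vectored into host software on the bootstrap core (BSC).
    HostOnBsc,
}

/// Hypothetical events that move the handoff forward.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BootEvent {
    SpWritesPayload,
    SpReleasesReset,
    PspVectorsToHost,
}

/// Apply one event; any event that is out of order is an error.
pub fn step(state: BootState, event: BootEvent) -> Result<BootState, String> {
    use BootEvent::*;
    use BootState::*;
    match (state, event) {
        (InReset, SpWritesPayload) => Ok(PayloadInFlash),
        (PayloadInFlash, SpReleasesReset) => Ok(PspInitializing),
        (PspInitializing, PspVectorsToHost) => Ok(HostOnBsc),
        (s, e) => Err(format!("event {:?} is not valid in state {:?}", e, s)),
    }
}

/// Run a sequence of events starting from reset, stopping at the first error.
pub fn run(events: &[BootEvent]) -> Result<BootState, String> {
    events
        .iter()
        .try_fold(BootState::InReset, |state, event| step(state, *event))
}
```

Feeding `run` the three events in order ends in `BootState::HostOnBsc`; releasing reset before the payload is written returns an error. Everything after `HostOnBsc` is the post-PSP initialization that AGESA has historically owned.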
software must perform the activities historically done by AGESA • Modern CPUs are very complicated! Post-PSP initialization includes configuring I/O interconnects, core complexes, etc. • For AMD Milan, this specifically includes DXIO engine configuration, NBIO PCIe strapping, hotplug configuration • Software implementing this level of initialization has historically been written by the CPU vendor; these interfaces are not always documented thoroughly – if at all!
PSP is size-constrained to ~13MB • Stage-based approaches (e.g., oreboot + LinuxBoot) use Linux drivers to load (and execute) a production kernel • This necessitates a pseudo-reset of the system – as well as the creation or emulation of an interface (e.g., ACPI) to pass system state to later stages • We instead adopt a phase-based approach whereby part of the system is loaded from SPI NOR and is able to load the remainder from SSDs – but the system is never discarded
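The difference between the two approaches can be illustrated with a toy model. This is a sketch under stated assumptions, not Oxide's implementation: `SystemState`, `Handoff`, the device names, and the log strings are all hypothetical. In the staged version, only what fits through the narrow handoff interface survives the pseudo-reset; in the phased version, the same state is carried through.

```rust
/// Hypothetical in-memory state accumulated while booting.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct SystemState {
    /// Devices initialized so far (names are made up for illustration).
    pub devices: Vec<String>,
    /// Everything else the running system knows; here, just a log.
    pub log: Vec<String>,
}

/// Hypothetical narrow interface between stages, standing in for
/// something like ACPI tables. It can describe devices, nothing more.
pub struct Handoff {
    pub devices: Vec<String>,
}

/// Early initialization, run from the image in SPI NOR.
fn early_init(state: &mut SystemState) {
    state.devices.push("spi-nor".to_string());
    state.devices.push("nvme0".to_string());
    state.log.push("early init complete".to_string());
}

/// Later work, once the remainder has been loaded from SSD.
fn load_remainder(state: &mut SystemState) {
    state.log.push("remainder loaded from SSD".to_string());
}

/// Stage-based boot: the first stage is discarded (a pseudo-reset) and the
/// next stage rebuilds its view of the system from the handoff alone.
pub fn staged_boot() -> SystemState {
    let mut first = SystemState::default();
    early_init(&mut first);
    let handoff = Handoff { devices: first.devices.clone() };
    drop(first); // pseudo-reset: the first stage's state is gone
    let mut second = SystemState { devices: handoff.devices, log: Vec::new() };
    load_remainder(&mut second);
    second
}

/// Phase-based boot: one system, one state, never discarded.
pub fn phased_boot() -> SystemState {
    let mut state = SystemState::default();
    early_init(&mut state);
    load_remainder(&mut state);
    state
}
```

Both functions end with the same device list, but `staged_boot()` has lost the first stage's log entry while `phased_boot()` retains both. In a real system, the lost state is whatever the handoff interface cannot express.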
includes the Oxide bhyve-based hypervisor – and runs our rack-wide control plane • We have holistic Helios booting on our EVT compute sleds, including all necessary functionality for platform initialization (I/O, SMP, etc.) • Phased boot has enough in SPI to be able to import ZFS pools from M.2 devices • Helios – along with all Oxide-authored software – will be open source when we ship our first racks at the end of the year!
in terms of reliability, security, observability, manageability, sustainability, etc. • Based on our experience to date, holistic systems are challenging to implement but emphatically attainable • Documentation from microprocessor vendors is essential; they have much to gain by encouraging more software on their platforms! • Oxide may represent the first open, holistic server-side system in the post-PC x86 era – but it is unlikely to be the last!