It is perhaps too obvious to be said: software must execute within the confines of hardware — and the hardware is the ultimate arbiter of performance • The history of improving systems performance consists broadly of: ◦ Process revolutions that propelled all designs ◦ Architectural revolutions that optimized for certain use cases ◦ Software revolutions that allowed us to better utilize hardware • All of these revolutions must live within the confines of economics!
Gordon Moore did not actually coin a law per se — he just made a bunch of incredibly astute and prescient observations • The term “Moore’s Law” would be coined by Carver Mead in 1971 as part of his work on determining ultimate physical limits • Moore updated the law in 1975 to be a doubling of transistor density every two years (Dennard scaling would be outlined in detail in 1974) • For many years, Moore’s Law could be inferred to be doublings of transistor density, speed, and economics
The 1980s and early 1990s were great for Moore’s Law — so much so that computers needed a “turbo button” to counteract its effects (!!) • But even in those halcyon years, Moore’s Law was leaving DRAM behind: memory was becoming denser but no faster • An increasing number of workloads began hitting the memory wall • Caching — an essential architectural revolution born in the 1960s — was necessary but insufficient...
By the mid-1990s, it had become clear that symmetric multiprocessing was the path to deliver throughput on multi-threaded workloads • SMP necessitated its own software revolution (multi-threaded systems), but did little for single-threaded latency • Deep pipelining and VLIW were — largely — failed experiments • For single-threaded workloads, microprocessors turned to out-of-order and speculative execution to hide memory latency • Even in simpler times, scaling with Moore’s Law was a challenge!
In 2018, GlobalFoundries stopped 7nm development, citing economics — it was simply too expensive to stay competitive • GlobalFoundries’ departure left TSMC and Samsung on 7nm — and Intel on 14nm, struggling to get to 10nm • Intel’s Cannon Lake was three years late and an unmitigated disaster — and for Ice Lake/Cascade Lake, Intel is intermixing 14nm and 10nm • Moving to 3nm/5nm requires moving beyond FinFETs to GAAFETs — and to EUV photolithography; new nodes are very expensive!
When a process node is “7nm” or “5nm”, what exactly is seven nanometers or five nanometers long? (And, um, how big is a silicon atom anyway?) • Answer to the second question: ~210 picometers! • Answer to the first question: nothing! Unbelievably, the name of the process node no longer measures anything at all (!!) — it is merely a rough expression of transistor density (and an implication of process) • E.g. 7nm ≈ 100 MTr/mm² (but there are lots of caveats)
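To make that density figure concrete, here is a back-of-the-envelope calculation; the 100 mm² die size is a hypothetical chosen for round numbers, not a figure from this talk:

```latex
% Rough arithmetic, assuming the ~100 MTr/mm^2 figure above and a
% hypothetical 100 mm^2 die (die size chosen purely for illustration):
\[ 100\ \mathrm{MTr/mm^2} \times 100\ \mathrm{mm^2}
   = 10{,}000\ \mathrm{MTr} \approx 10^{10}\ \text{transistors per die} \]
```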
Moore’s Law is continuing to be possible, but at a greatly slowed pace — and at outsized cost • Moore’s Law has ceased to exist as an economic law • But is there another way of looking at it?
In 1936, Theodore Wright studied the costs of aircraft manufacturing, finding that the cost dropped with experience • Over time, when volume doubled, unit costs dropped by 10-15% • This phenomenon has been observed in other technological domains • In 2013, Jessika Trancik et al. found Wright’s Law to have better predictive power for transistor cost than Moore’s Law! • Wright’s Law seems to hold, especially for older process nodes
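As a rough formalization of that learning-curve effect (the notation here is illustrative, not taken from Wright’s paper or the 2013 study):

```latex
% Wright's Law as a power law in cumulative volume n (illustrative notation):
\[ C(n) = C(1)\, n^{-a}, \qquad \frac{C(2n)}{C(n)} = 2^{-a} \]
% A 10-15% cost drop per doubling of volume corresponds to
\[ a = -\log_2(0.90) \approx 0.15 \quad \text{to} \quad a = -\log_2(0.85) \approx 0.23 \]
```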
If Wright’s Law continues to hold, compute will be economically viable in more and more places that were previously confined to hard logic • This is true even on die, where chiplets have made it easier than ever to build a heterogeneous system — and where mixed process nodes have demanded more sophistication • Quick, how many cores are on your server? (Don’t forget the hidden ones!)
Compute in many more places is particularly germane to system performance: ◦ More compute close to data (SmartNICs, open-channel SSDs, on-spindle compute) lowers latency ◦ Bringing data to special-purpose compute (GPGPUs, FPGAs) increases throughput • But security and multi-tenancy cannot be an afterthought! • We need to rethink our system software
This new compute is, at some level, special purpose • These systems are much less balanced than our general-purpose systems — with much less memory and/or non-volatile storage • The overhead of dynamic environments (Java, Go, Python, etc.) is unacceptably high — and the development benefit questionable • Languages traditionally used in this domain — C and C++ — both have well-known challenges around safety and composability • Enter Rust, and its killer feature...
Rust is remarkable in many respects, but one that may be underappreciated is its ability to not depend on its own standard library • Much of what is valuable about the language — sum types, ownership model, traits, hygienic macros — is in core, not the standard library • Crates marked “no_std” do not link the standard library — and without an allocator, any heap allocation is a compile-time error! • But no_std crates can depend on other no_std crates — lending real composability to a domain that has been entirely deprived of it
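As an illustration of what that composability looks like, here is a minimal sketch of a no_std library crate; the crate contents and function are hypothetical, not drawn from any real project:

```rust
// A minimal no_std library sketch: it depends only on libcore, performs no
// heap allocation, and can be used by other no_std crates.
// (Hypothetical example for illustration.)
#![no_std]

/// Parse a big-endian u16 from the front of a buffer, using only core
/// facilities (slice patterns, Option, u16::from_be_bytes).
pub fn parse_u16_be(buf: &[u8]) -> Option<u16> {
    match buf {
        [hi, lo, ..] => Some(u16::from_be_bytes([*hi, *lo])),
        _ => None,
    }
}
```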
These systems can be very small ◦ E.g., at Oxide, we are developing a message-passing, memory-protected system entirely in Rust (Rust microkernel, Rust tasks); minimal systems are 30K — and entirely realistic ones are < 200K! • no_std is without real precedent in other languages or environments; it allows Rust to be put in essentially arbitrarily confined contexts • Rust is the first language since C to meaningfully exist at the boundary of hardware and systems software!
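For a sense of what an “arbitrarily confined context” looks like, here is a minimal freestanding no_std binary sketch; the entry-point name and empty loops are placeholder assumptions, and this is not Oxide’s actual code:

```rust
// A bare-metal no_std binary sketch: no standard library, no OS, no heap.
// Entry point and loop bodies are placeholders for illustration only.
#![no_std]
#![no_main]

use core::panic::PanicInfo;

// The symbol a hypothetical loader or reset vector would jump to.
#[no_mangle]
pub extern "C" fn _start() -> ! {
    loop {
        // Real firmware would initialize hardware and run its tasks here.
    }
}

// Without the standard library, the program must supply its own panic handler.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
```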
Wright’s Law will continue to hold, resulting in more compute in more places — and this compute will be essential for systems performance • These compute elements will be increasingly special-purpose, and are going to require purpose-fit software • Rust is proving to be an excellent fit for these use cases! • We fully expect many more open source, de novo hardware-facing Rust-based systems — and thanks to no_std they will be able to leverage one another; the Rust revolution is here!