Intelligence is not enough: The humanity of engineering

Bryan Cantrill
October 06, 2023

Presentation that I gave at Monktoberfest 2023. Video at https://www.youtube.com/watch?v=bQfJi7rjuEk

Transcript

  1. Intelligence is not enough
    The humanity of engineering
    Bryan Cantrill
    Oxide Computer Company

  2. OXIDE
    It always starts with a tweet…

  3. OXIDE
    It always starts with a tweet being trolled…

  4. OXIDE
    It always starts with a tweet being trolled…

  5. OXIDE
    It always starts with a tweet being trolled…

  6. OXIDE
    It always starts with a tweet being trolled…

  7. OXIDE
    “Serious”?
    • This tweet used the word “serious” three times, mainly to deride others
    • Not clear what “serious” means in the context of an argument that
    equates a computer program with nuclear weapons?
    • Or accuses anyone who disagrees with this assessment of “just vibes”?
    • Or one that puts the risk of human extinction at the (metaphorical!)
    hands of a computer program at 5%, with zero methodology?
    • So, a serious question: why treat this seriously at all?

  8. OXIDE
    Reasons to treat this seriously
    • Fear of technology isn’t new – and isn’t always poorly founded!
    • New technologies often have unintended consequences and
    externalities that merit consideration and discussion
    • But in those who believe in AI-based extinction risk, the fear itself is
    alarming – in part because of the actions that it would justify
    • The “AI pause” – if implemented – would be brazenly authoritarian
    • The accompanying rhetoric is often disturbingly violent

  9. OXIDE
    Concrete extinction risk
    • Most AGI-based extinction risk fears – when made concrete – hinge on:
    ○ A computer program getting ahold of nuclear weapons
    ○ A computer program making a novel bioweapon
    ○ A computer program developing novel molecular nanotechnology
    • We are going to leave aside nuclear weapons, as indisputably serious
    people have been thinking about it since the dawn of the atomic age
    • But the latter two have something important in common…

  10. OXIDE
    Superintelligent engineering?
    • Whether stated explicitly or not, when we talk about the fear of a
    superintelligent AI actively killing not just some humans but all of them,
    we are talking about AI making weapons
    • Let us leave aside many questions about such scenarios (e.g., AI’s
    alignment, motivation, or means of production – and human adaptability,
    countermeasures, and resilience), and focus on one pillar…
    • It depends on AI applying the constraints of physical and
    mathematical reality to make new stuff – which is to say, engineering

  11. OXIDE
    Engineering and intelligence
    • If our very existence is threatened by a superintelligence engaged in
    engineering, it prompts an important question…
    • Is engineering an act of intelligence alone?
    • I can’t speak to building novel bioweapons or the significant challenges
    in reviving otherwise moribund molecular nanotechnology…
    • …but we do have a bunch of recent experience building something big
    and new that is surely simpler than these domains

  12. OXIDE
    What we built!

  13. OXIDE
    Building a computer
    • In case it needs to be said: building a new computer + new network
    switch + high-speed backplane + all software from lowest levels of
    firmware to highest levels of control plane is hard and complicated
    • It is still, however, engineering, not science
    • Engineering is the act of learning from failure: even when building anew,
    there will be many occasions when the system does not, in fact, work!
    • It is worth exploring a tiny fraction of the failures that we endured in
    building, as they are instructive as to the nature of engineering…

  14. OXIDE
    Failure to bring CPU out of reset
    • Despite following the documented power sequencing to the CPU (AMD
    Milan), it was refusing to come out of reset, simply reinitiating the
    power-on sequence after 1.25 seconds of inactivity
    • Natural assumption was that power was marginal – but the power
    looked good (and making it extraordinary didn’t change anything)
    • Went down any number of blind alleys, performing directed experiments
    with respect to non-connected pins that shouldn’t make any difference
    • These experiments weren’t easy!

  15. OXIDE
    Failure to bring CPU out of reset

  16. OXIDE
    Failure to bring CPU out of reset
    • After several weeks of debugging, we discovered that our voltage
    regulator had a firmware bug: it adjusted voltage as requested by the
    CPU via SVI2 – but never sent a completion (VOTF Complete)
    • The CPU had no way of knowing that the power was in fact correct
    • AMD’s tool for verifying power (SDLE) did not check for this packet
    • Corrected regulator firmware resulted in the CPU coming out of reset!
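
The failure mode on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Regulator` class, the `power_on` function, and the 1.1 V target are hypothetical names and values chosen for illustration, and nothing here reflects the actual SVI2 wire protocol or the real regulator firmware. It captures only the logic described above: the voltage is adjusted as requested, but without a completion the CPU has no way of knowing that.

```python
from dataclasses import dataclass


@dataclass
class Regulator:
    """Hypothetical voltage regulator model (not the real firmware)."""
    sends_completion: bool
    output_volts: float = 0.0

    def request(self, volts: float) -> bool:
        # The regulator always applies the requested voltage...
        self.output_volts = volts
        # ...but only a correct one reports completion back to the CPU.
        return self.sends_completion


def power_on(regulator: Regulator, target_volts: float) -> str:
    """Hypothetical CPU-side view: proceed only if a completion arrives."""
    completed = regulator.request(target_volts)
    if completed:
        return "out of reset"
    # No completion: the CPU cannot tell that power is good, so it starts over.
    return "restart power-on sequence"


# Assumed example values: both regulators are asked for the same voltage.
buggy = Regulator(sends_completion=False)
fixed = Regulator(sends_completion=True)
buggy_result = power_on(buggy, 1.1)
fixed_result = power_on(fixed, 1.1)
```

In this model the buggy regulator ends up at exactly the requested voltage, which is consistent with the slide's point that measuring the power would not reveal the problem.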

  17. OXIDE
    Failure to bring NIC out of reset
    • We could not get the Chelsio NIC to come out of reset
    • Extensive validation did not reveal any signal that was out of spec
    • Attempting to take a working add-in card (AIC) and destroy it revealed
    that one of the pinstrap resistors (to select the clock source) was
    incorrectly specified
    • We had a 1K ohm pull-down resistor, but this was in fact too weak –
    and a 499 ohm resistor was required to overcome an internal pull-up
    • Reworking with the correct resistor resulted in the NIC correctly starting!
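
Why a 1K ohm pull-down can be too weak while 499 ohm works follows from the voltage divider formed with the internal pull-up. The sketch below is illustrative only: the slide does not give the internal pull-up value, the supply voltage, or the input-low threshold, so the 2K ohm pull-up, 3.3 V supply, and 0.8 V threshold are assumptions picked to show the effect, and the function names are hypothetical.

```python
def pin_voltage(vdd: float, r_pullup: float, r_pulldown: float) -> float:
    """Voltage at a pin sitting between a pull-up to vdd and a pull-down to ground."""
    return vdd * r_pulldown / (r_pullup + r_pulldown)


def reads_low(
    r_pulldown: float,
    vdd: float = 3.3,          # assumed supply voltage
    r_pullup: float = 2000.0,  # assumed internal pull-up (not stated on the slide)
    v_il_max: float = 0.8,     # assumed maximum voltage still read as logic low
) -> bool:
    """True if the external pull-down wins and the strap is read as low."""
    return pin_voltage(vdd, r_pullup, r_pulldown) <= v_il_max


weak_ok = reads_low(1000.0)   # 1K ohm pull-down
strong_ok = reads_low(499.0)  # 499 ohm pull-down
```

Under these assumed values the 1K ohm resistor leaves the pin at about 1.1 V, above the threshold, while the 499 ohm resistor pulls it to roughly 0.66 V, so only the stronger pull-down selects the intended strap value.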

  18. OXIDE
    NIC transiently failing to train all PCIe lanes
    • We have our own platform enablement layer (i.e., no BIOS); we are
    responsible for initializing devices at the lowest layer
    • With disconcerting frequency, some number of Chelsio NIC links did not
    train correctly for some of their lanes on boot
    • Decoding the Link Training and Status State Machine (LTSSM) on the
    CPU allowed us to better understand where it was failing, but not why
    • Discovered that a second PERST resulted in correct training – and
    moreover that this second PERST is present on legacy firmware!

  19. OXIDE
    Failure to connect to U.2 NVMe drives
    • In a revision of our PCIe-to-U.2 passthrough card (Sharkfin), we had I2C
    connectivity – but no PCIe connectivity whatsoever
    • A previous version of this card had worked, but little had changed in the
    schematic and the layout – why were the new ones broken?!
    • Physical inspection revealed that one of the parts was simply wrong!
    • The wrong reel of parts had been loaded into a pick-and-place machine,
    and an inverter had been laid down instead of an AND gate (!)
    • Reworked ~1200 cards in ~96 hours!

  20. OXIDE
    Random data corruption on software install
    • When installing OS boot images, sporadic (!) corruption was seen
    • Adding checksums to these images revealed corruption was rampant (!!)
    • Microprocessor was speculatively loading through a stowaway mapping
    from early boot, which was allocating in the TLB
    • If application address conflicted with address of stowaway mapping,
    kernel would incorrectly copy data from the wire to the wrong location
    • Eliminating stowaway mapping eliminated the corruption – but
    highlighted divergent perspectives on side-effects of speculative loads
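
The checksum step mentioned above is what turned sporadic symptoms into a measurable failure rate. A minimal sketch of that kind of check follows; the slide does not say which checksum was used or how images were transferred, so SHA-256, the function names, and the stand-in image bytes are all assumptions made for illustration.

```python
import hashlib


def image_digest(image: bytes) -> str:
    """Digest of an image; SHA-256 is an assumed choice of checksum."""
    return hashlib.sha256(image).hexdigest()


def is_intact(image: bytes, expected_digest: str) -> bool:
    """Compare the received image against the digest computed at the source."""
    return image_digest(image) == expected_digest


# Stand-in for a boot image (hypothetical data, 4 KiB).
original = bytes(range(256)) * 16
expected = image_digest(original)

# Simulate a single flipped bit somewhere in the received copy.
corrupted = bytearray(original)
corrupted[100] ^= 0x01
received = bytes(corrupted)
```

Checking every installed image this way detects even single-bit corruption that would otherwise surface only as an occasional, hard-to-reproduce failure.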

  21. OXIDE
    What do these have in common?
    • Each posed an existential risk for the artifact: without solving them, we
    wouldn’t have something that’s impaired – we would have nothing
    • Each revealed an emergent property, often at an interface boundary
    • The breakthrough was often something that “shouldn’t” have worked
    • Intelligence alone does not solve problems like this
    • In all cases, we summoned other elements of our character: our
    resilience, our teamwork, our rigor, our optimism, our curiosity

  22. OXIDE
    Values in engineering
    • These extra-intelligence values are so important to us that we have
    codified them – and use them very explicitly as a lens for hiring
    • To be clear, we are certainly seeking capable, intelligent people – but
    that intelligence is useless without these shared (human!) values
    • We may be more explicit about it than others, but many engineering
    teams are also implicitly hiring for shared values
    • Viz.: It is comical to think of an engineering team hiring based only on
    the results of a test – or any other linear measure of intelligence!

  23. OXIDE
    The humanity in engineering
    • This humanity necessary to understand and resolve failure – so essential
    in designing and building – is hidden in the final artifact
    • This is the soul in Tracy Kidder’s Soul of a New Machine – and the
    perspiration in Edison’s proverbial 99% perspiration
    • Computer programs lack this humanity: they do not have willpower,
    desire, or drive – let alone the deeper human qualities required
    • Which doesn’t mean that AI can’t be useful to engineers, merely that it
    cannot engineer autonomously

  24. OXIDE
    So, should we worry about AI?
    • Extinction risk due to AGI is de minimis – but we must not falsely
    dichotomize AI into posing existential risk or no risk whatsoever!
    • The risk that AI does pose may feel mundane – but it is much more
    about how it will be abused (deliberately or accidentally) by existing structures
    • AI ethics is exceedingly important, especially when it is being used to
    inform decisions that affect people’s lives!
    • By acknowledging that AI is and will be an important tool, we can move
    beyond fear to focus on enforcing existing regulatory regimes

  25. OXIDE
    Further information (wells to fall down)
    • Richard Smalley/K. Eric Drexler debate on molecular nanotechnology
    • Lex Fridman interview with Marc Andreessen
    • Logan Bartlett interview with Eliezer Yudkowsky
    • Oxide and Friends podcast, especially Okay Doomer, Tales From the
    Bringup Lab and More Tales from the Bringup Lab
