
Perils, Pitfalls and Pratfalls of Platform Engineering (QCon NYC, 2023)

Platform engineering isn’t supposed to be just another name for SRE, DevOps, infrastructure, or backend software engineering teams; but if you aren’t careful, that’s what you’ll get. Let’s talk about how platform engineering teams are different from other engineering teams, and some of the ways they run into traps and other troubles.

Charity Majors

June 28, 2023



  1. Platform engineering isn’t “new” 🙄
    Heroku, Facebook, and others had “platform engineering” teams a decade ago.
    Platform engineering is actually quite new. 🤓
    Until recently, “platform engineering” could mean ~anything. Now it has been defined.
  2. Platform Engineering: formerly known as “infrastructure”. The software you have to run in order to run the software you want to run.
    Platform Org: umbrella org for security, devex, SRE… engineering teams that don’t work on core product.
    Platform Team: the team most responsible for enabling product engineering teams to own their code in production.
  3. “DevOps is dead” is just a stupid thing to say… clickbait marketing 🤮
    But there is a kernel of truth there: DevOps is not eternal. It will be superseded.
  4. Developing software is eternal. Operating that software is eternal. But “DevOps”?
    What happens when there are no more “dev” teams and “ops” teams? 🤔 🤔
  5. The long arc of software careers
    1990: Write code and run what you write.
    1995: Devs write code, Ops runs code. Friction ensues.
    2007: DevOps emerges; devs + ops. Empathy, #hugops, blah blah.
    2023: Write code and run what you write.
  6. These days:
    1. Every engineer writes code.
    2. Every engineer runs the code that they write, and operates it in production.
  7. Systems are becoming rapidly more complex. They can’t really be operated like black boxes anymore.
    You need to build them to run them. And you can’t do a good job of building them unless you are regularly exposed to the feedback loops of operating them.
  8. Two big trends are converging:
    1. Software ownership (you write it, you run it).
    2. We are all moving up the stack. Infrastructure is becoming boring.
  9. We are decoupling “infrastructure” from “operating software”.
    Standalone ops teams are spinning down. But operational expertise is more critical than ever before.
  10. No. You have a platform engineering org, which wraps & packages your infrastructure needs by running as little infra as possible.
  11. Infrastructure is a cost center. It may be a competitive differentiator, but it is still a cost center, which means you want as little as possible.
    Infrastructure (n): the code you have to run, in order to run the code you want to run.
  12. Within your platform organization, you may have a platform team which builds infra composes architecture as a product.
  13. ⛔ Infrastructure Org ✅ Platform Org
    • SRE
    • Deep subsystem teams
    • “Pure” platform teams
    • Security
    • Release engineering
    • Developer tools
    • Front-end developer experience
  14. Be crystal clear on what “infra” means to you. Leverage vendors as much as possible.
    You will NEVER be able to outsource your core differentiators.
  15. The best code is the code that doesn’t exist.
    The second best code is code someone else writes and maintains, and you get to use.
    The worst code is… literally anything else.
  16. Pitfall #2: Writing too much software.
    You cannot own too much software surface area, or you will grind to a halt.
  17. Pitfall #3: Not letting product teams own their own reliability.
    Software engineers need to own their code in production. This means being on call for it, too.
  18. Pitfall #4: Not giving engineers enough tooling to understand their code as well as operate it.
    Or giving them “ownership” without empowerment.
  19. Pitfall #5: Being confused about who your customer is.
    Your customer is internal software engineering teams who work on the core product.
  20. Pitfall #6: Not running your team like a product team.
    The Promised Land beyond firefighting is … working like a product org. ALL engineering teams.
  21. Your platform team *should* spend time on:
    • Doing discovery
    • Building champions
    • Baking in feedback cycles
    • Working with product managers
    • Working with design (!)
    • Figuring out the golden path
    • Practicing change management
    • Building a roadmap
    • Talking with focus groups
    • Building internal APIs
  22. Pitfall #7: Not paying enough attention to cost & spend as part of architecture & planning.
    Educating others about cost counts too :)
  23. Cost is an essential part of architecture.
    Build vs Buy is not the only time we need to think about this!!
  24. Pitfall #8: Not constantly looking for ways to deprecate, delete, and shed responsibilities.
    Managing your workload is like being a juggler. Success is in managing capacity.
  25. How to tell if your “platform team” is really a platform team or not:
    Is the team responsible for SLOs, service uptime, and a reliable customer experience?
    NO: ✅ platform team
    YES: ⛔ platform team
  26. “If you build it, they will come”?? No, they fucking won’t.
    Make sure you are building a platform that people actually want and need!
  27. “Vendor engineering” is a large share of any platform team’s remit.
    Cost is part of architecture.
    Platform teams are super high leverage.
  28. If you’re an infra/devops/ops engineer, and you haven’t learned to work on product: Learn.
    Once you dig your way out of firefighting, product is what comes next.