Pune_User_Group.pdf

Prathamesh Sonpatki

February 11, 2023

Transcript

  1. Policies & Contracts In Distributed Systems

    Pune User Group, 11th Feb 2023
    Prathamesh Sonpatki, Last9.io, @prathamesh2_
  2. As a Developer

    - I want to write more code
    - I want to fix all the (existing*) bugs
    - I want to write bug-free code
    - I want to use the latest and greatest tools
    - I want to integrate with best-in-class tools
    - But …
  3. As a DevOps Engineer

    - I want to make sure that the infrastructure scales
    - I want to make sure that the application utilizes resources efficiently
    - I want to make sure that infrastructure cost is not out of control
    - I want to make sure …
    - But …
  4. As a TL/EM/DL

    - I want to make sure that my team meets the deadlines
    - I want to make sure that features work as expected
    - I want to make sure that code/infra is performant enough
    - I want to make sure that tech debt is not out of control
    - I want to make sure that the team is motivated enough
    - I want to make sure …
    - But …
  5. As a Business/Product leader

    - I want to make sure team velocity is not slow
    - I want to make sure that external commitments are met
    - I want to make sure that the product is getting adopted
    - I want to make sure that customer needs and expectations are considered by the product and engineering teams
    - But …
  6. Failures will happen…

    - Every stakeholder has a boundary and a limit
    - One of the stakeholders pushes the others' boundaries too much! E.g.:
      - Business pushing for feature rollouts, resulting in too much tech debt for engineering
      - Engineering chasing perfection, slowing down delivery and velocity
  7. Boundaries…

    - Team constraints
    - Quality of work
    - Time?
    - Perfection?
    - Cost?
    - Pricing?
    - Time to market?
  8. Negotiation

    - We will be able to release this, but with a few bugs
    - We can do this, but with an increased AWS bill 💵
    - We will be able to roll this out to new customers if the team works overnight on the weekend
    - We will be able to fix the bugs that are causing that main customer to go away if we de-prioritize the feature pipeline
    - We will be able to move faster if we add one more backend developer to the team
  9. Promises of POST /users endpoint

    - ✅ HTTP Status 201
    - ❌ HTTP Status 400
    - ⛔ HTTP Status 401
    - ❓ HTTP Status 404
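One way to read this slide is as a contract: the endpoint promises a known set of responses, and anything outside that set is a surprise for the caller. A minimal sketch in Python, assuming the ❓ next to 404 means "not part of the documented contract" (an interpretation, not something the slide states). The names `PROMISED_STATUSES` and `within_contract` are hypothetical.

```python
from http import HTTPStatus

# Assumption: 201, 400 and 401 are the documented outcomes of POST /users;
# 404 is treated as outside the contract (our reading of the slide's "❓").
PROMISED_STATUSES = frozenset({
    HTTPStatus.CREATED,       # 201: user was created
    HTTPStatus.BAD_REQUEST,   # 400: payload was invalid
    HTTPStatus.UNAUTHORIZED,  # 401: caller is not authenticated
})


def within_contract(status_code: int) -> bool:
    """Return True if the response status is one the endpoint has promised."""
    return status_code in PROMISED_STATUSES
```

A client or contract test could call `within_contract(response_status)` and flag any response that falls outside the promised set.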
  10. Promises of POST /users

    - Uptime 90%
    - 10% of requests are allowed to fail
    - Every weekend, 20% of requests are allowed to fail
    - During peak hours, latency can vary between 1000 ms and 5000 ms
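The failure allowances on this slide can be read as an error budget. A minimal sketch in Python, assuming hypothetical helper names (`allowed_failures`, `promise_kept`) and made-up request counts; only the 10% and 20% figures come from the slide.

```python
# Failure allowances taken from the slide, as whole percentages.
WEEKDAY_FAILURE_PCT = 10
WEEKEND_FAILURE_PCT = 20


def allowed_failures(total_requests: int, allowed_failure_pct: int) -> int:
    """Number of requests that may fail before the promise is broken."""
    # Integer arithmetic avoids floating point rounding surprises.
    return total_requests * allowed_failure_pct // 100


def promise_kept(total_requests: int, failed_requests: int,
                 allowed_failure_pct: int) -> bool:
    """True if the observed failures fit inside the allowance."""
    return failed_requests <= allowed_failures(total_requests, allowed_failure_pct)


# Hypothetical traffic: 50,000 requests with 6,000 failures.
weekday_ok = promise_kept(50_000, 6_000, WEEKDAY_FAILURE_PCT)
weekend_ok = promise_kept(50_000, 6_000, WEEKEND_FAILURE_PCT)
```

With these illustrative numbers the same failure count breaks the weekday promise but stays within the looser weekend one.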
  11. Service Level Objectives

    - Availability will be > 99.99% over 1 day
    - Latency will be < 4000 ms over 7 days
    - Uptime will be 98% over 7 days
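Each objective above combines an indicator, a comparison, a threshold and a time window, so it can be written down as data and checked mechanically. A minimal sketch in Python: the `SLO` class, the indicator names and the measured values are all hypothetical, and "Uptime will be 98%" is assumed to mean "at least 98%".

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SLO:
    indicator: str                           # hypothetical indicator name
    compare: Callable[[float, float], bool]  # how measured relates to threshold
    threshold: float
    window_days: int                         # window the measurement covers

    def is_met(self, measured: float) -> bool:
        return self.compare(measured, self.threshold)


# The three objectives from the slide.
SLOS = [
    SLO("availability_pct", operator.gt, 99.99, 1),
    SLO("latency_ms", operator.lt, 4000, 7),
    SLO("uptime_pct", operator.ge, 98.0, 7),  # assumption: "at least 98%"
]


def evaluate(slos: list[SLO], measurements: dict[str, float]) -> dict[str, bool]:
    """Check each SLO against the value measured over its window."""
    return {slo.indicator: slo.is_met(measurements[slo.indicator]) for slo in slos}


# Made-up measurements, one per indicator, already aggregated over each window.
results = evaluate(SLOS, {
    "availability_pct": 99.995,
    "latency_ms": 4200.0,
    "uptime_pct": 98.0,
})
```

Here `results` reports the latency objective as missed and the other two as met. How each measurement is aggregated over its window is left out of the sketch.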
  12. Service Level Objectives

    - What is the error rate on this checkout flow?
    - Can we promise 99.9% availability to this enterprise customer?
    - Should we prioritize tech debt over new features?
    - Where should engineering focus for the next sprint?
    - What’s the success rate of this payment gateway?
  13. Policies

    - Set the right expectations on what’s possible
    - Buy-in from multiple stakeholders
    - Framework for communication between stakeholders
    - External client communication
    - Helps in Build vs. Buy decisions
  14. Tiered Services

    - P0, P1, P2
    - Different expectations from different tiers
    - Not every service is a priority!
  15. Ladder of Reliability

    - You can’t improve what you can’t measure
    - First, baseline!
    - Climb one rung at a time
    - 90% -> 95% -> 99% ✅
    - 90% -> 99.999% 😭
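The gap between rungs is easier to see as allowed downtime. A minimal sketch in Python, assuming a 30-day window and treating availability as the fraction of time the service is up; the window and the function name `allowed_downtime_minutes` are illustrative choices, not from the talk.

```python
# Assumption: a 30-day measurement window, expressed in minutes.
WINDOW_MINUTES = 30 * 24 * 60


def allowed_downtime_minutes(availability_pct: float,
                             window_minutes: int = WINDOW_MINUTES) -> float:
    """Minutes of downtime permitted in the window at a given availability target."""
    return window_minutes * (100.0 - availability_pct) / 100.0


# The rungs mentioned on the slide.
for target in (90.0, 95.0, 99.0, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% allows {minutes:.3f} minutes of downtime per 30 days")
```

Under these assumptions, 90% leaves room for about 72 hours of downtime, 99% for about 7.2 hours, and 99.999% for roughly 26 seconds, which is why jumping straight from 90% to 99.999% is so much harder than climbing one step at a time.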
  16. Thanks 🤝

    Prathamesh Sonpatki
    - Last9.io
    - prathamesh.tech
    - twitter.com/prathamesh2_
    - hachyderm.io/@Prathamesh
    - “Last9 of Reliability” Discord