

Making the Impossible Impossible: Improving Reliability by Preventing Classes of Problems

This talk was given at SREcon EMEA 22, in Amsterdam: https://www.usenix.org/conference/srecon22emea/presentation/sinjakli

---

Service Level Objectives (SLOs) are a familiar topic in SRE circles. They provide a framework for measuring and thinking about the reliability of a service in terms of a percentage of successful operations, such as HTTP requests.
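As a rough illustration of that percentage framing, here is a minimal sketch in Python. The function names, the request counts and the 99.9% target are assumptions chosen for the example; none of them come from the talk.

```python
def availability(successful: int, total: int) -> float:
    """Percentage of operations that succeeded."""
    if total == 0:
        raise ValueError("no operations recorded")
    return 100.0 * successful / total


def meets_slo(successful: int, total: int, slo_percent: float) -> bool:
    """True when the measured success percentage is at or above the target."""
    return availability(successful, total) >= slo_percent


# Hypothetical month of traffic: 1,000,000 HTTP requests, 900 of which failed.
print(f"{availability(999_100, 1_000_000):.2f}%")  # prints "99.91%"
print(meets_slo(999_100, 1_000_000, 99.9))         # prints "True"
```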

That key strength of SLOs - viewing reliability as a percentage game - can also be a weakness. Within that framing, there are certain solutions we're likely to overlook.

This talk explores another lens for reliability - one that's complementary to SLOs: structuring software in a way that rules out entire classes of problem.

We'll explore this idea via three worked examples, and finish with some concrete take-aways, including how to spot problems that fit this shape.
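To make "ruling out a class of problem" concrete, here is a minimal Python sketch of the payment state machine that appears in the slides. The state names are taken from the slides, but the exact set of allowed transitions, the `Payment` constructor and the `InvalidTransition` exception are assumptions made for illustration, not the speaker's implementation.

```python
# Allowed transitions, keyed by current state. State names follow the slides;
# the transition set itself is an assumption for this sketch.
ALLOWED_TRANSITIONS = {
    "created": {"submitted"},
    "submitted": {"collected", "failed"},
    "collected": {"paid_out"},
    "paid_out": set(),
    "failed": set(),
}


class InvalidTransition(Exception):
    """Raised when a transition is not in ALLOWED_TRANSITIONS."""


class Payment:
    def __init__(self, description: str) -> None:
        self.description = description
        self.state = "created"

    def transition_to(self, new_state: str) -> None:
        # Every state change goes through this single check, so a payment
        # can never end up in a state the table does not permit.
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise InvalidTransition(
                f"Cannot move from {self.state} to {new_state}"
            )
        self.state = new_state
```

With this shape, a paid-out payment cannot later be marked as failed: `transition_to("failed")` raises `InvalidTransition` and leaves the state unchanged, instead of relying on every caller to remember the rule.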


Chris Sinjakli

October 26, 2022


Transcript

  1. Hi

  2. A refresher: Measuring the performance of a service as a percentage of successful operations
  3. Today's talk:
     - Another lens for reliability
     - Examples in the wild
     - How to spot problems of this shape
  6. This is not:
     - An attack on SLOs
     - One-size-fits-all solution
     - Possible if you can't edit software
  9. Simple model

         id  description            state
         1   Laptop                 submitted
         2   Phone                  collected
         3   Unused domain renewal  collected
  11. Simple model

          id  description            state
          1   Laptop                 collected
          2   Phone                  collected
          3   Unused domain renewal  collected
  12. Simple model

          id  description            state
          1   Laptop                 paid_out
          2   Phone                  collected
          3   Unused domain renewal  collected
  13. Simple model

          id  description            state
          1   Laptop                 submitted
          2   Phone                  collected
          3   Unused domain renewal  collected
  14. Simple model

          id  description            state
          1   Laptop                 failed
          2   Phone                  collected
          3   Unused domain renewal  collected
  15. State restriction pseudocode

          class Payment
            def fail()
              # Only a submitted payment is allowed to fail
              if @state == "submitted"
                @state = "failed"
              else
                raise "Cannot fail from state: #{@state}"
              end
            end
          end
  16. State restriction pseudocode

          class Payment
            def submit()
              # Only a newly created payment can be submitted
              if @state == "created"
                @state = "submitted"
              else
                raise "Cannot submit from state: #{@state}"
              end
            end
          end
  17. State restriction pseudocode

          class Payment
            def fail()
              # Failure is also allowed once the payout has been submitted
              if ["submitted", "payout_submitted"].include?(@state)
                @state = "failed"
              else
                raise "Cannot fail from state: #{@state}"
              end
            end
          end
  18. State machine:
      - A set of states
      - A set of allowed transitions between those states
  19. Use-after-free in C

          char *ptr = malloc(SIZE);
          do_stuff(ptr);
          free(ptr);
          // Many lines more code
          do_other_stuff(ptr);  // ptr is used after it was freed
  20. Garbage collection pseudocode

          def main()
            name = "Chris"
            greet(name)
          end

          def greet(name)
            puts("Hello #{name}")
          end

      Annotation: Falls out of scope
  21. Rust greetings

          fn main() {
              let name = String::from("Chris");
              greet(name);
          }

          fn greet(name: String) {
              println!("Hello {}", name);
          }
  22. Rust greetings (same code as slide 21). Annotation: Owner transferred
  23. Rust greetings (same code as slide 21). Annotations: Falls out of scope; Owner transferred
  24. Rust greetings

          fn main() {
              let name = String::from("Chris");
              greet(name);
              say_goodbye(name);
          }

          fn greet(name: String) {
              println!("Hello {}", name);
          }

      Annotation: Compiler error
  25. Rust greetings

          fn main() {
              let name = String::from("Chris");
              greet(&name);
              say_goodbye(name);
          }

          fn greet(name: &String) {
              println!("Hello {}", name);
          }

      Annotation: Borrow
  26. -- Create a table
      CREATE TABLE payments (
          id int NOT NULL,
          ...
      );

      -- Realise `int` isn't large enough (2^32)
      -- You're going to run out of IDs
      ALTER TABLE payments MODIFY id bigint;
  28. Same SQL as slide 26. Annotation: Blocks all other queries
  29. -- Slow transaction
      START TRANSACTION;
      SELECT * FROM payments;

      -- Forces this to queue
      ALTER TABLE payments ADD COLUMN refunded boolean;

      -- Which blocks these
      SELECT * FROM payments WHERE id = 123;
  32. ALTER TABLE payments MODIFY id bigint;
      - Table (id bigint, description): 1 Laptop
      - Table (id int, description): 1 Laptop; 2 Phone
  33. ALTER TABLE payments MODIFY id bigint;
      - Table (id bigint, description): 1 Laptop
      - Table (id int, description): 1 Laptop; 2 Phone; 3 Unused domain renewal
  34. ALTER TABLE payments MODIFY id bigint;
      - Table (id bigint, description): 1 Laptop; 2 Phone
      - Table (id int, description): 1 Laptop; 2 Phone; 3 Unused domain renewal
  35. ALTER TABLE payments MODIFY id bigint;
      - Table (id bigint, description): 1 Laptop; 2 Phone; 3 Unused domain renewal
      - Table (id int, description): 1 Laptop; 2 Phone; 3 Unused domain renewal
  37. ALTER TABLE payments MODIFY id bigint;
      - Table (id bigint, description): 1 Laptop; 2 Phone; 3 Unused domain renewal
      - Table (id int, description): 1 Laptop; 2 Phone; 3 Unused domain renewal
      - User queries (via proxy)
  40. Take-aways:
      - Complementary technique
      - You have to write software
      - It's not easy to spot
  42. Take-aways:
      - Complementary technique
      - You have to write software
      - It's not easy to spot
      - But there are some tells
  44. Examples:
      - State machines
      - Memory safety
      - Database migrations
      Also on the slide: Add more unit tests; Write better C; Just hire
  45. Smug comments:
      - State machines
      - Memory safety
      - Database migrations
      Also on the slide: Write better C; Just hire
  46. Smug comments:
      - State machines
      - Memory safety
      - Database migrations
      Also on the slide: Add more unit tests; Write better C; Just hire
  48. Smug comments:
      - State machines
      - Memory safety
      - Database migrations
      Also on the slide: Add more unit tests; Write better C; Just hire a DBA
  49. Image credits
      • Poker Winnings - slgckgc - CC-BY - https://www.flickr.com/photos/slgc/42157896194/
      • Thinking Face - Twemoji - CC-BY - https://github.com/twitter/twemoji
      • Ferris (Extra-cute) - Unofficial Rust mascot - Copyright waived - https://rustacean.net/
      • A350 Board - Mark Turnauckas - CC-BY - https://www.flickr.com/photos/marktee/17118767669/
      • Play - Annie Roi - CC-BY - https://www.flickr.com/photos/annieroi/4421442720/
  50. Image credits
      • White jigsaw puzzle with missing piece - Marco Verch Professional Photographer - CC-BY - https://www.flickr.com/photos/30478819@N08/50605134766/
      • Hedge maze - claumoho - CC-BY - https://flickr.com/photos/claudiah/3929921991/
      • photo_1405_20060410 - Robo Android - CC-BY - https://www.flickr.com/photos/49140926@N07/6798304070/
      • Gears - Mustang Joe - Public Domain - https://www.flickr.com/photos/mustangjoe/20437315996/