Production Engineering for Youngbloods

A lot can be learned about shipping software through books, tutorials, and coursework, but there is a whole class of lessons that only ever shows up in production environments. Those lessons are extremely valuable, but the barrier to learning them is high. I'm hoping to lower that barrier a bit with this talk.

This talk contains a handful of lessons I've learned through operating production environments, as told to beginner engineers. Some are obvious, and some are surprising, but none are fiction. It is not in-depth, but it is a good starting point for engineers who want to learn about production engineering.

Hector Castro

October 15, 2019

Transcript

  1. Production Engineering for Youngbloods: A small collection of things I have learned interacting with production environments.
  2. Azavea: We build applications that use maps, location, and aerial imagery for civic and social impact. https://careers.azavea.com
  3. New Class of Bugs: Usually, it’s the car that crashes into something, not the other way around. (Section: Caches)
  4. Response Time: A simple equation that leads to a good mental model for queueing systems. (Section: Queues)
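The transcript does not capture the equation shown on this slide. One widely used candidate, assumed here rather than taken from the talk, is the single-server (M/M/1) relationship R = S / (1 - ρ), where S is the mean service time and ρ is utilization. The sketch below uses a hypothetical `responseTime` function and made-up sample values to show how quickly response time grows as a server approaches full utilization.

```javascript
// Assumption: the M/M/1 queueing formula R = S / (1 - rho).
// The slide's own equation is not present in the transcript.
function responseTime(serviceTimeMs, utilization) {
  if (utilization < 0 || utilization >= 1) {
    throw new RangeError('utilization must be in [0, 1)');
  }
  return serviceTimeMs / (1 - utilization);
}

// Hypothetical service time of 10 ms at increasing utilization.
for (const rho of [0.5, 0.8, 0.9, 0.99]) {
  const percent = Math.round(rho * 100);
  console.log(`${percent}% busy -> ${responseTime(10, rho).toFixed(0)} ms`);
}
```

Under this model, going from 50% to 90% utilization multiplies response time by five, which is one way to build intuition for why queues back up well before a system is fully busy.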
  5. Idempotence: f(f(x)) = f(x). A useful property for tasks in a queue, hidden behind a big word. (Section: Queues)
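As a small illustration of f(f(x)) = f(x), the following sketch contrasts an idempotent function with one that is not. The names `normalize`, `increment`, and `isIdempotentFor` are hypothetical and chosen only for this example.

```javascript
// Idempotent: applying the function to its own result changes nothing.
const normalize = (s) => s.trim().toLowerCase();

// Not idempotent: every application changes the result again.
const increment = (n) => n + 1;

// Checks f(f(x)) === f(x) for one sample input.
function isIdempotentFor(f, x) {
  return f(f(x)) === f(x);
}

console.log(isIdempotentFor(normalize, '  Hello ')); // true
console.log(isIdempotentFor(increment, 1));          // false
```

For a task in a queue, the same idea means that processing a message a second time leaves the system in the same state as processing it once.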
  6. (Section: Queues → Idempotence)

        const sgMail = require('@sendgrid/mail');

        exports.nonIdempotentEmailFunction = (event) => {
          const message = event.data;

          // Send email.
          sgMail.setApiKey(...);
          sgMail.send({..., text: message});
        };

  7. (Repeats slide 6.)
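  8. (Section: Queues → Idempotence)

        const sgMail = require('@sendgrid/mail');
        // nosql is a placeholder for a document database client; it is not defined on the slide.
        const db = nosql.database();

        exports.idempotentEmailFunction = (event) => {
          const message = event.data;
          // Key the record by event ID so retries of the same event map to the same document.
          const eventId = event.id;
          const emailRef = db.collection('sentEmails').doc(eventId);

          return shouldSend(emailRef).then(send => {
            if (send) {
              // Send email.
              sgMail.setApiKey(...);
              sgMail.send({..., text: message});
              return markSent(emailRef);
            }
          });
        };

        // Resolves to true when no record exists yet or the record is not marked as sent.
        function shouldSend(emailRef) {
          return emailRef.get().then(emailDoc => {
            return !emailDoc.exists || !emailDoc.data().sent;
          });
        }

        // Records that the email for this event has been sent.
        function markSent(emailRef) {
          return emailRef.set({sent: true});
        }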
  9. Highlighted in the code from slide 8 (Section: Queues → Idempotence):

        const eventId = event.id;
        const emailRef = db.collection('sentEmails').doc(eventId);

  10. Highlighted in the code from slide 8 (Section: Queues → Idempotence):

        return shouldSend(emailRef).then(send => {

        function shouldSend(emailRef) {
          return emailRef.get().then(emailDoc => {
            return !emailDoc.exists || !emailDoc.data().sent;
          });
        }

  11. Highlighted in the code from slide 8 (Section: Queues → Idempotence):

        return markSent(emailRef);

        function markSent(emailRef) {
          return emailRef.set({sent: true});
        }
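To see the effect of the check, the sketch below simulates a queue delivering the same event twice. It is not the slide's code: an in-memory `Map` stands in for the `sentEmails` collection, and a hypothetical `sendEmail` function records messages in an array instead of calling a mail provider. Unlike the slide, it also waits for the send to finish before marking the event as sent.

```javascript
// Hypothetical stand-ins: a Map replaces the sentEmails collection and
// an array records "sent" emails instead of calling a mail provider.
const sentEmails = new Map();
const outbox = [];

async function sendEmail(text) {
  outbox.push(text);
}

// Same idea as the handler on the slides: skip events already marked sent.
async function idempotentEmailFunction(event) {
  const record = sentEmails.get(event.id);
  if (record && record.sent) {
    return false; // Duplicate delivery: do nothing.
  }
  await sendEmail(event.data);
  sentEmails.set(event.id, { sent: true });
  return true;
}

// Simulate at-least-once delivery: the queue hands over the same event twice.
async function main() {
  const event = { id: 'evt-1', data: 'Welcome!' };
  const first = await idempotentEmailFunction(event);
  const second = await idempotentEmailFunction(event);
  console.log(first, second, outbox.length); // true false 1
  return outbox.length;
}

const done = main();
```

One caveat applies to this sketch and to any check-then-mark approach: the check and the mark are separate steps, so two deliveries processed at the same moment could both pass the check. Closing that gap usually needs a transaction or a conditional write in the datastore.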
  12. Latency Numbers (https://gist.github.com/jboner/2841832) (Section: It’s Slow)

        L1 cache reference ......................... 0.5 ns
        Branch mispredict ............................ 5 ns
        L2 cache reference ........................... 7 ns
        Mutex lock/unlock ........................... 25 ns
        Main memory reference ...................... 100 ns
        Compress 1K bytes with Zippy ............. 3,000 ns = 3 µs
        Send 2K bytes over 1 Gbps network ....... 20,000 ns = 20 µs
        SSD random read ........................ 150,000 ns = 150 µs
        Read 1 MB sequentially from memory ..... 250,000 ns = 250 µs
        Round trip within same datacenter ...... 500,000 ns = 0.5 ms
        Read 1 MB sequentially from SSD* ..... 1,000,000 ns = 1 ms
        Disk seek ........................... 10,000,000 ns = 10 ms
        Read 1 MB sequentially from disk .... 20,000,000 ns = 20 ms
        Send packet CA->Netherlands->CA .... 150,000,000 ns = 150 ms
  13. Latency Numbers (https://gist.github.com/jboner/2841832), highlighted row (Section: It’s Slow):

        Main memory reference ...................... 100 ns

  14. Latency Numbers (https://gist.github.com/jboner/2841832), highlighted rows (Section: It’s Slow):

        Main memory reference ...................... 100 ns
        Read 1 MB sequentially from SSD* ..... 1,000,000 ns = 1 ms
  15. Humanized Latency Numbers (https://gist.github.com/hellerbarde/2843375) (Section: It’s Slow → Latency Numbers)

        L1 cache reference                  0.5 s       One heart beat (0.5 s)
        Branch mispredict                   5 s         Yawn
        L2 cache reference                  7 s         Long yawn
        Mutex lock/unlock                   25 s        Making a coffee
        Main memory reference               100 s       Brushing your teeth
        Compress 1K bytes with Zippy        50 min      One episode of a TV show
        Send 2K bytes over 1 Gbps network   5.5 hr      Lunch to end of work day
        SSD random read                     1.7 days    A normal weekend
        Read 1 MB sequentially from memory  2.9 days    A long weekend
        Round trip within same datacenter   5.8 days    A medium vacation
        Disk seek                           16.5 weeks  A semester in university
        Read 1 MB sequentially from disk    7.8 months  Producing a new human being
  16. Humanized Latency Numbers (https://gist.github.com/hellerbarde/2843375): the table from slide 15 with one added row, "The above 2 together: 1 year", and one highlighted row (Section: It’s Slow → Latency Numbers):

        Main memory reference               100 s       Brushing your teeth

  17. Humanized Latency Numbers (https://gist.github.com/hellerbarde/2843375): the table from slide 16 with two highlighted rows (Section: It’s Slow → Latency Numbers):

        Main memory reference               100 s       Brushing your teeth
        Read 1 MB sequentially from disk    7.8 months  Producing a new human being
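The humanized table appears to scale every latency by a factor of one billion, so that 1 ns of machine time reads as 1 s of human time. The sketch below reproduces a few rows under that assumption; the `UNITS` table and the `inUnit` function are hypothetical names introduced for illustration.

```javascript
// Assumption: the humanized table multiplies each latency by 1e9,
// so a latency of N nanoseconds becomes N seconds.
const UNITS = { s: 1, min: 60, hr: 3600, days: 86400, weeks: 604800 };

// Converts a latency in nanoseconds to its humanized value in the given unit,
// rounded to one decimal place.
function inUnit(nanoseconds, unit) {
  const humanSeconds = nanoseconds; // N ns * 1e9 = N s
  return Math.round((humanSeconds / UNITS[unit]) * 10) / 10;
}

console.log(inUnit(3000, 'min'));       // Compress 1K bytes with Zippy: 50
console.log(inUnit(150000, 'days'));    // SSD random read: 1.7
console.log(inUnit(10000000, 'weeks')); // Disk seek: 16.5
```

Rounding can differ slightly from the table: for example, 20,000 s is about 5.56 hours, which this sketch rounds to 5.6 while the table lists 5.5 hr.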
  18. Be Curious About the System: Strive to develop a mental model of the application and the architecture it resides on. (Section: It’s Slow)