
Release Engineering from the Ground Up

Tom Santero
November 10, 2014


Slides from my talk at the USENIX Release Engineering Summit West '14: https://www.usenix.org/conference/ures14west/summit-program/presentation/santero

Transcript

  1. Listens for commits
    - builds on every push to any branch
    - Run unit tests, report build/test statistics
    - If branch == master:
      - cut release as RPM
      - increment version number
      - push RPM to yum repo
  2. provisioning / termination
    - release version upgrades
    - host system configuration
      - registration and discovery
  3. single repo: roles, tasks, files
    - abstract out common tasks, e.g. Elasticsearch, Riak, Jenkins
    - parameterized per environment and service
  4. Jenkins: update release tag in Ansible repo
    - Source of Truth?
      - correlate builds, releases and environments
  5. nyt_lb*
    - service registration + discovery
    - allow for load balancing of internal + external traffic
    - lightweight, robust, redundant
    - scalable, highly available
    * naming is hard (also, too bad there’s no logo)
  6. RESTful API
    - service plugins: nginx, HAProxy…
    - in-memory DB
    - persistence & failure recovery
    - distributed systems magic
      - gossip + CRDTs
  7. [Diagram: three nyt_lb nodes]
    - all cluster state is stored as CRDTs:
      - node membership
      - registered services
      - service attributes
  8. [Diagram: three nyt_lb nodes]
    - quorum operations + gossip
    - all state is monotonic & confluent
    - new state converges
  9. [Diagram: three nyt_lb nodes]
    - upon provisioning and configuration, services register themselves
    - services take themselves out of the LBs during upgrades, maintenance, and destruction
  10. event = { 'host' : ip-10-45-136-116, 'service' : load-average, 'metric' : 0.7, 'state' : ok,
                'time' : 1413551091.341055, 'tags' : [dev, suggest-api, load], 'ttl' : 10,
                'description' : description }
      event = { 'host' : ip-10-45-136-116, 'service' : load-average, 'metric' : 3.2, 'state' : warning,
                'time' : 1413551176.852009, 'tags' : [dev, suggest-api, load], 'ttl' : 10,
                'description' : description }
  11. operational challenges and failures are a given
    - isolate and identify root causes
    - check logic belongs close to the thing monitored
    - push events; compute per group/environment + expectation
  12. Lessons Learned and Future(?) Work
    - Lots of work; difficult tradeoff between a low barrier to entry and a robust system
    - Containers are nice, but the ecosystem is still too immature
    - Correlating application, system, and build metrics is still manual
      - maybe emit events from Jenkins -> Riemann -> Datomic
      - Push-button re-deploys of point-in-time environments
    - Historical performance metrics as automated regression testing
    - Automated security auditing, static code analysis, etc.