etcd: Next Steps with the Cornerstone of Distributed Systems

Brandon Philips

August 23, 2016

Transcript

  1. Next Steps with the Cornerstone of Distributed Systems
     Brandon Philips | @brandonphilips | [email protected]
     Demo Code: http://goo.gl/R6Og3Y | Free Stickers @ Podium!
  2. Motivation: CoreOS cluster reboot lock
     - Decrement a semaphore key atomically
     - Reboot and wait...
     - After reboot, increment the semaphore key
  3. Requirements
     - Strong Consistency: mutually exclusive at any time, for locking purposes
     - Highly Available: resilient to single points of failure & network partitions
     - Watchable: push configuration updates to applications
  4. Common Problem
     - Amazon: replicated log for EC2
     - Microsoft: Boxwood for storage infrastructure
     - Hadoop: ZooKeeper is the heart of the ecosystem
  5. History of etcd
     ◦ 2013.8: Alpha release (v0.x)
     ◦ 2015.2: Stable release (v2.0+)
       ◦ stable replication engine (new Raft implementation)
       ◦ stable v2 API
     ◦ 2016.6: v3.0+
       ◦ efficient, powerful API
       ◦ highly scalable backend
  6. How does etcd work?
     • Raft consensus algorithm
       ◦ Uses a replicated log to model a state machine
       ◦ "In Search of an Understandable Consensus Algorithm" (Ongaro, 2014)
     • Three key concepts
       ◦ Leaders
       ◦ Elections
       ◦ Terms
  7. How does etcd work?
     • The cluster elects a leader for every given term
     • All log appends (i.e., state machine changes) are decided by that leader and propagated to followers
     • Much, much more at http://raft.github.io/
  8. How does etcd work?
     • Written in Go, statically linked
     • /bin/etcd
       ◦ daemon
       ◦ port 2379: client requests (HTTP + JSON API)
       ◦ port 2380: peer-to-peer (HTTP + protobuf)
     • /bin/etcdctl
       ◦ command-line client
     • net/http, encoding/json, golang/protobuf, ...
  9. locksmith
     • cluster-wide reboot lock
       ◦ "semaphore for reboots"
     • CoreOS updates happen automatically
       ◦ prevent all the machines restarting at once...
  10. Cluster-Wide Reboot Lock
      • Need to reboot? Decrement the semaphore key (atomically) with etcd
      • manager.Reboot() and wait...
      • After reboot, increment the semaphore key in etcd (atomically)
  11. Canal Today
      • virtual (overlay) network for constrained environments
      • BGP for physical environments
      • connection policies
      • built for Kubernetes, useful in other systems via CNI
  12. confd
      • simple configuration templating for "dumb" applications
      • watch etcd for changes, render templates with new values, reload applications
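As a sketch of the watch-render-reload flow, a confd template resource ties an etcd key to a file on disk and a reload command. The paths, key name, and reload command below are invented for illustration; consult confd's own documentation for the exact options it supports.

```toml
# /etc/confd/conf.d/myapp.toml (illustrative; key and paths are made up)
[template]
src        = "myapp.conf.tmpl"          # template under /etc/confd/templates
dest       = "/etc/myapp/myapp.conf"    # rendered file the app reads
keys       = ["/myapp/database/url"]    # etcd keys to watch
reload_cmd = "systemctl reload myapp"   # run after dest changes
```

The template itself pulls values from etcd with confd's template functions, e.g. a line like `url = {{getv "/myapp/database/url"}}` in myapp.conf.tmpl. When the key changes, confd re-renders dest and runs reload_cmd.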
  13. Reliability
      • 99% at small scale is easy
        ◦ failure is infrequent and manageable by humans
      • 99% at large scale is not enough
        ◦ not manageable by humans
      • 99.99% at large scale
        ◦ requires reliable systems at the bottom layer
  14. Write-Ahead Log
      • append only
        ◦ simple is good
      • rolling CRC protected
        ◦ storage & OSes can be unreliable
  15. Snapshots
      • "Torturing Databases for Fun and Profit" (OSDI 2014)
        ◦ the simpler database is safer
        ◦ LMDB was the winner
      • boltdb: an append-only B+Tree, a simpler LMDB written in Go
  16. Testing Cluster Failure
      • inject failures into running clusters
      • white-box runtime checking
        ◦ hash the state of the system
        ◦ check progress of the system
  17. etcd/raft Reliability
      • designed for testability and flexibility
      • used by large-scale database systems and others
        ◦ CockroachDB, TiKV, Dgraph
  18. Training
      San Francisco: September 13 & 14
      New York City: September 27 & 28
      San Francisco: October 11 & 12
      New York City: October 25 & 26
      Seattle: November 10 & 11
      https://coreos.com/training
  19. Thank you!
      Brandon Philips | @brandonphilips | [email protected] | coreos.com
      We're hiring in all departments! Email: [email protected] | Positions: coreos.com/careers