Intro to CoreOS or, How I Learned To Stop Worrying and Love systemd

A discussion of what CoreOS is, what it isn't, and some learning experiences.

Ric Lister

April 14, 2015

Transcript

  1. or, How I Learned to Stop Worrying and Love Systemd

    or, some pragmatic patterns for running docker in production
  2. Hello! I am Ric Lister, director of devops at Spree Commerce

    @bnzmnzhnz github.com/rlister
  3. Docker frees us from the operating system

    No more dependency hell. Since the OS no longer needs to support our app, we can go minimalist, which makes it easier to patch and more secure.
  4. What do we need?

    Some way to run containers:
    ◦ docker pull, start, stop, rm
    ◦ set environment variables
    ◦ restart policies
    ◦ capture output
    And an OS that can update itself in a sane way. And some orchestration …
  5. CoreOS

    Originally based on ChromiumOS, which is based on Gentoo. No packaging system. Well ... there is: docker.
  6. Atomic updates (Omaha)

    System running off read-only /usr on A. OS update downloads to B, system reboots when ready *. In the event of boot failure, rollback to A.
  7. Update strategies

    Before reboot, host requests a global lock using magic. * By default one host per cluster can hold a reboot lock. Can turn off reboots. Define strategy in cloud-config:

        #cloud-config
        coreos:
          update:
            group: stable
            reboot-strategy: off

    * not actual magic
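The reboot lock on the slide above can be pictured as a counting lock with a single slot. The following is a minimal in-memory sketch of that idea, not the code CoreOS actually runs: the class name `RebootLock` and its methods are hypothetical, and a real cluster would keep this state in etcd rather than in a Python object.

```python
class RebootLock:
    """In-memory stand-in for a cluster-wide reboot lock (illustration only)."""

    def __init__(self, max_holders: int = 1) -> None:
        if max_holders < 1:
            raise ValueError("max_holders must be at least 1")
        self.max_holders = max_holders
        self.holders = set()  # hosts currently allowed to reboot

    def acquire(self, host: str) -> bool:
        """Return True if `host` may reboot now, False if it must wait."""
        if host in self.holders:
            return True  # already holds a slot
        if len(self.holders) >= self.max_holders:
            return False  # every slot is taken by another host
        self.holders.add(host)
        return True

    def release(self, host: str) -> None:
        """Give the slot back once the host has rebooted and rejoined."""
        self.holders.discard(host)
```

With the default of one slot, a second host asking for the lock is refused until the first releases it, which is what keeps a whole cluster from rebooting at once.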
  8. Release channels: choose your pain tolerance

    Stable: Production clusters, all software tested in alpha and beta first.
    Beta: Promoted alpha releases. Run a few beta hosts to catch problems early.
    Alpha: Tracks dev and gets newest docker, etcd and fleet. Frequent releases.
    https://coreos.com/releases/
  9. Features of etcd

    Useful features like TTL, locks. Simple HTTP API. Read and write values with curl or etcdctl. Keys and values stored in directories like a filesystem. Watch a key or directory for changes.
  10. Setting up an etcd cluster

    Get a discovery token:

        $ curl https://discovery.etcd.io/new
        https://discovery.etcd.io/d88814387d940b36dbc2b4393c3d3a94

    Boot 3 machines with cloud-config:

        #cloud-config
        coreos:
          etcd:
            discovery: https://discovery.etcd.io/d88814387d940b36dbc2b4393c3d3a94
            addr: $private_ip4:4001
            peer-addr: $private_ip4:7001
          units:
            - name: etcd.service
              command: start
  11. Using etcd keys: set a key

        $ ssh 10.10.1.1
        CoreOS stable (607.0.0)
        $ etcdctl set /foo "Hello world"
        Hello world
        $ curl -L -X PUT http://127.0.0.1:4001/v2/keys/bar -d value="Hello world"
        {"action":"set","node":{"key":"/bar","value":"Hello world","modifiedIndex":42103694,"createdIndex":42103694}}
  12. Using etcd keys: get a key

        $ ssh 10.10.1.1
        CoreOS stable (607.0.0)
        $ etcdctl get /foo
        Hello world
        $ curl -L http://127.0.0.1:4001/v2/keys/bar
        {"action":"get","node":{"key":"/bar","value":"Hello world","modifiedIndex":40004310,"createdIndex":40004310}}
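The curl calls on the two slides above follow a simple pattern: the key path is appended to `/v2/keys/`, and the value comes back under `node.value` in a JSON body. A small sketch of the same pattern in Python, using only the standard library and the sample response from the slide; the helper names `key_url` and `node_value` are made up for illustration, and no request is sent here.

```python
import json
from urllib.parse import quote

ETCD_BASE = "http://127.0.0.1:4001"  # client address used on the slides


def key_url(key: str, base: str = ETCD_BASE) -> str:
    """Build the v2 keys URL for an etcd key such as '/bar'."""
    return f"{base}/v2/keys/{quote(key.lstrip('/'))}"


def node_value(body: str) -> str:
    """Pull the stored value out of a v2 get/set response body."""
    return json.loads(body)["node"]["value"]


# Response body copied from the slide (whitespace normalised).
SAMPLE = (
    '{"action":"get","node":{"key":"/bar","value":"Hello world",'
    '"modifiedIndex":40004310,"createdIndex":40004310}}'
)

if __name__ == "__main__":
    print(key_url("/bar"))
    print(node_value(SAMPLE))
```

To talk to a live cluster, the URL from `key_url` could be passed to `urllib.request.urlopen`, assuming etcd is listening on the address above.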
  13. etcd gotchas

    If you lose quorum the cluster may get split brain.
    • This cluster is finished. You must create a new one.
    • This is not cool.
    Use an odd number of hosts.
    • Adding one to make an even number does not increase redundancy.
    Use Elastic IPs.
    • If an instance reboots with a new IP it may fail to rejoin the cluster.
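The advice about odd cluster sizes follows from majority arithmetic: a cluster of n members needs floor(n/2) + 1 of them to agree, so it survives n minus that many failures. A short sketch of the calculation (the function names are illustrative, not part of any etcd tooling):

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} hosts: quorum {quorum(n)}, tolerates {failures_tolerated(n)}")
```

Going from 3 to 4 hosts raises the quorum from 2 to 3 but still tolerates only one failure; the fifth host is what buys a second one.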
  14. Setting up a fleet cluster

    Add fleet to the cloud-config:

        #cloud-config
        coreos:
          etcd:
            discovery: https://discovery.etcd.io/d88814387d940b36dbc2b4393c3d3a94
            addr: $private_ip4:4001
            peer-addr: $private_ip4:7001
          fleet:
            metadata: role=web,region=us-east-1,type=m3.medium
          units:
            - name: etcd.service
              command: start
            - name: fleet.service
              command: start
  15. Using fleetctl

    List machines in cluster:

        $ brew install fleetctl
        $ fleetctl -tunnel 10.10.1.1 list-machines
        MACHINE                                 IP          METADATA
        148a18ff-6e95-4cd8-92da-c9de9bb90d5a    10.10.1.1   -
        491586a6-508f-4583-a71d-bfc4d146e996    10.10.1.2   -
        c9de9451-6a6f-1d80-b7e6-46e996bfc4d1    10.10.1.3   -
  16. Launching containers with fleet

    Fleet submits systemd unit files to the cluster, using etcd as backing-store. Fleet-specific metadata controls scheduling of units. If a host goes down, fleet will reschedule units.
  17. Example unit

        [Unit]
        Description=Hello world
        After=docker.service
        Requires=docker.service

        [Service]
        TimeoutStartSec=0
        ExecStartPre=-/usr/bin/docker rm hello
        ExecStartPre=/usr/bin/docker pull busybox
        ExecStart=/usr/bin/docker run \
          --name hello \
          busybox /bin/sh -c "while true; do echo Hello World; sleep 1; done"
        ExecStop=/usr/bin/docker stop hello
  18. Running our example unit

    Load and start the unit:

        $ fleetctl -tunnel 10.10.1.1 start hello
        $ fleetctl -tunnel 10.10.1.1 list-units
        UNIT            MACHINE                  ACTIVE   SUB
        hello.service   c9de9451.../10.10.1.3    active   running
        $ fleetctl -tunnel 10.10.1.1 journal hello
        hello
        hello
        $ fleetctl -tunnel 10.10.1.1 destroy hello
  19. Example global unit

        [Unit]
        Description=Hello world
        After=docker.service
        Requires=docker.service

        [Service]
        TimeoutStartSec=0
        ExecStartPre=-/usr/bin/docker rm hello
        ExecStartPre=/usr/bin/docker pull busybox
        ExecStart=/usr/bin/docker run --name hello busybox /bin/sh -c "while true; do echo Hello World; sleep 1; done"
        ExecStop=/usr/bin/docker stop hello

        [X-Fleet]
        MachineMetadata=region=us-east-1
        Global=true

    Run on all instances with this fleet metadata.
  20. Running a global unit

    Load and start the unit:

        $ fleetctl -tunnel 10.10.1.1 start hello
        $ fleetctl -tunnel 10.10.1.1 list-units
        UNIT            MACHINE                  ACTIVE   SUB
        hello.service   148a18ff.../10.10.1.1    active   running
        hello.service   491586a6.../10.10.1.2    active   running
        hello.service   c9de9451.../10.10.1.3    active   running
        $ fleetctl -tunnel 10.10.1.1 destroy hello
  21. Fleet metadata

    Option            Description
    Global            Schedule on all machines in the cluster
    MachineID         Schedule to one specific machine
    MachineOf         Limit to machines that are running specified unit
    MachineMetadata   Limit to machines with specific metadata
    Conflicts         Prevent from running on same machine as matching units
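To make the MachineMetadata option concrete, here is a simplified model of the matching it implies: a machine is a candidate only if it carries every key=value pair the unit asks for. This is an illustration of the idea, not fleet's scheduler; the function names are hypothetical, and the metadata string is the one from the fleet cloud-config slide.

```python
from typing import Dict, List


def parse_metadata(text: str) -> Dict[str, str]:
    """Turn 'role=web,region=us-east-1' into {'role': 'web', 'region': 'us-east-1'}."""
    result = {}
    for item in text.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            result[key.strip()] = value.strip()
    return result


def eligible(machine: Dict[str, str], required: Dict[str, str]) -> bool:
    """True if the machine has every required key with the required value."""
    return all(machine.get(key) == value for key, value in required.items())


def candidates(machines: Dict[str, Dict[str, str]], required: Dict[str, str]) -> List[str]:
    """Names of machines a unit with `required` metadata could be placed on."""
    return sorted(name for name, meta in machines.items() if eligible(meta, required))
```

A unit with `MachineMetadata=region=us-east-1` would then be limited to the machines returned by `candidates(machines, parse_metadata("region=us-east-1"))`.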
  22. Start a specific number of units

    Refer to them in unit files using systemd templates. Create a unit file like: hello@.service. Start specific instances named like: hello@1.service hello@2.service
  23. Example template unit

        [Unit]
        Description=Hello world
        After=docker.service
        Requires=docker.service

        [Service]
        TimeoutStartSec=0
        ExecStartPre=-/usr/bin/docker rm hello
        ExecStartPre=/usr/bin/docker pull busybox
        ExecStart=/usr/bin/docker run --name hello busybox /bin/sh -c "while true; do echo Hello World; sleep 1; done"
        ExecStop=/usr/bin/docker stop hello

        [X-Fleet]
        Conflicts=hello@*

    Ensure there is only one of these on each instance.
  24. Running template units

    Start 2 instances:

        $ fleetctl -tunnel 10.10.1.1 start hello@{1..2}
        $ fleetctl -tunnel 10.10.1.1 list-units
        UNIT              MACHINE                  ACTIVE   SUB
        hello@1.service   c9de9451.../10.10.1.3    active   running
        hello@2.service   c9de9451.../10.10.1.1    active   running
        $ fleetctl -tunnel 10.10.1.1 journal hello@1
        hello
        hello
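In systemd, a template unit file has a name ending in `@` before the suffix, and each instance puts its own identifier after the `@`. The shell brace expansion `hello@{1..2}` on the slide produces the instance names; the same expansion can be sketched in Python (the function `instance_names` is hypothetical, written only to show the naming rule):

```python
from typing import List


def instance_names(template: str, count: int) -> List[str]:
    """Expand a template unit name such as 'hello@.service' into numbered instances."""
    if "@" not in template:
        raise ValueError("template unit names contain '@' before the suffix")
    prefix, suffix = template.split("@", 1)  # 'hello@.service' -> ('hello', '.service')
    return [f"{prefix}@{i}{suffix}" for i in range(1, count + 1)]


if __name__ == "__main__":
    print(instance_names("hello@.service", 2))
```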
  25. fleet gotchas

    To change a unit definition, you must destroy and restart it.
    • For global units this means the whole cluster.
    • Which means downtime.
    Fleet does not do resource-based scheduling.
    • Intended as a low-level system to build more advanced systems on.
    When moving units around you must do discovery to route traffic.
    • For example sidekick patterns and etcd-aware proxies.
  26. Patterns

    How can I use CoreOS for real? Here are three patterns I use in production today ...
  27. Pattern 1: Simple homogeneous ops cluster

    This is the most textbook “toy” cluster you will see in CoreOS docs. It is suitable for all those random little internal tools that can tolerate brief downtime.
  28. Small cluster

    Long-lived hosts run etcd. Submit app to cluster, sidekick announces app. Reverse proxy discovers app host from etcd.
  29. Sidekick units

    Sidekick unit sets etcd key for app container host:port when app starts. When app goes down, sidekick removes key from etcd. Write your own, calling etcdctl, or use something like github.com/gliderlabs/registrator.

    Reverse proxy or load-balancer container listens for changes in etcd keys. Reconfigures to proxy to app host:port. Write config files with github.com/kelseyhightower/confd, or use etcd-specific proxy like github.com/mailgun/vulcand.
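The sidekick and proxy roles can be sketched with a plain dictionary standing in for etcd. Everything here is an assumption made for illustration: the `/services/<app>/<host>` key layout and the function names are invented, and a real sidekick would call etcdctl or the etcd HTTP API instead of mutating a dict.

```python
from typing import Dict, List


def service_key(app: str, host: str) -> str:
    """Hypothetical key layout: /services/<app>/<host>."""
    return f"/services/{app}/{host}"


def announce(store: Dict[str, str], app: str, host: str, port: int) -> None:
    """Sidekick, on app start: publish where the app can be reached."""
    store[service_key(app, host)] = f"{host}:{port}"


def withdraw(store: Dict[str, str], app: str, host: str) -> None:
    """Sidekick, on app stop: remove the key so proxies stop routing there."""
    store.pop(service_key(app, host), None)


def backends(store: Dict[str, str], app: str) -> List[str]:
    """Proxy side: the host:port list to write into its config after a change."""
    prefix = f"/services/{app}/"
    return sorted(value for key, value in store.items() if key.startswith(prefix))
```

A proxy would recompute `backends(...)` whenever a watched key changes and reload its configuration with the result, which is roughly the job the slide assigns to confd or vulcand.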
  30. Pattern 2: Etcd + workers

    Great for low-traffic websites that need a couple of instances behind a load-balancer. Works well with autoscaling.
  31. Etcd + workers

    Elastic workers connect to etcd cluster and discover their units based on fleet metadata. Works well with autoscaling + ELB.
  32. Pattern 3: Immutable servers with no etcd

    We use this for a high-traffic cluster of micro-services that demands very high availability and strict change control. Systemd units are hard-coded into cloud-config with user-data. Demands some orchestration such as autoscaling groups.
  33. Immutable servers with no etcd

    No etcd, no cluster. Workers spun up by autoscaling. Hard-code systemd units in launch config. Do not do OS updates. Deploy code or OS update by changing launch config and replacing all hosts.
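Hard-coding a unit into user-data means embedding the unit file under `coreos.units[].content` in the cloud-config. One way to generate that text, sketched with plain string formatting so it needs no YAML library; the function name `render_cloud_config` is hypothetical, and the unit body is the hello example from earlier in the deck.

```python
import textwrap

# Unit body taken from the example unit slide (ExecStart joined onto one line).
HELLO_UNIT = """\
[Unit]
Description=Hello world
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker rm hello
ExecStartPre=/usr/bin/docker pull busybox
ExecStart=/usr/bin/docker run --name hello busybox /bin/sh -c "while true; do echo Hello World; sleep 1; done"
ExecStop=/usr/bin/docker stop hello
"""


def render_cloud_config(unit_name: str, unit_body: str) -> str:
    """Embed one systemd unit in a cloud-config document as a YAML block scalar."""
    # Block scalar content must be indented deeper than the 'content' key.
    content = textwrap.indent(unit_body.strip("\n"), " " * 8)
    return (
        "#cloud-config\n"
        "coreos:\n"
        "  units:\n"
        f"    - name: {unit_name}\n"
        "      command: start\n"
        "      content: |\n"
        f"{content}\n"
    )


if __name__ == "__main__":
    print(render_cloud_config("hello.service", HELLO_UNIT))
```

The resulting text is what would go into the launch configuration's user-data, so changing the unit means changing the launch config and replacing hosts, as the slide describes.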
  34. Logs

    Get ‘em off the host ASAP. github.com/gliderlabs/logspout is a tiny docker container that ships all other container output to udp/514. Send to logstash/splunk/papertrail ...
  35. Monitoring

    ◦ AWS cloudwatch
    ◦ newrelic for apps
    ◦ newrelic-sysmond for instances
    ◦ … but it doesn’t understand cgroups
    ◦ datadog has better container support
    ◦ cadvisor presents container stats over http
  36. Alternative operating systems

    RancherOS: no systemd … system docker runs at PID 1: runs user docker container containing app containers.
    RedHat Project Atomic: rpm-ostree merges updates to read-only /usr and /var.
    Ubuntu Snappy Core: transactional updates with snappy packages.
  37. Schedulers

    Fleet is intentionally simple. Build on it for more sophistication:
    ◦ Google’s Kubernetes
    ◦ Apache Mesos/Marathon
    ◦ paz.sh … PaaS based on CoreOS
    ◦ Deis … private heroku-like on CoreOS
    It seems like something new pops up every day at the moment ...