Clustering with Docker Swarm - Dockerops 2016 @ Cento (FE) Italy

An introduction to Docker native clustering: Swarm.
Deployment and configuration, integration with Consul, for a production-like cluster serving web applications with multiple containers on multiple hosts. #dockerops #docker #clustering #dockerswarm #swarm

Giovanni Toraldo

February 13, 2016

Transcript

  1. DockerOps 2016 #dockerops Giovanni Toraldo ~ ClouDesire.com About me

    Lead developer at ClouDesire.com
    Open Source Enthusiast with SuperCow Powers
    Java/PHP/whatever developer
    writer of the OpenNebula book
    devops
    https://twitter.com/gionn
    [email protected]
  2. What is ClouDesire?

    Application Marketplace to help software vendors sell and provision applications
    • Web Applications:
      ◦ provision VM
      ◦ on multiple cloud providers
      ◦ deploy/upgrade application and dependencies (docker containers)
      ◦ application logging
      ◦ resource monitoring
    • With multi-tenant applications/SaaS:
      ◦ expose REST hooks and API for billing lifecycle
    • manage subscriptions, billing, pay-per-use, invoicing, payments.
  3. History of Docker networking support

    • 2014-06-09: Docker 1.0 release, standard bridges, no multi-host support
    • 2015-06-16: Docker 1.7 release, experimental volume plugins, networking rewritten and released as libnetwork
    • 2015-07-24: libnetwork 0.4.0 release, experimental overlay driver and network plugins
    • 2015-08-04: Docker Swarm 0.4 release
    • 2015-10-13: Docker Swarm 1.0 release
    • 2015-11-03: Docker 1.9 release, network feature exits experimental, multi-host networking using VXLAN based overlay driver
    • 2016-02-04: Docker 1.10 release, DNS based discovery
  4. Docker without Swarm

    • Independent Docker hosts
      ◦ Chef, Puppet, Ansible?
    • Manual allocation of containers on multiple nodes
      ◦ Non-linear resource usage
      ◦ No service discovery, hardcoded configurations
        ▪ Consul?
      ◦ Manual reaction to failures
    • Unhandled container data, bound to the local node
    • Only third-party OSS “schedulers” available (without simplicity in mind)
      ◦ Google Kubernetes
      ◦ Apache Mesos
      ◦ Spotify Helios
      ◦ New Relic Centurion
  5. And then, Swarm.

    • Native clustering for Docker:
      ◦ turns a pool of Docker hosts into a single, virtual host
    • Standard Docker API
      ◦ re-use existing tools
        ▪ docker cli
        ▪ compose
        ▪ dokku
        ▪ anything else
    • Pluggable schedulers
  6. Steps to bootstrap a Swarm cluster

    Bootstrapping a cluster, the practical way:
    • Launch a fleet of VMs, reachable via SSH
    • docker daemon running
      ◦ reachable via TCP port
      ◦ auth with TLS certificates
    • external service discovery backend required
    • Bootstrap swarm-manager
    • Bootstrap swarm-agent on the remaining nodes
    • Use swarm-manager API
  7. Docker-Machine for launching VM

    Machine manager (like Vagrant)
    https://github.com/docker/machine
    (Win/Mac: distributed in Docker Toolbox)
    • Launch VM somewhere
    • Install Docker
    • Generate and copy certificates
      ◦ (password-less auth)
    • Enable remote access via TCP
  8. Docker-Machine help

    • active: Print which machine is active
    • config: Print the connection config for machine
    • create: Create a machine
    • env: Display the commands to set up the environment for the Docker client
    • inspect: Inspect information about a machine
    • ip: Get the IP address of a machine
    • kill: Kill a machine
    • ls: List machines
    • provision: Re-provision existing machines
    • regenerate-certs: Regenerate TLS Certificates for a machine
    • restart: Restart a machine
    • rm: Remove a machine
    • ssh: Log into or run a command on a machine with SSH.
    • scp: Copy files between machines
    • start: Start a machine
    • status: Get the status of a machine
    • stop: Stop a machine
    • upgrade: Upgrade a machine to the latest version of Docker
    • url: Get the URL of a machine
    • version: Show the Docker Machine version or a machine docker version
  9. Docker-Machine backends

    Where can nodes run?
    • Generic backend
      ◦ existing hosts with ssh access
    • Local machine (virtualization)
      ◦ Virtualbox
      ◦ VMware Fusion
    • Cloud providers
      ◦ Amazon
      ◦ GCE
      ◦ Rackspace
      ◦ DigitalOcean
      ◦ ...
  10. Bootstrap a node with Docker-Machine

    $ docker-machine create --driver generic --generic-ip-address=<ip-address> <nodename>
    $ docker-machine create --driver virtualbox <nodename>
    $ docker-machine create --driver digitalocean --digitalocean-access-token <token> <nodename>
    $ docker-machine create --driver amazonec2 --amazonec2-access-key <key> --amazonec2-secret-key <secret> <nodename>
    $ docker-machine create --driver kvm --kvm-cpu-count 2 --kvm-disk-size 20 --kvm-memory 4096 <nodename>
  11. Interaction with a Docker-Machine node

    $ docker-machine env default
    export DOCKER_TLS_VERIFY="1"
    export DOCKER_HOST="tcp://192.168.99.100:2376"
    export DOCKER_CERT_PATH="/home/gionn/.docker/machine/machines/default"
    export DOCKER_MACHINE_NAME="default"
    # Run this command to configure your shell:
    # eval "$(docker-machine env default)"

    $ docker info
    Kernel Version: 4.1.17-boot2docker
    Operating System: Boot2Docker 1.10.0 (TCL 6.4.1); master : b09ed60 - Thu Feb 4 20:16:08 UTC 2016
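The exported variables decide which daemon the docker client talks to. As a minimal sketch, a small shell function can report the current target before any command is run against it. The `docker_target` name is hypothetical and not part of Docker or Docker-Machine; it only reads the variables shown above.

```shell
# Hypothetical helper: report which Docker daemon the current shell targets,
# based only on the variables that `docker-machine env` exports.
docker_target() {
  if [ -n "${DOCKER_HOST:-}" ]; then
    printf '%s (machine: %s, TLS verify: %s)\n' \
      "$DOCKER_HOST" "${DOCKER_MACHINE_NAME:-unknown}" "${DOCKER_TLS_VERIFY:-0}"
  else
    # No DOCKER_HOST set: the client falls back to the local daemon.
    printf 'local daemon (default socket)\n'
  fi
}

docker_target
```

Calling it right after `eval "$(docker-machine env default)"` should echo back the `DOCKER_HOST` value printed by `docker-machine env`.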
  12. Docker-Machine for launching a swarm-master

    Using the Docker Hub discovery backend (best for testing/development):

    $ docker run swarm create
    a62518a837ed196550ec83442901dfad
    $ docker-machine create \
        -d <backend-plugin> \
        --swarm \
        --swarm-master \
        --swarm-discovery token://<token> \
        swarm-master

    or manually:

    $ docker run -d -p 3375:2375 -t swarm manage token://<token>
  13. Docker-Machine for launching swarm nodes

    $ docker-machine create \
        -d <backend-plugin> \
        --swarm \
        --swarm-discovery token://<token> \
        swarm-node-00

    or manually, advertising the address of the joining node's own Docker daemon:

    $ docker run -d swarm join --addr=<node-ip>:2375 token://<token>
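To grow the cluster beyond a single agent, the same command is repeated once per node. The sketch below only prints the commands for review instead of running them. The `node_create_cmd` helper, the `virtualbox` driver and the node naming scheme are assumptions for illustration, and `<token>` stays a placeholder for the value returned by `swarm create`.

```shell
# Hypothetical helper: print (not run) the docker-machine command that would
# create one Swarm agent node, using the flags shown on the slide.
node_create_cmd() {
  driver=$1 token=$2 name=$3
  printf 'docker-machine create -d %s --swarm --swarm-discovery token://%s %s\n' \
    "$driver" "$token" "$name"
}

# Print the commands for three agents: swarm-node-00 .. swarm-node-02.
for i in 0 1 2; do
  node_create_cmd virtualbox '<token>' "$(printf 'swarm-node-%02d' "$i")"
done
```

Once a real token has been substituted and the printed lines look right, they can be pasted into a shell or piped to `sh`.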
  14. Check running machine status

    $ docker-machine ls
    NAME            ACTIVE   DRIVER       STATE     URL                         SWARM                   DOCKER    ERRORS
    swarm-master    -        virtualbox   Running   tcp://192.168.99.101:2376   swarm-master (master)   v1.10.0
    swarm-node-00   -        virtualbox   Running   tcp://192.168.99.100:2376   swarm-node-00           v1.10.0

    $ eval $(docker-machine env swarm-master) && docker ps
    CONTAINER ID   IMAGE          COMMAND                  CREATED      STATUS          PORTS   NAMES
    b664b357e999   swarm:latest   "/swarm join --advert"   2 days ago   Up 21 minutes           swarm-agent
    52ddf6fbab43   swarm:latest   "/swarm manage --tlsv"   2 days ago   Up 21 minutes           swarm-agent-master
  15. First lap with Docker Swarm

    $ docker -H 192.168.99.101:3376 ps -a
    CONTAINER ID   IMAGE          COMMAND                  CREATED         STATUS         PORTS   NAMES
    c4067a2f176b   swarm:latest   "/swarm join --advert"   2 minutes ago   Up 2 minutes           swarm-node-00/swarm-agent
    9623e4e94771   swarm:latest   "/swarm join --advert"   7 minutes ago   Up 7 minutes           swarm-master/swarm-agent
    8576ffa755c4   swarm:latest   "/swarm manage --tlsv"   7 minutes ago   Up 7 minutes           swarm-master/swarm-agent-master

    • agent running on every node
    • master running on a single node
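Because the manager prefixes every container name with the node it runs on, the listing can be summarised per node with standard tools. This is a minimal sketch, assuming the `node/container` naming shown in the listing. The `count_by_node` helper is hypothetical, and the sample input is copied from the slide rather than fetched from a live cluster.

```shell
# Hypothetical helper: read "node/container" names on stdin and print
# "<node> <container-count>" for each node.
count_by_node() {
  cut -d/ -f1 | sort | uniq -c | awk '{ print $2, $1 }'
}

# Sample names taken from the slide. Against a live manager the input could
# come from: docker -H 192.168.99.101:3376 ps -a --format '{{.Names}}'
printf '%s\n' \
  swarm-node-00/swarm-agent \
  swarm-master/swarm-agent \
  swarm-master/swarm-agent-master | count_by_node
```

For the three containers on the slide this reports two containers on swarm-master and one on swarm-node-00.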
  16. Service discovery backends for production

    Swarm relies on a service discovery backend to know the endpoints of all the nodes.
    • Docker Hub token (ok for testing, not intended for production)
    • Static file with IP:port list or range (poor man's service discovery)
    • etcd
    • consul
    • zookeeper
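The static file option needs nothing more than a text file with one `IP:port` entry per line. The sketch below assumes three hypothetical nodes on a 192.168.99.0/24 network with the daemon listening on port 2375. The `write_cluster_file` helper and the file path are made up for illustration.

```shell
# Hypothetical helper: write one "<ip>:<port>" line per node to a cluster file.
write_cluster_file() {
  file=$1 port=$2
  shift 2
  : > "$file"                        # start from an empty file
  for ip in "$@"; do
    printf '%s:%s\n' "$ip" "$port" >> "$file"
  done
}

cluster_file="${TMPDIR:-/tmp}/my_cluster"
write_cluster_file "$cluster_file" 2375 192.168.99.100 192.168.99.101 192.168.99.102
cat "$cluster_file"
```

The manager would then be started with a `file://` discovery URL pointing at that file in place of `token://<token>`. When Swarm itself runs in a container, the file also has to be mounted into it.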
  17. Service discovery with Consul

    Consul is a distributed, highly available Key/Value store and service registry, with a simple API.
    https://www.hashicorp.com/
    https://www.consul.io
  18. Consul features

    • Agent based
    • Key-Value Store
    • Services discovery backend
    • Services Health Checking
    • Query interfaces
      ◦ HTTP JSON API
      ◦ DNS
    • LAN communication
    • WAN replication (Multi-DC)
    • UI for browsing

    • Agent
      ◦ Health Checking
      ◦ Query interface (HTTP, DNS)
    • Server
      ◦ Data storage and replication
      ◦ Leader election
      ◦ Query interface (HTTP, DNS)
  19. Bootstrap Consul cluster with Docker-Machine

    Initialize new node(s):

    $ docker-machine create \
        -d <backend-plugin> \
        consul-1

    Prepare for launch:

    $ eval $(docker-machine env consul-1)
  20. Service discovery with Consul

    Single node bootstrap:

    $ docker run --net=host progrium/consul -server -bootstrap

    Multiple node bootstrap:

    $ docker run --net=host progrium/consul -server -bootstrap-expect 3
    $ docker run --net=host progrium/consul -server -join <existing-node-ip>

    https://hub.docker.com/r/progrium/consul/
  21. Bootstrap swarm-master backed by Consul

    $ docker-machine create \
        -d <backend-plugin> \
        --swarm \
        --swarm-master \
        --swarm-discovery="consul://$(docker-machine ip consul-1):8500" \
        --engine-opt="cluster-store=consul://$(docker-machine ip consul-1):8500" \
        --engine-opt="cluster-advertise=eth1:2376" \
        swarm-master

    • Node information is saved in the K/V store
    • Master announces itself on the network so that agents can pick it up
  22. Highly available Swarm master backed by Consul

    The master is replaced automatically when its last advertise TTL expires

    $ docker-machine create \
        -d virtualbox \
        --swarm \
        --swarm-master \
        --swarm-discovery="consul://$(docker-machine ip consul-1):8500" \
        --engine-opt="cluster-store=consul://$(docker-machine ip consul-1):8500" \
        --engine-opt="cluster-advertise=eth1:2376" \
        --swarm-opt="replication=true" \
        --swarm-opt="advertise=eth1:3376" \
        swarm-master
  23. Multi-Host networking with Overlay driver

    Default bridge network allows only single host networking.
    Overlay enables multi-host networking with a software-defined network.
    • K/V Store is required (e.g. Consul)
    • Create a network with overlay driver

    $ docker -H 192.168.99.101:3376 network create --driver overlay --subnet=10.0.9.0/24 cloudesire

    • Run containers within the new network

    $ docker -H 192.168.99.101:3376 run -ti --net=cloudesire busybox
  24. Multi-Host networking with Overlay driver (2)

    • Example ip addr of a container attached to overlay network

    11: eth0@if12: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue
        link/ether 02:42:0a:00:09:02 brd ff:ff:ff:ff:ff:ff
        inet 10.0.9.2/24 scope global eth0
    14: eth1@if15: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
        link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff
        inet 172.18.0.2/16 scope global eth1

    • Multiple overlay networks can be created
    • Service discovery via DNS enabled
      ◦ Forget about using links
        ▪ No more starting order madness
        ▪ No more restart parties
    • Additional services registered via --publish-service=service.name
      ◦ Multiple containers exposing the same service
  25. Swarm Manager scheduler policies

    Available strategies:
    • spread
      ◦ few containers on every node
    • binpack
      ◦ most containers on few nodes
    • random
      ◦ totally cpu/memory unaware

    Tip: stopped containers count towards scheduler allocation
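The difference between spread and binpack can be illustrated with a toy model. The sketch below is a deliberate simplification that ranks nodes by container count only, while the real strategies also weigh reserved CPU and memory and skip nodes without enough free resources. The node names, the counts and the `pick_node` helper are hypothetical.

```shell
# Hypothetical cluster state: "<node> <container-count>" per line.
# Stopped containers are included in the count, as the tip above says.
NODES='node-a 3
node-b 1
node-c 5'

# Simplified model: rank nodes by container count only.
pick_node() {
  case "$1" in
    spread)  order=n  ;;   # fewest containers first
    binpack) order=nr ;;   # most containers first
    *) echo "unknown strategy: $1" >&2; return 1 ;;
  esac
  printf '%s\n' "$NODES" | sort -t ' ' -k "2,2$order" | head -n 1 | cut -d ' ' -f 1
}

echo "spread  -> $(pick_node spread)"
echo "binpack -> $(pick_node binpack)"
```

With this sample data, spread selects the emptiest node (node-b) and binpack the fullest one (node-c), which matches the "few containers on every node" versus "most containers on few nodes" description.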
  26. Scheduler filters

    Filters enabled by default:
    • Health
      ◦ avoid starting containers on unhealthy hosts
    • Constraints
      ◦ by node name
      ◦ by storage driver
      ◦ by kernel version
      ◦ by custom labels

    $ docker run -e constraint:storage==ssd mysql
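Custom labels such as `storage` are not built in: they have to be attached to the Docker engine of each node before a constraint can match them. The sketch below assumes nodes are created with Docker-Machine and its `--engine-label` option. The `constraint_env` helper, the node name and the `virtualbox` driver are illustrative assumptions, and the commands are only printed, not executed.

```shell
# Hypothetical helper: build the environment argument that Swarm reads as a
# constraint filter, e.g. "constraint:storage==ssd".
constraint_env() {
  printf 'constraint:%s==%s' "$1" "$2"
}

# Print the two steps: label a node at creation time, then target the label.
echo "docker-machine create -d virtualbox --engine-label storage=ssd --swarm --swarm-discovery token://<token> swarm-node-ssd"
echo "docker run -d -e $(constraint_env storage ssd) mysql"
```

Keeping the label key and value in one place avoids the silent mismatch where a container is never scheduled because no node carries the expected label.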
  27. Container filters

    • affinity
      ◦ container: prefer scheduling near an existing container
        ▪ -e affinity:container==frontend
      ◦ image: prefer scheduling on a node with the image already pulled
        ▪ -e affinity:image==redis
      ◦ label: prefer scheduling near tagged containers
        ▪ --label com.example.type=frontend
        ▪ -e affinity:com.example.type==frontend
    • dependency
      ◦ --volumes-from=N: same node where the volume resides
      ◦ --link=N:alias: same node as the container to link to
      ◦ --net=container:N: node with the same network stack as another container
    • port
      ◦ avoids port clashes when launching multiple containers on the same port
  28. What about Storage?

    • Docker 1.8 introduced volume plugins
    • Docker 1.9 improved the usability of volume plugins

    Available plugins (no particular Swarm support required):
    • Flocker (move data along with containers)
    • Netshare (NFS, CIFS, AWS EFS)
    • Convoy (NFS, EBS, plus snapshot support)
    • GlusterFS
    • https://github.com/docker/docker/blob/master/docs/extend/plugins.md

    $ docker run -d --volume-driver <driver> -v <src:dst_path> <image>
  29. Gotchas -> Roadmap

    • Container rescheduling on node failure is too simple
      ◦ No stateful/stateless distinction
    • No rebalancing across nodes
    • No Global Scheduling (same container on every node, e.g. log collector)
    • No persistence of status, no Shared State
      ◦ If the master goes offline, then a node goes offline, and the master comes back, there is no way to know what was running on that node
    • Scalability up to hundreds of nodes
    • Lacking integration with larger platforms: Mesos, Kubernetes