Deep Dive into firecracker-containerd (DockerCon 2019)

Amazon recently released the Firecracker Virtual Machine Monitor (VMM) built on top of the Linux KVM subsystem, which is optimized for lightweight, container-like “micro”-VMs. This session dives deep into the architecture of the firecracker-containerd project, which aims to allow portability between standard OCI container images and the larger container ecosystem with Firecracker micro-VMs. Topics covered will include the standard containerd architecture with the reference OCI runtime (runc), challenges adapting containers into micro-VMs, and the firecracker-containerd suite.

Samuel Karp

April 30, 2019

Transcript

  1. © 2019, Amazon Web Services, Inc. or its Affiliates. Agenda

    • Overview of containers
    • What is a container runtime? What is containerd?
    • The Firecracker Virtual Machine Monitor (VMM)
    • Adapting containerd to Firecracker
    • Demo
    • Current status and roadmap
    • Q&A
  2. A really brief overview of containers...

    • A mechanism for running software
    • … with some isolation
    • … with some repeatability
    • … with a standard format for distribution
    • … with common tooling
  3. Containers enable…

    • Repeatability for deployment
      – Fully model local software dependencies
      – Efficient image-based deployment
        • Minimize duplicate storage and network transfer
    • Expressive modeling of application
      – Shared network, communicate over loopback
      – Separate filesystem, bundled dependencies
  4. Containers ease…

    • Deployment automation
    • Separation of purposes
    • Composability
      – use multiple containers together
  5. Linux container primitives

    • Namespaces
      – Visibility restrictions
    • Control groups (cgroups)
      – Resource limits
    • Capabilities
      – Permission restrictions
    • Seccomp
      – Syscall allow/deny lists
    • Linux Security Modules
      – Resource access control
    • Union Filesystems
      – Image layers
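
To make the "Capabilities – Permission restrictions" primitive concrete, here is a minimal sketch that decodes a capability bitmask (as reported on the `CapEff:` line of `/proc/<pid>/status`) into names. The helper names and the example mask are assumptions chosen for illustration; the bit numbers follow the Linux capability constants, and only a subset is named in the table.

```python
# Illustrative sketch, not part of the talk: decode a Linux capability bitmask.
# Bit numbers follow linux/capability.h; the table is intentionally partial.
CAP_NAMES = {
    0: "CAP_CHOWN", 1: "CAP_DAC_OVERRIDE", 2: "CAP_DAC_READ_SEARCH",
    3: "CAP_FOWNER", 4: "CAP_FSETID", 5: "CAP_KILL", 6: "CAP_SETGID",
    7: "CAP_SETUID", 8: "CAP_SETPCAP", 10: "CAP_NET_BIND_SERVICE",
    12: "CAP_NET_ADMIN", 13: "CAP_NET_RAW", 18: "CAP_SYS_CHROOT",
    21: "CAP_SYS_ADMIN", 27: "CAP_MKNOD", 29: "CAP_AUDIT_WRITE",
    31: "CAP_SETFCAP",
}


def parse_cap_eff(status_text):
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def decode_capabilities(mask):
    """Return the names of all capability bits set in mask."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            # Bits missing from the partial table fall back to a numeric name.
            names.append(CAP_NAMES.get(bit, f"CAP_{bit}"))
        bit += 1
    return names


# Example mask (hypothetical process with a reduced capability set).
sample_status = "Name:\tsh\nCapEff:\t00000000a80425fb\n"
restricted = decode_capabilities(parse_cap_eff(sample_status))
```

A process holding this mask can, for example, bind low ports (`CAP_NET_BIND_SERVICE`) but lacks `CAP_SYS_ADMIN`, which is the kind of permission restriction the slide refers to.
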
  6. Containers and VMs

    Containers
    • Use Linux primitives to isolate processes
    • Share a Linux kernel
    • Fast starts, minimal overhead
    • Flexible isolation
    Virtual Machines
    • Virtualize or emulate hardware components
    • Completely separate kernels (maybe not Linux!)
    • Slower starts, must boot kernel and set up hardware
    [Diagram labels, containers: Hardware, Linux Kernel, namespaces, cgroups, ..., Container, Container]
    [Diagram labels, VMs: Hardware, Linux Kernel, KVM, Virt hardware, Virt hardware, VM Guest, VM Guest]
  7. Why would I want a VM then?

    • Sharing a Linux kernel means sharing the kernel interfaces
    • Interface between VM and VMM (hypercalls) is defined by the hypervisor
    • Interface between kernel and hardware is well-understood
    • Good for defining trust and resource boundaries
      – Isolating multi-tenant workloads
      – Isolating non-trusted workloads
  8. What do we mean by isolation?

    • Prevent tenants from affecting one another
      – Unintentional and malicious
    • Threat modeling with STRIDE
      – Spoofing
      – Tampering
      – Repudiation
      – Information Disclosure
      – Escalation of privilege
    • Container security
      – seccomp
      – Linux security modules
      – capabilities
    • Hypervisor
  9. Common container tooling

    • Docker UX (docker build, docker run)
    • Images and registries for software distribution (docker push, docker pull)
    • Container orchestrators
      – Amazon ECS
      – Kubernetes
      – Mesos
    • Open Container Initiative (OCI)
      – Image standard
      – Runtime standard
  10. Container runtimes

    What are they? What is containerd?
  11. How you run containers

    • Cluster orchestrator: Amazon ECS, Kubernetes, Mesos
    • Local orchestrator: Docker
    • Local management: containerd
    • Container runtime: runc or Firecracker
  12. Container runtimes

    • Mechanism for starting and managing container workloads
    • (Linux containers) Set up cgroups, namespaces, filesystems, capabilities, etc
    • OCI Runtime specification
      – CLI for setting up a container
      – On-disk “bundle”
        • Root filesystem
        • JSON file describing configuration
    • runc
      – Reference implementation
      – Donated by Docker
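
The on-disk “bundle” described above (a root filesystem plus a JSON configuration file) can be sketched as follows. This is a minimal illustration, not a complete or validated configuration: the function name `write_bundle` is hypothetical, and only a small subset of the fields defined by the OCI runtime specification is written. A real runtime would likely need more (mounts, namespaces, and a populated root filesystem).

```python
# Illustrative sketch: lay out a minimal OCI-style bundle directory.
import json
import tempfile
from pathlib import Path


def write_bundle(bundle_dir, args):
    """Create <bundle_dir>/rootfs and <bundle_dir>/config.json."""
    bundle = Path(bundle_dir)
    # The root filesystem directory; left empty in this sketch.
    (bundle / "rootfs").mkdir(parents=True, exist_ok=True)
    config = {
        "ociVersion": "1.0.2",
        "process": {
            "user": {"uid": 0, "gid": 0},
            "args": list(args),
            "cwd": "/",
        },
        # Path is relative to the bundle directory.
        "root": {"path": "rootfs", "readonly": True},
    }
    config_path = bundle / "config.json"
    config_path.write_text(json.dumps(config, indent=2))
    return config_path


with tempfile.TemporaryDirectory() as tmp:
    path = write_bundle(tmp, ["/bin/sh"])
    loaded = json.loads(path.read_text())
    rootfs_exists = (Path(tmp) / "rootfs").is_dir()
```

An OCI runtime such as runc is pointed at a directory of this shape and reads `config.json` to decide what process to start and how to isolate it.
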
  13. containerd

    • Daemon for managing containers
    • Modular framework for container lifecycle workflows
    • Integrates with OCI runtimes and containerd v2 runtimes
  14. The containerd stack

    • gRPC API and Services
    • Storage services
      – Content store
      – Snapshotters
    • Runtime (runc, OCI, v2)
    • In-process plugins
    • Out-of-process gRPC plugins
    [Diagram labels: gRPC, Metrics, Storage, Content, Snapshot, Diff, Metadata, Images, Containers, Tasks, Events, OS, Runtimes]
  19. Firecracker
  20. Firecracker Virtual Machine Monitor (VMM)

    • New KVM-based VMM in Rust
    • Targeted at Lambda and Fargate workloads
    • Small, limited device model
    • Small, limited set of features
    • Very fast boot
    • Containers, but with hypervisor-provided isolation
  21. Firecracker design philosophy

    Security
    • Very limited device model
    • Very limited feature set
    • Eliminate guest interaction with host kernel
    • Prohibit VMM syscalls
    • Memory-safe programming language
    • Single VM per Firecracker process
    Efficiency
    • Fast boot time
    • Low memory and CPU overhead
    • API driven
  22. firecracker-containerd goals

    Containers
    • Compatible images
    • Familiar tooling
    • Support existing workflows
    • Allow composable containers
    • Integrate with orchestrators
    • Minimal additional overhead
    Security
    • Hypervisor-based isolation
    • Limited access to the host
  23. How do we make Firecracker like a container?

    Firecracker VMM considerations
    • No filesystem sharing
    • No hot device attachments
    • Limited networking options (tap)
    • Cross-boundary communication with vsock
    Adapting to containerd
    • Block-device snapshotter
    • API to manage VMM lifecycle
    • Split the “shim” into two parts
      – “Runtime” on the host
      – “Agent” inside the VM
      – ...that runs containers via runc
    • Network
      – Tap device
      – usable with Container Network Interface (CNI) plugins
  24. firecracker-containerd architecture

    [Diagram labels: microVM, Containerd, FC Snapshotter, Container, Internal FC Agent, runc, Content Store, FC Control Plugin, VM Disk Image, Kernel Image, Firecracker VMM, FC Runtime]
  26. What is a “block device” snapshotter?

    • VM treats the snapshots as devices, not as filesystems or directories
    • Inside the VM, we mount the device to expose its filesystem
    • Device behaviors and assumptions, including exclusive access
  27. Design considerations and tradeoffs

    • “Naive” – flat files and ahead-of-time copying
    • “devicemapper” – like Docker’s “devicemapper” storage driver with thin provisioning (copy-on-write)
    • “LVM” – sparse images with LVM (copy-on-write)
    • “rawblock” – reflink for copy-on-write
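
The “Naive” option above (flat files and ahead-of-time copying) can be sketched as a toy model. This is not the project's snapshotter: the class `NaiveSnapshotter`, its `prepare` method, and the `.img` naming are assumptions made for illustration. Each snapshot is one flat image file, and a child snapshot is a full copy of its parent made before use, so no copy-on-write is involved.

```python
# Toy model of "flat files and ahead-of-time copying"; names are hypothetical.
import shutil
import tempfile
from pathlib import Path


class NaiveSnapshotter:
    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        return self.root / f"{key}.img"

    def prepare(self, key, parent=None, size=1024 * 1024):
        """Create the image file for `key`, copying the parent if given."""
        target = self._path(key)
        if target.exists():
            raise FileExistsError(key)
        if parent is None:
            # Fresh image: a zero-filled file of the requested size.
            with open(target, "wb") as image:
                image.truncate(size)
        else:
            # Ahead-of-time copy: the whole parent is duplicated up front.
            shutil.copyfile(self._path(parent), target)
        return target


with tempfile.TemporaryDirectory() as tmp:
    snapshotter = NaiveSnapshotter(tmp)
    base = snapshotter.prepare("base", size=4096)
    base_size = base.stat().st_size
    base.write_bytes(b"layer-data")
    child = snapshotter.prepare("child", parent="base")
    child_initial = child.read_bytes()
    child.write_bytes(b"changed")
    base_after = base.read_bytes()
```

The tradeoff is visible in the copy step: every child costs a full copy in time and space, which is what the copy-on-write options in the list (thin provisioning, LVM, reflink) aim to avoid.
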
  29. What’s this “firecracker-control” plugin?

    • First-class VM construct and API
    • Specify VM-related parameters like the kernel and VM root filesystem
    • Allocate and manage VM resources: block devices, network interfaces, etc
    • Manages the VM lifecycle
    • Compiled-in plugin
    • gRPC API over the same socket
    • Specific to Firecracker for now
    • Hopefully leading to a more-generic “sandbox” API
  31. What does the “runtime” do?

    • Proxies container management commands to agent inside VM
    • Proxies I/O streams
    • Proxies events and metrics from inside the VM back out to containerd
    • containerd’s V2 API
    • gRPC API for communication
    • Extended firecracker-containerd structs
    • (Planned) Many containers per VM, but one runtime
  33. What about the “agent”?

    • Manages the lifecycle of the containers inside the VM
    • Communicates over vsock
    • Receives container management commands from the runtime
    • Proxies I/O streams
    • Proxies events and metrics from inside the VM back out to the runtime
    • Associates each container with the appropriate block device
    • Mounts container filesystems
    • Uses runc to set up cgroups, namespaces, etc
      – Looks like a traditional container to the workload inside
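
The runtime/agent split can be sketched as a request/reply proxy across a socket. This is a simplified model under stated assumptions: `socket.socketpair()` stands in for the vsock connection between host and guest, newline-delimited JSON stands in for the real RPC protocol, and the names `agent_loop`, `RuntimeProxy`, and the `create`/`state` operations are hypothetical.

```python
# Simplified model of a host-side runtime proxying commands to an in-VM agent.
# Assumptions: socketpair instead of vsock, JSON lines instead of real RPC.
import json
import socket
import threading


def agent_loop(sock):
    """Guest side: receive commands, track container state, send replies."""
    containers = {}
    with sock, sock.makefile("r") as reader:
        for line in reader:  # ends when the runtime closes its end
            request = json.loads(line)
            if request["op"] == "create":
                containers[request["id"]] = "running"
            state = containers.get(request["id"], "unknown")
            reply = {"id": request["id"], "state": state}
            sock.sendall((json.dumps(reply) + "\n").encode())


class RuntimeProxy:
    """Host side: forwards each management command and returns the reply."""

    def __init__(self, sock):
        self.sock = sock
        self.reader = sock.makefile("r")

    def call(self, op, container_id):
        message = json.dumps({"op": op, "id": container_id}) + "\n"
        self.sock.sendall(message.encode())
        return json.loads(self.reader.readline())

    def close(self):
        self.reader.close()
        self.sock.close()


def demo():
    host_end, guest_end = socket.socketpair()
    agent = threading.Thread(target=agent_loop, args=(guest_end,))
    agent.start()
    runtime = RuntimeProxy(host_end)
    created = runtime.call("create", "bar1")
    missing = runtime.call("state", "bar2")
    runtime.close()
    agent.join()
    return created, missing


created, missing = demo()
```

The point of the sketch is the shape of the boundary: the host side holds no container state of its own and only relays, while the guest side owns the container lifecycle.
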
  34-50. How we run containers with firecracker-containerd

    [Sequence diagram, built up one step at a time across these slides]
    Participants: orchestrator, containerd, snapshotter, control plugin, Firecracker, runtime, agent, container process
    Messages, in the order the slides introduce them (a label shown on several arrows is relayed across participants):
    • for each container:
      – prepare snapshot (two arrows)
      – snapshot $fooN (two arrows)
    • create VM
    • launch & boot
    • VM $id
    • for each container:
      – run container $barN with snapshot $fooN (two arrows)
      – attach $fooN
      – mount $fooN
      – run container $barN
      – start process
      – container $barN running (two arrows)
      – subscribe events for $barN
    • After some time, container process exits
    • container $barN exited (three arrows)
    • stop VM $id
    • stop VM
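
The ordering in the sequence above can be sketched as a small simulation that records each step. Everything here is illustrative: `run_containers` and the `vm-0` identifier are hypothetical, no real containerd or Firecracker call is made, and attributing each step to a particular component is an assumption where the diagram text does not make it explicit.

```python
# Illustrative simulation of the step ordering; no real VM or container is run.
def run_containers(names):
    """Return the ordered (component, action) log for a list of containers."""
    log = []
    snapshots = {}
    # Snapshots are prepared for every container before the VM is created.
    for name in names:
        snapshots[name] = f"snapshot-{name}"
        log.append(("snapshotter", f"prepare {snapshots[name]}"))
    vm_id = "vm-0"  # hypothetical identifier
    log.append(("control plugin", f"create {vm_id}"))
    log.append(("Firecracker", f"launch & boot {vm_id}"))
    # Each container is then attached, mounted, and started inside the VM.
    for name, snapshot in snapshots.items():
        log.append(("runtime", f"attach {snapshot}"))
        log.append(("agent", f"mount {snapshot}"))
        log.append(("agent", f"start process for {name}"))
        log.append(("runtime", f"{name} running"))
    # Exit events flow back out before the VM is stopped.
    for name in names:
        log.append(("runtime", f"{name} exited"))
    log.append(("control plugin", f"stop {vm_id}"))
    return log


log = run_containers(["bar1", "bar2"])
actions = [action for _, action in log]
```

Printing `log` gives a flat trace that can be compared line by line against the message list.
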
  51. Demo
  52-54. [Terminal session, built up across these slides]

    $ sudo ctr run --tty --env CPU_THREADS=2 --env IO_THREADS=4 docker.io/nmeyerhans/stress:latest $(uuid)
    $ pgrep stress
    2709
    2710
    2711
    2712
    2713
    2714
    2715
    $ pstree -pA -sS 2709
    systemd(1)---containerd(1350)---containerd-shim(2666)---entrypoint(2685)---stress(2709)-+-stress(2710)
          |-stress(2711)
          |-stress(2712)
          |-stress(2713)
          |-stress(2714)
          `-stress(2715)
  55-58. [Terminal session, built up across these slides]

    $ sudo ctr run \
        --tty \
        --env CPU_THREADS=2 \
        --env IO_THREADS=4 \
        --runtime aws.firecracker \
        --snapshotter firecracker-naive \
        docker.io/nmeyerhans/stress:latest $(uuid)
    $ pgrep stress
    $ pgrep firecracker
    2501
    $ pstree -pA -sS 2501
    systemd(1)---containerd-shim(2492)---firecracker(2501)-+-{fc_vcpu0}(2507)
          |-{fc_vcpu1}(2508)
          `-{fc_vmm}(2504)
    $ top -b -n1 | head -n40
    top - 22:29:55 up 3:40, 5 users, load average: 2.51, 3.08, 2.98
    Tasks: 659 total, 1 running, 328 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 2.1 us, 0.2 sy, 0.0 ni, 97.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    KiB Mem : 52825036+total, 51928140+free, 867300 used, 8101676 buff/cache
    KiB Swap: 0 total, 0 free, 0 used. 52430393+avail Mem

     PID USER  PR NI   VIRT    RES    SHR S  %CPU %MEM   TIME+ COMMAND
    2787 root  20  0 270876 112660 112540 S 216.7  0.0 8:22.97 firecracker
     515 root  20  0      0      0      0 I   5.6  0.0 0:03.58 kworker/56:1-ev
    2838 admin 20  0  45400   4136   3076 R   5.6  0.0 0:00.03 top
  59. Current status

    • Everything is a prototype
    • ...but it works!
    • Runtime ↔ agent communication is evolving
      – Better vsock support in Firecracker
    • Reliable identification for block devices
    • firecracker-control VM API
    • Next up:
      – Multiple containers per VM
      – Exec
  60. An API for groups of containers

    • Required by CRI and implemented internally in cri-containerd
    • We want this for firecracker-containerd too!
  61. Longer term

    • Still things we’re not sure how to handle
    • Container Runtime Interface (CRI) conformance for Kubernetes
      – Dynamic workload changes (new pods)
      – Host-based filesystem sharing (volumes)
    • Networking with CNI
  62. How to get involved

    • GitHub: https://github.com/firecracker-microvm/firecracker-containerd
    • Slack: https://tinyurl.com/firecracker-microvm
    • … or come work with us!
  63. Q&A

    More questions? Contact Sam: [email protected] or @samuelkarp on Twitter
  64. A brief note before we finish

    Session surveys provide valuable information to speakers
    Feedback that is very helpful:
    • Topics you were excited to learn about
    • Suggestions for improving understanding and clarity
    Feedback that is extremely unhelpful:
    • Comments unrelated to talk content (please refer to the DockerCon Code of Conduct)
    The “hallway track” is always open! Feedback and questions welcome ([email protected], @samuelkarp)
    For support, use the AWS Forums or contact AWS Support
  65. Thank you!