Containerization primitives

Sam Kottler
November 05, 2014

Transcript

  1. ABOUT ME
    • Work at DigitalOcean as a systems engineer
    • Formerly of Red Hat, Venmo, Acquia
    • Committer/core for Puppet, Ansible, Fedora, CentOS, RubyGems, Bundler
  2. GOOD TO KNOWS
    • What a syscall is
    • Basic understanding of Linux networking
    • Containers vs. virtualization
  3. CONTAINERS ARE THE PAST*, PRESENT, AND FUTURE
    * Most of the Linux ideas are poached from other OSes
  4. NAMESPACES
    • mnt: filesystem
    • pid: process
    • net: network
    • ipc: SysV IPC
    • uts: hostname
    • user: UID
  5. THE BASICS
    • Namespaces do not have names
    • Six inodes exist under /proc/<pid>/ns
    • Each namespace has a unique inode
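
Because namespaces are identified by inode rather than by name, two processes can be compared by looking at the entries under /proc/<pid>/ns. The following is a minimal sketch, assuming a Linux host with procfs mounted; the helper names `namespace_inodes` and `same_namespace` are hypothetical and only for illustration.

```python
import os


def namespace_inodes(pid="self"):
    """Return {namespace_name: inode} for a PID, or {} if /proc/<pid>/ns is unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        # Not Linux, procfs not mounted, or no such process.
        return result
    for name in sorted(names):
        try:
            # stat() follows the magic symlink, so st_ino is the namespace's inode.
            result[name] = os.stat(os.path.join(ns_dir, name)).st_ino
        except OSError:
            continue
    return result


def same_namespace(pid_a, pid_b, kind):
    """True if both PIDs report the same inode for the given namespace kind."""
    a = namespace_inodes(pid_a).get(kind)
    b = namespace_inodes(pid_b).get(kind)
    return a is not None and a == b
```

For example, `same_namespace("self", 1, "net")` would report whether the current process shares a network namespace with PID 1, provided the caller is allowed to read PID 1's entries.
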
  6. NAMESPACE SYSCALLS
    • unshare(): moves the calling process into a new namespace
    • clone(): creates a new process in a new namespace
    • setns(): joins an existing namespace
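
All three syscalls select namespace types with CLONE_NEW* flags. The sketch below maps the six namespace kinds listed earlier to their flag values from the Linux headers and wraps unshare() through ctypes. The `flags_for` and `unshare` helper names are hypothetical, and the real call is assumed to run on Linux with sufficient privileges (typically CAP_SYS_ADMIN).

```python
import ctypes
import os

# CLONE_NEW* values as defined in <linux/sched.h>.
CLONE_FLAGS = {
    "mnt": 0x00020000,   # CLONE_NEWNS
    "uts": 0x04000000,   # CLONE_NEWUTS
    "ipc": 0x08000000,   # CLONE_NEWIPC
    "user": 0x10000000,  # CLONE_NEWUSER
    "pid": 0x20000000,   # CLONE_NEWPID
    "net": 0x40000000,   # CLONE_NEWNET
}


def flags_for(*kinds):
    """OR together the CLONE_NEW* flags for the requested namespace kinds."""
    flags = 0
    for kind in kinds:
        flags |= CLONE_FLAGS[kind]
    return flags


def unshare(*kinds):
    """Call unshare(2) for the given namespace kinds; raises OSError on failure."""
    libc = ctypes.CDLL(None, use_errno=True)  # loaded lazily; assumes a libc with unshare()
    if libc.unshare(flags_for(*kinds)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

Calling `unshare("uts")` as root would give the process its own hostname namespace; without privileges it is expected to fail with EPERM.
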
  7. NETWORK ISOLATION
    • Each networking device belongs to exactly one namespace
    • Single default namespace, init_net(*nets)
    • A lo device is included in every network namespace
  8. NETWORK NAMESPACES IN PRACTICE
    • ip netns add testns1: creates /var/run/netns/testns1
    • Route management is per-NS
    • Prevents cross-NS bonds
    • setns(int fd, int nstype): validates namespace type vs. FD
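
To illustrate how the file created by `ip netns add` and the setns() signature fit together, here is a rough sketch that opens the namespace file and passes the descriptor to setns() with CLONE_NEWNET, so the kernel can check that the FD really refers to a network namespace. The helpers `netns_path` and `enter_netns` are hypothetical, and the call is assumed to need root on Linux.

```python
import ctypes
import os

CLONE_NEWNET = 0x40000000  # from <linux/sched.h>
NETNS_DIR = "/var/run/netns"


def netns_path(name):
    """Build the path that `ip netns add <name>` creates."""
    if not name or "/" in name or name in (".", ".."):
        raise ValueError(f"invalid namespace name: {name!r}")
    return os.path.join(NETNS_DIR, name)


def enter_netns(name):
    """Join a named network namespace via setns(fd, CLONE_NEWNET)."""
    libc = ctypes.CDLL(None, use_errno=True)  # loaded lazily; assumes a libc with setns()
    fd = os.open(netns_path(name), os.O_RDONLY)
    try:
        # Passing CLONE_NEWNET makes the kernel reject FDs of any other namespace type.
        if libc.setns(fd, CLONE_NEWNET) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
```
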
  9. SOCKET ISOLATION
    • Sockets are mapped into network namespaces
    • Each socket is also part of a single network namespace
    • sk_net is part of the sock struct
    • sock_net()/sock_net_set() getter/setter
  10. SOCKET ACTIVATION
    • Listen on a socket, but have no services behind it
    • Request arrives, service is spun up, responds
    • Enabling 10k+ low-usage services on a VM
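
The idea can be sketched in a few lines: something holds the listening socket, and the service behind it is only constructed when the first request arrives. This is an illustrative, in-process toy, not how any particular init system implements it; `LazyService` and its `factory` argument are hypothetical names.

```python
import socket


class LazyService:
    """Owns a listening socket; the real handler is built on the first request."""

    def __init__(self, factory):
        self._factory = factory  # callable that "spins up" the service
        self._handler = None     # nothing is running behind the socket yet
        self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        self.listener.listen(16)
        self.port = self.listener.getsockname()[1]

    @property
    def started(self):
        return self._handler is not None

    def handle_one(self):
        """Accept one connection, starting the service first if needed."""
        conn, _addr = self.listener.accept()
        with conn:
            if self._handler is None:
                self._handler = self._factory()
            request = conn.recv(1024)
            conn.sendall(self._handler(request))
```

Until `handle_one()` sees a connection, the only cost is one open socket, which is what makes thousands of rarely used services on one machine plausible.
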
  11. USER ISOLATION
    • Allows non-privileged usage
    • Often used as the start of a namespace chain
    • UIDs come from the overflow rules
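
The overflow behaviour can be modelled with the format of /proc/<pid>/uid_map, where each line holds "inside-UID outside-UID length". In the sketch below, a UID with no mapping falls back to the overflow UID; 65534 is the usual default of /proc/sys/kernel/overflowuid and is treated here as an assumption. Function names are hypothetical.

```python
OVERFLOW_UID = 65534  # assumed default of /proc/sys/kernel/overflowuid


def parse_uid_map(text):
    """Parse uid_map content into a list of (inside, outside, length) tuples."""
    mappings = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        inside, outside, length = (int(f) for f in fields)
        mappings.append((inside, outside, length))
    return mappings


def uid_seen_inside(outside_uid, mappings, overflow=OVERFLOW_UID):
    """Translate a UID from the parent namespace into the user namespace."""
    for inside, outside, length in mappings:
        if outside <= outside_uid < outside + length:
            return inside + (outside_uid - outside)
    return overflow  # unmapped UIDs show up as the overflow UID
```

With a map of `0 1000 1`, host UID 1000 appears as root inside the namespace, while every other host UID appears as the overflow UID.
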
  12. CGROUPS + NAMESPACES
    • “This PID can only see part of the filesystem”
    • “This PID can only see part of the filesystem, use 384 MB of memory, and utilize a single CPU.”
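
As a worked example of the second statement, the limits translate into values written to control files. The sketch assumes the cgroup v1 file names `memory.limit_in_bytes` and `cpuset.cpus`; the `cgroup_settings` helper is hypothetical.

```python
def cgroup_settings(memory_mb, cpus):
    """Return the per-controller file contents for a memory and CPU restriction."""
    return {
        "memory": {
            # The memory controller takes the limit in bytes.
            "memory.limit_in_bytes": str(memory_mb * 1024 * 1024),
        },
        "cpuset": {
            # Comma-separated list of CPU indices the tasks may run on.
            "cpuset.cpus": ",".join(str(c) for c in cpus),
        },
    }


settings = cgroup_settings(384, [0])
```

Here 384 MB becomes 402653184 bytes and the single CPU becomes the string "0". On a real cpuset hierarchy, `cpuset.mems` also has to be set before tasks can be attached.
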
  13. CGROUP IMPLEMENTATION
    • Hooks into fork() and exit()
    • A new virtual filesystem type called “cgroup”
    • More complex descriptors for task_struct
    • Procfs entry in /proc/<pid>/cgroup
    • All actions take place on the FS
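
The procfs entry is easy to inspect from userspace. Each line of /proc/<pid>/cgroup has the form `hierarchy-ID:controller-list:cgroup-path`; the parser below is a small sketch with a hypothetical function name.

```python
def parse_proc_cgroup(text):
    """Parse the contents of /proc/<pid>/cgroup into a list of dicts."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first two colons only; the path itself may contain colons.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append({
            "hierarchy": int(hierarchy_id),
            "controllers": controllers.split(",") if controllers else [],
            "path": path,
        })
    return entries
```

On a Linux host, `parse_proc_cgroup(open("/proc/self/cgroup").read())` would list the cgroups the current process belongs to.
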
  14. CGROUP MANAGEMENT
    • 4 files per-cgroup
    • tasks
    • cgroup.procs
    • cgroup.event_control
    • notify_on_release
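
Since management happens through these files, creating a cgroup and attaching a task comes down to a mkdir and a write. The sketch assumes a cgroup v1 controller mounted somewhere such as /sys/fs/cgroup/memory (the mount point is an assumption, passed in as `controller_root`); both helper names are hypothetical.

```python
import os


def create_cgroup(controller_root, name):
    """Create a cgroup by making a directory under the controller's mount point."""
    path = os.path.join(controller_root, name)
    # On a real cgroup filesystem the kernel populates the control files itself.
    os.makedirs(path, exist_ok=True)
    return path


def add_task(cgroup_dir, pid):
    """Attach a task by writing its PID to the cgroup's `tasks` file."""
    with open(os.path.join(cgroup_dir, "tasks"), "w") as f:
        f.write(f"{pid}\n")
```

Taking the root as a parameter keeps the helpers usable against a scratch directory for experimentation, without touching the live hierarchy.
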
  15. MEMORY
    • Exposes most of the memory subsystem
    • NUMA management
    • Most complex type of cgroup