containers according to the OCI specification.
⚫OCI specification
• https://github.com/opencontainers/runtime-spec
⚫Container Runtime needs
• config.json, a configuration file for the container guest
–environment variables, Linux namespaces, seccomp, etc.
• rootfs for the container guest
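As a rough illustration of the two inputs named above, the following sketch builds a small dictionary shaped like an OCI `config.json` and points it at a `rootfs` directory. The field names (`ociVersion`, `process`, `root`, `linux.namespaces`) come from the runtime-spec; the command, environment value, and version string are placeholder assumptions, and a real bundle would usually need more sections (mounts, capabilities, and so on).

```python
import json


def minimal_oci_config(args, env, rootfs="rootfs"):
    """Build a small dict shaped like an OCI runtime-spec config.json.

    This is an illustrative subset, not a complete or validated config.
    """
    return {
        "ociVersion": "1.0.2",  # assumed spec version for the example
        "process": {
            "args": list(args),   # command run inside the container
            "env": list(env),     # environment variables for the guest
            "cwd": "/",
        },
        # Path to the root filesystem, relative to the bundle directory.
        "root": {"path": rootfs, "readonly": True},
        "linux": {
            # Each entry asks the runtime to create a new namespace of that type.
            "namespaces": [
                {"type": "pid"},
                {"type": "mount"},
                {"type": "network"},
            ],
        },
    }


# Hypothetical application path and environment, for illustration only.
config = minimal_oci_config(["/bin/app"], ["PATH=/usr/bin:/bin"])
print(json.dumps(config, indent=2))
```

Writing this JSON to `config.json` next to a populated `rootfs` directory gives the general shape of a bundle that an OCI runtime consumes.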
container technology
• as a lightweight sandbox
• to package an application along with its dependencies
[Figure: Server stack (Hardware, Host OS, dockerd, Apps1, Apps2) and Edge stack (Hardware with Wifi/Bluetooth/LTE/Ethernet, Host OS, Apps1, Apps2); label: Isolation]
to concentrate on their application although “config.json” has many items.
[Figure: User/Kernel diagram with process1 and process2 calling write()]
• CAP_NET_ADMIN allows various network-related operations
• CAP_SYS_TIME allows setting the system clock
[Figure: Mount Namespace (/box1 containing /bin, /usr, /sbin) and PID Namespace (process1, process2)]
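To make the capability names above concrete, here is a minimal sketch that decodes the `CapEff` hexadecimal mask found in `/proc/<pid>/status` into capability names. The bit numbers are the ones defined in `<linux/capability.h>`; only three capabilities are listed, and the sample status text is made up for the example.

```python
# Capability bit numbers as defined in <linux/capability.h> (subset only).
CAP_BITS = {
    "CAP_NET_ADMIN": 12,
    "CAP_NET_RAW": 13,
    "CAP_SYS_TIME": 25,
}


def decode_caps(hex_mask):
    """Return the names (from CAP_BITS) whose bit is set in a hex mask."""
    mask = int(hex_mask, 16)
    return sorted(name for name, bit in CAP_BITS.items() if mask & (1 << bit))


def effective_caps(status_text):
    """Find the CapEff line in /proc/<pid>/status text and decode it."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return decode_caps(line.split()[1])
    return []


# Made-up sample: bits 12 and 25 set (0x1000 | 0x2000000).
sample_status = "Name:\tapp\nCapEff:\t0000000002001000\n"
print(effective_caps(sample_status))
```

On a Linux host, passing the contents of `/proc/self/status` to `effective_caps` shows which of these capabilities the current process actually holds.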
configuration to improve the performance.
–Some configurations degrade system performance for processes inside the container.
⚫Storage
• Bind mount
–Reduces storage size by sharing the same files
• (WIP) Deduplicate storage for Embedded Container Platform
size of memory to share page caches
• KSM (Kernel Samepage Merging)
–Reduces memory size by sharing anonymous pages
–(WIP) Control KSM for fine-grained scanning
–(WIP) How to set up madvise(2)
• But we sometimes encountered issues when we used it for our use case.
–See https://speakerdeck.com/kentatada/container-tracer-using-oci-hooks-on-kubernetes?slide=17
⚫(WIP) cgroup v2 support for Embedded Container Runtime
application for security and safety.
• Needed Linux Capabilities
• Correct file permissions
• List of executed syscalls, to set up seccomp
• Page fault occurrences in critical code
⚫We need a lightweight and secure tool for embedded systems.
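One way a traced syscall list could feed into seccomp is sketched below: it turns a list of observed syscall names into an allowlist shaped like the `linux.seccomp` section of an OCI `config.json`. The action and architecture strings are runtime-spec constants; the choice of a deny-by-default policy, the single architecture, and the sample syscall names are assumptions for the example, not the tracer's actual output format.

```python
def seccomp_allowlist(traced_syscalls):
    """Build an OCI-style seccomp section that allows only the traced syscalls.

    Everything not listed falls through to defaultAction and fails with an
    errno instead of being executed.
    """
    names = sorted(set(traced_syscalls))  # de-duplicate repeated trace entries
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64"],  # assumed target architecture
        "syscalls": [
            {"names": names, "action": "SCMP_ACT_ALLOW"},
        ],
    }


# Hypothetical trace result: syscall names seen while the app was running.
profile = seccomp_allowlist(["write", "read", "write", "exit_group"])
print(profile["syscalls"][0]["names"])
```

An allowlist built from a single trace run only covers the code paths that were exercised, so it is usually reviewed before being applied.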
hook
• This tracer sets up ftrace at the prestart stage of the Container Runtime.
–See https://speakerdeck.com/kentatada/debug-application-inside-kubernetes-using-linux-kernel-tools
• We could trace other targets in the same way as with the syscall tracer.
⚫Support operations per container
• We had no way to specify an OCI hook per container on Kubernetes.
• Our patch for per-container operations was merged into mainline.
–https://github.com/containerd/cri/pull/1436
• We can control our tracer per container as of the containerd 1.4.0 release.
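The prestart hook mechanism mentioned above can be sketched in two parts: a `config.json` fragment that registers a hook, and the hook side reading the container state that the runtime passes on stdin. The `hooks.prestart` layout and the state fields (`id`, `pid`, `status`, `bundle`) follow the runtime-spec; the hook path and the sample state are hypothetical, and the ftrace setup itself is not shown.

```python
import json


def prestart_hook_config(hook_path):
    """Return a config.json fragment that registers one prestart hook."""
    return {
        "hooks": {
            "prestart": [
                # args[0] is conventionally the program name itself.
                {"path": hook_path, "args": [hook_path, "prestart"]},
            ]
        }
    }


def parse_state(stdin_text):
    """Extract the container id and init-process pid from the OCI state JSON."""
    state = json.loads(stdin_text)
    return state["id"], state.get("pid")


# Hypothetical state document, shaped like what a runtime writes to the
# hook's stdin.
sample_state = json.dumps({
    "ociVersion": "1.0.2",
    "id": "demo",
    "status": "created",
    "pid": 4242,
    "bundle": "/run/demo",
})
container_id, pid = parse_state(sample_state)
print(container_id, pid)
```

A tracer hook would typically use the returned pid to restrict tracing to the container's processes before the application starts.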
other hand, we don’t want to give every application root privileges.
⚫In particular, embedded applications sometimes access devices directly.
–mount(2)
–mknod(2)
–Access GPIO
[Figure: Hardware, Host OS, Apps; labels: Linux Capabilities, user namespace]
⚫root privilege + Linux Capabilities + Prior seccomp
• Linux Capabilities cannot provide fine-grained access control.
–Ex. Both ping and ARP spoofing need CAP_NET_RAW
• seccomp only allows or denies syscalls; it does not provide privileges.
⚫User namespace
• And what is needed to provide correct access control?
handle a particular syscall in user space.
⚫Advantages over ptrace
• Performance
• Can be used on programs that already use seccomp
• Protection against PID recycling
⚫But process_vm_readv(2) is needed to fetch data from the tracee’s address space.
1. App1 initializes the seccomp context using seccomp_init().
2. App1 sets up the seccomp context using seccomp_rule_add().
3. App1 loads the seccomp context using seccomp_load().
4. App1 gets the notify fd using seccomp_notify_fd().
5. App1 sends the notify fd to the FGAC server via a UNIX domain socket.
6. The FGAC server receives the notify fd from App1 via the UNIX domain socket.
7. The FGAC server receives a notification from the notify fd using seccomp_notify_receive().
[Figure: App1, Kernel, and FGAC server (userspace handler) connected by the notification fd]
https://github.com/seccomp/libseccomp/pull/232#issuecomment-627731454
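Steps 5 and 6 rely on passing a file descriptor between processes over a UNIX domain socket. The sketch below shows only that mechanism, using `socket.send_fds` and `socket.recv_fds` (Python 3.9+, Unix only). It is not a seccomp example: a pipe read end stands in for the notify fd, and a connected socket pair stands in for the App1 to FGAC server connection, both of which are assumptions made to keep the example self-contained.

```python
import os
import socket


def pass_fd_demo():
    """Send a file descriptor over a UNIX domain socket and use the copy."""
    # Connected pair standing in for the App1 <-> FGAC server socket.
    app_sock, server_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # Pipe standing in for the kernel <-> notify fd relationship.
    read_end, write_end = os.pipe()
    try:
        # "App1" side: send the fd as SCM_RIGHTS ancillary data.
        socket.send_fds(app_sock, [b"notify-fd"], [read_end])

        # "FGAC server" side: receive the message and a new fd that refers
        # to the same open file description.
        msg, fds, _flags, _addr = socket.recv_fds(server_sock, 1024, 1)
        received_fd = fds[0]

        # Data written on the other end becomes readable via the received fd.
        os.write(write_end, b"event")
        data = os.read(received_fd, 5)
        os.close(received_fd)
        return msg, data
    finally:
        os.close(read_end)
        os.close(write_end)
        app_sock.close()
        server_sock.close()


print(pass_fd_demo())
```

In the flow above, the received descriptor would instead be handed to seccomp_notify_receive() in the server process.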
on CPU-intensive programs when using default-configured Docker.
• See http://mamememo.blogspot.com/2020/05/cpu-intensive-rubypython-code-runs.html
• We have CPU-intensive software in the robotics field.
⚫All speculation mitigations are automatically enabled when seccomp is enabled.
⚫But we can change this seccomp behavior with SECCOMP_FILTER_FLAG_SPEC_ALLOW.
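As a sketch of what exposing this flag to container users might look like, the code below adds `SECCOMP_FILTER_FLAG_SPEC_ALLOW` to a `flags` list in an OCI-style seccomp section and computes the corresponding bitmask for seccomp(2). The numeric values are the ones in `<linux/seccomp.h>`; placing the flag in a `flags` list of the runtime config is an assumption about the configuration surface, which was still being proposed when this talk was given.

```python
# Filter flag values as defined in <linux/seccomp.h>.
FLAG_BITS = {
    "SECCOMP_FILTER_FLAG_TSYNC": 1 << 0,
    "SECCOMP_FILTER_FLAG_LOG": 1 << 1,
    "SECCOMP_FILTER_FLAG_SPEC_ALLOW": 1 << 2,
}


def with_spec_allow(seccomp_section):
    """Return a copy of a seccomp section with SPEC_ALLOW in its flags list."""
    updated = dict(seccomp_section)
    flags = list(updated.get("flags", []))
    if "SECCOMP_FILTER_FLAG_SPEC_ALLOW" not in flags:
        flags.append("SECCOMP_FILTER_FLAG_SPEC_ALLOW")
    updated["flags"] = flags
    return updated


def flags_to_bitmask(flags):
    """Combine flag names into the integer passed as the seccomp(2) flags."""
    mask = 0
    for name in flags:
        mask |= FLAG_BITS[name]
    return mask


# Hypothetical starting profile without any flags.
section = with_spec_allow({"defaultAction": "SCMP_ACT_ALLOW"})
print(section["flags"], flags_to_bitmask(section["flags"]))
```

Setting this flag skips the speculation mitigations that loading a filter would otherwise enable, so it trades some hardening for performance and should be a deliberate per-workload choice.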
mitigations
[Figure: Docker, runc, Library, and Linux Kernel, with steps: 2. Initialize seccomp, 3. Disable speculation feature, 4. Set up each mitigation]
⚫This feature requires changing the behavior of Docker, runc, and the Linux Kernel.
⚫In addition, we must modify related libraries where needed.
[Figure: Apps, Docker, runc, Library (libseccomp-golang), Kernel; one component is marked "To be determined"]
• Implement the new option to control speculation mitigation
–runtime-spec: https://github.com/opencontainers/runtime-spec/pull/1047
–runc: https://github.com/opencontainers/runc/pull/2433
• Support SECCOMP_FILTER_FLAG_SPEC_ALLOW
–https://github.com/seccomp/libseccomp-golang/pull/51
• Fix PR_SPEC_FORCE_DISABLE
–https://lore.kernel.org/patchwork/patch/1251849
technologies are used.
⚫Diversity is important for OSS.
• We need knowledge of various software layers.
• Perspectives from different industries make OSS great.
• Let's boost the container community together.