Outline of the Talk
- Containers on embedded systems, problems of existing container runtimes
- Proposal: Rust-based, secure and lightweight container runtime
- Evaluation: How does our runtime compare to the existing runtimes?
- Summary: Future work, conclusion
- applications
- The mechanism can prevent attacks from untrusted applications
- Containers are increasingly used in embedded systems
- Their light weight makes them attractive for resource-constrained systems
[Figure: a container platform (hardware, operating system, container runtime) hosting containers with trusted applications and a container with an untrusted application; the DoS attack and the illegal access from the untrusted application are blocked (✖), while regular access is allowed (✔)]
What is a Container Runtime?
- Container Initiative) runtime specification compliance
- Sometimes referred to as "low-level" runtimes
- Sets up cgroups, namespaces, capabilities, seccomp, etc.
[Figure: the container runtime sits on top of Linux kernel features: namespaces, capabilities, cgroups, seccomp, ...]
[Figure: containerd and CRI-O; low-level runtimes: runc, runsc, Singularity; interfaces: CRI, OCI]
Note: From the next slide onwards, the term "container runtime" refers to the low-level runtime.
Requirements of Embedded Systems
- Resource-constrained systems
  - Small memory size
  - Low-capacity storage
  - Low-spec CPU
- Mission-critical systems
  - Real-time application
  - Critical functionality
  - Longer life cycle
[Figure: embedded systems require low resource utilization, security, high responsiveness, and high dependability]
Containers on Embedded Systems
- Include performance overhead and high resource usage
- Write operations by the daemon process shorten the lifespan of eMMC*
- We run the low-level container runtime alone on the systems
*eMMC: Embedded Multimedia Card
[Figure: on general-purpose systems, containers with trusted applications run on a daemon process and a container runtime; on embedded systems, lightweight containers run on the container runtime alone; both stacks sit on an operating system and hardware with eMMC]
Problems of the Existing Runtimes
- Security
  - Linux capabilities do not provide fine-grained access control
    - e.g., both ping and ARP spoofing need CAP_NET_RAW
  - Rootless containers based on user namespaces are too restrictive for these systems
    - A rootless container cannot emulate all system calls
    - However, some embedded apps need to access devices via mount(2), mknod(2), etc.
- Lightweight
  - Container startup time is not fast enough for real-time systems
  - Go-based runtimes are not suitable for resource-constrained systems
    - The application binary size is big
    - Garbage collection (GC) incurs high CPU utilization
Comparison Table from the Perspective of Embedded Systems
                 [header missing]  [header missing]  runc (v1.0.0-rc93)  singularity (v3.1.0)  crun (v0.18)  railcar (v1.0.4)
Language         Rust              Go                Go and C            Go and C              C             Rust
Binary size*     2.63 MB           23.6 MB           14.0 MB             18.0 MB               0.43 MB       1.68 MB
GC overhead      Not included      Included          Included            Included              Not included  Not included
*All binary files are stripped
Rows whose per-runtime marks are not recoverable: OCI compatibility, Memory safety, Fine-grained access control, Fast startup, Real-time support (WIP)
Why Rust?
- Performance is equivalent to C/C++
- Memory safety without GC
- Small application binary size
- Awesome crates for developing the container runtime
- FFI (Foreign Function Interface) to bind the Linux API
- Go is also a good language but has some limitations
  - Problems interacting with namespaces from the Go runtime
  - The application binary size is big compared to Rust
  - Overhead caused by GC
Crates for the Container Runtime
Creating Container
- capability: https://crates.io/crates/caps
- rlimit: https://crates.io/crates/rlimit
- cgroups: https://crates.io/crates/cgroups-rs
- seccomp: https://crates.io/crates/seccomp-sys
- passfd: https://crates.io/crates/passfd
  - This crate is used for the fine-grained access control
- core_affinity: https://crates.io/crates/core_affinity
  - This crate is used for the real-time support
- etc.
Developing Runtime
- clap: https://crates.io/crates/clap
- serde_json: https://crates.io/crates/serde_json
- anyhow: https://crates.io/crates/anyhow
- etc.
Architecture Overview
[Figure: on the hardware (resource-constrained system) and the operating system, the SL runtime creates and launches a secure container (rootless containers by user namespace, with namespaces, seccomp, capabilities, AppArmor, and CPU affinity). Fast startup starts a created container with an arbitrary execution process. When a running container executes a system call, e.g., mount, the Fine-Grained Access Control (FGAC) server captures the system call and performs it on behalf of the container.]
Fast Startup
- a container speedily by leveraging a pre-created container
- Omit the time for initializing the runtime and creating the container
- Replace only the execution process inside the container at startup
- Reuse the other configuration except for the execution process
[Figure: elapsed time. A normal run consists of Init Runtime, Create Container, and Start Container; fast startup consists of Start Container only, and the difference is the reduced time.]
[Figure: Fast Startup: Replace the process inside the container. On the Linux kernel, the SL runtime replaces the dummy process in the created container with the real-time app, and the container becomes running.]
Real-Time (RT) Support
- at fast startup
- Ensuring RT performance for embedded systems
- Allow the runtime to set CPU affinity depending on the load at startup
[Figure: RT Support: Set CPU affinity with fast startup. The SL runtime replaces the dummy process in the created container with the real-time app and assigns the running container to one of CPU 1 to CPU 4.]
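The load-dependent choice of a CPU can be sketched as follows. This is a minimal illustration, not the SL runtime's code: the function name `pick_least_loaded_cpu` and the per-CPU load samples in percent are assumptions, and actually pinning the process (for example with the core_affinity crate listed earlier) is left out.

```rust
/// Returns the index of the CPU with the lowest load, or `None` if no
/// samples are given. `load_percent[i]` is the assumed utilization of
/// CPU i in percent (hypothetical input; the slides do not say how the
/// load is measured).
pub fn pick_least_loaded_cpu(load_percent: &[u8]) -> Option<usize> {
    load_percent
        .iter()
        .enumerate()
        // On ties, `min_by_key` keeps the first (lowest-numbered) CPU.
        .min_by_key(|&(_, load)| *load)
        .map(|(cpu, _)| cpu)
}
```

With samples `[80, 15, 60, 15]` the function picks index 1, the first of the two least-loaded CPUs; the runtime would then set the affinity of the replaced process to that CPU.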
Architecture Overview
[Figure: the architecture overview diagram again: rootless containers by user namespace (namespaces, seccomp, AppArmor, CPU affinity), the SL runtime with fast startup, and the Fine-Grained Access Control (FGAC) server that captures a system call such as mount and performs it on behalf of the container]
Fine-Grained Access Control (FGAC)
- containers to execute system calls safely
- The FGAC server emulates the system call in userspace on behalf of the container
- The rootless container can access devices safely via mount(2), mknod(2), etc.
- The FGAC mechanism is achieved using the new seccomp notify feature
[Figure: the FGAC server holds the policy "A: Allow mount tmpfs, B: Deny mount tmpfs". Containers A and B, both started by the SL runtime, issue mount tmpfs; the server performs the mount on behalf of container A (✔) and rejects it for container B (✖).]
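The per-container policy from the figure can be sketched as a small lookup table. This is an illustrative sketch only: the type names, the `(container, filesystem type)` key, and the deny-by-default behavior are assumptions, not the FGAC server's actual policy format.

```rust
use std::collections::HashMap;

/// Decision the FGAC server takes for a captured system call.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny,
}

/// Hypothetical policy table: (container id, filesystem type) -> decision.
pub struct MountPolicy {
    rules: HashMap<(String, String), Decision>,
}

impl MountPolicy {
    pub fn new() -> Self {
        MountPolicy { rules: HashMap::new() }
    }

    pub fn add_rule(&mut self, container: &str, fstype: &str, decision: Decision) {
        self.rules
            .insert((container.to_string(), fstype.to_string()), decision);
    }

    /// Looks up the rule for a mount request. Requests without a
    /// matching rule are denied (an assumed default).
    pub fn check_mount(&self, container: &str, fstype: &str) -> Decision {
        self.rules
            .get(&(container.to_string(), fstype.to_string()))
            .copied()
            .unwrap_or(Decision::Deny)
    }
}

/// Builds the example policy from the slide:
/// container A may mount tmpfs, container B may not.
pub fn slide_policy() -> MountPolicy {
    let mut policy = MountPolicy::new();
    policy.add_rule("A", "tmpfs", Decision::Allow);
    policy.add_rule("B", "tmpfs", Decision::Deny);
    policy
}
```

Calling `slide_policy().check_mount("A", "tmpfs")` yields `Decision::Allow`, in which case the server would perform the mount on behalf of container A; the same request from container B yields `Decision::Deny`.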
Seccomp Notify Feature
- in userspace
- Introduced in Linux 5.0
Flow between the container process, seccomp (cBPF program) in the kernel, and the seccomp agent in userspace:
1. The process in the container issues a system call, e.g., mount()
2. Seccomp executes the filter (cBPF program)
3. The filter returns "notify"
4. The agent learns that the container wants to run the system call: ioctl(fd, SECCOMP_IOCTL_NOTIF_RECV, req)
5. The agent reads the system call arguments from /proc/$pid/mem
6. The agent validates the system call. If OK, go to 7a. If NG, go to 7b
7a. Perform the system call on behalf of the process
7b. Reject the system call
8a. Set the return value to 0 (success)
8b. Set the return value to an error code (failure)
    Both cases reply with ioctl(fd, SECCOMP_IOCTL_NOTIF_SEND, req)
9a. The kernel returns 0 (success) to the process
9b. The kernel returns the error code (failure) to the process
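Step 5 can be sketched as reading a NUL-terminated string at the address found in a system call argument. This is a minimal sketch under assumptions: the helper name `read_c_string` and the `max_len` bound are illustrative, and the function is written against any `Read + Seek` source so that it can be tried without a traced process. An agent would open /proc/$pid/mem as a file and pass it in.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Reads a NUL-terminated string starting at `addr` from `mem`,
/// reading at most `max_len` bytes. `mem` stands in for an opened
/// /proc/$pid/mem file (hypothetical helper, not the SL runtime's API).
pub fn read_c_string<R: Read + Seek>(
    mem: &mut R,
    addr: u64,
    max_len: usize,
) -> io::Result<String> {
    // Jump to the pointer value taken from the system call argument.
    mem.seek(SeekFrom::Start(addr))?;

    let mut buf = Vec::new();
    let mut byte = [0u8; 1];
    while buf.len() < max_len {
        // Stop at the terminating NUL or at the end of the input.
        // Byte-wise reads keep the sketch short; a real agent would
        // read in larger chunks.
        if mem.read(&mut byte)? == 0 || byte[0] == 0 {
            break;
        }
        buf.push(byte[0]);
    }

    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

For a mount() call, the agent would use this once per pointer argument (source, target, filesystem type) before validating the request in step 6. Because the target process can change while its memory is read, the kernel provides SECCOMP_IOCTL_NOTIF_ID_VALID to check that the notification is still valid after reading.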
Design of FGAC
- The server is launched as root, and only by a system administrator
- Run the container using a config.json that describes seccomp notify
- The OCI runtime specification already supports seccomp notify [1]
Steps:
1. The admin launches the FGAC server with a security policy
2. The config.json is given to the SL runtime
3. The SL runtime initializes a container
4. The container creates a seccomp notify fd
5. The notify fd is passed to the FGAC server via SCM_RIGHTS (notify.sock)
Excerpt of config.json:
    "seccomp": {
      "defaultAction": "SCMP_ACT_ALLOW",
      "listenerPath": "/var/run/notify.sock",
      "architectures": [ "SCMP_ARCH_X86" ],
      "syscalls": [
        {
          "names": [ "mount" ],
          "action": "SCMP_ACT_NOTIFY"
        }
      ]
    }
[1] https://github.com/opencontainers/runtime-spec/pull/1074
Evaluation
- and Fast startup
- Memory consumption of the container runtimes
- Environment
  - Host: AMD Ryzen 9 3900X 12-Core (Ubuntu 20.04)
  - Evaluated runtimes: SL runtime, runsc, singularity, runc, crun, and railcar
- Experimental Setup
  - All the runtimes use the same config.json
    - The cgroups configuration is removed because SL runtime does not support it yet
  - Run the container runtimes alone without any client tools
  - Execute /usr/bin/true inside containers
Results: Start Time
- Normal run achieves a 7.4x speed-up compared to runc
- Fast startup achieves a 1.5x speed-up compared to the normal run
[Figure: start time of each runtime; series: Normal Run Time, Fast Startup Time]
[Versions of the evaluated runtimes] runsc: v20201208.0, singularity: v3.1.0, runc: v1.0.0-rc93, crun: v0.18, railcar: v1.0.4
Results: Memory Usage
- in C
- Rust is a great fit for resource-constrained systems
[Figure: memory usage of each runtime]
Future Work
- is a research prototype
  - Support some features such as cgroups
- Enable Kubernetes to use SL runtime
  - RuntimeClass == SL Runtime
- We plan to integrate SL runtime into Kata Containers
  - Kata Containers has already developed the container runtime in Rust
Conclusion
- Small memory footprint and binary size for resource-constrained systems
- Memory safety without any overhead for mission-critical systems
- Rust-based container runtime optimized for embedded systems
  - Fast startup that launches a container speedily from a pre-created container
  - Fine-grained access control for the rootless container
- The results show that our runtime is suitable for embedded systems
  - Runs the container 7.4x faster than runc
  - The runtime memory usage is equivalent to crun, which is written in C