New Cache Hierarchy for Container Images and OCI Artifacts in Kubernetes Clusters using Containerd / KubeCon + CloudNativeCon Japan

Hidehito Yabuuchi & Toru Komatsu, Preferred Networks, Inc.

Who we are
• KOMATSU Toru (@utam0k), Preferred Networks, Inc.
• YABUUCHI Hidehito (@ordovicia), Preferred Networks, Inc.

Preferred Networks’ Infrastructure
Preferred Networks, Inc.
• Provides ML models like LLMs, and solutions for industries
• Operates own on-premise infrastructure to provide solutions
Infrastructure
• 3+ Kubernetes Clusters
• 400+ Kubernetes Nodes
• 30000+ CPU Cores
• 320+ TiB Memory
• 2000+ GPUs
• Our AI Accelerator: MN-Core™ series
  ◦ HW: RTL, Board/Server Design
  ◦ SW: Driver, Device Plugin, Compiler

Our Challenges
Accelerate Container Startup
• > Our analysis shows that pulling packages accounts for 76% of container start time, but only 6.4% of that data is read. [^FAST16]
• The default container image we provide to our researchers is 20+ GB
Reduce Cloud Egress Traffic
• FinOps: Reducing cloud cost
• We want to save on cloud egress bandwidth even on-premises
[^FAST16]: Tyler Harter, Brandon Salmon, Rose Liu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, University of Wisconsin. “Slacker: Fast Distribution with Lazy Docker Containers”. 14th USENIX Conference on File and Storage Technologies

How to pull container images in Kubernetes
[Diagram: pulling container images through CRI, which in turn pulls container images from the registries]

Plain Kubernetes
[Diagram: Origin Registry → Node Local Cache, with cloud egress traffic in between]
• 💸 Cost
• ⛔ Bandwidth Limit

New Cache Hierarchy
[Diagram: Origin → Cluster Cache → Node Local Cache]
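
The three levels in the diagram can be illustrated with a minimal lookup sketch. This is only an illustration of the hierarchy, not CIRC's code: the dictionaries stand in for the node-local and cluster caches, and `pull_blob` and `fetch_origin` are hypothetical names.

```python
from typing import Callable, Dict, List


def pull_blob(
    digest: str,
    node_cache: Dict[str, bytes],
    cluster_cache: Dict[str, bytes],
    fetch_origin: Callable[[str], bytes],
) -> bytes:
    """Look up a blob node-local first, then cluster-wide, then at the origin."""
    if digest in node_cache:
        return node_cache[digest]
    if digest not in cluster_cache:
        # Only a cluster-cache miss causes traffic to the origin registry.
        cluster_cache[digest] = fetch_origin(digest)
    node_cache[digest] = cluster_cache[digest]
    return node_cache[digest]


# Two nodes share one cluster cache (all data here is made up).
origin_fetches: List[str] = []


def fake_origin(digest: str) -> bytes:
    origin_fetches.append(digest)
    return b"layer-bytes"


cluster: Dict[str, bytes] = {}
node_a: Dict[str, bytes] = {}
node_b: Dict[str, bytes] = {}
pull_blob("sha256:abc", node_a, cluster, fake_origin)
pull_blob("sha256:abc", node_b, cluster, fake_origin)
print(len(origin_fetches))  # 1: the second node is served by the cluster cache
```

In plain Kubernetes, without the middle level, both nodes would have fetched from the origin.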

Real-world outcome
• Pod startup time: Pod startup accelerated by ~20%
• Data transfer: Saved ~23TB/week Internet data transfer

Our solution: CIRC

CIRC introduces new cache hierarchy level
[Diagram labels: Registries, Cache storage, OCI Distribution Specification, KubeCon NA 2024]

CIRC features
• Transparency: Users don’t need to care
• Multi-tenancy: Secure cache sharing
• Preheating: Push images for fast first pulls
• OCI artifacts: Broader use cases

Transparency: Users don’t need to care
• What we want
  ◦ No manifest changes required
  ◦ Works with arbitrary registries
• How we achieve it
  ◦ By utilizing containerd’s Registry Configuration feature

    # /etc/containerd/certs.d/_default/hosts.toml
    [host."http://circ.internal"]
      capabilities = ["pull", "resolve"]

• Client (containerd): “I pull any image through CIRC!”

Transparency: Users don’t need to care
• image: quay.io/NAME:TAG
• Client (containerd): “I pull any image through CIRC!”
  ◦ GET http://circ.internal/v2/NAME/blobs/DIGEST
• CIRC: “If image isn’t cached yet, I fetch it from... where? ???”

Transparency: Users don’t need to care
• image: quay.io/NAME:TAG
• Client (containerd): “I pull any image through CIRC!”
  ◦ GET http://circ.internal/v2/NAME/blobs/DIGEST?ns=quay.io
• CIRC: “If image isn’t cached yet, I fetch it from quay.io!”
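
How a cache could turn the `ns` query parameter into an upstream URL can be sketched as follows. The slides do not show CIRC's code, so this is only a sketch: `upstream_blob_url` is a hypothetical helper, and using `https` for the origin registry is an assumption.

```python
from urllib.parse import parse_qs, urlsplit


def upstream_blob_url(request_url: str) -> str:
    """Map a mirror request carrying ?ns=<registry> to the origin registry URL."""
    parts = urlsplit(request_url)
    ns_values = parse_qs(parts.query).get("ns")
    if not ns_values:
        # Without ns the cache cannot tell which registry the image came from.
        raise ValueError("request has no ns query parameter")
    # Assumption: the origin registry is reachable over HTTPS.
    return f"https://{ns_values[0]}{parts.path}"


print(upstream_blob_url("http://circ.internal/v2/NAME/blobs/DIGEST?ns=quay.io"))
# https://quay.io/v2/NAME/blobs/DIGEST
```

Because the registry host travels with every request, a single `_default` mirror entry is enough to cover arbitrary registries.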

Multi-tenancy: Secure cache sharing
• Share cache cluster-wide → Maximize cache hit rate
• Can Team A pods use Team B images?

Multi-tenancy: Secure cache sharing
• Share cache across cluster → Maximize cache hit rate
1. Enforce authz by querying origin registry
     HEAD /v2/NAME/blobs/DIGEST
     Authorization: Bearer ...
2. If OK, return image to client
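
The two steps above can be sketched as follows. This is a simplified illustration, not CIRC's implementation: `serve_blob` is a hypothetical name, the cache is an in-memory dictionary, and the HEAD request to the origin registry is injected as a callable (`head_origin`) so that the sketch runs without a network.

```python
from typing import Callable, Dict, Tuple


def serve_blob(
    cache: Dict[str, bytes],
    name: str,
    digest: str,
    authorization: str,
    head_origin: Callable[[str, str], int],
) -> Tuple[int, bytes]:
    """Return (status, body); cached bytes are released only if the origin agrees."""
    path = f"/v2/{name}/blobs/{digest}"
    # Step 1: forward the client's credentials to the origin registry via HEAD.
    status = head_origin(path, authorization)
    if status != 200:
        return status, b""
    # Step 2: the origin authorized this client, so the cached copy may be served.
    if digest in cache:
        return 200, cache[digest]
    return 404, b""  # a real cache would fetch from the origin here


def fake_head(path: str, authorization: str) -> int:
    # Hypothetical origin: only Team B's token may read this repository.
    return 200 if authorization == "Bearer team-b-token" else 401


cache = {"sha256:abc": b"layer-bytes"}
print(serve_blob(cache, "team-b/app", "sha256:abc", "Bearer team-a-token", fake_head))
print(serve_blob(cache, "team-b/app", "sha256:abc", "Bearer team-b-token", fake_head))
```

The cache stays shared cluster-wide, while the access decision remains with the origin registry.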

Preheating: Push images for fast first pulls
• Pull once
• Can we eliminate pulls altogether?

Preheating: Push images for fast first pulls
• Supports push operations defined in OCI Distribution Spec
  ◦ POST /v2/<name>/blobs/uploads
• Pushed images are also sent to the origin registry for persistence
Image credits: box: Freepik, laptop: Freepik
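
The write-through behaviour described above can be sketched as follows. This is an in-memory sketch only: `push_blob` and the two dictionaries are hypothetical stand-ins for the cache and the origin registry, and the OCI upload session itself (the POST followed by the upload requests) is not modelled.

```python
import hashlib
from typing import Dict


def push_blob(
    cache: Dict[str, bytes],
    origin: Dict[str, Dict[str, bytes]],
    name: str,
    data: bytes,
) -> str:
    """Store a pushed blob in the cache and forward it to the origin registry."""
    # OCI blobs are content-addressed; sha256 is the commonly used algorithm.
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    # Forward to the origin so the blob survives cache eviction.
    origin.setdefault(name, {})[digest] = data
    # Keep a copy in the cache so the first pull is already a hit.
    cache[digest] = data
    return digest


cache: Dict[str, bytes] = {}
origin: Dict[str, Dict[str, bytes]] = {}
digest = push_blob(cache, origin, "team-a/app", b"layer-bytes")
print(digest in cache, digest in origin["team-a/app"])  # True True
```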

Preheating: Push images for fast first pulls
• Already cached → Fast first pulls
[Diagram: a pull marked ✗]

OCI artifacts: Broader use cases
• KEP-4639: OCI VolumeSource
  ◦ Allow pods to mount OCI artifacts
• OCI artifacts
  ◦ Any content stored in OCI image format
  ◦ e.g. Git repositories, AI models (Huge!)
[Diagram labels: 30 GB, OCI Distribution Specification]
Image credits: artificial intelligence: shin_icons

Deploying CIRC

Bootstrap Problem
[Diagram label: Registries]
• 😪 “I’m not awake yet”
• Fallback
• Too slow fallback (>20 min). Why…?

Bootstrap Problem
• The timeout until fallback was fixed at 30 seconds
  ◦ This 30-second timeout applies per blob
• Image pulls within a node are serial by default
  ◦ When a node starts up, many Pods (from DaemonSets, etc.) attempt to start simultaneously
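
A back-of-the-envelope calculation shows how these two points combine into a fallback of more than 20 minutes: 20 min is 1200 s, and 1200 s / 30 s = 40, so more than 40 per-blob timeouts in series is enough. The sketch below assumes that every blob request waits for the full timeout before falling back and that all pulls are strictly sequential; the image and blob counts are made up for illustration.

```python
from typing import List

FALLBACK_TIMEOUT_S = 30  # per-blob timeout stated on the slide


def serial_fallback_delay_s(blobs_per_image: List[int]) -> int:
    """Total time spent waiting for timeouts when images are pulled one by one."""
    return FALLBACK_TIMEOUT_S * sum(blobs_per_image)


# Hypothetical node start-up: 6 images with 8 blobs each.
delay_s = serial_fallback_delay_s([8] * 6)
print(delay_s / 60)  # 24.0 (minutes)
```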

Thundering Herd Problem
[Diagram labels: In-Cluster, Registries]
• Pull Image-A
• Pull Image-A
• 🤦 Same Request × 2

Lease Object
> Distributed systems often have a need for leases, which provide a mechanism to lock shared resources and coordinate activity between members of a set. In Kubernetes, the lease concept is represented by Lease objects in the coordination.k8s.io API Group, which are used for system-critical capabilities such as node heartbeats and component-level leader election.
https://kubernetes.io/docs/concepts/architecture/leases/
Kubernetes has already provided the solution to this problem

Thundering Herd Problem
[Diagram labels: In-Cluster, Registries, Lease for Image-A]
• Pull Image-A → Attempt to acquire a lock
• Pull Image-A → Attempt to acquire a lock
• 🏆 One request holds the Lease for Image-A; the other is awaiting cache load
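
The coordination pattern in the diagram can be sketched in a single process. This is not the Kubernetes API and not CIRC's code: a `threading.Lock` per image stands in for a Lease object, and `DedupCache` and `fetch_origin` are hypothetical names.

```python
import threading
import time
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List


class DedupCache:
    """Lets one request per image load the cache while the others wait."""

    def __init__(self, fetch_origin: Callable[[str], bytes]) -> None:
        self._fetch_origin = fetch_origin
        self._cache: Dict[str, bytes] = {}
        # One lock per image; stand-in for "Lease for Image-A".
        self._leases: DefaultDict[str, threading.Lock] = defaultdict(threading.Lock)
        self._guard = threading.Lock()

    def pull(self, image: str) -> bytes:
        with self._guard:
            lease = self._leases[image]
        # The winner fetches from the origin; the others block here, i.e.
        # they are "awaiting cache load", and then read the cached copy.
        with lease:
            if image not in self._cache:
                self._cache[image] = self._fetch_origin(image)
            return self._cache[image]


origin_requests: List[str] = []


def slow_origin(image: str) -> bytes:
    origin_requests.append(image)
    time.sleep(0.05)  # simulate a slow registry
    return b"image-bytes"


dedup = DedupCache(slow_origin)
threads = [threading.Thread(target=dedup.pull, args=("Image-A",)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(origin_requests))  # 1: only one request reached the origin
```

Across several cache replicas an in-process lock is not sufficient, which is where a cluster-wide object such as a Lease comes in.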

Conclusion

Summary
CIRC features
• Transparency: Users don’t need to care
• Multi-tenancy: Secure cache sharing
• Preheating: Push images for fast first pulls
• OCI Artifacts: Broader use cases
Real-world outcome
• Pod startup accelerated by ~20%
• Saved ~23TB/week Internet data transfer
Built upon containerd and OCI standards
[Diagram labels: Origin, Node Local Cache]
