New Cache Hierarchy for Container Images and OCI Artifacts in Kubernetes Clusters using Containerd / KubeCon + CloudNativeCon Japan

One of the key bottlenecks in Kubernetes pod startup is the time taken to pull container images and OCI artifacts. Repeatedly fetching large container images from the registry is also costly. To tackle this problem, we developed a cache system with the following features:

* New Cache Hierarchy: Images pulled by pods are shared across the entire cluster, enabling cluster-wide optimization, not only cluster-local cache.
* Ninja: Users experience faster container image pulls without any changes on their part. Just like a ninja, the system stealthily enhances performance.
* Preheating: It supports pushing images to preheat the cache for subsequent pulls.

Deployed in a production cluster, the cache system has achieved a cache hit rate of around 95%, significantly reducing pod startup times and network communication with registries. Attendees will learn practical insights into leveraging cache and CRI to optimize image and OCI artifact pulls, ultimately enhancing cluster efficiency.
https://sched.co/1x708

Preferred Networks

June 13, 2025

Transcript

  1. New Cache Hierarchy for Container Images and OCI Artifacts in Kubernetes Clusters using Containerd

    Hidehito Yabuuchi & Toru Komatsu, Preferred Networks, Inc.
  2. Who we are

    • KOMATSU Toru (@utam0k), Preferred Networks, Inc.
    • YABUUCHI Hidehito (@ordovicia), Preferred Networks, Inc.
  3. Preferred Networks’ Infrastructure

    Preferred Networks, Inc.
    • Provides ML models like LLMs, and solutions for industries
    • Operates own on-premise infrastructure to provide solutions

    Infrastructure
    • 3+ Kubernetes Clusters
    • 400+ Kubernetes Nodes
    • 30000+ CPU Cores
    • 320+ TiB Memory
    • 2000+ GPUs
    • Our AI Accelerator: MN-Core™ series
      ◦ HW: RTL, Board/Server Design
      ◦ SW: Driver, Device Plugin, Compiler
  4. Our Challenges

    Accelerate Container Startup
    • > Our analysis shows that pulling packages accounts for 76% of container start time, but only 6.4% of that data is read. [FAST16]
    • The default container image we provide to our researchers is 20+ GB

    Reduce Cloud Egress Traffic
    • FinOps: Reducing cloud cost
    • We want to save on cloud egress bandwidth even on-premises

    [FAST16]: Tyler Harter, Brandon Salmon, Rose Liu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, University of Wisconsin. “Slacker: Fast Distribution with Lazy Docker Containers”. 14th USENIX Conference on File and Storage Technologies.
  5. Real-world outcome

    • Pod startup time: Pod startup accelerated by ~20%
    • Data transfer: Saved ~23TB/week Internet data transfer
  6. CIRC features

    • Transparency: Users don’t need to care
    • Multi-tenancy: Secure cache sharing
    • Preheating: Push images for fast first pulls
    • OCI artifacts: Broader use cases
  7. Transparency: Users don’t need to care

    • What we want
      ◦ No manifest changes required
      ◦ Works with arbitrary registries
    • How we achieve it
      ◦ By utilizing containerd’s Registry Configuration feature

    “I pull any image through CIRC!”

        # /etc/containerd/certs.d/_default/hosts.toml
        [host."http://circ.internal"]
          capabilities = ["pull", "resolve"]
  8. Transparency: Users don’t need to care

    “I pull any image through CIRC!”
    • image: quay.io/NAME:TAG
    • GET http://circ.internal/v2/NAME/blobs/DIGEST

    “If image isn’t cached yet I fetch it from... where? ???”
  9. Transparency: Users don’t need to care

    “I pull any image through CIRC!”
    • image: quay.io/NAME:TAG
    • GET http://circ.internal/v2/NAME/blobs/DIGEST?ns=quay.io

    “If image isn’t cached yet I fetch it from quay.io!”
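The request shown on this slide carries the original registry host in the `ns` query parameter, which is what lets a mirror find the origin on a cache miss. A minimal Python sketch of that mapping follows. The function name `upstream_url` and the choice of `https` for the origin are assumptions made for illustration; they are not taken from CIRC.

```python
from urllib.parse import parse_qs, urlsplit


def upstream_url(request_url: str, scheme: str = "https") -> str:
    """Map a mirror request carrying ?ns=<registry> to a URL on the origin registry.

    Hypothetical helper: real registries may need host aliasing
    (for example a different API host than the image reference host),
    which is not handled here.
    """
    parts = urlsplit(request_url)
    ns_values = parse_qs(parts.query).get("ns")
    if not ns_values:
        raise ValueError("request has no ns parameter; origin registry is unknown")
    origin_host = ns_values[0]
    # Keep the path as-is and drop the ns parameter, which only the mirror needs.
    return f"{scheme}://{origin_host}{parts.path}"


print(upstream_url("http://circ.internal/v2/NAME/blobs/DIGEST?ns=quay.io"))
# prints https://quay.io/v2/NAME/blobs/DIGEST
```

Without the `ns` parameter the sketch raises an error, which mirrors the "where?" question on the previous slide.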
  10. Multi-tenancy: Secure cache sharing

    Share cache cluster-wide → Maximize cache hit rate

    Team A, Team B: Can Team A pods use Team B images?
  11. Multi-tenancy: Secure cache sharing

    Share cache across cluster → Maximize cache hit rate (Team A, Team B)

    1. Enforce authz by querying origin registry
         HEAD /v2/NAME/blobs/DIGEST
         Authorization: Bearer ...
    2. If OK, return image to client
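The two steps on this slide (ask the origin registry with a HEAD request, then serve from cache only if it says OK) can be sketched in Python as below. This is an illustration under assumptions, not CIRC's implementation: the `send` transport, the helper names, and treating only status 200 as "OK" are all hypothetical, and the fake origin exists only to make the example runnable offline.

```python
from typing import Callable, Dict, Optional

# Hypothetical transport: (method, url, headers) -> HTTP status code.
Send = Callable[[str, str, Dict[str, str]], int]


def is_authorized(origin: str, name: str, digest: str,
                  authorization: Optional[str], send: Send) -> bool:
    """Ask the origin registry whether these credentials may read the blob."""
    url = f"https://{origin}/v2/{name}/blobs/{digest}"
    headers = {"Authorization": authorization} if authorization else {}
    # Assumption: only a 200 response counts as "OK".
    return send("HEAD", url, headers) == 200


def serve_cached_blob(cache: Dict[str, bytes], origin: str, name: str,
                      digest: str, authorization: Optional[str],
                      send: Send) -> bytes:
    """Return a cached blob only after the origin registry accepts the client."""
    if not is_authorized(origin, name, digest, authorization, send):
        raise PermissionError(f"origin registry denied access to {digest}")
    return cache[digest]


# Fake origin for the demo: only Team B's token may read the blob.
def fake_send(method: str, url: str, headers: Dict[str, str]) -> int:
    return 200 if headers.get("Authorization") == "Bearer team-b-token" else 401


cache = {"sha256:abc": b"layer-bytes"}
print(serve_cached_blob(cache, "quay.io", "team-b/app", "sha256:abc",
                        "Bearer team-b-token", fake_send))
# prints b'layer-bytes'
```

Calling `serve_cached_blob` with Team A's token raises `PermissionError` even though the blob is in the shared cache, which is the property the slide is after.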
  12. Preheating: Push images for fast first pulls

    • Supports push operations defined in OCI Distribution Spec
        POST /v2/<name>/blobs/uploads
    • Pushed images are also sent to the origin registry for persistence

    Images: box: Freepik, laptop: Freepik
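One way to picture preheating is a write path that stores a pushed blob in the cache and also forwards it to the origin registry. The Python sketch below models only that fan-out, with in-memory dictionaries standing in for both stores. The function `complete_upload`, the digest check, and the order of the two writes are assumptions for illustration; the slide does not describe how CIRC orders or verifies them.

```python
import hashlib
from typing import Callable, Dict


def complete_upload(data: bytes, digest: str, cache: Dict[str, bytes],
                    forward_to_origin: Callable[[str, bytes], None]) -> None:
    """Finish a pushed blob: verify it, persist it at the origin, then preheat the cache."""
    algorithm, _, expected = digest.partition(":")
    if algorithm != "sha256" or hashlib.sha256(data).hexdigest() != expected:
        raise ValueError(f"content does not match {digest}")
    # Assumption: persist first, so the cache never holds a blob the origin lacks.
    forward_to_origin(digest, data)
    cache[digest] = data


origin_store: Dict[str, bytes] = {}  # stand-in for the origin registry
cache: Dict[str, bytes] = {}         # stand-in for the cluster cache
blob = b"example layer"
digest = "sha256:" + hashlib.sha256(blob).hexdigest()

complete_upload(blob, digest, cache, origin_store.__setitem__)
print(digest in cache, digest in origin_store)
# prints True True
```

After the push, the first pull of this blob can be answered from the cache without contacting the origin.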
  13. OCI artifacts: Broader use cases

    • KEP-4639: OCI VolumeSource
      ◦ Allow pods to mount OCI artifacts
    • OCI artifacts
      ◦ Any content stored in OCI image format
      ◦ e.g. Git repositories, AI models (Huge!)

    Figure labels: 30 GB; OCI Distribution Specification
    Images: artificial intelligence: shin_icons
  14. CIRC features

    • Transparency: Users don’t need to care
    • Multi-tenancy: Secure cache sharing
    • Preheating: Push images for fast first pulls
    • OCI artifacts: Broader use cases
  15. Real-world outcome

    • Pod startup time: Pod startup accelerated by ~20%
    • Data transfer: Saved ~23TB/week Internet data transfer
  16. Bootstrap Problem

    • The timeout until fallback was fixed at 30 seconds
      ◦ The 30-second timeout applies per blob
    • Image pulls within a node are serial by default
      ◦ When a node starts up, many Pods (from DaemonSets, etc.) attempt to start simultaneously
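A back-of-envelope model shows why a per-blob timeout combined with serial image pulls hurts at node startup. The Python sketch below is only an estimate under stated assumptions: every blob request waits the full 30 seconds before falling back, images are pulled one after another, and `concurrent_blobs` (a hypothetical parameter) is the number of blobs of one image that time out in parallel. The blob counts are made-up inputs, not measurements from the talk.

```python
import math
from typing import Sequence


def worst_case_delay_seconds(blobs_per_image: Sequence[int],
                             timeout_s: float = 30.0,
                             concurrent_blobs: int = 1) -> float:
    """Estimate the time spent waiting on timeouts before any fallback pull succeeds."""
    total = 0.0
    for blobs in blobs_per_image:  # images are pulled serially
        rounds = math.ceil(blobs / concurrent_blobs)
        total += rounds * timeout_s
    return total


# Three hypothetical images with 5, 8 and 3 blobs each.
print(worst_case_delay_seconds([5, 8, 3]))                      # prints 480.0
print(worst_case_delay_seconds([5, 8, 3], concurrent_blobs=3))  # prints 180.0
```

Even with some parallelism inside an image, the waits add up across images because the pulls themselves are serialized.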
  17. Lease Object

    > Distributed systems often have a need for leases, which provide a mechanism to lock shared resources and coordinate activity between members of a set. In Kubernetes, the lease concept is represented by Lease objects in the coordination.k8s.io API Group, which are used for system-critical capabilities such as node heartbeats and component-level leader election.
    https://kubernetes.io/docs/concepts/architecture/leases/

    Kubernetes has already provided the solution to this problem
  18. Thundering Herd Problem

    [Diagram labels: Pull Image-A, Pull Image-A, …; In-Cluster; Lease for Image-A; Attempt to acquire a lock; Registries]
  19. Thundering Herd Problem

    [Diagram labels: Pull Image-A, Pull Image-A, …; In-Cluster; 🏆 Lease for Image-A; Attempt to acquire a lock; Awaiting cache load; Registries]
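The idea on these two slides is that concurrent pulls of the same image compete for one lease; the winner loads the cache from the registry and everyone else waits for that load. The Python sketch below simulates this in a single process. It is an assumption-laden illustration: `InMemoryLeases` is a stand-in for Kubernetes Lease objects (no API calls, no lease duration or renewal), and the class and method names are hypothetical rather than CIRC's.

```python
import threading
from typing import Callable, Dict


class InMemoryLeases:
    """Stand-in for Lease objects: the first caller to claim a name wins."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._holders: Dict[str, str] = {}

    def try_acquire(self, name: str, holder: str) -> bool:
        with self._lock:
            if name in self._holders:
                return False
            self._holders[name] = holder
            return True


class Cache:
    def __init__(self, leases: InMemoryLeases,
                 fetch_from_registry: Callable[[str], bytes]) -> None:
        self._leases = leases
        self._fetch = fetch_from_registry
        self._blobs: Dict[str, bytes] = {}
        self._ready: Dict[str, threading.Event] = {}
        self._lock = threading.Lock()

    def _event(self, image: str) -> threading.Event:
        with self._lock:
            return self._ready.setdefault(image, threading.Event())

    def pull(self, image: str, requester: str) -> bytes:
        loaded = self._event(image)
        if self._leases.try_acquire(f"lease-for-{image}", requester):
            # Lease winner: the only one that contacts the registry.
            self._blobs[image] = self._fetch(image)
            loaded.set()
        elif not loaded.wait(timeout=10):
            # Everyone else awaits the cache load; a real lease would expire instead.
            raise TimeoutError(f"cache load for {image} did not finish")
        return self._blobs[image]


registry_calls = []


def fetch(image: str) -> bytes:
    registry_calls.append(image)
    return f"contents of {image}".encode()


cache = Cache(InMemoryLeases(), fetch)
results = []
threads = [
    threading.Thread(target=lambda n=n: results.append(cache.pull("image-a", f"node-{n}")))
    for n in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(registry_calls), len(results))
# prints 1 8
```

Eight simultaneous pulls result in one registry fetch. The sketch leaves out what happens when the lease holder fails mid-fetch, which is where lease expiry would matter in a real cluster.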
  20. Summary

    CIRC features
    • Transparency: Users don’t need to care
    • Multi-tenancy: Secure cache sharing
    • Preheating: Push images for fast first pulls
    • OCI Artifacts: Broader use cases

    Real-world outcome
    • Pod startup accelerated by ~20%
    • Saved ~23TB/week Internet data transfer

    Built upon containerd and OCI standards

    [Diagram labels: Origin Node Local Cache]