

Strengthening Kubernetes Security with Fine-grained SupplementalGroups Control | Japan Community Days in #KubeCon + CloudNativeCon Japan 2025

These are the slides for Japan Community Day at KubeCon + CloudNativeCon Japan 2025:
https://community.cncf.io/events/details/cncf-cloud-native-community-japan-presents-japan-community-day-at-kubecon-cloudnativecon-japan-2025/

Abstract:
This session explores KEP-3619, "Fine-grained SupplementalGroups Control", a security enhancement that graduated to beta in Kubernetes v1.33. The KEP addresses the previously opaque behavior of supplementary group management, which could introduce security risks when accessing volumes. The feature enables more precise control over supplementary groups, strengthening the declarativeness of Pod configurations. It also makes UID/GID details more transparent through the Pod object (its status), giving better oversight of security settings. Attendees will learn how these advancements can simplify and strengthen their security strategy for supplementary group management.


Shingo Omura

June 15, 2025


Transcript

  1. Strengthening Kubernetes Security with Fine-grained SupplementalGroups Control
      https://kep.k8s.io/3619
      Shingo Omura, LY Corporation (@everpeace)
      Japan Community Day at KubeCon + CloudNativeCon Japan 2025

  2. Shingo OMURA / @everpeace
      ▶ AI Platform Engineer @ LY Corporation
        ◦ Develops and operates large-scale heterogeneous GPU Kubernetes clusters
      ▶ Upstream contributions: SIG-Node, SIG-Scheduling
        ◦ KEP-3619: Fine-grained supplemental groups control (Beta in v1.33)
        ◦ KEP-5055: DRA: device taints and tolerations (Alpha in v1.33)
      ▶ Active engagement in Japanese Kubernetes communities

  3. Outline
      • Understand the Problem: Implicit /etc/group merging
        ◦ Why it happens
        ◦ Why it’s a security concern
      • Discover the Solution: KEP-3619 (supplementalGroupsPolicy API)
      • Explore KEP-3619 Feature Status & Tips (Beta in v1.33)
      • Achieve Enhanced Security & Improved Pod Declarativeness

  4. The Basics: Supplementary Groups
        $ id
        uid=1000(codespace) gid=1000(codespace) groups=1000(codespace),106(ssh),107(docker),⋯
      • uid: User ID (UID)
      • gid: Primary Group ID (GID)
      • groups: Supplementary Group IDs (today's focus)
      📰 Linux expects the primary GID to be the first entry in groups. Some container runtimes did not adhere to this constraint in the past, which led to a CVE: "Vulnerability in Linux containers ‒ investigation and mitigation". TL;DR: a setgid program can bypass negative group permissions by leaving the primary group.

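A minimal Python sketch for checking what the kernel actually attached to a process, rather than relying on the formatted output of `id`: it reads the `Gid:` and `Groups:` lines of `/proc/<pid>/status` (the real GID is the first column of `Gid:`) and tests whether the primary GID is among the supplementary groups, which is the membership the note above is concerned with. The function names are hypothetical, and the parser assumes the usual Linux `/proc` text layout.

```python
def parse_proc_status(text: str) -> dict:
    """Extract the real GID and supplementary group list from /proc/<pid>/status text."""
    result = {"gid": None, "groups": []}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Gid":
            # Columns are: real, effective, saved set, filesystem GID.
            result["gid"] = int(value.split()[0])
        elif key == "Groups":
            # Space-separated supplementary GIDs; the list may be empty.
            result["groups"] = [int(g) for g in value.split()]
    return result


def primary_gid_in_groups(status: dict) -> bool:
    """True if the primary (real) GID also appears among the supplementary groups."""
    return status["gid"] in status["groups"]
```

Inside a container you could feed it `open("/proc/self/status").read()`; note that the kernel keeps the group list sorted, so membership, not position, is what can be checked this way.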
  5. Specifying Groups in Kubernetes Pods
        spec:
          securityContext:
            runAsUser: 1000
            runAsGroup: 1000
            supplementalGroups: [60000]
          containers:
          - name: ctr
            securityContext:
              # There are runAsUser and runAsGroup fields here, but no supplementalGroups field
      📝 If runAsUser/runAsGroup are empty, the image configuration (USER directive) is inspected and applied.

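The precedence described on this slide can be sketched in Python as follows. This is an illustrative model, not kubelet code: it assumes the container-level securityContext overrides the pod-level one, falls back to a numeric image `USER` value ("uid" or "uid:gid"), and treats runAsUser and runAsGroup independently, which is a simplification. Names such as `resolve_identity` are hypothetical, and user/group names in `USER` are left unresolved because that would require the image's /etc/passwd.

```python
from typing import Optional


def parse_numeric_user(user: str) -> tuple[Optional[int], Optional[int]]:
    """Parse an image config "User" value of the form "uid" or "uid:gid".

    Names such as "alice" or "alice:staff" are not resolved (returned as None).
    """
    uid_s, _, gid_s = user.partition(":")
    uid = int(uid_s) if uid_s.isdigit() else None
    gid = int(gid_s) if gid_s.isdigit() else None
    return uid, gid


def resolve_identity(pod_sc: dict, ctr_sc: dict, image_user: str = "") -> dict:
    """Container securityContext wins over pod-level; the image USER is the last fallback.

    supplementalGroups exists only at the pod level, so it is read from pod_sc alone.
    """
    img_uid, img_gid = parse_numeric_user(image_user)
    uid = ctr_sc.get("runAsUser", pod_sc.get("runAsUser", img_uid))
    gid = ctr_sc.get("runAsGroup", pod_sc.get("runAsGroup", img_gid))
    return {"uid": uid, "gid": gid, "supplementalGroups": pod_sc.get("supplementalGroups", [])}
```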
  6. 👻 The Problem: Implicit Group Merging from the Image
      👈 In the container image (/etc/group), alice(1000) is a member of group-in-image(50000):
        FROM ubuntu
        RUN useradd -m -u 1000 alice \
         && groupadd -g 50000 group-in-image \
         && gpasswd -a alice group-in-image
      👈 In the Pod manifest, runAsUser is alice(1000) and supplementalGroups is [60000] (no GID 50000 anywhere in the manifest!):
        spec:
          securityContext:
            runAsUser: 1000 # alice
            supplementalGroups: [60000]
          containers:
          - name: ctr
            image: image-above
            command: ["id"]
      Output:
        uid=1000(alice) gid=1000(alice) groups=1000(alice),50000(group-in-image),60000
      😱 The implicit supplementary group ID group-in-image(50000) is attached!!
      😱 This could not be avoided before KEP-3619 👻

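To make the source of GID 50000 concrete, here is a small Python sketch that reads /etc/group text, finds the entries listing a user as a member, and builds the merged list the slide's output shows. It assumes the UID has already been mapped to the name `alice` through the image's /etc/passwd; `implicit_gids` and `merged_groups` are hypothetical helpers, not runtime code.

```python
def implicit_gids(etc_group: str, username: str) -> list[int]:
    """GIDs of /etc/group entries that list `username` as a member.

    Each line has the form name:password:GID:member1,member2,...
    A user's primary group is usually not listed here (it comes from /etc/passwd).
    """
    gids = []
    for line in etc_group.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        _name, _password, gid, members = line.split(":", 3)
        if username in members.split(","):
            gids.append(int(gid))
    return gids


def merged_groups(gid: int, etc_group: str, username: str, supplemental: list[int]) -> list[int]:
    """Primary GID, then image memberships, then Pod supplementalGroups (deduplicated)."""
    groups = [gid]
    for g in implicit_gids(etc_group, username) + list(supplemental):
        if g not in groups:
            groups.append(g)
    return groups
```

With an /etc/group containing `alice:x:1000:` and `group-in-image:x:50000:alice`, `merged_groups(1000, etc_group, "alice", [60000])` yields `[1000, 50000, 60000]`, the same set as the `id` output above.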
  7. 🤔 Why is this a Problem? (1/2)
      😿 Security Risk
        ◦ Implicit groups can potentially bypass volume access control, particularly on shared volumes such as NFS and HostPath (k/k#112879)
        ◦ Baking a custom image can attach arbitrary GIDs
      Example: a file on a shared volume (NFS, HostPath) that only group-in-image may access:
        -rw-rw---- 1 nobody group-in-image 1.5M Jun 15 2025 only-group-in-image
        groups=1000(alice),60000                             → no access
        groups=1000(alice),50000(group-in-image),60000 😿    → access through the implicit group 🔥
      📰 As shown in k/k#112879, the Kubernetes Security Response Committee (SRC) responded that this behavior "works as intended" and recommended discussing how to handle it in a public …

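The risk comes down to the ordinary POSIX permission check. The Python sketch below is simplified on purpose (it ignores root, capabilities, ACLs, and any server-side NFS rules) and assumes `nobody` is UID 65534, as on common distributions; `can_read` is a hypothetical name.

```python
import stat


def can_read(mode: int, owner_uid: int, owner_gid: int, uid: int, gids: list[int]) -> bool:
    """Simplified POSIX read check.

    Exactly one permission class is consulted: owner if the UID matches,
    otherwise group if any of the process's GIDs matches, otherwise other.
    `gids` should include the primary GID as well as supplementary groups.
    """
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


NOBODY_UID = 65534          # assumption: typical UID of "nobody"
GROUP_IN_IMAGE_GID = 50000  # group owner of the file in the example
MODE = 0o660                # -rw-rw----
```

Evaluating `can_read(MODE, NOBODY_UID, GROUP_IN_IMAGE_GID, 1000, [1000, 60000])` gives `False`, while adding the implicit 50000 to the list flips it to `True`: the image alone decided the outcome.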
  8. 🤔 Why is this a Problem? (2/2)
      😿 Policy Evasion
        ◦ Policy engines (e.g. Gatekeeper, Kyverno) cannot easily detect or validate implicit groups because the manifests contain no information about them
      😿 Reduced Declarativeness
        ◦ Effective supplementary groups depend on the image, not just on the manifests
      (Same image and Pod as in slide 6: the Pod declares only supplementalGroups: [60000], yet the process runs with groups=1000(alice),50000(group-in-image),60000.)

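A manifest-only check can only compare against what the Pod spec declares. This Python sketch (hypothetical helpers, pod-level fields plus container runAsGroup only) computes the GIDs a policy engine can see and the ones it cannot, given an effective group list observed at runtime.

```python
def declared_gids(pod_spec: dict) -> set[int]:
    """GIDs that appear anywhere a manifest-only policy engine could look."""
    sc = pod_spec.get("securityContext", {})
    gids = set(sc.get("supplementalGroups", []))
    for key in ("runAsGroup", "fsGroup"):
        if key in sc:
            gids.add(sc[key])
    for ctr in pod_spec.get("containers", []):
        ctr_sc = ctr.get("securityContext", {})
        if "runAsGroup" in ctr_sc:
            gids.add(ctr_sc["runAsGroup"])
    return gids


def invisible_gids(pod_spec: dict, effective: list[int]) -> set[int]:
    """GIDs the process actually holds that no manifest field mentions."""
    return set(effective) - declared_gids(pod_spec)
```

For the slide 6 Pod and groups `[1000, 50000, 60000]` this returns `{1000, 50000}`: 50000 comes from /etc/group, and 1000 also comes from the image because runAsGroup is unset.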
  9. 🪪 The Journey of User Identities in Kubernetes
      • Inputs: PodSecurityContext (ContainerSecurityContext) and OCI Image Spec Configuration.User
        # PodSpec
        spec:
          securityContext:
            runAsUser: 1000 # alice
            runAsGroup: 1000 # alice
            supplementalGroups: [60000]
        # Dockerfile
        USER 1000:1000
        # OCI Image Configuration (application/vnd.oci.image.config.v1+json)
        "config": {"User": "1000:1000"}
      • Kubelet: converts manifests and image config to CRI
        # CRI (protobuf): LinuxContainerSecurityContext
        security_context { run_as_user: 1000, run_as_group: 1000, supplemental_groups: [60000] }
      • CRI runtime (containerd, CRI-O, etc.): converts CRI arguments to the OCI Runtime Spec
        # OCI runtime spec (JSON): process.user
        "user": {"uid": 1000, "gid": 1000, "additionalGids": [50000, 60000]}
      • OCI runtime (runc, crun, youki, etc.): spawns containers from the OCI runtime spec
      👀 Implicit merging happens inside the CRI runtime!!

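The CRI-runtime step on this slide can be pictured as a small translation function. This Python sketch is hypothetical: `image_group_memberships` stands in for whatever the runtime finds in the container rootfs's /etc/group, and whether the primary GID is also written into `additionalGids` varies between runtimes and versions, so it is left out here just as in the slide's example.

```python
import json


def cri_to_oci_user(run_as_user: int, run_as_group: int,
                    supplemental_groups: list[int],
                    image_group_memberships: list[int]) -> str:
    """Build an OCI process.user fragment from CRI-style inputs.

    Putting image_group_memberships into additionalGids is the implicit
    merge: nothing in the CRI request asked for those GIDs.
    """
    additional = []
    for gid in image_group_memberships + supplemental_groups:
        if gid not in additional:
            additional.append(gid)
    return json.dumps({"user": {"uid": run_as_user, "gid": run_as_group,
                                "additionalGids": additional}})
```

`cri_to_oci_user(1000, 1000, [60000], [50000])` produces the `"additionalGids": [50000, 60000]` shown above, even though the CRI request only carried 60000.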
  10. √ The Root Cause (1/2): Ambiguous Spec of the supplementalGroups Field
      ~v1.25 (ambiguous):
        "A list of groups applied to the first process run in each container, in addition to the container's primary GID. If unspecified, no groups will be added to any container."
      v1.26 to v1.30 (clearly explained, including implicit merging):
        "A list of groups applied to the first process run in each container, in addition to the container's primary GID, the fsGroup (if specified), and group memberships defined in the container image for the uid of the container process. If unspecified, no additional groups are added to any container. Note that group memberships defined in the container image for the uid of the container process are still effective, even if they are not included in this list."

  11. √ The Root Cause (1/2): Ambiguous Spec of the supplementalGroups Field
      v1.31+ (explains the effect of fine-grained control):
        "A list of groups applied to the first process run in each container, in addition to the container's primary GID and fsGroup (if specified). If the SupplementalGroupsPolicy feature is enabled, the supplementalGroupsPolicy field determines whether these are in addition to or instead of any group memberships defined in the container image. If unspecified, no additional groups are added, though group memberships defined in the container image may still be used, depending on the supplementalGroupsPolicy field."

  12. √ The Root Cause (2/2): Just Inherited from Docker
        $ docker run -u 1000 image-below id
        # image-below:
        FROM ubuntu
        RUN useradd -m -u 1000 alice \
         && groupadd -g 50000 group-in-image \
         && gpasswd -a alice group-in-image
        # output:
        uid=1000(alice) gid=1000(alice) groups=1000(alice),50000(group-in-image)
      ☝ docker run also merges GIDs from /etc/group (kubernetes/website#46921)

  13. Proposal 1: SupplementalGroupsPolicy API
        Pod.spec.securityContext:
          runAsUser: 1000 # alice
          runAsGroup: 1000 # alice
          supplementalGroups: [60000]
          supplementalGroupsPolicy: Merge|Strict
      • Merge (the default): uid=1000(alice) gid=1000(alice) groups=1000(alice),50000(group-in-image),60000 👻
      • Strict: uid=1000(alice) gid=1000(alice) groups=1000(alice),60000

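The two policies differ only in whether image-derived memberships are added. A minimal Python model of that difference (hypothetical names; fsGroup omitted for brevity) reproduces both group lists from this slide.

```python
from enum import Enum


class SupplementalGroupsPolicy(Enum):
    MERGE = "Merge"    # also add memberships from the image's /etc/group
    STRICT = "Strict"  # only GIDs declared in the Pod spec


def effective_groups(gid: int, supplemental_groups: list[int],
                     image_memberships: list[int],
                     policy: SupplementalGroupsPolicy = SupplementalGroupsPolicy.MERGE) -> list[int]:
    """Group list in `id` order: primary GID first, then the rest, deduplicated."""
    candidates = list(supplemental_groups)
    if policy is SupplementalGroupsPolicy.MERGE:
        candidates = image_memberships + candidates
    groups = [gid]
    for g in candidates:
        if g not in groups:
            groups.append(g)
    return groups
```

`effective_groups(1000, [60000], [50000])` returns `[1000, 50000, 60000]`, and passing `SupplementalGroupsPolicy.STRICT` returns `[1000, 60000]`.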
  14. Proposal 2: Identity Transparency in Pod Status
      Helps cluster admins check:
        • implicit identities before setting supplementalGroupsPolicy
        • insecure UIDs/GIDs (= 0)
          🪢 KEP-2172: Warn about quietly-insecure container UIDs/GIDs
        # Pod.status.containerStatuses
        - user:
            linux:
              uid: 1000
              gid: 1000
              supplementalGroups: [1000, 10000, 60000]
      📝 These identities are the ones initially attached to the container's first process
      📝 Actual identities are dynamic (a privileged container can change them at any time)

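An admin-side check over this status field might look like the Python sketch below. It assumes a Pod object as returned by `kubectl get pod -o json` and the field path shown on the slide (`status.containerStatuses[].user.linux`); the helper names and the `uidOrGidZero` flag are hypothetical. Containers whose runtime did not report identities are skipped.

```python
def container_identities(pod: dict) -> list[dict]:
    """Collect status.containerStatuses[].user.linux entries from a Pod object."""
    out = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        linux = (cs.get("user") or {}).get("linux")
        if linux is None:
            continue  # runtime did not report identities
        out.append({
            "name": cs.get("name"),
            "uid": linux.get("uid"),
            "gid": linux.get("gid"),
            "supplementalGroups": linux.get("supplementalGroups", []),
            # UID or GID 0 is the "quietly insecure" case KEP-2172 is about.
            "uidOrGidZero": linux.get("uid") == 0 or linux.get("gid") == 0,
        })
    return out


def unexpected_gids(identity: dict, declared: set[int]) -> set[int]:
    """Attached supplementary GIDs that the manifest does not declare."""
    return set(identity["supplementalGroups"]) - declared - {identity["gid"]}
```

For the status above and a manifest declaring only 60000, `unexpected_gids` reports `{10000}`, which is exactly what to review before switching the Pod to Strict.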
  15. Proposal 3: Exposing Feature Support
      • Helps cluster admins understand whether a node (its CRI runtime) supports the supplementalGroupsPolicy feature
        # Node.status
        features:
          supplementalGroupsPolicy: true|false
        # Don't confuse: runtimeHandlers contain OCI runtime (runc, crun, youki) features
        runtimeHandlers: [{"name": "", "features": …}, …]
      📝 No scheduler integration is provided
      🪢 KEP-5328: Node Capabilities

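Because there is no scheduler integration, admins typically have to inspect the field themselves. A minimal Python sketch, assuming a NodeList as returned by `kubectl get nodes -o json` and the `status.features.supplementalGroupsPolicy` path shown above; nodes that do not report the field are treated as unsupported.

```python
def nodes_supporting_strict(node_list: dict) -> list[str]:
    """Names of nodes whose status.features.supplementalGroupsPolicy is true."""
    names = []
    for node in node_list.get("items", []):
        features = node.get("status", {}).get("features") or {}
        # Missing field (older kubelet or CRI runtime) counts as unsupported.
        if features.get("supplementalGroupsPolicy") is True:
            names.append(node["metadata"]["name"])
    return names
```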
  16. Feature Status: Available in v1.33+
      • Feature gate: SupplementalGroupsPolicy (kube-apiserver, kubelet)
      • v1.31: Alpha
        ◦ Manual enablement required
      • v1.33: Beta! 🎉
        ◦ Enabled by default
      • The API is widely available in v1.33+ clusters!!

  17. Strict Policy Needs CRI Runtime Support
      Same flow as in slide 9, with the policy passed all the way down:
      • Inputs: PodSecurityContext (ContainerSecurityContext) and OCI Image Spec Configuration.User
        # PodSpec
        spec:
          securityContext:
            runAsUser: 1000 # alice
            runAsGroup: 1000 # alice
            supplementalGroups: [60000]
            supplementalGroupsPolicy: Strict
        # Dockerfile
        USER 1000:1000
        # OCI Image Configuration (application/vnd.oci.image.config.v1+json)
        "config": {"User": "1000:1000"}
      • Kubelet: converts manifests and image config to CRI
        # CRI (protobuf): LinuxContainerSecurityContext
        security_context { run_as_user: 1000, run_as_group: 1000, supplemental_groups: [60000], supplemental_groups_policy: Strict }
      • CRI runtime (containerd, CRI-O, etc.): converts CRI arguments to the OCI Runtime Spec
        # OCI runtime spec (JSON): process.user, as produced by a runtime that ignores the policy
        "user": {"uid": 1000, "gid": 1000, "additionalGids": [50000, 60000]}
      • OCI runtime (runc, crun, youki, etc.): spawns containers from the OCI runtime spec
      👀 The Strict policy needs CRI runtime support: the CRI runtime itself must stop merging GIDs from /etc/group

  18. CRI Runtimes Supporting Strict Policy
      • containerd v2.0+
      • CRI-O v1.31+
        # Node status
        features:
          supplementalGroupsPolicy: true
      👆 You can check whether your CRI runtime actually supports the feature

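The features field above is the authoritative signal. When it is missing (for example, on a node running an older kubelet), a rough heuristic is to compare `node.status.nodeInfo.containerRuntimeVersion` (for example `containerd://2.0.2` or `cri-o://1.31.0`) against the minimum versions on this slide. The Python sketch below is such a heuristic only; the table and function name are hypothetical, and unknown runtimes or unparsable versions return False.

```python
MIN_STRICT_VERSIONS = {"containerd": (2, 0), "cri-o": (1, 31)}  # from the slide


def runtime_may_support_strict(container_runtime_version: str) -> bool:
    """Heuristic check of a "<runtime>://<version>" string against known minimums."""
    name, sep, version = container_runtime_version.partition("://")
    if not sep or name not in MIN_STRICT_VERSIONS:
        return False
    parts = version.lstrip("v").split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return False
    return (major, minor) >= MIN_STRICT_VERSIONS[name]
```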
  19. Beta Feature: Policy Enforcement
      What happens if a Strict policy pod lands on an unsupported node?
      • Alpha (v1.31): silent fallback to the Merge policy
      • Beta (v1.33): pod rejection by kubelet
        # Event
        type: Warning
        reason: SupplementalGroupsPolicyNotSupported
        message: "SupplementalGroupsPolicy=Strict is not supported in this node"
        involvedObject:
          apiVersion: v1
          kind: Pod
      👆 kubelet emits warning events for pod rejections

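The alpha/beta difference can be summarized as a small decision function. This Python sketch is a hypothetical model of the behavior described on the slide, not kubelet's implementation; only the reason and message strings are taken from the event shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdmitResult:
    admit: bool
    reason: Optional[str] = None
    message: Optional[str] = None


def admit_pod(pod_spec: dict, node_supports_strict: bool, beta_behavior: bool = True) -> AdmitResult:
    """Beta (v1.33): reject Strict pods on unsupported nodes.
    Alpha (v1.31): admit them; the runtime silently falls back to Merge.
    """
    policy = pod_spec.get("securityContext", {}).get("supplementalGroupsPolicy", "Merge")
    if policy != "Strict" or node_supports_strict or not beta_behavior:
        return AdmitResult(admit=True)
    return AdmitResult(
        admit=False,
        reason="SupplementalGroupsPolicyNotSupported",
        message="SupplementalGroupsPolicy=Strict is not supported in this node",
    )
```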
  20. Upgrade Considerations (v1.33+)
      🆗 If you did NOT use the Strict policy:
        • No worries, just upgrade
        • Ensure your CRI runtime supports the Strict policy before you start using it
      🚨 If you already use the Strict policy:
        • Ensure your CRI runtime supports the Strict policy (containerd v2.0+, CRI-O v1.31+)
          Otherwise, your Strict policy pods will be rejected
      👍 Recommended strategies:
        • Upgrade CRI runtimes together with, or before, upgrading Kubernetes
        • Or, use node labels (e.g. supplemental-groups-policy-strict=supported)
        • and a nodeSelector on Strict policy pods
          ◦ But you'll have to monitor Pending pods instead of pod rejections

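The label-and-nodeSelector strategy can be sketched in Python as two helpers: one that adds the slide's example label as a nodeSelector to Strict-policy Pod specs, and one that lists Pending pods carrying that selector so they can be monitored. Both helpers are hypothetical; nodes would be labeled separately, e.g. with `kubectl label node <node-name> supplemental-groups-policy-strict=supported`. A Pending pod may of course be pending for unrelated reasons.

```python
LABEL_KEY = "supplemental-groups-policy-strict"  # example label from the slide
LABEL_VALUE = "supported"


def with_strict_node_selector(pod_spec: dict) -> dict:
    """Return a copy of a Strict-policy Pod spec with the nodeSelector added."""
    policy = pod_spec.get("securityContext", {}).get("supplementalGroupsPolicy")
    if policy != "Strict":
        return pod_spec
    spec = dict(pod_spec)  # shallow copy; only nodeSelector is replaced
    spec["nodeSelector"] = {**pod_spec.get("nodeSelector", {}), LABEL_KEY: LABEL_VALUE}
    return spec


def pending_strict_pods(pod_list: dict) -> list[str]:
    """Names of Pending pods pinned to Strict-capable nodes (candidates for alerting)."""
    names = []
    for pod in pod_list.get("items", []):
        selector = pod.get("spec", {}).get("nodeSelector", {})
        phase = pod.get("status", {}).get("phase")
        if phase == "Pending" and selector.get(LABEL_KEY) == LABEL_VALUE:
            names.append(pod["metadata"]["name"])
    return names
```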
  21. 😿 Can't We Change Any of This? Is There a Workaround (for v1.31 and Earlier / Old CRI Runtimes)?
      YES! A custom RuntimeClass comes to the rescue!
      • Kubelet: PodSecurityContext, ContainerSecurityContext
        # PodSpec
        spec:
          securityContext:
            runAsUser: 1000 # alice
            runAsGroup: 1000 # alice
            supplementalGroups: [60000]
      • CRI: LinuxContainerSecurityContext
      • CRI runtime → OCI Runtime Spec (process.user)
        # OCI runtime spec (JSON)
        "user": {"uid": 1000, "gid": 1000, "additionalGids": [50000, 60000]} 👻
      • OCI runtime (selected via RuntimeClass) 😄 We can change things here!

  22. pfnet-research/strict-supplementalgroups-container-runtime
      Kubelet → CRI runtime → strict-supplementalgroups container runtime → underlying OCI runtime (runc, nvidia-container-runtime, etc.)
        # Pod.spec
        runtimeClassName: strict-supplementalgroups
        securityContext:
          runAsUser: 1000 # alice
          runAsGroup: 1000 # alice
          supplementalGroups: [60000]
      ① The CRI runtime creates the OCI runtime spec (config.json) in the bundle directory, next to rootfs/
      ② The CRI runtime calls the OCI runtime, i.e. the strict-supplementalgroups container runtime
      ③ It inspects the Pod spec
      ④ It removes GIDs not in the Pod spec from config.json
        # OCI runtime spec (config.json)
        "user": {"uid": 1000, "gid": 1000, "additionalGids": [50000, 60000]}  ⌫ 50000
      ⑤ It calls the underlying OCI runtime
      Result: groups=1000(alice),60000 🎉

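Step ④ is essentially a filter over `process.user.additionalGids` in the bundle's config.json. The Python sketch below illustrates that idea only and is not how the pfnet-research project is implemented; in particular, how a wrapper obtains the declared GIDs (runAsGroup, supplementalGroups, fsGroup) from the Pod spec is out of scope, so they are passed in as `declared`. Both function names are hypothetical.

```python
import json


def strip_undeclared_gids(config: dict, declared: set[int]) -> list[int]:
    """Drop process.user.additionalGids not declared in the Pod spec (in place).

    The primary gid is always kept. Returns the removed GIDs.
    """
    user = config.setdefault("process", {}).setdefault("user", {})
    allowed = declared | {user.get("gid")}
    current = user.get("additionalGids", [])
    removed = [g for g in current if g not in allowed]
    user["additionalGids"] = [g for g in current if g in allowed]
    return removed


def rewrite_bundle_config(path: str, declared: set[int]) -> list[int]:
    """Rewrite <bundle>/config.json before handing off to the underlying OCI runtime."""
    with open(path) as f:
        config = json.load(f)
    removed = strip_undeclared_gids(config, declared)
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return removed
```

With the config.json above and `declared={1000, 60000}`, GID 50000 is removed and the container starts with `groups=1000(alice),60000`.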
  23. Key Takeaways
      • Implicit group merging from /etc/group causes
        ◦ potential security risks, policy evasion, and reduced declarativeness
      • KEP-3619 addresses the issue
        ◦ The supplementalGroupsPolicy API controls the behavior
        ◦ The Strict policy can enhance security and UID/GID transparency
      • The feature is Beta in v1.33 and enabled by default
        ◦ It needs newer CRI runtimes (containerd v2.0+, CRI-O v1.31+)
        ◦ Watch out for pod rejections with the Strict policy on unsupported nodes
      • Workaround for older Kubernetes/CRI runtimes: a custom RuntimeClass
        ◦ pfnet-research/strict-supplementalgroups-container-runtime

  24. Future: Graduation to GA
      • Working towards GA (planned for v1.35)
        ◦ Pass the Production Readiness Review (PRR) for GA
        ◦ More test coverage (upgrade/downgrade scenarios)
      • Planned enhancements
        ◦ supplementalGroupsPolicy: Strict in the Pod Security Standards (restricted? baseline?)
        ◦ Add supplementalGroupsPolicy to Kubernetes Conformance

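Until something like this lands in the Pod Security Standards, a cluster could enforce it with its own admission policy. The Python sketch below shows the rule such a policy might encode; it is hypothetical and is not part of any Pod Security Standard today (the slide leaves the target level undecided).

```python
def check_strict_supplemental_groups(pod_spec: dict) -> list[str]:
    """Hypothetical admission-style rule: require supplementalGroupsPolicy: Strict.

    Returns a list of violation messages (empty when the Pod complies).
    """
    violations = []
    policy = pod_spec.get("securityContext", {}).get("supplementalGroupsPolicy")
    if policy != "Strict":
        shown = policy or "unset, i.e. Merge"
        violations.append(
            f"spec.securityContext.supplementalGroupsPolicy must be Strict (got {shown})"
        )
    return violations
```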
  25. Get Involved!
      • Join the SIG-Node community!
        ◦ github.com/kubernetes/community/sig-node/
        ◦ This feature is developed and maintained by SIG-Node
      • Provide feedback on your experience using the feature
      • Contribute to documentation, testing, or code

  26. Thank you❕ Questions❔
      Links
      • https://kep.k8s.io/3619
      • Kubernetes v1.33: Fine-grained SupplementalGroups Control Graduates to Beta
      Contacts: @everpeace