It's the end of the year, so let's upgrade our on-prem Kubernetes cluster

cyokozai

April 01, 2026

Transcript

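  1. 井上 裕介 a.k.a. cyokozai

    Chiba Institute of Technology, Graduate School of Information Science, Department of Information Engineering, first-year master's student
    ❏ Network Contents Study Group, Nekko Cloud Team
    ❏ 3-shake Inc., Sreake division, student intern
    ❏ CloudNative Days organizing committee member
    (Guehh~ my research is in serious trouble...)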
  2. Execution environment

    • Local
      ◦ MacBook Air (Arm)
      ◦ DevContainer
    • IaaS cluster
      ◦ Nodes
        ▪ vm-01 (Ubuntu 22.04)
        ▪ vm-02 (Ubuntu 22.04)
        ▪ vm-03 (Ubuntu 22.04)
  3. State before the change (as of 2025.12.05)

    nke-kubespray-inventory $ kubectl get nodes
    NAME    STATUS   ROLES           AGE    VERSION
    vm-01   Ready    control-plane   386d   v1.30.4
    vm-02   Ready    control-plane   386d   v1.30.4
    vm-03   Ready    control-plane   386d   v1.30.4
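The VERSION column can also be checked programmatically: in `kubectl get nodes -o json`, each node's kubelet version sits at `status.nodeInfo.kubeletVersion`. Below is a minimal Python sketch, assuming that JSON has already been captured as a string (for example via `kubectl get nodes -o json > nodes.json`); `kubelet_versions` and `is_uniform` are hypothetical helper names, not part of kubectl or Kubespray.

```python
import json


def kubelet_versions(nodes_json: str) -> dict[str, str]:
    """Map node name -> kubelet version from `kubectl get nodes -o json` output."""
    doc = json.loads(nodes_json)
    return {
        item["metadata"]["name"]: item["status"]["nodeInfo"]["kubeletVersion"]
        for item in doc["items"]
    }


def is_uniform(versions: dict[str, str]) -> bool:
    """True only if at least one node exists and all report the same version."""
    return len(set(versions.values())) == 1


if __name__ == "__main__":
    # Trimmed sample shaped like the real output (only the fields read above).
    sample = json.dumps({
        "items": [
            {"metadata": {"name": name},
             "status": {"nodeInfo": {"kubeletVersion": "v1.30.4"}}}
            for name in ("vm-01", "vm-02", "vm-03")
        ]
    })
    versions = kubelet_versions(sample)
    print(versions)
    print("all nodes on one version:", is_uniform(versions))
```

Against the cluster above this would report v1.30.4 for all three nodes; the same check is useful again once the upgrade has finished.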
  4. State before the change (as of 2025.12.05), annotated

    (Same kubectl get nodes output as slide 3, with callouts:)
    • All of them are control plane nodes!?
    • Running for a year (AGE 386d)
    • v1.30 is EOL....
  5. The k8s cluster was fine, at least for now (phew....)

    vm-01
    +------------------------+--------+-------------+-------+
    |        ENDPOINT        | HEALTH |    TOOK     | ERROR |
    +------------------------+--------+-------------+-------+
    | https://127.0.0.1:2379 |   true | 32.384996ms |       |
    +------------------------+--------+-------------+-------+

    vm-02
    +------------------------+--------+-------------+-------+
    |        ENDPOINT        | HEALTH |    TOOK     | ERROR |
    +------------------------+--------+-------------+-------+
    | https://127.0.0.1:2379 |   true | 13.916231ms |       |
    +------------------------+--------+-------------+-------+

    vm-03
    +------------------------+--------+-------------+-------+
    |        ENDPOINT        | HEALTH |    TOOK     | ERROR |
    +------------------------+--------+-------------+-------+
    | https://127.0.0.1:2379 |   true | 12.934121ms |       |
    +------------------------+--------+-------------+-------+
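The per-node tables look like the table output of etcdctl's `endpoint health` command (`-w table`); the exact invocation and certificate flags are not shown on the slide, so treat that as an assumption. When checking many nodes, a small Python sketch can parse such tables and confirm every endpoint reports `true`. `parse_health_table` and `all_healthy` are hypothetical names, not part of etcdctl.

```python
def parse_health_table(text: str) -> list[dict[str, str]]:
    """Parse ASCII tables like the ones printed by `etcdctl endpoint health -w table`."""
    header = None
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip +----+ separators, node labels and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first |...| row carries the column names
        elif cells != header:
            # Repeated headers (several tables pasted together) are skipped.
            rows.append(dict(zip(header, cells)))
    return rows


def all_healthy(rows: list[dict[str, str]]) -> bool:
    """True if at least one endpoint was parsed and every HEALTH cell is 'true'."""
    return bool(rows) and all(row.get("HEALTH") == "true" for row in rows)


SAMPLE = """
+------------------------+--------+-------------+-------+
|        ENDPOINT        | HEALTH |    TOOK     | ERROR |
+------------------------+--------+-------------+-------+
| https://127.0.0.1:2379 |   true | 32.384996ms |       |
+------------------------+--------+-------------+-------+
"""

if __name__ == "__main__":
    rows = parse_health_table(SAMPLE)
    print(rows)
    print("healthy:", all_healthy(rows))
```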
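  6. Steps to upgrade k8s with Kubespray

    1. Back up each node, one at a time
    2. Start the DevContainer
    3. Install Python 3 pip
    4. Update the k8s version information in upgrade-cluster.yaml
    5. Clone the Kubespray repository at the version matching the target k8s version
    6. pip install → installs Ansible and the other dependencies
    7. Run the Ansible playbook
    8. Set KUBECONFIG to the newly issued admin.conf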
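Steps 6 and 7 boil down to two commands. The sketch below only builds and prints them, assuming the Kubespray checkout is the working directory and that upstream's `requirements.txt` and playbook name `upgrade-cluster.yml` are used; the inventory path is hypothetical. Kubespray's upgrade documentation also shows passing the version as `-e kube_version=...` instead of editing a file; the exact version format it expects may vary by Kubespray release, so the override is optional here.

```python
import shlex
from typing import Optional


def kubespray_upgrade_commands(
    inventory: str,
    kube_version: Optional[str] = None,
    playbook: str = "upgrade-cluster.yml",
) -> list[list[str]]:
    """Build (but do not run) the argv lists for steps 6 and 7."""
    playbook_cmd = ["ansible-playbook", playbook, "-i", inventory, "-b"]
    if kube_version is not None:
        # Optional override; without it, the version configured in step 4 applies.
        playbook_cmd += ["-e", f"kube_version={kube_version}"]
    return [
        # Step 6: install Ansible and the other dependencies pinned by Kubespray.
        ["pip", "install", "-r", "requirements.txt"],
        # Step 7: run the upgrade playbook; -b enables privilege escalation (become).
        playbook_cmd,
    ]


if __name__ == "__main__":
    # Hypothetical path: the inventory repo checked out next to the Kubespray clone.
    for argv in kubespray_upgrade_commands(
        "../nke-kubespray-inventory/nke-cluster/inventory.ini", "v1.31.4"
    ):
        print(shlex.join(argv))
```

Keeping the commands as argv lists avoids quoting mistakes if they are later handed to `subprocess.run`.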
  7. Clone the configuration repo of the running infrastructure

    .
    ├── local
    │   ├── group_vars -> ../sample/group_vars
    │   └── hosts.ini
    ├── nke-cluster
    │   ├── artifacts
    │   │   ├── admin.conf
    │   │   ├── kubectl
    │   │   └── kubectl.sh
    │   ├── credentials
    │   │   └── kubeadm_certificate_key.creds
    │   ├── group_vars
    │   │   ├── all
    │   │   │   ├── all.yml
    │   │   │   ├── aws.yml
    │   │   │   ├── azure.yml
    │   │   │   ├── containerd.yml
    │   │   │   ├── coreos.yml
    │   │   │   …
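The `artifacts/` directory matches what Kubespray writes into the inventory directory when `kubeconfig_localhost` (the admin kubeconfig as `admin.conf`) and `kubectl_localhost` (`kubectl` plus the `kubectl.sh` wrapper) are enabled; that this repo sets those variables is an assumption. A minimal Python sketch for step 8 that checks the files are present and points `KUBECONFIG` at `admin.conf` for child processes; `missing_artifacts` and `kubeconfig_env` are hypothetical names.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Files the slide shows under nke-cluster/artifacts/.
EXPECTED_ARTIFACTS = ("admin.conf", "kubectl", "kubectl.sh")


def missing_artifacts(inventory_dir: str) -> list[str]:
    """Expected artifact files that are absent from <inventory_dir>/artifacts."""
    artifacts = Path(inventory_dir) / "artifacts"
    return [name for name in EXPECTED_ARTIFACTS if not (artifacts / name).is_file()]


def kubeconfig_env(
    inventory_dir: str, base_env: Optional[Mapping[str, str]] = None
) -> dict[str, str]:
    """Copy of the environment with KUBECONFIG pointing at artifacts/admin.conf."""
    env = dict(os.environ if base_env is None else base_env)  # never mutate the input
    env["KUBECONFIG"] = str(Path(inventory_dir) / "artifacts" / "admin.conf")
    return env


if __name__ == "__main__":
    inventory_dir = "nke-cluster"  # hypothetical: run from the inventory repo root
    print("missing:", missing_artifacts(inventory_dir))
    print("KUBECONFIG=" + kubeconfig_env(inventory_dir)["KUBECONFIG"])
    # To use it, pass env=kubeconfig_env(...) to subprocess.run(["kubectl", ...]).
```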
  8. Currently running infrastructure: inventory.ini (unchanged)

    [all]
    node1 ansible_host=10.x.128.x ansible_user=ncadmin ip=10.x.128.x etcd_member_name=etcd1
    node2 ansible_host=10.x.128.x ansible_user=ncadmin ip=10.x.128.x etcd_member_name=etcd2
    node3 ansible_host=10.x.128.x ansible_user=ncadmin ip=10.x.128.x etcd_member_name=etcd3

    [kube_control_plane]
    node1
    node2
    node3

    [etcd]
    node1
    node2
    node3

    [kube_node]
    node1
    node2
    node3

    [calico_rr]

    [k8s_cluster:children]
    kube_control_plane
    kube_node
    calico_rr
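The "all control plane!?" observation follows directly from this inventory: node1 to node3 appear in `kube_control_plane`, `etcd` and `kube_node` alike. With three etcd members the quorum is 3 // 2 + 1 = 2, so one member can be down at a time, which is what a node-by-node rolling upgrade relies on. The Python sketch below is a deliberately simplified INI reader written for illustration (it assumes only `[group]` and `[group:children]` sections, no `:vars`); `ansible-inventory -i inventory.ini --graph` is the authoritative way to inspect a real inventory.

```python
INVENTORY = """\
[all]
node1 ansible_host=10.x.128.x ansible_user=ncadmin ip=10.x.128.x etcd_member_name=etcd1
node2 ansible_host=10.x.128.x ansible_user=ncadmin ip=10.x.128.x etcd_member_name=etcd2
node3 ansible_host=10.x.128.x ansible_user=ncadmin ip=10.x.128.x etcd_member_name=etcd3
[kube_control_plane]
node1
node2
node3
[etcd]
node1
node2
node3
[kube_node]
node1
node2
node3
[calico_rr]
[k8s_cluster:children]
kube_control_plane
kube_node
calico_rr
"""


def parse_groups(text: str) -> dict[str, list[str]]:
    """{section: [host or child-group names]} for a simple INI inventory."""
    groups: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            groups.setdefault(current, [])
        elif current is not None:
            # "node1 ansible_host=... ip=..." -> keep only the first token (the name).
            groups[current].append(line.split()[0])
    return groups


def resolve(groups: dict[str, list[str]], name: str) -> set[str]:
    """Hosts in a group, expanding any matching '<name>:children' section."""
    hosts = set(groups.get(name, []))
    for child in groups.get(f"{name}:children", []):
        hosts |= resolve(groups, child)
    return hosts


def etcd_quorum(members: int) -> int:
    """Votes etcd needs to make progress: a strict majority of members."""
    return members // 2 + 1


if __name__ == "__main__":
    groups = parse_groups(INVENTORY)
    for group in ("kube_control_plane", "etcd", "kube_node", "k8s_cluster"):
        print(group, sorted(resolve(groups, group)))
    n = len(resolve(groups, "etcd"))
    print(f"etcd members={n}, quorum={etcd_quorum(n)}, tolerated failures={n - etcd_quorum(n)}")
```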
  9. Run complete 🙌

    PLAY RECAP ***********************************************************
    vm-01 : ok=937 changed=78 unreachable=0 failed=0 skipped=1587 rescued=0 ignored=1
    vm-02 : ok=849 changed=67 unreachable=0 failed=0 skipped=1356 rescued=0 ignored=2
    vm-03 : ok=851 changed=67 unreachable=0 failed=0 skipped=1354 rescued=0 ignored=2

    Saturday 06 December 2025  00:01:20 +0000 (0:00:00.058)       0:32:14.991 *****
    ===============================================================================
    kubernetes/control-plane : Kubeadm | Upgrade first control plane node --- 123.56s
    kubernetes/control-plane : Kubeadm | Upgrade other control plane nodes --- 123.52s
    kubernetes/control-plane : Kubeadm | Upgrade other control plane nodes --- 108.19s
    network_plugin/cilium : Cilium | Wait for pods to run --- 54.03s
    upgrade/system-upgrade : Reboot after APT Dist-Upgrade --- 40.92s
    upgrade/system-upgrade : Reboot after APT Dist-Upgrade --- 40.59s
    upgrade/system-upgrade : Reboot after APT Dist-Upgrade --- 40.15s
    upgrade/post-upgrade : Wait for cilium --- 21.52s
    download : Prep_download | Register docker images info --- 20.98s
    kubernetes/client : Copy kubectl binary to ansible host --- 20.61s
    etcd : Gen_certs | Write etcd member/admin and kube_control_plane client certs to other etcd nodes --- 18.83s
    kubernetes/control-plane : Kubeadm | Check api is up --- 17.70s
    upgrade/pre-upgrade : Drain node --- 17.60s
    container-engine/containerd : Containerd | Unpack containerd archive --- 14.95s
    kubernetes-apps/argocd : Kubernetes Apps | Install ArgoCD --- 12.75s
    container-engine/validate-container-engine : Populate service facts --- 12.67s
    kubernetes-apps/argocd : Kubernetes Apps | Install ArgoCD --- 12.53s
    kubernetes/control-plane : Kubeadm | Check api is up --- 11.74s
    network_plugin/cni : CNI | Copy cni plugins --- 11.62s
    network_plugin/cilium : Cilium | Create Cilium node manifests --- 11.25s
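Out of all that output, the PLAY RECAP is the part to check: a run is clean when every host shows `failed=0` and `unreachable=0` (`ignored` counts failures that tasks were explicitly allowed to have). A minimal Python sketch that parses recap lines copied as text; `parse_recap` and `run_succeeded` are hypothetical names, not an Ansible API.

```python
import re

# One host line of the PLAY RECAP, e.g. "vm-01 : ok=937 changed=78 unreachable=0 failed=0 ..."
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<stats>(?:\w+=\d+\s*)+)$")
COUNTER = re.compile(r"(\w+)=(\d+)")


def parse_recap(text: str) -> dict[str, dict[str, int]]:
    """{host: {counter: value}} for every PLAY RECAP host line found in text."""
    recap = {}
    for line in text.splitlines():
        match = RECAP_LINE.match(line.strip())
        if match:
            counters = COUNTER.findall(match.group("stats"))
            recap[match.group("host")] = {key: int(value) for key, value in counters}
    return recap


def run_succeeded(recap: dict[str, dict[str, int]]) -> bool:
    """True if hosts were found and none reported failed or unreachable tasks."""
    return bool(recap) and all(
        stats.get("failed", 0) == 0 and stats.get("unreachable", 0) == 0
        for stats in recap.values()
    )


SAMPLE = """\
PLAY RECAP *********************************************************
vm-01 : ok=937 changed=78 unreachable=0 failed=0 skipped=1587 rescued=0 ignored=1
vm-02 : ok=849 changed=67 unreachable=0 failed=0 skipped=1356 rescued=0 ignored=2
vm-03 : ok=851 changed=67 unreachable=0 failed=0 skipped=1354 rescued=0 ignored=2
"""

if __name__ == "__main__":
    recap = parse_recap(SAMPLE)
    for host, stats in recap.items():
        print(host, stats)
    print("upgrade run clean:", run_succeeded(recap))
```

Task-timing lines such as `kubernetes/control-plane : Kubeadm | ...` do not match the pattern, because the text after the colon is not a run of `key=number` pairs.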
  10. Something is showing up....?

    nke-kubespray-inventory $ kubectl get nodes
    NAME    STATUS                     ROLES           AGE    VERSION
    vm-01   Ready,SchedulingDisabled   control-plane   386d   v1.31.4
    vm-02   Ready                      control-plane   386d   v1.31.4
    vm-03   Ready                      control-plane   386d   v1.31.4
  11. Don't forget to uncordon

    nke-kubespray-inventory $ kubectl uncordon vm-01
    nke-kubespray-inventory $ kubectl get nodes
    NAME    STATUS   ROLES           AGE    VERSION
    vm-01   Ready    control-plane   386d   v1.31.4
    vm-02   Ready    control-plane   386d   v1.31.4
    vm-03   Ready    control-plane   386d   v1.31.4

    SchedulingDisabled: the node has been cordoned (Pods are no longer scheduled onto it)
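vm-01 was left cordoned even though the run drained and upgraded the nodes (see `upgrade/pre-upgrade : Drain node` in the slide 9 timing list). A quick way to catch this after an upgrade is to scan `kubectl get nodes` output for `SchedulingDisabled` (the underlying node field is `spec.unschedulable`). A minimal Python sketch that only builds the `kubectl uncordon` commands rather than running them; `cordoned_nodes` and `uncordon_commands` are hypothetical names.

```python
def cordoned_nodes(get_nodes_output: str) -> list[str]:
    """Names of nodes whose STATUS column includes SchedulingDisabled."""
    names = []
    seen_header = False
    for line in get_nodes_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank or malformed line
        if parts[0] == "NAME":
            seen_header = True  # node rows follow the NAME STATUS ... header
            continue
        # STATUS is a comma-separated list such as "Ready,SchedulingDisabled".
        if seen_header and "SchedulingDisabled" in parts[1].split(","):
            names.append(parts[0])
    return names


def uncordon_commands(nodes: list[str]) -> list[str]:
    """kubectl uncordon command lines for the given nodes (not executed)."""
    return [f"kubectl uncordon {node}" for node in nodes]


SAMPLE = """\
NAME    STATUS                     ROLES           AGE    VERSION
vm-01   Ready,SchedulingDisabled   control-plane   386d   v1.31.4
vm-02   Ready                      control-plane   386d   v1.31.4
vm-03   Ready                      control-plane   386d   v1.31.4
"""

if __name__ == "__main__":
    for command in uncordon_commands(cordoned_nodes(SAMPLE)):
        print(command)
```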
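  12. 2. Heavy dependence on specific people (knowledge silos)

    • People who can currently operate Kubernetes: .... 4 or 5? (Actually, isn't that quite a lot?)
    • We need a younger generation of administrators
      ◦ Juniors are working hard at learning Docker/Kubernetes 💪
    • Documentation needs to be organized
      ◦ Share knowledge and build a common understanding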
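  13. Our biggest conference yet, surpassing Tokyo: CloudNative Days × Platform Engineering Kaigi × SRE Kaigi

    Three of Japan's leading conferences come together in Nagoya.
    May 14 (Thu) and 15 (Fri), 2026, at Chunichi Hall & Conference
    Sponsor applications open next week! CfP planned for January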