OpenShift.Run-2024-Medik8s
orimanabu
March 28, 2024
Slides for the Medik8s session at OpenShift.Run 2024.
Transcript
1 Workload Availability for OpenShift - Want StatefulSets to recover automatically after a node failure too? That is what medik8s.io is for
Manabu Ori, Red Hat
OpenShift.Run 2024, 2024-03-28
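2 Have you ever seen a situation like this?
▸ A node failure occurs on worker-2! Then...
▸ Eviction is triggered for the Pods that were running on worker-2
▸ Kubernetes' self-healing automatically restarts them on other nodes... except some of them never start?!
▸ The Pods that were running on worker-2 stay stuck in Terminating

3 Have you ever seen a situation like this?
▸ This deck introduces a way to recover from this kind of situation automatically
Recovers on its own!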
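4 What happens when a node fails
▸ With default settings, Pod eviction starts roughly 5 minutes 40 seconds after the node failure occurs
Timeline: node failure occurs → taint added → eviction triggered
・ Node failure to taint: node-monitor-grace-period (40 seconds)
・ Taints added: node.kubernetes.io/not-ready, node.kubernetes.io/unreachable
・ Taint to eviction: default-not-ready-toleration-seconds / default-unreachable-toleration-seconds (300 seconds); the default tolerationSeconds is 300 seconds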
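The "roughly 5 minutes 40 seconds" figure is simply the sum of the two default timers on this slide. A minimal sketch of that arithmetic follows; the function and constant names are hypothetical, and both timers are configurable in a real cluster.

```python
from datetime import timedelta

# Defaults quoted on the slide (assumed unchanged in the cluster).
NODE_MONITOR_GRACE_PERIOD = timedelta(seconds=40)
DEFAULT_TOLERATION = timedelta(seconds=300)


def time_to_eviction(grace=NODE_MONITOR_GRACE_PERIOD,
                     toleration=DEFAULT_TOLERATION):
    """Delay between the node failure and the start of Pod eviction."""
    return grace + toleration


def format_delay(delay):
    """Render a timedelta as '<minutes>m<seconds>s'."""
    minutes, seconds = divmod(int(delay.total_seconds()), 60)
    return f"{minutes}m{seconds:02d}s"


if __name__ == "__main__":
    print(format_delay(time_to_eviction()))
```

Lowering either timer shortens the wait before eviction, but eviction alone does not unblock StatefulSet Pods or Pods with RWO volumes, which is the problem the rest of the deck addresses.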
5 A common Kubernetes pain point
▸ Cases where Pods do not recover automatically after a node failure: at-most-one semantics
・ StatefulSet
・ Pods with an RWO PV
▸ With KubeVirt/OpenShift Virtualization, VMs do not recover automatically on another node after a node failure
▸ This deck introduces mechanisms that handle this automatically
・ Keywords:
  ・ Workload Availability for OpenShift
  ・ Node Health Check Operator, Self Node Remediation Operator, ...
  ・ Medik8s https://www.medik8s.io/
・ Incidentally, the Non Graceful Node Shutdown feature recently went GA (in k8s v1.28)...
6 Overview
▸ A failure detection mechanism detects the node failure
・ Machine Health Check (built into OpenShift, requires IPI)
・ Node Health Check Operator (also usable with UPI)
  ・ Uses the External Remediation API from SIG Cluster Lifecycle (Cluster API)
▸ Based on the detection result, a remediation mechanism performs automatic recovery
・ Self Node Remediation (no IPMI/Redfish required)
・ Fence Agents (requires IPMI/Redfish)
・ Machine Deletion (deletes the node through the API)
Diagram: Failure Detection (Machine Health Check, Node Health Check Operator) → Remediation (Self Node Remediation Operator, Fence Agents Remediation Operator, Machine Deletion Remediation Operator). Legend: built into OpenShift / additional Operator; failure detection / automatic recovery.
7 Overview
Diagram: the Node Health Check Operator (NodeHealthCheck) drives one of two remediators:
・ Self Node Remediation Operator (SelfNodeRemediation, SelfNodeRemediationConfig, SelfNodeRemediationTemplate): recovery through the soft watchdog
・ Fence Agent Remediation Operator (FenceAgentRemediation, FenceAgentRemediationTemplate): fencing through IPMI
▸ When the NHC Operator detects a node failure and decides that remediation is needed, it generates an XXRemediation CR from the Template
8 Node Healthcheck Operator
▸ How it works:
・ Looks at each node's NodeConditions and checks whether they hit the criteria/thresholds defined in NodeHealthCheck
・ If it can conclude that a failure has occurred and that remediation should be performed, it creates a Remediation Request custom resource
・ The remediation runs in response to this CR
▸ Nodes can be handled differently depending on their labels
・ Specify a node selector in NodeHealthCheck
▸ If a cluster upgrade is in progress, remediation is skipped until the upgrade completes
・ https://github.com/medik8s/node-healthcheck-operator/blob/2edb6fd6d1b0294f19d62c2e91815e44b96b74f4/controllers/nodehealthcheck_controller.go#L249-L255
・ OpenShift only for now
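The check described above (compare each node's conditions against the NodeHealthCheck thresholds) can be sketched roughly as follows. This is an illustration only, not the operator's actual code: the class and function names are hypothetical, and it assumes a rule matches once a condition has held the given type and status for at least the configured duration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class NodeCondition:
    type: str
    status: str
    last_transition_time: datetime


@dataclass(frozen=True)
class UnhealthyCondition:
    type: str
    status: str
    duration: timedelta


def matches_unhealthy(conditions, rules, now):
    """True if any node condition has matched a rule for at least its duration."""
    for cond in conditions:
        for rule in rules:
            same_kind = cond.type == rule.type and cond.status == rule.status
            if same_kind and now - cond.last_transition_time >= rule.duration:
                return True
    return False


if __name__ == "__main__":
    rules = [
        UnhealthyCondition("Ready", "False", timedelta(seconds=300)),
        UnhealthyCondition("Ready", "Unknown", timedelta(seconds=300)),
    ]
    went_unknown = datetime(2024, 3, 28, 3, 56, 35)
    node = [NodeCondition("Ready", "Unknown", went_unknown)]
    print(matches_unhealthy(node, rules, went_unknown + timedelta(seconds=300)))
```

Keeping the duration in the rule rather than in the controller is what lets different NodeHealthCheck resources apply different patience to different groups of nodes.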
9 Node Healthcheck Operator
▸ Example NodeHealthCheck CR

    apiVersion: remediation.medik8s.io/v1alpha1
    kind: NodeHealthCheck
    metadata:
      name: nodehealthcheck-sample
    spec:
      # Minimum number of healthy nodes that must remain
      minHealthy: 51%
      # Used to pause remediation temporarily
      pauseRequests:
        - <pause-test-cluster>
      # Template for the main remediation
      remediationTemplate:
        apiVersion: self-node-remediation.medik8s.io/v1alpha1
        name: self-node-remediation-resource-deletion-template
        namespace: openshift-workload-availability
        kind: SelfNodeRemediationTemplate
      # List templates here to use more than one remediation
      escalatingRemediations:
        - remediationTemplate:
            apiVersion: self-node-remediation.medik8s.io/v1alpha1
            name: self-node-remediation-resource-deletion-template
            namespace: openshift-workload-availability
            kind: SelfNodeRemediationTemplate
          order: 1
          timeout: 300s
      # Nodes this check applies to
      selector:
        matchExpressions:
          - key: node-role.kubernetes.io/worker
            operator: Exists
      # Conditions under which a node is judged to have failed
      unhealthyConditions:
        - type: Ready
          status: "False"
          duration: 300s
        - type: Ready
          status: Unknown
          duration: 300s
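The minHealthy field acts as a safety gate: remediation proceeds only while enough nodes remain healthy. The sketch below shows one way to evaluate a value such as 51%. The helper names are hypothetical, and rounding a percentage up to a whole node is an assumption here; the slides do not state how the operator rounds.

```python
def min_healthy_count(min_healthy, total_nodes):
    """Convert minHealthy (an int or a string like '51%') to a node count.

    Assumption: percentages are rounded up to the next whole node.
    """
    if isinstance(min_healthy, str) and min_healthy.endswith("%"):
        percent = int(min_healthy[:-1])
        # Integer ceiling division avoids floating-point surprises.
        return (total_nodes * percent + 99) // 100
    return int(min_healthy)


def remediation_allowed(min_healthy, total_nodes, healthy_nodes):
    """Remediate only while at least minHealthy nodes are still healthy."""
    return healthy_nodes >= min_healthy_count(min_healthy, total_nodes)


if __name__ == "__main__":
    # Three selected workers, one of them down: two healthy nodes remain.
    print(min_healthy_count("51%", 3))
    print(remediation_allowed("51%", 3, healthy_nodes=2))
    print(remediation_allowed("51%", 3, healthy_nodes=1))
```

The point of the gate is to avoid rebooting or fencing most of the cluster when the real problem is something shared, such as a network partition seen from the control plane.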
10 Self Node Remediation Operator
▸ One of the mechanisms that performs automatic recovery when MachineHealthCheck/NodeHealthCheck detects a failure
・ The MHC/NHC controller creates a SelfNodeRemediation CR
・ The SNR Operator watches the SelfNodeRemediation CR and runs the remediation
▸ Needs neither IPMI/Redfish nor a node-management API
・ Uses a software watchdog

    apiVersion: self-node-remediation.medik8s.io/v1alpha1
    kind: SelfNodeRemediation
    metadata:
      name: selfnoderemediation-sample
      namespace: openshift-operators
    spec:
      remediationStrategy: <remediation_strategy>   # see the next slide
    status:
      lastError: <last_error_message>
    ---
    apiVersion: self-node-remediation.medik8s.io/v1alpha1
    kind: SelfNodeRemediationConfig
    metadata:
      name: self-node-remediation-config
      namespace: openshift-operators
    spec:
      safeTimeToAssumeNodeRebootedSeconds: 180
      watchdogFilePath: /dev/watchdog
      isSoftwareRebootEnabled: true
      apiServerTimeout: 15s
      apiCheckInterval: 5s
      maxApiErrorThreshold: 3
      peerApiServerTimeout: 5s
      peerDialTimeout: 5s
      peerRequestTimeout: 5s
      peerUpdateInterval: 15m
11 Remediation Strategy
▸ ResourceDeletion
・ Deletes the Pods and PVCs on the node
▸ OutOfServiceTaint
・ Uses the KEP-2268 Non graceful node shutdown feature
・ Makes it possible to detach PVs and delete Pods even when the failed node was running Pods with PVs
・ When a node is NotReady and carries the "node.kubernetes.io/out-of-service" taint, kube-controller-manager detaches the PVs and deletes the Pods
▸ Automatic
・ Uses OutOfServiceTaint when it is available and ResourceDeletion otherwise (default setting)
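The Automatic strategy and the out-of-service condition can be expressed as two small predicates. This is a sketch of the rules as stated on the slide, not the operator's or kube-controller-manager's code; the function names, and the way cluster support is passed in as a boolean, are assumptions made for illustration.

```python
from enum import Enum

OUT_OF_SERVICE_TAINT_KEY = "node.kubernetes.io/out-of-service"


class Strategy(Enum):
    RESOURCE_DELETION = "ResourceDeletion"
    OUT_OF_SERVICE_TAINT = "OutOfServiceTaint"
    AUTOMATIC = "Automatic"


def resolve_strategy(configured, out_of_service_taint_supported):
    """Resolve Automatic to a concrete strategy; pass explicit choices through."""
    if configured is Strategy.AUTOMATIC:
        if out_of_service_taint_supported:
            return Strategy.OUT_OF_SERVICE_TAINT
        return Strategy.RESOURCE_DELETION
    return configured


def should_force_detach(ready_status, taint_keys):
    """Slide rule: node is NotReady AND carries the out-of-service taint.

    Assumption: NotReady means the Ready condition is anything but "True".
    """
    return ready_status != "True" and OUT_OF_SERVICE_TAINT_KEY in taint_keys


if __name__ == "__main__":
    print(resolve_strategy(Strategy.AUTOMATIC, True).value)
    print(resolve_strategy(Strategy.AUTOMATIC, False).value)
    print(should_force_detach("Unknown", {OUT_OF_SERVICE_TAINT_KEY}))
```

Requiring both conditions matters: the taint tells the control plane that the node is confirmed down, so it should only be applied after the node has actually been fenced or rebooted.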
12 Fence Agent Remediation Operator
▸ Powers nodes off and on using a fence agent
・ fence_ipmilan is the one most commonly used

    apiVersion: fence-agents-remediation.medik8s.io/v1alpha1
    kind: FenceAgentsRemediationTemplate
    metadata:
      name: fence-agents-remediation-template-fence-ipmilan
      namespace: openshift-workload-availability
    spec:
      template:
        spec:
          agent: fence_ipmilan
          nodeparameters:
            --ipport:
              master-0-0: '6230'
              master-0-1: '6231'
              master-0-2: '6232'
              worker-0-0: '6233'
              worker-0-1: '6234'
              worker-0-2: '6235'
          sharedparameters:
            '--action': reboot
            '--ip': 192.168.123.1
            '--lanplus': ''
            '--password': password
            '--username': admin
          retryCount: '5'
          retryInterval: '5'
          timeout: '60'
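To see how sharedparameters and nodeparameters combine, the sketch below merges them into a single fence_ipmilan argument list for one node. It only illustrates the merge; how the operator really assembles and executes the command is not shown in the slides, and the function name is hypothetical.

```python
def build_fence_command(agent, shared, node_params, node):
    """Merge shared flags with the per-node values for one node.

    shared:      {flag: value}, applied to every node
    node_params: {flag: {node_name: value}}, looked up per node
    An empty value means the flag is a bare switch (for example --lanplus).
    """
    params = dict(shared)
    for flag, per_node in node_params.items():
        if node in per_node:
            params[flag] = per_node[node]
    argv = [agent]
    for flag, value in params.items():
        argv.append(flag if value == "" else f"{flag}={value}")
    return argv


if __name__ == "__main__":
    shared = {
        "--action": "reboot",
        "--ip": "192.168.123.1",
        "--lanplus": "",
        "--password": "password",
        "--username": "admin",
    }
    node_params = {"--ipport": {"worker-0-1": "6234", "worker-0-2": "6235"}}
    print(" ".join(build_fence_command("fence_ipmilan", shared, node_params, "worker-0-2")))
```

In this example every node shares one BMC address and differs only in port, which is typical of a virtualized lab; on physical hardware the per-node value would more likely be the address itself.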
13 What actually happens (1)
▸ 03:56:00 worker-2 powered off
▸ 03:56:35 [node-healthcheck-controller-manager] Unhealthy node detected
    controllers.NodeHealthCheck Node is going to match unhealthy condition {"node": "worker-2", "condition type": "Ready", "condition status": "Unknown", "duration left": "4m59.920799888s"}
▸ 04:01:35 [node-healthcheck-controller-manager] SelfNodeRemediation created
    controllers.NodeHealthCheck.resource manager Creating a remediation CR {"CR name": "worker-2", "CR kind": "SelfNodeRemediation", "namespace": "openshift-workload-availability"}
Timeline labels: node-monitor-grace-period; default-not-ready-toleration-seconds / default-unreachable-toleration-seconds
14 What actually happens (2)
▸ 04:01:37 [node-healthcheck-controller-manager] SelfNodeRemediation updated (phase: Pre-Reboot-Completed)
▸ 04:04:37 [node-healthcheck-controller-manager] SelfNodeRemediation updated (phase: Reboot-Completed)
▸ 04:04:38 [self-node-remediation-controller] out-of-service taint added
    controllers.SelfNodeRemediation out-of-service taint added {"selfnoderemediation": "openshift-workload-availability/worker-2", "new taints": [{"key":"node.kubernetes.io/unreachable","effect":"NoSchedule","timeAdded":"2024-03-28T03:56:35Z"},{"key":"node.kubernetes.io/unreachable","effect":"NoExecute","timeAdded":"2024-03-28T03:56:40Z"},{"key":"medik8s.io/remediation","value":"self-node-remediation","effect":"NoExecute","timeAdded":"2024-03-28T04:01:36Z"},{"key":"node.kubernetes.io/unschedulable","effect":"NoSchedule","timeAdded":"2024-03-28T04:01:36Z"},{"key":"node.kubernetes.io/out-of-service","value":"nodeshutdown","effect":"NoExecute","timeAdded":"2024-03-28T04:04:38Z"}]}
▸ 04:04:47 [kube-controller-manager] force detaching
    kube-controller-manager-master-2 kube-controller-manager I0328 04:04:47.598293 1 reconciler.go:277] "attacherDetacher.DetachVolume started: node has out-of-service taint, force detaching" node="worker-2" volumeName="kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com^0001-0011-openshift-storage-0000000000000002-67bda59f-cf8f-4ad3-9ad9-df681d058b9c"
    kube-controller-manager-master-2 kube-controller-manager I0328 04:04:47.634199 1 reconciler.go:277] "attacherDetacher.DetachVolume started: node has out-of-service taint, force detaching" node="worker-2" volumeName="kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com^0001-0011-openshift-storage-0000000000000002-25426b29-4dfa-48bf-8099-e7827b7cdca1"
Timeline label: safeTimeToAssumeNodeRebootedSeconds (between Pre-Reboot-Completed and Reboot-Completed)
15 What actually happens (3)
▸ 04:04:48 [node-healthcheck-controller-manager] SelfNodeRemediation updated (phase: Fencing-Completed)
▸ 04:04:48 [self-node-remediation-controller] out-of-service taint removed
    controllers.SelfNodeRemediation out-of-service taint removed {"selfnoderemediation": "openshift-workload-availability/worker-2", "new taints": [{"key":"node.kubernetes.io/unreachable","effect":"NoSchedule","timeAdded":"2024-03-28T03:56:35Z"},{"key":"node.kubernetes.io/unreachable","effect":"NoExecute","timeAdded":"2024-03-28T03:56:40Z"},{"key":"medik8s.io/remediation","value":"self-node-remediation","effect":"NoExecute","timeAdded":"2024-03-28T04:01:36Z"},{"key":"node.kubernetes.io/unschedulable","effect":"NoSchedule","timeAdded":"2024-03-28T04:01:36Z"}]}
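The timestamps on these three slides line up with the configured timers: 300 seconds between detection and the SelfNodeRemediation CR (the unhealthyConditions duration) and 180 seconds between Pre-Reboot-Completed and Reboot-Completed (safeTimeToAssumeNodeRebootedSeconds). A small sketch that checks this from the times quoted on the slides; the event labels and helper names are paraphrased and hypothetical.

```python
from datetime import datetime

# Times as quoted on the slides (same day, 2024-03-28).
EVENTS = {
    "power_off": "03:56:00",
    "unhealthy_detected": "03:56:35",
    "snr_created": "04:01:35",
    "pre_reboot_completed": "04:01:37",
    "reboot_completed": "04:04:37",
    "out_of_service_taint_added": "04:04:38",
    "force_detach": "04:04:47",
    "fencing_completed": "04:04:48",
}


def elapsed_seconds(start, end, events=EVENTS):
    """Seconds between two named events on the timeline."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(events[end], fmt) - datetime.strptime(events[start], fmt)
    return int(delta.total_seconds())


if __name__ == "__main__":
    print(elapsed_seconds("unhealthy_detected", "snr_created"))
    print(elapsed_seconds("pre_reboot_completed", "reboot_completed"))
    print(elapsed_seconds("power_off", "fencing_completed"))
```

In this run the full sequence from power-off to Fencing-Completed takes 528 seconds, most of it spent in the two deliberate waiting periods rather than in the remediation work itself.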
16 References
▸ Workload Availability for Red Hat OpenShift
・ https://access.redhat.com/documentation/en-us/workload_availability_for_red_hat_openshift
▸ Node Health Check Operator
・ https://www.redhat.com/en/blog/node-health-check-operator
▸ Medik8s
・ https://www.medik8s.io/
▸ Kubernetes Failover Improvement: Non-Graceful Node Shutdown - Yuiko Mori, NEC Solution Innovators
・ https://www.youtube.com/watch?v=28dbE24j_zc&list=PLbzoR-pLrL6q7ytV2im0AoNcJcADAeBtm
・ https://jpn.nec.com/oss/community/blog/kubernetes-failover-improvement-non-graceful-node-shutdown.html
・ https://qiita.com/y-mo/items/ecb9207175543392ffbe
17 References
▸ KEP-2268: Non graceful node shutdown
・ https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown
・ https://github.com/kubernetes/kubernetes/pull/108486
▸ External Remediation API
・ https://github.com/kubernetes-sigs/cluster-api/pull/3190
18 Thank you
Red Hat is the world’s leading provider of enterprise open source software solutions. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500.
linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHat