Deploy the Cluster Autoscaler with Terraform and Helmfile, Monitor with Prometheus

Kubernetes Meetup Tokyo #25
Hidetake Iwata at NTT DATA (@int128)

Who are you?
Software Engineer at NTT DATA, working on DevOps and Cloud Native Technology R&D. Author of kubectl plugins (kubelogin, kauthproxy).

What I will talk about today
• Deploying the Cluster Autoscaler with Terraform and Helmfile (CI/CD)
• Monitoring the Cluster Autoscaler with Prometheus and Grafana (Observability)
What I will not talk about
• The nitty-gritty details of the Cluster Autoscaler's behavior

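What is the Cluster Autoscaler?
A tool that automatically increases or decreases the number of worker nodes according to the resources the cluster needs (CPU requests, memory requests).
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
It consists of two parts:
• Core: calculates the resources the cluster needs. Example: a new Pod cannot start because of insufficient memory, so it decides that a node must be added.
• Cloud Provider: performs the cloud-specific scaling. Example: on AWS, it increases the Desired Capacity of the Auto Scaling Group.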

Deploying the Cluster Autoscaler
The official Helm chart, stable/cluster-autoscaler, makes it easy to deploy the Cluster Autoscaler. (On GCP and Azure, it can be configured through the managed Kubernetes service instead.)
https://github.com/helm/charts/tree/master/stable/cluster-autoscaler
The resulting Helm release contains a Deployment, a ClusterRole, a ServiceAccount, and so on.
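For AWS, a minimal values file for stable/cluster-autoscaler might look like the following sketch. It assumes the value names of the 2019-era stable chart (`cloudProvider`, `awsRegion`, `autoscalingGroups`, `rbac.create`). The ASG name is a placeholder, the region matches the `ap-northeast-1` node names in the logs later in this deck, and `maxSize: 8` simply mirrors the `(max: 8)` in the scale-up log.

```yaml
# values.yaml for stable/cluster-autoscaler (sketch; value names assumed from the stable chart)
cloudProvider: aws
awsRegion: ap-northeast-1
# Static list of Auto Scaling Groups the Cluster Autoscaler is allowed to resize.
autoscalingGroups:
  - name: my-worker-asg   # hypothetical ASG name, replace with the real one
    minSize: 1
    maxSize: 8
rbac:
  create: true            # let the chart create the ClusterRole and bindings
```

Instead of listing groups statically, the chart also supports tag-based discovery through `autoDiscovery.clusterName`, which requires the ASGs to carry the `k8s.io/cluster-autoscaler/enabled` and `k8s.io/cluster-autoscaler/<cluster-name>` tags.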

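Deploying the Cluster Autoscaler on AWS
On AWS, an IAM Role must be assigned to the Cluster Autoscaler so that it can operate the Auto Scaling Group.
• stable/cluster-autoscaler (Deployment): the Cluster Autoscaler calls the AWS API to resize the Auto Scaling Group, using IAM Role (Cluster Autoscaler).
• stable/kube2iam (DaemonSet): kube2iam runs on each worker node (which has IAM Role (Worker)) and obtains temporary credentials for IAM Role (Cluster Autoscaler) on the Pod's behalf.
https://github.com/jtblin/kube2iam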
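With kube2iam, a Pod chooses its IAM Role through the `iam.amazonaws.com/role` annotation. A sketch of the matching stable/cluster-autoscaler values, assuming the chart's `podAnnotations` value and a hypothetical role named `cluster-autoscaler`:

```yaml
# values for stable/cluster-autoscaler (sketch)
podAnnotations:
  # kube2iam reads this annotation and serves temporary credentials for the role.
  # "cluster-autoscaler" is a hypothetical role name; a full role ARN also works.
  iam.amazonaws.com/role: cluster-autoscaler
```

For this to work, IAM Role (CA) has to trust IAM Role (Worker) so that kube2iam can assume it, and it needs Auto Scaling permissions such as `autoscaling:SetDesiredCapacity` and `autoscaling:TerminateInstanceInAutoScalingGroup`. In this setup, both would be defined in the Terraform code.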

What is Helmfile?
A tool for declaratively managing the Helm releases deployed to a cluster.
https://github.com/roboll/helmfile
• Declare a set of Helm releases in YAML.
• Values can be written inline or in external files.
• Templates can reference environment variables.

    # helmfile.yaml
    releases:
      - name: cluster-autoscaler
        namespace: kube-system
        chart: stable/cluster-autoscaler
        values:
          - cloudProvider: aws
            awsRegion: {{ env "AWS_REGION" }}
      - name: kube2iam
        namespace: kube-system

To deploy everything:

    $ helmfile sync

To show the diff between the YAML and the cluster:

    $ helmfile diff
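Putting both releases together, a fuller helmfile.yaml might look like this sketch. The external values file path is hypothetical, the kube2iam values (`host.iptables`, `host.interface`, `rbac.create`) are assumed from the stable/kube2iam chart, and `eni+` assumes the Amazon VPC CNI plugin. `requiredEnv` is a Helmfile template function that fails when the variable is unset, whereas `env` silently returns an empty string.

```yaml
# helmfile.yaml (sketch; paths and some values are assumptions)
releases:
  - name: cluster-autoscaler
    namespace: kube-system
    chart: stable/cluster-autoscaler
    values:
      # External values file (hypothetical path), e.g. podAnnotations and autoscalingGroups
      - values/cluster-autoscaler.yaml
      # Inline values can still use templates; fail fast if AWS_REGION is missing
      - cloudProvider: aws
        awsRegion: {{ requiredEnv "AWS_REGION" }}
  - name: kube2iam
    namespace: kube-system
    chart: stable/kube2iam
    values:
      - host:
          iptables: true     # intercept calls to the EC2 metadata API
          interface: eni+    # assumes the Amazon VPC CNI plugin
        rbac:
          create: true
```

Note that `helmfile diff` relies on the helm-diff plugin being installed.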

Deploying with Helmfile and Terraform
Helm releases are deployed with Helmfile, and AWS resources with Terraform. (Helm releases can also be managed with Terraform, but I recommend Helmfile.*)
• Helmfile (helmfile.yaml): stable/cluster-autoscaler, stable/kube2iam
• Terraform (*.tf): Auto Scaling Group, IAM Role (Worker), IAM Role (CA)
* Personal opinion.

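Deployment pipeline for the Cluster Autoscaler on AWS
The Git repository holds both helmfile.yaml and *.tf. From there, CI and Ops apply them with Helmfile (stable/cluster-autoscaler, stable/kube2iam) and Terraform (Auto Scaling Group, IAM Role (Worker), IAM Role (CA)).
GitOps with Helmfile also seems to be possible (not yet verified).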

Verifying the Cluster Autoscaler (1/2)
Deploying a Pod with a large CPU request causes a node to be added.

    I0927 11:50:35.158353 1 scale_up.go:263] Pod echoserver/echoserver-74fd7d865f-vkzqb is unschedulable
    I0927 11:50:35.158391 1 scale_up.go:300] Upcoming 0 nodes
    I0927 11:50:35.158521 1 scale_up.go:423] Best option to resize: ASG_NAME
    I0927 11:50:35.158540 1 scale_up.go:427] Estimated 1 nodes needed in ASG_NAME
    I0927 11:50:35.158556 1 scale_up.go:529] Final scale-up plan: [{ASG_NAME 4->5 (max: 8)}]
    I0927 11:50:35.158572 1 scale_up.go:694] Scale-up: setting group ASG_NAME size to 5
    I0927 11:52:36.144782 1 clusterstate.go:194] Scale up in group ASG_NAME finished successfully in 2m0.794268739s
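A Deployment like the following sketch could reproduce this test. The namespace and name match the Pod in the log, but the image and the CPU request are assumptions. The request has to exceed the free CPU on every existing node while still fitting on an empty node of the group; otherwise the Cluster Autoscaler finds no node group that would help and does not scale up.

```yaml
# Hypothetical test workload that stays Pending until a new node is added
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echoserver
  namespace: echoserver       # the namespace must already exist
spec:
  replicas: 1
  selector:
    matchLabels:
      app: echoserver
  template:
    metadata:
      labels:
        app: echoserver
    spec:
      containers:
        - name: echoserver
          image: k8s.gcr.io/echoserver:1.10   # assumed image
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 1500m   # assumed; larger than any node's free CPU, smaller than a node's allocatable CPU
```

Deleting the Deployment afterwards leaves the new node nearly empty, which sets up the scale-down check.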

Verifying the Cluster Autoscaler (2/2)
By default, a node is removed 10 minutes after the Cluster Autoscaler decides that it is unneeded.

    I0927 11:57:07.790306 1 scale_down.go:407] Node ip-172-19-67-52.ap-northeast-1.compute.internal - utilization 0.055000
    I0927 11:57:07.790634 1 static_autoscaler.go:359] ip-172-19-67-52.ap-northeast-1.compute.internal is unneeded since 2019-09-27 11:57:07.773690521 +0000 UTC m=+2997.491422805 duration 0s
    I0927 12:07:12.161679 1 static_autoscaler.go:359] ip-172-19-67-52.ap-northeast-1.compute.internal is unneeded since 2019-09-27 11:57:07.773690521 +0000 UTC m=+2997.491422805 duration 10m4.367847963s
    I0927 12:07:12.391908 1 auto_scaling_groups.go:269] Terminating EC2 instance: i-066bc60549f083e38
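The 10-minute delay corresponds to the Cluster Autoscaler's `--scale-down-unneeded-time` flag. A values sketch that sets it explicitly, assuming the stable chart's `extraArgs` map (each entry is rendered as a `--key=value` flag):

```yaml
# values for stable/cluster-autoscaler (sketch)
extraArgs:
  # How long a node must stay unneeded before it is removed (default: 10m)
  scale-down-unneeded-time: 10m
  # A node becomes a scale-down candidate when its Pods' CPU and memory requests
  # are below this fraction of the node's allocatable resources (default: 0.5)
  scale-down-utilization-threshold: 0.5
```

In the log above, the node's utilization of 0.055 is far below the default threshold of 0.5, which is why it was marked unneeded and terminated after about 10 minutes.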

Monitoring the Cluster Autoscaler
The Cluster Autoscaler can be monitored in the following ways:
• Collect its metrics with Prometheus (covered in these slides)
• Read the Pod logs
• Read the status stored in a ConfigMap
• Subscribe to Events
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/metrics.md
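Besides dashboards, the metrics described in metrics.md can drive alerts. The following is a minimal PrometheusRule sketch for the Prometheus Operator. The alert names, thresholds, and `for` durations are arbitrary choices, and the `release: prometheus-operator` label assumes that the Prometheus rule selector matches it, in the same way as the ServiceMonitor selector used later in this deck.

```yaml
# Hypothetical alerting rules on Cluster Autoscaler metrics
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-autoscaler
  namespace: monitoring
  labels:
    release: prometheus-operator   # assumed to match the Prometheus ruleSelector
spec:
  groups:
    - name: cluster-autoscaler
      rules:
        # The autoscaler has judged the cluster unsafe to autoscale (e.g. too many unready nodes)
        - alert: ClusterAutoscalerUnsafeToAutoscale
          expr: cluster_autoscaler_cluster_safe_to_autoscale == 0
          for: 15m
        # Pods have stayed unschedulable for a long time despite autoscaling
        - alert: ClusterAutoscalerUnschedulablePods
          expr: cluster_autoscaler_unschedulable_pods_count > 0
          for: 30m
```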

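What is a Prometheus ServiceMonitor?
With the ServiceMonitor resource of the Prometheus Operator, you can associate a Service to be monitored with Prometheus. The ServiceMonitor needs to be placed in the same namespace as Prometheus.
In the diagram, Prometheus discovers the Service, and the exporter Pod behind it, through the ServiceMonitor, and Grafana visualizes the data stored in Prometheus.
https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/running-exporters.md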
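To show what a complete ServiceMonitor contains, here is a hand-written sketch. It lives in the Prometheus namespace (`monitoring`) and uses `namespaceSelector` to reach a Service in `kube-system`. The Service label and port name are hypothetical and not taken from the chart's actual output; the Cluster Autoscaler serves metrics on port 8085 by default.

```yaml
# Hand-written ServiceMonitor sketch (labels and port name are assumptions)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cluster-autoscaler
  namespace: monitoring
  labels:
    release: prometheus-operator   # must match the Prometheus serviceMonitorSelector
spec:
  namespaceSelector:
    matchNames:
      - kube-system                # namespace of the monitored Service
  selector:
    matchLabels:
      app: aws-cluster-autoscaler  # hypothetical label on the Service
  endpoints:
    - port: http                   # hypothetical name of the Service port (8085)
      path: /metrics
      interval: 10s
```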

The Cluster Autoscaler's ServiceMonitor
The Cluster Autoscaler Helm chart includes a ServiceMonitor.

    # helmfile.yaml
    releases:
      - name: cluster-autoscaler
        namespace: kube-system
        chart: stable/cluster-autoscaler
        values:
          - serviceMonitor:
              enabled: true
              namespace: monitoring
              selector:
                release: prometheus-operator

The ServiceMonitor is registered with the Prometheus instance that selects this label.

    # The manifest actually generated
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        prometheus: kube-prometheus
        release: prometheus-operator
      name: cluster-autoscaler-aws-cluster-autoscaler
      namespace: monitoring

Grafana dashboard for the Cluster Autoscaler
https://grafana.com/grafana/dashboards/3831
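One way to provision this dashboard declaratively is through the Grafana Helm chart, which can download dashboards from grafana.com by ID. The sketch below assumes Grafana is deployed with the stable/grafana chart (or as the `grafana:` section of the stable/prometheus-operator values); the dashboard revision and the data source name `Prometheus` are assumptions.

```yaml
# values for the stable/grafana chart (sketch)
dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
      - name: default
        orgId: 1
        folder: ""
        type: file
        disableDeletion: false
        editable: true
        options:
          path: /var/lib/grafana/dashboards/default
dashboards:
  default:                     # files are written under /var/lib/grafana/dashboards/default
    cluster-autoscaler:
      gnetId: 3831             # the dashboard ID from grafana.com
      revision: 1              # assumed revision; check grafana.com for the latest
      datasource: Prometheus   # assumed name of the Prometheus data source
```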

"Deploy X with Terraform and Helmfile, monitor it with Prometheus": maybe this could become a series??

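Summary
With the Cluster Autoscaler, the number of nodes is automatically increased or decreased according to the resources the cluster needs. This talk covered deploying the Cluster Autoscaler with Terraform and Helmfile, and monitoring it with Prometheus and Grafana.
* Company names, product names, and service names mentioned herein are registered trademarks or trademarks of their respective companies.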
