2022.11.30 Kubernetes/Fargate adoption and usage case studies - Kakaku.com × CROOZ × NAVITIME - 【TECHHILLS】
https://techplay.jp/event/877265
What we learned from three years of running a load-testing tool on ECS Fargate Spot (2022/11/30)
About me: SRE PJ Manager (development, operations, management, etc.). Hobbies include golf and running.
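Company overview: Company name: NAVITIME JAPAN Co., Ltd.; Employees: 500; Headquarters: Minami-Aoyama, Minato-ku, Tokyo; Founded: 2000. (Photo: NAVITIME JAPAN's Minami-Aoyama office)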
Corporate philosophy: contribute to the world's industries through route-search engine technology.
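Business domains - B to C
- Media business: comprehensive navigation services; marketing support
- Travel business: travel services; tourism marketing support
- Drive business: services for automobiles
- Touring business: services for bicycles/motorcycles
- Services for corporations/local governments
- Bus/Walking business: bus and health services
- Carrier partnership business
- Business NAVITIME business: vehicle tracking and operations management for large vehicles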
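Business domains - B to B
- MaaS business: APIs for MaaS apps; planning and operation of content tourism
- Regional partnership business: service development support for local governments; consulting on attracting tourists
- CASE business: development support for automotive services
- Traffic data business: provision and analysis of traffic and tourism data
- Location marketing business: cloud service for store data management; store marketing support
- Solutions business: APIs, SDKs, and tools for corporations
- Public transportation business: solutions for transport operators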
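New features and services: a circuit route search that shortens the time needed to complete deliveries; a feature born from the hardship of this summer's heat; a focus on "charging," the main pain point for EV users; solving mobility challenges with "NAVITIME for Baby".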
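What I'll talk about today: the container technologies we use, our deployment methods, and the challenges we faced (ECS on Fargate, ECS on EC2, EKS on self-managed nodes); and how three years of using ECS on Fargate turned out.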
History of NAVITIME's infrastructure: from 2001, everything ran on-premises; from 2015, trial use of AWS; from 2016, migration to the cloud began.
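Cloud vendors we use: AWS accounts for the largest share. Rather than limiting ourselves to a single vendor, we make effective use of each vendor's strengths.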
Case studies: container technologies in use
ECS Fargate case study
What is ECS Fargate? Amazon ECS is the container orchestration service provided by AWS; with Fargate, there is no need to manage the nodes that run the containers.
ECS Fargate use case: we run Locust (a load-testing tool) on Fargate and use it for load tests against 140 services.
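Features of Locust: scenarios are implemented in Python (highly extensible); a master-slave configuration makes large-scale load tests possible; load-test results can be checked in real time in a browser.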
Overall architecture (diagram): an ECS Service for the Locust master and an ECS Service for the Locust slaves, connected via Service Discovery; request data comes from access logs, and the slaves send load to the system under test.
Role of the Locust master: collect statistics from the slaves and visualize them in real time on a web page for checking test results.
Role of the Locust slaves: actually send requests to the servers under test.
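As a rough illustration of the master's job of collecting statistics from the slaves, here is a simplified Python sketch. The `SlaveReport` shape and the `aggregate` function are hypothetical; Locust's real master and workers exchange their own stats messages and do not necessarily merge raw sample lists like this.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass, field


@dataclass
class SlaveReport:
    """Statistics one slave sends to the master (hypothetical, simplified shape)."""
    requests: int
    failures: int
    response_times_ms: list[float] = field(default_factory=list)


def aggregate(reports: list[SlaveReport]) -> dict[str, float]:
    """Merge per-slave statistics into the cluster-wide view shown on the master's web page."""
    total = sum(r.requests for r in reports)
    failures = sum(r.failures for r in reports)
    # Pool every sample so the median reflects all slaves, not an average of per-slave medians.
    samples = [t for r in reports for t in r.response_times_ms]
    return {
        "requests": total,
        "failures": failures,
        "failure_ratio": failures / total if total else 0.0,
        "median_ms": statistics.median(samples) if samples else 0.0,
    }


if __name__ == "__main__":
    print(aggregate([
        SlaveReport(requests=3, failures=1, response_times_ms=[100, 200, 300]),
        SlaveReport(requests=2, failures=0, response_times_ms=[150, 250]),
    ]))
```

Pooling samples before taking the median is the main design point: averaging each slave's median would skew results when slaves handle different request volumes.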
Processing flow:
① Press "Start swarming".
② Each Locust slave container fetches the list of requests to use in the load test from S3.
③ Each Locust slave container sends requests to the servers under test, and the response results are sent to the master in real time.
Because scenarios are written in Python, the tool is highly extensible. Examples: fetching test requests from S3 in advance; rewriting requests (changing /v1 to /v2); adding request headers; adjusting the interval between requests.
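The request-list idea can be sketched like this: read Apache access-log lines (for example from the gzip logs the deployment configuration points to via `ApacheLogUrl`), keep GET paths matching a pattern such as `RequestUrlPattern`, and optionally rewrite `/v1/` to `/v2/`. The function name, the `(old, new)` tuple used for rewriting, and skipping non-GET requests are assumptions for illustration; the slides do not show the tool's actual code.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Request field of an Apache access-log line, e.g. "GET /v1/20221130/route?x=1 HTTP/1.1"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[0-9.]+"')


def extract_requests(
    lines: Iterable[str],
    url_pattern: str,
    exclude_pattern: str = "",
    replace: Optional[Tuple[str, str]] = None,
) -> Iterator[str]:
    """Yield GET paths to replay, filtered by regex and optionally rewritten (hypothetical helper)."""
    include = re.compile(url_pattern)
    exclude = re.compile(exclude_pattern) if exclude_pattern else None
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None or match.group("method") != "GET":
            continue  # skip unparsable lines and non-idempotent requests
        path = match.group("path")
        if not include.search(path) or (exclude and exclude.search(path)):
            continue
        if replace:
            old, new = replace
            path = path.replace(old, new, 1)  # e.g. switch the API version under test
        yield path
```

Each slave could then pick paths from this list inside its Locust task, adding headers or wait times as needed.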
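Deploying the ECS Fargate environment: all AWS resources are created through CloudFormation stacks using the AWS CLI. Because this is an internal tool, we don't do Blue/Green deployments; stacks are only created or deleted.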
Deploying ECS Fargate, step 1: the developer who runs the load test creates a configuration file and pushes it to Git. Example:

    ECSClusterName: locust
    VpcId: vpc-xxxxxxxx
    ECSSecurityGroupId: sg-xxxxxxx
    ECSTaskExecutionRole: arn:aws:iam::333333333:role/EC2_allow_locust
    ECSSubnetId1: subnet-22222222222
    ECSSubnetId2: subnet-44444444444
    ECSSubnetId3: subnet-11111111111
    ECSImageName: hoge:1.2.2
    TargetUrl: https://hoge.jp
    TargetService: xxx-truckapp-stg
    ApacheLogUrl: s3://hoge/archive/latest/latest/*.gz
    RequestUrlPattern: /v1/[0-9]{8}/(route)
    RequestUrlExcludePattern: ''
    RequestUrlReplace: ''
    MasquaradeUseragent: false
    RequestToEveryPath: false
    SetScaleInAlarm: false
    AddNLBToLocustMaster: false
    LocustMaster:
      ECSTaskCPUUnit: 512
      ECSTaskMemory: 1024
      LocustOpts: --loglevel=ERROR --csv=output
      MasterNamespace: hoge.jp
      PrivateNamespaceId: ns-xxxxx
    LocustSlave:
      ECSTaskCPUUnit: 256
      ECSTaskMemory: 512
      ECSTaskDesiredCount: 1
      LocustOpts: --loglevel=ERROR
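A minimal sketch of how a configuration like this could be turned into the create-or-delete AWS CLI calls described above. Assumptions: the YAML has already been loaded into a dict (parsing would need a YAML library), nested sections are flattened into CloudFormation parameter names by concatenation (`LocustMasterECSTaskCPUUnit`), and the stack name and template path are placeholders; the real tool's template and naming are not shown.

```python
import json
from typing import Any, Dict, List


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested sections: LocustMaster.ECSTaskCPUUnit -> LocustMasterECSTaskCPUUnit."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        if isinstance(value, dict):
            flat.update(flatten(value, prefix + key))
        else:
            flat[prefix + key] = value
    return flat


def as_parameter_value(value: Any) -> str:
    # CloudFormation parameter values are strings; YAML booleans become "true"/"false".
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def create_stack_command(stack_name: str, template_path: str, config: Dict[str, Any]) -> List[str]:
    parameters = [
        {"ParameterKey": key, "ParameterValue": as_parameter_value(value)}
        for key, value in flatten(config).items()
    ]
    return [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--template-body", f"file://{template_path}",
        # JSON form sidesteps shorthand-syntax escaping for values like "--loglevel=ERROR --csv=output".
        "--parameters", json.dumps(parameters),
    ]


def delete_stack_command(stack_name: str) -> List[str]:
    return ["aws", "cloudformation", "delete-stack", "--stack-name", stack_name]
```

The functions only build argument lists; a Jenkins job could pass them to `subprocess.run(cmd, check=True)`. Keeping create and delete as the only operations matches the "no Blue/Green" choice for this internal tool.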
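Deploying ECS Fargate, step 2: the developer running the load test uses a Jenkins job to deploy the ECS Fargate services, with a separate environment for each target (Service A, Service B, Service C).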
Deploying ECS Fargate, step 3: once the ECS tasks have started, the developer opens the Route53 domain issued for each service and starts the load test.
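Why we chose ECS Fargate (1): operating cost. Many products use the tool, and since it is an internal tool we did not want to spend much on initial build-out and operation.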
Why we chose ECS Fargate (2): we evaluated architectures in the order Lambda > ECS Fargate > ECS on EC2; the learning cost is relatively low compared with ECS on EC2 and EKS.
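Using ECS Fargate Spot to reduce costs: up to a 70% discount; so far, not a single Spot interruption has occurred. Automatic task shutdown: tasks that are not running a load test are detected and deleted automatically.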
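Idle-task detection could look like the following sketch. How the tool actually decides that a task is idle is not described; here we assume each task's last load-test request time is known, and treat tasks idle for one hour (a hypothetical threshold) as candidates for deletion.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

IDLE_THRESHOLD = timedelta(hours=1)  # hypothetical value


def find_idle_tasks(
    last_request_at: Dict[str, Optional[datetime]],
    now: datetime,
    idle_for: timedelta = IDLE_THRESHOLD,
) -> List[str]:
    """Return task IDs with no load-test traffic for at least `idle_for`.

    `last_request_at` maps a task ID to the time of its last request (None = never ran a test).
    """
    return sorted(
        task_id
        for task_id, last in last_request_at.items()
        if last is None or now - last >= idle_for
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(find_idle_tasks({"task-a": now - timedelta(minutes=5), "task-b": None}, now))
```

The returned IDs would then be handed to whatever deletion step the tool uses, such as deleting the corresponding stack.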
ECS on EC2 case study
Examples of products running on ECS on EC2: NAVITIME, Car NAVITIME (カーナビタイム), Bicycle NAVITIME (自転車NAVITIME), Japan Travel by NAVITIME, Truck Car Navi (トラックカーナビ), Touring Supporter (ツーリングサポーター), and ALKOO by NAVITIME. ECS on EC2 is the most widely used option within the company.
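Why ECS on EC2 is our main platform: we want to keep infrastructure costs as low as possible; when Fargate was released, we already had in-house know-how for linking Auto Scaling groups with ECS; and for some APIs, Fargate might not have met NAVITIME's performance requirements.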
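Deploy flow: deployments run as Jenkins jobs. ① Build the container and push it to ECR. ② Perform a canary release using CloudFormation.
Deploy: Jenkins builds the container image and pushes it to ECR (Build & Push).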
CloudFormation creates all resources (diagram: each service's AWS environment spans Availability Zones A, C, and D and contains the ECS service with its tasks, an ALB, ECR, alerting, and log data storage).
Canary release flow within each service's AWS environment (diagram):
1. The Blue Service receives 100% of requests.
2. A Green Service is added alongside it, receiving 0%.
3. Canary: ALB weighted routing sends requests to Green, ending at Blue 0% and Green 100%.
4. Only the Green Service remains, receiving 100%.
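The weight shift can be expressed with an ALB listener's weighted forward action. This Python sketch only builds `aws elbv2 modify-listener` arguments; the actual shared scripts are bash + aws-cli, the target group ARNs are placeholders, and the intermediate steps in `CANARY_STEPS` are a hypothetical schedule, since the diagrams show only the 100/0 and 0/100 states.

```python
import json
from typing import Any, Dict, List

# Hypothetical canary schedule (percent of traffic sent to Green at each step).
CANARY_STEPS = [0, 10, 50, 100]


def forward_action(blue_tg_arn: str, green_tg_arn: str, green_percent: int) -> List[Dict[str, Any]]:
    """Listener default action splitting traffic between the Blue and Green target groups."""
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")
    return [{
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": blue_tg_arn, "Weight": 100 - green_percent},
                {"TargetGroupArn": green_tg_arn, "Weight": green_percent},
            ]
        },
    }]


def modify_listener_command(listener_arn: str, actions: List[Dict[str, Any]]) -> List[str]:
    return [
        "aws", "elbv2", "modify-listener",
        "--listener-arn", listener_arn,
        "--default-actions", json.dumps(actions),
    ]
```

A deploy job would loop over `CANARY_STEPS`, run each command, and observe errors and latency before moving to the next step.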
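Why we don't use CodeDeploy for canary releases: we needed to configure the link between the ECS Service and the Auto Scaling group. Now that the Capacity Provider feature is available, CodeDeploy might meet our requirements (not yet verified).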
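Canary release common logic: each product's deploy job uses shared scripts written by the infrastructure team (bash + aws-cli) that create the Auto Scaling group and ECS Service and change the ALB weighted routing.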
Operational issue: the fluentd containers used for log forwarding were PUTting log data to S3 at high frequency, which drove S3 charges up. Fix: raise fluentd's flush_interval to a larger value.
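A back-of-the-envelope estimate of why the flush interval matters. It assumes each fluentd container uploads one object per flush (a simplification; the S3 output plugin actually uploads per buffer chunk), so the monthly PUT count scales inversely with flush_interval. The price is left as an input rather than quoted.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 30-day month


def monthly_put_requests(flush_interval_s: int, containers: int) -> int:
    """Estimate: each container PUTs one object per flush (simplifying assumption)."""
    return containers * (SECONDS_PER_MONTH // flush_interval_s)


def monthly_put_cost(puts: int, price_per_1000: float) -> float:
    # price_per_1000: current S3 PUT request price for your region (not hard-coded here).
    return puts / 1000 * price_per_1000


if __name__ == "__main__":
    for interval in (10, 60, 600):
        print(interval, monthly_put_requests(interval, containers=100))
```

Going from a 10-second to a 600-second interval cuts the estimated request count by a factor of 60, at the cost of logs arriving in S3 later.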
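Operational issue: running a long load test against an application that was still emitting debug logs made CloudWatch Logs charges very high. Fix: check the application's logger settings before running a load test.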
Operational issue: inter-service traffic went through a public ALB and API responses were not compressed, so DataTransfer-Regional-Byte charges were high. Fix: switch to an internal ALB and compress API responses.
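To show why compressing API responses reduces the bytes being billed, here is a quick Python check with gzip. The JSON payload is made up and deliberately repetitive, as API payloads often are; real ratios depend on the data.

```python
import gzip
import json


def compressed_size(body: bytes) -> int:
    """Size after gzip, roughly what crosses the network with Content-Encoding: gzip."""
    return len(gzip.compress(body))


# Hypothetical API response with a repetitive structure.
payload = json.dumps(
    {"routes": [{"id": i, "mode": "train", "steps": ["walk", "train", "walk"]} for i in range(200)]}
).encode("utf-8")

if __name__ == "__main__":
    print(len(payload), compressed_size(payload))
```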
Operational issue: requests were routed to a target group whose HealthyHostCount was 0. Fix: the script was changed to check the target group's HealthyHostCount before changing the weights.
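The pre-check added to the script can be sketched like this. As an assumption, it reads the JSON output of `aws elbv2 describe-target-health` and counts targets whose state is `healthy`, rather than the CloudWatch `HealthyHostCount` metric named above; the function names are hypothetical, and the real script is bash.

```python
from typing import Any, Dict


def healthy_host_count(describe_target_health: Dict[str, Any]) -> int:
    """Count healthy targets in `aws elbv2 describe-target-health` JSON output."""
    return sum(
        1
        for desc in describe_target_health.get("TargetHealthDescriptions", [])
        if desc.get("TargetHealth", {}).get("State") == "healthy"
    )


def ensure_can_receive_traffic(describe_target_health: Dict[str, Any], minimum: int = 1) -> None:
    """Raise before the weight change if the target group has fewer than `minimum` healthy hosts."""
    count = healthy_host_count(describe_target_health)
    if count < minimum:
        raise RuntimeError(f"refusing to shift traffic: only {count} healthy host(s)")
```

Failing loudly before the weight change keeps the old routing in place, which is safer than sending requests to an empty target group.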
EKS on self-managed nodes case study
Services running on EKS: the full-text search API (30 APIs, 100 instances) and Jenkins (33+ instances). https://note.com/navitime_tech/n/nc663cc1e866e
Deploying to the EKS cluster: we use ArgoCD. What is ArgoCD? A tool that performs continuous delivery to Kubernetes clusters using GitOps.
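Full-text search API deployment, overall picture (diagram): ① a webhook request, ② a repository update, and ③ Sync (deploy), connected through Argo Events, Argo Workflows (Trigger), a periodic fetch, and Argo Rollouts, which manages the Blue Service and Green Service.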
What is Argo Rollouts? A Kubernetes controller that supports Blue/Green and Canary releases; test cases and automatic rollback are easy to configure.
Argo Rollouts example: after a Blue/Green deployment, roll back automatically if the share of 2xx responses falls below 95%, or if the service is not connected to the load balancer.
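Argo Rollouts: Blue/Green deployment flow
1. Alongside the container currently referenced by the service, the new container that is about to go into service starts up.
2. Before the new container goes into service, test requests are sent to it in advance.
3. Once the test requests confirm there are no problems, the routing target changes from the old container to the new one: the new container joins the load balancer and the old container leaves it.
4. Suppose that after the Blue/Green switch, the new container's success rate (the share of 2xx requests) is detected to have fallen below a set threshold.
5. Routing then reverts to the old container, and the deployment status becomes Degraded.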
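The decision logic of this flow, written out as a Python sketch for illustration. In Argo Rollouts itself this is declared in the Rollout's `blueGreen` strategy (for example with pre- and post-promotion analysis backed by an AnalysisTemplate), not coded by hand. The function names, the callables standing in for the test requests and metrics, and mapping a failed pre-promotion test to `Degraded` are assumptions; the 95% threshold comes from the example above.

```python
from typing import Callable, Dict, Tuple


def success_ratio(status_counts: Dict[int, int]) -> float:
    """Share of 2xx responses, e.g. {200: 950, 503: 50} -> 0.95."""
    total = sum(status_counts.values())
    ok = sum(n for code, n in status_counts.items() if 200 <= code < 300)
    return ok / total if total else 0.0


def blue_green(
    old: str,
    new: str,
    pre_promotion_test: Callable[[str], bool],
    post_promotion_counts: Callable[[str], Dict[int, int]],
    threshold: float = 0.95,
) -> Tuple[str, str]:
    """Return (version receiving traffic, rollout status)."""
    # Step 2: test requests go to the new container before it serves users.
    if not pre_promotion_test(new):
        return old, "Degraded"  # assumption: treated like an aborted rollout
    # Step 3: the load balancer now routes to the new container.
    # Step 4: check the success rate observed after the switch.
    if success_ratio(post_promotion_counts(new)) < threshold:
        return old, "Degraded"  # step 5: route back to the old container
    return new, "Healthy"
```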
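Challenge of running GitOps: because pushing to the Git repository triggers a deployment, a faulty program can easily reach the production environment. Manifests need to be tested before they are applied to production.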
We run two kinds of manifest tests: conftest before git push, and Gatekeeper before manifests are applied to the Kubernetes cluster.
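Example test cases: Does a production Ingress resource have a staging domain set? Does a production Ingress resource point to a staging container? Is a container with the latest tag specified? Are the HPA settings appropriate?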
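The example test cases could be expressed as follows. conftest and Gatekeeper policies are normally written in Rego; this Python version only restates the same checks for readability. The staging naming conventions (`.stg.example.com`, `-stg`) and the HPA rule (minReplicas must not exceed maxReplicas) are hypothetical stand-ins for what "appropriate" means in practice.

```python
from __future__ import annotations

from typing import Any

# Hypothetical naming conventions for staging resources.
STAGING_HOST_SUFFIX = ".stg.example.com"
STAGING_SERVICE_SUFFIX = "-stg"


def uses_floating_tag(image: str) -> bool:
    """True for ':latest' or untagged images (which default to latest); digests count as pinned."""
    if "@" in image:
        return False
    last_segment = image.rsplit("/", 1)[-1]  # ignore a registry port such as host:5000
    return ":" not in last_segment or last_segment.endswith(":latest")


def check_manifest(manifest: dict[str, Any], production: bool) -> list[str]:
    errors: list[str] = []
    kind = manifest.get("kind")
    spec = manifest.get("spec", {})

    if production and kind == "Ingress":
        for rule in spec.get("rules", []):
            host = rule.get("host", "")
            if host.endswith(STAGING_HOST_SUFFIX):
                errors.append(f"production Ingress uses staging domain {host}")
            for path in rule.get("http", {}).get("paths", []):
                service = path.get("backend", {}).get("service", {}).get("name", "")
                if service.endswith(STAGING_SERVICE_SUFFIX):
                    errors.append(f"production Ingress routes to staging service {service}")

    if kind == "Deployment":
        for container in spec.get("template", {}).get("spec", {}).get("containers", []):
            if uses_floating_tag(container.get("image", "")):
                errors.append(f"container {container.get('name')} uses a latest/untagged image")

    if kind == "HorizontalPodAutoscaler":
        min_replicas = spec.get("minReplicas", 1)
        max_replicas = spec.get("maxReplicas", 0)
        if min_replicas > max_replicas:
            errors.append("HPA minReplicas exceeds maxReplicas")

    return errors
```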
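Operational issue: EKS version upgrades take a long time. Fix: switching to deployment through ArgoCD shortened the time (previously we applied manifests with kubectl). Nearly every Kubernetes version upgrade requires manifest changes, and because we run many components besides add-ons, upgrading EKS in the staging and production environments still takes about a month.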
Operational issue: Node-pressure Eviction occurred, and all Pods running on a particular Node were forcibly terminated. Fix: set upper bounds with resources.limits, and regularly monitor the kubelet_evictions metric in Grafana.
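A small pre-deploy check in the same spirit: flag containers whose `resources.limits` lack CPU or memory entries. Requiring both keys is an assumed policy; the slide only says that limits are set.

```python
from typing import Any, Dict, List, Sequence, Tuple


def containers_missing_limits(
    pod_spec: Dict[str, Any],
    required: Sequence[str] = ("cpu", "memory"),  # assumed policy
) -> List[Tuple[str, List[str]]]:
    """List (container name, missing limit keys) for containers lacking resources.limits entries."""
    missing = []
    for container in pod_spec.get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        absent = [key for key in required if key not in limits]
        if absent:
            missing.append((container.get("name", "?"), absent))
    return missing
```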
Ongoing operational issue: a shortage of Kubernetes engineers.
So how did three years of running ECS Fargate turn out?
Conclusion: the low cost of operation and learning is great!
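Compared with ECS on EC2: no need to create Auto Scaling groups; no periodic updates of the ECS-optimized AMI; simpler scaling configuration (no node scaling to configure); a somewhat lower learning cost; but a higher running cost.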
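Compared with EKS on self-managed nodes: EKS cluster version upgrades take time; the learning cost of EKS (Kubernetes) is high (quite high…); on the other hand, EKS (Kubernetes) is highly extensible and can serve a wide range of use cases.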
Thank you for your attention.