
Challenges in a multi-tenant environment that runs customers' application code, and the road to EKS

shogomuranushi

March 20, 2020

Transcript

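  1. Data acquisition → data storage → data verification → training-data creation → model design → training → evaluation → deployment → inference → retraining

    • Preparing and managing a data warehouse
    • Checking data validity (accuracy)
    • Designing models from scratch
    • Preparing GPU environments and advanced distributed processing
    • Version management of data, models, and results
    • Accounting for the statistical fact that accuracy starts to degrade the moment a model is deployed to production
    • APIs and load-balancing mechanisms needed to acquire large volumes of data, plus their setup and security
    • Tools and people needed to create training data
    • Handover from the development environment to production
    • Ensuring redundancy and GPU resources, and building the integration process with the edge side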
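  2. Data acquisition → data storage → data verification → training-data creation → model design → training → evaluation → deployment → inference → retraining

    • Preparing and managing a data warehouse
    • Checking data validity (accuracy)
    • Designing models from scratch
    • Preparing GPU environments and advanced distributed processing
    • Version management of data, models, and results
    • Accounting for the statistical fact that accuracy starts to degrade the moment a model is deployed to production
    • APIs and load-balancing mechanisms needed to acquire large volumes of data, plus their setup and security
    • Tools and people needed to create training data
    • Handover from the development environment to production
    • Ensuring redundancy and GPU resources, and building the integration process with the edge side

    Many challenges stand in the way before AI can actually be put to use.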
  3. Ref: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf

    “As the machine learning (ML) community continues to accumulate years of experience with live systems ...” “Developing and deploying ML systems is relatively fast and cheap, but maintaining them over time is difficult and expensive.”
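  4. First-generation architecture of Datalake

    • A service for accumulating raw data
    • Storage is S3
    • The initial version was offered as a REST API built on API Gateway and Lambda
    • At first it supported only saving, fetching, and listing files
    • Built with the Serverless Framework
    • Clients were issued a Signed URL and uploaded through it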
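  5. First-generation architecture of Datalake

    Good
    • Maintenance-free

    Bad
    • API Gateway is painful
    • Hard to reproduce locally, so development efficiency is poor
    • Payload size limits
    • The Serverless Framework is fairly painful too
    • Do we really want to build other services in the same style from here on?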
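  6. Second-generation architecture of Datalake

    • AWS API Gateway was a struggle, so we built our own API gateway in-house
    • Development efficiency was poor, and because this is a critical endpoint we wanted control during incidents
    • With the API gateway split out, authentication and routing became shared
    • Goodbye to having lots of API Gateways
    • Datalake is now just Lambda and S3
    • We stayed on this setup for the time being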
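  7. Third-generation architecture of Datalake

    • "We want to search and count"
      • Built a DB for metadata search and count features
      • Search is implemented with PostgreSQL (Aurora) JSON + GIN index
      • We made metadata tagging and search too flexible, and the index exploded
      • Cassandra...? Or we reduce the flexibility; under consideration
    • S3 Event + SQS + Lambda call the Datalake API
    • The backend could not withstand the load and CW Logs became expensive, so the backend moved from Lambda to ECS
    • "S3 Signed URLs are a hassle; we want to upload to Datalake in a single call"
      • Changed the spec so that the API Gateway handles the Put to S3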
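  8. Third-generation architecture of Datalake

    Good
    • Signed URLs are no longer needed, which improved UX
    • Metadata search is now possible

    Bad
    • It is no longer serverless, so maintenance cost went up
    • The metadata feature is too free-form, and the bloated index is painful

    Moved to ECS because of backend load and rising CWLogs cost.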
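  9. Generation-zero architecture of Serving

    Good
    • Simple
    • Related resources can be created and deleted in one batch

    Bad
    • One ELB per Service is wasteful
    • Creation through CFn takes several minutes, which is slow
    • CFn is asynchronous, so errors are hard to detect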
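  10. First-generation architecture of Serving

    • Changed to 1 load balancer = multiple Services
    • ALB routing rules have a hard limit of 100
    • Customer APIs keep piling on, so it was obvious this would break down quickly
    • Once again we built our own Gateway (ECS + Lambda + DynamoDB) orz
    • A customer application errors out constantly; its container restarts over and over and eats up the EBS burst credits
      • An ECS Service is designed to always keep tasks running and has no circuit breaker
      • A circuit breaker for system-caused errors was added later, but there is still none for application-caused errors
      • So the container keeps restarting and eats up the I/O (an overwhelmingly noisy neighbor)
      • We run our own health check (Lambda) and, after a set number of failures, record in our Gateway's DynamoDB that the service must not be routed to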
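The health-check rule in the last bullet can be sketched in a few lines. This is only an illustration under assumptions: the slide does not say whether failures must be consecutive (assumed here), the threshold of 3 is hypothetical, and an in-memory `RoutingTable` stands in for the DynamoDB table used by the real Gateway.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class RoutingTable:
    """In-memory stand-in for the Gateway's routing store (DynamoDB on the slide)."""
    max_failures: int = 3  # hypothetical threshold; the slide only says "a set number"
    failures: Dict[str, int] = field(default_factory=dict)
    routable: Dict[str, bool] = field(default_factory=dict)

    def record(self, service: str, healthy: bool) -> None:
        # A success resets the counter; a failure increments it.
        count = 0 if healthy else self.failures.get(service, 0) + 1
        self.failures[service] = count
        # Stop routing once the consecutive-failure threshold is reached.
        self.routable[service] = count < self.max_failures


def run_health_checks(table: RoutingTable, probes: Dict[str, Callable[[], bool]]) -> None:
    """Run one round of checks; a probe that raises counts as a failure."""
    for service, probe in probes.items():
        try:
            healthy = bool(probe())
        except Exception:
            healthy = False
        table.record(service, healthy)
```

Because a success resets the counter, a service that recovers becomes routable again on its next passing check; whether the real Gateway behaves this way is not stated on the slide.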
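  11. First-generation architecture of Serving

    • Implemented a Blue/Green feature
      • Lets you easily switch the model that an API endpoint points to
    • Problem: ECS tasks could reach the host's IAM role
      • Cut off access to the instance metadata with iptables
      • These days, with the awsvpc network mode, it is as simple as the following
      • ECS_AWSVPC_BLOCK_IMDS: true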
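The Blue/Green switch described on this slide, where an API endpoint points at one of two model versions and can be flipped, might look like the following minimal sketch. The `Endpoint` class and the version strings are hypothetical; the slide does not show how the real feature stores or switches targets.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    """One API endpoint with two deployment slots (illustrative only)."""
    blue: str            # model version behind the "blue" slot
    green: str           # model version behind the "green" slot
    active: str = "blue"

    def target(self) -> str:
        """Return the model version that currently receives traffic."""
        return self.blue if self.active == "blue" else self.green

    def switch(self) -> str:
        """Flip traffic to the other slot and return the new target."""
        self.active = "green" if self.active == "blue" else "blue"
        return self.target()
```

Keeping both slots registered means a rollback is just a second call to `switch()`.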
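  12. First-generation architecture of Serving

    Good
    • No longer 1 inference API = 1 ELB, so costs went down
    • Blue/Green became possible
    • The EBS burst problem was solved
    • EC2 metadata can no longer be accessed

    Bad
    • Home-grown implementations keep multiplying and the maintenance cost is... ugh. That said, problems that AWS features do not cover can only be solved by building it ourselves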
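  13. Second-generation architecture of Serving

    • ECS and EC2 do not get along
      • When AutoScaling kicks in, it terminates EC2 instances without regard for the containers. Please drain first (perhaps this is fixed now?)
      • AutoScaling triggers are basically based on reserved CPU, so scaling by container size takes a lot of work and the calculation logic is tedious
      • Containers are not consolidated even when there are free resources
      • Blue/Green deployment when replacing instances had to be implemented ourselves
    • Hosting many customer inference APIs makes the cost balloon
      • Looked into whether spot instances could be used well
      • You need to spread across instance types and zones, and switch to on-demand when spot instances sell out
      • After evaluation, we decided to adopt a service called Spotinst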
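  14. Second-generation architecture of Serving

    Good
    • Using Spotinst covers what ECS lacks
    • Blue/Green deployment is enforced
    • Consolidation efficiency improved, cutting costs by 60-70%

    Bad
    • Applying spot instances to the wrong workloads causes unintended stops (Jupyter Notebook, for example)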
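  15. Third-generation architecture of Serving

    • A power user showed up running 300 containers in one Service
    • DynamoDB partitioning became skewed, causing heavy throttling
    • Momentum for Kubernetes was building. By using Kubernetes:
      • API Gateway-equivalent functionality can be replaced by Ambassador (Envoy)
      • However, with a large number of Services/Pods, updating Envoy's routing rules can take tens of seconds
      • Service discovery and health checks can be left to standard features
    • The Serving environment is also moving to Kubernetes, with the aim of reducing the home-grown parts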
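  16. Third-generation architecture of Serving

    Good
    • Health checks and service discovery moved from home-grown code to standard Kubernetes features
    • Gateway functionality moved to a Kubernetes extension (Ambassador)

    Bad
    • You now need to understand Ambassador (Envoy)
    • It is better thought out than our own implementation in many respects and lowers maintenance cost, but it is less customizable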
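  17. Third-generation architecture of Serving

    • Current challenges
    • We do not know how to measure outbound transfer volume per customer Pod
      • Apparently GKE has this. Time for Istio...?
    • To visualize cost per customer, we have to hook Kubernetes events all over the place
      • GKE seems able to do this with a feature called usage metering, so we would love EKS to offer it too. We think it is also needed for multi-tenancy across business units
    • The difficulty of building our own system on top of Kubernetes' eventually consistent behavior
      • For example:
      • A Pod's STATUS becomes Running. You expect it to be reachable, but Ready (readinessProbe) is 0/1
      • Ready becomes 1/1. You expect it to be reachable, but some Ambassador (Envoy) Pods can reach it while others are still waiting for an update and cannot. Because this is eventual consistency in a distributed system, it is hard to tell users "when" the service became usable
      • Each feature guarantees only eventual consistency, so we have to fill the gaps wherever we want them to act in step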
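One way to fill the gap described above is to poll every relevant signal and only report "usable" after all of them pass several rounds in a row. The sketch below is an assumption, not the author's implementation: `wait_until_usable` and its parameters are hypothetical, and the checks are plain callables that a real system would back with a readiness query and requests through the gateway.

```python
import time
from typing import Callable, Iterable


def wait_until_usable(
    checks: Iterable[Callable[[], bool]],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    required_successes: int = 3,
) -> bool:
    """Poll until every check passes `required_successes` rounds in a row."""
    checks = list(checks)
    deadline = time.monotonic() + timeout_s
    streak = 0
    while time.monotonic() < deadline:
        if all(check() for check in checks):
            streak += 1
            if streak >= required_successes:
                return True
        else:
            streak = 0  # e.g. one stale gateway replica resets the streak
        time.sleep(interval_s)
    return False
```

Requiring a streak rather than a single success is a design choice aimed at the case on the slide where some Envoy Pods already route to the new backend and others do not; it lowers, but does not eliminate, the chance of reporting "ready" too early.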
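  18. Third-generation architecture of Serving

    • Current challenges
    • IP address exhaustion
      • Per instance, Serving uses 20 IPs and Training uses 30. With 20 IPs x 400 Services, 8,000 IPs are in use. Unbelievably, the /16 subnet ran dry
    • Hard to be declarative
      • We deploy through an SDK, so we are not in a declarative state where "re-applying the YAML reproduces everything". Sad
    • Hard to monitor
      • It is hard to separate user-caused problems from platform problems
      • Some Pods are dead for customer-side reasons; not everything is running normally
      • We cannot monitor 5xx/4xx at our own Gateway. Backends depend on the customer, so 5xx responses are entirely possible
    • Classic microservices issues
      • LBs and proxies are stacked in multiple tiers, so investigation is hard. It is also hard to decide how much logging and metrics to expose to customers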
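  19. First-generation architecture of Training

    • The platform that runs training jobs
    • Built on Kubernetes from the initial version
      • Implementing Jobs, Pods (a container for saving artifacts), and so on ourselves on ECS would have been inefficient
      • EKS did not exist yet, so we ran Kubernetes on EC2
    • Customer containers and management containers run in separate Pods
      • The container that automatically saves the trained model to S3 and the container that collects logs should not live alongside user code from an IAM permissions standpoint, so they run as agent-style separate Pods
    • Used kube2iam to attach an IAM role per Pod (is there an official solution now?)
    • privileged is off, no matter what
    • GPU driver issues are a struggle, but we push through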
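  20. Second-generation architecture of Training

    Good
    • Provides Jupyter Notebook and TensorBoard
    • Provides shared storage

    Bad
    • Operating Kubernetes on EC2 is quite painful
    • Running cost is wasted while Jupyter is idle
    • Shared storage is expensive, and slow because it is NFS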
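  21. Third-generation architecture of Training

    • Current challenges
    • Revenue and cost of goods in the annual plan
      • They need to be refined, but everything depends on the customers, so forecasting cost is honestly beyond us. How does AWS manage this, we wonder
    • Bugs
      • At one point a p3.16xlarge failed to stop and ran up a bill in the tens of thousands of yen (the exact figure is unreadable in this transcript)
    • OS
      • It must be an OS that nvidia-driver supports, which means Ubuntu or Amazon Linux 2. Hoping for nvidia-driver support in "Bottlerocket"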
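  22. Third-generation architecture of Training

    • Current challenges
    • Dependencies between containers
      • To collect logs without gaps, we use Affinity to start Pods in the priority order log collector pod -> platform agent pod -> training job. This delays the "insufficient resources" Event, which delays the notification to the Autoscaler, which in turn delays instance startup
    • Compatibility and support for Docker images and packages
      • Maintaining compatibility of the Docker images we provide is hard
      • Support targets
      • DL library type x version x Python version x CUDA version ...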
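The last bullet describes a combinatorial support matrix. A small sketch makes the growth concrete. All library names and version numbers below are hypothetical placeholders; the slide names the axes but not the values.

```python
from itertools import product

# Hypothetical support targets; the slide names the axes but not the values.
libraries = {"tensorflow": ["1.15", "2.1"], "pytorch": ["1.3", "1.4"]}
python_versions = ["3.6", "3.7"]
cuda_versions = ["10.0", "10.1"]


def support_matrix(libraries, python_versions, cuda_versions):
    """Enumerate every (library, version, python, cuda) image combination."""
    lib_versions = [(name, v) for name, versions in libraries.items() for v in versions]
    return [
        (name, version, py, cuda)
        for (name, version), py, cuda in product(lib_versions, python_versions, cuda_versions)
    ]


matrix = support_matrix(libraries, python_versions, cuda_versions)
print(len(matrix))  # 4 library versions x 2 Pythons x 2 CUDAs = 16
```

Even this tiny example yields 16 images to build and test, and adding one more value on any axis multiplies the total. Not every combination is valid in practice, so a real matrix would also need an exclusion list.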
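  23. First-generation architecture of Trigger

    • The platform that runs inference jobs
    • Triggered when data is put into Datalake
    • S3 -> SNS -> SQS, so events are queued first
    • A Subscriber pulls from the SQS queue and submits tasks to AWS Batch
    • When a job finishes, the instance stops automatically
    • In practice, hundreds of jobs are submitted per minute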
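For the S3 -> SNS -> SQS path, the Subscriber has to unwrap two layers before it knows which object arrived. The sketch below shows that unwrapping with the standard library only. It assumes SNS raw message delivery is disabled (so the S3 event sits as a JSON string in the `Message` field); the function name is hypothetical and the AWS Batch submission step is left out.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def extract_s3_objects(sqs_body: str) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an SQS message body that carries
    an SNS notification wrapping an S3 event."""
    envelope = json.loads(sqs_body)
    # Without raw message delivery, SNS puts the original payload in "Message" as a string.
    s3_event = json.loads(envelope["Message"])
    objects = []
    for record in s3_event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects
```

Each returned pair would then become the input of one submitted job; how the original Subscriber maps events to AWS Batch jobs is not described on the slide.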
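  24. First-generation architecture of Trigger

    Good
    • SQS absorbs sudden load spikes
    • Subscribers scale with the number of queued messages
    • With AWS Batch you just submit and forget

    Bad
    • Slow startup, no metrics, and logs are aggregated so they cannot be viewed per Job
    • It did not fit the use cases AWS Batch is designed for
    • Network transfer volume between AZs is alarming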
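  25. First-generation architecture of Logging for Customer

    Good
    • Serverless components (CWLogs / Kinesis / Lambda / DynamoDB), so no operational management needed

    Bad
    • When log volume spikes, throttling occurs in Lambda or DynamoDB
    • CWLogs is fairly expensive considering we barely use it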
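  26. Second-generation architecture of Logging for Customer

    • Changed the log backend to Datadog Logs
      • Serving and Training logs are stored in Datadog Logs
      • Logs are fetched through the Datadog Logs API and provided to customers
    • We considered Elasticsearch, but:
      • At the time of evaluation, log volume was 500 million records/month
      • Given business growth and plans to log more sources, it would keep doubling every 3 months
      • We did not want to operate an Elasticsearch cluster that keeps doubling, and since logging is not our core competence we did not want to spend operational cost on it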
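The growth assumption on this slide (500 million records per month, doubling every 3 months) compounds quickly, as a short calculation shows. The function name is hypothetical, and the model simply takes the slide's two figures at face value.

```python
def projected_records(initial_per_month: int, months: int, doubling_period_months: int = 3) -> int:
    """Monthly record volume after `months`, doubling every `doubling_period_months`."""
    return int(initial_per_month * 2 ** (months / doubling_period_months))


for months in (0, 3, 6, 12):
    print(months, projected_records(500_000_000, months))
```

Under these assumptions the volume after one year is 16 times the starting point, i.e. 8 billion records per month, which illustrates why capacity planning for a self-operated cluster looked unattractive.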
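  27. Second-generation architecture of Logging for Customer

    Good
    • Fully managed, so no operations work

    Bad
    • It costs money. That said, it is cheaper than CWLogs and better than operating it ourselves
    • We are constrained by Datadog's specifications
      • Eventual consistency, no ordering guarantee
      • datadog-agent behavior and specifications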
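  28. First-generation architecture of Logging for System & Application

    • The collection platform for system logs and application logs used internally
    • Logging is not our core competence, so we want to avoid operating it ourselves as far as possible
    • We mainly used ECS and Lambda, so logs were already stored in CloudWatch Logs
    • For the time being we simply used CW Logs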
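  29. First-generation architecture of Logging for System & Application

    Good
    • No need to think about infrastructure

    Bad
    • You cannot tell where a given log is stored
    • Searchability is nonexistent. Your eyeball-grep skills improve
    • Investigating logs across microservices is nothing but pain
    • As a result, nobody looks at logs except when something goes wrong
    • CWLogs is surprisingly expensive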
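  30. Second-generation architecture of Logging for System & Application

    Good
    • No need to think about infrastructure
    • Everything can be searched in one place
    • Tags and Attributes can be indexed and made searchable
    • You can see at a glance how many entries each field has
    • Since we run microservices, we embed things like a Tracing ID to make requests easier to follow
    • Usage has broadened to API trend analysis and impact-range investigation

    Bad
    • It costs money. That said, it is cheaper than CWLogs and better than operating it ourselves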
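Embedding a tracing ID as a log attribute, as mentioned above, can be sketched with the standard `logging` module. This is an illustration only: the field names `service` and `trace_id` and the helper `new_trace_id` are hypothetical, and nothing here is specific to Datadog.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so attributes can be indexed."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            # Attributes passed via `extra=` appear as fields on the record.
            "service": getattr(record, "service", None),
            "trace_id": getattr(record, "trace_id", None),
        })


def new_trace_id() -> str:
    """Generate an ID to pass along with a request between services."""
    return uuid.uuid4().hex
```

A service would attach the formatter to its handler and log with `logger.info("request received", extra={"service": "gateway", "trace_id": trace_id})`, forwarding the same `trace_id` to downstream services so that one search on that attribute returns the whole request path.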