Scaling Your AWS Deployment Pipeline-Case study of CloudOne-SRE

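Are you hitting bottlenecks when deploying and managing multiple services on AWS? Or do you feel overwhelmed when taking over services from different Feature Teams? If so, this session should help. It shares a real case from TrendMicro CloudOne SRE: how we deploy and manage multiple services on AWS in a Multi-account, Multi-region setup, and how we build Deployment pipelines for different products/services so that SRE can manage Cloud workloads more smoothly.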

Tsung-Ting, Wu

July 19, 2023

Transcript

  1. Scaling Your AWS Deployment Pipeline: Case study of CloudOne SRE

    One platform x 7 products x 9 regions. Taiwan CloudSummit 2023, by 宗庭 吳
  2. 吳宗庭 Ken Wu, Senior SRE @TrendMicro CloudOne

    👨‍💻 A Backend Engineer and Site Reliability Engineer. 💪 Specializing in cloud-based technologies and cybersecurity. 🔭 Interested in DevOps, SRE, Cloud Native, software design and architecture.
  3. Agenda: Scaling Your AWS Deployment Pipeline: Case study of CloudOne SRE

    Context; Challenges; CloudOne deployment pipeline platform; Pros and cons; Takeaways. AWS, Multi-region, Multi-account, CI/CD, Shift Right testing, Ring deployment
  4. How CloudOne SRE works

    Operational readiness review; Architecture review; Threat Modeling; Cost optimization; SLI/SLO/Runbook; Deployment pipeline, etc.
  5. From Monolith to Microservice

    Deployment frequency ⬆️, # of microservices ⬆️ -> CI/CD pipelines ⬆️. 2016: Monthly Release, SaaS Monolith; 2017: 3 week ~ 5 week; 2019: 1 ~ 5 day, Microservice; 2023: 1000 deployments per month.
  6. High service/region onboarding overhead issue

    Deployment complexity: n service x 5 stage x 9 region. New Service: the dev team needs to build a CI/CD pipeline that fits CloudOne requirements. New Account: provisioning issue; isolation by separate AWS accounts; high provisioning cost. New Region: needs support from every dev team. Basic CI/CD pipeline in CloudOne.
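
To make the "n service x 5 stage x 9 region" figure concrete, here is a minimal sketch that enumerates every service/stage/region combination. It is an illustration only: the stage names are borrowed from the pipeline-config slide later in the deck, the region names are placeholders (the deck does not list all nine), and the sketch assumes every service is deployed to every stage and region, which is an upper bound rather than a claim about CloudOne.

```python
from itertools import product

# Stage names as they appear in the pipeline-config slide of this deck.
STAGES = ["dev", "alpha", "staging", "prod", "dr"]
# Placeholder region names; the deck does not enumerate the nine regions.
REGIONS = [f"region-{i}" for i in range(1, 10)]


def deployment_targets(services):
    """Return every (service, stage, region) combination."""
    return list(product(services, STAGES, REGIONS))


targets = deployment_targets(["service_a", "service_b"])
print(len(targets))  # 2 services x 5 stages x 9 regions = 90
```

Each onboarded service adds another 45 targets under these assumptions, which is why per-team, hand-built pipelines become costly as the number of services grows.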
  7. Deployment machine issue

    IaC: Serverless framework (CloudFormation) issues. Single point of failure: deployment machine reliability issue. Security: needs high privilege on the EC2 instance profile.
  8. Familiarity issue for SRE

    (1) CI/CD pipeline: every service may have its own pipeline design/layout/structure. (2) IaC: CloudFormation x Terraform x (CDK). P1 incident 🔥
  9. Challenges Recap

    (1) High region/service onboarding overhead: microservices ⬆️; AWS account provisioning issue; deployment complexity: n service x 5 stage x 9 region. (2) SPOF on deployment machine: a deployment machine in every AWS account; single point of failure; EC2 security issue. (3) Risk control: deploying a multi-region service globally. (4) No standard for CI/CD pipelines: familiarity issue for SRE.
  10. Solution: standardized CD pipeline platform for CloudOne

    (1) Flexible deployment management: supports 5 stages and 9 regions. (2) Centralize: centralized deployment platform and AWS account orchestration. (3) Deployment risk control.
  11. CD platform: CodePipeline x CloudFormation

    Manages the AWS accounts to deploy to. CodePipeline controls which regions/accounts are deployed. In every deploy stage, CodePipeline assumes a role in the target account and triggers a CloudFormation update.
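
The assume-role step can be pictured with a short boto3 sketch. This is not the platform's code: in the talk CodePipeline performs the role assumption itself, whereas the sketch below spells out the equivalent API calls by hand. The role name `CloudOneDeployRole`, the session name, and the function names are hypothetical, and the capability passed to CloudFormation is an assumption about what a template might need.

```python
def deploy_role_arn(account_id, role_name="CloudOneDeployRole"):
    """Build the ARN of the (hypothetical) deploy role in a target account."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def update_stack_in_account(account_id, region, stack_name, template_body):
    """Assume the deploy role in the target account, then update one stack."""
    import boto3  # imported lazily so deploy_role_arn works without the AWS SDK

    # Exchange the caller's identity for temporary credentials in the target account.
    creds = boto3.client("sts").assume_role(
        RoleArn=deploy_role_arn(account_id),
        RoleSessionName="cd-platform-deploy",  # hypothetical session name
    )["Credentials"]

    # Create a CloudFormation client in the target region using those credentials.
    cfn = boto3.client(
        "cloudformation",
        region_name=region,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    return cfn.update_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_NAMED_IAM"],  # assumption: template creates named IAM resources
    )
```

The point of the pattern is that only the central platform holds the permission to assume deploy roles, so no long-lived, high-privilege deployment machine is needed inside each account.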
  12. New flow

    Branch strategy: GitHub flow (diagram image source: https://build5nines.com/introduction-to-git-version-control-workflow/)

    1. A developer creates a new branch in the git repo and pushes some code.
    2. GitHub sends a webhook push event to Jenkins.
    3. Jenkins starts a new job execution to build, compile, and statically test the branch code.
    4. The Jenkins job creates a new AWS CodePipeline for the branch and triggers a new CodePipeline execution.
    5. The CodePipeline deploys the service CloudFormation template.

    Improvement. Streamlined environment management: using the deploy platform, the team was able to reduce manual effort and improve efficiency. Increased reliability: deployments became more reliable. PR/branch build: IaC issues are found early, in the dev stage.
  13. pipeline-config example

        serviceName: service_x
        regionsToDeployByEnvironment:
          #dev:
          #  - us-east-2
          alpha:
            - us-east-2
          staging:
            - us-east-1
          prod:
            - us-east-1
            - us-east-2
          dr:
            - us-east-1
        deploySteps:
          - typeID: DeployCloudFormation
          - typeID: InvokeLambda
            lambdaName: healthCheckCanary

    Dynamic Create Stage
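
One way to read "Dynamic Create Stage" is that the platform expands this config into one pipeline stage per environment and region, each running the listed deploy steps. The sketch below illustrates that reading only; the deck does not show how stages are actually generated. The config is written as a Python dict mirroring the slide (with the commented-out dev entry omitted), and `build_stages` and the stage naming scheme are hypothetical.

```python
# The slide's pipeline-config, transcribed as a dict (dev is commented out on the slide).
PIPELINE_CONFIG = {
    "serviceName": "service_x",
    "regionsToDeployByEnvironment": {
        "alpha": ["us-east-2"],
        "staging": ["us-east-1"],
        "prod": ["us-east-1", "us-east-2"],
        "dr": ["us-east-1"],
    },
    "deploySteps": [
        {"typeID": "DeployCloudFormation"},
        {"typeID": "InvokeLambda", "lambdaName": "healthCheckCanary"},
    ],
}


def build_stages(config):
    """Expand the config into one stage per (environment, region) pair."""
    stages = []
    for env, regions in config["regionsToDeployByEnvironment"].items():
        for region in regions:
            stages.append({
                # Hypothetical naming scheme: service-environment-region.
                "name": f"{config['serviceName']}-{env}-{region}",
                "actions": [step["typeID"] for step in config["deploySteps"]],
            })
    return stages


for stage in build_stages(PIPELINE_CONFIG):
    print(stage["name"], stage["actions"])
```

Under this reading, adding a region to an environment is a one-line config change rather than a pipeline redesign.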
  14. How do we deploy a multi-region service safely?

    Ring deployment (img source: https://devblogs.microsoft.com/devops/configuring-your-release-pipelines-for-safe-deployments/)
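
As a rough illustration of ring deployment: regions are grouped into rings, each ring is deployed in order, and the rollout stops as soon as a ring fails its health check, so later rings never receive a bad release. The sketch below is generic and not CloudOne's implementation; `ring_deploy`, the ring grouping, and the `region-3`/`region-4` names are hypothetical, and the `deploy` and `is_healthy` callables stand in for real deployment and health-check steps.

```python
def ring_deploy(rings, deploy, is_healthy):
    """Deploy ring by ring; stop at the first ring that fails its health check.

    Returns (completed_rings, failed_ring), where failed_ring is None on success.
    """
    completed = []
    for ring in rings:
        for region in ring:
            deploy(region)
        if not all(is_healthy(region) for region in ring):
            return completed, ring  # halt: later rings are never touched
        completed.append(ring)
    return completed, None


# Example run with stand-in callables; the second ring is made to fail.
deployed = []
rings = [["us-east-2"], ["us-east-1"], ["region-3", "region-4"]]
completed, failed = ring_deploy(
    rings,
    deploy=deployed.append,
    is_healthy=lambda region: region != "us-east-1",
)
print(deployed, completed, failed)
```

Ordering the rings from the smallest blast radius to the largest means a faulty release is caught where it affects the fewest users.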
  15. Challenges

    (1) High region/service onboarding overhead: microservices ⬆️; AWS account provisioning issue; deployment complexity: n service x 5 stage x 9 region. (2) SPOF on deployment machine: a deployment machine in every account; single point of failure; EC2 security issue. (3) Risk control: deploying a multi-region service globally. (4) No standard for CI/CD pipelines: familiarity issue for SRE.
  16. Solution: standardized CD pipeline platform for CloudOne

    (1) Centralize: centralized deployment platform and AWS account orchestration. (2) CodePipeline x CloudFormation: flexible deployment pipeline; supports 5 stages and 9 regions. (3) Ring deployment: minimizes deployment risk for multi-region services.
  17. Pros and Cons

    Pros: centralized environment/account management; standardized deployment for CloudOne services; scalability/reliability ⬆️; IaC issues are found early in the PR/branch build. Cons: slow (8~15 mins, due to several static tests for compliance); tests also need to execute in AWS; only the Serverless framework is supported; resource cost goes up (due to PR/branch builds).
  18. Takeaways

    (1) Serverless: use serverless for fundamental internal services to make them more reliable. (2) Ring deployment: minimizes deployment risk for multi-region services. (3) Operational readiness review: some issues can be found before going to production.