Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Lessons Learned from Scaling Infrastructure as ...

Lessons Learned from Scaling Infrastructure as Code

Originally presented at Summer Systems @Scale 2022.

You adopted an infrastructure as code tool like Terraform. What started as one person writing some configuration and deploying new infrastructure scales to everyone in the company writing their own infrastructure configuration and deploying their own systems. In this talk, we’ll share some of the lessons learned across the Terraform community when scaling infrastructure as code practices from one team to an entire company and its users. We’ll cover the patterns and practices that help address challenges of updating infrastructure, managing infrastructure modules, maintaining security, streamlining cost, and even upgrading and migrating tools.

Rosemary Wang

June 29, 2022
Tweet

More Decks by Rosemary Wang

Other Decks in Technology

Transcript

  1. Standardize infrastructure resources. TERMINAL module "boundary" { depends_on = [module.vpc]

    source = "joatmon08/boundary/aws" version = "0.2.0" vpc_id = module.vpc.vpc_id vpc_cidr_block = module.vpc.vpc_cidr_block public_subnet_ids = module.vpc.public_subnets private_subnet_ids = module.vpc.database_subnets name = var.name key_pair_name = var.key_pair_name allow_cidr_blocks_to_workers = var.client_cidr_block allow_cidr_blocks_to_api = ["0.0.0.0/0"] boundary_db_password = random_password.boundary_database.result }
  2. Infrastructure Modules 1. Set opinionated defaults. 2. Allow collaboration (pull

    requests). 3. Apply a testing strategy. 
 unit, contract, & integration tests 4. Version modules.
  3. Infrastructure Configuration 1. Minimize blast radius of changes. 2. Use

    immutability to roll forward. 
 terraform apply -target, terraform taint 3. Use version control when possible.
  4. Decouple dependencies. TERMINAL data "aws_eks_cluster" "cluster" { name = var.aws_eks_cluster_id

    == "" ? data.terraform_remote_state.infrastructure.outputs.eks_cluster_id : var.aws_eks_cluster_i d } data "aws_eks_cluster_auth" "cluster" { name = var.aws_eks_cluster_id == "" ? data.terraform_remote_state.infrastructure.outputs.eks_cluster_id : var.aws_eks_cluster_i d } provider "kubernetes" { host = data.aws_eks_cluster.cluster.endpoin t cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data ) token = data.aws_eks_cluster_auth.cluster.toke n experiments { manifest_resource = tru e } }
  5. Dependency Injection 1. Reference outputs stored in state 2. Use

    the infrastructure API 3. Store and reference from configuration manager
  6. Download modules and plugins. github.com/hashicorp/go- getter TERMINAL > terraform ini

    t Initializing the backend.. . Initializing provider plugins.. . - terraform.io/builtin/terraform is built in to Terrafor m - Reusing previous version of hashicorp/aws from the dependency lock fil e - Installing hashicorp/aws v4.15.0.. . - Installed hashicorp/aws v4.15.0 (signed by HashiCorp ) - Installing hashicorp/boundary v1.0.6.. . - Installed hashicorp/boundary v1.0.6 (signed by HashiCorp )
  7. 1. Use internal artifact repository. 2. Cache providers & modules

    on local filesystem. 
 git submodule add terraform.io/language/providers/requirements#in-house-providers
  8. Refresh state. Reads information from infrastructure API. > terraform appl

    y module.hcp.data.aws_region.current: Reading.. . module.hcp.data.aws_region.current: Read complete after 0s [id=us-west-2 ] module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-REDACTED ] ## omitted for clarit y Plan: 105 to add, 0 to change, 0 to destroy . CODE EDITOR
  9. Apply changes. Create, read, update, and delete resources with infrastructure

    API. > terraform appl y module.hcp.data.aws_region.current: Reading.. . module.hcp.data.aws_region.current: Read complete after 0s [id=us-west-2 ] module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-REDACTED ] ## omitted for clarit y Plan: 105 to add, 0 to change, 0 to destroy . CODE EDITOR
  10. 1. Enable concurrent operations. 
 terraform apply -parallelism=n 2. Tune

    infrastructure API (rate limiting). 
 create, read, update, delete 3. Modularize into fewer resources. 
 faster refresh & apply, fewer state locking conflicts
  11. CODE EDITOR import "tfplan/v2" as tfplan database_only_has_non_permissive_firewall_rules = rule {

    all database_firewall_rules as firewall_rule { firewall_rule.values.start_ip_address is not "0.0.0.0" and firewall_rule.values.end_ip_address is not "255.255.255.255" } } resources_with_tag_field_have_defined_tags = rule { all resources_with_tag_field as resource { resource.values.tags is not null } }
  12. 1. Standardize tests for static analysis of IaC. 
 secure

    standards & defaults 2. Enforce changes through IaC. 
 terraform apply -target, terraform taint 3. Enable dynamic analysis of infrastructure. 
 drift detection, automated reconciliation
  13. Commit changes. Run unit tests. Run integration tests. Estimate cost.

    Deploy changes. ✓ Security ✓ Cost compliance test_cpu_size_less_than_or_equal_to_32()
  14. Cost Compliance 1. Enforce tags. 
 expiration date, standard tagging

    2. Implement reboot schedule. 3. Set resource type, size, or reservation. 4. Check autoscaling enabled.