Incident Management with Terraform and PagerDuty

In this presentation, I explain how to manage incidents with HashiCorp Terraform and PagerDuty.

This version of the talk was given at AWS re:Invent, in December 2024.

Kerim Satirli

December 05, 2024

Transcript

  1. Infrastructure Lifecycle

    Workflow automation, system of record, lifecycle management.

    Infrastructure as code to build, deploy, and manage the lifecycle of infrastructure and applications.
  2. Terraform Concepts

    Provider: Terraform uses plugins called "Providers" to interact with APIs. Providers add support for AWS services and related SaaS tools.

        provider "aws" {
          region  = "us-west-2"
          profile = "pagerduty-x-hashicorp"

          # Tags applied to every taggable resource this provider creates.
          default_tags {
            tags = {
              Environment = "workshops"
            }
          }
        }

    Module: Encapsulates Terraform files and docs in a ready-to-use format. Can be used to create best-practice building blocks for your customers.

        module "pagerduty_services" {
          source  = "workloads/pagerduty-service"
          version = "1.2.0"

          region      = "us"
          services    = var.pathfinder_services
          description = "Pathfinder Services"
        }

    Terraform: CLI that provides access to all Terraform operations. Works locally, and remotely via HCP Terraform (SaaS offering).

        >_ terraform version
        Terraform v1.10.0
        on darwin_amd64
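
The provider mechanism described on this slide applies to PagerDuty as well as AWS. The following is a minimal sketch, assuming the PagerDuty provider published on the public Terraform Registry and a hypothetical `pagerduty_token` variable holding an API token; neither appears on the slide.

```hcl
terraform {
  required_providers {
    pagerduty = {
      source = "PagerDuty/pagerduty"
    }
  }
}

# Hypothetical variable name; supply the value outside of version control.
variable "pagerduty_token" {
  type      = string
  sensitive = true
}

provider "pagerduty" {
  token = var.pagerduty_token
}
```

Marking the variable as `sensitive` keeps the token out of plan output, although it is still written to state.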
  3. Terraform Concepts

    Data Sources: Used by Terraform to consume infrastructure that is not managed by Terraform. Only read operations are supported.

        data "aws_ami" "main" {
          most_recent = true

          filter {
            name   = "owner-alias"
            values = ["amazon"]
          }
        }

    State: Maps real world resources and their metadata to your Terraform configuration. Can be stored in HCP Terraform to enable team-wide collaboration.

        {
          "version": 3,
          "serial": 2,
          "terraform_version": "1.10.0",
          "backend": {
            "type": "cloud"
          }
        }

    Resources: Used by Terraform to manage the full lifecycle of an infrastructure item. Create, read, update, and delete operations are supported.

        resource "aws_instance" "main" {
          ami           = data.aws_ami.main.id
          instance_type = "t3.micro"

          tags = {
            Event = "AWSTechSummitEMEA"
          }
        }
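
The `"type": "cloud"` entry in the state excerpt suggests that state is stored in HCP Terraform. A minimal sketch of the configuration block that would typically produce this, assuming hypothetical organization and workspace names that are not taken from the talk:

```hcl
terraform {
  cloud {
    # Hypothetical organization and workspace names.
    organization = "example-org"

    workspaces {
      name = "incident-management"
    }
  }
}
```

With a block like this in place, `terraform init` connects the working directory to the named workspace, and later plans and applies read and write state there instead of a local file.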
  4. Creating Teams (teams.tf)

        resource "pagerduty_team" "websites" {
          name        = "All Websites (US, EU, non-regional Microsites)"
          description = "All Websites"
        }

        resource "pagerduty_team" "website_us" {
          name        = "Website (US)"
          description = "US-specific Website and Endpoints."
          parent      = pagerduty_team.websites.id
        }

        resource "pagerduty_team" "website_eu" {
          name        = "Website (EU)"
          description = "EU-specific Website and Endpoints."
          parent      = pagerduty_team.websites.id
        }
  5. Creating Users (teams.tf)

        locals {
          users = toset(csvdecode(file("${path.module}/users.csv")))
        }

        resource "pagerduty_user" "users_us" {
          for_each = {
            for user in local.users : user.email => user
            if user.team == "website_us"
          }

          name      = each.value.name
          email     = each.value.email
          job_title = each.value.job_title
          role      = each.value.role
        }
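
To see what the `for_each` expression on this slide produces, the CSV decoding and team filter can be tried in isolation, without any provider. The sketch below assumes a hypothetical inline CSV in place of `users.csv`. The column names are the attributes the slide reads (`name`, `email`, `job_title`, `role`, `team`), and the two rows are invented for illustration.

```hcl
locals {
  # Hypothetical stand-in for the contents of users.csv.
  users_csv = <<-CSV
    name,email,job_title,role,team
    Ada Example,ada@example.com,SRE,user,website_us
    Bo Example,bo@example.com,SRE,user,website_eu
  CSV

  # csvdecode returns one object per row, keyed by the header names.
  users = csvdecode(local.users_csv)

  # Same shape as the slide: key each row by email, keep only the US team.
  users_us = {
    for user in local.users : user.email => user
    if user.team == "website_us"
  }
}

output "users_us_emails" {
  value = keys(local.users_us)
}
```

Keying the map by email gives each user a stable resource address, so adding or removing a CSV row only touches that one user.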
  6. Adding Extensions (teams.tf)

        data "pagerduty_extension_schema" "webhook" {
          name = "Generic V2 Webhook"
        }

        resource "pagerduty_extension" "slack_website_us" {
          name             = "Slack Extension for ${pagerduty_service.website_us.name}."
          endpoint_url     = "https://slack.svcs.dev/XXX/YYY"
          extension_schema = data.pagerduty_extension_schema.webhook.id

          extension_objects = [
            pagerduty_service.website_us.id
          ]

          config = ...
        }
  7. Creating Teams (teams.tf)

        resource "pagerduty_addon" "status_page" {
          name = "Internal Status Page"
          src  = aws_elb.websites_status_page.dns_name
        }
  8. Escalation Policies (teams.tf)

        resource "pagerduty_escalation_policy" "websites" {
          name      = "Policy for Websites (US, EU) SREs"
          num_loops = 2
          teams     = [pagerduty_team.websites.id]

          rule {
            escalation_delay_in_minutes = 5

            dynamic "target" {
              for_each = pagerduty_user.users_us

              content {
                type = "user_reference"
                id   = target.value.id
              }
            }
          }
        }
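
Slide 6 references `pagerduty_service.website_us`, which is not defined anywhere in the transcript. A minimal sketch of how such a service might be attached to the escalation policy above; the service name is hypothetical and the actual definition from the talk may differ:

```hcl
resource "pagerduty_service" "website_us" {
  # Hypothetical display name; not taken from the slides.
  name = "Website (US)"

  # Incidents on this service escalate according to the policy from slide 8.
  escalation_policy = pagerduty_escalation_policy.websites.id
}
```

Because the service refers to the policy by ID, Terraform creates the policy, and the users it targets, before the service.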