firefrost-gaming/claude-skills-reference

Files

Leo dac49ee9f9 feat(skills): add terraform-patterns agent skill

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-15 23:29:01 +01:00

12 KiB

Raw Blame History

Terraform State Management Reference

Backend Configuration Patterns

AWS: S3 + DynamoDB (Recommended)

terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "project/env/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
    # Optional: KMS key for encryption
    # kms_key_id   = "arn:aws:kms:us-east-1:ACCOUNT:key/KEY_ID"
  }
}

Prerequisites:

# Bootstrap these resources manually or with a separate Terraform config
resource "aws_s3_bucket" "state" {
  bucket = "mycompany-terraform-state"

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "state" {
  bucket                  = aws_s3_bucket.state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

GCP: Google Cloud Storage

terraform {
  backend "gcs" {
    bucket = "mycompany-terraform-state"
    prefix = "project/env"
  }
}

Key features:

Native locking (no separate lock table needed)
Object versioning for state history
IAM-based access control
Encryption at rest by default

Azure: Blob Storage

terraform {
  backend "azurerm" {
    resource_group_name  = "terraform-state-rg"
    storage_account_name = "mycompanytfstate"
    container_name       = "tfstate"
    key                  = "project/env/terraform.tfstate"
  }
}

Key features:

Native blob locking
Encryption at rest with Microsoft-managed or customer-managed keys
RBAC-based access control

Terraform Cloud / Enterprise

terraform {
  cloud {
    organization = "mycompany"
    workspaces {
      name = "project-dev"
    }
  }
}

Key features:

Built-in state locking, encryption, and versioning
RBAC and team-based access control
Remote execution (plan/apply run in TF Cloud)
Sentinel policy-as-code integration
Cost estimation on plans

Environment Isolation Strategies

Strategy 1: Separate Directories (Recommended)

infrastructure/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── backend.tf      # key = "project/dev/terraform.tfstate"
│   │   └── terraform.tfvars
│   ├── staging/
│   │   ├── main.tf
│   │   ├── backend.tf      # key = "project/staging/terraform.tfstate"
│   │   └── terraform.tfvars
│   └── prod/
│       ├── main.tf
│       ├── backend.tf      # key = "project/prod/terraform.tfstate"
│       └── terraform.tfvars
└── modules/
    └── ...

Pros:

Complete isolation — a mistake in dev can't affect prod
Different provider versions per environment
Different module versions per environment (pin prod, iterate in dev)
Clear audit trail — who changed what, where

Cons:

Some duplication across environment directories
Must update modules in each environment separately

Strategy 2: Terraform Workspaces

# Single directory, multiple workspaces
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "project/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

# State files stored at:
# env:/dev/project/terraform.tfstate
# env:/staging/project/terraform.tfstate
# env:/prod/project/terraform.tfstate

terraform workspace new dev
terraform workspace select dev
terraform plan -var-file="env/dev.tfvars"

Pros:

Less duplication — single set of .tf files
Quick to switch between environments
Built-in workspace support in backends

Cons:

Shared code means a bug affects all environments simultaneously
Can't have different provider versions per workspace
Easy to accidentally apply to wrong workspace
Less isolation than separate directories

Strategy 3: Terragrunt (DRY Configuration)

infrastructure/
├── terragrunt.hcl          # Root — defines remote state pattern
├── modules/
│   └── vpc/
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
├── dev/
│   ├── terragrunt.hcl      # env = "dev"
│   └── vpc/
│       └── terragrunt.hcl  # inputs for dev VPC
├── staging/
│   └── ...
└── prod/
    └── ...

# Root terragrunt.hcl
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "mycompany-terraform-state"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

# dev/vpc/terragrunt.hcl
terraform {
  source = "../../modules/vpc"
}

inputs = {
  environment = "dev"
  vpc_cidr    = "10.0.0.0/16"
}

Pros:

Maximum DRY — define module once, parameterize per environment
Automatic state key generation from directory structure
Dependency management between modules (dependency blocks)
run-all for applying multiple modules at once

Cons:

Additional tool dependency (Terragrunt)
Learning curve
Debugging can be harder (generated files)

State Migration Patterns

Local to Remote (S3)

# 1. Add backend configuration to backend.tf
# 2. Run init with migration flag
terraform init -migrate-state

# Terraform will prompt:
# "Do you want to copy existing state to the new backend?"
# Answer: yes

Between Remote Backends

# 1. Pull current state
terraform state pull > terraform.tfstate.backup

# 2. Update backend configuration in backend.tf

# 3. Reinitialize with migration
terraform init -migrate-state

# 4. Verify
terraform plan  # Should show no changes

State Import (Existing Resources)

# Import a single resource
terraform import aws_instance.web i-1234567890abcdef0

# Import with for_each key
terraform import 'aws_subnet.public["us-east-1a"]' subnet-0123456789abcdef0

# Bulk import (Terraform 1.5+ import blocks)
import {
  to = aws_instance.web
  id = "i-1234567890abcdef0"
}

State Move (Refactoring)

# Rename a resource (avoids destroy/recreate)
terraform state mv aws_instance.old_name aws_instance.new_name

# Move into a module
terraform state mv aws_instance.web module.compute.aws_instance.web

# Move between state files
terraform state mv -state-out=other.tfstate aws_instance.web aws_instance.web

State Locking

Why Locking Matters

Without locking, two concurrent terraform apply runs can corrupt state. The second apply reads stale state and may create duplicate resources or lose track of existing ones.

Lock Behavior by Backend

Backend	Lock Mechanism	Auto-Lock	Force Unlock
S3	DynamoDB table	Yes (if table configured)	`terraform force-unlock LOCK_ID`
GCS	Native blob locking	Yes	`terraform force-unlock LOCK_ID`
Azure Blob	Native blob lease	Yes	`terraform force-unlock LOCK_ID`
TF Cloud	Built-in	Always	Via UI or API
Consul	Key-value lock	Yes	`terraform force-unlock LOCK_ID`
Local	`.terraform.lock.hcl`	Yes (single user)	Delete lock file

Force Unlock (Emergency Only)

# Only use when you're certain no other process is running
terraform force-unlock LOCK_ID

# The LOCK_ID is shown in the error message when lock fails:
# Error: Error locking state: Error acquiring the state lock
# Lock Info:
#   ID:        12345678-abcd-1234-abcd-1234567890ab

State Security Best Practices

1. Encrypt at Rest

# S3 — server-side encryption
backend "s3" {
  encrypt    = true
  kms_key_id = "arn:aws:kms:us-east-1:ACCOUNT:key/KEY_ID"
}

2. Restrict Access

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::mycompany-terraform-state/project/*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalTag/Team": "platform"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:ACCOUNT:table/terraform-locks"
    }
  ]
}

3. Enable Versioning (State History)

resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id
  versioning_configuration {
    status = "Enabled"
  }
}

Versioning lets you recover from state corruption by restoring a previous version.

4. Audit Access

Enable S3 access logging or CloudTrail data events
Monitor for unexpected state reads (potential secret extraction)
State files contain sensitive values — treat them like credentials

5. Sensitive Values in State

Terraform stores all resource attributes in state, including passwords, private keys, and tokens. This is unavoidable. Mitigate by:

Encrypting state at rest (KMS)
Restricting state file access (IAM)
Using sensitive = true on variables and outputs (prevents display, not storage)
Rotating secrets regularly (state contains the value at apply time)

Drift Detection and Reconciliation

Detect Drift

# Plan with detailed exit code
terraform plan -detailed-exitcode
# Exit 0 = no changes
# Exit 1 = error
# Exit 2 = changes detected (drift)

Common Drift Sources

Source	Example	Prevention
Console changes	Someone edits SG rules in AWS Console	SCPs to restrict console access, or accept and reconcile
Auto-scaling	ASG launches instances not in state	Don't manage individual instances; manage ASG
External tools	Ansible modifies EC2 tags	Agree on ownership boundaries
Dependent resource changes	AMI deregistered	Use data sources to detect, lifecycle ignore_changes

Reconciliation Options

# Option 1: Apply to restore desired state
terraform apply

# Option 2: Refresh state to match reality
terraform apply -refresh-only

# Option 3: Ignore specific attribute drift
resource "aws_instance" "web" {
  lifecycle {
    ignore_changes = [tags["LastModifiedBy"], ami]
  }
}

# Option 4: Import the manually-created resource
terraform import aws_security_group_rule.new sg-12345_ingress_tcp_443_443_0.0.0.0/0

Troubleshooting Checklist

Symptom	Likely Cause	Fix
"Error acquiring state lock"	Concurrent run or crashed process	Wait for other run to finish, or `force-unlock`
"Backend configuration changed"	Backend config modified	Run `terraform init -reconfigure` or `-migrate-state`
"Resource already exists"	Resource created outside Terraform	`terraform import` the resource
"No matching resource found"	Resource deleted outside Terraform	`terraform state rm` the resource
State file growing very large	Too many resources in one state	Split into smaller state files using modules
Slow plan/apply	Large state file, many resources	Split state, use `-target` for urgent changes
"Provider produced inconsistent result"	Provider bug or API race condition	Retry, or pin provider version
Workspace confusion	Applied to wrong workspace	Always check `terraform workspace show` before apply

12 KiB Raw Blame History

Terraform State Management Reference

Backend Configuration Patterns

AWS: S3 + DynamoDB (Recommended)

GCP: Google Cloud Storage

Azure: Blob Storage

Terraform Cloud / Enterprise

Environment Isolation Strategies

Strategy 1: Separate Directories (Recommended)

Strategy 2: Terraform Workspaces

Strategy 3: Terragrunt (DRY Configuration)

State Migration Patterns

Local to Remote (S3)

Between Remote Backends

State Import (Existing Resources)

State Move (Refactoring)

State Locking

Why Locking Matters

Lock Behavior by Backend

Force Unlock (Emergency Only)

State Security Best Practices

1. Encrypt at Rest

2. Restrict Access

3. Enable Versioning (State History)

4. Audit Access

5. Sensitive Values in State

Drift Detection and Reconciliation

Detect Drift

Common Drift Sources

Reconciliation Options

Troubleshooting Checklist

12 KiB

Raw Blame History