12 KiB
12 KiB
Terraform State Management Reference
Backend Configuration Patterns
AWS: S3 + DynamoDB (Recommended)
terraform {
backend "s3" {
bucket = "mycompany-terraform-state"
key = "project/env/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-locks"
# Optional: KMS key for encryption
# kms_key_id = "arn:aws:kms:us-east-1:ACCOUNT:key/KEY_ID"
}
}
Prerequisites:
# Bootstrap these resources manually or with a separate Terraform config
resource "aws_s3_bucket" "state" {
bucket = "mycompany-terraform-state"
lifecycle {
prevent_destroy = true
}
}
resource "aws_s3_bucket_versioning" "state" {
bucket = aws_s3_bucket.state.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
bucket = aws_s3_bucket.state.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "aws:kms"
}
}
}
resource "aws_s3_bucket_public_access_block" "state" {
bucket = aws_s3_bucket.state.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
resource "aws_dynamodb_table" "locks" {
name = "terraform-locks"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}
GCP: Google Cloud Storage
terraform {
backend "gcs" {
bucket = "mycompany-terraform-state"
prefix = "project/env"
}
}
Key features:
- Native locking (no separate lock table needed)
- Object versioning for state history
- IAM-based access control
- Encryption at rest by default
Azure: Blob Storage
terraform {
backend "azurerm" {
resource_group_name = "terraform-state-rg"
storage_account_name = "mycompanytfstate"
container_name = "tfstate"
key = "project/env/terraform.tfstate"
}
}
Key features:
- Native blob locking
- Encryption at rest with Microsoft-managed or customer-managed keys
- RBAC-based access control
Terraform Cloud / Enterprise
terraform {
cloud {
organization = "mycompany"
workspaces {
name = "project-dev"
}
}
}
Key features:
- Built-in state locking, encryption, and versioning
- RBAC and team-based access control
- Remote execution (plan/apply run in TF Cloud)
- Sentinel policy-as-code integration
- Cost estimation on plans
Environment Isolation Strategies
Strategy 1: Separate Directories (Recommended)
infrastructure/
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ ├── backend.tf # key = "project/dev/terraform.tfstate"
│ │ └── terraform.tfvars
│ ├── staging/
│ │ ├── main.tf
│ │ ├── backend.tf # key = "project/staging/terraform.tfstate"
│ │ └── terraform.tfvars
│ └── prod/
│ ├── main.tf
│ ├── backend.tf # key = "project/prod/terraform.tfstate"
│ └── terraform.tfvars
└── modules/
└── ...
Pros:
- Complete isolation — a mistake in dev can't affect prod
- Different provider versions per environment
- Different module versions per environment (pin prod, iterate in dev)
- Clear audit trail — who changed what, where
Cons:
- Some duplication across environment directories
- Must update modules in each environment separately
Strategy 2: Terraform Workspaces
# Single directory, multiple workspaces
terraform {
backend "s3" {
bucket = "mycompany-terraform-state"
key = "project/terraform.tfstate"
region = "us-east-1"
dynamodb_table = "terraform-locks"
encrypt = true
}
}
# State files stored at:
# env:/dev/project/terraform.tfstate
# env:/staging/project/terraform.tfstate
# env:/prod/project/terraform.tfstate
terraform workspace new dev
terraform workspace select dev
terraform plan -var-file="env/dev.tfvars"
Pros:
- Less duplication — single set of .tf files
- Quick to switch between environments
- Built-in workspace support in backends
Cons:
- Shared code means a bug affects all environments simultaneously
- Can't have different provider versions per workspace
- Easy to accidentally apply to wrong workspace
- Less isolation than separate directories
Strategy 3: Terragrunt (DRY Configuration)
infrastructure/
├── terragrunt.hcl # Root — defines remote state pattern
├── modules/
│ └── vpc/
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
├── dev/
│ ├── terragrunt.hcl # env = "dev"
│ └── vpc/
│ └── terragrunt.hcl # inputs for dev VPC
├── staging/
│ └── ...
└── prod/
└── ...
# Root terragrunt.hcl
remote_state {
backend = "s3"
generate = {
path = "backend.tf"
if_exists = "overwrite_terragrunt"
}
config = {
bucket = "mycompany-terraform-state"
key = "${path_relative_to_include()}/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-locks"
}
}
# dev/vpc/terragrunt.hcl
terraform {
source = "../../modules/vpc"
}
inputs = {
environment = "dev"
vpc_cidr = "10.0.0.0/16"
}
Pros:
- Maximum DRY — define module once, parameterize per environment
- Automatic state key generation from directory structure
- Dependency management between modules (
dependencyblocks) run-allfor applying multiple modules at once
Cons:
- Additional tool dependency (Terragrunt)
- Learning curve
- Debugging can be harder (generated files)
State Migration Patterns
Local to Remote (S3)
# 1. Add backend configuration to backend.tf
# 2. Run init with migration flag
terraform init -migrate-state
# Terraform will prompt:
# "Do you want to copy existing state to the new backend?"
# Answer: yes
Between Remote Backends
# 1. Pull current state
terraform state pull > terraform.tfstate.backup
# 2. Update backend configuration in backend.tf
# 3. Reinitialize with migration
terraform init -migrate-state
# 4. Verify
terraform plan # Should show no changes
State Import (Existing Resources)
# Import a single resource
terraform import aws_instance.web i-1234567890abcdef0
# Import with for_each key
terraform import 'aws_subnet.public["us-east-1a"]' subnet-0123456789abcdef0
# Bulk import (Terraform 1.5+ import blocks)
import {
to = aws_instance.web
id = "i-1234567890abcdef0"
}
State Move (Refactoring)
# Rename a resource (avoids destroy/recreate)
terraform state mv aws_instance.old_name aws_instance.new_name
# Move into a module
terraform state mv aws_instance.web module.compute.aws_instance.web
# Move between state files
terraform state mv -state-out=other.tfstate aws_instance.web aws_instance.web
State Locking
Why Locking Matters
Without locking, two concurrent terraform apply runs can corrupt state. The second apply reads stale state and may create duplicate resources or lose track of existing ones.
Lock Behavior by Backend
| Backend | Lock Mechanism | Auto-Lock | Force Unlock |
|---|---|---|---|
| S3 | DynamoDB table | Yes (if table configured) | terraform force-unlock LOCK_ID |
| GCS | Native blob locking | Yes | terraform force-unlock LOCK_ID |
| Azure Blob | Native blob lease | Yes | terraform force-unlock LOCK_ID |
| TF Cloud | Built-in | Always | Via UI or API |
| Consul | Key-value lock | Yes | terraform force-unlock LOCK_ID |
| Local | .terraform.lock.hcl |
Yes (single user) | Delete lock file |
Force Unlock (Emergency Only)
# Only use when you're certain no other process is running
terraform force-unlock LOCK_ID
# The LOCK_ID is shown in the error message when lock fails:
# Error: Error locking state: Error acquiring the state lock
# Lock Info:
# ID: 12345678-abcd-1234-abcd-1234567890ab
State Security Best Practices
1. Encrypt at Rest
# S3 — server-side encryption
backend "s3" {
encrypt = true
kms_key_id = "arn:aws:kms:us-east-1:ACCOUNT:key/KEY_ID"
}
2. Restrict Access
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::mycompany-terraform-state/project/*",
"Condition": {
"StringEquals": {
"aws:PrincipalTag/Team": "platform"
}
}
},
{
"Effect": "Allow",
"Action": [
"dynamodb:GetItem",
"dynamodb:PutItem",
"dynamodb:DeleteItem"
],
"Resource": "arn:aws:dynamodb:us-east-1:ACCOUNT:table/terraform-locks"
}
]
}
3. Enable Versioning (State History)
resource "aws_s3_bucket_versioning" "state" {
bucket = aws_s3_bucket.state.id
versioning_configuration {
status = "Enabled"
}
}
Versioning lets you recover from state corruption by restoring a previous version.
4. Audit Access
- Enable S3 access logging or CloudTrail data events
- Monitor for unexpected state reads (potential secret extraction)
- State files contain sensitive values — treat them like credentials
5. Sensitive Values in State
Terraform stores all resource attributes in state, including passwords, private keys, and tokens. This is unavoidable. Mitigate by:
- Encrypting state at rest (KMS)
- Restricting state file access (IAM)
- Using
sensitive = trueon variables and outputs (prevents display, not storage) - Rotating secrets regularly (state contains the value at apply time)
Drift Detection and Reconciliation
Detect Drift
# Plan with detailed exit code
terraform plan -detailed-exitcode
# Exit 0 = no changes
# Exit 1 = error
# Exit 2 = changes detected (drift)
Common Drift Sources
| Source | Example | Prevention |
|---|---|---|
| Console changes | Someone edits SG rules in AWS Console | SCPs to restrict console access, or accept and reconcile |
| Auto-scaling | ASG launches instances not in state | Don't manage individual instances; manage ASG |
| External tools | Ansible modifies EC2 tags | Agree on ownership boundaries |
| Dependent resource changes | AMI deregistered | Use data sources to detect, lifecycle ignore_changes |
Reconciliation Options
# Option 1: Apply to restore desired state
terraform apply
# Option 2: Refresh state to match reality
terraform apply -refresh-only
# Option 3: Ignore specific attribute drift
resource "aws_instance" "web" {
lifecycle {
ignore_changes = [tags["LastModifiedBy"], ami]
}
}
# Option 4: Import the manually-created resource
terraform import aws_security_group_rule.new sg-12345_ingress_tcp_443_443_0.0.0.0/0
Troubleshooting Checklist
| Symptom | Likely Cause | Fix |
|---|---|---|
| "Error acquiring state lock" | Concurrent run or crashed process | Wait for other run to finish, or force-unlock |
| "Backend configuration changed" | Backend config modified | Run terraform init -reconfigure or -migrate-state |
| "Resource already exists" | Resource created outside Terraform | terraform import the resource |
| "No matching resource found" | Resource deleted outside Terraform | terraform state rm the resource |
| State file growing very large | Too many resources in one state | Split into smaller state files using modules |
| Slow plan/apply | Large state file, many resources | Split state, use -target for urgent changes |
| "Provider produced inconsistent result" | Provider bug or API race condition | Retry, or pin provider version |
| Workspace confusion | Applied to wrong workspace | Always check terraform workspace show before apply |