claude-code-skills-reference/terraform-skill/references/zero-to-deploy-checklist.md
daymade 87221d94d5 feat(pdf-creator): add theme system + Chrome backend; add terraform-skill draft
- pdf-creator v1.2.0: theme system (default/warm-terra), dual backend
  (weasyprint/chrome auto-detect), argparse CLI, extracted CSS to themes/
- terraform-skill: operational traps from real deployments (provisioner
  timing, DNS duplication, multi-env isolation, pre-deploy validation)
- asr-transcribe-to-text: add security scan marker

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 23:33:03 +08:00

Zero-to-Deployment Checklist

A fresh instance with an empty data disk exposes every implicit dependency that production silently relies on. This checklist covers everything that must be explicitly created before services will start.

Pre-flight: cloud-init must handle

These run at OS boot, before Terraform provisioners:

  • Mount data disk: Format if new (blkid check), mount to /data, add to fstab
  • Create service directories: mkdir -p /data/{service1,service2,...} — file provisioners fail if target dir doesn't exist
  • Install Docker + Compose: Curl installer, enable systemd service
  • Configure swap: fallocate on data disk (NOT system disk)
  • SSH hardening: key-only auth, no password root login
  • Firewall: UFW + DOCKER-USER iptables chain
  • Debconf preseed: For any package with interactive prompts (iptables-persistent, etc.)
  • Signal readiness: Write timestamp to /data/cloud-init.log
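The disk steps in the first bullet can be sketched as two small helpers. This is a minimal sketch under stated assumptions, not the cloud-init script used in the original deployment: the function names, the ext4 filesystem, and the mount options are all hypothetical choices.

```shell
# Hypothetical helper: format only when blkid finds no filesystem signature.
# Assumes ext4; refuses to act on anything that is not a block device.
format_if_new() {
  device=$1
  [ -b "$device" ] || { echo "not a block device: $device" >&2; return 1; }
  if blkid "$device" >/dev/null 2>&1; then
    echo "keep: $device already has a filesystem"
  else
    mkfs.ext4 -q "$device"
  fi
}

# Hypothetical helper: append an fstab entry only if the mount point
# is not listed yet, so repeated boots do not duplicate the line.
ensure_fstab_entry() {
  fstab_file=$1; device=$2; mount_point=$3
  if ! awk -v mp="$mount_point" '$2 == mp { found = 1 } END { exit !found }' "$fstab_file"; then
    printf '%s %s ext4 defaults,nofail 0 2\n' "$device" "$mount_point" >> "$fstab_file"
  fi
}
```

A cloud-init runcmd step could then call format_if_new on the data device, ensure_fstab_entry /etc/fstab with the same device and /data, and finish with mount -a. The device path depends on the cloud provider and is not given in this checklist.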

Provisioner ordering

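Terraform runs provisioners in declaration order within a resource, but it creates independent resources in parallel. Ordering across resources exists only where there is a dependency: an implicit reference or an explicit depends_on.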

lobehub_deploy ──────────────────→ channel_sync (depends_on lobehub)
                                 → casdoor_sync (depends_on lobehub)
                                 → minio_sync (depends_on lobehub)

claude4dev_deploy (depends_on lobehub_deploy)
  ├─ wait for cloud-init
  ├─ upload source (tarball via file provisioner)
  ├─ upload .env (staging variant)
  ├─ start stateful (postgres, redis) --no-recreate
  ├─ run DB migrations
  ├─ build stateless images
  ├─ fix volume permissions
  ├─ start stateless (relay, api, frontend, gateway)
  └─ verify health
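The "wait for cloud-init" step can poll for the readiness marker that cloud-init writes to /data/cloud-init.log. The following is a minimal sketch; the helper name and the timeout handling are assumptions, not the original provisioner code.

```shell
# Hypothetical helper: block until the marker file exists and is non-empty,
# or give up after the timeout (in seconds).
wait_for_marker() {
  marker=$1; timeout=$2; waited=0
  while [ ! -s "$marker" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "FATAL: $marker not written after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

In a remote-exec provisioner this would be called as wait_for_marker /data/cloud-init.log 300 before any upload step. On images that ship the cloud-init CLI, cloud-init status --wait is an alternative that does not need a marker file.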

Database bootstrap

PostgreSQL databases

PostgreSQL docker-entrypoint-initdb.d scripts only run when the data directory is empty (first-ever start). On subsequent starts — even if a database doesn't exist — init scripts are skipped.

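Fix: Explicitly create databases in the provisioner:

# Wait until postgres accepts connections (up to ~60s) instead of a fixed sleep
for i in $(seq 1 30); do
  docker exec my-postgres pg_isready -U postgres -q && break
  sleep 2
done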
# Create database if missing (idempotent)
docker exec my-postgres psql -U postgres -tc \
  "SELECT 1 FROM pg_database WHERE datname='mydb'" | grep -q 1 \
  || docker exec my-postgres psql -U postgres -c "CREATE DATABASE mydb;"

Schema migrations

Migrations must be idempotent. Track applied versions:

PSQL='docker compose exec -T postgres psql -v ON_ERROR_STOP=1 -U myuser -d mydb'

# Create tracking table
$PSQL -tAc "CREATE TABLE IF NOT EXISTS schema_migrations (
  version TEXT PRIMARY KEY,
  applied_at TIMESTAMPTZ DEFAULT now()
)"

# Apply each migration file in order
for f in migrations/*.sql; do
  VER=$(basename "$f")
  APPLIED=$($PSQL -tAc "SELECT 1 FROM schema_migrations WHERE version='$VER'" | tr -d ' ')
  if [ "$APPLIED" = "1" ]; then
    echo "Skip: $VER"
  else
    echo "Apply: $VER"
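    # Record the version inside the same transaction, and stop on failure,
    # so a failed migration is never marked as applied
    { echo 'BEGIN;'; cat "$f"; echo "INSERT INTO schema_migrations(version) VALUES ('$VER');"; echo 'COMMIT;'; } | $PSQL \
      || { echo "FATAL: migration $VER failed"; exit 1; }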
  fi
done
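The loop applies files in shell glob order, which is lexical: 10_x.sql sorts before 2_x.sql. A pre-flight check on the file names avoids that trap. This is a minimal sketch that assumes a four-digit zero-padded prefix (0001_name.sql); both the naming convention and the helper name are hypothetical.

```shell
# Hypothetical helper: fail if any migration file lacks a four-digit,
# zero-padded prefix, since lexical order would then differ from numeric order.
check_migration_names() {
  dir=$1; bad=0
  for f in "$dir"/*.sql; do
    [ -e "$f" ] || continue          # directory has no .sql files
    name=$(basename "$f")
    case "$name" in
      [0-9][0-9][0-9][0-9]_*.sql) ;; # e.g. 0001_init.sql
      *) echo "bad migration name: $name" >&2; bad=1 ;;
    esac
  done
  return "$bad"
}
```

Running check_migration_names migrations before the apply loop turns a silent ordering problem into an early, explicit failure.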

Docker build on remote

Proxy mode

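Docker Compose interpolates ${VAR:-default} in build args from the shell environment and from .env. The documented precedence puts the shell first, but an inline variable applies to one command only: it can be dropped by sudo, and it is gone in a later provisioner step or a re-run, at which point the .env value silently applies again. Make .env the single source of truth instead of relying on the command line.

# FRAGILE: applies to this one invocation only; the next build falls back to .env
DOCKER_WITH_PROXY_MODE=disabled docker compose build myapp

# RIGHT: set the value in .env (replace an existing line, append if missing)
if grep -q '^DOCKER_WITH_PROXY_MODE=' .env; then
  sed -i 's/^DOCKER_WITH_PROXY_MODE=.*/DOCKER_WITH_PROXY_MODE=disabled/' .env
else
  echo 'DOCKER_WITH_PROXY_MODE=disabled' >> .env
fi
docker compose build myapp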

Memory management

Building Docker images while 10+ containers are running can trigger the OOM killer on small instances (8 GB RAM). Strategy: stop non-critical containers, build, then restart them.

# Stop non-critical containers to free RAM
cd /data/other-project && docker compose stop search-engine analytics-db || true

# Build (memory-intensive)
cd /data/myproject && docker compose build myapp

# Restart stopped containers
cd /data/other-project && docker compose up -d search-engine analytics-db || true
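Stopping containers is only needed when memory is actually short, so the build step can first check MemAvailable. This is a minimal sketch; the helper names are hypothetical, and the threshold is an assumption to tune per image.

```shell
# Hypothetical helper: print MemAvailable in MiB from a meminfo-style file
# (defaults to /proc/meminfo, where the value is reported in kB).
mem_available_mb() {
  awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Hypothetical gate: succeed only if at least $1 MiB is available.
# The optional second argument overrides the meminfo path.
enough_memory() {
  need_mb=$1
  have_mb=$(mem_available_mb "${2:-/proc/meminfo}")
  [ -n "$have_mb" ] && [ "$have_mb" -ge "$need_mb" ]
}
```

For example, enough_memory 3072 || docker compose stop search-engine analytics-db stops the optional containers only when less than about 3 GiB is free.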

Volume permissions

Containers running as non-root need writable volume directories:

# Before docker compose up:
mkdir -p data-dir logs-dir
chown -R 1001:1001 data-dir logs-dir  # match container UID

Find the UID from the Dockerfile:

RUN adduser -S myuser -u 1001 -G mygroup
# Runs as uid 1001 (a Dockerfile comment must be on its own line)
USER myuser
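A provisioner can verify ownership before starting the containers, rather than discovering a permission error in the service logs. This is a minimal sketch; the helper name is hypothetical, and stat -c assumes GNU coreutils, as found on typical Linux hosts.

```shell
# Hypothetical helper: fail if any listed directory is not owned by the expected UID.
check_owner() {
  expected_uid=$1; shift
  status=0
  for dir in "$@"; do
    actual_uid=$(stat -c '%u' "$dir") || { status=1; continue; }
    if [ "$actual_uid" != "$expected_uid" ]; then
      echo "owner mismatch: $dir is uid $actual_uid, expected $expected_uid" >&2
      status=1
    fi
  done
  return "$status"
}
```

With the example above, check_owner 1001 data-dir logs-dir would run right after the chown and before docker compose up.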

Environment-specific .env files

Production .env contains production URLs. Staging needs its own .env with:

| Variable | Production | Staging |
|---|---|---|
| FRONTEND_URL | https://myapp.com | https://staging.myapp.com |
| CORS_ORIGIN | https://myapp.com | https://staging.myapp.com |
| NEW_API_URL | http://api-container:3000 | Same (internal Docker network) |
| DOCKER_WITH_PROXY_MODE | required (if behind proxy) | disabled (direct internet) |

Pattern: Create .env.staging alongside .env. In Terraform:

locals {
  env_src = "${local.repo}/.env.staging"  # staging-specific
}

provisioner "file" {
  source      = local.env_src
  destination = "${local.deploy_dir}/.env"
}

Rsync must exclude .env files (otherwise production .env overwrites staging .env):

--exclude=.env --exclude='.env.*'
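Separate .env files drift: a variable added to production is easily forgotten in staging. A pre-deploy check can list keys that exist in one file but not the other. This is a minimal sketch with hypothetical helper names; it only recognizes plain KEY=value lines, not export KEY=value.

```shell
# Hypothetical helper: print the sorted, unique variable names defined in an env file.
env_keys() {
  sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1" | sort -u
}

# Hypothetical helper: print keys present in the base file but missing from the variant.
missing_keys() {
  mk_base=$(mktemp); mk_variant=$(mktemp)
  env_keys "$1" > "$mk_base"
  env_keys "$2" > "$mk_variant"
  comm -23 "$mk_base" "$mk_variant"   # lines unique to the base file
  rm -f "$mk_base" "$mk_variant"
}
```

Calling missing_keys .env .env.staging and failing the deploy when the output is non-empty catches the gap before the service starts with an unset variable.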

Verification template

After all services start, verify in the provisioner (not ad-hoc SSH):

sleep 20
echo '=== Service logs ==='
docker logs my-critical-service --tail 20 2>&1 || true
echo '=== All containers ==='
docker ps --format 'table {{.Names}}\t{{.Status}}' 2>&1 || true
# Final gate (only line that can fail); match '(healthy)' because plain 'healthy' also matches 'unhealthy'
docker ps --filter name=my-critical-service --format '{{.Status}}' | grep -q '(healthy)' \
  || { echo 'FATAL: service unhealthy'; exit 1; }
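A fixed sleep either wastes time or is too short on a slow first boot. Polling the health status with a timeout is one alternative. This is a minimal sketch with hypothetical helper names; it assumes the container defines a HEALTHCHECK, since without one docker inspect reports no health status and the wait times out.

```shell
# Print the health status of one container (healthy, unhealthy, starting).
# Prints nothing if the container is missing or has no health check.
health_status() {
  docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null
}

# Hypothetical helper: poll until the container reports healthy,
# or fail once the timeout (in seconds) has elapsed.
wait_healthy() {
  wh_name=$1; wh_timeout=$2; wh_waited=0
  while [ "$(health_status "$wh_name")" != "healthy" ]; do
    if [ "$wh_waited" -ge "$wh_timeout" ]; then
      echo "FATAL: $wh_name not healthy after ${wh_timeout}s" >&2
      return 1
    fi
    sleep 2
    wh_waited=$((wh_waited + 2))
  done
}
```

The final gate then becomes wait_healthy my-critical-service 120 || exit 1, which compares the status exactly and so cannot mistake unhealthy for healthy.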