# Zero-to-Deployment Checklist

A fresh instance with an empty data disk exposes every implicit dependency that production silently relies on. This checklist covers everything that must be explicitly created before services will start.

## Pre-flight: cloud-init must handle

These run at OS boot, before Terraform provisioners:

- [ ] **Mount data disk**: Format if new (`blkid` check), mount to `/data`, add to fstab
- [ ] **Create service directories**: `mkdir -p /data/{service1,service2,...}` — file provisioners fail if target dir doesn't exist
- [ ] **Install Docker + Compose**: Curl installer, enable systemd service
- [ ] **Configure swap**: `fallocate` on data disk (NOT system disk)
- [ ] **SSH hardening**: key-only auth, no password root login
- [ ] **Firewall**: UFW + DOCKER-USER iptables chain
- [ ] **Debconf preseed**: For any package with interactive prompts (iptables-persistent, etc.)
- [ ] **Signal readiness**: Write timestamp to `/data/cloud-init.log`

## Provisioner ordering

Terraform provisioners execute in declaration order within a resource, but resources execute in parallel unless `depends_on` is set.

```
lobehub_deploy ──────────────────→ channel_sync (depends_on lobehub)
                                 → casdoor_sync (depends_on lobehub)
                                 → minio_sync (depends_on lobehub)

claude4dev_deploy (depends_on lobehub_deploy)
  ├─ wait for cloud-init
  ├─ upload source (tarball via file provisioner)
  ├─ upload .env (staging variant)
  ├─ start stateful (postgres, redis) --no-recreate
  ├─ run DB migrations
  ├─ build stateless images
  ├─ fix volume permissions
  ├─ start stateless (relay, api, frontend, gateway)
  └─ verify health
```

## Database bootstrap

### PostgreSQL databases

PostgreSQL `docker-entrypoint-initdb.d` scripts only run when the data directory is empty (first-ever start). On subsequent starts — even if a database doesn't exist — init scripts are skipped.

**Fix**: Explicitly create databases in provisioner:
```bash
# Wait for postgres healthy
sleep 10
# Create database if missing (idempotent)
docker exec my-postgres psql -U postgres -tc \
  "SELECT 1 FROM pg_database WHERE datname='mydb'" | grep -q 1 \
  || docker exec my-postgres psql -U postgres -c "CREATE DATABASE mydb;"
```

### Schema migrations

Migrations must be idempotent. Track applied versions:
```bash
PSQL='docker compose exec -T postgres psql -v ON_ERROR_STOP=1 -U myuser -d mydb'

# Create tracking table
$PSQL -tAc "CREATE TABLE IF NOT EXISTS schema_migrations (
  version TEXT PRIMARY KEY,
  applied_at TIMESTAMPTZ DEFAULT now()
)"

# Apply each migration file in order
for f in migrations/*.sql; do
  VER=$(basename $f)
  APPLIED=$($PSQL -tAc "SELECT 1 FROM schema_migrations WHERE version='$VER'" | tr -d ' ')
  if [ "$APPLIED" = "1" ]; then
    echo "Skip: $VER"
  else
    echo "Apply: $VER"
    { echo 'BEGIN;'; cat $f; echo 'COMMIT;'; } | $PSQL
    $PSQL -tAc "INSERT INTO schema_migrations(version) VALUES ('$VER') ON CONFLICT DO NOTHING"
  fi
done
```

## Docker build on remote

### Proxy mode

Docker Compose reads build args from `.env` via `${VAR:-default}`. Command-line env vars do NOT override `.env` values for compose interpolation.

```bash
# WRONG: compose still reads DOCKER_WITH_PROXY_MODE from .env
DOCKER_WITH_PROXY_MODE=disabled docker compose build myapp

# RIGHT: modify .env so compose reads the correct value
grep -q DOCKER_WITH_PROXY_MODE .env || echo 'DOCKER_WITH_PROXY_MODE=disabled' >> .env
docker compose build myapp
```

### Memory management

Building Docker images while 10+ containers run can OOM on small instances (8GB). Strategy:

```bash
# Stop non-critical containers to free RAM
cd /data/other-project && docker compose stop search-engine analytics-db || true

# Build (memory-intensive)
cd /data/myproject && docker compose build myapp

# Restart stopped containers
cd /data/other-project && docker compose up -d search-engine analytics-db || true
```

## Volume permissions

Containers running as non-root need writable volume directories:

```bash
# Before docker compose up:
mkdir -p data-dir logs-dir
chown -R 1001:1001 data-dir logs-dir  # match container UID
```

Find the UID from the Dockerfile:
```dockerfile
RUN adduser -S myuser -u 1001 -G mygroup
USER myuser  # runs as uid 1001
```

## Environment-specific .env files

Production `.env` contains production URLs. Staging needs its own `.env` with:

| Variable | Production | Staging |
|---|---|---|
| `FRONTEND_URL` | `https://myapp.com` | `https://staging.myapp.com` |
| `CORS_ORIGIN` | `https://myapp.com` | `https://staging.myapp.com` |
| `NEW_API_URL` | `http://api-container:3000` | Same (internal Docker network) |
| `DOCKER_WITH_PROXY_MODE` | `required` (if behind proxy) | `disabled` (direct internet) |

**Pattern**: Create `.env.staging` alongside `.env`. In Terraform:
```hcl
locals {
  env_src = "${local.repo}/.env.staging"  # staging-specific
}

provisioner "file" {
  source      = local.env_src
  destination = "${local.deploy_dir}/.env"
}
```

Rsync must exclude `.env` files (otherwise production .env overwrites staging .env):
```
--exclude=.env --exclude='.env.*'
```

## Verification template

After all services start, verify in the provisioner (not ad-hoc SSH):

```bash
sleep 20
echo '=== Service logs ==='
docker logs my-critical-service --tail 20 2>&1 || true
echo '=== All containers ==='
docker ps --format 'table {{.Names}}\t{{.Status}}' 2>&1 || true
# Final gate (only line that can fail)
docker ps --filter name=my-critical-service --format '{{.Status}}' | grep -q healthy \
  || { echo 'FATAL: service unhealthy'; exit 1; }
```