feat(engineering,ra-qm): add secrets-vault-manager, sql-database-assistant, gcp-cloud-architect, soc2-compliance

secrets-vault-manager (403-line SKILL.md, 3 scripts, 3 references):
- HashiCorp Vault, AWS SM, Azure KV, GCP SM integration
- Secret rotation, dynamic secrets, audit logging, emergency procedures

sql-database-assistant (457-line SKILL.md, 3 scripts, 3 references):
- Query optimization, migration generation, schema exploration
- Multi-DB support (PostgreSQL, MySQL, SQLite, SQL Server)
- ORM patterns (Prisma, Drizzle, TypeORM, SQLAlchemy)

gcp-cloud-architect (418-line SKILL.md, 3 scripts, 3 references):
- 6-step workflow mirroring aws-solution-architect for GCP
- Cloud Run, GKE, BigQuery, Cloud Functions, cost optimization
- Completes cloud trifecta (AWS + Azure + GCP)

soc2-compliance (417-line SKILL.md, 3 scripts, 3 references):
- SOC 2 Type I & II preparation, Trust Service Criteria mapping
- Control matrix generation, evidence tracking, gap analysis
- First SOC 2 skill in ra-qm-team (joins GDPR, ISO 27001, ISO 13485)

All 12 scripts pass --help. Docs generated, mkdocs.yml nav updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 14:05:11 +01:00
parent 7a2189fa21
commit 87f3a007c9
36 changed files with 13450 additions and 6 deletions
--- a/engineering/secrets-vault-manager/SKILL.md
+++ b/engineering/secrets-vault-manager/SKILL.md
@@ -0,0 +1,403 @@
+---
+name: "secrets-vault-manager"
+description: "Use when the user asks to set up secret management infrastructure, integrate HashiCorp Vault, configure cloud secret stores (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager), implement secret rotation, or audit secret access patterns."
+---
+
+# Secrets Vault Manager
+
+**Tier:** POWERFUL
+**Category:** Engineering
+**Domain:** Security / Infrastructure / DevOps
+
+---
+
+## Overview
+
+Production secret infrastructure management for teams running HashiCorp Vault, cloud-native secret stores, or hybrid architectures. This skill covers policy authoring, auth method configuration, automated rotation, dynamic secrets, audit logging, and incident response.
+
+**Distinct from env-secrets-manager**, which handles local `.env` file hygiene and leak detection. This skill operates at the infrastructure layer: Vault clusters, cloud KMS, certificate authorities, and CI/CD secret injection.
+
+### When to Use
+
+- Standing up a new Vault cluster or migrating to a managed secret store
+- Designing auth methods for services, CI runners, and human operators
+- Implementing automated credential rotation (database, API keys, certificates)
+- Auditing secret access patterns for compliance (SOC 2, ISO 27001, HIPAA)
+- Responding to a secret leak that requires mass revocation
+- Integrating secrets into Kubernetes workloads or CI/CD pipelines
+
+---
+
+## HashiCorp Vault Patterns
+
+### Architecture Decisions
+
+| Decision | Recommendation | Rationale |
+|----------|---------------|-----------|
+| Deployment mode | HA with Raft storage | No external dependency, built-in leader election |
+| Auto-unseal | Cloud KMS (AWS KMS / Azure Key Vault / GCP KMS) | Eliminates manual unseal, enables automated restarts |
+| Namespaces | One per environment (dev/staging/prod) | Blast-radius isolation, independent policies |
+| Audit devices | File + syslog (dual) | Vault refuses requests if all audit devices fail — dual prevents outages |
+
+### Auth Methods
+
+**AppRole** — Machine-to-machine authentication for services and batch jobs.
+
+```hcl
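+# Admin policy (HCL) for managing the AppRole mount; enable the method first with `vault auth enable approle`
+path "auth/approle/*" {
+  capabilities = ["create", "read", "update", "delete", "list"]
+}
+
+# Application-specific role (Vault CLI command, not HCL)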
+vault write auth/approle/role/payment-service \
+  token_ttl=1h \
+  token_max_ttl=4h \
+  secret_id_num_uses=1 \
+  secret_id_ttl=10m \
+  token_policies="payment-service-read"
+```
+
+**Kubernetes** — Pod-native authentication via service account tokens.
+
+```hcl
+vault write auth/kubernetes/role/api-server \
+  bound_service_account_names=api-server \
+  bound_service_account_namespaces=production \
+  policies=api-server-secrets \
+  ttl=1h
+```
+
+**OIDC** — Human operator access via SSO provider (Okta, Azure AD, Google Workspace).
+
+```hcl
+vault write auth/oidc/role/engineering \
+  bound_audiences="vault" \
+  allowed_redirect_uris="https://vault.example.com/ui/vault/auth/oidc/oidc/callback" \
+  user_claim="email" \
+  oidc_scopes="openid,profile,email" \
+  policies="engineering-read" \
+  ttl=8h
+```
+
+### Secret Engines
+
+| Engine | Use Case | TTL Strategy |
+|--------|----------|-------------|
+| KV v2 | Static secrets (API keys, config) | Versioned, manual rotation |
+| Database | Dynamic DB credentials | 1h default, 24h max |
+| PKI | TLS certificates | 90d leaf certs, 5y intermediate CA |
+| Transit | Encryption-as-a-service | Key rotation every 90d |
+| SSH | Signed SSH certificates | 30m for interactive, 8h for automation |
+
+### Policy Design
+
+Follow least-privilege with path-based granularity:
+
+```hcl
+# payment-service-read policy
+path "secret/data/production/payment/*" {
+  capabilities = ["read"]
+}
+
+path "database/creds/payment-readonly" {
+  capabilities = ["read"]
+}
+
+# Deny access to admin paths explicitly (deny overrides every other policy, so this also blocks sys/leases/renew)
+path "sys/*" {
+  capabilities = ["deny"]
+}
+```
+
+**Policy naming convention:** `{service}-{access-level}` (e.g., `payment-service-read`, `api-gateway-admin`).
+
+---
+
+## Cloud Secret Store Integration
+
+### Comparison Matrix
+
+| Feature | AWS Secrets Manager | Azure Key Vault | GCP Secret Manager |
+|---------|--------------------|-----------------|--------------------|
+| Rotation | Built-in Lambda | Custom logic via Functions | Cloud Functions |
+| Versioning | Automatic | Automatic | Automatic |
+| Encryption | AWS KMS (default or CMK) | HSM-backed | Google-managed or CMEK |
+| Access control | IAM policies + resource policy | RBAC + Access Policies | IAM bindings |
+| Cross-region | Replication supported | Geo-redundant by default | Replication supported |
+| Audit | CloudTrail | Azure Monitor + Diagnostic Logs | Cloud Audit Logs |
+| Pricing model | Per-secret + per-API call | Per-operation + per-key | Per-secret version + per-access |
+
+### When to Use Which
+
+- **AWS Secrets Manager**: RDS/Aurora credential rotation out of the box. Best when fully on AWS.
+- **Azure Key Vault**: Certificate management strength. Required for Azure AD integrated workloads.
+- **GCP Secret Manager**: Simplest API surface. Best for GKE-native workloads with Workload Identity.
+- **HashiCorp Vault**: Multi-cloud, dynamic secrets, PKI, transit encryption. Best for complex or hybrid environments.
+
+### SDK Access Patterns
+
+**Principle:** Always fetch secrets at startup or via sidecar — never bake into images or config files.
+
+```python
+# AWS Secrets Manager pattern
+import boto3, json
+
+def get_secret(secret_name, region="us-east-1"):
+    client = boto3.client("secretsmanager", region_name=region)
+    response = client.get_secret_value(SecretId=secret_name)
+    return json.loads(response["SecretString"])
+```
+
+```python
+# GCP Secret Manager pattern
+from google.cloud import secretmanager
+
+def get_secret(project_id, secret_id, version="latest"):
+    client = secretmanager.SecretManagerServiceClient()
+    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
+    response = client.access_secret_version(request={"name": name})
+    return response.payload.data.decode("UTF-8")
+```
+
+```python
+# Azure Key Vault pattern
+from azure.identity import DefaultAzureCredential
+from azure.keyvault.secrets import SecretClient
+
+def get_secret(vault_url, secret_name):
+    credential = DefaultAzureCredential()
+    client = SecretClient(vault_url=vault_url, credential=credential)
+    return client.get_secret(secret_name).value
+```
+
+---
+
+## Secret Rotation Workflows
+
+### Rotation Strategy by Secret Type
+
+| Secret Type | Rotation Frequency | Method | Downtime Risk |
+|-------------|-------------------|--------|---------------|
+| Database passwords | 30 days | Dual-account swap | Zero (A/B rotation) |
+| API keys | 90 days | Generate new, deprecate old | Zero (overlap window) |
+| TLS certificates | 60 days before expiry | ACME or Vault PKI | Zero (graceful reload) |
+| SSH keys | 90 days | Vault-signed certificates | Zero (CA-based) |
+| Service tokens | 24 hours | Dynamic generation | Zero (short-lived) |
+| Encryption keys | 90 days | Key versioning (rewrap) | Zero (version coexistence) |
+
+### Database Credential Rotation (Dual-Account)
+
+1. Two database accounts exist: `app_user_a` and `app_user_b`
+2. Application currently uses `app_user_a`
+3. The rotation job changes the `app_user_b` password and updates the secret store
+4. Application switches to `app_user_b` on next credential fetch
+5. After grace period, `app_user_a` password is rotated
+6. Cycle repeats
+
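The dual-account cycle above can be sketched as follows. This is a minimal illustration, not the skill's `rotation_planner.py`: the `store` dict layout and the `set_db_password` callback are hypothetical stand-ins for a real secret store and a real database call.

```python
import secrets

ACCOUNTS = ("app_user_a", "app_user_b")


def rotate_dual_account(store, set_db_password):
    """Rotate the idle account's password, then make that account active.

    store: dict with "active" (account name) and "accounts" ({name: password});
           a hypothetical in-memory stand-in for the secret store.
    set_db_password: callable(name, password) that applies the change in the database.
    """
    active = store["active"]
    idle = ACCOUNTS[1] if active == ACCOUNTS[0] else ACCOUNTS[0]
    new_password = secrets.token_urlsafe(32)
    set_db_password(idle, new_password)      # change the password in the database first
    store["accounts"][idle] = new_password   # then publish it to the secret store
    store["active"] = idle                   # consumers switch on their next fetch
    return idle
```

The previously active account keeps its old password until the next cycle, which is what gives running instances a grace period.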
+### API Key Rotation (Overlap Window)
+
+1. Generate new API key with provider
+2. Store new key in secret store as `current`, move old to `previous`
+3. Deploy applications — they read `current`
+4. After all instances have restarted (or the TTL has expired), confirm via monitoring that the old key has zero usage
+5. Revoke `previous`
+
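A minimal sketch of the overlap window, assuming the secret store can be modelled as a dict with `current` and `previous` slots. The `usage_count` and `revoke` callbacks are hypothetical hooks for the monitoring system and the provider API.

```python
def promote_key(store, new_key):
    """Store new_key as `current` and keep the old key as `previous`."""
    store["previous"] = store.get("current")
    store["current"] = new_key
    return store


def revoke_previous(store, usage_count, revoke):
    """Revoke the previous key only once monitoring shows no remaining usage.

    usage_count: callable(key) returning recent request count for that key.
    revoke: callable(key) that revokes the key at the provider.
    Returns True if a key was revoked, False otherwise.
    """
    old = store.get("previous")
    if old is None:
        return False          # nothing left to revoke
    if usage_count(old) > 0:
        return False          # still in use, keep the overlap window open
    revoke(old)
    store["previous"] = None
    return True
```

Gating revocation on the usage check keeps the window open for as long as any instance still presents the old key.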
+---
+
+## Dynamic Secrets
+
+Dynamic secrets are generated on-demand with automatic expiration. Prefer dynamic secrets over static credentials wherever possible.
+
+### Database Dynamic Credentials (Vault)
+
+```hcl
+# Configure database engine
+vault write database/config/postgres \
+  plugin_name=postgresql-database-plugin \
+  connection_url="postgresql://{{username}}:{{password}}@db.example.com:5432/app" \
+  allowed_roles="app-readonly,app-readwrite" \
+  username="vault_admin" \
+  password="<admin-password>"
+
+# Create role with TTL
+vault write database/roles/app-readonly \
+  db_name=postgres \
+  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
+  default_ttl=1h \
+  max_ttl=24h
+```
+
+### Cloud IAM Dynamic Credentials
+
+Vault can generate short-lived AWS IAM credentials, Azure service principal passwords, or GCP service account keys — eliminating long-lived cloud credentials entirely.
+
+### SSH Certificate Authority
+
+Replace SSH key distribution with a Vault-signed certificate model:
+
+1. Vault acts as SSH CA
+2. Users/machines request signed certificates with short TTL (30 min)
+3. SSH servers trust the CA public key — no `authorized_keys` management
+4. Certificates expire automatically — no revocation needed for normal operations
+
+---
+
+## Audit Logging
+
+### What to Log
+
+| Event | Priority | Retention |
+|-------|----------|-----------|
+| Secret read access | HIGH | 1 year minimum |
+| Secret creation/update | HIGH | 1 year minimum |
+| Auth method login | MEDIUM | 90 days |
+| Policy changes | CRITICAL | 2 years (compliance) |
+| Failed access attempts | CRITICAL | 1 year |
+| Token creation/revocation | MEDIUM | 90 days |
+| Seal/unseal operations | CRITICAL | Indefinite |
+
+### Anomaly Detection Signals
+
+- Secret accessed from new IP/CIDR range
+- Access volume spike (>3x baseline for a path)
+- Off-hours access for human auth methods
+- Service accessing secrets outside its policy scope (denied requests)
+- Multiple failed auth attempts from single source
+- Token created with unusually long TTL
+
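As an illustration of the access-volume signal above, here is a minimal sketch. It assumes audit events have already been parsed into dicts with a `path` field and that a per-path baseline count is available. Both shapes are hypothetical, and this is not the logic of `audit_log_analyzer.py`.

```python
from collections import Counter


def find_volume_spikes(events, baseline, factor=3.0):
    """Return {path: count} for paths whose count exceeds factor * baseline.

    events: iterable of dicts with a "path" key (hypothetical parsed audit records).
    baseline: {path: expected count for the same time window}.
    Paths without a baseline are skipped; a real tool may want to flag them instead.
    """
    counts = Counter(event["path"] for event in events)
    return {
        path: count
        for path, count in counts.items()
        if path in baseline and count > factor * baseline[path]
    }
```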
+### Compliance Reporting
+
+Generate periodic reports covering:
+
+1. **Access inventory** — Which identities accessed which secrets, when
+2. **Rotation compliance** — Secrets overdue for rotation
+3. **Policy drift** — Policies modified since last review
+4. **Orphaned secrets** — Secrets with no recent access (>90 days)
+
+Use `audit_log_analyzer.py` to parse Vault or cloud audit logs for these signals.
+
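The rotation-compliance and orphaned-secret checks can be sketched over a simple inventory. The record fields (`name`, `last_rotated`, `last_accessed`, `max_age_days`) are assumptions made for illustration and are not the input format of the bundled scripts.

```python
from datetime import datetime, timedelta, timezone


def overdue_for_rotation(inventory, now=None):
    """Names of secrets whose age since last rotation exceeds max_age_days."""
    now = now or datetime.now(timezone.utc)
    return [
        s["name"]
        for s in inventory
        if now - s["last_rotated"] > timedelta(days=s["max_age_days"])
    ]


def orphaned(inventory, now=None, idle_days=90):
    """Names of secrets with no access in more than idle_days."""
    now = now or datetime.now(timezone.utc)
    return [
        s["name"]
        for s in inventory
        if now - s["last_accessed"] > timedelta(days=idle_days)
    ]
```

Passing `now` explicitly keeps report runs reproducible, which helps when the same report has to be regenerated for an auditor.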
+---
+
+## Emergency Procedures
+
+### Secret Leak Response (Immediate)
+
+**Time target: Contain within 15 minutes of detection.**
+
+1. **Identify scope** — Which secret(s) leaked, where (repo, log, error message, third party)
+2. **Revoke immediately** — Rotate the compromised credential at the source (provider API, Vault, cloud SM)
+3. **Invalidate tokens** — Revoke all Vault tokens that accessed the leaked secret
+4. **Audit blast radius** — Query audit logs for usage of the compromised secret in the exposure window
+5. **Notify stakeholders** — Security team, affected service owners, compliance (if PII/regulated data)
+6. **Post-mortem** — Document root cause, update controls to prevent recurrence
+
+### Vault Seal Operations
+
+**When to seal:** Active security incident affecting Vault infrastructure, suspected key compromise.
+
+**Sealing** stops all Vault operations. Use only as last resort.
+
+**Unseal procedure:**
+1. Shamir seal: gather a quorum of unseal key holders (the Shamir threshold)
+2. Auto-unseal: confirm the KMS key is accessible instead
+3. Unseal via `vault operator unseal` (Shamir) or restart the node (auto-unseal)
+4. Verify audit devices reconnected
+5. Check active leases and token validity
+
+See `references/emergency_procedures.md` for complete playbooks.
+
+---
+
+## CI/CD Integration
+
+### Vault Agent Sidecar (Kubernetes)
+
+Vault Agent runs alongside application pods, handles authentication and secret rendering:
+
+```yaml
+# Pod annotation for Vault Agent Injector
+annotations:
+  vault.hashicorp.com/agent-inject: "true"
+  vault.hashicorp.com/role: "api-server"
+  vault.hashicorp.com/agent-inject-secret-db: "database/creds/app-readonly"
+  vault.hashicorp.com/agent-inject-template-db: |
+    {{- with secret "database/creds/app-readonly" -}}
+    postgresql://{{ .Data.username }}:{{ .Data.password }}@db:5432/app
+    {{- end }}
+```
+
+### External Secrets Operator (Kubernetes)
+
+For teams preferring declarative GitOps over agent sidecars:
+
+```yaml
+apiVersion: external-secrets.io/v1beta1
+kind: ExternalSecret
+metadata:
+  name: api-credentials
+spec:
+  refreshInterval: 1h
+  secretStoreRef:
+    name: vault-backend
+    kind: ClusterSecretStore
+  target:
+    name: api-credentials
+  data:
+    - secretKey: api-key
+      remoteRef:
+        key: secret/data/production/api
+        property: key
+```
+
+### GitHub Actions OIDC
+
+Eliminate long-lived secrets in CI by using OIDC federation:
+
+```yaml
+- name: Authenticate to Vault
+  uses: hashicorp/vault-action@v2
+  with:
+    url: https://vault.example.com
+    method: jwt
+    role: github-ci
+    jwtGithubAudience: https://vault.example.com
+    secrets: |
+      secret/data/ci/deploy api_key | DEPLOY_API_KEY ;
+      secret/data/ci/deploy db_password | DB_PASSWORD
+```
+
+---
+
+## Anti-Patterns
+
+| Anti-Pattern | Risk | Correct Approach |
+|-------------|------|-----------------|
+| Hardcoded secrets in source code | Leak via repo, logs, error output | Fetch from secret store at runtime |
+| Long-lived static tokens (>30 days) | Stale credentials, no accountability | Dynamic secrets or short TTL + rotation |
+| Shared service accounts | No audit trail per consumer | Per-service identity with unique credentials |
+| No rotation policy | Compromised creds persist indefinitely | Automated rotation on schedule |
+| Secrets in environment variables on CI | Visible in build logs, process table | Vault Agent or OIDC-based injection |
+| Single unseal key holder | Bus factor of 1, recovery blocked | Shamir split (3-of-5) or auto-unseal |
+| No audit device configured | Zero visibility into access | Dual audit devices (file + syslog) |
+| Wildcard policies (`path "*"`) | Over-permissioned, violates least privilege | Explicit path-based policies per service |
+
+---
+
+## Tools
+
+| Script | Purpose |
+|--------|---------|
+| `vault_config_generator.py` | Generate Vault policy and auth config from application requirements |
+| `rotation_planner.py` | Create rotation schedule from a secret inventory file |
+| `audit_log_analyzer.py` | Analyze audit logs for anomalies and compliance gaps |
+
+---
+
+## Cross-References
+
+- **env-secrets-manager** — Local `.env` file hygiene, leak detection, drift awareness
+- **senior-secops** — Security operations, incident response, threat modeling
+- **ci-cd-pipeline-builder** — Pipeline design where secrets are consumed
+- **docker-development** — Container secret injection patterns
+- **helm-chart-builder** — Kubernetes secret management in Helm charts
--- a/engineering/secrets-vault-manager/references/cloud_secret_stores.md
+++ b/engineering/secrets-vault-manager/references/cloud_secret_stores.md
@@ -0,0 +1,354 @@
+# Cloud Secret Store Reference
+
+## Provider Comparison
+
+### Feature Matrix
+
+| Feature | AWS Secrets Manager | Azure Key Vault | GCP Secret Manager |
+|---------|--------------------|-----------------|--------------------|
+| **Secret types** | String, binary | Secrets, keys, certificates | String, binary |
+| **Max secret size** | 64 KB | 25 KB (secret), 200 KB (cert) | 64 KB |
+| **Versioning** | Automatic (all versions) | Automatic (new version on each write) | Automatic |
+| **Rotation** | Built-in Lambda rotation | Custom via Functions/Logic Apps | Custom via Cloud Functions |
+| **Encryption** | AWS KMS (default or CMK) | HSM-backed (FIPS 140-2 L2) | Google-managed or CMEK |
+| **Cross-region** | Replication to multiple regions | Geo-redundant by SKU | Replication supported |
+| **Access control** | IAM + resource-based policies | RBAC + access policies | IAM bindings |
+| **Audit** | CloudTrail | Azure Monitor + Diagnostics | Cloud Audit Logs |
+| **Secret references** | ARN | Vault URI + secret name | Resource name |
+| **Cost model** | $0.40/secret/mo + $0.05/10K calls | $0.03/10K ops (Standard) | $0.06/active version/mo + $0.03/10K access ops |
+| **Free tier** | No | No | 6 active versions free |
+
+### Decision Guide
+
+**Choose AWS Secrets Manager when:**
+- Fully on AWS
+- Need native RDS/Aurora/Redshift rotation
+- Using ECS/EKS with native AWS IAM integration
+- Cross-account secret sharing via resource policies
+
+**Choose Azure Key Vault when:**
+- Azure-primary workloads
+- Certificate lifecycle management is critical (built-in CA integration)
+- Need HSM-backed key protection (Premium SKU)
+- Azure AD conditional access integration required
+
+**Choose GCP Secret Manager when:**
+- GCP-primary workloads
+- Using GKE with Workload Identity
+- Want simplest API surface (few concepts, fast to integrate)
+- Cost-sensitive (generous free tier)
+
+**Choose HashiCorp Vault when:**
+- Multi-cloud or hybrid environments
+- Dynamic secrets (database, cloud IAM, SSH) are primary use case
+- Need transit encryption, PKI, or SSH CA
+- Regulatory requirement for self-hosted secret management
+
+## AWS Secrets Manager
+
+### Access Patterns
+
+```python
+import boto3
+import json
+from botocore.exceptions import ClientError
+
+def get_secret(secret_name, region="us-east-1"):
+    """Retrieve secret from AWS Secrets Manager."""
+    client = boto3.client("secretsmanager", region_name=region)
+    try:
+        response = client.get_secret_value(SecretId=secret_name)
+    except ClientError as e:
+        code = e.response["Error"]["Code"]
+        if code == "ResourceNotFoundException":
+            raise ValueError(f"Secret {secret_name} not found")
+        elif code == "DecryptionFailureException":
+            raise RuntimeError("KMS decryption failed — check key permissions")
+        raise
+    if "SecretString" in response:
+        return json.loads(response["SecretString"])
+    return response["SecretBinary"]
+```
+
+### Rotation with Lambda
+
+```python
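+# rotation_lambda.py: skeleton for custom rotation.
+# generate_password, get_secret_version, apply_credentials, test_connection,
+# and get_current_version are placeholders to implement for the target service.
+import json
+
+import boto3
+
+
+def lambda_handler(event, context):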
+    secret_id = event["SecretId"]
+    step = event["Step"]
+    token = event["ClientRequestToken"]
+    client = boto3.client("secretsmanager")
+
+    if step == "createSecret":
+        # Generate new credentials
+        new_password = generate_password()
+        client.put_secret_value(
+            SecretId=secret_id,
+            ClientRequestToken=token,
+            SecretString=json.dumps({"password": new_password}),
+            VersionStages=["AWSPENDING"],
+        )
+    elif step == "setSecret":
+        # Apply new credentials to the target service
+        pending = get_secret_version(client, secret_id, "AWSPENDING", token)
+        apply_credentials(pending)
+    elif step == "testSecret":
+        # Verify new credentials work
+        pending = get_secret_version(client, secret_id, "AWSPENDING", token)
+        test_connection(pending)
+    elif step == "finishSecret":
+        # Mark AWSPENDING as AWSCURRENT
+        client.update_secret_version_stage(
+            SecretId=secret_id,
+            VersionStage="AWSCURRENT",
+            MoveToVersionId=token,
+            RemoveFromVersionId=get_current_version(client, secret_id),
+        )
+```
+
+### IAM Policy for Secret Access
+
+```json
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Effect": "Allow",
+      "Action": ["secretsmanager:GetSecretValue"],
+      "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:production/api/*",
+      "Condition": {
+        "StringEquals": {
+          "aws:RequestedRegion": "us-east-1"
+        }
+      }
+    }
+  ]
+}
+```
+
+### Cross-Account Access
+
+```json
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Effect": "Allow",
+      "Principal": {"AWS": "arn:aws:iam::987654321098:role/shared-secret-reader"},
+      "Action": "secretsmanager:GetSecretValue",
+      "Resource": "*",
+      "Condition": {
+        "ForAnyValue:StringEquals": {
+          "secretsmanager:VersionStage": "AWSCURRENT"
+        }
+      }
+    }
+  ]
+}
+```
+
+## Azure Key Vault
+
+### Access Patterns
+
+```python
+from azure.identity import DefaultAzureCredential, ManagedIdentityCredential
+from azure.keyvault.secrets import SecretClient
+
+def get_secret(vault_url, secret_name, use_managed_identity=True):
+    """Retrieve secret from Azure Key Vault."""
+    if use_managed_identity:
+        credential = ManagedIdentityCredential()
+    else:
+        credential = DefaultAzureCredential()
+    client = SecretClient(vault_url=vault_url, credential=credential)
+    return client.get_secret(secret_name).value
+
+def list_secrets(vault_url):
+    """List all secret names (not values)."""
+    credential = DefaultAzureCredential()
+    client = SecretClient(vault_url=vault_url, credential=credential)
+    return [s.name for s in client.list_properties_of_secrets()]
+```
+
+### RBAC vs Access Policies
+
+**RBAC (recommended):**
+- Uses Azure RBAC built-in roles (`Key Vault Secrets User`, `Key Vault Secrets Officer`)
+- Managed at subscription/resource group/vault level
+- Role assignment changes are audited in the Azure Activity Log
+
+**Access Policies (legacy):**
+- Per-vault configuration
+- Object ID based
+- No inheritance from resource group
+
+```bash
+# Assign RBAC role
+az role assignment create \
+  --role "Key Vault Secrets User" \
+  --assignee <service-principal-id> \
+  --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault>
+```
+
+### Certificate Management
+
+Azure Key Vault has first-class certificate management with automatic renewal:
+
+```bash
+# Create certificate with auto-renewal
+az keyvault certificate create \
+  --vault-name my-vault \
+  --name api-tls \
+  --policy @cert-policy.json
+
+# Contents of cert-policy.json (a separate file, not shell commands)
+{
+  "issuerParameters": {"name": "Self"},
+  "keyProperties": {"keyType": "RSA", "keySize": 2048},
+  "lifetimeActions": [
+    {"action": {"actionType": "AutoRenew"}, "trigger": {"daysBeforeExpiry": 30}}
+  ],
+  "x509CertificateProperties": {
+    "subject": "CN=api.example.com",
+    "validityInMonths": 12
+  }
+}
+```
+
+## GCP Secret Manager
+
+### Access Patterns
+
+```python
+from google.cloud import secretmanager
+
+def get_secret(project_id, secret_id, version="latest"):
+    """Retrieve secret from GCP Secret Manager."""
+    client = secretmanager.SecretManagerServiceClient()
+    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
+    response = client.access_secret_version(request={"name": name})
+    return response.payload.data.decode("UTF-8")
+
+def create_secret(project_id, secret_id, secret_value):
+    """Create a new secret with initial version."""
+    client = secretmanager.SecretManagerServiceClient()
+    parent = f"projects/{project_id}"
+
+    # Create the secret resource
+    secret = client.create_secret(
+        request={
+            "parent": parent,
+            "secret_id": secret_id,
+            "secret": {"replication": {"automatic": {}}},
+        }
+    )
+
+    # Add a version with the secret value
+    client.add_secret_version(
+        request={
+            "parent": secret.name,
+            "payload": {"data": secret_value.encode("UTF-8")},
+        }
+    )
+    return secret.name
+```
+
+### Workload Identity for GKE
+
+Eliminate service account key files by binding Kubernetes service accounts to GCP IAM:
+
+```bash
+# Create IAM binding
+gcloud iam service-accounts add-iam-policy-binding \
+  secret-accessor@my-project.iam.gserviceaccount.com \
+  --role roles/iam.workloadIdentityUser \
+  --member "serviceAccount:my-project.svc.id.goog[namespace/ksa-name]"
+
+# Annotate Kubernetes service account
+kubectl annotate serviceaccount ksa-name \
+  --namespace namespace \
+  iam.gke.io/gcp-service-account=secret-accessor@my-project.iam.gserviceaccount.com
+```
+
+### IAM Policy
+
+```bash
+# Grant secret accessor role to a service account
+gcloud secrets add-iam-policy-binding my-secret \
+  --member="serviceAccount:my-app@my-project.iam.gserviceaccount.com" \
+  --role="roles/secretmanager.secretAccessor"
+```
+
+## Cross-Cloud Patterns
+
+### Abstraction Layer
+
+When operating multi-cloud, create a thin abstraction that normalizes secret access:
+
+```python
+# secret_client.py — cross-cloud abstraction
+class SecretClient:
+    def __init__(self, provider, **kwargs):
+        if provider == "aws":
+            self._client = AWSSecretClient(**kwargs)
+        elif provider == "azure":
+            self._client = AzureSecretClient(**kwargs)
+        elif provider == "gcp":
+            self._client = GCPSecretClient(**kwargs)
+        elif provider == "vault":
+            self._client = VaultSecretClient(**kwargs)
+        else:
+            raise ValueError(f"Unknown provider: {provider}")
+
+    def get(self, key):
+        return self._client.get(key)
+
+    def set(self, key, value):
+        return self._client.set(key, value)
+```
+
+### Migration Strategy
+
+When migrating between providers:
+
+1. **Dual-write phase** — Write to both old and new store simultaneously
+2. **Dual-read phase** — Read from new store, fallback to old
+3. **Cut-over** — Read exclusively from new store
+4. **Cleanup** — Remove secrets from old store after grace period
+
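A minimal sketch of the dual-write and dual-read phases, with both stores modelled as plain dicts. A real implementation would wrap the provider SDK clients; the function names here are hypothetical.

```python
def write_both(key, value, new_store, old_store):
    """Dual-write phase: keep the old store authoritative while filling the new one."""
    old_store[key] = value
    new_store[key] = value


def read_with_fallback(key, new_store, old_store):
    """Dual-read phase: prefer the new store, fall back to the old one on a miss.

    Returns (value, source) so callers can log how often the fallback is hit.
    """
    try:
        return new_store[key], "new"
    except KeyError:
        return old_store[key], "old"
```

Tracking the `source` value gives a concrete cut-over signal: once reads stop falling back to the old store, the exclusive-read phase is safe to start.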
+### Secret Synchronization
+
+For hybrid setups (e.g., Vault as primary, cloud SM for specific workloads):
+
+- Use Vault's cloud secret engines to generate cloud-native credentials dynamically
+- Or use External Secrets Operator to sync from Vault into cloud-native stores
+- Never manually copy secrets between stores — always automate
+
+## Caching and Performance
+
+### Client-Side Caching
+
+Caching support differs by provider:
+
+- **AWS:** `aws-secretsmanager-caching-python` — caches with configurable TTL
+- **Azure:** Built-in HTTP caching in SDK, or use Azure App Configuration
+- **GCP:** No official caching library — implement in-process cache with TTL
+
+### Caching Rules
+
+1. Cache TTL should be shorter than rotation period (e.g., cache 5 min if rotating every 30 days)
+2. Implement cache invalidation on secret version change events
+3. Never cache secrets to disk — in-memory only
+4. Log cache hits/misses for debugging rotation issues
+
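The rules above can be illustrated with a small in-memory TTL cache. This is a sketch under stated assumptions: `fetch` is a hypothetical callable that retrieves a secret by name from whichever store is in use, and the clock is injectable so expiry can be tested without sleeping.

```python
import time


class TTLSecretCache:
    """In-memory cache for secret values; nothing is written to disk."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch          # callable(name) -> secret value
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # name -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = self._fetch(name)
        self._entries[name] = (value, now + self._ttl)
        return value

    def invalidate(self, name):
        """Drop a cached entry, e.g. on a secret version change event."""
        self._entries.pop(name, None)
```

The `hits` and `misses` counters cover rule 4. Calling `invalidate` from a rotation event handler covers rule 2.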
+## Compliance Mapping
+
+| Requirement | AWS SM | Azure KV | GCP SM | Vault |
+|------------|--------|----------|--------|-------|
+| SOC 2 audit trail | CloudTrail | Monitor logs | Audit Logs | Audit device |
+| HIPAA encryption | KMS (BAA) | HSM (BAA) | CMEK (BAA) | Auto-encrypt |
+| PCI DSS key mgmt | KMS compliance | Premium HSM | CMEK | Transit engine |
+| GDPR data residency | Region selection | Region selection | Region selection | Self-hosted |
+| ISO 27001 | Certified | Certified | Certified | Self-certify |
--- a/engineering/secrets-vault-manager/references/emergency_procedures.md
+++ b/engineering/secrets-vault-manager/references/emergency_procedures.md
@@ -0,0 +1,280 @@
+# Emergency Procedures Reference
+
+## Secret Leak Response Playbook
+
+### Severity Classification
+
+| Severity | Definition | Response Time | Example |
+|----------|-----------|---------------|---------|
+| **P0 — Critical** | Production credentials exposed publicly | Immediate (15 min) | Database password in public GitHub repo |
+| **P1 — High** | Internal credentials exposed beyond intended scope | 1 hour | API key in build logs accessible to wider org |
+| **P2 — Medium** | Non-production credentials exposed | 4 hours | Staging DB password in internal wiki |
+| **P3 — Low** | Expired or limited-scope credential exposed | 24 hours | Rotated API key found in old commit history |
+
+### P0/P1 Response Procedure
+
+**Phase 1: Contain (0-15 minutes)**
+
+1. **Identify the leaked secret**
+   - What credential was exposed? (type, scope, permissions)
+   - Where was it exposed? (repo, log, error page, third-party service)
+   - When was it first exposed? (commit timestamp, log timestamp)
+   - Is the exposure still active? (repo public? log accessible?)
+
+2. **Revoke immediately**
+   - Database password: `ALTER ROLE app_user WITH PASSWORD 'new_password';`
+   - API key: Regenerate via provider console/API
+   - Vault token: `vault token revoke <token>`
+   - AWS access key: `aws iam delete-access-key --access-key-id <key>`
+   - Cloud service account: Delete and recreate key
+   - TLS certificate: Revoke via CA, generate new certificate
+
+3. **Remove exposure**
+   - Public repo: Remove file, force-push to remove from history, request GitHub cache purge
+   - Build logs: Delete log artifacts, rotate CI/CD secrets
+   - Error page: Deploy fix to suppress secret in error output
+   - Third-party: Contact vendor for log purge if applicable
+
+4. **Deploy new credentials**
+   - Update secret store with rotated credential
+   - Restart affected services to pick up new credential
+   - Verify services are healthy with new credential
+
+**Phase 2: Assess (15-60 minutes)**
+
+5. **Audit blast radius**
+   - Query Vault/cloud SM audit logs for the compromised credential
+   - Check for unauthorized usage during the exposure window
+   - Review network logs for suspicious connections from unknown IPs
+   - Check if the compromised credential grants access to other secrets (privilege escalation)
+
+6. **Notify stakeholders**
+   - Security team (always)
+   - Service owners for affected systems
+   - Compliance team if regulated data was potentially accessed
+   - Legal if customer data may have been compromised
+   - Executive leadership for P0 incidents
+
+**Phase 3: Recover (1-24 hours)**
+
+7. **Rotate adjacent credentials**
+   - If the leaked credential could access other secrets, rotate those too
+   - If a Vault token leaked, check what policies it had — rotate everything accessible
+
+8. **Harden against recurrence**
+   - Add pre-commit hook to detect secrets (e.g., `gitleaks`, `detect-secrets`)
+   - Review CI/CD pipeline for secret masking
+   - Audit who has access to the source of the leak
+
+**Phase 4: Post-Mortem (24-72 hours)**
+
+9. **Document incident**
+   - Timeline of events
+   - Root cause analysis
+   - Impact assessment
+   - Remediation actions taken
+   - Preventive measures added
+
+### Response Communication Template
+
+```
+SECURITY INCIDENT — SECRET EXPOSURE
+Severity: P0/P1
+Time detected: YYYY-MM-DD HH:MM UTC
+Secret type: [database password / API key / token / certificate]
+Exposure vector: [public repo / build log / error output / other]
+Status: [CONTAINED / INVESTIGATING / RESOLVED]
+
+Immediate actions taken:
+- [ ] Credential revoked at source
+- [ ] Exposure removed
+- [ ] New credential deployed
+- [ ] Services verified healthy
+- [ ] Audit log review in progress
+
+Blast radius assessment: [PENDING / COMPLETE — no unauthorized access / COMPLETE — unauthorized access detected]
+
+Next update: [time]
+Incident commander: [name]
+```
+
+## Vault Seal/Unseal Procedures
+
+### Understanding Seal Status
+
+Vault uses a **seal** mechanism to protect the encryption key hierarchy. When sealed, Vault cannot decrypt any data or serve any requests.
+
+```
+Sealed State:
+  Vault process running → YES
+  API responding → YES (503 Sealed)
+  Serving secrets → NO
+  All active leases → NOT PROCESSED (TTLs keep elapsing)
+  Audit logging → NO
+
+Unsealed State:
+  Vault process running → YES
+  API responding → YES (200 OK)
+  Serving secrets → YES
+  Active leases → RESUMING
+  Audit logging → YES
+```
+
+### When to Seal Vault (Emergency Only)
+
+Seal Vault when:
+- Active intrusion on Vault infrastructure is confirmed
+- Vault server compromise is suspected (unauthorized root access)
+- Encryption key material may have been extracted
+- Regulatory/legal hold requires immediate data access prevention
+
+**Do NOT seal for:**
+- Routine maintenance (use graceful shutdown instead)
+- Single-node issues in HA cluster (let standby take over)
+- Suspected secret leak (revoke the secret, don't seal Vault)
+
+### Seal Procedure
+
+```bash
+# Seal a single node
+vault operator seal
+
+# Seal all nodes (HA cluster)
+# The seal API only acts on the active node. Stop the standbys first so none
+# is promoted when the leader seals, then seal the leader.
+systemctl stop vault    # run on each standby node
+vault operator seal -address=https://vault-leader:8200
+```
+
+**Impact of sealing:**
+- All active client connections dropped immediately
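+- Token and lease expirations are not processed while sealed; TTLs keep elapsing, so anything that expires in the meantime is revoked after unseal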
+- Applications lose secret access — prepare for cascading failures
+- Monitoring will fire alerts for sealed state
+
+### Unseal Procedure (Shamir Keys)
+
+Requires a quorum of key holders (e.g., 3 of 5).
+
+```bash
+# Each key holder provides their unseal key
+vault operator unseal <key-1>
+vault operator unseal <key-2>
+vault operator unseal <key-3>
+# Vault unseals after reaching threshold
+```
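
The quorum logic above can be checked programmatically while key holders submit their shares. This is a small sketch, assuming a status dictionary with the `sealed`, `t` (threshold), `n` (total shares), and `progress` fields as reported by `vault status -format=json`. The helper name and the sample values are hypothetical.

```python
def keys_still_needed(seal_status):
    """Return how many more unseal keys are required to reach the threshold."""
    if not seal_status.get("sealed", True):
        return 0  # already unsealed
    threshold = seal_status.get("t", 0)
    progress = seal_status.get("progress", 0)
    return max(threshold - progress, 0)


# Hypothetical status after the first of three required key holders has submitted
status = {"sealed": True, "t": 3, "n": 5, "progress": 1}
remaining = keys_still_needed(status)
print(f"{remaining} more unseal key(s) required")
```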
+
+**Operational checklist after unseal:**
+1. Verify health: `vault status` shows `Sealed: false`
+2. Check audit devices: `vault audit list` — confirm all enabled
+3. Check auth methods: `vault auth list`
+4. Verify HA status: `vault operator raft list-peers`
+5. Check lease count: monitor `vault.expire.num_leases`
+6. Verify applications reconnecting (check application logs)
+
+### Unseal Procedure (Auto-Unseal)
+
+If using cloud KMS auto-unseal, Vault unseals automatically on restart:
+
+```bash
+# Restart Vault service
+systemctl restart vault
+
+# Verify unseal (should happen within seconds)
+vault status
+```
+
+**If auto-unseal fails:**
+- Check cloud KMS key permissions (IAM role may have been modified)
+- Check network connectivity to cloud KMS endpoint
+- Check KMS key status (not disabled, not scheduled for deletion)
+- Check Vault logs: `journalctl -u vault -f`
+
+## Mass Credential Rotation Procedure
+
+When a broad compromise requires rotating many credentials simultaneously.
+
+### Pre-Rotation Checklist
+
+- [ ] Identify all credentials in scope
+- [ ] Map credential dependencies (which services use which credentials)
+- [ ] Determine rotation order (databases before applications)
+- [ ] Prepare rollback plan for each credential
+- [ ] Notify all service owners
+- [ ] Schedule maintenance window if zero-downtime not possible
+- [ ] Stage new credentials in secret store (but don't activate yet)
+
+### Rotation Order
+
+1. **Infrastructure credentials** — Database root passwords, cloud IAM admin keys
+2. **Service credentials** — Application database users, API keys
+3. **Integration credentials** — Third-party API keys, webhook secrets
+4. **Human credentials** — Force password reset, revoke SSO sessions
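
The ordering above can be applied to an inventory mechanically. This is a minimal sketch, assuming each credential record carries a `tier` label matching the four groups; the tier names, record fields, and sample entries are hypothetical.

```python
# Lower number rotates first, mirroring the order listed above
ROTATION_TIERS = {"infrastructure": 1, "service": 2, "integration": 3, "human": 4}


def rotation_order(credentials):
    """Sort credentials by tier, then by name for a stable, reviewable plan."""
    last = len(ROTATION_TIERS) + 1  # unknown tiers go to the end
    return sorted(
        credentials,
        key=lambda cred: (ROTATION_TIERS.get(cred.get("tier"), last), cred["name"]),
    )


# Hypothetical inventory for illustration only
inventory = [
    {"name": "stripe-webhook-secret", "tier": "integration"},
    {"name": "app-db-user", "tier": "service"},
    {"name": "postgres-root", "tier": "infrastructure"},
    {"name": "sso-sessions", "tier": "human"},
]
plan = rotation_order(inventory)
for step, cred in enumerate(plan, 1):
    print(step, cred["tier"], cred["name"])
```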
+
+### Rollback Plan
+
+For each credential, document:
+- Previous value (store in sealed emergency envelope or HSM)
+- How to revert (specific command or API call)
+- Verification step (how to confirm old credential works)
+- Maximum time to rollback (SLA)
+
+## Vault Recovery Procedures
+
+### Lost Unseal Keys
+
+If unseal keys are lost and auto-unseal is not configured:
+
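+1. **If Vault is currently unsealed:** Keep it running. If a quorum of keys survives, rekey (`vault operator rekey`) or migrate to auto-unseal right away; both operations need that quorum. If no quorum survives, export the secrets with a privileged token and migrate them to a newly initialized cluster
+2. **If Vault is sealed:** Data is unrecoverable without a key quorum. A Raft snapshot is protected by the same seal, so restoring one only helps if unseal keys that match the snapshot still exist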
+3. **Prevention:** Store unseal keys in separate, secure locations (HSMs, safety deposit boxes). Use auto-unseal for production.
+
+### Raft Cluster Recovery
+
+**Single node failure (cluster still has quorum):**
+```bash
+# Remove failed peer
+vault operator raft remove-peer <failed-node-id>
+
+# Add replacement node
+# (new node joins via retry_join in config)
+```
+
+**Loss of quorum (majority of nodes failed):**
+```bash
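+# On a surviving node: stop Vault, then write a peers.json recovery file
+# into the Raft data directory that lists only the surviving node(s):
+#   <raft-data-path>/raft/peers.json
+#   [{"id": "<node-id>", "address": "<cluster-addr-host>:8201", "non_voter": false}]
+# Start Vault again; it reads peers.json at startup and re-forms the cluster
+systemctl restart vault
+
+# If no node survives, initialize a new cluster, then restore from snapshot
+# (-force is required when the snapshot comes from a different cluster)
+vault operator raft snapshot restore -force /backups/latest.snap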
+```
+
+### Root Token Recovery
+
+If root token is lost (it should be revoked after initial setup):
+
+```bash
+# Generate new root token (requires unseal key quorum)
+vault operator generate-root -init
+# Each key holder provides their key
+vault operator generate-root -nonce=<nonce> <unseal-key>
+# After quorum, decode the encoded token
+vault operator generate-root -decode=<encoded-token> -otp=<otp>
+```
+
+**Best practice:** Generate a root token only when needed, complete the task, then revoke it:
+```bash
+vault token revoke <root-token>
+```
+
+## Incident Severity Escalation Matrix
+
+| Signal | Escalation |
+|--------|-----------|
+| Single secret exposed in internal log | P2 — Rotate secret, add log masking |
+| Secret in public repository (no evidence of use) | P1 — Immediate rotation, history scrub |
+| Secret in public repository (evidence of unauthorized use) | P0 — Full incident response, legal notification |
+| Vault node compromised | P0 — Seal cluster, rotate all accessible secrets |
+| Cloud KMS key compromised | P0 — Create new key, re-encrypt all secrets, rotate all credentials |
+| Audit log gap detected | P1 — Investigate cause, assume worst case for gap period |
+| Multiple failed auth attempts from unknown source | P2 — Block source, investigate, rotate targeted credentials |
--- a/engineering/secrets-vault-manager/references/vault_patterns.md
+++ b/engineering/secrets-vault-manager/references/vault_patterns.md
@@ -0,0 +1,342 @@
+# HashiCorp Vault Architecture & Patterns Reference
+
+## Architecture Overview
+
+Vault operates as a centralized secret management service with a client-server model. All secrets are encrypted at rest and in transit. The seal/unseal mechanism protects the master encryption key.
+
+### Core Components
+
+```
+┌─────────────────────────────────────────────────┐
+│                   Vault Cluster                  │
+│  ┌───────────┐  ┌───────────┐  ┌───────────┐   │
+│  │  Leader    │  │ Standby   │  │ Standby   │   │
+│  │  (active)  │  │ (forward) │  │ (forward) │   │
+│  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘   │
+│        │               │               │         │
+│  ┌─────┴───────────────┴───────────────┴─────┐   │
+│  │            Raft Storage Backend            │   │
+│  └───────────────────────────────────────────┘   │
+│                                                   │
+│  ┌──────────┐  ┌──────────┐  ┌──────────────┐   │
+│  │ Auth     │  │ Secret   │  │ Audit        │   │
+│  │ Methods  │  │ Engines  │  │ Devices      │   │
+│  └──────────┘  └──────────┘  └──────────────┘   │
+└─────────────────────────────────────────────────┘
+```
+
+### Storage Backend Selection
+
+| Backend | HA Support | Operational Complexity | Recommendation |
+|---------|-----------|----------------------|----------------|
+| Integrated Raft | Yes | Low | **Default choice** — no external dependencies |
+| Consul | Yes | Medium | Legacy — use Raft unless already running Consul |
+| S3/GCS/Azure Blob | GCS only | Low | Dev/test only; S3 and Azure Blob have no HA |
+| PostgreSQL/MySQL | Optional (`ha_enabled`) | Medium | Not recommended: added external dependency |
+
+## High Availability Setup
+
+### Raft Cluster Configuration
+
+Minimum 3 nodes for production (tolerates 1 failure). 5 nodes for critical workloads (tolerates 2 failures).
+
+```hcl
+# vault-config.hcl (per node)
+storage "raft" {
+  path    = "/opt/vault/data"
+  node_id = "vault-1"
+
+  retry_join {
+    leader_api_addr = "https://vault-2.internal:8200"
+  }
+  retry_join {
+    leader_api_addr = "https://vault-3.internal:8200"
+  }
+}
+
+listener "tcp" {
+  address       = "0.0.0.0:8200"
+  tls_cert_file = "/opt/vault/tls/vault.crt"
+  tls_key_file  = "/opt/vault/tls/vault.key"
+}
+
+api_addr     = "https://vault-1.internal:8200"
+cluster_addr = "https://vault-1.internal:8201"
+```
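
The node counts above follow from Raft's majority rule. As a quick worked check (a sketch, with hypothetical function names):

```python
def raft_quorum(nodes):
    """Smallest majority of a cluster with `nodes` voting members."""
    return nodes // 2 + 1


def failure_tolerance(nodes):
    """Number of nodes that can fail while a majority remains."""
    return nodes - raft_quorum(nodes)


for size in (3, 4, 5):
    print(f"{size} nodes: quorum {raft_quorum(size)}, tolerates {failure_tolerance(size)} failure(s)")
```

An even-sized cluster tolerates no more failures than the next smaller odd size, which is why 3 and 5 are the usual choices.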
+
+### Auto-Unseal with AWS KMS
+
+Eliminates manual unseal key management. Vault encrypts its master key with the KMS key.
+
+```hcl
+seal "awskms" {
+  region     = "us-east-1"
+  kms_key_id = "alias/vault-unseal"
+}
+```
+
+**Requirements:**
+- IAM role with `kms:Encrypt`, `kms:Decrypt`, `kms:DescribeKey` permissions
+- KMS key must be in the same region or accessible cross-region
+- KMS key should have restricted access — only Vault nodes
+
+### Auto-Unseal with Azure Key Vault
+
+```hcl
+seal "azurekeyvault" {
+  tenant_id  = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
+  vault_name = "vault-unseal-kv"
+  key_name   = "vault-unseal-key"
+}
+```
+
+### Auto-Unseal with GCP KMS
+
+```hcl
+seal "gcpckms" {
+  project    = "my-project"
+  region     = "global"
+  key_ring   = "vault-keyring"
+  crypto_key = "vault-unseal-key"
+}
+```
+
+## Namespaces (Enterprise)
+
+Namespaces provide tenant isolation within a single Vault cluster. Each namespace has independent policies, auth methods, and secret engines.
+
+```
+root/
+├── dev/           # Development environment
+│   ├── auth/
+│   └── secret/
+├── staging/       # Staging environment
+│   ├── auth/
+│   └── secret/
+└── production/    # Production environment
+    ├── auth/
+    └── secret/
+```
+
+**OSS alternative:** Use path-based isolation with strict policies. Prefix all paths with environment name (e.g., `secret/data/production/...`).
+
+## Policy Patterns
+
+### Templated Policies
+
+Use identity-based templates for scalable policy management:
+
+```hcl
+# Allow entities to manage their own secrets
+path "secret/data/{{identity.entity.name}}/*" {
+  capabilities = ["create", "read", "update", "delete"]
+}
+
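+# Read shared config for a group the entity belongs to
+# (replace <group-name> with a real group; a bare identity.groups.names is not a valid template parameter)
+path "secret/data/shared/{{identity.groups.names.<group-name>.id}}/*" {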
+  capabilities = ["read"]
+}
+```
+
+### Sentinel Policies (Enterprise)
+
+Enforce governance rules beyond path-based access:
+
+```python
+# Require MFA for production secret writes
+import "mfa"
+
+# Only evaluate the MFA check for writes to production paths
+precond = rule {
+  request.path matches "secret/data/production/.*" and
+  request.operation in ["create", "update", "delete"]
+}
+
+main = rule when precond {
+  mfa.methods.totp.valid
+}
+```
+
+### Policy Hierarchy
+
+1. **Global deny** — Explicit deny on `sys/*`, `auth/token/create-orphan`
+2. **Environment base** — Read access to environment-specific paths
+3. **Service-specific** — Scoped to exact paths the service needs
+4. **Admin override** — Requires MFA, time-limited, audit-heavy
+
+## Secret Engine Configuration
+
+### KV v2 (Versioned Key-Value)
+
+```bash
+# Enable with custom config
+vault secrets enable -path=secret -version=2 kv
+
+# Configure version retention
+vault write secret/config max_versions=10 cas_required=true delete_version_after=90d
+```
+
+**Check-and-Set (CAS):** Prevents accidental overwrites. Client must supply the current version number to update.
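
To make the CAS rule concrete, here is a toy in-memory model of the behaviour. It is not Vault's implementation or client API; the class and method names are hypothetical, and it only mirrors the rule that a write must name the current version (with `0` meaning "the key must not exist yet").

```python
class VersionedSecret:
    """Toy model of a versioned secret guarded by check-and-set."""

    def __init__(self):
        self.versions = []  # index 0 holds version 1, and so on

    @property
    def current_version(self):
        return len(self.versions)

    def write(self, data, cas):
        """Store a new version only if `cas` matches the current version."""
        if cas != self.current_version:
            raise ValueError(
                f"check-and-set mismatch: expected {self.current_version}, got {cas}"
            )
        self.versions.append(dict(data))
        return self.current_version


secret = VersionedSecret()
first = secret.write({"password": "example-1"}, cas=0)   # key does not exist yet
second = secret.write({"password": "example-2"}, cas=1)  # caller saw version 1

try:
    secret.write({"password": "stale-writer"}, cas=1)    # stale view of the secret
    stale_write_rejected = False
except ValueError:
    stale_write_rejected = True
print(first, second, stale_write_rejected)
```

The rejected third write is the case CAS exists for: two writers read the same version, and only the first update is accepted.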
+
+### Database Engine
+
+```bash
+# Enable and configure PostgreSQL
+vault secrets enable database
+
+vault write database/config/postgres \
+  plugin_name=postgresql-database-plugin \
+  connection_url="postgresql://{{username}}:{{password}}@db.internal:5432/app?sslmode=require" \
+  allowed_roles="app-readonly,app-readwrite" \
+  username="vault_admin" \
+  password="INITIAL_PASSWORD"
+
+# Rotate the root password (Vault manages it from now on)
+vault write -f database/rotate-root/postgres
+
+# Create a read-only role
+vault write database/roles/app-readonly \
+  db_name=postgres \
+  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
+  revocation_statements="DROP ROLE IF EXISTS \"{{name}}\";" \
+  default_ttl=1h \
+  max_ttl=24h
+```
+
+### PKI Engine (Certificate Authority)
+
+```bash
+# Enable PKI engine
+vault secrets enable -path=pki pki
+vault secrets tune -max-lease-ttl=87600h pki
+
+# Generate root CA
+vault write -field=certificate pki/root/generate/internal \
+  common_name="Example Root CA" \
+  ttl=87600h > root_ca.crt
+
+# Enable intermediate CA
+vault secrets enable -path=pki_int pki
+vault secrets tune -max-lease-ttl=43800h pki_int
+
+# Generate intermediate CSR
+vault write -field=csr pki_int/intermediate/generate/internal \
+  common_name="Example Intermediate CA" > intermediate.csr
+
+# Sign with root CA
+vault write -field=certificate pki/root/sign-intermediate \
+  csr=@intermediate.csr format=pem_bundle ttl=43800h > intermediate.crt
+
+# Set signed certificate
+vault write pki_int/intermediate/set-signed certificate=@intermediate.crt
+
+# Create role for leaf certificates
+vault write pki_int/roles/web-server \
+  allowed_domains="example.com" \
+  allow_subdomains=true \
+  max_ttl=2160h
+```
+
+### Transit Engine (Encryption-as-a-Service)
+
+```bash
+vault secrets enable transit
+
+# Create encryption key
+vault write -f transit/keys/payment-data \
+  type=aes256-gcm96
+
+# Encrypt data (plaintext must be base64; printf avoids the trailing newline that echo would add)
+vault write transit/encrypt/payment-data \
+  plaintext=$(printf '%s' "sensitive-data" | base64)
+
+# Decrypt data
+vault write transit/decrypt/payment-data \
+  ciphertext="vault:v1:..."
+
+# Rotate key (old versions still decrypt, new encrypts with latest)
+vault write -f transit/keys/payment-data/rotate
+
+# Rewrap ciphertext to latest key version
+vault write transit/rewrap/payment-data \
+  ciphertext="vault:v1:..."
+```
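
Transit accepts and returns plaintext as base64, so callers encode before encrypting and decode after decrypting. The sketch below shows only that encoding step using the standard library; the helper names are hypothetical and no Vault call is made.

```python
import base64


def to_transit_plaintext(value):
    """Base64-encode a string for the `plaintext` field of an encrypt request."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def from_transit_plaintext(encoded):
    """Decode the base64 `plaintext` field returned by a decrypt request."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = to_transit_plaintext("sensitive-data")
decoded = from_transit_plaintext(encoded)
# A stray trailing newline changes the encoded value, and so the data that gets encrypted
with_newline = to_transit_plaintext("sensitive-data\n")
print(decoded == "sensitive-data", encoded != with_newline)
```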
+
+## Performance and Scaling
+
+### Performance Replication (Enterprise)
+
+Primary cluster replicates to secondary clusters in other regions. Secondaries handle read traffic locally.
+
+### Performance Standbys (Enterprise)
+
+Standby nodes serve read requests without forwarding to the leader, reducing leader load.
+
+### Response Wrapping
+
+Wrap sensitive responses in a single-use token — the recipient unwraps exactly once:
+
+```bash
+# Wrap a secret (TTL = 5 minutes); `vault kv` adds the KV v2 data/ prefix itself
+vault kv get -wrap-ttl=5m secret/production/db-creds
+
+# Recipient unwraps
+vault unwrap <wrapping_token>
+```
+
+### Batch Tokens
+
+For high-throughput workloads (Lambda, serverless), use batch tokens instead of service tokens. Batch tokens are not persisted to storage, reducing I/O.
+
+## Monitoring and Health
+
+### Key Metrics
+
+| Metric | Alert Threshold | Source |
+|--------|----------------|--------|
+| `vault.core.unsealed` | 0 (sealed) | Telemetry |
+| `vault.expire.num_leases` | >10,000 | Telemetry |
+| `vault.audit.log_response` | Error rate >1% | Telemetry |
+| `vault.runtime.alloc_bytes` | >80% memory | Telemetry |
+| `vault.raft.leader.lastContact` | >500ms | Telemetry |
+| `vault.token.count` | >50,000 | Telemetry |
+
+### Health Check Endpoint
+
+```bash
+# Returns 200 if initialized, unsealed, and active
+curl -s https://vault.internal:8200/v1/sys/health
+
+# Status codes:
+# 200 — initialized, unsealed, active
+# 429 — unsealed, standby
+# 472 — disaster recovery secondary
+# 473 — performance standby
+# 501 — not initialized
+# 503 — sealed
+```
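
A monitoring probe can translate these status codes into node states. This is a minimal sketch built only from the codes listed above; the state labels, function names, and the choice of which states should page are assumptions to adapt.

```python
HEALTH_STATES = {
    200: "active",
    429: "standby",
    472: "dr-secondary",
    473: "performance-standby",
    501: "uninitialized",
    503: "sealed",
}


def describe_health(status_code):
    """Map a sys/health HTTP status code to a short state label."""
    return HEALTH_STATES.get(status_code, "unknown")


def should_page(status_code):
    """Page on sealed, uninitialized, or unrecognized responses (assumed policy)."""
    return describe_health(status_code) in {"sealed", "uninitialized", "unknown"}


for code in (200, 429, 503):
    print(code, describe_health(code), should_page(code))
```

Treating 429 as healthy matters for load balancer checks in front of standbys, where a plain "non-200 means down" rule would raise false alarms.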
+
+## Disaster Recovery
+
+### Backup
+
+```bash
+# Raft snapshot (includes all data)
+vault operator raft snapshot save backup-$(date +%Y%m%d).snap
+
+# Schedule daily backups via cron
+0 2 * * * /usr/local/bin/vault operator raft snapshot save /backups/vault-$(date +\%Y\%m\%d).snap
+```
+
+### Restore
+
+```bash
+# Restore from snapshot (causes brief outage)
+vault operator raft snapshot restore backup-20260320.snap
+```
+
+### DR Replication (Enterprise)
+
+Secondary cluster in standby. Promote on primary failure:
+
+```bash
+# On DR secondary
+vault operator generate-root -dr-token
+vault write sys/replication/dr/secondary/promote dr_operation_token=<token>
+```
--- a/engineering/secrets-vault-manager/scripts/audit_log_analyzer.py
+++ b/engineering/secrets-vault-manager/scripts/audit_log_analyzer.py
@@ -0,0 +1,330 @@
+#!/usr/bin/env python3
+"""Analyze Vault or cloud secret manager audit logs for anomalies.
+
+Reads JSON-lines or JSON-array audit log files and flags unusual access
+patterns including volume spikes, off-hours access, new source IPs,
+and failed authentication attempts.
+
+Usage:
+    python audit_log_analyzer.py --log-file vault-audit.log --threshold 5
+    python audit_log_analyzer.py --log-file audit.json --threshold 3 --json
+
+Expected log entry format (JSON lines or JSON array):
+{
+  "timestamp": "2026-03-20T14:32:00Z",
+  "type": "request",
+  "auth": {"accessor": "token-abc123", "entity_id": "eid-001", "display_name": "approle-payment-svc"},
+  "request": {"path": "secret/data/production/payment/api-keys", "operation": "read"},
+  "response": {"status_code": 200},
+  "remote_address": "10.0.1.15"
+}
+
+Fields are optional — the analyzer works with whatever is available.
+"""
+
+import argparse
+import json
+import sys
+import textwrap
+from collections import defaultdict
+from datetime import datetime
+
+
+def load_logs(path):
+    """Load audit log entries from file. Supports JSON lines and JSON array."""
+    entries = []
+    try:
+        with open(path, "r") as f:
+            content = f.read().strip()
+    except FileNotFoundError:
+        print(f"ERROR: Log file not found: {path}", file=sys.stderr)
+        sys.exit(1)
+
+    if not content:
+        return entries
+
+    # Try JSON array first
+    if content.startswith("["):
+        try:
+            entries = json.loads(content)
+            return entries
+        except json.JSONDecodeError:
+            pass
+
+    # Try JSON lines
+    for i, line in enumerate(content.split("\n"), 1):
+        line = line.strip()
+        if not line:
+            continue
+        try:
+            entries.append(json.loads(line))
+        except json.JSONDecodeError:
+            print(f"WARNING: Skipping malformed line {i}", file=sys.stderr)
+
+    return entries
+
+
+def extract_fields(entry):
+    """Extract normalized fields from a log entry."""
+    timestamp_raw = entry.get("timestamp", entry.get("time", ""))
+    ts = None
+    if isinstance(timestamp_raw, str) and timestamp_raw:
+        text = timestamp_raw.strip()
+        # Vault writes nanosecond precision, but %f accepts at most 6 digits,
+        # so trim extra fractional digits before parsing.
+        head, dot, tail = text.partition(".")
+        if dot:
+            digits = len(tail) - len(tail.lstrip("0123456789"))
+            text = head + "." + tail[:min(digits, 6)] + tail[digits:]
+        # %z accepts both "Z" and numeric offsets such as "+00:00" (Python 3.7+)
+        for fmt in (
+            "%Y-%m-%dT%H:%M:%S%z",
+            "%Y-%m-%dT%H:%M:%S.%f%z",
+            "%Y-%m-%dT%H:%M:%S",
+            "%Y-%m-%dT%H:%M:%S.%f",
+            "%Y-%m-%d %H:%M:%S",
+        ):
+            try:
+                ts = datetime.strptime(text, fmt)
+                break
+            except ValueError:
+                continue
+
+    auth = entry.get("auth", {})
+    request = entry.get("request", {})
+    response = entry.get("response", {})
+
+    return {
+        "timestamp": ts,
+        "hour": ts.hour if ts else None,
+        "identity": auth.get("display_name", auth.get("entity_id", "unknown")),
+        "path": request.get("path", entry.get("path", "unknown")),
+        "operation": request.get("operation", entry.get("operation", "unknown")),
+        "status_code": response.get("status_code", entry.get("status_code")),
+        "remote_address": entry.get("remote_address", entry.get("source_address", "unknown")),
+        "entry_type": entry.get("type", "unknown"),
+    }
+
+
+def analyze(entries, threshold):
+    """Run anomaly detection across all log entries."""
+    parsed = [extract_fields(e) for e in entries]
+
+    # Counters
+    access_by_identity = defaultdict(int)
+    access_by_path = defaultdict(int)
+    access_by_ip = defaultdict(set)        # identity -> set of IPs
+    ip_to_identities = defaultdict(set)    # IP -> set of identities
+    failed_by_source = defaultdict(int)
+    off_hours_access = []
+    path_by_identity = defaultdict(set)    # identity -> set of paths
+    hourly_distribution = defaultdict(int)
+
+    for p in parsed:
+        identity = p["identity"]
+        path = p["path"]
+        ip = p["remote_address"]
+        status = p["status_code"]
+        hour = p["hour"]
+
+        access_by_identity[identity] += 1
+        access_by_path[path] += 1
+        access_by_ip[identity].add(ip)
+        ip_to_identities[ip].add(identity)
+        path_by_identity[identity].add(path)
+
+        if hour is not None:
+            hourly_distribution[hour] += 1
+
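+        # Failed access (4xx/5xx, or a status of 0); the type check lets a 0
+        # through and skips non-numeric values instead of raising
+        if isinstance(status, int) and (status >= 400 or status == 0):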
+            failed_by_source[f"{identity}@{ip}"] += 1
+
+        # Off-hours: before 6 AM or after 10 PM
+        if hour is not None and (hour < 6 or hour >= 22):
+            off_hours_access.append(p)
+
+    # Build anomalies
+    anomalies = []
+
+    # 1. Volume spikes — identities accessing secrets more than threshold * average
+    if access_by_identity:
+        avg_access = sum(access_by_identity.values()) / len(access_by_identity)
+        spike_threshold = max(threshold * avg_access, threshold)
+        for identity, count in access_by_identity.items():
+            if count >= spike_threshold:
+                anomalies.append({
+                    "type": "volume_spike",
+                    "severity": "HIGH",
+                    "identity": identity,
+                    "access_count": count,
+                    "threshold": round(spike_threshold, 1),
+                    "description": f"Identity '{identity}' made {count} accesses (threshold: {round(spike_threshold, 1)})",
+                })
+
+    # 2. Multi-IP access — single identity from many IPs
+    for identity, ips in access_by_ip.items():
+        if len(ips) >= threshold:
+            anomalies.append({
+                "type": "multi_ip_access",
+                "severity": "MEDIUM",
+                "identity": identity,
+                "ip_count": len(ips),
+                "ips": sorted(ips),
+                "description": f"Identity '{identity}' accessed from {len(ips)} different IPs",
+            })
+
+    # 3. Failed access attempts
+    for source, count in failed_by_source.items():
+        if count >= threshold:
+            anomalies.append({
+                "type": "failed_access",
+                "severity": "HIGH",
+                "source": source,
+                "failure_count": count,
+                "description": f"Source '{source}' had {count} failed access attempts",
+            })
+
+    # 4. Off-hours access
+    if off_hours_access:
+        off_hours_identities = defaultdict(int)
+        for p in off_hours_access:
+            off_hours_identities[p["identity"]] += 1
+
+        for identity, count in off_hours_identities.items():
+            if count >= max(threshold, 2):
+                anomalies.append({
+                    "type": "off_hours_access",
+                    "severity": "MEDIUM",
+                    "identity": identity,
+                    "access_count": count,
+                    "description": f"Identity '{identity}' made {count} accesses outside business hours (before 6 AM / after 10 PM)",
+                })
+
+    # 5. Broad path access — single identity touching many paths
+    for identity, paths in path_by_identity.items():
+        if len(paths) >= threshold * 2:
+            anomalies.append({
+                "type": "broad_access",
+                "severity": "MEDIUM",
+                "identity": identity,
+                "path_count": len(paths),
+                "paths": sorted(paths)[:10],
+                "description": f"Identity '{identity}' accessed {len(paths)} distinct secret paths",
+            })
+
+    # Sort anomalies by severity
+    severity_order = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}
+    anomalies.sort(key=lambda x: severity_order.get(x["severity"], 4))
+
+    # Summary stats
+    summary = {
+        "total_entries": len(entries),
+        "parsed_entries": len(parsed),
+        "unique_identities": len(access_by_identity),
+        "unique_paths": len(access_by_path),
+        "unique_source_ips": len(ip_to_identities),
+        "total_failures": sum(failed_by_source.values()),
+        "off_hours_events": len(off_hours_access),
+        "anomalies_found": len(anomalies),
+    }
+
+    # Top accessed paths
+    top_paths = sorted(access_by_path.items(), key=lambda x: -x[1])[:10]
+
+    return {
+        "summary": summary,
+        "anomalies": anomalies,
+        "top_accessed_paths": [{"path": p, "count": c} for p, c in top_paths],
+        "hourly_distribution": dict(sorted(hourly_distribution.items())),
+    }
+
+
+def print_human(result, threshold):
+    """Print human-readable analysis report."""
+    summary = result["summary"]
+    anomalies = result["anomalies"]
+
+    print("=== Audit Log Analysis Report ===")
+    print(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}")
+    print(f"Anomaly threshold: {threshold}")
+    print()
+
+    print("--- Summary ---")
+    print(f"  Total log entries:     {summary['total_entries']}")
+    print(f"  Unique identities:     {summary['unique_identities']}")
+    print(f"  Unique secret paths:   {summary['unique_paths']}")
+    print(f"  Unique source IPs:     {summary['unique_source_ips']}")
+    print(f"  Total failures:        {summary['total_failures']}")
+    print(f"  Off-hours events:      {summary['off_hours_events']}")
+    print(f"  Anomalies detected:    {summary['anomalies_found']}")
+    print()
+
+    if anomalies:
+        print("--- Anomalies ---")
+        for a in anomalies:
+            print(f"  [{a['severity']}] {a['type']}: {a['description']}")
+        print()
+    else:
+        print("--- No anomalies detected ---")
+        print()
+
+    if result["top_accessed_paths"]:
+        print("--- Top Accessed Paths ---")
+        for item in result["top_accessed_paths"]:
+            print(f"  {item['count']:5d}  {item['path']}")
+        print()
+
+    if result["hourly_distribution"]:
+        print("--- Hourly Distribution ---")
+        max_count = max(result["hourly_distribution"].values()) if result["hourly_distribution"] else 1
+        for hour in range(24):
+            count = result["hourly_distribution"].get(hour, 0)
+            bar_len = int((count / max_count) * 40) if max_count > 0 else 0
+            marker = " *" if (hour < 6 or hour >= 22) else ""
+            print(f"  {hour:02d}:00  {'#' * bar_len:40s}  {count}{marker}")
+        print("  (* = off-hours)")
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Analyze Vault/cloud secret manager audit logs for anomalies.",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog=textwrap.dedent("""\
+            The analyzer detects:
+              - Volume spikes (identity accessing secrets above threshold * average)
+              - Multi-IP access (single identity from many source IPs)
+              - Failed access attempts (repeated auth/access failures)
+              - Off-hours access (before 6 AM or after 10 PM)
+              - Broad path access (single identity accessing many distinct paths)
+
+            Log format: JSON lines or JSON array. Each entry should include
+            timestamp, auth info, request path/operation, response status,
+            and remote address. Missing fields are handled gracefully.
+
+            Examples:
+              %(prog)s --log-file vault-audit.log --threshold 5
+              %(prog)s --log-file audit.json --threshold 3 --json
+        """),
+    )
+    parser.add_argument("--log-file", required=True, help="Path to audit log file (JSON lines or JSON array)")
+    parser.add_argument(
+        "--threshold",
+        type=int,
+        default=5,
+        help="Anomaly sensitivity threshold — lower = more sensitive (default: 5)",
+    )
+    parser.add_argument("--json", action="store_true", dest="json_output", help="Output as JSON")
+
+    args = parser.parse_args()
+
+    entries = load_logs(args.log_file)
+    if not entries:
+        print("No log entries found in file.", file=sys.stderr)
+        sys.exit(1)
+
+    result = analyze(entries, args.threshold)
+    result["log_file"] = args.log_file
+    result["threshold"] = args.threshold
+    result["analyzed_at"] = datetime.now().isoformat()
+
+    if args.json_output:
+        print(json.dumps(result, indent=2))
+    else:
+        print_human(result, args.threshold)
+
+
+if __name__ == "__main__":
+    main()
--- a/engineering/secrets-vault-manager/scripts/rotation_planner.py
+++ b/engineering/secrets-vault-manager/scripts/rotation_planner.py
@@ -0,0 +1,280 @@
+#!/usr/bin/env python3
+"""Create a rotation schedule from a secret inventory file.
+
+Reads a JSON inventory of secrets and produces a rotation plan based on
+the selected policy (30d, 60d, 90d) with urgency classification.
+
+Usage:
+    python rotation_planner.py --inventory secrets.json --policy 30d
+    python rotation_planner.py --inventory secrets.json --policy 90d --json
+
+Inventory file format (JSON):
+[
+  {
+    "name": "prod-db-password",
+    "type": "database",
+    "store": "vault",
+    "last_rotated": "2026-01-15",
+    "owner": "platform-team",
+    "environment": "production"
+  },
+  ...
+]
+"""
+
+import argparse
+import json
+import sys
+import textwrap
+from datetime import datetime, timedelta
+
+
+POLICY_DAYS = {
+    "30d": 30,
+    "60d": 60,
+    "90d": 90,
+}
+
+# Maximum rotation period (days) by secret type; the effective interval is the shorter of this and the selected policy
+TYPE_DEFAULTS = {
+    "database": 30,
+    "api-key": 90,
+    "tls-certificate": 60,
+    "ssh-key": 90,
+    "service-token": 1,
+    "encryption-key": 90,
+    "oauth-secret": 90,
+    "password": 30,
+}
+
+URGENCY_THRESHOLDS = {
+    "critical": 0,    # Already overdue
+    "high": 7,         # Due within 7 days
+    "medium": 14,      # Due within 14 days
+    "low": 30,         # Due within 30 days
+}
+
+
+def load_inventory(path):
+    """Load and validate secret inventory from JSON file."""
+    try:
+        with open(path, "r") as f:
+            data = json.load(f)
+    except FileNotFoundError:
+        print(f"ERROR: Inventory file not found: {path}", file=sys.stderr)
+        sys.exit(1)
+    except json.JSONDecodeError as e:
+        print(f"ERROR: Invalid JSON in {path}: {e}", file=sys.stderr)
+        sys.exit(1)
+
+    if not isinstance(data, list):
+        print("ERROR: Inventory must be a JSON array of secret objects", file=sys.stderr)
+        sys.exit(1)
+
+    validated = []
+    for i, entry in enumerate(data):
+        if not isinstance(entry, dict):
+            print(f"WARNING: Skipping entry {i} — not an object", file=sys.stderr)
+            continue
+
+        name = entry.get("name", f"unnamed-{i}")
+        secret_type = entry.get("type", "unknown")
+        last_rotated = entry.get("last_rotated")
+
+        if not last_rotated:
+            print(f"WARNING: '{name}' has no last_rotated date — marking as overdue", file=sys.stderr)
+            last_rotated_dt = None
+        else:
+            try:
+                last_rotated_dt = datetime.strptime(last_rotated, "%Y-%m-%d")
+            except ValueError:
+                print(f"WARNING: '{name}' has invalid date '{last_rotated}' — marking as overdue", file=sys.stderr)
+                last_rotated_dt = None
+
+        validated.append({
+            "name": name,
+            "type": secret_type,
+            "store": entry.get("store", "unknown"),
+            "last_rotated": last_rotated_dt,
+            "owner": entry.get("owner", "unassigned"),
+            "environment": entry.get("environment", "unknown"),
+        })
+
+    return validated
+
+
+def compute_schedule(inventory, policy_days):
+    """Compute rotation schedule for each secret."""
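+    # last_rotated is date-only, so anchor "now" at midnight. Otherwise the
+    # time of day truncates days_until and a secret due tomorrow reads as overdue.
+    now = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)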
+    schedule = []
+
+    for secret in inventory:
+        # Determine rotation interval
+        type_default = TYPE_DEFAULTS.get(secret["type"], 90)
+        rotation_interval = min(policy_days, type_default)
+
+        if secret["last_rotated"] is None:
+            days_since = 999
+            next_rotation = now  # Immediate
+            days_until = -999
+        else:
+            days_since = (now - secret["last_rotated"]).days
+            next_rotation = secret["last_rotated"] + timedelta(days=rotation_interval)
+            days_until = (next_rotation - now).days
+
+        # Classify urgency
+        if days_until <= URGENCY_THRESHOLDS["critical"]:
+            urgency = "CRITICAL"
+        elif days_until <= URGENCY_THRESHOLDS["high"]:
+            urgency = "HIGH"
+        elif days_until <= URGENCY_THRESHOLDS["medium"]:
+            urgency = "MEDIUM"
+        else:
+            urgency = "LOW"
+
+        schedule.append({
+            "name": secret["name"],
+            "type": secret["type"],
+            "store": secret["store"],
+            "owner": secret["owner"],
+            "environment": secret["environment"],
+            "last_rotated": secret["last_rotated"].strftime("%Y-%m-%d") if secret["last_rotated"] else "NEVER",
+            "rotation_interval_days": rotation_interval,
+            "next_rotation": next_rotation.strftime("%Y-%m-%d"),
+            "days_until_due": days_until,
+            "days_since_rotation": days_since,
+            "urgency": urgency,
+        })
+
+    # Sort by urgency (critical first), then by days until due
+    urgency_order = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}
+    schedule.sort(key=lambda x: (urgency_order.get(x["urgency"], 4), x["days_until_due"]))
+
+    return schedule
+
+
+def build_summary(schedule):
+    """Build summary statistics."""
+    total = len(schedule)
+    by_urgency = {}
+    by_type = {}
+    by_owner = {}
+
+    for entry in schedule:
+        urg = entry["urgency"]
+        by_urgency[urg] = by_urgency.get(urg, 0) + 1
+        t = entry["type"]
+        by_type[t] = by_type.get(t, 0) + 1
+        o = entry["owner"]
+        by_owner[o] = by_owner.get(o, 0) + 1
+
+    return {
+        "total_secrets": total,
+        "by_urgency": by_urgency,
+        "by_type": by_type,
+        "by_owner": by_owner,
+        "overdue_count": by_urgency.get("CRITICAL", 0),
+        "due_within_7d": by_urgency.get("HIGH", 0),
+    }
+
+
+def print_human(schedule, summary, policy):
+    """Print human-readable rotation plan."""
+    print(f"=== Secret Rotation Plan (Policy: {policy}) ===")
+    print(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}")
+    print(f"Total secrets: {summary['total_secrets']}")
+    print()
+
+    print("--- Urgency Summary ---")
+    for urg in ["CRITICAL", "HIGH", "MEDIUM", "LOW"]:
+        count = summary["by_urgency"].get(urg, 0)
+        if count > 0:
+            print(f"  {urg:10s}  {count}")
+    print()
+
+    if not schedule:
+        print("No secrets in inventory.")
+        return
+
+    print("--- Rotation Schedule ---")
+    print(f"  {'Name':30s}  {'Type':15s}  {'Urgency':10s}  {'Last Rotated':12s}  {'Next Due':12s}  {'Owner'}")
+    print(f"  {'-'*30}  {'-'*15}  {'-'*10}  {'-'*12}  {'-'*12}  {'-'*15}")
+
+    for entry in schedule:
+        overdue_marker = " **OVERDUE**" if entry["urgency"] == "CRITICAL" else ""
+        print(
+            f"  {entry['name']:30s}  {entry['type']:15s}  {entry['urgency']:10s}  "
+            f"{entry['last_rotated']:12s}  {entry['next_rotation']:12s}  "
+            f"{entry['owner']}{overdue_marker}"
+        )
+
+    print()
+    print("--- Action Items ---")
+    critical = [e for e in schedule if e["urgency"] == "CRITICAL"]
+    high = [e for e in schedule if e["urgency"] == "HIGH"]
+
+    if critical:
+        print(f"  IMMEDIATE: Rotate {len(critical)} overdue secret(s):")
+        for e in critical:
+            print(f"    - {e['name']} ({e['type']}, owner: {e['owner']})")
+    if high:
+        print(f"  THIS WEEK: Rotate {len(high)} secret(s) due within 7 days:")
+        for e in high:
+            print(f"    - {e['name']} (due: {e['next_rotation']}, owner: {e['owner']})")
+    if not critical and not high:
+        print("  No urgent rotations needed.")
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Create rotation schedule from a secret inventory file.",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog=textwrap.dedent("""\
+            Policies:
+              30d   Aggressive — all secrets rotate within 30 days max
+              60d   Standard — 60-day maximum rotation window
+              90d   Relaxed — 90-day maximum rotation window
+
+            Note: Some secret types (e.g., database passwords) have shorter
+            built-in defaults that override the policy maximum.
+
+            Example inventory file (secrets.json):
+            [
+              {"name": "prod-db", "type": "database", "store": "vault",
+               "last_rotated": "2026-01-15", "owner": "platform-team",
+               "environment": "production"}
+            ]
+        """),
+    )
+    parser.add_argument("--inventory", required=True, help="Path to JSON inventory file")
+    parser.add_argument(
+        "--policy",
+        required=True,
+        choices=["30d", "60d", "90d"],
+        help="Rotation policy (maximum rotation interval)",
+    )
+    parser.add_argument("--json", action="store_true", dest="json_output", help="Output as JSON")
+
+    args = parser.parse_args()
+
+    policy_days = POLICY_DAYS[args.policy]
+    inventory = load_inventory(args.inventory)
+    schedule = compute_schedule(inventory, policy_days)
+    summary = build_summary(schedule)
+
+    result = {
+        "policy": args.policy,
+        "policy_days": policy_days,
+        "generated_at": datetime.now().isoformat(),
+        "summary": summary,
+        "schedule": schedule,
+    }
+
+    if args.json_output:
+        print(json.dumps(result, indent=2))
+    else:
+        print_human(schedule, summary, args.policy)
+
+
+if __name__ == "__main__":
+    main()
--- a/engineering/secrets-vault-manager/scripts/vault_config_generator.py
+++ b/engineering/secrets-vault-manager/scripts/vault_config_generator.py
@@ -0,0 +1,302 @@
+#!/usr/bin/env python3
+"""Generate Vault policy and auth configuration from application requirements.
+
+Produces HCL policy files and auth method setup commands for HashiCorp Vault
+based on application name, auth method, and required secret paths.
+
+Usage:
+    python vault_config_generator.py --app-name payment-service --auth-method approle --secrets "db-creds,api-key,tls-cert"
+    python vault_config_generator.py --app-name api-gateway --auth-method kubernetes --secrets "db-creds" --namespace production --json
+"""
+
+import argparse
+import json
+import sys
+import textwrap
+from datetime import datetime
+
+
+# Default TTLs by auth method
+AUTH_METHOD_DEFAULTS = {
+    "approle": {
+        "token_ttl": "1h",
+        "token_max_ttl": "4h",
+        "secret_id_num_uses": 1,
+        "secret_id_ttl": "10m",
+    },
+    "kubernetes": {
+        "token_ttl": "1h",
+        "token_max_ttl": "4h",
+    },
+    "oidc": {
+        "token_ttl": "8h",
+        "token_max_ttl": "12h",
+    },
+}
+
+# Secret type templates
+SECRET_TYPE_MAP = {
+    "db-creds": {
+        "engine": "database",
+        "path": "database/creds/{app}-readonly",
+        "capabilities": ["read"],
+        "description": "Dynamic database credentials",
+    },
+    "db-admin": {
+        "engine": "database",
+        "path": "database/creds/{app}-readwrite",
+        "capabilities": ["read"],
+        "description": "Dynamic database admin credentials",
+    },
+    "api-key": {
+        "engine": "kv-v2",
+        "path": "secret/data/{env}/{app}/api-keys",
+        "capabilities": ["read"],
+        "description": "Static API keys (KV v2)",
+    },
+    "tls-cert": {
+        "engine": "pki",
+        "path": "pki/issue/{app}-cert",
+        "capabilities": ["create", "update"],
+        "description": "TLS certificate issuance",
+    },
+    "encryption": {
+        "engine": "transit",
+        "path": "transit/encrypt/{app}-key",
+        "capabilities": ["update"],
+        "description": "Transit encryption operations",
+    },
+    "ssh-cert": {
+        "engine": "ssh",
+        "path": "ssh/sign/{app}-role",
+        "capabilities": ["create", "update"],
+        "description": "SSH certificate signing",
+    },
+    "config": {
+        "engine": "kv-v2",
+        "path": "secret/data/{env}/{app}/config",
+        "capabilities": ["read"],
+        "description": "Application configuration secrets",
+    },
+}
+
+
+def parse_secrets(secrets_str):
+    """Parse comma-separated secret types into list."""
+    secrets = [s.strip() for s in secrets_str.split(",") if s.strip()]
+    valid = []
+    unknown = []
+    for s in secrets:
+        if s in SECRET_TYPE_MAP:
+            valid.append(s)
+        else:
+            unknown.append(s)
+    return valid, unknown
+
+
+def generate_policy_hcl(app_name, secrets, environment="production"):
+    """Generate HCL policy document."""
+    lines = [
+        f'# Vault policy for {app_name}',
+        f'# Generated: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}',
+        f'# Environment: {environment}',
+        '',
+    ]
+
+    for secret_type in secrets:
+        tmpl = SECRET_TYPE_MAP[secret_type]
+        path = tmpl["path"].format(app=app_name, env=environment)
+        caps = ", ".join(f'"{c}"' for c in tmpl["capabilities"])
+
+        lines.append(f'# {tmpl["description"]}')
+        lines.append(f'path "{path}" {{')
+        lines.append(f'  capabilities = [{caps}]')
+        lines.append('}')
+        lines.append('')
+
+    # Always deny sys paths
+    lines.append('# Deny admin paths')
+    lines.append('path "sys/*" {')
+    lines.append('  capabilities = ["deny"]')
+    lines.append('}')
+
+    return "\n".join(lines)
+
+
+def generate_auth_config(app_name, auth_method, policy_name, namespace=None):
+    """Generate auth method setup commands."""
+    commands = []
+    defaults = AUTH_METHOD_DEFAULTS.get(auth_method, {})
+
+    if auth_method == "approle":
+        cmd = (
+            f"vault write auth/approle/role/{app_name} \\\n"
+            f"  token_ttl={defaults['token_ttl']} \\\n"
+            f"  token_max_ttl={defaults['token_max_ttl']} \\\n"
+            f"  secret_id_num_uses={defaults['secret_id_num_uses']} \\\n"
+            f"  secret_id_ttl={defaults['secret_id_ttl']} \\\n"
+            f"  token_policies=\"{policy_name}\""
+        )
+        commands.append({"description": f"Create AppRole for {app_name}", "command": cmd})
+
+        commands.append({
+            "description": "Fetch RoleID",
+            "command": f"vault read auth/approle/role/{app_name}/role-id",
+        })
+        commands.append({
+            "description": "Generate SecretID (single-use)",
+            "command": f"vault write -f auth/approle/role/{app_name}/secret-id",
+        })
+
+    elif auth_method == "kubernetes":
+        ns = namespace or "default"
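+        # token_policies/token_ttl replace the deprecated policies/ttl role fields
+        cmd = (
+            f"vault write auth/kubernetes/role/{app_name} \\\n"
+            f"  bound_service_account_names={app_name} \\\n"
+            f"  bound_service_account_namespaces={ns} \\\n"
+            f"  token_policies=\"{policy_name}\" \\\n"
+            f"  token_ttl={defaults['token_ttl']} \\\n"
+            f"  token_max_ttl={defaults['token_max_ttl']}"
+        )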
+        commands.append({"description": f"Create Kubernetes auth role for {app_name}", "command": cmd})
+
+    elif auth_method == "oidc":
+        # token_policies/token_ttl replace the deprecated policies/ttl role fields
+        cmd = (
+            f"vault write auth/oidc/role/{app_name} \\\n"
+            f"  bound_audiences=\"vault\" \\\n"
+            f"  allowed_redirect_uris=\"https://vault.example.com/ui/vault/auth/oidc/oidc/callback\" \\\n"
+            f"  user_claim=\"email\" \\\n"
+            f"  oidc_scopes=\"openid,profile,email\" \\\n"
+            f"  token_policies=\"{policy_name}\" \\\n"
+            f"  token_ttl={defaults['token_ttl']} \\\n"
+            f"  token_max_ttl={defaults['token_max_ttl']}"
+        )
+        commands.append({"description": f"Create OIDC role for {app_name}", "command": cmd})
+
+    return commands
+
+
+def build_output(app_name, auth_method, secrets, environment, namespace):
+    """Build complete configuration output."""
+    valid_secrets, unknown_secrets = parse_secrets(secrets)
+
+    if not valid_secrets:
+        return {
+            "error": "No valid secret types provided",
+            "unknown": unknown_secrets,
+            "available_types": list(SECRET_TYPE_MAP.keys()),
+        }
+
+    policy_name = f"{app_name}-policy"
+    policy_hcl = generate_policy_hcl(app_name, valid_secrets, environment)
+    auth_commands = generate_auth_config(app_name, auth_method, policy_name, namespace)
+
+    secret_details = []
+    for s in valid_secrets:
+        tmpl = SECRET_TYPE_MAP[s]
+        secret_details.append({
+            "type": s,
+            "engine": tmpl["engine"],
+            "path": tmpl["path"].format(app=app_name, env=environment),
+            "capabilities": tmpl["capabilities"],
+            "description": tmpl["description"],
+        })
+
+    result = {
+        "app_name": app_name,
+        "auth_method": auth_method,
+        "environment": environment,
+        "policy_name": policy_name,
+        "policy_hcl": policy_hcl,
+        "auth_commands": auth_commands,
+        "secrets": secret_details,
+        "generated_at": datetime.now().isoformat(),
+    }
+
+    if unknown_secrets:
+        result["warnings"] = [f"Unknown secret type '{u}' — skipped. Available: {list(SECRET_TYPE_MAP.keys())}" for u in unknown_secrets]
+    if namespace:
+        result["namespace"] = namespace
+
+    return result
+
+
+def print_human(result):
+    """Print human-readable output."""
+    if "error" in result:
+        print(f"ERROR: {result['error']}", file=sys.stderr)
+        if result.get("unknown"):
+            print(f"  Unknown types: {', '.join(result['unknown'])}", file=sys.stderr)
+        print(f"  Available types: {', '.join(result['available_types'])}", file=sys.stderr)
+        sys.exit(1)
+
+    print(f"=== Vault Configuration for {result['app_name']} ===")
+    print(f"Auth Method: {result['auth_method']}")
+    print(f"Environment: {result['environment']}")
+    print(f"Policy Name: {result['policy_name']}")
+    print()
+
+    if result.get("warnings"):
+        for w in result["warnings"]:
+            print(f"WARNING: {w}")
+        print()
+
+    print("--- Policy HCL ---")
+    print(result["policy_hcl"])
+    print()
+
+    print(f"Write policy: vault policy write {result['policy_name']} {result['policy_name']}.hcl")
+    print()
+
+    print("--- Auth Method Setup ---")
+    for cmd_info in result["auth_commands"]:
+        print(f"# {cmd_info['description']}")
+        print(cmd_info["command"])
+        print()
+
+    print("--- Secret Paths ---")
+    for s in result["secrets"]:
+        caps = ", ".join(s["capabilities"])
+        print(f"  {s['type']:15s}  {s['path']:50s}  [{caps}]")
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Generate Vault policy and auth configuration from application requirements.",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog=textwrap.dedent("""\
+            Secret types:
+              db-creds     Dynamic database credentials (read-only)
+              db-admin     Dynamic database credentials (read-write)
+              api-key      Static API keys in KV v2
+              tls-cert     TLS certificate issuance via PKI
+              encryption   Transit encryption-as-a-service
+              ssh-cert     SSH certificate signing
+              config       Application configuration secrets
+
+            Examples:
+              %(prog)s --app-name payment-svc --auth-method approle --secrets "db-creds,api-key"
+              %(prog)s --app-name api-gw --auth-method kubernetes --secrets "db-creds,config" --namespace prod --json
+        """),
+    )
+    parser.add_argument("--app-name", required=True, help="Application or service name")
+    parser.add_argument(
+        "--auth-method",
+        required=True,
+        choices=["approle", "kubernetes", "oidc"],
+        help="Vault auth method to configure",
+    )
+    parser.add_argument("--secrets", required=True, help="Comma-separated secret types (e.g., db-creds,api-key,tls-cert)")
+    parser.add_argument("--environment", default="production", help="Target environment (default: production)")
+    parser.add_argument("--namespace", help="Kubernetes namespace (for kubernetes auth method)")
+    parser.add_argument("--json", action="store_true", dest="json_output", help="Output as JSON")
+
+    args = parser.parse_args()
+    result = build_output(args.app_name, args.auth_method, args.secrets, args.environment, args.namespace)
+
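+    if args.json_output:
+        print(json.dumps(result, indent=2))
+        # Match the human-readable path: report failure through the exit code
+        if "error" in result:
+            sys.exit(1)
+    else:
+        print_human(result)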
+
+
+if __name__ == "__main__":
+    main()
--- a/engineering/sql-database-assistant/SKILL.md
+++ b/engineering/sql-database-assistant/SKILL.md
@@ -0,0 +1,457 @@
+---
+name: "sql-database-assistant"
+description: "Use when the user asks to write SQL queries, optimize database performance, generate migrations, explore database schemas, or work with ORMs like Prisma, Drizzle, TypeORM, or SQLAlchemy."
+---
+
+# SQL Database Assistant - POWERFUL Tier Skill
+
+## Overview
+
+The operational companion to database design. While **database-designer** focuses on schema architecture and **database-schema-designer** handles ERD modeling, this skill covers the day-to-day: writing queries, optimizing performance, generating migrations, and bridging the gap between application code and database engines.
+
+### Core Capabilities
+
+- **Natural Language to SQL** — translate requirements into correct, performant queries
+- **Schema Exploration** — introspect live databases across PostgreSQL, MySQL, SQLite, SQL Server
+- **Query Optimization** — EXPLAIN analysis, index recommendations, N+1 detection, rewrite patterns
+- **Migration Generation** — up/down scripts, zero-downtime strategies, rollback plans
+- **ORM Integration** — Prisma, Drizzle, TypeORM, SQLAlchemy patterns and escape hatches
+- **Multi-Database Support** — dialect-aware SQL with compatibility guidance
+
+### Tools
+
+| Script | Purpose |
+|--------|---------|
+| `scripts/query_optimizer.py` | Static analysis of SQL queries for performance issues |
+| `scripts/migration_generator.py` | Generate migration file templates from change descriptions |
+| `scripts/schema_explorer.py` | Generate schema documentation from introspection queries |
+
+---
+
+## Natural Language to SQL
+
+### Translation Patterns
+
+When converting requirements to SQL, follow this sequence:
+
+1. **Identify entities** — map nouns to tables
+2. **Identify relationships** — map verbs to JOINs or subqueries
+3. **Identify filters** — map adjectives/conditions to WHERE clauses
+4. **Identify aggregations** — map "total", "average", "count" to GROUP BY
+5. **Identify ordering** — map "top", "latest", "highest" to ORDER BY + LIMIT
+
+### Common Query Templates
+
+**Top-N per group (window function)**
+```sql
+SELECT * FROM (
+  SELECT *, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rn
+  FROM employees
+) ranked WHERE rn <= 3;
+```
+
+**Running totals**
+```sql
+SELECT date, amount,
+  SUM(amount) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
+FROM transactions;
+```
+
+**Gap detection**
+```sql
+SELECT curr.id, curr.seq_num, prev.seq_num AS prev_seq
+FROM records curr
+LEFT JOIN records prev ON prev.seq_num = curr.seq_num - 1
+WHERE prev.id IS NULL AND curr.seq_num > 1;
+```
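+
+The gap-detection query can be tried end to end against an in-memory SQLite database. The following is a minimal sketch, assuming a hypothetical `records` table seeded with a few missing sequence numbers. Note that the query returns the first row found after each gap, not the missing values themselves.
+
+```python
+import sqlite3
+
+# Hypothetical demo table; names mirror the query above.
+conn = sqlite3.connect(":memory:")
+conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, seq_num INTEGER NOT NULL)")
+conn.executemany(
+    "INSERT INTO records (seq_num) VALUES (?)",
+    [(n,) for n in (1, 2, 3, 5, 6, 9)],  # 4, 7 and 8 are missing
+)
+
+GAP_QUERY = """
+SELECT curr.seq_num
+FROM records curr
+LEFT JOIN records prev ON prev.seq_num = curr.seq_num - 1
+WHERE prev.id IS NULL AND curr.seq_num > 1
+ORDER BY curr.seq_num
+"""
+
+# Each value is the first sequence number that follows a gap.
+rows_after_gap = [row[0] for row in conn.execute(GAP_QUERY)]
+print(rows_after_gap)
+```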
+
+**UPSERT (PostgreSQL)**
+```sql
+INSERT INTO settings (key, value, updated_at)
+VALUES ('theme', 'dark', NOW())
+ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value, updated_at = EXCLUDED.updated_at;
+```
+
+**UPSERT (MySQL)**
+```sql
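+-- MySQL 8.0.20+ deprecates VALUES() in this position; a row alias
+-- (VALUES (...) AS new ... UPDATE value = new.value) is the newer form.
+INSERT INTO settings (key_name, value, updated_at)
+VALUES ('theme', 'dark', NOW())
+ON DUPLICATE KEY UPDATE value = VALUES(value), updated_at = VALUES(updated_at);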
+```
+
+> See references/query_patterns.md for JOINs, CTEs, window functions, JSON operations, and more.
+
+---
+
+## Schema Exploration
+
+### Introspection Queries
+
+**PostgreSQL — list tables and columns**
+```sql
+SELECT table_name, column_name, data_type, is_nullable, column_default
+FROM information_schema.columns
+WHERE table_schema = 'public'
+ORDER BY table_name, ordinal_position;
+```
+
+**PostgreSQL — foreign keys**
+```sql
+SELECT tc.table_name, kcu.column_name,
+  ccu.table_name AS foreign_table, ccu.column_name AS foreign_column
+FROM information_schema.table_constraints tc
+JOIN information_schema.key_column_usage kcu ON tc.constraint_name = kcu.constraint_name
+JOIN information_schema.constraint_column_usage ccu ON tc.constraint_name = ccu.constraint_name
+WHERE tc.constraint_type = 'FOREIGN KEY';
+```
+
+**MySQL — table sizes**
+```sql
+SELECT table_name, table_rows,
+  ROUND(data_length / 1024 / 1024, 2) AS data_mb,
+  ROUND(index_length / 1024 / 1024, 2) AS index_mb
+FROM information_schema.tables
+WHERE table_schema = DATABASE()
+ORDER BY data_length DESC;
+```
+
+**SQLite — schema dump**
+```sql
+SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name;
+```
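+
+For SQLite, the `sqlite_master` lookup can be paired with `PRAGMA table_info` to get column-level detail. This is a rough sketch using only the standard-library `sqlite3` module; the `users` and `orders` tables and the `describe_schema` helper are illustrative assumptions, not part of `scripts/schema_explorer.py`.
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
+conn.execute(
+    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id))"
+)
+
+
+def describe_schema(connection):
+    """Return {table: [(column, declared_type, not_null), ...]}."""
+    tables = [
+        row[0]
+        for row in connection.execute(
+            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
+        )
+    ]
+    schema = {}
+    for table in tables:
+        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
+        info = connection.execute(f'PRAGMA table_info("{table}")').fetchall()
+        schema[table] = [(col[1], col[2], bool(col[3])) for col in info]
+    return schema
+
+
+schema = describe_schema(conn)
+for table, columns in schema.items():
+    print(table, columns)
+```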
+
+**SQL Server — columns with types**
+```sql
+SELECT t.name AS table_name, c.name AS column_name,
+  ty.name AS data_type, c.max_length, c.is_nullable
+FROM sys.columns c
+JOIN sys.tables t ON c.object_id = t.object_id
+JOIN sys.types ty ON c.user_type_id = ty.user_type_id
+ORDER BY t.name, c.column_id;
+```
+
+### Generating Documentation from Schema
+
+Use `scripts/schema_explorer.py` to produce markdown or JSON documentation:
+
+```bash
+python scripts/schema_explorer.py --dialect postgres --tables all --format md
+python scripts/schema_explorer.py --dialect mysql --tables users,orders --format json --json
+```
+
+---
+
+## Query Optimization
+
+### EXPLAIN Analysis Workflow
+
+1. **Run EXPLAIN ANALYZE** (PostgreSQL) or **EXPLAIN FORMAT=JSON** (MySQL)
+2. **Identify the costliest node** — Seq Scan on large tables, Nested Loop with high row estimates
+3. **Check for missing indexes** — sequential scans on filtered columns
+4. **Look for estimation errors** — planned vs actual rows divergence signals stale statistics
+5. **Evaluate JOIN order** — ensure the smallest result set drives the join
+
+### Index Recommendation Checklist
+
+- Columns in WHERE clauses with high selectivity
+- Columns in JOIN conditions (foreign keys)
+- Columns in ORDER BY when combined with LIMIT
+- Composite indexes matching multi-column WHERE predicates (equality columns first, then range or sort columns)
+- Partial indexes for queries with constant filters (e.g., `WHERE status = 'active'`)
+- Covering indexes to avoid table lookups for read-heavy queries
+
+### Query Rewriting Patterns
+
+| Anti-Pattern | Rewrite |
+|-------------|---------|
+| `SELECT * FROM orders` | `SELECT id, status, total FROM orders` (explicit columns) |
+| `WHERE YEAR(created_at) = 2025` | `WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'` (sargable) |
+| Correlated subquery in SELECT | LEFT JOIN with aggregation |
+| `NOT IN (SELECT ...)` with NULLs | `NOT EXISTS (SELECT 1 ...)` |
+| `UNION` (dedup) when not needed | `UNION ALL` |
+| `LIKE '%search%'` | Full-text search index (GIN/FULLTEXT) |
+| `ORDER BY RAND()` | Application-side random sampling or `TABLESAMPLE` |
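+
+The `NOT IN` row in the table is the easiest one to get wrong, because a single NULL in the subquery makes the predicate evaluate to unknown for every non-matching row. A small sketch against in-memory SQLite, with hypothetical `customers` and `orders` tables, shows the difference:
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.executescript("""
+CREATE TABLE customers (id INTEGER PRIMARY KEY);
+CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
+INSERT INTO customers (id) VALUES (1), (2), (3);
+INSERT INTO orders (customer_id) VALUES (1), (NULL);
+""")
+
+# NOT IN: the NULL customer_id turns "2 NOT IN (1, NULL)" into NULL, so no row passes.
+not_in = conn.execute(
+    "SELECT id FROM customers "
+    "WHERE id NOT IN (SELECT customer_id FROM orders) ORDER BY id"
+).fetchall()
+
+# NOT EXISTS: NULLs simply never match, so customers without orders are returned.
+not_exists = conn.execute(
+    "SELECT c.id FROM customers c "
+    "WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id) "
+    "ORDER BY c.id"
+).fetchall()
+
+print(not_in, not_exists)
+```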
+
+### N+1 Detection
+
+**Symptoms:**
+- Application loop that executes one query per parent row
+- ORM lazy-loading related entities inside a loop
+- Query log shows hundreds of identical SELECT patterns with different IDs
+
+**Fixes:**
+- Use eager loading (`include` in Prisma, `joinedload` in SQLAlchemy)
+- Batch queries with `WHERE id IN (...)`
+- Use DataLoader pattern for GraphQL resolvers
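+
+To make the round-trip difference concrete, the sketch below runs the same lookup both ways against in-memory SQLite and counts the queries issued. The `authors` and `posts` tables and the `run` helper are assumptions made for illustration; a real application would see the same pattern in its ORM query log.
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.executescript("""
+CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
+CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
+INSERT INTO authors (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
+INSERT INTO posts (author_id, title) VALUES (1, 'a1'), (1, 'a2'), (2, 'g1');
+""")
+
+query_log = []
+
+
+def run(sql, params=()):
+    """Execute a query and record it so round trips can be counted."""
+    query_log.append(sql)
+    return conn.execute(sql, params).fetchall()
+
+
+author_ids = [row[0] for row in run("SELECT id FROM authors ORDER BY id")]
+
+# N+1: one extra query per parent row.
+for author_id in author_ids:
+    run("SELECT title FROM posts WHERE author_id = ?", (author_id,))
+n_plus_one_queries = len(query_log)
+
+# Batched: a single IN (...) query, grouped in application code.
+query_log.clear()
+placeholders = ", ".join("?" for _ in author_ids)
+rows = run(
+    f"SELECT author_id, title FROM posts WHERE author_id IN ({placeholders}) ORDER BY id",
+    author_ids,
+)
+posts_by_author = {author_id: [] for author_id in author_ids}
+for author_id, title in rows:
+    posts_by_author[author_id].append(title)
+batched_queries = len(query_log)
+
+print(n_plus_one_queries, batched_queries, posts_by_author)
+```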
+
+### Static Analysis Tool
+
+```bash
+python scripts/query_optimizer.py --query "SELECT * FROM orders WHERE status = 'pending'" --dialect postgres
+python scripts/query_optimizer.py --query queries.sql --dialect mysql --json
+```
+
+> See references/optimization_guide.md for EXPLAIN plan reading, index types, and connection pooling.
+
+---
+
+## Migration Generation
+
+### Zero-Downtime Migration Patterns
+
+**Adding a column (safe)**
+```sql
+-- Up
+ALTER TABLE users ADD COLUMN phone VARCHAR(20);
+
+-- Down
+ALTER TABLE users DROP COLUMN phone;
+```
+
+**Renaming a column (expand-contract)**
+```sql
+-- Step 1: Add new column
+ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
+-- Step 2: Deploy app writing to both columns (dual-write)
+-- Step 3: Backfill existing rows (in batches on large tables)
+UPDATE users SET full_name = name WHERE full_name IS NULL;
+-- Step 4: Deploy app reading and writing only the new column
+-- Step 5: Drop old column
+ALTER TABLE users DROP COLUMN name;
+```
+
+**Adding a NOT NULL column (safe sequence)**
+```sql
+-- Step 1: Add nullable
+ALTER TABLE orders ADD COLUMN region VARCHAR(50);
+-- Step 2: Set the default first so rows inserted during the backfill are populated
+ALTER TABLE orders ALTER COLUMN region SET DEFAULT 'unknown';
+-- Step 3: Backfill existing rows
+UPDATE orders SET region = 'unknown' WHERE region IS NULL;
+-- Step 4: Add constraint
+ALTER TABLE orders ALTER COLUMN region SET NOT NULL;
+```
+
+**Index creation (non-blocking, PostgreSQL)**
+```sql
+-- Must run outside a transaction block
+CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);
+```
+
+### Data Backfill Strategies
+
+- **Batch updates** — process in chunks of 1000-10000 rows to avoid lock contention
+- **Background jobs** — run backfills asynchronously with progress tracking
+- **Dual-write** — write to old and new columns during transition period
+- **Validation queries** — verify row counts and data integrity after each batch
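+
+A batched backfill with a validation query might look like the sketch below. It uses in-memory SQLite and a hypothetical `users` table with `name` and `full_name` columns; the `backfill_in_batches` helper is an illustrative assumption, and the batch size should be tuned per workload.
+
+```python
+import sqlite3
+
+
+def backfill_in_batches(conn, batch_size=1000):
+    """Copy name into full_name in primary-key-ordered chunks; return batches run."""
+    batches = 0
+    last_id = 0
+    while True:
+        ids = [
+            row[0]
+            for row in conn.execute(
+                "SELECT id FROM users WHERE id > ? AND full_name IS NULL "
+                "ORDER BY id LIMIT ?",
+                (last_id, batch_size),
+            )
+        ]
+        if not ids:
+            return batches
+        with conn:  # one short transaction per batch keeps locks brief
+            conn.execute(
+                "UPDATE users SET full_name = name "
+                "WHERE id BETWEEN ? AND ? AND full_name IS NULL",
+                (ids[0], ids[-1]),
+            )
+        last_id = ids[-1]
+        batches += 1
+
+
+conn = sqlite3.connect(":memory:")
+conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, full_name TEXT)")
+conn.executemany("INSERT INTO users (name) VALUES (?)", [(f"user{i}",) for i in range(2500)])
+conn.commit()
+
+batches = backfill_in_batches(conn, batch_size=1000)
+
+# Validation query: no row should be left without a value.
+remaining = conn.execute("SELECT COUNT(*) FROM users WHERE full_name IS NULL").fetchone()[0]
+print(batches, remaining)
+```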
+
+### Rollback Strategies
+
+Every migration should ship with a down script. For changes that cannot be reversed (for example, dropping a column or deleting data):
+
+1. **Backup before execution** — `pg_dump` the affected tables
+2. **Feature flags** — application can switch between old/new schema reads
+3. **Shadow tables** — keep a copy of the original table during migration window
+
+### Migration Generator Tool
+
+```bash
+python scripts/migration_generator.py --change "add email_verified boolean to users" --dialect postgres --format sql
+python scripts/migration_generator.py --change "rename column name to full_name in customers" --dialect mysql --format alembic --json
+```
+
+---
+
+## Multi-Database Support
+
+### Dialect Differences
+
+| Feature | PostgreSQL | MySQL | SQLite | SQL Server |
+|---------|-----------|-------|--------|------------|
+| UPSERT | `ON CONFLICT DO UPDATE` | `ON DUPLICATE KEY UPDATE` | `ON CONFLICT DO UPDATE` | `MERGE` |
+| Boolean | Native `BOOLEAN` | `TINYINT(1)` | `INTEGER` | `BIT` |
+| Auto-increment | `SERIAL` / `GENERATED` | `AUTO_INCREMENT` | `INTEGER PRIMARY KEY` | `IDENTITY` |
+| JSON | `JSONB` (indexed) | `JSON` | Text + JSON functions | `NVARCHAR(MAX)` + JSON functions |
+| Array | Native `ARRAY` | Not supported | Not supported | Not supported |
+| CTE (recursive) | Full support | 8.0+ | 3.8.3+ | Full support |
+| Window functions | Full support | 8.0+ | 3.25.0+ | Full support |
+| Full-text search | `tsvector` + GIN | `FULLTEXT` index | FTS5 extension | Full-text catalog |
+| LIMIT/OFFSET | `LIMIT n OFFSET m` | `LIMIT n OFFSET m` | `LIMIT n OFFSET m` | `OFFSET m ROWS FETCH NEXT n ROWS ONLY` |
+
+### Compatibility Tips
+
+- **Always use parameterized queries** — prevents SQL injection across all dialects
+- **Avoid dialect-specific functions in shared code** — wrap in adapter layer
+- **Test migrations on target engine** — `information_schema` varies between engines
+- **Use ISO date format** — `'YYYY-MM-DD'` works everywhere
+- **Quote identifiers** — use double quotes (SQL standard) or backticks (MySQL)
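+
+As an illustration of the first tip, the sketch below contrasts string interpolation with a bound parameter using the standard-library `sqlite3` driver. The `users` table and the malicious input are made up for the demo. Placeholder syntax is driver-specific (`?` for `sqlite3`, `%s` for psycopg), which is one reason to keep it behind an adapter layer.
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
+conn.executemany(
+    "INSERT INTO users (email) VALUES (?)",
+    [("a@example.com",), ("b@example.com",)],
+)
+
+malicious = "nobody@example.com' OR '1'='1"
+
+# Unsafe: the input is spliced into the SQL text and rewrites the predicate.
+unsafe_sql = f"SELECT id FROM users WHERE email = '{malicious}'"
+unsafe_rows = conn.execute(unsafe_sql).fetchall()
+
+# Safe: the driver binds the value, so it is only ever compared as data.
+safe_rows = conn.execute("SELECT id FROM users WHERE email = ?", (malicious,)).fetchall()
+
+print(len(unsafe_rows), len(safe_rows))
+```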
+
+---
+
+## ORM Patterns
+
+### Prisma
+
+**Schema definition**
+```prisma
+model User {
+  id        Int      @id @default(autoincrement())
+  email     String   @unique
+  name      String?
+  posts     Post[]
+  createdAt DateTime @default(now())
+}
+
+model Post {
+  id       Int    @id @default(autoincrement())
+  title    String
+  author   User   @relation(fields: [authorId], references: [id])
+  authorId Int
+}
+```
+
+**Migrations**: `npx prisma migrate dev --name add_user_email`
+**Query API**: `prisma.user.findMany({ where: { email: { contains: '@' } }, include: { posts: true } })`
+**Raw SQL escape hatch**: `` prisma.$queryRaw`SELECT * FROM users WHERE id = ${userId}` ``
+
+### Drizzle
+
+**Schema-first definition**
+```typescript
+export const users = pgTable('users', {
+  id: serial('id').primaryKey(),
+  email: varchar('email', { length: 255 }).notNull().unique(),
+  name: text('name'),
+  createdAt: timestamp('created_at').defaultNow(),
+});
+```
+
+**Query builder**: `db.select().from(users).where(eq(users.email, email))`
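+**Migrations**: `npx drizzle-kit generate` then `npx drizzle-kit migrate` in recent drizzle-kit releases (older releases used dialect-suffixed commands such as `generate:pg`); `push` applies the schema directly without migration files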
+
+### TypeORM
+
+**Entity decorators**
+```typescript
+@Entity()
+export class User {
+  @PrimaryGeneratedColumn()
+  id: number;
+
+  @Column({ unique: true })
+  email: string;
+
+  @OneToMany(() => Post, post => post.author)
+  posts: Post[];
+}
+```
+
+**Repository pattern**: `userRepo.find({ where: { email }, relations: ['posts'] })`
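+**Migrations**: `npx typeorm migration:generate -n AddUserEmail` (TypeORM 0.2.x; 0.3+ takes the migration path as a positional argument and requires `-d <data-source>`)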
+
+### SQLAlchemy
+
+**Declarative models**
+```python
+class User(Base):
+    __tablename__ = 'users'
+    id = Column(Integer, primary_key=True)
+    email = Column(String(255), unique=True, nullable=False)
+    name = Column(String(255))
+    posts = relationship('Post', back_populates='author')
+```
+
+**Session management**: Always use `with Session() as session:` context manager
+**Alembic migrations**: `alembic revision --autogenerate -m "add user email"`
+
+> See references/orm_patterns.md for side-by-side comparisons and migration workflows per ORM.
+
+---
+
+## Data Integrity
+
+### Constraint Strategy
+
+- **Primary keys** — every table must have one; prefer surrogate keys (serial/UUID)
+- **Foreign keys** — enforce referential integrity; define ON DELETE behavior explicitly
+- **UNIQUE constraints** — for business-level uniqueness (email, slug, API key)
+- **CHECK constraints** — validate ranges, enums, and business rules at the DB level
+- **NOT NULL** — default to NOT NULL; make nullable only when genuinely optional
+
+### Transaction Isolation Levels
+
+| Level | Dirty Read | Non-Repeatable Read | Phantom Read | Use Case |
+|-------|-----------|-------------------|-------------|----------|
+| READ UNCOMMITTED | Yes | Yes | Yes | Never recommended |
+| READ COMMITTED | No | Yes | Yes | Default for PostgreSQL, general OLTP |
+| REPEATABLE READ | No | No | Yes per SQL standard (PostgreSQL, InnoDB: No) | Default for MySQL InnoDB, financial calculations |
+| SERIALIZABLE | No | No | No | Critical consistency (billing, inventory) |
+
+### Deadlock Prevention
+
+1. **Consistent lock ordering** — always acquire locks in the same table/row order
+2. **Short transactions** — minimize time between first lock and commit
+3. **Advisory locks** — use `pg_advisory_lock()` for application-level coordination
+4. **Retry logic** — catch deadlock errors and retry with exponential backoff
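+
+The retry step could be sketched as follows. `DeadlockDetected` is a hypothetical stand-in for whatever the driver raises (PostgreSQL reports SQLSTATE `40P01`, MySQL error `1213`), and `run_with_retry` is an illustrative helper rather than a library API. The whole transaction is re-run, since the database has already rolled it back.
+
+```python
+import random
+import time
+
+
+class DeadlockDetected(Exception):
+    """Hypothetical stand-in for a driver's deadlock error."""
+
+
+def run_with_retry(transaction, max_attempts=5, base_delay=0.05, sleep=time.sleep):
+    """Run transaction(); on deadlock, back off exponentially with jitter and retry."""
+    for attempt in range(1, max_attempts + 1):
+        try:
+            return transaction()
+        except DeadlockDetected:
+            if attempt == max_attempts:
+                raise
+            delay = base_delay * (2 ** (attempt - 1))
+            # Jitter keeps competing workers from retrying in lockstep.
+            sleep(delay + random.uniform(0, delay))
+
+
+# Usage: a simulated transaction that deadlocks twice, then succeeds.
+attempts = []
+
+
+def flaky_transfer():
+    attempts.append(1)
+    if len(attempts) < 3:
+        raise DeadlockDetected("simulated deadlock")
+    return "committed"
+
+
+delays = []
+result = run_with_retry(flaky_transfer, sleep=delays.append)  # record delays instead of sleeping
+print(result, len(attempts), delays)
+```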
+
+---
+
+## Backup & Restore
+
+### PostgreSQL
+```bash
+# Full backup
+pg_dump -Fc --no-owner dbname > backup.dump
+# Restore
+pg_restore -d dbname --clean --no-owner backup.dump
+# Point-in-time recovery: configure WAL archiving + restore_command
+```
+
+### MySQL
+```bash
+# Full backup
+mysqldump --single-transaction --routines --triggers dbname > backup.sql
+# Restore
+mysql dbname < backup.sql
+# Binary log for PITR: mysqlbinlog --start-datetime="2025-01-01 00:00:00" binlog.000001
+```
+
+### SQLite
+```bash
+# Backup (safe with concurrent reads)
+sqlite3 dbname ".backup backup.db"
+```
+
+### Backup Best Practices
+- **Automate** — cron or systemd timer, never manual-only
+- **Test restores** — untested backups are not backups
+- **Offsite copies** — S3, GCS, or separate region
+- **Retention policy** — daily for 7 days, weekly for 4 weeks, monthly for 12 months
+- **Monitor backup size and duration** — sudden changes signal issues
+
+---
+
+## Anti-Patterns
+
+| Anti-Pattern | Problem | Fix |
+|-------------|---------|-----|
+| `SELECT *` | Transfers unnecessary data, breaks on schema changes | Explicit column list |
+| Missing indexes on FK columns | Slow JOINs and cascading deletes | Add indexes on all foreign keys |
+| N+1 queries | 1 + N round trips to database | Eager loading or batch queries |
+| Implicit type coercion | Comparing a string column to a number (`WHERE phone = 5551234`) converts the column per row and prevents index use in MySQL | Match types in predicates |
+| No connection pooling | Exhausts connections under load | PgBouncer, ProxySQL, or ORM pool |
+| Unbounded queries | No LIMIT risks returning millions of rows | Always paginate |
+| Storing money as FLOAT | Rounding errors | Use `DECIMAL(19,4)` or integer cents |
+| God tables | One table with 50+ columns | Normalize or use vertical partitioning |
+| Soft deletes everywhere | Complicates every query with `WHERE deleted_at IS NULL` | Archive tables or event sourcing |
+| Raw string concatenation | SQL injection | Parameterized queries always |
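+
+To make the last row concrete, here is a small sketch using Python's built-in `sqlite3` module. The table, the sample rows, and the hostile input string are made up for illustration; the same contrast applies to any driver that supports bound parameters.
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
+conn.executemany(
+    "INSERT INTO users (email) VALUES (?)",
+    [("a@example.com",), ("b@example.com",)],
+)
+
+user_input = "x' OR '1'='1"  # hostile input (illustrative)
+
+# Unsafe: the input becomes part of the SQL text, so the OR clause matches every row.
+unsafe_rows = conn.execute(
+    f"SELECT id FROM users WHERE email = '{user_input}'"
+).fetchall()
+
+# Safe: the driver binds the value, so it is only ever compared as data.
+safe_rows = conn.execute(
+    "SELECT id FROM users WHERE email = ?", (user_input,)
+).fetchall()
+
+print(len(unsafe_rows), len(safe_rows))
+```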
+
+---
+
+## Cross-References
+
+| Skill | Relationship |
+|-------|-------------|
+| **database-designer** | Schema architecture, normalization analysis, ERD generation |
+| **database-schema-designer** | Visual ERD modeling, relationship mapping |
+| **migration-architect** | Complex multi-step migration orchestration |
+| **api-design-reviewer** | Ensuring API endpoints align with query patterns |
+| **observability-platform** | Query performance monitoring, slow query alerts |
--- a/engineering/sql-database-assistant/references/optimization_guide.md
+++ b/engineering/sql-database-assistant/references/optimization_guide.md
@@ -0,0 +1,330 @@
+# Query Optimization Guide
+
+How to read EXPLAIN plans, choose the right index types, understand query plan operators, and configure connection pooling.
+
+---
+
+## Reading EXPLAIN Plans
+
+### PostgreSQL — EXPLAIN ANALYZE
+
+```sql
+EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT * FROM orders WHERE status = 'paid' ORDER BY created_at DESC LIMIT 20;
+```
+
+**Sample output:**
+```
+Limit  (cost=0.43..12.87 rows=20 width=128) (actual time=0.052..0.089 rows=20 loops=1)
+  ->  Index Scan Backward using idx_orders_status_created on orders  (cost=0.43..4521.33 rows=7284 width=128) (actual time=0.051..0.085 rows=20 loops=1)
+        Index Cond: (status = 'paid')
+        Buffers: shared hit=4
+Planning Time: 0.156 ms
+Execution Time: 0.112 ms
+```
+
+**Key fields to check:**
+
+| Field | What it tells you |
+|-------|-------------------|
+| `cost` | Estimated startup..total cost (arbitrary units) |
+| `rows` | Estimated row count at that node |
+| `actual time` | Measured startup..total time in milliseconds (per loop) |
+| `actual rows` | Real row count — compare against estimate |
+| `Buffers: shared hit` | Pages read from cache (good) |
+| `Buffers: shared read` | Pages read from disk (slow) |
+| `loops` | How many times the node executed; `actual time` and row counts are per-loop averages |
+
+**Red flags:**
+- `Seq Scan` on a large table with a WHERE clause — missing index
+- `actual rows` >> `rows` (estimated) — stale statistics, run `ANALYZE`
+- `Nested Loop` with high loop count — consider hash join or add index
+- `Sort` with `external merge` — not enough `work_mem`, spilling to disk
+- `Buffers: shared read` much higher than `shared hit` — cold cache or table too large for memory
+
+### MySQL — EXPLAIN FORMAT=JSON
+
+```sql
+EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE status = 'paid' ORDER BY created_at DESC LIMIT 20;
+```
+
+**Key fields:**
+- `query_block.select_id` — identifies subqueries
+- `table.access_type` — `ALL` (full scan), `ref` (index lookup), `range`, `index`, `const`
+- `table.rows_examined_per_scan` — how many rows the engine reads
+- `table.using_index` — covering index (no table lookup needed)
+- `table.attached_condition` — the WHERE filter applied
+
+**Access types ranked (best to worst):**
+`system` > `const` > `eq_ref` > `ref` > `range` > `index` > `ALL`
+
+---
+
+## Index Types
+
+### B-tree (default)
+
+The workhorse index. Supports equality, range, prefix, and ORDER BY operations.
+
+**Best for:** `=`, `<`, `>`, `<=`, `>=`, `BETWEEN`, `LIKE 'prefix%'`, `ORDER BY`, `MIN()`, `MAX()`
+
+```sql
+CREATE INDEX idx_orders_created ON orders (created_at);
+```
+
+**Composite B-tree:** Column order matters. The index is useful for queries that filter on a leftmost prefix of the indexed columns.
+
+```sql
+-- This index serves: WHERE status = ... AND created_at > ...
+-- Also serves: WHERE status = ...
+-- Does NOT serve: WHERE created_at > ... (without status)
+CREATE INDEX idx_orders_status_created ON orders (status, created_at);
+```
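+
+The leftmost-prefix rule can be checked empirically. The sketch below uses SQLite (via Python's `sqlite3`) only because it runs without a server; the `orders` columns are assumptions for illustration, and PostgreSQL or MySQL planners may differ in detail even though the same rule applies.
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.execute(
+    "CREATE TABLE orders ("
+    "id INTEGER PRIMARY KEY, status TEXT, created_at TEXT, total_cents INTEGER)"
+)
+conn.execute("CREATE INDEX idx_orders_status_created ON orders (status, created_at)")
+
+
+def plan(sql):
+    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
+    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
+    return " | ".join(row[3] for row in rows)
+
+
+# Filters on both columns, or on the leading column alone, can use the index.
+both = plan("SELECT * FROM orders WHERE status = 'paid' AND created_at > '2025-01-01'")
+prefix = plan("SELECT * FROM orders WHERE status = 'paid'")
+# Filtering only on the second column cannot seek into the index.
+suffix = plan("SELECT * FROM orders WHERE created_at > '2025-01-01'")
+
+for label, text in (("both", both), ("prefix", prefix), ("suffix", suffix)):
+    print(label, "->", text)
+```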
+
+### Hash
+
+Equality-only lookups. Can be smaller and slightly faster than B-tree for exact matches, but offers no range support.
+
+**Best for:** `=` lookups on high-cardinality columns
+
+```sql
+-- PostgreSQL
+CREATE INDEX idx_sessions_token ON sessions USING hash (token);
+```
+
+**Limitations:** No range queries, no ORDER BY, no unique or multicolumn indexes; not WAL-logged (so not crash-safe or replicated) before PostgreSQL 10.
+
+### GIN (Generalized Inverted Index)
+
+For multi-valued data: arrays, JSONB, full-text search vectors.
+
+```sql
+-- JSONB containment
+CREATE INDEX idx_products_tags ON products USING gin (tags);
+-- Query: SELECT * FROM products WHERE tags @> '["sale"]';
+
+-- Full-text search
+CREATE INDEX idx_articles_search ON articles USING gin (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, '')));
+```
+
+### GiST (Generalized Search Tree)
+
+For geometric, range, and proximity data.
+
+```sql
+-- Range type (e.g., date ranges)
+CREATE INDEX idx_bookings_period ON bookings USING gist (during);
+-- Query: SELECT * FROM bookings WHERE during && '[2025-01-01, 2025-01-31]';
+
+-- PostGIS geometry
+CREATE INDEX idx_locations_geom ON locations USING gist (geom);
+```
+
+### BRIN (Block Range INdex)
+
+Tiny index for naturally ordered data (e.g., time-series append-only tables).
+
+```sql
+CREATE INDEX idx_events_created ON events USING brin (created_at);
+```
+
+**Best for:** Large tables where the indexed column correlates with physical row order. Much smaller than B-tree but less precise.
+
+### Partial Index
+
+Index only rows matching a condition. Smaller and faster for targeted queries.
+
+```sql
+-- Only index active users (skip millions of inactive)
+CREATE INDEX idx_users_active_email ON users (email) WHERE status = 'active';
+```
+
+### Covering Index (INCLUDE)
+
+Store extra columns in the index to avoid table lookups (index-only scans).
+
+```sql
+-- PostgreSQL 11+
+CREATE INDEX idx_orders_status ON orders (status) INCLUDE (total, created_at);
+-- Query can be answered entirely from the index:
+-- SELECT total, created_at FROM orders WHERE status = 'paid';
+```
+
+### Expression Index
+
+Index the result of a function or expression.
+
+```sql
+CREATE INDEX idx_users_lower_email ON users (LOWER(email));
+-- Query: SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
+```
+
+---
+
+## Query Plan Operators
+
+### Scan operators
+
+| Operator | Description | Performance |
+|----------|-------------|-------------|
+| **Seq Scan** | Full table scan, reads every row | Slow on large tables |
+| **Index Scan** | B-tree lookup + table fetch | Fast for selective queries |
+| **Index Only Scan** | Reads only the index (covering) | Fastest for covered queries |
+| **Bitmap Index Scan** | Builds a bitmap of matching pages | Good for medium selectivity |
+| **Bitmap Heap Scan** | Fetches pages identified by bitmap | Pairs with bitmap index scan |
+
+### Join operators
+
+| Operator | Description | Best when |
+|----------|-------------|-----------|
+| **Nested Loop** | For each outer row, scan inner | Small outer set, indexed inner |
+| **Hash Join** | Build hash table on inner, probe with outer | Medium-large sets, no index |
+| **Merge Join** | Merge two sorted inputs | Both inputs already sorted |
+
+### Other operators
+
+| Operator | Description |
+|----------|-------------|
+| **Sort** | Sorts rows (may spill to disk if work_mem exceeded) |
+| **Hash Aggregate** | GROUP BY using hash table |
+| **Group Aggregate** | GROUP BY on pre-sorted input |
+| **Limit** | Stops after N rows |
+| **Materialize** | Caches subquery results in memory |
+| **Gather / Gather Merge** | Collects results from parallel workers |
+
+---
+
+## Connection Pooling
+
+### Why pool connections?
+
+Each database connection consumes memory (5-10 MB in PostgreSQL). Without pooling:
+- Application creates a new connection per request (slow: TCP + TLS + auth)
+- Under load, connection count spikes past `max_connections`
+- Database OOM or connection refused errors
+
+### PgBouncer (PostgreSQL)
+
+The standard external connection pooler for PostgreSQL.
+
+**Modes:**
+- **Session** — connection assigned for entire client session (safest, least efficient)
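+- **Transaction** — connection returned to pool after each transaction (recommended; session-level state such as `SET`, `LISTEN`, and session advisory locks does not carry across transactions)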
+- **Statement** — connection returned after each statement (cannot use transactions)
+
+```ini
+# pgbouncer.ini
+[databases]
+mydb = host=127.0.0.1 port=5432 dbname=mydb
+
+[pgbouncer]
+pool_mode = transaction
+max_client_conn = 200
+default_pool_size = 20
+min_pool_size = 5
+reserve_pool_size = 5
+reserve_pool_timeout = 3
+server_idle_timeout = 300
+```
+
+**Sizing formula:**
+```
+default_pool_size = num_cpu_cores * 2 + effective_spindle_count
+```
+For SSDs, start with `num_cpu_cores * 2` (typically 4-16 connections is optimal).
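+
+As a worked example of the formula above, here is a tiny helper. The function name and the default of zero spindles for SSD-backed hosts are assumptions for illustration; treat the result as a starting point for load testing, not a fixed answer.
+
+```python
+def suggested_pool_size(cpu_cores: int, effective_spindles: int = 0) -> int:
+    """Starting pool size: cores * 2 + effective spindle count (0 for SSDs)."""
+    if cpu_cores < 1 or effective_spindles < 0:
+        raise ValueError("cpu_cores must be >= 1 and effective_spindles >= 0")
+    return cpu_cores * 2 + effective_spindles
+
+
+print(suggested_pool_size(4))      # SSD-backed host with 4 cores
+print(suggested_pool_size(8, 1))   # 8 cores plus one spinning disk
+```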
+
+### ProxySQL (MySQL)
+
+```ini
+mysql_servers = ({ address="127.0.0.1", port=3306, hostgroup=0, max_connections=100 })
+mysql_query_rules = ({ rule_id=1, active=1, match_pattern="^SELECT.*FOR UPDATE", destination_hostgroup=0, apply=1 })
+```
+
+### Application-Level Pooling
+
+Most ORMs and drivers include built-in pooling:
+
+| Platform | Pool Configuration |
+|----------|--------------------|
+| **node-postgres** | `new Pool({ max: 20, idleTimeoutMillis: 30000 })` |
+| **SQLAlchemy** | `create_engine(url, pool_size=20, max_overflow=5)` |
+| **HikariCP (Java)** | `maximumPoolSize=20, minimumIdle=5, idleTimeout=300000` |
+| **Prisma** | `connection_limit=20` in connection string |
+
+### Pool Sizing Guidelines
+
+| Metric | Guideline |
+|--------|-----------|
+| **Minimum** | Number of always-active background workers |
+| **Maximum** | 2-4x CPU cores for OLTP; lower for OLAP |
+| **Idle timeout** | 30-300 seconds (reclaim unused connections) |
+| **Connection timeout** | 3-10 seconds (fail fast under pressure) |
+| **Queue size** | 2-5x pool max (buffer bursts before rejecting) |
+
+**Warning:** More connections does not mean better performance. Beyond the optimal point (usually 20-50), contention on locks, CPU, and I/O causes throughput to decrease.
+
+---
+
+## Statistics and Maintenance
+
+### PostgreSQL
+```sql
+-- Update statistics for the query planner
+ANALYZE orders;
+ANALYZE;  -- All tables
+
+-- Check table bloat and dead tuples
+SELECT relname, n_dead_tup, last_autovacuum, last_autoanalyze
+FROM pg_stat_user_tables ORDER BY n_dead_tup DESC;
+
+-- Identify unused indexes
+SELECT indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS size
+FROM pg_stat_user_indexes
+WHERE idx_scan = 0 AND indexrelname NOT LIKE '%pkey%'
+ORDER BY pg_relation_size(indexrelid) DESC;
+```
+
+### MySQL
+```sql
+-- Update statistics
+ANALYZE TABLE orders;
+
+-- Check index usage
+SELECT * FROM sys.schema_unused_indexes;
+SELECT * FROM sys.schema_redundant_indexes;
+
+-- Identify long-running queries
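+-- Exclude idle connections; newer MySQL 8.0 releases also expose performance_schema.processlist
+SELECT * FROM information_schema.processlist WHERE command <> 'Sleep' AND time > 10;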
+```
+
+---
+
+## Performance Checklist
+
+Before deploying any query to production:
+
+1. Run `EXPLAIN ANALYZE` and verify no unexpected sequential scans
+2. Check that estimated rows are within 10x of actual rows
+3. Verify index usage on all WHERE, JOIN, and ORDER BY columns
+4. Ensure LIMIT is present for user-facing list queries
+5. Confirm parameterized queries (no string concatenation)
+6. Test with production-like data volume (not just 10 rows)
+7. Monitor query time in application metrics after deployment
+8. Set up slow query log alerting (> 100ms for OLTP, > 5s for reports)
+
+---
+
+## Quick Reference: When to Use Which Index
+
+| Query Pattern | Index Type |
+|--------------|-----------|
+| `WHERE col = value` | B-tree or Hash |
+| `WHERE col > value` | B-tree |
+| `WHERE col LIKE 'prefix%'` | B-tree |
+| `WHERE col LIKE '%substring%'` | GIN or GiST with `pg_trgm` (trigram) |
+| `WHERE jsonb_col @> '{...}'` | GIN |
+| `WHERE array_col && ARRAY[...]` | GIN |
+| `WHERE range_col && '[a,b]'` | GiST |
+| `WHERE ST_DWithin(geom, ...)` | GiST |
+| `WHERE col BETWEEN a AND b` (large append-only table) | BRIN |
+| `WHERE col = value AND status = 'active'` | Partial B-tree |
+| `SELECT a, b WHERE c = value` | Covering (INCLUDE) |
--- a/engineering/sql-database-assistant/references/orm_patterns.md
+++ b/engineering/sql-database-assistant/references/orm_patterns.md
@@ -0,0 +1,451 @@
+# ORM Patterns Reference
+
+Side-by-side comparison of Prisma, Drizzle, TypeORM, and SQLAlchemy patterns for common database operations.
+
+---
+
+## Schema Definition
+
+### Prisma (schema.prisma)
+```prisma
+// Profile and Tag models are omitted for brevity; define them before validating this schema.
+model User {
+  id        Int      @id @default(autoincrement())
+  email     String   @unique
+  name      String?
+  role      Role     @default(USER)
+  posts     Post[]
+  profile   Profile?
+  createdAt DateTime @default(now())
+  updatedAt DateTime @updatedAt
+
+  @@index([email])
+  @@map("users")
+}
+
+model Post {
+  id        Int      @id @default(autoincrement())
+  title     String
+  body      String?
+  published Boolean  @default(false)
+  author    User     @relation(fields: [authorId], references: [id], onDelete: Cascade)
+  authorId  Int
+  tags      Tag[]
+  createdAt DateTime @default(now())
+
+  @@index([authorId])
+  @@index([published, createdAt])
+  @@map("posts")
+}
+
+enum Role {
+  USER
+  ADMIN
+  MODERATOR
+}
+```
+
+### Drizzle (schema.ts)
+```typescript
+import { pgTable, serial, varchar, text, boolean, timestamp, integer, pgEnum, index } from 'drizzle-orm/pg-core';
+
+export const roleEnum = pgEnum('role', ['USER', 'ADMIN', 'MODERATOR']);
+
+export const users = pgTable('users', {
+  id: serial('id').primaryKey(),
+  email: varchar('email', { length: 255 }).notNull().unique(),
+  name: varchar('name', { length: 255 }),
+  role: roleEnum('role').default('USER').notNull(),
+  createdAt: timestamp('created_at').defaultNow().notNull(),
+  updatedAt: timestamp('updated_at').defaultNow().notNull(),
+});
+
+export const posts = pgTable('posts', {
+  id: serial('id').primaryKey(),
+  title: varchar('title', { length: 255 }).notNull(),
+  body: text('body'),
+  published: boolean('published').default(false).notNull(),
+  authorId: integer('author_id').notNull().references(() => users.id, { onDelete: 'cascade' }),
+  createdAt: timestamp('created_at').defaultNow().notNull(),
+}, (table) => ({
+  authorIdx: index('idx_posts_author').on(table.authorId),
+  publishedIdx: index('idx_posts_published').on(table.published, table.createdAt),
+}));
+```
+
+### TypeORM (entities)
+```typescript
+import { Entity, PrimaryGeneratedColumn, Column, ManyToOne, OneToMany, CreateDateColumn, UpdateDateColumn, Index } from 'typeorm';
+
+export enum Role { USER = 'USER', ADMIN = 'ADMIN', MODERATOR = 'MODERATOR' }
+
+@Entity('users')
+export class User {
+  @PrimaryGeneratedColumn()
+  id: number;
+
+  @Column({ unique: true })
+  @Index()
+  email: string;
+
+  @Column({ nullable: true })
+  name: string;
+
+  @Column({ type: 'enum', enum: Role, default: Role.USER })
+  role: Role;
+
+  @OneToMany(() => Post, post => post.author)
+  posts: Post[];
+
+  @CreateDateColumn()
+  createdAt: Date;
+
+  @UpdateDateColumn()
+  updatedAt: Date;
+}
+
+@Entity('posts')
+@Index(['published', 'createdAt'])
+export class Post {
+  @PrimaryGeneratedColumn()
+  id: number;
+
+  @Column()
+  title: string;
+
+  @Column({ nullable: true, type: 'text' })
+  body: string;
+
+  @Column({ default: false })
+  published: boolean;
+
+  @ManyToOne(() => User, user => user.posts, { onDelete: 'CASCADE' })
+  author: User;
+
+  @Column()
+  authorId: number;
+
+  @CreateDateColumn()
+  createdAt: Date;
+}
+```
+
+### SQLAlchemy (models.py)
+```python
+import enum
+from datetime import datetime
+from sqlalchemy import Column, Integer, String, Text, Boolean, DateTime, Enum, ForeignKey, Index
+from sqlalchemy.orm import relationship, DeclarativeBase
+
+class Base(DeclarativeBase):
+    pass
+
+class Role(enum.Enum):
+    USER = "USER"
+    ADMIN = "ADMIN"
+    MODERATOR = "MODERATOR"
+
+class User(Base):
+    __tablename__ = 'users'
+
+    id = Column(Integer, primary_key=True, autoincrement=True)
+    email = Column(String(255), unique=True, nullable=False, index=True)
+    name = Column(String(255), nullable=True)
+    role = Column(Enum(Role), default=Role.USER, nullable=False)
+    posts = relationship('Post', back_populates='author', cascade='all, delete-orphan')
+    created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
+    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow, nullable=False)
+
+class Post(Base):
+    __tablename__ = 'posts'
+    __table_args__ = (
+        Index('idx_posts_published', 'published', 'created_at'),
+    )
+
+    id = Column(Integer, primary_key=True, autoincrement=True)
+    title = Column(String(255), nullable=False)
+    body = Column(Text, nullable=True)
+    published = Column(Boolean, default=False, nullable=False)
+    author_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), nullable=False, index=True)
+    author = relationship('User', back_populates='posts')
+    created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
+```
+
+---
+
+## CRUD Operations
+
+### Create
+
+| ORM | Pattern |
+|-----|---------|
+| **Prisma** | `await prisma.user.create({ data: { email, name } })` |
+| **Drizzle** | `await db.insert(users).values({ email, name }).returning()` |
+| **TypeORM** | `await userRepo.save(userRepo.create({ email, name }))` |
+| **SQLAlchemy** | `session.add(User(email=email, name=name)); session.commit()` |
+
+### Read (with filter)
+
+| ORM | Pattern |
+|-----|---------|
+| **Prisma** | `await prisma.user.findMany({ where: { role: 'ADMIN' }, orderBy: { createdAt: 'desc' } })` |
+| **Drizzle** | `await db.select().from(users).where(eq(users.role, 'ADMIN')).orderBy(desc(users.createdAt))` |
+| **TypeORM** | `await userRepo.find({ where: { role: Role.ADMIN }, order: { createdAt: 'DESC' } })` |
+| **SQLAlchemy** | `session.query(User).filter(User.role == Role.ADMIN).order_by(User.created_at.desc()).all()` |
+
+### Update
+
+| ORM | Pattern |
+|-----|---------|
+| **Prisma** | `await prisma.user.update({ where: { id }, data: { name } })` |
+| **Drizzle** | `await db.update(users).set({ name }).where(eq(users.id, id))` |
+| **TypeORM** | `await userRepo.update(id, { name })` |
+| **SQLAlchemy** | `session.query(User).filter(User.id == id).update({User.name: name}); session.commit()` |
+
+### Delete
+
+| ORM | Pattern |
+|-----|---------|
+| **Prisma** | `await prisma.user.delete({ where: { id } })` |
+| **Drizzle** | `await db.delete(users).where(eq(users.id, id))` |
+| **TypeORM** | `await userRepo.delete(id)` |
+| **SQLAlchemy** | `session.query(User).filter(User.id == id).delete(); session.commit()` |
+
+---
+
+## Relations and Eager Loading
+
+### Prisma — include / select
+```typescript
+// Eager load posts with user
+const user = await prisma.user.findUnique({
+  where: { id: 1 },
+  include: { posts: { where: { published: true }, orderBy: { createdAt: 'desc' } } },
+});
+
+// Nested create
+await prisma.user.create({
+  data: {
+    email: 'new@example.com',
+    posts: { create: [{ title: 'First post' }] },
+  },
+});
+```
+
+### Drizzle — relational queries
+```typescript
+const result = await db.query.users.findFirst({
+  where: eq(users.id, 1),
+  with: { posts: { where: eq(posts.published, true), orderBy: [desc(posts.createdAt)] } },
+});
+```
+
+### TypeORM — relations / query builder
+```typescript
+// FindOptions
+const user = await userRepo.findOne({ where: { id: 1 }, relations: ['posts'] });
+
+// QueryBuilder for complex joins
+const result = await userRepo.createQueryBuilder('u')
+  .leftJoinAndSelect('u.posts', 'p', 'p.published = :pub', { pub: true })
+  .where('u.id = :id', { id: 1 })
+  .getOne();
+```
+
+### SQLAlchemy — joinedload / selectinload
+```python
+from sqlalchemy.orm import joinedload, selectinload
+
+# Eager load in one JOIN query
+user = session.query(User).options(joinedload(User.posts)).filter(User.id == 1).first()
+
+# Eager load in a separate IN query (better for collections)
+users = session.query(User).options(selectinload(User.posts)).all()
+```
+
+---
+
+## Raw SQL Escape Hatches
+
+Each of these ORMs provides a way to execute raw SQL for queries the query builder cannot express:
+
+| ORM | Pattern |
+|-----|---------|
+| **Prisma** | `` prisma.$queryRaw`SELECT * FROM users WHERE id = ${id}` `` |
+| **Drizzle** | `` db.execute(sql`SELECT * FROM users WHERE id = ${id}`) `` |
+| **TypeORM** | `dataSource.query('SELECT * FROM users WHERE id = $1', [id])` |
+| **SQLAlchemy** | `session.execute(text('SELECT * FROM users WHERE id = :id'), {'id': id})` |
+
+Always use parameterized queries in raw SQL to prevent injection.
+
+---
+
+## Transaction Patterns
+
+### Prisma
+```typescript
+await prisma.$transaction(async (tx) => {
+  const user = await tx.user.create({ data: { email } });
+  await tx.post.create({ data: { title: 'Welcome', authorId: user.id } });
+});
+```
+
+### Drizzle
+```typescript
+await db.transaction(async (tx) => {
+  const [user] = await tx.insert(users).values({ email }).returning();
+  await tx.insert(posts).values({ title: 'Welcome', authorId: user.id });
+});
+```
+
+### TypeORM
+```typescript
+await dataSource.transaction(async (manager) => {
+  const user = await manager.save(User, { email });
+  await manager.save(Post, { title: 'Welcome', authorId: user.id });
+});
+```
+
+### SQLAlchemy
+```python
+with Session() as session:
+    try:
+        user = User(email=email)
+        session.add(user)
+        session.flush()  # Get user.id without committing
+        session.add(Post(title='Welcome', author_id=user.id))
+        session.commit()
+    except Exception:
+        session.rollback()
+        raise
+```
+
+---
+
+## Migration Workflows
+
+### Prisma
+```bash
+# Generate migration from schema changes
+npx prisma migrate dev --name add_posts_table
+
+# Apply in production
+npx prisma migrate deploy
+
+# Reset database (dev only)
+npx prisma migrate reset
+
+# Generate client after schema change
+npx prisma generate
+```
+
+**Files:** `prisma/migrations/<timestamp>_<name>/migration.sql`
+
+### Drizzle
+```bash
+# Generate migration SQL from schema diff (older drizzle-kit releases: generate:pg)
+npx drizzle-kit generate
+
+# Push schema directly (dev only, no migration files; older releases: push:pg)
+npx drizzle-kit push
+
+# Apply migrations
+npx drizzle-kit migrate
+```
+
+**Files:** `drizzle/<timestamp>_<name>.sql`
+
+### TypeORM
+```bash
+# Auto-generate migration from entity changes (TypeORM 0.3+: path is positional; 0.2.x used -n)
+npx typeorm migration:generate -d data-source.ts src/migrations/AddPostsTable
+
+# Create empty migration
+npx typeorm migration:create src/migrations/CustomMigration
+
+# Run pending migrations
+npx typeorm migration:run -d data-source.ts
+
+# Revert last migration
+npx typeorm migration:revert -d data-source.ts
+```
+
+**Files:** `src/migrations/<timestamp>-<Name>.ts`
+
+### SQLAlchemy (Alembic)
+```bash
+# Initialize Alembic
+alembic init alembic
+
+# Auto-generate migration from model changes
+alembic revision --autogenerate -m "add posts table"
+
+# Apply all pending
+alembic upgrade head
+
+# Revert one step
+alembic downgrade -1
+
+# Show current state
+alembic current
+```
+
+**Files:** `alembic/versions/<hash>_<slug>.py`
+
+---
+
+## N+1 Prevention Cheat Sheet
+
+| ORM | Lazy (N+1 risk) | Eager (fixed) |
+|-----|-----------------|---------------|
+| **Prisma** | Omitting `include` and querying relations per row | `include: { posts: true }` |
+| **Drizzle** | Separate queries | `with: { posts: true }` |
+| **TypeORM** | `@ManyToOne(() => ..., { lazy: true })` | `relations: ['posts']` or `leftJoinAndSelect` |
+| **SQLAlchemy** | Default `lazy='select'` | `joinedload()` or `selectinload()` |
+
+**Rule of thumb:** If you access a relation inside a loop, you have an N+1 problem. Always load relations before the loop.
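+
+The difference is easy to measure by counting statements. The sketch below is ORM-agnostic and uses plain `sqlite3` with a made-up `users`/`posts` dataset; `load_n_plus_one` and `load_batched` are hypothetical helper names. An ORM's eager-loading option produces roughly the second shape.
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.executescript("""
+    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
+    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER REFERENCES users(id), title TEXT);
+    INSERT INTO users (id, name) VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
+    INSERT INTO posts (author_id, title) VALUES (1, 'a1'), (1, 'a2'), (2, 'b1');
+""")
+
+statements = []
+conn.set_trace_callback(statements.append)  # record every SQL statement executed
+
+
+def load_n_plus_one():
+    """One query for users, then one query per user for their posts."""
+    users = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
+    result = {}
+    for uid, name in users:
+        rows = conn.execute(
+            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (uid,)
+        ).fetchall()
+        result[name] = [title for (title,) in rows]
+    return result
+
+
+def load_batched():
+    """One query for users, one IN query for all of their posts."""
+    users = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
+    ids = [uid for uid, _ in users]
+    marks = ",".join("?" * len(ids))  # placeholders only, never raw values
+    by_author = {uid: [] for uid in ids}
+    rows = conn.execute(
+        f"SELECT author_id, title FROM posts WHERE author_id IN ({marks}) ORDER BY id",
+        ids,
+    ).fetchall()
+    for author_id, title in rows:
+        by_author[author_id].append(title)
+    return {name: by_author[uid] for uid, name in users}
+
+
+statements.clear()
+slow = load_n_plus_one()
+n_plus_one_count = len(statements)
+
+statements.clear()
+fast = load_batched()
+batched_count = len(statements)
+
+print(n_plus_one_count, batched_count)
+```
+
+Both functions return the same data; only the number of round trips differs, and that gap grows linearly with the number of users.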
+
+---
+
+## Connection Pooling
+
+### Prisma
+```
+# In .env or connection string
+DATABASE_URL="postgresql://user:pass@host/db?connection_limit=20&pool_timeout=10"
+```
+
+### Drizzle (with node-postgres)
+```typescript
+import { Pool } from 'pg';
+import { drizzle } from 'drizzle-orm/node-postgres';
+const pool = new Pool({ max: 20, idleTimeoutMillis: 30000, connectionTimeoutMillis: 5000 });
+const db = drizzle(pool);
+```
+
+### TypeORM
+```typescript
+import { DataSource } from 'typeorm';
+
+const dataSource = new DataSource({
+  type: 'postgres',
+  extra: { max: 20, idleTimeoutMillis: 30000 },
+});
+```
+
+### SQLAlchemy
+```python
+from sqlalchemy import create_engine
+engine = create_engine('postgresql://user:pass@host/db', pool_size=20, max_overflow=5, pool_timeout=30)
+```
+
+---
+
+## Best Practices Summary
+
+1. **Always use migrations** — never modify production schemas by hand
+2. **Eager load relations** — prevent N+1 in every list/collection query
+3. **Use transactions** — group related writes to maintain consistency
+4. **Parameterize raw SQL** — never concatenate user input into queries
+5. **Connection pooling** — configure pool size matching your workload
+6. **Index foreign keys** — ORMs often skip this; add manually if needed
+7. **Review generated SQL** — enable query logging in development to catch inefficiencies
+8. **Type-safe queries** — leverage TypeScript types or Python type hints so mistakes surface at compile or type-check time
+9. **Separate read/write models** — use views or read replicas for heavy reporting queries
+10. **Test migrations both ways** — always verify that down migrations actually reverse up migrations
--- a/engineering/sql-database-assistant/references/query_patterns.md
+++ b/engineering/sql-database-assistant/references/query_patterns.md
@@ -0,0 +1,406 @@
+# SQL Query Patterns Reference
+
+Common query patterns for everyday database operations. All examples use PostgreSQL syntax with dialect notes where they differ.
+
+---
+
+## JOIN Patterns
+
+### INNER JOIN — matching rows in both tables
+```sql
+SELECT u.name, o.id AS order_id, o.total
+FROM users u
+INNER JOIN orders o ON o.user_id = u.id
+WHERE o.status = 'paid';
+```
+
+### LEFT JOIN — all rows from left, matching from right
+```sql
+SELECT u.name, COUNT(o.id) AS order_count
+FROM users u
+LEFT JOIN orders o ON o.user_id = u.id
+GROUP BY u.id, u.name;
+```
+Returns users even if they have zero orders.
+
+### Self JOIN — comparing rows within the same table
+```sql
+-- Find employees who earn more than their manager
+SELECT e.name AS employee, m.name AS manager, e.salary, m.salary AS manager_salary
+FROM employees e
+JOIN employees m ON e.manager_id = m.id
+WHERE e.salary > m.salary;
+```
+
+### CROSS JOIN — every combination (cartesian product)
+```sql
+-- Generate a calendar grid
+SELECT d.date, s.shift_name
+FROM dates d
+CROSS JOIN shifts s;
+```
+Use intentionally. Accidental cartesian joins are a performance killer.
+
+### LATERAL JOIN (PostgreSQL) — correlated subquery as a table
+```sql
+-- Top 3 orders per user
+SELECT u.name, top_orders.*
+FROM users u
+CROSS JOIN LATERAL (
+  SELECT id, total FROM orders
+  WHERE user_id = u.id
+  ORDER BY total DESC LIMIT 3
+) top_orders;
+```
+MySQL 8.0.14+ supports `LATERAL` as well; on older 8.0 releases, rank rows in a subquery with `ROW_NUMBER()` and filter on the rank.
+
+---
+
+## Common Table Expressions (CTEs)
+
+### Basic CTE — readable subquery
+```sql
+WITH active_users AS (
+  SELECT id, name, email
+  FROM users
+  WHERE last_login > CURRENT_DATE - INTERVAL '30 days'
+)
+SELECT au.name, COUNT(o.id) AS recent_orders
+FROM active_users au
+JOIN orders o ON o.user_id = au.id
+GROUP BY au.id, au.name;  -- group by id so users who share a name are not merged
+```
+
+### Multiple CTEs — chaining transformations
+```sql
+WITH monthly_revenue AS (
+  SELECT DATE_TRUNC('month', created_at) AS month, SUM(total) AS revenue
+  FROM orders WHERE status = 'paid'
+  GROUP BY 1
+),
+growth AS (
+  SELECT month, revenue,
+    LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
+    ROUND((revenue - LAG(revenue) OVER (ORDER BY month)) / NULLIF(LAG(revenue) OVER (ORDER BY month), 0) * 100, 1) AS growth_pct
+  FROM monthly_revenue
+)
+SELECT * FROM growth ORDER BY month;
+```
+
+### Recursive CTE — hierarchical data
+```sql
+-- Organization tree
+WITH RECURSIVE org_tree AS (
+  -- Base case: top-level managers
+  SELECT id, name, manager_id, 0 AS depth
+  FROM employees WHERE manager_id IS NULL
+
+  UNION ALL
+
+  -- Recursive case: subordinates
+  SELECT e.id, e.name, e.manager_id, ot.depth + 1
+  FROM employees e
+  JOIN org_tree ot ON e.manager_id = ot.id
+)
+SELECT * FROM org_tree ORDER BY depth, name;
+```
+
+### Recursive CTE — path traversal
+```sql
+-- Category breadcrumb
+WITH RECURSIVE breadcrumb AS (
+  SELECT id, name, parent_id, name::TEXT AS path
+  FROM categories WHERE id = 42
+
+  UNION ALL
+
+  SELECT c.id, c.name, c.parent_id, c.name || ' > ' || b.path
+  FROM categories c
+  JOIN breadcrumb b ON c.id = b.parent_id
+)
+SELECT path FROM breadcrumb WHERE parent_id IS NULL;
+```
+
+---
+
+## Window Functions
+
+### ROW_NUMBER — assign unique rank per partition
+```sql
+SELECT *, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num
+FROM employees;
+```
+
+### RANK and DENSE_RANK — handle ties
+```sql
+-- RANK: 1, 2, 2, 4 (skips after tie)
+-- DENSE_RANK: 1, 2, 2, 3 (no skip)
+SELECT name, salary,
+  RANK() OVER (ORDER BY salary DESC) AS salary_rank,
+  DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_dense_rank
+FROM employees;
+```
+
+### Running total and moving average
+```sql
+SELECT date, amount,
+  SUM(amount) OVER (ORDER BY date) AS running_total,
+  AVG(amount) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg_7d
+FROM daily_revenue;
+```
+
+### LAG / LEAD — access adjacent rows
+```sql
+SELECT date, revenue,
+  LAG(revenue, 1) OVER (ORDER BY date) AS prev_day,
+  revenue - LAG(revenue, 1) OVER (ORDER BY date) AS day_over_day_change
+FROM daily_revenue;
+```
+
+### NTILE — divide into buckets
+```sql
+-- Split customers into quartiles by total spend
+SELECT customer_id, total_spend,
+  NTILE(4) OVER (ORDER BY total_spend DESC) AS spend_quartile
+FROM customer_summary;
+```
+
+### FIRST_VALUE / LAST_VALUE
+```sql
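+SELECT department_id, name, salary,
+  FIRST_VALUE(name) OVER (PARTITION BY department_id ORDER BY salary DESC) AS highest_paid,
+  -- LAST_VALUE needs an explicit frame; the default frame ends at the current row
+  LAST_VALUE(name) OVER (PARTITION BY department_id ORDER BY salary DESC
+    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lowest_paid
+FROM employees;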
+```
+
+---
+
+## Subquery Patterns
+
+### EXISTS — correlated existence check
+```sql
+-- Users who have placed at least one order
+SELECT u.* FROM users u
+WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
+```
+
+### NOT EXISTS — safer than NOT IN for NULLs
+```sql
+-- Users who have never ordered
+SELECT u.* FROM users u
+WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
+```
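+
+The NULL pitfall is easiest to see with a tiny dataset. The sketch below uses SQLite through Python's `sqlite3`; the two-table schema and rows are invented for illustration, and the three-valued logic it demonstrates is standard SQL behavior.
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+conn.executescript("""
+    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
+    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);  -- user_id is nullable
+    INSERT INTO users (id, name) VALUES (1, 'ann'), (2, 'bob');
+    INSERT INTO orders (user_id) VALUES (1), (NULL);
+""")
+
+# NOT IN: a single NULL in the subquery makes the predicate unknown for every
+# non-matching row, so no user is returned at all.
+not_in = conn.execute(
+    "SELECT name FROM users WHERE id NOT IN (SELECT user_id FROM orders)"
+).fetchall()
+
+# NOT EXISTS: the correlated comparison simply never matches the NULL row.
+not_exists = conn.execute(
+    "SELECT name FROM users u "
+    "WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id)"
+).fetchall()
+
+print(not_in, not_exists)
+```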
+
+### Scalar subquery — single value
+```sql
+SELECT name, salary,
+  salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
+FROM employees;
+```
+
+### Derived table — subquery in FROM
+```sql
+SELECT dept, avg_salary
+FROM (
+  SELECT department_id AS dept, AVG(salary) AS avg_salary
+  FROM employees GROUP BY department_id
+) dept_avg
+WHERE avg_salary > 100000;
+```
+
+---
+
+## Aggregation Patterns
+
+### GROUP BY with HAVING
+```sql
+-- Departments with more than 10 employees
+SELECT department_id, COUNT(*) AS headcount, AVG(salary) AS avg_salary
+FROM employees
+GROUP BY department_id
+HAVING COUNT(*) > 10;
+```
+
+### GROUPING SETS — multiple grouping levels
+```sql
+SELECT region, product_category, SUM(revenue)
+FROM sales
+GROUP BY GROUPING SETS (
+  (region, product_category),
+  (region),
+  (product_category),
+  ()
+);
+```
+
+### ROLLUP — hierarchical subtotals
+```sql
+SELECT region, city, SUM(revenue)
+FROM sales
+GROUP BY ROLLUP (region, city);
+-- Produces: (region, city), (region), ()
+```
+
+### CUBE — all combinations
+```sql
+SELECT region, product, SUM(revenue)
+FROM sales
+GROUP BY CUBE (region, product);
+```
+
+### FILTER clause (PostgreSQL) — conditional aggregation
+```sql
+SELECT
+  COUNT(*) AS total,
+  COUNT(*) FILTER (WHERE status = 'paid') AS paid,
+  COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled,
+  SUM(total) FILTER (WHERE status = 'paid') AS paid_revenue
+FROM orders;
+```
+MySQL/SQL Server equivalent: `SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END)`.
+
+---
+
+## UPSERT Patterns
+
+### PostgreSQL — ON CONFLICT
+```sql
+INSERT INTO user_settings (user_id, key, value, updated_at)
+VALUES (1, 'theme', 'dark', NOW())
+ON CONFLICT (user_id, key)
+DO UPDATE SET value = EXCLUDED.value, updated_at = EXCLUDED.updated_at;
+```
+
+### MySQL — ON DUPLICATE KEY
+```sql
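+-- Requires a PRIMARY KEY or UNIQUE index on (user_id, key_name).
+-- VALUES() in this clause is deprecated as of MySQL 8.0.20; a row alias is the newer form.
+INSERT INTO user_settings (user_id, key_name, value, updated_at)
+VALUES (1, 'theme', 'dark', NOW())
+ON DUPLICATE KEY UPDATE value = VALUES(value), updated_at = VALUES(updated_at);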
+```
+
+### SQL Server — MERGE
+```sql
+MERGE INTO user_settings AS target
+USING (VALUES (1, 'theme', 'dark')) AS source (user_id, key_name, value)
+ON target.user_id = source.user_id AND target.key_name = source.key_name
+WHEN MATCHED THEN UPDATE SET value = source.value, updated_at = GETDATE()
+WHEN NOT MATCHED THEN INSERT (user_id, key_name, value, updated_at)
+  VALUES (source.user_id, source.key_name, source.value, GETDATE());
+```
+
+---
+
+## JSON Operations
+
+### PostgreSQL JSONB
+```sql
+-- Extract field
+SELECT data->>'name' AS name FROM products WHERE data->>'category' = 'electronics';
+
+-- Array contains
+SELECT * FROM products WHERE data->'tags' ? 'sale';
+
+-- Update nested field
+UPDATE products SET data = jsonb_set(data, '{price}', '29.99') WHERE id = 1;
+
+-- Aggregate into JSON array
+SELECT jsonb_agg(jsonb_build_object('id', id, 'name', name)) FROM users;
+```
+
+### MySQL JSON
+```sql
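+-- Extract field (returns a JSON value, so strings keep their double quotes)
+SELECT JSON_EXTRACT(data, '$.name') AS name FROM products;
+-- Shorthand: data->'$.name'; use data->>'$.name' to get the unquoted string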
+
+-- Search in array
+SELECT * FROM products WHERE JSON_CONTAINS(data->"$.tags", '"sale"');
+
+-- Update
+UPDATE products SET data = JSON_SET(data, '$.price', 29.99) WHERE id = 1;
+```
+
+---
+
+## Pagination Patterns
+
+### Offset pagination (simple but slow for deep pages)
+```sql
+SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 40;
+```
+
+### Keyset pagination (fast, requires ordered unique column)
+```sql
+-- Page after the last seen id
+SELECT * FROM products WHERE id > :last_seen_id ORDER BY id LIMIT 20;
+```
+
+### Keyset with composite sort
+```sql
+SELECT * FROM products
+WHERE (created_at, id) < (:last_created_at, :last_id)
+ORDER BY created_at DESC, id DESC
+LIMIT 20;
+```
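
How the composite cursor advances from page to page can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` module. The `products` rows, the page size, and the `fetch_page` helper are hypothetical. Row-value comparison needs SQLite 3.15 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, created_at TEXT)")
# Rows deliberately share created_at values so the id tie-breaker matters.
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, "2025-01-01"), (2, "2025-01-01"), (3, "2025-01-02"),
     (4, "2025-01-02"), (5, "2025-01-03")],
)


def fetch_page(cursor=None, size=2):
    """Return one page, newest first.

    `cursor` is the (created_at, id) pair of the last row already seen,
    or None for the first page.
    """
    if cursor is None:
        sql = ("SELECT created_at, id FROM products "
               "ORDER BY created_at DESC, id DESC LIMIT ?")
        params = (size,)
    else:
        sql = ("SELECT created_at, id FROM products "
               "WHERE (created_at, id) < (?, ?) "
               "ORDER BY created_at DESC, id DESC LIMIT ?")
        params = (*cursor, size)
    return conn.execute(sql, params).fetchall()


pages = []
cursor = None
while True:
    page = fetch_page(cursor)
    if not page:
        break
    pages.append([row[1] for row in page])
    cursor = page[-1]  # last row of this page becomes the next cursor

print(pages)  # [[5, 4], [3, 2], [1]]
```

Because the cursor includes `id`, rows that share a `created_at` value are neither skipped nor repeated across page boundaries.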
+
+---
+
+## Bulk Operations
+
+### Batch INSERT
+```sql
+INSERT INTO events (type, payload, created_at) VALUES
+  ('click', '{"page": "/home"}', NOW()),
+  ('view', '{"page": "/pricing"}', NOW()),
+  ('click', '{"page": "/signup"}', NOW());
+```
+
+### Batch UPDATE with VALUES (PostgreSQL)
+```sql
+UPDATE products AS p SET price = v.price
+FROM (VALUES (1, 29.99), (2, 49.99), (3, 9.99)) AS v(id, price)
+WHERE p.id = v.id;
+```
+
+### DELETE with subquery
+```sql
+DELETE FROM sessions
+WHERE user_id IN (SELECT id FROM users WHERE deleted_at IS NOT NULL);
+```
+
+### COPY (PostgreSQL bulk load)
+```sql
+COPY products (name, price, category) FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);
+```
+
+---
+
+## Utility Patterns
+
+### Generate series (PostgreSQL)
+```sql
+-- Fill date gaps
+SELECT d::date FROM generate_series('2025-01-01'::date, '2025-12-31', '1 day') d;
+```
+
+### Deduplicate rows (PostgreSQL, keeps the lowest id in each duplicate group)
+```sql
+DELETE FROM events a USING events b
+WHERE a.id > b.id AND a.user_id = b.user_id AND a.event_type = b.event_type
+  AND a.created_at = b.created_at;
+```
+
+### Pivot (manual)
+```sql
+SELECT user_id,
+  SUM(CASE WHEN month = 1 THEN revenue END) AS jan,
+  SUM(CASE WHEN month = 2 THEN revenue END) AS feb,
+  SUM(CASE WHEN month = 3 THEN revenue END) AS mar
+FROM monthly_revenue
+GROUP BY user_id;
+```
+
+### Conditional INSERT (skip if exists)
+```sql
+INSERT INTO tags (name) SELECT 'new-tag'
+WHERE NOT EXISTS (SELECT 1 FROM tags WHERE name = 'new-tag');
+```
--- a/engineering/sql-database-assistant/scripts/migration_generator.py
+++ b/engineering/sql-database-assistant/scripts/migration_generator.py
@@ -0,0 +1,442 @@
+#!/usr/bin/env python3
+"""
+Migration Generator
+
+Generates database migration file templates (up/down) from natural-language
+schema change descriptions.
+
+Supported operations:
+- Add column, drop column, rename column
+- Add table, drop table
+- Add index (optionally unique)
+- Change column type
+
+Usage:
+    python migration_generator.py --change "add email_verified boolean to users" --dialect postgres
+    python migration_generator.py --change "rename column name to full_name in customers" --format alembic
+    python migration_generator.py --change "add index on orders(status, created_at)" --output 001_add_index.sql
+    python migration_generator.py --change "create table reviews with id, user_id, rating, body" --json
+"""
+
+import argparse
+import json
+import os
+import re
+import sys
+import textwrap
+from dataclasses import dataclass, asdict
+from datetime import datetime
+from typing import List, Optional, Tuple
+
+
+@dataclass
+class Migration:
+    """A generated migration with up and down scripts."""
+    description: str
+    dialect: str
+    format: str
+    up: str
+    down: str
+    warnings: List[str]
+
+    def to_dict(self):
+        return asdict(self)
+
+
+# ---------------------------------------------------------------------------
+# Change parsers — extract structured intent from natural language
+# ---------------------------------------------------------------------------
+
+def parse_add_column(desc: str) -> Optional[dict]:
+    """Parse: add <column> <type> to <table>"""
+    m = re.match(
+        r'add\s+(?:column\s+)?(\w+)\s+(\w[\w(),.]*)\s+(?:to|on)\s+(\w+)',
+        desc, re.IGNORECASE,
+    )
+    if m:
+        return {"op": "add_column", "column": m.group(1), "type": m.group(2), "table": m.group(3)}
+    return None
+
+
+def parse_drop_column(desc: str) -> Optional[dict]:
+    """Parse: drop/remove <column> from <table>"""
+    m = re.match(
+        r'(?:drop|remove)\s+(?:column\s+)?(\w+)\s+from\s+(\w+)',
+        desc, re.IGNORECASE,
+    )
+    if m:
+        return {"op": "drop_column", "column": m.group(1), "table": m.group(2)}
+    return None
+
+
+def parse_rename_column(desc: str) -> Optional[dict]:
+    """Parse: rename column <old> to <new> in <table>"""
+    m = re.match(
+        r'rename\s+column\s+(\w+)\s+to\s+(\w+)\s+in\s+(\w+)',
+        desc, re.IGNORECASE,
+    )
+    if m:
+        return {"op": "rename_column", "old": m.group(1), "new": m.group(2), "table": m.group(3)}
+    return None
+
+
+def parse_add_table(desc: str) -> Optional[dict]:
+    """Parse: create table <name> with <col1>, <col2>, ..."""
+    m = re.match(
+        r'create\s+table\s+(\w+)\s+with\s+(.+)',
+        desc, re.IGNORECASE,
+    )
+    if m:
+        cols = [c.strip() for c in m.group(2).split(",")]
+        return {"op": "add_table", "table": m.group(1), "columns": cols}
+    return None
+
+
+def parse_drop_table(desc: str) -> Optional[dict]:
+    """Parse: drop table <name>"""
+    m = re.match(r'drop\s+table\s+(\w+)', desc, re.IGNORECASE)
+    if m:
+        return {"op": "drop_table", "table": m.group(1)}
+    return None
+
+
+def parse_add_index(desc: str) -> Optional[dict]:
+    """Parse: add [unique] index on <table>(<col1>, <col2>)"""
+    m = re.match(
+        r'add\s+(unique\s+)?index\s+(?:on\s+)?(\w+)\s*\(([^)]+)\)',
+        desc, re.IGNORECASE,
+    )
+    if m:
+        # Take uniqueness from the keyword itself, not from a substring search:
+        # a column such as "unique_id" must not make the index unique.
+        unique = m.group(1) is not None
+        cols = [c.strip() for c in m.group(3).split(",")]
+        return {"op": "add_index", "table": m.group(2), "columns": cols, "unique": unique}
+    return None
+
+
+def parse_change_type(desc: str) -> Optional[dict]:
+    """Parse: change <column> type to <type> in <table>"""
+    m = re.match(
+        r'change\s+(?:column\s+)?(\w+)\s+type\s+to\s+(\w[\w(),.]*)\s+in\s+(\w+)',
+        desc, re.IGNORECASE,
+    )
+    if m:
+        return {"op": "change_type", "column": m.group(1), "new_type": m.group(2), "table": m.group(3)}
+    return None
+
+
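+# Order matters: parse_add_index must run before parse_add_column, otherwise
+# "add unique index on users(email)" is read as column "unique" of type "index".
+PARSERS = [
+    parse_add_index,
+    parse_add_column,
+    parse_drop_column,
+    parse_rename_column,
+    parse_add_table,
+    parse_drop_table,
+    parse_change_type,
+]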
+
+
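+def parse_change(desc: str) -> Optional[dict]:
+    """Return the result of the first parser that matches, or None."""
+    # The parsers use re.match, which anchors at position 0, so drop stray whitespace.
+    desc = desc.strip()
+    for parser in PARSERS:
+        result = parser(desc)
+        if result:
+            return result
+    return None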
+
+
+# ---------------------------------------------------------------------------
+# SQL generators per dialect
+# ---------------------------------------------------------------------------
+
+TYPE_MAP = {
+    "boolean": {"postgres": "BOOLEAN", "mysql": "TINYINT(1)", "sqlite": "INTEGER", "sqlserver": "BIT"},
+    "text": {"postgres": "TEXT", "mysql": "TEXT", "sqlite": "TEXT", "sqlserver": "NVARCHAR(MAX)"},
+    "integer": {"postgres": "INTEGER", "mysql": "INT", "sqlite": "INTEGER", "sqlserver": "INT"},
+    "int": {"postgres": "INTEGER", "mysql": "INT", "sqlite": "INTEGER", "sqlserver": "INT"},
+    "serial": {"postgres": "SERIAL", "mysql": "INT AUTO_INCREMENT", "sqlite": "INTEGER", "sqlserver": "INT IDENTITY(1,1)"},
+    "varchar": {"postgres": "VARCHAR(255)", "mysql": "VARCHAR(255)", "sqlite": "TEXT", "sqlserver": "NVARCHAR(255)"},
+    "timestamp": {"postgres": "TIMESTAMP", "mysql": "DATETIME", "sqlite": "TEXT", "sqlserver": "DATETIME2"},
+    "uuid": {"postgres": "UUID", "mysql": "CHAR(36)", "sqlite": "TEXT", "sqlserver": "UNIQUEIDENTIFIER"},
+    "json": {"postgres": "JSONB", "mysql": "JSON", "sqlite": "TEXT", "sqlserver": "NVARCHAR(MAX)"},
+    "decimal": {"postgres": "DECIMAL(19,4)", "mysql": "DECIMAL(19,4)", "sqlite": "REAL", "sqlserver": "DECIMAL(19,4)"},
+    "float": {"postgres": "DOUBLE PRECISION", "mysql": "DOUBLE", "sqlite": "REAL", "sqlserver": "FLOAT"},
+}
+
+
+def map_type(type_name: str, dialect: str) -> str:
+    """Map a generic type name to a dialect-specific type."""
+    key = type_name.lower().rstrip("()")
+    if key in TYPE_MAP and dialect in TYPE_MAP[key]:
+        return TYPE_MAP[key][dialect]
+    return type_name.upper()
+
+
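+def gen_add_column(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
+    col_type = map_type(change["type"], dialect)
+    table = change["table"]
+    col = change["column"]
+    # SQL Server's ALTER TABLE ... ADD does not accept the COLUMN keyword.
+    add_kw = "ADD" if dialect == "sqlserver" else "ADD COLUMN"
+    up = f"ALTER TABLE {table} {add_kw} {col} {col_type};"
+    down = f"ALTER TABLE {table} DROP COLUMN {col};"
+    warnings = []
+    if dialect == "sqlite":
+        warnings.append("SQLite DROP COLUMN (used by the down migration) requires version 3.35.0+.")
+    return up, down, warnings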
+
+
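+def gen_drop_column(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
+    table = change["table"]
+    col = change["column"]
+    # SQL Server's ALTER TABLE ... ADD does not accept the COLUMN keyword.
+    add_kw = "ADD" if dialect == "sqlserver" else "ADD COLUMN"
+    up = f"ALTER TABLE {table} DROP COLUMN {col};"
+    down = f"-- WARNING: Cannot fully reverse DROP COLUMN. Provide the original type.\nALTER TABLE {table} {add_kw} {col} TEXT;"
+    warnings = ["Down migration uses TEXT as placeholder. Replace with the original column type."]
+    if dialect == "sqlite":
+        warnings.append("SQLite DROP COLUMN requires version 3.35.0+.")
+    return up, down, warnings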
+
+
+def gen_rename_column(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
+    table = change["table"]
+    old, new = change["old"], change["new"]
+    warnings = []
+    if dialect == "sqlserver":
+        up = f"EXEC sp_rename '{table}.{old}', '{new}', 'COLUMN';"
+        down = f"EXEC sp_rename '{table}.{new}', '{old}', 'COLUMN';"
+    else:
+        # PostgreSQL, MySQL, and SQLite share the same RENAME COLUMN syntax.
+        up = f"ALTER TABLE {table} RENAME COLUMN {old} TO {new};"
+        down = f"ALTER TABLE {table} RENAME COLUMN {new} TO {old};"
+        if dialect == "sqlite":
+            warnings.append("SQLite RENAME COLUMN requires version 3.25.0+.")
+        elif dialect == "mysql":
+            warnings.append("MySQL RENAME COLUMN requires version 8.0+.")
+    return up, down, warnings
+
+
+def gen_add_table(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
+    table = change["table"]
+    cols = change["columns"]
+    col_defs = []
+    has_id = False
+    for col in cols:
+        col = col.strip()
+        if col.lower() == "id":
+            has_id = True
+            if dialect == "postgres":
+                col_defs.append("    id SERIAL PRIMARY KEY")
+            elif dialect == "mysql":
+                col_defs.append("    id INT AUTO_INCREMENT PRIMARY KEY")
+            elif dialect == "sqlite":
+                col_defs.append("    id INTEGER PRIMARY KEY AUTOINCREMENT")
+            elif dialect == "sqlserver":
+                col_defs.append("    id INT IDENTITY(1,1) PRIMARY KEY")
+        else:
+            # Check if type is specified (e.g., "rating int")
+            parts = col.split()
+            if len(parts) >= 2:
+                col_defs.append(f"    {parts[0]} {map_type(parts[1], dialect)}")
+            else:
+                # No type given: fall back to the dialect's text type
+                col_defs.append(f"    {col} {map_type('text', dialect)}")
+
+    cols_sql = ",\n".join(col_defs)
+    up = f"CREATE TABLE {table} (\n{cols_sql}\n);"
+    down = f"DROP TABLE {table};"
+    warnings = []
+    if not has_id:
+        warnings.append("Table has no explicit primary key. Consider adding an 'id' column.")
+    return up, down, warnings
+
+
+def gen_drop_table(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
+    table = change["table"]
+    up = f"DROP TABLE {table};"
+    down = f"-- WARNING: Cannot reverse DROP TABLE without original DDL.\nCREATE TABLE {table} (id INTEGER PRIMARY KEY);"
+    return up, down, ["Down migration is a placeholder. Replace with the original CREATE TABLE statement."]
+
+
+def gen_add_index(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
+    table = change["table"]
+    cols = change["columns"]
+    unique = "UNIQUE " if change.get("unique") else ""
+    idx_name = f"idx_{table}_{'_'.join(cols)}"
+    if dialect == "postgres":
+        up = f"CREATE {unique}INDEX CONCURRENTLY {idx_name} ON {table} ({', '.join(cols)});"
+    else:
+        up = f"CREATE {unique}INDEX {idx_name} ON {table} ({', '.join(cols)});"
+    # MySQL and SQL Server require the table name in DROP INDEX.
+    if dialect in ("mysql", "sqlserver"):
+        down = f"DROP INDEX {idx_name} ON {table};"
+    else:
+        down = f"DROP INDEX {idx_name};"
+    warnings = []
+    if dialect == "postgres":
+        warnings.append("CONCURRENTLY cannot run inside a transaction. Run outside migration transaction.")
+    return up, down, warnings
+
+
+def gen_change_type(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
+    table = change["table"]
+    col = change["column"]
+    new_type = map_type(change["new_type"], dialect)
+    warnings = ["Down migration uses TEXT as placeholder. Replace with the original column type."]
+    if dialect == "postgres":
+        up = f"ALTER TABLE {table} ALTER COLUMN {col} TYPE {new_type};"
+        down = f"ALTER TABLE {table} ALTER COLUMN {col} TYPE TEXT;"
+    elif dialect == "mysql":
+        up = f"ALTER TABLE {table} MODIFY COLUMN {col} {new_type};"
+        down = f"ALTER TABLE {table} MODIFY COLUMN {col} TEXT;"
+    elif dialect == "sqlserver":
+        up = f"ALTER TABLE {table} ALTER COLUMN {col} {new_type};"
+        down = f"ALTER TABLE {table} ALTER COLUMN {col} NVARCHAR(MAX);"
+    else:
+        up = "-- SQLite does not support ALTER COLUMN. Recreate the table."
+        down = "-- SQLite does not support ALTER COLUMN. Recreate the table."
+        warnings.append("SQLite requires table recreation for type changes.")
+    return up, down, warnings
+
+
+GENERATORS = {
+    "add_column": gen_add_column,
+    "drop_column": gen_drop_column,
+    "rename_column": gen_rename_column,
+    "add_table": gen_add_table,
+    "drop_table": gen_drop_table,
+    "add_index": gen_add_index,
+    "change_type": gen_change_type,
+}
+
+
+# ---------------------------------------------------------------------------
+# Format wrappers
+# ---------------------------------------------------------------------------
+
+def wrap_sql(up: str, down: str, description: str) -> Tuple[str, str]:
+    """Wrap as plain SQL migration files."""
+    generated = datetime.now().isoformat()
+    header = f"-- Migration: {description}\n-- Generated: {generated}\n\n"
+    return header + "-- Up\n" + up, header + "-- Down\n" + down
+
+
+def wrap_prisma(up: str, down: str, description: str) -> Tuple[str, str]:
+    """Format as Prisma migration SQL (Prisma uses raw SQL in migration.sql)."""
+    header = f"-- Migration: {description}\n-- Format: Prisma (migration.sql)\n\n"
+    return header + up, header + "-- Rollback\n" + down
+
+
+def wrap_alembic(up: str, down: str, description: str) -> Tuple[str, str]:
+    """Format as Alembic Python migration."""
+    slug = re.sub(r'\W+', '_', description.lower())[:40]
+    revision = datetime.now().strftime("%Y%m%d%H%M")
+    template = textwrap.dedent(f'''\
+        """
+        {description}
+
+        Revision ID: {revision}
+        """
+        from alembic import op
+        import sqlalchemy as sa
+
+        revision = '{revision}'
+        down_revision = None  # Set to previous revision
+
+
+        def upgrade():
+            op.execute("""
+        {textwrap.indent(up, "        ")}
+            """)
+
+
+        def downgrade():
+            op.execute("""
+        {textwrap.indent(down, "        ")}
+            """)
+    ''')
+    return template, ""
+
+
+FORMATTERS = {
+    "sql": wrap_sql,
+    "prisma": wrap_prisma,
+    "alembic": wrap_alembic,
+}
+
+
+# ---------------------------------------------------------------------------
+# CLI
+# ---------------------------------------------------------------------------
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Generate database migration templates from change descriptions.",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Supported change descriptions:
+  "add email_verified boolean to users"
+  "drop column legacy_flag from accounts"
+  "rename column name to full_name in customers"
+  "create table reviews with id, user_id, rating int, body text"
+  "drop table temp_imports"
+  "add index on orders(status, created_at)"
+  "add unique index on users(email)"
+  "change email type to varchar in users"
+
+Examples:
+  %(prog)s --change "add phone varchar to users" --dialect postgres
+  %(prog)s --change "create table reviews with id, user_id, rating int, body" --format prisma
+  %(prog)s --change "add index on orders(status)" --output migrations/001.sql
+  %(prog)s --change "add index on orders(status)" --json
+        """,
+    )
+    parser.add_argument("--change", required=True, help="Natural-language description of the schema change")
+    parser.add_argument("--dialect", choices=["postgres", "mysql", "sqlite", "sqlserver"],
+                        default="postgres", help="Target database dialect (default: postgres)")
+    parser.add_argument("--format", choices=["sql", "prisma", "alembic"], default="sql",
+                        dest="fmt", help="Output format (default: sql)")
+    parser.add_argument("--output", help="Write migration to file instead of stdout")
+    parser.add_argument("--json", action="store_true", dest="json_output", help="Output as JSON")
+    args = parser.parse_args()
+
+    change = parse_change(args.change)
+    if not change:
+        print(f"Error: Could not parse change description: '{args.change}'", file=sys.stderr)
+        print("Run with --help to see supported patterns.", file=sys.stderr)
+        sys.exit(1)
+
+    gen_fn = GENERATORS.get(change["op"])
+    if not gen_fn:
+        print(f"Error: No generator for operation '{change['op']}'", file=sys.stderr)
+        sys.exit(1)
+
+    up, down, warnings = gen_fn(change, args.dialect)
+
+    fmt_fn = FORMATTERS[args.fmt]
+    up_formatted, down_formatted = fmt_fn(up, down, args.change)
+
+    migration = Migration(
+        description=args.change,
+        dialect=args.dialect,
+        format=args.fmt,
+        up=up_formatted,
+        down=down_formatted,
+        warnings=warnings,
+    )
+
+    if args.json_output:
+        print(json.dumps(migration.to_dict(), indent=2))
+    else:
+        if args.output:
+            with open(args.output, "w") as f:
+                f.write(migration.up)
+            print(f"Migration written to {args.output}")
+            if migration.down:
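+                # Insert "_down" before the extension. str.replace(".sql", ...) leaves
+                # other extensions unchanged, which would overwrite the up migration.
+                root, ext = os.path.splitext(args.output)
+                down_path = f"{root}_down{ext}"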
+                with open(down_path, "w") as f:
+                    f.write(migration.down)
+                print(f"Rollback written to {down_path}")
+        else:
+            print(migration.up)
+            if migration.down:
+                print("\n" + "=" * 40 + " ROLLBACK " + "=" * 40 + "\n")
+                print(migration.down)
+
+        if warnings:
+            print("\nWarnings:")
+            for w in warnings:
+                print(f"  - {w}")
+
+
+if __name__ == "__main__":
+    main()
--- a/engineering/sql-database-assistant/scripts/query_optimizer.py
+++ b/engineering/sql-database-assistant/scripts/query_optimizer.py
@@ -0,0 +1,348 @@
+#!/usr/bin/env python3
+"""
+SQL Query Optimizer — Static Analysis
+
+Analyzes SQL queries for common performance issues:
+- SELECT * usage
+- Missing WHERE clauses on UPDATE/DELETE
+- Cartesian joins (missing JOIN conditions)
+- Subqueries in SELECT list
+- Missing LIMIT on unbounded SELECTs
+- Function calls on indexed columns (non-sargable)
+- LIKE with leading wildcard
+- ORDER BY RAND()
+- UNION instead of UNION ALL
+- NOT IN with subquery (NULL-unsafe)
+
+Usage:
+    python query_optimizer.py --query "SELECT * FROM users"
+    python query_optimizer.py --query queries.sql --dialect postgres
+    python query_optimizer.py --query "SELECT * FROM orders" --json
+"""
+
+import argparse
+import json
+import os
+import re
+import sys
+from dataclasses import dataclass, asdict
+from typing import List, Optional
+
+
+@dataclass
+class Issue:
+    """A single optimization issue found in a query."""
+    severity: str  # critical, warning, info
+    rule: str
+    message: str
+    suggestion: str
+    line: Optional[int] = None
+
+
+@dataclass
+class QueryAnalysis:
+    """Analysis result for one SQL query."""
+    query: str
+    issues: List[Issue]
+    score: int  # 0-100, higher is better
+
+    def to_dict(self):
+        return {
+            "query": self.query[:200] + ("..." if len(self.query) > 200 else ""),
+            "issues": [asdict(i) for i in self.issues],
+            "issue_count": len(self.issues),
+            "score": self.score,
+        }
+
+
+# ---------------------------------------------------------------------------
+# Rule checkers
+# ---------------------------------------------------------------------------
+
+def check_select_star(sql: str) -> Optional[Issue]:
+    """Detect SELECT * usage, including SELECT DISTINCT * and SELECT t.*."""
+    if re.search(r'\bSELECT\s+(?:DISTINCT\s+)?(?:\w+\.)?\*', sql, re.IGNORECASE):
+        return Issue(
+            severity="warning",
+            rule="select-star",
+            message="SELECT * transfers unnecessary data and breaks on schema changes.",
+            suggestion="List only the columns you need: SELECT col1, col2, ...",
+        )
+    return None
+
+
+def check_missing_where(sql: str) -> Optional[Issue]:
+    """Detect UPDATE/DELETE without WHERE."""
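+    upper = sql.upper().strip()
+    # Match WHERE as a whole word so identifiers like "somewhere" do not hide the problem.
+    has_where = re.search(r'\bWHERE\b', upper) is not None
+    for keyword in ("UPDATE", "DELETE"):
+        if upper.startswith(keyword) and not has_where: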
+            return Issue(
+                severity="critical",
+                rule="missing-where",
+                message=f"{keyword} without WHERE affects every row in the table.",
+                suggestion=f"Add a WHERE clause to restrict the {keyword} scope.",
+            )
+    return None
+
+
+def check_cartesian_join(sql: str) -> Optional[Issue]:
+    """Detect comma-separated tables without explicit JOIN or WHERE join condition."""
+    upper = sql.upper()
+    if "SELECT" not in upper:
+        return None
+    from_match = re.search(r'\bFROM\s+(.+?)(?:\bWHERE\b|\bGROUP\b|\bORDER\b|\bLIMIT\b|\bHAVING\b|;|$)',
+                           sql, re.IGNORECASE | re.DOTALL)
+    if not from_match:
+        return None
+    from_clause = from_match.group(1)
+    # Skip if explicit JOINs are used
+    if re.search(r'\bJOIN\b', from_clause, re.IGNORECASE):
+        return None
+    # Count comma-separated tables
+    tables = [t.strip() for t in from_clause.split(",") if t.strip()]
+    if len(tables) > 1 and "WHERE" not in upper:
+        return Issue(
+            severity="critical",
+            rule="cartesian-join",
+            message="Multiple tables in FROM without JOIN or WHERE creates a cartesian product.",
+            suggestion="Use explicit JOIN syntax with ON conditions.",
+        )
+    return None
+
+
+def check_subquery_in_select(sql: str) -> Optional[Issue]:
+    """Detect subqueries in the SELECT list (costly when correlated)."""
+    select_match = re.search(r'\bSELECT\b(.+?)\bFROM\b', sql, re.IGNORECASE | re.DOTALL)
+    if select_match:
+        select_clause = select_match.group(1)
+        if re.search(r'\(\s*SELECT\b', select_clause, re.IGNORECASE):
+            return Issue(
+                severity="warning",
+                rule="subquery-in-select",
+                message="Subquery in SELECT list may execute once per row if it is correlated.",
+                suggestion="If it references the outer query, rewrite as a LEFT JOIN with aggregation.",
+            )
+    return None
+
+
+def check_missing_limit(sql: str) -> Optional[Issue]:
+    """Detect unbounded SELECT without LIMIT."""
+    upper = sql.upper().strip()
+    if not upper.startswith("SELECT"):
+        return None
+    # Skip if it's a subquery or aggregate-only
+    if re.search(r'\bCOUNT\s*\(', upper) and "GROUP BY" not in upper:
+        return None
+    if "LIMIT" not in upper and "FETCH" not in upper and "TOP " not in upper:
+        return Issue(
+            severity="info",
+            rule="missing-limit",
+            message="SELECT without LIMIT may return unbounded rows.",
+            suggestion="Add LIMIT to prevent returning excessive data.",
+        )
+    return None
+
+
+def check_function_on_column(sql: str) -> Optional[Issue]:
+    """Detect function calls on columns in WHERE (non-sargable)."""
+    where_match = re.search(r'\bWHERE\b(.+?)(?:\bGROUP\b|\bORDER\b|\bLIMIT\b|\bHAVING\b|;|$)',
+                            sql, re.IGNORECASE | re.DOTALL)
+    if not where_match:
+        return None
+    where_clause = where_match.group(1)
+    non_sargable = re.search(
+        r'\b(YEAR|MONTH|DAY|DATE|UPPER|LOWER|TRIM|CAST|COALESCE|IFNULL|NVL)\s*\(',
+        where_clause, re.IGNORECASE
+    )
+    if non_sargable:
+        func = non_sargable.group(1).upper()
+        return Issue(
+            severity="warning",
+            rule="non-sargable",
+            message=f"Function {func}() on column in WHERE prevents index usage.",
+            suggestion="Rewrite to compare the raw column against transformed constants.",
+        )
+    return None
+
+
+def check_leading_wildcard(sql: str) -> Optional[Issue]:
+    """Detect LIKE '%...' patterns."""
+    if re.search(r"LIKE\s+'%", sql, re.IGNORECASE):
+        return Issue(
+            severity="warning",
+            rule="leading-wildcard",
+            message="LIKE with leading wildcard prevents index usage.",
+            suggestion="Use full-text search (GIN index, FULLTEXT, FTS5) for substring matching.",
+        )
+    return None
+
+
+def check_order_by_rand(sql: str) -> Optional[Issue]:
+    """Detect ORDER BY RAND() / RANDOM()."""
+    if re.search(r'ORDER\s+BY\s+(RAND|RANDOM)\s*\(\)', sql, re.IGNORECASE):
+        return Issue(
+            severity="warning",
+            rule="order-by-rand",
+            message="ORDER BY RAND() scans and sorts the entire table.",
+            suggestion="Use application-side random sampling or TABLESAMPLE.",
+        )
+    return None
+
+
+def check_union_vs_union_all(sql: str) -> Optional[Issue]:
+    """Detect UNION without ALL (unnecessary dedup)."""
+    if re.search(r'\bUNION\b(?!\s+ALL\b)', sql, re.IGNORECASE):
+        return Issue(
+            severity="info",
+            rule="union-without-all",
+            message="UNION performs deduplication sort; use UNION ALL if duplicates are acceptable.",
+            suggestion="Replace UNION with UNION ALL unless you specifically need deduplication.",
+        )
+    return None
+
+
+def check_not_in_subquery(sql: str) -> Optional[Issue]:
+    """Detect NOT IN (SELECT ...) which is NULL-unsafe."""
+    if re.search(r'\bNOT\s+IN\s*\(\s*SELECT\b', sql, re.IGNORECASE):
+        return Issue(
+            severity="warning",
+            rule="not-in-subquery",
+            message="NOT IN with subquery returns no rows if any subquery result is NULL.",
+            suggestion="Use NOT EXISTS (SELECT 1 ...) instead.",
+        )
+    return None
+
+
+ALL_CHECKS = [
+    check_select_star,
+    check_missing_where,
+    check_cartesian_join,
+    check_subquery_in_select,
+    check_missing_limit,
+    check_function_on_column,
+    check_leading_wildcard,
+    check_order_by_rand,
+    check_union_vs_union_all,
+    check_not_in_subquery,
+]
+
+
+# ---------------------------------------------------------------------------
+# Analysis engine
+# ---------------------------------------------------------------------------
+
+def analyze_query(sql: str, dialect: str = "postgres") -> QueryAnalysis:
+    """Run all checks against a single SQL query.
+
+    `dialect` is accepted for CLI symmetry; the checks in ALL_CHECKS do not use it.
+    """
+    issues: List[Issue] = []
+    for check_fn in ALL_CHECKS:
+        issue = check_fn(sql)
+        if issue:
+            issues.append(issue)
+
+    # Score: start at 100, deduct per severity
+    score = 100
+    for issue in issues:
+        if issue.severity == "critical":
+            score -= 25
+        elif issue.severity == "warning":
+            score -= 10
+        else:
+            score -= 5
+    score = max(0, score)
+
+    return QueryAnalysis(query=sql.strip(), issues=issues, score=score)
+
+
+def split_queries(text: str) -> List[str]:
+    """Split SQL text into individual statements.
+
+    Naive split on ';': a semicolon inside a string literal or comment also splits.
+    """
+    queries = []
+    for stmt in text.split(";"):
+        stmt = stmt.strip()
+        if stmt and len(stmt) > 5:
+            queries.append(stmt + ";")
+    return queries
+
+
+# ---------------------------------------------------------------------------
+# Output formatting
+# ---------------------------------------------------------------------------
+
+SEVERITY_ICONS = {"critical": "[CRITICAL]", "warning": "[WARNING]", "info": "[INFO]"}
+
+
+def format_text(analyses: List[QueryAnalysis]) -> str:
+    """Format analysis results as human-readable text."""
+    lines = []
+    for i, analysis in enumerate(analyses, 1):
+        lines.append(f"{'='*60}")
+        lines.append(f"Query {i} (Score: {analysis.score}/100)")
+        lines.append(f"  {analysis.query[:120]}{'...' if len(analysis.query) > 120 else ''}")
+        lines.append("")
+        if not analysis.issues:
+            lines.append("  No issues detected.")
+        for issue in analysis.issues:
+            icon = SEVERITY_ICONS.get(issue.severity, "")
+            lines.append(f"  {icon} {issue.rule}: {issue.message}")
+            lines.append(f"    -> {issue.suggestion}")
+        lines.append("")
+    return "\n".join(lines)
+
+
+def format_json(analyses: List[QueryAnalysis]) -> str:
+    """Format analysis results as JSON."""
+    return json.dumps(
+        {"analyses": [a.to_dict() for a in analyses], "total_queries": len(analyses)},
+        indent=2,
+    )
+
+
+# ---------------------------------------------------------------------------
+# CLI
+# ---------------------------------------------------------------------------
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Analyze SQL queries for common performance issues.",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  %(prog)s --query "SELECT * FROM users"
+  %(prog)s --query queries.sql --dialect mysql
+  %(prog)s --query "DELETE FROM orders" --json
+        """,
+    )
+    parser.add_argument(
+        "--query", required=True,
+        help="SQL query string or path to a .sql file",
+    )
+    parser.add_argument(
+        "--dialect", choices=["postgres", "mysql", "sqlite", "sqlserver"],
+        default="postgres", help="SQL dialect (default: postgres)",
+    )
+    parser.add_argument(
+        "--json", action="store_true", dest="json_output",
+        help="Output results as JSON",
+    )
+    args = parser.parse_args()
+
+    # Determine if query is a file path or inline SQL
+    sql_text = args.query
+    if os.path.isfile(args.query):
+        with open(args.query, "r", encoding="utf-8") as f:
+            sql_text = f.read()
+
+    queries = split_queries(sql_text)
+    if not queries:
+        # Treat the whole input as a single query
+        queries = [sql_text.strip()]
+
+    analyses = [analyze_query(q, args.dialect) for q in queries]
+
+    if args.json_output:
+        print(format_json(analyses))
+    else:
+        print(format_text(analyses))
+
+
+if __name__ == "__main__":
+    main()
--- a/engineering/sql-database-assistant/scripts/schema_explorer.py
+++ b/engineering/sql-database-assistant/scripts/schema_explorer.py
@@ -0,0 +1,315 @@
+#!/usr/bin/env python3
+"""
+Schema Explorer
+
+Generates schema documentation from database introspection queries.
+Outputs the introspection SQL and sample documentation templates
+for PostgreSQL, MySQL, SQLite, and SQL Server.
+
+Since this tool runs without a live database connection, it generates:
+1. The introspection queries you need to run
+2. Documentation templates to fill in from the results
+3. Sample schema docs for common table patterns
+
+Usage:
+    python schema_explorer.py --dialect postgres --tables all --format md
+    python schema_explorer.py --dialect mysql --tables users,orders --format json
+    python schema_explorer.py --dialect sqlite --tables all --json
+"""
+
+import argparse
+import json
+import sys
+import textwrap
+from dataclasses import dataclass, asdict
+from typing import List, Optional, Dict
+
+
+# ---------------------------------------------------------------------------
+# Introspection query templates per dialect
+# ---------------------------------------------------------------------------
+
+INTROSPECTION_QUERIES: Dict[str, Dict[str, str]] = {
+    "postgres": {
+        "tables": textwrap.dedent("""\
+            SELECT table_name
+            FROM information_schema.tables
+            WHERE table_schema = 'public' AND table_type = 'BASE TABLE'
+            ORDER BY table_name;"""),
+        "columns": textwrap.dedent("""\
+            SELECT table_name, column_name, data_type, character_maximum_length,
+                   is_nullable, column_default
+            FROM information_schema.columns
+            WHERE table_schema = 'public' {table_filter}
+            ORDER BY table_name, ordinal_position;"""),
+        "primary_keys": textwrap.dedent("""\
+            SELECT tc.table_name, kcu.column_name
+            FROM information_schema.table_constraints tc
+            JOIN information_schema.key_column_usage kcu
+              ON tc.constraint_name = kcu.constraint_name AND tc.constraint_schema = kcu.constraint_schema
+            WHERE tc.constraint_type = 'PRIMARY KEY' AND tc.table_schema = 'public'
+            ORDER BY tc.table_name;"""),
+        "foreign_keys": textwrap.dedent("""\
+            SELECT tc.table_name, kcu.column_name,
+                   ccu.table_name AS foreign_table, ccu.column_name AS foreign_column
+            FROM information_schema.table_constraints tc
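+            JOIN information_schema.key_column_usage kcu
+              ON tc.constraint_name = kcu.constraint_name AND tc.constraint_schema = kcu.constraint_schema
+            JOIN information_schema.constraint_column_usage ccu
+              ON tc.constraint_name = ccu.constraint_name AND tc.constraint_schema = ccu.constraint_schema
+            WHERE tc.constraint_type = 'FOREIGN KEY' AND tc.table_schema = 'public'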
+            ORDER BY tc.table_name;"""),
+        "indexes": textwrap.dedent("""\
+            SELECT schemaname, tablename, indexname, indexdef
+            FROM pg_indexes
+            WHERE schemaname = 'public'
+            ORDER BY tablename, indexname;"""),
+        "table_sizes": textwrap.dedent("""\
+            SELECT relname AS table_name,
+                   pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
+                   pg_size_pretty(pg_relation_size(relid)) AS data_size,
+                   pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS index_size
+            FROM pg_catalog.pg_statio_user_tables
+            ORDER BY pg_total_relation_size(relid) DESC;"""),
+    },
+    "mysql": {
+        "tables": textwrap.dedent("""\
+            SELECT table_name
+            FROM information_schema.tables
+            WHERE table_schema = DATABASE() AND table_type = 'BASE TABLE'
+            ORDER BY table_name;"""),
+        "columns": textwrap.dedent("""\
+            SELECT table_name, column_name, column_type, is_nullable,
+                   column_default, column_key, extra
+            FROM information_schema.columns
+            WHERE table_schema = DATABASE() {table_filter}
+            ORDER BY table_name, ordinal_position;"""),
+        "foreign_keys": textwrap.dedent("""\
+            SELECT table_name, column_name, referenced_table_name, referenced_column_name
+            FROM information_schema.key_column_usage
+            WHERE table_schema = DATABASE() AND referenced_table_name IS NOT NULL
+            ORDER BY table_name;"""),
+        "indexes": textwrap.dedent("""\
+            SELECT table_name, index_name, non_unique, column_name, seq_in_index
+            FROM information_schema.statistics
+            WHERE table_schema = DATABASE()
+            ORDER BY table_name, index_name, seq_in_index;"""),
+        "table_sizes": textwrap.dedent("""\
+            SELECT table_name, table_rows,
+                   ROUND(data_length / 1024 / 1024, 2) AS data_mb,
+                   ROUND(index_length / 1024 / 1024, 2) AS index_mb
+            FROM information_schema.tables
+            WHERE table_schema = DATABASE()
+            ORDER BY data_length DESC;"""),
+    },
+    "sqlite": {
+        "tables": textwrap.dedent("""\
+            SELECT name FROM sqlite_master
+            WHERE type = 'table' AND name NOT LIKE 'sqlite_%'
+            ORDER BY name;"""),
+        "columns": textwrap.dedent("""\
+            -- Run for each table:
+            PRAGMA table_info({table_name});"""),
+        "foreign_keys": textwrap.dedent("""\
+            -- Run for each table:
+            PRAGMA foreign_key_list({table_name});"""),
+        "indexes": textwrap.dedent("""\
+            SELECT name, tbl_name, sql FROM sqlite_master
+            WHERE type = 'index'
+            ORDER BY tbl_name, name;"""),
+        "schema_dump": textwrap.dedent("""\
+            SELECT name, sql FROM sqlite_master
+            WHERE type = 'table'
+            ORDER BY name;"""),
+    },
+    "sqlserver": {
+        "tables": textwrap.dedent("""\
+            SELECT TABLE_NAME
+            FROM INFORMATION_SCHEMA.TABLES
+            WHERE TABLE_TYPE = 'BASE TABLE'
+            ORDER BY TABLE_NAME;"""),
+        "columns": textwrap.dedent("""\
+            SELECT t.name AS table_name, c.name AS column_name,
+                   ty.name AS data_type, c.max_length, c.precision, c.scale,
+                   c.is_nullable, dc.definition AS default_value
+            FROM sys.columns c
+            JOIN sys.tables t ON c.object_id = t.object_id
+            JOIN sys.types ty ON c.user_type_id = ty.user_type_id
+            LEFT JOIN sys.default_constraints dc ON c.default_object_id = dc.object_id
+            {table_filter}
+            ORDER BY t.name, c.column_id;"""),
+        "foreign_keys": textwrap.dedent("""\
+            SELECT fk.name AS fk_name,
+                   tp.name AS parent_table, cp.name AS parent_column,
+                   tr.name AS referenced_table, cr.name AS referenced_column
+            FROM sys.foreign_keys fk
+            JOIN sys.foreign_key_columns fkc ON fk.object_id = fkc.constraint_object_id
+            JOIN sys.tables tp ON fkc.parent_object_id = tp.object_id
+            JOIN sys.columns cp ON fkc.parent_object_id = cp.object_id AND fkc.parent_column_id = cp.column_id
+            JOIN sys.tables tr ON fkc.referenced_object_id = tr.object_id
+            JOIN sys.columns cr ON fkc.referenced_object_id = cr.object_id AND fkc.referenced_column_id = cr.column_id
+            ORDER BY tp.name;"""),
+        "indexes": textwrap.dedent("""\
+            SELECT t.name AS table_name, i.name AS index_name,
+                   i.type_desc, i.is_unique, c.name AS column_name,
+                   ic.key_ordinal
+            FROM sys.indexes i
+            JOIN sys.index_columns ic ON i.object_id = ic.object_id AND i.index_id = ic.index_id
+            JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
+            JOIN sys.tables t ON i.object_id = t.object_id
+            WHERE i.name IS NOT NULL
+            ORDER BY t.name, i.name, ic.key_ordinal;"""),
+    },
+}
+
+
+# ---------------------------------------------------------------------------
+# Documentation generators
+# ---------------------------------------------------------------------------
+
+SAMPLE_TABLES = {
+    "users": {
+        "columns": [
+            {"name": "id", "type": "SERIAL / INT", "nullable": "NO", "default": "auto", "notes": "Primary key"},
+            {"name": "email", "type": "VARCHAR(255)", "nullable": "NO", "default": "-", "notes": "Unique, indexed"},
+            {"name": "name", "type": "VARCHAR(255)", "nullable": "YES", "default": "NULL", "notes": "Display name"},
+            {"name": "password_hash", "type": "VARCHAR(255)", "nullable": "NO", "default": "-", "notes": "bcrypt hash"},
+            {"name": "created_at", "type": "TIMESTAMP", "nullable": "NO", "default": "NOW()", "notes": ""},
+            {"name": "updated_at", "type": "TIMESTAMP", "nullable": "NO", "default": "NOW()", "notes": ""},
+        ],
+        "indexes": ["PRIMARY KEY (id)", "UNIQUE INDEX (email)"],
+        "foreign_keys": [],
+    },
+    "orders": {
+        "columns": [
+            {"name": "id", "type": "SERIAL / INT", "nullable": "NO", "default": "auto", "notes": "Primary key"},
+            {"name": "user_id", "type": "INTEGER", "nullable": "NO", "default": "-", "notes": "FK -> users.id"},
+            {"name": "status", "type": "VARCHAR(50)", "nullable": "NO", "default": "'pending'", "notes": "pending/paid/shipped/cancelled"},
+            {"name": "total", "type": "DECIMAL(19,4)", "nullable": "NO", "default": "0", "notes": "Order total in currency units (4 decimal places)"},
+            {"name": "created_at", "type": "TIMESTAMP", "nullable": "NO", "default": "NOW()", "notes": ""},
+        ],
+        "indexes": ["PRIMARY KEY (id)", "INDEX (user_id)", "INDEX (status, created_at)"],
+        "foreign_keys": ["user_id -> users.id ON DELETE CASCADE"],
+    },
+}
+
+
+def generate_md(dialect: str, tables: List[str]) -> str:
+    """Generate markdown schema documentation."""
+    lines = [f"# Database Schema Documentation ({dialect.upper()})\n"]
+    lines.append("Generated by sql-database-assistant schema_explorer.\n")
+
+    # Introspection queries section
+    lines.append("## Introspection Queries\n")
+    lines.append("Run these queries against your database to extract schema information:\n")
+    queries = INTROSPECTION_QUERIES.get(dialect, {})
+    for qname, qsql in queries.items():
+        table_filter = ""
+        if "all" not in tables:
+            tlist = ", ".join(f"'{t}'" for t in tables)
+            table_filter = ("WHERE t.name" if dialect == "sqlserver" else "AND table_name") + f" IN ({tlist})"
+        qsql = qsql.replace("{table_filter}", table_filter)
+        qsql = qsql.replace("{table_name}", tables[0] if tables and tables[0] != "all" else "TABLE_NAME")
+        lines.append(f"### {qname.replace('_', ' ').title()}\n")
+        lines.append(f"```sql\n{qsql}\n```\n")
+
+    # Sample documentation
+    lines.append("## Sample Table Documentation\n")
+    lines.append("Below is an example of the documentation format produced from query results:\n")
+
+    show_tables = tables if "all" not in tables else list(SAMPLE_TABLES.keys())
+    for tname in show_tables:
+        sample = SAMPLE_TABLES.get(tname)
+        if not sample:
+            lines.append(f"### {tname}\n")
+            lines.append("_No sample data available. Run introspection queries above._\n")
+            continue
+
+        lines.append(f"### {tname}\n")
+        lines.append("| Column | Type | Nullable | Default | Notes |")
+        lines.append("|--------|------|----------|---------|-------|")
+        for col in sample["columns"]:
+            lines.append(f"| {col['name']} | {col['type']} | {col['nullable']} | {col['default']} | {col['notes']} |")
+        lines.append("")
+        if sample["indexes"]:
+            lines.append("**Indexes:** " + ", ".join(sample["indexes"]))
+        if sample["foreign_keys"]:
+            lines.append("**Foreign Keys:** " + ", ".join(sample["foreign_keys"]))
+        lines.append("")
+
+    return "\n".join(lines)
+
+
+def generate_json_output(dialect: str, tables: List[str]) -> dict:
+    """Generate JSON schema documentation."""
+    queries = INTROSPECTION_QUERIES.get(dialect, {})
+    processed = {}
+    for qname, qsql in queries.items():
+        table_filter = ""
+        if "all" not in tables:
+            tlist = ", ".join(f"'{t}'" for t in tables)
+            table_filter = ("WHERE t.name" if dialect == "sqlserver" else "AND table_name") + f" IN ({tlist})"
+        processed[qname] = qsql.replace("{table_filter}", table_filter).replace(
+            "{table_name}", tables[0] if tables and tables[0] != "all" else "TABLE_NAME"
+        )
+
+    show_tables = tables if "all" not in tables else list(SAMPLE_TABLES.keys())
+    sample_docs = {}
+    for tname in show_tables:
+        sample = SAMPLE_TABLES.get(tname)
+        if sample:
+            sample_docs[tname] = sample
+
+    return {
+        "dialect": dialect,
+        "requested_tables": tables,
+        "introspection_queries": processed,
+        "sample_documentation": sample_docs,
+        "instructions": "Run the introspection queries against your database, then use the results to populate documentation in the sample format shown.",
+    }
+
+
+# ---------------------------------------------------------------------------
+# CLI
+# ---------------------------------------------------------------------------
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Generate introspection queries and schema documentation templates (no database connection needed).",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  %(prog)s --dialect postgres --tables all --format md
+  %(prog)s --dialect mysql --tables users,orders --format json
+  %(prog)s --dialect sqlite --tables all --json
+        """,
+    )
+    parser.add_argument(
+        "--dialect", required=True, choices=["postgres", "mysql", "sqlite", "sqlserver"],
+        help="Target database dialect",
+    )
+    parser.add_argument(
+        "--tables", default="all",
+        help="Comma-separated table names or 'all' (default: all)",
+    )
+    parser.add_argument(
+        "--format", choices=["md", "json"], default="md", dest="fmt",
+        help="Output format (default: md)",
+    )
+    parser.add_argument(
+        "--json", action="store_true", dest="json_output",
+        help="Output as JSON (overrides --format)",
+    )
+    args = parser.parse_args()
+
+    tables = [t.strip() for t in args.tables.split(",") if t.strip()] or ["all"]
+
+    if args.json_output or args.fmt == "json":
+        result = generate_json_output(args.dialect, tables)
+        print(json.dumps(result, indent=2))
+    else:
+        print(generate_md(args.dialect, tables))
+
+
+if __name__ == "__main__":
+    main()