secrets-vault-manager (403-line SKILL.md, 3 scripts, 3 references): - HashiCorp Vault, AWS SM, Azure KV, GCP SM integration - Secret rotation, dynamic secrets, audit logging, emergency procedures sql-database-assistant (457-line SKILL.md, 3 scripts, 3 references): - Query optimization, migration generation, schema exploration - Multi-DB support (PostgreSQL, MySQL, SQLite, SQL Server) - ORM patterns (Prisma, Drizzle, TypeORM, SQLAlchemy) gcp-cloud-architect (418-line SKILL.md, 3 scripts, 3 references): - 6-step workflow mirroring aws-solution-architect for GCP - Cloud Run, GKE, BigQuery, Cloud Functions, cost optimization - Completes cloud trifecta (AWS + Azure + GCP) soc2-compliance (417-line SKILL.md, 3 scripts, 3 references): - SOC 2 Type I & II preparation, Trust Service Criteria mapping - Control matrix generation, evidence tracking, gap analysis - First SOC 2 skill in ra-qm-team (joins GDPR, ISO 27001, ISO 13485) All 12 scripts pass --help. Docs generated, mkdocs.yml nav updated. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
343 lines
10 KiB
Markdown
343 lines
10 KiB
Markdown
# HashiCorp Vault Architecture & Patterns Reference
|
|
|
|
## Architecture Overview
|
|
|
|
Vault operates as a centralized secret management service with a client-server model. All secrets are encrypted at rest and in transit. The seal/unseal mechanism protects the master encryption key.
|
|
|
|
### Core Components
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────┐
|
|
│ Vault Cluster │
|
|
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
|
|
│ │ Leader │ │ Standby │ │ Standby │ │
|
|
│ │ (active) │ │ (forward) │ │ (forward) │ │
|
|
│ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │
|
|
│ │ │ │ │
|
|
│ ┌─────┴───────────────┴───────────────┴─────┐ │
|
|
│ │ Raft Storage Backend │ │
|
|
│ └───────────────────────────────────────────┘ │
|
|
│ │
|
|
│ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
|
|
│ │ Auth │ │ Secret │ │ Audit │ │
|
|
│ │ Methods │ │ Engines │ │ Devices │ │
|
|
│ └──────────┘ └──────────┘ └──────────────┘ │
|
|
└─────────────────────────────────────────────────┘
|
|
```
|
|
|
|
### Storage Backend Selection
|
|
|
|
| Backend | HA Support | Operational Complexity | Recommendation |
|
|
|---------|-----------|----------------------|----------------|
|
|
| Integrated Raft | Yes | Low | **Default choice** — no external dependencies |
|
|
| Consul | Yes | Medium | Legacy — use Raft unless already running Consul |
|
|
| S3/GCS/Azure Blob | No | Low | Dev/test only — no HA |
|
|
| PostgreSQL/MySQL | No | Medium | Not recommended — no HA, added dependency |
|
|
|
|
## High Availability Setup
|
|
|
|
### Raft Cluster Configuration
|
|
|
|
Minimum 3 nodes for production (tolerates 1 failure). 5 nodes for critical workloads (tolerates 2 failures).
|
|
|
|
```hcl
|
|
# vault-config.hcl (per node)
|
|
storage "raft" {
|
|
path = "/opt/vault/data"
|
|
node_id = "vault-1"
|
|
|
|
retry_join {
|
|
leader_api_addr = "https://vault-2.internal:8200"
|
|
}
|
|
retry_join {
|
|
leader_api_addr = "https://vault-3.internal:8200"
|
|
}
|
|
}
|
|
|
|
listener "tcp" {
|
|
address = "0.0.0.0:8200"
|
|
tls_cert_file = "/opt/vault/tls/vault.crt"
|
|
tls_key_file = "/opt/vault/tls/vault.key"
|
|
}
|
|
|
|
api_addr = "https://vault-1.internal:8200"
|
|
cluster_addr = "https://vault-1.internal:8201"
|
|
```
|
|
|
|
### Auto-Unseal with AWS KMS
|
|
|
|
Eliminates manual unseal key management. Vault encrypts its master key with the KMS key.
|
|
|
|
```hcl
|
|
seal "awskms" {
|
|
region = "us-east-1"
|
|
kms_key_id = "alias/vault-unseal"
|
|
}
|
|
```
|
|
|
|
**Requirements:**
|
|
- IAM role with `kms:Encrypt`, `kms:Decrypt`, `kms:DescribeKey` permissions
|
|
- KMS key must be in the same region or accessible cross-region
|
|
- KMS key should have restricted access — only Vault nodes
|
|
|
|
### Auto-Unseal with Azure Key Vault
|
|
|
|
```hcl
|
|
seal "azurekeyvault" {
|
|
tenant_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
|
|
vault_name = "vault-unseal-kv"
|
|
key_name = "vault-unseal-key"
|
|
}
|
|
```
|
|
|
|
### Auto-Unseal with GCP KMS
|
|
|
|
```hcl
|
|
seal "gcpckms" {
|
|
project = "my-project"
|
|
region = "global"
|
|
key_ring = "vault-keyring"
|
|
crypto_key = "vault-unseal-key"
|
|
}
|
|
```
|
|
|
|
## Namespaces (Enterprise)
|
|
|
|
Namespaces provide tenant isolation within a single Vault cluster. Each namespace has independent policies, auth methods, and secret engines.
|
|
|
|
```
|
|
root/
|
|
├── dev/ # Development environment
|
|
│ ├── auth/
|
|
│ └── secret/
|
|
├── staging/ # Staging environment
|
|
│ ├── auth/
|
|
│ └── secret/
|
|
└── production/ # Production environment
|
|
├── auth/
|
|
└── secret/
|
|
```
|
|
|
|
**OSS alternative:** Use path-based isolation with strict policies. Prefix all paths with environment name (e.g., `secret/data/production/...`).
|
|
|
|
## Policy Patterns
|
|
|
|
### Templated Policies
|
|
|
|
Use identity-based templates for scalable policy management:
|
|
|
|
```hcl
|
|
# Allow entities to manage their own secrets
|
|
path "secret/data/{{identity.entity.name}}/*" {
|
|
capabilities = ["create", "read", "update", "delete"]
|
|
}
|
|
|
|
# Read shared config for the entity's group
|
|
path "secret/data/shared/{{identity.groups.names}}/*" {
|
|
capabilities = ["read"]
|
|
}
|
|
```
|
|
|
|
### Sentinel Policies (Enterprise)
|
|
|
|
Enforce governance rules beyond path-based access:
|
|
|
|
```python
|
|
# Require MFA for production secret writes
|
|
import "mfa"
|
|
|
|
main = rule {
|
|
request.path matches "secret/data/production/.*" and
|
|
request.operation in ["create", "update", "delete"] and
|
|
mfa.methods.totp.valid
|
|
}
|
|
```
|
|
|
|
### Policy Hierarchy
|
|
|
|
1. **Global deny** — Explicit deny on `sys/*`, `auth/token/create-orphan`
|
|
2. **Environment base** — Read access to environment-specific paths
|
|
3. **Service-specific** — Scoped to exact paths the service needs
|
|
4. **Admin override** — Requires MFA, time-limited, audit-heavy
|
|
|
|
## Secret Engine Configuration
|
|
|
|
### KV v2 (Versioned Key-Value)
|
|
|
|
```bash
|
|
# Enable with custom config
|
|
vault secrets enable -path=secret -version=2 kv
|
|
|
|
# Configure version retention
|
|
vault write secret/config max_versions=10 cas_required=true delete_version_after=90d
|
|
```
|
|
|
|
**Check-and-Set (CAS):** Prevents accidental overwrites. Client must supply the current version number to update.
|
|
|
|
### Database Engine
|
|
|
|
```bash
|
|
# Enable and configure PostgreSQL
|
|
vault secrets enable database
|
|
|
|
vault write database/config/postgres \
|
|
plugin_name=postgresql-database-plugin \
|
|
connection_url="postgresql://{{username}}:{{password}}@db.internal:5432/app?sslmode=require" \
|
|
allowed_roles="app-readonly,app-readwrite" \
|
|
username="vault_admin" \
|
|
password="INITIAL_PASSWORD"
|
|
|
|
# Rotate the root password (Vault manages it from now on)
|
|
vault write -f database/rotate-root/postgres
|
|
|
|
# Create a read-only role
|
|
vault write database/roles/app-readonly \
|
|
db_name=postgres \
|
|
creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
|
|
revocation_statements="DROP ROLE IF EXISTS \"{{name}}\";" \
|
|
default_ttl=1h \
|
|
max_ttl=24h
|
|
```
|
|
|
|
### PKI Engine (Certificate Authority)
|
|
|
|
```bash
|
|
# Enable PKI engine
|
|
vault secrets enable -path=pki pki
|
|
vault secrets tune -max-lease-ttl=87600h pki
|
|
|
|
# Generate root CA
|
|
vault write -field=certificate pki/root/generate/internal \
|
|
common_name="Example Root CA" \
|
|
ttl=87600h > root_ca.crt
|
|
|
|
# Enable intermediate CA
|
|
vault secrets enable -path=pki_int pki
|
|
vault secrets tune -max-lease-ttl=43800h pki_int
|
|
|
|
# Generate intermediate CSR
|
|
vault write -field=csr pki_int/intermediate/generate/internal \
|
|
common_name="Example Intermediate CA" > intermediate.csr
|
|
|
|
# Sign with root CA
|
|
vault write -field=certificate pki/root/sign-intermediate \
|
|
csr=@intermediate.csr format=pem_bundle ttl=43800h > intermediate.crt
|
|
|
|
# Set signed certificate
|
|
vault write pki_int/intermediate/set-signed certificate=@intermediate.crt
|
|
|
|
# Create role for leaf certificates
|
|
vault write pki_int/roles/web-server \
|
|
allowed_domains="example.com" \
|
|
allow_subdomains=true \
|
|
max_ttl=2160h
|
|
```
|
|
|
|
### Transit Engine (Encryption-as-a-Service)
|
|
|
|
```bash
|
|
vault secrets enable transit
|
|
|
|
# Create encryption key
|
|
vault write -f transit/keys/payment-data \
|
|
type=aes256-gcm96
|
|
|
|
# Encrypt data
|
|
vault write transit/encrypt/payment-data \
|
|
plaintext=$(echo "sensitive-data" | base64)
|
|
|
|
# Decrypt data
|
|
vault write transit/decrypt/payment-data \
|
|
ciphertext="vault:v1:..."
|
|
|
|
# Rotate key (old versions still decrypt, new encrypts with latest)
|
|
vault write -f transit/keys/payment-data/rotate
|
|
|
|
# Rewrap ciphertext to latest key version
|
|
vault write transit/rewrap/payment-data \
|
|
ciphertext="vault:v1:..."
|
|
```
|
|
|
|
## Performance and Scaling
|
|
|
|
### Performance Replication (Enterprise)
|
|
|
|
Primary cluster replicates to secondary clusters in other regions. Secondaries handle read traffic locally.
|
|
|
|
### Performance Standbys (Enterprise)
|
|
|
|
Standby nodes serve read requests without forwarding to the leader, reducing leader load.
|
|
|
|
### Response Wrapping
|
|
|
|
Wrap sensitive responses in a single-use token — the recipient unwraps exactly once:
|
|
|
|
```bash
|
|
# Wrap a secret (TTL = 5 minutes)
|
|
vault kv get -wrap-ttl=5m secret/data/production/db-creds
|
|
|
|
# Recipient unwraps
|
|
vault unwrap <wrapping_token>
|
|
```
|
|
|
|
### Batch Tokens
|
|
|
|
For high-throughput workloads (Lambda, serverless), use batch tokens instead of service tokens. Batch tokens are not persisted to storage, reducing I/O.
|
|
|
|
## Monitoring and Health
|
|
|
|
### Key Metrics
|
|
|
|
| Metric | Alert Threshold | Source |
|
|
|--------|----------------|--------|
|
|
| `vault.core.unsealed` | 0 (sealed) | Telemetry |
|
|
| `vault.expire.num_leases` | >10,000 | Telemetry |
|
|
| `vault.audit.log_response` | Error rate >1% | Telemetry |
|
|
| `vault.runtime.alloc_bytes` | >80% memory | Telemetry |
|
|
| `vault.raft.leader.lastContact` | >500ms | Telemetry |
|
|
| `vault.token.count` | >50,000 | Telemetry |
|
|
|
|
### Health Check Endpoint
|
|
|
|
```bash
|
|
# Returns 200 if initialized, unsealed, and active
|
|
curl -s https://vault.internal:8200/v1/sys/health
|
|
|
|
# Status codes:
|
|
# 200 — initialized, unsealed, active
|
|
# 429 — unsealed, standby
|
|
# 472 — disaster recovery secondary
|
|
# 473 — performance standby
|
|
# 501 — not initialized
|
|
# 503 — sealed
|
|
```
|
|
|
|
## Disaster Recovery
|
|
|
|
### Backup
|
|
|
|
```bash
|
|
# Raft snapshot (includes all data)
|
|
vault operator raft snapshot save backup-$(date +%Y%m%d).snap
|
|
|
|
# Schedule daily backups via cron
|
|
0 2 * * * /usr/local/bin/vault operator raft snapshot save /backups/vault-$(date +\%Y\%m\%d).snap
|
|
```
|
|
|
|
### Restore
|
|
|
|
```bash
|
|
# Restore from snapshot (causes brief outage)
|
|
vault operator raft snapshot restore backup-20260320.snap
|
|
```
|
|
|
|
### DR Replication (Enterprise)
|
|
|
|
Secondary cluster in standby. Promote on primary failure:
|
|
|
|
```bash
|
|
# On DR secondary
|
|
vault operator generate-root -dr-token
|
|
vault write sys/replication/dr/secondary/promote dr_operation_token=<token>
|
|
```
|