# HashiCorp Vault Architecture & Patterns Reference

## Architecture Overview

Vault operates as a centralized secret management service with a client-server model. All secrets are encrypted at rest and in transit. The seal/unseal mechanism protects the master encryption key.

### Core Components

```
┌─────────────────────────────────────────────────┐
│                  Vault Cluster                  │
│  ┌───────────┐   ┌───────────┐   ┌───────────┐  │
│  │  Leader   │   │  Standby  │   │  Standby  │  │
│  │ (active)  │   │ (forward) │   │ (forward) │  │
│  └─────┬─────┘   └─────┬─────┘   └─────┬─────┘  │
│        │               │               │        │
│  ┌─────┴───────────────┴───────────────┴─────┐  │
│  │           Raft Storage Backend            │  │
│  └───────────────────────────────────────────┘  │
│                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────────┐   │
│  │   Auth   │  │  Secret  │  │    Audit     │   │
│  │ Methods  │  │ Engines  │  │   Devices    │   │
│  └──────────┘  └──────────┘  └──────────────┘   │
└─────────────────────────────────────────────────┘
```

### Storage Backend Selection

| Backend | HA Support | Operational Complexity | Recommendation |
|---------|------------|------------------------|----------------|
| Integrated Raft | Yes | Low | **Default choice**: no external dependencies |
| Consul | Yes | Medium | Legacy: use Raft unless already running Consul |
| S3/GCS/Azure Blob | No (GCS: yes) | Low | Dev/test only; S3 and Azure Blob offer no HA |
| PostgreSQL/MySQL | Optional (`ha_enabled`) | Medium | Not recommended: added external dependency |

## High Availability Setup

### Raft Cluster Configuration

Run at least 3 nodes in production (tolerates 1 node failure) and 5 nodes for critical workloads (tolerates 2 failures).

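
The node counts above follow from Raft's majority rule: a cluster of N voting nodes needs a quorum of floor(N/2) + 1, so it tolerates floor((N-1)/2) failures. A minimal sketch of that arithmetic; the function names are illustrative helpers, not Vault commands:

```bash
# Quorum arithmetic for a Raft cluster of N voting nodes.
# raft_quorum and raft_fault_tolerance are hypothetical helper names.

raft_quorum() {
  local nodes=$1
  echo $(( nodes / 2 + 1 ))
}

raft_fault_tolerance() {
  local nodes=$1
  echo $(( (nodes - 1) / 2 ))
}

# Compare a few cluster sizes
for n in 3 4 5; do
  echo "nodes=$n quorum=$(raft_quorum "$n") tolerates=$(raft_fault_tolerance "$n")"
done
```

A 4-node cluster tolerates the same single failure as a 3-node cluster, which is why odd sizes are the usual choice.
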
```hcl
# vault-config.hcl (per node)
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-1"

  retry_join {
    leader_api_addr = "https://vault-2.internal:8200"
  }
  retry_join {
    leader_api_addr = "https://vault-3.internal:8200"
  }
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/opt/vault/tls/vault.crt"
  tls_key_file  = "/opt/vault/tls/vault.key"
}

api_addr     = "https://vault-1.internal:8200"
cluster_addr = "https://vault-1.internal:8201"
```

### Auto-Unseal with AWS KMS

Eliminates manual unseal key management. Vault encrypts its master key with the KMS key.

```hcl
seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "alias/vault-unseal"
}
```

**Requirements:**
- IAM role with `kms:Encrypt`, `kms:Decrypt`, `kms:DescribeKey` permissions
- KMS key must be in the same region or accessible cross-region
- KMS key should have restricted access, limited to the Vault nodes

### Auto-Unseal with Azure Key Vault

```hcl
seal "azurekeyvault" {
  tenant_id  = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
  vault_name = "vault-unseal-kv"
  key_name   = "vault-unseal-key"
}
```

### Auto-Unseal with GCP KMS

```hcl
seal "gcpckms" {
  project    = "my-project"
  region     = "global"
  key_ring   = "vault-keyring"
  crypto_key = "vault-unseal-key"
}
```

## Namespaces (Enterprise)

Namespaces provide tenant isolation within a single Vault cluster. Each namespace has independent policies, auth methods, and secret engines.

```
root/
├── dev/              # Development environment
│   ├── auth/
│   └── secret/
├── staging/          # Staging environment
│   ├── auth/
│   └── secret/
└── production/       # Production environment
    ├── auth/
    └── secret/
```

**OSS alternative:** Use path-based isolation with strict policies. Prefix all paths with the environment name (e.g., `secret/data/production/...`).

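
One way to apply the path-based alternative is to generate one policy per environment from a single template. The sketch below assumes a KV v2 mount named `secret` and a prefix per environment; `render_env_policy` is a hypothetical helper, not a Vault command:

```bash
# Render a per-environment policy for path-based isolation on KV v2.
# Assumptions: the mount is "secret" and each environment owns the
# prefix secret/<environment>/...; render_env_policy is a made-up name.

render_env_policy() {
  local environment=$1
  cat <<EOF
# Read and write secrets under this environment's prefix only
path "secret/data/${environment}/*" {
  capabilities = ["create", "read", "update", "delete"]
}

# List keys under the same prefix (KV v2 lists through the metadata path)
path "secret/metadata/${environment}/*" {
  capabilities = ["list"]
}
EOF
}

render_env_policy production
```

The output can be piped into `vault policy write production-kv -` (the policy name is a placeholder). Because Vault policies deny by default, paths for other environments need no explicit deny rule.
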
## Policy Patterns

### Templated Policies

Use identity-based templates for scalable policy management:

```hcl
# Allow entities to manage their own secrets
path "secret/data/{{identity.entity.name}}/*" {
  capabilities = ["create", "read", "update", "delete"]
}

# Read shared config for a group the entity belongs to.
# Group templates must name a single group; "platform-team" is a placeholder,
# and the expression expands to that group's ID.
path "secret/data/shared/{{identity.groups.names.platform-team.id}}/*" {
  capabilities = ["read"]
}
```

### Sentinel Policies (Enterprise)

Enforce governance rules beyond path-based access:

```sentinel
# Require MFA for production secret writes
import "mfa"

# The rule applies only to writes under the production path;
# all other requests pass through untouched.
precond = rule {
  request.path matches "secret/data/production/.*" and
  request.operation in ["create", "update", "delete"]
}

main = rule when precond {
  mfa.methods.totp.valid
}
```

### Policy Hierarchy

1. **Global deny**: Explicit deny on `sys/*`, `auth/token/create-orphan`
2. **Environment base**: Read access to environment-specific paths
3. **Service-specific**: Scoped to exact paths the service needs
4. **Admin override**: Requires MFA, time-limited, audit-heavy

## Secret Engine Configuration

### KV v2 (Versioned Key-Value)

```bash
# Enable with custom config
vault secrets enable -path=secret -version=2 kv

# Configure version retention
vault write secret/config max_versions=10 cas_required=true delete_version_after=90d
```

**Check-and-Set (CAS):** Prevents accidental overwrites. With `cas_required=true`, every write must carry a `cas` value equal to the secret's current version; `cas=0` succeeds only if the key does not exist yet.

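
To make the CAS behavior concrete, the sketch below wraps `vault kv put -cas=...` in two helpers. The function names and the path `secret/app/config` are illustrative assumptions; the caller is expected to know the version it last read:

```bash
# Check-and-Set writes against a KV v2 mount with cas_required=true.
# kv_create and kv_update are hypothetical wrappers around `vault kv put`.

# Create the secret only if it does not exist yet (cas=0).
kv_create() {
  local path=$1; shift
  vault kv put -cas=0 "$path" "$@"
}

# Update the secret only if its current version still equals the
# version the caller last read.
kv_update() {
  local path=$1 expected_version=$2; shift 2
  vault kv put -cas="$expected_version" "$path" "$@"
}
```

For example, `kv_create secret/app/config user=svc` followed by `kv_update secret/app/config 1 user=svc2`. If another writer has already moved the secret past the expected version, the write fails instead of overwriting, and the client should re-read before retrying.
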
### Database Engine

```bash
# Enable and configure PostgreSQL
vault secrets enable database

vault write database/config/postgres \
  plugin_name=postgresql-database-plugin \
  connection_url="postgresql://{{username}}:{{password}}@db.internal:5432/app?sslmode=require" \
  allowed_roles="app-readonly,app-readwrite" \
  username="vault_admin" \
  password="INITIAL_PASSWORD"

# Rotate the root password (Vault manages it from now on)
vault write -f database/rotate-root/postgres

# Create a read-only role
vault write database/roles/app-readonly \
  db_name=postgres \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  revocation_statements="DROP ROLE IF EXISTS \"{{name}}\";" \
  default_ttl=1h \
  max_ttl=24h
```

### PKI Engine (Certificate Authority)

```bash
# Enable PKI engine
vault secrets enable -path=pki pki
vault secrets tune -max-lease-ttl=87600h pki

# Generate root CA
vault write -field=certificate pki/root/generate/internal \
  common_name="Example Root CA" \
  ttl=87600h > root_ca.crt

# Enable intermediate CA
vault secrets enable -path=pki_int pki
vault secrets tune -max-lease-ttl=43800h pki_int

# Generate intermediate CSR
vault write -field=csr pki_int/intermediate/generate/internal \
  common_name="Example Intermediate CA" > intermediate.csr

# Sign with root CA
vault write -field=certificate pki/root/sign-intermediate \
  csr=@intermediate.csr format=pem_bundle ttl=43800h > intermediate.crt

# Set signed certificate
vault write pki_int/intermediate/set-signed certificate=@intermediate.crt

# Create role for leaf certificates
vault write pki_int/roles/web-server \
  allowed_domains="example.com" \
  allow_subdomains=true \
  max_ttl=2160h
```

### Transit Engine (Encryption-as-a-Service)

```bash
vault secrets enable transit

# Create encryption key
vault write -f transit/keys/payment-data \
  type=aes256-gcm96

# Encrypt data (plaintext must be base64-encoded; printf avoids the
# trailing newline that echo would add to the data)
vault write transit/encrypt/payment-data \
  plaintext="$(printf '%s' "sensitive-data" | base64)"

# Decrypt data (the returned plaintext is base64-encoded)
vault write transit/decrypt/payment-data \
  ciphertext="vault:v1:..."

# Rotate key (old versions still decrypt, new encrypts with latest)
vault write -f transit/keys/payment-data/rotate

# Rewrap ciphertext to latest key version
vault write transit/rewrap/payment-data \
  ciphertext="vault:v1:..."
```

## Performance and Scaling

### Performance Replication (Enterprise)

The primary cluster replicates to secondary clusters in other regions. Secondaries serve read traffic locally and forward writes that change shared state to the primary.

### Performance Standbys (Enterprise)

Standby nodes serve read requests without forwarding to the leader, reducing leader load.

### Response Wrapping

Wrap sensitive responses in a single-use token. The recipient can unwrap it exactly once:

```bash
# Wrap a secret (TTL = 5 minutes).
# `vault kv get` takes the logical path; the CLI inserts `data/` for KV v2.
vault kv get -wrap-ttl=5m secret/production/db-creds

# Recipient unwraps, supplying the wrapping token as an argument
# or authenticating with it (for example via VAULT_TOKEN)
vault unwrap
```

### Batch Tokens

For high-throughput workloads (Lambda, serverless), use batch tokens instead of service tokens. Batch tokens are not persisted to storage, which reduces I/O. In exchange, they cannot be renewed and cannot create child tokens.

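
As an illustration of issuing a batch token, the sketch below wraps `vault token create -type=batch`. The helper name, the policy name `lambda-readonly`, and the 15-minute default TTL are assumptions chosen for the example:

```bash
# Issue a short-lived batch token for a high-throughput workload.
# create_batch_token is a hypothetical helper; the policy name and
# default TTL are placeholders.

create_batch_token() {
  local policy=$1 ttl=${2:-15m}
  # -field=token prints only the token value, convenient for scripting
  vault token create -type=batch -policy="$policy" -ttl="$ttl" -field=token
}
```

A caller might run `VAULT_TOKEN="$(create_batch_token lambda-readonly 10m)"` before invoking the workload. Since batch tokens cannot be renewed, the TTL should cover the longest expected run.
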
## Monitoring and Health

### Key Metrics

| Metric | Alert Threshold | Source |
|--------|-----------------|--------|
| `vault.core.unsealed` | 0 (sealed) | Telemetry |
| `vault.expire.num_leases` | >10,000 | Telemetry |
| `vault.audit.log_response` | Error rate >1% | Telemetry |
| `vault.runtime.alloc_bytes` | >80% memory | Telemetry |
| `vault.raft.leader.lastContact` | >500ms | Telemetry |
| `vault.token.count` | >50,000 | Telemetry |

### Health Check Endpoint

```bash
# Returns 200 if initialized, unsealed, and active
curl -s https://vault.internal:8200/v1/sys/health

# Status codes:
# 200: initialized, unsealed, active
# 429: unsealed, standby
# 472: disaster recovery secondary
# 473: performance standby
# 501: not initialized
# 503: sealed
```

## Disaster Recovery

### Backup

```bash
# Raft snapshot (includes all data)
vault operator raft snapshot save "backup-$(date +%Y%m%d).snap"

# Schedule daily backups at 02:00 with a crontab entry. This line belongs
# in the crontab, not in a shell script; cron treats % as special, hence
# the backslashes:
# 0 2 * * * /usr/local/bin/vault operator raft snapshot save /backups/vault-$(date +\%Y\%m\%d).snap
```

### Restore

```bash
# Restore from snapshot (causes brief outage)
vault operator raft snapshot restore backup-20260320.snap
```

### DR Replication (Enterprise)

A DR secondary cluster stays in standby and does not serve client requests. Promote it when the primary fails:

```bash
# On DR secondary: generate a DR operation token
vault operator generate-root -dr-token

# Promote the secondary; append the DR operation token generated above
vault write sys/replication/dr/secondary/promote dr_operation_token=
```
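
Scheduled snapshots, as described under Backup, accumulate unless something prunes them. The sketch below pairs the snapshot command with a retention window. It assumes `VAULT_ADDR` and a token allowed to take snapshots are already in the environment; the function name, directory, and 14-day default are illustrative:

```bash
# Take a Raft snapshot, then prune snapshots older than a retention window.
# snapshot_and_prune is a hypothetical helper, not a Vault command.

snapshot_and_prune() {
  local dir=$1 keep_days=${2:-14}
  local file
  file="$dir/vault-$(date +%Y%m%d-%H%M%S).snap"

  # Stop if the snapshot fails, so a bad run never deletes good backups.
  vault operator raft snapshot save "$file" || return 1

  # Remove snapshot files last modified more than keep_days days ago.
  find "$dir" -name 'vault-*.snap' -type f -mtime +"$keep_days" -delete

  # Print the path of the new snapshot for the caller or for logging.
  echo "$file"
}
```

A cron job could call `snapshot_and_prune /backups 30` from a wrapper script. Copying the resulting file off the Vault host is still needed for a real disaster recovery plan.
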