firefrost-gaming/claude-skills-reference

Reza Rezvani 87f3a007c9 feat(engineering,ra-qm): add secrets-vault-manager, sql-database-assistant, gcp-cloud-architect, soc2-compliance

secrets-vault-manager (403-line SKILL.md, 3 scripts, 3 references):
- HashiCorp Vault, AWS SM, Azure KV, GCP SM integration
- Secret rotation, dynamic secrets, audit logging, emergency procedures

sql-database-assistant (457-line SKILL.md, 3 scripts, 3 references):
- Query optimization, migration generation, schema exploration
- Multi-DB support (PostgreSQL, MySQL, SQLite, SQL Server)
- ORM patterns (Prisma, Drizzle, TypeORM, SQLAlchemy)

gcp-cloud-architect (418-line SKILL.md, 3 scripts, 3 references):
- 6-step workflow mirroring aws-solution-architect for GCP
- Cloud Run, GKE, BigQuery, Cloud Functions, cost optimization
- Completes cloud trifecta (AWS + Azure + GCP)

soc2-compliance (417-line SKILL.md, 3 scripts, 3 references):
- SOC 2 Type I & II preparation, Trust Service Criteria mapping
- Control matrix generation, evidence tracking, gap analysis
- First SOC 2 skill in ra-qm-team (joins GDPR, ISO 27001, ISO 13485)

All 12 scripts pass --help. Docs generated, mkdocs.yml nav updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-25 14:05:11 +01:00

9.7 KiB

Emergency Procedures Reference

Secret Leak Response Playbook

Severity Classification

Severity	Definition	Response Time	Example
P0 — Critical	Production credentials exposed publicly	Immediate (15 min)	Database password in public GitHub repo
P1 — High	Internal credentials exposed beyond intended scope	1 hour	API key in build logs accessible to wider org
P2 — Medium	Non-production credentials exposed	4 hours	Staging DB password in internal wiki
P3 — Low	Expired or limited-scope credential exposed	24 hours	Rotated API key found in old commit history
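The response times in the table can be turned into concrete deadlines. The following is a minimal sketch, not part of the skill's own scripts; the names `RESPONSE_TARGETS` and `response_deadline` are hypothetical, and the detection time is an example value.

```python
from datetime import datetime, timedelta, timezone

# Response-time targets copied from the severity table above.
RESPONSE_TARGETS = {
    "P0": timedelta(minutes=15),
    "P1": timedelta(hours=1),
    "P2": timedelta(hours=4),
    "P3": timedelta(hours=24),
}


def response_deadline(severity: str, detected_at: datetime) -> datetime:
    """Return the latest time by which the response must have started."""
    if severity not in RESPONSE_TARGETS:
        raise ValueError(f"unknown severity: {severity!r}")
    return detected_at + RESPONSE_TARGETS[severity]


# Example detection time (hypothetical), always kept in UTC.
detected = datetime(2026, 3, 25, 13, 0, tzinfo=timezone.utc)
for level in RESPONSE_TARGETS:
    print(level, response_deadline(level, detected).isoformat())
```

Keeping every timestamp timezone-aware and in UTC matches the "Time detected" field of the communication template and avoids ambiguity across regions.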

P0/P1 Response Procedure

Phase 1: Contain (0-15 minutes)

Identify the leaked secret
- What credential was exposed? (type, scope, permissions)
- Where was it exposed? (repo, log, error page, third-party service)
- When was it first exposed? (commit timestamp, log timestamp)
- Is the exposure still active? (repo public? log accessible?)
Revoke immediately
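- Database password (PostgreSQL): ALTER ROLE app_user WITH PASSWORD 'new_password'; then terminate sessions opened with the old password, because existing connections stay authenticated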
- API key: Regenerate via provider console/API
- Vault token: vault token revoke <token>
- AWS access key: aws iam delete-access-key --user-name <user> --access-key-id <key>
- Cloud service account: Delete and recreate key
- TLS certificate: Revoke via CA, generate new certificate
Remove exposure
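- Public repo: Remove the file, rewrite history (e.g., git filter-repo or BFG Repo-Cleaner), force-push, and ask GitHub Support to purge cached views. Treat the secret as compromised regardless, since clones and forks may already contain it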
- Build logs: Delete log artifacts, rotate CI/CD secrets
- Error page: Deploy fix to suppress secret in error output
- Third-party: Contact vendor for log purge if applicable
Deploy new credentials
- Update secret store with rotated credential
- Restart affected services to pick up new credential
- Verify services are healthy with new credential

Phase 2: Assess (15-60 minutes)

Audit blast radius
- Query Vault/cloud SM audit logs for the compromised credential
- Check for unauthorized usage during the exposure window
- Review network logs for suspicious connections from unknown IPs
- Check if the compromised credential grants access to other secrets (privilege escalation)
Notify stakeholders
- Security team (always)
- Service owners for affected systems
- Compliance team if regulated data was potentially accessed
- Legal if customer data may have been compromised
- Executive leadership for P0 incidents
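The audit step can be partly automated. The sketch below filters a Vault file audit log (one JSON object per line) for requests made with the compromised token's accessor during the exposure window. It assumes the usual entry layout (`time`, `type`, `auth.accessor`, `request.path`, `request.remote_address`), UTC timestamps, and that you already hold the HMAC'd accessor value as it appears in the log (Vault hashes accessors in audit output by default). The function names and all sample entries are made up for illustration.

```python
import json
from datetime import datetime, timezone


def parse_time(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp, ignoring fractional seconds."""
    return datetime.strptime(value[:19], "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def requests_in_window(lines, hashed_accessor, start, end):
    """Yield (time, remote_address, path) for matching request entries."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("type") != "request":
            continue
        if (entry.get("auth") or {}).get("accessor") != hashed_accessor:
            continue
        when = parse_time(entry["time"])
        if start <= when <= end:
            request = entry.get("request") or {}
            yield when, request.get("remote_address"), request.get("path")


def sample_entry(time, entry_type, accessor, path, address):
    """Build one synthetic audit line (illustrative data only)."""
    return json.dumps({
        "time": time,
        "type": entry_type,
        "auth": {"accessor": accessor},
        "request": {"path": path, "remote_address": address},
    })


sample_log = [
    sample_entry("2026-03-25T12:10:00.123456789Z", "request", "hmac-sha256:aaa", "secret/data/app", "203.0.113.7"),
    sample_entry("2026-03-25T12:10:00.223456789Z", "response", "hmac-sha256:aaa", "secret/data/app", "203.0.113.7"),
    sample_entry("2026-03-25T12:20:00.000000000Z", "request", "hmac-sha256:bbb", "secret/data/other", "198.51.100.4"),
    sample_entry("2026-03-25T18:00:00.000000000Z", "request", "hmac-sha256:aaa", "secret/data/app", "203.0.113.7"),
]

window_start = datetime(2026, 3, 25, 12, 0, tzinfo=timezone.utc)
window_end = datetime(2026, 3, 25, 13, 0, tzinfo=timezone.utc)
hits = list(requests_in_window(sample_log, "hmac-sha256:aaa", window_start, window_end))
for when, address, path in hits:
    print(when.isoformat(), address, path)
```

Any remote address in the result that does not belong to a known service is a candidate for the "unauthorized usage" check.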

Phase 3: Recover (1-24 hours)

Rotate adjacent credentials
- If the leaked credential could access other secrets, rotate those too
- If a Vault token leaked, check what policies it had — rotate everything accessible
Harden against recurrence
- Add pre-commit hook to detect secrets (e.g., gitleaks, detect-secrets)
- Review CI/CD pipeline for secret masking
- Audit who has access to the source of the leak
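To show what a secret-detection hook does in principle, here is a minimal pattern scanner. It is only a sketch: the two patterns (AWS access key ID prefixes and PEM private key headers) are a tiny illustrative subset, the names `PATTERNS` and `scan_text` are hypothetical, and a maintained tool such as gitleaks or detect-secrets should be used in a real pre-commit hook.

```python
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
PATTERNS = {
    "aws-access-key-id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private-key-header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_text(text: str):
    """Return (line_number, rule_name) for every line matching a pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
sample = "region = us-east-1\naws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
print(scan_text(sample))  # [(2, 'aws-access-key-id')]
```

A hook built on this idea would scan only the staged diff and exit non-zero when `scan_text` returns any findings.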

Phase 4: Post-Mortem (24-72 hours)

Document incident
- Timeline of events
- Root cause analysis
- Impact assessment
- Remediation actions taken
- Preventive measures added

Response Communication Template

SECURITY INCIDENT — SECRET EXPOSURE
Severity: P0/P1
Time detected: YYYY-MM-DD HH:MM UTC
Secret type: [database password / API key / token / certificate]
Exposure vector: [public repo / build log / error output / other]
Status: [CONTAINED / INVESTIGATING / RESOLVED]

Immediate actions taken:
- [ ] Credential revoked at source
- [ ] Exposure removed
- [ ] New credential deployed
- [ ] Services verified healthy
- [ ] Audit log review in progress

Blast radius assessment: [PENDING / COMPLETE — no unauthorized access / COMPLETE — unauthorized access detected]

Next update: [time]
Incident commander: [name]

Vault Seal/Unseal Procedures

Understanding Seal Status

Vault uses a seal mechanism to protect the encryption key hierarchy. When sealed, Vault cannot decrypt its storage or serve secrets; only a few endpoints such as status, health, and unseal still respond.

Sealed State:
  Vault process running → YES
  API responding → YES (503 Sealed)
  Serving secrets → NO
  Active leases → NOT PROCESSED (TTLs keep elapsing; expired leases are revoked after unseal)
  Audit logging → NO

Unsealed State:
  Vault process running → YES
  API responding → YES (200 OK)
  Serving secrets → YES
  Active leases → RESUMING
  Audit logging → YES
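Monitoring usually reads these states from the status code of Vault's sys/health endpoint. The sketch below maps the default status codes to a description; the function names are hypothetical, and the codes can be overridden with query parameters on the endpoint, so treat the table as the defaults only.

```python
# Default status codes returned by Vault's sys/health endpoint.
HEALTH_CODES = {
    200: "initialized, unsealed, active",
    429: "unsealed, standby",
    472: "disaster recovery secondary",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def describe_health(status_code: int) -> str:
    """Translate a sys/health status code into a readable state."""
    return HEALTH_CODES.get(status_code, f"unexpected status {status_code}")


def is_sealed(status_code: int) -> bool:
    """True when the node reports the sealed state."""
    return status_code == 503


for code in (200, 429, 503):
    print(code, describe_health(code), "sealed" if is_sealed(code) else "not sealed")
```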

When to Seal Vault (Emergency Only)

Seal Vault when:

- Active intrusion on Vault infrastructure is confirmed
- Vault server compromise is suspected (unauthorized root access)
- Encryption key material may have been extracted
- Regulatory/legal hold requires immediate data access prevention

Do NOT seal for:

- Routine maintenance (use graceful shutdown instead)
- Single-node issues in HA cluster (let standby take over)
- Suspected secret leak (revoke the secret, don't seal Vault)

Seal Procedure

# Seal a single node (requires a root token or sudo capability on sys/seal)
vault operator seal

# Seal all nodes (HA cluster)
# Only the active node accepts a seal request; a standby rejects it.
# Stop the Vault service on each standby first, then seal the leader last
systemctl stop vault    # run on vault-standby-1
systemctl stop vault    # run on vault-standby-2
vault operator seal -address=https://vault-leader:8200

Impact of sealing:

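- All active client connections dropped immediately
- Token and lease expirations are not processed while sealed, but TTLs keep elapsing; anything that expired is revoked once Vault is unsealed
- Applications lose secret access, so prepare for cascading failures
- Monitoring will fire alerts for sealed state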

Unseal Procedure (Shamir Keys)

Requires a quorum of key holders (e.g., 3 of 5).

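# Each key holder provides their unseal key (omit the argument to be
# prompted, which keeps the key out of shell history)
vault operator unseal <key-1>
vault operator unseal <key-2>
vault operator unseal <key-3>
# Vault unseals after reaching threshold; repeat on every sealed node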
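Why a quorum is needed can be illustrated with a toy version of Shamir's secret sharing. This is a sketch for intuition only: it works on integers modulo a prime, whereas Vault's own implementation differs in its field arithmetic and key handling, and the function names are hypothetical. Never use this code to protect real keys.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, large enough for a small demo secret


def split_secret(secret: int, shares: int, threshold: int):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= threshold <= shares:
        raise ValueError("threshold must be between 1 and the share count")
    # Random polynomial of degree threshold-1 with the secret as constant term.
    coefficients = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for coefficient in reversed(coefficients):  # Horner's rule
            result = (result * x + coefficient) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 to get the secret back."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        numerator, denominator = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                numerator = (numerator * -xj) % PRIME
                denominator = (denominator * (xi - xj)) % PRIME
        # pow(..., -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * numerator * pow(denominator, -1, PRIME)) % PRIME
    return secret


demo_secret = 123456789
key_shares = split_secret(demo_secret, shares=5, threshold=3)
print(recover_secret(key_shares[:3]) == demo_secret)
print(recover_secret([key_shares[0], key_shares[2], key_shares[4]]) == demo_secret)
```

Any three of the five shares reconstruct the secret, while fewer than three carry no usable information about it, which is why the loss of enough key holders makes a sealed Vault unrecoverable.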

Operational checklist after unseal:

- Verify health: vault status shows Sealed: false
- Check audit devices: vault audit list (confirm all are enabled)
- Check auth methods: vault auth list
- Verify HA status: vault operator raft list-peers
- Check lease count: monitor vault.expire.num_leases
- Verify applications reconnecting (check application logs)

Unseal Procedure (Auto-Unseal)

If using cloud KMS auto-unseal, Vault unseals automatically on restart:

# Restart Vault service
systemctl restart vault

# Verify unseal (should happen within seconds)
vault status

If auto-unseal fails:

- Check cloud KMS key permissions (IAM role may have been modified)
- Check network connectivity to cloud KMS endpoint
- Check KMS key status (not disabled, not scheduled for deletion)
- Check Vault logs: journalctl -u vault -f

Mass Credential Rotation Procedure

Use this procedure when a broad compromise requires rotating many credentials at once.

Pre-Rotation Checklist

- Identify all credentials in scope
- Map credential dependencies (which services use which credentials)
- Determine rotation order (databases before applications)
- Prepare rollback plan for each credential
- Notify all service owners
- Schedule maintenance window if zero-downtime not possible
- Stage new credentials in secret store (but don't activate yet)

Rotation Order

1. Infrastructure credentials: Database root passwords, cloud IAM admin keys
2. Service credentials: Application database users, API keys
3. Integration credentials: Third-party API keys, webhook secrets
4. Human credentials: Force password reset, revoke SSO sessions
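The ordering above can be applied mechanically to an inventory of in-scope credentials. The sketch below is one possible way to do that; the tier keys, the function name `rotation_plan`, and the credential names are all hypothetical examples.

```python
# Tiers in the order listed above; a lower number rotates first.
TIER_ORDER = {
    "infrastructure": 0,
    "service": 1,
    "integration": 2,
    "human": 3,
}


def rotation_plan(credentials):
    """Sort (name, tier) pairs by tier, keeping input order within a tier."""
    unknown = [tier for _, tier in credentials if tier not in TIER_ORDER]
    if unknown:
        raise ValueError(f"unknown tiers: {sorted(set(unknown))}")
    # sorted() is stable, so credentials in the same tier keep their order.
    return sorted(credentials, key=lambda item: TIER_ORDER[item[1]])


# Hypothetical inventory of credentials in scope.
in_scope = [
    ("partner-webhook-secret", "integration"),
    ("app-db-user", "service"),
    ("db-root-password", "infrastructure"),
    ("sso-sessions", "human"),
    ("cloud-iam-admin-key", "infrastructure"),
]

for position, (name, tier) in enumerate(rotation_plan(in_scope), start=1):
    print(position, tier, name)
```

Within a tier, the dependency map from the pre-rotation checklist still decides the exact sequence; this sketch only enforces the tier order.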

Rollback Plan

For each credential, document:

- Previous value (store in sealed emergency envelope or HSM)
- How to revert (specific command or API call)
- Verification step (how to confirm old credential works)
- Maximum time to rollback (SLA)

Vault Recovery Procedures

Lost Unseal Keys

If unseal keys are lost and auto-unseal is not configured:

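- If Vault is currently unsealed: Do not seal or restart it. Migrating to auto-unseal and vault operator rekey both require a quorum of the existing unseal keys, so if fewer than the threshold remain, read the secrets out through the API into a newly initialized cluster while the old one is still running
- If Vault is sealed: Data is unrecoverable without the keys. A Raft snapshot is protected by the same seal, so restoring one only helps if you still hold the unseal keys that were valid when it was taken
- Prevention: Store unseal keys in separate, secure locations (HSMs, safety deposit boxes). Use auto-unseal for production.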

Raft Cluster Recovery

Single node failure (cluster still has quorum):

# Remove failed peer
vault operator raft remove-peer <failed-node-id>

# Add replacement node
# (new node joins via retry_join in config)

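Loss of quorum (majority of nodes failed):

# vault operator raft join cannot restore quorum; use peers.json recovery.
# On each surviving node with recent data: stop Vault, write raft/peers.json
# (listing only the surviving nodes) into the Raft data directory, then start it
systemctl stop vault
systemctl start vault

# If no node survives, restore from snapshot on a newly initialized node
# (-force is needed because the snapshot comes from a different cluster)
vault operator raft snapshot restore -force /backups/latest.snap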
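Whether a failure is a "single node failure" or a "loss of quorum" follows from simple arithmetic: Raft needs a strict majority of voting nodes. A minimal sketch, with hypothetical function names:

```python
def raft_quorum(voters: int) -> int:
    """Smallest number of voting nodes that forms a strict majority."""
    if voters < 1:
        raise ValueError("a cluster needs at least one voter")
    return voters // 2 + 1


def failure_tolerance(voters: int) -> int:
    """How many voters can fail while the cluster keeps quorum."""
    return voters - raft_quorum(voters)


def has_quorum(voters: int, healthy: int) -> bool:
    """True when the healthy voters still form a majority."""
    return healthy >= raft_quorum(voters)


for size in (3, 5):
    print(size, "voters: quorum", raft_quorum(size), "tolerates", failure_tolerance(size), "failures")
```

A 3-node cluster survives one failed node and a 5-node cluster survives two; beyond that, the peers.json or snapshot path applies.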

Root Token Recovery

If no root token exists (it should have been revoked after initial setup), generate a new one:

# Start the process (prints a nonce and a one-time password; requires unseal key quorum)
vault operator generate-root -init
# Each key holder provides their unseal key (recovery key when auto-unseal is used)
vault operator generate-root -nonce=<nonce> <unseal-key>
# After quorum, decode the encoded token
vault operator generate-root -decode=<encoded-token> -otp=<otp>

Best practice: Generate a root token only when needed, complete the task, then revoke it:

vault token revoke <root-token>

Incident Severity Escalation Matrix

Signal	Escalation
Single secret exposed in internal log	P2 — Rotate secret, add log masking
Secret in public repository (no evidence of use)	P1 — Immediate rotation, history scrub
Secret in public repository (evidence of unauthorized use)	P0 — Full incident response, legal notification
Vault node compromised	P0 — Seal cluster, rotate all accessible secrets
Cloud KMS key compromised	P0 — Create new key, re-encrypt all secrets, rotate all credentials
Audit log gap detected	P1 — Investigate cause, assume worst case for gap period
Multiple failed auth attempts from unknown source	P2 — Block source, investigate, rotate targeted credentials
