Merge pull request #408 from alirezarezvani/feature/sprint-improvements
improve(engineering): enhance 5 existing skills — tdd-guide, env-secrets-manager, senior-secops, database-designer, senior-devops
@@ -59,6 +59,229 @@ A comprehensive database design skill that provides expert-level analysis, optim
4. **Validate inputs**: Prevent SQL injection attacks
5. **Regular security updates**: Keep database software current

## Query Generation Patterns

### SELECT with JOINs

```sql
-- INNER JOIN: only matching rows
SELECT o.id, c.name, o.total
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id;

-- LEFT JOIN: all left rows, NULLs for non-matches
-- Group by the key as well as the name so same-named customers stay separate
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;

-- Self-join: hierarchical data (employees/managers)
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON m.id = e.manager_id;
```

### Common Table Expressions (CTEs)

```sql
-- Recursive CTE for org chart
WITH RECURSIVE org AS (
    SELECT id, name, manager_id, 1 AS depth
    FROM employees WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, e.manager_id, o.depth + 1
    FROM employees e INNER JOIN org o ON o.id = e.manager_id
)
SELECT * FROM org ORDER BY depth, name;
```
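
To see what the recursive query returns, the sketch below runs it from Python against an in-memory SQLite database. The rows (`Ada`, `Ben`, and so on) are made up for illustration, and SQLite is only a stand-in for your actual engine; the CTE itself matches the block above apart from the selected columns.

```python
import sqlite3

# In-memory SQLite database with a tiny, hypothetical employees table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees (id, name, manager_id) VALUES (?, ?, ?)",
    [(1, "Ada", None), (2, "Ben", 1), (3, "Cy", 1), (4, "Dee", 2)],
)

ORG_CHART_SQL = """
WITH RECURSIVE org AS (
    -- Anchor member: employees without a manager sit at depth 1
    SELECT id, name, manager_id, 1 AS depth
    FROM employees WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: attach each employee below their manager
    SELECT e.id, e.name, e.manager_id, o.depth + 1
    FROM employees e INNER JOIN org o ON o.id = e.manager_id
)
SELECT name, depth FROM org ORDER BY depth, name;
"""

org_rows = conn.execute(ORG_CHART_SQL).fetchall()
for name, depth in org_rows:
    # Indent each name by its depth to print a simple tree
    print("  " * (depth - 1) + name)
```

Each pass of the recursive member adds one level of the hierarchy, so the `depth` column doubles as an indentation level.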

### Window Functions

```sql
-- ROW_NUMBER for pagination / dedup
SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
FROM orders;

-- RANK leaves gaps after ties; DENSE_RANK does not
SELECT name, score,
       RANK() OVER (ORDER BY score DESC) AS score_rank,
       DENSE_RANK() OVER (ORDER BY score DESC) AS dense_score_rank
FROM leaderboard;

-- LAG/LEAD for comparing adjacent rows
SELECT date, revenue,
       revenue - LAG(revenue) OVER (ORDER BY date) AS daily_change
FROM daily_sales;
```
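
As a rough illustration of the dedup use of `ROW_NUMBER`, the sketch below keeps only the newest order per customer by filtering on `rn = 1`. It assumes the Python build bundles SQLite 3.25 or later (the first release with window functions); the table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders (id, customer_id, created_at) VALUES (?, ?, ?)",
    [(1, 10, "2026-01-01"), (2, 10, "2026-01-05"), (3, 20, "2026-01-03")],
)

LATEST_ORDER_SQL = """
SELECT id, customer_id, created_at
FROM (
    -- Number each customer's orders from newest to oldest
    SELECT *, ROW_NUMBER() OVER (
        PARTITION BY customer_id ORDER BY created_at DESC
    ) AS rn
    FROM orders
) AS ranked
WHERE rn = 1          -- keep only the newest row per customer
ORDER BY customer_id;
"""

latest_orders = conn.execute(LATEST_ORDER_SQL).fetchall()
print(latest_orders)
```

The window function has to live in a subquery because `rn` cannot be referenced in the `WHERE` clause of the same `SELECT` that computes it.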

### Aggregation Patterns

```sql
-- FILTER clause (PostgreSQL) for conditional aggregation
SELECT
    COUNT(*) AS total,
    COUNT(*) FILTER (WHERE status = 'active') AS active,
    AVG(amount) FILTER (WHERE amount > 0) AS avg_positive
FROM accounts;

-- GROUPING SETS for multi-level rollups
SELECT region, product, SUM(revenue)
FROM sales
GROUP BY GROUPING SETS ((region, product), (region), ());
```

---

## Migration Patterns

### Up/Down Migration Scripts

Every `up` migration needs a matching `down` script that reverses it. Prefix file names with a timestamp so they sort in execution order:

```
migrations/
├── 20260101_000001_create_users.up.sql
├── 20260101_000001_create_users.down.sql
├── 20260115_000002_add_users_email_index.up.sql
└── 20260115_000002_add_users_email_index.down.sql
```
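
The naming convention above is easy to check mechanically. The sketch below is one possible check, not part of this skill's scripts: `MIGRATION_RE`, `find_unpaired`, and `ordered_up_scripts` are hypothetical names, and the pattern assumes exactly the `YYYYMMDD_NNNNNN_name.up|down.sql` shape shown in the tree.

```python
import re
from collections import defaultdict

# Assumed file name shape: <8-digit date>_<6-digit sequence>_<name>.<up|down>.sql
MIGRATION_RE = re.compile(
    r"^(?P<version>\d{8}_\d{6})_(?P<name>.+)\.(?P<direction>up|down)\.sql$"
)


def find_unpaired(filenames):
    """Return sorted versions that are missing either the up or the down script."""
    directions = defaultdict(set)
    for filename in filenames:
        match = MIGRATION_RE.match(filename)
        if match:
            directions[match["version"]].add(match["direction"])
    return sorted(v for v, seen in directions.items() if seen != {"up", "down"})


def ordered_up_scripts(filenames):
    """Timestamp prefixes sort lexicographically, so a plain sort gives run order."""
    return sorted(f for f in filenames if f.endswith(".up.sql"))


migration_files = [
    "20260115_000002_add_users_email_index.up.sql",
    "20260101_000001_create_users.down.sql",
    "20260101_000001_create_users.up.sql",
    "20260115_000002_add_users_email_index.down.sql",
]
print(find_unpaired(migration_files))
print(ordered_up_scripts(migration_files))
```

A check like this could run in CI so a migration without a rollback script fails review early.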

### Zero-Downtime Migrations (Expand/Contract)

Use the expand-contract pattern to avoid locking or breaking running code:

1. **Expand**: add the new column/table (nullable or with a default, so existing writes keep working)
2. **Migrate data**: backfill in batches; dual-write from the application
3. **Transition**: application reads from the new column; stop writing to the old one
4. **Contract**: drop the old column in a follow-up migration

### Data Backfill Strategies

```sql
-- Batch update to avoid long-running locks
UPDATE users SET email_normalized = LOWER(email)
WHERE id IN (SELECT id FROM users WHERE email_normalized IS NULL LIMIT 5000);
-- Repeat in a loop until 0 rows affected
```
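
The "repeat in a loop" step might look like the sketch below. It uses SQLite so it runs anywhere; `backfill_email_normalized` is a hypothetical helper, and the extra `email IS NOT NULL` guard is an addition here so that rows with a NULL email cannot keep the loop running forever.

```python
import sqlite3


def backfill_email_normalized(conn, batch_size=5000):
    """Backfill in batches, committing after each one; returns rows updated."""
    total = 0
    while True:
        cursor = conn.execute(
            """
            UPDATE users SET email_normalized = LOWER(email)
            WHERE id IN (
                SELECT id FROM users
                WHERE email_normalized IS NULL
                  AND email IS NOT NULL   -- guard: LOWER(NULL) stays NULL
                LIMIT ?
            )
            """,
            (batch_size,),
        )
        conn.commit()  # short transactions keep lock time per batch small
        if cursor.rowcount == 0:
            return total
        total += cursor.rowcount


# Demo with hypothetical data: 12 rows processed in batches of 5.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, email_normalized TEXT)"
)
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(i, f"User{i}@Example.com") for i in range(12)],
)
updated = backfill_email_normalized(conn, batch_size=5)
print(updated)  # 12
```

Committing per batch is the point of the pattern: each transaction touches a bounded number of rows, so locks are released between batches. On a production system you would likely also pause briefly between batches to limit replication lag.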

### Rollback Procedures

- Always test the `down.sql` in staging before deploying `up.sql` to production
- Keep the rollback window short: once the contract step has run, rolling back requires a new forward migration
- For irreversible changes (dropping columns with data), take a logical backup first

---

## Performance Optimization

### Indexing Strategies

| Index Type | Use Case | Example |
|------------|----------|---------|
| **B-tree** (default) | Equality, range, ORDER BY | `CREATE INDEX idx_users_email ON users(email);` |
| **GIN** | Full-text search, JSONB, arrays | `CREATE INDEX idx_docs_body ON docs USING gin(to_tsvector('english', body));` |
| **GiST** | Geometry, range types, nearest-neighbor | `CREATE INDEX idx_locations ON places USING gist(coords);` |
| **Partial** | Subset of rows (reduce size) | `CREATE INDEX idx_active ON users(email) WHERE active = true;` |
| **Covering** | Index-only scans | `CREATE INDEX idx_cov ON orders(customer_id) INCLUDE (total, created_at);` |

### EXPLAIN Plan Reading

```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;
```

`ANALYZE` actually executes the statement, so wrap data-modifying queries in a transaction and roll it back.

Key signals to watch:
- **Seq Scan** on a large table with a selective filter: likely a missing index
- **Nested Loop** with high row estimates: consider a hash/merge join or add an index on the inner side
- **Buffers: shared read** much higher than **hit**: the working set likely exceeds the cache

### N+1 Query Detection

Symptoms: the application issues one query per row (e.g., fetching related records in a loop).

Fixes:
- Use `JOIN` or subquery to fetch in one round-trip
- ORM eager loading (`select_related` / `includes` / `with`)
- DataLoader pattern for GraphQL resolvers
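
To make the symptom concrete, the sketch below computes the same per-customer totals twice, once with a query per row and once with a single `JOIN`, and records every statement issued. The schema, data, and helper names (`run`, `totals_n_plus_one`, `totals_single_query`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
    """
)

queries = []  # every SQL statement issued through run() is recorded here


def run(sql, params=()):
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


def totals_n_plus_one():
    """One query for the customers, then one more per customer (N+1)."""
    result = {}
    for customer_id, name in run("SELECT id, name FROM customers ORDER BY id"):
        rows = run(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
            (customer_id,),
        )
        result[name] = rows[0][0]
    return result


def totals_single_query():
    """Same result in one round-trip using a LEFT JOIN and GROUP BY."""
    rows = run(
        """
        SELECT c.name, COALESCE(SUM(o.total), 0)
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        """
    )
    return dict(rows)


totals_n_plus_one()
print(len(queries))  # 4 statements for 3 customers
queries.clear()
totals_single_query()
print(len(queries))  # 1 statement
```

Counting statements per request, as `queries` does here, is also a cheap way to detect N+1 regressions in tests.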

### Connection Pooling

| Tool | Protocol | Best For |
|------|----------|----------|
| **PgBouncer** | PostgreSQL | Transaction/statement pooling, low overhead |
| **ProxySQL** | MySQL | Query routing, read/write splitting |
| **Built-in pool** (HikariCP, SQLAlchemy pool) | Any | Application-level pooling |

**Rule of thumb:** Set pool size to `(2 * CPU cores) + disk spindles`. For cloud SSDs, start with `2 * vCPUs` and tune.
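
As a worked example of the rule of thumb, the hypothetical helper below just encodes the formula; treat the result as a starting point for load testing, not a final setting.

```python
def suggested_pool_size(cpu_cores: int, disk_spindles: int = 0) -> int:
    """Starting pool size per the rule of thumb: (2 * cores) + spindles."""
    if cpu_cores < 1 or disk_spindles < 0:
        raise ValueError("cpu_cores must be >= 1 and disk_spindles >= 0")
    return 2 * cpu_cores + disk_spindles


print(suggested_pool_size(cpu_cores=4, disk_spindles=1))  # 9
print(suggested_pool_size(cpu_cores=8))  # 16 (cloud SSD: 2 * vCPUs)
```

Note that the figure applies to the database server as a whole, so it has to be divided among all application instances that connect to it.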
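
### Read Replicas and Query Routing

- Route `SELECT` queries that tolerate slightly stale data to replicas; send writes and read-your-own-writes queries to the primary
- Account for replication lag (typically <1s for async; synchronous replication only guarantees zero read lag when the replica must apply changes before commit, e.g. PostgreSQL `synchronous_commit = remote_apply`)
- Compare `pg_last_wal_replay_lsn()` on the replica with `pg_current_wal_lsn()` on the primary to detect lag before reading critical data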
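
A routing decision along these lines could be sketched as follows. Everything here is an assumption for illustration: the `Replica` type, the `choose_target` function, the one-second threshold, and the idea that lag per replica is already available from monitoring. The statement check is deliberately naive (it would misroute `SELECT ... FOR UPDATE`, for example).

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    lag_seconds: float  # hypothetical value supplied by monitoring


def choose_target(sql, replicas, max_lag_seconds=1.0, needs_fresh_read=False):
    """Return the name of a replica for tolerant reads, else "primary"."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    if not is_read or needs_fresh_read:
        return "primary"  # writes and read-your-own-writes go to the primary
    fresh = [r for r in replicas if r.lag_seconds <= max_lag_seconds]
    if not fresh:
        return "primary"  # every replica is too far behind
    return min(fresh, key=lambda r: r.lag_seconds).name


replicas = [Replica("replica-a", 0.2), Replica("replica-b", 3.0)]
print(choose_target("SELECT * FROM orders", replicas))  # replica-a
print(choose_target("UPDATE orders SET total = 0", replicas))  # primary
```

Falling back to the primary when all replicas lag keeps reads correct at the cost of extra primary load, which is usually the safer default.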

---

## Multi-Database Decision Matrix

| Criteria | PostgreSQL | MySQL | SQLite | SQL Server |
|----------|-----------|-------|--------|------------|
| **Best for** | Complex queries, JSONB, extensions | Web apps, read-heavy workloads | Embedded, dev/test, edge | Enterprise .NET stacks |
| **JSON support** | Excellent (JSONB + GIN) | Good (JSON type) | Minimal | Good (OPENJSON) |
| **Replication** | Streaming, logical | Group replication, InnoDB cluster | N/A | Always On AG |
| **Licensing** | Open source (PostgreSQL License) | Open source (GPL) / commercial | Public domain | Commercial |
| **Max practical size** | Multi-TB | Multi-TB | ~1 TB (single-writer) | Multi-TB |

**When to choose:**
- **PostgreSQL**: default choice for new projects; best extensibility and standards compliance
- **MySQL**: existing MySQL ecosystem; simple read-heavy web applications
- **SQLite**: mobile apps, CLI tools, unit test databases, IoT/edge
- **SQL Server**: mandated by enterprise policy; deep .NET/Azure integration

### NoSQL Considerations

| Database | Model | Use When |
|----------|-------|----------|
| **MongoDB** | Document | Schema flexibility, rapid prototyping, content management |
| **Redis** | Key-value / cache | Session store, rate limiting, leaderboards, pub/sub |
| **DynamoDB** | Key-value / document | Serverless AWS apps, single-digit-ms latency at any scale |

> Use SQL as default. Reach for NoSQL only when the access pattern clearly benefits from it.

---

## Sharding & Replication

### Horizontal vs Vertical Partitioning

- **Vertical partitioning**: Split columns across tables (e.g., separate BLOB columns). Reduces I/O for narrow queries.
- **Horizontal partitioning (sharding)**: Split rows across databases/servers. Required when a single node cannot hold the dataset or handle the throughput.

### Sharding Strategies

| Strategy | How It Works | Pros | Cons |
|----------|-------------|------|------|
| **Hash** | `shard = hash(key) % N` | Even distribution | Resharding is expensive |
| **Range** | Shard by date or ID range | Simple, good for time-series | Hot spots on latest shard |
| **Geographic** | Shard by user region | Data locality, compliance | Cross-region queries are hard |
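
The "resharding is expensive" entry for hash sharding can be shown with a short sketch. It uses SHA-256 as the hash, which is an assumption made here for a stable result (Python's built-in `hash()` is randomized per process for strings); `shard_for` and `fraction_moved` are hypothetical names.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """shard = hash(key) % N, using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(keys, old_n: int, new_n: int) -> float:
    """Share of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10_000)]
moved = fraction_moved(keys, old_n=4, new_n=5)
print(f"{moved:.0%} of keys change shard when going from 4 to 5 shards")
```

With plain modulo placement, roughly four out of five keys move when a fifth shard is added, which is why schemes such as consistent hashing are often considered when the shard count is expected to change.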

### Replication Patterns

| Pattern | Consistency | Latency | Use Case |
|---------|------------|---------|----------|
| **Synchronous** | Strong | Higher write latency | Financial transactions |
| **Asynchronous** | Eventual | Low write latency | Read-heavy web apps |
| **Semi-synchronous** | At-least-one replica confirmed | Moderate | Balance of safety and speed |

---

## Cross-References

- **sql-database-assistant**: query writing, optimization, and debugging for day-to-day SQL work
- **database-schema-designer**: ERD modeling, normalization analysis, and schema generation
- **migration-architect**: large-scale migration planning across database engines or major schema overhauls
- **senior-backend**: application-layer patterns (connection pooling, ORM best practices)
- **senior-devops**: infrastructure provisioning for database clusters and replicas

---

## Conclusion

Effective database design requires balancing multiple competing concerns: performance, scalability, maintainability, and business requirements. This skill provides the tools and knowledge to make informed decisions throughout the database lifecycle, from initial schema design through production optimization and evolution.

@@ -76,3 +76,185 @@ python3 scripts/env_auditor.py /path/to/repo --json
2. Keep dev env files local and gitignored.
3. Enforce detection in CI before merge.
4. Re-test application paths immediately after credential rotation.

---

## Cloud Secret Store Integration

Production applications should never read secrets from `.env` files or environment variables baked into container images. Use a dedicated secret store instead.

### Provider Comparison

| Provider | Best For | Key Feature |
|----------|----------|-------------|
| **HashiCorp Vault** | Multi-cloud / hybrid | Dynamic secrets, policy engine, pluggable backends |
| **AWS Secrets Manager** | AWS-native workloads | Native Lambda/ECS/EKS integration, automatic RDS rotation |
| **Azure Key Vault** | Azure-native workloads | Managed HSM, Azure AD RBAC, certificate management |
| **GCP Secret Manager** | GCP-native workloads | IAM-based access, automatic replication, versioning |

### Selection Guidance

- **Single cloud provider**: use the cloud-native secret manager. It integrates tightly with IAM, reduces operational overhead, and costs less than self-hosting.
- **Multi-cloud or hybrid**: use HashiCorp Vault. It provides a uniform API across environments and supports dynamic secret generation (database credentials, cloud IAM keys) that expire automatically.
- **Kubernetes-heavy**: combine External Secrets Operator with any backend above to sync secrets into K8s `Secret` objects without hardcoding.

### Application Access Patterns

1. **SDK/API pull**: application fetches secret at startup or on-demand via provider SDK.
2. **Sidecar injection**: a sidecar container (e.g., Vault Agent) writes secrets to a shared volume or injects them as environment variables.
3. **Init container**: a Kubernetes init container fetches secrets before the main container starts.
4. **CSI driver**: secrets mount as a filesystem volume via the Secrets Store CSI Driver.
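
For the sidecar, init-container, and CSI patterns, the application typically ends up reading a file from a mounted volume. A minimal sketch of that read, assuming one file per secret under a mount directory: the `/mnt/secrets` default, the `db-password` name, and `read_mounted_secret` itself are all hypothetical.

```python
import tempfile
from pathlib import Path


def read_mounted_secret(name: str, mount_dir: str = "/mnt/secrets") -> str:
    """Read one secret from a mounted volume (one file per secret assumed)."""
    path = Path(mount_dir) / name
    try:
        # Strip the trailing newline that many tools append when writing files
        return path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        raise RuntimeError(f"secret {name!r} not found under {mount_dir}") from None


# Demo against a temporary directory standing in for the mounted volume.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "db-password").write_text("example-value\n", encoding="utf-8")
    demo_value = read_mounted_secret("db-password", mount_dir=demo_dir)

print(len(demo_value))  # print the length, never the value itself
```

Reading the file on each use, rather than caching it at startup, lets the application pick up rotated values when the mount is refreshed.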

> **Cross-reference:** See `engineering/secrets-vault-manager` for production vault infrastructure patterns, HA deployment, and disaster recovery procedures.

---

## Secret Rotation Workflow

Stale secrets are a liability. Rotation ensures that even if a credential leaks, its useful lifetime is bounded.

### Phase 1: Detection

- Track secret creation and expiry dates in your secret store metadata.
- Set alerts at 30, 14, and 7 days before expiry.
- Use `scripts/env_auditor.py` to flag secrets with no recorded rotation date.
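
The 30/14/7-day alert schedule can be expressed as a small function. This is only a sketch of the threshold logic; `expiry_alert` is a hypothetical name and not something `scripts/env_auditor.py` is known to provide.

```python
from datetime import date

ALERT_THRESHOLDS_DAYS = (30, 14, 7)


def expiry_alert(expires_on: date, today: date):
    """Return the tightest threshold reached, 0 if expired, or None if none."""
    days_left = (expires_on - today).days
    if days_left <= 0:
        return 0  # already expired: treat as the most urgent case
    reached = [t for t in ALERT_THRESHOLDS_DAYS if days_left <= t]
    return min(reached) if reached else None


today = date(2026, 1, 1)
print(expiry_alert(date(2026, 3, 1), today))   # None (59 days left)
print(expiry_alert(date(2026, 1, 11), today))  # 14 (10 days left)
print(expiry_alert(date(2026, 1, 6), today))   # 7 (5 days left)
```

Returning the tightest threshold makes it easy to escalate: a scheduler can notify a wider audience as the value drops from 30 to 7.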

### Phase 2: Rotation

1. **Generate** a new credential (API key, database password, certificate).
2. **Deploy** the new credential to all consumers (apps, services, pipelines) in parallel.
3. **Verify** each consumer can authenticate using the new credential.
4. **Revoke** the old credential only after all consumers are confirmed healthy.
5. **Update** metadata with the new rotation timestamp and next rotation date.
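
The ordering constraint in these steps (revoke only after every consumer verifies) is the part most worth encoding. The sketch below takes the provider-specific actions as callables, all of them hypothetical, and leaves the metadata update from step 5 to the caller; it deploys sequentially for simplicity.

```python
def rotate_secret(generate, deploy, verify, revoke_old, consumers):
    """Generate, deploy, verify, then revoke the old credential only on success."""
    new_credential = generate()
    for consumer in consumers:
        deploy(consumer, new_credential)
    failed = [c for c in consumers if not verify(c, new_credential)]
    if failed:
        # Keep the old credential valid so failing consumers can still run
        return {"rotated": False, "failed_consumers": failed}
    revoke_old()
    return {"rotated": True, "failed_consumers": []}


# Demo with stand-in callables that only record what happened.
log = []
result = rotate_secret(
    generate=lambda: "new-credential",
    deploy=lambda consumer, cred: log.append(("deploy", consumer)),
    verify=lambda consumer, cred: True,
    revoke_old=lambda: log.append(("revoke",)),
    consumers=["api", "worker"],
)
print(result["rotated"], log)
```

Because the old credential stays valid until verification passes, a failed rotation leaves the system in a working state and can simply be retried.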

### Phase 3: Automation

- **AWS Secrets Manager**: use built-in Lambda-based rotation for RDS, Redshift, and DocumentDB.
- **HashiCorp Vault**: configure dynamic secrets with TTLs; credentials are generated on-demand and auto-expire.
- **Azure Key Vault**: use Event Grid notifications to trigger rotation functions.
- **GCP Secret Manager**: use Pub/Sub notifications tied to Cloud Functions for rotation logic.

### Emergency Rotation Checklist

When a secret is confirmed leaked:

1. **Immediately revoke** the compromised credential at the provider level.
2. Generate and deploy a replacement credential to all consumers.
3. Audit access logs for unauthorized usage during the exposure window.
4. Scan git history, CI logs, and artifact registries for the leaked value.
5. File an incident report documenting scope, timeline, and remediation steps.
6. Review and tighten detection controls to prevent recurrence.

---

## CI/CD Secret Injection

Secrets in CI/CD pipelines require careful handling to avoid exposure in logs, artifacts, or pull request contexts.

### GitHub Actions

- Use **repository secrets** or **environment secrets** via `${{ secrets.SECRET_NAME }}`.
- Prefer **OIDC federation** (`aws-actions/configure-aws-credentials` with `role-to-assume`) over long-lived access keys.
- Environment secrets with required reviewers add approval gates for production deployments.
- GitHub automatically masks secrets in logs, but avoid `echo` or `toJSON()` on secret values.

### GitLab CI

- Store secrets as **CI/CD variables** with the `masked` and `protected` flags enabled.
- Use **HashiCorp Vault integration** (`secrets:vault`) for dynamic secret injection without storing values in GitLab.
- Scope variables to specific environments (`production`, `staging`) to enforce least privilege.

### Universal Patterns

- **Never echo or print** secret values in pipeline output, even for debugging.
- **Use short-lived tokens** (OIDC, STS AssumeRole) instead of static credentials wherever possible.
- **Restrict PR access**: do not expose secrets to pipelines triggered by forks or untrusted branches.
- **Rotate CI secrets** on the same schedule as application secrets; pipeline credentials are attack vectors too.
- **Audit pipeline logs** periodically for accidental secret exposure that masking may have missed.

---

## Pre-Commit Secret Detection

Catching secrets before they reach version control is the most cost-effective defense. Two widely used tools cover this space.

### gitleaks

```toml
# .gitleaks.toml: minimal configuration
[extend]
useDefault = true

[[rules]]
id = "custom-internal-token"
description = "Internal service token pattern"
regex = '''INTERNAL_TOKEN_[A-Za-z0-9]{32}'''
secretGroup = 0
```

- Install: `brew install gitleaks` or download from GitHub releases.
- Pre-commit hook: `gitleaks git --pre-commit --staged`
- Baseline scanning: `gitleaks detect --source . --report-path gitleaks-report.json`
- Manage false positives in `.gitleaksignore` (one fingerprint per line).
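
Before committing a custom rule, it helps to check what the pattern does and does not match. The sketch below tests the same expression with Python's `re` module; gitleaks itself uses Go's regex engine, but this simple pattern behaves the same in both. The sample strings are fabricated and `find_internal_tokens` is a hypothetical helper.

```python
import re

# Same expression as the custom rule in .gitleaks.toml
INTERNAL_TOKEN_RE = re.compile(r"INTERNAL_TOKEN_[A-Za-z0-9]{32}")


def find_internal_tokens(text: str):
    """Return every substring that the custom rule would flag."""
    return INTERNAL_TOKEN_RE.findall(text)


fake_token = "INTERNAL_TOKEN_" + "A1b2" * 8          # 32 characters after the prefix
too_short = "INTERNAL_TOKEN_" + "A1b2" * 7 + "A1b"   # 31 characters: no match

print(find_internal_tokens(f"token = '{fake_token}'"))
print(find_internal_tokens(f"token = '{too_short}'"))  # []
```

Trying near-miss inputs like `too_short` is a quick way to tighten a pattern instead of ignoring whole files later.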

### detect-secrets

```bash
# Generate baseline
detect-secrets scan --all-files > .secrets.baseline

# Pre-commit hook (via pre-commit framework)
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']
```

- Supports **custom plugins** for organization-specific patterns.
- Audit workflow: `detect-secrets audit .secrets.baseline` interactively marks true/false positives.

### False Positive Management

- Maintain `.gitleaksignore` or `.secrets.baseline` in version control so the whole team shares exclusions.
- Review false positive lists during security audits, since broad exclusions may mask real leaks over time.
- Prefer tightening regex patterns over broadly ignoring files.

---

## Audit Logging

Knowing who accessed which secret and when is critical for incident investigation and compliance.

### Cloud-Native Audit Trails

| Provider | Service | What It Captures |
|----------|---------|-----------------|
| **AWS** | CloudTrail | Every `GetSecretValue`, `DescribeSecret`, `RotateSecret` API call |
| **Azure** | Activity Log + Diagnostic Logs | Key Vault access events, including caller identity and IP |
| **GCP** | Cloud Audit Logs | Data access logs for Secret Manager with principal and timestamp |
| **Vault** | Audit Backend | Full request/response logging (file, syslog, or socket backend) |

### Alerting Strategy

- Alert on **access from unknown IP ranges** or service accounts outside the expected set.
- Alert on **bulk secret reads** (more than N secrets accessed within a time window).
- Alert on **access outside deployment windows** when no CI/CD pipeline is running.
- Feed audit logs into your SIEM (Splunk, Datadog, Elastic) for correlation with other security events.
- Review audit logs quarterly as part of access recertification.
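
The bulk-read alert can be prototyped as a sliding window over audit events. The sketch below assumes events have already been parsed into `(timestamp_seconds, principal, secret_name)` tuples sorted by time; that shape, the thresholds, and `find_bulk_readers` are illustrative assumptions, not any provider's log format.

```python
from collections import defaultdict, deque


def find_bulk_readers(events, max_secrets=3, window_seconds=60):
    """Flag principals that read more than max_secrets distinct secrets
    within any window_seconds span. Events must be sorted by timestamp."""
    recent = defaultdict(deque)  # principal -> (timestamp, secret) in window
    flagged = set()
    for ts, principal, secret in events:
        window = recent[principal]
        window.append((ts, secret))
        # Drop reads that have fallen out of the time window
        while window and window[0][0] <= ts - window_seconds:
            window.popleft()
        if len({name for _, name in window}) > max_secrets:
            flagged.add(principal)
    return flagged


events = sorted(
    [
        (0, "ci-bot", "a"), (10, "ci-bot", "b"), (20, "ci-bot", "c"), (30, "ci-bot", "d"),
        (5, "app", "db"), (15, "app", "db"), (25, "app", "db"),
        (0, "slow", "a"), (100, "slow", "b"), (200, "slow", "c"), (300, "slow", "d"),
    ]
)
flagged = find_bulk_readers(events)
print(flagged)  # {'ci-bot'}
```

Counting distinct secrets rather than raw reads keeps a service that polls one secret frequently from triggering the alert.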

---

## Cross-References

This skill covers env hygiene and secret detection. For deeper coverage of related domains, see:

| Skill | Path | Relationship |
|-------|------|-------------|
| **Secrets Vault Manager** | `engineering/secrets-vault-manager` | Production vault infrastructure, HA deployment, DR |
| **Senior SecOps** | `engineering/senior-secops` | Security operations perspective, incident response |
| **CI/CD Pipeline Builder** | `engineering/ci-cd-pipeline-builder` | Pipeline architecture, secret injection patterns |
| **Infrastructure as Code** | `engineering/infrastructure-as-code` | Terraform/Pulumi secret backend configuration |
| **Container Orchestration** | `engineering/container-orchestration` | Kubernetes secret mounting, sealed secrets |