Merge pull request #408 from alirezarezvani/feature/sprint-improvements
improve(engineering): enhance 5 existing skills — tdd-guide, env-secrets-manager, senior-secops, database-designer, senior-devops
@@ -270,6 +270,54 @@ kubectl get pods -n production -l app=myapp
curl -sf https://app.example.com/healthz || echo "ROLLBACK FAILED — escalate"
```

## Multi-Cloud Cross-References

Use these companion skills for cloud-specific deep dives:

| Skill | Cloud | Use When |
|-------|-------|----------|
| **aws-solution-architect** | AWS | ECS/EKS, Lambda, VPC design, cost optimization |
| **azure-cloud-architect** | Azure | AKS, App Service, Virtual Networks, Azure DevOps |
| **gcp-cloud-architect** | GCP | GKE, Cloud Run, VPC, Cloud Build *(coming soon)* |

**Multi-cloud vs single-cloud decision:**

- **Single-cloud** (default) — lower operational complexity, deeper managed-service integration, better cost leverage with committed-use discounts
- **Multi-cloud** — required when compliance or data residency mandates it, when acquisitions bring workloads on different clouds, or when best-of-breed services span providers (e.g., AWS for compute + GCP for ML)
- **Hybrid** — on-prem + cloud; use when regulated workloads must stay on-prem while burst/non-sensitive workloads run in the cloud

> Start single-cloud. Add a second cloud only when there is a concrete business or compliance driver — not for theoretical redundancy.

---

## Cloud-Agnostic IaC

### Terraform / OpenTofu (Default Choice)

Terraform (or its open-source fork OpenTofu) is the recommended IaC tool for most teams:

- Single language (HCL) across AWS, Azure, GCP, and 3,000+ providers
- State management with remote backends (S3, GCS, Azure Blob)
- Plan-before-apply workflow prevents drift surprises
- Cross-reference **terraform-patterns** for module structure, state isolation, and CI/CD integration

### Pulumi (Programming Language IaC)

Choose Pulumi when the team strongly prefers TypeScript, Python, Go, or C# over HCL:

- Full programming language — loops, conditionals, and unit tests are native
- Same cloud provider coverage as Terraform
- Easier onboarding for dev teams that resist learning HCL

### When to Use Cloud-Native IaC

| Tool | Use When |
|------|----------|
| **CloudFormation** | AWS-only shop; need native AWS support (StackSets, Service Catalog) |
| **Bicep** | Azure-only shop; simpler syntax than ARM templates |
| **Cloud Deployment Manager** | GCP-only; rare — most GCP teams prefer Terraform |

> **Rule of thumb:** Use Terraform/OpenTofu unless you are 100% committed to a single cloud AND the cloud-native tool offers a feature Terraform cannot replicate (e.g., AWS Service Catalog integration).

---

## Troubleshooting

Check the comprehensive troubleshooting section in `references/deployment_strategies.md`.

@@ -413,6 +413,89 @@ app.use((req, res, next) => {

---

## OWASP Top 10 Quick-Check

Rapid 15-minute assessment — run through each category and note pass/fail. For deep-dive testing, hand off to the **security-pen-testing** skill.

| # | Category | One-Line Check |
|---|----------|----------------|
| A01 | Broken Access Control | Verify role checks on every endpoint; test horizontal privilege escalation |
| A02 | Cryptographic Failures | Confirm TLS 1.2+ everywhere; no secrets in logs or source |
| A03 | Injection | Run parameterized query audit; check ORM raw-query usage |
| A04 | Insecure Design | Review that a threat model exists for critical flows |
| A05 | Security Misconfiguration | Check default credentials removed; error pages generic |
| A06 | Vulnerable Components | Run `vulnerability_assessor.py`; zero critical/high CVEs |
| A07 | Auth Failures | Verify MFA on admin; brute-force protection active |
| A08 | Software & Data Integrity | Confirm CI/CD pipeline signs artifacts; no unsigned deps |
| A09 | Logging & Monitoring | Validate audit logs capture auth events; alerts configured |
| A10 | SSRF | Test internal URL filters; block metadata endpoints (169.254.169.254) |

> **Deep dive needed?** Hand off to `security-pen-testing` for full OWASP Testing Guide coverage.
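
The A10 check can be made concrete with a small resolver-side guard. This is a minimal sketch (the function name is illustrative) that rejects URLs whose host resolves to private, loopback, or link-local space, covering the 169.254.169.254 metadata endpoint:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_outbound_url(url: str) -> bool:
    """Reject URLs whose host resolves to private, loopback, or
    link-local space (covers 169.254.169.254 metadata endpoints)."""
    host = urlparse(url).hostname
    if not host:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

Note this validates at check time only; production code should also pin the resolved IP when connecting, or DNS rebinding can bypass the check.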

---

## Secret Scanning Tools

Choose the right scanner for each stage of your workflow:

| Tool | Best For | Language | Pre-commit | CI/CD | Custom Rules |
|------|----------|----------|:----------:|:-----:|:------------:|
| **gitleaks** | CI pipelines, full-repo scans | Go | Yes | Yes | TOML regexes |
| **detect-secrets** | Pre-commit hooks, incremental | Python | Yes | Partial | Plugin-based |
| **truffleHog** | Deep history scans, entropy | Go | No | Yes | Regex + entropy |

**Recommended setup:** Use `detect-secrets` as a pre-commit hook (catches secrets before they enter history) and `gitleaks` in CI (catches anything that slips through).

```yaml
# detect-secrets pre-commit hook (.pre-commit-config.yaml)
- repo: https://github.com/Yelp/detect-secrets
  rev: v1.4.0
  hooks:
    - id: detect-secrets
      args: ['--baseline', '.secrets.baseline']

# gitleaks in GitHub Actions
- name: gitleaks
  uses: gitleaks/gitleaks-action@v2
  env:
    GITLEAKS_LICENSE: ${{ secrets.GITLEAKS_LICENSE }}
```

---

## Supply Chain Security

Protect against dependency and artifact tampering with SBOM generation, artifact signing, and SLSA compliance.

**SBOM Generation:**
- **syft** — generates SBOMs from container images or source dirs (SPDX, CycloneDX formats)
- **cyclonedx-cli** — CycloneDX-native tooling; merge multiple SBOMs for mono-repos

```bash
# Generate SBOM from container image
syft packages ghcr.io/org/app:latest -o cyclonedx-json > sbom.json
```

**Artifact Signing (Sigstore/cosign):**
```bash
# Sign a container image (keyless via OIDC)
cosign sign ghcr.io/org/app:latest
# Verify signature
cosign verify ghcr.io/org/app:latest \
  --certificate-identity=ci@org.com \
  --certificate-oidc-issuer=https://token.actions.githubusercontent.com
```

**SLSA Levels Overview:**

| Level | Requirement | What It Proves |
|-------|-------------|----------------|
| 1 | Build process documented | Provenance exists |
| 2 | Hosted build service, signed provenance | Tamper-resistant provenance |
| 3 | Hardened build platform, non-falsifiable provenance | Tamper-proof build |
| 4 | Two-party review, hermetic builds | Maximum supply-chain assurance |

> **Cross-references:** `security-pen-testing` (vulnerability exploitation testing), `dependency-auditor` (license and CVE audit for dependencies).

---

## Reference Documentation

| Document | Description |

@@ -148,6 +148,254 @@ Additional scripts: `framework_adapter.py` (convert between frameworks), `metric

---

## Spec-First Workflow

TDD is most effective when driven by a written spec. The flow:

1. **Write or receive a spec** — stored in `specs/<feature>.md`
2. **Extract acceptance criteria** — each criterion becomes one or more test cases
3. **Write failing tests (RED)** — one test per acceptance criterion
4. **Implement minimal code (GREEN)** — satisfy each test in order
5. **Refactor** — clean up while all tests stay green
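
Step 2 can be bootstrapped mechanically. A sketch (the criterion format and naming scheme are assumptions) that turns numbered acceptance criteria into pytest stub names for the RED phase:

```python
import re

def criteria_to_test_stubs(spec_markdown: str) -> list[str]:
    """Turn numbered acceptance criteria (e.g. '1. User can log in')
    into snake_case pytest stub names for the RED phase."""
    stubs = []
    for match in re.finditer(r"^\s*(\d+)\.\s+(.+)$", spec_markdown, re.MULTILINE):
        number, text = match.groups()
        slug = re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")
        stubs.append(f"test_ac{number}_{slug}")
    return stubs
```

Generated names are starting points; rename them to match your team's conventions before filling in the test bodies.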

### Spec Directory Convention

```
project/
├── specs/
│   ├── user-auth.md              # Feature spec with acceptance criteria
│   ├── payment-processing.md
│   └── notification-system.md
├── tests/
│   ├── test_user_auth.py         # Tests derived from specs/user-auth.md
│   ├── test_payments.py
│   └── test_notifications.py
└── src/
```

### Extracting Tests from Specs

Each acceptance criterion in a spec maps to at least one test:

| Spec Criterion | Test Case |
|---------------|-----------|
| "User can log in with valid credentials" | `test_login_valid_credentials_returns_token` |
| "Invalid password returns 401" | `test_login_invalid_password_returns_401` |
| "Account locks after 5 failed attempts" | `test_login_locks_after_five_failures` |

**Tip:** Number your acceptance criteria in the spec. Reference the number in the test docstring for traceability (`# AC-3: Account locks after 5 failed attempts`).
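
Applied to the third criterion in the table, a traceable test might look like this. The `AuthService` below is an illustrative in-memory stand-in so the example runs; real code would live in `src/` and be driven out by the RED phase:

```python
class AccountLocked(Exception):
    pass

class AuthService:
    """Illustrative in-memory stand-in; real code lives in src/."""
    MAX_ATTEMPTS = 5

    def __init__(self):
        self.failures = 0

    def login(self, password: str) -> bool:
        if self.failures >= self.MAX_ATTEMPTS:
            raise AccountLocked("account locked")
        if password != "correct-horse":
            self.failures += 1
            return False
        self.failures = 0
        return True

def test_login_locks_after_five_failures():
    """AC-3: Account locks after 5 failed attempts."""
    auth = AuthService()
    for _ in range(5):
        assert auth.login("wrong") is False
    try:
        auth.login("wrong")
        assert False, "expected AccountLocked"
    except AccountLocked:
        pass
```

The docstring's `AC-3` tag is the traceability link: a reviewer can open the spec and confirm the test covers exactly that criterion.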

> **Cross-reference:** See `engineering/spec-driven-workflow` for the full spec methodology, including spec templates and review checklists.

---

## Red-Green-Refactor Examples Per Language

### TypeScript / Jest

```typescript
// test/cart.test.ts
import { Cart } from "../src/cart"; // adjust path to your implementation

describe("Cart", () => {
  describe("addItem", () => {
    it("should add a new item to an empty cart", () => {
      const cart = new Cart();
      cart.addItem({ id: "sku-1", name: "Widget", price: 9.99, qty: 1 });

      expect(cart.items).toHaveLength(1);
      expect(cart.items[0].id).toBe("sku-1");
    });

    it("should increment quantity when adding an existing item", () => {
      const cart = new Cart();
      cart.addItem({ id: "sku-1", name: "Widget", price: 9.99, qty: 1 });
      cart.addItem({ id: "sku-1", name: "Widget", price: 9.99, qty: 2 });

      expect(cart.items).toHaveLength(1);
      expect(cart.items[0].qty).toBe(3);
    });

    it("should throw when quantity is zero or negative", () => {
      const cart = new Cart();
      expect(() =>
        cart.addItem({ id: "sku-1", name: "Widget", price: 9.99, qty: 0 })
      ).toThrow("Quantity must be positive");
    });
  });
});
```

### Python / Pytest (Advanced Patterns)

```python
# tests/conftest.py — shared fixtures
import pytest
from app.db import create_engine, Session

@pytest.fixture(scope="session")
def db_engine():
    engine = create_engine("sqlite:///:memory:")
    yield engine
    engine.dispose()

@pytest.fixture
def db_session(db_engine):
    session = Session(bind=db_engine)
    yield session
    session.rollback()
    session.close()

# tests/test_pricing.py — parametrize for multiple cases
import pytest
from app.pricing import calculate_discount

@pytest.mark.parametrize("subtotal, expected_discount", [
    (50.0, 0.0),     # Below threshold — no discount
    (100.0, 5.0),    # 5% tier
    (250.0, 25.0),   # 10% tier
    (500.0, 75.0),   # 15% tier
])
def test_calculate_discount(subtotal, expected_discount):
    assert calculate_discount(subtotal) == pytest.approx(expected_discount)
```

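A minimal GREEN-phase implementation satisfying those parametrized cases could look like this (the tier thresholds are inferred from the test data above, not from a real `app.pricing` module):

```python
def calculate_discount(subtotal: float) -> float:
    """Return the discount amount for a subtotal.

    Tiers inferred from the parametrized cases: 5% from 100,
    10% from 250, 15% from 500; no discount below 100.
    """
    if subtotal >= 500:
        rate = 0.15
    elif subtotal >= 250:
        rate = 0.10
    elif subtotal >= 100:
        rate = 0.05
    else:
        rate = 0.0
    return subtotal * rate
```

In true TDD you would write only enough of this to pass the current failing case, then add the next case and extend.
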
### Go — Table-Driven Tests

```go
// cart_test.go
package cart

import "testing"

func TestApplyDiscount(t *testing.T) {
    tests := []struct {
        name     string
        subtotal float64
        want     float64
    }{
        {"no discount below threshold", 50.0, 0.0},
        {"5 percent tier", 100.0, 5.0},
        {"10 percent tier", 250.0, 25.0},
        {"15 percent tier", 500.0, 75.0},
        {"zero subtotal", 0.0, 0.0},
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            got := ApplyDiscount(tt.subtotal)
            if got != tt.want {
                t.Errorf("ApplyDiscount(%v) = %v, want %v", tt.subtotal, got, tt.want)
            }
        })
    }
}
```

---

## Bounded Autonomy Rules

When generating tests autonomously, follow these rules to decide when to stop and ask the user:

### Stop and Ask When

- **Ambiguous requirements** — the spec or user story has conflicting or unclear acceptance criteria
- **Missing edge cases** — you cannot determine boundary values without domain knowledge (e.g., max allowed transaction amount)
- **Test count exceeds 50** — large test suites need human review before committing; present a summary and ask which areas to prioritize
- **External dependencies unclear** — the feature relies on third-party APIs or services with undocumented behavior
- **Security-sensitive logic** — authentication, authorization, encryption, or payment flows require human sign-off on test scenarios

### Continue Autonomously When

- **Clear spec with numbered acceptance criteria** — each criterion maps directly to tests
- **Straightforward CRUD operations** — create, read, update, delete with well-defined models
- **Well-defined API contracts** — OpenAPI spec or typed interfaces available
- **Pure functions** — deterministic input/output with no side effects
- **Existing test patterns** — the codebase already has similar tests to follow

---

## Property-Based Testing

Property-based testing generates random inputs to verify invariants instead of relying on hand-picked examples. Use it when the input space is large and the expected behavior can be described as a property.

### Python — Hypothesis

```python
from hypothesis import given, strategies as st
from app.serializers import serialize, deserialize

@given(st.text())
def test_roundtrip_serialization(data):
    """Serialization followed by deserialization returns the original."""
    assert deserialize(serialize(data)) == data

@given(st.integers(), st.integers())
def test_addition_is_commutative(a, b):
    assert a + b == b + a
```

### TypeScript — fast-check

```typescript
import fc from "fast-check";
import { encode, decode } from "./codec";

test("encode/decode roundtrip", () => {
  fc.assert(
    fc.property(fc.string(), (input) => {
      expect(decode(encode(input))).toBe(input);
    })
  );
});
```

### When to Use Property-Based Over Example-Based

| Use Property-Based | Example |
|-------------------|---------|
| Data transformations | Serialize/deserialize roundtrips |
| Mathematical properties | Commutativity, associativity, idempotency |
| Encoding/decoding | Base64, URL encoding, compression |
| Sorting and filtering | Output is sorted, length preserved |
| Parser correctness | Valid input always parses without error |

---

## Mutation Testing

Mutation testing modifies your production code (creates "mutants") and checks whether your tests catch the changes. If a mutant survives (tests still pass), your tests have a gap that coverage alone cannot reveal.

### Tools

| Language | Tool | Command |
|----------|------|---------|
| TypeScript/JavaScript | **Stryker** | `npx stryker run` |
| Python | **mutmut** | `mutmut run --paths-to-mutate=src/` |
| Java | **PIT** | `mvn org.pitest:pitest-maven:mutationCoverage` |

### Why Mutation Testing Matters

- **100% line coverage != good tests** — coverage tells you code was executed, not that it was verified
- **Catches weak assertions** — tests that run code but assert nothing meaningful
- **Finds missing boundary tests** — mutants that change `<` to `<=` expose off-by-one gaps
- **Quantifiable quality metric** — mutation score (% of mutants killed) is a stronger signal than coverage %

**Recommendation:** Run mutation testing on critical paths (auth, payments, data processing) even if overall coverage is high. Target an 85%+ mutation score on P0 modules.
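
A hand-applied boundary mutant illustrates the point (all names here are illustrative). A weak test passes against both the original and the mutated comparison, while a boundary test kills the mutant:

```python
def is_adult(age: int) -> bool:
    return age >= 18  # a mutation tool would flip this to: age > 18

def is_adult_mutant(age: int) -> bool:
    return age > 18   # the hand-applied mutant

def weak_test(fn) -> bool:
    """Passes for both versions — never exercises the boundary."""
    return fn(30) is True and fn(5) is False

def boundary_test(fn) -> bool:
    """Kills the mutant by checking age == 18 exactly."""
    return fn(18) is True
```

`weak_test` passes for both functions, so the mutant survives; `boundary_test` fails for the mutant, i.e. the mutant is killed — exactly the gap mutation score measures and line coverage misses.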

---

## Cross-References

| Skill | Relationship |
|-------|-------------|
| `engineering/spec-driven-workflow` | Spec → acceptance criteria → test extraction pipeline |
| `engineering-team/focused-fix` | Phase 5 (Verify) uses TDD to confirm the fix with a regression test |
| `engineering-team/senior-qa` | Broader QA strategy; TDD is one layer in the test pyramid |
| `engineering-team/code-reviewer` | Review generated tests for assertion quality and coverage completeness |
| `engineering-team/senior-fullstack` | Project scaffolders include testing infrastructure compatible with TDD workflows |

---

## Limitations

| Scope | Details |

@@ -59,6 +59,229 @@ A comprehensive database design skill that provides expert-level analysis, optim

4. **Validate inputs**: Prevent SQL injection attacks
5. **Regular security updates**: Keep database software current

## Query Generation Patterns

### SELECT with JOINs

```sql
-- INNER JOIN: only matching rows
SELECT o.id, c.name, o.total
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id;

-- LEFT JOIN: all left rows, NULLs for non-matches
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.name;

-- Self-join: hierarchical data (employees/managers)
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON m.id = e.manager_id;
```

### Common Table Expressions (CTEs)

```sql
-- Recursive CTE for org chart
WITH RECURSIVE org AS (
  SELECT id, name, manager_id, 1 AS depth
  FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id, o.depth + 1
  FROM employees e INNER JOIN org o ON o.id = e.manager_id
)
SELECT * FROM org ORDER BY depth, name;
```

### Window Functions

```sql
-- ROW_NUMBER for pagination / dedup
SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
FROM orders;

-- RANK with gaps, DENSE_RANK without gaps
SELECT name, score, RANK() OVER (ORDER BY score DESC) AS rank FROM leaderboard;

-- LAG/LEAD for comparing adjacent rows
SELECT date, revenue,
       revenue - LAG(revenue) OVER (ORDER BY date) AS daily_change
FROM daily_sales;
```

### Aggregation Patterns

```sql
-- FILTER clause (PostgreSQL) for conditional aggregation
SELECT
  COUNT(*) AS total,
  COUNT(*) FILTER (WHERE status = 'active') AS active,
  AVG(amount) FILTER (WHERE amount > 0) AS avg_positive
FROM accounts;

-- GROUPING SETS for multi-level rollups
SELECT region, product, SUM(revenue)
FROM sales
GROUP BY GROUPING SETS ((region, product), (region), ());
```

---

## Migration Patterns

### Up/Down Migration Scripts

Every migration must have a reversible counterpart. Name files with a timestamp prefix for ordering:

```
migrations/
├── 20260101_000001_create_users.up.sql
├── 20260101_000001_create_users.down.sql
├── 20260115_000002_add_users_email_index.up.sql
└── 20260115_000002_add_users_email_index.down.sql
```

### Zero-Downtime Migrations (Expand/Contract)

Use the expand-contract pattern to avoid locking or breaking running code:

1. **Expand** — add the new column/table (nullable, with a default)
2. **Migrate data** — backfill in batches; dual-write from the application
3. **Transition** — application reads from the new column; stop writing to the old one
4. **Contract** — drop the old column in a follow-up migration

### Data Backfill Strategies

```sql
-- Batch update to avoid long-running locks
UPDATE users SET email_normalized = LOWER(email)
WHERE id IN (SELECT id FROM users WHERE email_normalized IS NULL LIMIT 5000);
-- Repeat in a loop until 0 rows affected
```
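
The repeat-until-zero loop is usually driven from application code. A sketch using Python's stdlib `sqlite3` for illustration — the schema is the one from the SQL above; production code would use your real driver and add throttling between batches:

```python
import sqlite3

def backfill_email_normalized(conn: sqlite3.Connection, batch_size: int = 5000) -> int:
    """Backfill in batches until no rows remain; returns total rows updated."""
    total = 0
    while True:
        cur = conn.execute(
            """
            UPDATE users SET email_normalized = LOWER(email)
            WHERE id IN (
                SELECT id FROM users WHERE email_normalized IS NULL LIMIT ?
            )
            """,
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total
```

Committing per batch is the point: each transaction stays short, so the table never holds a long-running lock.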

### Rollback Procedures

- Always test the `down.sql` in staging before deploying `up.sql` to production
- Keep the rollback window short — once the contract step has run, rollback requires a new forward migration
- For irreversible changes (dropping columns with data), take a logical backup first

---

## Performance Optimization

### Indexing Strategies

| Index Type | Use Case | Example |
|------------|----------|---------|
| **B-tree** (default) | Equality, range, ORDER BY | `CREATE INDEX idx_users_email ON users(email);` |
| **GIN** | Full-text search, JSONB, arrays | `CREATE INDEX idx_docs_body ON docs USING gin(to_tsvector('english', body));` |
| **GiST** | Geometry, range types, nearest-neighbor | `CREATE INDEX idx_locations ON places USING gist(coords);` |
| **Partial** | Subset of rows (reduce size) | `CREATE INDEX idx_active ON users(email) WHERE active = true;` |
| **Covering** | Index-only scans | `CREATE INDEX idx_cov ON orders(customer_id) INCLUDE (total, created_at);` |

### EXPLAIN Plan Reading

```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;
```

Key signals to watch:
- **Seq Scan** on large tables — missing index
- **Nested Loop** with high row estimates — consider a hash/merge join or add an index
- **Buffers: shared read** much higher than **hit** — working set exceeds memory

### N+1 Query Detection

Symptoms: the application issues one query per row (e.g., fetching related records in a loop).

Fixes:
- Use a `JOIN` or subquery to fetch in one round-trip
- ORM eager loading (`select_related` / `includes` / `with`)
- DataLoader pattern for GraphQL resolvers
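
The difference is easy to demonstrate with stdlib `sqlite3` (schema and function names are illustrative): the first version issues one query per customer, the second one query total:

```python
import sqlite3

def orders_per_customer_n_plus_1(conn):
    """Anti-pattern: one query per customer (N+1 round-trips)."""
    result = {}
    for (cid,) in conn.execute("SELECT id FROM customers"):
        count = conn.execute(
            "SELECT COUNT(*) FROM orders WHERE customer_id = ?", (cid,)
        ).fetchone()[0]
        result[cid] = count
    return result

def orders_per_customer_single(conn):
    """Fix: one LEFT JOIN query fetches everything in a single round-trip."""
    rows = conn.execute(
        """
        SELECT c.id, COUNT(o.id)
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id
        """
    )
    return dict(rows)
```

Both return the same mapping; against a networked database the N+1 version pays one round-trip per customer, which dominates latency as the table grows.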

### Connection Pooling

| Tool | Protocol | Best For |
|------|----------|----------|
| **PgBouncer** | PostgreSQL | Transaction/statement pooling, low overhead |
| **ProxySQL** | MySQL | Query routing, read/write splitting |
| **Built-in pool** (HikariCP, SQLAlchemy pool) | Any | Application-level pooling |

**Rule of thumb:** Set pool size to `(2 * CPU cores) + disk spindles`. For cloud SSDs, start with `2 * vCPUs` and tune.
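
Encoded as a starting default (pure stdlib; the spindle count is an input you supply, and the result is an upper bound to tune down under load testing):

```python
import os

def recommended_pool_size(spindles: int = 0) -> int:
    """Starting point from the (2 * cores) + spindles rule of thumb."""
    cores = os.cpu_count() or 1
    return 2 * cores + spindles
```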

### Read Replicas and Query Routing

- Route all `SELECT` queries to replicas; writes go to the primary
- Account for replication lag (typically <1 s for async replication; zero for sync)
- Compare `pg_last_wal_receive_lsn()` with `pg_last_wal_replay_lsn()` (or check `pg_last_xact_replay_timestamp()`) on the replica to detect lag before reading critical data

---

## Multi-Database Decision Matrix

| Criteria | PostgreSQL | MySQL | SQLite | SQL Server |
|----------|-----------|-------|--------|------------|
| **Best for** | Complex queries, JSONB, extensions | Web apps, read-heavy workloads | Embedded, dev/test, edge | Enterprise .NET stacks |
| **JSON support** | Excellent (JSONB + GIN) | Good (JSON type) | Minimal | Good (OPENJSON) |
| **Replication** | Streaming, logical | Group replication, InnoDB cluster | N/A | Always On AG |
| **Licensing** | Open source (PostgreSQL License) | Open source (GPL) / commercial | Public domain | Commercial |
| **Max practical size** | Multi-TB | Multi-TB | ~1 TB (single-writer) | Multi-TB |

**When to choose:**
- **PostgreSQL** — default choice for new projects; best extensibility and standards compliance
- **MySQL** — existing MySQL ecosystem; simple read-heavy web applications
- **SQLite** — mobile apps, CLI tools, unit test databases, IoT/edge
- **SQL Server** — mandated by enterprise policy; deep .NET/Azure integration

### NoSQL Considerations

| Database | Model | Use When |
|----------|-------|----------|
| **MongoDB** | Document | Schema flexibility, rapid prototyping, content management |
| **Redis** | Key-value / cache | Session store, rate limiting, leaderboards, pub/sub |
| **DynamoDB** | Key-value / document | Serverless AWS apps, single-digit-ms latency at any scale |

> Use SQL as the default. Reach for NoSQL only when the access pattern clearly benefits from it.

---

## Sharding & Replication

### Horizontal vs Vertical Partitioning

- **Vertical partitioning**: Split columns across tables (e.g., separate BLOB columns). Reduces I/O for narrow queries.
- **Horizontal partitioning (sharding)**: Split rows across databases/servers. Required when a single node cannot hold the dataset or handle the throughput.

### Sharding Strategies

| Strategy | How It Works | Pros | Cons |
|----------|-------------|------|------|
| **Hash** | `shard = hash(key) % N` | Even distribution | Resharding is expensive |
| **Range** | Shard by date or ID range | Simple, good for time-series | Hot spots on the latest shard |
| **Geographic** | Shard by user region | Data locality, compliance | Cross-region queries are hard |
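
The "resharding is expensive" cost is easy to quantify: with modulo placement, growing from N to N+1 shards relocates roughly N/(N+1) of all keys. A sketch using a stable CRC32 hash (Python's built-in `hash` is randomized per process, so it is unsuitable for shard placement):

```python
import zlib

def shard_for(key: str, num_shards: int) -> int:
    """Stable hash-based shard placement."""
    return zlib.crc32(key.encode()) % num_shards

keys = [f"user-{i}" for i in range(10_000)]
moved = sum(shard_for(k, 8) != shard_for(k, 9) for k in keys)
# With modulo placement, roughly N/(N+1) of keys move — here about 8/9.
```

Consistent hashing or pre-split virtual shards reduce the moved fraction to roughly 1/(N+1), which is why they are preferred when shard counts are expected to grow.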

### Replication Patterns

| Pattern | Consistency | Latency | Use Case |
|---------|------------|---------|----------|
| **Synchronous** | Strong | Higher write latency | Financial transactions |
| **Asynchronous** | Eventual | Low write latency | Read-heavy web apps |
| **Semi-synchronous** | At least one replica confirmed | Moderate | Balance of safety and speed |

---

## Cross-References

- **sql-database-assistant** — query writing, optimization, and debugging for day-to-day SQL work
- **database-schema-designer** — ERD modeling, normalization analysis, and schema generation
- **migration-architect** — large-scale migration planning across database engines or major schema overhauls
- **senior-backend** — application-layer patterns (connection pooling, ORM best practices)
- **senior-devops** — infrastructure provisioning for database clusters and replicas

---

## Conclusion

Effective database design requires balancing multiple competing concerns: performance, scalability, maintainability, and business requirements. This skill provides the tools and knowledge to make informed decisions throughout the database lifecycle, from initial schema design through production optimization and evolution.

@@ -76,3 +76,185 @@ python3 scripts/env_auditor.py /path/to/repo --json
2. Keep dev env files local and gitignored.
3. Enforce detection in CI before merge.
4. Re-test application paths immediately after credential rotation.

---

## Cloud Secret Store Integration

Production applications should never read secrets from `.env` files or environment variables baked into container images. Use a dedicated secret store instead.

### Provider Comparison

| Provider | Best For | Key Feature |
|----------|----------|-------------|
| **HashiCorp Vault** | Multi-cloud / hybrid | Dynamic secrets, policy engine, pluggable backends |
| **AWS Secrets Manager** | AWS-native workloads | Native Lambda/ECS/EKS integration, automatic RDS rotation |
| **Azure Key Vault** | Azure-native workloads | Managed HSM, Azure AD RBAC, certificate management |
| **GCP Secret Manager** | GCP-native workloads | IAM-based access, automatic replication, versioning |

### Selection Guidance

- **Single cloud provider** — use the cloud-native secret manager. It integrates tightly with IAM, reduces operational overhead, and costs less than self-hosting.
- **Multi-cloud or hybrid** — use HashiCorp Vault. It provides a uniform API across environments and supports dynamic secrets (database credentials, cloud IAM keys) that expire automatically.
- **Kubernetes-heavy** — combine the External Secrets Operator with any backend above to sync secrets into K8s `Secret` objects without hardcoding.

### Application Access Patterns

1. **SDK/API pull** — the application fetches secrets at startup or on demand via the provider SDK.
2. **Sidecar injection** — a sidecar container (e.g., Vault Agent) writes secrets to a shared volume or injects them as environment variables.
3. **Init container** — a Kubernetes init container fetches secrets before the main container starts.
4. **CSI driver** — secrets mount as a filesystem volume via the Secrets Store CSI Driver.
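
Pattern 1 benefits from an in-process cache so that not every request hits the secret store. A provider-agnostic sketch — the `client.get_secret` interface is an illustrative stand-in for your SDK call, not a real provider API:

```python
import time

class CachedSecretReader:
    """Pull-through cache: fetch from the store, reuse until TTL expires.
    Keeps store traffic low while still picking up rotations."""

    def __init__(self, client, ttl_seconds: float = 300.0):
        self.client = client          # must expose get_secret(name) -> str
        self.ttl = ttl_seconds
        self._cache = {}              # name -> (value, fetched_at)

    def get(self, name: str) -> str:
        entry = self._cache.get(name)
        if entry and time.monotonic() - entry[1] < self.ttl:
            return entry[0]
        value = self.client.get_secret(name)
        self._cache[name] = (value, time.monotonic())
        return value
```

Keep the TTL comfortably shorter than your rotation overlap window so consumers pick up a rotated value before the old one is revoked.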

> **Cross-reference:** See `engineering/secrets-vault-manager` for production vault infrastructure patterns, HA deployment, and disaster recovery procedures.

---

## Secret Rotation Workflow

Stale secrets are a liability. Rotation ensures that even if a credential leaks, its useful lifetime is bounded.

### Phase 1: Detection

- Track secret creation and expiry dates in your secret store metadata.
- Set alerts at 30, 14, and 7 days before expiry.
- Use `scripts/env_auditor.py` to flag secrets with no recorded rotation date.

### Phase 2: Rotation

1. **Generate** a new credential (API key, database password, certificate).
2. **Deploy** the new credential to all consumers (apps, services, pipelines) in parallel.
3. **Verify** each consumer can authenticate using the new credential.
4. **Revoke** the old credential only after all consumers are confirmed healthy.
5. **Update** metadata with the new rotation timestamp and next rotation date.
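
The ordering constraint in Phase 2 — revoke only after every consumer verifies — is worth encoding explicitly. A minimal orchestration sketch with injected callbacks (all parameter names are illustrative):

```python
def rotate_credential(generate, deploy, verify, revoke_old, consumers):
    """Generate -> deploy everywhere -> verify everywhere -> only then revoke.
    Raises without revoking if any consumer fails verification."""
    new_cred = generate()
    for consumer in consumers:
        deploy(consumer, new_cred)
    failed = [c for c in consumers if not verify(c, new_cred)]
    if failed:
        raise RuntimeError(f"rotation aborted, verification failed for: {failed}")
    revoke_old()
    return new_cred
```

Because `revoke_old` runs last, a failed verification leaves both credentials valid — a safe state you can retry from, rather than an outage.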
|
||||
### Phase 3: Automation
|
||||
|
||||
- **AWS Secrets Manager** — use built-in Lambda-based rotation for RDS, Redshift, and DocumentDB.
|
||||
- **HashiCorp Vault** — configure dynamic secrets with TTLs; credentials are generated on-demand and auto-expire.
|
||||
- **Azure Key Vault** — use Event Grid notifications to trigger rotation functions.
|
||||
- **GCP Secret Manager** — use Pub/Sub notifications tied to Cloud Functions for rotation logic.
|
||||
|
||||
### Emergency Rotation Checklist
|
||||
|
||||
When a secret is confirmed leaked:
|
||||
|
||||
1. **Immediately revoke** the compromised credential at the provider level.
|
||||
2. Generate and deploy a replacement credential to all consumers.
|
||||
3. Audit access logs for unauthorized usage during the exposure window.
|
||||
4. Scan git history, CI logs, and artifact registries for the leaked value.
|
||||
5. File an incident report documenting scope, timeline, and remediation steps.
|
||||
6. Review and tighten detection controls to prevent recurrence.
|
||||
|
||||
---

## CI/CD Secret Injection

Secrets in CI/CD pipelines require careful handling to avoid exposure in logs, artifacts, or pull request contexts.
### GitHub Actions

- Use **repository secrets** or **environment secrets** via `${{ secrets.SECRET_NAME }}`.
- Prefer **OIDC federation** (`aws-actions/configure-aws-credentials` with `role-to-assume`) over long-lived access keys.
- Environment secrets with required reviewers add approval gates for production deployments.
- GitHub automatically masks secrets in logs, but avoid `echo` or `toJSON()` on secret values.
### GitLab CI

- Store secrets as **CI/CD variables** with the `masked` and `protected` flags enabled.
- Use the **HashiCorp Vault integration** (`secrets:vault`) for dynamic secret injection without storing values in GitLab.
- Scope variables to specific environments (`production`, `staging`) to enforce least privilege.
### Universal Patterns

- **Never echo or print** secret values in pipeline output, even for debugging.
- **Use short-lived tokens** (OIDC, STS AssumeRole) instead of static credentials wherever possible.
- **Restrict PR access** — do not expose secrets to pipelines triggered by forks or untrusted branches.
- **Rotate CI secrets** on the same schedule as application secrets; pipeline credentials are attack vectors too.
- **Audit pipeline logs** periodically for accidental secret exposure that masking may have missed.
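The "never echo" and "masking may have missed" bullets can be backed by a defensive redaction pass before any debug output leaves the pipeline. A minimal sketch — platform maskers typically catch the raw value but can miss encodings, so this also scrubs the base64 form:

```python
import base64

def redact(text: str, known_secrets) -> str:
    """Replace every known secret value (and its base64 encoding) with a mask."""
    for secret in known_secrets:
        for variant in (secret, base64.b64encode(secret.encode()).decode()):
            text = text.replace(variant, "***REDACTED***")
    return text

line = "deploying with token=hunter2 (b64: aHVudGVyMg==)"
print(redact(line, ["hunter2"]))
# -> deploying with token=***REDACTED*** (b64: ***REDACTED***)
```

Real leaks also hide in URL-encoded or split forms; treat this as a last line of defense, not a substitute for not printing secrets in the first place.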
---

## Pre-Commit Secret Detection

Catching secrets before they reach version control is the most cost-effective defense. Two leading tools cover this space.
### gitleaks

```toml
# .gitleaks.toml — minimal configuration
[extend]
useDefault = true

[[rules]]
id = "custom-internal-token"
description = "Internal service token pattern"
regex = '''INTERNAL_TOKEN_[A-Za-z0-9]{32}'''
secretGroup = 0
```

- Install: `brew install gitleaks` or download from GitHub releases.
- Pre-commit hook: `gitleaks git --pre-commit --staged`
- Baseline scanning: `gitleaks detect --source . --report-path gitleaks-report.json`
- Manage false positives in `.gitleaksignore` (one fingerprint per line).
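Before committing a custom rule like the one above, the regex is worth sanity-checking against known-good and known-bad samples (pattern copied from the TOML; the test strings are invented):

```python
import re

# Same pattern as the custom-internal-token rule in .gitleaks.toml above.
pattern = re.compile(r"INTERNAL_TOKEN_[A-Za-z0-9]{32}")

should_match = "INTERNAL_TOKEN_" + "a" * 32   # exactly 32 alphanumeric chars
should_not_match = "INTERNAL_TOKEN_short"     # too short to be a real token

print(bool(pattern.search(should_match)))      # True
print(bool(pattern.search(should_not_match)))  # False
```

An over-broad custom rule buries real findings in noise; an over-narrow one silently misses leaks, so test both directions.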
### detect-secrets

```bash
# Generate baseline
detect-secrets scan --all-files > .secrets.baseline

# Pre-commit hook (via pre-commit framework)
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']
```

- Supports **custom plugins** for organization-specific patterns.
- Audit workflow: `detect-secrets audit .secrets.baseline` interactively marks true/false positives.
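Beyond regex rules, detect-secrets ships entropy-based plugins that flag high-randomness strings. The core idea is Shannon entropy over the token's characters, roughly as below (the 3.5-bit threshold here is illustrative, not the plugin's actual default):

```python
import math
from collections import Counter

def shannon_entropy(token: str) -> float:
    """Bits of entropy per character — high values suggest machine-generated secrets."""
    counts = Counter(token)
    n = len(token)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(round(shannon_entropy("aaaaaaaa"), 2))        # 0.0: no randomness, not a secret
print(shannon_entropy("k9X2mQ7vL4pR8sT1") > 3.5)    # True: random-looking, worth flagging
```

Entropy catches secrets that no regex anticipated, at the cost of more false positives — which is exactly why the baseline/audit workflow above exists.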
### False Positive Management

- Maintain `.gitleaksignore` or `.secrets.baseline` in version control so the whole team shares exclusions.
- Review false positive lists during security audits — patterns may mask real leaks over time.
- Prefer tightening regex patterns over broadly ignoring files.
---

## Audit Logging

Knowing who accessed which secret and when is critical for incident investigation and compliance.
### Cloud-Native Audit Trails

| Provider | Service | What It Captures |
|----------|---------|------------------|
| **AWS** | CloudTrail | Every `GetSecretValue`, `DescribeSecret`, `RotateSecret` API call |
| **Azure** | Activity Log + Diagnostic Logs | Key Vault access events, including caller identity and IP |
| **GCP** | Cloud Audit Logs | Data access logs for Secret Manager with principal and timestamp |
| **Vault** | Audit Backend | Full request/response logging (file, syslog, or socket backend) |
### Alerting Strategy

- Alert on **access from unknown IP ranges** or service accounts outside the expected set.
- Alert on **bulk secret reads** (more than N secrets accessed within a time window).
- Alert on **access outside deployment windows** when no CI/CD pipeline is running.
- Feed audit logs into your SIEM (Splunk, Datadog, Elastic) for correlation with other security events.
- Review audit logs quarterly as part of access recertification.
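The bulk-read alert can be sketched as a sliding-window counter over access timestamps. The limit-3-reads-per-60-seconds threshold below is an arbitrary example; tune N and the window to your fleet's normal access rate.

```python
from collections import deque

class BulkReadDetector:
    """Alert when more than `limit` secret reads land inside `window` seconds."""
    def __init__(self, limit: int = 10, window: float = 60.0):
        self.limit, self.window = limit, window
        self.reads = deque()

    def record(self, timestamp: float) -> bool:
        """Record one read; return True if the threshold is now exceeded."""
        self.reads.append(timestamp)
        # Drop reads that have aged out of the window.
        while self.reads and timestamp - self.reads[0] > self.window:
            self.reads.popleft()
        return len(self.reads) > self.limit

detector = BulkReadDetector(limit=3, window=60.0)
alerts = [detector.record(t) for t in (0, 1, 2, 3)]
print(alerts)  # [False, False, False, True]: fourth read within 60s trips limit=3
```

In production this logic usually lives in the SIEM query layer rather than application code, but the windowing semantics are the same.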
---

## Cross-References

This skill covers env hygiene and secret detection. For deeper coverage of related domains, see:

| Skill | Path | Relationship |
|-------|------|--------------|
| **Secrets Vault Manager** | `engineering/secrets-vault-manager` | Production vault infrastructure, HA deployment, DR |
| **Senior SecOps** | `engineering/senior-secops` | Security operations perspective, incident response |
| **CI/CD Pipeline Builder** | `engineering/ci-cd-pipeline-builder` | Pipeline architecture, secret injection patterns |
| **Infrastructure as Code** | `engineering/infrastructure-as-code` | Terraform/Pulumi secret backend configuration |
| **Container Orchestration** | `engineering/container-orchestration` | Kubernetes secret mounting, sealed secrets |