feat: add 20 new practical skills (65→86 total)
20 production-ready skills for professional Claude Code users.

Engineering (12): git-worktree-manager, ci-cd-pipeline-builder, mcp-server-builder, changelog-generator, pr-review-expert, api-test-suite-builder, env-secrets-manager, database-schema-designer, codebase-onboarding, performance-profiler, runbook-generator, monorepo-navigator
Engineering Team (2): stripe-integration-expert, email-template-builder
Product (3): saas-scaffolder, landing-page-generator, competitive-teardown
Business (1): contract-and-proposal-writer
Marketing (1): prompt-engineer-toolkit
AI Engineering (1): agent-workflow-designer

Also: README updated (badges, counts, new section), STORE.md (Stan Store + Gumroad distribution plan)
438
engineering/agent-workflow-designer/SKILL.md
Normal file
@@ -0,0 +1,438 @@
# Agent Workflow Designer

**Tier:** POWERFUL
**Category:** Engineering
**Domain:** Multi-Agent Systems / AI Orchestration

---

## Overview

Design production-grade multi-agent orchestration systems. Covers five core patterns (sequential pipeline, parallel fan-out/fan-in, hierarchical delegation, event-driven, consensus), platform-specific implementations, handoff protocols, state management, error recovery, context window budgeting, and cost optimization.

---
## Core Capabilities

- Pattern selection guide for any orchestration requirement
- Handoff protocol templates (structured context passing)
- State management patterns for multi-agent workflows
- Error recovery and retry strategies
- Context window budget management
- Cost optimization strategies per platform
- Platform-specific configs: Claude Code Agent Teams, OpenClaw, CrewAI, AutoGen

---
## When to Use

- Building a multi-step AI pipeline that exceeds one agent's context capacity
- Parallelizing research, generation, or analysis tasks for speed
- Creating specialist agents with defined roles and handoff contracts
- Designing fault-tolerant AI workflows for production

---
## Pattern Selection Guide

```
Is the task sequential (each step needs previous output)?
  YES → Sequential Pipeline
  NO  → Can tasks run in parallel?
    YES → Parallel Fan-out/Fan-in
    NO  → Is there a hierarchy of decisions?
      YES → Hierarchical Delegation
      NO  → Is it event-triggered?
        YES → Event-Driven
        NO  → Need consensus/validation?
          YES → Consensus Pattern
```
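The tree can also be collapsed into a tiny routing helper. This is a sketch; the boolean flags are assumptions about how a task might be described, and the single-agent fallback is an addition for tasks that match no branch:

```python
def select_pattern(sequential=False, parallelizable=False,
                   hierarchical=False, event_triggered=False,
                   needs_consensus=False) -> str:
    """Walk the decision tree top to bottom, first match wins."""
    if sequential:
        return "Sequential Pipeline"
    if parallelizable:
        return "Parallel Fan-out/Fan-in"
    if hierarchical:
        return "Hierarchical Delegation"
    if event_triggered:
        return "Event-Driven"
    if needs_consensus:
        return "Consensus Pattern"
    # Not in the tree: nothing matched, so orchestration is overkill
    return "Single agent (no orchestration needed)"

print(select_pattern(parallelizable=True))  # Parallel Fan-out/Fan-in
```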
---
## Pattern 1: Sequential Pipeline

**Use when:** Each step depends on the previous output. Research → Draft → Review → Polish.

```python
# sequential_pipeline.py
from dataclasses import dataclass

import anthropic


@dataclass
class PipelineStage:
    name: str
    system_prompt: str
    input_key: str    # what to take from state
    output_key: str   # what to write to state
    model: str = "claude-3-5-sonnet-20241022"
    max_tokens: int = 2048


class SequentialPipeline:
    def __init__(self, stages: list[PipelineStage]):
        self.stages = stages
        self.client = anthropic.Anthropic()

    def run(self, initial_input: str) -> dict:
        state = {"input": initial_input}

        for stage in self.stages:
            print(f"[{stage.name}] Processing...")

            stage_input = state.get(stage.input_key, "")

            response = self.client.messages.create(
                model=stage.model,
                max_tokens=stage.max_tokens,
                system=stage.system_prompt,
                messages=[{"role": "user", "content": stage_input}],
            )

            state[stage.output_key] = response.content[0].text
            state[f"{stage.name}_tokens"] = response.usage.input_tokens + response.usage.output_tokens

            print(f"[{stage.name}] Done. Tokens: {state[f'{stage.name}_tokens']}")

        return state


# Example: Blog post pipeline
pipeline = SequentialPipeline([
    PipelineStage(
        name="researcher",
        system_prompt="You are a research specialist. Given a topic, produce a structured research brief with: key facts, statistics, expert perspectives, and controversy points.",
        input_key="input",
        output_key="research",
    ),
    PipelineStage(
        name="writer",
        system_prompt="You are a senior content writer. Using the research provided, write a compelling 800-word blog post with a clear hook, 3 main sections, and a strong CTA.",
        input_key="research",
        output_key="draft",
    ),
    PipelineStage(
        name="editor",
        system_prompt="You are a copy editor. Review the draft for: clarity, flow, grammar, and SEO. Return the improved version only, no commentary.",
        input_key="draft",
        output_key="final",
    ),
])
```

---
## Pattern 2: Parallel Fan-out / Fan-in

**Use when:** Independent tasks that can run concurrently. Research 5 competitors simultaneously.

```python
# parallel_fanout.py
import asyncio

import anthropic


async def run_agent(client, task_name: str, system: str, user: str, model: str = "claude-3-5-sonnet-20241022") -> dict:
    """Single async agent call (runs the blocking SDK call in a thread)."""
    loop = asyncio.get_event_loop()

    def _call():
        return client.messages.create(
            model=model,
            max_tokens=2048,
            system=system,
            messages=[{"role": "user", "content": user}],
        )

    response = await loop.run_in_executor(None, _call)
    return {
        "task": task_name,
        "output": response.content[0].text,
        "tokens": response.usage.input_tokens + response.usage.output_tokens,
    }


async def parallel_research(competitors: list[str], research_type: str) -> dict:
    """Fan-out: research all competitors in parallel. Fan-in: synthesize results."""
    client = anthropic.Anthropic()

    # FAN-OUT: spawn parallel agent calls
    tasks = [
        run_agent(
            client,
            task_name=competitor,
            system=f"You are a competitive intelligence analyst. Research {competitor} and provide: pricing, key features, target market, and known weaknesses.",
            user=f"Analyze {competitor} for comparison with our product in the {research_type} market.",
        )
        for competitor in competitors
    ]

    results = await asyncio.gather(*tasks, return_exceptions=True)

    # Handle failures gracefully
    successful = [r for r in results if not isinstance(r, Exception)]
    failed = [r for r in results if isinstance(r, Exception)]

    if failed:
        print(f"Warning: {len(failed)} research tasks failed: {failed}")

    # FAN-IN: synthesize
    combined_research = "\n\n".join([
        f"## {r['task']}\n{r['output']}" for r in successful
    ])

    synthesis = await run_agent(
        client,
        task_name="synthesizer",
        system="You are a strategic analyst. Synthesize competitor research into a concise comparison matrix and strategic recommendations.",
        user=f"Synthesize these competitor analyses:\n\n{combined_research}",
        model="claude-3-5-sonnet-20241022",
    )

    return {
        "individual_analyses": successful,
        "synthesis": synthesis["output"],
        "total_tokens": sum(r["tokens"] for r in successful) + synthesis["tokens"],
    }
```
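The fan-out/fan-in shape can be dry-run with no API calls, which is handy for testing the orchestration logic (including the `return_exceptions=True` failure handling) before spending tokens. The stub agent below is a stand-in, not part of the Anthropic SDK:

```python
import asyncio


async def stub_agent(task_name: str) -> dict:
    """Stand-in for run_agent(): simulates latency and one failing task."""
    await asyncio.sleep(0.01)
    if task_name == "flaky":
        raise RuntimeError("simulated failure")
    return {"task": task_name, "output": f"analysis of {task_name}", "tokens": 100}


async def fan_out_fan_in(names: list[str]) -> dict:
    # Fan-out: all stubs run concurrently; exceptions are collected, not raised
    results = await asyncio.gather(*(stub_agent(n) for n in names), return_exceptions=True)
    successful = [r for r in results if not isinstance(r, Exception)]
    return {
        "ok": [r["task"] for r in successful],
        "failed": sum(isinstance(r, Exception) for r in results),
        "total_tokens": sum(r["tokens"] for r in successful),
    }


summary = asyncio.run(fan_out_fan_in(["acme", "flaky", "globex"]))
print(summary)  # {'ok': ['acme', 'globex'], 'failed': 1, 'total_tokens': 200}
```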
---
## Pattern 3: Hierarchical Delegation

**Use when:** Complex tasks with subtask discovery. Orchestrator breaks down work, delegates to specialists.

```python
# hierarchical_delegation.py
import json

import anthropic

ORCHESTRATOR_SYSTEM = """You are an orchestration agent. Your job is to:
1. Analyze the user's request
2. Break it into subtasks
3. Assign each to the appropriate specialist agent
4. Collect results and synthesize

Available specialists:
- researcher: finds facts, data, and information
- writer: creates content and documents
- coder: writes and reviews code
- analyst: analyzes data and produces insights

Respond with a JSON plan:
{
  "subtasks": [
    {"id": "1", "agent": "researcher", "task": "...", "depends_on": []},
    {"id": "2", "agent": "writer", "task": "...", "depends_on": ["1"]}
  ]
}"""

SPECIALIST_SYSTEMS = {
    "researcher": "You are a research specialist. Find accurate, relevant information and cite sources when possible.",
    "writer": "You are a professional writer. Create clear, engaging content in the requested format.",
    "coder": "You are a senior software engineer. Write clean, well-commented code with error handling.",
    "analyst": "You are a data analyst. Provide structured analysis with evidence-backed conclusions.",
}


class HierarchicalOrchestrator:
    def __init__(self):
        self.client = anthropic.Anthropic()

    def run(self, user_request: str) -> str:
        # 1. Orchestrator creates plan
        plan_response = self.client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            system=ORCHESTRATOR_SYSTEM,
            messages=[{"role": "user", "content": user_request}],
        )

        plan = json.loads(plan_response.content[0].text)
        results = {}

        # 2. Execute subtasks respecting dependencies
        for subtask in self._topological_sort(plan["subtasks"]):
            context = self._build_context(subtask, results)
            specialist = SPECIALIST_SYSTEMS[subtask["agent"]]

            result = self.client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=2048,
                system=specialist,
                messages=[{"role": "user", "content": f"{context}\n\nTask: {subtask['task']}"}],
            )
            results[subtask["id"]] = result.content[0].text

        # 3. Final synthesis
        all_results = "\n\n".join([f"### {k}\n{v}" for k, v in results.items()])
        synthesis = self.client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=2048,
            system="Synthesize the specialist outputs into a coherent final response.",
            messages=[{"role": "user", "content": f"Original request: {user_request}\n\nSpecialist outputs:\n{all_results}"}],
        )
        return synthesis.content[0].text

    def _build_context(self, subtask: dict, results: dict) -> str:
        if not subtask.get("depends_on"):
            return ""
        deps = [f"Output from task {dep}:\n{results[dep]}" for dep in subtask["depends_on"] if dep in results]
        return "Previous results:\n" + "\n\n".join(deps) if deps else ""

    def _topological_sort(self, subtasks: list) -> list:
        # Simple ordered execution respecting depends_on
        ordered, remaining = [], list(subtasks)
        completed = set()
        while remaining:
            for task in remaining:
                if all(dep in completed for dep in task.get("depends_on", [])):
                    ordered.append(task)
                    completed.add(task["id"])
                    remaining.remove(task)
                    break
            else:
                # No task was ready: the plan contains a cycle
                raise ValueError("Circular dependency in plan; depends_on must form a DAG")
        return ordered
```
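The dependency ordering is worth sanity-checking in isolation. Below is a minimal standalone copy of the sort logic (with an explicit cycle check), so it runs without the Anthropic client; the sample plan is hypothetical:

```python
def order_subtasks(subtasks: list[dict]) -> list[str]:
    """Return subtask ids in an order that respects depends_on; raise on cycles."""
    ordered, remaining, completed = [], list(subtasks), set()
    while remaining:
        # Pick the first task whose dependencies have all completed
        ready = next((t for t in remaining
                      if all(d in completed for d in t.get("depends_on", []))), None)
        if ready is None:
            raise ValueError("circular dependency")
        ordered.append(ready["id"])
        completed.add(ready["id"])
        remaining.remove(ready)
    return ordered


plan = [
    {"id": "3", "depends_on": ["1", "2"]},
    {"id": "1", "depends_on": []},
    {"id": "2", "depends_on": ["1"]},
]
print(order_subtasks(plan))  # ['1', '2', '3']
```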
---
## Handoff Protocol Template

```python
# Standard handoff context format — use between all agents
from dataclasses import dataclass


@dataclass
class AgentHandoff:
    """Structured context passed between agents in a workflow."""
    task_id: str
    workflow_id: str
    step_number: int
    total_steps: int

    # What was done
    previous_agent: str
    previous_output: str
    artifacts: dict  # {"filename": "content"} for any files produced

    # What to do next
    current_agent: str
    current_task: str
    constraints: list[str]  # hard rules for this step

    # Metadata
    context_budget_remaining: int  # tokens left for this agent
    cost_so_far_usd: float

    def to_prompt(self) -> str:
        return f"""
# Agent Handoff — Step {self.step_number}/{self.total_steps}

## Your Task
{self.current_task}

## Constraints
{chr(10).join(f'- {c}' for c in self.constraints)}

## Context from Previous Step ({self.previous_agent})
{self.previous_output[:2000]}{"... [truncated]" if len(self.previous_output) > 2000 else ""}

## Context Budget
You have approximately {self.context_budget_remaining} tokens remaining. Be concise.
"""
```
---
## Error Recovery Patterns

```python
import time
from functools import wraps


def with_retry(max_attempts=3, backoff_seconds=2, fallback_model=None):
    """Decorator for agent calls with exponential backoff and model fallback."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception as e:
                    last_error = e
                    if attempt < max_attempts - 1:
                        wait = backoff_seconds * (2 ** attempt)
                        print(f"Attempt {attempt+1} failed: {e}. Retrying in {wait}s...")
                        time.sleep(wait)
                        # Fall back to cheaper/faster model on rate limit
                        if fallback_model and "rate_limit" in str(e).lower():
                            kwargs["model"] = fallback_model
            raise last_error
        return wrapper
    return decorator


@with_retry(max_attempts=3, fallback_model="claude-3-haiku-20240307")
def call_agent(model, system, user):
    ...
```
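For the defaults above (max_attempts=3, backoff_seconds=2), the exponential backoff schedule works out as follows:

```python
max_attempts, backoff_seconds = 3, 2
# One wait between each pair of attempts; no wait after the final failure
waits = [backoff_seconds * (2 ** attempt) for attempt in range(max_attempts - 1)]
print(waits)  # [2, 4]
```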
---
## Context Window Budgeting

```python
# Budget context across a multi-step pipeline
# Rule: never let any step consume more than 60% of remaining budget

CONTEXT_LIMITS = {
    "claude-3-5-sonnet-20241022": 200_000,
    "gpt-4o": 128_000,
}


class ContextBudget:
    def __init__(self, model: str, reserve_pct: float = 0.2):
        total = CONTEXT_LIMITS.get(model, 128_000)
        self.total = total
        self.reserve = int(total * reserve_pct)  # keep 20% as buffer
        self.used = 0

    @property
    def remaining(self):
        return self.total - self.reserve - self.used

    def allocate(self, step_name: str, requested: int) -> int:
        allocated = min(requested, int(self.remaining * 0.6))  # max 60% of remaining
        print(f"[Budget] {step_name}: allocated {allocated:,} tokens (remaining: {self.remaining:,})")
        return allocated

    def consume(self, tokens_used: int):
        self.used += tokens_used


def truncate_to_budget(text: str, token_budget: int, chars_per_token: float = 4.0) -> str:
    """Rough truncation — use tiktoken for precision."""
    char_budget = int(token_budget * chars_per_token)
    if len(text) <= char_budget:
        return text
    return text[:char_budget] + "\n\n[... truncated to fit context budget ...]"
```
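Worked numbers for the rules above, assuming Sonnet's 200k window and a first step that requests 120k tokens:

```python
total = 200_000
reserve = int(total * 0.2)                   # 40,000 held back as buffer
remaining = total - reserve                  # 160,000 spendable
step1 = min(120_000, int(remaining * 0.6))   # 60% cap wins: 96,000, not 120,000
remaining -= step1                           # 64,000 left for later steps
print(step1, remaining)  # 96000 64000
```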
---
## Cost Optimization Strategies

| Strategy | Savings | Tradeoff |
|---|---|---|
| Use Haiku for routing/classification | 85-90% | Slightly less nuanced judgment |
| Cache repeated system prompts | 50-90% | Requires prompt caching setup |
| Truncate intermediate outputs | 20-40% | May lose detail in handoffs |
| Batch similar tasks | 50% | Latency increases |
| Use Sonnet for most, Opus for final step only | 60-70% | Final quality may improve |
| Short-circuit on confidence threshold | 30-50% | Need confidence scoring |

---
## Common Pitfalls

- **Circular dependencies** — agents calling each other in loops; enforce DAG structure at design time
- **Context bleed** — passing entire previous output to every step; summarize or extract only what's needed
- **No timeout** — a stuck agent blocks the whole pipeline; always set max_tokens and wall-clock timeouts
- **Silent failures** — agent returns plausible but wrong output; add validation steps for critical paths
- **Ignoring cost** — 10 parallel Opus calls is $0.50 per workflow; model selection is a cost decision
- **Over-orchestration** — if a single prompt can do it, it should; only add agents when genuinely needed
676
engineering/api-test-suite-builder/SKILL.md
Normal file
@@ -0,0 +1,676 @@
# API Test Suite Builder

**Tier:** POWERFUL
**Category:** Engineering
**Domain:** Testing / API Quality

---

## Overview

Scans API route definitions across frameworks (Next.js App Router, Express, FastAPI, Django REST) and auto-generates comprehensive test suites covering auth, input validation, error codes, pagination, file uploads, and rate limiting. Outputs ready-to-run test files for Vitest+Supertest (Node) or Pytest+httpx (Python).

---
## Core Capabilities

- **Route detection** — scan source files to extract all API endpoints
- **Auth coverage** — valid/invalid/expired tokens, missing auth header
- **Input validation** — missing fields, wrong types, boundary values, injection attempts
- **Error code matrix** — 400/401/403/404/422/500 for each route
- **Pagination** — first/last/empty/oversized pages
- **File uploads** — valid, oversized, wrong MIME type, empty
- **Rate limiting** — burst detection, per-user vs global limits

---
## When to Use

- New API added — generate test scaffold before writing implementation (TDD)
- Legacy API with no tests — scan and generate baseline coverage
- API contract review — verify existing tests match current route definitions
- Pre-release regression check — ensure all routes have at least smoke tests
- Security audit prep — generate adversarial input tests

---
## Route Detection

### Next.js App Router
```bash
# Find all route handlers
find ./app/api -name "route.ts" -o -name "route.js" | sort

# Extract HTTP methods from each route file
grep -rn "export async function\|export function" app/api/**/route.ts | \
  grep -oE "(GET|POST|PUT|PATCH|DELETE|HEAD|OPTIONS)" | sort -u

# Full route map
find ./app/api -name "route.ts" | while read f; do
  route=$(echo $f | sed 's|./app||' | sed 's|/route.ts||')
  methods=$(grep -oE "export (async )?function (GET|POST|PUT|PATCH|DELETE)" "$f" | \
    grep -oE "(GET|POST|PUT|PATCH|DELETE)")
  echo "$methods $route"
done
```

### Express
```bash
# Find all router files
find ./src -name "*.ts" -o -name "*.js" | xargs grep -l "router\.\(get\|post\|put\|delete\|patch\)" 2>/dev/null

# Extract routes with line numbers
grep -rn "router\.\(get\|post\|put\|delete\|patch\)\|app\.\(get\|post\|put\|delete\|patch\)" \
  src/ --include="*.ts" | grep -oE "(get|post|put|delete|patch)\(['\"][^'\"]*['\"]"

# Generate route map
grep -rn "router\.\|app\." src/ --include="*.ts" | \
  grep -oE "\.(get|post|put|delete|patch)\(['\"][^'\"]+['\"]" | \
  sed "s/\.\(.*\)('\(.*\)'/\U\1 \2/"
```

### FastAPI
```bash
# Find all route decorators
grep -rn "@app\.\|@router\." . --include="*.py" | \
  grep -E "@(app|router)\.(get|post|put|delete|patch)"

# Extract with path and function name
grep -rn "@\(app\|router\)\.\(get\|post\|put\|delete\|patch\)" . --include="*.py" | \
  grep -oE "@(app|router)\.(get|post|put|delete|patch)\(['\"][^'\"]*['\"]"
```

### Django REST Framework
```bash
# urlpatterns extraction
grep -rn "path\|re_path\|url(" . --include="*.py" | grep "urlpatterns" -A 50 | \
  grep -E "path\(['\"]" | grep -oE "['\"][^'\"]+['\"]" | head -40

# ViewSet router registration
grep -rn "router\.register\|DefaultRouter\|SimpleRouter" . --include="*.py"
```
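When the shell one-liners get unwieldy, the same extraction is straightforward in Python. This sketch covers only the FastAPI decorator style shown above, and the sample source is hypothetical:

```python
import re

# Matches @app.get("/path") and @router.post('/path') style decorators
ROUTE_RE = re.compile(r"@(?:app|router)\.(get|post|put|delete|patch)\(\s*['\"]([^'\"]+)['\"]")


def extract_fastapi_routes(source: str) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs found in FastAPI source text."""
    return [(m.group(1).upper(), m.group(2)) for m in ROUTE_RE.finditer(source)]


sample = '''
@app.get("/items")
async def list_items(): ...

@router.post("/items/{item_id}/publish")
async def publish(item_id: str): ...
'''
print(extract_fastapi_routes(sample))
# [('GET', '/items'), ('POST', '/items/{item_id}/publish')]
```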
---
## Test Generation Patterns

### Auth Test Matrix

For every authenticated endpoint, generate:

| Test Case | Expected Status |
|-----------|----------------|
| No Authorization header | 401 |
| Invalid token format | 401 |
| Valid token, wrong user role | 403 |
| Expired JWT token | 401 |
| Valid token, correct role | 2xx |
| Token from deleted user | 401 |

### Input Validation Matrix

For every POST/PUT/PATCH endpoint with a request body:

| Test Case | Expected Status |
|-----------|----------------|
| Empty body `{}` | 400 or 422 |
| Missing required fields (one at a time) | 400 or 422 |
| Wrong type (string where int expected) | 400 or 422 |
| Boundary: value at min-1 | 400 or 422 |
| Boundary: value at min | 2xx |
| Boundary: value at max | 2xx |
| Boundary: value at max+1 | 400 or 422 |
| SQL injection in string field | 400 or 200 (sanitized) |
| XSS payload in string field | 400 or 200 (sanitized) |
| Null values for required fields | 400 or 422 |
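The boundary rows above follow a mechanical pattern, so they can be generated rather than hand-written. This is a sketch; the field name and limits are hypothetical:

```python
def boundary_cases(field: str, lo: int, hi: int) -> list[tuple[str, int, bool]]:
    """(description, value, should_pass) triples for a numeric field's limits."""
    return [
        (f"{field} at min-1", lo - 1, False),
        (f"{field} at min",   lo,     True),
        (f"{field} at max",   hi,     True),
        (f"{field} at max+1", hi + 1, False),
    ]


cases = boundary_cases("quantity", 1, 100)
print(cases[0])  # ('quantity at min-1', 0, False)
```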
---
## Example Test Files

### Example 1 — Node.js: Vitest + Supertest (Next.js API Route)

```typescript
// tests/api/users.test.ts
import { describe, it, expect, beforeAll, afterAll } from 'vitest'
import request from 'supertest'
import { createServer } from '@/test/helpers/server'
import { generateJWT, generateExpiredJWT } from '@/test/helpers/auth'
import { createTestUser, cleanupTestUsers } from '@/test/helpers/db'

const app = createServer()

describe('GET /api/users/:id', () => {
  let validToken: string
  let adminToken: string
  let testUserId: string

  beforeAll(async () => {
    const user = await createTestUser({ role: 'user' })
    const admin = await createTestUser({ role: 'admin' })
    testUserId = user.id
    validToken = generateJWT(user)
    adminToken = generateJWT(admin)
  })

  afterAll(async () => {
    await cleanupTestUsers()
  })

  // --- Auth tests ---
  it('returns 401 with no auth header', async () => {
    const res = await request(app).get(`/api/users/${testUserId}`)
    expect(res.status).toBe(401)
    expect(res.body).toHaveProperty('error')
  })

  it('returns 401 with malformed token', async () => {
    const res = await request(app)
      .get(`/api/users/${testUserId}`)
      .set('Authorization', 'Bearer not-a-real-jwt')
    expect(res.status).toBe(401)
  })

  it('returns 401 with expired token', async () => {
    const expiredToken = generateExpiredJWT({ id: testUserId })
    const res = await request(app)
      .get(`/api/users/${testUserId}`)
      .set('Authorization', `Bearer ${expiredToken}`)
    expect(res.status).toBe(401)
    expect(res.body.error).toMatch(/expired/i)
  })

  it('returns 403 when accessing another user\'s profile without admin', async () => {
    const otherUser = await createTestUser({ role: 'user' })
    const otherToken = generateJWT(otherUser)
    const res = await request(app)
      .get(`/api/users/${testUserId}`)
      .set('Authorization', `Bearer ${otherToken}`)
    expect(res.status).toBe(403)
    await cleanupTestUsers([otherUser.id])
  })

  it('returns 200 with valid token for own profile', async () => {
    const res = await request(app)
      .get(`/api/users/${testUserId}`)
      .set('Authorization', `Bearer ${validToken}`)
    expect(res.status).toBe(200)
    expect(res.body).toMatchObject({ id: testUserId })
    expect(res.body).not.toHaveProperty('password')
    expect(res.body).not.toHaveProperty('hashedPassword')
  })

  it('returns 404 for non-existent user', async () => {
    const res = await request(app)
      .get('/api/users/00000000-0000-0000-0000-000000000000')
      .set('Authorization', `Bearer ${adminToken}`)
    expect(res.status).toBe(404)
  })

  // --- Input validation ---
  it('returns 400 for invalid UUID format', async () => {
    const res = await request(app)
      .get('/api/users/not-a-uuid')
      .set('Authorization', `Bearer ${adminToken}`)
    expect(res.status).toBe(400)
  })
})

describe('POST /api/users', () => {
  let adminToken: string

  beforeAll(async () => {
    const admin = await createTestUser({ role: 'admin' })
    adminToken = generateJWT(admin)
  })

  afterAll(cleanupTestUsers)

  // --- Input validation ---
  it('returns 422 when body is empty', async () => {
    const res = await request(app)
      .post('/api/users')
      .set('Authorization', `Bearer ${adminToken}`)
      .send({})
    expect(res.status).toBe(422)
    expect(res.body.errors).toBeDefined()
  })

  it('returns 422 when email is missing', async () => {
    const res = await request(app)
      .post('/api/users')
      .set('Authorization', `Bearer ${adminToken}`)
      .send({ name: 'Test User', role: 'user' })
    expect(res.status).toBe(422)
    expect(res.body.errors).toContainEqual(
      expect.objectContaining({ field: 'email' })
    )
  })

  it('returns 422 for invalid email format', async () => {
    const res = await request(app)
      .post('/api/users')
      .set('Authorization', `Bearer ${adminToken}`)
      .send({ email: 'not-an-email', name: 'Test', role: 'user' })
    expect(res.status).toBe(422)
  })

  it('returns 422 for SQL injection attempt in email field', async () => {
    const res = await request(app)
      .post('/api/users')
      .set('Authorization', `Bearer ${adminToken}`)
      .send({ email: "' OR '1'='1", name: 'Hacker', role: 'user' })
    expect(res.status).toBe(422)
  })

  it('returns 409 when email already exists', async () => {
    const existing = await createTestUser({ role: 'user' })
    const res = await request(app)
      .post('/api/users')
      .set('Authorization', `Bearer ${adminToken}`)
      .send({ email: existing.email, name: 'Duplicate', role: 'user' })
    expect(res.status).toBe(409)
  })

  it('creates user successfully with valid data', async () => {
    const res = await request(app)
      .post('/api/users')
      .set('Authorization', `Bearer ${adminToken}`)
      .send({ email: 'newuser@example.com', name: 'New User', role: 'user' })
    expect(res.status).toBe(201)
    expect(res.body).toHaveProperty('id')
    expect(res.body.email).toBe('newuser@example.com')
    expect(res.body).not.toHaveProperty('password')
  })
})

describe('GET /api/users (pagination)', () => {
  let adminToken: string

  beforeAll(async () => {
    const admin = await createTestUser({ role: 'admin' })
    adminToken = generateJWT(admin)
    // Create 15 test users for pagination
    await Promise.all(Array.from({ length: 15 }, (_, i) =>
      createTestUser({ email: `pagtest${i}@example.com` })
    ))
  })

  afterAll(cleanupTestUsers)

  it('returns first page with default limit', async () => {
    const res = await request(app)
      .get('/api/users')
      .set('Authorization', `Bearer ${adminToken}`)
    expect(res.status).toBe(200)
    expect(res.body.data).toBeInstanceOf(Array)
    expect(res.body).toHaveProperty('total')
    expect(res.body).toHaveProperty('page')
    expect(res.body).toHaveProperty('pageSize')
  })

  it('returns empty array for page beyond total', async () => {
    const res = await request(app)
      .get('/api/users?page=9999')
      .set('Authorization', `Bearer ${adminToken}`)
    expect(res.status).toBe(200)
    expect(res.body.data).toHaveLength(0)
  })

  it('returns 400 for negative page number', async () => {
    const res = await request(app)
      .get('/api/users?page=-1')
      .set('Authorization', `Bearer ${adminToken}`)
    expect(res.status).toBe(400)
  })

  it('caps pageSize at maximum allowed value', async () => {
    const res = await request(app)
      .get('/api/users?pageSize=9999')
      .set('Authorization', `Bearer ${adminToken}`)
    expect(res.status).toBe(200)
    expect(res.body.data.length).toBeLessThanOrEqual(100)
  })
})
```
---
### Example 2 — Node.js: File Upload Tests

```typescript
// tests/api/uploads.test.ts
import { describe, it, expect, beforeAll } from 'vitest'
import request from 'supertest'
import { createServer } from '@/test/helpers/server'
import { generateJWT } from '@/test/helpers/auth'
import { createTestUser } from '@/test/helpers/db'

const app = createServer()

describe('POST /api/upload', () => {
  let validToken: string

  beforeAll(async () => {
    const user = await createTestUser({ role: 'user' })
    validToken = generateJWT(user)
  })

  it('returns 401 without authentication', async () => {
    const res = await request(app)
      .post('/api/upload')
      .attach('file', Buffer.from('test'), 'test.pdf')
    expect(res.status).toBe(401)
  })

  it('returns 400 when no file attached', async () => {
    const res = await request(app)
      .post('/api/upload')
      .set('Authorization', `Bearer ${validToken}`)
    expect(res.status).toBe(400)
    expect(res.body.error).toMatch(/file/i)
  })

  it('returns 400 for unsupported file type (exe)', async () => {
    const res = await request(app)
      .post('/api/upload')
      .set('Authorization', `Bearer ${validToken}`)
      .attach('file', Buffer.from('MZ fake exe'), { filename: 'virus.exe', contentType: 'application/octet-stream' })
    expect(res.status).toBe(400)
    expect(res.body.error).toMatch(/type|format|allowed/i)
  })

  it('returns 413 for oversized file (>10MB)', async () => {
    const largeBuf = Buffer.alloc(11 * 1024 * 1024) // 11MB
    const res = await request(app)
      .post('/api/upload')
      .set('Authorization', `Bearer ${validToken}`)
      .attach('file', largeBuf, { filename: 'large.pdf', contentType: 'application/pdf' })
    expect(res.status).toBe(413)
  })

  it('returns 400 for empty file (0 bytes)', async () => {
    const res = await request(app)
      .post('/api/upload')
      .set('Authorization', `Bearer ${validToken}`)
      .attach('file', Buffer.alloc(0), { filename: 'empty.pdf', contentType: 'application/pdf' })
    expect(res.status).toBe(400)
  })

  it('rejects MIME type spoofing (pdf extension but exe content)', async () => {
    // Real malicious file: exe magic bytes but pdf extension
    const fakeExe = Buffer.from('4D5A9000', 'hex') // MZ header
    const res = await request(app)
      .post('/api/upload')
      .set('Authorization', `Bearer ${validToken}`)
      .attach('file', fakeExe, { filename: 'document.pdf', contentType: 'application/pdf' })
    // Should detect magic bytes mismatch
    expect([400, 415]).toContain(res.status)
  })

  it('accepts valid PDF file', async () => {
    const pdfHeader = Buffer.from('%PDF-1.4 test content')
    const res = await request(app)
      .post('/api/upload')
      .set('Authorization', `Bearer ${validToken}`)
      .attach('file', pdfHeader, { filename: 'valid.pdf', contentType: 'application/pdf' })
    expect(res.status).toBe(200)
    expect(res.body).toHaveProperty('url')
    expect(res.body).toHaveProperty('id')
  })
})
```
|
||||
|
||||
---
|
||||
|
||||
### Example 3 — Python: Pytest + httpx (FastAPI)

```python
# tests/api/test_items.py
import pytest
import httpx
from datetime import datetime, timedelta
import jwt

BASE_URL = "http://localhost:8000"
JWT_SECRET = "test-secret"  # use test config, never production secret


def make_token(user_id: str, role: str = "user", expired: bool = False) -> str:
    exp = datetime.utcnow() + (timedelta(hours=-1) if expired else timedelta(hours=1))
    return jwt.encode(
        {"sub": user_id, "role": role, "exp": exp},
        JWT_SECRET,
        algorithm="HS256",
    )


@pytest.fixture
def client():
    with httpx.Client(base_url=BASE_URL) as c:
        yield c


@pytest.fixture
def valid_token():
    return make_token("user-123", role="user")


@pytest.fixture
def admin_token():
    return make_token("admin-456", role="admin")


@pytest.fixture
def expired_token():
    return make_token("user-123", expired=True)


class TestGetItem:
    def test_returns_401_without_auth(self, client):
        res = client.get("/api/items/1")
        assert res.status_code == 401

    def test_returns_401_with_invalid_token(self, client):
        res = client.get("/api/items/1", headers={"Authorization": "Bearer garbage"})
        assert res.status_code == 401

    def test_returns_401_with_expired_token(self, client, expired_token):
        res = client.get("/api/items/1", headers={"Authorization": f"Bearer {expired_token}"})
        assert res.status_code == 401
        assert "expired" in res.json().get("detail", "").lower()

    def test_returns_404_for_nonexistent_item(self, client, valid_token):
        res = client.get(
            "/api/items/99999999",
            headers={"Authorization": f"Bearer {valid_token}"},
        )
        assert res.status_code == 404

    def test_returns_400_for_invalid_id_format(self, client, valid_token):
        res = client.get(
            "/api/items/not-a-number",
            headers={"Authorization": f"Bearer {valid_token}"},
        )
        assert res.status_code in (400, 422)

    def test_returns_200_with_valid_auth(self, client, valid_token, test_item):
        res = client.get(
            f"/api/items/{test_item['id']}",
            headers={"Authorization": f"Bearer {valid_token}"},
        )
        assert res.status_code == 200
        data = res.json()
        assert data["id"] == test_item["id"]
        assert "password" not in data


class TestCreateItem:
    def test_returns_422_with_empty_body(self, client, admin_token):
        res = client.post(
            "/api/items",
            json={},
            headers={"Authorization": f"Bearer {admin_token}"},
        )
        assert res.status_code == 422
        errors = res.json()["detail"]
        assert len(errors) > 0

    def test_returns_422_with_missing_required_field(self, client, admin_token):
        res = client.post(
            "/api/items",
            json={"description": "no name field"},
            headers={"Authorization": f"Bearer {admin_token}"},
        )
        assert res.status_code == 422
        fields = [e["loc"][-1] for e in res.json()["detail"]]
        assert "name" in fields

    def test_returns_422_with_wrong_type(self, client, admin_token):
        res = client.post(
            "/api/items",
            json={"name": "test", "price": "not-a-number"},
            headers={"Authorization": f"Bearer {admin_token}"},
        )
        assert res.status_code == 422

    @pytest.mark.parametrize("price", [-1, -0.01])
    def test_returns_422_for_negative_price(self, client, admin_token, price):
        res = client.post(
            "/api/items",
            json={"name": "test", "price": price},
            headers={"Authorization": f"Bearer {admin_token}"},
        )
        assert res.status_code == 422

    def test_returns_422_for_price_exceeding_max(self, client, admin_token):
        res = client.post(
            "/api/items",
            json={"name": "test", "price": 1_000_001},
            headers={"Authorization": f"Bearer {admin_token}"},
        )
        assert res.status_code == 422

    def test_creates_item_successfully(self, client, admin_token):
        res = client.post(
            "/api/items",
            json={"name": "New Widget", "price": 9.99, "category": "tools"},
            headers={"Authorization": f"Bearer {admin_token}"},
        )
        assert res.status_code == 201
        data = res.json()
        assert "id" in data
        assert data["name"] == "New Widget"

    def test_returns_403_for_non_admin(self, client, valid_token):
        res = client.post(
            "/api/items",
            json={"name": "test", "price": 1.0},
            headers={"Authorization": f"Bearer {valid_token}"},
        )
        assert res.status_code == 403


class TestPagination:
    def test_returns_paginated_response(self, client, valid_token):
        res = client.get(
            "/api/items?page=1&size=10",
            headers={"Authorization": f"Bearer {valid_token}"},
        )
        assert res.status_code == 200
        data = res.json()
        assert "items" in data
        assert "total" in data
        assert "page" in data
        assert len(data["items"]) <= 10

    def test_empty_result_for_out_of_range_page(self, client, valid_token):
        res = client.get(
            "/api/items?page=99999",
            headers={"Authorization": f"Bearer {valid_token}"},
        )
        assert res.status_code == 200
        assert res.json()["items"] == []

    def test_returns_422_for_page_zero(self, client, valid_token):
        res = client.get(
            "/api/items?page=0",
            headers={"Authorization": f"Bearer {valid_token}"},
        )
        assert res.status_code == 422

    def test_caps_page_size_at_maximum(self, client, valid_token):
        res = client.get(
            "/api/items?size=9999",
            headers={"Authorization": f"Bearer {valid_token}"},
        )
        assert res.status_code == 200
        assert len(res.json()["items"]) <= 100  # max page size


class TestRateLimiting:
    def test_rate_limit_after_burst(self, client, valid_token):
        responses = []
        for _ in range(60):  # exceed typical 50/min limit
            res = client.get(
                "/api/items",
                headers={"Authorization": f"Bearer {valid_token}"},
            )
            responses.append(res.status_code)
            if res.status_code == 429:
                break
        assert 429 in responses, "Rate limit was not triggered"

    def test_rate_limit_response_has_retry_after(self, client, valid_token):
        for _ in range(60):
            res = client.get("/api/items", headers={"Authorization": f"Bearer {valid_token}"})
            if res.status_code == 429:
                assert "Retry-After" in res.headers or "retry_after" in res.json()
                break
```

---

## Generating Tests from Route Scan

When given a codebase, follow this process:

1. **Scan routes** using the detection commands above
2. **Read each route handler** to understand:
   - Expected request body schema
   - Auth requirements (middleware, decorators)
   - Return types and status codes
   - Business rules (ownership, role checks)
3. **Generate test file** per route group using the patterns above
4. **Name tests descriptively**: `"returns 401 when token is expired"` not `"auth test 3"`
5. **Use factories/fixtures** for test data — never hardcode IDs
6. **Assert response shape**, not just status code
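
The naming convention in steps 3–4 can be sketched as a small helper. The route table and the slug scheme below are illustrative, not a real scanner:

```python
# Hypothetical sketch: turn a scanned route table into descriptive pytest stub names.
routes = [
    ("GET", "/api/items/{item_id}"),
    ("POST", "/api/items"),
]

def stub_name(method: str, path: str) -> str:
    """Derive a readable test name from a (method, path) pair."""
    slug = path.strip("/").replace("/", "_").replace("{", "").replace("}", "")
    return f"test_{method.lower()}_{slug}_returns_401_without_auth"

stubs = [stub_name(m, p) for m, p in routes]
print(stubs)
```

Each stub starts with the mandatory auth-failure case; the real generator would emit one stub per error path in the checklist above.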

---

## Common Pitfalls

- **Testing only happy paths** — 80% of bugs live in error paths; test those first
- **Hardcoded test data IDs** — use factories/fixtures; IDs change between environments
- **Shared state between tests** — always clean up in afterEach/afterAll
- **Testing implementation, not behavior** — test what the API returns, not how it does it
- **Missing boundary tests** — off-by-one errors are extremely common in pagination and limits
- **Not testing token expiry** — expired tokens behave differently from invalid ones
- **Ignoring Content-Type** — test that the API rejects wrong content types (XML when JSON is expected)

---

## Best Practices

1. One describe block per endpoint — keeps failures isolated and readable
2. Seed minimal data — don't load the entire DB; create only what the test needs
3. Use `beforeAll` for shared setup, `afterAll` for cleanup — not `beforeEach` for expensive ops
4. Assert specific error messages/fields, not just status codes
5. Test that sensitive fields (password, secret) are never in responses
6. For auth tests, always test the "missing header" case separately from "invalid token"
7. Add rate limit tests last — they can interfere with other test suites if run in parallel
487
engineering/changelog-generator/SKILL.md
Normal file
@@ -0,0 +1,487 @@
# Changelog Generator

**Tier:** POWERFUL
**Category:** Engineering
**Domain:** Release Management / Documentation

---

## Overview

Parse conventional commits, determine semantic version bumps, and generate structured changelogs in Keep a Changelog format. Supports monorepo changelogs, GitHub Releases integration, and separates user-facing from developer changelogs.

## Core Capabilities

- **Conventional commit parsing** — feat, fix, chore, docs, refactor, perf, test, build, ci
- **SemVer bump determination** — breaking change → major, feat → minor, fix → patch
- **Keep a Changelog format** — Added, Changed, Deprecated, Removed, Fixed, Security
- **Monorepo support** — per-package changelogs with shared version strategy
- **GitHub/GitLab Releases** — auto-create release with changelog body
- **Audience-aware output** — user-facing (what changed) vs developer (why + technical details)

---

## When to Use

- Before every release to generate the CHANGELOG.md entry
- Setting up automated changelog generation in CI
- Converting git log into readable release notes for GitHub Releases
- Maintaining monorepo changelogs for individual packages
- Generating internal release notes for the engineering team

---

## Conventional Commits Reference

```
<type>(<scope>): <description>

[optional body]

[optional footer(s)]
```

### Types and SemVer impact

| Type | Changelog section | SemVer bump |
|------|------------------|-------------|
| `feat` | Added | minor |
| `fix` | Fixed | patch |
| `perf` | Changed | patch |
| `refactor` | Changed (internal) | patch |
| `docs` | — (omit or include) | patch |
| `chore` | — (omit) | patch |
| `test` | — (omit) | patch |
| `build` | — (omit) | patch |
| `ci` | — (omit) | patch |
| `security` | Security | patch |
| `deprecated` | Deprecated | minor |
| `remove` | Removed | major (if breaking) |
| `BREAKING CHANGE:` footer | — (major bump) | major |
| `!` after type | — (major bump) | major |
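
The bump column reduces to a three-way rule — breaking beats feat, feat beats everything else. A minimal sketch:

```python
# Minimal sketch of the bump rule in the table above.
def bump(commit_type: str, breaking: bool, version: tuple[int, int, int]) -> tuple[int, int, int]:
    major, minor, patch = version
    if breaking:
        return (major + 1, 0, 0)   # any breaking change → major
    if commit_type == "feat":
        return (major, minor + 1, 0)  # new feature → minor
    return (major, minor, patch + 1)  # everything else → patch

print(bump("feat", False, (1, 2, 3)))  # → (1, 3, 0)
print(bump("fix", True, (1, 2, 3)))    # → (2, 0, 0)
```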

### Examples

```
feat(auth): add OAuth2 login with Google
fix(api): correct pagination offset calculation
feat!: rename /users endpoint to /accounts (BREAKING)
perf(db): add index on users.email column
security: patch XSS vulnerability in comment renderer
docs: update API reference for v2 endpoints
```

---

## Changelog Generation Script

```bash
#!/usr/bin/env bash
# generate-changelog.sh — generate CHANGELOG entry for the latest release

set -euo pipefail

CURRENT_TAG=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
PREVIOUS_TAG=$(git describe --tags --abbrev=0 "${CURRENT_TAG}^" 2>/dev/null || echo "")
DATE=$(date +%Y-%m-%d)

if [ -z "$CURRENT_TAG" ]; then
  echo "No tags found. Create a tag first: git tag v1.0.0"
  exit 1
fi

RANGE="${PREVIOUS_TAG:+${PREVIOUS_TAG}..}${CURRENT_TAG}"
echo "Generating changelog for: $RANGE"

# Parse commits
ADDED=""
CHANGED=""
DEPRECATED=""
REMOVED=""
FIXED=""
SECURITY=""
BREAKING=""

while IFS= read -r line; do
  # Skip empty lines
  [ -z "$line" ] && continue

  # Detect type
  if [[ "$line" =~ ^feat(\([^)]+\))?\!:\ (.+)$ ]]; then
    desc="${BASH_REMATCH[2]}"
    BREAKING="${BREAKING}- **BREAKING** ${desc}\n"
    ADDED="${ADDED}- ${desc}\n"
  elif [[ "$line" =~ ^feat(\([^)]+\))?:\ (.+)$ ]]; then
    ADDED="${ADDED}- ${BASH_REMATCH[2]}\n"
  elif [[ "$line" =~ ^fix(\([^)]+\))?:\ (.+)$ ]]; then
    FIXED="${FIXED}- ${BASH_REMATCH[2]}\n"
  elif [[ "$line" =~ ^perf(\([^)]+\))?:\ (.+)$ ]]; then
    CHANGED="${CHANGED}- ${BASH_REMATCH[2]}\n"
  elif [[ "$line" =~ ^security(\([^)]+\))?:\ (.+)$ ]]; then
    SECURITY="${SECURITY}- ${BASH_REMATCH[2]}\n"
  elif [[ "$line" =~ ^deprecated(\([^)]+\))?:\ (.+)$ ]]; then
    DEPRECATED="${DEPRECATED}- ${BASH_REMATCH[2]}\n"
  elif [[ "$line" =~ ^remove(\([^)]+\))?:\ (.+)$ ]]; then
    REMOVED="${REMOVED}- ${BASH_REMATCH[2]}\n"
  elif [[ "$line" =~ ^refactor(\([^)]+\))?:\ (.+)$ ]]; then
    CHANGED="${CHANGED}- ${BASH_REMATCH[2]}\n"
  fi
done < <(git log "${RANGE}" --pretty=format:"%s" --no-merges)

# Build output
OUTPUT="## [${CURRENT_TAG}] - ${DATE}\n\n"

[ -n "$BREAKING" ] && OUTPUT="${OUTPUT}### ⚠ BREAKING CHANGES\n${BREAKING}\n"
[ -n "$SECURITY" ] && OUTPUT="${OUTPUT}### Security\n${SECURITY}\n"
[ -n "$ADDED" ] && OUTPUT="${OUTPUT}### Added\n${ADDED}\n"
[ -n "$CHANGED" ] && OUTPUT="${OUTPUT}### Changed\n${CHANGED}\n"
[ -n "$DEPRECATED" ] && OUTPUT="${OUTPUT}### Deprecated\n${DEPRECATED}\n"
[ -n "$REMOVED" ] && OUTPUT="${OUTPUT}### Removed\n${REMOVED}\n"
[ -n "$FIXED" ] && OUTPUT="${OUTPUT}### Fixed\n${FIXED}\n"

# %b expands the stored \n escapes; the fixed format string also keeps
# literal % characters in commit messages from being interpreted by printf
printf '%b' "$OUTPUT"

# Optionally prepend to CHANGELOG.md
if [ "${1:-}" = "--write" ]; then
  TEMP=$(mktemp)

  if [ -f CHANGELOG.md ]; then
    # Keep the "# Changelog" header first, insert the new entry right after it
    head -n 1 CHANGELOG.md > "$TEMP"
    echo "" >> "$TEMP"
    printf '%b' "$OUTPUT" >> "$TEMP"
    tail -n +2 CHANGELOG.md >> "$TEMP"
  else
    echo "# Changelog" > "$TEMP"
    echo "All notable changes to this project will be documented here." >> "$TEMP"
    echo "" >> "$TEMP"
    printf '%b' "$OUTPUT" >> "$TEMP"
  fi

  mv "$TEMP" CHANGELOG.md
  echo "✅ CHANGELOG.md updated"
fi
```

---

## Python Changelog Generator (more robust)

```python
#!/usr/bin/env python3
"""generate_changelog.py — parse conventional commits and emit Keep a Changelog"""

import subprocess
import re
import sys
from datetime import date
from dataclasses import dataclass, field
from typing import Optional

COMMIT_RE = re.compile(
    r"^(?P<type>feat|fix|perf|refactor|docs|test|chore|build|ci|security|deprecated|remove)"
    r"(?:\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<desc>.+)$"
)

SECTION_MAP = {
    "feat": "Added",
    "fix": "Fixed",
    "perf": "Changed",
    "refactor": "Changed",
    "security": "Security",
    "deprecated": "Deprecated",
    "remove": "Removed",
}

@dataclass
class Commit:
    type: str
    scope: Optional[str]
    breaking: bool
    desc: str
    body: str = ""
    sha: str = ""

@dataclass
class ChangelogEntry:
    version: str
    date: str
    added: list[str] = field(default_factory=list)
    changed: list[str] = field(default_factory=list)
    deprecated: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)
    fixed: list[str] = field(default_factory=list)
    security: list[str] = field(default_factory=list)
    breaking: list[str] = field(default_factory=list)


def get_commits(from_tag: str, to_tag: str) -> list[Commit]:
    range_spec = f"{from_tag}..{to_tag}" if from_tag else to_tag
    # %x1e appends a record separator per commit, so multi-line bodies
    # (where "BREAKING CHANGE:" footers live) stay attached to their commit
    result = subprocess.run(
        ["git", "log", range_spec, "--pretty=format:%H|%s|%b%x1e", "--no-merges"],
        capture_output=True, text=True, check=True
    )

    commits = []
    for record in result.stdout.split("\x1e"):
        record = record.strip()
        if not record:
            continue
        parts = record.split("|", 2)
        sha = parts[0] if len(parts) > 0 else ""
        subject = parts[1] if len(parts) > 1 else ""
        body = parts[2] if len(parts) > 2 else ""

        m = COMMIT_RE.match(subject)
        if m:
            commits.append(Commit(
                type=m.group("type"),
                scope=m.group("scope"),
                breaking=m.group("breaking") == "!" or "BREAKING CHANGE" in body,
                desc=m.group("desc"),
                body=body,
                sha=sha[:8],
            ))

    return commits


def determine_bump(commits: list[Commit], current_version: str) -> str:
    parts = current_version.lstrip("v").split(".")
    major, minor, patch = int(parts[0]), int(parts[1]), int(parts[2])

    has_breaking = any(c.breaking for c in commits)
    has_feat = any(c.type == "feat" for c in commits)

    if has_breaking:
        return f"v{major + 1}.0.0"
    elif has_feat:
        return f"v{major}.{minor + 1}.0"
    else:
        return f"v{major}.{minor}.{patch + 1}"


def build_entry(commits: list[Commit], version: str) -> ChangelogEntry:
    entry = ChangelogEntry(version=version, date=date.today().isoformat())

    for c in commits:
        scope_prefix = f"**{c.scope}**: " if c.scope else ""
        desc = f"{scope_prefix}{c.desc}"

        if c.breaking:
            entry.breaking.append(desc)

        section = SECTION_MAP.get(c.type)
        if section == "Added":
            entry.added.append(desc)
        elif section == "Fixed":
            entry.fixed.append(desc)
        elif section == "Changed":
            entry.changed.append(desc)
        elif section == "Security":
            entry.security.append(desc)
        elif section == "Deprecated":
            entry.deprecated.append(desc)
        elif section == "Removed":
            entry.removed.append(desc)

    return entry


def render_entry(entry: ChangelogEntry) -> str:
    lines = [f"## [{entry.version}] - {entry.date}", ""]

    sections = [
        ("⚠ BREAKING CHANGES", entry.breaking),
        ("Security", entry.security),
        ("Added", entry.added),
        ("Changed", entry.changed),
        ("Deprecated", entry.deprecated),
        ("Removed", entry.removed),
        ("Fixed", entry.fixed),
    ]

    for title, items in sections:
        if items:
            lines.append(f"### {title}")
            for item in items:
                lines.append(f"- {item}")
            lines.append("")

    return "\n".join(lines)


if __name__ == "__main__":
    tags = subprocess.run(
        ["git", "tag", "--sort=-version:refname"],
        capture_output=True, text=True
    ).stdout.splitlines()

    current_tag = tags[0] if tags else ""
    previous_tag = tags[1] if len(tags) > 1 else ""

    if not current_tag:
        print("No tags found. Create a tag first.")
        sys.exit(1)

    commits = get_commits(previous_tag, current_tag)
    entry = build_entry(commits, current_tag)
    print(render_entry(entry))
```

---

## Monorepo Changelog Strategy

For repos with multiple packages (e.g., pnpm workspaces, nx, turborepo):

```bash
# packages/api/CHANGELOG.md — API package only
# packages/ui/CHANGELOG.md — UI package only
# CHANGELOG.md — Root (affects all)

# Filter commits by package path
git log v1.2.0..v1.3.0 --pretty=format:"%s" -- packages/api/
```

With Changesets (recommended for monorepos):

```bash
# Install changesets
pnpm add -D @changesets/cli
pnpm changeset init

# Developer workflow: create a changeset for each PR
pnpm changeset
# → prompts for: which packages changed, bump type, description

# On release branch: version all packages
pnpm changeset version

# Publish and create GitHub release
pnpm changeset publish
```
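
The path-filter approach can also be scripted: bucket each commit under the package whose files it touched, then render one entry per bucket. A hedged sketch — the commit list stands in for `git log --name-only` output, and the paths are illustrative:

```python
# Sketch: bucket commit subjects per package for a monorepo changelog.
from collections import defaultdict

commits = [
    ("feat(api): add webhooks", ["packages/api/src/hooks.ts"]),
    ("fix(ui): button focus ring", ["packages/ui/src/button.tsx"]),
    ("chore: bump deps", ["package.json"]),
]

def bucket_by_package(commits):
    buckets = defaultdict(list)
    for subject, files in commits:
        pkgs = {f.split("/")[1] for f in files if f.startswith("packages/")}
        for pkg in pkgs or {"root"}:  # commits outside packages/ go to the root changelog
            buckets[pkg].append(subject)
    return dict(buckets)

print(bucket_by_package(commits))
```

A commit touching files in two packages lands in both buckets, which matches how Changesets attributes a multi-package change.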

---

## GitHub Releases Integration

```bash
#!/usr/bin/env bash
# create-github-release.sh

set -euo pipefail

VERSION=$(git describe --tags --abbrev=0)
NOTES=$(python3 generate_changelog.py)

# Using GitHub CLI
gh release create "$VERSION" \
  --title "Release $VERSION" \
  --notes "$NOTES" \
  --verify-tag

# Or via API
curl -s -X POST \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "Content-Type: application/json" \
  "https://api.github.com/repos/${REPO}/releases" \
  -d "$(jq -n \
    --arg tag "$VERSION" \
    --arg name "Release $VERSION" \
    --arg body "$NOTES" \
    '{tag_name: $tag, name: $name, body: $body, draft: false}')"
```

---

## User-Facing vs Developer Changelog

### User-facing (product changelog)

- Plain language, no jargon
- Focus on what changed, not how
- Skip: refactor, test, chore, ci, docs
- Include: feat, fix, security, perf (if user-visible)

```markdown
## Version 2.3.0 — March 1, 2026

**New:** You can now log in with Google.
**Fixed:** Dashboard no longer freezes when loading large datasets.
**Improved:** Search results load 3x faster.
```

### Developer changelog (CHANGELOG.md)

- Technical details, scope, SemVer impact
- Include all breaking changes with migration notes
- Reference PR numbers and issue IDs

```markdown
## [2.3.0] - 2026-03-01

### Added
- **auth**: OAuth2 Google login via passport-google (#234)
- **api**: GraphQL subscriptions for real-time updates (#241)

### Fixed
- **dashboard**: resolve infinite re-render on large datasets (closes #228)

### Performance
- **search**: switch from Elasticsearch to Typesense, P99 latency -67% (#239)
```
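
The skip/include rules are easy to encode as a label map over parsed commits. A minimal sketch — the commit tuples and label names are illustrative:

```python
# Sketch: render user-facing entries with plain-language labels, per the rules above.
LABELS = {"feat": "New", "fix": "Fixed", "perf": "Improved", "security": "Security"}

commits = [
    ("feat", "You can now log in with Google."),
    ("chore", "bump CI cache key"),  # skipped: not user-facing
    ("fix", "Dashboard no longer freezes when loading large datasets."),
]

lines = [f"**{LABELS[kind]}:** {desc}" for kind, desc in commits if kind in LABELS]
print("\n".join(lines))
```

The same parsed commit list feeds both outputs; only the filter and the wording differ.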

---

## GitHub Actions — Automated Changelog CI

```yaml
name: Release

on:
  push:
    tags: ['v*']

jobs:
  release:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for git log

      - name: Generate changelog
        id: changelog
        run: |
          NOTES=$(python3 scripts/generate_changelog.py)
          echo "notes<<EOF" >> $GITHUB_OUTPUT
          echo "$NOTES" >> $GITHUB_OUTPUT
          echo "EOF" >> $GITHUB_OUTPUT

      - name: Create GitHub Release
        uses: softprops/action-gh-release@v2
        with:
          body: ${{ steps.changelog.outputs.notes }}
          generate_release_notes: false
```

---

## Common Pitfalls

- **`--depth=1` in CI** — git log needs full history; use `fetch-depth: 0`
- **Merge commits polluting log** — always use `--no-merges`
- **No conventional commits discipline** — enforce with `commitlint` in CI
- **Missing previous tag** — handle the first-release case (no previous tag)
- **Version in multiple places** — single source of truth; read from the git tag, not package.json

---

## Best Practices

1. **commitlint in CI** — enforce conventional commits before merge
2. **Tag before generating** — tag the release commit first, then generate
3. **Separate user/dev changelogs** — the product team wants plain English
4. **Keep a link section** — `[2.3.0]: https://github.com/org/repo/compare/v2.2.0...v2.3.0`
5. **Automate but review** — generate in CI; a human reviews before publish
517
engineering/ci-cd-pipeline-builder/SKILL.md
Normal file
@@ -0,0 +1,517 @@
# CI/CD Pipeline Builder

**Tier:** POWERFUL
**Category:** Engineering
**Domain:** DevOps / Automation

---

## Overview

Analyzes your project stack and generates production-ready CI/CD pipeline configurations for GitHub Actions, GitLab CI, and Bitbucket Pipelines. Handles matrix testing, caching strategies, deployment stages, environment promotion, and secret management — tailored to your actual tech stack.

## Core Capabilities

- **Stack detection** — reads `package.json`, `Dockerfile`, `pyproject.toml`, `go.mod`, etc.
- **Pipeline generation** — GitHub Actions, GitLab CI, Bitbucket Pipelines
- **Matrix testing** — multi-version, multi-OS, multi-environment
- **Smart caching** — npm, pip, Docker layer, Gradle, Maven
- **Deployment stages** — build → test → staging → production with approvals
- **Environment promotion** — automatic on green tests, manual gate for production
- **Secret management** — patterns for GitHub Secrets, GitLab CI Variables, Vault, AWS SSM

---

## When to Use

- Starting a new project and need a CI/CD baseline
- Migrating from one CI platform to another
- Adding deployment stages to an existing pipeline
- Auditing a slow pipeline and optimizing caching
- Setting up environment promotion with manual approval gates

---

## Workflow

### Step 1 — Stack Detection

Ask Claude to analyze your repo:

```
Analyze my repo and generate a GitHub Actions CI/CD pipeline.
Check: package.json, Dockerfile, .nvmrc, pyproject.toml, go.mod
```

Claude will inspect:

| File | Signals |
|------|---------|
| `package.json` | Node version, test runner, build tool |
| `.nvmrc` / `.node-version` | Exact Node version |
| `Dockerfile` | Base image, multi-stage build |
| `pyproject.toml` | Python version, test runner |
| `go.mod` | Go version |
| `vercel.json` | Vercel deployment config |
| `k8s/` or `helm/` | Kubernetes deployment |
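
The inspection step amounts to a handful of file probes. A minimal sketch mirroring the table — the marker-to-stack mapping is illustrative, and a real detector would also parse versions out of each file:

```python
# Sketch: map marker files to detected stacks, per the table above.
from pathlib import Path
import tempfile

MARKERS = {
    "package.json": "node",
    "pyproject.toml": "python",
    "go.mod": "go",
    "Dockerfile": "docker",
    "vercel.json": "vercel",
}

def detect_stacks(root: Path) -> set[str]:
    """Return the set of stacks whose marker file exists under root."""
    return {stack for marker, stack in MARKERS.items() if (root / marker).exists()}

# Demo against a throwaway repo containing only pyproject.toml
with tempfile.TemporaryDirectory() as repo:
    (Path(repo) / "pyproject.toml").touch()
    print(detect_stacks(Path(repo)))  # → {'python'}
```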
|
||||
|
||||
---
|
||||
|
||||
## Complete Example: Next.js + Vercel
|
||||
|
||||
```yaml
|
||||
# .github/workflows/ci.yml
|
||||
name: CI/CD
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: [main, develop]
|
||||
pull_request:
|
||||
branches: [main, develop]
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.workflow }}-${{ github.ref }}
|
||||
cancel-in-progress: true
|
||||
|
||||
env:
|
||||
NODE_VERSION: '20'
|
||||
PNPM_VERSION: '8'
|
||||
|
||||
jobs:
|
||||
lint-typecheck:
|
||||
name: Lint & Typecheck
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: pnpm/action-setup@v3
|
||||
with:
|
||||
version: ${{ env.PNPM_VERSION }}
|
||||
- uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version: ${{ env.NODE_VERSION }}
|
||||
cache: 'pnpm'
|
||||
- run: pnpm install --frozen-lockfile
|
||||
- run: pnpm lint
|
||||
- run: pnpm typecheck
|
||||
|
||||
test:
|
||||
name: Test (Node ${{ matrix.node }})
|
||||
runs-on: ubuntu-latest
|
||||
strategy:
|
||||
matrix:
|
||||
node: ['18', '20', '22']
|
||||
fail-fast: false
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: pnpm/action-setup@v3
|
||||
with:
|
||||
version: ${{ env.PNPM_VERSION }}
|
||||
- uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version: ${{ matrix.node }}
|
          cache: 'pnpm'
      - run: pnpm install --frozen-lockfile
      - name: Run tests with coverage
        run: pnpm test:ci
        env:
          DATABASE_URL: ${{ secrets.TEST_DATABASE_URL }}
      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}

  build:
    name: Build
    runs-on: ubuntu-latest
    needs: [lint-typecheck, test]
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v3
        with:
          version: ${{ env.PNPM_VERSION }}
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'pnpm'
      - run: pnpm install --frozen-lockfile
      - name: Build
        run: pnpm build
        env:
          NEXT_PUBLIC_API_URL: ${{ vars.NEXT_PUBLIC_API_URL }}
      - uses: actions/upload-artifact@v4
        with:
          name: build-${{ github.sha }}
          path: .next/
          retention-days: 7

  deploy-staging:
    name: Deploy to Staging
    runs-on: ubuntu-latest
    needs: build
    if: github.ref == 'refs/heads/develop'
    environment:
      name: staging
      url: https://staging.myapp.com
    steps:
      - uses: actions/checkout@v4
      - uses: amondnet/vercel-action@v25
        with:
          vercel-token: ${{ secrets.VERCEL_TOKEN }}
          vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
          vercel-project-id: ${{ secrets.VERCEL_PROJECT_ID }}

  deploy-production:
    name: Deploy to Production
    runs-on: ubuntu-latest
    needs: build
    if: github.ref == 'refs/heads/main'
    environment:
      name: production
      url: https://myapp.com
    steps:
      - uses: actions/checkout@v4
      - uses: amondnet/vercel-action@v25
        with:
          vercel-token: ${{ secrets.VERCEL_TOKEN }}
          vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
          vercel-project-id: ${{ secrets.VERCEL_PROJECT_ID }}
          vercel-args: '--prod'
```

---
## Complete Example: Python + AWS Lambda

```yaml
# .github/workflows/deploy.yml
name: Python Lambda CI/CD

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.11', '3.12']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'
      - run: pip install -r requirements-dev.txt
      - run: pytest tests/ -v --cov=src --cov-report=xml
      - run: mypy src/
      - run: ruff check src/ tests/

  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
          cache: 'pip'
      - run: pip install bandit safety
      - run: bandit -r src/ -ll
      - run: safety check

  package:
    needs: [test, security]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Build Lambda zip
        run: |
          pip install -r requirements.txt --target ./package
          cd package && zip -r ../lambda.zip .
          cd .. && zip lambda.zip -r src/
      - uses: actions/upload-artifact@v4
        with:
          name: lambda-${{ github.sha }}
          path: lambda.zip

  deploy-staging:
    needs: package
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: lambda-${{ github.sha }}
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: eu-west-1
      - run: |
          aws lambda update-function-code \
            --function-name myapp-staging \
            --zip-file fileb://lambda.zip

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: lambda-${{ github.sha }}
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: eu-west-1
      - run: |
          aws lambda update-function-code \
            --function-name myapp-production \
            --zip-file fileb://lambda.zip
          VERSION=$(aws lambda publish-version \
            --function-name myapp-production \
            --query 'Version' --output text)
          aws lambda update-alias \
            --function-name myapp-production \
            --name live \
            --function-version $VERSION
```

---
## Complete Example: Docker + Kubernetes

```yaml
# .github/workflows/k8s-deploy.yml
name: Docker + Kubernetes

on:
  push:
    branches: [main]
    tags: ['v*']

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    outputs:
      image-digest: ${{ steps.push.outputs.digest }}

    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=semver,pattern={{version}}
            type=sha,prefix=sha-

      - name: Build and push
        id: push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy-staging:
    needs: build-push
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - uses: azure/setup-kubectl@v3
      - name: Set kubeconfig
        run: |
          echo "${{ secrets.KUBE_CONFIG_STAGING }}" | base64 -d > /tmp/kubeconfig
          echo "KUBECONFIG=/tmp/kubeconfig" >> $GITHUB_ENV
      - name: Deploy
        run: |
          kubectl set image deployment/myapp \
            myapp=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ needs.build-push.outputs.image-digest }} \
            -n staging
          kubectl rollout status deployment/myapp -n staging --timeout=5m

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: azure/setup-kubectl@v3
      - name: Set kubeconfig
        run: |
          echo "${{ secrets.KUBE_CONFIG_PROD }}" | base64 -d > /tmp/kubeconfig
          echo "KUBECONFIG=/tmp/kubeconfig" >> $GITHUB_ENV
      - name: Canary deploy
        run: |
          kubectl set image deployment/myapp-canary \
            myapp=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ needs.build-push.outputs.image-digest }} \
            -n production
          kubectl rollout status deployment/myapp-canary -n production --timeout=5m
          sleep 120
          kubectl set image deployment/myapp \
            myapp=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}@${{ needs.build-push.outputs.image-digest }} \
            -n production
          kubectl rollout status deployment/myapp -n production --timeout=10m
```

---
## GitLab CI Equivalent

```yaml
# .gitlab-ci.yml
stages: [lint, test, build, deploy-staging, deploy-production]

variables:
  NODE_VERSION: "20"
  DOCKER_BUILDKIT: "1"

.node-cache: &node-cache
  cache:
    key:
      files: [pnpm-lock.yaml]
    paths:
      - node_modules/
      - .pnpm-store/

lint:
  stage: lint
  image: node:${NODE_VERSION}-alpine
  <<: *node-cache
  script:
    - corepack enable && pnpm install --frozen-lockfile
    - pnpm lint && pnpm typecheck

test:
  stage: test
  image: node:${NODE_VERSION}-alpine
  <<: *node-cache
  parallel:
    matrix:
      - NODE_VERSION: ["18", "20", "22"]
  script:
    - corepack enable && pnpm install --frozen-lockfile
    - pnpm test:ci
  coverage: '/Lines\s*:\s*(\d+\.?\d*)%/'

deploy-staging:
  stage: deploy-staging
  environment:
    name: staging
    url: https://staging.myapp.com
  only: [develop]
  script:
    - npx vercel --token=$VERCEL_TOKEN

deploy-production:
  stage: deploy-production
  environment:
    name: production
    url: https://myapp.com
  only: [main]
  when: manual
  script:
    - npx vercel --prod --token=$VERCEL_TOKEN
```

---
## Secret Management Patterns

### GitHub Actions — Secret Hierarchy

```
Repository secrets   → all branches
Environment secrets  → only that environment
Organization secrets → all repos in org
```

### Fetching from AWS SSM at runtime

```yaml
- name: Load secrets from SSM
  run: |
    DB_URL=$(aws ssm get-parameter \
      --name "/myapp/production/DATABASE_URL" \
      --with-decryption \
      --query 'Parameter.Value' --output text)
    echo "::add-mask::$DB_URL"   # mask before persisting, so it never appears in logs
    echo "DATABASE_URL=$DB_URL" >> $GITHUB_ENV
  env:
    AWS_REGION: eu-west-1
```

### HashiCorp Vault integration

```yaml
- uses: hashicorp/vault-action@v2
  with:
    url: ${{ secrets.VAULT_ADDR }}
    token: ${{ secrets.VAULT_TOKEN }}
    secrets: |
      secret/data/myapp/prod DATABASE_URL | DATABASE_URL ;
      secret/data/myapp/prod API_KEY | API_KEY
```

---
## Caching Cheat Sheet

| Stack | Cache key | Cache path |
|-------|-----------|------------|
| npm | `package-lock.json` | `~/.npm` |
| pnpm | `pnpm-lock.yaml` | `~/.pnpm-store` |
| pip | `requirements.txt` | `~/.cache/pip` |
| poetry | `poetry.lock` | `~/.cache/pypoetry` |
| Docker | SHA of Dockerfile | GHA cache (type=gha) |
| Go | `go.sum` | `~/go/pkg/mod` |
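For stacks that `setup-node`/`setup-python` do not cache automatically, each table row maps onto `actions/cache` directly. A minimal sketch for the Go row; the key pattern and `restore-keys` prefix are illustrative, not prescribed:

```yaml
- uses: actions/cache@v4
  with:
    path: ~/go/pkg/mod
    # Exact hit when go.sum is unchanged; restore-keys allows a partial hit otherwise
    key: go-${{ runner.os }}-${{ hashFiles('**/go.sum') }}
    restore-keys: go-${{ runner.os }}-
```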

---
## Common Pitfalls

- **Secrets in logs** — never `echo $SECRET`; use `::add-mask::$SECRET` if needed
- **No concurrency limits** — add `concurrency:` to cancel stale runs on PR push
- **Skipping `--frozen-lockfile`** — lockfile drift breaks reproducibility
- **No rollback plan** — test `kubectl rollout undo` or `vercel rollback` before you need it
- **Mutable image tags** — never use `latest` in production; tag by git SHA
- **Missing environment protection rules** — set required reviewers in GitHub Environments
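The concurrency pitfall has a two-line fix at the top level of the workflow. A minimal sketch; the group name is illustrative:

```yaml
concurrency:
  group: ci-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true   # cancel the stale run when a new push arrives
```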

---

## Best Practices

1. **Fail fast** — lint/typecheck before expensive test jobs
2. **Artifact immutability** — Docker image tagged by git SHA
3. **Environment parity** — same image through all envs, config via env vars
4. **Canary first** — 10% traffic + error rate check before 100%
5. **Pin action versions** — `@v4` not `@main`
6. **Least privilege** — each job gets only the IAM scopes it needs
7. **Notify on failure** — Slack webhook for production deploy failures
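Practice 7 can be a single step appended to a deploy job. A hedged sketch assuming a `SLACK_WEBHOOK_URL` secret configured for an incoming webhook:

```yaml
- name: Notify Slack on failure
  if: failure()   # runs only when a previous step in this job failed
  run: |
    curl -X POST -H 'Content-Type: application/json' \
      -d '{"text":"Production deploy failed: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"}' \
      "${{ secrets.SLACK_WEBHOOK_URL }}"
```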
engineering/codebase-onboarding/SKILL.md (new file, 497 lines)
# Codebase Onboarding

**Tier:** POWERFUL
**Category:** Engineering
**Domain:** Documentation / Developer Experience

---

## Overview

Analyze a codebase and generate comprehensive onboarding documentation tailored to your audience. Produces architecture overviews, key file maps, local setup guides, common task runbooks, debugging guides, and contribution guidelines. Outputs to Markdown, Notion, or Confluence.

## Core Capabilities

- **Architecture overview** — tech stack, system boundaries, data flow diagrams
- **Key file map** — what's important and why, with annotations
- **Local setup guide** — step-by-step from clone to running tests
- **Common developer tasks** — how to add a route, run migrations, create a component
- **Debugging guide** — common errors, log locations, useful queries
- **Contribution guidelines** — branch strategy, PR process, code style
- **Audience-aware output** — junior, senior, or contractor mode

---

## When to Use

- Onboarding a new team member or contractor
- After a major refactor that made existing docs stale
- Before open-sourcing a project
- Creating a team wiki page for a service
- Self-documenting before a long vacation

---

## Codebase Analysis Commands

Run these before generating docs to gather facts:

```bash
# Project overview
cat package.json | jq '{name, version, scripts, dependencies: (.dependencies | keys), devDependencies: (.devDependencies | keys)}'

# Directory structure (top 2 levels)
find . -maxdepth 2 -not -path '*/node_modules/*' -not -path '*/.git/*' -not -path '*/.next/*' | sort | head -60

# Largest files (often core modules)
find src/ -name "*.ts" -not -path "*/test*" -exec wc -l {} + | sort -rn | head -20

# All routes (Next.js App Router)
find app/ -name "route.ts" -o -name "page.tsx" | sort

# All routes (Express)
grep -rn "router\.\(get\|post\|put\|patch\|delete\)" src/routes/ --include="*.ts"

# Recent major changes
git log --oneline --since="90 days ago" | grep -E "feat|refactor|breaking"

# Top contributors
git shortlog -sn --no-merges | head -10

# Test coverage summary
pnpm test:ci --coverage 2>&1 | tail -20
```
---

## Generated Documentation Template

### README.md — Full Template

```markdown
# [Project Name]

> One-sentence description of what this does and who uses it.

[![CI](https://github.com/org/repo/actions/workflows/ci.yml/badge.svg)](https://github.com/org/repo/actions/workflows/ci.yml)
[![Coverage](https://codecov.io/gh/org/repo/branch/main/graph/badge.svg)](https://codecov.io/gh/org/repo)

## What is this?

[2-3 sentences: problem it solves, who uses it, current state]

**Live:** https://myapp.com
**Staging:** https://staging.myapp.com
**Docs:** https://docs.myapp.com

---

## Quick Start

### Prerequisites

| Tool | Version | Install |
|------|---------|---------|
| Node.js | 20+ | `nvm install 20` |
| pnpm | 8+ | `npm i -g pnpm` |
| Docker | 24+ | [docker.com](https://docker.com) |
| PostgreSQL | 16+ | via Docker (see below) |

### Setup (5 minutes)

```bash
# 1. Clone
git clone https://github.com/org/repo
cd repo

# 2. Install dependencies
pnpm install

# 3. Start infrastructure
docker compose up -d        # Starts Postgres, Redis

# 4. Environment
cp .env.example .env
# Edit .env — ask a teammate for real values or see Vault

# 5. Database setup
pnpm db:migrate             # Run migrations
pnpm db:seed                # Optional: load test data

# 6. Start dev server
pnpm dev                    # → http://localhost:3000

# 7. Verify
pnpm test                   # Should be all green
```

### Verify it works

- [ ] `http://localhost:3000` loads the app
- [ ] `http://localhost:3000/api/health` returns `{"status":"ok"}`
- [ ] `pnpm test` passes
---

## Architecture

### System Overview

```
Browser / Mobile
      │
      ▼
[Next.js App] ←──── [Auth: NextAuth]
      │
      ├──→ [PostgreSQL]  (primary data store)
      ├──→ [Redis]       (sessions, job queue)
      └──→ [S3]          (file uploads)

Background:
[BullMQ workers] ←── Redis queue
      └──→ [External APIs: Stripe, SendGrid]
```

### Tech Stack

| Layer | Technology | Why |
|-------|-----------|-----|
| Frontend | Next.js 14 (App Router) | SSR, file-based routing |
| Styling | Tailwind CSS + shadcn/ui | Rapid UI development |
| API | Next.js Route Handlers | Co-located with frontend |
| Database | PostgreSQL 16 | Relational, RLS for multi-tenancy |
| ORM | Drizzle ORM | Type-safe, lightweight |
| Auth | NextAuth v5 | OAuth + email/password |
| Queue | BullMQ + Redis | Background jobs |
| Storage | AWS S3 | File uploads |
| Email | SendGrid | Transactional email |
| Payments | Stripe | Subscriptions |
| Deployment | Vercel (app) + Railway (workers) | |
| Monitoring | Sentry + Datadog | |

---

## Key Files

| Path | Purpose |
|------|---------|
| `app/` | Next.js App Router — pages and API routes |
| `app/api/` | API route handlers |
| `app/(auth)/` | Auth pages (login, register, reset) |
| `app/(app)/` | Protected app pages |
| `src/db/` | Database schema, migrations, client |
| `src/db/schema.ts` | **Drizzle schema — single source of truth** |
| `src/lib/` | Shared utilities (auth, email, stripe) |
| `src/lib/auth.ts` | **Auth configuration — read this first** |
| `src/components/` | Reusable React components |
| `src/hooks/` | Custom React hooks |
| `src/types/` | Shared TypeScript types |
| `workers/` | BullMQ background job processors |
| `emails/` | React Email templates |
| `tests/` | Test helpers, factories, integration tests |
| `.env.example` | All env vars with descriptions |
| `docker-compose.yml` | Local infrastructure |

---
## Common Developer Tasks

### Add a new API endpoint

```bash
# 1. Create route handler
touch app/api/my-resource/route.ts
```

```typescript
// app/api/my-resource/route.ts
import { NextRequest, NextResponse } from 'next/server'
import { auth } from '@/lib/auth'
import { db } from '@/db/client'

export async function GET(req: NextRequest) {
  const session = await auth()
  if (!session) {
    return NextResponse.json({ error: 'Unauthorized' }, { status: 401 })
  }

  const data = await db.query.myResource.findMany({
    where: (r, { eq }) => eq(r.userId, session.user.id),
  })

  return NextResponse.json({ data })
}
```

```bash
# 2. Add tests
touch tests/api/my-resource.test.ts

# 3. Add to OpenAPI spec (if applicable)
pnpm generate:openapi
```

### Run a database migration

```bash
# Create migration
pnpm db:generate    # Generates SQL from schema changes

# Review the generated SQL
cat drizzle/migrations/0001_my_change.sql

# Apply
pnpm db:migrate

# Roll back (manual — inspect generated SQL and revert)
psql $DATABASE_URL -f scripts/rollback_0001.sql
```

### Add a new email template

```bash
# 1. Create template
touch emails/my-email.tsx

# 2. Preview in browser
pnpm email:preview
```

```typescript
// 3. Send in code
import { sendEmail } from '@/lib/email'

await sendEmail({
  to: user.email,
  subject: 'Subject line',
  template: 'my-email',
  props: { name: user.name },
})
```

### Add a background job

```typescript
// 1. Define job in workers/jobs/my-job.ts
import { Queue, Worker } from 'bullmq'
import { redis } from '@/lib/redis'

export const myJobQueue = new Queue('my-job', { connection: redis })

export const myJobWorker = new Worker('my-job', async (job) => {
  const { userId, data } = job.data
  // do work
}, { connection: redis })

// 2. Enqueue
await myJobQueue.add('process', { userId, data }, {
  attempts: 3,
  backoff: { type: 'exponential', delay: 1000 },
})
```

---
## Debugging Guide

### Common Errors

**`Error: DATABASE_URL is not set`**
```bash
# Check your .env file exists and has the var
cat .env | grep DATABASE_URL

# Start Postgres if not running
docker compose up -d postgres
```

**`PrismaClientKnownRequestError: P2002 Unique constraint failed`**
```
User already exists with that email. Check: is this a duplicate registration?
Run: SELECT * FROM users WHERE email = 'test@example.com';
```

**`Error: JWT expired`**
```bash
# Dev: extend token TTL in .env
JWT_EXPIRES_IN=30d

# Check clock skew between server and client
date && docker exec postgres date
```

**`500 on /api/*` in local dev**
```bash
# 1. Check terminal for stack trace
# 2. Check database connectivity
psql $DATABASE_URL -c "SELECT 1"
# 3. Check Redis
redis-cli ping
# 4. Check logs
pnpm dev 2>&1 | grep -E "error|Error|ERROR"
```

### Useful SQL Queries

```sql
-- Find slow queries (requires pg_stat_statements)
SELECT query, mean_exec_time, calls, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Check active connections
SELECT count(*), state FROM pg_stat_activity GROUP BY state;

-- Find bloated tables
SELECT relname, n_dead_tup, n_live_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup, 0) * 100, 2) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```

### Debug Authentication

```bash
# Decode a JWT (no secret needed for header/payload)
echo "YOUR_JWT" | cut -d. -f2 | base64 -d | jq .

# Check session in DB
psql $DATABASE_URL -c "SELECT * FROM sessions WHERE user_id = 'usr_...' ORDER BY expires_at DESC LIMIT 5;"
```

### Log Locations

| Environment | Logs |
|-------------|------|
| Local dev | Terminal running `pnpm dev` |
| Vercel production | Vercel dashboard → Logs |
| Workers (Railway) | Railway dashboard → Deployments → Logs |
| Database | `docker logs postgres` (local) |
| Background jobs | `pnpm worker:dev` terminal |

---
## Contribution Guidelines

### Branch Strategy

```
main → production (protected, requires PR + CI)
 └── feature/PROJ-123-short-desc
 └── fix/PROJ-456-bug-description
 └── chore/update-dependencies
```

### PR Requirements

- [ ] Branch name includes ticket ID (e.g., `feature/PROJ-123-...`)
- [ ] PR description explains the why
- [ ] All CI checks pass
- [ ] Test coverage doesn't decrease
- [ ] Self-reviewed (read your own diff before requesting review)
- [ ] Screenshots/video for UI changes

### Commit Convention

```
feat(scope): short description   → new feature
fix(scope): short description    → bug fix
chore: update dependencies       → maintenance
docs: update API reference       → documentation
```

### Code Style

```bash
# Lint + format
pnpm lint
pnpm format

# Type check
pnpm typecheck

# All checks (run before pushing)
pnpm validate
```

---

## Audience-Specific Notes

### For Junior Developers
- Start with `src/lib/auth.ts` to understand authentication
- Read existing tests in `tests/api/` — they document expected behavior
- Ask before touching anything in `src/db/schema.ts` — schema changes affect everyone
- Use `pnpm db:seed` to get realistic local data

### For Senior Engineers / Tech Leads
- Architecture decisions are documented in `docs/adr/` (Architecture Decision Records)
- Performance benchmarks: `pnpm bench` — baseline is in `tests/benchmarks/baseline.json`
- Security model: RLS policies in `src/db/rls.sql`, enforced at DB level
- Scaling notes: `docs/scaling.md`

### For Contractors
- Scope is limited to `src/features/[your-feature]/` unless discussed
- Never push directly to `main`
- All external API calls go through `src/lib/` wrappers (for mocking in tests)
- Time estimates: log in Linear ticket comments daily

---
## Output Formats

### Notion Export

```javascript
// Use Notion API to create onboarding page
const { Client } = require('@notionhq/client')
const notion = new Client({ auth: process.env.NOTION_TOKEN })

const blocks = markdownToNotionBlocks(onboardingMarkdown) // use notion-to-md
await notion.pages.create({
  parent: { page_id: ONBOARDING_PARENT_PAGE_ID },
  properties: { title: { title: [{ text: { content: 'Engineer Onboarding — MyApp' } }] } },
  children: blocks,
})
```

### Confluence Export

```bash
# Using confluence-cli or REST API
curl -X POST \
  -H "Content-Type: application/json" \
  -u "user@example.com:$CONFLUENCE_TOKEN" \
  "https://yourorg.atlassian.net/wiki/rest/api/content" \
  -d '{
    "type": "page",
    "title": "Codebase Onboarding",
    "space": {"key": "ENG"},
    "body": {
      "storage": {
        "value": "<p>Generated content...</p>",
        "representation": "storage"
      }
    }
  }'
```

---

## Common Pitfalls

- **Docs written once, never updated** — add doc updates to PR checklist
- **Missing local setup step** — test setup instructions on a fresh machine quarterly
- **No error troubleshooting** — debugging section is the most valuable part for new hires
- **Too much detail for contractors** — they need task-specific, not architecture-deep docs
- **No screenshots** — UI flows need screenshots; they go stale but are still valuable
- **Skipping the "why"** — document why decisions were made, not just what was decided

---

## Best Practices

1. **Keep setup under 10 minutes** — if it takes longer, fix the setup, not the docs
2. **Test the docs** — have a new hire follow them literally, fix every gap they hit
3. **Link, don't repeat** — link to ADRs, issues, and external docs instead of duplicating
4. **Update in the same PR** — docs changes alongside code changes
5. **Version-specific notes** — call out things that changed in recent versions
6. **Runbooks over theory** — "run this command" beats "the system uses Redis for..."
engineering/database-schema-designer/SKILL.md (new file, 522 lines)
# Database Schema Designer

**Tier:** POWERFUL
**Category:** Engineering
**Domain:** Data Architecture / Backend

---

## Overview

Design relational database schemas from requirements and generate migrations, TypeScript/Python types, seed data, RLS policies, and indexes. Handles multi-tenancy, soft deletes, audit trails, versioning, and polymorphic associations.

## Core Capabilities

- **Schema design** — normalize requirements into tables, relationships, constraints
- **Migration generation** — Drizzle, Prisma, TypeORM, Alembic
- **Type generation** — TypeScript interfaces, Python dataclasses/Pydantic models
- **RLS policies** — Row-Level Security for multi-tenant apps
- **Index strategy** — composite indexes, partial indexes, covering indexes
- **Seed data** — realistic test data generation
- **ERD generation** — Mermaid diagram from schema

---

## When to Use

- Designing a new feature that needs database tables
- Reviewing a schema for performance or normalization issues
- Adding multi-tenancy to an existing schema
- Generating TypeScript types from a Prisma schema
- Planning a schema migration for a breaking change

---

## Schema Design Process

### Step 1: Requirements → Entities

Given requirements:
> "Users can create projects. Each project has tasks. Tasks can have labels. Tasks can be assigned to users. We need a full audit trail."

Extract entities:
```
User, Project, Task, Label, TaskLabel (junction), TaskAssignment, AuditLog
```

### Step 2: Identify Relationships

```
User    1──*  Project        (owner)
Project 1──*  Task
Task    *──*  Label          (via TaskLabel)
Task    *──*  User           (via TaskAssignment)
User    1──*  AuditLog
```

### Step 3: Add Cross-cutting Concerns

- Multi-tenancy: add `organization_id` to all tenant-scoped tables
- Soft deletes: add `deleted_at TIMESTAMPTZ` instead of hard deletes
- Audit trail: add `created_by`, `updated_by`, `created_at`, `updated_at`
- Versioning: add `version INTEGER` for optimistic locking

---
## Full Schema Example (Task Management SaaS)
|
||||
|
||||
### Prisma Schema
|
||||
|
||||
```prisma
|
||||
// schema.prisma
|
||||
generator client {
|
||||
provider = "prisma-client-js"
|
||||
}
|
||||
|
||||
datasource db {
|
||||
provider = "postgresql"
|
||||
url = env("DATABASE_URL")
|
||||
}
|
||||
|
||||
// ── Multi-tenancy ─────────────────────────────────────────────────────────────
|
||||
|
||||
model Organization {
|
||||
id String @id @default(cuid())
|
||||
name String
|
||||
slug String @unique
|
||||
plan Plan @default(FREE)
|
||||
createdAt DateTime @default(now()) @map("created_at")
|
||||
updatedAt DateTime @updatedAt @map("updated_at")
|
||||
deletedAt DateTime? @map("deleted_at")
|
||||
|
||||
users OrganizationMember[]
|
||||
projects Project[]
|
||||
auditLogs AuditLog[]
|
||||
|
||||
@@map("organizations")
|
||||
}
|
||||
|
||||
model OrganizationMember {
|
||||
id String @id @default(cuid())
|
||||
organizationId String @map("organization_id")
|
||||
userId String @map("user_id")
|
||||
role OrgRole @default(MEMBER)
|
||||
joinedAt DateTime @default(now()) @map("joined_at")
|
||||
|
||||
organization Organization @relation(fields: [organizationId], references: [id], onDelete: Cascade)
|
||||
user User @relation(fields: [userId], references: [id], onDelete: Cascade)
|
||||
|
||||
@@unique([organizationId, userId])
|
||||
@@index([userId])
|
||||
@@map("organization_members")
|
||||
}
|
||||
|
||||
model User {
|
||||
id String @id @default(cuid())
|
||||
email String @unique
|
||||
name String?
|
||||
avatarUrl String? @map("avatar_url")
|
||||
passwordHash String? @map("password_hash")
|
||||
emailVerifiedAt DateTime? @map("email_verified_at")
|
||||
lastLoginAt DateTime? @map("last_login_at")
|
||||
createdAt DateTime @default(now()) @map("created_at")
|
||||
updatedAt DateTime @updatedAt @map("updated_at")
|
||||
deletedAt DateTime? @map("deleted_at")
|
||||
|
||||
memberships OrganizationMember[]
|
||||
ownedProjects Project[] @relation("ProjectOwner")
|
||||
assignedTasks TaskAssignment[]
|
||||
comments Comment[]
|
||||
auditLogs AuditLog[]
|
||||
|
||||
@@map("users")
|
||||
}
|
||||
|
||||
// ── Core entities ─────────────────────────────────────────────────────────────
|
||||
|
||||
model Project {
  id             String        @id @default(cuid())
  organizationId String        @map("organization_id")
  ownerId        String        @map("owner_id")
  name           String
  description    String?
  status         ProjectStatus @default(ACTIVE)
  settings       Json          @default("{}")
  createdAt      DateTime      @default(now()) @map("created_at")
  updatedAt      DateTime      @updatedAt @map("updated_at")
  deletedAt      DateTime?     @map("deleted_at")

  organization Organization @relation(fields: [organizationId], references: [id])
  owner        User         @relation("ProjectOwner", fields: [ownerId], references: [id])
  tasks        Task[]
  labels       Label[]

  @@index([organizationId])
  @@index([organizationId, status])
  @@index([deletedAt])
  @@map("projects")
}

model Task {
  id          String     @id @default(cuid())
  projectId   String     @map("project_id")
  title       String
  description String?
  status      TaskStatus @default(TODO)
  priority    Priority   @default(MEDIUM)
  dueDate     DateTime?  @map("due_date")
  position    Float      @default(0) // For drag-and-drop ordering
  version     Int        @default(1) // Optimistic locking
  createdById String     @map("created_by_id")
  updatedById String     @map("updated_by_id")
  createdAt   DateTime   @default(now()) @map("created_at")
  updatedAt   DateTime   @updatedAt @map("updated_at")
  deletedAt   DateTime?  @map("deleted_at")

  project     Project          @relation(fields: [projectId], references: [id])
  assignments TaskAssignment[]
  labels      TaskLabel[]
  comments    Comment[]
  attachments Attachment[]

  @@index([projectId])
  @@index([projectId, status])
  @@index([projectId, deletedAt])
  // Partial index on due_date (WHERE deleted_at IS NULL): Prisma's @@index has
  // no `where` argument, so create it in a raw SQL migration instead.
  @@map("tasks")
}

// ── Polymorphic attachments ───────────────────────────────────────────────────

model Attachment {
  id           String   @id @default(cuid())
  // Polymorphic association
  entityType   String   @map("entity_type") // "task" | "comment"
  entityId     String   @map("entity_id")
  filename     String
  mimeType     String   @map("mime_type")
  sizeBytes    Int      @map("size_bytes")
  storageKey   String   @map("storage_key") // S3 key
  uploadedById String   @map("uploaded_by_id")
  createdAt    DateTime @default(now()) @map("created_at")

  // Only one concrete relation (task) — polymorphic handled at app level
  task Task? @relation(fields: [entityId], references: [id], map: "attachment_task_fk")

  @@index([entityType, entityId])
  @@map("attachments")
}

// ── Audit trail ───────────────────────────────────────────────────────────────

model AuditLog {
  id             String   @id @default(cuid())
  organizationId String   @map("organization_id")
  userId         String?  @map("user_id")
  action         String   // "task.created", "task.status_changed"
  entityType     String   @map("entity_type")
  entityId       String   @map("entity_id")
  before         Json?    // Previous state
  after          Json?    // New state
  ipAddress      String?  @map("ip_address")
  userAgent      String?  @map("user_agent")
  createdAt      DateTime @default(now()) @map("created_at")

  organization Organization @relation(fields: [organizationId], references: [id])
  user         User?        @relation(fields: [userId], references: [id])

  @@index([organizationId, createdAt(sort: Desc)])
  @@index([entityType, entityId])
  @@index([userId])
  @@map("audit_logs")
}

// Prisma requires one enum value per line
enum Plan {
  FREE
  STARTER
  GROWTH
  ENTERPRISE
}

enum OrgRole {
  OWNER
  ADMIN
  MEMBER
  VIEWER
}

enum ProjectStatus {
  ACTIVE
  ARCHIVED
}

enum TaskStatus {
  TODO
  IN_PROGRESS
  IN_REVIEW
  DONE
  CANCELLED
}

enum Priority {
  LOW
  MEDIUM
  HIGH
  CRITICAL
}
```
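The `position` float on `Task` supports drag-and-drop ordering via midpoint insertion: a moved task gets the average of its new neighbours' positions. A minimal sketch of that logic (function name, sentinel gap of 1000, and the precision threshold are illustrative, not from the schema above):

```typescript
// Midpoint insertion for a float-based ordering column.
// Floats lose precision after repeated halving, so callers should
// renormalize (e.g. rewrite positions as i * 1000) when the gap collapses.
const MIN_GAP = 1e-6

export function positionBetween(prev: number | null, next: number | null): number {
  if (prev === null && next === null) return 1000      // first task in the column
  if (prev === null) return (next as number) - 1000    // dropped at the top
  if (next === null) return prev + 1000                // dropped at the bottom
  if (next - prev < MIN_GAP) {
    throw new Error('gap exhausted: renormalize positions (e.g. i * 1000) first')
  }
  return (prev + next) / 2
}
```

Renormalizing as `i * 1000` matches how the seed script below initializes positions.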

---

### Drizzle Schema (TypeScript)

```typescript
// db/schema.ts
import {
  pgTable, text, timestamp, integer, boolean,
  varchar, jsonb, real, pgEnum, uniqueIndex, index,
} from 'drizzle-orm/pg-core'
import { createId } from '@paralleldrive/cuid2'

export const taskStatusEnum = pgEnum('task_status', [
  'todo', 'in_progress', 'in_review', 'done', 'cancelled',
])
export const priorityEnum = pgEnum('priority', ['low', 'medium', 'high', 'critical'])

export const tasks = pgTable('tasks', {
  id: text('id').primaryKey().$defaultFn(() => createId()),
  projectId: text('project_id').notNull().references(() => projects.id),
  title: varchar('title', { length: 500 }).notNull(),
  description: text('description'),
  status: taskStatusEnum('status').notNull().default('todo'),
  priority: priorityEnum('priority').notNull().default('medium'),
  dueDate: timestamp('due_date', { withTimezone: true }),
  position: real('position').notNull().default(0),
  version: integer('version').notNull().default(1),
  createdById: text('created_by_id').notNull().references(() => users.id),
  updatedById: text('updated_by_id').notNull().references(() => users.id),
  createdAt: timestamp('created_at', { withTimezone: true }).notNull().defaultNow(),
  updatedAt: timestamp('updated_at', { withTimezone: true }).notNull().defaultNow(),
  deletedAt: timestamp('deleted_at', { withTimezone: true }),
}, (table) => ({
  projectIdx: index('tasks_project_id_idx').on(table.projectId),
  projectStatusIdx: index('tasks_project_status_idx').on(table.projectId, table.status),
}))

// Infer TypeScript types
export type Task = typeof tasks.$inferSelect
export type NewTask = typeof tasks.$inferInsert
```

---

### Alembic Migration (Python / SQLAlchemy)

```python
# alembic/versions/20260301_create_tasks.py
"""Create tasks table

Revision ID: a1b2c3d4e5f6
Revises: previous_revision
Create Date: 2026-03-01 12:00:00
"""

from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql

revision = 'a1b2c3d4e5f6'
down_revision = 'previous_revision'


def upgrade() -> None:
    # Create enums
    task_status = postgresql.ENUM(
        'todo', 'in_progress', 'in_review', 'done', 'cancelled',
        name='task_status'
    )
    task_status.create(op.get_bind())

    op.create_table(
        'tasks',
        sa.Column('id', sa.Text(), primary_key=True),
        sa.Column('project_id', sa.Text(), sa.ForeignKey('projects.id'), nullable=False),
        sa.Column('title', sa.VARCHAR(500), nullable=False),
        sa.Column('description', sa.Text()),
        sa.Column('status', postgresql.ENUM('todo', 'in_progress', 'in_review', 'done', 'cancelled', name='task_status', create_type=False), nullable=False, server_default='todo'),
        sa.Column('priority', sa.Text(), nullable=False, server_default='medium'),
        sa.Column('due_date', sa.TIMESTAMP(timezone=True)),
        sa.Column('position', sa.Float(), nullable=False, server_default='0'),
        sa.Column('version', sa.Integer(), nullable=False, server_default='1'),
        sa.Column('created_by_id', sa.Text(), sa.ForeignKey('users.id'), nullable=False),
        sa.Column('updated_by_id', sa.Text(), sa.ForeignKey('users.id'), nullable=False),
        sa.Column('created_at', sa.TIMESTAMP(timezone=True), nullable=False, server_default=sa.text('NOW()')),
        sa.Column('updated_at', sa.TIMESTAMP(timezone=True), nullable=False, server_default=sa.text('NOW()')),
        sa.Column('deleted_at', sa.TIMESTAMP(timezone=True)),
    )

    # Indexes
    op.create_index('tasks_project_id_idx', 'tasks', ['project_id'])
    op.create_index('tasks_project_status_idx', 'tasks', ['project_id', 'status'])
    # Partial index for active tasks only
    op.create_index(
        'tasks_due_date_active_idx',
        'tasks', ['due_date'],
        postgresql_where=sa.text('deleted_at IS NULL')
    )


def downgrade() -> None:
    op.drop_table('tasks')
    op.execute("DROP TYPE IF EXISTS task_status")
```

---

## Row-Level Security (RLS) Policies

```sql
-- Enable RLS
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;

-- Create app role
CREATE ROLE app_user;

-- Users can only see tasks in their organization's projects
CREATE POLICY tasks_org_isolation ON tasks
  FOR ALL TO app_user
  USING (
    project_id IN (
      SELECT p.id FROM projects p
      JOIN organization_members om ON om.organization_id = p.organization_id
      WHERE om.user_id = current_setting('app.current_user_id')::text
    )
  );

-- Soft delete: never show deleted records
CREATE POLICY tasks_no_deleted ON tasks
  FOR SELECT TO app_user
  USING (deleted_at IS NULL);

-- Only task creator or admin can delete
CREATE POLICY tasks_delete_policy ON tasks
  FOR DELETE TO app_user
  USING (
    created_by_id = current_setting('app.current_user_id')::text
    OR EXISTS (
      SELECT 1 FROM organization_members om
      JOIN projects p ON p.organization_id = om.organization_id
      WHERE p.id = tasks.project_id
        AND om.user_id = current_setting('app.current_user_id')::text
        AND om.role IN ('owner', 'admin')
    )
  );

-- Set user context (call at start of each request)
SELECT set_config('app.current_user_id', $1, true);
```
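The policies above only work if the application sets `app.current_user_id` on every request, scoped to the transaction so pooled connections never leak another user's context. A hedged sketch of a per-request wrapper (the `Queryable` shape stands in for a `pg` client; names are illustrative, not from the schema docs):

```typescript
// Minimal structural type matching the part of a pg client we need.
type Queryable = { query: (text: string, values?: unknown[]) => Promise<unknown> }

// Runs `fn` inside a transaction with the RLS user context set.
// set_config(..., true) makes the setting transaction-local, so it is
// discarded on COMMIT/ROLLBACK and cannot bleed into pooled connections.
export async function withUserContext<T>(
  client: Queryable,
  userId: string,
  fn: () => Promise<T>,
): Promise<T> {
  await client.query('BEGIN')
  try {
    await client.query("SELECT set_config('app.current_user_id', $1, true)", [userId])
    const result = await fn()
    await client.query('COMMIT')
    return result
  } catch (err) {
    await client.query('ROLLBACK')
    throw err
  }
}
```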

---

## Seed Data Generation

```typescript
// db/seed.ts
import { faker } from '@faker-js/faker'
import { db } from './client'
import { organizations, users, projects, tasks } from './schema'
import { createId } from '@paralleldrive/cuid2'
import { hashPassword } from '../src/lib/auth'

async function seed() {
  console.log('Seeding database...')

  // Create org
  const [org] = await db.insert(organizations).values({
    id: createId(),
    name: 'Acme Corp',
    slug: 'acme',
    plan: 'growth',
  }).returning()

  // Create users
  const adminUser = await db.insert(users).values({
    id: createId(),
    email: 'admin@acme.com',
    name: 'Alice Admin',
    passwordHash: await hashPassword('password123'),
  }).returning().then(r => r[0])

  // Create projects
  const projectsData = Array.from({ length: 3 }, () => ({
    id: createId(),
    organizationId: org.id,
    ownerId: adminUser.id,
    name: faker.company.catchPhrase(),
    description: faker.lorem.paragraph(),
    status: 'active' as const,
  }))

  const createdProjects = await db.insert(projects).values(projectsData).returning()

  // Create tasks for each project
  for (const project of createdProjects) {
    const tasksData = Array.from({ length: faker.number.int({ min: 5, max: 20 }) }, (_, i) => ({
      id: createId(),
      projectId: project.id,
      title: faker.hacker.phrase(),
      description: faker.lorem.sentences(2),
      status: faker.helpers.arrayElement(['todo', 'in_progress', 'done'] as const),
      priority: faker.helpers.arrayElement(['low', 'medium', 'high'] as const),
      position: i * 1000,
      createdById: adminUser.id,
      updatedById: adminUser.id,
    }))

    await db.insert(tasks).values(tasksData)
  }

  console.log(`✅ Seeded: 1 org, ${projectsData.length} projects, tasks`)
}

// Exit non-zero on failure so CI notices a broken seed
seed()
  .catch((err) => { console.error(err); process.exitCode = 1 })
  .finally(() => process.exit())
```

---

## ERD Generation (Mermaid)

```mermaid
erDiagram
    Organization ||--o{ OrganizationMember : has
    Organization ||--o{ Project : owns
    User ||--o{ OrganizationMember : joins
    User ||--o{ Task : "created by"
    Project ||--o{ Task : contains
    Task ||--o{ TaskAssignment : has
    Task ||--o{ TaskLabel : has
    Task ||--o{ Comment : has
    Task ||--o{ Attachment : has
    Label ||--o{ TaskLabel : "applied to"
    User ||--o{ TaskAssignment : assigned

    Organization {
        string id PK
        string name
        string slug
        string plan
    }

    Task {
        string id PK
        string project_id FK
        string title
        string status
        string priority
        timestamp due_date
        timestamp deleted_at
        int version
    }
```

Generate from Prisma:
```bash
npx prisma-erd-generator
# or: npx @dbml/cli prisma2dbml -i schema.prisma | npx dbml-to-mermaid
```

---

## Common Pitfalls

- **Soft delete without index** — `WHERE deleted_at IS NULL` without an index = full scan
- **Missing composite indexes** — `WHERE org_id = ? AND status = ?` needs a composite index
- **Mutable surrogate keys** — never use email or slug as PK; use UUID/CUID
- **Non-nullable without default** — adding a NOT NULL column to an existing table requires a default or a migration plan
- **No optimistic locking** — concurrent updates overwrite each other; add a `version` column
- **RLS not tested** — always test RLS with a non-superuser role
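The optimistic-locking pitfall is worth spelling out. In SQL it is an update guarded by the expected version, e.g. `UPDATE tasks SET title = $1, version = version + 1 WHERE id = $2 AND version = $3`, where zero affected rows signals a conflict. A minimal in-memory sketch of the same compare-and-swap logic (names are illustrative):

```typescript
interface TaskRow { id: string; title: string; version: number }

// Returns the updated row, or null when someone else updated first.
// On null, the caller should reload the row and retry or surface a conflict.
export function applyUpdate(
  row: TaskRow,
  expectedVersion: number,
  patch: Partial<Omit<TaskRow, 'id' | 'version'>>,
): TaskRow | null {
  if (row.version !== expectedVersion) return null // stale read: reject the write
  return { ...row, ...patch, version: row.version + 1 }
}
```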

---

## Best Practices

1. **Timestamps everywhere** — `created_at`, `updated_at` on every table
2. **Soft deletes for auditable data** — `deleted_at` instead of DELETE
3. **Audit log for compliance** — log before/after JSON for regulated domains
4. **UUIDs or CUIDs as PKs** — avoid sequential integer leakage
5. **Index foreign keys** — every FK column should have an index
6. **Partial indexes** — use `WHERE deleted_at IS NULL` for active-only queries
7. **RLS over application-level filtering** — the database enforces tenancy, not just app code
686
engineering/env-secrets-manager/SKILL.md
Normal file
@@ -0,0 +1,686 @@
# Env & Secrets Manager

**Tier:** POWERFUL
**Category:** Engineering
**Domain:** Security / DevOps / Configuration Management

---

## Overview

Complete environment and secrets management workflow: .env file lifecycle across dev/staging/prod, .env.example auto-generation, required-var validation, secret leak detection in git history, and a credential rotation playbook. Integrates with HashiCorp Vault, AWS SSM, 1Password CLI, and Doppler.

---

## Core Capabilities

- **.env lifecycle** — create, validate, sync across environments
- **.env.example generation** — strip values, preserve keys and comments
- **Validation script** — fail fast on missing required vars at startup
- **Secret leak detection** — regex scan of git history and working tree
- **Rotation workflow** — detect → scope → rotate → deploy → verify
- **Secret manager integrations** — Vault KV v2, AWS SSM, 1Password, Doppler

---

## When to Use

- Setting up a new project — scaffold .env.example and validation
- Before every commit — scan for accidentally staged secrets
- Post-incident response — leaked-credential rotation procedure
- Onboarding new developers — they need all vars, not just some
- Environment drift investigation — prod behaving differently from staging

---

## .env File Structure

### Canonical Layout
```bash
# .env.example — committed to git (no values)
# .env.local   — developer machine (gitignored)
# .env.staging — CI/CD or secret manager reference
# .env.prod    — never on disk; pulled from secret manager at runtime

# Application
APP_NAME=
APP_ENV=                # dev | staging | prod
APP_PORT=3000           # default port if not set
APP_SECRET=             # REQUIRED: JWT signing secret (min 32 chars)
APP_URL=                # REQUIRED: public base URL

# Database
DATABASE_URL=           # REQUIRED: full connection string
DATABASE_POOL_MIN=2
DATABASE_POOL_MAX=10

# Auth
AUTH_JWT_SECRET=        # REQUIRED
AUTH_JWT_EXPIRY=3600    # seconds
AUTH_REFRESH_SECRET=    # REQUIRED

# Third-party APIs
STRIPE_SECRET_KEY=      # REQUIRED in prod
STRIPE_WEBHOOK_SECRET=  # REQUIRED in prod
SENDGRID_API_KEY=

# Storage
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=eu-central-1
AWS_S3_BUCKET=

# Monitoring
SENTRY_DSN=
DD_API_KEY=
```

---

## .gitignore Patterns

Add to your project's `.gitignore`:

```gitignore
# Environment files — NEVER commit these
.env
.env.local
.env.development
.env.development.local
.env.test.local
.env.staging
.env.staging.local
.env.production
.env.production.local
.env.prod
.env.*.local

# Secret files
*.pem
*.key
*.p12
*.pfx
secrets.json
secrets.yaml
secrets.yml
credentials.json
service-account.json

# AWS
.aws/credentials

# Terraform state (may contain secrets)
*.tfstate
*.tfstate.backup
.terraform/

# Kubernetes secrets
*-secret.yaml
*-secrets.yaml
```

---

## .env.example Auto-Generation

```bash
#!/bin/bash
# scripts/gen-env-example.sh
# Strips values from .env, preserves keys, defaults, and comments

INPUT="${1:-.env}"
OUTPUT="${2:-.env.example}"

if [ ! -f "$INPUT" ]; then
  echo "ERROR: $INPUT not found"
  exit 1
fi

python3 - "$INPUT" "$OUTPUT" << 'PYEOF'
import sys, re

input_file = sys.argv[1]
output_file = sys.argv[2]
lines = []

# Keep non-sensitive defaults (ports, regions, feature flags)
safe_defaults = re.compile(
    r'^(APP_PORT|APP_ENV|APP_NAME|AWS_REGION|DATABASE_POOL_|LOG_LEVEL|'
    r'FEATURE_|CACHE_TTL|RATE_LIMIT_|PAGINATION_|TIMEOUT_)',
    re.I
)
# Never keep a value whose key looks secret, even if it matched above
sensitive = re.compile(
    r'(SECRET|KEY|TOKEN|PASSWORD|PASS|CREDENTIAL|DSN|AUTH|PRIVATE|CERT)',
    re.I
)

with open(input_file) as f:
    for line in f:
        stripped = line.rstrip('\n')
        # Keep blank lines and comments as-is
        if stripped == '' or stripped.startswith('#'):
            lines.append(stripped)
            continue
        # Match KEY=VALUE or KEY="VALUE"
        m = re.match(r'^([A-Z_][A-Z0-9_]*)=(.*)$', stripped)
        if m:
            key = m.group(1)
            value = m.group(2).strip('"\'')
            if safe_defaults.match(key) and not sensitive.search(key) and value:
                lines.append(f"{key}={value}  # default")
            else:
                lines.append(f"{key}=")
        else:
            lines.append(stripped)

with open(output_file, 'w') as f:
    f.write('\n'.join(lines) + '\n')

print(f"Generated {output_file} from {input_file}")
PYEOF
```

Usage:
```bash
bash scripts/gen-env-example.sh .env .env.example
# Commit .env.example, never .env
git add .env.example
```

---

## Required Variable Validation Script

```bash
#!/bin/bash
# scripts/validate-env.sh
# Run at app startup or in CI before deploy
# Exit 1 if any required var is missing or empty

set -euo pipefail

MISSING=()
WARNINGS=()

# --- Define required vars by environment ---
ALWAYS_REQUIRED=(
  APP_SECRET
  APP_URL
  DATABASE_URL
  AUTH_JWT_SECRET
  AUTH_REFRESH_SECRET
)

PROD_REQUIRED=(
  STRIPE_SECRET_KEY
  STRIPE_WEBHOOK_SECRET
  SENTRY_DSN
)

# --- Check always-required vars ---
for var in "${ALWAYS_REQUIRED[@]}"; do
  if [ -z "${!var:-}" ]; then
    MISSING+=("$var")
  fi
done

# --- Check prod-only vars ---
if [ "${APP_ENV:-}" = "production" ] || [ "${NODE_ENV:-}" = "production" ]; then
  for var in "${PROD_REQUIRED[@]}"; do
    if [ -z "${!var:-}" ]; then
      MISSING+=("$var (required in production)")
    fi
  done
fi

# --- Validate format/length constraints ---
if [ -n "${AUTH_JWT_SECRET:-}" ] && [ ${#AUTH_JWT_SECRET} -lt 32 ]; then
  WARNINGS+=("AUTH_JWT_SECRET is shorter than 32 chars — insecure")
fi

if [ -n "${DATABASE_URL:-}" ]; then
  if ! echo "$DATABASE_URL" | grep -qE "^(postgres|postgresql|mysql|mongodb|redis)://"; then
    WARNINGS+=("DATABASE_URL doesn't look like a valid connection string")
  fi
fi

if [ -n "${APP_PORT:-}" ]; then
  if ! [[ "$APP_PORT" =~ ^[0-9]+$ ]] || [ "$APP_PORT" -lt 1 ] || [ "$APP_PORT" -gt 65535 ]; then
    WARNINGS+=("APP_PORT=$APP_PORT is not a valid port number")
  fi
fi

# --- Report ---
if [ ${#WARNINGS[@]} -gt 0 ]; then
  echo "WARNINGS:"
  for w in "${WARNINGS[@]}"; do
    echo "  ⚠️  $w"
  done
fi

if [ ${#MISSING[@]} -gt 0 ]; then
  echo ""
  echo "FATAL: Missing required environment variables:"
  for var in "${MISSING[@]}"; do
    echo "  ❌ $var"
  done
  echo ""
  echo "Copy .env.example to .env and fill in missing values."
  exit 1
fi

echo "✅ All required environment variables are set"
```

Node.js equivalent:
```typescript
// src/config/validateEnv.ts
const required = [
  'APP_SECRET', 'APP_URL', 'DATABASE_URL',
  'AUTH_JWT_SECRET', 'AUTH_REFRESH_SECRET',
]

const missing = required.filter(key => !process.env[key])

if (missing.length > 0) {
  console.error('FATAL: Missing required environment variables:', missing)
  process.exit(1)
}

if (process.env.AUTH_JWT_SECRET && process.env.AUTH_JWT_SECRET.length < 32) {
  console.error('FATAL: AUTH_JWT_SECRET must be at least 32 characters')
  process.exit(1)
}

export const config = {
  appSecret: process.env.APP_SECRET!,
  appUrl: process.env.APP_URL!,
  databaseUrl: process.env.DATABASE_URL!,
  jwtSecret: process.env.AUTH_JWT_SECRET!,
  refreshSecret: process.env.AUTH_REFRESH_SECRET!,
  stripeKey: process.env.STRIPE_SECRET_KEY, // optional
  port: parseInt(process.env.APP_PORT ?? '3000', 10),
} as const
```

---

## Secret Leak Detection

### Scan Working Tree
```bash
#!/bin/bash
# scripts/scan-secrets.sh
# Scan staged files and working tree for common secret patterns

FAIL=0

check() {
  local label="$1"
  local pattern="$2"
  local matches

  # Added lines only; skip diff headers and commented-out lines
  matches=$(git diff --cached -U0 2>/dev/null | grep "^+" | grep -vE "^\+\+\+|^\+[[:space:]]*(#|//)" | \
    grep -E "$pattern" | grep -v ".env.example" | grep -v "test\|mock\|fixture\|fake" || true)

  if [ -n "$matches" ]; then
    echo "SECRET DETECTED [$label]:"
    echo "$matches" | head -5
    FAIL=1
  fi
}

# AWS Access Keys
check "AWS Access Key" "AKIA[0-9A-Z]{16}"
check "AWS Secret Key" "aws_secret_access_key\s*=\s*['\"]?[A-Za-z0-9/+]{40}"

# Stripe
check "Stripe Live Key" "sk_live_[0-9a-zA-Z]{24,}"
check "Stripe Test Key" "sk_test_[0-9a-zA-Z]{24,}"
check "Stripe Webhook" "whsec_[0-9a-zA-Z]{32,}"

# JWT / Generic secrets
check "Hardcoded JWT" "eyJ[A-Za-z0-9_-]{20,}\.[A-Za-z0-9_-]{20,}"
check "Generic Secret" "(secret|password|passwd|api_key|apikey|token)\s*[:=]\s*['\"][^'\"]{12,}['\"]"

# Private keys
check "Private Key Block" "-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
check "PEM Certificate" "-----BEGIN CERTIFICATE-----"

# Connection strings with credentials
check "DB Connection" "(postgres|mysql|mongodb)://[^:]+:[^@]+@"
check "Redis Auth" "rediss?://:[^@]+@"

# Google
check "Google API Key" "AIza[0-9A-Za-z_-]{35}"
check "Google OAuth" "[0-9]+-[0-9A-Za-z_]{32}\.apps\.googleusercontent\.com"

# GitHub
check "GitHub Token" "gh[ps]_[A-Za-z0-9]{36,}"
check "GitHub Fine-grained" "github_pat_[A-Za-z0-9_]{82}"

# Slack
check "Slack Token" "xox[baprs]-[0-9A-Za-z]{10,}"
check "Slack Webhook" "https://hooks\.slack\.com/services/[A-Z0-9]{9,}/[A-Z0-9]{9,}/[A-Za-z0-9]{24,}"

# Twilio
check "Twilio SID" "AC[a-z0-9]{32}"
check "Twilio Token" "SK[a-z0-9]{32}"

if [ $FAIL -eq 1 ]; then
  echo ""
  echo "BLOCKED: Secrets detected in staged changes."
  echo "Remove secrets before committing. Use environment variables instead."
  echo "If this is a false positive, add it to .secretsignore or use:"
  echo "  git commit --no-verify  (only if you're 100% certain it's safe)"
  exit 1
fi

echo "No secrets detected in staged changes."
```

### Scan Git History (post-incident)
```bash
#!/bin/bash
# scripts/scan-history.sh — scan entire git history for leaked secrets

PATTERNS=(
  "AKIA[0-9A-Z]{16}"
  "sk_live_[0-9a-zA-Z]{24}"
  "sk_test_[0-9a-zA-Z]{24}"
  "-----BEGIN.*PRIVATE KEY-----"
  "AIza[0-9A-Za-z_-]{35}"
  "ghp_[A-Za-z0-9]{36}"
  "xox[baprs]-[0-9A-Za-z]{10,}"
)

for pattern in "${PATTERNS[@]}"; do
  echo "Scanning for: $pattern"
  git log --all -p --no-color 2>/dev/null | \
    grep "^+" | \
    grep -v "^+++" | \
    grep -E "$pattern" | \
    head -10
done

# Alternative: use truffleHog or gitleaks for comprehensive scanning
# gitleaks detect --source . --log-opts="--all"
# trufflehog git file://. --only-verified
```

---

## Pre-commit Hook Installation

```bash
#!/bin/bash
# Install the pre-commit hook
HOOK_PATH=".git/hooks/pre-commit"

cat > "$HOOK_PATH" << 'HOOK'
#!/bin/bash
# Pre-commit: scan for secrets before every commit

SCRIPT="scripts/scan-secrets.sh"

if [ -f "$SCRIPT" ]; then
  bash "$SCRIPT"
else
  # Inline fallback if script not present
  if git diff --cached -U0 | grep "^+" | grep -qE "AKIA[0-9A-Z]{16}|sk_live_|-----BEGIN.*PRIVATE KEY"; then
    echo "BLOCKED: Possible secret detected in staged changes."
    exit 1
  fi
fi
HOOK

chmod +x "$HOOK_PATH"
echo "Pre-commit hook installed at $HOOK_PATH"
```

Using the `pre-commit` framework (recommended for teams):
```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0
    hooks:
      - id: gitleaks

  - repo: local
    hooks:
      - id: validate-env-example
        name: Check .env.example is up to date
        language: script
        entry: bash scripts/check-env-example.sh
        pass_filenames: false
```

---

## Credential Rotation Workflow

When a secret is leaked or compromised:

### Step 1 — Detect & Confirm
```bash
# Confirm which secret was exposed
git log --all -p --no-color | grep -A2 -B2 "AKIA\|sk_live_\|SECRET"

# Check if secret is in any open PRs
gh pr list --state open | while read pr; do
  gh pr diff $(echo $pr | awk '{print $1}') | grep -E "AKIA|sk_live_" && echo "Found in PR: $pr"
done
```

### Step 2 — Identify Exposure Window
```bash
# Find first commit that introduced the secret
git log --all -p --no-color -- "*.env" "*.json" "*.yaml" "*.ts" "*.py" | \
  grep -B 10 "THE_LEAKED_VALUE" | grep "^commit" | tail -1

# Get commit date
git show --format="%ci" COMMIT_HASH | head -1

# Check if secret appears in public repos (GitHub)
gh api search/code -X GET -f q="THE_LEAKED_VALUE" | jq '.total_count, .items[].html_url'
```

### Step 3 — Rotate Credential
Per service — rotate immediately:
- **AWS**: IAM console → delete access key → create new → update everywhere
- **Stripe**: Dashboard → Developers → API keys → Roll key
- **GitHub PAT**: Settings → Developer Settings → Personal access tokens → Revoke → Create new
- **DB password**: `ALTER USER app_user PASSWORD 'new-strong-password-here';`
- **JWT secret**: Rotate key (all existing sessions invalidated — users re-login)
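If forcing everyone to re-login is unacceptable, a common alternative is a rotation window: sign new tokens with the new secret but verify against new-then-old until old tokens expire. A hypothetical sketch (a bare HMAC stands in for a real JWT library; names are illustrative):

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Sign with exactly one secret (the new one, once rotation starts).
export function sign(payload: string, secret: string): string {
  return createHmac('sha256', secret).update(payload).digest('hex')
}

// Accept signatures made with either secret during the rotation window.
// timingSafeEqual avoids leaking which byte of the comparison failed.
export function verifyDuringRotation(
  payload: string,
  signature: string,
  newSecret: string,
  oldSecret: string,
): boolean {
  const sigBuf = Buffer.from(signature, 'hex')
  return [newSecret, oldSecret].some(secret => {
    const expected = Buffer.from(sign(payload, secret), 'hex')
    return expected.length === sigBuf.length && timingSafeEqual(expected, sigBuf)
  })
}
```

Drop the old secret from the verify list once the longest-lived token has expired.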

### Step 4 — Update All Environments
```bash
# Update secret manager (source of truth)
# Then redeploy to pull new values

# Vault KV v2
vault kv put secret/myapp/prod \
  STRIPE_SECRET_KEY="sk_live_NEW..." \
  APP_SECRET="new-secret-here"

# AWS SSM
aws ssm put-parameter \
  --name "/myapp/prod/STRIPE_SECRET_KEY" \
  --value "sk_live_NEW..." \
  --type "SecureString" \
  --overwrite

# 1Password (field assignments are positional in op v2)
op item edit "MyApp Prod" \
  "STRIPE_SECRET_KEY[password]=sk_live_NEW..."

# Doppler
doppler secrets set STRIPE_SECRET_KEY="sk_live_NEW..." --project myapp --config prod
```

### Step 5 — Remove from Git History
```bash
# WARNING: rewrites history — coordinate with team first
git filter-repo --path-glob "*.env" --invert-paths

# Or remove specific string from all commits
git filter-repo --replace-text <(echo "LEAKED_VALUE==>REDACTED")

# Force push all branches (requires team coordination + force push permissions)
git push origin --force --all

# Notify all developers to re-clone
```

### Step 6 — Verify
```bash
# Confirm secret no longer in history
git log --all -p | grep "LEAKED_VALUE" | wc -l  # should be 0

# Test new credentials work
curl -H "Authorization: Bearer $NEW_TOKEN" https://api.service.com/test

# Monitor for unauthorized usage of old credential (check service audit logs)
```

---

## Secret Manager Integrations
|
||||
|
||||
### HashiCorp Vault KV v2
|
||||
```bash
|
||||
# Setup
|
||||
export VAULT_ADDR="https://vault.internal.company.com"
|
||||
export VAULT_TOKEN="$(vault login -method=oidc -format=json | jq -r '.auth.client_token')"
|
||||
|
||||
# Write secrets
|
||||
vault kv put secret/myapp/prod \
|
||||
DATABASE_URL="postgres://user:pass@host/db" \
|
||||
APP_SECRET="$(openssl rand -base64 32)"
|
||||
|
||||
# Read secrets into env
|
||||
eval $(vault kv get -format=json secret/myapp/prod | \
|
||||
jq -r '.data.data | to_entries[] | "export \(.key)=\(.value)"')
|
||||
|
||||
# In CI/CD (GitHub Actions)
|
||||
# Use vault-action: hashicorp/vault-action@v2
|
||||
```
### AWS SSM Parameter Store

```bash
# Write (SecureString = encrypted with KMS)
aws ssm put-parameter \
  --name "/myapp/prod/DATABASE_URL" \
  --value "postgres://..." \
  --type "SecureString" \
  --key-id "alias/myapp-secrets"

# Read all params for an app/env into shell
eval "$(aws ssm get-parameters-by-path \
  --path "/myapp/prod/" \
  --with-decryption \
  --query "Parameters[*].[Name,Value]" \
  --output text | \
  awk '{split($1,a,"/"); print "export " a[length(a)] "=\"" $2 "\""}')"

# In Node.js at startup, use @aws-sdk/client-ssm to pull params before the server starts
```
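The same startup-time pull works in Python with boto3. A minimal sketch under stated assumptions: the path is hypothetical, pagination is omitted, and `load_ssm_env` is an illustrative name, not an AWS API:

```python
import os

def params_to_env(parameters: list[dict]) -> dict[str, str]:
    """Map SSM parameters like {"Name": "/myapp/prod/DB_URL", "Value": "..."}
    to {"DB_URL": "..."} by taking the last path segment as the key."""
    return {p["Name"].rsplit("/", 1)[-1]: p["Value"] for p in parameters}

def load_ssm_env(path: str = "/myapp/prod/") -> None:
    # Requires boto3 and AWS credentials; add pagination for >10 parameters.
    import boto3
    ssm = boto3.client("ssm")
    resp = ssm.get_parameters_by_path(Path=path, WithDecryption=True)
    os.environ.update(params_to_env(resp["Parameters"]))
```

Call `load_ssm_env()` once at process start, before any code reads `os.environ`.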
### 1Password CLI

```bash
# Authenticate
eval "$(op signin)"

# Get a specific field
op read "op://MyVault/MyApp Prod/STRIPE_SECRET_KEY"

# Export all fields from an item as env vars (process substitution so the
# exports land in the current shell, not a pipeline subshell)
source <(op item get "MyApp Prod" --format json | \
  jq -r '.fields[] | select(.value != null) | "export \(.label)=\"\(.value)\""' | \
  grep -E "^export [A-Z_]+")

# .env injection
op inject -i .env.tpl -o .env
# .env.tpl uses {{ op://Vault/Item/field }} syntax
```
### Doppler

```bash
# Setup
doppler setup  # interactive: select project + config

# Run any command with secrets injected
doppler run -- node server.js
doppler run -- npm run dev

# Export to .env (local dev only — never commit output)
doppler secrets download --no-file --format env > .env.local

# Pull specific secret
doppler secrets get DATABASE_URL --plain

# Sync to another environment
doppler secrets upload --project myapp --config staging .env.staging.example
```
---

## Environment Drift Detection

Check if staging and prod have the same set of keys (values may differ):

```bash
#!/bin/bash
# scripts/check-env-drift.sh

# Pull key names from both environments (not values)
STAGING_KEYS=$(doppler secrets download --no-file --format json \
  --project myapp --config staging 2>/dev/null | jq -r 'keys[]' | sort)
PROD_KEYS=$(doppler secrets download --no-file --format json \
  --project myapp --config prod 2>/dev/null | jq -r 'keys[]' | sort)

ONLY_IN_STAGING=$(comm -23 <(echo "$STAGING_KEYS") <(echo "$PROD_KEYS"))
ONLY_IN_PROD=$(comm -13 <(echo "$STAGING_KEYS") <(echo "$PROD_KEYS"))

if [ -n "$ONLY_IN_STAGING" ]; then
  echo "Keys in STAGING but NOT in PROD:"
  echo "$ONLY_IN_STAGING" | sed 's/^/  /'
fi

if [ -n "$ONLY_IN_PROD" ]; then
  echo "Keys in PROD but NOT in STAGING:"
  echo "$ONLY_IN_PROD" | sed 's/^/  /'
fi

if [ -z "$ONLY_IN_STAGING" ] && [ -z "$ONLY_IN_PROD" ]; then
  echo "✅ No env drift detected — staging and prod have identical key sets"
fi
```
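The same key-set comparison is useful for local files, e.g. catching a `.env` key missing from `.env.example`. A minimal sketch of the parsing and diff, independent of the Doppler script above:

```python
def env_keys(text: str) -> set[str]:
    """Extract variable names from .env-style text, ignoring comments and blanks."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            keys.add(line.split("=", 1)[0].strip())
    return keys

def drift(a: str, b: str) -> tuple[set[str], set[str]]:
    """Return (keys only in a, keys only in b); empty sets mean no drift."""
    ka, kb = env_keys(a), env_keys(b)
    return ka - kb, kb - ka
```

Read both files, call `drift`, and fail CI if either returned set is non-empty.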
---

## Common Pitfalls

- **Committing .env instead of .env.example** — add `.env` to .gitignore on day 1; use pre-commit hooks
- **Storing secrets in CI/CD logs** — never `echo $SECRET`; mask vars in CI settings
- **Rotating in only one place** — secrets often live in Heroku, Vercel, Docker, K8s, and CI — update ALL of them
- **Forgetting to invalidate sessions after JWT secret rotation** — all users will be logged out; communicate this
- **Using .env.example with real values** — example files are public; strip everything sensitive
- **Not monitoring after rotation** — watch audit logs for 24h after rotation to catch unauthorized old-credential use
- **Weak secrets** — `APP_SECRET=mysecret` is not a secret. Use `openssl rand -base64 32`

---

## Best Practices

1. **Secret manager is source of truth** — .env files are for local dev only; never in prod
2. **Rotate on a schedule**, not just after incidents — quarterly minimum for long-lived keys
3. **Principle of least privilege** — each service gets its own API key with minimal permissions
4. **Audit access** — log every secret read in Vault/SSM; alert on anomalous access
5. **Never log secrets** — add log-scrubbing middleware that redacts known secret patterns
6. **Use short-lived credentials** — prefer OIDC/instance roles over long-lived access keys
7. **Separate secrets per environment** — never share a key between dev and prod
8. **Document rotation runbooks** — before an incident, not during one
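Best practice 5 can be sketched as a simple redaction filter applied to every log line. The patterns here are illustrative examples, not a complete secret grammar; extend the list with your own providers' key formats:

```python
import re

# Illustrative patterns; extend with the key formats your services actually use.
SECRET_PATTERNS = [
    re.compile(r"sk_live_[A-Za-z0-9]+"),       # Stripe-style live keys
    re.compile(r"Bearer\s+[A-Za-z0-9._\-]+"),  # bearer tokens in headers
    re.compile(r"(?i)(?:password|secret)=\S+"),  # key=value pairs
]

def scrub(message: str) -> str:
    """Replace anything matching a known secret pattern with [REDACTED]."""
    for rx in SECRET_PATTERNS:
        message = rx.sub("[REDACTED]", message)
    return message
```

Wire `scrub` into your logging formatter so handlers never see raw secrets.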
157
engineering/git-worktree-manager/SKILL.md
Normal file
@@ -0,0 +1,157 @@
# Git Worktree Manager

**Tier:** POWERFUL
**Category:** Engineering
**Domain:** Parallel Development & Branch Isolation

## Overview

The Git Worktree Manager skill provides systematic management of Git worktrees for parallel development workflows. It handles worktree creation with automatic port allocation, environment file management, secret copying, and cleanup — enabling developers to run multiple Claude Code instances on separate features simultaneously without conflicts.

## Core Capabilities

- **Worktree Lifecycle Management** — create, list, switch, and clean up worktrees with automated setup
- **Port Allocation & Isolation** — automatic port assignment per worktree to avoid dev server conflicts
- **Environment Synchronization** — copy .env files, secrets, and config between main and worktrees
- **Docker Compose Overrides** — generate per-worktree port override files for multi-service stacks
- **Conflict Prevention** — detect and warn about shared resources, database names, and API endpoints
- **Cleanup & Pruning** — safe removal with stale branch detection and uncommitted work warnings

## When to Use This Skill

- Running multiple Claude Code sessions on different features simultaneously
- Working on a hotfix while a feature branch has uncommitted work
- Reviewing a PR while continuing development on your branch
- Parallel CI/testing against multiple branches
- Monorepo development with isolated package changes
## Worktree Creation Workflow

### Step 1: Create Worktree

```bash
# Create worktree for a new feature branch
git worktree add ../project-feature-auth -b feature/auth

# Create worktree from an existing remote branch
git worktree add ../project-fix-123 origin/fix/issue-123

# Create worktree with tracking
git worktree add --track -b feature/new-api ../project-new-api origin/main
```
### Step 2: Environment Setup

After creating the worktree, automatically:

1. **Copy environment files:**

   ```bash
   cp .env ../project-feature-auth/.env
   cp .env.local ../project-feature-auth/.env.local 2>/dev/null
   ```

2. **Install dependencies:**

   ```bash
   cd ../project-feature-auth
   [ -f "pnpm-lock.yaml" ] && pnpm install
   [ -f "yarn.lock" ] && yarn install
   [ -f "package-lock.json" ] && npm install
   [ -f "bun.lockb" ] && bun install
   ```

3. **Allocate ports:**

   ```
   Main worktree: localhost:3000 (dev), :5432 (db), :6379 (redis)
   Worktree 1:    localhost:3010 (dev), :5442 (db), :6389 (redis)
   Worktree 2:    localhost:3020 (dev), :5452 (db), :6399 (redis)
   ```
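The allocation scheme above is simply base port plus 10 times the worktree index, per service. A minimal sketch using the base ports shown (the service names are illustrative):

```python
# Base ports per service; each worktree gets an offset of 10 * its index.
BASE_PORTS = {"dev": 3000, "db": 5432, "redis": 6379}

def allocate_ports(worktree_index: int) -> dict[str, int]:
    """Worktree 0 is the main checkout; worktree N is offset by 10 * N."""
    return {svc: port + 10 * worktree_index for svc, port in BASE_PORTS.items()}
```

For example, `allocate_ports(1)` yields `{"dev": 3010, "db": 5442, "redis": 6389}`, matching the table above.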
### Step 3: Docker Compose Override

For Docker Compose projects, generate a per-worktree override:

```yaml
# docker-compose.worktree.yml (auto-generated)
services:
  app:
    ports:
      - "3010:3000"
  db:
    ports:
      - "5442:5432"
  redis:
    ports:
      - "6389:6379"
```

Usage: `docker compose -f docker-compose.yml -f docker-compose.worktree.yml up`
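Generating that override file is a few lines of string templating. A sketch, assuming the same base/offset port scheme as above; the service list is hypothetical:

```python
def compose_override(worktree_index: int) -> str:
    """Render a docker-compose override mapping shifted host ports to container ports."""
    services = {"app": 3000, "db": 5432, "redis": 6379}
    lines = ["# docker-compose.worktree.yml (auto-generated)", "services:"]
    for name, container_port in services.items():
        host_port = container_port + 10 * worktree_index
        lines += [f"  {name}:", "    ports:", f'      - "{host_port}:{container_port}"']
    return "\n".join(lines) + "\n"
```

Write the result to `docker-compose.worktree.yml` inside the new worktree during setup.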
### Step 4: Database Isolation

```bash
# Option A: Separate database per worktree
createdb myapp_feature_auth

# Option B: DATABASE_URL override
echo 'DATABASE_URL="postgresql://localhost:5442/myapp_wt1"' >> .env.local

# Option C: SQLite — file-based, automatic isolation
```
## Monorepo Optimization

Combine worktrees with sparse checkout for large repos:

```bash
git worktree add --no-checkout ../project-packages-only
cd ../project-packages-only
git sparse-checkout init --cone
git sparse-checkout set packages/shared packages/api
git checkout feature/api-refactor
```

## Claude Code Integration

Each worktree gets an auto-generated CLAUDE.md:

```markdown
# Worktree: feature/auth
# Dev server port: 3010
# Created: 2026-03-01

## Scope
Focus on changes related to this branch only.

## Commands
- Dev: PORT=3010 npm run dev
- Test: npm test -- --related
- Lint: npm run lint
```
Run parallel sessions:

```bash
# Terminal 1: Main feature
cd ~/project && claude

# Terminal 2: Hotfix
cd ~/project-hotfix && claude

# Terminal 3: PR review
cd ~/project-pr-review && claude
```

## Common Pitfalls

1. **Shared node_modules** — worktrees share the git dir but NOT node_modules. Always install deps.
2. **Port conflicts** — two dev servers on :3000 means silent failures. Always allocate unique ports.
3. **Database migrations** — migrations in one worktree affect all worktrees sharing the same DB. Isolate.
4. **Git hooks** — live in `.git/hooks` (shared). Worktree-specific hooks need symlinks.
5. **IDE confusion** — VS Code may show the wrong branch. Open each worktree as a separate window.
6. **Stale worktrees** — prune regularly: `git worktree prune`.

## Best Practices

1. Name worktrees by purpose: `project-auth`, `project-hotfix-123`, `project-pr-456`
2. Never create worktrees inside the main repo directory
3. Keep worktrees short-lived — merge and clean up within days
4. Use the setup script — manual creation skips env/port/deps setup
5. One Claude Code instance per worktree — isolation is the point
6. Commit before switching — even WIP commits prevent lost work
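Pruning decisions (pitfall 6) are easier with structured data. `git worktree list --porcelain` emits one `key value` attribute per line, with entries separated by blank lines; a minimal parser sketch:

```python
import subprocess

def parse_worktrees(porcelain: str) -> list[dict]:
    """Parse `git worktree list --porcelain` output into a list of dicts.
    Bare attributes (e.g. `detached`) are stored with the value True."""
    worktrees, current = [], {}
    for line in porcelain.splitlines():
        if not line.strip():
            if current:
                worktrees.append(current)
                current = {}
            continue
        key, _, value = line.partition(" ")
        current[key] = value or True
    if current:
        worktrees.append(current)
    return worktrees

def list_worktrees() -> list[dict]:
    out = subprocess.run(
        ["git", "worktree", "list", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_worktrees(out)
```

From there, a cleanup script can cross-check each entry's `branch` against merged branches before removing it.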
575
engineering/mcp-server-builder/SKILL.md
Normal file
@@ -0,0 +1,575 @@
# MCP Server Builder

**Tier:** POWERFUL
**Category:** Engineering
**Domain:** AI / API Integration

---

## Overview

Design and implement Model Context Protocol (MCP) servers that expose any REST API, database, or service as structured tools for Claude and other LLMs. Covers both FastMCP (Python) and the TypeScript MCP SDK, with patterns for reading OpenAPI/Swagger specs, generating tool definitions, handling auth, errors, and testing.

## Core Capabilities

- **OpenAPI → MCP tools** — parse Swagger/OpenAPI specs and generate tool definitions
- **FastMCP (Python)** — decorator-based server with automatic schema generation
- **TypeScript MCP SDK** — typed server with zod validation
- **Auth handling** — API keys, Bearer tokens, OAuth2, mTLS
- **Error handling** — structured error responses LLMs can reason about
- **Testing** — unit tests for tool handlers, integration tests with the MCP Inspector

---

## When to Use

- Exposing a REST API to Claude without writing a custom integration
- Building reusable tool packs for a team's Claude setup
- Wrapping internal company APIs (Jira, HubSpot, custom microservices)
- Creating database-backed tools (read/write structured data)
- Replacing brittle browser automation with typed API calls
---

## MCP Architecture

```
Claude / LLM
     │
     │  MCP Protocol (JSON-RPC over stdio or HTTP/SSE)
     ▼
MCP Server
     │  calls
     ▼
External API / Database / Service
```

Each MCP server exposes:

- **Tools** — callable functions with typed inputs/outputs
- **Resources** — readable data (files, DB rows, API responses)
- **Prompts** — reusable prompt templates
---

## Reading an OpenAPI Spec

Given a Swagger/OpenAPI file, extract tool definitions:

```python
import yaml

def openapi_to_tools(spec_path: str) -> list[dict]:
    with open(spec_path) as f:
        spec = yaml.safe_load(f)

    tools = []
    for path, methods in spec.get("paths", {}).items():
        for method, op in methods.items():
            if method not in ("get", "post", "put", "patch", "delete"):
                continue

            # Build parameter schema
            properties = {}
            required = []

            # Path/query parameters
            for param in op.get("parameters", []):
                name = param["name"]
                schema = param.get("schema", {"type": "string"})
                properties[name] = {
                    "type": schema.get("type", "string"),
                    "description": param.get("description", ""),
                }
                if param.get("required"):
                    required.append(name)

            # Request body
            if "requestBody" in op:
                content = op["requestBody"].get("content", {})
                json_schema = content.get("application/json", {}).get("schema", {})
                if "$ref" in json_schema:
                    ref_name = json_schema["$ref"].split("/")[-1]
                    json_schema = spec["components"]["schemas"][ref_name]
                for prop_name, prop_schema in json_schema.get("properties", {}).items():
                    properties[prop_name] = prop_schema
                required.extend(json_schema.get("required", []))

            tool_name = op.get("operationId") or f"{method}_{path.replace('/', '_').strip('_')}"
            tools.append({
                "name": tool_name,
                "description": op.get("summary", op.get("description", "")),
                "inputSchema": {
                    "type": "object",
                    "properties": properties,
                    "required": required,
                },
            })

    return tools
```
---

## Full Example: FastMCP Python Server for a CRUD API

This builds a complete MCP server for a hypothetical Task Management REST API.

```python
# server.py
from fastmcp import FastMCP
from pydantic import BaseModel, Field
import httpx
import os
from typing import Optional

# Initialize MCP server
mcp = FastMCP(
    name="task-manager",
    description="MCP server for Task Management API",
)

# Config
API_BASE = os.environ.get("TASK_API_BASE", "https://api.tasks.example.com")
API_KEY = os.environ["TASK_API_KEY"]  # Fail fast if missing

# Shared HTTP client with auth
def get_client() -> httpx.Client:
    return httpx.Client(
        base_url=API_BASE,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        timeout=30.0,
    )


# ── Pydantic models for input validation ──────────────────────────────────────

class CreateTaskInput(BaseModel):
    title: str = Field(..., description="Task title", min_length=1, max_length=200)
    description: Optional[str] = Field(None, description="Task description")
    assignee_id: Optional[str] = Field(None, description="User ID to assign to")
    due_date: Optional[str] = Field(None, description="Due date in ISO 8601 format (YYYY-MM-DD)")
    priority: str = Field("medium", description="Priority: low, medium, high, critical")

class UpdateTaskInput(BaseModel):
    task_id: str = Field(..., description="Task ID to update")
    title: Optional[str] = Field(None, description="New title")
    status: Optional[str] = Field(None, description="New status: todo, in_progress, done, cancelled")
    assignee_id: Optional[str] = Field(None, description="Reassign to user ID")
    due_date: Optional[str] = Field(None, description="New due date (YYYY-MM-DD)")


# ── Tool implementations ──────────────────────────────────────────────────────

@mcp.tool()
def list_tasks(
    status: Optional[str] = None,
    assignee_id: Optional[str] = None,
    limit: int = 20,
    offset: int = 0,
) -> dict:
    """
    List tasks with optional filtering by status or assignee.
    Returns paginated results with total count.
    """
    params = {"limit": limit, "offset": offset}
    if status:
        params["status"] = status
    if assignee_id:
        params["assignee_id"] = assignee_id

    with get_client() as client:
        resp = client.get("/tasks", params=params)
        resp.raise_for_status()
        return resp.json()


@mcp.tool()
def get_task(task_id: str) -> dict:
    """
    Get a single task by ID, including full details and comments.
    """
    with get_client() as client:
        resp = client.get(f"/tasks/{task_id}")
        if resp.status_code == 404:
            return {"error": f"Task {task_id} not found"}
        resp.raise_for_status()
        return resp.json()


@mcp.tool()
def create_task(input: CreateTaskInput) -> dict:
    """
    Create a new task. Returns the created task with its ID.
    """
    with get_client() as client:
        resp = client.post("/tasks", json=input.model_dump(exclude_none=True))
        if resp.status_code == 422:
            return {"error": "Validation failed", "details": resp.json()}
        resp.raise_for_status()
        task = resp.json()
        return {
            "success": True,
            "task_id": task["id"],
            "task": task,
        }


@mcp.tool()
def update_task(input: UpdateTaskInput) -> dict:
    """
    Update an existing task's title, status, assignee, or due date.
    Only provided fields are updated (PATCH semantics).
    """
    payload = input.model_dump(exclude_none=True)
    task_id = payload.pop("task_id")

    if not payload:
        return {"error": "No fields to update provided"}

    with get_client() as client:
        resp = client.patch(f"/tasks/{task_id}", json=payload)
        if resp.status_code == 404:
            return {"error": f"Task {task_id} not found"}
        resp.raise_for_status()
        return {"success": True, "task": resp.json()}


@mcp.tool()
def delete_task(task_id: str, confirm: bool = False) -> dict:
    """
    Delete a task permanently. Set confirm=true to proceed.
    This action cannot be undone.
    """
    if not confirm:
        return {
            "error": "Deletion requires explicit confirmation",
            "hint": "Call again with confirm=true to permanently delete this task",
        }

    with get_client() as client:
        resp = client.delete(f"/tasks/{task_id}")
        if resp.status_code == 404:
            return {"error": f"Task {task_id} not found"}
        resp.raise_for_status()
        return {"success": True, "deleted_task_id": task_id}


@mcp.tool()
def search_tasks(query: str, limit: int = 10) -> dict:
    """
    Full-text search across task titles and descriptions.
    Returns matching tasks ranked by relevance.
    """
    with get_client() as client:
        resp = client.get("/tasks/search", params={"q": query, "limit": limit})
        resp.raise_for_status()
        results = resp.json()
        return {
            "query": query,
            "total": results.get("total", 0),
            "tasks": results.get("items", []),
        }


# ── Resource: expose task list as a readable resource ─────────────────────────

@mcp.resource("tasks://recent")
def recent_tasks_resource() -> str:
    """Returns the 10 most recently updated tasks as JSON."""
    with get_client() as client:
        resp = client.get("/tasks", params={"sort": "-updated_at", "limit": 10})
        resp.raise_for_status()
        return resp.text


if __name__ == "__main__":
    mcp.run()
```
---

## TypeScript MCP SDK Version

```typescript
// server.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const API_BASE = process.env.TASK_API_BASE ?? "https://api.tasks.example.com";
const API_KEY = process.env.TASK_API_KEY;
if (!API_KEY) throw new Error("TASK_API_KEY is required");

const server = new McpServer({
  name: "task-manager",
  version: "1.0.0",
});

async function apiRequest(
  method: string,
  path: string,
  body?: unknown,
  params?: Record<string, string>
): Promise<unknown> {
  const url = new URL(`${API_BASE}${path}`);
  if (params) {
    Object.entries(params).forEach(([k, v]) => url.searchParams.set(k, v));
  }

  const resp = await fetch(url.toString(), {
    method,
    headers: {
      Authorization: `Bearer ${API_KEY}`,
      "Content-Type": "application/json",
    },
    body: body ? JSON.stringify(body) : undefined,
  });

  if (!resp.ok) {
    const text = await resp.text();
    throw new Error(`API error ${resp.status}: ${text}`);
  }

  return resp.json();
}

// List tasks
server.tool(
  "list_tasks",
  "List tasks with optional status/assignee filter",
  {
    status: z.enum(["todo", "in_progress", "done", "cancelled"]).optional(),
    assignee_id: z.string().optional(),
    limit: z.number().int().min(1).max(100).default(20),
  },
  async ({ status, assignee_id, limit }) => {
    const params: Record<string, string> = { limit: String(limit) };
    if (status) params.status = status;
    if (assignee_id) params.assignee_id = assignee_id;

    const data = await apiRequest("GET", "/tasks", undefined, params);
    return {
      content: [{ type: "text", text: JSON.stringify(data, null, 2) }],
    };
  }
);

// Create task
server.tool(
  "create_task",
  "Create a new task",
  {
    title: z.string().min(1).max(200),
    description: z.string().optional(),
    priority: z.enum(["low", "medium", "high", "critical"]).default("medium"),
    due_date: z.string().regex(/^\d{4}-\d{2}-\d{2}$/).optional(),
  },
  async (input) => {
    const task = await apiRequest("POST", "/tasks", input);
    return {
      content: [
        {
          type: "text",
          text: `Created task: ${JSON.stringify(task, null, 2)}`,
        },
      ],
    };
  }
);

// Start server
const transport = new StdioServerTransport();
await server.connect(transport);
console.error("Task Manager MCP server running");
```
---

## Auth Patterns

### API Key (header)

```python
headers={"X-API-Key": os.environ["API_KEY"]}
```

### Bearer token

```python
headers={"Authorization": f"Bearer {os.environ['ACCESS_TOKEN']}"}
```

### OAuth2 client credentials (auto-refresh)

```python
import os
import httpx
from datetime import datetime, timedelta

_token_cache = {"token": None, "expires_at": datetime.min}

def get_access_token() -> str:
    if datetime.now() < _token_cache["expires_at"]:
        return _token_cache["token"]

    resp = httpx.post(
        os.environ["TOKEN_URL"],
        data={
            "grant_type": "client_credentials",
            "client_id": os.environ["CLIENT_ID"],
            "client_secret": os.environ["CLIENT_SECRET"],
            "scope": "api.read api.write",
        },
    )
    resp.raise_for_status()
    data = resp.json()
    _token_cache["token"] = data["access_token"]
    # Refresh 30s early so a token never expires mid-request
    _token_cache["expires_at"] = datetime.now() + timedelta(seconds=data["expires_in"] - 30)
    return _token_cache["token"]
```
---

## Error Handling Best Practices

LLMs reason better when errors are descriptive:

```python
@mcp.tool()
def get_user(user_id: str) -> dict:
    """Get user by ID."""
    try:
        with get_client() as client:
            resp = client.get(f"/users/{user_id}")

            if resp.status_code == 404:
                return {
                    "error": "User not found",
                    "user_id": user_id,
                    "suggestion": "Use list_users to find valid user IDs",
                }

            if resp.status_code == 403:
                return {
                    "error": "Access denied",
                    "detail": "Current API key lacks permission to read this user",
                }

            resp.raise_for_status()
            return resp.json()

    except httpx.TimeoutException:
        return {"error": "Request timed out", "suggestion": "Try again in a few seconds"}

    except httpx.HTTPError as e:
        return {"error": f"HTTP error: {str(e)}"}
```
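For transient failures like the timeout above, a retry wrapper keeps tool handlers short. A minimal sketch; the retry policy is an assumption, not part of either MCP SDK:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.5):
    """Call fn(); on exception, retry with exponential backoff (0.5s, 1s, ...)."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts, surface the last error
            time.sleep(base_delay * (2 ** i))
```

Only wrap idempotent calls (GETs) this way; blindly retrying a POST can create duplicate records, which is why the best practices below suggest noting idempotency in tool descriptions.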
---

## Testing MCP Servers

### Unit tests (pytest)

```python
# tests/test_server.py
import pytest
from unittest.mock import patch, MagicMock

@pytest.fixture(autouse=True)
def mock_api_key(monkeypatch):
    monkeypatch.setenv("TASK_API_KEY", "test-key")

def test_create_task_success():
    mock_resp = MagicMock()
    mock_resp.status_code = 201
    mock_resp.json.return_value = {"id": "task-123", "title": "Test task"}

    with patch("httpx.Client.post", return_value=mock_resp):
        # Import inside the test so the env fixture runs before server.py reads TASK_API_KEY
        from server import create_task, CreateTaskInput
        result = create_task(CreateTaskInput(title="Test task"))

    assert result["success"] is True
    assert result["task_id"] == "task-123"

def test_create_task_validation_error():
    mock_resp = MagicMock()
    mock_resp.status_code = 422
    mock_resp.json.return_value = {"detail": "title must be unique"}

    with patch("httpx.Client.post", return_value=mock_resp):
        from server import create_task, CreateTaskInput
        # Input passes local Pydantic validation; the API rejects it with 422
        result = create_task(CreateTaskInput(title="Duplicate title"))

    assert "error" in result
```
---

### Integration test with MCP Inspector

```bash
# Run the MCP Inspector against the Python server
npx @modelcontextprotocol/inspector python server.py

# Or for TypeScript
npx @modelcontextprotocol/inspector node dist/server.js
```

---

## Packaging and Distribution

### pyproject.toml for a FastMCP server

```toml
[project]
name = "my-mcp-server"
version = "1.0.0"
dependencies = [
    "fastmcp>=0.4",
    "httpx>=0.27",
    "pydantic>=2.0",
]

[project.scripts]
my-mcp-server = "server:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```
### Claude Desktop config (~/.claude/config.json)

```json
{
  "mcpServers": {
    "task-manager": {
      "command": "python",
      "args": ["/path/to/server.py"],
      "env": {
        "TASK_API_KEY": "your-key-here",
        "TASK_API_BASE": "https://api.tasks.example.com"
      }
    }
  }
}
```
---

## Common Pitfalls

- **Returning raw API errors** — LLMs can't act on a bare HTTP 422; translate to human-readable messages
- **No confirmation on destructive actions** — add the `confirm: bool = False` pattern for deletes
- **Blocking I/O without a timeout** — always set `timeout=30.0` on HTTP clients
- **Leaking API keys in tool responses** — never echo env vars back in responses
- **Tool names with hyphens** — use underscores; some LLM routers break on hyphens
- **Giant response payloads** — truncate/paginate; LLMs have context limits

---

## Best Practices

1. **One tool, one action** — don't build "swiss army knife" tools; compose small tools
2. **Descriptive tool descriptions** — LLMs use them for routing; be explicit about what each tool does
3. **Return structured data** — JSON dicts, not formatted strings, so LLMs can reason about fields
4. **Validate inputs with Pydantic/zod** — catch bad inputs before hitting the API
5. **Idempotency hints** — note in the description whether a tool is safe to retry
6. **Resource vs Tool** — use resources for read-only data LLMs reference; tools for actions
595
engineering/monorepo-navigator/SKILL.md
Normal file
@@ -0,0 +1,595 @@
# Monorepo Navigator

**Tier:** POWERFUL
**Category:** Engineering
**Domain:** Monorepo Architecture / Build Systems

---

## Overview

Navigate, manage, and optimize monorepos. Covers Turborepo, Nx, pnpm workspaces, and Lerna. Enables cross-package impact analysis, selective builds/tests on affected packages only, remote caching, dependency graph visualization, and structured migrations from multi-repo to monorepo. Includes Claude Code configuration for workspace-aware development.

---

## Core Capabilities

- **Cross-package impact analysis** — determine which apps break when a shared package changes
- **Selective commands** — run tests/builds only for affected packages (not everything)
- **Dependency graph** — visualize package relationships as Mermaid diagrams
- **Build optimization** — remote caching, incremental builds, parallel execution
- **Migration** — step-by-step multi-repo → monorepo with zero history loss
- **Publishing** — changesets for versioning, pre-release channels, npm publish workflows
- **Claude Code config** — workspace-aware CLAUDE.md with per-package instructions

---

## When to Use

Use when:
- Multiple packages/apps share code (UI components, utils, types, API clients)
- Build times are slow because everything rebuilds when anything changes
- Migrating from multiple repos to a single repo
- Need to publish packages to npm with coordinated versioning
- Teams work across multiple packages and need unified tooling

Skip when:
- Single-app project with no shared packages
- Team/project boundaries are completely isolated (polyrepo is fine)
- Shared code is minimal and copy-paste overhead is acceptable
---

## Tool Selection

| Tool | Best For | Key Feature |
|---|---|---|
| **Turborepo** | JS/TS monorepos, simple pipeline config | Best-in-class remote caching, minimal config |
| **Nx** | Large enterprises, plugin ecosystem | Project graph, code generation, affected commands |
| **pnpm workspaces** | Workspace protocol, disk efficiency | `workspace:*` for local package refs |
| **Lerna** | npm publishing, versioning | Batch publishing, conventional commits |
| **Changesets** | Modern versioning (preferred over Lerna) | Changelog generation, pre-release channels |

Most modern setups: **pnpm workspaces + Turborepo + Changesets**
---

## Turborepo

### turbo.json pipeline config

```json
{
  "$schema": "https://turbo.build/schema.json",
  "globalEnv": ["NODE_ENV", "DATABASE_URL"],
  "pipeline": {
    "build": {
      "dependsOn": ["^build"],        // build deps first (topological order)
      "outputs": [".next/**", "dist/**", "build/**"],
      "env": ["NEXT_PUBLIC_API_URL"]
    },
    "test": {
      "dependsOn": ["^build"],        // need built deps to test
      "outputs": ["coverage/**"],
      "cache": true
    },
    "lint": {
      "outputs": [],
      "cache": true
    },
    "dev": {
      "cache": false,                 // never cache dev servers
      "persistent": true              // long-running process
    },
    "type-check": {
      "dependsOn": ["^build"],
      "outputs": []
    }
  }
}
```
|
||||
|
||||
### Key commands

```bash
# Build everything (respects dependency order)
turbo run build

# Build only affected packages (requires --filter)
turbo run build --filter=...[HEAD^1]   # changed since last commit
turbo run build --filter=...[main]     # changed vs main branch

# Test only affected
turbo run test --filter=...[HEAD^1]

# Run for a specific app and all its dependencies
turbo run build --filter=@myorg/web...

# Run for a specific package only (no dependencies)
turbo run build --filter=@myorg/ui

# Dry-run — see what would run without executing
turbo run build --dry-run

# Enable remote caching (Vercel Remote Cache)
turbo login
turbo link
```
### Remote caching setup

```json
// .turbo/config.json (auto-created by `turbo link`)
{
  "teamid": "team_xxxx",
  "apiurl": "https://vercel.com"
}
```

```bash
# Self-hosted cache server (open-source alternative)
# Run ducktape/turborepo-remote-cache or Turborepo's official server
TURBO_API=http://your-cache-server.internal \
TURBO_TOKEN=your-token \
TURBO_TEAM=your-team \
turbo run build
```

---
## Nx

### Project graph and affected commands

```bash
# Install
npx create-nx-workspace@latest my-monorepo

# Visualize the project graph (opens browser)
nx graph

# Show affected packages for the current branch
nx affected:graph

# Run only affected tests
nx affected --target=test

# Run only affected builds
nx affected --target=build

# Run affected with base/head (for CI)
nx affected --target=test --base=main --head=HEAD
```
### nx.json configuration

```json
{
  "$schema": "./node_modules/nx/schemas/nx-schema.json",
  "targetDefaults": {
    "build": {
      "dependsOn": ["^build"],
      "cache": true
    },
    "test": {
      "cache": true,
      "inputs": ["default", "^production"]
    }
  },
  "namedInputs": {
    "default": ["{projectRoot}/**/*", "sharedGlobals"],
    "production": ["default", "!{projectRoot}/**/*.spec.ts", "!{projectRoot}/jest.config.*"],
    "sharedGlobals": []
  },
  "parallel": 4,
  "cacheDirectory": "/tmp/nx-cache"
}
```

---
## pnpm Workspaces

### pnpm-workspace.yaml

```yaml
packages:
  - 'apps/*'
  - 'packages/*'
  - 'tools/*'
```

### workspace:* protocol for local packages

```json
// apps/web/package.json
{
  "name": "@myorg/web",
  "dependencies": {
    "@myorg/ui": "workspace:*",     // always use local version
    "@myorg/utils": "workspace:^",  // local, but respect semver on publish
    "@myorg/types": "workspace:~"
  }
}
```
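On `pnpm publish` (and `changeset publish`), these workspace ranges are rewritten to concrete versions. A minimal sketch of that rewrite, with hypothetical package data (not pnpm's actual implementation):

```javascript
// Rewrite workspace:* / workspace:^ / workspace:~ ranges to concrete
// versions, the way a publish step does. localVersions maps package
// name -> current version in the workspace (hypothetical data).
function resolveWorkspaceRanges(deps, localVersions) {
  const out = {}
  for (const [name, range] of Object.entries(deps)) {
    if (!range.startsWith('workspace:')) { out[name] = range; continue }
    const version = localVersions[name]
    const modifier = range.slice('workspace:'.length) // '*', '^' or '~'
    out[name] = modifier === '*' ? version : modifier + version
  }
  return out
}

const resolved = resolveWorkspaceRanges(
  { '@myorg/ui': 'workspace:*', '@myorg/utils': 'workspace:^', react: '^18.0.0' },
  { '@myorg/ui': '1.4.0', '@myorg/utils': '2.1.3' }
)
console.log(resolved)
// { '@myorg/ui': '1.4.0', '@myorg/utils': '^2.1.3', react: '^18.0.0' }
```

This is why unpublished `workspace:*` refs never leak to npm consumers: the range exists only inside the workspace.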
### Useful pnpm workspace commands

```bash
# Install all packages across the workspace
pnpm install

# Run a script in a specific package
pnpm --filter @myorg/web dev

# Run a script in all packages
pnpm -r build

# Run a script in a package and all its dependencies
pnpm --filter @myorg/web... build

# Add a dependency to a specific package
pnpm --filter @myorg/web add react

# Add a shared dev dependency to the workspace root
pnpm add -D typescript -w

# List workspace packages
pnpm ls --depth -1 -r
```

---
## Cross-Package Impact Analysis

When a shared package changes, determine what's affected before you ship.

```bash
# Using Turborepo — show affected packages
turbo run build --filter=...[HEAD^1] --dry-run 2>&1 | grep "Tasks to run"

# Using Nx
nx affected:apps --base=main --head=HEAD   # which apps are affected
nx affected:libs --base=main --head=HEAD   # which libs are affected

# Manual analysis with pnpm
# Find all packages that depend on @myorg/utils:
grep -r '"@myorg/utils"' packages/*/package.json apps/*/package.json

# Using jq for structured output
for pkg in packages/*/package.json apps/*/package.json; do
  name=$(jq -r '.name' "$pkg")
  if jq -e '.dependencies["@myorg/utils"] // .devDependencies["@myorg/utils"]' "$pkg" > /dev/null 2>&1; then
    echo "$name depends on @myorg/utils"
  fi
done
```
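The grep above finds direct dependents only; the full blast radius is the transitive closure over the reverse dependency graph. A self-contained sketch, using a hypothetical workspace graph:

```javascript
// deps: package -> its internal dependencies (hypothetical workspace).
// affectedBy returns every package that depends on `changed`,
// directly or transitively, via a simple fixpoint loop.
function affectedBy(changed, deps) {
  const affected = new Set()
  let grew = true
  while (grew) {
    grew = false
    for (const [pkg, ds] of Object.entries(deps)) {
      if (affected.has(pkg)) continue
      if (ds.includes(changed) || ds.some(d => affected.has(d))) {
        affected.add(pkg)
        grew = true
      }
    }
  }
  return [...affected].sort()
}

const graph = {
  ui: ['utils'],
  web: ['ui', 'utils', 'types'],
  mobile: ['ui', 'utils', 'types'],
  api: ['types'],
  utils: [],
  types: [],
}
console.log(affectedBy('utils', graph)) // [ 'mobile', 'ui', 'web' ]
```

Note that `mobile` and `web` are pulled in through `ui` as well as directly; grep alone would still find them here, but it misses dependents that are only transitive.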
---

## Dependency Graph Visualization

Generate a Mermaid diagram from your workspace:

````bash
# Generate dependency graph as Mermaid
cat > scripts/gen-dep-graph.js << 'EOF'
const { execSync } = require('child_process');
const fs = require('fs');

// Parse pnpm workspace packages
const packages = JSON.parse(
  execSync('pnpm ls --depth -1 -r --json').toString()
);

let mermaid = 'graph TD\n';
packages.forEach(pkg => {
  const deps = Object.keys(pkg.dependencies || {})
    .filter(d => d.startsWith('@myorg/'));
  deps.forEach(dep => {
    const from = pkg.name.replace('@myorg/', '');
    const to = dep.replace('@myorg/', '');
    mermaid += `  ${from} --> ${to}\n`;
  });
});

fs.mkdirSync('docs', { recursive: true });
fs.writeFileSync('docs/dep-graph.md', '```mermaid\n' + mermaid + '```\n');
console.log('Written to docs/dep-graph.md');
EOF
node scripts/gen-dep-graph.js
````
**Example output:**

```mermaid
graph TD
  web --> ui
  web --> utils
  web --> types
  mobile --> ui
  mobile --> utils
  mobile --> types
  admin --> ui
  admin --> utils
  api --> types
  ui --> utils
```

---
## Claude Code Configuration (Workspace-Aware CLAUDE.md)

Place a root CLAUDE.md plus per-package CLAUDE.md files:

```markdown
# /CLAUDE.md — Root (applies to all packages)

## Monorepo Structure
- apps/web — Next.js customer-facing app
- apps/admin — Next.js internal admin
- apps/api — Express REST API
- packages/ui — Shared React component library
- packages/utils — Shared utilities (pure functions only)
- packages/types — Shared TypeScript types (no runtime code)

## Build System
- pnpm workspaces + Turborepo
- Always use `pnpm --filter <package>` to scope commands
- Never run `npm install` or `yarn` — pnpm only
- Run `turbo run build --filter=...[HEAD^1]` before committing

## Task Scoping Rules
- When modifying packages/ui: also run tests for apps/web and apps/admin (they depend on it)
- When modifying packages/types: run type-check across ALL packages
- When modifying apps/api: only need to test apps/api

## Package Manager
pnpm — version pinned in the packageManager field of the root package.json
```

```markdown
# /packages/ui/CLAUDE.md — Package-specific

## This Package
Shared React component library. Zero business logic. Pure UI only.

## Rules
- All components must be exported from src/index.ts
- No direct API calls in components — accept data via props
- Every component needs a Storybook story in src/stories/
- Use Tailwind for styling — no CSS modules or styled-components

## Testing
- Component tests: `pnpm --filter @myorg/ui test`
- Visual regression: `pnpm --filter @myorg/ui test:storybook`

## Publishing
- Version bumps via changesets only — never edit package.json version manually
- Run `pnpm changeset` from the repo root after changes
```

---
## Migration: Multi-Repo → Monorepo

```bash
# Step 1: Create monorepo scaffold
mkdir my-monorepo && cd my-monorepo
git init
pnpm init
printf "packages:\n  - 'apps/*'\n  - 'packages/*'\n" > pnpm-workspace.yaml
git add -A && git commit -m "chore: scaffold monorepo"

# Step 2: Move repos with git history preserved
mkdir -p apps packages

# For each existing repo:
git clone https://github.com/myorg/web-app
cd web-app
git filter-repo --to-subdirectory-filter apps/web   # rewrites history into subdir
cd ..
git remote add web-app ./web-app
git fetch web-app --tags
git merge web-app/main --allow-unrelated-histories
rm -rf web-app   # clean up the temporary clone

# Step 3: Update package names to scoped
# In each package.json, change "name": "web" to "name": "@myorg/web"

# Step 4: Replace cross-repo npm deps with workspace:*
# apps/web/package.json: "@myorg/ui": "1.2.3" → "@myorg/ui": "workspace:*"

# Step 5: Add shared configs to root
cp apps/web/.eslintrc.js .eslintrc.base.js
# Update each package's config to extend root:
# { "extends": ["../../.eslintrc.base.js"] }

# Step 6: Add Turborepo
pnpm add -D turbo -w
# Create turbo.json (see above)

# Step 7: Unified CI (see CI section below)

# Step 8: Test everything
turbo run build test lint
```
---

## CI Patterns

### GitHub Actions — Affected Only

```yaml
# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]
  pull_request:

jobs:
  affected:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history needed for affected detection

      - uses: pnpm/action-setup@v3
        with:
          version: 9

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm

      - run: pnpm install --frozen-lockfile

      # Persist Turborepo's local cache between CI runs
      - uses: actions/cache@v4
        with:
          path: .turbo
          key: ${{ runner.os }}-turbo-${{ github.sha }}
          restore-keys: ${{ runner.os }}-turbo-

      # Only test/build affected packages
      - name: Build affected
        run: pnpm turbo run build --filter=...[origin/main]
        env:
          TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
          TURBO_TEAM: ${{ vars.TURBO_TEAM }}

      - name: Test affected
        run: pnpm turbo run test --filter=...[origin/main]

      - name: Lint affected
        run: pnpm turbo run lint --filter=...[origin/main]
```
### GitLab CI — Parallel Stages

```yaml
# .gitlab-ci.yml
stages: [install, build, test, publish]

variables:
  PNPM_CACHE_FOLDER: .pnpm-store

cache:
  key: pnpm-$CI_COMMIT_REF_SLUG
  paths: [.pnpm-store/, .turbo/]

install:
  stage: install
  script:
    - pnpm install --frozen-lockfile
  artifacts:
    paths: [node_modules/, packages/*/node_modules/, apps/*/node_modules/]
    expire_in: 1h

build:affected:
  stage: build
  needs: [install]
  script:
    - pnpm turbo run build --filter=...[origin/main]
  artifacts:
    paths: [apps/*/dist/, apps/*/.next/, packages/*/dist/]

test:affected:
  stage: test
  needs: [build:affected]
  script:
    - pnpm turbo run test --filter=...[origin/main]
  coverage: '/Statements\s*:\s*(\d+\.?\d*)%/'
  artifacts:
    reports:
      coverage_report:
        coverage_format: cobertura
        path: "**/coverage/cobertura-coverage.xml"
```

---
## Publishing with Changesets

```bash
# Install changesets
pnpm add -D @changesets/cli -w
pnpm changeset init

# After making changes, create a changeset
pnpm changeset
# Interactive: select packages, choose semver bump, write changelog entry

# In CI — version packages + update changelogs
pnpm changeset version

# Publish all changed packages
pnpm changeset publish

# Pre-release channel (for alpha/beta)
pnpm changeset pre enter beta
pnpm changeset
pnpm changeset version        # produces 1.2.0-beta.0
pnpm changeset publish --tag beta
pnpm changeset pre exit       # back to stable releases
```
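The bump a changeset applies is plain semver arithmetic. A simplified sketch (stable releases only; pre-release channels append a suffix like `-beta.0` instead):

```javascript
// Apply a semver bump the way a version step does for a stable release.
// level is 'major', 'minor' or 'patch'.
function bump(version, level) {
  const [major, minor, patch] = version.split('.').map(Number)
  if (level === 'major') return `${major + 1}.0.0`
  if (level === 'minor') return `${major}.${minor + 1}.0`
  return `${major}.${minor}.${patch + 1}`
}

console.log(bump('1.2.3', 'minor')) // 1.3.0
console.log(bump('1.2.3', 'major')) // 2.0.0
```

When several changesets touch the same package, the highest requested level wins, which is why you never hand-edit the version field.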
### Automated publish workflow (GitHub Actions)

```yaml
# .github/workflows/release.yml
name: Release

on:
  push:
    branches: [main]

jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v3
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          registry-url: https://registry.npmjs.org

      - run: pnpm install --frozen-lockfile

      - name: Create Release PR or Publish
        uses: changesets/action@v1
        with:
          publish: pnpm changeset publish
          version: pnpm changeset version
          commit: "chore: release packages"
          title: "chore: release packages"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
```

---
## Common Pitfalls

| Pitfall | Fix |
|---|---|
| Running `turbo run build` without `--filter` on every PR | Always use `--filter=...[origin/main]` in CI |
| `workspace:*` refs cause publish failures | Use `pnpm changeset publish` — it replaces `workspace:*` with real versions automatically |
| All packages rebuild when an unrelated file changes | Tune `inputs` in turbo.json to exclude docs and config files from cache keys |
| Shared tsconfig causes one package to break all type-checks | Use `extends` properly — each package extends root but overrides `rootDir` / `outDir` |
| git history lost during migration | Use `git filter-repo --to-subdirectory-filter` before merging — never move files manually |
| Remote cache not working in CI | Check TURBO_TOKEN and TURBO_TEAM env vars; verify with `turbo run build --summarize` |
| CLAUDE.md too generic — Claude modifies wrong package | Add explicit "When working on X, only touch files in apps/X" rules per package CLAUDE.md |
---

## Best Practices

1. **Root CLAUDE.md defines the map** — document every package, its purpose, and dependency rules
2. **Per-package CLAUDE.md defines the rules** — what's allowed, what's forbidden, testing commands
3. **Always scope commands with --filter** — running everything on every change defeats the purpose
4. **Remote cache is not optional** — without it, monorepo CI is slower than multi-repo CI
5. **Changesets over manual versioning** — never hand-edit package.json versions in a monorepo
6. **Shared configs in root, extended in packages** — tsconfig.base.json, .eslintrc.base.js, jest.base.config.js
7. **Impact analysis before merging shared package changes** — run the affected check and communicate the blast radius
8. **Keep packages/types as pure TypeScript** — no runtime code, no dependencies, fast to build and type-check
---

*engineering/performance-profiler/SKILL.md — new file, 621 lines*
# Performance Profiler

**Tier:** POWERFUL
**Category:** Engineering
**Domain:** Performance Engineering

---

## Overview

Systematic performance profiling for Node.js, Python, and Go applications. Identifies CPU, memory, and I/O bottlenecks; generates flamegraphs; analyzes bundle sizes; optimizes database queries; detects memory leaks; and runs load tests with k6 and Artillery. Always measures before and after.

## Core Capabilities

- **CPU profiling** — flamegraphs for Node.js, py-spy for Python, pprof for Go
- **Memory profiling** — heap snapshots, leak detection, GC pressure
- **Bundle analysis** — webpack-bundle-analyzer, Next.js bundle analyzer
- **Database optimization** — EXPLAIN ANALYZE, slow query log, N+1 detection
- **Load testing** — k6 scripts, Artillery scenarios, ramp-up patterns
- **Before/after measurement** — establish baseline, profile, optimize, verify

---
## When to Use

- App is slow and you don't know where the bottleneck is
- P99 latency exceeds SLA before a release
- Memory usage grows over time (suspected leak)
- Bundle size increased after adding dependencies
- Preparing for a traffic spike (load test before launch)
- Database queries taking >100ms

---

## Golden Rule: Measure First

```bash
# Establish a baseline BEFORE any optimization
# Record: P50, P95, P99 latency | RPS | error rate | memory usage

# Wrong: "I think the N+1 query is slow, let me fix it"
# Right: Profile → confirm bottleneck → fix → measure again → verify improvement
```
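For reference, the percentile figures in a baseline are simple to compute from raw samples. A sketch using the nearest-rank method (load-testing tools may interpolate differently):

```javascript
// Nearest-rank percentile over a latency sample (milliseconds).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length)
  return sorted[Math.max(0, rank - 1)]
}

const latencies = [12, 15, 18, 22, 30, 45, 80, 120, 400, 950]
console.log(percentile(latencies, 50)) // 30
console.log(percentile(latencies, 95)) // 950
console.log(percentile(latencies, 99)) // 950
```

Note how one 950ms outlier dominates both P95 and P99 here: tail percentiles are far more sensitive than averages, which is why baselines record them.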
---

## Node.js Profiling

### CPU Flamegraph

```bash
# Method 1: clinic.js (best for development)
npm install -g clinic

# CPU flamegraph
clinic flame -- node dist/server.js

# Heap profiler
clinic heapprofiler -- node dist/server.js

# Bubbleprof (async flow / event loop delays)
clinic bubbleprof -- node dist/server.js

# Generate load while profiling: start the server under clinic,
# then run autocannon from a second shell
clinic flame -- node dist/server.js
autocannon -c 50 -d 30 http://localhost:3000/api/tasks
```

```bash
# Method 2: Node.js built-in profiler
node --prof dist/server.js
# After running some load:
node --prof-process isolate-*.log | head -100
```

```bash
# Method 3: V8 CPU profiler via inspector
node --inspect dist/server.js
# Open Chrome DevTools → Performance → Record
```
### Heap Snapshot / Memory Leak Detection

```javascript
// Add to your server for on-demand heap snapshots
import v8 from 'v8'

// Endpoint: POST /debug/heap-snapshot (protect with auth!)
app.post('/debug/heap-snapshot', (req, res) => {
  const filename = `heap-${Date.now()}.heapsnapshot`
  const snapshot = v8.writeHeapSnapshot(filename)
  res.json({ snapshot })
})
```

```bash
# Take snapshots over time and compare in Chrome DevTools
curl -X POST http://localhost:3000/debug/heap-snapshot
# Wait 5 minutes of load
curl -X POST http://localhost:3000/debug/heap-snapshot
# Open both snapshots in Chrome → Memory → Compare
```
### Detect Event Loop Blocking

```javascript
// Add blocked-at to detect synchronous blocking
import blocked from 'blocked-at'

blocked((time, stack) => {
  console.warn(`Event loop blocked for ${time}ms`)
  console.warn(stack.join('\n'))
}, { threshold: 100 })  // Alert if blocked > 100ms
```
### Node.js Memory Profiling Script

```javascript
// scripts/memory-profile.mjs
// Run: node --expose-gc scripts/memory-profile.mjs

function formatBytes(bytes) {
  return (bytes / 1024 / 1024).toFixed(2) + ' MB'
}

function measureMemory(label) {
  const mem = process.memoryUsage()
  console.log(`\n[${label}]`)
  console.log(`  RSS:        ${formatBytes(mem.rss)}`)
  console.log(`  Heap Used:  ${formatBytes(mem.heapUsed)}`)
  console.log(`  Heap Total: ${formatBytes(mem.heapTotal)}`)
  console.log(`  External:   ${formatBytes(mem.external)}`)
  return mem
}

const baseline = measureMemory('Baseline')

// Simulate your operation
for (let i = 0; i < 1000; i++) {
  // Replace with your actual operation
  const result = await someOperation()
}

const after = measureMemory('After 1000 operations')

console.log(`\n[Delta]`)
console.log(`  Heap Used: +${formatBytes(after.heapUsed - baseline.heapUsed)}`)

// If the heap keeps growing across GC cycles, you have a leak
global.gc?.()  // requires the --expose-gc flag
const afterGC = measureMemory('After GC')
if (afterGC.heapUsed > baseline.heapUsed * 1.1) {
  console.warn('⚠️ Possible memory leak detected (>10% growth after GC)')
}
```

---
## Python Profiling

### CPU Profiling with py-spy

```bash
# Install
pip install py-spy

# Profile a running process (no code changes needed)
py-spy top --pid $(pgrep -f "uvicorn")

# Generate flamegraph SVG
py-spy record -o flamegraph.svg --pid $(pgrep -f "uvicorn") --duration 30

# Profile from the start
py-spy record -o flamegraph.svg -- python -m uvicorn app.main:app

# Open flamegraph.svg in a browser — wide bars = hot code paths
```
### cProfile for function-level profiling

```python
# scripts/profile_endpoint.py
import cProfile
import pstats
import io
from app.services.task_service import TaskService

def run():
    service = TaskService()
    for _ in range(100):
        service.list_tasks(user_id="user_1", page=1, limit=20)

profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()

# Print top 20 functions by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative')
stats.print_stats(20)
print(stream.getvalue())
```
### Memory profiling with memory_profiler

```python
# pip install memory-profiler
from memory_profiler import profile

@profile
def my_function():
    # Function to profile
    data = load_large_dataset()
    result = process(data)
    return result
```

```bash
# Run with line-by-line memory tracking
python -m memory_profiler scripts/profile_function.py

# Output:
# Line #    Mem usage    Increment   Line Contents
# ================================================
#     10     45.3 MiB     45.3 MiB   def my_function():
#     11     78.1 MiB     32.8 MiB       data = load_large_dataset()
#     12    156.2 MiB     78.1 MiB       result = process(data)
```

---
## Go Profiling with pprof

```go
// main.go — add pprof endpoints
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof"
)

func main() {
    // pprof endpoints at /debug/pprof/
    go func() {
        log.Println(http.ListenAndServe(":6060", nil))
    }()
    // ... rest of your app
}
```

```bash
# CPU profile (30s)
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile?seconds=30

# Memory profile
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap

# Goroutine leak detection
curl http://localhost:6060/debug/pprof/goroutine?debug=1

# In the pprof UI: "Flame Graph" view → find the tallest bars
```

---
## Bundle Size Analysis

### Next.js Bundle Analyzer

```bash
# Install
pnpm add -D @next/bundle-analyzer
```

```javascript
// next.config.js
const withBundleAnalyzer = require('@next/bundle-analyzer')({
  enabled: process.env.ANALYZE === 'true',
})
module.exports = withBundleAnalyzer({})
```

```bash
# Run analyzer
ANALYZE=true pnpm build
# Opens browser with a treemap of the bundle
```
### What to look for

```bash
# Find the largest chunks
pnpm build 2>&1 | grep -E "^\s+(λ|○|●)" | sort -k4 -rh | head -20

# Check if a specific package is too large
# Visit: https://bundlephobia.com/package/moment@2.29.4
# moment: 67.9kB gzipped → replace with date-fns (13.8kB) or dayjs (6.9kB)

# Find duplicate packages
pnpm dedupe --check

# Visualize what's in a chunk
npx source-map-explorer .next/static/chunks/*.js
```
### Common bundle wins

```typescript
// Before: import entire lodash
import _ from 'lodash'                  // 71kB

// After: import only what you need
import debounce from 'lodash/debounce'  // 2kB

// Before: moment.js
import moment from 'moment'             // 67kB

// After: dayjs
import dayjs from 'dayjs'               // 7kB

// Before: static import (always in bundle)
import HeavyChart from '@/components/HeavyChart'

// After: dynamic import (loaded on demand)
const HeavyChart = dynamic(() => import('@/components/HeavyChart'), {
  loading: () => <Skeleton />,
})
```
---

## Database Query Optimization

### Find slow queries

```sql
-- PostgreSQL: enable pg_stat_statements
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 20 slowest queries
SELECT
  round(mean_exec_time::numeric, 2) AS mean_ms,
  calls,
  round(total_exec_time::numeric, 2) AS total_ms,
  round(stddev_exec_time::numeric, 2) AS stddev_ms,
  left(query, 80) AS query
FROM pg_stat_statements
WHERE calls > 10
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Reset stats
SELECT pg_stat_statements_reset();
```

```bash
# MySQL slow query log
mysql -e "SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 0.1;"
tail -f /var/log/mysql/slow-query.log
```
### EXPLAIN ANALYZE

```sql
-- Always use EXPLAIN (ANALYZE, BUFFERS) for real timing
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT t.*, u.name AS assignee_name
FROM tasks t
LEFT JOIN users u ON u.id = t.assignee_id
WHERE t.project_id = 'proj_123'
  AND t.deleted_at IS NULL
ORDER BY t.created_at DESC
LIMIT 20;

-- Look for:
--   Seq Scan on a large table → needs an index
--   Nested Loop with high rows → N+1, consider a JOIN or batching
--   Sort → can an index handle the sort?
--   Hash Join → fine for moderate sizes
```
### Detect N+1 Queries

```typescript
// Drizzle: enable query logging in dev
const db = drizzle(pool, { logger: true })

// Prisma: count queries via the query event
let queryCount = 0
prisma.$on('query', () => queryCount++)

// In tests:
queryCount = 0
const tasks = await getTasksWithAssignees(projectId)
expect(queryCount).toBe(1)  // Fail if it's 21 (1 + 20 N+1s)
```

```python
# Django: detect N+1 with django-silk or nplusone
MIDDLEWARE = ['nplusone.ext.django.middleware.NPlusOneMiddleware']
NPLUSONE_RAISE = True  # Raise an exception on N+1 in tests
```
### Fix N+1 — Before/After

```typescript
// Before: N+1 (1 query for tasks + N queries for assignees)
const tasks = await db.select().from(tasksTable)
for (const task of tasks) {
  task.assignee = await db.select().from(usersTable)
    .where(eq(usersTable.id, task.assigneeId))
    .then(r => r[0])
}

// After: 1 query with a JOIN
const tasks = await db
  .select({
    id: tasksTable.id,
    title: tasksTable.title,
    assigneeName: usersTable.name,
    assigneeEmail: usersTable.email,
  })
  .from(tasksTable)
  .leftJoin(usersTable, eq(usersTable.id, tasksTable.assigneeId))
  .where(eq(tasksTable.projectId, projectId))
```

---
## Load Testing with k6
|
||||
|
||||
```javascript
|
||||
// tests/load/api-load-test.js
|
||||
import http from 'k6/http'
|
||||
import { check, sleep } from 'k6'
|
||||
import { Rate, Trend } from 'k6/metrics'
|
||||
|
||||
const errorRate = new Rate('errors')
|
||||
const taskListDuration = new Trend('task_list_duration')
|
||||
|
||||
export const options = {
  stages: [
    { duration: '30s', target: 10 },  // Ramp up to 10 VUs
    { duration: '1m', target: 50 },   // Ramp to 50 VUs
    { duration: '2m', target: 50 },   // Sustain 50 VUs
    { duration: '30s', target: 100 }, // Spike to 100 VUs
    { duration: '1m', target: 50 },   // Back to 50
    { duration: '30s', target: 0 },   // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'], // 95% of requests < 500ms, 99% < 1s
    errors: ['rate<0.01'],             // Error rate < 1%
    task_list_duration: ['p(95)<200'], // Task list specifically < 200ms
  },
}

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000'

export function setup() {
  // Get auth token once
  const loginRes = http.post(`${BASE_URL}/api/auth/login`, JSON.stringify({
    email: 'loadtest@example.com',
    password: 'loadtest123',
  }), { headers: { 'Content-Type': 'application/json' } })

  return { token: loginRes.json('token') }
}

export default function (data) {
  const headers = {
    'Authorization': `Bearer ${data.token}`,
    'Content-Type': 'application/json',
  }

  // Scenario 1: List tasks
  const start = Date.now()
  const listRes = http.get(`${BASE_URL}/api/tasks?limit=20`, { headers })
  taskListDuration.add(Date.now() - start)

  check(listRes, {
    'list tasks: status 200': (r) => r.status === 200,
    'list tasks: has items': (r) => r.json('items') !== undefined,
  }) || errorRate.add(1)

  sleep(0.5)

  // Scenario 2: Create task
  const createRes = http.post(
    `${BASE_URL}/api/tasks`,
    JSON.stringify({ title: `Load test task ${Date.now()}`, priority: 'medium' }),
    { headers }
  )

  check(createRes, {
    'create task: status 201': (r) => r.status === 201,
  }) || errorRate.add(1)

  sleep(1)
}

export function teardown(data) {
  // Cleanup: delete load test tasks
}
```
```bash
# Run load test
k6 run tests/load/api-load-test.js \
  --env BASE_URL=https://staging.myapp.com

# With Grafana output
k6 run --out influxdb=http://localhost:8086/k6 tests/load/api-load-test.js
```
---

## Before/After Measurement Template

```markdown
## Performance Optimization: [What You Fixed]

**Date:** 2026-03-01
**Engineer:** @username
**Ticket:** PROJ-123

### Problem
[1-2 sentences: what was slow, how was it observed]

### Root Cause
[What the profiler revealed]

### Baseline (Before)
| Metric | Value |
|--------|-------|
| P50 latency | 480ms |
| P95 latency | 1,240ms |
| P99 latency | 3,100ms |
| RPS @ 50 VUs | 42 |
| Error rate | 0.8% |
| DB queries/req | 23 (N+1) |

Profiler evidence: [link to flamegraph or screenshot]

### Fix Applied
[What changed — code diff or description]

### After
| Metric | Before | After | Delta |
|--------|--------|-------|-------|
| P50 latency | 480ms | 48ms | -90% |
| P95 latency | 1,240ms | 120ms | -90% |
| P99 latency | 3,100ms | 280ms | -91% |
| RPS @ 50 VUs | 42 | 380 | +804% |
| Error rate | 0.8% | 0% | -100% |
| DB queries/req | 23 | 1 | -96% |

### Verification
Load test run: [link to k6 output]
```

---
## Optimization Checklist

### Quick wins (check these first)

```
Database
□ Missing indexes on WHERE/ORDER BY columns
□ N+1 queries (check query count per request)
□ Loading all columns when only 2-3 needed (SELECT *)
□ No LIMIT on unbounded queries
□ Missing connection pool (creating new connection per request)

Node.js
□ Sync I/O (fs.readFileSync) in hot path
□ JSON.parse/stringify of large objects in hot loop
□ Missing caching for expensive computations
□ No compression (gzip/brotli) on responses
□ Dependencies loaded in request handler (move to module level)

Bundle
□ Moment.js → dayjs/date-fns
□ Lodash (full) → lodash/function imports
□ Static imports of heavy components → dynamic imports
□ Images not optimized / not using next/image
□ No code splitting on routes

API
□ No pagination on list endpoints
□ No response caching (Cache-Control headers)
□ Serial awaits that could be parallel (Promise.all)
□ Fetching related data in a loop instead of JOIN
```

---
## Common Pitfalls

- **Optimizing without measuring** — you'll optimize the wrong thing
- **Testing in development** — profile against production-like data volumes
- **Ignoring P99** — P50 can look fine while P99 is catastrophic
- **Premature optimization** — fix correctness first, then performance
- **Not re-measuring** — always verify the fix actually improved things
- **Load testing production** — use staging with production-size data

---

## Best Practices

1. **Baseline first, always** — record metrics before touching anything
2. **One change at a time** — isolate the variable to confirm causation
3. **Profile with realistic data** — 10 rows in dev vs millions in prod expose different bottlenecks
4. **Set performance budgets** — enforce `p(95) < 200ms` in CI via k6 thresholds
5. **Monitor continuously** — add Datadog/Prometheus metrics for key paths
6. **Cache invalidation strategy** — cache aggressively, invalidate precisely
7. **Document the win** — before/after numbers in the PR description motivate the team
379
engineering/pr-review-expert/SKILL.md
Normal file
@@ -0,0 +1,379 @@
# PR Review Expert

**Tier:** POWERFUL
**Category:** Engineering
**Domain:** Code Review / Quality Assurance

---

## Overview

Structured, systematic code review for GitHub PRs and GitLab MRs. Goes beyond style nits — this skill performs blast radius analysis, security scanning, breaking change detection, and test coverage delta calculation. Produces a reviewer-ready report with a 30+ item checklist and prioritized findings.

---

## Core Capabilities

- **Blast radius analysis** — trace which files, services, and downstream consumers could break
- **Security scan** — SQL injection, XSS, auth bypass, secret exposure, dependency vulns
- **Test coverage delta** — new code vs new tests ratio
- **Breaking change detection** — API contracts, DB schema migrations, config keys
- **Ticket linking** — verify Jira/Linear ticket exists and matches scope
- **Performance impact** — N+1 queries, bundle size regression, memory allocations

---

## When to Use

- Before merging any PR/MR that touches shared libraries, APIs, or DB schema
- When a PR is large (>200 lines changed) and needs structured review
- Onboarding new contributors whose PRs need thorough feedback
- Security-sensitive code paths (auth, payments, PII handling)
- After an incident — review similar PRs proactively

---
## Fetching the Diff

### GitHub (gh CLI)
```bash
# View diff in terminal
gh pr diff <PR_NUMBER>

# Get PR metadata (title, body, labels, linked issues)
gh pr view <PR_NUMBER> --json title,body,labels,assignees,milestone

# List files changed
gh pr diff <PR_NUMBER> --name-only

# Check CI status
gh pr checks <PR_NUMBER>

# Download diff to file for analysis
gh pr diff <PR_NUMBER> > /tmp/pr-<PR_NUMBER>.diff
```

### GitLab (glab CLI)
```bash
# View MR diff
glab mr diff <MR_IID>

# MR details as JSON
glab mr view <MR_IID> --output json

# List changed files
glab mr diff <MR_IID> --name-only

# Download diff
glab mr diff <MR_IID> > /tmp/mr-<MR_IID>.diff
```

---
## Workflow

### Step 1 — Fetch Context

```bash
PR=123
gh pr view $PR --json title,body,labels,milestone,assignees | jq .
gh pr diff $PR --name-only
gh pr diff $PR > /tmp/pr-$PR.diff
```

### Step 2 — Blast Radius Analysis

For each changed file, identify:

1. **Direct dependents** — who imports this file?
   ```bash
   # Find all files importing a changed module
   grep -r "from ['\"].*changed-module['\"]" src/ --include="*.ts" -l
   grep -r "require(['\"].*changed-module" src/ --include="*.js" -l

   # Python
   grep -r "from changed_module import\|import changed_module" . --include="*.py" -l
   ```

2. **Service boundaries** — does this change cross a service?
   ```bash
   # Check if changed files span multiple services (monorepo)
   gh pr diff $PR --name-only | cut -d/ -f1-2 | sort -u
   ```

3. **Shared contracts** — types, interfaces, schemas
   ```bash
   gh pr diff $PR --name-only | grep -E "types/|interfaces/|schemas/|models/"
   ```

**Blast radius severity:**
- CRITICAL — shared library, DB model, auth middleware, API contract
- HIGH — service used by >3 others, shared config, env vars
- MEDIUM — single service internal change, utility function
- LOW — UI component, test file, docs
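The severity buckets above can be sketched as a small triage helper. The path patterns below are purely illustrative assumptions — adapt them to your repo's actual layout:

```shell
# Hypothetical path-based triage (adjust patterns to your conventions)
classify() {
  case "$1" in
    */shared/*|*/auth/*|*schema*|*contract*) echo "CRITICAL" ;;
    *config*|*.env*)                         echo "HIGH" ;;
    *.test.*|docs/*|*.md)                    echo "LOW" ;;
    *)                                       echo "MEDIUM" ;;
  esac
}

classify "lib/auth/middleware.ts"   # CRITICAL
classify "src/utils/format.ts"      # MEDIUM
classify "docs/setup.md"            # LOW
```

Pipe `gh pr diff $PR --name-only` through a loop over `classify` and take the highest bucket as the PR's overall severity.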
### Step 3 — Security Scan

```bash
DIFF=/tmp/pr-$PR.diff

# SQL injection — raw query string interpolation
grep -n "query\|execute\|raw(" $DIFF | grep -E '\$\{|f"|%s|format\('

# Hardcoded secrets
grep -nE "(password|secret|api_key|token|private_key)\s*=\s*['\"][^'\"]{8,}" $DIFF

# AWS key pattern
grep -nE "AKIA[0-9A-Z]{16}" $DIFF

# JWT secret in code
grep -nE "jwt\.sign\(.*['\"][^'\"]{20,}['\"]" $DIFF

# XSS vectors
grep -n "dangerouslySetInnerHTML\|innerHTML\s*=" $DIFF

# Auth bypass patterns
grep -n "bypass\|skip.*auth\|noauth\|TODO.*auth" $DIFF

# Insecure hash algorithms
grep -nE "md5\(|sha1\(|createHash\(['\"]md5|createHash\(['\"]sha1" $DIFF

# eval / exec
grep -nE "\beval\(|\bexec\(|\bsubprocess\.call\(" $DIFF

# Prototype pollution
grep -n "__proto__\|constructor\[" $DIFF

# Path traversal risk
grep -nE "path\.join\(.*req\.|readFile\(.*req\." $DIFF
```
### Step 4 — Test Coverage Delta

```bash
# Count source vs test files changed
CHANGED_SRC=$(gh pr diff $PR --name-only | grep -vE "\.test\.|\.spec\.|__tests__")
CHANGED_TESTS=$(gh pr diff $PR --name-only | grep -E "\.test\.|\.spec\.|__tests__")

echo "Source files changed: $(echo "$CHANGED_SRC" | wc -w)"
echo "Test files changed: $(echo "$CHANGED_TESTS" | wc -w)"

# Lines of new logic vs new test lines
LOGIC_LINES=$(grep "^+" /tmp/pr-$PR.diff | grep -v "^+++" | wc -l)
echo "New lines added: $LOGIC_LINES"

# Run coverage locally
npm test -- --coverage --changedSince=main 2>/dev/null | tail -20
pytest --cov --cov-report=term-missing 2>/dev/null | tail -20
```

**Coverage delta rules:**
- New function without tests → flag
- Deleted tests without deleted code → flag
- Coverage drop >5% → block merge
- Auth/payments paths → require 100% coverage
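The first rule can be checked mechanically. A minimal sketch, run here against an inline demo file list (in practice, substitute `gh pr diff $PR --name-only` for the assumed file names):

```shell
# Demo input — in practice: changed_files=$(gh pr diff $PR --name-only)
changed_files="src/api/users.ts
src/services/billing.ts
src/services/billing.test.ts"

# Count source files vs test files among the changes
src=$(printf '%s\n' "$changed_files" | grep -cvE '\.test\.|\.spec\.|__tests__')
tests=$(printf '%s\n' "$changed_files" | grep -cE '\.test\.|\.spec\.|__tests__')

echo "source=$src tests=$tests"
if [ "$src" -gt 0 ] && [ "$tests" -eq 0 ]; then
  echo "FLAG: source changed with no test changes"
fi
```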
### Step 5 — Breaking Change Detection

#### API Contract Changes
```bash
# OpenAPI/Swagger spec changes
grep -n "openapi\|swagger" /tmp/pr-$PR.diff | head -20

# REST route removals or renames
grep "^-" /tmp/pr-$PR.diff | grep -E "router\.(get|post|put|delete|patch)\("

# GraphQL schema removals
grep "^-" /tmp/pr-$PR.diff | grep -E "^-\s*(type |field |Query |Mutation )"

# TypeScript interface removals
grep "^-" /tmp/pr-$PR.diff | grep -E "^-\s*(export\s+)?(interface|type) "
```

#### DB Schema Changes
```bash
# Migration files added
gh pr diff $PR --name-only | grep -E "migrations?/|alembic/|knex/"

# Destructive operations
grep -E "DROP TABLE|DROP COLUMN|ALTER.*NOT NULL|TRUNCATE" /tmp/pr-$PR.diff

# Index removals (perf regression risk)
grep "DROP INDEX\|remove_index" /tmp/pr-$PR.diff
```

#### Config / Env Var Changes
```bash
# New env vars referenced in code (might be missing in prod)
grep "^+" /tmp/pr-$PR.diff | grep -oE "process\.env\.[A-Z_]+" | sort -u

# Removed env vars (could break running instances)
grep "^-" /tmp/pr-$PR.diff | grep -oE "process\.env\.[A-Z_]+" | sort -u
```

### Step 6 — Performance Impact

```bash
# N+1 query patterns (DB calls inside loops) — filter to added lines first
grep "^+" /tmp/pr-$PR.diff | grep -n "\.find\|\.findOne\|\.query\|db\." | head -20
# Then check surrounding context for forEach/map/for loops

# Heavy new dependencies
grep "^+" /tmp/pr-$PR.diff | grep -E '"[a-z@].*":\s*"[0-9^~]' | head -20

# Unbounded loops
grep "^+" /tmp/pr-$PR.diff | grep -n "while (true\|while(true"

# Missing await (accidentally sequential promises)
grep "^+" /tmp/pr-$PR.diff | grep -n "await.*await" | head -10

# Large in-memory allocations
grep "^+" /tmp/pr-$PR.diff | grep -n "new Array([0-9]\{4,\}\|Buffer\.alloc"
```
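The "check surrounding context" step can itself be automated: grep with a few lines of leading context, then look for loop constructs in that context. A self-contained demo on a fabricated two-line diff hunk (the file path and code are invented for illustration):

```shell
# Build a tiny demo diff so the pipeline is reproducible
cat > /tmp/demo.diff <<'EOF'
+  results.map(async (r) => {
+    const user = await db.findOne({ id: r.userId })
+  })
EOF

# DB call with a loop construct within 3 lines above it → N+1 suspect
hits=$(grep -B3 "findOne" /tmp/demo.diff | grep -cE '\.map\(|forEach|for \(')
echo "loop-context hits: $hits"
```

Run the same pipeline against the real `/tmp/pr-$PR.diff` and review each hit by hand — context grepping over a diff is a heuristic, not proof of an N+1.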
---

## Ticket Linking Verification

```bash
# Extract ticket references from PR body
gh pr view $PR --json body | jq -r '.body' | \
  grep -oE "(PROJ-[0-9]+|[A-Z]+-[0-9]+|https://linear\.app/[^)\"]+)" | sort -u

# Verify Jira ticket exists (requires JIRA_API_TOKEN)
TICKET="PROJ-123"
curl -s -u "user@company.com:$JIRA_API_TOKEN" \
  "https://your-org.atlassian.net/rest/api/3/issue/$TICKET" | \
  jq '{key, summary: .fields.summary, status: .fields.status.name}'

# Linear ticket
LINEAR_ID="abc-123"
curl -s -H "Authorization: $LINEAR_API_KEY" \
  -H "Content-Type: application/json" \
  --data "{\"query\": \"{ issue(id: \\\"$LINEAR_ID\\\") { title state { name } } }\"}" \
  https://api.linear.app/graphql | jq .
```

---
## Complete Review Checklist (30+ Items)

```markdown
## Code Review Checklist

### Scope & Context
- [ ] PR title accurately describes the change
- [ ] PR description explains WHY, not just WHAT
- [ ] Linked Jira/Linear ticket exists and matches scope
- [ ] No unrelated changes (scope creep)
- [ ] Breaking changes documented in PR body

### Blast Radius
- [ ] Identified all files importing changed modules
- [ ] Cross-service dependencies checked
- [ ] Shared types/interfaces/schemas reviewed for breakage
- [ ] New env vars documented in .env.example
- [ ] DB migrations are reversible (have down() / rollback)

### Security
- [ ] No hardcoded secrets or API keys
- [ ] SQL queries use parameterized inputs (no string interpolation)
- [ ] User inputs validated/sanitized before use
- [ ] Auth/authorization checks on all new endpoints
- [ ] No XSS vectors (innerHTML, dangerouslySetInnerHTML)
- [ ] New dependencies checked for known CVEs
- [ ] No sensitive data in logs (PII, tokens, passwords)
- [ ] File uploads validated (type, size, content-type)
- [ ] CORS configured correctly for new endpoints

### Testing
- [ ] New public functions have unit tests
- [ ] Edge cases covered (empty, null, max values)
- [ ] Error paths tested (not just happy path)
- [ ] Integration tests for API endpoint changes
- [ ] No tests deleted without clear reason
- [ ] Test names clearly describe what they verify

### Breaking Changes
- [ ] No API endpoints removed without deprecation notice
- [ ] No required fields added to existing API responses
- [ ] No DB columns removed without two-phase migration plan
- [ ] No env vars removed that may be set in production
- [ ] Backward-compatible for external API consumers

### Performance
- [ ] No N+1 query patterns introduced
- [ ] DB indexes added for new query patterns
- [ ] No unbounded loops on potentially large datasets
- [ ] No heavy new dependencies without justification
- [ ] Async operations correctly awaited
- [ ] Caching considered for expensive repeated operations

### Code Quality
- [ ] No dead code or unused imports
- [ ] Error handling present (no bare empty catch blocks)
- [ ] Consistent with existing patterns and conventions
- [ ] Complex logic has explanatory comments
- [ ] No unresolved TODOs (or tracked in ticket)
```

---
## Output Format

Structure your review comment as:

```
## PR Review: [PR Title] (#NUMBER)

Blast Radius: HIGH — changes lib/auth used by 5 services
Security: 1 finding (medium severity)
Tests: Coverage delta +2%
Breaking Changes: None detected

--- MUST FIX (Blocking) ---

1. SQL injection risk in src/db/users.ts:42
   Raw string interpolation in WHERE clause.
   Fix: db.query("SELECT * WHERE id = $1", [userId])

--- SHOULD FIX (Non-blocking) ---

2. Missing auth check on POST /api/admin/reset
   No role verification before destructive operation.

--- SUGGESTIONS ---

3. N+1 pattern in src/services/reports.ts:88
   findUser() called inside results.map() — batch with findManyUsers(ids)

--- LOOKS GOOD ---
- Test coverage for new auth flow is thorough
- DB migration has proper down() rollback method
- Error handling consistent with rest of codebase
```

---
## Common Pitfalls

- **Reviewing style over substance** — let the linter handle style; focus on logic, security, correctness
- **Missing blast radius** — a 5-line change in a shared utility can break 20 services
- **Approving untested happy paths** — always verify error paths have coverage
- **Ignoring migration risk** — NOT NULL additions need a default or two-phase migration
- **Indirect secret exposure** — secrets in error messages/logs, not just hardcoded values
- **Skipping large PRs** — if a PR is too large to review properly, request it be split

---

## Best Practices

1. Read the linked ticket before looking at code — context prevents false positives
2. Check CI status before reviewing — don't review code that fails to build
3. Prioritize blast radius and security over style
4. Reproduce locally for non-trivial auth or performance changes
5. Label each comment clearly: "nit:", "must:", "question:", "suggestion:"
6. Batch all comments in one review round — don't trickle feedback
7. Acknowledge good patterns, not just problems — specific praise improves culture
410
engineering/runbook-generator/SKILL.md
Normal file
@@ -0,0 +1,410 @@
# Runbook Generator

**Tier:** POWERFUL
**Category:** Engineering
**Domain:** DevOps / Site Reliability Engineering

---

## Overview

Analyze a codebase and generate production-grade operational runbooks. Detects your stack (CI/CD, database, hosting, containers), then produces step-by-step runbooks with copy-paste commands, verification checks, rollback procedures, escalation paths, and time estimates. Keeps runbooks fresh with staleness detection linked to config file modification dates.

---

## Core Capabilities

- **Stack detection** — auto-identify CI/CD, database, hosting, orchestration from repo files
- **Runbook types** — deployment, incident response, database maintenance, scaling, monitoring setup
- **Format discipline** — numbered steps, copy-paste commands, ✅ verification checks, time estimates
- **Escalation paths** — L1 → L2 → L3 with contact info and decision criteria
- **Rollback procedures** — every deployment step has a corresponding undo
- **Staleness detection** — runbook sections reference config files; flag when source changes
- **Testing methodology** — dry-run framework for staging validation, quarterly review cadence

---

## When to Use

Use when:
- A codebase has no runbooks and you need to bootstrap them fast
- Existing runbooks are outdated or incomplete (point at the repo, regenerate)
- Onboarding a new engineer who needs clear operational procedures
- Preparing for an incident response drill or audit
- Setting up monitoring and on-call rotation from scratch

Skip when:
- The system is too early-stage to have stable operational patterns
- Runbooks already exist and only need minor updates (edit directly)

---

## Stack Detection

When given a repo, scan for these signals before writing a single runbook line:
```bash
# CI/CD
ls .github/workflows/        → GitHub Actions
ls .gitlab-ci.yml            → GitLab CI
ls Jenkinsfile               → Jenkins
ls .circleci/                → CircleCI
ls bitbucket-pipelines.yml   → Bitbucket Pipelines

# Database
grep -r "postgresql\|postgres\|pg" package.json pyproject.toml  → PostgreSQL
grep -r "mysql\|mariadb" package.json                           → MySQL
grep -r "mongodb\|mongoose" package.json                        → MongoDB
grep -r "redis" package.json                                    → Redis
ls prisma/schema.prisma      → Prisma ORM (check provider field)
ls drizzle.config.*          → Drizzle ORM

# Hosting
ls vercel.json               → Vercel
ls railway.toml              → Railway
ls fly.toml                  → Fly.io
ls .ebextensions/            → AWS Elastic Beanstalk
ls terraform/ *.tf           → Custom AWS/GCP/Azure (check provider)
ls kubernetes/ k8s/          → Kubernetes
ls docker-compose.yml        → Docker Compose

# Framework
ls next.config.*             → Next.js
ls nuxt.config.*             → Nuxt
ls svelte.config.*           → SvelteKit
cat package.json | jq '.scripts'  → Check build/start commands
```

Map detected stack → runbook templates. A Next.js + PostgreSQL + Vercel + GitHub Actions repo needs:
- Deployment runbook (Vercel + GitHub Actions)
- Database runbook (PostgreSQL backup, migration, vacuum)
- Incident response (with Vercel logs + pg query debugging)
- Monitoring setup (Vercel Analytics, pg_stat, alerting)
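The signal table above can be bootstrapped with a small probe function. A sketch under the same assumed file conventions — here demonstrated against a throwaway directory so the output is deterministic; in practice, run it at the repo root and extend the checks:

```shell
# Deterministic demo: probe a scratch directory instead of a real repo
repo=$(mktemp -d)
cd "$repo"
mkdir -p .github/workflows
touch vercel.json

detect_stack() {
  [ -d .github/workflows ] && echo "ci: github-actions"
  [ -f .gitlab-ci.yml ] && echo "ci: gitlab"
  [ -f vercel.json ] && echo "hosting: vercel"
  [ -f fly.toml ] && echo "hosting: fly.io"
  [ -f prisma/schema.prisma ] && echo "orm: prisma"
  [ -f docker-compose.yml ] && echo "containers: docker-compose"
  return 0
}

detect_stack
```

For the demo directory this prints the CI and hosting lines only; each detected signal maps to one runbook template to generate.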
---

## Runbook Types

### 1. Deployment Runbook

```markdown
# Deployment Runbook — [App Name]
**Stack:** Next.js 14 + PostgreSQL 15 + Vercel
**Last verified:** 2025-03-01
**Source configs:** vercel.json (modified: git log -1 --format=%ci -- vercel.json)
**Owner:** Platform Team
**Est. total time:** 15–25 min

---

## Pre-deployment Checklist
- [ ] All PRs merged to main
- [ ] CI passing on main (GitHub Actions green)
- [ ] Database migrations tested in staging
- [ ] Rollback plan confirmed

## Steps

### Step 1 — Run CI checks locally (3 min)
```bash
pnpm test
pnpm lint
pnpm build
```
✅ Expected: All pass with 0 errors. Build output in `.next/`

### Step 2 — Apply database migrations (5 min)
```bash
# Staging first
DATABASE_URL=$STAGING_DATABASE_URL npx prisma migrate deploy
```
✅ Expected: `All migrations have been successfully applied.`

```bash
# Verify migration applied
psql $STAGING_DATABASE_URL -c "\d" | grep -i migration
```
✅ Expected: Migration table shows new entry with today's date

### Step 3 — Deploy to production (5 min)
```bash
git push origin main
# OR trigger manually:
vercel --prod
```
✅ Expected: Vercel dashboard shows deployment in progress. URL format:
`https://app-name-<hash>-team.vercel.app`

### Step 4 — Smoke test production (5 min)
```bash
# Health check
curl -sf https://your-app.vercel.app/api/health | jq .

# Critical path
curl -sf https://your-app.vercel.app/api/users/me \
  -H "Authorization: Bearer $TEST_TOKEN" | jq '.id'
```
✅ Expected: health returns `{"status":"ok","db":"connected"}`. Users API returns valid ID.

### Step 5 — Monitor for 10 min
- Check Vercel Functions log for errors: `vercel logs --since=10m`
- Check error rate in Vercel Analytics: < 1% 5xx
- Check DB connection pool: `SELECT count(*) FROM pg_stat_activity;` (< 80% of max_connections)

---

## Rollback

If smoke tests fail or error rate spikes:

```bash
# Instant rollback via Vercel (preferred — < 30 sec)
vercel rollback [previous-deployment-url]

# Database rollback (only if migration was applied)
DATABASE_URL=$PROD_DATABASE_URL npx prisma migrate reset --skip-seed
# WARNING: This resets to previous migration. Confirm data impact first.
```

✅ Expected after rollback: Previous deployment URL becomes active. Verify with smoke test.

---

## Escalation
- **L1 (on-call engineer):** Check Vercel logs, run smoke tests, attempt rollback
- **L2 (platform lead):** DB issues, data loss risk, rollback failed — Slack: @platform-lead
- **L3 (CTO):** Production down > 30 min, data breach — PagerDuty: #critical-incidents
```

---
### 2. Incident Response Runbook

```markdown
# Incident Response Runbook
**Severity levels:** P1 (down), P2 (degraded), P3 (minor)
**Est. total time:** P1: 30–60 min, P2: 1–4 hours

## Phase 1 — Triage (5 min)

### Confirm the incident
```bash
# Is the app responding?
curl -sw "%{http_code}" https://your-app.vercel.app/api/health -o /dev/null

# Check Vercel function errors (last 15 min)
vercel logs --since=15m | grep -i "error\|exception\|5[0-9][0-9]"
```
✅ 200 = app up. 5xx or timeout = incident confirmed.

Declare severity:
- Site completely down → P1 — page L2/L3 immediately
- Partial degradation / slow responses → P2 — notify team channel
- Single feature broken → P3 — create ticket, fix in business hours

---

## Phase 2 — Diagnose (10–15 min)

```bash
# Recent deployments — did something just ship?
vercel ls --limit=5

# Database health
psql $DATABASE_URL -c "SELECT pid, state, wait_event, query FROM pg_stat_activity WHERE state != 'idle' LIMIT 20;"

# Long-running queries (> 30 sec)
psql $DATABASE_URL -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' AND now() - pg_stat_activity.query_start > interval '30 seconds';"

# Connection pool saturation
psql $DATABASE_URL -c "SELECT count(*), max_conn FROM pg_stat_activity, (SELECT setting::int AS max_conn FROM pg_settings WHERE name='max_connections') t GROUP BY max_conn;"
```

Diagnostic decision tree:
- Recent deploy + new errors → rollback (see Deployment Runbook)
- DB query timeout / pool saturation → kill long queries, scale connections
- External dependency failing → check status pages, add circuit breaker
- Memory/CPU spike → check Vercel function logs for infinite loops

---

## Phase 3 — Mitigate (variable)

```bash
# Kill a runaway DB query
psql $DATABASE_URL -c "SELECT pg_terminate_backend(<pid>);"

# Scale DB connections (Supabase/Neon — adjust pool size)
# Vercel → Settings → Environment Variables → update DATABASE_POOL_MAX

# Enable maintenance mode (if you have a feature flag)
vercel env add MAINTENANCE_MODE true production
vercel --prod  # redeploy with flag
```

---

## Phase 4 — Resolve & Postmortem

After the incident is resolved, within 24 hours:

1. Write incident timeline (what happened, when, who noticed, what fixed it)
2. Identify root cause (5 Whys)
3. Define action items with owners and due dates
4. Update this runbook if a step was missing or wrong
5. Add monitoring/alert that would have caught this earlier

**Postmortem template:** `docs/postmortems/YYYY-MM-DD-incident-title.md`

---

## Escalation Path

| Level | Who | When | Contact |
|-------|-----|------|---------|
| L1 | On-call engineer | Always first | PagerDuty rotation |
| L2 | Platform lead | DB issues, rollback needed | Slack @platform-lead |
| L3 | CTO/VP Eng | P1 > 30 min, data loss | Phone + PagerDuty |
```

---
### 3. Database Maintenance Runbook

```markdown
# Database Maintenance Runbook — PostgreSQL
**Schedule:** Weekly vacuum (automated), monthly manual review

## Backup

```bash
# Full backup
pg_dump $DATABASE_URL \
  --format=custom \
  --compress=9 \
  --file="backup-$(date +%Y%m%d-%H%M%S).dump"
```
✅ Expected: File created, size > 0. `pg_restore --list backup.dump | head -20` shows tables.

Verify backup is restorable (test monthly):
```bash
pg_restore --dbname=$STAGING_DATABASE_URL backup.dump
psql $STAGING_DATABASE_URL -c "SELECT count(*) FROM users;"
```
✅ Expected: Row count matches production.

## Migration

```bash
# Always test in staging first
DATABASE_URL=$STAGING_DATABASE_URL npx prisma migrate deploy
# Verify, then:
DATABASE_URL=$PROD_DATABASE_URL npx prisma migrate deploy
```
✅ Expected: `All migrations have been successfully applied.`

⚠️ For large table migrations (> 1M rows), use `pg_repack` or add the column and its DEFAULT in separate steps to avoid table locks.

## Vacuum & Reindex

```bash
# Check bloat before deciding
psql $DATABASE_URL -c "
SELECT schemaname, tablename,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
  n_dead_tup, n_live_tup,
  ROUND(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_ratio
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC LIMIT 10;"

# Vacuum high-bloat tables (non-blocking)
psql $DATABASE_URL -c "VACUUM ANALYZE users;"
psql $DATABASE_URL -c "VACUUM ANALYZE events;"

# Reindex (use CONCURRENTLY to avoid locks)
psql $DATABASE_URL -c "REINDEX INDEX CONCURRENTLY users_email_idx;"
```
✅ Expected: dead_ratio drops below 5% after vacuum.
```

---
## Staleness Detection

Add a staleness header to every runbook:

```markdown
## Staleness Check
This runbook references the following config files. If they've changed since the
"Last verified" date, review the affected steps.

| Config File | Last Modified | Affects Steps |
|-------------|---------------|---------------|
| vercel.json | `git log -1 --format=%ci -- vercel.json` | Step 3, Rollback |
| prisma/schema.prisma | `git log -1 --format=%ci -- prisma/schema.prisma` | Step 2, DB Maintenance |
| .github/workflows/deploy.yml | `git log -1 --format=%ci -- .github/workflows/deploy.yml` | Step 1, Step 3 |
| docker-compose.yml | `git log -1 --format=%ci -- docker-compose.yml` | All scaling steps |
```

**Automation:** Add a CI job that runs weekly and comments on the runbook doc if any referenced file was modified more recently than the runbook's "Last verified" date.
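The core of that CI job is a date comparison. A minimal sketch, assuming ISO `YYYY-MM-DD` dates (which sort lexicographically, so plain string comparison is enough) and illustrative file names:

```shell
is_stale() { [ "$1" \> "$2" ]; }   # is_stale <last-modified> <last-verified>

last_verified="2025-03-01"
for f in vercel.json prisma/schema.prisma; do
  # %cs = committer date, short format (YYYY-MM-DD); empty outside a git repo
  modified=$(git log -1 --format=%cs -- "$f" 2>/dev/null || true)
  if [ -n "$modified" ] && is_stale "$modified" "$last_verified"; then
    echo "STALE: $f changed on $modified"
  fi
done
```

Wire the loop's output into a PR comment or Slack webhook; parse `last_verified` out of the runbook header rather than hardcoding it.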
|

---

## Runbook Testing Methodology

### Dry-Run in Staging

Before trusting a runbook in production, validate every step in staging:
```bash
# 1. Create a staging environment mirror
vercel env pull .env.staging
source .env.staging

# 2. Run each step with staging credentials
#    Replace all $DATABASE_URL with $STAGING_DATABASE_URL
#    Replace all production URLs with staging URLs

# 3. Verify expected outputs match
#    Document any discrepancies and update the runbook

# 4. Time each step, then update the estimates in the runbook
time npx prisma migrate deploy
```

### Quarterly Review Cadence

Schedule a 1-hour review every quarter:

1. **Run each command** in staging: does it still work?
2. **Check config drift**: compare "Last Modified" dates against "Last verified"
3. **Test rollback procedures**: actually roll back in staging
4. **Update contact info**: L1/L2/L3 escalation contacts may have changed
5. **Add new failure modes** discovered in the past quarter
6. **Update the "Last verified" date** at the top of the runbook
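The cadence is easier to keep if a scheduled job opens the review ticket automatically. A sketch using the GitHub CLI; it assumes an authenticated `gh`, and the checklist text simply mirrors the review steps above.

```shell
#!/usr/bin/env bash
# Open a tracking issue for the quarterly runbook review (run from a scheduled CI job).
set -euo pipefail

quarter_of() {  # quarter_of <month 01..12> -> 1..4
  echo $(( (${1#0} - 1) / 3 + 1 ))
}

title="Runbook review $(date +%Y)-Q$(quarter_of "$(date +%m)")"
body="Dry-run every command in staging, check config drift, test rollbacks in staging, update L1/L2/L3 contacts, add new failure modes, bump 'Last verified'."

echo "$title"
# Uncomment in CI once gh is authenticated:
# gh issue create --title "$title" --body "$body" --label runbooks
```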
---

## Common Pitfalls

| Pitfall | Fix |
|---|---|
| Commands that require manual copying of dynamic values | Use env vars: `$DATABASE_URL`, not `postgres://user:pass@host/db` |
| No expected output specified | Add ✅ with the exact expected string after every verification step |
| Rollback steps missing | Every destructive step needs a corresponding undo |
| Runbooks that never get tested | Schedule quarterly staging dry-runs in the team calendar |
| L3 escalation contact is the former CTO | Review contact info every quarter |
| Migration runbook doesn't mention table locks | Call out lock risk explicitly for large-table operations |
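The first two pitfalls are mechanically checkable. A lint sketch; the `docs/runbooks/` layout and the placeholder patterns are assumptions, so tune the regexes to your own conventions.

```shell
#!/usr/bin/env bash
# Flag runbooks that contain hardcoded connection strings or placeholder text.
set -uo pipefail
shopt -s nullglob

lint_runbook() {  # lint_runbook <file>; returns 1 if a copy-paste hazard is found
  local f=$1 rc=0
  if grep -nE 'postgres(ql)?://[^$[:space:]]+:[^$[:space:]]+@' "$f"; then
    echo "FAIL: $f has a hardcoded connection string (use \$DATABASE_URL)"
    rc=1
  fi
  if grep -nE '<your-|YOUR_|REPLACE_ME' "$f"; then
    echo "FAIL: $f has placeholder text"
    rc=1
  fi
  return $rc
}

status=0
for f in docs/runbooks/*.md; do
  lint_runbook "$f" || status=1
done
# In CI, end with: exit $status
```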

---

## Best Practices

1. **Every command must be copy-pasteable**: no placeholder text; use env vars
2. **✅ after every step**: state the exact expected output, not "it should work"
3. **Time estimates are mandatory**: engineers need to know whether they can fix the issue before the SLA breaches
4. **Rollback before you deploy**: plan the undo before executing
5. **Runbooks live in the repo**: `docs/runbooks/`, versioned with the code they describe
6. **Postmortem → runbook update**: every incident should improve a runbook
7. **Link, don't duplicate**: reference the canonical config file instead of copying its contents into the runbook
8. **Test runbooks like you test code**: untested runbooks are worse than no runbooks (false confidence)
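
Point 8 can be made concrete: extract the fenced `bash` blocks from a runbook so each command can be replayed against staging. A sketch; the helper name and replay workflow are illustrative, and it prints commands rather than executing them so a human stays in the loop.

```shell
#!/usr/bin/env bash
# Print the contents of every bash fence in a runbook, ready for a staging replay.
set -euo pipefail

extract_bash_blocks() {  # extract_bash_blocks <runbook.md>
  awk '
    /^```bash[[:space:]]*$/ { inblock = 1; next }
    /^```[[:space:]]*$/     { inblock = 0; next }
    inblock                 { print }
  ' "$1"
}

# Review each command, swap production values for staging ones, then run:
# extract_bash_blocks docs/runbooks/deploy.md | less
```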