---
title: "TDD Guide — Agent Skill & Codex Plugin"
description: "Test-driven development skill for writing unit tests, generating test fixtures and mocks, analyzing coverage gaps, and guiding red-green-refactor. Agent skill for Claude Code, Codex CLI, Gemini CLI, OpenClaw."
---

# TDD Guide

<div class="page-meta" markdown>
<span class="meta-badge">:material-code-braces: Engineering - Core</span>
<span class="meta-badge">:material-identifier: `tdd-guide`</span>
<span class="meta-badge">:material-github: <a href="https://github.com/alirezarezvani/claude-skills/tree/main/engineering-team/tdd-guide/SKILL.md">Source</a></span>
</div>

<div class="install-banner" markdown>
<span class="install-label">Install:</span> <code>claude /plugin install engineering-skills</code>
</div>

Test-driven development skill for generating tests, analyzing coverage, and guiding red-green-refactor workflows across Jest, Pytest, JUnit, and Vitest.

---

## Workflows

### Generate Tests from Code

1. Provide source code (TypeScript, JavaScript, Python, Java)
2. Specify target framework (Jest, Pytest, JUnit, Vitest)
3. Run `test_generator.py` with requirements
4. Review generated test stubs
5. **Validation:** Tests compile and cover happy path, error cases, edge cases

### Analyze Coverage Gaps

1. Generate coverage report from test runner (`npm test -- --coverage`)
2. Run `coverage_analyzer.py` on LCOV/JSON/XML report
3. Review prioritized gaps (P0/P1/P2)
4. Generate missing tests for uncovered paths
5. **Validation:** Coverage meets target threshold (typically 80%+)

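
To make the coverage steps above more concrete, here is a minimal sketch of reading an LCOV report and comparing line coverage with a threshold. It is not the code of `coverage_analyzer.py`, whose internals are not shown in this guide. The function names and the sample report are invented for illustration; only the `SF:`, `DA:`, and `end_of_record` records of the LCOV tracefile format are relied on.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical sample report: two files, five instrumented lines in total.
SAMPLE_LCOV = """\
SF:src/cart.py
DA:1,4
DA:2,4
DA:3,0
DA:4,0
end_of_record
SF:src/utils.py
DA:1,1
end_of_record
"""


def parse_lcov(text: str) -> dict[str, dict[int, int]]:
    """Map each source file to {line_number: execution_count}."""
    hits: dict[str, dict[int, int]] = defaultdict(dict)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            current = line[3:]
        elif line.startswith("DA:") and current is not None:
            fields = line[3:].split(",")
            line_no, count = int(fields[0]), int(fields[1])
            # A line may be listed more than once; accumulate its counts.
            hits[current][line_no] = hits[current].get(line_no, 0) + count
        elif line == "end_of_record":
            current = None
    return dict(hits)


def line_coverage(hits: dict[str, dict[int, int]]) -> float:
    """Percentage of instrumented lines executed at least once."""
    total = sum(len(lines) for lines in hits.values())
    covered = sum(1 for lines in hits.values() for c in lines.values() if c > 0)
    return 100.0 * covered / total if total else 100.0


def uncovered_lines(hits: dict[str, dict[int, int]]) -> dict[str, list[int]]:
    """Files that have at least one unexecuted line, with those line numbers."""
    gaps = {}
    for path, lines in hits.items():
        missing = sorted(n for n, c in lines.items() if c == 0)
        if missing:
            gaps[path] = missing
    return gaps


report = parse_lcov(SAMPLE_LCOV)
meets_threshold = line_coverage(report) >= 80.0
```

With the sample data, three of five lines are executed, so the 80% check fails and `uncovered_lines(report)` points at lines 3 and 4 of `src/cart.py` as the next tests to write.
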
### TDD New Feature

1. Write failing test first (RED)
2. Run `tdd_workflow.py --phase red` to validate
3. Implement minimal code to pass (GREEN)
4. Run `tdd_workflow.py --phase green` to validate
5. Refactor while keeping tests green (REFACTOR)
6. **Validation:** All tests pass after each cycle

---

## Examples

### Test Generation — Input → Output (Pytest)

**Input source function (`math_utils.py`):**
```python
def divide(a: float, b: float) -> float:
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return a / b
```

**Command:**
```bash
python scripts/test_generator.py --input math_utils.py --framework pytest
```

**Generated test output (`test_math_utils.py`):**
```python
import pytest
from math_utils import divide

class TestDivide:
    def test_divide_positive_numbers(self):
        assert divide(10, 2) == 5.0

    def test_divide_negative_numerator(self):
        assert divide(-10, 2) == -5.0

    def test_divide_float_result(self):
        # approx(0.333, rel=1e-3) would reject 1/3: the difference (about 3.33e-4)
        # is slightly larger than the tolerance (3.33e-4 exactly).
        assert divide(1, 3) == pytest.approx(1 / 3)

    def test_divide_by_zero_raises_value_error(self):
        with pytest.raises(ValueError, match="Cannot divide by zero"):
            divide(10, 0)

    def test_divide_zero_numerator(self):
        assert divide(0, 5) == 0.0
```

---

### Coverage Analysis — Sample P0/P1/P2 Output

**Command:**
```bash
python scripts/coverage_analyzer.py --report lcov.info --threshold 80
```

**Sample output:**
```
Coverage Report — Overall: 63% (threshold: 80%)

P0 — Critical gaps (uncovered error paths):
  auth/login.py:42-58          handle_expired_token()              0% covered
  payments/process.py:91-110   handle_payment_failure()            0% covered

P1 — High-value gaps (core logic branches):
  users/service.py:77          update_profile() — else branch      0% covered
  orders/cart.py:134           apply_discount() — zero-qty guard   0% covered

P2 — Low-risk gaps (utility / helper functions):
  utils/formatting.py:12       format_currency()                   0% covered

Recommended: Generate tests for P0 items first to reach 80% threshold.
```

---

## Key Tools

| Tool | Purpose | Usage |
|------|---------|-------|
| `test_generator.py` | Generate test cases from code/requirements | `python scripts/test_generator.py --input source.py --framework pytest` |
| `coverage_analyzer.py` | Parse and analyze coverage reports | `python scripts/coverage_analyzer.py --report lcov.info --threshold 80` |
| `tdd_workflow.py` | Guide red-green-refactor cycles | `python scripts/tdd_workflow.py --phase red --test test_auth.py` |
| `fixture_generator.py` | Generate test data and mocks | `python scripts/fixture_generator.py --entity User --count 5` |

Additional scripts: `framework_adapter.py` (convert between frameworks), `metrics_calculator.py` (quality metrics), `format_detector.py` (detect language/framework), `output_formatter.py` (CLI/desktop/CI output).

---

## Input Requirements

**For Test Generation:**
- Source code (file path or pasted content)
- Target framework (Jest, Pytest, JUnit, Vitest)
- Coverage scope (unit, integration, edge cases)

**For Coverage Analysis:**
- Coverage report file (LCOV, JSON, or XML format)
- Optional: Source code for context
- Optional: Target threshold percentage

**For TDD Workflow:**
- Feature requirements or user story
- Current phase (RED, GREEN, REFACTOR)
- Test code and implementation status

---

## Spec-First Workflow

TDD is most effective when driven by a written spec. The flow:

1. **Write or receive a spec** — stored in `specs/<feature>.md`
2. **Extract acceptance criteria** — each criterion becomes one or more test cases
3. **Write failing tests (RED)** — at least one test per acceptance criterion
4. **Implement minimal code (GREEN)** — satisfy each test in order
5. **Refactor** — clean up while all tests stay green

### Spec Directory Convention

```
project/
├── specs/
│   ├── user-auth.md              # Feature spec with acceptance criteria
│   ├── payment-processing.md
│   └── notification-system.md
├── tests/
│   ├── test_user_auth.py         # Tests derived from specs/user-auth.md
│   ├── test_payments.py
│   └── test_notifications.py
└── src/
```

### Extracting Tests from Specs

Each acceptance criterion in a spec maps to at least one test:

| Spec Criterion | Test Case |
|---------------|-----------|
| "User can log in with valid credentials" | `test_login_valid_credentials_returns_token` |
| "Invalid password returns 401" | `test_login_invalid_password_returns_401` |
| "Account locks after 5 failed attempts" | `test_login_locks_after_five_failures` |

**Tip:** Number your acceptance criteria in the spec. Reference the number in the test's docstring or a leading comment for traceability (`# AC-3: Account locks after 5 failed attempts`).

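
As a sketch of that tip, the test below carries its criterion number in the docstring. `LoginService`, `AccountLocked`, and the password values are hypothetical stand-ins defined inline so the example runs on its own; they are not part of this skill or of any real auth library.

```python
class AccountLocked(Exception):
    """Raised when an account has too many failed logins (hypothetical)."""


class LoginService:
    """Tiny in-memory stand-in for a real auth service (hypothetical)."""

    MAX_FAILURES = 5

    def __init__(self, password: str) -> None:
        self._password = password
        self._failures = 0

    def login(self, password: str) -> bool:
        if self._failures >= self.MAX_FAILURES:
            raise AccountLocked("too many failed attempts")
        if password != self._password:
            self._failures += 1
            return False
        self._failures = 0
        return True


def test_login_locks_after_five_failures():
    """AC-3: Account locks after 5 failed attempts."""
    service = LoginService(password="correct-horse")
    for _ in range(5):
        assert service.login("wrong") is False
    # The sixth attempt is rejected even with the right password.
    try:
        service.login("correct-horse")
    except AccountLocked:
        locked = True
    else:
        locked = False
    assert locked
```

Under pytest the `try`/`except` would normally be written as `with pytest.raises(AccountLocked):`. A search such as `grep -rn "AC-3" tests/` then lists every test tied to that criterion.
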
> **Cross-reference:** See `engineering/spec-driven-workflow` for the full spec methodology, including spec templates and review checklists.

---

## Red-Green-Refactor Examples Per Language

### TypeScript / Jest

```typescript
// test/cart.test.ts
import { Cart } from "../src/cart"; // assumed module path; adjust to your project

describe("Cart", () => {
  describe("addItem", () => {
    it("should add a new item to an empty cart", () => {
      const cart = new Cart();
      cart.addItem({ id: "sku-1", name: "Widget", price: 9.99, qty: 1 });

      expect(cart.items).toHaveLength(1);
      expect(cart.items[0].id).toBe("sku-1");
    });

    it("should increment quantity when adding an existing item", () => {
      const cart = new Cart();
      cart.addItem({ id: "sku-1", name: "Widget", price: 9.99, qty: 1 });
      cart.addItem({ id: "sku-1", name: "Widget", price: 9.99, qty: 2 });

      expect(cart.items).toHaveLength(1);
      expect(cart.items[0].qty).toBe(3);
    });

    it("should throw when quantity is zero or negative", () => {
      const cart = new Cart();
      expect(() =>
        cart.addItem({ id: "sku-1", name: "Widget", price: 9.99, qty: 0 })
      ).toThrow("Quantity must be positive");
    });
  });
});
```

### Python / Pytest (Advanced Patterns)

```python
# tests/conftest.py — shared fixtures
import pytest
from app.db import create_engine, Session

@pytest.fixture(scope="session")
def db_engine():
    engine = create_engine("sqlite:///:memory:")
    yield engine
    engine.dispose()

@pytest.fixture
def db_session(db_engine):
    session = Session(bind=db_engine)
    yield session
    session.rollback()
    session.close()

# tests/test_pricing.py — parametrize for multiple cases
import pytest
from app.pricing import calculate_discount

@pytest.mark.parametrize("subtotal, expected_discount", [
    (50.0, 0.0),      # Below threshold — no discount
    (100.0, 5.0),     # 5% tier
    (250.0, 25.0),    # 10% tier
    (500.0, 75.0),    # 15% tier
])
def test_calculate_discount(subtotal, expected_discount):
    assert calculate_discount(subtotal) == pytest.approx(expected_discount)
```

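
The parametrized test above is the RED step. A minimal GREEN step might look like the sketch below. The tier boundaries (100, 250, 500) and rates are inferred from the four test rows only; the real `app.pricing` module is not shown in this guide, so treat the thresholds and the `DISCOUNT_TIERS` name as assumptions.

```python
# Hypothetical app/pricing.py: the smallest implementation that satisfies
# the four parametrized cases. Tier boundaries are assumed from the test data.
DISCOUNT_TIERS = [
    (500.0, 0.15),  # 15% tier
    (250.0, 0.10),  # 10% tier
    (100.0, 0.05),  # 5% tier
]


def calculate_discount(subtotal: float) -> float:
    """Return the discount amount for a subtotal, using the highest tier reached."""
    for minimum, rate in DISCOUNT_TIERS:
        if subtotal >= minimum:
            return subtotal * rate
    return 0.0
```

Once the tests pass, the REFACTOR step can change how the tiers are stored or loaded while the same four cases keep guarding the behavior.
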
### Go — Table-Driven Tests

```go
// cart_test.go
package cart

import (
	"math"
	"testing"
)

func TestApplyDiscount(t *testing.T) {
	tests := []struct {
		name     string
		subtotal float64
		want     float64
	}{
		{"no discount below threshold", 50.0, 0.0},
		{"5 percent tier", 100.0, 5.0},
		{"10 percent tier", 250.0, 25.0},
		{"15 percent tier", 500.0, 75.0},
		{"zero subtotal", 0.0, 0.0},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got := ApplyDiscount(tt.subtotal)
			// Compare floats with a small tolerance rather than ==.
			if math.Abs(got-tt.want) > 1e-9 {
				t.Errorf("ApplyDiscount(%v) = %v, want %v", tt.subtotal, got, tt.want)
			}
		})
	}
}
```

---

## Bounded Autonomy Rules

When generating tests autonomously, follow these rules to decide when to stop and ask the user:

### Stop and Ask When

- **Ambiguous requirements** — the spec or user story has conflicting or unclear acceptance criteria
- **Missing edge cases** — you cannot determine boundary values without domain knowledge (e.g., max allowed transaction amount)
- **Test count exceeds 50** — large test suites need human review before committing; present a summary and ask which areas to prioritize
- **External dependencies unclear** — the feature relies on third-party APIs or services with undocumented behavior
- **Security-sensitive logic** — authentication, authorization, encryption, or payment flows require human sign-off on test scenarios

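
One way to make the stop conditions checkable is a small function that returns the reasons to pause. This is a sketch only: `GenerationContext` and all of its field names are hypothetical and invented for this example; the limit of 50 tests is the only value taken from the rules above.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_TESTS_WITHOUT_REVIEW = 50  # from the "Test count exceeds 50" rule


@dataclass
class GenerationContext:
    """Facts gathered before generating tests (all field names are hypothetical)."""

    ambiguous_requirements: bool = False
    unknown_boundary_values: bool = False
    planned_test_count: int = 0
    undocumented_external_deps: bool = False
    security_sensitive: bool = False


def stop_reasons(ctx: GenerationContext) -> list[str]:
    """Return every rule that says to stop and ask; an empty list means continue."""
    reasons = []
    if ctx.ambiguous_requirements:
        reasons.append("ambiguous requirements")
    if ctx.unknown_boundary_values:
        reasons.append("missing edge cases")
    if ctx.planned_test_count > MAX_TESTS_WITHOUT_REVIEW:
        reasons.append("test count exceeds 50")
    if ctx.undocumented_external_deps:
        reasons.append("external dependencies unclear")
    if ctx.security_sensitive:
        reasons.append("security-sensitive logic")
    return reasons
```

Returning the full list of reasons, rather than a single boolean, lets the question put to the user name each rule that triggered.
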
### Continue Autonomously When

- **Clear spec with numbered acceptance criteria** — each criterion maps directly to tests
- **Straightforward CRUD operations** — create, read, update, delete with well-defined models
- **Well-defined API contracts** — OpenAPI spec or typed interfaces available
- **Pure functions** — deterministic input/output with no side effects
- **Existing test patterns** — the codebase already has similar tests to follow

---

## Property-Based Testing

Property-based testing generates random inputs to verify invariants instead of relying on hand-picked examples. Use it when the input space is large and the expected behavior can be described as a property.

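
To show the idea without any library, here is a hand-rolled sketch that uses only the standard `random` module to check two invariants of sorting. It illustrates the principle and is not a substitute for dedicated libraries, which add features such as shrinking a failing input to a minimal case. The helper name `check_sort_properties` is made up for this example.

```python
import random
from collections import Counter


def check_sort_properties(sort_fn, trials: int = 200, seed: int = 0) -> None:
    """Run sort_fn on random integer lists and assert two invariants."""
    rng = random.Random(seed)  # a fixed seed keeps any failure reproducible
    for _ in range(trials):
        data = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 30))]
        result = sort_fn(list(data))
        # Property 1: the output is in non-decreasing order.
        assert all(a <= b for a, b in zip(result, result[1:])), data
        # Property 2: the output holds exactly the same elements as the input.
        assert Counter(result) == Counter(data), data


# The built-in sorted() satisfies both properties for every generated list.
check_sort_properties(sorted)
```

Neither assertion names a specific expected list; each states something that must hold for any input, which is what separates a property from an example.
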
### Python — Hypothesis

```python
from hypothesis import given, strategies as st
from app.serializers import serialize, deserialize

@given(st.text())
def test_roundtrip_serialization(data):
    """Serialization followed by deserialization returns the original."""
    assert deserialize(serialize(data)) == data

@given(st.integers(), st.integers())
def test_addition_is_commutative(a, b):
    assert a + b == b + a
```

### TypeScript — fast-check

```typescript
import fc from "fast-check";
import { encode, decode } from "./codec";

test("encode/decode roundtrip", () => {
  fc.assert(
    fc.property(fc.string(), (input) => {
      expect(decode(encode(input))).toBe(input);
    })
  );
});
```

### When to Use Property-Based Over Example-Based

| Use Property-Based For | Example |
|-------------------|---------|
| Data transformations | Serialize/deserialize roundtrips |
| Mathematical properties | Commutativity, associativity, idempotency |
| Encoding/decoding | Base64, URL encoding, compression |
| Sorting and filtering | Output is sorted, length preserved |
| Parser correctness | Valid input always parses without error |

---

## Mutation Testing

Mutation testing modifies your production code (creates "mutants") and checks whether your tests catch the changes. If a mutant survives (tests still pass), your tests have a gap that coverage alone cannot reveal.

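
A hand-made illustration of a surviving mutant, assuming a hypothetical `is_adult` function. Mutation tools generate and run such variants automatically; here the mutant is written out by hand to show why a test on the boundary value is what kills it.

```python
def is_adult(age: int) -> bool:
    """Original code under test (hypothetical)."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """Mutant: the operator >= was changed to >."""
    return age > 18


def weak_suite(fn) -> bool:
    """Passes when fn is right for values far from the boundary."""
    return fn(30) is True and fn(5) is False


def boundary_suite(fn) -> bool:
    """Adds a check on the boundary value itself."""
    return weak_suite(fn) and fn(18) is True


# Both suites pass on the original code, and both execute every line of it.
assert weak_suite(is_adult) and boundary_suite(is_adult)
# The mutant survives the weak suite: the tests cannot tell it from the original.
assert weak_suite(is_adult_mutant)
# The boundary test kills the mutant.
assert not boundary_suite(is_adult_mutant)
```

Both suites reach full line coverage of `is_adult`, yet only the second one distinguishes the original from the mutant.
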
### Tools

| Language | Tool | Command |
|----------|------|---------|
| TypeScript/JavaScript | **Stryker** | `npx stryker run` |
| Python | **mutmut** | `mutmut run --paths-to-mutate=src/` |
| Java | **PIT** | `mvn org.pitest:pitest-maven:mutationCoverage` |

### Why Mutation Testing Matters

- **100% line coverage != good tests** — coverage tells you code was executed, not that it was verified
- **Catches weak assertions** — tests that run code but assert nothing meaningful
- **Finds missing boundary tests** — mutants that change `<` to `<=` expose off-by-one gaps
- **Quantifiable quality metric** — mutation score (% mutants killed) is a stronger signal than coverage %

**Recommendation:** Run mutation testing on critical paths (auth, payments, data processing) even if overall coverage is high. Target 85%+ mutation score on P0 modules.

---

## Cross-References

| Skill | Relationship |
|-------|-------------|
| `engineering/spec-driven-workflow` | Spec → acceptance criteria → test extraction pipeline |
| `engineering-team/focused-fix` | Phase 5 (Verify) uses TDD to confirm the fix with a regression test |
| `engineering-team/senior-qa` | Broader QA strategy; TDD is one layer in the test pyramid |
| `engineering-team/code-reviewer` | Review generated tests for assertion quality and coverage completeness |
| `engineering-team/senior-fullstack` | Project scaffolders include testing infrastructure compatible with TDD workflows |

---

## Limitations

| Scope | Details |
|-------|---------|
| Unit test focus | Integration and E2E tests require different patterns |
| Static analysis | Cannot execute tests or measure runtime behavior |
| Language support | Best for TypeScript, JavaScript, Python, Java |
| Report formats | LCOV, JSON, XML only; other formats need conversion |
| Generated tests | Provide scaffolding; require human review for complex logic |

**When to use other tools:**
- E2E testing: Playwright, Cypress, Selenium
- Performance testing: k6, JMeter, Locust
- Security testing: OWASP ZAP, Burp Suite