feat(engineering): add browser-automation and spec-driven-workflow skills

browser-automation (564-line SKILL.md, 3 scripts, 3 references):
- Web scraping, form filling, screenshot capture, data extraction
- Anti-detection patterns, cookie/session management, dynamic content
- scraping_toolkit.py, form_automation_builder.py, anti_detection_checker.py
- NOT testing (that's playwright-pro); this is automation & scraping

spec-driven-workflow (586-line SKILL.md, 3 scripts, 3 references):
- Spec-first development: write spec BEFORE code
- Bounded autonomy rules, 6-phase workflow, self-review checklist
- spec_generator.py, spec_validator.py, test_extractor.py
- Pairs with tdd-guide for red-green-refactor after spec

Updated engineering plugin.json (31 → 33 skills).
Added both to mkdocs.yml nav and generated docs pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 12:57:18 +01:00
parent 7a2189fa21
commit 97952ccbee
19 changed files with 7379 additions and 3 deletions
--- a/engineering/spec-driven-workflow/SKILL.md
+++ b/engineering/spec-driven-workflow/SKILL.md
@@ -0,0 +1,586 @@
+---
+name: "spec-driven-workflow"
+description: "Use when the user asks to write specs before code, define acceptance criteria, plan features before implementation, generate tests from specifications, or follow spec-first development practices."
+---
+
+# Spec-Driven Workflow — POWERFUL
+
+## Overview
+
+Spec-driven workflow enforces a single, non-negotiable rule: **write the specification BEFORE you write any code.** Not alongside. Not after. Before.
+
+This is not documentation. This is a contract. A spec defines what the system MUST do, what it SHOULD do, and what it explicitly WILL NOT do. Every line of code you write traces back to a requirement in the spec. Every test traces back to an acceptance criterion. If it is not in the spec, it does not get built.
+
+### Why Spec-First Matters
+
+1. **Eliminates rework.** 60-80% of defects originate from requirements, not implementation. Catching ambiguity in a spec costs minutes; catching it in production costs days.
+2. **Forces clarity.** If you cannot write what the system should do in plain language, you do not understand the problem well enough to write code.
+3. **Enables parallelism.** Once a spec is approved, frontend, backend, QA, and documentation can all start simultaneously.
+4. **Creates accountability.** The spec is the definition of done. No arguments about whether a feature is "complete" — either it satisfies the acceptance criteria or it does not.
+5. **Feeds TDD directly.** Acceptance criteria in Given/When/Then format translate 1:1 into test cases. The spec IS the test plan.
+
+### The Iron Law
+
+```
+NO CODE WITHOUT AN APPROVED SPEC.
+NO EXCEPTIONS. NO "QUICK PROTOTYPES." NO "I'LL DOCUMENT IT LATER."
+```
+
+If the spec is not written, reviewed, and approved, implementation does not begin. Period.
+
+---
+
+## The Spec Format
+
+Every spec follows this structure. No sections are optional — if a section does not apply, write "N/A — [reason]" so reviewers know it was considered, not forgotten.
+
+### 1. Title and Context
+
+```markdown
+# Spec: [Feature Name]
+
+**Author:** [name]
+**Date:** [ISO 8601]
+**Status:** Draft | In Review | Approved | Superseded
+**Reviewers:** [list]
+**Related specs:** [links]
+
+## Context
+
+[Why does this feature exist? What problem does it solve? What is the business
+motivation? Include links to user research, support tickets, or metrics that
+justify this work. 2-4 paragraphs maximum.]
+```
+
+### 2. Functional Requirements (RFC 2119)
+
+Use RFC 2119 keywords precisely:
+
+| Keyword | Meaning |
+|---------|---------|
+| **MUST** | Absolute requirement. Failing this means the implementation is non-conformant. |
+| **MUST NOT** | Absolute prohibition. Doing this means the implementation is broken. |
+| **SHOULD** | Recommended. May be omitted with documented justification. |
+| **SHOULD NOT** | Discouraged. May be included with documented justification. |
+| **MAY** | Optional. Purely at the implementer's discretion. |
+
+```markdown
+## Functional Requirements
+
+- FR-1: The system MUST authenticate users via OAuth 2.0 PKCE flow.
+- FR-2: The system MUST reject tokens older than 24 hours.
+- FR-3: The system SHOULD support refresh token rotation.
+- FR-4: The system MAY cache user profiles for up to 5 minutes.
+- FR-5: The system MUST NOT store plaintext passwords under any circumstance.
+```
+
+Number every requirement. Use `FR-` prefix. Each requirement is a single, testable statement.
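+
+The numbering and keyword rules above lend themselves to a mechanical check. The following is a minimal Python sketch, not the behavior of `spec_validator.py`: the function name `lint_requirements` and the rule "exactly one RFC 2119 keyword per requirement" are assumptions made for illustration.
+
+```python
+import re
+
+# RFC 2119 keywords, longest first so "MUST NOT" is not also counted as "MUST".
+KEYWORDS = ["MUST NOT", "SHOULD NOT", "MUST", "SHOULD", "MAY"]
+REQ_LINE = re.compile(r"^- (FR-\d+): (.+)$")
+
+
+def lint_requirements(lines):
+    """Return a list of (requirement_id, problem) tuples for FR-* lines."""
+    problems = []
+    seen = set()
+    for line in lines:
+        match = REQ_LINE.match(line.strip())
+        if not match:
+            continue  # not a requirement line
+        req_id, text = match.groups()
+        if req_id in seen:
+            problems.append((req_id, "duplicate ID"))
+        seen.add(req_id)
+        found = []
+        remaining = text
+        for keyword in KEYWORDS:
+            pattern = rf"\b{keyword}\b"
+            if re.search(pattern, remaining):
+                found.append(keyword)
+                # Strip the match so a shorter keyword does not match inside it.
+                remaining = re.sub(pattern, "", remaining)
+        if len(found) != 1:
+            problems.append(
+                (req_id, f"expected exactly one RFC 2119 keyword, found {found}")
+            )
+    return problems
+```
+
+An empty result means every `FR-` line has a unique ID and a single keyword; it says nothing about whether the statement is testable, which still needs a human reviewer.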
+
+### 3. Non-Functional Requirements
+
+```markdown
+## Non-Functional Requirements
+
+### Performance
+- NFR-P1: Login flow MUST complete in < 500ms (p95) under normal load.
+- NFR-P2: Token validation MUST complete in < 50ms (p99).
+
+### Security
+- NFR-S1: All tokens MUST be transmitted over TLS 1.2+.
+- NFR-S2: The system MUST rate-limit login attempts to 5/minute per IP.
+
+### Accessibility
+- NFR-A1: Login form MUST meet WCAG 2.1 AA standards.
+- NFR-A2: Error messages MUST be announced to screen readers.
+
+### Scalability
+- NFR-SC1: The system SHOULD handle 10,000 concurrent sessions.
+
+### Reliability
+- NFR-R1: The authentication service MUST maintain 99.9% uptime.
+```
+
+### 4. Acceptance Criteria (Given/When/Then)
+
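+Every functional requirement maps to one or more acceptance criteria. Non-functional requirements with observable behavior, such as rate limits, can be covered the same way. Use Gherkin-style Given/When/Then syntax: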
+
+```markdown
+## Acceptance Criteria
+
+### AC-1: Successful login (FR-1)
+Given a user with valid credentials
+When they submit the login form with correct email and password
+Then they receive a valid access token
+And they are redirected to the dashboard
+And the login event is logged with timestamp and IP
+
+### AC-2: Expired token rejection (FR-2)
+Given a user with an access token issued 25 hours ago
+When they make an API request with that token
+Then they receive a 401 Unauthorized response
+And the response body contains error code "TOKEN_EXPIRED"
+And they are NOT redirected (API clients handle their own flow)
+
+### AC-3: Rate limiting (NFR-S2)
+Given an IP address that has made 5 failed login attempts in the last minute
+When a 6th login attempt arrives from that IP
+Then the request is rejected with 429 Too Many Requests
+And the response includes a Retry-After header
+```
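+
+As one illustration of how a criterion might translate into a test, here is AC-2 written as a pytest-style function. `check_token` is a hypothetical stand-in for the real token validator, which this spec does not define; each comment repeats one line of the criterion.
+
+```python
+from datetime import datetime, timedelta, timezone
+
+MAX_TOKEN_AGE = timedelta(hours=24)  # from FR-2
+
+
+def check_token(issued_at, now):
+    """Hypothetical validator: return (status_code, body) based on token age."""
+    if now - issued_at > MAX_TOKEN_AGE:
+        return 401, {"error": "TOKEN_EXPIRED"}
+    return 200, {}
+
+
+def test_ac2_expired_token_rejection():
+    # Given a user with an access token issued 25 hours ago
+    now = datetime(2026, 3, 25, 12, 0, tzinfo=timezone.utc)
+    issued_at = now - timedelta(hours=25)
+    # When they make an API request with that token
+    status, body = check_token(issued_at, now)
+    # Then they receive a 401 Unauthorized response
+    assert status == 401
+    # And the response body contains error code "TOKEN_EXPIRED"
+    assert body["error"] == "TOKEN_EXPIRED"
+```
+
+Passing `now` in explicitly keeps the precondition ("issued 25 hours ago") reproducible in a test setup without patching the clock.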
+
+### 5. Edge Cases and Error Scenarios
+
+```markdown
+## Edge Cases
+
+- EC-1: User submits login form with empty email → Show validation error, do not hit API.
+- EC-2: OAuth provider is down → Show "Service temporarily unavailable", retry after 30s.
+- EC-3: User has account but no password (social-only) → Redirect to social login.
+- EC-4: Concurrent login from two devices → Both sessions are valid (no single-session enforcement).
+- EC-5: Token expires mid-request → Complete the current request, return warning header.
+```
+
+### 6. API Contracts
+
+Define request/response shapes using TypeScript-style notation:
+
+````markdown
+## API Contracts
+
+### POST /api/auth/login
+Request:
+```typescript
+interface LoginRequest {
+  email: string;       // MUST be valid email format
+  password: string;    // MUST be 8-128 characters
+  rememberMe?: boolean; // Default: false
+}
+```
+
+Success Response (200):
+```typescript
+interface LoginResponse {
+  accessToken: string;   // JWT, expires in 24h
+  refreshToken: string;  // Opaque, expires in 30d
+  expiresIn: number;     // Seconds until access token expires
+  user: {
+    id: string;
+    email: string;
+    displayName: string;
+  };
+}
+```
+
+Error Response (401):
+```typescript
+interface AuthError {
+  error: "INVALID_CREDENTIALS" | "TOKEN_EXPIRED" | "ACCOUNT_LOCKED";
+  message: string;
+  retryAfter?: number; // Seconds, present for rate-limited responses
+}
+```
+````
+
+### 7. Data Models
+
+```markdown
+## Data Models
+
+### User
+| Field | Type | Constraints |
+|-------|------|-------------|
+| id | UUID | Primary key, auto-generated |
+| email | string | Unique, max 255 chars, valid email format |
+| passwordHash | string | bcrypt, never exposed via API |
+| createdAt | timestamp | UTC, immutable |
+| lastLoginAt | timestamp | UTC, updated on each login |
+| loginAttempts | integer | Reset to 0 on successful login |
+| lockedUntil | timestamp | Null if not locked |
+```
+
+### 8. Out of Scope
+
+Explicit exclusions prevent scope creep:
+
+```markdown
+## Out of Scope
+
+- OS-1: Multi-factor authentication (separate spec: SPEC-042)
+- OS-2: Social login providers beyond Google and GitHub
+- OS-3: Admin impersonation of user accounts
+- OS-4: Password complexity rules beyond minimum length (deferred to v2)
+- OS-5: Session management UI (users cannot see/revoke active sessions yet)
+```
+
+If someone asks for an out-of-scope item during implementation, point them to this section. Do not build it.
+
+---
+
+## Bounded Autonomy Rules
+
+These rules define when an agent (human or AI) MUST stop and ask for guidance vs. when they can proceed independently.
+
+### STOP and Ask When:
+
+1. **Scope creep detected.** The implementation requires something not in the spec. Even if it seems obviously needed, STOP. The spec might have excluded it deliberately.
+
+2. **Ambiguity exceeds 30%.** If you cannot determine the correct behavior from the spec for more than 30% of a given requirement, the spec is incomplete. Do not guess.
+
+3. **Breaking changes required.** The implementation would change an existing API contract, database schema, or public interface. Always escalate.
+
+4. **Security implications.** Any change that touches authentication, authorization, encryption, or PII handling requires explicit approval.
+
+5. **Performance characteristics unknown.** If a requirement says "MUST complete in < 500ms" but you have no way to measure or guarantee that, escalate before implementing a guess.
+
+6. **Cross-team dependencies.** If the spec requires coordination with another team or service, confirm the dependency before building against it.
+
+### Continue Autonomously When:
+
+1. **Spec is clear and unambiguous** for the current task.
+2. **All acceptance criteria have passing tests** and you are refactoring internals.
+3. **Changes are non-breaking** — no public API, schema, or behavior changes.
+4. **Implementation is a direct translation** of a well-defined acceptance criterion.
+5. **Error handling follows established patterns** already documented in the codebase.
+
+### Escalation Protocol
+
+When you must stop, provide:
+
+```markdown
+## Escalation: [Brief Title]
+
+**Blocked on:** [requirement ID, e.g., FR-3]
+**Question:** [Specific, answerable question — not "what should I do?"]
+**Options considered:**
+  A. [Option] — Pros: [...] Cons: [...]
+  B. [Option] — Pros: [...] Cons: [...]
+**My recommendation:** [A or B, with reasoning]
+**Impact of waiting:** [What is blocked until this is resolved?]
+```
+
+Never escalate without a recommendation. Never present an open-ended question. Always give options.
+
+See `references/bounded_autonomy_rules.md` for the complete decision matrix.
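+
+The escalation rules (always give options, always recommend one, never ask an open-ended question) can be enforced before a message is sent. This is a rough Python sketch under assumed names: `Escalation` and `escalation_problems` are hypothetical, and the "ends with a question mark" test is only a crude proxy for "specific and answerable".
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class Escalation:
+    blocked_on: str         # requirement ID, e.g. "FR-3"
+    question: str
+    options: dict           # label -> description, e.g. {"A": "...", "B": "..."}
+    recommendation: str     # label of the recommended option
+    impact_of_waiting: str
+
+
+def escalation_problems(esc):
+    """Return reasons the escalation is not ready to send (empty list = ready)."""
+    problems = []
+    if len(esc.options) < 2:
+        problems.append("give at least two options")
+    if esc.recommendation not in esc.options:
+        problems.append("recommendation must name one of the options")
+    if not esc.question.strip().endswith("?"):
+        problems.append("question must be a specific, answerable question")
+    if not esc.impact_of_waiting.strip():
+        problems.append("state what is blocked while waiting")
+    return problems
+```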
+
+---
+
+## Workflow — 6 Phases
+
+### Phase 1: Gather Requirements
+
+**Goal:** Understand what needs to be built and why.
+
+1. **Interview the user.** Ask:
+   - What problem does this solve?
+   - Who are the users?
+   - What does success look like?
+   - What explicitly should NOT be built?
+2. **Read existing code.** Understand the current system before proposing changes.
+3. **Identify constraints.** Performance budgets, security requirements, backward compatibility.
+4. **List unknowns.** Every unknown is a risk. Surface them now, not during implementation.
+
+**Exit criteria:** You can explain the feature to someone unfamiliar with the project in 2 minutes.
+
+### Phase 2: Write Spec
+
+**Goal:** Produce a complete spec document following The Spec Format above.
+
+1. Fill every section of the template. No section left blank.
+2. Number all requirements (FR-*, NFR-*, AC-*, EC-*, OS-*).
+3. Use RFC 2119 keywords precisely.
+4. Write acceptance criteria in Given/When/Then format.
+5. Define API contracts with TypeScript-style types.
+6. List explicit exclusions in Out of Scope.
+
+**Exit criteria:** The spec can be handed to a developer who was not in the requirements meeting, and they can implement the feature without asking clarifying questions.
+
+### Phase 3: Validate Spec
+
+**Goal:** Verify the spec is complete, consistent, and implementable.
+
+Run `spec_validator.py` against the spec file:
+
+```bash
+python spec_validator.py --file spec.md --strict
+```
+
+Manual validation checklist:
+- [ ] Every functional requirement has at least one acceptance criterion
+- [ ] Every acceptance criterion is testable (no subjective language)
+- [ ] API contracts cover all endpoints mentioned in requirements
+- [ ] Data models cover all entities mentioned in requirements
+- [ ] Edge cases cover failure modes for every external dependency
+- [ ] Out of scope is explicit about what was considered and rejected
+- [ ] Non-functional requirements have measurable thresholds
+
+**Exit criteria:** Spec scores 80+ on validator, and all manual checklist items pass.
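+
+The first checklist item (every functional requirement has at least one acceptance criterion) is easy to check by script. The sketch below is an illustration only and does not describe the logic or scoring of `spec_validator.py`; it assumes requirements are written as `- FR-N: ...` and criteria headings as `### AC-N: name (FR-N, ...)`, as in the format above.
+
+```python
+import re
+
+
+def uncovered_requirements(spec_text):
+    """Return FR IDs that no acceptance-criterion heading references."""
+    declared = set(re.findall(r"^- (FR-\d+):", spec_text, flags=re.MULTILINE))
+    covered = set()
+    for heading in re.findall(r"^### AC-\d+:.*$", spec_text, flags=re.MULTILINE):
+        # \b keeps "NFR-1" from being read as a reference to "FR-1".
+        covered.update(re.findall(r"\bFR-\d+\b", heading))
+    # Sort numerically so FR-10 comes after FR-2.
+    return sorted(declared - covered, key=lambda req: int(req.split("-")[1]))
+```
+
+Any ID this returns is either a requirement that still needs a criterion or a requirement that should be removed from the spec.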
+
+### Phase 4: Generate Tests
+
+**Goal:** Extract test cases from acceptance criteria before writing implementation code.
+
+Run `test_extractor.py` against the approved spec:
+
+```bash
+python test_extractor.py --file spec.md --framework pytest --output tests/
+```
+
+1. Each acceptance criterion becomes one or more test cases.
+2. Each edge case becomes a test case.
+3. Tests are stubs — they define the assertion but not the implementation.
+4. All tests MUST fail initially (red phase of TDD).
+
+**Exit criteria:** You have a test file where every test fails with "not implemented" or equivalent.
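+
+To make the stub idea concrete, here is one way a criterion heading could be turned into a failing test stub. This is a hypothetical sketch, not the implementation of `test_extractor.py`; the naming scheme (`test_<ac id>_<slug>`) is an assumption.
+
+```python
+import re
+
+# "### AC-3: Expired reset link (FR-2)" -> id, name, optional references
+AC_HEADING = re.compile(r"^### (AC-\d+): (.+?)(?: \(([^)]*)\))?$")
+
+
+def stub_for(heading):
+    """Turn one AC heading into source text for a failing pytest-style stub."""
+    match = AC_HEADING.match(heading.strip())
+    if match is None:
+        raise ValueError(f"not an AC heading: {heading!r}")
+    ac_id, name, _refs = match.groups()
+    slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
+    func = f"test_{ac_id.lower().replace('-', '')}_{slug}"
+    return (
+        f"def {func}():\n"
+        f'    """{ac_id}: {name}."""\n'
+        f'    raise NotImplementedError("Implement this test")\n'
+    )
+```
+
+Because the stub raises `NotImplementedError`, it fails until someone replaces the body, which is what the red phase requires.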
+
+### Phase 5: Implement
+
+**Goal:** Write code that makes failing tests pass, one acceptance criterion at a time.
+
+1. Pick one acceptance criterion (start with the simplest).
+2. Make its test(s) pass with minimal code.
+3. Run the full test suite — no regressions.
+4. Commit.
+5. Pick the next acceptance criterion. Repeat.
+
+**Rules:**
+- Do NOT implement anything not in the spec.
+- Do NOT optimize before all acceptance criteria pass.
+- Do NOT refactor before all acceptance criteria pass.
+- If you discover a missing requirement, STOP and update the spec first.
+
+**Exit criteria:** All tests pass. All acceptance criteria satisfied.
+
+### Phase 6: Self-Review
+
+**Goal:** Verify implementation matches spec before marking done.
+
+Run through the Self-Review Checklist below. If any item fails, fix it before declaring the task complete.
+
+---
+
+## Self-Review Checklist
+
+Before marking any implementation as done, verify ALL of the following:
+
+- [ ] **Every acceptance criterion has a passing test.** No exceptions. If AC-3 exists, a test for AC-3 exists and passes.
+- [ ] **Every edge case has a test.** EC-1 through EC-N all have corresponding test cases.
+- [ ] **No scope creep.** The implementation does not include features not in the spec. If you added something, either update the spec or remove it.
+- [ ] **API contracts match implementation.** Request/response shapes in code match the spec exactly. Field names, types, status codes — all of it.
+- [ ] **Error scenarios tested.** Every error response defined in the spec has a test that triggers it.
+- [ ] **Non-functional requirements verified.** If the spec says < 500ms, you have evidence (benchmark, load test, profiling) that it meets the threshold.
+- [ ] **Data model matches.** Database schema matches the spec. No extra columns, no missing constraints.
+- [ ] **Out-of-scope items not built.** Double-check that nothing from the Out of Scope section leaked into the implementation.
+
+---
+
+## Integration with TDD Guide
+
+Spec-driven workflow and TDD are complementary, not competing:
+
+```
+Spec-Driven Workflow          TDD (Red-Green-Refactor)
+─────────────────────         ──────────────────────────
+Phase 1: Gather Requirements
+Phase 2: Write Spec
+Phase 3: Validate Spec
+Phase 4: Generate Tests  ──→  RED: Tests exist and fail
+Phase 5: Implement       ──→  GREEN: Minimal code to pass
+Phase 6: Self-Review     ──→  REFACTOR: Clean up internals
+```
+
+**The handoff:** Spec-driven workflow produces the test stubs (Phase 4). TDD takes over from there. The spec tells you WHAT to test. TDD tells you HOW to implement.
+
+Use `engineering-team/tdd-guide` for:
+- Red-green-refactor cycle discipline
+- Coverage analysis and gap detection
+- Framework-specific test patterns (Jest, Pytest, JUnit)
+
+Use `engineering/spec-driven-workflow` for:
+- Defining what to build before building it
+- Acceptance criteria authoring
+- Completeness validation
+- Scope control
+
+---
+
+## Examples
+
+### Full Spec: User Password Reset
+
+```markdown
+# Spec: Password Reset Flow
+
+**Author:** Engineering Team
+**Date:** 2026-03-25
+**Status:** Approved
+
+## Context
+
+Users who forget their passwords currently have no self-service recovery option.
+Support receives ~200 password reset requests per week, costing approximately
+8 hours of support time. This feature eliminates that burden entirely.
+
+## Functional Requirements
+
+- FR-1: The system MUST allow users to request a password reset via email.
+- FR-2: The system MUST send a reset link that expires after 1 hour.
+- FR-3: The system MUST invalidate all previous reset links when a new one is requested.
+- FR-4: The system MUST enforce minimum password length of 8 characters on reset.
+- FR-5: The system MUST NOT reveal whether an email exists in the system.
+- FR-6: The system SHOULD log all reset attempts for audit purposes.
+
+## Acceptance Criteria
+
+### AC-1: Request reset (FR-1, FR-5)
+Given a user on the password reset page
+When they enter any email address and submit
+Then they see "If an account exists, a reset link has been sent"
+And the response is identical whether the email exists or not
+
+### AC-2: Valid reset link (FR-2)
+Given a user who received a reset email 30 minutes ago
+When they click the reset link
+Then they see the password reset form
+
+### AC-3: Expired reset link (FR-2)
+Given a user who received a reset email 2 hours ago
+When they click the reset link
+Then they see "This link has expired. Please request a new one."
+
+### AC-4: Previous links invalidated (FR-3)
+Given a user who requested two reset emails
+When they click the link from the first email
+Then they see "This link is no longer valid."
+
+## Edge Cases
+
+- EC-1: User submits reset for non-existent email → Same success message (FR-5).
+- EC-2: User clicks reset link twice → Second click shows "already used" if password was changed.
+- EC-3: Email delivery fails → Log error, do not retry automatically.
+- EC-4: User requests reset while already logged in → Allow it, do not force logout.
+
+## Out of Scope
+
+- OS-1: Security questions as alternative reset method.
+- OS-2: SMS-based password reset.
+- OS-3: Admin-initiated password reset (separate spec).
+```
+
+### Extracted Test Cases (from above spec)
+
+```python
+# Generated by test_extractor.py --framework pytest
+
+class TestPasswordReset:
+    def test_ac1_request_reset_existing_email(self):
+        """AC-1: Request reset with existing email shows generic message."""
+        # Given a user on the password reset page
+        # When they enter a registered email and submit
+        # Then they see "If an account exists, a reset link has been sent"
+        raise NotImplementedError("Implement this test")
+
+    def test_ac1_request_reset_nonexistent_email(self):
+        """AC-1: Request reset with unknown email shows same generic message."""
+        # Given a user on the password reset page
+        # When they enter an unregistered email and submit
+        # Then they see identical response to existing email case
+        raise NotImplementedError("Implement this test")
+
+    def test_ac2_valid_reset_link(self):
+        """AC-2: Reset link works within expiry window."""
+        raise NotImplementedError("Implement this test")
+
+    def test_ac3_expired_reset_link(self):
+        """AC-3: Reset link rejected after 1 hour."""
+        raise NotImplementedError("Implement this test")
+
+    def test_ac4_previous_links_invalidated(self):
+        """AC-4: Old reset links stop working when new one is requested."""
+        raise NotImplementedError("Implement this test")
+
+    def test_ec1_nonexistent_email_same_response(self):
+        """EC-1: Non-existent email produces identical response."""
+        raise NotImplementedError("Implement this test")
+
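+    def test_ec2_reset_link_used_twice(self):
+        """EC-2: Already-used reset link shows appropriate message."""
+        raise NotImplementedError("Implement this test")
+
+    def test_ec3_email_delivery_failure_logged(self):
+        """EC-3: Failed email delivery is logged and not retried automatically."""
+        raise NotImplementedError("Implement this test")
+
+    def test_ec4_reset_while_logged_in(self):
+        """EC-4: Logged-in user can request a reset without being logged out."""
+        raise NotImplementedError("Implement this test")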
+```
+
+---
+
+## Anti-Patterns
+
+### 1. Coding Before Spec Approval
+
+**Symptom:** "I'll start coding while the spec is being reviewed."
+**Problem:** The review will surface changes. Now you have code that implements a rejected design.
+**Rule:** Implementation does not begin until spec status is "Approved."
+
+### 2. Vague Acceptance Criteria
+
+**Symptom:** "The system should work well" or "The UI should be responsive."
+**Problem:** Untestable. What does "well" mean? What does "responsive" mean?
+**Rule:** Every acceptance criterion must be verifiable by a machine. If you cannot write a test for it, rewrite the criterion.
+
+### 3. Missing Edge Cases
+
+**Symptom:** Happy path is specified, error paths are not.
+**Problem:** Developers invent error handling on the fly, leading to inconsistent behavior.
+**Rule:** For every external dependency (API, database, file system, user input), specify at least one failure scenario.
+
+### 4. Spec as Post-Hoc Documentation
+
+**Symptom:** "Let me write the spec now that the feature is done."
+**Problem:** This is documentation, not specification. It describes what was built, not what should have been built. It cannot catch design errors because the design is already frozen.
+**Rule:** If the spec was written after the code, it is not a spec. Relabel it as documentation.
+
+### 5. Gold-Plating Beyond Spec
+
+**Symptom:** "While I was in there, I also added..."
+**Problem:** Untested code. Unreviewed design. Potential for subtle bugs in the "bonus" feature.
+**Rule:** If it is not in the spec, it does not get built. File a new spec for additional features.
+
+### 6. Acceptance Criteria Without Requirement Traceability
+
+**Symptom:** AC-7 exists but does not reference any FR-* or NFR-*.
+**Problem:** Orphaned criteria mean either a requirement is missing or the criterion is unnecessary.
+**Rule:** Every AC-* MUST reference at least one FR-* or NFR-*.
+
+### 7. Skipping Validation
+
+**Symptom:** "The spec looks fine, let's just start."
+**Problem:** Missing sections discovered during implementation cause blocking delays.
+**Rule:** Always run `spec_validator.py --strict` before starting implementation. Fix all warnings.
+
+---
+
+## Cross-References
+
+- **`engineering-team/tdd-guide`** — Red-green-refactor cycle, test generation, coverage analysis. Use after Phase 4 of this workflow.
+- **`engineering/focused-fix`** — Deep-dive feature repair. When a spec-driven implementation has systemic issues, use focused-fix for diagnosis.
+- **`engineering/rag-architect`** — If the feature involves retrieval or knowledge systems, use rag-architect for the technical design within the spec.
+- **`references/spec_format_guide.md`** — Complete template with section-by-section explanations.
+- **`references/bounded_autonomy_rules.md`** — Full decision matrix for when to stop vs. continue.
+- **`references/acceptance_criteria_patterns.md`** — Pattern library for writing Given/When/Then criteria.
+
+---
+
+## Tools
+
+| Script | Purpose | Key Flags |
+|--------|---------|-----------|
+| `spec_generator.py` | Generate spec template from feature name/description | `--name`, `--description`, `--format`, `--json` |
+| `spec_validator.py` | Validate spec completeness (0-100 score) | `--file`, `--strict`, `--json` |
+| `test_extractor.py` | Extract test stubs from acceptance criteria | `--file`, `--framework`, `--output`, `--json` |
+
+```bash
+# Generate a spec template
+python spec_generator.py --name "User Authentication" --description "OAuth 2.0 login flow"
+
+# Validate a spec
+python spec_validator.py --file specs/auth.md --strict
+
+# Extract test cases
+python test_extractor.py --file specs/auth.md --framework pytest --output tests/test_auth.py
+```
--- a/engineering/spec-driven-workflow/references/acceptance_criteria_patterns.md
+++ b/engineering/spec-driven-workflow/references/acceptance_criteria_patterns.md
@@ -0,0 +1,497 @@
+# Acceptance Criteria Patterns
+
+A pattern library for writing Given/When/Then acceptance criteria across common feature types. Use these as starting points — adapt to your domain.
+
+---
+
+## Pattern Structure
+
+Every acceptance criterion follows this structure:
+
+```
+### AC-N: [Descriptive name] (FR-N, NFR-N)
+Given [precondition — the system/user is in this state]
+When  [trigger — the user or system performs this action]
+Then  [outcome — this observable, testable result occurs]
+And   [additional outcome — and this also happens]
+```
+
+**Rules:**
+1. One scenario per AC. Multiple Given/When/Then blocks = multiple ACs.
+2. Every AC references at least one FR-* or NFR-*.
+3. Outcomes must be observable and testable — no subjective language.
+4. Preconditions must be achievable in a test setup.
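+
+Rules 1 and 2 are structural, so they can be checked without understanding the criterion's content. The following Python sketch shows one possible check; `check_criterion` and its messages are hypothetical, and rules 3 and 4 (testable outcomes, achievable preconditions) still need human judgment.
+
+```python
+import re
+
+# Heading must end with at least one FR-* or NFR-* reference in parentheses.
+HEADING = re.compile(r"^### AC-\d+: .+ \((N?FR-\w+)(, N?FR-\w+)*\)$")
+STEP_ORDER = ["Given", "When", "Then"]
+
+
+def check_criterion(block):
+    """Return structural problems found in one AC block (empty list = well formed)."""
+    lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
+    problems = []
+    if not lines or not HEADING.match(lines[0]):
+        problems.append("heading needs an AC id, a name, and an FR/NFR reference")
+    steps = [ln.split()[0] for ln in lines if not ln.startswith("#")]
+    if steps and steps[0] == "And":
+        problems.append("'And' cannot be the first step")
+    # Ignoring "And" lines, exactly one Given/When/Then sequence is allowed.
+    if [step for step in steps if step != "And"] != STEP_ORDER:
+        problems.append("expected one Given, one When, one Then, in that order")
+    return problems
+```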
+
+---
+
+## Authentication Patterns
+
+### Login — Happy Path
+
+```markdown
+### AC-1: Successful login with valid credentials (FR-1)
+Given a registered user with email "user@example.com" and password "V@lidP4ss!"
+When they POST /api/auth/login with email "user@example.com" and password "V@lidP4ss!"
+Then the response status is 200
+And the response body contains a valid JWT access token
+And the response body contains a refresh token
+And the access token expires in 24 hours
+```
+
+### Login — Invalid Credentials
+
+```markdown
+### AC-2: Login rejected with wrong password (FR-1)
+Given a registered user with email "user@example.com"
+When they POST /api/auth/login with email "user@example.com" and an incorrect password
+Then the response status is 401
+And the response body contains error code "INVALID_CREDENTIALS"
+And no token is issued
+And the failed attempt is logged
+```
+
+### Login — Account Locked
+
+```markdown
+### AC-3: Login rejected for locked account (FR-1, NFR-S2)
+Given a user whose account is locked due to 5 consecutive failed login attempts
+When they POST /api/auth/login with correct credentials
+Then the response status is 403
+And the response body contains error code "ACCOUNT_LOCKED"
+And the response includes a "retryAfter" field with seconds until unlock
+```
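+
+The key detail in this pattern is that the lock is checked before credentials are verified, so correct credentials still receive a 403. A minimal sketch of that check follows; `login_gate` is a hypothetical helper, and how the lock is set (the 5 failed attempts) and how long it lasts are outside the sketch.
+
+```python
+from datetime import datetime, timedelta, timezone
+
+
+def login_gate(locked_until, now):
+    """Return (status, body) if the account is locked, else None."""
+    if locked_until is not None and now < locked_until:
+        seconds_left = int((locked_until - now).total_seconds())
+        return 403, {"error": "ACCOUNT_LOCKED", "retryAfter": seconds_left}
+    return None  # not locked: continue with normal credential checks
+
+
+now = datetime(2026, 3, 25, 12, 0, tzinfo=timezone.utc)
+print(login_gate(now + timedelta(seconds=90), now))
+# prints (403, {'error': 'ACCOUNT_LOCKED', 'retryAfter': 90})
+```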
+
+### Token Refresh
+
+```markdown
+### AC-4: Token refresh with valid refresh token (FR-3)
+Given a user with a valid, non-expired refresh token
+When they POST /api/auth/refresh with that refresh token
+Then the response status is 200
+And a new access token is issued
+And the old refresh token is invalidated
+And a new refresh token is issued (rotation)
+```
+
+### Logout
+
+```markdown
+### AC-5: Logout invalidates session (FR-4)
+Given an authenticated user with a valid access token
+When they POST /api/auth/logout with that token
+Then the response status is 204
+And the access token is no longer accepted for API calls
+And the refresh token is invalidated
+```
+
+---
+
+## CRUD Patterns
+
+### Create
+
+```markdown
+### AC-6: Create resource with valid data (FR-1)
+Given an authenticated user with "editor" role
+When they POST /api/resources with valid payload {name: "Test", type: "A"}
+Then the response status is 201
+And the response body contains the created resource with a generated UUID
+And the resource's "createdAt" field is set to the current UTC timestamp
+And the resource's "createdBy" field matches the authenticated user's ID
+```
+
+### Create — Validation Failure
+
+```markdown
+### AC-7: Create resource rejected with invalid data (FR-1)
+Given an authenticated user
+When they POST /api/resources with payload missing required field "name"
+Then the response status is 400
+And the response body contains error code "VALIDATION_ERROR"
+And the response body contains field-level detail: {"name": "Required field"}
+And no resource is created in the database
+```
+
+### Read — Single Item
+
+```markdown
+### AC-8: Read resource by ID (FR-2)
+Given an existing resource with ID "abc-123"
+When an authenticated user GETs /api/resources/abc-123
+Then the response status is 200
+And the response body contains the resource with all fields
+```
+
+### Read — Not Found
+
+```markdown
+### AC-9: Read non-existent resource returns 404 (FR-2)
+Given no resource exists with ID "nonexistent-id"
+When an authenticated user GETs /api/resources/nonexistent-id
+Then the response status is 404
+And the response body contains error code "NOT_FOUND"
+```
+
+### Update
+
+```markdown
+### AC-10: Update resource with valid data (FR-3)
+Given an existing resource with ID "abc-123" owned by the authenticated user
+When they PATCH /api/resources/abc-123 with {name: "Updated Name"}
+Then the response status is 200
+And the resource's "name" field is "Updated Name"
+And the resource's "updatedAt" field is updated to the current UTC timestamp
+And fields not included in the patch are unchanged
+```
+
+### Update — Ownership Check
+
+```markdown
+### AC-11: Update rejected for non-owner (FR-3, FR-6)
+Given an existing resource with ID "abc-123" owned by user "other-user"
+When the authenticated user (not "other-user") PATCHes /api/resources/abc-123
+Then the response status is 403
+And the response body contains error code "FORBIDDEN"
+And the resource is unchanged
+```
+
+### Delete — Soft Delete
+
+```markdown
+### AC-12: Soft delete resource (FR-5)
+Given an existing resource with ID "abc-123" owned by the authenticated user
+When they DELETE /api/resources/abc-123
+Then the response status is 204
+And the resource's "deletedAt" field is set to the current UTC timestamp
+And the resource no longer appears in GET /api/resources (list endpoint)
+And the resource still exists in the database (soft deleted)
+```
+
+### List — Pagination
+
+```markdown
+### AC-13: List resources with default pagination (FR-4)
+Given 50 resources exist for the authenticated user
+When they GET /api/resources without pagination parameters
+Then the response status is 200
+And the response contains the first 20 resources (default page size)
+And the response includes "totalCount: 50"
+And the response includes "page: 1"
+And the response includes "pageSize: 20"
+And the response includes "hasNextPage: true"
+```
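+
+The metadata fields in AC-13 follow directly from the total count, the page number, and the page size. A small in-memory sketch of that arithmetic is shown below; `paginate` is a hypothetical helper, and a real endpoint would apply the same offsets in its database query instead of slicing a list.
+
+```python
+def paginate(items, page=1, page_size=20):
+    """Slice a list into one page and report the metadata named in AC-13."""
+    if page < 1 or page_size < 1:
+        raise ValueError("page and page_size must be positive")
+    start = (page - 1) * page_size
+    window = items[start:start + page_size]
+    return {
+        "items": window,
+        "totalCount": len(items),
+        "page": page,
+        "pageSize": page_size,
+        # True while at least one item lies beyond the end of this page.
+        "hasNextPage": start + len(window) < len(items),
+    }
+```
+
+With 50 items and the default page size of 20, page 1 holds 20 items with `hasNextPage` true, and page 3 holds the last 10 with `hasNextPage` false.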
+
+### List — Filtered
+
+```markdown
+### AC-14: List resources with type filter (FR-4)
+Given 30 resources of type "A" and 20 resources of type "B" exist
+When the authenticated user GETs /api/resources?type=A
+Then the response status is 200
+And all returned resources have type "A"
+And the response "totalCount" is 30
+```
+
+---
+
+## Search Patterns
+
+### Basic Search
+
+```markdown
+### AC-15: Search returns matching results (FR-7)
+Given resources with names "Alpha Report", "Beta Analysis", "Alpha Summary" exist
+When the user GETs /api/resources?q=Alpha
+Then the response contains "Alpha Report" and "Alpha Summary"
+And the response does not contain "Beta Analysis"
+And results are ordered by relevance score (descending)
+```
+
+### Search — Empty Results
+
+```markdown
+### AC-16: Search with no matches returns empty list (FR-7)
+Given no resources match the query "xyznonexistent"
+When the user GETs /api/resources?q=xyznonexistent
+Then the response status is 200
+And the response contains an empty "items" array
+And "totalCount" is 0
+```
+
+### Search — Special Characters
+
+```markdown
+### AC-17: Search handles special characters safely (FR-7, NFR-S1)
+Given resources exist in the database
+When the user GETs /api/resources?q="; DROP TABLE resources;--
+Then the response status is 200
+And no SQL injection occurs
+And the search treats the input as a literal string
+```
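
One way to satisfy AC-17 is to bind the search term as a query parameter so it is never spliced into the SQL text. The sketch below assumes SQLite and a hypothetical `resources` table with a `name` column; the real storage layer may differ.

```python
import sqlite3


def search_resources(conn, query):
    """Return names containing `query`, treating it as a literal string."""
    # The ? placeholder binds the value; it is never concatenated into the SQL.
    # instr() does a plain substring match, so % and _ carry no wildcard meaning.
    rows = conn.execute(
        "SELECT name FROM resources WHERE instr(name, ?) > 0 ORDER BY name",
        (query,),
    ).fetchall()
    return [name for (name,) in rows]


# Hypothetical fixture data mirroring AC-15.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (name TEXT)")
conn.executemany(
    "INSERT INTO resources (name) VALUES (?)",
    [("Alpha Report",), ("Beta Analysis",), ("Alpha Summary",)],
)
```

Calling `search_resources(conn, '"; DROP TABLE resources;--')` returns an empty list and leaves the table intact, which is the observable outcome AC-17 asks for.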
+
+---
+
+## File Upload Patterns
+
+### Upload — Happy Path
+
+```markdown
+### AC-18: Upload file within size limit (FR-8)
+Given an authenticated user
+When they POST /api/files with a 5MB PNG file
+Then the response status is 201
+And the response contains the file's URL, size, and MIME type
+And the file is stored in the configured storage backend
+And the file is associated with the authenticated user
+```
+
+### Upload — Size Exceeded
+
+```markdown
+### AC-19: Upload rejected for oversized file (FR-8)
+Given the maximum file size is 10MB
+When the user POSTs /api/files with a 15MB file
+Then the response status is 413
+And the response contains error code "FILE_TOO_LARGE"
+And no file is stored
+```
+
+### Upload — Invalid Type
+
+```markdown
+### AC-20: Upload rejected for disallowed file type (FR-8, NFR-S3)
+Given allowed file types are PNG, JPG, PDF
+When the user POSTs /api/files with an .exe file
+Then the response status is 415
+And the response contains error code "UNSUPPORTED_MEDIA_TYPE"
+And no file is stored
+```
+
+---
+
+## Payment Patterns
+
+### Charge — Happy Path
+
+```markdown
+### AC-21: Successful payment charge (FR-10)
+Given a user with a valid payment method on file
+When they POST /api/payments with amount 49.99 and currency "USD"
+Then the payment gateway is charged $49.99
+And the response status is 201
+And the response contains a transaction ID
+And a payment record is created with status "completed"
+And a receipt email is sent to the user
+```
+
+### Charge — Declined
+
+```markdown
+### AC-22: Payment declined by gateway (FR-10)
+Given a user with an expired credit card on file
+When they POST /api/payments with amount 49.99
+Then the payment gateway returns a decline
+And the response status is 402
+And the response contains error code "PAYMENT_DECLINED"
+And no payment record is created with status "completed"
+And the user is prompted to update their payment method
+```
+
+### Charge — Idempotency
+
+```markdown
+### AC-23: Duplicate payment request is idempotent (FR-10, NFR-R1)
+Given a payment was successfully processed with idempotency key "key-123"
+When the same request is sent again with idempotency key "key-123"
+Then the response status is 200
+And the response contains the original transaction ID
+And the user is NOT charged a second time
+```
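
The idempotency behavior in AC-23 can be sketched as a lookup on the idempotency key before any charge is made. This is an in-memory illustration only; `PaymentService`, its `charges` list (a stand-in for gateway calls), and the string amount are assumptions, and a real service would need a durable, atomic key store.

```python
import uuid


class PaymentService:
    def __init__(self):
        self._by_key = {}  # idempotency key -> stored transaction
        self.charges = []  # stand-in for calls made to the payment gateway

    def charge(self, idempotency_key, amount, currency="USD"):
        """Return (status_code, transaction) for a charge request."""
        if idempotency_key in self._by_key:
            # Replay: return the original result without charging again.
            return 200, self._by_key[idempotency_key]
        txn = {
            "transactionId": str(uuid.uuid4()),
            "amount": amount,
            "currency": currency,
        }
        self.charges.append(txn)
        self._by_key[idempotency_key] = txn
        return 201, txn
```

Sending the same key twice yields 201 and then 200 with the same transaction ID, and `charges` holds a single entry.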
+
+---
+
+## Notification Patterns
+
+### Email Notification
+
+```markdown
+### AC-24: Email notification sent on event (FR-11)
+Given a user with notification preferences set to "email"
+When their order status changes to "shipped"
+Then an email is sent to their registered email address
+And the email subject contains the order number
+And the email body contains the tracking URL
+And a notification record is created with status "sent"
+```
+
+### Notification — Delivery Failure
+
+```markdown
+### AC-25: Failed notification is retried (FR-11, NFR-R2)
+Given the email service returns a 5xx error on first attempt
+When a notification is triggered
+Then the system retries up to 3 times with exponential backoff (1s, 4s, 16s)
+And if all retries fail, the notification status is set to "failed"
+And an alert is sent to the ops channel
+```
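
The retry schedule in AC-25 (1s, 4s, 16s) corresponds to a base delay of 1 second multiplied by 4 on each retry. A minimal sketch follows; `TransientError`, `backoff_delays`, and `send_with_retry` are hypothetical names, and alerting the ops channel is left to the caller.

```python
import time


class TransientError(Exception):
    """Hypothetical error standing in for a 5xx from the email service."""


def backoff_delays(max_retries=3, base_seconds=1, factor=4):
    """Delays before each retry: 1s, 4s, 16s with the defaults."""
    return [base_seconds * factor ** attempt for attempt in range(max_retries)]


def send_with_retry(send, max_retries=3, sleep=time.sleep):
    """Try once, then retry up to max_retries times; return the final status."""
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            send()
            return "sent"
        except TransientError:
            if attempt == max_retries:
                return "failed"  # all retries exhausted
            sleep(delays[attempt])
```

Passing `sleep` as a parameter lets a test record the delays instead of actually waiting.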
+
+---
+
+## Negative Test Patterns
+
+### Unauthorized Access
+
+```markdown
+### AC-26: Unauthenticated request rejected (NFR-S1)
+Given no authentication token is provided
+When the user GETs /api/resources
+Then the response status is 401
+And the response contains error code "AUTHENTICATION_REQUIRED"
+And no resource data is returned
+```
+
+### Invalid Input — Type Mismatch
+
+```markdown
+### AC-27: String provided for numeric field (FR-1)
+Given the "quantity" field expects an integer
+When the user POSTs with quantity: "abc"
+Then the response status is 400
+And the response body contains field error: {"quantity": "Must be an integer"}
+```
+
+### Rate Limiting
+
+```markdown
+### AC-28: Rate limit enforced (NFR-S2)
+Given the rate limit is 100 requests per minute per API key
+When the user sends the 101st request within 60 seconds
+Then the response status is 429
+And the response includes header "Retry-After" with seconds until reset
+And the response contains error code "RATE_LIMITED"
+```
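
AC-28 does not say which rate-limiting algorithm is used. As one possible reading, the sketch below uses a fixed window per API key; `FixedWindowLimiter` is a hypothetical name, and a sliding window or token bucket would satisfy the same criterion.

```python
import math


class FixedWindowLimiter:
    def __init__(self, limit=100, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self._windows = {}  # api_key -> (window_start, request_count)

    def check(self, api_key, now):
        """Return (status_code, retry_after_seconds); `now` is in seconds."""
        start, count = self._windows.get(api_key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # the previous window has expired
        if count >= self.limit:
            # Value for the Retry-After header: seconds until the window resets.
            return 429, math.ceil(start + self.window - now)
        self._windows[api_key] = (start, count + 1)
        return 200, 0
```

The clock is passed in as `now` so the 101st-request scenario can be tested without waiting a real minute.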
+
+### Concurrent Modification
+
+```markdown
+### AC-29: Optimistic locking prevents lost updates (NFR-R1)
+Given a resource with version 5
+When user A PATCHes with version 5 and user B PATCHes with version 5 simultaneously
+Then one succeeds with status 200 (version becomes 6)
+And the other receives status 409 with error code "CONFLICT"
+And the 409 response includes the current version number
+```
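
The version check behind AC-29 can be sketched as a compare-then-increment on the resource's version. This is a single-threaded illustration using a plain dict; the function name `patch` is hypothetical, and in a real store the comparison and the increment must happen atomically (for example in one conditional UPDATE).

```python
def patch(resource, expected_version, changes):
    """Apply changes only if the caller saw the current version."""
    if resource["version"] != expected_version:
        # Stale write: report the conflict and the version the caller must re-read.
        return 409, {"error": "CONFLICT", "currentVersion": resource["version"]}
    resource.update(changes)
    resource["version"] += 1
    return 200, resource
```

Two writers that both send version 5 get 200 and 409 respectively, and the resource ends at version 6 with only the first writer's change applied.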
+
+---
+
+## Performance Criteria Patterns
+
+### Response Time
+
+```markdown
+### AC-30: API response time under load (NFR-P1)
+Given the system is handling 1,000 concurrent users
+When a user GETs /api/dashboard
+Then the response is returned in < 500ms (p95)
+And the response is returned in < 1000ms (p99)
+```
+
+### Throughput
+
+```markdown
+### AC-31: System handles target throughput (NFR-P2)
+Given normal production traffic patterns
+When the system receives 5,000 requests per second
+Then all requests are processed without queue overflow
+And error rate remains below 0.1%
+```
+
+### Resource Usage
+
+```markdown
+### AC-32: Memory usage within bounds (NFR-P3)
+Given the service is processing normal traffic
+When measured over a 24-hour period
+Then memory usage does not exceed 512MB RSS
+And no memory leaks are detected (RSS growth < 5% over 24h)
+```
+
+---
+
+## Accessibility Criteria Patterns
+
+### Keyboard Navigation
+
+```markdown
+### AC-33: Form is fully keyboard navigable (NFR-A1)
+Given the user is on the login page using only a keyboard
+When they press Tab
+Then focus moves through: email field -> password field -> submit button
+And each focused element has a visible focus indicator
+And pressing Enter on the submit button submits the form
+```
+
+### Screen Reader
+
+```markdown
+### AC-34: Error messages announced to screen readers (NFR-A2)
+Given the user submits the form with invalid data
+When validation errors appear
+Then each error is associated with its form field via aria-describedby
+And the error container has role="alert" for immediate announcement
+And the first error field receives focus
+```
+
+### Color Contrast
+
+```markdown
+### AC-35: Text meets contrast requirements (NFR-A3)
+Given the default theme is active
+When measuring text against background colors
+Then all body text meets 4.5:1 contrast ratio (WCAG AA)
+And all large text (18pt+/24px+, or 14pt+/18.66px+ bold) meets 3:1 contrast ratio
+And all interactive element states (hover, focus, active) meet 3:1
+```
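
To make AC-35 machine-testable, the contrast ratio can be computed from the WCAG 2.x relative-luminance formula. The sketch below assumes 8-bit sRGB tuples; the function names are hypothetical.

```python
def _linear(channel):
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.1 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """Return the contrast ratio, from 1 (identical) up to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)
```

An automated check for body text would then assert `contrast_ratio(text_color, background) >= 4.5`.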
+
+### Reduced Motion
+
+```markdown
+### AC-36: Animations respect user preference (NFR-A4)
+Given the user has enabled "prefers-reduced-motion" in their OS settings
+When they load any page with animations
+Then all non-essential animations are disabled
+And essential animations (e.g., loading spinner) use a reduced version
+And no content is hidden behind animation-only interactions
+```
+
+---
+
+## Writing Tips
+
+### Do
+
+- Start Given with the system/user state, not the action
+- Make When a single, specific trigger
+- Make Then observable — status codes, field values, side effects
+- Include And for additional assertions on the same outcome
+- Reference requirement IDs in the AC title
+
+### Do Not
+
+- Write "Then the system works correctly" (not testable)
+- Combine multiple scenarios in one AC
+- Use subjective words: "quickly", "properly", "nicely", "user-friendly"
+- Skip the precondition — Given is required even if it seems obvious
+- Write Given/When/Then as prose paragraphs — use the structured format
+
+### Smell Tests
+
+If your AC has any of these, rewrite it:
+
+| Smell | Example | Fix |
+|-------|---------|-----|
+| No Given clause | "When user clicks, then page loads" | Add "Given user is on the dashboard" |
+| Vague Then | "Then it works" | Specify status code, body, side effects |
+| Multiple Whens | "When user clicks A and then clicks B" | Split into two ACs |
+| Implementation detail | "Then the Redux store is updated" | Focus on user-observable outcome |
+| No requirement reference | "AC-5: Dashboard loads" | "AC-5: Dashboard loads (FR-7)" |
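
Several of the smells above can be detected mechanically. The following is a rough sketch, not the behavior of `spec_validator.py`; the function `ac_smells`, the ID regex, and the word list are assumptions drawn from the tips in this section.

```python
import re

# Matches IDs such as FR-7, NFR-S1, or NFR-SC2.
REQ_ID = re.compile(r"\b(?:FR-\d+|NFR-[A-Z]+\d+)\b")
SUBJECTIVE = ("quickly", "properly", "nicely", "user-friendly")


def ac_smells(title, body):
    """Return the names of the smells found in one acceptance criterion."""
    smells = []
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    if not REQ_ID.search(title):
        smells.append("no requirement reference")
    if not any(ln.startswith("Given ") for ln in lines):
        smells.append("no Given clause")
    if sum(ln.startswith("When ") for ln in lines) > 1:
        smells.append("multiple Whens")
    if any(word in body.lower() for word in SUBJECTIVE):
        smells.append("subjective wording")
    return smells
```

A check like this only catches structural smells; vague Then clauses and leaked implementation details still need a human reviewer.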
--- a/engineering/spec-driven-workflow/references/bounded_autonomy_rules.md
+++ b/engineering/spec-driven-workflow/references/bounded_autonomy_rules.md
@@ -0,0 +1,273 @@
+# Bounded Autonomy Rules
+
+Decision framework for when an agent (human or AI) should stop and ask vs. continue working autonomously during spec-driven development.
+
+---
+
+## The Core Principle
+
+**Autonomy is earned by clarity.** The clearer the spec, the more autonomy the implementer has. The more ambiguous the spec, the more the implementer must stop and ask.
+
+This is not about trust. It is about risk. A clear spec means low risk of building the wrong thing. An ambiguous spec means high risk.
+
+---
+
+## Decision Matrix
+
+| Signal | Action | Rationale |
+|--------|--------|-----------|
+| Spec is Approved, requirement is clear, tests exist | **Continue** | Low risk. Build it. |
+| Requirement is clear but no test exists yet | **Continue** (write the test first) | You can infer the test from the requirement. |
+| Requirement uses SHOULD/MAY keywords | **Continue** with your best judgment | These are intentionally flexible. Document your choice. |
+| Requirement is ambiguous (multiple valid interpretations) | **STOP** if ambiguity > 30% of the task | Ask the spec author to clarify. |
+| Implementation requires changing an API contract | **STOP** always | Breaking changes need explicit approval. |
+| Implementation requires a new database migration | **STOP** if it changes existing columns/tables | Adding new tables is lower risk than altering existing ones. |
+| Security-related change (auth, crypto, PII) | **STOP** always | Security changes need review regardless of spec clarity. |
+| Performance-critical path with no benchmark data | **STOP** | You cannot prove NFR compliance without measurement. |
+| Bug found in existing code unrelated to spec | **STOP** — file a separate issue | Do not fix unrelated bugs in a spec-scoped implementation. |
+| Spec says "N/A" for a section you think needs content | **STOP** | The author may have a reason, or they may have missed it. |
+
+---
+
+## Ambiguity Scoring
+
+When you encounter ambiguity, quantify it before deciding to stop or continue.
+
+### How to Score Ambiguity
+
+For each requirement you are implementing, ask:
+
+1. **Can I write a test for this right now?** (No = +20% ambiguity)
+2. **Are there multiple valid interpretations?** (Yes = +20% ambiguity)
+3. **Does the spec contradict itself?** (Yes = +30% ambiguity)
+4. **Am I making assumptions about user behavior?** (Yes = +15% ambiguity)
+5. **Does this depend on an undocumented external system?** (Yes = +15% ambiguity)
+
+### Threshold
+
+| Ambiguity Score | Action |
+|-----------------|--------|
+| 0-15% | Continue. Minor ambiguity is normal. Document your interpretation. |
+| 16-30% | Continue with caution. Add a comment explaining your interpretation. Flag in PR. |
+| 31-50% | STOP. Ask the spec author one specific question. Do not continue until answered. |
+| 51%+ | STOP. The spec is incomplete. Request a revision before proceeding. |
+
+### Example
+
+**Requirement:** "FR-7: The system MUST notify the user when their order ships."
+
+Questions:
+1. Can I write a test? Partially — I know WHAT to test but not HOW (email? push? in-app?). +20%
+2. Multiple interpretations? Yes — notification channel is unclear. +20%
+3. Contradicts itself? No. +0%
+4. Assuming user behavior? Yes — I am assuming they want email. +15%
+5. Undocumented external system? Maybe — depends on notification service. +15%
+
+**Total: 70%.** STOP. The spec needs to specify the notification channel.
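
The scoring questions and thresholds above can be sketched as a small lookup. The signal names and both functions are hypothetical labels for the five questions; the weights and cutoffs come from the tables in this section.

```python
WEIGHTS = {
    "cannot_write_test": 20,
    "multiple_interpretations": 20,
    "spec_contradicts_itself": 30,
    "assumes_user_behavior": 15,
    "undocumented_external_system": 15,
}


def ambiguity_score(signals):
    """Sum the weights of the signals that apply to a requirement."""
    unknown = set(signals) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    return sum(WEIGHTS[name] for name in set(signals))


def action_for(score):
    """Map a score (in percent) to the action from the threshold table."""
    if score <= 15:
        return "continue"
    if score <= 30:
        return "continue with caution"
    if score <= 50:
        return "stop and ask"
    return "stop and request revision"
```

For the FR-7 example, every signal except self-contradiction applies, which gives 20 + 20 + 15 + 15 = 70 and lands in the 51%+ band.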
+
+---
+
+## Scope Creep Detection
+
+### What Is Scope Creep?
+
+Scope creep is implementing functionality not described in the spec. It includes:
+
+- Adding features the spec does not mention
+- "Improving" behavior beyond what acceptance criteria require
+- Handling edge cases the spec explicitly excluded
+- Refactoring unrelated code "while you're in there"
+- Building infrastructure for future features
+
+### Detection Patterns
+
+| Pattern | Example | Risk |
+|---------|---------|------|
+| "While I'm here..." | Refactoring a utility function unrelated to the spec | Medium — unreviewed changes |
+| "This would be easy to add..." | Adding a search filter the spec does not mention | High — untested, unspecified |
+| "Users will probably want..." | Building a feature based on assumption | High — may conflict with future specs |
+| "This is obviously needed..." | Adding logging, metrics, or caching not in NFRs | Medium — may be overkill or wrong approach |
+| "The spec forgot to mention..." | Building something the spec excluded | Critical — may be deliberately excluded |
+
+### Response Protocol
+
+When you detect scope creep in your own work:
+
+1. **Stop immediately.** Do not commit the extra code.
+2. **Check Out of Scope.** Is this item explicitly excluded?
+3. **If excluded:** Delete the code. The spec author had a reason.
+4. **If not mentioned:** File a note for the spec author. Ask if it should be added.
+5. **If approved:** Update the spec FIRST, then implement.
+
+---
+
+## Breaking Change Identification
+
+### What Counts as a Breaking Change?
+
+A breaking change is any modification that could cause existing clients, tests, or integrations to fail.
+
+| Change | Breaking? |
+|--------|-----------|
+| API endpoint removed | Yes |
+| API endpoint added | No |
+| Required field added to request | Yes |
+| Optional field added to request | No |
+| Field removed from response | Yes |
+| Field added to response | No (usually) |
+| Status code changed | Yes |
+| Error code string changed | Yes |
+| Database column removed | Yes |
+| Database column added (nullable) | No |
+| Database column added (not null, no default) | Yes |
+| Enum value removed | Yes |
+| Enum value added | No (usually) |
+| Behavior change for existing input | Yes |
+
+### Breaking Change Protocol
+
+1. **Identify** the breaking change before implementing it.
+2. **Escalate** immediately — do not implement without approval.
+3. **Propose** a migration path (versioned API, feature flag, deprecation period).
+4. **Document** the breaking change in the spec's changelog.
+
+---
+
+## Security Implication Checklist
+
+Any change touching the following areas MUST be escalated, even if the spec seems clear.
+
+### Always Escalate
+
+- [ ] Authentication logic (login, logout, token generation)
+- [ ] Authorization logic (role checks, permission gates)
+- [ ] Encryption/hashing (algorithm choice, key management)
+- [ ] PII handling (storage, transmission, logging)
+- [ ] Input validation bypass (new endpoints, parameter changes)
+- [ ] Rate limiting changes (thresholds, scope)
+- [ ] CORS or CSP policy changes
+- [ ] File upload handling
+- [ ] SQL/NoSQL query construction (injection risk)
+- [ ] Deserialization of user input
+- [ ] Redirect URLs from user input (open redirect risk)
+- [ ] Secrets in code, config, or logs
+
+### Security Escalation Template
+
+```markdown
+## Security Escalation: [Title]
+
+**Affected area:** [authentication/authorization/encryption/PII/etc.]
+**Spec reference:** [FR-N or NFR-SN]
+**Risk:** [What could go wrong if implemented incorrectly]
+**Current protection:** [What exists today]
+**Proposed change:** [What the spec requires]
+**My concern:** [Specific security question]
+**Recommendation:** [Proposed approach with security rationale]
+```
+
+---
+
+## Escalation Templates
+
+### Template 1: Ambiguous Requirement
+
+```markdown
+## Escalation: Ambiguous Requirement
+
+**Blocked on:** FR-7 ("notify the user when their order ships")
+**Ambiguity score:** 70%
+**Question:** What notification channel should be used?
+**Options considered:**
+  A. Email only — Pros: simple, reliable. Cons: not real-time.
+  B. Email + in-app notification — Pros: covers both async and real-time. Cons: more implementation effort.
+  C. Configurable per user — Pros: maximum flexibility. Cons: requires preference UI (not in spec).
+**My recommendation:** B (email + in-app). Covers most use cases without requiring new UI.
+**Impact of waiting:** Cannot implement FR-7 until resolved. No other work blocked.
+```
+
+### Template 2: Missing Edge Case
+
+```markdown
+## Escalation: Missing Edge Case
+
+**Related to:** FR-3 (password reset link expires after 1 hour)
+**Scenario:** User clicks a reset link, but their account was deleted between requesting and clicking.
+**Not in spec:** Edge cases section does not cover this.
+**Options considered:**
+  A. Show generic "link invalid" error — Pros: secure (no info leak). Cons: confusing for deleted user.
+  B. Show "account not found" error — Pros: clear. Cons: confirms account deletion to link holder.
+**My recommendation:** A. Security over clarity — do not reveal account existence.
+**Impact of waiting:** Can implement other ACs; this is blocking only AC-2 completion.
+```
+
+### Template 3: Potential Breaking Change
+
+```markdown
+## Escalation: Potential Breaking Change
+
+**Spec requires:** Adding required field "role" to POST /api/users request (FR-6)
+**Current behavior:** POST /api/users accepts {email, password, displayName}
+**Breaking:** Yes — existing clients will get 400 errors (missing required field)
+**Options considered:**
+  A. Make "role" required as spec says — Pros: matches spec. Cons: breaks mobile app v2.1.
+  B. Make "role" optional with default "user" — Pros: backward compatible. Cons: deviates from spec.
+  C. Version the API (v2) — Pros: clean separation. Cons: maintenance burden.
+**My recommendation:** B. Default to "user" for backward compatibility. Update spec to reflect MAY instead of MUST.
+**Impact of waiting:** Frontend team is building against the new contract. Need answer within 2 days.
+```
+
+### Template 4: Scope Creep Proposal
+
+```markdown
+## Escalation: Potential Addition to Spec
+
+**Context:** While implementing FR-2 (password validation), I noticed the spec does not mention password strength feedback.
+**Not in spec:** No requirement for showing strength indicators.
+**Checked Out of Scope:** Not listed there either.
+**Proposal:** Add FR-7: "The system SHOULD display password strength feedback during registration."
+**Effort:** ~2 hours additional implementation.
+**Question:** Should this be added to current spec, filed as a separate spec, or skipped?
+**Impact of waiting:** FR-2 implementation is not blocked. This is an enhancement question only.
+```
+
+---
+
+## Quick Reference Card
+
+```
+CONTINUE if:
+  - Spec is approved
+  - Requirement uses MUST and is unambiguous
+  - Tests can be written directly from the AC
+  - Changes are additive and non-breaking
+  - You are refactoring internals only (no behavior change)
+
+STOP if:
+  - Ambiguity > 30%
+  - Any breaking change
+  - Any security-related change
+  - Spec says N/A but you think it shouldn't
+  - You are about to build something not in the spec
+  - You cannot write a test for the requirement
+  - External dependency is undocumented
+```
+
+---
+
+## Anti-Patterns in Autonomy
+
+### 1. "I'll Ask Later"
+Continuing past an ambiguity checkpoint because asking feels slow. The rework from building the wrong thing is always slower.
+
+### 2. "It's Obviously Needed"
+Assuming a missing feature was accidentally omitted. It may have been deliberately excluded. Check Out of Scope first.
+
+### 3. "The Spec Is Wrong"
+Implementing what you think the spec SHOULD say instead of what it DOES say. If the spec is wrong, escalate. Do not silently "fix" it.
+
+### 4. "Just This Once"
+Bypassing the escalation protocol for a "small" change. Small changes compound. The protocol exists because humans are bad at judging risk in the moment.
+
+### 5. "I Already Built It"
+Presenting completed work that was never in the spec and hoping it gets accepted. This creates review pressure and wastes everyone's time if rejected. Ask BEFORE building.
--- a/engineering/spec-driven-workflow/references/spec_format_guide.md
+++ b/engineering/spec-driven-workflow/references/spec_format_guide.md
@@ -0,0 +1,423 @@
+# Spec Format Guide
+
+Complete reference for writing feature specifications. Every section is explained with examples, rationale, and common mistakes.
+
+---
+
+## The Spec Document Structure
+
+A spec has 9 mandatory sections. If a section does not apply, write "N/A — [reason]" so reviewers know it was considered, not skipped.
+
+```
+1. Title and Metadata
+2. Context
+3. Functional Requirements
+4. Non-Functional Requirements
+5. Acceptance Criteria
+6. Edge Cases and Error Scenarios
+7. API Contracts
+8. Data Models
+9. Out of Scope
+```
+
+---
+
+## Section 1: Title and Metadata
+
+```markdown
+# Spec: [Feature Name]
+
+**Author:** Jane Doe
+**Date:** 2026-03-25
+**Status:** Draft | In Review | Approved | Superseded
+**Reviewers:** John Smith, Alice Chen
+**Related specs:** SPEC-018 (User Registration), SPEC-023 (Session Management)
+```
+
+### Status Lifecycle
+
+| Status | Meaning | Who Can Change |
+|--------|---------|----------------|
+| Draft | Author is still writing. Not ready for review. | Author |
+| In Review | Ready for feedback. Implementation blocked. | Author |
+| Approved | Reviewed and accepted. Implementation may begin. | Reviewer |
+| Superseded | Replaced by a newer spec. Link to replacement. | Author |
+
+**Rule:** Implementation MUST NOT begin until status is "Approved."
+
+---
+
+## Section 2: Context
+
+The context section answers: **Why does this feature exist?**
+
+### What to Include
+
+- The problem being solved (with evidence: support tickets, metrics, user research)
+- The current state (what exists today and what is broken or missing)
+- The business justification (revenue impact, cost savings, user retention)
+- Constraints or dependencies (regulatory, technical, timeline)
+
+### What to Exclude
+
+- Implementation details (that is the engineer's job)
+- Solution proposals (the spec says WHAT, not HOW)
+- Lengthy background (2-4 paragraphs maximum)
+
+### Good Example
+
+```markdown
+## Context
+
+Users who forget their passwords currently have no self-service recovery.
+Support handles ~200 password reset requests per week, consuming approximately
+8 hours of agent time at $45/hour ($360/week, $18,720/year). Additionally,
+12% of users who contact support for a reset never return.
+
+This feature provides self-service password reset via email, eliminating
+support burden and reducing user churn from the reset flow.
+```
+
+### Bad Example
+
+```markdown
+## Context
+
+We need a password reset feature. Users forget their passwords sometimes
+and need to reset them. We should build this.
+```
+
+**Why it is bad:** No evidence, no metrics, no business justification. "We should build this" is not a reason.
+
+---
+
+## Section 3: Functional Requirements — RFC 2119
+
+### RFC 2119 Keywords
+
+These keywords have precise meanings per [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt). Do not use them casually.
+
+| Keyword | Meaning | Testing Implication |
+|---------|---------|---------------------|
+| **MUST** | Absolute requirement. The implementation is non-conformant without this. | Must have a passing test. Failure = release blocker. |
+| **MUST NOT** | Absolute prohibition. Doing this = broken implementation. | Must have a test proving this cannot happen. |
+| **SHOULD** | Strongly recommended. Can be omitted only with documented justification. | Should have a test. Omission requires written rationale. |
+| **SHOULD NOT** | Strongly discouraged. Can be done only with documented justification. | Should have a test confirming the behavior does not occur. |
+| **MAY** | Truly optional. Implementer's discretion. | Test is optional. Document if implemented. |
+
+### Writing Good Requirements
+
+**Each requirement MUST be:**
+1. **Atomic** — One behavior per requirement. Not "The system MUST authenticate users and log them in."
+2. **Testable** — You can write a test that proves it works or does not.
+3. **Numbered** — Sequential FR-N format for traceability.
+4. **Specific** — No ambiguous adjectives ("fast", "secure", "user-friendly").
+
+### Good Requirements
+
+```markdown
+- FR-1: The system MUST accept login via email and password.
+- FR-2: The system MUST reject passwords shorter than 8 characters.
+- FR-3: The system MUST return a JWT access token on successful login.
+- FR-4: The system MUST NOT include the password hash in any API response.
+- FR-5: The system SHOULD support "remember me" with a 30-day refresh token.
+- FR-6: The system MAY display last login time on the dashboard.
+```
+
+### Bad Requirements
+
+```markdown
+- FR-1: The login system must be fast and secure.
+  (Untestable: what is "fast"? What is "secure"?)
+
+- FR-2: The system must handle all edge cases.
+  (Vague: which edge cases? This delegates the spec to the implementer.)
+
+- FR-3: Users should be able to log in easily.
+  (Subjective: "easily" is not measurable.)
+```
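
The difference between the good and bad requirements above can be partly checked by a linter. The sketch below is illustrative only and is not the logic of `spec_validator.py`; the function name, the regexes, and the vague-word list are assumptions based on the rules in this section.

```python
import re

FORMAT = re.compile(r"^- (FR-\d+): (.+)$")
# RFC 2119 keywords count only when written in upper case.
KEYWORD = re.compile(r"\b(?:MUST NOT|SHOULD NOT|MUST|SHOULD|MAY)\b")
VAGUE = re.compile(r"\b(fast|secure|easily|user-friendly)\b", re.IGNORECASE)


def lint_requirement(line):
    """Return a list of problems found in one requirement line."""
    match = FORMAT.match(line.strip())
    if not match:
        return ["not in '- FR-N: text' format"]
    text = match.group(2)
    problems = []
    if not KEYWORD.search(text):
        problems.append("no RFC 2119 keyword")
    vague = [word.lower() for word in VAGUE.findall(text)]
    if vague:
        problems.append("vague wording: " + ", ".join(vague))
    return problems
```

A linter like this cannot judge atomicity or testability; it only flags the mechanical problems, such as a lowercase "must" or an unmeasurable adjective.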
+
+---
+
+## Section 4: Non-Functional Requirements
+
+Non-functional requirements define quality attributes. Every requirement needs a **measurable threshold**.
+
+### Categories
+
+#### Performance
+```markdown
+- NFR-P1: Login API MUST respond in < 500ms (p95) under 1,000 concurrent users.
+- NFR-P2: Dashboard page MUST achieve Largest Contentful Paint < 2.5s.
+- NFR-P3: Search results MUST return within 200ms for queries under 100 characters.
+```
+
+**Bad:** "The system should be fast." (Not measurable.)
+
+#### Security
+```markdown
+- NFR-S1: All API endpoints MUST require authentication except /health and /login.
+- NFR-S2: Failed login attempts MUST be rate-limited to 5 per minute per IP.
+- NFR-S3: Passwords MUST be hashed with bcrypt (cost factor >= 12).
+- NFR-S4: Session tokens MUST be invalidated on password change.
+```
+
+#### Accessibility
+```markdown
+- NFR-A1: All form inputs MUST have associated labels (WCAG 1.3.1).
+- NFR-A2: Color contrast MUST meet 4.5:1 ratio (WCAG 1.4.3).
+- NFR-A3: All interactive elements MUST be keyboard-navigable (WCAG 2.1.1).
+```
+
+#### Scalability
+```markdown
+- NFR-SC1: The system SHOULD handle 50,000 registered users.
+- NFR-SC2: Database queries MUST use indexes; no full table scans on tables > 10K rows.
+```
+
+#### Reliability
+```markdown
+- NFR-R1: The authentication service MUST maintain 99.9% uptime (< 8.77h downtime/year).
+- NFR-R2: Data MUST NOT be lost on service restart (durable storage required).
+```
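
The downtime figure in NFR-R1 follows from the uptime percentage. A quick check of the arithmetic, assuming an average year of 365.25 days (the helper name is hypothetical):

```python
HOURS_PER_YEAR = 365.25 * 24  # 8,766 hours, averaging in leap years


def allowed_downtime_hours(uptime_percent):
    """Hours of downtime per year permitted by an uptime target."""
    return (100 - uptime_percent) / 100 * HOURS_PER_YEAR
```

For 99.9% this gives roughly 8.77 hours per year, matching the threshold stated in NFR-R1.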
+
+---
+
+## Section 5: Acceptance Criteria — Given/When/Then
+
+Acceptance criteria are the contract between the spec author and the implementer. They define "done."
+
+### The Given/When/Then Pattern
+
+```
+Given [precondition — the world is in this state]
+When  [action — the user or system does this]
+Then  [outcome — this observable result occurs]
+And   [additional outcome — and also this]
+```
+
+### Rules for Acceptance Criteria
+
+1. **Every AC MUST reference at least one FR-* or NFR-*.** Orphaned criteria indicate missing requirements.
+2. **Every AC MUST be testable by a machine.** If you cannot write an automated test, rewrite the criterion.
+3. **No subjective language.** Not "should look good" but "MUST render within the design-system grid."
+4. **One scenario per AC.** If you have multiple Given/When/Then blocks, split into separate ACs.
+
+### Example: Authentication Feature
+
+```markdown
+### AC-1: Successful login (FR-1, FR-3)
+Given a registered user with email "user@example.com" and password "P@ssw0rd123"
+When they POST /api/auth/login with those credentials
+Then they receive a 200 response with a valid JWT token
+And the token expires in 24 hours
+And the response includes the user's display name
+
+### AC-2: Invalid password (FR-1)
+Given a registered user with email "user@example.com"
+When they POST /api/auth/login with an incorrect password
+Then they receive a 401 response
+And the response body contains error "INVALID_CREDENTIALS"
+And no token is issued
+
+### AC-3: Short password rejected on registration (FR-2)
+Given a new user attempting to register
+When they submit a password with 7 characters
+Then they receive a 400 response
+And the response body contains error "PASSWORD_TOO_SHORT"
+And the account is not created
+```
+
+### Common Mistakes
+
+| Mistake | Example | Fix |
+|---------|---------|-----|
+| Vague outcome | "Then the system works correctly" | "Then the response status is 200 and body contains {field: value}" |
+| Missing precondition | "When user logs in, then token is issued" | "Given a registered user, when they POST valid credentials, then..." |
+| Multiple scenarios | AC with 3 different When clauses | Split into 3 separate ACs |
+| No FR reference | "AC-5: User sees dashboard" | "AC-5: User sees dashboard (FR-7)" |
+
+---
+
+## Section 6: Edge Cases and Error Scenarios
+
+### What Counts as an Edge Case
+
+- Invalid or malformed input
+- External service failures (API down, timeout, rate-limited)
+- Concurrent operations (race conditions)
+- Boundary values (empty string, max length, zero, negative numbers)
+- State conflicts (already exists, already deleted, expired)
+
+### Format
+
+```markdown
+- EC-1: Empty email field → Return 400 with error "EMAIL_REQUIRED". Do not call auth service.
+- EC-2: Email exceeds 255 characters → Return 400 with error "EMAIL_TOO_LONG".
+- EC-3: OAuth provider returns 503 → Return 503 with "Service temporarily unavailable". Retry after 30s.
+- EC-4: Two users register same email simultaneously → First succeeds, second gets 409 Conflict.
+- EC-5: User clicks reset link after password was already changed → Show "Link already used."
+```
+
+### Coverage Rule
+
+For every external dependency, specify at least one failure:
+- Database: connection lost, timeout, constraint violation
+- API: 4xx, 5xx, timeout, invalid response
+- File system: file not found, permission denied, disk full
+- User input: empty, too long, wrong type, injection attempt
+
+---
+
+## Section 7: API Contracts
+
+### Notation
+
+Use TypeScript-style interfaces. They are readable by both frontend and backend engineers.
+
+```typescript
+interface CreateUserRequest {
+  email: string;         // MUST be valid email, max 255 chars
+  password: string;      // MUST be 8-128 chars
+  displayName: string;   // MUST be 1-100 chars, no HTML
+  role?: "user" | "admin"; // Default: "user"
+}
+```
+
+### What to Define
+
+For each endpoint:
+1. **HTTP method and path** (e.g., POST /api/users)
+2. **Request body** (fields, types, constraints, defaults)
+3. **Success response** (status code, body shape)
+4. **Error responses** (each error code with its status and body)
+5. **Headers** (Authorization, Content-Type, custom headers)
+
+### Error Response Convention
+
+```typescript
+interface ApiError {
+  error: string;         // Machine-readable code: "INVALID_CREDENTIALS"
+  message: string;       // Human-readable: "The email or password is incorrect."
+  details?: Record<string, string>;  // Field-level errors for validation
+}
+```
+
+Always include:
+- 400 for validation errors
+- 401 for authentication failures
+- 403 for authorization failures
+- 404 for not found
+- 409 for conflicts
+- 429 for rate limiting
+- 500 for unexpected errors (keep it generic — do not leak internals)
+
+---
+
+## Section 8: Data Models
+
+### Table Format
+
+```markdown
+### User
+| Field | Type | Constraints |
+|-------|------|-------------|
+| id | UUID | PK, auto-generated, immutable |
+| email | varchar(255) | Unique, not null, valid email |
+| passwordHash | varchar(60) | Not null, bcrypt, never in API responses |
+| displayName | varchar(100) | Not null |
+| role | enum('user','admin') | Default: 'user' |
+| createdAt | timestamp | UTC, immutable, auto-set |
+| updatedAt | timestamp | UTC, auto-updated |
+| deletedAt | timestamp | Null unless soft-deleted |
+```
+
+### Rules
+
+1. **Every entity in requirements MUST have a data model.** If FR-1 mentions "users", there must be a User model.
+2. **Constraints MUST match requirements.** If FR-2 says passwords >= 8 chars, the model must note that.
+3. **Include indexes.** If NFR-P1 says < 500ms queries, note which fields need indexes.
+4. **Specify soft vs. hard delete.** State it explicitly.
+
+---
+
+## Section 9: Out of Scope
+
+### Why This Section Matters
+
+Out of Scope prevents scope creep during implementation. When someone says "while you're in there, could you also..." — point them to this section.
+
+### Format
+
+```markdown
+- OS-1: Multi-factor authentication — Planned for Q3 (SPEC-045).
+- OS-2: Social login beyond Google/GitHub — Insufficient user demand (< 2% requests).
+- OS-3: Admin impersonation — Security review pending. Separate spec required.
+- OS-4: Password strength meter UI — Nice-to-have, deferred to design sprint 12.
+```
+
+### Rules
+
+1. **Every feature discussed and rejected MUST be listed.** This creates a paper trail.
+2. **Include the reason.** "Not now" is not a reason. "Insufficient demand (< 2% of requests)" is.
+3. **Link to future specs** when the exclusion is a deferral, not a rejection.
+
+---
+
+## Feature-Type Templates
+
+### CRUD Feature
+
+Focus on: all 4 operations, validation rules, authorization, pagination for list endpoints.
+
+```markdown
+- FR-1: Users MUST be able to create a [resource] with [required fields].
+- FR-2: Users MUST be able to read a [resource] by ID.
+- FR-3: Users MUST be able to list [resources] with pagination (default: 20/page).
+- FR-4: Users MUST be able to update [mutable fields] of their own [resources].
+- FR-5: Users MUST be able to delete their own [resources] (soft delete).
+- FR-6: Users MUST NOT be able to modify or delete other users' [resources].
+```
+
+### Integration Feature
+
+Focus on: external API contract, retry/fallback behavior, data mapping, error propagation.
+
+```markdown
+- FR-1: The system MUST call [external API] to [purpose].
+- FR-2: The system MUST retry failed calls up to 3 times with exponential backoff.
+- FR-3: The system MUST map [external field] to [internal field].
+- FR-4: The system MUST NOT expose external API errors directly to users.
+- EC-1: External API returns 5xx → Log error, return cached data if < 1h old, else 503.
+- EC-2: External API response schema changes → Log warning, reject unmappable fields.
+```
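
The retry and fallback requirements in this template (FR-2, FR-4, EC-1) can be sketched in Python as follows. This is an illustration under stated assumptions: `call_with_retry`, `UpstreamError`, and the 0.5 s base delay are hypothetical, "up to 3 times" is read as one initial attempt plus three retries, and the logging that EC-1 asks for is left out for brevity.

```python
import time
from typing import Callable, Optional, Tuple

MAX_RETRIES = 3          # FR-2: retry failed calls up to 3 times
CACHE_MAX_AGE_S = 3600   # EC-1: cached data is usable if < 1h old


class UpstreamError(Exception):
    """Hypothetical error raised when the external API call fails."""


def call_with_retry(
    fetch: Callable[[], dict],
    cached: Optional[Tuple[float, dict]] = None,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.time,
) -> Tuple[int, dict]:
    """Return (status, payload): 200 with fresh or cached data, else 503."""
    # One initial attempt plus MAX_RETRIES retries.
    for attempt in range(MAX_RETRIES + 1):
        try:
            return 200, fetch()
        except UpstreamError:
            if attempt < MAX_RETRIES:
                # Exponential backoff: base_delay, 2x, 4x.
                sleep(base_delay * (2 ** attempt))
    # All attempts failed: fall back to the cache if it is fresh enough.
    if cached is not None:
        stored_at, payload = cached
        if now() - stored_at < CACHE_MAX_AGE_S:
            return 200, payload
    # FR-4: return a generic error rather than the upstream failure.
    return 503, {"error": "UPSTREAM_UNAVAILABLE", "message": "Service temporarily unavailable."}
```

The `sleep` and `now` parameters exist only so that tests can run without real delays. That same property makes the acceptance criteria for FR-2 and EC-1 cheap to automate.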
+
+### Migration Feature
+
+Focus on: backward compatibility, rollback plan, data integrity, zero-downtime deployment.
+
+```markdown
+- FR-1: The migration MUST transform [old schema] to [new schema].
+- FR-2: The migration MUST be reversible (rollback script required).
+- FR-3: The migration MUST NOT cause downtime exceeding 30 seconds.
+- FR-4: The migration MUST validate data integrity post-run (row count, checksum).
+- EC-1: Migration fails mid-way → Automatic rollback, alert ops team.
+- EC-2: New schema has stricter constraints → Log invalid rows, quarantine for manual review.
+```
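
For FR-4, a post-run integrity check might compare row counts and an order-independent checksum. The sketch below is a simplified in-memory illustration: `table_fingerprint` and `verify_migration` are hypothetical helpers, rows are assumed to be plain tuples, and a real migration would stream rows from the database instead.

```python
import hashlib
from typing import Callable, Iterable, Tuple


def table_fingerprint(rows: Iterable[tuple]) -> Tuple[int, str]:
    """Return (row_count, checksum) for a set of rows, independent of row order."""
    count = 0
    digests = []
    for row in rows:
        count += 1
        digests.append(hashlib.sha256(repr(row).encode("utf-8")).hexdigest())
    # Sort per-row digests so the checksum does not depend on read order.
    combined = hashlib.sha256("".join(sorted(digests)).encode("utf-8")).hexdigest()
    return count, combined


def verify_migration(
    old_rows: Iterable[tuple],
    new_rows: Iterable[tuple],
    transform: Callable[[tuple], tuple],
) -> bool:
    """True when the new table matches the transformed old table."""
    expected = table_fingerprint(transform(row) for row in old_rows)
    actual = table_fingerprint(new_rows)
    return expected == actual
```

A `False` result here is the kind of signal that would trigger the rollback path described in EC-1.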
+
+---
+
+## Checklist: Is This Spec Ready for Review?
+
+- [ ] Every section is filled (or marked N/A with reason)
+- [ ] All requirements use FR-N, NFR-N numbering
+- [ ] RFC 2119 keywords are UPPERCASE
+- [ ] Every AC references at least one requirement
+- [ ] Every AC uses Given/When/Then
+- [ ] Edge cases cover each external dependency failure
+- [ ] API contracts define success AND error responses
+- [ ] Data models include all entities from requirements
+- [ ] Out of Scope lists items discussed and rejected
+- [ ] No placeholder text remains
+- [ ] Context includes evidence (metrics, tickets, research)
+- [ ] Status is "In Review" (not still "Draft")
--- a/engineering/spec-driven-workflow/spec_generator.py
+++ b/engineering/spec-driven-workflow/spec_generator.py
@@ -0,0 +1,338 @@
+#!/usr/bin/env python3
+"""
+Spec Generator - Generates a feature specification template from a name and description.
+
+Produces a complete spec document with all required sections pre-filled with
+guidance prompts. Output can be markdown or structured JSON.
+
+No external dependencies - uses only Python standard library.
+"""
+
+import argparse
+import json
+import sys
+import textwrap
+from datetime import date
+from pathlib import Path
+from typing import Any, Dict
+
+
+SPEC_TEMPLATE = """\
+# Spec: {name}
+
+**Author:** [your name]
+**Date:** {date}
+**Status:** Draft
+**Reviewers:** [list reviewers]
+**Related specs:** [links to related specs, or "None"]
+
+---
+
+## Context
+
+{context_prompt}
+
+---
+
+## Functional Requirements
+
+_Use RFC 2119 keywords: MUST, MUST NOT, SHOULD, SHOULD NOT, MAY._
+_Each requirement is a single, testable statement. Number sequentially._
+
+- FR-1: The system MUST [describe required behavior].
+- FR-2: The system MUST [describe another required behavior].
+- FR-3: The system SHOULD [describe recommended behavior].
+- FR-4: The system MAY [describe optional behavior].
+- FR-5: The system MUST NOT [describe prohibited behavior].
+
+---
+
+## Non-Functional Requirements
+
+### Performance
+- NFR-P1: [Operation] MUST complete in < [threshold] (p95) under [conditions].
+- NFR-P2: [Operation] SHOULD handle [throughput] requests per second.
+
+### Security
+- NFR-S1: All data in transit MUST be encrypted via TLS 1.2+.
+- NFR-S2: The system MUST rate-limit [operation] to [limit] per [period] per [scope].
+
+### Accessibility
+- NFR-A1: [UI component] MUST meet WCAG 2.1 AA standards.
+- NFR-A2: Error messages MUST be announced to screen readers.
+
+### Scalability
+- NFR-SC1: The system SHOULD handle [number] concurrent [entities].
+
+### Reliability
+- NFR-R1: The [service] MUST maintain [percentage]% uptime.
+
+---
+
+## Acceptance Criteria
+
+_Write in Given/When/Then (Gherkin) format._
+_Each criterion MUST reference at least one FR-* or NFR-*._
+
+### AC-1: [Descriptive name] (FR-1)
+Given [precondition]
+When [action]
+Then [expected result]
+And [additional assertion]
+
+### AC-2: [Descriptive name] (FR-2)
+Given [precondition]
+When [action]
+Then [expected result]
+
+### AC-3: [Descriptive name] (NFR-S2)
+Given [precondition]
+When [action]
+Then [expected result]
+And [additional assertion]
+
+---
+
+## Edge Cases
+
+_For every external dependency (API, database, file system, user input), specify at least one failure scenario._
+
+- EC-1: [Input/condition] -> [expected behavior].
+- EC-2: [Input/condition] -> [expected behavior].
+- EC-3: [External service] is unavailable -> [expected behavior].
+- EC-4: [Concurrent/race condition] -> [expected behavior].
+- EC-5: [Boundary value] -> [expected behavior].
+
+---
+
+## API Contracts
+
+_Define request/response shapes using TypeScript-style notation._
+_Cover all endpoints referenced in functional requirements._
+
+### [METHOD] [endpoint]
+
+Request:
+```typescript
+interface [Name]Request {{
+  field: string;       // Description, constraints
+  optional?: number;   // Default: [value]
+}}
+```
+
+Success Response ([status code]):
+```typescript
+interface [Name]Response {{
+  id: string;
+  field: string;
+  createdAt: string;   // ISO 8601
+}}
+```
+
+Error Response ([status code]):
+```typescript
+interface [Name]Error {{
+  error: "[ERROR_CODE]";
+  message: string;
+}}
+```
+
+---
+
+## Data Models
+
+_Define all entities referenced in requirements._
+
+### [Entity Name]
+| Field | Type | Constraints |
+|-------|------|-------------|
+| id | UUID | Primary key, auto-generated |
+| [field] | [type] | [constraints] |
+| createdAt | timestamp | UTC, immutable |
+| updatedAt | timestamp | UTC, auto-updated |
+
+---
+
+## Out of Scope
+
+_Explicit exclusions prevent scope creep. If someone asks for these during implementation, point them here._
+
+- OS-1: [Feature/capability] — [reason for exclusion or link to future spec].
+- OS-2: [Feature/capability] — [reason for exclusion].
+- OS-3: [Feature/capability] — deferred to [version/sprint].
+
+---
+
+## Open Questions
+
+_Track unresolved questions here. Each must be resolved before status moves to "Approved"._
+
+- [ ] Q1: [Question] — Owner: [name], Due: [date]
+- [ ] Q2: [Question] — Owner: [name], Due: [date]
+"""
+
+
+def generate_context_prompt(description: str) -> str:
+    """Generate a context section prompt based on the provided description."""
+    if description:
+        # Dedent only the static guidance: a multi-line description would defeat dedent().
+        guidance = textwrap.dedent("""\
+            _Expand this context section to include:_
+            _- Why does this feature exist? What problem does it solve?_
+            _- What is the business motivation? (link to user research, support tickets, metrics)_
+            _- What is the current state? (what exists today, what pain points exist)_
+            _- 2-4 paragraphs maximum._""")
+        return f"{description}\n\n{guidance}"
+    return textwrap.dedent("""\
+        _Why does this feature exist? What problem does it solve? What is the business
+        motivation? Include links to user research, support tickets, or metrics that
+        justify this work. 2-4 paragraphs maximum._""")
+
+
+def generate_spec(name: str, description: str) -> str:
+    """Generate a spec document from name and description."""
+    context_prompt = generate_context_prompt(description)
+    return SPEC_TEMPLATE.format(
+        name=name,
+        date=date.today().isoformat(),
+        context_prompt=context_prompt,
+    )
+
+
+def generate_spec_json(name: str, description: str) -> Dict[str, Any]:
+    """Generate structured JSON representation of the spec template."""
+    return {
+        "spec": {
+            "title": f"Spec: {name}",
+            "metadata": {
+                "author": "[your name]",
+                "date": date.today().isoformat(),
+                "status": "Draft",
+                "reviewers": [],
+                "related_specs": [],
+            },
+            "context": description or "[Describe why this feature exists]",
+            "functional_requirements": [
+                {"id": "FR-1", "keyword": "MUST", "description": "[describe required behavior]"},
+                {"id": "FR-2", "keyword": "MUST", "description": "[describe another required behavior]"},
+                {"id": "FR-3", "keyword": "SHOULD", "description": "[describe recommended behavior]"},
+                {"id": "FR-4", "keyword": "MAY", "description": "[describe optional behavior]"},
+                {"id": "FR-5", "keyword": "MUST NOT", "description": "[describe prohibited behavior]"},
+            ],
+            "non_functional_requirements": {
+                "performance": [
+                    {"id": "NFR-P1", "description": "[operation] MUST complete in < [threshold]"},
+                ],
+                "security": [
+                    {"id": "NFR-S1", "description": "All data in transit MUST be encrypted via TLS 1.2+"},
+                ],
+                "accessibility": [
+                    {"id": "NFR-A1", "description": "[UI component] MUST meet WCAG 2.1 AA"},
+                ],
+                "scalability": [
+                    {"id": "NFR-SC1", "description": "[system] SHOULD handle [N] concurrent [entities]"},
+                ],
+                "reliability": [
+                    {"id": "NFR-R1", "description": "[service] MUST maintain [N]% uptime"},
+                ],
+            },
+            "acceptance_criteria": [
+                {
+                    "id": "AC-1",
+                    "name": "[descriptive name]",
+                    "references": ["FR-1"],
+                    "given": "[precondition]",
+                    "when": "[action]",
+                    "then": "[expected result]",
+                },
+            ],
+            "edge_cases": [
+                {"id": "EC-1", "condition": "[input/condition]", "behavior": "[expected behavior]"},
+            ],
+            "api_contracts": [
+                {
+                    "method": "[METHOD]",
+                    "endpoint": "[/api/path]",
+                    "request_fields": [{"name": "field", "type": "string", "constraints": "[description]"}],
+                    "success_response": {"status": 200, "fields": []},
+                    "error_response": {"status": 400, "fields": []},
+                },
+            ],
+            "data_models": [
+                {
+                    "name": "[Entity]",
+                    "fields": [
+                        {"name": "id", "type": "UUID", "constraints": "Primary key, auto-generated"},
+                    ],
+                },
+            ],
+            "out_of_scope": [
+                {"id": "OS-1", "description": "[feature/capability]", "reason": "[reason]"},
+            ],
+            "open_questions": [],
+        },
+        "metadata": {
+            "generated_by": "spec_generator.py",
+            "feature_name": name,
+            "feature_description": description,
+        },
+    }
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Generate a feature specification template from a name and description.",
+        epilog="Example: python spec_generator.py --name 'User Auth' --description 'OAuth 2.0 login flow'",
+    )
+    parser.add_argument(
+        "--name",
+        required=True,
+        help="Feature name (used as spec title)",
+    )
+    parser.add_argument(
+        "--description",
+        default="",
+        help="Brief feature description (used to seed the context section)",
+    )
+    parser.add_argument(
+        "--output",
+        "-o",
+        default=None,
+        help="Output file path (default: stdout)",
+    )
+    parser.add_argument(
+        "--format",
+        choices=["md", "json"],
+        default="md",
+        help="Output format: md (markdown) or json (default: md)",
+    )
+    parser.add_argument(
+        "--json",
+        action="store_true",
+        dest="json_flag",
+        help="Shorthand for --format json",
+    )
+
+    args = parser.parse_args()
+
+    output_format = "json" if args.json_flag else args.format
+
+    if output_format == "json":
+        result = generate_spec_json(args.name, args.description)
+        output = json.dumps(result, indent=2)
+    else:
+        output = generate_spec(args.name, args.description)
+
+    if args.output:
+        out_path = Path(args.output)
+        out_path.parent.mkdir(parents=True, exist_ok=True)
+        out_path.write_text(output, encoding="utf-8")
+        print(f"Spec template written to {out_path}", file=sys.stderr)
+    else:
+        print(output)
+
+    sys.exit(0)
+
+
+if __name__ == "__main__":
+    main()
--- a/engineering/spec-driven-workflow/spec_validator.py
+++ b/engineering/spec-driven-workflow/spec_validator.py
@@ -0,0 +1,461 @@
+#!/usr/bin/env python3
+"""
+Spec Validator - Validates a feature specification for completeness and quality.
+
+Checks that a spec document contains all required sections, uses RFC 2119 keywords
+correctly, has acceptance criteria in Given/When/Then format, and scores overall
+completeness from 0-100.
+
+Sections checked:
+- Context, Functional Requirements, Non-Functional Requirements
+- Acceptance Criteria, Edge Cases, API Contracts, Data Models, Out of Scope
+
+Exit codes: 0 = pass, 1 = warnings, 2 = critical (or --strict with score < 80)
+
+No external dependencies - uses only Python standard library.
+"""
+
+import argparse
+import json
+import re
+import sys
+from pathlib import Path
+from typing import Any, Dict, List
+
+
+# Section definitions: (key, display_name, required_header_patterns, weight)
+SECTIONS = [
+    ("context", "Context", [r"^##\s+Context"], 10),
+    ("functional_requirements", "Functional Requirements", [r"^##\s+Functional\s+Requirements"], 15),
+    ("non_functional_requirements", "Non-Functional Requirements", [r"^##\s+Non-Functional\s+Requirements"], 10),
+    ("acceptance_criteria", "Acceptance Criteria", [r"^##\s+Acceptance\s+Criteria"], 20),
+    ("edge_cases", "Edge Cases", [r"^##\s+Edge\s+Cases"], 10),
+    ("api_contracts", "API Contracts", [r"^##\s+API\s+Contracts"], 10),
+    ("data_models", "Data Models", [r"^##\s+Data\s+Models"], 10),
+    ("out_of_scope", "Out of Scope", [r"^##\s+Out\s+of\s+Scope"], 10),
+    ("metadata", "Metadata (Author/Date/Status)", [r"\*\*Author:\*\*", r"\*\*Date:\*\*", r"\*\*Status:\*\*"], 5),
+]
+
+RFC_KEYWORDS = ["MUST", "MUST NOT", "SHOULD", "SHOULD NOT", "MAY"]
+
+# Patterns that indicate placeholder/unfilled content
+PLACEHOLDER_PATTERNS = [
+    r"\[your\s+name\]",
+    r"\[list\s+reviewers\]",
+    r"\[describe\s+",
+    r"\[input/condition\]",
+    r"\[precondition\]",
+    r"\[action\]",
+    r"\[expected\s+result\]",
+    r"\[feature/capability\]",
+    r"\[operation\]",
+    r"\[threshold\]",
+    r"\[UI\s+component\]",
+    r"\[service\]",
+    r"\[percentage\]",
+    r"\[number\]",
+    r"\[METHOD\]",
+    r"\[endpoint\]",
+    r"\[Name\]",
+    r"\[Entity\s+Name\]",
+    r"\[type\]",
+    r"\[constraints\]",
+    r"\[field\]",
+    r"\[reason\]",
+]
+
+
+class SpecValidator:
+    """Validates a spec document for completeness and quality."""
+
+    def __init__(self, content: str, file_path: str = ""):
+        self.content = content
+        self.file_path = file_path
+        self.lines = content.split("\n")
+        self.findings: List[Dict[str, Any]] = []
+        self.section_scores: Dict[str, Dict[str, Any]] = {}
+
+    def validate(self) -> Dict[str, Any]:
+        """Run all validation checks and return results."""
+        self._check_sections_present()
+        self._check_functional_requirements()
+        self._check_acceptance_criteria()
+        self._check_edge_cases()
+        self._check_rfc_keywords()
+        self._check_api_contracts()
+        self._check_data_models()
+        self._check_out_of_scope()
+        self._check_placeholders()
+        self._check_traceability()
+
+        total_score = self._calculate_score()
+
+        return {
+            "file": self.file_path,
+            "score": total_score,
+            "grade": self._score_to_grade(total_score),
+            "sections": self.section_scores,
+            "findings": self.findings,
+            "summary": self._build_summary(total_score),
+        }
+
+    def _add_finding(self, severity: str, section: str, message: str):
+        """Record a validation finding."""
+        self.findings.append({
+            "severity": severity,  # "error", "warning", "info"
+            "section": section,
+            "message": message,
+        })
+
+    def _find_section_content(self, header_pattern: str) -> str:
+        """Extract content between a section header and the next ## header."""
+        in_section = False
+        section_lines = []
+        for line in self.lines:
+            if re.match(header_pattern, line, re.IGNORECASE):
+                in_section = True
+                continue
+            if in_section and re.match(r"^##\s+", line):
+                break
+            if in_section:
+                section_lines.append(line)
+        return "\n".join(section_lines)
+
+    def _check_sections_present(self):
+        """Check that all required sections exist."""
+        for key, name, patterns, weight in SECTIONS:
+            # Every pattern must match some line (metadata needs Author, Date, and Status).
+            found = all(
+                any(re.search(pattern, line, re.IGNORECASE) for line in self.lines)
+                for pattern in patterns
+            )
+
+            if found:
+                self.section_scores[key] = {"name": name, "present": True, "score": weight, "max": weight}
+            else:
+                self.section_scores[key] = {"name": name, "present": False, "score": 0, "max": weight}
+                self._add_finding("error", key, f"Missing section: {name}")
+
+    def _check_functional_requirements(self):
+        """Validate functional requirements format and content."""
+        content = self._find_section_content(r"^##\s+Functional\s+Requirements")
+        if not content.strip():
+            return
+
+        fr_pattern = re.compile(r"-\s+FR-(\d+):")
+        matches = fr_pattern.findall(content)
+
+        if not matches:
+            self._add_finding("error", "functional_requirements", "No numbered requirements found (expected FR-N: format)")
+            if "functional_requirements" in self.section_scores:
+                self.section_scores["functional_requirements"]["score"] = max(
+                    0, self.section_scores["functional_requirements"]["score"] - 10
+                )
+            return
+
+        fr_count = len(matches)
+        if fr_count < 3:
+            self._add_finding("warning", "functional_requirements", f"Only {fr_count} requirements found. Most features need 3+.")
+
+        # Check for RFC keywords
+        has_keyword = any(kw in content for kw in RFC_KEYWORDS)
+        if not has_keyword:
+            self._add_finding("warning", "functional_requirements", "No RFC 2119 keywords (MUST/SHOULD/MAY) found.")
+
+    def _check_acceptance_criteria(self):
+        """Validate acceptance criteria use Given/When/Then format."""
+        content = self._find_section_content(r"^##\s+Acceptance\s+Criteria")
+        if not content.strip():
+            return
+
+        ac_pattern = re.compile(r"###\s+AC-(\d+):")
+        matches = ac_pattern.findall(content)
+
+        if not matches:
+            self._add_finding("error", "acceptance_criteria", "No numbered acceptance criteria found (expected ### AC-N: format)")
+            if "acceptance_criteria" in self.section_scores:
+                self.section_scores["acceptance_criteria"]["score"] = max(
+                    0, self.section_scores["acceptance_criteria"]["score"] - 15
+                )
+            return
+
+        ac_count = len(matches)
+
+        # Check Given/When/Then
+        given_count = len(re.findall(r"(?i)\bgiven\b", content))
+        when_count = len(re.findall(r"(?i)\bwhen\b", content))
+        then_count = len(re.findall(r"(?i)\bthen\b", content))
+
+        if given_count < ac_count:
+            self._add_finding("warning", "acceptance_criteria",
+                              f"Found {ac_count} criteria but only {given_count} 'Given' clauses. Each AC needs Given/When/Then.")
+        if when_count < ac_count:
+            self._add_finding("warning", "acceptance_criteria",
+                              f"Found {ac_count} criteria but only {when_count} 'When' clauses.")
+        if then_count < ac_count:
+            self._add_finding("warning", "acceptance_criteria",
+                              f"Found {ac_count} criteria but only {then_count} 'Then' clauses.")
+
+        # Check for FR references
+        fr_refs = re.findall(r"\(FR-\d+", content)
+        if not fr_refs:
+            self._add_finding("warning", "acceptance_criteria",
+                              "No acceptance criteria reference functional requirements (expected (FR-N) in title).")
+
+    def _check_edge_cases(self):
+        """Validate edge cases section."""
+        content = self._find_section_content(r"^##\s+Edge\s+Cases")
+        if not content.strip():
+            return
+
+        ec_pattern = re.compile(r"-\s+EC-(\d+):")
+        matches = ec_pattern.findall(content)
+
+        if not matches:
+            self._add_finding("warning", "edge_cases", "No numbered edge cases found (expected EC-N: format)")
+        elif len(matches) < 3:
+            self._add_finding("warning", "edge_cases", f"Only {len(matches)} edge cases. Consider failure modes for each external dependency.")
+
+    def _check_rfc_keywords(self):
+        """Check RFC 2119 keywords are used consistently (capitalized)."""
+        # Look for lowercase must/should/may that might be intended as RFC keywords
+        requirements_content = self._find_section_content(r"^##\s+Functional\s+Requirements")
+        requirements_content += self._find_section_content(r"^##\s+Non-Functional\s+Requirements")
+
+        for kw in ["must", "should", "may"]:
+            # Find lowercase usage in requirement-like sentences
+            pattern = rf"(?:system|service|API|endpoint)\s+{kw}\s+"
+            if re.search(pattern, requirements_content):
+                self._add_finding("warning", "rfc_keywords",
+                                  f"Found lowercase '{kw}' in requirements. RFC 2119 keywords should be UPPERCASE: {kw.upper()}")
+
+    def _check_api_contracts(self):
+        """Validate API contracts section."""
+        content = self._find_section_content(r"^##\s+API\s+Contracts")
+        if not content.strip():
+            return
+
+        # Check for at least one endpoint definition
+        has_endpoint = bool(re.search(r"(GET|POST|PUT|PATCH|DELETE)\s+/", content))
+        if not has_endpoint:
+            self._add_finding("warning", "api_contracts", "No HTTP method + path found (expected e.g., POST /api/endpoint)")
+
+        # Check for request/response definitions
+        has_interface = bool(re.search(r"interface\s+\w+", content))
+        if not has_interface:
+            self._add_finding("info", "api_contracts", "No TypeScript interfaces found. Consider defining request/response shapes.")
+
+    def _check_data_models(self):
+        """Validate data models section."""
+        content = self._find_section_content(r"^##\s+Data\s+Models")
+        if not content.strip():
+            return
+
+        # Check for table format
+        has_table = bool(re.search(r"\|.*\|.*\|", content))
+        if not has_table:
+            self._add_finding("warning", "data_models", "No table-formatted data models found. Use | Field | Type | Constraints | format.")
+
+    def _check_out_of_scope(self):
+        """Validate out of scope section."""
+        content = self._find_section_content(r"^##\s+Out\s+of\s+Scope")
+        if not content.strip():
+            return
+
+        os_pattern = re.compile(r"-\s+OS-(\d+):")
+        matches = os_pattern.findall(content)
+
+        if not matches:
+            self._add_finding("warning", "out_of_scope", "No numbered exclusions found (expected OS-N: format)")
+        elif len(matches) < 2:
+            self._add_finding("info", "out_of_scope", "Only 1 exclusion listed. Consider what was deliberately left out.")
+
+    def _check_placeholders(self):
+        """Check for unfilled placeholder text."""
+        placeholder_count = 0
+        for pattern in PLACEHOLDER_PATTERNS:
+            matches = re.findall(pattern, self.content, re.IGNORECASE)
+            placeholder_count += len(matches)
+
+        if placeholder_count > 0:
+            self._add_finding("warning", "placeholders",
+                              f"Found {placeholder_count} placeholder(s) that need to be filled in (e.g., [your name], [describe ...]).")
+            # Deduct a flat penalty of up to 3 points from each present section
+            for key in self.section_scores:
+                if self.section_scores[key]["present"]:
+                    deduction = min(3, self.section_scores[key]["score"])
+                    self.section_scores[key]["score"] = max(0, self.section_scores[key]["score"] - deduction)
+
+    def _check_traceability(self):
+        """Check that acceptance criteria reference functional requirements."""
+        ac_content = self._find_section_content(r"^##\s+Acceptance\s+Criteria")
+        fr_content = self._find_section_content(r"^##\s+Functional\s+Requirements")
+
+        if not ac_content.strip() or not fr_content.strip():
+            return
+
+        # Extract FR IDs
+        fr_ids = set(re.findall(r"FR-(\d+)", fr_content))
+        # Extract FR references from AC
+        ac_fr_refs = set(re.findall(r"FR-(\d+)", ac_content))
+
+        unreferenced = fr_ids - ac_fr_refs
+        if unreferenced:
+            unreferenced_list = ", ".join(f"FR-{i}" for i in sorted(unreferenced, key=int))
+            self._add_finding("warning", "traceability",
+                              f"Functional requirements without acceptance criteria: {unreferenced_list}")
+
+    def _calculate_score(self) -> int:
+        """Calculate the total completeness score."""
+        total = sum(s["score"] for s in self.section_scores.values())
+        maximum = sum(s["max"] for s in self.section_scores.values())
+
+        if maximum == 0:
+            return 0
+
+        # Apply finding-based deductions
+        error_count = sum(1 for f in self.findings if f["severity"] == "error")
+        warning_count = sum(1 for f in self.findings if f["severity"] == "warning")
+
+        base_score = round((total / maximum) * 100)
+        deduction = (error_count * 5) + (warning_count * 2)
+
+        return max(0, min(100, base_score - deduction))
+
+    @staticmethod
+    def _score_to_grade(score: int) -> str:
+        """Convert score to letter grade."""
+        if score >= 90:
+            return "A"
+        if score >= 80:
+            return "B"
+        if score >= 70:
+            return "C"
+        if score >= 60:
+            return "D"
+        return "F"
+
+    def _build_summary(self, score: int) -> str:
+        """Build human-readable summary."""
+        errors = [f for f in self.findings if f["severity"] == "error"]
+        warnings = [f for f in self.findings if f["severity"] == "warning"]
+        infos = [f for f in self.findings if f["severity"] == "info"]
+
+        lines = [
+            f"Spec Completeness Score: {score}/100 (Grade: {self._score_to_grade(score)})",
+            f"Errors: {len(errors)}, Warnings: {len(warnings)}, Info: {len(infos)}",
+            "",
+        ]
+
+        if errors:
+            lines.append("ERRORS (must fix):")
+            for e in errors:
+                lines.append(f"  [{e['section']}] {e['message']}")
+            lines.append("")
+
+        if warnings:
+            lines.append("WARNINGS (should fix):")
+            for w in warnings:
+                lines.append(f"  [{w['section']}] {w['message']}")
+            lines.append("")
+
+        if infos:
+            lines.append("INFO:")
+            for i in infos:
+                lines.append(f"  [{i['section']}] {i['message']}")
+            lines.append("")
+
+        # Section breakdown
+        lines.append("Section Breakdown:")
+        for key, data in self.section_scores.items():
+            status = "PRESENT" if data["present"] else "MISSING"
+            lines.append(f"  {data['name']}: {data['score']}/{data['max']} ({status})")
+
+        return "\n".join(lines)
+
+
+def format_human(result: Dict[str, Any]) -> str:
+    """Format validation result for human reading."""
+    lines = [
+        "=" * 60,
+        "SPEC VALIDATION REPORT",
+        "=" * 60,
+        "",
+    ]
+    if result["file"]:
+        lines.append(f"File: {result['file']}")
+        lines.append("")
+
+    lines.append(result["summary"])
+
+    return "\n".join(lines)
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Validate a feature specification for completeness and quality.",
+        epilog="Example: python spec_validator.py --file spec.md --strict",
+    )
+    parser.add_argument(
+        "--file",
+        "-f",
+        required=True,
+        help="Path to the spec markdown file",
+    )
+    parser.add_argument(
+        "--strict",
+        action="store_true",
+        help="Exit with code 2 if score is below 80",
+    )
+    parser.add_argument(
+        "--json",
+        action="store_true",
+        dest="json_flag",
+        help="Output results as JSON",
+    )
+
+    args = parser.parse_args()
+
+    file_path = Path(args.file)
+    if not file_path.exists():
+        print(f"Error: File not found: {file_path}", file=sys.stderr)
+        sys.exit(2)
+
+    content = file_path.read_text(encoding="utf-8")
+
+    if not content.strip():
+        print(f"Error: File is empty: {file_path}", file=sys.stderr)
+        sys.exit(2)
+
+    validator = SpecValidator(content, str(file_path))
+    result = validator.validate()
+
+    if args.json_flag:
+        print(json.dumps(result, indent=2))
+    else:
+        print(format_human(result))
+
+    # Determine exit code
+    score = result["score"]
+    has_errors = any(f["severity"] == "error" for f in result["findings"])
+    has_warnings = any(f["severity"] == "warning" for f in result["findings"])
+
+    if args.strict and score < 80:
+        sys.exit(2)
+    elif has_errors:
+        sys.exit(2)
+    elif has_warnings:
+        sys.exit(1)
+    else:
+        sys.exit(0)
+
+
+if __name__ == "__main__":
+    main()
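The exit-code precedence in `main()` above can be read as a small pure function. This is a minimal sketch for illustration, not part of the commit: the name `exit_code` and its parameters are hypothetical, and the threshold of 80 is taken from the `--strict` help text.

```python
from typing import List


def exit_code(score: int, severities: List[str], strict: bool = False) -> int:
    """Mirror the precedence used by spec_validator.py's main() (hypothetical helper)."""
    # Strict mode fails first: a low score is fatal even with no error findings.
    if strict and score < 80:
        return 2
    # Any error finding is fatal regardless of score.
    if "error" in severities:
        return 2
    # Warnings alone produce a soft failure.
    if "warning" in severities:
        return 1
    return 0


if __name__ == "__main__":
    print(exit_code(95, ["warning"]))        # warnings only
    print(exit_code(75, [], strict=True))    # below threshold in strict mode
    print(exit_code(75, [], strict=False))   # same score, non-strict
```

One consequence worth noting: without `--strict`, a low score by itself never changes the exit code; only findings do.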
--- a/engineering/spec-driven-workflow/test_extractor.py
+++ b/engineering/spec-driven-workflow/test_extractor.py
@@ -0,0 +1,431 @@
+#!/usr/bin/env python3
+"""
+Test Extractor - Extracts test case stubs from a feature specification.
+
+Parses acceptance criteria (Given/When/Then) and edge cases from a spec
+document, then generates test stubs for the specified framework.
+
+Supported frameworks: pytest, jest, go-test
+
+Exit codes: 0 = success, 1 = warnings (some criteria unparseable), 2 = critical error
+
+No external dependencies - uses only Python standard library.
+"""
+
+import argparse
+import json
+import re
+import sys
+import textwrap
+from pathlib import Path
+from typing import Dict, List, Any, Optional, Tuple
+
+
+class SpecParser:
+    """Parses spec documents to extract testable criteria."""
+
+    def __init__(self, content: str):
+        self.content = content
+        self.lines = content.split("\n")
+
+    def extract_acceptance_criteria(self) -> List[Dict[str, Any]]:
+        """Extract AC-N blocks with Given/When/Then clauses."""
+        criteria = []
+        ac_pattern = re.compile(r"###\s+AC-(\d+):\s*(.+?)(?:\s*\(([^)]+)\))?\s*$")
+
+        in_ac = False
+        current_ac: Optional[Dict[str, Any]] = None
+        body_lines: List[str] = []
+
+        for line in self.lines:
+            match = ac_pattern.match(line)
+            if match:
+                # Save previous AC
+                if current_ac is not None:
+                    current_ac["body"] = "\n".join(body_lines).strip()
+                    self._parse_gwt(current_ac)
+                    criteria.append(current_ac)
+
+                ac_id = int(match.group(1))
+                name = match.group(2).strip()
+                refs = match.group(3).strip() if match.group(3) else ""
+
+                current_ac = {
+                    "id": f"AC-{ac_id}",
+                    "name": name,
+                    "references": [r.strip() for r in refs.split(",") if r.strip()] if refs else [],
+                    "given": "",
+                    "when": "",
+                    "then": [],
+                    "body": "",
+                }
+                body_lines = []
+                in_ac = True
+            elif in_ac:
+                # An H2 heading ends the acceptance-criteria block ("###" never matches this pattern)
+                if re.match(r"^##\s+", line):
+                    in_ac = False
+                    if current_ac is not None:
+                        current_ac["body"] = "\n".join(body_lines).strip()
+                        self._parse_gwt(current_ac)
+                        criteria.append(current_ac)
+                        current_ac = None
+                else:
+                    body_lines.append(line)
+
+        # Don't forget the last one
+        if current_ac is not None:
+            current_ac["body"] = "\n".join(body_lines).strip()
+            self._parse_gwt(current_ac)
+            criteria.append(current_ac)
+
+        return criteria
+
+    def extract_edge_cases(self) -> List[Dict[str, Any]]:
+        """Extract EC-N edge case items."""
+        edge_cases = []
+        ec_pattern = re.compile(r"-\s+EC-(\d+):\s*(.+?)\s*(?:->|→)\s*(.+)")
+
+        in_section = False
+        for line in self.lines:
+            if re.match(r"^##\s+Edge\s+Cases", line, re.IGNORECASE):
+                in_section = True
+                continue
+            if in_section and re.match(r"^##\s+", line):
+                break
+            if in_section:
+                match = ec_pattern.match(line.strip())
+                if match:
+                    edge_cases.append({
+                        "id": f"EC-{match.group(1)}",
+                        "condition": match.group(2).strip().rstrip("."),
+                        "behavior": match.group(3).strip().rstrip("."),
+                    })
+
+        return edge_cases
+
+    def extract_spec_title(self) -> str:
+        """Extract the spec title from the first H1."""
+        for line in self.lines:
+            match = re.match(r"^#\s+(?:Spec:\s*)?(.+)", line)
+            if match:
+                return match.group(1).strip()
+        return "UnknownFeature"
+
+    @staticmethod
+    def _parse_gwt(ac: Dict[str, Any]) -> None:
+        """Parse Given/When/Then from the AC body text, updating ``ac`` in place."""
+        body = ac["body"]
+        lines = body.split("\n")
+
+        current_section = None
+        for line in lines:
+            stripped = line.strip()
+            if not stripped:
+                continue
+
+            lower = stripped.lower()
+            if lower.startswith("given "):
+                current_section = "given"
+                ac["given"] = stripped[6:].strip()
+            elif lower.startswith("when "):
+                current_section = "when"
+                ac["when"] = stripped[5:].strip()
+            elif lower.startswith("then "):
+                current_section = "then"
+                ac["then"].append(stripped[5:].strip())
+            elif lower.startswith("and "):
+                if current_section == "then":
+                    ac["then"].append(stripped[4:].strip())
+                elif current_section == "given":
+                    ac["given"] += " AND " + stripped[4:].strip()
+                elif current_section == "when":
+                    ac["when"] += " AND " + stripped[4:].strip()
+
+
+def _sanitize_name(name: str) -> str:
+    """Convert a human-readable name to a valid function/method name."""
+    # Remove parenthetical references like (FR-1)
+    name = re.sub(r"\([^)]*\)", "", name)
+    # Replace non-alphanumeric with underscore
+    name = re.sub(r"[^a-zA-Z0-9]+", "_", name)
+    # Remove leading/trailing underscores
+    name = name.strip("_").lower()
+    return name or "unnamed"
+
+
+def _to_pascal_case(name: str) -> str:
+    """Convert to PascalCase for Go test function names and pytest class names."""
+    parts = _sanitize_name(name).split("_")
+    return "".join(p.capitalize() for p in parts if p)
+
+
+class PytestGenerator:
+    """Generates pytest test stubs."""
+
+    def generate(self, title: str, criteria: List[Dict], edge_cases: List[Dict]) -> str:
+        class_name = "Test" + _to_pascal_case(title)
+        lines = [
+            '"""',
+            f"Test suite for: {title}",
+            f"Auto-generated from spec. {len(criteria)} acceptance criteria, {len(edge_cases)} edge cases.",
+            "",
+            "All tests are stubs — implement the test body to make them pass.",
+            '"""',
+            "",
+            "import pytest",
+            "",
+            "",
+            f"class {class_name}:",
+            f'    """Tests for {title}."""',
+            "",
+        ]
+
+        for ac in criteria:
+            method_name = f"test_{ac['id'].lower().replace('-', '')}_{_sanitize_name(ac['name'])}"
+            docstring = f'{ac["id"]}: {ac["name"]}'
+            ref_str = f" [{', '.join(ac['references'])}]" if ac["references"] else ""
+
+            lines.append(f"    def {method_name}(self):")
+            lines.append(f'        """{docstring}{ref_str}"""')
+
+            if ac["given"]:
+                lines.append(f"        # Given {ac['given']}")
+            if ac["when"]:
+                lines.append(f"        # When {ac['when']}")
+            for t in ac["then"]:
+                lines.append(f"        # Then {t}")
+
+            lines.append('        raise NotImplementedError("Implement this test")')
+            lines.append("")
+
+        if edge_cases:
+            lines.append("    # --- Edge Cases ---")
+            lines.append("")
+
+        for ec in edge_cases:
+            method_name = f"test_{ec['id'].lower().replace('-', '')}_{_sanitize_name(ec['condition'])}"
+            lines.append(f"    def {method_name}(self):")
+            lines.append(f'        """{ec["id"]}: {ec["condition"]} -> {ec["behavior"]}"""')
+            lines.append(f"        # Condition: {ec['condition']}")
+            lines.append(f"        # Expected: {ec['behavior']}")
+            lines.append('        raise NotImplementedError("Implement this test")')
+            lines.append("")
+
+        return "\n".join(lines)
+
+
+class JestGenerator:
+    """Generates Jest/Vitest test stubs (TypeScript)."""
+
+    def generate(self, title: str, criteria: List[Dict], edge_cases: List[Dict]) -> str:
+        lines = [
+            "/**",
+            f" * Test suite for: {title}",
+            f" * Auto-generated from spec. {len(criteria)} acceptance criteria, {len(edge_cases)} edge cases.",
+            " *",
+            " * All tests are stubs — implement the test body to make them pass.",
+            " */",
+            "",
+            f"describe({json.dumps(title, ensure_ascii=False)}, () => {{",
+        ]
+
+        for ac in criteria:
+            ref_str = f" [{', '.join(ac['references'])}]" if ac["references"] else ""
+            test_name = f"{ac['id']}: {ac['name']}{ref_str}"
+
+            lines.append(f"  it({json.dumps(test_name, ensure_ascii=False)}, () => {{")
+            if ac["given"]:
+                lines.append(f"    // Given {ac['given']}")
+            if ac["when"]:
+                lines.append(f"    // When {ac['when']}")
+            for t in ac["then"]:
+                lines.append(f"    // Then {t}")
+            lines.append("")
+            lines.append('    throw new Error("Not implemented");')
+            lines.append("  });")
+            lines.append("")
+
+        if edge_cases:
+            lines.append("  // --- Edge Cases ---")
+            lines.append("")
+
+        for ec in edge_cases:
+            test_name = f"{ec['id']}: {ec['condition']}"
+            lines.append(f"  it({json.dumps(test_name, ensure_ascii=False)}, () => {{")
+            lines.append(f"    // Condition: {ec['condition']}")
+            lines.append(f"    // Expected: {ec['behavior']}")
+            lines.append("")
+            lines.append('    throw new Error("Not implemented");')
+            lines.append("  });")
+            lines.append("")
+
+        lines.append("});")
+        lines.append("")
+
+        return "\n".join(lines)
+
+
+class GoTestGenerator:
+    """Generates Go test stubs."""
+
+    def generate(self, title: str, criteria: List[Dict], edge_cases: List[Dict]) -> str:
+        package_name = re.sub(r"^(?=\d)", "pkg", _sanitize_name(title).split("_")[0])  # Go identifiers cannot start with a digit
+
+        lines = [
+            f"package {package_name}_test",
+            "",
+            "import (",
+            '\t"testing"',
+            ")",
+            "",
+            f"// Test suite for: {title}",
+            f"// Auto-generated from spec. {len(criteria)} acceptance criteria, {len(edge_cases)} edge cases.",
+            "// All tests are stubs — implement the test body to make them pass.",
+            "",
+        ]
+
+        for ac in criteria:
+            func_name = "Test" + _to_pascal_case(ac["id"] + " " + ac["name"])
+            ref_str = f" [{', '.join(ac['references'])}]" if ac["references"] else ""
+
+            lines.append(f"// {ac['id']}: {ac['name']}{ref_str}")
+            lines.append(f"func {func_name}(t *testing.T) {{")
+
+            if ac["given"]:
+                lines.append(f"\t// Given {ac['given']}")
+            if ac["when"]:
+                lines.append(f"\t// When {ac['when']}")
+            for then_clause in ac["then"]:
+                lines.append(f"\t// Then {then_clause}")
+
+            lines.append("")
+            lines.append('\tt.Fatal("Not implemented")')
+            lines.append("}")
+            lines.append("")
+
+        if edge_cases:
+            lines.append("// --- Edge Cases ---")
+            lines.append("")
+
+        for ec in edge_cases:
+            func_name = "Test" + _to_pascal_case(ec["id"] + " " + ec["condition"])
+            lines.append(f"// {ec['id']}: {ec['condition']} -> {ec['behavior']}")
+            lines.append(f"func {func_name}(t *testing.T) {{")
+            lines.append(f"\t// Condition: {ec['condition']}")
+            lines.append(f"\t// Expected: {ec['behavior']}")
+            lines.append("")
+            lines.append('\tt.Fatal("Not implemented")')
+            lines.append("}")
+            lines.append("")
+
+        return "\n".join(lines)
+
+
+GENERATORS = {
+    "pytest": PytestGenerator,
+    "jest": JestGenerator,
+    "go-test": GoTestGenerator,
+}
+
+FILE_EXTENSIONS = {
+    "pytest": ".py",
+    "jest": ".test.ts",
+    "go-test": "_test.go",
+}
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Extract test case stubs from a feature specification.",
+        epilog="Example: python test_extractor.py --file spec.md --framework pytest --output tests/test_feature.py",
+    )
+    parser.add_argument(
+        "--file",
+        "-f",
+        required=True,
+        help="Path to the spec markdown file",
+    )
+    parser.add_argument(
+        "--framework",
+        choices=list(GENERATORS.keys()),
+        default="pytest",
+        help="Target test framework (default: pytest)",
+    )
+    parser.add_argument(
+        "--output",
+        "-o",
+        default=None,
+        help="Output file path (default: stdout)",
+    )
+    parser.add_argument(
+        "--json",
+        action="store_true",
+        dest="json_flag",
+        help="Output extracted criteria as JSON instead of test code",
+    )
+
+    args = parser.parse_args()
+
+    file_path = Path(args.file)
+    if not file_path.exists():
+        print(f"Error: File not found: {file_path}", file=sys.stderr)
+        sys.exit(2)
+
+    content = file_path.read_text(encoding="utf-8")
+    if not content.strip():
+        print(f"Error: File is empty: {file_path}", file=sys.stderr)
+        sys.exit(2)
+
+    spec_parser = SpecParser(content)
+    title = spec_parser.extract_spec_title()
+    criteria = spec_parser.extract_acceptance_criteria()
+    edge_cases = spec_parser.extract_edge_cases()
+
+    if not criteria and not edge_cases:
+        print("Error: No acceptance criteria or edge cases found in spec.", file=sys.stderr)
+        sys.exit(2)
+
+    warnings = []
+    for ac in criteria:
+        if not ac["given"] and not ac["when"]:
+            warnings.append(f"{ac['id']}: Could not parse Given/When/Then — check format.")
+
+    if args.json_flag:
+        result = {
+            "spec_title": title,
+            "framework": args.framework,
+            "acceptance_criteria": criteria,
+            "edge_cases": edge_cases,
+            "warnings": warnings,
+            "counts": {
+                "acceptance_criteria": len(criteria),
+                "edge_cases": len(edge_cases),
+                "total_test_cases": len(criteria) + len(edge_cases),
+            },
+        }
+        output = json.dumps(result, indent=2)
+    else:
+        generator_class = GENERATORS[args.framework]
+        generator = generator_class()
+        output = generator.generate(title, criteria, edge_cases)
+
+    if args.output:
+        out_path = Path(args.output)
+        out_path.parent.mkdir(parents=True, exist_ok=True)
+        out_path.write_text(output, encoding="utf-8")
+        total = len(criteria) + len(edge_cases)
+        print(f"Wrote output for {total} test cases -> {out_path}", file=sys.stderr)
+    else:
+        print(output)
+
+    if warnings:
+        for w in warnings:
+            print(f"Warning: {w}", file=sys.stderr)
+        sys.exit(1)
+
+    sys.exit(0)
+
+
+if __name__ == "__main__":
+    main()
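To make the input format that `SpecParser` expects more concrete, here is a minimal sketch that runs patterns adapted from the class above against a small spec. The sample spec text and the `scan` helper are hypothetical and exist only for illustration. Unlike the real parser, `scan` does not restrict edge-case matching to the `## Edge Cases` section and does not parse the Given/When/Then body.

```python
import re

# Patterns adapted from SpecParser; everything else here is a hypothetical example.
AC_HEADING = re.compile(r"###\s+AC-(\d+):\s*(.+?)(?:\s*\(([^)]+)\))?\s*$")
EC_ITEM = re.compile(r"-\s+EC-(\d+):\s*(.+?)\s*(?:->|→)\s*(.+)")

SAMPLE_SPEC = """\
# Spec: Password Reset

## Acceptance Criteria

### AC-1: Reset link is emailed (FR-1, FR-2)
Given a registered user
When they request a password reset
Then a reset email is sent
And the link expires after one hour

## Edge Cases

- EC-1: Unknown email address -> Respond with the same generic message.
"""


def scan(spec: str):
    """Return (AC headings, edge cases) found in the spec text."""
    headings, edge_cases = [], []
    for line in spec.split("\n"):
        ac = AC_HEADING.match(line)
        if ac:
            # The optional trailing parenthetical holds comma-separated requirement IDs.
            refs = [r.strip() for r in (ac.group(3) or "").split(",") if r.strip()]
            headings.append((f"AC-{ac.group(1)}", ac.group(2).strip(), refs))
        ec = EC_ITEM.match(line.strip())
        if ec:
            edge_cases.append((
                f"EC-{ec.group(1)}",
                ec.group(2).strip().rstrip("."),
                ec.group(3).strip().rstrip("."),
            ))
    return headings, edge_cases


headings, edge_cases = scan(SAMPLE_SPEC)
print(headings)
print(edge_cases)
```

The takeaway for spec authors is that acceptance criteria need an H3 heading of the form `### AC-N: Name (refs)`, and edge cases need a list item of the form `- EC-N: condition -> behavior`. Lines that deviate from these shapes are silently skipped.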