feat(engineering): add browser-automation and spec-driven-workflow skills
browser-automation (564-line SKILL.md, 3 scripts, 3 references):
- Web scraping, form filling, screenshot capture, data extraction
- Anti-detection patterns, cookie/session management, dynamic content
- scraping_toolkit.py, form_automation_builder.py, anti_detection_checker.py
- NOT testing (that's playwright-pro); this is automation & scraping

spec-driven-workflow (586-line SKILL.md, 3 scripts, 3 references):
- Spec-first development: write spec BEFORE code
- Bounded autonomy rules, 6-phase workflow, self-review checklist
- spec_generator.py, spec_validator.py, test_extractor.py
- Pairs with tdd-guide for red-green-refactor after spec

Updated engineering plugin.json (31 → 33 skills).
Added both to mkdocs.yml nav and generated docs pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
586
engineering/spec-driven-workflow/SKILL.md
Normal file
@@ -0,0 +1,586 @@
|
||||
---
name: "spec-driven-workflow"
description: "Use when the user asks to write specs before code, define acceptance criteria, plan features before implementation, generate tests from specifications, or follow spec-first development practices."
---
|
||||
|
||||
# Spec-Driven Workflow — POWERFUL
|
||||
|
||||
## Overview
|
||||
|
||||
Spec-driven workflow enforces a single, non-negotiable rule: **write the specification BEFORE you write any code.** Not alongside. Not after. Before.
|
||||
|
||||
This is not documentation. This is a contract. A spec defines what the system MUST do, what it SHOULD do, and what it explicitly WILL NOT do. Every line of code you write traces back to a requirement in the spec. Every test traces back to an acceptance criterion. If it is not in the spec, it does not get built.
|
||||
|
||||
### Why Spec-First Matters
|
||||
|
||||
1. **Reduces rework.** Many defects originate in requirements, not implementation (60-80% by some estimates). Catching ambiguity in a spec costs minutes; catching it in production costs days.
|
||||
2. **Forces clarity.** If you cannot write what the system should do in plain language, you do not understand the problem well enough to write code.
|
||||
3. **Enables parallelism.** Once a spec is approved, frontend, backend, QA, and documentation can all start simultaneously.
|
||||
4. **Creates accountability.** The spec is the definition of done. No arguments about whether a feature is "complete" — either it satisfies the acceptance criteria or it does not.
|
||||
5. **Feeds TDD directly.** Each acceptance criterion in Given/When/Then format translates into one or more test cases. The spec IS the test plan.
|
||||
|
||||
### The Iron Law
|
||||
|
||||
```
|
||||
NO CODE WITHOUT AN APPROVED SPEC.
|
||||
NO EXCEPTIONS. NO "QUICK PROTOTYPES." NO "I'LL DOCUMENT IT LATER."
|
||||
```
|
||||
|
||||
If the spec is not written, reviewed, and approved, implementation does not begin. Period.
|
||||
|
||||
---
|
||||
|
||||
## The Spec Format
|
||||
|
||||
Every spec follows this structure. No sections are optional — if a section does not apply, write "N/A — [reason]" so reviewers know it was considered, not forgotten.
|
||||
|
||||
### 1. Title and Context
|
||||
|
||||
```markdown
|
||||
# Spec: [Feature Name]
|
||||
|
||||
**Author:** [name]
|
||||
**Date:** [ISO 8601]
|
||||
**Status:** Draft | In Review | Approved | Superseded
|
||||
**Reviewers:** [list]
|
||||
**Related specs:** [links]
|
||||
|
||||
## Context
|
||||
|
||||
[Why does this feature exist? What problem does it solve? What is the business
|
||||
motivation? Include links to user research, support tickets, or metrics that
|
||||
justify this work. 2-4 paragraphs maximum.]
|
||||
```
|
||||
|
||||
### 2. Functional Requirements (RFC 2119)
|
||||
|
||||
Use RFC 2119 keywords precisely:
|
||||
|
||||
| Keyword | Meaning |
|---------|---------|
| **MUST** | Absolute requirement. Failing this means the implementation is non-conformant. |
| **MUST NOT** | Absolute prohibition. Doing this means the implementation is broken. |
| **SHOULD** | Recommended. May be omitted with documented justification. |
| **SHOULD NOT** | Discouraged. May be included with documented justification. |
| **MAY** | Optional. Purely at the implementer's discretion. |
|
||||
|
||||
```markdown
|
||||
## Functional Requirements
|
||||
|
||||
- FR-1: The system MUST authenticate users via OAuth 2.0 PKCE flow.
|
||||
- FR-2: The system MUST reject tokens older than 24 hours.
|
||||
- FR-3: The system SHOULD support refresh token rotation.
|
||||
- FR-4: The system MAY cache user profiles for up to 5 minutes.
|
||||
- FR-5: The system MUST NOT store plaintext passwords under any circumstance.
|
||||
```
|
||||
|
||||
Number every requirement. Use `FR-` prefix. Each requirement is a single, testable statement.
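
The numbering and keyword rules above can be checked mechanically. The following is a minimal sketch, not part of the shipped scripts: `lint_requirements` is a hypothetical helper that assumes requirements are written as `- FR-N: ...` bullets and that "single, testable statement" can be approximated by "exactly one RFC 2119 keyword per line".

```python
import re

# RFC 2119 keywords, longest first so "MUST NOT" is matched before "MUST".
KEYWORDS = ["MUST NOT", "SHOULD NOT", "MUST", "SHOULD", "MAY"]
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")
# Assumed bullet shape: "- FR-<number>: <statement>"
REQ_RE = re.compile(r"^- (FR-\d+): (.+)$")


def lint_requirements(lines):
    """Return a list of human-readable problems found in FR-* lines."""
    problems = []
    seen = set()
    for line in lines:
        match = REQ_RE.match(line.strip())
        if not match:
            continue  # not a requirement line
        req_id, text = match.groups()
        if req_id in seen:
            problems.append(f"{req_id}: duplicate ID")
        seen.add(req_id)
        count = len(KEYWORD_RE.findall(text))
        if count != 1:
            problems.append(
                f"{req_id}: expected exactly one RFC 2119 keyword, found {count}"
            )
    return problems
```

The keyword match is case-sensitive on purpose: a lowercase "may" in ordinary prose is not a requirement level.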
|
||||
|
||||
### 3. Non-Functional Requirements
|
||||
|
||||
```markdown
|
||||
## Non-Functional Requirements
|
||||
|
||||
### Performance
|
||||
- NFR-P1: Login flow MUST complete in < 500ms (p95) under normal load.
|
||||
- NFR-P2: Token validation MUST complete in < 50ms (p99).
|
||||
|
||||
### Security
|
||||
- NFR-S1: All tokens MUST be transmitted over TLS 1.2+.
|
||||
- NFR-S2: The system MUST rate-limit login attempts to 5/minute per IP.
|
||||
|
||||
### Accessibility
|
||||
- NFR-A1: Login form MUST meet WCAG 2.1 AA standards.
|
||||
- NFR-A2: Error messages MUST be announced to screen readers.
|
||||
|
||||
### Scalability
|
||||
- NFR-SC1: The system SHOULD handle 10,000 concurrent sessions.
|
||||
|
||||
### Reliability
|
||||
- NFR-R1: The authentication service MUST maintain 99.9% uptime.
|
||||
```
|
||||
|
||||
### 4. Acceptance Criteria (Given/When/Then)
|
||||
|
||||
Every functional requirement maps to one or more acceptance criteria. Write them as Gherkin-style Given/When/Then steps:
|
||||
|
||||
```markdown
|
||||
## Acceptance Criteria
|
||||
|
||||
### AC-1: Successful login (FR-1)
|
||||
Given a user with valid credentials
|
||||
When they submit the login form with correct email and password
|
||||
Then they receive a valid access token
|
||||
And they are redirected to the dashboard
|
||||
And the login event is logged with timestamp and IP
|
||||
|
||||
### AC-2: Expired token rejection (FR-2)
|
||||
Given a user with an access token issued 25 hours ago
|
||||
When they make an API request with that token
|
||||
Then they receive a 401 Unauthorized response
|
||||
And the response body contains error code "TOKEN_EXPIRED"
|
||||
And they are NOT redirected (API clients handle their own flow)
|
||||
|
||||
### AC-3: Rate limiting (NFR-S2)
|
||||
Given an IP address that has made 5 login attempts in the last minute
|
||||
When a 6th login attempt arrives from that IP
|
||||
Then the request is rejected with 429 Too Many Requests
|
||||
And the response includes a Retry-After header
|
||||
```
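
Because the criteria follow a fixed shape, they are easy to parse into structured data. The sketch below is illustrative only and is not the logic of `test_extractor.py`; `Criterion` and `parse_criteria` are hypothetical names, and the parser assumes headings of the form `### AC-N: title (refs)` with `And` lines continuing the previous step.

```python
import re
from dataclasses import dataclass, field

HEADING_RE = re.compile(r"^### (AC-\d+): (.+?) \(([^)]+)\)$")
STEP_RE = re.compile(r"^(Given|When|Then|And) (.+)$")


@dataclass
class Criterion:
    ac_id: str
    title: str
    refs: list
    given: list = field(default_factory=list)
    when: list = field(default_factory=list)
    then: list = field(default_factory=list)


def parse_criteria(text):
    """Parse '### AC-N: title (refs)' blocks into Criterion objects."""
    criteria, current, section = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        heading = HEADING_RE.match(line)
        if heading:
            refs = [r.strip() for r in heading.group(3).split(",")]
            current = Criterion(heading.group(1), heading.group(2), refs)
            criteria.append(current)
            section = None
            continue
        step = STEP_RE.match(line)
        if step and current is not None:
            keyword, body = step.groups()
            if keyword != "And":
                section = keyword.lower()  # "given", "when", or "then"
            if section is not None:
                # "And" lines are appended to the most recent section.
                getattr(current, section).append(body)
    return criteria
```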
|
||||
|
||||
### 5. Edge Cases and Error Scenarios
|
||||
|
||||
```markdown
|
||||
## Edge Cases
|
||||
|
||||
- EC-1: User submits login form with empty email → Show validation error, do not hit API.
|
||||
- EC-2: OAuth provider is down → Show "Service temporarily unavailable", retry after 30s.
|
||||
- EC-3: User has account but no password (social-only) → Redirect to social login.
|
||||
- EC-4: Concurrent login from two devices → Both sessions are valid (no single-session enforcement).
|
||||
- EC-5: Token expires mid-request → Complete the current request, return warning header.
|
||||
```
|
||||
|
||||
### 6. API Contracts
|
||||
|
||||
Define request/response shapes using TypeScript-style notation:
|
||||
|
||||
````markdown
## API Contracts

### POST /api/auth/login
Request:
```typescript
interface LoginRequest {
  email: string;        // MUST be valid email format
  password: string;     // MUST be 8-128 characters
  rememberMe?: boolean; // Default: false
}
```

Success Response (200):
```typescript
interface LoginResponse {
  accessToken: string;  // JWT, expires in 24h
  refreshToken: string; // Opaque, expires in 30d
  expiresIn: number;    // Seconds until access token expires
  user: {
    id: string;
    email: string;
    displayName: string;
  };
}
```

Error Response (401):
```typescript
interface AuthError {
  error: "INVALID_CREDENTIALS" | "TOKEN_EXPIRED" | "ACCOUNT_LOCKED";
  message: string;
  retryAfter?: number;  // Seconds, present for rate-limited responses
}
```
````
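
The constraints written as comments in the contract are what the implementation must enforce at runtime. As a rough sketch in Python (the language of this skill's scripts), the `LoginRequest` rules could be checked as follows. `parse_login_request` and the deliberately loose `EMAIL_RE` pattern are assumptions made for illustration; a real service would likely use its framework's validation layer.

```python
import re
from dataclasses import dataclass

# Deliberately loose email check: one "@", no whitespace, a dot in the domain.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass(frozen=True)
class LoginRequest:
    email: str
    password: str
    remember_me: bool = False


def parse_login_request(payload):
    """Validate a decoded JSON body; return (LoginRequest or None, errors)."""
    errors = {}
    email = payload.get("email")
    password = payload.get("password")
    remember_me = payload.get("rememberMe", False)  # contract default: false
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        errors["email"] = "MUST be valid email format"
    if not isinstance(password, str) or not 8 <= len(password) <= 128:
        errors["password"] = "MUST be 8-128 characters"
    if not isinstance(remember_me, bool):
        errors["rememberMe"] = "MUST be a boolean"
    if errors:
        return None, errors
    return LoginRequest(email, password, remember_me), errors
```

Returning field-level errors keyed by the contract's field names keeps the error response traceable to the spec.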
|
||||
|
||||
### 7. Data Models
|
||||
|
||||
```markdown
|
||||
## Data Models
|
||||
|
||||
### User
|
||||
| Field | Type | Constraints |
|-------|------|-------------|
| id | UUID | Primary key, auto-generated |
| email | string | Unique, max 255 chars, valid email format |
| passwordHash | string | bcrypt, never exposed via API |
| createdAt | timestamp | UTC, immutable |
| lastLoginAt | timestamp | UTC, updated on each login |
| loginAttempts | integer | Reset to 0 on successful login |
| lockedUntil | timestamp | Null if not locked |
|
||||
```
|
||||
|
||||
### 8. Out of Scope
|
||||
|
||||
Explicit exclusions prevent scope creep:
|
||||
|
||||
```markdown
|
||||
## Out of Scope
|
||||
|
||||
- OS-1: Multi-factor authentication (separate spec: SPEC-042)
|
||||
- OS-2: Social login providers beyond Google and GitHub
|
||||
- OS-3: Admin impersonation of user accounts
|
||||
- OS-4: Password complexity rules beyond minimum length (deferred to v2)
|
||||
- OS-5: Session management UI (users cannot see/revoke active sessions yet)
|
||||
```
|
||||
|
||||
If someone asks for an out-of-scope item during implementation, point them to this section. Do not build it.
|
||||
|
||||
---
|
||||
|
||||
## Bounded Autonomy Rules
|
||||
|
||||
These rules define when an agent (human or AI) MUST stop and ask for guidance and when it MAY proceed independently.
|
||||
|
||||
### STOP and Ask When:
|
||||
|
||||
1. **Scope creep detected.** The implementation requires something not in the spec. Even if it seems obviously needed, STOP. The spec might have excluded it deliberately.
|
||||
|
||||
2. **Ambiguity exceeds 30%.** If you cannot determine the correct behavior from the spec for more than 30% of a given requirement, the spec is incomplete. Do not guess.
|
||||
|
||||
3. **Breaking changes required.** The implementation would change an existing API contract, database schema, or public interface. Always escalate.
|
||||
|
||||
4. **Security implications.** Any change that touches authentication, authorization, encryption, or PII handling requires explicit approval.
|
||||
|
||||
5. **Performance characteristics unknown.** If a requirement says "MUST complete in < 500ms" but you have no way to measure or guarantee that, escalate before implementing a guess.
|
||||
|
||||
6. **Cross-team dependencies.** If the spec requires coordination with another team or service, confirm the dependency before building against it.
|
||||
|
||||
### Continue Autonomously When:
|
||||
|
||||
1. **Spec is clear and unambiguous** for the current task.
|
||||
2. **All acceptance criteria have passing tests** and you are refactoring internals.
|
||||
3. **Changes are non-breaking** — no public API, schema, or behavior changes.
|
||||
4. **Implementation is a direct translation** of a well-defined acceptance criterion.
|
||||
5. **Error handling follows established patterns** already documented in the codebase.
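
The two lists above amount to a small decision procedure: if any STOP trigger applies, escalate; otherwise continue. A minimal sketch, assuming the agent records a yes/no answer per trigger (the `TaskAssessment` fields are hypothetical, and the 30% ambiguity threshold is reduced here to a boolean judgment):

```python
from dataclasses import dataclass


@dataclass
class TaskAssessment:
    """Hypothetical answers an agent records before acting on a task."""
    in_spec: bool = True                    # work traces to a spec requirement
    spec_ambiguous: bool = False            # behavior cannot be determined
    breaking_change: bool = False           # API, schema, or public interface
    touches_security: bool = False          # authn, authz, encryption, or PII
    performance_unverifiable: bool = False  # threshold cannot be measured
    unconfirmed_dependency: bool = False    # another team or service involved


def stop_reasons(task):
    """Return the STOP triggers that apply; an empty list means continue."""
    checks = [
        (not task.in_spec, "Scope creep detected"),
        (task.spec_ambiguous, "Ambiguity exceeds 30%"),
        (task.breaking_change, "Breaking changes required"),
        (task.touches_security, "Security implications"),
        (task.performance_unverifiable, "Performance characteristics unknown"),
        (task.unconfirmed_dependency, "Cross-team dependencies"),
    ]
    return [reason for triggered, reason in checks if triggered]
```

Returning every applicable reason, rather than the first, gives the escalation message its full context.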
|
||||
|
||||
### Escalation Protocol
|
||||
|
||||
When you must stop, provide:
|
||||
|
||||
```markdown
|
||||
## Escalation: [Brief Title]
|
||||
|
||||
**Blocked on:** [requirement ID, e.g., FR-3]
|
||||
**Question:** [Specific, answerable question — not "what should I do?"]
|
||||
**Options considered:**
|
||||
A. [Option] — Pros: [...] Cons: [...]
|
||||
B. [Option] — Pros: [...] Cons: [...]
|
||||
**My recommendation:** [A or B, with reasoning]
|
||||
**Impact of waiting:** [What is blocked until this is resolved?]
|
||||
```
|
||||
|
||||
Never escalate without a recommendation. Never present an open-ended question. Always give options.
|
||||
|
||||
See `references/bounded_autonomy_rules.md` for the complete decision matrix.
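
The "never escalate without a recommendation" rule can be enforced in code rather than by convention. The sketch below renders the escalation template from structured data and refuses to produce output when options or a recommendation are missing. `Escalation` and `Option` are hypothetical names, and requiring at least two options is an assumption based on the A/B shape of the template.

```python
from dataclasses import dataclass


@dataclass
class Option:
    label: str
    summary: str
    pros: str
    cons: str


@dataclass
class Escalation:
    title: str
    blocked_on: str
    question: str
    options: list
    recommendation: str
    impact_of_waiting: str

    def render(self):
        """Return the escalation as Markdown, or raise if it is incomplete."""
        if len(self.options) < 2:
            raise ValueError("an escalation needs at least two options")
        if not self.recommendation.strip():
            raise ValueError("an escalation needs a recommendation")
        lines = [
            f"## Escalation: {self.title}",
            "",
            f"**Blocked on:** {self.blocked_on}",
            f"**Question:** {self.question}",
            "**Options considered:**",
        ]
        for opt in self.options:
            lines.append(
                f"{opt.label}. {opt.summary} - Pros: {opt.pros} Cons: {opt.cons}"
            )
        lines.append(f"**My recommendation:** {self.recommendation}")
        lines.append(f"**Impact of waiting:** {self.impact_of_waiting}")
        return "\n".join(lines)
```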
|
||||
|
||||
---
|
||||
|
||||
## Workflow — 6 Phases
|
||||
|
||||
### Phase 1: Gather Requirements
|
||||
|
||||
**Goal:** Understand what needs to be built and why.
|
||||
|
||||
1. **Interview the user.** Ask:
   - What problem does this solve?
   - Who are the users?
   - What does success look like?
   - What explicitly should NOT be built?
2. **Read existing code.** Understand the current system before proposing changes.
3. **Identify constraints.** Performance budgets, security requirements, backward compatibility.
4. **List unknowns.** Every unknown is a risk. Surface them now, not during implementation.
|
||||
|
||||
**Exit criteria:** You can explain the feature to someone unfamiliar with the project in 2 minutes.
|
||||
|
||||
### Phase 2: Write Spec
|
||||
|
||||
**Goal:** Produce a complete spec document following The Spec Format above.
|
||||
|
||||
1. Fill every section of the template. No section left blank.
|
||||
2. Number all requirements (FR-*, NFR-*, AC-*, EC-*, OS-*).
|
||||
3. Use RFC 2119 keywords precisely.
|
||||
4. Write acceptance criteria in Given/When/Then format.
|
||||
5. Define API contracts with TypeScript-style types.
|
||||
6. List explicit exclusions in Out of Scope.
|
||||
|
||||
**Exit criteria:** The spec can be handed to a developer who was not in the requirements meeting, and they can implement the feature without asking clarifying questions.
|
||||
|
||||
### Phase 3: Validate Spec
|
||||
|
||||
**Goal:** Verify the spec is complete, consistent, and implementable.
|
||||
|
||||
Run `spec_validator.py` against the spec file:
|
||||
|
||||
```bash
|
||||
python spec_validator.py --file spec.md --strict
|
||||
```
|
||||
|
||||
Manual validation checklist:
|
||||
- [ ] Every functional requirement has at least one acceptance criterion
|
||||
- [ ] Every acceptance criterion is testable (no subjective language)
|
||||
- [ ] API contracts cover all endpoints mentioned in requirements
|
||||
- [ ] Data models cover all entities mentioned in requirements
|
||||
- [ ] Edge cases cover failure modes for every external dependency
|
||||
- [ ] Out of scope is explicit about what was considered and rejected
|
||||
- [ ] Non-functional requirements have measurable thresholds
|
||||
|
||||
**Exit criteria:** Spec scores 80+ on validator, and all manual checklist items pass.
|
||||
|
||||
### Phase 4: Generate Tests
|
||||
|
||||
**Goal:** Extract test cases from acceptance criteria before writing implementation code.
|
||||
|
||||
Run `test_extractor.py` against the approved spec:
|
||||
|
||||
```bash
|
||||
python test_extractor.py --file spec.md --framework pytest --output tests/
|
||||
```
|
||||
|
||||
1. Each acceptance criterion becomes one or more test cases.
|
||||
2. Each edge case becomes a test case.
|
||||
3. Tests are stubs: each one names the scenario and records its Given/When/Then steps, but contains no implementation.
|
||||
4. All tests MUST fail initially (red phase of TDD).
|
||||
|
||||
**Exit criteria:** You have a test file where every test fails with "not implemented" or equivalent.
|
||||
|
||||
### Phase 5: Implement
|
||||
|
||||
**Goal:** Write code that makes failing tests pass, one acceptance criterion at a time.
|
||||
|
||||
1. Pick one acceptance criterion (start with the simplest).
|
||||
2. Make its test(s) pass with minimal code.
|
||||
3. Run the full test suite — no regressions.
|
||||
4. Commit.
|
||||
5. Pick the next acceptance criterion. Repeat.
|
||||
|
||||
**Rules:**
|
||||
- Do NOT implement anything not in the spec.
|
||||
- Do NOT optimize before all acceptance criteria pass.
|
||||
- Do NOT refactor before all acceptance criteria pass.
|
||||
- If you discover a missing requirement, STOP and update the spec first.
|
||||
|
||||
**Exit criteria:** All tests pass. All acceptance criteria satisfied.
|
||||
|
||||
### Phase 6: Self-Review
|
||||
|
||||
**Goal:** Verify implementation matches spec before marking done.
|
||||
|
||||
Run through the Self-Review Checklist below. If any item fails, fix it before declaring the task complete.
|
||||
|
||||
---
|
||||
|
||||
## Self-Review Checklist
|
||||
|
||||
Before marking any implementation as done, verify ALL of the following:
|
||||
|
||||
- [ ] **Every acceptance criterion has a passing test.** No exceptions. If AC-3 exists, a test for AC-3 exists and passes.
|
||||
- [ ] **Every edge case has a test.** EC-1 through EC-N all have corresponding test cases.
|
||||
- [ ] **No scope creep.** The implementation does not include features not in the spec. If you added something, either update the spec or remove it.
|
||||
- [ ] **API contracts match implementation.** Request/response shapes in code match the spec exactly. Field names, types, status codes — all of it.
|
||||
- [ ] **Error scenarios tested.** Every error response defined in the spec has a test that triggers it.
|
||||
- [ ] **Non-functional requirements verified.** If the spec says < 500ms, you have evidence (benchmark, load test, profiling) that it meets the threshold.
|
||||
- [ ] **Data model matches.** Database schema matches the spec. No extra columns, no missing constraints.
|
||||
- [ ] **Out-of-scope items not built.** Double-check that nothing from the Out of Scope section leaked into the implementation.
|
||||
|
||||
---
|
||||
|
||||
## Integration with TDD Guide
|
||||
|
||||
Spec-driven workflow and TDD are complementary, not competing:
|
||||
|
||||
```
Spec-Driven Workflow              TDD (Red-Green-Refactor)
─────────────────────             ──────────────────────────
Phase 1: Gather Requirements
Phase 2: Write Spec
Phase 3: Validate Spec
Phase 4: Generate Tests      ──→  RED: Tests exist and fail
Phase 5: Implement           ──→  GREEN: Minimal code to pass
Phase 6: Self-Review         ──→  REFACTOR: Clean up internals
```
|
||||
|
||||
**The handoff:** Spec-driven workflow produces the test stubs (Phase 4). TDD takes over from there. The spec tells you WHAT to test. TDD tells you HOW to implement.
|
||||
|
||||
Use `engineering-team/tdd-guide` for:
|
||||
- Red-green-refactor cycle discipline
|
||||
- Coverage analysis and gap detection
|
||||
- Framework-specific test patterns (Jest, Pytest, JUnit)
|
||||
|
||||
Use `engineering/spec-driven-workflow` for:
|
||||
- Defining what to build before building it
|
||||
- Acceptance criteria authoring
|
||||
- Completeness validation
|
||||
- Scope control
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### Full Spec: User Password Reset
|
||||
|
||||
```markdown
|
||||
# Spec: Password Reset Flow
|
||||
|
||||
**Author:** Engineering Team
|
||||
**Date:** 2026-03-25
|
||||
**Status:** Approved
|
||||
|
||||
## Context
|
||||
|
||||
Users who forget their passwords currently have no self-service recovery option.
|
||||
Support receives ~200 password reset requests per week, costing approximately
|
||||
8 hours of support time. This feature eliminates that burden entirely.
|
||||
|
||||
## Functional Requirements
|
||||
|
||||
- FR-1: The system MUST allow users to request a password reset via email.
|
||||
- FR-2: The system MUST send a reset link that expires after 1 hour.
|
||||
- FR-3: The system MUST invalidate all previous reset links when a new one is requested.
|
||||
- FR-4: The system MUST enforce minimum password length of 8 characters on reset.
|
||||
- FR-5: The system MUST NOT reveal whether an email exists in the system.
|
||||
- FR-6: The system SHOULD log all reset attempts for audit purposes.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
### AC-1: Request reset (FR-1, FR-5)
|
||||
Given a user on the password reset page
|
||||
When they enter any email address and submit
|
||||
Then they see "If an account exists, a reset link has been sent"
|
||||
And the response is identical whether the email exists or not
|
||||
|
||||
### AC-2: Valid reset link (FR-2)
|
||||
Given a user who received a reset email 30 minutes ago
|
||||
When they click the reset link
|
||||
Then they see the password reset form
|
||||
|
||||
### AC-3: Expired reset link (FR-2)
|
||||
Given a user who received a reset email 2 hours ago
|
||||
When they click the reset link
|
||||
Then they see "This link has expired. Please request a new one."
|
||||
|
||||
### AC-4: Previous links invalidated (FR-3)
|
||||
Given a user who requested two reset emails
|
||||
When they click the link from the first email
|
||||
Then they see "This link is no longer valid."
|
||||
|
||||
## Edge Cases
|
||||
|
||||
- EC-1: User submits reset for non-existent email → Same success message (FR-5).
|
||||
- EC-2: User clicks reset link twice → Second click shows "already used" if password was changed.
|
||||
- EC-3: Email delivery fails → Log error, do not retry automatically.
|
||||
- EC-4: User requests reset while already logged in → Allow it, do not force logout.
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- OS-1: Security questions as alternative reset method.
|
||||
- OS-2: SMS-based password reset.
|
||||
- OS-3: Admin-initiated password reset (separate spec).
|
||||
```
|
||||
|
||||
### Extracted Test Cases (from above spec)
|
||||
|
||||
```python
# Generated by test_extractor.py --framework pytest

class TestPasswordReset:
    def test_ac1_request_reset_existing_email(self):
        """AC-1: Request reset with existing email shows generic message."""
        # Given a user on the password reset page
        # When they enter a registered email and submit
        # Then they see "If an account exists, a reset link has been sent"
        raise NotImplementedError("Implement this test")

    def test_ac1_request_reset_nonexistent_email(self):
        """AC-1: Request reset with unknown email shows same generic message."""
        # Given a user on the password reset page
        # When they enter an unregistered email and submit
        # Then they see identical response to existing email case
        raise NotImplementedError("Implement this test")

    def test_ac2_valid_reset_link(self):
        """AC-2: Reset link works within expiry window."""
        raise NotImplementedError("Implement this test")

    def test_ac3_expired_reset_link(self):
        """AC-3: Reset link rejected after 1 hour."""
        raise NotImplementedError("Implement this test")

    def test_ac4_previous_links_invalidated(self):
        """AC-4: Old reset links stop working when new one is requested."""
        raise NotImplementedError("Implement this test")

    def test_ec1_nonexistent_email_same_response(self):
        """EC-1: Non-existent email produces identical response."""
        raise NotImplementedError("Implement this test")

    def test_ec2_reset_link_used_twice(self):
        """EC-2: Already-used reset link shows appropriate message."""
        raise NotImplementedError("Implement this test")
```
|
||||
|
||||
---
|
||||
|
||||
## Anti-Patterns
|
||||
|
||||
### 1. Coding Before Spec Approval
|
||||
|
||||
**Symptom:** "I'll start coding while the spec is being reviewed."
|
||||
**Problem:** The review will surface changes. Now you have code that implements a rejected design.
|
||||
**Rule:** Implementation does not begin until spec status is "Approved."
|
||||
|
||||
### 2. Vague Acceptance Criteria
|
||||
|
||||
**Symptom:** "The system should work well" or "The UI should be responsive."
|
||||
**Problem:** Untestable. What does "well" mean? What does "responsive" mean?
|
||||
**Rule:** Every acceptance criterion must be verifiable by a machine. If you cannot write a test for it, rewrite the criterion.
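
A crude first pass at spotting vague criteria is to flag subjective words. The sketch below is a heuristic only: the word list is illustrative, chosen for this example rather than taken from the skill's scripts, and a clean result does not prove a criterion is testable.

```python
import re

# Illustrative word list; extend it for your own domain.
SUBJECTIVE = [
    "well", "fast", "quickly", "responsive",
    "user-friendly", "intuitive", "easy", "robust",
]
SUBJECTIVE_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, SUBJECTIVE)) + r")\b", re.IGNORECASE
)


def subjective_terms(criterion):
    """Return the subjective words found in one acceptance-criterion line."""
    return [word.lower() for word in SUBJECTIVE_RE.findall(criterion)]
```

Each flagged word should be replaced by a measurable threshold, for example turning "responsive" into a concrete latency bound.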
|
||||
|
||||
### 3. Missing Edge Cases
|
||||
|
||||
**Symptom:** Happy path is specified, error paths are not.
|
||||
**Problem:** Developers invent error handling on the fly, leading to inconsistent behavior.
|
||||
**Rule:** For every external dependency (API, database, file system, user input), specify at least one failure scenario.
|
||||
|
||||
### 4. Spec as Post-Hoc Documentation
|
||||
|
||||
**Symptom:** "Let me write the spec now that the feature is done."
|
||||
**Problem:** This is documentation, not specification. It describes what was built, not what should have been built. It cannot catch design errors because the design is already frozen.
|
||||
**Rule:** If the spec was written after the code, it is not a spec. Relabel it as documentation.
|
||||
|
||||
### 5. Gold-Plating Beyond Spec
|
||||
|
||||
**Symptom:** "While I was in there, I also added..."
|
||||
**Problem:** Untested code. Unreviewed design. Potential for subtle bugs in the "bonus" feature.
|
||||
**Rule:** If it is not in the spec, it does not get built. File a new spec for additional features.
|
||||
|
||||
### 6. Acceptance Criteria Without Requirement Traceability
|
||||
|
||||
**Symptom:** AC-7 exists but does not reference any FR-* or NFR-*.
|
||||
**Problem:** Orphaned criteria mean either a requirement is missing or the criterion is unnecessary.
|
||||
**Rule:** Every AC-* MUST reference at least one FR-* or NFR-*.
|
||||
|
||||
### 7. Skipping Validation
|
||||
|
||||
**Symptom:** "The spec looks fine, let's just start."
|
||||
**Problem:** Missing sections discovered during implementation cause blocking delays.
|
||||
**Rule:** Always run `spec_validator.py --strict` before starting implementation. Fix all warnings.
|
||||
|
||||
---
|
||||
|
||||
## Cross-References
|
||||
|
||||
- **`engineering-team/tdd-guide`** — Red-green-refactor cycle, test generation, coverage analysis. Use after Phase 4 of this workflow.
|
||||
- **`engineering/focused-fix`** — Deep-dive feature repair. When a spec-driven implementation has systemic issues, use focused-fix for diagnosis.
|
||||
- **`engineering/rag-architect`** — If the feature involves retrieval or knowledge systems, use rag-architect for the technical design within the spec.
|
||||
- **`references/spec_format_guide.md`** — Complete template with section-by-section explanations.
|
||||
- **`references/bounded_autonomy_rules.md`** — Full decision matrix for when to stop vs. continue.
|
||||
- **`references/acceptance_criteria_patterns.md`** — Pattern library for writing Given/When/Then criteria.
|
||||
|
||||
---
|
||||
|
||||
## Tools
|
||||
|
||||
| Script | Purpose | Key Flags |
|--------|---------|-----------|
| `spec_generator.py` | Generate spec template from feature name/description | `--name`, `--description`, `--format`, `--json` |
| `spec_validator.py` | Validate spec completeness (0-100 score) | `--file`, `--strict`, `--json` |
| `test_extractor.py` | Extract test stubs from acceptance criteria | `--file`, `--framework`, `--output`, `--json` |
|
||||
|
||||
```bash
|
||||
# Generate a spec template
|
||||
python spec_generator.py --name "User Authentication" --description "OAuth 2.0 login flow"
|
||||
|
||||
# Validate a spec
|
||||
python spec_validator.py --file specs/auth.md --strict
|
||||
|
||||
# Extract test cases
|
||||
python test_extractor.py --file specs/auth.md --framework pytest --output tests/test_auth.py
|
||||
```
|
||||
@@ -0,0 +1,497 @@
|
||||
# Acceptance Criteria Patterns
|
||||
|
||||
A pattern library for writing Given/When/Then acceptance criteria across common feature types. Use these as starting points — adapt to your domain.
|
||||
|
||||
---
|
||||
|
||||
## Pattern Structure
|
||||
|
||||
Every acceptance criterion follows this structure:
|
||||
|
||||
```
|
||||
### AC-N: [Descriptive name] (FR-N, NFR-N)
|
||||
Given [precondition — the system/user is in this state]
|
||||
When [trigger — the user or system performs this action]
|
||||
Then [outcome — this observable, testable result occurs]
|
||||
And [additional outcome — and this also happens]
|
||||
```
|
||||
|
||||
**Rules:**
|
||||
1. One scenario per AC. Multiple Given/When/Then blocks = multiple ACs.
|
||||
2. Every AC references at least one FR-* or NFR-*.
|
||||
3. Outcomes must be observable and testable — no subjective language.
|
||||
4. Preconditions must be achievable in a test setup.
|
||||
|
||||
---
|
||||
|
||||
## Authentication Patterns
|
||||
|
||||
### Login — Happy Path
|
||||
|
||||
```markdown
|
||||
### AC-1: Successful login with valid credentials (FR-1)
|
||||
Given a registered user with email "user@example.com" and password "V@lidP4ss!"
|
||||
When they POST /api/auth/login with email "user@example.com" and password "V@lidP4ss!"
|
||||
Then the response status is 200
|
||||
And the response body contains a valid JWT access token
|
||||
And the response body contains a refresh token
|
||||
And the access token expires in 24 hours
|
||||
```
|
||||
|
||||
### Login — Invalid Credentials
|
||||
|
||||
```markdown
|
||||
### AC-2: Login rejected with wrong password (FR-1)
|
||||
Given a registered user with email "user@example.com"
|
||||
When they POST /api/auth/login with email "user@example.com" and an incorrect password
|
||||
Then the response status is 401
|
||||
And the response body contains error code "INVALID_CREDENTIALS"
|
||||
And no token is issued
|
||||
And the failed attempt is logged
|
||||
```
|
||||
|
||||
### Login — Account Locked
|
||||
|
||||
```markdown
|
||||
### AC-3: Login rejected for locked account (FR-1, NFR-S2)
|
||||
Given a user whose account is locked due to 5 consecutive failed login attempts
|
||||
When they POST /api/auth/login with correct credentials
|
||||
Then the response status is 403
|
||||
And the response body contains error code "ACCOUNT_LOCKED"
|
||||
And the response includes a "retryAfter" field with seconds until unlock
|
||||
```
|
||||
|
||||
### Token Refresh
|
||||
|
||||
```markdown
|
||||
### AC-4: Token refresh with valid refresh token (FR-3)
|
||||
Given a user with a valid, non-expired refresh token
|
||||
When they POST /api/auth/refresh with that refresh token
|
||||
Then the response status is 200
|
||||
And a new access token is issued
|
||||
And the old refresh token is invalidated
|
||||
And a new refresh token is issued (rotation)
|
||||
```
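
To show what the rotation outcomes in this criterion imply for an implementation, here is a minimal in-memory sketch. `RefreshTokenStore` is a hypothetical class for illustration; it omits expiry and persistence, both of which a real service would need.

```python
import secrets


class RefreshTokenStore:
    """In-memory store that rotates refresh tokens on every use."""

    def __init__(self):
        self._active = {}  # refresh token -> user id

    def issue(self, user_id):
        """Create and remember a new opaque refresh token for user_id."""
        token = secrets.token_urlsafe(32)
        self._active[token] = user_id
        return token

    def rotate(self, old_token):
        """Invalidate old_token and return (user_id, new_token), or None."""
        user_id = self._active.pop(old_token, None)
        if user_id is None:
            return None  # unknown or already-used token
        return user_id, self.issue(user_id)
```

Popping the old token before issuing the new one means a replayed token is rejected, which is the "old refresh token is invalidated" outcome.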
|
||||
|
||||
### Logout
|
||||
|
||||
```markdown
|
||||
### AC-5: Logout invalidates session (FR-4)
|
||||
Given an authenticated user with a valid access token
|
||||
When they POST /api/auth/logout with that token
|
||||
Then the response status is 204
|
||||
And the access token is no longer accepted for API calls
|
||||
And the refresh token is invalidated
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## CRUD Patterns
|
||||
|
||||
### Create
|
||||
|
||||
```markdown
|
||||
### AC-6: Create resource with valid data (FR-1)
|
||||
Given an authenticated user with "editor" role
|
||||
When they POST /api/resources with valid payload {name: "Test", type: "A"}
|
||||
Then the response status is 201
|
||||
And the response body contains the created resource with a generated UUID
|
||||
And the resource's "createdAt" field is set to the current UTC timestamp
|
||||
And the resource's "createdBy" field matches the authenticated user's ID
|
||||
```
|
||||
|
||||
### Create — Validation Failure
|
||||
|
||||
```markdown
|
||||
### AC-7: Create resource rejected with invalid data (FR-1)
|
||||
Given an authenticated user
|
||||
When they POST /api/resources with payload missing required field "name"
|
||||
Then the response status is 400
|
||||
And the response body contains error code "VALIDATION_ERROR"
|
||||
And the response body contains field-level detail: {"name": "Required field"}
|
||||
And no resource is created in the database
|
||||
```
|
||||
|
||||
### Read — Single Item
|
||||
|
||||
```markdown
|
||||
### AC-8: Read resource by ID (FR-2)
|
||||
Given an existing resource with ID "abc-123"
|
||||
When an authenticated user GETs /api/resources/abc-123
|
||||
Then the response status is 200
|
||||
And the response body contains the resource with all fields
|
||||
```
|
||||
|
||||
### Read — Not Found
|
||||
|
||||
```markdown
|
||||
### AC-9: Read non-existent resource returns 404 (FR-2)
|
||||
Given no resource exists with ID "nonexistent-id"
|
||||
When an authenticated user GETs /api/resources/nonexistent-id
|
||||
Then the response status is 404
|
||||
And the response body contains error code "NOT_FOUND"
|
||||
```
|
||||
|
||||
### Update
|
||||
|
||||
```markdown
|
||||
### AC-10: Update resource with valid data (FR-3)
|
||||
Given an existing resource with ID "abc-123" owned by the authenticated user
|
||||
When they PATCH /api/resources/abc-123 with {name: "Updated Name"}
|
||||
Then the response status is 200
|
||||
And the resource's "name" field is "Updated Name"
|
||||
And the resource's "updatedAt" field is updated to the current UTC timestamp
|
||||
And fields not included in the patch are unchanged
|
||||
```
|
||||
|
||||
### Update — Ownership Check
|
||||
|
||||
```markdown
|
||||
### AC-11: Update rejected for non-owner (FR-3, FR-6)
|
||||
Given an existing resource with ID "abc-123" owned by user "other-user"
|
||||
When the authenticated user (not "other-user") PATCHes /api/resources/abc-123
|
||||
Then the response status is 403
|
||||
And the response body contains error code "FORBIDDEN"
|
||||
And the resource is unchanged
|
||||
```
|
||||
|
||||
### Delete — Soft Delete
|
||||
|
||||
```markdown
|
||||
### AC-12: Soft delete resource (FR-5)
|
||||
Given an existing resource with ID "abc-123" owned by the authenticated user
|
||||
When they DELETE /api/resources/abc-123
|
||||
Then the response status is 204
|
||||
And the resource's "deletedAt" field is set to the current UTC timestamp
|
||||
And the resource no longer appears in GET /api/resources (list endpoint)
|
||||
And the resource still exists in the database (soft deleted)
|
||||
```
|
||||
|
||||
### List — Pagination
|
||||
|
||||
```markdown
|
||||
### AC-13: List resources with default pagination (FR-4)
|
||||
Given 50 resources exist for the authenticated user
|
||||
When they GET /api/resources without pagination parameters
|
||||
Then the response status is 200
|
||||
And the response contains the first 20 resources (default page size)
|
||||
And the response includes "totalCount": 50
And the response includes "page": 1
And the response includes "pageSize": 20
And the response includes "hasNextPage": true
|
||||
```
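
One way to picture the response shape in AC-13 is the sketch below. The function name `paginate` and the in-memory list are assumptions for illustration; the field names come from the AC.

```python
def paginate(items, page=1, page_size=20):
    """Return one page of items plus the metadata AC-13 asserts on."""
    total = len(items)
    start = (page - 1) * page_size
    return {
        "items": items[start:start + page_size],
        "totalCount": total,
        "page": page,
        "pageSize": page_size,
        # There is a next page only if items remain past the end of this one.
        "hasNextPage": start + page_size < total,
    }
```

With 50 items and no parameters, this yields 20 items, `totalCount` 50, and `hasNextPage` true, matching the AC.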
|
||||
|
||||
### List — Filtered
|
||||
|
||||
```markdown
|
||||
### AC-14: List resources with type filter (FR-4)
|
||||
Given 30 resources of type "A" and 20 resources of type "B" exist
|
||||
When the authenticated user GETs /api/resources?type=A
|
||||
Then the response status is 200
|
||||
And all returned resources have type "A"
|
||||
And the response "totalCount" is 30
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Search Patterns
|
||||
|
||||
### Basic Search
|
||||
|
||||
```markdown
|
||||
### AC-15: Search returns matching results (FR-7)
|
||||
Given resources with names "Alpha Report", "Beta Analysis", "Alpha Summary" exist
|
||||
When the user GETs /api/resources?q=Alpha
|
||||
Then the response contains "Alpha Report" and "Alpha Summary"
|
||||
And the response does not contain "Beta Analysis"
|
||||
And results are ordered by relevance score (descending)
|
||||
```
|
||||
|
||||
### Search — Empty Results
|
||||
|
||||
```markdown
|
||||
### AC-16: Search with no matches returns empty list (FR-7)
|
||||
Given no resources match the query "xyznonexistent"
|
||||
When the user GETs /api/resources?q=xyznonexistent
|
||||
Then the response status is 200
|
||||
And the response contains an empty "items" array
|
||||
And "totalCount" is 0
|
||||
```
|
||||
|
||||
### Search — Special Characters
|
||||
|
||||
```markdown
|
||||
### AC-17: Search handles special characters safely (FR-7, NFR-S1)
|
||||
Given resources exist in the database
|
||||
When the user GETs /api/resources?q="; DROP TABLE resources;--
|
||||
Then the response status is 200
|
||||
And no SQL injection occurs
|
||||
And the search treats the input as a literal string
|
||||
```
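
AC-17 is usually satisfied by parameterized queries. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the `resources` table layout and the `search_resources` helper are assumptions made for the example, not part of the spec.

```python
import sqlite3


def search_resources(conn, query):
    """Match query as a literal substring of the resource name."""
    # Escape LIKE wildcards so "%" and "_" in user input are not patterns.
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT name FROM resources WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (f"%{escaped}%",),  # bound parameter, never string-concatenated SQL
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (name TEXT)")
conn.executemany(
    "INSERT INTO resources VALUES (?)",
    [("Alpha Report",), ("Beta Analysis",)],
)
```

Calling `search_resources(conn, '"; DROP TABLE resources;--')` returns an empty list and leaves the table intact, because the input only ever reaches the database as a bound value.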
|
||||
|
||||
---
|
||||
|
||||
## File Upload Patterns
|
||||
|
||||
### Upload — Happy Path
|
||||
|
||||
```markdown
|
||||
### AC-18: Upload file within size limit (FR-8)
|
||||
Given an authenticated user
|
||||
When they POST /api/files with a 5MB PNG file
|
||||
Then the response status is 201
|
||||
And the response contains the file's URL, size, and MIME type
|
||||
And the file is stored in the configured storage backend
|
||||
And the file is associated with the authenticated user
|
||||
```
|
||||
|
||||
### Upload — Size Exceeded
|
||||
|
||||
```markdown
|
||||
### AC-19: Upload rejected for oversized file (FR-8)
|
||||
Given the maximum file size is 10MB
|
||||
When the user POSTs /api/files with a 15MB file
|
||||
Then the response status is 413
|
||||
And the response contains error code "FILE_TOO_LARGE"
|
||||
And no file is stored
|
||||
```
|
||||
|
||||
### Upload — Invalid Type
|
||||
|
||||
```markdown
|
||||
### AC-20: Upload rejected for disallowed file type (FR-8, NFR-S3)
|
||||
Given allowed file types are PNG, JPG, PDF
|
||||
When the user POSTs /api/files with an .exe file
|
||||
Then the response status is 415
|
||||
And the response contains error code "UNSUPPORTED_MEDIA_TYPE"
|
||||
And no file is stored
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Payment Patterns
|
||||
|
||||
### Charge — Happy Path
|
||||
|
||||
```markdown
|
||||
### AC-21: Successful payment charge (FR-10)
|
||||
Given a user with a valid payment method on file
|
||||
When they POST /api/payments with amount 49.99 and currency "USD"
|
||||
Then the payment gateway charges the user's payment method $49.99
|
||||
And the response status is 201
|
||||
And the response contains a transaction ID
|
||||
And a payment record is created with status "completed"
|
||||
And a receipt email is sent to the user
|
||||
```
|
||||
|
||||
### Charge — Declined
|
||||
|
||||
```markdown
|
||||
### AC-22: Payment declined by gateway (FR-10)
|
||||
Given a user with an expired credit card on file
|
||||
When they POST /api/payments with amount 49.99
|
||||
Then the payment gateway returns a decline
|
||||
And the response status is 402
|
||||
And the response contains error code "PAYMENT_DECLINED"
|
||||
And no payment record is created with status "completed"
|
||||
And the user is prompted to update their payment method
|
||||
```
|
||||
|
||||
### Charge — Idempotency
|
||||
|
||||
```markdown
|
||||
### AC-23: Duplicate payment request is idempotent (FR-10, NFR-R1)
|
||||
Given a payment was successfully processed with idempotency key "key-123"
|
||||
When the same request is sent again with idempotency key "key-123"
|
||||
Then the response status is 200
|
||||
And the response contains the original transaction ID
|
||||
And the user is NOT charged a second time
|
||||
```
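
The idempotency behavior in AC-23 can be sketched with a lookup keyed on the idempotency key. `PaymentService` and its in-memory dictionaries are hypothetical, and amounts are held in cents as an assumption of this example.

```python
import uuid


class PaymentService:
    """Toy payment service that replays the original result for a repeated key."""

    def __init__(self):
        self._by_key = {}   # idempotency key -> transaction id
        self.charges = []   # stand-in for calls made to the payment gateway

    def charge(self, idempotency_key, amount_cents):
        """Return (status, transaction_id); charge at most once per key."""
        if idempotency_key in self._by_key:
            # Repeat request: return the original transaction, do not charge.
            return 200, self._by_key[idempotency_key]
        txn_id = str(uuid.uuid4())
        self.charges.append((txn_id, amount_cents))
        self._by_key[idempotency_key] = txn_id
        return 201, txn_id
```

A real implementation would also need the key lookup and the charge to be atomic, so that two concurrent requests with the same key cannot both reach the gateway.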
|
||||
|
||||
---
|
||||
|
||||
## Notification Patterns
|
||||
|
||||
### Email Notification
|
||||
|
||||
```markdown
|
||||
### AC-24: Email notification sent on event (FR-11)
|
||||
Given a user with notification preferences set to "email"
|
||||
When their order status changes to "shipped"
|
||||
Then an email is sent to their registered email address
|
||||
And the email subject contains the order number
|
||||
And the email body contains the tracking URL
|
||||
And a notification record is created with status "sent"
|
||||
```
|
||||
|
||||
### Notification — Delivery Failure
|
||||
|
||||
```markdown
|
||||
### AC-25: Failed notification is retried (FR-11, NFR-R2)
|
||||
Given the email service returns a 5xx error on first attempt
|
||||
When a notification is triggered
|
||||
Then the system retries up to 3 times with exponential backoff (1s, 4s, 16s)
|
||||
And if all retries fail, the notification status is set to "failed"
|
||||
And an alert is sent to the ops channel
|
||||
```
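
The retry schedule in AC-25 (1s, 4s, 16s) corresponds to a base delay of 1 second multiplied by 4 on each retry. The sketch below assumes a hypothetical `send` callable that raises on failure, standing in for a 5xx response from the email service; the `sleep` parameter exists only so the delays can be observed in tests.

```python
import time


def send_with_retry(send, max_retries=3, base_delay=1.0, factor=4, sleep=time.sleep):
    """Try send() once, then retry up to max_retries times with backoff."""
    for attempt in range(max_retries + 1):
        try:
            send()
            return "sent"
        except Exception:
            # Wait 1s, 4s, 16s before retries 1, 2, 3; no wait after the last one.
            if attempt < max_retries:
                sleep(base_delay * factor ** attempt)
    return "failed"
```

The `"failed"` return value maps to the notification status in the AC; alerting the ops channel would hang off that result and is left out here.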
|
||||
|
||||
---
|
||||
|
||||
## Negative Test Patterns
|
||||
|
||||
### Unauthorized Access
|
||||
|
||||
```markdown
|
||||
### AC-26: Unauthenticated request rejected (NFR-S1)
|
||||
Given no authentication token is provided
|
||||
When the user GETs /api/resources
|
||||
Then the response status is 401
|
||||
And the response contains error code "AUTHENTICATION_REQUIRED"
|
||||
And no resource data is returned
|
||||
```
|
||||
|
||||
### Invalid Input — Type Mismatch
|
||||
|
||||
```markdown
|
||||
### AC-27: String provided for numeric field (FR-1)
|
||||
Given the "quantity" field expects an integer
|
||||
When the user POSTs with quantity: "abc"
|
||||
Then the response status is 400
|
||||
And the response body contains field error: {"quantity": "Must be an integer"}
|
||||
```
|
||||
|
||||
### Rate Limiting
|
||||
|
||||
```markdown
|
||||
### AC-28: Rate limit enforced (NFR-S2)
|
||||
Given the rate limit is 100 requests per minute per API key
|
||||
When the user sends the 101st request within 60 seconds
|
||||
Then the response status is 429
|
||||
And the response includes header "Retry-After" with seconds until reset
|
||||
And the response contains error code "RATE_LIMITED"
|
||||
```
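
AC-28 does not say which rate-limiting algorithm is used. As one possible reading, the sketch below uses a fixed window per API key; `FixedWindowLimiter` is a hypothetical name, and the caller passes the current time in seconds so the behavior is deterministic.

```python
import math


class FixedWindowLimiter:
    """Allow `limit` requests per `window` seconds for each API key."""

    def __init__(self, limit=100, window=60):
        self.limit, self.window = limit, window
        self._windows = {}  # api key -> (window start, request count)

    def check(self, api_key, now):
        """Return (status, headers) for a request arriving at time `now`."""
        start, count = self._windows.get(api_key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window elapsed: start a fresh one
        if count >= self.limit:
            retry_after = math.ceil(start + self.window - now)
            return 429, {"Retry-After": str(retry_after)}
        self._windows[api_key] = (start, count + 1)
        return 200, {}
```

A sliding window or token bucket would satisfy the same AC; the spec would need to name one if the difference matters to clients.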
|
||||
|
||||
### Concurrent Modification
|
||||
|
||||
```markdown
|
||||
### AC-29: Optimistic locking prevents lost updates (NFR-R1)
|
||||
Given a resource with version 5
|
||||
When user A PATCHes with version 5 and user B PATCHes with version 5 simultaneously
|
||||
Then one succeeds with status 200 (version becomes 6)
|
||||
And the other receives status 409 with error code "CONFLICT"
|
||||
And the 409 response includes the current version number
|
||||
```
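
The version check behind AC-29 can be sketched as below. `apply_patch` and `VersionConflict` are hypothetical names, and the resource is a plain dictionary; in a real datastore the compare-and-increment would have to be a single atomic operation.

```python
class VersionConflict(Exception):
    """Raised when the caller's version is stale; maps to HTTP 409."""

    def __init__(self, current_version):
        super().__init__("CONFLICT")
        self.current_version = current_version


def apply_patch(resource, patch, expected_version):
    """Apply patch only if the caller saw the latest version."""
    if resource["version"] != expected_version:
        raise VersionConflict(resource["version"])
    resource.update(patch)
    resource["version"] += 1
    return resource
```

When two callers both send version 5, the first call bumps the version to 6 and the second raises `VersionConflict` carrying the current version, which is what the 409 response reports.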
|
||||
|
||||
---
|
||||
|
||||
## Performance Criteria Patterns
|
||||
|
||||
### Response Time
|
||||
|
||||
```markdown
|
||||
### AC-30: API response time under load (NFR-P1)
|
||||
Given the system is handling 1,000 concurrent users
|
||||
When a user GETs /api/dashboard
|
||||
Then the response is returned in < 500ms (p95)
|
||||
And the response is returned in < 1000ms (p99)
|
||||
```
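
To check AC-30 against measured latencies, the p95 and p99 values have to be computed from samples. The sketch below uses the nearest-rank definition of a percentile, which is one common choice and an assumption here; `percentile` and `meets_latency_targets` are hypothetical helpers.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_targets(latencies_ms, p95_limit=500, p99_limit=1000):
    """True if p95 < p95_limit and p99 < p99_limit (limits from AC-30)."""
    return (
        percentile(latencies_ms, 95) < p95_limit
        and percentile(latencies_ms, 99) < p99_limit
    )
```

Load-testing tools may interpolate percentiles differently, so the spec should say which definition the threshold refers to if results are close to the limit.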
|
||||
|
||||
### Throughput
|
||||
|
||||
```markdown
|
||||
### AC-31: System handles target throughput (NFR-P2)
|
||||
Given normal production traffic patterns
|
||||
When the system receives 5,000 requests per second
|
||||
Then all requests are processed without queue overflow
|
||||
And error rate remains below 0.1%
|
||||
```
|
||||
|
||||
### Resource Usage
|
||||
|
||||
```markdown
|
||||
### AC-32: Memory usage within bounds (NFR-P3)
|
||||
Given the service is processing normal traffic
|
||||
When measured over a 24-hour period
|
||||
Then memory usage does not exceed 512MB RSS
|
||||
And no memory leaks are detected (RSS growth < 5% over 24h)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Accessibility Criteria Patterns
|
||||
|
||||
### Keyboard Navigation
|
||||
|
||||
```markdown
|
||||
### AC-33: Form is fully keyboard navigable (NFR-A1)
|
||||
Given the user is on the login page using only a keyboard
|
||||
When they press Tab
|
||||
Then focus moves through: email field -> password field -> submit button
|
||||
And each focused element has a visible focus indicator
|
||||
And pressing Enter on the submit button submits the form
|
||||
```
|
||||
|
||||
### Screen Reader
|
||||
|
||||
```markdown
|
||||
### AC-34: Error messages announced to screen readers (NFR-A2)
|
||||
Given the user submits the form with invalid data
|
||||
When validation errors appear
|
||||
Then each error is associated with its form field via aria-describedby
|
||||
And the error container has role="alert" for immediate announcement
|
||||
And the first error field receives focus
|
||||
```
|
||||
|
||||
### Color Contrast
|
||||
|
||||
```markdown
|
||||
### AC-35: Text meets contrast requirements (NFR-A3)
|
||||
Given the default theme is active
|
||||
When measuring text against background colors
|
||||
Then all body text meets 4.5:1 contrast ratio (WCAG AA)
|
||||
And all large text (18pt+ or 14pt+ bold) meets 3:1 contrast ratio
|
||||
And all interactive element states (hover, focus, active) meet 3:1
|
||||
```
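
The ratios in AC-35 come from the WCAG contrast formula, which compares the relative luminance of two sRGB colors. The sketch below follows the WCAG 2.x definition; the function names are illustrative, and colors are assumed to be 8-bit `(r, g, b)` tuples.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to its linear value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) color, between 0 and 1."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)
```

A test for AC-35 could assert `contrast_ratio(text_color, background_color) >= 4.5` for body text and `>= 3.0` for large text.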
|
||||
|
||||
### Reduced Motion
|
||||
|
||||
```markdown
|
||||
### AC-36: Animations respect user preference (NFR-A4)
|
||||
Given the user has enabled "prefers-reduced-motion" in their OS settings
|
||||
When they load any page with animations
|
||||
Then all non-essential animations are disabled
|
||||
And essential animations (e.g., loading spinner) use a reduced version
|
||||
And no content is hidden behind animation-only interactions
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Writing Tips
|
||||
|
||||
### Do
|
||||
|
||||
- Start Given with the system/user state, not the action
|
||||
- Make When a single, specific trigger
|
||||
- Make Then observable — status codes, field values, side effects
|
||||
- Include And for additional assertions on the same outcome
|
||||
- Reference requirement IDs in the AC title
|
||||
|
||||
### Do Not
|
||||
|
||||
- Write "Then the system works correctly" (not testable)
|
||||
- Combine multiple scenarios in one AC
|
||||
- Use subjective words: "quickly", "properly", "nicely", "user-friendly"
|
||||
- Skip the precondition — Given is required even if it seems obvious
|
||||
- Write Given/When/Then as prose paragraphs — use the structured format
|
||||
|
||||
### Smell Tests
|
||||
|
||||
If your AC has any of these, rewrite it:
|
||||
|
||||
| Smell | Example | Fix |
|
||||
|-------|---------|-----|
|
||||
| No Given clause | "When user clicks, then page loads" | Add "Given user is on the dashboard" |
|
||||
| Vague Then | "Then it works" | Specify status code, body, side effects |
|
||||
| Multiple Whens | "When user clicks A and then clicks B" | Split into two ACs |
|
||||
| Implementation detail | "Then the Redux store is updated" | Focus on user-observable outcome |
|
||||
| No requirement reference | "AC-5: Dashboard loads" | "AC-5: Dashboard loads (FR-7)" |
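
Several of these smells can be caught mechanically. The sketch below is a rough illustration and not the skill's validator script: `ac_smells` is a hypothetical helper, it assumes one clause per line, and it only counts lines starting with "When", so a single line that chains two actions would still need a human reviewer.

```python
import re

# Matches "(FR-7)", "(FR-3, FR-6)", "(NFR-S1)" and similar in an AC title.
REQ_REF = re.compile(r"\((?:N?FR-[A-Z]?\d+)(?:,\s*N?FR-[A-Z]?\d+)*\)")
SUBJECTIVE = ("quickly", "properly", "nicely", "user-friendly")


def ac_smells(title, body):
    """Return a list of smell names found in one acceptance criterion."""
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    smells = []
    if not any(ln.startswith("Given ") for ln in lines):
        smells.append("no Given clause")
    if sum(ln.startswith("When ") for ln in lines) != 1:
        smells.append("not exactly one When")
    if not any(ln.startswith("Then ") for ln in lines):
        smells.append("no Then clause")
    if not REQ_REF.search(title):
        smells.append("no requirement reference")
    if any(word in body.lower() for word in SUBJECTIVE):
        smells.append("subjective wording")
    return smells
```

An empty list means none of these particular smells were found; it does not mean the AC is testable, which still takes review.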
|
||||
@@ -0,0 +1,273 @@
|
||||
# Bounded Autonomy Rules
|
||||
|
||||
Decision framework for when an agent (human or AI) should stop and ask vs. continue working autonomously during spec-driven development.
|
||||
|
||||
---
|
||||
|
||||
## The Core Principle
|
||||
|
||||
**Autonomy is earned by clarity.** The clearer the spec, the more autonomy the implementer has. The more ambiguous the spec, the more the implementer must stop and ask.
|
||||
|
||||
This is not about trust. It is about risk. A clear spec means low risk of building the wrong thing. An ambiguous spec means high risk.
|
||||
|
||||
---
|
||||
|
||||
## Decision Matrix
|
||||
|
||||
| Signal | Action | Rationale |
|
||||
|--------|--------|-----------|
|
||||
| Spec is Approved, requirement is clear, tests exist | **Continue** | Low risk. Build it. |
|
||||
| Requirement is clear but no test exists yet | **Continue** (write the test first) | You can infer the test from the requirement. |
|
||||
| Requirement uses SHOULD/MAY keywords | **Continue** with your best judgment | These are intentionally flexible. Document your choice. |
|
||||
| Requirement is ambiguous (multiple valid interpretations) | **STOP** if the ambiguity score exceeds 30% | Ask the spec author to clarify. |
|
||||
| Implementation requires changing an API contract | **STOP** always | Breaking changes need explicit approval. |
|
||||
| Implementation requires a new database migration | **STOP** if it changes existing columns/tables | New tables are lower risk than schema changes. |
|
||||
| Security-related change (auth, crypto, PII) | **STOP** always | Security changes need review regardless of spec clarity. |
|
||||
| Performance-critical path with no benchmark data | **STOP** | You cannot prove NFR compliance without measurement. |
|
||||
| Bug found in existing code unrelated to spec | **STOP** — file a separate issue | Do not fix unrelated bugs in a spec-scoped implementation. |
|
||||
| Spec says "N/A" for a section you think needs content | **STOP** | The author may have a reason, or they may have missed it. |
|
||||
|
||||
---
|
||||
|
||||
## Ambiguity Scoring
|
||||
|
||||
When you encounter ambiguity, quantify it before deciding to stop or continue.
|
||||
|
||||
### How to Score Ambiguity
|
||||
|
||||
For each requirement you are implementing, ask:
|
||||
|
||||
1. **Can I write a test for this right now?** (No = +20% ambiguity)
|
||||
2. **Are there multiple valid interpretations?** (Yes = +20% ambiguity)
|
||||
3. **Does the spec contradict itself?** (Yes = +30% ambiguity)
|
||||
4. **Am I making assumptions about user behavior?** (Yes = +15% ambiguity)
|
||||
5. **Does this depend on an undocumented external system?** (Yes = +15% ambiguity)
|
||||
|
||||
### Threshold
|
||||
|
||||
| Ambiguity Score | Action |
|
||||
|-----------------|--------|
|
||||
| 0-15% | Continue. Minor ambiguity is normal. Document your interpretation. |
|
||||
| 16-30% | Continue with caution. Add a comment explaining your interpretation. Flag in PR. |
|
||||
| 31-50% | STOP. Ask the spec author one specific question. Do not continue until answered. |
|
||||
| 51%+ | STOP. The spec is incomplete. Request a revision before proceeding. |
|
||||
|
||||
### Example
|
||||
|
||||
**Requirement:** "FR-7: The system MUST notify the user when their order ships."
|
||||
|
||||
Questions:
|
||||
1. Can I write a test? Partially — I know WHAT to test but not HOW (email? push? in-app?). +20%
|
||||
2. Multiple interpretations? Yes — notification channel is unclear. +20%
|
||||
3. Contradicts itself? No. +0%
|
||||
4. Assuming user behavior? Yes — I am assuming they want email. +15%
|
||||
5. Undocumented external system? Maybe — depends on notification service. +15%
|
||||
|
||||
**Total: 70%.** STOP. The spec needs to specify the notification channel.
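
The scoring and thresholds above can be written down as a small calculation. The weights and bands are taken from this section; the dictionary keys and function names are made up for the sketch.

```python
# Percentage points added when the answer to each question signals ambiguity.
WEIGHTS = {
    "cannot_write_test": 20,
    "multiple_interpretations": 20,
    "spec_contradicts_itself": 30,
    "assumes_user_behavior": 15,
    "undocumented_external_system": 15,
}


def ambiguity_score(answers):
    """Sum the weights of every question flagged True in answers."""
    return sum(WEIGHTS[name] for name, flagged in answers.items() if flagged)


def action_for(score):
    """Map a score (0-100) to the action in the threshold table."""
    if score <= 15:
        return "continue"
    if score <= 30:
        return "continue with caution"
    if score <= 50:
        return "stop and ask"
    return "stop and request revision"
```

For the FR-7 example, every question except the contradiction check is flagged, which gives 70 and lands in the "stop and request revision" band.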
|
||||
|
||||
---
|
||||
|
||||
## Scope Creep Detection
|
||||
|
||||
### What Is Scope Creep?
|
||||
|
||||
Scope creep is implementing functionality not described in the spec. It includes:
|
||||
|
||||
- Adding features the spec does not mention
|
||||
- "Improving" behavior beyond what acceptance criteria require
|
||||
- Handling edge cases the spec explicitly excluded
|
||||
- Refactoring unrelated code "while you're in there"
|
||||
- Building infrastructure for future features
|
||||
|
||||
### Detection Patterns
|
||||
|
||||
| Pattern | Example | Risk |
|
||||
|---------|---------|------|
|
||||
| "While I'm here..." | Refactoring a utility function unrelated to the spec | Medium — unreviewed changes |
|
||||
| "This would be easy to add..." | Adding a search filter the spec does not mention | High — untested, unspecified |
|
||||
| "Users will probably want..." | Building a feature based on assumption | High — may conflict with future specs |
|
||||
| "This is obviously needed..." | Adding logging, metrics, or caching not in NFRs | Medium — may be overkill or wrong approach |
|
||||
| "The spec forgot to mention..." | Building something the spec excluded | Critical — may be deliberately excluded |
|
||||
|
||||
### Response Protocol
|
||||
|
||||
When you detect scope creep in your own work:
|
||||
|
||||
1. **Stop immediately.** Do not commit the extra code.
|
||||
2. **Check Out of Scope.** Is this item explicitly excluded?
|
||||
3. **If excluded:** Delete the code. The spec author had a reason.
|
||||
4. **If not mentioned:** File a note for the spec author. Ask if it should be added.
|
||||
5. **If approved:** Update the spec FIRST, then implement.
|
||||
|
||||
---
|
||||
|
||||
## Breaking Change Identification
|
||||
|
||||
### What Counts as a Breaking Change?
|
||||
|
||||
A breaking change is any modification that could cause existing clients, tests, or integrations to fail.
|
||||
|
||||
| Change | Breaking? |
|--------|-----------|
| API endpoint removed | Yes |
| API endpoint added | No |
| Required field added to request | Yes |
| Optional field added to request | No |
| Field removed from response | Yes |
| Field added to response | No (usually) |
| Status code changed | Yes |
| Error code string changed | Yes |
| Database column removed | Yes |
| Database column added (nullable) | No |
| Database column added (not null, no default) | Yes |
| Enum value removed | Yes |
| Enum value added | No (usually) |
| Behavior change for existing input | Yes |
|
||||
|
||||
### Breaking Change Protocol
|
||||
|
||||
1. **Identify** the breaking change before implementing it.
|
||||
2. **Escalate** immediately — do not implement without approval.
|
||||
3. **Propose** a migration path (versioned API, feature flag, deprecation period).
|
||||
4. **Document** the breaking change in the spec's changelog.
|
||||
|
||||
---
|
||||
|
||||
## Security Implication Checklist
|
||||
|
||||
Any change touching the following areas MUST be escalated, even if the spec seems clear.
|
||||
|
||||
### Always Escalate
|
||||
|
||||
- [ ] Authentication logic (login, logout, token generation)
|
||||
- [ ] Authorization logic (role checks, permission gates)
|
||||
- [ ] Encryption/hashing (algorithm choice, key management)
|
||||
- [ ] PII handling (storage, transmission, logging)
|
||||
- [ ] Input validation bypass (new endpoints, parameter changes)
|
||||
- [ ] Rate limiting changes (thresholds, scope)
|
||||
- [ ] CORS or CSP policy changes
|
||||
- [ ] File upload handling
|
||||
- [ ] SQL/NoSQL query construction (injection risk)
|
||||
- [ ] Deserialization of user input
|
||||
- [ ] Redirect URLs from user input (open redirect risk)
|
||||
- [ ] Secrets in code, config, or logs
|
||||
|
||||
### Security Escalation Template
|
||||
|
||||
```markdown
|
||||
## Security Escalation: [Title]
|
||||
|
||||
**Affected area:** [authentication/authorization/encryption/PII/etc.]
|
||||
**Spec reference:** [FR-N or NFR-SN]
|
||||
**Risk:** [What could go wrong if implemented incorrectly]
|
||||
**Current protection:** [What exists today]
|
||||
**Proposed change:** [What the spec requires]
|
||||
**My concern:** [Specific security question]
|
||||
**Recommendation:** [Proposed approach with security rationale]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Escalation Templates
|
||||
|
||||
### Template 1: Ambiguous Requirement
|
||||
|
||||
```markdown
|
||||
## Escalation: Ambiguous Requirement
|
||||
|
||||
**Blocked on:** FR-7 ("notify the user when their order ships")
|
||||
**Ambiguity score:** 70%
|
||||
**Question:** What notification channel should be used?
|
||||
**Options considered:**
|
||||
A. Email only — Pros: simple, reliable. Cons: not real-time.
|
||||
B. Email + in-app notification — Pros: covers both async and real-time. Cons: more implementation effort.
|
||||
C. Configurable per user — Pros: maximum flexibility. Cons: requires preference UI (not in spec).
|
||||
**My recommendation:** B (email + in-app). Covers most use cases without requiring new UI.
|
||||
**Impact of waiting:** Cannot implement FR-7 until resolved. No other work blocked.
|
||||
```
|
||||
|
||||
### Template 2: Missing Edge Case
|
||||
|
||||
```markdown
|
||||
## Escalation: Missing Edge Case
|
||||
|
||||
**Related to:** FR-3 (password reset link expires after 1 hour)
|
||||
**Scenario:** User clicks a reset link, but their account was deleted between requesting and clicking.
|
||||
**Not in spec:** Edge cases section does not cover this.
|
||||
**Options considered:**
|
||||
A. Show generic "link invalid" error — Pros: secure (no info leak). Cons: confusing for deleted user.
|
||||
B. Show "account not found" error — Pros: clear. Cons: confirms account deletion to link holder.
|
||||
**My recommendation:** A. Security over clarity — do not reveal account existence.
|
||||
**Impact of waiting:** Can implement other ACs; this is blocking only AC-2 completion.
|
||||
```
|
||||
|
||||
### Template 3: Potential Breaking Change
|
||||
|
||||
```markdown
|
||||
## Escalation: Potential Breaking Change
|
||||
|
||||
**Spec requires:** Adding required field "role" to POST /api/users request (FR-6)
|
||||
**Current behavior:** POST /api/users accepts {email, password, displayName}
|
||||
**Breaking:** Yes — existing clients will get 400 errors (missing required field)
|
||||
**Options considered:**
|
||||
A. Make "role" required as spec says — Pros: matches spec. Cons: breaks mobile app v2.1.
|
||||
B. Make "role" optional with default "user" — Pros: backward compatible. Cons: deviates from spec.
|
||||
C. Version the API (v2) — Pros: clean separation. Cons: maintenance burden.
|
||||
**My recommendation:** B. Default to "user" for backward compatibility. Update spec to reflect MAY instead of MUST.
|
||||
**Impact of waiting:** Frontend team is building against the new contract. Need answer within 2 days.
|
||||
```
|
||||
|
||||
### Template 4: Scope Creep Proposal
|
||||
|
||||
```markdown
|
||||
## Escalation: Potential Addition to Spec
|
||||
|
||||
**Context:** While implementing FR-2 (password validation), I noticed the spec does not mention password strength feedback.
|
||||
**Not in spec:** No requirement for showing strength indicators.
|
||||
**Checked Out of Scope:** Not listed there either.
|
||||
**Proposal:** Add FR-7: "The system SHOULD display password strength feedback during registration."
|
||||
**Effort:** ~2 hours additional implementation.
|
||||
**Question:** Should this be added to current spec, filed as a separate spec, or skipped?
|
||||
**Impact of waiting:** FR-2 implementation is not blocked. This is an enhancement question only.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference Card
|
||||
|
||||
```
|
||||
CONTINUE if:
|
||||
- Spec is approved
|
||||
- Requirement uses MUST and is unambiguous
|
||||
- Tests can be written directly from the AC
|
||||
- Changes are additive and non-breaking
|
||||
- You are refactoring internals only (no behavior change)
|
||||
|
||||
STOP if:
|
||||
- Ambiguity > 30%
|
||||
- Any breaking change
|
||||
- Any security-related change
|
||||
- Spec says N/A but you think it shouldn't
|
||||
- You are about to build something not in the spec
|
||||
- You cannot write a test for the requirement
|
||||
- External dependency is undocumented
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Anti-Patterns in Autonomy
|
||||
|
||||
### 1. "I'll Ask Later"
|
||||
Continuing past an ambiguity checkpoint because asking feels slow. The rework from building the wrong thing is always slower.
|
||||
|
||||
### 2. "It's Obviously Needed"
|
||||
Assuming a missing feature was accidentally omitted. It may have been deliberately excluded. Check Out of Scope first.
|
||||
|
||||
### 3. "The Spec Is Wrong"
|
||||
Implementing what you think the spec SHOULD say instead of what it DOES say. If the spec is wrong, escalate. Do not silently "fix" it.
|
||||
|
||||
### 4. "Just This Once"
|
||||
Bypassing the escalation protocol for a "small" change. Small changes compound. The protocol exists because risk is hard to judge in the moment, for human and AI implementers alike.
|
||||
|
||||
### 5. "I Already Built It"
|
||||
Presenting completed work that was never in the spec and hoping it gets accepted. This creates review pressure and wastes everyone's time if rejected. Ask BEFORE building.
|
||||
423
engineering/spec-driven-workflow/references/spec_format_guide.md
Normal file
@@ -0,0 +1,423 @@
|
||||
# Spec Format Guide
|
||||
|
||||
Complete reference for writing feature specifications. Every section is explained with examples, rationale, and common mistakes.
|
||||
|
||||
---
|
||||
|
||||
## The Spec Document Structure
|
||||
|
||||
A spec has 9 mandatory sections. If a section does not apply, write "N/A — [reason]" so reviewers know it was considered, not skipped.
|
||||
|
||||
```
|
||||
1. Title and Metadata
|
||||
2. Context
|
||||
3. Functional Requirements
|
||||
4. Non-Functional Requirements
|
||||
5. Acceptance Criteria
|
||||
6. Edge Cases and Error Scenarios
|
||||
7. API Contracts
|
||||
8. Data Models
|
||||
9. Out of Scope
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Section 1: Title and Metadata
|
||||
|
||||
```markdown
|
||||
# Spec: [Feature Name]
|
||||
|
||||
**Author:** Jane Doe
|
||||
**Date:** 2026-03-25
|
||||
**Status:** Draft | In Review | Approved | Superseded
|
||||
**Reviewers:** John Smith, Alice Chen
|
||||
**Related specs:** SPEC-018 (User Registration), SPEC-023 (Session Management)
|
||||
```
|
||||
|
||||
### Status Lifecycle
|
||||
|
||||
| Status | Meaning | Who Can Change |
|
||||
|--------|---------|----------------|
|
||||
| Draft | Author is still writing. Not ready for review. | Author |
|
||||
| In Review | Ready for feedback. Implementation blocked. | Author |
|
||||
| Approved | Reviewed and accepted. Implementation may begin. | Reviewer |
|
||||
| Superseded | Replaced by a newer spec. Link to replacement. | Author |
|
||||
|
||||
**Rule:** Implementation MUST NOT begin until status is "Approved."
|
||||
|
||||
---
|
||||
|
||||
## Section 2: Context
|
||||
|
||||
The context section answers: **Why does this feature exist?**
|
||||
|
||||
### What to Include
|
||||
|
||||
- The problem being solved (with evidence: support tickets, metrics, user research)
|
||||
- The current state (what exists today and what is broken or missing)
|
||||
- The business justification (revenue impact, cost savings, user retention)
|
||||
- Constraints or dependencies (regulatory, technical, timeline)
|
||||
|
||||
### What to Exclude
|
||||
|
||||
- Implementation details (that is the engineer's job)
|
||||
- Solution proposals (the spec says WHAT, not HOW)
|
||||
- Lengthy background (2-4 paragraphs maximum)
|
||||
|
||||
### Good Example
|
||||
|
||||
```markdown
|
||||
## Context
|
||||
|
||||
Users who forget their passwords currently have no self-service recovery.
|
||||
Support handles ~200 password reset requests per week, consuming approximately
|
||||
8 hours of agent time at $45/hour ($360/week, $18,720/year). Additionally,
|
||||
12% of users who contact support for a reset never return.
|
||||
|
||||
This feature provides self-service password reset via email, eliminating
|
||||
support burden and reducing user churn from the reset flow.
|
||||
```
|
||||
|
||||
### Bad Example
|
||||
|
||||
```markdown
|
||||
## Context
|
||||
|
||||
We need a password reset feature. Users forget their passwords sometimes
|
||||
and need to reset them. We should build this.
|
||||
```
|
||||
|
||||
**Why it is bad:** No evidence, no metrics, no business justification. "We should build this" is not a reason.
|
||||
|
||||
---
|
||||
|
||||
## Section 3: Functional Requirements — RFC 2119
|
||||
|
||||
### RFC 2119 Keywords
|
||||
|
||||
These keywords have precise meanings per [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt). Do not use them casually.
|
||||
|
||||
| Keyword | Meaning | Testing Implication |
|
||||
|---------|---------|---------------------|
|
||||
| **MUST** | Absolute requirement. The implementation is non-conformant without this. | Must have a passing test. Failure = release blocker. |
|
||||
| **MUST NOT** | Absolute prohibition. Doing this = broken implementation. | Must have a test proving this cannot happen. |
|
||||
| **SHOULD** | Strongly recommended. Can be omitted only with documented justification. | Should have a test. Omission requires written rationale. |
|
||||
| **SHOULD NOT** | Strongly discouraged. Can be done only with documented justification. | Should have a test confirming the behavior does not occur. |
|
||||
| **MAY** | Truly optional. Implementer's discretion. | Test is optional. Document if implemented. |
|
||||
|
||||
### Writing Good Requirements
|
||||
|
||||
**Each requirement MUST be:**
|
||||
1. **Atomic** — One behavior per requirement. Not "The system MUST authenticate users and log them in."
|
||||
2. **Testable** — You can write a test that proves it works or does not.
|
||||
3. **Numbered** — Sequential FR-N format for traceability.
|
||||
4. **Specific** — No ambiguous adjectives ("fast", "secure", "user-friendly").
|
||||
|
||||
### Good Requirements
|
||||
|
||||
```markdown
|
||||
- FR-1: The system MUST accept login via email and password.
|
||||
- FR-2: The system MUST reject passwords shorter than 8 characters.
|
||||
- FR-3: The system MUST return a JWT access token on successful login.
|
||||
- FR-4: The system MUST NOT include the password hash in any API response.
|
||||
- FR-5: The system SHOULD support "remember me" with a 30-day refresh token.
|
||||
- FR-6: The system MAY display last login time on the dashboard.
|
||||
```
|
||||
|
||||
### Bad Requirements
|
||||
|
||||
```markdown
|
||||
- FR-1: The login system must be fast and secure.
|
||||
(Untestable: what is "fast"? What is "secure"?)
|
||||
|
||||
- FR-2: The system must handle all edge cases.
|
||||
(Vague: which edge cases? This delegates the spec to the implementer.)
|
||||
|
||||
- FR-3: Users should be able to log in easily.
|
||||
(Subjective: "easily" is not measurable.)
|
||||
```
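
The problems called out in the bad examples can be caught mechanically before review. The sketch below is illustrative only: `lint_requirement` and the `VAGUE_TERMS` list are assumptions, and a real word list would need tuning per team.

```python
import re

# Illustrative word list; extend it to match your team's review habits.
VAGUE_TERMS = ["fast", "secure", "easily", "user-friendly", "all edge cases"]


def lint_requirement(text: str) -> list:
    """Return a list of problems found in one requirement statement."""
    problems = []
    lowered = text.lower()
    for term in VAGUE_TERMS:
        # Word boundaries keep "secure" from matching inside "insecure".
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            problems.append(f"vague term: {term!r}")
    if not re.search(r"\b(MUST|SHOULD|MAY)\b", text):
        problems.append("no uppercase RFC 2119 keyword")
    return problems
```

Run against the examples above, the first bad requirement is flagged for "fast", "secure", and the lowercase keyword, while the good FR-2 returns an empty list.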
|
||||
|
||||
---
|
||||
|
||||
## Section 4: Non-Functional Requirements
|
||||
|
||||
Non-functional requirements define quality attributes. Every requirement needs a **measurable threshold**.
|
||||
|
||||
### Categories
|
||||
|
||||
#### Performance
|
||||
```markdown
|
||||
- NFR-P1: Login API MUST respond in < 500ms (p95) under 1,000 concurrent users.
|
||||
- NFR-P2: Dashboard page MUST achieve Largest Contentful Paint < 2.5s.
|
||||
- NFR-P3: Search results MUST return within 200ms for queries under 100 characters.
|
||||
```
|
||||
|
||||
**Bad:** "The system should be fast." (Not measurable.)
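
A threshold such as "< 500ms (p95)" is only measurable if the percentile method is pinned down. The following sketch assumes the nearest-rank definition; `percentile` and `meets_latency_budget` are hypothetical helpers, and a load-testing tool may use a different interpolation.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_budget(samples_ms, budget_ms=500, pct=95):
    """True when the chosen percentile is strictly below the budget (e.g., NFR-P1)."""
    return percentile(samples_ms, pct) < budget_ms
```

Stating the method in the spec avoids disputes later, because two tools can report different p95 values for the same samples.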
|
||||
|
||||
#### Security
|
||||
```markdown
|
||||
- NFR-S1: All API endpoints MUST require authentication except /health and /login.
|
||||
- NFR-S2: Failed login attempts MUST be rate-limited to 5 per minute per IP.
|
||||
- NFR-S3: Passwords MUST be hashed with bcrypt (cost factor >= 12).
|
||||
- NFR-S4: Session tokens MUST be invalidated on password change.
|
||||
```
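
A requirement like NFR-S2 is easier to test when the limiting rule is written down precisely. Below is a minimal sliding-window sketch; the class name, the injected `now` timestamp, and the choice of a sliding (rather than fixed) window are assumptions, since the requirement itself does not say which window type is intended.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window_s` seconds for each key (e.g., an IP)."""

    def __init__(self, limit=5, window_s=60.0):
        self.limit = limit
        self.window_s = window_s
        self._events = defaultdict(deque)  # key -> timestamps of recent events

    def allow(self, key, now):
        """Record an event at time `now` (seconds) and return whether it is allowed."""
        events = self._events[key]
        # Drop timestamps that have fallen out of the window.
        while events and now - events[0] >= self.window_s:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True
```

Passing `now` explicitly keeps the limiter deterministic in tests; production code would typically supply a monotonic clock.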
|
||||
|
||||
#### Accessibility
|
||||
```markdown
|
||||
- NFR-A1: All form inputs MUST have associated labels (WCAG 1.3.1).
|
||||
- NFR-A2: Color contrast MUST meet 4.5:1 ratio (WCAG 1.4.3).
|
||||
- NFR-A3: All interactive elements MUST be keyboard-navigable (WCAG 2.1.1).
|
||||
```
|
||||
|
||||
#### Scalability
|
||||
```markdown
|
||||
- NFR-SC1: The system SHOULD handle 50,000 registered users.
|
||||
- NFR-SC2: Database queries MUST use indexes; no full table scans on tables > 10K rows.
|
||||
```
|
||||
|
||||
#### Reliability
|
||||
```markdown
|
||||
- NFR-R1: The authentication service MUST maintain 99.9% uptime (< 8.77h downtime/year).
|
||||
- NFR-R2: Data MUST NOT be lost on service restart (durable storage required).
|
||||
```
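
The downtime figure in NFR-R1 follows from simple arithmetic. This sketch assumes a 365.25-day year, which is what yields roughly 8.77 hours; with a 365-day year the budget would be 8.76 hours. `downtime_budget_hours` is a hypothetical helper.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8,766 hours, averaging in leap years


def downtime_budget_hours(uptime_percent: float) -> float:
    """Maximum yearly downtime, in hours, permitted by an uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100 - uptime_percent) / 100 * HOURS_PER_YEAR
```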
|
||||
|
||||
---
|
||||
|
||||
## Section 5: Acceptance Criteria — Given/When/Then
|
||||
|
||||
Acceptance criteria are the contract between the spec author and the implementer. They define "done."
|
||||
|
||||
### The Given/When/Then Pattern
|
||||
|
||||
```
|
||||
Given [precondition — the world is in this state]
|
||||
When [action — the user or system does this]
|
||||
Then [outcome — this observable result occurs]
|
||||
And [additional outcome — and also this]
|
||||
```
|
||||
|
||||
### Rules for Acceptance Criteria
|
||||
|
||||
1. **Every AC MUST reference at least one FR-* or NFR-*.** Orphaned criteria indicate missing requirements.
|
||||
2. **Every AC MUST be testable by a machine.** If you cannot write an automated test, rewrite the criterion.
|
||||
3. **No subjective language.** Not "should look good" but "MUST render within the design-system grid."
|
||||
4. **One scenario per AC.** If you have multiple Given/When/Then blocks, split into separate ACs.
|
||||
|
||||
### Example: Authentication Feature
|
||||
|
||||
```markdown
|
||||
### AC-1: Successful login (FR-1, FR-3)
|
||||
Given a registered user with email "user@example.com" and password "P@ssw0rd123"
|
||||
When they POST /api/auth/login with those credentials
|
||||
Then they receive a 200 response with a valid JWT token
|
||||
And the token expires in 24 hours
|
||||
And the response includes the user's display name
|
||||
|
||||
### AC-2: Invalid password (FR-1)
|
||||
Given a registered user with email "user@example.com"
|
||||
When they POST /api/auth/login with an incorrect password
|
||||
Then they receive a 401 response
|
||||
And the response body contains error "INVALID_CREDENTIALS"
|
||||
And no token is issued
|
||||
|
||||
### AC-3: Short password rejected on registration (FR-2)
|
||||
Given a new user attempting to register
|
||||
When they submit a password with 7 characters
|
||||
Then they receive a 400 response
|
||||
And the response body contains error "PASSWORD_TOO_SHORT"
|
||||
And the account is not created
|
||||
```
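
Because the format above is regular, acceptance criteria can be parsed into structured data and handed to a test generator. The sketch below is an assumption about how that might look, not the skill's `test_extractor.py`; `parse_acceptance_criteria` is a hypothetical name, and headers without a parenthesized requirement reference are deliberately skipped.

```python
import re

AC_HEADER = re.compile(r"^###\s+(AC-\d+):\s*(.+?)\s*\(([^)]*)\)\s*$")
STEP = re.compile(r"^(Given|When|Then|And)\s+(.*)$")


def parse_acceptance_criteria(text: str) -> list:
    """Parse '### AC-N: name (refs)' blocks into dicts with given/when/then lists."""
    criteria, current, last_kind = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        header = AC_HEADER.match(line)
        if header:
            refs = [r.strip() for r in header.group(3).split(",") if r.strip()]
            current = {"id": header.group(1), "name": header.group(2),
                       "refs": refs, "given": [], "when": [], "then": []}
            criteria.append(current)
            last_kind = None
            continue
        if line.startswith("###"):
            current = None  # malformed header: do not attach its steps to the previous AC
            continue
        step = STEP.match(line)
        if step and current is not None:
            kind = step.group(1).lower()
            if kind == "and":
                kind = last_kind  # "And" continues the previous clause type
            if kind is not None:
                current[kind].append(step.group(2))
                last_kind = kind
    return criteria
```

Each parsed entry carries its `refs`, so a missing or empty reference list can be reported as an orphaned criterion.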
|
||||
|
||||
### Common Mistakes
|
||||
|
||||
| Mistake | Example | Fix |
|---------|---------|-----|
| Vague outcome | "Then the system works correctly" | "Then the response status is 200 and body contains {field: value}" |
| Missing precondition | "When user logs in, then token is issued" | "Given a registered user, when they POST valid credentials, then..." |
| Multiple scenarios | AC with 3 different When clauses | Split into 3 separate ACs |
| No FR reference | "AC-5: User sees dashboard" | "AC-5: User sees dashboard (FR-7)" |
|
||||
|
||||
---
|
||||
|
||||
## Section 6: Edge Cases and Error Scenarios
|
||||
|
||||
### What Counts as an Edge Case
|
||||
|
||||
- Invalid or malformed input
|
||||
- External service failures (API down, timeout, rate-limited)
|
||||
- Concurrent operations (race conditions)
|
||||
- Boundary values (empty string, max length, zero, negative numbers)
|
||||
- State conflicts (already exists, already deleted, expired)
|
||||
|
||||
### Format
|
||||
|
||||
```markdown
|
||||
- EC-1: Empty email field → Return 400 with error "EMAIL_REQUIRED". Do not call auth service.
|
||||
- EC-2: Email exceeds 255 characters → Return 400 with error "EMAIL_TOO_LONG".
|
||||
- EC-3: OAuth provider returns 503 → Return 503 with "Service temporarily unavailable" and tell the client to retry after 30s.
|
||||
- EC-4: Two users register same email simultaneously → First succeeds, second gets 409 Conflict.
|
||||
- EC-5: User clicks reset link after password was already changed → Show "Link already used."
|
||||
```
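
Edge cases written in this form translate almost directly into boundary tests. The sketch below covers EC-1 and EC-2 only; `validate_email_field` and its `(status, error_code)` return shape are assumptions made for illustration.

```python
MAX_EMAIL_LENGTH = 255


def validate_email_field(email):
    """Return (status, error_code); error_code is None when the field passes."""
    if email is None or email.strip() == "":
        return 400, "EMAIL_REQUIRED"  # EC-1: reject before calling the auth service
    if len(email) > MAX_EMAIL_LENGTH:
        return 400, "EMAIL_TOO_LONG"  # EC-2: "exceeds 255" means 256 or more
    return 200, None
```

Testing both sides of the boundary (255 and 256 characters) is what makes "exceeds 255 characters" unambiguous.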
|
||||
|
||||
### Coverage Rule
|
||||
|
||||
For every external dependency, specify at least one failure:
|
||||
- Database: connection lost, timeout, constraint violation
|
||||
- API: 4xx, 5xx, timeout, invalid response
|
||||
- File system: file not found, permission denied, disk full
|
||||
- User input: empty, too long, wrong type, injection attempt
|
||||
|
||||
---
|
||||
|
||||
## Section 7: API Contracts
|
||||
|
||||
### Notation
|
||||
|
||||
Use TypeScript-style interfaces. They are readable by both frontend and backend engineers.
|
||||
|
||||
```typescript
interface CreateUserRequest {
  email: string;        // MUST be valid email, max 255 chars
  password: string;     // MUST be 8-128 chars
  displayName: string;  // MUST be 1-100 chars, no HTML
  role?: "user" | "admin";  // Default: "user"
}
```
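
The constraints in the interface comments can be mirrored by a server-side validator. This is a sketch under assumptions: the loose email pattern and the angle-bracket check standing in for "no HTML" are simplifications, and the Python field name `display_name` is an illustrative mapping of `displayName`.

```python
import re
from dataclasses import dataclass

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose check


@dataclass
class CreateUserRequest:
    email: str
    password: str
    display_name: str
    role: str = "user"

    def validate(self) -> dict:
        """Return field-level errors keyed by API field name; an empty dict means valid."""
        errors = {}
        if len(self.email) > 255 or not EMAIL_RE.match(self.email):
            errors["email"] = "must be a valid email of at most 255 characters"
        if not 8 <= len(self.password) <= 128:
            errors["password"] = "must be 8-128 characters"
        if not 1 <= len(self.display_name) <= 100 or re.search(r"[<>]", self.display_name):
            errors["displayName"] = "must be 1-100 characters with no HTML"
        if self.role not in ("user", "admin"):
            errors["role"] = 'must be "user" or "admin"'
        return errors
```

Returning a dictionary keyed by field name lines up with a field-level `details` object in an error response.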
|
||||
|
||||
### What to Define
|
||||
|
||||
For each endpoint:
|
||||
1. **HTTP method and path** (e.g., POST /api/users)
|
||||
2. **Request body** (fields, types, constraints, defaults)
|
||||
3. **Success response** (status code, body shape)
|
||||
4. **Error responses** (each error code with its status and body)
|
||||
5. **Headers** (Authorization, Content-Type, custom headers)
|
||||
|
||||
### Error Response Convention
|
||||
|
||||
```typescript
interface ApiError {
  error: string;        // Machine-readable code: "INVALID_CREDENTIALS"
  message: string;      // Human-readable: "The email or password is incorrect."
  details?: Record<string, string>;  // Field-level errors for validation
}
```
|
||||
|
||||
Always include:
|
||||
- 400 for validation errors
|
||||
- 401 for authentication failures
|
||||
- 403 for authorization failures
|
||||
- 404 for not found
|
||||
- 409 for conflicts
|
||||
- 429 for rate limiting
|
||||
- 500 for unexpected errors (keep it generic — do not leak internals)
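
A single helper that builds error payloads keeps the convention above consistent across endpoints. The sketch below is hypothetical: the `STATUS_BY_ERROR` table only lists codes that appear in this document's examples, and `INTERNAL_ERROR` is an invented generic code.

```python
# Hypothetical code-to-status table; the codes come from examples in this document.
STATUS_BY_ERROR = {
    "EMAIL_REQUIRED": 400,
    "PASSWORD_TOO_SHORT": 400,
    "INVALID_CREDENTIALS": 401,
}


def api_error(code: str, message: str, details=None):
    """Return (status, body) in the ApiError shape; unknown codes become a generic 500."""
    status = STATUS_BY_ERROR.get(code)
    if status is None:
        # Do not leak internals: replace unknown errors with a generic payload.
        return 500, {"error": "INTERNAL_ERROR", "message": "An unexpected error occurred."}
    body = {"error": code, "message": message}
    if details:
        body["details"] = dict(details)
    return status, body
```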
|
||||
|
||||
---
|
||||
|
||||
## Section 8: Data Models
|
||||
|
||||
### Table Format
|
||||
|
||||
```markdown
### User
| Field | Type | Constraints |
|-------|------|-------------|
| id | UUID | PK, auto-generated, immutable |
| email | varchar(255) | Unique, not null, valid email |
| passwordHash | varchar(60) | Not null, bcrypt, never in API responses |
| displayName | varchar(100) | Not null |
| role | enum('user','admin') | Default: 'user' |
| createdAt | timestamp | UTC, immutable, auto-set |
| updatedAt | timestamp | UTC, auto-updated |
| deletedAt | timestamp | Null unless soft-deleted |
```
|
||||
|
||||
### Rules
|
||||
|
||||
1. **Every entity in requirements MUST have a data model.** If FR-1 mentions "users", there must be a User model.
|
||||
2. **Constraints MUST match requirements.** If FR-2 says passwords >= 8 chars, the model must note that.
|
||||
3. **Include indexes.** If NFR-P1 requires a < 500ms response, note which fields need indexes to meet it.
|
||||
4. **Specify soft vs. hard delete.** State it explicitly.
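
A data model table maps closely onto a schema, which makes the constraints checkable. The sketch below expresses the example User model in SQLite; the snake_case column names, the use of `TEXT` for UUIDs and timestamps, and the `CHECK` clauses are assumptions chosen for a self-contained example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id            TEXT PRIMARY KEY,               -- UUID stored as text
    email         TEXT NOT NULL UNIQUE CHECK (length(email) <= 255),
    password_hash TEXT NOT NULL CHECK (length(password_hash) = 60),
    display_name  TEXT NOT NULL CHECK (length(display_name) BETWEEN 1 AND 100),
    role          TEXT NOT NULL DEFAULT 'user' CHECK (role IN ('user', 'admin')),
    created_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    deleted_at    TEXT                            -- NULL unless soft-deleted
);
"""


def create_schema() -> sqlite3.Connection:
    """Create the users table in an in-memory database and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

The `UNIQUE` constraint on `email` also gives SQLite an index on that column, which covers the login lookup path.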
|
||||
|
||||
---
|
||||
|
||||
## Section 9: Out of Scope
|
||||
|
||||
### Why This Section Matters
|
||||
|
||||
Out of Scope prevents scope creep during implementation. When someone says "while you're in there, could you also..." — point them to this section.
|
||||
|
||||
### Format
|
||||
|
||||
```markdown
|
||||
- OS-1: Multi-factor authentication — Planned for Q3 (SPEC-045).
|
||||
- OS-2: Social login beyond Google/GitHub — Insufficient user demand (< 2% requests).
|
||||
- OS-3: Admin impersonation — Security review pending. Separate spec required.
|
||||
- OS-4: Password strength meter UI — Nice-to-have, deferred to design sprint 12.
|
||||
```
|
||||
|
||||
### Rules
|
||||
|
||||
1. **Every feature discussed and rejected MUST be listed.** This creates a paper trail.
|
||||
2. **Include the reason.** "Not now" is not a reason. "Insufficient demand (< 2% of requests)" is.
|
||||
3. **Link to future specs** when the exclusion is a deferral, not a rejection.
|
||||
|
||||
---
|
||||
|
||||
## Feature-Type Templates
|
||||
|
||||
### CRUD Feature
|
||||
|
||||
Focus on: all 4 operations, validation rules, authorization, pagination for list endpoints.
|
||||
|
||||
```markdown
|
||||
- FR-1: Users MUST be able to create a [resource] with [required fields].
|
||||
- FR-2: Users MUST be able to read a [resource] by ID.
|
||||
- FR-3: Users MUST be able to list [resources] with pagination (default: 20/page).
|
||||
- FR-4: Users MUST be able to update [mutable fields] of their own [resources].
|
||||
- FR-5: Users MUST be able to delete their own [resources] (soft delete).
|
||||
- FR-6: Users MUST NOT be able to modify or delete other users' [resources].
|
||||
```
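
For FR-3, the pagination rule is worth pinning down with a reference implementation. The sketch below is illustrative: `paginate`, the response field names, and the `max_per_page` cap are assumptions, while the default of 20 items per page comes from the template.

```python
def paginate(items, page=1, per_page=20, max_per_page=100):
    """Return one page of items plus metadata; page numbers start at 1."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    per_page = min(per_page, max_per_page)  # hypothetical cap on page size
    total = len(items)
    start = (page - 1) * per_page
    return {
        "items": items[start:start + per_page],
        "page": page,
        "perPage": per_page,
        "total": total,
        "totalPages": (total + per_page - 1) // per_page,
    }
```

A page past the end returns an empty `items` list instead of an error; whether that is the desired behavior is exactly the kind of decision the spec should record.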
|
||||
|
||||
### Integration Feature
|
||||
|
||||
Focus on: external API contract, retry/fallback behavior, data mapping, error propagation.
|
||||
|
||||
```markdown
|
||||
- FR-1: The system MUST call [external API] to [purpose].
|
||||
- FR-2: The system MUST retry failed calls up to 3 times with exponential backoff.
|
||||
- FR-3: The system MUST map [external field] to [internal field].
|
||||
- FR-4: The system MUST NOT expose external API errors directly to users.
|
||||
- EC-1: External API returns 5xx → Log error, return cached data if < 1h old, else 503.
|
||||
- EC-2: External API response schema changes → Log warning, reject unmappable fields.
|
||||
```
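
FR-2 in this template leaves the backoff schedule open. One possible reading is sketched below: one initial call plus up to three retries, with delays of 0.5s, 1s, and 2s. The function name, the base delay, and the injected `sleep` are assumptions.

```python
import time


def call_with_retry(func, retries=3, base_delay_s=0.5, sleep=time.sleep):
    """Call func(); on exception retry up to `retries` times, doubling the delay each time."""
    attempt = 0
    while True:
        try:
            return func()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: the caller decides on cached data or a 503
            sleep(base_delay_s * (2 ** attempt))
            attempt += 1
```

Injecting `sleep` lets a test record the delays without waiting, which keeps the retry requirement machine-testable.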
|
||||
|
||||
### Migration Feature
|
||||
|
||||
Focus on: backward compatibility, rollback plan, data integrity, zero-downtime deployment.
|
||||
|
||||
```markdown
|
||||
- FR-1: The migration MUST transform [old schema] to [new schema].
|
||||
- FR-2: The migration MUST be reversible (rollback script required).
|
||||
- FR-3: The migration MUST NOT cause downtime exceeding 30 seconds.
|
||||
- FR-4: The migration MUST validate data integrity post-run (row count, checksum).
|
||||
- EC-1: Migration fails mid-way → Automatic rollback, alert ops team.
|
||||
- EC-2: New schema has stricter constraints → Log invalid rows, quarantine for manual review.
|
||||
```
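
The integrity check in FR-4 (row count plus checksum) can be sketched with the standard library. This example assumes SQLite and compares tables that share the same column layout; `table_fingerprint` is a hypothetical helper, and a real migration would first project both schemas onto comparable columns.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column):
    """Return (row_count, sha256 hex digest) over rows ordered by key_column."""
    # Table and column names are interpolated, so pass only trusted identifiers.
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY {key_column}").fetchall()
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return len(rows), digest.hexdigest()
```

Ordering by a key column makes the digest independent of insertion order, so only genuine data differences change the fingerprint.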
|
||||
|
||||
---
|
||||
|
||||
## Checklist: Is This Spec Ready for Review?
|
||||
|
||||
- [ ] Every section is filled (or marked N/A with reason)
|
||||
- [ ] All requirements use FR-N or category-prefixed NFR numbering (e.g., FR-1, NFR-P1, NFR-S2)
|
||||
- [ ] RFC 2119 keywords are UPPERCASE
|
||||
- [ ] Every AC references at least one requirement
|
||||
- [ ] Every AC uses Given/When/Then
|
||||
- [ ] Edge cases cover each external dependency failure
|
||||
- [ ] API contracts define success AND error responses
|
||||
- [ ] Data models include all entities from requirements
|
||||
- [ ] Out of Scope lists items discussed and rejected
|
||||
- [ ] No placeholder text remains
|
||||
- [ ] Context includes evidence (metrics, tickets, research)
|
||||
- [ ] Status is "In Review" (not still "Draft")
|
||||
338
engineering/spec-driven-workflow/spec_generator.py
Normal file
@@ -0,0 +1,338 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Spec Generator - Generates a feature specification template from a name and description.
|
||||
|
||||
Produces a complete spec document with all required sections pre-filled with
|
||||
guidance prompts. Output can be markdown or structured JSON.
|
||||
|
||||
No external dependencies - uses only Python standard library.
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import sys
|
||||
import textwrap
|
||||
from datetime import date
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict
|
||||
|
||||
|
||||
SPEC_TEMPLATE = """\
|
||||
# Spec: {name}
|
||||
|
||||
**Author:** [your name]
|
||||
**Date:** {date}
|
||||
**Status:** Draft
|
||||
**Reviewers:** [list reviewers]
|
||||
**Related specs:** [links to related specs, or "None"]
|
||||
|
||||
---
|
||||
|
||||
## Context
|
||||
|
||||
{context_prompt}
|
||||
|
||||
---
|
||||
|
||||
## Functional Requirements
|
||||
|
||||
_Use RFC 2119 keywords: MUST, MUST NOT, SHOULD, SHOULD NOT, MAY._
|
||||
_Each requirement is a single, testable statement. Number sequentially._
|
||||
|
||||
- FR-1: The system MUST [describe required behavior].
|
||||
- FR-2: The system MUST [describe another required behavior].
|
||||
- FR-3: The system SHOULD [describe recommended behavior].
|
||||
- FR-4: The system MAY [describe optional behavior].
|
||||
- FR-5: The system MUST NOT [describe prohibited behavior].
|
||||
|
||||
---
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
### Performance
|
||||
- NFR-P1: [Operation] MUST complete in < [threshold] (p95) under [conditions].
|
||||
- NFR-P2: [Operation] SHOULD handle [throughput] requests per second.
|
||||
|
||||
### Security
|
||||
- NFR-S1: All data in transit MUST be encrypted via TLS 1.2+.
|
||||
- NFR-S2: The system MUST rate-limit [operation] to [limit] per [period] per [scope].
|
||||
|
||||
### Accessibility
|
||||
- NFR-A1: [UI component] MUST meet WCAG 2.1 AA standards.
|
||||
- NFR-A2: Error messages MUST be announced to screen readers.
|
||||
|
||||
### Scalability
|
||||
- NFR-SC1: The system SHOULD handle [number] concurrent [entities].
|
||||
|
||||
### Reliability
|
||||
- NFR-R1: The [service] MUST maintain [percentage]% uptime.
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
_Write in Given/When/Then (Gherkin) format._
|
||||
_Each criterion MUST reference at least one FR-* or NFR-*._
|
||||
|
||||
### AC-1: [Descriptive name] (FR-1)
|
||||
Given [precondition]
|
||||
When [action]
|
||||
Then [expected result]
|
||||
And [additional assertion]
|
||||
|
||||
### AC-2: [Descriptive name] (FR-2)
|
||||
Given [precondition]
|
||||
When [action]
|
||||
Then [expected result]
|
||||
|
||||
### AC-3: [Descriptive name] (NFR-S2)
|
||||
Given [precondition]
|
||||
When [action]
|
||||
Then [expected result]
|
||||
And [additional assertion]
|
||||
|
||||
---
|
||||
|
||||
## Edge Cases
|
||||
|
||||
_For every external dependency (API, database, file system, user input), specify at least one failure scenario._
|
||||
|
||||
- EC-1: [Input/condition] -> [expected behavior].
|
||||
- EC-2: [Input/condition] -> [expected behavior].
|
||||
- EC-3: [External service] is unavailable -> [expected behavior].
|
||||
- EC-4: [Concurrent/race condition] -> [expected behavior].
|
||||
- EC-5: [Boundary value] -> [expected behavior].
|
||||
|
||||
---
|
||||
|
||||
## API Contracts
|
||||
|
||||
_Define request/response shapes using TypeScript-style notation._
|
||||
_Cover all endpoints referenced in functional requirements._
|
||||
|
||||
### [METHOD] [endpoint]
|
||||
|
||||
Request:
|
||||
```typescript
|
||||
interface [Name]Request {{
|
||||
field: string; // Description, constraints
|
||||
optional?: number; // Default: [value]
|
||||
}}
|
||||
```
|
||||
|
||||
Success Response ([status code]):
|
||||
```typescript
|
||||
interface [Name]Response {{
|
||||
id: string;
|
||||
field: string;
|
||||
createdAt: string; // ISO 8601
|
||||
}}
|
||||
```
|
||||
|
||||
Error Response ([status code]):
|
||||
```typescript
|
||||
interface [Name]Error {{
|
||||
error: "[ERROR_CODE]";
|
||||
message: string;
|
||||
}}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Data Models
|
||||
|
||||
_Define all entities referenced in requirements._
|
||||
|
||||
### [Entity Name]
|
||||
| Field | Type | Constraints |
|
||||
|-------|------|-------------|
|
||||
| id | UUID | Primary key, auto-generated |
|
||||
| [field] | [type] | [constraints] |
|
||||
| createdAt | timestamp | UTC, immutable |
|
||||
| updatedAt | timestamp | UTC, auto-updated |
|
||||
|
||||
---
|
||||
|
||||
## Out of Scope
|
||||
|
||||
_Explicit exclusions prevent scope creep. If someone asks for these during implementation, point them here._
|
||||
|
||||
- OS-1: [Feature/capability] — [reason for exclusion or link to future spec].
|
||||
- OS-2: [Feature/capability] — [reason for exclusion].
|
||||
- OS-3: [Feature/capability] — deferred to [version/sprint].
|
||||
|
||||
---
|
||||
|
||||
## Open Questions
|
||||
|
||||
_Track unresolved questions here. Each must be resolved before status moves to "Approved"._
|
||||
|
||||
- [ ] Q1: [Question] — Owner: [name], Due: [date]
|
||||
- [ ] Q2: [Question] — Owner: [name], Due: [date]
|
||||
"""
|
||||
|
||||
|
||||
def generate_context_prompt(description: str) -> str:
    """Generate a context section prompt based on the provided description."""
    if description:
        # Dedent only the static guidance. Interpolating the description before
        # dedenting would leave the text indented for multi-line descriptions.
        guidance = textwrap.dedent("""\
            _Expand this context section to include:_
            _- Why does this feature exist? What problem does it solve?_
            _- What is the business motivation? (link to user research, support tickets, metrics)_
            _- What is the current state? (what exists today, what pain points exist)_
            _- 2-4 paragraphs maximum._""")
        return f"{description}\n\n{guidance}"
    return textwrap.dedent("""\
        _Why does this feature exist? What problem does it solve? What is the business
        motivation? Include links to user research, support tickets, or metrics that
        justify this work. 2-4 paragraphs maximum._""")
|
||||
|
||||
|
||||
def generate_spec(name: str, description: str) -> str:
    """Generate a spec document from name and description."""
    context_prompt = generate_context_prompt(description)
    return SPEC_TEMPLATE.format(
        name=name,
        date=date.today().isoformat(),
        context_prompt=context_prompt,
    )
|
||||
|
||||
|
||||
def generate_spec_json(name: str, description: str) -> Dict[str, Any]:
|
||||
"""Generate structured JSON representation of the spec template."""
|
||||
return {
|
||||
"spec": {
|
||||
"title": f"Spec: {name}",
|
||||
"metadata": {
|
||||
"author": "[your name]",
|
||||
"date": date.today().isoformat(),
|
||||
"status": "Draft",
|
||||
"reviewers": [],
|
||||
"related_specs": [],
|
||||
},
|
||||
"context": description or "[Describe why this feature exists]",
|
||||
"functional_requirements": [
|
||||
{"id": "FR-1", "keyword": "MUST", "description": "[describe required behavior]"},
|
||||
{"id": "FR-2", "keyword": "MUST", "description": "[describe another required behavior]"},
|
||||
{"id": "FR-3", "keyword": "SHOULD", "description": "[describe recommended behavior]"},
|
||||
{"id": "FR-4", "keyword": "MAY", "description": "[describe optional behavior]"},
|
||||
{"id": "FR-5", "keyword": "MUST NOT", "description": "[describe prohibited behavior]"},
|
||||
],
|
||||
"non_functional_requirements": {
|
||||
"performance": [
|
||||
{"id": "NFR-P1", "description": "[operation] MUST complete in < [threshold]"},
|
||||
],
|
||||
"security": [
|
||||
{"id": "NFR-S1", "description": "All data in transit MUST be encrypted via TLS 1.2+"},
|
||||
],
|
||||
"accessibility": [
|
||||
{"id": "NFR-A1", "description": "[UI component] MUST meet WCAG 2.1 AA"},
|
||||
],
|
||||
"scalability": [
|
||||
{"id": "NFR-SC1", "description": "[system] SHOULD handle [N] concurrent [entities]"},
|
||||
],
|
||||
"reliability": [
|
||||
{"id": "NFR-R1", "description": "[service] MUST maintain [N]% uptime"},
|
||||
],
|
||||
},
|
||||
"acceptance_criteria": [
|
||||
{
|
||||
"id": "AC-1",
|
||||
"name": "[descriptive name]",
|
||||
"references": ["FR-1"],
|
||||
"given": "[precondition]",
|
||||
"when": "[action]",
|
||||
"then": "[expected result]",
|
||||
},
|
||||
],
|
||||
"edge_cases": [
|
||||
{"id": "EC-1", "condition": "[input/condition]", "behavior": "[expected behavior]"},
|
||||
],
|
||||
"api_contracts": [
|
||||
{
|
||||
"method": "[METHOD]",
|
||||
"endpoint": "[/api/path]",
|
||||
"request_fields": [{"name": "field", "type": "string", "constraints": "[description]"}],
|
||||
"success_response": {"status": 200, "fields": []},
|
||||
"error_response": {"status": 400, "fields": []},
|
||||
},
|
||||
],
|
||||
"data_models": [
|
||||
{
|
||||
"name": "[Entity]",
|
||||
"fields": [
|
||||
{"name": "id", "type": "UUID", "constraints": "Primary key, auto-generated"},
|
||||
],
|
||||
},
|
||||
],
|
||||
"out_of_scope": [
|
||||
{"id": "OS-1", "description": "[feature/capability]", "reason": "[reason]"},
|
||||
],
|
||||
"open_questions": [],
|
||||
},
|
||||
"metadata": {
|
||||
"generated_by": "spec_generator.py",
|
||||
"feature_name": name,
|
||||
"feature_description": description,
|
||||
},
|
||||
}
|
||||
|
||||
|
||||
def main():
    parser = argparse.ArgumentParser(
        description="Generate a feature specification template from a name and description.",
        epilog="Example: python spec_generator.py --name 'User Auth' --description 'OAuth 2.0 login flow'",
    )
    parser.add_argument(
        "--name",
        required=True,
        help="Feature name (used as spec title)",
    )
    parser.add_argument(
        "--description",
        default="",
        help="Brief feature description (used to seed the context section)",
    )
    parser.add_argument(
        "--output",
        "-o",
        default=None,
        help="Output file path (default: stdout)",
    )
    parser.add_argument(
        "--format",
        choices=["md", "json"],
        default="md",
        help="Output format: md (markdown) or json (default: md)",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        dest="json_flag",
        help="Shorthand for --format json",
    )

    args = parser.parse_args()

    # --json takes precedence over --format when both are given.
    output_format = "json" if args.json_flag else args.format

    if output_format == "json":
        result = generate_spec_json(args.name, args.description)
        output = json.dumps(result, indent=2)
    else:
        output = generate_spec(args.name, args.description)

    if args.output:
        out_path = Path(args.output)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text(output, encoding="utf-8")
        print(f"Spec template written to {out_path}", file=sys.stderr)
    else:
        print(output)

    sys.exit(0)


if __name__ == "__main__":
    main()
|
||||
461
engineering/spec-driven-workflow/spec_validator.py
Normal file
@@ -0,0 +1,461 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Spec Validator - Validates a feature specification for completeness and quality.
|
||||
|
||||
Checks that a spec document contains all required sections, uses RFC 2119 keywords
|
||||
correctly, has acceptance criteria in Given/When/Then format, and scores overall
|
||||
completeness from 0-100.
|
||||
|
||||
Sections checked:
|
||||
- Context, Functional Requirements, Non-Functional Requirements
|
||||
- Acceptance Criteria, Edge Cases, API Contracts, Data Models, Out of Scope
|
||||
|
||||
Exit codes: 0 = pass, 1 = warnings, 2 = critical (or --strict with score < 80)
|
||||
|
||||
No external dependencies - uses only Python standard library.
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import re
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import Dict, List, Any, Tuple
|
||||
|
||||
|
||||
# Section definitions: (key, display_name, required_header_patterns, weight)
|
||||
SECTIONS = [
|
||||
("context", "Context", [r"^##\s+Context"], 10),
|
||||
("functional_requirements", "Functional Requirements", [r"^##\s+Functional\s+Requirements"], 15),
|
||||
("non_functional_requirements", "Non-Functional Requirements", [r"^##\s+Non-Functional\s+Requirements"], 10),
|
||||
("acceptance_criteria", "Acceptance Criteria", [r"^##\s+Acceptance\s+Criteria"], 20),
|
||||
("edge_cases", "Edge Cases", [r"^##\s+Edge\s+Cases"], 10),
|
||||
("api_contracts", "API Contracts", [r"^##\s+API\s+Contracts"], 10),
|
||||
("data_models", "Data Models", [r"^##\s+Data\s+Models"], 10),
|
||||
("out_of_scope", "Out of Scope", [r"^##\s+Out\s+of\s+Scope"], 10),
|
||||
("metadata", "Metadata (Author/Date/Status)", [r"\*\*Author:\*\*", r"\*\*Date:\*\*", r"\*\*Status:\*\*"], 5),
|
||||
]
|
||||
|
||||
RFC_KEYWORDS = ["MUST", "MUST NOT", "SHOULD", "SHOULD NOT", "MAY"]
|
||||
|
||||
# Patterns that indicate placeholder/unfilled content
|
||||
PLACEHOLDER_PATTERNS = [
|
||||
r"\[your\s+name\]",
|
||||
r"\[list\s+reviewers\]",
|
||||
r"\[describe\s+",
|
||||
r"\[input/condition\]",
|
||||
r"\[precondition\]",
|
||||
r"\[action\]",
|
||||
r"\[expected\s+result\]",
|
||||
r"\[feature/capability\]",
|
||||
r"\[operation\]",
|
||||
r"\[threshold\]",
|
||||
r"\[UI\s+component\]",
|
||||
r"\[service\]",
|
||||
r"\[percentage\]",
|
||||
r"\[number\]",
|
||||
r"\[METHOD\]",
|
||||
r"\[endpoint\]",
|
||||
r"\[Name\]",
|
||||
r"\[Entity\s+Name\]",
|
||||
r"\[type\]",
|
||||
r"\[constraints\]",
|
||||
r"\[field\]",
|
||||
r"\[reason\]",
|
||||
]
|
||||
|
||||
|
||||
class SpecValidator:
    """Validates a spec document for completeness and quality."""

    def __init__(self, content: str, file_path: str = ""):
        self.content = content
        self.file_path = file_path
        self.lines = content.split("\n")
        self.findings: List[Dict[str, Any]] = []
        self.section_scores: Dict[str, Dict[str, Any]] = {}

    def validate(self) -> Dict[str, Any]:
        """Run all validation checks and return results."""
        self._check_sections_present()
        self._check_functional_requirements()
        self._check_acceptance_criteria()
        self._check_edge_cases()
        self._check_rfc_keywords()
        self._check_api_contracts()
        self._check_data_models()
        self._check_out_of_scope()
        self._check_placeholders()
        self._check_traceability()

        total_score = self._calculate_score()

        return {
            "file": self.file_path,
            "score": total_score,
            "grade": self._score_to_grade(total_score),
            "sections": self.section_scores,
            "findings": self.findings,
            "summary": self._build_summary(total_score),
        }

    def _add_finding(self, severity: str, section: str, message: str):
        """Record a validation finding."""
        self.findings.append({
            "severity": severity,  # "error", "warning", "info"
            "section": section,
            "message": message,
        })

    def _find_section_content(self, header_pattern: str) -> str:
        """Extract content between a section header and the next ## header."""
        in_section = False
        section_lines = []
        for line in self.lines:
            if re.match(header_pattern, line, re.IGNORECASE):
                in_section = True
                continue
            # "### " subheadings do not match, so they stay inside the section.
            if in_section and re.match(r"^##\s+", line):
                break
            if in_section:
                section_lines.append(line)
        return "\n".join(section_lines)
|
||||
|
||||
    def _check_sections_present(self):
        """Check that all required sections exist."""
        for key, name, patterns, weight in SECTIONS:
            # A section counts as present when any of its patterns matches any line.
            found = any(
                re.search(pattern, line, re.IGNORECASE)
                for pattern in patterns
                for line in self.lines
            )

            if found:
                self.section_scores[key] = {"name": name, "present": True, "score": weight, "max": weight}
            else:
                self.section_scores[key] = {"name": name, "present": False, "score": 0, "max": weight}
                self._add_finding("error", key, f"Missing section: {name}")
|
||||
|
||||
    def _check_functional_requirements(self):
        """Validate functional requirements format and content."""
        content = self._find_section_content(r"^##\s+Functional\s+Requirements")
        if not content.strip():
            return

        fr_pattern = re.compile(r"-\s+FR-(\d+):")
        matches = fr_pattern.findall(content)

        if not matches:
            self._add_finding("error", "functional_requirements", "No numbered requirements found (expected FR-N: format)")
            if "functional_requirements" in self.section_scores:
                self.section_scores["functional_requirements"]["score"] = max(
                    0, self.section_scores["functional_requirements"]["score"] - 10
                )
            return

        fr_count = len(matches)
        if fr_count < 3:
            self._add_finding("warning", "functional_requirements", f"Only {fr_count} requirements found. Most features need 3+.")

        # Check for RFC keywords
        has_keyword = any(kw in content for kw in RFC_KEYWORDS)
        if not has_keyword:
            self._add_finding("warning", "functional_requirements", "No RFC 2119 keywords (MUST/SHOULD/MAY) found.")
|
||||
|
||||
    def _check_acceptance_criteria(self):
        """Validate acceptance criteria use Given/When/Then format."""
        content = self._find_section_content(r"^##\s+Acceptance\s+Criteria")
        if not content.strip():
            return

        ac_pattern = re.compile(r"###\s+AC-(\d+):")
        matches = ac_pattern.findall(content)

        if not matches:
            self._add_finding("error", "acceptance_criteria", "No numbered acceptance criteria found (expected ### AC-N: format)")
            if "acceptance_criteria" in self.section_scores:
                self.section_scores["acceptance_criteria"]["score"] = max(
                    0, self.section_scores["acceptance_criteria"]["score"] - 15
                )
            return

        ac_count = len(matches)

        # Check Given/When/Then
        given_count = len(re.findall(r"(?i)\bgiven\b", content))
        when_count = len(re.findall(r"(?i)\bwhen\b", content))
        then_count = len(re.findall(r"(?i)\bthen\b", content))

        if given_count < ac_count:
            self._add_finding("warning", "acceptance_criteria",
                              f"Found {ac_count} criteria but only {given_count} 'Given' clauses. Each AC needs Given/When/Then.")
        if when_count < ac_count:
            self._add_finding("warning", "acceptance_criteria",
                              f"Found {ac_count} criteria but only {when_count} 'When' clauses.")
        if then_count < ac_count:
            self._add_finding("warning", "acceptance_criteria",
                              f"Found {ac_count} criteria but only {then_count} 'Then' clauses.")

        # Check for FR references
        fr_refs = re.findall(r"\(FR-\d+", content)
        if not fr_refs:
            self._add_finding("warning", "acceptance_criteria",
                              "No acceptance criteria reference functional requirements (expected (FR-N) in title).")
|
||||
|
||||
    def _check_edge_cases(self):
        """Validate edge cases section."""
        content = self._find_section_content(r"^##\s+Edge\s+Cases")
        if not content.strip():
            return

        ec_pattern = re.compile(r"-\s+EC-(\d+):")
        matches = ec_pattern.findall(content)

        if not matches:
            self._add_finding("warning", "edge_cases", "No numbered edge cases found (expected EC-N: format)")
        elif len(matches) < 3:
            self._add_finding("warning", "edge_cases", f"Only {len(matches)} edge cases. Consider failure modes for each external dependency.")

    def _check_rfc_keywords(self):
        """Check RFC 2119 keywords are used consistently (capitalized)."""
        # Look for lowercase must/should/may that might be intended as RFC keywords
        context_content = self._find_section_content(r"^##\s+Functional\s+Requirements")
        context_content += self._find_section_content(r"^##\s+Non-Functional\s+Requirements")

        for kw in ["must", "should", "may"]:
            # Find lowercase usage in requirement-like sentences
            pattern = rf"(?:system|service|API|endpoint)\s+{kw}\s+"
            if re.search(pattern, context_content):
                self._add_finding("warning", "rfc_keywords",
                                  f"Found lowercase '{kw}' in requirements. RFC 2119 keywords should be UPPERCASE: {kw.upper()}")

    def _check_api_contracts(self):
        """Validate API contracts section."""
        content = self._find_section_content(r"^##\s+API\s+Contracts")
        if not content.strip():
            return

        # Check for at least one endpoint definition
        has_endpoint = bool(re.search(r"(GET|POST|PUT|PATCH|DELETE)\s+/", content))
        if not has_endpoint:
            self._add_finding("warning", "api_contracts", "No HTTP method + path found (expected e.g., POST /api/endpoint)")

        # Check for request/response definitions
        has_interface = bool(re.search(r"interface\s+\w+", content))
        if not has_interface:
            self._add_finding("info", "api_contracts", "No TypeScript interfaces found. Consider defining request/response shapes.")

    def _check_data_models(self):
        """Validate data models section."""
        content = self._find_section_content(r"^##\s+Data\s+Models")
        if not content.strip():
            return

        # Check for table format
        has_table = bool(re.search(r"\|.*\|.*\|", content))
        if not has_table:
            self._add_finding("warning", "data_models", "No table-formatted data models found. Use | Field | Type | Constraints | format.")

    def _check_out_of_scope(self):
        """Validate out of scope section."""
        content = self._find_section_content(r"^##\s+Out\s+of\s+Scope")
        if not content.strip():
            return

        os_pattern = re.compile(r"-\s+OS-(\d+):")
        matches = os_pattern.findall(content)

        if not matches:
            self._add_finding("warning", "out_of_scope", "No numbered exclusions found (expected OS-N: format)")
        elif len(matches) < 2:
            self._add_finding("info", "out_of_scope", "Only 1 exclusion listed. Consider what was deliberately left out.")

    def _check_placeholders(self):
        """Check for unfilled placeholder text."""
        placeholder_count = 0
        for pattern in PLACEHOLDER_PATTERNS:
            matches = re.findall(pattern, self.content, re.IGNORECASE)
            placeholder_count += len(matches)

        if placeholder_count > 0:
            self._add_finding("warning", "placeholders",
                              f"Found {placeholder_count} placeholder(s) that need to be filled in (e.g., [your name], [describe ...]).")
            # Deduct a flat penalty of up to 3 points from every section that is present
            for key in self.section_scores:
                if self.section_scores[key]["present"]:
                    deduction = min(3, self.section_scores[key]["score"])
                    self.section_scores[key]["score"] = max(0, self.section_scores[key]["score"] - deduction)

    def _check_traceability(self):
        """Check that acceptance criteria reference functional requirements."""
        ac_content = self._find_section_content(r"^##\s+Acceptance\s+Criteria")
        fr_content = self._find_section_content(r"^##\s+Functional\s+Requirements")

        if not ac_content.strip() or not fr_content.strip():
            return

        # Extract FR IDs
        fr_ids = set(re.findall(r"FR-(\d+)", fr_content))
        # Extract FR references from AC
        ac_fr_refs = set(re.findall(r"FR-(\d+)", ac_content))

        unreferenced = fr_ids - ac_fr_refs
        if unreferenced:
            # IDs are digit strings; sort numerically so FR-10 comes after FR-9, not after FR-1
            unreferenced_list = ", ".join(f"FR-{i}" for i in sorted(unreferenced, key=int))
            self._add_finding("warning", "traceability",
                              f"Functional requirements without acceptance criteria: {unreferenced_list}")

    def _calculate_score(self) -> int:
        """Calculate the total completeness score."""
        total = sum(s["score"] for s in self.section_scores.values())
        maximum = sum(s["max"] for s in self.section_scores.values())

        if maximum == 0:
            return 0

        # Apply finding-based deductions
        error_count = sum(1 for f in self.findings if f["severity"] == "error")
        warning_count = sum(1 for f in self.findings if f["severity"] == "warning")

        base_score = round((total / maximum) * 100)
        deduction = (error_count * 5) + (warning_count * 2)

        return max(0, min(100, base_score - deduction))

    @staticmethod
    def _score_to_grade(score: int) -> str:
        """Convert score to letter grade."""
        if score >= 90:
            return "A"
        if score >= 80:
            return "B"
        if score >= 70:
            return "C"
        if score >= 60:
            return "D"
        return "F"

    def _build_summary(self, score: int) -> str:
        """Build human-readable summary."""
        errors = [f for f in self.findings if f["severity"] == "error"]
        warnings = [f for f in self.findings if f["severity"] == "warning"]
        infos = [f for f in self.findings if f["severity"] == "info"]

        lines = [
            f"Spec Completeness Score: {score}/100 (Grade: {self._score_to_grade(score)})",
            f"Errors: {len(errors)}, Warnings: {len(warnings)}, Info: {len(infos)}",
            "",
        ]

        if errors:
            lines.append("ERRORS (must fix):")
            for e in errors:
                lines.append(f" [{e['section']}] {e['message']}")
            lines.append("")

        if warnings:
            lines.append("WARNINGS (should fix):")
            for w in warnings:
                lines.append(f" [{w['section']}] {w['message']}")
            lines.append("")

        if infos:
            lines.append("INFO:")
            for i in infos:
                lines.append(f" [{i['section']}] {i['message']}")
            lines.append("")

        # Section breakdown
        lines.append("Section Breakdown:")
        for key, data in self.section_scores.items():
            status = "PRESENT" if data["present"] else "MISSING"
            lines.append(f" {data['name']}: {data['score']}/{data['max']} ({status})")

        return "\n".join(lines)


def format_human(result: Dict[str, Any]) -> str:
    """Format validation result for human reading."""
    lines = [
        "=" * 60,
        "SPEC VALIDATION REPORT",
        "=" * 60,
        "",
    ]
    if result["file"]:
        lines.append(f"File: {result['file']}")
        lines.append("")

    lines.append(result["summary"])

    return "\n".join(lines)


def main():
    parser = argparse.ArgumentParser(
        description="Validate a feature specification for completeness and quality.",
        epilog="Example: python spec_validator.py --file spec.md --strict",
    )
    parser.add_argument(
        "--file",
        "-f",
        required=True,
        help="Path to the spec markdown file",
    )
    parser.add_argument(
        "--strict",
        action="store_true",
        help="Exit with code 2 if score is below 80",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        dest="json_flag",
        help="Output results as JSON",
    )

    args = parser.parse_args()

    file_path = Path(args.file)
    # is_file() also rejects directories, which exists() would let through to read_text()
    if not file_path.is_file():
        print(f"Error: File not found: {file_path}", file=sys.stderr)
        sys.exit(2)

    content = file_path.read_text(encoding="utf-8")

    if not content.strip():
        print(f"Error: File is empty: {file_path}", file=sys.stderr)
        sys.exit(2)

    validator = SpecValidator(content, str(file_path))
    result = validator.validate()

    if args.json_flag:
        print(json.dumps(result, indent=2))
    else:
        print(format_human(result))

    # Determine exit code
    score = result["score"]
    has_errors = any(f["severity"] == "error" for f in result["findings"])
    has_warnings = any(f["severity"] == "warning" for f in result["findings"])

    if args.strict and score < 80:
        sys.exit(2)
    elif has_errors:
        sys.exit(2)
    elif has_warnings:
        sys.exit(1)
    else:
        sys.exit(0)


if __name__ == "__main__":
    main()
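
The scoring in `_calculate_score` and `_score_to_grade` can be traced with a small standalone sketch. It assumes only what those methods show: the base score is the percentage of section points earned, each error costs 5 points, each warning costs 2, info findings cost nothing, and the result is clamped to 0-100. The function names `completeness_score` and `grade`, and the section maxima in the sample data, are hypothetical and chosen for illustration; they are not the validator's real section table.

```python
from typing import Dict, List


def completeness_score(section_scores: Dict[str, Dict[str, int]], severities: List[str]) -> int:
    """Percentage of section points earned, minus 5 per error and 2 per warning."""
    total = sum(s["score"] for s in section_scores.values())
    maximum = sum(s["max"] for s in section_scores.values())
    if maximum == 0:
        return 0
    base = round(total / maximum * 100)
    deduction = 5 * severities.count("error") + 2 * severities.count("warning")
    return max(0, min(100, base - deduction))


def grade(score: int) -> str:
    """Map a 0-100 score to a letter grade using the same thresholds as the validator."""
    for threshold, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= threshold:
            return letter
    return "F"


# Hypothetical section table: 30 of 40 points earned gives a base score of 75.
sections = {
    "functional_requirements": {"score": 15, "max": 20},
    "acceptance_criteria": {"score": 15, "max": 20},
}
# One error (-5) and two warnings (-4); the info finding is ignored.
score = completeness_score(sections, ["error", "warning", "warning", "info"])
print(score, grade(score))
```

Note that findings are penalized twice in the validator: some checks lower a section's own score, and every error or warning also reduces the final total, so a spec with many warnings can fall below the `--strict` threshold of 80 even when every section is present.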
431
engineering/spec-driven-workflow/test_extractor.py
Normal file
@@ -0,0 +1,431 @@
#!/usr/bin/env python3
"""
Test Extractor - Extracts test case stubs from a feature specification.

Parses acceptance criteria (Given/When/Then) and edge cases from a spec
document, then generates test stubs for the specified framework.

Supported frameworks: pytest, jest, go-test

Exit codes: 0 = success, 1 = warnings (some criteria unparseable), 2 = critical error

No external dependencies - uses only Python standard library.
"""

import argparse
import json
import re
import sys
import textwrap
from pathlib import Path
from typing import Dict, List, Any, Optional, Tuple


class SpecParser:
    """Parses spec documents to extract testable criteria."""

    def __init__(self, content: str):
        self.content = content
        self.lines = content.split("\n")

    def extract_acceptance_criteria(self) -> List[Dict[str, Any]]:
        """Extract AC-N blocks with Given/When/Then clauses."""
        criteria = []
        ac_pattern = re.compile(r"###\s+AC-(\d+):\s*(.+?)(?:\s*\(([^)]+)\))?\s*$")

        in_ac = False
        current_ac: Optional[Dict[str, Any]] = None
        body_lines: List[str] = []

        for line in self.lines:
            match = ac_pattern.match(line)
            if match:
                # Save previous AC
                if current_ac is not None:
                    current_ac["body"] = "\n".join(body_lines).strip()
                    self._parse_gwt(current_ac)
                    criteria.append(current_ac)

                ac_id = int(match.group(1))
                name = match.group(2).strip()
                refs = match.group(3).strip() if match.group(3) else ""

                current_ac = {
                    "id": f"AC-{ac_id}",
                    "name": name,
                    "references": [r.strip() for r in refs.split(",") if r.strip()] if refs else [],
                    "given": "",
                    "when": "",
                    "then": [],
                    "body": "",
                }
                body_lines = []
                in_ac = True
            elif in_ac:
                # Check if we hit another ## section
                if re.match(r"^##\s+", line) and not re.match(r"^###\s+", line):
                    in_ac = False
                    if current_ac is not None:
                        current_ac["body"] = "\n".join(body_lines).strip()
                        self._parse_gwt(current_ac)
                        criteria.append(current_ac)
                    current_ac = None
                else:
                    body_lines.append(line)

        # Don't forget the last one
        if current_ac is not None:
            current_ac["body"] = "\n".join(body_lines).strip()
            self._parse_gwt(current_ac)
            criteria.append(current_ac)

        return criteria

    def extract_edge_cases(self) -> List[Dict[str, Any]]:
        """Extract EC-N edge case items."""
        edge_cases = []
        # Condition and expected behavior are separated by "->" or "→"
        ec_pattern = re.compile(r"-\s+EC-(\d+):\s*(.+?)\s*(?:->|→)\s*(.+)")

        in_section = False
        for line in self.lines:
            if re.match(r"^##\s+Edge\s+Cases", line, re.IGNORECASE):
                in_section = True
                continue
            if in_section and re.match(r"^##\s+", line):
                break
            if in_section:
                match = ec_pattern.match(line.strip())
                if match:
                    edge_cases.append({
                        "id": f"EC-{match.group(1)}",
                        "condition": match.group(2).strip().rstrip("."),
                        "behavior": match.group(3).strip().rstrip("."),
                    })

        return edge_cases

    def extract_spec_title(self) -> str:
        """Extract the spec title from the first H1."""
        for line in self.lines:
            match = re.match(r"^#\s+(?:Spec:\s*)?(.+)", line)
            if match:
                return match.group(1).strip()
        return "UnknownFeature"

    @staticmethod
    def _parse_gwt(ac: Dict[str, Any]):
        """Parse Given/When/Then from the AC body text."""
        body = ac["body"]
        lines = body.split("\n")

        current_section = None
        for line in lines:
            stripped = line.strip()
            if not stripped:
                continue

            lower = stripped.lower()
            if lower.startswith("given "):
                current_section = "given"
                ac["given"] = stripped[6:].strip()
            elif lower.startswith("when "):
                current_section = "when"
                ac["when"] = stripped[5:].strip()
            elif lower.startswith("then "):
                current_section = "then"
                ac["then"].append(stripped[5:].strip())
            elif lower.startswith("and "):
                if current_section == "then":
                    ac["then"].append(stripped[4:].strip())
                elif current_section == "given":
                    ac["given"] += " AND " + stripped[4:].strip()
                elif current_section == "when":
                    ac["when"] += " AND " + stripped[4:].strip()


def _sanitize_name(name: str) -> str:
    """Convert a human-readable name to a valid function/method name."""
    # Remove parenthetical references like (FR-1)
    name = re.sub(r"\([^)]*\)", "", name)
    # Replace non-alphanumeric with underscore
    name = re.sub(r"[^a-zA-Z0-9]+", "_", name)
    # Remove leading/trailing underscores
    name = name.strip("_").lower()
    return name or "unnamed"


def _to_pascal_case(name: str) -> str:
    """Convert to PascalCase for Go test names."""
    parts = _sanitize_name(name).split("_")
    return "".join(p.capitalize() for p in parts if p)


class PytestGenerator:
    """Generates pytest test stubs."""

    def generate(self, title: str, criteria: List[Dict], edge_cases: List[Dict]) -> str:
        class_name = "Test" + _to_pascal_case(title)
        lines = [
            '"""',
            f"Test suite for: {title}",
            f"Auto-generated from spec. {len(criteria)} acceptance criteria, {len(edge_cases)} edge cases.",
            "",
            "All tests are stubs; implement the test body to make them pass.",
            '"""',
            "",
            "import pytest",
            "",
            "",
            f"class {class_name}:",
            f'    """Tests for {title}."""',
            "",
        ]

        # Emitted methods are indented 4 spaces and their bodies 8, so the output is valid Python
        for ac in criteria:
            method_name = f"test_{ac['id'].lower().replace('-', '')}_{_sanitize_name(ac['name'])}"
            docstring = f'{ac["id"]}: {ac["name"]}'
            ref_str = f" [{', '.join(ac['references'])}]" if ac["references"] else ""

            lines.append(f"    def {method_name}(self):")
            lines.append(f'        """{docstring}{ref_str}"""')

            if ac["given"]:
                lines.append(f"        # Given {ac['given']}")
            if ac["when"]:
                lines.append(f"        # When {ac['when']}")
            for t in ac["then"]:
                lines.append(f"        # Then {t}")

            lines.append('        raise NotImplementedError("Implement this test")')
            lines.append("")

        if edge_cases:
            lines.append("    # --- Edge Cases ---")
            lines.append("")

        for ec in edge_cases:
            method_name = f"test_{ec['id'].lower().replace('-', '')}_{_sanitize_name(ec['condition'])}"
            lines.append(f"    def {method_name}(self):")
            lines.append(f'        """{ec["id"]}: {ec["condition"]} -> {ec["behavior"]}"""')
            lines.append(f"        # Condition: {ec['condition']}")
            lines.append(f"        # Expected: {ec['behavior']}")
            lines.append('        raise NotImplementedError("Implement this test")')
            lines.append("")

        return "\n".join(lines)


class JestGenerator:
    """Generates Jest/Vitest test stubs (TypeScript)."""

    def generate(self, title: str, criteria: List[Dict], edge_cases: List[Dict]) -> str:
        lines = [
            "/**",
            f" * Test suite for: {title}",
            f" * Auto-generated from spec. {len(criteria)} acceptance criteria, {len(edge_cases)} edge cases.",
            " *",
            " * All tests are stubs; implement the test body to make them pass.",
            " */",
            "",
            # json.dumps emits a quoted, escaped string literal, so quotes in the title cannot break the output
            f"describe({json.dumps(title, ensure_ascii=False)}, () => {{",
        ]

        for ac in criteria:
            ref_str = f" [{', '.join(ac['references'])}]" if ac["references"] else ""
            test_name = f"{ac['id']}: {ac['name']}{ref_str}"

            lines.append(f"  it({json.dumps(test_name, ensure_ascii=False)}, () => {{")
            if ac["given"]:
                lines.append(f"    // Given {ac['given']}")
            if ac["when"]:
                lines.append(f"    // When {ac['when']}")
            for t in ac["then"]:
                lines.append(f"    // Then {t}")
            lines.append("")
            lines.append('    throw new Error("Not implemented");')
            lines.append("  });")
            lines.append("")

        if edge_cases:
            lines.append("  // --- Edge Cases ---")
            lines.append("")

        for ec in edge_cases:
            test_name = f"{ec['id']}: {ec['condition']}"
            lines.append(f"  it({json.dumps(test_name, ensure_ascii=False)}, () => {{")
            lines.append(f"    // Condition: {ec['condition']}")
            lines.append(f"    // Expected: {ec['behavior']}")
            lines.append("")
            lines.append('    throw new Error("Not implemented");')
            lines.append("  });")
            lines.append("")

        lines.append("});")
        lines.append("")

        return "\n".join(lines)


class GoTestGenerator:
    """Generates Go test stubs."""

    def generate(self, title: str, criteria: List[Dict], edge_cases: List[Dict]) -> str:
        package_name = _sanitize_name(title).split("_")[0] or "feature"

        lines = [
            f"package {package_name}_test",
            "",
            "import (",
            '\t"testing"',
            ")",
            "",
            f"// Test suite for: {title}",
            f"// Auto-generated from spec. {len(criteria)} acceptance criteria, {len(edge_cases)} edge cases.",
            "// All tests are stubs; implement the test body to make them pass.",
            "",
        ]

        for ac in criteria:
            func_name = "Test" + _to_pascal_case(ac["id"] + " " + ac["name"])
            ref_str = f" [{', '.join(ac['references'])}]" if ac["references"] else ""

            lines.append(f"// {ac['id']}: {ac['name']}{ref_str}")
            lines.append(f"func {func_name}(t *testing.T) {{")

            if ac["given"]:
                lines.append(f"\t// Given {ac['given']}")
            if ac["when"]:
                lines.append(f"\t// When {ac['when']}")
            for then_clause in ac["then"]:
                lines.append(f"\t// Then {then_clause}")

            lines.append("")
            lines.append('\tt.Fatal("Not implemented")')
            lines.append("}")
            lines.append("")

        if edge_cases:
            lines.append("// --- Edge Cases ---")
            lines.append("")

        for ec in edge_cases:
            func_name = "Test" + _to_pascal_case(ec["id"] + " " + ec["condition"])
            lines.append(f"// {ec['id']}: {ec['condition']} -> {ec['behavior']}")
            lines.append(f"func {func_name}(t *testing.T) {{")
            lines.append(f"\t// Condition: {ec['condition']}")
            lines.append(f"\t// Expected: {ec['behavior']}")
            lines.append("")
            lines.append('\tt.Fatal("Not implemented")')
            lines.append("}")
            lines.append("")

        return "\n".join(lines)


GENERATORS = {
    "pytest": PytestGenerator,
    "jest": JestGenerator,
    "go-test": GoTestGenerator,
}

FILE_EXTENSIONS = {
    "pytest": ".py",
    "jest": ".test.ts",
    "go-test": "_test.go",
}


def main():
    parser = argparse.ArgumentParser(
        description="Extract test case stubs from a feature specification.",
        epilog="Example: python test_extractor.py --file spec.md --framework pytest --output tests/test_feature.py",
    )
    parser.add_argument(
        "--file",
        "-f",
        required=True,
        help="Path to the spec markdown file",
    )
    parser.add_argument(
        "--framework",
        choices=list(GENERATORS.keys()),
        default="pytest",
        help="Target test framework (default: pytest)",
    )
    parser.add_argument(
        "--output",
        "-o",
        default=None,
        help="Output file path (default: stdout)",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        dest="json_flag",
        help="Output extracted criteria as JSON instead of test code",
    )

    args = parser.parse_args()

    file_path = Path(args.file)
    # is_file() also rejects directories, which exists() would let through to read_text()
    if not file_path.is_file():
        print(f"Error: File not found: {file_path}", file=sys.stderr)
        sys.exit(2)

    content = file_path.read_text(encoding="utf-8")
    if not content.strip():
        print(f"Error: File is empty: {file_path}", file=sys.stderr)
        sys.exit(2)

    spec_parser = SpecParser(content)
    title = spec_parser.extract_spec_title()
    criteria = spec_parser.extract_acceptance_criteria()
    edge_cases = spec_parser.extract_edge_cases()

    if not criteria and not edge_cases:
        print("Error: No acceptance criteria or edge cases found in spec.", file=sys.stderr)
        sys.exit(2)

    warnings = []
    for ac in criteria:
        if not ac["given"] and not ac["when"]:
            warnings.append(f"{ac['id']}: Could not parse Given/When/Then; check format.")

    if args.json_flag:
        result = {
            "spec_title": title,
            "framework": args.framework,
            "acceptance_criteria": criteria,
            "edge_cases": edge_cases,
            "warnings": warnings,
            "counts": {
                "acceptance_criteria": len(criteria),
                "edge_cases": len(edge_cases),
                "total_test_cases": len(criteria) + len(edge_cases),
            },
        }
        output = json.dumps(result, indent=2)
    else:
        generator_class = GENERATORS[args.framework]
        generator = generator_class()
        output = generator.generate(title, criteria, edge_cases)

    if args.output:
        out_path = Path(args.output)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text(output, encoding="utf-8")
        total = len(criteria) + len(edge_cases)
        print(f"Generated {total} test stubs -> {out_path}", file=sys.stderr)
    else:
        print(output)

    if warnings:
        for w in warnings:
            print(f"Warning: {w}", file=sys.stderr)
        sys.exit(1)

    sys.exit(0)


if __name__ == "__main__":
    main()
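
The Given/When/Then handling in `SpecParser._parse_gwt` is easiest to follow on a concrete block. The sketch below is a standalone approximation of the same rules, not the extractor's own code: the first word of each non-empty line selects the clause, and an `And` line extends whichever clause came last (joined with " AND " for Given and When, appended as a separate item for Then). The function name `parse_gwt` and the sample acceptance criterion are hypothetical.

```python
from typing import Any, Dict


def parse_gwt(body: str) -> Dict[str, Any]:
    """Split an acceptance-criterion body into given / when / then parts."""
    parsed: Dict[str, Any] = {"given": "", "when": "", "then": []}
    current = None  # clause that a following "And" line should extend
    for raw in body.splitlines():
        line = raw.strip()
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()
        rest = rest.strip()
        if keyword in ("given", "when"):
            current = keyword
            parsed[keyword] = rest
        elif keyword == "then":
            current = "then"
            parsed["then"].append(rest)
        elif keyword == "and" and current == "then":
            parsed["then"].append(rest)
        elif keyword == "and" and current in ("given", "when"):
            parsed[current] += " AND " + rest
    return parsed


# Hypothetical acceptance-criterion body, as it would appear under a "### AC-1: ..." heading.
sample = """
Given a registered user
And the account is active
When the user submits valid credentials
Then a session token is returned
And the login is recorded
"""

result = parse_gwt(sample)
print(result)
```

Lines that start with anything else, such as a bullet marker or bold markup around the keyword, are skipped by these rules, which is the situation the "Could not parse Given/When/Then" warning in `main` is meant to flag.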