Files
claude-skills-reference/docs/skills/engineering/spec-driven-workflow.md
Reza Rezvani 97952ccbee feat(engineering): add browser-automation and spec-driven-workflow skills
browser-automation (564-line SKILL.md, 3 scripts, 3 references):
- Web scraping, form filling, screenshot capture, data extraction
- Anti-detection patterns, cookie/session management, dynamic content
- scraping_toolkit.py, form_automation_builder.py, anti_detection_checker.py
- NOT testing (that's playwright-pro) — this is automation & scraping

spec-driven-workflow (586-line SKILL.md, 3 scripts, 3 references):
- Spec-first development: write spec BEFORE code
- Bounded autonomy rules, 6-phase workflow, self-review checklist
- spec_generator.py, spec_validator.py, test_extractor.py
- Pairs with tdd-guide for red-green-refactor after spec

Updated engineering plugin.json (31 → 33 skills).
Added both to mkdocs.yml nav and generated docs pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 12:57:18 +01:00

23 KiB

title, description
title description
Spec-Driven Workflow — Agent Skill for Codex & OpenClaw Use when the user asks to write specs before code, define acceptance criteria, plan features before implementation, generate tests from. Agent skill for Claude Code, Codex CLI, Gemini CLI, OpenClaw.

Spec-Driven Workflow

:material-rocket-launch: Engineering - POWERFUL :material-identifier: `spec-driven-workflow` :material-github: Source
Install: claude /plugin install engineering-advanced-skills

Overview

Spec-driven workflow enforces a single, non-negotiable rule: write the specification BEFORE you write any code. Not alongside. Not after. Before.

This is not documentation. This is a contract. A spec defines what the system MUST do, what it SHOULD do, and what it explicitly WILL NOT do. Every line of code you write traces back to a requirement in the spec. Every test traces back to an acceptance criterion. If it is not in the spec, it does not get built.

Why Spec-First Matters

  1. Eliminates rework. 60-80% of defects originate from requirements, not implementation. Catching ambiguity in a spec costs minutes; catching it in production costs days.
  2. Forces clarity. If you cannot write what the system should do in plain language, you do not understand the problem well enough to write code.
  3. Enables parallelism. Once a spec is approved, frontend, backend, QA, and documentation can all start simultaneously.
  4. Creates accountability. The spec is the definition of done. No arguments about whether a feature is "complete" — either it satisfies the acceptance criteria or it does not.
  5. Feeds TDD directly. Acceptance criteria in Given/When/Then format translate 1:1 into test cases. The spec IS the test plan.

The Iron Law

NO CODE WITHOUT AN APPROVED SPEC.
NO EXCEPTIONS. NO "QUICK PROTOTYPES." NO "I'LL DOCUMENT IT LATER."

If the spec is not written, reviewed, and approved, implementation does not begin. Period.


The Spec Format

Every spec follows this structure. No sections are optional — if a section does not apply, write "N/A — [reason]" so reviewers know it was considered, not forgotten.

1. Title and Context

# Spec: [Feature Name]

**Author:** [name]
**Date:** [ISO 8601]
**Status:** Draft | In Review | Approved | Superseded
**Reviewers:** [list]
**Related specs:** [links]

## Context

[Why does this feature exist? What problem does it solve? What is the business
motivation? Include links to user research, support tickets, or metrics that
justify this work. 2-4 paragraphs maximum.]

2. Functional Requirements (RFC 2119)

Use RFC 2119 keywords precisely:

Keyword Meaning
MUST Absolute requirement. Failing this means the implementation is non-conformant.
MUST NOT Absolute prohibition. Doing this means the implementation is broken.
SHOULD Recommended. May be omitted with documented justification.
SHOULD NOT Discouraged. May be included with documented justification.
MAY Optional. Purely at the implementer's discretion.
## Functional Requirements

- FR-1: The system MUST authenticate users via OAuth 2.0 PKCE flow.
- FR-2: The system MUST reject tokens older than 24 hours.
- FR-3: The system SHOULD support refresh token rotation.
- FR-4: The system MAY cache user profiles for up to 5 minutes.
- FR-5: The system MUST NOT store plaintext passwords under any circumstance.

Number every requirement. Use FR- prefix. Each requirement is a single, testable statement.

3. Non-Functional Requirements

## Non-Functional Requirements

### Performance
- NFR-P1: Login flow MUST complete in < 500ms (p95) under normal load.
- NFR-P2: Token validation MUST complete in < 50ms (p99).

### Security
- NFR-S1: All tokens MUST be transmitted over TLS 1.2+.
- NFR-S2: The system MUST rate-limit login attempts to 5/minute per IP.

### Accessibility
- NFR-A1: Login form MUST meet WCAG 2.1 AA standards.
- NFR-A2: Error messages MUST be announced to screen readers.

### Scalability
- NFR-SC1: The system SHOULD handle 10,000 concurrent sessions.

### Reliability
- NFR-R1: The authentication service MUST maintain 99.9% uptime.

4. Acceptance Criteria (Given/When/Then)

Every functional requirement maps to one or more acceptance criteria. Use Gherkin syntax:

## Acceptance Criteria

### AC-1: Successful login (FR-1)
Given a user with valid credentials
When they submit the login form with correct email and password
Then they receive a valid access token
And they are redirected to the dashboard
And the login event is logged with timestamp and IP

### AC-2: Expired token rejection (FR-2)
Given a user with an access token issued 25 hours ago
When they make an API request with that token
Then they receive a 401 Unauthorized response
And the response body contains error code "TOKEN_EXPIRED"
And they are NOT redirected (API clients handle their own flow)

### AC-3: Rate limiting (NFR-S2)
Given an IP address that has made 5 failed login attempts in the last minute
When a 6th login attempt arrives from that IP
Then the request is rejected with 429 Too Many Requests
And the response includes a Retry-After header

5. Edge Cases and Error Scenarios

## Edge Cases

- EC-1: User submits login form with empty email → Show validation error, do not hit API.
- EC-2: OAuth provider is down → Show "Service temporarily unavailable", retry after 30s.
- EC-3: User has account but no password (social-only) → Redirect to social login.
- EC-4: Concurrent login from two devices → Both sessions are valid (no single-session enforcement).
- EC-5: Token expires mid-request → Complete the current request, return warning header.

6. API Contracts

Define request/response shapes using TypeScript-style notation:

## API Contracts

### POST /api/auth/login
Request:
```typescript
interface LoginRequest {
  email: string;       // MUST be valid email format
  password: string;    // MUST be 8-128 characters
  rememberMe?: boolean; // Default: false
}

Success Response (200):

interface LoginResponse {
  accessToken: string;   // JWT, expires in 24h
  refreshToken: string;  // Opaque, expires in 30d
  expiresIn: number;     // Seconds until access token expires
  user: {
    id: string;
    email: string;
    displayName: string;
  };
}

Error Response (401):

interface AuthError {
  error: "INVALID_CREDENTIALS" | "TOKEN_EXPIRED" | "ACCOUNT_LOCKED";
  message: string;
  retryAfter?: number; // Seconds, present for rate-limited responses
}

### 7. Data Models

```markdown
## Data Models

### User
| Field | Type | Constraints |
|-------|------|-------------|
| id | UUID | Primary key, auto-generated |
| email | string | Unique, max 255 chars, valid email format |
| passwordHash | string | bcrypt, never exposed via API |
| createdAt | timestamp | UTC, immutable |
| lastLoginAt | timestamp | UTC, updated on each login |
| loginAttempts | integer | Reset to 0 on successful login |
| lockedUntil | timestamp | Null if not locked |

8. Out of Scope

Explicit exclusions prevent scope creep:

## Out of Scope

- OS-1: Multi-factor authentication (separate spec: SPEC-042)
- OS-2: Social login providers beyond Google and GitHub
- OS-3: Admin impersonation of user accounts
- OS-4: Password complexity rules beyond minimum length (deferred to v2)
- OS-5: Session management UI (users cannot see/revoke active sessions yet)

If someone asks for an out-of-scope item during implementation, point them to this section. Do not build it.


Bounded Autonomy Rules

These rules define when an agent (human or AI) MUST stop and ask for guidance vs. when they can proceed independently.

STOP and Ask When:

  1. Scope creep detected. The implementation requires something not in the spec. Even if it seems obviously needed, STOP. The spec might have excluded it deliberately.

  2. Ambiguity exceeds 30%. If you cannot determine the correct behavior from the spec for more than 30% of a given requirement, the spec is incomplete. Do not guess.

  3. Breaking changes required. The implementation would change an existing API contract, database schema, or public interface. Always escalate.

  4. Security implications. Any change that touches authentication, authorization, encryption, or PII handling requires explicit approval.

  5. Performance characteristics unknown. If a requirement says "MUST complete in < 500ms" but you have no way to measure or guarantee that, escalate before implementing a guess.

  6. Cross-team dependencies. If the spec requires coordination with another team or service, confirm the dependency before building against it.

Continue Autonomously When:

  1. Spec is clear and unambiguous for the current task.
  2. All acceptance criteria have passing tests and you are refactoring internals.
  3. Changes are non-breaking — no public API, schema, or behavior changes.
  4. Implementation is a direct translation of a well-defined acceptance criterion.
  5. Error handling follows established patterns already documented in the codebase.

Escalation Protocol

When you must stop, provide:

## Escalation: [Brief Title]

**Blocked on:** [requirement ID, e.g., FR-3]
**Question:** [Specific, answerable question — not "what should I do?"]
**Options considered:**
  A. [Option] — Pros: [...] Cons: [...]
  B. [Option] — Pros: [...] Cons: [...]
**My recommendation:** [A or B, with reasoning]
**Impact of waiting:** [What is blocked until this is resolved?]

Never escalate without a recommendation. Never present an open-ended question. Always give options.

See references/bounded_autonomy_rules.md for the complete decision matrix.


Workflow — 6 Phases

Phase 1: Gather Requirements

Goal: Understand what needs to be built and why.

  1. Interview the user. Ask:
    • What problem does this solve?
    • Who are the users?
    • What does success look like?
    • What explicitly should NOT be built?
  2. Read existing code. Understand the current system before proposing changes.
  3. Identify constraints. Performance budgets, security requirements, backward compatibility.
  4. List unknowns. Every unknown is a risk. Surface them now, not during implementation.

Exit criteria: You can explain the feature to someone unfamiliar with the project in 2 minutes.

Phase 2: Write Spec

Goal: Produce a complete spec document following The Spec Format above.

  1. Fill every section of the template. No section left blank.
  2. Number all requirements (FR-, NFR-, AC-, EC-, OS-*).
  3. Use RFC 2119 keywords precisely.
  4. Write acceptance criteria in Given/When/Then format.
  5. Define API contracts with TypeScript-style types.
  6. List explicit exclusions in Out of Scope.

Exit criteria: The spec can be handed to a developer who was not in the requirements meeting, and they can implement the feature without asking clarifying questions.

Phase 3: Validate Spec

Goal: Verify the spec is complete, consistent, and implementable.

Run spec_validator.py against the spec file:

python spec_validator.py --file spec.md --strict

Manual validation checklist:

  • Every functional requirement has at least one acceptance criterion
  • Every acceptance criterion is testable (no subjective language)
  • API contracts cover all endpoints mentioned in requirements
  • Data models cover all entities mentioned in requirements
  • Edge cases cover failure modes for every external dependency
  • Out of scope is explicit about what was considered and rejected
  • Non-functional requirements have measurable thresholds

Exit criteria: Spec scores 80+ on validator, and all manual checklist items pass.

Phase 4: Generate Tests

Goal: Extract test cases from acceptance criteria before writing implementation code.

Run test_extractor.py against the approved spec:

python test_extractor.py --file spec.md --framework pytest --output tests/
  1. Each acceptance criterion becomes one or more test cases.
  2. Each edge case becomes a test case.
  3. Tests are stubs — they define the assertion but not the implementation.
  4. All tests MUST fail initially (red phase of TDD).

Exit criteria: You have a test file where every test fails with "not implemented" or equivalent.

Phase 5: Implement

Goal: Write code that makes failing tests pass, one acceptance criterion at a time.

  1. Pick one acceptance criterion (start with the simplest).
  2. Make its test(s) pass with minimal code.
  3. Run the full test suite — no regressions.
  4. Commit.
  5. Pick the next acceptance criterion. Repeat.

Rules:

  • Do NOT implement anything not in the spec.
  • Do NOT optimize before all acceptance criteria pass.
  • Do NOT refactor before all acceptance criteria pass.
  • If you discover a missing requirement, STOP and update the spec first.

Exit criteria: All tests pass. All acceptance criteria satisfied.

Phase 6: Self-Review

Goal: Verify implementation matches spec before marking done.

Run through the Self-Review Checklist below. If any item fails, fix it before declaring the task complete.


Self-Review Checklist

Before marking any implementation as done, verify ALL of the following:

  • Every acceptance criterion has a passing test. No exceptions. If AC-3 exists, a test for AC-3 exists and passes.
  • Every edge case has a test. EC-1 through EC-N all have corresponding test cases.
  • No scope creep. The implementation does not include features not in the spec. If you added something, either update the spec or remove it.
  • API contracts match implementation. Request/response shapes in code match the spec exactly. Field names, types, status codes — all of it.
  • Error scenarios tested. Every error response defined in the spec has a test that triggers it.
  • Non-functional requirements verified. If the spec says < 500ms, you have evidence (benchmark, load test, profiling) that it meets the threshold.
  • Data model matches. Database schema matches the spec. No extra columns, no missing constraints.
  • Out-of-scope items not built. Double-check that nothing from the Out of Scope section leaked into the implementation.

Integration with TDD Guide

Spec-driven workflow and TDD are complementary, not competing:

Spec-Driven Workflow          TDD (Red-Green-Refactor)
─────────────────────         ──────────────────────────
Phase 1: Gather Requirements
Phase 2: Write Spec
Phase 3: Validate Spec
Phase 4: Generate Tests  ──→  RED: Tests exist and fail
Phase 5: Implement       ──→  GREEN: Minimal code to pass
Phase 6: Self-Review     ──→  REFACTOR: Clean up internals

The handoff: Spec-driven workflow produces the test stubs (Phase 4). TDD takes over from there. The spec tells you WHAT to test. TDD tells you HOW to implement.

Use engineering-team/tdd-guide for:

  • Red-green-refactor cycle discipline
  • Coverage analysis and gap detection
  • Framework-specific test patterns (Jest, Pytest, JUnit)

Use engineering/spec-driven-workflow for:

  • Defining what to build before building it
  • Acceptance criteria authoring
  • Completeness validation
  • Scope control

Examples

Full Spec: User Password Reset

# Spec: Password Reset Flow

**Author:** Engineering Team
**Date:** 2026-03-25
**Status:** Approved

## Context

Users who forget their passwords currently have no self-service recovery option.
Support receives ~200 password reset requests per week, costing approximately
8 hours of support time. This feature eliminates that burden entirely.

## Functional Requirements

- FR-1: The system MUST allow users to request a password reset via email.
- FR-2: The system MUST send a reset link that expires after 1 hour.
- FR-3: The system MUST invalidate all previous reset links when a new one is requested.
- FR-4: The system MUST enforce minimum password length of 8 characters on reset.
- FR-5: The system MUST NOT reveal whether an email exists in the system.
- FR-6: The system SHOULD log all reset attempts for audit purposes.

## Acceptance Criteria

### AC-1: Request reset (FR-1, FR-5)
Given a user on the password reset page
When they enter any email address and submit
Then they see "If an account exists, a reset link has been sent"
And the response is identical whether the email exists or not

### AC-2: Valid reset link (FR-2)
Given a user who received a reset email 30 minutes ago
When they click the reset link
Then they see the password reset form

### AC-3: Expired reset link (FR-2)
Given a user who received a reset email 2 hours ago
When they click the reset link
Then they see "This link has expired. Please request a new one."

### AC-4: Previous links invalidated (FR-3)
Given a user who requested two reset emails
When they click the link from the first email
Then they see "This link is no longer valid."

## Edge Cases

- EC-1: User submits reset for non-existent email → Same success message (FR-5).
- EC-2: User clicks reset link twice → Second click shows "already used" if password was changed.
- EC-3: Email delivery fails → Log error, do not retry automatically.
- EC-4: User requests reset while already logged in → Allow it, do not force logout.

## Out of Scope

- OS-1: Security questions as alternative reset method.
- OS-2: SMS-based password reset.
- OS-3: Admin-initiated password reset (separate spec).

Extracted Test Cases (from above spec)

# Generated by test_extractor.py --framework pytest

class TestPasswordReset:
    def test_ac1_request_reset_existing_email(self):
        """AC-1: Request reset with existing email shows generic message."""
        # Given a user on the password reset page
        # When they enter a registered email and submit
        # Then they see "If an account exists, a reset link has been sent"
        raise NotImplementedError("Implement this test")

    def test_ac1_request_reset_nonexistent_email(self):
        """AC-1: Request reset with unknown email shows same generic message."""
        # Given a user on the password reset page
        # When they enter an unregistered email and submit
        # Then they see identical response to existing email case
        raise NotImplementedError("Implement this test")

    def test_ac2_valid_reset_link(self):
        """AC-2: Reset link works within expiry window."""
        raise NotImplementedError("Implement this test")

    def test_ac3_expired_reset_link(self):
        """AC-3: Reset link rejected after 1 hour."""
        raise NotImplementedError("Implement this test")

    def test_ac4_previous_links_invalidated(self):
        """AC-4: Old reset links stop working when new one is requested."""
        raise NotImplementedError("Implement this test")

    def test_ec1_nonexistent_email_same_response(self):
        """EC-1: Non-existent email produces identical response."""
        raise NotImplementedError("Implement this test")

    def test_ec2_reset_link_used_twice(self):
        """EC-2: Already-used reset link shows appropriate message."""
        raise NotImplementedError("Implement this test")

Anti-Patterns

1. Coding Before Spec Approval

Symptom: "I'll start coding while the spec is being reviewed." Problem: The review will surface changes. Now you have code that implements a rejected design. Rule: Implementation does not begin until spec status is "Approved."

2. Vague Acceptance Criteria

Symptom: "The system should work well" or "The UI should be responsive." Problem: Untestable. What does "well" mean? What does "responsive" mean? Rule: Every acceptance criterion must be verifiable by a machine. If you cannot write a test for it, rewrite the criterion.

3. Missing Edge Cases

Symptom: Happy path is specified, error paths are not. Problem: Developers invent error handling on the fly, leading to inconsistent behavior. Rule: For every external dependency (API, database, file system, user input), specify at least one failure scenario.

4. Spec as Post-Hoc Documentation

Symptom: "Let me write the spec now that the feature is done." Problem: This is documentation, not specification. It describes what was built, not what should have been built. It cannot catch design errors because the design is already frozen. Rule: If the spec was written after the code, it is not a spec. Relabel it as documentation.

5. Gold-Plating Beyond Spec

Symptom: "While I was in there, I also added..." Problem: Untested code. Unreviewed design. Potential for subtle bugs in the "bonus" feature. Rule: If it is not in the spec, it does not get built. File a new spec for additional features.

6. Acceptance Criteria Without Requirement Traceability

Symptom: AC-7 exists but does not reference any FR-* or NFR-. Problem: Orphaned criteria mean either a requirement is missing or the criterion is unnecessary. Rule: Every AC- MUST reference at least one FR-* or NFR-*.

7. Skipping Validation

Symptom: "The spec looks fine, let's just start." Problem: Missing sections discovered during implementation cause blocking delays. Rule: Always run spec_validator.py --strict before starting implementation. Fix all warnings.


Cross-References

  • engineering-team/tdd-guide — Red-green-refactor cycle, test generation, coverage analysis. Use after Phase 4 of this workflow.
  • engineering/focused-fix — Deep-dive feature repair. When a spec-driven implementation has systemic issues, use focused-fix for diagnosis.
  • engineering/rag-architect — If the feature involves retrieval or knowledge systems, use rag-architect for the technical design within the spec.
  • references/spec_format_guide.md — Complete template with section-by-section explanations.
  • references/bounded_autonomy_rules.md — Full decision matrix for when to stop vs. continue.
  • references/acceptance_criteria_patterns.md — Pattern library for writing Given/When/Then criteria.

Tools

Script Purpose Key Flags
spec_generator.py Generate spec template from feature name/description --name, --description, --format, --json
spec_validator.py Validate spec completeness (0-100 score) --file, --strict, --json
test_extractor.py Extract test stubs from acceptance criteria --file, --framework, --output, --json
# Generate a spec template
python spec_generator.py --name "User Authentication" --description "OAuth 2.0 login flow"

# Validate a spec
python spec_validator.py --file specs/auth.md --strict

# Extract test cases
python test_extractor.py --file specs/auth.md --framework pytest --output tests/test_auth.py