feat(engineering): add browser-automation and spec-driven-workflow skills

browser-automation (564-line SKILL.md, 3 scripts, 3 references):
- Web scraping, form filling, screenshot capture, data extraction
- Anti-detection patterns, cookie/session management, dynamic content
- scraping_toolkit.py, form_automation_builder.py, anti_detection_checker.py
- NOT testing (that's playwright-pro); this is automation & scraping

spec-driven-workflow (586-line SKILL.md, 3 scripts, 3 references):
- Spec-first development: write spec BEFORE code
- Bounded autonomy rules, 6-phase workflow, self-review checklist
- spec_generator.py, spec_validator.py, test_extractor.py
- Pairs with tdd-guide for red-green-refactor after spec

Updated engineering plugin.json (31 → 33 skills).
Added both to mkdocs.yml nav and generated docs pages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 12:57:18 +01:00
parent 7a2189fa21
commit 97952ccbee
19 changed files with 7379 additions and 3 deletions
--- a/engineering/spec-driven-workflow/references/acceptance_criteria_patterns.md
+++ b/engineering/spec-driven-workflow/references/acceptance_criteria_patterns.md
@@ -0,0 +1,497 @@
+# Acceptance Criteria Patterns
+
+A pattern library for writing Given/When/Then acceptance criteria across common feature types. Use these as starting points — adapt to your domain.
+
+---
+
+## Pattern Structure
+
+Every acceptance criterion follows this structure:
+
+```
+### AC-N: [Descriptive name] (FR-N, NFR-N)
+Given [precondition — the system/user is in this state]
+When  [trigger — the user or system performs this action]
+Then  [outcome — this observable, testable result occurs]
+And   [additional outcome — and this also happens]
+```
+
+**Rules:**
+1. One scenario per AC. Multiple Given/When/Then blocks = multiple ACs.
+2. Every AC references at least one FR-* or NFR-*.
+3. Outcomes must be observable and testable — no subjective language.
+4. Preconditions must be achievable in a test setup.
+
+---
+
+## Authentication Patterns
+
+### Login — Happy Path
+
+```markdown
+### AC-1: Successful login with valid credentials (FR-1)
+Given a registered user with email "user@example.com" and password "V@lidP4ss!"
+When they POST /api/auth/login with email "user@example.com" and password "V@lidP4ss!"
+Then the response status is 200
+And the response body contains a valid JWT access token
+And the response body contains a refresh token
+And the access token expires in 24 hours
+```
+
+### Login — Invalid Credentials
+
+```markdown
+### AC-2: Login rejected with wrong password (FR-1)
+Given a registered user with email "user@example.com"
+When they POST /api/auth/login with email "user@example.com" and an incorrect password
+Then the response status is 401
+And the response body contains error code "INVALID_CREDENTIALS"
+And no token is issued
+And the failed attempt is logged
+```
+
+### Login — Account Locked
+
+```markdown
+### AC-3: Login rejected for locked account (FR-1, NFR-S2)
+Given a user whose account is locked due to 5 consecutive failed login attempts
+When they POST /api/auth/login with correct credentials
+Then the response status is 403
+And the response body contains error code "ACCOUNT_LOCKED"
+And the response includes a "retryAfter" field with seconds until unlock
+```
+
+### Token Refresh
+
+```markdown
+### AC-4: Token refresh with valid refresh token (FR-3)
+Given a user with a valid, non-expired refresh token
+When they POST /api/auth/refresh with that refresh token
+Then the response status is 200
+And a new access token is issued
+And the old refresh token is invalidated
+And a new refresh token is issued (rotation)
+```
+
+### Logout
+
+```markdown
+### AC-5: Logout invalidates session (FR-4)
+Given an authenticated user with a valid access token
+When they POST /api/auth/logout with that token
+Then the response status is 204
+And the access token is no longer accepted for API calls
+And the refresh token is invalidated
+```
+
+---
+
+## CRUD Patterns
+
+### Create
+
+```markdown
+### AC-6: Create resource with valid data (FR-1)
+Given an authenticated user with "editor" role
+When they POST /api/resources with valid payload {name: "Test", type: "A"}
+Then the response status is 201
+And the response body contains the created resource with a generated UUID
+And the resource's "createdAt" field is set to the current UTC timestamp
+And the resource's "createdBy" field matches the authenticated user's ID
+```
+
+### Create — Validation Failure
+
+```markdown
+### AC-7: Create resource rejected with invalid data (FR-1)
+Given an authenticated user
+When they POST /api/resources with payload missing required field "name"
+Then the response status is 400
+And the response body contains error code "VALIDATION_ERROR"
+And the response body contains field-level detail: {"name": "Required field"}
+And no resource is created in the database
+```
+
+### Read — Single Item
+
+```markdown
+### AC-8: Read resource by ID (FR-2)
+Given an existing resource with ID "abc-123"
+When an authenticated user GETs /api/resources/abc-123
+Then the response status is 200
+And the response body contains the resource with all fields
+```
+
+### Read — Not Found
+
+```markdown
+### AC-9: Read non-existent resource returns 404 (FR-2)
+Given no resource exists with ID "nonexistent-id"
+When an authenticated user GETs /api/resources/nonexistent-id
+Then the response status is 404
+And the response body contains error code "NOT_FOUND"
+```
+
+### Update
+
+```markdown
+### AC-10: Update resource with valid data (FR-3)
+Given an existing resource with ID "abc-123" owned by the authenticated user
+When they PATCH /api/resources/abc-123 with {name: "Updated Name"}
+Then the response status is 200
+And the resource's "name" field is "Updated Name"
+And the resource's "updatedAt" field is updated to the current UTC timestamp
+And fields not included in the patch are unchanged
+```
+
+### Update — Ownership Check
+
+```markdown
+### AC-11: Update rejected for non-owner (FR-3, FR-6)
+Given an existing resource with ID "abc-123" owned by user "other-user"
+When the authenticated user (not "other-user") PATCHes /api/resources/abc-123
+Then the response status is 403
+And the response body contains error code "FORBIDDEN"
+And the resource is unchanged
+```
+
+### Delete — Soft Delete
+
+```markdown
+### AC-12: Soft delete resource (FR-5)
+Given an existing resource with ID "abc-123" owned by the authenticated user
+When they DELETE /api/resources/abc-123
+Then the response status is 204
+And the resource's "deletedAt" field is set to the current UTC timestamp
+And the resource no longer appears in GET /api/resources (list endpoint)
+And the resource still exists in the database (soft deleted)
+```
+
+### List — Pagination
+
+```markdown
+### AC-13: List resources with default pagination (FR-4)
+Given 50 resources exist for the authenticated user
+When they GET /api/resources without pagination parameters
+Then the response status is 200
+And the response contains the first 20 resources (default page size)
+And the response includes "totalCount: 50"
+And the response includes "page: 1"
+And the response includes "pageSize: 20"
+And the response includes "hasNextPage: true"
+```
+
+### List — Filtered
+
+```markdown
+### AC-14: List resources with type filter (FR-4)
+Given 30 resources of type "A" and 20 resources of type "B" exist
+When the authenticated user GETs /api/resources?type=A
+Then the response status is 200
+And all returned resources have type "A"
+And the response "totalCount" is 30
+```
+
+---
+
+## Search Patterns
+
+### Basic Search
+
+```markdown
+### AC-15: Search returns matching results (FR-7)
+Given resources with names "Alpha Report", "Beta Analysis", "Alpha Summary" exist
+When the user GETs /api/resources?q=Alpha
+Then the response contains "Alpha Report" and "Alpha Summary"
+And the response does not contain "Beta Analysis"
+And results are ordered by relevance score (descending)
+```
+
+### Search — Empty Results
+
+```markdown
+### AC-16: Search with no matches returns empty list (FR-7)
+Given no resources match the query "xyznonexistent"
+When the user GETs /api/resources?q=xyznonexistent
+Then the response status is 200
+And the response contains an empty "items" array
+And "totalCount" is 0
+```
+
+### Search — Special Characters
+
+```markdown
+### AC-17: Search handles special characters safely (FR-7, NFR-S1)
+Given resources exist in the database
+When the user GETs /api/resources?q="; DROP TABLE resources;--
+Then the response status is 200
+And no SQL injection occurs
+And the search treats the input as a literal string
+```
+
+---
+
+## File Upload Patterns
+
+### Upload — Happy Path
+
+```markdown
+### AC-18: Upload file within size limit (FR-8)
+Given an authenticated user
+When they POST /api/files with a 5MB PNG file
+Then the response status is 201
+And the response contains the file's URL, size, and MIME type
+And the file is stored in the configured storage backend
+And the file is associated with the authenticated user
+```
+
+### Upload — Size Exceeded
+
+```markdown
+### AC-19: Upload rejected for oversized file (FR-8)
+Given the maximum file size is 10MB
+When the user POSTs /api/files with a 15MB file
+Then the response status is 413
+And the response contains error code "FILE_TOO_LARGE"
+And no file is stored
+```
+
+### Upload — Invalid Type
+
+```markdown
+### AC-20: Upload rejected for disallowed file type (FR-8, NFR-S3)
+Given allowed file types are PNG, JPG, PDF
+When the user POSTs /api/files with an .exe file
+Then the response status is 415
+And the response contains error code "UNSUPPORTED_MEDIA_TYPE"
+And no file is stored
+```
+
+---
+
+## Payment Patterns
+
+### Charge — Happy Path
+
+```markdown
+### AC-21: Successful payment charge (FR-10)
+Given a user with a valid payment method on file
+When they POST /api/payments with amount 49.99 and currency "USD"
+Then the payment gateway is charged $49.99
+And the response status is 201
+And the response contains a transaction ID
+And a payment record is created with status "completed"
+And a receipt email is sent to the user
+```
+
+### Charge — Declined
+
+```markdown
+### AC-22: Payment declined by gateway (FR-10)
+Given a user with an expired credit card on file
+When they POST /api/payments with amount 49.99
+Then the payment gateway returns a decline
+And the response status is 402
+And the response contains error code "PAYMENT_DECLINED"
+And no payment record is created with status "completed"
+And the user is prompted to update their payment method
+```
+
+### Charge — Idempotency
+
+```markdown
+### AC-23: Duplicate payment request is idempotent (FR-10, NFR-R1)
+Given a payment was successfully processed with idempotency key "key-123"
+When the same request is sent again with idempotency key "key-123"
+Then the response status is 200
+And the response contains the original transaction ID
+And the user is NOT charged a second time
+```
+
+---
+
+## Notification Patterns
+
+### Email Notification
+
+```markdown
+### AC-24: Email notification sent on event (FR-11)
+Given a user with notification preferences set to "email"
+When their order status changes to "shipped"
+Then an email is sent to their registered email address
+And the email subject contains the order number
+And the email body contains the tracking URL
+And a notification record is created with status "sent"
+```
+
+### Notification — Delivery Failure
+
+```markdown
+### AC-25: Failed notification is retried (FR-11, NFR-R2)
+Given the email service returns a 5xx error on first attempt
+When a notification is triggered
+Then the system retries up to 3 times with exponential backoff (1s, 4s, 16s)
+And if all retries fail, the notification status is set to "failed"
+And an alert is sent to the ops channel
+```
+
+---
+
+## Negative Test Patterns
+
+### Unauthorized Access
+
+```markdown
+### AC-26: Unauthenticated request rejected (NFR-S1)
+Given no authentication token is provided
+When the user GETs /api/resources
+Then the response status is 401
+And the response contains error code "AUTHENTICATION_REQUIRED"
+And no resource data is returned
+```
+
+### Invalid Input — Type Mismatch
+
+```markdown
+### AC-27: String provided for numeric field (FR-1)
+Given the "quantity" field expects an integer
+When the user POSTs with quantity: "abc"
+Then the response status is 400
+And the response body contains field error: {"quantity": "Must be an integer"}
+```
+
+### Rate Limiting
+
+```markdown
+### AC-28: Rate limit enforced (NFR-S2)
+Given the rate limit is 100 requests per minute per API key
+When the user sends the 101st request within 60 seconds
+Then the response status is 429
+And the response includes header "Retry-After" with seconds until reset
+And the response contains error code "RATE_LIMITED"
+```
+
+### Concurrent Modification
+
+```markdown
+### AC-29: Optimistic locking prevents lost updates (NFR-R1)
+Given a resource with version 5
+When user A PATCHes with version 5 and user B PATCHes with version 5 simultaneously
+Then one succeeds with status 200 (version becomes 6)
+And the other receives status 409 with error code "CONFLICT"
+And the 409 response includes the current version number
+```
+
+---
+
+## Performance Criteria Patterns
+
+### Response Time
+
+```markdown
+### AC-30: API response time under load (NFR-P1)
+Given the system is handling 1,000 concurrent users
+When a user GETs /api/dashboard
+Then the response is returned in < 500ms (p95)
+And the response is returned in < 1000ms (p99)
+```
+
+### Throughput
+
+```markdown
+### AC-31: System handles target throughput (NFR-P2)
+Given normal production traffic patterns
+When the system receives 5,000 requests per second
+Then all requests are processed without queue overflow
+And error rate remains below 0.1%
+```
+
+### Resource Usage
+
+```markdown
+### AC-32: Memory usage within bounds (NFR-P3)
+Given the service is processing normal traffic
+When measured over a 24-hour period
+Then memory usage does not exceed 512MB RSS
+And no memory leaks are detected (RSS growth < 5% over 24h)
+```
+
+---
+
+## Accessibility Criteria Patterns
+
+### Keyboard Navigation
+
+```markdown
+### AC-33: Form is fully keyboard navigable (NFR-A1)
+Given the user is on the login page using only a keyboard
+When they press Tab
+Then focus moves through: email field -> password field -> submit button
+And each focused element has a visible focus indicator
+And pressing Enter on the submit button submits the form
+```
+
+### Screen Reader
+
+```markdown
+### AC-34: Error messages announced to screen readers (NFR-A2)
+Given the user submits the form with invalid data
+When validation errors appear
+Then each error is associated with its form field via aria-describedby
+And the error container has role="alert" for immediate announcement
+And the first error field receives focus
+```
+
+### Color Contrast
+
+```markdown
+### AC-35: Text meets contrast requirements (NFR-A3)
+Given the default theme is active
+When measuring text against background colors
+Then all body text meets 4.5:1 contrast ratio (WCAG AA)
+And all large text (18pt+ or 14pt+ bold) meets 3:1 contrast ratio
+And all interactive element states (hover, focus, active) meet 3:1
+```
+
+### Reduced Motion
+
+```markdown
+### AC-36: Animations respect user preference (NFR-A4)
+Given the user has enabled "prefers-reduced-motion" in their OS settings
+When they load any page with animations
+Then all non-essential animations are disabled
+And essential animations (e.g., loading spinner) use a reduced version
+And no content is hidden behind animation-only interactions
+```
+
+---
+
+## Writing Tips
+
+### Do
+
+- Start Given with the system/user state, not the action
+- Make When a single, specific trigger
+- Make Then observable — status codes, field values, side effects
+- Include And for additional assertions on the same outcome
+- Reference requirement IDs in the AC title
+
+### Do Not
+
+- Write "Then the system works correctly" (not testable)
+- Combine multiple scenarios in one AC
+- Use subjective words: "quickly", "properly", "nicely", "user-friendly"
+- Skip the precondition — Given is required even if it seems obvious
+- Write Given/When/Then as prose paragraphs — use the structured format
+
+### Smell Tests
+
+If your AC has any of these, rewrite it:
+
+| Smell | Example | Fix |
+|-------|---------|-----|
+| No Given clause | "When user clicks, then page loads" | Add "Given user is on the dashboard" |
+| Vague Then | "Then it works" | Specify status code, body, side effects |
+| Multiple Whens | "When user clicks A and then clicks B" | Split into two ACs |
+| Implementation detail | "Then the Redux store is updated" | Focus on user-observable outcome |
+| No requirement reference | "AC-5: Dashboard loads" | "AC-5: Dashboard loads (FR-7)" |
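Several of the rules in acceptance_criteria_patterns.md above (a requirement reference in the AC title, exactly one Given/When/Then scenario per AC, no subjective words) can be checked mechanically. The following is a minimal sketch under stated assumptions: `lint_ac`, `HEADING_RE`, and `SUBJECTIVE_WORDS` are hypothetical names introduced for illustration, and this is not the skill's own `spec_validator.py`.

```python
import re

# Hypothetical helper for illustration only; not the skill's spec_validator.py.
# Subjective words taken from the "Do Not" list in the pattern library.
SUBJECTIVE_WORDS = ("quickly", "properly", "nicely", "user-friendly")

# Assumed heading shape: "### AC-N: name (FR-N, NFR-XN, ...)".
HEADING_RE = re.compile(
    r"^### AC-\d+: .+ \((?:N?FR-[A-Z]?\d+)(?:, N?FR-[A-Z]?\d+)*\)$"
)


def lint_ac(text: str) -> list[str]:
    """Return rule violations found in one acceptance criterion."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    problems: list[str] = []

    # Rule: the title must reference at least one FR-* or NFR-* ID.
    if not lines or not HEADING_RE.match(lines[0]):
        problems.append("heading must look like '### AC-N: name (FR-N, ...)'")

    # Rule: one scenario per AC, so exactly one Given, one When, one Then.
    keywords = [ln.split()[0] for ln in lines[1:]]
    for required in ("Given", "When", "Then"):
        count = keywords.count(required)
        if count == 0:
            problems.append(f"missing {required} clause")
        elif count > 1:
            problems.append(f"multiple {required} clauses; split into separate ACs")

    # Rule: no subjective language (simple case-insensitive substring match).
    lowered = text.lower()
    for word in SUBJECTIVE_WORDS:
        if word in lowered:
            problems.append(f"subjective word: {word!r}")

    return problems


if __name__ == "__main__":
    smelly = "### AC-5: Dashboard loads\nWhen user clicks\nThen the page loads quickly"
    for problem in lint_ac(smelly):
        print(problem)
```

A check like this only catches structural smells. Whether a Then clause is truly observable and testable still needs a human or reviewer judgment.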
--- a/engineering/spec-driven-workflow/references/bounded_autonomy_rules.md
+++ b/engineering/spec-driven-workflow/references/bounded_autonomy_rules.md
@@ -0,0 +1,273 @@
+# Bounded Autonomy Rules
+
+Decision framework for when an agent (human or AI) should stop and ask vs. continue working autonomously during spec-driven development.
+
+---
+
+## The Core Principle
+
+**Autonomy is earned by clarity.** The clearer the spec, the more autonomy the implementer has. The more ambiguous the spec, the more the implementer must stop and ask.
+
+This is not about trust. It is about risk. A clear spec means low risk of building the wrong thing. An ambiguous spec means high risk.
+
+---
+
+## Decision Matrix
+
+| Signal | Action | Rationale |
+|--------|--------|-----------|
+| Spec is Approved, requirement is clear, tests exist | **Continue** | Low risk. Build it. |
+| Requirement is clear but no test exists yet | **Continue** (write the test first) | You can infer the test from the requirement. |
+| Requirement uses SHOULD/MAY keywords | **Continue** with your best judgment | These are intentionally flexible. Document your choice. |
+| Requirement is ambiguous (multiple valid interpretations) | **STOP** if the ambiguity score exceeds 30% | Ask the spec author to clarify. |
+| Implementation requires changing an API contract | **STOP** always | Breaking changes need explicit approval. |
+| Implementation requires a new database migration | **STOP** if it changes existing columns/tables | New tables are lower risk than schema changes. |
+| Security-related change (auth, crypto, PII) | **STOP** always | Security changes need review regardless of spec clarity. |
+| Performance-critical path with no benchmark data | **STOP** | You cannot prove NFR compliance without measurement. |
+| Bug found in existing code unrelated to spec | **STOP** — file a separate issue | Do not fix unrelated bugs in a spec-scoped implementation. |
+| Spec says "N/A" for a section you think needs content | **STOP** | The author may have a reason, or they may have missed it. |
+
+---
+
+## Ambiguity Scoring
+
+When you encounter ambiguity, quantify it before deciding to stop or continue.
+
+### How to Score Ambiguity
+
+For each requirement you are implementing, ask:
+
+1. **Can I write a test for this right now?** (No = +20% ambiguity)
+2. **Are there multiple valid interpretations?** (Yes = +20% ambiguity)
+3. **Does the spec contradict itself?** (Yes = +30% ambiguity)
+4. **Am I making assumptions about user behavior?** (Yes = +15% ambiguity)
+5. **Does this depend on an undocumented external system?** (Yes = +15% ambiguity)
+
+### Threshold
+
+| Ambiguity Score | Action |
+|-----------------|--------|
+| 0-15% | Continue. Minor ambiguity is normal. Document your interpretation. |
+| 16-30% | Continue with caution. Add a comment explaining your interpretation. Flag in PR. |
+| 31-50% | STOP. Ask the spec author one specific question. Do not continue until answered. |
+| 51%+ | STOP. The spec is incomplete. Request a revision before proceeding. |
+
+### Example
+
+**Requirement:** "FR-7: The system MUST notify the user when their order ships."
+
+Questions:
+1. Can I write a test? Partially — I know WHAT to test but not HOW (email? push? in-app?). +20%
+2. Multiple interpretations? Yes — notification channel is unclear. +20%
+3. Contradicts itself? No. +0%
+4. Assuming user behavior? Yes — I am assuming they want email. +15%
+5. Undocumented external system? Maybe — depends on notification service. +15%
+
+**Total: 70%.** STOP. The spec needs to specify the notification channel.
+
+---
+
+## Scope Creep Detection
+
+### What Is Scope Creep?
+
+Scope creep is implementing functionality not described in the spec. It includes:
+
+- Adding features the spec does not mention
+- "Improving" behavior beyond what acceptance criteria require
+- Handling edge cases the spec explicitly excluded
+- Refactoring unrelated code "while you're in there"
+- Building infrastructure for future features
+
+### Detection Patterns
+
+| Pattern | Example | Risk |
+|---------|---------|------|
+| "While I'm here..." | Refactoring a utility function unrelated to the spec | Medium — unreviewed changes |
+| "This would be easy to add..." | Adding a search filter the spec does not mention | High — untested, unspecified |
+| "Users will probably want..." | Building a feature based on assumption | High — may conflict with future specs |
+| "This is obviously needed..." | Adding logging, metrics, or caching not in NFRs | Medium — may be overkill or wrong approach |
+| "The spec forgot to mention..." | Building something the spec excluded | Critical — may be deliberately excluded |
+
+### Response Protocol
+
+When you detect scope creep in your own work:
+
+1. **Stop immediately.** Do not commit the extra code.
+2. **Check Out of Scope.** Is this item explicitly excluded?
+3. **If excluded:** Delete the code. The spec author had a reason.
+4. **If not mentioned:** File a note for the spec author. Ask if it should be added.
+5. **If approved:** Update the spec FIRST, then implement.
+
+---
+
+## Breaking Change Identification
+
+### What Counts as a Breaking Change?
+
+A breaking change is any modification that could cause existing clients, tests, or integrations to fail.
+
+| Change | Breaking? |
+|--------|-----------|
+| API endpoint removed | Yes |
+| API endpoint added | No |
+| Required field added to request | Yes |
+| Optional field added to request | No |
+| Field removed from response | Yes |
+| Field added to response | No (usually) |
+| Status code changed | Yes |
+| Error code string changed | Yes |
+| Database column removed | Yes |
+| Database column added (nullable) | No |
+| Database column added (not null, no default) | Yes |
+| Enum value removed | Yes |
+| Enum value added | No (usually) |
+| Behavior change for existing input | Yes |
+
+### Breaking Change Protocol
+
+1. **Identify** the breaking change before implementing it.
+2. **Escalate** immediately — do not implement without approval.
+3. **Propose** a migration path (versioned API, feature flag, deprecation period).
+4. **Document** the breaking change in the spec's changelog.
+
+---
+
+## Security Implication Checklist
+
+Any change touching the following areas MUST be escalated, even if the spec seems clear.
+
+### Always Escalate
+
+- [ ] Authentication logic (login, logout, token generation)
+- [ ] Authorization logic (role checks, permission gates)
+- [ ] Encryption/hashing (algorithm choice, key management)
+- [ ] PII handling (storage, transmission, logging)
+- [ ] Input validation bypass (new endpoints, parameter changes)
+- [ ] Rate limiting changes (thresholds, scope)
+- [ ] CORS or CSP policy changes
+- [ ] File upload handling
+- [ ] SQL/NoSQL query construction (injection risk)
+- [ ] Deserialization of user input
+- [ ] Redirect URLs from user input (open redirect risk)
+- [ ] Secrets in code, config, or logs
+
+### Security Escalation Template
+
+```markdown
+## Security Escalation: [Title]
+
+**Affected area:** [authentication/authorization/encryption/PII/etc.]
+**Spec reference:** [FR-N or NFR-SN]
+**Risk:** [What could go wrong if implemented incorrectly]
+**Current protection:** [What exists today]
+**Proposed change:** [What the spec requires]
+**My concern:** [Specific security question]
+**Recommendation:** [Proposed approach with security rationale]
+```
+
+---
+
+## Escalation Templates
+
+### Template 1: Ambiguous Requirement
+
+```markdown
+## Escalation: Ambiguous Requirement
+
+**Blocked on:** FR-7 ("notify the user when their order ships")
+**Ambiguity score:** 70%
+**Question:** What notification channel should be used?
+**Options considered:**
+  A. Email only — Pros: simple, reliable. Cons: not real-time.
+  B. Email + in-app notification — Pros: covers both async and real-time. Cons: more implementation effort.
+  C. Configurable per user — Pros: maximum flexibility. Cons: requires preference UI (not in spec).
+**My recommendation:** B (email + in-app). Covers most use cases without requiring new UI.
+**Impact of waiting:** Cannot implement FR-7 until resolved. No other work blocked.
+```
+
+### Template 2: Missing Edge Case
+
+```markdown
+## Escalation: Missing Edge Case
+
+**Related to:** FR-3 (password reset link expires after 1 hour)
+**Scenario:** User clicks a reset link, but their account was deleted between requesting and clicking.
+**Not in spec:** Edge cases section does not cover this.
+**Options considered:**
+  A. Show generic "link invalid" error — Pros: secure (no info leak). Cons: confusing for deleted user.
+  B. Show "account not found" error — Pros: clear. Cons: confirms account deletion to link holder.
+**My recommendation:** A. Security over clarity — do not reveal account existence.
+**Impact of waiting:** Can implement other ACs; this is blocking only AC-2 completion.
+```
+
+### Template 3: Potential Breaking Change
+
+```markdown
+## Escalation: Potential Breaking Change
+
+**Spec requires:** Adding required field "role" to POST /api/users request (FR-6)
+**Current behavior:** POST /api/users accepts {email, password, displayName}
+**Breaking:** Yes — existing clients will get 400 errors (missing required field)
+**Options considered:**
+  A. Make "role" required as spec says — Pros: matches spec. Cons: breaks mobile app v2.1.
+  B. Make "role" optional with default "user" — Pros: backward compatible. Cons: deviates from spec.
+  C. Version the API (v2) — Pros: clean separation. Cons: maintenance burden.
+**My recommendation:** B. Default to "user" for backward compatibility. Update spec to reflect MAY instead of MUST.
+**Impact of waiting:** Frontend team is building against the new contract. Need answer within 2 days.
+```
+
+### Template 4: Scope Creep Proposal
+
+```markdown
+## Escalation: Potential Addition to Spec
+
+**Context:** While implementing FR-2 (password validation), I noticed the spec does not mention password strength feedback.
+**Not in spec:** No requirement for showing strength indicators.
+**Checked Out of Scope:** Not listed there either.
+**Proposal:** Add FR-7: "The system SHOULD display password strength feedback during registration."
+**Effort:** ~2 hours additional implementation.
+**Question:** Should this be added to current spec, filed as a separate spec, or skipped?
+**Impact of waiting:** FR-2 implementation is not blocked. This is an enhancement question only.
+```
+
+---
+
+## Quick Reference Card
+
+```
+CONTINUE if:
+  - Spec is approved
+  - Requirement uses MUST and is unambiguous
+  - Tests can be written directly from the AC
+  - Changes are additive and non-breaking
+  - You are refactoring internals only (no behavior change)
+
+STOP if:
+  - Ambiguity > 30%
+  - Any breaking change
+  - Any security-related change
+  - Spec says N/A but you think it shouldn't
+  - You are about to build something not in the spec
+  - You cannot write a test for the requirement
+  - External dependency is undocumented
+```
+
+---
+
+## Anti-Patterns in Autonomy
+
+### 1. "I'll Ask Later"
+Continuing past an ambiguity checkpoint because asking feels slow. The rework from building the wrong thing usually costs far more time than the question would have.
+
+### 2. "It's Obviously Needed"
+Assuming a missing feature was accidentally omitted. It may have been deliberately excluded. Check Out of Scope first.
+
+### 3. "The Spec Is Wrong"
+Implementing what you think the spec SHOULD say instead of what it DOES say. If the spec is wrong, escalate. Do not silently "fix" it.
+
+### 4. "Just This Once"
+Bypassing the escalation protocol for a "small" change. Small changes compound. The protocol exists because humans are bad at judging risk in the moment.
+
+### 5. "I Already Built It"
+Presenting completed work that was never in the spec and hoping it gets accepted. This creates review pressure and wastes everyone's time if rejected. Ask BEFORE building.
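The ambiguity scoring rubric and threshold table in bounded_autonomy_rules.md above amount to a weighted checklist followed by a band lookup. A minimal sketch, assuming the five questions are answered as booleans: the names `AmbiguityAnswers`, `WEIGHTS`, `ambiguity_score`, and `action_for` are hypothetical and are not part of the skill's scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmbiguityAnswers:
    """True means the answer that adds ambiguity for that question."""
    cannot_write_test: bool
    multiple_interpretations: bool
    spec_contradicts_itself: bool
    assumes_user_behavior: bool
    undocumented_external_system: bool


# Percentage points per question, copied from "How to Score Ambiguity".
WEIGHTS = {
    "cannot_write_test": 20,
    "multiple_interpretations": 20,
    "spec_contradicts_itself": 30,
    "assumes_user_behavior": 15,
    "undocumented_external_system": 15,
}


def ambiguity_score(answers: AmbiguityAnswers) -> int:
    """Sum the weights of every question answered in the ambiguous direction."""
    return sum(w for name, w in WEIGHTS.items() if getattr(answers, name))


def action_for(score: int) -> str:
    """Map a score to the action band from the Threshold table."""
    if score <= 15:
        return "continue"
    if score <= 30:
        return "continue with caution"
    if score <= 50:
        return "stop: ask one specific question"
    return "stop: request spec revision"


if __name__ == "__main__":
    # The FR-7 worked example: every question except "contradicts itself".
    fr7 = AmbiguityAnswers(True, True, False, True, True)
    score = ambiguity_score(fr7)
    print(score, action_for(score))  # prints: 70 stop: request spec revision
```

Treating "partially" and "maybe" answers as True mirrors the FR-7 example, which counts both at full weight. A stricter team could score them at half weight, but that would be a deviation from the rubric as written.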
--- a/engineering/spec-driven-workflow/references/spec_format_guide.md
+++ b/engineering/spec-driven-workflow/references/spec_format_guide.md
@@ -0,0 +1,423 @@
+# Spec Format Guide
+
+Complete reference for writing feature specifications. Every section is explained with examples, rationale, and common mistakes.
+
+---
+
+## The Spec Document Structure
+
+A spec has 9 mandatory sections. If a section does not apply, write "N/A — [reason]" so reviewers know it was considered, not skipped.
+
+```
+1. Title and Metadata
+2. Context
+3. Functional Requirements
+4. Non-Functional Requirements
+5. Acceptance Criteria
+6. Edge Cases and Error Scenarios
+7. API Contracts
+8. Data Models
+9. Out of Scope
+```
+
+---
+
+## Section 1: Title and Metadata
+
+```markdown
+# Spec: [Feature Name]
+
+**Author:** Jane Doe
+**Date:** 2026-03-25
+**Status:** Draft | In Review | Approved | Superseded
+**Reviewers:** John Smith, Alice Chen
+**Related specs:** SPEC-018 (User Registration), SPEC-023 (Session Management)
+```
+
+### Status Lifecycle
+
+| Status | Meaning | Who Can Change |
+|--------|---------|----------------|
+| Draft | Author is still writing. Not ready for review. | Author |
+| In Review | Ready for feedback. Implementation blocked. | Author |
+| Approved | Reviewed and accepted. Implementation may begin. | Reviewer |
+| Superseded | Replaced by a newer spec. Link to replacement. | Author |
+
+**Rule:** Implementation MUST NOT begin until status is "Approved."
+
+---
+
+## Section 2: Context
+
+The context section answers: **Why does this feature exist?**
+
+### What to Include
+
+- The problem being solved (with evidence: support tickets, metrics, user research)
+- The current state (what exists today and what is broken or missing)
+- The business justification (revenue impact, cost savings, user retention)
+- Constraints or dependencies (regulatory, technical, timeline)
+
+### What to Exclude
+
+- Implementation details (that is the engineer's job)
+- Solution proposals (the spec says WHAT, not HOW)
+- Lengthy background (2-4 paragraphs maximum)
+
+### Good Example
+
+```markdown
+## Context
+
+Users who forget their passwords currently have no self-service recovery.
+Support handles ~200 password reset requests per week, consuming approximately
+8 hours of agent time at $45/hour ($360/week, $18,720/year). Additionally,
+12% of users who contact support for a reset never return.
+
+This feature provides self-service password reset via email, eliminating
+support burden and reducing user churn from the reset flow.
+```
+
+### Bad Example
+
+```markdown
+## Context
+
+We need a password reset feature. Users forget their passwords sometimes
+and need to reset them. We should build this.
+```
+
+**Why it is bad:** No evidence, no metrics, no business justification. "We should build this" is not a reason.
+
+---
+
+## Section 3: Functional Requirements — RFC 2119
+
+### RFC 2119 Keywords
+
+These keywords have precise meanings per [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt). Do not use them casually.
+
+| Keyword | Meaning | Testing Implication |
+|---------|---------|---------------------|
+| **MUST** | Absolute requirement. The implementation is non-conformant without this. | Must have a passing test. Failure = release blocker. |
+| **MUST NOT** | Absolute prohibition. Doing this = broken implementation. | Must have a test proving this cannot happen. |
+| **SHOULD** | Strongly recommended. Can be omitted only with documented justification. | Should have a test. Omission requires written rationale. |
+| **SHOULD NOT** | Strongly discouraged. Can be done only with documented justification. | Should have a test confirming the behavior does not occur. |
+| **MAY** | Truly optional. Implementer's discretion. | Test is optional. Document if implemented. |
+
+### Writing Good Requirements
+
+**Each requirement MUST be:**
+1. **Atomic** — One behavior per requirement. Not "The system MUST authenticate users and log them in."
+2. **Testable** — You can write a test that proves it works or does not.
+3. **Numbered** — Sequential FR-N format for traceability.
+4. **Specific** — No ambiguous adjectives ("fast", "secure", "user-friendly").
+
+### Good Requirements
+
+```markdown
+- FR-1: The system MUST accept login via email and password.
+- FR-2: The system MUST reject passwords shorter than 8 characters.
+- FR-3: The system MUST return a JWT access token on successful login.
+- FR-4: The system MUST NOT include the password hash in any API response.
+- FR-5: The system SHOULD support "remember me" with a 30-day refresh token.
+- FR-6: The system MAY display last login time on the dashboard.
+```
+
+### Bad Requirements
+
+```markdown
+- FR-1: The login system must be fast and secure.
+  (Untestable: what is "fast"? What is "secure"?)
+
+- FR-2: The system must handle all edge cases.
+  (Vague: which edge cases? This delegates the spec to the implementer.)
+
+- FR-3: Users should be able to log in easily.
+  (Subjective: "easily" is not measurable.)
+```
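+
+Some of these properties can be checked mechanically. The sketch below is a hypothetical lint, not the skill's `spec_validator.py`: `lint_requirement`, the regexes, and the ambiguous-word list are assumptions chosen for illustration. It covers numbering, ambiguous wording, and the presence of an uppercase RFC 2119 keyword; atomicity and testability still need human review.
+
+```python
+import re
+
+# Hypothetical lint rules; the word list is illustrative, not exhaustive.
+RFC_2119 = re.compile(r"\b(MUST NOT|SHOULD NOT|MUST|SHOULD|MAY)\b")
+AMBIGUOUS = re.compile(
+    r"\b(fast|secure|easily|user-friendly|all edge cases)\b", re.IGNORECASE
+)
+REQ_LINE = re.compile(r"^- (FR-\d+): (.+)$")
+
+
+def lint_requirement(line: str) -> list[str]:
+    """Return a list of problems found in one '- FR-N: ...' line."""
+    match = REQ_LINE.match(line.strip())
+    if match is None:
+        return ["not in '- FR-N: text' format"]
+    text = match.group(2)
+    problems = []
+    # The keyword regex is case-sensitive, so lowercase "must" does not count.
+    if not RFC_2119.search(text):
+        problems.append("no uppercase RFC 2119 keyword")
+    for word in AMBIGUOUS.findall(text):
+        problems.append(f"ambiguous term: {word!r}")
+    return problems
+
+
+print(lint_requirement("- FR-2: The system MUST reject passwords shorter than 8 characters."))
+print(lint_requirement("- FR-1: The login system must be fast and secure."))
+```
+
+The first call prints an empty list; the second reports the missing uppercase keyword and the two ambiguous terms.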
+
+---
+
+## Section 4: Non-Functional Requirements
+
+Non-functional requirements define quality attributes. Every requirement needs a **measurable threshold**.
+
+### Categories
+
+#### Performance
+```markdown
+- NFR-P1: Login API MUST respond in < 500ms (p95) under 1,000 concurrent users.
+- NFR-P2: Dashboard page MUST achieve Largest Contentful Paint < 2.5s.
+- NFR-P3: Search results MUST return within 200ms for queries under 100 characters.
+```
+
+**Bad:** "The system should be fast." (Not measurable.)
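+
+A p95 threshold such as NFR-P1 can be checked against load-test samples. A minimal sketch using the nearest-rank percentile method; the latency values are made up, and a real load-testing tool may compute percentiles differently (for example, by interpolating):
+
+```python
+import math
+
+
+def percentile_nearest_rank(samples: list[float], pct: float) -> float:
+    """Return the pct-th percentile using the nearest-rank method."""
+    if not samples:
+        raise ValueError("samples must not be empty")
+    ordered = sorted(samples)
+    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
+    return ordered[max(rank, 1) - 1]
+
+
+# Hypothetical latencies in milliseconds from a load-test run.
+latencies_ms = [120, 135, 150, 180, 210, 240, 260, 310, 420, 480,
+                130, 140, 160, 175, 190, 205, 230, 250, 290, 650]
+
+p95 = percentile_nearest_rank(latencies_ms, 95)
+print(p95, "PASS" if p95 < 500 else "FAIL")  # 480 PASS
+```
+
+One request took 650ms, yet the requirement passes because the p95 is 480ms. That is why the spec has to name the percentile rather than just a number.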
+
+#### Security
+```markdown
+- NFR-S1: All API endpoints MUST require authentication except /health and /login.
+- NFR-S2: Failed login attempts MUST be rate-limited to 5 per minute per IP.
+- NFR-S3: Passwords MUST be hashed with bcrypt (cost factor >= 12).
+- NFR-S4: Session tokens MUST be invalidated on password change.
+```
+
+#### Accessibility
+```markdown
+- NFR-A1: All form inputs MUST have associated labels (WCAG 1.3.1).
+- NFR-A2: Text color contrast MUST be at least 4.5:1 (3:1 for large text) (WCAG 1.4.3).
+- NFR-A3: All interactive elements MUST be keyboard-navigable (WCAG 2.1.1).
+```
+
+#### Scalability
+```markdown
+- NFR-SC1: The system SHOULD support 50,000 registered users while still meeting NFR-P1.
+- NFR-SC2: Database queries MUST use indexes; no full table scans on tables > 10K rows.
+```
+
+#### Reliability
+```markdown
+- NFR-R1: The authentication service MUST maintain 99.9% uptime (< 8.77h downtime/year).
+- NFR-R2: Data MUST NOT be lost on service restart (durable storage required).
+```
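+
+The downtime figure in NFR-R1 can be derived from the uptime percentage. A small sketch, assuming an average year of 365.25 days (a 365-day year gives 8.76h instead):
+
+```python
+HOURS_PER_YEAR = 365.25 * 24  # 8766 h, averaged over leap years (assumption)
+
+
+def downtime_budget_hours(uptime_percent: float) -> float:
+    """Hours of allowed downtime per year for a given uptime percentage."""
+    return HOURS_PER_YEAR * (100 - uptime_percent) / 100
+
+
+print(round(downtime_budget_hours(99.9), 2))   # 8.77
+print(round(downtime_budget_hours(99.99), 2))  # 0.88
+```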
+
+---
+
+## Section 5: Acceptance Criteria — Given/When/Then
+
+Acceptance criteria are the contract between the spec author and the implementer. They define "done."
+
+### The Given/When/Then Pattern
+
+```
+Given [precondition — the world is in this state]
+When  [action — the user or system does this]
+Then  [outcome — this observable result occurs]
+And   [additional outcome — and also this]
+```
+
+### Rules for Acceptance Criteria
+
+1. **Every AC MUST reference at least one FR-* or NFR-*.** Orphaned criteria indicate missing requirements.
+2. **Every AC MUST be testable by a machine.** If you cannot write an automated test, rewrite the criterion.
+3. **No subjective language.** Not "should look good" but "MUST render within the design-system grid."
+4. **One scenario per AC.** If you have multiple Given/When/Then blocks, split into separate ACs.
+
+### Example: Authentication Feature
+
+```markdown
+### AC-1: Successful login (FR-1, FR-3)
+Given a registered user with email "user@example.com" and password "P@ssw0rd123"
+When they POST /api/auth/login with those credentials
+Then they receive a 200 response with a valid JWT token
+And the token expires in 24 hours
+And the response includes the user's display name
+
+### AC-2: Invalid password (FR-1)
+Given a registered user with email "user@example.com"
+When they POST /api/auth/login with an incorrect password
+Then they receive a 401 response
+And the response body contains error "INVALID_CREDENTIALS"
+And no token is issued
+
+### AC-3: Short password rejected on registration (FR-2)
+Given a new user attempting to register
+When they submit a password with 7 characters
+Then they receive a 400 response
+And the response body contains error "PASSWORD_TOO_SHORT"
+And the account is not created
+```
+
+### Common Mistakes
+
+| Mistake | Example | Fix |
+|---------|---------|-----|
+| Vague outcome | "Then the system works correctly" | "Then the response status is 200 and body contains {field: value}" |
+| Missing precondition | "When user logs in, then token is issued" | "Given a registered user, when they POST valid credentials, then..." |
+| Multiple scenarios | AC with 3 different When clauses | Split into 3 separate ACs |
+| No FR reference | "AC-5: User sees dashboard" | "AC-5: User sees dashboard (FR-7)" |
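+
+The "No FR reference" mistake, and its mirror image (a requirement that no AC covers), can be caught with a simple traceability check. A hedged sketch: `trace`, the regexes, and the sample spec text are illustrative assumptions, and the header format is the `### AC-N: name (FR-N, ...)` convention used in this guide.
+
+```python
+import re
+
+AC_HEADER = re.compile(r"^### (AC-\d+):\s*(.*)$")
+# Matches FR-1 and NFR-1, plus category-prefixed IDs such as NFR-P1.
+REQ_REF = re.compile(r"\b(?:FR|NFR)-[A-Z]{0,2}\d+\b")
+
+
+def trace(spec: str, requirement_ids: set[str]) -> dict[str, list[str]]:
+    """Report ACs with no requirement reference and requirements no AC covers."""
+    covered: set[str] = set()
+    orphaned: list[str] = []
+    for line in spec.splitlines():
+        match = AC_HEADER.match(line)
+        if match is None:
+            continue
+        refs = REQ_REF.findall(match.group(2))
+        if not refs:
+            orphaned.append(match.group(1))
+        covered.update(refs)
+    return {
+        "orphaned_acs": orphaned,
+        "uncovered_requirements": sorted(requirement_ids - covered),
+    }
+
+
+sample_spec = "\n".join([
+    "### AC-1: Successful login (FR-1, FR-3)",
+    "### AC-2: Invalid password (FR-1)",
+    "### AC-5: User sees dashboard",
+])
+report = trace(sample_spec, {"FR-1", "FR-2", "FR-3"})
+print(report)
+```
+
+For this sample the report lists AC-5 as orphaned and FR-2 as uncovered.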
+
+---
+
+## Section 6: Edge Cases and Error Scenarios
+
+### What Counts as an Edge Case
+
+- Invalid or malformed input
+- External service failures (API down, timeout, rate-limited)
+- Concurrent operations (race conditions)
+- Boundary values (empty string, max length, zero, negative numbers)
+- State conflicts (already exists, already deleted, expired)
+
+### Format
+
+```markdown
+- EC-1: Empty email field → Return 400 with error "EMAIL_REQUIRED". Do not call auth service.
+- EC-2: Email exceeds 255 characters → Return 400 with error "EMAIL_TOO_LONG".
+- EC-3: OAuth provider returns 503 → Return 503 with "Service temporarily unavailable" and tell the client to retry after 30s.
+- EC-4: Two users register same email simultaneously → First succeeds, second gets 409 Conflict.
+- EC-5: User clicks reset link after password was already changed → Show "Link already used."
+```
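+
+Edge cases written in this format map almost one-to-one onto validation code and boundary tests. A minimal sketch of EC-1 and EC-2; the function name and the `(status, error code)` return shape are assumptions:
+
+```python
+from typing import Optional
+
+MAX_EMAIL_LENGTH = 255
+
+
+def check_email_field(email: str) -> Optional[tuple[int, str]]:
+    """Return (status, error code) for EC-1 or EC-2, or None if neither applies."""
+    if email == "":
+        return 400, "EMAIL_REQUIRED"
+    if len(email) > MAX_EMAIL_LENGTH:
+        return 400, "EMAIL_TOO_LONG"
+    return None
+
+
+# Boundary values: empty, exactly at the limit, one character past it.
+at_limit = "a" * 243 + "@example.com"  # 243 + 12 = 255 characters
+over_limit = "a" + at_limit            # 256 characters
+
+print(check_email_field(""))          # (400, 'EMAIL_REQUIRED')
+print(check_email_field(at_limit))    # None
+print(check_email_field(over_limit))  # (400, 'EMAIL_TOO_LONG')
+```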
+
+### Coverage Rule
+
+For every external dependency, specify at least one failure:
+- Database: connection lost, timeout, constraint violation
+- API: 4xx, 5xx, timeout, invalid response
+- File system: file not found, permission denied, disk full
+- User input: empty, too long, wrong type, injection attempt
+
+---
+
+## Section 7: API Contracts
+
+### Notation
+
+Use TypeScript-style interfaces. They are readable by both frontend and backend engineers.
+
+```typescript
+interface CreateUserRequest {
+  email: string;         // MUST be valid email, max 255 chars
+  password: string;      // MUST be 8-128 chars
+  displayName: string;   // MUST be 1-100 chars, no HTML
+  role?: "user" | "admin"; // Default: "user"
+}
+```
+
+### What to Define
+
+For each endpoint:
+1. **HTTP method and path** (e.g., POST /api/users)
+2. **Request body** (fields, types, constraints, defaults)
+3. **Success response** (status code, body shape)
+4. **Error responses** (each error code with its status and body)
+5. **Headers** (Authorization, Content-Type, custom headers)
+
+### Error Response Convention
+
+```typescript
+interface ApiError {
+  error: string;         // Machine-readable code: "INVALID_CREDENTIALS"
+  message: string;       // Human-readable: "The email or password is incorrect."
+  details?: Record<string, string>;  // Field-level errors for validation
+}
+```
+
+For every endpoint, define each of these responses that can occur:
+- 400 for validation errors
+- 401 for authentication failures
+- 403 for authorization failures
+- 404 for not found
+- 409 for conflicts
+- 429 for rate limiting
+- 500 for unexpected errors (keep it generic — do not leak internals)
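+
+One way to keep this convention consistent is a single helper that builds every error body. A sketch under assumptions: the `STATUS_BY_ERROR` table, the `INTERNAL_ERROR` code, and the helper name are hypothetical; only the `error`/`message`/`details` shape comes from the `ApiError` interface.
+
+```python
+from typing import Optional
+
+# Hypothetical mapping from machine-readable error codes to HTTP statuses.
+STATUS_BY_ERROR = {
+    "PASSWORD_TOO_SHORT": 400,
+    "INVALID_CREDENTIALS": 401,
+    "FORBIDDEN": 403,
+    "NOT_FOUND": 404,
+    "EMAIL_ALREADY_EXISTS": 409,
+    "RATE_LIMITED": 429,
+}
+
+
+def error_response(
+    code: str, message: str, details: Optional[dict[str, str]] = None
+) -> tuple[int, dict]:
+    """Build (status, body) in the ApiError shape; unknown codes become a generic 500."""
+    status = STATUS_BY_ERROR.get(code)
+    if status is None:
+        # Do not leak internals: replace code and message with generic text.
+        return 500, {"error": "INTERNAL_ERROR", "message": "An unexpected error occurred."}
+    body: dict = {"error": code, "message": message}
+    if details:
+        body["details"] = details
+    return status, body
+
+
+print(error_response("INVALID_CREDENTIALS", "The email or password is incorrect."))
+print(error_response("DB_TIMEOUT", "pg: connection to 10.0.0.5 timed out"))
+```
+
+The second call shows the 500 rule in action: the internal message never reaches the response body.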
+
+---
+
+## Section 8: Data Models
+
+### Table Format
+
+```markdown
+### User
+| Field | Type | Constraints |
+|-------|------|-------------|
+| id | UUID | PK, auto-generated, immutable |
+| email | varchar(255) | Unique, not null, valid email |
+| passwordHash | varchar(60) | Not null, bcrypt, never in API responses |
+| displayName | varchar(100) | Not null |
+| role | enum('user','admin') | Default: 'user' |
+| createdAt | timestamp | UTC, immutable, auto-set |
+| updatedAt | timestamp | UTC, auto-updated |
+| deletedAt | timestamp | Null unless soft-deleted |
+```
+
+### Rules
+
+1. **Every entity in requirements MUST have a data model.** If FR-1 mentions "users", there must be a User model.
+2. **Constraints MUST match requirements.** If FR-2 says passwords >= 8 chars, the model (or the API contract that feeds it) must note that.
+3. **Include indexes.** If NFR-P1 requires < 500ms responses, note which fields need indexes to meet it.
+4. **Specify soft vs. hard delete.** State it explicitly.
+
+---
+
+## Section 9: Out of Scope
+
+### Why This Section Matters
+
+Out of Scope prevents scope creep during implementation. When someone says "while you're in there, could you also..." — point them to this section.
+
+### Format
+
+```markdown
+- OS-1: Multi-factor authentication — Planned for Q3 (SPEC-045).
+- OS-2: Social login beyond Google/GitHub — Insufficient user demand (< 2% of requests).
+- OS-3: Admin impersonation — Security review pending. Separate spec required.
+- OS-4: Password strength meter UI — Nice-to-have, deferred to design sprint 12.
+```
+
+### Rules
+
+1. **Every feature discussed and rejected MUST be listed.** This creates a paper trail.
+2. **Include the reason.** "Not now" is not a reason. "Insufficient demand (< 2% of requests)" is.
+3. **Link to future specs** when the exclusion is a deferral, not a rejection.
+
+---
+
+## Feature-Type Templates
+
+### CRUD Feature
+
+Focus on: all 4 operations, validation rules, authorization, pagination for list endpoints.
+
+```markdown
+- FR-1: Users MUST be able to create a [resource] with [required fields].
+- FR-2: Users MUST be able to read a [resource] by ID.
+- FR-3: Users MUST be able to list [resources] with pagination (default: 20/page).
+- FR-4: Users MUST be able to update [mutable fields] of their own [resources].
+- FR-5: Users MUST be able to delete their own [resources] (soft delete).
+- FR-6: Users MUST NOT be able to modify or delete other users' [resources].
+```
+
+### Integration Feature
+
+Focus on: external API contract, retry/fallback behavior, data mapping, error propagation.
+
+```markdown
+- FR-1: The system MUST call [external API] to [purpose].
+- FR-2: The system MUST retry failed calls up to 3 times with exponential backoff.
+- FR-3: The system MUST map [external field] to [internal field].
+- FR-4: The system MUST NOT expose external API errors directly to users.
+- EC-1: External API returns 5xx → Log error, return cached data if < 1h old, else 503.
+- EC-2: External API response schema changes → Log warning, reject unmappable fields.
+```
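+
+FR-2 in this template leaves the backoff schedule open. One possible reading, sketched in Python: one initial attempt plus up to three retries, with the delay doubling each time. The 0.5s base delay and the injectable `sleep` parameter are assumptions made so the example can run without waiting.
+
+```python
+import time
+from typing import Callable, TypeVar
+
+T = TypeVar("T")
+
+
+def call_with_retry(
+    call: Callable[[], T],
+    max_retries: int = 3,
+    base_delay_s: float = 0.5,
+    sleep: Callable[[float], None] = time.sleep,
+) -> T:
+    """Run call(); on failure retry up to max_retries times, doubling the delay."""
+    for attempt in range(max_retries + 1):
+        try:
+            return call()
+        except Exception:
+            if attempt == max_retries:
+                raise  # retries exhausted: surface the last error
+            sleep(base_delay_s * 2 ** attempt)
+    raise AssertionError("unreachable")
+
+
+# Simulated dependency that fails twice, then succeeds.
+attempts = {"count": 0}
+delays: list[float] = []
+
+
+def flaky() -> str:
+    attempts["count"] += 1
+    if attempts["count"] < 3:
+        raise ConnectionError("simulated outage")
+    return "ok"
+
+
+result = call_with_retry(flaky, sleep=delays.append)
+print(result, delays)  # ok [0.5, 1.0]
+```
+
+A production version would normally retry only transient failures (timeouts, 5xx) rather than every exception, and the spec should say which.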
+
+### Migration Feature
+
+Focus on: backward compatibility, rollback plan, data integrity, zero-downtime deployment.
+
+```markdown
+- FR-1: The migration MUST transform [old schema] to [new schema].
+- FR-2: The migration MUST be reversible (rollback script required).
+- FR-3: The migration MUST NOT cause downtime exceeding 30 seconds.
+- FR-4: The migration MUST validate data integrity post-run (row count, checksum).
+- EC-1: Migration fails mid-way → Automatic rollback, alert ops team.
+- EC-2: New schema has stricter constraints → Log invalid rows, quarantine for manual review.
+```
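+
+The post-run integrity check in FR-4 can be sketched as a fingerprint comparison. This assumes the check is applied to columns the migration is supposed to preserve unchanged; `table_fingerprint` and the sample rows are hypothetical.
+
+```python
+import hashlib
+from typing import Iterable, Sequence
+
+
+def table_fingerprint(rows: Iterable[Sequence[object]]) -> tuple[int, str]:
+    """Return (row count, order-independent SHA-256 checksum) for the rows."""
+    # Hash each row, then sort the digests so row order does not matter.
+    digests = sorted(
+        hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest()
+        for row in rows
+    )
+    checksum = hashlib.sha256("".join(digests).encode("ascii")).hexdigest()
+    return len(digests), checksum
+
+
+before = [(1, "a@example.com"), (2, "b@example.com")]
+after_ok = [(2, "b@example.com"), (1, "a@example.com")]   # same rows, new order
+after_bad = [(1, "a@example.com"), (2, "B@example.com")]  # one value changed
+
+print(table_fingerprint(before) == table_fingerprint(after_ok))   # True
+print(table_fingerprint(before) == table_fingerprint(after_bad))  # False
+```
+
+Comparing the row count alone would miss the changed value in `after_bad`, which is why FR-4 asks for both checks.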
+
+---
+
+## Checklist: Is This Spec Ready for Review?
+
+- [ ] Every section is filled (or marked N/A with reason)
+- [ ] All requirements use FR-N or NFR-N numbering (category prefixes such as NFR-P1 are fine)
+- [ ] RFC 2119 keywords are UPPERCASE
+- [ ] Every AC references at least one requirement
+- [ ] Every AC uses Given/When/Then
+- [ ] Edge cases cover each external dependency failure
+- [ ] API contracts define success AND error responses
+- [ ] Data models include all entities from requirements
+- [ ] Out of Scope lists items discussed and rejected
+- [ ] No placeholder text remains
+- [ ] Context includes evidence (metrics, tickets, research)
+- [ ] Status is "In Review" (not still "Draft")