docs: add test coverage analysis with prioritized improvement plan

Analyzes the current state of testing across 301 Python scripts (0% unit test coverage), identifies 6 priority areas for improvement, and proposes a phased implementation plan. Key findings: CI quality gate is non-blocking, calculator/scoring scripts are trivially testable, and compliance checkers carry regulatory risk without test coverage. https://claude.ai/code/session_01MsVmZoAsPvLv7rAGDBGTbL
2026-03-30 19:33:25 +00:00
parent fb1c17b064
commit 235c063701
1 changed files with 218 additions and 0 deletions
--- a/documentation/TEST_COVERAGE_ANALYSIS.md
+++ b/documentation/TEST_COVERAGE_ANALYSIS.md
@@ -0,0 +1,218 @@
 # Test Coverage Analysis
 **Date:** 2026-03-30
 **Scope:** Full repository analysis of testing infrastructure, coverage gaps, and improvement recommendations.
 ---
 ## Current State
 ### By the Numbers
 | Metric | Value |
 |--------|-------|
 | Total Python scripts | 301 |
 | Scripts with any test coverage | 0 |
 | Validation/quality scripts | 35 |
 | CI quality gate checks | 5 (YAML lint, JSON schema, Python syntax, safety audit, markdown links) |
 | Test framework configuration | None (no pytest.ini, tox.ini, etc.) |
 | Test dependencies declared | None |
 ### What Exists Today
 The repository has **no unit tests**. Quality assurance relies on:
 1. **CI quality gate** (`ci-quality-gate.yml`) - Runs syntax compilation, YAML linting, JSON schema validation, dependency safety audits, and markdown link checks. Most steps use `|| true`, making them non-blocking.
 2. **Playwright hooks** - Anti-pattern detection for Playwright test files (not test execution).
 3. **Skill validator** (`engineering/skill-tester/`) - Validates skill directory structure, script syntax, and argparse compliance. Designed for users to run on their own skills.
 4. **35 validation scripts** - Checkers and linters distributed across skills (SEO, compliance, security, API design). These are *skill products*, not repo infrastructure tests.
 ### Key Observation
 The CLAUDE.md explicitly states "No build system or test frameworks - intentional design choice for portability." However, the repository has grown to 301 Python scripts, many with pure computational logic that is highly testable and would benefit from regression protection.
 ---
 ## Coverage Gaps (Prioritized)
 ### Priority 1: Core Infrastructure Scripts (High Impact, Easy)
 **Scripts:** `scripts/generate-docs.py`, `scripts/sync-codex-skills.py`, `scripts/sync-gemini-skills.py`
 **Risk:** These scripts power the documentation site build and multi-platform sync. A regression here breaks the entire docs pipeline or causes silent data loss in skill synchronization.
 **What to test:**
 - `generate-docs.py`: Skill file discovery logic, domain categorization, YAML frontmatter parsing, MkDocs nav generation
 - `sync-*-skills.py`: Symlink creation, directory mapping, validation functions
 **Effort:** Low. Functions are mostly pure with filesystem inputs that can be mocked or tested against fixture directories.
 ---
 ### Priority 2: Calculator/Scoring Scripts (High Value, Trivial)
 **Scripts (examples):**
 - `product-team/product-manager-toolkit/scripts/rice_prioritizer.py` - RICE formula
 - `product-team/product-manager-toolkit/scripts/okr_tracker.py` - OKR scoring
 - `finance/financial-analysis/scripts/dcf_calculator.py` - DCF valuation
 - `finance/financial-analysis/scripts/ratio_analyzer.py` - Financial ratios
 - `marketing-skill/campaign-analytics/scripts/roi_calculator.py` - ROI calculations
 - `engineering/skill-tester/scripts/quality_scorer.py` - Quality scoring
 **Risk:** Incorrect calculations silently produce wrong results. Users trust these as authoritative tools.
 **What to test:**
 - Known-input/known-output parameterized tests for all formulas
 - Edge cases: zero values, negative inputs, division by zero, boundary scores
 - Categorical-to-numeric mappings (e.g., "high" -> 3)
 **Effort:** Trivial. These are pure functions with zero external dependencies.
 ---
 ### Priority 3: Parser/Analyzer Scripts (Medium Impact, Moderate Effort)
 **Scripts (examples):**
 - `marketing-skill/seo-audit/scripts/seo_checker.py` - HTML parsing + scoring
 - `marketing-skill/schema-markup/scripts/schema_validator.py` - JSON-LD validation
 - `engineering/api-design-reviewer/scripts/api_linter.py` - API spec linting
 - `engineering/docker-development/scripts/compose_validator.py` - Docker Compose validation
 - `engineering/helm-chart-builder/scripts/values_validator.py` - Helm values checking
 - `engineering/changelog-generator/scripts/commit_linter.py` - Conventional commit parsing
 **Risk:** Parsers are notoriously fragile against edge-case inputs. Malformed HTML, YAML, or JSON can cause silent failures or crashes.
 **What to test:**
 - Well-formed input produces correct parsed output
 - Malformed input is handled gracefully (no crashes, clear error messages)
 - Edge cases: empty files, very large files, unicode content, missing required fields
 **Effort:** Moderate. Requires crafting fixture files but the parser classes are self-contained.
 ---
 ### Priority 4: Compliance Checker Scripts (High Regulatory Risk)
 **Scripts:**
 - `ra-qm-team/gdpr-dsgvo-expert/scripts/gdpr_compliance_checker.py`
 - `ra-qm-team/fda-consultant-specialist/scripts/qsr_compliance_checker.py`
 - `ra-qm-team/information-security-manager-iso27001/scripts/compliance_checker.py`
 - `ra-qm-team/quality-documentation-manager/scripts/document_validator.py`
 **Risk:** Compliance tools that give false positives or false negatives have real regulatory consequences. Users rely on these for audit preparation.
 **What to test:**
 - Known-compliant inputs return passing results
 - Known-noncompliant inputs flag correct violations
 - Completeness: all documented requirements are actually checked
 - Output format consistency (JSON/human-readable modes)
 **Effort:** Moderate. Requires building compliance fixture data.
 ---
 ### Priority 5: CI Quality Gate Hardening
 **Current problem:** Most CI steps use `|| true`, meaning failures are swallowed silently. The quality gate currently cannot block a broken PR.
 **Recommended improvements:**
 - Remove `|| true` from Python syntax check (currently only checks 5 of 9+ skill directories)
 - Add `engineering/`, `business-growth/`, `finance/`, `project-management/` to the compileall step
 - Add a `--help` smoke test for all argparse-based scripts (the repo already validated 237/237 passing)
 - Add SKILL.md structure validation (required sections, YAML frontmatter)
 - Make at least syntax and import checks blocking (remove `|| true`)
 ---
 ### Priority 6: Integration/Smoke Tests for Skill Packages
 **What's missing:** No test verifies that a complete skill directory is internally consistent - that SKILL.md references to scripts and references actually exist, that scripts listed in workflows are present, etc.
 **What to test:**
 - All file paths referenced in SKILL.md exist
 - All scripts in `scripts/` directories pass `python script.py --help`
 - All referenced `references/*.md` files exist and are non-empty
 - YAML frontmatter in SKILL.md is valid
 ---
 ## Recommended Implementation Plan
 ### Phase 1: Foundation (1-2 days)
 1. Add `pytest` to a top-level `requirements-dev.txt`
 2. Create a `tests/` directory at the repo root
 3. Add pytest configuration in `pyproject.toml` (minimal)
 4. Write smoke tests: import + `--help` for all 301 scripts
 5. Harden CI: remove `|| true` from syntax checks, expand compileall scope
 ### Phase 2: Unit Tests for Pure Logic (2-3 days)
 1. Test all calculator/scoring scripts (Priority 2) - ~15 scripts, parameterized tests
 2. Test core infrastructure scripts (Priority 1) - 3 scripts with mocked filesystem
 3. Add to CI pipeline as a blocking step
 ### Phase 3: Parser and Validator Tests (3-5 days)
 1. Create fixture files for each parser type (HTML, YAML, JSON, Dockerfile, etc.)
 2. Test parser scripts (Priority 3) - ~10 scripts
 3. Test compliance checkers (Priority 4) - ~5 scripts with compliance fixtures
 4. Add to CI pipeline
 ### Phase 4: Integration Tests (2-3 days)
 1. Skill package consistency validation (Priority 6)
 2. Cross-reference validation (SKILL.md -> scripts, references)
 3. Documentation build test (generate-docs.py end-to-end)
 ---
 ## Quick Win: Starter Test Examples
 ### Example 1: RICE Calculator Test
 ```python
 # tests/test_rice_prioritizer.py
 import pytest
 import sys, os
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'product-team', 'product-manager-toolkit', 'scripts'))
 from rice_prioritizer import RICECalculator
@pytest.mark.parametrize("reach,impact,confidence,effort,expected_min", [
    (1000, "massive", "high", "medium", 500),
    (0, "high", "high", "low", 0),
    (100, "low", "low", "massive", 0),
 ])
 def test_rice_calculation(reach, impact, confidence, effort, expected_min):
    calc = RICECalculator()
    result = calc.calculate_rice(reach, impact, confidence, effort)
    assert result["score"] >= expected_min
 ```
 ### Example 2: Script Smoke Test
 ```python
 # tests/test_script_smoke.py
 import subprocess, glob, pytest
 scripts = glob.glob("**/scripts/*.py", recursive=True)
@pytest.mark.parametrize("script", scripts)
 def test_script_syntax(script):
    result = subprocess.run(["python", "-m", "py_compile", script], capture_output=True)
    assert result.returncode == 0, f"Syntax error in {script}: {result.stderr.decode()}"
 ```
 ---
 ## Summary
 The repository has **0% unit test coverage** across 301 Python scripts. The CI quality gate exists but is non-blocking (`|| true`). The highest-impact improvements are:
 1. **Harden CI** - Make syntax checks blocking, expand scope to all directories
 2. **Test pure calculations** - Trivial effort, high trust value for calculator scripts
 3. **Test infrastructure scripts** - Protect the docs build and sync pipelines
 4. **Test parsers with fixtures** - Prevent regressions in fragile parsing logic
 5. **Test compliance checkers** - Regulatory correctness matters
 The recommended phased approach adds meaningful coverage within 1-2 weeks without violating the repository's "minimal dependencies" philosophy - pytest is the only addition needed.