claude-skills-reference/documentation/TEST_COVERAGE_ANALYSIS.md
Claude 235c063701 docs: add test coverage analysis with prioritized improvement plan
Analyzes the current state of testing across 301 Python scripts (0% unit
test coverage), identifies 6 priority areas for improvement, and proposes
a phased implementation plan. Key findings: CI quality gate is non-blocking,
calculator/scoring scripts are trivially testable, and compliance checkers
carry regulatory risk without test coverage.

https://claude.ai/code/session_01MsVmZoAsPvLv7rAGDBGTbL
2026-03-30 19:33:25 +00:00


Test Coverage Analysis

Date: 2026-03-30
Scope: Full repository analysis of testing infrastructure, coverage gaps, and improvement recommendations.


Current State

By the Numbers

| Metric | Value |
|---|---|
| Total Python scripts | 301 |
| Scripts with any test coverage | 0 |
| Validation/quality scripts | 35 |
| CI quality gate checks | 5 (YAML lint, JSON schema, Python syntax, safety audit, markdown links) |
| Test framework configuration | None (no pytest.ini, tox.ini, etc.) |
| Test dependencies declared | None |

What Exists Today

The repository has no unit tests. Quality assurance relies on:

  1. CI quality gate (ci-quality-gate.yml) - Runs syntax compilation, YAML linting, JSON schema validation, dependency safety audits, and markdown link checks. Most steps use || true, making them non-blocking.
  2. Playwright hooks - Anti-pattern detection for Playwright test files (not test execution).
  3. Skill validator (engineering/skill-tester/) - Validates skill directory structure, script syntax, and argparse compliance. Designed for users to run on their own skills.
  4. 35 validation scripts - Checkers and linters distributed across skills (SEO, compliance, security, API design). These are skill products, not repo infrastructure tests.

Key Observation

CLAUDE.md explicitly states "No build system or test frameworks - intentional design choice for portability." However, the repository has grown to 301 Python scripts, many containing pure computational logic that is highly testable and would benefit from regression protection.


Coverage Gaps (Prioritized)

Priority 1: Core Infrastructure Scripts (High Impact, Easy)

Scripts: scripts/generate-docs.py, scripts/sync-codex-skills.py, scripts/sync-gemini-skills.py

Risk: These scripts power the documentation site build and multi-platform sync. A regression here could break the entire docs pipeline or cause silent data loss during skill synchronization.

What to test:

  • generate-docs.py: Skill file discovery logic, domain categorization, YAML frontmatter parsing, MkDocs nav generation
  • sync-*-skills.py: Symlink creation, directory mapping, validation functions

Effort: Low. Most functions are pure apart from their filesystem inputs, which can be mocked or replaced with fixture directories.
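A minimal sketch of the fixture-directory approach. It assumes skills live at <domain>/<skill>/SKILL.md, as the script paths in this document suggest, and group_skills_by_domain is a hypothetical stand-in for the discovery and categorization logic in generate-docs.py, whose real function names are not shown here. The test takes pytest's tmp_path fixture, so it builds a throwaway tree instead of reading the real repository.

```python
from collections import defaultdict
from pathlib import Path


def group_skills_by_domain(root: Path) -> dict[str, list[str]]:
    """Hypothetical stand-in for generate-docs.py: map each domain to its skill names."""
    domains: dict[str, list[str]] = defaultdict(list)
    # Assumed layout: <domain>/<skill>/SKILL.md; directories without SKILL.md are ignored.
    for skill_md in sorted(root.glob("*/*/SKILL.md")):
        skill_dir = skill_md.parent
        domains[skill_dir.parent.name].append(skill_dir.name)
    return dict(domains)


def make_fixture_repo(root: Path) -> None:
    """Build a tiny fixture tree: two skills plus one directory that is not a skill."""
    for rel in ("finance/financial-analysis", "engineering/skill-tester"):
        (root / rel).mkdir(parents=True)
        (root / rel / "SKILL.md").write_text("---\nname: placeholder\n---\n", encoding="utf-8")
    (root / "engineering" / "not-a-skill").mkdir()


def test_groups_skills_by_domain(tmp_path: Path) -> None:
    make_fixture_repo(tmp_path)
    assert group_skills_by_domain(tmp_path) == {
        "engineering": ["skill-tester"],
        "finance": ["financial-analysis"],
    }
```

The same fixture tree can later grow SKILL.md frontmatter cases for the parsing and nav-generation tests.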


Priority 2: Calculator/Scoring Scripts (High Value, Trivial)

Scripts (examples):

  • product-team/product-manager-toolkit/scripts/rice_prioritizer.py - RICE formula
  • product-team/product-manager-toolkit/scripts/okr_tracker.py - OKR scoring
  • finance/financial-analysis/scripts/dcf_calculator.py - DCF valuation
  • finance/financial-analysis/scripts/ratio_analyzer.py - Financial ratios
  • marketing-skill/campaign-analytics/scripts/roi_calculator.py - ROI calculations
  • engineering/skill-tester/scripts/quality_scorer.py - Quality scoring

Risk: Incorrect calculations silently produce wrong results. Users trust these as authoritative tools.

What to test:

  • Known-input/known-output parameterized tests for all formulas
  • Edge cases: zero values, negative inputs, division by zero, boundary scores
  • Categorical-to-numeric mappings (e.g., "high" -> 3)

Effort: Trivial. These are pure functions with zero external dependencies.
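A sketch of known-input/known-output and edge-case tests, written against a hypothetical rice_score function rather than the actual API of rice_prioritizer.py. It assumes the commonly published RICE formula, (Reach × Impact × Confidence) / Effort, with the usual impact scale (massive 3, high 2, medium 1, low 0.5, minimal 0.25); the real script's labels and values may differ, so expected scores should come from its own documentation. The case table is plain data so it can feed @pytest.mark.parametrize directly.

```python
import math

# Commonly published RICE impact scale; rice_prioritizer.py may use different labels or values.
IMPACT = {"massive": 3.0, "high": 2.0, "medium": 1.0, "low": 0.5, "minimal": 0.25}


def rice_score(reach: float, impact: str, confidence: float, effort: float) -> float:
    """Hypothetical reference implementation: (Reach x Impact x Confidence) / Effort."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    return reach * IMPACT[impact] * confidence / effort


# Known-input/known-output table: (reach, impact, confidence, effort, expected score).
CASES = [
    (1000, "massive", 1.0, 2, 1500.0),
    (500, "medium", 0.8, 1, 400.0),
    (0, "high", 0.5, 3, 0.0),  # zero reach must give exactly zero
]


def test_known_values() -> None:
    for reach, impact, confidence, effort, expected in CASES:
        assert math.isclose(rice_score(reach, impact, confidence, effort), expected)


def test_zero_effort_is_rejected() -> None:
    # Division-by-zero edge case: fail loudly instead of returning inf.
    try:
        rice_score(100, "low", 1.0, 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for zero effort")


def test_impact_labels_are_ordered() -> None:
    # Categorical-to-numeric mapping: a higher label must never score lower.
    values = list(IMPACT.values())
    assert values == sorted(values, reverse=True)
```

Once the real script's mappings are confirmed, the same table can drive parameterized tests against rice_prioritizer.py itself.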


Priority 3: Parser/Analyzer Scripts (Medium Impact, Moderate Effort)

Scripts (examples):

  • marketing-skill/seo-audit/scripts/seo_checker.py - HTML parsing + scoring
  • marketing-skill/schema-markup/scripts/schema_validator.py - JSON-LD validation
  • engineering/api-design-reviewer/scripts/api_linter.py - API spec linting
  • engineering/docker-development/scripts/compose_validator.py - Docker Compose validation
  • engineering/helm-chart-builder/scripts/values_validator.py - Helm values checking
  • engineering/changelog-generator/scripts/commit_linter.py - Conventional commit parsing

Risk: Parsers are notoriously fragile against edge-case inputs. Malformed HTML, YAML, or JSON can cause silent failures or crashes.

What to test:

  • Well-formed input produces correct parsed output
  • Malformed input is handled gracefully (no crashes, clear error messages)
  • Edge cases: empty files, very large files, unicode content, missing required fields

Effort: Moderate. Requires crafting fixture files, but the parser classes are self-contained.
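A sketch of the well-formed / malformed / edge-case split for one parser type, using a hypothetical parse_commit_header function built on a simplified Conventional Commits header regex (lowercase type, optional scope, optional ! breaking marker, colon, space, subject). It is not the code in commit_linter.py; the point is the shape of the tests, in particular that malformed input returns None instead of raising.

```python
import re
from typing import Optional

# Simplified Conventional Commits header: type(scope)!: subject (lowercase types only).
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()\r\n]+)\))?(?P<breaking>!)?: (?P<subject>\S.*)$"
)


def parse_commit_header(header: str) -> Optional[dict]:
    """Hypothetical parser: return the header fields, or None for malformed input."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["breaking"] = fields["breaking"] == "!"
    return fields


def test_well_formed_header() -> None:
    assert parse_commit_header("docs: add test coverage analysis") == {
        "type": "docs", "scope": None, "breaking": False, "subject": "add test coverage analysis",
    }


def test_scope_and_breaking_marker() -> None:
    parsed = parse_commit_header("feat(api)!: drop v1 endpoints")
    assert parsed is not None and parsed["scope"] == "api" and parsed["breaking"] is True


def test_malformed_and_empty_input_returns_none() -> None:
    for bad in ("", "   ", "no colon here", "feat:missing space", "feat(): empty scope"):
        assert parse_commit_header(bad) is None


def test_unicode_subject() -> None:
    parsed = parse_commit_header("fix: handle café and 日本語 names")
    assert parsed is not None and parsed["subject"] == "handle café and 日本語 names"
```

The same three-way split carries over to the HTML, JSON-LD, Compose, and Helm validators, with fixture files in place of inline strings.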


Priority 4: Compliance Checker Scripts (High Regulatory Risk)

Scripts:

  • ra-qm-team/gdpr-dsgvo-expert/scripts/gdpr_compliance_checker.py
  • ra-qm-team/fda-consultant-specialist/scripts/qsr_compliance_checker.py
  • ra-qm-team/information-security-manager-iso27001/scripts/compliance_checker.py
  • ra-qm-team/quality-documentation-manager/scripts/document_validator.py

Risk: Compliance tools that give false positives or false negatives have real regulatory consequences. Users rely on these for audit preparation.

What to test:

  • Known-compliant inputs return passing results
  • Known-noncompliant inputs flag correct violations
  • Completeness: all documented requirements are actually checked
  • Output format consistency (JSON/human-readable modes)

Effort: Moderate. Requires building compliance fixture data.
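A sketch of the four checks above against a deliberately tiny, hypothetical checker. The requirement IDs (lawful_basis, retention_period, dpo_contact) and the policy dictionary format are illustrative placeholders, not what gdpr_compliance_checker.py actually checks. The most useful pattern is the completeness test: it compares the set of requirements the checker reports on with the documented list, so a requirement that is silently skipped fails the build.

```python
import json

# Placeholder for the requirement list a checker documents (e.g. in its SKILL.md).
DOCUMENTED_REQUIREMENTS = {"lawful_basis", "retention_period", "dpo_contact"}


def check_policy(policy: dict) -> dict:
    """Hypothetical checker: evaluate each requirement and report violations."""
    retention = policy.get("retention_days")
    results = {
        "lawful_basis": bool(policy.get("lawful_basis")),
        "retention_period": isinstance(retention, int) and retention > 0,
        "dpo_contact": "@" in policy.get("dpo_email", ""),
    }
    violations = sorted(req for req, ok in results.items() if not ok)
    return {"checked": sorted(results), "violations": violations, "compliant": not violations}


COMPLIANT = {"lawful_basis": "consent", "retention_days": 365, "dpo_email": "dpo@example.com"}


def test_known_compliant_input_passes() -> None:
    assert check_policy(COMPLIANT)["compliant"] is True


def test_known_noncompliant_input_flags_the_right_violation() -> None:
    report = check_policy({**COMPLIANT, "retention_days": 0})
    assert report["violations"] == ["retention_period"]


def test_every_documented_requirement_is_checked() -> None:
    assert set(check_policy(COMPLIANT)["checked"]) == DOCUMENTED_REQUIREMENTS


def test_json_output_round_trips() -> None:
    # Output format consistency: the report must survive JSON serialization unchanged.
    report = check_policy({})
    assert json.loads(json.dumps(report)) == report
```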


Priority 5: CI Quality Gate Hardening

Current problem: Most CI steps use || true, meaning failures are swallowed silently. The quality gate currently cannot block a broken PR.

Recommended improvements:

  • Make at least the Python syntax and import checks blocking by removing || true
  • Expand the compileall step, which currently covers only 5 of 9+ skill directories, to include engineering/, business-growth/, finance/, and project-management/
  • Add a --help smoke test for all argparse-based scripts (the repo has already validated 237/237 passing)
  • Add SKILL.md structure validation (required sections, YAML frontmatter)
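One way to make the syntax check both blocking and complete is to discover files instead of listing directories. The sketch below is a hypothetical helper (for example saved as scripts/check_syntax.py, a name that does not exist in the repository) that compiles every .py file in memory with Python's built-in compile(), so no .pyc files are written, and returns a non-zero status when anything fails. It swaps in a small custom script for python -m compileall; either approach works as long as the CI step is not followed by || true.

```python
from pathlib import Path


def find_syntax_errors(root: Path) -> list[str]:
    """Compile every .py file under root in memory and return one message per failure."""
    errors = []
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root)
        if any(part.startswith(".") for part in rel.parts):
            continue  # skip .git, .venv and other hidden directories
        try:
            compile(path.read_text(encoding="utf-8"), str(rel), "exec")
        except (SyntaxError, ValueError) as exc:
            # ValueError also covers UnicodeDecodeError and, on older Pythons, null bytes.
            errors.append(f"{rel}: {exc}")
    return errors


def main(argv: list[str]) -> int:
    """Return 1 on any failure so the CI step fails instead of being swallowed."""
    root = Path(argv[0]) if argv else Path(".")
    errors = find_syntax_errors(root)
    print("\n".join(errors) if errors else "All Python files compile.")
    return 1 if errors else 0
```

Saved as a script ending in the usual __main__ guard that calls sys.exit(main(sys.argv[1:])), the CI step can run it against the repository root with no || true appended.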

Priority 6: Integration/Smoke Tests for Skill Packages

What's missing: No test verifies that a complete skill directory is internally consistent - that the scripts and reference files SKILL.md points to actually exist, that scripts listed in workflows are present, etc.

What to test:

  • All file paths referenced in SKILL.md exist
  • All scripts in scripts/ directories exit with status 0 when run as python script.py --help
  • All referenced references/*.md files exist and are non-empty
  • YAML frontmatter in SKILL.md is valid
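A minimal sketch of the first and third checks. It assumes SKILL.md mentions bundled files as relative paths beginning with scripts/ or references/, which is an assumption about the authoring convention rather than a documented rule, and missing_references is a hypothetical helper. YAML frontmatter validation is left out because it would need a YAML parser such as PyYAML.

```python
import re
from pathlib import Path

# Assumption: bundled files are referenced as relative paths like scripts/x.py or references/y.md.
PATH_RE = re.compile(r"\b(?:scripts|references)/[\w./-]+\.(?:py|md)\b")


def missing_references(skill_dir: Path) -> list[str]:
    """Return paths mentioned in SKILL.md that are missing or empty in the skill directory."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    problems = []
    for rel in sorted(set(PATH_RE.findall(text))):
        target = skill_dir / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif target.stat().st_size == 0:
            problems.append(f"empty: {rel}")
    return problems


def test_reports_missing_and_empty_files(tmp_path: Path) -> None:
    (tmp_path / "scripts").mkdir()
    (tmp_path / "references").mkdir()
    (tmp_path / "scripts" / "a.py").write_text("print('ok')\n", encoding="utf-8")
    (tmp_path / "references" / "guide.md").write_text("", encoding="utf-8")
    (tmp_path / "SKILL.md").write_text(
        "Run scripts/a.py or scripts/gone.py.\nSee references/guide.md.\n", encoding="utf-8"
    )
    assert missing_references(tmp_path) == ["empty: references/guide.md", "missing: scripts/gone.py"]
```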

Phased Implementation Plan

Phase 1: Foundation (1-2 days)

  1. Add pytest to a top-level requirements-dev.txt
  2. Create a tests/ directory at the repo root
  3. Add pytest configuration in pyproject.toml (minimal)
  4. Write smoke tests: syntax/import checks for all 301 scripts, plus --help for the argparse-based ones
  5. Harden CI: remove || true from syntax checks, expand compileall scope
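The --help half of the smoke tests can be sketched as a helper that runs each script in a subprocess using the interpreter running the tests (sys.executable), with stdin closed and a timeout, so a script that waits for input or does heavy work at import time fails fast instead of hanging CI. help_exits_cleanly is a hypothetical name, and checking for "usage:" assumes argparse's default help format, so it only applies to argparse-based scripts.

```python
import subprocess
import sys
from pathlib import Path


def help_exits_cleanly(script: Path, timeout: float = 30.0) -> bool:
    """Run `python <script> --help`; pass if it exits 0 and prints an argparse usage line."""
    try:
        result = subprocess.run(
            [sys.executable, str(script), "--help"],
            stdin=subprocess.DEVNULL, capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # hanging on --help counts as a failure
    return result.returncode == 0 and "usage:" in result.stdout.lower()


def test_help_smoke(tmp_path: Path) -> None:
    good = tmp_path / "good.py"
    good.write_text("import argparse\nargparse.ArgumentParser(description='demo').parse_args()\n")
    broken = tmp_path / "broken.py"
    broken.write_text("raise RuntimeError('fails at import time')\n")
    assert help_exits_cleanly(good)
    assert not help_exits_cleanly(broken)
```

Pointed at the script list from the smoke test example instead of tmp_path fixtures, this becomes one parametrized test per script.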

Phase 2: Unit Tests for Pure Logic (2-3 days)

  1. Test all calculator/scoring scripts (Priority 2) - ~15 scripts, parameterized tests
  2. Test core infrastructure scripts (Priority 1) - 3 scripts with mocked filesystem
  3. Add to CI pipeline as a blocking step
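Instead of mocking the filesystem, the sync scripts can also be exercised against a real temporary directory, which catches symlink edge cases a mock would hide. sync_skill_links is a hypothetical stand-in for sync-codex-skills.py or sync-gemini-skills.py, whose real target layout is not described here; it assumes one symlink per skill and pins down two properties: links resolve to the source skill, and a second run is a no-op. Because os.symlink raises FileExistsError when a real directory already occupies the link path, the sketch refuses to overwrite data rather than silently replacing it.

```python
import os
from pathlib import Path


def sync_skill_links(source_root: Path, target_root: Path) -> list[str]:
    """Hypothetical sync step: symlink each <domain>/<skill> dir to target_root/<skill>."""
    target_root.mkdir(parents=True, exist_ok=True)
    created = []
    for skill_md in sorted(source_root.glob("*/*/SKILL.md")):
        skill_dir = skill_md.parent
        link = target_root / skill_dir.name
        if link.is_symlink() and link.resolve() == skill_dir.resolve():
            continue  # already correct: re-running must change nothing
        if link.is_symlink():
            link.unlink()  # stale or broken link pointing somewhere else
        # Raises FileExistsError if a real file or directory is in the way.
        os.symlink(skill_dir.resolve(), link, target_is_directory=True)
        created.append(skill_dir.name)
    return created


def test_sync_is_idempotent(tmp_path: Path) -> None:
    skill = tmp_path / "src" / "finance" / "financial-analysis"
    skill.mkdir(parents=True)
    (skill / "SKILL.md").write_text("---\nname: financial-analysis\n---\n", encoding="utf-8")
    target = tmp_path / "out"
    assert sync_skill_links(tmp_path / "src", target) == ["financial-analysis"]
    assert (target / "financial-analysis" / "SKILL.md").is_file()
    assert sync_skill_links(tmp_path / "src", target) == []  # second run is a no-op
```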

Phase 3: Parser and Validator Tests (3-5 days)

  1. Create fixture files for each parser type (HTML, YAML, JSON, Dockerfile, etc.)
  2. Test parser scripts (Priority 3) - ~10 scripts
  3. Test compliance checkers (Priority 4) - ~5 scripts with compliance fixtures
  4. Add to CI pipeline

Phase 4: Integration Tests (2-3 days)

  1. Skill package consistency validation (Priority 6)
  2. Cross-reference validation (SKILL.md -> scripts, references)
  3. Documentation build test (generate-docs.py end-to-end)

Quick Win: Starter Test Examples

Example 1: RICE Calculator Test

```python
# tests/test_rice_prioritizer.py
import sys
from pathlib import Path

import pytest

# tests/ sits at the repo root; make the skill's scripts/ directory importable.
SCRIPTS_DIR = Path(__file__).resolve().parents[1] / "product-team" / "product-manager-toolkit" / "scripts"
sys.path.insert(0, str(SCRIPTS_DIR))

from rice_prioritizer import RICECalculator  # noqa: E402  (import after the sys.path change)


@pytest.mark.parametrize("reach,impact,confidence,effort,expected_min", [
    (1000, "massive", "high", "medium", 500),
    (0, "high", "high", "low", 0),
    (100, "low", "low", "massive", 0),
])
def test_rice_calculation(reach, impact, confidence, effort, expected_min):
    calc = RICECalculator()
    result = calc.calculate_rice(reach, impact, confidence, effort)
    # Lower-bound checks only; exact known-output assertions would catch more regressions.
    assert result["score"] >= expected_min
```

Example 2: Script Smoke Test

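```python
# tests/test_script_smoke.py
import glob
import subprocess
import sys
from pathlib import Path

import pytest

# root_dir (Python 3.10+) anchors the search at the repo root, not the current directory.
REPO_ROOT = Path(__file__).resolve().parents[1]
SCRIPTS = sorted(glob.glob("**/scripts/*.py", root_dir=REPO_ROOT, recursive=True))

@pytest.mark.parametrize("script", SCRIPTS)
def test_script_syntax(script):
    # sys.executable is the interpreter running pytest, not whatever "python" is on PATH.
    cmd = [sys.executable, "-m", "py_compile", str(REPO_ROOT / script)]
    result = subprocess.run(cmd, capture_output=True, text=True)
    assert result.returncode == 0, f"Syntax error in {script}: {result.stderr}"
```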

Summary

The repository has 0% unit test coverage across 301 Python scripts. The CI quality gate exists but is non-blocking (|| true). The highest-impact improvements are:

  1. Harden CI - Make syntax checks blocking, expand scope to all directories
  2. Test pure calculations - Trivial effort, high trust value for calculator scripts
  3. Test infrastructure scripts - Protect the docs build and sync pipelines
  4. Test parsers with fixtures - Prevent regressions in fragile parsing logic
  5. Test compliance checkers - Regulatory correctness matters

The recommended phased approach adds meaningful coverage in roughly 8-13 working days (Phases 1-4 combined) without violating the repository's "minimal dependencies" philosophy - pytest is the only addition needed.