# Test Coverage Analysis
**Date:** 2026-03-30
**Scope:** Full repository analysis of testing infrastructure, coverage gaps, and improvement recommendations.
## Current State

### By the Numbers
| Metric | Value |
|---|---|
| Total Python scripts | 301 |
| Scripts with any test coverage | 0 |
| Validation/quality scripts | 35 |
| CI quality gate checks | 5 (YAML lint, JSON schema, Python syntax, safety audit, markdown links) |
| Test framework configuration | None (no `pytest.ini`, `tox.ini`, etc.) |
| Test dependencies declared | None |
### What Exists Today
The repository has no unit tests. Quality assurance relies on:
- **CI quality gate** (`ci-quality-gate.yml`) - Runs syntax compilation, YAML linting, JSON schema validation, dependency safety audits, and markdown link checks. Most steps use `|| true`, making them non-blocking.
- **Playwright hooks** - Anti-pattern detection for Playwright test files (not test execution).
- **Skill validator** (`engineering/skill-tester/`) - Validates skill directory structure, script syntax, and argparse compliance. Designed for users to run on their own skills.
- **35 validation scripts** - Checkers and linters distributed across skills (SEO, compliance, security, API design). These are skill products, not repo infrastructure tests.
### Key Observation

`CLAUDE.md` explicitly states "No build system or test frameworks - intentional design choice for portability." However, the repository has grown to 301 Python scripts, many containing pure computational logic that is highly testable and would benefit from regression protection.
## Coverage Gaps (Prioritized)

### Priority 1: Core Infrastructure Scripts (High Impact, Easy)
**Scripts:** `scripts/generate-docs.py`, `scripts/sync-codex-skills.py`, `scripts/sync-gemini-skills.py`

**Risk:** These scripts power the documentation site build and multi-platform sync. A regression here breaks the entire docs pipeline or causes silent data loss in skill synchronization.
**What to test:**
- `generate-docs.py`: skill file discovery logic, domain categorization, YAML frontmatter parsing, MkDocs nav generation
- `sync-*-skills.py`: symlink creation, directory mapping, validation functions
**Effort:** Low. Functions are mostly pure with filesystem inputs that can be mocked or tested against fixture directories.
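One way to cover the discovery logic without touching the real tree is a fixture directory built with pytest's `tmp_path`. The `discover_skills` helper below is a hypothetical stand-in for whatever `generate-docs.py` actually names its discovery function; the fixture-directory pattern is the point:

```python
# Sketch only: discover_skills is a stand-in for the real discovery logic
# in generate-docs.py; its name and signature are assumptions.
from pathlib import Path


def discover_skills(root: Path) -> list[Path]:
    """Return the directories that contain a SKILL.md file."""
    return sorted(p.parent for p in root.rglob("SKILL.md"))


def test_discover_skills_finds_only_skill_dirs(tmp_path):
    # Build a fixture tree: one valid skill, one unrelated directory
    skill = tmp_path / "finance" / "dcf-skill"
    skill.mkdir(parents=True)
    (skill / "SKILL.md").write_text("---\nname: dcf\n---\n")
    (tmp_path / "docs").mkdir()  # should not be picked up

    assert discover_skills(tmp_path) == [skill]
```

The same fixture tree can be reused to exercise domain categorization and nav generation once the real function names are known.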
### Priority 2: Calculator/Scoring Scripts (High Value, Trivial)
**Scripts (examples):**
- `product-team/product-manager-toolkit/scripts/rice_prioritizer.py` - RICE formula
- `product-team/product-manager-toolkit/scripts/okr_tracker.py` - OKR scoring
- `finance/financial-analysis/scripts/dcf_calculator.py` - DCF valuation
- `finance/financial-analysis/scripts/ratio_analyzer.py` - Financial ratios
- `marketing-skill/campaign-analytics/scripts/roi_calculator.py` - ROI calculations
- `engineering/skill-tester/scripts/quality_scorer.py` - Quality scoring
**Risk:** Incorrect calculations silently produce wrong results. Users trust these as authoritative tools.

**What to test:**
- Known-input/known-output parameterized tests for all formulas
- Edge cases: zero values, negative inputs, division by zero, boundary scores
- Categorical-to-numeric mappings (e.g., `"high"` -> `3`)
**Effort:** Trivial. These are pure functions with zero external dependencies.
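As a sketch, the edge cases above translate into parameterized tests like the following. The formula is the standard RICE definition (reach × impact × confidence / effort), but the categorical mappings here are illustrative and may not match `rice_prioritizer.py` exactly:

```python
import pytest

# Illustrative mappings; the real tables in rice_prioritizer.py may differ.
IMPACT = {"massive": 3.0, "high": 2.0, "medium": 1.0, "low": 0.5}
CONFIDENCE = {"high": 1.0, "medium": 0.8, "low": 0.5}


def rice_score(reach: float, impact: str, confidence: str, effort: float) -> float:
    """Standard RICE: reach * impact * confidence / effort."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    return reach * IMPACT[impact] * CONFIDENCE[confidence] / effort


@pytest.mark.parametrize("reach,impact,confidence,effort,expected", [
    (1000, "massive", "high", 2.0, 1500.0),  # 1000 * 3.0 * 1.0 / 2.0
    (0, "high", "high", 1.0, 0.0),           # zero reach -> zero score
])
def test_rice_known_values(reach, impact, confidence, effort, expected):
    assert rice_score(reach, impact, confidence, effort) == expected


def test_rice_rejects_zero_effort():
    # Division-by-zero must surface as a clear error, not a crash
    with pytest.raises(ValueError):
        rice_score(100, "high", "high", 0)
```

Known-input/known-output pairs like these double as executable documentation of each formula.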
### Priority 3: Parser/Analyzer Scripts (Medium Impact, Moderate Effort)
**Scripts (examples):**
- `marketing-skill/seo-audit/scripts/seo_checker.py` - HTML parsing + scoring
- `marketing-skill/schema-markup/scripts/schema_validator.py` - JSON-LD validation
- `engineering/api-design-reviewer/scripts/api_linter.py` - API spec linting
- `engineering/docker-development/scripts/compose_validator.py` - Docker Compose validation
- `engineering/helm-chart-builder/scripts/values_validator.py` - Helm values checking
- `engineering/changelog-generator/scripts/commit_linter.py` - Conventional commit parsing
**Risk:** Parsers are notoriously fragile against edge-case inputs. Malformed HTML, YAML, or JSON can cause silent failures or crashes.

**What to test:**
- Well-formed input produces correct parsed output
- Malformed input is handled gracefully (no crashes, clear error messages)
- Edge cases: empty files, very large files, Unicode content, missing required fields
**Effort:** Moderate. Requires crafting fixture files, but the parser classes are self-contained.
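A minimal sketch of the malformed-input pattern, using a hypothetical JSON-LD validator (the real `schema_validator.py` API will differ; the point is that bad input yields error reports, not exceptions):

```python
import json

# Stand-in for a parser like schema_validator.py; the function name and
# return shape are assumptions for illustration.
def validate_jsonld(text: str) -> dict:
    """Return {'ok': bool, 'errors': [...]} instead of raising on bad input."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as e:
        return {"ok": False, "errors": [f"invalid JSON: {e.msg}"]}
    if not isinstance(data, dict) or "@type" not in data:
        return {"ok": False, "errors": ["missing @type"]}
    return {"ok": True, "errors": []}


def test_well_formed_input():
    assert validate_jsonld('{"@type": "Article"}')["ok"] is True


def test_malformed_input_does_not_crash():
    result = validate_jsonld("{not json")
    assert result["ok"] is False and result["errors"]


def test_empty_file():
    assert validate_jsonld("")["ok"] is False
```

For HTML and YAML parsers, the same shape applies with fixture files on disk instead of inline strings.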
### Priority 4: Compliance Checker Scripts (High Regulatory Risk)
**Scripts:**
- `ra-qm-team/gdpr-dsgvo-expert/scripts/gdpr_compliance_checker.py`
- `ra-qm-team/fda-consultant-specialist/scripts/qsr_compliance_checker.py`
- `ra-qm-team/information-security-manager-iso27001/scripts/compliance_checker.py`
- `ra-qm-team/quality-documentation-manager/scripts/document_validator.py`
**Risk:** Compliance tools that give false positives or false negatives have real regulatory consequences. Users rely on these for audit preparation.

**What to test:**
- Known-compliant inputs return passing results
- Known-noncompliant inputs flag correct violations
- Completeness: all documented requirements are actually checked
- Output format consistency (JSON/human-readable modes)
**Effort:** Moderate. Requires building compliance fixture data.
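One way to structure those fixtures is as compliant/noncompliant pairs with expected violations. The `check_retention_policy` function and the violation codes below are purely illustrative, not the real checker's API:

```python
import pytest

# Hypothetical single-requirement checker: the real compliance scripts
# check many requirements, but each can follow this fixture pattern.
def check_retention_policy(doc: dict) -> list[str]:
    """Return a list of violation messages (empty list means compliant)."""
    violations = []
    if "retention_period_days" not in doc:
        violations.append("retention: no retention period defined")
    elif doc["retention_period_days"] <= 0:
        violations.append("retention: retention period must be positive")
    return violations


# Each fixture pairs an input document with its expected violations
FIXTURES = [
    ({"retention_period_days": 365}, []),                          # compliant
    ({}, ["retention: no retention period defined"]),              # missing field
    ({"retention_period_days": 0},
     ["retention: retention period must be positive"]),            # invalid value
]


@pytest.mark.parametrize("doc,expected", FIXTURES)
def test_retention_policy(doc, expected):
    assert check_retention_policy(doc) == expected
```

Enumerating fixtures per documented requirement also makes the completeness check concrete: a requirement with no fixture pair is a requirement without test coverage.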
### Priority 5: CI Quality Gate Hardening
**Current problem:** Most CI steps use `|| true`, meaning failures are swallowed silently. The quality gate currently cannot block a broken PR.

**Recommended improvements:**
- Remove `|| true` from the Python syntax check (currently only checks 5 of 9+ skill directories)
- Add `engineering/`, `business-growth/`, `finance/`, and `project-management/` to the compileall step
- Add a `--help` smoke test for all argparse-based scripts (the repo already validated 237/237 passing)
- Add SKILL.md structure validation (required sections, YAML frontmatter)
- Make at least syntax and import checks blocking (remove `|| true`)
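As a sketch, a hardened (blocking) syntax-check step might look like the following; the step name and directory list are assumptions that would need to match the real `ci-quality-gate.yml`:

```yaml
# Blocking syntax check: no `|| true`, so any failure fails the job
- name: Python syntax check (blocking)
  run: |
    python -m compileall -q \
      scripts/ engineering/ business-growth/ finance/ \
      project-management/ product-team/ marketing-skill/ ra-qm-team/
```

With `-q`, compileall prints only errors, and its nonzero exit code now propagates to the job status.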
### Priority 6: Integration/Smoke Tests for Skill Packages

**What's missing:** No test verifies that a complete skill directory is internally consistent - that the scripts and references mentioned in SKILL.md actually exist, that scripts listed in workflows are present, and so on.

**What to test:**
- All file paths referenced in SKILL.md exist
- All scripts in `scripts/` directories pass `python script.py --help`
- All referenced `references/*.md` files exist and are non-empty
- YAML frontmatter in SKILL.md is valid
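A consistency test along these lines could live in the proposed `tests/` directory. The path-extraction regex below is an assumption about how SKILL.md files reference their scripts and references:

```python
# Sketch: assumes SKILL.md files mention their assets as relative paths
# like scripts/foo.py or references/bar.md; the regex is an approximation.
import re
from pathlib import Path

REPO_ROOT = Path(__file__).resolve().parent.parent  # tests/ -> repo root


def referenced_paths(skill_md: Path):
    """Yield the relative paths a SKILL.md file refers to."""
    text = skill_md.read_text(encoding="utf-8")
    for match in re.finditer(r"(?:scripts|references)/[\w./-]+", text):
        yield skill_md.parent / match.group(0)


def test_skill_references_exist():
    missing = []
    for skill_md in REPO_ROOT.rglob("SKILL.md"):
        for path in referenced_paths(skill_md):
            if not path.exists():
                missing.append(str(path))
    assert not missing, f"Dangling SKILL.md references: {missing}"
```

Collecting all failures into one list (rather than asserting per file) gives a full report of dangling references in a single run.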
## Recommended Implementation Plan

### Phase 1: Foundation (1-2 days)
- Add `pytest` to a top-level `requirements-dev.txt`
- Create a `tests/` directory at the repo root
- Add minimal pytest configuration in `pyproject.toml`
- Write smoke tests: import + `--help` for all 301 scripts
- Harden CI: remove `|| true` from syntax checks, expand the compileall scope
### Phase 2: Unit Tests for Pure Logic (2-3 days)
- Test all calculator/scoring scripts (Priority 2) - ~15 scripts, parameterized tests
- Test core infrastructure scripts (Priority 1) - 3 scripts with mocked filesystem
- Add to CI pipeline as a blocking step
### Phase 3: Parser and Validator Tests (3-5 days)
- Create fixture files for each parser type (HTML, YAML, JSON, Dockerfile, etc.)
- Test parser scripts (Priority 3) - ~10 scripts
- Test compliance checkers (Priority 4) - ~5 scripts with compliance fixtures
- Add to CI pipeline
### Phase 4: Integration Tests (2-3 days)
- Skill package consistency validation (Priority 6)
- Cross-reference validation (SKILL.md -> scripts, references)
- Documentation build test (generate-docs.py end-to-end)
## Quick Win: Starter Test Examples

### Example 1: RICE Calculator Test
```python
# tests/test_rice_prioritizer.py
import os
import sys

import pytest

# Make the skill's scripts importable without packaging
sys.path.insert(0, os.path.join(
    os.path.dirname(__file__), "..",
    "product-team", "product-manager-toolkit", "scripts"))

from rice_prioritizer import RICECalculator


@pytest.mark.parametrize("reach,impact,confidence,effort,expected_min", [
    (1000, "massive", "high", "medium", 500),
    (0, "high", "high", "low", 0),
    (100, "low", "low", "massive", 0),
])
def test_rice_calculation(reach, impact, confidence, effort, expected_min):
    calc = RICECalculator()
    result = calc.calculate_rice(reach, impact, confidence, effort)
    assert result["score"] >= expected_min
```
### Example 2: Script Smoke Test
```python
# tests/test_script_smoke.py
import glob
import subprocess
import sys

import pytest

scripts = glob.glob("**/scripts/*.py", recursive=True)


@pytest.mark.parametrize("script", scripts)
def test_script_syntax(script):
    # Use the running interpreter rather than whatever "python" resolves to
    result = subprocess.run(
        [sys.executable, "-m", "py_compile", script],
        capture_output=True)
    assert result.returncode == 0, \
        f"Syntax error in {script}: {result.stderr.decode()}"
```
## Summary

The repository has 0% unit test coverage across 301 Python scripts. The CI quality gate exists but is non-blocking (`|| true`). The highest-impact improvements are:
- **Harden CI** - Make syntax checks blocking, expand scope to all directories
- **Test pure calculations** - Trivial effort, high trust value for calculator scripts
- **Test infrastructure scripts** - Protect the docs build and sync pipelines
- **Test parsers with fixtures** - Prevent regressions in fragile parsing logic
- **Test compliance checkers** - Regulatory correctness matters
The recommended phased approach adds meaningful coverage within 1-2 weeks without violating the repository's "minimal dependencies" philosophy - pytest is the only addition needed.