Files
skill-seekers-reference/TESTING_GAP_REPORT.md
yusyus 73adda0b17 docs: update all chunk flag names to match renamed CLI flags
Replace all occurrences of old ambiguous flag names with the new explicit ones:
  --chunk-size (tokens)  → --chunk-tokens
  --chunk-overlap        → --chunk-overlap-tokens
  --chunk                → --chunk-for-rag
  --streaming-chunk-size → --streaming-chunk-chars
  --streaming-overlap    → --streaming-overlap-chars
  --chunk-size (pages)   → --pdf-pages-per-chunk

Updated: CLI_REFERENCE (EN+ZH), user-guide (EN+ZH), integrations (Haystack,
Chroma, Weaviate, FAISS, Qdrant), features/PDF_CHUNKING, examples/haystack-pipeline,
strategy docs, archive docs, and CHANGELOG.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 22:15:14 +03:00

346 lines
10 KiB
Markdown

# Comprehensive Testing Gap Report
**Project:** Skill Seekers v3.1.0
**Date:** 2026-02-22
**Total Test Files:** 113
**Total Test Functions:** ~208+ (collected: 2173 tests)
---
## Executive Summary
### Overall Test Health: 🟡 GOOD with Gaps
| Category | Status | Coverage | Key Gaps |
|----------|--------|----------|----------|
| CLI Arguments | ✅ Good | 85% | Some edge cases |
| Workflow System | ✅ Excellent | 90% | Inline stage parsing edge cases |
| Scrapers | 🟡 Moderate | 70% | Missing real HTTP/PDF tests |
| Enhancement | 🟡 Partial | 60% | Core logic not tested |
| MCP Tools | 🟡 Good | 75% | 8 tools not covered |
| Integration/E2E | 🟡 Moderate | 65% | Heavy mocking |
| Adaptors | ✅ Good | 80% | Good coverage per platform |
---
## Detailed Findings by Category
### 1. CLI Argument Tests ✅ GOOD
**Files Reviewed:**
- `test_analyze_command.py` (269 lines, 26 tests)
- `test_unified.py` - TestUnifiedCLIArguments class (6 tests)
- `test_pdf_scraper.py` - TestPDFCLIArguments class (4 tests)
- `test_create_arguments.py` (399 lines)
- `test_create_integration_basic.py` (310 lines, 23 tests)
**Strengths:**
- All new workflow flags are tested (`--enhance-workflow`, `--enhance-stage`, `--var`, `--workflow-dry-run`)
- Argument parsing thoroughly tested
- Default values verified
- Complex command combinations tested
**Gaps:**
- `test_create_integration_basic.py`: 2 tests skipped (source auto-detection not fully tested)
- No tests for invalid argument combinations beyond basic parsing errors
---
### 2. Workflow Tests ✅ EXCELLENT
**Files Reviewed:**
- `test_workflow_runner.py` (445 lines, 30+ tests)
- `test_workflows_command.py` (571 lines, 40+ tests)
- `test_workflow_tools_mcp.py` (295 lines, 20+ tests)
**Strengths:**
- Comprehensive workflow execution tests
- Variable substitution thoroughly tested
- Dry-run mode tested
- Workflow chaining tested
- All 6 workflow subcommands tested (list, show, copy, add, remove, validate)
- MCP workflow tools tested
**Minor Gaps:**
- No tests for `_build_inline_engine` edge cases
- No tests for malformed stage specs (empty, invalid format)
---
### 3. Scraper Tests 🟡 MODERATE with Significant Gaps
**Files Reviewed:**
- `test_scraper_features.py` (524 lines) - Doc scraper features
- `test_codebase_scraper.py` (478 lines) - Codebase analysis
- `test_pdf_scraper.py` (558 lines) - PDF scraper
- `test_github_scraper.py` (1015 lines) - GitHub scraper
- `test_unified_analyzer.py` (428 lines) - Unified analyzer
**Critical Gaps:**
#### A. Missing Real External Resource Tests
| Resource | Test Type | Status |
|----------|-----------|--------|
| HTTP Requests (docs) | Mocked only | ❌ Gap |
| PDF Extraction | Mocked only | ❌ Gap |
| GitHub API | Mocked only | ❌ Gap (acceptable) |
| Local Files | Real tests | ✅ Good |
#### B. Missing Core Function Tests
| Function | Location | Priority |
|----------|----------|----------|
| `UnifiedScraper.run()` | unified_scraper.py | 🔴 High |
| `UnifiedScraper._scrape_documentation()` | unified_scraper.py | 🔴 High |
| `UnifiedScraper._scrape_github()` | unified_scraper.py | 🔴 High |
| `UnifiedScraper._scrape_pdf()` | unified_scraper.py | 🔴 High |
| `UnifiedScraper._scrape_local()` | unified_scraper.py | 🟡 Medium |
| `DocToSkillConverter.scrape()` | doc_scraper.py | 🔴 High |
| `PDFToSkillConverter.extract_pdf()` | pdf_scraper.py | 🔴 High |
#### C. PDF Scraper Limited Coverage
- No actual PDF parsing tests (only mocked)
- OCR functionality not tested
- Page range extraction not tested
---
### 4. Enhancement Tests 🟡 PARTIAL - MAJOR GAPS
**Files Reviewed:**
- `test_enhance_command.py` (367 lines, 25+ tests)
- `test_enhance_skill_local.py` (163 lines, 14 tests)
**Critical Gap in `test_enhance_skill_local.py`:**
| Function | Lines | Tested? | Priority |
|----------|-------|---------|----------|
| `summarize_reference()` | ~50 | ❌ No | 🔴 High |
| `create_enhancement_prompt()` | ~200 | ❌ No | 🔴 High |
| `run()` | ~100 | ❌ No | 🔴 High |
| `_run_headless()` | ~130 | ❌ No | 🔴 High |
| `_run_background()` | ~80 | ❌ No | 🟡 Medium |
| `_run_daemon()` | ~60 | ❌ No | 🟡 Medium |
| `write_status()` | ~30 | ❌ No | 🟡 Medium |
| `read_status()` | ~40 | ❌ No | 🟡 Medium |
| `detect_terminal_app()` | ~80 | ❌ No | 🟡 Medium |
**Current Tests Only Cover:**
- Agent presets configuration
- Command building
- Agent name normalization
- Environment variable handling
**Recommendation:** Add comprehensive tests for the core enhancement logic.
---
### 5. MCP Tool Tests 🟡 GOOD with Coverage Gaps
**Files Reviewed:**
- `test_mcp_fastmcp.py` (868 lines)
- `test_mcp_server.py` (715 lines)
- `test_mcp_vector_dbs.py` (259 lines)
- `test_real_world_fastmcp.py` (558 lines)
**Coverage Analysis:**
| Tool Category | Tools | Tested | Coverage |
|---------------|-------|--------|----------|
| Config Tools | 3 | 3 | ✅ 100% |
| Scraping Tools | 8 | 4 | 🟡 50% |
| Packaging Tools | 4 | 4 | ✅ 100% |
| Splitting Tools | 2 | 2 | ✅ 100% |
| Source Tools | 5 | 5 | ✅ 100% |
| Vector DB Tools | 4 | 4 | ✅ 100% |
| Workflow Tools | 5 | 0 | ❌ 0% |
| **Total** | **31** | **22** | **🟡 71%** |
**Untested Tools:**
1. `detect_patterns`
2. `extract_test_examples`
3. `build_how_to_guides`
4. `extract_config_patterns`
5. `list_workflows`
6. `get_workflow`
7. `create_workflow`
8. `update_workflow`
9. `delete_workflow`
**Note:** `test_mcp_server.py` tests legacy server, `test_mcp_fastmcp.py` tests modern server.
---
### 6. Integration/E2E Tests 🟡 MODERATE
**Files Reviewed:**
- `test_create_integration_basic.py` (310 lines)
- `test_e2e_three_stream_pipeline.py` (598 lines)
- `test_analyze_e2e.py` (344 lines)
- `test_install_skill_e2e.py` (533 lines)
- `test_c3_integration.py` (362 lines)
**Issues Found:**
1. **Skipped Tests:**
- `test_create_detects_web_url` - Source auto-detection incomplete
- `test_create_invalid_source_shows_error` - Error handling incomplete
- `test_cli_via_unified_command` - Asyncio issues
2. **Heavy Mocking:**
- Most GitHub API tests use mocking
- No real HTTP tests for doc scraping
- Integration tests don't test actual integration
3. **Limited Scope:**
- Only `--quick` preset tested (not `--comprehensive`)
- C3.x tests use mock data only
- Most E2E tests are unit tests with mocks
---
### 7. Adaptor Tests ✅ GOOD
**Files Reviewed:**
- `test_adaptors/test_adaptors_e2e.py` (893 lines)
- `test_adaptors/test_claude_adaptor.py` (314 lines)
- `test_adaptors/test_gemini_adaptor.py` (146 lines)
- `test_adaptors/test_openai_adaptor.py` (188 lines)
- Plus 8 more platform adaptors
**Strengths:**
- Each adaptor has dedicated tests
- Package format testing
- Upload success/failure scenarios
- Platform-specific features tested
**Minor Gaps:**
- Some adaptors only test 1-2 scenarios
- Error handling coverage varies by platform
---
### 8. Config/Validation Tests ✅ GOOD
**Files Reviewed:**
- `test_config_validation.py` (270 lines)
- `test_config_extractor.py` (629 lines)
- `test_config_fetcher.py` (340 lines)
**Strengths:**
- Unified vs legacy format detection
- Field validation comprehensive
- Error message quality tested
---
## Summary of Critical Testing Gaps
### 🔴 HIGH PRIORITY (Must Fix)
1. **Enhancement Core Logic**
- File: `test_enhance_skill_local.py`
- Missing: 9 major functions
- Impact: Core feature untested
2. **Unified Scraper Main Flow**
- File: New tests needed
- Missing: `_scrape_*()` methods, `run()` orchestration
- Impact: Multi-source scraping untested
3. **Actual HTTP/PDF/GitHub Integration**
- Missing: Real external resource tests
- Impact: Only mock tests exist
### 🟡 MEDIUM PRIORITY (Should Fix)
4. **MCP Workflow Tools**
- Missing: 5 workflow tools (0% coverage)
- Impact: MCP workflow features untested
5. **Skipped Integration Tests**
- 3 tests skipped
- Impact: Source auto-detection incomplete
6. **PDF Real Extraction**
- Missing: Actual PDF parsing
- Impact: PDF feature quality unknown
### 🟢 LOW PRIORITY (Nice to Have)
7. **Additional Scraping Tools**
- Missing: 4 scraping tool tests
- Impact: Low (core tools covered)
8. **Edge Case Coverage**
- Missing: Invalid argument combinations
- Impact: Low (happy path covered)
---
## Recommendations
### Immediate Actions (Next Sprint)
1. **Add Enhancement Logic Tests** (~400 lines)
- Test `summarize_reference()`
- Test `create_enhancement_prompt()`
- Test `run()` method
- Test status read/write
2. **Fix Skipped Tests** (~100 lines)
- Fix asyncio issues in `test_cli_via_unified_command`
- Complete source auto-detection tests
3. **Add MCP Workflow Tool Tests** (~200 lines)
- Test all 5 workflow tools
### Short Term (Next Month)
4. **Add Unified Scraper Integration Tests** (~300 lines)
- Test main orchestration flow
- Test individual source scraping
5. **Add Real PDF Tests** (~150 lines)
- Test with actual PDF files
- Test OCR if available
### Long Term (Next Quarter)
6. **HTTP Integration Tests** (~200 lines)
- Test with real websites (use test sites)
- Mock server approach
7. **Complete E2E Pipeline** (~300 lines)
- Full workflow from scrape to upload
- Real GitHub repo (fork test repo)
---
## Test Quality Metrics
| Metric | Score | Notes |
|--------|-------|-------|
| Test Count | 🟢 Good | 2173+ tests |
| Coverage | 🟡 Moderate | ~75% estimated |
| Real Tests | 🟡 Moderate | Many mocked |
| Documentation | 🟢 Good | Most tests documented |
| Maintenance | 🟢 Good | Tests recently updated |
---
## Conclusion
The Skill Seekers test suite is **comprehensive in quantity** (2173+ tests) but has **quality gaps** in critical areas:
1. **Core enhancement logic** is largely untested
2. **Multi-source scraping** orchestration lacks integration tests
3. **MCP workflow tools** have zero coverage
4. **Real external resource** testing is minimal
**Priority:** Fix the 🔴 HIGH priority gaps first, as they impact core functionality.
---
*Report generated: 2026-02-22*
*Reviewer: Systematic test review with parallel subagent analysis*