Documentation restructure: - New docs/getting-started/ guide (4 files: install, quick-start, first-skill, next-steps) - New docs/user-guide/ section (6 files: core concepts through troubleshooting) - New docs/reference/ section (CLI_REFERENCE, CONFIG_FORMAT, ENVIRONMENT_VARIABLES, MCP_REFERENCE) - New docs/advanced/ section (custom-workflows, mcp-server, multi-source) - New docs/ARCHITECTURE.md - system architecture overview - Archived legacy files (QUICKSTART.md, QUICK_REFERENCE.md, docs/guides/USAGE.md) to docs/archive/legacy/ Chinese (zh-CN) translations: - Full zh-CN mirror of all user-facing docs (getting-started, user-guide, reference, advanced) - GitHub Actions workflow for translation sync (.github/workflows/translate-docs.yml) - Translation sync checker script (scripts/check_translation_sync.sh) - Translation helper script (scripts/translate_doc.py) Content updates: - CHANGELOG.md: [Unreleased] → [3.1.0] - 2026-02-22 - README.md: updated with new doc structure links - AGENTS.md: updated agent documentation - docs/features/UNIFIED_SCRAPING.md: updated for unified scraper workflow JSON config Analysis/planning artifacts (kept for reference): - DOCUMENTATION_OVERHAUL_PLAN.md, DOCUMENTATION_OVERHAUL_SUMMARY.md - FEATURE_GAP_ANALYSIS.md, IMPLEMENTATION_GAPS_ANALYSIS.md, CREATE_COMMAND_COVERAGE_ANALYSIS.md - CHINESE_TRANSLATION_IMPLEMENTATION_SUMMARY.md, ISSUE_260_UPDATE.md Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
350 lines
10 KiB
Markdown
350 lines
10 KiB
Markdown
# Implementation Gaps Analysis - Current Codebase
|
|
|
|
> **Analysis Date:** 2026-02-16
|
|
> **Scope:** Integration gaps, duplicate code, missing connections in CURRENT implementation
|
|
|
|
---
|
|
|
|
## 🚨 Critical Integration Gaps
|
|
|
|
### 1. Unified Scraper Does NOT Use Workflow Runner
|
|
|
|
**Gap:** `unified_scraper.py` has its own scraping logic instead of using the shared `workflow_runner.py`
|
|
|
|
**Evidence:**
|
|
```bash
|
|
$ grep -n "workflow_runner" src/skill_seekers/cli/unified_scraper.py
|
|
# (no results)
|
|
```
|
|
|
|
**Other scrapers DO use workflow_runner:**
|
|
- ✅ `doc_scraper.py` - uses `run_workflows()`
|
|
- ✅ `github_scraper.py` - uses `run_workflows()`
|
|
- ✅ `pdf_scraper.py` - uses `run_workflows()`
|
|
- ✅ `codebase_scraper.py` - uses `run_workflows()`
|
|
- ❌ `unified_scraper.py` - DOES NOT use `run_workflows()`
|
|
|
|
**Impact:**
|
|
- Unified scraper cannot use enhancement workflows
|
|
- Inconsistent behavior between single-source and multi-source scraping
|
|
- Code duplication in enhancement logic
|
|
|
|
**Fix:**
|
|
```python
|
|
# Add to unified_scraper.py
|
|
from skill_seekers.cli.workflow_runner import run_workflows
|
|
|
|
# After scraping all sources
|
|
context = run_workflows(
|
|
workflows=args.enhance_workflow,
|
|
inline_stages=args.enhance_stage,
|
|
scraper_context={"name": skill_name, "source_type": "unified"},
|
|
args=args
|
|
)
|
|
```
|
|
|
|
---
|
|
|
|
### 2. Duplicate Enhancer Classes (Old vs New)
|
|
|
|
**Gap:** Both old and new enhancer modules exist and are used simultaneously
|
|
|
|
**Old modules (should be deprecated):**
|
|
- `ai_enhancer.py` - Old AIEnhancer class
|
|
- `config_enhancer.py` - Old ConfigEnhancer class
|
|
- `guide_enhancer.py` - Old GuideEnhancer class
|
|
|
|
**New unified module:**
|
|
- `unified_enhancer.py` - New UnifiedEnhancer class (replaces all above)
|
|
|
|
**Files still importing OLD modules:**
|
|
```
|
|
architectural_pattern_detector.py → ai_enhancer.AIEnhancer
|
|
codebase_scraper.py → ai_enhancer.PatternEnhancer, config_enhancer.ConfigEnhancer
|
|
config_extractor.py → config_enhancer.ConfigEnhancer
|
|
enhancement_workflow.py → ai_enhancer.PatternEnhancer, TestExampleEnhancer, AIEnhancer
|
|
how_to_guide_builder.py → guide_enhancer.GuideEnhancer
|
|
pattern_recognizer.py → ai_enhancer.PatternEnhancer
|
|
test_example_extractor.py → ai_enhancer.TestExampleEnhancer
|
|
```
|
|
|
|
**New unified_enhancer.py exports:**
|
|
```python
|
|
class UnifiedEnhancer: ...
|
|
class PatternEnhancer(UnifiedEnhancer): ...
|
|
class TestExampleEnhancer(UnifiedEnhancer): ...
|
|
class GuideEnhancer(UnifiedEnhancer): ...
|
|
class ConfigEnhancer(UnifiedEnhancer): ...
|
|
AIEnhancer = UnifiedEnhancer # Alias for compatibility
|
|
```
|
|
|
|
**Impact:**
|
|
- Maintenance burden (fix bugs in multiple places)
|
|
- Inconsistent behavior
|
|
- Confusion about which enhancer to use
|
|
- Larger codebase
|
|
|
|
**Fix:**
|
|
1. Migrate all imports from old modules to `unified_enhancer.py`
|
|
2. Deprecate old modules with warnings
|
|
3. Eventually remove old modules
|
|
|
|
---
|
|
|
|
### 3. MCP Tools Missing Several CLI Commands
|
|
|
|
**CLI Commands (20):**
|
|
1. ✅ create - Has MCP equivalent
|
|
2. ✅ config - Has MCP equivalent
|
|
3. ✅ scrape - Has MCP equivalent
|
|
4. ✅ github - Has MCP equivalent
|
|
5. ✅ package - Has MCP equivalent
|
|
6. ✅ upload - Has MCP equivalent
|
|
7. ✅ analyze - Has MCP equivalent (scrape_codebase)
|
|
8. ✅ enhance - Has MCP equivalent
|
|
9. ❌ enhance-status - **NO MCP equivalent**
|
|
10. ✅ pdf - Has MCP equivalent
|
|
11. ✅ unified - Has MCP equivalent (unified_scrape)
|
|
12. ✅ estimate - Has MCP equivalent
|
|
13. ✅ install - Has MCP equivalent
|
|
14. ❌ install-agent - **NO MCP equivalent**
|
|
15. ✅ extract-test-examples - Has MCP equivalent
|
|
16. ❌ resume - **NO MCP equivalent**
|
|
17. ❌ stream - **NO MCP equivalent**
|
|
18. ❌ update - **NO MCP equivalent**
|
|
19. ❌ multilang - **NO MCP equivalent**
|
|
20. ❌ quality - **NO MCP equivalent**
|
|
21. ✅ workflows - Has MCP equivalent
|
|
|
|
**Missing in MCP (7 commands):**
|
|
- `enhance-status` - Monitor background enhancement
|
|
- `install-agent` - Install to IDE agents (Cursor, etc.)
|
|
- `resume` - Resume interrupted jobs
|
|
- `stream` - Stream large files
|
|
- `update` - Incremental updates
|
|
- `multilang` - Multi-language docs
|
|
- `quality` - Quality scoring
|
|
|
|
**Impact:**
|
|
- Cannot use full functionality via MCP
|
|
- CLI and MCP have different capabilities
|
|
- Users restricted when using AI agents
|
|
|
|
---
|
|
|
|
### 4. Create Command Does Not Use Unified Infrastructure
|
|
|
|
**Gap:** `create_command.py` routes to individual scrapers instead of using unified system
|
|
|
|
**Current flow:**
|
|
```
|
|
create_command.py → detects source → calls individual scraper
|
|
→ doc_scraper.main()
|
|
→ github_scraper.main()
|
|
→ pdf_scraper.main()
|
|
→ codebase_scraper.main()
|
|
```
|
|
|
|
**Gap:** Each scraper has its own argument parsing and workflow logic
|
|
|
|
**Impact:**
|
|
- Inconsistent argument handling
|
|
- Duplicated workflow code
|
|
- Harder to maintain
|
|
|
|
**Note:** This is partially mitigated by workflow_runner usage in individual scrapers
|
|
|
|
---
|
|
|
|
### 5. Conflict Detector Not Integrated with Unified Scraper
|
|
|
|
**Gap:** `conflict_detector.py` exists but may not be fully utilized
|
|
|
|
**Evidence:**
|
|
```python
|
|
# unified_scraper.py imports it:
|
|
from skill_seekers.cli.conflict_detector import ConflictDetector
|
|
|
|
# But check integration depth...
|
|
```
|
|
|
|
**Need to verify:**
|
|
- Does unified scraper actually run conflict detection?
|
|
- Are conflicts reported to users?
|
|
- Can users act on conflict reports?
|
|
|
|
---
|
|
|
|
## 🟠 Medium Priority Gaps
|
|
|
|
### 6. Enhancement Workflow Engine vs Old Enhancers
|
|
|
|
**Gap:** `enhancement_workflow.py` (new) may not fully replace old enhancer usage
|
|
|
|
**enhancement_workflow.py:**
|
|
- Uses `UnifiedEnhancer` (new)
|
|
- Supports YAML workflow presets
|
|
- Sequential stage execution
|
|
|
|
**Old enhancers:**
|
|
- Direct class instantiation
|
|
- No workflow support
|
|
- Used in codebase_scraper, pattern_recognizer, etc.
|
|
|
|
**Impact:** Two enhancement systems running in parallel
|
|
|
|
---
|
|
|
|
### 7. Resume Command Limited Scope
|
|
|
|
**Gap:** `resume_command.py` only works with specific scrapers
|
|
|
|
**Questions:**
|
|
- Does resume work with unified scraper?
|
|
- Does resume work with PDF scraping?
|
|
- Is resume state stored consistently?
|
|
|
|
---
|
|
|
|
### 8. Argument Parsing Duplication
|
|
|
|
**Gap:** Multiple argument parsers for similar functionality
|
|
|
|
**Files:**
|
|
- `parsers/doc_parser.py`
|
|
- `parsers/github_parser.py`
|
|
- `parsers/pdf_parser.py`
|
|
- `parsers/create_parser.py`
|
|
- `arguments/` directory with multiple files
|
|
|
|
**Gap:** No unified argument validation across parsers
|
|
|
|
---
|
|
|
|
## 🟡 Minor Gaps
|
|
|
|
### 9. Storage Adapters Not Used in Core Flow
|
|
|
|
**Gap:** Cloud storage adapters exist but may not be integrated
|
|
|
|
```
|
|
storage/
|
|
├── base_storage.py
|
|
├── s3_storage.py
|
|
├── gcs_storage.py
|
|
└── azure_storage.py
|
|
```
|
|
|
|
**Check:** Are these actually used in CLI commands or just standalone?
|
|
|
|
---
|
|
|
|
### 10. Benchmark Framework Underutilized
|
|
|
|
**Gap:** `benchmark/` module exists but may not be integrated into main flow
|
|
|
|
**Check:** Is benchmarking automatically run? Can users easily benchmark their skills?
|
|
|
|
---
|
|
|
|
## 📊 Gap Summary Matrix
|
|
|
|
| # | Gap | Severity | Files Affected | Effort to Fix |
|
|
|---|-----|----------|----------------|---------------|
|
|
| 1 | Unified scraper → workflow_runner | 🔴 Critical | unified_scraper.py | Medium |
|
|
| 2 | Duplicate enhancer classes | 🔴 Critical | 8 files import old | High |
|
|
| 3 | Missing MCP tools (7) | 🔴 Critical | MCP parity | Medium |
|
|
| 4 | Create command routing | 🟠 Medium | create_command.py | Medium |
|
|
| 5 | Conflict detector integration | 🟠 Medium | unified_scraper.py | Low |
|
|
| 6 | Old vs new enhancer systems | 🟠 Medium | Multiple | High |
|
|
| 7 | Resume scope | 🟠 Medium | resume_command.py | Low |
|
|
| 8 | Argument parsing duplication | 🟡 Minor | parsers/ | Medium |
|
|
| 9 | Storage adapters usage | 🟡 Minor | storage/ | Low |
|
|
| 10 | Benchmark integration | 🟡 Minor | benchmark/ | Low |
|
|
|
|
---
|
|
|
|
## 🎯 Recommended Fixes (Priority Order)
|
|
|
|
### Phase 1: Critical (Immediate)
|
|
|
|
1. **Add workflow_runner to unified_scraper.py**
|
|
```python
|
|
from skill_seekers.cli.workflow_runner import run_workflows
|
|
|
|
# In main():
|
|
if args.enhance_workflow or args.enhance_stage:
|
|
context = run_workflows(...)
|
|
```
|
|
|
|
2. **Migrate old enhancer imports to unified_enhancer**
|
|
- Replace `from ai_enhancer import X` with `from unified_enhancer import X`
|
|
- Test all affected modules
|
|
- Add deprecation warnings to old modules
|
|
|
|
3. **Add missing MCP tools**
|
|
- `resume_tool` - Resume interrupted jobs
|
|
- `update_tool` - Incremental updates
|
|
- `quality_tool` - Quality scoring
|
|
- `stream_tool` - Streaming mode
|
|
- `multilang_tool` - Multi-language support
|
|
- `enhance_status_tool` - Monitor enhancement
|
|
- `install_agent_tool` - IDE agent installation
|
|
|
|
### Phase 2: Medium Priority
|
|
|
|
4. **Audit conflict_detector usage**
|
|
- Verify it's called in unified_scraper
|
|
- Add conflict reporting to output
|
|
|
|
5. **Consolidate argument parsing**
|
|
- Create shared argument definitions
|
|
- Use composition instead of duplication
|
|
|
|
### Phase 3: Cleanup
|
|
|
|
6. **Deprecate old enhancer modules**
|
|
```python
|
|
# In ai_enhancer.py, config_enhancer.py, guide_enhancer.py
|
|
import warnings
|
|
warnings.warn("This module is deprecated. Use unified_enhancer instead.", DeprecationWarning)
|
|
```
|
|
|
|
7. **Remove old modules** (after migration complete)
|
|
|
|
---
|
|
|
|
## 🔍 Verification Commands
|
|
|
|
```bash
|
|
# Check workflow_runner usage
|
|
grep -r "from.*workflow_runner" src/skill_seekers/cli/*.py
|
|
grep -r "run_workflows" src/skill_seekers/cli/*.py
|
|
|
|
# Check old enhancer imports
|
|
grep -r "from.*ai_enhancer\|from.*config_enhancer\|from.*guide_enhancer" src/skill_seekers/cli/*.py | grep -v "^src/skill_seekers/cli/\(ai_enhancer\|config_enhancer\|guide_enhancer\).py"
|
|
|
|
# Check MCP tools
|
|
grep -n "@mcp.tool\|def.*_tool" src/skill_seekers/mcp/server_fastmcp.py | wc -l
|
|
|
|
# Compare CLI vs MCP
|
|
skill-seekers --help | grep "^ [a-z]" | wc -l # 20 CLI commands
|
|
grep -c "@mcp.tool" src/skill_seekers/mcp/server_fastmcp.py # Should match
|
|
```
|
|
|
|
---
|
|
|
|
## Conclusion
|
|
|
|
The **biggest gaps** are:
|
|
|
|
1. **Unified scraper missing workflow support** - Critical for feature parity
|
|
2. **Old enhancer code still in use** - Technical debt, maintenance burden
|
|
3. **MCP missing 7 CLI commands** - Limits AI agent capabilities
|
|
|
|
These are **integration gaps in existing features**, not missing features. The functionality exists but isn't properly connected.
|
|
|
|
---
|
|
|
|
*Analysis complete. Recommend Phase 1 fixes immediately.*
|