Add unified multi-source scraping feature (Phases 7-11)
Completes the unified scraping system implementation:

**Phase 7: Unified Skill Builder**
- cli/unified_skill_builder.py: Generates final skill structure
- Inline conflict warnings (⚠️) in API reference
- Side-by-side docs vs code comparison
- Severity-based conflict grouping
- Separate conflicts.md report

**Phase 8: MCP Integration**
- skill_seeker_mcp/server.py: Auto-detects unified vs legacy configs
- Routes to unified_scraper.py or doc_scraper.py automatically
- Supports merge_mode parameter override
- Maintains full backward compatibility

**Phase 9: Example Unified Configs**
- configs/react_unified.json: React docs + GitHub
- configs/django_unified.json: Django docs + GitHub
- configs/fastapi_unified.json: FastAPI docs + GitHub
- configs/fastapi_unified_test.json: Test config with limited pages

**Phase 10: Comprehensive Tests**
- cli/test_unified_simple.py: Integration tests (all passing)
- Tests unified config validation
- Tests backward compatibility
- Tests mixed source types
- Tests error handling

**Phase 11: Documentation**
- docs/UNIFIED_SCRAPING.md: Complete guide (1000+ lines)
- Examples, best practices, troubleshooting
- Architecture diagrams and data flow
- Command reference

**Additional:**
- demo_conflicts.py: Interactive conflict detection demo
- TEST_RESULTS.md: Complete test results and findings
- cli/unified_scraper.py: Fixed doc_scraper integration (subprocess)

**Features:**
✅ Multi-source scraping (docs + GitHub + PDF)
✅ Conflict detection (4 types, 3 severity levels)
✅ Rule-based merging (fast, deterministic)
✅ Claude-enhanced merging (AI-powered)
✅ Transparent conflict reporting
✅ MCP auto-detection
✅ Backward compatibility

**Test Results:**
- 6/6 integration tests passed
- 4 unified configs validated
- 3 legacy configs backward compatible
- 5 conflicts detected in test data
- All documentation complete

🤖 Generated with Claude Code
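
The Phase 8 routing described above (auto-detect unified vs legacy, then dispatch to the matching scraper with an optional merge_mode override) could look roughly like the sketch below. This is not the code from skill_seeker_mcp/server.py: the function names `detect_config_format` and `pick_scraper` are hypothetical, and the detection rule (a top-level `sources` list means unified, a top-level `base_url` means legacy) and the `"claude-enhanced"` mode string are assumptions made for illustration.

```python
from typing import Optional, Tuple


def detect_config_format(config: dict) -> str:
    """Classify a config as "unified" or "legacy".

    Assumed heuristic: unified configs carry a top-level "sources" list,
    legacy configs carry a top-level "base_url".
    """
    if isinstance(config.get("sources"), list):
        return "unified"
    if "base_url" in config:
        return "legacy"
    raise ValueError("config has neither 'sources' nor 'base_url'")


def pick_scraper(
    config: dict, merge_mode_override: Optional[str] = None
) -> Tuple[str, Optional[str]]:
    """Return (script name, effective merge mode) for a config.

    An explicit override wins over the config's own "merge_mode".
    Legacy configs have no merge step, so their mode is None.
    """
    if detect_config_format(config) == "unified":
        # Falling back to "rule-based" is an assumption, not a documented default.
        mode = merge_mode_override or config.get("merge_mode", "rule-based")
        return "unified_scraper.py", mode
    return "doc_scraper.py", None
```

Under these assumptions, a legacy config passes through untouched, which is one simple way to keep the backward compatibility the commit message promises.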
45
configs/fastapi_unified.json
Normal file
@@ -0,0 +1,45 @@
{
  "name": "fastapi",
  "description": "Complete FastAPI knowledge combining official documentation and FastAPI codebase. Use when building FastAPI applications, understanding async patterns, or working with Pydantic models.",
  "merge_mode": "rule-based",
  "sources": [
    {
      "type": "documentation",
      "base_url": "https://fastapi.tiangolo.com/",
      "extract_api": true,
      "selectors": {
        "main_content": "article",
        "title": "h1",
        "code_blocks": "pre code"
      },
      "url_patterns": {
        "include": [],
        "exclude": ["/img/", "/js/"]
      },
      "categories": {
        "getting_started": ["tutorial", "first-steps"],
        "path_operations": ["path-params", "query-params", "body"],
        "dependencies": ["dependencies"],
        "security": ["security", "oauth2"],
        "database": ["sql-databases"],
        "advanced": ["advanced", "async", "middleware"],
        "deployment": ["deployment"]
      },
      "rate_limit": 0.5,
      "max_pages": 150
    },
    {
      "type": "github",
      "repo": "tiangolo/fastapi",
      "include_issues": true,
      "max_issues": 100,
      "include_changelog": true,
      "include_releases": true,
      "include_code": true,
      "code_analysis_depth": "surface",
      "file_patterns": [
        "fastapi/**/*.py"
      ]
    }
  ]
}
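
For readers writing their own unified config, a structural check along the following lines may help catch mistakes before a scrape starts. It is a minimal sketch, not the validation used by cli/unified_scraper.py or the test suite: `validate_unified_config`, the set of known source types, and the per-type required keys (`base_url` for documentation, `repo` for github) are assumptions inferred from the config above and the "docs + GitHub + PDF" feature list.

```python
from typing import List

# Assumed source types, taken from the commit's "docs + GitHub + PDF" feature line.
KNOWN_SOURCE_TYPES = {"documentation", "github", "pdf"}

# Assumed minimum key per source type, inferred from the example config.
REQUIRED_BY_TYPE = {"documentation": "base_url", "github": "repo"}


def validate_unified_config(config: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    errors: List[str] = []
    if not config.get("name"):
        errors.append("missing 'name'")

    sources = config.get("sources")
    if not isinstance(sources, list) or not sources:
        errors.append("'sources' must be a non-empty list")
        return errors  # nothing further to inspect

    for i, source in enumerate(sources):
        kind = source.get("type")
        if kind not in KNOWN_SOURCE_TYPES:
            errors.append(f"sources[{i}]: unknown type {kind!r}")
            continue
        required = REQUIRED_BY_TYPE.get(kind)
        if required and not source.get(required):
            errors.append(f"sources[{i}]: {kind} source needs '{required}'")
    return errors
```

Passing the parsed contents of configs/fastapi_unified.json (for example via `json.load`) should yield an empty list under these assumed rules, since both sources declare a known type and the key assumed to be required for it.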