Fixes critical bug where RST/Markdown files in documentation
directories were not being parsed with the unified parser system.
Issue:
- Documentation files were found and categorized
- But were only copied, not parsed with unified RstParser/MarkdownParser
- Result: 0 tables, 0 cross-references extracted from 1,579 RST files
Fix:
- Updated extract_project_documentation() to use RstParser for .rst files
- Updated extract_project_documentation() to use MarkdownParser for .md files
- Extract rich structured data: tables, cross-refs, directives, quality scores
- Save extraction summary with parser version
Results (Godot documentation test):
- Enhanced files: 1,579/1,579 (100%)
- Tables extracted: 1,426 (was 0)
- Cross-references: 42,715 (was 0)
- Code blocks: 770 (with quality scoring)
Impact:
- Documentation extraction now benefits from unified parser system
- Complete parity with web documentation scraping (doc_scraper.py)
- RST API docs fully parsed (classes, methods, properties, signals)
- All content gets quality scoring
Files Changed:
- src/skill_seekers/cli/codebase_scraper.py (~100 lines)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>