skill-seekers-reference

firefrost-gaming/skill-seekers-reference

Fork 0

Commit Graph

Author	SHA1	Message	Date
yusyus	37cb307455	docs: update all documentation for 17 source types Update 32 documentation files across English and Chinese (zh-CN) docs to reflect the 10 new source types added in the previous commit. Updated files: - README.md, README.zh-CN.md — taglines, feature lists, examples, install extras - docs/reference/ — CLI_REFERENCE, FEATURE_MATRIX, MCP_REFERENCE, CONFIG_FORMAT, API_REFERENCE - docs/features/ — UNIFIED_SCRAPING with generic merge docs - docs/advanced/ — multi-source guide, MCP server guide - docs/getting-started/ — installation extras, quick-start examples - docs/user-guide/ — core-concepts, scraping, packaging, workflows (complex-merge) - docs/ — FAQ, TROUBLESHOOTING, BEST_PRACTICES, ARCHITECTURE, UNIFIED_PARSERS, README - Root — BULLETPROOF_QUICKSTART, CONTRIBUTING, ROADMAP - docs/zh-CN/ — Chinese translations for all of the above 32 files changed, +3,016 lines, -245 lines	2026-03-15 15:56:04 +03:00
yusyus	7496c2b5e0	feat: unified document parser system with RST/Markdown/PDF support Implements comprehensive unified parser architecture for extracting structured content from multiple documentation formats with feature parity and quality scoring. Key Features: - Unified Document structure for all formats (RST, Markdown, PDF) - Enhanced RST parser: tables, cross-refs, directives, field lists - Enhanced Markdown parser: tables, images, admonitions, quality scoring - PDF parser wrapper: unified output while preserving all features - Quality scoring system for code blocks and tables - Format converters: to_markdown(), to_skill_format() - Auto-detection of document formats Architecture: - BaseParser abstract class with format-specific implementations - ContentBlock universal container with 12 block types - 14 cross-reference types (including Godot-specific) - Backward compatible with legacy parsers Integration: - doc_scraper.py: Enhanced MarkdownParser with graceful fallback - codebase_scraper.py: RstParser for .rst file processing - Maintains backward compatibility with existing workflows Test Coverage: - 75 tests passing (up from 42) - 37 comprehensive parser tests (RST, Markdown, auto-detection, quality) - Proper pytest fixtures and assertions - Zero critical warnings Documentation: - Complete architecture guide (docs/architecture/UNIFIED_PARSERS.md) - Class hierarchy diagrams and usage examples - Integration guide and extension patterns Impact: - Godot documentation extraction: 20% → 90% content coverage (+70%) - Tables: 0 → ~3,000+ extracted - Cross-references: 0 → ~50,000+ extracted - Directives: 0 → ~5,000+ extracted - All with quality scoring and validation Files Changed: - New: src/skill_seekers/cli/parsers/extractors/ (7 files, ~100KB) - New: tests/test_unified_parsers.py (37 tests) - New: docs/architecture/UNIFIED_PARSERS.md (12KB) - Modified: doc_scraper.py (enhanced Markdown extraction) - Modified: codebase_scraper.py (RST file processing) Breaking Changes: None (backward compatible) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-15 23:14:49 +03:00

Author

SHA1

Message

Date

yusyus

37cb307455

docs: update all documentation for 17 source types

Update 32 documentation files across English and Chinese (zh-CN) docs
to reflect the 10 new source types added in the previous commit.

Updated files:
- README.md, README.zh-CN.md — taglines, feature lists, examples, install extras
- docs/reference/ — CLI_REFERENCE, FEATURE_MATRIX, MCP_REFERENCE, CONFIG_FORMAT, API_REFERENCE
- docs/features/ — UNIFIED_SCRAPING with generic merge docs
- docs/advanced/ — multi-source guide, MCP server guide
- docs/getting-started/ — installation extras, quick-start examples
- docs/user-guide/ — core-concepts, scraping, packaging, workflows (complex-merge)
- docs/ — FAQ, TROUBLESHOOTING, BEST_PRACTICES, ARCHITECTURE, UNIFIED_PARSERS, README
- Root — BULLETPROOF_QUICKSTART, CONTRIBUTING, ROADMAP
- docs/zh-CN/ — Chinese translations for all of the above

32 files changed, +3,016 lines, -245 lines

2026-03-15 15:56:04 +03:00

yusyus

7496c2b5e0

feat: unified document parser system with RST/Markdown/PDF support

Implements comprehensive unified parser architecture for extracting
structured content from multiple documentation formats with feature
parity and quality scoring.

Key Features:
- Unified Document structure for all formats (RST, Markdown, PDF)
- Enhanced RST parser: tables, cross-refs, directives, field lists
- Enhanced Markdown parser: tables, images, admonitions, quality scoring
- PDF parser wrapper: unified output while preserving all features
- Quality scoring system for code blocks and tables
- Format converters: to_markdown(), to_skill_format()
- Auto-detection of document formats

Architecture:
- BaseParser abstract class with format-specific implementations
- ContentBlock universal container with 12 block types
- 14 cross-reference types (including Godot-specific)
- Backward compatible with legacy parsers

Integration:
- doc_scraper.py: Enhanced MarkdownParser with graceful fallback
- codebase_scraper.py: RstParser for .rst file processing
- Maintains backward compatibility with existing workflows

Test Coverage:
- 75 tests passing (up from 42)
- 37 comprehensive parser tests (RST, Markdown, auto-detection, quality)
- Proper pytest fixtures and assertions
- Zero critical warnings

Documentation:
- Complete architecture guide (docs/architecture/UNIFIED_PARSERS.md)
- Class hierarchy diagrams and usage examples
- Integration guide and extension patterns

Impact:
- Godot documentation extraction: 20% → 90% content coverage (+70%)
- Tables: 0 → ~3,000+ extracted
- Cross-references: 0 → ~50,000+ extracted
- Directives: 0 → ~5,000+ extracted
- All with quality scoring and validation

Files Changed:
- New: src/skill_seekers/cli/parsers/extractors/ (7 files, ~100KB)
- New: tests/test_unified_parsers.py (37 tests)
- New: docs/architecture/UNIFIED_PARSERS.md (12KB)
- Modified: doc_scraper.py (enhanced Markdown extraction)
- Modified: codebase_scraper.py (RST file processing)

Breaking Changes: None (backward compatible)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-15 23:14:49 +03:00

2 Commits