## Summary
Add `skill-seekers sync-config` subcommand that crawls a docs site's navigation,
diffs discovered URLs against a config's start_urls, and optionally writes the
updated list back with --apply.
- BFS link discovery with configurable depth (default 2), max-pages, rate-limit
- Respects url_patterns.include/exclude from config
- Supports optional nav_seed_urls config field
- Handles both unified (sources array) and legacy flat config formats
- MCP tool sync_config included
- 57 tests (39 unit + 18 E2E with local HTTP server)
- Fixed CI: renamed summary job to "Tests" to match branch protection rule
Closes#306
Auto-detects NVIDIA (CUDA), AMD (ROCm), or CPU-only GPU and installs the
correct PyTorch variant + easyocr + all visual extraction dependencies.
Removes easyocr from video-full pip extras to avoid pulling ~2GB of wrong
CUDA packages on non-NVIDIA systems.
New files:
- video_setup.py (835 lines): GPU detection, PyTorch install, ROCm config,
venv checks, system dep validation, module selection, verification
- test_video_setup.py (60 tests): Full coverage of detection, install, verify
Updated docs: CHANGELOG, AGENTS.md, CLAUDE.md, README.md, CLI_REFERENCE,
FAQ, TROUBLESHOOTING, installation guide, video dependency plan
All 2523 tests passing (15 skipped).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix extract_visual_data returning 2-tuple instead of 3 (ValueError crash)
- Move pytesseract from core deps to [video-full] optional group
- Add 30-min timeout + user feedback to video enhancement subprocess
- Add scrape_video_impl to MCP server fallback import block
- Detect auto-generated YouTube captions via is_generated property
- Forward --vision-ocr and --video-playlist through create command
- Fix filename collision for non-ASCII video titles (fallback to video_id)
- Make _vision_used a proper dataclass field on FrameSubSection
- Expose 6 visual params in MCP scrape_video tool
- Add install instructions on missing video deps in unified scraper
- Update MCP docstring tool counts (25→33, 7 categories)
- Add video and word commands to main.py docstring
- Document video-full exclusion from [all] deps in pyproject.toml
- Update parser registry test count (22→23 for video parser)
All 2437 tests passing, 0 failures.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add YAML-based enhancement workflow presets shipped inside the package
(default, minimal, security-focus, architecture-comprehensive, api-documentation)
- Add `skill-seekers workflows` subcommand: list, show, copy, add, remove, validate
- copy/add/remove all accept multiple names/files in one invocation with partial-failure behaviour
- `add --name` override restricted to single-file operations
- Add 5 MCP tools: list_workflows, get_workflow, create_workflow, update_workflow, delete_workflow
- Fix: create command _add_common_args() now correctly forwards each --enhance-workflow
as a separate flag instead of passing the whole list as a single argument
- Update README: reposition as "data layer for AI systems" with AI Skills front and centre
- Update CHANGELOG, QUICK_REFERENCE, CLAUDE.md with workflow preset details
- 1,880+ tests passing
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Filter out chunks smaller than min_chunk_size (default 100 tokens)
- Exception: Keep all chunks if entire document is smaller than target size
- All 15 tests passing (100% pass rate)
Fixes edge case where very small chunks (e.g., 'Short.' = 6 chars) were
being created despite min_chunk_size=100 setting.
Test: pytest tests/test_rag_chunker.py -v
Fixed incorrect variable names in list comprehensions that were causing
NameError in CI (Python 3.11/3.12):
Critical fixes:
- tests/test_markdown_parsing.py: 'l' → 'link' in list comprehension
- src/skill_seekers/cli/pdf_extractor_poc.py: 'l' → 'line' (2 occurrences)
Additional auto-lint fixes:
- Removed unused imports in llms_txt_downloader.py, llms_txt_parser.py
- Fixed comparison operators in config files
- Fixed list comprehension in other files
All tests now pass in CI.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add comprehensive AI enhancement to C3.4 Configuration Pattern Extraction
similar to C3.3's dual-mode architecture (API + LOCAL).
NEW CAPABILITIES (What users can do now):
1. **AI-Powered Config Analysis** - Understand what configs do, not just extract them
- Explanations: What each configuration setting does
- Best Practices: Suggested improvements and better organization
- Security Analysis: Identifies hardcoded secrets, exposed credentials
- Migration Suggestions: Opportunities to consolidate configs
- Context: Explains detected patterns and when to use them
2. **Dual-Mode AI Support** (Same as C3.3):
- API Mode: Claude API analyzes configs (requires ANTHROPIC_API_KEY)
- LOCAL Mode: Claude Code CLI (FREE, no API key needed)
- AUTO Mode: Automatically detects best available mode
3. **Seamless Integration**:
- CLI: --enhance, --enhance-local, --ai-mode flags
- Codebase Scraper: Works with existing enhance_with_ai parameter
- MCP Tools: Enhanced extract_config_patterns with AI parameters
- Optional: Enhancement only runs when explicitly requested
Components Added:
- ConfigEnhancer class (~400 lines) - Dual-mode AI enhancement engine
- Enhanced CLI flags in config_extractor.py
- AI integration in codebase_scraper.py config extraction workflow
- MCP tool parameter expansion (enhance, enhance_local, ai_mode)
- FastMCP server tool signature updates
- Comprehensive documentation in CHANGELOG.md and README.md
Performance:
- Basic extraction: ~3 seconds for 100 config files
- With AI enhancement: +30-60 seconds (LOCAL mode, FREE)
- With AI enhancement: +20-40 seconds (API mode, ~$0.10-0.20)
Use Cases:
- Security audits: Find hardcoded secrets across all configs
- Migration planning: Identify consolidation opportunities
- Onboarding: Understand what each config file does
- Best practices: Get improvement suggestions for config organization
Technical Details:
- Structured JSON prompts for reliable AI responses
- 5 enhancement categories: explanations, best_practices, security, migration, context
- Graceful fallback if AI enhancement fails
- Security findings logged separately for visibility
- Results stored in JSON under 'ai_enhancements' key
Testing:
- 28 comprehensive tests in test_config_extractor.py
- Tests cover: file detection, parsing, pattern detection, enhancement modes
- All integrations tested: CLI, codebase_scraper, MCP tools
Documentation:
- CHANGELOG.md: Complete C3.4 feature description
- README.md: Updated C3.4 section with AI enhancement
- MCP tool descriptions: Added AI enhancement details
Related Issues: #74🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add build_dependency_graph parameter to scrape_codebase MCP tool
- Update tool documentation with new parameter
- Pass --build-dependency-graph flag to CLI command
- Update FastMCP server function signature
Usage via MCP:
scrape_codebase(
directory="/path/to/repo",
build_dependency_graph=True
)
This completes the C2.6 feature set by exposing dependency graph
generation through the MCP interface, making it available to all
MCP clients (Claude Code, Cursor, etc.).