## Summary
Add `skill-seekers sync-config` subcommand that crawls a docs site's navigation,
diffs discovered URLs against a config's start_urls, and optionally writes the
updated list back with --apply.
- BFS link discovery with configurable depth (default 2), max-pages, rate-limit
- Respects url_patterns.include/exclude from config
- Supports optional nav_seed_urls config field
- Handles both unified (sources array) and legacy flat config formats
- MCP tool sync_config included
- 57 tests (39 unit + 18 E2E with local HTTP server)
- Fixed CI: renamed summary job to "Tests" to match branch protection rule
Closes #306
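The discovery-and-diff flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the shipped implementation: `discover_urls`, `diff_urls`, the `get_links` callback, the `max_pages` default of 500, and the in-memory `site` map are all hypothetical names and values, and rate limiting and `url_patterns` filtering are omitted.

```python
from collections import deque
from typing import Callable, Iterable


def discover_urls(
    seeds: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 2,
    max_pages: int = 500,  # assumed default; the real default is not stated
) -> list[str]:
    """Breadth-first discovery: visit seeds, then follow links up to max_depth hops."""
    seen: set[str] = set()
    order: list[str] = []
    queue: deque[tuple[str, int]] = deque()
    for url in seeds:
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # pages at the depth limit are recorded but not expanded
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order


def diff_urls(discovered: Iterable[str], configured: Iterable[str]) -> tuple[list[str], list[str]]:
    """Return (added, removed) relative to the configured start_urls."""
    discovered_set, configured_set = set(discovered), set(configured)
    return sorted(discovered_set - configured_set), sorted(configured_set - discovered_set)


# A fake site stands in for real HTTP fetching and link extraction.
site = {
    "https://example.com/docs/": ["https://example.com/docs/a", "https://example.com/docs/b"],
    "https://example.com/docs/a": ["https://example.com/docs/c"],
    "https://example.com/docs/c": ["https://example.com/docs/d"],
}
found = discover_urls(["https://example.com/docs/"], lambda u: site.get(u, []), max_depth=2)
added, removed = diff_urls(found, ["https://example.com/docs/", "https://example.com/docs/old"])
```

With a depth limit of 2, page `c` is discovered but its link to `d` is never followed, which is why `d` does not appear in the result.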
Auto-detects an NVIDIA (CUDA) GPU, an AMD (ROCm) GPU, or a CPU-only system and
installs the correct PyTorch variant + easyocr + all visual extraction dependencies.
Removes easyocr from video-full pip extras to avoid pulling ~2GB of wrong
CUDA packages on non-NVIDIA systems.
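One plausible way to do the vendor detection described above is to check which vendor tools are on `PATH`. The sketch below assumes that approach; `detect_gpu_vendor` and its injectable `which` parameter are hypothetical, and video_setup.py may detect hardware differently.

```python
import shutil
from typing import Callable, Optional


def detect_gpu_vendor(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Guess the GPU vendor from the vendor tools found on PATH (assumed heuristic)."""
    if which("nvidia-smi"):  # ships with the NVIDIA driver
        return "nvidia"
    if which("rocminfo"):  # ships with the ROCm stack
        return "amd"
    return "cpu"  # no known GPU tooling found, fall back to CPU-only
```

Passing `which` as a parameter keeps the function testable without real GPU hardware.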
New files:
- video_setup.py (835 lines): GPU detection, PyTorch install, ROCm config,
venv checks, system dep validation, module selection, verification
- test_video_setup.py (60 tests): Full coverage of detection, install, verify
Updated docs: CHANGELOG, AGENTS.md, CLAUDE.md, README.md, CLI_REFERENCE,
FAQ, TROUBLESHOOTING, installation guide, video dependency plan
All 2523 tests passing (15 skipped).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix extract_visual_data returning 2-tuple instead of 3 (ValueError crash)
- Move pytesseract from core deps to [video-full] optional group
- Add 30-min timeout + user feedback to video enhancement subprocess
- Add scrape_video_impl to MCP server fallback import block
- Detect auto-generated YouTube captions via is_generated property
- Forward --vision-ocr and --video-playlist through create command
- Fix filename collision for non-ASCII video titles (fallback to video_id)
- Make _vision_used a proper dataclass field on FrameSubSection
- Expose 6 visual params in MCP scrape_video tool
- Add install instructions on missing video deps in unified scraper
- Update MCP docstring tool counts (25→33, 7 categories)
- Add video and word commands to main.py docstring
- Document video-full exclusion from [all] deps in pyproject.toml
- Update parser registry test count (22→23 for video parser)
All 2437 tests passing, 0 failures.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- pyproject.toml: version 3.0.0 → 3.1.0
- src/skill_seekers/_version.py: update hardcoded fallback to 3.1.0
- CHANGELOG.md: comprehensive [3.1.0] release notes covering all
features and fixes since v3.0.0 (unified create command, workflow
presets, RST parser, smart enhance dispatcher, CLI flag parity,
60 new workflow YAMLs, test suite improvements)
- Deprecation messages: update "removed in v3.0.0" → "v4.0.0" across
analyze_presets.py, codebase_scraper.py, mcp/server.py
- tests/test_cli_paths.py: update version assertion to 3.1.0
- tests/test_package_structure.py: update __version__ assertions to 3.1.0
- tests/test_preset_system.py: update deprecation message version to v4.0.0
All 2267 tests passing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes ruff format --check CI failure. 22 files reformatted to satisfy
the ruff formatter's style requirements. No logic changes, only
whitespace/formatting adjustments.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add YAML-based enhancement workflow presets shipped inside the package
(default, minimal, security-focus, architecture-comprehensive, api-documentation)
- Add `skill-seekers workflows` subcommand: list, show, copy, add, remove, validate
- copy/add/remove all accept multiple names/files in one invocation with partial-failure behaviour
- `add --name` override restricted to single-file operations
- Add 5 MCP tools: list_workflows, get_workflow, create_workflow, update_workflow, delete_workflow
- Fix: create command _add_common_args() now correctly forwards each --enhance-workflow
as a separate flag instead of passing the whole list as a single argument
- Update README: reposition as "data layer for AI systems" with AI Skills front and centre
- Update CHANGELOG, QUICK_REFERENCE, CLAUDE.md with workflow preset details
- 1,880+ tests passing
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
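The `--enhance-workflow` forwarding fix in the commit above comes down to emitting one flag per workflow name rather than a single argument holding the whole list. A minimal sketch, where `forward_enhance_workflows` is a hypothetical helper and not the actual `_add_common_args()` code:

```python
def forward_enhance_workflows(workflows: list[str]) -> list[str]:
    """Expand each workflow name into its own --enhance-workflow flag."""
    argv: list[str] = []
    for name in workflows:
        # One flag/value pair per workflow, so argparse sees repeated flags.
        argv.extend(["--enhance-workflow", name])
    return argv
```

For example, `forward_enhance_workflows(["security-focus", "minimal"])` yields four argv entries: the flag followed by each name in turn.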
Fixed 7 ruff linting errors:
- SIM102: Simplified nested if statements in rag_chunker.py
- SIM113: Use enumerate() in streaming_ingest.py
- ARG001: Prefix unused signal handler args with underscore
- SIM105: Replace try-except-pass with contextlib.suppress (3 instances)
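As an illustration of the SIM105 fix, a `try`/`except`/`pass` block collapses into `contextlib.suppress`. The helper below is a hypothetical example, not one of the three instances changed in the codebase.

```python
import contextlib
import os


def remove_if_present(path: str) -> None:
    """Delete a file, ignoring the case where it does not exist."""
    # Equivalent to: try: os.remove(path) / except FileNotFoundError: pass
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)
```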
Fixed 7 MCP server test failures:
- Updated generate_config_tool to output unified format (not legacy)
- Updated test_validate_valid_config to use unified format
- Renamed test_submit_config_accepts_legacy_format to
  test_submit_config_rejects_legacy_format (tests rejection, not acceptance)
- Updated all submit_config tests to use unified format:
  - test_submit_config_requires_token
  - test_submit_config_from_file_path
  - test_submit_config_detects_category
  - test_submit_config_validates_name_format
  - test_submit_config_validates_url_format
Added v3.0.0 release planning documents:
- RELEASE_EXECUTIVE_SUMMARY_v3.0.0.md (one-page overview)
- RELEASE_PLAN_v3.0.0.md (complete 4-week campaign)
- RELEASE_CONTENT_CHECKLIST_v3.0.0.md (content creation guide)
All tests should now pass. Ready for v3.0.0 release.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Filter out chunks smaller than min_chunk_size (default 100 tokens)
- Exception: Keep all chunks if entire document is smaller than target size
- All 15 tests passing (100% pass rate)
Fixes edge case where very small chunks (e.g., 'Short.' = 6 chars) were
being created despite min_chunk_size=100 setting.
Test: pytest tests/test_rag_chunker.py -v
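The filtering rule above can be sketched as follows. This is an illustration under assumptions: `filter_small_chunks` is a hypothetical name, the `target_size` default of 512 is invented for the example, and a whitespace word count stands in for whatever size measure the chunker actually uses.

```python
def filter_small_chunks(
    chunks: list[str],
    min_chunk_size: int = 100,
    target_size: int = 512,  # assumed value, for illustration only
) -> list[str]:
    """Drop chunks below min_chunk_size unless the whole document is under target_size."""

    def size(text: str) -> int:
        # Whitespace word count as a stand-in for a real tokenizer (assumption).
        return len(text.split())

    if sum(size(chunk) for chunk in chunks) < target_size:
        return list(chunks)  # tiny document: keep everything
    return [chunk for chunk in chunks if size(chunk) >= min_chunk_size]
```

A document consisting only of `'Short.'` is kept whole, while the same fragment trailing a long document is dropped.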
- Remove hardcoded version string
- Import from skill_seekers._version instead
- Ensures single source of truth for version management
- Future version bumps only need pyproject.toml update
Thanks @franklegolasyoung for the excellent work on the core fixes for issues #267, #242, and #260! 🙏
Your comprehensive approach to fixing PDF processing, expanding workflow detection, and improving the Chinese README documentation is much appreciated. I've added code quality fixes and comprehensive tests to ensure everything passes CI.
All 1266+ tests are now passing, and the issues are resolved! 🎉
- Create src/skill_seekers/_version.py as single source of truth
- Read version dynamically from pyproject.toml at runtime
- Update all __init__.py files to import from _version module
- Add tomli dependency for Python <3.11 (built-in tomllib for 3.11+)
- Remove hardcoded version duplicates (2.7.2 in 3 files)
- Fixes version mismatch: pyproject.toml (2.7.4) vs __init__.py (2.7.2)
Benefits:
- Single place to update version (pyproject.toml)
- No more version mismatches across files
- Automatic version consistency
- Works across Python 3.10-3.13
Before:
- pyproject.toml: 2.7.4
- src/skill_seekers/__init__.py: 2.7.2
- src/skill_seekers/cli/__init__.py: 2.7.2
- src/skill_seekers/mcp/__init__.py: 2.7.2
After:
- pyproject.toml: 2.7.4 (single source of truth)
- All other files: import from _version.py
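A minimal sketch of the single-source-of-truth approach, assuming the version lives under `[project]` in pyproject.toml. The function name `read_version` and the `"0.0.0"` fallback are hypothetical; the real `_version.py` may be structured differently.

```python
from pathlib import Path

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport for Python 3.10


def read_version(pyproject_path: Path, fallback: str = "0.0.0") -> str:
    """Return project.version from pyproject.toml, or the fallback if unavailable."""
    try:
        with pyproject_path.open("rb") as fh:  # tomllib requires binary mode
            data = tomllib.load(fh)
        return data["project"]["version"]
    except (OSError, KeyError, tomllib.TOMLDecodeError):
        return fallback
```

Each `__init__.py` can then import the resulting value instead of carrying its own copy of the version string.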
Complete implementation of C3.9, granular AI enhancement control, performance optimizations, and bug fixes.
Features:
- C3.9 Project Documentation Extraction (markdown files)
- Granular AI enhancement control (--enhance-level 0-3)
- C# test extraction support
- 6-12x faster LOCAL mode with parallel execution
- Auto-enhancement UX improvements
- LOCAL mode fallback for all AI enhancements
Bug Fixes:
- C# language support
- Config type field compatibility
- LocalSkillEnhancer import
Documentation:
- Updated CHANGELOG.md
- Updated CLAUDE.md
- Removed client-specific files
Tests: All 1,257 tests passing
Critical linter errors: Fixed
CRITICAL BUG FIX - Resolves 404 errors when fetching configs from API
Root Cause:
The code was constructing download URLs manually:
    download_url = f"{API_BASE_URL}/api/download/{config_name}.json"
This fails because the API provides download_url in the response, which
may differ from the constructed path (e.g., CDN URLs, version-specific paths).
Solution:
Changed both MCP server implementations to use download_url from API:
    download_url = config_info.get("download_url")
Added validation check for missing download_url field.
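The fix and its validation check can be sketched together as below. `resolve_download_url` is a hypothetical helper and the error wording is invented; only the `config_info.get("download_url")` lookup comes from the change itself.

```python
from typing import Any, Mapping


def resolve_download_url(config_info: Mapping[str, Any], config_name: str) -> str:
    """Use the download_url supplied by the API instead of constructing one."""
    download_url = config_info.get("download_url")
    if not download_url:
        # Fail loudly rather than guessing a path that may return 404.
        raise ValueError(f"Config '{config_name}' has no download_url in the API response")
    return download_url
```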
Files Modified:
- src/skill_seekers/mcp/tools/source_tools.py (FastMCP server, lines 285-297)
- src/skill_seekers/mcp/server_legacy.py (Legacy server, lines 1483-1494)
Bug Report:
User reported: skill-seekers install --config godot --unlimited
- API check: /api/configs/godot → 200 OK ✅
- Download: /api/download/godot.json → 404 Not Found ❌
After Fix:
- Uses download_url from API response → Works correctly ✅
Testing:
✅ All 15 source tools tests pass (test_mcp_fastmcp.py::TestSourceTools)
✅ All 8 fetch_config tests pass
✅ test_fetch_config_download_api: PASSED
✅ test_fetch_config_from_source: PASSED
Impact:
- Fixes config downloads from official API (skillseekersweb.com)
- Fixes config downloads from private Git repositories
- Prevents 404 errors caused by URL construction mismatches
- No breaking changes - fully backward compatible
Related Issue: Bug reported by user when testing Godot skill
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Start development cycle for v2.8.0.
Version updated in 5 locations:
- pyproject.toml
- src/skill_seekers/__init__.py
- src/skill_seekers/cli/__init__.py
- src/skill_seekers/mcp/__init__.py
- src/skill_seekers/mcp/tools/__init__.py
All version numbers synchronized to prevent Issue #248.
[Unreleased] section in CHANGELOG.md ready for v2.8.0 changes.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This PR improves MCP server configuration by updating all documentation
to use the current server_fastmcp module and ensuring setup scripts
automatically use virtual environment Python instead of system Python.
## Changes
### 1. Documentation Updates (server → server_fastmcp)
Updated all references from deprecated `server` module to `server_fastmcp`:
**User-facing documentation:**
- examples/http_transport_examples.sh: All 13 command examples
- README.md: Configuration examples and troubleshooting commands
- docs/guides/MCP_SETUP.md: Enhanced migration guide with stdio/HTTP examples
- docs/guides/TESTING_GUIDE.md: Test import statements
- docs/guides/MULTI_AGENT_SETUP.md: Updated examples
- docs/guides/SETUP_QUICK_REFERENCE.md: Updated paths
- CLAUDE.md: CLI command examples
**MCP module:**
- src/skill_seekers/mcp/README.md: Updated config examples
- src/skill_seekers/mcp/agent_detector.py: Use server_fastmcp module
Note: Historical release notes (CHANGELOG.md) preserved unchanged.
### 2. Venv Python Configuration
**setup_mcp.sh improvements:**
- Added automatic venv detection (checks .venv, venv, and $VIRTUAL_ENV)
- Sets PYTHON_CMD to venv Python path when available
- **CRITICAL FIX**: Now updates PYTHON_CMD after creating/activating venv
- Generates MCP configs with full venv Python path
- Falls back to system python3 if no venv found
- Displays detected Python version and path
**Config examples updated:**
- .claude/mcp_config.example.json: Use venv Python path
- example-mcp-config.json: Use venv Python path
- Added "type": "stdio" for clarity
- Updated to use server_fastmcp module
### 3. Bug Fix: PYTHON_CMD Not Updated After Venv Creation
Previously, when setup_mcp.sh created or activated a venv, it failed to
update PYTHON_CMD, causing generated configs to still use system python3.
**Fixed cases:**
- When $VIRTUAL_ENV is already set → Update PYTHON_CMD to venv Python
- When existing venv is activated → Set PYTHON_CMD="$REPO_PATH/venv/bin/python3"
- When new venv is created → Set PYTHON_CMD="$REPO_PATH/venv/bin/python3"
## Benefits
### For Users:
✅ No deprecation warnings - All docs show current module
✅ Proper Python environment - MCP uses venv with all dependencies
✅ No system Python issues - Avoids "module not found" errors
✅ No global installation needed - No --break-system-packages required
✅ Automatic detection - setup_mcp.sh finds venv automatically
✅ Clean isolation - Projects don't interfere with system Python
### For Maintainers:
✅ Prepared for v3.0.0 - Documentation ready for server.py removal
✅ Reduced support burden - Fewer MCP configuration issues
✅ Consistent examples - All docs use same module/pattern
## Testing
**Verified:**
- ✅ All command examples use server_fastmcp
- ✅ No deprecated module references in user-facing docs (0 results)
- ✅ New module correctly referenced (129 instances)
- ✅ setup_mcp.sh detects venv and generates correct config
- ✅ PYTHON_CMD properly updated after venv creation
- ✅ MCP server starts correctly with venv Python
**Files changed:** 12 files (+262/-107 lines)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed case-sensitivity bug where regex patterns failed to match output messages
due to case mismatch between 'saved to:' (lowercase in regex) and 'Saved to:'
(uppercase in actual output).
Changes:
- Line 529: Added (?i) flag to config path extraction regex
- Line 668: Added (?i) flag to package path extraction regex
This fixes the issue where 'skill-seekers install --config react' would:
1. Successfully download and save config to disk
2. Output: '📂 Saved to: output/react.json'
3. But fail with '❌ Failed to fetch config' due to regex mismatch
The workflow now correctly continues to Phase 2 (scraping) after fetching config.
Also updated comment on line 528 to reflect actual output format with emoji.
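The effect of the `(?i)` flag can be seen in a small sketch. The pattern below is a hypothetical simplification; the actual regexes on lines 529 and 668 may capture paths differently.

```python
import re
from typing import Optional

# Hypothetical pattern; (?i) makes the whole match case-insensitive.
SAVED_TO = re.compile(r"(?i)saved to:\s*(\S+)")


def extract_saved_path(output: str) -> Optional[str]:
    """Pull the saved file path out of a status line, regardless of letter case."""
    match = SAVED_TO.search(output)
    return match.group(1) if match else None
```

Without the flag, the same pattern fails on `'📂 Saved to: output/react.json'` because of the capital `S`.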
Fixes #236
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed version mismatch bug where hardcoded versions were out of sync
with pyproject.toml.
Updated version from 2.5.2 to 2.7.0 in:
- src/skill_seekers/__init__.py
- src/skill_seekers/cli/__init__.py
- src/skill_seekers/mcp/__init__.py
- src/skill_seekers/mcp/tools/__init__.py
Now skill-seekers --version correctly reports: 2.7.0
Fixes #248
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed incorrect variable names in list comprehensions that were causing
NameError in CI (Python 3.11/3.12):
Critical fixes:
- tests/test_markdown_parsing.py: 'l' → 'link' in list comprehension
- src/skill_seekers/cli/pdf_extractor_poc.py: 'l' → 'line' (2 occurrences)
Additional auto-lint fixes:
- Removed unused imports in llms_txt_downloader.py, llms_txt_parser.py
- Fixed comparison operators in config files
- Fixed list comprehension in other files
All tests now pass in CI.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add comprehensive AI enhancement to C3.4 Configuration Pattern Extraction
similar to C3.3's dual-mode architecture (API + LOCAL).
NEW CAPABILITIES (What users can do now):
1. **AI-Powered Config Analysis** - Understand what configs do, not just extract them
- Explanations: What each configuration setting does
- Best Practices: Suggested improvements and better organization
- Security Analysis: Identifies hardcoded secrets, exposed credentials
- Migration Suggestions: Opportunities to consolidate configs
- Context: Explains detected patterns and when to use them
2. **Dual-Mode AI Support** (Same as C3.3):
- API Mode: Claude API analyzes configs (requires ANTHROPIC_API_KEY)
- LOCAL Mode: Claude Code CLI (FREE, no API key needed)
- AUTO Mode: Automatically detects best available mode
3. **Seamless Integration**:
- CLI: --enhance, --enhance-local, --ai-mode flags
- Codebase Scraper: Works with existing enhance_with_ai parameter
- MCP Tools: Enhanced extract_config_patterns with AI parameters
- Optional: Enhancement only runs when explicitly requested
Components Added:
- ConfigEnhancer class (~400 lines) - Dual-mode AI enhancement engine
- Enhanced CLI flags in config_extractor.py
- AI integration in codebase_scraper.py config extraction workflow
- MCP tool parameter expansion (enhance, enhance_local, ai_mode)
- FastMCP server tool signature updates
- Comprehensive documentation in CHANGELOG.md and README.md
Performance:
- Basic extraction: ~3 seconds for 100 config files
- With AI enhancement: +30-60 seconds (LOCAL mode, FREE)
- With AI enhancement: +20-40 seconds (API mode, ~$0.10-0.20)
Use Cases:
- Security audits: Find hardcoded secrets across all configs
- Migration planning: Identify consolidation opportunities
- Onboarding: Understand what each config file does
- Best practices: Get improvement suggestions for config organization
Technical Details:
- Structured JSON prompts for reliable AI responses
- 5 enhancement categories: explanations, best_practices, security, migration, context
- Graceful fallback if AI enhancement fails
- Security findings logged separately for visibility
- Results stored in JSON under 'ai_enhancements' key
Testing:
- 28 comprehensive tests in test_config_extractor.py
- Tests cover: file detection, parsing, pattern detection, enhancement modes
- All integrations tested: CLI, codebase_scraper, MCP tools
Documentation:
- CHANGELOG.md: Complete C3.4 feature description
- README.md: Updated C3.4 section with AI enhancement
- MCP tool descriptions: Added AI enhancement details
Related Issues: #74
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add build_dependency_graph parameter to scrape_codebase MCP tool
- Update tool documentation with new parameter
- Pass --build-dependency-graph flag to CLI command
- Update FastMCP server function signature
Usage via MCP:
    scrape_codebase(
        directory="/path/to/repo",
        build_dependency_graph=True
    )
This completes the C2.6 feature set by exposing dependency graph
generation through the MCP interface, making it available to all
MCP clients (Claude Code, Cursor, etc.).
- Switch from manual package listing to automatic discovery
- Improves maintainability and prevents missing module bugs
- All tests passing (700+ tests)
- Package contents verified identical to v2.5.1
Fixes #226
Merges #227
Thanks to @iamKhan79690 for the contribution!
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Anas Ur Rehman (@iamKhan79690) <noreply@github.com>
- Replace TextContent = None with proper fallback class in all MCP tool modules
- Fixes TypeError when MCP library is not fully initialized in test environment
- Ensures all 700 tests pass (was 699 passing, 1 failing)
- Affected files:
* packaging_tools.py
* config_tools.py
* scraping_tools.py
* source_tools.py
* splitting_tools.py
The fallback class maintains the same interface as mcp.types.TextContent,
allowing tests to run successfully even when the MCP library import fails.
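A minimal sketch of the fallback pattern described above. The stand-in class below covers only the `type` and `text` attributes and is an assumption about what the tool modules need; the actual fallback may carry more.

```python
try:
    from mcp.types import TextContent
except ImportError:

    class TextContent:
        """Minimal stand-in used when the MCP library cannot be imported."""

        def __init__(self, type: str, text: str):
            self.type = type
            self.text = text


# Callers construct results the same way whichever class is in use.
result = TextContent(type="text", text="ok")
```

Because the name is always bound to a class, calling `TextContent(...)` no longer raises the `TypeError` that `TextContent = None` produced.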
Test results: ✅ 700 passed, 157 skipped, 2 warnings
Issue: #11 (A1.3 test failures)
## Problem
3/8 tests were failing because ConfigValidator validates only structure
and required fields, not formats (names, URLs, etc.).
## Root Cause
ConfigValidator checks:
- Required fields (name, description, sources/base_url)
- Source types validity
- Field types (arrays, integers)
ConfigValidator does NOT check:
- Name format (alphanumeric, hyphens, underscores)
- URL format (http:// or https://)
## Solution
Added additional format validation in submit_config_tool after ConfigValidator:
1. Name format validation using regex: `^[a-zA-Z0-9_-]+$`
2. URL format validation (must start with http:// or https://)
3. Validates both legacy (base_url) and unified (sources.base_url) formats
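The three checks above can be sketched as follows. The helper names are hypothetical and the code is not the actual `submit_config_tool` implementation; the regex and the `http://`/`https://` prefix rule are taken from the description.

```python
import re

NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def is_valid_name(name: str) -> bool:
    """Alphanumerics, hyphens, and underscores only."""
    # fullmatch also rejects a trailing newline, which `$` alone would allow.
    return NAME_PATTERN.fullmatch(name) is not None


def is_valid_url(url: str) -> bool:
    """URL must start with http:// or https://."""
    return url.startswith(("http://", "https://"))


def invalid_base_urls(config: dict) -> list[str]:
    """Collect base_url values from legacy and unified layouts that fail the check."""
    urls = []
    if "base_url" in config:  # legacy flat format
        urls.append(config["base_url"])
    for source in config.get("sources", []):  # unified format
        if "base_url" in source:
            urls.append(source["base_url"])
    return [url for url in urls if not is_valid_url(url)]
```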
## Test Results
Before: 5/8 tests passing, 3 failing
After: 8/8 tests passing ✅
Full suite: 427 tests passing, 40 skipped ✅
## Changes Made
- src/skill_seekers/mcp/server.py:
* Added `import re` at top of file
* Added name format validation (lines 1280-1281)
* Added URL format validation for legacy configs (lines 1285-1289)
* Added URL format validation for unified configs (lines 1291-1296)
- tests/test_mcp_server.py:
* Updated test_submit_config_validates_required_fields to accept
ConfigValidator's correct error message ("cannot detect" instead of "description")
## Validation Examples
Invalid name: "React@2024!" → ❌ "Invalid name format"
Invalid URL: "not-a-url" → ❌ "Invalid base_url format"
Valid name: "react-docs" → ✅
Valid URL: "https://react.dev/" → ✅
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implements A1.2 - Add MCP tool to download configs from API
Features:
- Download config files from api.skillseekersweb.com
- List all available configs (24 configs)
- Filter configs by category
- Download specific config by name
- Save to local configs directory
- Display config metadata (category, tags, type, source, last_updated)
- Error handling for 404 and network errors
Usage:
- List configs: fetch_config with list_available=true
- Filter by category: fetch_config with list_available=true, category='web-frameworks'
- Download config: fetch_config with config_name='react'
- Custom destination: fetch_config with config_name='react', destination='my_configs/'
Technical:
- Uses httpx AsyncClient for HTTP requests
- Connects to https://api.skillseekersweb.com
- Returns formatted TextContent responses
- Supports GET /api/configs and GET /api/download endpoints
- Proper error handling for HTTP and JSON errors
Tests:
- ✅ List all configs (24 total)
- ✅ List by category filter (12 web-frameworks)
- ✅ Download specific config (react.json)
- ✅ Handle nonexistent config (404 error)
Issue: N/A (from roadmap task A1.2)