Merges setup_mcp.sh fix for v2.0.0 src/ layout + test updates. Original fix by @501981732 in PR #197. Test updates to make CI pass. Closes #192
24 KiB
Changelog
All notable changes to Skill Seeker will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
Added
- (No unreleased changes yet)
[2.1.0] - 2025-11-12
🎉 Major Enhancement: Quality Assurance + Race Condition Fixes
This release focuses on quality and reliability improvements, adding comprehensive quality checks and fixing critical race conditions in the enhancement workflow.
🚀 Major Features
Comprehensive Quality Checker
- Automatic quality checks before packaging - Validates skill quality before upload
- Quality scoring system - 0-100 score with A-F grades
- Enhancement verification - Checks for template text, code examples, sections
- Structure validation - Validates SKILL.md, references/ directory
- Content quality checks - YAML frontmatter, language tags, "When to Use" section
- Link validation - Validates internal markdown links
- Detailed reporting - Errors, warnings, and info messages with file locations
- CLI tool -
skill-seekers-quality-checkerwith verbose and strict modes
Headless Enhancement Mode (Default)
- No terminal windows - Runs enhancement in background by default
- Proper waiting - Main console waits for enhancement to complete
- Timeout protection - 10-minute default timeout (configurable)
- Verification - Checks that SKILL.md was actually updated
- Progress messages - Clear status updates during enhancement
- Interactive mode available -
--interactive-enhancementflag for terminal mode
Added
New CLI Tools
- quality_checker.py - Comprehensive skill quality validation
- Structure checks (SKILL.md, references/)
- Enhancement verification (code examples, sections)
- Content validation (frontmatter, language tags)
- Link validation (internal markdown links)
- Quality scoring (0-100 + A-F grade)
New Features
- Headless enhancement -
skill-seekers-enhanceruns in background by default - Quality checks in packaging - Automatic validation before creating .zip
- MCP quality skip - MCP server skips interactive checks
- Enhanced error handling - Better error messages and timeout handling
Tests
- +12 quality checker tests - Comprehensive validation testing
- 391 total tests passing - Up from 379 in v2.0.0
- 0 test failures - All tests green
- CI improvements - Fixed macOS terminal detection tests
Changed
Enhancement Workflow
- Default mode changed - Headless mode is now default (was terminal mode)
- Waiting behavior - Main console waits for enhancement completion
- No race conditions - Fixed "Package your skill" message appearing too early
- Better progress - Clear status messages during enhancement
Package Workflow
- Quality checks added - Automatic validation before packaging
- User confirmation - Ask to continue if warnings/errors found
- Skip option -
--skip-quality-checkflag to bypass checks - MCP context - Automatically skips checks in non-interactive contexts
CLI Arguments
- doc_scraper.py:
- Updated
--enhance-localhelp text (mentions headless mode) - Added
--interactive-enhancementflag
- Updated
- enhance_skill_local.py:
- Changed default to
headless=True - Added
--interactive-enhancementflag - Added
--timeoutflag (default: 600 seconds)
- Changed default to
- package_skill.py:
- Added
--skip-quality-checkflag
- Added
Fixed
Critical Bugs
- Enhancement race condition - Main console no longer exits before enhancement completes
- MCP stdin errors - MCP server now skips interactive prompts
- Terminal detection tests - Fixed for headless mode default
Enhancement Issues
- Process detachment - subprocess.run() now waits properly instead of Popen()
- Timeout handling - Added timeout protection to prevent infinite hangs
- Verification - Checks file modification time and size to verify success
- Error messages - Better error handling and user-friendly messages
Test Fixes
- package_skill tests - Added skip_quality_check=True to prevent stdin errors
- Terminal detection tests - Updated to use headless=False for interactive tests
- MCP server tests - Fixed to skip quality checks in non-interactive context
Technical Details
New Modules
src/skill_seekers/cli/quality_checker.py- Quality validation enginetests/test_quality_checker.py- 12 comprehensive tests
Modified Modules
src/skill_seekers/cli/enhance_skill_local.py- Added headless modesrc/skill_seekers/cli/doc_scraper.py- Updated enhancement integrationsrc/skill_seekers/cli/package_skill.py- Added quality checkssrc/skill_seekers/mcp/server.py- Skip quality checks in MCP contexttests/test_package_skill.py- Updated for quality checkertests/test_terminal_detection.py- Updated for headless default
Commits in This Release
e279ed6- Phase 1: Enhancement race condition fix (headless mode)3272f9c- Phases 2 & 3: Quality checker implementation2dd1027- Phase 4: Tests (+12 quality checker tests)befcb89- CI Fix: Skip quality checks in MCP context67ab627- CI Fix: Update terminal tests for headless default
Upgrade Notes
Breaking Changes
- Headless mode default - Enhancement now runs in background by default
- Use
--interactive-enhancementif you want the old terminal mode - Affects:
skill-seekers-enhanceandskill-seekers scrape --enhance-local
- Use
New Behavior
- Quality checks - Packaging now runs quality checks by default
- May prompt for confirmation if warnings/errors found
- Use
--skip-quality-checkto bypass (not recommended)
Recommendations
- Try headless mode - Faster and more reliable than terminal mode
- Review quality reports - Fix warnings before packaging
- Update scripts - Add
--skip-quality-checkto automated packaging scripts if needed
Migration Guide
If you want the old terminal mode behavior:
# Old (v2.0.0): Default was terminal mode
skill-seekers-enhance output/react/
# New (v2.1.0): Use --interactive-enhancement
skill-seekers-enhance output/react/ --interactive-enhancement
If you want to skip quality checks:
# Add --skip-quality-check to package command
skill-seekers-package output/react/ --skip-quality-check
[2.0.0] - 2025-11-11
🎉 Major Release: PyPI Publication + Modern Python Packaging
Skill Seekers is now available on PyPI! Install with: pip install skill-seekers
This is a major milestone release featuring complete restructuring for modern Python packaging, comprehensive testing improvements, and publication to the Python Package Index.
🚀 Major Changes
PyPI Publication
- Published to PyPI - https://pypi.org/project/skill-seekers/
- Installation:
pip install skill-seekersoruv tool install skill-seekers - No cloning required - Install globally or in virtual environments
- Automatic dependency management - All dependencies handled by pip/uv
Modern Python Packaging
- pyproject.toml-based configuration - Standard PEP 621 metadata
- src/ layout structure - Best practice package organization
- Entry point scripts -
skill-seekerscommand available globally - Proper dependency groups - Separate dev, test, and MCP dependencies
- Build backend - setuptools-based build with uv support
Unified CLI Interface
- Single
skill-seekerscommand - Git-style subcommands - Subcommands:
scrape,github,pdf,unified,enhance,package,upload,estimate - Consistent interface - All tools accessible through one entry point
- Help system - Comprehensive
--helpfor all commands
Added
Testing Infrastructure
- 379 passing tests (up from 299) - Comprehensive test coverage
- 0 test failures - All tests passing successfully
- Test suite improvements:
- Fixed import paths for src/ layout
- Updated CLI tests for unified entry points
- Added package structure verification tests
- Fixed MCP server import tests
- Added pytest configuration in pyproject.toml
Documentation
- Updated README.md - PyPI badges, reordered installation options
- FUTURE_RELEASES.md - Roadmap for upcoming features
- Installation guides - Simplified with PyPI as primary method
- Testing documentation - How to run full test suite
Changed
Package Structure
- Moved to src/ layout:
src/skill_seekers/- Main packagesrc/skill_seekers/cli/- CLI toolssrc/skill_seekers/mcp/- MCP server
- Import paths updated - All imports use proper package structure
- Entry points configured - All CLI tools available as commands
Import Fixes
- Fixed
merge_sources.py- Corrected conflict_detector import (.conflict_detector) - Fixed MCP server tests - Updated to use
skill_seekers.mcp.serverimports - Fixed test paths - All tests updated for src/ layout
Fixed
Critical Bugs
- Import path errors - Fixed relative imports in CLI modules
- MCP test isolation - Added proper MCP availability checks
- Package installation - Resolved entry point conflicts
- Dependency resolution - All dependencies properly specified
Test Improvements
- 17 test fixes - Updated for modern package structure
- MCP test guards - Proper skipif decorators for MCP tests
- CLI test updates - Accept both exit codes 0 and 2 for help
- Path validation - Tests verify correct package structure
Technical Details
Build System
- Build backend: setuptools.build_meta
- Build command:
uv build - Publish command:
uv publish - Distribution formats: wheel + source tarball
Dependencies
- Core: requests, beautifulsoup4, PyGithub, mcp, httpx
- PDF: PyMuPDF, Pillow, pytesseract
- Dev: pytest, pytest-cov, pytest-anyio, mypy
- MCP: mcp package for Claude Code integration
Migration Guide
For Users
Old way:
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers
pip install -r requirements.txt
python3 cli/doc_scraper.py --config configs/react.json
New way:
pip install skill-seekers
skill-seekers scrape --config configs/react.json
For Developers
- Update imports:
from cli.* → from skill_seekers.cli.* - Use
pip install -e ".[dev]"for development - Run tests:
python -m pytest - Entry points instead of direct script execution
Breaking Changes
- CLI interface changed - Use
skill-seekerscommand instead ofpython3 cli/... - Import paths changed - Package now at
skill_seekers.*instead ofcli.* - Installation method changed - PyPI recommended over git clone
Deprecations
- Direct script execution - Still works but deprecated (use
skill-seekerscommand) - Old import patterns - Legacy imports still work but will be removed in v3.0
Compatibility
- Python 3.10+ required
- Backward compatible - Old scripts still work with legacy CLI
- Config files - No changes required
- Output format - No changes to generated skills
[1.3.0] - 2025-10-26
Added - Refactoring & Performance Improvements
- Async/Await Support for Parallel Scraping (2-3x performance boost)
--asyncflag to enable async modeasync def scrape_page_async()method using httpx.AsyncClientasync def scrape_all_async()method with asyncio.gather()- Connection pooling for better performance
- asyncio.Semaphore for concurrency control
- Comprehensive async testing (11 new tests)
- Full documentation in ASYNC_SUPPORT.md
- Performance: ~55 pages/sec vs ~18 pages/sec (sync)
- Memory: 40 MB vs 120 MB (66% reduction)
- Python Package Structure (Phase 0 Complete)
cli/__init__.py- CLI tools package with clean importsskill_seeker_mcp/__init__.py- MCP server package (renamed from mcp/)skill_seeker_mcp/tools/__init__.py- MCP tools subpackage- Proper package imports:
from cli import constants
- Centralized Configuration Module
cli/constants.pywith 18 configuration constantsDEFAULT_ASYNC_MODE,DEFAULT_RATE_LIMIT,DEFAULT_MAX_PAGES- Enhancement limits, categorization scores, file limits
- All magic numbers now centralized and configurable
- Code Quality Improvements
- Converted 71 print() statements to proper logging calls
- Added type hints to all DocToSkillConverter methods
- Fixed all mypy type checking issues
- Installed types-requests for better type safety
- Multi-variant llms.txt detection: downloads all 3 variants (full, standard, small)
- Automatic .txt → .md file extension conversion
- No content truncation: preserves complete documentation
detect_all()method for finding all llms.txt variantsget_proper_filename()for correct .md naming
Changed
_try_llms_txt()now downloads all available variants instead of just one- Reference files now contain complete content (no 2500 char limit)
- Code samples now include full code (no 600 char limit)
- Test count increased from 207 to 299 (92 new tests)
- All print() statements replaced with logging (logger.info, logger.warning, logger.error)
- Better IDE support with proper package structure
- Code quality improved from 5.5/10 to 6.5/10
Fixed
- File extension bug: llms.txt files now saved as .md
- Content loss: 0% truncation (was 36%)
- Test isolation issues in test_async_scraping.py (proper cleanup with try/finally)
- Import issues: no more sys.path.insert() hacks needed
- .gitignore: added test artifacts (.pytest_cache, .coverage, htmlcov, etc.)
1.2.0 - 2025-10-23
🚀 PDF Advanced Features Release
Major enhancement to PDF extraction capabilities with Priority 2 & 3 features.
Added
Priority 2: Support More PDF Types
-
OCR Support for Scanned PDFs
- Automatic text extraction from scanned documents using Tesseract OCR
- Fallback mechanism when page text < 50 characters
- Integration with pytesseract and Pillow
- Command:
--ocrflag - New dependencies:
Pillow==11.0.0,pytesseract==0.3.13
-
Password-Protected PDF Support
- Handle encrypted PDFs with password authentication
- Clear error messages for missing/wrong passwords
- Secure password handling
- Command:
--password PASSWORDflag
-
Complex Table Extraction
- Extract tables from PDFs using PyMuPDF's table detection
- Capture table data as 2D arrays with metadata (bbox, row/col count)
- Integration with skill references in markdown format
- Command:
--extract-tablesflag
Priority 3: Performance Optimizations
-
Parallel Page Processing
- 3x faster PDF extraction using ThreadPoolExecutor
- Auto-detect CPU count or custom worker specification
- Only activates for PDFs with > 5 pages
- Commands:
--paralleland--workers Nflags - Benchmarks: 500-page PDF reduced from 4m 10s to 1m 15s
-
Intelligent Caching
- In-memory cache for expensive operations (text extraction, code detection, quality scoring)
- 50% faster on re-runs
- Command:
--no-cacheto disable (enabled by default)
New Documentation
docs/PDF_ADVANCED_FEATURES.md(580 lines)- Complete usage guide for all advanced features
- Installation instructions
- Performance benchmarks showing 3x speedup
- Best practices and troubleshooting
- API reference with all parameters
Testing
- New test file:
tests/test_pdf_advanced_features.py(568 lines, 26 tests)- TestOCRSupport (5 tests)
- TestPasswordProtection (4 tests)
- TestTableExtraction (5 tests)
- TestCaching (5 tests)
- TestParallelProcessing (4 tests)
- TestIntegration (3 tests)
- Updated:
tests/test_pdf_extractor.py(23 tests fixed and passing) - Total PDF tests: 49/49 PASSING ✅ (100% pass rate)
Changed
- Enhanced
cli/pdf_extractor_poc.pywith all advanced features - Updated
requirements.txtwith new dependencies - Updated
README.mdwith PDF advanced features usage - Updated
docs/TESTING.mdwith new test counts (142 total tests)
Performance Improvements
- 3.3x faster with parallel processing (8 workers)
- 1.7x faster on re-runs with caching enabled
- Support for unlimited page PDFs (no more 500-page limit)
Dependencies
- Added
Pillow==11.0.0for image processing - Added
pytesseract==0.3.13for OCR support - Tesseract OCR engine (system package, optional)
1.1.0 - 2025-10-22
🌐 Documentation Scraping Enhancements
Major improvements to documentation scraping with unlimited pages, parallel processing, and new configs.
Added
Unlimited Scraping & Performance
- Unlimited Page Scraping - Removed 500-page limit, now supports unlimited pages
- Parallel Scraping Mode - Process multiple pages simultaneously for faster scraping
- Dynamic Rate Limiting - Smart rate limit control to avoid server blocks
- CLI Utilities - New helper scripts for common tasks
New Configurations
- Ansible Core 2.19 - Complete Ansible documentation config
- Claude Code - Documentation for this very tool!
- Laravel 9.x - PHP framework documentation
Testing & Quality
- Comprehensive test coverage for CLI utilities
- Parallel scraping test suite
- Virtual environment setup documentation
- Thread-safety improvements
Fixed
- Thread-safety issues in parallel scraping
- CLI path references across all documentation
- Flaky upload_skill tests
- MCP server streaming subprocess implementation
Changed
- All CLI examples now use
cli/directory prefix - Updated documentation structure
- Enhanced error handling
1.0.0 - 2025-10-19
🎉 First Production Release
This is the first production-ready release of Skill Seekers with complete feature set, full test coverage, and comprehensive documentation.
Added
Smart Auto-Upload Feature
- New
upload_skill.pyCLI tool for automatic API-based upload - Enhanced
package_skill.pywith--uploadflag - Smart API key detection with graceful fallback
- Cross-platform folder opening in
utils.py - Helpful error messages instead of confusing errors
MCP Integration Enhancements
- 9 MCP tools (added
upload_skilltool) mcp__skill-seeker__upload_skill- Upload .zip files to Claude automatically- Enhanced
package_skilltool with smart auto-upload parameter - Updated all MCP documentation to reflect 9 tools
Documentation Improvements
- Updated README with version badge (v1.0.0)
- Enhanced upload guide with 3 upload methods
- Updated MCP setup guide with all 9 tools
- Comprehensive test documentation (14/14 tests)
- All references to tool counts corrected
Fixed
- Missing
import osinmcp/server.py package_skill.pyexit code behavior (now exits 0 when API key missing)- Improved UX with helpful messages instead of errors
Changed
- Test count badge updated (96 → 14 passing)
- All documentation references updated to 9 tools
Testing
- CLI Tests: 8/8 PASSED ✅
- MCP Tests: 6/6 PASSED ✅
- Total: 14/14 PASSED (100%)
0.4.0 - 2025-10-18
Added
Large Documentation Support (40K+ Pages)
- Config splitting functionality for massive documentation sites
- Router/hub skill generation for intelligent query routing
- Checkpoint/resume feature for long scrapes
- Parallel scraping support for faster processing
- 4 split strategies: auto, category, router, size
New CLI Tools
split_config.py- Split large configs into focused sub-skillsgenerate_router.py- Generate router/hub skillspackage_multi.py- Package multiple skills at once
New MCP Tools
split_config- Split large documentation via MCPgenerate_router- Generate router skills via MCP
Documentation
- New
docs/LARGE_DOCUMENTATION.mdguide - Example config:
godot-large-example.json(40K pages)
Changed
- MCP tool count: 6 → 8 tools
- Updated documentation for large docs workflow
0.3.0 - 2025-10-15
Added
MCP Server Integration
- Complete MCP server implementation (
mcp/server.py) - 6 MCP tools for Claude Code integration:
list_configsgenerate_configvalidate_configestimate_pagesscrape_docspackage_skill
Setup & Configuration
- Automated setup script (
setup_mcp.sh) - MCP configuration examples
- Comprehensive MCP setup guide (
docs/MCP_SETUP.md) - MCP testing guide (
docs/TEST_MCP_IN_CLAUDE_CODE.md)
Testing
- 31 comprehensive unit tests for MCP server
- Integration tests via Claude Code MCP protocol
- 100% test pass rate
Documentation
- Complete MCP integration documentation
- Natural language usage examples
- Troubleshooting guides
Changed
- Restructured project as monorepo with CLI and MCP server
- Moved CLI tools to
cli/directory - Added MCP server to
mcp/directory
0.2.0 - 2025-10-10
Added
Testing & Quality
- Comprehensive test suite with 71 tests
- 100% test pass rate
- Test coverage for all major features
- Config validation tests
Optimization
- Page count estimator (
estimate_pages.py) - Framework config optimizations with
start_urls - Better URL pattern coverage
- Improved scraping efficiency
New Configs
- Kubernetes documentation config
- Tailwind CSS config
- Astro framework config
Changed
- Optimized all framework configs
- Improved categorization accuracy
- Enhanced error messages
0.1.0 - 2025-10-05
Added
Initial Release
- Basic documentation scraper functionality
- Manual skill creation
- Framework configs (Godot, React, Vue, Django, FastAPI)
- Smart categorization system
- Code language detection
- Pattern extraction
- Local and API-based enhancement options
- Basic packaging functionality
Core Features
- BFS traversal for documentation scraping
- CSS selector-based content extraction
- Smart categorization with scoring
- Code block detection and formatting
- Caching system for scraped data
- Interactive mode for config creation
Documentation
- README with quick start guide
- Basic usage documentation
- Configuration file examples
Release Links
- v1.2.0 - PDF Advanced Features
- v1.1.0 - Documentation Scraping Enhancements
- v1.0.0 - Production Release
- v0.4.0 - Large Documentation Support
- v0.3.0 - MCP Integration
Version History Summary
| Version | Date | Highlights |
|---|---|---|
| 1.2.0 | 2025-10-23 | 📄 PDF advanced features: OCR, passwords, tables, 3x faster |
| 1.1.0 | 2025-10-22 | 🌐 Unlimited scraping, parallel mode, new configs (Ansible, Laravel) |
| 1.0.0 | 2025-10-19 | 🚀 Production release, auto-upload, 9 MCP tools |
| 0.4.0 | 2025-10-18 | 📚 Large docs support (40K+ pages) |
| 0.3.0 | 2025-10-15 | 🔌 MCP integration with Claude Code |
| 0.2.0 | 2025-10-10 | 🧪 Testing & optimization |
| 0.1.0 | 2025-10-05 | 🎬 Initial release |