Changelog
All notable changes to Skill Seekers will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
Added
- (No unreleased changes yet)
[2.0.0] - 2025-11-11
🎉 Major Release: PyPI Publication + Modern Python Packaging
Skill Seekers is now available on PyPI! Install with: pip install skill-seekers
This is a major milestone release featuring complete restructuring for modern Python packaging, comprehensive testing improvements, and publication to the Python Package Index.
🚀 Major Changes
PyPI Publication
- Published to PyPI - https://pypi.org/project/skill-seekers/
- Installation: `pip install skill-seekers` or `uv tool install skill-seekers`
- No cloning required - Install globally or in virtual environments
- Automatic dependency management - All dependencies handled by pip/uv
Modern Python Packaging
- pyproject.toml-based configuration - Standard PEP 621 metadata
- src/ layout structure - Best practice package organization
- Entry point scripts - `skill-seekers` command available globally
- Proper dependency groups - Separate dev, test, and MCP dependencies
- Build backend - setuptools-based build with uv support
Unified CLI Interface
- Single `skill-seekers` command - Git-style subcommands
- Subcommands: `scrape`, `github`, `pdf`, `unified`, `enhance`, `package`, `upload`, `estimate`
- Consistent interface - All tools accessible through one entry point
- Help system - Comprehensive `--help` for all commands
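As an illustration of the Git-style subcommand layout described above, here is a minimal sketch built on the standard library's `argparse`. The subcommand names come from this changelog. The shared `--config` option, the `build_parser` and `run` helpers, and the dispatch message are assumptions for illustration, not the project's actual implementation.

```python
import argparse

# Subcommand names are taken from the changelog entry above; the options
# attached to them here are illustrative assumptions.
SUBCOMMANDS = ["scrape", "github", "pdf", "unified",
               "enhance", "package", "upload", "estimate"]


def build_parser() -> argparse.ArgumentParser:
    """Build one top-level parser with a sub-parser per tool."""
    parser = argparse.ArgumentParser(prog="skill-seekers")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in SUBCOMMANDS:
        sub = subparsers.add_parser(name, help=f"Run the {name} tool")
        # Hypothetical shared option; the real tools define their own flags.
        sub.add_argument("--config", help="Path to a JSON config file")
    return parser


def run(argv: list[str]) -> int:
    """Parse argv and return a process-style exit code."""
    try:
        args = build_parser().parse_args(argv)
    except SystemExit as exc:
        # argparse exits with 0 after printing --help and 2 on usage errors.
        return int(exc.code or 0)
    print(f"would dispatch to: {args.command} (config={args.config})")
    return 0
```

In this sketch, `run(["--help"])` returns 0 while `run([])` returns 2 because the subcommand is required, which is one reason a help-related test might accept either exit code.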
Added
Testing Infrastructure
- 379 passing tests (up from 299) - Comprehensive test coverage
- 0 test failures - All tests passing successfully
- Test suite improvements:
- Fixed import paths for src/ layout
- Updated CLI tests for unified entry points
- Added package structure verification tests
- Fixed MCP server import tests
- Added pytest configuration in pyproject.toml
Documentation
- Updated README.md - PyPI badges, reordered installation options
- FUTURE_RELEASES.md - Roadmap for upcoming features
- Installation guides - Simplified with PyPI as primary method
- Testing documentation - How to run full test suite
Changed
Package Structure
- Moved to src/ layout:
- `src/skill_seekers/` - Main package
- `src/skill_seekers/cli/` - CLI tools
- `src/skill_seekers/mcp/` - MCP server
- Import paths updated - All imports use proper package structure
- Entry points configured - All CLI tools available as commands
Import Fixes
- Fixed `merge_sources.py` - Corrected conflict_detector import (`.conflict_detector`)
- Fixed MCP server tests - Updated to use `skill_seekers.mcp.server` imports
- Fixed test paths - All tests updated for src/ layout
Fixed
Critical Bugs
- Import path errors - Fixed relative imports in CLI modules
- MCP test isolation - Added proper MCP availability checks
- Package installation - Resolved entry point conflicts
- Dependency resolution - All dependencies properly specified
Test Improvements
- 17 test fixes - Updated for modern package structure
- MCP test guards - Proper skipif decorators for MCP tests
- CLI test updates - Accept both exit codes 0 and 2 for help
- Path validation - Tests verify correct package structure
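The MCP test guards mentioned above are described as pytest `skipif` decorators. The sketch below shows the same idea with the standard library's `unittest.skipUnless` instead, so that it runs without pytest installed. The test class name and the `MCP_AVAILABLE` flag are hypothetical.

```python
import importlib
import importlib.util
import unittest

# True only when the optional `mcp` package can be found in this environment.
MCP_AVAILABLE = importlib.util.find_spec("mcp") is not None


@unittest.skipUnless(MCP_AVAILABLE, "mcp package not installed")
class TestMcpServerImport(unittest.TestCase):
    def test_server_module_is_importable(self):
        # Module path as named in this changelog; only runs when mcp exists.
        module = importlib.import_module("skill_seekers.mcp.server")
        self.assertIsNotNone(module)
```

With pytest, the equivalent guard would be `@pytest.mark.skipif(not MCP_AVAILABLE, reason="mcp package not installed")` on the test function or class.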
Technical Details
Build System
- Build backend: setuptools.build_meta
- Build command: `uv build`
- Publish command: `uv publish`
- Distribution formats: wheel + source tarball
Dependencies
- Core: requests, beautifulsoup4, PyGithub, mcp, httpx
- PDF: PyMuPDF, Pillow, pytesseract
- Dev: pytest, pytest-cov, pytest-anyio, mypy
- MCP: mcp package for Claude Code integration
Migration Guide
For Users
Old way:
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers
pip install -r requirements.txt
python3 cli/doc_scraper.py --config configs/react.json
New way:
pip install skill-seekers
skill-seekers scrape --config configs/react.json
For Developers
- Update imports: `from cli.*` → `from skill_seekers.cli.*`
- Use `pip install -e ".[dev]"` for development
- Run tests: `python -m pytest`
- Entry points instead of direct script execution
Breaking Changes
- CLI interface changed - Use `skill-seekers` command instead of `python3 cli/...`
- Import paths changed - Package now at `skill_seekers.*` instead of `cli.*`
- Installation method changed - PyPI recommended over git clone
Deprecations
- Direct script execution - Still works but deprecated (use `skill-seekers` command)
- Old import patterns - Legacy imports still work but will be removed in v3.0
Compatibility
- Python 3.10+ required
- Backward compatible - Old scripts still work with legacy CLI
- Config files - No changes required
- Output format - No changes to generated skills
[1.3.0] - 2025-10-26
Added - Refactoring & Performance Improvements
- Async/Await Support for Parallel Scraping (2-3x performance boost)
- `--async` flag to enable async mode
- `async def scrape_page_async()` method using httpx.AsyncClient
- `async def scrape_all_async()` method with asyncio.gather()
- Connection pooling for better performance
- asyncio.Semaphore for concurrency control
- Comprehensive async testing (11 new tests)
- Full documentation in ASYNC_SUPPORT.md
- Performance: ~55 pages/sec vs ~18 pages/sec (sync)
- Memory: 40 MB vs 120 MB (66% reduction)
- Python Package Structure (Phase 0 Complete)
- `cli/__init__.py` - CLI tools package with clean imports
- `skill_seeker_mcp/__init__.py` - MCP server package (renamed from mcp/)
- `skill_seeker_mcp/tools/__init__.py` - MCP tools subpackage
- Proper package imports: `from cli import constants`
- Centralized Configuration Module
- `cli/constants.py` with 18 configuration constants
- `DEFAULT_ASYNC_MODE`, `DEFAULT_RATE_LIMIT`, `DEFAULT_MAX_PAGES`
- Enhancement limits, categorization scores, file limits
- All magic numbers now centralized and configurable
- Code Quality Improvements
- Converted 71 print() statements to proper logging calls
- Added type hints to all DocToSkillConverter methods
- Fixed all mypy type checking issues
- Installed types-requests for better type safety
- Multi-variant llms.txt detection: downloads all 3 variants (full, standard, small)
- Automatic .txt → .md file extension conversion
- No content truncation: preserves complete documentation
- `detect_all()` method for finding all llms.txt variants
- `get_proper_filename()` for correct .md naming
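The async scraping entries in this release name `asyncio.gather()` and `asyncio.Semaphore` as the building blocks. The following is a minimal sketch of that bounded-concurrency pattern, not the project's code. `fetch_page` is a stand-in that simulates latency so the example runs without network access; per the entries above, the real scraper uses `httpx.AsyncClient` at that point. The function names and the `max_concurrency` parameter are assumptions.

```python
import asyncio


async def fetch_page(url: str) -> str:
    """Stand-in for an HTTP GET; a real scraper would await an HTTP client here."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html>{url}</html>"


async def scrape_all(urls: list[str], max_concurrency: int = 5) -> list[str]:
    """Fetch every URL concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> str:
        async with semaphore:  # wait for a free slot before fetching
            return await fetch_page(url)

    # gather() returns results in the same order as the input awaitables.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))
```

A caller would run it with something like `asyncio.run(scrape_all(urls))`. The semaphore caps how many fetches are in flight at once, which is how concurrency can be raised without overwhelming the target server.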
Changed
- `_try_llms_txt()` now downloads all available variants instead of just one
- Reference files now contain complete content (no 2500 char limit)
- Code samples now include full code (no 600 char limit)
- Test count increased from 207 to 299 (92 new tests)
- All print() statements replaced with logging (logger.info, logger.warning, logger.error)
- Better IDE support with proper package structure
- Code quality improved from 5.5/10 to 6.5/10
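The print-to-logging conversion listed above follows a common pattern, sketched here. The logger name and the `report_progress` helper are hypothetical and only show the shape of the change.

```python
import logging

logger = logging.getLogger("skill_seekers.example")  # hypothetical logger name


def report_progress(pages_done: int, pages_total: int) -> None:
    """Log scraping progress instead of printing it."""
    # Before: print(f"Scraped {pages_done}/{pages_total} pages")
    logger.info("Scraped %d/%d pages", pages_done, pages_total)
    if pages_done < pages_total:
        logger.warning("%d pages still pending", pages_total - pages_done)
```

Passing the values as arguments rather than pre-formatting the string lets the logging module skip the formatting work when the level is disabled.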
Fixed
- File extension bug: llms.txt files now saved as .md
- Content loss: 0% truncation (was 36%)
- Test isolation issues in test_async_scraping.py (proper cleanup with try/finally)
- Import issues: no more sys.path.insert() hacks needed
- .gitignore: added test artifacts (.pytest_cache, .coverage, htmlcov, etc.)
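The file extension fix above (llms.txt files saved as .md) can be pictured with a small helper like the one below. This is a sketch under assumptions: `proper_filename` is a hypothetical name, not the project's `get_proper_filename()`, and the example URLs are illustrative.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def proper_filename(url: str) -> str:
    """Return a local filename for a downloaded llms.txt variant, using .md."""
    # urlparse() separates the path from any query string or fragment.
    name = PurePosixPath(urlparse(url).path).name
    if name.endswith(".txt"):
        return name[: -len(".txt")] + ".md"
    return name
```

For example, `proper_filename("https://example.com/llms-full.txt")` yields `llms-full.md`, while a name that does not end in `.txt` is returned unchanged.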
1.2.0 - 2025-10-23
🚀 PDF Advanced Features Release
Major enhancement to PDF extraction capabilities with Priority 2 & 3 features.
Added
Priority 2: Support More PDF Types
- OCR Support for Scanned PDFs
  - Automatic text extraction from scanned documents using Tesseract OCR
  - Fallback mechanism when page text < 50 characters
  - Integration with pytesseract and Pillow
  - Command: `--ocr` flag
  - New dependencies: `Pillow==11.0.0`, `pytesseract==0.3.13`
- Password-Protected PDF Support
  - Handle encrypted PDFs with password authentication
  - Clear error messages for missing/wrong passwords
  - Secure password handling
  - Command: `--password PASSWORD` flag
- Complex Table Extraction
  - Extract tables from PDFs using PyMuPDF's table detection
  - Capture table data as 2D arrays with metadata (bbox, row/col count)
  - Integration with skill references in markdown format
  - Command: `--extract-tables` flag
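The OCR fallback rule above (use OCR when a page yields fewer than 50 characters of text) can be sketched as follows. The OCR step is injected as a callable so the example runs without Tesseract installed. The function name, the whitespace stripping, and the "keep the longer result" tie-break are assumptions, not documented behavior.

```python
from typing import Callable

OCR_THRESHOLD = 50  # characters; the threshold stated in this changelog


def page_text_with_ocr_fallback(
    extracted_text: str,
    run_ocr: Callable[[], str],
    threshold: int = OCR_THRESHOLD,
) -> str:
    """Return the embedded text, or the OCR result when the page looks scanned."""
    if len(extracted_text.strip()) >= threshold:
        return extracted_text  # enough embedded text; skip the expensive OCR pass
    ocr_text = run_ocr()
    # Assumption: keep whichever source produced more text, so an empty
    # OCR result never discards what little text was extracted.
    if len(ocr_text.strip()) > len(extracted_text.strip()):
        return ocr_text
    return extracted_text
```

In a real pipeline `run_ocr` would wrap a call such as `pytesseract.image_to_string` on the rendered page image. Because it is only invoked below the threshold, text-based PDFs pay no OCR cost.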
Priority 3: Performance Optimizations
- Parallel Page Processing
  - 3x faster PDF extraction using ThreadPoolExecutor
  - Auto-detect CPU count or custom worker specification
  - Only activates for PDFs with > 5 pages
  - Commands: `--parallel` and `--workers N` flags
  - Benchmarks: 500-page PDF reduced from 4m 10s to 1m 15s
- Intelligent Caching
  - In-memory cache for expensive operations (text extraction, code detection, quality scoring)
  - 50% faster on re-runs
  - Command: `--no-cache` to disable (enabled by default)
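The parallel page processing rules above (ThreadPoolExecutor, CPU-count default, only for PDFs with more than 5 pages) might be wired together roughly like this. It is a sketch: `extract_pages`, its parameters, and the injected `extract_one` callable are hypothetical stand-ins for the real extractor.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

PARALLEL_MIN_PAGES = 5  # per this changelog: parallel mode needs more than 5 pages


def extract_pages(
    page_numbers: list[int],
    extract_one: Callable[[int], str],
    parallel: bool = True,
    workers: Optional[int] = None,
) -> list[str]:
    """Extract each page, using a thread pool only when it is likely to pay off."""
    if not parallel or len(page_numbers) <= PARALLEL_MIN_PAGES:
        return [extract_one(n) for n in page_numbers]
    # Fall back to the detected CPU count when no worker count is given.
    max_workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order even if pages finish out of order.
        return list(pool.map(extract_one, page_numbers))
```

Keeping small documents on the sequential path avoids paying thread-pool startup cost where there is little work to spread across workers.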
New Documentation
- `docs/PDF_ADVANCED_FEATURES.md` (580 lines)
- Complete usage guide for all advanced features
- Installation instructions
- Performance benchmarks showing 3x speedup
- Best practices and troubleshooting
- API reference with all parameters
Testing
- New test file: `tests/test_pdf_advanced_features.py` (568 lines, 26 tests)
- TestOCRSupport (5 tests)
- TestPasswordProtection (4 tests)
- TestTableExtraction (5 tests)
- TestCaching (5 tests)
- TestParallelProcessing (4 tests)
- TestIntegration (3 tests)
- Updated: `tests/test_pdf_extractor.py` (23 tests fixed and passing)
- Total PDF tests: 49/49 PASSING ✅ (100% pass rate)
Changed
- Enhanced `cli/pdf_extractor_poc.py` with all advanced features
- Updated `requirements.txt` with new dependencies
- Updated `README.md` with PDF advanced features usage
- Updated `docs/TESTING.md` with new test counts (142 total tests)
Performance Improvements
- 3.3x faster with parallel processing (8 workers)
- 1.7x faster on re-runs with caching enabled
- Support for unlimited page PDFs (no more 500-page limit)
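The caching speedup listed above relies on an in-memory cache for expensive per-page operations. A minimal sketch of such a cache follows. The `ExtractionCache` class, its key layout, and the `hits` counter are assumptions for illustration, and the `enabled` flag only mirrors what a `--no-cache` switch would toggle.

```python
from typing import Callable, Hashable


class ExtractionCache:
    """Tiny in-memory cache keyed by (operation name, item key)."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled  # False mirrors a run with caching disabled
        self._store: dict[tuple[str, Hashable], object] = {}
        self.hits = 0

    def get_or_compute(
        self, operation: str, key: Hashable, compute: Callable[[], object]
    ) -> object:
        """Return the cached value, computing and storing it on first use."""
        if not self.enabled:
            return compute()
        cache_key = (operation, key)
        if cache_key in self._store:
            self.hits += 1
            return self._store[cache_key]
        value = compute()
        self._store[cache_key] = value
        return value
```

Keying on the operation name as well as the page keeps results from text extraction, code detection, and quality scoring from colliding for the same page.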
Dependencies
- Added `Pillow==11.0.0` for image processing
- Added `pytesseract==0.3.13` for OCR support
- Tesseract OCR engine (system package, optional)
1.1.0 - 2025-10-22
🌐 Documentation Scraping Enhancements
Major improvements to documentation scraping with unlimited pages, parallel processing, and new configs.
Added
Unlimited Scraping & Performance
- Unlimited Page Scraping - Removed 500-page limit, now supports unlimited pages
- Parallel Scraping Mode - Process multiple pages simultaneously for faster scraping
- Dynamic Rate Limiting - Smart rate limit control to avoid server blocks
- CLI Utilities - New helper scripts for common tasks
New Configurations
- Ansible Core 2.19 - Complete Ansible documentation config
- Claude Code - Documentation for this very tool!
- Laravel 9.x - PHP framework documentation
Testing & Quality
- Comprehensive test coverage for CLI utilities
- Parallel scraping test suite
- Virtual environment setup documentation
- Thread-safety improvements
Fixed
- Thread-safety issues in parallel scraping
- CLI path references across all documentation
- Flaky upload_skill tests
- MCP server streaming subprocess implementation
Changed
- All CLI examples now use `cli/` directory prefix
- Updated documentation structure
- Enhanced error handling
1.0.0 - 2025-10-19
🎉 First Production Release
This is the first production-ready release of Skill Seekers with complete feature set, full test coverage, and comprehensive documentation.
Added
Smart Auto-Upload Feature
- New `upload_skill.py` CLI tool for automatic API-based upload
- Enhanced `package_skill.py` with `--upload` flag
- Smart API key detection with graceful fallback
- Cross-platform folder opening in `utils.py`
- Helpful error messages instead of confusing errors
MCP Integration Enhancements
- 9 MCP tools (added `upload_skill` tool)
- `mcp__skill-seeker__upload_skill` - Upload .zip files to Claude automatically
- Enhanced `package_skill` tool with smart auto-upload parameter
- Updated all MCP documentation to reflect 9 tools
Documentation Improvements
- Updated README with version badge (v1.0.0)
- Enhanced upload guide with 3 upload methods
- Updated MCP setup guide with all 9 tools
- Comprehensive test documentation (14/14 tests)
- All references to tool counts corrected
Fixed
- Missing `import os` in `mcp/server.py`
- `package_skill.py` exit code behavior (now exits 0 when API key missing)
- Improved UX with helpful messages instead of errors
Changed
- Test count badge updated (96 → 14 passing)
- All documentation references updated to 9 tools
Testing
- CLI Tests: 8/8 PASSED ✅
- MCP Tests: 6/6 PASSED ✅
- Total: 14/14 PASSED (100%)
0.4.0 - 2025-10-18
Added
Large Documentation Support (40K+ Pages)
- Config splitting functionality for massive documentation sites
- Router/hub skill generation for intelligent query routing
- Checkpoint/resume feature for long scrapes
- Parallel scraping support for faster processing
- 4 split strategies: auto, category, router, size
New CLI Tools
- `split_config.py` - Split large configs into focused sub-skills
- `generate_router.py` - Generate router/hub skills
- `package_multi.py` - Package multiple skills at once
New MCP Tools
- `split_config` - Split large documentation via MCP
- `generate_router` - Generate router skills via MCP
Documentation
- New `docs/LARGE_DOCUMENTATION.md` guide
- Example config: `godot-large-example.json` (40K pages)
Changed
- MCP tool count: 6 → 8 tools
- Updated documentation for large docs workflow
0.3.0 - 2025-10-15
Added
MCP Server Integration
- Complete MCP server implementation (`mcp/server.py`)
- 6 MCP tools for Claude Code integration:
- `list_configs`
- `generate_config`
- `validate_config`
- `estimate_pages`
- `scrape_docs`
- `package_skill`
Setup & Configuration
- Automated setup script (`setup_mcp.sh`)
- MCP configuration examples
- Comprehensive MCP setup guide (`docs/MCP_SETUP.md`)
- MCP testing guide (`docs/TEST_MCP_IN_CLAUDE_CODE.md`)
Testing
- 31 comprehensive unit tests for MCP server
- Integration tests via Claude Code MCP protocol
- 100% test pass rate
Documentation
- Complete MCP integration documentation
- Natural language usage examples
- Troubleshooting guides
Changed
- Restructured project as monorepo with CLI and MCP server
- Moved CLI tools to `cli/` directory
- Added MCP server to `mcp/` directory
0.2.0 - 2025-10-10
Added
Testing & Quality
- Comprehensive test suite with 71 tests
- 100% test pass rate
- Test coverage for all major features
- Config validation tests
Optimization
- Page count estimator (`estimate_pages.py`)
- Framework config optimizations with `start_urls`
- Better URL pattern coverage
- Improved scraping efficiency
New Configs
- Kubernetes documentation config
- Tailwind CSS config
- Astro framework config
Changed
- Optimized all framework configs
- Improved categorization accuracy
- Enhanced error messages
0.1.0 - 2025-10-05
Added
Initial Release
- Basic documentation scraper functionality
- Manual skill creation
- Framework configs (Godot, React, Vue, Django, FastAPI)
- Smart categorization system
- Code language detection
- Pattern extraction
- Local and API-based enhancement options
- Basic packaging functionality
Core Features
- BFS traversal for documentation scraping
- CSS selector-based content extraction
- Smart categorization with scoring
- Code block detection and formatting
- Caching system for scraped data
- Interactive mode for config creation
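The BFS traversal listed among the core features can be sketched as a queue-driven crawl. This is an illustration only: `bfs_crawl`, the injected `get_links` callable, and the `max_pages` cap are hypothetical, and link extraction is abstracted away so the example runs offline.

```python
from collections import deque
from typing import Callable, Iterable, Optional


def bfs_crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: Optional[int] = None,
) -> list[str]:
    """Visit pages breadth-first from start_url and return them in visit order."""
    visited = {start_url}
    queue = deque([start_url])
    order: list[str] = []
    while queue and (max_pages is None or len(order) < max_pages):
        url = queue.popleft()  # FIFO order is what makes the traversal breadth-first
        order.append(url)
        for link in get_links(url):
            if link not in visited:  # enqueue each page at most once
                visited.add(link)
                queue.append(link)
    return order
```

Breadth-first order reaches pages close to the documentation root before deeply nested ones, so a page cap tends to keep the top-level material.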
Documentation
- README with quick start guide
- Basic usage documentation
- Configuration file examples
Release Links
- v1.2.0 - PDF Advanced Features
- v1.1.0 - Documentation Scraping Enhancements
- v1.0.0 - Production Release
- v0.4.0 - Large Documentation Support
- v0.3.0 - MCP Integration
Version History Summary
| Version | Date | Highlights |
|---|---|---|
| 1.2.0 | 2025-10-23 | 📄 PDF advanced features: OCR, passwords, tables, 3x faster |
| 1.1.0 | 2025-10-22 | 🌐 Unlimited scraping, parallel mode, new configs (Ansible, Laravel) |
| 1.0.0 | 2025-10-19 | 🚀 Production release, auto-upload, 9 MCP tools |
| 0.4.0 | 2025-10-18 | 📚 Large docs support (40K+ pages) |
| 0.3.0 | 2025-10-15 | 🔌 MCP integration with Claude Code |
| 0.2.0 | 2025-10-10 | 🧪 Testing & optimization |
| 0.1.0 | 2025-10-05 | 🎬 Initial release |