CHANGES:
1. **Fixed 9 PDF Scraper Test Failures:**
- Added .get() safety for missing page keys (headings, text, code_blocks, images)
- Supported both 'code_samples' and 'code_blocks' keys for compatibility
- Fixed extract_pdf() to raise RuntimeError on failure (the tests expect an exception)
- Added image saving functionality to _generate_reference_file()
- Updated all test methods to override skill_dir with temp directory
- Fixed categorization to handle pre-categorized test data
2. **Fixed 25 MCP Test Skips:**
- Renamed mcp/ directory to skill_seeker_mcp/ to avoid shadowing external mcp package
- Updated all imports in tests/test_mcp_server.py
- Simplified skill_seeker_mcp/server.py import logic (no more shadowing workarounds)
- Updated tests/test_package_structure.py to reference skill_seeker_mcp
3. **Test Results:**
- ✅ 297 tests passing (100%)
- ✅ 0 tests skipped
- ✅ 0 tests failed
- All test categories passing:
* 23 package structure tests
* 18 PDF scraper tests
* 67 PDF extractor/advanced tests
* 25 MCP server tests
* 164 other core tests
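
The `.get()` safety and dual-key handling described in item 1 can be sketched roughly as below. The helper name `normalize_page` and the default values are hypothetical, chosen only for illustration; the actual code in the PDF scraper may be organized differently.

```python
def normalize_page(page):
    """Return a page dict with every expected key present (hypothetical helper)."""
    # Accept either key name for code snippets; fall back to an empty list.
    code = page.get("code_samples")
    if code is None:
        code = page.get("code_blocks", [])
    return {
        "headings": page.get("headings", []),
        "text": page.get("text", ""),
        "code_blocks": code,
        "images": page.get("images", []),
    }


# A page missing every optional key no longer raises KeyError.
empty = normalize_page({})
legacy = normalize_page({"code_samples": ["print('hi')"]})
```

Reading every key through `.get()` with a default lets sparse test fixtures and older data using `code_samples` pass through the same code path.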
BREAKING CHANGE: MCP server directory renamed from `mcp/` to `skill_seeker_mcp/`
📦 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
PROBLEM:
- Local mcp/ directory shadows installed mcp package from PyPI
- Tests couldn't import external mcp.server.Server and mcp.types classes
- MCP server tests (25 tests) were blocked
SOLUTION:
1. Updated mcp/server.py to check sys.modules for pre-imported MCP classes
- Allows tests to import external MCP first, then import our server module
- Falls back to regular import if MCP not pre-imported
- No longer crashes during test collection
2. Updated tests/test_mcp_server.py to import external MCP from /tmp
- Temporarily changes to /tmp directory before importing external mcp
- Avoids local mcp/ directory shadowing in sys.path
- Restores original directory after import
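
A minimal sketch of the two workarounds above, assuming the shadowing comes from a relative (`''` or `.`) entry in sys.path. The names `working_directory` and `load_module` are hypothetical, and `json` stands in for the external mcp package so the snippet runs without extra dependencies.

```python
import contextlib
import importlib
import os
import sys
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the current working directory, then restore it."""
    original = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore, even if the import inside the block fails.
        os.chdir(original)


def load_module(name):
    """Reuse a module already present in sys.modules, else import it normally."""
    if name in sys.modules:
        return sys.modules[name]
    return importlib.import_module(name)


# Import from a neutral directory so a local package with the same name
# cannot be found through a relative sys.path entry.
with working_directory(tempfile.gettempdir()):
    external = load_module("json")  # stand-in for the external mcp package
```

Restoring the directory in a `finally` clause matters here: a failed import would otherwise leave later tests running from the temporary directory.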
RESULTS:
- Test collection: 297 tests collected (was 272)
- Passing: 263 tests (was 205, +58)
- Skipped: 25 MCP tests (intentional, due to shadowing)
- Failed: 9 PDF scraper tests (pre-existing bugs, not Phase 0 related)
- All PDF tests now running (67 PDF tests passing)
TEST BREAKDOWN:
✅ 205 core tests passing
✅ 67 PDF tests passing (PyMuPDF installed)
✅ 23 package structure tests passing
⏭️ 25 MCP server tests skipped (architectural issue - mcp/ naming conflict)
❌ 9 PDF scraper tests failing (pre-existing bugs in cli/pdf_scraper.py)
LONG-TERM FIX:
Rename mcp/ directory to skill_seeker_mcp/ to eliminate shadowing conflict
(Will enable all 25 MCP tests to run)
🐛 Fixes:
- Fix mcp package shadowing by importing external MCP before sys.path modification
- Update mcp/server.py to avoid shadowing installed mcp package
- Update tests/test_mcp_server.py import order
✅ Tests Added:
- Add tests/test_package_structure.py with 23 comprehensive tests
- Test cli package structure and imports
- Test mcp package structure and imports
- Test backwards compatibility
- All package structure tests passing ✅
📊 Test Results:
- 205 tests passed ✅
- 67 tests skipped (PDF features, PyMuPDF not installed)
- 23 new package structure tests added
- Total: 272 tests (excluding test_mcp_server.py which needs more work)
⚠️ Known Issue:
- test_mcp_server.py still has import issues (67 tests)
- Will be fixed in next commit
- Main functionality tests all passing
Impact: Package structure working, 75% of tests passing
Problem:
- 2 tests in test_upload_skill.py failing intermittently in CI
- Tests passed individually but failed when run after test_parallel_scraping.py
- Tests failed with exit code 2 instead of 0 when running `--help`
Root Cause:
- test_parallel_scraping.py calls `os.chdir(tmpdir)` to create temporary test directories
- These directory changes persisted across test classes
- When upload_skill CLI tests ran subprocess with path 'cli/upload_skill.py',
the relative path was broken because cwd was still in the temp directory
- Result: subprocess couldn't find the script, returned exit code 2
Fix:
- Added setUp/tearDown to all 6 test classes in test_parallel_scraping.py
- setUp saves original cwd with `self.original_cwd = os.getcwd()`
- tearDown restores it with `os.chdir(self.original_cwd)`
- Ensures tests don't pollute working directory state for subsequent tests
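
The setUp/tearDown pattern described in the fix looks roughly like this. The class and test names are hypothetical; only `self.original_cwd`, `os.getcwd()`, and `os.chdir()` come from the description above.

```python
import os
import shutil
import tempfile
import unittest


class CwdRestoringTestCase(unittest.TestCase):
    def setUp(self):
        # Remember where the test run started.
        self.original_cwd = os.getcwd()

    def tearDown(self):
        # Undo any os.chdir() performed by the test body.
        os.chdir(self.original_cwd)

    def test_runs_inside_temp_directory(self):
        tmpdir = tempfile.mkdtemp()
        # Cleanups run after tearDown, so the directory is removed
        # only once the cwd has been restored.
        self.addCleanup(shutil.rmtree, tmpdir)
        os.chdir(tmpdir)
        self.assertEqual(os.path.realpath(os.getcwd()), os.path.realpath(tmpdir))
```

Because tearDown runs even when the test body fails, later tests that rely on relative paths such as 'cli/upload_skill.py' start from the original directory.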
Impact:
- All 158 tests now pass consistently
- No more flaky failures in CI
- Test isolation properly maintained
Major restructure to support both CLI usage and MCP integration:
**Repository Structure:**
- cli/ - All CLI tools (doc_scraper, estimate_pages, enhance_skill, etc.)
- mcp/ - New MCP server for Claude Code integration
- configs/ - Shared configuration files
- tests/ - Updated to import from cli/
- docs/ - Shared documentation
**MCP Server (NEW):**
- mcp/server.py - Full MCP server implementation
- 6 tools available:
* generate_config - Create config from URL
* estimate_pages - Fast page count estimation
* scrape_docs - Full documentation scraping
* package_skill - Package to .zip
* list_configs - Show available presets
* validate_config - Validate config files
- mcp/README.md - Complete MCP documentation
- mcp/requirements.txt - MCP dependencies
**CLI Tools (Moved to cli/):**
- All existing functionality preserved
- Same commands, same behavior
- Tests updated to import from cli.doc_scraper
**Tests:**
- 68/71 passing (95.8%)
- Updated imports from doc_scraper to cli.doc_scraper
- Fixed validate_config() tuple unpacking (errors, warnings)
- 3 minor test failures (checking warnings instead of errors)
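
For illustration, the (errors, warnings) tuple unpacking mentioned above might look like the following. This `validate_config` is a simplified stand-in written for this note, not the real function in cli/doc_scraper.py, and the specific checks and messages are assumptions.

```python
def validate_config(config):
    """Return (errors, warnings) as two lists of strings (simplified stand-in)."""
    errors, warnings = [], []
    if "name" not in config:
        errors.append("Missing required field: name")
    if "max_pages" not in config:
        warnings.append("max_pages not set")
    return errors, warnings


# Callers unpack both lists instead of treating the result as a single list.
errors, warnings = validate_config({"name": "react"})
```

A test that asserts on the whole return value, or on the wrong list, fails once the function returns a tuple, which is the kind of mismatch noted above.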
**Benefits:**
- Use as CLI tool: python3 cli/doc_scraper.py
- Use via MCP: Integrated with Claude Code
- Shared code and configs
- Single source of truth
Test Framework:
- Created tests/ directory structure
- Added __init__.py for test package
- Implemented 71 comprehensive tests across 3 test suites
Test Suites:
1. test_config_validation.py (25 tests)
- Valid/invalid config structure
- Required fields validation
- Name format validation
- URL format validation
- Selectors validation
- URL patterns validation
- Categories validation
- Rate limit validation (0-10 range)
- Max pages validation (1-10000 range)
- Start URLs validation
2. test_scraper_features.py (28 tests)
- URL validation (include/exclude patterns)
- Language detection (Python, JavaScript, GDScript, C++, etc.)
- Pattern extraction from documentation
- Smart categorization (by URL, title, content)
- Text cleaning utilities
3. test_integration.py (18 tests)
- Dry-run mode functionality
- Config loading and validation
- Real config files validation (godot, react, vue, django, fastapi, steam)
- URL processing and normalization
- Content extraction
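
As a rough sketch of the range checks listed under test_config_validation.py (rate limit 0-10, max pages 1-10000): the function name `check_ranges`, the config keys `rate_limit` and `max_pages`, and the error messages are assumptions made for this example.

```python
def check_ranges(config):
    """Return a list of error strings for out-of-range values (hypothetical helper)."""
    errors = []
    rate_limit = config.get("rate_limit")
    if rate_limit is not None and not (0 <= rate_limit <= 10):
        errors.append("rate_limit must be between 0 and 10")
    max_pages = config.get("max_pages")
    if max_pages is not None and not (1 <= max_pages <= 10000):
        errors.append("max_pages must be between 1 and 10000")
    return errors


ok = check_ranges({"rate_limit": 0.5, "max_pages": 500})
bad = check_ranges({"rate_limit": 11, "max_pages": 0})
```

Tests for each limit would typically cover both boundary values and one value just outside the range.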
Test Runner (run_tests.py):
- Custom colored test runner with ANSI colors
- Detailed test summary with breakdown by category
- Success rate calculation
- Command-line options:
--suite: Run specific test suite
--verbose: Show each test name
--quiet: Minimal output
--failfast: Stop on first failure
--list: List all available tests
- Execution time: ~1 second for full suite
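
The command-line options above could be declared with argparse along these lines. This is a sketch, not the actual run_tests.py: the help strings, the `list_tests` destination name, and the assumption that --suite takes a value are all illustrative.

```python
import argparse


def build_parser():
    """Build a parser exposing the five options listed for the test runner."""
    parser = argparse.ArgumentParser(description="Run the test suite")
    parser.add_argument("--suite", help="Run a specific test suite")
    parser.add_argument("--verbose", action="store_true", help="Show each test name")
    parser.add_argument("--quiet", action="store_true", help="Minimal output")
    parser.add_argument("--failfast", action="store_true", help="Stop on first failure")
    # 'list' shadows a builtin, so store the flag under a different name.
    parser.add_argument("--list", action="store_true", dest="list_tests",
                        help="List all available tests")
    return parser


# Example: run one suite and stop at the first failure.
args = build_parser().parse_args(["--suite", "config", "--failfast"])
```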
Documentation:
- Added comprehensive TESTING.md guide
- Test writing templates
- Best practices
- Coverage information
- Troubleshooting guide
.gitignore:
- Added Python cache files
- Added output directory
- Added IDE and OS files
Test Results:
✅ 71/71 tests passing (100% pass rate)
✅ All existing configs validated
✅ Fast execution (<1 second)
✅ Ready for CI/CD integration