5 Commits

Author SHA1 Message Date
Pablo Estevez
c33c6f9073 change max length 2026-01-17 17:48:15 +00:00
Pablo Estevez
5ed767ff9a run ruff 2026-01-17 17:29:21 +00:00
yusyus
9931066741 fix: Update test imports for new package structure
Updated 8 test files to use new skill_seekers.* imports:
- test_async_scraping.py
- test_estimate_pages.py
- test_package_skill.py
- test_parallel_scraping.py
- test_unified.py
- test_unified_mcp_integration.py
- test_upload_skill.py
- test_utilities.py

Changed:
- from cli.* → from skill_seekers.cli.*
- from skill_seeker_mcp.* → from skill_seekers.mcp.*
- Removed obsolete sys.path.insert() calls
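
The import changes listed above can be sketched as a small line-rewriting helper. This is only an illustration of the old-to-new prefix mapping, not the tooling actually used for the commit; the helper name `rewrite_import_line`, the regex approach, and the `server` module in the usage example are assumptions.

```python
import re

# Old-prefix -> new-prefix pairs taken from the "Changed" list above.
# The regex-based approach and all names here are illustrative assumptions.
IMPORT_REWRITES = [
    (re.compile(r"^(\s*(?:from|import)\s+)cli(?=[.\s]|$)"), r"\1skill_seekers.cli"),
    (re.compile(r"^(\s*(?:from|import)\s+)skill_seeker_mcp(?=[.\s]|$)"), r"\1skill_seekers.mcp"),
]


def rewrite_import_line(line: str) -> str:
    """Return the line with an old-style import prefix replaced, if it has one."""
    for pattern, replacement in IMPORT_REWRITES:
        line = pattern.sub(replacement, line)
    return line


if __name__ == "__main__":
    # `server` is a hypothetical module name used only for demonstration.
    for old in ("import cli.upload_skill", "from skill_seeker_mcp import server"):
        print(old, "->", rewrite_import_line(old))
```

Lines that merely start with the same letters (for example `from click import ...`) are left untouched because the pattern requires a dot, whitespace, or end of line after the old package name.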

Result:
- 364/389 tests passing (93.6% pass rate)
- Remaining 25 failures are path-related tests that need
  updating for new unified CLI commands (will fix next)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 01:21:29 +03:00
yusyus
0c5515129b Fix flaky upload_skill tests by restoring cwd in parallel scraping tests
Problem:
- 2 tests in test_upload_skill.py failing intermittently in CI
- Tests passed individually but failed when run after test_parallel_scraping.py
- Tests failed with exit code 2 instead of 0 when running `--help`

Root Cause:
- test_parallel_scraping.py calls `os.chdir(tmpdir)` to create temporary test directories
- These directory changes persisted across test classes
- When upload_skill CLI tests ran subprocess with path 'cli/upload_skill.py',
  the relative path was broken because cwd was still in the temp directory
- Result: subprocess couldn't find the script, returned exit code 2

Fix:
- Added setUp/tearDown to all 6 test classes in test_parallel_scraping.py
- setUp saves original cwd with `self.original_cwd = os.getcwd()`
- tearDown restores it with `os.chdir(self.original_cwd)`
- Ensures tests don't pollute working directory state for subsequent tests
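
A minimal sketch of the setUp/tearDown pattern described in the fix, using `unittest`. The class and test names are hypothetical and the test body stands in for the real parallel-scraping tests; only the `self.original_cwd` save and restore mirrors the commit message.

```python
import os
import shutil
import tempfile
import unittest


class ExampleParallelScrapingTest(unittest.TestCase):
    """Hypothetical test class showing the cwd save/restore pattern."""

    def setUp(self):
        # Remember where the test run started.
        self.original_cwd = os.getcwd()

    def tearDown(self):
        # Always go back, so later tests can rely on relative paths
        # such as 'cli/upload_skill.py'.
        os.chdir(self.original_cwd)

    def test_runs_inside_tmpdir(self):
        tmpdir = tempfile.mkdtemp()
        # Cleanups run after tearDown, so the directory is removed
        # only once the cwd has been restored.
        self.addCleanup(shutil.rmtree, tmpdir)
        os.chdir(tmpdir)
        self.assertEqual(os.path.realpath(os.getcwd()), os.path.realpath(tmpdir))


if __name__ == "__main__":
    unittest.main()
```

Because `tearDown` runs even when the test method fails, the working directory is restored regardless of the test outcome.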

Impact:
- All 158 tests now pass consistently
- No more flaky failures in CI
- Test isolation properly maintained

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-22 22:53:49 +03:00
IbrahimAlbyrk-luduArts
7e94c276be Add unlimited scraping, parallel mode, and rate limit control (#144)
Add four major features for improved performance and flexibility:

1. **Unlimited Scraping Mode**
   - Support max_pages: null or -1 for complete documentation coverage
   - Added unlimited parameter to MCP tools
   - Warning messages for unlimited mode

2. **Parallel Scraping (1-10 workers)**
   - ThreadPoolExecutor for concurrent requests
   - Thread-safe with proper locking
   - 20x performance improvement (10K pages: 83min → 4min)
   - Workers parameter in config

3. **Configurable Rate Limiting**
   - CLI overrides for rate_limit
   - --no-rate-limit flag for maximum speed
   - Per-worker rate limiting semantics

4. **MCP Streaming & Timeouts**
   - Non-blocking subprocess with real-time output
   - Intelligent timeouts per operation type
   - Prevents frozen/hanging behavior
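
As a rough illustration of the unlimited mode in item 1, the two "unlimited" spellings (`null` in the JSON config, or `-1`) can be normalized to a single value before the page limit is checked. The function names below are hypothetical and are not taken from the scraper's code.

```python
import json
from typing import Optional


def normalize_max_pages(max_pages: Optional[int]) -> Optional[int]:
    """Map both "unlimited" spellings (None and -1) to None.

    Hypothetical helper: any other value must be a positive page count.
    """
    if max_pages is None or max_pages == -1:
        return None
    if max_pages < 1:
        raise ValueError(f"max_pages must be positive, -1, or null; got {max_pages}")
    return max_pages


def limit_reached(pages_scraped: int, max_pages: Optional[int]) -> bool:
    """Return True once the page limit is hit; never True in unlimited mode."""
    limit = normalize_max_pages(max_pages)
    return limit is not None and pages_scraped >= limit


if __name__ == "__main__":
    # JSON null is loaded as Python None.
    config = json.loads('{"max_pages": null}')
    print(limit_reached(10_000, config["max_pages"]))
```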

**Thread-Safety Fixes:**
- Fixed race condition on visited_urls.add()
- Protected pages_scraped counter with lock
- Added explicit exception checking for workers
- All shared state operations properly synchronized
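
The thread-safety fixes above can be sketched as follows: a lock guards both `visited_urls` and `pages_scraped`, and each worker's future is checked so exceptions are not silently dropped. `ScrapeState`, `scrape_all`, and the `fetch` callback are hypothetical names; the real scraper's structure may differ.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


class ScrapeState:
    """Shared state for worker threads; every mutation happens under one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.visited_urls: set = set()
        self.pages_scraped = 0

    def claim(self, url: str) -> bool:
        """Atomically check-and-add a URL; True means this caller owns it."""
        with self._lock:
            if url in self.visited_urls:
                return False
            self.visited_urls.add(url)
            return True

    def record_page(self) -> None:
        with self._lock:
            self.pages_scraped += 1


def scrape_all(urls: Iterable[str], fetch: Callable[[str], None], workers: int = 4) -> ScrapeState:
    state = ScrapeState()

    def worker(url: str) -> None:
        if not state.claim(url):
            return  # another worker already took this URL
        fetch(url)
        state.record_page()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker, url) for url in urls]
        for future in as_completed(futures):
            future.result()  # re-raises any exception from the worker
    return state
```

Doing the membership test and the `add()` inside the same locked section is what closes the race: two workers can no longer both see a URL as unvisited.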

**Test Coverage:**
- Added 17 comprehensive tests for new features
- All 117 tests passing
- Thread safety validated

**Performance:**
- 1000 pages: 8.3min → 0.4min (20x faster)
- 10000 pages: 83min → 4min (20x faster)
- Maintains backward compatibility (default: 0.5s, 1 worker)
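
The figures above can be sanity-checked with a little arithmetic. The "before" times match a single worker that pays only the default 0.5 s delay per page (1000 × 0.5 s = 500 s ≈ 8.3 min); that reading is an inference from the numbers, not something the commit states, and it ignores network latency. The function names are hypothetical.

```python
def sequential_minutes(pages: int, rate_limit_s: float = 0.5) -> float:
    """Minutes spent on rate-limit delays alone with one worker.

    Assumption: one delay of `rate_limit_s` per page, no network time.
    """
    return pages * rate_limit_s / 60


def speedup(before_min: float, after_min: float) -> float:
    """Ratio of the old duration to the new one."""
    return before_min / after_min


if __name__ == "__main__":
    print(f"1000 pages, 1 worker:  {sequential_minutes(1000):.1f} min")
    print(f"10000 pages, 1 worker: {sequential_minutes(10000):.1f} min")
    print(f"speedup for 83 min -> 4 min: {speedup(83, 4):.2f}x")
```

The ratio comes out slightly above 20, so the "20x" in the commit message is a rounded figure.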

**Commits:**
- 309bf71: feat: Add unlimited scraping mode support
- 3ebc2d7: fix(mcp): Add timeout and streaming output
- 5d16fdc: feat: Add configurable rate limiting and parallel scraping
- ae7883d: Fix MCP server tests for streaming subprocess
- e5713dd: Fix critical thread-safety issues in parallel scraping
- 303efaf: Add comprehensive tests for parallel scraping features

Co-authored-by: IbrahimAlbyrk-luduArts <ialbayrak@luduarts.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-10-22 22:46:02 +03:00