- Remove max_pages upper limit (was 10,000, now unlimited)
- Remove rate_limit upper limit (was 10s, now unlimited)
- Convert missing selector checks from errors to warnings
- Add a warnings system: warnings report issues without blocking, errors still block
- Allow users to scrape large documentation sites (45k+ pages)
- Allow flexible rate limiting for different site requirements
All reasonable validations remain (required fields, valid URLs,
correct data types, no negative values).
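The error/warning split above can be sketched as follows. This is an illustrative sketch, not the repository's actual code: the function name `validate_config` and the message strings are assumptions.

```python
def validate_config(config):
    """Hypothetical validator mirroring the rules above: errors block
    the scrape, warnings are reported but non-blocking."""
    errors = []    # blocking: scrape will not start
    warnings = []  # non-blocking: scrape continues

    # Required fields and URL format stay hard errors.
    if not config.get("name"):
        errors.append("name is required")
    if not config.get("base_url", "").startswith(("http://", "https://")):
        errors.append("base_url must start with http:// or https://")

    # Negative values remain errors; the upper limits are gone.
    if config.get("rate_limit", 0) < 0:
        errors.append("rate_limit must not be negative")
    if config.get("max_pages", 1) < 1:
        errors.append("max_pages must be at least 1")

    # Missing selectors are now a warning, not an error.
    if "selectors" not in config:
        warnings.append("no selectors defined; falling back to defaults")

    return errors, warnings
```

With this shape, a 45k-page config with a 30-second rate limit passes validation cleanly, while a missing `name` still blocks the scrape.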
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
High Priority:
- Fix hardcoded package_skill.py path (line 778)
  * Changed from: /mnt/skills/examples/skill-creator/scripts/package_skill.py
  * Changed to: package_skill.py (local repository path)
Medium Priority:
- Add comprehensive config validation
* Validates required fields (name, base_url)
* Validates name format (alphanumeric, hyphens, underscores)
* Validates base_url format (http/https)
* Validates selectors structure and recommends standard selectors
* Validates url_patterns (include/exclude lists)
* Validates categories structure
* Validates rate_limit range (0-10 seconds)
* Validates max_pages range (1-10000)
* Validates start_urls format if present
* Provides clear error messages for invalid configs
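A few of the field checks above could look like the sketch below. All names and message strings are illustrative assumptions, not the project's actual implementation; note this commit still enforces the 0-10s and 1-10000 ranges that the later commit removes.

```python
import re

def validate_fields(config):
    """Illustrative checks for name format, url_patterns structure,
    and the rate_limit/max_pages ranges described above."""
    errors = []

    # name: alphanumeric plus hyphens and underscores
    if not re.fullmatch(r"[A-Za-z0-9_-]+", config.get("name", "")):
        errors.append("name: use letters, digits, hyphens, underscores")

    # url_patterns: include/exclude must be lists if present
    patterns = config.get("url_patterns", {})
    for key in ("include", "exclude"):
        if not isinstance(patterns.get(key, []), list):
            errors.append(f"url_patterns.{key} must be a list")

    # rate_limit range: 0-10 seconds
    if not 0 <= config.get("rate_limit", 1) <= 10:
        errors.append("rate_limit must be between 0 and 10 seconds")

    # max_pages range: 1-10000
    if not 1 <= config.get("max_pages", 100) <= 10000:
        errors.append("max_pages must be between 1 and 10000")

    return errors
```

Returning the full list of messages, rather than raising on the first failure, is what lets the tool print every problem in one pass.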
- Add --dry-run flag for preview mode
* Previews first 20 URLs without saving data
* Shows what would be scraped without creating files
* Discovers links to estimate total pages
* Displays configuration summary
* No directories created in dry-run mode
* Useful for testing configs before full scrape
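The dry-run behavior above (preview URLs, estimate pages, write nothing) might be structured like this sketch. The names `dry_run` and `discover_links` are assumptions for illustration only.

```python
def dry_run(start_url, discover_links, limit=20):
    """Preview up to `limit` URLs breadth-first without saving any data.
    `discover_links(url)` is an injected callable returning linked URLs."""
    seen = []
    queue = [start_url]
    while queue and len(seen) < limit:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.append(url)
        # Discover links to estimate total pages; no files or
        # directories are created in this mode.
        queue.extend(discover_links(url))
    print(f"Would scrape {len(seen)} URL(s) (showing first {limit}):")
    for url in seen:
        print(f"  {url}")
    return seen
```

Injecting `discover_links` keeps the preview side-effect-free and easy to test against a fake link graph.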
All changes tested and working correctly.