Commit Graph

657 Commits

Author SHA1 Message Date
yusyus
f9c8f1d610 Clean up macOS .DS_Store file from output directory 2025-10-19 02:21:25 +03:00
yusyus
f1fa8354d2 Add comprehensive test system with 71 tests (100% pass rate)
Test Framework:
- Created tests/ directory structure
- Added __init__.py for test package
- Implemented 71 comprehensive tests across 3 test suites

Test Suites:
1. test_config_validation.py (25 tests)
   - Valid/invalid config structure
   - Required fields validation
   - Name format validation
   - URL format validation
   - Selectors validation
   - URL patterns validation
   - Categories validation
   - Rate limit validation (0-10 range)
   - Max pages validation (1-10000 range)
   - Start URLs validation

2. test_scraper_features.py (28 tests)
   - URL validation (include/exclude patterns)
   - Language detection (Python, JavaScript, GDScript, C++, etc.)
   - Pattern extraction from documentation
   - Smart categorization (by URL, title, content)
   - Text cleaning utilities
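
A minimal sketch of the include/exclude URL check these tests exercise. The function name `url_allowed` and the substring-matching rule are assumptions for illustration; the commit does not show how the scraper actually matches patterns.

```python
def url_allowed(url, base_url, include=(), exclude=()):
    """Return True if url passes the filter (hypothetical helper).

    Assumption: patterns are plain substrings, not regular expressions.
    """
    # Only URLs under the configured base URL are considered.
    if not url.startswith(base_url):
        return False
    # If include patterns are given, at least one must appear in the URL.
    if include and not any(pattern in url for pattern in include):
        return False
    # Any matching exclude pattern rejects the URL.
    if any(pattern in url for pattern in exclude):
        return False
    return True
```

With `include=["/api/"]` and `exclude=["/api/old/"]`, a URL under `/api/` passes unless it also contains `/api/old/`.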

3. test_integration.py (18 tests)
   - Dry-run mode functionality
   - Config loading and validation
   - Real config files validation (godot, react, vue, django, fastapi, steam)
   - URL processing and normalization
   - Content extraction

Test Runner (run_tests.py):
- Custom colored test runner with ANSI colors
- Detailed test summary with breakdown by category
- Success rate calculation
- Command-line options:
  --suite: Run specific test suite
  --verbose: Show each test name
  --quiet: Minimal output
  --failfast: Stop on first failure
  --list: List all available tests
- Execution time: ~1 second for full suite
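
The options listed above could be wired up with `argparse` roughly as follows. This is a sketch only: the flag names come from the commit message, while `build_parser`, `verbosity_for`, and the mapping to verbosity levels are assumptions, not the contents of run_tests.py.

```python
import argparse


def build_parser():
    """Build a parser for the flags named in the commit message (sketch)."""
    parser = argparse.ArgumentParser(description="Run the test suites")
    parser.add_argument("--suite", help="run a specific test suite")
    parser.add_argument("--verbose", action="store_true", help="show each test name")
    parser.add_argument("--quiet", action="store_true", help="minimal output")
    parser.add_argument("--failfast", action="store_true", help="stop on first failure")
    # dest avoids shadowing the built-in name `list` on the namespace.
    parser.add_argument("--list", action="store_true", dest="list_tests",
                        help="list all available tests")
    return parser


def verbosity_for(args):
    """Map the flags to a unittest-style verbosity level (assumed mapping)."""
    if args.quiet:
        return 0
    if args.verbose:
        return 2
    return 1
```

A runner built this way could pass `verbosity_for(args)` and `args.failfast` to `unittest.TextTestRunner`, which accepts both `verbosity` and `failfast` keyword arguments.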

Documentation:
- Added comprehensive TESTING.md guide
- Test writing templates
- Best practices
- Coverage information
- Troubleshooting guide

.gitignore:
- Added Python cache files
- Added output directory
- Added IDE and OS files

Test Results:
- 71/71 tests passing (100% pass rate)
- All existing configs validated
- Fast execution (<1 second)
- Ready for CI/CD integration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 02:08:58 +03:00
yusyus
eeef230c7b Implement high and medium priority improvements
High Priority:
- Fix hardcoded package_skill.py path (line 778)
  Changed from: /mnt/skills/examples/skill-creator/scripts/package_skill.py
  Changed to: package_skill.py (local repository path)

Medium Priority:
- Add comprehensive config validation
  * Validates required fields (name, base_url)
  * Validates name format (alphanumeric, hyphens, underscores)
  * Validates base_url format (http/https)
  * Validates selectors structure and recommends standard selectors
  * Validates url_patterns (include/exclude lists)
  * Validates categories structure
  * Validates rate_limit range (0-10 seconds)
  * Validates max_pages range (1-10000)
  * Validates start_urls format if present
  * Provides clear error messages for invalid configs
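
The checks listed above can be sketched as a single function that collects error messages. The field names and ranges come from the commit message; the function name `validate_config`, the error wording, and the exact type checks are assumptions. The selectors and categories checks are left out because the commit does not describe their structure.

```python
import re


def validate_config(config):
    """Return a list of error messages; an empty list means valid (sketch)."""
    errors = []

    # Required fields.
    for field in ("name", "base_url"):
        if field not in config:
            errors.append(f"Missing required field: {field}")

    # Name: alphanumeric characters, hyphens, underscores.
    name = config.get("name")
    if name is not None and not re.fullmatch(r"[A-Za-z0-9_-]+", str(name)):
        errors.append(f"Invalid name: {name!r}")

    # base_url must be http or https.
    base_url = config.get("base_url")
    if base_url is not None and not str(base_url).startswith(("http://", "https://")):
        errors.append(f"Invalid base_url: {base_url!r}")

    # url_patterns: include/exclude must be lists when present.
    patterns = config.get("url_patterns", {})
    if not isinstance(patterns, dict):
        errors.append("url_patterns must be a mapping")
    else:
        for key in ("include", "exclude"):
            if key in patterns and not isinstance(patterns[key], list):
                errors.append(f"url_patterns.{key} must be a list")

    # rate_limit: 0-10 seconds.
    rate_limit = config.get("rate_limit")
    if rate_limit is not None and not (
        isinstance(rate_limit, (int, float)) and 0 <= rate_limit <= 10
    ):
        errors.append("rate_limit must be between 0 and 10 seconds")

    # max_pages: 1-10000.
    max_pages = config.get("max_pages")
    if max_pages is not None and not (
        isinstance(max_pages, int) and 1 <= max_pages <= 10000
    ):
        errors.append("max_pages must be between 1 and 10000")

    # start_urls: list of http(s) URLs when present.
    start_urls = config.get("start_urls")
    if start_urls is not None:
        if not isinstance(start_urls, list) or not all(
            isinstance(u, str) and u.startswith(("http://", "https://"))
            for u in start_urls
        ):
            errors.append("start_urls must be a list of http(s) URLs")

    return errors
```

Returning every error at once, rather than raising on the first, lets a user fix a config in one pass.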

- Add --dry-run flag for preview mode
  * Previews first 20 URLs without saving data
  * Shows what would be scraped without creating files
  * Discovers links to estimate total pages
  * Displays configuration summary
  * No directories created in dry-run mode
  * Useful for testing configs before full scrape
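
The dry-run behaviour described above amounts to a bounded crawl that records URLs and writes nothing. A minimal sketch, assuming a caller-supplied `fetch_links` function that returns the links found on a page; `dry_run` and `fetch_links` are hypothetical names, and only the limit of 20 comes from the commit message.

```python
from collections import deque


def dry_run(start_urls, fetch_links, preview_limit=20):
    """Preview up to preview_limit URLs without saving anything (sketch).

    Returns the previewed URLs and the number of distinct URLs discovered,
    which serves as a rough estimate of total pages.
    """
    queue = deque(start_urls)
    seen = set(start_urls)
    previewed = []

    while queue and len(previewed) < preview_limit:
        url = queue.popleft()
        previewed.append(url)
        # Discover links to estimate the crawl size; no files or
        # directories are created at any point.
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)

    return previewed, len(seen)
```

Passing the link fetcher in as a parameter keeps the preview logic testable without network access.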

All changes tested and working correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 01:57:59 +03:00
yusyus
f8c75a3b2d Add comprehensive CLAUDE.md for Claude Code integration
- Add root-level CLAUDE.md with complete guidance for Claude Code
- Include Python 3.7+ requirement
- Add first-time user workflow with all commands
- Include CSS selector testing with BeautifulSoup examples
- Add output quality verification commands
- Document force re-scrape instructions
- Fix package_skill.py path (remove hardcoded /mnt/skills reference)
- Add complete config file structure with real examples
- Include testing section for selector validation
- Add performance metrics table
- Document all key code locations with line numbers
- Organize by: quick start → architecture → workflows → troubleshooting
- Preserve existing docs/CLAUDE.md as detailed technical reference

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 01:43:02 +03:00
yusyus
a9b8591731 Update README.md with detailed project description and features; add initial VSCode settings. 2025-10-17 15:21:39 +00:00
yusyus
78b9cae398 Init 2025-10-17 15:14:44 +00:00
yusyus
397d47fe7c Initial commit 2025-10-17 17:43:48 +03:00