Commit Graph

40 Commits

Author SHA1 Message Date
yusyus
5f29c1c191 Configure project board for incremental development workflow
New Workflow:
- Added 'Workflow Stage' field with 5 stages
- 📋 Backlog (120 tasks) - All available tasks
-  Quick Wins (7 tasks) - High priority starters
- 🎯 Ready to Start (0-5 tasks) - Personal queue
- 🔨 In Progress (1-2 max) - Active work
-  Done - Completed tasks

Quick Wins Pre-selected:
- #130 - Install MCP (5 min)
- #114 - Respond to Issue #8 (30 min)
- #117 - Answer Issue #3 (30 min)
- #27 - Research PDF parsing (30 min)
- #21 - GitHub Pages site (1-2 hours)
- #93 - URL normalization (1-2 hours)
- #116 - Example project (2-3 hours)

Updated PROJECT_BOARD_GUIDE.md:
- Explained Workflow Stage field
- Step-by-step incremental workflow
- Recommended views and filters
- Tips for incremental success

Philosophy: Small tasks → Pick one → Complete → Move to next!
2025-10-20 23:13:10 +03:00
yusyus
e1e3968537 Add GitHub Project Board setup and guide
- Added 3 custom fields: Category, Time Estimate, Priority
- Created comprehensive project README
- Added PROJECT_BOARD_GUIDE.md with usage instructions
- Project board fully configured for flexible development

Custom Fields:
- Category: 10 categories matching our roadmap
- Time Estimate: 5 levels (5min to 8+ hours)
- Priority: High/Medium/Low/Starter
- Status: Todo/In Progress/Done (default)

Project: https://github.com/users/yusufkaraaslan/projects/2
2025-10-20 23:03:14 +03:00
yusyus
e092318351 Add project board link to README
- Added project board badge
- Added prominent link to development roadmap
- Links to 127 tasks across 10 categories
2025-10-20 22:58:04 +03:00
yusyus
90449cc86d Add step-by-step project board setup instructions
- Complete checklist for manual setup
- GitHub CLI automation commands
- Label colors and descriptions
- Milestone creation guide
- Issue creation workflow

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-20 13:40:29 +03:00
yusyus
29a181752a Add GitHub Project Board setup and issue templates
- Add comprehensive PROJECT_BOARD_SETUP.md with 20 issues
- Create 3 milestones: v1.1.0, v1.2.0, v2.0.0
- Add issue templates: feature, bug, documentation
- Add PR template with checklist
- Define labels for priority, type, component, status
- Include setup instructions for web UI and CLI

Features:
- 6-column project board structure
- 20 pre-defined issues covering website, core improvements, advanced features
- Custom fields: Effort, Impact, Category
- Success metrics and community engagement guidelines

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-20 13:38:13 +03:00
yusyus
efaa1454e5 Merge pull request #6 from lwsinclair/add-mseep-badge
Add MseeP.ai badge
2025-10-20 12:43:47 +03:00
Lawrence Sinclair
61fda321e3 Add MseeP.ai badge to README.md 2025-10-20 12:04:04 +11:00
yusyus
b83f276621 Update Python requirement to 3.10+ for MCP compatibility
The MCP package requires Python 3.10 or higher. Updated:
- GitHub Actions workflow to test Python 3.10, 3.11, 3.12
- README.md badge to Python 3.10+
- CLAUDE.md prerequisites
- CONTRIBUTING.md prerequisites
- docs/MCP_SETUP.md prerequisites

This fixes the MCP installation error in CI:
'ERROR: No matching distribution found for mcp>=1.0.0'

MCP package versions 0.9.1+ all require Python 3.10+.
2025-10-19 22:53:28 +03:00
yusyus
9ce78e9a16 Fix GitHub Actions workflow: Update Python version requirements
- Update CI workflow to Python 3.9-3.12 (from 3.7-3.11)
- Python 3.7 and 3.8 no longer available on ubuntu-latest (Ubuntu 24.04)
- Add fail-fast: false to continue testing on failures
- Update all documentation to reflect Python 3.9+ requirement

Files updated:
- .github/workflows/tests.yml - New Python versions
- README.md - Badge updated to Python 3.9+
- CLAUDE.md - Prerequisites updated
- CONTRIBUTING.md - Prerequisites updated
- docs/MCP_SETUP.md - Prerequisites updated

This fixes the failing GitHub Actions tests.
2025-10-19 22:49:14 +03:00
yusyus
517ed46338 Add project infrastructure and documentation
Infrastructure:
- Add GitHub Actions workflows (tests.yml, release.yml)
- Add CHANGELOG.md with full version history
- Add CONTRIBUTING.md with contribution guidelines
- Add RELEASE_NOTES_v1.0.0.md for v1.0.0 release

Documentation:
- Update README.md with version badge (v1.0.0)
- Update test count badge (14 tests)
- Add links to new documentation files

Features:
- CI/CD pipeline with automated testing
- Multi-OS testing (Ubuntu, macOS)
- Multi-Python version testing (3.7-3.11)
- Automated release creation on tag push
- Code coverage reporting

This completes the v1.0.0 production release setup.
2025-10-19 22:37:55 +03:00
yusyus
7aa5f0d3cb Merge MCP_refactor: Add auto-upload feature with 9 MCP tools
Merges smart auto-upload feature with API key detection.

Features:
- New upload_skill.py for automatic API-based upload
- Enhanced package_skill.py with --upload flag
- Smart detection: upload if API key available, helpful message if not
- 9 total MCP tools (added upload_skill)
- Cross-platform folder opening
- Graceful error handling

Fixes:
- Fix missing import os in mcp/server.py
- Fix package_skill.py exit code
- Update all documentation to reflect 9 tools

Tests: 14/14 passed (100%)
- CLI tests: 8/8 passed
- MCP tests: 6/6 passed

All documentation updated and verified.
2025-10-19 22:22:45 +03:00
yusyus
06dabf639c Update documentation: correct MCP tool count to 9 tools
- Update mcp/README.md: 8 tools → 9 tools, add upload_skill docs
- Update docs/MCP_SETUP.md: verify section lists all 9 tools
- Update docs/CLAUDE.md: MCP tool references updated
- Add upload_skill to tool listings and examples
- Update test coverage count: 31 → 34 tests

All documentation now accurately reflects the current feature set.
2025-10-19 22:22:03 +03:00
yusyus
d8cc92cd46 Add smart auto-upload feature with API key detection
Features:
- New upload_skill.py for automatic API-based upload
- Smart detection: upload if API key available, helpful message if not
- Enhanced package_skill.py with --upload flag
- New MCP tool: upload_skill (9 total MCP tools now)
- Enhanced MCP tool: package_skill with smart auto-upload
- Cross-platform folder opening in utils.py
- Graceful error handling throughout

Fixes:
- Fix missing import os in mcp/server.py
- Fix package_skill.py exit code (now 0 when API key missing)
- Improve UX with helpful messages instead of errors

Tests: 14/14 passed (100%)
- CLI tests: 8/8 passed
- MCP tests: 6/6 passed

Files: +4 new, 5 modified, ~600 lines added
2025-10-19 22:17:23 +03:00
yusyus
6b97a9edc6 Update documentation for large documentation features
Comprehensive documentation updates for large docs support:

README.md:
- Add "Large Documentation Support" to key features
- Add "Router/Hub Skills" feature highlight
- Add "Checkpoint/Resume" feature highlight
- Update MCP tools count: 6 → 8
- Add complete section 7: Large Documentation Support (10K-40K+ Pages)
  - Split strategies: auto, category, router, size
  - Parallel scraping workflow
  - Configuration examples
  - Benefits and use cases
- Add section 8: Checkpoint/Resume for Long Scrapes
  - Configuration examples
  - Resume/fresh workflow
  - Benefits and features
- Update documentation links to include LARGE_DOCUMENTATION.md
- Update MCP guide links to reflect 8 tools

docs/CLAUDE.md:
- Add resume/checkpoint commands
- Add large documentation commands (split, router, package_multi)
- Update MCP integration section (8 tools)
- Expand directory structure to show new files
- Add split_strategy, split_config, checkpoint config parameters
- Add "Large Documentation Support" and "Checkpoint/Resume" features
- Add complete large documentation workflow (40K pages example)
- Update all command paths to use cli/ prefix

mcp/README.md:
- Update tool count: 6 → 8
- Add tool 7: split_config with full documentation
- Add tool 8: generate_router with full documentation
- Add "Large Documentation (40K Pages)" workflow example
- Update test coverage: 25 → 31 tests
- Update performance table with parallel scraping metrics
- Document all split strategies

docs/MCP_SETUP.md:
- Update verified tools count: 6 → 8
- Update test count: 25 → 31

All documentation now comprehensively covers:
- Large documentation handling (10K-40K+ pages)
- Router/hub architecture
- Config splitting strategies
- Checkpoint/resume functionality
- Parallel scraping workflows
- Complete MCP integration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 20:58:47 +03:00
yusyus
105218f85e Add checkpoint/resume feature for long scrapes
Implement automatic progress saving and resumption for interrupted
or very long documentation scrapes (40K+ pages).

**Features:**
- Automatic checkpoint saving every N pages (configurable, default: 1000)
- Resume from last checkpoint with --resume flag
- Fresh start with --fresh flag (clears checkpoint)
- Progress state saved: visited URLs, pending URLs, pages scraped
- Checkpoint saved on interruption (Ctrl+C)
- Checkpoint cleared after successful completion

**Configuration:**
```json
{
  "checkpoint": {
    "enabled": true,
    "interval": 1000
  }
}
```

**Usage:**
```bash
# Start scraping (with checkpoints enabled in config)
python3 cli/doc_scraper.py --config configs/large-docs.json

# If interrupted (Ctrl+C), resume later:
python3 cli/doc_scraper.py --config configs/large-docs.json --resume

# Start fresh (clear checkpoint):
python3 cli/doc_scraper.py --config configs/large-docs.json --fresh
```

**Checkpoint Data:**
- config: Full configuration
- visited_urls: All URLs already scraped
- pending_urls: Queue of URLs to scrape
- pages_scraped: Count of pages completed
- last_updated: Timestamp
- checkpoint_interval: Interval setting

**Benefits:**
 Never lose progress on long scrapes
 Handle interruptions gracefully
 Resume multi-hour scrapes easily
 Automatic save every 1000 pages
 Essential for 40K+ page documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 20:50:24 +03:00
yusyus
bddb57f5ef Add large documentation handling (40K+ pages support)
Implement comprehensive system for handling very large documentation sites
with intelligent splitting strategies and router/hub architecture.

**New CLI Tools:**
- cli/split_config.py: Split large configs into focused sub-skills
  * Strategies: auto, category, router, size
  * Configurable target pages per skill (default: 5000)
  * Dry-run mode for preview

- cli/generate_router.py: Create intelligent router/hub skills
  * Auto-generates routing logic based on keywords
  * Creates SKILL.md with topic-to-skill mapping
  * Infers router name from sub-skills

- cli/package_multi.py: Batch package multiple skills
  * Package router + all sub-skills in one command
  * Progress tracking for each skill

**MCP Integration:**
- Added split_config tool (8 total MCP tools now)
- Added generate_router tool
- Supports 40K+ page documentation via MCP

**Configuration:**
- New split_strategy parameter in configs
- split_config section for fine-tuned control
- checkpoint section for resume capability (ready for Phase 4)
- Example: configs/godot-large-example.json

**Documentation:**
- docs/LARGE_DOCUMENTATION.md (500+ lines)
  * Complete guide for 10K+ page documentation
  * All splitting strategies explained
  * Detailed workflows with examples
  * Best practices and troubleshooting
  * Real-world examples (AWS, Microsoft, Godot)

**Features:**
 Handle 40K+ page documentation efficiently
 Parallel scraping support (5x-10x faster)
 Router + sub-skills architecture
 Intelligent keyword-based routing
 Multiple splitting strategies
 Full MCP integration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 20:48:03 +03:00
yusyus
f103aa62cb Clean up tracked files and repository structure
Remove unnecessary files:
- configs/.DS_Store (macOS system file, should not be tracked)

This ensures only relevant project files are version controlled
and improves repository hygiene.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 19:45:13 +03:00
yusyus
1c5801d121 Update documentation for MCP integration
Comprehensive documentation updates reflecting MCP integration:

README.md:
- Add MCP Integration and Tests Passing badges
- Enhance MCP section with "Tested and Working" status
- Add links to both setup and testing guides

docs/MCP_SETUP.md:
- Update status to reflect production testing
- Add integration testing verification notes
- Confirm all 6 tools working with natural language

CLAUDE.md:
- Add prominent MCP Integration section at top
- List all 6 available MCP tools with descriptions
- Add setup instructions and production status

docs/TEST_MCP_IN_CLAUDE_CODE.md (moved from root):
- Relocate testing guide to docs/ for better organization
- Provides step-by-step MCP integration testing workflow
- Documents complete test suite for all 6 tools

All documentation now accurately reflects the fully tested and
working MCP integration verified in production Claude Code environment.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 19:44:47 +03:00
yusyus
d7e6142ab0 Add test configurations for MCP validation
Add 4 test configuration files used for validating MCP functionality:
- astro.json: Astro framework documentation (15 pages, production test)
- python-tutorial-test.json: Python tutorial (minimal test case)
- tailwind.json: Tailwind CSS documentation (test case)
- test-manual.json: Manual testing configuration

These configs were used to verify:
- Config generation via generate_config tool
- Config validation via validate_config tool
- Page estimation via estimate_pages tool
- Full scraping workflow via scrape_docs tool
- Skill packaging via package_skill tool

All tests passed successfully in production Claude Code environment.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 19:44:27 +03:00
yusyus
35499da922 Add MCP configuration and setup scripts
Add complete setup infrastructure for MCP integration:
- example-mcp-config.json: Template Claude Code MCP configuration
- setup_mcp.sh: Automated one-command setup script
- test_mcp_server.py: Comprehensive test suite (25 tests, 100% pass)

The setup script automates:
- Dependency installation
- Configuration file generation with absolute paths
- Claude Code config directory creation
- Validation and verification

Tests cover:
- All 6 MCP tool functions
- Error handling and edge cases
- Config validation
- Page estimation
- Skill packaging

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 19:43:56 +03:00
yusyus
278b591ed7 Add MCP server implementation with 6 tools
Implement complete Model Context Protocol server providing 6 tools for
documentation skill generation:
- list_configs: List all available preset configurations
- generate_config: Create new config files for any documentation site
- validate_config: Validate config file structure and parameters
- estimate_pages: Fast page count estimation before scraping
- scrape_docs: Full documentation scraping and skill building
- package_skill: Package skill directory into uploadable .zip

Features:
- Async/await architecture for efficient I/O operations
- Full MCP protocol compliance
- Comprehensive error handling and user-friendly messages
- Integration with existing CLI tools (doc_scraper.py, etc.)
- 25 unit tests with 100% pass rate

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 19:43:25 +03:00
yusyus
36ce32d02e Add MCP test scripts for easy testing after restart
- MCP_TEST_SCRIPT.md: Complete 10-test script with verification
- QUICK_MCP_TEST.md: Quick 6-test version for fast testing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 17:29:21 +03:00
yusyus
b69f57b60a Add comprehensive MCP setup guide and integration test template
**Documentation Added:**
- docs/MCP_SETUP.md: Complete 400+ line setup guide
  - Prerequisites and installation steps
  - Configuration examples for Claude Code
  - Verification and troubleshooting
  - 3 usage examples and advanced configuration
  - End-to-end workflow and quick reference

- tests/mcp_integration_test.md: Comprehensive test template
  - 10 test cases covering all MCP tools
  - Performance metrics table
  - Issue tracking and environment setup
  - Setup and cleanup scripts

- .claude/mcp_config.example.json: Example MCP configuration

**Documentation Updated:**
- STRUCTURE.md: Complete monorepo structure documentation
- CLAUDE.md: All Python script paths updated to cli/ prefix
- docs/USAGE.md: All command examples updated for monorepo
- TODO.md: Current sprint status and completed tasks

**Summary:**
- Issues #2 and #3 handled (MCP setup guide + integration tests)
- All documentation now reflects monorepo structure (cli/ + mcp/)
- Tests: 71/71 passing (100%)
- Ready for MCP server testing with Claude Code

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 17:01:37 +03:00
yusyus
ba7cacdb4c Fix all test failures and add upper limit validation (100% pass rate!)
**Test Fixes:**
- Fixed 3 failing tests by checking warnings instead of errors
- test_missing_recommended_selectors: now checks warnings
- test_invalid_rate_limit_too_high: now checks warnings
- test_invalid_max_pages_too_high: now checks warnings

**Validation Improvements:**
- Added rate_limit upper limit warning (> 10s)
- Added max_pages upper limit warning (> 10000)
- Helps users avoid extreme values

**Results:**
- Before: 68/71 tests passing (95.8%)
- After: 71/71 tests passing (100%) 

**Planning Files Added:**
- .github/create_issues.sh - Helper for creating issues
- .github/SETUP_GUIDE.md - GitHub setup instructions

Tests now comprehensively cover all validation scenarios including
errors, warnings, and edge cases.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 15:50:25 +03:00
yusyus
23277ded26 Update TODO.md with current sprint tasks and create GitHub issues
**TODO.md Updates:**
- Mark current 4 tasks as STARTED
- Add "In Progress" and "Completed Today" sections
- Document current branch: MCP_refactor
- Clear tracking of sprint progress

**GitHub Issues Created (templates):**
1. Fix 3 test failures (warnings vs errors)
2. Create MCP setup guide for Claude Code
3. Test MCP server with actual Claude Code
4. Update documentation for monorepo structure

**Issue Templates Include:**
- Detailed problem descriptions
- Step-by-step solutions
- Acceptance criteria
- Files to modify
- Test plans

**Next Steps:**
User can create issues via:
- GitHub web UI (copy from ISSUES_TO_CREATE.md)
- GitHub CLI (gh issue create)
- Or work directly from TODO.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 15:30:13 +03:00
yusyus
f66718122a Add project planning and roadmap documentation
**Planning Structure:**
- TODO.md - Sprint planning and current tasks
- ROADMAP.md - Long-term vision and milestones
- .github/ISSUE_TEMPLATE/ - Issue templates for features and MCP tools

**TODO.md:**
- 5 phases: Core (done), Enhancement, Advanced, Docs, Integrations
- Current sprint tasks clearly defined
- Progress tracking with checkboxes

**ROADMAP.md:**
- Vision and milestones
- v1.0 (done), v1.1 (in progress), v1.2-3.0 (planned)
- Feature ideas categorized by priority
- Metrics and goals
- Release schedule

**Issue Templates:**
- feature_request.md - General features
- mcp_tool.md - New MCP tools specifically

Next: Fix tests, test MCP with Claude Code, document setup

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 15:23:58 +03:00
yusyus
ae924a9d05 Refactor: Convert to monorepo with CLI and MCP server
Major restructure to support both CLI usage and MCP integration:

**Repository Structure:**
- cli/ - All CLI tools (doc_scraper, estimate_pages, enhance_skill, etc.)
- mcp/ - New MCP server for Claude Code integration
- configs/ - Shared configuration files
- tests/ - Updated to import from cli/
- docs/ - Shared documentation

**MCP Server (NEW):**
- mcp/server.py - Full MCP server implementation
- 6 tools available:
  * generate_config - Create config from URL
  * estimate_pages - Fast page count estimation
  * scrape_docs - Full documentation scraping
  * package_skill - Package to .zip
  * list_configs - Show available presets
  * validate_config - Validate config files
- mcp/README.md - Complete MCP documentation
- mcp/requirements.txt - MCP dependencies

**CLI Tools (Moved to cli/):**
- All existing functionality preserved
- Same commands, same behavior
- Tests updated to import from cli.doc_scraper

**Tests:**
- 68/71 passing (95.8%)
- Updated imports from doc_scraper to cli.doc_scraper
- Fixed validate_config() tuple unpacking (errors, warnings)
- 3 minor test failures (checking warnings instead of errors)

**Benefits:**
- Use as CLI tool: python3 cli/doc_scraper.py
- Use via MCP: Integrated with Claude Code
- Shared code and configs
- Single source of truth

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 15:19:53 +03:00
yusyus
af87572735 Remove unnecessary validation limits from config validator
- Remove max_pages upper limit (was 10,000, now unlimited)
- Remove rate_limit upper limit (was 10s, now unlimited)
- Convert missing selector checks from errors to warnings
- Add warnings system (non-blocking) vs errors (blocking)
- Allow users to scrape large documentation sites (45k+ pages)
- Allow flexible rate limiting for different site requirements

All reasonable validations remain (required fields, valid URLs,
correct data types, no negative values).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 14:55:56 +03:00
yusyus
be84e5a321 Merge branch 'pr-2' 2025-10-19 13:55:25 +03:00
yusyus
3144d3cf3a Add comprehensive usage guide for all tools and workflows
- Add docs/USAGE.md (~650 lines)
- Complete command reference for all tools
- Full help output for doc_scraper.py, estimate_pages.py, run_tests.py
- Usage examples for enhancement and packaging tools
- All 6 preset configs documented with details
- 6 common workflows from quick start to advanced
- Troubleshooting section with solutions
- Advanced usage: custom selectors, URL patterns, categories
- Performance tips and best practices
- Exit codes, environment variables, file locations

Tools covered:
  - doc_scraper.py (main tool with all options)
  - estimate_pages.py (page count estimator)
  - enhance_skill.py (API enhancement)
  - enhance_skill_local.py (local enhancement)
  - package_skill.py (skill packager)
  - run_tests.py (test runner)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 13:34:02 +03:00
jarek
7a4c1d7083 kubernetes config for official docs 2025-10-19 09:28:44 +02:00
yusyus
9c1a133c51 Add page count estimator for fast config validation
- Add estimate_pages.py script (~270 lines)
- Fast estimation without downloading content (HEAD requests only)
- Shows estimated total pages and recommended max_pages
- Validates URL patterns work correctly
- Estimates scraping time based on rate_limit
- Update CLAUDE.md with estimator workflow and commands
- Update README.md features section with estimation benefits
- Usage: python3 estimate_pages.py configs/react.json
- Time: 1-2 minutes vs 20-40 minutes for full scrape

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 02:44:50 +03:00
yusyus
59c2f9126d Optimize all framework configs with start_urls for better coverage
All configs now follow the steam-economy-complete.json pattern with:
- Multiple start_urls for comprehensive entry points
- Improved include patterns for better targeting
- Enhanced exclude patterns to skip irrelevant pages

Godot Config:
- Added 7 start_urls covering getting started, scripting, 2D, 3D, physics, animation, and classes
- Added include patterns: /getting_started/, /tutorials/, /classes/
- More focused scraping of core documentation

React Config:
- Added 6 start_urls covering learn, quick-start, reference, and hooks
- Existing patterns maintained (already well-optimized)

Vue Config:
- Added 6 start_urls covering introduction, essentials, components, composables, and API
- Fixed base_url from https://vuejs.org/guide/ to https://vuejs.org/
- Added /partners/ to exclude list

Django Config:
- Added 7 start_urls covering intro, models, views, templates, forms, auth, and reference
- Added /intro/ to include patterns
- Added /releases/ to exclude list (changelog not needed)

FastAPI Config:
- Added 7 start_urls covering tutorial, first-steps, path-params, body, dependencies, advanced, and reference
- Added /deployment/ to exclude list

Benefits:
- Better initial page discovery
- More comprehensive documentation coverage
- Faster scraping (direct entry to important sections)
- Reduced unnecessary page crawling
- Consistent pattern across all configs

All configs tested and validated:
 71/71 tests passing
 All 6 configs validated successfully

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 02:24:56 +03:00
yusyus
f9c8f1d610 Clean up macOS .DS_Store file from output directory 2025-10-19 02:21:25 +03:00
yusyus
f1fa8354d2 Add comprehensive test system with 71 tests (100% pass rate)
Test Framework:
- Created tests/ directory structure
- Added __init__.py for test package
- Implemented 71 comprehensive tests across 3 test suites

Test Suites:
1. test_config_validation.py (25 tests)
   - Valid/invalid config structure
   - Required fields validation
   - Name format validation
   - URL format validation
   - Selectors validation
   - URL patterns validation
   - Categories validation
   - Rate limit validation (0-10 range)
   - Max pages validation (1-10000 range)
   - Start URLs validation

2. test_scraper_features.py (28 tests)
   - URL validation (include/exclude patterns)
   - Language detection (Python, JavaScript, GDScript, C++, etc.)
   - Pattern extraction from documentation
   - Smart categorization (by URL, title, content)
   - Text cleaning utilities

3. test_integration.py (18 tests)
   - Dry-run mode functionality
   - Config loading and validation
   - Real config files validation (godot, react, vue, django, fastapi, steam)
   - URL processing and normalization
   - Content extraction

Test Runner (run_tests.py):
- Custom colored test runner with ANSI colors
- Detailed test summary with breakdown by category
- Success rate calculation
- Command-line options:
  --suite: Run specific test suite
  --verbose: Show each test name
  --quiet: Minimal output
  --failfast: Stop on first failure
  --list: List all available tests
- Execution time: ~1 second for full suite

Documentation:
- Added comprehensive TESTING.md guide
- Test writing templates
- Best practices
- Coverage information
- Troubleshooting guide

.gitignore:
- Added Python cache files
- Added output directory
- Added IDE and OS files

Test Results:
 71/71 tests passing (100% pass rate)
 All existing configs validated
 Fast execution (<1 second)
 Ready for CI/CD integration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 02:08:58 +03:00
yusyus
eeef230c7b Implement high and medium priority improvements
High Priority:
- Fix hardcoded package_skill.py path (line 778)
  Changed from: /mnt/skills/examples/skill-creator/scripts/package_skill.py
  Changed to: package_skill.py (local repository path)

Medium Priority:
- Add comprehensive config validation
  * Validates required fields (name, base_url)
  * Validates name format (alphanumeric, hyphens, underscores)
  * Validates base_url format (http/https)
  * Validates selectors structure and recommends standard selectors
  * Validates url_patterns (include/exclude lists)
  * Validates categories structure
  * Validates rate_limit range (0-10 seconds)
  * Validates max_pages range (1-10000)
  * Validates start_urls format if present
  * Provides clear error messages for invalid configs

- Add --dry-run flag for preview mode
  * Previews first 20 URLs without saving data
  * Shows what would be scraped without creating files
  * Discovers links to estimate total pages
  * Displays configuration summary
  * No directories created in dry-run mode
  * Useful for testing configs before full scrape

All changes tested and working correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 01:57:59 +03:00
yusyus
f8c75a3b2d Add comprehensive CLAUDE.md for Claude Code integration
- Add root-level CLAUDE.md with complete guidance for Claude Code
- Include Python 3.7+ requirement
- Add first-time user workflow with all commands
- Include CSS selector testing with BeautifulSoup examples
- Add output quality verification commands
- Document force re-scrape instructions
- Fix package_skill.py path (remove hardcoded /mnt/skills reference)
- Add complete config file structure with real examples
- Include testing section for selector validation
- Add performance metrics table
- Document all key code locations with line numbers
- Organize by: quick start → architecture → workflows → troubleshooting
- Preserve existing docs/CLAUDE.md as detailed technical reference

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-19 01:43:02 +03:00
yusyus
a9b8591731 Update README.md with detailed project description and features; add initial VSCode settings. 2025-10-17 15:21:39 +00:00
yusyus
78b9cae398 Init 2025-10-17 15:14:44 +00:00
yusyus
397d47fe7c Initial commit 2025-10-17 17:43:48 +03:00