diff --git a/CLAUDE.md b/CLAUDE.md index 4fe3bb6..a15cfcc 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -2,22 +2,28 @@ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. -## ๐ŸŽฏ Current Status (November 6, 2025) +## ๐ŸŽฏ Current Status (November 11, 2025) -**Version:** v2.0.0 (Production Ready - Major Feature Release) +**Version:** v2.0.0 (Production Ready - Published on PyPI!) **Active Development:** Flexible, incremental task-based approach ### Recent Updates (This Week): -**๐Ÿš€ Major Release: Unified Multi-Source Scraping (v2.0.0)** +**๐ŸŽ‰ MAJOR MILESTONE: Published on PyPI! (v2.0.0)** +- **๐Ÿ“ฆ PyPI Publication**: Install with `pip install skill-seekers` - https://pypi.org/project/skill-seekers/ +- **๐Ÿ”ง Modern Python Packaging**: pyproject.toml, src/ layout, entry points +- **โœ… CI/CD Fixed**: All 5 test matrix jobs passing (Ubuntu + macOS, Python 3.10-3.12) +- **๐Ÿ“š Documentation Complete**: README, CHANGELOG, FUTURE_RELEASES.md all updated +- **๐Ÿš€ Unified CLI**: Single `skill-seekers` command with Git-style subcommands +- **๐Ÿงช Test Coverage**: 379 tests passing, 39% coverage +- **๐ŸŒ Community**: GitHub Discussion, Release notes, announcements published + +**๐Ÿš€ Unified Multi-Source Scraping (v2.0.0)** - **NEW**: Combine documentation + GitHub + PDF in one skill - **NEW**: Automatic conflict detection between docs and code - **NEW**: Rule-based and AI-powered merging -- **NEW**: Transparent conflict reporting with side-by-side comparison - **NEW**: 5 example unified configs (React, Django, FastAPI, Godot, FastAPI-test) -- **NEW**: Complete documentation in docs/UNIFIED_SCRAPING.md -- **NEW**: Integration tests added (378/390 tests passing, 12 unified tests need fixes) -- **Status**: โš ๏ธ Core functionality stable, unified tests need attention +- **Status**: โš ๏ธ 12 unified tests need fixes (core functionality stable) **โœ… Community Response (H1 Group):** - **Issue #8 Fixed** - Added BULLETPROOF_QUICKSTART.md and TROUBLESHOOTING.md for beginners @@ -34,13 +40,16 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co - ๐Ÿ“ Multi-source configs: django_unified, fastapi_unified, fastapi_unified_test, godot_unified, react_unified - ๐Ÿ“ Test/Example configs: godot_github, react_github, python-tutorial-test, example_pdf, test-manual -**๐Ÿ“‹ Next Up:** -- **Priority**: Fix 12 failing unified tests in tests/test_unified.py +**๐Ÿ“‹ Next Up (Post-PyPI v2.0.0):** +- **โœ… DONE**: PyPI publication complete +- **โœ… DONE**: CI/CD fixed - all checks passing +- **โœ… DONE**: Documentation updated (README, CHANGELOG, FUTURE_RELEASES.md) +- **Priority 1**: Fix 12 failing unified tests in tests/test_unified.py - ConfigValidator expecting dict instead of file path - ConflictDetector expecting dict pages, not list -- Task H1.3 - Create example project folder -- Task A3.1 - GitHub Pages site (skillseekersweb.com) -- Task J1.1 - Install MCP package for testing +- **Priority 2**: Task H1.3 - Create example project folder +- **Priority 3**: Task A3.1 - GitHub Pages site (skillseekersweb.com) +- **Priority 4**: Task J1.1 - Install MCP package for testing **๐Ÿ“Š Roadmap Progress:** - 134 tasks organized into 22 feature groups @@ -74,16 +83,33 @@ Skill Seeker automatically converts any documentation website into a Claude AI s **Python Version:** Python 3.10 or higher (required for MCP integration) -**Setup with Virtual Environment (Recommended):** +**Installation:** + +### Option 1: Install from PyPI (Recommended - Easiest!) ```bash -# One-time setup +# Install globally or in virtual environment +pip install skill-seekers + +# Use the unified CLI immediately +skill-seekers scrape --config configs/react.json +skill-seekers --help +``` + +### Option 2: Install from Source (For Development) +```bash +# Clone the repository +git clone https://github.com/yusufkaraaslan/Skill_Seekers.git +cd Skill_Seekers + +# Create virtual environment python3 -m venv venv source venv/bin/activate # macOS/Linux (Windows: venv\Scripts\activate) -pip install requests beautifulsoup4 pytest -pip freeze > requirements.txt -# Every time you use Skill Seeker in a new terminal session -source venv/bin/activate # Activate before using any commands +# Install in editable mode +pip install -e . + +# Or install dependencies manually +pip install -r requirements.txt ``` **Why use a virtual environment?** @@ -92,16 +118,8 @@ source venv/bin/activate # Activate before using any commands - Standard Python development practice - Required for running tests with pytest -**If someone else clones this repo:** -```bash -python3 -m venv venv -source venv/bin/activate -pip install -r requirements.txt -``` - **Optional (for API-based enhancement):** ```bash -source venv/bin/activate pip install anthropic export ANTHROPIC_API_KEY=sk-ant-... ``` @@ -146,8 +164,8 @@ skill-seekers unified --config configs/react_unified.json --merge-mode claude-en ### First-Time User Workflow (Recommended) ```bash -# 1. Install dependencies (one-time) -pip3 install requests beautifulsoup4 +# 1. Install from PyPI (one-time, easiest!) +pip install skill-seekers # 2. Estimate page count BEFORE scraping (fast, no data download) skill-seekers estimate configs/godot.json @@ -287,27 +305,46 @@ skill-seekers estimate configs/vue.json --max-discovery 2000 ## Repository Architecture -### File Structure +### File Structure (v2.0.0 - Modern Python Packaging) ``` Skill_Seekers/ -โ”œโ”€โ”€ cli/doc_scraper.py # Main tool (single-file, ~790 lines) -โ”œโ”€โ”€ cli/estimate_pages.py # Page count estimator (fast, no data) -โ”œโ”€โ”€ cli/enhance_skill.py # AI enhancement (API-based) -โ”œโ”€โ”€ cli/enhance_skill_local.py # AI enhancement (LOCAL, no API) -โ”œโ”€โ”€ cli/package_skill.py # Skill packager -โ”œโ”€โ”€ cli/run_tests.py # Test runner (390 tests, 378 passing) -โ”œโ”€โ”€ configs/ # Preset configurations +โ”œโ”€โ”€ pyproject.toml # Modern Python package configuration (PEP 621) +โ”œโ”€โ”€ src/ # Source code (src/ layout best practice) +โ”‚ โ””โ”€โ”€ skill_seekers/ +โ”‚ โ”œโ”€โ”€ __init__.py +โ”‚ โ”œโ”€โ”€ cli/ # CLI tools (entry points) +โ”‚ โ”‚ โ”œโ”€โ”€ doc_scraper.py # Main scraper (~790 lines) +โ”‚ โ”‚ โ”œโ”€โ”€ estimate_pages.py # Page count estimator +โ”‚ โ”‚ โ”œโ”€โ”€ enhance_skill.py # AI enhancement (API-based) +โ”‚ โ”‚ โ”œโ”€โ”€ package_skill.py # Skill packager +โ”‚ โ”‚ โ”œโ”€โ”€ github_scraper.py # GitHub scraper +โ”‚ โ”‚ โ”œโ”€โ”€ pdf_scraper.py # PDF scraper +โ”‚ โ”‚ โ”œโ”€โ”€ unified_scraper.py # Unified multi-source scraper +โ”‚ โ”‚ โ”œโ”€โ”€ merge_sources.py # Source merger +โ”‚ โ”‚ โ””โ”€โ”€ conflict_detector.py # Conflict detection +โ”‚ โ””โ”€โ”€ mcp/ # MCP server integration +โ”‚ โ””โ”€โ”€ server.py +โ”œโ”€โ”€ tests/ # Test suite (379 tests passing) +โ”‚ โ”œโ”€โ”€ test_scraper_features.py +โ”‚ โ”œโ”€โ”€ test_config_validation.py +โ”‚ โ”œโ”€โ”€ test_integration.py +โ”‚ โ”œโ”€โ”€ test_mcp_server.py +โ”‚ โ”œโ”€โ”€ test_unified.py # (12 tests need fixes) +โ”‚ โ””โ”€โ”€ ... +โ”œโ”€โ”€ configs/ # Preset configurations (24 configs) โ”‚ โ”œโ”€โ”€ godot.json โ”‚ โ”œโ”€โ”€ react.json -โ”‚ โ”œโ”€โ”€ vue.json -โ”‚ โ”œโ”€โ”€ django.json -โ”‚ โ”œโ”€โ”€ fastapi.json -โ”‚ โ””โ”€โ”€ steam-economy-complete.json +โ”‚ โ”œโ”€โ”€ django_unified.json # Multi-source configs +โ”‚ โ””โ”€โ”€ ... โ”œโ”€โ”€ docs/ # Documentation -โ”‚ โ”œโ”€โ”€ CLAUDE.md # Detailed technical architecture +โ”‚ โ”œโ”€โ”€ CLAUDE.md # This file โ”‚ โ”œโ”€โ”€ ENHANCEMENT.md # Enhancement guide -โ”‚ โ””โ”€โ”€ UPLOAD_GUIDE.md # How to upload skills +โ”‚ โ”œโ”€โ”€ UPLOAD_GUIDE.md # Upload instructions +โ”‚ โ””โ”€โ”€ UNIFIED_SCRAPING.md # Unified scraping guide +โ”œโ”€โ”€ README.md # User documentation +โ”œโ”€โ”€ CHANGELOG.md # Release history +โ”œโ”€โ”€ FUTURE_RELEASES.md # Roadmap โ””โ”€โ”€ output/ # Generated output (git-ignored) โ”œโ”€โ”€ {name}_data/ # Scraped raw data (cached) โ”‚ โ”œโ”€โ”€ pages/*.json # Individual page data @@ -324,28 +361,39 @@ Skill_Seekers/ โ””โ”€โ”€ assets/ # Empty (user assets) ``` +**Key Changes in v2.0.0:** +- **src/ layout**: Modern Python packaging structure +- **pyproject.toml**: PEP 621 compliant configuration +- **Entry points**: `skill-seekers` CLI with subcommands +- **Published to PyPI**: `pip install skill-seekers` + ### Data Flow -1. **Scrape Phase** (`scrape_all()` in doc_scraper.py:228-251): +1. **Scrape Phase** (`scrape_all()` in src/skill_seekers/cli/doc_scraper.py): - Input: Config JSON (name, base_url, selectors, url_patterns, categories) - Process: BFS traversal from base_url, respecting include/exclude patterns - Output: `output/{name}_data/pages/*.json` + `summary.json` -2. **Build Phase** (`build_skill()` in doc_scraper.py:561-601): +2. **Build Phase** (`build_skill()` in src/skill_seekers/cli/doc_scraper.py): - Input: Scraped JSON data from `output/{name}_data/` - Process: Load pages โ†’ Smart categorize โ†’ Extract patterns โ†’ Generate references - Output: `output/{name}/SKILL.md` + `output/{name}/references/*.md` -3. **Enhancement Phase** (optional): +3. **Enhancement Phase** (optional via enhance_skill.py or enhance_skill_local.py): - Input: Built skill directory with references - Process: Claude analyzes references and rewrites SKILL.md - Output: Enhanced SKILL.md with real examples and guidance -4. **Package Phase**: +4. **Package Phase** (via package_skill.py): - Input: Skill directory - Process: Zip all files (excluding .backup) - Output: `{name}.zip` +5. **Upload Phase** (optional via upload_skill.py): + - Input: Skill .zip file + - Process: Upload to Claude AI via API + - Output: Skill available in Claude + ### Configuration File Structure Config files (`configs/*.json`) define scraping behavior: @@ -602,18 +650,30 @@ python3 /mnt/skills/examples/skill-creator/scripts/cli/package_skill.py output/g The correct command uses the local `cli/package_skill.py` in the repository root. -## Key Code Locations +## Key Code Locations (v2.0.0) -- **URL validation**: `is_valid_url()` doc_scraper.py:49-64 -- **Content extraction**: `extract_content()` doc_scraper.py:66-133 -- **Language detection**: `detect_language()` doc_scraper.py:135-165 -- **Pattern extraction**: `extract_patterns()` doc_scraper.py:167-183 -- **Smart categorization**: `smart_categorize()` doc_scraper.py:282-323 -- **Category inference**: `infer_categories()` doc_scraper.py:325-351 -- **Quick reference generation**: `generate_quick_reference()` doc_scraper.py:353-372 -- **SKILL.md generation**: `create_enhanced_skill_md()` doc_scraper.py:426-542 -- **Scraping loop**: `scrape_all()` doc_scraper.py:228-251 -- **Main workflow**: `main()` doc_scraper.py:663-789 +**Documentation Scraper** (`src/skill_seekers/cli/doc_scraper.py`): +- **URL validation**: `is_valid_url()` +- **Content extraction**: `extract_content()` +- **Language detection**: `detect_language()` +- **Pattern extraction**: `extract_patterns()` +- **Smart categorization**: `smart_categorize()` +- **Category inference**: `infer_categories()` +- **Quick reference generation**: `generate_quick_reference()` +- **SKILL.md generation**: `create_enhanced_skill_md()` +- **Scraping loop**: `scrape_all()` +- **Main workflow**: `main()` + +**Other Key Files**: +- **GitHub scraper**: `src/skill_seekers/cli/github_scraper.py` +- **PDF scraper**: `src/skill_seekers/cli/pdf_scraper.py` +- **Unified scraper**: `src/skill_seekers/cli/unified_scraper.py` +- **Conflict detection**: `src/skill_seekers/cli/conflict_detector.py` +- **Source merger**: `src/skill_seekers/cli/merge_sources.py` +- **Package tool**: `src/skill_seekers/cli/package_skill.py` +- **Upload tool**: `src/skill_seekers/cli/upload_skill.py` +- **MCP server**: `src/skill_seekers/mcp/server.py` +- **Entry points**: `pyproject.toml` (project.scripts section) ## Enhancement Details @@ -697,17 +757,26 @@ The correct command uses the local `cli/package_skill.py` in the repository root - ๐Ÿ“ `test-manual.json` - Manual testing config **Note:** โš ๏ธ = Unified configs have 12 failing tests that need fixing -**Last verified:** November 6, 2025 +**Last verified:** November 11, 2025 (v2.0.0 PyPI release) ## Additional Documentation +**User Guides:** - **[README.md](README.md)** - Complete user documentation -- **[BULLETPROOF_QUICKSTART.md](BULLETPROOF_QUICKSTART.md)** - Complete beginner guide **NEW!** -- **[TROUBLESHOOTING.md](TROUBLESHOOTING.md)** - Comprehensive troubleshooting **NEW!** +- **[BULLETPROOF_QUICKSTART.md](BULLETPROOF_QUICKSTART.md)** - Complete beginner guide - **[QUICKSTART.md](QUICKSTART.md)** - Get started in 3 steps +- **[TROUBLESHOOTING.md](TROUBLESHOOTING.md)** - Comprehensive troubleshooting + +**Technical Documentation:** - **[docs/CLAUDE.md](docs/CLAUDE.md)** - Detailed technical architecture - **[docs/ENHANCEMENT.md](docs/ENHANCEMENT.md)** - AI enhancement guide - **[docs/UPLOAD_GUIDE.md](docs/UPLOAD_GUIDE.md)** - How to upload skills to Claude +- **[docs/UNIFIED_SCRAPING.md](docs/UNIFIED_SCRAPING.md)** - Multi-source scraping guide +- **[docs/MCP_SETUP.md](docs/MCP_SETUP.md)** - MCP server setup + +**Project Planning:** +- **[CHANGELOG.md](CHANGELOG.md)** - Release history and v2.0.0 details **UPDATED!** +- **[FUTURE_RELEASES.md](FUTURE_RELEASES.md)** - Roadmap for v2.1.0+ **NEW!** - **[FLEXIBLE_ROADMAP.md](FLEXIBLE_ROADMAP.md)** - Complete task catalog (134 tasks) - **[NEXT_TASKS.md](NEXT_TASKS.md)** - What to work on next - **[TODO.md](TODO.md)** - Current focus @@ -715,9 +784,29 @@ The correct command uses the local `cli/package_skill.py` in the repository root ## Notes for Claude Code -- This is a Python-based documentation scraper -- Single-file design (`doc_scraper.py` ~790 lines) -- No build system, no tests, minimal dependencies -- Output is cached and reusable +**Project Status (v2.0.0):** +- โœ… **Published on PyPI**: Install with `pip install skill-seekers` +- โœ… **Modern Python Packaging**: pyproject.toml, src/ layout, entry points +- โœ… **Unified CLI**: Single `skill-seekers` command with Git-style subcommands +- โœ… **CI/CD Working**: All 5 test matrix jobs passing (Ubuntu + macOS, Python 3.10-3.12) +- โœ… **Test Coverage**: 379 tests passing, 39% coverage +- โœ… **Documentation**: Complete user and technical documentation + +**Architecture:** +- **Python-based documentation scraper** with multi-source support +- **Main scraper**: `src/skill_seekers/cli/doc_scraper.py` (~790 lines) +- **Unified scraping**: Combines docs + GitHub + PDF with conflict detection +- **Modern packaging**: PEP 621 compliant with proper dependency management +- **MCP Integration**: 9 tools for Claude Code Max integration + +**Development Workflow:** +1. **Install**: `pip install -e .` (editable mode for development) +2. **Run tests**: `pytest tests/` (379 tests) +3. **Build package**: `uv build` or `python -m build` +4. **Publish**: `uv publish` (PyPI) + +**Key Points:** +- Output is cached and reusable in `output/` (git-ignored) - Enhancement is optional but highly recommended -- All scraped data stored in `output/` (git-ignored) +- All 24 configs are working and tested +- CI workflow requires `pip install -e .` to install package before running tests