# AGENTS.md - Skill Seekers Essential guidance for AI coding agents working with the Skill Seekers codebase. --- ## Project Overview **Skill Seekers** is a Python CLI tool that converts documentation websites, GitHub repositories, PDF files, and videos into AI-ready skills for LLM platforms and RAG (Retrieval-Augmented Generation) pipelines. It serves as the universal preprocessing layer for AI systems. ### Key Facts | Attribute | Value | |-----------|-------| | **Current Version** | 3.1.3 | | **Python Version** | 3.10+ (tested on 3.10, 3.11, 3.12, 3.13) | | **License** | MIT | | **Package Name** | `skill-seekers` (PyPI) | | **Source Files** | 182 Python files | | **Test Files** | 105+ test files | | **Website** | https://skillseekersweb.com/ | | **Repository** | https://github.com/yusufkaraaslan/Skill_Seekers | ### Supported Target Platforms | Platform | Format | Use Case | |----------|--------|----------| | **Claude AI** | ZIP + YAML | Claude Code skills | | **Google Gemini** | tar.gz | Gemini skills | | **OpenAI ChatGPT** | ZIP + Vector Store | Custom GPTs | | **LangChain** | Documents | QA chains, agents, retrievers | | **LlamaIndex** | TextNodes | Query engines, chat engines | | **Haystack** | Documents | Enterprise RAG pipelines | | **Pinecone** | Ready for upsert | Production vector search | | **Weaviate** | Vector objects | Vector database | | **Qdrant** | Points | Vector database | | **Chroma** | Documents | Local vector database | | **FAISS** | Index files | Local similarity search | | **Cursor IDE** | .cursorrules | AI coding assistant rules | | **Windsurf** | .windsurfrules | AI coding rules | | **Cline** | .clinerules + MCP | VS Code extension | | **Continue.dev** | HTTP context | Universal IDE support | | **Generic Markdown** | ZIP | Universal export | ### Core Workflow 1. **Scrape Phase** - Crawl documentation/GitHub/PDF/video sources 2. **Build Phase** - Organize content into categorized references 3. **Enhancement Phase** - AI-powered quality improvements (optional) 4. **Package Phase** - Create platform-specific packages 5. **Upload Phase** - Auto-upload to target platform (optional) --- ## Project Structure ``` /mnt/1ece809a-2821-4f10-aecb-fcdf34760c0b/Git/Skill_Seekers/ ├── src/skill_seekers/ # Main source code (src/ layout) │ ├── cli/ # CLI tools and commands (~70 modules) │ │ ├── adaptors/ # Platform adaptors (Strategy pattern) │ │ │ ├── base.py # Abstract base class (SkillAdaptor) │ │ │ ├── claude.py # Claude AI adaptor │ │ │ ├── gemini.py # Google Gemini adaptor │ │ │ ├── openai.py # OpenAI ChatGPT adaptor │ │ │ ├── markdown.py # Generic Markdown adaptor │ │ │ ├── chroma.py # Chroma vector DB adaptor │ │ │ ├── faiss_helpers.py # FAISS index adaptor │ │ │ ├── haystack.py # Haystack RAG adaptor │ │ │ ├── langchain.py # LangChain adaptor │ │ │ ├── llama_index.py # LlamaIndex adaptor │ │ │ ├── qdrant.py # Qdrant vector DB adaptor │ │ │ ├── weaviate.py # Weaviate vector DB adaptor │ │ │ └── streaming_adaptor.py # Streaming output adaptor │ │ ├── arguments/ # CLI argument definitions │ │ ├── parsers/ # Argument parsers │ │ │ └── extractors/ # Content extractors │ │ ├── presets/ # Preset configuration management │ │ ├── storage/ # Cloud storage adaptors │ │ ├── main.py # Unified CLI entry point │ │ ├── create_command.py # Unified create command │ │ ├── doc_scraper.py # Documentation scraper │ │ ├── github_scraper.py # GitHub repository scraper │ │ ├── pdf_scraper.py # PDF extraction │ │ ├── word_scraper.py # Word document scraper │ │ ├── video_scraper.py # Video extraction │ │ ├── video_setup.py # GPU detection & dependency installation │ │ ├── unified_scraper.py # Multi-source scraping │ │ ├── codebase_scraper.py # Local codebase analysis │ │ ├── enhance_command.py # AI enhancement command │ │ ├── enhance_skill_local.py # AI enhancement (local mode) │ │ ├── package_skill.py # Skill packager │ │ ├── upload_skill.py # Upload to platforms │ │ ├── cloud_storage_cli.py # Cloud storage CLI │ │ ├── benchmark_cli.py # Benchmarking CLI │ │ ├── sync_cli.py # Sync monitoring CLI │ │ └── workflows_command.py # Workflow management CLI │ ├── mcp/ # MCP server integration │ │ ├── server_fastmcp.py # FastMCP server (~708 lines) │ │ ├── server_legacy.py # Legacy server implementation │ │ ├── server.py # Server entry point │ │ ├── agent_detector.py # AI agent detection │ │ ├── git_repo.py # Git repository operations │ │ ├── source_manager.py # Config source management │ │ └── tools/ # MCP tool implementations │ │ ├── config_tools.py # Configuration tools │ │ ├── packaging_tools.py # Packaging tools │ │ ├── scraping_tools.py # Scraping tools │ │ ├── source_tools.py # Source management tools │ │ ├── splitting_tools.py # Config splitting tools │ │ ├── vector_db_tools.py # Vector database tools │ │ └── workflow_tools.py # Workflow management tools │ ├── sync/ # Sync monitoring module │ │ ├── detector.py # Change detection │ │ ├── models.py # Data models (Pydantic) │ │ ├── monitor.py # Monitoring logic │ │ └── notifier.py # Notification system │ ├── benchmark/ # Benchmarking framework │ │ ├── framework.py # Benchmark framework │ │ ├── models.py # Benchmark models │ │ └── runner.py # Benchmark runner │ ├── embedding/ # Embedding server │ │ ├── server.py # FastAPI embedding server │ │ ├── generator.py # Embedding generation │ │ ├── cache.py # Embedding cache │ │ └── models.py # Embedding models │ ├── workflows/ # YAML workflow presets (66 presets) │ ├── _version.py # Version information (reads from pyproject.toml) │ └── __init__.py # Package init ├── tests/ # Test suite (105+ test files) ├── configs/ # Preset configuration files ├── docs/ # Documentation (80+ markdown files) │ ├── integrations/ # Platform integration guides │ ├── guides/ # User guides │ ├── reference/ # API reference │ ├── features/ # Feature documentation │ ├── blog/ # Blog posts │ └── roadmap/ # Roadmap documents ├── examples/ # Usage examples ├── .github/workflows/ # CI/CD workflows ├── pyproject.toml # Main project configuration ├── requirements.txt # Pinned dependencies ├── mypy.ini # MyPy type checker configuration ├── Dockerfile # Main Docker image (multi-stage) ├── Dockerfile.mcp # MCP server Docker image └── docker-compose.yml # Full stack deployment ``` --- ## Build and Development Commands ### Prerequisites - Python 3.10 or higher - pip or uv package manager - Git (for GitHub scraping features) ### Setup (REQUIRED before any development) ```bash # Install in editable mode (REQUIRED for tests due to src/ layout) pip install -e . # Install with all platform dependencies pip install -e ".[all-llms]" # Install with all optional dependencies pip install -e ".[all]" # Install specific platforms only pip install -e ".[gemini]" # Google Gemini support pip install -e ".[openai]" # OpenAI ChatGPT support pip install -e ".[mcp]" # MCP server dependencies pip install -e ".[s3]" # AWS S3 support pip install -e ".[gcs]" # Google Cloud Storage pip install -e ".[azure]" # Azure Blob Storage pip install -e ".[embedding]" # Embedding server support pip install -e ".[rag-upload]" # Vector DB upload support # Install dev dependencies (using dependency-groups) pip install -e ".[dev]" ``` **CRITICAL:** The project uses a `src/` layout. Tests WILL FAIL unless you install with `pip install -e .` first. ### Building ```bash # Build package using uv (recommended) uv build # Or using standard build python -m build # Publish to PyPI uv publish ``` ### Docker ```bash # Build Docker image docker build -t skill-seekers . # Run with docker-compose (includes vector databases) docker-compose up -d # Run MCP server only docker-compose up -d mcp-server # View logs docker-compose logs -f mcp-server ``` --- ## Testing Instructions ### Running Tests **CRITICAL:** Never skip tests - all tests must pass before commits. ```bash # All tests (must run pip install -e . first!) pytest tests/ -v # Specific test file pytest tests/test_scraper_features.py -v pytest tests/test_mcp_fastmcp.py -v pytest tests/test_cloud_storage.py -v # With coverage pytest tests/ --cov=src/skill_seekers --cov-report=term --cov-report=html # Single test pytest tests/test_scraper_features.py::test_detect_language -v # E2E tests pytest tests/test_e2e_three_stream_pipeline.py -v # Skip slow tests pytest tests/ -v -m "not slow" # Run only integration tests pytest tests/ -v -m integration # Run only specific marker pytest tests/ -v -m "not slow and not integration" ``` ### Test Architecture - **105+ test files** covering all features - **CI Matrix:** Ubuntu + macOS, Python 3.10-3.12 - Test markers defined in `pyproject.toml`: | Marker | Description | |--------|-------------| | `slow` | Tests taking >5 seconds | | `integration` | Requires external services (APIs) | | `e2e` | End-to-end tests (resource-intensive) | | `venv` | Requires virtual environment setup | | `bootstrap` | Bootstrap skill specific | | `benchmark` | Performance benchmark tests | ### Test Configuration From `pyproject.toml`: ```toml [tool.pytest.ini_options] testpaths = ["tests"] python_files = ["test_*.py"] addopts = "-v --tb=short --strict-markers" asyncio_mode = "auto" asyncio_default_fixture_loop_scope = "function" ``` The `conftest.py` file checks that the package is installed before running tests. --- ## Code Style Guidelines ### Linting and Formatting ```bash # Run ruff linter ruff check src/ tests/ # Run ruff formatter check ruff format --check src/ tests/ # Auto-fix issues ruff check src/ tests/ --fix ruff format src/ tests/ # Run mypy type checker mypy src/skill_seekers --show-error-codes --pretty ``` ### Style Rules (from pyproject.toml) - **Line length:** 100 characters - **Target Python:** 3.10+ - **Enabled rules:** E, W, F, I, B, C4, UP, ARG, SIM - **Ignored rules:** E501, F541, ARG002, B007, I001, SIM114 - **Import sorting:** isort style with `skill_seekers` as first-party ### MyPy Configuration (from pyproject.toml) ```toml [tool.mypy] python_version = "3.10" warn_return_any = true warn_unused_configs = true disallow_untyped_defs = false disallow_incomplete_defs = false check_untyped_defs = true ignore_missing_imports = true show_error_codes = true pretty = true ``` ### Code Conventions 1. **Use type hints** where practical (gradual typing approach) 2. **Docstrings:** Use Google-style or standard docstrings 3. **Error handling:** Use specific exceptions, provide helpful messages 4. **Async code:** Use `asyncio`, mark tests with `@pytest.mark.asyncio` 5. **File naming:** Use snake_case for all Python files 6. **Class naming:** Use PascalCase for classes 7. **Function naming:** Use snake_case for functions and methods 8. **Constants:** Use UPPER_CASE for module-level constants --- ## Architecture Patterns ### Platform Adaptor Pattern (Strategy Pattern) All platform-specific logic is encapsulated in adaptors: ```python from skill_seekers.cli.adaptors import get_adaptor # Get platform-specific adaptor adaptor = get_adaptor('gemini') # or 'claude', 'openai', 'langchain', etc. # Package skill adaptor.package(skill_dir='output/react/', output_path='output/') # Upload to platform adaptor.upload( package_path='output/react-gemini.tar.gz', api_key=os.getenv('GOOGLE_API_KEY') ) ``` Each adaptor inherits from `SkillAdaptor` base class and implements: - `format_skill_md()` - Format SKILL.md content - `package()` - Create platform-specific package - `upload()` - Upload to platform API - `validate_api_key()` - Validate API key format - `supports_enhancement()` - Whether AI enhancement is supported ### CLI Architecture (Git-style) Entry point: `src/skill_seekers/cli/main.py` The CLI uses subcommands that delegate to existing modules: ```bash # skill-seekers scrape --config react.json # Transforms to: doc_scraper.main() with modified sys.argv ``` **Available subcommands:** - `create` - Unified create command - `config` - Configuration wizard - `scrape` - Documentation scraping - `github` - GitHub repository scraping - `pdf` - PDF extraction - `word` - Word document extraction - `video` - Video extraction (YouTube or local). Use `--setup` to auto-detect GPU and install visual deps. - `unified` - Multi-source scraping - `analyze` / `codebase` - Local codebase analysis - `enhance` - AI enhancement - `package` - Package skill for target platform - `upload` - Upload to platform - `cloud` - Cloud storage operations - `sync` - Sync monitoring - `benchmark` - Performance benchmarking - `embed` - Embedding server - `install` / `install-agent` - Complete workflow - `stream` - Streaming ingestion - `update` - Incremental updates - `multilang` - Multi-language support - `quality` - Quality metrics - `resume` - Resume interrupted jobs - `estimate` - Estimate page counts - `workflows` - Workflow management ### MCP Server Architecture Two implementations: - `server_fastmcp.py` - Modern, decorator-based (recommended, ~708 lines) - `server_legacy.py` - Legacy implementation Tools are organized by category: - Config tools (3 tools): generate_config, list_configs, validate_config - Scraping tools (10 tools): estimate_pages, scrape_docs, scrape_github, scrape_pdf, scrape_video (supports `setup` parameter for GPU detection and visual dep installation), scrape_codebase, detect_patterns, extract_test_examples, build_how_to_guides, extract_config_patterns - Packaging tools (4 tools): package_skill, upload_skill, enhance_skill, install_skill - Source tools (5 tools): fetch_config, submit_config, add_config_source, list_config_sources, remove_config_source - Splitting tools (2 tools): split_config, generate_router - Vector Database tools (4 tools): export_to_weaviate, export_to_chroma, export_to_faiss, export_to_qdrant - Workflow tools (5 tools): list_workflows, get_workflow, create_workflow, update_workflow, delete_workflow **Running MCP Server:** ```bash # Stdio transport (default) python -m skill_seekers.mcp.server_fastmcp # HTTP transport python -m skill_seekers.mcp.server_fastmcp --http --port 8765 ``` ### Cloud Storage Architecture Abstract base class pattern for cloud providers: - `base_storage.py` - Defines `BaseStorageAdaptor` interface - `s3_storage.py` - AWS S3 implementation - `gcs_storage.py` - Google Cloud Storage implementation - `azure_storage.py` - Azure Blob Storage implementation ### Sync Monitoring Architecture Pydantic-based models in `src/skill_seekers/sync/`: - `models.py` - Data models (SyncConfig, ChangeReport, SyncState) - `detector.py` - Change detection logic - `monitor.py` - Monitoring daemon - `notifier.py` - Notification system (webhook, email, slack) --- ## Git Workflow ### Branch Structure ``` main (production) ↑ │ (only maintainer merges) │ development (integration) ← default branch for PRs ↑ │ (all contributor PRs go here) │ feature branches ``` - **`main`** - Production, always stable, protected - **`development`** - Active development, default for PRs - **Feature branches** - Your work, created from `development` ### Creating a Feature Branch ```bash # 1. Checkout development git checkout development git pull upstream development # 2. Create feature branch git checkout -b my-feature # 3. Make changes, commit, push git add . git commit -m "Add my feature" git push origin my-feature # 4. Create PR targeting 'development' branch ``` --- ## CI/CD Configuration ### GitHub Actions Workflows All workflows are in `.github/workflows/`: **`tests.yml`:** - Runs on: push/PR to `main` and `development` - Lint job: Ruff + MyPy - Test matrix: Ubuntu + macOS, Python 3.10-3.12 - Coverage: Uploads to Codecov **`release.yml`:** - Triggered on version tags (`v*`) - Builds and publishes to PyPI using `uv` - Creates GitHub release with changelog **`docker-publish.yml`:** - Builds and publishes Docker images - Multi-architecture support (linux/amd64, linux/arm64) **`vector-db-export.yml`:** - Tests vector database exports **`scheduled-updates.yml`:** - Scheduled sync monitoring **`quality-metrics.yml`:** - Quality metrics tracking **`test-vector-dbs.yml`:** - Vector database integration tests ### Pre-commit Checks (Manual) ```bash # Before committing, run: ruff check src/ tests/ ruff format --check src/ tests/ pytest tests/ -v -x # Stop on first failure ``` --- ## Security Considerations ### API Keys and Secrets 1. **Never commit API keys** to the repository 2. **Use environment variables:** - `ANTHROPIC_API_KEY` - Claude AI - `GOOGLE_API_KEY` - Google Gemini - `OPENAI_API_KEY` - OpenAI - `GITHUB_TOKEN` - GitHub API - `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` - AWS S3 - `GOOGLE_APPLICATION_CREDENTIALS` - GCS - `AZURE_STORAGE_CONNECTION_STRING` - Azure 3. **Configuration storage:** - Stored at `~/.config/skill-seekers/config.json` - Permissions: 600 (owner read/write only) ### Rate Limit Handling - GitHub API has rate limits (5000 requests/hour for authenticated) - The tool has built-in rate limit handling with retry logic - Use `--non-interactive` flag for CI/CD environments ### Custom API Endpoints Support for Claude-compatible APIs: ```bash export ANTHROPIC_API_KEY=your-custom-api-key export ANTHROPIC_BASE_URL=https://custom-endpoint.com/v1 ``` --- ## Common Development Tasks ### Adding a New CLI Command 1. Create module in `src/skill_seekers/cli/my_command.py` 2. Implement `main()` function with argument parsing 3. Add entry point in `pyproject.toml`: ```toml [project.scripts] skill-seekers-my-command = "skill_seekers.cli.my_command:main" ``` 4. Add subcommand handler in `src/skill_seekers/cli/main.py` 5. Add argument parser in `src/skill_seekers/cli/parsers/` 6. Add tests in `tests/test_my_command.py` ### Adding a New Platform Adaptor 1. Create `src/skill_seekers/cli/adaptors/my_platform.py` 2. Inherit from `SkillAdaptor` base class 3. Implement required methods: `package()`, `upload()`, `format_skill_md()` 4. Register in `src/skill_seekers/cli/adaptors/__init__.py` 5. Add optional dependencies in `pyproject.toml` 6. Add tests in `tests/test_adaptors/` ### Adding an MCP Tool 1. Implement tool logic in `src/skill_seekers/mcp/tools/category_tools.py` 2. Register in `src/skill_seekers/mcp/server_fastmcp.py` 3. Add test in `tests/test_mcp_fastmcp.py` ### Adding Cloud Storage Provider 1. Create module in `src/skill_seekers/cli/storage/my_storage.py` 2. Inherit from `BaseStorageAdaptor` base class 3. Implement required methods: `upload_file()`, `download_file()`, `list_files()`, `delete_file()` 4. Register in `src/skill_seekers/cli/storage/__init__.py` 5. Add optional dependencies in `pyproject.toml` --- ## Documentation ### Project Documentation (New Structure - v3.1.0+) **Entry Points:** - **README.md** - Main project documentation with navigation - **docs/README.md** - Documentation hub - **AGENTS.md** - This file, for AI coding agents **Getting Started (for new users):** - `docs/getting-started/01-installation.md` - Installation guide - `docs/getting-started/02-quick-start.md` - 3 commands to first skill - `docs/getting-started/03-your-first-skill.md` - Complete walkthrough - `docs/getting-started/04-next-steps.md` - Where to go from here **User Guides (common tasks):** - `docs/user-guide/01-core-concepts.md` - How Skill Seekers works - `docs/user-guide/02-scraping.md` - All scraping options - `docs/user-guide/03-enhancement.md` - AI enhancement explained - `docs/user-guide/04-packaging.md` - Export to platforms - `docs/user-guide/05-workflows.md` - Enhancement workflows - `docs/user-guide/06-troubleshooting.md` - Common issues **Reference (technical details):** - `docs/reference/CLI_REFERENCE.md` - Complete command reference (20 commands) - `docs/reference/MCP_REFERENCE.md` - MCP tools reference (33 tools) - `docs/reference/CONFIG_FORMAT.md` - JSON configuration specification - `docs/reference/ENVIRONMENT_VARIABLES.md` - All environment variables **Advanced (power user topics):** - `docs/advanced/mcp-server.md` - MCP server setup - `docs/advanced/mcp-tools.md` - Advanced MCP usage - `docs/advanced/custom-workflows.md` - Creating custom workflows - `docs/advanced/multi-source.md` - Multi-source scraping ### Configuration Documentation Preset configs are in `configs/` directory: - `godot.json` / `godot_unified.json` - Godot Engine - `blender.json` / `blender-unified.json` - Blender Engine - `claude-code.json` - Claude Code - `httpx_comprehensive.json` - HTTPX library - `medusa-mercurjs.json` - Medusa/MercurJS - `astrovalley_unified.json` - Astrovalley - `react.json` - React documentation - `configs/integrations/` - Integration-specific configs --- ## Key Dependencies ### Core Dependencies (Required) | Package | Version | Purpose | |---------|---------|---------| | `requests` | >=2.32.5 | HTTP requests | | `beautifulsoup4` | >=4.14.2 | HTML parsing | | `PyGithub` | >=2.5.0 | GitHub API | | `GitPython` | >=3.1.40 | Git operations | | `httpx` | >=0.28.1 | Async HTTP | | `anthropic` | >=0.76.0 | Claude AI API | | `PyMuPDF` | >=1.24.14 | PDF processing | | `Pillow` | >=11.0.0 | Image processing | | `pytesseract` | >=0.3.13 | OCR | | `pydantic` | >=2.12.3 | Data validation | | `pydantic-settings` | >=2.11.0 | Settings management | | `click` | >=8.3.0 | CLI framework | | `Pygments` | >=2.19.2 | Syntax highlighting | | `pathspec` | >=0.12.1 | Path matching | | `networkx` | >=3.0 | Graph operations | | `schedule` | >=1.2.0 | Scheduled tasks | | `python-dotenv` | >=1.1.1 | Environment variables | | `jsonschema` | >=4.25.1 | JSON validation | | `PyYAML` | >=6.0 | YAML parsing | | `langchain` | >=1.2.10 | LangChain integration | | `llama-index` | >=0.14.15 | LlamaIndex integration | ### Optional Dependencies | Feature | Package | Install Command | |---------|---------|-----------------| | MCP Server | `mcp>=1.25,<2` | `pip install -e ".[mcp]"` | | Google Gemini | `google-generativeai>=0.8.0` | `pip install -e ".[gemini]"` | | OpenAI | `openai>=1.0.0` | `pip install -e ".[openai]"` | | AWS S3 | `boto3>=1.34.0` | `pip install -e ".[s3]"` | | Google Cloud Storage | `google-cloud-storage>=2.10.0` | `pip install -e ".[gcs]"` | | Azure Blob Storage | `azure-storage-blob>=12.19.0` | `pip install -e ".[azure]"` | | Word Documents | `mammoth>=1.6.0`, `python-docx>=1.1.0` | `pip install -e ".[docx]"` | | Video (lightweight) | `yt-dlp>=2024.12.0`, `youtube-transcript-api>=1.2.0` | `pip install -e ".[video]"` | | Video (full) | +`faster-whisper`, `scenedetect`, `opencv-python-headless` (`easyocr` now installed via `--setup`) | `pip install -e ".[video-full]"` | | Video (GPU setup) | Auto-detects GPU, installs PyTorch + easyocr + all visual deps | `skill-seekers video --setup` | | Chroma DB | `chromadb>=0.4.0` | `pip install -e ".[chroma]"` | | Weaviate | `weaviate-client>=3.25.0` | `pip install -e ".[weaviate]"` | | Pinecone | `pinecone>=5.0.0` | `pip install -e ".[pinecone]"` | | Embedding Server | `fastapi>=0.109.0`, `uvicorn>=0.27.0`, `sentence-transformers>=2.3.0` | `pip install -e ".[embedding]"` | ### Dev Dependencies (in dependency-groups) | Package | Version | Purpose | |---------|---------|---------| | `pytest` | >=8.4.2 | Testing framework | | `pytest-asyncio` | >=0.24.0 | Async test support | | `pytest-cov` | >=7.0.0 | Coverage | | `coverage` | >=7.11.0 | Coverage reporting | | `ruff` | >=0.14.13 | Linting/formatting | | `mypy` | >=1.19.1 | Type checking | | `psutil` | >=5.9.0 | Process utilities for testing | | `numpy` | >=1.24.0 | Numerical operations | | `starlette` | >=0.31.0 | HTTP transport testing | | `httpx` | >=0.24.0 | HTTP client for testing | | `boto3` | >=1.26.0 | AWS S3 testing | | `google-cloud-storage` | >=2.10.0 | GCS testing | | `azure-storage-blob` | >=12.17.0 | Azure testing | --- ## Troubleshooting ### Common Issues **ImportError: No module named 'skill_seekers'** - Solution: Run `pip install -e .` **Tests failing with "package not installed"** - Solution: Ensure you ran `pip install -e .` in the correct virtual environment **MCP server import errors** - Solution: Install with `pip install -e ".[mcp]"` **Type checking failures** - MyPy is configured to be lenient (gradual typing) - Focus on critical paths, not full coverage **Docker build failures** - Ensure you have BuildKit enabled: `DOCKER_BUILDKIT=1` - Check that all submodules are initialized: `git submodule update --init` **Rate limit errors from GitHub** - Set `GITHUB_TOKEN` environment variable for authenticated requests - Improves rate limit from 60 to 5000 requests/hour ### Getting Help - Check **TROUBLESHOOTING.md** for detailed solutions - Review **docs/FAQ.md** for common questions - Visit https://skillseekersweb.com/ for documentation - Open an issue on GitHub with: - Clear title and description - Steps to reproduce - Expected vs actual behavior - Environment details (OS, Python version) - Error messages and stack traces --- ## Environment Variables Reference | Variable | Purpose | Required For | |----------|---------|--------------| | `ANTHROPIC_API_KEY` | Claude AI API access | Claude enhancement/upload | | `GOOGLE_API_KEY` | Google Gemini API access | Gemini enhancement/upload | | `OPENAI_API_KEY` | OpenAI API access | OpenAI enhancement/upload | | `GITHUB_TOKEN` | GitHub API authentication | GitHub scraping (recommended) | | `AWS_ACCESS_KEY_ID` | AWS S3 authentication | S3 cloud storage | | `AWS_SECRET_ACCESS_KEY` | AWS S3 authentication | S3 cloud storage | | `GOOGLE_APPLICATION_CREDENTIALS` | GCS authentication path | GCS cloud storage | | `AZURE_STORAGE_CONNECTION_STRING` | Azure Blob authentication | Azure cloud storage | | `ANTHROPIC_BASE_URL` | Custom Claude endpoint | Custom API endpoints | | `SKILL_SEEKERS_HOME` | Data directory path | Docker/runtime | | `SKILL_SEEKERS_OUTPUT` | Output directory path | Docker/runtime | --- ## Version Management The version is defined in `pyproject.toml` and dynamically read by `src/skill_seekers/_version.py`: ```python # _version.py reads from pyproject.toml __version__ = get_version() # Returns version from pyproject.toml ``` **To update version:** 1. Edit `version` in `pyproject.toml` 2. The `_version.py` file will automatically pick up the new version --- ## Configuration File Format Skill Seekers uses JSON configuration files to define scraping targets. Example structure: ```json { "name": "godot", "description": "Godot Engine documentation", "merge_mode": "claude-enhanced", "sources": [ { "type": "documentation", "base_url": "https://docs.godotengine.org/en/stable/", "extract_api": true, "selectors": { "main_content": "div[role='main']", "title": "title", "code_blocks": "pre" }, "url_patterns": { "include": [], "exclude": ["/search.html", "/_static/"] }, "categories": { "getting_started": ["introduction", "getting_started"], "scripting": ["scripting", "gdscript"] }, "rate_limit": 0.5, "max_pages": 500 }, { "type": "github", "repo": "godotengine/godot", "enable_codebase_analysis": true, "code_analysis_depth": "deep", "fetch_issues": true, "max_issues": 100 } ] } ``` --- ## Workflow Presets Skill Seekers includes 66 YAML workflow presets for AI enhancement in `src/skill_seekers/workflows/`: **Built-in presets:** - `default.yaml` - Standard enhancement workflow - `minimal.yaml` - Fast, minimal enhancement - `security-focus.yaml` - Security-focused review - `architecture-comprehensive.yaml` - Deep architecture analysis - `api-documentation.yaml` - API documentation focus - And 61 more specialized presets... **Usage:** ```bash # Apply a preset skill-seekers create ./my-project --enhance-workflow security-focus # Chain multiple presets skill-seekers create ./my-project --enhance-workflow security-focus --enhance-workflow minimal # Manage presets skill-seekers workflows list skill-seekers workflows show security-focus skill-seekers workflows copy security-focus ``` --- *This document is maintained for AI coding agents. For human contributors, see README.md and CONTRIBUTING.md.* *Last updated: 2026-03-01*