# AGENTS.md - Skill Seekers

Concise reference for AI coding agents. Skill Seekers is a Python CLI tool (v3.2.0) that converts documentation sites, GitHub repos, PDFs, videos, notebooks, wikis, and more into AI-ready skills for 16+ LLM platforms and RAG pipelines.

## Setup

```bash
# REQUIRED before running tests (src/ layout: tests fail without this)
pip install -e .

# With dev tools
pip install -e ".[dev]"

# With all optional deps
pip install -e ".[all]"
```

## Build / Test / Lint Commands

```bash
# Run ALL tests (never skip tests; all must pass before commits)
pytest tests/ -v

# Run a single test file
pytest tests/test_scraper_features.py -v

# Run a single test function
pytest tests/test_scraper_features.py::test_detect_language -v

# Run a single test class method
pytest tests/test_adaptors/test_claude_adaptor.py::TestClaudeAdaptor::test_package -v

# Skip slow/integration tests
pytest tests/ -v -m "not slow and not integration"

# With coverage
pytest tests/ --cov=src/skill_seekers --cov-report=term

# Lint (ruff)
ruff check src/ tests/
ruff check src/ tests/ --fix

# Format (ruff)
ruff format --check src/ tests/
ruff format src/ tests/

# Type check (mypy)
mypy src/skill_seekers --show-error-codes --pretty
```

**Test markers:** `slow`, `integration`, `e2e`, `venv`, `bootstrap`, `benchmark`

**Async tests:** use `@pytest.mark.asyncio`; `asyncio_mode` is `auto`.
## Code Style

### Formatting Rules (ruff, from pyproject.toml)

- **Line length:** 100 characters
- **Target Python:** 3.10+
- **Enabled lint rules:** E, W, F, I, B, C4, UP, ARG, SIM
- **Ignored rules:** E501 (line length handled by formatter), F541 (f-string without placeholders), ARG002 (unused method args for interface compliance), B007 (intentional unused loop vars), I001 (formatter handles imports), SIM114 (readability preference)

### Imports

- Sort with isort (via ruff); `skill_seekers` is first-party
- Standard library → third-party → first-party, separated by blank lines
- Use `from __future__ import annotations` only if needed for forward refs
- Guard optional imports with try/except ImportError (see `adaptors/__init__.py` pattern)

### Naming Conventions

- **Files:** `snake_case.py`
- **Classes:** `PascalCase` (e.g., `SkillAdaptor`, `ClaudeAdaptor`)
- **Functions/methods:** `snake_case`
- **Constants:** `UPPER_CASE` (e.g., `ADAPTORS`, `DEFAULT_CHUNK_TOKENS`)
- **Private:** prefix with `_`

### Type Hints

- Gradual typing: add hints where practical; they are not enforced everywhere
- Use modern syntax: `str | None` not `Optional[str]`, `list[str]` not `List[str]`
- mypy config: `disallow_untyped_defs = false`, `check_untyped_defs = true`, `ignore_missing_imports = true`

### Docstrings

- Module-level docstring on every file (triple-quoted, describes purpose)
- Google-style or standard docstrings for public functions/classes
- Include `Args:`, `Returns:`, `Raises:` sections where useful

### Error Handling

- Use specific exceptions, never bare `except:`
- Provide helpful error messages with context (see `get_adaptor()` in `adaptors/__init__.py`)
- Use `raise ValueError(...)` for invalid arguments, `raise RuntimeError(...)` for state errors
- Guard optional dependency imports with try/except and give clear install instructions on failure

### Suppressing Lint Warnings

- Use inline `# noqa: XXXX` comments (e.g., `# noqa: F401` for re-exports, `# noqa: ARG001` for required but unused params)

## Supported Source Types (17)

| Type | CLI Command | Config Type | Detection |
|------|-------------|-------------|-----------|
| Documentation (web) | `scrape` / `create ` | `documentation` | HTTP/HTTPS URLs |
| GitHub repo | `github` / `create owner/repo` | `github` | `owner/repo` or github.com URLs |
| PDF | `pdf` / `create file.pdf` | `pdf` | `.pdf` extension |
| Word (.docx) | `word` / `create file.docx` | `word` | `.docx` extension |
| EPUB | `epub` / `create file.epub` | `epub` | `.epub` extension |
| Video | `video` / `create ` | `video` | YouTube/Vimeo URLs, video extensions |
| Local codebase | `analyze` / `create ./path` | `local` | Directory paths |
| Jupyter Notebook | `jupyter` / `create file.ipynb` | `jupyter` | `.ipynb` extension |
| Local HTML | `html` / `create file.html` | `html` | `.html`/`.htm` extensions |
| OpenAPI/Swagger | `openapi` / `create spec.yaml` | `openapi` | `.yaml`/`.yml` with OpenAPI content |
| AsciiDoc | `asciidoc` / `create file.adoc` | `asciidoc` | `.adoc`/`.asciidoc` extensions |
| PowerPoint | `pptx` / `create file.pptx` | `pptx` | `.pptx` extension |
| RSS/Atom | `rss` / `create feed.rss` | `rss` | `.rss`/`.atom` extensions |
| Man pages | `manpage` / `create cmd.1` | `manpage` | `.1`-`.8`/`.man` extensions |
| Confluence | `confluence` | `confluence` | API or export directory |
| Notion | `notion` | `notion` | API or export directory |
| Slack/Discord | `chat` | `chat` | Export directory or API |

## Project Layout

```
src/skill_seekers/            # Main package (src/ layout)
  cli/                        # CLI commands and entry points
    adaptors/                 # Platform adaptors (Strategy pattern, inherit SkillAdaptor)
    arguments/                # CLI argument definitions (one per source type)
    parsers/                  # Subcommand parsers (one per source type)
    storage/                  # Cloud storage (inherit BaseStorageAdaptor)
    main.py                   # Unified CLI entry point (COMMAND_MODULES dict)
    source_detector.py        # Auto-detects source type from user input
    create_command.py         # Unified `create` command routing
    config_validator.py       # VALID_SOURCE_TYPES set + per-type validation
    unified_scraper.py        # Multi-source orchestrator (scraped_data + dispatch)
    unified_skill_builder.py  # Pairwise synthesis + generic merge
  mcp/                        # MCP server (FastMCP + legacy)
    tools/                    # MCP tool implementations by category
  sync/                       # Sync monitoring (Pydantic models)
  benchmark/                  # Benchmarking framework
  embedding/                  # FastAPI embedding server
  workflows/                  # 67 YAML workflow presets (includes complex-merge.yaml)
  _version.py                 # Reads version from pyproject.toml
tests/                        # 115+ test files (pytest)
configs/                      # Preset JSON scraping configs
docs/                         # 80+ markdown doc files
```

## Key Patterns

**Adaptor (Strategy) pattern**: all platform logic lives in `cli/adaptors/`. Inherit `SkillAdaptor` and implement `format_skill_md()`, `package()`, and `upload()`. Register the adaptor in the `ADAPTORS` dict in `adaptors/__init__.py`.

**Scraper pattern**: each source type has `cli/_scraper.py` (with a `ToSkillConverter` class + `main()`), `arguments/.py`, and `parsers/_parser.py`. Register it in the `PARSERS` list in `parsers/__init__.py`, the `COMMAND_MODULES` dict in `main.py`, and the `VALID_SOURCE_TYPES` set in `config_validator.py`.

**Unified pipeline**: `unified_scraper.py` dispatches to per-type `_scrape_()` methods. `unified_skill_builder.py` uses pairwise synthesis for docs+github+pdf combinations and `_generic_merge()` for all other combinations.

**MCP tools**: grouped by category in `mcp/tools/`. `scrape_generic_tool` handles all new source types.

**CLI subcommands**: git-style, defined in `cli/main.py`. Each subcommand delegates to a module's `main()` function.

## Git Workflow

- **`main`**: production, protected
- **`development`**: default PR target, active development
- Feature branches are created from `development`

## Pre-commit Checklist

```bash
ruff check src/ tests/
ruff format --check src/ tests/
pytest tests/ -v -x  # stop on first failure
```

Never commit API keys. Use env vars: `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, `OPENAI_API_KEY`, `GITHUB_TOKEN`.
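The adaptor (Strategy) pattern from Key Patterns, combined with the error-handling and API-key rules, can be sketched roughly as follows. This is a minimal, hypothetical illustration: the method parameters and return types, `ExampleAdaptor`, `EXAMPLE_API_KEY`, and the body of `get_adaptor()` are assumptions, not code taken from `cli/adaptors/`, so check the real `SkillAdaptor` before writing a new adaptor.

```python
"""Hypothetical sketch of the adaptor (Strategy) pattern.

Signatures are illustrative; the real SkillAdaptor in cli/adaptors/ may differ.
"""

import os
from abc import ABC, abstractmethod
from pathlib import Path


class SkillAdaptor(ABC):
    """Base class: one subclass per target platform."""

    @abstractmethod
    def format_skill_md(self, skill_dir: Path) -> str:
        """Render the platform-specific SKILL.md content."""

    @abstractmethod
    def package(self, skill_dir: Path) -> Path:
        """Bundle the skill for upload and return the artifact path."""

    @abstractmethod
    def upload(self, package_path: Path) -> dict[str, str]:
        """Upload the packaged skill and return platform metadata."""


class ExampleAdaptor(SkillAdaptor):
    """Hypothetical adaptor used only to illustrate the pattern."""

    def format_skill_md(self, skill_dir: Path) -> str:
        return f"# {skill_dir.name}\n"

    def package(self, skill_dir: Path) -> Path:
        # Illustrative only: derive the archive path without writing anything.
        return skill_dir.with_suffix(".zip")

    def upload(self, package_path: Path) -> dict[str, str]:
        # Read the key from the environment; never hard-code it.
        api_key = os.environ.get("EXAMPLE_API_KEY")
        if not api_key:
            # Missing configuration is a state error, so RuntimeError.
            raise RuntimeError("EXAMPLE_API_KEY is not set; export it before uploading")
        return {"status": "uploaded", "file": package_path.name}


# Registry keyed by platform name (mirrors the ADAPTORS dict convention).
ADAPTORS: dict[str, type[SkillAdaptor]] = {"example": ExampleAdaptor}


def get_adaptor(platform: str) -> SkillAdaptor:
    """Look up an adaptor by name, raising ValueError with the valid options."""
    try:
        return ADAPTORS[platform]()
    except KeyError:
        available = ", ".join(sorted(ADAPTORS))
        raise ValueError(
            f"Unknown platform {platform!r}. Available platforms: {available}"
        ) from None
```

In this sketch, supporting a new platform means adding one subclass and one `ADAPTORS` entry; callers only go through `get_adaptor()`, so they never import platform classes directly, which is the point of the Strategy pattern.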
## CI

GitHub Actions (`.github/workflows/tests.yml`): a ruff + mypy lint job, then a pytest matrix (Ubuntu + macOS, Python 3.10-3.12) with Codecov upload.
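To reproduce the CI lint and test jobs locally before opening a PR against `development`, an agent could run the same commands in sequence and stop at the first failure. The sketch below is a hypothetical helper (`CI_STEPS`, `run_steps`, and `python_in_ci_matrix` are not part of the project). The command list is copied from the Build / Test / Lint and Pre-commit sections; the exact flags in `tests.yml` may differ, and the OS matrix and Codecov upload are not replicated.

```python
"""Hypothetical helper that runs the CI lint + test steps locally, in order."""

import shutil
import subprocess
import sys
from collections.abc import Sequence

# Assumed to mirror CI; taken from the commands documented in this file.
CI_STEPS: list[list[str]] = [
    ["ruff", "check", "src/", "tests/"],
    ["ruff", "format", "--check", "src/", "tests/"],
    ["mypy", "src/skill_seekers", "--show-error-codes", "--pretty"],
    ["pytest", "tests/", "-v", "-x"],
]


def run_steps(steps: Sequence[Sequence[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, or 0."""
    for cmd in steps:
        if shutil.which(cmd[0]) is None:
            # 127 is the conventional shell exit code for "command not found".
            print(f'missing tool: {cmd[0]} (try `pip install -e ".[dev]"`)')
            return 127
        print("$", " ".join(cmd))
        result = subprocess.run(list(cmd), check=False)
        if result.returncode != 0:
            return result.returncode
    return 0


def python_in_ci_matrix() -> bool:
    """Return True if the local interpreter is within the CI range 3.10-3.12."""
    return (3, 10) <= sys.version_info[:2] <= (3, 12)
```

Usage from the repo root would be `sys.exit(run_steps(CI_STEPS))`; stopping at the first failing step keeps the feedback loop as short as `pytest -x` does.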