diff --git a/.gitignore b/.gitignore index 4450b38..8569091 100644 --- a/.gitignore +++ b/.gitignore @@ -61,5 +61,6 @@ htmlcov/ skill-seekers-configs/ .claude/skills .mcp.json +!distribution/claude-plugin/.mcp.json settings.json USER_GUIDE.md diff --git a/AGENTS.md b/AGENTS.md index d26c952..1afdc9a 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,18 +1,20 @@ # AGENTS.md - Skill Seekers -Concise reference for AI coding agents. Skill Seekers is a Python CLI tool (v3.2.0) that converts documentation sites, GitHub repos, PDFs, videos, notebooks, wikis, and more into AI-ready skills for 16+ LLM platforms and RAG pipelines. +Concise reference for AI coding agents. Skill Seekers is a Python CLI tool (v3.3.0) that converts documentation sites, GitHub repos, PDFs, videos, notebooks, wikis, and more into AI-ready skills for 16+ LLM platforms and RAG pipelines. ## Setup ```bash -# REQUIRED before running tests (src/ layout — tests fail without this) +# REQUIRED before running tests (src/ layout — tests hard-exit if package not installed) pip install -e . -# With dev tools +# With dev tools (pytest, ruff, mypy, coverage) pip install -e ".[dev]" # With all optional deps pip install -e ".[all]" ``` +Note: `tests/conftest.py` checks that `skill_seekers` is importable and calls `sys.exit(1)` if not. Always install in editable mode first. + ## Build / Test / Lint Commands ```bash @@ -46,8 +48,10 @@ ruff format src/ tests/ mypy src/skill_seekers --show-error-codes --pretty ``` -**Test markers:** `slow`, `integration`, `e2e`, `venv`, `bootstrap`, `benchmark` -**Async tests:** use `@pytest.mark.asyncio`; asyncio_mode is `auto`. +**Pytest config** (from pyproject.toml): `addopts = "-v --tb=short --strict-markers"`, `asyncio_mode = "auto"`, `asyncio_default_fixture_loop_scope = "function"`. +**Test markers:** `slow`, `integration`, `e2e`, `venv`, `bootstrap`, `benchmark`, `asyncio`. +**Async tests:** use `@pytest.mark.asyncio`; asyncio_mode is `auto` so the decorator is often implicit. 
+**Test count:** 120 test files (107 in `tests/`, 13 in `tests/test_adaptors/`). ## Code Style @@ -61,61 +65,47 @@ mypy src/skill_seekers --show-error-codes --pretty - Sort with isort (via ruff); `skill_seekers` is first-party - Standard library → third-party → first-party, separated by blank lines - Use `from __future__ import annotations` only if needed for forward refs -- Guard optional imports with try/except ImportError (see `adaptors/__init__.py` pattern) +- Guard optional imports with try/except ImportError (see `adaptors/__init__.py` pattern): + ```python + try: + from .claude import ClaudeAdaptor + except ImportError: + ClaudeAdaptor = None + ``` ### Naming Conventions -- **Files:** `snake_case.py` -- **Classes:** `PascalCase` (e.g., `SkillAdaptor`, `ClaudeAdaptor`) -- **Functions/methods:** `snake_case` -- **Constants:** `UPPER_CASE` (e.g., `ADAPTORS`, `DEFAULT_CHUNK_TOKENS`) -- **Private:** prefix with `_` +- **Files:** `snake_case.py` (e.g., `source_detector.py`, `config_validator.py`) +- **Classes:** `PascalCase` (e.g., `SkillAdaptor`, `ClaudeAdaptor`, `SourceDetector`) +- **Functions/methods:** `snake_case` (e.g., `get_adaptor()`, `detect_language()`) +- **Constants:** `UPPER_CASE` (e.g., `ADAPTORS`, `DEFAULT_CHUNK_TOKENS`, `VALID_SOURCE_TYPES`) +- **Private:** prefix with `_` (e.g., `_read_existing_content()`, `_validate_unified()`) ### Type Hints - Gradual typing — add hints where practical, not enforced everywhere - Use modern syntax: `str | None` not `Optional[str]`, `list[str]` not `List[str]` - MyPy config: `disallow_untyped_defs = false`, `check_untyped_defs = true`, `ignore_missing_imports = true` +- Tests are excluded from strict type checking (`disallow_untyped_defs = false`, `check_untyped_defs = false` for `tests.*`) ### Docstrings - Module-level docstring on every file (triple-quoted, describes purpose) -- Google-style or standard docstrings for public functions/classes +- Google-style docstrings for public functions/classes - Include 
`Args:`, `Returns:`, `Raises:` sections where useful ### Error Handling - Use specific exceptions, never bare `except:` -- Provide helpful error messages with context (see `get_adaptor()` in `adaptors/__init__.py`) +- Provide helpful error messages with context - Use `raise ValueError(...)` for invalid arguments, `raise RuntimeError(...)` for state errors - Guard optional dependency imports with try/except and give clear install instructions on failure +- Chain exceptions with `raise ... from e` when wrapping ### Suppressing Lint Warnings - Use inline `# noqa: XXXX` comments (e.g., `# noqa: F401` for re-exports, `# noqa: ARG001` for required but unused params) -## Supported Source Types (17) - -| Type | CLI Command | Config Type | Detection | -|------|------------|-------------|-----------| -| Documentation (web) | `scrape` / `create ` | `documentation` | HTTP/HTTPS URLs | -| GitHub repo | `github` / `create owner/repo` | `github` | `owner/repo` or github.com URLs | -| PDF | `pdf` / `create file.pdf` | `pdf` | `.pdf` extension | -| Word (.docx) | `word` / `create file.docx` | `word` | `.docx` extension | -| EPUB | `epub` / `create file.epub` | `epub` | `.epub` extension | -| Video | `video` / `create ` | `video` | YouTube/Vimeo URLs, video extensions | -| Local codebase | `analyze` / `create ./path` | `local` | Directory paths | -| Jupyter Notebook | `jupyter` / `create file.ipynb` | `jupyter` | `.ipynb` extension | -| Local HTML | `html` / `create file.html` | `html` | `.html`/`.htm` extensions | -| OpenAPI/Swagger | `openapi` / `create spec.yaml` | `openapi` | `.yaml`/`.yml` with OpenAPI content | -| AsciiDoc | `asciidoc` / `create file.adoc` | `asciidoc` | `.adoc`/`.asciidoc` extensions | -| PowerPoint | `pptx` / `create file.pptx` | `pptx` | `.pptx` extension | -| RSS/Atom | `rss` / `create feed.rss` | `rss` | `.rss`/`.atom` extensions | -| Man pages | `manpage` / `create cmd.1` | `manpage` | `.1`-`.8`/`.man` extensions | -| Confluence | `confluence` | 
`confluence` | API or export directory | -| Notion | `notion` | `notion` | API or export directory | -| Slack/Discord | `chat` | `chat` | Export directory or API | - ## Project Layout ``` src/skill_seekers/ # Main package (src/ layout) - cli/ # CLI commands and entry points + cli/ # CLI commands and entry points (96 files) adaptors/ # Platform adaptors (Strategy pattern, inherit SkillAdaptor) arguments/ # CLI argument definitions (one per source type) parsers/ # Subcommand parsers (one per source type) @@ -127,15 +117,15 @@ src/skill_seekers/ # Main package (src/ layout) unified_scraper.py # Multi-source orchestrator (scraped_data + dispatch) unified_skill_builder.py # Pairwise synthesis + generic merge mcp/ # MCP server (FastMCP + legacy) - tools/ # MCP tool implementations by category + tools/ # MCP tool implementations by category (10 files) sync/ # Sync monitoring (Pydantic models) benchmark/ # Benchmarking framework embedding/ # FastAPI embedding server - workflows/ # 67 YAML workflow presets (includes complex-merge.yaml) + workflows/ # 67 YAML workflow presets _version.py # Reads version from pyproject.toml -tests/ # 115+ test files (pytest) +tests/ # 120 test files (pytest) configs/ # Preset JSON scraping configs -docs/ # 80+ markdown doc files +docs/ # Documentation (guides, integrations, architecture) ``` ## Key Patterns @@ -150,6 +140,8 @@ docs/ # 80+ markdown doc files **CLI subcommands** — git-style in `cli/main.py`. Each delegates to a module's `main()` function. +**Supported source types (17):** documentation (web), github, pdf, word, epub, video, local codebase, jupyter, html, openapi, asciidoc, pptx, rss, manpage, confluence, notion, chat (slack/discord). Each detected automatically by `source_detector.py`. + ## Git Workflow - **`main`** — production, protected @@ -168,4 +160,11 @@ Never commit API keys. 
Use env vars: `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, `OPE ## CI -GitHub Actions (`.github/workflows/tests.yml`): ruff + mypy lint job, then pytest matrix (Ubuntu + macOS, Python 3.10-3.12) with Codecov upload. +GitHub Actions (7 workflows in `.github/workflows/`): +- **tests.yml** — ruff + mypy lint job, then pytest matrix (Ubuntu + macOS, Python 3.10-3.12) with Codecov upload +- **release.yml** — tag-triggered: tests → version verification → PyPI publish via `uv build` +- **test-vector-dbs.yml** — tests vector DB adaptors (weaviate, chroma, faiss, qdrant) +- **docker-publish.yml** — multi-platform Docker builds (amd64, arm64) for CLI + MCP images +- **quality-metrics.yml** — quality analysis with configurable threshold +- **scheduled-updates.yml** — weekly skill updates for popular frameworks +- **vector-db-export.yml** — weekly vector DB exports diff --git a/Dockerfile.mcp b/Dockerfile.mcp index 6e7cc3e..7baba55 100644 --- a/Dockerfile.mcp +++ b/Dockerfile.mcp @@ -4,8 +4,8 @@ FROM python:3.12-slim LABEL maintainer="Skill Seekers " -LABEL description="Skill Seekers MCP Server - 25 tools for AI skills generation" -LABEL version="2.9.0" +LABEL description="Skill Seekers MCP Server - 35 tools for AI skills generation" +LABEL version="3.3.0" WORKDIR /app @@ -48,9 +48,10 @@ HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \ # Volumes VOLUME ["/data", "/configs", "/output"] -# Expose MCP server port -EXPOSE 8765 +# Expose MCP server port (default 8765, overridden by $PORT on cloud platforms) +EXPOSE ${MCP_PORT:-8765} # Start MCP server in HTTP mode by default -# Use --transport stdio for stdio mode -CMD ["python", "-m", "skill_seekers.mcp.server_fastmcp", "--transport", "http", "--port", "8765"] +# Uses shell form so $PORT/$MCP_PORT env vars are expanded at runtime +# Cloud platforms (Render, Railway, etc.) 
set $PORT automatically +CMD python -m skill_seekers.mcp.server_fastmcp --http --host 0.0.0.0 --port ${PORT:-${MCP_PORT:-8765}} diff --git a/distribution/claude-plugin/.claude-plugin/plugin.json b/distribution/claude-plugin/.claude-plugin/plugin.json new file mode 100644 index 0000000..1b3ed69 --- /dev/null +++ b/distribution/claude-plugin/.claude-plugin/plugin.json @@ -0,0 +1,11 @@ +{ + "name": "skill-seekers", + "description": "Transform 17 source types (docs, GitHub, PDFs, videos, Jupyter, Confluence, Notion, Slack, and more) into AI-ready skills and RAG knowledge for 16+ LLM platforms.", + "version": "3.3.0", + "author": { + "name": "Yusuf Karaaslan" + }, + "homepage": "https://github.com/yusufkaraaslan/Skill_Seekers", + "repository": "https://github.com/yusufkaraaslan/Skill_Seekers", + "license": "MIT" +} diff --git a/distribution/claude-plugin/.mcp.json b/distribution/claude-plugin/.mcp.json new file mode 100644 index 0000000..c0fa9c9 --- /dev/null +++ b/distribution/claude-plugin/.mcp.json @@ -0,0 +1,6 @@ +{ + "skill-seekers": { + "command": "python", + "args": ["-m", "skill_seekers.mcp.server_fastmcp"] + } +} diff --git a/distribution/claude-plugin/README.md b/distribution/claude-plugin/README.md new file mode 100644 index 0000000..a433fc1 --- /dev/null +++ b/distribution/claude-plugin/README.md @@ -0,0 +1,93 @@ +# Skill Seekers — Claude Code Plugin + +Transform 17 source types into AI-ready skills and RAG knowledge, directly from Claude Code. + +## Installation + +### From the Official Plugin Directory + +``` +/plugin install skill-seekers@claude-plugin-directory +``` + +Or browse for it in `/plugin > Discover`. 
+ +### Local Installation (for development) + +```bash +claude --plugin-dir ./path/to/skill-seekers-plugin +``` + +### Prerequisites + +The plugin requires `skill-seekers` to be installed: + +```bash +pip install "skill-seekers[mcp]" +``` + +## What's Included + +### MCP Server (35 tools) + +The plugin bundles the Skill Seekers MCP server providing tools for: +- Scraping documentation, GitHub repos, PDFs, videos, and 13 other source types +- Packaging skills for 16+ LLM platforms +- Exporting to vector databases (Weaviate, Chroma, FAISS, Qdrant) +- Managing configs, workflows, and sources + +### Slash Commands + +| Command | Description | +|---------|-------------| +| `/skill-seekers:create-skill ` | Create a skill from any source (auto-detects type) | +| `/skill-seekers:sync-config ` | Sync config URLs against live docs | +| `/skill-seekers:install-skill ` | End-to-end: fetch, scrape, enhance, package, install | + +### Agent Skill + +The **skill-builder** skill is automatically available to Claude. It detects source types and uses the appropriate MCP tools to build skills autonomously. + +## Usage Examples + +``` +# Create a skill from a documentation site +/skill-seekers:create-skill https://react.dev + +# Create from a GitHub repo, targeting LangChain +/skill-seekers:create-skill pallets/flask --target langchain + +# Full install workflow with AI enhancement +/skill-seekers:install-skill https://fastapi.tiangolo.com --enhance + +# Sync an existing config +/skill-seekers:sync-config react +``` + +Or just ask Claude naturally: +> "Create an AI skill from the React documentation" +> "Scrape the Flask GitHub repo and package it for OpenAI" +> "Export my skill to a Chroma vector database" + +The skill-builder agent skill will automatically detect the intent and use the right tools. + +## Remote MCP Alternative + +By default, the plugin runs the MCP server locally via `python -m skill_seekers.mcp.server_fastmcp`. 
To use a remote server instead, edit `.mcp.json`: + +```json +{ + "skill-seekers": { + "type": "http", + "url": "https://your-hosted-server.com/mcp" + } +} +``` + +## Supported Source Types + +Documentation (web), GitHub repos, PDFs, Word docs, EPUBs, videos, local codebases, Jupyter notebooks, HTML files, OpenAPI specs, AsciiDoc, PowerPoint, RSS/Atom feeds, man pages, Confluence, Notion, Slack/Discord exports. + +## License + +MIT — https://github.com/yusufkaraaslan/Skill_Seekers diff --git a/distribution/claude-plugin/commands/create-skill.md b/distribution/claude-plugin/commands/create-skill.md new file mode 100644 index 0000000..7c0a584 --- /dev/null +++ b/distribution/claude-plugin/commands/create-skill.md @@ -0,0 +1,52 @@ +--- +description: Create an AI skill from any source (URL, repo, PDF, video, notebook, etc.) +--- + +# Create Skill + +Create an AI-ready skill from a source. The source type is auto-detected. + +## Usage + +``` +/skill-seekers:create-skill [--target ] [--output ] +``` + +## Instructions + +When the user provides a source via `$ARGUMENTS`, run the `skill-seekers create` command to generate a skill. + +1. Parse the arguments: extract the source (first argument) and any flags. +2. If no `--target` is specified, default to `claude`. +3. If no `--output` is specified, default to `./output`. +4. Run the command: + ```bash + skill-seekers create "$SOURCE" --target "$TARGET" --output "$OUTPUT" + ``` +5. After completion, read the generated `SKILL.md` and summarize what was created. + +## Source Types (auto-detected) + +- **URL** (https://...) 
→ Documentation scraping +- **owner/repo** or github.com URL → GitHub repo analysis +- **file.pdf** → PDF extraction +- **file.ipynb** → Jupyter notebook +- **file.docx** → Word document +- **file.epub** → EPUB book +- **YouTube/Vimeo URL** → Video transcript +- **./directory** → Local codebase analysis +- **file.yaml** with OpenAPI → API spec +- **file.pptx** → PowerPoint +- **file.adoc** → AsciiDoc +- **file.html** → HTML page +- **file.rss** → RSS/Atom feed +- **cmd.1** → Man page + +## Examples + +``` +/skill-seekers:create-skill https://react.dev +/skill-seekers:create-skill pallets/flask --target langchain +/skill-seekers:create-skill ./docs/api.pdf --target openai +/skill-seekers:create-skill https://youtube.com/watch?v=abc123 +``` diff --git a/distribution/claude-plugin/commands/install-skill.md b/distribution/claude-plugin/commands/install-skill.md new file mode 100644 index 0000000..63595fa --- /dev/null +++ b/distribution/claude-plugin/commands/install-skill.md @@ -0,0 +1,44 @@ +--- +description: One-command skill installation — fetch config, scrape, enhance, package, and install +--- + +# Install Skill + +Complete end-to-end workflow: fetch a config (from preset or URL), scrape the source, optionally enhance with AI, package for the target platform, and install. + +## Usage + +``` +/skill-seekers:install-skill [--target ] [--enhance] +``` + +## Instructions + +When the user provides a source or config via `$ARGUMENTS`: + +1. Determine if the argument is a config preset name, config file path, or a direct source. +2. Use the `install_skill` MCP tool if available, or run the equivalent CLI commands: + ```bash + # For preset configs + skill-seekers install --config "$CONFIG" --target "$TARGET" + + # For direct sources + skill-seekers create "$SOURCE" --target "$TARGET" + ``` +3. If `--enhance` is specified, run enhancement after initial scraping: + ```bash + skill-seekers enhance "$SKILL_DIR" --target "$TARGET" + ``` +4. 
Report the final skill location and how to use it. + +## Target Platforms + +`claude`, `openai`, `gemini`, `langchain`, `llamaindex`, `haystack`, `cursor`, `windsurf`, `continue`, `cline`, `markdown` + +## Examples + +``` +/skill-seekers:install-skill react --target claude +/skill-seekers:install-skill https://fastapi.tiangolo.com --target langchain --enhance +/skill-seekers:install-skill pallets/flask +``` diff --git a/distribution/claude-plugin/commands/sync-config.md b/distribution/claude-plugin/commands/sync-config.md new file mode 100644 index 0000000..273bd15 --- /dev/null +++ b/distribution/claude-plugin/commands/sync-config.md @@ -0,0 +1,32 @@ +--- +description: Sync a scraping config's URLs against the live documentation site +--- + +# Sync Config + +Synchronize a Skill Seekers config file with the current state of a documentation site. Detects new pages, removed pages, and URL changes. + +## Usage + +``` +/skill-seekers:sync-config +``` + +## Instructions + +When the user provides a config path or preset name via `$ARGUMENTS`: + +1. If it's a preset name (e.g., `react`, `godot`), look for it in the `configs/` directory or fetch from the API. +2. Run the sync command: + ```bash + skill-seekers sync-config "$CONFIG" + ``` +3. Report what changed: new URLs found, removed URLs, and any conflicts. +4. Ask the user if they want to update the config and re-scrape. + +## Examples + +``` +/skill-seekers:sync-config configs/react.json +/skill-seekers:sync-config react +``` diff --git a/distribution/claude-plugin/skills/skill-builder/SKILL.md b/distribution/claude-plugin/skills/skill-builder/SKILL.md new file mode 100644 index 0000000..c0d8b40 --- /dev/null +++ b/distribution/claude-plugin/skills/skill-builder/SKILL.md @@ -0,0 +1,69 @@ +--- +name: skill-builder +description: Automatically detect source types and build AI skills using Skill Seekers. Use when the user wants to create skills from documentation, repos, PDFs, videos, or other knowledge sources. 
+--- + +# Skill Builder + +You have access to the Skill Seekers MCP server which provides 35 tools for converting knowledge sources into AI-ready skills. + +## When to Use This Skill + +Use this skill when the user: +- Wants to create an AI skill from a documentation site, GitHub repo, PDF, video, or other source +- Needs to convert documentation into a format suitable for LLM consumption +- Wants to update or sync existing skills with their source documentation +- Needs to export skills to vector databases (Weaviate, Chroma, FAISS, Qdrant) +- Asks about scraping, converting, or packaging documentation for AI + +## Source Type Detection + +Automatically detect the source type from user input: + +| Input Pattern | Source Type | Tool to Use | +|---------------|-------------|-------------| +| `https://...` (not GitHub/YouTube) | Documentation | `scrape_docs` | +| `owner/repo` or `github.com/...` | GitHub | `scrape_github` | +| `*.pdf` | PDF | `scrape_pdf` | +| YouTube/Vimeo URL or video file | Video | `scrape_video` | +| Local directory path | Codebase | `scrape_codebase` | +| `*.ipynb`, `*.html`, `*.yaml` (OpenAPI), `*.adoc`, `*.pptx`, `*.rss`, `*.1`-`.8` | Various | `scrape_generic` | +| JSON config file | Unified | Use config with `scrape_docs` | + +## Recommended Workflow + +1. **Detect source type** from the user's input +2. **Generate or fetch config** using `generate_config` or `fetch_config` if needed +3. **Estimate scope** with `estimate_pages` for documentation sites +4. **Scrape the source** using the appropriate scraping tool +5. **Enhance** with `enhance_skill` if the user wants AI-powered improvements +6. **Package** with `package_skill` for the target platform +7. 
**Export to vector DB** if requested using `export_to_*` tools + +## Available MCP Tools + +### Config Management +- `generate_config` — Generate a scraping config from a URL +- `list_configs` — List available preset configs +- `validate_config` — Validate a config file + +### Scraping (use based on source type) +- `scrape_docs` — Documentation sites +- `scrape_github` — GitHub repositories +- `scrape_pdf` — PDF files +- `scrape_video` — Video transcripts +- `scrape_codebase` — Local code analysis +- `scrape_generic` — Jupyter, HTML, OpenAPI, AsciiDoc, PPTX, RSS, manpage, Confluence, Notion, chat + +### Post-processing +- `enhance_skill` — AI-powered skill enhancement +- `package_skill` — Package for target platform +- `upload_skill` — Upload to platform API +- `install_skill` — End-to-end install workflow + +### Advanced +- `detect_patterns` — Design pattern detection in code +- `extract_test_examples` — Extract usage examples from tests +- `build_how_to_guides` — Generate how-to guides from tests +- `split_config` — Split large configs into focused skills +- `export_to_weaviate`, `export_to_chroma`, `export_to_faiss`, `export_to_qdrant` — Vector DB export diff --git a/distribution/github-action/README.md b/distribution/github-action/README.md new file mode 100644 index 0000000..dd9f46e --- /dev/null +++ b/distribution/github-action/README.md @@ -0,0 +1,147 @@ +# Skill Seekers GitHub Action + +Transform documentation, GitHub repos, PDFs, videos, and 13 other source types into AI-ready skills and RAG knowledge — directly in your CI/CD pipeline. 
+ +## Quick Start + +```yaml +- uses: yusufkaraaslan/skill-seekers-action@v3 + with: + source: 'https://react.dev' +``` + +## Inputs + +| Input | Required | Default | Description | +|-------|----------|---------|-------------| +| `source` | Yes | — | Source URL, file path, or `owner/repo` | +| `command` | No | `create` | Command: `create`, `scrape`, `github`, `pdf`, `video`, `analyze`, `unified` | +| `target` | No | `claude` | Target platform: `claude`, `openai`, `gemini`, `langchain`, `llamaindex`, `markdown` | +| `config` | No | — | Path to JSON config file | +| `output-dir` | No | `output` | Output directory | +| `extra-args` | No | — | Additional CLI arguments | + +## Outputs + +| Output | Description | +|--------|-------------| +| `skill-dir` | Path to the generated skill directory | +| `skill-name` | Name of the generated skill | + +## Examples + +### Auto-update documentation skill weekly + +```yaml +name: Update AI Skills +on: + schedule: + - cron: '0 6 * * 1' # Every Monday 6am UTC + workflow_dispatch: + +jobs: + update-skills: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - uses: yusufkaraaslan/skill-seekers-action@v3 + with: + source: 'https://react.dev' + target: 'langchain' + + - uses: actions/upload-artifact@v4 + with: + name: react-skill + path: output/ +``` + +### Generate skill from GitHub repo + +```yaml +- uses: yusufkaraaslan/skill-seekers-action@v3 + with: + source: 'pallets/flask' + command: 'github' + target: 'claude' +``` + +### Process PDF documentation + +```yaml +- uses: actions/checkout@v4 + +- uses: yusufkaraaslan/skill-seekers-action@v3 + with: + source: 'docs/api-reference.pdf' + command: 'pdf' +``` + +### Unified multi-source build with config + +```yaml +- uses: actions/checkout@v4 + +- uses: yusufkaraaslan/skill-seekers-action@v3 + with: + config: 'configs/my-project.json' + command: 'unified' + target: 'openai' +``` + +### Commit generated skill back to repo + +```yaml +- uses: actions/checkout@v4 + +- uses: 
yusufkaraaslan/skill-seekers-action@v3 + id: generate + with: + source: 'https://fastapi.tiangolo.com' + +- name: Commit skill + run: | + git config user.name "github-actions[bot]" + git config user.email "github-actions[bot]@users.noreply.github.com" + git add output/ + git diff --staged --quiet || git commit -m "Update AI skill: ${{ steps.generate.outputs.skill-name }}" + git push +``` + +## Environment Variables + +Pass API keys as environment variables for AI-enhanced skills: + +```yaml +env: + ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} + OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} + GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} +``` + +## Supported Source Types + +| Type | Example Source | +|------|---------------| +| Documentation (web) | `https://react.dev` | +| GitHub repo | `pallets/flask` or `https://github.com/pallets/flask` | +| PDF | `docs/manual.pdf` | +| Video | `https://youtube.com/watch?v=...` | +| Local codebase | `./src` | +| Jupyter Notebook | `analysis.ipynb` | +| OpenAPI/Swagger | `openapi.yaml` | +| Word (.docx) | `docs/guide.docx` | +| EPUB | `book.epub` | +| PowerPoint | `slides.pptx` | +| AsciiDoc | `docs/guide.adoc` | +| HTML | `page.html` | +| RSS/Atom | `feed.rss` | +| Man pages | `tool.1` | +| Confluence | Via config file | +| Notion | Via config file | +| Chat (Slack/Discord) | Via config file | + +## License + +MIT diff --git a/distribution/github-action/action.yml b/distribution/github-action/action.yml new file mode 100644 index 0000000..17b9977 --- /dev/null +++ b/distribution/github-action/action.yml @@ -0,0 +1,92 @@ +name: 'Skill Seekers - AI Knowledge Builder' +description: 'Transform documentation, repos, PDFs, videos, and 13 other source types into AI skills and RAG knowledge' +author: 'Yusuf Karaaslan' + +branding: + icon: 'book-open' + color: 'blue' + +inputs: + source: + description: 'Source URL, file path, or owner/repo for GitHub repos' + required: true + command: + 
description: 'Command to run: create (auto-detect), scrape, github, pdf, video, analyze, unified' + required: false + default: 'create' + target: + description: 'Output target platform: claude, openai, gemini, langchain, llamaindex, markdown, cursor, windsurf' + required: false + default: 'claude' + config: + description: 'Path to JSON config file (for unified/advanced scraping)' + required: false + output-dir: + description: 'Output directory for generated skills' + required: false + default: 'output' + extra-args: + description: 'Additional CLI arguments to pass to skill-seekers' + required: false + default: '' + +outputs: + skill-dir: + description: 'Path to the generated skill directory' + value: ${{ steps.run.outputs.skill-dir }} + skill-name: + description: 'Name of the generated skill' + value: ${{ steps.run.outputs.skill-name }} + +runs: + using: 'composite' + steps: + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: '3.12' + + - name: Install Skill Seekers + shell: bash + run: pip install skill-seekers + + - name: Run Skill Seekers + id: run + shell: bash + env: + ANTHROPIC_API_KEY: ${{ env.ANTHROPIC_API_KEY }} + OPENAI_API_KEY: ${{ env.OPENAI_API_KEY }} + GOOGLE_API_KEY: ${{ env.GOOGLE_API_KEY }} + GITHUB_TOKEN: ${{ env.GITHUB_TOKEN }} + run: | + set -euo pipefail + + OUTPUT_DIR="${{ inputs.output-dir }}" + mkdir -p "$OUTPUT_DIR" + + CMD="${{ inputs.command }}" + SOURCE="${{ inputs.source }}" + TARGET="${{ inputs.target }}" + CONFIG="${{ inputs.config }}" + EXTRA="${{ inputs.extra-args }}" + + # Build the command + if [ "$CMD" = "create" ]; then + skill-seekers create "$SOURCE" --target "$TARGET" --output "$OUTPUT_DIR" $EXTRA + elif [ -n "$CONFIG" ]; then + skill-seekers "$CMD" --config "$CONFIG" --target "$TARGET" --output "$OUTPUT_DIR" $EXTRA + else + skill-seekers "$CMD" "$SOURCE" --target "$TARGET" --output "$OUTPUT_DIR" $EXTRA + fi + + # Find the generated skill directory + SKILL_DIR=$(find "$OUTPUT_DIR" -name 
"SKILL.md" -exec dirname {} \; | head -1) + SKILL_NAME=$(basename "${SKILL_DIR:-unknown}") + + echo "skill-dir=$SKILL_DIR" >> "$GITHUB_OUTPUT" + echo "skill-name=$SKILL_NAME" >> "$GITHUB_OUTPUT" + + echo "### Skill Generated" >> "$GITHUB_STEP_SUMMARY" + echo "- **Name:** $SKILL_NAME" >> "$GITHUB_STEP_SUMMARY" + echo "- **Directory:** $SKILL_DIR" >> "$GITHUB_STEP_SUMMARY" + echo "- **Target:** $TARGET" >> "$GITHUB_STEP_SUMMARY" diff --git a/distribution/smithery/README.md b/distribution/smithery/README.md new file mode 100644 index 0000000..714cff4 --- /dev/null +++ b/distribution/smithery/README.md @@ -0,0 +1,107 @@ +# Skill Seekers — Smithery MCP Registry + +Publishing guide for the Skill Seekers MCP server on [Smithery](https://smithery.ai). + +## Status + +- **Namespace created:** `yusufkaraaslan` +- **Server created:** `yusufkaraaslan/skill-seekers` +- **Server page:** https://smithery.ai/servers/yusufkaraaslan/skill-seekers +- **Release status:** Needs re-publish (initial release failed — Smithery couldn't scan GitHub URL as MCP endpoint) + +## Publishing + +Smithery requires a live, scannable MCP HTTP endpoint for URL-based publishing. Two options: + +### Option A: Publish via Web UI (Recommended) + +1. Go to https://smithery.ai/servers/yusufkaraaslan/skill-seekers/releases +2. The server already exists — create a new release +3. For the "Local" tab: follow the prompts to publish as a stdio server +4. For the "URL" tab: provide a hosted HTTP endpoint URL + +### Option B: Deploy HTTP endpoint first, then publish via CLI + +1. Deploy the MCP server on Render/Railway/Fly.io: + ```bash + # Using existing Dockerfile.mcp + docker build -f Dockerfile.mcp -t skill-seekers-mcp . + # Deploy to your hosting provider + ``` +2. 
Publish the live URL: + ```bash + npx @smithery/cli@latest auth login + npx @smithery/cli@latest mcp publish "https://your-deployed-url/mcp" \ + -n yusufkaraaslan/skill-seekers + ``` + +### CLI Authentication (already done) + +```bash +# Install via npx (no global install needed) +npx @smithery/cli@latest auth login +npx @smithery/cli@latest namespace show # Should show: yusufkaraaslan +``` + +### After Publishing + +Update the server page with metadata: + +**Display name:** Skill Seekers — AI Skill & RAG Toolkit + +**Description:** +> Transform 17 source types into AI-ready skills and RAG knowledge. Ingest documentation sites, GitHub repos, PDFs, Jupyter notebooks, videos, Confluence, Notion, Slack/Discord exports, and more. Package for 16+ LLM platforms including Claude, GPT, Gemini, LangChain, LlamaIndex, and vector databases. + +**Tags:** `ai`, `rag`, `documentation`, `skills`, `preprocessing`, `mcp`, `knowledge-base`, `vector-database` + +## User Installation + +Once published, users can add the server to their MCP client: + +```bash +# Via Smithery CLI (adds to Claude Desktop, Cursor, etc.) 
+smithery mcp add yusufkaraaslan/skill-seekers --client claude + +# Or configure manually — users need skill-seekers installed: +pip install "skill-seekers[mcp]" +``` + +### Manual MCP Configuration + +For clients that use JSON config (Claude Desktop, Claude Code, Cursor): + +```json +{ + "mcpServers": { + "skill-seekers": { + "command": "python", + "args": ["-m", "skill_seekers.mcp.server_fastmcp"] + } + } +} +``` + +## Available Tools (35) + +| Category | Tools | Description | +|----------|-------|-------------| +| Config | 3 | Generate, list, validate scraping configs | +| Sync | 1 | Sync config URLs against live docs | +| Scraping | 11 | Scrape docs, GitHub, PDF, video, codebase, generic (10 types) | +| Packaging | 4 | Package, upload, enhance, install skills | +| Splitting | 2 | Split large configs, generate routers | +| Sources | 5 | Fetch, submit, manage config sources | +| Vector DB | 4 | Export to Weaviate, Chroma, FAISS, Qdrant | +| Workflows | 5 | List, get, create, update, delete workflows | + +## Maintenance + +- Update description/tags on major releases +- No code changes needed — users always get the latest via `pip install` + +## Notes + +- Smithery CLI v4.7.0 removed the `--transport stdio` flag from the docs +- The CLI `publish` command only supports URL-based (external) publishing +- For local/stdio servers, use the web UI at smithery.ai/servers/new +- The namespace and server entity are already created; only the release needs to succeed diff --git a/render-mcp.yaml b/render-mcp.yaml new file mode 100644 index 0000000..29fd36b --- /dev/null +++ b/render-mcp.yaml @@ -0,0 +1,17 @@ +services: + # MCP Server Service (HTTP mode) + - type: web + name: skill-seekers-mcp + runtime: docker + plan: free + dockerfilePath: ./Dockerfile.mcp + envVars: + - key: MCP_PORT + value: "8765" + - key: PORT + fromService: + type: web + name: skill-seekers-mcp + property: port + healthCheckPath: /health + autoDeploy: true
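Reviewer note on the port fallback: the new `CMD` in `Dockerfile.mcp` resolves its port with `${PORT:-${MCP_PORT:-8765}}`, and `render-mcp.yaml` sets both `MCP_PORT` and `PORT`. The precedence can be sketched in Python as below. This is only an illustrative model of the shell expansion. `resolve_port` and `DEFAULT_PORT` are hypothetical names and are not part of Skill Seekers.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

DEFAULT_PORT = "8765"  # default taken from the CMD line in Dockerfile.mcp


def resolve_port(env: Mapping[str, str] | None = None) -> int:
    """Hypothetical helper mirroring the shell expansion ${PORT:-${MCP_PORT:-8765}}."""
    if env is None:
        env = os.environ
    # The shell `:-` operator falls back when a variable is unset OR empty;
    # Python's `or` gives the same behaviour for missing keys and empty strings.
    return int(env.get("PORT") or env.get("MCP_PORT") or DEFAULT_PORT)


if __name__ == "__main__":
    # No variables set: the built-in default applies.
    print(resolve_port({}))
    # Only MCP_PORT set: it overrides the default.
    print(resolve_port({"MCP_PORT": "9000"}))
    # Both set: PORT (typically injected by the hosting platform) wins.
    print(resolve_port({"PORT": "10000", "MCP_PORT": "9000"}))
```

Under this reading, a platform-provided `PORT` takes precedence, `MCP_PORT` is a manual override for local or self-hosted runs, and 8765 is the last resort.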