# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Skill Seekers converts documentation from 17 source types into production-ready formats for 16+ AI platforms (LLM platforms, RAG frameworks, vector databases, AI coding assistants). Published on PyPI as `skill-seekers`.

**Version:** 3.3.0 | **Python:** 3.10+ | **Website:** https://skillseekersweb.com/
## Essential Commands

```bash
# REQUIRED before running tests or CLI (src/ layout)
pip install -e .

# Run all tests (NEVER skip - all must pass before commits)
pytest tests/ -v

# Fast iteration (skip slow MCP tests, ~20 min)
pytest tests/ --ignore=tests/test_mcp_fastmcp.py --ignore=tests/test_mcp_server.py --ignore=tests/test_install_skill_e2e.py -q

# Single test
pytest tests/test_scraper_features.py::test_detect_language -vv -s

# Code quality (must pass before push - matches CI)
uvx ruff check src/ tests/
uvx ruff format --check src/ tests/
mypy src/skill_seekers  # continue-on-error in CI

# Auto-fix lint/format issues
uvx ruff check --fix --unsafe-fixes src/ tests/
uvx ruff format src/ tests/

# Build & publish
uv build
uv publish
```
## CI Matrix

Runs on push/PR to `main` or `development`. Lint job (Python 3.12, Ubuntu) + Test job (Ubuntu + macOS, Python 3.10/3.11/3.12, excluding macOS+3.10). Both must pass for merge.
## Git Workflow

- Main branch: `main` (requires tests + 1 review)
- Development branch: `development` (default PR target, requires tests)
- Feature branches: `feature/{task-id}-{description}` from `development`
- PRs always target `development`, never `main` directly
## Architecture

### CLI: Git-style dispatcher

Entry point `src/skill_seekers/cli/main.py` maps subcommands to modules. The `create` command auto-detects the source type and is the recommended entry point for users.

```bash
skill-seekers create <source>  # Auto-detect: URL, owner/repo, ./path, file.pdf, etc.
skill-seekers <type> [options] # Direct: scrape, github, pdf, word, epub, video, jupyter, html, openapi, asciidoc, pptx, rss, manpage, confluence, notion, chat
skill-seekers package <dir>    # Package for platform (--target claude/gemini/openai/markdown, --format langchain/llama-index/haystack/chroma/faiss/weaviate/qdrant)
```
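The git-style dispatch described above can be sketched in a few lines. This is an illustrative sketch only: the handler functions below are placeholders, while the real `COMMAND_MODULES` table in `cli/main.py` maps each subcommand to a scraper module's `main()`.

```python
# Placeholder handlers standing in for scraper modules' main() functions.
def cmd_scrape(argv: list[str]) -> str:
    return f"scrape {argv}"

def cmd_package(argv: list[str]) -> str:
    return f"package {argv}"

# Dispatch table: subcommand name -> handler (illustrative subset).
COMMAND_MODULES = {"scrape": cmd_scrape, "package": cmd_package}

def dispatch(argv: list[str]) -> str:
    """Route argv[0] to its handler, passing the remaining args through."""
    command, rest = argv[0], argv[1:]
    handler = COMMAND_MODULES.get(command)
    if handler is None:
        raise SystemExit(f"unknown command: {command}")
    return handler(rest)
```

The table-driven shape keeps adding a subcommand to a one-line registration, which is why new scrapers only need an entry in `COMMAND_MODULES`.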
### Data Flow (5 phases)

1. **Scrape** - Source-specific scraper extracts content to `output/{name}_data/pages/*.json`
2. **Build** - `build_skill()` categorizes pages, extracts patterns, generates `output/{name}/SKILL.md`
3. **Enhance** (optional) - LLM rewrites SKILL.md (`--enhance-level 0-3`, auto-detects API vs LOCAL mode)
4. **Package** - Platform adaptor formats output (`.zip`, `.tar.gz`, JSON, vector index)
5. **Upload** (optional) - Platform API upload
### Platform Adaptor Pattern (Strategy + Factory)

```
src/skill_seekers/cli/adaptors/
├── __init__.py           # Factory: get_adaptor(target=..., format=...)
├── base_adaptor.py       # Abstract base: package(), upload(), enhance(), export()
├── claude_adaptor.py     # --target claude
├── gemini_adaptor.py     # --target gemini
├── openai_adaptor.py     # --target openai
├── markdown_adaptor.py   # --target markdown
├── langchain.py          # --format langchain
├── llama_index.py        # --format llama-index
├── haystack.py           # --format haystack
├── chroma.py             # --format chroma
├── faiss_helpers.py      # --format faiss
├── qdrant.py             # --format qdrant
├── weaviate.py           # --format weaviate
└── streaming_adaptor.py  # --format streaming
```

`--target` = LLM platforms, `--format` = RAG/vector DBs.
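A minimal sketch of the Strategy + Factory shape, assuming nothing beyond what the tree above shows; the real interface in `base_adaptor.py` has more methods (`upload()`, `enhance()`, `export()`) and the real factory also handles `format=`:

```python
from abc import ABC, abstractmethod
from pathlib import Path

class BaseAdaptor(ABC):
    """Strategy interface: each platform implements the same operations."""

    @abstractmethod
    def package(self, skill_dir: Path) -> Path: ...

class ClaudeAdaptor(BaseAdaptor):
    def package(self, skill_dir: Path) -> Path:
        return skill_dir.with_suffix(".zip")  # placeholder packaging step

# Registry consulted by the factory; new adaptors register here.
_ADAPTORS: dict[str, type[BaseAdaptor]] = {"claude": ClaudeAdaptor}

def get_adaptor(target: str) -> BaseAdaptor:
    """Factory: map a --target name to a concrete strategy instance."""
    try:
        return _ADAPTORS[target]()
    except KeyError:
        raise ValueError(f"Unknown target: {target}") from None
```

Callers only ever see `BaseAdaptor`, so adding a platform is a new subclass plus one registry entry.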
### 17 Source Type Scrapers

Each lives in `src/skill_seekers/cli/{type}_scraper.py` with a `main()` entry point. `create_command.py` uses `source_detector.py` to auto-route. New scrapers added in v3.2.0+: jupyter, html, openapi, asciidoc, pptx, rss, manpage, confluence, notion, chat.
### CLI Argument System

```
src/skill_seekers/cli/
├── parsers/              # Subcommand parser registration
│   └── create_parser.py  # Progressive help disclosure (--help-web, --help-github, etc.)
├── arguments/            # Argument definitions
│   ├── common.py         # add_all_standard_arguments() - shared across all scrapers
│   └── create.py         # UNIVERSAL_ARGUMENTS, WEB_ARGUMENTS, GITHUB_ARGUMENTS, etc.
└── source_detector.py    # Auto-detect source type from input string
```
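The auto-detection idea can be sketched as a cascade of cheap checks. This is a hedged approximation: the real `source_detector.py` covers all 17 source types and many more edge cases, and the return values here are illustrative.

```python
import re
from pathlib import Path

def detect_source_type(source: str) -> str:
    """Guess the scraper type from the raw input string (illustrative subset)."""
    if source.startswith(("http://", "https://")):
        return "scrape"  # documentation URL
    if re.fullmatch(r"[\w.-]+/[\w.-]+", source):
        return "github"  # owner/repo shorthand
    # Fall back to file-extension routing for local files.
    by_ext = {".pdf": "pdf", ".docx": "word", ".epub": "epub", ".ipynb": "jupyter"}
    return by_ext.get(Path(source).suffix.lower(), "unknown")
```

The ordering matters: URL and `owner/repo` checks run before extension matching so that `https://x.dev/file.pdf` still routes to the web scraper.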
### C3.x Codebase Analysis Pipeline

Local codebase analysis features, all opt-out (`--skip-*` flags):

- C3.1 `pattern_recognizer.py` - Design pattern detection (10 GoF patterns, 9 languages)
- C3.2 `test_example_extractor.py` - Usage examples from tests
- C3.3 `how_to_guide_builder.py` - AI-enhanced educational guides
- C3.4 `config_extractor.py` - Configuration pattern extraction
- C3.5 `generate_router.py` - Architecture overview generation
- C3.10 `signal_flow_analyzer.py` - Godot signal flow analysis
### MCP Server

`src/skill_seekers/mcp/server_fastmcp.py` - 26+ tools via FastMCP. Transport: stdio (Claude Code) or HTTP (Cursor/Windsurf). Optional dependency: `pip install -e ".[mcp]"`
### Enhancement Modes

- **API mode** (if `ANTHROPIC_API_KEY` is set): Direct Claude API calls
- **LOCAL mode** (fallback): Uses Claude Code CLI (free with Max plan)
- Control: `--enhance-level 0` (off) / `1` (SKILL.md only) / `2` (default, balanced) / `3` (full)
## Key Implementation Details

### Smart Categorization (`doc_scraper.py:smart_categorize()`)

Scores pages against category keywords: 3 points for a URL match, 2 for a title match, 1 for a content match. A threshold of 2+ is required; below that, the page falls back to "other".
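The 3/2/1 scoring with a threshold of 2 can be sketched as follows. This is an approximation of the scheme described above, not the real `smart_categorize()`: here each keyword contributes only its strongest match, and the category signature is invented for illustration.

```python
def categorize(url: str, title: str, content: str,
               categories: dict[str, list[str]]) -> str:
    """Pick the best-scoring category, or "other" below the threshold."""
    best, best_score = "other", 0
    for category, keywords in categories.items():
        score = 0
        for kw in keywords:
            if kw in url.lower():
                score += 3      # URL match is the strongest signal
            elif kw in title.lower():
                score += 2
            elif kw in content.lower():
                score += 1
        if score > best_score:
            best, best_score = category, score
    return best if best_score >= 2 else "other"  # threshold: 2+ points
```

Note the consequence of the threshold: a content-only match (1 point) is never enough on its own.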
### Content Extraction (`doc_scraper.py`)

The `FALLBACK_MAIN_SELECTORS` constant and the `_find_main_content()` helper handle CSS selector fallback. Links are extracted from the full page before the early return (not just from the main content). `body` is deliberately excluded from the fallbacks.
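The fallback idea is first-match-wins over a priority-ordered selector list. A generic sketch follows; the selector list is illustrative (the real `FALLBACK_MAIN_SELECTORS` may differ), and `soup` stands for anything with a BeautifulSoup-style `select_one()`. As noted above, `body` is deliberately absent from the list so a failed lookup stays detectable.

```python
# Illustrative priority order; the real constant lives in doc_scraper.py.
FALLBACK_MAIN_SELECTORS = ["main", "article", "div#content", "div.content"]

def find_main_content(soup):
    """Return the first node matched by the fallback selectors, else None."""
    for selector in FALLBACK_MAIN_SELECTORS:
        node = soup.select_one(selector)
        if node is not None:
            return node
    return None  # caller decides how to handle "no main content found"
```

Returning `None` instead of falling back to `body` forces the caller to handle pages with no recognizable main region explicitly.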
### Three-Stream GitHub Architecture (`unified_codebase_analyzer.py`)

Stream 1: Code Analysis (AST, patterns, tests, guides). Stream 2: Documentation (README, docs/, wiki). Stream 3: Community (issues, PRs, metadata). Depth control: `basic` (1-2 min) or `c3x` (20-60 min).
## Testing

### Test markers (`pytest.ini`)

```bash
pytest tests/ -v                                    # Default: fast tests only
pytest tests/ -v -m slow                            # Include slow tests (>5s)
pytest tests/ -v -m integration                     # External services required
pytest tests/ -v -m e2e                             # Resource-intensive
pytest tests/ -v -m "not slow and not integration"  # Fastest subset
```
### Known legitimate skips (~11)
- 2: chromadb incompatible with Python 3.14 (pydantic v1)
- 2: weaviate-client not installed
- 2: Qdrant not running (requires docker)
- 2: langchain/llama_index not installed
- 3: GITHUB_TOKEN not set
### sys.modules gotcha

`test_swift_detection.py` deletes `skill_seekers.cli` modules from `sys.modules`. It must save and restore both the `sys.modules` entries AND the parent package attributes (via `setattr`). See the test file for the pattern.
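The save/restore pattern the gotcha calls for can be sketched as a context manager: snapshot the `sys.modules` entry AND the parent package attribute before deleting, then restore both. A generic sketch, not the code in `test_swift_detection.py`:

```python
import sys
from contextlib import contextmanager

@contextmanager
def isolated_module(name: str):
    """Temporarily remove a module from sys.modules, restoring both the
    sys.modules entry and the parent package attribute afterwards."""
    parent_name, _, child = name.rpartition(".")
    saved_module = sys.modules.get(name)
    parent = sys.modules.get(parent_name) if parent_name else None
    saved_attr = getattr(parent, child, None) if parent else None
    sys.modules.pop(name, None)
    try:
        yield
    finally:
        if saved_module is not None:
            sys.modules[name] = saved_module       # restore sys.modules entry
        if parent is not None and saved_attr is not None:
            setattr(parent, child, saved_attr)     # restore parent attribute
```

Restoring only `sys.modules` is the trap: `import pkg.mod` elsewhere may then bind a fresh submodule object onto the parent package, leaving two copies alive.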
## Dependencies

Core deps include langchain, llama-index, anthropic, httpx, PyMuPDF, and pydantic. Platform-specific deps are optional:
```bash
pip install -e ".[mcp]"        # MCP server
pip install -e ".[gemini]"     # Google Gemini
pip install -e ".[openai]"     # OpenAI
pip install -e ".[docx]"       # Word documents
pip install -e ".[epub]"       # EPUB books
pip install -e ".[video]"      # Video (lightweight)
pip install -e ".[video-full]" # Video (Whisper + visual)
pip install -e ".[jupyter]"    # Jupyter notebooks
pip install -e ".[pptx]"       # PowerPoint
pip install -e ".[rss]"        # RSS/Atom feeds
pip install -e ".[confluence]" # Confluence wiki
pip install -e ".[notion]"     # Notion pages
pip install -e ".[chroma]"     # ChromaDB
pip install -e ".[all]"        # Everything (except video-full)
```
Dev dependencies use PEP 735 `[dependency-groups]` in `pyproject.toml`.
## Environment Variables

```bash
ANTHROPIC_API_KEY=sk-ant-...   # Claude AI (or compatible endpoint)
ANTHROPIC_BASE_URL=https://... # Optional: Claude-compatible API endpoint
GOOGLE_API_KEY=AIza...         # Google Gemini (optional)
OPENAI_API_KEY=sk-...          # OpenAI (optional)
GITHUB_TOKEN=ghp_...           # Higher GitHub rate limits
```
## Adding New Features

### New platform adaptor

1. Create `src/skill_seekers/cli/adaptors/{platform}_adaptor.py` inheriting `BaseAdaptor`
2. Register in the `adaptors/__init__.py` factory
3. Add the optional dep to `pyproject.toml`
4. Add tests in `tests/`
### New source type scraper

1. Create `src/skill_seekers/cli/{type}_scraper.py` with `main()`
2. Add to `COMMAND_MODULES` in `cli/main.py`
3. Add an entry point in `pyproject.toml` `[project.scripts]`
4. Add auto-detection in `source_detector.py`
5. Add an optional dep if needed
6. Add tests
### New CLI argument

1. Universal: `UNIVERSAL_ARGUMENTS` in `arguments/create.py`
2. Source-specific: the appropriate dict (`WEB_ARGUMENTS`, `GITHUB_ARGUMENTS`, etc.)
3. Shared across scrapers: `add_all_standard_arguments()` in `arguments/common.py`
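The dict-driven argument pattern behind those tables can be sketched like this. The entries below are hypothetical, not the real contents of `UNIVERSAL_ARGUMENTS` in `arguments/create.py`:

```python
import argparse

# Hypothetical argument table; the real dicts define many more flags.
UNIVERSAL_ARGUMENTS = {
    "--name": {"help": "Skill name", "default": None},
    "--enhance-level": {"help": "Enhancement level 0-3", "type": int, "default": 2},
}

def add_arguments(parser: argparse.ArgumentParser, table: dict) -> None:
    """Register every flag in a table onto a parser."""
    for flag, options in table.items():
        parser.add_argument(flag, **options)

parser = argparse.ArgumentParser(prog="skill-seekers")
add_arguments(parser, UNIVERSAL_ARGUMENTS)
```

Because arguments are data rather than scattered `add_argument()` calls, the same table can be attached to every scraper's subparser in one loop.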