CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Skill Seekers converts documentation from 17 source types into production-ready formats for 16+ AI platforms (LLM platforms, RAG frameworks, vector databases, AI coding assistants). Published on PyPI as skill-seekers.

Version: 3.3.0 | Python: 3.10+ | Website: https://skillseekersweb.com/

Essential Commands

# REQUIRED before running tests or CLI (src/ layout)
pip install -e .

# Run all tests (NEVER skip - all must pass before commits)
pytest tests/ -v

# Fast iteration (skips the slow MCP tests, which take ~20 min)
pytest tests/ --ignore=tests/test_mcp_fastmcp.py --ignore=tests/test_mcp_server.py --ignore=tests/test_install_skill_e2e.py -q

# Single test
pytest tests/test_scraper_features.py::test_detect_language -vv -s

# Code quality (must pass before push - matches CI)
uvx ruff check src/ tests/
uvx ruff format --check src/ tests/
mypy src/skill_seekers  # continue-on-error in CI

# Auto-fix lint/format issues
uvx ruff check --fix --unsafe-fixes src/ tests/
uvx ruff format src/ tests/

# Build & publish
uv build
uv publish

CI Matrix

Runs on push/PR to main or development. Lint job (Python 3.12, Ubuntu) + Test job (Ubuntu + macOS, Python 3.10/3.11/3.12, excludes macOS+3.10). Both must pass for merge.

Git Workflow

  • Main branch: main (requires tests + 1 review)
  • Development branch: development (default PR target, requires tests)
  • Feature branches: feature/{task-id}-{description} from development
  • PRs always target development, never main directly

Architecture

CLI: Git-style dispatcher

Entry point src/skill_seekers/cli/main.py maps subcommands to modules. The create command auto-detects source type and is the recommended entry point for users.

skill-seekers create <source>     # Auto-detect: URL, owner/repo, ./path, file.pdf, etc.
skill-seekers <type> [options]    # Direct: scrape, github, pdf, word, epub, video, jupyter, html, openapi, asciidoc, pptx, rss, manpage, confluence, notion, chat
skill-seekers package <dir>       # Package for platform (--target claude/gemini/openai/markdown, --format langchain/llama-index/haystack/chroma/faiss/weaviate/qdrant)

Data Flow (5 phases)

  1. Scrape - Source-specific scraper extracts content to output/{name}_data/pages/*.json
  2. Build - build_skill() categorizes pages, extracts patterns, generates output/{name}/SKILL.md
  3. Enhance (optional) - LLM rewrites SKILL.md (--enhance-level 0-3, auto-detects API vs LOCAL mode)
  4. Package - Platform adaptor formats output (.zip, .tar.gz, JSON, vector index)
  5. Upload (optional) - Platform API upload
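
The phases leave a predictable trail under output/. As a sketch only (the helper below is illustrative, not project API, and the packaged filename in particular depends on the chosen adaptor):

```python
# Hypothetical helper mapping each phase to the artifact it produces.
# Paths follow the layout described above; the function itself is not
# part of skill_seekers.
def phase_artifacts(name: str) -> dict[str, str]:
    return {
        "scrape": f"output/{name}_data/pages/*.json",  # phase 1
        "build": f"output/{name}/SKILL.md",            # phase 2
        "enhance": f"output/{name}/SKILL.md",          # phase 3 rewrites in place
        "package": f"output/{name}.zip",               # phase 4; extension varies by adaptor
    }

print(phase_artifacts("react")["build"])  # → output/react/SKILL.md
```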

Platform Adaptor Pattern (Strategy + Factory)

src/skill_seekers/cli/adaptors/
├── __init__.py          # Factory: get_adaptor(target=..., format=...)
├── base_adaptor.py      # Abstract base: package(), upload(), enhance(), export()
├── claude_adaptor.py    # --target claude
├── gemini_adaptor.py    # --target gemini
├── openai_adaptor.py    # --target openai
├── markdown_adaptor.py  # --target markdown
├── langchain.py         # --format langchain
├── llama_index.py       # --format llama-index
├── haystack.py          # --format haystack
├── chroma.py            # --format chroma
├── faiss_helpers.py     # --format faiss
├── qdrant.py            # --format qdrant
├── weaviate.py          # --format weaviate
└── streaming_adaptor.py # --format streaming

--target selects an LLM platform adaptor; --format selects a RAG framework or vector-DB adaptor.
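
The shape of the pattern, reduced to a runnable sketch. Only BaseAdaptor, get_adaptor(), and the claude target name come from the layout above; everything else, including the one-line package() body, is illustrative:

```python
from abc import ABC, abstractmethod

class BaseAdaptor(ABC):
    @abstractmethod
    def package(self, skill_dir: str) -> str: ...

class ClaudeAdaptor(BaseAdaptor):
    def package(self, skill_dir: str) -> str:
        return f"{skill_dir}.zip"   # illustrative; real packaging is richer

_REGISTRY = {"claude": ClaudeAdaptor}   # factory table, one entry per target/format

def get_adaptor(target: str) -> BaseAdaptor:
    if target not in _REGISTRY:
        raise ValueError(f"unknown target: {target}")
    return _REGISTRY[target]()

print(get_adaptor("claude").package("output/react"))  # → output/react.zip
```

Adding a platform is then a matter of subclassing the base and registering the class in the factory table.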

17 Source Type Scrapers

Each lives in src/skill_seekers/cli/{type}_scraper.py with a main() entry point. create_command.py uses source_detector.py to auto-route input to the right scraper. New scrapers added in v3.2.0+: jupyter, html, openapi, asciidoc, pptx, rss, manpage, confluence, notion, chat.

CLI Argument System

src/skill_seekers/cli/
├── parsers/              # Subcommand parser registration
│   └── create_parser.py  # Progressive help disclosure (--help-web, --help-github, etc.)
├── arguments/            # Argument definitions
│   ├── common.py         # add_all_standard_arguments() - shared across all scrapers
│   └── create.py         # UNIVERSAL_ARGUMENTS, WEB_ARGUMENTS, GITHUB_ARGUMENTS, etc.
└── source_detector.py    # Auto-detect source type from input string
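
A hedged sketch of the kind of dispatch source_detector.py performs; the real rules, labels, and precedence may differ:

```python
import re
from pathlib import Path

def detect_source_type(source: str) -> str:
    """Illustrative auto-detection; source_detector.py has more rules."""
    if source.startswith(("http://", "https://")):
        return "scrape"                       # documentation URL
    by_ext = {".pdf": "pdf", ".docx": "word", ".epub": "epub",
              ".ipynb": "jupyter", ".html": "html"}
    ext = Path(source).suffix.lower()
    if ext in by_ext:
        return by_ext[ext]                    # route by file extension
    if source.startswith(("./", "../", "/")):
        return "local"                        # local path (label illustrative)
    if re.fullmatch(r"[\w.-]+/[\w.-]+", source):
        return "github"                       # owner/repo shorthand
    return "unknown"

print(detect_source_type("facebook/react"))  # → github
```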

C3.x Codebase Analysis Pipeline

Local codebase analysis features, all opt-out (--skip-* flags):

  • C3.1 pattern_recognizer.py - Design pattern detection (10 GoF patterns, 9 languages)
  • C3.2 test_example_extractor.py - Usage examples from tests
  • C3.3 how_to_guide_builder.py - AI-enhanced educational guides
  • C3.4 config_extractor.py - Configuration pattern extraction
  • C3.5 generate_router.py - Architecture overview generation
  • C3.10 signal_flow_analyzer.py - Godot signal flow analysis

MCP Server

src/skill_seekers/mcp/server_fastmcp.py - 26+ tools via FastMCP. Transport: stdio (Claude Code) or HTTP (Cursor/Windsurf). Optional dependency: pip install -e ".[mcp]"

Enhancement Modes

  • API mode (if ANTHROPIC_API_KEY set): Direct Claude API calls
  • LOCAL mode (fallback): Uses Claude Code CLI (free with Max plan)
  • Control: --enhance-level 0 (off) / 1 (SKILL.md only) / 2 (default, balanced) / 3 (full)
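
The bullets above combine into a simple decision. The helper below is illustrative only; the real code may consult more than one variable:

```python
def enhancement_plan(env: dict, level: int = 2):
    """Map --enhance-level plus the environment to (mode, scope)."""
    if level == 0:
        return None                                   # enhancement off
    mode = "API" if env.get("ANTHROPIC_API_KEY") else "LOCAL"
    scope = {1: "SKILL.md only", 2: "balanced", 3: "full"}[level]
    return mode, scope

print(enhancement_plan({}, level=1))  # → ('LOCAL', 'SKILL.md only')
```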

Key Implementation Details

Smart Categorization (doc_scraper.py:smart_categorize())

Scores pages against category keywords: 3 points for a URL match, 2 for a title match, 1 for a content match. A total score of at least 2 is required; pages below the threshold fall back to "other".
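
The scoring rule can be sketched directly from that description; the real smart_categorize() in doc_scraper.py is more involved, and all names here are illustrative:

```python
def score_page(keywords, url, title, content):
    """3 points per keyword in the URL, 2 in the title, 1 in the content."""
    score = 0
    for kw in keywords:
        if kw in url.lower():
            score += 3
        if kw in title.lower():
            score += 2
        if kw in content.lower():
            score += 1
    return score

def categorize(page, categories, threshold=2):
    best, best_score = "other", 0   # below-threshold pages stay "other"
    for name, kws in categories.items():
        s = score_page(kws, page["url"], page["title"], page["content"])
        if s >= threshold and s > best_score:
            best, best_score = name, s
    return best

cats = {"api": ["api", "reference"], "tutorial": ["tutorial", "guide"]}
page = {"url": "https://x.dev/api/hooks", "title": "API Reference", "content": "..."}
print(categorize(page, cats))  # → api
```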

Content Extraction (doc_scraper.py)

The FALLBACK_MAIN_SELECTORS constant plus the _find_main_content() helper handle CSS selector fallback. Links are extracted from the full page before any early return (not just from the main content). body is deliberately excluded from the fallback list.

Three-Stream GitHub Architecture (unified_codebase_analyzer.py)

Stream 1: Code Analysis (AST, patterns, tests, guides). Stream 2: Documentation (README, docs/, wiki). Stream 3: Community (issues, PRs, metadata). Depth control: basic (1-2 min) or c3x (20-60 min).

Testing

Test markers (pytest.ini)

pytest tests/ -v                                    # Default: fast tests only
pytest tests/ -v -m slow                            # Only tests marked slow (>5s)
pytest tests/ -v -m integration                     # External services required
pytest tests/ -v -m e2e                             # Resource-intensive
pytest tests/ -v -m "not slow and not integration"  # Fastest subset

Known legitimate skips (~11)

  • 2: chromadb incompatible with Python 3.14 (pydantic v1)
  • 2: weaviate-client not installed
  • 2: Qdrant not running (requires docker)
  • 2: langchain/llama_index not installed
  • 3: GITHUB_TOKEN not set

sys.modules gotcha

test_swift_detection.py deletes skill_seekers.cli modules from sys.modules. It must save and restore both sys.modules entries AND parent package attributes (setattr). See the test file for the pattern.
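
The essence of the pattern, using a stdlib module as a stand-in for skill_seekers.cli; the helper is a sketch, not a copy of the test file:

```python
import sys
from contextlib import contextmanager

@contextmanager
def preserve_module(name: str):
    """Save BOTH the sys.modules entry and the parent package attribute;
    restoring only one leaves a stale reference behind."""
    saved_mod = sys.modules.get(name)
    parent_name, _, child = name.rpartition(".")
    parent = sys.modules.get(parent_name) if parent_name else None
    saved_attr = getattr(parent, child, None) if parent is not None else None
    try:
        yield
    finally:
        if saved_mod is not None:
            sys.modules[name] = saved_mod
        if parent is not None and saved_attr is not None:
            setattr(parent, child, saved_attr)

import logging.handlers                     # stdlib stand-in for the real modules
with preserve_module("logging.handlers"):
    del sys.modules["logging.handlers"]     # simulate what the test does
print("logging.handlers" in sys.modules)    # → True
```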

Dependencies

Core deps include langchain, llama-index, anthropic, httpx, PyMuPDF, pydantic. Platform-specific deps are optional:

pip install -e ".[mcp]"       # MCP server
pip install -e ".[gemini]"    # Google Gemini
pip install -e ".[openai]"    # OpenAI
pip install -e ".[docx]"      # Word documents
pip install -e ".[epub]"      # EPUB books
pip install -e ".[video]"     # Video (lightweight)
pip install -e ".[video-full]" # Video (Whisper + visual)
pip install -e ".[jupyter]"   # Jupyter notebooks
pip install -e ".[pptx]"      # PowerPoint
pip install -e ".[rss]"       # RSS/Atom feeds
pip install -e ".[confluence]" # Confluence wiki
pip install -e ".[notion]"    # Notion pages
pip install -e ".[chroma]"    # ChromaDB
pip install -e ".[all]"       # Everything (except video-full)

Dev dependencies use PEP 735 [dependency-groups] in pyproject.toml.

Environment Variables

ANTHROPIC_API_KEY=sk-ant-...          # Claude AI (or compatible endpoint)
ANTHROPIC_BASE_URL=https://...        # Optional: Claude-compatible API endpoint
GOOGLE_API_KEY=AIza...                # Google Gemini (optional)
OPENAI_API_KEY=sk-...                 # OpenAI (optional)
GITHUB_TOKEN=ghp_...                  # Higher GitHub rate limits

Adding New Features

New platform adaptor

  1. Create src/skill_seekers/cli/adaptors/{platform}_adaptor.py inheriting BaseAdaptor
  2. Register in adaptors/__init__.py factory
  3. Add optional dep to pyproject.toml
  4. Add tests in tests/

New source type scraper

  1. Create src/skill_seekers/cli/{type}_scraper.py with main()
  2. Add to COMMAND_MODULES in cli/main.py
  3. Add entry point in pyproject.toml [project.scripts]
  4. Add auto-detection in source_detector.py
  5. Add optional dep if needed
  6. Add tests

New CLI argument

  • Universal: UNIVERSAL_ARGUMENTS in arguments/create.py
  • Source-specific: appropriate dict (WEB_ARGUMENTS, GITHUB_ARGUMENTS, etc.)
  • Shared across scrapers: add_all_standard_arguments() in arguments/common.py