yusyus cc9cc32417 feat: add skill-seekers video --setup for GPU auto-detection and dependency installation

Auto-detects NVIDIA (CUDA), AMD (ROCm), or CPU-only GPU and installs the
correct PyTorch variant + easyocr + all visual extraction dependencies.
Removes easyocr from video-full pip extras to avoid pulling ~2GB of wrong
CUDA packages on non-NVIDIA systems.

New files:
- video_setup.py (835 lines): GPU detection, PyTorch install, ROCm config,
  venv checks, system dep validation, module selection, verification
- test_video_setup.py (60 tests): Full coverage of detection, install, verify

Updated docs: CHANGELOG, AGENTS.md, CLAUDE.md, README.md, CLI_REFERENCE,
FAQ, TROUBLESHOOTING, installation guide, video dependency plan

All 2523 tests passing (15 skipped).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-01 18:39:16 +03:00

AGENTS.md - Skill Seekers

Essential guidance for AI coding agents working with the Skill Seekers codebase.

Project Overview

Skill Seekers is a Python CLI tool that converts documentation websites, GitHub repositories, PDF files, and videos into AI-ready skills for LLM platforms and RAG (Retrieval-Augmented Generation) pipelines. It serves as the universal preprocessing layer for AI systems.

Key Facts

Attribute	Value
Current Version	3.1.3
Python Version	3.10+ (tested on 3.10, 3.11, 3.12, 3.13)
License	MIT
Package Name	`skill-seekers` (PyPI)
Source Files	182 Python files
Test Files	105+ test files
Website	https://skillseekersweb.com/
Repository	https://github.com/yusufkaraaslan/Skill_Seekers

Supported Target Platforms

Platform	Format	Use Case
Claude AI	ZIP + YAML	Claude Code skills
Google Gemini	tar.gz	Gemini skills
OpenAI ChatGPT	ZIP + Vector Store	Custom GPTs
LangChain	Documents	QA chains, agents, retrievers
LlamaIndex	TextNodes	Query engines, chat engines
Haystack	Documents	Enterprise RAG pipelines
Pinecone	Ready for upsert	Production vector search
Weaviate	Vector objects	Vector database
Qdrant	Points	Vector database
Chroma	Documents	Local vector database
FAISS	Index files	Local similarity search
Cursor IDE	.cursorrules	AI coding assistant rules
Windsurf	.windsurfrules	AI coding rules
Cline	.clinerules + MCP	VS Code extension
Continue.dev	HTTP context	Universal IDE support
Generic Markdown	ZIP	Universal export

Core Workflow

Scrape Phase - Crawl documentation/GitHub/PDF/video sources
Build Phase - Organize content into categorized references
Enhancement Phase - AI-powered quality improvements (optional)
Package Phase - Create platform-specific packages
Upload Phase - Auto-upload to target platform (optional)

Project Structure

Skill_Seekers/
├── src/skill_seekers/              # Main source code (src/ layout)
│   ├── cli/                        # CLI tools and commands (~70 modules)
│   │   ├── adaptors/               # Platform adaptors (Strategy pattern)
│   │   │   ├── base.py             # Abstract base class (SkillAdaptor)
│   │   │   ├── claude.py           # Claude AI adaptor
│   │   │   ├── gemini.py           # Google Gemini adaptor
│   │   │   ├── openai.py           # OpenAI ChatGPT adaptor
│   │   │   ├── markdown.py         # Generic Markdown adaptor
│   │   │   ├── chroma.py           # Chroma vector DB adaptor
│   │   │   ├── faiss_helpers.py    # FAISS index adaptor
│   │   │   ├── haystack.py         # Haystack RAG adaptor
│   │   │   ├── langchain.py        # LangChain adaptor
│   │   │   ├── llama_index.py      # LlamaIndex adaptor
│   │   │   ├── qdrant.py           # Qdrant vector DB adaptor
│   │   │   ├── weaviate.py         # Weaviate vector DB adaptor
│   │   │   └── streaming_adaptor.py # Streaming output adaptor
│   │   ├── arguments/              # CLI argument definitions
│   │   ├── parsers/                # Argument parsers
│   │   │   └── extractors/         # Content extractors
│   │   ├── presets/                # Preset configuration management
│   │   ├── storage/                # Cloud storage adaptors
│   │   ├── main.py                 # Unified CLI entry point
│   │   ├── create_command.py       # Unified create command
│   │   ├── doc_scraper.py          # Documentation scraper
│   │   ├── github_scraper.py       # GitHub repository scraper
│   │   ├── pdf_scraper.py          # PDF extraction
│   │   ├── word_scraper.py         # Word document scraper
│   │   ├── video_scraper.py        # Video extraction
│   │   ├── video_setup.py          # GPU detection & dependency installation
│   │   ├── unified_scraper.py      # Multi-source scraping
│   │   ├── codebase_scraper.py     # Local codebase analysis
│   │   ├── enhance_command.py      # AI enhancement command
│   │   ├── enhance_skill_local.py  # AI enhancement (local mode)
│   │   ├── package_skill.py        # Skill packager
│   │   ├── upload_skill.py         # Upload to platforms
│   │   ├── cloud_storage_cli.py    # Cloud storage CLI
│   │   ├── benchmark_cli.py        # Benchmarking CLI
│   │   ├── sync_cli.py             # Sync monitoring CLI
│   │   └── workflows_command.py    # Workflow management CLI
│   ├── mcp/                        # MCP server integration
│   │   ├── server_fastmcp.py       # FastMCP server (~708 lines)
│   │   ├── server_legacy.py        # Legacy server implementation
│   │   ├── server.py               # Server entry point
│   │   ├── agent_detector.py       # AI agent detection
│   │   ├── git_repo.py             # Git repository operations
│   │   ├── source_manager.py       # Config source management
│   │   └── tools/                  # MCP tool implementations
│   │       ├── config_tools.py     # Configuration tools
│   │       ├── packaging_tools.py  # Packaging tools
│   │       ├── scraping_tools.py   # Scraping tools
│   │       ├── source_tools.py     # Source management tools
│   │       ├── splitting_tools.py  # Config splitting tools
│   │       ├── vector_db_tools.py  # Vector database tools
│   │       └── workflow_tools.py   # Workflow management tools
│   ├── sync/                       # Sync monitoring module
│   │   ├── detector.py             # Change detection
│   │   ├── models.py               # Data models (Pydantic)
│   │   ├── monitor.py              # Monitoring logic
│   │   └── notifier.py             # Notification system
│   ├── benchmark/                  # Benchmarking framework
│   │   ├── framework.py            # Benchmark framework
│   │   ├── models.py               # Benchmark models
│   │   └── runner.py               # Benchmark runner
│   ├── embedding/                  # Embedding server
│   │   ├── server.py               # FastAPI embedding server
│   │   ├── generator.py            # Embedding generation
│   │   ├── cache.py                # Embedding cache
│   │   └── models.py               # Embedding models
│   ├── workflows/                  # YAML workflow presets (66 presets)
│   ├── _version.py                 # Version information (reads from pyproject.toml)
│   └── __init__.py                 # Package init
├── tests/                          # Test suite (105+ test files)
├── configs/                        # Preset configuration files
├── docs/                           # Documentation (80+ markdown files)
│   ├── integrations/               # Platform integration guides
│   ├── guides/                     # User guides
│   ├── reference/                  # API reference
│   ├── features/                   # Feature documentation
│   ├── blog/                       # Blog posts
│   └── roadmap/                    # Roadmap documents
├── examples/                       # Usage examples
├── .github/workflows/              # CI/CD workflows
├── pyproject.toml                  # Main project configuration
├── requirements.txt                # Pinned dependencies
├── mypy.ini                        # MyPy type checker configuration
├── Dockerfile                      # Main Docker image (multi-stage)
├── Dockerfile.mcp                  # MCP server Docker image
└── docker-compose.yml              # Full stack deployment

Build and Development Commands

Prerequisites

Python 3.10 or higher
pip or uv package manager
Git (for GitHub scraping features)

Setup (REQUIRED before any development)

# Install in editable mode (REQUIRED for tests due to src/ layout)
pip install -e .

# Install with all platform dependencies
pip install -e ".[all-llms]"

# Install with all optional dependencies
pip install -e ".[all]"

# Install specific platforms only
pip install -e ".[gemini]"    # Google Gemini support
pip install -e ".[openai]"    # OpenAI ChatGPT support
pip install -e ".[mcp]"       # MCP server dependencies
pip install -e ".[s3]"        # AWS S3 support
pip install -e ".[gcs]"       # Google Cloud Storage
pip install -e ".[azure]"     # Azure Blob Storage
pip install -e ".[embedding]" # Embedding server support
pip install -e ".[rag-upload]" # Vector DB upload support

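# Install dev dependencies. They are declared as a dependency group, so if no "dev" extra
# is defined, use `pip install -e . --group dev` instead (requires pip 25.1 or newer).
pip install -e ".[dev]"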

CRITICAL: The project uses a src/ layout. Tests WILL FAIL unless you install with pip install -e . first.

Building

# Build package using uv (recommended)
uv build

# Or using standard build
python -m build

# Publish to PyPI
uv publish

Docker

# Build Docker image
docker build -t skill-seekers .

# Run with docker-compose (includes vector databases)
docker-compose up -d

# Run MCP server only
docker-compose up -d mcp-server

# View logs
docker-compose logs -f mcp-server

Testing Instructions

Running Tests

CRITICAL: Never skip tests - all tests must pass before commits.

# All tests (must run pip install -e . first!)
pytest tests/ -v

# Specific test file
pytest tests/test_scraper_features.py -v
pytest tests/test_mcp_fastmcp.py -v
pytest tests/test_cloud_storage.py -v

# With coverage
pytest tests/ --cov=src/skill_seekers --cov-report=term --cov-report=html

# Single test
pytest tests/test_scraper_features.py::test_detect_language -v

# E2E tests
pytest tests/test_e2e_three_stream_pipeline.py -v

# Skip slow tests
pytest tests/ -v -m "not slow"

# Run only integration tests
pytest tests/ -v -m integration

# Combine marker expressions (here: skip both slow and integration tests)
pytest tests/ -v -m "not slow and not integration"

Test Architecture

105+ test files covering all features
CI Matrix: Ubuntu + macOS, Python 3.10-3.12
Test markers defined in pyproject.toml:

Marker	Description
`slow`	Tests taking >5 seconds
`integration`	Requires external services (APIs)
`e2e`	End-to-end tests (resource-intensive)
`venv`	Requires virtual environment setup
`bootstrap`	Bootstrap skill specific
`benchmark`	Performance benchmark tests

Test Configuration

From pyproject.toml:

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
addopts = "-v --tb=short --strict-markers"
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function"

The conftest.py file checks that the package is installed before running tests.
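
A check of this kind could look like the minimal sketch below. It is not the project's actual conftest.py: the helper names `is_installed` and `require_installed` and the message text are hypothetical, and the sketch relies only on `importlib.util.find_spec` and `pytest.exit`.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be imported in this environment."""
    return importlib.util.find_spec(package) is not None


def require_installed(package: str = "skill_seekers") -> None:
    """Abort the test session early when the package is missing (hypothetical helper)."""
    if not is_installed(package):
        # Imported lazily so the helper has no hard dependency on pytest.
        import pytest

        pytest.exit(
            f"'{package}' is not installed. Run `pip install -e .` first.",
            returncode=1,
        )
```

Calling `require_installed()` at the top of conftest.py would stop the run with a clear message instead of a wall of import errors.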

Code Style Guidelines

Linting and Formatting

# Run ruff linter
ruff check src/ tests/

# Run ruff formatter check
ruff format --check src/ tests/

# Auto-fix issues
ruff check src/ tests/ --fix
ruff format src/ tests/

# Run mypy type checker
mypy src/skill_seekers --show-error-codes --pretty

Style Rules (from pyproject.toml)

Line length: 100 characters
Target Python: 3.10+
Enabled rules: E, W, F, I, B, C4, UP, ARG, SIM
Ignored rules: E501, F541, ARG002, B007, I001, SIM114
Import sorting: isort style with skill_seekers as first-party

MyPy Configuration (from pyproject.toml)

[tool.mypy]
python_version = "3.10"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = false
disallow_incomplete_defs = false
check_untyped_defs = true
ignore_missing_imports = true
show_error_codes = true
pretty = true

Code Conventions

Use type hints where practical (gradual typing approach)
Docstrings: Use Google-style or standard docstrings
Error handling: Use specific exceptions, provide helpful messages
Async code: Use asyncio, mark tests with @pytest.mark.asyncio (with asyncio_mode = "auto" the marker is optional, but it keeps the intent explicit)
File naming: Use snake_case for all Python files
Class naming: Use PascalCase for classes
Function naming: Use snake_case for functions and methods
Constants: Use UPPER_CASE for module-level constants

Architecture Patterns

Platform Adaptor Pattern (Strategy Pattern)

All platform-specific logic is encapsulated in adaptors:

import os

from skill_seekers.cli.adaptors import get_adaptor

# Get platform-specific adaptor
adaptor = get_adaptor('gemini')  # or 'claude', 'openai', 'langchain', etc.

# Package skill
adaptor.package(skill_dir='output/react/', output_path='output/')

# Upload to platform
adaptor.upload(
    package_path='output/react-gemini.tar.gz',
    api_key=os.getenv('GOOGLE_API_KEY')
)

Each adaptor inherits from SkillAdaptor base class and implements:

format_skill_md() - Format SKILL.md content
package() - Create platform-specific package
upload() - Upload to platform API
validate_api_key() - Validate API key format
supports_enhancement() - Whether AI enhancement is supported
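
The Strategy pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in adaptors/base.py: the method signatures, the `_REGISTRY` dictionary, and the `MarkdownAdaptor` body are hypothetical, and `package()` here only computes the output path rather than writing an archive.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class SkillAdaptor(ABC):
    """Illustrative base class; signatures are assumptions."""

    @abstractmethod
    def format_skill_md(self, content: str) -> str:
        """Return SKILL.md content formatted for the target platform."""

    @abstractmethod
    def package(self, skill_dir: str, output_path: str) -> Path:
        """Return the path of the platform-specific package."""

    def supports_enhancement(self) -> bool:
        # Default for platforms without AI enhancement.
        return False


class MarkdownAdaptor(SkillAdaptor):
    """Hypothetical generic-Markdown strategy."""

    def format_skill_md(self, content: str) -> str:
        return content.strip() + "\n"

    def package(self, skill_dir: str, output_path: str) -> Path:
        # Only computes the target name; a real adaptor would build the ZIP here.
        return Path(output_path) / f"{Path(skill_dir).name}-markdown.zip"


# Hypothetical registry mapping platform names to strategy classes.
_REGISTRY = {"markdown": MarkdownAdaptor}


def get_adaptor(platform: str) -> SkillAdaptor:
    """Look up and instantiate the adaptor for a platform name."""
    try:
        return _REGISTRY[platform]()
    except KeyError:
        raise ValueError(f"Unknown platform: {platform!r}") from None
```

Callers depend only on the `SkillAdaptor` interface, so supporting another platform means adding one class and one registry entry.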

CLI Architecture (Git-style)

Entry point: src/skill_seekers/cli/main.py

The CLI uses subcommands that delegate to existing modules:

# skill-seekers scrape --config react.json
# Transforms to: doc_scraper.main() with modified sys.argv
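
As a rough sketch of that delegation, assuming each scraper module exposes a `main()` that reads `sys.argv`: the `SUBCOMMANDS` table and `fake_doc_scraper_main` below are hypothetical stand-ins, not the contents of main.py.

```python
import sys
from typing import Callable


def fake_doc_scraper_main() -> int:
    """Stand-in for doc_scraper.main(); it only reports the argv it sees."""
    print("doc_scraper argv:", sys.argv)
    return 0


# Hypothetical routing table: subcommand name -> module-level main().
SUBCOMMANDS: dict[str, Callable[[], int]] = {"scrape": fake_doc_scraper_main}


def dispatch(argv: list[str]) -> int:
    """Route `skill-seekers <subcommand> ...` to the matching main()."""
    prog = argv[0] if argv else "skill-seekers"
    if len(argv) < 2 or argv[1] not in SUBCOMMANDS:
        print(f"usage: {prog} {{{','.join(SUBCOMMANDS)}}} ...", file=sys.stderr)
        return 2
    subcommand, rest = argv[1], argv[2:]
    saved = sys.argv
    # Rewrite sys.argv so the delegated module parses only its own arguments.
    sys.argv = [f"{prog}-{subcommand}", *rest]
    try:
        return SUBCOMMANDS[subcommand]()
    finally:
        # Always restore the original argv, even if the subcommand raises.
        sys.argv = saved
```

For example, `dispatch(["skill-seekers", "scrape", "--config", "react.json"])` runs the stand-in with `sys.argv` set to `["skill-seekers-scrape", "--config", "react.json"]`.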

Available subcommands:

create - Unified create command
config - Configuration wizard
scrape - Documentation scraping
github - GitHub repository scraping
pdf - PDF extraction
word - Word document extraction
video - Video extraction (YouTube or local). Use --setup to auto-detect GPU and install visual deps.
unified - Multi-source scraping
analyze / codebase - Local codebase analysis
enhance - AI enhancement
package - Package skill for target platform
upload - Upload to platform
cloud - Cloud storage operations
sync - Sync monitoring
benchmark - Performance benchmarking
embed - Embedding server
install / install-agent - Complete workflow
stream - Streaming ingestion
update - Incremental updates
multilang - Multi-language support
quality - Quality metrics
resume - Resume interrupted jobs
estimate - Estimate page counts
workflows - Workflow management

MCP Server Architecture

Two implementations:

server_fastmcp.py - Modern, decorator-based (recommended, ~708 lines)
server_legacy.py - Legacy implementation

Tools are organized by category:

Config tools (3 tools): generate_config, list_configs, validate_config
Scraping tools (10 tools): estimate_pages, scrape_docs, scrape_github, scrape_pdf, scrape_video (supports setup parameter for GPU detection and visual dep installation), scrape_codebase, detect_patterns, extract_test_examples, build_how_to_guides, extract_config_patterns
Packaging tools (4 tools): package_skill, upload_skill, enhance_skill, install_skill
Source tools (5 tools): fetch_config, submit_config, add_config_source, list_config_sources, remove_config_source
Splitting tools (2 tools): split_config, generate_router
Vector Database tools (4 tools): export_to_weaviate, export_to_chroma, export_to_faiss, export_to_qdrant
Workflow tools (5 tools): list_workflows, get_workflow, create_workflow, update_workflow, delete_workflow

Running MCP Server:

# Stdio transport (default)
python -m skill_seekers.mcp.server_fastmcp

# HTTP transport
python -m skill_seekers.mcp.server_fastmcp --http --port 8765

Cloud Storage Architecture

Abstract base class pattern for cloud providers:

base_storage.py - Defines BaseStorageAdaptor interface
s3_storage.py - AWS S3 implementation
gcs_storage.py - Google Cloud Storage implementation
azure_storage.py - Azure Blob Storage implementation

Sync Monitoring Architecture

Pydantic-based models in src/skill_seekers/sync/:

models.py - Data models (SyncConfig, ChangeReport, SyncState)
detector.py - Change detection logic
monitor.py - Monitoring daemon
notifier.py - Notification system (webhook, email, slack)

Git Workflow

Branch Structure

main (production)
  ↑
  │ (only maintainer merges)
  │
development (integration) ← default branch for PRs
  ↑
  │ (all contributor PRs go here)
  │
feature branches

main - Production, always stable, protected
development - Active development, default for PRs
Feature branches - Your work, created from development

Creating a Feature Branch

# 1. Checkout development
git checkout development
git pull upstream development

# 2. Create feature branch
git checkout -b my-feature

# 3. Make changes, commit, push
git add .
git commit -m "Add my feature"
git push origin my-feature

# 4. Create PR targeting 'development' branch

CI/CD Configuration

GitHub Actions Workflows

All workflows are in .github/workflows/:

tests.yml:

Runs on: push/PR to main and development
Lint job: Ruff + MyPy
Test matrix: Ubuntu + macOS, Python 3.10-3.12
Coverage: Uploads to Codecov

release.yml:

Triggered on version tags (v*)
Builds and publishes to PyPI using uv
Creates GitHub release with changelog

docker-publish.yml:

Builds and publishes Docker images
Multi-architecture support (linux/amd64, linux/arm64)

vector-db-export.yml:

Tests vector database exports

scheduled-updates.yml:

Scheduled sync monitoring

quality-metrics.yml:

Quality metrics tracking

test-vector-dbs.yml:

Vector database integration tests

Pre-commit Checks (Manual)

# Before committing, run:
ruff check src/ tests/
ruff format --check src/ tests/
pytest tests/ -v -x  # Stop on first failure

Security Considerations

API Keys and Secrets

Never commit API keys to the repository
Use environment variables:
- ANTHROPIC_API_KEY - Claude AI
- GOOGLE_API_KEY - Google Gemini
- OPENAI_API_KEY - OpenAI
- GITHUB_TOKEN - GitHub API
- AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY - AWS S3
- GOOGLE_APPLICATION_CREDENTIALS - GCS
- AZURE_STORAGE_CONNECTION_STRING - Azure
Configuration storage:
- Stored at ~/.config/skill-seekers/config.json
- Permissions: 600 (owner read/write only)
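
One way to write such a file with owner-only permissions is sketched below. The function name `save_config` is hypothetical and this is not necessarily how the tool does it; the permission bits apply on POSIX systems and are largely ignored on Windows.

```python
import json
import os
from pathlib import Path


def save_config(data: dict, path: Path) -> None:
    """Write `data` as JSON so that only the owner can read or write the file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 0o600 from the start rather than chmod-ing later,
    # so a newly created file is never briefly readable by other users.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
    # If the file already existed with looser permissions, tighten them.
    os.chmod(path, 0o600)
```

A typical call would be `save_config({"github_token": "..."}, Path.home() / ".config/skill-seekers/config.json")`, where the key name is only an example.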

Rate Limit Handling

The GitHub API is rate-limited (5000 requests/hour for authenticated requests, 60 requests/hour otherwise)
The tool has built-in rate limit handling with retry logic
Use --non-interactive flag for CI/CD environments
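
The core of such retry logic is deciding how long to wait. The sketch below is an assumption about how that could be done, not the tool's implementation: `seconds_until_reset` is a hypothetical helper that reads GitHub's `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers (the latter is a Unix timestamp).

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to sleep before retrying, based on rate-limit headers.

    Header lookup is case-sensitive here; HTTP client libraries usually provide
    a case-insensitive mapping, which is what a real caller would pass in.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        # Quota is left, so there is no need to wait.
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    # Never return a negative delay if the reset time has already passed.
    return max(0.0, reset_at - now)
```

A retry loop would call `time.sleep(seconds_until_reset(response.headers))` before repeating the request, ideally with an upper bound on the number of attempts.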

Custom API Endpoints

Support for Claude-compatible APIs:

export ANTHROPIC_API_KEY=your-custom-api-key
export ANTHROPIC_BASE_URL=https://custom-endpoint.com/v1

Common Development Tasks

Adding a New CLI Command

Create module in src/skill_seekers/cli/my_command.py
Implement main() function with argument parsing

Add entry point in pyproject.toml:

[project.scripts]
skill-seekers-my-command = "skill_seekers.cli.my_command:main"

Add subcommand handler in src/skill_seekers/cli/main.py
Add argument parser in src/skill_seekers/cli/parsers/
Add tests in tests/test_my_command.py
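
A skeleton for the first two steps might look like this. It is a sketch that assumes plain `argparse`; the `source` and `--output` arguments are hypothetical placeholders, and the project's own parsers may be structured differently.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build the argument parser for the hypothetical my-command."""
    parser = argparse.ArgumentParser(
        prog="skill-seekers-my-command",
        description="Example command skeleton",
    )
    parser.add_argument("source", help="Path or URL to process (placeholder)")
    parser.add_argument("--output", default="output/", help="Output directory")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Entry point; with argv=None, argparse falls back to sys.argv[1:]."""
    args = build_parser().parse_args(argv)
    print(f"Processing {args.source} -> {args.output}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Accepting an optional `argv` keeps `main()` easy to call from tests, while the generated console script still works because it calls `main()` with no arguments.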

Adding a New Platform Adaptor

Create src/skill_seekers/cli/adaptors/my_platform.py
Inherit from SkillAdaptor base class
Implement required methods: package(), upload(), format_skill_md()
Register in src/skill_seekers/cli/adaptors/__init__.py
Add optional dependencies in pyproject.toml
Add tests in tests/test_adaptors/

Adding an MCP Tool

Implement tool logic in src/skill_seekers/mcp/tools/category_tools.py
Register in src/skill_seekers/mcp/server_fastmcp.py
Add test in tests/test_mcp_fastmcp.py

Adding Cloud Storage Provider

Create module in src/skill_seekers/cli/storage/my_storage.py
Inherit from BaseStorageAdaptor base class
Implement required methods: upload_file(), download_file(), list_files(), delete_file()
Register in src/skill_seekers/cli/storage/__init__.py
Add optional dependencies in pyproject.toml
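
To make the four required methods concrete, here is a minimal sketch of a provider. The interface shown is an assumption (the real signatures in base_storage.py may differ), and `LocalDirStorage` is a hypothetical provider that treats a local directory as the bucket.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class BaseStorageAdaptor(ABC):
    """Illustrative interface only; method signatures are assumptions."""

    @abstractmethod
    def upload_file(self, local_path: str, remote_path: str) -> None: ...

    @abstractmethod
    def download_file(self, remote_path: str, local_path: str) -> None: ...

    @abstractmethod
    def list_files(self, prefix: str = "") -> list[str]: ...

    @abstractmethod
    def delete_file(self, remote_path: str) -> None: ...


class LocalDirStorage(BaseStorageAdaptor):
    """Hypothetical provider backed by a local directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def upload_file(self, local_path: str, remote_path: str) -> None:
        target = self.root / remote_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_path, target)

    def download_file(self, remote_path: str, local_path: str) -> None:
        shutil.copyfile(self.root / remote_path, local_path)

    def list_files(self, prefix: str = "") -> list[str]:
        # Keys are returned relative to the root, with forward slashes.
        keys = (p.relative_to(self.root).as_posix() for p in self.root.rglob("*") if p.is_file())
        return sorted(k for k in keys if k.startswith(prefix))

    def delete_file(self, remote_path: str) -> None:
        (self.root / remote_path).unlink()
```

A local stand-in like this is also useful in tests, since it exercises the interface without cloud credentials.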

Documentation

Project Documentation (New Structure - v3.1.0+)

Entry Points:

README.md - Main project documentation with navigation
docs/README.md - Documentation hub
AGENTS.md - This file, for AI coding agents

Getting Started (for new users):

docs/getting-started/01-installation.md - Installation guide
docs/getting-started/02-quick-start.md - 3 commands to first skill
docs/getting-started/03-your-first-skill.md - Complete walkthrough
docs/getting-started/04-next-steps.md - Where to go from here

User Guides (common tasks):

docs/user-guide/01-core-concepts.md - How Skill Seekers works
docs/user-guide/02-scraping.md - All scraping options
docs/user-guide/03-enhancement.md - AI enhancement explained
docs/user-guide/04-packaging.md - Export to platforms
docs/user-guide/05-workflows.md - Enhancement workflows
docs/user-guide/06-troubleshooting.md - Common issues

Reference (technical details):

docs/reference/CLI_REFERENCE.md - Complete command reference (20 commands)
docs/reference/MCP_REFERENCE.md - MCP tools reference (33 tools)
docs/reference/CONFIG_FORMAT.md - JSON configuration specification
docs/reference/ENVIRONMENT_VARIABLES.md - All environment variables

Advanced (power user topics):

docs/advanced/mcp-server.md - MCP server setup
docs/advanced/mcp-tools.md - Advanced MCP usage
docs/advanced/custom-workflows.md - Creating custom workflows
docs/advanced/multi-source.md - Multi-source scraping

Configuration Documentation

Preset configs are in configs/ directory:

godot.json / godot_unified.json - Godot Engine
blender.json / blender-unified.json - Blender
claude-code.json - Claude Code
httpx_comprehensive.json - HTTPX library
medusa-mercurjs.json - Medusa/MercurJS
astrovalley_unified.json - Astrovalley
react.json - React documentation
configs/integrations/ - Integration-specific configs

Key Dependencies

Core Dependencies (Required)

Package	Version	Purpose
`requests`	>=2.32.5	HTTP requests
`beautifulsoup4`	>=4.14.2	HTML parsing
`PyGithub`	>=2.5.0	GitHub API
`GitPython`	>=3.1.40	Git operations
`httpx`	>=0.28.1	Async HTTP
`anthropic`	>=0.76.0	Claude AI API
`PyMuPDF`	>=1.24.14	PDF processing
`Pillow`	>=11.0.0	Image processing
`pytesseract`	>=0.3.13	OCR
`pydantic`	>=2.12.3	Data validation
`pydantic-settings`	>=2.11.0	Settings management
`click`	>=8.3.0	CLI framework
`Pygments`	>=2.19.2	Syntax highlighting
`pathspec`	>=0.12.1	Path matching
`networkx`	>=3.0	Graph operations
`schedule`	>=1.2.0	Scheduled tasks
`python-dotenv`	>=1.1.1	Environment variables
`jsonschema`	>=4.25.1	JSON validation
`PyYAML`	>=6.0	YAML parsing
`langchain`	>=1.2.10	LangChain integration
`llama-index`	>=0.14.15	LlamaIndex integration

Optional Dependencies

Feature	Package	Install Command
MCP Server	`mcp>=1.25,<2`	`pip install -e ".[mcp]"`
Google Gemini	`google-generativeai>=0.8.0`	`pip install -e ".[gemini]"`
OpenAI	`openai>=1.0.0`	`pip install -e ".[openai]"`
AWS S3	`boto3>=1.34.0`	`pip install -e ".[s3]"`
Google Cloud Storage	`google-cloud-storage>=2.10.0`	`pip install -e ".[gcs]"`
Azure Blob Storage	`azure-storage-blob>=12.19.0`	`pip install -e ".[azure]"`
Word Documents	`mammoth>=1.6.0`, `python-docx>=1.1.0`	`pip install -e ".[docx]"`
Video (lightweight)	`yt-dlp>=2024.12.0`, `youtube-transcript-api>=1.2.0`	`pip install -e ".[video]"`
Video (full)	+`faster-whisper`, `scenedetect`, `opencv-python-headless` (`easyocr` now installed via `--setup`)	`pip install -e ".[video-full]"`
Video (GPU setup)	Auto-detects GPU, installs PyTorch + easyocr + all visual deps	`skill-seekers video --setup`
Chroma DB	`chromadb>=0.4.0`	`pip install -e ".[chroma]"`
Weaviate	`weaviate-client>=3.25.0`	`pip install -e ".[weaviate]"`
Pinecone	`pinecone>=5.0.0`	`pip install -e ".[pinecone]"`
Embedding Server	`fastapi>=0.109.0`, `uvicorn>=0.27.0`, `sentence-transformers>=2.3.0`	`pip install -e ".[embedding]"`

Dev Dependencies (in dependency-groups)

Package	Version	Purpose
`pytest`	>=8.4.2	Testing framework
`pytest-asyncio`	>=0.24.0	Async test support
`pytest-cov`	>=7.0.0	Coverage
`coverage`	>=7.11.0	Coverage reporting
`ruff`	>=0.14.13	Linting/formatting
`mypy`	>=1.19.1	Type checking
`psutil`	>=5.9.0	Process utilities for testing
`numpy`	>=1.24.0	Numerical operations
`starlette`	>=0.31.0	HTTP transport testing
`httpx`	>=0.24.0	HTTP client for testing
`boto3`	>=1.26.0	AWS S3 testing
`google-cloud-storage`	>=2.10.0	GCS testing
`azure-storage-blob`	>=12.17.0	Azure testing

Troubleshooting

Common Issues

ImportError: No module named 'skill_seekers'

Solution: Run pip install -e .

Tests failing with "package not installed"

Solution: Ensure you ran pip install -e . in the correct virtual environment

MCP server import errors

Solution: Install with pip install -e ".[mcp]"

Type checking failures

MyPy is configured to be lenient (gradual typing)
Focus on critical paths, not full coverage

Docker build failures

Ensure you have BuildKit enabled: DOCKER_BUILDKIT=1
Check that all submodules are initialized: git submodule update --init

Rate limit errors from GitHub

Set GITHUB_TOKEN environment variable for authenticated requests
Improves rate limit from 60 to 5000 requests/hour

Getting Help

Check TROUBLESHOOTING.md for detailed solutions
Review docs/FAQ.md for common questions
Visit https://skillseekersweb.com/ for documentation
Open an issue on GitHub with:
- Clear title and description
- Steps to reproduce
- Expected vs actual behavior
- Environment details (OS, Python version)
- Error messages and stack traces

Environment Variables Reference

Variable	Purpose	Required For
`ANTHROPIC_API_KEY`	Claude AI API access	Claude enhancement/upload
`GOOGLE_API_KEY`	Google Gemini API access	Gemini enhancement/upload
`OPENAI_API_KEY`	OpenAI API access	OpenAI enhancement/upload
`GITHUB_TOKEN`	GitHub API authentication	GitHub scraping (recommended)
`AWS_ACCESS_KEY_ID`	AWS S3 authentication	S3 cloud storage
`AWS_SECRET_ACCESS_KEY`	AWS S3 authentication	S3 cloud storage
`GOOGLE_APPLICATION_CREDENTIALS`	GCS authentication path	GCS cloud storage
`AZURE_STORAGE_CONNECTION_STRING`	Azure Blob authentication	Azure cloud storage
`ANTHROPIC_BASE_URL`	Custom Claude endpoint	Custom API endpoints
`SKILL_SEEKERS_HOME`	Data directory path	Docker/runtime
`SKILL_SEEKERS_OUTPUT`	Output directory path	Docker/runtime

Version Management

The version is defined in pyproject.toml and dynamically read by src/skill_seekers/_version.py:

# _version.py reads from pyproject.toml
__version__ = get_version()  # Returns version from pyproject.toml

To update version:

Edit version in pyproject.toml
The _version.py file will automatically pick up the new version
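
For illustration, a reader of this kind could be sketched as below. This is not the project's `_version.py`: the `pyproject` parameter, the fallback value, and the regex approach are assumptions. A regex is used because `tomllib` only exists on Python 3.11+, while the project supports 3.10; it simply takes the first line of the form `version = "..."`.

```python
import re
from pathlib import Path

# Matches a line such as: version = "3.1.3"
_VERSION_RE = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)


def get_version(pyproject: Path, default: str = "0.0.0") -> str:
    """Return the first `version = "..."` value found in a pyproject.toml file."""
    try:
        text = pyproject.read_text(encoding="utf-8")
    except OSError:
        # File missing or unreadable (for example, in an installed wheel).
        return default
    match = _VERSION_RE.search(text)
    return match.group(1) if match else default
```

When the package is installed, `importlib.metadata.version("skill-seekers")` is a standard-library alternative that does not need the source tree.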

Configuration File Format

Skill Seekers uses JSON configuration files to define scraping targets. Example structure:

{
  "name": "godot",
  "description": "Godot Engine documentation",
  "merge_mode": "claude-enhanced",
  "sources": [
    {
      "type": "documentation",
      "base_url": "https://docs.godotengine.org/en/stable/",
      "extract_api": true,
      "selectors": {
        "main_content": "div[role='main']",
        "title": "title",
        "code_blocks": "pre"
      },
      "url_patterns": {
        "include": [],
        "exclude": ["/search.html", "/_static/"]
      },
      "categories": {
        "getting_started": ["introduction", "getting_started"],
        "scripting": ["scripting", "gdscript"]
      },
      "rate_limit": 0.5,
      "max_pages": 500
    },
    {
      "type": "github",
      "repo": "godotengine/godot",
      "enable_codebase_analysis": true,
      "code_analysis_depth": "deep",
      "fetch_issues": true,
      "max_issues": 100
    }
  ]
}
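
A quick sanity check for a file of this shape could be sketched as follows. The required-key table is an assumption inferred from the example above (it covers only the `documentation` and `github` source types), and `check_config` is a hypothetical helper, not the project's validator.

```python
import json

# Assumed required keys per source type, inferred from the example config only.
REQUIRED_BY_TYPE = {"documentation": {"base_url"}, "github": {"repo"}}


def check_config(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if not config.get("name"):
        problems.append("missing 'name'")
    for i, source in enumerate(config.get("sources", [])):
        kind = source.get("type")
        if kind not in REQUIRED_BY_TYPE:
            # Other source types exist; this sketch simply does not check them.
            continue
        for key in sorted(REQUIRED_BY_TYPE[kind] - set(source)):
            problems.append(f"sources[{i}] ({kind}): missing {key!r}")
    return problems


def load_and_check(text: str) -> list[str]:
    """Parse JSON text and run the checks above."""
    return check_config(json.loads(text))
```

Running a check like this before a long scrape catches a missing `base_url` or `repo` early.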

Workflow Presets

Skill Seekers includes 66 YAML workflow presets for AI enhancement in src/skill_seekers/workflows/:

Built-in presets:

default.yaml - Standard enhancement workflow
minimal.yaml - Fast, minimal enhancement
security-focus.yaml - Security-focused review
architecture-comprehensive.yaml - Deep architecture analysis
api-documentation.yaml - API documentation focus
And 61 more specialized presets...

Usage:

# Apply a preset
skill-seekers create ./my-project --enhance-workflow security-focus

# Chain multiple presets
skill-seekers create ./my-project --enhance-workflow security-focus --enhance-workflow minimal

# Manage presets
skill-seekers workflows list
skill-seekers workflows show security-focus
skill-seekers workflows copy security-focus

This document is maintained for AI coding agents. For human contributors, see README.md and CONTRIBUTING.md.

Last updated: 2026-03-01
