firefrost-gaming/skill-seekers-reference

yusyus ba9a8ff8b5 docs: complete documentation overhaul with v3.1.0 release notes and zh-CN translations

Documentation restructure:
- New docs/getting-started/ guide (4 files: install, quick-start, first-skill, next-steps)
- New docs/user-guide/ section (6 files: core concepts through troubleshooting)
- New docs/reference/ section (CLI_REFERENCE, CONFIG_FORMAT, ENVIRONMENT_VARIABLES, MCP_REFERENCE)
- New docs/advanced/ section (custom-workflows, mcp-server, multi-source)
- New docs/ARCHITECTURE.md - system architecture overview
- Archived legacy files (QUICKSTART.md, QUICK_REFERENCE.md, docs/guides/USAGE.md) to docs/archive/legacy/

Chinese (zh-CN) translations:
- Full zh-CN mirror of all user-facing docs (getting-started, user-guide, reference, advanced)
- GitHub Actions workflow for translation sync (.github/workflows/translate-docs.yml)
- Translation sync checker script (scripts/check_translation_sync.sh)
- Translation helper script (scripts/translate_doc.py)

Content updates:
- CHANGELOG.md: [Unreleased] → [3.1.0] - 2026-02-22
- README.md: updated with new doc structure links
- AGENTS.md: updated agent documentation
- docs/features/UNIFIED_SCRAPING.md: updated for unified scraper workflow JSON config

Analysis/planning artifacts (kept for reference):
- DOCUMENTATION_OVERHAUL_PLAN.md, DOCUMENTATION_OVERHAUL_SUMMARY.md
- FEATURE_GAP_ANALYSIS.md, IMPLEMENTATION_GAPS_ANALYSIS.md, CREATE_COMMAND_COVERAGE_ANALYSIS.md
- CHINESE_TRANSLATION_IMPLEMENTATION_SUMMARY.md, ISSUE_260_UPDATE.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-22 01:01:51 +03:00

AGENTS.md - Skill Seekers

This file provides essential guidance for AI coding agents working with the Skill Seekers codebase.

Project Overview

Skill Seekers is a Python CLI tool that converts documentation websites, GitHub repositories, and PDF files into AI-ready skills for LLM platforms and RAG (Retrieval-Augmented Generation) pipelines. It serves as the universal preprocessing layer for AI systems.

Key Facts

Attribute	Value
Current Version	3.0.0
Python Version	3.10+ (tested on 3.10, 3.11, 3.12, 3.13)
License	MIT
Package Name	`skill-seekers` (PyPI)
Website	https://skillseekersweb.com/
Repository	https://github.com/yusufkaraaslan/Skill_Seekers

Supported Target Platforms

Platform	Format	Use Case
Claude AI	ZIP + YAML	Claude Code skills
Google Gemini	tar.gz	Gemini skills
OpenAI ChatGPT	ZIP + Vector Store	Custom GPTs
LangChain	Documents	QA chains, agents, retrievers
LlamaIndex	TextNodes	Query engines, chat engines
Haystack	Documents	Enterprise RAG pipelines
Pinecone	Ready for upsert	Production vector search
Weaviate	Vector objects	Vector database
Qdrant	Points	Vector database
Chroma	Documents	Local vector database
FAISS	Index files	Local similarity search
Cursor IDE	.cursorrules	AI coding assistant rules
Windsurf	.windsurfrules	AI coding rules
Cline	.clinerules + MCP	VS Code extension
Continue.dev	HTTP context	Universal IDE support
Generic Markdown	ZIP	Universal export

Core Workflow

Scrape Phase - Crawl documentation/GitHub/PDF sources
Build Phase - Organize content into categorized references
Enhancement Phase - AI-powered quality improvements (optional)
Package Phase - Create platform-specific packages
Upload Phase - Auto-upload to target platform (optional)
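
The phase ordering above, with its two optional phases, can be sketched as follows. This is a minimal illustration only: `PipelineOptions` and `plan_phases` are hypothetical names introduced here and are not part of the Skill Seekers codebase.

```python
from dataclasses import dataclass


@dataclass
class PipelineOptions:
    """Hypothetical switches for the two optional phases."""

    enhance: bool = False
    upload: bool = False


def plan_phases(options: PipelineOptions) -> list[str]:
    """Return the phase names in execution order."""
    phases = ["scrape", "build"]
    if options.enhance:
        phases.append("enhance")  # optional AI-powered improvement
    phases.append("package")
    if options.upload:
        phases.append("upload")  # optional upload to the target platform
    return phases
```

Scrape, build, and package always run in that order; enhancement slots in before packaging and upload comes last.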

Project Structure

Skill_Seekers/
├── src/skill_seekers/              # Main source code (src/ layout)
│   ├── cli/                        # CLI tools and commands (~42k lines)
│   │   ├── adaptors/               # Platform adaptors (Strategy pattern)
│   │   │   ├── base.py             # Abstract base class (SkillAdaptor)
│   │   │   ├── claude.py           # Claude AI adaptor
│   │   │   ├── gemini.py           # Google Gemini adaptor
│   │   │   ├── openai.py           # OpenAI ChatGPT adaptor
│   │   │   ├── markdown.py         # Generic Markdown adaptor
│   │   │   ├── chroma.py           # Chroma vector DB adaptor
│   │   │   ├── faiss_helpers.py    # FAISS index adaptor
│   │   │   ├── haystack.py         # Haystack RAG adaptor
│   │   │   ├── langchain.py        # LangChain adaptor
│   │   │   ├── llama_index.py      # LlamaIndex adaptor
│   │   │   ├── qdrant.py           # Qdrant vector DB adaptor
│   │   │   ├── weaviate.py         # Weaviate vector DB adaptor
│   │   │   └── streaming_adaptor.py # Streaming output adaptor
│   │   ├── storage/                # Cloud storage backends
│   │   │   ├── base_storage.py     # Storage interface
│   │   │   ├── s3_storage.py       # AWS S3 support
│   │   │   ├── gcs_storage.py      # Google Cloud Storage
│   │   │   └── azure_storage.py    # Azure Blob Storage
│   │   ├── parsers/                # CLI argument parsers
│   │   ├── arguments/              # CLI argument definitions
│   │   ├── presets/                # Preset configuration management
│   │   ├── main.py                 # Unified CLI entry point
│   │   ├── create_command.py       # Unified create command
│   │   ├── doc_scraper.py          # Documentation scraper
│   │   ├── github_scraper.py       # GitHub repository scraper
│   │   ├── pdf_scraper.py          # PDF extraction
│   │   ├── unified_scraper.py      # Multi-source scraping
│   │   ├── codebase_scraper.py     # Local codebase analysis
│   │   ├── enhance_skill_local.py  # AI enhancement (local mode)
│   │   ├── package_skill.py        # Skill packager
│   │   ├── upload_skill.py         # Upload to platforms
│   │   ├── cloud_storage_cli.py    # Cloud storage CLI
│   │   ├── benchmark_cli.py        # Benchmarking CLI
│   │   ├── sync_cli.py             # Sync monitoring CLI
│   │   └── workflows_command.py    # Workflow management CLI
│   ├── mcp/                        # MCP server integration
│   │   ├── server_fastmcp.py       # FastMCP server (~708 lines)
│   │   ├── server_legacy.py        # Legacy server implementation
│   │   ├── server.py               # Server entry point
│   │   ├── agent_detector.py       # AI agent detection
│   │   ├── git_repo.py             # Git repository operations
│   │   ├── source_manager.py       # Config source management
│   │   └── tools/                  # MCP tool implementations
│   │       ├── config_tools.py     # Configuration tools
│   │       ├── scraping_tools.py   # Scraping tools
│   │       ├── packaging_tools.py  # Packaging tools
│   │       ├── source_tools.py     # Source management tools
│   │       ├── splitting_tools.py  # Config splitting tools
│   │       ├── vector_db_tools.py  # Vector database tools
│   │       └── workflow_tools.py   # Workflow management tools
│   ├── sync/                       # Sync monitoring module
│   │   ├── detector.py             # Change detection
│   │   ├── models.py               # Data models (Pydantic)
│   │   ├── monitor.py              # Monitoring logic
│   │   └── notifier.py             # Notification system
│   ├── benchmark/                  # Benchmarking framework
│   │   ├── framework.py            # Benchmark framework
│   │   ├── models.py               # Benchmark models
│   │   └── runner.py               # Benchmark runner
│   ├── embedding/                  # Embedding server
│   │   ├── server.py               # FastAPI embedding server
│   │   ├── generator.py            # Embedding generation
│   │   ├── cache.py                # Embedding cache
│   │   └── models.py               # Embedding models
│   ├── workflows/                  # YAML workflow presets
│   ├── _version.py                 # Version information (reads from pyproject.toml)
│   └── __init__.py                 # Package init
├── tests/                          # Test suite (98 test files)
├── configs/                        # Preset configuration files
├── docs/                           # Documentation (80+ markdown files)
│   ├── integrations/               # Platform integration guides
│   ├── guides/                     # User guides
│   ├── reference/                  # API reference
│   ├── features/                   # Feature documentation
│   ├── blog/                       # Blog posts
│   └── roadmap/                    # Roadmap documents
├── examples/                       # Usage examples
│   ├── langchain-rag-pipeline/     # LangChain example
│   ├── llama-index-query-engine/   # LlamaIndex example
│   ├── pinecone-upsert/            # Pinecone example
│   ├── chroma-example/             # Chroma example
│   ├── weaviate-example/           # Weaviate example
│   ├── qdrant-example/             # Qdrant example
│   ├── faiss-example/              # FAISS example
│   ├── haystack-pipeline/          # Haystack example
│   ├── cursor-react-skill/         # Cursor IDE example
│   ├── windsurf-fastapi-context/   # Windsurf example
│   └── continue-dev-universal/     # Continue.dev example
├── .github/workflows/              # CI/CD workflows
├── pyproject.toml                  # Main project configuration
├── requirements.txt                # Pinned dependencies
├── mypy.ini                        # MyPy type checker configuration
├── Dockerfile                      # Main Docker image (multi-stage)
├── Dockerfile.mcp                  # MCP server Docker image
└── docker-compose.yml              # Full stack deployment

Build and Development Commands

Prerequisites

Python 3.10 or higher
pip or uv package manager
Git (for GitHub scraping features)

Setup (REQUIRED before any development)

# Install in editable mode (REQUIRED for tests due to src/ layout)
pip install -e .

# Install with all platform dependencies
pip install -e ".[all-llms]"

# Install with all optional dependencies
pip install -e ".[all]"

# Install specific platforms only
pip install -e ".[gemini]"    # Google Gemini support
pip install -e ".[openai]"    # OpenAI ChatGPT support
pip install -e ".[mcp]"       # MCP server dependencies
pip install -e ".[s3]"        # AWS S3 support
pip install -e ".[gcs]"       # Google Cloud Storage
pip install -e ".[azure]"     # Azure Blob Storage
pip install -e ".[embedding]" # Embedding server support
pip install -e ".[rag-upload]" # Vector DB upload support

# Install dev dependencies
pip install -e ".[dev]"
# If `dev` is declared only under [dependency-groups] (not as an extra), use pip 25.1+ instead:
# pip install --group dev

CRITICAL: The project uses a src/ layout. Tests WILL FAIL unless you install with pip install -e . first.

Building

# Build package using uv (recommended)
uv build

# Or using standard build
python -m build

# Publish to PyPI
uv publish

Docker

# Build Docker image
docker build -t skill-seekers .

# Run with docker-compose (includes vector databases)
docker-compose up -d

# Run MCP server only
docker-compose up -d mcp-server

# View logs
docker-compose logs -f mcp-server

Testing Instructions

Running Tests

CRITICAL: Never skip tests - all tests must pass before commits.

# All tests (must run pip install -e . first!)
pytest tests/ -v

# Specific test file
pytest tests/test_scraper_features.py -v
pytest tests/test_mcp_fastmcp.py -v
pytest tests/test_cloud_storage.py -v

# With coverage
pytest tests/ --cov=src/skill_seekers --cov-report=term --cov-report=html

# Single test
pytest tests/test_scraper_features.py::test_detect_language -v

# E2E tests
pytest tests/test_e2e_three_stream_pipeline.py -v

# Skip slow tests
pytest tests/ -v -m "not slow"

# Run only integration tests
pytest tests/ -v -m integration

# Combine markers: exclude both slow and integration tests
pytest tests/ -v -m "not slow and not integration"

Test Architecture

98 test files covering all features
1880+ tests passing
CI Matrix: Ubuntu + macOS, Python 3.10-3.12
Test markers defined in pyproject.toml:

Marker	Description
`slow`	Tests taking >5 seconds
`integration`	Requires external services (APIs)
`e2e`	End-to-end tests (resource-intensive)
`venv`	Requires virtual environment setup
`bootstrap`	Bootstrap skill specific
`benchmark`	Performance benchmark tests

Test Configuration

From pyproject.toml:

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
addopts = "-v --tb=short --strict-markers"
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function"

The conftest.py file checks that the package is installed before running tests.
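
A check of this kind might look like the sketch below. It is an assumption about the approach, not a copy of the project's conftest.py; the helper name `ensure_installed` is hypothetical.

```python
import importlib.util


def ensure_installed(package: str = "skill_seekers") -> None:
    """Raise a readable error when `package` cannot be imported."""
    if importlib.util.find_spec(package) is None:
        raise RuntimeError(
            f"{package!r} is not importable. "
            "Run `pip install -e .` from the repository root first."
        )
```

Calling such a helper at the top of conftest.py would stop the test run early with an actionable message instead of a cascade of import errors.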

Code Style Guidelines

Linting and Formatting

# Run ruff linter
ruff check src/ tests/

# Run ruff formatter check
ruff format --check src/ tests/

# Auto-fix issues
ruff check src/ tests/ --fix
ruff format src/ tests/

# Run mypy type checker
mypy src/skill_seekers --show-error-codes --pretty

Style Rules (from pyproject.toml)

Line length: 100 characters
Target Python: 3.10+
Enabled rules: E, W, F, I, B, C4, UP, ARG, SIM
Ignored rules: E501, F541, ARG002, B007, I001, SIM114
Import sorting: isort style with skill_seekers as first-party

MyPy Configuration (from mypy.ini)

[mypy]
python_version = 3.10
warn_return_any = False
warn_unused_configs = True
disallow_untyped_defs = False
check_untyped_defs = True
ignore_missing_imports = True
no_implicit_optional = True
show_error_codes = True

# Gradual typing - be lenient for now
disallow_incomplete_defs = False
disallow_untyped_calls = False

Code Conventions

Use type hints where practical (gradual typing approach)
Docstrings: Use Google-style or standard docstrings
Error handling: Use specific exceptions, provide helpful messages
Async code: Use asyncio, mark tests with @pytest.mark.asyncio
File naming: Use snake_case for all Python files
Class naming: Use PascalCase for classes
Function naming: Use snake_case for functions and methods
Constants: Use UPPER_CASE for module-level constants

Architecture Patterns

Platform Adaptor Pattern (Strategy Pattern)

All platform-specific logic is encapsulated in adaptors:

import os

from skill_seekers.cli.adaptors import get_adaptor

# Get platform-specific adaptor
adaptor = get_adaptor('gemini')  # or 'claude', 'openai', 'langchain', etc.

# Package skill
adaptor.package(skill_dir='output/react/', output_path='output/')

# Upload to platform
adaptor.upload(
    package_path='output/react-gemini.tar.gz',
    api_key=os.getenv('GOOGLE_API_KEY')
)

Each adaptor inherits from SkillAdaptor base class and implements:

format_skill_md() - Format SKILL.md content
package() - Create platform-specific package
upload() - Upload to platform API
validate_api_key() - Validate API key format
supports_enhancement() - Whether AI enhancement is supported
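
To make the pattern concrete, here is a self-contained sketch of a base class, one adaptor, and a registry lookup. The method signatures, the `ZipMarkdownAdaptor` class, and the `_REGISTRY` dict are assumptions made for illustration; the real definitions live in `adaptors/base.py` and `adaptors/__init__.py` and may differ. `upload()` and `validate_api_key()` are omitted to keep the example short.

```python
import zipfile
from abc import ABC, abstractmethod
from pathlib import Path


class SkillAdaptor(ABC):
    """Illustrative stand-in for the real base class."""

    @abstractmethod
    def format_skill_md(self, content: str) -> str: ...

    @abstractmethod
    def package(self, skill_dir: str, output_path: str) -> Path: ...

    def supports_enhancement(self) -> bool:
        return False


class ZipMarkdownAdaptor(SkillAdaptor):
    """Hypothetical adaptor that zips a skill directory unchanged."""

    def format_skill_md(self, content: str) -> str:
        return content.strip() + "\n"

    def package(self, skill_dir: str, output_path: str) -> Path:
        src = Path(skill_dir)
        out_dir = Path(output_path)
        out_dir.mkdir(parents=True, exist_ok=True)
        archive = out_dir / f"{src.name}-markdown.zip"
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in sorted(src.rglob("*")):
                if path.is_file():
                    # Store paths relative to the skill directory.
                    zf.write(path, path.relative_to(src).as_posix())
        return archive


# Strategy lookup: callers depend only on the platform name.
_REGISTRY = {"markdown": ZipMarkdownAdaptor}


def get_adaptor(platform: str) -> SkillAdaptor:
    try:
        return _REGISTRY[platform]()
    except KeyError:
        raise ValueError(f"Unknown platform: {platform!r}") from None
```

Because callers only see the `SkillAdaptor` interface, supporting a new platform comes down to adding one class and one registry entry.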

CLI Architecture (Git-style)

Entry point: src/skill_seekers/cli/main.py

The CLI uses subcommands that delegate to existing modules:

# skill-seekers scrape --config react.json
# Transforms to: doc_scraper.main() with modified sys.argv
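
The delegation described above (rewrite `sys.argv`, then call the module's own `main()`) can be sketched like this. `scrape_main`, `COMMANDS`, and `dispatch` are hypothetical names; the actual logic in `main.py` is not shown in this document and may be structured differently.

```python
import sys
from typing import Callable


def scrape_main() -> int:
    """Stand-in for a module-level main() that parses sys.argv itself."""
    print("scrape called with", sys.argv[1:])
    return 0


COMMANDS: dict[str, Callable[[], int]] = {"scrape": scrape_main}


def dispatch(argv: list[str]) -> int:
    """Route `prog <subcommand> [args...]` to the matching main()."""
    if len(argv) < 2 or argv[1] not in COMMANDS:
        choices = "|".join(COMMANDS)
        print(f"usage: prog {{{choices}}} [args...]", file=sys.stderr)
        return 2
    subcommand, rest = argv[1], argv[2:]
    saved = sys.argv
    # The delegated module sees only its own arguments.
    sys.argv = [f"{argv[0]} {subcommand}", *rest]
    try:
        return COMMANDS[subcommand]()
    finally:
        sys.argv = saved  # restore for the caller
```

Restoring `sys.argv` in a `finally` block keeps the dispatcher safe to call from tests or from other commands.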

Available subcommands:

create - Unified create command
config - Configuration wizard
scrape - Documentation scraping
github - GitHub repository scraping
pdf - PDF extraction
unified - Multi-source scraping
analyze / codebase - Local codebase analysis
enhance - AI enhancement
package - Package skill for target platform
upload - Upload to platform
cloud - Cloud storage operations
sync - Sync monitoring
benchmark - Performance benchmarking
embed - Embedding server
install / install-agent - Complete workflow
stream - Streaming ingestion
update - Incremental updates
multilang - Multi-language support
quality - Quality metrics
resume - Resume interrupted jobs
estimate - Estimate page counts
workflows - Workflow management

MCP Server Architecture

Two implementations:

server_fastmcp.py - Modern, decorator-based (recommended, ~708 lines)
server_legacy.py - Legacy implementation

Tools are organized by category:

Config tools (3 tools): generate_config, list_configs, validate_config
Scraping tools (9 tools): estimate_pages, scrape_docs, scrape_github, scrape_pdf, scrape_codebase, detect_patterns, extract_test_examples, build_how_to_guides, extract_config_patterns
Packaging tools (4 tools): package_skill, upload_skill, enhance_skill, install_skill
Source tools (5 tools): fetch_config, submit_config, add_config_source, list_config_sources, remove_config_source
Splitting tools (2 tools): split_config, generate_router
Vector Database tools (4 tools): export_to_weaviate, export_to_chroma, export_to_faiss, export_to_qdrant
Workflow tools (5 tools): list_workflows, get_workflow, create_workflow, update_workflow, delete_workflow

Running MCP Server:

# Stdio transport (default)
python -m skill_seekers.mcp.server_fastmcp

# HTTP transport
python -m skill_seekers.mcp.server_fastmcp --http --port 8765

Cloud Storage Architecture

Abstract base class pattern for cloud providers:

base_storage.py - Defines BaseStorageAdaptor interface
s3_storage.py - AWS S3 implementation
gcs_storage.py - Google Cloud Storage implementation
azure_storage.py - Azure Blob Storage implementation

Sync Monitoring Architecture

Pydantic-based models in src/skill_seekers/sync/:

models.py - Data models (SyncConfig, ChangeReport, SyncState)
detector.py - Change detection logic
monitor.py - Monitoring daemon
notifier.py - Notification system (webhook, email, slack)
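
One common way to implement change detection is to compare content hashes between runs. The sketch below assumes that approach; it uses a plain dataclass rather than Pydantic to stay dependency-free, and the fields of `ChangeReport` as well as the `fingerprint` and `detect_changes` helpers are hypothetical, not the project's actual models.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ChangeReport:
    """Illustrative report shape; the real models are Pydantic classes."""

    added: list[str] = field(default_factory=list)
    modified: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)

    @property
    def has_changes(self) -> bool:
        return bool(self.added or self.modified or self.removed)


def fingerprint(content: str) -> str:
    """Stable digest of a page body."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous: dict[str, str], pages: dict[str, str]) -> ChangeReport:
    """Compare stored hashes (url -> digest) with freshly fetched page bodies."""
    current = {url: fingerprint(body) for url, body in pages.items()}
    report = ChangeReport()
    for url, digest in current.items():
        if url not in previous:
            report.added.append(url)
        elif previous[url] != digest:
            report.modified.append(url)
    report.removed = [url for url in previous if url not in current]
    return report
```

A monitor would persist the `url -> digest` mapping between runs and hand a non-empty report to the notifier.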

Git Workflow

Branch Structure

main (production)
  ↑
  │ (only maintainer merges)
  │
development (integration) ← default branch for PRs
  ↑
  │ (all contributor PRs go here)
  │
feature branches

main - Production, always stable, protected
development - Active development, default for PRs
Feature branches - Your work, created from development

Creating a Feature Branch

# 1. Checkout development
git checkout development
git pull upstream development

# 2. Create feature branch
git checkout -b my-feature

# 3. Make changes, commit, push
git add .
git commit -m "Add my feature"
git push origin my-feature

# 4. Create PR targeting 'development' branch

CI/CD Configuration

GitHub Actions Workflows

All workflows are in .github/workflows/:

tests.yml:

Runs on: push/PR to main and development
Lint job: Ruff + MyPy
Test matrix: Ubuntu + macOS, Python 3.10-3.12
Coverage: Uploads to Codecov

release.yml:

Triggered on version tags (v*)
Builds and publishes to PyPI using uv
Creates GitHub release with changelog

docker-publish.yml:

Builds and publishes Docker images
Multi-architecture support (linux/amd64, linux/arm64)

vector-db-export.yml:

Tests vector database exports

scheduled-updates.yml:

Scheduled sync monitoring

quality-metrics.yml:

Quality metrics tracking

test-vector-dbs.yml:

Vector database integration tests

Pre-commit Checks (Manual)

# Before committing, run:
ruff check src/ tests/
ruff format --check src/ tests/
pytest tests/ -v -x  # Stop on first failure

Security Considerations

API Keys and Secrets

Never commit API keys to the repository
Use environment variables:
- ANTHROPIC_API_KEY - Claude AI
- GOOGLE_API_KEY - Google Gemini
- OPENAI_API_KEY - OpenAI
- GITHUB_TOKEN - GitHub API
- AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY - AWS S3
- GOOGLE_APPLICATION_CREDENTIALS - GCS
- AZURE_STORAGE_CONNECTION_STRING - Azure
Configuration storage:
- Stored at ~/.config/skill-seekers/config.json
- Permissions: 600 (owner read/write only)
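
Writing a file with owner-only permissions can be done as in the sketch below. The `save_config` helper is hypothetical and only illustrates the 600 mode mentioned above; it is not the project's implementation.

```python
import json
import os
from pathlib import Path


def save_config(data: dict, path: Path) -> None:
    """Write `data` as JSON readable and writable by the owner only."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Creating with mode 0o600 avoids a window where the file is world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
    # Tighten permissions on files that already existed with a looser mode.
    os.chmod(path, 0o600)
```

On Windows the mode bits are largely ignored, so this guarantee applies to POSIX systems only.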

Rate Limit Handling

The GitHub API is rate-limited: 5000 requests/hour with a token, 60 requests/hour without one
The tool has built-in rate limit handling with retry logic
Use --non-interactive flag for CI/CD environments
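
Retry logic for rate limits usually follows an exponential backoff pattern. The sketch below shows that general technique under stated assumptions: `RateLimitError` and `with_retries` are hypothetical names, and the tool's actual retry policy is not documented here.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error carrying an optional wait hint in seconds."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limit exceeded")
        self.retry_after = retry_after


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, backing off exponentially after each rate-limit error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimitError as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error
            # Prefer the server's hint; otherwise wait 1x, 2x, 4x, ... base_delay.
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** (attempt - 1)
            sleep(delay)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting.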

Custom API Endpoints

Support for Claude-compatible APIs:

export ANTHROPIC_API_KEY=your-custom-api-key
export ANTHROPIC_BASE_URL=https://custom-endpoint.com/v1

Common Development Tasks

Adding a New CLI Command

Create module in src/skill_seekers/cli/my_command.py
Implement main() function with argument parsing

Add entry point in pyproject.toml:

[project.scripts]
skill-seekers-my-command = "skill_seekers.cli.my_command:main"

Add subcommand handler in src/skill_seekers/cli/main.py
Add argument parser in src/skill_seekers/cli/parsers/
Add tests in tests/test_my_command.py
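
A minimal `my_command.py` matching the steps above could look like the following sketch. It assumes `argparse` for argument parsing, and the `--config` and `--dry-run` options are purely illustrative; check existing commands in `src/skill_seekers/cli/` for the conventions actually used.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build the parser separately so tests can exercise it directly."""
    parser = argparse.ArgumentParser(
        prog="skill-seekers-my-command",
        description="Hypothetical example command.",
    )
    parser.add_argument("--config", required=True, help="Path to a JSON config file")
    parser.add_argument(
        "--dry-run", action="store_true", help="Print the plan without running it"
    )
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Entry point; argv=None makes argparse read sys.argv[1:]."""
    args = build_parser().parse_args(argv)
    action = "Would process" if args.dry_run else "Processing"
    print(f"{action} {args.config}")
    return 0
```

Accepting an optional `argv` keeps `main()` usable both as the `[project.scripts]` entry point and from tests such as `main(["--config", "react.json", "--dry-run"])`.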

Adding a New Platform Adaptor

Create src/skill_seekers/cli/adaptors/my_platform.py
Inherit from SkillAdaptor base class
Implement required methods: package(), upload(), format_skill_md()
Register in src/skill_seekers/cli/adaptors/__init__.py
Add optional dependencies in pyproject.toml
Add tests in tests/test_adaptors/

Adding an MCP Tool

Implement tool logic in src/skill_seekers/mcp/tools/category_tools.py
Register in src/skill_seekers/mcp/server_fastmcp.py
Add test in tests/test_mcp_fastmcp.py

Adding Cloud Storage Provider

Create module in src/skill_seekers/cli/storage/my_storage.py
Inherit from BaseStorageAdaptor base class
Implement required methods: upload_file(), download_file(), list_files(), delete_file()
Register in src/skill_seekers/cli/storage/__init__.py
Add optional dependencies in pyproject.toml
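
As an illustration of the four required methods, the sketch below implements them against a local directory. The method signatures and the `LocalDirStorage` class are assumptions; the real `BaseStorageAdaptor` in `base_storage.py` may define different parameters or return types.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class BaseStorageAdaptor(ABC):
    """Illustrative stand-in; the real interface lives in base_storage.py."""

    @abstractmethod
    def upload_file(self, local_path: str, remote_path: str) -> str: ...

    @abstractmethod
    def download_file(self, remote_path: str, local_path: str) -> None: ...

    @abstractmethod
    def list_files(self, prefix: str = "") -> list[str]: ...

    @abstractmethod
    def delete_file(self, remote_path: str) -> None: ...


class LocalDirStorage(BaseStorageAdaptor):
    """Hypothetical provider that treats a local directory as the bucket."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def upload_file(self, local_path: str, remote_path: str) -> str:
        target = self.root / remote_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_path, target)
        return remote_path

    def download_file(self, remote_path: str, local_path: str) -> None:
        shutil.copyfile(self.root / remote_path, local_path)

    def list_files(self, prefix: str = "") -> list[str]:
        keys = (
            p.relative_to(self.root).as_posix()
            for p in self.root.rglob("*")
            if p.is_file()
        )
        return sorted(key for key in keys if key.startswith(prefix))

    def delete_file(self, remote_path: str) -> None:
        (self.root / remote_path).unlink()
```

A directory-backed provider like this can also serve as a test double for code that would otherwise need S3, GCS, or Azure credentials.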

Documentation

Project Documentation (New Structure - v3.1.0+)

Entry Points:

README.md - Main project documentation with navigation
docs/README.md - Documentation hub
AGENTS.md - This file, for AI coding agents

Getting Started (for new users):

docs/getting-started/01-installation.md - Installation guide
docs/getting-started/02-quick-start.md - 3 commands to first skill
docs/getting-started/03-your-first-skill.md - Complete walkthrough
docs/getting-started/04-next-steps.md - Where to go from here

User Guides (common tasks):

docs/user-guide/01-core-concepts.md - How Skill Seekers works
docs/user-guide/02-scraping.md - All scraping options
docs/user-guide/03-enhancement.md - AI enhancement explained
docs/user-guide/04-packaging.md - Export to platforms
docs/user-guide/05-workflows.md - Enhancement workflows
docs/user-guide/06-troubleshooting.md - Common issues

Reference (technical details):

docs/reference/CLI_REFERENCE.md - Complete command reference (20 commands)
docs/reference/MCP_REFERENCE.md - MCP tools reference (26 tools)
docs/reference/CONFIG_FORMAT.md - JSON configuration specification
docs/reference/ENVIRONMENT_VARIABLES.md - All environment variables

Advanced (power user topics):

docs/advanced/mcp-server.md - MCP server setup
docs/advanced/mcp-tools.md - Advanced MCP usage
docs/advanced/custom-workflows.md - Creating custom workflows
docs/advanced/multi-source.md - Multi-source scraping

Legacy (being phased out):

QUICKSTART.md - Old quick start (see docs/getting-started/)
docs/guides/USAGE.md - Old usage guide (see docs/user-guide/)
docs/QUICK_REFERENCE.md - Old reference (see docs/reference/)

Configuration Documentation

Preset configs are in configs/ directory:

godot.json - Godot Engine
blender.json / blender-unified.json - Blender
claude-code.json - Claude Code
httpx_comprehensive.json - HTTPX library
medusa-mercurjs.json - Medusa/MercurJS
astrovalley_unified.json - Astrovalley
configs/integrations/ - Integration-specific configs

Key Dependencies

Core Dependencies (Required)

Package	Version	Purpose
`requests`	>=2.32.5	HTTP requests
`beautifulsoup4`	>=4.14.2	HTML parsing
`PyGithub`	>=2.5.0	GitHub API
`GitPython`	>=3.1.40	Git operations
`httpx`	>=0.28.1	Async HTTP
`anthropic`	>=0.76.0	Claude AI API
`PyMuPDF`	>=1.24.14	PDF processing
`Pillow`	>=11.0.0	Image processing
`pytesseract`	>=0.3.13	OCR
`pydantic`	>=2.12.3	Data validation
`pydantic-settings`	>=2.11.0	Settings management
`click`	>=8.3.0	CLI framework
`Pygments`	>=2.19.2	Syntax highlighting
`pathspec`	>=0.12.1	Path matching
`networkx`	>=3.0	Graph operations
`schedule`	>=1.2.0	Scheduled tasks
`python-dotenv`	>=1.1.1	Environment variables
`jsonschema`	>=4.25.1	JSON validation
`PyYAML`	>=6.0	YAML parsing

Optional Dependencies

Feature	Package	Install Command
MCP Server	`mcp>=1.25,<2`	`pip install -e ".[mcp]"`
Google Gemini	`google-generativeai>=0.8.0`	`pip install -e ".[gemini]"`
OpenAI	`openai>=1.0.0`	`pip install -e ".[openai]"`
AWS S3	`boto3>=1.34.0`	`pip install -e ".[s3]"`
Google Cloud Storage	`google-cloud-storage>=2.10.0`	`pip install -e ".[gcs]"`
Azure Blob Storage	`azure-storage-blob>=12.19.0`	`pip install -e ".[azure]"`
Chroma DB	`chromadb>=0.4.0`	`pip install -e ".[chroma]"`
Weaviate	`weaviate-client>=3.25.0`	`pip install -e ".[weaviate]"`
Embedding Server	`fastapi>=0.109.0`, `uvicorn>=0.27.0`, `sentence-transformers>=2.3.0`	`pip install -e ".[embedding]"`

Dev Dependencies (in dependency-groups)

Package	Version	Purpose
`pytest`	>=8.4.2	Testing framework
`pytest-asyncio`	>=0.24.0	Async test support
`pytest-cov`	>=7.0.0	Coverage
`coverage`	>=7.11.0	Coverage reporting
`ruff`	>=0.14.13	Linting/formatting
`mypy`	>=1.19.1	Type checking
`psutil`	>=5.9.0	Process utilities for testing
`numpy`	>=1.24.0	Numerical operations
`starlette`	>=0.31.0	HTTP transport testing
`boto3`	>=1.26.0	AWS S3 testing
`google-cloud-storage`	>=2.10.0	GCS testing
`azure-storage-blob`	>=12.17.0	Azure testing

Troubleshooting

Common Issues

ImportError: No module named 'skill_seekers'

Solution: Run pip install -e .

Tests failing with "package not installed"

Solution: Ensure you ran pip install -e . in the correct virtual environment

MCP server import errors

Solution: Install with pip install -e ".[mcp]"

Type checking failures

MyPy is configured to be lenient (gradual typing)
Focus on critical paths, not full coverage

Docker build failures

Ensure you have BuildKit enabled: DOCKER_BUILDKIT=1
Check that all submodules are initialized: git submodule update --init

Rate limit errors from GitHub

Set GITHUB_TOKEN environment variable for authenticated requests
Improves rate limit from 60 to 5000 requests/hour

Getting Help

Check TROUBLESHOOTING.md for detailed solutions
Review docs/FAQ.md for common questions
Visit https://skillseekersweb.com/ for documentation
Open an issue on GitHub with:
- Clear title and description
- Steps to reproduce
- Expected vs actual behavior
- Environment details (OS, Python version)
- Error messages and stack traces

Environment Variables Reference

Variable	Purpose	Required For
`ANTHROPIC_API_KEY`	Claude AI API access	Claude enhancement/upload
`GOOGLE_API_KEY`	Google Gemini API access	Gemini enhancement/upload
`OPENAI_API_KEY`	OpenAI API access	OpenAI enhancement/upload
`GITHUB_TOKEN`	GitHub API authentication	GitHub scraping (recommended)
`AWS_ACCESS_KEY_ID`	AWS S3 authentication	S3 cloud storage
`AWS_SECRET_ACCESS_KEY`	AWS S3 authentication	S3 cloud storage
`GOOGLE_APPLICATION_CREDENTIALS`	GCS authentication path	GCS cloud storage
`AZURE_STORAGE_CONNECTION_STRING`	Azure Blob authentication	Azure cloud storage
`ANTHROPIC_BASE_URL`	Custom Claude endpoint	Custom API endpoints
`SKILL_SEEKERS_HOME`	Data directory path	Docker/runtime
`SKILL_SEEKERS_OUTPUT`	Output directory path	Docker/runtime

Version Management

The version is defined in pyproject.toml and dynamically read by src/skill_seekers/_version.py:

# _version.py reads from pyproject.toml
__version__ = get_version()  # Returns version from pyproject.toml

To update version:

Edit version in pyproject.toml
The _version.py file will automatically pick up the new version
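
One way a `get_version()` helper could read the version is sketched below. This is an assumption about the mechanism, not the contents of `_version.py`: the `pyproject` and `default` parameters are hypothetical, and the regex fallback exists only because `tomllib` is unavailable on Python 3.10.

```python
import re
from pathlib import Path

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # Python 3.10
    tomllib = None


def get_version(pyproject: Path, default: str = "0.0.0") -> str:
    """Return [project].version from `pyproject`, or `default` if unavailable."""
    if not pyproject.is_file():
        return default
    if tomllib is not None:
        with pyproject.open("rb") as fh:
            return tomllib.load(fh).get("project", {}).get("version", default)
    # Fallback: first top-level-looking `version = "..."` line. It does not
    # check which table the key belongs to, so it is only an approximation.
    text = pyproject.read_text(encoding="utf-8")
    match = re.search(r'^version\s*=\s*"([^"]+)"', text, re.MULTILINE)
    return match.group(1) if match else default
```

Reading from `pyproject.toml` works for editable installs; for an installed wheel, `importlib.metadata.version("skill-seekers")` is the usual alternative because the TOML file is not shipped with the package.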

Configuration File Format

Skill Seekers uses JSON configuration files to define scraping targets. Example structure:

{
  "name": "godot",
  "description": "Godot Engine documentation",
  "merge_mode": "claude-enhanced",
  "sources": [
    {
      "type": "documentation",
      "base_url": "https://docs.godotengine.org/en/stable/",
      "extract_api": true,
      "selectors": {
        "main_content": "div[role='main']",
        "title": "title",
        "code_blocks": "pre"
      },
      "url_patterns": {
        "include": [],
        "exclude": ["/search.html", "/_static/"]
      },
      "categories": {
        "getting_started": ["introduction", "getting_started"],
        "scripting": ["scripting", "gdscript"]
      },
      "rate_limit": 0.5,
      "max_pages": 500
    },
    {
      "type": "github",
      "repo": "godotengine/godot",
      "enable_codebase_analysis": true,
      "code_analysis_depth": "deep",
      "fetch_issues": true,
      "max_issues": 100
    }
  ]
}
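
Before running a long scrape, a config like the one above can be sanity-checked with a few lines of Python. The required keys in this sketch are inferred from the example only and are assumptions; docs/reference/CONFIG_FORMAT.md is the authoritative specification, and `validate_config` is a hypothetical helper.

```python
# Keys assumed to be required per source type, inferred from the example above.
REQUIRED_BY_TYPE = {"documentation": ("base_url",), "github": ("repo",)}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for key in ("name", "sources"):
        if key not in config:
            problems.append(f"missing top-level key: {key!r}")
    for index, source in enumerate(config.get("sources", [])):
        source_type = source.get("type")
        # Source types not listed above are not checked by this sketch.
        for key in REQUIRED_BY_TYPE.get(source_type, ()):
            if key not in source:
                problems.append(f"sources[{index}] ({source_type}): missing {key!r}")
    return problems
```

Load the file with `json.load()` and pass the resulting dict to the helper; returning a list of messages instead of raising on the first error lets a caller report every problem at once.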

This document is maintained for AI coding agents. For human contributors, see README.md and CONTRIBUTING.md.

Last updated: 2026-02-16
