skill-seekers-reference/AGENTS.md
yusyus 80a40b4fc9 docs: Add AGENTS.md guide for AI coding agents
- Comprehensive guide for AI assistants working with the codebase
- Covers project structure, development commands, architecture patterns
- Includes testing guidelines, CI/CD info, and troubleshooting
- Documents all entry points, dependencies, and best practices
2026-02-01 16:31:20 +03:00

AGENTS.md - Skill Seekers

This file provides essential guidance for AI coding agents working with the Skill Seekers codebase.


Project Overview

Skill Seekers is a Python CLI tool that converts documentation websites, GitHub repositories, and PDF files into AI-ready skills for LLM platforms. It supports 4 target platforms:

  • Claude AI (ZIP + YAML format)
  • Google Gemini (tar.gz format)
  • OpenAI ChatGPT (ZIP + Vector Store)
  • Generic Markdown (universal ZIP export)

  • Current Version: 2.7.4
  • Python Version: 3.10+ required
  • License: MIT
  • Website: https://skillseekersweb.com/

Core Workflow

  1. Scrape Phase - Crawl documentation/GitHub/PDF sources
  2. Build Phase - Organize content into categorized references
  3. Enhancement Phase - AI-powered quality improvements (optional)
  4. Package Phase - Create platform-specific packages
  5. Upload Phase - Auto-upload to target platform (optional)
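
The five phases above, two of which are optional, can be sketched as a small planning helper. The function name `plan_phases` and its flags are hypothetical and not part of the Skill Seekers API; the sketch only illustrates which phases always run and which are opt-in.

```python
def plan_phases(enhance: bool = False, upload: bool = False) -> list[str]:
    """Return the ordered workflow phases for one run (illustrative only)."""
    phases = ["scrape", "build"]      # always run
    if enhance:
        phases.append("enhance")      # optional AI-powered improvements
    phases.append("package")          # always run
    if upload:
        phases.append("upload")       # optional auto-upload
    return phases


if __name__ == "__main__":
    print(plan_phases(enhance=True))
```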

Project Structure

Skill_Seekers/
├── src/skill_seekers/              # Main source code (src/ layout)
│   ├── cli/                        # CLI tools and commands
│   │   ├── adaptors/               # Platform adaptors (Strategy pattern)
│   │   │   ├── base.py             # Abstract base class
│   │   │   ├── claude.py           # Claude AI adaptor
│   │   │   ├── gemini.py           # Google Gemini adaptor
│   │   │   ├── openai.py           # OpenAI ChatGPT adaptor
│   │   │   └── markdown.py         # Generic Markdown adaptor
│   │   ├── main.py                 # Unified CLI entry point
│   │   ├── doc_scraper.py          # Documentation scraper
│   │   ├── github_scraper.py       # GitHub repository scraper
│   │   ├── pdf_scraper.py          # PDF extraction
│   │   ├── unified_scraper.py      # Multi-source scraping
│   │   ├── codebase_scraper.py     # Local codebase analysis (C2.x/C3.x)
│   │   ├── enhance_skill_local.py  # AI enhancement (LOCAL mode)
│   │   ├── package_skill.py        # Skill packager
│   │   ├── upload_skill.py         # Upload to platforms
│   │   └── ...                     # 50+ CLI modules
│   └── mcp/                        # MCP server integration
│       ├── server_fastmcp.py       # FastMCP server (main)
│       ├── server.py               # Legacy server
│       └── tools/                  # MCP tool implementations
├── tests/                          # Test suite (76 test files)
├── configs/                        # Preset configuration files
├── docs/                           # Documentation (54 markdown files)
├── .github/workflows/              # CI/CD workflows
├── pyproject.toml                  # Main project configuration
└── requirements.txt                # Pinned dependencies

Build and Development Commands

Setup (REQUIRED before any development)

# Install in editable mode (REQUIRED for tests due to src/ layout)
pip install -e .

# Install with all platform dependencies
pip install -e ".[all-llms]"

# Install specific platforms only
pip install -e ".[gemini]"    # Google Gemini support
pip install -e ".[openai]"    # OpenAI ChatGPT support
pip install -e ".[mcp]"       # MCP server dependencies

CRITICAL: The project uses a src/ layout. Tests WILL FAIL unless you install with pip install -e . first.
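
Because the package lives under src/, it is only importable once it has been installed. A quick pre-flight check might look like the following sketch; `is_installed` is a hypothetical helper, while `importlib.util.find_spec` is the standard-library call it relies on.

```python
import importlib.util


def is_installed(package: str = "skill_seekers") -> bool:
    """Return True if the top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    if not is_installed():
        print("skill_seekers is not importable; run: pip install -e .")
```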

Building

# Build package using uv (recommended)
uv build

# Or using standard build
python -m build

# Publish to PyPI
uv publish

Running Tests

CRITICAL: Never skip tests - all tests must pass before commits.

# All tests (must run pip install -e . first!)
pytest tests/ -v

# Specific test file
pytest tests/test_scraper_features.py -v
pytest tests/test_mcp_fastmcp.py -v

# With coverage
pytest tests/ --cov=src/skill_seekers --cov-report=term --cov-report=html

# Single test
pytest tests/test_scraper_features.py::test_detect_language -v

# E2E tests
pytest tests/test_e2e_three_stream_pipeline.py -v

Test Architecture:

  • 76 test files covering all features
  • CI Matrix: Ubuntu + macOS, Python 3.10-3.13
  • 1200+ tests passing
  • Test markers: slow, integration, e2e, venv, bootstrap

Code Style Guidelines

Linting and Formatting

# Run ruff linter
ruff check src/ tests/

# Run ruff formatter check
ruff format --check src/ tests/

# Auto-fix issues
ruff check src/ tests/ --fix
ruff format src/ tests/

# Run mypy type checker
mypy src/skill_seekers --show-error-codes --pretty

Style Rules (from pyproject.toml)

  • Line length: 100 characters
  • Target Python: 3.10+
  • Enabled rules: E, W, F, I, B, C4, UP, ARG, SIM
  • Import sorting: isort style with skill_seekers as first-party

Code Conventions

  1. Use type hints where practical (gradual typing approach)
  2. Docstrings: Use Google-style or standard docstrings
  3. Error handling: Use specific exceptions, provide helpful messages
  4. Async code: Use asyncio, mark tests with @pytest.mark.asyncio
  5. File naming: Use snake_case for all Python files

Architecture Patterns

Platform Adaptor Pattern (Strategy Pattern)

All platform-specific logic is encapsulated in adaptors:

import os

from skill_seekers.cli.adaptors import get_adaptor

# Get platform-specific adaptor
adaptor = get_adaptor('gemini')  # or 'claude', 'openai', 'markdown'

# Package skill
adaptor.package(skill_dir='output/react/', output_path='output/')

# Upload to platform
adaptor.upload(
    package_path='output/react-gemini.tar.gz',
    api_key=os.getenv('GOOGLE_API_KEY')
)

CLI Architecture (Git-style)

Entry point: src/skill_seekers/cli/main.py

The CLI uses subcommands that delegate to existing modules:

# skill-seekers scrape --config react.json
# Transforms to: doc_scraper.main() with modified sys.argv

Available subcommands:

  • config - Configuration wizard
  • scrape - Documentation scraping
  • github - GitHub repository scraping
  • pdf - PDF extraction
  • unified - Multi-source scraping
  • analyze - Local codebase analysis
  • enhance - AI enhancement
  • package - Package skill
  • upload - Upload to platform
  • install / install-agent - Complete workflow
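
The delegation described above (a subcommand mapped to an existing module's main() with a rewritten sys.argv) can be sketched roughly as follows. The `dispatch` function, the `SUBCOMMANDS` mapping, and the stand-in `doc_scraper_main` are hypothetical; the real logic lives in src/skill_seekers/cli/main.py and may differ.

```python
import sys
from typing import Callable


def doc_scraper_main() -> int:
    """Stand-in for doc_scraper.main(); it reads its options from sys.argv."""
    print("doc_scraper received:", sys.argv[1:])
    return 0


# Hypothetical mapping from subcommand name to a module-level main().
SUBCOMMANDS: dict[str, Callable[[], int]] = {"scrape": doc_scraper_main}


def dispatch(argv: list[str]) -> int:
    """Route `skill-seekers <subcommand> ...` to the matching main()."""
    if not argv or argv[0] not in SUBCOMMANDS:
        print("usage: skill-seekers <subcommand> [options]", file=sys.stderr)
        return 2
    name, rest = argv[0], argv[1:]
    saved = sys.argv
    # Rewrite sys.argv so the delegated main() parses only its own options.
    sys.argv = [f"skill-seekers {name}", *rest]
    try:
        return SUBCOMMANDS[name]()
    finally:
        sys.argv = saved  # restore the caller's argv


if __name__ == "__main__":
    dispatch(["scrape", "--config", "react.json"])
```

Restoring sys.argv in a finally block keeps the rewrite from leaking into later calls, which matters when several subcommands run in one process (for example in tests).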

MCP Server Architecture

Two implementations:

  • server_fastmcp.py - Modern, decorator-based (recommended, 708 lines)
  • server.py - Legacy implementation (2200 lines)

Tools are organized by category:

  • Config tools (3)
  • Scraping tools (8)
  • Packaging tools (4)
  • Splitting tools (2)
  • Source tools (4)
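
As a rough illustration of what decorator-based registration with tools grouped by category could look like, here is a standard-library sketch. It does not use the FastMCP API; the `tool` decorator, the `TOOLS` registry, and the `list_configs` function are hypothetical names.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical registry: category name -> {tool name -> function}.
TOOLS: dict[str, dict[str, Callable]] = defaultdict(dict)


def tool(category: str) -> Callable[[Callable], Callable]:
    """Register the decorated function as a tool under the given category."""
    def decorator(func: Callable) -> Callable:
        TOOLS[category][func.__name__] = func
        return func  # the function itself is left unchanged
    return decorator


@tool("config")
def list_configs() -> list[str]:
    """Example tool returning preset config names (illustrative data)."""
    return ["react.json", "vue.json"]


if __name__ == "__main__":
    for category, tools in TOOLS.items():
        print(category, sorted(tools))
```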

Testing Instructions

Test Categories

Marker        Description
slow          Tests taking >5 seconds
integration   Requires external services (APIs)
e2e           End-to-end tests (resource-intensive)
venv          Requires virtual environment setup
bootstrap     Bootstrap skill specific

Running Specific Test Categories

# Skip slow tests
pytest tests/ -v -m "not slow"

# Run only integration tests
pytest tests/ -v -m integration

# Run E2E tests
pytest tests/ -v -m e2e

Test Configuration (pytest settings in pyproject.toml)

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
addopts = "-v --tb=short --strict-markers"
asyncio_mode = "auto"
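
With asyncio_mode = "auto", pytest-asyncio collects plain async test functions and runs them in an event loop without requiring an explicit @pytest.mark.asyncio decorator. A minimal sketch, where `fetch_title` is a hypothetical coroutine standing in for real async scraping code:

```python
import asyncio


async def fetch_title(url: str) -> str:
    """Hypothetical coroutine; a real one would perform async HTTP I/O."""
    await asyncio.sleep(0)  # yield to the event loop once
    return f"title for {url}"


async def test_fetch_title() -> None:
    # No marker is needed in auto mode; pytest-asyncio drives this coroutine.
    assert await fetch_title("https://example.com") == "title for https://example.com"
```

Outside pytest, the same coroutine can be driven manually with asyncio.run(test_fetch_title()).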

Git Workflow

Branch Structure

main (production)
  ↑
  │ (only maintainer merges)
  │
development (integration) ← default branch for PRs
  ↑
  │ (all contributor PRs go here)
  │
feature branches

  • main - Production, always stable, protected
  • development - Active development, default for PRs
  • Feature branches - Your work, created from development

Creating a Feature Branch

# 1. Checkout development
git checkout development
git pull upstream development

# 2. Create feature branch
git checkout -b my-feature

# 3. Make changes, commit, push
git add .
git commit -m "Add my feature"
git push origin my-feature

# 4. Create PR targeting 'development' branch

CI/CD Configuration

GitHub Actions Workflows

.github/workflows/tests.yml:

  • Runs on: push/PR to main and development
  • Lint job: Ruff + MyPy
  • Test matrix: Ubuntu + macOS, Python 3.10-3.12
  • Coverage: Uploads to Codecov

.github/workflows/release.yml:

  • Triggered on version tags
  • Builds and publishes to PyPI

Pre-commit Checks (Manual)

# Before committing, run:
ruff check src/ tests/
ruff format --check src/ tests/
pytest tests/ -v -x  # Stop on first failure

Security Considerations

API Keys and Secrets

  1. Never commit API keys to the repository
  2. Use environment variables:
    • ANTHROPIC_API_KEY - Claude AI
    • GOOGLE_API_KEY - Google Gemini
    • OPENAI_API_KEY - OpenAI
    • GITHUB_TOKEN - GitHub API
  3. Configuration storage:
    • Stored at ~/.config/skill-seekers/config.json
    • Permissions: 600 (owner read/write only)

Rate Limit Handling

  • The GitHub API enforces rate limits (5000 requests/hour for authenticated requests)
  • The tool has built-in rate limit handling with retry logic
  • Use --non-interactive flag for CI/CD environments
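
One common way to implement such handling is to inspect GitHub's X-RateLimit-Remaining and X-RateLimit-Reset response headers and wait until the reset time. The sketch below shows only that calculation; `seconds_until_reset` is a hypothetical helper and is not taken from the Skill Seekers source.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before retrying, or 0.0 if quota remains."""
    now = time.time() if now is None else now
    # Note: a plain dict is case-sensitive; HTTP clients usually provide
    # case-insensitive header mappings.
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", now))  # UTC epoch seconds
    return max(0.0, reset_at - now)


if __name__ == "__main__":
    demo = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"}
    print(seconds_until_reset(demo, now=1700000000.0))  # prints 60.0
```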

Custom API Endpoints

Claude-compatible APIs (e.g., GLM-4.7) are supported by pointing the Anthropic environment variables at the alternative endpoint:

export ANTHROPIC_API_KEY=your-glm-47-api-key
export ANTHROPIC_BASE_URL=https://glm-4-7-endpoint.com/v1

Common Development Tasks

Adding a New CLI Command

  1. Create module in src/skill_seekers/cli/my_command.py
  2. Implement main() function with argument parsing
  3. Add entry point in pyproject.toml:
    [project.scripts]
    skill-seekers-my-command = "skill_seekers.cli.my_command:main"
    
  4. Add subcommand handler in src/skill_seekers/cli/main.py
  5. Add tests in tests/test_my_command.py
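
Steps 1 and 2 might produce a module along these lines. This is a sketch using the standard-library argparse; the `--name` option is hypothetical, and the project's existing commands may structure their argument parsing differently.

```python
"""Hypothetical src/skill_seekers/cli/my_command.py skeleton."""
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build the argument parser for the illustrative command."""
    parser = argparse.ArgumentParser(prog="skill-seekers-my-command")
    parser.add_argument("--name", default="world", help="illustrative option")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Entry point referenced from [project.scripts]; returns an exit code."""
    args = build_parser().parse_args(argv)  # argv=None falls back to sys.argv[1:]
    print(f"Hello, {args.name}")
    return 0
```

Accepting an optional argv makes the command easy to call from tests, for example main(["--name", "agent"]), without touching sys.argv.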

Adding a New Platform Adaptor

  1. Create src/skill_seekers/cli/adaptors/my_platform.py
  2. Inherit from SkillAdaptor base class
  3. Implement required methods: package(), upload(), enhance()
  4. Register in src/skill_seekers/cli/adaptors/__init__.py
  5. Add optional dependencies in pyproject.toml
  6. Add tests in tests/test_adaptors/
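
Steps 1 to 4 could look roughly like the sketch below. The `SkillAdaptor` class here is a simplified stand-in for the real base class in adaptors/base.py, whose method signatures are not shown in this guide; the `ADAPTORS` registry, the `MyPlatformAdaptor` class, and the `.zip` naming are likewise assumptions.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class SkillAdaptor(ABC):
    """Simplified stand-in for the real base class (signatures assumed)."""

    @abstractmethod
    def package(self, skill_dir: str, output_path: str) -> Path: ...

    @abstractmethod
    def upload(self, package_path: str, api_key: str) -> dict: ...

    @abstractmethod
    def enhance(self, skill_dir: str) -> bool: ...


class MyPlatformAdaptor(SkillAdaptor):
    """Hypothetical adaptor for a new platform."""

    def package(self, skill_dir: str, output_path: str) -> Path:
        # Only computes the target name here; real code would write an archive.
        return Path(output_path) / f"{Path(skill_dir).name}-my_platform.zip"

    def upload(self, package_path: str, api_key: str) -> dict:
        if not api_key:
            raise ValueError("api_key is required for upload")
        return {"uploaded": package_path}

    def enhance(self, skill_dir: str) -> bool:
        return False  # no platform-specific enhancement in this sketch


# Hypothetical registry, mirroring what adaptors/__init__.py might expose.
ADAPTORS: dict[str, type[SkillAdaptor]] = {"my_platform": MyPlatformAdaptor}


def get_adaptor(name: str) -> SkillAdaptor:
    """Look up and instantiate the adaptor registered under `name`."""
    try:
        return ADAPTORS[name]()
    except KeyError:
        raise ValueError(f"unknown platform: {name}") from None
```

Keeping each platform behind the same three methods is what lets callers switch targets by name only, which is the point of the Strategy pattern used here.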

Adding an MCP Tool

  1. Implement tool logic in src/skill_seekers/mcp/tools/category_tools.py
  2. Register in src/skill_seekers/mcp/server_fastmcp.py
  3. Add test in tests/test_mcp_fastmcp.py

Documentation

Project Documentation

  • README.md - Main project documentation
  • README.zh-CN.md - Chinese translation
  • CLAUDE.md - Detailed implementation guidance
  • QUICKSTART.md - Quick start guide
  • CONTRIBUTING.md - Contribution guidelines
  • docs/ - Comprehensive documentation (54 files)

Configuration Documentation

Preset configs are in configs/ directory:

  • godot.json - Godot Engine
  • react.json - React
  • vue.json - Vue.js
  • fastapi.json - FastAPI
  • *_unified.json - Multi-source configs

Troubleshooting

Common Issues

ImportError: No module named 'skill_seekers'

  • Solution: Run pip install -e .

Tests failing with "package not installed"

  • Solution: Ensure you ran pip install -e . in the correct virtual environment

MCP server import errors

  • Solution: Install with pip install -e ".[mcp]"

Type checking failures

  • MyPy is configured to be lenient (gradual typing)
  • Focus on critical paths, not full coverage

Getting Help

  • Check TROUBLESHOOTING.md for detailed solutions
  • Review docs/FAQ.md for common questions
  • Visit https://skillseekersweb.com/ for documentation
  • Open an issue on GitHub with:
    • Clear title and description
    • Steps to reproduce
    • Expected vs actual behavior
    • Environment details (OS, Python version)
    • Error messages and stack traces

Key Dependencies

Core Dependencies

  • requests>=2.32.5 - HTTP requests
  • beautifulsoup4>=4.14.2 - HTML parsing
  • PyGithub>=2.5.0 - GitHub API
  • GitPython>=3.1.40 - Git operations
  • httpx>=0.28.1 - Async HTTP
  • anthropic>=0.76.0 - Claude AI API
  • PyMuPDF>=1.24.14 - PDF processing
  • pydantic>=2.12.3 - Data validation
  • click>=8.3.0 - CLI framework

Optional Dependencies

  • mcp>=1.25 - MCP server
  • google-generativeai>=0.8.0 - Gemini support
  • openai>=1.0.0 - OpenAI support

Dev Dependencies

  • pytest>=8.4.2 - Testing framework
  • pytest-asyncio>=0.24.0 - Async test support
  • pytest-cov>=7.0.0 - Coverage
  • ruff>=0.14.13 - Linting/formatting
  • mypy>=1.19.1 - Type checking

This document is maintained for AI coding agents. For human contributors, see README.md and CONTRIBUTING.md.