AGENTS.md - Skill Seekers
This file provides essential guidance for AI coding agents working with the Skill Seekers codebase.
Project Overview
Skill Seekers is a Python CLI tool that converts documentation websites, GitHub repositories, and PDF files into AI-ready skills for LLM platforms. It supports 4 target platforms:
- Claude AI (ZIP + YAML format)
- Google Gemini (tar.gz format)
- OpenAI ChatGPT (ZIP + Vector Store)
- Generic Markdown (universal ZIP export)
- Current Version: 2.7.4
- Python Version: 3.10+ required
- License: MIT
- Website: https://skillseekersweb.com/
Core Workflow
- Scrape Phase - Crawl documentation/GitHub/PDF sources
- Build Phase - Organize content into categorized references
- Enhancement Phase - AI-powered quality improvements (optional)
- Package Phase - Create platform-specific packages
- Upload Phase - Auto-upload to target platform (optional)
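The five phases can be read as an ordered pipeline in which the enhancement and upload steps are optional. The sketch below only illustrates that ordering; `Phase`, `OPTIONAL`, and `plan_phases` are hypothetical names introduced here, not part of the Skill Seekers API.

```python
from enum import Enum


class Phase(Enum):
    """The five workflow phases, declared in execution order (hypothetical names)."""

    SCRAPE = "scrape"
    BUILD = "build"
    ENHANCE = "enhance"
    PACKAGE = "package"
    UPLOAD = "upload"


# Phases the overview marks as optional.
OPTIONAL = {Phase.ENHANCE, Phase.UPLOAD}


def plan_phases(enhance: bool = False, upload: bool = False) -> list[Phase]:
    """Return the phases that would run, preserving workflow order."""
    wanted = {Phase.ENHANCE: enhance, Phase.UPLOAD: upload}
    # Iterating an Enum yields members in definition order.
    return [p for p in Phase if p not in OPTIONAL or wanted[p]]


print([p.value for p in plan_phases()])
print([p.value for p in plan_phases(enhance=True, upload=True)])
```

With both flags left off, only the scrape, build, and package phases remain in the plan.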
Project Structure
/mnt/1ece809a-2821-4f10-aecb-fcdf34760c0b/Git/Skill_Seekers/
├── src/skill_seekers/ # Main source code (src/ layout)
│ ├── cli/ # CLI tools and commands
│ │ ├── adaptors/ # Platform adaptors (Strategy pattern)
│ │ │ ├── base.py # Abstract base class
│ │ │ ├── claude.py # Claude AI adaptor
│ │ │ ├── gemini.py # Google Gemini adaptor
│ │ │ ├── openai.py # OpenAI ChatGPT adaptor
│ │ │ └── markdown.py # Generic Markdown adaptor
│ │ ├── main.py # Unified CLI entry point
│ │ ├── doc_scraper.py # Documentation scraper
│ │ ├── github_scraper.py # GitHub repository scraper
│ │ ├── pdf_scraper.py # PDF extraction
│ │ ├── unified_scraper.py # Multi-source scraping
│ │ ├── codebase_scraper.py # Local codebase analysis (C2.x/C3.x)
│ │ ├── enhance_skill_local.py # AI enhancement (LOCAL mode)
│ │ ├── package_skill.py # Skill packager
│ │ ├── upload_skill.py # Upload to platforms
│ │ └── ... # 50+ CLI modules
│ └── mcp/ # MCP server integration
│ ├── server_fastmcp.py # FastMCP server (main)
│ ├── server.py # Legacy server
│ └── tools/ # MCP tool implementations
├── tests/ # Test suite (76 test files)
├── configs/ # Preset configuration files
├── docs/ # Documentation (54 markdown files)
├── .github/workflows/ # CI/CD workflows
├── pyproject.toml # Main project configuration
└── requirements.txt # Pinned dependencies
Build and Development Commands
Setup (REQUIRED before any development)
# Install in editable mode (REQUIRED for tests due to src/ layout)
pip install -e .
# Install with all platform dependencies
pip install -e ".[all-llms]"
# Install specific platforms only
pip install -e ".[gemini]" # Google Gemini support
pip install -e ".[openai]" # OpenAI ChatGPT support
pip install -e ".[mcp]" # MCP server dependencies
CRITICAL: The project uses a src/ layout. Tests WILL FAIL unless you install with pip install -e . first.
Building
# Build package using uv (recommended)
uv build
# Or using standard build
python -m build
# Publish to PyPI
uv publish
Running Tests
CRITICAL: Never skip tests - all tests must pass before commits.
# All tests (must run pip install -e . first!)
pytest tests/ -v
# Specific test file
pytest tests/test_scraper_features.py -v
pytest tests/test_mcp_fastmcp.py -v
# With coverage
pytest tests/ --cov=src/skill_seekers --cov-report=term --cov-report=html
# Single test
pytest tests/test_scraper_features.py::test_detect_language -v
# E2E tests
pytest tests/test_e2e_three_stream_pipeline.py -v
Test Architecture:
- 76 test files covering all features
- CI Matrix: Ubuntu + macOS, Python 3.10-3.13
- 1200+ tests passing
- Test markers: slow, integration, e2e, venv, bootstrap
Code Style Guidelines
Linting and Formatting
# Run ruff linter
ruff check src/ tests/
# Run ruff formatter check
ruff format --check src/ tests/
# Auto-fix issues
ruff check src/ tests/ --fix
ruff format src/ tests/
# Run mypy type checker
mypy src/skill_seekers --show-error-codes --pretty
Style Rules (from pyproject.toml)
- Line length: 100 characters
- Target Python: 3.10+
- Enabled rules: E, W, F, I, B, C4, UP, ARG, SIM
- Import sorting: isort style with skill_seekers as first-party
Code Conventions
- Use type hints where practical (gradual typing approach)
- Docstrings: Use Google-style or standard docstrings
- Error handling: Use specific exceptions, provide helpful messages
- Async code: Use asyncio, mark tests with @pytest.mark.asyncio
- File naming: Use snake_case for all Python files
Architecture Patterns
Platform Adaptor Pattern (Strategy Pattern)
All platform-specific logic is encapsulated in adaptors:
import os

from skill_seekers.cli.adaptors import get_adaptor

# Get platform-specific adaptor
adaptor = get_adaptor('gemini')  # or 'claude', 'openai', 'markdown'
# Package skill
adaptor.package(skill_dir='output/react/', output_path='output/')
# Upload to platform (API key is read from the environment, never hard-coded)
adaptor.upload(
    package_path='output/react-gemini.tar.gz',
    api_key=os.getenv('GOOGLE_API_KEY')
)
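One way a lookup such as get_adaptor could be organized is a registry that maps platform names to strategy classes. The following is a minimal, self-contained sketch under that assumption: `ArchiveAdaptor`, `TarGzAdaptor`, `_REGISTRY`, and `get_adaptor_sketch` are hypothetical stand-ins, and the only platform difference modeled is the archive format. The real adaptors in src/skill_seekers/cli/adaptors/ may be structured differently.

```python
import shutil
import tempfile
from pathlib import Path


class ArchiveAdaptor:
    """Stand-in strategy: platforms differ only in archive format here."""

    archive_format = "zip"  # a format name accepted by shutil.make_archive

    def __init__(self, platform: str) -> None:
        self.platform = platform

    def package(self, skill_dir: str, output_path: str) -> Path:
        skill = Path(skill_dir)
        base = Path(output_path) / f"{skill.name}-{self.platform}"
        # make_archive appends the extension and returns the archive path.
        return Path(shutil.make_archive(str(base), self.archive_format, root_dir=skill))


class TarGzAdaptor(ArchiveAdaptor):
    archive_format = "gztar"


# Hypothetical registry: platform name -> strategy class.
_REGISTRY = {
    "claude": ArchiveAdaptor,
    "gemini": TarGzAdaptor,
    "openai": ArchiveAdaptor,
    "markdown": ArchiveAdaptor,
}


def get_adaptor_sketch(platform: str) -> ArchiveAdaptor:
    """Look up the strategy for a platform, failing with a helpful message."""
    try:
        return _REGISTRY[platform](platform)
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"Unknown platform {platform!r}; expected one of: {known}") from None


with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "react"
    skill_dir.mkdir()
    (skill_dir / "example.md").write_text("# React\n")  # placeholder content
    out = Path(tmp) / "output"
    out.mkdir()
    print(get_adaptor_sketch("gemini").package(str(skill_dir), str(out)).name)
```

Callers never branch on the platform name themselves; adding a platform means adding one class and one registry entry.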
CLI Architecture (Git-style)
Entry point: src/skill_seekers/cli/main.py
The CLI uses subcommands that delegate to existing modules:
# skill-seekers scrape --config react.json
# Transforms to: doc_scraper.main() with modified sys.argv
Available subcommands:
- config - Configuration wizard
- scrape - Documentation scraping
- github - GitHub repository scraping
- pdf - PDF extraction
- unified - Multi-source scraping
- analyze - Local codebase analysis
- enhance - AI enhancement
- package - Package skill
- upload - Upload to platform
- install / install-agent - Complete workflow
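The delegation described above, where a subcommand calls an existing module's main() with a modified sys.argv, might look roughly like the sketch below. `doc_scraper_main`, `SUBCOMMANDS`, and `dispatch` are hypothetical stand-ins written for illustration; they are not the contents of main.py.

```python
import argparse
import sys


def doc_scraper_main() -> int:
    """Stand-in for a standalone module's main(); it parses sys.argv itself."""
    parser = argparse.ArgumentParser(prog="doc_scraper")
    parser.add_argument("--config", required=True)
    args = parser.parse_args()  # reads sys.argv[1:]
    print(f"scraping with {args.config}")
    return 0


# Hypothetical subcommand table: name -> (program name, main function).
SUBCOMMANDS = {"scrape": ("doc_scraper", doc_scraper_main)}


def dispatch(argv: list[str]) -> int:
    """Route `<subcommand> [args...]` to the matching module's main()."""
    if not argv or argv[0] not in SUBCOMMANDS:
        names = ",".join(SUBCOMMANDS)
        print(f"usage: skill-seekers {{{names}}} ...", file=sys.stderr)
        return 2
    prog, main = SUBCOMMANDS[argv[0]]
    saved = sys.argv
    sys.argv = [prog, *argv[1:]]  # the delegate sees only its own arguments
    try:
        return main()
    finally:
        sys.argv = saved  # restore so the caller is unaffected


exit_code = dispatch(["scrape", "--config", "react.json"])
```

Restoring sys.argv in a finally block keeps the rewrite from leaking into later code, which matters when the dispatcher is exercised repeatedly in tests.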
MCP Server Architecture
Two implementations:
- server_fastmcp.py - Modern, decorator-based (recommended, 708 lines)
- server.py - Legacy implementation (2200 lines)
Tools are organized by category:
- Config tools (3)
- Scraping tools (8)
- Packaging tools (4)
- Splitting tools (2)
- Source tools (4)
Testing Instructions
Test Categories
| Marker | Description |
|---|---|
| slow | Tests taking >5 seconds |
| integration | Requires external services (APIs) |
| e2e | End-to-end tests (resource-intensive) |
| venv | Requires virtual environment setup |
| bootstrap | Bootstrap skill specific |
Running Specific Test Categories
# Skip slow tests
pytest tests/ -v -m "not slow"
# Run only integration tests
pytest tests/ -v -m integration
# Run E2E tests
pytest tests/ -v -m e2e
Test Configuration (pytest settings in pyproject.toml)
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
addopts = "-v --tb=short --strict-markers"
asyncio_mode = "auto"
Git Workflow
Branch Structure
main (production)
↑
│ (only maintainer merges)
│
development (integration) ← default branch for PRs
↑
│ (all contributor PRs go here)
│
feature branches
- main - Production, always stable, protected
- development - Active development, default for PRs
- Feature branches - Your work, created from development
Creating a Feature Branch
# 1. Checkout development
git checkout development
git pull upstream development
# 2. Create feature branch
git checkout -b my-feature
# 3. Make changes, commit, push
git add .
git commit -m "Add my feature"
git push origin my-feature
# 4. Create PR targeting 'development' branch
CI/CD Configuration
GitHub Actions Workflows
.github/workflows/tests.yml:
- Runs on: push/PR to main and development
- Lint job: Ruff + MyPy
- Test matrix: Ubuntu + macOS, Python 3.10-3.12
- Coverage: Uploads to Codecov
.github/workflows/release.yml:
- Triggered on version tags
- Builds and publishes to PyPI
Pre-commit Checks (Manual)
# Before committing, run:
ruff check src/ tests/
ruff format --check src/ tests/
pytest tests/ -v -x # Stop on first failure
Security Considerations
API Keys and Secrets
- Never commit API keys to the repository
- Use environment variables:
- ANTHROPIC_API_KEY - Claude AI
- GOOGLE_API_KEY - Google Gemini
- OPENAI_API_KEY - OpenAI
- GITHUB_TOKEN - GitHub API
- Configuration storage:
- Stored at ~/.config/skill-seekers/config.json
- Permissions: 600 (owner read/write only)
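The two rules above (keys come from the environment, and the config file is owner-only) can be sketched with the standard library alone. `save_config` and `load_api_key` are hypothetical helpers, not the project's own functions, and the demo writes to a temporary directory rather than the real config path. The permission bits apply on POSIX systems.

```python
import json
import os
import stat
import tempfile
from pathlib import Path


def save_config(path: Path, data: dict) -> None:
    """Write JSON so the file ends up with mode 0o600 (owner read/write)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Passing the mode to os.open avoids a window where a new file is world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
    os.chmod(path, 0o600)  # also tighten a file that already existed


def load_api_key(name: str) -> str:
    """Read a key from the environment and fail with a helpful message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it instead of hard-coding it")
    return value


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "skill-seekers" / "config.json"
    save_config(config_path, {"default_platform": "claude"})  # placeholder setting
    print(oct(stat.S_IMODE(config_path.stat().st_mode)))
```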
Rate Limit Handling
- GitHub API has rate limits (5000 requests/hour for authenticated)
- The tool has built-in rate limit handling with retry logic
- Use the --non-interactive flag for CI/CD environments
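This document does not describe how the built-in retry logic works. As a general illustration of the idea, here is a generic exponential-backoff wrapper; `RateLimitError` and `with_retries` are hypothetical names, and this is not the tool's actual implementation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when an API reports the limit was hit."""


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimitError, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Demo: fail twice, then succeed; record waits instead of actually sleeping.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("rate limit exceeded")
    return "ok"


waits: list[float] = []
result = with_retries(flaky, sleep=waits.append)
print(result, waits)
```

Injecting the `sleep` function keeps the sketch fast to test; a non-interactive CI run would use the default `time.sleep`.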
Custom API Endpoints
Support for Claude-compatible APIs (e.g., GLM-4.7):
export ANTHROPIC_API_KEY=your-glm-47-api-key
export ANTHROPIC_BASE_URL=https://glm-4-7-endpoint.com/v1
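A client could pick up these two variables along the following lines. This is a sketch under assumptions: `resolve_endpoint` is a hypothetical helper, and how Skill Seekers itself reads the variables is not shown in this document. The fallback URL is the public Anthropic API host.

```python
import os
from typing import Mapping

DEFAULT_BASE_URL = "https://api.anthropic.com"


def resolve_endpoint(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (api_key, base_url), preferring ANTHROPIC_BASE_URL when set."""
    api_key = env.get("ANTHROPIC_API_KEY", "")
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    base_url = env.get("ANTHROPIC_BASE_URL") or DEFAULT_BASE_URL
    return api_key, base_url.rstrip("/")


# Demo with an explicit mapping so the real environment is not required.
key, url = resolve_endpoint(
    {
        "ANTHROPIC_API_KEY": "your-glm-47-api-key",
        "ANTHROPIC_BASE_URL": "https://glm-4-7-endpoint.com/v1",
    }
)
print(url)
```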
Common Development Tasks
Adding a New CLI Command
- Create module in src/skill_seekers/cli/my_command.py
- Implement main() function with argument parsing
- Add entry point in pyproject.toml:
  [project.scripts]
  skill-seekers-my-command = "skill_seekers.cli.my_command:main"
- Add subcommand handler in src/skill_seekers/cli/main.py
- Add tests in tests/test_my_command.py
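A new command module following these steps could start from a skeleton like the one below. The argument names and the printed message are placeholders. argparse is used only to keep the sketch dependency-free, and the project's own modules may parse arguments differently.

```python
import argparse
from collections.abc import Sequence
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Build the parser separately so tests can inspect it."""
    parser = argparse.ArgumentParser(
        prog="skill-seekers-my-command",
        description="Example command (placeholder behaviour).",
    )
    parser.add_argument("source", help="path or URL to process")
    parser.add_argument("--output", default="output/", help="output directory")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Entry point; argv=None makes argparse fall back to sys.argv[1:]."""
    args = build_parser().parse_args(argv)
    print(f"Would process {args.source} into {args.output}")
    return 0  # the console-script wrapper uses this as the exit code


# Demo call with explicit arguments; the installed script would call main().
main(["docs/", "--output", "out/"])
```

Accepting an optional `argv` lets tests in tests/test_my_command.py call `main([...])` directly without patching sys.argv.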
Adding a New Platform Adaptor
- Create src/skill_seekers/cli/adaptors/my_platform.py
- Inherit from SkillAdaptor base class
- Implement required methods: package(), upload(), enhance()
- Register in src/skill_seekers/cli/adaptors/__init__.py
- Add optional dependencies in pyproject.toml
- Add tests in tests/test_adaptors/
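The shape of such an adaptor can be sketched as follows. To stay self-contained, the example defines its own stand-in base class, `SkillAdaptorSketch`, rather than importing the real SkillAdaptor. The three method names come from the steps above, but every signature, return type, and body here is an assumption.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class SkillAdaptorSketch(ABC):
    """Stand-in for the real base class; method signatures are assumptions."""

    @abstractmethod
    def package(self, skill_dir: str, output_path: str) -> Path: ...

    @abstractmethod
    def upload(self, package_path: str, api_key: str) -> dict: ...

    @abstractmethod
    def enhance(self, skill_dir: str, api_key: str) -> bool: ...


class MyPlatformAdaptor(SkillAdaptorSketch):
    """Hypothetical adaptor; real logic would replace each placeholder body."""

    def package(self, skill_dir: str, output_path: str) -> Path:
        # Placeholder: compute the target path without writing an archive.
        return Path(output_path) / f"{Path(skill_dir).name}-my_platform.zip"

    def upload(self, package_path: str, api_key: str) -> dict:
        if not api_key:
            raise ValueError("api_key is required to upload")
        return {"success": True, "package": package_path}

    def enhance(self, skill_dir: str, api_key: str) -> bool:
        return False  # placeholder: no enhancement performed


adaptor = MyPlatformAdaptor()
print(adaptor.package("output/react/", "output/"))
```

Because the base class marks all three methods abstract, a subclass that omits one cannot be instantiated, which surfaces an incomplete adaptor early.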
Adding an MCP Tool
- Implement tool logic in src/skill_seekers/mcp/tools/category_tools.py
- Register in src/skill_seekers/mcp/server_fastmcp.py
- Add test in tests/test_mcp_fastmcp.py
Documentation
Project Documentation
- README.md - Main project documentation
- README.zh-CN.md - Chinese translation
- CLAUDE.md - Detailed implementation guidance
- QUICKSTART.md - Quick start guide
- CONTRIBUTING.md - Contribution guidelines
- docs/ - Comprehensive documentation (54 files)
Configuration Documentation
Preset configs are in configs/ directory:
- godot.json - Godot Engine
- react.json - React
- vue.json - Vue.js
- fastapi.json - FastAPI
- *_unified.json - Multi-source configs
Troubleshooting
Common Issues
ImportError: No module named 'skill_seekers'
- Solution: Run pip install -e .
Tests failing with "package not installed"
- Solution: Ensure you ran pip install -e . in the correct virtual environment
MCP server import errors
- Solution: Install with pip install -e ".[mcp]"
Type checking failures
- MyPy is configured to be lenient (gradual typing)
- Focus on critical paths, not full coverage
Getting Help
- Check TROUBLESHOOTING.md for detailed solutions
- Review docs/FAQ.md for common questions
- Visit https://skillseekersweb.com/ for documentation
- Open an issue on GitHub with:
- Clear title and description
- Steps to reproduce
- Expected vs actual behavior
- Environment details (OS, Python version)
- Error messages and stack traces
Key Dependencies
Core Dependencies
- requests>=2.32.5 - HTTP requests
- beautifulsoup4>=4.14.2 - HTML parsing
- PyGithub>=2.5.0 - GitHub API
- GitPython>=3.1.40 - Git operations
- httpx>=0.28.1 - Async HTTP
- anthropic>=0.76.0 - Claude AI API
- PyMuPDF>=1.24.14 - PDF processing
- pydantic>=2.12.3 - Data validation
- click>=8.3.0 - CLI framework
Optional Dependencies
- mcp>=1.25 - MCP server
- google-generativeai>=0.8.0 - Gemini support
- openai>=1.0.0 - OpenAI support
Dev Dependencies
- pytest>=8.4.2 - Testing framework
- pytest-asyncio>=0.24.0 - Async test support
- pytest-cov>=7.0.0 - Coverage
- ruff>=0.14.13 - Linting/formatting
- mypy>=1.19.1 - Type checking
This document is maintained for AI coding agents. For human contributors, see README.md and CONTRIBUTING.md.