From 80a40b4fc95c09324fadde274b2386d54aa138a1 Mon Sep 17 00:00:00 2001
From: yusyus
Date: Sun, 1 Feb 2026 16:31:20 +0300
Subject: [PATCH] docs: Add AGENTS.md guide for AI coding agents

- Comprehensive guide for AI assistants working with the codebase
- Covers project structure, development commands, architecture patterns
- Includes testing guidelines, CI/CD info, and troubleshooting
- Documents all entry points, dependencies, and best practices
---
 AGENTS.md | 469 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 469 insertions(+)
 create mode 100644 AGENTS.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..0fffdb3
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,469 @@
+# AGENTS.md - Skill Seekers
+
+This file provides essential guidance for AI coding agents working with the Skill Seekers codebase.
+
+---
+
+## Project Overview
+
+**Skill Seekers** is a Python CLI tool that converts documentation websites, GitHub repositories, and PDF files into AI-ready skills for LLM platforms. It supports 4 target platforms:
+
+- **Claude AI** (ZIP + YAML format)
+- **Google Gemini** (tar.gz format)
+- **OpenAI ChatGPT** (ZIP + Vector Store)
+- **Generic Markdown** (universal ZIP export)
+
+**Current Version:** 2.7.4
+**Python Version:** 3.10+ required
+**License:** MIT
+**Website:** https://skillseekersweb.com/
+
+### Core Workflow
+
+1. **Scrape Phase** - Crawl documentation/GitHub/PDF sources
+2. **Build Phase** - Organize content into categorized references
+3. **Enhancement Phase** - AI-powered quality improvements (optional)
+4. **Package Phase** - Create platform-specific packages
+5. **Upload Phase** - Auto-upload to target platform (optional)
+
+---
+
+## Project Structure
+
+```
+/mnt/1ece809a-2821-4f10-aecb-fcdf34760c0b/Git/Skill_Seekers/
+├── src/skill_seekers/              # Main source code (src/ layout)
+│   ├── cli/                        # CLI tools and commands
+│   │   ├── adaptors/               # Platform adaptors (Strategy pattern)
+│   │   │   ├── base.py             # Abstract base class
+│   │   │   ├── claude.py           # Claude AI adaptor
+│   │   │   ├── gemini.py           # Google Gemini adaptor
+│   │   │   ├── openai.py           # OpenAI ChatGPT adaptor
+│   │   │   └── markdown.py         # Generic Markdown adaptor
+│   │   ├── main.py                 # Unified CLI entry point
+│   │   ├── doc_scraper.py          # Documentation scraper
+│   │   ├── github_scraper.py       # GitHub repository scraper
+│   │   ├── pdf_scraper.py          # PDF extraction
+│   │   ├── unified_scraper.py      # Multi-source scraping
+│   │   ├── codebase_scraper.py     # Local codebase analysis (C2.x/C3.x)
+│   │   ├── enhance_skill_local.py  # AI enhancement (LOCAL mode)
+│   │   ├── package_skill.py        # Skill packager
+│   │   ├── upload_skill.py         # Upload to platforms
+│   │   └── ...                     # 50+ CLI modules
+│   └── mcp/                        # MCP server integration
+│       ├── server_fastmcp.py       # FastMCP server (main)
+│       ├── server.py               # Legacy server
+│       └── tools/                  # MCP tool implementations
+├── tests/                          # Test suite (76 test files)
+├── configs/                        # Preset configuration files
+├── docs/                           # Documentation (54 markdown files)
+├── .github/workflows/              # CI/CD workflows
+├── pyproject.toml                  # Main project configuration
+└── requirements.txt                # Pinned dependencies
+```
+
+---
+
+## Build and Development Commands
+
+### Setup (REQUIRED before any development)
+
+```bash
+# Install in editable mode (REQUIRED for tests due to src/ layout)
+pip install -e .
+
+# Install with all platform dependencies
+pip install -e ".[all-llms]"
+
+# Install specific platforms only
+pip install -e ".[gemini]"    # Google Gemini support
+pip install -e ".[openai]"    # OpenAI ChatGPT support
+pip install -e ".[mcp]"       # MCP server dependencies
+```
+
+**CRITICAL:** The project uses a `src/` layout. Tests WILL FAIL unless you install with `pip install -e .` first.
+
+### Building
+
+```bash
+# Build package using uv (recommended)
+uv build
+
+# Or using standard build
+python -m build
+
+# Publish to PyPI
+uv publish
+```
+
+### Running Tests
+
+**CRITICAL:** Never skip tests - all tests must pass before commits.
+
+```bash
+# All tests (must run pip install -e . first!)
+pytest tests/ -v
+
+# Specific test file
+pytest tests/test_scraper_features.py -v
+pytest tests/test_mcp_fastmcp.py -v
+
+# With coverage
+pytest tests/ --cov=src/skill_seekers --cov-report=term --cov-report=html
+
+# Single test
+pytest tests/test_scraper_features.py::test_detect_language -v
+
+# E2E tests
+pytest tests/test_e2e_three_stream_pipeline.py -v
+```
+
+**Test Architecture:**
+- 76 test files covering all features
+- CI Matrix: Ubuntu + macOS, Python 3.10-3.13
+- 1200+ tests passing
+- Test markers: `slow`, `integration`, `e2e`, `venv`, `bootstrap`
+
+---
+
+## Code Style Guidelines
+
+### Linting and Formatting
+
+```bash
+# Run ruff linter
+ruff check src/ tests/
+
+# Run ruff formatter check
+ruff format --check src/ tests/
+
+# Auto-fix issues
+ruff check src/ tests/ --fix
+ruff format src/ tests/
+
+# Run mypy type checker
+mypy src/skill_seekers --show-error-codes --pretty
+```
+
+### Style Rules (from pyproject.toml)
+
+- **Line length:** 100 characters
+- **Target Python:** 3.10+
+- **Enabled rules:** E, W, F, I, B, C4, UP, ARG, SIM
+- **Import sorting:** isort style with `skill_seekers` as first-party
+
+### Code Conventions
+
+1. **Use type hints** where practical (gradual typing approach)
+2. **Docstrings:** Use Google-style or standard docstrings
+3. **Error handling:** Use specific exceptions, provide helpful messages
+4. **Async code:** Use `asyncio`, mark tests with `@pytest.mark.asyncio`
+5. **File naming:** Use snake_case for all Python files
+
+---
+
+## Architecture Patterns
+
+### Platform Adaptor Pattern (Strategy Pattern)
+
+All platform-specific logic is encapsulated in adaptors:
+
+```python
+from skill_seekers.cli.adaptors import get_adaptor
+
+# Get platform-specific adaptor
+adaptor = get_adaptor('gemini')  # or 'claude', 'openai', 'markdown'
+
+# Package skill
+adaptor.package(skill_dir='output/react/', output_path='output/')
+
+# Upload to platform
+adaptor.upload(
+    package_path='output/react-gemini.tar.gz',
+    api_key=os.getenv('GOOGLE_API_KEY')
+)
+```
+
+### CLI Architecture (Git-style)
+
+Entry point: `src/skill_seekers/cli/main.py`
+
+The CLI uses subcommands that delegate to existing modules:
+
+```python
+# skill-seekers scrape --config react.json
+# Transforms to: doc_scraper.main() with modified sys.argv
+```
+
+**Available subcommands:**
+- `config` - Configuration wizard
+- `scrape` - Documentation scraping
+- `github` - GitHub repository scraping
+- `pdf` - PDF extraction
+- `unified` - Multi-source scraping
+- `analyze` - Local codebase analysis
+- `enhance` - AI enhancement
+- `package` - Package skill
+- `upload` - Upload to platform
+- `install` / `install-agent` - Complete workflow
+
+### MCP Server Architecture
+
+Two implementations:
+- `server_fastmcp.py` - Modern, decorator-based (recommended, 708 lines)
+- `server.py` - Legacy implementation (2200 lines)
+
+Tools are organized by category:
+- Config tools (3)
+- Scraping tools (8)
+- Packaging tools (4)
+- Splitting tools (2)
+- Source tools (4)
+
+---
+
+## Testing Instructions
+
+### Test Categories
+
+| Marker | Description |
+|--------|-------------|
+| `slow` | Tests taking >5 seconds |
+| `integration` | Requires external services (APIs) |
+| `e2e` | End-to-end tests (resource-intensive) |
+| `venv` | Requires virtual environment setup |
+| `bootstrap` | Bootstrap skill specific |
+
+### Running Specific Test Categories
+
+```bash
+# Skip slow tests
+pytest tests/ -v -m "not slow"
+
+# Run only integration tests
+pytest tests/ -v -m integration
+
+# Run E2E tests
+pytest tests/ -v -m e2e
+```
+
+### Test Configuration (pytest.ini in pyproject.toml)
+
+```toml
+[tool.pytest.ini_options]
+testpaths = ["tests"]
+python_files = ["test_*.py"]
+addopts = "-v --tb=short --strict-markers"
+asyncio_mode = "auto"
+```
+
+---
+
+## Git Workflow
+
+### Branch Structure
+
+```
+main (production)
+  ↑
+  │ (only maintainer merges)
+  │
+development (integration) ← default branch for PRs
+  ↑
+  │ (all contributor PRs go here)
+  │
+feature branches
+```
+
+- **`main`** - Production, always stable, protected
+- **`development`** - Active development, default for PRs
+- **Feature branches** - Your work, created from `development`
+
+### Creating a Feature Branch
+
+```bash
+# 1. Checkout development
+git checkout development
+git pull upstream development
+
+# 2. Create feature branch
+git checkout -b my-feature
+
+# 3. Make changes, commit, push
+git add .
+git commit -m "Add my feature"
+git push origin my-feature
+
+# 4. Create PR targeting 'development' branch
+```
+
+---
+
+## CI/CD Configuration
+
+### GitHub Actions Workflows
+
+**`.github/workflows/tests.yml`:**
+- Runs on: push/PR to `main` and `development`
+- Lint job: Ruff + MyPy
+- Test matrix: Ubuntu + macOS, Python 3.10-3.12
+- Coverage: Uploads to Codecov
+
+**`.github/workflows/release.yml`:**
+- Triggered on version tags
+- Builds and publishes to PyPI
+
+### Pre-commit Checks (Manual)
+
+```bash
+# Before committing, run:
+ruff check src/ tests/
+ruff format --check src/ tests/
+pytest tests/ -v -x  # Stop on first failure
+```
+
+---
+
+## Security Considerations
+
+### API Keys and Secrets
+
+1. **Never commit API keys** to the repository
+2. **Use environment variables:**
+   - `ANTHROPIC_API_KEY` - Claude AI
+   - `GOOGLE_API_KEY` - Google Gemini
+   - `OPENAI_API_KEY` - OpenAI
+   - `GITHUB_TOKEN` - GitHub API
+3. **Configuration storage:**
+   - Stored at `~/.config/skill-seekers/config.json`
+   - Permissions: 600 (owner read/write only)
+
+### Rate Limit Handling
+
+- GitHub API has rate limits (5000 requests/hour for authenticated)
+- The tool has built-in rate limit handling with retry logic
+- Use `--non-interactive` flag for CI/CD environments
+
+### Custom API Endpoints
+
+Support for Claude-compatible APIs (e.g., GLM-4.7):
+
+```bash
+export ANTHROPIC_API_KEY=your-glm-47-api-key
+export ANTHROPIC_BASE_URL=https://glm-4-7-endpoint.com/v1
+```
+
+---
+
+## Common Development Tasks
+
+### Adding a New CLI Command
+
+1. Create module in `src/skill_seekers/cli/my_command.py`
+2. Implement `main()` function with argument parsing
+3. Add entry point in `pyproject.toml`:
+   ```toml
+   [project.scripts]
+   skill-seekers-my-command = "skill_seekers.cli.my_command:main"
+   ```
+4. Add subcommand handler in `src/skill_seekers/cli/main.py`
+5. Add tests in `tests/test_my_command.py`
+
+### Adding a New Platform Adaptor
+
+1. Create `src/skill_seekers/cli/adaptors/my_platform.py`
+2. Inherit from `SkillAdaptor` base class
+3. Implement required methods: `package()`, `upload()`, `enhance()`
+4. Register in `src/skill_seekers/cli/adaptors/__init__.py`
+5. Add optional dependencies in `pyproject.toml`
+6. Add tests in `tests/test_adaptors/`
+
+### Adding an MCP Tool
+
+1. Implement tool logic in `src/skill_seekers/mcp/tools/category_tools.py`
+2. Register in `src/skill_seekers/mcp/server_fastmcp.py`
+3. Add test in `tests/test_mcp_fastmcp.py`
+
+---
+
+## Documentation
+
+### Project Documentation
+
+- **README.md** - Main project documentation
+- **README.zh-CN.md** - Chinese translation
+- **CLAUDE.md** - Detailed implementation guidance
+- **QUICKSTART.md** - Quick start guide
+- **CONTRIBUTING.md** - Contribution guidelines
+- **docs/** - Comprehensive documentation (54 files)
+
+### Configuration Documentation
+
+Preset configs are in `configs/` directory:
+- `godot.json` - Godot Engine
+- `react.json` - React
+- `vue.json` - Vue.js
+- `fastapi.json` - FastAPI
+- `*_unified.json` - Multi-source configs
+
+---
+
+## Troubleshooting
+
+### Common Issues
+
+**ImportError: No module named 'skill_seekers'**
+- Solution: Run `pip install -e .`
+
+**Tests failing with "package not installed"**
+- Solution: Ensure you ran `pip install -e .` in the correct virtual environment
+
+**MCP server import errors**
+- Solution: Install with `pip install -e ".[mcp]"`
+
+**Type checking failures**
+- MyPy is configured to be lenient (gradual typing)
+- Focus on critical paths, not full coverage
+
+### Getting Help
+
+- Check **TROUBLESHOOTING.md** for detailed solutions
+- Review **docs/FAQ.md** for common questions
+- Visit https://skillseekersweb.com/ for documentation
+- Open an issue on GitHub with:
+  - Clear title and description
+  - Steps to reproduce
+  - Expected vs actual behavior
+  - Environment details (OS, Python version)
+  - Error messages and stack traces
+
+---
+
+## Key Dependencies
+
+### Core Dependencies
+- `requests>=2.32.5` - HTTP requests
+- `beautifulsoup4>=4.14.2` - HTML parsing
+- `PyGithub>=2.5.0` - GitHub API
+- `GitPython>=3.1.40` - Git operations
+- `httpx>=0.28.1` - Async HTTP
+- `anthropic>=0.76.0` - Claude AI API
+- `PyMuPDF>=1.24.14` - PDF processing
+- `pydantic>=2.12.3` - Data validation
+- `click>=8.3.0` - CLI framework
+
+### Optional Dependencies
+- `mcp>=1.25` - MCP server
+- `google-generativeai>=0.8.0` - Gemini support
+- `openai>=1.0.0` - OpenAI support
+
+### Dev Dependencies
+- `pytest>=8.4.2` - Testing framework
+- `pytest-asyncio>=0.24.0` - Async test support
+- `pytest-cov>=7.0.0` - Coverage
+- `ruff>=0.14.13` - Linting/formatting
+- `mypy>=1.19.1` - Type checking
+
+---
+
+*This document is maintained for AI coding agents. For human contributors, see README.md and CONTRIBUTING.md.*