docs: Add AGENTS.md guide for AI coding agents

- Comprehensive guide for AI assistants working with the codebase
- Covers project structure, development commands, architecture patterns
- Includes testing guidelines, CI/CD info, and troubleshooting
- Documents all entry points, dependencies, and best practices
2026-02-01 16:31:20 +03:00
parent 3a79ceba93
commit 80a40b4fc9
1 changed files with 469 additions and 0 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1,469 @@
+# AGENTS.md - Skill Seekers
+
+This file provides essential guidance for AI coding agents working with the Skill Seekers codebase.
+
+---
+
+## Project Overview
+
+**Skill Seekers** is a Python CLI tool that converts documentation websites, GitHub repositories, and PDF files into AI-ready skills for LLM platforms. It supports 4 target platforms:
+
+- **Claude AI** (ZIP + YAML format)
+- **Google Gemini** (tar.gz format)
+- **OpenAI ChatGPT** (ZIP + Vector Store)
+- **Generic Markdown** (universal ZIP export)
+
+- **Current Version:** 2.7.4
+- **Python Version:** 3.10+ required
+- **License:** MIT
+- **Website:** https://skillseekersweb.com/
+
+### Core Workflow
+
+1. **Scrape Phase** - Crawl documentation/GitHub/PDF sources
+2. **Build Phase** - Organize content into categorized references
+3. **Enhancement Phase** - AI-powered quality improvements (optional)
+4. **Package Phase** - Create platform-specific packages
+5. **Upload Phase** - Auto-upload to target platform (optional)
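+
+The phases always run in this order, and only Enhancement and Upload may be skipped. The sketch below models that ordering in plain Python; `Phase`, `run_pipeline`, and the lambda stand-ins are hypothetical names for illustration, not the project's actual modules or API.
+
+```python
+from dataclasses import dataclass
+from typing import Callable
+
+
+@dataclass
+class Phase:
+    """One pipeline stage (hypothetical; not a Skill Seekers class)."""
+
+    name: str
+    run: Callable[[dict], None]
+    optional: bool = False
+
+
+def run_pipeline(
+    phases: list[Phase], context: dict, skip: frozenset[str] = frozenset()
+) -> list[str]:
+    """Run phases in order; a phase is skipped only if it is optional and listed in skip."""
+    completed = []
+    for phase in phases:
+        if phase.optional and phase.name in skip:
+            continue  # e.g. no API key for enhancement, or no upload wanted
+        phase.run(context)
+        completed.append(phase.name)
+    return completed
+
+
+# Stand-in phases that only record their name in a shared log
+PHASES = [
+    Phase("scrape", lambda ctx: ctx.setdefault("log", []).append("scrape")),
+    Phase("build", lambda ctx: ctx["log"].append("build")),
+    Phase("enhance", lambda ctx: ctx["log"].append("enhance"), optional=True),
+    Phase("package", lambda ctx: ctx["log"].append("package")),
+    Phase("upload", lambda ctx: ctx["log"].append("upload"), optional=True),
+]
+```
+
+With `skip=frozenset({"enhance", "upload"})`, the sketch runs only `scrape`, `build`, and `package`; asking to skip a required phase has no effect.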
+
+---
+
+## Project Structure
+
+```
+Skill_Seekers/
+├── src/skill_seekers/              # Main source code (src/ layout)
+│   ├── cli/                        # CLI tools and commands
+│   │   ├── adaptors/               # Platform adaptors (Strategy pattern)
+│   │   │   ├── base.py             # Abstract base class
+│   │   │   ├── claude.py           # Claude AI adaptor
+│   │   │   ├── gemini.py           # Google Gemini adaptor
+│   │   │   ├── openai.py           # OpenAI ChatGPT adaptor
+│   │   │   └── markdown.py         # Generic Markdown adaptor
+│   │   ├── main.py                 # Unified CLI entry point
+│   │   ├── doc_scraper.py          # Documentation scraper
+│   │   ├── github_scraper.py       # GitHub repository scraper
+│   │   ├── pdf_scraper.py          # PDF extraction
+│   │   ├── unified_scraper.py      # Multi-source scraping
+│   │   ├── codebase_scraper.py     # Local codebase analysis (C2.x/C3.x)
+│   │   ├── enhance_skill_local.py  # AI enhancement (LOCAL mode)
+│   │   ├── package_skill.py        # Skill packager
+│   │   ├── upload_skill.py         # Upload to platforms
+│   │   └── ...                     # 50+ CLI modules
+│   └── mcp/                        # MCP server integration
+│       ├── server_fastmcp.py       # FastMCP server (main)
+│       ├── server.py               # Legacy server
+│       └── tools/                  # MCP tool implementations
+├── tests/                          # Test suite (76 test files)
+├── configs/                        # Preset configuration files
+├── docs/                           # Documentation (54 markdown files)
+├── .github/workflows/              # CI/CD workflows
+├── pyproject.toml                  # Main project configuration
+└── requirements.txt                # Pinned dependencies
+```
+
+---
+
+## Build and Development Commands
+
+### Setup (REQUIRED before any development)
+
+```bash
+# Install in editable mode (REQUIRED for tests due to src/ layout)
+pip install -e .
+
+# Install with all platform dependencies
+pip install -e ".[all-llms]"
+
+# Install specific platforms only
+pip install -e ".[gemini]"    # Google Gemini support
+pip install -e ".[openai]"    # OpenAI ChatGPT support
+pip install -e ".[mcp]"       # MCP server dependencies
+```
+
+**CRITICAL:** The project uses a `src/` layout. Tests WILL FAIL unless you install with `pip install -e .` first.
+
+### Building
+
+```bash
+# Build package using uv (recommended)
+uv build
+
+# Or using standard build
+python -m build
+
+# Publish to PyPI
+uv publish
+```
+
+### Running Tests
+
+**CRITICAL:** Never skip tests - all tests must pass before commits.
+
+```bash
+# All tests (must run pip install -e . first!)
+pytest tests/ -v
+
+# Specific test file
+pytest tests/test_scraper_features.py -v
+pytest tests/test_mcp_fastmcp.py -v
+
+# With coverage
+pytest tests/ --cov=src/skill_seekers --cov-report=term --cov-report=html
+
+# Single test
+pytest tests/test_scraper_features.py::test_detect_language -v
+
+# E2E tests
+pytest tests/test_e2e_three_stream_pipeline.py -v
+```
+
+**Test Architecture:**
+- 76 test files covering all features
+- CI matrix: Ubuntu + macOS; check `.github/workflows/tests.yml` for the exact Python versions tested
+- 1200+ tests passing
+- Test markers: `slow`, `integration`, `e2e`, `venv`, `bootstrap`
+
+---
+
+## Code Style Guidelines
+
+### Linting and Formatting
+
+```bash
+# Run ruff linter
+ruff check src/ tests/
+
+# Run ruff formatter check
+ruff format --check src/ tests/
+
+# Auto-fix issues
+ruff check src/ tests/ --fix
+ruff format src/ tests/
+
+# Run mypy type checker
+mypy src/skill_seekers --show-error-codes --pretty
+```
+
+### Style Rules (from pyproject.toml)
+
+- **Line length:** 100 characters
+- **Target Python:** 3.10+
+- **Enabled rules:** E, W, F, I, B, C4, UP, ARG, SIM
+- **Import sorting:** isort style with `skill_seekers` as first-party
+
+### Code Conventions
+
+1. **Use type hints** where practical (gradual typing approach)
+2. **Docstrings:** Use Google-style or standard docstrings
+3. **Error handling:** Use specific exceptions, provide helpful messages
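+4. **Async code:** Use `asyncio`. Because `asyncio_mode = "auto"` is set, async test functions run without extra setup; marking them with `@pytest.mark.asyncio` is harmless but optional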
+5. **File naming:** Use snake_case for all Python files
+
+---
+
+## Architecture Patterns
+
+### Platform Adaptor Pattern (Strategy Pattern)
+
+All platform-specific logic is encapsulated in adaptors:
+
+```python
+import os
+
+from skill_seekers.cli.adaptors import get_adaptor
+
+# Get platform-specific adaptor
+adaptor = get_adaptor('gemini')  # or 'claude', 'openai', 'markdown'
+
+# Package skill
+adaptor.package(skill_dir='output/react/', output_path='output/')
+
+# Upload to platform
+adaptor.upload(
+    package_path='output/react-gemini.tar.gz',
+    api_key=os.getenv('GOOGLE_API_KEY')
+)
+```
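+
+Callers depend only on the object `get_adaptor()` returns, never on a concrete platform class. The self-contained sketch below illustrates that lookup; `ClaudeAdaptor`, `GeminiAdaptor`, and the `_ADAPTORS` table are hypothetical stand-ins (the real adaptors in `cli/adaptors/` build actual archives), and only the `<skill>-<platform>` file naming is taken from the example above.
+
+```python
+from pathlib import PurePosixPath
+from typing import Protocol
+
+
+class Adaptor(Protocol):
+    """The interface callers rely on; each platform supplies its own behavior."""
+
+    def package(self, skill_dir: str, output_path: str) -> str: ...
+
+
+class ClaudeAdaptor:
+    def package(self, skill_dir: str, output_path: str) -> str:
+        name = PurePosixPath(skill_dir).name
+        # Claude packages are ZIP archives; this sketch only computes the path
+        return str(PurePosixPath(output_path) / f"{name}-claude.zip")
+
+
+class GeminiAdaptor:
+    def package(self, skill_dir: str, output_path: str) -> str:
+        name = PurePosixPath(skill_dir).name
+        # Gemini packages are tar.gz archives; this sketch only computes the path
+        return str(PurePosixPath(output_path) / f"{name}-gemini.tar.gz")
+
+
+_ADAPTORS: dict[str, type] = {"claude": ClaudeAdaptor, "gemini": GeminiAdaptor}
+
+
+def get_adaptor(platform: str) -> Adaptor:
+    """Look up and instantiate the strategy for a platform name."""
+    try:
+        return _ADAPTORS[platform]()
+    except KeyError:
+        supported = ", ".join(sorted(_ADAPTORS))
+        raise ValueError(f"Unknown platform {platform!r}; supported: {supported}") from None
+```
+
+Here `get_adaptor("gemini").package("output/react/", "output/")` returns `"output/react-gemini.tar.gz"`. Supporting another platform means one new class and one table entry, with no change to callers.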
+
+### CLI Architecture (Git-style)
+
+Entry point: `src/skill_seekers/cli/main.py`
+
+The CLI uses subcommands that delegate to existing modules:
+
+```python
+# skill-seekers scrape --config react.json
+# Transforms to: doc_scraper.main() with modified sys.argv
+```
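+
+A minimal sketch of that delegation, assuming each module exposes a `main()` that parses `sys.argv` itself. `COMMANDS`, `_scrape_main`, `SEEN_ARGV`, and the `skill-seekers-<command>` program name are illustrative assumptions, not the actual contents of `main.py`.
+
+```python
+import sys
+from typing import Callable
+
+SEEN_ARGV: list[list[str]] = []  # records what the delegate saw (for demonstration)
+
+
+def _scrape_main() -> int:
+    """Stand-in for doc_scraper.main(), which parses sys.argv on its own."""
+    SEEN_ARGV.append(list(sys.argv))
+    return 0
+
+
+COMMANDS: dict[str, Callable[[], int]] = {"scrape": _scrape_main}
+
+
+def main(argv: list[str] | None = None) -> int:
+    args = list(sys.argv[1:] if argv is None else argv)
+    if not args or args[0] not in COMMANDS:
+        print(f"usage: skill-seekers {{{','.join(COMMANDS)}}} ...", file=sys.stderr)
+        return 2
+    command, rest = args[0], args[1:]
+    saved_argv = sys.argv
+    # Rewrite sys.argv so the delegate sees only its own arguments
+    sys.argv = [f"skill-seekers-{command}", *rest]
+    try:
+        return COMMANDS[command]()
+    finally:
+        sys.argv = saved_argv  # restore even if the delegate raises
+```
+
+Restoring `sys.argv` in `finally` keeps one subcommand's arguments from leaking into the next call, which matters when the dispatcher is exercised repeatedly from tests.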
+
+**Available subcommands:**
+- `config` - Configuration wizard
+- `scrape` - Documentation scraping
+- `github` - GitHub repository scraping
+- `pdf` - PDF extraction
+- `unified` - Multi-source scraping
+- `analyze` - Local codebase analysis
+- `enhance` - AI enhancement
+- `package` - Package skill
+- `upload` - Upload to platform
+- `install` / `install-agent` - Complete workflow
+
+### MCP Server Architecture
+
+Two implementations:
+- `server_fastmcp.py` - Modern, decorator-based (recommended, 708 lines)
+- `server.py` - Legacy implementation (2200 lines)
+
+Tools are organized by category:
+- Config tools (3)
+- Scraping tools (8)
+- Packaging tools (4)
+- Splitting tools (2)
+- Source tools (4)
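+
+Decorator-based registration means each tool is registered where it is defined, instead of in a hand-maintained dispatch table. The sketch below imitates that style with a tiny local registry; `tool`, `TOOLS`, and both tool functions are hypothetical and deliberately do not reproduce FastMCP's actual API.
+
+```python
+from collections import Counter
+from typing import Callable
+
+ToolFn = Callable[..., str]
+TOOLS: dict[str, tuple[str, ToolFn]] = {}
+
+
+def tool(category: str) -> Callable[[ToolFn], ToolFn]:
+    """Register the decorated function under its own name and a category."""
+
+    def decorator(fn: ToolFn) -> ToolFn:
+        TOOLS[fn.__name__] = (category, fn)
+        return fn  # the function itself is returned unchanged
+
+    return decorator
+
+
+@tool(category="config")
+def list_configs() -> str:
+    return "godot.json, react.json"
+
+
+@tool(category="scraping")
+def scrape_docs(config_path: str) -> str:
+    return f"scraped {config_path}"
+
+
+def call_tool(name: str, **kwargs: str) -> str:
+    """Dispatch by name, as an MCP server does when a client invokes a tool."""
+    _category, fn = TOOLS[name]
+    return fn(**kwargs)
+
+
+def count_by_category() -> Counter[str]:
+    return Counter(category for category, _fn in TOOLS.values())
+```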
+
+---
+
+## Testing Instructions
+
+### Test Categories
+
+| Marker | Description |
+|--------|-------------|
+| `slow` | Tests taking >5 seconds |
+| `integration` | Requires external services (APIs) |
+| `e2e` | End-to-end tests (resource-intensive) |
+| `venv` | Requires virtual environment setup |
+| `bootstrap` | Bootstrap skill specific |
+
+### Running Specific Test Categories
+
+```bash
+# Skip slow tests
+pytest tests/ -v -m "not slow"
+
+# Run only integration tests
+pytest tests/ -v -m integration
+
+# Run E2E tests
+pytest tests/ -v -m e2e
+```
+
+### Test Configuration (`[tool.pytest.ini_options]` in pyproject.toml)
+
+```toml
+[tool.pytest.ini_options]
+testpaths = ["tests"]
+python_files = ["test_*.py"]
+addopts = "-v --tb=short --strict-markers"
+asyncio_mode = "auto"
+```
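+
+Because `--strict-markers` is set, pytest rejects any marker that has not been registered. If the five markers above are not already declared under `markers` in `pyproject.toml`, a `conftest.py` hook is one way to register them. The sketch uses pytest's real `pytest_configure` hook and `config.addinivalue_line()`; the `MARKERS` dict and its description strings are paraphrased from the table above.
+
+```python
+# conftest.py (sketch): register custom markers so --strict-markers accepts them
+MARKERS = {
+    "slow": "tests taking more than 5 seconds",
+    "integration": "requires external services (APIs)",
+    "e2e": "end-to-end tests (resource-intensive)",
+    "venv": "requires virtual environment setup",
+    "bootstrap": "bootstrap skill specific",
+}
+
+
+def pytest_configure(config):
+    """pytest calls this hook once at startup with its Config object."""
+    for name, description in MARKERS.items():
+        config.addinivalue_line("markers", f"{name}: {description}")
+```
+
+A test then opts in with `@pytest.mark.slow`, and `pytest -m "not slow"` deselects it.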
+
+---
+
+## Git Workflow
+
+### Branch Structure
+
+```
+main (production)
+  ↑
+  │ (only maintainer merges)
+  │
+development (integration) ← default branch for PRs
+  ↑
+  │ (all contributor PRs go here)
+  │
+feature branches
+```
+
+- **`main`** - Production, always stable, protected
+- **`development`** - Active development, default for PRs
+- **Feature branches** - Your work, created from `development`
+
+### Creating a Feature Branch
+
+```bash
+# 1. Check out development and sync it
+#    (assumes a remote named `upstream` points at the main repository)
+git checkout development
+git pull upstream development
+
+# 2. Create feature branch
+git checkout -b my-feature
+
+# 3. Make changes, commit, push
+git add .
+git commit -m "Add my feature"
+git push origin my-feature
+
+# 4. Create PR targeting 'development' branch
+```
+
+---
+
+## CI/CD Configuration
+
+### GitHub Actions Workflows
+
+**`.github/workflows/tests.yml`:**
+- Runs on: push/PR to `main` and `development`
+- Lint job: Ruff + MyPy
+- Test matrix: Ubuntu + macOS, Python 3.10-3.12
+- Coverage: Uploads to Codecov
+
+**`.github/workflows/release.yml`:**
+- Triggered on version tags
+- Builds and publishes to PyPI
+
+### Pre-commit Checks (Manual)
+
+```bash
+# Before committing, run:
+ruff check src/ tests/
+ruff format --check src/ tests/
+pytest tests/ -v -x  # Stop on first failure
+```
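+
+The same three checks can be wrapped in a small script that stops at the first failure, for example as a local pre-commit step. This is a sketch, not a file that ships with the repository; the `run_checks` name and the exit code 127 for a missing tool are illustrative choices.
+
+```python
+import subprocess
+import sys
+
+CHECKS = [
+    ["ruff", "check", "src/", "tests/"],
+    ["ruff", "format", "--check", "src/", "tests/"],
+    ["pytest", "tests/", "-v", "-x"],
+]
+
+
+def run_checks(checks: list[list[str]] | None = None) -> int:
+    """Run each command in order; return the first non-zero exit code, else 0."""
+    for cmd in CHECKS if checks is None else checks:
+        print("$", " ".join(cmd), flush=True)
+        try:
+            result = subprocess.run(cmd, check=False)
+        except FileNotFoundError:
+            print(f"{cmd[0]} not found; is the dev environment installed?", file=sys.stderr)
+            return 127  # conventional shell code for "command not found"
+        if result.returncode != 0:
+            return result.returncode  # stop on first failure
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(run_checks())
+```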
+
+---
+
+## Security Considerations
+
+### API Keys and Secrets
+
+1. **Never commit API keys** to the repository
+2. **Use environment variables:**
+   - `ANTHROPIC_API_KEY` - Claude AI
+   - `GOOGLE_API_KEY` - Google Gemini
+   - `OPENAI_API_KEY` - OpenAI
+   - `GITHUB_TOKEN` - GitHub API
+3. **Configuration storage:**
+   - Stored at `~/.config/skill-seekers/config.json`
+   - Permissions: 600 (owner read/write only)
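+
+A minimal sketch of both rules, assuming the environment-variable names listed above; `API_KEY_ENV`, `get_api_key`, and `save_config` are hypothetical helpers, not the project's code. On POSIX systems the file is created with mode `0o600` from the start, so a new config file never exists with looser permissions, and it is tightened again if it already existed.
+
+```python
+import json
+import os
+from pathlib import Path
+
+API_KEY_ENV = {
+    "claude": "ANTHROPIC_API_KEY",
+    "gemini": "GOOGLE_API_KEY",
+    "openai": "OPENAI_API_KEY",
+    "github": "GITHUB_TOKEN",
+}
+CONFIG_PATH = Path.home() / ".config" / "skill-seekers" / "config.json"
+
+
+def get_api_key(platform: str) -> str:
+    """Read a secret from the environment, never from a tracked file."""
+    var = API_KEY_ENV[platform]
+    value = os.environ.get(var)
+    if not value:
+        raise RuntimeError(f"{var} is not set; export it before running {platform} commands")
+    return value
+
+
+def save_config(data: dict, path: Path = CONFIG_PATH) -> None:
+    """Write JSON config readable and writable by the owner only (POSIX)."""
+    path.parent.mkdir(parents=True, exist_ok=True)
+    # O_CREAT with mode 0o600: a newly created file starts out owner-only
+    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
+    with os.fdopen(fd, "w", encoding="utf-8") as fh:
+        json.dump(data, fh, indent=2)
+    os.chmod(path, 0o600)  # also tighten a file that already existed with a looser mode
+```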
+
+### Rate Limit Handling
+
+- The GitHub REST API is rate-limited: 5,000 requests/hour when authenticated (set `GITHUB_TOKEN`) versus 60 requests/hour unauthenticated
+- The tool has built-in rate limit handling with retry logic
+- Use `--non-interactive` flag for CI/CD environments
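+
+The project's own retry code is not reproduced here, but the general technique is: retry a rate-limited call, waiting either a server-provided reset interval (GitHub reports the reset time in the `X-RateLimit-Reset` header, from which a wait can be computed) or an exponentially growing delay. `RateLimited` and `with_retries` are hypothetical names; the `sleep` parameter is injectable so the behavior can be tested without actually waiting.
+
+```python
+import time
+from typing import Callable, TypeVar
+
+T = TypeVar("T")
+
+
+class RateLimited(Exception):
+    """Raised by a request wrapper when the API reports a rate limit (hypothetical)."""
+
+    def __init__(self, reset_after: float | None = None) -> None:
+        super().__init__("rate limited")
+        self.reset_after = reset_after  # seconds until the limit resets, if known
+
+
+def with_retries(
+    fn: Callable[[], T],
+    max_attempts: int = 5,
+    base_delay: float = 1.0,
+    sleep: Callable[[float], None] = time.sleep,
+) -> T:
+    """Call fn(), retrying on RateLimited with server-provided or exponential delays."""
+    attempt = 0
+    while True:
+        attempt += 1
+        try:
+            return fn()
+        except RateLimited as exc:
+            if attempt >= max_attempts:
+                raise  # give up and surface the error to the caller
+            if exc.reset_after is not None:
+                delay = exc.reset_after
+            else:
+                delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
+            sleep(delay)
+```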
+
+### Custom API Endpoints
+
+Support for Claude-compatible APIs (e.g., GLM-4.7):
+
+```bash
+# Placeholder values: substitute your provider's API key and base URL
+export ANTHROPIC_API_KEY=your-glm-47-api-key
+export ANTHROPIC_BASE_URL=https://glm-4-7-endpoint.com/v1
+```
+
+---
+
+## Common Development Tasks
+
+### Adding a New CLI Command
+
+1. Create module in `src/skill_seekers/cli/my_command.py`
+2. Implement `main()` function with argument parsing
+3. Add entry point in `pyproject.toml`:
+   ```toml
+   [project.scripts]
+   skill-seekers-my-command = "skill_seekers.cli.my_command:main"
+   ```
+4. Add subcommand handler in `src/skill_seekers/cli/main.py`
+5. Add tests in `tests/test_my_command.py`
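+
+A minimal skeleton for step 2, assuming the conventional `argparse` approach; the module contents, the `--config`/`--dry-run` flags, and the `build_parser()` helper are hypothetical. Accepting an optional `argv` keeps `main()` callable both from the `[project.scripts]` entry point (which calls it with no arguments, so `sys.argv` is used, and passes the return value to `sys.exit()`) and directly from tests.
+
+```python
+"""Hypothetical src/skill_seekers/cli/my_command.py (illustrative skeleton)."""
+
+import argparse
+import sys
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="skill-seekers-my-command", description="Example command (hypothetical)."
+    )
+    parser.add_argument("--config", required=True, help="Path to a config JSON file")
+    parser.add_argument("--dry-run", action="store_true", help="Describe the work only")
+    return parser
+
+
+def main(argv: list[str] | None = None) -> int:
+    """Entry point; argv=None makes argparse read sys.argv[1:]."""
+    args = build_parser().parse_args(argv)
+    if args.dry_run:
+        print(f"[dry run] would process {args.config}")
+        return 0
+    print(f"Processing {args.config}")
+    return 0  # becomes the process exit code
+
+
+if __name__ == "__main__":
+    sys.exit(main())
+```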
+
+### Adding a New Platform Adaptor
+
+1. Create `src/skill_seekers/cli/adaptors/my_platform.py`
+2. Inherit from `SkillAdaptor` base class
+3. Implement required methods: `package()`, `upload()`, `enhance()`
+4. Register in `src/skill_seekers/cli/adaptors/__init__.py`
+5. Add optional dependencies in `pyproject.toml`
+6. Add tests in `tests/test_adaptors/`
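+
+A sketch of steps 2 and 3 under stated assumptions: the `SkillAdaptor` below is a local stand-in, and its method signatures are guessed from the `package()`/`upload()` calls in the Platform Adaptor Pattern example, so check `adaptors/base.py` for the real ones. `MyPlatformAdaptor` is hypothetical. Declaring the methods with `@abstractmethod` means a subclass that forgets one cannot be instantiated.
+
+```python
+import shutil
+from abc import ABC, abstractmethod
+from pathlib import Path
+
+
+class SkillAdaptor(ABC):
+    """Local stand-in for the real base class (signatures are assumptions)."""
+
+    PLATFORM = ""
+
+    @abstractmethod
+    def package(self, skill_dir: str, output_path: str) -> Path: ...
+
+    @abstractmethod
+    def upload(self, package_path: str, api_key: str | None) -> dict: ...
+
+    @abstractmethod
+    def enhance(self, skill_dir: str, api_key: str | None) -> bool: ...
+
+
+class MyPlatformAdaptor(SkillAdaptor):
+    """Hypothetical adaptor for a platform that accepts plain ZIP uploads."""
+
+    PLATFORM = "my-platform"
+
+    def package(self, skill_dir: str, output_path: str) -> Path:
+        skill = Path(skill_dir)
+        base_name = Path(output_path) / f"{skill.name}-{self.PLATFORM}"
+        # make_archive appends ".zip", creates the output directory if needed,
+        # and returns the archive path
+        return Path(shutil.make_archive(str(base_name), "zip", root_dir=skill))
+
+    def upload(self, package_path: str, api_key: str | None) -> dict:
+        if not api_key:
+            return {"success": False, "message": "API key required"}
+        # A real adaptor would call the platform's SDK here; this sketch does not
+        return {"success": False, "message": f"upload of {package_path} not implemented"}
+
+    def enhance(self, skill_dir: str, api_key: str | None) -> bool:
+        return False  # this sketch does not support AI enhancement
+```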
+
+### Adding an MCP Tool
+
+1. Implement tool logic in `src/skill_seekers/mcp/tools/category_tools.py`
+2. Register in `src/skill_seekers/mcp/server_fastmcp.py`
+3. Add test in `tests/test_mcp_fastmcp.py`
+
+---
+
+## Documentation
+
+### Project Documentation
+
+- **README.md** - Main project documentation
+- **README.zh-CN.md** - Chinese translation
+- **CLAUDE.md** - Detailed implementation guidance
+- **QUICKSTART.md** - Quick start guide
+- **CONTRIBUTING.md** - Contribution guidelines
+- **docs/** - Comprehensive documentation (54 files)
+
+### Configuration Documentation
+
+Preset configs are in `configs/` directory:
+- `godot.json` - Godot Engine
+- `react.json` - React
+- `vue.json` - Vue.js
+- `fastapi.json` - FastAPI
+- `*_unified.json` - Multi-source configs
+
+---
+
+## Troubleshooting
+
+### Common Issues
+
+**ImportError: No module named 'skill_seekers'**
+- Solution: Run `pip install -e .`
+
+**Tests failing with "package not installed"**
+- Solution: Ensure you ran `pip install -e .` in the correct virtual environment
+
+**MCP server import errors**
+- Solution: Install with `pip install -e ".[mcp]"`
+
+**Type checking failures**
+- MyPy is configured to be lenient (gradual typing)
+- Focus on critical paths, not full coverage
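+
+When it is unclear which environment is active, checking what the running interpreter can actually import settles the first two issues quickly. A minimal sketch using only the standard library (`check_installed` is a hypothetical helper); with an editable install, the reported location should point into the repository's `src/` directory.
+
+```python
+import importlib.util
+import sys
+
+
+def check_installed(module: str = "skill_seekers") -> str:
+    """Report whether a top-level module is importable by the running interpreter."""
+    spec = importlib.util.find_spec(module)
+    if spec is None:
+        return (
+            f"{module} is not importable from {sys.executable}; "
+            "run `pip install -e .` with this interpreter"
+        )
+    return f"{module} found at {spec.origin} (interpreter: {sys.executable})"
+
+
+if __name__ == "__main__":
+    print(check_installed())
+```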
+
+### Getting Help
+
+- Check **TROUBLESHOOTING.md** for detailed solutions
+- Review **docs/FAQ.md** for common questions
+- Visit https://skillseekersweb.com/ for documentation
+- Open an issue on GitHub with:
+  - Clear title and description
+  - Steps to reproduce
+  - Expected vs actual behavior
+  - Environment details (OS, Python version)
+  - Error messages and stack traces
+
+---
+
+## Key Dependencies
+
+### Core Dependencies
+- `requests>=2.32.5` - HTTP requests
+- `beautifulsoup4>=4.14.2` - HTML parsing
+- `PyGithub>=2.5.0` - GitHub API
+- `GitPython>=3.1.40` - Git operations
+- `httpx>=0.28.1` - Async HTTP
+- `anthropic>=0.76.0` - Claude AI API
+- `PyMuPDF>=1.24.14` - PDF processing
+- `pydantic>=2.12.3` - Data validation
+- `click>=8.3.0` - CLI framework
+
+### Optional Dependencies
+- `mcp>=1.25` - MCP server
+- `google-generativeai>=0.8.0` - Gemini support
+- `openai>=1.0.0` - OpenAI support
+
+### Dev Dependencies
+- `pytest>=8.4.2` - Testing framework
+- `pytest-asyncio>=0.24.0` - Async test support
+- `pytest-cov>=7.0.0` - Coverage
+- `ruff>=0.14.13` - Linting/formatting
+- `mypy>=1.19.1` - Type checking
+
+---
+
+*This document is maintained for AI coding agents. For human contributors, see README.md and CONTRIBUTING.md.*