docs: Add AGENTS.md guide for AI coding agents
- Comprehensive guide for AI assistants working with the codebase
- Covers project structure, development commands, architecture patterns
- Includes testing guidelines, CI/CD info, and troubleshooting
- Documents all entry points, dependencies, and best practices
469
AGENTS.md
Normal file
@@ -0,0 +1,469 @@
# AGENTS.md - Skill Seekers

This file provides essential guidance for AI coding agents working with the Skill Seekers codebase.

---

## Project Overview

**Skill Seekers** is a Python CLI tool that converts documentation websites, GitHub repositories, and PDF files into AI-ready skills for LLM platforms. It supports 4 target platforms:

- **Claude AI** (ZIP + YAML format)
- **Google Gemini** (tar.gz format)
- **OpenAI ChatGPT** (ZIP + Vector Store)
- **Generic Markdown** (universal ZIP export)

**Current Version:** 2.7.4
**Python Version:** 3.10+ required
**License:** MIT
**Website:** https://skillseekersweb.com/

### Core Workflow

1. **Scrape Phase** - Crawl documentation/GitHub/PDF sources
2. **Build Phase** - Organize content into categorized references
3. **Enhancement Phase** - AI-powered quality improvements (optional)
4. **Package Phase** - Create platform-specific packages
5. **Upload Phase** - Auto-upload to target platform (optional)
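
The five phases above can be read as an ordered pipeline in which the enhancement and upload phases are optional. The following is a minimal sketch of that ordering only; `Phase`, `OPTIONAL_PHASES`, and `plan_phases` are illustrative names assumed for this example and are not taken from the codebase.

```python
from enum import Enum


class Phase(Enum):
    """Workflow phases in execution order (names are illustrative)."""

    SCRAPE = "scrape"
    BUILD = "build"
    ENHANCE = "enhance"
    PACKAGE = "package"
    UPLOAD = "upload"


# Phases the overview marks as optional.
OPTIONAL_PHASES = {Phase.ENHANCE, Phase.UPLOAD}


def plan_phases(enhance: bool = False, upload: bool = False) -> list[Phase]:
    """Return the phases to run, in order, skipping disabled optional ones."""
    enabled = {Phase.ENHANCE: enhance, Phase.UPLOAD: upload}
    return [p for p in Phase if p not in OPTIONAL_PHASES or enabled[p]]
```

Calling `plan_phases()` with no arguments yields only the three mandatory phases; passing `enhance=True` or `upload=True` slots the optional phase into its position in the sequence.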

---

## Project Structure

```
Skill_Seekers/
├── src/skill_seekers/              # Main source code (src/ layout)
│   ├── cli/                        # CLI tools and commands
│   │   ├── adaptors/               # Platform adaptors (Strategy pattern)
│   │   │   ├── base.py             # Abstract base class
│   │   │   ├── claude.py           # Claude AI adaptor
│   │   │   ├── gemini.py           # Google Gemini adaptor
│   │   │   ├── openai.py           # OpenAI ChatGPT adaptor
│   │   │   └── markdown.py         # Generic Markdown adaptor
│   │   ├── main.py                 # Unified CLI entry point
│   │   ├── doc_scraper.py          # Documentation scraper
│   │   ├── github_scraper.py       # GitHub repository scraper
│   │   ├── pdf_scraper.py          # PDF extraction
│   │   ├── unified_scraper.py      # Multi-source scraping
│   │   ├── codebase_scraper.py     # Local codebase analysis (C2.x/C3.x)
│   │   ├── enhance_skill_local.py  # AI enhancement (LOCAL mode)
│   │   ├── package_skill.py        # Skill packager
│   │   ├── upload_skill.py         # Upload to platforms
│   │   └── ...                     # 50+ CLI modules
│   └── mcp/                        # MCP server integration
│       ├── server_fastmcp.py       # FastMCP server (main)
│       ├── server.py               # Legacy server
│       └── tools/                  # MCP tool implementations
├── tests/                          # Test suite (76 test files)
├── configs/                        # Preset configuration files
├── docs/                           # Documentation (54 markdown files)
├── .github/workflows/              # CI/CD workflows
├── pyproject.toml                  # Main project configuration
└── requirements.txt                # Pinned dependencies
```

---

## Build and Development Commands

### Setup (REQUIRED before any development)

```bash
# Install in editable mode (REQUIRED for tests due to src/ layout)
pip install -e .

# Install with all platform dependencies
pip install -e ".[all-llms]"

# Install specific platforms only
pip install -e ".[gemini]"     # Google Gemini support
pip install -e ".[openai]"     # OpenAI ChatGPT support
pip install -e ".[mcp]"        # MCP server dependencies
```

**CRITICAL:** The project uses a `src/` layout. Tests WILL FAIL unless you install with `pip install -e .` first.

### Building

```bash
# Build package using uv (recommended)
uv build

# Or using standard build
python -m build

# Publish to PyPI
uv publish
```

### Running Tests

**CRITICAL:** Never skip tests - all tests must pass before commits.

```bash
# All tests (must run pip install -e . first!)
pytest tests/ -v

# Specific test file
pytest tests/test_scraper_features.py -v
pytest tests/test_mcp_fastmcp.py -v

# With coverage
pytest tests/ --cov=src/skill_seekers --cov-report=term --cov-report=html

# Single test
pytest tests/test_scraper_features.py::test_detect_language -v

# E2E tests
pytest tests/test_e2e_three_stream_pipeline.py -v
```

**Test Architecture:**
- 76 test files covering all features
- CI Matrix: Ubuntu + macOS, Python 3.10-3.13
- 1200+ tests passing
- Test markers: `slow`, `integration`, `e2e`, `venv`, `bootstrap`

---

## Code Style Guidelines

### Linting and Formatting

```bash
# Run ruff linter
ruff check src/ tests/

# Run ruff formatter check
ruff format --check src/ tests/

# Auto-fix issues
ruff check src/ tests/ --fix
ruff format src/ tests/

# Run mypy type checker
mypy src/skill_seekers --show-error-codes --pretty
```

### Style Rules (from pyproject.toml)

- **Line length:** 100 characters
- **Target Python:** 3.10+
- **Enabled rules:** E, W, F, I, B, C4, UP, ARG, SIM
- **Import sorting:** isort style with `skill_seekers` as first-party

### Code Conventions

1. **Use type hints** where practical (gradual typing approach)
2. **Docstrings:** Use Google-style or standard docstrings
3. **Error handling:** Use specific exceptions, provide helpful messages
4. **Async code:** Use `asyncio`, mark tests with `@pytest.mark.asyncio`
5. **File naming:** Use snake_case for all Python files
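
As an illustration of conventions 1 to 3, the sketch below combines type hints, a Google-style docstring, and a specific exception with a helpful message. `ConfigNotFoundError` and `resolve_config` are hypothetical names invented for this example; they are not claimed to exist in the project.

```python
from pathlib import Path


class ConfigNotFoundError(FileNotFoundError):
    """Raised when a preset config file cannot be located."""


def resolve_config(name: str, config_dir: Path) -> Path:
    """Resolve a preset name to a config file path.

    Args:
        name: Preset name without extension, e.g. ``"react"``.
        config_dir: Directory that holds the preset JSON files.

    Returns:
        Path to the existing ``<name>.json`` file.

    Raises:
        ConfigNotFoundError: If no matching file exists.
    """
    path = config_dir / f"{name}.json"
    if not path.is_file():
        # Name the missing file and the searched directory in the message.
        raise ConfigNotFoundError(
            f"Config '{name}' not found in {config_dir}; expected {path.name}"
        )
    return path
```

Subclassing a built-in exception keeps existing `except FileNotFoundError` handlers working while letting callers catch the narrower type when they need to.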

---

## Architecture Patterns

### Platform Adaptor Pattern (Strategy Pattern)

All platform-specific logic is encapsulated in adaptors:

```python
import os

from skill_seekers.cli.adaptors import get_adaptor

# Get platform-specific adaptor
adaptor = get_adaptor('gemini')  # or 'claude', 'openai', 'markdown'

# Package skill
adaptor.package(skill_dir='output/react/', output_path='output/')

# Upload to platform
adaptor.upload(
    package_path='output/react-gemini.tar.gz',
    api_key=os.getenv('GOOGLE_API_KEY')
)
```

### CLI Architecture (Git-style)

Entry point: `src/skill_seekers/cli/main.py`

The CLI uses subcommands that delegate to existing modules:

```python
# skill-seekers scrape --config react.json
# Transforms to: doc_scraper.main() with modified sys.argv
```

**Available subcommands:**
- `config` - Configuration wizard
- `scrape` - Documentation scraping
- `github` - GitHub repository scraping
- `pdf` - PDF extraction
- `unified` - Multi-source scraping
- `analyze` - Local codebase analysis
- `enhance` - AI enhancement
- `package` - Package skill
- `upload` - Upload to platform
- `install` / `install-agent` - Complete workflow
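
The delegation described above (a subcommand that rewrites `sys.argv` and then calls an existing module's `main()`) can be sketched roughly as follows. This is an assumption-laden illustration, not the contents of `main.py`: `fake_doc_scraper_main`, `SUBCOMMANDS`, and `dispatch` are hypothetical names, and the stand-in function takes the place of the real `doc_scraper.main()`.

```python
import sys
from collections.abc import Callable


def fake_doc_scraper_main() -> int:
    """Stand-in for a module-level main() that parses sys.argv itself."""
    print(f"doc_scraper received: {sys.argv[1:]}")
    return 0


# Hypothetical mapping of subcommand name -> (program name, main function).
SUBCOMMANDS: dict[str, tuple[str, Callable[[], int]]] = {
    "scrape": ("doc_scraper.py", fake_doc_scraper_main),
}


def dispatch(argv: list[str]) -> int:
    """Route `<subcommand> [args...]` to the matching module's main()."""
    if not argv or argv[0] not in SUBCOMMANDS:
        choices = ", ".join(SUBCOMMANDS)
        print(f"usage: skill-seekers <{choices}> ...", file=sys.stderr)
        return 2
    prog, main_func = SUBCOMMANDS[argv[0]]
    original_argv = sys.argv
    # Rewrite sys.argv so the delegated main() sees only its own arguments.
    sys.argv = [prog, *argv[1:]]
    try:
        return main_func()
    finally:
        sys.argv = original_argv  # Restore for the caller.
```

Restoring `sys.argv` in a `finally` block keeps the dispatcher safe to call repeatedly, for example from tests.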

### MCP Server Architecture

Two implementations:
- `server_fastmcp.py` - Modern, decorator-based (recommended, 708 lines)
- `server.py` - Legacy implementation (2200 lines)

Tools are organized by category:
- Config tools (3)
- Scraping tools (8)
- Packaging tools (4)
- Splitting tools (2)
- Source tools (4)

---

## Testing Instructions

### Test Categories

| Marker | Description |
|--------|-------------|
| `slow` | Tests taking >5 seconds |
| `integration` | Requires external services (APIs) |
| `e2e` | End-to-end tests (resource-intensive) |
| `venv` | Requires virtual environment setup |
| `bootstrap` | Bootstrap skill specific |

### Running Specific Test Categories

```bash
# Skip slow tests
pytest tests/ -v -m "not slow"

# Run only integration tests
pytest tests/ -v -m integration

# Run E2E tests
pytest tests/ -v -m e2e
```

### Test Configuration (`[tool.pytest.ini_options]` in pyproject.toml)

```toml
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
addopts = "-v --tb=short --strict-markers"
asyncio_mode = "auto"
```

---

## Git Workflow

### Branch Structure

```
main (production)
  ↑
  │ (only maintainer merges)
  │
development (integration) ← default branch for PRs
  ↑
  │ (all contributor PRs go here)
  │
feature branches
```

- **`main`** - Production, always stable, protected
- **`development`** - Active development, default for PRs
- **Feature branches** - Your work, created from `development`

### Creating a Feature Branch

```bash
# 1. Checkout development
git checkout development
git pull upstream development

# 2. Create feature branch
git checkout -b my-feature

# 3. Make changes, commit, push
git add .
git commit -m "Add my feature"
git push origin my-feature

# 4. Create PR targeting 'development' branch
```

---

## CI/CD Configuration

### GitHub Actions Workflows

**`.github/workflows/tests.yml`:**
- Runs on: push/PR to `main` and `development`
- Lint job: Ruff + MyPy
- Test matrix: Ubuntu + macOS, Python 3.10-3.12
- Coverage: Uploads to Codecov

**`.github/workflows/release.yml`:**
- Triggered on version tags
- Builds and publishes to PyPI

### Pre-commit Checks (Manual)

```bash
# Before committing, run:
ruff check src/ tests/
ruff format --check src/ tests/
pytest tests/ -v -x  # Stop on first failure
```

---

## Security Considerations

### API Keys and Secrets

1. **Never commit API keys** to the repository
2. **Use environment variables:**
   - `ANTHROPIC_API_KEY` - Claude AI
   - `GOOGLE_API_KEY` - Google Gemini
   - `OPENAI_API_KEY` - OpenAI
   - `GITHUB_TOKEN` - GitHub API
3. **Configuration storage:**
   - Stored at `~/.config/skill-seekers/config.json`
   - Permissions: 600 (owner read/write only)
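
To make the "permissions 600" requirement concrete, here is a minimal sketch of writing a JSON config that only the owner can read or write. `save_config` is a hypothetical helper written for this example, and the project's own implementation may differ; only the path and the mode come from the list above.

```python
import json
import os
from pathlib import Path

# Path taken from the configuration storage note above.
DEFAULT_CONFIG_PATH = Path.home() / ".config" / "skill-seekers" / "config.json"


def save_config(data: dict, path: Path = DEFAULT_CONFIG_PATH) -> None:
    """Write config as JSON, readable and writable by the owner only."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Creating the file with mode 0o600 avoids a window where it is world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)
    # Tighten permissions if the file already existed with a looser mode.
    os.chmod(path, 0o600)
```

On Windows the mode bits are largely ignored, so this sketch only gives the stated guarantee on POSIX systems.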

### Rate Limit Handling

- GitHub API has rate limits (5000 requests/hour for authenticated requests)
- The tool has built-in rate limit handling with retry logic
- Use `--non-interactive` flag for CI/CD environments
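
The general shape of rate-limit retry logic is sketched below using exponential backoff. This is not the tool's actual implementation, whose details are not shown in this document: `RateLimitError` and `with_retries` are hypothetical names, and the attempt count and delays are assumed values chosen for illustration.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when the API reports an exhausted quota."""


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            sleep(base_delay * 2**attempt)  # Delay doubles on each retry.
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so that tests can record the delays instead of actually waiting.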

### Custom API Endpoints

Support for Claude-compatible APIs (e.g., GLM-4.7):

```bash
export ANTHROPIC_API_KEY=your-glm-47-api-key
export ANTHROPIC_BASE_URL=https://glm-4-7-endpoint.com/v1
```
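
A rough sketch of how these two variables might be resolved in Python is shown below. `resolve_claude_endpoint` is a hypothetical helper, and the fallback to the public Anthropic API host when `ANTHROPIC_BASE_URL` is unset is an assumption of this example rather than documented behavior of the tool.

```python
import os
from collections.abc import Mapping

# Assumed default when no override is set: the public Anthropic API host.
DEFAULT_BASE_URL = "https://api.anthropic.com"


def resolve_claude_endpoint(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the API key and base URL a Claude-compatible client would use."""
    api_key = env.get("ANTHROPIC_API_KEY")
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    # An empty or missing override falls back to the default host.
    base_url = env.get("ANTHROPIC_BASE_URL") or DEFAULT_BASE_URL
    return {"api_key": api_key, "base_url": base_url.rstrip("/")}
```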

---

## Common Development Tasks

### Adding a New CLI Command

1. Create module in `src/skill_seekers/cli/my_command.py`
2. Implement `main()` function with argument parsing
3. Add entry point in `pyproject.toml`:
   ```toml
   [project.scripts]
   skill-seekers-my-command = "skill_seekers.cli.my_command:main"
   ```
4. Add subcommand handler in `src/skill_seekers/cli/main.py`
5. Add tests in `tests/test_my_command.py`

### Adding a New Platform Adaptor

1. Create `src/skill_seekers/cli/adaptors/my_platform.py`
2. Inherit from `SkillAdaptor` base class
3. Implement required methods: `package()`, `upload()`, `enhance()`
4. Register in `src/skill_seekers/cli/adaptors/__init__.py`
5. Add optional dependencies in `pyproject.toml`
6. Add tests in `tests/test_adaptors/`

### Adding an MCP Tool

1. Implement tool logic in `src/skill_seekers/mcp/tools/category_tools.py`
2. Register in `src/skill_seekers/mcp/server_fastmcp.py`
3. Add test in `tests/test_mcp_fastmcp.py`

---

## Documentation

### Project Documentation

- **README.md** - Main project documentation
- **README.zh-CN.md** - Chinese translation
- **CLAUDE.md** - Detailed implementation guidance
- **QUICKSTART.md** - Quick start guide
- **CONTRIBUTING.md** - Contribution guidelines
- **docs/** - Comprehensive documentation (54 files)

### Configuration Documentation

Preset configs are in the `configs/` directory:
- `godot.json` - Godot Engine
- `react.json` - React
- `vue.json` - Vue.js
- `fastapi.json` - FastAPI
- `*_unified.json` - Multi-source configs

---

## Troubleshooting

### Common Issues

**ImportError: No module named 'skill_seekers'**
- Solution: Run `pip install -e .`

**Tests failing with "package not installed"**
- Solution: Ensure you ran `pip install -e .` in the correct virtual environment
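
Both issues above usually come down to the package not being visible to the interpreter that is running the tests. A small diagnostic sketch for checking this is shown below; `is_installed` and `describe_environment` are hypothetical helper names and are not part of the project.

```python
import importlib.util
import sys


def is_installed(package: str = "skill_seekers") -> bool:
    """Return True if `package` can be imported by the current interpreter."""
    return importlib.util.find_spec(package) is not None


def describe_environment(package: str = "skill_seekers") -> str:
    """Summarize which interpreter is active and whether the package is visible."""
    status = "importable" if is_installed(package) else "NOT importable"
    return f"{package} is {status} from {sys.executable}"
```

If the reported interpreter path is not inside the expected virtual environment, activate that environment and rerun `pip install -e .` there.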

**MCP server import errors**
- Solution: Install with `pip install -e ".[mcp]"`

**Type checking failures**
- MyPy is configured to be lenient (gradual typing)
- Focus on critical paths, not full coverage

### Getting Help

- Check **TROUBLESHOOTING.md** for detailed solutions
- Review **docs/FAQ.md** for common questions
- Visit https://skillseekersweb.com/ for documentation
- Open an issue on GitHub with:
  - Clear title and description
  - Steps to reproduce
  - Expected vs actual behavior
  - Environment details (OS, Python version)
  - Error messages and stack traces

---

## Key Dependencies

### Core Dependencies
- `requests>=2.32.5` - HTTP requests
- `beautifulsoup4>=4.14.2` - HTML parsing
- `PyGithub>=2.5.0` - GitHub API
- `GitPython>=3.1.40` - Git operations
- `httpx>=0.28.1` - Async HTTP
- `anthropic>=0.76.0` - Claude AI API
- `PyMuPDF>=1.24.14` - PDF processing
- `pydantic>=2.12.3` - Data validation
- `click>=8.3.0` - CLI framework

### Optional Dependencies
- `mcp>=1.25` - MCP server
- `google-generativeai>=0.8.0` - Gemini support
- `openai>=1.0.0` - OpenAI support

### Dev Dependencies
- `pytest>=8.4.2` - Testing framework
- `pytest-asyncio>=0.24.0` - Async test support
- `pytest-cov>=7.0.0` - Coverage
- `ruff>=0.14.13` - Linting/formatting
- `mypy>=1.19.1` - Type checking

---

*This document is maintained for AI coding agents. For human contributors, see README.md and CONTRIBUTING.md.*