# AGENTS.md - Skill Seekers

This file provides essential guidance for AI coding agents working with the Skill Seekers codebase.

---

## Project Overview

**Skill Seekers** is a Python CLI tool that converts documentation websites, GitHub repositories, and PDF files into AI-ready skills for LLM platforms. It supports 4 target platforms:

- **Claude AI** (ZIP + YAML format)
- **Google Gemini** (tar.gz format)
- **OpenAI ChatGPT** (ZIP + Vector Store)
- **Generic Markdown** (universal ZIP export)

**Current Version:** 2.7.4
**Python Version:** 3.10+ required
**License:** MIT
**Website:** https://skillseekersweb.com/

### Core Workflow

1. **Scrape Phase** - Crawl documentation/GitHub/PDF sources
2. **Build Phase** - Organize content into categorized references
3. **Enhancement Phase** - AI-powered quality improvements (optional)
4. **Package Phase** - Create platform-specific packages
5. **Upload Phase** - Auto-upload to target platform (optional)
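
The phase ordering above, with Enhancement and Upload optional, can be sketched as a small planning function. `PHASES` and `plan_phases` are hypothetical names used only for illustration and are not part of `skill_seekers`.

```python
# Hypothetical sketch: phase names mirror the Core Workflow list,
# but plan_phases() is not part of the skill_seekers codebase.
PHASES = ["scrape", "build", "enhance", "package", "upload"]
OPTIONAL_PHASES = {"enhance", "upload"}


def plan_phases(enhance: bool = False, upload: bool = False) -> list[str]:
    """Return the phases to run, in order, skipping disabled optional phases."""
    enabled = {"enhance": enhance, "upload": upload}
    return [p for p in PHASES if p not in OPTIONAL_PHASES or enabled[p]]


print(plan_phases())  # ['scrape', 'build', 'package']
print(plan_phases(enhance=True, upload=True))  # all five phases
```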

---

## Project Structure

```
Skill_Seekers/
├── src/skill_seekers/              # Main source code (src/ layout)
│   ├── cli/                        # CLI tools and commands
│   │   ├── adaptors/               # Platform adaptors (Strategy pattern)
│   │   │   ├── base.py             # Abstract base class
│   │   │   ├── claude.py           # Claude AI adaptor
│   │   │   ├── gemini.py           # Google Gemini adaptor
│   │   │   ├── openai.py           # OpenAI ChatGPT adaptor
│   │   │   └── markdown.py         # Generic Markdown adaptor
│   │   ├── main.py                 # Unified CLI entry point
│   │   ├── doc_scraper.py          # Documentation scraper
│   │   ├── github_scraper.py       # GitHub repository scraper
│   │   ├── pdf_scraper.py          # PDF extraction
│   │   ├── unified_scraper.py      # Multi-source scraping
│   │   ├── codebase_scraper.py     # Local codebase analysis (C2.x/C3.x)
│   │   ├── enhance_skill_local.py  # AI enhancement (LOCAL mode)
│   │   ├── package_skill.py        # Skill packager
│   │   ├── upload_skill.py         # Upload to platforms
│   │   └── ...                     # 50+ CLI modules
│   └── mcp/                        # MCP server integration
│       ├── server_fastmcp.py       # FastMCP server (main)
│       ├── server.py               # Legacy server
│       └── tools/                  # MCP tool implementations
├── tests/                          # Test suite (76 test files)
├── configs/                        # Preset configuration files
├── docs/                           # Documentation (54 markdown files)
├── .github/workflows/              # CI/CD workflows
├── pyproject.toml                  # Main project configuration
└── requirements.txt                # Pinned dependencies
```

---

## Build and Development Commands

### Setup (REQUIRED before any development)

```bash
# Install in editable mode (REQUIRED for tests due to src/ layout)
pip install -e .

# Install with all platform dependencies
pip install -e ".[all-llms]"

# Install specific platforms only
pip install -e ".[gemini]"    # Google Gemini support
pip install -e ".[openai]"    # OpenAI ChatGPT support
pip install -e ".[mcp]"       # MCP server dependencies
```

**CRITICAL:** The project uses a `src/` layout. Tests WILL FAIL unless you install with `pip install -e .` first.

### Building

```bash
# Build package using uv (recommended)
uv build

# Or using standard build
python -m build

# Publish to PyPI
uv publish
```

### Running Tests

**CRITICAL:** Never skip tests - all tests must pass before commits.

```bash
# All tests (must run pip install -e . first!)
pytest tests/ -v

# Specific test file
pytest tests/test_scraper_features.py -v
pytest tests/test_mcp_fastmcp.py -v

# With coverage
pytest tests/ --cov=src/skill_seekers --cov-report=term --cov-report=html

# Single test
pytest tests/test_scraper_features.py::test_detect_language -v

# E2E tests
pytest tests/test_e2e_three_stream_pipeline.py -v
```

**Test Architecture:**
- 76 test files covering all features
- CI Matrix: Ubuntu + macOS, Python 3.10-3.13
- 1200+ tests passing
- Test markers: `slow`, `integration`, `e2e`, `venv`, `bootstrap`

---

## Code Style Guidelines

### Linting and Formatting

```bash
# Run ruff linter
ruff check src/ tests/

# Run ruff formatter check
ruff format --check src/ tests/

# Auto-fix issues
ruff check src/ tests/ --fix
ruff format src/ tests/

# Run mypy type checker
mypy src/skill_seekers --show-error-codes --pretty
```

### Style Rules (from pyproject.toml)

- **Line length:** 100 characters
- **Target Python:** 3.10+
- **Enabled rules:** E, W, F, I, B, C4, UP, ARG, SIM
- **Import sorting:** isort style with `skill_seekers` as first-party

### Code Conventions

1. **Use type hints** where practical (gradual typing approach)
2. **Docstrings:** Use Google-style or standard docstrings
3. **Error handling:** Use specific exceptions, provide helpful messages
4. **Async code:** Use `asyncio`, mark tests with `@pytest.mark.asyncio`
5. **File naming:** Use snake_case for all Python files
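
A short function that follows these conventions (type hints, a Google-style docstring, specific exceptions with actionable messages) might look like the sketch below. `load_preset_config` is a hypothetical helper for illustration, not an existing function in the codebase.

```python
import json
from pathlib import Path


def load_preset_config(path: Path) -> dict:
    """Load a preset configuration file.

    Args:
        path: Path to a JSON config, e.g. ``configs/react.json``.

    Returns:
        The parsed configuration as a dictionary.

    Raises:
        FileNotFoundError: If ``path`` does not exist.
        ValueError: If the file is not valid JSON or not a JSON object.
    """
    if not path.is_file():
        raise FileNotFoundError(f"Config not found: {path} (run from the repo root?)")
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        # Chain the original error so the line/column details stay visible.
        raise ValueError(f"Invalid JSON in {path}: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError(f"Expected a JSON object in {path}, got {type(data).__name__}")
    return data
```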

---

## Architecture Patterns

### Platform Adaptor Pattern (Strategy Pattern)

All platform-specific logic is encapsulated in adaptors:

```python
import os

from skill_seekers.cli.adaptors import get_adaptor

# Get platform-specific adaptor
adaptor = get_adaptor('gemini')  # or 'claude', 'openai', 'markdown'

# Package skill
adaptor.package(skill_dir='output/react/', output_path='output/')

# Upload to platform
adaptor.upload(
    package_path='output/react-gemini.tar.gz',
    api_key=os.getenv('GOOGLE_API_KEY')
)
```
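
The Strategy-pattern dispatch behind `get_adaptor()` can be sketched as a name-to-class registry. This is a self-contained illustration under assumptions: the classes, the `ADAPTORS` dict, and the `package_filename()` method are hypothetical stand-ins rather than the real `skill_seekers.cli.adaptors` code; only the archive formats come from the platform list at the top of this file.

```python
class SkillAdaptor:
    """Hypothetical minimal base: each platform decides its archive format."""

    platform = "base"
    extension = ".zip"

    def package_filename(self, skill_name: str) -> str:
        return f"{skill_name}-{self.platform}{self.extension}"


class ClaudeAdaptor(SkillAdaptor):
    platform = "claude"


class GeminiAdaptor(SkillAdaptor):
    platform = "gemini"
    extension = ".tar.gz"


# Registry mapping platform names to adaptor classes (the Strategy lookup).
ADAPTORS: dict[str, type[SkillAdaptor]] = {
    "claude": ClaudeAdaptor,
    "gemini": GeminiAdaptor,
}


def get_adaptor(platform: str) -> SkillAdaptor:
    """Return an adaptor instance, failing loudly on unknown platforms."""
    try:
        return ADAPTORS[platform]()
    except KeyError:
        known = ", ".join(sorted(ADAPTORS))
        raise ValueError(f"Unknown platform {platform!r}; expected one of: {known}") from None


print(get_adaptor("gemini").package_filename("react"))  # react-gemini.tar.gz
```

Callers depend only on the shared interface, so adding a platform means adding a class and a registry entry without touching the packaging or upload call sites.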

### CLI Architecture (Git-style)

Entry point: `src/skill_seekers/cli/main.py`

The CLI uses subcommands that delegate to existing modules:

```bash
skill-seekers scrape --config react.json
# Dispatched as: doc_scraper.main() with a modified sys.argv
```

**Available subcommands:**
- `config` - Configuration wizard
- `scrape` - Documentation scraping
- `github` - GitHub repository scraping
- `pdf` - PDF extraction
- `unified` - Multi-source scraping
- `analyze` - Local codebase analysis
- `enhance` - AI enhancement
- `package` - Package skill
- `upload` - Upload to platform
- `install` / `install-agent` - Complete workflow
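
The delegation described above (rewrite `sys.argv`, then call the target module's `main()`) can be sketched with only the standard library. The `COMMANDS` table, `dispatch()`, and the stand-in `fake_doc_scraper_main` are hypothetical; in the real `main.py` the handlers would be functions such as `doc_scraper.main()`.

```python
import sys
from collections.abc import Callable

# Records what the stand-in delegate saw, so the behavior is observable.
CALLS: list[list[str]] = []


def fake_doc_scraper_main() -> int:
    """Stand-in for doc_scraper.main(), which parses sys.argv like a standalone script."""
    CALLS.append(list(sys.argv))
    return 0


# Hypothetical subcommand table: name -> (program name, entry function).
COMMANDS: dict[str, tuple[str, Callable[[], int]]] = {
    "scrape": ("doc_scraper", fake_doc_scraper_main),
}


def dispatch(argv: list[str]) -> int:
    """Route `skill-seekers <command> [args...]` to that module's main()."""
    if not argv or argv[0] not in COMMANDS:
        known = ", ".join(COMMANDS)
        print(f"usage: skill-seekers <command> [args]; commands: {known}", file=sys.stderr)
        return 2
    prog, entry = COMMANDS[argv[0]]
    saved_argv = sys.argv
    sys.argv = [prog, *argv[1:]]  # the delegate sees only its own arguments
    try:
        return entry()
    finally:
        sys.argv = saved_argv  # restore so the caller is unaffected


dispatch(["scrape", "--config", "react.json"])
print(CALLS[-1])  # ['doc_scraper', '--config', 'react.json']
```

Restoring `sys.argv` in a `finally` block keeps the rewrite from leaking into later code even if the delegate raises.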

### MCP Server Architecture

Two implementations:
- `server_fastmcp.py` - Modern, decorator-based (recommended, 708 lines)
- `server.py` - Legacy implementation (2200 lines)

Tools are organized by category:
- Config tools (3)
- Scraping tools (8)
- Packaging tools (4)
- Splitting tools (2)
- Source tools (4)

---

## Testing Instructions

### Test Categories

| Marker | Description |
|--------|-------------|
| `slow` | Tests taking >5 seconds |
| `integration` | Requires external services (APIs) |
| `e2e` | End-to-end tests (resource-intensive) |
| `venv` | Requires virtual environment setup |
| `bootstrap` | Bootstrap skill specific |

### Running Specific Test Categories

```bash
# Skip slow tests
pytest tests/ -v -m "not slow"

# Run only integration tests
pytest tests/ -v -m integration

# Run E2E tests
pytest tests/ -v -m e2e
```

### Test Configuration (`[tool.pytest.ini_options]` in pyproject.toml)

```toml
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
addopts = "-v --tb=short --strict-markers"
asyncio_mode = "auto"
```
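
Because `addopts` includes `--strict-markers`, pytest rejects any marker that has not been registered. The excerpt above does not show where `slow`, `integration`, `e2e`, `venv`, and `bootstrap` are registered (presumably a `markers` list elsewhere in the same table). An equivalent registration from a `conftest.py` would use pytest's `pytest_configure` hook; the `MARKERS` dict below is illustrative, with descriptions taken from the Test Categories table.

```python
# conftest.py (sketch): register the project's custom markers so that
# --strict-markers accepts them.
MARKERS = {
    "slow": "Tests taking >5 seconds",
    "integration": "Requires external services (APIs)",
    "e2e": "End-to-end tests (resource-intensive)",
    "venv": "Requires virtual environment setup",
    "bootstrap": "Bootstrap skill specific",
}


def pytest_configure(config) -> None:
    """pytest hook: called once at startup with the Config object."""
    for name, description in MARKERS.items():
        # Same effect as a "name: description" entry in the ini `markers` option.
        config.addinivalue_line("markers", f"{name}: {description}")
```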

---

## Git Workflow

### Branch Structure

```
main (production)
  ↑
  │ (only maintainer merges)
  │
development (integration) ← default branch for PRs
  ↑
  │ (all contributor PRs go here)
  │
feature branches
```

- **`main`** - Production, always stable, protected
- **`development`** - Active development, default for PRs
- **Feature branches** - Your work, created from `development`

### Creating a Feature Branch

```bash
# 1. Checkout development
git checkout development
git pull upstream development

# 2. Create feature branch
git checkout -b my-feature

# 3. Make changes, commit, push
git add .
git commit -m "Add my feature"
git push origin my-feature

# 4. Create PR targeting 'development' branch
```

---

## CI/CD Configuration

### GitHub Actions Workflows

**`.github/workflows/tests.yml`:**
- Runs on: push/PR to `main` and `development`
- Lint job: Ruff + MyPy
- Test matrix: Ubuntu + macOS, Python 3.10-3.12
- Coverage: Uploads to Codecov

**`.github/workflows/release.yml`:**
- Triggered on version tags
- Builds and publishes to PyPI

### Pre-commit Checks (Manual)

```bash
# Before committing, run:
ruff check src/ tests/
ruff format --check src/ tests/
pytest tests/ -v -x  # Stop on first failure
```
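
To avoid forgetting a step, the three checks can be chained in a small script that stops at the first failure. This is a hypothetical helper (for example a local `scripts/precommit.py`), not something the repository is stated to ship; it only shells out to the commands listed above.

```python
import subprocess
import sys
from collections.abc import Sequence

# The manual pre-commit checks from above, in order.
CHECKS: list[list[str]] = [
    ["ruff", "check", "src/", "tests/"],
    ["ruff", "format", "--check", "src/", "tests/"],
    ["pytest", "tests/", "-v", "-x"],
]


def run_checks(checks: Sequence[Sequence[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for cmd in checks:
        print("$", " ".join(cmd), flush=True)
        try:
            result = subprocess.run(list(cmd), check=False)
        except FileNotFoundError:
            # The tool is not on PATH (e.g. dev dependencies not installed).
            print(f"Not installed: {cmd[0]}", file=sys.stderr)
            return 127
        if result.returncode != 0:
            print(f"FAILED: {' '.join(cmd)}", file=sys.stderr)
            return result.returncode
    return 0


if __name__ == "__main__":
    sys.exit(run_checks(CHECKS))
```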

---

## Security Considerations

### API Keys and Secrets

1. **Never commit API keys** to the repository
2. **Use environment variables:**
   - `ANTHROPIC_API_KEY` - Claude AI
   - `GOOGLE_API_KEY` - Google Gemini
   - `OPENAI_API_KEY` - OpenAI
   - `GITHUB_TOKEN` - GitHub API
3. **Configuration storage:**
   - Stored at `~/.config/skill-seekers/config.json`
   - Permissions: 600 (owner read/write only)

### Rate Limit Handling

- The GitHub API is rate-limited (5,000 requests/hour for authenticated requests)
- The tool has built-in rate limit handling with retry logic
- Use `--non-interactive` flag for CI/CD environments
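
This section says the tool retries on rate limits but not how. As a generic illustration of the technique (not the project's actual code), a retry wrapper with exponential backoff could look like the sketch below. `RateLimitError` and `call_with_retry` are hypothetical names; a GitHub-specific implementation might instead sleep until the time given in the `X-RateLimit-Reset` response header.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error a client raises when the API says 'slow down'."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on RateLimitError with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last error once max_attempts calls have failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 0
    while True:
        try:
            return fn()
        except RateLimitError:
            attempt += 1
            if attempt >= max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Injecting `sleep` keeps the backoff schedule testable without real waiting.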

### Custom API Endpoints

Support for Claude-compatible APIs (e.g., GLM-4.7):

```bash
export ANTHROPIC_API_KEY=your-glm-47-api-key
export ANTHROPIC_BASE_URL=https://glm-4-7-endpoint.com/v1
```
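
When code constructs the Anthropic client explicitly, the override can be passed through as constructor arguments (`anthropic.Anthropic(api_key=..., base_url=...)`). The helper below is a hypothetical sketch that only builds those keyword arguments, so it runs without the `anthropic` package installed.

```python
import os
from collections.abc import Mapping


def anthropic_client_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build constructor kwargs for an Anthropic-compatible client.

    ANTHROPIC_BASE_URL is optional; when unset, base_url is omitted and the
    client's default endpoint applies.
    """
    kwargs = {"api_key": env["ANTHROPIC_API_KEY"]}
    base_url = env.get("ANTHROPIC_BASE_URL")
    if base_url:
        kwargs["base_url"] = base_url
    return kwargs


# Usage (requires the `anthropic` package):
#   import anthropic
#   client = anthropic.Anthropic(**anthropic_client_kwargs())
```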

---

## Common Development Tasks

### Adding a New CLI Command

1. Create module in `src/skill_seekers/cli/my_command.py`
2. Implement `main()` function with argument parsing
3. Add entry point in `pyproject.toml`:
   ```toml
   [project.scripts]
   skill-seekers-my-command = "skill_seekers.cli.my_command:main"
   ```
4. Add subcommand handler in `src/skill_seekers/cli/main.py`
5. Add tests in `tests/test_my_command.py`

### Adding a New Platform Adaptor

1. Create `src/skill_seekers/cli/adaptors/my_platform.py`
2. Inherit from `SkillAdaptor` base class
3. Implement required methods: `package()`, `upload()`, `enhance()`
4. Register in `src/skill_seekers/cli/adaptors/__init__.py`
5. Add optional dependencies in `pyproject.toml`
6. Add tests in `tests/test_adaptors/`
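
The contract in step 3 can be sketched with an abstract base class. The method names come from the list above; the parameter lists, return types, and the `ADAPTORS` registry dict are assumptions, and this `SkillAdaptor` is a stand-in for the real class in `adaptors/base.py`.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class SkillAdaptor(ABC):
    """Hypothetical stand-in for the real base class in adaptors/base.py."""

    @abstractmethod
    def package(self, skill_dir: Path, output_path: Path) -> Path:
        """Create the platform-specific archive and return its path."""

    @abstractmethod
    def upload(self, package_path: Path, api_key: str) -> dict:
        """Upload a package and return details of the platform's response."""

    @abstractmethod
    def enhance(self, skill_dir: Path, api_key: str) -> bool:
        """Run AI enhancement on the skill; return True on success."""


class MyPlatformAdaptor(SkillAdaptor):
    def package(self, skill_dir: Path, output_path: Path) -> Path:
        # Zip the skill directory's contents into <output>/<name>-my-platform.zip.
        base_name = output_path / f"{skill_dir.name}-my-platform"
        return Path(shutil.make_archive(str(base_name), "zip", root_dir=skill_dir))

    def upload(self, package_path: Path, api_key: str) -> dict:
        return {"success": False, "message": "Upload not supported yet"}

    def enhance(self, skill_dir: Path, api_key: str) -> bool:
        return False


# Step 4, sketched as a plain dict registration.
ADAPTORS = {"my-platform": MyPlatformAdaptor}
```

Declaring the methods abstract means a subclass that forgets one fails at instantiation with `TypeError`, rather than later when the missing method is called.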

### Adding an MCP Tool

1. Implement tool logic in `src/skill_seekers/mcp/tools/category_tools.py`
2. Register in `src/skill_seekers/mcp/server_fastmcp.py`
3. Add test in `tests/test_mcp_fastmcp.py`
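
Keeping the tool logic in a plain function makes it testable without starting the server. Below is a sketch of a hypothetical config-listing tool; the function names and server name are assumptions. Registration with FastMCP's `@mcp.tool()` decorator is shown as a comment because it requires the `mcp` package.

```python
import asyncio
from pathlib import Path


async def list_configs_impl(config_dir: str = "configs") -> str:
    """Hypothetical tool logic: list preset configs by file stem."""
    names = sorted(p.stem for p in Path(config_dir).glob("*.json"))
    if not names:
        return f"No configs found in {config_dir}"
    return "Available configs:\n" + "\n".join(f"- {n}" for n in names)


# Registration in server_fastmcp.py would look roughly like:
#   from mcp.server.fastmcp import FastMCP
#   mcp = FastMCP("skill-seekers")
#
#   @mcp.tool()
#   async def list_configs(config_dir: str = "configs") -> str:
#       """List available preset configs."""
#       return await list_configs_impl(config_dir)

if __name__ == "__main__":
    print(asyncio.run(list_configs_impl()))
```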

---

## Documentation

### Project Documentation

- **README.md** - Main project documentation
- **README.zh-CN.md** - Chinese translation
- **CLAUDE.md** - Detailed implementation guidance
- **QUICKSTART.md** - Quick start guide
- **CONTRIBUTING.md** - Contribution guidelines
- **docs/** - Comprehensive documentation (54 files)

### Configuration Documentation

Preset configs are in `configs/` directory:
- `godot.json` - Godot Engine
- `react.json` - React
- `vue.json` - Vue.js
- `fastapi.json` - FastAPI
- `*_unified.json` - Multi-source configs

---

## Troubleshooting

### Common Issues

**ImportError: No module named 'skill_seekers'**
- Solution: Run `pip install -e .`

**Tests failing with "package not installed"**
- Solution: Ensure you ran `pip install -e .` in the correct virtual environment

**MCP server import errors**
- Solution: Install with `pip install -e ".[mcp]"`

**Type checking failures**
- MyPy is configured to be lenient (gradual typing)
- Focus on critical paths, not full coverage

### Getting Help

- Check **TROUBLESHOOTING.md** for detailed solutions
- Review **docs/FAQ.md** for common questions
- Visit https://skillseekersweb.com/ for documentation
- Open an issue on GitHub with:
  - Clear title and description
  - Steps to reproduce
  - Expected vs actual behavior
  - Environment details (OS, Python version)
  - Error messages and stack traces

---

## Key Dependencies

### Core Dependencies
- `requests>=2.32.5` - HTTP requests
- `beautifulsoup4>=4.14.2` - HTML parsing
- `PyGithub>=2.5.0` - GitHub API
- `GitPython>=3.1.40` - Git operations
- `httpx>=0.28.1` - Async HTTP
- `anthropic>=0.76.0` - Claude AI API
- `PyMuPDF>=1.24.14` - PDF processing
- `pydantic>=2.12.3` - Data validation
- `click>=8.3.0` - CLI framework

### Optional Dependencies
- `mcp>=1.25` - MCP server
- `google-generativeai>=0.8.0` - Gemini support
- `openai>=1.0.0` - OpenAI support

### Dev Dependencies
- `pytest>=8.4.2` - Testing framework
- `pytest-asyncio>=0.24.0` - Async test support
- `pytest-cov>=7.0.0` - Coverage
- `ruff>=0.14.13` - Linting/formatting
- `mypy>=1.19.1` - Type checking

---

*This document is maintained for AI coding agents. For human contributors, see README.md and CONTRIBUTING.md.*