docs: complete documentation overhaul with v3.1.0 release notes and zh-CN translations
Documentation restructure:
- New docs/getting-started/ guide (4 files: install, quick-start, first-skill, next-steps)
- New docs/user-guide/ section (6 files: core concepts through troubleshooting)
- New docs/reference/ section (CLI_REFERENCE, CONFIG_FORMAT, ENVIRONMENT_VARIABLES, MCP_REFERENCE)
- New docs/advanced/ section (custom-workflows, mcp-server, multi-source)
- New docs/ARCHITECTURE.md - system architecture overview
- Archived legacy files (QUICKSTART.md, QUICK_REFERENCE.md, docs/guides/USAGE.md) to docs/archive/legacy/

Chinese (zh-CN) translations:
- Full zh-CN mirror of all user-facing docs (getting-started, user-guide, reference, advanced)
- GitHub Actions workflow for translation sync (.github/workflows/translate-docs.yml)
- Translation sync checker script (scripts/check_translation_sync.sh)
- Translation helper script (scripts/translate_doc.py)

Content updates:
- CHANGELOG.md: [Unreleased] → [3.1.0] - 2026-02-22
- README.md: updated with new doc structure links
- AGENTS.md: updated agent documentation
- docs/features/UNIFIED_SCRAPING.md: updated for unified scraper workflow JSON config

Analysis/planning artifacts (kept for reference):
- DOCUMENTATION_OVERHAUL_PLAN.md, DOCUMENTATION_OVERHAUL_SUMMARY.md
- FEATURE_GAP_ANALYSIS.md, IMPLEMENTATION_GAPS_ANALYSIS.md, CREATE_COMMAND_COVERAGE_ANALYSIS.md
- CHINESE_TRANSLATION_IMPLEMENTATION_SUMMARY.md, ISSUE_260_UPDATE.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1206
docs/reference/CLI_REFERENCE.md
Normal file
File diff suppressed because it is too large
610
docs/reference/CONFIG_FORMAT.md
Normal file
@@ -0,0 +1,610 @@
|
||||
# Config Format Reference - Skill Seekers
|
||||
|
||||
> **Version:** 3.1.0
|
||||
> **Last Updated:** 2026-02-16
|
||||
> **Complete JSON configuration specification**
|
||||
|
||||
---
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Overview](#overview)
- [Single-Source Config](#single-source-config)
  - [Documentation Source](#documentation-source)
  - [GitHub Source](#github-source)
  - [PDF Source](#pdf-source)
  - [Local Source](#local-source)
- [Unified (Multi-Source) Config](#unified-multi-source-config)
- [Common Fields](#common-fields)
- [Selectors](#selectors)
- [Categories](#categories)
- [URL Patterns](#url-patterns)
- [Examples](#examples)
- [Validation](#validation)
- [See Also](#see-also)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Skill Seekers uses JSON configuration files to define scraping targets. There are two types:
|
||||
|
||||
| Type | Use Case | File Name Pattern |
|------|----------|-------------------|
| **Single-Source** | One source (docs, GitHub, PDF, or local) | `*.json` |
| **Unified** | Multiple sources combined | `*-unified.json` |
|
||||
|
||||
---
|
||||
|
||||
## Single-Source Config
|
||||
|
||||
### Documentation Source
|
||||
|
||||
For scraping documentation websites.
|
||||
|
||||
```json
{
  "name": "react",
  "base_url": "https://react.dev/",
  "description": "React - JavaScript library for building UIs",

  "start_urls": [
    "https://react.dev/learn",
    "https://react.dev/reference/react"
  ],

  "selectors": {
    "main_content": "article",
    "title": "h1",
    "code_blocks": "pre code"
  },

  "url_patterns": {
    "include": ["/learn/", "/reference/"],
    "exclude": ["/blog/", "/community/"]
  },

  "categories": {
    "getting_started": ["learn", "tutorial", "intro"],
    "api": ["reference", "api", "hooks"]
  },

  "rate_limit": 0.5,
  "max_pages": 300,
  "merge_mode": "claude-enhanced"
}
```
|
||||
|
||||
#### Documentation Fields
|
||||
|
||||
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `name` | string | Yes | - | Skill name (alphanumeric, dashes, underscores) |
| `base_url` | string | Yes | - | Base documentation URL |
| `description` | string | No | `""` | Skill description for SKILL.md |
| `start_urls` | array | No | `[base_url]` | URLs to start crawling from |
| `selectors` | object | No | see below | CSS selectors for content extraction |
| `url_patterns` | object | No | `{}` | Include/exclude URL patterns |
| `categories` | object | No | `{}` | Content categorization rules |
| `rate_limit` | number | No | `0.5` | Seconds between requests |
| `max_pages` | number | No | `500` | Maximum pages to scrape |
| `merge_mode` | string | No | `"claude-enhanced"` | Merge strategy |
| `extract_api` | boolean | No | `false` | Extract API references |
| `llms_txt_url` | string | No | auto | Path to llms.txt file |
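
As a rough illustration of how the defaults in the table above could be applied, the following Python sketch fills in missing optional fields and checks the two required ones. The names `normalize_docs_config` and `DOC_DEFAULTS` are hypothetical and not part of Skill Seekers; selector defaults and `llms_txt_url` auto-detection are deliberately left out.

```python
import copy
import re

# Allowed characters for `name`, per the table: alphanumeric, dashes, underscores.
NAME_RE = re.compile(r"^[A-Za-z0-9_-]+$")

# Defaults copied from the Documentation Fields table (hypothetical constant).
DOC_DEFAULTS = {
    "description": "",
    "url_patterns": {},
    "categories": {},
    "rate_limit": 0.5,
    "max_pages": 500,
    "merge_mode": "claude-enhanced",
    "extract_api": False,
}


def normalize_docs_config(raw: dict) -> dict:
    """Return a copy of `raw` with table defaults applied (illustrative only)."""
    for field in ("name", "base_url"):
        if not raw.get(field):
            raise ValueError(f"missing required field: {field}")
    if not NAME_RE.match(raw["name"]):
        raise ValueError(f"invalid skill name: {raw['name']!r}")

    # Values from the user's config win over the defaults.
    config = {**copy.deepcopy(DOC_DEFAULTS), **raw}
    # `start_urls` defaults to a list containing only `base_url`.
    config.setdefault("start_urls", [raw["base_url"]])
    return config
```

Calling `normalize_docs_config({"name": "react", "base_url": "https://react.dev/"})` under these assumptions yields a config whose `start_urls` contains only the base URL and whose `max_pages` is 500.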
|
||||
|
||||
---
|
||||
|
||||
### GitHub Source
|
||||
|
||||
For analyzing GitHub repositories.
|
||||
|
||||
```json
{
  "name": "react-github",
  "type": "github",
  "repo": "facebook/react",
  "description": "React GitHub repository analysis",

  "enable_codebase_analysis": true,
  "code_analysis_depth": "deep",

  "fetch_issues": true,
  "max_issues": 100,
  "issue_labels": ["bug", "enhancement"],

  "fetch_releases": true,
  "max_releases": 20,

  "fetch_changelog": true,
  "analyze_commit_history": true,

  "file_patterns": ["*.js", "*.ts", "*.tsx"],
  "exclude_patterns": ["*.test.js", "node_modules/**"],

  "rate_limit": 1.0
}
```
|
||||
|
||||
#### GitHub Fields
|
||||
|
||||
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `name` | string | Yes | - | Skill name |
| `type` | string | Yes | - | Must be `"github"` |
| `repo` | string | Yes | - | Repository in `owner/repo` format |
| `description` | string | No | `""` | Skill description |
| `enable_codebase_analysis` | boolean | No | `true` | Analyze source code |
| `code_analysis_depth` | string | No | `"standard"` | `surface`, `standard`, `deep` |
| `fetch_issues` | boolean | No | `true` | Fetch GitHub issues |
| `max_issues` | number | No | `100` | Maximum issues to fetch |
| `issue_labels` | array | No | `[]` | Filter by labels |
| `fetch_releases` | boolean | No | `true` | Fetch releases |
| `max_releases` | number | No | `20` | Maximum releases |
| `fetch_changelog` | boolean | No | `true` | Extract CHANGELOG |
| `analyze_commit_history` | boolean | No | `false` | Analyze commits |
| `file_patterns` | array | No | `[]` | Include file patterns |
| `exclude_patterns` | array | No | `[]` | Exclude file patterns |
|
||||
|
||||
---
|
||||
|
||||
### PDF Source
|
||||
|
||||
For extracting content from PDF files.
|
||||
|
||||
```json
{
  "name": "product-manual",
  "type": "pdf",
  "pdf_path": "docs/manual.pdf",
  "description": "Product documentation manual",

  "enable_ocr": false,
  "password": "",

  "extract_images": true,
  "image_output_dir": "output/images/",

  "extract_tables": true,
  "table_format": "markdown",

  "page_range": [1, 100],
  "split_by_chapters": true,

  "chunk_size": 1000,
  "chunk_overlap": 100
}
```
|
||||
|
||||
#### PDF Fields
|
||||
|
||||
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `name` | string | Yes | - | Skill name |
| `type` | string | Yes | - | Must be `"pdf"` |
| `pdf_path` | string | Yes | - | Path to PDF file |
| `description` | string | No | `""` | Skill description |
| `enable_ocr` | boolean | No | `false` | OCR for scanned PDFs |
| `password` | string | No | `""` | PDF password if encrypted |
| `extract_images` | boolean | No | `false` | Extract embedded images |
| `image_output_dir` | string | No | auto | Directory for images |
| `extract_tables` | boolean | No | `false` | Extract tables |
| `table_format` | string | No | `"markdown"` | `markdown`, `json`, `csv` |
| `page_range` | array | No | all | `[start, end]` page range |
| `split_by_chapters` | boolean | No | `false` | Split by detected chapters |
| `chunk_size` | number | No | `1000` | Characters per chunk |
| `chunk_overlap` | number | No | `100` | Overlap between chunks |
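
To make the relationship between `chunk_size` and `chunk_overlap` concrete, here is a minimal Python sketch of character-based chunking with overlap. It is only one plausible reading of the two fields; the function name `chunk_text` is hypothetical, and the actual Skill Seekers chunker may split on different boundaries.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split `text` into chunks of up to `chunk_size` characters.

    Consecutive chunks share `chunk_overlap` characters (illustrative sketch).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks
```

With the default values, each new chunk starts 900 characters after the previous one, so the last 100 characters of one chunk repeat at the start of the next.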
|
||||
|
||||
---
|
||||
|
||||
### Local Source
|
||||
|
||||
For analyzing local codebases.
|
||||
|
||||
```json
{
  "name": "my-project",
  "type": "local",
  "directory": "./my-project",
  "description": "Local project analysis",

  "languages": ["Python", "JavaScript"],
  "file_patterns": ["*.py", "*.js"],
  "exclude_patterns": ["*.pyc", "node_modules/**", ".git/**"],

  "analysis_depth": "comprehensive",

  "extract_api": true,
  "extract_patterns": true,
  "extract_test_examples": true,
  "extract_how_to_guides": true,
  "extract_config_patterns": true,

  "include_comments": true,
  "include_docstrings": true,
  "include_readme": true
}
```
|
||||
|
||||
#### Local Fields
|
||||
|
||||
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `name` | string | Yes | - | Skill name |
| `type` | string | Yes | - | Must be `"local"` |
| `directory` | string | Yes | - | Path to directory |
| `description` | string | No | `""` | Skill description |
| `languages` | array | No | auto | Languages to analyze |
| `file_patterns` | array | No | all | Include patterns |
| `exclude_patterns` | array | No | common | Exclude patterns |
| `analysis_depth` | string | No | `"standard"` | `quick`, `standard`, `comprehensive` |
| `extract_api` | boolean | No | `true` | Extract API documentation |
| `extract_patterns` | boolean | No | `true` | Detect patterns |
| `extract_test_examples` | boolean | No | `true` | Extract test examples |
| `extract_how_to_guides` | boolean | No | `true` | Generate guides |
| `extract_config_patterns` | boolean | No | `true` | Extract config patterns |
| `include_comments` | boolean | No | `true` | Include code comments |
| `include_docstrings` | boolean | No | `true` | Include docstrings |
| `include_readme` | boolean | No | `true` | Include README |
|
||||
|
||||
---
|
||||
|
||||
## Unified (Multi-Source) Config
|
||||
|
||||
Combine multiple sources into one skill with conflict detection.
|
||||
|
||||
```json
{
  "name": "react-complete",
  "description": "React docs + GitHub + examples",
  "merge_mode": "claude-enhanced",

  "sources": [
    {
      "type": "docs",
      "name": "react-docs",
      "base_url": "https://react.dev/",
      "max_pages": 200,
      "categories": {
        "getting_started": ["learn"],
        "api": ["reference"]
      }
    },
    {
      "type": "github",
      "name": "react-github",
      "repo": "facebook/react",
      "fetch_issues": true,
      "max_issues": 50
    },
    {
      "type": "pdf",
      "name": "react-cheatsheet",
      "pdf_path": "docs/react-cheatsheet.pdf"
    },
    {
      "type": "local",
      "name": "react-examples",
      "directory": "./react-examples"
    }
  ],

  "conflict_detection": {
    "enabled": true,
    "rules": [
      {
        "field": "api_signature",
        "action": "flag_mismatch"
      }
    ]
  },

  "output_structure": {
    "group_by_source": false,
    "cross_reference": true
  }
}
```
|
||||
|
||||
#### Unified Fields
|
||||
|
||||
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `name` | string | Yes | - | Combined skill name |
| `description` | string | No | `""` | Skill description |
| `merge_mode` | string | No | `"claude-enhanced"` | `rule-based`, `claude-enhanced` |
| `sources` | array | Yes | - | List of source configs |
| `conflict_detection` | object | No | `{}` | Conflict detection settings |
| `output_structure` | object | No | `{}` | Output organization |
| `workflows` | array | No | `[]` | Workflow presets to apply |
| `workflow_stages` | array | No | `[]` | Inline enhancement stages |
| `workflow_vars` | object | No | `{}` | Workflow variable overrides |
| `workflow_dry_run` | boolean | No | `false` | Preview workflows without executing |
|
||||
|
||||
#### Workflow Configuration (Unified)
|
||||
|
||||
Unified configs support defining enhancement workflows at the top level:
|
||||
|
||||
```json
{
  "name": "react-complete",
  "description": "React docs + GitHub with security enhancement",
  "merge_mode": "claude-enhanced",

  "workflows": ["security-focus", "api-documentation"],
  "workflow_stages": [
    {
      "name": "cleanup",
      "prompt": "Remove boilerplate sections and standardize formatting"
    }
  ],
  "workflow_vars": {
    "focus_area": "performance",
    "detail_level": "comprehensive"
  },

  "sources": [
    {"type": "docs", "base_url": "https://react.dev/"},
    {"type": "github", "repo": "facebook/react"}
  ]
}
```
|
||||
|
||||
**Workflow Fields:**
|
||||
|
||||
| Field | Type | Description |
|-------|------|-------------|
| `workflows` | array | List of workflow preset names to apply |
| `workflow_stages` | array | Inline stages with `name` and `prompt` |
| `workflow_vars` | object | Key-value pairs for workflow variables |
| `workflow_dry_run` | boolean | Preview workflows without executing |
|
||||
|
||||
**Note:** CLI flags override config values (CLI takes precedence).
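
The precedence rule in the note above can be sketched in a few lines of Python. This is an assumption about how such a lookup might be written, not the project's actual code; `resolve_setting` and its arguments are hypothetical names.

```python
def resolve_setting(key, cli_args: dict, config: dict, default=None):
    """Pick a value for `key`: CLI flag first, then config file, then default."""
    # A CLI value of None is treated as "flag not given" (assumption).
    if cli_args.get(key) is not None:
        return cli_args[key]
    if key in config:
        return config[key]
    return default
```

For example, if the config sets `"workflow_dry_run": false` but the flag is passed on the command line, `resolve_setting("workflow_dry_run", {"workflow_dry_run": True}, config)` returns `True`.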
|
||||
|
||||
#### Source Types in Unified Config
|
||||
|
||||
Each source in the `sources` array can be:
|
||||
|
||||
| Type | Required Fields |
|------|-----------------|
| `docs` | `base_url` |
| `github` | `repo` |
| `pdf` | `pdf_path` |
| `local` | `directory` |
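
The required-field table above lends itself to a simple pre-flight check. The sketch below is illustrative only: `check_sources` and `REQUIRED_BY_TYPE` are hypothetical names, and the real validator may check much more than this.

```python
# Required field per source type, copied from the table above.
REQUIRED_BY_TYPE = {
    "docs": "base_url",
    "github": "repo",
    "pdf": "pdf_path",
    "local": "directory",
}


def check_sources(config: dict) -> list[str]:
    """Return a list of human-readable problems found in `config["sources"]`."""
    sources = config.get("sources")
    if not isinstance(sources, list) or not sources:
        return ["'sources' must be a non-empty array"]

    problems = []
    for index, source in enumerate(sources):
        source_type = source.get("type")
        required = REQUIRED_BY_TYPE.get(source_type)
        if required is None:
            problems.append(f"sources[{index}]: unknown type {source_type!r}")
        elif not source.get(required):
            problems.append(
                f"sources[{index}]: type {source_type!r} requires '{required}'"
            )
    return problems
```

An empty result means every source names a known type and carries its required field.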
|
||||
|
||||
---
|
||||
|
||||
## Common Fields
|
||||
|
||||
Fields available in all config types:
|
||||
|
||||
| Field | Type | Description |
|-------|------|-------------|
| `name` | string | Skill identifier (letters, numbers, dashes, underscores) |
| `description` | string | Human-readable description |
| `rate_limit` | number | Delay between requests in seconds |
| `output_dir` | string | Custom output directory |
| `skip_scrape` | boolean | Skip scraping and use existing data |
| `enhance_level` | number | 0 = off, 1 = SKILL.md, 2 = SKILL.md + config, 3 = full |
|
||||
|
||||
---
|
||||
|
||||
## Selectors
|
||||
|
||||
CSS selectors for content extraction from HTML:
|
||||
|
||||
```json
{
  "selectors": {
    "main_content": "article",
    "title": "h1",
    "code_blocks": "pre code",
    "navigation": "nav.sidebar",
    "breadcrumbs": "nav[aria-label='breadcrumb']",
    "next_page": "a[rel='next']",
    "prev_page": "a[rel='prev']"
  }
}
```
|
||||
|
||||
### Default Selectors
|
||||
|
||||
If not specified, these defaults are used:
|
||||
|
||||
| Element | Default Selector |
|---------|------------------|
| `main_content` | `article, main, .content, #content, [role='main']` |
| `title` | `h1, .page-title, title` |
| `code_blocks` | `pre code, code[class*="language-"]` |
| `navigation` | `nav, .sidebar, .toc` |
|
||||
|
||||
---
|
||||
|
||||
## Categories
|
||||
|
||||
Map URL patterns to content categories:
|
||||
|
||||
```json
{
  "categories": {
    "getting_started": [
      "intro", "tutorial", "quickstart",
      "installation", "getting-started"
    ],
    "core_concepts": [
      "concept", "fundamental", "architecture",
      "principle", "overview"
    ],
    "api_reference": [
      "reference", "api", "method", "function",
      "class", "interface", "type"
    ],
    "guides": [
      "guide", "how-to", "example", "recipe",
      "pattern", "best-practice"
    ],
    "advanced": [
      "advanced", "expert", "performance",
      "optimization", "internals"
    ]
  }
}
```
|
||||
|
||||
Categories appear as sections in the generated SKILL.md.
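
One way to picture the mapping is as a keyword lookup against the page URL. The sketch below assumes case-insensitive substring matching on the URL path, with the first matching category winning and a fallback of `"other"`; these matching rules and the name `categorize` are assumptions for illustration, not documented behavior.

```python
from urllib.parse import urlsplit


def categorize(url: str, categories: dict, fallback: str = "other") -> str:
    """Return the first category whose keywords appear in the URL path."""
    path = urlsplit(url).path.lower()
    # Dicts preserve insertion order, so categories are tried in config order.
    for category, keywords in categories.items():
        if any(keyword.lower() in path for keyword in keywords):
            return category
    return fallback
```

Because the first match wins in this sketch, the order of categories in the config matters when keywords overlap.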
|
||||
|
||||
---
|
||||
|
||||
## URL Patterns
|
||||
|
||||
Control which URLs are included or excluded:
|
||||
|
||||
```json
{
  "url_patterns": {
    "include": [
      "/docs/",
      "/guide/",
      "/api/",
      "/reference/"
    ],
    "exclude": [
      "/blog/",
      "/news/",
      "/community/",
      "/search",
      "?print=1",
      "/_static/",
      "/_images/"
    ]
  }
}
```
|
||||
|
||||
### Pattern Rules
|
||||
|
||||
- Patterns are matched against the URL path.
- Use `*` as a wildcard: `/api/v*/`
- Use `**` for recursive matching: `/docs/**/*.html`
- Exclude patterns take precedence over include patterns.
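
The rules above can be sketched as a small filter. The exact matching semantics are not specified here, so this example makes several assumptions: `*` matches within one path segment, `**` matches across segments, a pattern may match anywhere in the path, and an empty include list allows everything. `pattern_to_regex` and `is_allowed` are hypothetical helper names.

```python
import re
from urllib.parse import urlsplit


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a URL pattern with `*` / `**` wildcards into a regex."""
    pieces = []
    # Split while keeping the wildcards; `**` is tried before `*`.
    for part in re.split(r"(\*\*|\*)", pattern):
        if part == "**":
            pieces.append(".*")      # any characters, including "/"
        elif part == "*":
            pieces.append("[^/]*")   # any characters within one segment
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))


def is_allowed(url: str, include: list[str], exclude: list[str]) -> bool:
    """Apply include/exclude patterns to the URL path; exclude wins."""
    path = urlsplit(url).path
    if any(pattern_to_regex(p).search(path) for p in exclude):
        return False
    if not include:
        return True
    return any(pattern_to_regex(p).search(path) for p in include)
```

Note that this sketch looks only at the path, so a query-string pattern such as `?print=1` would need the query component to be considered as well.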
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### React Documentation
|
||||
|
||||
```json
{
  "name": "react",
  "base_url": "https://react.dev/",
  "description": "React - JavaScript library for building UIs",
  "start_urls": [
    "https://react.dev/learn",
    "https://react.dev/reference/react",
    "https://react.dev/reference/react-dom"
  ],
  "selectors": {
    "main_content": "article",
    "title": "h1",
    "code_blocks": "pre code"
  },
  "url_patterns": {
    "include": ["/learn/", "/reference/", "/blog/"],
    "exclude": ["/community/", "/search"]
  },
  "categories": {
    "getting_started": ["learn", "tutorial"],
    "api": ["reference", "api"],
    "blog": ["blog"]
  },
  "rate_limit": 0.5,
  "max_pages": 300
}
```
|
||||
|
||||
### Django GitHub
|
||||
|
||||
```json
{
  "name": "django-github",
  "type": "github",
  "repo": "django/django",
  "description": "Django web framework source code",
  "enable_codebase_analysis": true,
  "code_analysis_depth": "deep",
  "fetch_issues": true,
  "max_issues": 100,
  "fetch_releases": true,
  "file_patterns": ["*.py"],
  "exclude_patterns": ["tests/**", "docs/**"]
}
```
|
||||
|
||||
### Unified Multi-Source
|
||||
|
||||
```json
{
  "name": "godot-complete",
  "description": "Godot Engine - docs, source, and manual",
  "merge_mode": "claude-enhanced",
  "sources": [
    {
      "type": "docs",
      "name": "godot-docs",
      "base_url": "https://docs.godotengine.org/en/stable/",
      "max_pages": 500
    },
    {
      "type": "github",
      "name": "godot-source",
      "repo": "godotengine/godot",
      "fetch_issues": false
    },
    {
      "type": "pdf",
      "name": "godot-manual",
      "pdf_path": "docs/godot-manual.pdf"
    }
  ]
}
```
|
||||
|
||||
### Local Project
|
||||
|
||||
```json
{
  "name": "my-api",
  "type": "local",
  "directory": "./my-api-project",
  "description": "My REST API implementation",
  "languages": ["Python"],
  "file_patterns": ["*.py"],
  "exclude_patterns": ["tests/**", "migrations/**"],
  "analysis_depth": "comprehensive",
  "extract_api": true,
  "extract_test_examples": true
}
```
|
||||
|
||||
---
|
||||
|
||||
## Validation
|
||||
|
||||
Validate your config before scraping:
|
||||
|
||||
```bash
# Using the CLI
skill-seekers scrape --config my-config.json --dry-run

# Using the MCP tool (invoked from an MCP client, not from the shell):
# validate_config({"config": "my-config.json"})
```
|
||||
|
||||
---
|
||||
|
||||
## See Also
|
||||
|
||||
- [CLI Reference](CLI_REFERENCE.md) - Command reference
|
||||
- [Environment Variables](ENVIRONMENT_VARIABLES.md) - Configuration environment
|
||||
|
||||
---
|
||||
|
||||
*For more examples, see `configs/` directory in the repository*
|
||||
738
docs/reference/ENVIRONMENT_VARIABLES.md
Normal file
@@ -0,0 +1,738 @@
|
||||
# Environment Variables Reference - Skill Seekers
|
||||
|
||||
> **Version:** 3.1.0
|
||||
> **Last Updated:** 2026-02-16
|
||||
> **Complete environment variable reference**
|
||||
|
||||
---
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Overview](#overview)
- [API Keys](#api-keys)
- [Platform Configuration](#platform-configuration)
- [Paths and Directories](#paths-and-directories)
- [Scraping Behavior](#scraping-behavior)
- [Enhancement Settings](#enhancement-settings)
- [GitHub Configuration](#github-configuration)
- [Vector Database Settings](#vector-database-settings)
- [Debug and Development](#debug-and-development)
- [MCP Server Settings](#mcp-server-settings)
- [Examples](#examples)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Skill Seekers uses environment variables for:
- API authentication (Claude, Gemini, OpenAI, GitHub)
- Configuration paths
- Output directories
- Behavior customization
- Debug settings
|
||||
|
||||
Variables are read at runtime and override default settings.
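
As a hedged illustration of "read at runtime, override defaults", the Python sketch below reads three of the scraping variables documented later in this reference and falls back to their stated defaults. The helpers `env_number` and `scraping_defaults` are hypothetical; how Skill Seekers actually parses these values is not shown here.

```python
import os


def env_number(name, default, cast=float, environ=None):
    """Read a numeric environment variable, falling back to `default`."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default  # unset or empty: keep the documented default
    try:
        return cast(raw)
    except ValueError:
        raise ValueError(f"{name} must be a {cast.__name__}, got {raw!r}") from None


def scraping_defaults(environ=None) -> dict:
    """Collect scraping settings; defaults mirror the values in this reference."""
    return {
        "rate_limit": env_number("SKILL_SEEKERS_RATE_LIMIT", 0.5, float, environ),
        "max_pages": env_number("SKILL_SEEKERS_MAX_PAGES", 500, int, environ),
        "timeout": env_number("SKILL_SEEKERS_TIMEOUT", 30, int, environ),
    }
```

Passing a plain dict as `environ` makes the lookup easy to exercise without touching the real process environment.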
|
||||
|
||||
---
|
||||
|
||||
## API Keys
|
||||
|
||||
### ANTHROPIC_API_KEY
|
||||
|
||||
**Purpose:** Claude AI API access for enhancement and upload.
|
||||
|
||||
**Format:** `sk-ant-api03-...`
|
||||
|
||||
**Used by:**
|
||||
- `skill-seekers enhance` (API mode)
|
||||
- `skill-seekers upload` (Claude target)
|
||||
- AI enhancement features
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export ANTHROPIC_API_KEY=sk-ant-api03-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
||||
```
|
||||
|
||||
**Alternative:** Use `--api-key` flag per command.
|
||||
|
||||
---
|
||||
|
||||
### GOOGLE_API_KEY
|
||||
|
||||
**Purpose:** Google Gemini API access for upload.
|
||||
|
||||
**Format:** `AIza...`
|
||||
|
||||
**Used by:**
|
||||
- `skill-seekers upload` (Gemini target)
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export GOOGLE_API_KEY=AIzaSyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### OPENAI_API_KEY
|
||||
|
||||
**Purpose:** OpenAI API access for upload and embeddings.
|
||||
|
||||
**Format:** `sk-...`
|
||||
|
||||
**Used by:**
|
||||
- `skill-seekers upload` (OpenAI target)
|
||||
- Embedding generation for vector DBs
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### GITHUB_TOKEN
|
||||
|
||||
**Purpose:** GitHub API authentication for higher rate limits.
|
||||
|
||||
**Format:** `ghp_...` (classic personal access token) or `github_pat_...` (fine-grained personal access token)
|
||||
|
||||
**Used by:**
|
||||
- `skill-seekers github`
|
||||
- `skill-seekers unified` (GitHub sources)
|
||||
- `skill-seekers analyze` (GitHub repos)
|
||||
|
||||
**Benefits:**
- 5,000 requests/hour, versus 60/hour for unauthenticated requests
- Access to private repositories
- Higher GraphQL API limits
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
||||
```
|
||||
|
||||
**Create token:** https://github.com/settings/tokens
|
||||
|
||||
---
|
||||
|
||||
## Platform Configuration
|
||||
|
||||
### ANTHROPIC_BASE_URL
|
||||
|
||||
**Purpose:** Custom Claude API endpoint.
|
||||
|
||||
**Default:** `https://api.anthropic.com`
|
||||
|
||||
**Use case:** Proxy servers, enterprise deployments, regional endpoints.
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export ANTHROPIC_BASE_URL=https://custom-api.example.com
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Paths and Directories
|
||||
|
||||
### SKILL_SEEKERS_HOME
|
||||
|
||||
**Purpose:** Base directory for Skill Seekers data.
|
||||
|
||||
**Default:**
|
||||
- Linux/macOS: `~/.config/skill-seekers/`
|
||||
- Windows: `%APPDATA%\skill-seekers\`
|
||||
|
||||
**Used for:**
|
||||
- Configuration files
|
||||
- Workflow presets
|
||||
- Cache data
|
||||
- Checkpoints
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export SKILL_SEEKERS_HOME=/opt/skill-seekers
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### SKILL_SEEKERS_OUTPUT
|
||||
|
||||
**Purpose:** Default output directory for skills.
|
||||
|
||||
**Default:** `./output/`
|
||||
|
||||
**Used by:**
|
||||
- All scraping commands
|
||||
- Package output
|
||||
- Skill generation
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export SKILL_SEEKERS_OUTPUT=/var/skills/output
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### SKILL_SEEKERS_CONFIG_DIR
|
||||
|
||||
**Purpose:** Directory containing preset configs.
|
||||
|
||||
**Default:** `configs/` (relative to working directory)
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export SKILL_SEEKERS_CONFIG_DIR=/etc/skill-seekers/configs
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Scraping Behavior
|
||||
|
||||
### SKILL_SEEKERS_RATE_LIMIT
|
||||
|
||||
**Purpose:** Default rate limit for HTTP requests.
|
||||
|
||||
**Default:** `0.5` (seconds)
|
||||
|
||||
**Unit:** Seconds between requests
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
# More aggressive (faster)
|
||||
export SKILL_SEEKERS_RATE_LIMIT=0.2
|
||||
|
||||
# More conservative (slower)
|
||||
export SKILL_SEEKERS_RATE_LIMIT=1.0
|
||||
```
|
||||
|
||||
**Override:** Use `--rate-limit` flag per command.
|
||||
|
||||
---
|
||||
|
||||
### SKILL_SEEKERS_MAX_PAGES
|
||||
|
||||
**Purpose:** Default maximum pages to scrape.
|
||||
|
||||
**Default:** `500`
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export SKILL_SEEKERS_MAX_PAGES=1000
|
||||
```
|
||||
|
||||
**Override:** Use `--max-pages` flag or config file.
|
||||
|
||||
---
|
||||
|
||||
### SKILL_SEEKERS_WORKERS
|
||||
|
||||
**Purpose:** Default number of parallel workers.
|
||||
|
||||
**Default:** `1`
|
||||
|
||||
**Maximum:** `10`
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export SKILL_SEEKERS_WORKERS=4
|
||||
```
|
||||
|
||||
**Override:** Use `--workers` flag.
|
||||
|
||||
---
|
||||
|
||||
### SKILL_SEEKERS_TIMEOUT
|
||||
|
||||
**Purpose:** HTTP request timeout.
|
||||
|
||||
**Default:** `30` (seconds)
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
# For slow servers
|
||||
export SKILL_SEEKERS_TIMEOUT=60
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### SKILL_SEEKERS_USER_AGENT
|
||||
|
||||
**Purpose:** Custom User-Agent header.
|
||||
|
||||
**Default:** `Skill-Seekers/3.1.0`
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export SKILL_SEEKERS_USER_AGENT="MyBot/1.0 (contact@example.com)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Enhancement Settings
|
||||
|
||||
### SKILL_SEEKER_AGENT
|
||||
|
||||
**Purpose:** Default local coding agent for enhancement.
|
||||
|
||||
**Default:** `claude`
|
||||
|
||||
**Options:** `claude`, `cursor`, `windsurf`, `cline`, `continue`
|
||||
|
||||
**Used by:**
|
||||
- `skill-seekers enhance`
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export SKILL_SEEKER_AGENT=cursor
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### SKILL_SEEKERS_ENHANCE_TIMEOUT
|
||||
|
||||
**Purpose:** Timeout for AI enhancement operations.
|
||||
|
||||
**Default:** `600` (seconds = 10 minutes)
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
# For large skills
|
||||
export SKILL_SEEKERS_ENHANCE_TIMEOUT=1200
|
||||
```
|
||||
|
||||
**Override:** Use `--timeout` flag.
|
||||
|
||||
---
|
||||
|
||||
### ANTHROPIC_MODEL
|
||||
|
||||
**Purpose:** Claude model for API enhancement.
|
||||
|
||||
**Default:** `claude-3-5-sonnet-20241022`
|
||||
|
||||
**Options:**
|
||||
- `claude-3-5-sonnet-20241022` (recommended)
|
||||
- `claude-3-opus-20240229` (highest quality, more expensive)
|
||||
- `claude-3-haiku-20240307` (fastest, cheapest)
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export ANTHROPIC_MODEL=claude-3-opus-20240229
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## GitHub Configuration
|
||||
|
||||
### GITHUB_API_URL
|
||||
|
||||
**Purpose:** Custom GitHub API endpoint.
|
||||
|
||||
**Default:** `https://api.github.com`
|
||||
|
||||
**Use case:** GitHub Enterprise Server.
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export GITHUB_API_URL=https://github.company.com/api/v3
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### GITHUB_ENTERPRISE_TOKEN
|
||||
|
||||
**Purpose:** Separate token for GitHub Enterprise.
|
||||
|
||||
**Use case:** Different tokens for github.com vs enterprise.
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export GITHUB_TOKEN=ghp_... # github.com
|
||||
export GITHUB_ENTERPRISE_TOKEN=... # enterprise
|
||||
```
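
A client that honors both tokens might choose between them based on the API endpoint in use. The sketch below is an assumption about one reasonable selection rule (the github.com token is never sent to an enterprise host); `github_request_settings` is a hypothetical helper. The `Authorization: Bearer` and `Accept: application/vnd.github+json` headers are standard for the GitHub REST API.

```python
import os

DEFAULT_API_URL = "https://api.github.com"


def github_request_settings(environ=None):
    """Return (api_url, headers) for GitHub API calls based on the environment."""
    environ = os.environ if environ is None else environ
    api_url = environ.get("GITHUB_API_URL", DEFAULT_API_URL).rstrip("/")

    if api_url == DEFAULT_API_URL:
        token = environ.get("GITHUB_TOKEN")
    else:
        # Assumption: enterprise hosts only ever receive the enterprise token.
        token = environ.get("GITHUB_ENTERPRISE_TOKEN")

    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return api_url, headers
```

Without any token set, the returned headers carry no `Authorization` entry, which corresponds to unauthenticated access.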
|
||||
|
||||
---
|
||||
|
||||
## Vector Database Settings
|
||||
|
||||
### CHROMA_URL
|
||||
|
||||
**Purpose:** ChromaDB server URL.
|
||||
|
||||
**Default:** `http://localhost:8000`
|
||||
|
||||
**Used by:**
|
||||
- `skill-seekers upload --target chroma`
|
||||
- `export_to_chroma` MCP tool
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export CHROMA_URL=http://chroma.example.com:8000
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### CHROMA_PERSIST_DIRECTORY
|
||||
|
||||
**Purpose:** Local directory for ChromaDB persistence.
|
||||
|
||||
**Default:** `./chroma_db/`
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export CHROMA_PERSIST_DIRECTORY=/var/lib/chroma
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### WEAVIATE_URL
|
||||
|
||||
**Purpose:** Weaviate server URL.
|
||||
|
||||
**Default:** `http://localhost:8080`
|
||||
|
||||
**Used by:**
|
||||
- `skill-seekers upload --target weaviate`
|
||||
- `export_to_weaviate` MCP tool
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export WEAVIATE_URL=https://weaviate.example.com
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### WEAVIATE_API_KEY
|
||||
|
||||
**Purpose:** Weaviate API key for authentication.
|
||||
|
||||
**Used by:**
|
||||
- Weaviate Cloud
|
||||
- Authenticated Weaviate instances
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export WEAVIATE_API_KEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### QDRANT_URL
|
||||
|
||||
**Purpose:** Qdrant server URL.
|
||||
|
||||
**Default:** `http://localhost:6333`
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export QDRANT_URL=http://qdrant.example.com:6333
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### QDRANT_API_KEY
|
||||
|
||||
**Purpose:** Qdrant API key for authentication.
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export QDRANT_API_KEY=xxxxxxxxxxxxxxxx
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Debug and Development
|
||||
|
||||
### SKILL_SEEKERS_DEBUG
|
||||
|
||||
**Purpose:** Enable debug logging.
|
||||
|
||||
**Values:** `1`, `true`, `yes`
|
||||
|
||||
**Equivalent to:** `--verbose` flag
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
export SKILL_SEEKERS_DEBUG=1
|
||||
```
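
Boolean-style variables such as this one accept `1`, `true`, or `yes`. A minimal Python sketch of that check follows; treating the comparison as case-insensitive and ignoring surrounding whitespace are assumptions, and `env_flag` is a hypothetical helper name.

```python
import os

TRUTHY = {"1", "true", "yes"}  # accepted values listed in this reference


def env_flag(name: str, environ=None) -> bool:
    """Return True if the variable is set to one of the accepted truthy values."""
    environ = os.environ if environ is None else environ
    return environ.get(name, "").strip().lower() in TRUTHY
```

Any other value, including an unset variable, counts as disabled in this sketch.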
|
||||
|
||||
---

### SKILL_SEEKERS_LOG_LEVEL

**Purpose:** Set logging level.

**Default:** `INFO`

**Options:** `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`

**Example:**
```bash
export SKILL_SEEKERS_LOG_LEVEL=DEBUG
```

---

### SKILL_SEEKERS_LOG_FILE

**Purpose:** Log to file instead of stdout.

**Example:**
```bash
export SKILL_SEEKERS_LOG_FILE=/var/log/skill-seekers.log
```

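The two logging variables map naturally onto Python's standard `logging` module. The following is a minimal sketch of that mapping, assuming a hypothetical `configure_logging` helper; it is not the project's actual logging setup.

```python
import logging
import os
import sys
from typing import Mapping

LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def configure_logging(env: Mapping[str, str] = os.environ) -> int:
    """Hypothetical helper: apply both variables and return the numeric level."""
    name = env.get("SKILL_SEEKERS_LOG_LEVEL", "INFO").upper()
    if name not in LEVELS:
        raise ValueError(f"unknown log level: {name!r}")
    log_file = env.get("SKILL_SEEKERS_LOG_FILE")
    # A log file replaces stdout rather than adding a second destination.
    handler = (
        logging.FileHandler(log_file) if log_file else logging.StreamHandler(sys.stdout)
    )
    level = getattr(logging, name)
    # force=True (Python 3.8+) replaces any handlers configured earlier.
    logging.basicConfig(level=level, handlers=[handler], force=True)
    return level
```
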
---

### SKILL_SEEKERS_CACHE_DIR

**Purpose:** Custom cache directory.

**Default:** `~/.cache/skill-seekers/`

**Example:**
```bash
export SKILL_SEEKERS_CACHE_DIR=/tmp/skill-seekers-cache
```

---

### SKILL_SEEKERS_NO_CACHE

**Purpose:** Disable caching.

**Values:** `1`, `true`, `yes`

**Example:**
```bash
export SKILL_SEEKERS_NO_CACHE=1
```

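The two cache variables interact: one selects a directory, the other switches caching off entirely. One way to combine them is sketched below, assuming a hypothetical `resolve_cache_dir` helper and assuming that `SKILL_SEEKERS_NO_CACHE` takes precedence when both are set.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_CACHE_DIR = "~/.cache/skill-seekers/"  # default documented above


def resolve_cache_dir(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Hypothetical helper: return the cache directory, or None if disabled."""
    # Assumption: the "no cache" switch wins over a custom directory.
    if env.get("SKILL_SEEKERS_NO_CACHE", "").strip().lower() in {"1", "true", "yes"}:
        return None
    raw = env.get("SKILL_SEEKERS_CACHE_DIR") or DEFAULT_CACHE_DIR
    # expanduser() resolves a leading "~" that the shell left unexpanded.
    return Path(raw).expanduser()
```
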
---

## MCP Server Settings

### MCP_TRANSPORT

**Purpose:** Default MCP transport mode.

**Default:** `stdio`

**Options:** `stdio`, `http`

**Example:**
```bash
export MCP_TRANSPORT=http
```

**Override:** Use `--transport` flag.

---

### MCP_PORT

**Purpose:** Default MCP HTTP port.

**Default:** `8765`

**Example:**
```bash
export MCP_PORT=8080
```

**Override:** Use `--port` flag.

---

### MCP_HOST

**Purpose:** Default MCP HTTP host.

**Default:** `127.0.0.1`

**Example:**
```bash
# Listen on all interfaces (reachable from other machines)
export MCP_HOST=0.0.0.0
```

**Override:** Use `--host` flag.

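The relationship between these variables and their overriding flags can be illustrated with `argparse`, where each environment variable supplies the default for the matching flag. This is a sketch under stated assumptions: `parse_mcp_args` and the program name are hypothetical, and only the flag names and defaults come from the sections above.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def parse_mcp_args(
    argv: Optional[Sequence[str]] = None,
    env: Mapping[str, str] = os.environ,
) -> argparse.Namespace:
    """Hypothetical parser: env vars set defaults, flags override them."""
    parser = argparse.ArgumentParser(prog="mcp-server-sketch")
    parser.add_argument(
        "--transport",
        choices=["stdio", "http"],
        default=env.get("MCP_TRANSPORT", "stdio"),
    )
    parser.add_argument("--port", type=int, default=int(env.get("MCP_PORT", "8765")))
    parser.add_argument("--host", default=env.get("MCP_HOST", "127.0.0.1"))
    return parser.parse_args(argv)
```

With `MCP_PORT=8080` exported, `parse_mcp_args(["--port", "9000"])` would still return port 9000, because the flag is applied last.
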
---

## Examples

### Development Environment

```bash
# Debug mode
export SKILL_SEEKERS_DEBUG=1
export SKILL_SEEKERS_LOG_LEVEL=DEBUG

# Custom paths
export SKILL_SEEKERS_HOME=./.skill-seekers
export SKILL_SEEKERS_OUTPUT=./output

# Faster scraping for testing
export SKILL_SEEKERS_RATE_LIMIT=0.1
export SKILL_SEEKERS_MAX_PAGES=50
```

### Production Environment

```bash
# API keys
export ANTHROPIC_API_KEY=sk-ant-...
export GITHUB_TOKEN=ghp_...

# Custom output directory
export SKILL_SEEKERS_OUTPUT=/var/www/skills

# Conservative scraping
export SKILL_SEEKERS_RATE_LIMIT=1.0
export SKILL_SEEKERS_WORKERS=2

# Logging
export SKILL_SEEKERS_LOG_FILE=/var/log/skill-seekers.log
export SKILL_SEEKERS_LOG_LEVEL=WARNING
```

### CI/CD Environment

```bash
# Non-interactive
export SKILL_SEEKERS_LOG_LEVEL=ERROR

# API keys from secrets
export ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY_SECRET}
export GITHUB_TOKEN=${GITHUB_TOKEN_SECRET}

# Fresh runs (no cache)
export SKILL_SEEKERS_NO_CACHE=1
```

### Multi-Platform Setup

```bash
# All API keys
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=AIza...
export OPENAI_API_KEY=sk-...
export GITHUB_TOKEN=ghp_...

# Vector databases
export CHROMA_URL=http://localhost:8000
export WEAVIATE_URL=http://localhost:8080
export WEAVIATE_API_KEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```

---

## Configuration File

Environment variables can also be set in a `.env` file:

```bash
# .env file
ANTHROPIC_API_KEY=sk-ant-...
GITHUB_TOKEN=ghp_...
SKILL_SEEKERS_OUTPUT=./output
SKILL_SEEKERS_RATE_LIMIT=0.5
```

Load with:
```bash
# Automatically loaded if python-dotenv is installed
# Or manually (set -a exports every variable the file defines; comment lines are skipped):
set -a; . ./.env; set +a
```

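For readers who want to see what loading a `.env` file involves, here is a minimal standard-library sketch. `parse_dotenv` and `apply_dotenv` are hypothetical helpers, not the project's loader; they handle only plain `KEY=value` lines and comments. Letting already-exported variables win mirrors python-dotenv's default of not overriding the existing environment.

```python
import os
from typing import Dict, MutableMapping


def parse_dotenv(text: str) -> Dict[str, str]:
    """Hypothetical parser for simple KEY=value lines."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def apply_dotenv(
    values: Dict[str, str], env: MutableMapping[str, str] = os.environ
) -> None:
    """Copy parsed values into env without replacing existing variables."""
    for key, value in values.items():
        env.setdefault(key, value)
```
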
---

## Priority Order

Settings are applied in this order (later overrides earlier):

1. Default values
2. Environment variables
3. Configuration file
4. Command-line flags

Example:
```bash
# Default: rate_limit = 0.5
export SKILL_SEEKERS_RATE_LIMIT=1.0  # Env var overrides default
# Config file: rate_limit = 0.2      # Config overrides env
skill-seekers scrape --rate-limit 2.0  # Flag overrides all
```

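The layering described above can be modeled with `collections.ChainMap`, which looks a key up in each mapping in turn. The sketch below is illustrative only: `resolve_settings` is a hypothetical function, and the `rate_limit` key stands in for whatever setting names the tool uses internally.

```python
from collections import ChainMap
from typing import Any, Mapping


def resolve_settings(
    defaults: Mapping[str, Any],
    env: Mapping[str, Any],
    config: Mapping[str, Any],
    flags: Mapping[str, Any],
) -> ChainMap:
    """Hypothetical resolver: flags > config file > env vars > defaults."""
    # ChainMap searches left to right, so the highest-priority layer goes first.
    return ChainMap(dict(flags), dict(config), dict(env), dict(defaults))
```

Using the values from the example, `resolve_settings({"rate_limit": 0.5}, {"rate_limit": 1.0}, {"rate_limit": 0.2}, {})["rate_limit"]` evaluates to `0.2`, and adding `{"rate_limit": 2.0}` as the flags layer changes the result to `2.0`.
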
---

## Security Best Practices

### Never commit API keys

```bash
# Add to .gitignore
echo ".env" >> .gitignore
echo "*.key" >> .gitignore
```

### Use secret management

```bash
# macOS Keychain
export ANTHROPIC_API_KEY=$(security find-generic-password -s "anthropic-api" -w)

# Linux Secret Service (with secret-tool)
export ANTHROPIC_API_KEY=$(secret-tool lookup service anthropic)

# 1Password CLI
export ANTHROPIC_API_KEY=$(op read "op://vault/anthropic/credential")
```

### File permissions

```bash
# Restrict .env file
chmod 600 .env
```

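A script can verify that the `.env` file really is restricted before reading secrets from it. The following sketch assumes a POSIX system; `mode_is_private` and `is_private` are hypothetical helper names.

```python
import os
import stat


def mode_is_private(mode: int) -> bool:
    """True if group and others have no permissions (for example, mode 600)."""
    return stat.S_IMODE(mode) & (stat.S_IRWXG | stat.S_IRWXO) == 0


def is_private(path: str) -> bool:
    """Hypothetical check: inspect the permission bits of a file on disk."""
    return mode_is_private(os.stat(path).st_mode)
```

After `chmod 600 .env`, `is_private(".env")` would be expected to return `True`; a file left at mode 644 would fail the check.
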
---

## Troubleshooting

### Variable not recognized

```bash
# Check if set
echo $ANTHROPIC_API_KEY

# Check in Python
python -c "import os; print(os.getenv('ANTHROPIC_API_KEY'))"
```

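Printing a key in full can leak it into terminal scrollback or CI logs. As an alternative, the sketch below reports which variables are set while masking most of each value; `mask` and `report` are hypothetical helpers written for this example.

```python
import os
from typing import Dict, Iterable, Mapping


def mask(value: str, visible: int = 4) -> str:
    """Keep the first few characters and replace the rest with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)


def report(names: Iterable[str], env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Hypothetical helper: map each name to a masked value or '<not set>'."""
    return {n: (mask(env[n]) if env.get(n) else "<not set>") for n in names}
```

For example, `report(["ANTHROPIC_API_KEY", "GITHUB_TOKEN"])` shows at a glance which of the two keys the current process can see.
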
### Priority issues

```bash
# See effective configuration
skill-seekers config --show
```

### Path expansion

```bash
# Use the full path or $HOME
export SKILL_SEEKERS_HOME=$HOME/.skill-seekers
# NOT: ~/.skill-seekers (the tilde is not expanded inside quotes, or by some shells after export)
```

---

## See Also

- [CLI Reference](CLI_REFERENCE.md) - Command reference
- [Config Format](CONFIG_FORMAT.md) - JSON configuration

---

*For platform-specific setup, see [Installation Guide](../getting-started/01-installation.md)*
1078
docs/reference/MCP_REFERENCE.md
Normal file
File diff suppressed because it is too large