docs: complete documentation overhaul with v3.1.0 release notes and zh-CN translations

Documentation restructure:
- New docs/getting-started/ guide (4 files: install, quick-start, first-skill, next-steps)
- New docs/user-guide/ section (6 files: core concepts through troubleshooting)
- New docs/reference/ section (CLI_REFERENCE, CONFIG_FORMAT, ENVIRONMENT_VARIABLES, MCP_REFERENCE)
- New docs/advanced/ section (custom-workflows, mcp-server, multi-source)
- New docs/ARCHITECTURE.md - system architecture overview
- Archived legacy files (QUICKSTART.md, QUICK_REFERENCE.md, docs/guides/USAGE.md) to docs/archive/legacy/

Chinese (zh-CN) translations:
- Full zh-CN mirror of all user-facing docs (getting-started, user-guide, reference, advanced)
- GitHub Actions workflow for translation sync (.github/workflows/translate-docs.yml)
- Translation sync checker script (scripts/check_translation_sync.sh)
- Translation helper script (scripts/translate_doc.py)

Content updates:
- CHANGELOG.md: [Unreleased] → [3.1.0] - 2026-02-22
- README.md: updated with new doc structure links
- AGENTS.md: updated agent documentation
- docs/features/UNIFIED_SCRAPING.md: updated for unified scraper workflow JSON config

Analysis/planning artifacts (kept for reference):
- DOCUMENTATION_OVERHAUL_PLAN.md, DOCUMENTATION_OVERHAUL_SUMMARY.md
- FEATURE_GAP_ANALYSIS.md, IMPLEMENTATION_GAPS_ANALYSIS.md, CREATE_COMMAND_COVERAGE_ANALYSIS.md
- CHINESE_TRANSLATION_IMPLEMENTATION_SUMMARY.md, ISSUE_260_UPDATE.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yusyus
2026-02-22 01:01:51 +03:00
parent 22bdd4f5f6
commit ba9a8ff8b5
69 changed files with 31304 additions and 246 deletions


@@ -0,0 +1,610 @@
# Config Format Reference - Skill Seekers
> **Version:** 3.1.0
> **Last Updated:** 2026-02-16
> **Complete JSON configuration specification**
---
## Table of Contents
- [Overview](#overview)
- [Single-Source Config](#single-source-config)
- [Documentation Source](#documentation-source)
- [GitHub Source](#github-source)
- [PDF Source](#pdf-source)
- [Local Source](#local-source)
- [Unified (Multi-Source) Config](#unified-multi-source-config)
- [Common Fields](#common-fields)
- [Selectors](#selectors)
- [Categories](#categories)
- [URL Patterns](#url-patterns)
- [Examples](#examples)
- [Validation](#validation)
- [See Also](#see-also)
---
## Overview
Skill Seekers uses JSON configuration files to define scraping targets. There are two types:
| Type | Use Case | File |
|------|----------|------|
| **Single-Source** | One source (docs, GitHub, PDF, or local) | `*.json` |
| **Unified** | Multiple sources combined | `*-unified.json` |
---
## Single-Source Config
### Documentation Source
For scraping documentation websites.
```json
{
"name": "react",
"base_url": "https://react.dev/",
"description": "React - JavaScript library for building UIs",
"start_urls": [
"https://react.dev/learn",
"https://react.dev/reference/react"
],
"selectors": {
"main_content": "article",
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": ["/learn/", "/reference/"],
"exclude": ["/blog/", "/community/"]
},
"categories": {
"getting_started": ["learn", "tutorial", "intro"],
"api": ["reference", "api", "hooks"]
},
"rate_limit": 0.5,
"max_pages": 300,
"merge_mode": "claude-enhanced"
}
```
#### Documentation Fields
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `name` | string | Yes | - | Skill name (alphanumeric, dashes, underscores) |
| `base_url` | string | Yes | - | Base documentation URL |
| `description` | string | No | "" | Skill description for SKILL.md |
| `start_urls` | array | No | `[base_url]` | URLs to start crawling from |
| `selectors` | object | No | see below | CSS selectors for content extraction |
| `url_patterns` | object | No | `{}` | Include/exclude URL patterns |
| `categories` | object | No | `{}` | Content categorization rules |
| `rate_limit` | number | No | 0.5 | Seconds between requests |
| `max_pages` | number | No | 500 | Maximum pages to scrape |
| `merge_mode` | string | No | "claude-enhanced" | Merge strategy: `rule-based` or `claude-enhanced` |
| `extract_api` | boolean | No | false | Extract API references |
| `llms_txt_url` | string | No | auto | URL of the site's llms.txt file |
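
As an illustration of how the required fields and defaults in this table fit together, here is a minimal Python sketch. The helper name `normalize_doc_config` is hypothetical and is not part of the Skill Seekers API; it only mirrors the table (the `selectors` and `llms_txt_url` defaults are left out because the table does not give literal values for them).

```python
from typing import Any

# Defaults copied from the Documentation Fields table.
# `selectors` ("see below") and `llms_txt_url` ("auto") are intentionally omitted.
DOC_DEFAULTS: dict[str, Any] = {
    "description": "",
    "url_patterns": {},
    "categories": {},
    "rate_limit": 0.5,
    "max_pages": 500,
    "merge_mode": "claude-enhanced",
    "extract_api": False,
}


def normalize_doc_config(config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: check required fields, then fill in table defaults."""
    missing = [key for key in ("name", "base_url") if not config.get(key)]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))
    merged = {**DOC_DEFAULTS, **config}
    # start_urls falls back to a one-element list holding base_url.
    merged.setdefault("start_urls", [merged["base_url"]])
    return merged


config = normalize_doc_config({"name": "react", "base_url": "https://react.dev/"})
print(config["start_urls"], config["rate_limit"], config["max_pages"])
```

Values supplied in the config always win over the defaults, so a minimal config needs only `name` and `base_url`.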
---
### GitHub Source
For analyzing GitHub repositories.
```json
{
"name": "react-github",
"type": "github",
"repo": "facebook/react",
"description": "React GitHub repository analysis",
"enable_codebase_analysis": true,
"code_analysis_depth": "deep",
"fetch_issues": true,
"max_issues": 100,
"issue_labels": ["bug", "enhancement"],
"fetch_releases": true,
"max_releases": 20,
"fetch_changelog": true,
"analyze_commit_history": true,
"file_patterns": ["*.js", "*.ts", "*.tsx"],
"exclude_patterns": ["*.test.js", "node_modules/**"],
"rate_limit": 1.0
}
```
#### GitHub Fields
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `name` | string | Yes | - | Skill name |
| `type` | string | Yes | - | Must be `"github"` |
| `repo` | string | Yes | - | Repository in `owner/repo` format |
| `description` | string | No | "" | Skill description |
| `enable_codebase_analysis` | boolean | No | true | Analyze source code |
| `code_analysis_depth` | string | No | "standard" | `surface`, `standard`, `deep` |
| `fetch_issues` | boolean | No | true | Fetch GitHub issues |
| `max_issues` | number | No | 100 | Maximum issues to fetch |
| `issue_labels` | array | No | [] | Filter by labels |
| `fetch_releases` | boolean | No | true | Fetch releases |
| `max_releases` | number | No | 20 | Maximum releases |
| `fetch_changelog` | boolean | No | true | Extract CHANGELOG |
| `analyze_commit_history` | boolean | No | false | Analyze commits |
| `file_patterns` | array | No | [] | Include file patterns |
| `exclude_patterns` | array | No | [] | Exclude file patterns |
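
The `repo` field expects the short `owner/repo` form rather than a full URL. A small sketch of a pre-flight check is shown below; the regular expression is an assumption about which characters are acceptable, not the rule Skill Seekers itself applies.

```python
import re

# Assumed character set: letters, digits, dot, dash, and underscore on both sides.
REPO_PATTERN = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


def is_owner_repo(value: str) -> bool:
    """Return True when `value` looks like `owner/repo` (hypothetical helper)."""
    return REPO_PATTERN.fullmatch(value) is not None


print(is_owner_repo("facebook/react"))
print(is_owner_repo("https://github.com/facebook/react"))
```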
---
### PDF Source
For extracting content from PDF files.
```json
{
"name": "product-manual",
"type": "pdf",
"pdf_path": "docs/manual.pdf",
"description": "Product documentation manual",
"enable_ocr": false,
"password": "",
"extract_images": true,
"image_output_dir": "output/images/",
"extract_tables": true,
"table_format": "markdown",
"page_range": [1, 100],
"split_by_chapters": true,
"chunk_size": 1000,
"chunk_overlap": 100
}
```
#### PDF Fields
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `name` | string | Yes | - | Skill name |
| `type` | string | Yes | - | Must be `"pdf"` |
| `pdf_path` | string | Yes | - | Path to PDF file |
| `description` | string | No | "" | Skill description |
| `enable_ocr` | boolean | No | false | OCR for scanned PDFs |
| `password` | string | No | "" | PDF password if encrypted |
| `extract_images` | boolean | No | false | Extract embedded images |
| `image_output_dir` | string | No | auto | Directory for images |
| `extract_tables` | boolean | No | false | Extract tables |
| `table_format` | string | No | "markdown" | `markdown`, `json`, `csv` |
| `page_range` | array | No | all | `[start, end]` page range |
| `split_by_chapters` | boolean | No | false | Split by detected chapters |
| `chunk_size` | number | No | 1000 | Characters per chunk |
| `chunk_overlap` | number | No | 100 | Overlap between chunks |
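
To make the relationship between `chunk_size` and `chunk_overlap` concrete, the sketch below splits text with a character-based sliding window. This is an assumption about how the two fields interact, based only on their descriptions in the table; the actual chunker may split on other boundaries.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split `text` into windows of `chunk_size` characters that share
    `chunk_overlap` characters with the previous window (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

With the defaults, each new chunk starts 900 characters after the previous one, so consecutive chunks repeat 100 characters of context.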
---
### Local Source
For analyzing local codebases.
```json
{
"name": "my-project",
"type": "local",
"directory": "./my-project",
"description": "Local project analysis",
"languages": ["Python", "JavaScript"],
"file_patterns": ["*.py", "*.js"],
"exclude_patterns": ["*.pyc", "node_modules/**", ".git/**"],
"analysis_depth": "comprehensive",
"extract_api": true,
"extract_patterns": true,
"extract_test_examples": true,
"extract_how_to_guides": true,
"extract_config_patterns": true,
"include_comments": true,
"include_docstrings": true,
"include_readme": true
}
```
#### Local Fields
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `name` | string | Yes | - | Skill name |
| `type` | string | Yes | - | Must be `"local"` |
| `directory` | string | Yes | - | Path to directory |
| `description` | string | No | "" | Skill description |
| `languages` | array | No | auto | Languages to analyze |
| `file_patterns` | array | No | all | Include patterns |
| `exclude_patterns` | array | No | common | Exclude patterns |
| `analysis_depth` | string | No | "standard" | `quick`, `standard`, `comprehensive` |
| `extract_api` | boolean | No | true | Extract API documentation |
| `extract_patterns` | boolean | No | true | Detect patterns |
| `extract_test_examples` | boolean | No | true | Extract test examples |
| `extract_how_to_guides` | boolean | No | true | Generate guides |
| `extract_config_patterns` | boolean | No | true | Extract config patterns |
| `include_comments` | boolean | No | true | Include code comments |
| `include_docstrings` | boolean | No | true | Include docstrings |
| `include_readme` | boolean | No | true | Include README |
---
## Unified (Multi-Source) Config
Combine multiple sources into one skill with conflict detection.
```json
{
"name": "react-complete",
"description": "React docs + GitHub + examples",
"merge_mode": "claude-enhanced",
"sources": [
{
"type": "docs",
"name": "react-docs",
"base_url": "https://react.dev/",
"max_pages": 200,
"categories": {
"getting_started": ["learn"],
"api": ["reference"]
}
},
{
"type": "github",
"name": "react-github",
"repo": "facebook/react",
"fetch_issues": true,
"max_issues": 50
},
{
"type": "pdf",
"name": "react-cheatsheet",
"pdf_path": "docs/react-cheatsheet.pdf"
},
{
"type": "local",
"name": "react-examples",
"directory": "./react-examples"
}
],
"conflict_detection": {
"enabled": true,
"rules": [
{
"field": "api_signature",
"action": "flag_mismatch"
}
]
},
"output_structure": {
"group_by_source": false,
"cross_reference": true
}
}
```
#### Unified Fields
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `name` | string | Yes | - | Combined skill name |
| `description` | string | No | "" | Skill description |
| `merge_mode` | string | No | "claude-enhanced" | `rule-based`, `claude-enhanced` |
| `sources` | array | Yes | - | List of source configs |
| `conflict_detection` | object | No | `{}` | Conflict detection settings |
| `output_structure` | object | No | `{}` | Output organization |
| `workflows` | array | No | `[]` | Workflow presets to apply |
| `workflow_stages` | array | No | `[]` | Inline enhancement stages |
| `workflow_vars` | object | No | `{}` | Workflow variable overrides |
| `workflow_dry_run` | boolean | No | `false` | Preview workflows without executing |
#### Workflow Configuration (Unified)
Unified configs support defining enhancement workflows at the top level:
```json
{
"name": "react-complete",
"description": "React docs + GitHub with security enhancement",
"merge_mode": "claude-enhanced",
"workflows": ["security-focus", "api-documentation"],
"workflow_stages": [
{
"name": "cleanup",
"prompt": "Remove boilerplate sections and standardize formatting"
}
],
"workflow_vars": {
"focus_area": "performance",
"detail_level": "comprehensive"
},
"sources": [
{"type": "docs", "base_url": "https://react.dev/"},
{"type": "github", "repo": "facebook/react"}
]
}
```
**Workflow Fields:**
| Field | Type | Description |
|-------|------|-------------|
| `workflows` | array | List of workflow preset names to apply |
| `workflow_stages` | array | Inline stages with `name` and `prompt` |
| `workflow_vars` | object | Key-value pairs for workflow variables |
| `workflow_dry_run` | boolean | Preview workflows without executing |
**Note:** CLI flags take precedence over the corresponding values in the config file.
#### Source Types in Unified Config
Each source in the `sources` array can be:
| Type | Required Fields |
|------|-----------------|
| `docs` | `base_url` |
| `github` | `repo` |
| `pdf` | `pdf_path` |
| `local` | `directory` |
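
The required-field table above can be turned into a quick structural check before running a unified scrape. The function below is a hypothetical sketch, not the project's validator; treating an empty `sources` array as an error is an assumption.

```python
from typing import Any

# Required field per source type, copied from the table above.
REQUIRED_BY_TYPE = {
    "docs": "base_url",
    "github": "repo",
    "pdf": "pdf_path",
    "local": "directory",
}


def check_sources(config: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    sources = config.get("sources")
    if not isinstance(sources, list) or not sources:
        return ["'sources' must be a non-empty array"]
    problems: list[str] = []
    for index, source in enumerate(sources):
        source_type = source.get("type")
        required = REQUIRED_BY_TYPE.get(source_type)
        if required is None:
            problems.append(f"sources[{index}]: unknown type {source_type!r}")
        elif not source.get(required):
            problems.append(f"sources[{index}] ({source_type}): missing '{required}'")
    return problems


example = {
    "name": "react-complete",
    "sources": [
        {"type": "docs", "base_url": "https://react.dev/"},
        {"type": "github"},  # missing "repo" on purpose
    ],
}
print(check_sources(example))
```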
---
## Common Fields
Fields available in all config types:
| Field | Type | Description |
|-------|------|-------------|
| `name` | string | Skill identifier (letters, numbers, dashes, underscores) |
| `description` | string | Human-readable description |
| `rate_limit` | number | Delay between requests in seconds |
| `output_dir` | string | Custom output directory |
| `skip_scrape` | boolean | Use existing data |
| `enhance_level` | number | 0=off, 1=SKILL.md, 2=+config, 3=full |
---
## Selectors
CSS selectors for content extraction from HTML:
```json
{
"selectors": {
"main_content": "article",
"title": "h1",
"code_blocks": "pre code",
"navigation": "nav.sidebar",
"breadcrumbs": "nav[aria-label='breadcrumb']",
"next_page": "a[rel='next']",
"prev_page": "a[rel='prev']"
}
}
```
### Default Selectors
If not specified, these defaults are used:
| Element | Default Selector |
|---------|-----------------|
| `main_content` | `article, main, .content, #content, [role='main']` |
| `title` | `h1, .page-title, title` |
| `code_blocks` | `pre code, code[class*="language-"]` |
| `navigation` | `nav, .sidebar, .toc` |
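
One plausible reading of "defaults are used" is a per-key fallback: any selector you omit keeps its default while the ones you set are replaced. The sketch below assumes that behavior; whether the tool falls back per key or only when the whole `selectors` object is absent is not stated here.

```python
from typing import Any

# Default selectors copied from the table above.
DEFAULT_SELECTORS = {
    "main_content": "article, main, .content, #content, [role='main']",
    "title": "h1, .page-title, title",
    "code_blocks": 'pre code, code[class*="language-"]',
    "navigation": "nav, .sidebar, .toc",
}


def resolve_selectors(config: dict[str, Any]) -> dict[str, str]:
    """Overlay user-supplied selectors on the defaults (assumed per-key fallback)."""
    return {**DEFAULT_SELECTORS, **config.get("selectors", {})}


resolved = resolve_selectors({"selectors": {"main_content": "article"}})
print(resolved["main_content"], "|", resolved["title"])
```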
---
## Categories
Map keywords (matched against page URLs) to content categories:
```json
{
"categories": {
"getting_started": [
"intro", "tutorial", "quickstart",
"installation", "getting-started"
],
"core_concepts": [
"concept", "fundamental", "architecture",
"principle", "overview"
],
"api_reference": [
"reference", "api", "method", "function",
"class", "interface", "type"
],
"guides": [
"guide", "how-to", "example", "recipe",
"pattern", "best-practice"
],
"advanced": [
"advanced", "expert", "performance",
"optimization", "internals"
]
}
}
```
Categories appear as sections in the generated SKILL.md.
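
A rough sketch of keyword-based categorization follows. The matching rules here (case-insensitive substring match against the URL, first matching category wins, `"other"` as the fallback name) are assumptions made for illustration and may differ from the real implementation.

```python
def categorize(url: str, categories: dict[str, list[str]], fallback: str = "other") -> str:
    """Return the first category whose keyword appears in the URL (illustrative)."""
    haystack = url.lower()
    for category, keywords in categories.items():
        if any(keyword.lower() in haystack for keyword in keywords):
            return category
    return fallback


categories = {
    "getting_started": ["learn", "tutorial"],
    "api": ["reference", "api"],
}
print(categorize("https://react.dev/reference/react/useState", categories))
```

Because the first match wins in this sketch, the order of the keys in `categories` matters when keywords overlap.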
---
## URL Patterns
Control which URLs are included or excluded:
```json
{
"url_patterns": {
"include": [
"/docs/",
"/guide/",
"/api/",
"/reference/"
],
"exclude": [
"/blog/",
"/news/",
"/community/",
"/search",
"?print=1",
"/_static/",
"/_images/"
]
}
}
```
### Pattern Rules
- Patterns are matched against the URL path
- Use `*` for wildcards: `/api/v*/`
- Use `**` for recursive: `/docs/**/*.html`
- Exclude takes precedence over include
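
The rules above can be sketched in Python as follows. Several details are assumptions rather than documented behavior: patterns are searched anywhere in the target (so `/docs/` behaves like a substring), `*` does not cross a `/` while `**` does, an empty `include` list allows everything, and the query string is appended to the path so that an entry such as `?print=1` can match.

```python
import re
from urllib.parse import urlsplit


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate `*` and `**` wildcards into a regular expression."""
    pieces = []
    for part in re.split(r"(\*\*|\*)", pattern):
        if part == "**":
            pieces.append(".*")      # recursive: may cross "/"
        elif part == "*":
            pieces.append("[^/]*")   # single segment: stops at "/"
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))


def should_scrape(url: str, include: list[str], exclude: list[str]) -> bool:
    """Apply include/exclude patterns; exclude takes precedence."""
    parts = urlsplit(url)
    target = parts.path + (f"?{parts.query}" if parts.query else "")
    if any(pattern_to_regex(p).search(target) for p in exclude):
        return False
    if not include:
        return True  # assumption: no include list means "allow everything"
    return any(pattern_to_regex(p).search(target) for p in include)


print(should_scrape("https://example.com/docs/blog/post", ["/docs/"], ["/blog/"]))
```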
---
## Examples
### React Documentation
```json
{
"name": "react",
"base_url": "https://react.dev/",
"description": "React - JavaScript library for building UIs",
"start_urls": [
"https://react.dev/learn",
"https://react.dev/reference/react",
"https://react.dev/reference/react-dom"
],
"selectors": {
"main_content": "article",
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": ["/learn/", "/reference/", "/blog/"],
"exclude": ["/community/", "/search"]
},
"categories": {
"getting_started": ["learn", "tutorial"],
"api": ["reference", "api"],
"blog": ["blog"]
},
"rate_limit": 0.5,
"max_pages": 300
}
```
### Django GitHub
```json
{
"name": "django-github",
"type": "github",
"repo": "django/django",
"description": "Django web framework source code",
"enable_codebase_analysis": true,
"code_analysis_depth": "deep",
"fetch_issues": true,
"max_issues": 100,
"fetch_releases": true,
"file_patterns": ["*.py"],
"exclude_patterns": ["tests/**", "docs/**"]
}
```
### Unified Multi-Source
```json
{
"name": "godot-complete",
"description": "Godot Engine - docs, source, and manual",
"merge_mode": "claude-enhanced",
"sources": [
{
"type": "docs",
"name": "godot-docs",
"base_url": "https://docs.godotengine.org/en/stable/",
"max_pages": 500
},
{
"type": "github",
"name": "godot-source",
"repo": "godotengine/godot",
"fetch_issues": false
},
{
"type": "pdf",
"name": "godot-manual",
"pdf_path": "docs/godot-manual.pdf"
}
]
}
```
### Local Project
```json
{
"name": "my-api",
"type": "local",
"directory": "./my-api-project",
"description": "My REST API implementation",
"languages": ["Python"],
"file_patterns": ["*.py"],
"exclude_patterns": ["tests/**", "migrations/**"],
"analysis_depth": "comprehensive",
"extract_api": true,
"extract_test_examples": true
}
```
---
## Validation
Validate your config before scraping:
```bash
# Using CLI
skill-seekers scrape --config my-config.json --dry-run
# Using MCP tool
validate_config({"config": "my-config.json"})
```
---
## See Also
- [CLI Reference](CLI_REFERENCE.md) - Command reference
- [Environment Variables](ENVIRONMENT_VARIABLES.md) - Configuration environment
---
*For more examples, see `configs/` directory in the repository*


@@ -0,0 +1,738 @@
# Environment Variables Reference - Skill Seekers
> **Version:** 3.1.0
> **Last Updated:** 2026-02-16
> **Complete environment variable reference**
---
## Table of Contents
- [Overview](#overview)
- [API Keys](#api-keys)
- [Platform Configuration](#platform-configuration)
- [Paths and Directories](#paths-and-directories)
- [Scraping Behavior](#scraping-behavior)
- [Enhancement Settings](#enhancement-settings)
- [GitHub Configuration](#github-configuration)
- [Vector Database Settings](#vector-database-settings)
- [Debug and Development](#debug-and-development)
- [MCP Server Settings](#mcp-server-settings)
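- [Examples](#examples)
- [Configuration File](#configuration-file)
- [Priority Order](#priority-order)
- [Security Best Practices](#security-best-practices)
- [Troubleshooting](#troubleshooting)
- [See Also](#see-also)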
---
## Overview
Skill Seekers uses environment variables for:
- API authentication (Claude, Gemini, OpenAI, GitHub)
- Configuration paths
- Output directories
- Behavior customization
- Debug settings
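Variables are read at runtime and override the built-in defaults; config-file values and command-line flags override them in turn (see [Priority Order](#priority-order)).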
---
## API Keys
### ANTHROPIC_API_KEY
**Purpose:** Claude AI API access for enhancement and upload.
**Format:** `sk-ant-api03-...`
**Used by:**
- `skill-seekers enhance` (API mode)
- `skill-seekers upload` (Claude target)
- AI enhancement features
**Example:**
```bash
export ANTHROPIC_API_KEY=sk-ant-api03-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
**Alternative:** Use `--api-key` flag per command.
---
### GOOGLE_API_KEY
**Purpose:** Google Gemini API access for upload.
**Format:** `AIza...`
**Used by:**
- `skill-seekers upload` (Gemini target)
**Example:**
```bash
export GOOGLE_API_KEY=AIzaSyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
---
### OPENAI_API_KEY
**Purpose:** OpenAI API access for upload and embeddings.
**Format:** `sk-...`
**Used by:**
- `skill-seekers upload` (OpenAI target)
- Embedding generation for vector DBs
**Example:**
```bash
export OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
---
### GITHUB_TOKEN
**Purpose:** GitHub API authentication for higher rate limits.
**Format:** `ghp_...` (classic personal access token) or `github_pat_...` (fine-grained personal access token)
**Used by:**
- `skill-seekers github`
- `skill-seekers unified` (GitHub sources)
- `skill-seekers analyze` (GitHub repos)
**Benefits:**
- 5,000 requests/hour, versus 60 for unauthenticated requests
- Access to private repositories
- Higher GraphQL API limits
**Example:**
```bash
export GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
**Create token:** https://github.com/settings/tokens
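
For readers calling the GitHub REST API directly, the sketch below shows how a token from the environment is typically attached to a request. The helper name is hypothetical and this is not Skill Seekers' own client code; the `Authorization: Bearer` and `Accept: application/vnd.github+json` headers are the ones GitHub documents for its REST API.

```python
import os
from typing import Mapping, Optional


def github_headers(environ: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Build request headers, adding the token only when GITHUB_TOKEN is set."""
    env = os.environ if environ is None else environ
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


# Placeholder token for illustration only.
print(sorted(github_headers({"GITHUB_TOKEN": "ghp_example"})))
```

Without a token the request is still valid, but it counts against the lower unauthenticated rate limit.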
---
## Platform Configuration
### ANTHROPIC_BASE_URL
**Purpose:** Custom Claude API endpoint.
**Default:** `https://api.anthropic.com`
**Use case:** Proxy servers, enterprise deployments, regional endpoints.
**Example:**
```bash
export ANTHROPIC_BASE_URL=https://custom-api.example.com
```
---
## Paths and Directories
### SKILL_SEEKERS_HOME
**Purpose:** Base directory for Skill Seekers data.
**Default:**
- Linux/macOS: `~/.config/skill-seekers/`
- Windows: `%APPDATA%\skill-seekers\`
**Used for:**
- Configuration files
- Workflow presets
- Cache data
- Checkpoints
**Example:**
```bash
export SKILL_SEEKERS_HOME=/opt/skill-seekers
```
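
The lookup described above (explicit variable first, then a per-platform default) might be resolved along these lines. This is a sketch under the defaults listed in this section; the function name and the fallback used when `APPDATA` is unset are assumptions.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def skill_seekers_home(
    environ: Optional[Mapping[str, str]] = None,
    platform: str = sys.platform,
) -> Path:
    """Return SKILL_SEEKERS_HOME if set, else the documented platform default."""
    env = os.environ if environ is None else environ
    override = env.get("SKILL_SEEKERS_HOME")
    if override:
        return Path(override).expanduser()
    if platform.startswith("win"):
        appdata = env.get("APPDATA")
        # Assumed fallback when APPDATA is missing.
        base = Path(appdata) if appdata else Path.home() / "AppData" / "Roaming"
        return base / "skill-seekers"
    return Path.home() / ".config" / "skill-seekers"


print(skill_seekers_home({"SKILL_SEEKERS_HOME": "/opt/skill-seekers"}))
```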
---
### SKILL_SEEKERS_OUTPUT
**Purpose:** Default output directory for skills.
**Default:** `./output/`
**Used by:**
- All scraping commands
- Package output
- Skill generation
**Example:**
```bash
export SKILL_SEEKERS_OUTPUT=/var/skills/output
```
---
### SKILL_SEEKERS_CONFIG_DIR
**Purpose:** Directory containing preset configs.
**Default:** `configs/` (relative to working directory)
**Example:**
```bash
export SKILL_SEEKERS_CONFIG_DIR=/etc/skill-seekers/configs
```
---
## Scraping Behavior
### SKILL_SEEKERS_RATE_LIMIT
**Purpose:** Default rate limit for HTTP requests.
**Default:** `0.5` (seconds)
**Unit:** Seconds between requests
**Example:**
```bash
# More aggressive (faster)
export SKILL_SEEKERS_RATE_LIMIT=0.2
# More conservative (slower)
export SKILL_SEEKERS_RATE_LIMIT=1.0
```
**Override:** Use `--rate-limit` flag per command.
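
Since the value is a delay in seconds between requests, a scraper loop would typically wait out the remainder of that interval before each request. The class below is a minimal sketch of that idea, not the tool's own limiter; the clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import os
import time
from typing import Callable, Mapping, Optional


def default_delay(environ: Optional[Mapping[str, str]] = None) -> float:
    """Read SKILL_SEEKERS_RATE_LIMIT, falling back to the documented 0.5 seconds."""
    env = os.environ if environ is None else environ
    return float(env.get("SKILL_SEEKERS_RATE_LIMIT", "0.5"))


class RateLimiter:
    """Ensure at least `delay` seconds pass between consecutive calls to wait()."""

    def __init__(
        self,
        delay: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


limiter = RateLimiter(default_delay({"SKILL_SEEKERS_RATE_LIMIT": "0.2"}))
print(limiter.delay)
```

Calling `limiter.wait()` immediately before each HTTP request keeps the request rate at or below one per `delay` seconds.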
---
### SKILL_SEEKERS_MAX_PAGES
**Purpose:** Default maximum pages to scrape.
**Default:** `500`
**Example:**
```bash
export SKILL_SEEKERS_MAX_PAGES=1000
```
**Override:** Use `--max-pages` flag or config file.
---
### SKILL_SEEKERS_WORKERS
**Purpose:** Default number of parallel workers.
**Default:** `1`
**Maximum:** `10`
**Example:**
```bash
export SKILL_SEEKERS_WORKERS=4
```
**Override:** Use `--workers` flag.
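
A sketch of reading this variable while respecting the documented default and maximum is shown below. Clamping out-of-range values is an assumption; the real tool may reject them with an error instead.

```python
import os
from typing import Mapping, Optional

MAX_WORKERS = 10  # documented maximum


def worker_count(environ: Optional[Mapping[str, str]] = None) -> int:
    """Return the worker count from SKILL_SEEKERS_WORKERS, clamped to 1..10."""
    env = os.environ if environ is None else environ
    requested = int(env.get("SKILL_SEEKERS_WORKERS", "1"))
    return max(1, min(requested, MAX_WORKERS))


print(worker_count({"SKILL_SEEKERS_WORKERS": "4"}))
```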
---
### SKILL_SEEKERS_TIMEOUT
**Purpose:** HTTP request timeout.
**Default:** `30` (seconds)
**Example:**
```bash
# For slow servers
export SKILL_SEEKERS_TIMEOUT=60
```
---
### SKILL_SEEKERS_USER_AGENT
**Purpose:** Custom User-Agent header.
**Default:** `Skill-Seekers/3.1.0`
**Example:**
```bash
export SKILL_SEEKERS_USER_AGENT="MyBot/1.0 (contact@example.com)"
```
---
## Enhancement Settings
### SKILL_SEEKER_AGENT
**Purpose:** Default local coding agent for enhancement.
**Default:** `claude`
**Options:** `claude`, `cursor`, `windsurf`, `cline`, `continue`
**Used by:**
- `skill-seekers enhance`
**Example:**
```bash
export SKILL_SEEKER_AGENT=cursor
```
---
### SKILL_SEEKERS_ENHANCE_TIMEOUT
**Purpose:** Timeout for AI enhancement operations.
**Default:** `600` (seconds = 10 minutes)
**Example:**
```bash
# For large skills
export SKILL_SEEKERS_ENHANCE_TIMEOUT=1200
```
**Override:** Use `--timeout` flag.
---
### ANTHROPIC_MODEL
**Purpose:** Claude model for API enhancement.
**Default:** `claude-3-5-sonnet-20241022`
**Options:**
- `claude-3-5-sonnet-20241022` (recommended)
- `claude-3-opus-20240229` (highest quality, more expensive)
- `claude-3-haiku-20240307` (fastest, cheapest)
**Example:**
```bash
export ANTHROPIC_MODEL=claude-3-opus-20240229
```
---
## GitHub Configuration
### GITHUB_API_URL
**Purpose:** Custom GitHub API endpoint.
**Default:** `https://api.github.com`
**Use case:** GitHub Enterprise Server.
**Example:**
```bash
export GITHUB_API_URL=https://github.company.com/api/v3
```
---
### GITHUB_ENTERPRISE_TOKEN
**Purpose:** Separate token for GitHub Enterprise.
**Use case:** Different tokens for github.com vs enterprise.
**Example:**
```bash
export GITHUB_TOKEN=ghp_... # github.com
export GITHUB_ENTERPRISE_TOKEN=... # enterprise
```
---
## Vector Database Settings
### CHROMA_URL
**Purpose:** ChromaDB server URL.
**Default:** `http://localhost:8000`
**Used by:**
- `skill-seekers upload --target chroma`
- `export_to_chroma` MCP tool
**Example:**
```bash
export CHROMA_URL=http://chroma.example.com:8000
```
---
### CHROMA_PERSIST_DIRECTORY
**Purpose:** Local directory for ChromaDB persistence.
**Default:** `./chroma_db/`
**Example:**
```bash
export CHROMA_PERSIST_DIRECTORY=/var/lib/chroma
```
---
### WEAVIATE_URL
**Purpose:** Weaviate server URL.
**Default:** `http://localhost:8080`
**Used by:**
- `skill-seekers upload --target weaviate`
- `export_to_weaviate` MCP tool
**Example:**
```bash
export WEAVIATE_URL=https://weaviate.example.com
```
---
### WEAVIATE_API_KEY
**Purpose:** Weaviate API key for authentication.
**Used by:**
- Weaviate Cloud
- Authenticated Weaviate instances
**Example:**
```bash
export WEAVIATE_API_KEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```
---
### QDRANT_URL
**Purpose:** Qdrant server URL.
**Default:** `http://localhost:6333`
**Example:**
```bash
export QDRANT_URL=http://qdrant.example.com:6333
```
---
### QDRANT_API_KEY
**Purpose:** Qdrant API key for authentication.
**Example:**
```bash
export QDRANT_API_KEY=xxxxxxxxxxxxxxxx
```
---
## Debug and Development
### SKILL_SEEKERS_DEBUG
**Purpose:** Enable debug logging.
**Values:** `1`, `true`, `yes`
**Equivalent to:** `--verbose` flag
**Example:**
```bash
export SKILL_SEEKERS_DEBUG=1
```
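
Boolean-style variables such as this one accept `1`, `true`, or `yes`. A small sketch of how such a flag could be parsed follows; treating the comparison as case-insensitive and everything else as false is an assumption.

```python
import os
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes"}


def env_flag(name: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the variable holds one of the documented truthy values."""
    env = os.environ if environ is None else environ
    return env.get(name, "").strip().lower() in TRUTHY


print(env_flag("SKILL_SEEKERS_DEBUG", {"SKILL_SEEKERS_DEBUG": "yes"}))
```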
---
### SKILL_SEEKERS_LOG_LEVEL
**Purpose:** Set logging level.
**Default:** `INFO`
**Options:** `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`
**Example:**
```bash
export SKILL_SEEKERS_LOG_LEVEL=DEBUG
```
---
### SKILL_SEEKERS_LOG_FILE
**Purpose:** Log to file instead of stdout.
**Example:**
```bash
export SKILL_SEEKERS_LOG_FILE=/var/log/skill-seekers.log
```
---
### SKILL_SEEKERS_CACHE_DIR
**Purpose:** Custom cache directory.
**Default:** `~/.cache/skill-seekers/`
**Example:**
```bash
export SKILL_SEEKERS_CACHE_DIR=/tmp/skill-seekers-cache
```
---
### SKILL_SEEKERS_NO_CACHE
**Purpose:** Disable caching.
**Values:** `1`, `true`, `yes`
**Example:**
```bash
export SKILL_SEEKERS_NO_CACHE=1
```
---
## MCP Server Settings
### MCP_TRANSPORT
**Purpose:** Default MCP transport mode.
**Default:** `stdio`
**Options:** `stdio`, `http`
**Example:**
```bash
export MCP_TRANSPORT=http
```
**Override:** Use `--transport` flag.
---
### MCP_PORT
**Purpose:** Default MCP HTTP port.
**Default:** `8765`
**Example:**
```bash
export MCP_PORT=8080
```
**Override:** Use `--port` flag.
---
### MCP_HOST
**Purpose:** Default MCP HTTP host.
**Default:** `127.0.0.1`
**Example:**
```bash
export MCP_HOST=0.0.0.0
```
**Override:** Use `--host` flag.
---
## Examples
### Development Environment
```bash
# Debug mode
export SKILL_SEEKERS_DEBUG=1
export SKILL_SEEKERS_LOG_LEVEL=DEBUG
# Custom paths
export SKILL_SEEKERS_HOME=./.skill-seekers
export SKILL_SEEKERS_OUTPUT=./output
# Faster scraping for testing
export SKILL_SEEKERS_RATE_LIMIT=0.1
export SKILL_SEEKERS_MAX_PAGES=50
```
### Production Environment
```bash
# API keys
export ANTHROPIC_API_KEY=sk-ant-...
export GITHUB_TOKEN=ghp_...
# Custom output directory
export SKILL_SEEKERS_OUTPUT=/var/www/skills
# Conservative scraping
export SKILL_SEEKERS_RATE_LIMIT=1.0
export SKILL_SEEKERS_WORKERS=2
# Logging
export SKILL_SEEKERS_LOG_FILE=/var/log/skill-seekers.log
export SKILL_SEEKERS_LOG_LEVEL=WARNING
```
### CI/CD Environment
```bash
# Non-interactive
export SKILL_SEEKERS_LOG_LEVEL=ERROR
# API keys from secrets
export ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY_SECRET}
export GITHUB_TOKEN=${GITHUB_TOKEN_SECRET}
# Fresh runs (no cache)
export SKILL_SEEKERS_NO_CACHE=1
```
### Multi-Platform Setup
```bash
# All API keys
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=AIza...
export OPENAI_API_KEY=sk-...
export GITHUB_TOKEN=ghp_...
# Vector databases
export CHROMA_URL=http://localhost:8000
export WEAVIATE_URL=http://localhost:8080
export WEAVIATE_API_KEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```
---
## Configuration File
Environment variables can also be set in a `.env` file:
```bash
# .env file
ANTHROPIC_API_KEY=sk-ant-...
GITHUB_TOKEN=ghp_...
SKILL_SEEKERS_OUTPUT=./output
SKILL_SEEKERS_RATE_LIMIT=0.5
```
Load with:
```bash
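# Loaded automatically if python-dotenv is installed.
# Or load it manually. `set -a` exports every variable the file defines, and
# sourcing skips comment lines, which `export $(cat .env | xargs)` does not:
set -a
. ./.env
set +a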
```
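
If python-dotenv is not available, a file in the simple `KEY=value` form shown above can also be read with a few lines of Python. This is a deliberately minimal sketch: it skips blank lines and comments and strips matching quotes, but does not handle `export` prefixes, inline comments, or variable interpolation.

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; ignore blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


sample = "# .env file\nGITHUB_TOKEN=ghp_...\nSKILL_SEEKERS_RATE_LIMIT=0.5\n"
print(parse_env_file(sample))
```

To apply the result, pass it to `os.environ.update(...)` before the settings are read.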
---
## Priority Order
Settings are applied in this order (later overrides earlier):
1. Default values
2. Environment variables
3. Configuration file (the JSON config passed to the command)
4. Command-line flags
Example:
```bash
# Default: rate_limit = 0.5
export SKILL_SEEKERS_RATE_LIMIT=1.0 # Env var overrides default
# Config file: rate_limit = 0.2 # Config overrides env
skill-seekers scrape --rate-limit 2.0 # Flag overrides all
```
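
The same layering can be written as a small resolver in which each later layer replaces the value from the earlier one when it is present. The function name and signature are hypothetical; only the ordering comes from the list above.

```python
from typing import Any, Callable, Optional


def resolve_setting(
    default: Any,
    env_value: Optional[Any] = None,
    config_value: Optional[Any] = None,
    cli_value: Optional[Any] = None,
    cast: Callable[[Any], Any] = float,
) -> Any:
    """Apply default < environment < config file < CLI flag, skipping unset layers."""
    value = default
    for candidate in (env_value, config_value, cli_value):
        if candidate is not None:
            value = cast(candidate)
    return value


# Mirrors the shell example: env says 1.0, config says 0.2, flag says 2.0.
print(resolve_setting(0.5, env_value="1.0", config_value=0.2, cli_value=2.0))
```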
---
## Security Best Practices
### Never commit API keys
```bash
# Add to .gitignore
echo ".env" >> .gitignore
echo "*.key" >> .gitignore
```
### Use secret management
```bash
# macOS Keychain
export ANTHROPIC_API_KEY=$(security find-generic-password -s "anthropic-api" -w)
# Linux Secret Service (with secret-tool)
export ANTHROPIC_API_KEY=$(secret-tool lookup service anthropic)
# 1Password CLI
export ANTHROPIC_API_KEY=$(op read "op://vault/anthropic/credential")
```
### File permissions
```bash
# Restrict .env file
chmod 600 .env
```
---
## Troubleshooting
### Variable not recognized
```bash
# Check if set
echo $ANTHROPIC_API_KEY
# Check in Python
python -c "import os; print(os.getenv('ANTHROPIC_API_KEY'))"
```
### Priority issues
```bash
# See effective configuration
skill-seekers config --show
```
### Path expansion
```bash
# Use a full path or $HOME instead of a tilde
export SKILL_SEEKERS_HOME="$HOME/.skill-seekers"
# NOT: "~/.skill-seekers" (a quoted tilde is never expanded, and unquoted behavior varies by shell)
```
---
## See Also
- [CLI Reference](CLI_REFERENCE.md) - Command reference
- [Config Format](CONFIG_FORMAT.md) - JSON configuration
---
*For platform-specific setup, see the [Installation Guide](../getting-started/01-installation.md).*
