docs: complete documentation overhaul with v3.1.0 release notes and zh-CN translations

Documentation restructure:
- New docs/getting-started/ guide (4 files: install, quick-start, first-skill, next-steps)
- New docs/user-guide/ section (6 files: core concepts through troubleshooting)
- New docs/reference/ section (CLI_REFERENCE, CONFIG_FORMAT, ENVIRONMENT_VARIABLES, MCP_REFERENCE)
- New docs/advanced/ section (custom-workflows, mcp-server, multi-source)
- New docs/ARCHITECTURE.md - system architecture overview
- Archived legacy files (QUICKSTART.md, QUICK_REFERENCE.md, docs/guides/USAGE.md) to docs/archive/legacy/

Chinese (zh-CN) translations:
- Full zh-CN mirror of all user-facing docs (getting-started, user-guide, reference, advanced)
- GitHub Actions workflow for translation sync (.github/workflows/translate-docs.yml)
- Translation sync checker script (scripts/check_translation_sync.sh)
- Translation helper script (scripts/translate_doc.py)

Content updates:
- CHANGELOG.md: [Unreleased] → [3.1.0] - 2026-02-22
- README.md: updated with new doc structure links
- AGENTS.md: updated agent documentation
- docs/features/UNIFIED_SCRAPING.md: updated for unified scraper workflow JSON config

Analysis/planning artifacts (kept for reference):
- DOCUMENTATION_OVERHAUL_PLAN.md, DOCUMENTATION_OVERHAUL_SUMMARY.md
- FEATURE_GAP_ANALYSIS.md, IMPLEMENTATION_GAPS_ANALYSIS.md, CREATE_COMMAND_COVERAGE_ANALYSIS.md
- CHINESE_TRANSLATION_IMPLEMENTATION_SUMMARY.md, ISSUE_260_UPDATE.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yusyus
2026-02-22 01:01:51 +03:00
parent 22bdd4f5f6
commit ba9a8ff8b5
69 changed files with 31304 additions and 246 deletions


@@ -0,0 +1,400 @@
# Custom Workflows Guide
> **Skill Seekers v3.1.0**
> **Create custom AI enhancement workflows**
---
## What are Custom Workflows?
Workflows are YAML-defined, multi-stage AI enhancement pipelines:
```yaml
my-workflow.yaml
├── name
├── description
├── variables (optional)
└── stages (1-10)
    ├── name
    ├── type (builtin/custom)
    ├── target (skill_md/references/)
    ├── prompt
    └── uses_history (optional)
```
---
## Basic Workflow Structure
```yaml
name: my-custom
description: Custom enhancement workflow
stages:
  - name: stage-one
    type: builtin
    target: skill_md
    prompt: |
      Improve the SKILL.md by adding...
  - name: stage-two
    type: custom
    target: references
    prompt: |
      Enhance the references by...
```
---
## Workflow Fields
### Top Level
| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | Workflow identifier |
| `description` | No | Human-readable description |
| `variables` | No | Configurable variables |
| `stages` | Yes | Array of stage definitions |
### Stage Fields
| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | Stage identifier |
| `type` | Yes | `builtin` or `custom` |
| `target` | Yes | `skill_md` or `references` |
| `prompt` | Yes | AI prompt text |
| `uses_history` | No | Access previous stage results |
---
## Creating Your First Workflow
### Example: Performance Analysis
```yaml
# performance.yaml
name: performance-focus
description: Analyze and document performance characteristics
variables:
  target_latency: "100ms"
  target_throughput: "1000 req/s"
stages:
  - name: performance-overview
    type: builtin
    target: skill_md
    prompt: |
      Add a "Performance" section to SKILL.md covering:
      - Benchmark results
      - Performance characteristics
      - Resource requirements
  - name: optimization-guide
    type: custom
    target: references
    uses_history: true
    prompt: |
      Create an optimization guide with:
      - Target latency: {target_latency}
      - Target throughput: {target_throughput}
      - Common bottlenecks
      - Optimization techniques
```
### Install and Use
```bash
# Add workflow
skill-seekers workflows add performance.yaml
# Use it
skill-seekers create <source> --enhance-workflow performance-focus
# With custom variables
skill-seekers create <source> \
--enhance-workflow performance-focus \
--var target_latency=50ms \
--var target_throughput=5000req/s
```
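If you drive the CLI from a script, the same command can be assembled programmatically. The following is a minimal sketch that assumes only the flags shown in this guide (`--enhance-workflow`, `--var`); `build_create_command` is a hypothetical helper, not part of Skill Seekers.

```python
import shlex


def build_create_command(source, workflow, variables=None):
    """Build the argv list for `skill-seekers create` (hypothetical helper)."""
    argv = ["skill-seekers", "create", source, "--enhance-workflow", workflow]
    # Each override becomes its own `--var key=value` pair, as in the shell example.
    for key, value in (variables or {}).items():
        argv += ["--var", f"{key}={value}"]
    return argv


cmd = build_create_command(
    "https://react.dev/",
    "performance-focus",
    {"target_latency": "50ms", "target_throughput": "5000req/s"},
)
# Print a shell-quoted form for inspection; nothing is executed here.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it; building a list rather than a string avoids shell-quoting problems with values such as `5000req/s`.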
---
## Stage Types
### builtin
Uses built-in enhancement logic:
```yaml
stages:
  - name: structure-improvement
    type: builtin
    target: skill_md
    prompt: "Improve document structure"
```
### custom
Full custom prompt control:
```yaml
stages:
  - name: custom-analysis
    type: custom
    target: skill_md
    prompt: |
      Your detailed custom prompt here...
      Can use {variables} and {history}
```
---
## Targets
### skill_md
Enhances the main SKILL.md file:
```yaml
stages:
  - name: improve-skill
    target: skill_md
    prompt: "Add comprehensive overview section"
```
### references
Enhances reference files:
```yaml
stages:
  - name: improve-refs
    target: references
    prompt: "Add cross-references between files"
```
---
## Variables
### Defining Variables
```yaml
variables:
  audience: "beginners"
  focus_area: "security"
  include_examples: true
```
### Using Variables
```yaml
stages:
  - name: customize
    prompt: |
      Tailor content for {audience}.
      Focus on {focus_area}.
      Include examples: {include_examples}
```
### Overriding at Runtime
```bash
skill-seekers create <source> \
--enhance-workflow my-workflow \
--var audience=experts \
--var focus_area=performance
```
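To make the precedence concrete, here is a small sketch of how `{name}` placeholders could be filled, with runtime `--var` values overriding the workflow defaults. This illustrates the behaviour described above and is not the tool's actual implementation; `render_prompt` is a hypothetical name.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_prompt(prompt, defaults, overrides=None):
    """Fill {name} placeholders; overrides win over workflow defaults."""
    values = {**defaults, **(overrides or {})}

    def replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"Undefined variable: {name}")
        return str(values[name])

    return PLACEHOLDER.sub(replace, prompt)


# Defaults mirror the `variables:` block; overrides mirror the `--var` flags.
defaults = {"audience": "beginners", "focus_area": "security", "include_examples": True}
prompt = "Tailor content for {audience}. Focus on {focus_area}. Include examples: {include_examples}"
print(render_prompt(prompt, defaults, {"audience": "experts", "focus_area": "performance"}))
```

Variables that are not overridden (here `include_examples`) keep their default value.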
---
## History Passing
Access results from previous stages:
```yaml
stages:
  - name: analyze
    type: custom
    target: skill_md
    prompt: "Analyze security features"
  - name: document
    type: custom
    target: skill_md
    uses_history: true
    prompt: |
      Based on previous analysis:
      {previous_results}
      Create documentation...
```
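The data flow between stages can be pictured with a short sketch. It assumes that `{previous_results}` stands for the output of the immediately preceding stage, which this guide does not spell out; `run_stages` and the echoing "enhancer" are hypothetical stand-ins for the real AI call.

```python
def run_stages(stages, enhance):
    """Run stages in order; `enhance` is a stand-in for the AI call."""
    history = []
    for stage in stages:
        prompt = stage["prompt"]
        if stage.get("uses_history") and history:
            # Substitute the most recent result for the placeholder.
            prompt = prompt.replace("{previous_results}", history[-1])
        history.append(enhance(stage["name"], prompt))
    return history


stages = [
    {"name": "analyze", "prompt": "Analyze security features"},
    {"name": "document", "uses_history": True,
     "prompt": "Based on previous analysis:\n{previous_results}\nCreate documentation..."},
]

# Fake enhancer that just echoes its input, so the data flow is visible.
results = run_stages(stages, lambda name, prompt: f"[{name}] {prompt}")
print(results[1])
```

A stage without `uses_history: true` receives its prompt unchanged, so the placeholder is only meaningful when the flag is set.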
---
## Advanced Example: Security Review
```yaml
name: comprehensive-security
description: Multi-stage security analysis
variables:
  compliance_framework: "OWASP Top 10"
  risk_level: "high"
stages:
  - name: asset-inventory
    type: builtin
    target: skill_md
    prompt: |
      Document all security-sensitive components:
      - Authentication mechanisms
      - Authorization checks
      - Data validation
      - Encryption usage
  - name: threat-analysis
    type: custom
    target: skill_md
    uses_history: true
    prompt: |
      Based on assets: {all_history}
      Analyze threats for {compliance_framework}:
      - Threat vectors
      - Attack scenarios
      - Risk ratings ({risk_level} focus)
  - name: mitigation-guide
    type: custom
    target: references
    uses_history: true
    prompt: |
      Create mitigation guide:
      - Countermeasures
      - Best practices
      - Code examples
      - Testing strategies
```
---
## Validation
### Validate Before Installing
```bash
skill-seekers workflows validate ./my-workflow.yaml
```
### Common Errors
| Error | Cause | Fix |
|-------|-------|-----|
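| `Missing 'stages'` | The workflow has no `stages` array | Add a `stages:` array with at least one stage |
| `Invalid type` | `type` is neither `builtin` nor `custom` | Correct the stage's `type` field |
| `Undefined variable` | A prompt uses a `{variable}` that is never defined | Define it under `variables:` |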
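The three errors in the table can be approximated with a few lines of Python. This is a rough sketch of the checks, not the logic behind `skill-seekers workflows validate`; it assumes the workflow has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`), and it treats the history placeholders used in this guide as always defined.

```python
import re

VALID_TYPES = {"builtin", "custom"}
# Placeholders this guide uses for stage history rather than user variables.
HISTORY_NAMES = {"previous_results", "all_history", "history"}


def validate_workflow(workflow):
    """Return a list of error strings for a workflow given as a dict."""
    errors = []
    stages = workflow.get("stages")
    if not stages:
        return ["Missing 'stages'"]
    defined = set(workflow.get("variables") or {}) | HISTORY_NAMES
    for stage in stages:
        name = stage.get("name", "<unnamed>")
        if stage.get("type") not in VALID_TYPES:
            errors.append(f"Invalid type in stage {name}")
        # Every {placeholder} in the prompt must be a known variable.
        for var in re.findall(r"\{(\w+)\}", stage.get("prompt", "")):
            if var not in defined:
                errors.append(f"Undefined variable '{var}' in stage {name}")
    return errors


# Deliberately broken example: bad type and an undefined variable.
workflow = {
    "name": "demo",
    "stages": [
        {"name": "s1", "type": "plugin", "target": "skill_md",
         "prompt": "Write for {audience}"},
    ],
}
for error in validate_workflow(workflow):
    print(error)
```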
---
## Best Practices
### 1. Start Simple
```yaml
# Start with 1-2 stages
name: simple
description: Simple workflow
stages:
  - name: improve
    type: builtin
    target: skill_md
    prompt: "Improve SKILL.md"
```
### 2. Use Clear Stage Names
```yaml
# Good
stages:
  - name: security-overview
  - name: vulnerability-analysis
# Bad
stages:
  - name: stage1
  - name: step2
```
### 3. Document Variables
```yaml
variables:
  # Target audience level: beginner, intermediate, expert
  audience: "intermediate"
  # Security focus area: owasp, pci, hipaa
  compliance: "owasp"
```
### 4. Test Incrementally
```bash
# Test with dry run
skill-seekers create <source> \
--enhance-workflow my-workflow \
--workflow-dry-run
# Then actually run
skill-seekers create <source> \
--enhance-workflow my-workflow
```
### 5. Chain for Complex Analysis
```bash
# Use multiple workflows
skill-seekers create <source> \
--enhance-workflow security-focus \
--enhance-workflow performance-focus
```
---
## Sharing Workflows
### Export Workflow
```bash
# Get workflow content
skill-seekers workflows show my-workflow > my-workflow.yaml
```
### Share with Team
```bash
# Add to version control
git add my-workflow.yaml
git commit -m "Add custom security workflow"
# Team members install
skill-seekers workflows add my-workflow.yaml
```
### Publish
Submit to Skill Seekers community:
- GitHub Discussions
- Skill Seekers website
- Documentation contributions
---
## See Also
- [Workflows Guide](../user-guide/05-workflows.md) - Using workflows
- [MCP Reference](../reference/MCP_REFERENCE.md) - Workflows via MCP
- [Enhancement Guide](../user-guide/03-enhancement.md) - Enhancement fundamentals

docs/advanced/mcp-server.md Normal file

@@ -0,0 +1,322 @@
# MCP Server Setup Guide
> **Skill Seekers v3.1.0**
> **Integrate with AI agents via Model Context Protocol**
---
## What is MCP?
MCP (Model Context Protocol) lets AI agents like Claude Code control Skill Seekers through natural language:
```
You: "Scrape the React documentation"
Claude: ▶️ scrape_docs({"url": "https://react.dev/"})
✅ Done! Created output/react/
```
---
## Installation
```bash
# Install with MCP support
pip install skill-seekers[mcp]
# Verify
skill-seekers-mcp --version
```
---
## Transport Modes
### stdio Mode (Default)
For Claude Code, VS Code + Cline:
```bash
skill-seekers-mcp
```
**Use when:**
- Running in Claude Code
- Direct integration with terminal-based agents
- Simple local setup
---
### HTTP Mode
For Cursor, Windsurf, HTTP clients:
```bash
# Start HTTP server
skill-seekers-mcp --transport http --port 8765
# Custom host
skill-seekers-mcp --transport http --host 0.0.0.0 --port 8765
```
**Use when:**
- IDE integration (Cursor, Windsurf)
- Remote access needed
- Multiple clients
---
## Claude Code Integration
### Automatic Setup
```bash
# From a shell, register the server with the Claude Code CLI:
claude mcp add skill-seekers -- skill-seekers-mcp
```
Or manually add to `~/.claude/mcp.json`:
```json
{
  "mcpServers": {
    "skill-seekers": {
      "command": "skill-seekers-mcp",
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "GITHUB_TOKEN": "ghp_..."
      }
    }
  }
}
```
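Editing the file by hand risks clobbering other servers that are already registered. As an illustration, the sketch below merges one entry into an existing `mcpServers` map and leaves everything else untouched. `register_server` is a hypothetical helper, and the demo writes to a temporary file rather than to your real config.

```python
import json
import tempfile
from pathlib import Path


def register_server(config_path, name, command, env=None):
    """Add or update one entry under "mcpServers", keeping other entries."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    entry = {"command": command}
    if env:
        entry["env"] = env
    config.setdefault("mcpServers", {})[name] = entry
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file; point config_path at your real config when ready.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "mcp.json"
    config = register_server(demo_path, "skill-seekers", "skill-seekers-mcp")
    print(json.dumps(config, indent=2))
```

To pass API keys without hardcoding them, build the `env` argument from `os.environ` at call time.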
### Usage
Once connected, ask Claude:
```
"List available configs"
"Scrape the Django documentation"
"Package output/react for Gemini"
"Enhance output/my-skill with security-focus workflow"
```
---
## Cursor IDE Integration
### Setup
1. Start MCP server:
```bash
skill-seekers-mcp --transport http --port 8765
```
2. In Cursor Settings → MCP:
- Name: `skill-seekers`
- URL: `http://localhost:8765`
### Usage
In Cursor chat:
```
"Create a skill from the current project"
"Analyze this codebase and generate a cursorrules file"
```
---
## Windsurf Integration
### Setup
1. Start MCP server:
```bash
skill-seekers-mcp --transport http --port 8765
```
2. In Windsurf Settings:
- Add MCP server endpoint: `http://localhost:8765`
---
## Available Tools
The 27 tools listed below are organized by category:
### Core Tools (9)
- `list_configs` - List presets
- `generate_config` - Create config from URL
- `validate_config` - Check config
- `estimate_pages` - Page estimation
- `scrape_docs` - Scrape documentation
- `package_skill` - Package skill
- `upload_skill` - Upload to platform
- `enhance_skill` - AI enhancement
- `install_skill` - Complete workflow
### Extended Tools (9)
- `scrape_github` - GitHub repo
- `scrape_pdf` - PDF extraction
- `scrape_codebase` - Local code
- `unified_scrape` - Multi-source
- `detect_patterns` - Pattern detection
- `extract_test_examples` - Test examples
- `build_how_to_guides` - How-to guides
- `extract_config_patterns` - Config patterns
- `detect_conflicts` - Doc/code conflicts
### Config Sources (5)
- `add_config_source` - Register git source
- `list_config_sources` - List sources
- `remove_config_source` - Remove source
- `fetch_config` - Fetch configs
- `submit_config` - Submit configs
### Vector DB (4)
- `export_to_weaviate`
- `export_to_chroma`
- `export_to_faiss`
- `export_to_qdrant`
See [MCP Reference](../reference/MCP_REFERENCE.md) for full details.
---
## Common Workflows
### Workflow 1: Documentation Skill
```
User: "Create a skill from React docs"
Claude: ▶️ scrape_docs({"url": "https://react.dev/"})
⏳ Scraping...
✅ Created output/react/
▶️ package_skill({"skill_directory": "output/react/", "target": "claude"})
✅ Created output/react-claude.zip
Skill ready! Upload to Claude?
```
### Workflow 2: GitHub Analysis
```
User: "Analyze the facebook/react repo"
Claude: ▶️ scrape_github({"repo": "facebook/react"})
⏳ Analyzing...
✅ Created output/react/
▶️ enhance_skill({"skill_directory": "output/react/", "workflow": "architecture-comprehensive"})
✅ Enhanced with architecture analysis
```
### Workflow 3: Multi-Platform Export
```
User: "Create Django skill for all platforms"
Claude: ▶️ scrape_docs({"config": "django"})
✅ Created output/django/
▶️ package_skill({"skill_directory": "output/django/", "target": "claude"})
▶️ package_skill({"skill_directory": "output/django/", "target": "gemini"})
▶️ package_skill({"skill_directory": "output/django/", "target": "openai"})
✅ Created packages for all platforms
```
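Under the hood, each ▶️ line in these transcripts corresponds to an MCP `tools/call` request sent by the agent. The sketch below only builds the JSON-RPC messages for Workflow 3 so the shape is visible. The tool and argument names are copied from the transcript above, and a real client would also perform the MCP initialization handshake before sending them.

```python
import json
from itertools import count

_ids = count(1)


def tool_call(name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request as defined by MCP."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# One scrape, then one packaging call per target platform.
calls = [tool_call("scrape_docs", {"config": "django"})]
for target in ("claude", "gemini", "openai"):
    calls.append(
        tool_call("package_skill", {"skill_directory": "output/django/", "target": target})
    )

for call in calls:
    print(json.dumps(call))
```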
---
## Configuration
### Environment Variables
Set these in the `env` block of `~/.claude/mcp.json`, or export them before starting the server:
```bash
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=AIza...
export OPENAI_API_KEY=sk-...
export GITHUB_TOKEN=ghp_...
```
### Server Options
```bash
# Debug mode
skill-seekers-mcp --verbose
# Custom port (HTTP transport only)
skill-seekers-mcp --transport http --port 8080
# Allow all origins (CORS)
skill-seekers-mcp --cors
```
---
## Security
### Local Only (stdio)
```bash
# Only accessible by local Claude Code
skill-seekers-mcp
```
### HTTP with Auth
```bash
# Use reverse proxy with auth
# nginx, traefik, etc.
```
### API Key Protection
```bash
# Don't hardcode keys
# Use environment variables
# Or secret management
```
---
## Troubleshooting
### "Server not found"
```bash
# Check if running
curl http://localhost:8765/health
# Restart
skill-seekers-mcp --transport http --port 8765
```
### "Tool not available"
```bash
# Check version
skill-seekers-mcp --version
# Update
pip install --upgrade skill-seekers[mcp]
```
### "Connection refused"
```bash
# Check port
lsof -i :8765
# Use a different port
skill-seekers-mcp --transport http --port 8766
```
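Where `lsof` is not available, the same check can be done from Python's standard library. This is a small sketch; `port_in_use` and `first_free_port` are hypothetical helpers, and 8765 is simply the port used throughout this guide.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start=8765, attempts=20):
    """Return the first port at or after `start` that nothing is listening on."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError("no free port found")


print(first_free_port())
```

If `port_in_use(8765)` is false while the client still reports "Connection refused", the server is most likely not running at all.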
---
## See Also
- [MCP Reference](../reference/MCP_REFERENCE.md) - Complete tool reference
- [MCP Tools Deep Dive](mcp-tools.md) - Advanced usage
- [MCP Protocol](https://modelcontextprotocol.io/) - Official MCP docs


@@ -0,0 +1,439 @@
# Multi-Source Scraping Guide
> **Skill Seekers v3.1.0**
> **Combine documentation, code, and PDFs into one skill**
---
## What is Multi-Source Scraping?
Combine multiple sources into a single, comprehensive skill:
```
┌───────────────┐
│ Documentation │──┐
│ (Web docs)    │  │
└───────────────┘  │
┌───────────────┐  │     ┌──────────────────┐
│ GitHub Repo   │──┼────▶│ Unified Skill    │
│ (Source code) │  │     │ (Single source   │
└───────────────┘  │     │ of truth)        │
                   │     └──────────────────┘
┌───────────────┐  │
│ PDF Manual    │──┘
│ (Reference)   │
└───────────────┘
```
---
## When to Use Multi-Source
### Use Cases
| Scenario | Sources | Benefit |
|----------|---------|---------|
| Framework + Examples | Docs + GitHub repo | Theory + practice |
| Product + API | Docs + OpenAPI spec | Usage + reference |
| Legacy + Current | PDF + Web docs | Complete history |
| Internal + External | Local code + Public docs | Full context |
### Benefits
- **Single source of truth** - One skill with all context
- **Conflict detection** - Find doc/code discrepancies
- **Cross-references** - Link between sources
- **Comprehensive** - No gaps in knowledge
---
## Creating Unified Configs
### Basic Structure
```json
{
  "name": "my-framework-complete",
  "description": "Complete documentation and code",
  "merge_mode": "claude-enhanced",
  "sources": [
    {
      "type": "docs",
      "name": "documentation",
      "base_url": "https://docs.example.com/"
    },
    {
      "type": "github",
      "name": "source-code",
      "repo": "owner/repo"
    }
  ]
}
```
---
## Source Types
### 1. Documentation
```json
{
  "type": "docs",
  "name": "official-docs",
  "base_url": "https://docs.framework.com/",
  "max_pages": 500,
  "categories": {
    "getting_started": ["intro", "quickstart"],
    "api": ["reference", "api"]
  }
}
```
### 2. GitHub Repository
```json
{
  "type": "github",
  "name": "source-code",
  "repo": "facebook/react",
  "fetch_issues": true,
  "max_issues": 100,
  "enable_codebase_analysis": true
}
```
### 3. PDF Document
```json
{
  "type": "pdf",
  "name": "legacy-manual",
  "pdf_path": "docs/legacy-manual.pdf",
  "enable_ocr": false
}
```
### 4. Local Codebase
```json
{
  "type": "local",
  "name": "internal-tools",
  "directory": "./internal-lib",
  "languages": ["Python", "JavaScript"]
}
```
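Each source type above carries one locating key (`base_url`, `repo`, `pdf_path`, `directory`). A quick pre-flight check along these lines can catch typos before a long scrape starts. The required-key mapping is inferred from the examples in this section and is an assumption; the Config Format reference remains the authority.

```python
# Locating key per source type, inferred from the examples in this guide.
REQUIRED_KEY = {
    "docs": "base_url",
    "github": "repo",
    "pdf": "pdf_path",
    "local": "directory",
}


def check_sources(config):
    """Return a list of problems found in a unified config's sources."""
    problems = []
    for index, source in enumerate(config.get("sources", [])):
        kind = source.get("type")
        if kind not in REQUIRED_KEY:
            problems.append(f"source {index}: unknown type {kind!r}")
        elif REQUIRED_KEY[kind] not in source:
            problems.append(f"source {index}: {kind} needs '{REQUIRED_KEY[kind]}'")
    return problems


# Deliberately broken config: a github source without `repo`, and an unknown type.
config = {
    "name": "my-framework-complete",
    "sources": [
        {"type": "docs", "name": "documentation", "base_url": "https://docs.example.com/"},
        {"type": "github", "name": "source-code"},
        {"type": "wiki", "name": "team-wiki"},
    ],
}
print(check_sources(config))
```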
---
## Complete Example
### React Complete Skill
```json
{
  "name": "react-complete",
  "description": "React - docs, source, and guides",
  "merge_mode": "claude-enhanced",
  "sources": [
    {
      "type": "docs",
      "name": "react-docs",
      "base_url": "https://react.dev/",
      "max_pages": 300,
      "categories": {
        "getting_started": ["learn", "tutorial"],
        "api": ["reference", "hooks"],
        "advanced": ["concurrent", "suspense"]
      }
    },
    {
      "type": "github",
      "name": "react-source",
      "repo": "facebook/react",
      "fetch_issues": true,
      "max_issues": 50,
      "enable_codebase_analysis": true,
      "code_analysis_depth": "deep"
    },
    {
      "type": "pdf",
      "name": "react-patterns",
      "pdf_path": "downloads/react-patterns.pdf"
    }
  ],
  "conflict_detection": {
    "enabled": true,
    "rules": [
      {
        "field": "api_signature",
        "action": "flag_mismatch"
      },
      {
        "field": "version",
        "action": "warn_outdated"
      }
    ]
  },
  "output_structure": {
    "group_by_source": false,
    "cross_reference": true
  }
}
```
---
## Running Unified Scraping
### Basic Command
```bash
skill-seekers unified --config react-complete.json
```
### With Options
```bash
# Fresh start (ignore cache)
skill-seekers unified --config react-complete.json --fresh
# Dry run
skill-seekers unified --config react-complete.json --dry-run
# Rule-based merging
skill-seekers unified --config react-complete.json --merge-mode rule-based
```
---
## Merge Modes
### claude-enhanced (Default)
Uses AI to intelligently merge sources:
- Detects relationships between content
- Resolves conflicts intelligently
- Creates cross-references
- Best quality, slower
```bash
skill-seekers unified --config my-config.json --merge-mode claude-enhanced
```
### rule-based
Uses defined rules for merging:
- Faster
- Deterministic
- Less sophisticated
```bash
skill-seekers unified --config my-config.json --merge-mode rule-based
```
---
## Conflict Detection
### Automatic Detection
Finds discrepancies between sources:
```json
{
  "conflict_detection": {
    "enabled": true,
    "rules": [
      {
        "field": "api_signature",
        "action": "flag_mismatch"
      },
      {
        "field": "version",
        "action": "warn_outdated"
      },
      {
        "field": "deprecation",
        "action": "highlight"
      }
    ]
  }
}
```
### Conflict Report
After scraping, check for conflicts:
```bash
# Conflicts are written to the skill's metadata directory
ls output/react-complete/.skill-seekers/conflicts.json
# Or, from an MCP client, call the detect_conflicts tool:
#   detect_conflicts({
#     "docs_source": "output/react-docs",
#     "code_source": "output/react-source"
#   })
```
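To give a feel for what a `flag_mismatch` rule on `api_signature` might do, the sketch below compares documented signatures against signatures found in code. It is purely illustrative: the function name, the result format, and the sample signatures are made up for this example and do not describe the tool's actual algorithm or the schema of `conflicts.json`.

```python
def flag_mismatches(doc_signatures, code_signatures):
    """Compare two {function_name: signature} maps and report differences."""
    conflicts = []
    for name in sorted(set(doc_signatures) | set(code_signatures)):
        documented = doc_signatures.get(name)
        actual = code_signatures.get(name)
        if documented is None:
            conflicts.append({"name": name, "issue": "undocumented"})
        elif actual is None:
            conflicts.append({"name": name, "issue": "missing_in_code"})
        elif documented != actual:
            conflicts.append({"name": name, "issue": "signature_mismatch",
                              "docs": documented, "code": actual})
    return conflicts


# Made-up sample data, not real library signatures.
docs = {"useState": "(initialState)", "useMemo": "(fn)"}
code = {"useState": "(initialState)", "useMemo": "(fn, deps)", "useId": "()"}
for conflict in flag_mismatches(docs, code):
    print(conflict)
```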
---
## Output Structure
### Merged Output
```
output/react-complete/
├── SKILL.md                  # Combined skill
├── references/
│   ├── index.md              # Master index
│   ├── getting_started.md    # From docs
│   ├── api_reference.md      # From docs
│   ├── source_overview.md    # From GitHub
│   ├── code_examples.md      # From GitHub
│   └── patterns.md           # From PDF
├── .skill-seekers/
│   ├── manifest.json         # Metadata
│   ├── sources.json          # Source list
│   └── conflicts.json        # Detected conflicts
└── cross-references.json     # Links between sources
```
---
## Best Practices
### 1. Name Sources Clearly
```json
{
  "sources": [
    {"type": "docs", "name": "official-docs"},
    {"type": "github", "name": "source-code"},
    {"type": "pdf", "name": "legacy-reference"}
  ]
}
```
### 2. Limit Source Scope
Restrict `file_patterns` to the core files and exclude tests and docs:
```json
{
  "type": "github",
  "name": "core-source",
  "repo": "owner/repo",
  "file_patterns": ["src/**/*.py"],
  "exclude_patterns": ["tests/**", "docs/**"]
}
```
### 3. Enable Conflict Detection
```json
{
  "conflict_detection": {
    "enabled": true
  }
}
```
### 4. Use Appropriate Merge Mode
- **claude-enhanced** - Best quality, for important skills
- **rule-based** - Faster, for testing or large datasets
### 5. Test Incrementally
```bash
# Test with one source first
skill-seekers create <source1>
# Then add sources
skill-seekers unified --config my-config.json --dry-run
```
---
## Troubleshooting
### "Source not found"
```bash
# Check all sources exist
curl -I https://docs.example.com/
ls downloads/manual.pdf
```
### "Merge conflicts"
```bash
# Check conflicts report
cat output/my-skill/.skill-seekers/conflicts.json
# Adjust merge_mode
skill-seekers unified --config my-config.json --merge-mode rule-based
```
### "Out of memory"
```bash
# Process sources separately
# Then merge manually
```
---
## Examples
### Framework + Examples
```json
{
  "name": "django-complete",
  "sources": [
    {"type": "docs", "base_url": "https://docs.djangoproject.com/"},
    {"type": "github", "repo": "django/django", "fetch_issues": false}
  ]
}
```
### API + Documentation
```json
{
  "name": "stripe-complete",
  "sources": [
    {"type": "docs", "base_url": "https://stripe.com/docs"},
    {"type": "pdf", "pdf_path": "stripe-api-reference.pdf"}
  ]
}
```
### Legacy + Current
```json
{
  "name": "product-docs",
  "sources": [
    {"type": "docs", "base_url": "https://docs.example.com/v2/"},
    {"type": "pdf", "pdf_path": "v1-legacy-manual.pdf"}
  ]
}
```
---
## See Also
- [Config Format](../reference/CONFIG_FORMAT.md) - Full JSON specification
- [Scraping Guide](../user-guide/02-scraping.md) - Individual source options
- [MCP Reference](../reference/MCP_REFERENCE.md) - unified_scrape tool