docs: complete documentation overhaul with v3.1.0 release notes and zh-CN translations
Documentation restructure:
- New docs/getting-started/ guide (4 files: install, quick-start, first-skill, next-steps)
- New docs/user-guide/ section (6 files: core concepts through troubleshooting)
- New docs/reference/ section (CLI_REFERENCE, CONFIG_FORMAT, ENVIRONMENT_VARIABLES, MCP_REFERENCE)
- New docs/advanced/ section (custom-workflows, mcp-server, multi-source)
- New docs/ARCHITECTURE.md - system architecture overview
- Archived legacy files (QUICKSTART.md, QUICK_REFERENCE.md, docs/guides/USAGE.md) to docs/archive/legacy/

Chinese (zh-CN) translations:
- Full zh-CN mirror of all user-facing docs (getting-started, user-guide, reference, advanced)
- GitHub Actions workflow for translation sync (.github/workflows/translate-docs.yml)
- Translation sync checker script (scripts/check_translation_sync.sh)
- Translation helper script (scripts/translate_doc.py)

Content updates:
- CHANGELOG.md: [Unreleased] → [3.1.0] - 2026-02-22
- README.md: updated with new doc structure links
- AGENTS.md: updated agent documentation
- docs/features/UNIFIED_SCRAPING.md: updated for unified scraper workflow JSON config

Analysis/planning artifacts (kept for reference):
- DOCUMENTATION_OVERHAUL_PLAN.md, DOCUMENTATION_OVERHAUL_SUMMARY.md
- FEATURE_GAP_ANALYSIS.md, IMPLEMENTATION_GAPS_ANALYSIS.md, CREATE_COMMAND_COVERAGE_ANALYSIS.md
- CHINESE_TRANSLATION_IMPLEMENTATION_SUMMARY.md, ISSUE_260_UPDATE.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
400
docs/advanced/custom-workflows.md
Normal file
@@ -0,0 +1,400 @@
|
||||
# Custom Workflows Guide
|
||||
|
||||
> **Skill Seekers v3.1.0**
|
||||
> **Create custom AI enhancement workflows**
|
||||
|
||||
---
|
||||
|
||||
## What are Custom Workflows?
|
||||
|
||||
Workflows are YAML-defined, multi-stage AI enhancement pipelines:
|
||||
|
||||
```
my-workflow.yaml
├── name
├── description
├── variables (optional)
└── stages (1-10)
    ├── name
    ├── type (builtin/custom)
    ├── target (skill_md/references)
    ├── prompt
    └── uses_history (optional)
```
|
||||
|
||||
---
|
||||
|
||||
## Basic Workflow Structure
|
||||
|
||||
```yaml
name: my-custom
description: Custom enhancement workflow

stages:
  - name: stage-one
    type: builtin
    target: skill_md
    prompt: |
      Improve the SKILL.md by adding...

  - name: stage-two
    type: custom
    target: references
    prompt: |
      Enhance the references by...
```
|
||||
|
||||
---
|
||||
|
||||
## Workflow Fields
|
||||
|
||||
### Top Level
|
||||
|
||||
| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | Workflow identifier |
| `description` | No | Human-readable description |
| `variables` | No | Configurable variables |
| `stages` | Yes | Array of stage definitions |
|
||||
|
||||
### Stage Fields
|
||||
|
||||
| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | Stage identifier |
| `type` | Yes | `builtin` or `custom` |
| `target` | Yes | `skill_md` or `references` |
| `prompt` | Yes | AI prompt text |
| `uses_history` | No | Access previous stage results |
|
||||
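The two field tables can be read as a small schema. The following is a minimal sketch of that idea, not the tool's actual validator: `check_workflow` and its message wording are hypothetical, and the workflow is assumed to be already loaded into a Python dictionary.

```python
# Required keys and allowed values are taken from the "Top Level" and
# "Stage Fields" tables; the function name and messages are hypothetical.
REQUIRED_TOP_LEVEL = ("name", "stages")
REQUIRED_STAGE = ("name", "type", "target", "prompt")
VALID_TYPES = {"builtin", "custom"}
VALID_TARGETS = {"skill_md", "references"}


def check_workflow(workflow: dict) -> list[str]:
    """Return a list of problems; an empty list means the structure looks fine."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in workflow:
            problems.append(f"missing top-level field '{key}'")
    for index, stage in enumerate(workflow.get("stages", [])):
        label = stage.get("name", f"#{index}")
        for key in REQUIRED_STAGE:
            if key not in stage:
                problems.append(f"stage {label}: missing '{key}'")
        if "type" in stage and stage["type"] not in VALID_TYPES:
            problems.append(f"stage {label}: invalid type '{stage['type']}'")
        if "target" in stage and stage["target"] not in VALID_TARGETS:
            problems.append(f"stage {label}: invalid target '{stage['target']}'")
    return problems


example = {
    "name": "my-custom",
    "stages": [
        {"name": "stage-one", "type": "builtin", "target": "skill_md",
         "prompt": "Improve the SKILL.md by adding..."},
    ],
}
print(check_workflow(example))
```

A check like this is only a quick local sanity pass; `skill-seekers workflows validate` remains the authoritative check.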
|
||||
---
|
||||
|
||||
## Creating Your First Workflow
|
||||
|
||||
### Example: Performance Analysis
|
||||
|
||||
```yaml
# performance.yaml
name: performance-focus
description: Analyze and document performance characteristics

variables:
  target_latency: "100ms"
  target_throughput: "1000 req/s"

stages:
  - name: performance-overview
    type: builtin
    target: skill_md
    prompt: |
      Add a "Performance" section to SKILL.md covering:
      - Benchmark results
      - Performance characteristics
      - Resource requirements

  - name: optimization-guide
    type: custom
    target: references
    uses_history: true
    prompt: |
      Create an optimization guide with:
      - Target latency: {target_latency}
      - Target throughput: {target_throughput}
      - Common bottlenecks
      - Optimization techniques
```
|
||||
|
||||
### Install and Use
|
||||
|
||||
```bash
# Add workflow
skill-seekers workflows add performance.yaml

# Use it
skill-seekers create <source> --enhance-workflow performance-focus

# With custom variables
skill-seekers create <source> \
  --enhance-workflow performance-focus \
  --var target_latency=50ms \
  --var target_throughput=5000req/s
```
|
||||
|
||||
---
|
||||
|
||||
## Stage Types
|
||||
|
||||
### builtin
|
||||
|
||||
Uses built-in enhancement logic:
|
||||
|
||||
```yaml
|
||||
stages:
|
||||
- name: structure-improvement
|
||||
type: builtin
|
||||
target: skill_md
|
||||
prompt: "Improve document structure"
|
||||
```
|
||||
|
||||
### custom
|
||||
|
||||
Full custom prompt control:
|
||||
|
||||
```yaml
stages:
  - name: custom-analysis
    type: custom
    target: skill_md
    prompt: |
      Your detailed custom prompt here...
      Can use {variables} and {history}
```
|
||||
|
||||
---
|
||||
|
||||
## Targets
|
||||
|
||||
### skill_md
|
||||
|
||||
Enhances the main SKILL.md file:
|
||||
|
||||
```yaml
|
||||
stages:
|
||||
- name: improve-skill
|
||||
target: skill_md
|
||||
prompt: "Add comprehensive overview section"
|
||||
```
|
||||
|
||||
### references
|
||||
|
||||
Enhances reference files:
|
||||
|
||||
```yaml
|
||||
stages:
|
||||
- name: improve-refs
|
||||
target: references
|
||||
prompt: "Add cross-references between files"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Variables
|
||||
|
||||
### Defining Variables
|
||||
|
||||
```yaml
|
||||
variables:
|
||||
audience: "beginners"
|
||||
focus_area: "security"
|
||||
include_examples: true
|
||||
```
|
||||
|
||||
### Using Variables
|
||||
|
||||
```yaml
stages:
  - name: customize
    prompt: |
      Tailor content for {audience}.
      Focus on {focus_area}.
      Include examples: {include_examples}
```
|
||||
|
||||
### Overriding at Runtime
|
||||
|
||||
```bash
skill-seekers create <source> \
  --enhance-workflow my-workflow \
  --var audience=experts \
  --var focus_area=performance
```
|
||||
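To make the interaction between defaults and `--var` overrides concrete, here is a minimal sketch of placeholder substitution. It is an assumption about how such substitution could work, not the tool's implementation; `parse_var_flags` and `render_prompt` are hypothetical names, and prompts that contain literal braces (for example embedded JSON) would need escaping first.

```python
from typing import Dict, Iterable, Optional


class _KeepUnknown(dict):
    """Leave placeholders without a value untouched instead of raising KeyError."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def parse_var_flags(flags: Iterable[str]) -> Dict[str, str]:
    """Turn the values passed to --var (e.g. "audience=experts") into a dict."""
    return dict(flag.split("=", 1) for flag in flags)


def render_prompt(prompt: str, defaults: dict, overrides: Optional[dict] = None) -> str:
    """Fill {name} placeholders; runtime overrides win over workflow defaults."""
    values = {**defaults, **(overrides or {})}
    return prompt.format_map(_KeepUnknown(values))


defaults = {"audience": "beginners", "focus_area": "security", "include_examples": True}
overrides = parse_var_flags(["audience=experts", "focus_area=performance"])
print(render_prompt("Tailor content for {audience}. Focus on {focus_area}.", defaults, overrides))
```

Merging the two dictionaries with the overrides last keeps the precedence rule in one place: anything given on the command line replaces the value from the `variables:` block.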
|
||||
---
|
||||
|
||||
## History Passing
|
||||
|
||||
Access results from previous stages:
|
||||
|
||||
```yaml
stages:
  - name: analyze
    type: custom
    target: skill_md
    prompt: "Analyze security features"

  - name: document
    type: custom
    target: skill_md
    uses_history: true
    prompt: |
      Based on previous analysis:
      {previous_results}

      Create documentation...
```
|
||||
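The idea behind `uses_history` can be sketched as a loop that feeds each stage's output into the next prompt. This is an illustrative sketch only: `run_stages` and the `call_model` callback are hypothetical, and it assumes `{previous_results}` is replaced with the output of the immediately preceding stage.

```python
from typing import Callable


def run_stages(stages: list[dict], call_model: Callable[[str], str]) -> list[str]:
    """Run stages in order, injecting the previous result where a stage asks for it."""
    results: list[str] = []
    for stage in stages:
        prompt = stage["prompt"]
        # Only stages that opt in via uses_history see earlier output.
        if stage.get("uses_history") and results:
            prompt = prompt.replace("{previous_results}", results[-1])
        results.append(call_model(prompt))
    return results


def fake_model(prompt: str) -> str:
    """Stand-in for a real AI call so the sketch runs offline."""
    return f"<{prompt}>"


stages = [
    {"name": "analyze", "prompt": "Analyze security features"},
    {"name": "document", "uses_history": True,
     "prompt": "Based on previous analysis: {previous_results}"},
]
print(run_stages(stages, fake_model))
```

Passing the model call in as a function keeps the sequencing logic testable without network access.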
|
||||
---
|
||||
|
||||
## Advanced Example: Security Review
|
||||
|
||||
```yaml
name: comprehensive-security
description: Multi-stage security analysis

variables:
  compliance_framework: "OWASP Top 10"
  risk_level: "high"

stages:
  - name: asset-inventory
    type: builtin
    target: skill_md
    prompt: |
      Document all security-sensitive components:
      - Authentication mechanisms
      - Authorization checks
      - Data validation
      - Encryption usage

  - name: threat-analysis
    type: custom
    target: skill_md
    uses_history: true
    prompt: |
      Based on assets: {all_history}

      Analyze threats for {compliance_framework}:
      - Threat vectors
      - Attack scenarios
      - Risk ratings ({risk_level} focus)

  - name: mitigation-guide
    type: custom
    target: references
    uses_history: true
    prompt: |
      Create mitigation guide:
      - Countermeasures
      - Best practices
      - Code examples
      - Testing strategies
```
|
||||
|
||||
---
|
||||
|
||||
## Validation
|
||||
|
||||
### Validate Before Installing
|
||||
|
||||
```bash
|
||||
skill-seekers workflows validate ./my-workflow.yaml
|
||||
```
|
||||
|
||||
### Common Errors
|
||||
|
||||
| Error | Cause | Fix |
|-------|-------|-----|
| `Missing 'stages'` | No `stages` array | Add a `stages:` array |
| `Invalid type` | `type` is not `builtin` or `custom` | Check the `type` field |
| `Undefined variable` | Variable used in a prompt but not defined | Add it under `variables:` |
|
||||
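The "Undefined variable" case can be caught before installation by scanning prompts for placeholders. The sketch below is an assumption about how such a check might look, not the validator's real logic; `undefined_variables` is a hypothetical helper, and treating `{history}`, `{previous_results}` and `{all_history}` as reserved names is inferred from the examples in this guide.

```python
from string import Formatter

# Placeholders that appear in this guide's history examples; assumed to be
# supplied by the workflow engine rather than by the variables block.
HISTORY_PLACEHOLDERS = {"history", "previous_results", "all_history"}


def undefined_variables(workflow: dict) -> set[str]:
    """Return placeholder names used in prompts but missing from 'variables'."""
    defined = set(workflow.get("variables", {}))
    used = set()
    for stage in workflow.get("stages", []):
        # Formatter().parse yields (literal, field_name, format_spec, conversion).
        for _, field_name, _, _ in Formatter().parse(stage.get("prompt", "")):
            if field_name:
                used.add(field_name)
    return used - defined - HISTORY_PLACEHOLDERS


workflow = {
    "variables": {"audience": "beginners"},
    "stages": [
        {"prompt": "Tailor content for {audience}. Focus on {focus_area}."},
        {"prompt": "Based on: {previous_results}"},
    ],
}
print(undefined_variables(workflow))
```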
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
### 1. Start Simple
|
||||
|
||||
```yaml
|
||||
# Start with 1-2 stages
|
||||
name: simple
|
||||
description: Simple workflow
|
||||
stages:
|
||||
- name: improve
|
||||
type: builtin
|
||||
target: skill_md
|
||||
prompt: "Improve SKILL.md"
|
||||
```
|
||||
|
||||
### 2. Use Clear Stage Names
|
||||
|
||||
```yaml
# Good
stages:
  - name: security-overview
  - name: vulnerability-analysis

# Bad
stages:
  - name: stage1
  - name: step2
```
|
||||
|
||||
### 3. Document Variables
|
||||
|
||||
```yaml
|
||||
variables:
|
||||
# Target audience level: beginner, intermediate, expert
|
||||
audience: "intermediate"
|
||||
|
||||
# Security focus area: owasp, pci, hipaa
|
||||
compliance: "owasp"
|
||||
```
|
||||
|
||||
### 4. Test Incrementally
|
||||
|
||||
```bash
# Test with dry run
skill-seekers create <source> \
  --enhance-workflow my-workflow \
  --workflow-dry-run

# Then actually run
skill-seekers create <source> \
  --enhance-workflow my-workflow
```
|
||||
|
||||
### 5. Chain for Complex Analysis
|
||||
|
||||
```bash
# Use multiple workflows
skill-seekers create <source> \
  --enhance-workflow security-focus \
  --enhance-workflow performance-focus
```
|
||||
|
||||
---
|
||||
|
||||
## Sharing Workflows
|
||||
|
||||
### Export Workflow
|
||||
|
||||
```bash
|
||||
# Get workflow content
|
||||
skill-seekers workflows show my-workflow > my-workflow.yaml
|
||||
```
|
||||
|
||||
### Share with Team
|
||||
|
||||
```bash
|
||||
# Add to version control
|
||||
git add my-workflow.yaml
|
||||
git commit -m "Add custom security workflow"
|
||||
|
||||
# Team members install
|
||||
skill-seekers workflows add my-workflow.yaml
|
||||
```
|
||||
|
||||
### Publish
|
||||
|
||||
Submit to the Skill Seekers community:
|
||||
- GitHub Discussions
|
||||
- Skill Seekers website
|
||||
- Documentation contributions
|
||||
|
||||
---
|
||||
|
||||
## See Also
|
||||
|
||||
- [Workflows Guide](../user-guide/05-workflows.md) - Using workflows
|
||||
- [MCP Reference](../reference/MCP_REFERENCE.md) - Workflows via MCP
|
||||
- [Enhancement Guide](../user-guide/03-enhancement.md) - Enhancement fundamentals
|
||||
322
docs/advanced/mcp-server.md
Normal file
@@ -0,0 +1,322 @@
|
||||
# MCP Server Setup Guide
|
||||
|
||||
> **Skill Seekers v3.1.0**
|
||||
> **Integrate with AI agents via Model Context Protocol**
|
||||
|
||||
---
|
||||
|
||||
## What is MCP?
|
||||
|
||||
MCP (Model Context Protocol) lets AI agents like Claude Code control Skill Seekers through natural language:
|
||||
|
||||
```
|
||||
You: "Scrape the React documentation"
|
||||
Claude: ▶️ scrape_docs({"url": "https://react.dev/"})
|
||||
✅ Done! Created output/react/
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
# Install with MCP support (quoted so shells such as zsh do not expand the brackets)
pip install "skill-seekers[mcp]"

# Verify
skill-seekers-mcp --version
```
|
||||
|
||||
---
|
||||
|
||||
## Transport Modes
|
||||
|
||||
### stdio Mode (Default)
|
||||
|
||||
For Claude Code, VS Code + Cline:
|
||||
|
||||
```bash
|
||||
skill-seekers-mcp
|
||||
```
|
||||
|
||||
**Use when:**
|
||||
- Running in Claude Code
|
||||
- Direct integration with terminal-based agents
|
||||
- Simple local setup
|
||||
|
||||
---
|
||||
|
||||
### HTTP Mode
|
||||
|
||||
For Cursor, Windsurf, HTTP clients:
|
||||
|
||||
```bash
|
||||
# Start HTTP server
|
||||
skill-seekers-mcp --transport http --port 8765
|
||||
|
||||
# Custom host
|
||||
skill-seekers-mcp --transport http --host 0.0.0.0 --port 8765
|
||||
```
|
||||
|
||||
**Use when:**
|
||||
- IDE integration (Cursor, Windsurf)
|
||||
- Remote access needed
|
||||
- Multiple clients
|
||||
|
||||
---
|
||||
|
||||
## Claude Code Integration
|
||||
|
||||
### Automatic Setup
|
||||
|
||||
```bash
|
||||
# In Claude Code, run:
|
||||
/claude add-mcp-server skill-seekers
|
||||
```
|
||||
|
||||
Or manually add to `~/.claude/mcp.json`:
|
||||
|
||||
```json
{
  "mcpServers": {
    "skill-seekers": {
      "command": "skill-seekers-mcp",
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "GITHUB_TOKEN": "ghp_..."
      }
    }
  }
}
```
|
||||
|
||||
### Usage
|
||||
|
||||
Once connected, ask Claude:
|
||||
|
||||
```
|
||||
"List available configs"
|
||||
"Scrape the Django documentation"
|
||||
"Package output/react for Gemini"
|
||||
"Enhance output/my-skill with security-focus workflow"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Cursor IDE Integration
|
||||
|
||||
### Setup
|
||||
|
||||
1. Start MCP server:
   ```bash
   skill-seekers-mcp --transport http --port 8765
   ```

2. In Cursor Settings → MCP:
   - Name: `skill-seekers`
   - URL: `http://localhost:8765`
|
||||
|
||||
### Usage
|
||||
|
||||
In Cursor chat:
|
||||
|
||||
```
|
||||
"Create a skill from the current project"
|
||||
"Analyze this codebase and generate a cursorrules file"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Windsurf Integration
|
||||
|
||||
### Setup
|
||||
|
||||
1. Start MCP server:
|
||||
```bash
|
||||
skill-seekers-mcp --transport http --port 8765
|
||||
```
|
||||
|
||||
2. In Windsurf Settings:
|
||||
- Add MCP server endpoint: `http://localhost:8765`
|
||||
|
||||
---
|
||||
|
||||
## Available Tools
|
||||
|
||||
26 tools organized by category:
|
||||
|
||||
### Core Tools (9)
|
||||
- `list_configs` - List presets
|
||||
- `generate_config` - Create config from URL
|
||||
- `validate_config` - Check config
|
||||
- `estimate_pages` - Page estimation
|
||||
- `scrape_docs` - Scrape documentation
|
||||
- `package_skill` - Package skill
|
||||
- `upload_skill` - Upload to platform
|
||||
- `enhance_skill` - AI enhancement
|
||||
- `install_skill` - Complete workflow
|
||||
|
||||
### Extended Tools (9)
|
||||
- `scrape_github` - GitHub repo
|
||||
- `scrape_pdf` - PDF extraction
|
||||
- `scrape_codebase` - Local code
|
||||
- `unified_scrape` - Multi-source
|
||||
- `detect_patterns` - Pattern detection
|
||||
- `extract_test_examples` - Test examples
|
||||
- `build_how_to_guides` - How-to guides
|
||||
- `extract_config_patterns` - Config patterns
|
||||
- `detect_conflicts` - Doc/code conflicts
|
||||
|
||||
### Config Sources (5)
|
||||
- `add_config_source` - Register git source
|
||||
- `list_config_sources` - List sources
|
||||
- `remove_config_source` - Remove source
|
||||
- `fetch_config` - Fetch configs
|
||||
- `submit_config` - Submit configs
|
||||
|
||||
### Vector DB (4)
|
||||
- `export_to_weaviate`
|
||||
- `export_to_chroma`
|
||||
- `export_to_faiss`
|
||||
- `export_to_qdrant`
|
||||
|
||||
See [MCP Reference](../reference/MCP_REFERENCE.md) for full details.
|
||||
|
||||
---
|
||||
|
||||
## Common Workflows
|
||||
|
||||
### Workflow 1: Documentation Skill
|
||||
|
||||
```
|
||||
User: "Create a skill from React docs"
|
||||
Claude: ▶️ scrape_docs({"url": "https://react.dev/"})
|
||||
⏳ Scraping...
|
||||
✅ Created output/react/
|
||||
|
||||
▶️ package_skill({"skill_directory": "output/react/", "target": "claude"})
|
||||
✅ Created output/react-claude.zip
|
||||
|
||||
Skill ready! Upload to Claude?
|
||||
```
|
||||
|
||||
### Workflow 2: GitHub Analysis
|
||||
|
||||
```
|
||||
User: "Analyze the facebook/react repo"
|
||||
Claude: ▶️ scrape_github({"repo": "facebook/react"})
|
||||
⏳ Analyzing...
|
||||
✅ Created output/react/
|
||||
|
||||
▶️ enhance_skill({"skill_directory": "output/react/", "workflow": "architecture-comprehensive"})
|
||||
✅ Enhanced with architecture analysis
|
||||
```
|
||||
|
||||
### Workflow 3: Multi-Platform Export
|
||||
|
||||
```
|
||||
User: "Create Django skill for all platforms"
|
||||
Claude: ▶️ scrape_docs({"config": "django"})
|
||||
✅ Created output/django/
|
||||
|
||||
▶️ package_skill({"skill_directory": "output/django/", "target": "claude"})
|
||||
▶️ package_skill({"skill_directory": "output/django/", "target": "gemini"})
|
||||
▶️ package_skill({"skill_directory": "output/django/", "target": "openai"})
|
||||
✅ Created packages for all platforms
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
### Environment Variables
|
||||
|
||||
Set these in `~/.claude/mcp.json` or export them before starting the server:
|
||||
|
||||
```bash
|
||||
export ANTHROPIC_API_KEY=sk-ant-...
|
||||
export GOOGLE_API_KEY=AIza...
|
||||
export OPENAI_API_KEY=sk-...
|
||||
export GITHUB_TOKEN=ghp_...
|
||||
```
|
||||
|
||||
### Server Options
|
||||
|
||||
```bash
# Debug mode
skill-seekers-mcp --verbose

# Custom port (HTTP transport)
skill-seekers-mcp --transport http --port 8080

# Allow all origins (CORS, HTTP transport)
skill-seekers-mcp --transport http --cors
```
|
||||
|
||||
---
|
||||
|
||||
## Security
|
||||
|
||||
### Local Only (stdio)
|
||||
|
||||
```bash
|
||||
# Only accessible by local Claude Code
|
||||
skill-seekers-mcp
|
||||
```
|
||||
|
||||
### HTTP with Auth
|
||||
|
||||
```bash
|
||||
# Use reverse proxy with auth
|
||||
# nginx, traefik, etc.
|
||||
```
|
||||
|
||||
### API Key Protection
|
||||
|
||||
```bash
|
||||
# Don't hardcode keys
|
||||
# Use environment variables
|
||||
# Or secret management
|
||||
```
|
||||
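One way to follow the advice above is to run a small preflight check that reports which keys are present in the environment without ever printing their values. This is a hedged sketch: `key_status` is a hypothetical helper, and the key names are simply the ones listed under Environment Variables in this guide.

```python
import os
from typing import Mapping, Optional

# Names taken from the "Environment Variables" section of this guide.
KNOWN_KEYS = ("ANTHROPIC_API_KEY", "GOOGLE_API_KEY", "OPENAI_API_KEY", "GITHUB_TOKEN")


def key_status(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Map each known key name to True/False; secret values are never returned."""
    environ = os.environ if environ is None else environ
    return {name: bool(environ.get(name)) for name in KNOWN_KEYS}


for name, is_set in key_status().items():
    print(f"{name}: {'set' if is_set else 'missing'}")
```

Reporting only presence keeps secrets out of logs and terminal scrollback while still showing which integrations are configured.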
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "Server not found"
|
||||
|
||||
```bash
|
||||
# Check if running
|
||||
curl http://localhost:8765/health
|
||||
|
||||
# Restart
|
||||
skill-seekers-mcp --transport http --port 8765
|
||||
```
|
||||
|
||||
### "Tool not available"
|
||||
|
||||
```bash
# Check version
skill-seekers-mcp --version

# Update (quoted so shells such as zsh do not expand the brackets)
pip install --upgrade "skill-seekers[mcp]"
```
|
||||
|
||||
### "Connection refused"
|
||||
|
||||
```bash
# Check port
lsof -i :8765

# Use different port
skill-seekers-mcp --transport http --port 8766
```
|
||||
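Where `lsof` is not available, the same question ("is anything listening on the port?") can be answered with a short standard-library probe. This is a generic TCP check, not part of Skill Seekers; `port_is_open` is a hypothetical helper and port 8765 is just the example port used in this guide.

```python
import socket


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if port_is_open(8765):
    print("Something is listening on 8765")
else:
    print("Nothing is listening on 8765; start the server or pick another port")
```

A successful connection only shows that some process is listening; it does not confirm that the process is the MCP server.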
|
||||
---
|
||||
|
||||
## See Also
|
||||
|
||||
- [MCP Reference](../reference/MCP_REFERENCE.md) - Complete tool reference
|
||||
- [MCP Tools Deep Dive](mcp-tools.md) - Advanced usage
|
||||
- [MCP Protocol](https://modelcontextprotocol.io/) - Official MCP docs
|
||||
439
docs/advanced/multi-source.md
Normal file
@@ -0,0 +1,439 @@
|
||||
# Multi-Source Scraping Guide
|
||||
|
||||
> **Skill Seekers v3.1.0**
|
||||
> **Combine documentation, code, and PDFs into one skill**
|
||||
|
||||
---
|
||||
|
||||
## What is Multi-Source Scraping?
|
||||
|
||||
Combine multiple sources into a single, comprehensive skill:
|
||||
|
||||
```
┌───────────────┐
│ Documentation │──┐
│  (Web docs)   │  │
└───────────────┘  │
                   │
┌───────────────┐  │     ┌──────────────────┐
│  GitHub Repo  │──┼────▶│  Unified Skill   │
│ (Source code) │  │     │  (Single source  │
└───────────────┘  │     │    of truth)     │
                   │     └──────────────────┘
┌───────────────┐  │
│  PDF Manual   │──┘
│  (Reference)  │
└───────────────┘
```
|
||||
|
||||
---
|
||||
|
||||
## When to Use Multi-Source
|
||||
|
||||
### Use Cases
|
||||
|
||||
| Scenario | Sources | Benefit |
|----------|---------|---------|
| Framework + Examples | Docs + GitHub repo | Theory + practice |
| Product + API | Docs + OpenAPI spec | Usage + reference |
| Legacy + Current | PDF + Web docs | Complete history |
| Internal + External | Local code + Public docs | Full context |
|
||||
|
||||
### Benefits
|
||||
|
||||
- **Single source of truth** - One skill with all context
|
||||
- **Conflict detection** - Find doc/code discrepancies
|
||||
- **Cross-references** - Link between sources
|
||||
- **Comprehensive** - No gaps in knowledge
|
||||
|
||||
---
|
||||
|
||||
## Creating Unified Configs
|
||||
|
||||
### Basic Structure
|
||||
|
||||
```json
{
  "name": "my-framework-complete",
  "description": "Complete documentation and code",
  "merge_mode": "claude-enhanced",

  "sources": [
    {
      "type": "docs",
      "name": "documentation",
      "base_url": "https://docs.example.com/"
    },
    {
      "type": "github",
      "name": "source-code",
      "repo": "owner/repo"
    }
  ]
}
```
|
||||
|
||||
---
|
||||
|
||||
## Source Types
|
||||
|
||||
### 1. Documentation
|
||||
|
||||
```json
|
||||
{
|
||||
"type": "docs",
|
||||
"name": "official-docs",
|
||||
"base_url": "https://docs.framework.com/",
|
||||
"max_pages": 500,
|
||||
"categories": {
|
||||
"getting_started": ["intro", "quickstart"],
|
||||
"api": ["reference", "api"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 2. GitHub Repository
|
||||
|
||||
```json
|
||||
{
|
||||
"type": "github",
|
||||
"name": "source-code",
|
||||
"repo": "facebook/react",
|
||||
"fetch_issues": true,
|
||||
"max_issues": 100,
|
||||
"enable_codebase_analysis": true
|
||||
}
|
||||
```
|
||||
|
||||
### 3. PDF Document
|
||||
|
||||
```json
|
||||
{
|
||||
"type": "pdf",
|
||||
"name": "legacy-manual",
|
||||
"pdf_path": "docs/legacy-manual.pdf",
|
||||
"enable_ocr": false
|
||||
}
|
||||
```
|
||||
|
||||
### 4. Local Codebase
|
||||
|
||||
```json
|
||||
{
|
||||
"type": "local",
|
||||
"name": "internal-tools",
|
||||
"directory": "./internal-lib",
|
||||
"languages": ["Python", "JavaScript"]
|
||||
}
|
||||
```
|
||||
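Each source type in the examples above carries one locating field (`base_url`, `repo`, `pdf_path`, `directory`). As a minimal sketch, and assuming those fields are mandatory for their type (the guide's examples suggest this but do not state it), a unified config could be sanity-checked like so. `check_sources` is a hypothetical helper, not part of the CLI.

```python
# Locating field per source type, inferred from the examples in this guide.
REQUIRED_BY_TYPE = {
    "docs": "base_url",
    "github": "repo",
    "pdf": "pdf_path",
    "local": "directory",
}


def check_sources(config: dict) -> list[str]:
    """Return a list of problems found in the config's 'sources' array."""
    problems = []
    for index, source in enumerate(config.get("sources", [])):
        label = source.get("name", f"#{index}")
        kind = source.get("type")
        if kind not in REQUIRED_BY_TYPE:
            problems.append(f"source {label}: unknown type {kind!r}")
            continue
        required = REQUIRED_BY_TYPE[kind]
        if required not in source:
            problems.append(f"source {label}: type '{kind}' needs '{required}'")
    return problems


config = {
    "name": "my-framework-complete",
    "sources": [
        {"type": "docs", "name": "documentation", "base_url": "https://docs.example.com/"},
        {"type": "github", "name": "source-code"},
    ],
}
print(check_sources(config))
```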
|
||||
---
|
||||
|
||||
## Complete Example
|
||||
|
||||
### React Complete Skill
|
||||
|
||||
```json
{
  "name": "react-complete",
  "description": "React - docs, source, and guides",
  "merge_mode": "claude-enhanced",

  "sources": [
    {
      "type": "docs",
      "name": "react-docs",
      "base_url": "https://react.dev/",
      "max_pages": 300,
      "categories": {
        "getting_started": ["learn", "tutorial"],
        "api": ["reference", "hooks"],
        "advanced": ["concurrent", "suspense"]
      }
    },
    {
      "type": "github",
      "name": "react-source",
      "repo": "facebook/react",
      "fetch_issues": true,
      "max_issues": 50,
      "enable_codebase_analysis": true,
      "code_analysis_depth": "deep"
    },
    {
      "type": "pdf",
      "name": "react-patterns",
      "pdf_path": "downloads/react-patterns.pdf"
    }
  ],

  "conflict_detection": {
    "enabled": true,
    "rules": [
      {
        "field": "api_signature",
        "action": "flag_mismatch"
      },
      {
        "field": "version",
        "action": "warn_outdated"
      }
    ]
  },

  "output_structure": {
    "group_by_source": false,
    "cross_reference": true
  }
}
```
|
||||
|
||||
---
|
||||
|
||||
## Running Unified Scraping
|
||||
|
||||
### Basic Command
|
||||
|
||||
```bash
|
||||
skill-seekers unified --config react-complete.json
|
||||
```
|
||||
|
||||
### With Options
|
||||
|
||||
```bash
# Fresh start (ignore cache)
skill-seekers unified --config react-complete.json --fresh

# Dry run
skill-seekers unified --config react-complete.json --dry-run

# Rule-based merging
skill-seekers unified --config react-complete.json --merge-mode rule-based
```
|
||||
|
||||
---
|
||||
|
||||
## Merge Modes
|
||||
|
||||
### claude-enhanced (Default)
|
||||
|
||||
Uses AI to intelligently merge sources:
|
||||
|
||||
- Detects relationships between content
|
||||
- Resolves conflicts intelligently
|
||||
- Creates cross-references
|
||||
- Best quality, slower
|
||||
|
||||
```bash
|
||||
skill-seekers unified --config my-config.json --merge-mode claude-enhanced
|
||||
```
|
||||
|
||||
### rule-based
|
||||
|
||||
Uses defined rules for merging:
|
||||
|
||||
- Faster
|
||||
- Deterministic
|
||||
- Less sophisticated
|
||||
|
||||
```bash
|
||||
skill-seekers unified --config my-config.json --merge-mode rule-based
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Conflict Detection
|
||||
|
||||
### Automatic Detection
|
||||
|
||||
Finds discrepancies between sources:
|
||||
|
||||
```json
{
  "conflict_detection": {
    "enabled": true,
    "rules": [
      {
        "field": "api_signature",
        "action": "flag_mismatch"
      },
      {
        "field": "version",
        "action": "warn_outdated"
      },
      {
        "field": "deprecation",
        "action": "highlight"
      }
    ]
  }
}
```
|
||||
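To illustrate what an `api_signature` rule with the `flag_mismatch` action might amount to, here is a deliberately simplified sketch. It is not the detector's real algorithm: `flag_mismatches`, the input dictionaries (function name mapped to signature string) and the shape of the returned records are all hypothetical.

```python
def flag_mismatches(documented: dict, implemented: dict) -> list[dict]:
    """Compare signatures by name and report those that differ between docs and code."""
    conflicts = []
    # Only names present in both sources can disagree; sort for stable output.
    for name in sorted(documented.keys() & implemented.keys()):
        if documented[name] != implemented[name]:
            conflicts.append({
                "field": "api_signature",
                "name": name,
                "docs": documented[name],
                "code": implemented[name],
            })
    return conflicts


docs_api = {"render": "render(element, container)", "hydrate": "hydrate(element, container)"}
code_api = {"render": "render(element, container, callback)", "hydrate": "hydrate(element, container)"}
for conflict in flag_mismatches(docs_api, code_api):
    print(conflict["name"], "->", conflict["docs"], "vs", conflict["code"])
```

Exact string comparison is the simplest possible rule; a real detector would likely normalize whitespace, defaults and type annotations before comparing.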
|
||||
### Conflict Report
|
||||
|
||||
After scraping, check for conflicts:
|
||||
|
||||
```bash
|
||||
# Conflicts are reported in output
|
||||
ls output/react-complete/conflicts.json
|
||||
|
||||
# Or use MCP tool
|
||||
detect_conflicts({
|
||||
"docs_source": "output/react-docs",
|
||||
"code_source": "output/react-source"
|
||||
})
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Output Structure
|
||||
|
||||
### Merged Output
|
||||
|
||||
```
output/react-complete/
├── SKILL.md                    # Combined skill
├── references/
│   ├── index.md                # Master index
│   ├── getting_started.md      # From docs
│   ├── api_reference.md        # From docs
│   ├── source_overview.md      # From GitHub
│   ├── code_examples.md        # From GitHub
│   └── patterns.md             # From PDF
├── .skill-seekers/
│   ├── manifest.json           # Metadata
│   ├── sources.json            # Source list
│   └── conflicts.json          # Detected conflicts
└── cross-references.json       # Links between sources
```
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
### 1. Name Sources Clearly
|
||||
|
||||
```json
|
||||
{
|
||||
"sources": [
|
||||
{"type": "docs", "name": "official-docs"},
|
||||
{"type": "github", "name": "source-code"},
|
||||
{"type": "pdf", "name": "legacy-reference"}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### 2. Limit Source Scope
|
||||
|
||||
Use `file_patterns` to include only core files (JSON does not allow inline comments, so the note lives here rather than in the config):

```json
{
  "type": "github",
  "name": "core-source",
  "repo": "owner/repo",
  "file_patterns": ["src/**/*.py"],
  "exclude_patterns": ["tests/**", "docs/**"]
}
```
|
||||
|
||||
### 3. Enable Conflict Detection
|
||||
|
||||
```json
|
||||
{
|
||||
"conflict_detection": {
|
||||
"enabled": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4. Use Appropriate Merge Mode
|
||||
|
||||
- **claude-enhanced** - Best quality, for important skills
|
||||
- **rule-based** - Faster, for testing or large datasets
|
||||
|
||||
### 5. Test Incrementally
|
||||
|
||||
```bash
|
||||
# Test with one source first
|
||||
skill-seekers create <source1>
|
||||
|
||||
# Then add sources
|
||||
skill-seekers unified --config my-config.json --dry-run
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "Source not found"
|
||||
|
||||
```bash
|
||||
# Check all sources exist
|
||||
curl -I https://docs.example.com/
|
||||
ls downloads/manual.pdf
|
||||
```
|
||||
|
||||
### "Merge conflicts"
|
||||
|
||||
```bash
|
||||
# Check conflicts report
|
||||
cat output/my-skill/conflicts.json
|
||||
|
||||
# Adjust merge_mode
|
||||
skill-seekers unified --config my-config.json --merge-mode rule-based
|
||||
```
|
||||
|
||||
### "Out of memory"
|
||||
|
||||
```bash
|
||||
# Process sources separately
|
||||
# Then merge manually
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### Framework + Examples
|
||||
|
||||
```json
|
||||
{
|
||||
"name": "django-complete",
|
||||
"sources": [
|
||||
{"type": "docs", "base_url": "https://docs.djangoproject.com/"},
|
||||
{"type": "github", "repo": "django/django", "fetch_issues": false}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### API + Documentation
|
||||
|
||||
```json
|
||||
{
|
||||
"name": "stripe-complete",
|
||||
"sources": [
|
||||
{"type": "docs", "base_url": "https://stripe.com/docs"},
|
||||
{"type": "pdf", "pdf_path": "stripe-api-reference.pdf"}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Legacy + Current
|
||||
|
||||
```json
|
||||
{
|
||||
"name": "product-docs",
|
||||
"sources": [
|
||||
{"type": "docs", "base_url": "https://docs.example.com/v2/"},
|
||||
{"type": "pdf", "pdf_path": "v1-legacy-manual.pdf"}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## See Also
|
||||
|
||||
- [Config Format](../reference/CONFIG_FORMAT.md) - Full JSON specification
|
||||
- [Scraping Guide](../user-guide/02-scraping.md) - Individual source options
|
||||
- [MCP Reference](../reference/MCP_REFERENCE.md) - unified_scrape tool
|
||||