Files
skill-seekers-reference/docs/user-guide/01-core-concepts.md
yusyus ba9a8ff8b5 docs: complete documentation overhaul with v3.1.0 release notes and zh-CN translations
Documentation restructure:
- New docs/getting-started/ guide (4 files: install, quick-start, first-skill, next-steps)
- New docs/user-guide/ section (6 files: core concepts through troubleshooting)
- New docs/reference/ section (CLI_REFERENCE, CONFIG_FORMAT, ENVIRONMENT_VARIABLES, MCP_REFERENCE)
- New docs/advanced/ section (custom-workflows, mcp-server, multi-source)
- New docs/ARCHITECTURE.md - system architecture overview
- Archived legacy files (QUICKSTART.md, QUICK_REFERENCE.md, docs/guides/USAGE.md) to docs/archive/legacy/

Chinese (zh-CN) translations:
- Full zh-CN mirror of all user-facing docs (getting-started, user-guide, reference, advanced)
- GitHub Actions workflow for translation sync (.github/workflows/translate-docs.yml)
- Translation sync checker script (scripts/check_translation_sync.sh)
- Translation helper script (scripts/translate_doc.py)

Content updates:
- CHANGELOG.md: [Unreleased] → [3.1.0] - 2026-02-22
- README.md: updated with new doc structure links
- AGENTS.md: updated agent documentation
- docs/features/UNIFIED_SCRAPING.md: updated for unified scraper workflow JSON config

Analysis/planning artifacts (kept for reference):
- DOCUMENTATION_OVERHAUL_PLAN.md, DOCUMENTATION_OVERHAUL_SUMMARY.md
- FEATURE_GAP_ANALYSIS.md, IMPLEMENTATION_GAPS_ANALYSIS.md, CREATE_COMMAND_COVERAGE_ANALYSIS.md
- CHINESE_TRANSLATION_IMPLEMENTATION_SUMMARY.md, ISSUE_260_UPDATE.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 01:01:51 +03:00

433 lines
9.2 KiB
Markdown

# Core Concepts
> **Skill Seekers v3.1.0**
> **Understanding how Skill Seekers works**
---
## Overview
Skill Seekers transforms documentation, code, and content into **structured knowledge assets** that AI systems can use effectively.
```
Raw Content → Skill Seekers → AI-Ready Skill
↓ ↓
(docs, code, (SKILL.md +
PDFs, repos) references)
```
---
## What is a Skill?
A **skill** is a structured knowledge package containing:
```
output/my-skill/
├── SKILL.md # Main file (400+ lines typically)
├── references/ # Categorized content
│ ├── index.md # Navigation
│ ├── getting_started.md
│ ├── api_reference.md
│ └── ...
├── .skill-seekers/ # Metadata
└── assets/ # Images, downloads
```
### SKILL.md Structure
```markdown
# My Framework Skill
## Overview
Brief description of the framework...
## Quick Reference
Common commands and patterns...
## Categories
- [Getting Started](#getting-started)
- [API Reference](#api-reference)
- [Guides](#guides)
## Getting Started
### Installation
```bash
npm install my-framework
```
### First Steps
...
## API Reference
...
```
### Why This Structure?
| Element | Purpose |
|---------|---------|
| **Overview** | Quick context for AI |
| **Quick Reference** | Common patterns at a glance |
| **Categories** | Organized deep dives |
| **Code Examples** | Copy-paste ready snippets |
---
## Source Types
Skill Seekers works with four types of sources:
### 1. Documentation Websites
**What:** Web-based documentation (ReadTheDocs, Docusaurus, GitBook, etc.)
**Examples:**
- React docs (react.dev)
- Django docs (docs.djangoproject.com)
- Kubernetes docs (kubernetes.io)
**Command:**
```bash
skill-seekers create https://docs.example.com/
```
**Best for:**
- Framework documentation
- API references
- Tutorials and guides
---
### 2. GitHub Repositories
**What:** Source code repositories with analysis
**Extracts:**
- Code structure and APIs
- README and documentation
- Issues and discussions
- Releases and changelog
**Command:**
```bash
skill-seekers create owner/repo
skill-seekers github --repo owner/repo
```
**Best for:**
- Understanding codebases
- API implementation details
- Contributing guidelines
---
### 3. PDF Documents
**What:** PDF manuals, papers, documentation
**Handles:**
- Text extraction
- OCR for scanned PDFs
- Table extraction
- Image extraction
**Command:**
```bash
skill-seekers create manual.pdf
skill-seekers pdf --pdf manual.pdf
```
**Best for:**
- Product manuals
- Research papers
- Legacy documentation
---
### 4. Local Codebases
**What:** Your local projects and code
**Analyzes:**
- Source code structure
- Comments and docstrings
- Test files
- Configuration patterns
**Command:**
```bash
skill-seekers create ./my-project
skill-seekers analyze --directory ./my-project
```
**Best for:**
- Your own projects
- Internal tools
- Code review preparation
---
## The Workflow
### Phase 1: Ingest
```
┌─────────────┐ ┌──────────────┐
│ Source │────▶│ Scraper │
│ (URL/repo/ │ │ (extracts │
│ PDF/local) │ │ content) │
└─────────────┘ └──────────────┘
```
- Detects source type automatically
- Crawls and downloads content
- Respects rate limits
- Extracts text, code, metadata
---
### Phase 2: Structure
```
┌──────────────┐ ┌──────────────┐
│ Raw Data │────▶│ Builder │
│ (pages/files/│ │ (organizes │
│ commits) │ │ by category)│
└──────────────┘ └──────────────┘
```
- Categorizes content by topic
- Extracts code examples
- Builds navigation structure
- Creates reference files
---
### Phase 3: Enhance (Optional)
```
┌──────────────┐ ┌──────────────┐
│ SKILL.md │────▶│ Enhancer │
│ (basic) │ │ (AI improves │
│ │ │ quality) │
└──────────────┘ └──────────────┘
```
- AI reviews and improves content
- Adds examples and patterns
- Fixes formatting
- Enhances navigation
**Modes:**
- **API:** Uses Claude API (fast, costs ~$0.10-0.30)
- **LOCAL:** Uses Claude Code (free, requires Claude Code Max)
---
### Phase 4: Package
```
┌──────────────┐ ┌──────────────┐
│ Skill Dir │────▶│ Packager │
│ (structured │ │ (creates │
│ content) │ │ platform │
│ │ │ format) │
└──────────────┘ └──────────────┘
```
- Formats for target platform
- Creates archives (ZIP, tar.gz)
- Optimizes for size
- Validates structure
---
### Phase 5: Upload (Optional)
```
┌──────────────┐ ┌──────────────┐
│ Package │────▶│ Platform │
│ (.zip/.tar) │ │ (Claude/ │
│ │ │ Gemini/etc) │
└──────────────┘ └──────────────┘
```
- Uploads to target platform
- Configures settings
- Returns skill ID/URL
---
## Enhancement Levels
Control how much AI enhancement is applied:
| Level | What Happens | Use Case |
|-------|--------------|----------|
| **0** | No enhancement | Fast scraping, manual review |
| **1** | SKILL.md only | Basic improvement |
| **2** | + architecture/config | **Recommended** - good balance |
| **3** | Full enhancement | Maximum quality, takes longer |
**Default:** Level 2
```bash
# Skip enhancement (fastest)
skill-seekers create <source> --enhance-level 0
# Full enhancement (best quality)
skill-seekers create <source> --enhance-level 3
```
---
## Target Platforms
Package skills for different AI systems:
| Platform | Format | Use |
|----------|--------|-----|
| **Claude AI** | ZIP + YAML | Claude Code, Claude API |
| **Gemini** | tar.gz | Google Gemini |
| **OpenAI** | ZIP + Vector | ChatGPT, Assistants API |
| **LangChain** | Documents | RAG pipelines |
| **LlamaIndex** | TextNodes | Query engines |
| **ChromaDB** | Collection | Vector search |
| **Weaviate** | Objects | Vector database |
| **Cursor** | .cursorrules | IDE AI assistant |
| **Windsurf** | .windsurfrules | IDE AI assistant |
---
## Configuration
### Simple (Auto-Detect)
```bash
# Just provide the source
skill-seekers create https://docs.react.dev/
```
### Preset Configs
```bash
# Use predefined configuration
skill-seekers create --config react
```
**Available presets:** `react`, `vue`, `django`, `fastapi`, `godot`, etc.
### Custom Config
```bash
# Create custom config
cat > configs/my-docs.json << 'EOF'
{
"name": "my-docs",
"base_url": "https://docs.example.com/",
"max_pages": 200
}
EOF
skill-seekers create --config configs/my-docs.json
```
See [Config Format](../reference/CONFIG_FORMAT.md) for full specification.
---
## Multi-Source Skills
Combine multiple sources into one skill:
```bash
# Create unified config
cat > configs/my-project.json << 'EOF'
{
"name": "my-project",
"sources": [
{"type": "docs", "base_url": "https://docs.example.com/"},
{"type": "github", "repo": "owner/repo"},
{"type": "pdf", "pdf_path": "manual.pdf"}
]
}
EOF
# Run unified scraping
skill-seekers unified --config configs/my-project.json
```
**Benefits:**
- Single skill with complete context
- Automatic conflict detection
- Cross-referenced content
---
## Caching and Resumption
### How Caching Works
```
First scrape: Downloads all pages → saves to output/{name}_data/
Second scrape: Reuses cached data → fast rebuild
```
### Skip Scraping
```bash
# Use cached data, just rebuild
skill-seekers create --config react --skip-scrape
```
### Resume Interrupted Jobs
```bash
# List resumable jobs
skill-seekers resume --list
# Resume specific job
skill-seekers resume job-abc123
```
---
## Rate Limiting
Be respectful to servers:
```bash
# Default: 0.5 seconds between requests
skill-seekers create <source>
# Faster (for your own servers)
skill-seekers create <source> --rate-limit 0.1
# Slower (for rate-limited sites)
skill-seekers create <source> --rate-limit 2.0
```
**Why it matters:**
- Prevents being blocked
- Respects server resources
- Good citizenship
---
## Key Takeaways
1. **Skills are structured knowledge** - Not just raw text
2. **Auto-detection works** - Usually don't need custom configs
3. **Enhancement improves quality** - Level 2 is the sweet spot
4. **Package once, use everywhere** - Same skill, multiple platforms
5. **Cache saves time** - Rebuild without re-scraping
---
## Next Steps
- [Scraping Guide](02-scraping.md) - Deep dive into source options
- [Enhancement Guide](03-enhancement.md) - AI enhancement explained
- [Config Format](../reference/CONFIG_FORMAT.md) - Custom configurations