skill-seekers-reference/docs/zh-CN/user-guide/01-core-concepts.md
yusyus ba9a8ff8b5 docs: complete documentation overhaul with v3.1.0 release notes and zh-CN translations
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 01:01:51 +03:00

Core Concepts

Skill Seekers v3.1.0
Understanding how Skill Seekers works


Overview

Skill Seekers transforms documentation, code, and content into structured knowledge assets that AI systems can use effectively.

Raw Content → Skill Seekers → AI-Ready Skill
     ↓                              ↓
  (docs, code,               (SKILL.md +
   PDFs, repos)                references)

What is a Skill?

A skill is a structured knowledge package containing:

output/my-skill/
├── SKILL.md              # Main file (400+ lines typically)
├── references/           # Categorized content
│   ├── index.md         # Navigation
│   ├── getting_started.md
│   ├── api_reference.md
│   └── ...
├── .skill-seekers/      # Metadata
└── assets/              # Images, downloads
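
As a rough illustration of the layout above, the following Python sketch reports which entries are missing from a skill directory. Treating `SKILL.md` and `references/` as required is an assumption made for this example, and `missing_entries` is a hypothetical helper, not part of Skill Seekers.

```python
from pathlib import Path

# Entry names come from the directory tree above. Which of them are
# strictly required is an assumption of this sketch.
REQUIRED = ["SKILL.md", "references"]


def missing_entries(skill_dir):
    """Return the required entries that are absent from skill_dir."""
    root = Path(skill_dir)
    return [name for name in REQUIRED if not (root / name).exists()]
```

Calling `missing_entries("output/my-skill")` returns an empty list when both entries are present.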

SKILL.md Structure

# My Framework Skill

## Overview
Brief description of the framework...

## Quick Reference
Common commands and patterns...

## Categories
- [Getting Started](#getting-started)
- [API Reference](#api-reference)
- [Guides](#guides)

## Getting Started
### Installation
```bash
npm install my-framework
```

### First Steps
...

## API Reference
...

Why This Structure?

| Element | Purpose |
|---------|---------|
| **Overview** | Quick context for AI |
| **Quick Reference** | Common patterns at a glance |
| **Categories** | Organized deep dives |
| **Code Examples** | Copy-paste ready snippets |
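
To make the structure concrete, here is a small Python sketch that lists the `##` sections of a SKILL.md text and reports which expected ones are missing. The list of expected sections is read off the example above. Requiring all of them is an assumption, and both function names are hypothetical.

```python
import re

# Section names follow the SKILL.md example; treating them as mandatory
# is an assumption of this sketch, not a documented rule.
EXPECTED_SECTIONS = ["Overview", "Quick Reference", "Categories"]


def top_level_sections(markdown_text):
    """Return the titles of all '## ' headings, in document order."""
    return re.findall(r"^## +(.+?)\s*$", markdown_text, flags=re.MULTILINE)


def missing_sections(markdown_text):
    """Return the expected sections that do not appear as '## ' headings."""
    found = set(top_level_sections(markdown_text))
    return [name for name in EXPECTED_SECTIONS if name not in found]
```

The regular expression is deliberately simple: it does not skip headings that appear inside fenced code blocks.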

Source Types

Skill Seekers works with four types of sources:

1. Documentation Websites

What: Web-based documentation (ReadTheDocs, Docusaurus, GitBook, etc.)

Examples:

  • React docs (react.dev)
  • Django docs (docs.djangoproject.com)
  • Kubernetes docs (kubernetes.io)

Command:

skill-seekers create https://docs.example.com/

Best for:

  • Framework documentation
  • API references
  • Tutorials and guides

2. GitHub Repositories

What: Source code repositories, with code analysis

Extracts:

  • Code structure and APIs
  • README and documentation
  • Issues and discussions
  • Releases and changelog

Command:

skill-seekers create owner/repo
skill-seekers github --repo owner/repo

Best for:

  • Understanding codebases
  • API implementation details
  • Contributing guidelines

3. PDF Documents

What: PDF manuals, papers, documentation

Handles:

  • Text extraction
  • OCR for scanned PDFs
  • Table extraction
  • Image extraction

Command:

skill-seekers create manual.pdf
skill-seekers pdf --pdf manual.pdf

Best for:

  • Product manuals
  • Research papers
  • Legacy documentation

4. Local Codebases

What: Your local projects and code

Analyzes:

  • Source code structure
  • Comments and docstrings
  • Test files
  • Configuration patterns

Command:

skill-seekers create ./my-project
skill-seekers analyze --directory ./my-project

Best for:

  • Your own projects
  • Internal tools
  • Code review preparation

The Workflow

Phase 1: Ingest

┌─────────────┐     ┌──────────────┐
│   Source    │────▶│   Scraper    │
│ (URL/repo/  │     │ (extracts    │
│  PDF/local) │     │  content)    │
└─────────────┘     └──────────────┘
  • Detects source type automatically
  • Crawls and downloads content
  • Respects rate limits
  • Extracts text, code, metadata
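
The rules Skill Seekers actually uses for automatic detection are not described in this guide. The Python sketch below shows one plausible heuristic that maps the four example inputs from the Source Types section to a type. `detect_source_type` and its rules are hypothetical.

```python
import re
from urllib.parse import urlparse


def detect_source_type(source):
    """Guess the source type from the shape of the argument (assumed rules)."""
    if source.lower().endswith(".pdf"):
        return "pdf"
    if urlparse(source).scheme in ("http", "https"):
        return "docs"
    # Paths that start with ./, ../, / or ~ are treated as local directories.
    if source.startswith(("./", "../", "/", "~")):
        return "local"
    # A bare "owner/repo" pair is treated as a GitHub repository.
    if re.fullmatch(r"[\w.-]+/[\w.-]+", source):
        return "github"
    return "unknown"
```

The order of the checks matters: the PDF test runs first so that a URL ending in `.pdf` is not classified as a documentation site.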

Phase 2: Structure

┌──────────────┐     ┌──────────────┐
│   Raw Data   │────▶│   Builder    │
│ (pages/files/│     │ (organizes   │
│  commits)    │     │  by category)│
└──────────────┘     └──────────────┘
  • Categorizes content by topic
  • Extracts code examples
  • Builds navigation structure
  • Creates reference files
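
How the builder assigns categories is not specified here. As a minimal sketch of topic-based categorization, the Python example below matches keywords against a page's URL and title. The keyword lists are invented for illustration, and the category names only echo the reference files shown earlier (`getting_started.md`, `api_reference.md`).

```python
# Hypothetical keyword table; not taken from Skill Seekers itself.
CATEGORY_KEYWORDS = {
    "getting_started": ["install", "quickstart", "getting-started", "tutorial"],
    "api_reference": ["api", "reference"],
    "guides": ["guide", "how-to"],
}


def categorize(url, title=""):
    """Return the first category whose keyword appears in the URL or title."""
    haystack = f"{url} {title}".lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in haystack for keyword in keywords):
            return category
    return "other"


def group_pages(pages):
    """Group (url, title) pairs into {category: [url, ...]}."""
    groups = {}
    for url, title in pages:
        groups.setdefault(categorize(url, title), []).append(url)
    return groups
```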

Phase 3: Enhance (Optional)

┌──────────────┐     ┌──────────────┐
│   SKILL.md   │────▶│  Enhancer    │
│  (basic)     │     │ (AI improves │
│              │     │  quality)    │
└──────────────┘     └──────────────┘
  • AI reviews and improves content
  • Adds examples and patterns
  • Fixes formatting
  • Enhances navigation

Modes:

  • API: Uses the Claude API (fast, costs ~$0.10-0.30)
  • LOCAL: Uses Claude Code (no API charges, requires a Claude Code Max plan)

Phase 4: Package

┌──────────────┐     ┌──────────────┐
│   Skill Dir  │────▶│   Packager   │
│ (structured  │     │ (creates     │
│  content)    │     │  platform    │
│              │     │  format)     │
└──────────────┘     └──────────────┘
  • Formats for target platform
  • Creates archives (ZIP, tar.gz)
  • Optimizes for size
  • Validates structure
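
The archive step can be approximated with Python's standard library. This is a sketch only: `package_skill` is a hypothetical helper, and the real packager may add platform-specific metadata that is not shown here.

```python
import shutil
from pathlib import Path


def package_skill(skill_dir, fmt="zip"):
    """Archive skill_dir next to itself and return the archive path.

    fmt is a shutil format name: "zip" for ZIP, "gztar" for tar.gz.
    """
    skill_dir = Path(skill_dir)
    # Minimal structure check before packaging (assumed requirement).
    if not (skill_dir / "SKILL.md").is_file():
        raise FileNotFoundError(f"{skill_dir} has no SKILL.md")
    archive = shutil.make_archive(
        base_name=str(skill_dir),  # output/my-skill -> output/my-skill.zip
        format=fmt,
        root_dir=str(skill_dir),
    )
    return Path(archive)
```

Because the archive is written beside the skill directory rather than inside it, the archive never ends up containing itself.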

Phase 5: Upload (Optional)

┌──────────────┐     ┌──────────────┐
│   Package    │────▶│   Platform   │
│ (.zip/.tar)  │     │ (Claude/     │
│              │     │  Gemini/etc) │
└──────────────┘     └──────────────┘
  • Uploads to target platform
  • Configures settings
  • Returns skill ID/URL

Enhancement Levels

Control how much AI enhancement is applied:

| Level | What Happens | Use Case |
|-------|--------------|----------|
| 0 | No enhancement | Fast scraping, manual review |
| 1 | SKILL.md only | Basic improvement |
| 2 | + architecture/config | Recommended - good balance |
| 3 | Full enhancement | Maximum quality, takes longer |

Default: Level 2

# Skip enhancement (fastest)
skill-seekers create <source> --enhance-level 0

# Full enhancement (best quality)
skill-seekers create <source> --enhance-level 3
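
When driving the CLI from a script, it can help to validate the level before building the command. The Python sketch below uses only the `--enhance-level` flag shown above. `create_command` is a hypothetical helper, and the level descriptions are copied from the table.

```python
ENHANCE_LEVELS = {
    0: "No enhancement",
    1: "SKILL.md only",
    2: "+ architecture/config",
    3: "Full enhancement",
}
DEFAULT_LEVEL = 2  # matches the documented default


def create_command(source, enhance_level=DEFAULT_LEVEL):
    """Build the argv list for `skill-seekers create` at a given level."""
    if enhance_level not in ENHANCE_LEVELS:
        raise ValueError(f"enhance_level must be one of {sorted(ENHANCE_LEVELS)}")
    return ["skill-seekers", "create", source, "--enhance-level", str(enhance_level)]
```

The returned list can be passed to `subprocess.run(..., check=True)` on a machine where Skill Seekers is installed.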

Target Platforms

Package skills for different AI systems:

| Platform | Format | Use |
|----------|--------|-----|
| Claude AI | ZIP + YAML | Claude Code, Claude API |
| Gemini | tar.gz | Google Gemini |
| OpenAI | ZIP + Vector | ChatGPT, Assistants API |
| LangChain | Documents | RAG pipelines |
| LlamaIndex | TextNodes | Query engines |
| ChromaDB | Collection | Vector search |
| Weaviate | Objects | Vector database |
| Cursor | .cursorrules | IDE AI assistant |
| Windsurf | .windsurfrules | IDE AI assistant |

Configuration

Simple (Auto-Detect)

# Just provide the source
skill-seekers create https://react.dev/

Preset Configs

# Use predefined configuration
skill-seekers create --config react

Available presets: react, vue, django, fastapi, godot, etc.

Custom Config

# Create custom config
cat > configs/my-docs.json << 'EOF'
{
  "name": "my-docs",
  "base_url": "https://docs.example.com/",
  "max_pages": 200
}
EOF

skill-seekers create --config configs/my-docs.json

See Config Format for full specification.
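
Before running a long scrape, a custom config can be sanity-checked. The Python sketch below covers only the three fields used in the example above. Which fields are required is an assumption here, and the Config Format reference remains the authority. `load_config` is a hypothetical helper.

```python
import json

REQUIRED_KEYS = {"name": str, "base_url": str}  # assumed to be required
OPTIONAL_KEYS = {"max_pages": int}


def load_config(text):
    """Parse a config document and check the fields used in the example."""
    config = json.loads(text)
    for key, expected in REQUIRED_KEYS.items():
        if not isinstance(config.get(key), expected):
            raise ValueError(f"'{key}' is required and must be a {expected.__name__}")
    for key, expected in OPTIONAL_KEYS.items():
        if key in config and not isinstance(config[key], expected):
            raise ValueError(f"'{key}' must be a {expected.__name__}")
    if not config["base_url"].startswith(("http://", "https://")):
        raise ValueError("'base_url' must be an http(s) URL")
    return config
```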


Multi-Source Skills

Combine multiple sources into one skill:

# Create unified config
cat > configs/my-project.json << 'EOF'
{
  "name": "my-project",
  "sources": [
    {"type": "docs", "base_url": "https://docs.example.com/"},
    {"type": "github", "repo": "owner/repo"},
    {"type": "pdf", "pdf_path": "manual.pdf"}
  ]
}
EOF

# Run unified scraping
skill-seekers unified --config configs/my-project.json

Benefits:

  • Single skill with complete context
  • Automatic conflict detection
  • Cross-referenced content
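
A unified config can be checked in the same spirit. The Python sketch below reads the required field per source type off the example config (`base_url`, `repo`, `pdf_path`). Other source types and fields may exist, so treat the table as an assumption. `check_sources` is a hypothetical helper.

```python
# Required field per source type, taken from the example config above.
REQUIRED_FIELD = {"docs": "base_url", "github": "repo", "pdf": "pdf_path"}


def check_sources(config):
    """Return a list of problems found in a unified config dict."""
    problems = []
    sources = config.get("sources", [])
    if not sources:
        problems.append("config has no sources")
    for index, source in enumerate(sources):
        kind = source.get("type")
        field = REQUIRED_FIELD.get(kind)
        if field is None:
            problems.append(f"sources[{index}]: unknown type {kind!r}")
        elif not source.get(field):
            problems.append(f"sources[{index}]: {kind} source needs '{field}'")
    return problems
```

An empty list means every source has a recognized type and its required field.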

Caching and Resumption

How Caching Works

First scrape:    Downloads all pages → saves to output/{name}_data/
Second scrape:   Reuses cached data → fast rebuild
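
The decision between scraping and rebuilding can be pictured as a check on the cache directory. The Python sketch below follows the `output/{name}_data/` path given above. The "non-empty directory means cached" rule and both function names are assumptions.

```python
from pathlib import Path


def cache_dir(name, output_root="output"):
    """Path of the cached scrape data, following output/{name}_data/."""
    return Path(output_root) / f"{name}_data"


def needs_scrape(name, output_root="output"):
    """True when there is no cached data to rebuild from."""
    data = cache_dir(name, output_root)
    return not data.is_dir() or not any(data.iterdir())
```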

Skip Scraping

# Use cached data, just rebuild
skill-seekers create --config react --skip-scrape

Resume Interrupted Jobs

# List resumable jobs
skill-seekers resume --list

# Resume specific job
skill-seekers resume job-abc123

Rate Limiting

Be respectful to servers:

# Default: 0.5 seconds between requests
skill-seekers create <source>

# Faster (for your own servers)
skill-seekers create <source> --rate-limit 0.1

# Slower (for rate-limited sites)
skill-seekers create <source> --rate-limit 2.0

Why it matters:

  • Prevents being blocked
  • Respects server resources
  • Good citizenship
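
For readers curious how a delay between requests can be enforced, here is a minimal Python sketch of a fixed-interval limiter using the 0.5 second default mentioned above. It is not Skill Seekers' implementation. The `RateLimiter` class is hypothetical, and the injectable `clock` and `sleep` parameters exist only to make it easy to test.

```python
import time


class RateLimiter:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, delay=0.5, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least `delay` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `limiter.wait()` before each request keeps requests at least `delay` seconds apart. The first call returns immediately.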

Key Takeaways

  1. Skills are structured knowledge - Not just raw text
  2. Auto-detection works - You usually don't need custom configs
  3. Enhancement improves quality - Level 2 is the sweet spot
  4. Package once, use everywhere - Same skill, multiple platforms
  5. Cache saves time - Rebuild without re-scraping

Next Steps