firefrost-gaming/skill-seekers-reference

yusyus 37cb307455 docs: update all documentation for 17 source types

Update 32 documentation files across English and Chinese (zh-CN) docs
to reflect the 10 new source types added in the previous commit.

Updated files:
- README.md, README.zh-CN.md — taglines, feature lists, examples, install extras
- docs/reference/ — CLI_REFERENCE, FEATURE_MATRIX, MCP_REFERENCE, CONFIG_FORMAT, API_REFERENCE
- docs/features/ — UNIFIED_SCRAPING with generic merge docs
- docs/advanced/ — multi-source guide, MCP server guide
- docs/getting-started/ — installation extras, quick-start examples
- docs/user-guide/ — core-concepts, scraping, packaging, workflows (complex-merge)
- docs/ — FAQ, TROUBLESHOOTING, BEST_PRACTICES, ARCHITECTURE, UNIFIED_PARSERS, README
- Root — BULLETPROOF_QUICKSTART, CONTRIBUTING, ROADMAP
- docs/zh-CN/ — Chinese translations for all of the above

32 files changed, +3,016 lines, -245 lines

2026-03-15 15:56:04 +03:00

Core Concepts

Skill Seekers v3.2.0
Understanding how Skill Seekers works

Overview

Skill Seekers transforms documentation, code, and content into structured knowledge assets that AI systems can use effectively. It supports 17 source types including documentation sites, GitHub repos, PDFs, videos, notebooks, wikis, and more.

Raw Content → Skill Seekers → AI-Ready Skill
     ↓                              ↓
  (docs, code, PDFs,          (SKILL.md +
   videos, notebooks,          references)
   wikis, feeds, etc.)

What is a Skill?

A skill is a structured knowledge package containing:

output/my-skill/
├── SKILL.md              # Main file (400+ lines typically)
├── references/           # Categorized content
│   ├── index.md         # Navigation
│   ├── getting_started.md
│   ├── api_reference.md
│   └── ...
├── .skill-seekers/      # Metadata
└── assets/              # Images, downloads
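
The layout above can be checked mechanically. The following is a minimal sketch, assuming that only the two files named explicitly in the tree (`SKILL.md` and `references/index.md`) must be present; `missing_skill_parts` is a hypothetical helper, not part of Skill Seekers.

```python
from pathlib import Path

# Entries taken from the tree above; treating exactly these as required is an assumption.
REQUIRED_PARTS = ("SKILL.md", "references/index.md")


def missing_skill_parts(skill_dir):
    """Return the required entries that are absent from skill_dir."""
    root = Path(skill_dir)
    return [part for part in REQUIRED_PARTS if not (root / part).is_file()]
```

An empty list means the directory has the minimum shape described here; it says nothing about the quality of the content.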

SKILL.md Structure

# My Framework Skill

## Overview
Brief description of the framework...

## Quick Reference
Common commands and patterns...

## Categories
- [Getting Started](#getting-started)
- [API Reference](#api-reference)
- [Guides](#guides)

## Getting Started
### Installation
```bash
npm install my-framework
```

### First Steps
...

## API Reference
...

Why This Structure?

Element	Purpose
Overview	Quick context for AI
Quick Reference	Common patterns at a glance
Categories	Organized deep dives
Code Examples	Copy-paste ready snippets

Source Types

Skill Seekers works with 17 types of sources:

1. Documentation Websites

What: Web-based documentation (ReadTheDocs, Docusaurus, GitBook, etc.)

Examples:

React docs (react.dev)
Django docs (docs.djangoproject.com)
Kubernetes docs (kubernetes.io)

Command:

skill-seekers create https://docs.example.com/

Best for:

Framework documentation
API references
Tutorials and guides

2. GitHub Repositories

What: Source code repositories with analysis

Extracts:

Code structure and APIs
README and documentation
Issues and discussions
Releases and changelog

Command:

skill-seekers create owner/repo
skill-seekers github --repo owner/repo

Best for:

Understanding codebases
API implementation details
Contributing guidelines

3. PDF Documents

What: PDF manuals, papers, documentation

Handles:

Text extraction
OCR for scanned PDFs
Table extraction
Image extraction

Command:

skill-seekers create manual.pdf
skill-seekers pdf --pdf manual.pdf

Best for:

Product manuals
Research papers
Legacy documentation

4. Local Codebases

What: Your local projects and code

Analyzes:

Source code structure
Comments and docstrings
Test files
Configuration patterns

Command:

skill-seekers create ./my-project
skill-seekers analyze --directory ./my-project

Best for:

Your own projects
Internal tools
Code review preparation

5. Word Documents

What: Microsoft Word (.docx) files

Command:

skill-seekers create report.docx

6. EPUB Books

What: EPUB e-book files

Command:

skill-seekers create book.epub

7. Videos

What: YouTube, Vimeo, or local video files (transcripts + visual analysis)

Command:

skill-seekers create https://www.youtube.com/watch?v=...
skill-seekers video --url https://www.youtube.com/watch?v=...

8. Jupyter Notebooks

What: .ipynb notebook files with code, markdown, and outputs

Command:

skill-seekers create analysis.ipynb
skill-seekers jupyter --notebook analysis.ipynb

9. Local HTML Files

What: HTML/HTM files on disk

Command:

skill-seekers create page.html
skill-seekers html --file page.html

10. OpenAPI/Swagger Specs

What: OpenAPI YAML/JSON specifications

Command:

skill-seekers create api-spec.yaml
skill-seekers openapi --spec api-spec.yaml

11. AsciiDoc

What: AsciiDoc (.adoc, .asciidoc) files

Command:

skill-seekers create guide.adoc
skill-seekers asciidoc --file guide.adoc

12. PowerPoint Presentations

What: Microsoft PowerPoint (.pptx) files

Command:

skill-seekers create slides.pptx
skill-seekers pptx --file slides.pptx

13. RSS/Atom Feeds

What: RSS or Atom feed files

Command:

skill-seekers create feed.rss
skill-seekers rss --feed feed.rss

14. Man Pages

What: Unix manual pages (.1 through .8, .man)

Command:

skill-seekers create grep.1
skill-seekers manpage --file grep.1

15. Confluence Wikis

What: Atlassian Confluence spaces (via API or export)

Command:

skill-seekers confluence --space DEV --base-url https://wiki.example.com

16. Notion Workspaces

What: Notion pages and databases (via API or export)

Command:

skill-seekers notion --database abc123

17. Slack/Discord Chat

What: Chat platform exports or API access

Command:

skill-seekers chat --export slack-export/

The Workflow

Phase 1: Ingest

┌─────────────┐     ┌──────────────┐
│   Source    │────▶│   Scraper    │
│ (URL/repo/  │     │ (extracts    │
│  PDF/local) │     │  content)    │
└─────────────┘     └──────────────┘

Detects source type automatically
Crawls and downloads content
Respects rate limits
Extracts text, code, metadata
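
To make "detects source type automatically" concrete, here is a rough sketch of extension-based detection. The table, the type labels, and `guess_source_type` are all hypothetical and built only from the source types listed in this document; the real detector may work differently (for example, telling a video URL from a documentation URL, or an OpenAPI spec from any other YAML file, needs more than a file extension).

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical extension table built from the source types listed above;
# the real detector may use different rules and labels.
EXTENSION_TYPES = {
    ".pdf": "pdf",
    ".docx": "word",
    ".epub": "epub",
    ".ipynb": "jupyter",
    ".html": "html",
    ".htm": "html",
    ".adoc": "asciidoc",
    ".asciidoc": "asciidoc",
    ".pptx": "pptx",
    ".rss": "rss",
    ".man": "manpage",
}


def guess_source_type(source):
    """Guess a source type label from a URL, owner/repo string, or path."""
    if urlparse(source).scheme in ("http", "https"):
        return "web"
    suffix = Path(source).suffix.lower()
    if suffix in EXTENSION_TYPES:
        return EXTENSION_TYPES[suffix]
    # Man page sections .1 through .8
    if len(suffix) == 2 and suffix[1] in "12345678":
        return "manpage"
    # owner/repo shorthand: exactly one slash, two non-empty parts, no leading dot
    parts = source.split("/")
    if len(parts) == 2 and all(parts) and not source.startswith("."):
        return "github"
    return "local"
```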

Phase 2: Structure

┌──────────────┐     ┌──────────────┐
│   Raw Data   │────▶│   Builder    │
│ (pages/files/│     │ (organizes   │
│  commits)    │     │  by category)│
└──────────────┘     └──────────────┘

Categorizes content by topic
Extracts code examples
Builds navigation structure
Creates reference files
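
As an illustration of categorizing by topic, the sketch below groups page titles with simple keyword rules. This document does not describe the tool's actual categorization logic, so the keyword table and `categorize` are assumptions made purely for the example.

```python
# Hypothetical keyword rules; the tool's actual categorization logic
# is not shown in this document.
CATEGORY_KEYWORDS = {
    "getting_started": ("install", "quickstart", "tutorial"),
    "api_reference": ("api", "reference"),
}


def categorize(pages):
    """Group page titles by the first category whose keyword appears in the title."""
    grouped = {}
    for title in pages:
        lowered = title.lower()
        category = next(
            (name for name, words in CATEGORY_KEYWORDS.items()
             if any(word in lowered for word in words)),
            "other",
        )
        grouped.setdefault(category, []).append(title)
    return grouped
```

Each resulting group would correspond to one file under references/, such as getting_started.md.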

Phase 3: Enhance (Optional)

┌──────────────┐     ┌──────────────┐
│   SKILL.md   │────▶│  Enhancer    │
│  (basic)     │     │ (AI improves │
│              │     │  quality)    │
└──────────────┘     └──────────────┘

AI reviews and improves content
Adds examples and patterns
Fixes formatting
Enhances navigation

Modes:

API: Uses Claude API (fast, costs ~$0.10-0.30)
LOCAL: Uses Claude Code (no API charges, but requires Claude Code Max)

Phase 4: Package

┌──────────────┐     ┌──────────────┐
│   Skill Dir  │────▶│   Packager   │
│ (structured  │     │ (creates     │
│  content)    │     │  platform    │
│              │     │  format)     │
└──────────────┘     └──────────────┘

Formats for target platform
Creates archives (ZIP, tar.gz)
Optimizes for size
Validates structure
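
The archive step alone can be sketched with the Python standard library. This is not the Skill Seekers packager: it only produces a plain ZIP or tar.gz of a skill directory and ignores the platform-specific formatting and validation listed above. `archive_skill` is a hypothetical name.

```python
import shutil
from pathlib import Path


def archive_skill(skill_dir, fmt="zip"):
    """Archive skill_dir next to itself; fmt is "zip" or "gztar" (tar.gz)."""
    if fmt not in ("zip", "gztar"):
        raise ValueError(f"unsupported format: {fmt}")
    skill_path = Path(skill_dir).resolve()
    # base_name has no extension; make_archive appends .zip or .tar.gz.
    return shutil.make_archive(str(skill_path), fmt, root_dir=str(skill_path))
```

The function returns the path of the archive it wrote, for example output/my-skill.zip for output/my-skill/.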

Phase 5: Upload (Optional)

┌──────────────┐     ┌──────────────┐
│   Package    │────▶│   Platform   │
│ (.zip/.tar)  │     │ (Claude/     │
│              │     │  Gemini/etc) │
└──────────────┘     └──────────────┘

Uploads to target platform
Configures settings
Returns skill ID/URL

Enhancement Levels

Control how much AI enhancement is applied:

Level	What Happens	Use Case
0	No enhancement	Fast scraping, manual review
1	SKILL.md only	Basic improvement
2	+ architecture/config	Recommended - good balance
3	Full enhancement	Maximum quality, takes longer

Default: Level 2

# Skip enhancement (fastest)
skill-seekers create <source> --enhance-level 0

# Full enhancement (best quality)
skill-seekers create <source> --enhance-level 3
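
When the command is driven from a script, the level can be validated before anything runs. A small sketch, using only the `create` subcommand and `--enhance-level` flag shown above; `create_command` is a hypothetical wrapper.

```python
# Level descriptions copied from the table above.
ENHANCE_LEVELS = {
    0: "No enhancement",
    1: "SKILL.md only",
    2: "+ architecture/config",
    3: "Full enhancement",
}


def create_command(source, enhance_level=2):
    """Build the argv list for `skill-seekers create` with a validated level."""
    if enhance_level not in ENHANCE_LEVELS:
        raise ValueError(f"enhance level must be 0-3, got {enhance_level!r}")
    return ["skill-seekers", "create", source, "--enhance-level", str(enhance_level)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`.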

Target Platforms

Package skills for different AI systems:

Platform	Format	Use
Claude AI	ZIP + YAML	Claude Code, Claude API
Gemini	tar.gz	Google Gemini
OpenAI	ZIP + Vector	ChatGPT, Assistants API
LangChain	Documents	RAG pipelines
LlamaIndex	TextNodes	Query engines
ChromaDB	Collection	Vector search
Weaviate	Objects	Vector database
Cursor	.cursorrules	IDE AI assistant
Windsurf	.windsurfrules	IDE AI assistant

Configuration

Simple (Auto-Detect)

# Just provide the source
skill-seekers create https://react.dev/

Preset Configs

# Use predefined configuration
skill-seekers create --config react

Available presets: react, vue, django, fastapi, godot, etc.

Custom Config

# Create custom config
cat > configs/my-docs.json << 'EOF'
{
  "name": "my-docs",
  "base_url": "https://docs.example.com/",
  "max_pages": 200
}
EOF

skill-seekers create --config configs/my-docs.json

See Config Format for full specification.
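
A config file can be sanity-checked before a long scrape. The sketch below assumes the three keys from the example above are required and typed as shown; the full specification may define more keys or make some optional, and `config_problems` is a hypothetical helper.

```python
import json

# Keys taken from the example above; treating all three as required is an assumption.
EXPECTED_KEYS = {"name": str, "base_url": str, "max_pages": int}


def config_problems(text):
    """Return a list of problems found in a JSON config string (empty if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top-level value must be an object"]
    problems = []
    for key, expected in EXPECTED_KEYS.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif not isinstance(config[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    return problems
```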

Multi-Source Skills

Combine multiple sources into one skill:

# Create unified config
cat > configs/my-project.json << 'EOF'
{
  "name": "my-project",
  "sources": [
    {"type": "docs", "base_url": "https://docs.example.com/"},
    {"type": "github", "repo": "owner/repo"},
    {"type": "pdf", "pdf_path": "manual.pdf"}
  ]
}
EOF

# Run unified scraping
skill-seekers unified --config configs/my-project.json

Benefits:

Single skill with complete context
Automatic conflict detection
Cross-referenced content
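
Before running a unified scrape, it can help to list what the config will pull in. This sketch covers only the three source types in the example above and the location key each one uses there; other types and their keys are not shown in this document, so they are rejected rather than guessed. `summarize_sources` is a hypothetical helper.

```python
import json

# Per-type key taken from the example config above; other source types
# and their keys are not covered by this sketch.
LOCATION_KEY = {"docs": "base_url", "github": "repo", "pdf": "pdf_path"}


def summarize_sources(text):
    """Return (type, location) pairs for each source in a unified config string."""
    config = json.loads(text)
    summary = []
    for index, source in enumerate(config["sources"]):
        kind = source.get("type")
        if kind not in LOCATION_KEY:
            raise ValueError(f"source {index}: unrecognized type {kind!r}")
        key = LOCATION_KEY[kind]
        if key not in source:
            raise ValueError(f"source {index}: {kind} source needs {key!r}")
        summary.append((kind, source[key]))
    return summary
```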

Caching and Resumption

How Caching Works

First scrape:    Downloads all pages → saves to output/{name}_data/
Second scrape:   Reuses cached data → fast rebuild

Skip Scraping

# Use cached data, just rebuild
skill-seekers create --config react --skip-scrape
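
A wrapper script can decide whether to pass `--skip-scrape` by looking for the cache directory. The sketch assumes the output/{name}_data/ location described above and treats any non-empty directory there as reusable, which may not match how the tool itself judges a cache; `create_args` is a hypothetical helper.

```python
from pathlib import Path


def create_args(name, output_root="output"):
    """Return `skill-seekers create` arguments, skipping the scrape if cached data exists."""
    args = ["skill-seekers", "create", "--config", name]
    cache_dir = Path(output_root) / f"{name}_data"
    # Treating any non-empty cache directory as reusable is an assumption.
    if cache_dir.is_dir() and any(cache_dir.iterdir()):
        args.append("--skip-scrape")
    return args
```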

Resume Interrupted Jobs

# List resumable jobs
skill-seekers resume --list

# Resume specific job
skill-seekers resume job-abc123

Rate Limiting

Be respectful to servers:

# Default: 0.5 seconds between requests
skill-seekers create <source>

# Faster (for your own servers)
skill-seekers create <source> --rate-limit 0.1

# Slower (for rate-limited sites)
skill-seekers create <source> --rate-limit 2.0

Why it matters:

Prevents being blocked
Respects server resources
Good citizenship
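
The idea behind a fixed delay between requests can be sketched in a few lines. This is an illustration of the technique, not the tool's implementation; `FixedDelay` is a hypothetical class, and the 0.5 second default simply mirrors the default quoted above.

```python
import time


class FixedDelay:
    """Enforce a minimum interval between calls to wait()."""

    def __init__(self, seconds=0.5, clock=time.monotonic, sleep=time.sleep):
        self.seconds = seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Sleep just long enough to keep calls `seconds` apart; return the time slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.seconds - (now - self._last))
        if delay:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `wait()` before each request keeps requests at least `seconds` apart. The clock and sleep functions are injectable so the behavior can be tested without real waiting.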

Key Takeaways

Skills are structured knowledge - Not just raw text
Auto-detection works - Usually don't need custom configs
Enhancement improves quality - Level 2 is the sweet spot
Package once, use everywhere - Same skill, multiple platforms
Cache saves time - Rebuild without re-scraping

Next Steps

Scraping Guide - Deep dive into source options
Enhancement Guide - AI enhancement explained
Config Format - Custom configurations
