docs: update all documentation for 17 source types

Update 32 documentation files across English and Chinese (zh-CN) docs to reflect the 10 new source types added in the previous commit.

Updated files:
- README.md, README.zh-CN.md: taglines, feature lists, examples, install extras
- docs/reference/: CLI_REFERENCE, FEATURE_MATRIX, MCP_REFERENCE, CONFIG_FORMAT, API_REFERENCE
- docs/features/: UNIFIED_SCRAPING with generic merge docs
- docs/advanced/: multi-source guide, MCP server guide
- docs/getting-started/: installation extras, quick-start examples
- docs/user-guide/: core-concepts, scraping, packaging, workflows (complex-merge)
- docs/: FAQ, TROUBLESHOOTING, BEST_PRACTICES, ARCHITECTURE, UNIFIED_PARSERS, README
- Root: BULLETPROOF_QUICKSTART, CONTRIBUTING, ROADMAP
- docs/zh-CN/: Chinese translations for all of the above

32 files changed, +3,016 lines, -245 lines
@@ -1,19 +1,20 @@
 # Core Concepts

-> **Skill Seekers v3.1.0**
+> **Skill Seekers v3.2.0**
 > **Understanding how Skill Seekers works**

 ---

 ## Overview

-Skill Seekers transforms documentation, code, and content into **structured knowledge assets** that AI systems can use effectively.
+Skill Seekers transforms documentation, code, and content into **structured knowledge assets** that AI systems can use effectively. It supports **17 source types** including documentation sites, GitHub repos, PDFs, videos, notebooks, wikis, and more.

 ```
 Raw Content → Skill Seekers → AI-Ready Skill
      ↓                              ↓
-(docs, code,                  (SKILL.md +
- PDFs, repos)                  references)
+(docs, code, PDFs,            (SKILL.md +
+ videos, notebooks,            references)
+ wikis, feeds, etc.)
 ```

 ---
@@ -76,7 +77,7 @@ npm install my-framework

 ## Source Types

-Skill Seekers works with four types of sources:
+Skill Seekers works with **17 types of sources**:

 ### 1. Documentation Websites

@@ -168,6 +169,157 @@ skill-seekers analyze --directory ./my-project

 ---

+### 5. Word Documents
+
+**What:** Microsoft Word (.docx) files
+
+**Command:**
+```bash
+skill-seekers create report.docx
+```
+
+---
+
+### 6. EPUB Books
+
+**What:** EPUB e-book files
+
+**Command:**
+```bash
+skill-seekers create book.epub
+```
+
+---
+
+### 7. Videos
+
+**What:** YouTube, Vimeo, or local video files (transcripts + visual analysis)
+
+**Command:**
+```bash
+skill-seekers create https://www.youtube.com/watch?v=...
+skill-seekers video --url https://www.youtube.com/watch?v=...
+```
+
+---
+
+### 8. Jupyter Notebooks
+
+**What:** `.ipynb` notebook files with code, markdown, and outputs
+
+**Command:**
+```bash
+skill-seekers create analysis.ipynb
+skill-seekers jupyter --notebook analysis.ipynb
+```
+
+---
+
+### 9. Local HTML Files
+
+**What:** HTML/HTM files on disk
+
+**Command:**
+```bash
+skill-seekers create page.html
+skill-seekers html --file page.html
+```
+
+---
+
+### 10. OpenAPI/Swagger Specs
+
+**What:** OpenAPI YAML/JSON specifications
+
+**Command:**
+```bash
+skill-seekers create api-spec.yaml
+skill-seekers openapi --spec api-spec.yaml
+```
+
+---
+
+### 11. AsciiDoc
+
+**What:** AsciiDoc (.adoc, .asciidoc) files
+
+**Command:**
+```bash
+skill-seekers create guide.adoc
+skill-seekers asciidoc --file guide.adoc
+```
+
+---
+
+### 12. PowerPoint Presentations
+
+**What:** Microsoft PowerPoint (.pptx) files
+
+**Command:**
+```bash
+skill-seekers create slides.pptx
+skill-seekers pptx --file slides.pptx
+```
+
+---
+
+### 13. RSS/Atom Feeds
+
+**What:** RSS or Atom feed files
+
+**Command:**
+```bash
+skill-seekers create feed.rss
+skill-seekers rss --feed feed.rss
+```
+
+---
+
+### 14. Man Pages
+
+**What:** Unix manual pages (.1 through .8, .man)
+
+**Command:**
+```bash
+skill-seekers create grep.1
+skill-seekers manpage --file grep.1
+```
+
+---
+
+### 15. Confluence Wikis
+
+**What:** Atlassian Confluence spaces (via API or export)
+
+**Command:**
+```bash
+skill-seekers confluence --space DEV --base-url https://wiki.example.com
+```
+
+---
+
+### 16. Notion Workspaces
+
+**What:** Notion pages and databases (via API or export)
+
+**Command:**
+```bash
+skill-seekers notion --database abc123
+```
+
+---
+
+### 17. Slack/Discord Chat
+
+**What:** Chat platform exports or API access
+
+**Command:**
+```bash
+skill-seekers chat --export slack-export/
+```
+
+---
+
 ## The Workflow

 ### Phase 1: Ingest

@@ -1,13 +1,13 @@
 # Scraping Guide

-> **Skill Seekers v3.1.4**
+> **Skill Seekers v3.2.0**
 > **Complete guide to all scraping options**

 ---

 ## Overview

-Skill Seekers can extract knowledge from four types of sources:
+Skill Seekers can extract knowledge from **17 types of sources**:

 | Source | Command | Best For |
 |--------|---------|----------|
@@ -15,6 +15,19 @@ Skill Seekers can extract knowledge from four types of sources:
 | **GitHub** | `create <repo>` | Source code, issues, releases |
 | **PDF** | `create <file.pdf>` | Manuals, papers, reports |
 | **Local** | `create <./path>` | Your projects, internal code |
+| **Word** | `create <file.docx>` | Reports, specifications |
+| **EPUB** | `create <file.epub>` | E-books, long-form docs |
+| **Video** | `create <url/file>` | Tutorials, presentations |
+| **Jupyter** | `create <file.ipynb>` | Data science, experiments |
+| **Local HTML** | `create <file.html>` | Offline docs, saved pages |
+| **OpenAPI** | `create <spec.yaml>` | API specs, Swagger docs |
+| **AsciiDoc** | `create <file.adoc>` | Technical documentation |
+| **PowerPoint** | `create <file.pptx>` | Slide decks, presentations |
+| **RSS/Atom** | `create <feed.rss>` | Blog feeds, news sources |
+| **Man Pages** | `create <cmd.1>` | Unix command documentation |
+| **Confluence** | `confluence` | Team wikis, knowledge bases |
+| **Notion** | `notion` | Workspace docs, databases |
+| **Slack/Discord** | `chat` | Chat history, discussions |

 ---

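
The `create <file.ext>` rows in the table above suggest that the source type is inferred from the argument itself. A minimal sketch of extension-based routing follows. The `detect_source_type` helper, the `EXTENSION_TO_SOURCE` table, and the returned labels are all hypothetical and were assembled from the commands shown in this diff; the real detection logic in Skill Seekers may work differently.

```python
import re
from pathlib import PurePosixPath

# Hypothetical extension table assembled from the commands in this guide;
# it is not copied from the Skill Seekers source code.
EXTENSION_TO_SOURCE = {
    ".pdf": "pdf",
    ".docx": "word",
    ".epub": "epub",
    ".mp4": "video",
    ".ipynb": "jupyter",
    ".html": "html",
    ".htm": "html",
    ".yaml": "openapi",  # assumption: a real detector would inspect the file
    ".yml": "openapi",
    ".adoc": "asciidoc",
    ".asciidoc": "asciidoc",
    ".pptx": "pptx",
    ".rss": "rss",
    ".atom": "rss",
    ".man": "manpage",
}

# Man page sections .1 through .8.
MAN_SECTION = re.compile(r"\.[1-8]")


def detect_source_type(source: str) -> str:
    """Guess the source type for a `create` argument (illustrative only)."""
    if source.startswith(("http://", "https://")):
        # Assumption: video hosts and GitHub are recognized by domain.
        if "youtube.com" in source or "vimeo.com" in source:
            return "video"
        if "github.com" in source:
            return "github"
        return "documentation"
    suffix = PurePosixPath(source).suffix.lower()
    if MAN_SECTION.fullmatch(suffix):
        return "manpage"
    return EXTENSION_TO_SOURCE.get(suffix, "unknown")
```

Sources without a file form (Confluence, Notion, chat exports) have no extension to key on, which fits the table listing them under dedicated subcommands instead of `create`.
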
@@ -280,6 +293,274 @@ skill-seekers analyze --directory ./my-project \

 ---

+## Video Extraction
+
+### Basic Usage
+
+```bash
+# YouTube video
+skill-seekers create https://www.youtube.com/watch?v=dQw4w9WgXcQ
+
+# Local video file
+skill-seekers create presentation.mp4
+
+# With explicit command
+skill-seekers video --url https://www.youtube.com/watch?v=...
+```
+
+### Visual Analysis
+
+```bash
+# Install full video support (includes Whisper + scene detection)
+pip install skill-seekers[video-full]
+skill-seekers video --setup # auto-detect GPU and install PyTorch
+
+# Extract with visual analysis
+skill-seekers video --url <url> --visual-analysis
+```
+
+**Requirements:**
+```bash
+pip install skill-seekers[video] # Transcript only
+pip install skill-seekers[video-full] # + Whisper, scene detection
+```
+
+---
+
+## Word Document Extraction
+
+### Basic Usage
+
+```bash
+# Extract from .docx
+skill-seekers create report.docx --name project-report
+
+# With explicit command
+skill-seekers word --file report.docx
+```
+
+**Handles:** Text, tables, headings, images, embedded metadata.
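
For orientation, a `.docx` file is a ZIP archive whose main body lives in `word/document.xml`. The sketch below pulls paragraph text out of that part using only the standard library. The `docx_paragraphs` name is hypothetical, and this is not how Skill Seekers itself is known to parse Word files; it only illustrates the kind of text extraction listed above.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(docx_file) -> list[str]:
    """Return the text of each non-empty paragraph in a .docx file.

    `docx_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join every <w:t> node.
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

Paragraphs inside table cells are also `w:p` elements, so they are included; headings, images, and metadata would need the style and relationship parts that this sketch ignores.
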
+
+---
+
+## EPUB Extraction
+
+### Basic Usage
+
+```bash
+# Extract from .epub
+skill-seekers create programming-guide.epub --name guide
+
+# With explicit command
+skill-seekers epub --file programming-guide.epub
+```
+
+**Handles:** Chapters, metadata, table of contents, embedded images.
+
+---
+
+## Jupyter Notebook Extraction
+
+### Basic Usage
+
+```bash
+# Extract from .ipynb
+skill-seekers create analysis.ipynb --name data-analysis
+
+# With explicit command
+skill-seekers jupyter --notebook analysis.ipynb
+```
+
+**Requirements:**
+```bash
+pip install skill-seekers[jupyter]
+```
+
+**Extracts:** Markdown cells, code cells, cell outputs, execution order.
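
The fields listed above map directly onto the notebook file format: an `.ipynb` file is JSON with a top-level `cells` list, and each cell carries `cell_type`, `source`, and, for code cells, `execution_count` and `outputs`. A minimal sketch of reading those fields, assuming nbformat v4 and a hypothetical `summarize_notebook` helper (not part of the Skill Seekers API):

```python
import json


def summarize_notebook(notebook_json: str) -> list[dict]:
    """Flatten an nbformat v4 notebook into a list of cell summaries."""
    notebook = json.loads(notebook_json)
    cells = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores source as either one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        cells.append({
            "type": cell.get("cell_type"),
            "source": source,
            # Only code cells have an execution count and outputs.
            "execution_count": cell.get("execution_count"),
            "n_outputs": len(cell.get("outputs", [])),
        })
    return cells
```

Cells come back in file order; sorting code cells by `execution_count` would recover the order in which they were actually run.
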
+
+---
+
+## Local HTML Extraction
+
+### Basic Usage
+
+```bash
+# Extract from .html
+skill-seekers create docs.html --name offline-docs
+
+# With explicit command
+skill-seekers html --file docs.html
+```
+
+**Handles:** Full HTML parsing, text extraction, link resolution.
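
To make "text extraction" and "link resolution" concrete, here is a minimal standard-library sketch. The `TextAndLinks` class and `extract` function are hypothetical names, and the base URL is an assumed input; Skill Seekers may use a different parser and different rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TextAndLinks(HTMLParser):
    """Collect visible text and absolute link targets from an HTML page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.text_parts: list[str] = []
        self.links: list[str] = []
        self._skip_depth = 0  # greater than 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's base URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract(html: str, base_url: str) -> tuple[str, list[str]]:
    parser = TextAndLinks(base_url)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links
```

For a file on disk, the base URL could be the file's own `file://` location, so that relative links between saved pages still resolve.
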
+
+---
+
+## OpenAPI/Swagger Extraction
+
+### Basic Usage
+
+```bash
+# Extract from OpenAPI spec
+skill-seekers create api-spec.yaml --name my-api
+
+# With explicit command
+skill-seekers openapi --spec api-spec.yaml
+```
+
+**Extracts:** Endpoints, request/response schemas, authentication info, examples.
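
Endpoint extraction amounts to walking the spec's `paths` object, where each path maps HTTP methods to operation objects. The sketch below assumes the spec has already been loaded into a dictionary (for example with `json.load`, or a YAML loader for `.yaml` files). `list_endpoints` is a hypothetical helper, not a Skill Seekers function.

```python
# HTTP methods that the OpenAPI 3 Path Item Object allows as keys.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_endpoints(spec: dict) -> list[tuple[str, str, str]]:
    """Return (METHOD, path, summary) for every operation in the spec."""
    endpoints = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            # Path items can also hold non-operation keys such as
            # "parameters" or "summary"; skip anything that is not a method.
            if method.lower() in HTTP_METHODS:
                endpoints.append(
                    (method.upper(), path, operation.get("summary", ""))
                )
    return sorted(endpoints, key=lambda e: (e[1], e[0]))
```

Request and response schemas hang off each operation (`requestBody`, `responses`) and usually point into `components/schemas` through `$ref`, which a fuller extractor would need to resolve.
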
+
+---
+
+## AsciiDoc Extraction
+
+### Basic Usage
+
+```bash
+# Extract from .adoc
+skill-seekers create guide.adoc --name dev-guide
+
+# With explicit command
+skill-seekers asciidoc --file guide.adoc
+```
+
+**Requirements:**
+```bash
+pip install skill-seekers[asciidoc]
+```
+
+**Handles:** Sections, code blocks, tables, cross-references, includes.
+
+---
+
+## PowerPoint Extraction
+
+### Basic Usage
+
+```bash
+# Extract from .pptx
+skill-seekers create slides.pptx --name presentation
+
+# With explicit command
+skill-seekers pptx --file slides.pptx
+```
+
+**Requirements:**
+```bash
+pip install skill-seekers[pptx]
+```
+
+**Extracts:** Slide text, speaker notes, images, tables, slide order.
+
+---
+
+## RSS/Atom Feed Extraction
+
+### Basic Usage
+
+```bash
+# Extract from RSS feed
+skill-seekers create blog.rss --name blog-archive
+
+# Atom feed
+skill-seekers create updates.atom --name updates
+
+# With explicit command
+skill-seekers rss --feed blog.rss
+```
+
+**Requirements:**
+```bash
+pip install skill-seekers[rss]
+```
+
+**Extracts:** Articles, titles, dates, authors, categories.
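
The fields above correspond to child elements of each `<item>` in an RSS 2.0 feed. A minimal standard-library sketch follows, with a hypothetical `parse_rss` helper. It covers RSS 2.0 only; Atom feeds use namespaced `<entry>` elements and would need separate handling, and the `rss` extra presumably pulls in a dedicated feed library instead.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text: str) -> list[dict]:
    """Extract basic article fields from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        articles.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
            "author": item.findtext("author", default=""),
            # An item may carry any number of <category> elements.
            "categories": [c.text or "" for c in item.findall("category")],
        })
    return articles
```

Missing elements come back as empty strings rather than raising, since most RSS item fields are optional.
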
+
+---
+
+## Man Page Extraction
+
+### Basic Usage
+
+```bash
+# Extract from man page
+skill-seekers create curl.1 --name curl-manual
+
+# With explicit command
+skill-seekers manpage --file curl.1
+```
+
+**Handles:** Sections (NAME, SYNOPSIS, DESCRIPTION, OPTIONS, etc.), formatting.
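
In roff source, each of those sections starts with a `.SH` macro line, so a first pass at section handling can be a simple line scan. The sketch below uses a hypothetical `split_man_sections` helper and leaves inline formatting macros such as `.B` untouched; it is an illustration, not the Skill Seekers parser.

```python
def split_man_sections(roff_text: str) -> dict[str, str]:
    """Split man page source into {section name: raw body} by .SH macros."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in roff_text.splitlines():
        if line.startswith(".SH"):
            # Section titles may be quoted, e.g. .SH "SEE ALSO".
            current = line[3:].strip().strip('"')
            sections[current] = []
        elif current is not None and not line.startswith('.\\"'):
            # Lines starting with .\" are roff comments and are dropped.
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}
```

Anything before the first `.SH` (typically the `.TH` title line) is ignored here; a fuller parser would read the command name and section number from it.
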
+
+---
+
+## Confluence Wiki Extraction
+
+### Basic Usage
+
+```bash
+# From Confluence API
+skill-seekers confluence \
+  --base-url https://wiki.example.com \
+  --space DEV \
+  --name team-docs
+
+# From Confluence export directory
+skill-seekers confluence --export-dir ./confluence-export/
+```
+
+**Requirements:**
+```bash
+pip install skill-seekers[confluence]
+```
+
+**Extracts:** Pages, page trees, attachments, labels, spaces.
+
+---
+
+## Notion Extraction
+
+### Basic Usage
+
+```bash
+# From Notion API
+export NOTION_API_KEY=secret_...
+skill-seekers notion --database abc123 --name product-wiki
+
+# From Notion export directory
+skill-seekers notion --export-dir ./notion-export/
+```
+
+**Requirements:**
+```bash
+pip install skill-seekers[notion]
+```
+
+**Extracts:** Pages, databases, blocks, properties, relations.
+
+---
+
+## Slack/Discord Chat Extraction
+
+### Basic Usage
+
+```bash
+# From Slack export
+skill-seekers chat --export slack-export/ --name team-discussions
+
+# From Discord export
+skill-seekers chat --export discord-export/ --name server-archive
+```
+
+**Requirements:**
+```bash
+pip install skill-seekers[chat]
+```
+
+**Extracts:** Messages, threads, channels, reactions, attachments.
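
Thread reconstruction is the least obvious item in that list. In a Slack export, each message is a JSON object with a `ts` timestamp, and messages that belong to a thread also carry the parent's timestamp in `thread_ts`. The sketch below groups already-loaded messages on that basis. The `group_threads` helper is hypothetical, the field names are assumed from Slack's export format, and Discord exports are structured differently.

```python
from collections import defaultdict


def group_threads(messages: list[dict]) -> dict[str, list[dict]]:
    """Group Slack messages by thread root and order each thread by time."""
    threads = defaultdict(list)
    for message in messages:
        # Replies point at their parent via `thread_ts`; standalone
        # messages fall back to their own `ts` and form one-message threads.
        root_ts = message.get("thread_ts", message["ts"])
        threads[root_ts].append(message)
    for replies in threads.values():
        replies.sort(key=lambda m: float(m["ts"]))
    return dict(threads)
```

Keying on the root timestamp keeps a question and its answers together, which matters more for a knowledge skill than strict channel chronology.
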
+
+---
+
 ## Common Scraping Patterns

 ### Pattern 1: Test First

@@ -1,6 +1,6 @@
 # Packaging Guide

-> **Skill Seekers v3.1.0**
+> **Skill Seekers v3.2.0**
 > **Export skills to AI platforms and vector databases**

 ---

@@ -1,6 +1,6 @@
 # Workflows Guide

-> **Skill Seekers v3.1.0**
+> **Skill Seekers v3.2.0**
 > **Enhancement workflow presets for specialized analysis**

 ---
@@ -21,7 +21,7 @@ Basic Skill ──▶ Workflow: Security-Focus ──▶ Security-Enhanced Skill

 ## Built-in Presets

-Skill Seekers includes 5 built-in workflow presets:
+Skill Seekers includes 6 built-in workflow presets:

 | Preset | Stages | Best For |
 |--------|--------|----------|
@@ -30,6 +30,7 @@ Skill Seekers includes 5 built-in workflow presets:
 | `security-focus` | 4 | Security analysis |
 | `architecture-comprehensive` | 7 | Deep architecture review |
 | `api-documentation` | 3 | API documentation focus |
+| `complex-merge` | 3 | Merging multiple source types into a unified skill |

 ---

@@ -233,6 +234,36 @@ skill-seekers create https://api.example.com/docs \

 ---

+### Complex-Merge Workflow
+
+**Stages:** 3
+**Purpose:** Merging multiple heterogeneous sources into a unified, coherent skill
+
+```yaml
+stages:
+  - name: source-alignment
+    prompt: Align and deduplicate content from different source types...
+
+  - name: cross-reference
+    prompt: Build cross-references between sources...
+
+  - name: unified-synthesis
+    prompt: Synthesize a unified narrative from all sources...
+```
+
+**Use for:**
+- Multi-source unified configs (docs + GitHub + PDF + video)
+- Combining documentation with chat history or wiki pages
+- Any skill built from 3+ different source types
+
+**Example:**
+```bash
+skill-seekers unified --config configs/multi-source.json \
+  --enhance-workflow complex-merge
+```
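
One way to read the three-stage preset above is as a sequential pipeline in which each stage's output becomes the next stage's input. The sketch below models that reading only. `run_workflow` and the `enhance` callback are hypothetical, the prompts are copied from the truncated YAML as-is, and how Skill Seekers actually executes workflow stages is not shown in this diff.

```python
from typing import Callable

# Stage list mirroring the complex-merge preset; prompts are the
# truncated strings from the YAML above, kept verbatim.
STAGES = [
    {"name": "source-alignment",
     "prompt": "Align and deduplicate content from different source types..."},
    {"name": "cross-reference",
     "prompt": "Build cross-references between sources..."},
    {"name": "unified-synthesis",
     "prompt": "Synthesize a unified narrative from all sources..."},
]


def run_workflow(
    skill_text: str,
    stages: list[dict],
    enhance: Callable[[dict, str], str],
) -> str:
    """Apply each stage in order, feeding the result into the next stage.

    `enhance` stands in for whatever model call performs the rewrite.
    """
    for stage in stages:
        skill_text = enhance(stage, skill_text)
    return skill_text
```

Under this reading the order matters: alignment and deduplication run before cross-referencing, so that links are built between already-merged passages rather than between duplicates.
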
+
+---
+
 ## Chaining Multiple Workflows

 Apply multiple workflows sequentially:
@@ -532,7 +563,7 @@ skill-seekers create <source> \

 ## Workflow Support Across All Scrapers

-Workflows are supported by **all 5 scrapers** in Skill Seekers:
+Workflows are supported by **all 17 source types** in Skill Seekers:

 | Scraper | Command | Workflow Support |
 |---------|---------|------------------|
@@ -540,6 +571,19 @@ Workflows are supported by **all 5 scrapers** in Skill Seekers:
 | GitHub | `github` | ✅ Full support |
 | Local Codebase | `analyze` | ✅ Full support |
 | PDF | `pdf` | ✅ Full support |
+| Word | `word` | ✅ Full support |
+| EPUB | `epub` | ✅ Full support |
+| Video | `video` | ✅ Full support |
+| Jupyter Notebook | `jupyter` | ✅ Full support |
+| Local HTML | `html` | ✅ Full support |
+| OpenAPI/Swagger | `openapi` | ✅ Full support |
+| AsciiDoc | `asciidoc` | ✅ Full support |
+| PowerPoint | `pptx` | ✅ Full support |
+| RSS/Atom | `rss` | ✅ Full support |
+| Man Pages | `manpage` | ✅ Full support |
+| Confluence | `confluence` | ✅ Full support |
+| Notion | `notion` | ✅ Full support |
+| Slack/Discord | `chat` | ✅ Full support |
 | Unified/Multi-Source | `unified` | ✅ Full support |
 | Create (Auto-detect) | `create` | ✅ Full support |

@@ -609,6 +653,7 @@ skill-seekers unified config.json --enhance-workflow api-documentation
 | **Security-Focus** | Security-sensitive projects |
 | **Architecture** | Large frameworks, systems |
 | **API-Docs** | API frameworks, libraries |
+| **Complex-Merge** | Multi-source skills (3+ source types) |
 | **Custom** | Specialized domains |
 | **Chaining** | Multiple perspectives needed |
