docs: add video feature guide and sync README translations
- Add docs/VIDEO_GUIDE.md (483 lines) — comprehensive guide covering Quick Start, CLI reference, visual pipeline, AI enhancement, output structure, time clipping, and troubleshooting - Update README.md video section with new CLI examples (enhance, clipping, vision OCR, re-build from JSON) and link to full guide - Sync README.zh-CN.md with all video feature additions: - Quick Start section: video commands - Core Features: new video extraction feature list - Installation table: video/video-full packages + GPU note - Usage Examples: full video extraction subsection - Documentation links: VIDEO_GUIDE.md reference Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
15
README.md
15
README.md
@@ -715,8 +715,23 @@ skill-seekers video --video-file recording.mp4 --name myrecording
|
||||
|
||||
# Extract with visual frame analysis (requires video-full deps)
|
||||
skill-seekers video --url https://www.youtube.com/watch?v=... --name mytutorial --visual
|
||||
|
||||
# With AI enhancement (cleans OCR + generates polished SKILL.md)
|
||||
skill-seekers video --url https://www.youtube.com/watch?v=... --visual --enhance-level 2
|
||||
|
||||
# Clip a specific section of a video (supports seconds, MM:SS, HH:MM:SS)
|
||||
skill-seekers video --url https://www.youtube.com/watch?v=... --start-time 1:30 --end-time 5:00
|
||||
|
||||
# Use Vision API for low-confidence OCR frames (requires ANTHROPIC_API_KEY)
|
||||
skill-seekers video --url https://www.youtube.com/watch?v=... --visual --vision-ocr
|
||||
|
||||
# Re-build skill from previously extracted data (skip download)
|
||||
skill-seekers video --from-json output/mytutorial/video_data/extracted_data.json --name mytutorial
|
||||
```
|
||||
|
||||
> **Full guide:** See [docs/VIDEO_GUIDE.md](docs/VIDEO_GUIDE.md) for complete CLI reference,
|
||||
> visual pipeline details, AI enhancement options, and troubleshooting.
|
||||
|
||||
### GitHub Repository Analysis
|
||||
|
||||
```bash
|
||||
|
||||
@@ -81,6 +81,11 @@ skill-seekers create django/django # GitHub 仓库
|
||||
skill-seekers create ./my-codebase # 本地项目
|
||||
skill-seekers create manual.pdf # PDF 文件
|
||||
|
||||
# 视频(YouTube、Vimeo 或本地文件 — 需要 skill-seekers[video])
|
||||
skill-seekers video --url https://www.youtube.com/watch?v=... --name mytutorial
|
||||
# 首次使用?自动安装 GPU 感知的视觉依赖:
|
||||
skill-seekers video --setup
|
||||
|
||||
# 根据用途导出
|
||||
skill-seekers package output/django --target claude # Claude AI 技能
|
||||
skill-seekers package output/django --target langchain # LangChain RAG
|
||||
@@ -152,6 +157,14 @@ Skill Seekers 通过以下步骤代替数天的手动预处理工作:
|
||||
- ✅ **并行处理** - 大型 PDF 快 3 倍
|
||||
- ✅ **智能缓存** - 重复运行快 50%
|
||||
|
||||
### 🎬 视频教程提取
|
||||
- ✅ **YouTube 和本地视频** - 从视频教程提取字幕、代码和结构化知识
|
||||
- ✅ **视觉帧分析** - 屏幕 OCR 提取代码编辑器、终端和幻灯片内容
|
||||
- ✅ **GPU 自动检测** - 自动安装正确的 PyTorch 版本(CUDA/ROCm/MPS/CPU)
|
||||
- ✅ **AI 增强** - 两阶段增强:清理 OCR + 生成精美 SKILL.md
|
||||
- ✅ **时间裁剪** - 提取视频的特定片段(`--start-time`、`--end-time`)
|
||||
- ✅ **播放列表支持** - 批量处理 YouTube 播放列表中的所有视频
|
||||
|
||||
### 🐙 GitHub 仓库分析
|
||||
- ✅ **深度代码分析** - 支持 Python、JavaScript、TypeScript、Java、C++、Go 的 AST 解析
|
||||
- ✅ **API 提取** - 函数、类、方法及参数和类型
|
||||
@@ -510,8 +523,14 @@ skill-seekers-setup
|
||||
| `pip install skill-seekers[openai]` | + OpenAI ChatGPT 支持 |
|
||||
| `pip install skill-seekers[all-llms]` | + 所有 LLM 平台 |
|
||||
| `pip install skill-seekers[mcp]` | + MCP 服务器 |
|
||||
| `pip install skill-seekers[video]` | + YouTube/Vimeo 字幕和元数据提取 |
|
||||
| `pip install skill-seekers[video-full]` | + Whisper 转录和视觉帧提取 |
|
||||
| `pip install skill-seekers[all]` | 全部功能 |
|
||||
|
||||
> **视频视觉依赖(GPU 感知):** 安装 `skill-seekers[video-full]` 后,运行
|
||||
> `skill-seekers video --setup` 自动检测您的 GPU 并安装正确的 PyTorch
|
||||
> 版本 + easyocr。这是安装视觉提取依赖的推荐方式。
|
||||
|
||||
---
|
||||
|
||||
## 🚀 一键安装工作流
|
||||
@@ -594,6 +613,44 @@ skill-seekers pdf --pdf docs/manual.pdf --name myskill \
|
||||
skill-seekers pdf --pdf docs/scanned.pdf --name myskill --ocr
|
||||
```
|
||||
|
||||
### 视频教程提取
|
||||
|
||||
```bash
|
||||
# 安装视频支持
|
||||
pip install skill-seekers[video] # 字幕 + 元数据
|
||||
pip install skill-seekers[video-full] # + Whisper 转录 + 视觉帧提取
|
||||
|
||||
# 自动检测 GPU 并安装视觉依赖(PyTorch + easyocr)
|
||||
skill-seekers video --setup
|
||||
|
||||
# 从 YouTube 视频提取
|
||||
skill-seekers video --url https://www.youtube.com/watch?v=dQw4w9WgXcQ --name mytutorial
|
||||
|
||||
# 从 YouTube 播放列表提取
|
||||
skill-seekers video --playlist https://www.youtube.com/playlist?list=... --name myplaylist
|
||||
|
||||
# 从本地视频文件提取
|
||||
skill-seekers video --video-file recording.mp4 --name myrecording
|
||||
|
||||
# 使用视觉帧分析提取(需要 video-full 依赖)
|
||||
skill-seekers video --url https://www.youtube.com/watch?v=... --name mytutorial --visual
|
||||
|
||||
# 使用 AI 增强(清理 OCR + 生成精美 SKILL.md)
|
||||
skill-seekers video --url https://www.youtube.com/watch?v=... --visual --enhance-level 2
|
||||
|
||||
# 裁剪视频的特定片段(支持秒数、MM:SS、HH:MM:SS 格式)
|
||||
skill-seekers video --url https://www.youtube.com/watch?v=... --start-time 1:30 --end-time 5:00
|
||||
|
||||
# 使用 Vision API 处理低置信度 OCR 帧(需要 ANTHROPIC_API_KEY)
|
||||
skill-seekers video --url https://www.youtube.com/watch?v=... --visual --vision-ocr
|
||||
|
||||
# 从之前提取的数据重建技能(跳过下载)
|
||||
skill-seekers video --from-json output/mytutorial/video_data/extracted_data.json --name mytutorial
|
||||
```
|
||||
|
||||
> **完整指南:** 参见 [docs/VIDEO_GUIDE.md](docs/VIDEO_GUIDE.md) 了解完整 CLI 参考、
|
||||
> 视觉流水线详情、AI 增强选项和故障排除。
|
||||
|
||||
### GitHub 仓库分析
|
||||
|
||||
```bash
|
||||
@@ -956,6 +1013,7 @@ skill-seekers config --github
|
||||
- **[docs/ENHANCEMENT_MODES.md](docs/ENHANCEMENT_MODES.md)** - AI 增强模式指南
|
||||
- **[docs/MCP_SETUP.md](docs/MCP_SETUP.md)** - MCP 集成设置
|
||||
- **[docs/UNIFIED_SCRAPING.md](docs/UNIFIED_SCRAPING.md)** - 多源抓取
|
||||
- **[docs/VIDEO_GUIDE.md](docs/VIDEO_GUIDE.md)** - 视频教程提取完整指南
|
||||
|
||||
### 集成指南
|
||||
- **[docs/integrations/LANGCHAIN.md](docs/integrations/LANGCHAIN.md)** - LangChain RAG
|
||||
|
||||
483
docs/VIDEO_GUIDE.md
Normal file
483
docs/VIDEO_GUIDE.md
Normal file
@@ -0,0 +1,483 @@
|
||||
# Video Tutorial Extraction Guide
|
||||
|
||||
Convert video tutorials into structured AI skills with transcripts, on-screen code extraction, and AI enhancement.
|
||||
|
||||
Supports YouTube videos, YouTube playlists, local video files, and pre-extracted JSON data.
|
||||
|
||||
---
|
||||
|
||||
## Quick Start
|
||||
|
||||
```bash
|
||||
# Install transcript-only dependencies (lightweight, ~15 MB)
|
||||
pip install "skill-seekers[video]"
|
||||
|
||||
# Extract a YouTube tutorial (transcript only)
|
||||
skill-seekers video --url https://www.youtube.com/watch?v=VIDEO_ID
|
||||
|
||||
# Install visual extraction dependencies (auto-detects your GPU)
|
||||
skill-seekers video --setup
|
||||
|
||||
# Extract with on-screen code recognition
|
||||
skill-seekers video --url https://www.youtube.com/watch?v=VIDEO_ID --visual
|
||||
|
||||
# Extract with AI enhancement (cleans OCR, synthesizes tutorial)
|
||||
skill-seekers video --url https://www.youtube.com/watch?v=VIDEO_ID --visual --enhance-level 2
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Installation
|
||||
|
||||
### Transcript-Only (Lightweight)
|
||||
|
||||
This installs `yt-dlp` and `youtube-transcript-api` -- everything needed to pull metadata and transcripts from YouTube videos.
|
||||
|
||||
```bash
|
||||
pip install "skill-seekers[video]"
|
||||
```
|
||||
|
||||
Total download size is around 15 MB. No GPU or native libraries required.
|
||||
|
||||
### Full Visual Extraction
|
||||
|
||||
Visual extraction adds scene detection, keyframe classification, OCR (optical character recognition), and Whisper speech-to-text. Install the base visual dependencies first:
|
||||
|
||||
```bash
|
||||
pip install "skill-seekers[video-full]"
|
||||
```
|
||||
|
||||
This installs `faster-whisper`, `scenedetect`, `opencv-python-headless`, and `pytesseract`.
|
||||
|
||||
Then run the setup command to install GPU-aware dependencies (PyTorch and EasyOCR):
|
||||
|
||||
```bash
|
||||
skill-seekers video --setup
|
||||
```
|
||||
|
||||
### GPU Setup (`--setup`)
|
||||
|
||||
The `--setup` command auto-detects your GPU hardware and installs the correct PyTorch variant along with EasyOCR. These packages are installed at runtime rather than through pip extras because PyTorch requires different builds depending on your GPU.
|
||||
|
||||
**Detection order:**
|
||||
|
||||
| GPU Type | Detection Method | PyTorch Variant Installed |
|
||||
|----------|-----------------|--------------------------|
|
||||
| NVIDIA (CUDA) | `nvidia-smi` | `torch` with CUDA 11.8 / 12.1 / 12.4 (matched to your driver) |
|
||||
| AMD (ROCm) | `rocminfo` | `torch` with ROCm 6.2 / 6.3 |
|
||||
| AMD (no ROCm) | `lspci` | CPU-only (warns to install ROCm first) |
|
||||
| Apple Silicon | macOS detection | `torch` with MPS support |
|
||||
| CPU only | Fallback | `torch` CPU build |
|
||||
|
||||
**What gets installed:**
|
||||
|
||||
- **PyTorch** -- correct build for your GPU
|
||||
- **EasyOCR** -- multi-engine OCR for on-screen text extraction
|
||||
- **opencv-python-headless** -- frame extraction and image processing
|
||||
- **scenedetect** -- scene change detection for keyframe selection
|
||||
- **pytesseract** -- Tesseract OCR engine (requires the `tesseract` system binary)
|
||||
- **faster-whisper** -- Whisper speech-to-text for audio fallback
|
||||
|
||||
**System dependency:** Tesseract must be installed separately through your system package manager:
|
||||
|
||||
```bash
|
||||
# Ubuntu/Debian
|
||||
sudo apt install tesseract-ocr
|
||||
|
||||
# macOS
|
||||
brew install tesseract
|
||||
|
||||
# Arch/Manjaro
|
||||
sudo pacman -S tesseract
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## CLI Reference
|
||||
|
||||
### Video-Specific Flags
|
||||
|
||||
| Flag | Type | Default | Description |
|
||||
|------|------|---------|-------------|
|
||||
| `--url URL` | string | -- | Video URL (YouTube, Vimeo) |
|
||||
| `--video-file PATH` | string | -- | Local video file path |
|
||||
| `--playlist URL` | string | -- | Playlist URL (processes all videos in the playlist) |
|
||||
| `--visual` | flag | off | Enable visual extraction (requires video-full deps) |
|
||||
| `--vision-ocr` | flag | off | Use Claude Vision API as fallback for low-confidence code frames (requires `ANTHROPIC_API_KEY`, ~$0.004/frame) |
|
||||
| `--whisper-model MODEL` | string | `base` | Whisper model size for speech-to-text fallback |
|
||||
| `--from-json FILE` | string | -- | Build skill from previously extracted JSON data |
|
||||
| `--start-time TIME` | string | -- | Start time for extraction (single video only) |
|
||||
| `--end-time TIME` | string | -- | End time for extraction (single video only) |
|
||||
| `--setup` | flag | -- | Auto-detect GPU and install visual extraction dependencies, then exit |
|
||||
| `--visual-interval SECS` | float | `0.7` | How often to sample frames during visual extraction (seconds) |
|
||||
| `--visual-min-gap SECS` | float | `0.5` | Minimum gap between extracted frames (seconds) |
|
||||
| `--visual-similarity THRESH` | float | `3.0` | Pixel-diff threshold for duplicate frame detection; lower values keep more frames |
|
||||
| `--languages LANGS` | string | `en` | Transcript language preference (comma-separated, e.g., `en,es`) |
|
||||
|
||||
### Shared Flags
|
||||
|
||||
These flags are available on all Skill Seekers commands:
|
||||
|
||||
| Flag | Type | Default | Description |
|
||||
|------|------|---------|-------------|
|
||||
| `--name NAME` | string | `video_skill` | Skill name (used for output directory and filenames) |
|
||||
| `--description TEXT` | string | -- | Skill description (used in SKILL.md) |
|
||||
| `--output DIR` | string | `output/<name>` | Output directory |
|
||||
| `--enhance-level LEVEL` | int (0-3) | `0` | AI enhancement level (see [AI Enhancement](#ai-enhancement) below). Default is 0 (disabled) for the video command. |
|
||||
| `--enhance-workflow NAME` | string | -- | Enhancement workflow preset to apply (repeatable). Auto-set to `video-tutorial` when `--enhance-level` > 0. |
|
||||
| `--api-key KEY` | string | -- | Anthropic API key (or set `ANTHROPIC_API_KEY` env var) |
|
||||
| `--dry-run` | flag | off | Preview what will happen without executing |
|
||||
| `--verbose` / `-v` | flag | off | Enable DEBUG level logging |
|
||||
| `--quiet` / `-q` | flag | off | Suppress most output (WARNING level only) |
|
||||
|
||||
---
|
||||
|
||||
## Source Types
|
||||
|
||||
### YouTube Videos
|
||||
|
||||
Provide a YouTube URL with `--url`. The tool extracts metadata (title, channel, duration, chapters, tags, view count) via `yt-dlp` and fetches transcripts via the YouTube Transcript API.
|
||||
|
||||
```bash
|
||||
skill-seekers video --url https://www.youtube.com/watch?v=dQw4w9WgXcQ --name my-tutorial
|
||||
```
|
||||
|
||||
Shortened URLs also work:
|
||||
|
||||
```bash
|
||||
skill-seekers video --url https://youtu.be/dQw4w9WgXcQ
|
||||
```
|
||||
|
||||
### YouTube Playlists
|
||||
|
||||
Provide a playlist URL with `--playlist`. Every video in the playlist is processed sequentially and combined into a single skill.
|
||||
|
||||
```bash
|
||||
skill-seekers video --playlist "https://www.youtube.com/playlist?list=PLxxxxxxx" --name course-name
|
||||
```
|
||||
|
||||
Note: `--start-time` and `--end-time` cannot be used with playlists.
|
||||
|
||||
### Local Video Files
|
||||
|
||||
Provide a file path with `--video-file`. Metadata is extracted from the file itself. If a subtitle file (`.srt` or `.vtt`) exists alongside the video with the same base name, it is used automatically.
|
||||
|
||||
```bash
|
||||
skill-seekers video --video-file recording.mp4 --name my-recording
|
||||
```
|
||||
|
||||
For transcript extraction from local files without subtitles, the tool falls back to Whisper speech-to-text (requires `faster-whisper` from the `video-full` extras).
|
||||
|
||||
### Pre-Extracted JSON (`--from-json`)
|
||||
|
||||
If you have already run extraction and saved the JSON data, you can rebuild the skill without re-downloading or re-processing:
|
||||
|
||||
```bash
|
||||
skill-seekers video --from-json output/my-tutorial_video_extracted.json --name my-tutorial
|
||||
```
|
||||
|
||||
This skips all network requests and video processing -- it only runs the skill-building step.
|
||||
|
||||
---
|
||||
|
||||
## Visual Extraction Pipeline
|
||||
|
||||
When `--visual` is enabled, the tool runs a multi-stream pipeline on the video file:
|
||||
|
||||
### Stream 1: Metadata Extraction
|
||||
|
||||
Uses `yt-dlp` to fetch title, channel, duration, chapters, tags, thumbnails, view/like counts, and upload date.
|
||||
|
||||
### Stream 2: Transcript Extraction (3-Tier Fallback)
|
||||
|
||||
Transcripts are acquired in priority order:
|
||||
|
||||
1. **YouTube Transcript API** -- Fetches official captions. Prefers manually created transcripts over auto-generated ones. Confidence is reduced by 20% for auto-generated captions.
|
||||
2. **Subtitle files** -- Parses `.srt` or `.vtt` files found alongside local video files.
|
||||
3. **Whisper fallback** -- Runs `faster-whisper` speech-to-text on the audio track. Requires the `video-full` extras.
|
||||
|
||||
If none succeed, the video is processed without a transcript (visual data only).
|
||||
|
||||
### Stream 3: Visual Extraction
|
||||
|
||||
The visual pipeline has several stages:
|
||||
|
||||
1. **Scene detection** -- Samples frames at `--visual-interval` intervals (default: every 0.7 seconds). Filters duplicates using pixel-diff comparison controlled by `--visual-similarity`.
|
||||
|
||||
2. **Keyframe classification** -- Each extracted frame is classified into one of these types:
|
||||
- `CODE_EDITOR` -- IDE or text editor showing code
|
||||
- `TERMINAL` -- Command line / terminal window
|
||||
- `SLIDE` -- Presentation slide
|
||||
- `DIAGRAM` -- Diagrams, flowcharts, architecture drawings
|
||||
- `BROWSER` -- Web browser content
|
||||
- `WEBCAM` -- Speaker face / webcam feed
|
||||
- `SCREENCAST` -- General screen recording
|
||||
- `OTHER` -- Anything else
|
||||
|
||||
3. **Per-panel OCR** -- For `CODE_EDITOR` and `TERMINAL` frames, the image is split into panels (e.g., sidebar vs. editor pane) and OCR is run on each panel separately. This avoids mixing IDE UI text with actual code.
|
||||
|
||||
4. **OCR line cleaning** -- Removes line numbers captured by OCR, IDE decorations, button labels, and intra-line duplications from multi-engine results.
|
||||
|
||||
5. **Code filtering** -- The `_is_likely_code()` function checks whether OCR text contains actual programming tokens (e.g., `=`, `{}`, `def`, `import`) rather than UI junk. Only text that passes this filter is included in reference files as code blocks.
|
||||
|
||||
6. **Text block tracking** -- Groups OCR results across sequential frames into "text groups" that track the evolution of on-screen code over time. Detects additions, modifications, and deletions between frames.
|
||||
|
||||
7. **Language detection** -- Uses the `LanguageDetector` to identify the programming language of each text group based on code patterns and keywords.
|
||||
|
||||
8. **Audio-visual alignment** -- Pairs on-screen code with overlapping transcript segments to create annotated code examples (what was on screen + what the narrator was saying).
|
||||
|
||||
### Vision API Fallback (`--vision-ocr`)
|
||||
|
||||
When `--vision-ocr` is enabled and OCR confidence is low on a code frame, the tool sends the frame image to the Claude Vision API for higher-quality code extraction. This costs approximately $0.004 per frame and requires `ANTHROPIC_API_KEY` to be set.
|
||||
|
||||
### OCR Engines
|
||||
|
||||
The tool uses a multi-engine OCR ensemble:
|
||||
|
||||
- **EasyOCR** -- Neural network-based, good at recognizing code fonts
|
||||
- **Tesseract** (via pytesseract) -- Traditional OCR engine, handles clean text well
|
||||
|
||||
Results from both engines are merged with deduplication to maximize accuracy.
|
||||
|
||||
---
|
||||
|
||||
## AI Enhancement
|
||||
|
||||
Enhancement is **disabled by default** for the video command (`--enhance-level` defaults to 0). Enable it by setting `--enhance-level` to 1, 2, or 3.
|
||||
|
||||
### Enhancement Levels
|
||||
|
||||
| Level | What It Does |
|
||||
|-------|-------------|
|
||||
| `0` | No AI enhancement. Raw extraction output only. |
|
||||
| `1` | Enhances SKILL.md only (overview, structure, readability). |
|
||||
| `2` | **Recommended.** Two-pass enhancement: first cleans reference files (Code Timeline reconstruction), then runs workflow stages and enhances SKILL.md. |
|
||||
| `3` | Full enhancement. All level-2 work plus architecture, configuration, and comprehensive documentation analysis. |
|
||||
|
||||
Enhancement auto-detects the mode based on environment:
|
||||
|
||||
- If `ANTHROPIC_API_KEY` is set, uses **API mode** (direct Claude API calls).
|
||||
- Otherwise, uses **LOCAL mode** (Claude Code CLI, free with Max plan).
|
||||
|
||||
### Two-Pass Enhancement (Level 2+)
|
||||
|
||||
At enhancement level 2 or higher, the tool runs two passes:
|
||||
|
||||
**Pass 1: Reference file cleanup.** Each reference file is sent to Claude with a focused prompt to reconstruct the Code Timeline section. The AI uses transcript context to fix OCR errors, remove UI decorations, set correct language tags, and reconstruct garbled code blocks.
|
||||
|
||||
**Pass 2: Workflow stages + SKILL.md rewrite.** The `video-tutorial` workflow runs four specialized stages, then the traditional SKILL.md enhancer rewrites the final output.
|
||||
|
||||
### The `video-tutorial` Workflow
|
||||
|
||||
When `--enhance-level` > 0 and no `--enhance-workflow` is explicitly specified, the `video-tutorial` workflow is automatically applied. It has four stages:
|
||||
|
||||
1. **`ocr_code_cleanup`** -- Reviews all code blocks for OCR noise. Removes captured line numbers, UI elements, and common OCR character confusions (l/1, O/0, rn/m). Outputs cleaned blocks with language detection and confidence scoring.
|
||||
|
||||
2. **`language_detection`** -- Determines the programming language for each code block using narrator mentions, code patterns, visible file extensions, framework context, and pre-filled `detected_language` hints from the extraction pipeline.
|
||||
|
||||
3. **`tutorial_synthesis`** -- Groups content by topic rather than timestamp. Identifies main concepts, builds a progressive learning path, and pairs code blocks with narrator explanations. Creates structured tutorial sections with prerequisites and key concepts.
|
||||
|
||||
4. **`skill_polish`** -- Produces the final SKILL.md with clear trigger conditions, a quick reference of 5-10 annotated code examples, a step-by-step guide, and key concept definitions. Ensures all code fences have correct language tags and no raw OCR artifacts remain.
|
||||
|
||||
---
|
||||
|
||||
## Output Structure
|
||||
|
||||
After extraction completes, the output directory contains:
|
||||
|
||||
```
|
||||
output/<name>/
|
||||
├── SKILL.md # Main skill file (enhanced if --enhance-level > 0)
|
||||
├── references/
|
||||
│ └── video_<sanitized-title>.md # Full transcript + OCR + Code Timeline per video
|
||||
├── frames/ # Only present with --visual
|
||||
│ └── frame_NNN_Ns.jpg # Extracted keyframes (N = frame number, Ns = timestamp)
|
||||
└── video_data/
|
||||
└── metadata.json # Full extraction metadata (VideoScraperResult)
|
||||
```
|
||||
|
||||
Additionally, a standalone JSON file is saved outside the skill directory:
|
||||
|
||||
```
|
||||
output/<name>_video_extracted.json # Raw extraction data (can be re-used with --from-json)
|
||||
```
|
||||
|
||||
### Reference File Contents
|
||||
|
||||
Each reference file (`references/video_<title>.md`) contains:
|
||||
|
||||
- **Metadata block** -- Source channel, duration, publish date, URL, view/like counts, tags
|
||||
- **Table of contents** -- From YouTube chapters or auto-generated segments
|
||||
- **Segments** -- Transcript text organized by time segment, with keyframe images and OCR text inline
|
||||
- **Code Timeline** -- (visual mode) Tracked code groups showing text evolution over time, with edit diffs
|
||||
- **Audio-Visual Alignment** -- (visual mode) Paired on-screen code with narrator explanations
|
||||
- **Transcript source** -- Which tier provided the transcript (YouTube manual, YouTube auto, subtitle file, Whisper) and confidence score
|
||||
|
||||
---
|
||||
|
||||
## Time Clipping
|
||||
|
||||
Use `--start-time` and `--end-time` to extract only a portion of a video. This is useful for long videos where you only need a specific section.
|
||||
|
||||
**Accepted time formats:**
|
||||
|
||||
| Format | Example | Meaning |
|
||||
|--------|---------|---------|
|
||||
| Seconds | `90` or `330.5` | 90 seconds / 330.5 seconds |
|
||||
| MM:SS | `1:30` | 1 minute 30 seconds |
|
||||
| HH:MM:SS | `0:05:30` | 5 minutes 30 seconds |
|
||||
|
||||
Both transcript segments and chapters are filtered to the specified range. When visual extraction is enabled, frames outside the range are skipped.
|
||||
|
||||
```bash
|
||||
# Extract only minutes 5 through 15
|
||||
skill-seekers video --url https://youtu.be/VIDEO_ID --start-time 5:00 --end-time 15:00
|
||||
|
||||
# Extract from 2 minutes onward
|
||||
skill-seekers video --url https://youtu.be/VIDEO_ID --start-time 120
|
||||
|
||||
# Extract the first 10 minutes
|
||||
skill-seekers video --url https://youtu.be/VIDEO_ID --end-time 10:00
|
||||
```
|
||||
|
||||
Restrictions:
|
||||
|
||||
- `--start-time` must be less than `--end-time` when both are specified.
|
||||
- Time clipping cannot be used with `--playlist`.
|
||||
|
||||
---
|
||||
|
||||
## Examples
|
||||
|
||||
### Basic transcript extraction from a YouTube video
|
||||
|
||||
```bash
|
||||
skill-seekers video --url https://www.youtube.com/watch?v=VIDEO_ID --name react-hooks-tutorial
|
||||
```
|
||||
|
||||
### Visual extraction with on-screen code recognition
|
||||
|
||||
```bash
|
||||
skill-seekers video --url https://youtu.be/VIDEO_ID --name godot-signals --visual
|
||||
```
|
||||
|
||||
### Full pipeline with AI enhancement (recommended for production skills)
|
||||
|
||||
```bash
|
||||
export ANTHROPIC_API_KEY=sk-ant-...
|
||||
skill-seekers video --url https://youtu.be/VIDEO_ID --name django-rest-api \
|
||||
--visual --enhance-level 2
|
||||
```
|
||||
|
||||
### Process a local recording with subtitles
|
||||
|
||||
```bash
|
||||
# Place recording.srt alongside recording.mp4
|
||||
skill-seekers video --video-file ./recording.mp4 --name my-lecture
|
||||
```
|
||||
|
||||
### Extract a specific section of a long video
|
||||
|
||||
```bash
|
||||
skill-seekers video --url https://youtu.be/VIDEO_ID --name auth-chapter \
|
||||
--start-time 15:30 --end-time 42:00 --visual
|
||||
```
|
||||
|
||||
### Process an entire YouTube playlist as one skill
|
||||
|
||||
```bash
|
||||
skill-seekers video --playlist "https://www.youtube.com/playlist?list=PLxxxxxxx" \
|
||||
--name python-crash-course --languages en
|
||||
```
|
||||
|
||||
### Rebuild a skill from previously extracted data
|
||||
|
||||
```bash
|
||||
skill-seekers video --from-json output/my-tutorial_video_extracted.json \
|
||||
--name my-tutorial --enhance-level 2
|
||||
```
|
||||
|
||||
### Use Vision API for higher-quality code extraction on difficult frames
|
||||
|
||||
```bash
|
||||
export ANTHROPIC_API_KEY=sk-ant-...
|
||||
skill-seekers video --url https://youtu.be/VIDEO_ID --name cpp-tutorial \
|
||||
--visual --vision-ocr --enhance-level 2
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "Missing video dependencies: yt-dlp, youtube-transcript-api"
|
||||
|
||||
You need to install the video extras:
|
||||
|
||||
```bash
|
||||
pip install "skill-seekers[video]"
|
||||
```
|
||||
|
||||
### "Missing video dependencies" when using `--visual`
|
||||
|
||||
Visual extraction requires the full dependency set:
|
||||
|
||||
```bash
|
||||
pip install "skill-seekers[video-full]"
|
||||
skill-seekers video --setup
|
||||
```
|
||||
|
||||
### GPU not detected by `--setup`
|
||||
|
||||
- **NVIDIA:** Ensure `nvidia-smi` is in your PATH and your GPU driver is installed.
|
||||
- **AMD:** Ensure ROCm is installed and `rocminfo` is available. If only `lspci` detects the GPU, install ROCm first for GPU acceleration: https://rocm.docs.amd.com/
|
||||
- **Fallback:** If no GPU is found, CPU-only PyTorch is installed. OCR and Whisper will still work, just slower.
|
||||
|
||||
### "tesseract is not installed or it's not in your PATH"
|
||||
|
||||
Install Tesseract via your system package manager:
|
||||
|
||||
```bash
|
||||
sudo apt install tesseract-ocr # Ubuntu/Debian
|
||||
brew install tesseract # macOS
|
||||
sudo pacman -S tesseract # Arch/Manjaro
|
||||
```
|
||||
|
||||
### YouTube transcript returns empty
|
||||
|
||||
Some videos have no captions available. Check:
|
||||
|
||||
- The video may have captions disabled by the uploader.
|
||||
- Try different languages with `--languages en,auto`.
|
||||
- For local files, place a `.srt` or `.vtt` subtitle file alongside the video.
|
||||
- Install `faster-whisper` (via `video-full`) for speech-to-text fallback.
|
||||
|
||||
### Rate limits from YouTube
|
||||
|
||||
`yt-dlp` can be rate-limited by YouTube. If you hit this:
|
||||
|
||||
- Wait a few minutes and retry.
|
||||
- For playlists, the tool processes videos sequentially with natural delays.
|
||||
- Consider downloading the video first with `yt-dlp` and using `--video-file`.
|
||||
|
||||
### OCR quality is poor
|
||||
|
||||
- Use `--vision-ocr` to enable the Claude Vision API fallback for low-confidence frames (~$0.004/frame).
|
||||
- Lower `--visual-similarity` (e.g., `1.5`) to keep more frames, giving the tracker more data points.
|
||||
- Decrease `--visual-interval` (e.g., `0.3`) to sample frames more frequently.
|
||||
- Use `--enhance-level 2` to let AI reconstruct code blocks from transcript context.
|
||||
|
||||
### Enhancement fails or hangs
|
||||
|
||||
- Verify your API key: `echo $ANTHROPIC_API_KEY`
|
||||
- Check that the key has sufficient quota.
|
||||
- Try a lower enhancement level: `--enhance-level 1` only enhances SKILL.md.
|
||||
- Without an API key, enhancement falls back to LOCAL mode (requires Claude Code CLI with a Max plan).
|
||||
|
||||
### "No videos were successfully processed"
|
||||
|
||||
Check the error output for specifics. Common causes:
|
||||
|
||||
- Invalid or private YouTube URL.
|
||||
- Network connectivity issues.
|
||||
- Video is age-restricted or geo-blocked.
|
||||
- Local file path does not exist or is not a supported format.
|
||||
Reference in New Issue
Block a user