Merge feature/video-scraper-pipeline into development

Video tutorial scraping pipeline (BETA):

- Extract skills from YouTube/Vimeo/local video tutorials
- Visual frame extraction with multi-engine OCR (EasyOCR + pytesseract ensemble)
- Per-panel code detection and structured text assembly
- Keyframe extraction via scene detection
- Whisper transcription fallback
- AI enhancement of extracted content
- `skill-seekers video --setup` for GPU auto-detection and dependency installation (NVIDIA CUDA, AMD ROCm, CPU-only)
- MCP `scrape_video` tool with setup parameter
- 240 tests passing (60 setup + 180 scraper)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 18:55:10 +03:00
parent 68bdbe8307 cc9cc32417
commit 446f6a8955
43 changed files with 17191 additions and 41 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,12 +1,12 @@
 # AGENTS.md - Skill Seekers

-This file provides essential guidance for AI coding agents working with the Skill Seekers codebase.
+Essential guidance for AI coding agents working with the Skill Seekers codebase.

 ---

 ## Project Overview

-**Skill Seekers** is a Python CLI tool that converts documentation websites, GitHub repositories, and PDF files into AI-ready skills for LLM platforms and RAG (Retrieval-Augmented Generation) pipelines. It serves as the universal preprocessing layer for AI systems.
+**Skill Seekers** is a Python CLI tool that converts documentation websites, GitHub repositories, PDF files, and videos into AI-ready skills for LLM platforms and RAG (Retrieval-Augmented Generation) pipelines. It serves as the universal preprocessing layer for AI systems.

 ### Key Facts

@@ -16,8 +16,8 @@ This file provides essential guidance for AI coding agents working with the Skil
 | **Python Version** | 3.10+ (tested on 3.10, 3.11, 3.12, 3.13) |
 | **License** | MIT |
 | **Package Name** | `skill-seekers` (PyPI) |
-| **Source Files** | 169 Python files |
-| **Test Files** | 101 test files |
+| **Source Files** | 182 Python files |
+| **Test Files** | 105+ test files |
 | **Website** | https://skillseekersweb.com/ |
 | **Repository** | https://github.com/yusufkaraaslan/Skill_Seekers |

@@ -44,7 +44,7 @@ This file provides essential guidance for AI coding agents working with the Skil

 ### Core Workflow

-1. **Scrape Phase** - Crawl documentation/GitHub/PDF sources
+1. **Scrape Phase** - Crawl documentation/GitHub/PDF/video sources
 2. **Build Phase** - Organize content into categorized references
 3. **Enhancement Phase** - AI-powered quality improvements (optional)
 4. **Package Phase** - Create platform-specific packages
@@ -73,12 +73,18 @@ This file provides essential guidance for AI coding agents working with the Skil
 │   │   │   ├── weaviate.py         # Weaviate vector DB adaptor
 │   │   │   └── streaming_adaptor.py # Streaming output adaptor
 │   │   ├── arguments/              # CLI argument definitions
+│   │   ├── parsers/                # Argument parsers
+│   │   │   └── extractors/         # Content extractors
 │   │   ├── presets/                # Preset configuration management
+│   │   ├── storage/                # Cloud storage adaptors
 │   │   ├── main.py                 # Unified CLI entry point
 │   │   ├── create_command.py       # Unified create command
 │   │   ├── doc_scraper.py          # Documentation scraper
 │   │   ├── github_scraper.py       # GitHub repository scraper
 │   │   ├── pdf_scraper.py          # PDF extraction
+│   │   ├── word_scraper.py         # Word document scraper
+│   │   ├── video_scraper.py        # Video extraction
+│   │   ├── video_setup.py          # GPU detection & dependency installation
 │   │   ├── unified_scraper.py      # Multi-source scraping
 │   │   ├── codebase_scraper.py     # Local codebase analysis
 │   │   ├── enhance_command.py      # AI enhancement command
@@ -118,10 +124,10 @@ This file provides essential guidance for AI coding agents working with the Skil
 │   │   ├── generator.py            # Embedding generation
 │   │   ├── cache.py                # Embedding cache
 │   │   └── models.py               # Embedding models
-│   ├── workflows/                  # YAML workflow presets
+│   ├── workflows/                  # YAML workflow presets (66 presets)
 │   ├── _version.py                 # Version information (reads from pyproject.toml)
 │   └── __init__.py                 # Package init
-├── tests/                          # Test suite (101 test files)
+├── tests/                          # Test suite (105+ test files)
 ├── configs/                        # Preset configuration files
 ├── docs/                           # Documentation (80+ markdown files)
 │   ├── integrations/               # Platform integration guides
@@ -245,9 +251,8 @@ pytest tests/ -v -m "not slow and not integration"

 ### Test Architecture

- **101 test files** covering all features
- **1880+ tests** passing
- CI Matrix: Ubuntu + macOS, Python 3.10-3.12
+- **105+ test files** covering all features
+- **CI Matrix:** Ubuntu + macOS, Python 3.10-3.12
 - Test markers defined in `pyproject.toml`:

 | Marker | Description |
@@ -376,6 +381,8 @@ The CLI uses subcommands that delegate to existing modules:
 - `scrape` - Documentation scraping
 - `github` - GitHub repository scraping
 - `pdf` - PDF extraction
+- `word` - Word document extraction
+- `video` - Video extraction (YouTube, Vimeo, or local file). Use `--setup` to auto-detect the GPU and install visual dependencies.
 - `unified` - Multi-source scraping
 - `analyze` / `codebase` - Local codebase analysis
 - `enhance` - AI enhancement
@@ -402,7 +409,7 @@ Two implementations:

 Tools are organized by category:
 - Config tools (3 tools): generate_config, list_configs, validate_config
- Scraping tools (9 tools): estimate_pages, scrape_docs, scrape_github, scrape_pdf, scrape_codebase, detect_patterns, extract_test_examples, build_how_to_guides, extract_config_patterns
+- Scraping tools (10 tools): estimate_pages, scrape_docs, scrape_github, scrape_pdf, scrape_video (supports `setup` parameter for GPU detection and visual dep installation), scrape_codebase, detect_patterns, extract_test_examples, build_how_to_guides, extract_config_patterns
 - Packaging tools (4 tools): package_skill, upload_skill, enhance_skill, install_skill
 - Source tools (5 tools): fetch_config, submit_config, add_config_source, list_config_sources, remove_config_source
 - Splitting tools (2 tools): split_config, generate_router
@@ -619,7 +626,7 @@ export ANTHROPIC_BASE_URL=https://custom-endpoint.com/v1

 **Reference (technical details):**
 - `docs/reference/CLI_REFERENCE.md` - Complete command reference (20 commands)
- `docs/reference/MCP_REFERENCE.md` - MCP tools reference (26 tools)
+- `docs/reference/MCP_REFERENCE.md` - MCP tools reference (33 tools)
 - `docs/reference/CONFIG_FORMAT.md` - JSON configuration specification
 - `docs/reference/ENVIRONMENT_VARIABLES.md` - All environment variables

@@ -629,20 +636,16 @@ export ANTHROPIC_BASE_URL=https://custom-endpoint.com/v1
 - `docs/advanced/custom-workflows.md` - Creating custom workflows
 - `docs/advanced/multi-source.md` - Multi-source scraping

-**Legacy (being phased out):**
- `QUICKSTART.md` - Old quick start (see docs/getting-started/)
- `docs/guides/USAGE.md` - Old usage guide (see docs/user-guide/)
- `docs/QUICK_REFERENCE.md` - Old reference (see docs/reference/)
-
 ### Configuration Documentation

 Preset configs are in `configs/` directory:
- `godot.json` - Godot Engine
+- `godot.json` / `godot_unified.json` - Godot Engine
 - `blender.json` / `blender-unified.json` - Blender Engine
 - `claude-code.json` - Claude Code
 - `httpx_comprehensive.json` - HTTPX library
 - `medusa-mercurjs.json` - Medusa/MercurJS
 - `astrovalley_unified.json` - Astrovalley
+- `react.json` - React documentation
 - `configs/integrations/` - Integration-specific configs

 ---
@@ -685,8 +688,13 @@ Preset configs are in `configs/` directory:
 | AWS S3 | `boto3>=1.34.0` | `pip install -e ".[s3]"` |
 | Google Cloud Storage | `google-cloud-storage>=2.10.0` | `pip install -e ".[gcs]"` |
 | Azure Blob Storage | `azure-storage-blob>=12.19.0` | `pip install -e ".[azure]"` |
+| Word Documents | `mammoth>=1.6.0`, `python-docx>=1.1.0` | `pip install -e ".[docx]"` |
+| Video (lightweight) | `yt-dlp>=2024.12.0`, `youtube-transcript-api>=1.2.0` | `pip install -e ".[video]"` |
+| Video (full) | `video` deps + `faster-whisper`, `scenedetect`, `opencv-python-headless` (`easyocr` is installed via `--setup` instead) | `pip install -e ".[video-full]"` |
+| Video (GPU setup) | Auto-detects GPU, installs PyTorch + easyocr + all visual deps | `skill-seekers video --setup` |
 | Chroma DB | `chromadb>=0.4.0` | `pip install -e ".[chroma]"` |
 | Weaviate | `weaviate-client>=3.25.0` | `pip install -e ".[weaviate]"` |
+| Pinecone | `pinecone>=5.0.0` | `pip install -e ".[pinecone]"` |
 | Embedding Server | `fastapi>=0.109.0`, `uvicorn>=0.27.0`, `sentence-transformers>=2.3.0` | `pip install -e ".[embedding]"` |

 ### Dev Dependencies (in dependency-groups)
@@ -702,6 +710,7 @@ Preset configs are in `configs/` directory:
 | `psutil` | >=5.9.0 | Process utilities for testing |
 | `numpy` | >=1.24.0 | Numerical operations |
 | `starlette` | >=0.31.0 | HTTP transport testing |
+| `httpx` | >=0.24.0 | HTTP client for testing |
 | `boto3` | >=1.26.0 | AWS S3 testing |
 | `google-cloud-storage` | >=2.10.0 | GCS testing |
 | `azure-storage-blob` | >=12.17.0 | Azure testing |
@@ -824,6 +833,34 @@ Skill Seekers uses JSON configuration files to define scraping targets. Example

 ---

+## Workflow Presets
+
+Skill Seekers includes 66 YAML workflow presets for AI enhancement in `src/skill_seekers/workflows/`:
+
+**Built-in presets:**
+- `default.yaml` - Standard enhancement workflow
+- `minimal.yaml` - Fast, minimal enhancement
+- `security-focus.yaml` - Security-focused review
+- `architecture-comprehensive.yaml` - Deep architecture analysis
+- `api-documentation.yaml` - API documentation focus
+- And 61 more specialized presets...
+
+**Usage:**
+```bash
+# Apply a preset
+skill-seekers create ./my-project --enhance-workflow security-focus
+
+# Chain multiple presets
+skill-seekers create ./my-project --enhance-workflow security-focus --enhance-workflow minimal
+
+# Manage presets
+skill-seekers workflows list
+skill-seekers workflows show security-focus
+skill-seekers workflows copy security-focus
+```
+
+---
+
 *This document is maintained for AI coding agents. For human contributors, see README.md and CONTRIBUTING.md.*

-*Last updated: 2026-02-24*
+*Last updated: 2026-03-01*
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,34 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+### 🎬 Video `--setup`: GPU Auto-Detection & Dependency Installation
+
+### Added
+- **`skill-seekers video --setup`** — One-command GPU auto-detection and dependency installation for the video scraper pipeline
+  - `video_setup.py` (~835 lines) — New module with complete setup orchestration
+  - **GPU auto-detection** — Detects NVIDIA (nvidia-smi → CUDA version), AMD (rocminfo → ROCm version), or CPU-only without requiring PyTorch
+  - **Correct PyTorch variant** — Installs from the right index URL: `cu124`/`cu121`/`cu118` for NVIDIA, `rocm6.3`/`rocm6.2.4` for AMD, `cpu` for CPU-only
+  - **ROCm configuration** — Sets `MIOPEN_FIND_MODE=FAST` and `HSA_OVERRIDE_GFX_VERSION` for AMD GPUs (fixes MIOpen workspace allocation issues)
+  - **Virtual environment detection** — Warns users outside a venv with opt-in `--force` override
+  - **System dependency checks** — Validates `tesseract` and `ffmpeg` binaries, provides OS-specific install instructions
+  - **Module selection** — `SetupModules` dataclass for optional component selection (easyocr, opencv, tesseract, scenedetect, whisper)
+  - **Base video deps always included** — `yt-dlp` and `youtube-transcript-api` are installed automatically, so the video pipeline is fully ready after setup
+  - **Verification step** — Post-install import checks for all deps including `torch.cuda.is_available()` and `torch.version.hip`
+  - **Non-interactive mode** — `run_setup(interactive=False)` for MCP server and CI/CD use
+- **`--setup` flag** in `arguments/video.py` — Added to `VIDEO_ARGUMENTS` dict
+- **Early-exit in `video_scraper.py`** — `--setup` runs before source validation (no `--url` required)
+- **MCP `scrape_video` setup parameter** — `setup: bool = False` param in `server_fastmcp.py` and `scraping_tools.py`
+- **`create` command routing** — `create_command.py` forwards `--setup` to video scraper
+- **`tests/test_video_setup.py`** (60 tests) — GPU detection, CUDA/ROCm version mapping, installation, verification, venv checks, system deps, module selection, argument parsing
+
+### Changed
+- **`easyocr` removed from `video-full` optional deps** — Was pulling ~2GB of NVIDIA CUDA packages regardless of GPU vendor. Now installed via `--setup` with correct PyTorch variant.
+- **Video dependency error messages** — `video_scraper.py` and `video_visual.py` now suggest `skill-seekers video --setup` as the primary fix
+- **Multi-engine OCR** — `video_visual.py` uses EasyOCR + pytesseract ensemble for code frames (per-line confidence merge with code-token preference), EasyOCR only for non-code frames
+- **Tesseract circuit breaker** — `_tesseract_broken` flag disables pytesseract for the session after first failure, avoiding repeated subprocess errors
+- **`video_models.py`** — Added `SetupModules` dataclass for granular dependency control
+- **`video_segmenter.py`** — Updated dependency check messages to reference `--setup`
+
 ### 📄 B2: Microsoft Word (.docx) Support & Stage 1 Quality Improvements

 ### Added
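
The CHANGELOG entries for **Multi-engine OCR** and the **Tesseract circuit breaker** describe the behavior only at a high level: both engines read code frames, the results are merged per line by confidence with a preference for code-like tokens, and pytesseract is disabled for the session after its first failure. A minimal sketch of that idea, assuming each engine returns one `(text, confidence)` pair per line with confidence in [0, 1] and lines already aligned by index. The `OcrEnsemble` class, the `code_bonus` weight, and the token regex are illustrative assumptions, not the code in `video_visual.py`.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# A line is (text, confidence in [0, 1]); engines return one entry per line.
Line = tuple[str, float]
Engine = Callable[[object], list[Line]]

# Punctuation and keywords that suggest a line is source code (illustrative).
CODE_TOKEN = re.compile(r"[{}()\[\];=<>]|\b(def|class|return|import|function|const)\b")


def score(line: Line, code_bonus: float) -> float:
    """Confidence plus a small bonus for each code-like token in the text."""
    text, conf = line
    return conf + code_bonus * len(CODE_TOKEN.findall(text))


def merge_lines(a: list[Line], b: list[Line], code_bonus: float = 0.05) -> list[str]:
    """Per line index, keep whichever engine's reading scores higher (ties go to a)."""
    merged = []
    for i in range(max(len(a), len(b))):
        candidates = [src[i] for src in (a, b) if i < len(src)]
        merged.append(max(candidates, key=lambda ln: score(ln, code_bonus))[0])
    return merged


@dataclass
class OcrEnsemble:
    primary: Engine    # e.g. an EasyOCR wrapper
    secondary: Engine  # e.g. a pytesseract wrapper
    _secondary_broken: bool = field(default=False, init=False)

    def read(self, frame: object, is_code: bool) -> list[str]:
        first = self.primary(frame)
        # Non-code frames, or a tripped breaker, use the primary engine only.
        if not is_code or self._secondary_broken:
            return [text for text, _ in first]
        try:
            second = self.secondary(frame)
        except Exception:
            # Circuit breaker: stop calling the failing engine for this session.
            self._secondary_broken = True
            return [text for text, _ in first]
        return merge_lines(first, second)
```

Scoring by confidence plus a token bonus lets a slightly less confident reading win when it preserves code punctuation or keywords that the other engine dropped.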
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -341,6 +341,9 @@ skill-seekers how-to-guides output/test_examples.json --output output/guides/
 # Test enhancement status monitoring
 skill-seekers enhance-status output/react/ --watch

+# Video setup (auto-detect GPU and install deps)
+skill-seekers video --setup
+
 # Test multi-platform packaging
 skill-seekers package output/react/ --target gemini --dry-run

@@ -750,6 +753,7 @@ skill-seekers-install-agent = "skill_seekers.cli.install_agent:main"
 skill-seekers-patterns = "skill_seekers.cli.pattern_recognizer:main"         # C3.1 Pattern detection
 skill-seekers-how-to-guides = "skill_seekers.cli.how_to_guide_builder:main" # C3.3 Guide generation
 skill-seekers-workflows = "skill_seekers.cli.workflows_command:main"         # NEW: Workflow preset management
+skill-seekers-video = "skill_seekers.cli.video_scraper:main"                  # Video scraping pipeline (use --setup to install deps)

 # New v3.0.0 Entry Points
 skill-seekers-setup = "skill_seekers.cli.setup_wizard:main"                  # NEW: v3.0.0 Setup wizard
@@ -771,6 +775,8 @@ skill-seekers-quality = "skill_seekers.cli.quality_metrics:main"             # N
 - Install with: `pip install -e .` (installs only core deps)
 - Install dev deps: See CI workflow or manually install pytest, ruff, mypy

+**Note on video dependencies:** `easyocr` and GPU-specific PyTorch builds are **not** included in the `video-full` optional dependency group. They are installed at runtime by `skill-seekers video --setup`, which auto-detects the GPU (CUDA/ROCm/MPS/CPU) and installs the correct builds.
+
 ```toml
 [project.optional-dependencies]
 gemini = ["google-generativeai>=0.8.0"]
@@ -1985,6 +1991,13 @@ UNIVERSAL_ARGUMENTS = {
  - Profile creation
  - First-time setup

+**Video Scraper** (`src/skill_seekers/cli/`):
+- `video_scraper.py` - Main video scraping pipeline CLI
+- `video_setup.py` - GPU auto-detection, PyTorch installation, visual dependency setup (~835 lines)
+  - Detects CUDA/ROCm/MPS/CPU and installs matching PyTorch build
+  - Installs `easyocr` and other visual processing deps at runtime via `--setup`
+  - Run `skill-seekers video --setup` before first use
+
 ## 🎯 Project-Specific Best Practices

 1. **Prefer the unified `create` command** - Use `skill-seekers create <source>` over legacy commands for consistency
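
The `skill-seekers-video = "skill_seekers.cli.video_scraper:main"` entry point, together with the CHANGELOG note that `--setup` runs before source validation, implies a specific argument-handling order. A minimal `argparse` sketch of that order, assuming `--url`, `--playlist`, and `--video-file` are the source flags (as in the README examples) and `run_setup` is a stand-in for the real setup routine. The function bodies and return codes are hypothetical, not taken from Skill Seekers.

```python
import argparse
from typing import Callable, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="skill-seekers video")
    parser.add_argument("--url")
    parser.add_argument("--playlist")
    parser.add_argument("--video-file")
    parser.add_argument("--name")
    parser.add_argument("--setup", action="store_true",
                        help="detect GPU and install visual dependencies")
    return parser


def main(argv: Optional[list[str]] = None,
         run_setup: Callable[[], int] = lambda: 0) -> int:
    """Hypothetical entry point: --setup exits before any source checks."""
    args = build_parser().parse_args(argv)

    # Early exit: setup needs no video source, so skip validation entirely.
    if args.setup:
        return run_setup()

    # Only after the setup branch do we require exactly one source.
    sources = [s for s in (args.url, args.playlist, args.video_file) if s]
    if len(sources) != 1:
        print("error: provide exactly one of --url, --playlist, --video-file")
        return 2
    print(f"would scrape {sources[0]} as {args.name or 'unnamed'}")
    return 0
```

Keeping setup as its own early branch is what lets `skill-seekers video --setup` work on a fresh install with no `--url`.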
--- a/README.md
+++ b/README.md
@@ -92,6 +92,11 @@ skill-seekers create ./my-project

 # PDF document
 skill-seekers create manual.pdf
+
+# Video (YouTube, Vimeo, or local file — requires skill-seekers[video])
+skill-seekers video --url https://www.youtube.com/watch?v=... --name mytutorial
+# First time? Auto-install GPU-aware visual deps:
+skill-seekers video --setup
 ```

 ### Export Everywhere
@@ -593,8 +598,14 @@ skill-seekers-setup
 | `pip install skill-seekers[openai]` | + OpenAI ChatGPT support |
 | `pip install skill-seekers[all-llms]` | + All LLM platforms |
 | `pip install skill-seekers[mcp]` | + MCP server for Claude Code, Cursor, etc. |
+| `pip install skill-seekers[video]` | + YouTube/Vimeo transcript & metadata extraction |
+| `pip install skill-seekers[video-full]` | + Whisper transcription & visual frame extraction |
 | `pip install skill-seekers[all]` | Everything enabled |

+> **Video visual deps (GPU-aware):** After installing `skill-seekers[video-full]`, run
+> `skill-seekers video --setup` to auto-detect your GPU and install the correct PyTorch
+> variant + easyocr. This is the recommended way to install visual extraction dependencies.
+
 ---

 ## 🚀 One-Command Install Workflow
@@ -683,6 +694,29 @@ skill-seekers pdf --pdf docs/manual.pdf --name myskill \
 skill-seekers pdf --pdf docs/scanned.pdf --name myskill --ocr
 ```

+### Video Extraction
+
+```bash
+# Install video support
+pip install skill-seekers[video]        # Transcripts + metadata
+pip install skill-seekers[video-full]   # + Whisper + visual frame extraction
+
+# Auto-detect GPU and install visual deps (PyTorch + easyocr)
+skill-seekers video --setup
+
+# Extract from YouTube video
+skill-seekers video --url https://www.youtube.com/watch?v=dQw4w9WgXcQ --name mytutorial
+
+# Extract from a YouTube playlist
+skill-seekers video --playlist https://www.youtube.com/playlist?list=... --name myplaylist
+
+# Extract from a local video file
+skill-seekers video --video-file recording.mp4 --name myrecording
+
+# Extract with visual frame analysis (requires video-full deps)
+skill-seekers video --url https://www.youtube.com/watch?v=... --name mytutorial --visual
+```
+
 ### GitHub Repository Analysis

 ```bash
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
@@ -59,6 +59,24 @@ Each platform has a dedicated adaptor for optimal formatting and upload.

 **Recommendation:** Use LOCAL mode for free AI enhancement or skip enhancement entirely.

+### How do I set up video extraction?
+
+**Quick setup:**
+```bash
+# 1. Install video support
+pip install skill-seekers[video-full]
+
+# 2. Auto-detect GPU and install visual deps
+skill-seekers video --setup
+```
+
+The `--setup` command auto-detects your GPU vendor (NVIDIA CUDA, AMD ROCm, or CPU-only) and installs the correct PyTorch variant along with easyocr and other visual extraction dependencies. This avoids the ~2GB NVIDIA CUDA download that would happen if easyocr were installed via pip on non-NVIDIA systems.
+
+**What it detects:**
+- **NVIDIA:** Uses `nvidia-smi` to find CUDA version → installs matching `cu124`/`cu121`/`cu118` PyTorch
+- **AMD:** Uses `rocminfo` to find ROCm version → installs matching ROCm PyTorch
+- **CPU-only:** Installs lightweight CPU-only PyTorch
+
 ### How long does it take to create a skill?

 **Typical Times:**
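
The FAQ answer on video setup says `--setup` reads the CUDA version via `nvidia-smi` and installs a matching `cu124`/`cu121`/`cu118` PyTorch build. A sketch of that mapping, assuming the `CUDA Version: X.Y` field shown in the `nvidia-smi` header (the newest CUDA the driver supports) and the public `https://download.pytorch.org/whl/<variant>` index layout. The cutoffs (12.4+ → cu124, 12.1+ → cu121, 11.8+ → cu118, otherwise CPU) are assumptions, not the thresholds `video_setup.py` uses, and ROCm detection via `rocminfo` is left out.

```python
import re
import shutil
import subprocess
from typing import Optional

INDEX_BASE = "https://download.pytorch.org/whl"

# (minimum driver-reported CUDA version, PyTorch wheel variant), newest first.
CUDA_VARIANTS = [((12, 4), "cu124"), ((12, 1), "cu121"), ((11, 8), "cu118")]


def parse_cuda_version(smi_output: str) -> Optional[tuple[int, int]]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field, if present."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def pick_index_url(cuda: Optional[tuple[int, int]]) -> str:
    """Choose the newest wheel variant the driver supports, else CPU-only."""
    if cuda is not None:
        for minimum, variant in CUDA_VARIANTS:
            if cuda >= minimum:
                return f"{INDEX_BASE}/{variant}"
    return f"{INDEX_BASE}/cpu"


def detect_index_url() -> str:
    """Run nvidia-smi if it exists; fall back to the CPU index on any failure."""
    if shutil.which("nvidia-smi") is None:
        return pick_index_url(None)
    try:
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True,
                             timeout=10, check=True).stdout
    except (OSError, subprocess.SubprocessError):
        return pick_index_url(None)
    return pick_index_url(parse_cuda_version(out))
```

The resulting URL would be passed to pip, for example `pip install torch --index-url https://download.pytorch.org/whl/cu124`.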
--- a/docs/TROUBLESHOOTING.md
+++ b/docs/TROUBLESHOOTING.md
@@ -90,6 +90,35 @@ pyenv install 3.12
 pyenv global 3.12
 ```

+### Issue: Video Visual Dependencies Missing
+
+**Symptoms:**
+```
+Missing video dependencies: easyocr
+RuntimeError: Required video visual dependencies not installed
+```
+
+**Solutions:**
+
+```bash
+# Run the GPU-aware setup command
+skill-seekers video --setup
+
+# This auto-detects your GPU and installs:
+# - PyTorch (correct CUDA/ROCm/CPU variant)
+# - easyocr, opencv, pytesseract, scenedetect, faster-whisper
+# - yt-dlp, youtube-transcript-api
+
+# Verify installation
+python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')"
+python -c "import easyocr; print('easyocr OK')"
+```
+
+**Common issues:**
+- Running outside a virtual environment → `--setup` will warn you; create a venv first
+- Missing system packages → Install `tesseract-ocr` and `ffmpeg` for your OS
+- AMD GPU without ROCm → Install ROCm first, then re-run `--setup`
+
 ## Configuration Issues

 ### Issue: API Keys Not Recognized
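
The two `python -c` checks in the troubleshooting entry for missing video dependencies can be combined into one script covering every package `--setup` installs. A sketch, assuming the import names `torch`, `easyocr`, `cv2`, `pytesseract`, `scenedetect`, `faster_whisper`, `yt_dlp`, and `youtube_transcript_api`. `torch.version.hip` is a version string on ROCm builds and `None` otherwise, and `torch.cuda.is_available()` also returns `True` on ROCm builds with a usable GPU, so the ROCm check comes first. The function names are hypothetical.

```python
import importlib
from typing import Optional

VIDEO_MODULES = ("torch", "easyocr", "cv2", "pytesseract", "scenedetect",
                 "faster_whisper", "yt_dlp", "youtube_transcript_api")


def check_imports(modules: tuple[str, ...] = VIDEO_MODULES) -> dict[str, Optional[str]]:
    """Map each module to None if it imports, or to the error message if not."""
    results: dict[str, Optional[str]] = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, or a broken native extension
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


def torch_backend() -> str:
    """Return 'rocm' for a ROCm build, 'cuda' if a CUDA GPU is usable, else 'cpu' or 'missing'."""
    try:
        import torch
    except ImportError:
        return "missing"
    if getattr(torch.version, "hip", None):
        return "rocm"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


def report() -> None:
    """Print one status line per module, then the detected PyTorch backend."""
    for name, error in check_imports().items():
        print(f"{name:24} {'OK' if error is None else error}")
    print(f"torch backend: {torch_backend()}")
```

Reporting the exception type, not just "missing", helps separate an absent package from one whose native libraries fail to load.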
--- a/docs/getting-started/01-installation.md
+++ b/docs/getting-started/01-installation.md
@@ -124,10 +124,14 @@ pip install skill-seekers[dev]
 | `gcs` | Google Cloud Storage | `pip install skill-seekers[gcs]` |
 | `azure` | Azure Blob Storage | `pip install skill-seekers[azure]` |
 | `embedding` | Embedding server | `pip install skill-seekers[embedding]` |
+| `video` | YouTube/video transcript extraction | `pip install skill-seekers[video]` |
+| `video-full` | + Whisper transcription, scene detection, visual frame extraction | `pip install skill-seekers[video-full]` |
 | `all-llms` | All LLM platforms | `pip install skill-seekers[all-llms]` |
 | `all` | Everything | `pip install skill-seekers[all]` |
 | `dev` | Development tools | `pip install skill-seekers[dev]` |

+> **Video visual deps:** After installing `skill-seekers[video-full]`, run `skill-seekers video --setup` to auto-detect your GPU (NVIDIA/AMD/CPU) and install the correct PyTorch variant + easyocr.
+
 ---

 ## Post-Installation Setup
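
The `video-full` extra in the installation table deliberately omits easyocr and the GPU-specific PyTorch build. Per the CHANGELOG, `--setup` installs those instead, from a selectable set of components (`SetupModules`: easyocr, opencv, tesseract, scenedetect, whisper), and always adds `yt-dlp` and `youtube-transcript-api`. A sketch of how such a selection could become pip commands, assuming the PyPI names shown and a PyTorch index URL chosen beforehand. The field names mirror the CHANGELOG list, but this dataclass body and `build_install_commands` are hypothetical, not the real `video_models.SetupModules`.

```python
import sys
from dataclasses import dataclass, fields

# Always installed so the lightweight pipeline works after setup.
BASE_PACKAGES = ["yt-dlp", "youtube-transcript-api"]

# Assumed mapping from selectable component to PyPI package.
PACKAGE_FOR = {
    "easyocr": "easyocr",
    "opencv": "opencv-python-headless",
    "tesseract": "pytesseract",  # the tesseract binary itself is a system package
    "scenedetect": "scenedetect",
    "whisper": "faster-whisper",
}


@dataclass
class SetupModules:
    """Hypothetical stand-in: which optional components to install."""
    easyocr: bool = True
    opencv: bool = True
    tesseract: bool = True
    scenedetect: bool = True
    whisper: bool = True


def build_install_commands(mods: SetupModules, torch_index_url: str) -> list[list[str]]:
    """Return pip invocations: PyTorch from its index first, then everything else."""
    pip = [sys.executable, "-m", "pip", "install"]
    commands = [pip + ["torch", "torchvision", "--index-url", torch_index_url]]
    selected = [PACKAGE_FOR[f.name] for f in fields(mods) if getattr(mods, f.name)]
    commands.append(pip + BASE_PACKAGES + selected)
    return commands
```

Installing PyTorch first is the point of the ordering: once a suitable `torch` is present, pip treats easyocr's `torch` requirement as satisfied instead of fetching the default build, which the CHANGELOG says pulled about 2GB of NVIDIA CUDA packages.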
--- a/docs/plans/video/00_VIDEO_SOURCE_OVERVIEW.md
+++ b/docs/plans/video/00_VIDEO_SOURCE_OVERVIEW.md
@@ -0,0 +1,261 @@
+# Video Source Support — Master Plan
+
+**Date:** February 27, 2026
+**Feature ID:** V1.0
+**Status:** Planning
+**Priority:** High
+**Estimated Complexity:** Large (multi-sprint feature)
+
+---
+
+## Table of Contents
+
+1. [Executive Summary](#executive-summary)
+2. [Motivation & Goals](#motivation--goals)
+3. [Scope](#scope)
+4. [Plan Documents Index](#plan-documents-index)
+5. [High-Level Architecture](#high-level-architecture)
+6. [Implementation Phases](#implementation-phases)
+7. [Dependencies](#dependencies)
+8. [Risk Assessment](#risk-assessment)
+9. [Success Criteria](#success-criteria)
+
+---
+
+## Executive Summary
+
+Add **video** as a first-class source type in Skill Seekers, alongside web documentation, GitHub repositories, PDF files, and Word documents. Videos contain a massive amount of knowledge — conference talks, official tutorials, live coding sessions, architecture walkthroughs — that is currently inaccessible to our pipeline.
+
+The video source will use a **3-stream parallel extraction** model:
+
+| Stream | What | Tool |
+|--------|------|------|
+| **ASR** (Automatic Speech Recognition) | Spoken words → timestamped text | youtube-transcript-api + faster-whisper |
+| **OCR** (Optical Character Recognition) | On-screen code/slides/diagrams → text | PySceneDetect + OpenCV + easyocr |
+| **Metadata** | Title, chapters, tags, description | yt-dlp Python API |
+
+These three streams are **aligned on a shared timeline** and merged into structured `VideoSegment` objects — the fundamental output unit. Segments are then categorized, converted to reference markdown files, and integrated into SKILL.md just like any other source.
+
+---
+
+## Motivation & Goals
+
+### Why Video?
+
+1. **Knowledge density** — A 30-minute conference talk can contain the equivalent of a 5,000-word blog post, plus live code demos that never appear in written docs.
+2. **Official tutorials** — Many frameworks (React, Flutter, Unity, Godot) have official video tutorials that are the canonical learning resource.
+3. **Code walkthroughs** — Screen-recorded coding sessions show real patterns, debugging workflows, and architecture decisions that written docs miss.
+4. **Conference talks** — JSConf, PyCon, GopherCon, etc. contain deep technical insights from framework authors.
+5. **Completeness** — Skill Seekers aims to be the **universal** documentation preprocessor. Video is the last major content type we don't support.
+
+### Goals
+
+- **G1:** Extract structured, time-aligned knowledge from YouTube videos, playlists, channels, and local video files.
+- **G2:** Integrate video as a first-class source in the unified config system (multiple video sources per skill, alongside docs/github/pdf).
+- **G3:** Auto-detect video sources in the `create` command (YouTube URLs, video file extensions).
+- **G4:** Support two tiers: lightweight (transcript + metadata only) and full (+ visual extraction with OCR).
+- **G5:** Produce output that is indistinguishable in quality from other source types — properly categorized reference files integrated into SKILL.md.
+- **G6:** Make Whisper transcription and visual extraction (OCR) available as optional add-on dependencies, keeping the core install lightweight.
+
+### Non-Goals (explicitly out of scope for V1.0)
+
+- Real-time / live stream processing
+- Video generation or editing
+- Speaker diarization (identifying who said what) — future enhancement
+- Automatic video discovery (e.g., "find all React tutorials on YouTube") — future enhancement
+- DRM-protected or paywalled video content (Udemy, Coursera, etc.)
+- Audio-only podcasts (similar pipeline but separate feature)
+
+---
+
+## Scope
+
+### Supported Video Sources
+
+| Source | Input Format | Example |
+|--------|-------------|---------|
+| YouTube single video | URL | `https://youtube.com/watch?v=abc123` |
+| YouTube short URL | URL | `https://youtu.be/abc123` |
+| YouTube playlist | URL | `https://youtube.com/playlist?list=PLxxx` |
+| YouTube channel | URL | `https://youtube.com/@channelname` |
+| Vimeo video | URL | `https://vimeo.com/123456` |
+| Local video file | Path | `./tutorials/intro.mp4` |
+| Local video directory | Path | `./recordings/` (batch) |
+
+### Supported Video Formats (local files)
+
+| Format | Extension | Notes |
+|--------|-----------|-------|
+| MP4 | `.mp4` | Most common, universal |
+| Matroska | `.mkv` | Common for screen recordings |
+| WebM | `.webm` | Web-native; one of YouTube's delivery formats |
+| AVI | `.avi` | Legacy but still used |
+| QuickTime | `.mov` | macOS screen recordings |
+| Flash Video | `.flv` | Legacy, rare |
+| MPEG Transport Stream | `.ts` | Streaming recordings |
+| Windows Media | `.wmv` | Windows screen recordings |
+
+### Supported Languages (transcript)
+
+All languages supported by:
+- YouTube's caption system (100+ languages)
+- faster-whisper / OpenAI Whisper (99 languages)
+
+---
+
+## Plan Documents Index
+
+| Document | Content |
+|----------|---------|
+| [`01_VIDEO_RESEARCH.md`](./01_VIDEO_RESEARCH.md) | Library research, benchmarks, industry standards |
+| [`02_VIDEO_DATA_MODELS.md`](./02_VIDEO_DATA_MODELS.md) | All data classes, type definitions, JSON schemas |
+| [`03_VIDEO_PIPELINE.md`](./03_VIDEO_PIPELINE.md) | Processing pipeline (6 phases), algorithms, edge cases |
+| [`04_VIDEO_INTEGRATION.md`](./04_VIDEO_INTEGRATION.md) | CLI, config, source detection, unified scraper integration |
+| [`05_VIDEO_OUTPUT.md`](./05_VIDEO_OUTPUT.md) | Output structure, SKILL.md integration, reference file format |
+| [`06_VIDEO_TESTING.md`](./06_VIDEO_TESTING.md) | Test strategy, mocking, fixtures, CI considerations |
+| [`07_VIDEO_DEPENDENCIES.md`](./07_VIDEO_DEPENDENCIES.md) | Dependency tiers, optional installs, system requirements — **IMPLEMENTED** (`video_setup.py`, GPU auto-detection, `--setup`) |
+
+---
+
+## High-Level Architecture
+
+```
+                              ┌──────────────────────┐
+                              │    User Input        │
+                              │                      │
+                              │  YouTube URL         │
+                              │  Playlist URL        │
+                              │  Local .mp4 file     │
+                              │  Unified config JSON │
+                              └──────────┬───────────┘
+                                         │
+                              ┌──────────▼───────────┐
+                              │  Source Detector     │
+                              │  (source_detector.py)│
+                              │  type="video"        │
+                              └──────────┬───────────┘
+                                         │
+                              ┌──────────▼───────────┐
+                              │  Video Scraper       │
+                              │  (video_scraper.py)  │
+                              │  Main orchestrator   │
+                              └──────────┬───────────┘
+                                         │
+                    ┌────────────────────┼────────────────────┐
+                    │                    │                    │
+         ┌──────────▼──────┐  ┌──────────▼──────┐  ┌──────────▼──────┐
+         │  Stream 1: ASR  │  │  Stream 2: OCR  │  │  Stream 3: Meta │
+         │                 │  │  (optional)     │  │                 │
+         │ youtube-trans-  │  │ PySceneDetect   │  │ yt-dlp          │
+         │ cript-api       │  │ OpenCV          │  │ extract_info()  │
+         │ faster-whisper  │  │ easyocr         │  │                 │
+         └────────┬────────┘  └────────┬────────┘  └────────┬────────┘
+                  │                    │                    │
+                  │    Timestamped     │   Keyframes +      │  Chapters,
+                  │    transcript      │   OCR text         │  tags, desc
+                  │                    │                    │
+                  └────────────────────┼────────────────────┘
+                                       │
+                            ┌──────────▼───────────┐
+                            │  Segmenter &         │
+                            │  Aligner             │
+                            │  (video_segmenter.py)│
+                            │                      │
+                            │  Align 3 streams     │
+                            │  on shared timeline  │
+                            └──────────┬───────────┘
+                                       │
+                              list[VideoSegment]
+                                       │
+                            ┌──────────▼───────────┐
+                            │  Output Generator    │
+                            │                      │
+                            │  ├ references/*.md   │
+                            │  ├ video_data/*.json │
+                            │  └ SKILL.md section  │
+                            └──────────────────────┘
+```
+
+---
+
+## Implementation Phases
+
+### Phase 1: Foundation (Core Pipeline)
+- `video_models.py` — All data classes
+- `video_scraper.py` — Main orchestrator
+- `video_transcript.py` — YouTube captions + Whisper fallback
+- Source detector update — YouTube URL patterns, video file extensions
+- Basic metadata extraction via yt-dlp
+- Output: timestamped transcript as reference markdown
+
+### Phase 2: Segmentation & Structure
+- `video_segmenter.py` — Chapter-aware segmentation
+- Semantic segmentation fallback (when no chapters)
+- Time-window fallback (configurable interval)
+- Segment categorization (reuse smart_categorize patterns)
+
+### Phase 3: Visual Extraction
+- `video_visual.py` — Frame extraction + scene detection
+- Frame classification (code/slide/terminal/diagram/other)
+- OCR on classified frames (easyocr)
+- Timeline alignment with ASR transcript
+
+### Phase 4: Integration
+- Unified config support (`"type": "video"`)
+- `create` command routing
+- CLI parser + arguments
+- Unified scraper integration (video alongside docs/github/pdf)
+- SKILL.md section generation
+
+### Phase 5: Quality & Polish
+- AI enhancement for video content (summarization, topic extraction)
+- RAG-optimized chunking for video segments
+- MCP tools (scrape_video, export_video)
+- Comprehensive test suite
+
+---
+
+## Dependencies
+
+### Core (always required for video)
+```
+yt-dlp>=2024.12.0
+youtube-transcript-api>=1.2.0
+```
+
+### Full (for visual extraction + local file transcription)
+```
+faster-whisper>=1.0.0
+scenedetect[opencv]>=0.6.4
+easyocr>=1.7.0
+opencv-python-headless>=4.9.0
+```
+
+### System Requirements (for full mode)
+- FFmpeg (required by yt-dlp for audio extraction; faster-whisper decodes audio through PyAV, which bundles its own FFmpeg libraries)
+- GPU (optional but recommended for Whisper and easyocr)
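To meet the tiered-install criterion, the scraper can check which optional modules are importable before it picks a mode. The sketch below assumes hypothetical helper names (`missing_packages`, `video_capabilities`) and the import names shown, such as `cv2` for OpenCV. `find_spec` locates a module without importing it. FFmpeg is checked separately because it is a system binary.

```python
import shutil
from importlib.util import find_spec

# Hypothetical mapping: PyPI package name -> top-level import name.
CORE_MODULES = {"yt-dlp": "yt_dlp", "youtube-transcript-api": "youtube_transcript_api"}
FULL_MODULES = {
    "faster-whisper": "faster_whisper",
    "scenedetect": "scenedetect",
    "easyocr": "easyocr",
    "opencv-python-headless": "cv2",
}


def missing_packages(modules: dict[str, str]) -> list[str]:
    """Return the PyPI names whose import module cannot be found."""
    return [pkg for pkg, module in modules.items() if find_spec(module) is None]


def video_capabilities() -> dict[str, bool]:
    """Report which pipeline tiers can run in this environment."""
    core_ok = not missing_packages(CORE_MODULES)
    return {
        "transcript": core_ok,
        "visual": core_ok and not missing_packages(FULL_MODULES),
        # FFmpeg is a system binary, not a Python package.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }
```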
+
+---
+
+## Risk Assessment
+
+| Risk | Likelihood | Impact | Mitigation |
+|------|-----------|--------|------------|
+| YouTube API changes break scraping | Medium | High | yt-dlp actively maintained, abstract behind our API |
+| Whisper models are large (large-v3 has ~1.5B parameters) | Certain | Medium | Optional dependency, offer multiple model sizes |
+| OCR accuracy on code is low | Medium | Medium | Combine OCR with transcript context, use confidence scoring |
+| Video download is slow | High | Medium | Stream audio only, don't download full video for transcript |
+| Auto-generated captions are noisy | High | Medium | Confidence filtering, AI cleanup in enhancement phase |
+| Copyright / ToS concerns | Low | High | Document that user is responsible for content rights |
+| CI tests can't download videos | Certain | Medium | Mock all network calls, use fixture transcripts |
+
+---
+
+## Success Criteria
+
+1. **Functional:** `skill-seekers create https://youtube.com/watch?v=xxx` produces a skill with video content integrated into SKILL.md.
+2. **Multi-source:** Video sources work alongside docs/github/pdf in unified configs.
+3. **Quality:** Video-derived reference files are categorized and structured (not raw transcript dumps).
+4. **Performance:** Transcript-only mode processes a 30-minute video in < 30 seconds.
+5. **Tests:** Full test suite with mocked network calls, 100% of video pipeline covered.
+6. **Tiered deps:** `pip install skill-seekers[video]` works without pulling Whisper/OpenCV.
--- a/docs/plans/video/01_VIDEO_RESEARCH.md
+++ b/docs/plans/video/01_VIDEO_RESEARCH.md
@@ -0,0 +1,591 @@
+# Video Source — Library Research & Industry Standards
+
+**Date:** February 27, 2026
+**Document:** 01 of 07
+**Status:** Complete
+
+---
+
+## Table of Contents
+
+1. [Industry Standards & Approaches](#industry-standards--approaches)
+2. [Library Comparison Matrix](#library-comparison-matrix)
+3. [Detailed Library Analysis](#detailed-library-analysis)
+4. [Architecture Patterns from Industry](#architecture-patterns-from-industry)
+5. [Benchmarks & Performance Data](#benchmarks--performance-data)
+6. [Recommendations](#recommendations)
+
+---
+
+## Industry Standards & Approaches
+
+### How the Industry Processes Video for AI/RAG
+
+Based on research from NVIDIA, LlamaIndex, Ragie, and open-source projects, the industry has converged on a **3-stream parallel extraction** model:
+
+#### The 3-Stream Model
+
+```
+Video Input
+    │
+    ├──→ Stream 1: ASR (Automatic Speech Recognition)
+    │    Extract spoken words with timestamps
+    │    Tools: Whisper, YouTube captions API
+    │    Output: [{text, start, end, confidence}, ...]
+    │
+    ├──→ Stream 2: OCR (Optical Character Recognition)
+    │    Extract visual text (code, slides, diagrams)
+    │    Tools: OpenCV + scene detection + OCR engine
+    │    Output: [{text, timestamp, frame_type, bbox}, ...]
+    │
+    └──→ Stream 3: Metadata
+         Extract structural info (chapters, tags, description)
+         Tools: yt-dlp, platform APIs
+         Output: {title, chapters, tags, description, ...}
+```
+
+**Key insight (from NVIDIA's multimodal RAG blog):** Ground everything to text first. Align all streams on a shared timeline, then merge into unified text segments. This makes the output compatible with any text-based RAG pipeline without requiring multimodal embeddings.
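The "ground everything to text" idea can be sketched as a timestamp join. Each OCR frame is attached to the ASR segment whose time range contains it, and each merged segment is then rendered as plain text. The dict shapes and the names `merge_streams` and `to_text` are hypothetical. Frames that fall in gaps between segments are simply dropped here.

```python
from bisect import bisect_right


def merge_streams(segments: list[dict], frames: list[dict]) -> list[dict]:
    """Hypothetical helper: attach OCR frames to the ASR segment containing them.

    segments: sorted, non-overlapping [{"start", "end", "text"}, ...]
    frames:   [{"timestamp", "text"}, ...] in any order
    Returns copies of the segments with an added "ocr_text" list.
    """
    starts = [seg["start"] for seg in segments]
    merged = [{**seg, "ocr_text": []} for seg in segments]
    for frame in sorted(frames, key=lambda f: f["timestamp"]):
        # Index of the last segment that starts at or before the frame.
        i = bisect_right(starts, frame["timestamp"]) - 1
        if i >= 0 and frame["timestamp"] < segments[i]["end"]:
            merged[i]["ocr_text"].append(frame["text"])
    return merged


def to_text(segment: dict) -> str:
    """Render one merged segment as plain text for a text-only RAG pipeline."""
    lines = [f"[{segment['start']:.0f}s-{segment['end']:.0f}s] {segment['text']}"]
    lines += [f"On screen: {text}" for text in segment["ocr_text"]]
    return "\n".join(lines)
```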
+
+#### Reference Implementations
+
+| Project | Approach | Strengths | Weaknesses |
+|---------|----------|-----------|------------|
+| [video-analyzer](https://github.com/byjlw/video-analyzer) | Whisper + OpenCV + LLM analysis | Full pipeline, LLM summaries | No chapter support, no YouTube integration |
+| [LlamaIndex MultiModal RAG](https://www.llamaindex.ai/blog/multimodal-rag-for-advanced-video-processing-with-llamaindex-lancedb-33be4804822e) | Frame extraction + CLIP + LanceDB | Vector search over frames | Heavy (requires GPU), no ASR |
+| [VideoRAG](https://video-rag.github.io/) | Graph-based reasoning + multimodal retrieval | Multi-hour video support | Research project, not production-ready |
+| [Ragie Multimodal RAG](https://www.ragie.ai/blog/how-we-built-multimodal-rag-for-audio-and-video) | faster-whisper large-v3-turbo + OCR + object detection | Production-grade, 3-stream | Proprietary, not open-source |
+
+#### Industry Best Practices
+
+1. **Audio-only download** — Never download the full video when you only need audio. Extract the audio stream with FFmpeg (`-vn` flag); audio alone is typically 10-50x smaller than the full video.
+2. **Prefer existing captions** — Manual YouTube captions are usually more accurate than ASR output. Fall back to Whisper only when captions are unavailable.
+3. **Chapter-based segmentation** — YouTube chapters provide natural content boundaries. Use them as primary segmentation, fall back to time-window or semantic splitting.
+4. **Confidence filtering** — Whisper and OCR output carry confidence scores, and auto-generated captions are noisier than manual ones. Filter low-confidence content rather than including everything.
+5. **Parallel extraction** — Run ASR and OCR in parallel (they're independent). Merge after both complete.
+6. **Tiered processing** — Offer fast/light mode (transcript only) and deep mode (+ visual). Let users choose based on their compute budget.
+
+---
+
+## Library Comparison Matrix
+
+### Metadata & Download
+
+| Library | Purpose | Install Size | Actively Maintained | Python API | License |
+|---------|---------|-------------|-------------------|------------|---------|
+| **yt-dlp** | Metadata + subtitles + download | ~15MB | Yes (weekly releases) | Yes (`YoutubeDL` class) | Unlicense |
+| pytube | YouTube download | ~1MB | Inconsistent | Yes | MIT |
+| youtube-dl | Download (original) | ~10MB | Stale | Yes | Unlicense |
+| pafy | YouTube metadata | ~50KB | Dead (2021) | Yes | LGPL |
+
+**Winner: yt-dlp** — De-facto standard, actively maintained, comprehensive Python API, supports 1000+ sites (not just YouTube).
+
+### Transcript Extraction (YouTube)
+
+| Library | Purpose | Requires Download | Speed | Accuracy | License |
+|---------|---------|-------------------|-------|----------|---------|
+| **youtube-transcript-api** | YouTube captions | No | Very fast (<1s) | Depends on caption source | MIT |
+| yt-dlp subtitles | Download subtitle files | Yes (subtitle only) | Fast (~2s) | Same as above | Unlicense |
+
+**Winner: youtube-transcript-api** — Fastest, no download needed, returns structured data with timestamps directly. It only covers YouTube; for other platforms, use yt-dlp's subtitle download instead.
+
+### Speech-to-Text (ASR)
+
+| Library | Speed (30 min audio) | Word Timestamps | Model Sizes | GPU Required | Language Support | License |
+|---------|---------------------|----------------|-------------|-------------|-----------------|---------|
+| **faster-whisper** | ~2-4 min (GPU), ~8-15 min (CPU) | Yes (`word_timestamps=True`) | tiny (39M) → large-v3 (1.5B) | No (but recommended) | 99 languages | MIT |
+| openai-whisper | ~5-10 min (GPU), ~20-40 min (CPU) | Yes | Same models | Recommended | 99 languages | MIT |
+| whisper-timestamped | Same as openai-whisper | Yes (more accurate) | Same models | Recommended | 99 languages | MIT |
+| whisperx | ~2-3 min (GPU) | Yes (best accuracy via wav2vec2) | Same + wav2vec2 | Yes (required) | 99 languages | BSD |
+| stable-ts | Same as openai-whisper | Yes (stabilized) | Same models | Recommended | 99 languages | MIT |
+| Google Speech-to-Text | Real-time | Yes | Cloud | No | 125+ languages | Proprietary |
+| AssemblyAI | Real-time | Yes | Cloud | No | 100+ languages | Proprietary |
+
+**Winner: faster-whisper** — Up to 4x faster than OpenAI Whisper at the same accuracy via CTranslate2, MIT license, word-level timestamps, runs without a GPU (just slower), actively maintained. whisperx is a candidate future upgrade for speaker diarization.
+
+### Scene Detection & Frame Extraction
+
+| Library | Purpose | Algorithm | Speed | License |
+|---------|---------|-----------|-------|---------|
+| **PySceneDetect** | Scene boundary detection | ContentDetector, ThresholdDetector, AdaptiveDetector | Fast | BSD |
+| opencv-python-headless | Frame extraction, image processing | Manual (absdiff, histogram) | Fast | Apache 2.0 |
+| Filmstrip | Keyframe extraction | Scene detection + selection | Medium | MIT |
+| video-keyframe-detector | Keyframe extraction | Peak estimation from frame diff | Fast | MIT |
+| decord | GPU-accelerated frame extraction | Direct frame access | Very fast | Apache 2.0 |
+
+**Winner: PySceneDetect + opencv-python-headless** — PySceneDetect handles intelligent boundary detection, OpenCV handles frame extraction and image processing. Both are well-maintained and BSD/Apache licensed.
+
+### OCR (Optical Character Recognition)
+
+| Library | Languages | GPU Support | Accuracy on Code | Speed | Install Size | License |
+|---------|-----------|------------|-------------------|-------|-------------|---------|
+| **easyocr** | 80+ | Yes (PyTorch) | Good | Medium | ~150MB + models | Apache 2.0 |
+| pytesseract | 100+ | No | Medium | Fast | ~30MB + Tesseract | Apache 2.0 |
+| PaddleOCR | 80+ | Yes (PaddlePaddle) | Very Good | Fast | ~200MB + models | Apache 2.0 |
+| TrOCR (HuggingFace) | Multilingual | Yes | Good | Slow | ~500MB | MIT |
+| docTR | 10+ | Yes (TF/PyTorch) | Good | Medium | ~100MB | Apache 2.0 |
+
+**Winner: easyocr** — Best overall balance of accuracy, GPU support, language coverage, and ease of use. PaddleOCR rates higher on code in the table above and is a close second, but it pulls in heavier dependencies (the PaddlePaddle framework).
+
+---
+
+## Detailed Library Analysis
+
+### 1. yt-dlp (Metadata & Download Engine)
+
+**What it provides:**
+- Video metadata (title, description, duration, upload date, channel, tags, categories)
+- Chapter information (title, start_time, end_time for each chapter)
+- Subtitle/caption download (all available languages, all formats)
+- Thumbnail URLs
+- View/like counts
+- Playlist information (title, entries, ordering)
+- Audio-only extraction (no full video download needed)
+- Supports 1000+ video sites (YouTube, Vimeo, Dailymotion, etc.)
+
+**Python API usage:**
+
+```python
+from yt_dlp import YoutubeDL
+
+def extract_video_metadata(url: str) -> dict:
+    """Extract metadata without downloading."""
+    opts = {
+        'quiet': True,
+        'no_warnings': True,
+        'extract_flat': False,  # Full extraction
+    }
+    with YoutubeDL(opts) as ydl:
+        info = ydl.extract_info(url, download=False)
+        return info
+```
+
+**Key fields in `info_dict`:**
+
+```python
+{
+    'id': 'dQw4w9WgXcQ',              # Video ID
+    'title': 'Video Title',            # Full title
+    'description': '...',              # Full description text
+    'duration': 1832,                  # Duration in seconds
+    'upload_date': '20260115',         # YYYYMMDD format
+    'uploader': 'Channel Name',        # Channel/uploader name
+    'uploader_id': '@channelname',     # Uploader handle ('channel_id' holds the UC... ID)
+    'uploader_url': 'https://...',     # Channel URL
+    'channel_follower_count': 150000,  # Subscriber count
+    'view_count': 5000000,             # View count
+    'like_count': 120000,              # Like count
+    'comment_count': 8500,             # Comment count
+    'tags': ['react', 'hooks', ...],   # Video tags
+    'categories': ['Education'],        # YouTube categories
+    'language': 'en',                  # Primary language
+    'subtitles': {                     # Manual captions
+        'en': [{'ext': 'vtt', 'url': '...'}],
+    },
+    'automatic_captions': {            # Auto-generated captions
+        'en': [{'ext': 'vtt', 'url': '...'}],
+    },
+    'chapters': [                      # Chapter markers
+        {'title': 'Intro', 'start_time': 0, 'end_time': 45},
+        {'title': 'Setup', 'start_time': 45, 'end_time': 180},
+        {'title': 'First Component', 'start_time': 180, 'end_time': 420},
+    ],
+    'thumbnail': 'https://...',        # Best thumbnail URL
+    'thumbnails': [...],               # All thumbnail variants
+    'webpage_url': 'https://...',      # Canonical URL
+    'formats': [...],                  # Available formats
+    'requested_formats': [...],        # Selected format info
+}
+```
+
+**Playlist extraction:**
+
+```python
+def extract_playlist(url: str) -> list[dict]:
+    """Extract all videos from a playlist."""
+    opts = {
+        'quiet': True,
+        'extract_flat': True,  # Don't extract each video yet
+    }
+    with YoutubeDL(opts) as ydl:
+        info = ydl.extract_info(url, download=False)
+        # info['entries'] contains all video entries
+        return info.get('entries', [])
+```
+
+**Audio-only download (for Whisper):**
+
+```python
+def download_audio(url: str, output_dir: str) -> str:
+    """Download audio stream only (no video)."""
+    opts = {
+        'format': 'bestaudio/best',
+        'postprocessors': [{
+            'key': 'FFmpegExtractAudio',
+            'preferredcodec': 'wav',
+            # No resampling needed: faster-whisper resamples to 16 kHz when decoding.
+        }],
+        'outtmpl': f'{output_dir}/%(id)s.%(ext)s',
+        'quiet': True,
+    }
+    with YoutubeDL(opts) as ydl:
+        info = ydl.extract_info(url, download=True)
+        return f"{output_dir}/{info['id']}.wav"
+```
+
+### 2. youtube-transcript-api (Caption Extraction)
+
+**What it provides:**
+- Direct access to YouTube captions without downloading
+- Manual and auto-generated caption support
+- Translation support (translate captions to any language)
+- Structured output with timestamps
+
+**Python API usage:**
+
+```python
+from youtube_transcript_api import NoTranscriptFound, YouTubeTranscriptApi
+
+def get_youtube_transcript(video_id: str, languages: list[str] | None = None) -> list[dict]:
+    """Get transcript with timestamps (youtube-transcript-api >= 1.0 instance API)."""
+    languages = languages or ['en']
+
+    transcript_list = YouTubeTranscriptApi().list(video_id)
+
+    # Prefer manual captions over auto-generated
+    try:
+        transcript = transcript_list.find_manually_created_transcript(languages)
+    except NoTranscriptFound:
+        transcript = transcript_list.find_generated_transcript(languages)
+
+    # Fetch the transcript and convert the snippet objects to plain dicts
+    data = transcript.fetch().to_raw_data()
+    return data
+    # Returns: [{'text': 'Hello', 'start': 0.0, 'duration': 1.5}, ...]
+```
+
+**Output format:**
+
+```python
+[
+    {
+        'text': "Welcome to this React tutorial",
+        'start': 0.0,        # Start time in seconds
+        'duration': 2.5       # Duration in seconds
+    },
+    {
+        'text': "Today we'll learn about hooks",
+        'start': 2.5,
+        'duration': 3.0
+    },
+    # ... continues for entire video
+]
+```
+
+**Key features:**
+- Segments are typically 2-5 seconds each
+- Manual captions have punctuation and proper casing
+- Auto-generated captions may lack punctuation and have lower accuracy
+- Can detect available languages and caption types
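The data models in this plan use `start`/`end` pairs, so the `start`/`duration` snippets above need normalizing. The sketch below assumes the list-of-dicts shape shown above, and `to_timeline` is a hypothetical name. Clamping each end to the next snippet's start is a defensive assumption, meant for auto-generated captions whose durations can overlap the following snippet.

```python
def to_timeline(snippets: list[dict]) -> list[dict]:
    """Hypothetical helper: {'text','start','duration'} -> {'text','start','end'}.

    Sorts by start time, flattens line breaks, drops blank snippets, and
    clamps each end to the next snippet's start so the timeline never overlaps.
    """
    ordered = sorted(snippets, key=lambda s: s["start"])
    timeline = []
    for i, snip in enumerate(ordered):
        text = snip["text"].replace("\n", " ").strip()
        if not text:
            continue
        end = snip["start"] + snip["duration"]
        if i + 1 < len(ordered):
            end = min(end, ordered[i + 1]["start"])
        timeline.append({"text": text, "start": snip["start"], "end": end})
    return timeline
```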
+
+### 3. faster-whisper (Speech-to-Text)
+
+**What it provides:**
+- OpenAI Whisper models, up to 4x faster via CTranslate2
+- Word-level timestamps with confidence scores
+- Language detection
+- VAD (Voice Activity Detection) filtering
+- Multiple model sizes from tiny (39M) to large-v3 (1.5B)
+
+**Python API usage:**
+
+```python
+from faster_whisper import WhisperModel
+
+def transcribe_with_whisper(audio_path: str, model_size: str = "base") -> dict:
+    """Transcribe audio file with word-level timestamps."""
+    model = WhisperModel(
+        model_size,
+        device="auto",          # auto-detect GPU/CPU
+        compute_type="auto",    # auto-select precision
+    )
+
+    # `segments` is a lazy generator: transcription runs while it is iterated below.
+    segments, info = model.transcribe(
+        audio_path,
+        word_timestamps=True,
+        vad_filter=True,         # Filter silence
+        vad_parameters={
+            "min_silence_duration_ms": 500,
+        },
+    )
+
+    result = {
+        'language': info.language,
+        'language_probability': info.language_probability,
+        'duration': info.duration,
+        'segments': [],
+    }
+
+    for segment in segments:
+        seg_data = {
+            'start': segment.start,
+            'end': segment.end,
+            'text': segment.text.strip(),
+            'avg_logprob': segment.avg_logprob,
+            'no_speech_prob': segment.no_speech_prob,
+            'words': [],
+        }
+        if segment.words:
+            for word in segment.words:
+                seg_data['words'].append({
+                    'word': word.word,
+                    'start': word.start,
+                    'end': word.end,
+                    'probability': word.probability,
+                })
+        result['segments'].append(seg_data)
+
+    return result
+```
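The segment dicts returned by `transcribe_with_whisper()` can support confidence filtering (best practice 4). This sketch drops a segment only when it is both likely non-speech and decoded with low average log-probability. The thresholds are assumptions borrowed from openai-whisper's `transcribe()` defaults (`no_speech_threshold=0.6`, `logprob_threshold=-1.0`). Whisper applies its version of this check per decoding window, so applying it per segment is also an assumption. Both helper names are hypothetical.

```python
# Assumed thresholds, borrowed from openai-whisper's transcribe() defaults.
NO_SPEECH_THRESHOLD = 0.6
LOGPROB_THRESHOLD = -1.0


def filter_low_confidence(segments: list[dict]) -> list[dict]:
    """Hypothetical helper: drop segments that look like silence decoded as text."""
    return [
        seg
        for seg in segments
        if not (
            seg["no_speech_prob"] > NO_SPEECH_THRESHOLD
            and seg["avg_logprob"] <= LOGPROB_THRESHOLD
        )
    ]


def segment_confidence(seg: dict) -> float:
    """Mean word probability in [0, 1]; 0.0 when no word timestamps exist."""
    words = seg.get("words") or []
    if not words:
        return 0.0
    return sum(w["probability"] for w in words) / len(words)
```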
+
+**Model size guide:**
+
+| Model | Parameters | English WER | Multilingual WER | VRAM (approx.) | Speed (30 min, GPU) |
+|-------|-----------|-------------|------------------|-------------|---------------------|
+| tiny | 39M | 14.8% | 23.2% | ~1GB | ~30s |
+| base | 74M | 11.5% | 18.7% | ~1GB | ~45s |
+| small | 244M | 9.5% | 14.6% | ~2GB | ~90s |
+| medium | 769M | 8.0% | 12.4% | ~5GB | ~180s |
+| large-v3 | 1.5B | 5.7% | 10.1% | ~10GB | ~240s |
+| large-v3-turbo | 809M | 6.2% | 10.8% | ~6GB | ~120s |
+
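+**Recommendation:** Default to `base` (good balance), offer `large-v3` for best accuracy, `large-v3-turbo` for near-`large-v3` accuracy at about half the runtime, and `tiny` for speed. The VRAM column lists openai-whisper's published requirements; faster-whisper typically needs less memory for the same model.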
+
+### 4. PySceneDetect (Scene Boundary Detection)
+
+**What it provides:**
+- Automatic scene/cut detection in video files
+- Multiple detection algorithms (content-based, threshold, adaptive)
+- Frame-accurate boundaries
+- Integration with OpenCV
+
+**Python API usage:**
+
+```python
+from scenedetect import ContentDetector, detect
+
+def detect_scene_changes(video_path: str) -> list[tuple[float, float]]:
+    """Detect scene boundaries in video.
+
+    Returns list of (start_time, end_time) tuples.
+    """
+    scene_list = detect(
+        video_path,
+        ContentDetector(
+            threshold=27.0,      # Sensitivity (lower = more scenes)
+            min_scene_len=15,    # Minimum 15 frames per scene
+        ),
+    )
+
+    boundaries = []
+    for scene in scene_list:
+        start = scene[0].get_seconds()
+        end = scene[1].get_seconds()
+        boundaries.append((start, end))
+
+    return boundaries
+```
+
+**Detection algorithms:**
+
+| Algorithm | Best For | Speed | Sensitivity |
+|-----------|----------|-------|-------------|
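+| ContentDetector | Fast cuts between shots (frame-to-frame content change) | Fast | Medium |
+| AdaptiveDetector | Fast cuts with camera motion (compares against a rolling average) | Medium | High |
+| ThresholdDetector | Fades in/out to/from black (average brightness) | Very fast | Low |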
+
+### 5. easyocr (Text Recognition)
+
+**What it provides:**
+- Text detection and recognition from images
+- 80+ language support
+- GPU acceleration
+- Bounding box coordinates for each text region
+- Confidence scores
+
+**Python API usage:**
+
+```python
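+from functools import lru_cache
+
+import easyocr
+
+
+@lru_cache(maxsize=None)
+def _get_reader(languages: tuple[str, ...]) -> easyocr.Reader:
+    """Build one Reader per language set; construction loads the models."""
+    # gpu=True falls back to CPU (with a warning) if no GPU backend is available.
+    return easyocr.Reader(list(languages), gpu=True)
+
+
+def extract_text_from_frame(image_path: str, languages: list[str] | None = None) -> list[dict]:
+    """Extract text from a video frame image."""
+    reader = _get_reader(tuple(languages or ['en']))
+
+    results = reader.readtext(image_path)
+    # results: [([[x1, y1], [x2, y2], [x3, y3], [x4, y4]], text, confidence), ...]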
+
+    extracted = []
+    for bbox, text, confidence in results:
+        extracted.append({
+            'text': text,
+            'confidence': confidence,
+            'bbox': bbox,  # Corner coordinates
+        })
+
+    return extracted
+```
+
+**Tips for code/terminal OCR:**
+- Pre-process images: increase contrast, convert to grayscale
+- Use higher DPI/resolution frames
+- Filter by confidence threshold (>0.5 for code)
+- Detect monospace regions first, then OCR only those regions
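Two of these tips can be combined into a small post-processing step on `readtext()` output: confidence filtering, plus reassembling boxes in reading order. The sketch assumes easyocr's default `(box, text, confidence)` result shape. The function name, the 0.5 threshold (taken from the tip above), and the 10-pixel line tolerance are assumptions. Joining words with single spaces discards indentation, so this suits slides and prose better than whitespace-sensitive code.

```python
Box = list[list[float]]  # four [x, y] corners, as returned by easyocr


def results_to_lines(
    results: list[tuple[Box, str, float]],
    min_confidence: float = 0.5,
    line_tolerance: float = 10.0,
) -> list[str]:
    """Hypothetical helper: turn readtext() output into lines in reading order.

    Drops detections below min_confidence, groups boxes whose top edge is
    within line_tolerance pixels of the line's first box, orders left to right.
    """
    kept = sorted(
        (min(y for _, y in box), min(x for x, _ in box), text)
        for box, text, confidence in results
        if confidence >= min_confidence
    )
    lines: list[list[tuple[float, str]]] = []
    line_top = None
    for top, left, text in kept:
        if line_top is None or top - line_top > line_tolerance:
            lines.append([])
            line_top = top
        lines[-1].append((left, text))
    return [" ".join(text for _, text in sorted(line)) for line in lines]
```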
+
+### 6. OpenCV (Frame Extraction)
+
+**What it provides:**
+- Video file reading and frame extraction
+- Image processing (resize, crop, color conversion)
+- Template matching (detect code editors, terminals)
+- Histogram analysis (detect slide vs code vs webcam)
+
+**Python API usage:**
+
+```python
+import cv2
+import numpy as np
+
+def extract_frames_at_timestamps(
+    video_path: str,
+    timestamps: list[float],
+    output_dir: str
+) -> list[str]:
+    """Extract frames at specific timestamps."""
+    cap = cv2.VideoCapture(video_path)
+    if not cap.isOpened():
+        raise IOError(f"Cannot open video: {video_path}")
+    fps = cap.get(cv2.CAP_PROP_FPS)
+    frame_paths = []
+
+    for ts in timestamps:
+        frame_number = int(ts * fps)
+        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_number)
+        ret, frame = cap.read()
+        if ret:
+            path = f"{output_dir}/frame_{ts:.2f}.png"
+            cv2.imwrite(path, frame)
+            frame_paths.append(path)
+
+    cap.release()
+    return frame_paths
+
+
+def classify_frame(image_path: str) -> str:
+    """Classify a frame as code/slide/diagram/other.
+
+    Uses brightness and edge heuristics:
+    - Dark background + high contrast = code/terminal
+    - Light background + low contrast = slide
+    - High edge density = diagram
+    Webcam detection (e.g. face detection) is not implemented here.
+    """
+    img = cv2.imread(image_path)
+    if img is None:
+        raise ValueError(f"Cannot read image: {image_path}")
+    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
+    h, w = gray.shape
+
+    # Check brightness distribution
+    mean_brightness = np.mean(gray)
+    brightness_std = np.std(gray)
+
+    # Dark background with structured content = code/terminal
+    if mean_brightness < 80 and brightness_std > 40:
+        return 'code'  # or 'terminal'
+
+    # Light background with text blocks = slide
+    if mean_brightness > 180 and brightness_std < 60:
+        return 'slide'
+
+    # High edge density = diagram
+    edges = cv2.Canny(gray, 50, 150)
+    edge_density = np.count_nonzero(edges) / (h * w)
+    if edge_density > 0.15:
+        return 'diagram'
+
+    return 'other'
+```
+
+---
+
+## Benchmarks & Performance Data
+
+### Transcript Extraction Speed
+
+| Method | 10 min video | 30 min video | 60 min video | Requires Download |
+|--------|-------------|-------------|-------------|-------------------|
+| youtube-transcript-api | ~0.5s | ~0.5s | ~0.5s | No |
+| yt-dlp subtitles | ~2s | ~2s | ~2s | Subtitle file only |
+| faster-whisper (tiny, GPU) | ~10s | ~30s | ~60s | Audio only |
+| faster-whisper (base, GPU) | ~15s | ~45s | ~90s | Audio only |
+| faster-whisper (large-v3, GPU) | ~80s | ~240s | ~480s | Audio only |
+| faster-whisper (base, CPU) | ~60s | ~180s | ~360s | Audio only |
+
+### Visual Extraction Speed
+
+| Operation | Per Frame | Per 10 min video (50 keyframes) |
+|-----------|----------|-------------------------------|
+| Frame extraction (OpenCV) | ~5ms | ~0.25s |
+| Scene detection (PySceneDetect) | N/A | ~15s for full video |
+| Frame classification (heuristic) | ~10ms | ~0.5s |
+| OCR per frame (easyocr, GPU) | ~200ms | ~10s |
+| OCR per frame (easyocr, CPU) | ~1-2s | ~50-100s |
+
+### Total Pipeline Time (estimated)
+
+| Mode | 10 min video | 30 min video | 1 hour video |
+|------|-------------|-------------|-------------|
+| Transcript only (YouTube captions) | ~2s | ~2s | ~2s |
+| Transcript only (Whisper base, GPU) | ~20s | ~50s | ~100s |
+| Full (transcript + visual, GPU) | ~35s | ~80s | ~170s |
+| Full (transcript + visual, CPU) | ~120s | ~350s | ~700s |
+
+---
+
+## Recommendations
+
+### Primary Stack (Chosen)
+
+| Component | Library | Why |
+|-----------|---------|-----|
+| Metadata + download | **yt-dlp** | De-facto standard, 1000+ sites, comprehensive Python API |
+| YouTube transcripts | **youtube-transcript-api** | Fastest, no download, structured output |
+| Speech-to-text | **faster-whisper** | Up to 4x faster than openai-whisper, MIT, word timestamps |
+| Scene detection | **PySceneDetect** | Best algorithm options, OpenCV-based |
+| Frame extraction | **opencv-python-headless** | Standard, headless (no GUI deps) |
+| OCR | **easyocr** | Good code/terminal accuracy, 80+ languages, GPU support, lighter dependencies than PaddleOCR |
+
+### Future Considerations
+
+| Component | Library | When to Add |
+|-----------|---------|-------------|
+| Speaker diarization | **whisperx** or **pyannote** | V2.0 — identify who said what |
+| Object detection | **YOLO** | V2.0 — detect UI elements, diagrams |
+| Multimodal embeddings | **CLIP** | V2.0 — embed frames for visual search |
+| Slide detection | **python-pptx** + heuristics | V1.5 — detect and extract slide content |
+
+### Sources
+
+- [youtube-transcript-api (PyPI)](https://pypi.org/project/youtube-transcript-api/)
+- [yt-dlp GitHub](https://github.com/yt-dlp/yt-dlp)
+- [yt-dlp Information Extraction Pipeline (DeepWiki)](https://deepwiki.com/yt-dlp/yt-dlp/2.2-information-extraction-pipeline)
+- [faster-whisper GitHub](https://github.com/SYSTRAN/faster-whisper)
+- [faster-whisper (PyPI)](https://pypi.org/project/faster-whisper/)
+- [whisper-timestamped GitHub](https://github.com/linto-ai/whisper-timestamped)
+- [stable-ts (PyPI)](https://pypi.org/project/stable-ts/)
+- [PySceneDetect GitHub](https://github.com/Breakthrough/PySceneDetect)
+- [easyocr (PyPI)](https://pypi.org/project/easyocr/)
+- [NVIDIA Multimodal RAG for Video and Audio](https://developer.nvidia.com/blog/an-easy-introduction-to-multimodal-retrieval-augmented-generation-for-video-and-audio/)
+- [LlamaIndex MultiModal RAG for Video](https://www.llamaindex.ai/blog/multimodal-rag-for-advanced-video-processing-with-llamaindex-lancedb-33be4804822e)
+- [Ragie: How We Built Multimodal RAG](https://www.ragie.ai/blog/how-we-built-multimodal-rag-for-audio-and-video)
+- [video-analyzer GitHub](https://github.com/byjlw/video-analyzer)
+- [VideoRAG Project](https://video-rag.github.io/)
+- [video-keyframe-detector GitHub](https://github.com/joelibaceta/video-keyframe-detector)
+- [Filmstrip GitHub](https://github.com/tafsiri/filmstrip)
--- a/docs/plans/video/02_VIDEO_DATA_MODELS.md
+++ b/docs/plans/video/02_VIDEO_DATA_MODELS.md
@@ -0,0 +1,972 @@
+# Video Source — Data Models & Type Definitions
+
+**Date:** February 27, 2026
+**Document:** 02 of 07
+**Status:** Planning
+
+---
+
+## Table of Contents
+
+1. [Design Principles](#design-principles)
+2. [Core Data Classes](#core-data-classes)
+3. [Supporting Data Classes](#supporting-data-classes)
+4. [Enumerations](#enumerations)
+5. [JSON Schema (Serialization)](#json-schema-serialization)
+6. [Relationships Diagram](#relationships-diagram)
+7. [Config Schema (Unified Config)](#config-schema-unified-config)
+
+---
+
+## Design Principles
+
+1. **Immutable after creation** — Use `@dataclass(frozen=True)` for segments and frames. Once extracted, data doesn't change.
+2. **Serializable** — Every data class must serialize to/from JSON for caching, output, and inter-process communication.
+3. **Timeline-aligned** — Every piece of data has `start_time` and `end_time` fields. This is the alignment axis for merging streams.
+4. **Confidence-scored** — Every extracted piece of content carries a confidence score for quality filtering.
+5. **Source-aware** — Every piece of data traces back to its origin (which video, which stream, which tool).
+6. **Compatible** — Output structures must be compatible with existing Skill Seekers page/reference format for seamless integration.
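Principles 1 to 5 fit in one small record type: frozen, JSON round-trippable, timeline-aligned, confidence-scored, and source-tagged. `TimedText` and its field set are hypothetical illustrations, not one of this plan's data classes. Validating in `__post_init__` is one assumed way to enforce the invariants at creation time.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TimedText:
    """Hypothetical minimal record illustrating the design principles."""

    text: str
    start_time: float
    end_time: float
    confidence: float
    source: str  # e.g. "youtube_manual", "whisper:base", "easyocr"

    def __post_init__(self) -> None:
        # Timeline-aligned and confidence-scored: validate once, at creation.
        if not 0.0 <= self.start_time <= self.end_time:
            raise ValueError(f"bad time range: {self.start_time}..{self.end_time}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "TimedText":
        return cls(**json.loads(payload))
```

Because the instance is frozen, assigning to a field raises `dataclasses.FrozenInstanceError`. Classes with list fields can still be frozen, but they must not be hashed.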
+
+---
+
+## Core Data Classes
+
+### VideoInfo — The top-level container for a single video
+
+```python
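+from __future__ import annotations  # allow references to classes defined later
+
+from dataclasses import dataclass
+
+
+@dataclass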
+class VideoInfo:
+    """Complete metadata and extracted content for a single video.
+
+    This is the primary output of the video scraper for one video.
+    It contains raw metadata from the platform, plus all extracted
+    and aligned content (segments).
+
+    Lifecycle:
+        1. Created with metadata during resolve phase
+        2. Transcript populated during ASR phase
+        3. Visual data populated during OCR phase (if enabled)
+        4. Segments populated during alignment phase
+    """
+
+    # === Identity ===
+    video_id: str
+    """Unique identifier.
+    - YouTube: 11-char video ID (e.g., 'dQw4w9WgXcQ')
+    - Vimeo: numeric ID (e.g., '123456789')
+    - Local: SHA-256 hash of file path
+    """
+
+    source_type: VideoSourceType
+    """Where this video came from (youtube, vimeo, local_file)."""
+
+    source_url: str | None
+    """Original URL for online videos. None for local files."""
+
+    file_path: str | None
+    """Local file path. Set for local files, or after download for
+    online videos that needed audio extraction."""
+
+    # === Basic Metadata ===
+    title: str
+    """Video title. For local files, derived from filename."""
+
+    description: str
+    """Full description text. Empty string for local files without metadata."""
+
+    duration: float
+    """Duration in seconds."""
+
+    upload_date: str | None
+    """Upload/creation date in ISO 8601 format (YYYY-MM-DD).
+    yt-dlp reports `upload_date` as YYYYMMDD, so convert it. None if unknown."""
+
+    language: str
+    """Primary language code (e.g., 'en', 'tr', 'ja').
+    Detected from captions, Whisper, or metadata."""
+
+    # === Channel / Author ===
+    channel_name: str | None
+    """Channel or uploader name."""
+
+    channel_url: str | None
+    """URL to the channel/uploader page."""
+
+    channel_subscriber_count: int | None
+    """Subscriber/follower count. Quality signal."""
+
+    # === Engagement Metadata (quality signals) ===
+    view_count: int | None
+    """Total view count. Higher = more authoritative."""
+
+    like_count: int | None
+    """Like count."""
+
+    comment_count: int | None
+    """Comment count. Higher = more discussion."""
+
+    # === Discovery Metadata ===
+    tags: list[str]
+    """Video tags from platform. Used for categorization."""
+
+    categories: list[str]
+    """Platform categories (e.g., ['Education', 'Science & Technology'])."""
+
+    thumbnail_url: str | None
+    """URL to the best quality thumbnail."""
+
+    # === Structure ===
+    chapters: list[Chapter]
+    """YouTube chapter markers. Empty list if no chapters.
+    This is the PRIMARY segmentation source."""
+
+    # === Playlist Context ===
+    playlist_title: str | None
+    """Title of the playlist this video belongs to. None if standalone."""
+
+    playlist_index: int | None
+    """0-based index within the playlist (yt-dlp's `playlist_index` is 1-based). None if standalone."""
+
+    playlist_total: int | None
+    """Total number of videos in the playlist. None if standalone."""
+
+    # === Extracted Content (populated during processing) ===
+    raw_transcript: list[TranscriptSegment]
+    """Raw transcript segments as received from YouTube API or Whisper.
+    Before alignment and merging."""
+
+    segments: list[VideoSegment]
+    """Final aligned and merged segments. This is the PRIMARY output.
+    Each segment combines ASR + OCR + metadata into a single unit."""
+
+    # === Processing Metadata ===
+    transcript_source: TranscriptSource
+    """How the transcript was obtained."""
+
+    visual_extraction_enabled: bool
+    """Whether OCR/frame extraction was performed."""
+
+    whisper_model: str | None
+    """Whisper model used, if applicable (e.g., 'base', 'large-v3')."""
+
+    processing_time_seconds: float
+    """Total processing time for this video."""
+
+    extracted_at: str
+    """ISO 8601 timestamp of when extraction was performed."""
+
+    # === Quality Scores (computed) ===
+    transcript_confidence: float
+    """Average confidence of transcript (0.0 - 1.0).
+    Based on caption type or Whisper probability."""
+
+    content_richness_score: float
+    """How rich/useful the extracted content is (0.0 - 1.0).
+    Based on: duration, chapters present, code detected, engagement."""
+
+    def to_dict(self) -> dict:
+        """Serialize to JSON-compatible dictionary."""
+        ...
+
+    @classmethod
+    def from_dict(cls, data: dict) -> 'VideoInfo':
+        """Deserialize from dictionary."""
+        ...
+```
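For local files, the `video_id` docstring specifies a SHA-256 hash of the file path, and `title` is derived from the filename. The sketch below shows one way to do both. Resolving to an absolute path first is an assumption, so that `./a.mp4` and `/abs/a.mp4` hash the same. The underscore/hyphen-to-space title rule is also an assumption, and `local_video_identity` is a hypothetical name. Hashing the path instead of the contents is cheap, but moving or renaming the file changes its id.

```python
import hashlib
from pathlib import Path


def local_video_identity(file_path: str) -> tuple[str, str]:
    """Hypothetical helper: return (video_id, title) for a local video file."""
    # Resolve first so different relative spellings of one path share an id.
    resolved = Path(file_path).expanduser().resolve()
    video_id = hashlib.sha256(str(resolved).encode("utf-8")).hexdigest()
    # "my_react-tutorial.mp4" -> "my react tutorial"
    title = resolved.stem.replace("_", " ").replace("-", " ").strip()
    return video_id, title
```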
+
+### VideoSegment — The fundamental aligned content unit
+
+```python
+@dataclass(frozen=True)
+class VideoSegment:
+    """A time-aligned segment combining all 3 extraction streams.
+
+    This is the CORE data unit of the video pipeline. Every piece
+    of video content is broken into segments that align:
+    - ASR transcript (what was said)
+    - OCR content (what was shown on screen)
+    - Metadata (chapter title, topic)
+
+    Segments are then used to generate reference markdown files
+    and integrate into SKILL.md.
+
+    Segmentation strategies (in priority order):
+    1. Chapter boundaries (YouTube chapters)
+    2. Semantic boundaries (topic shifts detected by NLP)
+    3. Time windows (configurable via time_window_seconds, default 300 s / 5 minutes)
+    """
+
+    # === Time Bounds ===
+    index: int
+    """0-based segment index within the video."""
+
+    start_time: float
+    """Start time in seconds."""
+
+    end_time: float
+    """End time in seconds."""
+
+    duration: float
+    """Segment duration in seconds (end_time - start_time)."""
+
+    # === Stream 1: ASR (Audio) ===
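+    transcript: str
+    """Full transcript text for this time window.
+    Concatenated from the transcript segments (or word-level timestamps,
+    when available) that fall inside the window."""
+
+    words: list[WordTimestamp]
+    """Word-level timestamps within this segment.
+    Allows precise text-to-time mapping. Empty when the transcript source
+    provides no word timing (e.g., youtube-transcript-api captions)."""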
+
+    transcript_confidence: float
+    """Average confidence for this segment's transcript (0.0 - 1.0)."""
+
+    # === Stream 2: OCR (Visual) ===
+    keyframes: list[KeyFrame]
+    """Extracted keyframes within this time window.
+    Only populated if visual_extraction is enabled."""
+
+    ocr_text: str
+    """Combined OCR text from all keyframes in this segment.
+    Deduplicated and cleaned."""
+
+    detected_code_blocks: list[CodeBlock]
+    """Code blocks detected on screen via OCR.
+    Includes language detection and formatted code."""
+
+    has_code_on_screen: bool
+    """Whether code/terminal was detected on screen."""
+
+    has_slides: bool
+    """Whether presentation slides were detected."""
+
+    has_diagram: bool
+    """Whether diagrams/architecture drawings were detected."""
+
+    # === Stream 3: Metadata ===
+    chapter_title: str | None
+    """YouTube chapter title if this segment maps to a chapter.
+    None if the video has no chapters or the segment spans a chapter boundary."""
+
+    topic: str | None
+    """Inferred topic for this segment.
+    Derived from chapter title, transcript keywords, or AI classification."""
+
+    category: str | None
+    """Mapped category (e.g., 'getting_started', 'api', 'tutorial').
+    Uses the same categorization system as other sources."""
+
+    # === Merged Content ===
+    content: str
+    """Final merged text content for this segment.
+
+    Merging strategy:
+    1. Start with transcript text
+    2. If code detected on screen but not mentioned in transcript,
+       append code block with annotation
+    3. If slide text detected, integrate as supplementary content
+    4. Add chapter title as heading if present
+
+    This is what gets written to reference markdown files.
+    """
+
+    summary: str | None
+    """AI-generated summary of this segment (populated during enhancement).
+    None until enhancement phase."""
+
+    # === Quality Metadata ===
+    confidence: float
+    """Overall confidence for this segment (0.0 - 1.0).
+    Weighted average of transcript + OCR confidences."""
+
+    content_type: SegmentContentType
+    """Primary content type of this segment."""
+
+    def to_dict(self) -> dict:
+        """Serialize to JSON-compatible dictionary."""
+        ...
+
+    @classmethod
+    def from_dict(cls, data: dict) -> 'VideoSegment':
+        """Deserialize from dictionary."""
+        ...
+
+    @property
+    def timestamp_display(self) -> str:
+        """Human-readable timestamp (e.g., '05:30 - 08:15')."""
+        start_min, start_sec = divmod(int(self.start_time), 60)
+        end_min, end_sec = divmod(int(self.end_time), 60)
+        return f"{start_min:02d}:{start_sec:02d} - {end_min:02d}:{end_sec:02d}"
+
+    @property
+    def youtube_timestamp_url(self) -> str | None:
+        """YouTube URL with timestamp parameter (e.g., '?t=330').
+        Returns None if not a YouTube video."""
+        ...
+```
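+
+A minimal sketch of segmentation strategy 3 (time windows). It assumes transcript chunks arrive as `(start, end, text)` tuples and that each chunk belongs to the window containing its start time. `split_into_windows` is a hypothetical helper. Its 300-second default mirrors `time_window_seconds` in `VideoSourceConfig`.
+
+```python
+from collections import defaultdict
+
+
+def split_into_windows(
+    chunks: list[tuple[float, float, str]],
+    window_seconds: float = 300.0,
+) -> list[tuple[float, float, str]]:
+    """Group (start, end, text) transcript chunks into fixed time windows.
+
+    Hypothetical helper: each chunk goes to the window containing its
+    start time, so no chunk is ever split in two.
+    Returns (window_start, window_end, joined_text) tuples in time order.
+    """
+    if window_seconds <= 0:
+        raise ValueError("window_seconds must be positive")
+    buckets: dict[int, list[tuple[float, float, str]]] = defaultdict(list)
+    for start, end, text in chunks:
+        buckets[int(start // window_seconds)].append((start, end, text))
+
+    windows = []
+    for key in sorted(buckets):
+        members = sorted(buckets[key])
+        # Bounds follow the actual content, not the nominal grid
+        window_start = members[0][0]
+        window_end = max(end for _, end, _ in members)
+        text = " ".join(t.strip() for _, _, t in members if t.strip())
+        windows.append((window_start, window_end, text))
+    return windows
+```
+
+Because whole chunks are assigned by start time, no caption line is cut mid-sentence. The trade-off is that a window can run slightly past its nominal boundary. Silent stretches produce no window at all.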
+
+---
+
+## Supporting Data Classes
+
+### Chapter — YouTube chapter marker
+
+```python
+@dataclass(frozen=True)
+class Chapter:
+    """A chapter marker from a video (typically YouTube).
+
+    Chapters provide natural content boundaries and are the
+    preferred segmentation method.
+    """
+    title: str
+    """Chapter title as shown in YouTube."""
+
+    start_time: float
+    """Start time in seconds."""
+
+    end_time: float
+    """End time in seconds."""
+
+    @property
+    def duration(self) -> float:
+        return self.end_time - self.start_time
+
+    def to_dict(self) -> dict:
+        return {
+            'title': self.title,
+            'start_time': self.start_time,
+            'end_time': self.end_time,
+        }
+```
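+
+The `chapter_title` contract on `VideoSegment` gives a title only when the segment sits entirely inside one chapter. That rule can be sketched as a simple containment check. `chapter_title_for` is a hypothetical name. The `Chapter` class is re-declared with its fields only, so the snippet runs on its own.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Chapter:
+    # Fields-only mirror of the Chapter dataclass above
+    title: str
+    start_time: float
+    end_time: float
+
+
+def chapter_title_for(chapters: list[Chapter], start: float, end: float) -> str | None:
+    """Return the chapter title if [start, end] lies inside a single chapter.
+
+    Returns None when there are no chapters or the span crosses a
+    chapter boundary, matching VideoSegment.chapter_title.
+    """
+    for chapter in chapters:
+        if chapter.start_time <= start and end <= chapter.end_time:
+            return chapter.title
+    return None
+```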
+
+### TranscriptSegment — Raw transcript chunk from API/Whisper
+
+```python
+@dataclass(frozen=True)
+class TranscriptSegment:
+    """A raw transcript segment as received from the source.
+
+    This is the unprocessed output from youtube-transcript-api or
+    faster-whisper, before alignment and merging.
+
+    youtube-transcript-api segments are typically 2-5 seconds each.
+    faster-whisper segments are typically sentence-level (5-30 seconds).
+    """
+    text: str
+    """Transcript text for this segment."""
+
+    start: float
+    """Start time in seconds."""
+
+    end: float
+    """End time in seconds. Computed as start + duration for YouTube API."""
+
+    confidence: float
+    """Confidence score (0.0 - 1.0).
+    - YouTube manual captions: 1.0 (assumed perfect)
+    - YouTube auto-generated: 0.8 (estimated)
+    - Whisper: derived from model scores (e.g., mean word probability)
+    """
+
+    words: list[WordTimestamp] | None
+    """Word-level timestamps, if available.
+    Available from faster-whisper when run with word_timestamps=True.
+    Not available from youtube-transcript-api.
+    """
+
+    source: TranscriptSource
+    """Which tool produced this segment."""
+
+    def to_dict(self) -> dict:
+        return {
+            'text': self.text,
+            'start': self.start,
+            'end': self.end,
+            'confidence': self.confidence,
+            'words': [w.to_dict() for w in self.words] if self.words else None,
+            'source': self.source.value,
+        }
+```
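+
+A sketch of turning caption entries into `TranscriptSegment`s. It assumes each entry is a dict with `text`, `start` and `duration` keys, which is the raw-data shape youtube-transcript-api uses. It also assumes the caller says whether the track was auto-generated. `from_caption_entries` is hypothetical. The 1.0 / 0.8 confidences come from the docstring above, and the classes are trimmed mirrors.
+
+```python
+from dataclasses import dataclass
+from enum import Enum
+
+
+class TranscriptSource(Enum):
+    # Subset of the TranscriptSource enum defined in this document
+    YOUTUBE_MANUAL = "youtube_manual"
+    YOUTUBE_AUTO = "youtube_auto_generated"
+
+
+@dataclass(frozen=True)
+class TranscriptSegment:
+    # Trimmed mirror of the dataclass above (no word timestamps)
+    text: str
+    start: float
+    end: float
+    confidence: float
+    source: TranscriptSource
+
+
+# Fixed confidence estimates from the `confidence` docstring above
+_CAPTION_CONFIDENCE = {
+    TranscriptSource.YOUTUBE_MANUAL: 1.0,
+    TranscriptSource.YOUTUBE_AUTO: 0.8,
+}
+
+
+def from_caption_entries(entries: list[dict], is_generated: bool) -> list[TranscriptSegment]:
+    """Convert {'text', 'start', 'duration'} caption dicts into segments."""
+    source = TranscriptSource.YOUTUBE_AUTO if is_generated else TranscriptSource.YOUTUBE_MANUAL
+    segments = []
+    for entry in entries:
+        text = entry["text"].replace("\n", " ").strip()
+        if not text:
+            continue  # skip empty caption lines
+        start = float(entry["start"])
+        segments.append(TranscriptSegment(
+            text=text,
+            start=start,
+            end=start + float(entry["duration"]),  # the API gives duration, not end
+            confidence=_CAPTION_CONFIDENCE[source],
+            source=source,
+        ))
+    return segments
+```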
+
+### WordTimestamp — Individual word with timing
+
+```python
+@dataclass(frozen=True)
+class WordTimestamp:
+    """A single word with precise timing information.
+
+    Enables precise text-to-time mapping within segments.
+    Essential for aligning ASR with OCR content.
+    """
+    word: str
+    """The word text."""
+
+    start: float
+    """Start time in seconds."""
+
+    end: float
+    """End time in seconds."""
+
+    probability: float
+    """Model confidence for this word (0.0 - 1.0).
+    From faster-whisper's word_timestamps output."""
+
+    def to_dict(self) -> dict:
+        return {
+            'word': self.word,
+            'start': self.start,
+            'end': self.end,
+            'probability': self.probability,
+        }
+```
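+
+Slicing word-level timestamps into a segment window might look like the sketch below. Each word is assigned by its midpoint; this is an assumption that keeps a boundary-straddling word in exactly one segment. faster-whisper word strings typically carry a leading space, hence the `strip()`. `words_in_window` is a hypothetical helper.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class WordTimestamp:
+    # Fields-only mirror of the dataclass above
+    word: str
+    start: float
+    end: float
+    probability: float
+
+
+def words_in_window(words: list[WordTimestamp], start: float, end: float) -> tuple[str, float]:
+    """Collect words whose midpoint falls in [start, end).
+
+    Returns (transcript_text, mean_probability).
+    An empty window yields ("", 0.0).
+    """
+    selected = [w for w in words if start <= (w.start + w.end) / 2 < end]
+    if not selected:
+        return "", 0.0
+    text = " ".join(w.word.strip() for w in selected)
+    confidence = sum(w.probability for w in selected) / len(selected)
+    return text, confidence
+```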
+
+### KeyFrame — Extracted video frame with analysis
+
+```python
+@dataclass
+class KeyFrame:
+    """An extracted video frame with visual analysis results.
+
+    Keyframes are extracted at:
+    1. Scene change boundaries (PySceneDetect)
+    2. Chapter boundaries
+    3. Regular intervals within segments (configurable)
+
+    Each frame is classified and optionally OCR'd.
+    """
+    timestamp: float
+    """Exact timestamp in seconds where this frame was extracted."""
+
+    image_path: str
+    """Path to the saved frame image file (PNG).
+    Relative to the video_data/ directory (e.g., 'frames/video_abc123/frame_52.30.png')."""
+
+    frame_type: FrameType
+    """Classification of what this frame shows."""
+
+    scene_change_score: float
+    """How different this frame is from the previous one (0.0 - 1.0).
+    Higher = more significant visual change.
+    Normalized from PySceneDetect's ContentDetector score, whose raw
+    values are not on a 0.0 - 1.0 scale."""
+
+    # === OCR Results ===
+    ocr_regions: list[OCRRegion]
+    """All text regions detected in this frame.
+    Empty list if OCR was not performed or no text detected."""
+
+    ocr_text: str
+    """Combined OCR text from all regions.
+    Cleaned and deduplicated."""
+
+    ocr_confidence: float
+    """Average OCR confidence across all regions (0.0 - 1.0)."""
+
+    # === Frame Properties ===
+    width: int
+    """Frame width in pixels."""
+
+    height: int
+    """Frame height in pixels."""
+
+    mean_brightness: float
+    """Average brightness (0-255). Used for classification."""
+
+    def to_dict(self) -> dict:
+        return {
+            'timestamp': self.timestamp,
+            'image_path': self.image_path,
+            'frame_type': self.frame_type.value,
+            'scene_change_score': self.scene_change_score,
+            'ocr_regions': [r.to_dict() for r in self.ocr_regions],
+            'ocr_text': self.ocr_text,
+            'ocr_confidence': self.ocr_confidence,
+            'width': self.width,
+            'height': self.height,
+        }
+```
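+
+The three extraction triggers in the `KeyFrame` docstring could be merged per segment roughly as follows. This sketch assumes scene-change timestamps were already computed (e.g., by PySceneDetect) and that `interval` maps to `keyframe_interval`. `keyframe_timestamps` and `min_gap` are hypothetical. When candidates cluster, the earliest one is kept.
+
+```python
+def keyframe_timestamps(
+    scene_changes: list[float],
+    chapter_starts: list[float],
+    segment_start: float,
+    segment_end: float,
+    interval: float = 5.0,
+    min_gap: float = 1.0,
+) -> list[float]:
+    """Merge scene changes, chapter starts and a regular grid into sorted timestamps.
+
+    Only candidates inside [segment_start, segment_end) are considered.
+    A candidate closer than `min_gap` seconds to the last kept timestamp is dropped.
+    """
+    if interval <= 0:
+        raise ValueError("interval must be positive")
+    # Build the regular grid by multiplication to avoid float drift
+    steps = int((segment_end - segment_start) // interval) + 1
+    grid = [segment_start + i * interval for i in range(steps)]
+
+    candidates = sorted(
+        ts for ts in [*scene_changes, *chapter_starts, *grid]
+        if segment_start <= ts < segment_end
+    )
+    kept: list[float] = []
+    for ts in candidates:
+        if not kept or ts - kept[-1] >= min_gap:
+            kept.append(ts)
+    return kept
+```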
+
+### OCRRegion — A detected text region in a frame
+
+```python
+@dataclass(frozen=True)
+class OCRRegion:
+    """A single text region detected by OCR within a frame.
+
+    Includes bounding box coordinates for spatial analysis
+    (e.g., detecting code editors vs. slide titles).
+    """
+    text: str
+    """Detected text content."""
+
+    confidence: float
+    """OCR confidence (0.0 - 1.0)."""
+
+    bbox: tuple[int, int, int, int]
+    """Bounding box as (x1, y1, x2, y2) in pixels.
+    Top-left to bottom-right."""
+
+    is_monospace: bool
+    """Whether the text appears to be in a monospace font.
+    Indicates code/terminal content."""
+
+    def to_dict(self) -> dict:
+        return {
+            'text': self.text,
+            'confidence': self.confidence,
+            'bbox': list(self.bbox),
+            'is_monospace': self.is_monospace,
+        }
+```
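+
+EasyOCR's `Reader.readtext()` reports each detection as four corner points rather than a rectangle. Producing the `(x1, y1, x2, y2)` form of `bbox` therefore needs a small conversion, sketched below. `polygon_to_bbox` is a hypothetical name. pytesseract's `image_to_data` instead reports `left`, `top`, `width` and `height`, so there the box is simply `(left, top, left + width, top + height)`.
+
+```python
+import math
+
+
+def polygon_to_bbox(points: list[list[float]]) -> tuple[int, int, int, int]:
+    """Collapse a text polygon into an axis-aligned (x1, y1, x2, y2) box.
+
+    Handles rotated quadrilaterals by taking min/max over all corners,
+    rounding outward to whole pixels so no text is clipped.
+    """
+    if not points:
+        raise ValueError("polygon has no points")
+    xs = [p[0] for p in points]
+    ys = [p[1] for p in points]
+    return (
+        math.floor(min(xs)), math.floor(min(ys)),
+        math.ceil(max(xs)), math.ceil(max(ys)),
+    )
+```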
+
+### CodeBlock — Detected code on screen
+
+```python
+@dataclass
+class CodeBlock:
+    """A code block detected via OCR from video frames.
+
+    Represents code that was visible on screen during a segment.
+    May come from a code editor, terminal, or presentation slide.
+    """
+    code: str
+    """The extracted code text. Cleaned and formatted."""
+
+    language: str | None
+    """Detected programming language (e.g., 'python', 'javascript').
+    Uses the same detection heuristics as doc_scraper.detect_language().
+    None if language cannot be determined."""
+
+    source_frame: float
+    """Timestamp of the frame where this code was extracted."""
+
+    context: CodeContext
+    """Where the code appeared (editor, terminal, slide)."""
+
+    confidence: float
+    """OCR confidence for this code block (0.0 - 1.0)."""
+
+    def to_dict(self) -> dict:
+        return {
+            'code': self.code,
+            'language': self.language,
+            'source_frame': self.source_frame,
+            'context': self.context.value,
+            'confidence': self.confidence,
+        }
+```
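+
+Step 2 of the `VideoSegment.content` merging strategy needs a test for whether on-screen code was already "mentioned" in the transcript. This document leaves that test open. The sketch below assumes a token-overlap heuristic. Both function names, the 0.5 threshold and the annotation text are hypothetical.
+
+```python
+import re
+
+
+def is_code_mentioned(code: str, transcript: str, threshold: float = 0.5) -> bool:
+    """Heuristic: was most of the code's vocabulary spoken aloud?
+
+    Compares identifier-like tokens (3+ chars) in the code with words in the
+    transcript, case-insensitively. True when at least `threshold` of the
+    distinct code tokens also occur in the transcript.
+    """
+    code_tokens = {t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]{2,}", code)}
+    if not code_tokens:
+        return True  # nothing meaningful to append
+    spoken = {t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", transcript)}
+    return len(code_tokens & spoken) / len(code_tokens) >= threshold
+
+
+def merge_code_block(content: str, code: str, language: str | None, transcript: str) -> str:
+    """Append a fenced code block to segment content unless the speaker covered it."""
+    if is_code_mentioned(code, transcript):
+        return content
+    fence = "`" * 3  # built dynamically so this snippet nests inside markdown
+    return f"{content}\n\n*Code shown on screen:*\n\n{fence}{language or ''}\n{code}\n{fence}\n"
+```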
+
+### VideoPlaylist — Container for playlist processing
+
+```python
+@dataclass
+class VideoPlaylist:
+    """A playlist or channel containing multiple videos.
+
+    Used to track multi-video processing state and ordering.
+    """
+    playlist_id: str
+    """Platform playlist ID."""
+
+    title: str
+    """Playlist title."""
+
+    description: str
+    """Playlist description."""
+
+    channel_name: str | None
+    """Channel that owns the playlist."""
+
+    video_count: int
+    """Total number of videos in the playlist."""
+
+    videos: list[VideoInfo]
+    """Extracted video information for each video.
+    Ordered by playlist index."""
+
+    source_url: str
+    """Original playlist URL."""
+
+    def to_dict(self) -> dict:
+        return {
+            'playlist_id': self.playlist_id,
+            'title': self.title,
+            'description': self.description,
+            'channel_name': self.channel_name,
+            'video_count': self.video_count,
+            'videos': [v.to_dict() for v in self.videos],
+            'source_url': self.source_url,
+        }
+```
+
+### VideoScraperResult — Top-level scraper output
+
+```python
+@dataclass
+class VideoScraperResult:
+    """Complete result from the video scraper.
+
+    This is the top-level output that gets passed to the
+    unified scraper and SKILL.md builder.
+    """
+    videos: list[VideoInfo]
+    """All processed videos."""
+
+    playlists: list[VideoPlaylist]
+    """Playlist containers (if input was playlists)."""
+
+    total_duration_seconds: float
+    """Sum of all video durations."""
+
+    total_segments: int
+    """Sum of all segments across all videos."""
+
+    total_code_blocks: int
+    """Total code blocks detected across all videos."""
+
+    categories: dict[str, list[VideoSegment]]
+    """Segments grouped by detected category.
+    Same category system as other sources."""
+
+    config: VideoSourceConfig
+    """Configuration used for this scrape."""
+
+    processing_time_seconds: float
+    """Total pipeline processing time."""
+
+    warnings: list[str]
+    """Any warnings generated during processing (e.g., missing captions)."""
+
+    errors: list[VideoError]
+    """Errors for individual videos that failed processing."""
+
+    def to_dict(self) -> dict:
+        ...
+```
+
+---
+
+## Enumerations
+
+```python
+from enum import Enum
+
+class VideoSourceType(Enum):
+    """Where a video came from."""
+    YOUTUBE = "youtube"
+    VIMEO = "vimeo"
+    LOCAL_FILE = "local_file"
+    LOCAL_DIRECTORY = "local_directory"
+
+class TranscriptSource(Enum):
+    """How the transcript was obtained."""
+    YOUTUBE_MANUAL = "youtube_manual"          # Human-created captions
+    YOUTUBE_AUTO = "youtube_auto_generated"    # YouTube's ASR
+    WHISPER = "whisper"                        # faster-whisper local ASR
+    SUBTITLE_FILE = "subtitle_file"            # SRT/VTT file alongside video
+    NONE = "none"                              # No transcript available
+
+class FrameType(Enum):
+    """Classification of a keyframe's visual content."""
+    CODE_EDITOR = "code_editor"      # IDE or code editor visible
+    TERMINAL = "terminal"            # Terminal/command line
+    SLIDE = "slide"                  # Presentation slide
+    DIAGRAM = "diagram"              # Architecture/flow diagram
+    BROWSER = "browser"              # Web browser (documentation, output)
+    WEBCAM = "webcam"                # Speaker face/webcam only
+    SCREENCAST = "screencast"        # General screen recording
+    OTHER = "other"                  # Unclassified
+
+class CodeContext(Enum):
+    """Where code was displayed in the video."""
+    EDITOR = "editor"        # Code editor / IDE
+    TERMINAL = "terminal"    # Terminal / command line output
+    SLIDE = "slide"          # Code on a presentation slide
+    BROWSER = "browser"      # Code in a browser (docs, playground)
+    UNKNOWN = "unknown"
+
+class SegmentContentType(Enum):
+    """Primary content type of a video segment."""
+    EXPLANATION = "explanation"    # Talking/explaining concepts
+    LIVE_CODING = "live_coding"    # Writing code on screen
+    DEMO = "demo"                  # Running/showing a demo
+    SLIDES = "slides"              # Presentation slides
+    Q_AND_A = "q_and_a"            # Q&A section
+    INTRO = "intro"                # Introduction/overview
+    OUTRO = "outro"                # Conclusion/wrap-up
+    MIXED = "mixed"                # Combination of types
+
+class SegmentationStrategy(Enum):
+    """How segments are determined."""
+    CHAPTERS = "chapters"                # YouTube chapter boundaries
+    SEMANTIC = "semantic"                # Topic shift detection
+    TIME_WINDOW = "time_window"          # Fixed time intervals
+    SCENE_CHANGE = "scene_change"        # Visual scene changes
+    HYBRID = "hybrid"                    # Combination of strategies
+```
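+
+`VideoSourceConfig` stores `segmentation_strategy` as a plain string, so the enum can double as its validator: looking up an `Enum` by an unknown value raises `ValueError`. The sketch below wraps that lookup in a friendlier message. `parse_strategy` is a hypothetical helper, and the enum is copied so the snippet runs standalone.
+
+```python
+from enum import Enum
+
+
+class SegmentationStrategy(Enum):
+    # Copy of the enum above
+    CHAPTERS = "chapters"
+    SEMANTIC = "semantic"
+    TIME_WINDOW = "time_window"
+    SCENE_CHANGE = "scene_change"
+    HYBRID = "hybrid"
+
+
+def parse_strategy(value: str) -> SegmentationStrategy:
+    """Turn a config string into the enum, listing valid values on failure."""
+    try:
+        return SegmentationStrategy(value.strip().lower())
+    except ValueError:
+        allowed = ", ".join(s.value for s in SegmentationStrategy)
+        raise ValueError(
+            f"Unknown segmentation_strategy {value!r}; expected one of: {allowed}"
+        ) from None
+```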
+
+---
+
+## JSON Schema (Serialization)
+
+### VideoSegment JSON
+
+```json
+{
+    "index": 0,
+    "start_time": 45.0,
+    "end_time": 180.0,
+    "duration": 135.0,
+    "transcript": "Let's start by setting up our React project. First, we'll use Create React App...",
+    "words": [
+        {"word": "Let's", "start": 45.0, "end": 45.3, "probability": 0.95},
+        {"word": "start", "start": 45.3, "end": 45.6, "probability": 0.98}
+    ],
+    "transcript_confidence": 0.94,
+    "keyframes": [
+        {
+            "timestamp": 52.3,
+            "image_path": "frames/video_abc123/frame_52.30.png",
+            "frame_type": "terminal",
+            "scene_change_score": 0.72,
+            "ocr_text": "npx create-react-app my-app",
+            "ocr_confidence": 0.89,
+            "ocr_regions": [
+                {
+                    "text": "npx create-react-app my-app",
+                    "confidence": 0.89,
+                    "bbox": [120, 340, 580, 370],
+                    "is_monospace": true
+                }
+            ],
+            "width": 1920,
+            "height": 1080
+        }
+    ],
+    "ocr_text": "npx create-react-app my-app\ncd my-app\nnpm start",
+    "detected_code_blocks": [
+        {
+            "code": "npx create-react-app my-app\ncd my-app\nnpm start",
+            "language": "bash",
+            "source_frame": 52.3,
+            "context": "terminal",
+            "confidence": 0.89
+        }
+    ],
+    "has_code_on_screen": true,
+    "has_slides": false,
+    "has_diagram": false,
+    "chapter_title": "Project Setup",
+    "topic": "react project setup",
+    "category": "getting_started",
+    "content": "## Project Setup (00:45 - 03:00)\n\nLet's start by setting up our React project...\n\n```bash\nnpx create-react-app my-app\ncd my-app\nnpm start\n```\n",
+    "summary": null,
+    "confidence": 0.92,
+    "content_type": "live_coding"
+}
+```
+
+### VideoInfo JSON (abbreviated)
+
+```json
+{
+    "video_id": "abc123def45",
+    "source_type": "youtube",
+    "source_url": "https://www.youtube.com/watch?v=abc123def45",
+    "file_path": null,
+    "title": "React Hooks Tutorial for Beginners",
+    "description": "Learn React Hooks from scratch...",
+    "duration": 1832.0,
+    "upload_date": "2026-01-15",
+    "language": "en",
+    "channel_name": "React Official",
+    "channel_url": "https://www.youtube.com/@reactofficial",
+    "channel_subscriber_count": 250000,
+    "view_count": 1500000,
+    "like_count": 45000,
+    "comment_count": 2300,
+    "tags": ["react", "hooks", "tutorial", "javascript"],
+    "categories": ["Education"],
+    "thumbnail_url": "https://i.ytimg.com/vi/abc123def45/maxresdefault.jpg",
+    "chapters": [
+        {"title": "Intro", "start_time": 0.0, "end_time": 45.0},
+        {"title": "Project Setup", "start_time": 45.0, "end_time": 180.0},
+        {"title": "useState Hook", "start_time": 180.0, "end_time": 540.0}
+    ],
+    "playlist_title": "React Complete Course",
+    "playlist_index": 3,
+    "playlist_total": 12,
+    "segments": ["... (see VideoSegment JSON above)"],
+    "transcript_source": "youtube_manual",
+    "visual_extraction_enabled": true,
+    "whisper_model": null,
+    "processing_time_seconds": 45.2,
+    "extracted_at": "2026-02-27T14:30:00Z",
+    "transcript_confidence": 0.95,
+    "content_richness_score": 0.88
+}
+```
+
+---
+
+## Relationships Diagram
+
+```
+VideoScraperResult
+├── videos: list[VideoInfo]
+│   ├── chapters: list[Chapter]
+│   ├── raw_transcript: list[TranscriptSegment]
+│   │   └── words: list[WordTimestamp] | None
+│   └── segments: list[VideoSegment]            ← PRIMARY OUTPUT
+│       ├── words: list[WordTimestamp]
+│       ├── keyframes: list[KeyFrame]
+│       │   └── ocr_regions: list[OCRRegion]
+│       └── detected_code_blocks: list[CodeBlock]
+├── playlists: list[VideoPlaylist]
+│   └── videos: list[VideoInfo]                 ← same as above
+├── categories: dict[str, list[VideoSegment]]
+├── config: VideoSourceConfig
+└── errors: list[VideoError]
+```
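+
+The diagram marks `playlists[].videos` as the same objects as `videos`. The aggregate fields on `VideoScraperResult` should therefore be computed from `videos` alone, to avoid double counting. The sketch below works over serialized `VideoInfo` dicts. `summarize` and the `uncategorized` fallback key are assumptions.
+
+```python
+from collections import defaultdict
+
+
+def summarize(videos: list[dict]) -> dict:
+    """Compute VideoScraperResult aggregates from serialized VideoInfo dicts.
+
+    Walks videos -> segments -> detected_code_blocks and groups segments by
+    their 'category' (a missing or null category becomes 'uncategorized').
+    """
+    total_duration = 0.0
+    total_segments = 0
+    total_code_blocks = 0
+    categories: dict[str, list[dict]] = defaultdict(list)
+    for video in videos:
+        total_duration += video.get("duration", 0.0)
+        for segment in video.get("segments", []):
+            total_segments += 1
+            total_code_blocks += len(segment.get("detected_code_blocks", []))
+            categories[segment.get("category") or "uncategorized"].append(segment)
+    return {
+        "total_duration_seconds": total_duration,
+        "total_segments": total_segments,
+        "total_code_blocks": total_code_blocks,
+        "categories": dict(categories),
+    }
+```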
+
+---
+
+## Config Schema (Unified Config)
+
+### Video source in unified config JSON
+
+```json
+{
+    "type": "video",
+
+    "_comment_source": "Set exactly one of: url, playlist, channel, path, directory (all five appear here only for reference)",
+
+    "url": "https://www.youtube.com/watch?v=abc123def45",
+    "playlist": "https://www.youtube.com/playlist?list=PLxxx",
+    "channel": "https://www.youtube.com/@channelname",
+    "path": "./recordings/tutorial.mp4",
+    "directory": "./recordings/",
+
+    "name": "official_tutorials",
+    "description": "Official React tutorial videos",
+    "weight": 0.2,
+
+    "_comment_filtering": "Control which videos to process",
+    "max_videos": 20,
+    "min_duration": 60,
+    "max_duration": 7200,
+    "languages": ["en"],
+    "title_include_patterns": ["tutorial", "guide"],
+    "title_exclude_patterns": ["shorts", "live stream"],
+    "min_views": 1000,
+    "upload_after": "2024-01-01",
+
+    "_comment_extraction": "Control extraction depth",
+    "visual_extraction": true,
+    "whisper_model": "base",
+    "whisper_device": "auto",
+    "ocr_languages": ["en"],
+    "keyframe_interval": 5.0,
+    "min_scene_change_score": 0.3,
+    "ocr_confidence_threshold": 0.5,
+    "transcript_confidence_threshold": 0.3,
+
+    "_comment_segmentation": "Control how content is segmented",
+    "segmentation_strategy": "hybrid",
+    "time_window_seconds": 300,
+    "merge_short_segments": true,
+    "min_segment_duration": 30,
+    "max_segment_duration": 600,
+
+    "_comment_categorization": "Map segments to categories",
+    "categories": {
+        "getting_started": ["intro", "quickstart", "setup", "install"],
+        "hooks": ["useState", "useEffect", "useContext", "hooks"],
+        "components": ["component", "props", "state", "render"],
+        "advanced": ["performance", "suspense", "concurrent", "ssr"]
+    },
+
+    "_comment_local_files": "For local video sources",
+    "file_patterns": ["*.mp4", "*.mkv", "*.webm"],
+    "subtitle_patterns": ["*.srt", "*.vtt"],
+    "recursive": true
+}
+```
+
+### VideoSourceConfig dataclass (parsed from JSON)
+
+```python
+@dataclass
+class VideoSourceConfig:
+    """Configuration for video source processing.
+
+    Parsed from the 'sources' entry in unified config JSON.
+    Provides defaults for all optional fields.
+    """
+    # Source specification (exactly one must be set)
+    url: str | None = None
+    playlist: str | None = None
+    channel: str | None = None
+    path: str | None = None
+    directory: str | None = None
+
+    # Identity
+    name: str = "video"
+    description: str = ""
+    weight: float = 0.2
+
+    # Filtering
+    max_videos: int = 50
+    min_duration: float = 60.0          # 1 minute
+    max_duration: float = 7200.0        # 2 hours
+    languages: list[str] | None = None  # None = all languages
+    title_include_patterns: list[str] | None = None
+    title_exclude_patterns: list[str] | None = None
+    min_views: int | None = None
+    upload_after: str | None = None     # ISO date
+
+    # Extraction
+    visual_extraction: bool = False     # Off by default (heavy)
+    whisper_model: str = "base"
+    whisper_device: str = "auto"        # 'auto', 'cpu', 'cuda'
+    ocr_languages: list[str] | None = None
+    keyframe_interval: float = 5.0      # Extract frame every N seconds within segment
+    min_scene_change_score: float = 0.3
+    ocr_confidence_threshold: float = 0.5
+    transcript_confidence_threshold: float = 0.3
+
+    # Segmentation
+    segmentation_strategy: str = "hybrid"
+    time_window_seconds: float = 300.0  # 5 minutes
+    merge_short_segments: bool = True
+    min_segment_duration: float = 30.0
+    max_segment_duration: float = 600.0
+
+    # Categorization
+    categories: dict[str, list[str]] | None = None
+
+    # Local file options
+    file_patterns: list[str] | None = None
+    subtitle_patterns: list[str] | None = None
+    recursive: bool = True
+
+    @classmethod
+    def from_dict(cls, data: dict) -> 'VideoSourceConfig':
+        """Create config from unified config source entry."""
+        ...
+
+    def validate(self) -> list[str]:
+        """Validate configuration. Returns list of errors."""
+        errors = []
+        sources_set = sum(1 for s in [self.url, self.playlist, self.channel,
+                                       self.path, self.directory] if s is not None)
+        if sources_set == 0:
+            errors.append("Video source must specify one of: url, playlist, channel, path, directory")
+        if sources_set > 1:
+            errors.append("Video source must specify exactly one source type")
+        if self.min_duration >= self.max_duration:
+            errors.append("min_duration must be less than max_duration")
+        if self.min_segment_duration >= self.max_segment_duration:
+            errors.append("min_segment_duration must be less than max_segment_duration")
+        return errors
+```
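+
+The `from_dict` stub has to cope with the `type` discriminator and the `_comment_*` annotation keys used in the JSON example. One possible approach, shown on a trimmed copy of the class, derives the accepted keys from `dataclasses.fields()`. Rejecting unknown keys is an assumption of this sketch. It turns a typo such as `urll` into an error instead of silently falling back to defaults.
+
+```python
+from dataclasses import dataclass, fields
+
+
+@dataclass
+class VideoSourceConfig:
+    # Trimmed to a few fields; the full class above has many more
+    url: str | None = None
+    path: str | None = None
+    max_videos: int = 50
+    visual_extraction: bool = False
+    whisper_model: str = "base"
+
+    @classmethod
+    def from_dict(cls, data: dict) -> "VideoSourceConfig":
+        """Build a config from a unified-config source entry.
+
+        Skips the 'type' discriminator and '_comment*' annotation keys.
+        Any other unknown key raises ValueError instead of being dropped.
+        """
+        known = {f.name for f in fields(cls)}
+        kwargs = {}
+        unknown = []
+        for key, value in data.items():
+            if key == "type" or key.startswith("_comment"):
+                continue
+            if key in known:
+                kwargs[key] = value
+            else:
+                unknown.append(key)
+        if unknown:
+            raise ValueError(f"Unknown video source keys: {', '.join(sorted(unknown))}")
+        return cls(**kwargs)
+```
+
+A more lenient variant could log a warning for unknown keys instead of raising. This would let older configs keep loading after fields are renamed.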
--- a/docs/plans/video/03_VIDEO_PIPELINE.md
+++ b/docs/plans/video/03_VIDEO_PIPELINE.md
--- a/docs/plans/video/04_VIDEO_INTEGRATION.md
+++ b/docs/plans/video/04_VIDEO_INTEGRATION.md
@@ -0,0 +1,808 @@
+# Video Source — System Integration
+
+**Date:** February 27, 2026
+**Document:** 04 of 07
+**Status:** Planning
+
+---
+
+## Table of Contents
+
+1. [CLI Integration](#cli-integration)
+2. [Source Detection](#source-detection)
+3. [Unified Config Integration](#unified-config-integration)
+4. [Unified Scraper Integration](#unified-scraper-integration)
+5. [Create Command Integration](#create-command-integration)
+6. [Parser & Arguments](#parser--arguments)
+7. [MCP Tool Integration](#mcp-tool-integration)
+8. [Enhancement Integration](#enhancement-integration)
+9. [File Map (New & Modified)](#file-map-new--modified-files)
+
+---
+
+## CLI Integration
+
+### New Subcommand: `video`
+
+```bash
+# Dedicated video scraping command
+skill-seekers video --url "https://youtube.com/watch?v=abc123def45"
+skill-seekers video --playlist "https://youtube.com/playlist?list=PLxxx"
+skill-seekers video --channel https://youtube.com/@channelname
+skill-seekers video --path ./recording.mp4
+skill-seekers video --directory ./recordings/
+
+# With options
+skill-seekers video --url <URL> \
+    --output output/react-videos/ \
+    --visual \
+    --whisper-model large-v3 \
+    --max-videos 20 \
+    --languages en \
+    --categories '{"hooks": ["useState", "useEffect"]}' \
+    --enhance-level 2
+```
+
+### Auto-Detection via `create` Command
+
+```bash
+# These all auto-detect as video sources (YouTube video IDs are 11 characters;
+# quote URLs containing '?' so shells such as zsh don't treat them as globs)
+skill-seekers create "https://youtube.com/watch?v=abc123def45"
+skill-seekers create https://youtu.be/abc123def45
+skill-seekers create "https://youtube.com/playlist?list=PLxxx"
+skill-seekers create https://youtube.com/@channelname
+skill-seekers create https://vimeo.com/123456789
+skill-seekers create ./tutorial.mp4
+skill-seekers create ./recordings/                # Directory of videos
+
+# With universal flags
+skill-seekers create "https://youtube.com/watch?v=abc123def45" --visual -p comprehensive
+skill-seekers create ./tutorial.mp4 --enhance-level 2 --dry-run
+```
+
+### Registration in main.py
+
+```python
+# In src/skill_seekers/cli/main.py - COMMAND_MODULES dict
+
+COMMAND_MODULES = {
+    # ... existing commands ...
+    'video': 'skill_seekers.cli.video_scraper',
+    # ... rest of commands ...
+}
+```
+
+---
+
+## Source Detection
+
+### Changes to `source_detector.py`
+
+```python
+# New patterns to add:
+
+class SourceDetector:
+    # Existing patterns...
+
+    # NEW: Video URL patterns
+    YOUTUBE_VIDEO_PATTERN = re.compile(
+        r'(?:https?://)?(?:www\.)?'
+        r'(?:youtube\.com/watch\?v=|youtu\.be/)'
+        r'([a-zA-Z0-9_-]{11})'
+    )
+    YOUTUBE_PLAYLIST_PATTERN = re.compile(
+        r'(?:https?://)?(?:www\.)?'
+        r'youtube\.com/playlist\?list=([a-zA-Z0-9_-]+)'
+    )
+    YOUTUBE_CHANNEL_PATTERN = re.compile(
+        r'(?:https?://)?(?:www\.)?'
+        r'youtube\.com/(?:@|c/|channel/|user/)([a-zA-Z0-9_.-]+)'
+    )
+    VIMEO_PATTERN = re.compile(
+        r'(?:https?://)?(?:www\.)?vimeo\.com/(\d+)'
+    )
+
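+    # Video file extensions.
+    # NOTE: '.ts' (MPEG transport stream) is also the TypeScript extension,
+    # so a bare `foo.ts` argument would be routed to the video scraper.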
+    VIDEO_EXTENSIONS = {
+        '.mp4', '.mkv', '.webm', '.avi', '.mov',
+        '.flv', '.ts', '.wmv', '.m4v', '.ogv',
+    }
+
+    @classmethod
+    def detect(cls, source: str) -> SourceInfo:
+        """Updated detection order:
+        1. .json (config)
+        2. .pdf
+        3. .docx
+        4. Video file extensions (.mp4, .mkv, .webm, etc.)  ← NEW
+        5. Directory (may contain videos)
+        6. YouTube/Vimeo URL patterns  ← NEW
+        7. GitHub patterns
+        8. Web URL
+        9. Domain inference
+        """
+        # 1. Config file
+        if source.endswith('.json'):
+            return cls._detect_config(source)
+
+        # 2. PDF file
+        if source.endswith('.pdf'):
+            return cls._detect_pdf(source)
+
+        # 3. Word document
+        if source.endswith('.docx'):
+            return cls._detect_word(source)
+
+        # 4. NEW: Video file
+        ext = os.path.splitext(source)[1].lower()
+        if ext in cls.VIDEO_EXTENSIONS:
+            return cls._detect_video_file(source)
+
+        # 5. Directory
+        if os.path.isdir(source):
+            # Check if directory contains mostly video files
+            if cls._is_video_directory(source):
+                return cls._detect_video_directory(source)
+            return cls._detect_local(source)
+
+        # 6. NEW: Video URL patterns (before general web URL)
+        video_info = cls._detect_video_url(source)
+        if video_info:
+            return video_info
+
+        # 7. GitHub patterns
+        github_info = cls._detect_github(source)
+        if github_info:
+            return github_info
+
+        # 8. Web URL
+        if source.startswith('http://') or source.startswith('https://'):
+            return cls._detect_web(source)
+
+        # 9. Domain inference
+        if '.' in source and not source.startswith('/'):
+            return cls._detect_web(f'https://{source}')
+
+        raise ValueError(
+            f"Cannot determine source type for: {source}\n\n"
+            "Examples:\n"
+            "  Web:      skill-seekers create https://docs.react.dev/\n"
+            "  GitHub:   skill-seekers create facebook/react\n"
+            "  Local:    skill-seekers create ./my-project\n"
+            "  PDF:      skill-seekers create tutorial.pdf\n"
+            "  DOCX:     skill-seekers create document.docx\n"
+            "  Video:    skill-seekers create https://youtube.com/watch?v=xxx\n"  # NEW
+            "  Playlist: skill-seekers create https://youtube.com/playlist?list=xxx\n"  # NEW
+            "  Config:   skill-seekers create configs/react.json"
+        )
+
+    @classmethod
+    def _detect_video_url(cls, source: str) -> SourceInfo | None:
+        """Detect YouTube or Vimeo video URL."""
+
+        # YouTube video
+        match = cls.YOUTUBE_VIDEO_PATTERN.search(source)
+        if match:
+            video_id = match.group(1)
+            return SourceInfo(
+                type='video',
+                parsed={
+                    'video_source': 'youtube_video',
+                    'video_id': video_id,
+                    'url': f'https://www.youtube.com/watch?v={video_id}',
+                },
+                suggested_name=f'video-{video_id}',
+                raw_input=source,
+            )
+
+        # YouTube playlist
+        match = cls.YOUTUBE_PLAYLIST_PATTERN.search(source)
+        if match:
+            playlist_id = match.group(1)
+            return SourceInfo(
+                type='video',
+                parsed={
+                    'video_source': 'youtube_playlist',
+                    'playlist_id': playlist_id,
+                    'url': f'https://www.youtube.com/playlist?list={playlist_id}',
+                },
+                suggested_name=f'playlist-{playlist_id[:12]}',
+                raw_input=source,
+            )
+
+        # YouTube channel
+        match = cls.YOUTUBE_CHANNEL_PATTERN.search(source)
+        if match:
+            channel_name = match.group(1)
+            return SourceInfo(
+                type='video',
+                parsed={
+                    'video_source': 'youtube_channel',
+                    'channel': channel_name,
+                    'url': source if source.startswith('http') else f"https://www.youtube.com/@{channel_name.lstrip('@')}",
+                },
+                suggested_name=channel_name.lstrip('@'),
+                raw_input=source,
+            )
+
+        # Vimeo
+        match = cls.VIMEO_PATTERN.search(source)
+        if match:
+            video_id = match.group(1)
+            return SourceInfo(
+                type='video',
+                parsed={
+                    'video_source': 'vimeo',
+                    'video_id': video_id,
+                    'url': f'https://vimeo.com/{video_id}',
+                },
+                suggested_name=f'vimeo-{video_id}',
+                raw_input=source,
+            )
+
+        return None
+
+    @classmethod
+    def _detect_video_file(cls, source: str) -> SourceInfo:
+        """Detect local video file."""
+        name = os.path.splitext(os.path.basename(source))[0]
+        return SourceInfo(
+            type='video',
+            parsed={
+                'video_source': 'local_file',
+                'file_path': os.path.abspath(source),
+            },
+            suggested_name=name,
+            raw_input=source,
+        )
+
+    @classmethod
+    def _detect_video_directory(cls, source: str) -> SourceInfo:
+        """Detect directory containing video files."""
+        directory = os.path.abspath(source)
+        name = os.path.basename(directory)
+        return SourceInfo(
+            type='video',
+            parsed={
+                'video_source': 'local_directory',
+                'directory': directory,
+            },
+            suggested_name=name,
+            raw_input=source,
+        )
+
+    @classmethod
+    def _is_video_directory(cls, path: str) -> bool:
+        """Check if a directory contains mostly video files.
+
+        Returns True if >50% of files are video files.
+        Used to distinguish video directories from code directories.
+        """
+        total = 0
+        video = 0
+        for f in os.listdir(path):
+            if os.path.isfile(os.path.join(path, f)):
+                total += 1
+                ext = os.path.splitext(f)[1].lower()
+                if ext in cls.VIDEO_EXTENSIONS:
+                    video += 1
+        return total > 0 and (video / total) > 0.5
+
+    @classmethod
+    def validate_source(cls, source_info: SourceInfo) -> None:
+        """Validate detected source info, including video sources."""
+        # ... existing validation ...
+
+        if source_info.type == 'video':
+            video_source = source_info.parsed.get('video_source')
+            if video_source == 'local_file':
+                file_path = source_info.parsed['file_path']
+                if not os.path.isfile(file_path):
+                    raise ValueError(f"Video file does not exist: {file_path}")
+            elif video_source == 'local_directory':
+                directory = source_info.parsed['directory']
+                if not os.path.isdir(directory):
+                    raise ValueError(f"Video directory does not exist: {directory}")
+            # For online sources, validation happens during scraping
+```
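The majority heuristic in `_is_video_directory` can be tried out on its own. The sketch below is a standalone function, not the method itself; it assumes a `VIDEO_EXTENSIONS` set (the detector's real class attribute is defined elsewhere and may list different extensions). Like the method, it counts only regular files at the top level and requires a strict majority.

```python
import os

# Assumed extension set for this sketch; the detector's real VIDEO_EXTENSIONS may differ.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".webm", ".mov", ".avi"}


def is_video_directory(path: str, threshold: float = 0.5) -> bool:
    """Return True if strictly more than `threshold` of the top-level files are videos."""
    # Subdirectories are ignored, matching the os.path.isfile check in the method.
    files = [f for f in os.listdir(path) if os.path.isfile(os.path.join(path, f))]
    if not files:
        return False  # empty (or subdirectory-only) directories never count
    videos = sum(1 for f in files if os.path.splitext(f)[1].lower() in VIDEO_EXTENSIONS)
    return videos / len(files) > threshold
```

Because the comparison is strict, a directory with exactly half video files (for example one `.mp4` and one `.py`) is still treated as a code directory.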
+
+---
+
+## Unified Config Integration
+
+### Updated `scraped_data` dict in `unified_scraper.py`
+
+```python
+# In UnifiedScraper.__init__():
+self.scraped_data = {
+    "documentation": [],
+    "github": [],
+    "pdf": [],
+    "word": [],
+    "local": [],
+    "video": [],      # ← NEW
+}
+```
+
+### Video Source Processing in Unified Scraper
+
+```python
+def _scrape_video_source(self, source: dict, source_index: int) -> dict:
+    """Process a video source from unified config.
+
+    Args:
+        source: Video source config dict from unified JSON
+        source_index: Index for unique naming
+
+    Returns:
+        Dict with scraping results and metadata
+    """
+    from skill_seekers.cli.video_scraper import VideoScraper
+    from skill_seekers.cli.video_models import VideoSourceConfig
+
+    config = VideoSourceConfig.from_dict(source)
+    scraper = VideoScraper(config=config, output_dir=self.output_dir)
+
+    result = scraper.scrape()
+
+    return {
+        'source_type': 'video',
+        'source_name': source.get('name', f'video_{source_index}'),
+        'weight': source.get('weight', 0.2),
+        'result': result,
+        'video_count': len(result.videos),
+        'segment_count': result.total_segments,
+        'categories': result.categories,
+    }
+```
+
+### Example Unified Config with Video
+
+```json
+{
+    "name": "react-complete",
+    "description": "React 19 - Documentation + Code + Video Tutorials",
+    "output_dir": "output/react-complete/",
+
+    "sources": [
+        {
+            "type": "documentation",
+            "url": "https://react.dev/",
+            "name": "official_docs",
+            "weight": 0.4,
+            "selectors": {
+                "main_content": "article",
+                "code_blocks": "pre code"
+            },
+            "categories": {
+                "getting_started": ["learn", "quick-start"],
+                "hooks": ["hooks", "use-state", "use-effect"],
+                "api": ["reference", "api"]
+            }
+        },
+        {
+            "type": "github",
+            "repo": "facebook/react",
+            "name": "source_code",
+            "weight": 0.3,
+            "analysis_depth": "deep"
+        },
+        {
+            "type": "video",
+            "playlist": "https://www.youtube.com/playlist?list=PLreactplaylist",
+            "name": "official_tutorials",
+            "weight": 0.2,
+            "max_videos": 15,
+            "visual_extraction": true,
+            "languages": ["en"],
+            "categories": {
+                "getting_started": ["intro", "quickstart", "setup"],
+                "hooks": ["useState", "useEffect", "hooks"],
+                "advanced": ["suspense", "concurrent", "server"]
+            }
+        },
+        {
+            "type": "video",
+            "url": "https://www.youtube.com/watch?v=abc123def45",
+            "name": "react_conf_keynote",
+            "weight": 0.1,
+            "visual_extraction": false
+        }
+    ],
+
+    "merge_strategy": "unified",
+    "conflict_resolution": "docs_first",
+
+    "enhancement": {
+        "enabled": true,
+        "level": 2
+    }
+}
+```
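`config_validator.py` is listed in the file map as validating video source entries. One plausible rule, mirroring the CLI's mutually exclusive source flags, is that each video entry names exactly one of `url`, `playlist`, `channel`, `path`, or `directory`. A minimal sketch under that assumption; the function name and the weight and `max_videos` range checks are illustrative, not taken from the existing validator.

```python
VIDEO_SOURCE_FIELDS = ("url", "playlist", "channel", "path", "directory")


def validate_video_source(source: dict) -> list[str]:
    """Return a list of problems found in one video source entry (empty if valid)."""
    errors: list[str] = []
    name = source.get("name", "<unnamed>")

    # Exactly one source field, like the CLI's mutually exclusive group.
    present = [field for field in VIDEO_SOURCE_FIELDS if source.get(field)]
    if len(present) != 1:
        errors.append(
            f"video source '{name}': expected exactly one of "
            f"{', '.join(VIDEO_SOURCE_FIELDS)}, got {present or 'none'}"
        )

    # Assumed constraint: weights are fractions between 0 and 1.
    weight = source.get("weight", 0.2)
    if not isinstance(weight, (int, float)) or not 0.0 <= weight <= 1.0:
        errors.append(f"video source '{name}': weight must be between 0 and 1")

    max_videos = source.get("max_videos")
    if max_videos is not None and (not isinstance(max_videos, int) or max_videos < 1):
        errors.append(f"video source '{name}': max_videos must be a positive integer")
    return errors
```

Both video entries in the example config above pass these checks; an entry with both `url` and `path` would be rejected.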
+
+---
+
+## Create Command Integration
+
+### Changes to Create Command Routing
+
+```python
+# In src/skill_seekers/cli/create_command.py (or equivalent in main.py)
+
+def route_source(source_info: SourceInfo, args: argparse.Namespace):
+    """Route detected source to appropriate scraper."""
+
+    if source_info.type == 'web':
+        return _route_web(source_info, args)
+    elif source_info.type == 'github':
+        return _route_github(source_info, args)
+    elif source_info.type == 'local':
+        return _route_local(source_info, args)
+    elif source_info.type == 'pdf':
+        return _route_pdf(source_info, args)
+    elif source_info.type == 'word':
+        return _route_word(source_info, args)
+    elif source_info.type == 'video':          # ← NEW
+        return _route_video(source_info, args)
+    elif source_info.type == 'config':
+        return _route_config(source_info, args)
+    else:
+        raise ValueError(f"Unsupported source type: {source_info.type}")
+
+
+def _route_video(source_info: SourceInfo, args: argparse.Namespace):
+    """Route video source to video scraper."""
+    from skill_seekers.cli.video_scraper import VideoScraper
+    from skill_seekers.cli.video_models import VideoSourceConfig
+
+    parsed = source_info.parsed
+
+    # Build config from CLI args + parsed source info
+    config_dict = {
+        'name': getattr(args, 'name', None) or source_info.suggested_name,
+        'visual_extraction': getattr(args, 'visual', False),
+        'whisper_model': getattr(args, 'whisper_model', 'base'),
+        'max_videos': getattr(args, 'max_videos', 50),
+        'languages': getattr(args, 'languages', None),
+    }
+
+    # Set the appropriate source field
+    video_source = parsed['video_source']
+    if video_source in ('youtube_video', 'vimeo'):
+        config_dict['url'] = parsed['url']
+    elif video_source == 'youtube_playlist':
+        config_dict['playlist'] = parsed['url']
+    elif video_source == 'youtube_channel':
+        config_dict['channel'] = parsed['url']
+    elif video_source == 'local_file':
+        config_dict['path'] = parsed['file_path']
+    elif video_source == 'local_directory':
+        config_dict['directory'] = parsed['directory']
+
+    config = VideoSourceConfig.from_dict(config_dict)
+    output_dir = getattr(args, 'output', None) or f'output/{config_dict["name"]}/'
+
+    scraper = VideoScraper(config=config, output_dir=output_dir)
+
+    if getattr(args, 'dry_run', False):
+        scraper.dry_run()
+        return
+
+    result = scraper.scrape()
+    scraper.generate_output(result)
+```
+
+---
+
+## Parser & Arguments
+
+### New Parser: `video_parser.py`
+
+```python
+# src/skill_seekers/cli/parsers/video_parser.py
+
+from skill_seekers.cli.parsers.base import SubcommandParser
+
+
+class VideoParser(SubcommandParser):
+    """Parser for the video scraping command."""
+
+    name = 'video'
+    help = 'Extract knowledge from YouTube videos, playlists, channels, or local video files'
+    description = (
+        'Process video content into structured skill documentation.\n\n'
+        'Supports YouTube (single video, playlist, channel), Vimeo, and local video files.\n'
+        'Extracts transcripts, metadata, chapters, and optionally visual content (code, slides).'
+    )
+
+    def add_arguments(self, parser):
+        # Source (mutually exclusive group)
+        source = parser.add_mutually_exclusive_group(required=True)
+        source.add_argument('--url', help='YouTube or Vimeo video URL')
+        source.add_argument('--playlist', help='YouTube playlist URL')
+        source.add_argument('--channel', help='YouTube channel URL')
+        source.add_argument('--path', help='Local video file path')
+        source.add_argument('--directory', help='Directory containing video files')
+
+        # Add shared arguments (output, dry-run, verbose, etc.)
+        from skill_seekers.cli.arguments.common import add_all_standard_arguments
+        add_all_standard_arguments(parser)
+
+        # Add video-specific arguments
+        from skill_seekers.cli.arguments.video import add_video_arguments
+        add_video_arguments(parser)
+```
+
+### New Arguments: `video.py`
+
+```python
+# src/skill_seekers/cli/arguments/video.py
+
+VIDEO_ARGUMENTS = {
+    # === Filtering ===
+    "max_videos": {
+        "flags": ("--max-videos",),
+        "kwargs": {
+            "type": int,
+            "default": 50,
+            "help": "Maximum number of videos to process (default: 50)",
+        },
+    },
+    "min_duration": {
+        "flags": ("--min-duration",),
+        "kwargs": {
+            "type": float,
+            "default": 60.0,
+            "help": "Minimum video duration in seconds (default: 60)",
+        },
+    },
+    "max_duration": {
+        "flags": ("--max-duration",),
+        "kwargs": {
+            "type": float,
+            "default": 7200.0,
+            "help": "Maximum video duration in seconds (default: 7200 = 2 hours)",
+        },
+    },
+    "languages": {
+        "flags": ("--languages",),
+        "kwargs": {
+            "nargs": "+",
+            "default": None,
+            "help": "Preferred transcript languages (default: all). Example: --languages en es",
+        },
+    },
+    "min_views": {
+        "flags": ("--min-views",),
+        "kwargs": {
+            "type": int,
+            "default": None,
+            "help": "Minimum view count filter (online videos only)",
+        },
+    },
+
+    # === Extraction ===
+    "visual": {
+        "flags": ("--visual",),
+        "kwargs": {
+            "action": "store_true",
+            "help": "Enable visual extraction (OCR on keyframes). Requires video-full dependencies.",
+        },
+    },
+    "whisper_model": {
+        "flags": ("--whisper-model",),
+        "kwargs": {
+            "default": "base",
+            "choices": ["tiny", "base", "small", "medium", "large-v3", "large-v3-turbo"],
+            "help": "Whisper model size for speech-to-text (default: base)",
+        },
+    },
+    "whisper_device": {
+        "flags": ("--whisper-device",),
+        "kwargs": {
+            "default": "auto",
+            "choices": ["auto", "cpu", "cuda"],
+            "help": "Device for Whisper inference (default: auto)",
+        },
+    },
+    "ocr_languages": {
+        "flags": ("--ocr-languages",),
+        "kwargs": {
+            "nargs": "+",
+            "default": None,
+            "help": "OCR languages for visual extraction (default: same as --languages)",
+        },
+    },
+
+    # === Segmentation ===
+    "segment_strategy": {
+        "flags": ("--segment-strategy",),
+        "kwargs": {
+            "default": "hybrid",
+            "choices": ["chapters", "semantic", "time_window", "scene_change", "hybrid"],
+            "help": "How to segment video content (default: hybrid)",
+        },
+    },
+    "segment_duration": {
+        "flags": ("--segment-duration",),
+        "kwargs": {
+            "type": float,
+            "default": 300.0,
+            "help": "Target segment duration in seconds for time_window strategy (default: 300)",
+        },
+    },
+
+    # === Local file options ===
+    "file_patterns": {
+        "flags": ("--file-patterns",),
+        "kwargs": {
+            "nargs": "+",
+            "default": None,
+            "help": "File patterns for directory scanning (default: *.mp4 *.mkv *.webm)",
+        },
+    },
+    "recursive": {
+        "flags": ("--recursive",),
+        "kwargs": {
+            "action": "store_true",
+            "default": True,
+            "help": "Recursively scan directories (default: True)",
+        },
+    },
+    "no_recursive": {
+        "flags": ("--no-recursive",),
+        "kwargs": {
+            "dest": "recursive",  # writes the same attribute as --recursive
+            "action": "store_false",
+            "help": "Disable recursive directory scanning",
+        },
+    },
+}
+
+
+def add_video_arguments(parser):
+    """Add all video-specific arguments to a parser."""
+    for arg_name, arg_def in VIDEO_ARGUMENTS.items():
+        parser.add_argument(*arg_def["flags"], **arg_def["kwargs"])
+```
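Because the arguments live in a plain dict, a parser can be built and exercised in isolation. The sketch below copies a subset of the `VIDEO_ARGUMENTS` entries into a bare `argparse.ArgumentParser` and parses a sample command line; it does not go through the project's `SubcommandParser`, and the `prog` string is only for display.

```python
import argparse

# Subset of the VIDEO_ARGUMENTS table, in the same {"flags", "kwargs"} shape.
VIDEO_ARGUMENTS_SUBSET = {
    "max_videos": {"flags": ("--max-videos",), "kwargs": {"type": int, "default": 50}},
    "languages": {"flags": ("--languages",), "kwargs": {"nargs": "+", "default": None}},
    "visual": {"flags": ("--visual",), "kwargs": {"action": "store_true"}},
    "whisper_model": {
        "flags": ("--whisper-model",),
        "kwargs": {
            "default": "base",
            "choices": ["tiny", "base", "small", "medium", "large-v3", "large-v3-turbo"],
        },
    },
}


def build_parser(arguments: dict) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="skill-seekers video")
    for arg_def in arguments.values():
        # argparse derives dest from the long flag: --max-videos -> max_videos
        parser.add_argument(*arg_def["flags"], **arg_def["kwargs"])
    return parser


args = build_parser(VIDEO_ARGUMENTS_SUBSET).parse_args(
    ["--visual", "--languages", "en", "es", "--whisper-model", "small"]
)
print(args.visual, args.languages, args.whisper_model, args.max_videos)
# True ['en', 'es'] small 50
```

The derived `dest` names match the dictionary keys, which is what lets `_route_video` read them with `getattr(args, 'whisper_model', 'base')` and similar calls.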
+
+### Progressive Help for Create Command
+
+```python
+# In arguments/create.py - add video to help modes
+
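+# New help flag. The dict name is illustrative; in practice this entry
+# joins the create command's existing argument table.
+CREATE_VIDEO_HELP_ARGUMENT = {
+    "help_video": {
+        "flags": ("--help-video",),
+        "kwargs": {
+            "action": "store_true",
+            "help": "Show video-specific options",
+        },
+    },
+}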
+
+# VIDEO_ARGUMENTS added to create command's video help mode
+# skill-seekers create --help-video
+```
+
+---
+
+## MCP Tool Integration
+
+### New MCP Tool: `scrape_video`
+
+```python
+# In src/skill_seekers/mcp/tools/scraping_tools.py
+
+@mcp.tool()
+def scrape_video(
+    url: str | None = None,
+    playlist: str | None = None,
+    path: str | None = None,
+    output_dir: str = "output/",
+    visual: bool = False,
+    max_videos: int = 20,
+    whisper_model: str = "base",
+) -> str:
+    """Scrape and extract knowledge from video content.
+
+    Supports YouTube and Vimeo videos, YouTube playlists, and local video files or directories.
+    Extracts transcripts, metadata, chapters, and optionally visual content.
+
+    Args:
+        url: YouTube or Vimeo video URL
+        playlist: YouTube playlist URL
+        path: Local video file or directory path
+        output_dir: Output directory for results
+        visual: Enable visual extraction (OCR on keyframes)
+        max_videos: Maximum videos to process (for playlists)
+        whisper_model: Whisper model size for transcription
+
+    Returns:
+        JSON string with scraping results summary
+    """
+    ...
+```
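Unlike the CLI parser, the MCP tool signature cannot enforce that exactly one source is given, so the tool body has to check it. A minimal sketch of that check, returning the error as a JSON string to match the documented return type; the helper name and the JSON keys are assumptions.

```python
from __future__ import annotations

import json


def select_video_source(
    url: str | None = None,
    playlist: str | None = None,
    path: str | None = None,
) -> tuple[str, str] | str:
    """Return (field, value) for the single provided source, or a JSON error string."""
    provided = {
        field: value
        for field, value in (("url", url), ("playlist", playlist), ("path", path))
        if value
    }
    if len(provided) != 1:
        return json.dumps({
            "success": False,
            "error": "Provide exactly one of: url, playlist, path",
            "received": sorted(provided),
        })
    return next(iter(provided.items()))
```

A `path` value still has to be split into the `local_file` and `local_directory` cases afterwards, for example with `os.path.isdir`.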
+
+### Updated Tool Count
+
+Total MCP tools: **27** (was 26; adds `scrape_video`)
+
+---
+
+## Enhancement Integration
+
+### Video Content Enhancement
+
+Video segments can be enhanced using the same AI enhancement pipeline:
+
+```python
+# In enhance_skill_local.py or enhance_command.py
+
+def enhance_video_content(segments: list[VideoSegment], level: int) -> list[VideoSegment]:
+    """AI-enhance video segments.
+
+    Enhancement levels:
+    0 - No enhancement
+    1 - Summary generation per segment
+    2 - + Topic extraction, category refinement, code annotation
+    3 - + Cross-segment connections, tutorial flow analysis, key takeaways
+
+    Uses the same enhancement infrastructure as other sources.
+    """
+    if level == 0:
+        return segments
+
+    for segment in segments:
+        if level >= 1:
+            segment.summary = ai_summarize(segment.content)
+
+        if level >= 2:
+            segment.topic = ai_extract_topic(segment.content)
+            segment.category = ai_refine_category(
+                segment.content, segment.category
+            )
+            # Annotate code blocks with explanations
+            for cb in segment.detected_code_blocks:
+                cb.explanation = ai_explain_code(cb.code, segment.transcript)
+
+        if level >= 3:
+            # Cross-segment analysis (needs all segments)
+            pass  # Handled at video level, not segment level
+
+    return segments
+```
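The enhancement levels are cumulative: each level adds tasks on top of the previous one. The small sketch below makes that explicit as data, using task names paraphrased from the docstring above. The mapping and function are illustrative only; `enhance_video_content` expresses the same rule with `if level >= n` checks.

```python
# Tasks introduced at each level, paraphrased from enhance_video_content's docstring.
LEVEL_TASKS = {
    1: ["summary"],
    2: ["topic_extraction", "category_refinement", "code_annotation"],
    3: ["cross_segment_connections", "tutorial_flow_analysis", "key_takeaways"],
}


def tasks_for_level(level: int) -> list[str]:
    """Return every task enabled at `level` (levels are cumulative; 0 means none)."""
    max_level = max(LEVEL_TASKS)
    if not 0 <= level <= max_level:
        raise ValueError(f"enhancement level must be 0-{max_level}, got {level}")
    return [task for lvl in sorted(LEVEL_TASKS) if lvl <= level for task in LEVEL_TASKS[lvl]]
```

Keeping the level-3 tasks in the same table also gives the video-level pass (which the per-segment loop skips) a single place to look up what it should run.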
+
+---
+
+## File Map (New & Modified Files)
+
+### New Files
+
+| File | Purpose | Estimated Size |
+|------|---------|---------------|
+| `src/skill_seekers/cli/video_scraper.py` | Main video scraper orchestrator | ~800-1000 lines |
+| `src/skill_seekers/cli/video_models.py` | All data classes and enums | ~500-600 lines |
+| `src/skill_seekers/cli/video_transcript.py` | Transcript extraction (YouTube API + Whisper) | ~400-500 lines |
+| `src/skill_seekers/cli/video_visual.py` | Visual extraction (scene detection + OCR) | ~500-600 lines |
+| `src/skill_seekers/cli/video_segmenter.py` | Segmentation and stream alignment | ~400-500 lines |
+| `src/skill_seekers/cli/parsers/video_parser.py` | CLI argument parser | ~80-100 lines |
+| `src/skill_seekers/cli/arguments/video.py` | Video-specific argument definitions | ~120-150 lines |
+| `tests/test_video_scraper.py` | Video scraper tests | ~600-800 lines |
+| `tests/test_video_transcript.py` | Transcript extraction tests | ~400-500 lines |
+| `tests/test_video_visual.py` | Visual extraction tests | ~400-500 lines |
+| `tests/test_video_segmenter.py` | Segmentation tests | ~300-400 lines |
+| `tests/test_video_models.py` | Data model tests | ~200-300 lines |
+| `tests/test_video_integration.py` | Integration tests | ~300-400 lines |
+| `tests/fixtures/video/` | Test fixtures (mock transcripts, metadata) | Various |
+
+### Modified Files
+
+| File | Changes |
+|------|---------|
+| `src/skill_seekers/cli/source_detector.py` | Add video URL patterns, video file detection, video directory detection |
+| `src/skill_seekers/cli/main.py` | Register `video` subcommand in COMMAND_MODULES |
+| `src/skill_seekers/cli/unified_scraper.py` | Add `"video": []` to scraped_data, add `_scrape_video_source()` |
+| `src/skill_seekers/cli/arguments/create.py` | Add video args to create command, add `--help-video` |
+| `src/skill_seekers/cli/parsers/__init__.py` | Register VideoParser |
+| `src/skill_seekers/cli/config_validator.py` | Validate video source entries in unified config |
+| `src/skill_seekers/mcp/tools/scraping_tools.py` | Add `scrape_video` tool |
+| `pyproject.toml` | Add `[video]` and `[video-full]` optional dependencies, add `skill-seekers-video` entry point |
+| `tests/test_source_detector.py` | Add video detection tests |
+| `tests/test_unified.py` | Add video source integration tests |
--- a/docs/plans/video/05_VIDEO_OUTPUT.md
+++ b/docs/plans/video/05_VIDEO_OUTPUT.md
@@ -0,0 +1,619 @@
+# Video Source — Output Structure & SKILL.md Integration
+
+**Date:** February 27, 2026
+**Document:** 05 of 07
+**Status:** Planning
+
+---
+
+## Table of Contents
+
+1. [Output Directory Structure](#output-directory-structure)
+2. [Reference File Format](#reference-file-format)
+3. [SKILL.md Section Format](#skillmd-section-format)
+4. [Metadata JSON Format](#metadata-json-format)
+5. [Page JSON Format (Compatibility)](#page-json-format-compatibility)
+6. [RAG Chunking for Video](#rag-chunking-for-video)
+7. [Examples](#examples)
+
+---
+
+## Output Directory Structure
+
+```
+output/{skill_name}/
+├── SKILL.md                              # Main skill file (video section added)
+├── references/
+│   ├── getting_started.md                # From docs (existing)
+│   ├── api.md                            # From docs (existing)
+│   ├── video_react-hooks-tutorial.md     # ← Video reference file
+│   ├── video_project-setup-guide.md      # ← Video reference file
+│   └── video_advanced-patterns.md        # ← Video reference file
+├── video_data/                           # ← NEW: Video-specific data
+│   ├── metadata.json                     # VideoScraperResult (full metadata)
+│   ├── transcripts/
+│   │   ├── abc123def45.json              # Raw transcript per video
+│   │   ├── xyz789ghi01.json
+│   │   └── ...
+│   ├── segments/
+│   │   ├── abc123def45_segments.json     # Aligned segments per video
+│   │   ├── xyz789ghi01_segments.json
+│   │   └── ...
+│   └── frames/                           # Only if --visual enabled
+│       ├── abc123def45/
+│       │   ├── frame_045.00_terminal.png
+│       │   ├── frame_052.30_code.png
+│       │   ├── frame_128.00_slide.png
+│       │   └── ...
+│       └── xyz789ghi01/
+│           └── ...
+├── pages/                                # Existing page format
+│   ├── page_001.json                     # From docs (existing)
+│   ├── video_abc123def45.json            # ← Video in page format
+│   └── ...
+└── {skill_name}_data/                    # Raw scrape data (existing)
+```
+
+---
+
+## Reference File Format
+
+Each video produces one reference markdown file in `references/`. The filename is derived from the video title, sanitized and prefixed with `video_`.
+
+### Naming Convention
+
+```
+video_{sanitized_title}.md
+```
+
+Sanitization rules:
+- Lowercase
+- Replace spaces and special chars with hyphens
+- Collapse runs of consecutive hyphens into a single hyphen
+- Truncate to 60 characters
+- Example: "React Hooks Tutorial for Beginners" → `video_react-hooks-tutorial-for-beginners.md`
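The sanitization rules above can be sketched with two regular expressions. This is a minimal version under stated assumptions: "special chars" is read as anything other than ASCII letters and digits (so non-ASCII titles lose those characters), leading and trailing hyphens are trimmed, truncation applies to the sanitized title only, and an empty result falls back to a hypothetical `untitled` name.

```python
import re


def reference_filename(title: str, max_length: int = 60) -> str:
    """Build a `video_{sanitized_title}.md` reference filename from a video title."""
    slug = title.lower()
    # One substitution both replaces special characters and collapses runs:
    # any run of non [a-z0-9] characters (including existing hyphens) becomes one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", slug).strip("-")
    # Truncate, then drop a hyphen left dangling at the cut point.
    slug = slug[:max_length].rstrip("-")
    return f"video_{slug or 'untitled'}.md"  # 'untitled' fallback is an assumption
```

For example, `reference_filename("C++ & Rust: A--Comparison!")` yields `video_c-rust-a-comparison.md`.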
+
+### File Structure
+
+````markdown
+# {Video Title}
+
+> **Source:** [{channel_name}]({channel_url}) | **Duration:** {HH:MM:SS} | **Published:** {date}
+> **URL:** [{url}]({url})
+> **Views:** {view_count} | **Likes:** {like_count}
+> **Tags:** {tag1}, {tag2}, {tag3}
+
+{description_summary (first 200 chars)}
+
+---
+
+## Table of Contents
+
+{auto-generated from chapter titles / segment headings}
+
+---
+
+{segments rendered as sections}
+
+### {Chapter Title or "Segment N"} ({MM:SS} - {MM:SS})
+
+{merged content: transcript + code blocks + slide text}
+
+```{language}
+{code shown on screen}
+```
+
+---
+
+### {Next Chapter} ({MM:SS} - {MM:SS})
+
+{content continues...}
+
+---
+
+## Key Takeaways
+
+{AI-generated summary of main points — populated during enhancement}
+
+## Code Examples
+
+{Consolidated list of all code blocks from the video}
+````
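The template uses `{HH:MM:SS}` for the total duration and `{MM:SS}` in segment headings, while the full example shows a 30-minute video as `30:32`. One reading, assumed here, is that hours appear only when nonzero. The sketch also assumes the `"Segment N"` fallback is 1-based, even though segment `index` values in the JSON start at 0; both function names are hypothetical.

```python
from __future__ import annotations


def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS, or HH:MM:SS once the value reaches one hour."""
    total = int(seconds)  # fractional seconds are truncated
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours:02d}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def segment_heading(title: str | None, index: int, start: float, end: float) -> str:
    """Render a segment heading, falling back to "Segment N" without a chapter title."""
    label = title or f"Segment {index + 1}"  # assumption: N is 1-based
    return f"### {label} ({format_timestamp(start)} - {format_timestamp(end)})"
```

With the values from the example segment JSON, `segment_heading("Intro", 0, 0.0, 45.0)` produces `### Intro (00:00 - 00:45)`.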
+
+### Full Example (Abridged)
+
+````markdown
+# React Hooks Tutorial for Beginners
+
+> **Source:** [React Official](https://youtube.com/@reactofficial) | **Duration:** 30:32 | **Published:** 2026-01-15
+> **URL:** [https://youtube.com/watch?v=abc123def45](https://youtube.com/watch?v=abc123def45)
+> **Views:** 1,500,000 | **Likes:** 45,000
+> **Tags:** react, hooks, tutorial, javascript, web development
+
+Learn React Hooks from scratch in this comprehensive tutorial. We'll cover useState, useEffect, useContext, and custom hooks with practical examples.
+
+---
+
+## Table of Contents
+
+- [Intro](#intro-0000---0045)
+- [Project Setup](#project-setup-0045---0300)
+- [useState Hook](#usestate-hook-0300---0900)
+- [useEffect Hook](#useeffect-hook-0900---1500)
+- [Custom Hooks](#custom-hooks-1500---2200)
+- [Best Practices](#best-practices-2200---2800)
+- [Wrap Up](#wrap-up-2800---3032)
+
+---
+
+### Intro (00:00 - 00:45)
+
+Welcome to this React Hooks tutorial. Today we'll learn about the most important hooks in React and how to use them effectively in your applications. By the end of this video, you'll understand useState, useEffect, useContext, and how to create your own custom hooks.
+
+---
+
+### Project Setup (00:45 - 03:00)
+
+Let's start by setting up our React project. We'll use Create React App which gives us a great starting point with all the tooling configured.
+
+**Terminal command:**
+```bash
+npx create-react-app hooks-demo
+cd hooks-demo
+npm start
+```
+
+Open the project in your code editor. You'll see the standard React project structure with src/App.js as our main component file. Let's clear out the boilerplate and start fresh.
+
+**Code shown in editor:**
+```jsx
+import React from 'react';
+
+function App() {
+  return (
+    <div className="App">
+      <h1>Hooks Demo</h1>
+    </div>
+  );
+}
+
+export default App;
+```
+
+---
+
+### useState Hook (03:00 - 09:00)
+
+The useState hook is the most fundamental hook in React. It lets you add state to functional components. Before hooks, you needed class components for state management.
+
+Let's create a simple counter to demonstrate useState. The hook returns an array with two elements: the current state value and a function to update it. We use array destructuring to name them.
+
+**Code shown in editor:**
+```jsx
+import React, { useState } from 'react';
+
+function Counter() {
+  const [count, setCount] = useState(0);
+
+  return (
+    <div>
+      <p>Count: {count}</p>
+      <button onClick={() => setCount(count + 1)}>
+        Increment
+      </button>
+      <button onClick={() => setCount(count - 1)}>
+        Decrement
+      </button>
+    </div>
+  );
+}
+```
+
+Important things to remember about useState: the initial value is only used on the first render. If you need to compute the initial value, pass a function instead of a value to avoid recomputing on every render.
+
+---
+
+## Key Takeaways
+
+1. **useState** is for managing simple state values in functional components
+2. **useEffect** handles side effects (data fetching, subscriptions, DOM updates)
+3. Always include a dependency array in useEffect to control when it runs
+4. Custom hooks let you extract reusable stateful logic
+5. Follow the Rules of Hooks: only call hooks at the top level, only in React functions
+
+## Code Examples
+
+### Counter with useState
+```jsx
+const [count, setCount] = useState(0);
+```
+
+### Data Fetching with useEffect
+```jsx
+useEffect(() => {
+  fetch('/api/data')
+    .then(res => res.json())
+    .then(setData);
+}, []);
+```
+
+### Custom Hook: useLocalStorage
+```jsx
+function useLocalStorage(key, initialValue) {
+  const [value, setValue] = useState(() => {
+    const saved = localStorage.getItem(key);
+    return saved ? JSON.parse(saved) : initialValue;
+  });
+
+  useEffect(() => {
+    localStorage.setItem(key, JSON.stringify(value));
+  }, [key, value]);
+
+  return [value, setValue];
+}
+```
+````
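The table-of-contents links in the example follow GitHub's heading-anchor convention: lowercase the heading, drop punctuation other than hyphens and underscores, and turn each space into a hyphen. The sketch below approximates that rule; it does not reproduce every GitHub detail (for example, the `-1` suffixes added to duplicate headings), and the function names are hypothetical.

```python
import re


def github_anchor(heading: str) -> str:
    """Approximate GitHub's heading anchor: lowercase, drop punctuation, spaces -> hyphens."""
    text = heading.strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)  # keep word characters, hyphens, and spaces
    return text.replace(" ", "-")  # each space becomes a hyphen, so " - " -> "---"


def toc_entry(heading: str) -> str:
    """Render one table-of-contents line for a segment heading (without the ###)."""
    label = heading.split(" (", 1)[0]  # link text is the chapter title only
    return f"- [{label}](#{github_anchor(heading)})"
```

Applied to `Project Setup (00:45 - 03:00)`, this produces the same `#project-setup-0045---0300` link used in the example.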
+
+---
+
+## SKILL.md Section Format
+
+Video content is integrated into SKILL.md as a dedicated section, following the existing section patterns.
+
+### Section Placement
+
+```markdown
+# {Skill Name}
+
+## Overview
+{existing overview section}
+
+## Quick Reference
+{existing quick reference}
+
+## Getting Started
+{from docs/github}
+
+## Core Concepts
+{from docs/github}
+
+## API Reference
+{from docs/github}
+
+## Video Tutorials                    ← NEW SECTION
+{from video sources}
+
+## Code Examples
+{consolidated from all sources}
+
+## References
+{file listing}
+```
+
+### Section Content
+
+````markdown
+## Video Tutorials
+
+This skill includes knowledge extracted from {N} video tutorial(s) totaling {HH:MM:SS} of content.
+
+### {Video Title 1}
+**Source:** [{channel}]({url}) | {duration} | {view_count} views
+
+{summary or first segment content, abbreviated}
+
+**Topics covered:** {chapter titles or detected topics}
+
+→ Full transcript: [references/video_{sanitized_title}.md](references/video_{sanitized_title}.md)
+
+---
+
+### {Video Title 2}
+...
+
+### Key Patterns from Videos
+
+{AI-generated section highlighting patterns that appear across multiple videos}
+
+### Code Examples from Videos
+
+{Consolidated code blocks from all videos, organized by topic}
+
+```{language}
+// From: {video_title} at {timestamp}
+{code}
+```
+````
+
+### Playlist Grouping
+
+When a video source is a playlist, the SKILL.md section groups videos under the playlist title:
+
+```markdown
+## Video Tutorials
+
+### React Complete Course (12 videos, 6:30:00 total)
+
+1. **Introduction to React** (15:00) — Components, JSX, virtual DOM
+2. **React Hooks Deep Dive** (30:32) — useState, useEffect, custom hooks
+3. **State Management** (28:15) — Context API, Redux patterns
+...
+
+→ Full transcripts in [references/](references/) (video_*.md files)
+```
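The playlist header totals are the sum of the per-video durations. A small sketch that reproduces the unpadded-hours `6:30:00` style shown above; the function name, the singular/plural handling, and truncating fractional seconds are assumptions.

```python
def playlist_heading(title: str, durations: list[float]) -> str:
    """Render a playlist heading such as "### Course (12 videos, 6:30:00 total)"."""
    total = int(sum(durations))  # fractional seconds are truncated
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    noun = "video" if len(durations) == 1 else "videos"
    return f"### {title} ({len(durations)} {noun}, {hours}:{minutes:02d}:{secs:02d} total)"
```

For example, twelve videos of 1,950 seconds each (23,400 seconds in all) give the `6:30:00` total used in the example heading.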
+
+---
+
+## Metadata JSON Format
+
+### `video_data/metadata.json` — Full scraper result
+
+```json
+{
+    "scraper_version": "3.2.0",
+    "extracted_at": "2026-02-27T14:30:00Z",
+    "processing_time_seconds": 125.4,
+    "config": {
+        "visual_extraction": true,
+        "whisper_model": "base",
+        "segmentation_strategy": "hybrid",
+        "max_videos": 20
+    },
+    "summary": {
+        "total_videos": 5,
+        "total_duration_seconds": 5420.0,
+        "total_segments": 42,
+        "total_code_blocks": 18,
+        "total_keyframes": 156,
+        "languages": ["en"],
+        "categories_found": ["getting_started", "hooks", "advanced"]
+    },
+    "videos": [
+        {
+            "video_id": "abc123def45",
+            "title": "React Hooks Tutorial for Beginners",
+            "duration": 1832.0,
+            "segments_count": 7,
+            "code_blocks_count": 5,
+            "transcript_source": "youtube_manual",
+            "transcript_confidence": 0.95,
+            "content_richness_score": 0.88,
+            "reference_file": "references/video_react-hooks-tutorial-for-beginners.md"
+        }
+    ],
+    "warnings": [
+        "Video xyz789ghi01: Auto-generated captions used (manual not available)"
+    ],
+    "errors": []
+}
+```
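Several `summary` fields can be derived directly from the per-video entries. The per-video objects shown above do not carry keyframe counts, languages, or categories, so this sketch covers only the four totals that can be computed from them; the function name is an assumption.

```python
def summarize_videos(videos: list[dict]) -> dict:
    """Aggregate per-video entries into the totals used by the "summary" block."""
    return {
        "total_videos": len(videos),
        "total_duration_seconds": sum(v.get("duration", 0.0) for v in videos),
        "total_segments": sum(v.get("segments_count", 0) for v in videos),
        "total_code_blocks": sum(v.get("code_blocks_count", 0) for v in videos),
    }
```

Deriving these totals from `videos` at write time, rather than tracking them separately, keeps the summary consistent with the list it describes.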
+
+### `video_data/transcripts/{video_id}.json` — Raw transcript
+
+```json
+{
+    "video_id": "abc123def45",
+    "transcript_source": "youtube_manual",
+    "language": "en",
+    "segments": [
+        {
+            "text": "Welcome to this React Hooks tutorial.",
+            "start": 0.0,
+            "end": 2.5,
+            "confidence": 1.0,
+            "words": null
+        },
+        {
+            "text": "Today we'll learn about the most important hooks.",
+            "start": 2.5,
+            "end": 5.8,
+            "confidence": 1.0,
+            "words": null
+        }
+    ]
+}
+```
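During segmentation, each aligned segment's `transcript` field is assembled from these raw pieces. A minimal sketch, assuming a raw piece belongs to a segment when its start time falls in the half-open window `[start, end)`. How the real segmenter handles pieces that straddle a boundary is not specified here, and the function name is hypothetical.

```python
def text_in_window(transcript: dict, start: float, end: float) -> str:
    """Join the text of raw transcript pieces whose start time falls in [start, end)."""
    return " ".join(
        piece["text"].strip()
        for piece in transcript["segments"]
        if start <= piece["start"] < end
    )
```

Using a half-open window means a piece starting exactly at a chapter boundary (for example at 45.0 seconds) lands in the later chapter only, so no text is duplicated across segments.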
+
+### `video_data/segments/{video_id}_segments.json` — Aligned segments
+
+```json
+{
+    "video_id": "abc123def45",
+    "segmentation_strategy": "chapters",
+    "segments": [
+        {
+            "index": 0,
+            "start_time": 0.0,
+            "end_time": 45.0,
+            "duration": 45.0,
+            "chapter_title": "Intro",
+            "category": "getting_started",
+            "content_type": "explanation",
+            "transcript": "Welcome to this React Hooks tutorial...",
+            "transcript_confidence": 0.95,
+            "has_code_on_screen": false,
+            "has_slides": false,
+            "keyframes_count": 2,
+            "code_blocks_count": 0,
+            "confidence": 0.95
+        }
+    ]
+}
+```
+
+---
+
+## Page JSON Format (Compatibility)
+
+For compatibility with the existing page-based pipeline (`pages/*.json`), each video also produces a page JSON file. This ensures video content flows through the same build pipeline as other sources.
+
+### `pages/video_{video_id}.json`
+
+```json
+{
+    "url": "https://www.youtube.com/watch?v=abc123def45",
+    "title": "React Hooks Tutorial for Beginners",
+    "content": "{full merged content from all segments}",
+    "category": "tutorials",
+    "source_type": "video",
+    "metadata": {
+        "video_id": "abc123def45",
+        "duration": 1832.0,
+        "channel": "React Official",
+        "view_count": 1500000,
+        "chapters": 7,
+        "transcript_source": "youtube_manual",
+        "has_visual_extraction": true
+    },
+    "code_blocks": [
+        {
+            "language": "jsx",
+            "code": "const [count, setCount] = useState(0);",
+            "source": "video_ocr",
+            "timestamp": 195.0
+        }
+    ],
+    "extracted_at": "2026-02-27T14:30:00Z"
+}
+```
+
+This format is compatible with the existing `build_skill()` function in `doc_scraper.py`, which reads `pages/*.json` files to build the skill.
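+
+A minimal sketch of how a page file could be assembled from a yt-dlp info dict and the aligned segments. `build_page_json` and its parameters are hypothetical. The field names mirror the example above. Two details are assumptions: `channel` is read from yt-dlp's `uploader` field (as in the test fixture), and `content` is merged with one `## {chapter}` heading per segment.
+
+```python
+from datetime import datetime, timezone
+
+
+def build_page_json(
+    info: dict,
+    segments: list[dict],
+    code_blocks: list[dict],
+    transcript_source: str,
+    has_visual_extraction: bool,
+) -> dict:
+    """Hypothetical helper: build the dict written to pages/video_{video_id}.json."""
+    # Merge segment transcripts into one markdown body, one heading per segment.
+    parts = []
+    for seg in segments:
+        heading = seg.get("chapter_title") or f"Segment {seg['index']}"
+        parts.append(f"## {heading}\n\n{seg['transcript']}")
+    return {
+        "url": info["webpage_url"],
+        "title": info["title"],
+        "content": "\n\n".join(parts),
+        "category": "tutorials",
+        "source_type": "video",
+        "metadata": {
+            "video_id": info["id"],
+            "duration": float(info["duration"]),
+            "channel": info.get("uploader"),  # assumption: uploader doubles as channel
+            "view_count": info.get("view_count"),
+            "chapters": len(info.get("chapters") or []),
+            "transcript_source": transcript_source,
+            "has_visual_extraction": has_visual_extraction,
+        },
+        "code_blocks": code_blocks,
+        # UTC timestamp in the same "2026-02-27T14:30:00Z" shape as the example.
+        "extracted_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
+    }
+```
+
+Writing the result with `json.dump(page, f, indent=4)` would produce a file in the layout shown above.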
+
+---
+
+## RAG Chunking for Video
+
+When `--chunk-for-rag` is enabled, video segments are chunked differently from text documents because they already have natural boundaries (chapters/segments).
+
+### Chunking Strategy
+
+```
+For each VideoSegment:
+    IF segment.duration <= chunk_duration_threshold (default: 300s / 5 min):
+        → Output as single chunk
+
+    ELIF segment has sub-sections (code blocks interleaved with explanation):
+        → Split at code block boundaries
+        → Each chunk = explanation + associated code block
+
+    ELSE (long segment without clear sub-sections):
+        → Split at sentence boundaries
+        → Target chunk size: config.chunk_size tokens
+        → Overlap: config.chunk_overlap tokens
+```
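+
+The three branches above can be sketched in Python as follows. This illustrates the strategy under stated assumptions and is not the shipped chunker. `chunk_segment` is hypothetical. Sub-sections are passed in as `(explanation, code)` pairs under an assumed `parts` key. Token counts are approximated by whitespace-separated words rather than a real tokenizer.
+
+```python
+import re
+
+CHUNK_DURATION_THRESHOLD = 300.0  # seconds (5 min), the default named above
+
+
+def chunk_segment(segment: dict, chunk_size: int = 200, chunk_overlap: int = 20) -> list[str]:
+    """Hypothetical chunker for one VideoSegment, following the three branches above."""
+    text = segment["transcript"]
+    # Branch 1: short segments stay whole.
+    if segment["duration"] <= CHUNK_DURATION_THRESHOLD:
+        return [text]
+    # Branch 2: split at code-block boundaries; each chunk = explanation + its code.
+    parts = segment.get("parts") or []
+    if parts:
+        return [f"{explanation}\n\n{code}" for explanation, code in parts]
+    # Branch 3: pack whole sentences up to ~chunk_size words, carrying the
+    # last chunk_overlap words into the next chunk.
+    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
+    chunks: list[str] = []
+    current: list[str] = []
+    for sentence in sentences:
+        words = sentence.split()
+        if current and len(current) + len(words) > chunk_size:
+            chunks.append(" ".join(current))
+            current = current[-chunk_overlap:] if chunk_overlap > 0 else []
+        current.extend(words)
+    if current:
+        chunks.append(" ".join(current))
+    return chunks
+```
+
+Word-based overlap keeps the sketch dependency-free. A real implementation would more likely count tokens with the tokenizer of the target embedding model.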
+
+### RAG Metadata per Chunk
+
+```json
+{
+    "text": "chunk content...",
+    "metadata": {
+        "source": "video",
+        "source_type": "youtube",
+        "video_id": "abc123def45",
+        "video_title": "React Hooks Tutorial",
+        "channel": "React Official",
+        "timestamp_start": 180.0,
+        "timestamp_end": 300.0,
+        "timestamp_url": "https://youtube.com/watch?v=abc123def45&t=180",
+        "chapter": "useState Hook",
+        "category": "hooks",
+        "content_type": "live_coding",
+        "has_code": true,
+        "language": "en",
+        "confidence": 0.94,
+        "view_count": 1500000,
+        "upload_date": "2026-01-15"
+    }
+}
+```
+
+The `timestamp_url` field is especially valuable — it lets RAG systems link directly to the relevant moment in the video.
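+
+The following sketch shows one way `timestamp_url` could be derived for YouTube sources. The helper name is hypothetical. It floors the start time to whole seconds for the `t` query parameter and reproduces the URL format from the example above.
+
+```python
+from urllib.parse import urlencode
+
+
+def youtube_timestamp_url(video_id: str, start_seconds: float) -> str:
+    """Hypothetical helper: deep link to a moment in a YouTube video."""
+    # Floor to whole seconds so the link never starts after the chunk begins.
+    query = urlencode({"v": video_id, "t": int(start_seconds)})
+    return f"https://youtube.com/watch?{query}"
+
+
+print(youtube_timestamp_url("abc123def45", 180.0))
+# https://youtube.com/watch?v=abc123def45&t=180
+```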
+
+---
+
+## Examples
+
+### Minimal Output (transcript only, single video)
+
+```
+output/react-hooks-video/
+├── SKILL.md                          # Skill with video section
+├── references/
+│   └── video_react-hooks-tutorial.md  # Full transcript organized by chapters
+├── video_data/
+│   ├── metadata.json                 # Scraper metadata
+│   ├── transcripts/
+│   │   └── abc123def45.json          # Raw transcript
+│   └── segments/
+│       └── abc123def45_segments.json  # Aligned segments
+└── pages/
+    └── video_abc123def45.json         # Page-compatible format
+```
+
+### Full Output (visual extraction, playlist of 5 videos)
+
+```
+output/react-complete/
+├── SKILL.md
+├── references/
+│   ├── video_intro-to-react.md
+│   ├── video_react-hooks-deep-dive.md
+│   ├── video_state-management.md
+│   ├── video_react-router.md
+│   └── video_testing-react-apps.md
+├── video_data/
+│   ├── metadata.json
+│   ├── transcripts/
+│   │   ├── abc123def45.json
+│   │   ├── def456ghi78.json
+│   │   ├── ghi789jkl01.json
+│   │   ├── jkl012mno34.json
+│   │   └── mno345pqr67.json
+│   ├── segments/
+│   │   ├── abc123def45_segments.json
+│   │   ├── def456ghi78_segments.json
+│   │   ├── ghi789jkl01_segments.json
+│   │   ├── jkl012mno34_segments.json
+│   │   └── mno345pqr67_segments.json
+│   └── frames/
+│       ├── abc123def45/
+│       │   ├── frame_045.00_terminal.png
+│       │   ├── frame_052.30_code.png
+│       │   ├── frame_128.00_slide.png
+│       │   └── ... (50+ frames)
+│       ├── def456ghi78/
+│       │   └── ...
+│       └── ...
+└── pages/
+    ├── video_abc123def45.json
+    ├── video_def456ghi78.json
+    ├── video_ghi789jkl01.json
+    ├── video_jkl012mno34.json
+    └── video_mno345pqr67.json
+```
+
+### Mixed Source Output (docs + github + video)
+
+```
+output/react-unified/
+├── SKILL.md                              # Unified skill with ALL sources
+├── references/
+│   ├── getting_started.md                # From docs
+│   ├── hooks.md                          # From docs
+│   ├── api_reference.md                  # From docs
+│   ├── architecture.md                   # From GitHub analysis
+│   ├── patterns.md                       # From GitHub analysis
+│   ├── video_react-hooks-tutorial.md     # From video
+│   ├── video_react-conf-keynote.md       # From video
+│   └── video_advanced-patterns.md        # From video
+├── video_data/
+│   └── ... (video-specific data)
+├── pages/
+│   ├── page_001.json                     # From docs
+│   ├── page_002.json
+│   ├── video_abc123def45.json            # From video
+│   └── video_def456ghi78.json
+└── react_data/
+    └── pages/                            # Raw scrape data
+```
--- a/docs/plans/video/06_VIDEO_TESTING.md
+++ b/docs/plans/video/06_VIDEO_TESTING.md
@@ -0,0 +1,748 @@
+# Video Source — Testing Strategy
+
+**Date:** February 27, 2026
+**Document:** 06 of 07
+**Status:** Planning
+
+---
+
+## Table of Contents
+
+1. [Testing Principles](#testing-principles)
+2. [Test File Structure](#test-file-structure)
+3. [Fixtures & Mock Data](#fixtures--mock-data)
+4. [Unit Tests](#unit-tests)
+5. [Integration Tests](#integration-tests)
+6. [CI Considerations](#ci-considerations)
+7. [Performance Tests](#performance-tests)
+
+---
+
+## Testing Principles
+
+1. **No network calls in unit tests** — All YouTube API, yt-dlp, and download operations must be mocked.
+2. **No GPU required in CI** — All Whisper and EasyOCR tests must run on CPU or be marked `@pytest.mark.slow`.
+3. **No video files in repo** — Test fixtures use JSON transcripts and small synthetic images, not actual video files.
+4. **100% pipeline coverage** — Every phase of the 6-phase pipeline must be tested.
+5. **Edge case focus** — Test missing chapters, empty transcripts, corrupt frames, rate limits.
+6. **Compatible with existing test infra** — Use existing conftest.py, markers, and patterns.
+
+---
+
+## Test File Structure
+
+```
+tests/
+├── test_video_models.py          # Data model tests (serialization, validation)
+├── test_video_scraper.py         # Main scraper orchestration tests
+├── test_video_transcript.py      # Transcript extraction tests
+├── test_video_visual.py          # Visual extraction tests
+├── test_video_segmenter.py       # Segmentation and alignment tests
+├── test_video_integration.py     # Integration with unified scraper, create command
+├── test_video_output.py          # Output generation tests
+├── test_video_source_detector.py # Source detection tests (or add to existing)
+├── fixtures/
+│   └── video/
+│       ├── sample_metadata.json       # yt-dlp info_dict mock
+│       ├── sample_transcript.json     # YouTube transcript mock
+│       ├── sample_whisper_output.json # Whisper transcription mock
+│       ├── sample_chapters.json       # Chapter data mock
+│       ├── sample_playlist.json       # Playlist metadata mock
+│       ├── sample_segments.json       # Pre-aligned segments
+│       ├── sample_frame_code.png      # 100x100 synthetic dark frame
+│       ├── sample_frame_slide.png     # 100x100 synthetic light frame
+│       ├── sample_frame_diagram.png   # 100x100 synthetic edge-heavy frame
+│       ├── sample_srt.srt             # SRT subtitle file
+│       ├── sample_vtt.vtt             # WebVTT subtitle file
+│       └── sample_config.json         # Video source config
+```
+
+---
+
+## Fixtures & Mock Data
+
+### yt-dlp Metadata Fixture
+
+```python
+# Python equivalent of tests/fixtures/video/sample_metadata.json
+SAMPLE_YTDLP_METADATA = {
+    "id": "abc123def45",
+    "title": "React Hooks Tutorial for Beginners",
+    "description": "Learn React Hooks from scratch. Covers useState, useEffect, and custom hooks.",
+    "duration": 1832,
+    "upload_date": "20260115",
+    "uploader": "React Official",
+    "uploader_url": "https://www.youtube.com/@reactofficial",
+    "channel_follower_count": 250000,
+    "view_count": 1500000,
+    "like_count": 45000,
+    "comment_count": 2300,
+    "tags": ["react", "hooks", "tutorial", "javascript"],
+    "categories": ["Education"],
+    "language": "en",
+    "thumbnail": "https://i.ytimg.com/vi/abc123def45/maxresdefault.jpg",
+    "webpage_url": "https://www.youtube.com/watch?v=abc123def45",
+    "chapters": [
+        {"title": "Intro", "start_time": 0, "end_time": 45},
+        {"title": "Project Setup", "start_time": 45, "end_time": 180},
+        {"title": "useState Hook", "start_time": 180, "end_time": 540},
+        {"title": "useEffect Hook", "start_time": 540, "end_time": 900},
+        {"title": "Custom Hooks", "start_time": 900, "end_time": 1320},
+        {"title": "Best Practices", "start_time": 1320, "end_time": 1680},
+        {"title": "Wrap Up", "start_time": 1680, "end_time": 1832},
+    ],
+    "subtitles": {
+        "en": [{"ext": "vtt", "url": "https://..."}],
+    },
+    "automatic_captions": {
+        "en": [{"ext": "vtt", "url": "https://..."}],
+    },
+    "extractor": "youtube",
+}
+```
+
+### YouTube Transcript Fixture
+
+```python
+SAMPLE_YOUTUBE_TRANSCRIPT = [
+    {"text": "Welcome to this React Hooks tutorial.", "start": 0.0, "duration": 2.5},
+    {"text": "Today we'll learn about the most important hooks.", "start": 2.5, "duration": 3.0},
+    {"text": "Let's start by setting up our project.", "start": 45.0, "duration": 2.8},
+    {"text": "We'll use Create React App.", "start": 47.8, "duration": 2.0},
+    {"text": "Run npx create-react-app hooks-demo.", "start": 49.8, "duration": 3.5},
+    # ... more segments covering all chapters
+]
+```
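+
+The API entries carry `start` and `duration`, while the transcript files shown earlier store `start` and `end`. The minimal conversion sketch below uses a hypothetical function name. Its 1.0 (manual) and 0.8 (auto-generated) confidence values come from the confidence-scoring test described below.
+
+```python
+MANUAL_CAPTION_CONFIDENCE = 1.0  # per test_confidence_scoring below
+AUTO_CAPTION_CONFIDENCE = 0.8
+
+
+def normalize_youtube_transcript(entries: list[dict], is_generated: bool) -> list[dict]:
+    """Hypothetical helper: convert {text, start, duration} entries to {text, start, end, ...}."""
+    confidence = AUTO_CAPTION_CONFIDENCE if is_generated else MANUAL_CAPTION_CONFIDENCE
+    return [
+        {
+            "text": entry["text"].strip(),
+            "start": entry["start"],
+            # Round to milliseconds to hide float noise such as 45.0 + 2.8.
+            "end": round(entry["start"] + entry["duration"], 3),
+            "confidence": confidence,
+            "words": None,  # caption tracks carry no word-level timing
+        }
+        for entry in entries
+    ]
+```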
+
+### Whisper Output Fixture
+
+```python
+SAMPLE_WHISPER_OUTPUT = {
+    "language": "en",
+    "language_probability": 0.98,
+    "duration": 1832.0,
+    "segments": [
+        {
+            "start": 0.0,
+            "end": 2.5,
+            "text": "Welcome to this React Hooks tutorial.",
+            "avg_logprob": -0.15,
+            "no_speech_prob": 0.01,
+            "words": [
+                {"word": "Welcome", "start": 0.0, "end": 0.4, "probability": 0.97},
+                {"word": "to", "start": 0.4, "end": 0.5, "probability": 0.99},
+                {"word": "this", "start": 0.5, "end": 0.7, "probability": 0.98},
+                {"word": "React", "start": 0.7, "end": 1.1, "probability": 0.95},
+                {"word": "Hooks", "start": 1.1, "end": 1.5, "probability": 0.93},
+                {"word": "tutorial.", "start": 1.5, "end": 2.3, "probability": 0.96},
+            ],
+        },
+    ],
+}
+```
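+
+The sketch below turns this fixture into the same segment shape as the transcript files. The function name is hypothetical, and so is the confidence rule. That rule uses the mean word `probability` when word timestamps exist and otherwise falls back to `exp(avg_logprob)` as a rough per-segment proxy.
+
+```python
+import math
+
+
+def normalize_whisper_output(output: dict) -> list[dict]:
+    """Hypothetical helper: map faster-whisper style output to transcript segments."""
+    segments = []
+    for seg in output["segments"]:
+        words = seg.get("words") or []
+        if words:
+            # Assumed rule: segment confidence = mean word probability.
+            confidence = sum(w["probability"] for w in words) / len(words)
+        else:
+            # Fallback proxy: average log-probability converted back to a probability.
+            confidence = math.exp(seg["avg_logprob"])
+        segments.append(
+            {
+                "text": seg["text"].strip(),
+                "start": seg["start"],
+                "end": seg["end"],
+                "confidence": round(confidence, 3),
+                "words": [
+                    {"word": w["word"].strip(), "start": w["start"], "end": w["end"]}
+                    for w in words
+                ]
+                or None,
+            }
+        )
+    return segments
+```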
+
+### Synthetic Frame Fixtures
+
+```python
+# Generate in conftest.py or fixture setup
+import numpy as np
+import cv2
+
+def create_dark_frame(path: str):
+    """Create a synthetic dark frame (simulates code editor)."""
+    img = np.zeros((1080, 1920, 3), dtype=np.uint8)
+    img[200:250, 100:800] = [200, 200, 200]  # Simulated text line
+    img[270:320, 100:600] = [180, 180, 180]  # Another text line
+    cv2.imwrite(path, img)
+
+def create_light_frame(path: str):
+    """Create a synthetic light frame (simulates slide)."""
+    img = np.ones((1080, 1920, 3), dtype=np.uint8) * 240
+    img[100:150, 200:1000] = [40, 40, 40]  # Title text
+    img[300:330, 200:1200] = [60, 60, 60]  # Body text
+    cv2.imwrite(path, img)
+```
+
+### conftest.py Additions
+
+```python
+# tests/conftest.py — add video fixtures
+
+import pytest
+import json
+from pathlib import Path
+
+FIXTURES_DIR = Path(__file__).parent / "fixtures" / "video"
+
+
+@pytest.fixture
+def sample_ytdlp_metadata():
+    """Load sample yt-dlp metadata."""
+    with open(FIXTURES_DIR / "sample_metadata.json") as f:
+        return json.load(f)
+
+
+@pytest.fixture
+def sample_transcript():
+    """Load sample YouTube transcript."""
+    with open(FIXTURES_DIR / "sample_transcript.json") as f:
+        return json.load(f)
+
+
+@pytest.fixture
+def sample_whisper_output():
+    """Load sample Whisper transcription output."""
+    with open(FIXTURES_DIR / "sample_whisper_output.json") as f:
+        return json.load(f)
+
+
+@pytest.fixture
+def sample_chapters():
+    """Load sample chapter data."""
+    with open(FIXTURES_DIR / "sample_chapters.json") as f:
+        return json.load(f)
+
+
+@pytest.fixture
+def sample_video_config():
+    """Create a sample VideoSourceConfig."""
+    from skill_seekers.cli.video_models import VideoSourceConfig
+    return VideoSourceConfig(
+        url="https://www.youtube.com/watch?v=abc123def45",
+        name="test_video",
+        visual_extraction=False,
+        max_videos=5,
+    )
+
+
+@pytest.fixture
+def video_output_dir(tmp_path):
+    """Create a temporary output directory for video tests."""
+    output = tmp_path / "output" / "test_video"
+    output.mkdir(parents=True)
+    (output / "video_data").mkdir()
+    (output / "video_data" / "transcripts").mkdir()
+    (output / "video_data" / "segments").mkdir()
+    (output / "video_data" / "frames").mkdir()
+    (output / "references").mkdir()
+    (output / "pages").mkdir()
+    return output
+```
+
+---
+
+## Unit Tests
+
+### test_video_models.py
+
+```python
+"""Tests for video data models and serialization."""
+
+class TestVideoInfo:
+    def test_create_from_ytdlp_metadata(self, sample_ytdlp_metadata):
+        """VideoInfo correctly parses yt-dlp info_dict."""
+        ...
+
+    def test_serialization_round_trip(self):
+        """VideoInfo serializes to dict and deserializes back identically."""
+        ...
+
+    def test_content_richness_score(self):
+        """Content richness score computed correctly based on signals."""
+        ...
+
+    def test_empty_chapters(self):
+        """VideoInfo handles video with no chapters."""
+        ...
+
+
+class TestVideoSegment:
+    def test_timestamp_display(self):
+        """Timestamp display formats correctly (MM:SS - MM:SS)."""
+        ...
+
+    def test_youtube_timestamp_url(self):
+        """YouTube timestamp URL generated correctly."""
+        ...
+
+    def test_segment_with_code_blocks(self):
+        """Segment correctly tracks detected code blocks."""
+        ...
+
+    def test_segment_without_visual(self):
+        """Segment works when visual extraction is disabled."""
+        ...
+
+
+class TestChapter:
+    def test_chapter_duration(self):
+        """Chapter duration computed correctly."""
+        ...
+
+    def test_chapter_serialization(self):
+        """Chapter serializes to/from dict."""
+        ...
+
+
+class TestTranscriptSegment:
+    def test_from_youtube_api(self):
+        """TranscriptSegment created from YouTube API format."""
+        ...
+
+    def test_from_whisper_output(self):
+        """TranscriptSegment created from Whisper output."""
+        ...
+
+    def test_with_word_timestamps(self):
+        """TranscriptSegment preserves word-level timestamps."""
+        ...
+
+
+class TestVideoSourceConfig:
+    def test_validate_single_source(self):
+        """Config requires exactly one source field."""
+        ...
+
+    def test_validate_duration_range(self):
+        """Config validates min < max duration."""
+        ...
+
+    def test_defaults(self):
+        """Config has sensible defaults."""
+        ...
+
+    def test_from_unified_config(self, sample_video_config):
+        """Config created from unified config JSON entry."""
+        ...
+
+
+class TestEnums:
+    def test_all_video_source_types(self):
+        """All VideoSourceType values are valid."""
+        ...
+
+    def test_all_frame_types(self):
+        """All FrameType values are valid."""
+        ...
+
+    def test_all_transcript_sources(self):
+        """All TranscriptSource values are valid."""
+        ...
+```
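+
+For reference, the minimal dataclass below would satisfy the `TestChapter` cases and the `MM:SS - MM:SS` display format checked in `TestVideoSegment`. Field names follow the yt-dlp chapter dicts. The class is a sketch, not the real `video_models` definition.
+
+```python
+from dataclasses import asdict, dataclass
+
+
+def format_mmss(seconds: float) -> str:
+    """Format seconds as MM:SS (minutes keep counting past 59)."""
+    minutes, secs = divmod(int(seconds), 60)
+    return f"{minutes:02d}:{secs:02d}"
+
+
+@dataclass
+class Chapter:
+    title: str
+    start_time: float
+    end_time: float
+
+    @property
+    def duration(self) -> float:
+        return self.end_time - self.start_time
+
+    @property
+    def timestamp_display(self) -> str:
+        return f"{format_mmss(self.start_time)} - {format_mmss(self.end_time)}"
+
+    def to_dict(self) -> dict:
+        return asdict(self)
+
+    @classmethod
+    def from_dict(cls, data: dict) -> "Chapter":
+        return cls(
+            title=data["title"],
+            start_time=float(data["start_time"]),
+            end_time=float(data["end_time"]),
+        )
+```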
+
+### test_video_transcript.py
+
+```python
+"""Tests for transcript extraction (YouTube API + Whisper + subtitle parsing)."""
+
+from unittest.mock import patch
+
+import pytest
+
+
+class TestYouTubeTranscript:
+    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
+    def test_extract_manual_captions(self, mock_api, sample_transcript):
+        """Prefers manual captions over auto-generated."""
+        ...
+
+    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
+    def test_fallback_to_auto_generated(self, mock_api):
+        """Falls back to auto-generated when manual not available."""
+        ...
+
+    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
+    def test_fallback_to_translation(self, mock_api):
+        """Falls back to translated captions when preferred language unavailable."""
+        ...
+
+    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
+    def test_no_transcript_available(self, mock_api):
+        """Raises TranscriptNotAvailable when no captions exist."""
+        ...
+
+    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
+    def test_confidence_scoring(self, mock_api, sample_transcript):
+        """Manual captions get 1.0 confidence, auto-generated get 0.8."""
+        ...
+
+
+class TestWhisperTranscription:
+    @pytest.mark.slow
+    @patch('skill_seekers.cli.video_transcript.WhisperModel')
+    def test_transcribe_with_word_timestamps(self, mock_model):
+        """Whisper returns word-level timestamps."""
+        ...
+
+    @patch('skill_seekers.cli.video_transcript.WhisperModel')
+    def test_language_detection(self, mock_model):
+        """Whisper detects video language."""
+        ...
+
+    @patch('skill_seekers.cli.video_transcript.WhisperModel')
+    def test_vad_filtering(self, mock_model):
+        """VAD filter removes silence segments."""
+        ...
+
+    def test_download_audio_only(self):
+        """Audio extraction downloads audio stream only (not video)."""
+        # Mock yt-dlp download
+        ...
+
+
+class TestSubtitleParsing:
+    def test_parse_srt(self, tmp_path):
+        """Parse SRT subtitle file into segments."""
+        srt_content = "1\n00:00:01,500 --> 00:00:04,000\nHello world\n\n2\n00:00:05,000 --> 00:00:08,000\nSecond line\n"
+        srt_file = tmp_path / "test.srt"
+        srt_file.write_text(srt_content)
+        ...
+
+    def test_parse_vtt(self, tmp_path):
+        """Parse WebVTT subtitle file into segments."""
+        vtt_content = "WEBVTT\n\n00:00:01.500 --> 00:00:04.000\nHello world\n\n00:00:05.000 --> 00:00:08.000\nSecond line\n"
+        vtt_file = tmp_path / "test.vtt"
+        vtt_file.write_text(vtt_content)
+        ...
+
+    def test_srt_html_tag_removal(self, tmp_path):
+        """SRT parser removes inline HTML tags."""
+        ...
+
+    def test_empty_subtitle_file(self, tmp_path):
+        """Handle empty subtitle file gracefully."""
+        ...
+
+
+class TestTranscriptFallbackChain:
+    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
+    @patch('skill_seekers.cli.video_transcript.WhisperModel')
+    def test_youtube_then_whisper_fallback(self, mock_whisper, mock_yt_api):
+        """Falls back to Whisper when YouTube captions fail."""
+        ...
+
+    def test_subtitle_file_discovery(self, tmp_path):
+        """Discovers sidecar subtitle files for local videos."""
+        ...
+```
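+
+Below is a dependency-free sketch of the parser these tests target. `parse_subtitles` is a hypothetical name. It handles SRT and WebVTT the same way by keying on the `-->` cue-timing line. It accepts either `,` or `.` before the milliseconds and treats the hours field as optional. It also strips inline tags such as `<i>` and returns an empty list for an empty file.
+
+```python
+import re
+
+# Cue timing line, e.g. "00:00:01,500 --> 00:00:04,000" (SRT) or
+# "00:01.500 --> 00:04.000" (WebVTT, where the hours field is optional).
+CUE_TIMING = re.compile(
+    r"(?:(\d+):)?(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
+    r"(?:(\d+):)?(\d{2}):(\d{2})[,.](\d{3})"
+)
+INLINE_TAG = re.compile(r"<[^>]+>")
+
+
+def _seconds(hours, minutes, seconds, millis) -> float:
+    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000
+
+
+def parse_subtitles(text: str) -> list[dict]:
+    """Parse SRT or WebVTT text into {text, start, end} dicts (hypothetical helper)."""
+    segments = []
+    # Cues are separated by blank lines; the "WEBVTT" header block has no timing and is skipped.
+    for block in re.split(r"\n\s*\n", text.strip()):
+        lines = block.splitlines()
+        for i, line in enumerate(lines):
+            match = CUE_TIMING.search(line)
+            if not match:
+                continue  # SRT cue numbers and VTT cue identifiers precede the timing line
+            body = INLINE_TAG.sub("", " ".join(lines[i + 1:])).strip()
+            if body:
+                g = match.groups()
+                segments.append({"text": body, "start": _seconds(*g[:4]), "end": _seconds(*g[4:])})
+            break
+    return segments
+```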
+
+### test_video_visual.py
+
+```python
+"""Tests for visual extraction (scene detection, frame extraction, OCR)."""
+
+from unittest.mock import patch
+
+import pytest
+
+
+class TestFrameClassification:
+    def test_classify_dark_frame_as_code(self, tmp_path):
+        """Dark frame with text patterns classified as code_editor."""
+        ...
+
+    def test_classify_light_frame_as_slide(self, tmp_path):
+        """Light uniform frame classified as slide."""
+        ...
+
+    def test_classify_high_edge_as_diagram(self, tmp_path):
+        """High edge density frame classified as diagram."""
+        ...
+
+    def test_classify_blank_frame_as_other(self, tmp_path):
+        """Nearly blank frame classified as other."""
+        ...
+
+
+class TestKeyframeTimestamps:
+    def test_chapter_boundaries_included(self, sample_chapters):
+        """Keyframe timestamps include chapter start times."""
+        ...
+
+    def test_long_chapter_midpoint(self, sample_chapters):
+        """Long chapters (>2 min) get midpoint keyframe."""
+        ...
+
+    def test_deduplication_within_1_second(self):
+        """Timestamps within 1 second are deduplicated."""
+        ...
+
+    def test_regular_intervals_fill_gaps(self):
+        """Regular interval timestamps fill gaps between scenes."""
+        ...
+
+
+class TestOCRExtraction:
+    @pytest.mark.slow
+    @patch('skill_seekers.cli.video_visual.easyocr.Reader')
+    def test_extract_text_from_code_frame(self, mock_reader, tmp_path):
+        """OCR extracts text from code editor frame."""
+        ...
+
+    @patch('skill_seekers.cli.video_visual.easyocr.Reader')
+    def test_confidence_filtering(self, mock_reader):
+        """Low-confidence OCR results are filtered out."""
+        ...
+
+    @patch('skill_seekers.cli.video_visual.easyocr.Reader')
+    def test_monospace_detection(self, mock_reader):
+        """Monospace text regions correctly detected."""
+        ...
+
+
+class TestCodeBlockDetection:
+    def test_detect_python_code(self):
+        """Detect Python code from OCR text."""
+        ...
+
+    def test_detect_terminal_commands(self):
+        """Detect terminal commands from OCR text."""
+        ...
+
+    def test_language_detection_from_ocr(self):
+        """Language detection works on OCR-extracted code."""
+        ...
+```
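+
+The timestamp rules exercised by `TestKeyframeTimestamps` can be sketched as one pure function. The name, the parameter names, and the 30-second fill interval are assumptions. The 2-minute midpoint rule and the 1-second deduplication window come from the test descriptions above.
+
+```python
+from collections.abc import Iterable
+
+
+def keyframe_timestamps(
+    chapters: list[dict],
+    duration: float,
+    scene_times: Iterable[float] = (),
+    interval: float = 30.0,
+    long_chapter: float = 120.0,
+    min_gap: float = 1.0,
+) -> list[float]:
+    """Hypothetical helper: pick keyframe timestamps (seconds) for one video."""
+    candidates = [float(t) for t in scene_times]
+    for ch in chapters:
+        start, end = float(ch["start_time"]), float(ch["end_time"])
+        candidates.append(start)  # every chapter boundary gets a frame
+        if end - start > long_chapter:
+            candidates.append((start + end) / 2)  # long chapters also get a midpoint
+    # Fill gaps wider than `interval` with evenly spaced timestamps.
+    anchors = sorted({0.0, float(duration), *candidates})
+    for left, right in zip(anchors, anchors[1:]):
+        t = left + interval
+        while t < right:
+            candidates.append(t)
+            t += interval
+    # Drop anything within `min_gap` seconds of the previously kept timestamp.
+    kept: list[float] = []
+    for t in sorted(c for c in candidates if 0.0 <= c < duration):
+        if not kept or t - kept[-1] >= min_gap:
+            kept.append(t)
+    return kept
+```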
+
+### test_video_segmenter.py
+
+```python
+"""Tests for segmentation and stream alignment."""
+
+class TestChapterSegmentation:
+    def test_chapters_create_segments(self, sample_chapters):
+        """Chapters map directly to segments."""
+        ...
+
+    def test_long_chapter_splitting(self):
+        """Chapters exceeding max_segment_duration are split."""
+        ...
+
+    def test_empty_chapters(self):
+        """Falls back to time window when no chapters."""
+        ...
+
+
+class TestTimeWindowSegmentation:
+    def test_fixed_windows(self):
+        """Creates segments at fixed intervals."""
+        ...
+
+    def test_sentence_boundary_alignment(self):
+        """Segments split at sentence boundaries, not mid-word."""
+        ...
+
+    def test_configurable_window_size(self):
+        """Window size respects config.time_window_seconds."""
+        ...
+
+
+class TestStreamAlignment:
+    def test_align_transcript_to_segments(self, sample_transcript, sample_chapters):
+        """Transcript segments mapped to correct time windows."""
+        ...
+
+    def test_align_keyframes_to_segments(self):
+        """Keyframes mapped to correct segments by timestamp."""
+        ...
+
+    def test_partial_overlap_handling(self):
+        """Transcript segments partially overlapping window boundaries."""
+        ...
+
+    def test_empty_segment_handling(self):
+        """Handle segments with no transcript (silence, music)."""
+        ...
+
+
+class TestContentMerging:
+    def test_transcript_only_content(self):
+        """Content is just transcript when no visual data."""
+        ...
+
+    def test_code_block_appended(self):
+        """Code on screen is appended to transcript content."""
+        ...
+
+    def test_duplicate_code_not_repeated(self):
+        """Code mentioned in transcript is not duplicated from OCR."""
+        ...
+
+    def test_chapter_title_as_heading(self):
+        """Chapter title becomes markdown heading in content."""
+        ...
+
+    def test_slide_text_supplementary(self):
+        """Slide text adds to content when not in transcript."""
+        ...
+
+
+class TestCategorization:
+    def test_category_from_chapter_title(self):
+        """Category inferred from chapter title keywords."""
+        ...
+
+    def test_category_from_transcript(self):
+        """Category inferred from transcript content."""
+        ...
+
+    def test_custom_categories_from_config(self):
+        """Custom category keywords from config used."""
+        ...
+```
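+
+The sketch below covers the segmentation and alignment behaviour these tests describe. Both function names and the default durations (`max_segment_duration=600`, `window=300`) are hypothetical. Assigning each transcript line to the segment that contains its midpoint is one assumed way to resolve partial overlaps.
+
+```python
+def split_into_spans(
+    chapters: list[dict],
+    duration: float,
+    max_segment_duration: float = 600.0,
+    window: float = 300.0,
+) -> list[tuple[str | None, float, float]]:
+    """Return (chapter_title, start, end) spans; hypothetical helper."""
+    spans: list[tuple[str | None, float, float]] = []
+    if not chapters:
+        # No chapters: fall back to fixed time windows.
+        start = 0.0
+        while start < duration:
+            spans.append((None, start, min(start + window, float(duration))))
+            start += window
+        return spans
+    for ch in chapters:
+        start, end = float(ch["start_time"]), float(ch["end_time"])
+        # Split over-long chapters into max-length pieces plus a remainder.
+        while end - start > max_segment_duration:
+            spans.append((ch["title"], start, start + max_segment_duration))
+            start += max_segment_duration
+        spans.append((ch["title"], start, end))
+    return spans
+
+
+def align_transcript(spans: list[tuple[str | None, float, float]], transcript: list[dict]) -> list[str]:
+    """Join transcript text per span, placing each line by its midpoint."""
+    buckets: list[list[str]] = [[] for _ in spans]
+    for line in transcript:
+        midpoint = (line["start"] + line["end"]) / 2
+        for i, (_title, start, end) in enumerate(spans):
+            if start <= midpoint < end:
+                buckets[i].append(line["text"])
+                break
+    # Spans with no speech (silence, music) come back as empty strings.
+    return [" ".join(bucket) for bucket in buckets]
+```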
+
+---
+
+## Integration Tests
+
+### test_video_integration.py
+
+```python
+"""Integration tests for video pipeline end-to-end."""
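+
+from unittest.mock import patch
+
+import pytest
+
+# SourceDetector is also imported here from its existing module.
+
+
+class TestSourceDetectorVideo: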
+    def test_detect_youtube_video(self):
+        info = SourceDetector.detect("https://youtube.com/watch?v=abc123def45")
+        assert info.type == "video"
+        assert info.parsed["video_source"] == "youtube_video"
+
+    def test_detect_youtube_short_url(self):
+        info = SourceDetector.detect("https://youtu.be/abc123def45")
+        assert info.type == "video"
+
+    def test_detect_youtube_playlist(self):
+        info = SourceDetector.detect("https://youtube.com/playlist?list=PLxxx")
+        assert info.type == "video"
+        assert info.parsed["video_source"] == "youtube_playlist"
+
+    def test_detect_youtube_channel(self):
+        info = SourceDetector.detect("https://youtube.com/@reactofficial")
+        assert info.type == "video"
+        assert info.parsed["video_source"] == "youtube_channel"
+
+    def test_detect_vimeo(self):
+        info = SourceDetector.detect("https://vimeo.com/123456789")
+        assert info.type == "video"
+        assert info.parsed["video_source"] == "vimeo"
+
+    def test_detect_mp4_file(self, tmp_path):
+        f = tmp_path / "tutorial.mp4"
+        f.touch()
+        info = SourceDetector.detect(str(f))
+        assert info.type == "video"
+        assert info.parsed["video_source"] == "local_file"
+
+    def test_detect_video_directory(self, tmp_path):
+        d = tmp_path / "videos"
+        d.mkdir()
+        (d / "vid1.mp4").touch()
+        (d / "vid2.mkv").touch()
+        info = SourceDetector.detect(str(d))
+        assert info.type == "video"
+
+    def test_youtube_not_confused_with_web(self):
+        """YouTube URLs detected as video, not web."""
+        info = SourceDetector.detect("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
+        assert info.type == "video"
+        assert info.type != "web"
+
+
+class TestUnifiedConfigVideo:
+    def test_video_source_in_config(self, tmp_path):
+        """Video source parsed correctly from unified config."""
+        ...
+
+    def test_multiple_video_sources(self, tmp_path):
+        """Multiple video sources in same config."""
+        ...
+
+    def test_video_alongside_docs(self, tmp_path):
+        """Video source alongside documentation source."""
+        ...
+
+
+class TestFullPipeline:
+    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
+    @patch('skill_seekers.cli.video_scraper.YoutubeDL')
+    def test_single_video_transcript_only(
+        self, mock_ytdl, mock_transcript, sample_ytdlp_metadata,
+        sample_transcript, video_output_dir
+    ):
+        """Full pipeline: single YouTube video, transcript only."""
+        mock_ytdl.return_value.__enter__.return_value.extract_info.return_value = sample_ytdlp_metadata
+        mock_transcript.list_transcripts.return_value = ...
+
+        # Run pipeline
+        # Assert output files exist and content is correct
+        ...
+
+    @pytest.mark.slow
+    @patch('skill_seekers.cli.video_visual.easyocr.Reader')
+    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
+    @patch('skill_seekers.cli.video_scraper.YoutubeDL')
+    def test_single_video_with_visual(
+        self, mock_ytdl, mock_transcript, mock_ocr,
+        sample_ytdlp_metadata, video_output_dir
+    ):
+        """Full pipeline: single video with visual extraction."""
+        ...
+```
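+
+The URL and path rules behind `TestSourceDetectorVideo` can be illustrated with the standard library alone. `detect_video_source` is a hypothetical stand-in for the video branch of `SourceDetector.detect`. The extension set and the `local_directory` label are assumptions. The other labels match the `video_source` values asserted above.
+
+```python
+from pathlib import Path
+from urllib.parse import parse_qs, urlparse
+
+VIDEO_EXTENSIONS = {".mp4", ".mkv", ".webm", ".mov", ".avi"}  # assumed set
+
+
+def detect_video_source(source: str) -> str | None:
+    """Hypothetical classifier: return a video_source label, or None if not video."""
+    parsed = urlparse(source)
+    host = (parsed.hostname or "").lower()
+    if host.startswith("www."):
+        host = host[4:]
+    if host in {"youtube.com", "m.youtube.com"}:
+        query = parse_qs(parsed.query)
+        if parsed.path == "/playlist" and "list" in query:
+            return "youtube_playlist"
+        if parsed.path == "/watch" and "v" in query:
+            return "youtube_video"
+        if parsed.path.startswith(("/@", "/channel/", "/c/", "/user/")):
+            return "youtube_channel"
+        return None
+    if host == "youtu.be":
+        return "youtube_video" if parsed.path.strip("/") else None
+    if host in {"vimeo.com", "player.vimeo.com"}:
+        return "vimeo"
+    # Not a known video URL: fall back to local files and directories.
+    path = Path(source)
+    if path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS:
+        return "local_file"
+    if path.is_dir() and any(p.suffix.lower() in VIDEO_EXTENSIONS for p in path.iterdir()):
+        return "local_directory"
+    return None
+```
+
+Checking YouTube hosts before the generic web fallback is what keeps `https://www.youtube.com/watch?v=...` from being classified as a plain web page.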
+
+---
+
+## CI Considerations
+
+### What Runs in CI (Default)
+
+- All unit tests (mocked, no network, no GPU)
+- Integration tests with mocked external services
+- Source detection tests (pure logic)
+- Data model tests (pure logic)
+
+### What Doesn't Run in CI (Marked)
+
+```python
+@pytest.mark.slow         # Whisper model loading, actual OCR
+@pytest.mark.integration  # Real YouTube API calls
+@pytest.mark.e2e          # Full pipeline with real video download
+```
+
+### CI Test Matrix Compatibility
+
+| Test | Ubuntu | macOS | Python 3.10 | Python 3.12 | GPU |
+|------|--------|-------|-------------|-------------|-----|
+| Unit tests | Yes | Yes | Yes | Yes | No |
+| Integration (mocked) | Yes | Yes | Yes | Yes | No |
+| Whisper tests (mocked) | Yes | Yes | Yes | Yes | No |
+| OCR tests (mocked) | Yes | Yes | Yes | Yes | No |
+| E2E (real download) | Skip | Skip | Skip | Skip | No |
+
+### Dependency Handling in Tests
+
+```python
+# At top of visual test files:
+pytest.importorskip("cv2", reason="opencv-python-headless required for visual tests")
+pytest.importorskip("easyocr", reason="easyocr required for OCR tests")
+
+# At top of whisper test files:
+pytest.importorskip("faster_whisper", reason="faster-whisper required for transcription tests")
+```
+
+---
+
+## Performance Tests
+
+```python
+@pytest.mark.benchmark
+class TestVideoPerformance:
+    def test_transcript_parsing_speed(self, sample_transcript):
+        """Transcript parsing completes in < 10ms for 1000 segments."""
+        ...
+
+    def test_segment_alignment_speed(self):
+        """Segment alignment completes in < 50ms for 100 segments."""
+        ...
+
+    def test_frame_classification_speed(self, tmp_path):
+        """Frame classification completes in < 20ms per frame."""
+        ...
+
+    def test_content_merging_speed(self):
+        """Content merging completes in < 5ms per segment."""
+        ...
+
+    def test_output_generation_speed(self, video_output_dir):
+        """Output generation (5 videos, 50 segments) in < 1 second."""
+        ...
+```
--- a/docs/plans/video/07_VIDEO_DEPENDENCIES.md
+++ b/docs/plans/video/07_VIDEO_DEPENDENCIES.md
@@ -0,0 +1,515 @@
+# Video Source — Dependencies & System Requirements
+
+**Date:** February 27, 2026
+**Document:** 07 of 07
+**Status:** Planning
+
+> **Status: IMPLEMENTED** — `skill-seekers video --setup` (see `video_setup.py`, 835 lines, 60 tests)
+> - GPU auto-detection: NVIDIA (`nvidia-smi`/CUDA), AMD (`rocminfo`/ROCm), CPU fallback
+> - Correct PyTorch index URL selection per GPU vendor
+> - EasyOCR removed from pip extras, installed at runtime via `--setup`
+> - ROCm configuration (`MIOPEN_FIND_MODE`, `HSA_OVERRIDE_GFX_VERSION`)
+> - Virtual environment detection with `--force` override
+> - System dependency checks (`tesseract`, `ffmpeg`)
+> - Non-interactive mode for MCP/CI usage
+
+---
+
+## Table of Contents
+
+1. [Dependency Tiers](#dependency-tiers)
+2. [pyproject.toml Changes](#pyprojecttoml-changes)
+3. [System Requirements](#system-requirements)
+4. [Import Guards](#import-guards)
+5. [Dependency Check Command](#dependency-check-command)
+6. [Model Management](#model-management)
+7. [Docker Considerations](#docker-considerations)
+
+---
+
+## Dependency Tiers
+
+Video processing has two tiers to keep the base install lightweight:
+
+### Tier 1: `[video]` — Lightweight (YouTube transcripts + metadata)
+
+**Use case:** YouTube videos with existing captions. No download, no GPU needed.
+
+| Package | Version | Size | Purpose |
+|---------|---------|------|---------|
+| `yt-dlp` | `>=2024.12.0` | ~15MB | Metadata extraction, audio download |
+| `youtube-transcript-api` | `>=1.2.0` | ~50KB | YouTube caption extraction |
+
+**Capabilities:**
+- YouTube metadata (title, chapters, tags, description, engagement)
+- YouTube captions (manual and auto-generated)
+- Vimeo metadata
+- Playlist and channel resolution
+- Subtitle file parsing (SRT, VTT)
+- Segmentation and alignment
+- Full output generation
+
+**NOT included:**
+- Speech-to-text (Whisper)
+- Visual extraction (frame + OCR)
+- Local video file transcription (without subtitles)
+
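Subtitle parsing needs nothing beyond the standard library, which is why it can live in Tier 1. Below is a minimal sketch of turning SRT text into timed segments. `parse_srt` and `Segment` are hypothetical names, not the scraper's actual parser, and the sketch does not handle WebVTT headers or cue settings.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:05,000".
_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Segment:
    start: float  # seconds
    end: float  # seconds
    text: str


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content: str) -> list[Segment]:
    """Parse SRT subtitle text into timed segments (hypothetical helper)."""
    segments = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the cue text.
                text = " ".join(lines[i + 1 :])
                segments.append(Segment(_to_seconds(*g[:4]), _to_seconds(*g[4:]), text))
                break
    return segments
```

Multi-line cues are joined with spaces, so `"Hello\nworld"` becomes the single segment text `"Hello world"`.
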
+### Tier 2: `[video-full]` — Full (adds Whisper + visual extraction)
+
+**Use case:** Local videos without subtitles, or when you want to extract code and slides shown on screen.
+
+| Package | Version | Size | Purpose |
+|---------|---------|------|---------|
+| `yt-dlp` | `>=2024.12.0` | ~15MB | Metadata + audio download |
+| `youtube-transcript-api` | `>=1.2.0` | ~50KB | YouTube captions |
+| `faster-whisper` | `>=1.0.0` | ~5MB (+ models: 75MB-3GB) | Speech-to-text |
+| `scenedetect[opencv]` | `>=0.6.4` | ~50MB (includes OpenCV) | Scene boundary detection |
+| `easyocr` | `>=1.7.0` | ~150MB (+ models: ~200MB) | Text recognition from frames |
+| `opencv-python-headless` | `>=4.9.0` | ~50MB | Frame extraction, image processing |
+
+**Additional capabilities over Tier 1:**
+- Whisper speech-to-text (99 languages, word-level timestamps)
+- Scene detection (find visual transitions)
+- Keyframe extraction (save important frames)
+- Frame classification (code/slide/terminal/diagram)
+- OCR on frames (extract code and text from screen)
+- Code block detection from video
+
+**Total install size:**
+- Tier 1: ~15MB
+- Tier 2: ~270MB + models (~300MB-3.2GB depending on Whisper model)
+
+---
+
+## pyproject.toml Changes
+
+```toml
+[project.optional-dependencies]
+# Existing dependencies...
+gemini = ["google-generativeai>=0.8.0"]
+openai = ["openai>=1.0.0"]
+all-llms = ["google-generativeai>=0.8.0", "openai>=1.0.0"]
+
+# NEW: Video processing
+video = [
+    "yt-dlp>=2024.12.0",
+    "youtube-transcript-api>=1.2.0",
+]
+video-full = [
+    "yt-dlp>=2024.12.0",
+    "youtube-transcript-api>=1.2.0",
+    "faster-whisper>=1.0.0",
+    "scenedetect[opencv]>=0.6.4",
+    "easyocr>=1.7.0",
+    "opencv-python-headless>=4.9.0",
+]
+
+# Update 'all' to include video
+all = [
+    # ... existing all dependencies ...
+    "yt-dlp>=2024.12.0",
+    "youtube-transcript-api>=1.2.0",
+    "faster-whisper>=1.0.0",
+    "scenedetect[opencv]>=0.6.4",
+    "easyocr>=1.7.0",
+    "opencv-python-headless>=4.9.0",
+]
+
+[project.scripts]
+# ... existing entry points ...
+skill-seekers-video = "skill_seekers.cli.video_scraper:main"      # NEW
+```
+
+### Installation Commands
+
+```bash
+# Quote the extras so shells such as zsh don't glob-expand the brackets
+# Lightweight video (YouTube transcripts + metadata)
+pip install "skill-seekers[video]"
+
+# Full video (+ Whisper + visual extraction)
+pip install "skill-seekers[video-full]"
+
+# Everything
+pip install "skill-seekers[all]"
+
+# Development (editable)
+pip install -e ".[video]"
+pip install -e ".[video-full]"
+```
+
+---
+
+## System Requirements
+
+### Tier 1 (Lightweight)
+
+| Requirement | Needed For | How to Check |
+|-------------|-----------|-------------|
+| Python 3.10+ | All | `python --version` |
+| Internet connection | YouTube API calls | N/A |
+
+No additional system dependencies. Pure Python.
+
+### Tier 2 (Full)
+
+| Requirement | Needed For | How to Check | Install |
+|-------------|-----------|-------------|---------|
+| Python 3.10+ | All | `python --version` | — |
+| FFmpeg | Audio extraction, video processing | `ffmpeg -version` | See below |
+| GPU (optional) | Whisper + easyocr acceleration | `nvidia-smi` (NVIDIA) | CUDA toolkit |
+
+### FFmpeg Installation
+
+FFmpeg is required for:
+- Extracting audio from video files (Whisper input)
+- Downloading audio-only streams (yt-dlp post-processing)
+- Converting between audio formats
+
+```bash
+# macOS
+brew install ffmpeg
+
+# Ubuntu/Debian
+sudo apt install ffmpeg
+
+# Windows (winget)
+winget install ffmpeg
+
+# Windows (choco)
+choco install ffmpeg
+
+# Verify
+ffmpeg -version
+```
+
+### GPU Support (Optional)
+
+GPU accelerates Whisper (~4x) and easyocr (~5x) but is not required.
+
+**NVIDIA GPU (CUDA):**
+```bash
+# Check CUDA availability
+python -c "import torch; print(torch.cuda.is_available())"
+
+# faster-whisper uses CTranslate2 which auto-detects CUDA
+# easyocr uses PyTorch which auto-detects CUDA
+# No additional setup needed if PyTorch CUDA is working
+```
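
The `torch.cuda.is_available()` check above only covers PyTorch, which easyocr uses. faster-whisper runs on CTranslate2, which loads its own CUDA libraries (the faster-whisper README lists cuBLAS and cuDNN), so the two checks can disagree. Below is a sketch that probes each backend separately and reports a missing package as `False`. `cuda_status` is a hypothetical helper.

```python
def cuda_status() -> dict[str, bool]:
    """Report CUDA visibility per backend; a missing package counts as False."""
    status = {"ctranslate2": False, "torch": False}
    try:
        import ctranslate2  # backend used by faster-whisper

        status["ctranslate2"] = ctranslate2.get_cuda_device_count() > 0
    except ImportError:
        pass
    try:
        import torch  # backend used by easyocr

        status["torch"] = bool(torch.cuda.is_available())
    except ImportError:
        pass
    return status
```

A result of `{"ctranslate2": False, "torch": True}` means easyocr will use the GPU while Whisper falls back to CPU.
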
+
+**Apple Silicon (MPS):**
+```bash
+# faster-whisper does not support MPS directly
+# Falls back to CPU on Apple Silicon
+# easyocr has partial MPS support
+```
+
+**CPU-only (no GPU):**
+```bash
+# Everything works on CPU, just slower
+# Whisper base model: ~4x slower on CPU vs GPU
+# easyocr: ~5x slower on CPU vs GPU
+# For short videos (<10 min), CPU is fine
+```
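
The faster-whisper README examples use `compute_type="float16"` on CUDA and `"int8"` on CPU. Below is a minimal sketch of choosing `WhisperModel` constructor arguments that way. It assumes the device decision has already been made (for example, from a CUDA probe), and `whisper_kwargs` is a hypothetical helper.

```python
def whisper_kwargs(model_size: str, use_cuda: bool) -> dict[str, str]:
    """Keyword arguments for faster_whisper.WhisperModel (hypothetical helper)."""
    if use_cuda:
        # Half precision is the usual choice on NVIDIA GPUs.
        return {
            "model_size_or_path": model_size,
            "device": "cuda",
            "compute_type": "float16",
        }
    # 8-bit quantization keeps CPU inference time and memory use down.
    return {
        "model_size_or_path": model_size,
        "device": "cpu",
        "compute_type": "int8",
    }
```

Usage: `WhisperModel(**whisper_kwargs("base", use_cuda=False))`.
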
+
+---
+
+## Import Guards
+
+All video dependencies use try/except import guards to provide clear error messages:
+
+### video_scraper.py
+
+```python
+"""Video scraper - main orchestrator."""
+
+# Core dependencies (always available)
+import json
+import logging
+import os
+from pathlib import Path
+
+# Tier 1: Video basics
+try:
+    from yt_dlp import YoutubeDL
+    HAS_YTDLP = True
+except ImportError:
+    HAS_YTDLP = False
+
+try:
+    from youtube_transcript_api import YouTubeTranscriptApi
+    HAS_YT_TRANSCRIPT = True
+except ImportError:
+    HAS_YT_TRANSCRIPT = False
+
+# Feature availability check
+def check_video_dependencies(require_full: bool = False) -> None:
+    """Check that video dependencies are installed.
+
+    Args:
+        require_full: If True, check for full dependencies (Whisper, OCR)
+
+    Raises:
+        ImportError: With installation instructions
+    """
+    missing = []
+
+    if not HAS_YTDLP:
+        missing.append("yt-dlp")
+    if not HAS_YT_TRANSCRIPT:
+        missing.append("youtube-transcript-api")
+
+    if missing:
+        raise ImportError(
+            f"Video processing requires: {', '.join(missing)}\n"
+            f"Install with: pip install skill-seekers[video]"
+        )
+
+    if require_full:
+        full_missing = []
+        try:
+            import faster_whisper
+        except ImportError:
+            full_missing.append("faster-whisper")
+        try:
+            import cv2
+        except ImportError:
+            full_missing.append("opencv-python-headless")
+        try:
+            import scenedetect
+        except ImportError:
+            full_missing.append("scenedetect[opencv]")
+        try:
+            import easyocr
+        except ImportError:
+            full_missing.append("easyocr")
+
+        if full_missing:
+            raise ImportError(
+                f"Visual extraction requires: {', '.join(full_missing)}\n"
+                f"Install with: pip install skill-seekers[video-full]"
+            )
+```
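
The checks above import each package just to see whether it exists. That has a cost: `import easyocr` pulls in PyTorch, which can take seconds. Below is a lighter variant sketched with `importlib.util.find_spec`, which locates a top-level module without executing it. The module-to-pip-name mapping mirrors the lists above, and `find_missing` is a hypothetical helper.

```python
import importlib.util

# Import name -> pip requirement shown to the user.
FULL_DEPENDENCIES = {
    "faster_whisper": "faster-whisper",
    "cv2": "opencv-python-headless",
    "scenedetect": "scenedetect[opencv]",
    "easyocr": "easyocr",
}


def find_missing(modules: dict[str, str]) -> list[str]:
    """Return pip names for top-level modules that cannot be located."""
    return [
        pip_name
        for module, pip_name in modules.items()
        if importlib.util.find_spec(module) is None
    ]
```

A found spec does not prove the import will succeed (a broken native library still fails at import time), so the real import remains the final check before use.
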
+
+### video_transcript.py
+
+```python
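+"""Transcript extraction module."""
+
+import logging
+
+logger = logging.getLogger(__name__)
+
+# VideoSourceType, TranscriptSource, TranscriptNotAvailable and the extract/
+# transcribe helpers are assumed to be defined elsewhere in the video package.
+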
+# YouTube transcript (Tier 1)
+try:
+    from youtube_transcript_api import YouTubeTranscriptApi
+    HAS_YT_TRANSCRIPT = True
+except ImportError:
+    HAS_YT_TRANSCRIPT = False
+
+# Whisper (Tier 2)
+try:
+    from faster_whisper import WhisperModel
+    HAS_WHISPER = True
+except ImportError:
+    HAS_WHISPER = False
+
+
+def get_transcript(video_info, config):
+    """Get transcript using best available method."""
+
+    # Try YouTube captions first (Tier 1)
+    if HAS_YT_TRANSCRIPT and video_info.source_type == VideoSourceType.YOUTUBE:
+        try:
+            return extract_youtube_transcript(video_info.video_id, config.languages)
+        except TranscriptNotAvailable:
+            pass
+
+    # Try Whisper fallback (Tier 2)
+    if HAS_WHISPER:
+        return transcribe_with_whisper(video_info, config)
+
+    # No transcript possible: only reachable when Whisper is not installed
+    logger.warning(
+        "No transcript for %s. Install faster-whisper for speech-to-text: "
+        "pip install skill-seekers[video-full]",
+        video_info.video_id,
+    )
+    return [], TranscriptSource.NONE
+```
+
+### video_visual.py
+
+```python
+"""Visual extraction module."""
+
+try:
+    import cv2
+    HAS_OPENCV = True
+except ImportError:
+    HAS_OPENCV = False
+
+try:
+    from scenedetect import detect, ContentDetector
+    HAS_SCENEDETECT = True
+except ImportError:
+    HAS_SCENEDETECT = False
+
+try:
+    import easyocr
+    HAS_EASYOCR = True
+except ImportError:
+    HAS_EASYOCR = False
+
+
+def check_visual_dependencies() -> None:
+    """Check visual extraction dependencies."""
+    missing = []
+    if not HAS_OPENCV:
+        missing.append("opencv-python-headless")
+    if not HAS_SCENEDETECT:
+        missing.append("scenedetect[opencv]")
+    if not HAS_EASYOCR:
+        missing.append("easyocr")
+
+    if missing:
+        raise ImportError(
+            f"Visual extraction requires: {', '.join(missing)}\n"
+            f"Install with: pip install skill-seekers[video-full]"
+        )
+
+
+def check_ffmpeg() -> bool:
+    """Check if FFmpeg is available."""
+    import shutil
+    return shutil.which('ffmpeg') is not None
+```
+
+---
+
+## Dependency Check Command
+
+Add a dependency check to the `config` command:
+
+```bash
+# Check all video dependencies
+skill-seekers config --check-video
+
+# Output:
+# Video Dependencies:
+#   yt-dlp                  ✅ 2025.01.15
+#   youtube-transcript-api  ✅ 1.2.3
+#   faster-whisper          ❌ Not installed (pip install skill-seekers[video-full])
+#   opencv-python-headless  ❌ Not installed
+#   scenedetect             ❌ Not installed
+#   easyocr                 ❌ Not installed
+#
+# System Dependencies:
+#   FFmpeg                  ✅ 6.1.1
+#   GPU (CUDA)              ❌ Not available (CPU mode will be used)
+#
+# Available modes:
+#   Transcript only         ✅ YouTube captions available
+#   Whisper fallback        ❌ Install faster-whisper
+#   Visual extraction       ❌ Install video-full dependencies
+```
+
+---
+
+## Model Management
+
+### Whisper Models
+
+Whisper models are downloaded on first use and cached in the user's home directory.
+
+| Model | Download Size | Disk Size | First-Use Download Time |
+|-------|-------------|-----------|------------------------|
+| tiny | 75 MB | 75 MB | ~15s |
+| base | 142 MB | 142 MB | ~25s |
+| small | 466 MB | 466 MB | ~60s |
+| medium | 1.5 GB | 1.5 GB | ~3 min |
+| large-v3 | 3.1 GB | 3.1 GB | ~5 min |
+| large-v3-turbo | 1.6 GB | 1.6 GB | ~3 min |
+
+**Cache location:** `~/.cache/huggingface/hub/` (CTranslate2 models)
+
+**Pre-download command:**
+```bash
+# Pre-download a model before using it
+python -c "from faster_whisper import WhisperModel; WhisperModel('base')"
+```
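
Because `--whisper-model` takes a free-form string, a typo only surfaces when the download fails. Below is a sketch that validates the name up front against the models in the table above. The sizes are that table's approximate download figures in MB, and `check_whisper_model` is hypothetical. faster-whisper itself also accepts other values, such as local model paths, so a real check would need an escape hatch.

```python
# Approximate download sizes in MB, copied from the table above.
WHISPER_MODEL_SIZES_MB = {
    "tiny": 75,
    "base": 142,
    "small": 466,
    "medium": 1500,
    "large-v3": 3100,
    "large-v3-turbo": 1600,
}


def check_whisper_model(name: str) -> int:
    """Return the approximate download size in MB, or raise for unknown names."""
    try:
        return WHISPER_MODEL_SIZES_MB[name]
    except KeyError:
        known = ", ".join(WHISPER_MODEL_SIZES_MB)
        raise ValueError(f"Unknown Whisper model {name!r}; expected one of: {known}") from None
```
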
+
+### easyocr Models
+
+easyocr models are also downloaded on first use.
+
+| Language Pack | Download Size | Disk Size |
+|-------------|-------------|-----------|
+| English | ~100 MB | ~100 MB |
+| + Additional language | ~50-100 MB each | ~50-100 MB each |
+
+**Cache location:** `~/.EasyOCR/model/`
+
+**Pre-download command:**
+```bash
+# Pre-download English OCR model
+python -c "import easyocr; easyocr.Reader(['en'])"
+```
+
+---
+
+## Docker Considerations
+
+### Dockerfile additions for video support
+
+```dockerfile
+# Tier 1 (lightweight)
+RUN pip install "skill-seekers[video]"
+
+# Tier 2 (full)
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends ffmpeg \
+    && rm -rf /var/lib/apt/lists/*
+RUN pip install "skill-seekers[video-full]"
+
+# Pre-download Whisper model (avoids first-run download)
+RUN python -c "from faster_whisper import WhisperModel; WhisperModel('base')"
+
+# Pre-download easyocr model (easyocr is not in [video-full]; install it first,
+# e.g. with `skill-seekers video --setup`)
+RUN python -c "import easyocr; easyocr.Reader(['en'])"
+```
+
+### Docker image sizes
+
+| Tier | Base Image Size | Additional Size | Total |
+|------|----------------|----------------|-------|
+| Tier 1 (video) | ~300 MB | ~20 MB | ~320 MB |
+| Tier 2 (video-full, CPU) | ~300 MB | ~800 MB | ~1.1 GB |
+| Tier 2 (video-full, GPU) | ~5 GB (CUDA base) | ~800 MB | ~5.8 GB |
+
+### Kubernetes resource recommendations
+
+```yaml
+# Tier 1 (transcript only)
+resources:
+  requests:
+    memory: "256Mi"
+    cpu: "500m"
+  limits:
+    memory: "512Mi"
+    cpu: "1000m"
+---
+# Tier 2 (full, CPU)
+resources:
+  requests:
+    memory: "2Gi"
+    cpu: "2000m"
+  limits:
+    memory: "4Gi"
+    cpu: "4000m"
+---
+# Tier 2 (full, GPU)
+resources:
+  requests:
+    memory: "4Gi"
+    cpu: "2000m"
+    nvidia.com/gpu: 1
+  limits:
+    memory: "8Gi"
+    cpu: "4000m"
+    nvidia.com/gpu: 1
+```
--- a/docs/reference/CLI_REFERENCE.md
+++ b/docs/reference/CLI_REFERENCE.md
@@ -32,6 +32,7 @@
  - [unified](#unified) - Multi-source scraping
  - [update](#update) - Incremental updates
  - [upload](#upload) - Upload to platform
+  - [video](#video) - Video extraction & setup
  - [workflows](#workflows) - Manage workflow presets
 - [Common Workflows](#common-workflows)
 - [Exit Codes](#exit-codes)
@@ -1035,6 +1036,44 @@ skill-seekers upload output/react-weaviate.zip --target weaviate \

 ---

+### video
+
+Extract skills from video tutorials (YouTube, Vimeo, or local files).
+
+### Usage
+
+```bash
+# Setup (first time — auto-detects GPU, installs PyTorch + visual deps)
+skill-seekers video --setup
+
+# Extract from YouTube
+skill-seekers video --url https://www.youtube.com/watch?v=VIDEO_ID --name my-skill
+
+# With visual frame extraction (requires --setup first)
+skill-seekers video --url VIDEO_URL --name my-skill --visual
+
+# Local video file
+skill-seekers video --video-file /path/to/video.mp4 --name my-skill
+```
+
+### Key Flags
+
+| Flag | Description |
+|------|-------------|
+| `--setup` | Auto-detect GPU and install visual extraction dependencies |
+| `--url URL` | Video URL (YouTube, Vimeo) |
+| `--video-file PATH` | Local video file path |
+| `--playlist URL` | Playlist URL |
+| `--name NAME` | Skill name for output |
+| `--visual` | Enable visual frame extraction (OCR on keyframes) |
+| `--vision-ocr` | Use Claude Vision API as OCR fallback for low-confidence frames (requires `ANTHROPIC_API_KEY`) |
+
+### Notes
+
+- `--setup` detects NVIDIA (CUDA), AMD (ROCm), or CPU-only and installs the correct PyTorch variant
+- Requires `pip install skill-seekers[video]` (transcripts) or `skill-seekers[video-full]` (+ whisper + scene detection)
+- EasyOCR is NOT included in pip extras — it is installed by `--setup` with the correct GPU backend
+
+---
+
 ### workflows

 Manage enhancement workflow presets.
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -52,7 +52,6 @@ dependencies = [
    "anthropic>=0.76.0", # Required for AI enhancement (core feature)
    "PyMuPDF>=1.24.14",
    "Pillow>=11.0.0",
-    "pytesseract>=0.3.13",
    "pydantic>=2.12.3",
    "pydantic-settings>=2.11.0",
    "python-dotenv>=1.1.1",
@@ -115,6 +114,24 @@ docx = [
    "python-docx>=1.1.0",
 ]

+# Video processing (lightweight: YouTube transcripts + metadata)
+video = [
+    "yt-dlp>=2024.12.0",
+    "youtube-transcript-api>=1.2.0",
+]
+
+# Video processing (full: + Whisper + visual extraction)
+# NOTE: easyocr removed — it pulls torch with the wrong GPU variant.
+# Use: skill-seekers video --setup  (auto-detects GPU, installs correct PyTorch + easyocr)
+video-full = [
+    "yt-dlp>=2024.12.0",
+    "youtube-transcript-api>=1.2.0",
+    "faster-whisper>=1.0.0",
+    "scenedetect[opencv]>=0.6.4",
+    "opencv-python-headless>=4.9.0",
+    "pytesseract>=0.3.13",
+]
+
 # RAG vector database upload support
 chroma = [
    "chromadb>=0.4.0",
@@ -156,9 +173,13 @@ embedding = [
 ]

 # All optional dependencies combined (dev dependencies now in [dependency-groups])
+# Note: video-full deps (opencv, faster-whisper, scenedetect, pytesseract) excluded due to
+# heavy native dependencies. Install separately: pip install skill-seekers[video-full]
+# (easyocr is installed by `skill-seekers video --setup`, not by any extra)
 all = [
    "mammoth>=1.6.0",
    "python-docx>=1.1.0",
+    "yt-dlp>=2024.12.0",
+    "youtube-transcript-api>=1.2.0",
    "mcp>=1.25,<2",
    "httpx>=0.28.1",
    "httpx-sse>=0.4.3",
@@ -201,6 +222,7 @@ skill-seekers-scrape = "skill_seekers.cli.doc_scraper:main"
 skill-seekers-github = "skill_seekers.cli.github_scraper:main"
 skill-seekers-pdf = "skill_seekers.cli.pdf_scraper:main"
 skill-seekers-word = "skill_seekers.cli.word_scraper:main"
+skill-seekers-video = "skill_seekers.cli.video_scraper:main"
 skill-seekers-unified = "skill_seekers.cli.unified_scraper:main"
 skill-seekers-enhance = "skill_seekers.cli.enhance_command:main"
 skill-seekers-enhance-status = "skill_seekers.cli.enhance_status:main"
--- a/src/skill_seekers/cli/arguments/create.py
+++ b/src/skill_seekers/cli/arguments/create.py
@@ -410,6 +410,111 @@ WORD_ARGUMENTS: dict[str, dict[str, Any]] = {
    },
 }

+# Video specific (from video.py)
+VIDEO_ARGUMENTS: dict[str, dict[str, Any]] = {
+    "video_url": {
+        "flags": ("--video-url",),
+        "kwargs": {
+            "type": str,
+            "help": "Video URL (YouTube, Vimeo)",
+            "metavar": "URL",
+        },
+    },
+    "video_file": {
+        "flags": ("--video-file",),
+        "kwargs": {
+            "type": str,
+            "help": "Local video file path",
+            "metavar": "PATH",
+        },
+    },
+    "video_playlist": {
+        "flags": ("--video-playlist",),
+        "kwargs": {
+            "type": str,
+            "help": "Playlist URL",
+            "metavar": "URL",
+        },
+    },
+    "video_languages": {
+        "flags": ("--video-languages",),
+        "kwargs": {
+            "type": str,
+            "default": "en",
+            "help": "Transcript language preference (comma-separated)",
+            "metavar": "LANGS",
+        },
+    },
+    "visual": {
+        "flags": ("--visual",),
+        "kwargs": {
+            "action": "store_true",
+            "help": "Enable visual extraction (requires video-full deps)",
+        },
+    },
+    "whisper_model": {
+        "flags": ("--whisper-model",),
+        "kwargs": {
+            "type": str,
+            "default": "base",
+            "help": "Whisper model size (default: base)",
+            "metavar": "MODEL",
+        },
+    },
+    "visual_interval": {
+        "flags": ("--visual-interval",),
+        "kwargs": {
+            "type": float,
+            "default": 0.7,
+            "help": "Visual scan interval in seconds (default: 0.7)",
+            "metavar": "SECS",
+        },
+    },
+    "visual_min_gap": {
+        "flags": ("--visual-min-gap",),
+        "kwargs": {
+            "type": float,
+            "default": 0.5,
+            "help": "Min gap between extracted frames in seconds (default: 0.5)",
+            "metavar": "SECS",
+        },
+    },
+    "visual_similarity": {
+        "flags": ("--visual-similarity",),
+        "kwargs": {
+            "type": float,
+            "default": 3.0,
+            "help": "Pixel-diff threshold for duplicate detection; lower = more frames (default: 3.0)",
+            "metavar": "THRESH",
+        },
+    },
+    "vision_ocr": {
+        "flags": ("--vision-ocr",),
+        "kwargs": {
+            "action": "store_true",
+            "help": "Use Claude Vision API as fallback for low-confidence code frames (requires ANTHROPIC_API_KEY, ~$0.004/frame)",
+        },
+    },
+    "start_time": {
+        "flags": ("--start-time",),
+        "kwargs": {
+            "type": str,
+            "default": None,
+            "metavar": "TIME",
+            "help": "Start time for extraction (seconds, MM:SS, or HH:MM:SS). Single video only.",
+        },
+    },
+    "end_time": {
+        "flags": ("--end-time",),
+        "kwargs": {
+            "type": str,
+            "default": None,
+            "metavar": "TIME",
+            "help": "End time for extraction (seconds, MM:SS, or HH:MM:SS). Single video only.",
+        },
+    },
+}
+
 # Multi-source config specific (from unified_scraper.py)
 CONFIG_ARGUMENTS: dict[str, dict[str, Any]] = {
    "merge_mode": {
@@ -493,6 +598,7 @@ def get_source_specific_arguments(source_type: str) -> dict[str, dict[str, Any]]
        "local": LOCAL_ARGUMENTS,
        "pdf": PDF_ARGUMENTS,
        "word": WORD_ARGUMENTS,
+        "video": VIDEO_ARGUMENTS,
        "config": CONFIG_ARGUMENTS,
    }
    return source_args.get(source_type, {})
@@ -530,6 +636,7 @@ def add_create_arguments(parser: argparse.ArgumentParser, mode: str = "default")
    - 'local': Universal + local-specific
    - 'pdf': Universal + pdf-specific
    - 'word': Universal + word-specific
+    - 'video': Universal + video-specific
    - 'advanced': Advanced/rare arguments
    - 'all': All 120+ arguments

@@ -570,6 +677,10 @@ def add_create_arguments(parser: argparse.ArgumentParser, mode: str = "default")
        for arg_name, arg_def in WORD_ARGUMENTS.items():
            parser.add_argument(*arg_def["flags"], **arg_def["kwargs"])

+    if mode in ["video", "all"]:
+        for arg_name, arg_def in VIDEO_ARGUMENTS.items():
+            parser.add_argument(*arg_def["flags"], **arg_def["kwargs"])
+
    if mode in ["config", "all"]:
        for arg_name, arg_def in CONFIG_ARGUMENTS.items():
            parser.add_argument(*arg_def["flags"], **arg_def["kwargs"])
--- a/src/skill_seekers/cli/arguments/video.py
+++ b/src/skill_seekers/cli/arguments/video.py
@@ -0,0 +1,166 @@
+"""Video command argument definitions.
+
+This module defines ALL arguments for the video command in ONE place.
+Both video_scraper.py (standalone) and parsers/video_parser.py (unified CLI)
+import and use these definitions.
+
+Shared arguments (name, description, output, enhance-level, api-key,
+dry-run, verbose, quiet, workflow args) come from common.py / workflow.py
+via ``add_all_standard_arguments()``.
+"""
+
+import argparse
+from typing import Any
+
+from .common import add_all_standard_arguments
+
+# Video-specific argument definitions as data structure
+# NOTE: Shared args (name, description, output, enhance_level, api_key, dry_run,
+#       verbose, quiet, workflow args) are registered by add_all_standard_arguments().
+VIDEO_ARGUMENTS: dict[str, dict[str, Any]] = {
+    "url": {
+        "flags": ("--url",),
+        "kwargs": {
+            "type": str,
+            "help": "Video URL (YouTube, Vimeo)",
+            "metavar": "URL",
+        },
+    },
+    "video_file": {
+        "flags": ("--video-file",),
+        "kwargs": {
+            "type": str,
+            "help": "Local video file path",
+            "metavar": "PATH",
+        },
+    },
+    "playlist": {
+        "flags": ("--playlist",),
+        "kwargs": {
+            "type": str,
+            "help": "Playlist URL",
+            "metavar": "URL",
+        },
+    },
+    "languages": {
+        "flags": ("--languages",),
+        "kwargs": {
+            "type": str,
+            "default": "en",
+            "help": "Transcript language preference (comma-separated, default: en)",
+            "metavar": "LANGS",
+        },
+    },
+    "visual": {
+        "flags": ("--visual",),
+        "kwargs": {
+            "action": "store_true",
+            "help": "Enable visual extraction (requires video-full deps)",
+        },
+    },
+    "whisper_model": {
+        "flags": ("--whisper-model",),
+        "kwargs": {
+            "type": str,
+            "default": "base",
+            "help": "Whisper model size (default: base)",
+            "metavar": "MODEL",
+        },
+    },
+    "from_json": {
+        "flags": ("--from-json",),
+        "kwargs": {
+            "type": str,
+            "help": "Build skill from extracted JSON",
+            "metavar": "FILE",
+        },
+    },
+    "visual_interval": {
+        "flags": ("--visual-interval",),
+        "kwargs": {
+            "type": float,
+            "default": 0.7,
+            "help": "Visual scan interval in seconds (default: 0.7)",
+            "metavar": "SECS",
+        },
+    },
+    "visual_min_gap": {
+        "flags": ("--visual-min-gap",),
+        "kwargs": {
+            "type": float,
+            "default": 0.5,
+            "help": "Minimum gap between extracted frames in seconds (default: 0.5)",
+            "metavar": "SECS",
+        },
+    },
+    "visual_similarity": {
+        "flags": ("--visual-similarity",),
+        "kwargs": {
+            "type": float,
+            "default": 3.0,
+            "help": "Pixel-diff threshold for duplicate frame detection; lower = more frames kept (default: 3.0)",
+            "metavar": "THRESH",
+        },
+    },
+    "vision_ocr": {
+        "flags": ("--vision-ocr",),
+        "kwargs": {
+            "action": "store_true",
+            "help": "Use Claude Vision API as fallback for low-confidence code frames (requires ANTHROPIC_API_KEY, ~$0.004/frame)",
+        },
+    },
+    "start_time": {
+        "flags": ("--start-time",),
+        "kwargs": {
+            "type": str,
+            "default": None,
+            "metavar": "TIME",
+            "help": "Start time for extraction (seconds, MM:SS, or HH:MM:SS). Single video only.",
+        },
+    },
+    "end_time": {
+        "flags": ("--end-time",),
+        "kwargs": {
+            "type": str,
+            "default": None,
+            "metavar": "TIME",
+            "help": "End time for extraction (seconds, MM:SS, or HH:MM:SS). Single video only.",
+        },
+    },
+    "setup": {
+        "flags": ("--setup",),
+        "kwargs": {
+            "action": "store_true",
+            "help": "Auto-detect GPU and install visual extraction deps (PyTorch, easyocr, etc.)",
+        },
+    },
+}
+
+
+def add_video_arguments(parser: argparse.ArgumentParser) -> None:
+    """Add all video command arguments to a parser.
+
+    Registers shared args (name, description, output, enhance-level, api-key,
+    dry-run, verbose, quiet, workflow args) via add_all_standard_arguments(),
+    then adds video-specific args on top.
+
+    The default for --enhance-level is overridden to 0 (disabled) for video.
+    """
+    # Shared universal args first
+    add_all_standard_arguments(parser)
+
+    # Override enhance-level default to 0 for video
+    for action in parser._actions:
+        if hasattr(action, "dest") and action.dest == "enhance_level":
+            action.default = 0
+            action.help = (
+                "AI enhancement level (auto-detects API vs LOCAL mode): "
+                "0=disabled (default for video), 1=SKILL.md only, 2=+architecture/config, 3=full enhancement. "
+                "Mode selection: uses API if ANTHROPIC_API_KEY is set, otherwise LOCAL (Claude Code)"
+            )
+
+    # Video-specific args
+    for arg_name, arg_def in VIDEO_ARGUMENTS.items():
+        flags = arg_def["flags"]
+        kwargs = arg_def["kwargs"]
+        parser.add_argument(*flags, **kwargs)
--- a/src/skill_seekers/cli/config_validator.py
+++ b/src/skill_seekers/cli/config_validator.py
@@ -27,7 +27,7 @@ class ConfigValidator:
    """

    # Valid source types
-    VALID_SOURCE_TYPES = {"documentation", "github", "pdf", "local"}
+    VALID_SOURCE_TYPES = {"documentation", "github", "pdf", "local", "word", "video"}

    # Valid merge modes
    VALID_MERGE_MODES = {"rule-based", "claude-enhanced"}
--- a/src/skill_seekers/cli/create_command.py
+++ b/src/skill_seekers/cli/create_command.py
@@ -134,6 +134,8 @@ class CreateCommand:
            return self._route_pdf()
        elif self.source_info.type == "word":
            return self._route_word()
+        elif self.source_info.type == "video":
+            return self._route_video()
        elif self.source_info.type == "config":
            return self._route_config()
        else:
@@ -349,6 +351,69 @@ class CreateCommand:
        finally:
            sys.argv = original_argv

+    def _route_video(self) -> int:
+        """Route to video scraper (video_scraper.py)."""
+        from skill_seekers.cli import video_scraper
+
+        # Reconstruct argv for video_scraper
+        argv = ["video_scraper"]
+
+        # Add video source (URL or file)
+        parsed = self.source_info.parsed
+        video_playlist = getattr(self.args, "video_playlist", None)
+        if parsed.get("source_kind") == "file":
+            argv.extend(["--video-file", parsed["file_path"]])
+        elif video_playlist:
+            # Explicit --video-playlist flag takes precedence
+            argv.extend(["--playlist", video_playlist])
+        elif parsed.get("url"):
+            url = parsed["url"]
+            # Detect playlist vs single video
+            if "playlist" in url.lower():
+                argv.extend(["--playlist", url])
+            else:
+                argv.extend(["--url", url])
+
+        # Add universal arguments
+        self._add_common_args(argv)
+
+        # Add video-specific arguments
+        video_langs = getattr(self.args, "video_languages", None) or getattr(
+            self.args, "languages", None
+        )
+        if video_langs:
+            argv.extend(["--languages", video_langs])
+        if getattr(self.args, "visual", False):
+            argv.append("--visual")
+        if getattr(self.args, "vision_ocr", False):
+            argv.append("--vision-ocr")
+        if getattr(self.args, "whisper_model", None) and self.args.whisper_model != "base":
+            argv.extend(["--whisper-model", self.args.whisper_model])
+        vi = getattr(self.args, "visual_interval", None)
+        if vi is not None and vi != 0.7:
+            argv.extend(["--visual-interval", str(vi)])
+        vmg = getattr(self.args, "visual_min_gap", None)
+        if vmg is not None and vmg != 0.5:
+            argv.extend(["--visual-min-gap", str(vmg)])
+        vs = getattr(self.args, "visual_similarity", None)
+        if vs is not None and vs != 3.0:
+            argv.extend(["--visual-similarity", str(vs)])
+        st = getattr(self.args, "start_time", None)
+        if st is not None:
+            argv.extend(["--start-time", str(st)])
+        et = getattr(self.args, "end_time", None)
+        if et is not None:
+            argv.extend(["--end-time", str(et)])
+
+        # Call video_scraper with modified argv
+        logger.debug(f"Calling video_scraper with argv: {argv}")
+        original_argv = sys.argv
+        try:
+            sys.argv = argv
+            return video_scraper.main()
+        finally:
+            sys.argv = original_argv
+
    def _route_config(self) -> int:
        """Route to unified scraper for config files (unified_scraper.py)."""
        from skill_seekers.cli import unified_scraper
@@ -476,6 +541,8 @@ Examples:
  Local:    skill-seekers create ./my-project -p comprehensive
  PDF:      skill-seekers create tutorial.pdf --ocr
  DOCX:     skill-seekers create document.docx
+  Video:    skill-seekers create https://youtube.com/watch?v=...
+  Video:    skill-seekers create recording.mp4
  Config:   skill-seekers create configs/react.json

 Source Auto-Detection:
@@ -484,6 +551,8 @@ Source Auto-Detection:
  • ./path → local codebase
  • file.pdf → PDF extraction
  • file.docx → Word document extraction
+  • youtube.com/... → Video transcript extraction
+  • file.mp4 → Video file extraction
  • file.json → multi-source config

 Progressive Help (13 → 120+ flags):
@@ -491,6 +560,7 @@ Progressive Help (13 → 120+ flags):
  --help-github    GitHub repository options
  --help-local     Local codebase analysis
  --help-pdf       PDF extraction options
+  --help-video     Video extraction options
  --help-advanced  Rare/advanced options
  --help-all       All options + compatibility

@@ -521,6 +591,9 @@ Common Workflows:
    parser.add_argument(
        "--help-word", action="store_true", help=argparse.SUPPRESS, dest="_help_word"
    )
+    parser.add_argument(
+        "--help-video", action="store_true", help=argparse.SUPPRESS, dest="_help_video"
+    )
    parser.add_argument(
        "--help-config", action="store_true", help=argparse.SUPPRESS, dest="_help_config"
    )
@@ -579,6 +652,15 @@ Common Workflows:
        add_create_arguments(parser_word, mode="word")
        parser_word.print_help()
        return 0
+    elif args._help_video:
+        parser_video = argparse.ArgumentParser(
+            prog="skill-seekers create",
+            description="Create skill from video (YouTube, Vimeo, local files)",
+            formatter_class=argparse.RawDescriptionHelpFormatter,
+        )
+        add_create_arguments(parser_video, mode="video")
+        parser_video.print_help()
+        return 0
    elif args._help_config:
        parser_config = argparse.ArgumentParser(
            prog="skill-seekers create",
--- a/src/skill_seekers/cli/enhance_skill.py
+++ b/src/skill_seekers/cli/enhance_skill.py
@@ -97,9 +97,17 @@ class SkillEnhancer:
            print(f"❌ Error calling Claude API: {e}")
            return None

+    def _is_video_source(self, references):
+        """Check if the references come from video tutorial extraction."""
+        return any(meta["source"] == "video_tutorial" for meta in references.values())
+
    def _build_enhancement_prompt(self, references, current_skill_md):
        """Build the prompt for Claude with multi-source awareness"""

+        # Dispatch to video-specific prompt if video source detected
+        if self._is_video_source(references):
+            return self._build_video_enhancement_prompt(references, current_skill_md)
+
        # Extract skill name and description
        skill_name = self.skill_dir.name

@@ -276,6 +284,148 @@ Return ONLY the complete SKILL.md content, starting with the frontmatter (---).

        return prompt

+    def _build_video_enhancement_prompt(self, references, current_skill_md):
+        """Build a video-specific enhancement prompt.
+
+        Video tutorial references contain transcript text, OCR'd code panels,
+        code timelines with edits, and audio-visual alignment pairs. This prompt
+        is tailored to reconstruct clean code from noisy OCR, detect programming
+        languages from context, and synthesize a coherent tutorial skill.
+        """
+        skill_name = self.skill_dir.name
+
+        prompt = f"""You are enhancing a Claude skill built from VIDEO TUTORIAL extraction. This skill is about: {skill_name}
+
+The raw data was extracted from video tutorials using:
+1. **Transcript** (speech-to-text) — HIGH quality, this is the primary signal
+2. **OCR on code panels** — NOISY, may contain line numbers, UI chrome, garbled text
+3. **Code Timeline** — Tracks code evolution across frames with diffs
+4. **Audio-Visual Alignment** — Pairs of on-screen code + narrator explanation
+
+CURRENT SKILL.MD:
+{"```markdown" if current_skill_md else "(none - create from scratch)"}
+{current_skill_md or "No existing SKILL.md"}
+{"```" if current_skill_md else ""}
+
+REFERENCE FILES:
+"""
+
+        # Add all reference content
+        for filename, metadata in references.items():
+            content = metadata["content"]
+            if len(content) > 30000:
+                content = content[:30000] + "\n\n[Content truncated for size...]"
+            prompt += f"\n#### {filename}\n"
+            prompt += f"*Source: {metadata['source']}, Confidence: {metadata['confidence']}*\n\n"
+            prompt += f"```markdown\n{content}\n```\n"
+
+        prompt += """
+
+VIDEO-SPECIFIC ENHANCEMENT INSTRUCTIONS:
+
+You are working with data extracted from programming tutorial videos. The data has
+specific characteristics you MUST handle:
+
+## 1. OCR Code Reconstruction (CRITICAL)
+
+The OCR'd code blocks are NOISY. Common issues you MUST fix:
+- **Line numbers in code**: OCR captures line numbers (1, 2, 3...) as part of the code — STRIP THEM
+- **UI chrome contamination**: Tab bars, file names, button text appear in code blocks — REMOVE
+- **Garbled characters**: OCR errors like `l` → `1`, `O` → `0`, `rn` → `m` — FIX using context
+- **Duplicate fragments**: Same code appears across multiple frames with minor OCR variations — DEDUPLICATE
+- **Incomplete lines**: Lines cut off at panel edges — RECONSTRUCT from transcript context
+- **Animation/timeline numbers**: Frame counters or timeline numbers in code — REMOVE
+
+When reconstructing code:
+- The TRANSCRIPT is the ground truth for WHAT the code does
+- The OCR is the ground truth for HOW the code looks (syntax, structure)
+- Combine both: use transcript to understand intent, OCR for actual code structure
+- If OCR is too garbled, reconstruct the code based on what the narrator describes
+
+## 2. Language Detection
+
+The OCR-based language detection is often WRONG. Fix it by:
+- Reading the transcript for language mentions ("in GDScript", "this Python function", "our C# class")
+- Using code patterns: `extends`, `func`, `var`, `signal` = GDScript; `def`, `class`, `import` = Python;
+  `function`, `const`, `let` = JavaScript/TypeScript; `using`, `namespace` = C#
+- Looking at file extensions mentioned in the transcript or visible in tab bars
+- Using proper language tags in all code fences (```gdscript, ```python, etc.)
+
+## 3. Code Timeline Processing
+
+The "Code Timeline" section shows how code EVOLVES during the tutorial. Use it to:
+- Show the FINAL version of each code block (not intermediate states)
+- Optionally show key intermediate steps if the tutorial is about building up code progressively
+- The edit diffs show exactly what changed between frames — use these to understand the tutorial flow
+
+## 4. Audio-Visual Alignment
+
+These are the MOST VALUABLE pairs: each links on-screen code with the narrator's explanation.
+- Use these to create annotated code examples with inline comments
+- The narrator text explains WHY each piece of code exists
+- Cross-reference these pairs to build the "how-to" sections
+
+## 5. Tutorial Structure
+
+Transform the raw chronological data into a LOGICAL tutorial structure:
+- Group by TOPIC, not by timestamp (e.g., "Setting Up the State Machine" not "Segment 3")
+- Create clear section headers that describe what is being TAUGHT
+- Build a progressive learning path: concepts build on each other
+- Include prerequisite knowledge mentioned by the narrator
+
+YOUR TASK — Create an enhanced SKILL.md:
+
+1. **Clean Overview Section**
+   - What does this tutorial teach? (from transcript, NOT generic)
+   - Prerequisites mentioned by the narrator
+   - Key technologies/frameworks used (from actual code, not guesses)
+
+2. **"When to Use This Skill" Section**
+   - Specific trigger conditions based on what the tutorial covers
+   - Use cases directly from the tutorial content
+   - Reference the framework/library/tool being taught
+
+3. **Quick Reference Section** (MOST IMPORTANT)
+   - Extract 5-10 CLEAN, reconstructed code examples
+   - Each example must be:
+     a. Denoised (no line numbers, no UI chrome, no garbled text)
+     b. Complete (not cut off mid-line)
+     c. Properly language-tagged
+     d. Annotated with a description from the transcript
+   - Prefer code from Audio-Visual Alignment pairs (they have narrator context)
+   - Show the FINAL working version of each code block
+
+4. **Step-by-Step Tutorial Section**
+   - Follow the tutorial's teaching flow
+   - Each step includes: clean code + explanation from transcript
+   - Use narrator's explanations as the descriptions (paraphrase, don't copy verbatim)
+   - Show code evolution where the tutorial builds up code incrementally
+
+5. **Key Concepts Section**
+   - Extract terminology and concepts the narrator explains
+   - Define them using the narrator's own explanations
+   - Link concepts to specific code examples
+
+6. **Reference Files Description**
+   - Explain what each reference file contains
+   - Note that OCR data is raw and may contain errors
+   - Point to the most useful sections (Audio-Visual Alignment, Code Timeline)
+
+7. **Keep the frontmatter** (---\\nname: ...\\n---) intact if present
+
+CRITICAL RULES:
+- NEVER include raw OCR text with line numbers or UI chrome — always clean it first
+- ALWAYS use correct language tags (detect from context, not from OCR metadata)
+- The transcript is your BEST source for understanding content — trust it over garbled OCR
+- Extract REAL code from the references, reconstruct where needed, but never invent code
+- Keep code examples SHORT and focused (5-30 lines max per example)
+- Make the skill actionable: someone reading it should be able to implement what the tutorial teaches
+
+OUTPUT:
+Return ONLY the complete SKILL.md content, starting with the frontmatter (---).
+"""
+        return prompt
+
    def save_enhanced_skill_md(self, content):
        """Save the enhanced SKILL.md"""
        # Backup original
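Routing to the video prompt depends on two parts of this diff. `read_reference_files()` tags `video_*.md` files as `video_tutorial`, and `_is_video_source()` checks for that tag. The simplified offline sketch below shows that flow together with the 30,000-character per-file truncation. `classify_reference()` and `load_references()` are hypothetical stand-ins, and the non-video fallback tag is an assumption.

```python
from pathlib import Path

MAX_REF_CHARS = 30_000  # same per-file limit the video prompt applies


def classify_reference(path: Path) -> tuple[str, str]:
    """Simplified stand-in for the source tagging in read_reference_files()."""
    if path.name.startswith("video_"):
        return "video_tutorial", "high"
    return "documentation", "medium"  # hypothetical default for this sketch


def is_video_source(references: dict[str, dict]) -> bool:
    # .get() keeps the sketch tolerant of references without a "source" tag.
    return any(meta.get("source") == "video_tutorial" for meta in references.values())


def truncate(content: str, limit: int = MAX_REF_CHARS) -> str:
    """Cut oversized reference content and append the same marker the prompt uses."""
    if len(content) > limit:
        return content[:limit] + "\n\n[Content truncated for size...]"
    return content


def load_references(files: dict[str, str]) -> dict[str, dict]:
    """Build the {filename: {content, source, confidence}} shape the prompt builder reads."""
    refs = {}
    for name, content in files.items():
        source, confidence = classify_reference(Path(name))
        refs[name] = {"content": content, "source": source, "confidence": confidence}
    return refs
```

The check uses `any()`, so a single video reference switches the whole skill to the video prompt. That includes multi-source skills that also contain documentation or GitHub references.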
--- a/src/skill_seekers/cli/main.py
+++ b/src/skill_seekers/cli/main.py
@@ -12,6 +12,8 @@ Commands:
    scrape               Scrape documentation website
    github               Scrape GitHub repository
    pdf                  Extract from PDF file
+    word                 Extract from Word (.docx) file
+    video                Extract from video (YouTube, Vimeo, local files)
    unified              Multi-source scraping (docs + GitHub + PDF)
    analyze              Analyze local codebase and extract code knowledge
    enhance              AI-powered enhancement (auto: API or LOCAL mode)
@@ -48,6 +50,7 @@ COMMAND_MODULES = {
    "github": "skill_seekers.cli.github_scraper",
    "pdf": "skill_seekers.cli.pdf_scraper",
    "word": "skill_seekers.cli.word_scraper",
+    "video": "skill_seekers.cli.video_scraper",
    "unified": "skill_seekers.cli.unified_scraper",
    "enhance": "skill_seekers.cli.enhance_command",
    "enhance-status": "skill_seekers.cli.enhance_status",
@@ -142,7 +145,6 @@ def _reconstruct_argv(command: str, args: argparse.Namespace) -> list[str]:
        # Handle positional arguments (no -- prefix)
        if key in [
            "source",  # create command
-            "url",
            "directory",
            "file",
            "job_id",
--- a/src/skill_seekers/cli/parsers/init.py
+++ b/src/skill_seekers/cli/parsers/init.py
@@ -13,6 +13,7 @@ from .scrape_parser import ScrapeParser
 from .github_parser import GitHubParser
 from .pdf_parser import PDFParser
 from .word_parser import WordParser
+from .video_parser import VideoParser
 from .unified_parser import UnifiedParser
 from .enhance_parser import EnhanceParser
 from .enhance_status_parser import EnhanceStatusParser
@@ -43,6 +44,7 @@ PARSERS = [
    EnhanceStatusParser(),
    PDFParser(),
    WordParser(),
+    VideoParser(),
    UnifiedParser(),
    EstimateParser(),
    InstallParser(),
--- a/src/skill_seekers/cli/parsers/video_parser.py
+++ b/src/skill_seekers/cli/parsers/video_parser.py
@@ -0,0 +1,32 @@
+"""Video subcommand parser.
+
+Uses shared argument definitions from arguments.video to ensure
+consistency with the standalone video_scraper module.
+"""
+
+from .base import SubcommandParser
+from skill_seekers.cli.arguments.video import add_video_arguments
+
+
+class VideoParser(SubcommandParser):
+    """Parser for video subcommand."""
+
+    @property
+    def name(self) -> str:
+        return "video"
+
+    @property
+    def help(self) -> str:
+        return "Extract from video (YouTube, Vimeo, local files)"
+
+    @property
+    def description(self) -> str:
+        return "Extract transcripts and metadata from videos and generate skill"
+
+    def add_arguments(self, parser):
+        """Add video-specific arguments.
+
+        Uses shared argument definitions to ensure consistency
+        with video_scraper.py (standalone scraper).
+        """
+        add_video_arguments(parser)
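`VideoParser` delegates to `add_video_arguments()`, so the `skill-seekers video` subcommand and the standalone scraper accept the same flags. Below is a minimal sketch of that single-definition pattern using plain `argparse`. The `--url` and `--visual` flags are hypothetical placeholders, because `arguments/video.py` is not part of this excerpt.

```python
import argparse


def add_video_arguments(parser: argparse.ArgumentParser) -> None:
    """Hypothetical subset of video flags; the real list lives in arguments/video.py."""
    parser.add_argument("--url", help="YouTube or Vimeo URL")
    parser.add_argument("--visual", action="store_true", help="Enable frame extraction + OCR")


def build_standalone_parser() -> argparse.ArgumentParser:
    # Parser used when the scraper module is run directly as a script.
    parser = argparse.ArgumentParser(prog="video_scraper")
    add_video_arguments(parser)
    return parser


def build_cli_parser() -> argparse.ArgumentParser:
    # Parser used by the unified `skill-seekers` entry point.
    parser = argparse.ArgumentParser(prog="skill-seekers")
    subparsers = parser.add_subparsers(dest="command")
    video = subparsers.add_parser("video", help="Extract from video")
    add_video_arguments(video)  # same definitions, so the two CLIs cannot drift
    return parser
```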
--- a/src/skill_seekers/cli/source_detector.py
+++ b/src/skill_seekers/cli/source_detector.py
@@ -63,24 +63,34 @@ class SourceDetector:
        if source.endswith(".docx"):
            return cls._detect_word(source)

-        # 2. Directory detection
+        # Video file extensions
+        VIDEO_EXTENSIONS = (".mp4", ".mkv", ".avi", ".mov", ".webm", ".flv", ".wmv")
+        if source.lower().endswith(VIDEO_EXTENSIONS):
+            return cls._detect_video_file(source)
+
+        # 2. Video URL detection (before directory check)
+        video_url_info = cls._detect_video_url(source)
+        if video_url_info:
+            return video_url_info
+
+        # 3. Directory detection
        if os.path.isdir(source):
            return cls._detect_local(source)

-        # 3. GitHub patterns
+        # 4. GitHub patterns
        github_info = cls._detect_github(source)
        if github_info:
            return github_info

-        # 4. URL detection
+        # 5. URL detection
        if source.startswith("http://") or source.startswith("https://"):
            return cls._detect_web(source)

-        # 5. Domain inference (add https://)
+        # 6. Domain inference (add https://)
        if "." in source and not source.startswith("/"):
            return cls._detect_web(f"https://{source}")

-        # 6. Error - cannot determine
+        # 7. Error - cannot determine
        raise ValueError(
            f"Cannot determine source type for: {source}\n\n"
            "Examples:\n"
@@ -89,6 +99,8 @@ class SourceDetector:
            "  Local:  skill-seekers create ./my-project\n"
            "  PDF:    skill-seekers create tutorial.pdf\n"
            "  DOCX:   skill-seekers create document.docx\n"
+            "  Video:  skill-seekers create https://youtube.com/watch?v=...\n"
+            "  Video:  skill-seekers create recording.mp4\n"
            "  Config: skill-seekers create configs/react.json"
        )

@@ -116,6 +128,55 @@ class SourceDetector:
            type="word", parsed={"file_path": source}, suggested_name=name, raw_input=source
        )

+    @classmethod
+    def _detect_video_file(cls, source: str) -> SourceInfo:
+        """Detect local video file source."""
+        name = os.path.splitext(os.path.basename(source))[0]
+        return SourceInfo(
+            type="video",
+            parsed={"file_path": source, "source_kind": "file"},
+            suggested_name=name,
+            raw_input=source,
+        )
+
+    @classmethod
+    def _detect_video_url(cls, source: str) -> SourceInfo | None:
+        """Detect video platform URL (YouTube, Vimeo).
+
+        Returns SourceInfo if the source is a video URL, None otherwise.
+        """
+        lower = source.lower()
+
+        # YouTube patterns
+        youtube_keywords = ["youtube.com/watch", "youtu.be/", "youtube.com/playlist",
+                            "youtube.com/@", "youtube.com/channel/", "youtube.com/c/",
+                            "youtube.com/shorts/", "youtube.com/embed/"]
+        if any(kw in lower for kw in youtube_keywords):
+            # Determine suggested name
+            if "playlist" in lower:
+                name = "youtube_playlist"
+            elif "/@" in lower or "/channel/" in lower or "/c/" in lower:
+                name = "youtube_channel"
+            else:
+                name = "youtube_video"
+            return SourceInfo(
+                type="video",
+                parsed={"url": source, "source_kind": "url"},
+                suggested_name=name,
+                raw_input=source,
+            )
+
+        # Vimeo patterns
+        if "vimeo.com/" in lower:
+            return SourceInfo(
+                type="video",
+                parsed={"url": source, "source_kind": "url"},
+                suggested_name="vimeo_video",
+                raw_input=source,
+            )
+
+        return None
+
    @classmethod
    def _detect_local(cls, source: str) -> SourceInfo:
        """Detect local directory source."""
@@ -209,6 +270,15 @@ class SourceDetector:
            if not os.path.isfile(file_path):
                raise ValueError(f"Path is not a file: {file_path}")

+        elif source_info.type == "video":
+            if source_info.parsed.get("source_kind") == "file":
+                file_path = source_info.parsed["file_path"]
+                if not os.path.exists(file_path):
+                    raise ValueError(f"Video file does not exist: {file_path}")
+                if not os.path.isfile(file_path):
+                    raise ValueError(f"Path is not a file: {file_path}")
+            # URL-based video sources are validated during processing
+
        elif source_info.type == "config":
            config_path = source_info.parsed["config_path"]
            if not os.path.exists(config_path):
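The detection order matters. Video extensions are checked first, then video platform URLs, and only after that directories, GitHub patterns, and generic web URLs. The standalone sketch below mirrors the video branches of `SourceDetector`. It uses a small hypothetical `DetectedVideo` dataclass in place of `SourceInfo`.

```python
from __future__ import annotations

import os
from dataclasses import dataclass

VIDEO_EXTENSIONS = (".mp4", ".mkv", ".avi", ".mov", ".webm", ".flv", ".wmv")
YOUTUBE_KEYWORDS = (
    "youtube.com/watch", "youtu.be/", "youtube.com/playlist", "youtube.com/@",
    "youtube.com/channel/", "youtube.com/c/", "youtube.com/shorts/", "youtube.com/embed/",
)


@dataclass
class DetectedVideo:
    source_kind: str  # "file" or "url"
    suggested_name: str


def detect_video(source: str) -> DetectedVideo | None:
    lower = source.lower()
    # 1. Extension check runs first, so even a URL ending in .mp4 counts as a file.
    if lower.endswith(VIDEO_EXTENSIONS):
        stem = os.path.splitext(os.path.basename(source))[0]
        return DetectedVideo("file", stem)
    # 2. YouTube URL keywords, with the suggested name derived from the URL shape.
    if any(kw in lower for kw in YOUTUBE_KEYWORDS):
        if "playlist" in lower:
            return DetectedVideo("url", "youtube_playlist")
        if "/@" in lower or "/channel/" in lower or "/c/" in lower:
            return DetectedVideo("url", "youtube_channel")
        return DetectedVideo("url", "youtube_video")
    # 3. Vimeo URLs.
    if "vimeo.com/" in lower:
        return DetectedVideo("url", "vimeo_video")
    return None  # not a video: fall through to directory/GitHub/web detection
```

As written, the extension check runs before any URL check. A direct link such as `https://example.com/clip.mp4` is therefore classified as a local file and then fails the file-existence check during validation.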
--- a/src/skill_seekers/cli/unified_scraper.py
+++ b/src/skill_seekers/cli/unified_scraper.py
@@ -74,11 +74,19 @@ class UnifiedScraper:
            "github": [],  # List of github sources
            "pdf": [],  # List of pdf sources
            "word": [],  # List of word sources
+            "video": [],  # List of video sources
            "local": [],  # List of local sources (docs or code)
        }

        # Track source index for unique naming (multi-source support)
-        self._source_counters = {"documentation": 0, "github": 0, "pdf": 0, "word": 0, "local": 0}
+        self._source_counters = {
+            "documentation": 0,
+            "github": 0,
+            "pdf": 0,
+            "word": 0,
+            "video": 0,
+            "local": 0,
+        }

        # Output paths - cleaner organization
        self.name = self.config["name"]
@@ -154,6 +162,8 @@ class UnifiedScraper:
                    self._scrape_pdf(source)
                elif source_type == "word":
                    self._scrape_word(source)
+                elif source_type == "video":
+                    self._scrape_video(source)
                elif source_type == "local":
                    self._scrape_local(source)
                else:
@@ -576,6 +586,66 @@ class UnifiedScraper:

        logger.info(f"✅ Word: {len(word_data.get('pages', []))} sections extracted")

+    def _scrape_video(self, source: dict[str, Any]):
+        """Scrape video source (YouTube, local file, etc.)."""
+        try:
+            from skill_seekers.cli.video_scraper import VideoToSkillConverter
+        except ImportError as e:
+            logger.error(
+                f"Video scraper dependencies not installed: {e}\n"
+                '  Install with: pip install "skill-seekers[video]"\n'
+                '  For visual extraction (frame analysis, OCR): pip install "skill-seekers[video-full]"'
+            )
+            return
+
+        # Multi-source support: Get unique index for this video source
+        idx = self._source_counters["video"]
+        self._source_counters["video"] += 1
+
+        # Determine video identifier
+        video_url = source.get("url", "")
+        video_id = video_url or source.get("path", f"video_{idx}")
+
+        # Create config for video scraper
+        video_config = {
+            "name": f"{self.name}_video_{idx}",
+            "url": source.get("url"),
+            "video_file": source.get("path"),
+            "playlist": source.get("playlist"),
+            "description": source.get("description", ""),
+            "languages": ",".join(source.get("languages", ["en"])),
+            "visual": source.get("visual_extraction", False),
+            "whisper_model": source.get("whisper_model", "base"),
+        }
+
+        # Process video
+        logger.info(f"Scraping video: {video_id}")
+        converter = VideoToSkillConverter(video_config)
+
+        try:
+            result = converter.process()
+            converter.save_extracted_data()
+
+            # Append to list
+            self.scraped_data["video"].append(
+                {
+                    "video_id": video_id,
+                    "idx": idx,
+                    "data": result.to_dict(),
+                    "data_file": converter.data_file,
+                }
+            )
+
+            # Build standalone SKILL.md for synthesis
+            converter.build_skill()
+            logger.info("✅ Video: Standalone SKILL.md created")
+
+            logger.info(
+                f"✅ Video: {len(result.videos)} videos, {result.total_segments} segments extracted"
+            )
+        except Exception as e:
+            logger.error(f"Failed to process video source: {e}")
+
    def _scrape_local(self, source: dict[str, Any]):
        """
        Scrape local directory (documentation files or source code).
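`_scrape_video()` translates a unified-config source entry into the flat config that `VideoToSkillConverter` expects. The sketch below reproduces that mapping as a pure function, so it can be checked without the video dependencies installed. The string-to-list handling for `languages` is an addition and is not part of the diff. As written, `",".join("en")` would produce `"e,n"` if a config supplied a bare string.

```python
from typing import Any


def build_video_config(source: dict[str, Any], skill_name: str, idx: int) -> dict[str, Any]:
    """Mirror of the source -> video_config mapping in _scrape_video()."""
    languages = source.get("languages", ["en"])
    if isinstance(languages, str):
        # Assumption: also accept a single language string, not only a list.
        languages = [languages]
    return {
        "name": f"{skill_name}_video_{idx}",
        "url": source.get("url"),
        "video_file": source.get("path"),
        "playlist": source.get("playlist"),
        "description": source.get("description", ""),
        "languages": ",".join(languages),
        "visual": source.get("visual_extraction", False),
        "whisper_model": source.get("whisper_model", "base"),
    }
```

A source entry that uses only the keys read above could be `{"url": "https://www.youtube.com/watch?v=VIDEO_ID", "languages": ["en", "es"], "visual_extraction": true}`. Any other fields a real config carries, such as its source type, are outside this excerpt.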
--- a/src/skill_seekers/cli/utils.py
+++ b/src/skill_seekers/cli/utils.py
@@ -289,6 +289,10 @@ def read_reference_files(
            else:
                return "codebase_analysis", "medium", repo_id

+        # Video tutorial sources (video_*.md from video scraper)
+        elif relative_path.name.startswith("video_"):
+            return "video_tutorial", "high", None
+
        # Conflicts report (discrepancy detection)
        elif "conflicts" in path_str:
            return "conflicts", "medium", None
--- a/src/skill_seekers/cli/video_metadata.py
+++ b/src/skill_seekers/cli/video_metadata.py
@@ -0,0 +1,270 @@
+"""Video metadata extraction module.
+
+Uses yt-dlp for metadata extraction without downloading video content.
+Supports YouTube, Vimeo, and local video files.
+"""
+
+import hashlib
+import logging
+import os
+import re
+
+from skill_seekers.cli.video_models import (
+    Chapter,
+    VideoInfo,
+    VideoSourceType,
+)
+
+logger = logging.getLogger(__name__)
+
+# Optional dependency: yt-dlp
+try:
+    import yt_dlp
+
+    HAS_YTDLP = True
+except ImportError:
+    HAS_YTDLP = False
+
+
+# =============================================================================
+# Video ID Extraction
+# =============================================================================
+
+
+# YouTube URL patterns
+YOUTUBE_PATTERNS = [
+    re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/watch\?v=([a-zA-Z0-9_-]{11})"),
+    re.compile(r"(?:https?://)?youtu\.be/([a-zA-Z0-9_-]{11})"),
+    re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/embed/([a-zA-Z0-9_-]{11})"),
+    re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/v/([a-zA-Z0-9_-]{11})"),
+    re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/shorts/([a-zA-Z0-9_-]{11})"),
+]
+
+YOUTUBE_PLAYLIST_PATTERN = re.compile(
+    r"(?:https?://)?(?:www\.)?youtube\.com/playlist\?list=([a-zA-Z0-9_-]+)"
+)
+
+YOUTUBE_CHANNEL_PATTERNS = [
+    re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/@([a-zA-Z0-9_-]+)"),
+    re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/channel/([a-zA-Z0-9_-]+)"),
+    re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/c/([a-zA-Z0-9_-]+)"),
+]
+
+VIMEO_PATTERN = re.compile(r"(?:https?://)?(?:www\.)?vimeo\.com/(\d+)")
+
+
+def extract_video_id(url: str) -> str | None:
+    """Extract YouTube video ID from various URL formats.
+
+    Args:
+        url: YouTube URL in any supported format.
+
+    Returns:
+        11-character video ID, or None if not a YouTube URL.
+    """
+    for pattern in YOUTUBE_PATTERNS:
+        match = pattern.search(url)
+        if match:
+            return match.group(1)
+    return None
+
+
+def detect_video_source_type(url_or_path: str) -> VideoSourceType:
+    """Detect the source type of a video URL or file path.
+
+    Args:
+        url_or_path: URL or local file path.
+
+    Returns:
+        VideoSourceType enum value. Unrecognized inputs fall back to LOCAL_FILE.
+    """
+    if os.path.isfile(url_or_path):
+        return VideoSourceType.LOCAL_FILE
+    if os.path.isdir(url_or_path):
+        return VideoSourceType.LOCAL_DIRECTORY
+
+    url_lower = url_or_path.lower()
+    if "youtube.com" in url_lower or "youtu.be" in url_lower:
+        return VideoSourceType.YOUTUBE
+    if "vimeo.com" in url_lower:
+        return VideoSourceType.VIMEO
+
+    return VideoSourceType.LOCAL_FILE
+
+
+# =============================================================================
+# YouTube Metadata via yt-dlp
+# =============================================================================
+
+
+def _check_ytdlp():
+    """Raise RuntimeError if yt-dlp is not installed."""
+    if not HAS_YTDLP:
+        raise RuntimeError(
+            "yt-dlp is required for video metadata extraction.\n"
+            'Install with: pip install "skill-seekers[video]"\n'
+            "Or: pip install yt-dlp"
+        )
+
+
+def extract_youtube_metadata(url: str) -> VideoInfo:
+    """Extract metadata from a YouTube video URL without downloading.
+
+    Args:
+        url: YouTube video URL.
+
+    Returns:
+        VideoInfo with metadata populated.
+
+    Raises:
+        RuntimeError: If yt-dlp is not installed.
+    """
+    _check_ytdlp()
+
+    ydl_opts = {
+        "quiet": True,
+        "no_warnings": True,
+        "extract_flat": False,
+        "skip_download": True,
+    }
+
+    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+        info = ydl.extract_info(url, download=False)
+
+    video_id = info.get("id", extract_video_id(url) or "unknown")
+
+    # Parse chapters
+    chapters = []
+    raw_chapters = info.get("chapters") or []
+    for i, ch in enumerate(raw_chapters):
+        end_time = ch.get("end_time", 0)
+        if i + 1 < len(raw_chapters):
+            end_time = raw_chapters[i + 1].get("start_time", end_time)
+        chapters.append(
+            Chapter(
+                title=ch.get("title", f"Chapter {i + 1}"),
+                start_time=ch.get("start_time", 0),
+                end_time=end_time,
+            )
+        )
+
+    return VideoInfo(
+        video_id=video_id,
+        source_type=VideoSourceType.YOUTUBE,
+        source_url=url,
+        title=info.get("title", ""),
+        description=info.get("description", ""),
+        duration=float(info.get("duration") or 0),  # may be None (e.g. live streams)
+        upload_date=info.get("upload_date"),
+        language=info.get("language") or "en",
+        channel_name=info.get("channel") or info.get("uploader"),
+        channel_url=info.get("channel_url") or info.get("uploader_url"),
+        view_count=info.get("view_count"),
+        like_count=info.get("like_count"),
+        comment_count=info.get("comment_count"),
+        tags=info.get("tags") or [],
+        categories=info.get("categories") or [],
+        thumbnail_url=info.get("thumbnail"),
+        chapters=chapters,
+    )
+
+
+def extract_local_metadata(file_path: str) -> VideoInfo:
+    """Extract basic metadata from a local video file.
+
+    Args:
+        file_path: Path to video file.
+
+    Returns:
+        VideoInfo with basic metadata from filename/file properties.
+    """
+    path = os.path.abspath(file_path)
+    name = os.path.splitext(os.path.basename(path))[0]
+    video_id = hashlib.sha256(path.encode()).hexdigest()[:16]
+
+    return VideoInfo(
+        video_id=video_id,
+        source_type=VideoSourceType.LOCAL_FILE,
+        file_path=path,
+        title=name.replace("-", " ").replace("_", " ").title(),
+        duration=0.0,  # Would need ffprobe for accurate duration
+    )
+
+
+# =============================================================================
+# Playlist / Channel Resolution
+# =============================================================================
+
+
+def resolve_playlist(url: str) -> list[str]:
+    """Resolve a YouTube playlist URL to a list of video URLs.
+
+    Args:
+        url: YouTube playlist URL.
+
+    Returns:
+        List of video URLs in playlist order.
+
+    Raises:
+        RuntimeError: If yt-dlp is not installed.
+    """
+    _check_ytdlp()
+
+    ydl_opts = {
+        "quiet": True,
+        "no_warnings": True,
+        "extract_flat": True,
+        "skip_download": True,
+    }
+
+    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+        info = ydl.extract_info(url, download=False)
+
+    entries = info.get("entries") or []
+    video_urls = []
+    for entry in entries:
+        vid_url = entry.get("url") or entry.get("webpage_url")
+        if vid_url:
+            video_urls.append(vid_url)
+        elif entry.get("id"):
+            video_urls.append(f"https://www.youtube.com/watch?v={entry['id']}")
+
+    return video_urls
+
+
+def resolve_channel(url: str, max_videos: int = 50) -> list[str]:
+    """Resolve a YouTube channel URL to a list of recent video URLs.
+
+    Args:
+        url: YouTube channel URL.
+        max_videos: Maximum number of videos to resolve.
+
+    Returns:
+        List of video URLs (most recent first).
+
+    Raises:
+        RuntimeError: If yt-dlp is not installed.
+    """
+    _check_ytdlp()
+
+    ydl_opts = {
+        "quiet": True,
+        "no_warnings": True,
+        "extract_flat": True,
+        "skip_download": True,
+        "playlistend": max_videos,
+    }
+
+    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+        info = ydl.extract_info(url, download=False)
+
+    entries = info.get("entries") or []
+    video_urls = []
+    for entry in entries:
+        vid_url = entry.get("url") or entry.get("webpage_url")
+        if vid_url:
+            video_urls.append(vid_url)
+        elif entry.get("id"):
+            video_urls.append(f"https://www.youtube.com/watch?v={entry['id']}")
+
+    return video_urls[:max_videos]
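Two parts of `video_metadata.py` can be exercised offline on plain dicts shaped like yt-dlp output. The first is the chapter normalization in `extract_youtube_metadata()`, where each chapter ends at the point the next one starts. The second is the entry-to-URL loop that `resolve_playlist()` and `resolve_channel()` currently duplicate. The sketch below assumes a hypothetical shared helper, `entries_to_urls()`. Skipping empty or `None` entries is a defensive assumption and is not something the diff does.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Chapter:  # same fields as video_models.Chapter
    title: str
    start_time: float
    end_time: float


def parse_chapters(raw_chapters: list[dict[str, Any]] | None) -> list[Chapter]:
    """Same logic as extract_youtube_metadata(): a chapter ends where the next starts."""
    raw = raw_chapters or []
    chapters = []
    for i, ch in enumerate(raw):
        end_time = ch.get("end_time", 0)
        if i + 1 < len(raw):
            end_time = raw[i + 1].get("start_time", end_time)
        chapters.append(
            Chapter(
                title=ch.get("title", f"Chapter {i + 1}"),
                start_time=ch.get("start_time", 0),
                end_time=end_time,
            )
        )
    return chapters


def entries_to_urls(entries: list[dict[str, Any] | None], limit: int | None = None) -> list[str]:
    """Hypothetical shared helper for resolve_playlist() and resolve_channel()."""
    urls = []
    for entry in entries:
        if not entry:  # assumption: defensively skip None/empty entries
            continue
        url = entry.get("url") or entry.get("webpage_url")
        if url:
            urls.append(url)
        elif entry.get("id"):
            urls.append(f"https://www.youtube.com/watch?v={entry['id']}")
    return urls[:limit]  # slicing with None keeps the whole list
```

With the loop factored out, `resolve_channel()` could end with `return entries_to_urls(entries, limit=max_videos)`.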
--- a/src/skill_seekers/cli/video_models.py
+++ b/src/skill_seekers/cli/video_models.py
@@ -0,0 +1,848 @@
+"""Video source data models and type definitions.
+
+Defines all enumerations and dataclasses for the video extraction pipeline:
+- Enums: VideoSourceType, TranscriptSource, FrameType, CodeContext, SegmentContentType,
+  SegmentationStrategy
+- Core: VideoInfo, VideoSegment, VideoScraperResult
+- Supporting: Chapter, TranscriptSegment, WordTimestamp, KeyFrame, OCRRegion,
+  FrameSubSection, CodeBlock
+- Config: VideoSourceConfig
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from enum import Enum
+from typing import Any
+
+
+# =============================================================================
+# Enumerations
+# =============================================================================
+
+
+class VideoSourceType(Enum):
+    """Where a video came from."""
+
+    YOUTUBE = "youtube"
+    VIMEO = "vimeo"
+    LOCAL_FILE = "local_file"
+    LOCAL_DIRECTORY = "local_directory"
+
+
+class TranscriptSource(Enum):
+    """How the transcript was obtained."""
+
+    YOUTUBE_MANUAL = "youtube_manual"
+    YOUTUBE_AUTO = "youtube_auto_generated"
+    WHISPER = "whisper"
+    SUBTITLE_FILE = "subtitle_file"
+    NONE = "none"
+
+
+class FrameType(Enum):
+    """Classification of a keyframe's visual content."""
+
+    CODE_EDITOR = "code_editor"
+    TERMINAL = "terminal"
+    SLIDE = "slide"
+    DIAGRAM = "diagram"
+    BROWSER = "browser"
+    WEBCAM = "webcam"
+    SCREENCAST = "screencast"
+    OTHER = "other"
+
+
+class CodeContext(Enum):
+    """Where code was displayed in the video."""
+
+    EDITOR = "editor"
+    TERMINAL = "terminal"
+    SLIDE = "slide"
+    BROWSER = "browser"
+    UNKNOWN = "unknown"
+
+
+class SegmentContentType(Enum):
+    """Primary content type of a video segment."""
+
+    EXPLANATION = "explanation"
+    LIVE_CODING = "live_coding"
+    DEMO = "demo"
+    SLIDES = "slides"
+    Q_AND_A = "q_and_a"
+    INTRO = "intro"
+    OUTRO = "outro"
+    MIXED = "mixed"
+
+
+class SegmentationStrategy(Enum):
+    """How segments are determined."""
+
+    CHAPTERS = "chapters"
+    TIME_WINDOW = "time_window"
+    SCENE_CHANGE = "scene_change"
+    HYBRID = "hybrid"
+
+
+# =============================================================================
+# Supporting Data Classes
+# =============================================================================
+
+
+@dataclass(frozen=True)
+class Chapter:
+    """A chapter marker from a video (typically YouTube)."""
+
+    title: str
+    start_time: float
+    end_time: float
+
+    @property
+    def duration(self) -> float:
+        return self.end_time - self.start_time
+
+    def to_dict(self) -> dict:
+        return {
+            "title": self.title,
+            "start_time": self.start_time,
+            "end_time": self.end_time,
+        }
+
+    @classmethod
+    def from_dict(cls, data: dict) -> Chapter:
+        return cls(
+            title=data["title"],
+            start_time=data["start_time"],
+            end_time=data["end_time"],
+        )
+
+
+@dataclass(frozen=True)
+class WordTimestamp:
+    """A single word with precise timing information."""
+
+    word: str
+    start: float
+    end: float
+    probability: float = 1.0
+
+    def to_dict(self) -> dict:
+        return {
+            "word": self.word,
+            "start": self.start,
+            "end": self.end,
+            "probability": self.probability,
+        }
+
+    @classmethod
+    def from_dict(cls, data: dict) -> WordTimestamp:
+        return cls(
+            word=data["word"],
+            start=data["start"],
+            end=data["end"],
+            probability=data.get("probability", 1.0),
+        )
+
+
+@dataclass(frozen=True)
+class TranscriptSegment:
+    """A raw transcript segment from the YouTube API or Whisper."""
+
+    text: str
+    start: float
+    end: float
+    confidence: float = 1.0
+    words: list[WordTimestamp] | None = None
+    source: TranscriptSource = TranscriptSource.NONE
+
+    def to_dict(self) -> dict:
+        return {
+            "text": self.text,
+            "start": self.start,
+            "end": self.end,
+            "confidence": self.confidence,
+            "words": [w.to_dict() for w in self.words] if self.words else None,
+            "source": self.source.value,
+        }
+
+    @classmethod
+    def from_dict(cls, data: dict) -> TranscriptSegment:
+        words = None
+        if data.get("words"):
+            words = [WordTimestamp.from_dict(w) for w in data["words"]]
+        return cls(
+            text=data["text"],
+            start=data["start"],
+            end=data["end"],
+            confidence=data.get("confidence", 1.0),
+            words=words,
+            source=TranscriptSource(data.get("source", "none")),
+        )
+
+
+@dataclass(frozen=True)
+class OCRRegion:
+    """A detected text region in a video frame."""
+
+    text: str
+    confidence: float
+    bbox: tuple[int, int, int, int]
+    is_monospace: bool = False
+
+    def to_dict(self) -> dict:
+        return {
+            "text": self.text,
+            "confidence": self.confidence,
+            "bbox": list(self.bbox),
+            "is_monospace": self.is_monospace,
+        }
+
+    @classmethod
+    def from_dict(cls, data: dict) -> OCRRegion:
+        return cls(
+            text=data["text"],
+            confidence=data["confidence"],
+            bbox=tuple(data["bbox"]),
+            is_monospace=data.get("is_monospace", False),
+        )
+
+
+@dataclass
+class FrameSubSection:
+    """A single panel/region within a video frame, OCR'd independently.
+
+    Each IDE panel (e.g. code editor, terminal, file tree) is detected
+    as a separate sub-section so that side-by-side editors produce
+    independent OCR results instead of being merged into one blob.
+    """
+
+    bbox: tuple[int, int, int, int]  # (x1, y1, x2, y2)
+    frame_type: FrameType = FrameType.OTHER
+    ocr_text: str = ""
+    ocr_regions: list[OCRRegion] = field(default_factory=list)
+    ocr_confidence: float = 0.0
+    panel_id: str = ""  # e.g. "panel_0_0" (row_col)
+    _vision_used: bool = False  # Whether a Vision API was used for OCR (runtime only; not serialized)
+
+    def to_dict(self) -> dict:
+        return {
+            "bbox": list(self.bbox),
+            "frame_type": self.frame_type.value,
+            "ocr_text": self.ocr_text,
+            "ocr_regions": [r.to_dict() for r in self.ocr_regions],
+            "ocr_confidence": self.ocr_confidence,
+            "panel_id": self.panel_id,
+        }
+
+    @classmethod
+    def from_dict(cls, data: dict) -> FrameSubSection:
+        return cls(
+            bbox=tuple(data["bbox"]),
+            frame_type=FrameType(data.get("frame_type", "other")),
+            ocr_text=data.get("ocr_text", ""),
+            ocr_regions=[OCRRegion.from_dict(r) for r in data.get("ocr_regions", [])],
+            ocr_confidence=data.get("ocr_confidence", 0.0),
+            panel_id=data.get("panel_id", ""),
+        )
+
+
+@dataclass
+class KeyFrame:
+    """An extracted video frame with visual analysis results."""
+
+    timestamp: float
+    image_path: str
+    frame_type: FrameType = FrameType.OTHER
+    scene_change_score: float = 0.0
+    ocr_regions: list[OCRRegion] = field(default_factory=list)
+    ocr_text: str = ""
+    ocr_confidence: float = 0.0
+    width: int = 0
+    height: int = 0
+    sub_sections: list[FrameSubSection] = field(default_factory=list)
+
+    def to_dict(self) -> dict:
+        return {
+            "timestamp": self.timestamp,
+            "image_path": self.image_path,
+            "frame_type": self.frame_type.value,
+            "scene_change_score": self.scene_change_score,
+            "ocr_regions": [r.to_dict() for r in self.ocr_regions],
+            "ocr_text": self.ocr_text,
+            "ocr_confidence": self.ocr_confidence,
+            "width": self.width,
+            "height": self.height,
+            "sub_sections": [ss.to_dict() for ss in self.sub_sections],
+        }
+
+    @classmethod
+    def from_dict(cls, data: dict) -> KeyFrame:
+        return cls(
+            timestamp=data["timestamp"],
+            image_path=data["image_path"],
+            frame_type=FrameType(data.get("frame_type", "other")),
+            scene_change_score=data.get("scene_change_score", 0.0),
+            ocr_regions=[OCRRegion.from_dict(r) for r in data.get("ocr_regions", [])],
+            ocr_text=data.get("ocr_text", ""),
+            ocr_confidence=data.get("ocr_confidence", 0.0),
+            width=data.get("width", 0),
+            height=data.get("height", 0),
+            sub_sections=[FrameSubSection.from_dict(ss) for ss in data.get("sub_sections", [])],
+        )
+
+
+@dataclass
+class CodeBlock:
+    """A code block detected via OCR from video frames."""
+
+    code: str
+    language: str | None = None
+    source_frame: float = 0.0
+    context: CodeContext = CodeContext.UNKNOWN
+    confidence: float = 0.0
+    text_group_id: str = ""
+
+    def to_dict(self) -> dict:
+        return {
+            "code": self.code,
+            "language": self.language,
+            "source_frame": self.source_frame,
+            "context": self.context.value,
+            "confidence": self.confidence,
+            "text_group_id": self.text_group_id,
+        }
+
+    @classmethod
+    def from_dict(cls, data: dict) -> CodeBlock:
+        return cls(
+            code=data["code"],
+            language=data.get("language"),
+            source_frame=data.get("source_frame", 0.0),
+            context=CodeContext(data.get("context", "unknown")),
+            confidence=data.get("confidence", 0.0),
+            text_group_id=data.get("text_group_id", ""),
+        )
+
+
+@dataclass
+class TextGroupEdit:
+    """Represents an edit detected between appearances of a text group."""
+
+    timestamp: float
+    added_lines: list[str] = field(default_factory=list)
+    removed_lines: list[str] = field(default_factory=list)
+    modified_lines: list[dict] = field(default_factory=list)
+
+    def to_dict(self) -> dict:
+        return {
+            "timestamp": self.timestamp,
+            "added_lines": self.added_lines,
+            "removed_lines": self.removed_lines,
+            "modified_lines": self.modified_lines,
+        }
+
+    @classmethod
+    def from_dict(cls, data: dict) -> TextGroupEdit:
+        return cls(
+            timestamp=data["timestamp"],
+            added_lines=data.get("added_lines", []),
+            removed_lines=data.get("removed_lines", []),
+            modified_lines=data.get("modified_lines", []),
+        )
+
+
+@dataclass
+class TextGroup:
+    """A group of related text blocks tracked across the video.
+
+    Represents a single code file/snippet as it appears and evolves
+    across multiple video frames.
+    """
+
+    group_id: str
+    appearances: list[tuple[float, float]] = field(default_factory=list)
+    consensus_lines: list[dict] = field(default_factory=list)
+    edits: list[TextGroupEdit] = field(default_factory=list)
+    detected_language: str | None = None
+    frame_type: FrameType = FrameType.CODE_EDITOR
+    panel_id: str = ""  # Tracks which panel this group originated from
+
+    @property
+    def full_text(self) -> str:
+        return "\n".join(line["text"] for line in self.consensus_lines if line.get("text"))
+
+    def to_dict(self) -> dict:
+        return {
+            "group_id": self.group_id,
+            "appearances": [[s, e] for s, e in self.appearances],
+            "consensus_lines": self.consensus_lines,
+            "edits": [e.to_dict() for e in self.edits],
+            "detected_language": self.detected_language,
+            "frame_type": self.frame_type.value,
+            "panel_id": self.panel_id,
+            "full_text": self.full_text,
+        }
+
+    @classmethod
+    def from_dict(cls, data: dict) -> TextGroup:
+        return cls(
+            group_id=data["group_id"],
+            appearances=[tuple(a) for a in data.get("appearances", [])],
+            consensus_lines=data.get("consensus_lines", []),
+            edits=[TextGroupEdit.from_dict(e) for e in data.get("edits", [])],
+            detected_language=data.get("detected_language"),
+            frame_type=FrameType(data.get("frame_type", "code_editor")),
+            panel_id=data.get("panel_id", ""),
+        )
+
+
+@dataclass
+class TextGroupTimeline:
+    """Timeline of all text groups and their lifecycle in the video."""
+
+    text_groups: list[TextGroup] = field(default_factory=list)
+    total_code_time: float = 0.0
+    total_groups: int = 0
+    total_edits: int = 0
+
+    def get_groups_at_time(self, timestamp: float) -> list[TextGroup]:
+        """Return all text groups visible at a given timestamp."""
+        return [
+            tg
+            for tg in self.text_groups
+            if any(start <= timestamp <= end for start, end in tg.appearances)
+        ]
+
+    def to_dict(self) -> dict:
+        return {
+            "text_groups": [tg.to_dict() for tg in self.text_groups],
+            "total_code_time": self.total_code_time,
+            "total_groups": self.total_groups,
+            "total_edits": self.total_edits,
+        }
+
+    @classmethod
+    def from_dict(cls, data: dict) -> TextGroupTimeline:
+        return cls(
+            text_groups=[TextGroup.from_dict(tg) for tg in data.get("text_groups", [])],
+            total_code_time=data.get("total_code_time", 0.0),
+            total_groups=data.get("total_groups", 0),
+            total_edits=data.get("total_edits", 0),
+        )
+
+
+@dataclass
+class AudioVisualAlignment:
+    """Links on-screen code with concurrent transcript narration."""
+
+    text_group_id: str
+    start_time: float
+    end_time: float
+    on_screen_code: str
+    transcript_during: str
+    language: str | None = None
+
+    def to_dict(self) -> dict:
+        return {
+            "text_group_id": self.text_group_id,
+            "start_time": self.start_time,
+            "end_time": self.end_time,
+            "on_screen_code": self.on_screen_code,
+            "transcript_during": self.transcript_during,
+            "language": self.language,
+        }
+
+    @classmethod
+    def from_dict(cls, data: dict) -> AudioVisualAlignment:
+        return cls(
+            text_group_id=data["text_group_id"],
+            start_time=data["start_time"],
+            end_time=data["end_time"],
+            on_screen_code=data["on_screen_code"],
+            transcript_during=data.get("transcript_during", ""),
+            language=data.get("language"),
+        )
+
+
+# =============================================================================
+# Core Data Classes
+# =============================================================================
+
+
+@dataclass
+class VideoSegment:
+    """A time-aligned segment combining transcript + visual + metadata."""
+
+    index: int
+    start_time: float
+    end_time: float
+    duration: float
+
+    # Stream 1: ASR (Audio)
+    transcript: str = ""
+    words: list[WordTimestamp] = field(default_factory=list)
+    transcript_confidence: float = 0.0
+
+    # Stream 2: OCR (Visual)
+    keyframes: list[KeyFrame] = field(default_factory=list)
+    ocr_text: str = ""
+    detected_code_blocks: list[CodeBlock] = field(default_factory=list)
+    has_code_on_screen: bool = False
+    has_slides: bool = False
+    has_diagram: bool = False
+
+    # Stream 3: Metadata
+    chapter_title: str | None = None
+    topic: str | None = None
+    category: str | None = None
+
+    # Merged content
+    content: str = ""
+    summary: str | None = None
+
+    # Quality metadata
+    confidence: float = 0.0
+    content_type: SegmentContentType = SegmentContentType.MIXED
+
+    def to_dict(self) -> dict:
+        return {
+            "index": self.index,
+            "start_time": self.start_time,
+            "end_time": self.end_time,
+            "duration": self.duration,
+            "transcript": self.transcript,
+            "words": [w.to_dict() for w in self.words],
+            "transcript_confidence": self.transcript_confidence,
+            "keyframes": [k.to_dict() for k in self.keyframes],
+            "ocr_text": self.ocr_text,
+            "detected_code_blocks": [c.to_dict() for c in self.detected_code_blocks],
+            "has_code_on_screen": self.has_code_on_screen,
+            "has_slides": self.has_slides,
+            "has_diagram": self.has_diagram,
+            "chapter_title": self.chapter_title,
+            "topic": self.topic,
+            "category": self.category,
+            "content": self.content,
+            "summary": self.summary,
+            "confidence": self.confidence,
+            "content_type": self.content_type.value,
+        }
+
+    @classmethod
+    def from_dict(cls, data: dict) -> VideoSegment:
+        return cls(
+            index=data["index"],
+            start_time=data["start_time"],
+            end_time=data["end_time"],
+            duration=data["duration"],
+            transcript=data.get("transcript", ""),
+            words=[WordTimestamp.from_dict(w) for w in data.get("words", [])],
+            transcript_confidence=data.get("transcript_confidence", 0.0),
+            keyframes=[KeyFrame.from_dict(k) for k in data.get("keyframes", [])],
+            ocr_text=data.get("ocr_text", ""),
+            detected_code_blocks=[
+                CodeBlock.from_dict(c) for c in data.get("detected_code_blocks", [])
+            ],
+            has_code_on_screen=data.get("has_code_on_screen", False),
+            has_slides=data.get("has_slides", False),
+            has_diagram=data.get("has_diagram", False),
+            chapter_title=data.get("chapter_title"),
+            topic=data.get("topic"),
+            category=data.get("category"),
+            content=data.get("content", ""),
+            summary=data.get("summary"),
+            confidence=data.get("confidence", 0.0),
+            content_type=SegmentContentType(data.get("content_type", "mixed")),
+        )
+
+    @property
+    def timestamp_display(self) -> str:
+        """Human-readable range, e.g. '05:30 - 08:15' or, past one hour, '1:05:30 - 1:08:15'."""
+        start_min, start_sec = divmod(int(self.start_time), 60)
+        end_min, end_sec = divmod(int(self.end_time), 60)
+        if self.start_time >= 3600 or self.end_time >= 3600:
+            start_hr, start_min = divmod(start_min, 60)
+            end_hr, end_min = divmod(end_min, 60)
+            return f"{start_hr:d}:{start_min:02d}:{start_sec:02d} - {end_hr:d}:{end_min:02d}:{end_sec:02d}"
+        return f"{start_min:02d}:{start_sec:02d} - {end_min:02d}:{end_sec:02d}"
+
+
+@dataclass
+class VideoInfo:
+    """Complete metadata and extracted content for a single video."""
+
+    # Identity
+    video_id: str
+    source_type: VideoSourceType
+    source_url: str | None = None
+    file_path: str | None = None
+
+    # Basic metadata
+    title: str = ""
+    description: str = ""
+    duration: float = 0.0
+    upload_date: str | None = None
+    language: str = "en"
+
+    # Channel / Author
+    channel_name: str | None = None
+    channel_url: str | None = None
+
+    # Engagement metadata
+    view_count: int | None = None
+    like_count: int | None = None
+    comment_count: int | None = None
+
+    # Discovery metadata
+    tags: list[str] = field(default_factory=list)
+    categories: list[str] = field(default_factory=list)
+    thumbnail_url: str | None = None
+
+    # Structure
+    chapters: list[Chapter] = field(default_factory=list)
+
+    # Playlist context
+    playlist_title: str | None = None
+    playlist_index: int | None = None
+    playlist_total: int | None = None
+
+    # Extracted content
+    raw_transcript: list[TranscriptSegment] = field(default_factory=list)
+    segments: list[VideoSegment] = field(default_factory=list)
+
+    # Processing metadata
+    transcript_source: TranscriptSource = TranscriptSource.NONE
+    visual_extraction_enabled: bool = False
+    whisper_model: str | None = None
+    processing_time_seconds: float = 0.0
+    extracted_at: str = ""
+
+    # Quality scores
+    transcript_confidence: float = 0.0
+    content_richness_score: float = 0.0
+
+    # Time-clipping metadata (None when full video is used)
+    original_duration: float | None = None
+    clip_start: float | None = None
+    clip_end: float | None = None
+
+    # Consensus-based text tracking (Phase A-D)
+    text_group_timeline: TextGroupTimeline | None = None
+    audio_visual_alignments: list[AudioVisualAlignment] = field(default_factory=list)
+
+    def to_dict(self) -> dict:
+        return {
+            "video_id": self.video_id,
+            "source_type": self.source_type.value,
+            "source_url": self.source_url,
+            "file_path": self.file_path,
+            "title": self.title,
+            "description": self.description,
+            "duration": self.duration,
+            "upload_date": self.upload_date,
+            "language": self.language,
+            "channel_name": self.channel_name,
+            "channel_url": self.channel_url,
+            "view_count": self.view_count,
+            "like_count": self.like_count,
+            "comment_count": self.comment_count,
+            "tags": self.tags,
+            "categories": self.categories,
+            "thumbnail_url": self.thumbnail_url,
+            "chapters": [c.to_dict() for c in self.chapters],
+            "playlist_title": self.playlist_title,
+            "playlist_index": self.playlist_index,
+            "playlist_total": self.playlist_total,
+            "raw_transcript": [t.to_dict() for t in self.raw_transcript],
+            "segments": [s.to_dict() for s in self.segments],
+            "transcript_source": self.transcript_source.value,
+            "visual_extraction_enabled": self.visual_extraction_enabled,
+            "whisper_model": self.whisper_model,
+            "processing_time_seconds": self.processing_time_seconds,
+            "extracted_at": self.extracted_at,
+            "transcript_confidence": self.transcript_confidence,
+            "content_richness_score": self.content_richness_score,
+            "original_duration": self.original_duration,
+            "clip_start": self.clip_start,
+            "clip_end": self.clip_end,
+            "text_group_timeline": self.text_group_timeline.to_dict()
+            if self.text_group_timeline
+            else None,
+            "audio_visual_alignments": [a.to_dict() for a in self.audio_visual_alignments],
+        }
+
+    @classmethod
+    def from_dict(cls, data: dict) -> VideoInfo:
+        timeline_data = data.get("text_group_timeline")
+        timeline = TextGroupTimeline.from_dict(timeline_data) if timeline_data else None
+        return cls(
+            video_id=data["video_id"],
+            source_type=VideoSourceType(data["source_type"]),
+            source_url=data.get("source_url"),
+            file_path=data.get("file_path"),
+            title=data.get("title", ""),
+            description=data.get("description", ""),
+            duration=data.get("duration", 0.0),
+            upload_date=data.get("upload_date"),
+            language=data.get("language", "en"),
+            channel_name=data.get("channel_name"),
+            channel_url=data.get("channel_url"),
+            view_count=data.get("view_count"),
+            like_count=data.get("like_count"),
+            comment_count=data.get("comment_count"),
+            tags=data.get("tags", []),
+            categories=data.get("categories", []),
+            thumbnail_url=data.get("thumbnail_url"),
+            chapters=[Chapter.from_dict(c) for c in data.get("chapters", [])],
+            playlist_title=data.get("playlist_title"),
+            playlist_index=data.get("playlist_index"),
+            playlist_total=data.get("playlist_total"),
+            raw_transcript=[TranscriptSegment.from_dict(t) for t in data.get("raw_transcript", [])],
+            segments=[VideoSegment.from_dict(s) for s in data.get("segments", [])],
+            transcript_source=TranscriptSource(data.get("transcript_source", "none")),
+            visual_extraction_enabled=data.get("visual_extraction_enabled", False),
+            whisper_model=data.get("whisper_model"),
+            processing_time_seconds=data.get("processing_time_seconds", 0.0),
+            extracted_at=data.get("extracted_at", ""),
+            transcript_confidence=data.get("transcript_confidence", 0.0),
+            content_richness_score=data.get("content_richness_score", 0.0),
+            original_duration=data.get("original_duration"),
+            clip_start=data.get("clip_start"),
+            clip_end=data.get("clip_end"),
+            text_group_timeline=timeline,
+            audio_visual_alignments=[
+                AudioVisualAlignment.from_dict(a) for a in data.get("audio_visual_alignments", [])
+            ],
+        )
+
+
+@dataclass
+class VideoSourceConfig:
+    """Configuration for video source processing."""
+
+    # Source specification (exactly one must be set; enforced by validate())
+    url: str | None = None
+    playlist: str | None = None
+    channel: str | None = None
+    path: str | None = None
+    directory: str | None = None
+
+    # Identity
+    name: str = "video"
+    description: str = ""
+
+    # Filtering
+    max_videos: int = 50
+    languages: list[str] | None = None
+
+    # Extraction
+    visual_extraction: bool = False
+    whisper_model: str = "base"
+
+    # Segmentation
+    time_window_seconds: float = 120.0
+    min_segment_duration: float = 10.0
+    max_segment_duration: float = 600.0
+
+    # Categorization
+    categories: dict[str, list[str]] | None = None
+
+    # Subtitle files
+    subtitle_patterns: list[str] | None = None
+
+    # Time-clipping (single video only)
+    clip_start: float | None = None
+    clip_end: float | None = None
+
+    @classmethod
+    def from_dict(cls, data: dict) -> VideoSourceConfig:
+        return cls(
+            url=data.get("url"),
+            playlist=data.get("playlist"),
+            channel=data.get("channel"),
+            path=data.get("path"),
+            directory=data.get("directory"),
+            name=data.get("name", "video"),
+            description=data.get("description", ""),
+            max_videos=data.get("max_videos", 50),
+            languages=data.get("languages"),
+            visual_extraction=data.get("visual_extraction", False),
+            whisper_model=data.get("whisper_model", "base"),
+            time_window_seconds=data.get("time_window_seconds", 120.0),
+            min_segment_duration=data.get("min_segment_duration", 10.0),
+            max_segment_duration=data.get("max_segment_duration", 600.0),
+            categories=data.get("categories"),
+            subtitle_patterns=data.get("subtitle_patterns"),
+            clip_start=data.get("clip_start"),
+            clip_end=data.get("clip_end"),
+        )
+
+    def validate(self) -> list[str]:
+        """Validate the configuration and return a list of error messages (empty if valid)."""
+        errors = []
+        sources_set = sum(
+            1
+            for s in [self.url, self.playlist, self.channel, self.path, self.directory]
+            if s is not None
+        )
+        if sources_set == 0:
+            errors.append(
+                "Video source must specify one of: url, playlist, channel, path, directory"
+            )
+        if sources_set > 1:
+            errors.append("Video source must specify exactly one source type")
+
+        # Clip range validation
+        has_clip = self.clip_start is not None or self.clip_end is not None
+        if has_clip and any(s is not None for s in (self.playlist, self.channel, self.directory)):
+            errors.append(
+                "--start-time/--end-time cannot be used with --playlist, channel, or directory "
+                "sources. Clip range is for single videos only."
+            )
+        if (
+            self.clip_start is not None
+            and self.clip_end is not None
+            and self.clip_start >= self.clip_end
+        ):
+            errors.append(
+                f"--start-time ({self.clip_start}s) must be before --end-time ({self.clip_end}s)"
+            )
+
+        return errors
+
+
+@dataclass
+class VideoScraperResult:
+    """Complete result from the video scraper."""
+
+    videos: list[VideoInfo] = field(default_factory=list)
+    total_duration_seconds: float = 0.0
+    total_segments: int = 0
+    total_code_blocks: int = 0
+    config: VideoSourceConfig | None = None
+    processing_time_seconds: float = 0.0
+    warnings: list[str] = field(default_factory=list)
+    errors: list[dict[str, Any]] = field(default_factory=list)
+
+    def to_dict(self) -> dict:
+        return {
+            "videos": [v.to_dict() for v in self.videos],
+            "total_duration_seconds": self.total_duration_seconds,
+            "total_segments": self.total_segments,
+            "total_code_blocks": self.total_code_blocks,
+            "processing_time_seconds": self.processing_time_seconds,
+            "warnings": self.warnings,
+            "errors": self.errors,
+        }
+
+    @classmethod
+    def from_dict(cls, data: dict) -> VideoScraperResult:
+        return cls(
+            videos=[VideoInfo.from_dict(v) for v in data.get("videos", [])],
+            total_duration_seconds=data.get("total_duration_seconds", 0.0),
+            total_segments=data.get("total_segments", 0),
+            total_code_blocks=data.get("total_code_blocks", 0),
+            processing_time_seconds=data.get("processing_time_seconds", 0.0),
+            warnings=data.get("warnings", []),
+            errors=data.get("errors", []),
+        )
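Every model in `video_models.py` follows the same serialization convention: `to_dict()` flattens enums to their `.value` strings and tuples to lists so the result is JSON-safe, and `from_dict()` reverses those conversions, falling back to field defaults for missing keys. The sketch below illustrates that round trip with trimmed, hypothetical stand-ins for `FrameType`, `OCRRegion`, and `KeyFrame` (the real classes carry more fields); it shows the pattern only and is not the module's code.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from enum import Enum


class FrameType(Enum):
    # Hypothetical trimmed stand-in: only two members kept for the sketch.
    CODE_EDITOR = "code_editor"
    OTHER = "other"


@dataclass(frozen=True)
class OCRRegion:
    text: str
    confidence: float
    bbox: tuple[int, int, int, int]

    def to_dict(self) -> dict:
        # JSON has no tuple type, so the bbox is written as a list.
        return {"text": self.text, "confidence": self.confidence, "bbox": list(self.bbox)}

    @classmethod
    def from_dict(cls, data: dict) -> OCRRegion:
        # Convert back to a tuple so equality with the original object holds.
        return cls(text=data["text"], confidence=data["confidence"], bbox=tuple(data["bbox"]))


@dataclass
class KeyFrame:
    timestamp: float
    frame_type: FrameType = FrameType.OTHER
    ocr_regions: list[OCRRegion] = field(default_factory=list)

    def to_dict(self) -> dict:
        return {
            "timestamp": self.timestamp,
            "frame_type": self.frame_type.value,  # enum -> plain string
            "ocr_regions": [r.to_dict() for r in self.ocr_regions],
        }

    @classmethod
    def from_dict(cls, data: dict) -> KeyFrame:
        return cls(
            timestamp=data["timestamp"],
            # Missing keys fall back to the same defaults as the fields.
            frame_type=FrameType(data.get("frame_type", "other")),
            ocr_regions=[OCRRegion.from_dict(r) for r in data.get("ocr_regions", [])],
        )


original = KeyFrame(
    timestamp=12.5,
    frame_type=FrameType.CODE_EDITOR,
    ocr_regions=[OCRRegion("def main():", 0.93, (10, 20, 300, 40))],
)
# Serialize to a JSON string and back, then rebuild the dataclasses.
restored = KeyFrame.from_dict(json.loads(json.dumps(original.to_dict())))
print(restored == original)  # True
```

Not every field survives the trip in the real module: `FrameSubSection._vision_used` is never written, `TextGroup.to_dict()` emits a derived `full_text` that `from_dict()` ignores, and `VideoScraperResult.to_dict()` omits `config`.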
--- a/src/skill_seekers/cli/video_scraper.py
+++ b/src/skill_seekers/cli/video_scraper.py
--- a/src/skill_seekers/cli/video_segmenter.py
+++ b/src/skill_seekers/cli/video_segmenter.py
@@ -0,0 +1,231 @@
+"""Video segmentation module.
+
+Aligns transcript + metadata into VideoSegment objects using:
+1. Chapter-based segmentation (primary — uses YouTube chapters)
+2. Time-window segmentation (fallback — fixed-duration windows)
+"""
+
+import logging
+
+from skill_seekers.cli.video_models import (
+    SegmentContentType,
+    TranscriptSegment,
+    VideoInfo,
+    VideoSegment,
+    VideoSourceConfig,
+)
+
+logger = logging.getLogger(__name__)
+
+
+def _classify_content_type(transcript: str) -> SegmentContentType:
+    """Classify content type via transcript keywords (outro, then intro, then code cues)."""
+    lower = transcript.lower()
+
+    code_indicators = ["import ", "def ", "class ", "function ", "const ", "npm ", "pip ", "git "]
+    intro_indicators = ["welcome", "hello", "today we", "in this video", "let's get started"]
+    outro_indicators = ["thanks for watching", "subscribe", "see you next", "that's it for"]
+
+    if any(kw in lower for kw in outro_indicators):
+        return SegmentContentType.OUTRO
+    if any(kw in lower for kw in intro_indicators):
+        return SegmentContentType.INTRO
+    if sum(1 for kw in code_indicators if kw in lower) >= 2:
+        return SegmentContentType.LIVE_CODING
+
+    return SegmentContentType.EXPLANATION
+
+
+def _build_segment_content(
+    transcript: str,
+    chapter_title: str | None,
+    start_time: float,
+    end_time: float,
+) -> str:
+    """Build merged content string for a segment."""
+    parts = []
+
+    # Heading: chapter title if available, else a generic label, plus the mm:ss range
+    start_min, start_sec = divmod(int(start_time), 60)
+    end_min, end_sec = divmod(int(end_time), 60)
+    ts = f"{start_min:02d}:{start_sec:02d} - {end_min:02d}:{end_sec:02d}"
+
+    if chapter_title:
+        parts.append(f"### {chapter_title} ({ts})\n")
+    else:
+        parts.append(f"### Segment ({ts})\n")
+
+    if transcript:
+        parts.append(transcript)
+
+    return "\n".join(parts)
+
+
+def _get_transcript_in_range(
+    transcript_segments: list[TranscriptSegment],
+    start_time: float,
+    end_time: float,
+) -> tuple[str, float]:
+    """Get concatenated transcript text and average confidence for a time range.
+
+    Returns:
+        Tuple of (text, avg_confidence).
+    """
+    texts = []
+    confidences = []
+
+    for seg in transcript_segments:
+        # Include partial overlaps; a segment spanning a boundary lands in both ranges
+        if seg.end > start_time and seg.start < end_time:
+            texts.append(seg.text)
+            confidences.append(seg.confidence)
+
+    text = " ".join(texts)
+    avg_confidence = sum(confidences) / len(confidences) if confidences else 0.0
+    return text, avg_confidence
+
+
+def segment_by_chapters(
+    video_info: VideoInfo,
+    transcript_segments: list[TranscriptSegment],
+) -> list[VideoSegment]:
+    """Segment video using YouTube chapter boundaries.
+
+    Args:
+        video_info: Video metadata with chapters.
+        transcript_segments: Raw transcript segments.
+
+    Returns:
+        List of VideoSegment objects aligned to chapters.
+    """
+    segments = []
+
+    for i, chapter in enumerate(video_info.chapters):
+        transcript, confidence = _get_transcript_in_range(
+            transcript_segments, chapter.start_time, chapter.end_time
+        )
+
+        content_type = _classify_content_type(transcript)
+        content = _build_segment_content(
+            transcript, chapter.title, chapter.start_time, chapter.end_time
+        )
+
+        segments.append(
+            VideoSegment(
+                index=i,
+                start_time=chapter.start_time,
+                end_time=chapter.end_time,
+                duration=chapter.end_time - chapter.start_time,
+                transcript=transcript,
+                transcript_confidence=confidence,
+                chapter_title=chapter.title,
+                content=content,
+                confidence=confidence,
+                content_type=content_type,
+            )
+        )
+
+    return segments
+
+
+def segment_by_time_window(
+    video_info: VideoInfo,
+    transcript_segments: list[TranscriptSegment],
+    window_seconds: float = 120.0,
+    start_offset: float = 0.0,
+    end_limit: float | None = None,
+) -> list[VideoSegment]:
+    """Segment video using fixed time windows.
+
+    Args:
+        video_info: Video metadata.
+        transcript_segments: Raw transcript segments.
+        window_seconds: Duration of each window in seconds.
+        start_offset: Start segmentation at this time (seconds).
+        end_limit: Stop segmentation at this time (seconds). None = full duration.
+
+    Returns:
+        List of VideoSegment objects.
+    """
+    segments = []
+    duration = video_info.duration
+
+    if duration <= 0 and transcript_segments:
+        duration = max(seg.end for seg in transcript_segments)
+
+    if end_limit is not None:
+        duration = min(duration, end_limit)
+
+    if duration <= 0 or window_seconds <= 0:  # a non-positive window would never advance
+        return segments
+
+    current_time = start_offset
+    index = 0
+
+    while current_time < duration:
+        end_time = min(current_time + window_seconds, duration)
+
+        transcript, confidence = _get_transcript_in_range(
+            transcript_segments, current_time, end_time
+        )
+
+        if transcript.strip():
+            content_type = _classify_content_type(transcript)
+            content = _build_segment_content(transcript, None, current_time, end_time)
+
+            segments.append(
+                VideoSegment(
+                    index=index,
+                    start_time=current_time,
+                    end_time=end_time,
+                    duration=end_time - current_time,
+                    transcript=transcript,
+                    transcript_confidence=confidence,
+                    content=content,
+                    confidence=confidence,
+                    content_type=content_type,
+                )
+            )
+            index += 1
+
+        current_time = end_time
+
+    return segments
+
+
+def segment_video(
+    video_info: VideoInfo,
+    transcript_segments: list[TranscriptSegment],
+    config: VideoSourceConfig,
+) -> list[VideoSegment]:
+    """Segment a video using the best available strategy.
+
+    Priority:
+    1. Chapter-based (if chapters available)
+    2. Time-window fallback
+
+    Args:
+        video_info: Video metadata.
+        transcript_segments: Raw transcript segments.
+        config: Video source configuration.
+
+    Returns:
+        List of VideoSegment objects.
+    """
+    # Use chapters if available
+    if video_info.chapters:
+        logger.info(f"Using chapter-based segmentation ({len(video_info.chapters)} chapters)")
+        segments = segment_by_chapters(video_info, transcript_segments)
+        if segments:
+            return segments
+
+    # Fallback to time-window
+    window = config.time_window_seconds
+    logger.info(f"Using time-window segmentation ({window}s windows)")
+    return segment_by_time_window(
+        video_info,
+        transcript_segments,
+        window,
+        start_offset=config.clip_start or 0.0,
+        end_limit=config.clip_end,
+    )
--- a/src/skill_seekers/cli/video_setup.py
+++ b/src/skill_seekers/cli/video_setup.py
@@ -0,0 +1,835 @@
+"""GPU auto-detection and video dependency installation.
+
+Detects NVIDIA (CUDA) or AMD (ROCm) GPUs using system tools (without
+requiring torch to be installed) and installs the correct PyTorch variant
+plus all visual extraction dependencies (easyocr, opencv, etc.).
+
+Also handles:
+- Virtual environment creation (if not already in one)
+- System dependency checks (tesseract binary)
+- ROCm environment variable configuration (MIOPEN_FIND_MODE)
+
+Usage:
+    skill-seekers video --setup          # Interactive: install all modules or choose individually
+    From MCP: run_setup(interactive=False)
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+import platform
+import re
+import shutil
+import subprocess
+import sys
+import venv
+from dataclasses import dataclass, field
+from enum import Enum
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+
+# =============================================================================
+# Data Structures
+# =============================================================================
+
+
+class GPUVendor(Enum):
+    """Detected GPU hardware vendor."""
+
+    NVIDIA = "nvidia"
+    AMD = "amd"
+    NONE = "none"
+
+
+@dataclass
+class GPUInfo:
+    """Result of GPU auto-detection."""
+
+    vendor: GPUVendor
+    name: str = ""
+    compute_version: str = ""
+    index_url: str = ""
+    details: list[str] = field(default_factory=list)
+
+
+@dataclass
+class SetupModules:
+    """Which modules to install during setup."""
+
+    torch: bool = True
+    easyocr: bool = True
+    opencv: bool = True
+    tesseract: bool = True
+    scenedetect: bool = True
+    whisper: bool = True
+
+
+# =============================================================================
+# PyTorch Index URL Mapping
+# =============================================================================
+
+_PYTORCH_BASE = "https://download.pytorch.org/whl"
+
+
+def _cuda_version_to_index_url(version: str) -> str:
+    """Map a CUDA version string to the correct PyTorch index URL."""
+    try:
+        parts = version.split(".")
+        major = int(parts[0])
+        minor = int(parts[1]) if len(parts) > 1 else 0
+        ver = major + minor / 10.0
+    except (ValueError, IndexError):
+        return f"{_PYTORCH_BASE}/cpu"
+
+    if ver >= 12.4:
+        return f"{_PYTORCH_BASE}/cu124"
+    if ver >= 12.1:
+        return f"{_PYTORCH_BASE}/cu121"
+    if ver >= 11.8:
+        return f"{_PYTORCH_BASE}/cu118"
+    return f"{_PYTORCH_BASE}/cpu"
+
+
+def _rocm_version_to_index_url(version: str) -> str:
+    """Map a ROCm version string to the correct PyTorch index URL."""
+    try:
+        parts = version.split(".")
+        major = int(parts[0])
+        minor = int(parts[1]) if len(parts) > 1 else 0
+        ver = major + minor / 10.0
+    except (ValueError, IndexError):
+        return f"{_PYTORCH_BASE}/cpu"
+
+    if ver >= 6.3:
+        return f"{_PYTORCH_BASE}/rocm6.3"
+    if ver >= 6.0:
+        return f"{_PYTORCH_BASE}/rocm6.2.4"
+    return f"{_PYTORCH_BASE}/cpu"
+
+
+# =============================================================================
+# GPU Detection (without torch)
+# =============================================================================
+
+
+def detect_gpu() -> GPUInfo:
+    """Detect GPU vendor and compute version using system tools.
+
+    Detection order:
+    1. nvidia-smi  -> NVIDIA + CUDA version
+    2. rocminfo    -> AMD + ROCm version
+    3. lspci       -> AMD GPU present but no ROCm (warn)
+    4. Fallback    -> CPU-only
+    """
+    # 1. Check NVIDIA
+    nvidia = _check_nvidia()
+    if nvidia is not None:
+        return nvidia
+
+    # 2. Check AMD ROCm
+    amd = _check_amd_rocm()
+    if amd is not None:
+        return amd
+
+    # 3. Check if AMD GPU exists but ROCm isn't installed
+    amd_no_rocm = _check_amd_lspci()
+    if amd_no_rocm is not None:
+        return amd_no_rocm
+
+    # 4. CPU fallback
+    return GPUInfo(
+        vendor=GPUVendor.NONE,
+        name="CPU-only",
+        index_url=f"{_PYTORCH_BASE}/cpu",
+        details=["No GPU detected, will use CPU-only PyTorch"],
+    )
+
+
+def _check_nvidia() -> GPUInfo | None:
+    """Detect NVIDIA GPU via nvidia-smi."""
+    if not shutil.which("nvidia-smi"):
+        return None
+    try:
+        result = subprocess.run(
+            ["nvidia-smi"],
+            capture_output=True,
+            text=True,
+            timeout=10,
+        )
+        if result.returncode != 0:
+            return None
+
+        output = result.stdout
+        # Parse CUDA version from "CUDA Version: X.Y"
+        cuda_match = re.search(r"CUDA Version:\s*(\d+\.\d+)", output)
+        cuda_ver = cuda_match.group(1) if cuda_match else ""
+
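+        # Parse GPU name from the table row, e.g. "|   0  NVIDIA GeForce RTX 4090   Off |".
+        # Rows start with the GPU index, so allow an optional "<index>  " before the name.
+        gpu_name = ""
+        name_match = re.search(r"\|\s+(?:\d+\s+)?(NVIDIA[^|]+?)\s+(?:On|Off)\s+\|", output)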
+        if name_match:
+            gpu_name = name_match.group(1).strip()
+
+        index_url = _cuda_version_to_index_url(cuda_ver) if cuda_ver else f"{_PYTORCH_BASE}/cpu"
+
+        return GPUInfo(
+            vendor=GPUVendor.NVIDIA,
+            name=gpu_name or "NVIDIA GPU",
+            compute_version=cuda_ver,
+            index_url=index_url,
+            details=[f"CUDA {cuda_ver}" if cuda_ver else "CUDA version unknown"],
+        )
+    except (subprocess.TimeoutExpired, OSError):
+        return None
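A self-contained sketch of the two regexes `_check_nvidia` applies to the `nvidia-smi` table. `SAMPLE` is an illustrative excerpt shaped like the default table layout, not a real capture, and `parse_nvidia_smi` is a hypothetical helper. Note the optional `(?:\d+\s+)?`: in this layout each GPU row begins with the device index.

```python
import re

# Illustrative excerpt shaped like nvidia-smi's default table (not a real capture)
SAMPLE = """\
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0  On |                  Off |
"""


def parse_nvidia_smi(output: str) -> tuple[str, str]:
    """Return (gpu_name, cuda_version); empty strings when not found."""
    cuda = re.search(r"CUDA Version:\s*(\d+\.\d+)", output)
    # Lazy name group stops at the first run of spaces followed by On/Off and '|'
    name = re.search(r"\|\s+(?:\d+\s+)?(NVIDIA[^|]+?)\s+(?:On|Off)\s+\|", output)
    return (
        name.group(1).strip() if name else "",
        cuda.group(1) if cuda else "",
    )
```

If table parsing proves brittle across driver versions, `nvidia-smi --query-gpu=name --format=csv,noheader` prints just the GPU names.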
+
+
+def _check_amd_rocm() -> GPUInfo | None:
+    """Detect AMD GPU via rocminfo."""
+    if not shutil.which("rocminfo"):
+        return None
+    try:
+        result = subprocess.run(
+            ["rocminfo"],
+            capture_output=True,
+            text=True,
+            timeout=10,
+        )
+        if result.returncode != 0:
+            return None
+
+        output = result.stdout
+        # Parse GPU name from "Name: gfx..." or "Marketing Name: ..."
+        gpu_name = ""
+        marketing_match = re.search(r"Marketing Name:\s*(.+)", output)
+        if marketing_match:
+            gpu_name = marketing_match.group(1).strip()
+
+        # Get ROCm version from /opt/rocm/.info/version
+        rocm_ver = _read_rocm_version()
+
+        index_url = _rocm_version_to_index_url(rocm_ver) if rocm_ver else f"{_PYTORCH_BASE}/cpu"
+
+        return GPUInfo(
+            vendor=GPUVendor.AMD,
+            name=gpu_name or "AMD GPU",
+            compute_version=rocm_ver,
+            index_url=index_url,
+            details=[f"ROCm {rocm_ver}" if rocm_ver else "ROCm version unknown"],
+        )
+    except (subprocess.TimeoutExpired, OSError):
+        return None
+
+
+def _read_rocm_version() -> str:
+    """Read ROCm version from /opt/rocm/.info/version."""
+    try:
+        with open("/opt/rocm/.info/version") as f:
+            return f.read().strip().split("-")[0]
+    except OSError:  # IOError is an alias of OSError in Python 3
+        return ""
+
+
+def _check_amd_lspci() -> GPUInfo | None:
+    """Detect AMD GPU via lspci when ROCm isn't installed."""
+    if not shutil.which("lspci"):
+        return None
+    try:
+        result = subprocess.run(
+            ["lspci"],
+            capture_output=True,
+            text=True,
+            timeout=10,
+        )
+        if result.returncode != 0:
+            return None
+
+        # Look for AMD/ATI VGA or Display controllers
+        for line in result.stdout.splitlines():
+            if ("VGA" in line or "Display" in line) and ("AMD" in line or "ATI" in line):
+                return GPUInfo(
+                    vendor=GPUVendor.AMD,
+                    name=line.split(":")[-1].strip() if ":" in line else "AMD GPU",
+                    compute_version="",
+                    index_url=f"{_PYTORCH_BASE}/cpu",
+                    details=[
+                        "AMD GPU detected but ROCm is not installed",
+                        "Install ROCm first for GPU acceleration: https://rocm.docs.amd.com/",
+                        "Falling back to CPU-only PyTorch",
+                    ],
+                )
+    except (subprocess.TimeoutExpired, OSError):
+        pass
+    return None
+
+
+# =============================================================================
+# Virtual Environment
+# =============================================================================
+
+
+def is_in_venv() -> bool:
+    """Check if the current Python process is running inside a venv."""
+    return sys.prefix != sys.base_prefix
+
+
+def create_venv(venv_path: str = ".venv") -> bool:
+    """Create a virtual environment and return True on success."""
+    path = Path(venv_path).resolve()
+    if path.exists():
+        logger.info(f"Venv already exists at {path}")
+        return True
+    try:
+        venv.create(str(path), with_pip=True)
+        return True
+    except Exception as exc:  # noqa: BLE001
+        logger.error(f"Failed to create venv: {exc}")
+        return False
+
+
+def get_venv_python(venv_path: str = ".venv") -> str:
+    """Return the python executable path inside a venv."""
+    path = Path(venv_path).resolve()
+    if platform.system() == "Windows":
+        return str(path / "Scripts" / "python.exe")
+    return str(path / "bin" / "python")
+
+
+def get_venv_activate_cmd(venv_path: str = ".venv") -> str:
+    """Return the shell command to activate the venv."""
+    path = Path(venv_path).resolve()
+    if platform.system() == "Windows":
+        return str(path / "Scripts" / "activate")
+    return f"source {path}/bin/activate"
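The venv helpers differ only in layout: `Scripts\` on Windows, `bin/` elsewhere. A sketch using pure path classes, so both layouts can be exercised from any OS without touching the filesystem; `venv_python_path` is a hypothetical helper that takes the `platform.system()` value as a parameter instead of reading it.

```python
from pathlib import PurePosixPath, PureWindowsPath


def venv_python_path(venv_dir: str, system: str) -> str:
    """Interpreter path inside a venv for a given platform.system() value."""
    if system == "Windows":
        return str(PureWindowsPath(venv_dir) / "Scripts" / "python.exe")
    # Linux, macOS ("Darwin") and other POSIX systems use bin/
    return str(PurePosixPath(venv_dir) / "bin" / "python")
```

On Windows the `Scripts\activate` path suits `cmd.exe` (it resolves to `activate.bat`); PowerShell users need `Scripts\Activate.ps1` instead.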
+
+
+# =============================================================================
+# System Dependency Checks
+# =============================================================================
+
+
+def _detect_distro() -> str:
+    """Detect Linux distro family for install command suggestions."""
+    try:
+        with open("/etc/os-release") as f:
+            content = f.read().lower()
+        if "arch" in content or "manjaro" in content or "endeavour" in content:
+            return "arch"
+        if "debian" in content or "ubuntu" in content or "mint" in content or "pop" in content:
+            return "debian"
+        if "fedora" in content or "rhel" in content or "centos" in content or "rocky" in content:
+            return "fedora"
+        if "opensuse" in content or "suse" in content:
+            return "suse"
+    except OSError:
+        pass
+    return "unknown"
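`_detect_distro` matches substrings anywhere in `/etc/os-release`, so short keys such as `"pop"` or `"arch"` could in principle hit unrelated text. A sketch of a stricter variant that reads only the `ID` and `ID_LIKE` fields; `parse_os_release`, `distro_family` and the family table are illustrative assumptions, not the module's code.

```python
def parse_os_release(text: str) -> dict[str, str]:
    """Parse KEY=value lines from os-release content, dropping optional quotes."""
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


# Assumed mapping from distro IDs to the package-manager families used above
_FAMILIES = {
    "arch": "arch", "manjaro": "arch", "endeavouros": "arch",
    "debian": "debian", "ubuntu": "debian",
    "fedora": "fedora", "rhel": "fedora", "centos": "fedora",
    "opensuse": "suse", "suse": "suse",
}


def distro_family(text: str) -> str:
    fields = parse_os_release(text)
    # ID names the distro itself; ID_LIKE lists the distros it derives from
    candidates = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    for ident in candidates:
        family = _FAMILIES.get(ident.lower())
        if family:
            return family
    return "unknown"
```

Derivatives such as Linux Mint or Rocky Linux are then recognised through `ID_LIKE` without listing each one.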
+
+
+def _get_tesseract_install_cmd() -> str:
+    """Return distro-specific command to install tesseract."""
+    distro = _detect_distro()
+    cmds = {
+        "arch": "sudo pacman -S tesseract tesseract-data-eng",
+        "debian": "sudo apt install tesseract-ocr tesseract-ocr-eng",
+        "fedora": "sudo dnf install tesseract tesseract-langpack-eng",
+        "suse": "sudo zypper install tesseract-ocr tesseract-ocr-traineddata-english",
+    }
+    return cmds.get(distro, "Install tesseract-ocr with your package manager")
+
+
+def check_tesseract() -> dict[str, bool | str]:
+    """Check if tesseract binary is installed and has English data.
+
+    Returns dict with keys: installed, has_eng, install_cmd, version.
+    """
+    result: dict[str, bool | str] = {
+        "installed": False,
+        "has_eng": False,
+        "install_cmd": _get_tesseract_install_cmd(),
+        "version": "",
+    }
+
+    tess_bin = shutil.which("tesseract")
+    if not tess_bin:
+        return result
+
+    result["installed"] = True
+
+    # Get version
+    try:
+        ver = subprocess.run(
+            ["tesseract", "--version"],
+            capture_output=True,
+            text=True,
+            timeout=5,
+        )
+        first_line = (ver.stdout or ver.stderr).split("\n")[0]
+        result["version"] = first_line.strip()
+    except (subprocess.TimeoutExpired, OSError):
+        pass
+
+    # Check for eng language data
+    try:
+        langs = subprocess.run(
+            ["tesseract", "--list-langs"],
+            capture_output=True,
+            text=True,
+            timeout=5,
+        )
+        output = langs.stdout + langs.stderr
+        result["has_eng"] = "eng" in output.split()
+    except (subprocess.TimeoutExpired, OSError):
+        pass
+
+    return result
+
+
+# =============================================================================
+# ROCm Environment Configuration
+# =============================================================================
+
+
+def configure_rocm_env() -> list[str]:
+    """Set environment variables for ROCm/MIOpen to work correctly.
+
+    Returns list of env vars that were set.
+    """
+    changes: list[str] = []
+
+    # MIOPEN_FIND_MODE=FAST avoids the workspace allocation issue
+    # where MIOpen requires huge workspace but allocates 0 bytes
+    if "MIOPEN_FIND_MODE" not in os.environ:
+        os.environ["MIOPEN_FIND_MODE"] = "FAST"
+        changes.append("MIOPEN_FIND_MODE=FAST")
+
+    # Ensure MIOpen user DB has a writable location
+    if "MIOPEN_USER_DB_PATH" not in os.environ:
+        db_path = os.path.expanduser("~/.config/miopen")
+        os.makedirs(db_path, exist_ok=True)
+        os.environ["MIOPEN_USER_DB_PATH"] = db_path
+        changes.append(f"MIOPEN_USER_DB_PATH={db_path}")
+
+    return changes
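`configure_rocm_env` only fills in variables the user has not already set. The same "set if absent, report what changed" pattern is sketched below against any mutable mapping, so it can be tried on a plain dict; `apply_env_defaults` is a hypothetical helper. Variables written to `os.environ` are visible to this process and to subprocesses it starts, but not to the user's shell once setup exits.

```python
import os
from collections.abc import MutableMapping


def apply_env_defaults(env: MutableMapping[str, str], defaults: dict[str, str]) -> list[str]:
    """Set each default only when the key is absent; return the 'KEY=value' pairs set."""
    changed: list[str] = []
    for key, value in defaults.items():
        if key not in env:  # never override an explicit user choice
            env[key] = value
            changed.append(f"{key}={value}")
    return changed


# Values matching the ones configure_rocm_env uses
MIOPEN_DEFAULTS = {
    "MIOPEN_FIND_MODE": "FAST",
    "MIOPEN_USER_DB_PATH": os.path.expanduser("~/.config/miopen"),
}
```

Calling `apply_env_defaults(os.environ, MIOPEN_DEFAULTS)` would behave like the function above, minus creating the DB directory.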
+
+
+# =============================================================================
+# Installation
+# =============================================================================
+
+
+_BASE_VIDEO_DEPS = ["yt-dlp", "youtube-transcript-api"]
+
+
+def _build_visual_deps(modules: SetupModules) -> list[str]:
+    """Build the list of pip packages based on selected modules."""
+    # Base video deps are always included — setup must leave video fully ready
+    deps: list[str] = list(_BASE_VIDEO_DEPS)
+    if modules.easyocr:
+        deps.append("easyocr")
+    if modules.opencv:
+        deps.append("opencv-python-headless")
+    if modules.tesseract:
+        deps.append("pytesseract")
+    if modules.scenedetect:
+        deps.append("scenedetect[opencv]")
+    if modules.whisper:
+        deps.append("faster-whisper")
+    return deps
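The `if` chain in `_build_visual_deps` can also be expressed as a flag-to-package table. A sketch using a stand-in dataclass (`Modules` mirrors `SetupModules`; `build_deps` and `_PACKAGES` are illustrative names). `torch` has no entry because it is installed separately with `--index-url`.

```python
from dataclasses import dataclass, fields


@dataclass
class Modules:  # hypothetical stand-in for SetupModules
    torch: bool = True
    easyocr: bool = True
    opencv: bool = True
    tesseract: bool = True
    scenedetect: bool = True
    whisper: bool = True


BASE_DEPS = ["yt-dlp", "youtube-transcript-api"]
# Module flag -> pip requirement (torch is handled by install_torch instead)
_PACKAGES = {
    "easyocr": "easyocr",
    "opencv": "opencv-python-headless",
    "tesseract": "pytesseract",
    "scenedetect": "scenedetect[opencv]",
    "whisper": "faster-whisper",
}


def build_deps(mods: Modules) -> list[str]:
    deps = list(BASE_DEPS)  # base deps are always installed
    for f in fields(mods):  # iterates in declaration order, keeping output stable
        pkg = _PACKAGES.get(f.name)
        if pkg and getattr(mods, f.name):
            deps.append(pkg)
    return deps
```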
+
+
+def install_torch(gpu_info: GPUInfo, python_exe: str | None = None) -> bool:
+    """Install PyTorch with the correct GPU variant.
+
+    Returns True on success, False on failure.
+    """
+    exe = python_exe or sys.executable
+    cmd = [exe, "-m", "pip", "install", "torch", "torchvision", "--index-url", gpu_info.index_url]
+    logger.info(f"Installing PyTorch from {gpu_info.index_url}")
+    try:
+        result = subprocess.run(cmd, timeout=600, capture_output=True, text=True)
+        if result.returncode != 0:
+            logger.error(f"PyTorch install failed:\n{result.stderr[-500:]}")
+            return False
+        return True
+    except subprocess.TimeoutExpired:
+        logger.error("PyTorch installation timed out (10 min)")
+        return False
+    except OSError as exc:
+        logger.error(f"PyTorch installation error: {exc}")
+        return False
+
+
+def install_visual_deps(
+    modules: SetupModules | None = None, python_exe: str | None = None
+) -> bool:
+    """Install visual extraction dependencies.
+
+    Returns True on success, False on failure.
+    """
+    mods = modules or SetupModules()
+    deps = _build_visual_deps(mods)
+    if not deps:
+        return True
+
+    exe = python_exe or sys.executable
+    cmd = [exe, "-m", "pip", "install"] + deps
+    logger.info(f"Installing visual deps: {', '.join(deps)}")
+    try:
+        result = subprocess.run(cmd, timeout=600, capture_output=True, text=True)
+        if result.returncode != 0:
+            logger.error(f"Visual deps install failed:\n{result.stderr[-500:]}")
+            return False
+        return True
+    except subprocess.TimeoutExpired:
+        logger.error("Visual deps installation timed out (10 min)")
+        return False
+    except OSError as exc:
+        logger.error(f"Visual deps installation error: {exc}")
+        return False
+
+
+def install_skill_seekers(python_exe: str) -> bool:
+    """Install skill-seekers into the target python environment."""
+    cmd = [python_exe, "-m", "pip", "install", "skill-seekers"]
+    try:
+        result = subprocess.run(cmd, timeout=300, capture_output=True, text=True)
+        return result.returncode == 0
+    except (subprocess.TimeoutExpired, OSError):
+        return False
+
+
+# =============================================================================
+# Verification
+# =============================================================================
+
+
+def verify_installation() -> dict[str, bool]:
+    """Verify that all video deps are importable.
+
+    Returns a dict mapping package name to import success.
+    """
+    results: dict[str, bool] = {}
+
+    # Base video deps
+    try:
+        import yt_dlp  # noqa: F401
+
+        results["yt-dlp"] = True
+    except ImportError:
+        results["yt-dlp"] = False
+
+    try:
+        import youtube_transcript_api  # noqa: F401
+
+        results["youtube-transcript-api"] = True
+    except ImportError:
+        results["youtube-transcript-api"] = False
+
+    # torch
+    try:
+        import torch
+
+        results["torch"] = True
+        results["torch.cuda"] = torch.cuda.is_available()
+        results["torch.rocm"] = hasattr(torch.version, "hip") and torch.version.hip is not None
+    except ImportError:
+        results["torch"] = False
+        results["torch.cuda"] = False
+        results["torch.rocm"] = False
+
+    # easyocr
+    try:
+        import easyocr  # noqa: F401
+
+        results["easyocr"] = True
+    except ImportError:
+        results["easyocr"] = False
+
+    # opencv
+    try:
+        import cv2  # noqa: F401
+
+        results["opencv"] = True
+    except ImportError:
+        results["opencv"] = False
+
+    # pytesseract
+    try:
+        import pytesseract  # noqa: F401
+
+        results["pytesseract"] = True
+    except ImportError:
+        results["pytesseract"] = False
+
+    # scenedetect
+    try:
+        import scenedetect  # noqa: F401
+
+        results["scenedetect"] = True
+    except ImportError:
+        results["scenedetect"] = False
+
+    # faster-whisper
+    try:
+        import faster_whisper  # noqa: F401
+
+        results["faster-whisper"] = True
+    except ImportError:
+        results["faster-whisper"] = False
+
+    return results
+
+
+# =============================================================================
+# Module Selection (Interactive)
+# =============================================================================
+
+
+def _ask_modules(interactive: bool) -> SetupModules:
+    """Ask the user which modules to install. Returns all if non-interactive."""
+    if not interactive:
+        return SetupModules()
+
+    print("Which modules do you want to install?")
+    print("  [a] All (default)")
+    print("  [c] Choose individually")
+    try:
+        choice = input("  > ").strip().lower()
+    except (EOFError, KeyboardInterrupt):
+        print()
+        return SetupModules()
+
+    if choice not in ("c", "choose"):
+        return SetupModules()
+
+    modules = SetupModules()
+    _ask = _interactive_yn
+
+    modules.torch = _ask("PyTorch (required for easyocr GPU)", default=True)
+    modules.easyocr = _ask("EasyOCR (text extraction from video frames)", default=True)
+    modules.opencv = _ask("OpenCV (frame extraction and image processing)", default=True)
+    modules.tesseract = _ask("pytesseract (secondary OCR engine)", default=True)
+    modules.scenedetect = _ask("scenedetect (scene change detection)", default=True)
+    modules.whisper = _ask("faster-whisper (local audio transcription)", default=True)
+
+    return modules
+
+
+def _interactive_yn(prompt: str, default: bool = True) -> bool:
+    """Ask a yes/no question, return bool."""
+    suffix = "[Y/n]" if default else "[y/N]"
+    try:
+        answer = input(f"  {prompt}? {suffix} ").strip().lower()
+    except (EOFError, KeyboardInterrupt):
+        return default
+    if not answer:
+        return default
+    return answer in ("y", "yes")
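The answer handling in `_interactive_yn` reduces to a pure function, sketched below as a hypothetical `parse_yes_no`. One consequence worth noting: an empty answer returns the default, but any other non-"y"/"yes" input (including a typo such as "ye") counts as No, even when the default is Yes.

```python
def parse_yes_no(answer: str, default: bool = True) -> bool:
    """Interpret a [Y/n] or [y/N] reply the same way _interactive_yn does."""
    answer = answer.strip().lower()
    if not answer:
        return default  # bare Enter keeps the default
    return answer in ("y", "yes")
```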
+
+
+# =============================================================================
+# Orchestrator
+# =============================================================================
+
+
+def run_setup(interactive: bool = True) -> int:
+    """Auto-detect GPU and install all visual extraction dependencies.
+
+    Handles:
+    1. Venv creation (if not in one)
+    2. GPU detection
+    3. Module selection (optional — interactive only)
+    4. System dep checks (tesseract binary)
+    5. ROCm env var configuration
+    6. Package installation (PyTorch for the detected GPU, then visual deps)
+    7. Verification
+
+    Args:
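+        interactive: If True, ask before creating a venv, offer per-module
+            selection, and confirm before installing. If False, install all
+            modules into the current interpreter without prompting.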
+
+    Returns:
+        0 on success, 1 on failure.
+    """
+    print("=" * 60)
+    print("  Video Visual Extraction Setup")
+    print("=" * 60)
+    print()
+
+    total_steps = 7
+
+    # ── Step 1: Venv check ──
+    print(f"[1/{total_steps}] Checking environment...")
+    if is_in_venv():
+        print(f"  Already in venv: {sys.prefix}")
+        python_exe = sys.executable
+    else:
+        print("  Not in a virtual environment.")
+        venv_path = ".venv"
+        if interactive:
+            try:
+                answer = input(
+                    f"  Create venv at ./{venv_path}? [Y/n] "
+                ).strip().lower()
+            except (EOFError, KeyboardInterrupt):
+                print("\nSetup cancelled.")
+                return 1
+            if answer and answer not in ("y", "yes"):
+                print("  Continuing without venv (installing to system Python).")
+                python_exe = sys.executable
+            else:
+                if not create_venv(venv_path):
+                    print("  FAILED: Could not create venv.")
+                    return 1
+                python_exe = get_venv_python(venv_path)
+                activate_cmd = get_venv_activate_cmd(venv_path)
+                print(f"  Venv created at ./{venv_path}")
+                print("  Installing skill-seekers into venv...")
+                if not install_skill_seekers(python_exe):
+                    print("  FAILED: Could not install skill-seekers into venv.")
+                    return 1
+                print("  After setup completes, activate with:")
+                print(f"    {activate_cmd}")
+        else:
+            # Non-interactive: use current python
+            python_exe = sys.executable
+    print()
+
+    # ── Step 2: GPU detection ──
+    print(f"[2/{total_steps}] Detecting GPU...")
+    gpu_info = detect_gpu()
+
+    vendor_label = {
+        GPUVendor.NVIDIA: "NVIDIA (CUDA)",
+        GPUVendor.AMD: "AMD (ROCm)",
+        GPUVendor.NONE: "CPU-only",
+    }
+    print(f"  GPU:    {gpu_info.name}")
+    print(f"  Vendor: {vendor_label.get(gpu_info.vendor, gpu_info.vendor.value)}")
+    if gpu_info.compute_version:
+        print(f"  Version: {gpu_info.compute_version}")
+    for detail in gpu_info.details:
+        print(f"  {detail}")
+    print(f"  PyTorch index: {gpu_info.index_url}")
+    print()
+
+    # ── Step 3: Module selection ──
+    print(f"[3/{total_steps}] Selecting modules...")
+    modules = _ask_modules(interactive)
+    deps = _build_visual_deps(modules)
+    print(f"  Selected: {', '.join(deps) if deps else '(none)'}")
+    if modules.torch:
+        print("  + PyTorch + torchvision")
+    print()
+
+    # ── Step 4: System dependency check ──
+    print(f"[4/{total_steps}] Checking system dependencies...")
+    if modules.tesseract:
+        tess = check_tesseract()
+        if not tess["installed"]:
+            print("  WARNING: tesseract binary not found!")
+            print("  The pytesseract Python package needs the tesseract binary installed.")
+            print(f"  Install it with: {tess['install_cmd']}")
+            print()
+        elif not tess["has_eng"]:
+            print(f"  WARNING: tesseract installed ({tess['version']}) but English data missing!")
+            print(f"  Install with: {tess['install_cmd']}")
+            print()
+        else:
+            print(f"  tesseract: {tess['version']} (eng data OK)")
+    else:
+        print("  tesseract: skipped (not selected)")
+    print()
+
+    # ── Step 5: ROCm configuration ──
+    print(f"[5/{total_steps}] Configuring GPU environment...")
+    if gpu_info.vendor == GPUVendor.AMD:
+        changes = configure_rocm_env()
+        if changes:
+            print("  Set ROCm environment variables:")
+            for c in changes:
+                print(f"    {c}")
+            print("  (These fix MIOpen workspace allocation issues)")
+        else:
+            print("  ROCm env vars already configured.")
+    elif gpu_info.vendor == GPUVendor.NVIDIA:
+        print("  NVIDIA: no extra configuration needed.")
+    else:
+        print("  CPU-only: no GPU configuration needed.")
+    print()
+
+    # ── Step 6: Confirm and install ──
+    if interactive:
+        print("Ready to install. Summary:")
+        if modules.torch:
+            print(f"  - PyTorch + torchvision (from {gpu_info.index_url})")
+        for dep in deps:
+            print(f"  - {dep}")
+        print()
+        try:
+            answer = input("Proceed? [Y/n] ").strip().lower()
+        except (EOFError, KeyboardInterrupt):
+            print("\nSetup cancelled.")
+            return 1
+        if answer and answer not in ("y", "yes"):
+            print("Setup cancelled.")
+            return 1
+        print()
+
+    print(f"[6/{total_steps}] Installing packages...")
+    if modules.torch:
+        print("  Installing PyTorch...")
+        if not install_torch(gpu_info, python_exe):
+            print("  FAILED: PyTorch installation failed.")
+            print(f"  Try: {python_exe} -m pip install torch torchvision --index-url {gpu_info.index_url}")
+            return 1
+        print("  PyTorch installed.")
+
+    if deps:
+        print("  Installing visual packages...")
+        if not install_visual_deps(modules, python_exe):
+            print("  FAILED: Visual packages installation failed.")
+            print(f"  Try: {python_exe} -m pip install {' '.join(deps)}")
+            return 1
+        print("  Visual packages installed.")
+    print()
+
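+    # ── Step 7: Verify ──
+    print(f"[7/{total_steps}] Verifying installation...")
+    if python_exe == sys.executable:
+        results = verify_installation()
+    else:
+        # Packages went into a different interpreter (the new venv); importing
+        # them from this process would report them as missing. Run the same
+        # check inside the target interpreter (skill-seekers was installed there
+        # in step 1) and read its JSON result from the last line of stdout.
+        import json
+
+        probe = (
+            "import json; from skill_seekers.cli.video_setup import verify_installation; "
+            "print(json.dumps(verify_installation()))"
+        )
+        try:
+            proc = subprocess.run(
+                [python_exe, "-c", probe], capture_output=True, text=True, timeout=300
+            )
+            results = json.loads(proc.stdout.strip().splitlines()[-1])
+        except (subprocess.TimeoutExpired, OSError, ValueError, IndexError):
+            print("  FAILED: Could not run verification inside the venv.")
+            return 1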
+    # A missing package only counts as a failure if it was meant to be installed.
+    # Base video deps are always installed; torch.cuda / torch.rocm are
+    # informational and never fail setup.
+    required = {
+        "yt-dlp": True,
+        "youtube-transcript-api": True,
+        "torch": modules.torch,
+        "easyocr": modules.easyocr,
+        "opencv": modules.opencv,
+        "pytesseract": modules.tesseract,
+        "scenedetect": modules.scenedetect,
+        "faster-whisper": modules.whisper,
+    }
+    all_ok = True
+    for pkg, ok in results.items():
+        status = "OK" if ok else "MISSING"
+        print(f"  {pkg}: {status}")
+        if not ok and required.get(pkg, False):
+            all_ok = False
+
+    print()
+    if all_ok:
+        print("Setup complete! You can now use: skill-seekers video --url <URL> --visual")
+        if not is_in_venv() and python_exe != sys.executable:
+            activate_cmd = get_venv_activate_cmd()
+            print("\nDon't forget to activate the venv first:")
+            print(f"  {activate_cmd}")
+    else:
+        print("Some packages failed to install. Check the output above.")
+        return 1
+
+    return 0
--- a/src/skill_seekers/cli/video_transcript.py
+++ b/src/skill_seekers/cli/video_transcript.py
@@ -0,0 +1,396 @@
+"""Video transcript extraction module.
+
+Handles all transcript acquisition:
+- YouTube captions via youtube-transcript-api (Tier 1)
+- Subtitle file parsing: SRT and VTT (Tier 1)
+- Whisper ASR stub (Tier 2 — raises ImportError with install instructions)
+"""
+
+import logging
+import re
+from pathlib import Path
+
+from skill_seekers.cli.video_models import (
+    TranscriptSegment,
+    TranscriptSource,
+    VideoInfo,
+    VideoSourceConfig,
+    VideoSourceType,
+)
+
+logger = logging.getLogger(__name__)
+
+# Optional dependency: youtube-transcript-api
+try:
+    from youtube_transcript_api import YouTubeTranscriptApi
+
+    HAS_YOUTUBE_TRANSCRIPT = True
+except ImportError:
+    HAS_YOUTUBE_TRANSCRIPT = False
+
+# Optional dependency: faster-whisper (Tier 2)
+try:
+    from faster_whisper import WhisperModel  # noqa: F401
+
+    HAS_WHISPER = True
+except ImportError:
+    HAS_WHISPER = False
+
+
+# =============================================================================
+# YouTube Transcript Extraction (Tier 1)
+# =============================================================================
+
+
+def extract_youtube_transcript(
+    video_id: str,
+    languages: list[str] | None = None,
+) -> tuple[list[TranscriptSegment], TranscriptSource]:
+    """Fetch YouTube captions via youtube-transcript-api.
+
+    Args:
+        video_id: YouTube video ID (11 chars).
+        languages: Language preference list (e.g., ['en', 'tr']).
+
+    Returns:
+        Tuple of (transcript segments, source type).
+
+    Raises:
+        RuntimeError: If youtube-transcript-api is not installed.
+    """
+    if not HAS_YOUTUBE_TRANSCRIPT:
+        raise RuntimeError(
+            "youtube-transcript-api is required for YouTube transcript extraction.\n"
+            'Install with: pip install "skill-seekers[video]"\n'
+            "Or: pip install youtube-transcript-api"
+        )
+
+    if languages is None:
+        languages = ["en"]
+
+    try:
+        ytt_api = YouTubeTranscriptApi()
+
+        # List available transcripts to tell manual captions from auto-generated ones
+        source = TranscriptSource.YOUTUBE_MANUAL
+        try:
+            transcript_list = ytt_api.list(video_id)
+            # Prefer manually created transcripts; fall back to auto-generated
+            try:
+                transcript_entry = transcript_list.find_manually_created_transcript(languages)
+                source = TranscriptSource.YOUTUBE_MANUAL
+            except Exception:
+                try:
+                    transcript_entry = transcript_list.find_generated_transcript(languages)
+                    source = TranscriptSource.YOUTUBE_AUTO
+                except Exception:
+                    # Fall back to any available transcript
+                    transcript_entry = transcript_list.find_transcript(languages)
+                    source = (
+                        TranscriptSource.YOUTUBE_AUTO
+                        if transcript_entry.is_generated
+                        else TranscriptSource.YOUTUBE_MANUAL
+                    )
+            transcript = transcript_entry.fetch()
+        except Exception:
+            # Fall back to direct fetch if list fails (older API versions)
+            transcript = ytt_api.fetch(video_id, languages=languages)
+            # Check is_generated on the FetchedTranscript if available
+            if getattr(transcript, "is_generated", False):
+                source = TranscriptSource.YOUTUBE_AUTO
+
+        segments = []
+        for snippet in transcript.snippets:
+            text = snippet.text.strip()
+            if not text:
+                continue
+            start = snippet.start
+            duration = snippet.duration
+            segments.append(
+                TranscriptSegment(
+                    text=text,
+                    start=start,
+                    end=start + duration,
+                    confidence=1.0,
+                    source=source,
+                )
+            )
+
+        if not segments:
+            return [], TranscriptSource.NONE
+
+        return segments, source
+
+    except Exception as e:
+        logger.warning(f"Failed to fetch YouTube transcript for {video_id}: {e}")
+        return [], TranscriptSource.NONE
+
+
+# =============================================================================
+# Subtitle File Parsing (Tier 1)
+# =============================================================================
+
+
+def _parse_timestamp_srt(ts: str) -> float:
+    """Parse SRT timestamp (HH:MM:SS,mmm) to seconds."""
+    ts = ts.strip().replace(",", ".")
+    parts = ts.split(":")
+    if len(parts) == 3:
+        h, m, s = parts
+        return int(h) * 3600 + int(m) * 60 + float(s)
+    return 0.0
+
+
+def _parse_timestamp_vtt(ts: str) -> float:
+    """Parse VTT timestamp (HH:MM:SS.mmm or MM:SS.mmm) to seconds."""
+    ts = ts.strip()
+    parts = ts.split(":")
+    if len(parts) == 3:
+        h, m, s = parts
+        return int(h) * 3600 + int(m) * 60 + float(s)
+    elif len(parts) == 2:
+        m, s = parts
+        return int(m) * 60 + float(s)
+    return 0.0
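Both helpers reduce a clock string to seconds as h × 3600 + m × 60 + s. SRT differs only in using a comma as the decimal separator, so `01:02:03,500` becomes 1 × 3600 + 2 × 60 + 3.5 = 3723.5 seconds. The hypothetical `to_seconds` below combines the two rules into a single illustrative function; it is not part of the module.

```python
def to_seconds(ts: str) -> float:
    """Convert 'HH:MM:SS,mmm' (SRT) or '[HH:]MM:SS.mmm' (VTT) to seconds.

    Hypothetical combined helper for illustration only.
    """
    ts = ts.strip().replace(",", ".")  # SRT uses a comma before milliseconds
    parts = ts.split(":")
    if len(parts) == 3:
        h, m, s = parts
    elif len(parts) == 2:  # short VTT form without hours
        h = "0"
        m, s = parts
    else:
        return 0.0  # same lenient fallback as the helpers above
    return int(h) * 3600 + int(m) * 60 + float(s)


print(to_seconds("01:02:03,500"))  # SRT
print(to_seconds("02:03.250"))     # short VTT
```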
+
+
+def parse_srt(path: str) -> list[TranscriptSegment]:
+    """Parse an SRT subtitle file into TranscriptSegments.
+
+    Args:
+        path: Path to .srt file.
+
+    Returns:
+        List of TranscriptSegment objects.
+    """
+    content = Path(path).read_text(encoding="utf-8", errors="replace")
+    segments = []
+
+    # SRT format: index\nstart --> end\ntext\n\n
+    blocks = re.split(r"\n\s*\n", content.strip())
+    for block in blocks:
+        lines = block.strip().splitlines()  # also handles CRLF line endings
+        if len(lines) < 2:
+            continue
+
+        # Find the timestamp line (contains -->)
+        ts_line = None
+        text_lines = []
+        for line in lines:
+            if "-->" in line:
+                ts_line = line
+            elif ts_line is not None:
+                text_lines.append(line)
+
+        if ts_line is None:
+            continue
+
+        parts = ts_line.split("-->")
+        if len(parts) != 2:
+            continue
+
+        start = _parse_timestamp_srt(parts[0])
+        end = _parse_timestamp_srt(parts[1])
+        text = " ".join(text_lines).strip()
+
+        # Remove HTML tags
+        text = re.sub(r"<[^>]+>", "", text)
+
+        if text:
+            segments.append(
+                TranscriptSegment(
+                    text=text,
+                    start=start,
+                    end=end,
+                    confidence=1.0,
+                    source=TranscriptSource.SUBTITLE_FILE,
+                )
+            )
+
+    return segments
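To check the block-splitting approach without touching the filesystem, you can run the same steps on an in-memory string. This is a trimmed sketch. `Cue` and `parse_srt_text` are hypothetical stand-ins for `TranscriptSegment` and `parse_srt`, and the sketch reproduces only timing-line detection, text joining, and HTML-tag stripping.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    """Hypothetical stand-in for TranscriptSegment."""

    text: str
    start: float
    end: float


def _secs(ts: str) -> float:
    h, m, s = ts.strip().replace(",", ".").split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)


def parse_srt_text(content: str) -> list[Cue]:
    cues = []
    # Blank lines separate cues: index line, timing line, one or more text lines
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().splitlines()
        ts_idx = next((i for i, ln in enumerate(lines) if "-->" in ln), None)
        if ts_idx is None:
            continue  # not a cue (e.g. stray text)
        start, end = lines[ts_idx].split("-->")
        text = " ".join(lines[ts_idx + 1 :]).strip()
        text = re.sub(r"<[^>]+>", "", text)  # drop <i>, <b>, <font> tags
        if text:
            cues.append(Cue(text, _secs(start), _secs(end)))
    return cues


sample = """1
00:00:01,000 --> 00:00:03,500
Hello <i>world</i>

2
00:00:04,000 --> 00:00:06,000
Second line
"""
for cue in parse_srt_text(sample):
    print(cue)
```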
+
+
+def parse_vtt(path: str) -> list[TranscriptSegment]:
+    """Parse a WebVTT subtitle file into TranscriptSegments.
+
+    Args:
+        path: Path to .vtt file.
+
+    Returns:
+        List of TranscriptSegment objects.
+    """
+    content = Path(path).read_text(encoding="utf-8", errors="replace")
+    segments = []
+
+    lines = content.strip().split("\n")
+    i = 0
+    # Skip the WEBVTT header and any metadata up to the first timestamp line
+    while i < len(lines) and not re.match(r"\d{2}:\d{2}", lines[i]):
+        i += 1
+
+    current_text_lines: list[str] = []
+    current_start = 0.0
+    current_end = 0.0
+    in_cue = False
+
+    def _flush_cue() -> None:
+        """Append the cue being accumulated, if it has any text."""
+        if not (in_cue and current_text_lines):
+            return
+        text = " ".join(current_text_lines).strip()
+        text = re.sub(r"<[^>]+>", "", text)  # strip <v>, <c>, <b>, ... tags
+        if text:
+            segments.append(
+                TranscriptSegment(
+                    text=text,
+                    start=current_start,
+                    end=current_end,
+                    confidence=1.0,
+                    source=TranscriptSource.SUBTITLE_FILE,
+                )
+            )
+
+    while i < len(lines):
+        line = lines[i].strip()
+        i += 1
+
+        if "-->" in line:
+            # A new timing line closes the previous cue
+            _flush_cue()
+            parts = line.split("-->")
+            current_start = _parse_timestamp_vtt(parts[0])
+            # Keep only the end time; drop cue settings such as "align:start"
+            current_end = _parse_timestamp_vtt(parts[1].split()[0])
+            current_text_lines = []
+            in_cue = True
+
+        elif line == "":
+            # A blank line ends the cue once it has text
+            if in_cue and current_text_lines:
+                _flush_cue()
+                current_text_lines = []
+                in_cue = False
+
+        elif in_cue:
+            # Skip bare numeric lines (likely cue identifiers)
+            if not line.isdigit():
+                current_text_lines.append(line)
+
+    # Flush the final cue (the file need not end with a blank line)
+    _flush_cue()
+
+    return segments
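WebVTT timing lines may carry cue settings after the end timestamp, and cue text may contain voice or class spans. That is why the parser keeps only `parts[1].split()[0]` and strips tags. The sketch below isolates that step; `parse_timing_line` and `clean_cue_text` are hypothetical names and are not part of the module.

```python
import re


def _vtt_secs(ts: str) -> float:
    parts = ts.strip().split(":")
    if len(parts) == 2:  # MM:SS.mmm has no hours field
        parts = ["0", *parts]
    h, m, s = parts
    return int(h) * 3600 + int(m) * 60 + float(s)


def parse_timing_line(line: str) -> tuple[float, float]:
    """Hypothetical helper: (start, end) seconds, ignoring cue settings."""
    start, rest = line.split("-->", 1)
    end = rest.split()[0]  # first token is the end time; the rest are settings
    return _vtt_secs(start), _vtt_secs(end)


def clean_cue_text(text: str) -> str:
    """Hypothetical helper: remove inline markup such as <v Speaker>, <c.x>, <b>."""
    return re.sub(r"<[^>]+>", "", text).strip()


print(parse_timing_line("00:01.000 --> 00:04.500 align:start position:10%"))
print(clean_cue_text("<v Narrator>Now open <c.code>main.gd</c>"))
```

Stripping `<v ...>` this way also discards the speaker name the tag carries, which is acceptable when only the spoken text is needed.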
+
+
+# =============================================================================
+# Whisper Stub (Tier 2)
+# =============================================================================
+
+
+def transcribe_with_whisper(
+    audio_path: str,  # noqa: ARG001
+    model: str = "base",  # noqa: ARG001
+    language: str | None = None,  # noqa: ARG001
+) -> list[TranscriptSegment]:
+    """Transcribe audio using faster-whisper (Tier 2).
+
+    Raises:
+        RuntimeError: If faster-whisper is not installed.
+        NotImplementedError: Always when faster-whisper is installed (Tier 2 placeholder).
+    """
+    if not HAS_WHISPER:
+        raise RuntimeError(
+            "faster-whisper is required for Whisper transcription.\n"
+            'Install with: pip install "skill-seekers[video-full]"\n'
+            "Or: pip install faster-whisper"
+        )
+
+    # Tier 2 implementation placeholder
+    raise NotImplementedError("Whisper transcription will be implemented in Tier 2")
+
+
+# =============================================================================
+# Main Entry Point
+# =============================================================================
+
+
+def get_transcript(
+    video_info: VideoInfo,
+    config: VideoSourceConfig,
+) -> tuple[list[TranscriptSegment], TranscriptSource]:
+    """Get transcript for a video, trying available methods in priority order.
+
+    Priority:
+    1. YouTube API (for YouTube videos)
+    2. Sidecar subtitle files next to a local video (<stem>.srt, then <stem>.vtt)
+    3. Whisper fallback (Tier 2; currently a placeholder)
+    4. NONE (no transcript available)
+
+    Args:
+        video_info: Video metadata.
+        config: Video source configuration.
+
+    Returns:
+        Tuple of (transcript segments, source type).
+    """
+    languages = config.languages or ["en"]
+
+    # 1. Try YouTube API for YouTube videos
+    if video_info.source_type == VideoSourceType.YOUTUBE and HAS_YOUTUBE_TRANSCRIPT:
+        try:
+            segments, source = extract_youtube_transcript(video_info.video_id, languages)
+            if segments:
+                logger.info(
+                    f"Got {len(segments)} transcript segments via YouTube API "
+                    f"({source.value}) for '{video_info.title}'"
+                )
+                return segments, source
+        except Exception as e:
+            logger.warning(f"YouTube transcript failed: {e}")
+
+    # 2. Try subtitle files for local videos
+    if video_info.file_path:
+        base = Path(video_info.file_path).stem
+        parent = Path(video_info.file_path).parent
+
+        for ext in [".srt", ".vtt"]:
+            sub_path = parent / f"{base}{ext}"
+            if sub_path.exists():
+                logger.info(f"Found subtitle file: {sub_path}")
+                segments = parse_srt(str(sub_path)) if ext == ".srt" else parse_vtt(str(sub_path))
+                if segments:
+                    return segments, TranscriptSource.SUBTITLE_FILE
+
+    # 3. Whisper fallback (Tier 2 — only if installed)
+    if HAS_WHISPER and video_info.file_path:
+        try:
+            segments = transcribe_with_whisper(
+                video_info.file_path,
+                model=config.whisper_model,
+                language=languages[0] if languages else None,
+            )
+            if segments:
+                return segments, TranscriptSource.WHISPER
+        except (RuntimeError, NotImplementedError):
+            pass
+
+    # 4. No transcript available
+    logger.warning(f"No transcript available for '{video_info.title}'")
+    return [], TranscriptSource.NONE
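`get_transcript` is a first-non-empty-wins cascade. Each tier either yields segments or falls through, and an unavailable tier does not abort the chain. The sketch below expresses that pattern generically with callables. `first_available` and the provider names are hypothetical, and unlike the real function it applies one exception policy to every tier.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger(__name__)

# Hypothetical provider type: (name, zero-argument fetch function)
Provider = tuple[str, Callable[[], list[str]]]


def first_available(providers: Sequence[Provider]) -> tuple[list[str], str]:
    """Return (segments, provider_name) from the first provider that yields data."""
    for name, fetch in providers:
        try:
            segments = fetch()
        except (RuntimeError, NotImplementedError) as exc:
            # Missing optional dependency or unimplemented tier: try the next one
            logger.debug("provider %s unavailable: %s", name, exc)
            continue
        if segments:
            return segments, name
    return [], "none"


def whisper_stub() -> list[str]:
    raise NotImplementedError("Tier 2")


chain: list[Provider] = [
    ("youtube", lambda: []),            # no captions published
    ("subtitle_file", lambda: ["Hi"]),  # sidecar .srt found
    ("whisper", whisper_stub),
]
print(first_available(chain))
```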
--- a/src/skill_seekers/cli/video_visual.py
+++ b/src/skill_seekers/cli/video_visual.py
--- a/src/skill_seekers/mcp/server_fastmcp.py
+++ b/src/skill_seekers/mcp/server_fastmcp.py
@@ -3,20 +3,21 @@
 Skill Seeker MCP Server (FastMCP Implementation)

 Modern, decorator-based MCP server using FastMCP for simplified tool registration.
-Provides 25 tools for generating Claude AI skills from documentation.
+Provides 33 tools for generating Claude AI skills from documentation.

 This is a streamlined alternative to server.py (2200 lines → 708 lines, 68% reduction).
 All tool implementations are delegated to modular tool files in tools/ directory.

 **Architecture:**
 - FastMCP server with decorator-based tool registration
- 25 tools organized into 6 categories:
+- 33 tools organized into 7 categories:
  * Config tools (3): generate_config, list_configs, validate_config
-  * Scraping tools (8): estimate_pages, scrape_docs, scrape_github, scrape_pdf, scrape_codebase, detect_patterns, extract_test_examples, build_how_to_guides, extract_config_patterns
+  * Scraping tools (10): estimate_pages, scrape_docs, scrape_github, scrape_pdf, scrape_video, scrape_codebase, detect_patterns, extract_test_examples, build_how_to_guides, extract_config_patterns
  * Packaging tools (4): package_skill, upload_skill, enhance_skill, install_skill
  * Splitting tools (2): split_config, generate_router
-  * Source tools (4): fetch_config, submit_config, add_config_source, list_config_sources, remove_config_source
+  * Source tools (5): fetch_config, submit_config, add_config_source, list_config_sources, remove_config_source
  * Vector Database tools (4): export_to_weaviate, export_to_chroma, export_to_faiss, export_to_qdrant
+  * Workflow tools (5): list_workflows, get_workflow, create_workflow, update_workflow, delete_workflow

 **Usage:**
  # Stdio transport (default, backward compatible)
@@ -98,6 +99,7 @@ try:
        scrape_docs_impl,
        scrape_github_impl,
        scrape_pdf_impl,
+        scrape_video_impl,
        # Splitting tools
        split_config_impl,
        submit_config_impl,
@@ -139,6 +141,7 @@ except ImportError:
        scrape_docs_impl,
        scrape_github_impl,
        scrape_pdf_impl,
+        scrape_video_impl,
        split_config_impl,
        submit_config_impl,
        upload_skill_impl,
@@ -249,7 +252,7 @@ async def validate_config(config_path: str) -> str:


 # ============================================================================
-# SCRAPING TOOLS (4 tools)
+# SCRAPING TOOLS (10 tools)
 # ============================================================================


@@ -420,6 +423,95 @@ async def scrape_pdf(
    return str(result)


+@safe_tool_decorator(
+    description="Extract transcripts and metadata from videos (YouTube, Vimeo, local files) and build Claude skill."
+)
+async def scrape_video(
+    url: str | None = None,
+    video_file: str | None = None,
+    playlist: str | None = None,
+    name: str | None = None,
+    description: str | None = None,
+    languages: str | None = None,
+    from_json: str | None = None,
+    visual: bool = False,
+    whisper_model: str | None = None,
+    visual_interval: float | None = None,
+    visual_min_gap: float | None = None,
+    visual_similarity: float | None = None,
+    vision_ocr: bool = False,
+    start_time: str | None = None,
+    end_time: str | None = None,
+    setup: bool = False,
+) -> str:
+    """
+    Scrape video content and build Claude skill.
+
+    Args:
+        url: Video URL (YouTube, Vimeo)
+        video_file: Local video file path
+        playlist: Playlist URL
+        name: Skill name
+        description: Skill description
+        languages: Transcript language preferences (comma-separated)
+        from_json: Build from extracted JSON file
+        visual: Enable visual frame extraction (requires video-full extras)
+        whisper_model: Whisper model size for local transcription (e.g., base, small, medium, large)
+        visual_interval: Seconds between frame captures (default: 5.0)
+        visual_min_gap: Minimum seconds between kept frames (default: 2.0)
+        visual_similarity: Similarity threshold (0.0-1.0) for skipping duplicate frames (default: 0.95)
+        vision_ocr: Use vision model for OCR on extracted frames
+        start_time: Start time for extraction (seconds, MM:SS, or HH:MM:SS). Single video only.
+        end_time: End time for extraction (seconds, MM:SS, or HH:MM:SS). Single video only.
+        setup: Auto-detect GPU and install visual extraction deps (PyTorch, easyocr, etc.)
+
+    Returns:
+        Video scraping results with file paths.
+    """
+    if setup:
+        from skill_seekers.cli.video_setup import run_setup
+
+        rc = run_setup(interactive=False)
+        return "Setup completed successfully." if rc == 0 else "Setup failed. Check logs."
+
+    args = {}
+    if url:
+        args["url"] = url
+    if video_file:
+        args["video_file"] = video_file
+    if playlist:
+        args["playlist"] = playlist
+    if name:
+        args["name"] = name
+    if description:
+        args["description"] = description
+    if languages:
+        args["languages"] = languages
+    if from_json:
+        args["from_json"] = from_json
+    if start_time:
+        args["start_time"] = start_time
+    if end_time:
+        args["end_time"] = end_time
+    if visual:
+        args["visual"] = visual
+    if whisper_model:
+        args["whisper_model"] = whisper_model
+    if visual_interval is not None:
+        args["visual_interval"] = visual_interval
+    if visual_min_gap is not None:
+        args["visual_min_gap"] = visual_min_gap
+    if visual_similarity is not None:
+        args["visual_similarity"] = visual_similarity
+    if vision_ocr:
+        args["vision_ocr"] = vision_ocr
+
+    result = await scrape_video_impl(args)
+    if isinstance(result, list) and result:
+        return result[0].text if hasattr(result[0], "text") else str(result[0])
+    return str(result)
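The `start_time` and `end_time` parameters accept seconds, `MM:SS`, or `HH:MM:SS` strings and are forwarded unchanged to the CLI. This diff does not show how the CLI parses them. The hypothetical `parse_time_spec` below illustrates one straightforward interpretation: each `:` shifts the value accumulated so far by a factor of 60.

```python
def parse_time_spec(spec: str) -> float:
    """Parse '90', '1:30', or '00:01:30' into seconds (hypothetical sketch).

    Raises ValueError for malformed specs or negative components.
    """
    parts = spec.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"Invalid time spec: {spec!r}")
    seconds = 0.0
    for part in parts:
        value = float(part)  # raises ValueError on empty or non-numeric parts
        if value < 0:
            raise ValueError(f"Negative component in time spec: {spec!r}")
        seconds = seconds * 60 + value  # shift earlier fields by one unit
    return seconds


for spec in ("90", "1:30", "00:01:30", "1:02:03.5"):
    print(spec, "->", parse_time_spec(spec))
```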
+
+
@safe_tool_decorator(
    description="Analyze local codebase and extract code knowledge. Walks directory tree, analyzes code files, extracts signatures, docstrings, and optionally generates API reference documentation and dependency graphs."
 )
--- a/src/skill_seekers/mcp/tools/init.py
+++ b/src/skill_seekers/mcp/tools/init.py
@@ -63,6 +63,9 @@ from .scraping_tools import (
 from .scraping_tools import (
    scrape_pdf_tool as scrape_pdf_impl,
 )
+from .scraping_tools import (
+    scrape_video_tool as scrape_video_impl,
+)
 from .source_tools import (
    add_config_source_tool as add_config_source_impl,
 )
@@ -123,6 +126,7 @@ __all__ = [
    "scrape_docs_impl",
    "scrape_github_impl",
    "scrape_pdf_impl",
+    "scrape_video_impl",
    "scrape_codebase_impl",
    "detect_patterns_impl",
    "extract_test_examples_impl",
--- a/src/skill_seekers/mcp/tools/scraping_tools.py
+++ b/src/skill_seekers/mcp/tools/scraping_tools.py
@@ -356,6 +356,124 @@ async def scrape_pdf_tool(args: dict) -> list[TextContent]:
        return [TextContent(type="text", text=f"{output}\n\n❌ Error:\n{stderr}")]


+async def scrape_video_tool(args: dict) -> list[TextContent]:
+    """
+    Scrape video content (YouTube, Vimeo, local files) and build Claude skill.
+
+    Extracts transcripts, metadata, and optionally visual content from videos
+    to create skills.
+
+    Args:
+        args: Dictionary containing:
+            - url (str, optional): Video URL (YouTube, Vimeo)
+            - video_file (str, optional): Local video file path
+            - playlist (str, optional): Playlist URL
+            - name (str, optional): Skill name
+            - description (str, optional): Skill description
+            - languages (str, optional): Language preferences (comma-separated)
+            - from_json (str, optional): Build from extracted JSON file
+            - visual (bool, optional): Enable visual frame extraction (default: False)
+            - whisper_model (str, optional): Whisper model size (default: base)
+            - visual_interval (float, optional): Seconds between frame captures (default: 5.0)
+            - visual_min_gap (float, optional): Minimum seconds between kept frames (default: 2.0)
+            - visual_similarity (float, optional): Similarity threshold to skip duplicate frames (default: 0.95)
+            - vision_ocr (bool, optional): Use vision model for OCR on frames (default: False)
+            - start_time (str, optional): Start time for extraction (seconds, MM:SS, or HH:MM:SS)
+            - end_time (str, optional): End time for extraction (seconds, MM:SS, or HH:MM:SS)
+            - setup (bool, optional): Auto-detect GPU and install visual extraction deps
+
+    Returns:
+        List[TextContent]: Tool execution results
+    """
+    # Handle --setup early exit
+    if args.get("setup", False):
+        from skill_seekers.cli.video_setup import run_setup
+
+        rc = run_setup(interactive=False)
+        msg = "Setup completed successfully." if rc == 0 else "Setup failed. Check logs."
+        return [TextContent(type="text", text=msg)]
+
+    url = args.get("url")
+    video_file = args.get("video_file")
+    playlist = args.get("playlist")
+    name = args.get("name")
+    description = args.get("description")
+    languages = args.get("languages")
+    from_json = args.get("from_json")
+    visual = args.get("visual", False)
+    whisper_model = args.get("whisper_model")
+    visual_interval = args.get("visual_interval")
+    visual_min_gap = args.get("visual_min_gap")
+    visual_similarity = args.get("visual_similarity")
+    vision_ocr = args.get("vision_ocr", False)
+    start_time = args.get("start_time")
+    end_time = args.get("end_time")
+
+    # Build command
+    cmd = [sys.executable, str(CLI_DIR / "video_scraper.py")]
+
+    if from_json:
+        cmd.extend(["--from-json", from_json])
+    elif url:
+        cmd.extend(["--url", url])
+        if name:
+            cmd.extend(["--name", name])
+        if description:
+            cmd.extend(["--description", description])
+        if languages:
+            cmd.extend(["--languages", languages])
+    elif video_file:
+        cmd.extend(["--video-file", video_file])
+        if name:
+            cmd.extend(["--name", name])
+        if description:
+            cmd.extend(["--description", description])
+    elif playlist:
+        cmd.extend(["--playlist", playlist])
+        if name:
+            cmd.extend(["--name", name])
+    else:
+        return [
+            TextContent(
+                type="text",
+                text="❌ Error: Must specify one of url, video_file, playlist, or from_json",
+            )
+        ]
+
+    # Visual extraction parameters
+    if visual:
+        cmd.append("--visual")
+    if whisper_model:
+        cmd.extend(["--whisper-model", whisper_model])
+    if visual_interval is not None:
+        cmd.extend(["--visual-interval", str(visual_interval)])
+    if visual_min_gap is not None:
+        cmd.extend(["--visual-min-gap", str(visual_min_gap)])
+    if visual_similarity is not None:
+        cmd.extend(["--visual-similarity", str(visual_similarity)])
+    if vision_ocr:
+        cmd.append("--vision-ocr")
+    if start_time:
+        cmd.extend(["--start-time", str(start_time)])
+    if end_time:
+        cmd.extend(["--end-time", str(end_time)])
+
+    # Run video_scraper.py with streaming
+    timeout = 600  # 10 minutes for video extraction
+
+    progress_msg = "🎬 Scraping video content...\n"
+    progress_msg += f"⏱️ Maximum time: {timeout // 60} minutes\n\n"
+
+    stdout, stderr, returncode = run_subprocess_with_streaming(cmd, timeout=timeout)
+
+    output = progress_msg + stdout
+
+    if returncode == 0:
+        return [TextContent(type="text", text=output)]
+    else:
+        return [TextContent(type="text", text=f"{output}\n\n❌ Error:\n{stderr}")]
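The tool translates MCP arguments into `video_scraper.py` flags. Exactly one source wins, in the order `from_json`, `url`, `video_file`, `playlist`. Optional floats are tested with `is not None` so that an explicit `0.0` still reaches the CLI. The sketch below condenses the optional-flag part into a table. `FLAG_SPECS` and `build_flags` are hypothetical, not code from this repository, and the sketch applies the `is not None` rule uniformly to every value flag.

```python
from typing import Any

# Hypothetical table: (arg key, CLI flag, kind)
# "bool" -> bare switch; "value" -> flag followed by str(value)
FLAG_SPECS: list[tuple[str, str, str]] = [
    ("visual", "--visual", "bool"),
    ("whisper_model", "--whisper-model", "value"),
    ("visual_interval", "--visual-interval", "value"),
    ("visual_min_gap", "--visual-min-gap", "value"),
    ("visual_similarity", "--visual-similarity", "value"),
    ("vision_ocr", "--vision-ocr", "bool"),
    ("start_time", "--start-time", "value"),
    ("end_time", "--end-time", "value"),
]


def build_flags(args: dict[str, Any]) -> list[str]:
    flags: list[str] = []
    for key, flag, kind in FLAG_SPECS:
        value = args.get(key)
        if kind == "bool":
            if value:
                flags.append(flag)
        elif value is not None and value != "":
            # 'is not None' keeps legitimate zeros such as visual_min_gap=0.0
            flags.extend([flag, str(value)])
    return flags


print(build_flags({"visual": True, "visual_min_gap": 0.0, "whisper_model": None}))
```

A table keeps the MCP parameter names and CLI flags side by side, so adding a new option means one new row rather than another `if` block.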
+
+
 async def scrape_github_tool(args: dict) -> list[TextContent]:
    """
    Scrape GitHub repository and build Claude skill.
--- a/src/skill_seekers/workflows/video-tutorial.yaml
+++ b/src/skill_seekers/workflows/video-tutorial.yaml
@@ -0,0 +1,111 @@
+name: video-tutorial
+description: >
+  Video tutorial enhancement workflow. Cleans OCR noise, reconstructs code from
+  transcript + visual data, detects programming languages, and synthesizes a
+  coherent tutorial skill from raw video extraction output.
+version: "1.0"
+applies_to:
+  - video_scraping
+variables: {}
+stages:
+  - name: ocr_code_cleanup
+    type: custom
+    target: skill_md
+    enabled: true
+    uses_history: false
+    prompt: |
+      You are reviewing code blocks extracted from video tutorial OCR.
+      The OCR output is noisy — it contains line numbers, UI chrome text,
+      garbled characters, and incomplete lines.
+
+      Clean each code block by:
+      1. Remove line numbers that OCR captured (leading digits like "1 ", "2 ", "23 ")
+      2. Remove UI elements (tab bar text, file names, button labels)
+      3. Fix common OCR errors (l/1, O/0, rn/m confusions)
+      4. Remove animation timeline numbers or frame counters
+      5. Strip trailing whitespace and normalize indentation
+
+      Output JSON with:
+      - "cleaned_blocks": array of cleaned code strings
+      - "languages_detected": map of block index to detected language
+      - "confidence": overall confidence in the cleanup (0-1)
+
+  - name: language_detection
+    type: custom
+    target: skill_md
+    enabled: true
+    uses_history: true
+    prompt: |
+      Based on the previous OCR cleanup results and the transcript content,
+      determine the programming language for each code block.
+
+      Detection strategy (in priority order):
+      1. Narrator mentions: "in GDScript", "this Python function", "our C# class"
+      2. Code patterns: extends/func/signal=GDScript, def/import=Python,
+         function/const/let=JavaScript, using/namespace=C#
+      3. File extensions visible in OCR (.gd, .py, .js, .cs)
+      4. Framework context from transcript (Godot=GDScript, Unity=C#, Django=Python)
+
+      Output JSON with:
+      - "language_map": map of block index to language identifier
+      - "primary_language": the main language used in the tutorial
+      - "framework": detected framework/engine if any
+
+  - name: tutorial_synthesis
+    type: custom
+    target: skill_md
+    enabled: true
+    uses_history: true
+    prompt: |
+      Synthesize the cleaned code blocks, detected languages, and transcript
+      into a coherent tutorial structure.
+
+      Group content by TOPIC rather than timestamp:
+      1. Identify the main concepts taught in the tutorial
+      2. Group related code blocks under concept headings
+      3. Use narrator explanations as descriptions for each code block
+      4. Build a progressive learning path where concepts build on each other
+      5. Show final working code for each concept, not intermediate OCR states
+
+      Use the Audio-Visual Alignment pairs (code + narrator text) as the
+      primary source for creating annotated examples.
+
+      Output JSON with:
+      - "sections": array of tutorial sections with title, description, code examples
+      - "prerequisites": what the viewer should know beforehand
+      - "key_concepts": important terms and their definitions from the tutorial
+      - "learning_path": ordered list of concept names
+
+  - name: skill_polish
+    type: custom
+    target: skill_md
+    enabled: true
+    uses_history: true
+    prompt: |
+      Using all previous stage results, polish the SKILL.md for this video tutorial.
+
+      Create:
+      1. Clear "When to Use This Skill" with specific trigger conditions
+      2. Quick Reference with 5-10 clean, annotated code examples
+      3. Step-by-step guide following the tutorial flow
+      4. Key concepts with definitions from the narrator
+      5. Proper language tags on all code fences
+
+      Rules:
+      - Never include raw OCR artifacts (line numbers, UI chrome)
+      - Always use correct language tags
+      - Keep code examples short and focused (5-30 lines)
+      - Make it actionable for someone implementing what the tutorial teaches
+
+      Output JSON with:
+      - "improved_overview": enhanced overview section
+      - "quick_start": concise getting-started snippet
+      - "key_concepts": essential concepts with definitions
+      - "code_examples": array of clean, annotated code examples
+
+post_process:
+  reorder_sections: []
+  add_metadata:
+    enhanced: true
+    workflow: video-tutorial
+    source_type: video
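The workflow runs its four stages in order. The three stages with `uses_history: true` are written to consume earlier results; for example, `language_detection` refers to "the previous OCR cleanup results". This diff does not show how the workflow engine actually threads history between stages. The sketch below assumes that prior stage outputs are appended to the prompt, and `Stage`, `run_stages`, and the `call_llm` callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    """Hypothetical mirror of one YAML stage entry."""

    name: str
    prompt: str
    uses_history: bool = False
    enabled: bool = True


def run_stages(
    stages: list[Stage],
    content: str,
    call_llm: Callable[[str], str],
) -> dict[str, str]:
    """Run enabled stages in order; history-aware stages see prior outputs."""
    results: dict[str, str] = {}
    for stage in stages:
        if not stage.enabled:
            continue
        parts = [stage.prompt]
        if stage.uses_history:
            # Assumption: history is passed as labelled sections in the prompt
            for prev_name, prev_out in results.items():
                parts.append(f"## Result of {prev_name}\n{prev_out}")
        parts.append(f"## Content\n{content}")
        results[stage.name] = call_llm("\n\n".join(parts))
    return results


stages = [
    Stage("ocr_code_cleanup", "Clean OCR code blocks."),
    Stage("language_detection", "Detect languages.", uses_history=True),
]
# Stub LLM: report how many history sections each prompt received
out = run_stages(stages, "SKILL.md text", lambda p: str(p.count("## Result of")))
print(out)
```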
--- a/tests/test_cli_parsers.py
+++ b/tests/test_cli_parsers.py
@@ -24,12 +24,12 @@ class TestParserRegistry:

    def test_all_parsers_registered(self):
        """Test that all parsers are registered."""
-        assert len(PARSERS) == 22, f"Expected 22 parsers, got {len(PARSERS)}"
+        assert len(PARSERS) == 23, f"Expected 23 parsers, got {len(PARSERS)}"

    def test_get_parser_names(self):
        """Test getting list of parser names."""
        names = get_parser_names()
-        assert len(names) == 22
+        assert len(names) == 23
        assert "scrape" in names
        assert "github" in names
        assert "package" in names
@@ -37,6 +37,7 @@ class TestParserRegistry:
        assert "analyze" in names
        assert "config" in names
        assert "workflows" in names
+        assert "video" in names

    def test_all_parsers_are_subcommand_parsers(self):
        """Test that all parsers inherit from SubcommandParser."""
@@ -242,9 +243,9 @@ class TestBackwardCompatibility:
            assert cmd in names, f"Command '{cmd}' not found in parser registry!"

    def test_command_count_matches(self):
-        """Test that we have exactly 22 commands (includes new create, workflows, and word commands)."""
-        assert len(PARSERS) == 22
-        assert len(get_parser_names()) == 22
+        """Test that we have exactly 23 commands (includes create, workflows, word, and video commands)."""
+        assert len(PARSERS) == 23
+        assert len(get_parser_names()) == 23


 if __name__ == "__main__":
--- a/tests/test_video_scraper.py
+++ b/tests/test_video_scraper.py
--- a/tests/test_video_setup.py
+++ b/tests/test_video_setup.py
@@ -0,0 +1,679 @@
+#!/usr/bin/env python3
+"""
+Tests for Video Setup (cli/video_setup.py) and video_visual.py resilience.
+
+Tests cover:
+- GPU detection (NVIDIA, AMD ROCm, AMD without ROCm, CPU fallback)
+- CUDA / ROCm version → index URL mapping
+- PyTorch installation (mocked subprocess)
+- Visual deps installation (mocked subprocess)
+- Installation verification
+- run_setup orchestrator
+- Venv detection and creation
+- System dep checks (tesseract binary)
+- ROCm env var configuration
+- Module selection (SetupModules)
+- Tesseract circuit breaker (video_visual.py)
+- --setup flag in VIDEO_ARGUMENTS and early-exit in video_scraper
+"""
+
+import os
+import subprocess
+import sys
+import tempfile
+import unittest
+from unittest.mock import MagicMock, patch
+
+from skill_seekers.cli.video_setup import (
+    _BASE_VIDEO_DEPS,
+    GPUInfo,
+    GPUVendor,
+    SetupModules,
+    _build_visual_deps,
+    _cuda_version_to_index_url,
+    _detect_distro,
+    _PYTORCH_BASE,
+    _rocm_version_to_index_url,
+    check_tesseract,
+    configure_rocm_env,
+    create_venv,
+    detect_gpu,
+    get_venv_activate_cmd,
+    get_venv_python,
+    install_torch,
+    install_visual_deps,
+    is_in_venv,
+    run_setup,
+    verify_installation,
+)
+
+
+# =============================================================================
+# GPU Detection Tests
+# =============================================================================
+
+
+class TestGPUDetection(unittest.TestCase):
+    """Tests for detect_gpu() and its helpers."""
+
+    @patch("skill_seekers.cli.video_setup.shutil.which")
+    @patch("skill_seekers.cli.video_setup.subprocess.run")
+    def test_nvidia_detected(self, mock_run, mock_which):
+        """nvidia-smi present → GPUVendor.NVIDIA."""
+        mock_which.side_effect = lambda cmd: "/usr/bin/nvidia-smi" if cmd == "nvidia-smi" else None
+        mock_run.return_value = MagicMock(
+            returncode=0,
+            stdout=(
+                "+-------------------------+\n"
+                "| NVIDIA GeForce RTX 4090  On |\n"
+                "| CUDA Version: 12.4      |\n"
+                "+-------------------------+\n"
+            ),
+        )
+        gpu = detect_gpu()
+        assert gpu.vendor == GPUVendor.NVIDIA
+        assert "12.4" in gpu.compute_version
+        assert "cu124" in gpu.index_url
+
+    @patch("skill_seekers.cli.video_setup.shutil.which")
+    @patch("skill_seekers.cli.video_setup.subprocess.run")
+    @patch("skill_seekers.cli.video_setup._read_rocm_version", return_value="6.3.1")
+    def test_amd_rocm_detected(self, mock_rocm_ver, mock_run, mock_which):
+        """rocminfo present → GPUVendor.AMD."""
+
+        def which_side(cmd):
+            if cmd == "nvidia-smi":
+                return None
+            if cmd == "rocminfo":
+                return "/usr/bin/rocminfo"
+            return None
+
+        mock_which.side_effect = which_side
+        mock_run.return_value = MagicMock(
+            returncode=0,
+            stdout="Marketing Name: AMD Radeon RX 7900 XTX\n",
+        )
+        gpu = detect_gpu()
+        assert gpu.vendor == GPUVendor.AMD
+        assert "rocm6.3" in gpu.index_url
+
+    @patch("skill_seekers.cli.video_setup.shutil.which")
+    @patch("skill_seekers.cli.video_setup.subprocess.run")
+    def test_amd_no_rocm_fallback(self, mock_run, mock_which):
+        """AMD GPU in lspci but no ROCm → AMD vendor, CPU index URL."""
+
+        def which_side(cmd):
+            if cmd == "lspci":
+                return "/usr/bin/lspci"
+            return None
+
+        mock_which.side_effect = which_side
+
+        mock_run.return_value = MagicMock(
+            returncode=0,
+            stdout="06:00.0 VGA compatible controller: AMD/ATI Navi 31 [Radeon RX 7900 XTX]\n",
+        )
+        gpu = detect_gpu()
+        assert gpu.vendor == GPUVendor.AMD
+        assert "cpu" in gpu.index_url
+        assert any("ROCm is not installed" in d for d in gpu.details)
+
+    @patch("skill_seekers.cli.video_setup.shutil.which", return_value=None)
+    def test_cpu_fallback(self, mock_which):
+        """No GPU tools found → GPUVendor.NONE."""
+        gpu = detect_gpu()
+        assert gpu.vendor == GPUVendor.NONE
+        assert "cpu" in gpu.index_url
+
+    @patch("skill_seekers.cli.video_setup.shutil.which")
+    @patch("skill_seekers.cli.video_setup.subprocess.run")
+    def test_nvidia_smi_error(self, mock_run, mock_which):
+        """nvidia-smi returns non-zero → skip to next check."""
+        mock_which.side_effect = lambda cmd: (
+            "/usr/bin/nvidia-smi" if cmd == "nvidia-smi" else None
+        )
+        mock_run.return_value = MagicMock(returncode=1, stdout="")
+        gpu = detect_gpu()
+        assert gpu.vendor == GPUVendor.NONE
+
+    @patch("skill_seekers.cli.video_setup.shutil.which")
+    @patch("skill_seekers.cli.video_setup.subprocess.run")
+    def test_nvidia_smi_timeout(self, mock_run, mock_which):
+        """nvidia-smi times out → skip to next check."""
+        mock_which.side_effect = lambda cmd: (
+            "/usr/bin/nvidia-smi" if cmd == "nvidia-smi" else None
+        )
+        mock_run.side_effect = subprocess.TimeoutExpired(cmd="nvidia-smi", timeout=10)
+        gpu = detect_gpu()
+        assert gpu.vendor == GPUVendor.NONE
+
+    @patch("skill_seekers.cli.video_setup.shutil.which")
+    @patch("skill_seekers.cli.video_setup.subprocess.run")
+    def test_rocminfo_error(self, mock_run, mock_which):
+        """rocminfo returns non-zero → skip to next check."""
+
+        def which_side(cmd):
+            if cmd == "nvidia-smi":
+                return None
+            if cmd == "rocminfo":
+                return "/usr/bin/rocminfo"
+            return None
+
+        mock_which.side_effect = which_side
+        mock_run.return_value = MagicMock(returncode=1, stdout="")
+        gpu = detect_gpu()
+        assert gpu.vendor == GPUVendor.NONE
+
+
+# =============================================================================
+# Version Mapping Tests
+# =============================================================================
+
+
+class TestVersionMapping(unittest.TestCase):
+    """Tests for CUDA/ROCm version → index URL mapping."""
+
+    def test_cuda_124(self):
+        assert _cuda_version_to_index_url("12.4") == f"{_PYTORCH_BASE}/cu124"
+
+    def test_cuda_126(self):
+        """A newer CUDA 12.x driver (12.6) is served by the cu124 wheel index."""
+        assert _cuda_version_to_index_url("12.6") == f"{_PYTORCH_BASE}/cu124"
+
+    def test_cuda_121(self):
+        assert _cuda_version_to_index_url("12.1") == f"{_PYTORCH_BASE}/cu121"
+
+    def test_cuda_118(self):
+        assert _cuda_version_to_index_url("11.8") == f"{_PYTORCH_BASE}/cu118"
+
+    def test_cuda_old_falls_to_cpu(self):
+        assert _cuda_version_to_index_url("10.2") == f"{_PYTORCH_BASE}/cpu"
+
+    def test_cuda_invalid_string(self):
+        assert _cuda_version_to_index_url("garbage") == f"{_PYTORCH_BASE}/cpu"
+
+    def test_rocm_63(self):
+        assert _rocm_version_to_index_url("6.3.1") == f"{_PYTORCH_BASE}/rocm6.3"
+
+    def test_rocm_60(self):
+        """ROCm 6.0 is served by the rocm6.2.4 wheel index."""
+        assert _rocm_version_to_index_url("6.0") == f"{_PYTORCH_BASE}/rocm6.2.4"
+
+    def test_rocm_old_falls_to_cpu(self):
+        assert _rocm_version_to_index_url("5.4") == f"{_PYTORCH_BASE}/cpu"
+
+    def test_rocm_invalid(self):
+        assert _rocm_version_to_index_url("bad") == f"{_PYTORCH_BASE}/cpu"
+
+
+# =============================================================================
+# Venv Tests
+# =============================================================================
+
+
+class TestVenv(unittest.TestCase):
+    """Tests for venv detection and creation."""
+
+    def test_is_in_venv_returns_bool(self):
+        result = is_in_venv()
+        assert isinstance(result, bool)
+
+    def test_is_in_venv_detects_prefix_mismatch(self):
+        # If sys.prefix != sys.base_prefix, we're in a venv
+        with patch.object(sys, "prefix", "/some/venv"), \
+             patch.object(sys, "base_prefix", "/usr"):
+            assert is_in_venv() is True
+
+    def test_is_in_venv_detects_no_venv(self):
+        with patch.object(sys, "prefix", "/usr"), \
+             patch.object(sys, "base_prefix", "/usr"):
+            assert is_in_venv() is False
+
+    def test_create_venv_in_tempdir(self):
+        with tempfile.TemporaryDirectory() as tmpdir:
+            venv_path = os.path.join(tmpdir, "test_venv")
+            result = create_venv(venv_path)
+            assert result is True
+            assert os.path.isdir(venv_path)
+
+    def test_create_venv_already_exists(self):
+        with tempfile.TemporaryDirectory() as tmpdir:
+            venv_path = os.path.join(tmpdir, "test_venv")
+            # Create a real venv once...
+            assert create_venv(venv_path) is True
+            # ...then a second call on the existing venv should still succeed
+            assert create_venv(venv_path) is True
+
+    def test_get_venv_python_linux(self):
+        with patch("skill_seekers.cli.video_setup.platform.system", return_value="Linux"):
+            path = get_venv_python("/path/.venv")
+            assert path.endswith("bin/python")
+
+    def test_get_venv_activate_cmd_linux(self):
+        with patch("skill_seekers.cli.video_setup.platform.system", return_value="Linux"):
+            cmd = get_venv_activate_cmd("/path/.venv")
+            assert "source" in cmd
+            assert "bin/activate" in cmd
+
+
+# =============================================================================
+# System Dep Check Tests
+# =============================================================================
+
+
+class TestSystemDeps(unittest.TestCase):
+    """Tests for system dependency checks."""
+
+    @patch("skill_seekers.cli.video_setup.shutil.which", return_value=None)
+    def test_tesseract_not_installed(self, mock_which):
+        result = check_tesseract()
+        assert result["installed"] is False
+        assert result["has_eng"] is False
+        assert isinstance(result["install_cmd"], str)
+
+    @patch("skill_seekers.cli.video_setup.subprocess.run")
+    @patch("skill_seekers.cli.video_setup.shutil.which", return_value="/usr/bin/tesseract")
+    def test_tesseract_installed_with_eng(self, mock_which, mock_run):
+        mock_run.side_effect = [
+            # --version call
+            MagicMock(returncode=0, stdout="tesseract 5.3.0\n", stderr=""),
+            # --list-langs call
+            MagicMock(returncode=0, stdout="List of available languages:\neng\nosd\n", stderr=""),
+        ]
+        result = check_tesseract()
+        assert result["installed"] is True
+        assert result["has_eng"] is True
+
+    @patch("skill_seekers.cli.video_setup.subprocess.run")
+    @patch("skill_seekers.cli.video_setup.shutil.which", return_value="/usr/bin/tesseract")
+    def test_tesseract_installed_no_eng(self, mock_which, mock_run):
+        mock_run.side_effect = [
+            MagicMock(returncode=0, stdout="tesseract 5.3.0\n", stderr=""),
+            MagicMock(returncode=0, stdout="List of available languages:\nosd\n", stderr=""),
+        ]
+        result = check_tesseract()
+        assert result["installed"] is True
+        assert result["has_eng"] is False
+
+    def test_detect_distro_returns_string(self):
+        result = _detect_distro()
+        assert isinstance(result, str)
+
+    @patch("builtins.open", side_effect=OSError)
+    def test_detect_distro_no_os_release(self, mock_open):
+        assert _detect_distro() == "unknown"
+
+
+# =============================================================================
+# ROCm Configuration Tests
+# =============================================================================
+
+
+class TestROCmConfig(unittest.TestCase):
+    """Tests for configure_rocm_env()."""
+
+    def test_sets_miopen_find_mode(self):
+        # patch.dict restores the whole of os.environ on exit, so variables set by
+        # configure_rocm_env() do not leak into other tests
+        with patch.dict(os.environ):
+            os.environ.pop("MIOPEN_FIND_MODE", None)
+            changes = configure_rocm_env()
+            assert "MIOPEN_FIND_MODE=FAST" in changes
+            assert os.environ["MIOPEN_FIND_MODE"] == "FAST"
+
+    def test_does_not_override_existing(self):
+        with patch.dict(os.environ, {"MIOPEN_FIND_MODE": "NORMAL"}):
+            changes = configure_rocm_env()
+            miopen_changes = [c for c in changes if "MIOPEN_FIND_MODE" in c]
+            assert len(miopen_changes) == 0
+            assert os.environ["MIOPEN_FIND_MODE"] == "NORMAL"
+
+    def test_sets_miopen_user_db_path(self):
+        with patch.dict(os.environ):
+            os.environ.pop("MIOPEN_USER_DB_PATH", None)
+            changes = configure_rocm_env()
+            db_changes = [c for c in changes if "MIOPEN_USER_DB_PATH" in c]
+            assert len(db_changes) == 1
+
+
+# =============================================================================
+# Module Selection Tests
+# =============================================================================
+
+
+class TestModuleSelection(unittest.TestCase):
+    """Tests for SetupModules and _build_visual_deps."""
+
+    def test_default_modules_all_true(self):
+        m = SetupModules()
+        assert m.torch is True
+        assert m.easyocr is True
+        assert m.opencv is True
+        assert m.tesseract is True
+        assert m.scenedetect is True
+        assert m.whisper is True
+
+    def test_build_all_deps(self):
+        deps = _build_visual_deps(SetupModules())
+        assert "yt-dlp" in deps
+        assert "youtube-transcript-api" in deps
+        assert "easyocr" in deps
+        assert "opencv-python-headless" in deps
+        assert "pytesseract" in deps
+        assert "scenedetect[opencv]" in deps
+        assert "faster-whisper" in deps
+
+    def test_build_no_optional_deps(self):
+        """Even with all optional modules off, base video deps are included."""
+        m = SetupModules(
+            torch=False, easyocr=False, opencv=False,
+            tesseract=False, scenedetect=False, whisper=False,
+        )
+        deps = _build_visual_deps(m)
+        assert deps == list(_BASE_VIDEO_DEPS)
+
+    def test_build_partial_deps(self):
+        m = SetupModules(easyocr=True, opencv=True, tesseract=False, scenedetect=False, whisper=False)
+        deps = _build_visual_deps(m)
+        assert "yt-dlp" in deps
+        assert "youtube-transcript-api" in deps
+        assert "easyocr" in deps
+        assert "opencv-python-headless" in deps
+        assert "pytesseract" not in deps
+        assert "faster-whisper" not in deps
+
+
+# =============================================================================
+# Installation Tests
+# =============================================================================
+
+
+class TestInstallation(unittest.TestCase):
+    """Tests for install_torch() and install_visual_deps()."""
+
+    @patch("skill_seekers.cli.video_setup.subprocess.run")
+    def test_install_torch_success(self, mock_run):
+        mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
+        gpu = GPUInfo(vendor=GPUVendor.NVIDIA, index_url=f"{_PYTORCH_BASE}/cu124")
+        assert install_torch(gpu) is True
+        call_args = mock_run.call_args[0][0]
+        assert "torch" in call_args
+        assert "--index-url" in call_args
+        assert f"{_PYTORCH_BASE}/cu124" in call_args
+
+    @patch("skill_seekers.cli.video_setup.subprocess.run")
+    def test_install_torch_cpu(self, mock_run):
+        mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
+        gpu = GPUInfo(vendor=GPUVendor.NONE, index_url=f"{_PYTORCH_BASE}/cpu")
+        assert install_torch(gpu) is True
+        call_args = mock_run.call_args[0][0]
+        assert f"{_PYTORCH_BASE}/cpu" in call_args
+
+    @patch("skill_seekers.cli.video_setup.subprocess.run")
+    def test_install_torch_failure(self, mock_run):
+        mock_run.return_value = MagicMock(returncode=1, stdout="", stderr="error msg")
+        gpu = GPUInfo(vendor=GPUVendor.NVIDIA, index_url=f"{_PYTORCH_BASE}/cu124")
+        assert install_torch(gpu) is False
+
+    @patch("skill_seekers.cli.video_setup.subprocess.run")
+    def test_install_torch_timeout(self, mock_run):
+        mock_run.side_effect = subprocess.TimeoutExpired(cmd="pip", timeout=600)
+        gpu = GPUInfo(vendor=GPUVendor.NVIDIA, index_url=f"{_PYTORCH_BASE}/cu124")
+        assert install_torch(gpu) is False
+
+    @patch("skill_seekers.cli.video_setup.subprocess.run")
+    def test_install_torch_custom_python(self, mock_run):
+        mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
+        gpu = GPUInfo(vendor=GPUVendor.NONE, index_url=f"{_PYTORCH_BASE}/cpu")
+        install_torch(gpu, python_exe="/custom/python")
+        call_args = mock_run.call_args[0][0]
+        assert call_args[0] == "/custom/python"
+
+    @patch("skill_seekers.cli.video_setup.subprocess.run")
+    def test_install_visual_deps_success(self, mock_run):
+        mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
+        assert install_visual_deps() is True
+        call_args = mock_run.call_args[0][0]
+        assert "easyocr" in call_args
+
+    @patch("skill_seekers.cli.video_setup.subprocess.run")
+    def test_install_visual_deps_failure(self, mock_run):
+        mock_run.return_value = MagicMock(returncode=1, stdout="", stderr="error")
+        assert install_visual_deps() is False
+
+    @patch("skill_seekers.cli.video_setup.subprocess.run")
+    def test_install_visual_deps_partial_modules(self, mock_run):
+        mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
+        modules = SetupModules(easyocr=True, opencv=False, tesseract=False, scenedetect=False, whisper=False)
+        install_visual_deps(modules)
+        call_args = mock_run.call_args[0][0]
+        assert "easyocr" in call_args
+        assert "opencv-python-headless" not in call_args
+
+    @patch("skill_seekers.cli.video_setup.subprocess.run")
+    def test_install_visual_deps_base_only(self, mock_run):
+        """Even with all optional modules off, base video deps get installed."""
+        mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
+        modules = SetupModules(easyocr=False, opencv=False, tesseract=False, scenedetect=False, whisper=False)
+        result = install_visual_deps(modules)
+        assert result is True
+        call_args = mock_run.call_args[0][0]
+        assert "yt-dlp" in call_args
+        assert "youtube-transcript-api" in call_args
+        assert "easyocr" not in call_args
+
+
+# =============================================================================
+# Verification Tests
+# =============================================================================
+
+
+class TestVerification(unittest.TestCase):
+    """Tests for verify_installation()."""
+
+    @patch.dict("sys.modules", {"torch": None, "easyocr": None, "cv2": None})
+    def test_returns_dict(self):
+        results = verify_installation()
+        assert isinstance(results, dict)
+
+    def test_expected_keys(self):
+        results = verify_installation()
+        for key in ("yt-dlp", "youtube-transcript-api", "torch", "torch.cuda", "torch.rocm", "easyocr", "opencv"):
+            assert key in results, f"Missing key: {key}"
+
+
+# =============================================================================
+# Orchestrator Tests
+# =============================================================================
+
+
+class TestRunSetup(unittest.TestCase):
+    """Tests for run_setup() orchestrator."""
+
+    @patch("skill_seekers.cli.video_setup.verify_installation")
+    @patch("skill_seekers.cli.video_setup.install_visual_deps", return_value=True)
+    @patch("skill_seekers.cli.video_setup.install_torch", return_value=True)
+    @patch("skill_seekers.cli.video_setup.check_tesseract")
+    @patch("skill_seekers.cli.video_setup.detect_gpu")
+    def test_non_interactive_success(self, mock_detect, mock_tess, mock_torch, mock_deps, mock_verify):
+        mock_detect.return_value = GPUInfo(
+            vendor=GPUVendor.NONE, name="CPU-only", index_url=f"{_PYTORCH_BASE}/cpu",
+        )
+        mock_tess.return_value = {"installed": True, "has_eng": True, "install_cmd": "", "version": "5.3.0"}
+        mock_verify.return_value = {
+            "torch": True, "torch.cuda": False, "torch.rocm": False,
+            "easyocr": True, "opencv": True, "pytesseract": True,
+            "scenedetect": True, "faster-whisper": True,
+        }
+        rc = run_setup(interactive=False)
+        assert rc == 0
+        mock_torch.assert_called_once()
+        mock_deps.assert_called_once()
+
+    @patch("skill_seekers.cli.video_setup.install_torch", return_value=False)
+    @patch("skill_seekers.cli.video_setup.check_tesseract")
+    @patch("skill_seekers.cli.video_setup.detect_gpu")
+    def test_failure_returns_nonzero(self, mock_detect, mock_tess, mock_torch):
+        mock_detect.return_value = GPUInfo(
+            vendor=GPUVendor.NONE, name="CPU-only", index_url=f"{_PYTORCH_BASE}/cpu",
+        )
+        mock_tess.return_value = {"installed": True, "has_eng": True, "install_cmd": "", "version": "5.3.0"}
+        rc = run_setup(interactive=False)
+        assert rc == 1
+
+    @patch("skill_seekers.cli.video_setup.install_torch", return_value=True)
+    @patch("skill_seekers.cli.video_setup.install_visual_deps", return_value=False)
+    @patch("skill_seekers.cli.video_setup.check_tesseract")
+    @patch("skill_seekers.cli.video_setup.detect_gpu")
+    def test_visual_deps_failure(self, mock_detect, mock_tess, mock_deps, mock_torch):
+        mock_detect.return_value = GPUInfo(
+            vendor=GPUVendor.NONE, name="CPU-only", index_url=f"{_PYTORCH_BASE}/cpu",
+        )
+        mock_tess.return_value = {"installed": True, "has_eng": True, "install_cmd": "", "version": "5.3.0"}
+        rc = run_setup(interactive=False)
+        assert rc == 1
+
+    @patch("skill_seekers.cli.video_setup.verify_installation")
+    @patch("skill_seekers.cli.video_setup.install_visual_deps", return_value=True)
+    @patch("skill_seekers.cli.video_setup.install_torch", return_value=True)
+    @patch("skill_seekers.cli.video_setup.check_tesseract")
+    @patch("skill_seekers.cli.video_setup.detect_gpu")
+    def test_rocm_configures_env(self, mock_detect, mock_tess, mock_torch, mock_deps, mock_verify):
+        """AMD GPU → configure_rocm_env called and env vars set."""
+        mock_detect.return_value = GPUInfo(
+            vendor=GPUVendor.AMD, name="RX 7900", index_url=f"{_PYTORCH_BASE}/rocm6.3",
+        )
+        mock_tess.return_value = {"installed": True, "has_eng": True, "install_cmd": "", "version": "5.3.0"}
+        mock_verify.return_value = {
+            "torch": True, "torch.cuda": False, "torch.rocm": True,
+            "easyocr": True, "opencv": True, "pytesseract": True,
+            "scenedetect": True, "faster-whisper": True,
+        }
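+        # Start without MIOPEN_FIND_MODE so the assertion proves run_setup() set it,
+        # and restore os.environ afterwards so nothing leaks into other tests
+        with patch.dict(os.environ):
+            os.environ.pop("MIOPEN_FIND_MODE", None)
+            rc = run_setup(interactive=False)
+            assert rc == 0
+            assert os.environ.get("MIOPEN_FIND_MODE") is not None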
+
+
+# =============================================================================
+# Tesseract Circuit Breaker Tests (video_visual.py)
+# =============================================================================
+
+
+class TestTesseractCircuitBreaker(unittest.TestCase):
+    """Tests for _tesseract_broken flag in video_visual.py."""
+
+    def test_circuit_breaker_flag_exists(self):
+        import skill_seekers.cli.video_visual as vv
+        assert hasattr(vv, "_tesseract_broken")
+
+    def test_circuit_breaker_skips_after_failure(self):
+        import skill_seekers.cli.video_visual as vv
+        from skill_seekers.cli.video_models import FrameType
+
+        # Save and set broken state
+        original = vv._tesseract_broken
+        try:
+            vv._tesseract_broken = True
+            result = vv._run_tesseract_ocr("/nonexistent/path.png", FrameType.CODE_EDITOR)
+            assert result == []
+        finally:
+            vv._tesseract_broken = original
+
+    def test_circuit_breaker_allows_when_not_broken(self):
+        import skill_seekers.cli.video_visual as vv
+        from skill_seekers.cli.video_models import FrameType
+
+        original = vv._tesseract_broken
+        try:
+            vv._tesseract_broken = False
+            if not vv.HAS_PYTESSERACT:
+                # pytesseract not installed → returns [] immediately
+                result = vv._run_tesseract_ocr("/nonexistent/path.png", FrameType.CODE_EDITOR)
+                assert result == []
+            # If pytesseract IS installed, OCR on the fake path may fail and trip the
+            # circuit breaker; the finally block restores the original flag either way.
+        finally:
+            vv._tesseract_broken = original
+
+
+# =============================================================================
+# MIOPEN Env Var Tests (video_visual.py)
+# =============================================================================
+
+
+class TestMIOPENEnvVars(unittest.TestCase):
+    """Tests that video_visual.py sets MIOPEN env vars at import time."""
+
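+    def test_miopen_find_mode_set(self):
+        # video_visual.py sets this at module level before importing torch; import it
+        # explicitly so the test does not depend on which test imported it first
+        import skill_seekers.cli.video_visual  # noqa: F401
+
+        assert "MIOPEN_FIND_MODE" in os.environ
+
+    def test_miopen_user_db_path_set(self):
+        import skill_seekers.cli.video_visual  # noqa: F401
+
+        assert "MIOPEN_USER_DB_PATH" in os.environ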
+
+
+# =============================================================================
+# Argument & Early-Exit Tests
+# =============================================================================
+
+
+class TestVideoArgumentSetup(unittest.TestCase):
+    """Tests for --setup flag in VIDEO_ARGUMENTS."""
+
+    def test_setup_in_video_arguments(self):
+        from skill_seekers.cli.arguments.video import VIDEO_ARGUMENTS
+
+        assert "setup" in VIDEO_ARGUMENTS
+        assert VIDEO_ARGUMENTS["setup"]["kwargs"]["action"] == "store_true"
+
+    def test_parser_accepts_setup(self):
+        import argparse
+
+        from skill_seekers.cli.arguments.video import add_video_arguments
+
+        parser = argparse.ArgumentParser()
+        add_video_arguments(parser)
+        args = parser.parse_args(["--setup"])
+        assert args.setup is True
+
+    def test_parser_default_false(self):
+        import argparse
+
+        from skill_seekers.cli.arguments.video import add_video_arguments
+
+        parser = argparse.ArgumentParser()
+        add_video_arguments(parser)
+        args = parser.parse_args(["--url", "https://example.com"])
+        assert args.setup is False
+
+
+class TestVideoScraperSetupEarlyExit(unittest.TestCase):
+    """Test that --setup exits before source validation."""
+
+    @patch("skill_seekers.cli.video_setup.run_setup", return_value=0)
+    def test_setup_skips_source_validation(self, mock_setup):
+        """--setup without --url should NOT error about missing source."""
+        from skill_seekers.cli.video_scraper import main
+
+        old_argv = sys.argv
+        try:
+            sys.argv = ["video_scraper", "--setup"]
+            rc = main()
+            assert rc == 0
+            mock_setup.assert_called_once_with(interactive=True)
+        finally:
+            sys.argv = old_argv
+
+
+if __name__ == "__main__":
+    unittest.main()
--- a/uv.lock
+++ b/uv.lock
@@ -250,6 +250,63 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/f8/00/3ed12264094ec91f534fae429945efbaa9f8c666f3aa7061cc3b2a26a0cd/authlib-1.6.7-py2.py3-none-any.whl", hash = "sha256:c637340d9a02789d2efa1d003a7437d10d3e565237bcb5fcbc6c134c7b95bab0", size = 244115, upload-time = "2026-02-06T14:04:12.141Z" },
 ]

+[[package]]
+name = "av"
+version = "16.1.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/78/cd/3a83ffbc3cc25b39721d174487fb0d51a76582f4a1703f98e46170ce83d4/av-16.1.0.tar.gz", hash = "sha256:a094b4fd87a3721dacf02794d3d2c82b8d712c85b9534437e82a8a978c175ffd", size = 4285203, upload-time = "2026-01-11T07:31:33.772Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/97/51/2217a9249409d2e88e16e3f16f7c0def9fd3e7ffc4238b2ec211f9935bdb/av-16.1.0-cp310-cp310-macosx_11_0_x86_64.whl", hash = "sha256:2395748b0c34fe3a150a1721e4f3d4487b939520991b13e7b36f8926b3b12295", size = 26942590, upload-time = "2026-01-09T20:17:58.588Z" },
+    { url = "https://files.pythonhosted.org/packages/bf/cd/a7070f4febc76a327c38808e01e2ff6b94531fe0b321af54ea3915165338/av-16.1.0-cp310-cp310-macosx_14_0_arm64.whl", hash = "sha256:72d7ac832710a158eeb7a93242370aa024a7646516291c562ee7f14a7ea881fd", size = 21507910, upload-time = "2026-01-09T20:18:02.309Z" },
+    { url = "https://files.pythonhosted.org/packages/ae/30/ec812418cd9b297f0238fe20eb0747d8a8b68d82c5f73c56fe519a274143/av-16.1.0-cp310-cp310-manylinux_2_28_aarch64.whl", hash = "sha256:6cbac833092e66b6b0ac4d81ab077970b8ca874951e9c3974d41d922aaa653ed", size = 38738309, upload-time = "2026-01-09T20:18:04.701Z" },
+    { url = "https://files.pythonhosted.org/packages/3a/b8/6c5795bf1f05f45c5261f8bce6154e0e5e86b158a6676650ddd77c28805e/av-16.1.0-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:eb990672d97c18f99c02f31c8d5750236f770ffe354b5a52c5f4d16c5e65f619", size = 40293006, upload-time = "2026-01-09T20:18:07.238Z" },
+    { url = "https://files.pythonhosted.org/packages/a7/44/5e183bcb9333fc3372ee6e683be8b0c9b515a506894b2d32ff465430c074/av-16.1.0-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:05ad70933ac3b8ef896a820ea64b33b6cca91a5fac5259cb9ba7fa010435be15", size = 40123516, upload-time = "2026-01-09T20:18:09.955Z" },
+    { url = "https://files.pythonhosted.org/packages/12/1d/b5346d582a3c3d958b4d26a2cc63ce607233582d956121eb20d2bbe55c2e/av-16.1.0-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:d831a1062a3c47520bf99de6ec682bd1d64a40dfa958e5457bb613c5270e7ce3", size = 41463289, upload-time = "2026-01-09T20:18:12.459Z" },
+    { url = "https://files.pythonhosted.org/packages/fa/31/acc946c0545f72b8d0d74584cb2a0ade9b7dfe2190af3ef9aa52a2e3c0b1/av-16.1.0-cp310-cp310-win_amd64.whl", hash = "sha256:358ab910fef3c5a806c55176f2b27e5663b33c4d0a692dafeb049c6ed71f8aff", size = 31754959, upload-time = "2026-01-09T20:18:14.718Z" },
+    { url = "https://files.pythonhosted.org/packages/48/d0/b71b65d1b36520dcb8291a2307d98b7fc12329a45614a303ff92ada4d723/av-16.1.0-cp311-cp311-macosx_11_0_x86_64.whl", hash = "sha256:e88ad64ee9d2b9c4c5d891f16c22ae78e725188b8926eb88187538d9dd0b232f", size = 26927747, upload-time = "2026-01-09T20:18:16.976Z" },
+    { url = "https://files.pythonhosted.org/packages/2f/79/720a5a6ccdee06eafa211b945b0a450e3a0b8fc3d12922f0f3c454d870d2/av-16.1.0-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:cb296073fa6935724de72593800ba86ae49ed48af03960a4aee34f8a611f442b", size = 21492232, upload-time = "2026-01-09T20:18:19.266Z" },
+    { url = "https://files.pythonhosted.org/packages/8e/4f/a1ba8d922f2f6d1a3d52419463ef26dd6c4d43ee364164a71b424b5ae204/av-16.1.0-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:720edd4d25aa73723c1532bb0597806d7b9af5ee34fc02358782c358cfe2f879", size = 39291737, upload-time = "2026-01-09T20:18:21.513Z" },
+    { url = "https://files.pythonhosted.org/packages/1a/31/fc62b9fe8738d2693e18d99f040b219e26e8df894c10d065f27c6b4f07e3/av-16.1.0-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:c7f2bc703d0df260a1fdf4de4253c7f5500ca9fc57772ea241b0cb241bcf972e", size = 40846822, upload-time = "2026-01-09T20:18:24.275Z" },
+    { url = "https://files.pythonhosted.org/packages/53/10/ab446583dbce730000e8e6beec6ec3c2753e628c7f78f334a35cad0317f4/av-16.1.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:d69c393809babada7d54964d56099e4b30a3e1f8b5736ca5e27bd7be0e0f3c83", size = 40675604, upload-time = "2026-01-09T20:18:26.866Z" },
+    { url = "https://files.pythonhosted.org/packages/31/d7/1003be685277005f6d63fd9e64904ee222fe1f7a0ea70af313468bb597db/av-16.1.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:441892be28582356d53f282873c5a951592daaf71642c7f20165e3ddcb0b4c63", size = 42015955, upload-time = "2026-01-09T20:18:29.461Z" },
+    { url = "https://files.pythonhosted.org/packages/2f/4a/fa2a38ee9306bf4579f556f94ecbc757520652eb91294d2a99c7cf7623b9/av-16.1.0-cp311-cp311-win_amd64.whl", hash = "sha256:273a3e32de64819e4a1cd96341824299fe06f70c46f2288b5dc4173944f0fd62", size = 31750339, upload-time = "2026-01-09T20:18:32.249Z" },
+    { url = "https://files.pythonhosted.org/packages/9c/84/2535f55edcd426cebec02eb37b811b1b0c163f26b8d3f53b059e2ec32665/av-16.1.0-cp312-cp312-macosx_11_0_x86_64.whl", hash = "sha256:640f57b93f927fba8689f6966c956737ee95388a91bd0b8c8b5e0481f73513d6", size = 26945785, upload-time = "2026-01-09T20:18:34.486Z" },
+    { url = "https://files.pythonhosted.org/packages/b6/17/ffb940c9e490bf42e86db4db1ff426ee1559cd355a69609ec1efe4d3a9eb/av-16.1.0-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:ae3fb658eec00852ebd7412fdc141f17f3ddce8afee2d2e1cf366263ad2a3b35", size = 21481147, upload-time = "2026-01-09T20:18:36.716Z" },
+    { url = "https://files.pythonhosted.org/packages/15/c1/e0d58003d2d83c3921887d5c8c9b8f5f7de9b58dc2194356a2656a45cfdc/av-16.1.0-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:27ee558d9c02a142eebcbe55578a6d817fedfde42ff5676275504e16d07a7f86", size = 39517197, upload-time = "2026-01-11T09:57:31.937Z" },
+    { url = "https://files.pythonhosted.org/packages/32/77/787797b43475d1b90626af76f80bfb0c12cfec5e11eafcfc4151b8c80218/av-16.1.0-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:7ae547f6d5fa31763f73900d43901e8c5fa6367bb9a9840978d57b5a7ae14ed2", size = 41174337, upload-time = "2026-01-11T09:57:35.792Z" },
+    { url = "https://files.pythonhosted.org/packages/8e/ac/d90df7f1e3b97fc5554cf45076df5045f1e0a6adf13899e10121229b826c/av-16.1.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:8cf065f9d438e1921dc31fc7aa045790b58aee71736897866420d80b5450f62a", size = 40817720, upload-time = "2026-01-11T09:57:39.039Z" },
+    { url = "https://files.pythonhosted.org/packages/80/6f/13c3a35f9dbcebafd03fe0c4cbd075d71ac8968ec849a3cfce406c35a9d2/av-16.1.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:a345877a9d3cc0f08e2bc4ec163ee83176864b92587afb9d08dff50f37a9a829", size = 42267396, upload-time = "2026-01-11T09:57:42.115Z" },
+    { url = "https://files.pythonhosted.org/packages/c8/b9/275df9607f7fb44317ccb1d4be74827185c0d410f52b6e2cd770fe209118/av-16.1.0-cp312-cp312-win_amd64.whl", hash = "sha256:f49243b1d27c91cd8c66fdba90a674e344eb8eb917264f36117bf2b6879118fd", size = 31752045, upload-time = "2026-01-11T09:57:45.106Z" },
+    { url = "https://files.pythonhosted.org/packages/75/2a/63797a4dde34283dd8054219fcb29294ba1c25d68ba8c8c8a6ae53c62c45/av-16.1.0-cp313-cp313-macosx_11_0_x86_64.whl", hash = "sha256:ce2a1b3d8bf619f6c47a9f28cfa7518ff75ddd516c234a4ee351037b05e6a587", size = 26916715, upload-time = "2026-01-11T09:57:47.682Z" },
+    { url = "https://files.pythonhosted.org/packages/d2/c4/0b49cf730d0ae8cda925402f18ae814aef351f5772d14da72dd87ff66448/av-16.1.0-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:408dbe6a2573ca58a855eb8cd854112b33ea598651902c36709f5f84c991ed8e", size = 21452167, upload-time = "2026-01-11T09:57:50.606Z" },
+    { url = "https://files.pythonhosted.org/packages/51/23/408806503e8d5d840975aad5699b153aaa21eb6de41ade75248a79b7a37f/av-16.1.0-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:57f657f86652a160a8a01887aaab82282f9e629abf94c780bbdbb01595d6f0f7", size = 39215659, upload-time = "2026-01-11T09:57:53.757Z" },
+    { url = "https://files.pythonhosted.org/packages/c4/19/a8528d5bba592b3903f44c28dab9cc653c95fcf7393f382d2751a1d1523e/av-16.1.0-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:adbad2b355c2ee4552cac59762809d791bda90586d134a33c6f13727fb86cb3a", size = 40874970, upload-time = "2026-01-11T09:57:56.802Z" },
+    { url = "https://files.pythonhosted.org/packages/e8/24/2dbcdf0e929ad56b7df078e514e7bd4ca0d45cba798aff3c8caac097d2f7/av-16.1.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:f42e1a68ec2aebd21f7eb6895be69efa6aa27eec1670536876399725bbda4b99", size = 40530345, upload-time = "2026-01-11T09:58:00.421Z" },
+    { url = "https://files.pythonhosted.org/packages/54/27/ae91b41207f34e99602d1c72ab6ffd9c51d7c67e3fbcd4e3a6c0e54f882c/av-16.1.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:58fe47aeaef0f100c40ec8a5de9abbd37f118d3ca03829a1009cf288e9aef67c", size = 41972163, upload-time = "2026-01-11T09:58:03.756Z" },
+    { url = "https://files.pythonhosted.org/packages/fc/7a/22158fb923b2a9a00dfab0e96ef2e8a1763a94dd89e666a5858412383d46/av-16.1.0-cp313-cp313-win_amd64.whl", hash = "sha256:565093ebc93b2f4b76782589564869dadfa83af5b852edebedd8fee746457d06", size = 31729230, upload-time = "2026-01-11T09:58:07.254Z" },
+    { url = "https://files.pythonhosted.org/packages/7f/f1/878f8687d801d6c4565d57ebec08449c46f75126ebca8e0fed6986599627/av-16.1.0-cp313-cp313t-macosx_11_0_x86_64.whl", hash = "sha256:574081a24edb98343fd9f473e21ae155bf61443d4ec9d7708987fa597d6b04b2", size = 27008769, upload-time = "2026-01-11T09:58:10.266Z" },
+    { url = "https://files.pythonhosted.org/packages/30/f1/bd4ce8c8b5cbf1d43e27048e436cbc9de628d48ede088a1d0a993768eb86/av-16.1.0-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:9ab00ea29c25ebf2ea1d1e928d7babb3532d562481c5d96c0829212b70756ad0", size = 21590588, upload-time = "2026-01-11T09:58:12.629Z" },
+    { url = "https://files.pythonhosted.org/packages/1d/dd/c81f6f9209201ff0b5d5bed6da6c6e641eef52d8fbc930d738c3f4f6f75d/av-16.1.0-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:a84a91188c1071f238a9523fd42dbe567fb2e2607b22b779851b2ce0eac1b560", size = 40638029, upload-time = "2026-01-11T09:58:15.399Z" },
+    { url = "https://files.pythonhosted.org/packages/15/4d/07edff82b78d0459a6e807e01cd280d3180ce832efc1543de80d77676722/av-16.1.0-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:c2cd0de4dd022a7225ff224fde8e7971496d700be41c50adaaa26c07bb50bf97", size = 41970776, upload-time = "2026-01-11T09:58:19.075Z" },
+    { url = "https://files.pythonhosted.org/packages/da/9d/1f48b354b82fa135d388477cd1b11b81bdd4384bd6a42a60808e2ec2d66b/av-16.1.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:0816143530624a5a93bc5494f8c6eeaf77549b9366709c2ac8566c1e9bff6df5", size = 41764751, upload-time = "2026-01-11T09:58:22.788Z" },
+    { url = "https://files.pythonhosted.org/packages/2f/c7/a509801e98db35ec552dd79da7bdbcff7104044bfeb4c7d196c1ce121593/av-16.1.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:e3a28053af29644696d0c007e897d19b1197585834660a54773e12a40b16974c", size = 43034355, upload-time = "2026-01-11T09:58:26.125Z" },
+    { url = "https://files.pythonhosted.org/packages/36/8b/e5f530d9e8f640da5f5c5f681a424c65f9dd171c871cd255d8a861785a6e/av-16.1.0-cp313-cp313t-win_amd64.whl", hash = "sha256:2e3e67144a202b95ed299d165232533989390a9ea3119d37eccec697dc6dbb0c", size = 31947047, upload-time = "2026-01-11T09:58:31.867Z" },
+    { url = "https://files.pythonhosted.org/packages/df/18/8812221108c27d19f7e5f486a82c827923061edf55f906824ee0fcaadf50/av-16.1.0-cp314-cp314-macosx_11_0_x86_64.whl", hash = "sha256:39a634d8e5a87e78ea80772774bfd20c0721f0d633837ff185f36c9d14ffede4", size = 26916179, upload-time = "2026-01-11T09:58:36.506Z" },
+    { url = "https://files.pythonhosted.org/packages/38/ef/49d128a9ddce42a2766fe2b6595bd9c49e067ad8937a560f7838a541464e/av-16.1.0-cp314-cp314-macosx_14_0_arm64.whl", hash = "sha256:0ba32fb9e9300948a7fa9f8a3fc686e6f7f77599a665c71eb2118fdfd2c743f9", size = 21460168, upload-time = "2026-01-11T09:58:39.231Z" },
+    { url = "https://files.pythonhosted.org/packages/e6/a9/b310d390844656fa74eeb8c2750e98030877c75b97551a23a77d3f982741/av-16.1.0-cp314-cp314-manylinux_2_28_aarch64.whl", hash = "sha256:ca04d17815182d34ce3edc53cbda78a4f36e956c0fd73e3bab249872a831c4d7", size = 39210194, upload-time = "2026-01-11T09:58:42.138Z" },
+    { url = "https://files.pythonhosted.org/packages/0c/7b/e65aae179929d0f173af6e474ad1489b5b5ad4c968a62c42758d619e54cf/av-16.1.0-cp314-cp314-manylinux_2_28_x86_64.whl", hash = "sha256:ee0e8de2e124a9ef53c955fe2add6ee7c56cc8fd83318265549e44057db77142", size = 40811675, upload-time = "2026-01-11T09:58:45.871Z" },
+    { url = "https://files.pythonhosted.org/packages/54/3f/5d7edefd26b6a5187d6fac0f5065ee286109934f3dea607ef05e53f05b31/av-16.1.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:22bf77a2f658827043a1e184b479c3bf25c4c43ab32353677df2d119f080e28f", size = 40543942, upload-time = "2026-01-11T09:58:49.759Z" },
+    { url = "https://files.pythonhosted.org/packages/1b/24/f8b17897b67be0900a211142f5646a99d896168f54d57c81f3e018853796/av-16.1.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:2dd419d262e6a71cab206d80bbf28e0a10d0f227b671cdf5e854c028faa2d043", size = 41924336, upload-time = "2026-01-11T09:58:53.344Z" },
+    { url = "https://files.pythonhosted.org/packages/1c/cf/d32bc6bbbcf60b65f6510c54690ed3ae1c4ca5d9fafbce835b6056858686/av-16.1.0-cp314-cp314-win_amd64.whl", hash = "sha256:53585986fd431cd436f290fba662cfb44d9494fbc2949a183de00acc5b33fa88", size = 31735077, upload-time = "2026-01-11T09:58:56.684Z" },
+    { url = "https://files.pythonhosted.org/packages/53/f4/9b63dc70af8636399bd933e9df4f3025a0294609510239782c1b746fc796/av-16.1.0-cp314-cp314t-macosx_11_0_x86_64.whl", hash = "sha256:76f5ed8495cf41e1209a5775d3699dc63fdc1740b94a095e2485f13586593205", size = 27014423, upload-time = "2026-01-11T09:58:59.703Z" },
+    { url = "https://files.pythonhosted.org/packages/d1/da/787a07a0d6ed35a0888d7e5cfb8c2ffa202f38b7ad2c657299fac08eb046/av-16.1.0-cp314-cp314t-macosx_14_0_arm64.whl", hash = "sha256:8d55397190f12a1a3ae7538be58c356cceb2bf50df1b33523817587748ce89e5", size = 21595536, upload-time = "2026-01-11T09:59:02.508Z" },
+    { url = "https://files.pythonhosted.org/packages/d8/f4/9a7d8651a611be6e7e3ab7b30bb43779899c8cac5f7293b9fb634c44a3f3/av-16.1.0-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:9d51d9037437218261b4bbf9df78a95e216f83d7774fbfe8d289230b5b2e28e2", size = 40642490, upload-time = "2026-01-11T09:59:05.842Z" },
+    { url = "https://files.pythonhosted.org/packages/6b/e4/eb79bc538a94b4ff93cd4237d00939cba797579f3272490dd0144c165a21/av-16.1.0-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:0ce07a89c15644407f49d942111ca046e323bbab0a9078ff43ee57c9b4a50dad", size = 41976905, upload-time = "2026-01-11T09:59:09.169Z" },
+    { url = "https://files.pythonhosted.org/packages/5e/f5/f6db0dd86b70167a4d55ee0d9d9640983c570d25504f2bde42599f38241e/av-16.1.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:cac0c074892ea97113b53556ff41c99562db7b9f09f098adac1f08318c2acad5", size = 41770481, upload-time = "2026-01-11T09:59:12.74Z" },
+    { url = "https://files.pythonhosted.org/packages/9e/8b/33651d658e45e16ab7671ea5fcf3d20980ea7983234f4d8d0c63c65581a5/av-16.1.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:7dec3dcbc35a187ce450f65a2e0dda820d5a9e6553eea8344a1459af11c98649", size = 43036824, upload-time = "2026-01-11T09:59:16.507Z" },
+    { url = "https://files.pythonhosted.org/packages/83/41/7f13361db54d7e02f11552575c0384dadaf0918138f4eaa82ea03a9f9580/av-16.1.0-cp314-cp314t-win_amd64.whl", hash = "sha256:6f90dc082ff2068ddbe77618400b44d698d25d9c4edac57459e250c16b33d700", size = 31948164, upload-time = "2026-01-11T09:59:19.501Z" },
+]
+
 [[package]]
 name = "azure-core"
 version = "1.38.0"
@@ -871,6 +928,49 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/0d/c3/e90f4a4feae6410f914f8ebac129b9ae7a8c92eb60a638012dde42030a9d/cryptography-46.0.3-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:6b5063083824e5509fdba180721d55909ffacccc8adbec85268b48439423d78c", size = 3438528, upload-time = "2025-10-15T23:18:26.227Z" },
 ]

+[[package]]
+name = "ctranslate2"
+version = "4.7.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+    { name = "numpy", version = "2.4.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+    { name = "pyyaml" },
+    { name = "setuptools" },
+]
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/cb/e0/b69c40c3d739b213a78d327071240590792071b4f890e34088b03b95bb1e/ctranslate2-4.7.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:9017a355dd7c6d29dc3bca6e9fc74827306c61b702c66bb1f6b939655e7de3fa", size = 1255773, upload-time = "2026-02-04T06:11:04.769Z" },
+    { url = "https://files.pythonhosted.org/packages/51/29/e5c2fc1253e3fb9b2c86997f36524bba182a8ed77fb4f8fe8444a5649191/ctranslate2-4.7.1-cp310-cp310-macosx_11_0_x86_64.whl", hash = "sha256:6abcd0552285e7173475836f9d133e04dfc3e42ca8e6930f65eaa4b8b13a47fa", size = 11914945, upload-time = "2026-02-04T06:11:06.853Z" },
+    { url = "https://files.pythonhosted.org/packages/03/25/e7fe847d3f02c84d2e9c5e8312434fbeab5af3d8916b6c8e2bdbe860d052/ctranslate2-4.7.1-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8492cba605319e0d7f2760180957d5a2a435dfdebcef1a75d2ade740e6b9fb0b", size = 16547973, upload-time = "2026-02-04T06:11:09.021Z" },
+    { url = "https://files.pythonhosted.org/packages/68/75/074ed22bc340c2e26c09af6bf85859b586516e4e2d753b20189936d0dcf7/ctranslate2-4.7.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:688bd82482b5d057eff5bc1e727f11bb9a1277b7e4fce8ab01fd3bb70e69294b", size = 38636471, upload-time = "2026-02-04T06:11:12.146Z" },
+    { url = "https://files.pythonhosted.org/packages/76/b6/9baf8a565f6dcdbfbc9cfd179dd6214529838cda4e91e89b616045a670f0/ctranslate2-4.7.1-cp310-cp310-win_amd64.whl", hash = "sha256:3b39a5f4e3c87ac91976996458a64ba08a7cbf974dc0be4e6df83a9e040d4bd2", size = 18842389, upload-time = "2026-02-04T06:11:15.154Z" },
+    { url = "https://files.pythonhosted.org/packages/da/25/41920ccee68e91cb6fa0fc9e8078ab2b7839f2c668f750dc123144cb7c6e/ctranslate2-4.7.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:f74200bab9996b14a57cf6f7cb27d0921ceedc4acc1e905598e3e85b4d75b1ec", size = 1256943, upload-time = "2026-02-04T06:11:17.781Z" },
+    { url = "https://files.pythonhosted.org/packages/79/22/bc81fcc9f10ba4da3ffd1a9adec15cfb73cb700b3bbe69c6c8b55d333316/ctranslate2-4.7.1-cp311-cp311-macosx_11_0_x86_64.whl", hash = "sha256:59b427eb3ac999a746315b03a63942fddd351f511db82ba1a66880d4dea98e25", size = 11916445, upload-time = "2026-02-04T06:11:19.938Z" },
+    { url = "https://files.pythonhosted.org/packages/0a/a7/494a66bb02c7926331cadfff51d5ce81f5abfb1e8d05d7f2459082f31b48/ctranslate2-4.7.1-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:95f0c1051c180669d2a83a44b44b518b2d1683de125f623bbc81ad5dd6f6141c", size = 16696997, upload-time = "2026-02-04T06:11:22.697Z" },
+    { url = "https://files.pythonhosted.org/packages/ed/4e/b48f79fd36e5d3c7e12db383aa49814c340921a618ef7364bd0ced670644/ctranslate2-4.7.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0ed92d9ab0ac6bc7005942be83d68714c80adb0897ab17f98157294ee0374347", size = 38836379, upload-time = "2026-02-04T06:11:26.325Z" },
+    { url = "https://files.pythonhosted.org/packages/d2/23/8c01ac52e1f26fc4dbe985a35222ae7cd365bbf7ee5db5fd5545d8926f91/ctranslate2-4.7.1-cp311-cp311-win_amd64.whl", hash = "sha256:67d9ad9b69933fbfeee7dcec899b2cd9341d5dca4fdfb53e8ba8c109dc332ee1", size = 18843315, upload-time = "2026-02-04T06:11:29.441Z" },
+    { url = "https://files.pythonhosted.org/packages/fc/0f/581de94b64c5f2327a736270bc7e7a5f8fe5cf1ed56a2203b52de4d8986a/ctranslate2-4.7.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:4c0cbd46a23b8dc37ccdbd9b447cb5f7fadc361c90e9df17d82ca84b1f019986", size = 1257089, upload-time = "2026-02-04T06:11:32.442Z" },
+    { url = "https://files.pythonhosted.org/packages/3d/e9/d55b0e436362f9fe26bd98fefd2dd5d81926121f1d7f799c805e6035bb26/ctranslate2-4.7.1-cp312-cp312-macosx_11_0_x86_64.whl", hash = "sha256:5b141ddad1da5f84cf3c2a569a56227a37de649a555d376cbd9b80e8f0373dd8", size = 11918502, upload-time = "2026-02-04T06:11:33.986Z" },
+    { url = "https://files.pythonhosted.org/packages/ec/ce/9f29f0b0bb4280c2ebafb3ddb6cdff8ef1c2e185ee020c0ec0ecba7dc934/ctranslate2-4.7.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d00a62544db4a3caaa58a3c50d39b25613c042b430053ae32384d94eb1d40990", size = 16859601, upload-time = "2026-02-04T06:11:36.227Z" },
+    { url = "https://files.pythonhosted.org/packages/b3/86/428d270fd72117d19fb48ed3211aa8a3c8bd7577373252962cb634e0fd01/ctranslate2-4.7.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:722b93a89647974cbd182b4c7f87fefc7794fff7fc9cbd0303b6447905cc157e", size = 38995338, upload-time = "2026-02-04T06:11:42.789Z" },
+    { url = "https://files.pythonhosted.org/packages/4a/f4/d23dbfb9c62cb642c114a30f05d753ba61d6ffbfd8a3a4012fe85a073bcb/ctranslate2-4.7.1-cp312-cp312-win_amd64.whl", hash = "sha256:d0f734dc3757118094663bdaaf713f5090c55c1927fb330a76bb8b84173940e8", size = 18844949, upload-time = "2026-02-04T06:11:45.436Z" },
+    { url = "https://files.pythonhosted.org/packages/34/6d/eb49ba05db286b4ea9d5d3fcf5f5cd0a9a5e218d46349618d5041001e303/ctranslate2-4.7.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:6b2abf2929756e3ec6246057b56df379995661560a2d776af05f9d97f63afcf5", size = 1256960, upload-time = "2026-02-04T06:11:47.487Z" },
+    { url = "https://files.pythonhosted.org/packages/45/5a/b9cce7b00d89fc6fdeaf27587aa52d0597b465058563e93ff50910553bdd/ctranslate2-4.7.1-cp313-cp313-macosx_11_0_x86_64.whl", hash = "sha256:857ef3959d6b1c40dc227c715a36db33db2d097164996d6c75b6db8e30828f52", size = 11918645, upload-time = "2026-02-04T06:11:49.599Z" },
+    { url = "https://files.pythonhosted.org/packages/ea/03/c0db0a5276599fb44ceafa2f2cb1afd5628808ec406fe036060a39693680/ctranslate2-4.7.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:393a9e7e989034660526a2c0e8bb65d1924f43d9a5c77d336494a353d16ba2a4", size = 16860452, upload-time = "2026-02-04T06:11:52.276Z" },
+    { url = "https://files.pythonhosted.org/packages/0b/03/4e3728ce29d192ee75ed9a2d8589bf4f19edafe5bed3845187de51b179a3/ctranslate2-4.7.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5a3d0682f2b9082e31c73d75b45f16cde77355ab76d7e8356a24c3cb2480a6d3", size = 38995174, upload-time = "2026-02-04T06:11:55.477Z" },
+    { url = "https://files.pythonhosted.org/packages/9b/15/6e8e87c6a201d69803a79ac2e29623ce7c2cc9cd1df9db99810cca714373/ctranslate2-4.7.1-cp313-cp313-win_amd64.whl", hash = "sha256:baa6d2b10f57933d8c11791e8522659217918722d07bbef2389a443801125fe7", size = 18844953, upload-time = "2026-02-04T06:11:58.519Z" },
+    { url = "https://files.pythonhosted.org/packages/fd/73/8a6b7ba18cad0c8667ee221ddab8c361cb70926440e5b8dd0e81924c28ac/ctranslate2-4.7.1-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:d5dfb076566551f4959dfd0706f94c923c1931def9b7bb249a2caa6ab23353a0", size = 1257560, upload-time = "2026-02-04T06:12:00.926Z" },
+    { url = "https://files.pythonhosted.org/packages/70/c2/8817ca5d6c1b175b23a12f7c8b91484652f8718a76353317e5919b038733/ctranslate2-4.7.1-cp314-cp314-macosx_11_0_x86_64.whl", hash = "sha256:eecdb4ed934b384f16e8c01b185b082d6b5ffc7dcbb0b6a6eb48cd465282d957", size = 11918995, upload-time = "2026-02-04T06:12:02.875Z" },
+    { url = "https://files.pythonhosted.org/packages/ac/33/b8eb3acc67bbca4d9872fc9ff94db78e6167a7ba5cd932f585d1560effc7/ctranslate2-4.7.1-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1aa6796edcc3c8d163c9e39c429d50076d266d68980fed9d1b2443f617c67e9e", size = 16844162, upload-time = "2026-02-04T06:12:05.099Z" },
+    { url = "https://files.pythonhosted.org/packages/80/11/6474893b07121057035069a0a483fe1cd8c47878213f282afb4c0c6fc275/ctranslate2-4.7.1-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:24c0482c51726430fb83724451921c0e539d769c8618dcfd46b1645e7f75960d", size = 38966728, upload-time = "2026-02-04T06:12:07.923Z" },
+    { url = "https://files.pythonhosted.org/packages/94/88/8fc7ff435c5e783e5fad9586d839d463e023988dbbbad949d442092d01f1/ctranslate2-4.7.1-cp314-cp314-win_amd64.whl", hash = "sha256:76db234c0446a23d20dd8eeaa7a789cc87d1d05283f48bf3152bae9fa0a69844", size = 19100788, upload-time = "2026-02-04T06:12:10.592Z" },
+    { url = "https://files.pythonhosted.org/packages/d9/b3/f100013a76a98d64e67c721bd4559ea4eeb54be3e4ac45f4d801769899af/ctranslate2-4.7.1-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:058c9db2277dc8b19ecc86c7937628f69022f341844b9081d2ab642965d88fc6", size = 1280179, upload-time = "2026-02-04T06:12:12.596Z" },
+    { url = "https://files.pythonhosted.org/packages/39/22/b77f748015667a5e2ca54a5ee080d7016fce34314f0e8cf904784549305a/ctranslate2-4.7.1-cp314-cp314t-macosx_11_0_x86_64.whl", hash = "sha256:5abcf885062c7f28a3f9a46be8d185795e8706ac6230ad086cae0bc82917df31", size = 11940166, upload-time = "2026-02-04T06:12:14.054Z" },
+    { url = "https://files.pythonhosted.org/packages/7d/78/6d7fd52f646c6ba3343f71277a9bbef33734632949d1651231948b0f0359/ctranslate2-4.7.1-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:9950acb04a002d5c60ae90a1ddceead1a803af1f00cadd9b1a1dc76e1f017481", size = 16849483, upload-time = "2026-02-04T06:12:17.082Z" },
+    { url = "https://files.pythonhosted.org/packages/40/27/58769ff15ac31b44205bd7a8aeca80cf7357c657ea5df1b94ce0f5c83771/ctranslate2-4.7.1-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:1dcc734e92e3f1ceeaa0c42bbfd009352857be179ecd4a7ed6cccc086a202f58", size = 38949393, upload-time = "2026-02-04T06:12:21.302Z" },
+    { url = "https://files.pythonhosted.org/packages/0e/5c/9fa0ad6462b62efd0fb5ac1100eee47bc96ecc198ff4e237c731e5473616/ctranslate2-4.7.1-cp314-cp314t-win_amd64.whl", hash = "sha256:dfb7657bdb7b8211c8f9ecb6f3b70bc0db0e0384d01a8b1808cb66fe7199df59", size = 19123451, upload-time = "2026-02-04T06:12:24.115Z" },
+]
+
 [[package]]
 name = "cuda-bindings"
 version = "12.9.4"
@@ -1006,6 +1106,22 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/ae/8b/c8050e556f5d7a1f33a93c2c94379a0bae23c58a79ad9709d7e052d0c3b8/fastapi-0.128.4-py3-none-any.whl", hash = "sha256:9321282cee605fd2075ccbc95c0f2e549d675c59de4a952bba202cd1730ac66b", size = 103684, upload-time = "2026-02-07T08:14:07.939Z" },
 ]

+[[package]]
+name = "faster-whisper"
+version = "1.2.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "av" },
+    { name = "ctranslate2" },
+    { name = "huggingface-hub" },
+    { name = "onnxruntime" },
+    { name = "tokenizers" },
+    { name = "tqdm" },
+]
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/05/99/49ee85903dee060d9f08297b4a342e5e0bcfca2f027a07b4ee0a38ab13f9/faster_whisper-1.2.1-py3-none-any.whl", hash = "sha256:79a66ad50688c0b794dd501dc340a736992a6342f7f95e5811be60b5224a26a7", size = 1118909, upload-time = "2025-10-31T11:35:47.794Z" },
+]
+
 [[package]]
 name = "ffmpeg-python"
 version = "0.2.0"
@@ -3391,6 +3507,44 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/27/4b/7c1a00c2c3fbd004253937f7520f692a9650767aa73894d7a34f0d65d3f4/openai-2.14.0-py3-none-any.whl", hash = "sha256:7ea40aca4ffc4c4a776e77679021b47eec1160e341f42ae086ba949c9dcc9183", size = 1067558, upload-time = "2025-12-19T03:28:43.727Z" },
 ]

+[[package]]
+name = "opencv-python"
+version = "4.13.0.92"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+    { name = "numpy", version = "2.4.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+]
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/fc/6f/5a28fef4c4a382be06afe3938c64cc168223016fa520c5abaf37e8862aa5/opencv_python-4.13.0.92-cp37-abi3-macosx_13_0_arm64.whl", hash = "sha256:caf60c071ec391ba51ed00a4a920f996d0b64e3e46068aac1f646b5de0326a19", size = 46247052, upload-time = "2026-02-05T07:01:25.046Z" },
+    { url = "https://files.pythonhosted.org/packages/08/ac/6c98c44c650b8114a0fb901691351cfb3956d502e8e9b5cd27f4ee7fbf2f/opencv_python-4.13.0.92-cp37-abi3-macosx_14_0_x86_64.whl", hash = "sha256:5868a8c028a0b37561579bfb8ac1875babdc69546d236249fff296a8c010ccf9", size = 32568781, upload-time = "2026-02-05T07:01:41.379Z" },
+    { url = "https://files.pythonhosted.org/packages/3e/51/82fed528b45173bf629fa44effb76dff8bc9f4eeaee759038362dfa60237/opencv_python-4.13.0.92-cp37-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:0bc2596e68f972ca452d80f444bc404e08807d021fbba40df26b61b18e01838a", size = 47685527, upload-time = "2026-02-05T06:59:11.24Z" },
+    { url = "https://files.pythonhosted.org/packages/db/07/90b34a8e2cf9c50fe8ed25cac9011cde0676b4d9d9c973751ac7616223a2/opencv_python-4.13.0.92-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:402033cddf9d294693094de5ef532339f14ce821da3ad7df7c9f6e8316da32cf", size = 70460872, upload-time = "2026-02-05T06:59:19.162Z" },
+    { url = "https://files.pythonhosted.org/packages/02/6d/7a9cc719b3eaf4377b9c2e3edeb7ed3a81de41f96421510c0a169ca3cfd4/opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:bccaabf9eb7f897ca61880ce2869dcd9b25b72129c28478e7f2a5e8dee945616", size = 46708208, upload-time = "2026-02-05T06:59:15.419Z" },
+    { url = "https://files.pythonhosted.org/packages/fd/55/b3b49a1b97aabcfbbd6c7326df9cb0b6fa0c0aefa8e89d500939e04aa229/opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:620d602b8f7d8b8dab5f4b99c6eb353e78d3fb8b0f53db1bd258bb1aa001c1d5", size = 72927042, upload-time = "2026-02-05T06:59:23.389Z" },
+    { url = "https://files.pythonhosted.org/packages/fb/17/de5458312bcb07ddf434d7bfcb24bb52c59635ad58c6e7c751b48949b009/opencv_python-4.13.0.92-cp37-abi3-win32.whl", hash = "sha256:372fe164a3148ac1ca51e5f3ad0541a4a276452273f503441d718fab9c5e5f59", size = 30932638, upload-time = "2026-02-05T07:02:14.98Z" },
+    { url = "https://files.pythonhosted.org/packages/e9/a5/1be1516390333ff9be3a9cb648c9f33df79d5096e5884b5df71a588af463/opencv_python-4.13.0.92-cp37-abi3-win_amd64.whl", hash = "sha256:423d934c9fafb91aad38edf26efb46da91ffbc05f3f59c4b0c72e699720706f5", size = 40212062, upload-time = "2026-02-05T07:02:12.724Z" },
+]
+
+[[package]]
+name = "opencv-python-headless"
+version = "4.13.0.92"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+    { name = "numpy", version = "2.4.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+]
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/79/42/2310883be3b8826ac58c3f2787b9358a2d46923d61f88fedf930bc59c60c/opencv_python_headless-4.13.0.92-cp37-abi3-macosx_13_0_arm64.whl", hash = "sha256:1a7d040ac656c11b8c38677cc8cccdc149f98535089dbe5b081e80a4e5903209", size = 46247192, upload-time = "2026-02-05T07:01:35.187Z" },
+    { url = "https://files.pythonhosted.org/packages/2d/1e/6f9e38005a6f7f22af785df42a43139d0e20f169eb5787ce8be37ee7fcc9/opencv_python_headless-4.13.0.92-cp37-abi3-macosx_14_0_x86_64.whl", hash = "sha256:3e0a6f0a37994ec6ce5f59e936be21d5d6384a4556f2d2da9c2f9c5dc948394c", size = 32568914, upload-time = "2026-02-05T07:01:51.989Z" },
+    { url = "https://files.pythonhosted.org/packages/21/76/9417a6aef9def70e467a5bf560579f816148a4c658b7d525581b356eda9e/opencv_python_headless-4.13.0.92-cp37-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5c8cfc8e87ed452b5cecb9419473ee5560a989859fe1d10d1ce11ae87b09a2cb", size = 33703709, upload-time = "2026-02-05T10:24:46.469Z" },
+    { url = "https://files.pythonhosted.org/packages/92/ce/bd17ff5772938267fd49716e94ca24f616ff4cb1ff4c6be13085108037be/opencv_python_headless-4.13.0.92-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:0525a3d2c0b46c611e2130b5fdebc94cf404845d8fa64d2f3a3b679572a5bd22", size = 56016764, upload-time = "2026-02-05T10:26:48.904Z" },
+    { url = "https://files.pythonhosted.org/packages/8f/b4/b7bcbf7c874665825a8c8e1097e93ea25d1f1d210a3e20d4451d01da30aa/opencv_python_headless-4.13.0.92-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:eb60e36b237b1ebd40a912da5384b348df8ed534f6f644d8e0b4f103e272ba7d", size = 35010236, upload-time = "2026-02-05T10:28:11.031Z" },
+    { url = "https://files.pythonhosted.org/packages/4b/33/b5db29a6c00eb8f50708110d8d453747ca125c8b805bc437b289dbdcc057/opencv_python_headless-4.13.0.92-cp37-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:0bd48544f77c68b2941392fcdf9bcd2b9cdf00e98cb8c29b2455d194763cf99e", size = 60391106, upload-time = "2026-02-05T10:30:14.236Z" },
+    { url = "https://files.pythonhosted.org/packages/fb/c3/52cfea47cd33e53e8c0fbd6e7c800b457245c1fda7d61660b4ffe9596a7f/opencv_python_headless-4.13.0.92-cp37-abi3-win32.whl", hash = "sha256:a7cf08e5b191f4ebb530791acc0825a7986e0d0dee2a3c491184bd8599848a4b", size = 30812232, upload-time = "2026-02-05T07:02:29.594Z" },
+    { url = "https://files.pythonhosted.org/packages/4a/90/b338326131ccb2aaa3c2c85d00f41822c0050139a4bfe723cfd95455bd2d/opencv_python_headless-4.13.0.92-cp37-abi3-win_amd64.whl", hash = "sha256:77a82fe35ddcec0f62c15f2ba8a12ecc2ed4207c17b0902c7a3151ae29f37fb6", size = 40070414, upload-time = "2026-02-05T07:02:26.448Z" },
+]
+
 [[package]]
 name = "opentelemetry-api"
 version = "1.39.1"
@@ -5103,6 +5257,27 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/58/5b/632a58724221ef03d78ab65062e82a1010e1bef8e8e0b9d7c6d7b8044841/safetensors-0.7.0-pp310-pypy310_pp73-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:473b32699f4200e69801bf5abf93f1a4ecd432a70984df164fc22ccf39c4a6f3", size = 531885, upload-time = "2025-11-19T15:18:27.146Z" },
 ]

+[[package]]
+name = "scenedetect"
+version = "0.6.7"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "click" },
+    { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
+    { name = "numpy", version = "2.4.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
+    { name = "platformdirs" },
+    { name = "tqdm" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/bd/b1/800d4c1d4da24cd673b921c0b5ffd5bbdcaa2a7f4f4dd86dd2c202a673c6/scenedetect-0.6.7.tar.gz", hash = "sha256:1a2c73b57de2e1656f7896edc8523de7217f361179a8966e947f79d33e40830f", size = 164213, upload-time = "2025-08-25T03:37:24.124Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/e7/e9/05a20eaeed21d2e0761fc4d3819f1f5013a49945133323ba0ce7be8be291/scenedetect-0.6.7-py3-none-any.whl", hash = "sha256:935571453142f5d7d44a8d9bb713fdd89bdb69efdbce92c7dfe09d52c523ac2b", size = 130834, upload-time = "2025-08-25T03:37:22.8Z" },
+]
+
+[package.optional-dependencies]
+opencv = [
+    { name = "opencv-python" },
+]
+
 [[package]]
 name = "schedule"
 version = "1.2.2"
@@ -5422,7 +5597,6 @@ dependencies = [
    { name = "pygithub" },
    { name = "pygments" },
    { name = "pymupdf" },
-    { name = "pytesseract" },
    { name = "python-dotenv" },
    { name = "pyyaml" },
    { name = "requests" },
@@ -5453,6 +5627,8 @@ all = [
    { name = "uvicorn" },
    { name = "voyageai" },
    { name = "weaviate-client" },
+    { name = "youtube-transcript-api" },
+    { name = "yt-dlp" },
 ]
 all-cloud = [
    { name = "azure-storage-blob" },
@@ -5513,6 +5689,18 @@ s3 = [
 sentence-transformers = [
    { name = "sentence-transformers" },
 ]
+video = [
+    { name = "youtube-transcript-api" },
+    { name = "yt-dlp" },
+]
+video-full = [
+    { name = "faster-whisper" },
+    { name = "opencv-python-headless" },
+    { name = "pytesseract" },
+    { name = "scenedetect", extra = ["opencv"] },
+    { name = "youtube-transcript-api" },
+    { name = "yt-dlp" },
+]
 weaviate = [
    { name = "weaviate-client" },
 ]
@@ -5551,6 +5739,7 @@ requires-dist = [
    { name = "click", specifier = ">=8.3.0" },
    { name = "fastapi", marker = "extra == 'all'", specifier = ">=0.109.0" },
    { name = "fastapi", marker = "extra == 'embedding'", specifier = ">=0.109.0" },
+    { name = "faster-whisper", marker = "extra == 'video-full'", specifier = ">=1.0.0" },
    { name = "gitpython", specifier = ">=3.1.40" },
    { name = "google-cloud-storage", marker = "extra == 'all'", specifier = ">=2.10.0" },
    { name = "google-cloud-storage", marker = "extra == 'all-cloud'", specifier = ">=2.10.0" },
@@ -5576,6 +5765,7 @@ requires-dist = [
    { name = "openai", marker = "extra == 'all'", specifier = ">=1.0.0" },
    { name = "openai", marker = "extra == 'all-llms'", specifier = ">=1.0.0" },
    { name = "openai", marker = "extra == 'openai'", specifier = ">=1.0.0" },
+    { name = "opencv-python-headless", marker = "extra == 'video-full'", specifier = ">=4.9.0" },
    { name = "pathspec", specifier = ">=0.12.1" },
    { name = "pillow", specifier = ">=11.0.0" },
    { name = "pinecone", marker = "extra == 'all'", specifier = ">=5.0.0" },
@@ -5586,12 +5776,13 @@ requires-dist = [
    { name = "pygithub", specifier = ">=2.5.0" },
    { name = "pygments", specifier = ">=2.19.2" },
    { name = "pymupdf", specifier = ">=1.24.14" },
-    { name = "pytesseract", specifier = ">=0.3.13" },
+    { name = "pytesseract", marker = "extra == 'video-full'", specifier = ">=0.3.13" },
    { name = "python-docx", marker = "extra == 'all'", specifier = ">=1.1.0" },
    { name = "python-docx", marker = "extra == 'docx'", specifier = ">=1.1.0" },
    { name = "python-dotenv", specifier = ">=1.1.1" },
    { name = "pyyaml", specifier = ">=6.0" },
    { name = "requests", specifier = ">=2.32.5" },
+    { name = "scenedetect", extras = ["opencv"], marker = "extra == 'video-full'", specifier = ">=0.6.4" },
    { name = "schedule", specifier = ">=1.2.0" },
    { name = "sentence-transformers", marker = "extra == 'all'", specifier = ">=2.3.0" },
    { name = "sentence-transformers", marker = "extra == 'embedding'", specifier = ">=2.3.0" },
@@ -5610,8 +5801,14 @@ requires-dist = [
    { name = "weaviate-client", marker = "extra == 'all'", specifier = ">=3.25.0" },
    { name = "weaviate-client", marker = "extra == 'rag-upload'", specifier = ">=3.25.0" },
    { name = "weaviate-client", marker = "extra == 'weaviate'", specifier = ">=3.25.0" },
+    { name = "youtube-transcript-api", marker = "extra == 'all'", specifier = ">=1.2.0" },
+    { name = "youtube-transcript-api", marker = "extra == 'video'", specifier = ">=1.2.0" },
+    { name = "youtube-transcript-api", marker = "extra == 'video-full'", specifier = ">=1.2.0" },
+    { name = "yt-dlp", marker = "extra == 'all'", specifier = ">=2024.12.0" },
+    { name = "yt-dlp", marker = "extra == 'video'", specifier = ">=2024.12.0" },
+    { name = "yt-dlp", marker = "extra == 'video-full'", specifier = ">=2024.12.0" },
 ]
-provides-extras = ["mcp", "gemini", "openai", "all-llms", "s3", "gcs", "azure", "docx", "chroma", "weaviate", "sentence-transformers", "pinecone", "rag-upload", "all-cloud", "embedding", "all"]
+provides-extras = ["mcp", "gemini", "openai", "all-llms", "s3", "gcs", "azure", "docx", "video", "video-full", "chroma", "weaviate", "sentence-transformers", "pinecone", "rag-upload", "all-cloud", "embedding", "all"]

 [package.metadata.requires-dev]
 dev = [
@@ -6774,6 +6971,28 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/73/ae/b48f95715333080afb75a4504487cbe142cae1268afc482d06692d605ae6/yarl-1.22.0-py3-none-any.whl", hash = "sha256:1380560bdba02b6b6c90de54133c81c9f2a453dee9912fe58c1dcced1edb7cff", size = 46814, upload-time = "2025-10-06T14:12:53.872Z" },
 ]

+[[package]]
+name = "youtube-transcript-api"
+version = "1.2.4"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "defusedxml" },
+    { name = "requests" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/60/43/4104185a2eaa839daa693b30e15c37e7e58795e8e09ec414f22b3db54bec/youtube_transcript_api-1.2.4.tar.gz", hash = "sha256:b72d0e96a335df599d67cee51d49e143cff4f45b84bcafc202ff51291603ddcd", size = 469839, upload-time = "2026-01-29T09:09:17.088Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/be/95/129ea37efd6cd6ed00f62baae6543345c677810b8a3bf0026756e1d3cf3c/youtube_transcript_api-1.2.4-py3-none-any.whl", hash = "sha256:03878759356da5caf5edac77431780b91448fb3d8c21d4496015bdc8a7bc43ff", size = 485227, upload-time = "2026-01-29T09:09:15.427Z" },
+]
+
+[[package]]
+name = "yt-dlp"
+version = "2026.2.21"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/58/d9/55ffff25204733e94a507552ad984d5a8a8e4f9d1f0d91763e6b1a41c79b/yt_dlp-2026.2.21.tar.gz", hash = "sha256:4407dfc1a71fec0dee5ef916a8d4b66057812939b509ae45451fa8fb4376b539", size = 3116630, upload-time = "2026-02-21T20:40:53.522Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/5a/40/664c99ee36d80d84ce7a96cd98aebcb3d16c19e6c3ad3461d2cf5424040e/yt_dlp-2026.2.21-py3-none-any.whl", hash = "sha256:0d8408f5b6d20487f5caeb946dfd04f9bcd2f1a3a125b744a0a982b590e449f7", size = 3313392, upload-time = "2026-02-21T20:40:51.514Z" },
+]
+
 [[package]]
 name = "zipp"
 version = "3.23.0"