Merge feature/video-scraper-pipeline into development

Video tutorial scraping pipeline (BETA):
- Extract skills from YouTube/Vimeo/local video tutorials
- Visual frame extraction with multi-engine OCR (EasyOCR + pytesseract ensemble)
- Per-panel code detection and structured text assembly
- Keyframe extraction via scene detection
- Whisper transcription fallback
- AI enhancement of extracted content
- `skill-seekers video --setup` for GPU auto-detection and dependency installation
  (NVIDIA CUDA, AMD ROCm, CPU-only)
- MCP `scrape_video` tool with setup parameter
- 240 tests passing (60 setup + 180 scraper)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yusyus
2026-03-01 18:55:10 +03:00
43 changed files with 17191 additions and 41 deletions


@@ -1,12 +1,12 @@
# AGENTS.md - Skill Seekers
This file provides essential guidance for AI coding agents working with the Skill Seekers codebase.
Essential guidance for AI coding agents working with the Skill Seekers codebase.
---
## Project Overview
**Skill Seekers** is a Python CLI tool that converts documentation websites, GitHub repositories, and PDF files into AI-ready skills for LLM platforms and RAG (Retrieval-Augmented Generation) pipelines. It serves as the universal preprocessing layer for AI systems.
**Skill Seekers** is a Python CLI tool that converts documentation websites, GitHub repositories, PDF files, and videos into AI-ready skills for LLM platforms and RAG (Retrieval-Augmented Generation) pipelines. It serves as the universal preprocessing layer for AI systems.
### Key Facts
@@ -16,8 +16,8 @@ This file provides essential guidance for AI coding agents working with the Skil
| **Python Version** | 3.10+ (tested on 3.10, 3.11, 3.12, 3.13) |
| **License** | MIT |
| **Package Name** | `skill-seekers` (PyPI) |
| **Source Files** | 169 Python files |
| **Test Files** | 101 test files |
| **Source Files** | 182 Python files |
| **Test Files** | 105+ test files |
| **Website** | https://skillseekersweb.com/ |
| **Repository** | https://github.com/yusufkaraaslan/Skill_Seekers |
@@ -44,7 +44,7 @@ This file provides essential guidance for AI coding agents working with the Skil
### Core Workflow
1. **Scrape Phase** - Crawl documentation/GitHub/PDF sources
1. **Scrape Phase** - Crawl documentation/GitHub/PDF/video sources
2. **Build Phase** - Organize content into categorized references
3. **Enhancement Phase** - AI-powered quality improvements (optional)
4. **Package Phase** - Create platform-specific packages
@@ -73,12 +73,18 @@ This file provides essential guidance for AI coding agents working with the Skil
│ │ │ ├── weaviate.py # Weaviate vector DB adaptor
│ │ │ └── streaming_adaptor.py # Streaming output adaptor
│ │ ├── arguments/ # CLI argument definitions
│ │ ├── parsers/ # Argument parsers
│ │ │ └── extractors/ # Content extractors
│ │ ├── presets/ # Preset configuration management
│ │ ├── storage/ # Cloud storage adaptors
│ │ ├── main.py # Unified CLI entry point
│ │ ├── create_command.py # Unified create command
│ │ ├── doc_scraper.py # Documentation scraper
│ │ ├── github_scraper.py # GitHub repository scraper
│ │ ├── pdf_scraper.py # PDF extraction
│ │ ├── word_scraper.py # Word document scraper
│ │ ├── video_scraper.py # Video extraction
│ │ ├── video_setup.py # GPU detection & dependency installation
│ │ ├── unified_scraper.py # Multi-source scraping
│ │ ├── codebase_scraper.py # Local codebase analysis
│ │ ├── enhance_command.py # AI enhancement command
@@ -118,10 +124,10 @@ This file provides essential guidance for AI coding agents working with the Skil
│ │ ├── generator.py # Embedding generation
│ │ ├── cache.py # Embedding cache
│ │ └── models.py # Embedding models
│ ├── workflows/ # YAML workflow presets
│ ├── workflows/ # YAML workflow presets (66 presets)
│ ├── _version.py # Version information (reads from pyproject.toml)
│ └── __init__.py # Package init
├── tests/ # Test suite (101 test files)
├── tests/ # Test suite (105+ test files)
├── configs/ # Preset configuration files
├── docs/ # Documentation (80+ markdown files)
│ ├── integrations/ # Platform integration guides
@@ -245,9 +251,8 @@ pytest tests/ -v -m "not slow and not integration"
### Test Architecture
- **101 test files** covering all features
- **1880+ tests** passing
- CI Matrix: Ubuntu + macOS, Python 3.10-3.12
- **105+ test files** covering all features
- **CI Matrix:** Ubuntu + macOS, Python 3.10-3.12
- Test markers defined in `pyproject.toml`:
| Marker | Description |
@@ -376,6 +381,8 @@ The CLI uses subcommands that delegate to existing modules:
- `scrape` - Documentation scraping
- `github` - GitHub repository scraping
- `pdf` - PDF extraction
- `word` - Word document extraction
- `video` - Video extraction (YouTube, Vimeo, or local files). Use `--setup` to auto-detect the GPU and install visual dependencies.
- `unified` - Multi-source scraping
- `analyze` / `codebase` - Local codebase analysis
- `enhance` - AI enhancement
@@ -402,7 +409,7 @@ Two implementations:
Tools are organized by category:
- Config tools (3 tools): generate_config, list_configs, validate_config
- Scraping tools (9 tools): estimate_pages, scrape_docs, scrape_github, scrape_pdf, scrape_codebase, detect_patterns, extract_test_examples, build_how_to_guides, extract_config_patterns
- Scraping tools (10 tools): estimate_pages, scrape_docs, scrape_github, scrape_pdf, scrape_video (supports `setup` parameter for GPU detection and visual dep installation), scrape_codebase, detect_patterns, extract_test_examples, build_how_to_guides, extract_config_patterns
- Packaging tools (4 tools): package_skill, upload_skill, enhance_skill, install_skill
- Source tools (5 tools): fetch_config, submit_config, add_config_source, list_config_sources, remove_config_source
- Splitting tools (2 tools): split_config, generate_router
@@ -619,7 +626,7 @@ export ANTHROPIC_BASE_URL=https://custom-endpoint.com/v1
**Reference (technical details):**
- `docs/reference/CLI_REFERENCE.md` - Complete command reference (20 commands)
- `docs/reference/MCP_REFERENCE.md` - MCP tools reference (26 tools)
- `docs/reference/MCP_REFERENCE.md` - MCP tools reference (33 tools)
- `docs/reference/CONFIG_FORMAT.md` - JSON configuration specification
- `docs/reference/ENVIRONMENT_VARIABLES.md` - All environment variables
@@ -629,20 +636,16 @@ export ANTHROPIC_BASE_URL=https://custom-endpoint.com/v1
- `docs/advanced/custom-workflows.md` - Creating custom workflows
- `docs/advanced/multi-source.md` - Multi-source scraping
**Legacy (being phased out):**
- `QUICKSTART.md` - Old quick start (see docs/getting-started/)
- `docs/guides/USAGE.md` - Old usage guide (see docs/user-guide/)
- `docs/QUICK_REFERENCE.md` - Old reference (see docs/reference/)
### Configuration Documentation
Preset configs are in `configs/` directory:
- `godot.json` - Godot Engine
- `godot.json` / `godot_unified.json` - Godot Engine
- `blender.json` / `blender-unified.json` - Blender Engine
- `claude-code.json` - Claude Code
- `httpx_comprehensive.json` - HTTPX library
- `medusa-mercurjs.json` - Medusa/MercurJS
- `astrovalley_unified.json` - Astrovalley
- `react.json` - React documentation
- `configs/integrations/` - Integration-specific configs
---
@@ -685,8 +688,13 @@ Preset configs are in `configs/` directory:
| AWS S3 | `boto3>=1.34.0` | `pip install -e ".[s3]"` |
| Google Cloud Storage | `google-cloud-storage>=2.10.0` | `pip install -e ".[gcs]"` |
| Azure Blob Storage | `azure-storage-blob>=12.19.0` | `pip install -e ".[azure]"` |
| Word Documents | `mammoth>=1.6.0`, `python-docx>=1.1.0` | `pip install -e ".[docx]"` |
| Video (lightweight) | `yt-dlp>=2024.12.0`, `youtube-transcript-api>=1.2.0` | `pip install -e ".[video]"` |
| Video (full) | +`faster-whisper`, `scenedetect`, `opencv-python-headless` (`easyocr` now installed via `--setup`) | `pip install -e ".[video-full]"` |
| Video (GPU setup) | Auto-detects GPU, installs PyTorch + easyocr + all visual deps | `skill-seekers video --setup` |
| Chroma DB | `chromadb>=0.4.0` | `pip install -e ".[chroma]"` |
| Weaviate | `weaviate-client>=3.25.0` | `pip install -e ".[weaviate]"` |
| Pinecone | `pinecone>=5.0.0` | `pip install -e ".[pinecone]"` |
| Embedding Server | `fastapi>=0.109.0`, `uvicorn>=0.27.0`, `sentence-transformers>=2.3.0` | `pip install -e ".[embedding]"` |
### Dev Dependencies (in dependency-groups)
@@ -702,6 +710,7 @@ Preset configs are in `configs/` directory:
| `psutil` | >=5.9.0 | Process utilities for testing |
| `numpy` | >=1.24.0 | Numerical operations |
| `starlette` | >=0.31.0 | HTTP transport testing |
| `httpx` | >=0.24.0 | HTTP client for testing |
| `boto3` | >=1.26.0 | AWS S3 testing |
| `google-cloud-storage` | >=2.10.0 | GCS testing |
| `azure-storage-blob` | >=12.17.0 | Azure testing |
@@ -824,6 +833,34 @@ Skill Seekers uses JSON configuration files to define scraping targets. Example
---
## Workflow Presets
Skill Seekers includes 66 YAML workflow presets for AI enhancement in `src/skill_seekers/workflows/`:
**Built-in presets:**
- `default.yaml` - Standard enhancement workflow
- `minimal.yaml` - Fast, minimal enhancement
- `security-focus.yaml` - Security-focused review
- `architecture-comprehensive.yaml` - Deep architecture analysis
- `api-documentation.yaml` - API documentation focus
- And 61 more specialized presets...
**Usage:**
```bash
# Apply a preset
skill-seekers create ./my-project --enhance-workflow security-focus
# Chain multiple presets
skill-seekers create ./my-project --enhance-workflow security-focus --enhance-workflow minimal
# Manage presets
skill-seekers workflows list
skill-seekers workflows show security-focus
skill-seekers workflows copy security-focus
```
---
*This document is maintained for AI coding agents. For human contributors, see README.md and CONTRIBUTING.md.*
*Last updated: 2026-02-24*
*Last updated: 2026-03-01*


@@ -7,6 +7,34 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### 🎬 Video `--setup`: GPU Auto-Detection & Dependency Installation
### Added
- **`skill-seekers video --setup`** — One-command GPU auto-detection and dependency installation for the video scraper pipeline
- `video_setup.py` (~835 lines) — New module with complete setup orchestration
- **GPU auto-detection** — Detects NVIDIA (nvidia-smi → CUDA version), AMD (rocminfo → ROCm version), or CPU-only without requiring PyTorch
- **Correct PyTorch variant** — Installs from the right index URL: `cu124`/`cu121`/`cu118` for NVIDIA, `rocm6.3`/`rocm6.2.4` for AMD, `cpu` for CPU-only
- **ROCm configuration** — Sets `MIOPEN_FIND_MODE=FAST` and `HSA_OVERRIDE_GFX_VERSION` for AMD GPUs (fixes MIOpen workspace allocation issues)
- **Virtual environment detection** — Warns users outside a venv with opt-in `--force` override
- **System dependency checks** — Validates `tesseract` and `ffmpeg` binaries, provides OS-specific install instructions
- **Module selection** — `SetupModules` dataclass for optional component selection (easyocr, opencv, tesseract, scenedetect, whisper)
- **Base video deps always included** — `yt-dlp` and `youtube-transcript-api` installed automatically so video pipeline is fully ready after setup
- **Verification step** — Post-install import checks for all deps including `torch.cuda.is_available()` and `torch.version.hip`
- **Non-interactive mode** — `run_setup(interactive=False)` for MCP server and CI/CD use
- **`--setup` flag** in `arguments/video.py` — Added to `VIDEO_ARGUMENTS` dict
- **Early-exit in `video_scraper.py`** — `--setup` runs before source validation (no `--url` required)
- **MCP `scrape_video` setup parameter** — `setup: bool = False` param in `server_fastmcp.py` and `scraping_tools.py`
- **`create` command routing** — `create_command.py` forwards `--setup` to video scraper
- **`tests/test_video_setup.py`** (60 tests) — GPU detection, CUDA/ROCm version mapping, installation, verification, venv checks, system deps, module selection, argument parsing
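
The PyTorch variant selection listed above can be sketched roughly as follows. This is an illustration, not the code in `video_setup.py`: the helper name `pick_torch_index_url`, the version thresholds, and the fall-through to the CPU build are assumptions. Only the index suffixes (`cu124`, `cu121`, `cu118`, `rocm6.3`, `rocm6.2.4`, `cpu`) come from the entry above.

```python
from __future__ import annotations

PYTORCH_INDEX = "https://download.pytorch.org/whl"


def pick_torch_index_url(vendor: str, version: tuple[int, int] | None = None) -> str:
    """Return a PyTorch wheel index URL for a detected GPU (hypothetical helper)."""
    if vendor == "nvidia" and version is not None:
        # Assumed thresholds: newest listed CUDA build the driver can run.
        if version >= (12, 4):
            return f"{PYTORCH_INDEX}/cu124"
        if version >= (12, 1):
            return f"{PYTORCH_INDEX}/cu121"
        if version >= (11, 8):
            return f"{PYTORCH_INDEX}/cu118"
    if vendor == "amd" and version is not None:
        # Assumed thresholds for the two ROCm builds named in the changelog.
        if version >= (6, 3):
            return f"{PYTORCH_INDEX}/rocm6.3"
        if version >= (6, 2):
            return f"{PYTORCH_INDEX}/rocm6.2.4"
    # Unknown vendor, missing version, or an older toolkit: use the CPU build.
    return f"{PYTORCH_INDEX}/cpu"
```

The resulting URL would typically be passed to `pip install torch --index-url <url>`.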
### Changed
- **`easyocr` removed from `video-full` optional deps** — Was pulling ~2GB of NVIDIA CUDA packages regardless of GPU vendor. Now installed via `--setup` with correct PyTorch variant.
- **Video dependency error messages** — `video_scraper.py` and `video_visual.py` now suggest `skill-seekers video --setup` as the primary fix
- **Multi-engine OCR** — `video_visual.py` uses EasyOCR + pytesseract ensemble for code frames (per-line confidence merge with code-token preference), EasyOCR only for non-code frames
- **Tesseract circuit breaker** — `_tesseract_broken` flag disables pytesseract for the session after first failure, avoiding repeated subprocess errors
- **`video_models.py`** — Added `SetupModules` dataclass for granular dependency control
- **`video_segmenter.py`** — Updated dependency check messages to reference `--setup`
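
The Tesseract circuit breaker mentioned above follows a common pattern, sketched here under assumptions: the class `OcrFallback` and its `read` method are hypothetical, the OCR engine is passed in as a plain callable so the sketch runs without pytesseract, and only the flag name `_tesseract_broken` is taken from the changelog.

```python
from typing import Callable, Optional


class OcrFallback:
    """Call an OCR engine until it fails once, then skip it for the session."""

    def __init__(self, engine: Callable[[bytes], str]) -> None:
        self._engine = engine
        self._tesseract_broken = False  # flag name from the changelog entry

    def read(self, frame: bytes) -> Optional[str]:
        if self._tesseract_broken:
            return None  # breaker is open: do not call the engine again
        try:
            return self._engine(frame)
        except Exception:
            # The first failure trips the breaker for the rest of the session.
            self._tesseract_broken = True
            return None
```

A caller that receives `None` would fall back to the other engine's result for that frame.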
### 📄 B2: Microsoft Word (.docx) Support & Stage 1 Quality Improvements
### Added


@@ -341,6 +341,9 @@ skill-seekers how-to-guides output/test_examples.json --output output/guides/
# Test enhancement status monitoring
skill-seekers enhance-status output/react/ --watch
# Video setup (auto-detect GPU and install deps)
skill-seekers video --setup
# Test multi-platform packaging
skill-seekers package output/react/ --target gemini --dry-run
@@ -750,6 +753,7 @@ skill-seekers-install-agent = "skill_seekers.cli.install_agent:main"
skill-seekers-patterns = "skill_seekers.cli.pattern_recognizer:main" # C3.1 Pattern detection
skill-seekers-how-to-guides = "skill_seekers.cli.how_to_guide_builder:main" # C3.3 Guide generation
skill-seekers-workflows = "skill_seekers.cli.workflows_command:main" # NEW: Workflow preset management
skill-seekers-video = "skill_seekers.cli.video_scraper:main" # Video scraping pipeline (use --setup to install deps)
# New v3.0.0 Entry Points
skill-seekers-setup = "skill_seekers.cli.setup_wizard:main" # NEW: v3.0.0 Setup wizard
@@ -771,6 +775,8 @@ skill-seekers-quality = "skill_seekers.cli.quality_metrics:main" # N
- Install with: `pip install -e .` (installs only core deps)
- Install dev deps: See CI workflow or manually install pytest, ruff, mypy
**Note on video dependencies:** `easyocr` and GPU-specific PyTorch builds are **not** included in the `video-full` optional dependency group. They are installed at runtime by `skill-seekers video --setup`, which auto-detects the GPU (NVIDIA CUDA, AMD ROCm, or CPU-only) and installs the correct builds.
```toml
[project.optional-dependencies]
gemini = ["google-generativeai>=0.8.0"]
@@ -1985,6 +1991,13 @@ UNIVERSAL_ARGUMENTS = {
- Profile creation
- First-time setup
**Video Scraper** (`src/skill_seekers/cli/`):
- `video_scraper.py` - Main video scraping pipeline CLI
- `video_setup.py` - GPU auto-detection, PyTorch installation, visual dependency setup (~835 lines)
- Detects NVIDIA CUDA, AMD ROCm, or CPU-only and installs the matching PyTorch build
- Installs `easyocr` and other visual processing deps at runtime via `--setup`
- Run `skill-seekers video --setup` before first use
## 🎯 Project-Specific Best Practices
1. **Prefer the unified `create` command** - Use `skill-seekers create <source>` over legacy commands for consistency


@@ -92,6 +92,11 @@ skill-seekers create ./my-project
# PDF document
skill-seekers create manual.pdf
# Video (YouTube, Vimeo, or local file — requires skill-seekers[video])
skill-seekers video --url https://www.youtube.com/watch?v=... --name mytutorial
# First time? Auto-install GPU-aware visual deps:
skill-seekers video --setup
```
### Export Everywhere
@@ -593,8 +598,14 @@ skill-seekers-setup
| `pip install skill-seekers[openai]` | + OpenAI ChatGPT support |
| `pip install skill-seekers[all-llms]` | + All LLM platforms |
| `pip install skill-seekers[mcp]` | + MCP server for Claude Code, Cursor, etc. |
| `pip install skill-seekers[video]` | + YouTube/Vimeo transcript & metadata extraction |
| `pip install skill-seekers[video-full]` | + Whisper transcription & visual frame extraction |
| `pip install skill-seekers[all]` | Everything enabled |
> **Video visual deps (GPU-aware):** After installing `skill-seekers[video-full]`, run
> `skill-seekers video --setup` to auto-detect your GPU and install the correct PyTorch
> variant + easyocr. This is the recommended way to install visual extraction dependencies.
---
## 🚀 One-Command Install Workflow
@@ -683,6 +694,29 @@ skill-seekers pdf --pdf docs/manual.pdf --name myskill \
skill-seekers pdf --pdf docs/scanned.pdf --name myskill --ocr
```
### Video Extraction
```bash
# Install video support
pip install skill-seekers[video] # Transcripts + metadata
pip install skill-seekers[video-full] # + Whisper + visual frame extraction
# Auto-detect GPU and install visual deps (PyTorch + easyocr)
skill-seekers video --setup
# Extract from YouTube video
skill-seekers video --url https://www.youtube.com/watch?v=dQw4w9WgXcQ --name mytutorial
# Extract from a YouTube playlist
skill-seekers video --playlist https://www.youtube.com/playlist?list=... --name myplaylist
# Extract from a local video file
skill-seekers video --video-file recording.mp4 --name myrecording
# Extract with visual frame analysis (requires video-full deps)
skill-seekers video --url https://www.youtube.com/watch?v=... --name mytutorial --visual
```
### GitHub Repository Analysis
```bash


@@ -59,6 +59,24 @@ Each platform has a dedicated adaptor for optimal formatting and upload.
**Recommendation:** Use LOCAL mode for free AI enhancement or skip enhancement entirely.
### How do I set up video extraction?
**Quick setup:**
```bash
# 1. Install video support
pip install skill-seekers[video-full]
# 2. Auto-detect GPU and install visual deps
skill-seekers video --setup
```
The `--setup` command auto-detects your GPU vendor (NVIDIA CUDA, AMD ROCm, or CPU-only) and installs the correct PyTorch variant along with easyocr and other visual extraction dependencies. This avoids the ~2GB NVIDIA CUDA download that would happen if easyocr were installed via pip on non-NVIDIA systems.
**What it detects:**
- **NVIDIA:** Uses `nvidia-smi` to find CUDA version → installs matching `cu124`/`cu121`/`cu118` PyTorch
- **AMD:** Uses `rocminfo` to find ROCm version → installs matching ROCm PyTorch
- **CPU-only:** Installs lightweight CPU-only PyTorch
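
For readers curious how detection can work without PyTorch installed, here is a minimal sketch. It assumes the vendor tools are on `PATH` and that `nvidia-smi` prints its usual `CUDA Version: X.Y` banner. The function names are hypothetical and this is not the actual `video_setup.py` logic.

```python
from __future__ import annotations

import re
import shutil
from typing import Callable, Optional


def detect_gpu_vendor(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Guess the GPU vendor from which vendor tools are on PATH (assumed heuristic)."""
    if which("nvidia-smi"):
        return "nvidia"
    if which("rocminfo"):
        return "amd"
    return "cpu"


def parse_cuda_version(smi_output: str) -> Optional[tuple[int, int]]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' banner of nvidia-smi."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None
```

The `which` parameter exists only so the lookup can be replaced in tests.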
### How long does it take to create a skill?
**Typical Times:**


@@ -90,6 +90,35 @@ pyenv install 3.12
pyenv global 3.12
```
### Issue: Video Visual Dependencies Missing
**Symptoms:**
```
Missing video dependencies: easyocr
RuntimeError: Required video visual dependencies not installed
```
**Solutions:**
```bash
# Run the GPU-aware setup command
skill-seekers video --setup
# This auto-detects your GPU and installs:
# - PyTorch (correct CUDA/ROCm/CPU variant)
# - easyocr, opencv, pytesseract, scenedetect, faster-whisper
# - yt-dlp, youtube-transcript-api
# Verify installation
python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')"
python -c "import easyocr; print('easyocr OK')"
```
**Common issues:**
- Running outside a virtual environment → `--setup` will warn you; create a venv first
- Missing system packages → Install `tesseract-ocr` and `ffmpeg` for your OS
- AMD GPU without ROCm → Install ROCm first, then re-run `--setup`
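
If it is unclear which packages are still missing, a small script along these lines can list them. It is a sketch rather than part of Skill Seekers: `check_modules` and `VISUAL_MODULES` are hypothetical names, and the list uses the usual import names of the packages mentioned above (for example `cv2` for OpenCV).

```python
import importlib.util

# Import names for the packages listed above (assumed to match your install).
VISUAL_MODULES = [
    "torch", "easyocr", "cv2", "pytesseract",
    "scenedetect", "faster_whisper", "yt_dlp", "youtube_transcript_api",
]


def check_modules(names: list[str]) -> dict[str, bool]:
    """Report which top-level modules can be found, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_modules(VISUAL_MODULES).items():
        print(f"{name:24} {'OK' if ok else 'MISSING'}")
```

Run it with the same interpreter (and virtual environment) that runs `skill-seekers`.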
## Configuration Issues
### Issue: API Keys Not Recognized


@@ -124,10 +124,14 @@ pip install skill-seekers[dev]
| `gcs` | Google Cloud Storage | `pip install skill-seekers[gcs]` |
| `azure` | Azure Blob Storage | `pip install skill-seekers[azure]` |
| `embedding` | Embedding server | `pip install skill-seekers[embedding]` |
| `video` | YouTube/video transcript extraction | `pip install skill-seekers[video]` |
| `video-full` | + Whisper transcription, scene detection | `pip install skill-seekers[video-full]` |
| `all-llms` | All LLM platforms | `pip install skill-seekers[all-llms]` |
| `all` | Everything | `pip install skill-seekers[all]` |
| `dev` | Development tools | `pip install skill-seekers[dev]` |
> **Video visual deps:** After installing `skill-seekers[video-full]`, run `skill-seekers video --setup` to auto-detect your GPU (NVIDIA/AMD/CPU) and install the correct PyTorch variant + easyocr.
---
## Post-Installation Setup


@@ -0,0 +1,261 @@
# Video Source Support — Master Plan
**Date:** February 27, 2026
**Feature ID:** V1.0
**Status:** Planning
**Priority:** High
**Estimated Complexity:** Large (multi-sprint feature)
---
## Table of Contents
1. [Executive Summary](#executive-summary)
2. [Motivation & Goals](#motivation--goals)
3. [Scope](#scope)
4. [Plan Documents Index](#plan-documents-index)
5. [High-Level Architecture](#high-level-architecture)
6. [Implementation Phases](#implementation-phases)
7. [Dependencies](#dependencies)
8. [Risk Assessment](#risk-assessment)
9. [Success Criteria](#success-criteria)
---
## Executive Summary
Add **video** as a first-class source type in Skill Seekers, alongside web documentation, GitHub repositories, PDF files, and Word documents. Videos contain a massive amount of knowledge — conference talks, official tutorials, live coding sessions, architecture walkthroughs — that is currently inaccessible to our pipeline.
The video source will use a **3-stream parallel extraction** model:
| Stream | What | Tool |
|--------|------|------|
| **ASR** (Automatic Speech Recognition) | Spoken words → timestamped text | youtube-transcript-api + faster-whisper |
| **OCR** (Optical Character Recognition) | On-screen code/slides/diagrams → text | PySceneDetect + OpenCV + easyocr |
| **Metadata** | Title, chapters, tags, description | yt-dlp Python API |
These three streams are **aligned on a shared timeline** and merged into structured `VideoSegment` objects — the fundamental output unit. Segments are then categorized, converted to reference markdown files, and integrated into SKILL.md just like any other source.
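
As a rough illustration of timeline alignment, the sketch below buckets transcript entries and OCR frames into segments by timestamp. The `VideoSegment` fields and the `align_streams` helper are assumptions made for this example. The actual data model is specified in `02_VIDEO_DATA_MODELS.md`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VideoSegment:
    # Hypothetical fields for illustration only.
    start: float
    end: float
    title: str = ""
    transcript: list[str] = field(default_factory=list)
    ocr_text: list[str] = field(default_factory=list)


def align_streams(chapters, transcript, frames) -> list[VideoSegment]:
    """Merge streams on one timeline.

    chapters:   iterable of (start, end, title)
    transcript: iterable of (timestamp, text) from ASR
    frames:     iterable of (timestamp, text) from OCR
    """
    segments = [VideoSegment(start, end, title) for start, end, title in chapters]
    for seg in segments:
        # Each item belongs to the segment whose half-open range holds its timestamp.
        seg.transcript = [text for ts, text in transcript if seg.start <= ts < seg.end]
        seg.ocr_text = [text for ts, text in frames if seg.start <= ts < seg.end]
    return segments
```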
---
## Motivation & Goals
### Why Video?
1. **Knowledge density** — A 30-minute conference talk can contain the equivalent of a 5,000-word blog post, plus live code demos that never appear in written docs.
2. **Official tutorials** — Many frameworks (React, Flutter, Unity, Godot) have official video tutorials that are the canonical learning resource.
3. **Code walkthroughs** — Screen-recorded coding sessions show real patterns, debugging workflows, and architecture decisions that written docs miss.
4. **Conference talks** — JSConf, PyCon, GopherCon, etc. contain deep technical insights from framework authors.
5. **Completeness** — Skill Seekers aims to be the **universal** documentation preprocessor. Video is the last major content type we don't support.
### Goals
- **G1:** Extract structured, time-aligned knowledge from YouTube videos, playlists, channels, and local video files.
- **G2:** Integrate video as a first-class source in the unified config system (multiple video sources per skill, alongside docs/github/pdf).
- **G3:** Auto-detect video sources in the `create` command (YouTube URLs, video file extensions).
- **G4:** Support two tiers: lightweight (transcript + metadata only) and full (+ visual extraction with OCR).
- **G5:** Produce output that is indistinguishable in quality from other source types — properly categorized reference files integrated into SKILL.md.
- **G6:** Make the heavy components (Whisper transcription, visual extraction with OCR) available as optional add-on dependencies, keeping the core install lightweight.
### Non-Goals (explicitly out of scope for V1.0)
- Real-time / live stream processing
- Video generation or editing
- Speaker diarization (identifying who said what) — future enhancement
- Automatic video discovery (e.g., "find all React tutorials on YouTube") — future enhancement
- DRM-protected or paywalled video content (Udemy, Coursera, etc.)
- Audio-only podcasts (similar pipeline but separate feature)
---
## Scope
### Supported Video Sources
| Source | Input Format | Example |
|--------|-------------|---------|
| YouTube single video | URL | `https://youtube.com/watch?v=abc123` |
| YouTube short URL | URL | `https://youtu.be/abc123` |
| YouTube playlist | URL | `https://youtube.com/playlist?list=PLxxx` |
| YouTube channel | URL | `https://youtube.com/@channelname` |
| Vimeo video | URL | `https://vimeo.com/123456` |
| Local video file | Path | `./tutorials/intro.mp4` |
| Local video directory | Path | `./recordings/` (batch) |
### Supported Video Formats (local files)
| Format | Extension | Notes |
|--------|-----------|-------|
| MP4 | `.mp4` | Most common, universal |
| Matroska | `.mkv` | Common for screen recordings |
| WebM | `.webm` | Web-native, YouTube's format |
| AVI | `.avi` | Legacy but still used |
| QuickTime | `.mov` | macOS screen recordings |
| Flash Video | `.flv` | Legacy, rare |
| MPEG Transport | `.ts` | Streaming recordings |
| Windows Media | `.wmv` | Windows screen recordings |
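
Source detection for the inputs in the two tables above could look roughly like this. The names `VIDEO_URL`, `VIDEO_EXTENSIONS`, and `is_video_source` are hypothetical, the URL pattern covers only the example forms shown in the table, and local directories are not handled in this sketch.

```python
import re
from pathlib import Path

# Extensions from the "Supported Video Formats" table.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".webm", ".avi", ".mov", ".flv", ".ts", ".wmv"}

# Assumed pattern: matches only the URL shapes listed in "Supported Video Sources".
VIDEO_URL = re.compile(
    r"^https?://(www\.)?"
    r"(youtube\.com/(watch\?v=|playlist\?list=|@)|youtu\.be/|vimeo\.com/\d+)"
)


def is_video_source(source: str) -> bool:
    """Return True for a known video URL or a path with a video extension."""
    if VIDEO_URL.match(source):
        return True
    return Path(source).suffix.lower() in VIDEO_EXTENSIONS
```

Note that `.ts` is also the TypeScript extension, so a real detector would probably need an extra check before treating such a file as video.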
### Supported Languages (transcript)
All languages supported by:
- YouTube's caption system (100+ languages)
- faster-whisper / OpenAI Whisper (99 languages)
---
## Plan Documents Index
| Document | Content |
|----------|---------|
| [`01_VIDEO_RESEARCH.md`](./01_VIDEO_RESEARCH.md) | Library research, benchmarks, industry standards |
| [`02_VIDEO_DATA_MODELS.md`](./02_VIDEO_DATA_MODELS.md) | All data classes, type definitions, JSON schemas |
| [`03_VIDEO_PIPELINE.md`](./03_VIDEO_PIPELINE.md) | Processing pipeline (6 phases), algorithms, edge cases |
| [`04_VIDEO_INTEGRATION.md`](./04_VIDEO_INTEGRATION.md) | CLI, config, source detection, unified scraper integration |
| [`05_VIDEO_OUTPUT.md`](./05_VIDEO_OUTPUT.md) | Output structure, SKILL.md integration, reference file format |
| [`06_VIDEO_TESTING.md`](./06_VIDEO_TESTING.md) | Test strategy, mocking, fixtures, CI considerations |
| [`07_VIDEO_DEPENDENCIES.md`](./07_VIDEO_DEPENDENCIES.md) | Dependency tiers, optional installs, system requirements — **IMPLEMENTED** (`video_setup.py`, GPU auto-detection, `--setup`) |
---
## High-Level Architecture
```
┌──────────────────────┐
│ User Input │
│ │
│ YouTube URL │
│ Playlist URL │
│ Local .mp4 file │
│ Unified config JSON │
└──────────┬───────────┘
┌──────────▼───────────┐
│ Source Detector │
│ (source_detector.py) │
│ type="video" │
└──────────┬───────────┘
┌──────────▼───────────┐
│ Video Scraper │
│ (video_scraper.py) │
│ Main orchestrator │
└──────────┬───────────┘
┌────────────────────┼────────────────────┐
│ │ │
┌──────────▼──────┐ ┌──────────▼──────┐ ┌──────────▼──────┐
│ Stream 1: ASR │ │ Stream 2: OCR │ │ Stream 3: Meta │
│ │ │ (optional) │ │ │
│ youtube-trans- │ │ PySceneDetect │ │ yt-dlp │
│ cript-api │ │ OpenCV │ │ extract_info() │
│ faster-whisper │ │ easyocr │ │ │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
│ │ │
│ Timestamped │ Keyframes + │ Chapters,
│ transcript │ OCR text │ tags, desc
│ │ │
└────────────────────┼────────────────────┘
┌──────────▼───────────┐
│ Segmenter & │
│ Aligner │
│ (video_segmenter.py)│
│ │
│ Align 3 streams │
│ on shared timeline │
└──────────┬───────────┘
list[VideoSegment]
┌──────────▼───────────┐
│ Output Generator │
│ │
│ ├ references/*.md │
│ ├ video_data/*.json │
│ └ SKILL.md section │
└──────────────────────┘
```
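
The fan-out to the three streams in the diagram could be driven by a thread pool, since the extractors do not depend on each other. This is a sketch under that assumption. `run_streams` and its callable parameters are hypothetical, and the plan does not state whether the orchestrator uses threads.

```python
from concurrent.futures import ThreadPoolExecutor


def run_streams(source, extract_transcript, extract_metadata, extract_visual=None):
    """Run the independent extractors concurrently and collect their results."""
    jobs = {"transcript": extract_transcript, "metadata": extract_metadata}
    if extract_visual is not None:  # Stream 2 (OCR) is optional
        jobs["visual"] = extract_visual
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = {name: pool.submit(fn, source) for name, fn in jobs.items()}
        # result() re-raises any exception raised inside an extractor.
        return {name: future.result() for name, future in futures.items()}
```

The returned dictionary is what a segmenter would consume once all streams have finished.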
---
## Implementation Phases
### Phase 1: Foundation (Core Pipeline)
- `video_models.py` — All data classes
- `video_scraper.py` — Main orchestrator
- `video_transcript.py` — YouTube captions + Whisper fallback
- Source detector update — YouTube URL patterns, video file extensions
- Basic metadata extraction via yt-dlp
- Output: timestamped transcript as reference markdown
### Phase 2: Segmentation & Structure
- `video_segmenter.py` — Chapter-aware segmentation
- Semantic segmentation fallback (when no chapters)
- Time-window fallback (configurable interval)
- Segment categorization (reuse smart_categorize patterns)
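
The chapter-first strategy with a time-window fallback might be sketched as below. The function name, the `(start, title)` chapter shape, and the 120-second default window are assumptions. The semantic segmentation fallback is left out of this example.

```python
def segment_boundaries(duration, chapters=None, window=120.0):
    """Return (start, end, title) tuples: chapters if present, else fixed windows.

    chapters: optional list of (start_seconds, title) pairs.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if chapters:
        ordered = sorted(chapters, key=lambda chapter: chapter[0])
        # Each chapter ends where the next one starts; the last ends at duration.
        ends = [start for start, _ in ordered[1:]] + [duration]
        return [(start, end, title) for (start, title), end in zip(ordered, ends)]
    bounds, start = [], 0.0
    while start < duration:
        end = min(start + window, duration)
        bounds.append((start, end, f"Part {len(bounds) + 1}"))
        start = end
    return bounds
```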
### Phase 3: Visual Extraction
- `video_visual.py` — Frame extraction + scene detection
- Frame classification (code/slide/terminal/diagram/other)
- OCR on classified frames (easyocr)
- Timeline alignment with ASR transcript
### Phase 4: Integration
- Unified config support (`"type": "video"`)
- `create` command routing
- CLI parser + arguments
- Unified scraper integration (video alongside docs/github/pdf)
- SKILL.md section generation
### Phase 5: Quality & Polish
- AI enhancement for video content (summarization, topic extraction)
- RAG-optimized chunking for video segments
- MCP tools (scrape_video, export_video)
- Comprehensive test suite
---
## Dependencies
### Core (always required for video)
```
yt-dlp>=2024.12.0
youtube-transcript-api>=1.2.0
```
### Full (for visual extraction + local file transcription)
```
faster-whisper>=1.0.0
scenedetect[opencv]>=0.6.4
easyocr>=1.7.0
opencv-python-headless>=4.9.0
```
### System Requirements (for full mode)
- FFmpeg (used by yt-dlp for audio extraction and format conversion; faster-whisper decodes audio through PyAV and does not itself require a system FFmpeg)
- GPU (optional but recommended for Whisper and easyocr)
---
## Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| YouTube API changes break scraping | Medium | High | yt-dlp is actively maintained; abstract it behind our own API |
| Whisper models are large (~1.5GB) | Certain | Medium | Optional dependency, offer multiple model sizes |
| OCR accuracy on code is low | Medium | Medium | Combine OCR with transcript context, use confidence scoring |
| Video download is slow | High | Medium | Stream audio only, don't download full video for transcript |
| Auto-generated captions are noisy | High | Medium | Confidence filtering, AI cleanup in enhancement phase |
| Copyright / ToS concerns | Low | High | Document that user is responsible for content rights |
| CI tests can't download videos | Certain | Medium | Mock all network calls, use fixture transcripts |
---
## Success Criteria
1. **Functional:** `skill-seekers create https://youtube.com/watch?v=xxx` produces a skill with video content integrated into SKILL.md.
2. **Multi-source:** Video sources work alongside docs/github/pdf in unified configs.
3. **Quality:** Video-derived reference files are categorized and structured (not raw transcript dumps).
4. **Performance:** Transcript-only mode processes a 30-minute video in < 30 seconds.
5. **Tests:** Full test suite with mocked network calls, 100% of video pipeline covered.
6. **Tiered deps:** `pip install skill-seekers[video]` works without pulling Whisper/OpenCV.


@@ -0,0 +1,591 @@
# Video Source — Library Research & Industry Standards
**Date:** February 27, 2026
**Document:** 01 of 07
**Status:** Complete
---
## Table of Contents
1. [Industry Standards & Approaches](#industry-standards--approaches)
2. [Library Comparison Matrix](#library-comparison-matrix)
3. [Detailed Library Analysis](#detailed-library-analysis)
4. [Architecture Patterns from Industry](#architecture-patterns-from-industry)
5. [Benchmarks & Performance Data](#benchmarks--performance-data)
6. [Recommendations](#recommendations)
---
## Industry Standards & Approaches
### How the Industry Processes Video for AI/RAG
Based on research from NVIDIA, LlamaIndex, Ragie, and open-source projects, the industry has converged on a **3-stream parallel extraction** model:
#### The 3-Stream Model
```
Video Input
├──→ Stream 1: ASR (Automatic Speech Recognition)
│ Extract spoken words with timestamps
│ Tools: Whisper, YouTube captions API
│ Output: [{text, start, end, confidence}, ...]
├──→ Stream 2: OCR (Optical Character Recognition)
│ Extract visual text (code, slides, diagrams)
│ Tools: OpenCV + scene detection + OCR engine
│ Output: [{text, timestamp, frame_type, bbox}, ...]
└──→ Stream 3: Metadata
Extract structural info (chapters, tags, description)
Tools: yt-dlp, platform APIs
Output: {title, chapters, tags, description, ...}
```
**Key insight (from NVIDIA's multimodal RAG blog):** Ground everything to text first. Align all streams on a shared timeline, then merge into unified text segments. This makes the output compatible with any text-based RAG pipeline without requiring multimodal embeddings.
#### Reference Implementations
| Project | Approach | Strengths | Weaknesses |
|---------|----------|-----------|------------|
| [video-analyzer](https://github.com/byjlw/video-analyzer) | Whisper + OpenCV + LLM analysis | Full pipeline, LLM summaries | No chapter support, no YouTube integration |
| [LlamaIndex MultiModal RAG](https://www.llamaindex.ai/blog/multimodal-rag-for-advanced-video-processing-with-llamaindex-lancedb-33be4804822e) | Frame extraction + CLIP + LanceDB | Vector search over frames | Heavy (requires GPU), no ASR |
| [VideoRAG](https://video-rag.github.io/) | Graph-based reasoning + multimodal retrieval | Multi-hour video support | Research project, not production-ready |
| [Ragie Multimodal RAG](https://www.ragie.ai/blog/how-we-built-multimodal-rag-for-audio-and-video) | faster-whisper large-v3-turbo + OCR + object detection | Production-grade, 3-stream | Proprietary, not open-source |
#### Industry Best Practices
1. **Audio-only download** — Never download the full video when you only need audio. Request the audio stream directly, or strip the video track with FFmpeg's `-vn` flag. The audio stream is typically 10-50x smaller than the full video.
2. **Prefer existing captions** — Manually written YouTube captions are usually more accurate than ASR output. Fall back to Whisper only when captions are unavailable.
3. **Chapter-based segmentation** — YouTube chapters provide natural content boundaries. Use them as primary segmentation, fall back to time-window or semantic splitting.
4. **Confidence filtering** — Whisper and OCR engines report confidence scores, and caption sources can be assigned an estimated score. Filter out low-confidence content rather than including everything.
5. **Parallel extraction** — Run ASR and OCR in parallel (they're independent). Merge after both complete.
6. **Tiered processing** — Offer fast/light mode (transcript only) and deep mode (+ visual). Let users choose based on their compute budget.
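
Practices 4 and 5 can be combined in a few lines. The sketch below is illustrative only: `run_asr` and `run_ocr` are hypothetical stand-ins that return canned data, and the 0.5 threshold is an assumed default rather than a value taken from the pipeline.

```python
from concurrent.futures import ThreadPoolExecutor


def run_asr(video: str) -> list[dict]:
    """Stand-in for the ASR stream (hypothetical; returns canned data)."""
    return [
        {"text": "install the package", "confidence": 0.93},
        {"text": "uh the the", "confidence": 0.21},
    ]


def run_ocr(video: str) -> list[dict]:
    """Stand-in for the OCR stream (hypothetical; returns canned data)."""
    return [
        {"text": "pip install skill-seekers", "confidence": 0.88},
        {"text": "p1p 1nsta||", "confidence": 0.30},
    ]


def extract(video: str, min_confidence: float = 0.5) -> dict[str, list[dict]]:
    """Run both streams concurrently, then drop low-confidence items."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        asr_future = pool.submit(run_asr, video)
        ocr_future = pool.submit(run_ocr, video)
        # Merge only after both independent streams have finished.
        streams = {"asr": asr_future.result(), "ocr": ocr_future.result()}
    return {
        name: [item for item in items if item["confidence"] >= min_confidence]
        for name, items in streams.items()
    }


result = extract("tutorial.mp4")
```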
---
## Library Comparison Matrix
### Metadata & Download
| Library | Purpose | Install Size | Actively Maintained | Python API | License |
|---------|---------|-------------|-------------------|------------|---------|
| **yt-dlp** | Metadata + subtitles + download | ~15MB | Yes (weekly releases) | Yes (`YoutubeDL` class) | Unlicense |
| pytube | YouTube download | ~1MB | Inconsistent | Yes | MIT |
| youtube-dl | Download (original) | ~10MB | Stale | Yes | Unlicense |
| pafy | YouTube metadata | ~50KB | Dead (2021) | Yes | LGPL |
**Winner: yt-dlp** — De-facto standard, actively maintained, comprehensive Python API, supports 1000+ sites (not just YouTube).
### Transcript Extraction (YouTube)
| Library | Purpose | Requires Download | Speed | Accuracy | License |
|---------|---------|-------------------|-------|----------|---------|
| **youtube-transcript-api** | YouTube captions | No | Very fast (<1s) | Depends on caption source | MIT |
| yt-dlp subtitles | Download subtitle files | Yes (subtitle only) | Fast (~2s) | Same as above | Unlicense |
**Winner: youtube-transcript-api** — Fastest, no download needed, returns structured JSON with timestamps directly. For non-YouTube platforms, fall back to yt-dlp subtitle download.
### Speech-to-Text (ASR)
| Library | Speed (30 min audio) | Word Timestamps | Model Sizes | GPU Required | Language Support | License |
|---------|---------------------|----------------|-------------|-------------|-----------------|---------|
| **faster-whisper** | ~2-4 min (GPU), ~8-15 min (CPU) | Yes (`word_timestamps=True`) | tiny (39M) → large-v3 (1.5B) | No (but recommended) | 99 languages | MIT |
| openai-whisper | ~5-10 min (GPU), ~20-40 min (CPU) | Yes | Same models | Recommended | 99 languages | MIT |
| whisper-timestamped | Same as openai-whisper | Yes (more accurate) | Same models | Recommended | 99 languages | MIT |
| whisperx | ~2-3 min (GPU) | Yes (best accuracy via wav2vec2) | Same + wav2vec2 | Yes (required) | 99 languages | BSD |
| stable-ts | Same as openai-whisper | Yes (stabilized) | Same models | Recommended | 99 languages | MIT |
| Google Speech-to-Text | Real-time | Yes | Cloud | No | 125+ languages | Proprietary |
| AssemblyAI | Real-time | Yes | Cloud | No | 100+ languages | Proprietary |
**Winner: faster-whisper** — Up to 4x faster than openai-whisper via CTranslate2 optimization, MIT license, word-level timestamps, works without GPU (just slower), actively maintained. whisperx is a possible future upgrade for speaker diarization.
### Scene Detection & Frame Extraction
| Library | Purpose | Algorithm | Speed | License |
|---------|---------|-----------|-------|---------|
| **PySceneDetect** | Scene boundary detection | ContentDetector, ThresholdDetector, AdaptiveDetector | Fast | BSD |
| opencv-python-headless | Frame extraction, image processing | Manual (absdiff, histogram) | Fast | Apache 2.0 |
| Filmstrip | Keyframe extraction | Scene detection + selection | Medium | MIT |
| video-keyframe-detector | Keyframe extraction | Peak estimation from frame diff | Fast | MIT |
| decord | GPU-accelerated frame extraction | Direct frame access | Very fast | Apache 2.0 |
**Winner: PySceneDetect + opencv-python-headless** — PySceneDetect handles intelligent boundary detection, OpenCV handles frame extraction and image processing. Both are well-maintained and BSD/Apache licensed.
### OCR (Optical Character Recognition)
| Library | Languages | GPU Support | Accuracy on Code | Speed | Install Size | License |
|---------|-----------|------------|-------------------|-------|-------------|---------|
| **easyocr** | 80+ | Yes (PyTorch) | Good | Medium | ~150MB + models | Apache 2.0 |
| pytesseract | 100+ | No | Medium | Fast | ~30MB + Tesseract | Apache 2.0 |
| PaddleOCR | 80+ | Yes (PaddlePaddle) | Very Good | Fast | ~200MB + models | Apache 2.0 |
| TrOCR (HuggingFace) | Multilingual | Yes | Good | Slow | ~500MB | MIT |
| docTR | 10+ | Yes (TF/PyTorch) | Good | Medium | ~100MB | Apache 2.0 |
**Winner: easyocr** — Best balance of accuracy (especially on code/terminal text), GPU support, language coverage, and ease of use. PaddleOCR is a close second but has heavier dependencies (PaddlePaddle framework).
---
## Detailed Library Analysis
### 1. yt-dlp (Metadata & Download Engine)
**What it provides:**
- Video metadata (title, description, duration, upload date, channel, tags, categories)
- Chapter information (title, start_time, end_time for each chapter)
- Subtitle/caption download (all available languages, all formats)
- Thumbnail URLs
- View/like counts
- Playlist information (title, entries, ordering)
- Audio-only extraction (no full video download needed)
- Supports 1000+ video sites (YouTube, Vimeo, Dailymotion, etc.)
**Python API usage:**
```python
from yt_dlp import YoutubeDL


def extract_video_metadata(url: str) -> dict:
    """Extract metadata without downloading."""
    opts = {
        'quiet': True,
        'no_warnings': True,
        'extract_flat': False,  # Full extraction
    }
    with YoutubeDL(opts) as ydl:
        info = ydl.extract_info(url, download=False)
    return info
```
**Key fields in `info_dict`:**
```python
{
'id': 'dQw4w9WgXcQ', # Video ID
'title': 'Video Title', # Full title
'description': '...', # Full description text
'duration': 1832, # Duration in seconds
'upload_date': '20260115', # YYYYMMDD format
'uploader': 'Channel Name', # Channel/uploader name
'uploader_id': '@channelname', # Channel ID
'uploader_url': 'https://...', # Channel URL
'channel_follower_count': 150000, # Subscriber count
'view_count': 5000000, # View count
'like_count': 120000, # Like count
'comment_count': 8500, # Comment count
'tags': ['react', 'hooks', ...], # Video tags
'categories': ['Education'], # YouTube categories
'language': 'en', # Primary language
'subtitles': { # Manual captions
'en': [{'ext': 'vtt', 'url': '...'}],
},
'automatic_captions': { # Auto-generated captions
'en': [{'ext': 'vtt', 'url': '...'}],
},
'chapters': [ # Chapter markers
{'title': 'Intro', 'start_time': 0, 'end_time': 45},
{'title': 'Setup', 'start_time': 45, 'end_time': 180},
{'title': 'First Component', 'start_time': 180, 'end_time': 420},
],
'thumbnail': 'https://...', # Best thumbnail URL
'thumbnails': [...], # All thumbnail variants
'webpage_url': 'https://...', # Canonical URL
'formats': [...], # Available formats
'requested_formats': [...], # Selected format info
}
```
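
A small normalization step keeps the rest of the pipeline independent of yt-dlp's field conventions. The sketch below is an assumption about how that could look: `normalize_metadata` and the `has_manual_captions` key are hypothetical names, while the input keys are the ones listed above.

```python
from datetime import datetime


def normalize_metadata(info: dict) -> dict:
    """Pick the fields the pipeline needs from a yt-dlp info_dict."""
    raw_date = info.get("upload_date")  # yt-dlp uses YYYYMMDD
    upload_date = (
        datetime.strptime(raw_date, "%Y%m%d").date().isoformat()
        if raw_date else None
    )
    return {
        "video_id": info["id"],
        "title": info.get("title", ""),
        "duration": float(info.get("duration") or 0),
        "upload_date": upload_date,
        # "chapters" may be missing or None when the video has no chapters.
        "chapters": [
            {
                "title": ch["title"],
                "start_time": float(ch["start_time"]),
                "end_time": float(ch["end_time"]),
            }
            for ch in (info.get("chapters") or [])
        ],
        "has_manual_captions": bool(info.get("subtitles")),
    }


sample = {
    "id": "dQw4w9WgXcQ",
    "title": "Video Title",
    "duration": 1832,
    "upload_date": "20260115",
    "chapters": None,
    "subtitles": {"en": [{"ext": "vtt", "url": "..."}]},
}
meta = normalize_metadata(sample)
```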
**Playlist extraction:**
```python
def extract_playlist(url: str) -> list[dict]:
    """Extract all videos from a playlist."""
    opts = {
        'quiet': True,
        'extract_flat': True,  # Don't extract each video yet
    }
    with YoutubeDL(opts) as ydl:
        info = ydl.extract_info(url, download=False)
    # info['entries'] contains all video entries
    return info.get('entries', [])
```
**Audio-only download (for Whisper):**
```python
def download_audio(url: str, output_dir: str) -> str:
    """Download audio stream only (no video) as 16 kHz mono WAV."""
    opts = {
        'format': 'bestaudio/best',
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'wav',
        }],
        # 'preferredquality' controls bitrate, not sample rate, so resample
        # explicitly through FFmpeg: 16 kHz mono is Whisper's native input.
        'postprocessor_args': ['-ar', '16000', '-ac', '1'],
        'outtmpl': f'{output_dir}/%(id)s.%(ext)s',
        'quiet': True,
    }
    with YoutubeDL(opts) as ydl:
        info = ydl.extract_info(url, download=True)
    return f"{output_dir}/{info['id']}.wav"
```
### 2. youtube-transcript-api (Caption Extraction)
**What it provides:**
- Direct access to YouTube captions without downloading
- Manual and auto-generated caption support
- Translation support (translate captions to any language)
- Structured output with timestamps
**Python API usage:**
```python
from youtube_transcript_api import NoTranscriptFound, YouTubeTranscriptApi


def get_youtube_transcript(video_id: str, languages: list[str] | None = None) -> list[dict]:
    """Get transcript with timestamps.

    Returns: [{'text': 'Hello', 'start': 0.0, 'duration': 1.5}, ...]
    """
    languages = languages or ['en']
    # Pre-1.0 API. Releases from 1.0 onward expose this as an instance method,
    # YouTubeTranscriptApi().list(video_id); check the installed version.
    transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
    # Prefer manual captions over auto-generated
    try:
        transcript = transcript_list.find_manually_created_transcript(languages)
    except NoTranscriptFound:
        transcript = transcript_list.find_generated_transcript(languages)
    # Fetch the actual transcript data
    return transcript.fetch()
```
**Output format:**
```python
[
{
'text': "Welcome to this React tutorial",
'start': 0.0, # Start time in seconds
'duration': 2.5 # Duration in seconds
},
{
'text': "Today we'll learn about hooks",
'start': 2.5,
'duration': 3.0
},
# ... continues for entire video
]
```
**Key features:**
- Segments are typically 2-5 seconds each
- Manual captions have punctuation and proper casing
- Auto-generated captions may lack punctuation and have lower accuracy
- Can detect available languages and caption types
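
Since captions arrive as `start` + `duration` while the other streams use `start`/`end`, a conversion step is needed before alignment. The sketch below assumes a hypothetical helper `to_start_end`; the 1.0 and 0.8 confidence values are the estimates the data-model document assigns to manual and auto-generated captions.

```python
def to_start_end(entries: list[dict], is_generated: bool) -> list[dict]:
    """Convert caption entries to start/end form with an estimated confidence."""
    # Captions carry no per-segment score, so one estimate is used per source.
    confidence = 0.8 if is_generated else 1.0
    return [
        {
            "text": entry["text"].strip(),
            "start": entry["start"],
            "end": entry["start"] + entry["duration"],
            "confidence": confidence,
        }
        for entry in entries
    ]


captions = [
    {"text": "Welcome to this React tutorial", "start": 0.0, "duration": 2.5},
    {"text": "Today we'll learn about hooks", "start": 2.5, "duration": 3.0},
]
converted = to_start_end(captions, is_generated=False)
```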
### 3. faster-whisper (Speech-to-Text)
**What it provides:**
- OpenAI Whisper models with up to 4x speedup via CTranslate2
- Word-level timestamps with confidence scores
- Language detection
- VAD (Voice Activity Detection) filtering
- Multiple model sizes from tiny (39M) to large-v3 (1.5B)
**Python API usage:**
```python
from faster_whisper import WhisperModel


def transcribe_with_whisper(audio_path: str, model_size: str = "base") -> dict:
    """Transcribe audio file with word-level timestamps."""
    model = WhisperModel(
        model_size,
        device="auto",        # auto-detect GPU/CPU
        compute_type="auto",  # auto-select precision
    )
    # `segments` is a lazy generator: transcription runs as it is iterated.
    segments, info = model.transcribe(
        audio_path,
        word_timestamps=True,
        vad_filter=True,  # Filter silence
        vad_parameters={
            "min_silence_duration_ms": 500,
        },
    )
    result = {
        'language': info.language,
        'language_probability': info.language_probability,
        'duration': info.duration,
        'segments': [],
    }
    for segment in segments:
        seg_data = {
            'start': segment.start,
            'end': segment.end,
            'text': segment.text.strip(),
            'avg_logprob': segment.avg_logprob,
            'no_speech_prob': segment.no_speech_prob,
            'words': [],
        }
        if segment.words:
            for word in segment.words:
                seg_data['words'].append({
                    'word': word.word,
                    'start': word.start,
                    'end': word.end,
                    'probability': word.probability,
                })
        result['segments'].append(seg_data)
    return result
```
**Model size guide:**
| Model | Parameters | English WER | Multilingual WER | VRAM (FP16) | Speed (30 min, GPU) |
|-------|-----------|-------------|------------------|-------------|---------------------|
| tiny | 39M | 14.8% | 23.2% | ~1GB | ~30s |
| base | 74M | 11.5% | 18.7% | ~1GB | ~45s |
| small | 244M | 9.5% | 14.6% | ~2GB | ~90s |
| medium | 769M | 8.0% | 12.4% | ~5GB | ~180s |
| large-v3 | 1.5B | 5.7% | 10.1% | ~10GB | ~240s |
| large-v3-turbo | 809M | 6.2% | 10.8% | ~6GB | ~120s |
**Recommendation:** Default to `base` (good balance), offer `large-v3-turbo` for best accuracy, `tiny` for speed.
### 4. PySceneDetect (Scene Boundary Detection)
**What it provides:**
- Automatic scene/cut detection in video files
- Multiple detection algorithms (content-based, threshold, adaptive)
- Frame-accurate boundaries
- Integration with OpenCV
**Python API usage:**
```python
from scenedetect import detect, ContentDetector, AdaptiveDetector


def detect_scene_changes(video_path: str) -> list[tuple[float, float]]:
    """Detect scene boundaries in video.

    Returns list of (start_time, end_time) tuples.
    """
    scene_list = detect(
        video_path,
        ContentDetector(
            threshold=27.0,    # Sensitivity (lower = more scenes)
            min_scene_len=15,  # Minimum 15 frames per scene
        ),
    )
    boundaries = []
    for scene in scene_list:
        start = scene[0].get_seconds()
        end = scene[1].get_seconds()
        boundaries.append((start, end))
    return boundaries
```
**Detection algorithms:**
| Algorithm | Best For | Speed | Sensitivity |
|-----------|----------|-------|-------------|
| ContentDetector | Fast cuts between visually different shots | Fast | Medium |
| AdaptiveDetector | Fast cuts in footage with camera motion (fewer false positives) | Medium | High |
| ThresholdDetector | Fades to or from black | Very fast | Low |
### 5. easyocr (Text Recognition)
**What it provides:**
- Text detection and recognition from images
- 80+ language support
- GPU acceleration
- Bounding box coordinates for each text region
- Confidence scores
**Python API usage:**
```python
import easyocr


def extract_text_from_frame(image_path: str, languages: list[str] | None = None) -> list[dict]:
    """Extract text from a video frame image."""
    languages = languages or ['en']
    # Creating a Reader loads the models; reuse one instance when
    # processing many frames.
    reader = easyocr.Reader(languages, gpu=True)
    results = reader.readtext(image_path)
    # results: list of (bbox, text, confidence), where bbox is four
    # [x, y] corner points: [[x1,y1],[x2,y2],[x3,y3],[x4,y4]]
    extracted = []
    for bbox, text, confidence in results:
        extracted.append({
            'text': text,
            'confidence': confidence,
            'bbox': bbox,  # Corner coordinates
        })
    return extracted
```
**Tips for code/terminal OCR:**
- Pre-process images: increase contrast, convert to grayscale
- Use higher DPI/resolution frames
- Filter by confidence threshold (>0.5 for code)
- Detect monospace regions first, then OCR only those regions
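
The confidence tip can be sketched as a post-processing step on the output of `extract_text_from_frame`. The helper name `clean_ocr` and the top-left sort used to approximate reading order are assumptions; note that this simple version does not recover code indentation.

```python
def clean_ocr(results: list[dict], min_confidence: float = 0.5) -> str:
    """Drop low-confidence regions and join the rest in reading order."""
    kept = [r for r in results if r["confidence"] > min_confidence]
    # Sort by the top-left corner: top-to-bottom, then left-to-right.
    kept.sort(key=lambda r: (r["bbox"][0][1], r["bbox"][0][0]))
    return "\n".join(r["text"] for r in kept)


regions = [
    {"text": "return x", "confidence": 0.91,
     "bbox": [[40, 60], [200, 60], [200, 80], [40, 80]]},
    {"text": "def f(x):", "confidence": 0.95,
     "bbox": [[20, 30], [220, 30], [220, 50], [20, 50]]},
    {"text": "~~", "confidence": 0.12,
     "bbox": [[0, 0], [10, 0], [10, 10], [0, 10]]},
]
text = clean_ocr(regions)
```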
### 6. OpenCV (Frame Extraction)
**What it provides:**
- Video file reading and frame extraction
- Image processing (resize, crop, color conversion)
- Template matching (detect code editors, terminals)
- Histogram analysis (detect slide vs code vs webcam)
**Python API usage:**
```python
import cv2
import numpy as np


def extract_frames_at_timestamps(
    video_path: str,
    timestamps: list[float],
    output_dir: str
) -> list[str]:
    """Extract frames at specific timestamps."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise ValueError(f"Cannot open video: {video_path}")
    fps = cap.get(cv2.CAP_PROP_FPS)
    frame_paths = []
    for ts in timestamps:
        frame_number = int(ts * fps)
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_number)
        ret, frame = cap.read()
        if ret:
            path = f"{output_dir}/frame_{ts:.2f}.png"
            cv2.imwrite(path, frame)
            frame_paths.append(path)
    cap.release()
    return frame_paths


def classify_frame(image_path: str) -> str:
    """Classify frame as code/slide/diagram/other.

    Uses heuristics:
    - Dark background + monospace text regions = code/terminal
    - Light background + large text blocks = slide
    - Face detection = webcam (not implemented in this sketch)
    - High edge density = diagram
    """
    img = cv2.imread(image_path)
    if img is None:
        raise ValueError(f"Cannot read image: {image_path}")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape
    # Check brightness distribution
    mean_brightness = np.mean(gray)
    brightness_std = np.std(gray)
    # Dark background with structured content = code/terminal
    if mean_brightness < 80 and brightness_std > 40:
        return 'code'  # or 'terminal'
    # Light background with text blocks = slide
    if mean_brightness > 180 and brightness_std < 60:
        return 'slide'
    # High edge density = diagram
    edges = cv2.Canny(gray, 50, 150)
    edge_density = np.count_nonzero(edges) / (h * w)
    if edge_density > 0.15:
        return 'diagram'
    return 'other'
```
---
## Benchmarks & Performance Data
### Transcript Extraction Speed
| Method | 10 min video | 30 min video | 60 min video | Requires Download |
|--------|-------------|-------------|-------------|-------------------|
| youtube-transcript-api | ~0.5s | ~0.5s | ~0.5s | No |
| yt-dlp subtitles | ~2s | ~2s | ~2s | Subtitle file only |
| faster-whisper (tiny, GPU) | ~10s | ~30s | ~60s | Audio only |
| faster-whisper (base, GPU) | ~15s | ~45s | ~90s | Audio only |
| faster-whisper (large-v3, GPU) | ~80s | ~240s | ~480s | Audio only |
| faster-whisper (base, CPU) | ~60s | ~180s | ~360s | Audio only |
### Visual Extraction Speed
| Operation | Per Frame | Per 10 min video (50 keyframes) |
|-----------|----------|-------------------------------|
| Frame extraction (OpenCV) | ~5ms | ~0.25s |
| Scene detection (PySceneDetect) | N/A | ~15s for full video |
| Frame classification (heuristic) | ~10ms | ~0.5s |
| OCR per frame (easyocr, GPU) | ~200ms | ~10s |
| OCR per frame (easyocr, CPU) | ~1-2s | ~50-100s |
### Total Pipeline Time (estimated)
| Mode | 10 min video | 30 min video | 1 hour video |
|------|-------------|-------------|-------------|
| Transcript only (YouTube captions) | ~2s | ~2s | ~2s |
| Transcript only (Whisper base, GPU) | ~20s | ~50s | ~100s |
| Full (transcript + visual, GPU) | ~35s | ~80s | ~170s |
| Full (transcript + visual, CPU) | ~120s | ~350s | ~700s |
---
## Recommendations
### Primary Stack (Chosen)
| Component | Library | Why |
|-----------|---------|-----|
| Metadata + download | **yt-dlp** | De-facto standard, 1000+ sites, comprehensive Python API |
| YouTube transcripts | **youtube-transcript-api** | Fastest, no download, structured output |
| Speech-to-text | **faster-whisper** | Up to 4x faster than openai-whisper, MIT, word timestamps |
| Scene detection | **PySceneDetect** | Best algorithm options, OpenCV-based |
| Frame extraction | **opencv-python-headless** | Standard, headless (no GUI deps) |
| OCR | **easyocr** | Good code/terminal accuracy, 80+ languages, GPU support, lighter dependencies than PaddleOCR |
### Future Considerations
| Component | Library | When to Add |
|-----------|---------|-------------|
| Speaker diarization | **whisperx** or **pyannote** | V2.0 — identify who said what |
| Object detection | **YOLO** | V2.0 — detect UI elements, diagrams |
| Multimodal embeddings | **CLIP** | V2.0 — embed frames for visual search |
| Slide detection | **python-pptx** + heuristics | V1.5 — detect and extract slide content |
### Sources
- [youtube-transcript-api (PyPI)](https://pypi.org/project/youtube-transcript-api/)
- [yt-dlp GitHub](https://github.com/yt-dlp/yt-dlp)
- [yt-dlp Information Extraction Pipeline (DeepWiki)](https://deepwiki.com/yt-dlp/yt-dlp/2.2-information-extraction-pipeline)
- [faster-whisper GitHub](https://github.com/SYSTRAN/faster-whisper)
- [faster-whisper (PyPI)](https://pypi.org/project/faster-whisper/)
- [whisper-timestamped GitHub](https://github.com/linto-ai/whisper-timestamped)
- [stable-ts (PyPI)](https://pypi.org/project/stable-ts/)
- [PySceneDetect GitHub](https://github.com/Breakthrough/PySceneDetect)
- [easyocr (PyPI)](https://pypi.org/project/easyocr/)
- [NVIDIA Multimodal RAG for Video and Audio](https://developer.nvidia.com/blog/an-easy-introduction-to-multimodal-retrieval-augmented-generation-for-video-and-audio/)
- [LlamaIndex MultiModal RAG for Video](https://www.llamaindex.ai/blog/multimodal-rag-for-advanced-video-processing-with-llamaindex-lancedb-33be4804822e)
- [Ragie: How We Built Multimodal RAG](https://www.ragie.ai/blog/how-we-built-multimodal-rag-for-audio-and-video)
- [video-analyzer GitHub](https://github.com/byjlw/video-analyzer)
- [VideoRAG Project](https://video-rag.github.io/)
- [video-keyframe-detector GitHub](https://github.com/joelibaceta/video-keyframe-detector)
- [Filmstrip GitHub](https://github.com/tafsiri/filmstrip)


@@ -0,0 +1,972 @@
# Video Source — Data Models & Type Definitions
**Date:** February 27, 2026
**Document:** 02 of 07
**Status:** Planning
---
## Table of Contents
1. [Design Principles](#design-principles)
2. [Core Data Classes](#core-data-classes)
3. [Supporting Data Classes](#supporting-data-classes)
4. [Enumerations](#enumerations)
5. [JSON Schema (Serialization)](#json-schema-serialization)
6. [Relationships Diagram](#relationships-diagram)
7. [Config Schema (Unified Config)](#config-schema-unified-config)
---
## Design Principles
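1. **Immutable after creation** — Use `@dataclass(frozen=True)` for leaf records (chapters, transcript segments, word timestamps, OCR regions). Once extracted, that data doesn't change. Containers that are populated in phases (`VideoInfo`, `VideoSegment`, `KeyFrame`) stay mutable.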
2. **Serializable** — Every data class must serialize to/from JSON for caching, output, and inter-process communication.
3. **Timeline-aligned** — Every piece of data has `start_time` and `end_time` fields. This is the alignment axis for merging streams.
4. **Confidence-scored** — Every extracted piece of content carries a confidence score for quality filtering.
5. **Source-aware** — Every piece of data traces back to its origin (which video, which stream, which tool).
6. **Compatible** — Output structures must be compatible with existing Skill Seekers page/reference format for seamless integration.
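
The first five principles can be seen together in one small record. `Caption` below is a hypothetical class used only for illustration, not one of the pipeline's data classes; it is frozen, round-trips through JSON, carries time bounds and a confidence score, and records its source.

```python
import json
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class Caption:
    """Hypothetical minimal record, used only to illustrate the principles."""
    text: str
    start_time: float
    end_time: float
    confidence: float
    source: str  # which tool produced it, e.g. "youtube_manual"

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "Caption":
        return cls(**data)


original = Caption("Welcome", 0.0, 2.5, 1.0, "youtube_manual")
# Round-trip through a JSON string, as a cache or subprocess boundary would.
restored = Caption.from_dict(json.loads(json.dumps(original.to_dict())))

try:
    original.text = "changed"  # frozen dataclasses reject assignment
    mutated = True
except FrozenInstanceError:
    mutated = False
```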
---
## Core Data Classes
### VideoInfo — The top-level container for a single video
```python
@dataclass
class VideoInfo:
"""Complete metadata and extracted content for a single video.
This is the primary output of the video scraper for one video.
It contains raw metadata from the platform, plus all extracted
and aligned content (segments).
Lifecycle:
1. Created with metadata during resolve phase
2. Transcript populated during ASR phase
3. Visual data populated during OCR phase (if enabled)
4. Segments populated during alignment phase
"""
# === Identity ===
video_id: str
"""Unique identifier.
- YouTube: 11-char video ID (e.g., 'dQw4w9WgXcQ')
- Vimeo: numeric ID (e.g., '123456789')
- Local: SHA-256 hash of file path
"""
source_type: VideoSourceType
"""Where this video came from (youtube, vimeo, local_file)."""
source_url: str | None
"""Original URL for online videos. None for local files."""
file_path: str | None
"""Local file path. Set for local files, or after download for
online videos that needed audio extraction."""
# === Basic Metadata ===
title: str
"""Video title. For local files, derived from filename."""
description: str
"""Full description text. Empty string for local files without metadata."""
duration: float
"""Duration in seconds."""
upload_date: str | None
"""Upload/creation date in ISO 8601 format (YYYY-MM-DD).
None if unknown."""
language: str
"""Primary language code (e.g., 'en', 'tr', 'ja').
Detected from captions, Whisper, or metadata."""
# === Channel / Author ===
channel_name: str | None
"""Channel or uploader name."""
channel_url: str | None
"""URL to the channel/uploader page."""
channel_subscriber_count: int | None
"""Subscriber/follower count. Quality signal."""
# === Engagement Metadata (quality signals) ===
view_count: int | None
"""Total view count. Higher = more authoritative."""
like_count: int | None
"""Like count."""
comment_count: int | None
"""Comment count. Higher = more discussion."""
# === Discovery Metadata ===
tags: list[str]
"""Video tags from platform. Used for categorization."""
categories: list[str]
"""Platform categories (e.g., ['Education', 'Science & Technology'])."""
thumbnail_url: str | None
"""URL to the best quality thumbnail."""
# === Structure ===
chapters: list[Chapter]
"""YouTube chapter markers. Empty list if no chapters.
This is the PRIMARY segmentation source."""
# === Playlist Context ===
playlist_title: str | None
"""Title of the playlist this video belongs to. None if standalone."""
playlist_index: int | None
"""0-based index within the playlist. None if standalone."""
playlist_total: int | None
"""Total number of videos in the playlist. None if standalone."""
# === Extracted Content (populated during processing) ===
raw_transcript: list[TranscriptSegment]
"""Raw transcript segments as received from YouTube API or Whisper.
Before alignment and merging."""
segments: list[VideoSegment]
"""Final aligned and merged segments. This is the PRIMARY output.
Each segment combines ASR + OCR + metadata into a single unit."""
# === Processing Metadata ===
transcript_source: TranscriptSource
"""How the transcript was obtained."""
visual_extraction_enabled: bool
"""Whether OCR/frame extraction was performed."""
whisper_model: str | None
"""Whisper model used, if applicable (e.g., 'base', 'large-v3')."""
processing_time_seconds: float
"""Total processing time for this video."""
extracted_at: str
"""ISO 8601 timestamp of when extraction was performed."""
# === Quality Scores (computed) ===
transcript_confidence: float
"""Average confidence of transcript (0.0 - 1.0).
Based on caption type or Whisper probability."""
content_richness_score: float
"""How rich/useful the extracted content is (0.0 - 1.0).
Based on: duration, chapters present, code detected, engagement."""
def to_dict(self) -> dict:
"""Serialize to JSON-compatible dictionary."""
...
@classmethod
def from_dict(cls, data: dict) -> 'VideoInfo':
"""Deserialize from dictionary."""
...
```
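
The `video_id` rules in the docstring above could be implemented roughly as follows. This is a sketch under assumptions: `make_video_id` is a hypothetical helper, only the common `watch?v=`, `youtu.be`, and `vimeo.com/<id>` URL shapes are handled, and anything else is treated as a local path.

```python
import hashlib
from urllib.parse import parse_qs, urlparse


def make_video_id(source: str) -> str:
    """Derive a stable identifier from a URL or a local file path."""
    parsed = urlparse(source)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/")
    if host.endswith("youtube.com"):
        ids = parse_qs(parsed.query).get("v")
        if ids:
            return ids[0]
    if host.endswith("vimeo.com"):
        return parsed.path.strip("/").split("/")[-1]
    # Local file (or unrecognized URL): SHA-256 of the path string.
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


youtube_id = make_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
local_id = make_video_id("/videos/intro.mp4")
```

Hashing the path rather than the file contents is cheap, but it means a moved or renamed file gets a new ID.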
### VideoSegment — The fundamental aligned content unit
```python
@dataclass
class VideoSegment:
"""A time-aligned segment combining all 3 extraction streams.
This is the CORE data unit of the video pipeline. Every piece
of video content is broken into segments that align:
- ASR transcript (what was said)
- OCR content (what was shown on screen)
- Metadata (chapter title, topic)
Segments are then used to generate reference markdown files
and integrate into SKILL.md.
Segmentation strategies (in priority order):
1. Chapter boundaries (YouTube chapters)
2. Semantic boundaries (topic shifts detected by NLP)
3. Time windows (configurable interval, default 3-5 minutes)
"""
# === Time Bounds ===
index: int
"""0-based segment index within the video."""
start_time: float
"""Start time in seconds."""
end_time: float
"""End time in seconds."""
duration: float
"""Segment duration in seconds (end_time - start_time)."""
# === Stream 1: ASR (Audio) ===
transcript: str
"""Full transcript text for this time window.
Concatenated from word-level timestamps."""
words: list[WordTimestamp]
"""Word-level timestamps within this segment.
Allows precise text-to-time mapping."""
transcript_confidence: float
"""Average confidence for this segment's transcript (0.0 - 1.0)."""
# === Stream 2: OCR (Visual) ===
keyframes: list[KeyFrame]
"""Extracted keyframes within this time window.
Only populated if visual_extraction is enabled."""
ocr_text: str
"""Combined OCR text from all keyframes in this segment.
Deduplicated and cleaned."""
detected_code_blocks: list[CodeBlock]
"""Code blocks detected on screen via OCR.
Includes language detection and formatted code."""
has_code_on_screen: bool
"""Whether code/terminal was detected on screen."""
has_slides: bool
"""Whether presentation slides were detected."""
has_diagram: bool
"""Whether diagrams/architecture drawings were detected."""
# === Stream 3: Metadata ===
chapter_title: str | None
"""YouTube chapter title if this segment maps to a chapter.
None if video has no chapters or segment spans chapter boundary."""
topic: str | None
"""Inferred topic for this segment.
Derived from chapter title, transcript keywords, or AI classification."""
category: str | None
"""Mapped category (e.g., 'getting_started', 'api', 'tutorial').
Uses the same categorization system as other sources."""
# === Merged Content ===
content: str
"""Final merged text content for this segment.
Merging strategy:
1. Start with transcript text
2. If code detected on screen but not mentioned in transcript,
append code block with annotation
3. If slide text detected, integrate as supplementary content
4. Add chapter title as heading if present
This is what gets written to reference markdown files.
"""
summary: str | None
"""AI-generated summary of this segment (populated during enhancement).
None until enhancement phase."""
# === Quality Metadata ===
confidence: float
"""Overall confidence for this segment (0.0 - 1.0).
Weighted average of transcript + OCR confidences."""
content_type: SegmentContentType
"""Primary content type of this segment."""
def to_dict(self) -> dict:
"""Serialize to JSON-compatible dictionary."""
...
@classmethod
def from_dict(cls, data: dict) -> 'VideoSegment':
"""Deserialize from dictionary."""
...
@property
def timestamp_display(self) -> str:
"""Human-readable timestamp (e.g., '05:30 - 08:15')."""
start_min, start_sec = divmod(int(self.start_time), 60)
end_min, end_sec = divmod(int(self.end_time), 60)
return f"{start_min:02d}:{start_sec:02d} - {end_min:02d}:{end_sec:02d}"
@property
def youtube_timestamp_url(self) -> str | None:
"""YouTube URL with timestamp parameter (e.g., '?t=330').
Returns None if not a YouTube video."""
...
```
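
The segmentation priority described in the `VideoSegment` docstring can be sketched as a fallback chain. `segment_bounds` is a hypothetical helper; it covers only strategies 1 and 3 (chapters, then time windows), leaves semantic splitting out, and assumes a 240-second window from the documented 3-5 minute range.

```python
def segment_bounds(
    duration: float,
    chapters: list[tuple[float, float]],
    window: float = 240.0,
) -> list[tuple[float, float]]:
    """Return (start, end) pairs: chapters if present, else fixed windows."""
    if chapters:
        return sorted(chapters)
    bounds = []
    start = 0.0
    while start < duration:
        # The last window is clipped to the end of the video.
        end = min(start + window, duration)
        bounds.append((start, end))
        start = end
    return bounds


windows = segment_bounds(600.0, [])
by_chapter = segment_bounds(600.0, [(45.0, 180.0), (0.0, 45.0)])
```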
---
## Supporting Data Classes
### Chapter — YouTube chapter marker
```python
@dataclass(frozen=True)
class Chapter:
    """A chapter marker from a video (typically YouTube).

    Chapters provide natural content boundaries and are the
    preferred segmentation method.
    """
    title: str
    """Chapter title as shown in YouTube."""
    start_time: float
    """Start time in seconds."""
    end_time: float
    """End time in seconds."""

    @property
    def duration(self) -> float:
        return self.end_time - self.start_time

    def to_dict(self) -> dict:
        return {
            'title': self.title,
            'start_time': self.start_time,
            'end_time': self.end_time,
        }
```
### TranscriptSegment — Raw transcript chunk from API/Whisper
```python
@dataclass(frozen=True)
class TranscriptSegment:
    """A raw transcript segment as received from the source.

    This is the unprocessed output from youtube-transcript-api or
    faster-whisper, before alignment and merging.
    youtube-transcript-api segments are typically 2-5 seconds each.
    faster-whisper segments are typically sentence-level (5-30 seconds).
    """
    text: str
    """Transcript text for this segment."""
    start: float
    """Start time in seconds."""
    end: float
    """End time in seconds. Computed as start + duration for YouTube API."""
    confidence: float
    """Confidence score (0.0 - 1.0).
    - YouTube manual captions: 1.0 (assumed perfect)
    - YouTube auto-generated: 0.8 (estimated)
    - Whisper: actual model probability
    """
    words: list[WordTimestamp] | None
    """Word-level timestamps, if available.
    Always available from faster-whisper.
    Not available from youtube-transcript-api.
    """
    source: TranscriptSource
    """Which tool produced this segment."""

    def to_dict(self) -> dict:
        return {
            'text': self.text,
            'start': self.start,
            'end': self.end,
            'confidence': self.confidence,
            'words': [w.to_dict() for w in self.words] if self.words else None,
            'source': self.source.value,
        }
```
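
The document does not say how "actual model probability" is computed for Whisper segments. One plausible mapping, shown here purely as an assumption, averages the per-word probabilities and falls back to `exp(avg_logprob)` when no words are available; `whisper_confidence` is a hypothetical helper.

```python
import math


def whisper_confidence(word_probabilities: list[float], avg_logprob: float) -> float:
    """Map faster-whisper scores to a 0.0 - 1.0 segment confidence."""
    if word_probabilities:
        value = sum(word_probabilities) / len(word_probabilities)
    else:
        # avg_logprob is a mean log-probability, so exp() yields a value in (0, 1].
        value = math.exp(avg_logprob)
    return max(0.0, min(1.0, value))


from_words = whisper_confidence([0.9, 0.7], avg_logprob=-0.3)
from_logprob = whisper_confidence([], avg_logprob=0.0)
```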
### WordTimestamp — Individual word with timing
```python
@dataclass(frozen=True)
class WordTimestamp:
    """A single word with precise timing information.

    Enables precise text-to-time mapping within segments.
    Essential for aligning ASR with OCR content.
    """
    word: str
    """The word text."""
    start: float
    """Start time in seconds."""
    end: float
    """End time in seconds."""
    probability: float
    """Model confidence for this word (0.0 - 1.0).
    From faster-whisper's word_timestamps output."""

    def to_dict(self) -> dict:
        return {
            'word': self.word,
            'start': self.start,
            'end': self.end,
            'probability': self.probability,
        }
```
### KeyFrame — Extracted video frame with analysis
```python
@dataclass
class KeyFrame:
    """An extracted video frame with visual analysis results.

    Keyframes are extracted at:
    1. Scene change boundaries (PySceneDetect)
    2. Chapter boundaries
    3. Regular intervals within segments (configurable)
    Each frame is classified and optionally OCR'd.
    """
    timestamp: float
    """Exact timestamp in seconds where this frame was extracted."""
    image_path: str
    """Path to the saved frame image file (PNG).
    Relative to the video_data/frames/ directory."""
    frame_type: FrameType
    """Classification of what this frame shows."""
    scene_change_score: float
    """How different this frame is from the previous one (0.0 - 1.0).
    Higher = more significant visual change.
    From PySceneDetect's content detection."""
    # === OCR Results ===
    ocr_regions: list[OCRRegion]
    """All text regions detected in this frame.
    Empty list if OCR was not performed or no text detected."""
    ocr_text: str
    """Combined OCR text from all regions.
    Cleaned and deduplicated."""
    ocr_confidence: float
    """Average OCR confidence across all regions (0.0 - 1.0)."""
    # === Frame Properties ===
    width: int
    """Frame width in pixels."""
    height: int
    """Frame height in pixels."""
    mean_brightness: float
    """Average brightness (0-255). Used for classification."""

    def to_dict(self) -> dict:
        return {
            'timestamp': self.timestamp,
            'image_path': self.image_path,
            'frame_type': self.frame_type.value,
            'scene_change_score': self.scene_change_score,
            'ocr_regions': [r.to_dict() for r in self.ocr_regions],
            'ocr_text': self.ocr_text,
            'ocr_confidence': self.ocr_confidence,
            'width': self.width,
            'height': self.height,
            # Included so that every field survives a serialization round trip
            'mean_brightness': self.mean_brightness,
        }
```
### OCRRegion — A detected text region in a frame
```python
@dataclass(frozen=True)
class OCRRegion:
    """A single text region detected by OCR within a frame.

    Includes bounding box coordinates for spatial analysis
    (e.g., detecting code editors vs. slide titles).
    """

    text: str
    """Detected text content."""
    confidence: float
    """OCR confidence (0.0 - 1.0)."""
    bbox: tuple[int, int, int, int]
    """Bounding box as (x1, y1, x2, y2) in pixels.
    Top-left to bottom-right."""
    is_monospace: bool
    """Whether the text appears to be in a monospace font.
    Indicates code/terminal content."""

    def to_dict(self) -> dict:
        return {
            'text': self.text,
            'confidence': self.confidence,
            'bbox': list(self.bbox),
            'is_monospace': self.is_monospace,
        }
```
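`KeyFrame.ocr_text` and `KeyFrame.ocr_confidence` are described as the combined, deduplicated text and the average confidence of a frame's regions. One way that aggregation could look is sketched below. `Region` is a cut-down stand-in for `OCRRegion`, `combine_regions` is a hypothetical helper, and the reading-order sort and exact-match deduplication are assumptions rather than the documented cleaning rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Cut-down stand-in for OCRRegion (illustration only)."""
    text: str
    confidence: float
    bbox: tuple[int, int, int, int]  # (x1, y1, x2, y2)


def combine_regions(regions: list[Region]) -> tuple[str, float]:
    """Join region text in reading order and average the confidences."""
    if not regions:
        return "", 0.0
    # Sort top-to-bottom, then left-to-right, using the top-left corner.
    ordered = sorted(regions, key=lambda r: (r.bbox[1], r.bbox[0]))
    seen: set[str] = set()
    lines: list[str] = []
    for region in ordered:
        text = region.text.strip()
        if text and text not in seen:  # drop blanks and exact duplicates
            seen.add(text)
            lines.append(text)
    mean_conf = sum(r.confidence for r in regions) / len(regions)
    return "\n".join(lines), mean_conf


regions = [
    Region("npm start", 0.8, (120, 400, 300, 430)),
    Region("npx create-react-app my-app", 0.9, (120, 340, 580, 370)),
    Region("npm start", 0.7, (120, 400, 300, 430)),  # duplicate detection
]
text, conf = combine_regions(regions)
print(text)
print(round(conf, 2))
```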
### CodeBlock — Detected code on screen
```python
@dataclass
class CodeBlock:
    """A code block detected via OCR from video frames.

    Represents code that was visible on screen during a segment.
    May come from a code editor, terminal, or presentation slide.
    """

    code: str
    """The extracted code text. Cleaned and formatted."""
    language: str | None
    """Detected programming language (e.g., 'python', 'javascript').
    Uses the same detection heuristics as doc_scraper.detect_language().
    None if language cannot be determined."""
    source_frame: float
    """Timestamp of the frame where this code was extracted."""
    context: CodeContext
    """Where the code appeared (editor, terminal, slide)."""
    confidence: float
    """OCR confidence for this code block (0.0 - 1.0)."""

    def to_dict(self) -> dict:
        return {
            'code': self.code,
            'language': self.language,
            'source_frame': self.source_frame,
            'context': self.context.value,
            'confidence': self.confidence,
        }
```
### VideoPlaylist — Container for playlist processing
```python
@dataclass
class VideoPlaylist:
    """A playlist or channel containing multiple videos.

    Used to track multi-video processing state and ordering.
    """

    playlist_id: str
    """Platform playlist ID."""
    title: str
    """Playlist title."""
    description: str
    """Playlist description."""
    channel_name: str | None
    """Channel that owns the playlist."""
    video_count: int
    """Total number of videos in the playlist."""
    videos: list[VideoInfo]
    """Extracted video information for each video.
    Ordered by playlist index."""
    source_url: str
    """Original playlist URL."""

    def to_dict(self) -> dict:
        return {
            'playlist_id': self.playlist_id,
            'title': self.title,
            'description': self.description,
            'channel_name': self.channel_name,
            'video_count': self.video_count,
            'videos': [v.to_dict() for v in self.videos],
            'source_url': self.source_url,
        }
```
### VideoScraperResult — Top-level scraper output
```python
@dataclass
class VideoScraperResult:
    """Complete result from the video scraper.

    This is the top-level output that gets passed to the
    unified scraper and SKILL.md builder.
    """

    videos: list[VideoInfo]
    """All processed videos."""
    playlists: list[VideoPlaylist]
    """Playlist containers (if input was playlists)."""
    total_duration_seconds: float
    """Sum of all video durations."""
    total_segments: int
    """Sum of all segments across all videos."""
    total_code_blocks: int
    """Total code blocks detected across all videos."""
    categories: dict[str, list[VideoSegment]]
    """Segments grouped by detected category.
    Same category system as other sources."""
    config: VideoSourceConfig
    """Configuration used for this scrape."""
    processing_time_seconds: float
    """Total pipeline processing time."""
    warnings: list[str]
    """Any warnings generated during processing (e.g., missing captions)."""
    errors: list[VideoError]
    """Errors for individual videos that failed processing."""

    def to_dict(self) -> dict:
        ...
```
---
## Enumerations
```python
from enum import Enum


class VideoSourceType(Enum):
    """Where a video came from."""
    YOUTUBE = "youtube"
    VIMEO = "vimeo"
    LOCAL_FILE = "local_file"
    LOCAL_DIRECTORY = "local_directory"


class TranscriptSource(Enum):
    """How the transcript was obtained."""
    YOUTUBE_MANUAL = "youtube_manual"       # Human-created captions
    YOUTUBE_AUTO = "youtube_auto_generated" # YouTube's ASR
    WHISPER = "whisper"                     # faster-whisper local ASR
    SUBTITLE_FILE = "subtitle_file"         # SRT/VTT file alongside video
    NONE = "none"                           # No transcript available


class FrameType(Enum):
    """Classification of a keyframe's visual content."""
    CODE_EDITOR = "code_editor"  # IDE or code editor visible
    TERMINAL = "terminal"        # Terminal/command line
    SLIDE = "slide"              # Presentation slide
    DIAGRAM = "diagram"          # Architecture/flow diagram
    BROWSER = "browser"          # Web browser (documentation, output)
    WEBCAM = "webcam"            # Speaker face/webcam only
    SCREENCAST = "screencast"    # General screen recording
    OTHER = "other"              # Unclassified


class CodeContext(Enum):
    """Where code was displayed in the video."""
    EDITOR = "editor"      # Code editor / IDE
    TERMINAL = "terminal"  # Terminal / command line output
    SLIDE = "slide"        # Code on a presentation slide
    BROWSER = "browser"    # Code in a browser (docs, playground)
    UNKNOWN = "unknown"


class SegmentContentType(Enum):
    """Primary content type of a video segment."""
    EXPLANATION = "explanation"  # Talking/explaining concepts
    LIVE_CODING = "live_coding"  # Writing code on screen
    DEMO = "demo"                # Running/showing a demo
    SLIDES = "slides"            # Presentation slides
    Q_AND_A = "q_and_a"          # Q&A section
    INTRO = "intro"              # Introduction/overview
    OUTRO = "outro"              # Conclusion/wrap-up
    MIXED = "mixed"              # Combination of types


class SegmentationStrategy(Enum):
    """How segments are determined."""
    CHAPTERS = "chapters"          # YouTube chapter boundaries
    SEMANTIC = "semantic"          # Topic shift detection
    TIME_WINDOW = "time_window"    # Fixed time intervals
    SCENE_CHANGE = "scene_change"  # Visual scene changes
    HYBRID = "hybrid"              # Combination of strategies
```
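Because every `to_dict()` serializes enums through `.value`, reading the JSON back only needs the enum constructor. The sketch below repeats a subset of `FrameType` so it runs on its own. `parse_frame_type` and its fallback to `OTHER` for unrecognized strings are assumptions, not documented behavior.

```python
from enum import Enum


class FrameType(Enum):
    """Subset of the FrameType enum above, repeated for a runnable example."""
    CODE_EDITOR = "code_editor"
    TERMINAL = "terminal"
    OTHER = "other"


def parse_frame_type(value: str) -> FrameType:
    """Look up an enum member by its serialized value (hypothetical helper)."""
    try:
        return FrameType(value)
    except ValueError:
        # Assumption: unknown strings degrade to OTHER instead of failing.
        return FrameType.OTHER


serialized = FrameType.TERMINAL.value      # what to_dict() would emit
restored = parse_frame_type(serialized)    # what a loader would rebuild
print(serialized, restored)
```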
---
## JSON Schema (Serialization)
### VideoSegment JSON
```json
{
"index": 0,
"start_time": 45.0,
"end_time": 180.0,
"duration": 135.0,
"transcript": "Let's start by setting up our React project. First, we'll use Create React App...",
"words": [
{"word": "Let's", "start": 45.0, "end": 45.3, "probability": 0.95},
{"word": "start", "start": 45.3, "end": 45.6, "probability": 0.98}
],
"transcript_confidence": 0.94,
"keyframes": [
{
"timestamp": 52.3,
"image_path": "frames/video_abc123/frame_52.30.png",
"frame_type": "terminal",
"scene_change_score": 0.72,
"ocr_text": "npx create-react-app my-app",
"ocr_confidence": 0.89,
"ocr_regions": [
{
"text": "npx create-react-app my-app",
"confidence": 0.89,
"bbox": [120, 340, 580, 370],
"is_monospace": true
}
],
"width": 1920,
"height": 1080
}
],
"ocr_text": "npx create-react-app my-app\ncd my-app\nnpm start",
"detected_code_blocks": [
{
"code": "npx create-react-app my-app\ncd my-app\nnpm start",
"language": "bash",
"source_frame": 52.3,
"context": "terminal",
"confidence": 0.89
}
],
"has_code_on_screen": true,
"has_slides": false,
"has_diagram": false,
"chapter_title": "Project Setup",
"topic": "react project setup",
"category": "getting_started",
"content": "## Project Setup (00:45 - 03:00)\n\nLet's start by setting up our React project...\n\n```bash\nnpx create-react-app my-app\ncd my-app\nnpm start\n```\n",
"summary": null,
"confidence": 0.92,
"content_type": "live_coding"
}
```
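The `content` heading in the example above carries the segment's time range as `(00:45 - 03:00)`, derived from `start_time` and `end_time`. A small sketch of that formatting follows. Both function names are hypothetical, and the `H:MM:SS` form for videos longer than an hour is an assumption.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, or H:MM:SS once past an hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def segment_heading(title: str, start: float, end: float) -> str:
    """Build the Markdown heading used at the top of a segment's content."""
    return f"## {title} ({format_timestamp(start)} - {format_timestamp(end)})"


print(segment_heading("Project Setup", 45.0, 180.0))
# → ## Project Setup (00:45 - 03:00)
```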
### VideoInfo JSON (abbreviated)
```json
{
"video_id": "abc123def45",
"source_type": "youtube",
"source_url": "https://www.youtube.com/watch?v=abc123def45",
"file_path": null,
"title": "React Hooks Tutorial for Beginners",
"description": "Learn React Hooks from scratch...",
"duration": 1832.0,
"upload_date": "2026-01-15",
"language": "en",
"channel_name": "React Official",
"channel_url": "https://www.youtube.com/@reactofficial",
"channel_subscriber_count": 250000,
"view_count": 1500000,
"like_count": 45000,
"comment_count": 2300,
"tags": ["react", "hooks", "tutorial", "javascript"],
"categories": ["Education"],
"thumbnail_url": "https://i.ytimg.com/vi/abc123def45/maxresdefault.jpg",
"chapters": [
{"title": "Intro", "start_time": 0.0, "end_time": 45.0},
{"title": "Project Setup", "start_time": 45.0, "end_time": 180.0},
{"title": "useState Hook", "start_time": 180.0, "end_time": 540.0}
],
"playlist_title": "React Complete Course",
"playlist_index": 3,
"playlist_total": 12,
"segments": ["... (see VideoSegment JSON above)"],
"transcript_source": "youtube_manual",
"visual_extraction_enabled": true,
"whisper_model": null,
"processing_time_seconds": 45.2,
"extracted_at": "2026-02-27T14:30:00Z",
"transcript_confidence": 0.95,
"content_richness_score": 0.88
}
```
---
## Relationships Diagram
```
VideoScraperResult
├── videos: list[VideoInfo]
│ ├── chapters: list[Chapter]
│ ├── raw_transcript: list[TranscriptSegment]
│ │ └── words: list[WordTimestamp] | None
│ └── segments: list[VideoSegment] ← PRIMARY OUTPUT
│ ├── words: list[WordTimestamp]
│ ├── keyframes: list[KeyFrame]
│ │ └── ocr_regions: list[OCRRegion]
│ └── detected_code_blocks: list[CodeBlock]
├── playlists: list[VideoPlaylist]
│ └── videos: list[VideoInfo] ← same as above
├── categories: dict[str, list[VideoSegment]]
├── config: VideoSourceConfig
└── errors: list[VideoError]
```
---
## Config Schema (Unified Config)
### Video source in unified config JSON
```json
{
"type": "video",
"_comment_source": "Set exactly one of: url, playlist, channel, path, directory (all five are shown here for reference only)",
"url": "https://www.youtube.com/watch?v=abc123",
"playlist": "https://www.youtube.com/playlist?list=PLxxx",
"channel": "https://www.youtube.com/@channelname",
"path": "./recordings/tutorial.mp4",
"directory": "./recordings/",
"name": "official_tutorials",
"description": "Official React tutorial videos",
"weight": 0.2,
"_comment_filtering": "Control which videos to process",
"max_videos": 20,
"min_duration": 60,
"max_duration": 7200,
"languages": ["en"],
"title_include_patterns": ["tutorial", "guide"],
"title_exclude_patterns": ["shorts", "live stream"],
"min_views": 1000,
"upload_after": "2024-01-01",
"_comment_extraction": "Control extraction depth",
"visual_extraction": true,
"whisper_model": "base",
"whisper_device": "auto",
"ocr_languages": ["en"],
"keyframe_interval": 5.0,
"min_scene_change_score": 0.3,
"ocr_confidence_threshold": 0.5,
"transcript_confidence_threshold": 0.3,
"_comment_segmentation": "Control how content is segmented",
"segmentation_strategy": "hybrid",
"time_window_seconds": 300,
"merge_short_segments": true,
"min_segment_duration": 30,
"max_segment_duration": 600,
"_comment_categorization": "Map segments to categories",
"categories": {
"getting_started": ["intro", "quickstart", "setup", "install"],
"hooks": ["useState", "useEffect", "useContext", "hooks"],
"components": ["component", "props", "state", "render"],
"advanced": ["performance", "suspense", "concurrent", "ssr"]
},
"_comment_local_files": "For local video sources",
"file_patterns": ["*.mp4", "*.mkv", "*.webm"],
"subtitle_patterns": ["*.srt", "*.vtt"],
"recursive": true
}
```
### VideoSourceConfig dataclass (parsed from JSON)
```python
@dataclass
class VideoSourceConfig:
    """Configuration for video source processing.

    Parsed from the 'sources' entry in unified config JSON.
    Provides defaults for all optional fields.
    """

    # Source specification (exactly one must be set)
    url: str | None = None
    playlist: str | None = None
    channel: str | None = None
    path: str | None = None
    directory: str | None = None

    # Identity
    name: str = "video"
    description: str = ""
    weight: float = 0.2

    # Filtering
    max_videos: int = 50
    min_duration: float = 60.0    # 1 minute
    max_duration: float = 7200.0  # 2 hours
    languages: list[str] | None = None  # None = all languages
    title_include_patterns: list[str] | None = None
    title_exclude_patterns: list[str] | None = None
    min_views: int | None = None
    upload_after: str | None = None  # ISO date

    # Extraction
    visual_extraction: bool = False  # Off by default (heavy)
    whisper_model: str = "base"
    whisper_device: str = "auto"  # 'auto', 'cpu', 'cuda'
    ocr_languages: list[str] | None = None
    keyframe_interval: float = 5.0  # Extract frame every N seconds within segment
    min_scene_change_score: float = 0.3
    ocr_confidence_threshold: float = 0.5
    transcript_confidence_threshold: float = 0.3

    # Segmentation
    segmentation_strategy: str = "hybrid"
    time_window_seconds: float = 300.0  # 5 minutes
    merge_short_segments: bool = True
    min_segment_duration: float = 30.0
    max_segment_duration: float = 600.0

    # Categorization
    categories: dict[str, list[str]] | None = None

    # Local file options
    file_patterns: list[str] | None = None
    subtitle_patterns: list[str] | None = None
    recursive: bool = True

    @classmethod
    def from_dict(cls, data: dict) -> 'VideoSourceConfig':
        """Create config from unified config source entry."""
        ...

    def validate(self) -> list[str]:
        """Validate configuration. Returns list of errors."""
        errors = []
        sources_set = sum(1 for s in [self.url, self.playlist, self.channel,
                                      self.path, self.directory] if s is not None)
        if sources_set == 0:
            errors.append("Video source must specify one of: url, playlist, channel, path, directory")
        if sources_set > 1:
            errors.append("Video source must specify exactly one source type")
        if self.min_duration >= self.max_duration:
            errors.append("min_duration must be less than max_duration")
        if self.min_segment_duration >= self.max_segment_duration:
            errors.append("min_segment_duration must be less than max_segment_duration")
        return errors
```

@@ -0,0 +1,808 @@
# Video Source — System Integration
**Date:** February 27, 2026
**Document:** 04 of 07
**Status:** Planning
---
## Table of Contents
1. [CLI Integration](#cli-integration)
2. [Source Detection](#source-detection)
3. [Unified Config Integration](#unified-config-integration)
4. [Unified Scraper Integration](#unified-scraper-integration)
5. [Create Command Integration](#create-command-integration)
6. [Parser & Arguments](#parser--arguments)
7. [MCP Tool Integration](#mcp-tool-integration)
8. [Enhancement Integration](#enhancement-integration)
9. [File Map (New & Modified Files)](#file-map-new--modified-files)
---
## CLI Integration
### New Subcommand: `video`
```bash
# Dedicated video scraping command
skill-seekers video --url https://youtube.com/watch?v=abc123
skill-seekers video --playlist https://youtube.com/playlist?list=PLxxx
skill-seekers video --channel https://youtube.com/@channelname
skill-seekers video --path ./recording.mp4
skill-seekers video --directory ./recordings/
# With options
skill-seekers video --url <URL> \
--output output/react-videos/ \
--visual \
--whisper-model large-v3 \
--max-videos 20 \
--languages en \
--categories '{"hooks": ["useState", "useEffect"]}' \
--enhance-level 2
```
### Auto-Detection via `create` Command
```bash
# These all auto-detect as video sources
skill-seekers create https://youtube.com/watch?v=abc123
skill-seekers create https://youtu.be/abc123
skill-seekers create https://youtube.com/playlist?list=PLxxx
skill-seekers create https://youtube.com/@channelname
skill-seekers create https://vimeo.com/123456789
skill-seekers create ./tutorial.mp4
skill-seekers create ./recordings/ # Directory of videos
# With universal flags
skill-seekers create https://youtube.com/watch?v=abc123 --visual -p comprehensive
skill-seekers create ./tutorial.mp4 --enhance-level 2 --dry-run
```
### Registration in main.py
```python
# In src/skill_seekers/cli/main.py - COMMAND_MODULES dict
COMMAND_MODULES = {
# ... existing commands ...
'video': 'skill_seekers.cli.video_scraper',
# ... rest of commands ...
}
```
---
## Source Detection
### Changes to `source_detector.py`
```python
# New patterns to add:
# (requires `os` and `re`, if not already imported in source_detector.py)
import os
import re


class SourceDetector:
    # Existing patterns...

    # NEW: Video URL patterns
    YOUTUBE_VIDEO_PATTERN = re.compile(
        r'(?:https?://)?(?:www\.)?'
        r'(?:youtube\.com/watch\?v=|youtu\.be/)'
        r'([a-zA-Z0-9_-]{11})'
    )
    YOUTUBE_PLAYLIST_PATTERN = re.compile(
        r'(?:https?://)?(?:www\.)?'
        r'youtube\.com/playlist\?list=([a-zA-Z0-9_-]+)'
    )
    YOUTUBE_CHANNEL_PATTERN = re.compile(
        r'(?:https?://)?(?:www\.)?'
        r'youtube\.com/(?:@|c/|channel/|user/)([a-zA-Z0-9_.-]+)'
    )
    VIMEO_PATTERN = re.compile(
        r'(?:https?://)?(?:www\.)?vimeo\.com/(\d+)'
    )

    # Video file extensions
    VIDEO_EXTENSIONS = {
        '.mp4', '.mkv', '.webm', '.avi', '.mov',
        '.flv', '.ts', '.wmv', '.m4v', '.ogv',
    }
    @classmethod
    def detect(cls, source: str) -> SourceInfo:
        """Updated detection order:
        1. .json (config)
        2. .pdf
        3. .docx
        4. Video file extensions (.mp4, .mkv, .webm, etc.) ← NEW
        5. Directory (may contain videos)
        6. YouTube/Vimeo URL patterns ← NEW
        7. GitHub patterns
        8. Web URL
        9. Domain inference
        """
        # 1. Config file
        if source.endswith('.json'):
            return cls._detect_config(source)

        # 2. PDF file
        if source.endswith('.pdf'):
            return cls._detect_pdf(source)

        # 3. Word document
        if source.endswith('.docx'):
            return cls._detect_word(source)

        # 4. NEW: Video file
        ext = os.path.splitext(source)[1].lower()
        if ext in cls.VIDEO_EXTENSIONS:
            return cls._detect_video_file(source)

        # 5. Directory
        if os.path.isdir(source):
            # Check if directory contains mostly video files
            if cls._is_video_directory(source):
                return cls._detect_video_directory(source)
            return cls._detect_local(source)

        # 6. NEW: Video URL patterns (before general web URL)
        video_info = cls._detect_video_url(source)
        if video_info:
            return video_info

        # 7. GitHub patterns
        github_info = cls._detect_github(source)
        if github_info:
            return github_info

        # 8. Web URL
        if source.startswith('http://') or source.startswith('https://'):
            return cls._detect_web(source)

        # 9. Domain inference
        if '.' in source and not source.startswith('/'):
            return cls._detect_web(f'https://{source}')

        raise ValueError(
            f"Cannot determine source type for: {source}\n\n"
            "Examples:\n"
            " Web: skill-seekers create https://docs.react.dev/\n"
            " GitHub: skill-seekers create facebook/react\n"
            " Local: skill-seekers create ./my-project\n"
            " PDF: skill-seekers create tutorial.pdf\n"
            " DOCX: skill-seekers create document.docx\n"
            " Video: skill-seekers create https://youtube.com/watch?v=xxx\n"  # NEW
            " Playlist: skill-seekers create https://youtube.com/playlist?list=xxx\n"  # NEW
            " Config: skill-seekers create configs/react.json"
        )
    @classmethod
    def _detect_video_url(cls, source: str) -> SourceInfo | None:
        """Detect YouTube or Vimeo video URL."""
        # YouTube video
        match = cls.YOUTUBE_VIDEO_PATTERN.search(source)
        if match:
            video_id = match.group(1)
            return SourceInfo(
                type='video',
                parsed={
                    'video_source': 'youtube_video',
                    'video_id': video_id,
                    'url': f'https://www.youtube.com/watch?v={video_id}',
                },
                suggested_name=f'video-{video_id}',
                raw_input=source,
            )

        # YouTube playlist
        match = cls.YOUTUBE_PLAYLIST_PATTERN.search(source)
        if match:
            playlist_id = match.group(1)
            return SourceInfo(
                type='video',
                parsed={
                    'video_source': 'youtube_playlist',
                    'playlist_id': playlist_id,
                    'url': f'https://www.youtube.com/playlist?list={playlist_id}',
                },
                suggested_name=f'playlist-{playlist_id[:12]}',
                raw_input=source,
            )

        # YouTube channel
        match = cls.YOUTUBE_CHANNEL_PATTERN.search(source)
        if match:
            channel_name = match.group(1)
            return SourceInfo(
                type='video',
                parsed={
                    'video_source': 'youtube_channel',
                    'channel': channel_name,
                    # Keep the original path form (@handle, c/, channel/, user/);
                    # only prepend a scheme when the input lacks one.
                    'url': source if source.startswith('http') else f'https://{source}',
                },
                suggested_name=channel_name.lstrip('@'),
                raw_input=source,
            )

        # Vimeo
        match = cls.VIMEO_PATTERN.search(source)
        if match:
            video_id = match.group(1)
            return SourceInfo(
                type='video',
                parsed={
                    'video_source': 'vimeo',
                    'video_id': video_id,
                    'url': f'https://vimeo.com/{video_id}',
                },
                suggested_name=f'vimeo-{video_id}',
                raw_input=source,
            )

        return None
    @classmethod
    def _detect_video_file(cls, source: str) -> SourceInfo:
        """Detect local video file."""
        name = os.path.splitext(os.path.basename(source))[0]
        return SourceInfo(
            type='video',
            parsed={
                'video_source': 'local_file',
                'file_path': os.path.abspath(source),
            },
            suggested_name=name,
            raw_input=source,
        )

    @classmethod
    def _detect_video_directory(cls, source: str) -> SourceInfo:
        """Detect directory containing video files."""
        directory = os.path.abspath(source)
        name = os.path.basename(directory)
        return SourceInfo(
            type='video',
            parsed={
                'video_source': 'local_directory',
                'directory': directory,
            },
            suggested_name=name,
            raw_input=source,
        )
    @classmethod
    def _is_video_directory(cls, path: str) -> bool:
        """Check if a directory contains mostly video files.

        Returns True if >50% of files are video files.
        Used to distinguish video directories from code directories.
        """
        total = 0
        video = 0
        for f in os.listdir(path):
            if os.path.isfile(os.path.join(path, f)):
                total += 1
                ext = os.path.splitext(f)[1].lower()
                if ext in cls.VIDEO_EXTENSIONS:
                    video += 1
        return total > 0 and (video / total) > 0.5
    @classmethod
    def validate_source(cls, source_info: SourceInfo) -> None:
        """Updated to include video validation."""
        # ... existing validation ...
        if source_info.type == 'video':
            video_source = source_info.parsed.get('video_source')
            if video_source == 'local_file':
                file_path = source_info.parsed['file_path']
                if not os.path.exists(file_path):
                    raise ValueError(f"Video file does not exist: {file_path}")
            elif video_source == 'local_directory':
                directory = source_info.parsed['directory']
                if not os.path.exists(directory):
                    raise ValueError(f"Video directory does not exist: {directory}")
            # For online sources, validation happens during scraping
```
---
## Unified Config Integration
### Updated `scraped_data` dict in `unified_scraper.py`
```python
# In UnifiedScraper.__init__():
self.scraped_data = {
"documentation": [],
"github": [],
"pdf": [],
"word": [],
"local": [],
"video": [], # ← NEW
}
```
### Video Source Processing in Unified Scraper
```python
def _scrape_video_source(self, source: dict, source_index: int) -> dict:
    """Process a video source from unified config.

    Args:
        source: Video source config dict from unified JSON
        source_index: Index for unique naming

    Returns:
        Dict with scraping results and metadata
    """
    from skill_seekers.cli.video_scraper import VideoScraper
    from skill_seekers.cli.video_models import VideoSourceConfig

    config = VideoSourceConfig.from_dict(source)
    scraper = VideoScraper(config=config, output_dir=self.output_dir)
    result = scraper.scrape()
    return {
        'source_type': 'video',
        'source_name': source.get('name', f'video_{source_index}'),
        'weight': source.get('weight', 0.2),
        'result': result,
        'video_count': len(result.videos),
        'segment_count': result.total_segments,
        'categories': result.categories,
    }
```
### Example Unified Config with Video
```json
{
"name": "react-complete",
"description": "React 19 - Documentation + Code + Video Tutorials",
"output_dir": "output/react-complete/",
"sources": [
{
"type": "documentation",
"url": "https://react.dev/",
"name": "official_docs",
"weight": 0.4,
"selectors": {
"main_content": "article",
"code_blocks": "pre code"
},
"categories": {
"getting_started": ["learn", "quick-start"],
"hooks": ["hooks", "use-state", "use-effect"],
"api": ["reference", "api"]
}
},
{
"type": "github",
"repo": "facebook/react",
"name": "source_code",
"weight": 0.3,
"analysis_depth": "deep"
},
{
"type": "video",
"playlist": "https://www.youtube.com/playlist?list=PLreactplaylist",
"name": "official_tutorials",
"weight": 0.2,
"max_videos": 15,
"visual_extraction": true,
"languages": ["en"],
"categories": {
"getting_started": ["intro", "quickstart", "setup"],
"hooks": ["useState", "useEffect", "hooks"],
"advanced": ["suspense", "concurrent", "server"]
}
},
{
"type": "video",
"url": "https://www.youtube.com/watch?v=abc123def45",
"name": "react_conf_keynote",
"weight": 0.1,
"visual_extraction": false
}
],
"merge_strategy": "unified",
"conflict_resolution": "docs_first",
"enhancement": {
"enabled": true,
"level": 2
}
}
```
---
## Create Command Integration
### Changes to Create Command Routing
```python
# In src/skill_seekers/cli/create_command.py (or equivalent in main.py)
def route_source(source_info: SourceInfo, args: argparse.Namespace):
    """Route detected source to appropriate scraper."""
    if source_info.type == 'web':
        return _route_web(source_info, args)
    elif source_info.type == 'github':
        return _route_github(source_info, args)
    elif source_info.type == 'local':
        return _route_local(source_info, args)
    elif source_info.type == 'pdf':
        return _route_pdf(source_info, args)
    elif source_info.type == 'word':
        return _route_word(source_info, args)
    elif source_info.type == 'video':  # ← NEW
        return _route_video(source_info, args)
    elif source_info.type == 'config':
        return _route_config(source_info, args)
    # Fail loudly instead of silently returning None for an unknown type
    raise ValueError(f"Unsupported source type: {source_info.type}")


def _route_video(source_info: SourceInfo, args: argparse.Namespace):
    """Route video source to video scraper."""
    from skill_seekers.cli.video_scraper import VideoScraper
    from skill_seekers.cli.video_models import VideoSourceConfig

    parsed = source_info.parsed

    # Build config from CLI args + parsed source info
    config_dict = {
        'name': getattr(args, 'name', None) or source_info.suggested_name,
        'visual_extraction': getattr(args, 'visual', False),
        'whisper_model': getattr(args, 'whisper_model', 'base'),
        'max_videos': getattr(args, 'max_videos', 50),
        'languages': getattr(args, 'languages', None),
    }

    # Set the appropriate source field
    video_source = parsed['video_source']
    if video_source in ('youtube_video', 'vimeo'):
        config_dict['url'] = parsed['url']
    elif video_source == 'youtube_playlist':
        config_dict['playlist'] = parsed['url']
    elif video_source == 'youtube_channel':
        config_dict['channel'] = parsed['url']
    elif video_source == 'local_file':
        config_dict['path'] = parsed['file_path']
    elif video_source == 'local_directory':
        config_dict['directory'] = parsed['directory']

    config = VideoSourceConfig.from_dict(config_dict)
    output_dir = getattr(args, 'output', None) or f'output/{config_dict["name"]}/'
    scraper = VideoScraper(config=config, output_dir=output_dir)

    if getattr(args, 'dry_run', False):
        scraper.dry_run()
        return

    result = scraper.scrape()
    scraper.generate_output(result)
```
---
## Parser & Arguments
### New Parser: `video_parser.py`
```python
# src/skill_seekers/cli/parsers/video_parser.py
from skill_seekers.cli.parsers.base import SubcommandParser


class VideoParser(SubcommandParser):
    """Parser for the video scraping command."""

    name = 'video'
    help = 'Extract knowledge from YouTube videos, playlists, channels, or local video files'
    description = (
        'Process video content into structured skill documentation.\n\n'
        'Supports YouTube (single video, playlist, channel), Vimeo, and local video files.\n'
        'Extracts transcripts, metadata, chapters, and optionally visual content (code, slides).'
    )

    def add_arguments(self, parser):
        # Source (mutually exclusive group)
        source = parser.add_mutually_exclusive_group(required=True)
        source.add_argument('--url', help='YouTube or Vimeo video URL')
        source.add_argument('--playlist', help='YouTube playlist URL')
        source.add_argument('--channel', help='YouTube channel URL')
        source.add_argument('--path', help='Local video file path')
        source.add_argument('--directory', help='Directory containing video files')

        # Add shared arguments (output, dry-run, verbose, etc.)
        from skill_seekers.cli.arguments.common import add_all_standard_arguments
        add_all_standard_arguments(parser)

        # Add video-specific arguments
        from skill_seekers.cli.arguments.video import add_video_arguments
        add_video_arguments(parser)
```
### New Arguments: `video.py`
```python
# src/skill_seekers/cli/arguments/video.py
VIDEO_ARGUMENTS = {
# === Filtering ===
"max_videos": {
"flags": ("--max-videos",),
"kwargs": {
"type": int,
"default": 50,
"help": "Maximum number of videos to process (default: 50)",
},
},
"min_duration": {
"flags": ("--min-duration",),
"kwargs": {
"type": float,
"default": 60.0,
"help": "Minimum video duration in seconds (default: 60)",
},
},
"max_duration": {
"flags": ("--max-duration",),
"kwargs": {
"type": float,
"default": 7200.0,
"help": "Maximum video duration in seconds (default: 7200 = 2 hours)",
},
},
"languages": {
"flags": ("--languages",),
"kwargs": {
"nargs": "+",
"default": None,
"help": "Preferred transcript languages (default: all). Example: --languages en es",
},
},
"min_views": {
"flags": ("--min-views",),
"kwargs": {
"type": int,
"default": None,
"help": "Minimum view count filter (online videos only)",
},
},
# === Extraction ===
"visual": {
"flags": ("--visual",),
"kwargs": {
"action": "store_true",
"help": "Enable visual extraction (OCR on keyframes). Requires video-full dependencies.",
},
},
"whisper_model": {
"flags": ("--whisper-model",),
"kwargs": {
"default": "base",
"choices": ["tiny", "base", "small", "medium", "large-v3", "large-v3-turbo"],
"help": "Whisper model size for speech-to-text (default: base)",
},
},
"whisper_device": {
"flags": ("--whisper-device",),
"kwargs": {
"default": "auto",
"choices": ["auto", "cpu", "cuda"],
"help": "Device for Whisper inference (default: auto)",
},
},
"ocr_languages": {
"flags": ("--ocr-languages",),
"kwargs": {
"nargs": "+",
"default": None,
"help": "OCR languages for visual extraction (default: same as --languages)",
},
},
# === Segmentation ===
"segment_strategy": {
"flags": ("--segment-strategy",),
"kwargs": {
"default": "hybrid",
"choices": ["chapters", "semantic", "time_window", "scene_change", "hybrid"],
"help": "How to segment video content (default: hybrid)",
},
},
"segment_duration": {
"flags": ("--segment-duration",),
"kwargs": {
"type": float,
"default": 300.0,
"help": "Target segment duration in seconds for time_window strategy (default: 300)",
},
},
# === Local file options ===
"file_patterns": {
"flags": ("--file-patterns",),
"kwargs": {
"nargs": "+",
"default": None,
"help": "File patterns for directory scanning (default: *.mp4 *.mkv *.webm)",
},
},
"recursive": {
"flags": ("--recursive",),
"kwargs": {
"action": "store_true",
"default": True,
"help": "Recursively scan directories (default: True)",
},
},
"no_recursive": {
"flags": ("--no-recursive",),
"kwargs": {
"action": "store_false",
"dest": "recursive",  # Write to the same value as --recursive (default: True)
"help": "Disable recursive directory scanning",
},
},
}
def add_video_arguments(parser):
    """Add all video-specific arguments to a parser."""
    for arg_def in VIDEO_ARGUMENTS.values():
        parser.add_argument(*arg_def["flags"], **arg_def["kwargs"])
```
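The table-driven registration can be tried with plain `argparse`. The sketch below uses a three-entry table (`DEMO_ARGUMENTS`, a name invented for this example) and shows one way to make a `--recursive` / `--no-recursive` pair control a single value: give the negative flag `action="store_false"` with `dest="recursive"`. A `store_true` flag whose default is already `True` cannot change anything by itself, so the negative flag is the one that matters.

```python
import argparse

# Hypothetical three-entry table in the same shape as VIDEO_ARGUMENTS.
DEMO_ARGUMENTS = {
    "max_videos": {
        "flags": ("--max-videos",),
        "kwargs": {"type": int, "default": 50},
    },
    "recursive": {
        "flags": ("--recursive",),
        "kwargs": {"action": "store_true", "default": True},
    },
    "no_recursive": {
        "flags": ("--no-recursive",),
        # store_false + shared dest: flips the same attribute to False.
        "kwargs": {"action": "store_false", "dest": "recursive"},
    },
}

parser = argparse.ArgumentParser(prog="demo")
for arg_def in DEMO_ARGUMENTS.values():
    parser.add_argument(*arg_def["flags"], **arg_def["kwargs"])

defaults = parser.parse_args([])
overridden = parser.parse_args(["--no-recursive", "--max-videos", "5"])
print(defaults)
print(overridden)
```

On Python 3.9+, `argparse.BooleanOptionalAction` is an alternative that generates the `--no-…` variant from a single definition.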
### Progressive Help for Create Command
```python
# In arguments/create.py - add video to help modes
# New help flag
"help_video": {
"flags": ("--help-video",),
"kwargs": {
"action": "store_true",
"help": "Show video-specific options",
},
}
# VIDEO_ARGUMENTS added to create command's video help mode
# skill-seekers create --help-video
```
---
## MCP Tool Integration
### New MCP Tool: `scrape_video`
```python
# In src/skill_seekers/mcp/tools/scraping_tools.py
@mcp.tool()
def scrape_video(
    url: str | None = None,
    playlist: str | None = None,
    path: str | None = None,
    output_dir: str = "output/",
    visual: bool = False,
    max_videos: int = 20,
    whisper_model: str = "base",
) -> str:
    """Scrape and extract knowledge from video content.

    Supports YouTube and Vimeo videos, YouTube playlists, and local video
    files or directories (channel URLs have no parameter in this signature).
    Extracts transcripts, metadata, chapters, and optionally visual content.

    Args:
        url: YouTube or Vimeo video URL
        playlist: YouTube playlist URL
        path: Local video file or directory path
        output_dir: Output directory for results
        visual: Enable visual extraction (OCR on keyframes)
        max_videos: Maximum videos to process (for playlists)
        whisper_model: Whisper model size for transcription

    Returns:
        JSON string with scraping results summary
    """
    ...
```
### Updated Tool Count
Total MCP tools: **27** (previously 26; this change adds `scrape_video`)
---
## Enhancement Integration
### Video Content Enhancement
Video segments can be enhanced with the same AI enhancement pipeline used for other sources:
```python
# In enhance_skill_local.py or enhance_command.py
def enhance_video_content(segments: list[VideoSegment], level: int) -> list[VideoSegment]:
    """AI-enhance video segments.

    Enhancement levels:
        0 - No enhancement
        1 - Summary generation per segment
        2 - + Topic extraction, category refinement, code annotation
        3 - + Cross-segment connections, tutorial flow analysis, key takeaways

    Uses the same enhancement infrastructure as other sources.
    """
    if level == 0:
        return segments

    for segment in segments:
        if level >= 1:
            segment.summary = ai_summarize(segment.content)

        if level >= 2:
            segment.topic = ai_extract_topic(segment.content)
            segment.category = ai_refine_category(
                segment.content, segment.category
            )
            # Annotate code blocks with explanations
            for cb in segment.detected_code_blocks:
                cb.explanation = ai_explain_code(cb.code, segment.transcript)

        if level >= 3:
            # Cross-segment analysis (needs all segments)
            pass  # Handled at video level, not segment level

    return segments
```
---
## File Map (New & Modified Files)
### New Files
| File | Purpose | Estimated Size |
|------|---------|---------------|
| `src/skill_seekers/cli/video_scraper.py` | Main video scraper orchestrator | ~800-1000 lines |
| `src/skill_seekers/cli/video_models.py` | All data classes and enums | ~500-600 lines |
| `src/skill_seekers/cli/video_transcript.py` | Transcript extraction (YouTube API + Whisper) | ~400-500 lines |
| `src/skill_seekers/cli/video_visual.py` | Visual extraction (scene detection + OCR) | ~500-600 lines |
| `src/skill_seekers/cli/video_segmenter.py` | Segmentation and stream alignment | ~400-500 lines |
| `src/skill_seekers/cli/parsers/video_parser.py` | CLI argument parser | ~80-100 lines |
| `src/skill_seekers/cli/arguments/video.py` | Video-specific argument definitions | ~120-150 lines |
| `tests/test_video_scraper.py` | Video scraper tests | ~600-800 lines |
| `tests/test_video_transcript.py` | Transcript extraction tests | ~400-500 lines |
| `tests/test_video_visual.py` | Visual extraction tests | ~400-500 lines |
| `tests/test_video_segmenter.py` | Segmentation tests | ~300-400 lines |
| `tests/test_video_models.py` | Data model tests | ~200-300 lines |
| `tests/test_video_integration.py` | Integration tests | ~300-400 lines |
| `tests/fixtures/video/` | Test fixtures (mock transcripts, metadata) | Various |
### Modified Files
| File | Changes |
|------|---------|
| `src/skill_seekers/cli/source_detector.py` | Add video URL patterns, video file detection, video directory detection |
| `src/skill_seekers/cli/main.py` | Register `video` subcommand in COMMAND_MODULES |
| `src/skill_seekers/cli/unified_scraper.py` | Add `"video": []` to scraped_data, add `_scrape_video_source()` |
| `src/skill_seekers/cli/arguments/create.py` | Add video args to create command, add `--help-video` |
| `src/skill_seekers/cli/parsers/__init__.py` | Register VideoParser |
| `src/skill_seekers/cli/config_validator.py` | Validate video source entries in unified config |
| `src/skill_seekers/mcp/tools/scraping_tools.py` | Add `scrape_video` tool |
| `pyproject.toml` | Add `[video]` and `[video-full]` optional dependencies, add `skill-seekers-video` entry point |
| `tests/test_source_detector.py` | Add video detection tests |
| `tests/test_unified.py` | Add video source integration tests |

@@ -0,0 +1,619 @@
# Video Source — Output Structure & SKILL.md Integration
**Date:** February 27, 2026
**Document:** 05 of 07
**Status:** Planning
---
## Table of Contents
1. [Output Directory Structure](#output-directory-structure)
2. [Reference File Format](#reference-file-format)
3. [SKILL.md Section Format](#skillmd-section-format)
4. [Metadata JSON Format](#metadata-json-format)
5. [Page JSON Format (Compatibility)](#page-json-format-compatibility)
6. [RAG Chunking for Video](#rag-chunking-for-video)
7. [Examples](#examples)
---
## Output Directory Structure
```
output/{skill_name}/
├── SKILL.md # Main skill file (video section added)
├── references/
│ ├── getting_started.md # From docs (existing)
│ ├── api.md # From docs (existing)
│ ├── video_react-hooks-tutorial.md # ← Video reference file
│ ├── video_project-setup-guide.md # ← Video reference file
│ └── video_advanced-patterns.md # ← Video reference file
├── video_data/ # ← NEW: Video-specific data
│ ├── metadata.json # VideoScraperResult (full metadata)
│ ├── transcripts/
│ │ ├── abc123def45.json # Raw transcript per video
│ │ ├── xyz789ghi01.json
│ │ └── ...
│ ├── segments/
│ │ ├── abc123def45_segments.json # Aligned segments per video
│ │ ├── xyz789ghi01_segments.json
│ │ └── ...
│ └── frames/ # Only if --visual enabled
│ ├── abc123def45/
│ │ ├── frame_045.00_terminal.png
│ │ ├── frame_052.30_code.png
│ │ ├── frame_128.00_slide.png
│ │ └── ...
│ └── xyz789ghi01/
│ └── ...
├── pages/ # Existing page format
│ ├── page_001.json # From docs (existing)
│ ├── video_abc123def45.json # ← Video in page format
│ └── ...
└── {skill_name}_data/ # Raw scrape data (existing)
```
---
## Reference File Format
Each video produces one reference markdown file in `references/`. The filename is derived from the video title, sanitized and prefixed with `video_`.
### Naming Convention
```
video_{sanitized_title}.md
```
Sanitization rules:
- Lowercase
- Replace spaces and special chars with hyphens
- Remove consecutive hyphens
- Truncate to 60 characters
- Example: "React Hooks Tutorial for Beginners" → `video_react-hooks-tutorial-for-beginners.md`
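The sanitization rules above can be sketched as a small function. This is an illustration only: the function name is hypothetical, and it assumes the 60-character limit applies to the slug (not the full filename) and that any non-ASCII character counts as a "special char":

```python
import re


def video_reference_filename(title: str, max_length: int = 60) -> str:
    """Build a reference filename from a video title (hypothetical helper)."""
    slug = title.lower()
    # Any run of characters other than a-z / 0-9 becomes one hyphen, which
    # covers both "replace special chars" and "remove consecutive hyphens".
    slug = re.sub(r"[^a-z0-9]+", "-", slug).strip("-")
    # Truncate, then drop a hyphen left dangling at the cut point.
    slug = slug[:max_length].rstrip("-")
    return f"video_{slug}.md"


example_filename = video_reference_filename("React Hooks Tutorial for Beginners")
```

A real implementation would probably also need a fallback for titles that sanitize to an empty string.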
### File Structure
````markdown
# {Video Title}

> **Source:** [{channel_name}]({channel_url}) | **Duration:** {HH:MM:SS} | **Published:** {date}
> **URL:** [{url}]({url})
> **Views:** {view_count} | **Likes:** {like_count}
> **Tags:** {tag1}, {tag2}, {tag3}

{description_summary (first 200 chars)}

---

## Table of Contents

{auto-generated from chapter titles / segment headings}

---

{segments rendered as sections}

### {Chapter Title or "Segment N"} ({MM:SS} - {MM:SS})

{merged content: transcript + code blocks + slide text}

```{language}
{code shown on screen}
```

---

### {Next Chapter} ({MM:SS} - {MM:SS})

{content continues...}

---

## Key Takeaways

{AI-generated summary of main points, populated during enhancement}

## Code Examples

{Consolidated list of all code blocks from the video}
````
### Full Example
```markdown
# React Hooks Tutorial for Beginners
> **Source:** [React Official](https://youtube.com/@reactofficial) | **Duration:** 30:32 | **Published:** 2026-01-15
> **URL:** [https://youtube.com/watch?v=abc123def45](https://youtube.com/watch?v=abc123def45)
> **Views:** 1,500,000 | **Likes:** 45,000
> **Tags:** react, hooks, tutorial, javascript, web development
Learn React Hooks from scratch in this comprehensive tutorial. We'll cover useState, useEffect, useContext, and custom hooks with practical examples.
---
## Table of Contents
- [Intro](#intro-0000---0045)
- [Project Setup](#project-setup-0045---0300)
- [useState Hook](#usestate-hook-0300---0900)
- [useEffect Hook](#useeffect-hook-0900---1500)
- [Custom Hooks](#custom-hooks-1500---2200)
- [Best Practices](#best-practices-2200---2800)
- [Wrap Up](#wrap-up-2800---3032)
---
### Intro (00:00 - 00:45)
Welcome to this React Hooks tutorial. Today we'll learn about the most important hooks in React and how to use them effectively in your applications. By the end of this video, you'll understand useState, useEffect, useContext, and how to create your own custom hooks.
---
### Project Setup (00:45 - 03:00)
Let's start by setting up our React project. We'll use Create React App which gives us a great starting point with all the tooling configured.
**Terminal command:**
```bash
npx create-react-app hooks-demo
cd hooks-demo
npm start
```
Open the project in your code editor. You'll see the standard React project structure with src/App.js as our main component file. Let's clear out the boilerplate and start fresh.
**Code shown in editor:**
```jsx
import React from 'react';
function App() {
return (
<div className="App">
<h1>Hooks Demo</h1>
</div>
);
}
export default App;
```
---
### useState Hook (03:00 - 09:00)
The useState hook is the most fundamental hook in React. It lets you add state to functional components. Before hooks, you needed class components for state management.
Let's create a simple counter to demonstrate useState. The hook returns an array with two elements: the current state value and a function to update it. We use array destructuring to name them.
**Code shown in editor:**
```jsx
import React, { useState } from 'react';
function Counter() {
const [count, setCount] = useState(0);
return (
<div>
<p>Count: {count}</p>
<button onClick={() => setCount(count + 1)}>
Increment
</button>
<button onClick={() => setCount(count - 1)}>
Decrement
</button>
</div>
);
}
```
Important things to remember about useState: the initial value is only used on the first render. If you need to compute the initial value, pass a function instead of a value to avoid recomputing on every render.
---
## Key Takeaways
1. **useState** is for managing simple state values in functional components
2. **useEffect** handles side effects (data fetching, subscriptions, DOM updates)
3. Always include a dependency array in useEffect to control when it runs
4. Custom hooks let you extract reusable stateful logic
5. Follow the Rules of Hooks: only call hooks at the top level, only in React functions
## Code Examples
### Counter with useState
```jsx
const [count, setCount] = useState(0);
```
### Data Fetching with useEffect
```jsx
useEffect(() => {
fetch('/api/data')
.then(res => res.json())
.then(setData);
}, []);
```
### Custom Hook: useLocalStorage
```jsx
function useLocalStorage(key, initialValue) {
const [value, setValue] = useState(() => {
const saved = localStorage.getItem(key);
return saved ? JSON.parse(saved) : initialValue;
});
useEffect(() => {
localStorage.setItem(key, JSON.stringify(value));
}, [key, value]);
return [value, setValue];
}
```
```
---
## SKILL.md Section Format
Video content is integrated into SKILL.md as a dedicated section, following the existing section patterns.
### Section Placement
```markdown
# {Skill Name}
## Overview
{existing overview section}
## Quick Reference
{existing quick reference}
## Getting Started
{from docs/github}
## Core Concepts
{from docs/github}
## API Reference
{from docs/github}
## Video Tutorials ← NEW SECTION
{from video sources}
## Code Examples
{consolidated from all sources}
## References
{file listing}
```
### Section Content
````markdown
## Video Tutorials

This skill includes knowledge extracted from {N} video tutorial(s) totaling {HH:MM:SS} of content.

### {Video Title 1}

**Source:** [{channel}]({url}) | {duration} | {view_count} views

{summary or first segment content, abbreviated}

**Topics covered:** {chapter titles or detected topics}

→ Full transcript: [references/video_{sanitized_title}.md](references/video_{sanitized_title}.md)

---

### {Video Title 2}

...

### Key Patterns from Videos

{AI-generated section highlighting patterns that appear across multiple videos}

### Code Examples from Videos

{Consolidated code blocks from all videos, organized by topic}

```{language}
// From: {video_title} at {timestamp}
{code}
```
````
### Playlist Grouping
When a video source is a playlist, the SKILL.md section groups videos under the playlist title:
```markdown
## Video Tutorials
### React Complete Course (12 videos, 6:30:00 total)
1. **Introduction to React** (15:00) — Components, JSX, virtual DOM
2. **React Hooks Deep Dive** (30:32) — useState, useEffect, custom hooks
3. **State Management** (28:15) — Context API, Redux patterns
...
→ Full transcripts in [references/](references/) (video_*.md files)
```
---
## Metadata JSON Format
### `video_data/metadata.json` — Full scraper result
```json
{
"scraper_version": "3.2.0",
"extracted_at": "2026-02-27T14:30:00Z",
"processing_time_seconds": 125.4,
"config": {
"visual_extraction": true,
"whisper_model": "base",
"segmentation_strategy": "hybrid",
"max_videos": 20
},
"summary": {
"total_videos": 5,
"total_duration_seconds": 5420.0,
"total_segments": 42,
"total_code_blocks": 18,
"total_keyframes": 156,
"languages": ["en"],
"categories_found": ["getting_started", "hooks", "advanced"]
},
"videos": [
{
"video_id": "abc123def45",
"title": "React Hooks Tutorial for Beginners",
"duration": 1832.0,
"segments_count": 7,
"code_blocks_count": 5,
"transcript_source": "youtube_manual",
"transcript_confidence": 0.95,
"content_richness_score": 0.88,
"reference_file": "references/video_react-hooks-tutorial-for-beginners.md"
}
],
"warnings": [
"Video xyz789: Auto-generated captions used (manual not available)"
],
"errors": []
}
```
### `video_data/transcripts/{video_id}.json` — Raw transcript
```json
{
"video_id": "abc123def45",
"transcript_source": "youtube_manual",
"language": "en",
"segments": [
{
"text": "Welcome to this React Hooks tutorial.",
"start": 0.0,
"end": 2.5,
"confidence": 1.0,
"words": null
},
{
"text": "Today we'll learn about the most important hooks.",
"start": 2.5,
"end": 5.8,
"confidence": 1.0,
"words": null
}
]
}
```
### `video_data/segments/{video_id}_segments.json` — Aligned segments
```json
{
"video_id": "abc123def45",
"segmentation_strategy": "chapters",
"segments": [
{
"index": 0,
"start_time": 0.0,
"end_time": 45.0,
"duration": 45.0,
"chapter_title": "Intro",
"category": "getting_started",
"content_type": "explanation",
"transcript": "Welcome to this React Hooks tutorial...",
"transcript_confidence": 0.95,
"has_code_on_screen": false,
"has_slides": false,
"keyframes_count": 2,
"code_blocks_count": 0,
"confidence": 0.95
}
]
}
```
---
## Page JSON Format (Compatibility)
For compatibility with the existing page-based pipeline (`pages/*.json`), each video also produces a page JSON file. This ensures video content flows through the same build pipeline as other sources.
### `pages/video_{video_id}.json`
```json
{
"url": "https://www.youtube.com/watch?v=abc123def45",
"title": "React Hooks Tutorial for Beginners",
"content": "{full merged content from all segments}",
"category": "tutorials",
"source_type": "video",
"metadata": {
"video_id": "abc123def45",
"duration": 1832.0,
"channel": "React Official",
"view_count": 1500000,
"chapters": 7,
"transcript_source": "youtube_manual",
"has_visual_extraction": true
},
"code_blocks": [
{
"language": "jsx",
"code": "const [count, setCount] = useState(0);",
"source": "video_ocr",
"timestamp": 195.0
}
],
"extracted_at": "2026-02-27T14:30:00Z"
}
```
This format is compatible with the existing `build_skill()` function in `doc_scraper.py`, which reads `pages/*.json` files to build the skill.
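A rough sketch of how a video could be written out in this page format. The function name, its parameters, and the fixed `"tutorials"` category are assumptions for illustration; only the JSON field names come from the example above:

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_video_page(output_dir, video_id: str, title: str, url: str,
                     content: str, metadata: Optional[dict] = None,
                     code_blocks: Optional[list] = None) -> Path:
    """Write pages/video_{video_id}.json (hypothetical helper)."""
    page = {
        "url": url,
        "title": title,
        "content": content,
        "category": "tutorials",  # assumed default category
        "source_type": "video",
        "metadata": {"video_id": video_id, **(metadata or {})},
        "code_blocks": code_blocks or [],
        "extracted_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    pages_dir = Path(output_dir) / "pages"
    pages_dir.mkdir(parents=True, exist_ok=True)
    page_path = pages_dir / f"video_{video_id}.json"
    page_path.write_text(json.dumps(page, indent=2), encoding="utf-8")
    return page_path


# Usage: write one page into a throwaway directory and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    page_path = write_video_page(
        tmp_dir,
        video_id="abc123def45",
        title="React Hooks Tutorial for Beginners",
        url="https://www.youtube.com/watch?v=abc123def45",
        content="Welcome to this React Hooks tutorial...",
        metadata={"duration": 1832.0, "chapters": 7},
    )
    saved_page = json.loads(page_path.read_text(encoding="utf-8"))
```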
---
## RAG Chunking for Video
When `--chunk-for-rag` is enabled, video segments are chunked differently from text documents because they already have natural boundaries (chapters/segments).
### Chunking Strategy
```
For each VideoSegment:
IF segment.duration <= chunk_duration_threshold (default: 300s / 5 min):
→ Output as single chunk
ELIF segment has sub-sections (code blocks interleaved with explanation):
→ Split at code block boundaries
→ Each chunk = explanation + associated code block
ELSE (long segment without clear sub-sections):
→ Split at sentence boundaries
→ Target chunk size: config.chunk_size tokens
→ Overlap: config.chunk_overlap tokens
```
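The strategy above can be sketched in Python. This is not the planned implementation: `Segment` is a simplified, hypothetical stand-in for `VideoSegment`, `sections` is an assumed list of `(explanation, code)` pairs, and tokens are approximated by whitespace-separated words:

```python
import re
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical stand-in for VideoSegment with only the fields used here."""
    duration: float
    content: str
    sections: list = field(default_factory=list)  # (explanation, code) pairs


def chunk_by_sentences(text, chunk_size, chunk_overlap):
    """Group sentences into chunks of about chunk_size words, with overlap."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        current_words = sum(len(s.split()) for s in current)
        if current and current_words + len(sentence.split()) > chunk_size:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward, up to chunk_overlap words.
            overlap, carried = [], 0
            for previous in reversed(current):
                carried += len(previous.split())
                if carried > chunk_overlap:
                    break
                overlap.insert(0, previous)
            current = overlap
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def chunk_segment(segment, duration_threshold=300.0, chunk_size=200, chunk_overlap=20):
    """Apply the three branches of the chunking strategy to one segment."""
    if segment.duration <= duration_threshold:
        return [segment.content]  # short segment: single chunk
    if segment.sections:
        # Long segment with code: one chunk per explanation + code pair.
        return [f"{text}\n\n{code}".strip() for text, code in segment.sections]
    # Long segment without clear sub-sections: split at sentence boundaries.
    return chunk_by_sentences(segment.content, chunk_size, chunk_overlap)
```

Splitting on `.`, `!`, and `?` is a crude sentence detector; a real implementation would likely reuse whatever tokenizer backs `config.chunk_size`.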
### RAG Metadata per Chunk
```json
{
"text": "chunk content...",
"metadata": {
"source": "video",
"source_type": "youtube",
"video_id": "abc123def45",
"video_title": "React Hooks Tutorial",
"channel": "React Official",
"timestamp_start": 180.0,
"timestamp_end": 300.0,
"timestamp_url": "https://youtube.com/watch?v=abc123def45&t=180",
"chapter": "useState Hook",
"category": "hooks",
"content_type": "live_coding",
"has_code": true,
"language": "en",
"confidence": 0.94,
"view_count": 1500000,
"upload_date": "2026-01-15"
}
}
```
The `timestamp_url` field is especially valuable — it lets RAG systems link directly to the relevant moment in the video.
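For YouTube sources the URL can be derived from the video ID and the chunk's start time. A minimal sketch, assuming a hypothetical helper name and that fractional seconds are simply truncated:

```python
def youtube_timestamp_url(video_id: str, start_seconds: float) -> str:
    """Build a deep link to a moment in a YouTube video (hypothetical helper)."""
    # The t parameter takes whole seconds, so drop any fractional part.
    return f"https://youtube.com/watch?v={video_id}&t={int(start_seconds)}"


example_url = youtube_timestamp_url("abc123def45", 180.0)
```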
---
## Examples
### Minimal Output (transcript only, single video)
```
output/react-hooks-video/
├── SKILL.md # Skill with video section
├── references/
│ └── video_react-hooks-tutorial.md # Full transcript organized by chapters
├── video_data/
│ ├── metadata.json # Scraper metadata
│ ├── transcripts/
│ │ └── abc123def45.json # Raw transcript
│ └── segments/
│ └── abc123def45_segments.json # Aligned segments
└── pages/
└── video_abc123def45.json # Page-compatible format
```
### Full Output (visual extraction, playlist of 5 videos)
```
output/react-complete/
├── SKILL.md
├── references/
│ ├── video_intro-to-react.md
│ ├── video_react-hooks-deep-dive.md
│ ├── video_state-management.md
│ ├── video_react-router.md
│ └── video_testing-react-apps.md
├── video_data/
│ ├── metadata.json
│ ├── transcripts/
│ │ ├── abc123def45.json
│ │ ├── def456ghi78.json
│ │ ├── ghi789jkl01.json
│ │ ├── jkl012mno34.json
│ │ └── mno345pqr67.json
│ ├── segments/
│ │ ├── abc123def45_segments.json
│ │ ├── def456ghi78_segments.json
│ │ ├── ghi789jkl01_segments.json
│ │ ├── jkl012mno34_segments.json
│ │ └── mno345pqr67_segments.json
│ └── frames/
│ ├── abc123def45/
│ │ ├── frame_045.00_terminal.png
│ │ ├── frame_052.30_code.png
│ │ ├── frame_128.00_slide.png
│ │ └── ... (50+ frames)
│ ├── def456ghi78/
│ │ └── ...
│ └── ...
└── pages/
├── video_abc123def45.json
├── video_def456ghi78.json
├── video_ghi789jkl01.json
├── video_jkl012mno34.json
└── video_mno345pqr67.json
```
### Mixed Source Output (docs + github + video)
```
output/react-unified/
├── SKILL.md # Unified skill with ALL sources
├── references/
│ ├── getting_started.md # From docs
│ ├── hooks.md # From docs
│ ├── api_reference.md # From docs
│ ├── architecture.md # From GitHub analysis
│ ├── patterns.md # From GitHub analysis
│ ├── video_react-hooks-tutorial.md # From video
│ ├── video_react-conf-keynote.md # From video
│ └── video_advanced-patterns.md # From video
├── video_data/
│ └── ... (video-specific data)
├── pages/
│ ├── page_001.json # From docs
│ ├── page_002.json
│ ├── video_abc123def45.json # From video
│ └── video_def456ghi78.json
└── react_data/
└── pages/ # Raw scrape data
```

@@ -0,0 +1,748 @@
# Video Source — Testing Strategy
**Date:** February 27, 2026
**Document:** 06 of 07
**Status:** Planning
---
## Table of Contents
1. [Testing Principles](#testing-principles)
2. [Test File Structure](#test-file-structure)
3. [Fixtures & Mock Data](#fixtures--mock-data)
4. [Unit Tests](#unit-tests)
5. [Integration Tests](#integration-tests)
6. [E2E Tests](#e2e-tests)
7. [CI Considerations](#ci-considerations)
8. [Performance Tests](#performance-tests)
---
## Testing Principles
1. **No network calls in unit tests** — All YouTube API, yt-dlp, and download operations must be mocked.
2. **No GPU required in CI** — All Whisper and easyocr tests must work on CPU, or be marked `@pytest.mark.slow`.
3. **No video files in repo** — Test fixtures use JSON transcripts and small synthetic images, not actual video files.
4. **100% pipeline coverage** — Every phase of the 6-phase pipeline must be tested.
5. **Edge case focus** — Test missing chapters, empty transcripts, corrupt frames, rate limits.
6. **Compatible with existing test infra** — Use existing conftest.py, markers, and patterns.
---
## Test File Structure
```
tests/
├── test_video_models.py # Data model tests (serialization, validation)
├── test_video_scraper.py # Main scraper orchestration tests
├── test_video_transcript.py # Transcript extraction tests
├── test_video_visual.py # Visual extraction tests
├── test_video_segmenter.py # Segmentation and alignment tests
├── test_video_integration.py # Integration with unified scraper, create command
├── test_video_output.py # Output generation tests
├── test_video_source_detector.py # Source detection tests (or add to existing)
├── fixtures/
│ └── video/
│ ├── sample_metadata.json # yt-dlp info_dict mock
│ ├── sample_transcript.json # YouTube transcript mock
│ ├── sample_whisper_output.json # Whisper transcription mock
│ ├── sample_chapters.json # Chapter data mock
│ ├── sample_playlist.json # Playlist metadata mock
│ ├── sample_segments.json # Pre-aligned segments
│ ├── sample_frame_code.png # 100x100 synthetic dark frame
│ ├── sample_frame_slide.png # 100x100 synthetic light frame
│ ├── sample_frame_diagram.png # 100x100 synthetic edge-heavy frame
│ ├── sample_srt.srt # SRT subtitle file
│ ├── sample_vtt.vtt # WebVTT subtitle file
│ └── sample_config.json # Video source config
```
---
## Fixtures & Mock Data
### yt-dlp Metadata Fixture
```python
# tests/fixtures/video/sample_metadata.json
SAMPLE_YTDLP_METADATA = {
"id": "abc123def45",
"title": "React Hooks Tutorial for Beginners",
"description": "Learn React Hooks from scratch. Covers useState, useEffect, and custom hooks.",
"duration": 1832,
"upload_date": "20260115",
"uploader": "React Official",
"uploader_url": "https://www.youtube.com/@reactofficial",
"channel_follower_count": 250000,
"view_count": 1500000,
"like_count": 45000,
"comment_count": 2300,
"tags": ["react", "hooks", "tutorial", "javascript"],
"categories": ["Education"],
"language": "en",
"thumbnail": "https://i.ytimg.com/vi/abc123def45/maxresdefault.jpg",
"webpage_url": "https://www.youtube.com/watch?v=abc123def45",
"chapters": [
{"title": "Intro", "start_time": 0, "end_time": 45},
{"title": "Project Setup", "start_time": 45, "end_time": 180},
{"title": "useState Hook", "start_time": 180, "end_time": 540},
{"title": "useEffect Hook", "start_time": 540, "end_time": 900},
{"title": "Custom Hooks", "start_time": 900, "end_time": 1320},
{"title": "Best Practices", "start_time": 1320, "end_time": 1680},
{"title": "Wrap Up", "start_time": 1680, "end_time": 1832},
],
"subtitles": {
"en": [{"ext": "vtt", "url": "https://..."}],
},
"automatic_captions": {
"en": [{"ext": "vtt", "url": "https://..."}],
},
"extractor": "youtube",
}
```
### YouTube Transcript Fixture
```python
SAMPLE_YOUTUBE_TRANSCRIPT = [
{"text": "Welcome to this React Hooks tutorial.", "start": 0.0, "duration": 2.5},
{"text": "Today we'll learn about the most important hooks.", "start": 2.5, "duration": 3.0},
{"text": "Let's start by setting up our project.", "start": 45.0, "duration": 2.8},
{"text": "We'll use Create React App.", "start": 47.8, "duration": 2.0},
{"text": "Run npx create-react-app hooks-demo.", "start": 49.8, "duration": 3.5},
# ... more segments covering all chapters
]
```
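The fixture uses the `start` + `duration` shape, while the raw transcript files described in the output-structure document store `start` and `end`. A sketch of the conversion, assuming a hypothetical function name and a single confidence value applied to every entry:

```python
def to_timed_segments(entries, confidence=1.0):
    """Convert start/duration entries to start/end segments (hypothetical helper)."""
    return [
        {
            "text": entry["text"],
            "start": entry["start"],
            # Round to milliseconds to avoid float noise such as 47.800000000000004.
            "end": round(entry["start"] + entry["duration"], 3),
            "confidence": confidence,
            "words": None,  # this fixture shape carries no word-level timing
        }
        for entry in entries
    ]


example_segments = to_timed_segments([
    {"text": "Welcome to this React Hooks tutorial.", "start": 0.0, "duration": 2.5},
    {"text": "Let's start by setting up our project.", "start": 45.0, "duration": 2.8},
])
```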
### Whisper Output Fixture
```python
SAMPLE_WHISPER_OUTPUT = {
"language": "en",
"language_probability": 0.98,
"duration": 1832.0,
"segments": [
{
"start": 0.0,
"end": 2.5,
"text": "Welcome to this React Hooks tutorial.",
"avg_logprob": -0.15,
"no_speech_prob": 0.01,
"words": [
{"word": "Welcome", "start": 0.0, "end": 0.4, "probability": 0.97},
{"word": "to", "start": 0.4, "end": 0.5, "probability": 0.99},
{"word": "this", "start": 0.5, "end": 0.7, "probability": 0.98},
{"word": "React", "start": 0.7, "end": 1.1, "probability": 0.95},
{"word": "Hooks", "start": 1.1, "end": 1.5, "probability": 0.93},
{"word": "tutorial.", "start": 1.5, "end": 2.3, "probability": 0.96},
],
},
],
}
```
### Synthetic Frame Fixtures
```python
# Generate in conftest.py or fixture setup
import numpy as np
import cv2


def create_dark_frame(path: str):
    """Create a synthetic dark frame (simulates code editor)."""
    img = np.zeros((1080, 1920, 3), dtype=np.uint8)
    img[200:250, 100:800] = [200, 200, 200]  # Simulated text line
    img[270:320, 100:600] = [180, 180, 180]  # Another text line
    cv2.imwrite(path, img)


def create_light_frame(path: str):
    """Create a synthetic light frame (simulates slide)."""
    img = np.ones((1080, 1920, 3), dtype=np.uint8) * 240
    img[100:150, 200:1000] = [40, 40, 40]  # Title text
    img[300:330, 200:1200] = [60, 60, 60]  # Body text
    cv2.imwrite(path, img)
```
### conftest.py Additions
```python
# tests/conftest.py — add video fixtures
import json
from pathlib import Path

import pytest

FIXTURES_DIR = Path(__file__).parent / "fixtures" / "video"


@pytest.fixture
def sample_ytdlp_metadata():
    """Load sample yt-dlp metadata."""
    with open(FIXTURES_DIR / "sample_metadata.json") as f:
        return json.load(f)


@pytest.fixture
def sample_transcript():
    """Load sample YouTube transcript."""
    with open(FIXTURES_DIR / "sample_transcript.json") as f:
        return json.load(f)


@pytest.fixture
def sample_whisper_output():
    """Load sample Whisper transcription output."""
    with open(FIXTURES_DIR / "sample_whisper_output.json") as f:
        return json.load(f)


@pytest.fixture
def sample_chapters():
    """Load sample chapter data."""
    with open(FIXTURES_DIR / "sample_chapters.json") as f:
        return json.load(f)


@pytest.fixture
def sample_video_config():
    """Create a sample VideoSourceConfig."""
    from skill_seekers.cli.video_models import VideoSourceConfig

    return VideoSourceConfig(
        url="https://www.youtube.com/watch?v=abc123def45",
        name="test_video",
        visual_extraction=False,
        max_videos=5,
    )


@pytest.fixture
def video_output_dir(tmp_path):
    """Create a temporary output directory for video tests."""
    output = tmp_path / "output" / "test_video"
    output.mkdir(parents=True)
    (output / "video_data").mkdir()
    (output / "video_data" / "transcripts").mkdir()
    (output / "video_data" / "segments").mkdir()
    (output / "video_data" / "frames").mkdir()
    (output / "references").mkdir()
    (output / "pages").mkdir()
    return output
```
---
## Unit Tests
### test_video_models.py
```python
"""Tests for video data models and serialization."""


class TestVideoInfo:
    def test_create_from_ytdlp_metadata(self, sample_ytdlp_metadata):
        """VideoInfo correctly parses yt-dlp info_dict."""
        ...

    def test_serialization_round_trip(self):
        """VideoInfo serializes to dict and deserializes back identically."""
        ...

    def test_content_richness_score(self):
        """Content richness score computed correctly based on signals."""
        ...

    def test_empty_chapters(self):
        """VideoInfo handles video with no chapters."""
        ...


class TestVideoSegment:
    def test_timestamp_display(self):
        """Timestamp display formats correctly (MM:SS - MM:SS)."""
        ...

    def test_youtube_timestamp_url(self):
        """YouTube timestamp URL generated correctly."""
        ...

    def test_segment_with_code_blocks(self):
        """Segment correctly tracks detected code blocks."""
        ...

    def test_segment_without_visual(self):
        """Segment works when visual extraction is disabled."""
        ...


class TestChapter:
    def test_chapter_duration(self):
        """Chapter duration computed correctly."""
        ...

    def test_chapter_serialization(self):
        """Chapter serializes to/from dict."""
        ...


class TestTranscriptSegment:
    def test_from_youtube_api(self):
        """TranscriptSegment created from YouTube API format."""
        ...

    def test_from_whisper_output(self):
        """TranscriptSegment created from Whisper output."""
        ...

    def test_with_word_timestamps(self):
        """TranscriptSegment preserves word-level timestamps."""
        ...


class TestVideoSourceConfig:
    def test_validate_single_source(self):
        """Config requires exactly one source field."""
        ...

    def test_validate_duration_range(self):
        """Config validates min < max duration."""
        ...

    def test_defaults(self):
        """Config has sensible defaults."""
        ...

    def test_from_unified_config(self, sample_video_config):
        """Config created from unified config JSON entry."""
        ...


class TestEnums:
    def test_all_video_source_types(self):
        """All VideoSourceType values are valid."""
        ...

    def test_all_frame_types(self):
        """All FrameType values are valid."""
        ...

    def test_all_transcript_sources(self):
        """All TranscriptSource values are valid."""
        ...
```
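The `MM:SS - MM:SS` display checked by `test_timestamp_display` could be produced along these lines. A sketch with hypothetical function names; switching to `H:MM:SS` once a value reaches one hour is an assumption based on the playlist totals shown in the output-structure document:

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS, or H:MM:SS from one hour upward."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def format_range(start: float, end: float) -> str:
    """Format a segment's time range as 'MM:SS - MM:SS'."""
    return f"{format_timestamp(start)} - {format_timestamp(end)}"


example_range = format_range(180, 540)
```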
### test_video_transcript.py
```python
"""Tests for transcript extraction (YouTube API + Whisper + subtitle parsing)."""
from unittest.mock import patch

import pytest


class TestYouTubeTranscript:
    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
    def test_extract_manual_captions(self, mock_api, sample_transcript):
        """Prefers manual captions over auto-generated."""
        ...

    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
    def test_fallback_to_auto_generated(self, mock_api):
        """Falls back to auto-generated when manual not available."""
        ...

    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
    def test_fallback_to_translation(self, mock_api):
        """Falls back to translated captions when preferred language unavailable."""
        ...

    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
    def test_no_transcript_available(self, mock_api):
        """Raises TranscriptNotAvailable when no captions exist."""
        ...

    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
    def test_confidence_scoring(self, mock_api, sample_transcript):
        """Manual captions get 1.0 confidence, auto-generated get 0.8."""
        ...


class TestWhisperTranscription:
    @pytest.mark.slow
    @patch('skill_seekers.cli.video_transcript.WhisperModel')
    def test_transcribe_with_word_timestamps(self, mock_model):
        """Whisper returns word-level timestamps."""
        ...

    @patch('skill_seekers.cli.video_transcript.WhisperModel')
    def test_language_detection(self, mock_model):
        """Whisper detects video language."""
        ...

    @patch('skill_seekers.cli.video_transcript.WhisperModel')
    def test_vad_filtering(self, mock_model):
        """VAD filter removes silence segments."""
        ...

    def test_download_audio_only(self):
        """Audio extraction downloads audio stream only (not video)."""
        # Mock yt-dlp download
        ...


class TestSubtitleParsing:
    def test_parse_srt(self, tmp_path):
        """Parse SRT subtitle file into segments."""
        srt_content = "1\n00:00:01,500 --> 00:00:04,000\nHello world\n\n2\n00:00:05,000 --> 00:00:08,000\nSecond line\n"
        srt_file = tmp_path / "test.srt"
        srt_file.write_text(srt_content)
        ...

    def test_parse_vtt(self, tmp_path):
        """Parse WebVTT subtitle file into segments."""
        vtt_content = "WEBVTT\n\n00:00:01.500 --> 00:00:04.000\nHello world\n\n00:00:05.000 --> 00:00:08.000\nSecond line\n"
        vtt_file = tmp_path / "test.vtt"
        vtt_file.write_text(vtt_content)
        ...

    def test_srt_html_tag_removal(self, tmp_path):
        """SRT parser removes inline HTML tags."""
        ...

    def test_empty_subtitle_file(self, tmp_path):
        """Handle empty subtitle file gracefully."""
        ...


class TestTranscriptFallbackChain:
    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
    @patch('skill_seekers.cli.video_transcript.WhisperModel')
    def test_youtube_then_whisper_fallback(self, mock_whisper, mock_yt_api):
        """Falls back to Whisper when YouTube captions fail."""
        ...

    def test_subtitle_file_discovery(self, tmp_path):
        """Discovers sidecar subtitle files for local videos."""
        ...
```
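To make the subtitle cases concrete, here is a minimal SRT parser that would satisfy the sample in `test_parse_srt`, the HTML tag case, and the empty-file case. It is a sketch under assumptions: the function names and the returned dict shape are hypothetical, and it only handles `HH:MM:SS` timestamps with millisecond precision:

```python
import re

# Accepts both the SRT comma and the WebVTT dot before the milliseconds.
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert an HH:MM:SS,mmm timestamp to seconds."""
    match = TIMESTAMP.search(stamp)
    if match is None:
        raise ValueError(f"Unrecognized timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text: str) -> list:
    """Parse SRT text into a list of {text, start, end} dicts."""
    segments = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue  # empty input, or a block without a timing line
        start_raw, end_raw = lines[timing].split("-->", 1)
        # Join the cue's text lines and strip inline tags such as <i>...</i>.
        body = re.sub(r"<[^>]+>", "", " ".join(lines[timing + 1:])).strip()
        if body:
            segments.append(
                {"text": body, "start": to_seconds(start_raw), "end": to_seconds(end_raw)}
            )
    return segments


example_cues = parse_srt(
    "1\n00:00:01,500 --> 00:00:04,000\nHello world\n\n"
    "2\n00:00:05,000 --> 00:00:08,000\nSecond line\n"
)
```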
### test_video_visual.py
```python
"""Tests for visual extraction (scene detection, frame extraction, OCR)."""
from unittest.mock import patch

import pytest


class TestFrameClassification:
    def test_classify_dark_frame_as_code(self, tmp_path):
        """Dark frame with text patterns classified as code_editor."""
        ...

    def test_classify_light_frame_as_slide(self, tmp_path):
        """Light uniform frame classified as slide."""
        ...

    def test_classify_high_edge_as_diagram(self, tmp_path):
        """High edge density frame classified as diagram."""
        ...

    def test_classify_blank_frame_as_other(self, tmp_path):
        """Nearly blank frame classified as other."""
        ...


class TestKeyframeTimestamps:
    def test_chapter_boundaries_included(self, sample_chapters):
        """Keyframe timestamps include chapter start times."""
        ...

    def test_long_chapter_midpoint(self, sample_chapters):
        """Long chapters (>2 min) get midpoint keyframe."""
        ...

    def test_deduplication_within_1_second(self):
        """Timestamps within 1 second are deduplicated."""
        ...

    def test_regular_intervals_fill_gaps(self):
        """Regular interval timestamps fill gaps between scenes."""
        ...


class TestOCRExtraction:
    @pytest.mark.slow
    @patch('skill_seekers.cli.video_visual.easyocr.Reader')
    def test_extract_text_from_code_frame(self, mock_reader, tmp_path):
        """OCR extracts text from code editor frame."""
        ...

    @patch('skill_seekers.cli.video_visual.easyocr.Reader')
    def test_confidence_filtering(self, mock_reader):
        """Low-confidence OCR results are filtered out."""
        ...

    @patch('skill_seekers.cli.video_visual.easyocr.Reader')
    def test_monospace_detection(self, mock_reader):
        """Monospace text regions correctly detected."""
        ...


class TestCodeBlockDetection:
    def test_detect_python_code(self):
        """Detect Python code from OCR text."""
        ...

    def test_detect_terminal_commands(self):
        """Detect terminal commands from OCR text."""
        ...

    def test_language_detection_from_ocr(self):
        """Language detection works on OCR-extracted code."""
        ...
```
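The keyframe rules named in `TestKeyframeTimestamps` (chapter starts, a midpoint for chapters longer than two minutes, and deduplication within one second) can be sketched as below. This is only an illustration: the `Chapter` record, the `keyframe_timestamps` function, and its parameter names are hypothetical and are not taken from `video_visual.py`.

```python
from dataclasses import dataclass


@dataclass
class Chapter:
    """Hypothetical chapter record with start/end times in seconds."""
    start: float
    end: float


def keyframe_timestamps(
    chapters: list[Chapter],
    long_chapter_seconds: float = 120.0,
    dedup_window: float = 1.0,
) -> list[float]:
    """Return sorted candidate timestamps for keyframe extraction."""
    candidates: list[float] = []
    for chapter in chapters:
        # Every chapter start is a candidate keyframe.
        candidates.append(chapter.start)
        # Chapters longer than the threshold also get a midpoint candidate.
        if chapter.end - chapter.start > long_chapter_seconds:
            candidates.append((chapter.start + chapter.end) / 2)

    kept: list[float] = []
    for ts in sorted(candidates):
        # Drop any timestamp that falls within dedup_window of the last kept one.
        if not kept or ts - kept[-1] > dedup_window:
            kept.append(ts)
    return kept
```

Sorting before deduplicating keeps the pass linear after the sort and makes the result independent of the order in which chapters are supplied.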
### test_video_segmenter.py
```python
"""Tests for segmentation and stream alignment."""
class TestChapterSegmentation:
    def test_chapters_create_segments(self, sample_chapters):
        """Chapters map directly to segments."""
        ...
    def test_long_chapter_splitting(self):
        """Chapters exceeding max_segment_duration are split."""
        ...
    def test_empty_chapters(self):
        """Falls back to time window when no chapters."""
        ...

class TestTimeWindowSegmentation:
    def test_fixed_windows(self):
        """Creates segments at fixed intervals."""
        ...
    def test_sentence_boundary_alignment(self):
        """Segments split at sentence boundaries, not mid-word."""
        ...
    def test_configurable_window_size(self):
        """Window size respects config.time_window_seconds."""
        ...

class TestStreamAlignment:
    def test_align_transcript_to_segments(self, sample_transcript, sample_chapters):
        """Transcript segments mapped to correct time windows."""
        ...
    def test_align_keyframes_to_segments(self):
        """Keyframes mapped to correct segments by timestamp."""
        ...
    def test_partial_overlap_handling(self):
        """Transcript segments partially overlapping window boundaries."""
        ...
    def test_empty_segment_handling(self):
        """Handle segments with no transcript (silence, music)."""
        ...

class TestContentMerging:
    def test_transcript_only_content(self):
        """Content is just transcript when no visual data."""
        ...
    def test_code_block_appended(self):
        """Code on screen is appended to transcript content."""
        ...
    def test_duplicate_code_not_repeated(self):
        """Code mentioned in transcript is not duplicated from OCR."""
        ...
    def test_chapter_title_as_heading(self):
        """Chapter title becomes markdown heading in content."""
        ...
    def test_slide_text_supplementary(self):
        """Slide text adds to content when not in transcript."""
        ...

class TestCategorization:
    def test_category_from_chapter_title(self):
        """Category inferred from chapter title keywords."""
        ...
    def test_category_from_transcript(self):
        """Category inferred from transcript content."""
        ...
    def test_custom_categories_from_config(self):
        """Custom category keywords from config used."""
        ...
```
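One plausible way to satisfy the `TestStreamAlignment` cases is to assign each transcript line to the segment it overlaps most, which handles partial overlaps and leaves silent segments empty. The sketch below assumes that strategy; `TranscriptLine`, `Segment`, and `align_transcript` are hypothetical names, and the real segmenter may resolve overlaps differently.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptLine:
    """Hypothetical transcript entry with start/end times in seconds."""
    start: float
    end: float
    text: str


@dataclass
class Segment:
    """Hypothetical time window that collects transcript lines."""
    start: float
    end: float
    lines: list[TranscriptLine] = field(default_factory=list)


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def align_transcript(lines: list[TranscriptLine], segments: list[Segment]) -> list[Segment]:
    """Attach each line to the segment with the largest time overlap."""
    for line in lines:
        best = None
        best_overlap = 0.0
        for segment in segments:
            amount = overlap(line.start, line.end, segment.start, segment.end)
            # Strict comparison: ties go to the earlier segment.
            if amount > best_overlap:
                best, best_overlap = segment, amount
        if best is not None:
            best.lines.append(line)
    return segments
```

A line that overlaps no segment is dropped here; whether to drop or attach it to the nearest segment is a design choice this sketch does not settle.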
---
## Integration Tests
### test_video_integration.py
```python
"""Integration tests for video pipeline end-to-end."""
from unittest.mock import patch

import pytest

# SourceDetector is imported from the project's source detection module.


class TestSourceDetectorVideo:
    def test_detect_youtube_video(self):
        info = SourceDetector.detect("https://youtube.com/watch?v=abc123def45")
        assert info.type == "video"
        assert info.parsed["video_source"] == "youtube_video"

    def test_detect_youtube_short_url(self):
        info = SourceDetector.detect("https://youtu.be/abc123def45")
        assert info.type == "video"

    def test_detect_youtube_playlist(self):
        info = SourceDetector.detect("https://youtube.com/playlist?list=PLxxx")
        assert info.type == "video"
        assert info.parsed["video_source"] == "youtube_playlist"

    def test_detect_youtube_channel(self):
        info = SourceDetector.detect("https://youtube.com/@reactofficial")
        assert info.type == "video"
        assert info.parsed["video_source"] == "youtube_channel"

    def test_detect_vimeo(self):
        info = SourceDetector.detect("https://vimeo.com/123456789")
        assert info.type == "video"
        assert info.parsed["video_source"] == "vimeo"

    def test_detect_mp4_file(self, tmp_path):
        f = tmp_path / "tutorial.mp4"
        f.touch()
        info = SourceDetector.detect(str(f))
        assert info.type == "video"
        assert info.parsed["video_source"] == "local_file"

    def test_detect_video_directory(self, tmp_path):
        d = tmp_path / "videos"
        d.mkdir()
        (d / "vid1.mp4").touch()
        (d / "vid2.mkv").touch()
        info = SourceDetector.detect(str(d))
        assert info.type == "video"

    def test_youtube_not_confused_with_web(self):
        """YouTube URLs detected as video, not web."""
        info = SourceDetector.detect("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
        assert info.type == "video"
        assert info.type != "web"


class TestUnifiedConfigVideo:
    def test_video_source_in_config(self, tmp_path):
        """Video source parsed correctly from unified config."""
        ...

    def test_multiple_video_sources(self, tmp_path):
        """Multiple video sources in same config."""
        ...

    def test_video_alongside_docs(self, tmp_path):
        """Video source alongside documentation source."""
        ...


class TestFullPipeline:
    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
    @patch('skill_seekers.cli.video_scraper.YoutubeDL')
    def test_single_video_transcript_only(
        self, mock_ytdl, mock_transcript, sample_ytdlp_metadata,
        sample_transcript, video_output_dir
    ):
        """Full pipeline: single YouTube video, transcript only."""
        mock_ytdl.return_value.__enter__.return_value.extract_info.return_value = sample_ytdlp_metadata
        mock_transcript.list_transcripts.return_value = ...
        # Run pipeline
        # Assert output files exist and content is correct
        ...

    @pytest.mark.slow
    @patch('skill_seekers.cli.video_visual.easyocr.Reader')
    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
    @patch('skill_seekers.cli.video_scraper.YoutubeDL')
    def test_single_video_with_visual(
        self, mock_ytdl, mock_transcript, mock_ocr,
        sample_ytdlp_metadata, video_output_dir
    ):
        """Full pipeline: single video with visual extraction."""
        ...
```
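The source-detection cases above suggest a small URL and path classifier. The following is a rough sketch of one, assuming detection by hostname, path, and file extension only; `classify_video_source` and `VIDEO_EXTENSIONS` are hypothetical and do not reflect how `SourceDetector.detect` is actually written (for instance, this sketch never checks that a local file exists).

```python
from pathlib import Path
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed set of extensions; the project's real list may differ.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".webm", ".mov", ".avi"}


def classify_video_source(source: str) -> Optional[str]:
    """Return a video source label, or None if the input is not a video."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower().removeprefix("www.")
        query = parse_qs(parsed.query)
        if host == "youtu.be":
            return "youtube_video"
        if host in ("youtube.com", "m.youtube.com"):
            if parsed.path == "/playlist" and "list" in query:
                return "youtube_playlist"
            if parsed.path.startswith("/@"):
                return "youtube_channel"
            if parsed.path == "/watch" and "v" in query:
                return "youtube_video"
            return None
        if host == "vimeo.com":
            return "vimeo"
        return None
    # Not an HTTP(S) URL: treat it as a filesystem path and look at the suffix.
    if Path(source).suffix.lower() in VIDEO_EXTENSIONS:
        return "local_file"
    return None
```

Checking the parsed hostname rather than searching the raw string for "youtube" avoids misclassifying an ordinary documentation URL that merely contains that word.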
---
## CI Considerations
### What Runs in CI (Default)
- All unit tests (mocked, no network, no GPU)
- Integration tests with mocked external services
- Source detection tests (pure logic)
- Data model tests (pure logic)
### What Doesn't Run in CI (Marked)
```python
@pytest.mark.slow # Whisper model loading, actual OCR
@pytest.mark.integration # Real YouTube API calls
@pytest.mark.e2e # Full pipeline with real video download
```
### CI Test Matrix Compatibility
| Test | Ubuntu | macOS | Python 3.10 | Python 3.12 | GPU |
|------|--------|-------|-------------|-------------|-----|
| Unit tests | Yes | Yes | Yes | Yes | No |
| Integration (mocked) | Yes | Yes | Yes | Yes | No |
| Whisper tests (mocked) | Yes | Yes | Yes | Yes | No |
| OCR tests (mocked) | Yes | Yes | Yes | Yes | No |
| E2E (real download) | Skip | Skip | Skip | Skip | No |
### Dependency Handling in Tests
```python
# At top of visual test files:
pytest.importorskip("cv2", reason="opencv-python-headless required for visual tests")
pytest.importorskip("easyocr", reason="easyocr required for OCR tests")
# At top of whisper test files:
pytest.importorskip("faster_whisper", reason="faster-whisper required for transcription tests")
```
---
## Performance Tests
```python
@pytest.mark.benchmark
class TestVideoPerformance:
    def test_transcript_parsing_speed(self, sample_transcript):
        """Transcript parsing completes in < 10ms for 1000 segments."""
        ...
    def test_segment_alignment_speed(self):
        """Segment alignment completes in < 50ms for 100 segments."""
        ...
    def test_frame_classification_speed(self, tmp_path):
        """Frame classification completes in < 20ms per frame."""
        ...
    def test_content_merging_speed(self):
        """Content merging completes in < 5ms per segment."""
        ...
    def test_output_generation_speed(self, video_output_dir):
        """Output generation (5 videos, 50 segments) in < 1 second."""
        ...
```


@@ -0,0 +1,515 @@
# Video Source — Dependencies & System Requirements
**Date:** February 27, 2026
**Document:** 07 of 07
**Status:** Planning
> **Status: IMPLEMENTED** — `skill-seekers video --setup` (see `video_setup.py`, 835 lines, 60 tests)
> - GPU auto-detection: NVIDIA (nvidia-smi/CUDA), AMD (rocminfo/ROCm), CPU fallback
> - Correct PyTorch index URL selection per GPU vendor
> - EasyOCR removed from pip extras, installed at runtime via --setup
> - ROCm configuration (MIOPEN_FIND_MODE, HSA_OVERRIDE_GFX_VERSION)
> - Virtual environment detection with --force override
> - System dependency checks (tesseract, ffmpeg)
> - Non-interactive mode for MCP/CI usage
---
## Table of Contents
1. [Dependency Tiers](#dependency-tiers)
2. [pyproject.toml Changes](#pyprojecttoml-changes)
3. [System Requirements](#system-requirements)
4. [Import Guards](#import-guards)
5. [Dependency Check Command](#dependency-check-command)
6. [Model Management](#model-management)
7. [Docker Considerations](#docker-considerations)
---
## Dependency Tiers
Video processing has two tiers to keep the base install lightweight:
### Tier 1: `[video]` — Lightweight (YouTube transcripts + metadata)
**Use case:** YouTube videos with existing captions. No download, no GPU needed.
| Package | Version | Size | Purpose |
|---------|---------|------|---------|
| `yt-dlp` | `>=2024.12.0` | ~15MB | Metadata extraction, audio download |
| `youtube-transcript-api` | `>=1.2.0` | ~50KB | YouTube caption extraction |
**Capabilities:**
- YouTube metadata (title, chapters, tags, description, engagement)
- YouTube captions (manual and auto-generated)
- Vimeo metadata
- Playlist and channel resolution
- Subtitle file parsing (SRT, VTT)
- Segmentation and alignment
- Full output generation
**NOT included:**
- Speech-to-text (Whisper)
- Visual extraction (frame + OCR)
- Local video file transcription (without subtitles)
### Tier 2: `[video-full]` — Full (adds Whisper + visual extraction)
**Use case:** Local videos without subtitles, or when you want code/slide extraction from screen.
| Package | Version | Size | Purpose |
|---------|---------|------|---------|
| `yt-dlp` | `>=2024.12.0` | ~15MB | Metadata + audio download |
| `youtube-transcript-api` | `>=1.2.0` | ~50KB | YouTube captions |
| `faster-whisper` | `>=1.0.0` | ~5MB (+ models: 75MB-3GB) | Speech-to-text |
| `scenedetect[opencv]` | `>=0.6.4` | ~50MB (includes OpenCV) | Scene boundary detection |
| `easyocr` | `>=1.7.0` | ~150MB (+ models: ~200MB) | Text recognition from frames |
| `opencv-python-headless` | `>=4.9.0` | ~50MB | Frame extraction, image processing |
**Additional capabilities over Tier 1:**
- Whisper speech-to-text (99 languages, word-level timestamps)
- Scene detection (find visual transitions)
- Keyframe extraction (save important frames)
- Frame classification (code/slide/terminal/diagram)
- OCR on frames (extract code and text from screen)
- Code block detection from video
**Total install size:**
- Tier 1: ~15MB
- Tier 2: ~270MB + models (~300MB-3.2GB depending on Whisper model)
---
## pyproject.toml Changes
```toml
[project.optional-dependencies]
# Existing dependencies...
gemini = ["google-generativeai>=0.8.0"]
openai = ["openai>=1.0.0"]
all-llms = ["google-generativeai>=0.8.0", "openai>=1.0.0"]
# NEW: Video processing
video = [
"yt-dlp>=2024.12.0",
"youtube-transcript-api>=1.2.0",
]
video-full = [
"yt-dlp>=2024.12.0",
"youtube-transcript-api>=1.2.0",
"faster-whisper>=1.0.0",
"scenedetect[opencv]>=0.6.4",
"easyocr>=1.7.0",
"opencv-python-headless>=4.9.0",
]
# Update 'all' to include video
all = [
# ... existing all dependencies ...
"yt-dlp>=2024.12.0",
"youtube-transcript-api>=1.2.0",
"faster-whisper>=1.0.0",
"scenedetect[opencv]>=0.6.4",
"easyocr>=1.7.0",
"opencv-python-headless>=4.9.0",
]
[project.scripts]
# ... existing entry points ...
skill-seekers-video = "skill_seekers.cli.video_scraper:main" # NEW
```
### Installation Commands
```bash
# Lightweight video (YouTube transcripts + metadata)
pip install skill-seekers[video]
# Full video (+ Whisper + visual extraction)
pip install skill-seekers[video-full]
# Everything
pip install skill-seekers[all]
# Development (editable)
pip install -e ".[video]"
pip install -e ".[video-full]"
```
---
## System Requirements
### Tier 1 (Lightweight)
| Requirement | Needed For | How to Check |
|-------------|-----------|-------------|
| Python 3.10+ | All | `python --version` |
| Internet connection | YouTube API calls | N/A |
No additional system dependencies. Pure Python.
### Tier 2 (Full)
| Requirement | Needed For | How to Check | Install |
|-------------|-----------|-------------|---------|
| Python 3.10+ | All | `python --version` | — |
| FFmpeg | Audio extraction, video processing | `ffmpeg -version` | See below |
| GPU (optional) | Whisper + easyocr acceleration | `nvidia-smi` (NVIDIA) | CUDA toolkit |
### FFmpeg Installation
FFmpeg is required for:
- Extracting audio from video files (Whisper input)
- Downloading audio-only streams (yt-dlp post-processing)
- Converting between audio formats
```bash
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
# Windows (winget)
winget install ffmpeg
# Windows (choco)
choco install ffmpeg
# Verify
ffmpeg -version
```
### GPU Support (Optional)
GPU accelerates Whisper (~4x) and easyocr (~5x) but is not required.
**NVIDIA GPU (CUDA):**
```bash
# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"
# faster-whisper uses CTranslate2 which auto-detects CUDA
# easyocr uses PyTorch which auto-detects CUDA
# No additional setup needed if PyTorch CUDA is working
```
**Apple Silicon (MPS):**
```bash
# faster-whisper does not support MPS directly
# Falls back to CPU on Apple Silicon
# easyocr has partial MPS support
```
**CPU-only (no GPU):**
```bash
# Everything works on CPU, just slower
# Whisper base model: ~4x slower on CPU vs GPU
# easyocr: ~5x slower on CPU vs GPU
# For short videos (<10 min), CPU is fine
```
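The implementation note at the top of this document says `--setup` detects the GPU vendor through `nvidia-smi` or `rocminfo` and then picks a matching PyTorch index URL. A minimal sketch of that idea follows. The function names are hypothetical, and the specific CUDA and ROCm versions in the index URLs are assumptions chosen for illustration; `video_setup.py` may select different ones.

```python
import shutil
from typing import Callable, Optional

# Assumed index URLs; the exact CUDA/ROCm versions are illustrative only.
PYTORCH_INDEX_URLS = {
    "nvidia": "https://download.pytorch.org/whl/cu121",
    "amd": "https://download.pytorch.org/whl/rocm6.0",
    "cpu": "https://download.pytorch.org/whl/cpu",
}


def detect_gpu_vendor(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Guess the GPU vendor from the vendor tools found on PATH."""
    if which("nvidia-smi"):
        return "nvidia"
    if which("rocminfo"):
        return "amd"
    return "cpu"


def pytorch_install_command(vendor: str) -> list[str]:
    """Build a pip command that installs torch from the vendor's index."""
    return ["pip", "install", "torch", "--index-url", PYTORCH_INDEX_URLS[vendor]]
```

Finding the tool on `PATH` only shows that a driver package is installed, not that a usable GPU is present, so a fuller check would also run the tool and inspect its output. The `which` parameter exists so the lookup can be replaced in tests.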
---
## Import Guards
All video dependencies use try/except import guards to provide clear error messages:
### video_scraper.py
```python
"""Video scraper - main orchestrator."""
# Core dependencies (always available)
import json
import logging
import os
from pathlib import Path

# Tier 1: Video basics
try:
    from yt_dlp import YoutubeDL
    HAS_YTDLP = True
except ImportError:
    HAS_YTDLP = False

try:
    from youtube_transcript_api import YouTubeTranscriptApi
    HAS_YT_TRANSCRIPT = True
except ImportError:
    HAS_YT_TRANSCRIPT = False


# Feature availability check
def check_video_dependencies(require_full: bool = False) -> None:
    """Check that video dependencies are installed.

    Args:
        require_full: If True, check for full dependencies (Whisper, OCR)

    Raises:
        ImportError: With installation instructions
    """
    missing = []
    if not HAS_YTDLP:
        missing.append("yt-dlp")
    if not HAS_YT_TRANSCRIPT:
        missing.append("youtube-transcript-api")
    if missing:
        raise ImportError(
            f"Video processing requires: {', '.join(missing)}\n"
            f"Install with: pip install skill-seekers[video]"
        )

    if require_full:
        full_missing = []
        try:
            import faster_whisper
        except ImportError:
            full_missing.append("faster-whisper")
        try:
            import cv2
        except ImportError:
            full_missing.append("opencv-python-headless")
        try:
            import scenedetect
        except ImportError:
            full_missing.append("scenedetect[opencv]")
        try:
            import easyocr
        except ImportError:
            full_missing.append("easyocr")
        if full_missing:
            raise ImportError(
                f"Visual extraction requires: {', '.join(full_missing)}\n"
                f"Install with: pip install skill-seekers[video-full]"
            )
```
### video_transcript.py
```python
"""Transcript extraction module."""
import logging

logger = logging.getLogger(__name__)

# YouTube transcript (Tier 1)
try:
    from youtube_transcript_api import YouTubeTranscriptApi
    HAS_YT_TRANSCRIPT = True
except ImportError:
    HAS_YT_TRANSCRIPT = False

# Whisper (Tier 2)
try:
    from faster_whisper import WhisperModel
    HAS_WHISPER = True
except ImportError:
    HAS_WHISPER = False


def get_transcript(video_info, config):
    """Get transcript using best available method."""
    # Try YouTube captions first (Tier 1)
    if HAS_YT_TRANSCRIPT and video_info.source_type == VideoSourceType.YOUTUBE:
        try:
            return extract_youtube_transcript(video_info.video_id, config.languages)
        except TranscriptNotAvailable:
            pass

    # Try Whisper fallback (Tier 2)
    if HAS_WHISPER:
        return transcribe_with_whisper(video_info, config)

    # No transcript possible: reaching this point means Whisper is not installed
    logger.warning(
        f"No transcript for {video_info.video_id}. "
        "Install faster-whisper for speech-to-text: "
        "pip install skill-seekers[video-full]"
    )
    return [], TranscriptSource.NONE
```
### video_visual.py
```python
"""Visual extraction module."""
try:
    import cv2
    HAS_OPENCV = True
except ImportError:
    HAS_OPENCV = False

try:
    from scenedetect import detect, ContentDetector
    HAS_SCENEDETECT = True
except ImportError:
    HAS_SCENEDETECT = False

try:
    import easyocr
    HAS_EASYOCR = True
except ImportError:
    HAS_EASYOCR = False


def check_visual_dependencies() -> None:
    """Check visual extraction dependencies."""
    missing = []
    if not HAS_OPENCV:
        missing.append("opencv-python-headless")
    if not HAS_SCENEDETECT:
        missing.append("scenedetect[opencv]")
    if not HAS_EASYOCR:
        missing.append("easyocr")
    if missing:
        raise ImportError(
            f"Visual extraction requires: {', '.join(missing)}\n"
            f"Install with: pip install skill-seekers[video-full]"
        )


def check_ffmpeg() -> bool:
    """Check if FFmpeg is available."""
    import shutil
    return shutil.which('ffmpeg') is not None
```
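The guards above import each package just to learn whether it is installed. As a possible alternative, `importlib.util.find_spec` answers the same question without executing the package, which matters for heavy modules such as OCR or Whisper backends. This is a sketch under that assumption; `missing_requirements` and `require_modules` are hypothetical helpers, not part of the codebase.

```python
from importlib.util import find_spec

# Importable module name -> pip requirement, copied from the Tier 2 table.
FULL_REQUIREMENTS = {
    "faster_whisper": "faster-whisper",
    "cv2": "opencv-python-headless",
    "scenedetect": "scenedetect[opencv]",
    "easyocr": "easyocr",
}


def missing_requirements(requirements: dict[str, str]) -> list[str]:
    """Return pip names for every top-level module that cannot be found."""
    return [
        pip_name
        for module, pip_name in requirements.items()
        if find_spec(module) is None
    ]


def require_modules(requirements: dict[str, str], extra: str) -> None:
    """Raise ImportError with an install hint if anything is missing."""
    missing = missing_requirements(requirements)
    if missing:
        raise ImportError(
            f"Missing packages: {', '.join(missing)}\n"
            f"Install with: pip install skill-seekers[{extra}]"
        )
```

A call such as `require_modules(FULL_REQUIREMENTS, extra="video-full")` would then replace the chain of nested `try`/`except` blocks, at the cost of not catching packages that are present but fail on import.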
---
## Dependency Check Command
Add a dependency check to the `config` command:
```bash
# Check all video dependencies
skill-seekers config --check-video
# Output:
# Video Dependencies:
# yt-dlp ✅ 2025.01.15
# youtube-transcript-api ✅ 1.2.3
# faster-whisper ❌ Not installed (pip install skill-seekers[video-full])
# opencv-python-headless ❌ Not installed
# scenedetect ❌ Not installed
# easyocr ❌ Not installed
#
# System Dependencies:
# FFmpeg ✅ 6.1.1
# GPU (CUDA) ❌ Not available (CPU mode will be used)
#
# Available modes:
# Transcript only ✅ YouTube captions available
# Whisper fallback ❌ Install faster-whisper
# Visual extraction ❌ Install video-full dependencies
```
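A report like the one above could be assembled from installed distribution metadata plus a `PATH` lookup for FFmpeg. The sketch below shows one way to do that with the standard library; `installed_version` and `format_report` are hypothetical names, and the layout only approximates the sample output.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def format_report(distributions: list[str]) -> str:
    """Render a plain-text dependency report for the given distributions."""
    lines = ["Video Dependencies:"]
    for name in distributions:
        found = installed_version(name)
        status = f"✅ {found}" if found else "❌ Not installed"
        lines.append(f"  {name:<24}{status}")
    lines.append("")
    lines.append("System Dependencies:")
    ffmpeg = "✅ found" if shutil.which("ffmpeg") else "❌ Not found"
    lines.append(f"  {'FFmpeg':<24}{ffmpeg}")
    return "\n".join(lines)
```

Reading versions from metadata avoids importing the packages themselves, so the check stays fast even when the full tier is installed.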
---
## Model Management
### Whisper Models
Whisper models are downloaded on first use and cached in the user's home directory.
| Model | Download Size | Disk Size | First-Use Download Time |
|-------|-------------|-----------|------------------------|
| tiny | 75 MB | 75 MB | ~15s |
| base | 142 MB | 142 MB | ~25s |
| small | 466 MB | 466 MB | ~60s |
| medium | 1.5 GB | 1.5 GB | ~3 min |
| large-v3 | 3.1 GB | 3.1 GB | ~5 min |
| large-v3-turbo | 1.6 GB | 1.6 GB | ~3 min |
**Cache location:** `~/.cache/huggingface/hub/` (CTranslate2 models)
**Pre-download command:**
```bash
# Pre-download a model before using it
python -c "from faster_whisper import WhisperModel; WhisperModel('base')"
```
### easyocr Models
easyocr models are also downloaded on first use.
| Language Pack | Download Size | Disk Size |
|-------------|-------------|-----------|
| English | ~100 MB | ~100 MB |
| + Additional language | ~50-100 MB each | ~50-100 MB each |
**Cache location:** `~/.EasyOCR/model/`
**Pre-download command:**
```bash
# Pre-download English OCR model
python -c "import easyocr; easyocr.Reader(['en'])"
```
---
## Docker Considerations
### Dockerfile additions for video support
```dockerfile
# Tier 1 (lightweight)
RUN pip install skill-seekers[video]
# Tier 2 (full)
RUN apt-get update && apt-get install -y ffmpeg
RUN pip install skill-seekers[video-full]
# Pre-download Whisper model (avoids first-run download)
RUN python -c "from faster_whisper import WhisperModel; WhisperModel('base')"
# Pre-download easyocr model
RUN python -c "import easyocr; easyocr.Reader(['en'])"
```
### Docker image sizes
| Tier | Base Image Size | Additional Size | Total |
|------|----------------|----------------|-------|
| Tier 1 (video) | ~300 MB | ~20 MB | ~320 MB |
| Tier 2 (video-full, CPU) | ~300 MB | ~800 MB | ~1.1 GB |
| Tier 2 (video-full, GPU) | ~5 GB (CUDA base) | ~800 MB | ~5.8 GB |
### Kubernetes resource recommendations
```yaml
# Tier 1 (transcript only)
resources:
  requests:
    memory: "256Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "1000m"

# Tier 2 (full, CPU)
resources:
  requests:
    memory: "2Gi"
    cpu: "2000m"
  limits:
    memory: "4Gi"
    cpu: "4000m"

# Tier 2 (full, GPU)
resources:
  requests:
    memory: "4Gi"
    cpu: "2000m"
    nvidia.com/gpu: 1
  limits:
    memory: "8Gi"
    cpu: "4000m"
    nvidia.com/gpu: 1
```


@@ -32,6 +32,7 @@
- [unified](#unified) - Multi-source scraping
- [update](#update) - Incremental updates
- [upload](#upload) - Upload to platform
- [video](#video) - Video extraction & setup
- [workflows](#workflows) - Manage workflow presets
- [Common Workflows](#common-workflows)
- [Exit Codes](#exit-codes)
@@ -1035,6 +1036,44 @@ skill-seekers upload output/react-weaviate.zip --target weaviate \
---
### video
Extract skills from video tutorials (YouTube, Vimeo, or local files).
### Usage
```bash
# Setup (first time — auto-detects GPU, installs PyTorch + visual deps)
skill-seekers video --setup
# Extract from YouTube
skill-seekers video --url https://www.youtube.com/watch?v=VIDEO_ID --name my-skill
# With visual frame extraction (requires --setup first)
skill-seekers video --url VIDEO_URL --name my-skill --visual
# Local video file
skill-seekers video --video-file /path/to/video.mp4 --name my-skill
```
### Key Flags
| Flag | Description |
|------|-------------|
| `--setup` | Auto-detect GPU and install visual extraction dependencies |
| `--url URL` | Video URL (YouTube, Vimeo) |
| `--video-file PATH` | Local video file path |
| `--name NAME` | Skill name for output |
| `--visual` | Enable visual frame extraction (OCR on keyframes) |
| `--vision-ocr` | Use Claude Vision API as OCR fallback for low-confidence frames (requires `ANTHROPIC_API_KEY`) |
### Notes
- `--setup` detects NVIDIA (CUDA), AMD (ROCm), or CPU-only and installs the correct PyTorch variant
- Requires `pip install skill-seekers[video]` (transcripts) or `skill-seekers[video-full]` (+ whisper + scene detection)
- EasyOCR is NOT included in pip extras — it is installed by `--setup` with the correct GPU backend
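
The argument definitions added in this commit also register `--start-time` and `--end-time`, whose help text says they accept seconds, `MM:SS`, or `HH:MM:SS`. A parser for those three forms might look like the sketch below; `parse_timestamp` is a hypothetical helper, and the actual parsing in `video_scraper.py` is not shown in this diff.

```python
def parse_timestamp(value: str) -> float:
    """Convert 'SS', 'MM:SS', or 'HH:MM:SS' into a number of seconds."""
    parts = value.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"Invalid time value: {value!r}")
    seconds = 0.0
    for part in parts:
        # float() raises ValueError on empty or non-numeric components.
        number = float(part)
        if number < 0:
            raise ValueError(f"Negative time component in: {value!r}")
        # Each further component is one unit smaller, so scale the total by 60.
        seconds = seconds * 60 + number
    return seconds
```

For example, `parse_timestamp("1:30")` and `parse_timestamp("90")` both give 90 seconds, so either spelling can be passed on the command line.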
---
### workflows
Manage enhancement workflow presets.


@@ -52,7 +52,6 @@ dependencies = [
"anthropic>=0.76.0", # Required for AI enhancement (core feature)
"PyMuPDF>=1.24.14",
"Pillow>=11.0.0",
"pytesseract>=0.3.13",
"pydantic>=2.12.3",
"pydantic-settings>=2.11.0",
"python-dotenv>=1.1.1",
@@ -115,6 +114,24 @@ docx = [
"python-docx>=1.1.0",
]
# Video processing (lightweight: YouTube transcripts + metadata)
video = [
"yt-dlp>=2024.12.0",
"youtube-transcript-api>=1.2.0",
]
# Video processing (full: + Whisper + visual extraction)
# NOTE: easyocr removed — it pulls torch with the wrong GPU variant.
# Use: skill-seekers video --setup (auto-detects GPU, installs correct PyTorch + easyocr)
video-full = [
"yt-dlp>=2024.12.0",
"youtube-transcript-api>=1.2.0",
"faster-whisper>=1.0.0",
"scenedetect[opencv]>=0.6.4",
"opencv-python-headless>=4.9.0",
"pytesseract>=0.3.13",
]
# RAG vector database upload support
chroma = [
"chromadb>=0.4.0",
@@ -156,9 +173,13 @@ embedding = [
]
# All optional dependencies combined (dev dependencies now in [dependency-groups])
# Note: video-full deps (opencv, easyocr, faster-whisper) excluded due to heavy
# native dependencies. Install separately: pip install skill-seekers[video-full]
all = [
"mammoth>=1.6.0",
"python-docx>=1.1.0",
"yt-dlp>=2024.12.0",
"youtube-transcript-api>=1.2.0",
"mcp>=1.25,<2",
"httpx>=0.28.1",
"httpx-sse>=0.4.3",
@@ -201,6 +222,7 @@ skill-seekers-scrape = "skill_seekers.cli.doc_scraper:main"
skill-seekers-github = "skill_seekers.cli.github_scraper:main"
skill-seekers-pdf = "skill_seekers.cli.pdf_scraper:main"
skill-seekers-word = "skill_seekers.cli.word_scraper:main"
skill-seekers-video = "skill_seekers.cli.video_scraper:main"
skill-seekers-unified = "skill_seekers.cli.unified_scraper:main"
skill-seekers-enhance = "skill_seekers.cli.enhance_command:main"
skill-seekers-enhance-status = "skill_seekers.cli.enhance_status:main"


@@ -410,6 +410,111 @@ WORD_ARGUMENTS: dict[str, dict[str, Any]] = {
},
}
# Video specific (from video.py)
VIDEO_ARGUMENTS: dict[str, dict[str, Any]] = {
"video_url": {
"flags": ("--video-url",),
"kwargs": {
"type": str,
"help": "Video URL (YouTube, Vimeo)",
"metavar": "URL",
},
},
"video_file": {
"flags": ("--video-file",),
"kwargs": {
"type": str,
"help": "Local video file path",
"metavar": "PATH",
},
},
"video_playlist": {
"flags": ("--video-playlist",),
"kwargs": {
"type": str,
"help": "Playlist URL",
"metavar": "URL",
},
},
"video_languages": {
"flags": ("--video-languages",),
"kwargs": {
"type": str,
"default": "en",
"help": "Transcript language preference (comma-separated)",
"metavar": "LANGS",
},
},
"visual": {
"flags": ("--visual",),
"kwargs": {
"action": "store_true",
"help": "Enable visual extraction (requires video-full deps)",
},
},
"whisper_model": {
"flags": ("--whisper-model",),
"kwargs": {
"type": str,
"default": "base",
"help": "Whisper model size (default: base)",
"metavar": "MODEL",
},
},
"visual_interval": {
"flags": ("--visual-interval",),
"kwargs": {
"type": float,
"default": 0.7,
"help": "Visual scan interval in seconds (default: 0.7)",
"metavar": "SECS",
},
},
"visual_min_gap": {
"flags": ("--visual-min-gap",),
"kwargs": {
"type": float,
"default": 0.5,
"help": "Min gap between extracted frames in seconds (default: 0.5)",
"metavar": "SECS",
},
},
"visual_similarity": {
"flags": ("--visual-similarity",),
"kwargs": {
"type": float,
"default": 3.0,
"help": "Pixel-diff threshold for duplicate detection; lower = more frames (default: 3.0)",
"metavar": "THRESH",
},
},
"vision_ocr": {
"flags": ("--vision-ocr",),
"kwargs": {
"action": "store_true",
"help": "Use Claude Vision API as fallback for low-confidence code frames (requires ANTHROPIC_API_KEY, ~$0.004/frame)",
},
},
"start_time": {
"flags": ("--start-time",),
"kwargs": {
"type": str,
"default": None,
"metavar": "TIME",
"help": "Start time for extraction (seconds, MM:SS, or HH:MM:SS). Single video only.",
},
},
"end_time": {
"flags": ("--end-time",),
"kwargs": {
"type": str,
"default": None,
"metavar": "TIME",
"help": "End time for extraction (seconds, MM:SS, or HH:MM:SS). Single video only.",
},
},
}
# Multi-source config specific (from unified_scraper.py)
CONFIG_ARGUMENTS: dict[str, dict[str, Any]] = {
"merge_mode": {
@@ -493,6 +598,7 @@ def get_source_specific_arguments(source_type: str) -> dict[str, dict[str, Any]]
"local": LOCAL_ARGUMENTS,
"pdf": PDF_ARGUMENTS,
"word": WORD_ARGUMENTS,
"video": VIDEO_ARGUMENTS,
"config": CONFIG_ARGUMENTS,
}
return source_args.get(source_type, {})
@@ -530,6 +636,7 @@ def add_create_arguments(parser: argparse.ArgumentParser, mode: str = "default")
- 'local': Universal + local-specific
- 'pdf': Universal + pdf-specific
- 'word': Universal + word-specific
- 'video': Universal + video-specific
- 'advanced': Advanced/rare arguments
- 'all': All 120+ arguments
@@ -570,6 +677,10 @@ def add_create_arguments(parser: argparse.ArgumentParser, mode: str = "default")
for arg_name, arg_def in WORD_ARGUMENTS.items():
parser.add_argument(*arg_def["flags"], **arg_def["kwargs"])
if mode in ["video", "all"]:
for arg_name, arg_def in VIDEO_ARGUMENTS.items():
parser.add_argument(*arg_def["flags"], **arg_def["kwargs"])
if mode in ["config", "all"]:
for arg_name, arg_def in CONFIG_ARGUMENTS.items():
parser.add_argument(*arg_def["flags"], **arg_def["kwargs"])


@@ -0,0 +1,166 @@
"""Video command argument definitions.
This module defines ALL arguments for the video command in ONE place.
Both video_scraper.py (standalone) and parsers/video_parser.py (unified CLI)
import and use these definitions.
Shared arguments (name, description, output, enhance-level, api-key,
dry-run, verbose, quiet, workflow args) come from common.py / workflow.py
via ``add_all_standard_arguments()``.
"""
import argparse
from typing import Any
from .common import add_all_standard_arguments
# Video-specific argument definitions as data structure
# NOTE: Shared args (name, description, output, enhance_level, api_key, dry_run,
# verbose, quiet, workflow args) are registered by add_all_standard_arguments().
VIDEO_ARGUMENTS: dict[str, dict[str, Any]] = {
"url": {
"flags": ("--url",),
"kwargs": {
"type": str,
"help": "Video URL (YouTube, Vimeo)",
"metavar": "URL",
},
},
"video_file": {
"flags": ("--video-file",),
"kwargs": {
"type": str,
"help": "Local video file path",
"metavar": "PATH",
},
},
"playlist": {
"flags": ("--playlist",),
"kwargs": {
"type": str,
"help": "Playlist URL",
"metavar": "URL",
},
},
"languages": {
"flags": ("--languages",),
"kwargs": {
"type": str,
"default": "en",
"help": "Transcript language preference (comma-separated, default: en)",
"metavar": "LANGS",
},
},
"visual": {
"flags": ("--visual",),
"kwargs": {
"action": "store_true",
"help": "Enable visual extraction (requires video-full deps)",
},
},
"whisper_model": {
"flags": ("--whisper-model",),
"kwargs": {
"type": str,
"default": "base",
"help": "Whisper model size (default: base)",
"metavar": "MODEL",
},
},
"from_json": {
"flags": ("--from-json",),
"kwargs": {
"type": str,
"help": "Build skill from extracted JSON",
"metavar": "FILE",
},
},
"visual_interval": {
"flags": ("--visual-interval",),
"kwargs": {
"type": float,
"default": 0.7,
"help": "Visual scan interval in seconds (default: 0.7)",
"metavar": "SECS",
},
},
"visual_min_gap": {
"flags": ("--visual-min-gap",),
"kwargs": {
"type": float,
"default": 0.5,
"help": "Minimum gap between extracted frames in seconds (default: 0.5)",
"metavar": "SECS",
},
},
"visual_similarity": {
"flags": ("--visual-similarity",),
"kwargs": {
"type": float,
"default": 3.0,
"help": "Pixel-diff threshold for duplicate frame detection; lower = more frames kept (default: 3.0)",
"metavar": "THRESH",
},
},
"vision_ocr": {
"flags": ("--vision-ocr",),
"kwargs": {
"action": "store_true",
"help": "Use Claude Vision API as fallback for low-confidence code frames (requires ANTHROPIC_API_KEY, ~$0.004/frame)",
},
},
"start_time": {
"flags": ("--start-time",),
"kwargs": {
"type": str,
"default": None,
"metavar": "TIME",
"help": "Start time for extraction (seconds, MM:SS, or HH:MM:SS). Single video only.",
},
},
"end_time": {
"flags": ("--end-time",),
"kwargs": {
"type": str,
"default": None,
"metavar": "TIME",
"help": "End time for extraction (seconds, MM:SS, or HH:MM:SS). Single video only.",
},
},
"setup": {
"flags": ("--setup",),
"kwargs": {
"action": "store_true",
"help": "Auto-detect GPU and install visual extraction deps (PyTorch, easyocr, etc.)",
},
},
}
def add_video_arguments(parser: argparse.ArgumentParser) -> None:
    """Add all video command arguments to a parser.

    Registers shared args (name, description, output, enhance-level, api-key,
    dry-run, verbose, quiet, workflow args) via add_all_standard_arguments(),
    then adds video-specific args on top.

    The default for --enhance-level is overridden to 0 (disabled) for video.
    """
    # Shared universal args first
    add_all_standard_arguments(parser)

    # Override enhance-level default to 0 for video
    for action in parser._actions:
        if hasattr(action, "dest") and action.dest == "enhance_level":
            action.default = 0
            action.help = (
                "AI enhancement level (auto-detects API vs LOCAL mode): "
                "0=disabled (default for video), 1=SKILL.md only, 2=+architecture/config, 3=full enhancement. "
                "Mode selection: uses API if ANTHROPIC_API_KEY is set, otherwise LOCAL (Claude Code)"
            )

    # Video-specific args
    for arg_name, arg_def in VIDEO_ARGUMENTS.items():
        flags = arg_def["flags"]
        kwargs = arg_def["kwargs"]
        parser.add_argument(*flags, **kwargs)


@@ -27,7 +27,7 @@ class ConfigValidator:
"""
# Valid source types
VALID_SOURCE_TYPES = {"documentation", "github", "pdf", "local"}
VALID_SOURCE_TYPES = {"documentation", "github", "pdf", "local", "word", "video"}
# Valid merge modes
VALID_MERGE_MODES = {"rule-based", "claude-enhanced"}


@@ -134,6 +134,8 @@ class CreateCommand:
return self._route_pdf()
elif self.source_info.type == "word":
return self._route_word()
elif self.source_info.type == "video":
return self._route_video()
elif self.source_info.type == "config":
return self._route_config()
else:
@@ -349,6 +351,69 @@ class CreateCommand:
finally:
sys.argv = original_argv
    def _route_video(self) -> int:
        """Route to video scraper (video_scraper.py)."""
        from skill_seekers.cli import video_scraper

        # Reconstruct argv for video_scraper
        argv = ["video_scraper"]

        # Add video source (URL or file)
        parsed = self.source_info.parsed
        video_playlist = getattr(self.args, "video_playlist", None)
        if parsed.get("source_kind") == "file":
            argv.extend(["--video-file", parsed["file_path"]])
        elif video_playlist:
            # Explicit --video-playlist flag takes precedence
            argv.extend(["--playlist", video_playlist])
        elif parsed.get("url"):
            url = parsed["url"]
            # Detect playlist vs single video
            if "playlist" in url.lower():
                argv.extend(["--playlist", url])
            else:
                argv.extend(["--url", url])

        # Add universal arguments
        self._add_common_args(argv)

        # Add video-specific arguments
        video_langs = getattr(self.args, "video_languages", None) or getattr(
            self.args, "languages", None
        )
        if video_langs:
            argv.extend(["--languages", video_langs])
        if getattr(self.args, "visual", False):
            argv.append("--visual")
        if getattr(self.args, "vision_ocr", False):
            argv.append("--vision-ocr")
        if getattr(self.args, "whisper_model", None) and self.args.whisper_model != "base":
            argv.extend(["--whisper-model", self.args.whisper_model])
        vi = getattr(self.args, "visual_interval", None)
        if vi is not None and vi != 0.7:
            argv.extend(["--visual-interval", str(vi)])
        vmg = getattr(self.args, "visual_min_gap", None)
        if vmg is not None and vmg != 0.5:
            argv.extend(["--visual-min-gap", str(vmg)])
        vs = getattr(self.args, "visual_similarity", None)
        if vs is not None and vs != 3.0:
            argv.extend(["--visual-similarity", str(vs)])
        st = getattr(self.args, "start_time", None)
        if st is not None:
            argv.extend(["--start-time", str(st)])
        et = getattr(self.args, "end_time", None)
        if et is not None:
            argv.extend(["--end-time", str(et)])

        # Call video_scraper with modified argv
        logger.debug(f"Calling video_scraper with argv: {argv}")
        original_argv = sys.argv
        try:
            sys.argv = argv
            return video_scraper.main()
        finally:
            sys.argv = original_argv

def _route_config(self) -> int:
"""Route to unified scraper for config files (unified_scraper.py)."""
from skill_seekers.cli import unified_scraper
@@ -476,6 +541,8 @@ Examples:
Local: skill-seekers create ./my-project -p comprehensive
PDF: skill-seekers create tutorial.pdf --ocr
DOCX: skill-seekers create document.docx
Video: skill-seekers create https://youtube.com/watch?v=...
Video: skill-seekers create recording.mp4
Config: skill-seekers create configs/react.json
Source Auto-Detection:
@@ -484,6 +551,8 @@ Source Auto-Detection:
• ./path → local codebase
• file.pdf → PDF extraction
• file.docx → Word document extraction
• youtube.com/... → Video transcript extraction
• file.mp4 → Video file extraction
• file.json → multi-source config
Progressive Help (13 → 120+ flags):
@@ -491,6 +560,7 @@ Progressive Help (13 → 120+ flags):
--help-github GitHub repository options
--help-local Local codebase analysis
--help-pdf PDF extraction options
--help-video Video extraction options
--help-advanced Rare/advanced options
--help-all All options + compatibility
@@ -521,6 +591,9 @@ Common Workflows:
parser.add_argument(
"--help-word", action="store_true", help=argparse.SUPPRESS, dest="_help_word"
)
parser.add_argument(
"--help-video", action="store_true", help=argparse.SUPPRESS, dest="_help_video"
)
parser.add_argument(
"--help-config", action="store_true", help=argparse.SUPPRESS, dest="_help_config"
)
@@ -579,6 +652,15 @@ Common Workflows:
add_create_arguments(parser_word, mode="word")
parser_word.print_help()
return 0
elif args._help_video:
parser_video = argparse.ArgumentParser(
prog="skill-seekers create",
description="Create skill from video (YouTube, Vimeo, local files)",
formatter_class=argparse.RawDescriptionHelpFormatter,
)
add_create_arguments(parser_video, mode="video")
parser_video.print_help()
return 0
elif args._help_config:
parser_config = argparse.ArgumentParser(
prog="skill-seekers create",
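
`_route_video` in this file rebuilds an argument vector, swaps it into `sys.argv`, and restores the original in a `finally` block. The sketch below isolates that pattern as a context manager. `patched_argv`, `build_video_argv`, and `fake_main` are hypothetical names introduced here and are not part of the project; only the flags `--url`, `--visual`, and `--whisper-model` are taken from the diff.

```python
import sys
from contextlib import contextmanager


@contextmanager
def patched_argv(argv):
    """Temporarily replace sys.argv, restoring it even if the body raises."""
    original = sys.argv
    sys.argv = argv
    try:
        yield
    finally:
        sys.argv = original


def build_video_argv(url, whisper_model="base", visual=False):
    """Hypothetical helper: forward only the flags that differ from their defaults."""
    argv = ["video_scraper", "--url", url]
    if visual:
        argv.append("--visual")
    if whisper_model != "base":
        argv.extend(["--whisper-model", whisper_model])
    return argv


def fake_main():
    """Stand-in for video_scraper.main(); reports the arguments it would parse."""
    return sys.argv[1:]


before = sys.argv
with patched_argv(build_video_argv("https://example.com/video", whisper_model="small")):
    seen = fake_main()
after = sys.argv
```

Skipping flags that still hold their default value keeps the reconstructed command line short and leaves the downstream parser's own defaults in charge.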

@@ -97,9 +97,17 @@ class SkillEnhancer:
print(f"❌ Error calling Claude API: {e}")
return None
def _is_video_source(self, references):
"""Check if the references come from video tutorial extraction."""
return any(meta["source"] == "video_tutorial" for meta in references.values())
def _build_enhancement_prompt(self, references, current_skill_md):
"""Build the prompt for Claude with multi-source awareness"""
# Dispatch to video-specific prompt if video source detected
if self._is_video_source(references):
return self._build_video_enhancement_prompt(references, current_skill_md)
# Extract skill name and description
skill_name = self.skill_dir.name
@@ -276,6 +284,148 @@ Return ONLY the complete SKILL.md content, starting with the frontmatter (---).
return prompt
def _build_video_enhancement_prompt(self, references, current_skill_md):
"""Build a video-specific enhancement prompt.
Video tutorial references contain transcript text, OCR'd code panels,
code timelines with edits, and audio-visual alignment pairs. This prompt
is tailored to reconstruct clean code from noisy OCR, detect programming
languages from context, and synthesize a coherent tutorial skill.
"""
skill_name = self.skill_dir.name
prompt = f"""You are enhancing a Claude skill built from VIDEO TUTORIAL extraction. This skill is about: {skill_name}
The raw data was extracted from video tutorials using:
1. **Transcript** (speech-to-text) — HIGH quality, this is the primary signal
2. **OCR on code panels** — NOISY, may contain line numbers, UI chrome, garbled text
3. **Code Timeline** — Tracks code evolution across frames with diffs
4. **Audio-Visual Alignment** — Pairs of on-screen code + narrator explanation
CURRENT SKILL.MD:
{"```markdown" if current_skill_md else "(none - create from scratch)"}
{current_skill_md or "No existing SKILL.md"}
{"```" if current_skill_md else ""}
REFERENCE FILES:
"""
# Add all reference content
for filename, metadata in references.items():
content = metadata["content"]
if len(content) > 30000:
content = content[:30000] + "\n\n[Content truncated for size...]"
prompt += f"\n#### {filename}\n"
prompt += f"*Source: {metadata['source']}, Confidence: {metadata['confidence']}*\n\n"
prompt += f"```markdown\n{content}\n```\n"
prompt += """
VIDEO-SPECIFIC ENHANCEMENT INSTRUCTIONS:
You are working with data extracted from programming tutorial videos. The data has
specific characteristics you MUST handle:
## 1. OCR Code Reconstruction (CRITICAL)
The OCR'd code blocks are NOISY. Common issues you MUST fix:
- **Line numbers in code**: OCR captures line numbers (1, 2, 3...) as part of the code — STRIP THEM
- **UI chrome contamination**: Tab bars, file names, button text appear in code blocks — REMOVE
- **Garbled characters**: OCR errors like `l` → `1`, `O` → `0`, `rn` → `m` — FIX using context
- **Duplicate fragments**: Same code appears across multiple frames with minor OCR variations — DEDUPLICATE
- **Incomplete lines**: Lines cut off at panel edges — RECONSTRUCT from transcript context
- **Animation/timeline numbers**: Frame counters or timeline numbers in code — REMOVE
When reconstructing code:
- The TRANSCRIPT is the ground truth for WHAT the code does
- The OCR is the ground truth for HOW the code looks (syntax, structure)
- Combine both: use transcript to understand intent, OCR for actual code structure
- If OCR is too garbled, reconstruct the code based on what the narrator describes
## 2. Language Detection
The OCR-based language detection is often WRONG. Fix it by:
- Reading the transcript for language mentions ("in GDScript", "this Python function", "our C# class")
- Using code patterns: `extends`, `func`, `var`, `signal` = GDScript; `def`, `class`, `import` = Python;
`function`, `const`, `let` = JavaScript/TypeScript; `using`, `namespace` = C#
- Looking at file extensions mentioned in the transcript or visible in tab bars
- Using proper language tags in all code fences (```gdscript, ```python, etc.)
## 3. Code Timeline Processing
The "Code Timeline" section shows how code EVOLVES during the tutorial. Use it to:
- Show the FINAL version of each code block (not intermediate states)
- Optionally show key intermediate steps if the tutorial is about building up code progressively
- The edit diffs show exactly what changed between frames — use these to understand the tutorial flow
## 4. Audio-Visual Alignment
These are the MOST VALUABLE pairs: each links on-screen code with the narrator's explanation.
- Use these to create annotated code examples with inline comments
- The narrator text explains WHY each piece of code exists
- Cross-reference these pairs to build the "how-to" sections
## 5. Tutorial Structure
Transform the raw chronological data into a LOGICAL tutorial structure:
- Group by TOPIC, not by timestamp (e.g., "Setting Up the State Machine" not "Segment 3")
- Create clear section headers that describe what is being TAUGHT
- Build a progressive learning path: concepts build on each other
- Include prerequisite knowledge mentioned by the narrator
YOUR TASK — Create an enhanced SKILL.md:
1. **Clean Overview Section**
- What does this tutorial teach? (from transcript, NOT generic)
- Prerequisites mentioned by the narrator
- Key technologies/frameworks used (from actual code, not guesses)
2. **"When to Use This Skill" Section**
- Specific trigger conditions based on what the tutorial covers
- Use cases directly from the tutorial content
- Reference the framework/library/tool being taught
3. **Quick Reference Section** (MOST IMPORTANT)
- Extract 5-10 CLEAN, reconstructed code examples
- Each example must be:
a. Denoised (no line numbers, no UI chrome, no garbled text)
b. Complete (not cut off mid-line)
c. Properly language-tagged
d. Annotated with a description from the transcript
- Prefer code from Audio-Visual Alignment pairs (they have narrator context)
- Show the FINAL working version of each code block
4. **Step-by-Step Tutorial Section**
- Follow the tutorial's teaching flow
- Each step includes: clean code + explanation from transcript
- Use narrator's explanations as the descriptions (paraphrase, don't copy verbatim)
- Show code evolution where the tutorial builds up code incrementally
5. **Key Concepts Section**
- Extract terminology and concepts the narrator explains
- Define them using the narrator's own explanations
- Link concepts to specific code examples
6. **Reference Files Description**
- Explain what each reference file contains
- Note that OCR data is raw and may contain errors
- Point to the most useful sections (Audio-Visual Alignment, Code Timeline)
7. **Keep the frontmatter** (---\\nname: ...\\n---) intact if present
CRITICAL RULES:
- NEVER include raw OCR text with line numbers or UI chrome — always clean it first
- ALWAYS use correct language tags (detect from context, not from OCR metadata)
- The transcript is your BEST source for understanding content — trust it over garbled OCR
- Extract REAL code from the references, reconstruct where needed, but never invent code
- Keep code examples SHORT and focused (5-30 lines max per example)
- Make the skill actionable: someone reading it should be able to implement what the tutorial teaches
OUTPUT:
Return ONLY the complete SKILL.md content, starting with the frontmatter (---).
"""
return prompt
def save_enhanced_skill_md(self, content):
"""Save the enhanced SKILL.md"""
# Backup original
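
The enhancer changes above do two things before the video prompt text is assembled: they detect video-derived references by their `source` label, and they cap each reference at 30,000 characters. A minimal sketch of both steps follows, under the assumption that each reference is a dict with `source`, `confidence`, and `content` keys as in the diff. The function names, the `documentation` label, and the sample data are hypothetical.

```python
FENCE = "`" * 3              # built dynamically so this example can sit inside a fence
MAX_REFERENCE_CHARS = 30000  # same cut-off as the prompt builder in the diff


def is_video_source(references: dict) -> bool:
    """True if any reference was produced by the video scraper."""
    return any(meta.get("source") == "video_tutorial" for meta in references.values())


def render_reference(filename: str, meta: dict) -> str:
    """Render one reference file as a prompt section, truncating oversized content."""
    content = meta["content"]
    if len(content) > MAX_REFERENCE_CHARS:
        content = content[:MAX_REFERENCE_CHARS] + "\n\n[Content truncated for size...]"
    header = f"\n#### {filename}\n*Source: {meta['source']}, Confidence: {meta['confidence']}*\n\n"
    return f"{header}{FENCE}markdown\n{content}\n{FENCE}\n"


video_refs = {
    "video_intro.md": {"source": "video_tutorial", "confidence": "high", "content": "x" * 40000}
}
doc_refs = {"api.md": {"source": "documentation", "confidence": "high", "content": "short"}}
rendered = render_reference("video_intro.md", video_refs["video_intro.md"])
```

The sketch uses `meta.get("source")` so that a reference without a `source` key is treated as non-video. The diff indexes the key directly, which would raise `KeyError` in that case.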

@@ -12,6 +12,8 @@ Commands:
scrape Scrape documentation website
github Scrape GitHub repository
pdf Extract from PDF file
word Extract from Word (.docx) file
video Extract from video (YouTube or local)
unified Multi-source scraping (docs + GitHub + PDF)
analyze Analyze local codebase and extract code knowledge
enhance AI-powered enhancement (auto: API or LOCAL mode)
@@ -48,6 +50,7 @@ COMMAND_MODULES = {
"github": "skill_seekers.cli.github_scraper",
"pdf": "skill_seekers.cli.pdf_scraper",
"word": "skill_seekers.cli.word_scraper",
"video": "skill_seekers.cli.video_scraper",
"unified": "skill_seekers.cli.unified_scraper",
"enhance": "skill_seekers.cli.enhance_command",
"enhance-status": "skill_seekers.cli.enhance_status",
@@ -142,7 +145,6 @@ def _reconstruct_argv(command: str, args: argparse.Namespace) -> list[str]:
# Handle positional arguments (no -- prefix)
if key in [
"source", # create command
"url",
"directory",
"file",
"job_id",

@@ -13,6 +13,7 @@ from .scrape_parser import ScrapeParser
from .github_parser import GitHubParser
from .pdf_parser import PDFParser
from .word_parser import WordParser
from .video_parser import VideoParser
from .unified_parser import UnifiedParser
from .enhance_parser import EnhanceParser
from .enhance_status_parser import EnhanceStatusParser
@@ -43,6 +44,7 @@ PARSERS = [
EnhanceStatusParser(),
PDFParser(),
WordParser(),
VideoParser(),
UnifiedParser(),
EstimateParser(),
InstallParser(),

@@ -0,0 +1,32 @@
"""Video subcommand parser.
Uses shared argument definitions from arguments.video to ensure
consistency with the standalone video_scraper module.
"""
from .base import SubcommandParser
from skill_seekers.cli.arguments.video import add_video_arguments
class VideoParser(SubcommandParser):
"""Parser for video subcommand."""
@property
def name(self) -> str:
return "video"
@property
def help(self) -> str:
return "Extract from video (YouTube, local files)"
@property
def description(self) -> str:
return "Extract transcripts and metadata from videos and generate a skill"
def add_arguments(self, parser):
"""Add video-specific arguments.
Uses shared argument definitions to ensure consistency
with video_scraper.py (standalone scraper).
"""
add_video_arguments(parser)
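
`VideoParser` only declares a name, help text, and an argument hook; how the base class wires these into argparse is not shown in this diff. The sketch below is one plausible arrangement, not the project's actual `SubcommandParser`. The `register` method, the `DemoVideoParser` class, and the two demo flags are assumptions for illustration.

```python
import argparse
from abc import ABC, abstractmethod


class SubcommandParser(ABC):
    """Hypothetical minimal base; the project's real base class may differ."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def help(self) -> str: ...

    @abstractmethod
    def add_arguments(self, parser: argparse.ArgumentParser) -> None: ...

    def register(self, subparsers) -> argparse.ArgumentParser:
        """Create the subcommand and let the subclass attach its arguments."""
        sub = subparsers.add_parser(self.name, help=self.help)
        self.add_arguments(sub)
        return sub


class DemoVideoParser(SubcommandParser):
    @property
    def name(self) -> str:
        return "video"

    @property
    def help(self) -> str:
        return "Extract from video (YouTube, local files)"

    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
        # Demo flags only; the real set comes from the shared video argument definitions.
        parser.add_argument("--url")
        parser.add_argument("--visual", action="store_true")


root = argparse.ArgumentParser(prog="skill-seekers")
subparsers = root.add_subparsers(dest="command")
DemoVideoParser().register(subparsers)
args = root.parse_args(["video", "--url", "https://example.com/v", "--visual"])
```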

@@ -63,24 +63,34 @@ class SourceDetector:
if source.endswith(".docx"):
return cls._detect_word(source)
-# 2. Directory detection
+# Video file extensions
+VIDEO_EXTENSIONS = (".mp4", ".mkv", ".avi", ".mov", ".webm", ".flv", ".wmv")
+if source.lower().endswith(VIDEO_EXTENSIONS):
+return cls._detect_video_file(source)
+# 2. Video URL detection (before directory check)
+video_url_info = cls._detect_video_url(source)
+if video_url_info:
+return video_url_info
+# 3. Directory detection
if os.path.isdir(source):
return cls._detect_local(source)
-# 3. GitHub patterns
+# 4. GitHub patterns
github_info = cls._detect_github(source)
if github_info:
return github_info
-# 4. URL detection
+# 5. URL detection
if source.startswith("http://") or source.startswith("https://"):
return cls._detect_web(source)
-# 5. Domain inference (add https://)
+# 6. Domain inference (add https://)
if "." in source and not source.startswith("/"):
return cls._detect_web(f"https://{source}")
-# 6. Error - cannot determine
+# 7. Error - cannot determine
raise ValueError(
f"Cannot determine source type for: {source}\n\n"
"Examples:\n"
@@ -89,6 +99,8 @@ class SourceDetector:
" Local: skill-seekers create ./my-project\n"
" PDF: skill-seekers create tutorial.pdf\n"
" DOCX: skill-seekers create document.docx\n"
" Video: skill-seekers create https://youtube.com/watch?v=...\n"
" Video: skill-seekers create recording.mp4\n"
" Config: skill-seekers create configs/react.json"
)
@@ -116,6 +128,55 @@ class SourceDetector:
type="word", parsed={"file_path": source}, suggested_name=name, raw_input=source
)
@classmethod
def _detect_video_file(cls, source: str) -> SourceInfo:
"""Detect local video file source."""
name = os.path.splitext(os.path.basename(source))[0]
return SourceInfo(
type="video",
parsed={"file_path": source, "source_kind": "file"},
suggested_name=name,
raw_input=source,
)
@classmethod
def _detect_video_url(cls, source: str) -> SourceInfo | None:
"""Detect video platform URL (YouTube, Vimeo).
Returns SourceInfo if the source is a video URL, None otherwise.
"""
lower = source.lower()
# YouTube patterns
youtube_keywords = ["youtube.com/watch", "youtu.be/", "youtube.com/playlist",
"youtube.com/@", "youtube.com/channel/", "youtube.com/c/",
"youtube.com/shorts/", "youtube.com/embed/"]
if any(kw in lower for kw in youtube_keywords):
# Determine suggested name
if "playlist" in lower:
name = "youtube_playlist"
elif "/@" in lower or "/channel/" in lower or "/c/" in lower:
name = "youtube_channel"
else:
name = "youtube_video"
return SourceInfo(
type="video",
parsed={"url": source, "source_kind": "url"},
suggested_name=name,
raw_input=source,
)
# Vimeo patterns
if "vimeo.com/" in lower:
return SourceInfo(
type="video",
parsed={"url": source, "source_kind": "url"},
suggested_name="vimeo_video",
raw_input=source,
)
return None
@classmethod
def _detect_local(cls, source: str) -> SourceInfo:
"""Detect local directory source."""
@@ -209,6 +270,15 @@ class SourceDetector:
if not os.path.isfile(file_path):
raise ValueError(f"Path is not a file: {file_path}")
elif source_info.type == "video":
if source_info.parsed.get("source_kind") == "file":
file_path = source_info.parsed["file_path"]
if not os.path.exists(file_path):
raise ValueError(f"Video file does not exist: {file_path}")
if not os.path.isfile(file_path):
raise ValueError(f"Path is not a file: {file_path}")
# URL-based video sources are validated during processing
elif source_info.type == "config":
config_path = source_info.parsed["config_path"]
if not os.path.exists(config_path):
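
The ordering in the detector matters: video URLs are checked before the generic URL branch, so a YouTube or Vimeo link is not treated as a documentation site. The sketch below reduces the chain to a single hypothetical `classify` function with a subset of the keywords from the diff. The return labels are illustrative and are not the project's `SourceInfo` objects.

```python
VIDEO_EXTENSIONS = (".mp4", ".mkv", ".avi", ".mov", ".webm", ".flv", ".wmv")
# Subset of the URL keywords used in the diff, enough to show the ordering.
VIDEO_URL_KEYWORDS = ("youtube.com/watch", "youtu.be/", "youtube.com/playlist", "vimeo.com/")


def classify(source: str) -> str:
    """Return a coarse label for a source string, checking video cases first."""
    lower = source.lower()
    if lower.endswith(VIDEO_EXTENSIONS):           # local video file, case-insensitive
        return "video-file"
    if any(kw in lower for kw in VIDEO_URL_KEYWORDS):
        return "video-url"                         # must precede the generic URL check
    if lower.startswith(("http://", "https://")):
        return "web"
    return "unknown"


labels = [
    classify("Recording.MP4"),
    classify("https://youtu.be/dQw4w9WgXcQ"),
    classify("https://vimeo.com/12345"),
    classify("https://docs.python.org/3/"),
]
```

If the generic URL check ran first, the second and third inputs would be labelled `web`.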

@@ -74,11 +74,19 @@ class UnifiedScraper:
"github": [], # List of github sources
"pdf": [], # List of pdf sources
"word": [], # List of word sources
"video": [], # List of video sources
"local": [], # List of local sources (docs or code)
}
# Track source index for unique naming (multi-source support)
-self._source_counters = {"documentation": 0, "github": 0, "pdf": 0, "word": 0, "local": 0}
+self._source_counters = {
+"documentation": 0,
+"github": 0,
+"pdf": 0,
+"word": 0,
+"video": 0,
+"local": 0,
+}
# Output paths - cleaner organization
self.name = self.config["name"]
@@ -154,6 +162,8 @@ class UnifiedScraper:
self._scrape_pdf(source)
elif source_type == "word":
self._scrape_word(source)
elif source_type == "video":
self._scrape_video(source)
elif source_type == "local":
self._scrape_local(source)
else:
@@ -576,6 +586,66 @@ class UnifiedScraper:
logger.info(f"✅ Word: {len(word_data.get('pages', []))} sections extracted")
def _scrape_video(self, source: dict[str, Any]):
"""Scrape video source (YouTube, local file, etc.)."""
try:
from skill_seekers.cli.video_scraper import VideoToSkillConverter
except ImportError as e:
logger.error(
f"Video scraper dependencies not installed: {e}\n"
" Install with: pip install skill-seekers[video]\n"
" For visual extraction (frame analysis, OCR): pip install skill-seekers[video-full]"
)
return
# Multi-source support: Get unique index for this video source
idx = self._source_counters["video"]
self._source_counters["video"] += 1
# Determine video identifier
video_url = source.get("url", "")
video_id = video_url or source.get("path", f"video_{idx}")
# Create config for video scraper
video_config = {
"name": f"{self.name}_video_{idx}",
"url": source.get("url"),
"video_file": source.get("path"),
"playlist": source.get("playlist"),
"description": source.get("description", ""),
"languages": ",".join(source.get("languages", ["en"])),
"visual": source.get("visual_extraction", False),
"whisper_model": source.get("whisper_model", "base"),
}
# Process video
logger.info(f"Scraping video: {video_id}")
converter = VideoToSkillConverter(video_config)
try:
result = converter.process()
converter.save_extracted_data()
# Append to list
self.scraped_data["video"].append(
{
"video_id": video_id,
"idx": idx,
"data": result.to_dict(),
"data_file": converter.data_file,
}
)
# Build standalone SKILL.md for synthesis
converter.build_skill()
logger.info("✅ Video: Standalone SKILL.md created")
logger.info(
f"✅ Video: {len(result.videos)} videos, {result.total_segments} segments extracted"
)
except Exception as e:
logger.error(f"Failed to process video source: {e}")
def _scrape_local(self, source: dict[str, Any]):
"""
Scrape local directory (documentation files or source code).
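
`_scrape_video` translates one entry of the unified config into the flat dict passed to `VideoToSkillConverter`. The sketch below mirrors that mapping as a standalone function so the key names are easy to see. `build_video_config` and the example entry are hypothetical, and the `type` key in the example is an assumption about the config schema.

```python
from typing import Any


def build_video_config(skill_name: str, idx: int, source: dict[str, Any]) -> dict[str, Any]:
    """Map a unified-config video entry to the converter config used in the diff."""
    return {
        "name": f"{skill_name}_video_{idx}",
        "url": source.get("url"),
        "video_file": source.get("path"),
        "playlist": source.get("playlist"),
        "description": source.get("description", ""),
        # Expects a list of language codes; a bare string would be split per character.
        "languages": ",".join(source.get("languages", ["en"])),
        "visual": source.get("visual_extraction", False),
        "whisper_model": source.get("whisper_model", "base"),
    }


example_source = {
    "type": "video",  # assumed discriminator key
    "url": "https://example.com/watch",
    "languages": ["en", "de"],
    "visual_extraction": True,
}
config = build_video_config("react", 0, example_source)
```

Note the renames between the two layers: `path` becomes `video_file` and `visual_extraction` becomes `visual`.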

@@ -289,6 +289,10 @@ def read_reference_files(
else:
return "codebase_analysis", "medium", repo_id
# Video tutorial sources (video_*.md from video scraper)
elif relative_path.name.startswith("video_"):
return "video_tutorial", "high", None
# Conflicts report (discrepancy detection)
elif "conflicts" in path_str:
return "conflicts", "medium", None

@@ -0,0 +1,270 @@
"""Video metadata extraction module.
Uses yt-dlp for metadata extraction without downloading video content.
Supports YouTube, Vimeo, and local video files.
"""
import hashlib
import logging
import os
import re
from skill_seekers.cli.video_models import (
Chapter,
VideoInfo,
VideoSourceType,
)
logger = logging.getLogger(__name__)
# Optional dependency: yt-dlp
try:
import yt_dlp
HAS_YTDLP = True
except ImportError:
HAS_YTDLP = False
# =============================================================================
# Video ID Extraction
# =============================================================================
# YouTube URL patterns
YOUTUBE_PATTERNS = [
re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/watch\?v=([a-zA-Z0-9_-]{11})"),
re.compile(r"(?:https?://)?youtu\.be/([a-zA-Z0-9_-]{11})"),
re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/embed/([a-zA-Z0-9_-]{11})"),
re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/v/([a-zA-Z0-9_-]{11})"),
re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/shorts/([a-zA-Z0-9_-]{11})"),
]
YOUTUBE_PLAYLIST_PATTERN = re.compile(
r"(?:https?://)?(?:www\.)?youtube\.com/playlist\?list=([a-zA-Z0-9_-]+)"
)
YOUTUBE_CHANNEL_PATTERNS = [
re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/@([a-zA-Z0-9_-]+)"),
re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/channel/([a-zA-Z0-9_-]+)"),
re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/c/([a-zA-Z0-9_-]+)"),
]
VIMEO_PATTERN = re.compile(r"(?:https?://)?(?:www\.)?vimeo\.com/(\d+)")
def extract_video_id(url: str) -> str | None:
"""Extract YouTube video ID from various URL formats.
Args:
url: YouTube URL in any supported format.
Returns:
11-character video ID, or None if not a YouTube URL.
"""
for pattern in YOUTUBE_PATTERNS:
match = pattern.search(url)
if match:
return match.group(1)
return None
def detect_video_source_type(url_or_path: str) -> VideoSourceType:
"""Detect the source type of a video URL or file path.
Args:
url_or_path: URL or local file path.
Returns:
VideoSourceType enum value.
"""
if os.path.isfile(url_or_path):
return VideoSourceType.LOCAL_FILE
if os.path.isdir(url_or_path):
return VideoSourceType.LOCAL_DIRECTORY
url_lower = url_or_path.lower()
if "youtube.com" in url_lower or "youtu.be" in url_lower:
return VideoSourceType.YOUTUBE
if "vimeo.com" in url_lower:
return VideoSourceType.VIMEO
return VideoSourceType.LOCAL_FILE
# =============================================================================
# YouTube Metadata via yt-dlp
# =============================================================================
def _check_ytdlp():
"""Raise RuntimeError if yt-dlp is not installed."""
if not HAS_YTDLP:
raise RuntimeError(
"yt-dlp is required for video metadata extraction.\n"
'Install with: pip install "skill-seekers[video]"\n'
"Or: pip install yt-dlp"
)
def extract_youtube_metadata(url: str) -> VideoInfo:
"""Extract metadata from a YouTube video URL without downloading.
Args:
url: YouTube video URL.
Returns:
VideoInfo with metadata populated.
Raises:
RuntimeError: If yt-dlp is not installed.
"""
_check_ytdlp()
ydl_opts = {
"quiet": True,
"no_warnings": True,
"extract_flat": False,
"skip_download": True,
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
info = ydl.extract_info(url, download=False)
video_id = info.get("id", extract_video_id(url) or "unknown")
# Parse chapters
chapters = []
raw_chapters = info.get("chapters") or []
for i, ch in enumerate(raw_chapters):
end_time = ch.get("end_time", 0)
if i + 1 < len(raw_chapters):
end_time = raw_chapters[i + 1].get("start_time", end_time)
chapters.append(
Chapter(
title=ch.get("title", f"Chapter {i + 1}"),
start_time=ch.get("start_time", 0),
end_time=end_time,
)
)
return VideoInfo(
video_id=video_id,
source_type=VideoSourceType.YOUTUBE,
source_url=url,
title=info.get("title", ""),
description=info.get("description", ""),
duration=float(info.get("duration") or 0), # "duration" may be present but None
upload_date=info.get("upload_date"),
language=info.get("language") or "en",
channel_name=info.get("channel") or info.get("uploader"),
channel_url=info.get("channel_url") or info.get("uploader_url"),
view_count=info.get("view_count"),
like_count=info.get("like_count"),
comment_count=info.get("comment_count"),
tags=info.get("tags") or [],
categories=info.get("categories") or [],
thumbnail_url=info.get("thumbnail"),
chapters=chapters,
)
def extract_local_metadata(file_path: str) -> VideoInfo:
"""Extract basic metadata from a local video file.
Args:
file_path: Path to video file.
Returns:
VideoInfo with basic metadata from filename/file properties.
"""
path = os.path.abspath(file_path)
name = os.path.splitext(os.path.basename(path))[0]
video_id = hashlib.sha256(path.encode()).hexdigest()[:16]
return VideoInfo(
video_id=video_id,
source_type=VideoSourceType.LOCAL_FILE,
file_path=path,
title=name.replace("-", " ").replace("_", " ").title(),
duration=0.0, # Would need ffprobe for accurate duration
)
# =============================================================================
# Playlist / Channel Resolution
# =============================================================================
def resolve_playlist(url: str) -> list[str]:
"""Resolve a YouTube playlist URL to a list of video URLs.
Args:
url: YouTube playlist URL.
Returns:
List of video URLs in playlist order.
Raises:
RuntimeError: If yt-dlp is not installed.
"""
_check_ytdlp()
ydl_opts = {
"quiet": True,
"no_warnings": True,
"extract_flat": True,
"skip_download": True,
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
info = ydl.extract_info(url, download=False)
entries = info.get("entries") or []
video_urls = []
for entry in entries:
vid_url = entry.get("url") or entry.get("webpage_url")
if vid_url:
video_urls.append(vid_url)
elif entry.get("id"):
video_urls.append(f"https://www.youtube.com/watch?v={entry['id']}")
return video_urls
def resolve_channel(url: str, max_videos: int = 50) -> list[str]:
"""Resolve a YouTube channel URL to a list of recent video URLs.
Args:
url: YouTube channel URL.
max_videos: Maximum number of videos to resolve.
Returns:
List of video URLs (most recent first).
Raises:
RuntimeError: If yt-dlp is not installed.
"""
_check_ytdlp()
ydl_opts = {
"quiet": True,
"no_warnings": True,
"extract_flat": True,
"skip_download": True,
"playlistend": max_videos,
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
info = ydl.extract_info(url, download=False)
entries = info.get("entries") or []
video_urls = []
for entry in entries:
vid_url = entry.get("url") or entry.get("webpage_url")
if vid_url:
video_urls.append(vid_url)
elif entry.get("id"):
video_urls.append(f"https://www.youtube.com/watch?v={entry['id']}")
return video_urls[:max_videos]
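
Two small pieces of logic in this module can be exercised without yt-dlp: ID extraction from URLs, and the chapter pass that sets each chapter's end to the next chapter's start. The sketch below reimplements both on plain data. The patterns are copied from the diff (two of the five), while `video_id`, `normalise_chapters`, and the sample chapter list are hypothetical.

```python
import re
from typing import Optional

WATCH = re.compile(r"(?:https?://)?(?:www\.)?youtube\.com/watch\?v=([a-zA-Z0-9_-]{11})")
SHORT = re.compile(r"(?:https?://)?youtu\.be/([a-zA-Z0-9_-]{11})")


def video_id(url: str) -> Optional[str]:
    """Return the 11-character YouTube ID, or None if no pattern matches."""
    for pattern in (WATCH, SHORT):
        match = pattern.search(url)
        if match:
            return match.group(1)
    return None


def normalise_chapters(raw: list) -> list:
    """Return (title, start, end) tuples; each end snaps to the next chapter's start."""
    chapters = []
    for i, ch in enumerate(raw):
        end = ch.get("end_time", 0)
        if i + 1 < len(raw):
            end = raw[i + 1].get("start_time", end)
        chapters.append((ch.get("title", f"Chapter {i + 1}"), ch.get("start_time", 0), end))
    return chapters


ids = [
    video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=10"),
    video_id("https://youtu.be/dQw4w9WgXcQ"),
    video_id("https://vimeo.com/12345"),
]
chapters = normalise_chapters(
    [{"title": "Intro", "start_time": 0, "end_time": 58}, {"start_time": 60, "end_time": 120}]
)
```

Snapping each end time to the next start closes small gaps between chapters, so every timestamp falls inside exactly one chapter.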

@@ -0,0 +1,848 @@
"""Video source data models and type definitions.
Defines all enumerations and dataclasses for the video extraction pipeline:
- Enums: VideoSourceType, TranscriptSource, FrameType, CodeContext, SegmentContentType,
SegmentationStrategy
- Core: VideoInfo, VideoSegment, VideoScraperResult
- Supporting: Chapter, TranscriptSegment, WordTimestamp, KeyFrame, OCRRegion,
FrameSubSection, CodeBlock
- Config: VideoSourceConfig
"""
from __future__ import annotations
from dataclasses import dataclass, field
from enum import Enum
from typing import Any
# =============================================================================
# Enumerations
# =============================================================================
class VideoSourceType(Enum):
"""Where a video came from."""
YOUTUBE = "youtube"
VIMEO = "vimeo"
LOCAL_FILE = "local_file"
LOCAL_DIRECTORY = "local_directory"
class TranscriptSource(Enum):
"""How the transcript was obtained."""
YOUTUBE_MANUAL = "youtube_manual"
YOUTUBE_AUTO = "youtube_auto_generated"
WHISPER = "whisper"
SUBTITLE_FILE = "subtitle_file"
NONE = "none"
class FrameType(Enum):
"""Classification of a keyframe's visual content."""
CODE_EDITOR = "code_editor"
TERMINAL = "terminal"
SLIDE = "slide"
DIAGRAM = "diagram"
BROWSER = "browser"
WEBCAM = "webcam"
SCREENCAST = "screencast"
OTHER = "other"
class CodeContext(Enum):
"""Where code was displayed in the video."""
EDITOR = "editor"
TERMINAL = "terminal"
SLIDE = "slide"
BROWSER = "browser"
UNKNOWN = "unknown"
class SegmentContentType(Enum):
"""Primary content type of a video segment."""
EXPLANATION = "explanation"
LIVE_CODING = "live_coding"
DEMO = "demo"
SLIDES = "slides"
Q_AND_A = "q_and_a"
INTRO = "intro"
OUTRO = "outro"
MIXED = "mixed"
class SegmentationStrategy(Enum):
"""How segments are determined."""
CHAPTERS = "chapters"
TIME_WINDOW = "time_window"
SCENE_CHANGE = "scene_change"
HYBRID = "hybrid"
# =============================================================================
# Supporting Data Classes
# =============================================================================
@dataclass(frozen=True)
class Chapter:
"""A chapter marker from a video (typically YouTube)."""
title: str
start_time: float
end_time: float
@property
def duration(self) -> float:
return self.end_time - self.start_time
def to_dict(self) -> dict:
return {
"title": self.title,
"start_time": self.start_time,
"end_time": self.end_time,
}
@classmethod
def from_dict(cls, data: dict) -> Chapter:
return cls(
title=data["title"],
start_time=data["start_time"],
end_time=data["end_time"],
)
@dataclass(frozen=True)
class WordTimestamp:
"""A single word with precise timing information."""
word: str
start: float
end: float
probability: float = 1.0
def to_dict(self) -> dict:
return {
"word": self.word,
"start": self.start,
"end": self.end,
"probability": self.probability,
}
@classmethod
def from_dict(cls, data: dict) -> WordTimestamp:
return cls(
word=data["word"],
start=data["start"],
end=data["end"],
probability=data.get("probability", 1.0),
)
@dataclass(frozen=True)
class TranscriptSegment:
"""A raw transcript segment from YouTube API or Whisper."""
text: str
start: float
end: float
confidence: float = 1.0
words: list[WordTimestamp] | None = None
source: TranscriptSource = TranscriptSource.NONE
def to_dict(self) -> dict:
return {
"text": self.text,
"start": self.start,
"end": self.end,
"confidence": self.confidence,
"words": [w.to_dict() for w in self.words] if self.words else None,
"source": self.source.value,
}
@classmethod
def from_dict(cls, data: dict) -> TranscriptSegment:
words = None
if data.get("words"):
words = [WordTimestamp.from_dict(w) for w in data["words"]]
return cls(
text=data["text"],
start=data["start"],
end=data["end"],
confidence=data.get("confidence", 1.0),
words=words,
source=TranscriptSource(data.get("source", "none")),
)
@dataclass(frozen=True)
class OCRRegion:
"""A detected text region in a video frame."""
text: str
confidence: float
bbox: tuple[int, int, int, int]
is_monospace: bool = False
def to_dict(self) -> dict:
return {
"text": self.text,
"confidence": self.confidence,
"bbox": list(self.bbox),
"is_monospace": self.is_monospace,
}
@classmethod
def from_dict(cls, data: dict) -> OCRRegion:
return cls(
text=data["text"],
confidence=data["confidence"],
bbox=tuple(data["bbox"]),
is_monospace=data.get("is_monospace", False),
)
@dataclass
class FrameSubSection:
"""A single panel/region within a video frame, OCR'd independently.
Each IDE panel (e.g. code editor, terminal, file tree) is detected
as a separate sub-section so that side-by-side editors produce
independent OCR results instead of being merged into one blob.
"""
bbox: tuple[int, int, int, int] # (x1, y1, x2, y2)
frame_type: FrameType = FrameType.OTHER
ocr_text: str = ""
ocr_regions: list[OCRRegion] = field(default_factory=list)
ocr_confidence: float = 0.0
panel_id: str = "" # e.g. "panel_0_0" (row_col)
_vision_used: bool = False # Whether Vision API was used for OCR
def to_dict(self) -> dict:
return {
"bbox": list(self.bbox),
"frame_type": self.frame_type.value,
"ocr_text": self.ocr_text,
"ocr_regions": [r.to_dict() for r in self.ocr_regions],
"ocr_confidence": self.ocr_confidence,
"panel_id": self.panel_id,
}
@classmethod
def from_dict(cls, data: dict) -> FrameSubSection:
return cls(
bbox=tuple(data["bbox"]),
frame_type=FrameType(data.get("frame_type", "other")),
ocr_text=data.get("ocr_text", ""),
ocr_regions=[OCRRegion.from_dict(r) for r in data.get("ocr_regions", [])],
ocr_confidence=data.get("ocr_confidence", 0.0),
panel_id=data.get("panel_id", ""),
)
@dataclass
class KeyFrame:
"""An extracted video frame with visual analysis results."""
timestamp: float
image_path: str
frame_type: FrameType = FrameType.OTHER
scene_change_score: float = 0.0
ocr_regions: list[OCRRegion] = field(default_factory=list)
ocr_text: str = ""
ocr_confidence: float = 0.0
width: int = 0
height: int = 0
sub_sections: list[FrameSubSection] = field(default_factory=list)
def to_dict(self) -> dict:
return {
"timestamp": self.timestamp,
"image_path": self.image_path,
"frame_type": self.frame_type.value,
"scene_change_score": self.scene_change_score,
"ocr_regions": [r.to_dict() for r in self.ocr_regions],
"ocr_text": self.ocr_text,
"ocr_confidence": self.ocr_confidence,
"width": self.width,
"height": self.height,
"sub_sections": [ss.to_dict() for ss in self.sub_sections],
}
@classmethod
def from_dict(cls, data: dict) -> KeyFrame:
return cls(
timestamp=data["timestamp"],
image_path=data["image_path"],
frame_type=FrameType(data.get("frame_type", "other")),
scene_change_score=data.get("scene_change_score", 0.0),
ocr_regions=[OCRRegion.from_dict(r) for r in data.get("ocr_regions", [])],
ocr_text=data.get("ocr_text", ""),
ocr_confidence=data.get("ocr_confidence", 0.0),
width=data.get("width", 0),
height=data.get("height", 0),
sub_sections=[FrameSubSection.from_dict(ss) for ss in data.get("sub_sections", [])],
)
@dataclass
class CodeBlock:
"""A code block detected via OCR from video frames."""
code: str
language: str | None = None
source_frame: float = 0.0
context: CodeContext = CodeContext.UNKNOWN
confidence: float = 0.0
text_group_id: str = ""
def to_dict(self) -> dict:
return {
"code": self.code,
"language": self.language,
"source_frame": self.source_frame,
"context": self.context.value,
"confidence": self.confidence,
"text_group_id": self.text_group_id,
}
@classmethod
def from_dict(cls, data: dict) -> CodeBlock:
return cls(
code=data["code"],
language=data.get("language"),
source_frame=data.get("source_frame", 0.0),
context=CodeContext(data.get("context", "unknown")),
confidence=data.get("confidence", 0.0),
text_group_id=data.get("text_group_id", ""),
)
@dataclass
class TextGroupEdit:
"""Represents an edit detected between appearances of a text group."""
timestamp: float
added_lines: list[str] = field(default_factory=list)
removed_lines: list[str] = field(default_factory=list)
modified_lines: list[dict] = field(default_factory=list)
def to_dict(self) -> dict:
return {
"timestamp": self.timestamp,
"added_lines": self.added_lines,
"removed_lines": self.removed_lines,
"modified_lines": self.modified_lines,
}
@classmethod
def from_dict(cls, data: dict) -> TextGroupEdit:
return cls(
timestamp=data["timestamp"],
added_lines=data.get("added_lines", []),
removed_lines=data.get("removed_lines", []),
modified_lines=data.get("modified_lines", []),
)
@dataclass
class TextGroup:
"""A group of related text blocks tracked across the video.
Represents a single code file/snippet as it appears and evolves
across multiple video frames.
"""
group_id: str
appearances: list[tuple[float, float]] = field(default_factory=list)
consensus_lines: list[dict] = field(default_factory=list)
edits: list[TextGroupEdit] = field(default_factory=list)
detected_language: str | None = None
frame_type: FrameType = FrameType.CODE_EDITOR
panel_id: str = "" # Tracks which panel this group originated from
@property
def full_text(self) -> str:
return "\n".join(line["text"] for line in self.consensus_lines if line.get("text"))
def to_dict(self) -> dict:
return {
"group_id": self.group_id,
"appearances": [[s, e] for s, e in self.appearances],
"consensus_lines": self.consensus_lines,
"edits": [e.to_dict() for e in self.edits],
"detected_language": self.detected_language,
"frame_type": self.frame_type.value,
"panel_id": self.panel_id,
"full_text": self.full_text,
}
@classmethod
def from_dict(cls, data: dict) -> TextGroup:
return cls(
group_id=data["group_id"],
appearances=[tuple(a) for a in data.get("appearances", [])],
consensus_lines=data.get("consensus_lines", []),
edits=[TextGroupEdit.from_dict(e) for e in data.get("edits", [])],
detected_language=data.get("detected_language"),
frame_type=FrameType(data.get("frame_type", "code_editor")),
panel_id=data.get("panel_id", ""),
)
@dataclass
class TextGroupTimeline:
"""Timeline of all text groups and their lifecycle in the video."""
text_groups: list[TextGroup] = field(default_factory=list)
total_code_time: float = 0.0
total_groups: int = 0
total_edits: int = 0
def get_groups_at_time(self, timestamp: float) -> list[TextGroup]:
"""Return all text groups visible at a given timestamp."""
return [
tg
for tg in self.text_groups
if any(start <= timestamp <= end for start, end in tg.appearances)
]
def to_dict(self) -> dict:
return {
"text_groups": [tg.to_dict() for tg in self.text_groups],
"total_code_time": self.total_code_time,
"total_groups": self.total_groups,
"total_edits": self.total_edits,
}
@classmethod
def from_dict(cls, data: dict) -> TextGroupTimeline:
return cls(
text_groups=[TextGroup.from_dict(tg) for tg in data.get("text_groups", [])],
total_code_time=data.get("total_code_time", 0.0),
total_groups=data.get("total_groups", 0),
total_edits=data.get("total_edits", 0),
)
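
As a rough illustration of how `get_groups_at_time` resolves visibility, the sketch below uses a stripped-down stand-in for `TextGroup` that keeps only `group_id` and `appearances`. The name `MiniGroup`, the helper `groups_at`, and the sample intervals are made up for this example. Intervals are treated as closed, so a timestamp equal to an interval's start or end counts as visible.

```python
from dataclasses import dataclass, field


@dataclass
class MiniGroup:
    """Hypothetical stand-in for TextGroup with only the fields needed here."""

    group_id: str
    appearances: list[tuple[float, float]] = field(default_factory=list)


def groups_at(groups: list[MiniGroup], timestamp: float) -> list[str]:
    """Return IDs of groups whose (start, end) intervals contain the timestamp."""
    return [
        g.group_id
        for g in groups
        if any(start <= timestamp <= end for start, end in g.appearances)
    ]


groups = [
    MiniGroup("TG-001", [(0.0, 30.0), (90.0, 120.0)]),
    MiniGroup("TG-002", [(30.0, 60.0)]),
]

visible_at_30 = groups_at(groups, 30.0)  # boundary value: both groups match
visible_at_75 = groups_at(groups, 75.0)  # falls in a gap: no group matches
```

Because both ends are inclusive, two groups that hand over at exactly the same timestamp are both reported for that instant.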
@dataclass
class AudioVisualAlignment:
"""Links on-screen code with concurrent transcript narration."""
text_group_id: str
start_time: float
end_time: float
on_screen_code: str
transcript_during: str
language: str | None = None
def to_dict(self) -> dict:
return {
"text_group_id": self.text_group_id,
"start_time": self.start_time,
"end_time": self.end_time,
"on_screen_code": self.on_screen_code,
"transcript_during": self.transcript_during,
"language": self.language,
}
@classmethod
def from_dict(cls, data: dict) -> AudioVisualAlignment:
return cls(
text_group_id=data["text_group_id"],
start_time=data["start_time"],
end_time=data["end_time"],
on_screen_code=data["on_screen_code"],
transcript_during=data.get("transcript_during", ""),
language=data.get("language"),
)
# =============================================================================
# Core Data Classes
# =============================================================================
@dataclass
class VideoSegment:
"""A time-aligned segment combining transcript + visual + metadata."""
index: int
start_time: float
end_time: float
duration: float
# Stream 1: ASR (Audio)
transcript: str = ""
words: list[WordTimestamp] = field(default_factory=list)
transcript_confidence: float = 0.0
# Stream 2: OCR (Visual)
keyframes: list[KeyFrame] = field(default_factory=list)
ocr_text: str = ""
detected_code_blocks: list[CodeBlock] = field(default_factory=list)
has_code_on_screen: bool = False
has_slides: bool = False
has_diagram: bool = False
# Stream 3: Metadata
chapter_title: str | None = None
topic: str | None = None
category: str | None = None
# Merged content
content: str = ""
summary: str | None = None
# Quality metadata
confidence: float = 0.0
content_type: SegmentContentType = SegmentContentType.MIXED
def to_dict(self) -> dict:
return {
"index": self.index,
"start_time": self.start_time,
"end_time": self.end_time,
"duration": self.duration,
"transcript": self.transcript,
"words": [w.to_dict() for w in self.words],
"transcript_confidence": self.transcript_confidence,
"keyframes": [k.to_dict() for k in self.keyframes],
"ocr_text": self.ocr_text,
"detected_code_blocks": [c.to_dict() for c in self.detected_code_blocks],
"has_code_on_screen": self.has_code_on_screen,
"has_slides": self.has_slides,
"has_diagram": self.has_diagram,
"chapter_title": self.chapter_title,
"topic": self.topic,
"category": self.category,
"content": self.content,
"summary": self.summary,
"confidence": self.confidence,
"content_type": self.content_type.value,
}
@classmethod
def from_dict(cls, data: dict) -> VideoSegment:
return cls(
index=data["index"],
start_time=data["start_time"],
end_time=data["end_time"],
duration=data["duration"],
transcript=data.get("transcript", ""),
words=[WordTimestamp.from_dict(w) for w in data.get("words", [])],
transcript_confidence=data.get("transcript_confidence", 0.0),
keyframes=[KeyFrame.from_dict(k) for k in data.get("keyframes", [])],
ocr_text=data.get("ocr_text", ""),
detected_code_blocks=[
CodeBlock.from_dict(c) for c in data.get("detected_code_blocks", [])
],
has_code_on_screen=data.get("has_code_on_screen", False),
has_slides=data.get("has_slides", False),
has_diagram=data.get("has_diagram", False),
chapter_title=data.get("chapter_title"),
topic=data.get("topic"),
category=data.get("category"),
content=data.get("content", ""),
summary=data.get("summary"),
confidence=data.get("confidence", 0.0),
content_type=SegmentContentType(data.get("content_type", "mixed")),
)
@property
def timestamp_display(self) -> str:
"""Human-readable timestamp (e.g., '05:30 - 08:15')."""
start_min, start_sec = divmod(int(self.start_time), 60)
end_min, end_sec = divmod(int(self.end_time), 60)
if self.start_time >= 3600 or self.end_time >= 3600:
start_hr, start_min = divmod(start_min, 60)
end_hr, end_min = divmod(end_min, 60)
return f"{start_hr:d}:{start_min:02d}:{start_sec:02d} - {end_hr:d}:{end_min:02d}:{end_sec:02d}"
return f"{start_min:02d}:{start_sec:02d} - {end_min:02d}:{end_sec:02d}"
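
The formatting rule in `timestamp_display` can be tried in isolation. The free function below is a sketch that mirrors the property's logic; the name `format_range` and the sample values are assumptions for illustration only.

```python
def format_range(start_time: float, end_time: float) -> str:
    """Format a time range as MM:SS, or H:MM:SS once either end reaches one hour."""
    start_min, start_sec = divmod(int(start_time), 60)
    end_min, end_sec = divmod(int(end_time), 60)
    if start_time >= 3600 or end_time >= 3600:
        # Split the minute counts further into hours and minutes.
        start_hr, start_min = divmod(start_min, 60)
        end_hr, end_min = divmod(end_min, 60)
        return (
            f"{start_hr:d}:{start_min:02d}:{start_sec:02d} - "
            f"{end_hr:d}:{end_min:02d}:{end_sec:02d}"
        )
    return f"{start_min:02d}:{start_sec:02d} - {end_min:02d}:{end_sec:02d}"


short = format_range(330.0, 495.0)        # "05:30 - 08:15"
over_hour = format_range(3540.0, 3725.0)  # "0:59:00 - 1:02:05"
```

Note that both ends switch to the hour format as soon as either one crosses 3600 seconds, which keeps the two halves of the range visually consistent.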
@dataclass
class VideoInfo:
"""Complete metadata and extracted content for a single video."""
# Identity
video_id: str
source_type: VideoSourceType
source_url: str | None = None
file_path: str | None = None
# Basic metadata
title: str = ""
description: str = ""
duration: float = 0.0
upload_date: str | None = None
language: str = "en"
# Channel / Author
channel_name: str | None = None
channel_url: str | None = None
# Engagement metadata
view_count: int | None = None
like_count: int | None = None
comment_count: int | None = None
# Discovery metadata
tags: list[str] = field(default_factory=list)
categories: list[str] = field(default_factory=list)
thumbnail_url: str | None = None
# Structure
chapters: list[Chapter] = field(default_factory=list)
# Playlist context
playlist_title: str | None = None
playlist_index: int | None = None
playlist_total: int | None = None
# Extracted content
raw_transcript: list[TranscriptSegment] = field(default_factory=list)
segments: list[VideoSegment] = field(default_factory=list)
# Processing metadata
transcript_source: TranscriptSource = TranscriptSource.NONE
visual_extraction_enabled: bool = False
whisper_model: str | None = None
processing_time_seconds: float = 0.0
extracted_at: str = ""
# Quality scores
transcript_confidence: float = 0.0
content_richness_score: float = 0.0
# Time-clipping metadata (None when full video is used)
original_duration: float | None = None
clip_start: float | None = None
clip_end: float | None = None
# Consensus-based text tracking (Phase A-D)
text_group_timeline: TextGroupTimeline | None = None
audio_visual_alignments: list[AudioVisualAlignment] = field(default_factory=list)
def to_dict(self) -> dict:
return {
"video_id": self.video_id,
"source_type": self.source_type.value,
"source_url": self.source_url,
"file_path": self.file_path,
"title": self.title,
"description": self.description,
"duration": self.duration,
"upload_date": self.upload_date,
"language": self.language,
"channel_name": self.channel_name,
"channel_url": self.channel_url,
"view_count": self.view_count,
"like_count": self.like_count,
"comment_count": self.comment_count,
"tags": self.tags,
"categories": self.categories,
"thumbnail_url": self.thumbnail_url,
"chapters": [c.to_dict() for c in self.chapters],
"playlist_title": self.playlist_title,
"playlist_index": self.playlist_index,
"playlist_total": self.playlist_total,
"raw_transcript": [t.to_dict() for t in self.raw_transcript],
"segments": [s.to_dict() for s in self.segments],
"transcript_source": self.transcript_source.value,
"visual_extraction_enabled": self.visual_extraction_enabled,
"whisper_model": self.whisper_model,
"processing_time_seconds": self.processing_time_seconds,
"extracted_at": self.extracted_at,
"transcript_confidence": self.transcript_confidence,
"content_richness_score": self.content_richness_score,
"original_duration": self.original_duration,
"clip_start": self.clip_start,
"clip_end": self.clip_end,
"text_group_timeline": self.text_group_timeline.to_dict()
if self.text_group_timeline
else None,
"audio_visual_alignments": [a.to_dict() for a in self.audio_visual_alignments],
}
@classmethod
def from_dict(cls, data: dict) -> VideoInfo:
timeline_data = data.get("text_group_timeline")
timeline = TextGroupTimeline.from_dict(timeline_data) if timeline_data else None
return cls(
video_id=data["video_id"],
source_type=VideoSourceType(data["source_type"]),
source_url=data.get("source_url"),
file_path=data.get("file_path"),
title=data.get("title", ""),
description=data.get("description", ""),
duration=data.get("duration", 0.0),
upload_date=data.get("upload_date"),
language=data.get("language", "en"),
channel_name=data.get("channel_name"),
channel_url=data.get("channel_url"),
view_count=data.get("view_count"),
like_count=data.get("like_count"),
comment_count=data.get("comment_count"),
tags=data.get("tags", []),
categories=data.get("categories", []),
thumbnail_url=data.get("thumbnail_url"),
chapters=[Chapter.from_dict(c) for c in data.get("chapters", [])],
playlist_title=data.get("playlist_title"),
playlist_index=data.get("playlist_index"),
playlist_total=data.get("playlist_total"),
raw_transcript=[TranscriptSegment.from_dict(t) for t in data.get("raw_transcript", [])],
segments=[VideoSegment.from_dict(s) for s in data.get("segments", [])],
transcript_source=TranscriptSource(data.get("transcript_source", "none")),
visual_extraction_enabled=data.get("visual_extraction_enabled", False),
whisper_model=data.get("whisper_model"),
processing_time_seconds=data.get("processing_time_seconds", 0.0),
extracted_at=data.get("extracted_at", ""),
transcript_confidence=data.get("transcript_confidence", 0.0),
content_richness_score=data.get("content_richness_score", 0.0),
original_duration=data.get("original_duration"),
clip_start=data.get("clip_start"),
clip_end=data.get("clip_end"),
text_group_timeline=timeline,
audio_visual_alignments=[
AudioVisualAlignment.from_dict(a) for a in data.get("audio_visual_alignments", [])
],
)
@dataclass
class VideoSourceConfig:
"""Configuration for video source processing."""
# Source specification (exactly one should be set)
url: str | None = None
playlist: str | None = None
channel: str | None = None
path: str | None = None
directory: str | None = None
# Identity
name: str = "video"
description: str = ""
# Filtering
max_videos: int = 50
languages: list[str] | None = None
# Extraction
visual_extraction: bool = False
whisper_model: str = "base"
# Segmentation
time_window_seconds: float = 120.0
min_segment_duration: float = 10.0
max_segment_duration: float = 600.0
# Categorization
categories: dict[str, list[str]] | None = None
# Subtitle files
subtitle_patterns: list[str] | None = None
# Time-clipping (single video only)
clip_start: float | None = None
clip_end: float | None = None
@classmethod
def from_dict(cls, data: dict) -> VideoSourceConfig:
return cls(
url=data.get("url"),
playlist=data.get("playlist"),
channel=data.get("channel"),
path=data.get("path"),
directory=data.get("directory"),
name=data.get("name", "video"),
description=data.get("description", ""),
max_videos=data.get("max_videos", 50),
languages=data.get("languages"),
visual_extraction=data.get("visual_extraction", False),
whisper_model=data.get("whisper_model", "base"),
time_window_seconds=data.get("time_window_seconds", 120.0),
min_segment_duration=data.get("min_segment_duration", 10.0),
max_segment_duration=data.get("max_segment_duration", 600.0),
categories=data.get("categories"),
subtitle_patterns=data.get("subtitle_patterns"),
clip_start=data.get("clip_start"),
clip_end=data.get("clip_end"),
)
def validate(self) -> list[str]:
"""Validate configuration. Returns list of errors."""
errors = []
sources_set = sum(
1
for s in [self.url, self.playlist, self.channel, self.path, self.directory]
if s is not None
)
if sources_set == 0:
errors.append(
"Video source must specify one of: url, playlist, channel, path, directory"
)
if sources_set > 1:
errors.append("Video source must specify exactly one source type")
# Clip range validation
has_clip = self.clip_start is not None or self.clip_end is not None
if has_clip and self.playlist is not None:
errors.append(
"--start-time/--end-time cannot be used with --playlist. "
"Clip range is for single videos only."
)
if (
self.clip_start is not None
and self.clip_end is not None
and self.clip_start >= self.clip_end
):
errors.append(
f"--start-time ({self.clip_start}s) must be before --end-time ({self.clip_end}s)"
)
return errors
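
To show how the error list from `validate` might be consumed, here is a minimal sketch with a reduced stand-in class. `MiniSourceConfig`, its shortened error messages, and the example URLs are hypothetical; only the rules (exactly one source, no clip range with a playlist, start before end) follow the code above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MiniSourceConfig:
    """Hypothetical, reduced version of VideoSourceConfig for illustration."""

    url: Optional[str] = None
    playlist: Optional[str] = None
    clip_start: Optional[float] = None
    clip_end: Optional[float] = None

    def validate(self) -> list[str]:
        errors: list[str] = []
        sources_set = sum(1 for s in (self.url, self.playlist) if s is not None)
        if sources_set == 0:
            errors.append("no source specified")
        if sources_set > 1:
            errors.append("more than one source specified")
        has_clip = self.clip_start is not None or self.clip_end is not None
        if has_clip and self.playlist is not None:
            errors.append("clip range cannot be combined with a playlist")
        if (
            self.clip_start is not None
            and self.clip_end is not None
            and self.clip_start >= self.clip_end
        ):
            errors.append("clip start must be before clip end")
        return errors


ok = MiniSourceConfig(url="https://example.com/watch?v=abc", clip_start=10.0, clip_end=60.0)
bad = MiniSourceConfig(playlist="https://example.com/playlist", clip_start=90.0, clip_end=30.0)

ok_errors = ok.validate()    # empty list: configuration is usable
bad_errors = bad.validate()  # two independent problems are both reported
```

Returning a list instead of raising on the first problem lets a caller report every configuration error in one pass.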
@dataclass
class VideoScraperResult:
"""Complete result from the video scraper."""
videos: list[VideoInfo] = field(default_factory=list)
total_duration_seconds: float = 0.0
total_segments: int = 0
total_code_blocks: int = 0
config: VideoSourceConfig | None = None
processing_time_seconds: float = 0.0
warnings: list[str] = field(default_factory=list)
errors: list[dict[str, Any]] = field(default_factory=list)
def to_dict(self) -> dict:
return {
"videos": [v.to_dict() for v in self.videos],
"total_duration_seconds": self.total_duration_seconds,
"total_segments": self.total_segments,
"total_code_blocks": self.total_code_blocks,
"processing_time_seconds": self.processing_time_seconds,
"warnings": self.warnings,
"errors": self.errors,
}
@classmethod
def from_dict(cls, data: dict) -> VideoScraperResult:
return cls(
videos=[VideoInfo.from_dict(v) for v in data.get("videos", [])],
total_duration_seconds=data.get("total_duration_seconds", 0.0),
total_segments=data.get("total_segments", 0),
total_code_blocks=data.get("total_code_blocks", 0),
processing_time_seconds=data.get("processing_time_seconds", 0.0),
warnings=data.get("warnings", []),
errors=data.get("errors", []),
)
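
The `to_dict` / `from_dict` pairs above all follow the same convention: enums are stored by value, required keys are indexed directly, and optional keys fall back to defaults. The sketch below demonstrates that convention on a small stand-in; `MiniContext` and `MiniCodeBlock` are hypothetical names, and the enum members are assumed for the example rather than taken from the real `CodeContext`.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MiniContext(Enum):
    """Hypothetical stand-in for CodeContext; the members are assumed."""

    UNKNOWN = "unknown"
    EDITOR = "editor"


@dataclass
class MiniCodeBlock:
    code: str
    language: Optional[str] = None
    context: MiniContext = MiniContext.UNKNOWN

    def to_dict(self) -> dict:
        # Enums are stored by value so the result is JSON-serializable.
        return {"code": self.code, "language": self.language, "context": self.context.value}

    @classmethod
    def from_dict(cls, data: dict) -> "MiniCodeBlock":
        # Required keys use indexing; optional keys fall back to defaults.
        return cls(
            code=data["code"],
            language=data.get("language"),
            context=MiniContext(data.get("context", "unknown")),
        )


original = MiniCodeBlock(code="print('hi')", language="python", context=MiniContext.EDITOR)
restored = MiniCodeBlock.from_dict(json.loads(json.dumps(original.to_dict())))
minimal = MiniCodeBlock.from_dict({"code": "ls -la"})
```

A round trip through JSON should reproduce an equal object, and a dictionary holding only the required keys should still load with defaults filled in.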

File diff suppressed because it is too large

@@ -0,0 +1,231 @@
"""Video segmentation module.
Aligns transcript + metadata into VideoSegment objects using:
1. Chapter-based segmentation (primary — uses YouTube chapters)
2. Time-window segmentation (fallback — fixed-duration windows)
"""

import logging

from skill_seekers.cli.video_models import (
    SegmentContentType,
    TranscriptSegment,
    VideoInfo,
    VideoSegment,
    VideoSourceConfig,
)

logger = logging.getLogger(__name__)


def _classify_content_type(transcript: str) -> SegmentContentType:
    """Classify segment content type based on transcript text."""
    lower = transcript.lower()

    code_indicators = ["import ", "def ", "class ", "function ", "const ", "npm ", "pip ", "git "]
    intro_indicators = ["welcome", "hello", "today we", "in this video", "let's get started"]
    outro_indicators = ["thanks for watching", "subscribe", "see you next", "that's it for"]

    if any(kw in lower for kw in outro_indicators):
        return SegmentContentType.OUTRO
    if any(kw in lower for kw in intro_indicators):
        return SegmentContentType.INTRO
    if sum(1 for kw in code_indicators if kw in lower) >= 2:
        return SegmentContentType.LIVE_CODING

    return SegmentContentType.EXPLANATION
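
To make the precedence of the keyword checks concrete (outro, then intro, then at least two code indicators, otherwise explanation), the sketch below re-implements the rule against a stand-in enum. `ContentType` and its string values are assumptions, as are the sample sentences.

```python
from enum import Enum


class ContentType(Enum):
    """Hypothetical stand-in for SegmentContentType (values are assumed)."""

    INTRO = "intro"
    OUTRO = "outro"
    LIVE_CODING = "live_coding"
    EXPLANATION = "explanation"


CODE_HINTS = ["import ", "def ", "class ", "function ", "const ", "npm ", "pip ", "git "]
INTRO_HINTS = ["welcome", "hello", "today we", "in this video", "let's get started"]
OUTRO_HINTS = ["thanks for watching", "subscribe", "see you next", "that's it for"]


def classify(transcript: str) -> ContentType:
    """Apply the keyword checks in the same order as the module above."""
    lower = transcript.lower()
    if any(kw in lower for kw in OUTRO_HINTS):
        return ContentType.OUTRO
    if any(kw in lower for kw in INTRO_HINTS):
        return ContentType.INTRO
    if sum(1 for kw in CODE_HINTS if kw in lower) >= 2:
        return ContentType.LIVE_CODING
    return ContentType.EXPLANATION


intro = classify("Welcome back, today we look at decorators.")
coding = classify("First pip install requests, then import requests at the top.")
mixed = classify("Thanks for watching, and welcome to subscribe.")
plain = classify("A decorator wraps another callable.")
```

Since the checks are plain substring matches, a transcript containing both intro and outro phrases is classified as an outro, as the `mixed` sample shows.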


def _build_segment_content(
    transcript: str,
    chapter_title: str | None,
    start_time: float,
    end_time: float,
) -> str:
    """Build merged content string for a segment."""
    parts = []

    # Add chapter heading
    start_min, start_sec = divmod(int(start_time), 60)
    end_min, end_sec = divmod(int(end_time), 60)
    ts = f"{start_min:02d}:{start_sec:02d} - {end_min:02d}:{end_sec:02d}"

    if chapter_title:
        parts.append(f"### {chapter_title} ({ts})\n")
    else:
        parts.append(f"### Segment ({ts})\n")

    if transcript:
        parts.append(transcript)

    return "\n".join(parts)


def _get_transcript_in_range(
    transcript_segments: list[TranscriptSegment],
    start_time: float,
    end_time: float,
) -> tuple[str, float]:
    """Get concatenated transcript text and average confidence for a time range.

    Returns:
        Tuple of (text, avg_confidence).
    """
    texts = []
    confidences = []

    for seg in transcript_segments:
        # Check overlap: segment overlaps with time range
        if seg.end > start_time and seg.start < end_time:
            texts.append(seg.text)
            confidences.append(seg.confidence)

    text = " ".join(texts)
    avg_confidence = sum(confidences) / len(confidences) if confidences else 0.0
    return text, avg_confidence
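
The overlap test used here (`seg.end > start_time and seg.start < end_time`) is strict on both sides, so a cue that merely touches a boundary is excluded. The sketch below shows that behaviour on invented data; `Cue` is a hypothetical stand-in for `TranscriptSegment`, and `text_in_range` is an illustrative copy of the helper.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """Hypothetical stand-in for TranscriptSegment."""

    start: float
    end: float
    text: str
    confidence: float = 1.0


def text_in_range(cues: list[Cue], start_time: float, end_time: float) -> tuple[str, float]:
    """Join the text of overlapping cues and average their confidence."""
    picked = [c for c in cues if c.end > start_time and c.start < end_time]
    text = " ".join(c.text for c in picked)
    avg = sum(c.confidence for c in picked) / len(picked) if picked else 0.0
    return text, avg


cues = [
    Cue(0.0, 5.0, "Open the editor.", 0.9),
    Cue(5.0, 12.0, "Create a new file.", 0.7),
    Cue(12.0, 20.0, "Run the script.", 0.8),
]

text, avg = text_in_range(cues, 5.0, 12.0)        # only the middle cue overlaps
straddle, _ = text_in_range(cues, 10.0, 14.0)     # cues crossing either edge are included
```

One consequence worth keeping in mind: a cue that straddles a boundary between two adjacent ranges is returned for both of them, so its text appears in both segments.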


def segment_by_chapters(
    video_info: VideoInfo,
    transcript_segments: list[TranscriptSegment],
) -> list[VideoSegment]:
    """Segment video using YouTube chapter boundaries.

    Args:
        video_info: Video metadata with chapters.
        transcript_segments: Raw transcript segments.

    Returns:
        List of VideoSegment objects aligned to chapters.
    """
    segments = []

    for i, chapter in enumerate(video_info.chapters):
        transcript, confidence = _get_transcript_in_range(
            transcript_segments, chapter.start_time, chapter.end_time
        )

        content_type = _classify_content_type(transcript)
        content = _build_segment_content(
            transcript, chapter.title, chapter.start_time, chapter.end_time
        )

        segments.append(
            VideoSegment(
                index=i,
                start_time=chapter.start_time,
                end_time=chapter.end_time,
                duration=chapter.end_time - chapter.start_time,
                transcript=transcript,
                transcript_confidence=confidence,
                chapter_title=chapter.title,
                content=content,
                confidence=confidence,
                content_type=content_type,
            )
        )

    return segments


def segment_by_time_window(
    video_info: VideoInfo,
    transcript_segments: list[TranscriptSegment],
    window_seconds: float = 120.0,
    start_offset: float = 0.0,
    end_limit: float | None = None,
) -> list[VideoSegment]:
    """Segment video using fixed time windows.

    Args:
        video_info: Video metadata.
        transcript_segments: Raw transcript segments.
        window_seconds: Duration of each window in seconds. Must be positive.
        start_offset: Start segmentation at this time (seconds).
        end_limit: Stop segmentation at this time (seconds). None = full duration.

    Returns:
        List of VideoSegment objects. Windows with no transcript text are skipped.

    Raises:
        ValueError: If window_seconds is not positive.
    """
    # A non-positive window would never advance current_time and loop forever.
    if window_seconds <= 0:
        raise ValueError(f"window_seconds must be positive, got {window_seconds}")

    segments = []
    duration = video_info.duration

    # Fall back to the transcript's extent when the metadata has no duration.
    if duration <= 0 and transcript_segments:
        duration = max(seg.end for seg in transcript_segments)

    if end_limit is not None:
        duration = min(duration, end_limit)

    if duration <= 0:
        return segments

    current_time = start_offset
    index = 0

    while current_time < duration:
        end_time = min(current_time + window_seconds, duration)

        transcript, confidence = _get_transcript_in_range(
            transcript_segments, current_time, end_time
        )

        if transcript.strip():
            content_type = _classify_content_type(transcript)
            content = _build_segment_content(transcript, None, current_time, end_time)

            segments.append(
                VideoSegment(
                    index=index,
                    start_time=current_time,
                    end_time=end_time,
                    duration=end_time - current_time,
                    transcript=transcript,
                    transcript_confidence=confidence,
                    content=content,
                    confidence=confidence,
                    content_type=content_type,
                )
            )
            index += 1

        current_time = end_time

    return segments
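
The window arithmetic can be checked without any transcript data. The helper below is a sketch that only computes the `(start, end)` boundaries the loop would visit; the name `window_bounds` is hypothetical, and unlike the real function it does not skip windows whose transcript is empty. The sketch also rejects a non-positive `window_seconds`, since such a value would keep the loop from advancing.

```python
from typing import Optional


def window_bounds(
    duration: float,
    window_seconds: float = 120.0,
    start_offset: float = 0.0,
    end_limit: Optional[float] = None,
) -> list[tuple[float, float]]:
    """Return the fixed-size windows covering [start_offset, duration)."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    if end_limit is not None:
        duration = min(duration, end_limit)
    bounds = []
    current = start_offset
    while current < duration:
        # The final window is shortened so it never runs past the end.
        end = min(current + window_seconds, duration)
        bounds.append((current, end))
        current = end
    return bounds


full = window_bounds(300.0)
clipped = window_bounds(300.0, start_offset=60.0, end_limit=200.0)
```

With the defaults, a 300-second video yields windows of 120, 120, and 60 seconds; with a clip range of 60 to 200 seconds it yields one full window followed by a 20-second remainder.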


def segment_video(
    video_info: VideoInfo,
    transcript_segments: list[TranscriptSegment],
    config: VideoSourceConfig,
) -> list[VideoSegment]:
    """Segment a video using the best available strategy.

    Priority:
        1. Chapter-based (if chapters available)
        2. Time-window fallback

    Args:
        video_info: Video metadata.
        transcript_segments: Raw transcript segments.
        config: Video source configuration.

    Returns:
        List of VideoSegment objects.
    """
    # Use chapters if available
    if video_info.chapters:
        logger.info(f"Using chapter-based segmentation ({len(video_info.chapters)} chapters)")
        segments = segment_by_chapters(video_info, transcript_segments)
        if segments:
            return segments

    # Fallback to time-window
    window = config.time_window_seconds
    logger.info(f"Using time-window segmentation ({window}s windows)")
    return segment_by_time_window(
        video_info,
        transcript_segments,
        window,
        start_offset=config.clip_start or 0.0,
        end_limit=config.clip_end,
    )

@@ -0,0 +1,835 @@
"""GPU auto-detection and video dependency installation.
Detects NVIDIA (CUDA) or AMD (ROCm) GPUs using system tools (without
requiring torch to be installed) and installs the correct PyTorch variant
plus all visual extraction dependencies (easyocr, opencv, etc.).
Also handles:
- Virtual environment creation (if not already in one)
- System dependency checks (tesseract binary)
- ROCm environment variable configuration (MIOPEN_FIND_MODE)
Usage:
skill-seekers video --setup # Interactive (all modules)
skill-seekers video --setup # Interactive, choose modules
From MCP: run_setup(interactive=False)
"""
from __future__ import annotations
import logging
import os
import platform
import re
import shutil
import subprocess
import sys
import venv
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
logger = logging.getLogger(__name__)
# =============================================================================
# Data Structures
# =============================================================================
class GPUVendor(Enum):
"""Detected GPU hardware vendor."""
NVIDIA = "nvidia"
AMD = "amd"
NONE = "none"
@dataclass
class GPUInfo:
"""Result of GPU auto-detection."""
vendor: GPUVendor
name: str = ""
compute_version: str = ""
index_url: str = ""
details: list[str] = field(default_factory=list)
@dataclass
class SetupModules:
"""Which modules to install during setup."""
torch: bool = True
easyocr: bool = True
opencv: bool = True
tesseract: bool = True
scenedetect: bool = True
whisper: bool = True
# =============================================================================
# PyTorch Index URL Mapping
# =============================================================================
_PYTORCH_BASE = "https://download.pytorch.org/whl"


def _parse_major_minor(version: str) -> tuple[int, int] | None:
    """Parse "X.Y" (or "X") into (major, minor); return None if malformed."""
    try:
        parts = version.split(".")
        major = int(parts[0])
        minor = int(parts[1]) if len(parts) > 1 else 0
    except (ValueError, IndexError):
        return None
    return major, minor


def _cuda_version_to_index_url(version: str) -> str:
    """Map a CUDA version string to the correct PyTorch index URL."""
    ver = _parse_major_minor(version)
    if ver is None:
        return f"{_PYTORCH_BASE}/cpu"
    # Compare (major, minor) tuples rather than major + minor / 10.0, which
    # misorders two-digit minors (e.g. "5.10" would be treated as 6.0).
    if ver >= (12, 4):
        return f"{_PYTORCH_BASE}/cu124"
    if ver >= (12, 1):
        return f"{_PYTORCH_BASE}/cu121"
    if ver >= (11, 8):
        return f"{_PYTORCH_BASE}/cu118"
    return f"{_PYTORCH_BASE}/cpu"


def _rocm_version_to_index_url(version: str) -> str:
    """Map a ROCm version string to the correct PyTorch index URL."""
    ver = _parse_major_minor(version)
    if ver is None:
        return f"{_PYTORCH_BASE}/cpu"
    if ver >= (6, 3):
        return f"{_PYTORCH_BASE}/rocm6.3"
    if ver >= (6, 0):
        return f"{_PYTORCH_BASE}/rocm6.2.4"
    return f"{_PYTORCH_BASE}/cpu"
# =============================================================================
# GPU Detection (without torch)
# =============================================================================
def detect_gpu() -> GPUInfo:
"""Detect GPU vendor and compute version using system tools.
Detection order:
1. nvidia-smi -> NVIDIA + CUDA version
2. rocminfo -> AMD + ROCm version
3. lspci -> AMD GPU present but no ROCm (warn)
4. Fallback -> CPU-only
"""
# 1. Check NVIDIA
nvidia = _check_nvidia()
if nvidia is not None:
return nvidia
# 2. Check AMD ROCm
amd = _check_amd_rocm()
if amd is not None:
return amd
# 3. Check if AMD GPU exists but ROCm isn't installed
amd_no_rocm = _check_amd_lspci()
if amd_no_rocm is not None:
return amd_no_rocm
# 4. CPU fallback
return GPUInfo(
vendor=GPUVendor.NONE,
name="CPU-only",
index_url=f"{_PYTORCH_BASE}/cpu",
details=["No GPU detected, will use CPU-only PyTorch"],
)
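
The detection order in `detect_gpu` is a first-match chain: each probe returns either a result or `None`, and the first non-`None` result wins. The sketch below shows that pattern in isolation. The helper `first_detected` and the three fake probes are hypothetical and return plain strings instead of `GPUInfo` objects.

```python
from typing import Callable, Optional


def first_detected(detectors: list[Callable[[], Optional[str]]], fallback: str) -> str:
    """Run detectors in order and return the first non-None result."""
    for detect in detectors:
        result = detect()
        if result is not None:
            return result
    return fallback


calls: list[str] = []


def fake_nvidia() -> Optional[str]:
    calls.append("nvidia")
    return None  # pretend nvidia-smi is not available


def fake_rocm() -> Optional[str]:
    calls.append("rocm")
    return "amd-rocm"  # pretend rocminfo succeeded


def fake_lspci() -> Optional[str]:
    calls.append("lspci")
    return "amd-no-rocm"


chosen = first_detected([fake_nvidia, fake_rocm, fake_lspci], fallback="cpu")
```

Because the chain stops at the first hit, the `lspci` probe is never run here, which matches the intent of checking for a working ROCm install before falling back to a bare hardware listing.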


def _check_nvidia() -> GPUInfo | None:
    """Detect NVIDIA GPU via nvidia-smi."""
    if not shutil.which("nvidia-smi"):
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi"],
            capture_output=True,
            text=True,
            timeout=10,
        )
        if result.returncode != 0:
            return None

        output = result.stdout

        # Parse CUDA version from "CUDA Version: X.Y"
        cuda_match = re.search(r"CUDA Version:\s*(\d+\.\d+)", output)
        cuda_ver = cuda_match.group(1) if cuda_match else ""

        # Parse GPU name from the table row (e.g., "NVIDIA GeForce RTX 4090").
        # The row usually begins with the GPU index column, so allow an
        # optional leading integer before the name.
        gpu_name = ""
        name_match = re.search(r"\|\s+(?:\d+\s+)?(NVIDIA[^\|]+?)\s+(?:On|Off)\s+\|", output)
        if name_match:
            gpu_name = name_match.group(1).strip()

        index_url = _cuda_version_to_index_url(cuda_ver) if cuda_ver else f"{_PYTORCH_BASE}/cpu"

        return GPUInfo(
            vendor=GPUVendor.NVIDIA,
            name=gpu_name or "NVIDIA GPU",
            compute_version=cuda_ver,
            index_url=index_url,
            details=[f"CUDA {cuda_ver}" if cuda_ver else "CUDA version unknown"],
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
def _check_amd_rocm() -> GPUInfo | None:
"""Detect AMD GPU via rocminfo."""
if not shutil.which("rocminfo"):
return None
try:
result = subprocess.run(
["rocminfo"],
capture_output=True,
text=True,
timeout=10,
)
if result.returncode != 0:
return None
output = result.stdout
# Parse GPU name from "Name: gfx..." or "Marketing Name: ..."
gpu_name = ""
marketing_match = re.search(r"Marketing Name:\s*(.+)", output)
if marketing_match:
gpu_name = marketing_match.group(1).strip()
# Get ROCm version from /opt/rocm/.info/version
rocm_ver = _read_rocm_version()
index_url = _rocm_version_to_index_url(rocm_ver) if rocm_ver else f"{_PYTORCH_BASE}/cpu"
return GPUInfo(
vendor=GPUVendor.AMD,
name=gpu_name or "AMD GPU",
compute_version=rocm_ver,
index_url=index_url,
details=[f"ROCm {rocm_ver}" if rocm_ver else "ROCm version unknown"],
)
except (subprocess.TimeoutExpired, OSError):
return None


def _read_rocm_version() -> str:
    """Read ROCm version from /opt/rocm/.info/version."""
    try:
        with open("/opt/rocm/.info/version") as f:
            # Drop any build suffix after the first "-" (e.g. "X.Y.Z-NN").
            return f.read().strip().split("-")[0]
    except OSError:  # IOError is an alias of OSError in Python 3
        return ""
def _check_amd_lspci() -> GPUInfo | None:
"""Detect AMD GPU via lspci when ROCm isn't installed."""
if not shutil.which("lspci"):
return None
try:
result = subprocess.run(
["lspci"],
capture_output=True,
text=True,
timeout=10,
)
if result.returncode != 0:
return None
# Look for AMD/ATI VGA or Display controllers
for line in result.stdout.splitlines():
if ("VGA" in line or "Display" in line) and ("AMD" in line or "ATI" in line):
return GPUInfo(
vendor=GPUVendor.AMD,
name=line.split(":")[-1].strip() if ":" in line else "AMD GPU",
compute_version="",
index_url=f"{_PYTORCH_BASE}/cpu",
details=[
"AMD GPU detected but ROCm is not installed",
"Install ROCm first for GPU acceleration: https://rocm.docs.amd.com/",
"Falling back to CPU-only PyTorch",
],
)
except (subprocess.TimeoutExpired, OSError):
pass
return None
# =============================================================================
# Virtual Environment
# =============================================================================
def is_in_venv() -> bool:
"""Check if the current Python process is running inside a venv."""
return sys.prefix != sys.base_prefix
def create_venv(venv_path: str = ".venv") -> bool:
"""Create a virtual environment and return True on success."""
path = Path(venv_path).resolve()
if path.exists():
logger.info(f"Venv already exists at {path}")
return True
try:
venv.create(str(path), with_pip=True)
return True
except Exception as exc: # noqa: BLE001
logger.error(f"Failed to create venv: {exc}")
return False
def get_venv_python(venv_path: str = ".venv") -> str:
"""Return the python executable path inside a venv."""
path = Path(venv_path).resolve()
if platform.system() == "Windows":
return str(path / "Scripts" / "python.exe")
return str(path / "bin" / "python")
def get_venv_activate_cmd(venv_path: str = ".venv") -> str:
"""Return the shell command to activate the venv."""
path = Path(venv_path).resolve()
if platform.system() == "Windows":
return str(path / "Scripts" / "activate")
return f"source {path}/bin/activate"
# =============================================================================
# System Dependency Checks
# =============================================================================
def _detect_distro() -> str:
"""Detect Linux distro family for install command suggestions."""
try:
with open("/etc/os-release") as f:
content = f.read().lower()
if "arch" in content or "manjaro" in content or "endeavour" in content:
return "arch"
if "debian" in content or "ubuntu" in content or "mint" in content or "pop" in content:
return "debian"
if "fedora" in content or "rhel" in content or "centos" in content or "rocky" in content:
return "fedora"
if "opensuse" in content or "suse" in content:
return "suse"
except OSError:
pass
return "unknown"
def _get_tesseract_install_cmd() -> str:
"""Return distro-specific command to install tesseract."""
distro = _detect_distro()
cmds = {
"arch": "sudo pacman -S tesseract tesseract-data-eng",
"debian": "sudo apt install tesseract-ocr tesseract-ocr-eng",
"fedora": "sudo dnf install tesseract tesseract-langpack-eng",
"suse": "sudo zypper install tesseract-ocr tesseract-ocr-traineddata-english",
}
return cmds.get(distro, "Install tesseract-ocr with your package manager")
def check_tesseract() -> dict[str, bool | str]:
"""Check if tesseract binary is installed and has English data.
Returns dict with keys: installed, has_eng, install_cmd, version.
"""
result: dict[str, bool | str] = {
"installed": False,
"has_eng": False,
"install_cmd": _get_tesseract_install_cmd(),
"version": "",
}
tess_bin = shutil.which("tesseract")
if not tess_bin:
return result
result["installed"] = True
# Get version
try:
ver = subprocess.run(
["tesseract", "--version"],
capture_output=True,
text=True,
timeout=5,
)
first_line = (ver.stdout or ver.stderr).split("\n")[0]
result["version"] = first_line.strip()
except (subprocess.TimeoutExpired, OSError):
pass
# Check for eng language data
try:
langs = subprocess.run(
["tesseract", "--list-langs"],
capture_output=True,
text=True,
timeout=5,
)
output = langs.stdout + langs.stderr
result["has_eng"] = "eng" in output.split()
except (subprocess.TimeoutExpired, OSError):
pass
return result
# =============================================================================
# ROCm Environment Configuration
# =============================================================================
def configure_rocm_env() -> list[str]:
"""Set environment variables for ROCm/MIOpen to work correctly.
Returns list of env vars that were set.
"""
changes: list[str] = []
# MIOPEN_FIND_MODE=FAST avoids the workspace allocation issue
# where MIOpen requires huge workspace but allocates 0 bytes
if "MIOPEN_FIND_MODE" not in os.environ:
os.environ["MIOPEN_FIND_MODE"] = "FAST"
changes.append("MIOPEN_FIND_MODE=FAST")
# Ensure MIOpen user DB has a writable location
if "MIOPEN_USER_DB_PATH" not in os.environ:
db_path = os.path.expanduser("~/.config/miopen")
os.makedirs(db_path, exist_ok=True)
os.environ["MIOPEN_USER_DB_PATH"] = db_path
changes.append(f"MIOPEN_USER_DB_PATH={db_path}")
return changes
# =============================================================================
# Installation
# =============================================================================
_BASE_VIDEO_DEPS = ["yt-dlp", "youtube-transcript-api"]
def _build_visual_deps(modules: SetupModules) -> list[str]:
"""Build the list of pip packages based on selected modules."""
# Base video deps are always included — setup must leave video fully ready
deps: list[str] = list(_BASE_VIDEO_DEPS)
if modules.easyocr:
deps.append("easyocr")
if modules.opencv:
deps.append("opencv-python-headless")
if modules.tesseract:
deps.append("pytesseract")
if modules.scenedetect:
deps.append("scenedetect[opencv]")
if modules.whisper:
deps.append("faster-whisper")
return deps
def install_torch(gpu_info: GPUInfo, python_exe: str | None = None) -> bool:
"""Install PyTorch with the correct GPU variant.
Returns True on success, False on failure.
"""
exe = python_exe or sys.executable
cmd = [exe, "-m", "pip", "install", "torch", "torchvision", "--index-url", gpu_info.index_url]
logger.info(f"Installing PyTorch from {gpu_info.index_url}")
try:
result = subprocess.run(cmd, timeout=600, capture_output=True, text=True)
if result.returncode != 0:
logger.error(f"PyTorch install failed:\n{result.stderr[-500:]}")
return False
return True
except subprocess.TimeoutExpired:
logger.error("PyTorch installation timed out (10 min)")
return False
except OSError as exc:
logger.error(f"PyTorch installation error: {exc}")
return False
def install_visual_deps(
modules: SetupModules | None = None, python_exe: str | None = None
) -> bool:
"""Install visual extraction dependencies.
Returns True on success, False on failure.
"""
mods = modules or SetupModules()
deps = _build_visual_deps(mods)
if not deps:
return True
exe = python_exe or sys.executable
cmd = [exe, "-m", "pip", "install"] + deps
logger.info(f"Installing visual deps: {', '.join(deps)}")
try:
result = subprocess.run(cmd, timeout=600, capture_output=True, text=True)
if result.returncode != 0:
logger.error(f"Visual deps install failed:\n{result.stderr[-500:]}")
return False
return True
except subprocess.TimeoutExpired:
logger.error("Visual deps installation timed out (10 min)")
return False
except OSError as exc:
logger.error(f"Visual deps installation error: {exc}")
return False
def install_skill_seekers(python_exe: str) -> bool:
"""Install skill-seekers into the target python environment."""
cmd = [python_exe, "-m", "pip", "install", "skill-seekers"]
try:
result = subprocess.run(cmd, timeout=300, capture_output=True, text=True)
return result.returncode == 0
except (subprocess.TimeoutExpired, OSError):
return False
# =============================================================================
# Verification
# =============================================================================
def verify_installation() -> dict[str, bool]:
"""Verify that all video deps are importable.
Returns a dict mapping package name to import success.
"""
results: dict[str, bool] = {}
# Base video deps
try:
import yt_dlp # noqa: F401
results["yt-dlp"] = True
except ImportError:
results["yt-dlp"] = False
try:
import youtube_transcript_api # noqa: F401
results["youtube-transcript-api"] = True
except ImportError:
results["youtube-transcript-api"] = False
# torch
try:
import torch
results["torch"] = True
results["torch.cuda"] = torch.cuda.is_available()
results["torch.rocm"] = hasattr(torch.version, "hip") and torch.version.hip is not None
except ImportError:
results["torch"] = False
results["torch.cuda"] = False
results["torch.rocm"] = False
# easyocr
try:
import easyocr # noqa: F401
results["easyocr"] = True
except ImportError:
results["easyocr"] = False
# opencv
try:
import cv2 # noqa: F401
results["opencv"] = True
except ImportError:
results["opencv"] = False
# pytesseract
try:
import pytesseract # noqa: F401
results["pytesseract"] = True
except ImportError:
results["pytesseract"] = False
# scenedetect
try:
import scenedetect # noqa: F401
results["scenedetect"] = True
except ImportError:
results["scenedetect"] = False
# faster-whisper
try:
import faster_whisper # noqa: F401
results["faster-whisper"] = True
except ImportError:
results["faster-whisper"] = False
return results
# =============================================================================
# Module Selection (Interactive)
# =============================================================================
def _ask_modules(interactive: bool) -> SetupModules:
"""Ask the user which modules to install. Returns all if non-interactive."""
if not interactive:
return SetupModules()
print("Which modules do you want to install?")
print(" [a] All (default)")
print(" [c] Choose individually")
try:
choice = input(" > ").strip().lower()
except (EOFError, KeyboardInterrupt):
print()
return SetupModules()
if choice not in ("c", "choose"):
return SetupModules()
modules = SetupModules()
_ask = _interactive_yn
modules.torch = _ask("PyTorch (required for easyocr GPU)", default=True)
modules.easyocr = _ask("EasyOCR (text extraction from video frames)", default=True)
modules.opencv = _ask("OpenCV (frame extraction and image processing)", default=True)
modules.tesseract = _ask("pytesseract (secondary OCR engine)", default=True)
modules.scenedetect = _ask("scenedetect (scene change detection)", default=True)
modules.whisper = _ask("faster-whisper (local audio transcription)", default=True)
return modules
def _interactive_yn(prompt: str, default: bool = True) -> bool:
"""Ask a yes/no question, return bool."""
suffix = "[Y/n]" if default else "[y/N]"
try:
answer = input(f" {prompt}? {suffix} ").strip().lower()
except (EOFError, KeyboardInterrupt):
return default
if not answer:
return default
return answer in ("y", "yes")
# =============================================================================
# Orchestrator
# =============================================================================
def run_setup(interactive: bool = True) -> int:
"""Auto-detect GPU and install all visual extraction dependencies.
Handles:
1. Venv creation (if not in one)
2. GPU detection
3. Module selection (optional — interactive only)
4. System dep checks (tesseract binary)
5. ROCm env var configuration
6. Package installation (PyTorch with the correct GPU variant, then visual deps)
7. Verification
Args:
interactive: If True, prompt user for confirmation before installing.
Returns:
0 on success, 1 on failure.
"""
print("=" * 60)
print(" Video Visual Extraction Setup")
print("=" * 60)
print()
total_steps = 7
# ── Step 1: Venv check ──
print(f"[1/{total_steps}] Checking environment...")
if is_in_venv():
print(f" Already in venv: {sys.prefix}")
python_exe = sys.executable
else:
print(" Not in a virtual environment.")
venv_path = ".venv"
if interactive:
try:
answer = input(
f" Create venv at ./{venv_path}? [Y/n] "
).strip().lower()
except (EOFError, KeyboardInterrupt):
print("\nSetup cancelled.")
return 1
if answer and answer not in ("y", "yes"):
print(" Continuing without venv (installing to system Python).")
python_exe = sys.executable
else:
if not create_venv(venv_path):
print(" FAILED: Could not create venv.")
return 1
python_exe = get_venv_python(venv_path)
activate_cmd = get_venv_activate_cmd(venv_path)
print(f" Venv created at ./{venv_path}")
print(" Installing skill-seekers into venv...")
if not install_skill_seekers(python_exe):
print(" FAILED: Could not install skill-seekers into venv.")
return 1
print(" After setup completes, activate with:")
print(f" {activate_cmd}")
else:
# Non-interactive: use current python
python_exe = sys.executable
print()
# ── Step 2: GPU detection ──
print(f"[2/{total_steps}] Detecting GPU...")
gpu_info = detect_gpu()
vendor_label = {
GPUVendor.NVIDIA: "NVIDIA (CUDA)",
GPUVendor.AMD: "AMD (ROCm)",
GPUVendor.NONE: "CPU-only",
}
print(f" GPU: {gpu_info.name}")
print(f" Vendor: {vendor_label.get(gpu_info.vendor, gpu_info.vendor.value)}")
if gpu_info.compute_version:
print(f" Version: {gpu_info.compute_version}")
for detail in gpu_info.details:
print(f" {detail}")
print(f" PyTorch index: {gpu_info.index_url}")
print()
# ── Step 3: Module selection ──
print(f"[3/{total_steps}] Selecting modules...")
modules = _ask_modules(interactive)
deps = _build_visual_deps(modules)
print(f" Selected: {', '.join(deps) if deps else '(none)'}")
if modules.torch:
print(" + PyTorch + torchvision")
print()
# ── Step 4: System dependency check ──
print(f"[4/{total_steps}] Checking system dependencies...")
if modules.tesseract:
tess = check_tesseract()
if not tess["installed"]:
print(" WARNING: tesseract binary not found!")
print(" The pytesseract Python package needs the tesseract binary installed.")
print(f" Install it with: {tess['install_cmd']}")
print()
elif not tess["has_eng"]:
print(f" WARNING: tesseract installed ({tess['version']}) but English data missing!")
print(f" Install with: {tess['install_cmd']}")
print()
else:
print(f" tesseract: {tess['version']} (eng data OK)")
else:
print(" tesseract: skipped (not selected)")
print()
# ── Step 5: ROCm configuration ──
print(f"[5/{total_steps}] Configuring GPU environment...")
if gpu_info.vendor == GPUVendor.AMD:
changes = configure_rocm_env()
if changes:
print(" Set ROCm environment variables:")
for c in changes:
print(f" {c}")
print(" (These fix MIOpen workspace allocation issues)")
else:
print(" ROCm env vars already configured.")
elif gpu_info.vendor == GPUVendor.NVIDIA:
print(" NVIDIA: no extra configuration needed.")
else:
print(" CPU-only: no GPU configuration needed.")
print()
# ── Step 6: Confirm and install ──
if interactive:
print("Ready to install. Summary:")
if modules.torch:
print(f" - PyTorch + torchvision (from {gpu_info.index_url})")
for dep in deps:
print(f" - {dep}")
print()
try:
answer = input("Proceed? [Y/n] ").strip().lower()
except (EOFError, KeyboardInterrupt):
print("\nSetup cancelled.")
return 1
if answer and answer not in ("y", "yes"):
print("Setup cancelled.")
return 1
print()
print(f"[6/{total_steps}] Installing packages...")
if modules.torch:
print(" Installing PyTorch...")
if not install_torch(gpu_info, python_exe):
print(" FAILED: PyTorch installation failed.")
print(f" Try: {python_exe} -m pip install torch torchvision --index-url {gpu_info.index_url}")
return 1
print(" PyTorch installed.")
if deps:
print(" Installing visual packages...")
if not install_visual_deps(modules, python_exe):
print(" FAILED: Visual packages installation failed.")
print(f" Try: {python_exe} -m pip install {' '.join(deps)}")
return 1
print(" Visual packages installed.")
print()
# ── Step 7: Verify ──
print(f"[7/{total_steps}] Verifying installation...")
results = verify_installation()
all_ok = True
for pkg, ok in results.items():
status = "OK" if ok else "MISSING"
print(f" {pkg}: {status}")
# torch.cuda / torch.rocm are informational, not required
if not ok and pkg not in ("torch.cuda", "torch.rocm"):
# Only count as failure if the module was selected
if pkg == "torch" and modules.torch:
all_ok = False
elif pkg == "easyocr" and modules.easyocr:
all_ok = False
elif pkg == "opencv" and modules.opencv:
all_ok = False
elif pkg == "pytesseract" and modules.tesseract:
all_ok = False
elif pkg == "scenedetect" and modules.scenedetect:
all_ok = False
elif pkg == "faster-whisper" and modules.whisper:
all_ok = False
print()
if all_ok:
print("Setup complete! You can now use: skill-seekers video --url <URL> --visual")
if not is_in_venv() and python_exe != sys.executable:
activate_cmd = get_venv_activate_cmd()
print("\nDon't forget to activate the venv first:")
print(f" {activate_cmd}")
else:
print("Some packages failed to install. Check the output above.")
return 1
return 0
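
Review note on `_detect_distro()`: it matches substrings anywhere in `/etc/os-release`, so a URL or a description field could select the wrong family. The following is a minimal sketch of a stricter alternative that reads only the `ID` and `ID_LIKE` fields. The helper names `parse_os_release` and `distro_family` and the `_FAMILY_BY_ID` table are hypothetical and not part of this commit; the ID values listed are assumptions about what common distributions report.

```python
def parse_os_release(content: str) -> dict[str, str]:
    """Parse os-release text into a dict of KEY -> unquoted value."""
    fields: dict[str, str] = {}
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip().strip('"').strip("'")
    return fields


# Hypothetical table mapping distro IDs to the family names used by the setup code.
_FAMILY_BY_ID = {
    "arch": "arch", "manjaro": "arch", "endeavouros": "arch",
    "debian": "debian", "ubuntu": "debian", "linuxmint": "debian", "pop": "debian",
    "fedora": "fedora", "rhel": "fedora", "centos": "fedora", "rocky": "fedora",
    "opensuse": "suse", "suse": "suse", "sles": "suse",
}


def distro_family(content: str) -> str:
    """Return the distro family, checking ID first and then each ID_LIKE entry."""
    fields = parse_os_release(content)
    candidates = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    for candidate in candidates:
        family = _FAMILY_BY_ID.get(candidate.lower())
        if family:
            return family
    return "unknown"


sample = 'NAME="Pop!_OS"\nID=pop\nID_LIKE="ubuntu debian"\n'
print(distro_family(sample))
```

Checking `ID` before `ID_LIKE` keeps derivative distributions working even when their own ID is missing from the table, as long as they declare a known parent.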

@@ -0,0 +1,396 @@
"""Video transcript extraction module.
Handles all transcript acquisition:
- YouTube captions via youtube-transcript-api (Tier 1)
- Subtitle file parsing: SRT and VTT (Tier 1)
- Whisper ASR stub (Tier 2; raises RuntimeError with install instructions)
"""
import logging
import re
from pathlib import Path
from skill_seekers.cli.video_models import (
TranscriptSegment,
TranscriptSource,
VideoInfo,
VideoSourceConfig,
VideoSourceType,
)
logger = logging.getLogger(__name__)
# Optional dependency: youtube-transcript-api
try:
from youtube_transcript_api import YouTubeTranscriptApi
HAS_YOUTUBE_TRANSCRIPT = True
except ImportError:
HAS_YOUTUBE_TRANSCRIPT = False
# Optional dependency: faster-whisper (Tier 2)
try:
from faster_whisper import WhisperModel # noqa: F401
HAS_WHISPER = True
except ImportError:
HAS_WHISPER = False
# =============================================================================
# YouTube Transcript Extraction (Tier 1)
# =============================================================================
def extract_youtube_transcript(
video_id: str,
languages: list[str] | None = None,
) -> tuple[list[TranscriptSegment], TranscriptSource]:
"""Fetch YouTube captions via youtube-transcript-api.
Args:
video_id: YouTube video ID (11 chars).
languages: Language preference list (e.g., ['en', 'tr']).
Returns:
Tuple of (transcript segments, source type).
Raises:
RuntimeError: If youtube-transcript-api is not installed.
"""
if not HAS_YOUTUBE_TRANSCRIPT:
raise RuntimeError(
"youtube-transcript-api is required for YouTube transcript extraction.\n"
'Install with: pip install "skill-seekers[video]"\n'
"Or: pip install youtube-transcript-api"
)
if languages is None:
languages = ["en"]
try:
ytt_api = YouTubeTranscriptApi()
# Use list() to detect whether the transcript is auto-generated
source = TranscriptSource.YOUTUBE_MANUAL
try:
transcript_list = ytt_api.list(video_id)
# Prefer manually created transcripts; fall back to auto-generated
try:
transcript_entry = transcript_list.find_manually_created_transcript(languages)
source = TranscriptSource.YOUTUBE_MANUAL
except Exception:
try:
transcript_entry = transcript_list.find_generated_transcript(languages)
source = TranscriptSource.YOUTUBE_AUTO
except Exception:
# Fall back to any available transcript
transcript_entry = transcript_list.find_transcript(languages)
source = (
TranscriptSource.YOUTUBE_AUTO
if transcript_entry.is_generated
else TranscriptSource.YOUTUBE_MANUAL
)
transcript = transcript_entry.fetch()
except Exception:
# Fall back to direct fetch if list fails (older API versions)
transcript = ytt_api.fetch(video_id, languages=languages)
# Check is_generated on the FetchedTranscript if available
if getattr(transcript, "is_generated", False):
source = TranscriptSource.YOUTUBE_AUTO
segments = []
for snippet in transcript.snippets:
text = snippet.text.strip()
if not text:
continue
start = snippet.start
duration = snippet.duration
segments.append(
TranscriptSegment(
text=text,
start=start,
end=start + duration,
confidence=1.0,
source=source,
)
)
if not segments:
return [], TranscriptSource.NONE
return segments, source
except Exception as e:
logger.warning(f"Failed to fetch YouTube transcript for {video_id}: {e}")
return [], TranscriptSource.NONE
# =============================================================================
# Subtitle File Parsing (Tier 1)
# =============================================================================
def _parse_timestamp_srt(ts: str) -> float:
"""Parse SRT timestamp (HH:MM:SS,mmm) to seconds."""
ts = ts.strip().replace(",", ".")
parts = ts.split(":")
if len(parts) == 3:
h, m, s = parts
return int(h) * 3600 + int(m) * 60 + float(s)
return 0.0
def _parse_timestamp_vtt(ts: str) -> float:
"""Parse VTT timestamp (HH:MM:SS.mmm or MM:SS.mmm) to seconds."""
ts = ts.strip()
parts = ts.split(":")
if len(parts) == 3:
h, m, s = parts
return int(h) * 3600 + int(m) * 60 + float(s)
elif len(parts) == 2:
m, s = parts
return int(m) * 60 + float(s)
return 0.0
def parse_srt(path: str) -> list[TranscriptSegment]:
"""Parse an SRT subtitle file into TranscriptSegments.
Args:
path: Path to .srt file.
Returns:
List of TranscriptSegment objects.
"""
content = Path(path).read_text(encoding="utf-8", errors="replace")
segments = []
# SRT format: index\nstart --> end\ntext\n\n
blocks = re.split(r"\n\s*\n", content.strip())
for block in blocks:
lines = block.strip().split("\n")
if len(lines) < 2:
continue
# Find the timestamp line (contains -->)
ts_line = None
text_lines = []
for line in lines:
if "-->" in line:
ts_line = line
elif ts_line is not None:
text_lines.append(line)
if ts_line is None:
continue
parts = ts_line.split("-->")
if len(parts) != 2:
continue
start = _parse_timestamp_srt(parts[0])
end = _parse_timestamp_srt(parts[1])
text = " ".join(text_lines).strip()
# Remove HTML tags
text = re.sub(r"<[^>]+>", "", text)
if text:
segments.append(
TranscriptSegment(
text=text,
start=start,
end=end,
confidence=1.0,
source=TranscriptSource.SUBTITLE_FILE,
)
)
return segments
def parse_vtt(path: str) -> list[TranscriptSegment]:
"""Parse a WebVTT subtitle file into TranscriptSegments.
Args:
path: Path to .vtt file.
Returns:
List of TranscriptSegment objects.
"""
content = Path(path).read_text(encoding="utf-8", errors="replace")
segments = []
# Skip VTT header
lines = content.strip().split("\n")
i = 0
# Skip WEBVTT header and any metadata
while i < len(lines) and not re.match(r"\d{2}:\d{2}", lines[i]):
i += 1
current_text_lines = []
current_start = 0.0
current_end = 0.0
in_cue = False
while i < len(lines):
line = lines[i].strip()
i += 1
if "-->" in line:
# Save previous cue
if in_cue and current_text_lines:
text = " ".join(current_text_lines).strip()
text = re.sub(r"<[^>]+>", "", text)
if text:
segments.append(
TranscriptSegment(
text=text,
start=current_start,
end=current_end,
confidence=1.0,
source=TranscriptSource.SUBTITLE_FILE,
)
)
parts = line.split("-->")
current_start = _parse_timestamp_vtt(parts[0])
current_end = _parse_timestamp_vtt(parts[1].split()[0])
current_text_lines = []
in_cue = True
elif line == "":
if in_cue and current_text_lines:
text = " ".join(current_text_lines).strip()
text = re.sub(r"<[^>]+>", "", text)
if text:
segments.append(
TranscriptSegment(
text=text,
start=current_start,
end=current_end,
confidence=1.0,
source=TranscriptSource.SUBTITLE_FILE,
)
)
current_text_lines = []
in_cue = False
elif in_cue:
# Skip cue identifiers (numeric lines before timestamps)
if not line.isdigit():
current_text_lines.append(line)
# Handle last cue
if in_cue and current_text_lines:
text = " ".join(current_text_lines).strip()
text = re.sub(r"<[^>]+>", "", text)
if text:
segments.append(
TranscriptSegment(
text=text,
start=current_start,
end=current_end,
confidence=1.0,
source=TranscriptSource.SUBTITLE_FILE,
)
)
return segments
# =============================================================================
# Whisper Stub (Tier 2)
# =============================================================================
def transcribe_with_whisper(
audio_path: str, # noqa: ARG001
model: str = "base", # noqa: ARG001
language: str | None = None, # noqa: ARG001
) -> list[TranscriptSegment]:
"""Transcribe audio using faster-whisper (Tier 2).
Raises:
RuntimeError: If faster-whisper is not installed.
NotImplementedError: Always when faster-whisper is installed (Tier 2 placeholder).
"""
if not HAS_WHISPER:
raise RuntimeError(
"faster-whisper is required for Whisper transcription.\n"
'Install with: pip install "skill-seekers[video-full]"\n'
"Or: pip install faster-whisper"
)
# Tier 2 implementation placeholder
raise NotImplementedError("Whisper transcription will be implemented in Tier 2")
# =============================================================================
# Main Entry Point
# =============================================================================
def get_transcript(
video_info: VideoInfo,
config: VideoSourceConfig,
) -> tuple[list[TranscriptSegment], TranscriptSource]:
"""Get transcript for a video, trying available methods in priority order.
Priority:
1. YouTube API (for YouTube videos)
2. Subtitle files (SRT/VTT alongside local files)
3. Whisper fallback (Tier 2)
4. NONE (no transcript available)
Args:
video_info: Video metadata.
config: Video source configuration.
Returns:
Tuple of (transcript segments, source type).
"""
languages = config.languages or ["en"]
# 1. Try YouTube API for YouTube videos
if video_info.source_type == VideoSourceType.YOUTUBE and HAS_YOUTUBE_TRANSCRIPT:
try:
segments, source = extract_youtube_transcript(video_info.video_id, languages)
if segments:
logger.info(
f"Got {len(segments)} transcript segments via YouTube API "
f"({source.value}) for '{video_info.title}'"
)
return segments, source
except Exception as e:
logger.warning(f"YouTube transcript failed: {e}")
# 2. Try subtitle files for local videos
if video_info.file_path:
base = Path(video_info.file_path).stem
parent = Path(video_info.file_path).parent
for ext in [".srt", ".vtt"]:
sub_path = parent / f"{base}{ext}"
if sub_path.exists():
logger.info(f"Found subtitle file: {sub_path}")
segments = parse_srt(str(sub_path)) if ext == ".srt" else parse_vtt(str(sub_path))
if segments:
return segments, TranscriptSource.SUBTITLE_FILE
# 3. Whisper fallback (Tier 2 — only if installed)
if HAS_WHISPER and video_info.file_path:
try:
segments = transcribe_with_whisper(
video_info.file_path,
model=config.whisper_model,
language=languages[0] if languages else None,
)
if segments:
return segments, TranscriptSource.WHISPER
except (RuntimeError, NotImplementedError):
pass
# 4. No transcript available
logger.warning(f"No transcript available for '{video_info.title}'")
return [], TranscriptSource.NONE
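
Review note on the subtitle parsing above: the SRT path can be exercised without the `skill_seekers` models. This is a simplified, standalone sketch that yields plain tuples instead of `TranscriptSegment` objects. The names `srt_timestamp_to_seconds`, `iter_srt_cues`, and `SAMPLE` are illustrative only. Unlike `_parse_timestamp_srt`, this version raises `ValueError` on a malformed timestamp rather than returning 0.0.

```python
import re


def srt_timestamp_to_seconds(ts: str) -> float:
    """Convert an SRT timestamp (HH:MM:SS,mmm) to seconds."""
    hours, minutes, seconds = ts.strip().replace(",", ".").split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def iter_srt_cues(content: str):
    """Yield (start, end, text) tuples from SRT text, skipping blocks without a timestamp."""
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().split("\n")
        # The timestamp line is the first one containing the arrow separator.
        ts_index = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if ts_index is None:
            continue
        start_raw, _, end_raw = lines[ts_index].partition("-->")
        # Join the text lines after the timestamp and strip inline markup tags.
        text = re.sub(r"<[^>]+>", "", " ".join(lines[ts_index + 1:])).strip()
        if text:
            yield srt_timestamp_to_seconds(start_raw), srt_timestamp_to_seconds(end_raw), text


SAMPLE = """1
00:00:01,500 --> 00:00:04,000
Hello <i>world</i>

2
00:01:00,000 --> 00:01:02,250
Second cue
"""

cues = list(iter_srt_cues(SAMPLE))
for start, end, text in cues:
    print(f"{start:8.2f} -> {end:8.2f}  {text}")
```

A generator keeps the sketch short; the module itself builds a list because callers need `len(segments)` for logging.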

File diff suppressed because it is too large

@@ -3,20 +3,21 @@
Skill Seeker MCP Server (FastMCP Implementation)
Modern, decorator-based MCP server using FastMCP for simplified tool registration.
Provides 25 tools for generating Claude AI skills from documentation.
Provides 33 tools for generating Claude AI skills from documentation.
This is a streamlined alternative to server.py (2200 lines → 708 lines, 68% reduction).
All tool implementations are delegated to modular tool files in tools/ directory.
**Architecture:**
- FastMCP server with decorator-based tool registration
- 25 tools organized into 6 categories:
- 33 tools organized into 7 categories:
* Config tools (3): generate_config, list_configs, validate_config
* Scraping tools (8): estimate_pages, scrape_docs, scrape_github, scrape_pdf, scrape_codebase, detect_patterns, extract_test_examples, build_how_to_guides, extract_config_patterns
* Scraping tools (10): estimate_pages, scrape_docs, scrape_github, scrape_pdf, scrape_video, scrape_codebase, detect_patterns, extract_test_examples, build_how_to_guides, extract_config_patterns
* Packaging tools (4): package_skill, upload_skill, enhance_skill, install_skill
* Splitting tools (2): split_config, generate_router
* Source tools (4): fetch_config, submit_config, add_config_source, list_config_sources, remove_config_source
* Source tools (5): fetch_config, submit_config, add_config_source, list_config_sources, remove_config_source
* Vector Database tools (4): export_to_weaviate, export_to_chroma, export_to_faiss, export_to_qdrant
* Workflow tools (5): list_workflows, get_workflow, create_workflow, update_workflow, delete_workflow
**Usage:**
# Stdio transport (default, backward compatible)
@@ -98,6 +99,7 @@ try:
scrape_docs_impl,
scrape_github_impl,
scrape_pdf_impl,
scrape_video_impl,
# Splitting tools
split_config_impl,
submit_config_impl,
@@ -139,6 +141,7 @@ except ImportError:
scrape_docs_impl,
scrape_github_impl,
scrape_pdf_impl,
scrape_video_impl,
split_config_impl,
submit_config_impl,
upload_skill_impl,
@@ -249,7 +252,7 @@ async def validate_config(config_path: str) -> str:
# ============================================================================
# SCRAPING TOOLS (4 tools)
# SCRAPING TOOLS (10 tools)
# ============================================================================
@@ -420,6 +423,95 @@ async def scrape_pdf(
return str(result)
@safe_tool_decorator(
description="Extract transcripts and metadata from videos (YouTube, Vimeo, local files) and build Claude skill."
)
async def scrape_video(
url: str | None = None,
video_file: str | None = None,
playlist: str | None = None,
name: str | None = None,
description: str | None = None,
languages: str | None = None,
from_json: str | None = None,
visual: bool = False,
whisper_model: str | None = None,
visual_interval: float | None = None,
visual_min_gap: float | None = None,
visual_similarity: float | None = None,
vision_ocr: bool = False,
start_time: str | None = None,
end_time: str | None = None,
setup: bool = False,
) -> str:
"""
Scrape video content and build Claude skill.
Args:
url: Video URL (YouTube, Vimeo)
video_file: Local video file path
playlist: Playlist URL
name: Skill name
description: Skill description
languages: Transcript language preferences (comma-separated)
from_json: Build from extracted JSON file
visual: Enable visual frame extraction (requires video-full extras)
whisper_model: Whisper model size for local transcription (e.g., base, small, medium, large)
visual_interval: Seconds between frame captures (default: 5.0)
visual_min_gap: Minimum seconds between kept frames (default: 2.0)
visual_similarity: Similarity threshold to skip duplicate frames 0.0-1.0 (default: 0.95)
vision_ocr: Use vision model for OCR on extracted frames
start_time: Start time for extraction (seconds, MM:SS, or HH:MM:SS). Single video only.
end_time: End time for extraction (seconds, MM:SS, or HH:MM:SS). Single video only.
setup: Auto-detect GPU and install visual extraction deps (PyTorch, easyocr, etc.)
Returns:
Video scraping results with file paths.
"""
if setup:
from skill_seekers.cli.video_setup import run_setup
rc = run_setup(interactive=False)
return "Setup completed successfully." if rc == 0 else "Setup failed. Check logs."
args = {}
if url:
args["url"] = url
if video_file:
args["video_file"] = video_file
if playlist:
args["playlist"] = playlist
if name:
args["name"] = name
if description:
args["description"] = description
if languages:
args["languages"] = languages
if from_json:
args["from_json"] = from_json
if start_time:
args["start_time"] = start_time
if end_time:
args["end_time"] = end_time
if visual:
args["visual"] = visual
if whisper_model:
args["whisper_model"] = whisper_model
if visual_interval is not None:
args["visual_interval"] = visual_interval
if visual_min_gap is not None:
args["visual_min_gap"] = visual_min_gap
if visual_similarity is not None:
args["visual_similarity"] = visual_similarity
if vision_ocr:
args["vision_ocr"] = vision_ocr
result = await scrape_video_impl(args)
if isinstance(result, list) and result:
return result[0].text if hasattr(result[0], "text") else str(result[0])
return str(result)
@safe_tool_decorator(
description="Analyze local codebase and extract code knowledge. Walks directory tree, analyzes code files, extracts signatures, docstrings, and optionally generates API reference documentation and dependency graphs."
)
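
Review note on `scrape_video` above: the chain of `if` statements that fills `args` follows one rule, which could be captured in a small helper. The sketch below is an illustration under that assumption. `build_tool_args` is a hypothetical name and is not part of this commit.

```python
from typing import Any


def build_tool_args(**params: Any) -> dict[str, Any]:
    """Keep only the parameters the caller actually set.

    None is always dropped. False and "" are dropped too, matching the
    truthiness checks used for strings and flags. Numeric values such as
    0.0 are kept, mirroring the `is not None` checks on the float options.
    """
    kept: dict[str, Any] = {}
    for key, value in params.items():
        if value is None:
            continue
        if isinstance(value, (bool, str)) and not value:
            continue
        kept[key] = value
    return kept


args = build_tool_args(
    url="https://example.com/watch?v=abc",
    name=None,
    visual=True,
    vision_ocr=False,
    visual_interval=0.0,
)
print(args)
```

The explicit `isinstance` check matters because a plain truthiness filter would silently discard a legitimate `0.0` interval.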

@@ -63,6 +63,9 @@ from .scraping_tools import (
from .scraping_tools import (
scrape_pdf_tool as scrape_pdf_impl,
)
from .scraping_tools import (
scrape_video_tool as scrape_video_impl,
)
from .source_tools import (
add_config_source_tool as add_config_source_impl,
)
@@ -123,6 +126,7 @@ __all__ = [
"scrape_docs_impl",
"scrape_github_impl",
"scrape_pdf_impl",
"scrape_video_impl",
"scrape_codebase_impl",
"detect_patterns_impl",
"extract_test_examples_impl",

@@ -356,6 +356,124 @@ async def scrape_pdf_tool(args: dict) -> list[TextContent]:
return [TextContent(type="text", text=f"{output}\n\n❌ Error:\n{stderr}")]
async def scrape_video_tool(args: dict) -> list[TextContent]:
"""
Scrape video content (YouTube, Vimeo, local files) and build Claude skill.
Extracts transcripts, metadata, and optionally visual content from videos
to create skills.
Args:
args: Dictionary containing:
- url (str, optional): Video URL (YouTube, Vimeo)
- video_file (str, optional): Local video file path
- playlist (str, optional): Playlist URL
- name (str, optional): Skill name
- description (str, optional): Skill description
- languages (str, optional): Language preferences (comma-separated)
- from_json (str, optional): Build from extracted JSON file
- visual (bool, optional): Enable visual frame extraction (default: False)
- whisper_model (str, optional): Whisper model size (default: base)
- visual_interval (float, optional): Seconds between frame captures (default: 5.0)
- visual_min_gap (float, optional): Minimum seconds between kept frames (default: 2.0)
- visual_similarity (float, optional): Similarity threshold to skip duplicate frames (default: 0.95)
- vision_ocr (bool, optional): Use vision model for OCR on frames (default: False)
- start_time (str, optional): Start time for extraction (seconds, MM:SS, or HH:MM:SS)
- end_time (str, optional): End time for extraction (seconds, MM:SS, or HH:MM:SS)
- setup (bool, optional): Auto-detect GPU and install visual extraction deps
Returns:
List[TextContent]: Tool execution results
"""
# Handle --setup early exit
if args.get("setup", False):
from skill_seekers.cli.video_setup import run_setup
rc = run_setup(interactive=False)
msg = "Setup completed successfully." if rc == 0 else "Setup failed. Check logs."
return [TextContent(type="text", text=msg)]
url = args.get("url")
video_file = args.get("video_file")
playlist = args.get("playlist")
name = args.get("name")
description = args.get("description")
languages = args.get("languages")
from_json = args.get("from_json")
visual = args.get("visual", False)
whisper_model = args.get("whisper_model")
visual_interval = args.get("visual_interval")
visual_min_gap = args.get("visual_min_gap")
visual_similarity = args.get("visual_similarity")
vision_ocr = args.get("vision_ocr", False)
start_time = args.get("start_time")
end_time = args.get("end_time")
# Build command
cmd = [sys.executable, str(CLI_DIR / "video_scraper.py")]
if from_json:
cmd.extend(["--from-json", from_json])
elif url:
cmd.extend(["--url", url])
if name:
cmd.extend(["--name", name])
if description:
cmd.extend(["--description", description])
if languages:
cmd.extend(["--languages", languages])
elif video_file:
cmd.extend(["--video-file", video_file])
if name:
cmd.extend(["--name", name])
if description:
cmd.extend(["--description", description])
elif playlist:
cmd.extend(["--playlist", playlist])
if name:
cmd.extend(["--name", name])
else:
return [
TextContent(
type="text",
text="❌ Error: Must specify --url, --video-file, --playlist, or --from-json",
)
]
# Visual extraction parameters
if visual:
cmd.append("--visual")
if whisper_model:
cmd.extend(["--whisper-model", whisper_model])
if visual_interval is not None:
cmd.extend(["--visual-interval", str(visual_interval)])
if visual_min_gap is not None:
cmd.extend(["--visual-min-gap", str(visual_min_gap)])
if visual_similarity is not None:
cmd.extend(["--visual-similarity", str(visual_similarity)])
if vision_ocr:
cmd.append("--vision-ocr")
if start_time:
cmd.extend(["--start-time", str(start_time)])
if end_time:
cmd.extend(["--end-time", str(end_time)])
# Run video_scraper.py with streaming
timeout = 600 # 10 minutes for video extraction
progress_msg = "🎬 Scraping video content...\n"
progress_msg += f"⏱️ Maximum time: {timeout // 60} minutes\n\n"
stdout, stderr, returncode = run_subprocess_with_streaming(cmd, timeout=timeout)
output = progress_msg + stdout
if returncode == 0:
return [TextContent(type="text", text=output)]
else:
return [TextContent(type="text", text=f"{output}\n\n❌ Error:\n{stderr}")]
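The `start_time` and `end_time` parameters above accept plain seconds, `MM:SS`, or `HH:MM:SS`. This diff does not show how `video_scraper.py` parses them, so the following is only a minimal sketch of one way to normalise the three formats to seconds. The helper name `parse_timestamp` is hypothetical and not part of the project.

```python
def parse_timestamp(value: str) -> float:
    """Convert 'SS', 'MM:SS', or 'HH:MM:SS' into seconds (hypothetical helper)."""
    parts = str(value).strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"Unsupported time format: {value!r}")
    seconds = 0.0
    # Fold left to right: each ':' shifts the running total by a factor of 60.
    for part in parts:
        seconds = seconds * 60 + float(part)
    return seconds


if __name__ == "__main__":
    for sample in ("90", "1:30", "01:02:03"):
        print(sample, "->", parse_timestamp(sample))
```

Malformed input such as `1::30` raises `ValueError` from `float("")`, so a caller could reject a bad value before spawning the subprocess.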
async def scrape_github_tool(args: dict) -> list[TextContent]:
"""
Scrape GitHub repository and build Claude skill.


@@ -0,0 +1,111 @@
name: video-tutorial
description: >
Video tutorial enhancement workflow. Cleans OCR noise, reconstructs code from
transcript + visual data, detects programming languages, and synthesizes a
coherent tutorial skill from raw video extraction output.
version: "1.0"
applies_to:
- video_scraping
variables: {}
stages:
- name: ocr_code_cleanup
type: custom
target: skill_md
enabled: true
uses_history: false
prompt: >
You are reviewing code blocks extracted from video tutorial OCR.
The OCR output is noisy — it contains line numbers, UI chrome text,
garbled characters, and incomplete lines.
Clean each code block by:
1. Remove line numbers that OCR captured (leading digits like "1 ", "2 ", "23 ")
2. Remove UI elements (tab bar text, file names, button labels)
3. Fix common OCR errors (l/1, O/0, rn/m confusions)
4. Remove animation timeline numbers or frame counters
5. Strip trailing whitespace and normalize indentation
Output JSON with:
- "cleaned_blocks": array of cleaned code strings
- "languages_detected": map of block index to detected language
- "confidence": overall confidence in the cleanup (0-1)
- name: language_detection
type: custom
target: skill_md
enabled: true
uses_history: true
prompt: >
Based on the previous OCR cleanup results and the transcript content,
determine the programming language for each code block.
Detection strategy (in priority order):
1. Narrator mentions: "in GDScript", "this Python function", "our C# class"
2. Code patterns: extends/func/signal=GDScript, def/import=Python,
function/const/let=JavaScript, using/namespace=C#
3. File extensions visible in OCR (.gd, .py, .js, .cs)
4. Framework context from transcript (Godot=GDScript, Unity=C#, Django=Python)
Output JSON with:
- "language_map": map of block index to language identifier
- "primary_language": the main language used in the tutorial
- "framework": detected framework/engine if any
- name: tutorial_synthesis
type: custom
target: skill_md
enabled: true
uses_history: true
prompt: >
Synthesize the cleaned code blocks, detected languages, and transcript
into a coherent tutorial structure.
Group content by TOPIC rather than timestamp:
1. Identify the main concepts taught in the tutorial
2. Group related code blocks under concept headings
3. Use narrator explanations as descriptions for each code block
4. Build a progressive learning path where concepts build on each other
5. Show final working code for each concept, not intermediate OCR states
Use the Audio-Visual Alignment pairs (code + narrator text) as the
primary source for creating annotated examples.
Output JSON with:
- "sections": array of tutorial sections with title, description, code examples
- "prerequisites": what the viewer should know beforehand
- "key_concepts": important terms and their definitions from the tutorial
- "learning_path": ordered list of concept names
- name: skill_polish
type: custom
target: skill_md
enabled: true
uses_history: true
prompt: >
Using all previous stage results, polish the SKILL.md for this video tutorial.
Create:
1. Clear "When to Use This Skill" with specific trigger conditions
2. Quick Reference with 5-10 clean, annotated code examples
3. Step-by-step guide following the tutorial flow
4. Key concepts with definitions from the narrator
5. Proper language tags on all code fences
Rules:
- Never include raw OCR artifacts (line numbers, UI chrome)
- Always use correct language tags
- Keep code examples short and focused (5-30 lines)
- Make it actionable for someone implementing what the tutorial teaches
Output JSON with:
- "improved_overview": enhanced overview section
- "quick_start": concise getting-started snippet
- "key_concepts": essential concepts with definitions
- "code_examples": array of clean, annotated code examples
post_process:
reorder_sections: []
add_metadata:
enhanced: true
workflow: video-tutorial
source_type: video
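The workflow above delegates cleanup and language detection to LLM prompts. As an illustration of two heuristics those prompts describe (stripping OCR-captured line numbers, and guessing the language from the keyword patterns listed in `language_detection`), here is a deterministic sketch. The function names are hypothetical, the keyword table is copied from the prompt text, and nothing here is claimed to be what the pipeline itself runs.

```python
import re
from typing import Optional

# Leading line number, one whitespace character, then the real code.
_LINE_NO = re.compile(r"^\s*\d{1,4}\s(.*)$")

# Keyword hints taken from the language_detection prompt above.
_PATTERNS = {
    "gdscript": ("extends", "func", "signal"),
    "python": ("def", "import"),
    "javascript": ("function", "const", "let"),
    "csharp": ("using", "namespace"),
}


def strip_line_numbers(block: str) -> str:
    """Remove leading line numbers, but only if every non-blank line has one."""
    lines = block.splitlines()
    non_blank = [ln for ln in lines if ln.strip()]
    if not non_blank or not all(_LINE_NO.match(ln) for ln in non_blank):
        return block
    return "\n".join(_LINE_NO.sub(r"\1", ln) if ln.strip() else ln for ln in lines)


def guess_language(code: str) -> Optional[str]:
    """Return the language whose keywords occur most often, or None if none occur."""
    scores = {
        lang: sum(len(re.findall(rf"\b{kw}\b", code)) for kw in keywords)
        for lang, keywords in _PATTERNS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    raw = "1 extends Node\n2 func _ready():\n3     pass"
    cleaned = strip_line_numbers(raw)
    print(cleaned)
    print(guess_language(cleaned))
```

Keyword counting is crude (`import` also appears in JavaScript, for example), which is presumably why the workflow ranks narrator mentions above code patterns.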


@@ -24,12 +24,12 @@ class TestParserRegistry:
def test_all_parsers_registered(self):
"""Test that all parsers are registered."""
- assert len(PARSERS) == 22, f"Expected 22 parsers, got {len(PARSERS)}"
+ assert len(PARSERS) == 23, f"Expected 23 parsers, got {len(PARSERS)}"
def test_get_parser_names(self):
"""Test getting list of parser names."""
names = get_parser_names()
- assert len(names) == 22
+ assert len(names) == 23
assert "scrape" in names
assert "github" in names
assert "package" in names
@@ -37,6 +37,7 @@ class TestParserRegistry:
assert "analyze" in names
assert "config" in names
assert "workflows" in names
assert "video" in names
def test_all_parsers_are_subcommand_parsers(self):
"""Test that all parsers inherit from SubcommandParser."""
@@ -242,9 +243,9 @@ class TestBackwardCompatibility:
assert cmd in names, f"Command '{cmd}' not found in parser registry!"
def test_command_count_matches(self):
- """Test that we have exactly 22 commands (includes new create, workflows, and word commands)."""
- assert len(PARSERS) == 22
- assert len(get_parser_names()) == 22
+ """Test that we have exactly 23 commands (includes create, workflows, word, and video commands)."""
+ assert len(PARSERS) == 23
+ assert len(get_parser_names()) == 23
if __name__ == "__main__":

3400
tests/test_video_scraper.py Normal file

File diff suppressed because it is too large

679
tests/test_video_setup.py Normal file

@@ -0,0 +1,679 @@
#!/usr/bin/env python3
"""
Tests for Video Setup (cli/video_setup.py) and video_visual.py resilience.
Tests cover:
- GPU detection (NVIDIA, AMD ROCm, AMD without ROCm, CPU fallback)
- CUDA / ROCm version → index URL mapping
- PyTorch installation (mocked subprocess)
- Visual deps installation (mocked subprocess)
- Installation verification
- run_setup orchestrator
- Venv detection and creation
- System dep checks (tesseract binary)
- ROCm env var configuration
- Module selection (SetupModules)
- Tesseract circuit breaker (video_visual.py)
- --setup flag in VIDEO_ARGUMENTS and early-exit in video_scraper
"""
import os
import subprocess
import sys
import tempfile
import unittest
from unittest.mock import MagicMock, patch
from skill_seekers.cli.video_setup import (
_BASE_VIDEO_DEPS,
GPUInfo,
GPUVendor,
SetupModules,
_build_visual_deps,
_cuda_version_to_index_url,
_detect_distro,
_PYTORCH_BASE,
_rocm_version_to_index_url,
check_tesseract,
configure_rocm_env,
create_venv,
detect_gpu,
get_venv_activate_cmd,
get_venv_python,
install_torch,
install_visual_deps,
is_in_venv,
run_setup,
verify_installation,
)
# =============================================================================
# GPU Detection Tests
# =============================================================================
class TestGPUDetection(unittest.TestCase):
"""Tests for detect_gpu() and its helpers."""
@patch("skill_seekers.cli.video_setup.shutil.which")
@patch("skill_seekers.cli.video_setup.subprocess.run")
def test_nvidia_detected(self, mock_run, mock_which):
"""nvidia-smi present → GPUVendor.NVIDIA."""
mock_which.side_effect = lambda cmd: "/usr/bin/nvidia-smi" if cmd == "nvidia-smi" else None
mock_run.return_value = MagicMock(
returncode=0,
stdout=(
"+-------------------------+\n"
"| NVIDIA GeForce RTX 4090 On |\n"
"| CUDA Version: 12.4 |\n"
"+-------------------------+\n"
),
)
gpu = detect_gpu()
assert gpu.vendor == GPUVendor.NVIDIA
assert "12.4" in gpu.compute_version
assert "cu124" in gpu.index_url
@patch("skill_seekers.cli.video_setup.shutil.which")
@patch("skill_seekers.cli.video_setup.subprocess.run")
@patch("skill_seekers.cli.video_setup._read_rocm_version", return_value="6.3.1")
def test_amd_rocm_detected(self, mock_rocm_ver, mock_run, mock_which):
"""rocminfo present → GPUVendor.AMD."""
def which_side(cmd):
if cmd == "nvidia-smi":
return None
if cmd == "rocminfo":
return "/usr/bin/rocminfo"
return None
mock_which.side_effect = which_side
mock_run.return_value = MagicMock(
returncode=0,
stdout="Marketing Name: AMD Radeon RX 7900 XTX\n",
)
gpu = detect_gpu()
assert gpu.vendor == GPUVendor.AMD
assert "rocm6.3" in gpu.index_url
@patch("skill_seekers.cli.video_setup.shutil.which")
@patch("skill_seekers.cli.video_setup.subprocess.run")
def test_amd_no_rocm_fallback(self, mock_run, mock_which):
"""AMD GPU in lspci but no ROCm → AMD vendor, CPU index URL."""
def which_side(cmd):
if cmd == "lspci":
return "/usr/bin/lspci"
return None
mock_which.side_effect = which_side
mock_run.return_value = MagicMock(
returncode=0,
stdout="06:00.0 VGA compatible controller: AMD/ATI Navi 31 [Radeon RX 7900 XTX]\n",
)
gpu = detect_gpu()
assert gpu.vendor == GPUVendor.AMD
assert "cpu" in gpu.index_url
assert any("ROCm is not installed" in d for d in gpu.details)
@patch("skill_seekers.cli.video_setup.shutil.which", return_value=None)
def test_cpu_fallback(self, mock_which):
"""No GPU tools found → GPUVendor.NONE."""
gpu = detect_gpu()
assert gpu.vendor == GPUVendor.NONE
assert "cpu" in gpu.index_url
@patch("skill_seekers.cli.video_setup.shutil.which")
@patch("skill_seekers.cli.video_setup.subprocess.run")
def test_nvidia_smi_error(self, mock_run, mock_which):
"""nvidia-smi returns non-zero → skip to next check."""
mock_which.side_effect = lambda cmd: (
"/usr/bin/nvidia-smi" if cmd == "nvidia-smi" else None
)
mock_run.return_value = MagicMock(returncode=1, stdout="")
gpu = detect_gpu()
assert gpu.vendor == GPUVendor.NONE
@patch("skill_seekers.cli.video_setup.shutil.which")
@patch("skill_seekers.cli.video_setup.subprocess.run")
def test_nvidia_smi_timeout(self, mock_run, mock_which):
"""nvidia-smi times out → skip to next check."""
mock_which.side_effect = lambda cmd: (
"/usr/bin/nvidia-smi" if cmd == "nvidia-smi" else None
)
mock_run.side_effect = subprocess.TimeoutExpired(cmd="nvidia-smi", timeout=10)
gpu = detect_gpu()
assert gpu.vendor == GPUVendor.NONE
@patch("skill_seekers.cli.video_setup.shutil.which")
@patch("skill_seekers.cli.video_setup.subprocess.run")
def test_rocminfo_error(self, mock_run, mock_which):
"""rocminfo returns non-zero → skip to next check."""
def which_side(cmd):
if cmd == "nvidia-smi":
return None
if cmd == "rocminfo":
return "/usr/bin/rocminfo"
return None
mock_which.side_effect = which_side
mock_run.return_value = MagicMock(returncode=1, stdout="")
gpu = detect_gpu()
assert gpu.vendor == GPUVendor.NONE
# =============================================================================
# Version Mapping Tests
# =============================================================================
class TestVersionMapping(unittest.TestCase):
"""Tests for CUDA/ROCm version → index URL mapping."""
def test_cuda_124(self):
assert _cuda_version_to_index_url("12.4") == f"{_PYTORCH_BASE}/cu124"
def test_cuda_126(self):
assert _cuda_version_to_index_url("12.6") == f"{_PYTORCH_BASE}/cu124"
def test_cuda_121(self):
assert _cuda_version_to_index_url("12.1") == f"{_PYTORCH_BASE}/cu121"
def test_cuda_118(self):
assert _cuda_version_to_index_url("11.8") == f"{_PYTORCH_BASE}/cu118"
def test_cuda_old_falls_to_cpu(self):
assert _cuda_version_to_index_url("10.2") == f"{_PYTORCH_BASE}/cpu"
def test_cuda_invalid_string(self):
assert _cuda_version_to_index_url("garbage") == f"{_PYTORCH_BASE}/cpu"
def test_rocm_63(self):
assert _rocm_version_to_index_url("6.3.1") == f"{_PYTORCH_BASE}/rocm6.3"
def test_rocm_60(self):
assert _rocm_version_to_index_url("6.0") == f"{_PYTORCH_BASE}/rocm6.2.4"
def test_rocm_old_falls_to_cpu(self):
assert _rocm_version_to_index_url("5.4") == f"{_PYTORCH_BASE}/cpu"
def test_rocm_invalid(self):
assert _rocm_version_to_index_url("bad") == f"{_PYTORCH_BASE}/cpu"
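The CUDA mapping tests above pin down behaviour without showing the implementation. The sketch below is one threshold-based mapping that would satisfy those assertions, plus a regex that pulls the version out of `nvidia-smi` text like the mocked output in `test_nvidia_detected`. All names are hypothetical, `PYTORCH_BASE` is assumed to be the public PyTorch wheel index (the real `_PYTORCH_BASE` value is not shown in this diff), and the handling of versions not covered by the tests, such as 12.0, is a guess.

```python
import re
from typing import Optional

# Assumption: public PyTorch wheel index; the project's _PYTORCH_BASE is not shown here.
PYTORCH_BASE = "https://download.pytorch.org/whl"

# (minimum CUDA version, wheel tag), newest first.
_CUDA_TAGS = [((12, 4), "cu124"), ((12, 1), "cu121"), ((11, 8), "cu118")]


def parse_cuda_version(smi_output: str) -> Optional[str]:
    """Extract 'X.Y' from a 'CUDA Version: X.Y' banner, or None if absent."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return match.group(1) if match else None


def cuda_index_url(version: str) -> str:
    """Map a CUDA version string to a wheel index URL, falling back to CPU."""
    try:
        major, minor = (int(part) for part in version.split(".")[:2])
    except ValueError:
        # Unparseable or incomplete version string.
        return f"{PYTORCH_BASE}/cpu"
    for minimum, tag in _CUDA_TAGS:
        if (major, minor) >= minimum:
            return f"{PYTORCH_BASE}/{tag}"
    return f"{PYTORCH_BASE}/cpu"


if __name__ == "__main__":
    banner = "| CUDA Version: 12.4 |\n"
    print(cuda_index_url(parse_cuda_version(banner) or ""))
```

Walking the thresholds newest-first means a driver newer than any listed tag (12.6 in the tests) resolves to the most recent listed wheel rather than to CPU.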
# =============================================================================
# Venv Tests
# =============================================================================
class TestVenv(unittest.TestCase):
"""Tests for venv detection and creation."""
def test_is_in_venv_returns_bool(self):
result = is_in_venv()
assert isinstance(result, bool)
def test_is_in_venv_detects_prefix_mismatch(self):
# If sys.prefix != sys.base_prefix, we're in a venv
with patch.object(sys, "prefix", "/some/venv"), \
patch.object(sys, "base_prefix", "/usr"):
assert is_in_venv() is True
def test_is_in_venv_detects_no_venv(self):
with patch.object(sys, "prefix", "/usr"), \
patch.object(sys, "base_prefix", "/usr"):
assert is_in_venv() is False
def test_create_venv_in_tempdir(self):
with tempfile.TemporaryDirectory() as tmpdir:
venv_path = os.path.join(tmpdir, "test_venv")
result = create_venv(venv_path)
assert result is True
assert os.path.isdir(venv_path)
def test_create_venv_already_exists(self):
with tempfile.TemporaryDirectory() as tmpdir:
# Create it once
create_venv(tmpdir)
# Creating again should succeed (already exists)
assert create_venv(tmpdir) is True
def test_get_venv_python_linux(self):
with patch("skill_seekers.cli.video_setup.platform.system", return_value="Linux"):
path = get_venv_python("/path/.venv")
assert path.endswith("bin/python")
def test_get_venv_activate_cmd_linux(self):
with patch("skill_seekers.cli.video_setup.platform.system", return_value="Linux"):
cmd = get_venv_activate_cmd("/path/.venv")
assert "source" in cmd
assert "bin/activate" in cmd
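The venv tests above rely on the standard rule that `sys.prefix` differs from `sys.base_prefix` inside a virtual environment, and on the POSIX layout `bin/python`. The sketch below restates both under hypothetical names. The Windows branch (`Scripts\python.exe`) follows the standard `venv` layout but is not exercised by the tests shown, so treat it as an assumption about what `get_venv_python` does.

```python
import sys
from pathlib import PurePosixPath, PureWindowsPath


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base prefix)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def venv_python(venv_dir: str, system: str) -> str:
    """Return the interpreter path inside venv_dir for the given platform name."""
    if system == "Windows":
        return str(PureWindowsPath(venv_dir) / "Scripts" / "python.exe")
    return str(PurePosixPath(venv_dir) / "bin" / "python")


if __name__ == "__main__":
    print(in_virtualenv())
    print(venv_python("/path/.venv", "Linux"))
```

Passing the platform name as an argument keeps the function testable without patching `platform.system`, which is a design choice of this sketch and not necessarily how the project structures it.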
# =============================================================================
# System Dep Check Tests
# =============================================================================
class TestSystemDeps(unittest.TestCase):
"""Tests for system dependency checks."""
@patch("skill_seekers.cli.video_setup.shutil.which", return_value=None)
def test_tesseract_not_installed(self, mock_which):
result = check_tesseract()
assert result["installed"] is False
assert result["has_eng"] is False
assert isinstance(result["install_cmd"], str)
@patch("skill_seekers.cli.video_setup.subprocess.run")
@patch("skill_seekers.cli.video_setup.shutil.which", return_value="/usr/bin/tesseract")
def test_tesseract_installed_with_eng(self, mock_which, mock_run):
mock_run.side_effect = [
# --version call
MagicMock(returncode=0, stdout="tesseract 5.3.0\n", stderr=""),
# --list-langs call
MagicMock(returncode=0, stdout="List of available languages:\neng\nosd\n", stderr=""),
]
result = check_tesseract()
assert result["installed"] is True
assert result["has_eng"] is True
@patch("skill_seekers.cli.video_setup.subprocess.run")
@patch("skill_seekers.cli.video_setup.shutil.which", return_value="/usr/bin/tesseract")
def test_tesseract_installed_no_eng(self, mock_which, mock_run):
mock_run.side_effect = [
MagicMock(returncode=0, stdout="tesseract 5.3.0\n", stderr=""),
MagicMock(returncode=0, stdout="List of available languages:\nosd\n", stderr=""),
]
result = check_tesseract()
assert result["installed"] is True
assert result["has_eng"] is False
def test_detect_distro_returns_string(self):
result = _detect_distro()
assert isinstance(result, str)
@patch("builtins.open", side_effect=OSError)
def test_detect_distro_no_os_release(self, mock_open):
assert _detect_distro() == "unknown"
# =============================================================================
# ROCm Configuration Tests
# =============================================================================
class TestROCmConfig(unittest.TestCase):
"""Tests for configure_rocm_env()."""
def test_sets_miopen_find_mode(self):
env_backup = os.environ.get("MIOPEN_FIND_MODE")
try:
os.environ.pop("MIOPEN_FIND_MODE", None)
changes = configure_rocm_env()
assert "MIOPEN_FIND_MODE=FAST" in changes
assert os.environ["MIOPEN_FIND_MODE"] == "FAST"
finally:
if env_backup is not None:
os.environ["MIOPEN_FIND_MODE"] = env_backup
def test_does_not_override_existing(self):
env_backup = os.environ.get("MIOPEN_FIND_MODE")
try:
os.environ["MIOPEN_FIND_MODE"] = "NORMAL"
changes = configure_rocm_env()
miopen_changes = [c for c in changes if "MIOPEN_FIND_MODE" in c]
assert len(miopen_changes) == 0
assert os.environ["MIOPEN_FIND_MODE"] == "NORMAL"
finally:
if env_backup is not None:
os.environ["MIOPEN_FIND_MODE"] = env_backup
else:
os.environ.pop("MIOPEN_FIND_MODE", None)
def test_sets_miopen_user_db_path(self):
env_backup = os.environ.get("MIOPEN_USER_DB_PATH")
try:
os.environ.pop("MIOPEN_USER_DB_PATH", None)
changes = configure_rocm_env()
db_changes = [c for c in changes if "MIOPEN_USER_DB_PATH" in c]
assert len(db_changes) == 1
finally:
if env_backup is not None:
os.environ["MIOPEN_USER_DB_PATH"] = env_backup
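The ROCm tests above describe a "set only if unset" contract: `MIOPEN_FIND_MODE` defaults to `FAST`, existing values are left alone, and the function reports what it changed as `KEY=VALUE` strings. A minimal sketch of that contract follows. The name `apply_rocm_defaults` and the default database path are hypothetical; the path the real `configure_rocm_env()` uses is not visible in this diff.

```python
import os
from typing import List, MutableMapping, Optional


def apply_rocm_defaults(env: Optional[MutableMapping[str, str]] = None) -> List[str]:
    """Fill in missing MIOpen variables and return the changes as 'KEY=VALUE' strings."""
    env = os.environ if env is None else env
    defaults = {
        "MIOPEN_FIND_MODE": "FAST",
        # Hypothetical cache location; the project's actual default is not shown.
        "MIOPEN_USER_DB_PATH": os.path.join(os.path.expanduser("~"), ".config", "miopen"),
    }
    changes = []
    for key, value in defaults.items():
        if key not in env:  # never override a value the user already set
            env[key] = value
            changes.append(f"{key}={value}")
    return changes


if __name__ == "__main__":
    scratch = {"MIOPEN_FIND_MODE": "NORMAL"}
    print(apply_rocm_defaults(scratch))
    print(scratch["MIOPEN_FIND_MODE"])
```

Accepting an explicit mapping lets a test pass a plain dict instead of saving and restoring `os.environ`, which would avoid the manual backup blocks used in the tests above.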
# =============================================================================
# Module Selection Tests
# =============================================================================
class TestModuleSelection(unittest.TestCase):
"""Tests for SetupModules and _build_visual_deps."""
def test_default_modules_all_true(self):
m = SetupModules()
assert m.torch is True
assert m.easyocr is True
assert m.opencv is True
assert m.tesseract is True
assert m.scenedetect is True
assert m.whisper is True
def test_build_all_deps(self):
deps = _build_visual_deps(SetupModules())
assert "yt-dlp" in deps
assert "youtube-transcript-api" in deps
assert "easyocr" in deps
assert "opencv-python-headless" in deps
assert "pytesseract" in deps
assert "scenedetect[opencv]" in deps
assert "faster-whisper" in deps
def test_build_no_optional_deps(self):
"""Even with all optional modules off, base video deps are included."""
m = SetupModules(
torch=False, easyocr=False, opencv=False,
tesseract=False, scenedetect=False, whisper=False,
)
deps = _build_visual_deps(m)
assert deps == list(_BASE_VIDEO_DEPS)
def test_build_partial_deps(self):
m = SetupModules(easyocr=True, opencv=True, tesseract=False, scenedetect=False, whisper=False)
deps = _build_visual_deps(m)
assert "yt-dlp" in deps
assert "youtube-transcript-api" in deps
assert "easyocr" in deps
assert "opencv-python-headless" in deps
assert "pytesseract" not in deps
assert "faster-whisper" not in deps
# =============================================================================
# Installation Tests
# =============================================================================
class TestInstallation(unittest.TestCase):
"""Tests for install_torch() and install_visual_deps()."""
@patch("skill_seekers.cli.video_setup.subprocess.run")
def test_install_torch_success(self, mock_run):
mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
gpu = GPUInfo(vendor=GPUVendor.NVIDIA, index_url=f"{_PYTORCH_BASE}/cu124")
assert install_torch(gpu) is True
call_args = mock_run.call_args[0][0]
assert "torch" in call_args
assert "--index-url" in call_args
assert f"{_PYTORCH_BASE}/cu124" in call_args
@patch("skill_seekers.cli.video_setup.subprocess.run")
def test_install_torch_cpu(self, mock_run):
mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
gpu = GPUInfo(vendor=GPUVendor.NONE, index_url=f"{_PYTORCH_BASE}/cpu")
assert install_torch(gpu) is True
call_args = mock_run.call_args[0][0]
assert f"{_PYTORCH_BASE}/cpu" in call_args
@patch("skill_seekers.cli.video_setup.subprocess.run")
def test_install_torch_failure(self, mock_run):
mock_run.return_value = MagicMock(returncode=1, stdout="", stderr="error msg")
gpu = GPUInfo(vendor=GPUVendor.NVIDIA, index_url=f"{_PYTORCH_BASE}/cu124")
assert install_torch(gpu) is False
@patch("skill_seekers.cli.video_setup.subprocess.run")
def test_install_torch_timeout(self, mock_run):
mock_run.side_effect = subprocess.TimeoutExpired(cmd="pip", timeout=600)
gpu = GPUInfo(vendor=GPUVendor.NVIDIA, index_url=f"{_PYTORCH_BASE}/cu124")
assert install_torch(gpu) is False
@patch("skill_seekers.cli.video_setup.subprocess.run")
def test_install_torch_custom_python(self, mock_run):
mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
gpu = GPUInfo(vendor=GPUVendor.NONE, index_url=f"{_PYTORCH_BASE}/cpu")
install_torch(gpu, python_exe="/custom/python")
call_args = mock_run.call_args[0][0]
assert call_args[0] == "/custom/python"
@patch("skill_seekers.cli.video_setup.subprocess.run")
def test_install_visual_deps_success(self, mock_run):
mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
assert install_visual_deps() is True
call_args = mock_run.call_args[0][0]
assert "easyocr" in call_args
@patch("skill_seekers.cli.video_setup.subprocess.run")
def test_install_visual_deps_failure(self, mock_run):
mock_run.return_value = MagicMock(returncode=1, stdout="", stderr="error")
assert install_visual_deps() is False
@patch("skill_seekers.cli.video_setup.subprocess.run")
def test_install_visual_deps_partial_modules(self, mock_run):
mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
modules = SetupModules(easyocr=True, opencv=False, tesseract=False, scenedetect=False, whisper=False)
install_visual_deps(modules)
call_args = mock_run.call_args[0][0]
assert "easyocr" in call_args
assert "opencv-python-headless" not in call_args
@patch("skill_seekers.cli.video_setup.subprocess.run")
def test_install_visual_deps_base_only(self, mock_run):
"""Even with all optional modules off, base video deps get installed."""
mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
modules = SetupModules(easyocr=False, opencv=False, tesseract=False, scenedetect=False, whisper=False)
result = install_visual_deps(modules)
assert result is True
call_args = mock_run.call_args[0][0]
assert "yt-dlp" in call_args
assert "youtube-transcript-api" in call_args
assert "easyocr" not in call_args
# =============================================================================
# Verification Tests
# =============================================================================
class TestVerification(unittest.TestCase):
"""Tests for verify_installation()."""
@patch.dict("sys.modules", {"torch": None, "easyocr": None, "cv2": None})
def test_returns_dict(self):
results = verify_installation()
assert isinstance(results, dict)
def test_expected_keys(self):
results = verify_installation()
for key in ("yt-dlp", "youtube-transcript-api", "torch", "torch.cuda", "torch.rocm", "easyocr", "opencv"):
assert key in results, f"Missing key: {key}"
# =============================================================================
# Orchestrator Tests
# =============================================================================
class TestRunSetup(unittest.TestCase):
"""Tests for run_setup() orchestrator."""
@patch("skill_seekers.cli.video_setup.verify_installation")
@patch("skill_seekers.cli.video_setup.install_visual_deps", return_value=True)
@patch("skill_seekers.cli.video_setup.install_torch", return_value=True)
@patch("skill_seekers.cli.video_setup.check_tesseract")
@patch("skill_seekers.cli.video_setup.detect_gpu")
def test_non_interactive_success(self, mock_detect, mock_tess, mock_torch, mock_deps, mock_verify):
mock_detect.return_value = GPUInfo(
vendor=GPUVendor.NONE, name="CPU-only", index_url=f"{_PYTORCH_BASE}/cpu",
)
mock_tess.return_value = {"installed": True, "has_eng": True, "install_cmd": "", "version": "5.3.0"}
mock_verify.return_value = {
"torch": True, "torch.cuda": False, "torch.rocm": False,
"easyocr": True, "opencv": True, "pytesseract": True,
"scenedetect": True, "faster-whisper": True,
}
rc = run_setup(interactive=False)
assert rc == 0
mock_torch.assert_called_once()
mock_deps.assert_called_once()
@patch("skill_seekers.cli.video_setup.install_torch", return_value=False)
@patch("skill_seekers.cli.video_setup.check_tesseract")
@patch("skill_seekers.cli.video_setup.detect_gpu")
def test_failure_returns_nonzero(self, mock_detect, mock_tess, mock_torch):
mock_detect.return_value = GPUInfo(
vendor=GPUVendor.NONE, name="CPU-only", index_url=f"{_PYTORCH_BASE}/cpu",
)
mock_tess.return_value = {"installed": True, "has_eng": True, "install_cmd": "", "version": "5.3.0"}
rc = run_setup(interactive=False)
assert rc == 1
@patch("skill_seekers.cli.video_setup.install_torch", return_value=True)
@patch("skill_seekers.cli.video_setup.install_visual_deps", return_value=False)
@patch("skill_seekers.cli.video_setup.check_tesseract")
@patch("skill_seekers.cli.video_setup.detect_gpu")
def test_visual_deps_failure(self, mock_detect, mock_tess, mock_deps, mock_torch):
mock_detect.return_value = GPUInfo(
vendor=GPUVendor.NONE, name="CPU-only", index_url=f"{_PYTORCH_BASE}/cpu",
)
mock_tess.return_value = {"installed": True, "has_eng": True, "install_cmd": "", "version": "5.3.0"}
rc = run_setup(interactive=False)
assert rc == 1
@patch("skill_seekers.cli.video_setup.verify_installation")
@patch("skill_seekers.cli.video_setup.install_visual_deps", return_value=True)
@patch("skill_seekers.cli.video_setup.install_torch", return_value=True)
@patch("skill_seekers.cli.video_setup.check_tesseract")
@patch("skill_seekers.cli.video_setup.detect_gpu")
def test_rocm_configures_env(self, mock_detect, mock_tess, mock_torch, mock_deps, mock_verify):
"""AMD GPU → configure_rocm_env called and env vars set."""
mock_detect.return_value = GPUInfo(
vendor=GPUVendor.AMD, name="RX 7900", index_url=f"{_PYTORCH_BASE}/rocm6.3",
)
mock_tess.return_value = {"installed": True, "has_eng": True, "install_cmd": "", "version": "5.3.0"}
mock_verify.return_value = {
"torch": True, "torch.cuda": False, "torch.rocm": True,
"easyocr": True, "opencv": True, "pytesseract": True,
"scenedetect": True, "faster-whisper": True,
}
rc = run_setup(interactive=False)
assert rc == 0
assert os.environ.get("MIOPEN_FIND_MODE") is not None
# =============================================================================
# Tesseract Circuit Breaker Tests (video_visual.py)
# =============================================================================
class TestTesseractCircuitBreaker(unittest.TestCase):
"""Tests for _tesseract_broken flag in video_visual.py."""
def test_circuit_breaker_flag_exists(self):
import skill_seekers.cli.video_visual as vv
assert hasattr(vv, "_tesseract_broken")
def test_circuit_breaker_skips_after_failure(self):
import skill_seekers.cli.video_visual as vv
from skill_seekers.cli.video_models import FrameType
# Save and set broken state
original = vv._tesseract_broken
try:
vv._tesseract_broken = True
result = vv._run_tesseract_ocr("/nonexistent/path.png", FrameType.CODE_EDITOR)
assert result == []
finally:
vv._tesseract_broken = original
def test_circuit_breaker_allows_when_not_broken(self):
import skill_seekers.cli.video_visual as vv
from skill_seekers.cli.video_models import FrameType
original = vv._tesseract_broken
try:
vv._tesseract_broken = False
if not vv.HAS_PYTESSERACT:
# pytesseract not installed → returns [] immediately
result = vv._run_tesseract_ocr("/nonexistent/path.png", FrameType.CODE_EDITOR)
assert result == []
# If pytesseract IS installed, it would try to run and potentially fail
# on our fake path — that's fine, the circuit breaker would trigger
finally:
vv._tesseract_broken = original
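The circuit-breaker tests above only check that a "broken" flag exists and that OCR returns `[]` once it is set. How `video_visual.py` trips the flag is not shown, so the sketch below illustrates the general pattern with a hypothetical class: after the first engine failure, later calls are skipped instead of retrying the failing engine on every frame.

```python
from typing import Callable, List


class OcrCircuitBreaker:
    """Wrap an OCR callable; after one failure, skip all later calls (hypothetical)."""

    def __init__(self, engine: Callable[[str], List[str]]) -> None:
        self._engine = engine
        self.broken = False

    def run(self, image_path: str) -> List[str]:
        if self.broken:
            return []  # circuit open: do not touch the engine again
        try:
            return self._engine(image_path)
        except Exception:
            # Treat any failure as permanent for this process.
            self.broken = True
            return []


if __name__ == "__main__":
    def always_fails(path: str) -> List[str]:
        raise RuntimeError("tesseract binary missing")

    breaker = OcrCircuitBreaker(always_fails)
    print(breaker.run("frame_001.png"), breaker.broken)
```

A per-process flag like this trades recovery for speed: a transient failure disables the engine for the rest of the run, which seems acceptable when another OCR engine remains available as the fallback.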
# =============================================================================
# MIOPEN Env Var Tests (video_visual.py)
# =============================================================================
class TestMIOPENEnvVars(unittest.TestCase):
"""Tests that video_visual.py sets MIOPEN env vars at import time."""
def test_miopen_find_mode_set(self):
# video_visual.py sets this at module level before torch import
assert "MIOPEN_FIND_MODE" in os.environ
def test_miopen_user_db_path_set(self):
assert "MIOPEN_USER_DB_PATH" in os.environ
# =============================================================================
# Argument & Early-Exit Tests
# =============================================================================
class TestVideoArgumentSetup(unittest.TestCase):
"""Tests for --setup flag in VIDEO_ARGUMENTS."""
def test_setup_in_video_arguments(self):
from skill_seekers.cli.arguments.video import VIDEO_ARGUMENTS
assert "setup" in VIDEO_ARGUMENTS
assert VIDEO_ARGUMENTS["setup"]["kwargs"]["action"] == "store_true"
def test_parser_accepts_setup(self):
import argparse
from skill_seekers.cli.arguments.video import add_video_arguments
parser = argparse.ArgumentParser()
add_video_arguments(parser)
args = parser.parse_args(["--setup"])
assert args.setup is True
def test_parser_default_false(self):
import argparse
from skill_seekers.cli.arguments.video import add_video_arguments
parser = argparse.ArgumentParser()
add_video_arguments(parser)
args = parser.parse_args(["--url", "https://example.com"])
assert args.setup is False
class TestVideoScraperSetupEarlyExit(unittest.TestCase):
"""Test that --setup exits before source validation."""
@patch("skill_seekers.cli.video_setup.run_setup", return_value=0)
def test_setup_skips_source_validation(self, mock_setup):
"""--setup without --url should NOT error about missing source."""
from skill_seekers.cli.video_scraper import main
old_argv = sys.argv
try:
sys.argv = ["video_scraper", "--setup"]
rc = main()
assert rc == 0
mock_setup.assert_called_once_with(interactive=True)
finally:
sys.argv = old_argv
if __name__ == "__main__":
unittest.main()

225
uv.lock generated

@@ -250,6 +250,63 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/f8/00/3ed12264094ec91f534fae429945efbaa9f8c666f3aa7061cc3b2a26a0cd/authlib-1.6.7-py2.py3-none-any.whl", hash = "sha256:c637340d9a02789d2efa1d003a7437d10d3e565237bcb5fcbc6c134c7b95bab0", size = 244115, upload-time = "2026-02-06T14:04:12.141Z" },
]
[[package]]
name = "av"
version = "16.1.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/78/cd/3a83ffbc3cc25b39721d174487fb0d51a76582f4a1703f98e46170ce83d4/av-16.1.0.tar.gz", hash = "sha256:a094b4fd87a3721dacf02794d3d2c82b8d712c85b9534437e82a8a978c175ffd", size = 4285203, upload-time = "2026-01-11T07:31:33.772Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/97/51/2217a9249409d2e88e16e3f16f7c0def9fd3e7ffc4238b2ec211f9935bdb/av-16.1.0-cp310-cp310-macosx_11_0_x86_64.whl", hash = "sha256:2395748b0c34fe3a150a1721e4f3d4487b939520991b13e7b36f8926b3b12295", size = 26942590, upload-time = "2026-01-09T20:17:58.588Z" },
{ url = "https://files.pythonhosted.org/packages/bf/cd/a7070f4febc76a327c38808e01e2ff6b94531fe0b321af54ea3915165338/av-16.1.0-cp310-cp310-macosx_14_0_arm64.whl", hash = "sha256:72d7ac832710a158eeb7a93242370aa024a7646516291c562ee7f14a7ea881fd", size = 21507910, upload-time = "2026-01-09T20:18:02.309Z" },
{ url = "https://files.pythonhosted.org/packages/ae/30/ec812418cd9b297f0238fe20eb0747d8a8b68d82c5f73c56fe519a274143/av-16.1.0-cp310-cp310-manylinux_2_28_aarch64.whl", hash = "sha256:6cbac833092e66b6b0ac4d81ab077970b8ca874951e9c3974d41d922aaa653ed", size = 38738309, upload-time = "2026-01-09T20:18:04.701Z" },
{ url = "https://files.pythonhosted.org/packages/3a/b8/6c5795bf1f05f45c5261f8bce6154e0e5e86b158a6676650ddd77c28805e/av-16.1.0-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:eb990672d97c18f99c02f31c8d5750236f770ffe354b5a52c5f4d16c5e65f619", size = 40293006, upload-time = "2026-01-09T20:18:07.238Z" },
{ url = "https://files.pythonhosted.org/packages/a7/44/5e183bcb9333fc3372ee6e683be8b0c9b515a506894b2d32ff465430c074/av-16.1.0-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:05ad70933ac3b8ef896a820ea64b33b6cca91a5fac5259cb9ba7fa010435be15", size = 40123516, upload-time = "2026-01-09T20:18:09.955Z" },
{ url = "https://files.pythonhosted.org/packages/12/1d/b5346d582a3c3d958b4d26a2cc63ce607233582d956121eb20d2bbe55c2e/av-16.1.0-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:d831a1062a3c47520bf99de6ec682bd1d64a40dfa958e5457bb613c5270e7ce3", size = 41463289, upload-time = "2026-01-09T20:18:12.459Z" },
{ url = "https://files.pythonhosted.org/packages/fa/31/acc946c0545f72b8d0d74584cb2a0ade9b7dfe2190af3ef9aa52a2e3c0b1/av-16.1.0-cp310-cp310-win_amd64.whl", hash = "sha256:358ab910fef3c5a806c55176f2b27e5663b33c4d0a692dafeb049c6ed71f8aff", size = 31754959, upload-time = "2026-01-09T20:18:14.718Z" },
{ url = "https://files.pythonhosted.org/packages/48/d0/b71b65d1b36520dcb8291a2307d98b7fc12329a45614a303ff92ada4d723/av-16.1.0-cp311-cp311-macosx_11_0_x86_64.whl", hash = "sha256:e88ad64ee9d2b9c4c5d891f16c22ae78e725188b8926eb88187538d9dd0b232f", size = 26927747, upload-time = "2026-01-09T20:18:16.976Z" },
{ url = "https://files.pythonhosted.org/packages/2f/79/720a5a6ccdee06eafa211b945b0a450e3a0b8fc3d12922f0f3c454d870d2/av-16.1.0-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:cb296073fa6935724de72593800ba86ae49ed48af03960a4aee34f8a611f442b", size = 21492232, upload-time = "2026-01-09T20:18:19.266Z" },
{ url = "https://files.pythonhosted.org/packages/8e/4f/a1ba8d922f2f6d1a3d52419463ef26dd6c4d43ee364164a71b424b5ae204/av-16.1.0-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:720edd4d25aa73723c1532bb0597806d7b9af5ee34fc02358782c358cfe2f879", size = 39291737, upload-time = "2026-01-09T20:18:21.513Z" },
{ url = "https://files.pythonhosted.org/packages/1a/31/fc62b9fe8738d2693e18d99f040b219e26e8df894c10d065f27c6b4f07e3/av-16.1.0-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:c7f2bc703d0df260a1fdf4de4253c7f5500ca9fc57772ea241b0cb241bcf972e", size = 40846822, upload-time = "2026-01-09T20:18:24.275Z" },
{ url = "https://files.pythonhosted.org/packages/53/10/ab446583dbce730000e8e6beec6ec3c2753e628c7f78f334a35cad0317f4/av-16.1.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:d69c393809babada7d54964d56099e4b30a3e1f8b5736ca5e27bd7be0e0f3c83", size = 40675604, upload-time = "2026-01-09T20:18:26.866Z" },
{ url = "https://files.pythonhosted.org/packages/31/d7/1003be685277005f6d63fd9e64904ee222fe1f7a0ea70af313468bb597db/av-16.1.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:441892be28582356d53f282873c5a951592daaf71642c7f20165e3ddcb0b4c63", size = 42015955, upload-time = "2026-01-09T20:18:29.461Z" },
{ url = "https://files.pythonhosted.org/packages/2f/4a/fa2a38ee9306bf4579f556f94ecbc757520652eb91294d2a99c7cf7623b9/av-16.1.0-cp311-cp311-win_amd64.whl", hash = "sha256:273a3e32de64819e4a1cd96341824299fe06f70c46f2288b5dc4173944f0fd62", size = 31750339, upload-time = "2026-01-09T20:18:32.249Z" },
{ url = "https://files.pythonhosted.org/packages/9c/84/2535f55edcd426cebec02eb37b811b1b0c163f26b8d3f53b059e2ec32665/av-16.1.0-cp312-cp312-macosx_11_0_x86_64.whl", hash = "sha256:640f57b93f927fba8689f6966c956737ee95388a91bd0b8c8b5e0481f73513d6", size = 26945785, upload-time = "2026-01-09T20:18:34.486Z" },
{ url = "https://files.pythonhosted.org/packages/b6/17/ffb940c9e490bf42e86db4db1ff426ee1559cd355a69609ec1efe4d3a9eb/av-16.1.0-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:ae3fb658eec00852ebd7412fdc141f17f3ddce8afee2d2e1cf366263ad2a3b35", size = 21481147, upload-time = "2026-01-09T20:18:36.716Z" },
{ url = "https://files.pythonhosted.org/packages/15/c1/e0d58003d2d83c3921887d5c8c9b8f5f7de9b58dc2194356a2656a45cfdc/av-16.1.0-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:27ee558d9c02a142eebcbe55578a6d817fedfde42ff5676275504e16d07a7f86", size = 39517197, upload-time = "2026-01-11T09:57:31.937Z" },
{ url = "https://files.pythonhosted.org/packages/32/77/787797b43475d1b90626af76f80bfb0c12cfec5e11eafcfc4151b8c80218/av-16.1.0-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:7ae547f6d5fa31763f73900d43901e8c5fa6367bb9a9840978d57b5a7ae14ed2", size = 41174337, upload-time = "2026-01-11T09:57:35.792Z" },
{ url = "https://files.pythonhosted.org/packages/8e/ac/d90df7f1e3b97fc5554cf45076df5045f1e0a6adf13899e10121229b826c/av-16.1.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:8cf065f9d438e1921dc31fc7aa045790b58aee71736897866420d80b5450f62a", size = 40817720, upload-time = "2026-01-11T09:57:39.039Z" },
{ url = "https://files.pythonhosted.org/packages/80/6f/13c3a35f9dbcebafd03fe0c4cbd075d71ac8968ec849a3cfce406c35a9d2/av-16.1.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:a345877a9d3cc0f08e2bc4ec163ee83176864b92587afb9d08dff50f37a9a829", size = 42267396, upload-time = "2026-01-11T09:57:42.115Z" },
{ url = "https://files.pythonhosted.org/packages/c8/b9/275df9607f7fb44317ccb1d4be74827185c0d410f52b6e2cd770fe209118/av-16.1.0-cp312-cp312-win_amd64.whl", hash = "sha256:f49243b1d27c91cd8c66fdba90a674e344eb8eb917264f36117bf2b6879118fd", size = 31752045, upload-time = "2026-01-11T09:57:45.106Z" },
{ url = "https://files.pythonhosted.org/packages/75/2a/63797a4dde34283dd8054219fcb29294ba1c25d68ba8c8c8a6ae53c62c45/av-16.1.0-cp313-cp313-macosx_11_0_x86_64.whl", hash = "sha256:ce2a1b3d8bf619f6c47a9f28cfa7518ff75ddd516c234a4ee351037b05e6a587", size = 26916715, upload-time = "2026-01-11T09:57:47.682Z" },
{ url = "https://files.pythonhosted.org/packages/d2/c4/0b49cf730d0ae8cda925402f18ae814aef351f5772d14da72dd87ff66448/av-16.1.0-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:408dbe6a2573ca58a855eb8cd854112b33ea598651902c36709f5f84c991ed8e", size = 21452167, upload-time = "2026-01-11T09:57:50.606Z" },
{ url = "https://files.pythonhosted.org/packages/51/23/408806503e8d5d840975aad5699b153aaa21eb6de41ade75248a79b7a37f/av-16.1.0-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:57f657f86652a160a8a01887aaab82282f9e629abf94c780bbdbb01595d6f0f7", size = 39215659, upload-time = "2026-01-11T09:57:53.757Z" },
{ url = "https://files.pythonhosted.org/packages/c4/19/a8528d5bba592b3903f44c28dab9cc653c95fcf7393f382d2751a1d1523e/av-16.1.0-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:adbad2b355c2ee4552cac59762809d791bda90586d134a33c6f13727fb86cb3a", size = 40874970, upload-time = "2026-01-11T09:57:56.802Z" },
{ url = "https://files.pythonhosted.org/packages/e8/24/2dbcdf0e929ad56b7df078e514e7bd4ca0d45cba798aff3c8caac097d2f7/av-16.1.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:f42e1a68ec2aebd21f7eb6895be69efa6aa27eec1670536876399725bbda4b99", size = 40530345, upload-time = "2026-01-11T09:58:00.421Z" },
{ url = "https://files.pythonhosted.org/packages/54/27/ae91b41207f34e99602d1c72ab6ffd9c51d7c67e3fbcd4e3a6c0e54f882c/av-16.1.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:58fe47aeaef0f100c40ec8a5de9abbd37f118d3ca03829a1009cf288e9aef67c", size = 41972163, upload-time = "2026-01-11T09:58:03.756Z" },
{ url = "https://files.pythonhosted.org/packages/fc/7a/22158fb923b2a9a00dfab0e96ef2e8a1763a94dd89e666a5858412383d46/av-16.1.0-cp313-cp313-win_amd64.whl", hash = "sha256:565093ebc93b2f4b76782589564869dadfa83af5b852edebedd8fee746457d06", size = 31729230, upload-time = "2026-01-11T09:58:07.254Z" },
{ url = "https://files.pythonhosted.org/packages/7f/f1/878f8687d801d6c4565d57ebec08449c46f75126ebca8e0fed6986599627/av-16.1.0-cp313-cp313t-macosx_11_0_x86_64.whl", hash = "sha256:574081a24edb98343fd9f473e21ae155bf61443d4ec9d7708987fa597d6b04b2", size = 27008769, upload-time = "2026-01-11T09:58:10.266Z" },
{ url = "https://files.pythonhosted.org/packages/30/f1/bd4ce8c8b5cbf1d43e27048e436cbc9de628d48ede088a1d0a993768eb86/av-16.1.0-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:9ab00ea29c25ebf2ea1d1e928d7babb3532d562481c5d96c0829212b70756ad0", size = 21590588, upload-time = "2026-01-11T09:58:12.629Z" },
{ url = "https://files.pythonhosted.org/packages/1d/dd/c81f6f9209201ff0b5d5bed6da6c6e641eef52d8fbc930d738c3f4f6f75d/av-16.1.0-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:a84a91188c1071f238a9523fd42dbe567fb2e2607b22b779851b2ce0eac1b560", size = 40638029, upload-time = "2026-01-11T09:58:15.399Z" },
{ url = "https://files.pythonhosted.org/packages/15/4d/07edff82b78d0459a6e807e01cd280d3180ce832efc1543de80d77676722/av-16.1.0-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:c2cd0de4dd022a7225ff224fde8e7971496d700be41c50adaaa26c07bb50bf97", size = 41970776, upload-time = "2026-01-11T09:58:19.075Z" },
{ url = "https://files.pythonhosted.org/packages/da/9d/1f48b354b82fa135d388477cd1b11b81bdd4384bd6a42a60808e2ec2d66b/av-16.1.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:0816143530624a5a93bc5494f8c6eeaf77549b9366709c2ac8566c1e9bff6df5", size = 41764751, upload-time = "2026-01-11T09:58:22.788Z" },
{ url = "https://files.pythonhosted.org/packages/2f/c7/a509801e98db35ec552dd79da7bdbcff7104044bfeb4c7d196c1ce121593/av-16.1.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:e3a28053af29644696d0c007e897d19b1197585834660a54773e12a40b16974c", size = 43034355, upload-time = "2026-01-11T09:58:26.125Z" },
{ url = "https://files.pythonhosted.org/packages/36/8b/e5f530d9e8f640da5f5c5f681a424c65f9dd171c871cd255d8a861785a6e/av-16.1.0-cp313-cp313t-win_amd64.whl", hash = "sha256:2e3e67144a202b95ed299d165232533989390a9ea3119d37eccec697dc6dbb0c", size = 31947047, upload-time = "2026-01-11T09:58:31.867Z" },
{ url = "https://files.pythonhosted.org/packages/df/18/8812221108c27d19f7e5f486a82c827923061edf55f906824ee0fcaadf50/av-16.1.0-cp314-cp314-macosx_11_0_x86_64.whl", hash = "sha256:39a634d8e5a87e78ea80772774bfd20c0721f0d633837ff185f36c9d14ffede4", size = 26916179, upload-time = "2026-01-11T09:58:36.506Z" },
{ url = "https://files.pythonhosted.org/packages/38/ef/49d128a9ddce42a2766fe2b6595bd9c49e067ad8937a560f7838a541464e/av-16.1.0-cp314-cp314-macosx_14_0_arm64.whl", hash = "sha256:0ba32fb9e9300948a7fa9f8a3fc686e6f7f77599a665c71eb2118fdfd2c743f9", size = 21460168, upload-time = "2026-01-11T09:58:39.231Z" },
{ url = "https://files.pythonhosted.org/packages/e6/a9/b310d390844656fa74eeb8c2750e98030877c75b97551a23a77d3f982741/av-16.1.0-cp314-cp314-manylinux_2_28_aarch64.whl", hash = "sha256:ca04d17815182d34ce3edc53cbda78a4f36e956c0fd73e3bab249872a831c4d7", size = 39210194, upload-time = "2026-01-11T09:58:42.138Z" },
{ url = "https://files.pythonhosted.org/packages/0c/7b/e65aae179929d0f173af6e474ad1489b5b5ad4c968a62c42758d619e54cf/av-16.1.0-cp314-cp314-manylinux_2_28_x86_64.whl", hash = "sha256:ee0e8de2e124a9ef53c955fe2add6ee7c56cc8fd83318265549e44057db77142", size = 40811675, upload-time = "2026-01-11T09:58:45.871Z" },
{ url = "https://files.pythonhosted.org/packages/54/3f/5d7edefd26b6a5187d6fac0f5065ee286109934f3dea607ef05e53f05b31/av-16.1.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:22bf77a2f658827043a1e184b479c3bf25c4c43ab32353677df2d119f080e28f", size = 40543942, upload-time = "2026-01-11T09:58:49.759Z" },
{ url = "https://files.pythonhosted.org/packages/1b/24/f8b17897b67be0900a211142f5646a99d896168f54d57c81f3e018853796/av-16.1.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:2dd419d262e6a71cab206d80bbf28e0a10d0f227b671cdf5e854c028faa2d043", size = 41924336, upload-time = "2026-01-11T09:58:53.344Z" },
{ url = "https://files.pythonhosted.org/packages/1c/cf/d32bc6bbbcf60b65f6510c54690ed3ae1c4ca5d9fafbce835b6056858686/av-16.1.0-cp314-cp314-win_amd64.whl", hash = "sha256:53585986fd431cd436f290fba662cfb44d9494fbc2949a183de00acc5b33fa88", size = 31735077, upload-time = "2026-01-11T09:58:56.684Z" },
{ url = "https://files.pythonhosted.org/packages/53/f4/9b63dc70af8636399bd933e9df4f3025a0294609510239782c1b746fc796/av-16.1.0-cp314-cp314t-macosx_11_0_x86_64.whl", hash = "sha256:76f5ed8495cf41e1209a5775d3699dc63fdc1740b94a095e2485f13586593205", size = 27014423, upload-time = "2026-01-11T09:58:59.703Z" },
{ url = "https://files.pythonhosted.org/packages/d1/da/787a07a0d6ed35a0888d7e5cfb8c2ffa202f38b7ad2c657299fac08eb046/av-16.1.0-cp314-cp314t-macosx_14_0_arm64.whl", hash = "sha256:8d55397190f12a1a3ae7538be58c356cceb2bf50df1b33523817587748ce89e5", size = 21595536, upload-time = "2026-01-11T09:59:02.508Z" },
{ url = "https://files.pythonhosted.org/packages/d8/f4/9a7d8651a611be6e7e3ab7b30bb43779899c8cac5f7293b9fb634c44a3f3/av-16.1.0-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:9d51d9037437218261b4bbf9df78a95e216f83d7774fbfe8d289230b5b2e28e2", size = 40642490, upload-time = "2026-01-11T09:59:05.842Z" },
{ url = "https://files.pythonhosted.org/packages/6b/e4/eb79bc538a94b4ff93cd4237d00939cba797579f3272490dd0144c165a21/av-16.1.0-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:0ce07a89c15644407f49d942111ca046e323bbab0a9078ff43ee57c9b4a50dad", size = 41976905, upload-time = "2026-01-11T09:59:09.169Z" },
{ url = "https://files.pythonhosted.org/packages/5e/f5/f6db0dd86b70167a4d55ee0d9d9640983c570d25504f2bde42599f38241e/av-16.1.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:cac0c074892ea97113b53556ff41c99562db7b9f09f098adac1f08318c2acad5", size = 41770481, upload-time = "2026-01-11T09:59:12.74Z" },
{ url = "https://files.pythonhosted.org/packages/9e/8b/33651d658e45e16ab7671ea5fcf3d20980ea7983234f4d8d0c63c65581a5/av-16.1.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:7dec3dcbc35a187ce450f65a2e0dda820d5a9e6553eea8344a1459af11c98649", size = 43036824, upload-time = "2026-01-11T09:59:16.507Z" },
{ url = "https://files.pythonhosted.org/packages/83/41/7f13361db54d7e02f11552575c0384dadaf0918138f4eaa82ea03a9f9580/av-16.1.0-cp314-cp314t-win_amd64.whl", hash = "sha256:6f90dc082ff2068ddbe77618400b44d698d25d9c4edac57459e250c16b33d700", size = 31948164, upload-time = "2026-01-11T09:59:19.501Z" },
]
[[package]]
name = "azure-core"
version = "1.38.0"
@@ -871,6 +928,49 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/0d/c3/e90f4a4feae6410f914f8ebac129b9ae7a8c92eb60a638012dde42030a9d/cryptography-46.0.3-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:6b5063083824e5509fdba180721d55909ffacccc8adbec85268b48439423d78c", size = 3438528, upload-time = "2025-10-15T23:18:26.227Z" },
]
[[package]]
name = "ctranslate2"
version = "4.7.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
{ name = "numpy", version = "2.4.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
{ name = "pyyaml" },
{ name = "setuptools" },
]
wheels = [
{ url = "https://files.pythonhosted.org/packages/cb/e0/b69c40c3d739b213a78d327071240590792071b4f890e34088b03b95bb1e/ctranslate2-4.7.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:9017a355dd7c6d29dc3bca6e9fc74827306c61b702c66bb1f6b939655e7de3fa", size = 1255773, upload-time = "2026-02-04T06:11:04.769Z" },
{ url = "https://files.pythonhosted.org/packages/51/29/e5c2fc1253e3fb9b2c86997f36524bba182a8ed77fb4f8fe8444a5649191/ctranslate2-4.7.1-cp310-cp310-macosx_11_0_x86_64.whl", hash = "sha256:6abcd0552285e7173475836f9d133e04dfc3e42ca8e6930f65eaa4b8b13a47fa", size = 11914945, upload-time = "2026-02-04T06:11:06.853Z" },
{ url = "https://files.pythonhosted.org/packages/03/25/e7fe847d3f02c84d2e9c5e8312434fbeab5af3d8916b6c8e2bdbe860d052/ctranslate2-4.7.1-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8492cba605319e0d7f2760180957d5a2a435dfdebcef1a75d2ade740e6b9fb0b", size = 16547973, upload-time = "2026-02-04T06:11:09.021Z" },
{ url = "https://files.pythonhosted.org/packages/68/75/074ed22bc340c2e26c09af6bf85859b586516e4e2d753b20189936d0dcf7/ctranslate2-4.7.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:688bd82482b5d057eff5bc1e727f11bb9a1277b7e4fce8ab01fd3bb70e69294b", size = 38636471, upload-time = "2026-02-04T06:11:12.146Z" },
{ url = "https://files.pythonhosted.org/packages/76/b6/9baf8a565f6dcdbfbc9cfd179dd6214529838cda4e91e89b616045a670f0/ctranslate2-4.7.1-cp310-cp310-win_amd64.whl", hash = "sha256:3b39a5f4e3c87ac91976996458a64ba08a7cbf974dc0be4e6df83a9e040d4bd2", size = 18842389, upload-time = "2026-02-04T06:11:15.154Z" },
{ url = "https://files.pythonhosted.org/packages/da/25/41920ccee68e91cb6fa0fc9e8078ab2b7839f2c668f750dc123144cb7c6e/ctranslate2-4.7.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:f74200bab9996b14a57cf6f7cb27d0921ceedc4acc1e905598e3e85b4d75b1ec", size = 1256943, upload-time = "2026-02-04T06:11:17.781Z" },
{ url = "https://files.pythonhosted.org/packages/79/22/bc81fcc9f10ba4da3ffd1a9adec15cfb73cb700b3bbe69c6c8b55d333316/ctranslate2-4.7.1-cp311-cp311-macosx_11_0_x86_64.whl", hash = "sha256:59b427eb3ac999a746315b03a63942fddd351f511db82ba1a66880d4dea98e25", size = 11916445, upload-time = "2026-02-04T06:11:19.938Z" },
{ url = "https://files.pythonhosted.org/packages/0a/a7/494a66bb02c7926331cadfff51d5ce81f5abfb1e8d05d7f2459082f31b48/ctranslate2-4.7.1-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:95f0c1051c180669d2a83a44b44b518b2d1683de125f623bbc81ad5dd6f6141c", size = 16696997, upload-time = "2026-02-04T06:11:22.697Z" },
{ url = "https://files.pythonhosted.org/packages/ed/4e/b48f79fd36e5d3c7e12db383aa49814c340921a618ef7364bd0ced670644/ctranslate2-4.7.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0ed92d9ab0ac6bc7005942be83d68714c80adb0897ab17f98157294ee0374347", size = 38836379, upload-time = "2026-02-04T06:11:26.325Z" },
{ url = "https://files.pythonhosted.org/packages/d2/23/8c01ac52e1f26fc4dbe985a35222ae7cd365bbf7ee5db5fd5545d8926f91/ctranslate2-4.7.1-cp311-cp311-win_amd64.whl", hash = "sha256:67d9ad9b69933fbfeee7dcec899b2cd9341d5dca4fdfb53e8ba8c109dc332ee1", size = 18843315, upload-time = "2026-02-04T06:11:29.441Z" },
{ url = "https://files.pythonhosted.org/packages/fc/0f/581de94b64c5f2327a736270bc7e7a5f8fe5cf1ed56a2203b52de4d8986a/ctranslate2-4.7.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:4c0cbd46a23b8dc37ccdbd9b447cb5f7fadc361c90e9df17d82ca84b1f019986", size = 1257089, upload-time = "2026-02-04T06:11:32.442Z" },
{ url = "https://files.pythonhosted.org/packages/3d/e9/d55b0e436362f9fe26bd98fefd2dd5d81926121f1d7f799c805e6035bb26/ctranslate2-4.7.1-cp312-cp312-macosx_11_0_x86_64.whl", hash = "sha256:5b141ddad1da5f84cf3c2a569a56227a37de649a555d376cbd9b80e8f0373dd8", size = 11918502, upload-time = "2026-02-04T06:11:33.986Z" },
{ url = "https://files.pythonhosted.org/packages/ec/ce/9f29f0b0bb4280c2ebafb3ddb6cdff8ef1c2e185ee020c0ec0ecba7dc934/ctranslate2-4.7.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d00a62544db4a3caaa58a3c50d39b25613c042b430053ae32384d94eb1d40990", size = 16859601, upload-time = "2026-02-04T06:11:36.227Z" },
{ url = "https://files.pythonhosted.org/packages/b3/86/428d270fd72117d19fb48ed3211aa8a3c8bd7577373252962cb634e0fd01/ctranslate2-4.7.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:722b93a89647974cbd182b4c7f87fefc7794fff7fc9cbd0303b6447905cc157e", size = 38995338, upload-time = "2026-02-04T06:11:42.789Z" },
{ url = "https://files.pythonhosted.org/packages/4a/f4/d23dbfb9c62cb642c114a30f05d753ba61d6ffbfd8a3a4012fe85a073bcb/ctranslate2-4.7.1-cp312-cp312-win_amd64.whl", hash = "sha256:d0f734dc3757118094663bdaaf713f5090c55c1927fb330a76bb8b84173940e8", size = 18844949, upload-time = "2026-02-04T06:11:45.436Z" },
{ url = "https://files.pythonhosted.org/packages/34/6d/eb49ba05db286b4ea9d5d3fcf5f5cd0a9a5e218d46349618d5041001e303/ctranslate2-4.7.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:6b2abf2929756e3ec6246057b56df379995661560a2d776af05f9d97f63afcf5", size = 1256960, upload-time = "2026-02-04T06:11:47.487Z" },
{ url = "https://files.pythonhosted.org/packages/45/5a/b9cce7b00d89fc6fdeaf27587aa52d0597b465058563e93ff50910553bdd/ctranslate2-4.7.1-cp313-cp313-macosx_11_0_x86_64.whl", hash = "sha256:857ef3959d6b1c40dc227c715a36db33db2d097164996d6c75b6db8e30828f52", size = 11918645, upload-time = "2026-02-04T06:11:49.599Z" },
{ url = "https://files.pythonhosted.org/packages/ea/03/c0db0a5276599fb44ceafa2f2cb1afd5628808ec406fe036060a39693680/ctranslate2-4.7.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:393a9e7e989034660526a2c0e8bb65d1924f43d9a5c77d336494a353d16ba2a4", size = 16860452, upload-time = "2026-02-04T06:11:52.276Z" },
{ url = "https://files.pythonhosted.org/packages/0b/03/4e3728ce29d192ee75ed9a2d8589bf4f19edafe5bed3845187de51b179a3/ctranslate2-4.7.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5a3d0682f2b9082e31c73d75b45f16cde77355ab76d7e8356a24c3cb2480a6d3", size = 38995174, upload-time = "2026-02-04T06:11:55.477Z" },
{ url = "https://files.pythonhosted.org/packages/9b/15/6e8e87c6a201d69803a79ac2e29623ce7c2cc9cd1df9db99810cca714373/ctranslate2-4.7.1-cp313-cp313-win_amd64.whl", hash = "sha256:baa6d2b10f57933d8c11791e8522659217918722d07bbef2389a443801125fe7", size = 18844953, upload-time = "2026-02-04T06:11:58.519Z" },
{ url = "https://files.pythonhosted.org/packages/fd/73/8a6b7ba18cad0c8667ee221ddab8c361cb70926440e5b8dd0e81924c28ac/ctranslate2-4.7.1-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:d5dfb076566551f4959dfd0706f94c923c1931def9b7bb249a2caa6ab23353a0", size = 1257560, upload-time = "2026-02-04T06:12:00.926Z" },
{ url = "https://files.pythonhosted.org/packages/70/c2/8817ca5d6c1b175b23a12f7c8b91484652f8718a76353317e5919b038733/ctranslate2-4.7.1-cp314-cp314-macosx_11_0_x86_64.whl", hash = "sha256:eecdb4ed934b384f16e8c01b185b082d6b5ffc7dcbb0b6a6eb48cd465282d957", size = 11918995, upload-time = "2026-02-04T06:12:02.875Z" },
{ url = "https://files.pythonhosted.org/packages/ac/33/b8eb3acc67bbca4d9872fc9ff94db78e6167a7ba5cd932f585d1560effc7/ctranslate2-4.7.1-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1aa6796edcc3c8d163c9e39c429d50076d266d68980fed9d1b2443f617c67e9e", size = 16844162, upload-time = "2026-02-04T06:12:05.099Z" },
{ url = "https://files.pythonhosted.org/packages/80/11/6474893b07121057035069a0a483fe1cd8c47878213f282afb4c0c6fc275/ctranslate2-4.7.1-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:24c0482c51726430fb83724451921c0e539d769c8618dcfd46b1645e7f75960d", size = 38966728, upload-time = "2026-02-04T06:12:07.923Z" },
{ url = "https://files.pythonhosted.org/packages/94/88/8fc7ff435c5e783e5fad9586d839d463e023988dbbbad949d442092d01f1/ctranslate2-4.7.1-cp314-cp314-win_amd64.whl", hash = "sha256:76db234c0446a23d20dd8eeaa7a789cc87d1d05283f48bf3152bae9fa0a69844", size = 19100788, upload-time = "2026-02-04T06:12:10.592Z" },
{ url = "https://files.pythonhosted.org/packages/d9/b3/f100013a76a98d64e67c721bd4559ea4eeb54be3e4ac45f4d801769899af/ctranslate2-4.7.1-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:058c9db2277dc8b19ecc86c7937628f69022f341844b9081d2ab642965d88fc6", size = 1280179, upload-time = "2026-02-04T06:12:12.596Z" },
{ url = "https://files.pythonhosted.org/packages/39/22/b77f748015667a5e2ca54a5ee080d7016fce34314f0e8cf904784549305a/ctranslate2-4.7.1-cp314-cp314t-macosx_11_0_x86_64.whl", hash = "sha256:5abcf885062c7f28a3f9a46be8d185795e8706ac6230ad086cae0bc82917df31", size = 11940166, upload-time = "2026-02-04T06:12:14.054Z" },
{ url = "https://files.pythonhosted.org/packages/7d/78/6d7fd52f646c6ba3343f71277a9bbef33734632949d1651231948b0f0359/ctranslate2-4.7.1-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:9950acb04a002d5c60ae90a1ddceead1a803af1f00cadd9b1a1dc76e1f017481", size = 16849483, upload-time = "2026-02-04T06:12:17.082Z" },
{ url = "https://files.pythonhosted.org/packages/40/27/58769ff15ac31b44205bd7a8aeca80cf7357c657ea5df1b94ce0f5c83771/ctranslate2-4.7.1-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:1dcc734e92e3f1ceeaa0c42bbfd009352857be179ecd4a7ed6cccc086a202f58", size = 38949393, upload-time = "2026-02-04T06:12:21.302Z" },
{ url = "https://files.pythonhosted.org/packages/0e/5c/9fa0ad6462b62efd0fb5ac1100eee47bc96ecc198ff4e237c731e5473616/ctranslate2-4.7.1-cp314-cp314t-win_amd64.whl", hash = "sha256:dfb7657bdb7b8211c8f9ecb6f3b70bc0db0e0384d01a8b1808cb66fe7199df59", size = 19123451, upload-time = "2026-02-04T06:12:24.115Z" },
]
[[package]]
name = "cuda-bindings"
version = "12.9.4"
@@ -1006,6 +1106,22 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/ae/8b/c8050e556f5d7a1f33a93c2c94379a0bae23c58a79ad9709d7e052d0c3b8/fastapi-0.128.4-py3-none-any.whl", hash = "sha256:9321282cee605fd2075ccbc95c0f2e549d675c59de4a952bba202cd1730ac66b", size = 103684, upload-time = "2026-02-07T08:14:07.939Z" },
]
[[package]]
name = "faster-whisper"
version = "1.2.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "av" },
{ name = "ctranslate2" },
{ name = "huggingface-hub" },
{ name = "onnxruntime" },
{ name = "tokenizers" },
{ name = "tqdm" },
]
wheels = [
{ url = "https://files.pythonhosted.org/packages/05/99/49ee85903dee060d9f08297b4a342e5e0bcfca2f027a07b4ee0a38ab13f9/faster_whisper-1.2.1-py3-none-any.whl", hash = "sha256:79a66ad50688c0b794dd501dc340a736992a6342f7f95e5811be60b5224a26a7", size = 1118909, upload-time = "2025-10-31T11:35:47.794Z" },
]
[[package]]
name = "ffmpeg-python"
version = "0.2.0"
@@ -3391,6 +3507,44 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/27/4b/7c1a00c2c3fbd004253937f7520f692a9650767aa73894d7a34f0d65d3f4/openai-2.14.0-py3-none-any.whl", hash = "sha256:7ea40aca4ffc4c4a776e77679021b47eec1160e341f42ae086ba949c9dcc9183", size = 1067558, upload-time = "2025-12-19T03:28:43.727Z" },
]
[[package]]
name = "opencv-python"
version = "4.13.0.92"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
{ name = "numpy", version = "2.4.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
]
wheels = [
{ url = "https://files.pythonhosted.org/packages/fc/6f/5a28fef4c4a382be06afe3938c64cc168223016fa520c5abaf37e8862aa5/opencv_python-4.13.0.92-cp37-abi3-macosx_13_0_arm64.whl", hash = "sha256:caf60c071ec391ba51ed00a4a920f996d0b64e3e46068aac1f646b5de0326a19", size = 46247052, upload-time = "2026-02-05T07:01:25.046Z" },
{ url = "https://files.pythonhosted.org/packages/08/ac/6c98c44c650b8114a0fb901691351cfb3956d502e8e9b5cd27f4ee7fbf2f/opencv_python-4.13.0.92-cp37-abi3-macosx_14_0_x86_64.whl", hash = "sha256:5868a8c028a0b37561579bfb8ac1875babdc69546d236249fff296a8c010ccf9", size = 32568781, upload-time = "2026-02-05T07:01:41.379Z" },
{ url = "https://files.pythonhosted.org/packages/3e/51/82fed528b45173bf629fa44effb76dff8bc9f4eeaee759038362dfa60237/opencv_python-4.13.0.92-cp37-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:0bc2596e68f972ca452d80f444bc404e08807d021fbba40df26b61b18e01838a", size = 47685527, upload-time = "2026-02-05T06:59:11.24Z" },
{ url = "https://files.pythonhosted.org/packages/db/07/90b34a8e2cf9c50fe8ed25cac9011cde0676b4d9d9c973751ac7616223a2/opencv_python-4.13.0.92-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:402033cddf9d294693094de5ef532339f14ce821da3ad7df7c9f6e8316da32cf", size = 70460872, upload-time = "2026-02-05T06:59:19.162Z" },
{ url = "https://files.pythonhosted.org/packages/02/6d/7a9cc719b3eaf4377b9c2e3edeb7ed3a81de41f96421510c0a169ca3cfd4/opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:bccaabf9eb7f897ca61880ce2869dcd9b25b72129c28478e7f2a5e8dee945616", size = 46708208, upload-time = "2026-02-05T06:59:15.419Z" },
{ url = "https://files.pythonhosted.org/packages/fd/55/b3b49a1b97aabcfbbd6c7326df9cb0b6fa0c0aefa8e89d500939e04aa229/opencv_python-4.13.0.92-cp37-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:620d602b8f7d8b8dab5f4b99c6eb353e78d3fb8b0f53db1bd258bb1aa001c1d5", size = 72927042, upload-time = "2026-02-05T06:59:23.389Z" },
{ url = "https://files.pythonhosted.org/packages/fb/17/de5458312bcb07ddf434d7bfcb24bb52c59635ad58c6e7c751b48949b009/opencv_python-4.13.0.92-cp37-abi3-win32.whl", hash = "sha256:372fe164a3148ac1ca51e5f3ad0541a4a276452273f503441d718fab9c5e5f59", size = 30932638, upload-time = "2026-02-05T07:02:14.98Z" },
{ url = "https://files.pythonhosted.org/packages/e9/a5/1be1516390333ff9be3a9cb648c9f33df79d5096e5884b5df71a588af463/opencv_python-4.13.0.92-cp37-abi3-win_amd64.whl", hash = "sha256:423d934c9fafb91aad38edf26efb46da91ffbc05f3f59c4b0c72e699720706f5", size = 40212062, upload-time = "2026-02-05T07:02:12.724Z" },
]
[[package]]
name = "opencv-python-headless"
version = "4.13.0.92"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
{ name = "numpy", version = "2.4.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
]
wheels = [
{ url = "https://files.pythonhosted.org/packages/79/42/2310883be3b8826ac58c3f2787b9358a2d46923d61f88fedf930bc59c60c/opencv_python_headless-4.13.0.92-cp37-abi3-macosx_13_0_arm64.whl", hash = "sha256:1a7d040ac656c11b8c38677cc8cccdc149f98535089dbe5b081e80a4e5903209", size = 46247192, upload-time = "2026-02-05T07:01:35.187Z" },
{ url = "https://files.pythonhosted.org/packages/2d/1e/6f9e38005a6f7f22af785df42a43139d0e20f169eb5787ce8be37ee7fcc9/opencv_python_headless-4.13.0.92-cp37-abi3-macosx_14_0_x86_64.whl", hash = "sha256:3e0a6f0a37994ec6ce5f59e936be21d5d6384a4556f2d2da9c2f9c5dc948394c", size = 32568914, upload-time = "2026-02-05T07:01:51.989Z" },
{ url = "https://files.pythonhosted.org/packages/21/76/9417a6aef9def70e467a5bf560579f816148a4c658b7d525581b356eda9e/opencv_python_headless-4.13.0.92-cp37-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:5c8cfc8e87ed452b5cecb9419473ee5560a989859fe1d10d1ce11ae87b09a2cb", size = 33703709, upload-time = "2026-02-05T10:24:46.469Z" },
{ url = "https://files.pythonhosted.org/packages/92/ce/bd17ff5772938267fd49716e94ca24f616ff4cb1ff4c6be13085108037be/opencv_python_headless-4.13.0.92-cp37-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:0525a3d2c0b46c611e2130b5fdebc94cf404845d8fa64d2f3a3b679572a5bd22", size = 56016764, upload-time = "2026-02-05T10:26:48.904Z" },
{ url = "https://files.pythonhosted.org/packages/8f/b4/b7bcbf7c874665825a8c8e1097e93ea25d1f1d210a3e20d4451d01da30aa/opencv_python_headless-4.13.0.92-cp37-abi3-manylinux_2_28_aarch64.whl", hash = "sha256:eb60e36b237b1ebd40a912da5384b348df8ed534f6f644d8e0b4f103e272ba7d", size = 35010236, upload-time = "2026-02-05T10:28:11.031Z" },
{ url = "https://files.pythonhosted.org/packages/4b/33/b5db29a6c00eb8f50708110d8d453747ca125c8b805bc437b289dbdcc057/opencv_python_headless-4.13.0.92-cp37-abi3-manylinux_2_28_x86_64.whl", hash = "sha256:0bd48544f77c68b2941392fcdf9bcd2b9cdf00e98cb8c29b2455d194763cf99e", size = 60391106, upload-time = "2026-02-05T10:30:14.236Z" },
{ url = "https://files.pythonhosted.org/packages/fb/c3/52cfea47cd33e53e8c0fbd6e7c800b457245c1fda7d61660b4ffe9596a7f/opencv_python_headless-4.13.0.92-cp37-abi3-win32.whl", hash = "sha256:a7cf08e5b191f4ebb530791acc0825a7986e0d0dee2a3c491184bd8599848a4b", size = 30812232, upload-time = "2026-02-05T07:02:29.594Z" },
{ url = "https://files.pythonhosted.org/packages/4a/90/b338326131ccb2aaa3c2c85d00f41822c0050139a4bfe723cfd95455bd2d/opencv_python_headless-4.13.0.92-cp37-abi3-win_amd64.whl", hash = "sha256:77a82fe35ddcec0f62c15f2ba8a12ecc2ed4207c17b0902c7a3151ae29f37fb6", size = 40070414, upload-time = "2026-02-05T07:02:26.448Z" },
]
[[package]]
name = "opentelemetry-api"
version = "1.39.1"
@@ -5103,6 +5257,27 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/58/5b/632a58724221ef03d78ab65062e82a1010e1bef8e8e0b9d7c6d7b8044841/safetensors-0.7.0-pp310-pypy310_pp73-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:473b32699f4200e69801bf5abf93f1a4ecd432a70984df164fc22ccf39c4a6f3", size = 531885, upload-time = "2025-11-19T15:18:27.146Z" },
]
[[package]]
name = "scenedetect"
version = "0.6.7"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "click" },
{ name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
{ name = "numpy", version = "2.4.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
{ name = "platformdirs" },
{ name = "tqdm" },
]
sdist = { url = "https://files.pythonhosted.org/packages/bd/b1/800d4c1d4da24cd673b921c0b5ffd5bbdcaa2a7f4f4dd86dd2c202a673c6/scenedetect-0.6.7.tar.gz", hash = "sha256:1a2c73b57de2e1656f7896edc8523de7217f361179a8966e947f79d33e40830f", size = 164213, upload-time = "2025-08-25T03:37:24.124Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/e7/e9/05a20eaeed21d2e0761fc4d3819f1f5013a49945133323ba0ce7be8be291/scenedetect-0.6.7-py3-none-any.whl", hash = "sha256:935571453142f5d7d44a8d9bb713fdd89bdb69efdbce92c7dfe09d52c523ac2b", size = 130834, upload-time = "2025-08-25T03:37:22.8Z" },
]
[package.optional-dependencies]
opencv = [
{ name = "opencv-python" },
]
[[package]]
name = "schedule"
version = "1.2.2"
@@ -5422,7 +5597,6 @@ dependencies = [
{ name = "pygithub" },
{ name = "pygments" },
{ name = "pymupdf" },
{ name = "pytesseract" },
{ name = "python-dotenv" },
{ name = "pyyaml" },
{ name = "requests" },
@@ -5453,6 +5627,8 @@ all = [
{ name = "uvicorn" },
{ name = "voyageai" },
{ name = "weaviate-client" },
{ name = "youtube-transcript-api" },
{ name = "yt-dlp" },
]
all-cloud = [
{ name = "azure-storage-blob" },
@@ -5513,6 +5689,18 @@ s3 = [
sentence-transformers = [
{ name = "sentence-transformers" },
]
video = [
{ name = "youtube-transcript-api" },
{ name = "yt-dlp" },
]
video-full = [
{ name = "faster-whisper" },
{ name = "opencv-python-headless" },
{ name = "pytesseract" },
{ name = "scenedetect", extra = ["opencv"] },
{ name = "youtube-transcript-api" },
{ name = "yt-dlp" },
]
weaviate = [
{ name = "weaviate-client" },
]
@@ -5551,6 +5739,7 @@ requires-dist = [
{ name = "click", specifier = ">=8.3.0" },
{ name = "fastapi", marker = "extra == 'all'", specifier = ">=0.109.0" },
{ name = "fastapi", marker = "extra == 'embedding'", specifier = ">=0.109.0" },
{ name = "faster-whisper", marker = "extra == 'video-full'", specifier = ">=1.0.0" },
{ name = "gitpython", specifier = ">=3.1.40" },
{ name = "google-cloud-storage", marker = "extra == 'all'", specifier = ">=2.10.0" },
{ name = "google-cloud-storage", marker = "extra == 'all-cloud'", specifier = ">=2.10.0" },
@@ -5576,6 +5765,7 @@ requires-dist = [
{ name = "openai", marker = "extra == 'all'", specifier = ">=1.0.0" },
{ name = "openai", marker = "extra == 'all-llms'", specifier = ">=1.0.0" },
{ name = "openai", marker = "extra == 'openai'", specifier = ">=1.0.0" },
{ name = "opencv-python-headless", marker = "extra == 'video-full'", specifier = ">=4.9.0" },
{ name = "pathspec", specifier = ">=0.12.1" },
{ name = "pillow", specifier = ">=11.0.0" },
{ name = "pinecone", marker = "extra == 'all'", specifier = ">=5.0.0" },
@@ -5586,12 +5776,13 @@ requires-dist = [
{ name = "pygithub", specifier = ">=2.5.0" },
{ name = "pygments", specifier = ">=2.19.2" },
{ name = "pymupdf", specifier = ">=1.24.14" },
{ name = "pytesseract", specifier = ">=0.3.13" },
{ name = "pytesseract", marker = "extra == 'video-full'", specifier = ">=0.3.13" },
{ name = "python-docx", marker = "extra == 'all'", specifier = ">=1.1.0" },
{ name = "python-docx", marker = "extra == 'docx'", specifier = ">=1.1.0" },
{ name = "python-dotenv", specifier = ">=1.1.1" },
{ name = "pyyaml", specifier = ">=6.0" },
{ name = "requests", specifier = ">=2.32.5" },
{ name = "scenedetect", extras = ["opencv"], marker = "extra == 'video-full'", specifier = ">=0.6.4" },
{ name = "schedule", specifier = ">=1.2.0" },
{ name = "sentence-transformers", marker = "extra == 'all'", specifier = ">=2.3.0" },
{ name = "sentence-transformers", marker = "extra == 'embedding'", specifier = ">=2.3.0" },
@@ -5610,8 +5801,14 @@ requires-dist = [
{ name = "weaviate-client", marker = "extra == 'all'", specifier = ">=3.25.0" },
{ name = "weaviate-client", marker = "extra == 'rag-upload'", specifier = ">=3.25.0" },
{ name = "weaviate-client", marker = "extra == 'weaviate'", specifier = ">=3.25.0" },
{ name = "youtube-transcript-api", marker = "extra == 'all'", specifier = ">=1.2.0" },
{ name = "youtube-transcript-api", marker = "extra == 'video'", specifier = ">=1.2.0" },
{ name = "youtube-transcript-api", marker = "extra == 'video-full'", specifier = ">=1.2.0" },
{ name = "yt-dlp", marker = "extra == 'all'", specifier = ">=2024.12.0" },
{ name = "yt-dlp", marker = "extra == 'video'", specifier = ">=2024.12.0" },
{ name = "yt-dlp", marker = "extra == 'video-full'", specifier = ">=2024.12.0" },
]
provides-extras = ["mcp", "gemini", "openai", "all-llms", "s3", "gcs", "azure", "docx", "chroma", "weaviate", "sentence-transformers", "pinecone", "rag-upload", "all-cloud", "embedding", "all"]
provides-extras = ["mcp", "gemini", "openai", "all-llms", "s3", "gcs", "azure", "docx", "video", "video-full", "chroma", "weaviate", "sentence-transformers", "pinecone", "rag-upload", "all-cloud", "embedding", "all"]
[package.metadata.requires-dev]
dev = [
@@ -6774,6 +6971,28 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/73/ae/b48f95715333080afb75a4504487cbe142cae1268afc482d06692d605ae6/yarl-1.22.0-py3-none-any.whl", hash = "sha256:1380560bdba02b6b6c90de54133c81c9f2a453dee9912fe58c1dcced1edb7cff", size = 46814, upload-time = "2025-10-06T14:12:53.872Z" },
]
[[package]]
name = "youtube-transcript-api"
version = "1.2.4"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "defusedxml" },
{ name = "requests" },
]
sdist = { url = "https://files.pythonhosted.org/packages/60/43/4104185a2eaa839daa693b30e15c37e7e58795e8e09ec414f22b3db54bec/youtube_transcript_api-1.2.4.tar.gz", hash = "sha256:b72d0e96a335df599d67cee51d49e143cff4f45b84bcafc202ff51291603ddcd", size = 469839, upload-time = "2026-01-29T09:09:17.088Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/be/95/129ea37efd6cd6ed00f62baae6543345c677810b8a3bf0026756e1d3cf3c/youtube_transcript_api-1.2.4-py3-none-any.whl", hash = "sha256:03878759356da5caf5edac77431780b91448fb3d8c21d4496015bdc8a7bc43ff", size = 485227, upload-time = "2026-01-29T09:09:15.427Z" },
]
[[package]]
name = "yt-dlp"
version = "2026.2.21"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/58/d9/55ffff25204733e94a507552ad984d5a8a8e4f9d1f0d91763e6b1a41c79b/yt_dlp-2026.2.21.tar.gz", hash = "sha256:4407dfc1a71fec0dee5ef916a8d4b66057812939b509ae45451fa8fb4376b539", size = 3116630, upload-time = "2026-02-21T20:40:53.522Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/5a/40/664c99ee36d80d84ce7a96cd98aebcb3d16c19e6c3ad3461d2cf5424040e/yt_dlp-2026.2.21-py3-none-any.whl", hash = "sha256:0d8408f5b6d20487f5caeb946dfd04f9bcd2f1a3a125b744a0a982b590e449f7", size = 3313392, upload-time = "2026-02-21T20:40:51.514Z" },
]
[[package]]
name = "zipp"
version = "3.23.0"