# Video Source — Dependencies & System Requirements **Date:** February 27, 2026 **Document:** 07 of 07 **Status:** Planning > **Status: IMPLEMENTED** — `skill-seekers video --setup` (see `video_setup.py`, 835 lines, 60 tests) > - GPU auto-detection: NVIDIA (nvidia-smi/CUDA), AMD (rocminfo/ROCm), CPU fallback > - Correct PyTorch index URL selection per GPU vendor > - EasyOCR removed from pip extras, installed at runtime via --setup > - ROCm configuration (MIOPEN_FIND_MODE, HSA_OVERRIDE_GFX_VERSION) > - Virtual environment detection with --force override > - System dependency checks (tesseract, ffmpeg) > - Non-interactive mode for MCP/CI usage --- ## Table of Contents 1. [Dependency Tiers](#dependency-tiers) 2. [pyproject.toml Changes](#pyprojecttoml-changes) 3. [System Requirements](#system-requirements) 4. [Import Guards](#import-guards) 5. [Dependency Check Command](#dependency-check-command) 6. [Model Management](#model-management) 7. [Docker Considerations](#docker-considerations) --- ## Dependency Tiers Video processing has two tiers to keep the base install lightweight: ### Tier 1: `[video]` — Lightweight (YouTube transcripts + metadata) **Use case:** YouTube videos with existing captions. No download, no GPU needed. | Package | Version | Size | Purpose | |---------|---------|------|---------| | `yt-dlp` | `>=2024.12.0` | ~15MB | Metadata extraction, audio download | | `youtube-transcript-api` | `>=1.2.0` | ~50KB | YouTube caption extraction | **Capabilities:** - YouTube metadata (title, chapters, tags, description, engagement) - YouTube captions (manual and auto-generated) - Vimeo metadata - Playlist and channel resolution - Subtitle file parsing (SRT, VTT) - Segmentation and alignment - Full output generation **NOT included:** - Speech-to-text (Whisper) - Visual extraction (frame + OCR) - Local video file transcription (without subtitles) ### Tier 2: `[video-full]` — Full (adds Whisper + visual extraction) **Use case:** Local videos without subtitles, or when you want code/slide extraction from screen. | Package | Version | Size | Purpose | |---------|---------|------|---------| | `yt-dlp` | `>=2024.12.0` | ~15MB | Metadata + audio download | | `youtube-transcript-api` | `>=1.2.0` | ~50KB | YouTube captions | | `faster-whisper` | `>=1.0.0` | ~5MB (+ models: 75MB-3GB) | Speech-to-text | | `scenedetect[opencv]` | `>=0.6.4` | ~50MB (includes OpenCV) | Scene boundary detection | | `easyocr` | `>=1.7.0` | ~150MB (+ models: ~200MB) | Text recognition from frames | | `opencv-python-headless` | `>=4.9.0` | ~50MB | Frame extraction, image processing | **Additional capabilities over Tier 1:** - Whisper speech-to-text (99 languages, word-level timestamps) - Scene detection (find visual transitions) - Keyframe extraction (save important frames) - Frame classification (code/slide/terminal/diagram) - OCR on frames (extract code and text from screen) - Code block detection from video **Total install size:** - Tier 1: ~15MB - Tier 2: ~270MB + models (~300MB-3.2GB depending on Whisper model) --- ## pyproject.toml Changes ```toml [project.optional-dependencies] # Existing dependencies... gemini = ["google-generativeai>=0.8.0"] openai = ["openai>=1.0.0"] all-llms = ["google-generativeai>=0.8.0", "openai>=1.0.0"] # NEW: Video processing video = [ "yt-dlp>=2024.12.0", "youtube-transcript-api>=1.2.0", ] video-full = [ "yt-dlp>=2024.12.0", "youtube-transcript-api>=1.2.0", "faster-whisper>=1.0.0", "scenedetect[opencv]>=0.6.4", "easyocr>=1.7.0", "opencv-python-headless>=4.9.0", ] # Update 'all' to include video all = [ # ... existing all dependencies ... "yt-dlp>=2024.12.0", "youtube-transcript-api>=1.2.0", "faster-whisper>=1.0.0", "scenedetect[opencv]>=0.6.4", "easyocr>=1.7.0", "opencv-python-headless>=4.9.0", ] [project.scripts] # ... existing entry points ... skill-seekers-video = "skill_seekers.cli.video_scraper:main" # NEW ``` ### Installation Commands ```bash # Lightweight video (YouTube transcripts + metadata) pip install skill-seekers[video] # Full video (+ Whisper + visual extraction) pip install skill-seekers[video-full] # Everything pip install skill-seekers[all] # Development (editable) pip install -e ".[video]" pip install -e ".[video-full]" ``` --- ## System Requirements ### Tier 1 (Lightweight) | Requirement | Needed For | How to Check | |-------------|-----------|-------------| | Python 3.10+ | All | `python --version` | | Internet connection | YouTube API calls | N/A | No additional system dependencies. Pure Python. ### Tier 2 (Full) | Requirement | Needed For | How to Check | Install | |-------------|-----------|-------------|---------| | Python 3.10+ | All | `python --version` | — | | FFmpeg | Audio extraction, video processing | `ffmpeg -version` | See below | | GPU (optional) | Whisper + easyocr acceleration | `nvidia-smi` (NVIDIA) | CUDA toolkit | ### FFmpeg Installation FFmpeg is required for: - Extracting audio from video files (Whisper input) - Downloading audio-only streams (yt-dlp post-processing) - Converting between audio formats ```bash # macOS brew install ffmpeg # Ubuntu/Debian sudo apt install ffmpeg # Windows (winget) winget install ffmpeg # Windows (choco) choco install ffmpeg # Verify ffmpeg -version ``` ### GPU Support (Optional) GPU accelerates Whisper (~4x) and easyocr (~5x) but is not required. **NVIDIA GPU (CUDA):** ```bash # Check CUDA availability python -c "import torch; print(torch.cuda.is_available())" # faster-whisper uses CTranslate2 which auto-detects CUDA # easyocr uses PyTorch which auto-detects CUDA # No additional setup needed if PyTorch CUDA is working ``` **Apple Silicon (MPS):** ```bash # faster-whisper does not support MPS directly # Falls back to CPU on Apple Silicon # easyocr has partial MPS support ``` **CPU-only (no GPU):** ```bash # Everything works on CPU, just slower # Whisper base model: ~4x slower on CPU vs GPU # easyocr: ~5x slower on CPU vs GPU # For short videos (<10 min), CPU is fine ``` --- ## Import Guards All video dependencies use try/except import guards to provide clear error messages: ### video_scraper.py ```python """Video scraper - main orchestrator.""" # Core dependencies (always available) import json import logging import os from pathlib import Path # Tier 1: Video basics try: from yt_dlp import YoutubeDL HAS_YTDLP = True except ImportError: HAS_YTDLP = False try: from youtube_transcript_api import YouTubeTranscriptApi HAS_YT_TRANSCRIPT = True except ImportError: HAS_YT_TRANSCRIPT = False # Feature availability check def check_video_dependencies(require_full: bool = False) -> None: """Check that video dependencies are installed. Args: require_full: If True, check for full dependencies (Whisper, OCR) Raises: ImportError: With installation instructions """ missing = [] if not HAS_YTDLP: missing.append("yt-dlp") if not HAS_YT_TRANSCRIPT: missing.append("youtube-transcript-api") if missing: raise ImportError( f"Video processing requires: {', '.join(missing)}\n" f"Install with: pip install skill-seekers[video]" ) if require_full: full_missing = [] try: import faster_whisper except ImportError: full_missing.append("faster-whisper") try: import cv2 except ImportError: full_missing.append("opencv-python-headless") try: import scenedetect except ImportError: full_missing.append("scenedetect[opencv]") try: import easyocr except ImportError: full_missing.append("easyocr") if full_missing: raise ImportError( f"Visual extraction requires: {', '.join(full_missing)}\n" f"Install with: pip install skill-seekers[video-full]" ) ``` ### video_transcript.py ```python """Transcript extraction module.""" # YouTube transcript (Tier 1) try: from youtube_transcript_api import YouTubeTranscriptApi HAS_YT_TRANSCRIPT = True except ImportError: HAS_YT_TRANSCRIPT = False # Whisper (Tier 2) try: from faster_whisper import WhisperModel HAS_WHISPER = True except ImportError: HAS_WHISPER = False def get_transcript(video_info, config): """Get transcript using best available method.""" # Try YouTube captions first (Tier 1) if HAS_YT_TRANSCRIPT and video_info.source_type == VideoSourceType.YOUTUBE: try: return extract_youtube_transcript(video_info.video_id, config.languages) except TranscriptNotAvailable: pass # Try Whisper fallback (Tier 2) if HAS_WHISPER: return transcribe_with_whisper(video_info, config) # No transcript possible if not HAS_WHISPER: logger.warning( f"No transcript for {video_info.video_id}. " "Install faster-whisper for speech-to-text: " "pip install skill-seekers[video-full]" ) return [], TranscriptSource.NONE ``` ### video_visual.py ```python """Visual extraction module.""" try: import cv2 HAS_OPENCV = True except ImportError: HAS_OPENCV = False try: from scenedetect import detect, ContentDetector HAS_SCENEDETECT = True except ImportError: HAS_SCENEDETECT = False try: import easyocr HAS_EASYOCR = True except ImportError: HAS_EASYOCR = False def check_visual_dependencies() -> None: """Check visual extraction dependencies.""" missing = [] if not HAS_OPENCV: missing.append("opencv-python-headless") if not HAS_SCENEDETECT: missing.append("scenedetect[opencv]") if not HAS_EASYOCR: missing.append("easyocr") if missing: raise ImportError( f"Visual extraction requires: {', '.join(missing)}\n" f"Install with: pip install skill-seekers[video-full]" ) def check_ffmpeg() -> bool: """Check if FFmpeg is available.""" import shutil return shutil.which('ffmpeg') is not None ``` --- ## Dependency Check Command Add a dependency check to the `config` command: ```bash # Check all video dependencies skill-seekers config --check-video # Output: # Video Dependencies: # yt-dlp ✅ 2025.01.15 # youtube-transcript-api ✅ 1.2.3 # faster-whisper ❌ Not installed (pip install skill-seekers[video-full]) # opencv-python-headless ❌ Not installed # scenedetect ❌ Not installed # easyocr ❌ Not installed # # System Dependencies: # FFmpeg ✅ 6.1.1 # GPU (CUDA) ❌ Not available (CPU mode will be used) # # Available modes: # Transcript only ✅ YouTube captions available # Whisper fallback ❌ Install faster-whisper # Visual extraction ❌ Install video-full dependencies ``` --- ## Model Management ### Whisper Models Whisper models are downloaded on first use and cached in the user's home directory. | Model | Download Size | Disk Size | First-Use Download Time | |-------|-------------|-----------|------------------------| | tiny | 75 MB | 75 MB | ~15s | | base | 142 MB | 142 MB | ~25s | | small | 466 MB | 466 MB | ~60s | | medium | 1.5 GB | 1.5 GB | ~3 min | | large-v3 | 3.1 GB | 3.1 GB | ~5 min | | large-v3-turbo | 1.6 GB | 1.6 GB | ~3 min | **Cache location:** `~/.cache/huggingface/hub/` (CTranslate2 models) **Pre-download command:** ```bash # Pre-download a model before using it python -c "from faster_whisper import WhisperModel; WhisperModel('base')" ``` ### easyocr Models easyocr models are also downloaded on first use. | Language Pack | Download Size | Disk Size | |-------------|-------------|-----------| | English | ~100 MB | ~100 MB | | + Additional language | ~50-100 MB each | ~50-100 MB each | **Cache location:** `~/.EasyOCR/model/` **Pre-download command:** ```bash # Pre-download English OCR model python -c "import easyocr; easyocr.Reader(['en'])" ``` --- ## Docker Considerations ### Dockerfile additions for video support ```dockerfile # Tier 1 (lightweight) RUN pip install skill-seekers[video] # Tier 2 (full) RUN apt-get update && apt-get install -y ffmpeg RUN pip install skill-seekers[video-full] # Pre-download Whisper model (avoids first-run download) RUN python -c "from faster_whisper import WhisperModel; WhisperModel('base')" # Pre-download easyocr model RUN python -c "import easyocr; easyocr.Reader(['en'])" ``` ### Docker image sizes | Tier | Base Image Size | Additional Size | Total | |------|----------------|----------------|-------| | Tier 1 (video) | ~300 MB | ~20 MB | ~320 MB | | Tier 2 (video-full, CPU) | ~300 MB | ~800 MB | ~1.1 GB | | Tier 2 (video-full, GPU) | ~5 GB (CUDA base) | ~800 MB | ~5.8 GB | ### Kubernetes resource recommendations ```yaml # Tier 1 (transcript only) resources: requests: memory: "256Mi" cpu: "500m" limits: memory: "512Mi" cpu: "1000m" # Tier 2 (full, CPU) resources: requests: memory: "2Gi" cpu: "2000m" limits: memory: "4Gi" cpu: "4000m" # Tier 2 (full, GPU) resources: requests: memory: "4Gi" cpu: "2000m" nvidia.com/gpu: 1 limits: memory: "8Gi" cpu: "4000m" nvidia.com/gpu: 1 ```