Add complete video tutorial extraction system that converts YouTube videos and local video files into AI-consumable skills. The pipeline extracts transcripts, performs visual OCR on code editor panels independently, tracks code evolution across frames, and generates structured SKILL.md output. Key features: - Video metadata extraction (YouTube, local files, playlists) - Multi-source transcript extraction (YouTube API, yt-dlp, Whisper fallback) - Chapter-based and time-window segmentation - Visual extraction: keyframe detection, frame classification, panel detection - Per-panel sub-section OCR (each IDE panel OCR'd independently) - Parallel OCR with ThreadPoolExecutor for multi-panel frames - Narrow panel filtering (300px min width) to skip UI chrome - Text block tracking with spatial panel position matching - Code timeline with edit tracking across frames - Audio-visual alignment (code + narrator pairs) - Video-specific AI enhancement prompt for OCR denoising and code reconstruction - video-tutorial.yaml workflow with 4 stages (OCR cleanup, language detection, tutorial synthesis, skill polish) - CLI integration: skill-seekers video --url/--video-file/--playlist - MCP tool: scrape_video for automation - 161 tests passing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
24 KiB
24 KiB
Video Source — Testing Strategy
Date: February 27, 2026 Document: 06 of 07 Status: Planning
Table of Contents
- Testing Principles
- Test File Structure
- Fixtures & Mock Data
- Unit Tests
- Integration Tests
- E2E Tests
- CI Considerations
- Performance Tests
Testing Principles
- No network calls in unit tests — All YouTube API, yt-dlp, and download operations must be mocked.
- No GPU required in CI — All Whisper and easyocr tests must work on CPU, or be marked
@pytest.mark.slow. - No video files in repo — Test fixtures use JSON transcripts and small synthetic images, not actual video files.
- 100% pipeline coverage — Every phase of the 6-phase pipeline must be tested.
- Edge case focus — Test missing chapters, empty transcripts, corrupt frames, rate limits.
- Compatible with existing test infra — Use existing conftest.py, markers, and patterns.
Test File Structure
tests/
├── test_video_models.py # Data model tests (serialization, validation)
├── test_video_scraper.py # Main scraper orchestration tests
├── test_video_transcript.py # Transcript extraction tests
├── test_video_visual.py # Visual extraction tests
├── test_video_segmenter.py # Segmentation and alignment tests
├── test_video_integration.py # Integration with unified scraper, create command
├── test_video_output.py # Output generation tests
├── test_video_source_detector.py # Source detection tests (or add to existing)
├── fixtures/
│ └── video/
│ ├── sample_metadata.json # yt-dlp info_dict mock
│ ├── sample_transcript.json # YouTube transcript mock
│ ├── sample_whisper_output.json # Whisper transcription mock
│ ├── sample_chapters.json # Chapter data mock
│ ├── sample_playlist.json # Playlist metadata mock
│ ├── sample_segments.json # Pre-aligned segments
│ ├── sample_frame_code.png # 100x100 synthetic dark frame
│ ├── sample_frame_slide.png # 100x100 synthetic light frame
│ ├── sample_frame_diagram.png # 100x100 synthetic edge-heavy frame
│ ├── sample_srt.srt # SRT subtitle file
│ ├── sample_vtt.vtt # WebVTT subtitle file
│ └── sample_config.json # Video source config
Fixtures & Mock Data
yt-dlp Metadata Fixture
# tests/fixtures/video/sample_metadata.json
SAMPLE_YTDLP_METADATA = {
"id": "abc123def45",
"title": "React Hooks Tutorial for Beginners",
"description": "Learn React Hooks from scratch. Covers useState, useEffect, and custom hooks.",
"duration": 1832,
"upload_date": "20260115",
"uploader": "React Official",
"uploader_url": "https://www.youtube.com/@reactofficial",
"channel_follower_count": 250000,
"view_count": 1500000,
"like_count": 45000,
"comment_count": 2300,
"tags": ["react", "hooks", "tutorial", "javascript"],
"categories": ["Education"],
"language": "en",
"thumbnail": "https://i.ytimg.com/vi/abc123def45/maxresdefault.jpg",
"webpage_url": "https://www.youtube.com/watch?v=abc123def45",
"chapters": [
{"title": "Intro", "start_time": 0, "end_time": 45},
{"title": "Project Setup", "start_time": 45, "end_time": 180},
{"title": "useState Hook", "start_time": 180, "end_time": 540},
{"title": "useEffect Hook", "start_time": 540, "end_time": 900},
{"title": "Custom Hooks", "start_time": 900, "end_time": 1320},
{"title": "Best Practices", "start_time": 1320, "end_time": 1680},
{"title": "Wrap Up", "start_time": 1680, "end_time": 1832},
],
"subtitles": {
"en": [{"ext": "vtt", "url": "https://..."}],
},
"automatic_captions": {
"en": [{"ext": "vtt", "url": "https://..."}],
},
"extractor": "youtube",
}
YouTube Transcript Fixture
SAMPLE_YOUTUBE_TRANSCRIPT = [
{"text": "Welcome to this React Hooks tutorial.", "start": 0.0, "duration": 2.5},
{"text": "Today we'll learn about the most important hooks.", "start": 2.5, "duration": 3.0},
{"text": "Let's start by setting up our project.", "start": 45.0, "duration": 2.8},
{"text": "We'll use Create React App.", "start": 47.8, "duration": 2.0},
{"text": "Run npx create-react-app hooks-demo.", "start": 49.8, "duration": 3.5},
# ... more segments covering all chapters
]
Whisper Output Fixture
SAMPLE_WHISPER_OUTPUT = {
"language": "en",
"language_probability": 0.98,
"duration": 1832.0,
"segments": [
{
"start": 0.0,
"end": 2.5,
"text": "Welcome to this React Hooks tutorial.",
"avg_logprob": -0.15,
"no_speech_prob": 0.01,
"words": [
{"word": "Welcome", "start": 0.0, "end": 0.4, "probability": 0.97},
{"word": "to", "start": 0.4, "end": 0.5, "probability": 0.99},
{"word": "this", "start": 0.5, "end": 0.7, "probability": 0.98},
{"word": "React", "start": 0.7, "end": 1.1, "probability": 0.95},
{"word": "Hooks", "start": 1.1, "end": 1.5, "probability": 0.93},
{"word": "tutorial.", "start": 1.5, "end": 2.3, "probability": 0.96},
],
},
],
}
Synthetic Frame Fixtures
# Generate in conftest.py or fixture setup
import numpy as np
import cv2
def create_dark_frame(path: str):
"""Create a synthetic dark frame (simulates code editor)."""
img = np.zeros((1080, 1920, 3), dtype=np.uint8)
img[200:250, 100:800] = [200, 200, 200] # Simulated text line
img[270:320, 100:600] = [180, 180, 180] # Another text line
cv2.imwrite(path, img)
def create_light_frame(path: str):
"""Create a synthetic light frame (simulates slide)."""
img = np.ones((1080, 1920, 3), dtype=np.uint8) * 240
img[100:150, 200:1000] = [40, 40, 40] # Title text
img[300:330, 200:1200] = [60, 60, 60] # Body text
cv2.imwrite(path, img)
conftest.py Additions
# tests/conftest.py — add video fixtures
import pytest
import json
from pathlib import Path
FIXTURES_DIR = Path(__file__).parent / "fixtures" / "video"
@pytest.fixture
def sample_ytdlp_metadata():
"""Load sample yt-dlp metadata."""
with open(FIXTURES_DIR / "sample_metadata.json") as f:
return json.load(f)
@pytest.fixture
def sample_transcript():
"""Load sample YouTube transcript."""
with open(FIXTURES_DIR / "sample_transcript.json") as f:
return json.load(f)
@pytest.fixture
def sample_whisper_output():
"""Load sample Whisper transcription output."""
with open(FIXTURES_DIR / "sample_whisper_output.json") as f:
return json.load(f)
@pytest.fixture
def sample_chapters():
"""Load sample chapter data."""
with open(FIXTURES_DIR / "sample_chapters.json") as f:
return json.load(f)
@pytest.fixture
def sample_video_config():
"""Create a sample VideoSourceConfig."""
from skill_seekers.cli.video_models import VideoSourceConfig
return VideoSourceConfig(
url="https://www.youtube.com/watch?v=abc123def45",
name="test_video",
visual_extraction=False,
max_videos=5,
)
@pytest.fixture
def video_output_dir(tmp_path):
"""Create a temporary output directory for video tests."""
output = tmp_path / "output" / "test_video"
output.mkdir(parents=True)
(output / "video_data").mkdir()
(output / "video_data" / "transcripts").mkdir()
(output / "video_data" / "segments").mkdir()
(output / "video_data" / "frames").mkdir()
(output / "references").mkdir()
(output / "pages").mkdir()
return output
Unit Tests
test_video_models.py
"""Tests for video data models and serialization."""
class TestVideoInfo:
def test_create_from_ytdlp_metadata(self, sample_ytdlp_metadata):
"""VideoInfo correctly parses yt-dlp info_dict."""
...
def test_serialization_round_trip(self):
"""VideoInfo serializes to dict and deserializes back identically."""
...
def test_content_richness_score(self):
"""Content richness score computed correctly based on signals."""
...
def test_empty_chapters(self):
"""VideoInfo handles video with no chapters."""
...
class TestVideoSegment:
def test_timestamp_display(self):
"""Timestamp display formats correctly (MM:SS - MM:SS)."""
...
def test_youtube_timestamp_url(self):
"""YouTube timestamp URL generated correctly."""
...
def test_segment_with_code_blocks(self):
"""Segment correctly tracks detected code blocks."""
...
def test_segment_without_visual(self):
"""Segment works when visual extraction is disabled."""
...
class TestChapter:
def test_chapter_duration(self):
"""Chapter duration computed correctly."""
...
def test_chapter_serialization(self):
"""Chapter serializes to/from dict."""
...
class TestTranscriptSegment:
def test_from_youtube_api(self):
"""TranscriptSegment created from YouTube API format."""
...
def test_from_whisper_output(self):
"""TranscriptSegment created from Whisper output."""
...
def test_with_word_timestamps(self):
"""TranscriptSegment preserves word-level timestamps."""
...
class TestVideoSourceConfig:
def test_validate_single_source(self):
"""Config requires exactly one source field."""
...
def test_validate_duration_range(self):
"""Config validates min < max duration."""
...
def test_defaults(self):
"""Config has sensible defaults."""
...
def test_from_unified_config(self, sample_video_config):
"""Config created from unified config JSON entry."""
...
class TestEnums:
def test_all_video_source_types(self):
"""All VideoSourceType values are valid."""
...
def test_all_frame_types(self):
"""All FrameType values are valid."""
...
def test_all_transcript_sources(self):
"""All TranscriptSource values are valid."""
...
test_video_transcript.py
"""Tests for transcript extraction (YouTube API + Whisper + subtitle parsing)."""
class TestYouTubeTranscript:
@patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
def test_extract_manual_captions(self, mock_api, sample_transcript):
"""Prefers manual captions over auto-generated."""
...
@patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
def test_fallback_to_auto_generated(self, mock_api):
"""Falls back to auto-generated when manual not available."""
...
@patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
def test_fallback_to_translation(self, mock_api):
"""Falls back to translated captions when preferred language unavailable."""
...
@patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
def test_no_transcript_available(self, mock_api):
"""Raises TranscriptNotAvailable when no captions exist."""
...
@patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
def test_confidence_scoring(self, mock_api, sample_transcript):
"""Manual captions get 1.0 confidence, auto-generated get 0.8."""
...
class TestWhisperTranscription:
@pytest.mark.slow
@patch('skill_seekers.cli.video_transcript.WhisperModel')
def test_transcribe_with_word_timestamps(self, mock_model):
"""Whisper returns word-level timestamps."""
...
@patch('skill_seekers.cli.video_transcript.WhisperModel')
def test_language_detection(self, mock_model):
"""Whisper detects video language."""
...
@patch('skill_seekers.cli.video_transcript.WhisperModel')
def test_vad_filtering(self, mock_model):
"""VAD filter removes silence segments."""
...
def test_download_audio_only(self):
"""Audio extraction downloads audio stream only (not video)."""
# Mock yt-dlp download
...
class TestSubtitleParsing:
def test_parse_srt(self, tmp_path):
"""Parse SRT subtitle file into segments."""
srt_content = "1\n00:00:01,500 --> 00:00:04,000\nHello world\n\n2\n00:00:05,000 --> 00:00:08,000\nSecond line\n"
srt_file = tmp_path / "test.srt"
srt_file.write_text(srt_content)
...
def test_parse_vtt(self, tmp_path):
"""Parse WebVTT subtitle file into segments."""
vtt_content = "WEBVTT\n\n00:00:01.500 --> 00:00:04.000\nHello world\n\n00:00:05.000 --> 00:00:08.000\nSecond line\n"
vtt_file = tmp_path / "test.vtt"
vtt_file.write_text(vtt_content)
...
def test_srt_html_tag_removal(self, tmp_path):
"""SRT parser removes inline HTML tags."""
...
def test_empty_subtitle_file(self, tmp_path):
"""Handle empty subtitle file gracefully."""
...
class TestTranscriptFallbackChain:
@patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
@patch('skill_seekers.cli.video_transcript.WhisperModel')
def test_youtube_then_whisper_fallback(self, mock_whisper, mock_yt_api):
"""Falls back to Whisper when YouTube captions fail."""
...
def test_subtitle_file_discovery(self, tmp_path):
"""Discovers sidecar subtitle files for local videos."""
...
test_video_visual.py
"""Tests for visual extraction (scene detection, frame extraction, OCR)."""
class TestFrameClassification:
def test_classify_dark_frame_as_code(self, tmp_path):
"""Dark frame with text patterns classified as code_editor."""
...
def test_classify_light_frame_as_slide(self, tmp_path):
"""Light uniform frame classified as slide."""
...
def test_classify_high_edge_as_diagram(self, tmp_path):
"""High edge density frame classified as diagram."""
...
def test_classify_blank_frame_as_other(self, tmp_path):
"""Nearly blank frame classified as other."""
...
class TestKeyframeTimestamps:
def test_chapter_boundaries_included(self, sample_chapters):
"""Keyframe timestamps include chapter start times."""
...
def test_long_chapter_midpoint(self, sample_chapters):
"""Long chapters (>2 min) get midpoint keyframe."""
...
def test_deduplication_within_1_second(self):
"""Timestamps within 1 second are deduplicated."""
...
def test_regular_intervals_fill_gaps(self):
"""Regular interval timestamps fill gaps between scenes."""
...
class TestOCRExtraction:
@pytest.mark.slow
@patch('skill_seekers.cli.video_visual.easyocr.Reader')
def test_extract_text_from_code_frame(self, mock_reader, tmp_path):
"""OCR extracts text from code editor frame."""
...
@patch('skill_seekers.cli.video_visual.easyocr.Reader')
def test_confidence_filtering(self, mock_reader):
"""Low-confidence OCR results are filtered out."""
...
@patch('skill_seekers.cli.video_visual.easyocr.Reader')
def test_monospace_detection(self, mock_reader):
"""Monospace text regions correctly detected."""
...
class TestCodeBlockDetection:
def test_detect_python_code(self):
"""Detect Python code from OCR text."""
...
def test_detect_terminal_commands(self):
"""Detect terminal commands from OCR text."""
...
def test_language_detection_from_ocr(self):
"""Language detection works on OCR-extracted code."""
...
test_video_segmenter.py
"""Tests for segmentation and stream alignment."""
class TestChapterSegmentation:
def test_chapters_create_segments(self, sample_chapters):
"""Chapters map directly to segments."""
...
def test_long_chapter_splitting(self):
"""Chapters exceeding max_segment_duration are split."""
...
def test_empty_chapters(self):
"""Falls back to time window when no chapters."""
...
class TestTimeWindowSegmentation:
def test_fixed_windows(self):
"""Creates segments at fixed intervals."""
...
def test_sentence_boundary_alignment(self):
"""Segments split at sentence boundaries, not mid-word."""
...
def test_configurable_window_size(self):
"""Window size respects config.time_window_seconds."""
...
class TestStreamAlignment:
def test_align_transcript_to_segments(self, sample_transcript, sample_chapters):
"""Transcript segments mapped to correct time windows."""
...
def test_align_keyframes_to_segments(self):
"""Keyframes mapped to correct segments by timestamp."""
...
def test_partial_overlap_handling(self):
"""Transcript segments partially overlapping window boundaries."""
...
def test_empty_segment_handling(self):
"""Handle segments with no transcript (silence, music)."""
...
class TestContentMerging:
def test_transcript_only_content(self):
"""Content is just transcript when no visual data."""
...
def test_code_block_appended(self):
"""Code on screen is appended to transcript content."""
...
def test_duplicate_code_not_repeated(self):
"""Code mentioned in transcript is not duplicated from OCR."""
...
def test_chapter_title_as_heading(self):
"""Chapter title becomes markdown heading in content."""
...
def test_slide_text_supplementary(self):
"""Slide text adds to content when not in transcript."""
...
class TestCategorization:
def test_category_from_chapter_title(self):
"""Category inferred from chapter title keywords."""
...
def test_category_from_transcript(self):
"""Category inferred from transcript content."""
...
def test_custom_categories_from_config(self):
"""Custom category keywords from config used."""
...
Integration Tests
test_video_integration.py
"""Integration tests for video pipeline end-to-end."""
class TestSourceDetectorVideo:
def test_detect_youtube_video(self):
info = SourceDetector.detect("https://youtube.com/watch?v=abc123def45")
assert info.type == "video"
assert info.parsed["video_source"] == "youtube_video"
def test_detect_youtube_short_url(self):
info = SourceDetector.detect("https://youtu.be/abc123def45")
assert info.type == "video"
def test_detect_youtube_playlist(self):
info = SourceDetector.detect("https://youtube.com/playlist?list=PLxxx")
assert info.type == "video"
assert info.parsed["video_source"] == "youtube_playlist"
def test_detect_youtube_channel(self):
info = SourceDetector.detect("https://youtube.com/@reactofficial")
assert info.type == "video"
assert info.parsed["video_source"] == "youtube_channel"
def test_detect_vimeo(self):
info = SourceDetector.detect("https://vimeo.com/123456789")
assert info.type == "video"
assert info.parsed["video_source"] == "vimeo"
def test_detect_mp4_file(self, tmp_path):
f = tmp_path / "tutorial.mp4"
f.touch()
info = SourceDetector.detect(str(f))
assert info.type == "video"
assert info.parsed["video_source"] == "local_file"
def test_detect_video_directory(self, tmp_path):
d = tmp_path / "videos"
d.mkdir()
(d / "vid1.mp4").touch()
(d / "vid2.mkv").touch()
info = SourceDetector.detect(str(d))
assert info.type == "video"
def test_youtube_not_confused_with_web(self):
"""YouTube URLs detected as video, not web."""
info = SourceDetector.detect("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
assert info.type == "video"
assert info.type != "web"
class TestUnifiedConfigVideo:
def test_video_source_in_config(self, tmp_path):
"""Video source parsed correctly from unified config."""
...
def test_multiple_video_sources(self, tmp_path):
"""Multiple video sources in same config."""
...
def test_video_alongside_docs(self, tmp_path):
"""Video source alongside documentation source."""
...
class TestFullPipeline:
@patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
@patch('skill_seekers.cli.video_scraper.YoutubeDL')
def test_single_video_transcript_only(
self, mock_ytdl, mock_transcript, sample_ytdlp_metadata,
sample_transcript, video_output_dir
):
"""Full pipeline: single YouTube video, transcript only."""
mock_ytdl.return_value.__enter__.return_value.extract_info.return_value = sample_ytdlp_metadata
mock_transcript.list_transcripts.return_value = ...
# Run pipeline
# Assert output files exist and content is correct
...
@pytest.mark.slow
@patch('skill_seekers.cli.video_visual.easyocr.Reader')
@patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
@patch('skill_seekers.cli.video_scraper.YoutubeDL')
def test_single_video_with_visual(
self, mock_ytdl, mock_transcript, mock_ocr,
sample_ytdlp_metadata, video_output_dir
):
"""Full pipeline: single video with visual extraction."""
...
CI Considerations
What Runs in CI (Default)
- All unit tests (mocked, no network, no GPU)
- Integration tests with mocked external services
- Source detection tests (pure logic)
- Data model tests (pure logic)
What Doesn't Run in CI (Marked)
@pytest.mark.slow # Whisper model loading, actual OCR
@pytest.mark.integration # Real YouTube API calls
@pytest.mark.e2e # Full pipeline with real video download
CI Test Matrix Compatibility
| Test | Ubuntu | macOS | Python 3.10 | Python 3.12 | GPU |
|---|---|---|---|---|---|
| Unit tests | Yes | Yes | Yes | Yes | No |
| Integration (mocked) | Yes | Yes | Yes | Yes | No |
| Whisper tests (mocked) | Yes | Yes | Yes | Yes | No |
| OCR tests (mocked) | Yes | Yes | Yes | Yes | No |
| E2E (real download) | Skip | Skip | Skip | Skip | No |
Dependency Handling in Tests
# At top of visual test files:
pytest.importorskip("cv2", reason="opencv-python-headless required for visual tests")
pytest.importorskip("easyocr", reason="easyocr required for OCR tests")
# At top of whisper test files:
pytest.importorskip("faster_whisper", reason="faster-whisper required for transcription tests")
Performance Tests
@pytest.mark.benchmark
class TestVideoPerformance:
def test_transcript_parsing_speed(self, sample_transcript):
"""Transcript parsing completes in < 10ms for 1000 segments."""
...
def test_segment_alignment_speed(self):
"""Segment alignment completes in < 50ms for 100 segments."""
...
def test_frame_classification_speed(self, tmp_path):
"""Frame classification completes in < 20ms per frame."""
...
def test_content_merging_speed(self):
"""Content merging completes in < 5ms per segment."""
...
def test_output_generation_speed(self, video_output_dir):
"""Output generation (5 videos, 50 segments) in < 1 second."""
...