firefrost-gaming/skill-seekers-reference


YusufKaraaslanSpyke 62071c4aa9 feat: add video tutorial scraping pipeline with per-panel OCR and AI enhancement

Add complete video tutorial extraction system that converts YouTube videos
and local video files into AI-consumable skills. The pipeline extracts
transcripts, performs visual OCR on code editor panels independently,
tracks code evolution across frames, and generates structured SKILL.md output.

Key features:
- Video metadata extraction (YouTube, local files, playlists)
- Multi-source transcript extraction (YouTube API, yt-dlp, Whisper fallback)
- Chapter-based and time-window segmentation
- Visual extraction: keyframe detection, frame classification, panel detection
- Per-panel sub-section OCR (each IDE panel OCR'd independently)
- Parallel OCR with ThreadPoolExecutor for multi-panel frames
- Narrow panel filtering (300px min width) to skip UI chrome
- Text block tracking with spatial panel position matching
- Code timeline with edit tracking across frames
- Audio-visual alignment (code + narrator pairs)
- Video-specific AI enhancement prompt for OCR denoising and code reconstruction
- video-tutorial.yaml workflow with 4 stages (OCR cleanup, language detection,
  tutorial synthesis, skill polish)
- CLI integration: skill-seekers video --url/--video-file/--playlist
- MCP tool: scrape_video for automation
- 161 tests passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-27 23:10:19 +03:00


Video Source — Testing Strategy

Date: February 27, 2026
Document: 06 of 07
Status: Planning

Testing Principles
Test File Structure
Fixtures & Mock Data
Unit Tests
Integration Tests
E2E Tests
CI Considerations
Performance Tests

Testing Principles

No network calls in unit tests — All YouTube API, yt-dlp, and download operations must be mocked.
No GPU required in CI — All Whisper and easyocr tests must work on CPU, or be marked @pytest.mark.slow.
No video files in repo — Test fixtures use JSON transcripts and small synthetic images, not actual video files.
100% pipeline coverage — Every phase of the 6-phase pipeline must be tested.
Edge case focus — Test missing chapters, empty transcripts, corrupt frames, rate limits.
Compatible with existing test infra — Use existing conftest.py, markers, and patterns.

Test File Structure

tests/
├── test_video_models.py          # Data model tests (serialization, validation)
├── test_video_scraper.py         # Main scraper orchestration tests
├── test_video_transcript.py      # Transcript extraction tests
├── test_video_visual.py          # Visual extraction tests
├── test_video_segmenter.py       # Segmentation and alignment tests
├── test_video_integration.py     # Integration with unified scraper, create command
├── test_video_output.py          # Output generation tests
├── test_video_source_detector.py # Source detection tests (or add to existing)
├── fixtures/
│   └── video/
│       ├── sample_metadata.json       # yt-dlp info_dict mock
│       ├── sample_transcript.json     # YouTube transcript mock
│       ├── sample_whisper_output.json # Whisper transcription mock
│       ├── sample_chapters.json       # Chapter data mock
│       ├── sample_playlist.json       # Playlist metadata mock
│       ├── sample_segments.json       # Pre-aligned segments
│       ├── sample_frame_code.png      # 100x100 synthetic dark frame
│       ├── sample_frame_slide.png     # 100x100 synthetic light frame
│       ├── sample_frame_diagram.png   # 100x100 synthetic edge-heavy frame
│       ├── sample_srt.srt             # SRT subtitle file
│       ├── sample_vtt.vtt             # WebVTT subtitle file
│       └── sample_config.json         # Video source config

Fixtures & Mock Data

yt-dlp Metadata Fixture

# Contents of tests/fixtures/video/sample_metadata.json, shown here as a Python literal
SAMPLE_YTDLP_METADATA = {
    "id": "abc123def45",
    "title": "React Hooks Tutorial for Beginners",
    "description": "Learn React Hooks from scratch. Covers useState, useEffect, and custom hooks.",
    "duration": 1832,
    "upload_date": "20260115",
    "uploader": "React Official",
    "uploader_url": "https://www.youtube.com/@reactofficial",
    "channel_follower_count": 250000,
    "view_count": 1500000,
    "like_count": 45000,
    "comment_count": 2300,
    "tags": ["react", "hooks", "tutorial", "javascript"],
    "categories": ["Education"],
    "language": "en",
    "thumbnail": "https://i.ytimg.com/vi/abc123def45/maxresdefault.jpg",
    "webpage_url": "https://www.youtube.com/watch?v=abc123def45",
    "chapters": [
        {"title": "Intro", "start_time": 0, "end_time": 45},
        {"title": "Project Setup", "start_time": 45, "end_time": 180},
        {"title": "useState Hook", "start_time": 180, "end_time": 540},
        {"title": "useEffect Hook", "start_time": 540, "end_time": 900},
        {"title": "Custom Hooks", "start_time": 900, "end_time": 1320},
        {"title": "Best Practices", "start_time": 1320, "end_time": 1680},
        {"title": "Wrap Up", "start_time": 1680, "end_time": 1832},
    ],
    "subtitles": {
        "en": [{"ext": "vtt", "url": "https://..."}],
    },
    "automatic_captions": {
        "en": [{"ext": "vtt", "url": "https://..."}],
    },
    "extractor": "youtube",
}
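Because several planned tests depend on the chapter list, it may be worth checking that the fixture itself is consistent before using it. The sketch below is only an illustration: `find_chapter_gaps` is a hypothetical helper, not part of the planned suite. It checks that the chapters tile the video with no gaps or overlaps.

```python
# Hypothetical helper: verifies that yt-dlp style chapters cover the whole
# video back to back. The name and return shape are assumptions.

def find_chapter_gaps(chapters, duration):
    """Return a list of problems found; an empty list means the chapters are consistent."""
    problems = []
    expected_start = 0
    for chapter in chapters:
        if chapter["start_time"] != expected_start:
            problems.append(
                f"{chapter['title']!r} starts at {chapter['start_time']}, "
                f"expected {expected_start}"
            )
        if chapter["end_time"] <= chapter["start_time"]:
            problems.append(f"{chapter['title']!r} has a non-positive duration")
        expected_start = chapter["end_time"]
    if chapters and expected_start != duration:
        problems.append(f"last chapter ends at {expected_start}, video lasts {duration}")
    return problems


# Chapter values copied from the sample metadata fixture.
chapters = [
    {"title": "Intro", "start_time": 0, "end_time": 45},
    {"title": "Project Setup", "start_time": 45, "end_time": 180},
    {"title": "useState Hook", "start_time": 180, "end_time": 540},
    {"title": "useEffect Hook", "start_time": 540, "end_time": 900},
    {"title": "Custom Hooks", "start_time": 900, "end_time": 1320},
    {"title": "Best Practices", "start_time": 1320, "end_time": 1680},
    {"title": "Wrap Up", "start_time": 1680, "end_time": 1832},
]

fixture_problems = find_chapter_gaps(chapters, duration=1832)
```

For the sample fixture the list of problems is empty, so tests that derive segments from chapters start from a known-good baseline.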

YouTube Transcript Fixture

SAMPLE_YOUTUBE_TRANSCRIPT = [
    {"text": "Welcome to this React Hooks tutorial.", "start": 0.0, "duration": 2.5},
    {"text": "Today we'll learn about the most important hooks.", "start": 2.5, "duration": 3.0},
    {"text": "Let's start by setting up our project.", "start": 45.0, "duration": 2.8},
    {"text": "We'll use Create React App.", "start": 47.8, "duration": 2.0},
    {"text": "Run npx create-react-app hooks-demo.", "start": 49.8, "duration": 3.5},
    # ... more segments covering all chapters
]

Whisper Output Fixture

SAMPLE_WHISPER_OUTPUT = {
    "language": "en",
    "language_probability": 0.98,
    "duration": 1832.0,
    "segments": [
        {
            "start": 0.0,
            "end": 2.5,
            "text": "Welcome to this React Hooks tutorial.",
            "avg_logprob": -0.15,
            "no_speech_prob": 0.01,
            "words": [
                {"word": "Welcome", "start": 0.0, "end": 0.4, "probability": 0.97},
                {"word": "to", "start": 0.4, "end": 0.5, "probability": 0.99},
                {"word": "this", "start": 0.5, "end": 0.7, "probability": 0.98},
                {"word": "React", "start": 0.7, "end": 1.1, "probability": 0.95},
                {"word": "Hooks", "start": 1.1, "end": 1.5, "probability": 0.93},
                {"word": "tutorial.", "start": 1.5, "end": 2.3, "probability": 0.96},
            ],
        },
    ],
}
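The two transcript fixtures use different shapes: YouTube entries carry `start` and `duration`, while Whisper segments carry `start` and `end`. A minimal sketch of normalizing both into one shape follows. The `Cue` dataclass and both converter functions are hypothetical stand-ins, and the real `TranscriptSegment` model may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """Hypothetical common shape for a transcript line (an assumption, not the real model)."""
    start: float
    end: float
    text: str


def cue_from_youtube(entry):
    # YouTube entries give a duration, so the end time has to be derived.
    return Cue(entry["start"], entry["start"] + entry["duration"], entry["text"].strip())


def cue_from_whisper(segment):
    # Whisper segments already carry an explicit end time.
    return Cue(segment["start"], segment["end"], segment["text"].strip())


youtube_cue = cue_from_youtube(
    {"text": "Welcome to this React Hooks tutorial.", "start": 0.0, "duration": 2.5}
)
whisper_cue = cue_from_whisper(
    {"start": 0.0, "end": 2.5, "text": "Welcome to this React Hooks tutorial."}
)
```

With the first entry of each fixture, both converters yield the same `Cue`, which is the property the `TestTranscriptSegment` cases would presumably assert.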

Synthetic Frame Fixtures

# Generate in conftest.py or fixture setup
import numpy as np
import cv2

def create_dark_frame(path: str):
    """Create a synthetic dark frame (simulates code editor)."""
    img = np.zeros((1080, 1920, 3), dtype=np.uint8)
    img[200:250, 100:800] = [200, 200, 200]  # Simulated text line
    img[270:320, 100:600] = [180, 180, 180]  # Another text line
    cv2.imwrite(path, img)

def create_light_frame(path: str):
    """Create a synthetic light frame (simulates slide)."""
    img = np.ones((1080, 1920, 3), dtype=np.uint8) * 240
    img[100:150, 200:1000] = [40, 40, 40]  # Title text
    img[300:330, 200:1200] = [60, 60, 60]  # Body text
    cv2.imwrite(path, img)

conftest.py Additions

# tests/conftest.py — add video fixtures

import pytest
import json
from pathlib import Path

FIXTURES_DIR = Path(__file__).parent / "fixtures" / "video"


@pytest.fixture
def sample_ytdlp_metadata():
    """Load sample yt-dlp metadata."""
    with open(FIXTURES_DIR / "sample_metadata.json") as f:
        return json.load(f)


@pytest.fixture
def sample_transcript():
    """Load sample YouTube transcript."""
    with open(FIXTURES_DIR / "sample_transcript.json") as f:
        return json.load(f)


@pytest.fixture
def sample_whisper_output():
    """Load sample Whisper transcription output."""
    with open(FIXTURES_DIR / "sample_whisper_output.json") as f:
        return json.load(f)


@pytest.fixture
def sample_chapters():
    """Load sample chapter data."""
    with open(FIXTURES_DIR / "sample_chapters.json") as f:
        return json.load(f)


@pytest.fixture
def sample_video_config():
    """Create a sample VideoSourceConfig."""
    from skill_seekers.cli.video_models import VideoSourceConfig
    return VideoSourceConfig(
        url="https://www.youtube.com/watch?v=abc123def45",
        name="test_video",
        visual_extraction=False,
        max_videos=5,
    )


@pytest.fixture
def video_output_dir(tmp_path):
    """Create a temporary output directory for video tests."""
    output = tmp_path / "output" / "test_video"
    output.mkdir(parents=True)
    (output / "video_data").mkdir()
    (output / "video_data" / "transcripts").mkdir()
    (output / "video_data" / "segments").mkdir()
    (output / "video_data" / "frames").mkdir()
    (output / "references").mkdir()
    (output / "pages").mkdir()
    return output

Unit Tests

test_video_models.py

"""Tests for video data models and serialization."""

class TestVideoInfo:
    def test_create_from_ytdlp_metadata(self, sample_ytdlp_metadata):
        """VideoInfo correctly parses yt-dlp info_dict."""
        ...

    def test_serialization_round_trip(self):
        """VideoInfo serializes to dict and deserializes back identically."""
        ...

    def test_content_richness_score(self):
        """Content richness score computed correctly based on signals."""
        ...

    def test_empty_chapters(self):
        """VideoInfo handles video with no chapters."""
        ...


class TestVideoSegment:
    def test_timestamp_display(self):
        """Timestamp display formats correctly (MM:SS - MM:SS)."""
        ...

    def test_youtube_timestamp_url(self):
        """YouTube timestamp URL generated correctly."""
        ...

    def test_segment_with_code_blocks(self):
        """Segment correctly tracks detected code blocks."""
        ...

    def test_segment_without_visual(self):
        """Segment works when visual extraction is disabled."""
        ...


class TestChapter:
    def test_chapter_duration(self):
        """Chapter duration computed correctly."""
        ...

    def test_chapter_serialization(self):
        """Chapter serializes to/from dict."""
        ...


class TestTranscriptSegment:
    def test_from_youtube_api(self):
        """TranscriptSegment created from YouTube API format."""
        ...

    def test_from_whisper_output(self):
        """TranscriptSegment created from Whisper output."""
        ...

    def test_with_word_timestamps(self):
        """TranscriptSegment preserves word-level timestamps."""
        ...


class TestVideoSourceConfig:
    def test_validate_single_source(self):
        """Config requires exactly one source field."""
        ...

    def test_validate_duration_range(self):
        """Config validates min < max duration."""
        ...

    def test_defaults(self):
        """Config has sensible defaults."""
        ...

    def test_from_unified_config(self, sample_video_config):
        """Config created from unified config JSON entry."""
        ...


class TestEnums:
    def test_all_video_source_types(self):
        """All VideoSourceType values are valid."""
        ...

    def test_all_frame_types(self):
        """All FrameType values are valid."""
        ...

    def test_all_transcript_sources(self):
        """All TranscriptSource values are valid."""
        ...
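To make the `TestVideoSegment` expectations concrete, here is a minimal sketch of the two formatting behaviours named in the docstrings. The function names are hypothetical, and the real model may expose these as properties instead. The `&t=<seconds>s` query parameter is the usual way to link to a position in a YouTube video.

```python
def format_timestamp(seconds):
    """Format seconds as MM:SS (minutes keep growing past 59 for long videos)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def timestamp_display(start, end):
    """Render a segment range in the 'MM:SS - MM:SS' form described in the tests."""
    return f"{format_timestamp(start)} - {format_timestamp(end)}"


def youtube_timestamp_url(video_id, start):
    """Build a watch URL that jumps to the segment start."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(start)}s"


display = timestamp_display(180, 540)
url = youtube_timestamp_url("abc123def45", 180)
```

This sketch does not switch to an HH:MM:SS form for videos longer than an hour. Whether the real model should is a design decision the tests would need to pin down.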

test_video_transcript.py

"""Tests for transcript extraction (YouTube API + Whisper + subtitle parsing)."""

from unittest.mock import patch

import pytest

class TestYouTubeTranscript:
    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
    def test_extract_manual_captions(self, mock_api, sample_transcript):
        """Prefers manual captions over auto-generated."""
        ...

    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
    def test_fallback_to_auto_generated(self, mock_api):
        """Falls back to auto-generated when manual not available."""
        ...

    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
    def test_fallback_to_translation(self, mock_api):
        """Falls back to translated captions when preferred language unavailable."""
        ...

    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
    def test_no_transcript_available(self, mock_api):
        """Raises TranscriptNotAvailable when no captions exist."""
        ...

    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
    def test_confidence_scoring(self, mock_api, sample_transcript):
        """Manual captions get 1.0 confidence, auto-generated get 0.8."""
        ...


class TestWhisperTranscription:
    @pytest.mark.slow
    @patch('skill_seekers.cli.video_transcript.WhisperModel')
    def test_transcribe_with_word_timestamps(self, mock_model):
        """Whisper returns word-level timestamps."""
        ...

    @patch('skill_seekers.cli.video_transcript.WhisperModel')
    def test_language_detection(self, mock_model):
        """Whisper detects video language."""
        ...

    @patch('skill_seekers.cli.video_transcript.WhisperModel')
    def test_vad_filtering(self, mock_model):
        """VAD filter removes silence segments."""
        ...

    def test_download_audio_only(self):
        """Audio extraction downloads audio stream only (not video)."""
        # Mock yt-dlp download
        ...


class TestSubtitleParsing:
    def test_parse_srt(self, tmp_path):
        """Parse SRT subtitle file into segments."""
        srt_content = "1\n00:00:01,500 --> 00:00:04,000\nHello world\n\n2\n00:00:05,000 --> 00:00:08,000\nSecond line\n"
        srt_file = tmp_path / "test.srt"
        srt_file.write_text(srt_content)
        ...

    def test_parse_vtt(self, tmp_path):
        """Parse WebVTT subtitle file into segments."""
        vtt_content = "WEBVTT\n\n00:00:01.500 --> 00:00:04.000\nHello world\n\n00:00:05.000 --> 00:00:08.000\nSecond line\n"
        vtt_file = tmp_path / "test.vtt"
        vtt_file.write_text(vtt_content)
        ...

    def test_srt_html_tag_removal(self, tmp_path):
        """SRT parser removes inline HTML tags."""
        ...

    def test_empty_subtitle_file(self, tmp_path):
        """Handle empty subtitle file gracefully."""
        ...


class TestTranscriptFallbackChain:
    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
    @patch('skill_seekers.cli.video_transcript.WhisperModel')
    def test_youtube_then_whisper_fallback(self, mock_whisper, mock_yt_api):
        """Falls back to Whisper when YouTube captions fail."""
        ...

    def test_subtitle_file_discovery(self, tmp_path):
        """Discovers sidecar subtitle files for local videos."""
        ...
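The subtitle parsing cases can be exercised against a parser along the following lines. This is a sketch under assumptions, not the planned implementation: `parse_srt` and its dict output are hypothetical, and the timing pattern requires an hours field, so WebVTT cues in the short `MM:SS.mmm` form would not match.

```python
import re

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm"; a dot is also accepted as the
# millisecond separator so that simple WebVTT cues parse too.
TIMING_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)
TAG_RE = re.compile(r"<[^>]+>")  # inline markup such as <i>...</i>


def _to_seconds(hours, minutes, seconds, millis):
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(content):
    """Parse subtitle text into a list of {'start', 'end', 'text'} dicts."""
    segments = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            match = TIMING_RE.search(line)
            if match:
                break
        else:
            continue  # No timing line (for example the "WEBVTT" header): skip.
        text = TAG_RE.sub("", " ".join(lines[index + 1:])).strip()
        if not text:
            continue
        groups = match.groups()
        segments.append(
            {"start": _to_seconds(*groups[:4]), "end": _to_seconds(*groups[4:]), "text": text}
        )
    return segments


srt_content = (
    "1\n00:00:01,500 --> 00:00:04,000\nHello world\n\n"
    "2\n00:00:05,000 --> 00:00:08,000\nSecond line\n"
)
segments = parse_srt(srt_content)
```

An empty file yields an empty list rather than an error, which matches the "handle gracefully" wording of `test_empty_subtitle_file`.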

test_video_visual.py

"""Tests for visual extraction (scene detection, frame extraction, OCR)."""

from unittest.mock import patch

import pytest

class TestFrameClassification:
    def test_classify_dark_frame_as_code(self, tmp_path):
        """Dark frame with text patterns classified as code_editor."""
        ...

    def test_classify_light_frame_as_slide(self, tmp_path):
        """Light uniform frame classified as slide."""
        ...

    def test_classify_high_edge_as_diagram(self, tmp_path):
        """High edge density frame classified as diagram."""
        ...

    def test_classify_blank_frame_as_other(self, tmp_path):
        """Nearly blank frame classified as other."""
        ...


class TestKeyframeTimestamps:
    def test_chapter_boundaries_included(self, sample_chapters):
        """Keyframe timestamps include chapter start times."""
        ...

    def test_long_chapter_midpoint(self, sample_chapters):
        """Long chapters (>2 min) get midpoint keyframe."""
        ...

    def test_deduplication_within_1_second(self):
        """Timestamps within 1 second are deduplicated."""
        ...

    def test_regular_intervals_fill_gaps(self):
        """Regular interval timestamps fill gaps between scenes."""
        ...


class TestOCRExtraction:
    @pytest.mark.slow
    @patch('skill_seekers.cli.video_visual.easyocr.Reader')
    def test_extract_text_from_code_frame(self, mock_reader, tmp_path):
        """OCR extracts text from code editor frame."""
        ...

    @patch('skill_seekers.cli.video_visual.easyocr.Reader')
    def test_confidence_filtering(self, mock_reader):
        """Low-confidence OCR results are filtered out."""
        ...

    @patch('skill_seekers.cli.video_visual.easyocr.Reader')
    def test_monospace_detection(self, mock_reader):
        """Monospace text regions correctly detected."""
        ...


class TestCodeBlockDetection:
    def test_detect_python_code(self):
        """Detect Python code from OCR text."""
        ...

    def test_detect_terminal_commands(self):
        """Detect terminal commands from OCR text."""
        ...

    def test_language_detection_from_ocr(self):
        """Language detection works on OCR-extracted code."""
        ...
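The `TestKeyframeTimestamps` docstrings describe three rules: chapter starts are always included, chapters longer than two minutes get a midpoint, and timestamps within one second of each other are collapsed. A minimal sketch of that selection logic follows. The function name, its parameters, and the choice to keep timestamps that are exactly one second apart are all assumptions.

```python
def keyframe_timestamps(chapters, long_chapter_seconds=120.0, min_gap=1.0):
    """Pick candidate keyframe times from chapter boundaries (hypothetical helper)."""
    candidates = []
    for chapter in chapters:
        start, end = chapter["start_time"], chapter["end_time"]
        candidates.append(float(start))
        # Long chapters get one extra sample at their midpoint.
        if end - start > long_chapter_seconds:
            candidates.append((start + end) / 2)

    # Drop any timestamp closer than min_gap to the previously kept one.
    kept = []
    for timestamp in sorted(candidates):
        if not kept or timestamp - kept[-1] >= min_gap:
            kept.append(timestamp)
    return kept


timestamps = keyframe_timestamps(
    [
        {"title": "Intro", "start_time": 0, "end_time": 45},
        {"title": "Project Setup", "start_time": 45, "end_time": 180},
    ]
)
```

The 45-second intro contributes only its start, while the 135-second setup chapter also contributes a midpoint at 112.5 seconds. Scene-change and regular-interval timestamps, which the tests also mention, are left out of this sketch.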

test_video_segmenter.py

"""Tests for segmentation and stream alignment."""

class TestChapterSegmentation:
    def test_chapters_create_segments(self, sample_chapters):
        """Chapters map directly to segments."""
        ...

    def test_long_chapter_splitting(self):
        """Chapters exceeding max_segment_duration are split."""
        ...

    def test_empty_chapters(self):
        """Falls back to time window when no chapters."""
        ...


class TestTimeWindowSegmentation:
    def test_fixed_windows(self):
        """Creates segments at fixed intervals."""
        ...

    def test_sentence_boundary_alignment(self):
        """Segments split at sentence boundaries, not mid-word."""
        ...

    def test_configurable_window_size(self):
        """Window size respects config.time_window_seconds."""
        ...


class TestStreamAlignment:
    def test_align_transcript_to_segments(self, sample_transcript, sample_chapters):
        """Transcript segments mapped to correct time windows."""
        ...

    def test_align_keyframes_to_segments(self):
        """Keyframes mapped to correct segments by timestamp."""
        ...

    def test_partial_overlap_handling(self):
        """Transcript segments partially overlapping window boundaries."""
        ...

    def test_empty_segment_handling(self):
        """Handle segments with no transcript (silence, music)."""
        ...


class TestContentMerging:
    def test_transcript_only_content(self):
        """Content is just transcript when no visual data."""
        ...

    def test_code_block_appended(self):
        """Code on screen is appended to transcript content."""
        ...

    def test_duplicate_code_not_repeated(self):
        """Code mentioned in transcript is not duplicated from OCR."""
        ...

    def test_chapter_title_as_heading(self):
        """Chapter title becomes markdown heading in content."""
        ...

    def test_slide_text_supplementary(self):
        """Slide text adds to content when not in transcript."""
        ...


class TestCategorization:
    def test_category_from_chapter_title(self):
        """Category inferred from chapter title keywords."""
        ...

    def test_category_from_transcript(self):
        """Category inferred from transcript content."""
        ...

    def test_custom_categories_from_config(self):
        """Custom category keywords from config used."""
        ...
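For the stream alignment cases, one plausible rule is to assign each transcript entry to the chapter it overlaps the most, which also settles entries that straddle a boundary. The sketch below assumes that rule. `align_to_chapters` is a hypothetical name and the planned aligner may resolve partial overlaps differently.

```python
def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time ranges (0.0 if they do not meet)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def align_to_chapters(entries, chapters):
    """Return one list of texts per chapter, in chapter order."""
    buckets = [[] for _ in chapters]
    for entry in entries:
        start = entry["start"]
        end = start + entry["duration"]
        overlaps = [
            overlap_seconds(start, end, chapter["start_time"], chapter["end_time"])
            for chapter in chapters
        ]
        # Entries that touch no chapter at all are dropped in this sketch.
        if overlaps and max(overlaps) > 0:
            buckets[overlaps.index(max(overlaps))].append(entry["text"])
    return buckets


chapters = [
    {"title": "Intro", "start_time": 0, "end_time": 45},
    {"title": "Project Setup", "start_time": 45, "end_time": 180},
    {"title": "useState Hook", "start_time": 180, "end_time": 540},
]
entries = [
    {"text": "Welcome to this React Hooks tutorial.", "start": 0.0, "duration": 2.5},
    # Straddles the 45 s boundary: 1 s in Intro, 2 s in Project Setup.
    {"text": "Let's start by setting up our project.", "start": 44.0, "duration": 3.0},
    {"text": "We'll use Create React App.", "start": 47.8, "duration": 2.0},
]
buckets = align_to_chapters(entries, chapters)
```

The straddling entry lands in "Project Setup" because more of it falls there, and the third chapter keeps an empty list, which is the situation `test_empty_segment_handling` describes. On an exact tie the earlier chapter wins.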

Integration Tests

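test_video_integration.py

"""Integration tests for video pipeline end-to-end."""

from unittest.mock import patch

import pytest

# Module path assumed; adjust to wherever SourceDetector actually lives.
from skill_seekers.cli.source_detector import SourceDetector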

class TestSourceDetectorVideo:
    def test_detect_youtube_video(self):
        info = SourceDetector.detect("https://youtube.com/watch?v=abc123def45")
        assert info.type == "video"
        assert info.parsed["video_source"] == "youtube_video"

    def test_detect_youtube_short_url(self):
        info = SourceDetector.detect("https://youtu.be/abc123def45")
        assert info.type == "video"

    def test_detect_youtube_playlist(self):
        info = SourceDetector.detect("https://youtube.com/playlist?list=PLxxx")
        assert info.type == "video"
        assert info.parsed["video_source"] == "youtube_playlist"

    def test_detect_youtube_channel(self):
        info = SourceDetector.detect("https://youtube.com/@reactofficial")
        assert info.type == "video"
        assert info.parsed["video_source"] == "youtube_channel"

    def test_detect_vimeo(self):
        info = SourceDetector.detect("https://vimeo.com/123456789")
        assert info.type == "video"
        assert info.parsed["video_source"] == "vimeo"

    def test_detect_mp4_file(self, tmp_path):
        f = tmp_path / "tutorial.mp4"
        f.touch()
        info = SourceDetector.detect(str(f))
        assert info.type == "video"
        assert info.parsed["video_source"] == "local_file"

    def test_detect_video_directory(self, tmp_path):
        d = tmp_path / "videos"
        d.mkdir()
        (d / "vid1.mp4").touch()
        (d / "vid2.mkv").touch()
        info = SourceDetector.detect(str(d))
        assert info.type == "video"

    def test_youtube_not_confused_with_web(self):
        """YouTube URLs detected as video, not web."""
        info = SourceDetector.detect("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
        assert info.type == "video"
        assert info.type != "web"


class TestUnifiedConfigVideo:
    def test_video_source_in_config(self, tmp_path):
        """Video source parsed correctly from unified config."""
        ...

    def test_multiple_video_sources(self, tmp_path):
        """Multiple video sources in same config."""
        ...

    def test_video_alongside_docs(self, tmp_path):
        """Video source alongside documentation source."""
        ...


class TestFullPipeline:
    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
    @patch('skill_seekers.cli.video_scraper.YoutubeDL')
    def test_single_video_transcript_only(
        self, mock_ytdl, mock_transcript, sample_ytdlp_metadata,
        sample_transcript, video_output_dir
    ):
        """Full pipeline: single YouTube video, transcript only."""
        mock_ytdl.return_value.__enter__.return_value.extract_info.return_value = sample_ytdlp_metadata
        mock_transcript.list_transcripts.return_value = ...

        # Run pipeline
        # Assert output files exist and content is correct
        ...

    @pytest.mark.slow
    @patch('skill_seekers.cli.video_visual.easyocr.Reader')
    @patch('skill_seekers.cli.video_transcript.YouTubeTranscriptApi')
    @patch('skill_seekers.cli.video_scraper.YoutubeDL')
    def test_single_video_with_visual(
        self, mock_ytdl, mock_transcript, mock_ocr,
        sample_ytdlp_metadata, video_output_dir
    ):
        """Full pipeline: single video with visual extraction."""
        ...
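The source detection tests above only fix the expected labels. As an illustration of the kind of pure logic they exercise, here is a sketch that classifies a source string with the standard library. `classify_video_source`, the extension list, and the URL patterns are assumptions, and the real `SourceDetector` returns a richer object.

```python
from pathlib import Path
from urllib.parse import parse_qs, urlparse

# Assumed set of local video extensions; the real detector may accept more.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".webm", ".mov", ".avi"}


def classify_video_source(source):
    """Return a video_source label, or None if the input is not a video source."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower().removeprefix("www.")
        query = parse_qs(parsed.query)
        if host == "youtu.be":
            return "youtube_video"
        if host in ("youtube.com", "m.youtube.com"):
            if parsed.path == "/playlist" and "list" in query:
                return "youtube_playlist"
            if parsed.path.startswith(("/@", "/channel/", "/c/")):
                return "youtube_channel"
            if parsed.path == "/watch" and "v" in query:
                return "youtube_video"
            return None
        if host == "vimeo.com":
            return "vimeo"
        return None
    # Not a URL: treat as a path and decide by file extension only.
    if Path(source).suffix.lower() in VIDEO_EXTENSIONS:
        return "local_file"
    return None


labels = [
    classify_video_source("https://youtube.com/watch?v=abc123def45"),
    classify_video_source("https://youtu.be/abc123def45"),
    classify_video_source("https://youtube.com/playlist?list=PLxxx"),
    classify_video_source("https://youtube.com/@reactofficial"),
    classify_video_source("https://vimeo.com/123456789"),
    classify_video_source("tutorial.mp4"),
]
```

Checking the host before anything else is what keeps YouTube URLs from falling through to a generic web scraper, the case `test_youtube_not_confused_with_web` guards against. Directory detection is omitted here.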

CI Considerations

What Runs in CI (Default)

All unit tests (mocked, no network, no GPU)
Integration tests with mocked external services
Source detection tests (pure logic)
Data model tests (pure logic)

What Doesn't Run in CI (Marked)

@pytest.mark.slow       # Whisper model loading, actual OCR
@pytest.mark.integration  # Real YouTube API calls
@pytest.mark.e2e         # Full pipeline with real video download

CI Test Matrix Compatibility

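| Test                   | Ubuntu | macOS | Python 3.10 | Python 3.12 | GPU required |
|------------------------|--------|-------|-------------|-------------|--------------|
| Unit tests             | Yes    | Yes   | Yes         | Yes         | No           |
| Integration (mocked)   | Yes    | Yes   | Yes         | Yes         | No           |
| Whisper tests (mocked) | Yes    | Yes   | Yes         | Yes         | No           |
| OCR tests (mocked)     | Yes    | Yes   | Yes         | Yes         | No           |
| E2E (real download)    | Skip   | Skip  | Skip        | Skip        | No           |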

Dependency Handling in Tests

# At top of visual test files:
pytest.importorskip("cv2", reason="opencv-python-headless required for visual tests")
pytest.importorskip("easyocr", reason="easyocr required for OCR tests")

# At top of whisper test files:
pytest.importorskip("faster_whisper", reason="faster-whisper required for transcription tests")

Performance Tests

import pytest


@pytest.mark.benchmark
class TestVideoPerformance:
    def test_transcript_parsing_speed(self, sample_transcript):
        """Transcript parsing completes in < 10ms for 1000 segments."""
        ...

    def test_segment_alignment_speed(self):
        """Segment alignment completes in < 50ms for 100 segments."""
        ...

    def test_frame_classification_speed(self, tmp_path):
        """Frame classification completes in < 20ms per frame."""
        ...

    def test_content_merging_speed(self):
        """Content merging completes in < 5ms per segment."""
        ...

    def test_output_generation_speed(self, video_output_dir):
        """Output generation (5 videos, 50 segments) in < 1 second."""
        ...
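The time budgets in these docstrings need some way of measuring elapsed time that is not thrown off by a single slow run. One simple option, sketched below with the standard library only, is to take the fastest of several runs. `best_of` and the stand-in `parse` workload are hypothetical, and a dedicated benchmark plugin may be preferable in practice.

```python
import time


def best_of(func, repeats=5):
    """Return the fastest wall-clock time in seconds over several runs of func."""
    timings = []
    for _ in range(repeats):
        started = time.perf_counter()
        func()
        timings.append(time.perf_counter() - started)
    return min(timings)


# Stand-in workload: 1000 synthetic transcript entries in the YouTube shape.
entries = [
    {"text": f"line {i}", "start": float(i), "duration": 1.0} for i in range(1000)
]


def parse():
    # Placeholder for the real parser: derive (start, end, text) tuples.
    return [(e["start"], e["start"] + e["duration"], e["text"]) for e in entries]


elapsed = best_of(parse)
```

A benchmark test could then compare `elapsed` against the budget, for example 0.010 seconds for transcript parsing. Taking the minimum rather than the mean reduces noise from shared CI runners, though tight thresholds can still be flaky there.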
