Files
skill-seekers-reference/docs/plans/video/04_VIDEO_INTEGRATION.md
YusufKaraaslanSpyke 62071c4aa9 feat: add video tutorial scraping pipeline with per-panel OCR and AI enhancement
Add complete video tutorial extraction system that converts YouTube videos
and local video files into AI-consumable skills. The pipeline extracts
transcripts, performs visual OCR on code editor panels independently,
tracks code evolution across frames, and generates structured SKILL.md output.

Key features:
- Video metadata extraction (YouTube, local files, playlists)
- Multi-source transcript extraction (YouTube API, yt-dlp, Whisper fallback)
- Chapter-based and time-window segmentation
- Visual extraction: keyframe detection, frame classification, panel detection
- Per-panel sub-section OCR (each IDE panel OCR'd independently)
- Parallel OCR with ThreadPoolExecutor for multi-panel frames
- Narrow panel filtering (300px min width) to skip UI chrome
- Text block tracking with spatial panel position matching
- Code timeline with edit tracking across frames
- Audio-visual alignment (code + narrator pairs)
- Video-specific AI enhancement prompt for OCR denoising and code reconstruction
- video-tutorial.yaml workflow with 4 stages (OCR cleanup, language detection,
  tutorial synthesis, skill polish)
- CLI integration: skill-seekers video --url/--video-file/--playlist
- MCP tool: scrape_video for automation
- 161 tests passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 23:10:19 +03:00

26 KiB

Video Source — System Integration

Date: February 27, 2026 Document: 04 of 07 Status: Planning


Table of Contents

  1. CLI Integration
  2. Source Detection
  3. Unified Config Integration
  4. Unified Scraper Integration
  5. Create Command Integration
  6. Parser & Arguments
  7. MCP Tool Integration
  8. Enhancement Integration
  9. File Map (New & Modified)

CLI Integration

New Subcommand: video

# Dedicated video scraping command
skill-seekers video --url https://youtube.com/watch?v=abc123
skill-seekers video --playlist https://youtube.com/playlist?list=PLxxx
skill-seekers video --channel https://youtube.com/@channelname
skill-seekers video --path ./recording.mp4
skill-seekers video --directory ./recordings/

# With options
skill-seekers video --url <URL> \
    --output output/react-videos/ \
    --visual \
    --whisper-model large-v3 \
    --max-videos 20 \
    --languages en \
    --categories '{"hooks": ["useState", "useEffect"]}' \
    --enhance-level 2

Auto-Detection via create Command

# These all auto-detect as video sources
skill-seekers create https://youtube.com/watch?v=abc123
skill-seekers create https://youtu.be/abc123
skill-seekers create https://youtube.com/playlist?list=PLxxx
skill-seekers create https://youtube.com/@channelname
skill-seekers create https://vimeo.com/123456789
skill-seekers create ./tutorial.mp4
skill-seekers create ./recordings/                # Directory of videos

# With universal flags
skill-seekers create https://youtube.com/watch?v=abc123 --visual -p comprehensive
skill-seekers create ./tutorial.mp4 --enhance-level 2 --dry-run

Registration in main.py

# In src/skill_seekers/cli/main.py - COMMAND_MODULES dict

COMMAND_MODULES = {
    # ... existing commands ...
    'video': 'skill_seekers.cli.video_scraper',
    # ... rest of commands ...
}

Source Detection

Changes to source_detector.py

# New patterns to add:

class SourceDetector:
    # Existing patterns...

    # NEW: Video URL patterns
    YOUTUBE_VIDEO_PATTERN = re.compile(
        r'(?:https?://)?(?:www\.)?'
        r'(?:youtube\.com/watch\?v=|youtu\.be/)'
        r'([a-zA-Z0-9_-]{11})'
    )
    YOUTUBE_PLAYLIST_PATTERN = re.compile(
        r'(?:https?://)?(?:www\.)?'
        r'youtube\.com/playlist\?list=([a-zA-Z0-9_-]+)'
    )
    YOUTUBE_CHANNEL_PATTERN = re.compile(
        r'(?:https?://)?(?:www\.)?'
        r'youtube\.com/(?:@|c/|channel/|user/)([a-zA-Z0-9_.-]+)'
    )
    VIMEO_PATTERN = re.compile(
        r'(?:https?://)?(?:www\.)?vimeo\.com/(\d+)'
    )

    # Video file extensions
    VIDEO_EXTENSIONS = {
        '.mp4', '.mkv', '.webm', '.avi', '.mov',
        '.flv', '.ts', '.wmv', '.m4v', '.ogv',
    }

    @classmethod
    def detect(cls, source: str) -> SourceInfo:
        """Updated detection order:
        1. .json (config)
        2. .pdf
        3. .docx
        4. Video file extensions (.mp4, .mkv, .webm, etc.)  ← NEW
        5. Directory (may contain videos)
        6. YouTube/Vimeo URL patterns  ← NEW
        7. GitHub patterns
        8. Web URL
        9. Domain inference
        """
        # 1. Config file
        if source.endswith('.json'):
            return cls._detect_config(source)

        # 2. PDF file
        if source.endswith('.pdf'):
            return cls._detect_pdf(source)

        # 3. Word document
        if source.endswith('.docx'):
            return cls._detect_word(source)

        # 4. NEW: Video file
        ext = os.path.splitext(source)[1].lower()
        if ext in cls.VIDEO_EXTENSIONS:
            return cls._detect_video_file(source)

        # 5. Directory
        if os.path.isdir(source):
            # Check if directory contains mostly video files
            if cls._is_video_directory(source):
                return cls._detect_video_directory(source)
            return cls._detect_local(source)

        # 6. NEW: Video URL patterns (before general web URL)
        video_info = cls._detect_video_url(source)
        if video_info:
            return video_info

        # 7. GitHub patterns
        github_info = cls._detect_github(source)
        if github_info:
            return github_info

        # 8. Web URL
        if source.startswith('http://') or source.startswith('https://'):
            return cls._detect_web(source)

        # 9. Domain inference
        if '.' in source and not source.startswith('/'):
            return cls._detect_web(f'https://{source}')

        raise ValueError(
            f"Cannot determine source type for: {source}\n\n"
            "Examples:\n"
            "  Web:      skill-seekers create https://docs.react.dev/\n"
            "  GitHub:   skill-seekers create facebook/react\n"
            "  Local:    skill-seekers create ./my-project\n"
            "  PDF:      skill-seekers create tutorial.pdf\n"
            "  DOCX:     skill-seekers create document.docx\n"
            "  Video:    skill-seekers create https://youtube.com/watch?v=xxx\n"  # NEW
            "  Playlist: skill-seekers create https://youtube.com/playlist?list=xxx\n"  # NEW
            "  Config:   skill-seekers create configs/react.json"
        )

    @classmethod
    def _detect_video_url(cls, source: str) -> SourceInfo | None:
        """Detect YouTube or Vimeo video URL."""

        # YouTube video
        match = cls.YOUTUBE_VIDEO_PATTERN.search(source)
        if match:
            video_id = match.group(1)
            return SourceInfo(
                type='video',
                parsed={
                    'video_source': 'youtube_video',
                    'video_id': video_id,
                    'url': f'https://www.youtube.com/watch?v={video_id}',
                },
                suggested_name=f'video-{video_id}',
                raw_input=source,
            )

        # YouTube playlist
        match = cls.YOUTUBE_PLAYLIST_PATTERN.search(source)
        if match:
            playlist_id = match.group(1)
            return SourceInfo(
                type='video',
                parsed={
                    'video_source': 'youtube_playlist',
                    'playlist_id': playlist_id,
                    'url': f'https://www.youtube.com/playlist?list={playlist_id}',
                },
                suggested_name=f'playlist-{playlist_id[:12]}',
                raw_input=source,
            )

        # YouTube channel
        match = cls.YOUTUBE_CHANNEL_PATTERN.search(source)
        if match:
            channel_name = match.group(1)
            return SourceInfo(
                type='video',
                parsed={
                    'video_source': 'youtube_channel',
                    'channel': channel_name,
                    'url': source if source.startswith('http') else f'https://www.youtube.com/@{channel_name}',
                },
                suggested_name=channel_name.lstrip('@'),
                raw_input=source,
            )

        # Vimeo
        match = cls.VIMEO_PATTERN.search(source)
        if match:
            video_id = match.group(1)
            return SourceInfo(
                type='video',
                parsed={
                    'video_source': 'vimeo',
                    'video_id': video_id,
                    'url': f'https://vimeo.com/{video_id}',
                },
                suggested_name=f'vimeo-{video_id}',
                raw_input=source,
            )

        return None

    @classmethod
    def _detect_video_file(cls, source: str) -> SourceInfo:
        """Detect local video file."""
        name = os.path.splitext(os.path.basename(source))[0]
        return SourceInfo(
            type='video',
            parsed={
                'video_source': 'local_file',
                'file_path': os.path.abspath(source),
            },
            suggested_name=name,
            raw_input=source,
        )

    @classmethod
    def _detect_video_directory(cls, source: str) -> SourceInfo:
        """Detect directory containing video files."""
        directory = os.path.abspath(source)
        name = os.path.basename(directory)
        return SourceInfo(
            type='video',
            parsed={
                'video_source': 'local_directory',
                'directory': directory,
            },
            suggested_name=name,
            raw_input=source,
        )

    @classmethod
    def _is_video_directory(cls, path: str) -> bool:
        """Check if a directory contains mostly video files.

        Returns True if >50% of files are video files.
        Used to distinguish video directories from code directories.
        """
        total = 0
        video = 0
        for f in os.listdir(path):
            if os.path.isfile(os.path.join(path, f)):
                total += 1
                ext = os.path.splitext(f)[1].lower()
                if ext in cls.VIDEO_EXTENSIONS:
                    video += 1
        return total > 0 and (video / total) > 0.5

    @classmethod
    def validate_source(cls, source_info: SourceInfo) -> None:
        """Updated to include video validation."""
        # ... existing validation ...

        if source_info.type == 'video':
            video_source = source_info.parsed.get('video_source')
            if video_source == 'local_file':
                file_path = source_info.parsed['file_path']
                if not os.path.exists(file_path):
                    raise ValueError(f"Video file does not exist: {file_path}")
            elif video_source == 'local_directory':
                directory = source_info.parsed['directory']
                if not os.path.exists(directory):
                    raise ValueError(f"Video directory does not exist: {directory}")
            # For online sources, validation happens during scraping

Unified Config Integration

Updated scraped_data dict in unified_scraper.py

# In UnifiedScraper.__init__():
self.scraped_data = {
    "documentation": [],
    "github": [],
    "pdf": [],
    "word": [],
    "local": [],
    "video": [],      # ← NEW
}

Video Source Processing in Unified Scraper

def _scrape_video_source(self, source: dict, source_index: int) -> dict:
    """Process a video source from unified config.

    Args:
        source: Video source config dict from unified JSON
        source_index: Index for unique naming

    Returns:
        Dict with scraping results and metadata
    """
    from skill_seekers.cli.video_scraper import VideoScraper
    from skill_seekers.cli.video_models import VideoSourceConfig

    config = VideoSourceConfig.from_dict(source)
    scraper = VideoScraper(config=config, output_dir=self.output_dir)

    result = scraper.scrape()

    return {
        'source_type': 'video',
        'source_name': source.get('name', f'video_{source_index}'),
        'weight': source.get('weight', 0.2),
        'result': result,
        'video_count': len(result.videos),
        'segment_count': result.total_segments,
        'categories': result.categories,
    }

Example Unified Config with Video

{
    "name": "react-complete",
    "description": "React 19 - Documentation + Code + Video Tutorials",
    "output_dir": "output/react-complete/",

    "sources": [
        {
            "type": "documentation",
            "url": "https://react.dev/",
            "name": "official_docs",
            "weight": 0.4,
            "selectors": {
                "main_content": "article",
                "code_blocks": "pre code"
            },
            "categories": {
                "getting_started": ["learn", "quick-start"],
                "hooks": ["hooks", "use-state", "use-effect"],
                "api": ["reference", "api"]
            }
        },
        {
            "type": "github",
            "repo": "facebook/react",
            "name": "source_code",
            "weight": 0.3,
            "analysis_depth": "deep"
        },
        {
            "type": "video",
            "playlist": "https://www.youtube.com/playlist?list=PLreactplaylist",
            "name": "official_tutorials",
            "weight": 0.2,
            "max_videos": 15,
            "visual_extraction": true,
            "languages": ["en"],
            "categories": {
                "getting_started": ["intro", "quickstart", "setup"],
                "hooks": ["useState", "useEffect", "hooks"],
                "advanced": ["suspense", "concurrent", "server"]
            }
        },
        {
            "type": "video",
            "url": "https://www.youtube.com/watch?v=abc123def45",
            "name": "react_conf_keynote",
            "weight": 0.1,
            "visual_extraction": false
        }
    ],

    "merge_strategy": "unified",
    "conflict_resolution": "docs_first",

    "enhancement": {
        "enabled": true,
        "level": 2
    }
}

Create Command Integration

Changes to Create Command Routing

# In src/skill_seekers/cli/create_command.py (or equivalent in main.py)

def route_source(source_info: SourceInfo, args: argparse.Namespace):
    """Route detected source to appropriate scraper."""

    if source_info.type == 'web':
        return _route_web(source_info, args)
    elif source_info.type == 'github':
        return _route_github(source_info, args)
    elif source_info.type == 'local':
        return _route_local(source_info, args)
    elif source_info.type == 'pdf':
        return _route_pdf(source_info, args)
    elif source_info.type == 'word':
        return _route_word(source_info, args)
    elif source_info.type == 'video':          # ← NEW
        return _route_video(source_info, args)
    elif source_info.type == 'config':
        return _route_config(source_info, args)


def _route_video(source_info: SourceInfo, args: argparse.Namespace):
    """Route video source to video scraper."""
    from skill_seekers.cli.video_scraper import VideoScraper
    from skill_seekers.cli.video_models import VideoSourceConfig

    parsed = source_info.parsed

    # Build config from CLI args + parsed source info
    config_dict = {
        'name': getattr(args, 'name', None) or source_info.suggested_name,
        'visual_extraction': getattr(args, 'visual', False),
        'whisper_model': getattr(args, 'whisper_model', 'base'),
        'max_videos': getattr(args, 'max_videos', 50),
        'languages': getattr(args, 'languages', None),
    }

    # Set the appropriate source field
    video_source = parsed['video_source']
    if video_source in ('youtube_video', 'vimeo'):
        config_dict['url'] = parsed['url']
    elif video_source == 'youtube_playlist':
        config_dict['playlist'] = parsed['url']
    elif video_source == 'youtube_channel':
        config_dict['channel'] = parsed['url']
    elif video_source == 'local_file':
        config_dict['path'] = parsed['file_path']
    elif video_source == 'local_directory':
        config_dict['directory'] = parsed['directory']

    config = VideoSourceConfig.from_dict(config_dict)
    output_dir = getattr(args, 'output', None) or f'output/{config_dict["name"]}/'

    scraper = VideoScraper(config=config, output_dir=output_dir)

    if getattr(args, 'dry_run', False):
        scraper.dry_run()
        return

    result = scraper.scrape()
    scraper.generate_output(result)

Parser & Arguments

New Parser: video_parser.py

# src/skill_seekers/cli/parsers/video_parser.py

from skill_seekers.cli.parsers.base import SubcommandParser


class VideoParser(SubcommandParser):
    """Parser for the video scraping command."""

    name = 'video'
    help = 'Extract knowledge from YouTube videos, playlists, channels, or local video files'
    description = (
        'Process video content into structured skill documentation.\n\n'
        'Supports YouTube (single video, playlist, channel), Vimeo, and local video files.\n'
        'Extracts transcripts, metadata, chapters, and optionally visual content (code, slides).'
    )

    def add_arguments(self, parser):
        # Source (mutually exclusive group)
        source = parser.add_mutually_exclusive_group(required=True)
        source.add_argument('--url', help='YouTube or Vimeo video URL')
        source.add_argument('--playlist', help='YouTube playlist URL')
        source.add_argument('--channel', help='YouTube channel URL')
        source.add_argument('--path', help='Local video file path')
        source.add_argument('--directory', help='Directory containing video files')

        # Add shared arguments (output, dry-run, verbose, etc.)
        from skill_seekers.cli.arguments.common import add_all_standard_arguments
        add_all_standard_arguments(parser)

        # Add video-specific arguments
        from skill_seekers.cli.arguments.video import add_video_arguments
        add_video_arguments(parser)

New Arguments: video.py

# src/skill_seekers/cli/arguments/video.py

VIDEO_ARGUMENTS = {
    # === Filtering ===
    "max_videos": {
        "flags": ("--max-videos",),
        "kwargs": {
            "type": int,
            "default": 50,
            "help": "Maximum number of videos to process (default: 50)",
        },
    },
    "min_duration": {
        "flags": ("--min-duration",),
        "kwargs": {
            "type": float,
            "default": 60.0,
            "help": "Minimum video duration in seconds (default: 60)",
        },
    },
    "max_duration": {
        "flags": ("--max-duration",),
        "kwargs": {
            "type": float,
            "default": 7200.0,
            "help": "Maximum video duration in seconds (default: 7200 = 2 hours)",
        },
    },
    "languages": {
        "flags": ("--languages",),
        "kwargs": {
            "nargs": "+",
            "default": None,
            "help": "Preferred transcript languages (default: all). Example: --languages en es",
        },
    },
    "min_views": {
        "flags": ("--min-views",),
        "kwargs": {
            "type": int,
            "default": None,
            "help": "Minimum view count filter (online videos only)",
        },
    },

    # === Extraction ===
    "visual": {
        "flags": ("--visual",),
        "kwargs": {
            "action": "store_true",
            "help": "Enable visual extraction (OCR on keyframes). Requires video-full dependencies.",
        },
    },
    "whisper_model": {
        "flags": ("--whisper-model",),
        "kwargs": {
            "default": "base",
            "choices": ["tiny", "base", "small", "medium", "large-v3", "large-v3-turbo"],
            "help": "Whisper model size for speech-to-text (default: base)",
        },
    },
    "whisper_device": {
        "flags": ("--whisper-device",),
        "kwargs": {
            "default": "auto",
            "choices": ["auto", "cpu", "cuda"],
            "help": "Device for Whisper inference (default: auto)",
        },
    },
    "ocr_languages": {
        "flags": ("--ocr-languages",),
        "kwargs": {
            "nargs": "+",
            "default": None,
            "help": "OCR languages for visual extraction (default: same as --languages)",
        },
    },

    # === Segmentation ===
    "segment_strategy": {
        "flags": ("--segment-strategy",),
        "kwargs": {
            "default": "hybrid",
            "choices": ["chapters", "semantic", "time_window", "scene_change", "hybrid"],
            "help": "How to segment video content (default: hybrid)",
        },
    },
    "segment_duration": {
        "flags": ("--segment-duration",),
        "kwargs": {
            "type": float,
            "default": 300.0,
            "help": "Target segment duration in seconds for time_window strategy (default: 300)",
        },
    },

    # === Local file options ===
    "file_patterns": {
        "flags": ("--file-patterns",),
        "kwargs": {
            "nargs": "+",
            "default": None,
            "help": "File patterns for directory scanning (default: *.mp4 *.mkv *.webm)",
        },
    },
    "recursive": {
        "flags": ("--recursive",),
        "kwargs": {
            "action": "store_true",
            "default": True,
            "help": "Recursively scan directories (default: True)",
        },
    },
    "no_recursive": {
        "flags": ("--no-recursive",),
        "kwargs": {
            "action": "store_true",
            "help": "Disable recursive directory scanning",
        },
    },
}


def add_video_arguments(parser):
    """Add all video-specific arguments to a parser."""
    for arg_name, arg_def in VIDEO_ARGUMENTS.items():
        parser.add_argument(*arg_def["flags"], **arg_def["kwargs"])

Progressive Help for Create Command

# In arguments/create.py - add video to help modes

# New help flag
"help_video": {
    "flags": ("--help-video",),
    "kwargs": {
        "action": "store_true",
        "help": "Show video-specific options",
    },
}

# VIDEO_ARGUMENTS added to create command's video help mode
# skill-seekers create --help-video

MCP Tool Integration

New MCP Tool: scrape_video

# In src/skill_seekers/mcp/tools/scraping_tools.py

@mcp.tool()
def scrape_video(
    url: str | None = None,
    playlist: str | None = None,
    path: str | None = None,
    output_dir: str = "output/",
    visual: bool = False,
    max_videos: int = 20,
    whisper_model: str = "base",
) -> str:
    """Scrape and extract knowledge from video content.

    Supports YouTube videos, playlists, channels, and local video files.
    Extracts transcripts, metadata, chapters, and optionally visual content.

    Args:
        url: YouTube or Vimeo video URL
        playlist: YouTube playlist URL
        path: Local video file or directory path
        output_dir: Output directory for results
        visual: Enable visual extraction (OCR on keyframes)
        max_videos: Maximum videos to process (for playlists)
        whisper_model: Whisper model size for transcription

    Returns:
        JSON string with scraping results summary
    """
    ...

Updated Tool Count

Total MCP tools: 27 (was 26, add scrape_video)


Enhancement Integration

Video Content Enhancement

Video segments can be enhanced using the same AI enhancement pipeline:

# In enhance_skill_local.py or enhance_command.py

def enhance_video_content(segments: list[VideoSegment], level: int) -> list[VideoSegment]:
    """AI-enhance video segments.

    Enhancement levels:
    0 - No enhancement
    1 - Summary generation per segment
    2 - + Topic extraction, category refinement, code annotation
    3 - + Cross-segment connections, tutorial flow analysis, key takeaways

    Uses the same enhancement infrastructure as other sources.
    """
    if level == 0:
        return segments

    for segment in segments:
        if level >= 1:
            segment.summary = ai_summarize(segment.content)

        if level >= 2:
            segment.topic = ai_extract_topic(segment.content)
            segment.category = ai_refine_category(
                segment.content, segment.category
            )
            # Annotate code blocks with explanations
            for cb in segment.detected_code_blocks:
                cb.explanation = ai_explain_code(cb.code, segment.transcript)

        if level >= 3:
            # Cross-segment analysis (needs all segments)
            pass  # Handled at video level, not segment level

    return segments

File Map (New & Modified Files)

New Files

File Purpose Estimated Size
src/skill_seekers/cli/video_scraper.py Main video scraper orchestrator ~800-1000 lines
src/skill_seekers/cli/video_models.py All data classes and enums ~500-600 lines
src/skill_seekers/cli/video_transcript.py Transcript extraction (YouTube API + Whisper) ~400-500 lines
src/skill_seekers/cli/video_visual.py Visual extraction (scene detection + OCR) ~500-600 lines
src/skill_seekers/cli/video_segmenter.py Segmentation and stream alignment ~400-500 lines
src/skill_seekers/cli/parsers/video_parser.py CLI argument parser ~80-100 lines
src/skill_seekers/cli/arguments/video.py Video-specific argument definitions ~120-150 lines
tests/test_video_scraper.py Video scraper tests ~600-800 lines
tests/test_video_transcript.py Transcript extraction tests ~400-500 lines
tests/test_video_visual.py Visual extraction tests ~400-500 lines
tests/test_video_segmenter.py Segmentation tests ~300-400 lines
tests/test_video_models.py Data model tests ~200-300 lines
tests/test_video_integration.py Integration tests ~300-400 lines
tests/fixtures/video/ Test fixtures (mock transcripts, metadata) Various

Modified Files

File Changes
src/skill_seekers/cli/source_detector.py Add video URL patterns, video file detection, video directory detection
src/skill_seekers/cli/main.py Register video subcommand in COMMAND_MODULES
src/skill_seekers/cli/unified_scraper.py Add "video": [] to scraped_data, add _scrape_video_source()
src/skill_seekers/cli/arguments/create.py Add video args to create command, add --help-video
src/skill_seekers/cli/parsers/__init__.py Register VideoParser
src/skill_seekers/cli/config_validator.py Validate video source entries in unified config
src/skill_seekers/mcp/tools/scraping_tools.py Add scrape_video tool
pyproject.toml Add [video] and [video-full] optional dependencies, add skill-seekers-video entry point
tests/test_source_detector.py Add video detection tests
tests/test_unified.py Add video source integration tests