Add complete video tutorial extraction system that converts YouTube videos and local video files into AI-consumable skills. The pipeline extracts transcripts, performs visual OCR on code editor panels independently, tracks code evolution across frames, and generates structured SKILL.md output.
Key features:
- Video metadata extraction (YouTube, local files, playlists)
- Multi-source transcript extraction (YouTube API, yt-dlp, Whisper fallback)
- Chapter-based and time-window segmentation
- Visual extraction: keyframe detection, frame classification, panel detection
- Per-panel sub-section OCR (each IDE panel OCR'd independently)
- Parallel OCR with ThreadPoolExecutor for multi-panel frames
- Narrow panel filtering (300px min width) to skip UI chrome
- Text block tracking with spatial panel position matching
- Code timeline with edit tracking across frames
- Audio-visual alignment (code + narrator pairs)
- Video-specific AI enhancement prompt for OCR denoising and code reconstruction
- video-tutorial.yaml workflow with 4 stages (OCR cleanup, language detection, tutorial synthesis, skill polish)
- CLI integration: skill-seekers video --url/--video-file/--playlist
- MCP tool: scrape_video for automation
- 161 tests passing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Video Source — System Integration
Date: February 27, 2026
Document: 04 of 07
Status: Planning
Table of Contents
- CLI Integration
- Source Detection
- Unified Config Integration
- Unified Scraper Integration
- Create Command Integration
- Parser & Arguments
- MCP Tool Integration
- Enhancement Integration
- File Map (New & Modified)
CLI Integration
New Subcommand: video
# Dedicated video scraping command
skill-seekers video --url https://youtube.com/watch?v=abc123
skill-seekers video --playlist https://youtube.com/playlist?list=PLxxx
skill-seekers video --channel https://youtube.com/@channelname
skill-seekers video --path ./recording.mp4
skill-seekers video --directory ./recordings/
# With options
skill-seekers video --url <URL> \
    --output output/react-videos/ \
    --visual \
    --whisper-model large-v3 \
    --max-videos 20 \
    --languages en \
    --categories '{"hooks": ["useState", "useEffect"]}' \
    --enhance-level 2
Auto-Detection via create Command
# These all auto-detect as video sources
skill-seekers create https://youtube.com/watch?v=abc123def45
skill-seekers create https://youtu.be/abc123def45
skill-seekers create https://youtube.com/playlist?list=PLxxx
skill-seekers create https://youtube.com/@channelname
skill-seekers create https://vimeo.com/123456789
skill-seekers create ./tutorial.mp4
skill-seekers create ./recordings/ # Directory of videos
# With universal flags
skill-seekers create https://youtube.com/watch?v=abc123def45 --visual -p comprehensive
skill-seekers create ./tutorial.mp4 --enhance-level 2 --dry-run
Registration in main.py
# In src/skill_seekers/cli/main.py - COMMAND_MODULES dict
COMMAND_MODULES = {
    # ... existing commands ...
    'video': 'skill_seekers.cli.video_scraper',
    # ... rest of commands ...
}
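A minimal sketch of how a registry shaped like COMMAND_MODULES could be consumed, assuming each subcommand module is imported lazily by its dotted path. The `load_command` helper and the `json-tool` entry are hypothetical and exist only so the example runs without the project installed; the actual dispatch code in main.py is not shown in this document.

```python
import importlib
from types import ModuleType

# Hypothetical registry mirroring the shape of COMMAND_MODULES.
# 'json-tool' points at a stdlib module so the sketch runs anywhere.
COMMAND_MODULES = {
    "video": "skill_seekers.cli.video_scraper",
    "json-tool": "json.tool",
}


def load_command(name: str, registry: dict = COMMAND_MODULES) -> ModuleType:
    """Import the module registered for a subcommand (hypothetical helper)."""
    try:
        module_path = registry[name]
    except KeyError:
        known = ", ".join(sorted(registry))
        raise ValueError(f"Unknown command {name!r}. Known commands: {known}") from None
    # Importing lazily keeps heavy optional dependencies out of CLI startup.
    return importlib.import_module(module_path)


module = load_command("json-tool")
print(module.__name__)
```

Lazy import matters here because the video scraper may pull in large optional dependencies (Whisper, OCR) that other subcommands never need.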
Source Detection
Changes to source_detector.py
# New patterns to add:
import os
import re

class SourceDetector:
    # Existing patterns...
    # NEW: Video URL patterns
    YOUTUBE_VIDEO_PATTERN = re.compile(
        r'(?:https?://)?(?:www\.)?'
        r'(?:youtube\.com/watch\?v=|youtu\.be/)'
        r'([a-zA-Z0-9_-]{11})'
    )
    YOUTUBE_PLAYLIST_PATTERN = re.compile(
        r'(?:https?://)?(?:www\.)?'
        r'youtube\.com/playlist\?list=([a-zA-Z0-9_-]+)'
    )
    YOUTUBE_CHANNEL_PATTERN = re.compile(
        r'(?:https?://)?(?:www\.)?'
        r'youtube\.com/(?:@|c/|channel/|user/)([a-zA-Z0-9_.-]+)'
    )
    VIMEO_PATTERN = re.compile(
        r'(?:https?://)?(?:www\.)?vimeo\.com/(\d+)'
    )
    # Video file extensions
    VIDEO_EXTENSIONS = {
        '.mp4', '.mkv', '.webm', '.avi', '.mov',
        '.flv', '.ts', '.wmv', '.m4v', '.ogv',
    }
    @classmethod
    def detect(cls, source: str) -> SourceInfo:
        """Updated detection order:
        1. .json (config)
        2. .pdf
        3. .docx
        4. Video file extensions (.mp4, .mkv, .webm, etc.) ← NEW
        5. Directory (may contain videos)
        6. YouTube/Vimeo URL patterns ← NEW
        7. GitHub patterns
        8. Web URL
        9. Domain inference
        """
        # 1. Config file
        if source.endswith('.json'):
            return cls._detect_config(source)
        # 2. PDF file
        if source.endswith('.pdf'):
            return cls._detect_pdf(source)
        # 3. Word document
        if source.endswith('.docx'):
            return cls._detect_word(source)
        # 4. NEW: Video file
        ext = os.path.splitext(source)[1].lower()
        if ext in cls.VIDEO_EXTENSIONS:
            return cls._detect_video_file(source)
        # 5. Directory
        if os.path.isdir(source):
            # Check if directory contains mostly video files
            if cls._is_video_directory(source):
                return cls._detect_video_directory(source)
            return cls._detect_local(source)
        # 6. NEW: Video URL patterns (before general web URL)
        video_info = cls._detect_video_url(source)
        if video_info:
            return video_info
        # 7. GitHub patterns
        github_info = cls._detect_github(source)
        if github_info:
            return github_info
        # 8. Web URL
        if source.startswith('http://') or source.startswith('https://'):
            return cls._detect_web(source)
        # 9. Domain inference
        if '.' in source and not source.startswith('/'):
            return cls._detect_web(f'https://{source}')
        raise ValueError(
            f"Cannot determine source type for: {source}\n\n"
            "Examples:\n"
            " Web: skill-seekers create https://docs.react.dev/\n"
            " GitHub: skill-seekers create facebook/react\n"
            " Local: skill-seekers create ./my-project\n"
            " PDF: skill-seekers create tutorial.pdf\n"
            " DOCX: skill-seekers create document.docx\n"
            " Video: skill-seekers create https://youtube.com/watch?v=xxx\n"  # NEW
            " Playlist: skill-seekers create https://youtube.com/playlist?list=xxx\n"  # NEW
            " Config: skill-seekers create configs/react.json"
        )
    @classmethod
    def _detect_video_url(cls, source: str) -> SourceInfo | None:
        """Detect YouTube or Vimeo video URL."""
        # YouTube video
        match = cls.YOUTUBE_VIDEO_PATTERN.search(source)
        if match:
            video_id = match.group(1)
            return SourceInfo(
                type='video',
                parsed={
                    'video_source': 'youtube_video',
                    'video_id': video_id,
                    'url': f'https://www.youtube.com/watch?v={video_id}',
                },
                suggested_name=f'video-{video_id}',
                raw_input=source,
            )
        # YouTube playlist
        match = cls.YOUTUBE_PLAYLIST_PATTERN.search(source)
        if match:
            playlist_id = match.group(1)
            return SourceInfo(
                type='video',
                parsed={
                    'video_source': 'youtube_playlist',
                    'playlist_id': playlist_id,
                    'url': f'https://www.youtube.com/playlist?list={playlist_id}',
                },
                suggested_name=f'playlist-{playlist_id[:12]}',
                raw_input=source,
            )
        # YouTube channel
        match = cls.YOUTUBE_CHANNEL_PATTERN.search(source)
        if match:
            channel_name = match.group(1)
            return SourceInfo(
                type='video',
                parsed={
                    'video_source': 'youtube_channel',
                    'channel': channel_name,
                    'url': source if source.startswith('http') else f'https://www.youtube.com/@{channel_name}',
                },
                suggested_name=channel_name.lstrip('@'),
                raw_input=source,
            )
        # Vimeo
        match = cls.VIMEO_PATTERN.search(source)
        if match:
            video_id = match.group(1)
            return SourceInfo(
                type='video',
                parsed={
                    'video_source': 'vimeo',
                    'video_id': video_id,
                    'url': f'https://vimeo.com/{video_id}',
                },
                suggested_name=f'vimeo-{video_id}',
                raw_input=source,
            )
        return None
    @classmethod
    def _detect_video_file(cls, source: str) -> SourceInfo:
        """Detect local video file."""
        name = os.path.splitext(os.path.basename(source))[0]
        return SourceInfo(
            type='video',
            parsed={
                'video_source': 'local_file',
                'file_path': os.path.abspath(source),
            },
            suggested_name=name,
            raw_input=source,
        )

    @classmethod
    def _detect_video_directory(cls, source: str) -> SourceInfo:
        """Detect directory containing video files."""
        directory = os.path.abspath(source)
        name = os.path.basename(directory)
        return SourceInfo(
            type='video',
            parsed={
                'video_source': 'local_directory',
                'directory': directory,
            },
            suggested_name=name,
            raw_input=source,
        )

    @classmethod
    def _is_video_directory(cls, path: str) -> bool:
        """Check if a directory contains mostly video files.

        Returns True if >50% of files are video files.
        Used to distinguish video directories from code directories.
        """
        total = 0
        video = 0
        for f in os.listdir(path):
            if os.path.isfile(os.path.join(path, f)):
                total += 1
                ext = os.path.splitext(f)[1].lower()
                if ext in cls.VIDEO_EXTENSIONS:
                    video += 1
        return total > 0 and (video / total) > 0.5
    @classmethod
    def validate_source(cls, source_info: SourceInfo) -> None:
        """Updated to include video validation."""
        # ... existing validation ...
        if source_info.type == 'video':
            video_source = source_info.parsed.get('video_source')
            if video_source == 'local_file':
                file_path = source_info.parsed['file_path']
                if not os.path.exists(file_path):
                    raise ValueError(f"Video file does not exist: {file_path}")
            elif video_source == 'local_directory':
                directory = source_info.parsed['directory']
                if not os.path.exists(directory):
                    raise ValueError(f"Video directory does not exist: {directory}")
            # For online sources, validation happens during scraping
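The URL patterns above can be exercised in isolation. The sketch below copies the four regular expressions from the proposed SourceDetector and checks them in the same order as _detect_video_url. The `classify_video_url` helper is hypothetical and returns a plain tuple instead of a SourceInfo so that it runs without the project installed.

```python
import re
from typing import Optional, Tuple

# Same regular expressions as the proposed SourceDetector patterns.
YOUTUBE_VIDEO = re.compile(
    r'(?:https?://)?(?:www\.)?'
    r'(?:youtube\.com/watch\?v=|youtu\.be/)'
    r'([a-zA-Z0-9_-]{11})'
)
YOUTUBE_PLAYLIST = re.compile(
    r'(?:https?://)?(?:www\.)?'
    r'youtube\.com/playlist\?list=([a-zA-Z0-9_-]+)'
)
YOUTUBE_CHANNEL = re.compile(
    r'(?:https?://)?(?:www\.)?'
    r'youtube\.com/(?:@|c/|channel/|user/)([a-zA-Z0-9_.-]+)'
)
VIMEO = re.compile(r'(?:https?://)?(?:www\.)?vimeo\.com/(\d+)')

# Checked in the same order as _detect_video_url.
PATTERNS = [
    ("youtube_video", YOUTUBE_VIDEO),
    ("youtube_playlist", YOUTUBE_PLAYLIST),
    ("youtube_channel", YOUTUBE_CHANNEL),
    ("vimeo", VIMEO),
]


def classify_video_url(source: str) -> Optional[Tuple[str, str]]:
    """Return (video_source, captured id) for the first matching pattern."""
    for video_source, pattern in PATTERNS:
        match = pattern.search(source)
        if match:
            return video_source, match.group(1)
    return None


for candidate in (
    "https://youtu.be/abc123def45",
    "https://youtube.com/playlist?list=PLxxx",
    "https://youtube.com/@channelname",
    "https://vimeo.com/123456789",
    "https://youtube.com/watch?v=abc123",  # ID shorter than 11 characters
    "https://docs.react.dev/",
):
    print(candidate, "->", classify_video_url(candidate))
```

Because the video pattern requires an 11-character ID, a shortened placeholder such as `watch?v=abc123` is not classified as a video and would fall through to the later GitHub and web URL checks.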
Unified Config Integration
Updated scraped_data dict in unified_scraper.py
# In UnifiedScraper.__init__():
self.scraped_data = {
    "documentation": [],
    "github": [],
    "pdf": [],
    "word": [],
    "local": [],
    "video": [],  # ← NEW
}
Video Source Processing in Unified Scraper
def _scrape_video_source(self, source: dict, source_index: int) -> dict:
    """Process a video source from unified config.

    Args:
        source: Video source config dict from unified JSON
        source_index: Index for unique naming

    Returns:
        Dict with scraping results and metadata
    """
    from skill_seekers.cli.video_scraper import VideoScraper
    from skill_seekers.cli.video_models import VideoSourceConfig

    config = VideoSourceConfig.from_dict(source)
    scraper = VideoScraper(config=config, output_dir=self.output_dir)
    result = scraper.scrape()
    return {
        'source_type': 'video',
        'source_name': source.get('name', f'video_{source_index}'),
        'weight': source.get('weight', 0.2),
        'result': result,
        'video_count': len(result.videos),
        'segment_count': result.total_segments,
        'categories': result.categories,
    }
Example Unified Config with Video
{
  "name": "react-complete",
  "description": "React 19 - Documentation + Code + Video Tutorials",
  "output_dir": "output/react-complete/",
  "sources": [
    {
      "type": "documentation",
      "url": "https://react.dev/",
      "name": "official_docs",
      "weight": 0.4,
      "selectors": {
        "main_content": "article",
        "code_blocks": "pre code"
      },
      "categories": {
        "getting_started": ["learn", "quick-start"],
        "hooks": ["hooks", "use-state", "use-effect"],
        "api": ["reference", "api"]
      }
    },
    {
      "type": "github",
      "repo": "facebook/react",
      "name": "source_code",
      "weight": 0.3,
      "analysis_depth": "deep"
    },
    {
      "type": "video",
      "playlist": "https://www.youtube.com/playlist?list=PLreactplaylist",
      "name": "official_tutorials",
      "weight": 0.2,
      "max_videos": 15,
      "visual_extraction": true,
      "languages": ["en"],
      "categories": {
        "getting_started": ["intro", "quickstart", "setup"],
        "hooks": ["useState", "useEffect", "hooks"],
        "advanced": ["suspense", "concurrent", "server"]
      }
    },
    {
      "type": "video",
      "url": "https://www.youtube.com/watch?v=abc123def45",
      "name": "react_conf_keynote",
      "weight": 0.1,
      "visual_extraction": false
    }
  ],
  "merge_strategy": "unified",
  "conflict_resolution": "docs_first",
  "enhancement": {
    "enabled": true,
    "level": 2
  }
}
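The file map lists config_validator.py as the place where video entries in a unified config get validated, but the rules themselves are not spelled out. The sketch below shows one plausible shape for that check. The `validate_video_source` name, the "exactly one source field" rule, and the 0 to 1 weight range are all assumptions made for illustration, not confirmed behavior.

```python
# Source fields a video entry may use; exactly one is expected (assumption).
VIDEO_SOURCE_FIELDS = ("url", "playlist", "channel", "path", "directory")


def validate_video_source(entry: dict) -> list:
    """Return a list of problems found in one video source entry (hypothetical)."""
    errors = []
    if entry.get("type") != "video":
        errors.append("type must be 'video'")

    present = [name for name in VIDEO_SOURCE_FIELDS if entry.get(name)]
    if len(present) != 1:
        errors.append(
            f"expected exactly one of {VIDEO_SOURCE_FIELDS}, found {present}"
        )

    # 0.2 mirrors the fallback weight used in _scrape_video_source.
    weight = entry.get("weight", 0.2)
    if not isinstance(weight, (int, float)) or not 0 <= weight <= 1:
        errors.append(f"weight must be a number between 0 and 1, got {weight!r}")

    max_videos = entry.get("max_videos")
    if max_videos is not None and (not isinstance(max_videos, int) or max_videos < 1):
        errors.append(f"max_videos must be a positive integer, got {max_videos!r}")
    return errors


playlist_entry = {
    "type": "video",
    "playlist": "https://www.youtube.com/playlist?list=PLreactplaylist",
    "name": "official_tutorials",
    "weight": 0.2,
    "max_videos": 15,
}
ambiguous_entry = {"type": "video", "url": "u", "playlist": "p", "weight": 1.5}

print(validate_video_source(playlist_entry))
print(validate_video_source(ambiguous_entry))
```

Returning a list of messages rather than raising on the first problem lets a validator report every issue in a config file in one pass.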
Create Command Integration
Changes to Create Command Routing
# In src/skill_seekers/cli/create_command.py (or equivalent in main.py)
def route_source(source_info: SourceInfo, args: argparse.Namespace):
    """Route detected source to appropriate scraper."""
    if source_info.type == 'web':
        return _route_web(source_info, args)
    elif source_info.type == 'github':
        return _route_github(source_info, args)
    elif source_info.type == 'local':
        return _route_local(source_info, args)
    elif source_info.type == 'pdf':
        return _route_pdf(source_info, args)
    elif source_info.type == 'word':
        return _route_word(source_info, args)
    elif source_info.type == 'video':  # ← NEW
        return _route_video(source_info, args)
    elif source_info.type == 'config':
        return _route_config(source_info, args)
    # Fail loudly instead of silently returning None for an unknown type
    raise ValueError(f"Unsupported source type: {source_info.type}")
def _route_video(source_info: SourceInfo, args: argparse.Namespace):
    """Route video source to video scraper."""
    from skill_seekers.cli.video_scraper import VideoScraper
    from skill_seekers.cli.video_models import VideoSourceConfig

    parsed = source_info.parsed
    # Build config from CLI args + parsed source info
    config_dict = {
        'name': getattr(args, 'name', None) or source_info.suggested_name,
        'visual_extraction': getattr(args, 'visual', False),
        'whisper_model': getattr(args, 'whisper_model', 'base'),
        'max_videos': getattr(args, 'max_videos', 50),
        'languages': getattr(args, 'languages', None),
    }
    # Set the appropriate source field
    video_source = parsed['video_source']
    if video_source in ('youtube_video', 'vimeo'):
        config_dict['url'] = parsed['url']
    elif video_source == 'youtube_playlist':
        config_dict['playlist'] = parsed['url']
    elif video_source == 'youtube_channel':
        config_dict['channel'] = parsed['url']
    elif video_source == 'local_file':
        config_dict['path'] = parsed['file_path']
    elif video_source == 'local_directory':
        config_dict['directory'] = parsed['directory']

    config = VideoSourceConfig.from_dict(config_dict)
    output_dir = getattr(args, 'output', None) or f'output/{config_dict["name"]}/'
    scraper = VideoScraper(config=config, output_dir=output_dir)
    if getattr(args, 'dry_run', False):
        scraper.dry_run()
        return
    result = scraper.scrape()
    scraper.generate_output(result)
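The if/elif chain that picks the source field in _route_video can also be written as a lookup table, which makes an unrecognized video_source fail explicitly. This is an alternative sketch, not the planned implementation; `SOURCE_FIELD_MAP` and `source_field` are hypothetical names.

```python
# video_source value -> (config field to set, key to read from SourceInfo.parsed)
SOURCE_FIELD_MAP = {
    "youtube_video": ("url", "url"),
    "vimeo": ("url", "url"),
    "youtube_playlist": ("playlist", "url"),
    "youtube_channel": ("channel", "url"),
    "local_file": ("path", "file_path"),
    "local_directory": ("directory", "directory"),
}


def source_field(parsed: dict) -> dict:
    """Return the single config entry that identifies the video source."""
    video_source = parsed["video_source"]
    try:
        config_key, parsed_key = SOURCE_FIELD_MAP[video_source]
    except KeyError:
        raise ValueError(f"Unknown video_source: {video_source!r}") from None
    return {config_key: parsed[parsed_key]}


config_dict = {"name": "demo", "visual_extraction": False}
config_dict.update(
    source_field({"video_source": "local_file", "file_path": "/tmp/tutorial.mp4"})
)
print(config_dict)
```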
Parser & Arguments
New Parser: video_parser.py
# src/skill_seekers/cli/parsers/video_parser.py
from skill_seekers.cli.parsers.base import SubcommandParser


class VideoParser(SubcommandParser):
    """Parser for the video scraping command."""

    name = 'video'
    help = 'Extract knowledge from YouTube videos, playlists, channels, or local video files'
    description = (
        'Process video content into structured skill documentation.\n\n'
        'Supports YouTube (single video, playlist, channel), Vimeo, and local video files.\n'
        'Extracts transcripts, metadata, chapters, and optionally visual content (code, slides).'
    )

    def add_arguments(self, parser):
        # Source (mutually exclusive group)
        source = parser.add_mutually_exclusive_group(required=True)
        source.add_argument('--url', help='YouTube or Vimeo video URL')
        source.add_argument('--playlist', help='YouTube playlist URL')
        source.add_argument('--channel', help='YouTube channel URL')
        source.add_argument('--path', help='Local video file path')
        source.add_argument('--directory', help='Directory containing video files')
        # Add shared arguments (output, dry-run, verbose, etc.)
        from skill_seekers.cli.arguments.common import add_all_standard_arguments
        add_all_standard_arguments(parser)
        # Add video-specific arguments
        from skill_seekers.cli.arguments.video import add_video_arguments
        add_video_arguments(parser)
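The required mutually exclusive group means exactly one source flag must be given per invocation. A standalone sketch using only argparse from the standard library illustrates the behavior; `build_parser` is a hypothetical stand-in for the parser that VideoParser would configure, without the shared and video-specific arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with only the source group (hypothetical stand-in)."""
    parser = argparse.ArgumentParser(prog="skill-seekers video")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--url", help="YouTube or Vimeo video URL")
    source.add_argument("--playlist", help="YouTube playlist URL")
    source.add_argument("--channel", help="YouTube channel URL")
    source.add_argument("--path", help="Local video file path")
    source.add_argument("--directory", help="Directory containing video files")
    return parser


parser = build_parser()

# Exactly one source flag: parsing succeeds, the others stay None.
args = parser.parse_args(["--url", "https://youtu.be/abc123def45"])
print(args.url, args.playlist)

# Two source flags: argparse reports an error and exits.
try:
    parser.parse_args(["--url", "https://youtu.be/abc123def45", "--path", "a.mp4"])
except SystemExit as exc:
    print("rejected with exit code", exc.code)
```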
New Arguments: video.py
# src/skill_seekers/cli/arguments/video.py
VIDEO_ARGUMENTS = {
    # === Filtering ===
    "max_videos": {
        "flags": ("--max-videos",),
        "kwargs": {
            "type": int,
            "default": 50,
            "help": "Maximum number of videos to process (default: 50)",
        },
    },
    "min_duration": {
        "flags": ("--min-duration",),
        "kwargs": {
            "type": float,
            "default": 60.0,
            "help": "Minimum video duration in seconds (default: 60)",
        },
    },
    "max_duration": {
        "flags": ("--max-duration",),
        "kwargs": {
            "type": float,
            "default": 7200.0,
            "help": "Maximum video duration in seconds (default: 7200 = 2 hours)",
        },
    },
    "languages": {
        "flags": ("--languages",),
        "kwargs": {
            "nargs": "+",
            "default": None,
            "help": "Preferred transcript languages (default: all). Example: --languages en es",
        },
    },
    "min_views": {
        "flags": ("--min-views",),
        "kwargs": {
            "type": int,
            "default": None,
            "help": "Minimum view count filter (online videos only)",
        },
    },
    # === Extraction ===
    "visual": {
        "flags": ("--visual",),
        "kwargs": {
            "action": "store_true",
            "help": "Enable visual extraction (OCR on keyframes). Requires video-full dependencies.",
        },
    },
    "whisper_model": {
        "flags": ("--whisper-model",),
        "kwargs": {
            "default": "base",
            "choices": ["tiny", "base", "small", "medium", "large-v3", "large-v3-turbo"],
            "help": "Whisper model size for speech-to-text (default: base)",
        },
    },
    "whisper_device": {
        "flags": ("--whisper-device",),
        "kwargs": {
            "default": "auto",
            "choices": ["auto", "cpu", "cuda"],
            "help": "Device for Whisper inference (default: auto)",
        },
    },
    "ocr_languages": {
        "flags": ("--ocr-languages",),
        "kwargs": {
            "nargs": "+",
            "default": None,
            "help": "OCR languages for visual extraction (default: same as --languages)",
        },
    },
    # === Segmentation ===
    "segment_strategy": {
        "flags": ("--segment-strategy",),
        "kwargs": {
            "default": "hybrid",
            "choices": ["chapters", "semantic", "time_window", "scene_change", "hybrid"],
            "help": "How to segment video content (default: hybrid)",
        },
    },
    "segment_duration": {
        "flags": ("--segment-duration",),
        "kwargs": {
            "type": float,
            "default": 300.0,
            "help": "Target segment duration in seconds for time_window strategy (default: 300)",
        },
    },
    # === Local file options ===
    "file_patterns": {
        "flags": ("--file-patterns",),
        "kwargs": {
            "nargs": "+",
            "default": None,
            "help": "File patterns for directory scanning (default: *.mp4 *.mkv *.webm)",
        },
    },
    "recursive": {
        "flags": ("--recursive",),
        "kwargs": {
            "action": "store_true",
            "default": True,
            "help": "Recursively scan directories (default: True)",
        },
    },
    "no_recursive": {
        "flags": ("--no-recursive",),
        "kwargs": {
            # Write to the same destination as --recursive so this flag
            # actually turns recursion off instead of setting a separate attribute
            "dest": "recursive",
            "action": "store_false",
            "help": "Disable recursive directory scanning",
        },
    },
}


def add_video_arguments(parser):
    """Add all video-specific arguments to a parser."""
    for arg_def in VIDEO_ARGUMENTS.values():
        parser.add_argument(*arg_def["flags"], **arg_def["kwargs"])
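A cut-down, runnable version of the table-driven registration above, using three entries from VIDEO_ARGUMENTS with a shortened choices list. The help text for --ocr-languages says it defaults to the value of --languages; argparse cannot express that directly, so the sketch resolves it after parsing. The `resolve_ocr_languages` helper is an assumption about how that fallback might be applied, not something this document specifies.

```python
import argparse

# Subset of the argument table, trimmed so the sketch stays short.
VIDEO_ARGUMENTS = {
    "languages": {
        "flags": ("--languages",),
        "kwargs": {"nargs": "+", "default": None},
    },
    "ocr_languages": {
        "flags": ("--ocr-languages",),
        "kwargs": {"nargs": "+", "default": None},
    },
    "whisper_model": {
        "flags": ("--whisper-model",),
        "kwargs": {"default": "base", "choices": ["tiny", "base", "small"]},
    },
}


def add_video_arguments(parser: argparse.ArgumentParser) -> None:
    """Register every entry of the table on the given parser."""
    for arg_def in VIDEO_ARGUMENTS.values():
        parser.add_argument(*arg_def["flags"], **arg_def["kwargs"])


def resolve_ocr_languages(args: argparse.Namespace):
    """Hypothetical post-parse step: fall back to --languages when unset."""
    if args.ocr_languages is not None:
        return args.ocr_languages
    return args.languages


parser = argparse.ArgumentParser(prog="skill-seekers video")
add_video_arguments(parser)

args = parser.parse_args(["--languages", "en", "es"])
print(args.whisper_model, resolve_ocr_languages(args))
```

Keeping the definitions in a dict lets the same table be attached to both the video subcommand and the create command's video help mode.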
Progressive Help for Create Command
# In arguments/create.py - add video to help modes
# New help flag
"help_video": {
    "flags": ("--help-video",),
    "kwargs": {
        "action": "store_true",
        "help": "Show video-specific options",
    },
}
# VIDEO_ARGUMENTS added to create command's video help mode
# skill-seekers create --help-video
MCP Tool Integration
New MCP Tool: scrape_video
# In src/skill_seekers/mcp/tools/scraping_tools.py
@mcp.tool()
def scrape_video(
    url: str | None = None,
    playlist: str | None = None,
    path: str | None = None,
    output_dir: str = "output/",
    visual: bool = False,
    max_videos: int = 20,
    whisper_model: str = "base",
) -> str:
    """Scrape and extract knowledge from video content.

    Supports YouTube videos, playlists, channels, and local video files.
    Extracts transcripts, metadata, chapters, and optionally visual content.

    Args:
        url: YouTube or Vimeo video URL
        playlist: YouTube playlist URL
        path: Local video file or directory path
        output_dir: Output directory for results
        visual: Enable visual extraction (OCR on keyframes)
        max_videos: Maximum videos to process (for playlists)
        whisper_model: Whisper model size for transcription

    Returns:
        JSON string with scraping results summary
    """
    ...
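The tool body is elided above. As a rough sketch of the input handling it might need, the plain function below validates the source parameters and returns a JSON string, without the MCP decorator and without running any scraping. The function name, the "exactly one of url, playlist, path" rule, and the response fields are all assumptions made for illustration.

```python
import json
from typing import Optional


def scrape_video_request(
    url: Optional[str] = None,
    playlist: Optional[str] = None,
    path: Optional[str] = None,
    output_dir: str = "output/",
    visual: bool = False,
    max_videos: int = 20,
    whisper_model: str = "base",
) -> str:
    """Validate tool inputs and return a JSON summary (hypothetical stub)."""
    provided = {
        key: value
        for key, value in {"url": url, "playlist": playlist, "path": path}.items()
        if value
    }
    if len(provided) != 1:
        return json.dumps(
            {"status": "error", "error": "Provide exactly one of: url, playlist, path"}
        )
    if max_videos < 1:
        return json.dumps({"status": "error", "error": "max_videos must be >= 1"})

    # Exactly one entry remains, so unpack it directly.
    ((kind, value),) = provided.items()
    return json.dumps(
        {
            "status": "accepted",
            "source": {"kind": kind, "value": value},
            "output_dir": output_dir,
            "visual": visual,
            "max_videos": max_videos,
            "whisper_model": whisper_model,
        }
    )


print(scrape_video_request(url="https://youtu.be/abc123def45"))
print(scrape_video_request())
```

Returning errors as JSON rather than raising keeps the response shape consistent with the declared `str` return type, which may be easier for an MCP client to handle.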
Updated Tool Count
Total MCP tools: 27 (was 26, add scrape_video)
Enhancement Integration
Video Content Enhancement
Video segments can be enhanced using the same AI enhancement pipeline:
# In enhance_skill_local.py or enhance_command.py
def enhance_video_content(segments: list[VideoSegment], level: int) -> list[VideoSegment]:
    """AI-enhance video segments.

    Enhancement levels:
        0 - No enhancement
        1 - Summary generation per segment
        2 - + Topic extraction, category refinement, code annotation
        3 - + Cross-segment connections, tutorial flow analysis, key takeaways

    Uses the same enhancement infrastructure as other sources.
    """
    if level == 0:
        return segments
    for segment in segments:
        if level >= 1:
            segment.summary = ai_summarize(segment.content)
        if level >= 2:
            segment.topic = ai_extract_topic(segment.content)
            segment.category = ai_refine_category(
                segment.content, segment.category
            )
            # Annotate code blocks with explanations
            for cb in segment.detected_code_blocks:
                cb.explanation = ai_explain_code(cb.code, segment.transcript)
        if level >= 3:
            # Cross-segment analysis (needs all segments)
            pass  # Handled at video level, not segment level
    return segments
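To see how the enhancement levels stack, the sketch below runs the same control flow against stand-in data classes and stubbed AI helpers. The `VideoSegment` and `CodeBlock` fields shown are a guess at the minimum the function touches, and the `ai_*` functions are trivial placeholders for model calls, not real implementations.

```python
from dataclasses import dataclass, field


@dataclass
class CodeBlock:  # hypothetical minimal shape
    code: str
    explanation: str = ""


@dataclass
class VideoSegment:  # hypothetical minimal shape
    content: str
    transcript: str = ""
    category: str = "uncategorized"
    summary: str = ""
    topic: str = ""
    detected_code_blocks: list = field(default_factory=list)


# Placeholder helpers standing in for calls to an AI model.
def ai_summarize(text: str) -> str:
    return text[:40]


def ai_extract_topic(text: str) -> str:
    words = text.split()
    return words[0] if words else ""


def ai_refine_category(text: str, current: str) -> str:
    return "hooks" if "useState" in text else current


def ai_explain_code(code: str, transcript: str) -> str:
    return f"Code shown while the narrator says: {transcript[:30]}"


def enhance_video_content(segments: list, level: int) -> list:
    """Apply cumulative enhancement levels 1 and 2 per segment."""
    if level == 0:
        return segments
    for segment in segments:
        if level >= 1:
            segment.summary = ai_summarize(segment.content)
        if level >= 2:
            segment.topic = ai_extract_topic(segment.content)
            segment.category = ai_refine_category(segment.content, segment.category)
            for cb in segment.detected_code_blocks:
                cb.explanation = ai_explain_code(cb.code, segment.transcript)
    return segments


segment = VideoSegment(
    content="useState lets a component remember values between renders",
    transcript="Here we call useState with zero",
    detected_code_blocks=[CodeBlock(code="const [n, setN] = useState(0)")],
)
enhance_video_content([segment], level=2)
print(segment.summary, "|", segment.topic, "|", segment.category)
```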
File Map (New & Modified Files)
New Files
| File | Purpose | Estimated Size |
|---|---|---|
| src/skill_seekers/cli/video_scraper.py | Main video scraper orchestrator | ~800-1000 lines |
| src/skill_seekers/cli/video_models.py | All data classes and enums | ~500-600 lines |
| src/skill_seekers/cli/video_transcript.py | Transcript extraction (YouTube API + Whisper) | ~400-500 lines |
| src/skill_seekers/cli/video_visual.py | Visual extraction (scene detection + OCR) | ~500-600 lines |
| src/skill_seekers/cli/video_segmenter.py | Segmentation and stream alignment | ~400-500 lines |
| src/skill_seekers/cli/parsers/video_parser.py | CLI argument parser | ~80-100 lines |
| src/skill_seekers/cli/arguments/video.py | Video-specific argument definitions | ~120-150 lines |
| tests/test_video_scraper.py | Video scraper tests | ~600-800 lines |
| tests/test_video_transcript.py | Transcript extraction tests | ~400-500 lines |
| tests/test_video_visual.py | Visual extraction tests | ~400-500 lines |
| tests/test_video_segmenter.py | Segmentation tests | ~300-400 lines |
| tests/test_video_models.py | Data model tests | ~200-300 lines |
| tests/test_video_integration.py | Integration tests | ~300-400 lines |
| tests/fixtures/video/ | Test fixtures (mock transcripts, metadata) | Various |
Modified Files
| File | Changes |
|---|---|
| src/skill_seekers/cli/source_detector.py | Add video URL patterns, video file detection, video directory detection |
| src/skill_seekers/cli/main.py | Register video subcommand in COMMAND_MODULES |
| src/skill_seekers/cli/unified_scraper.py | Add "video": [] to scraped_data, add _scrape_video_source() |
| src/skill_seekers/cli/arguments/create.py | Add video args to create command, add --help-video |
| src/skill_seekers/cli/parsers/__init__.py | Register VideoParser |
| src/skill_seekers/cli/config_validator.py | Validate video source entries in unified config |
| src/skill_seekers/mcp/tools/scraping_tools.py | Add scrape_video tool |
| pyproject.toml | Add [video] and [video-full] optional dependencies, add skill-seekers-video entry point |
| tests/test_source_detector.py | Add video detection tests |
| tests/test_unified.py | Add video source integration tests |