fix: resolve 15 bugs and gaps in video scraper pipeline

- Fix extract_visual_data returning 2-tuple instead of 3 (ValueError crash)
- Move pytesseract from core deps to [video-full] optional group
- Add 30-min timeout + user feedback to video enhancement subprocess
- Add scrape_video_impl to MCP server fallback import block
- Detect auto-generated YouTube captions via is_generated property
- Forward --vision-ocr and --video-playlist through create command
- Fix filename collision for non-ASCII video titles (fallback to video_id)
- Make _vision_used a proper dataclass field on FrameSubSection
- Expose 6 visual params in MCP scrape_video tool
- Add install instructions on missing video deps in unified scraper
- Update MCP docstring tool counts (25→33, 7 categories)
- Add video and word commands to main.py docstring
- Document video-full exclusion from [all] deps in pyproject.toml
- Update parser registry test count (22→23 for video parser)

All 2437 tests passing, 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
yusyus
2026-03-01 12:39:21 +03:00
parent 066e19674a
commit 12bc29ab36
13 changed files with 171 additions and 33 deletions

View File

@@ -24,12 +24,12 @@ class TestParserRegistry:
def test_all_parsers_registered(self):
"""Test that all parsers are registered."""
assert len(PARSERS) == 22, f"Expected 22 parsers, got {len(PARSERS)}"
assert len(PARSERS) == 23, f"Expected 23 parsers, got {len(PARSERS)}"
def test_get_parser_names(self):
"""Test getting list of parser names."""
names = get_parser_names()
assert len(names) == 22
assert len(names) == 23
assert "scrape" in names
assert "github" in names
assert "package" in names
@@ -37,6 +37,7 @@ class TestParserRegistry:
assert "analyze" in names
assert "config" in names
assert "workflows" in names
assert "video" in names
def test_all_parsers_are_subcommand_parsers(self):
"""Test that all parsers inherit from SubcommandParser."""
@@ -242,9 +243,9 @@ class TestBackwardCompatibility:
assert cmd in names, f"Command '{cmd}' not found in parser registry!"
def test_command_count_matches(self):
"""Test that we have exactly 22 commands (includes new create, workflows, and word commands)."""
assert len(PARSERS) == 22
assert len(get_parser_names()) == 22
"""Test that we have exactly 23 commands (includes create, workflows, word, and video commands)."""
assert len(PARSERS) == 23
assert len(get_parser_names()) == 23
if __name__ == "__main__":