fix: resolve 15 bugs and gaps in video scraper pipeline
- Fix extract_visual_data returning 2-tuple instead of 3 (ValueError crash) - Move pytesseract from core deps to [video-full] optional group - Add 30-min timeout + user feedback to video enhancement subprocess - Add scrape_video_impl to MCP server fallback import block - Detect auto-generated YouTube captions via is_generated property - Forward --vision-ocr and --video-playlist through create command - Fix filename collision for non-ASCII video titles (fallback to video_id) - Make _vision_used a proper dataclass field on FrameSubSection - Expose 6 visual params in MCP scrape_video tool - Add install instructions on missing video deps in unified scraper - Update MCP docstring tool counts (25→33, 7 categories) - Add video and word commands to main.py docstring - Document video-full exclusion from [all] deps in pyproject.toml - Update parser registry test count (22→23 for video parser) All 2437 tests passing, 0 failures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -12,6 +12,8 @@ Commands:
|
||||
scrape Scrape documentation website
|
||||
github Scrape GitHub repository
|
||||
pdf Extract from PDF file
|
||||
word Extract from Word (.docx) file
|
||||
video Extract from video (YouTube or local)
|
||||
unified Multi-source scraping (docs + GitHub + PDF)
|
||||
analyze Analyze local codebase and extract code knowledge
|
||||
enhance AI-powered enhancement (auto: API or LOCAL mode)
|
||||
|
||||
Reference in New Issue
Block a user