docs: add video feature mentions throughout README and zh-CN

- Tagline, descriptions, and feature lists now include video
- Add Video Extraction subsection in Key Features (7 bullet points)
- Update Feature Matrix: 5 → 6 skill modes (added Video)
- Add video rows to Performance table (transcript + visual)
- Add VIDEO_GUIDE.md to documentation links
- Update test badge and counts: 1,880/2,283 → 2,540+
- Sync all changes to README.zh-CN.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 22:17:43 +03:00
parent 169c184ff7
commit a5ae905e63
2 changed files with 34 additions and 20 deletions
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ English | [简体中文](https://github.com/yusufkaraaslan/Skill_Seekers/blob/ma
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
 [![MCP Integration](https://img.shields.io/badge/MCP-Integrated-blue.svg)](https://modelcontextprotocol.io)
-[![Tested](https://img.shields.io/badge/Tests-2283%2B%20Passing-brightgreen.svg)](tests/)
+[![Tested](https://img.shields.io/badge/Tests-2540%2B%20Passing-brightgreen.svg)](tests/)
 [![Project Board](https://img.shields.io/badge/Project-Board-purple.svg)](https://github.com/users/yusufkaraaslan/projects/2)
 [![PyPI version](https://badge.fury.io/py/skill-seekers.svg)](https://pypi.org/project/skill-seekers/)
 [![PyPI - Downloads](https://img.shields.io/pypi/dm/skill-seekers.svg)](https://pypi.org/project/skill-seekers/)
@@ -19,7 +19,7 @@ English | [简体中文](https://github.com/yusufkaraaslan/Skill_Seekers/blob/ma
 [![Twitter Follow](https://img.shields.io/twitter/follow/_yUSyUS_?style=social)](https://x.com/_yUSyUS_)
 [![GitHub Repo stars](https://img.shields.io/github/stars/yusufkaraaslan/Skill_Seekers?style=social)](https://github.com/yusufkaraaslan/Skill_Seekers)

-**🧠 The data layer for AI systems.** Skill Seekers turns any documentation, GitHub repo, or PDF into structured knowledge assets—ready to power AI Skills (Claude, Gemini, OpenAI), RAG pipelines (LangChain, LlamaIndex, Pinecone), and AI coding assistants (Cursor, Windsurf, Cline) in minutes, not hours.
+**🧠 The data layer for AI systems.** Skill Seekers turns any documentation, GitHub repo, PDF, or video into structured knowledge assets—ready to power AI Skills (Claude, Gemini, OpenAI), RAG pipelines (LangChain, LlamaIndex, Pinecone), and AI coding assistants (Cursor, Windsurf, Cline) in minutes, not hours.

 > 🌐 **[Visit SkillSeekersWeb.com](https://skillseekersweb.com/)** - Browse 24+ preset configs, share your configs, and access complete documentation!

@@ -62,9 +62,10 @@ skill-seekers package output/react --target cursor      # → .cursorrules
 - ⚡ **99% faster** — Days of manual data prep → 15–45 minutes
 - 🎯 **AI Skill quality** — 500+ line SKILL.md files with examples, patterns, and guides
 - 📊 **RAG-ready chunks** — Smart chunking preserves code blocks and maintains context
- 🔄 **Multi-source** — Combine docs + GitHub + PDFs into one knowledge asset
+- 🎬 **Videos** — Extract code, transcripts, and structured knowledge from YouTube and local videos
+- 🔄 **Multi-source** — Combine docs + GitHub + PDFs + videos into one knowledge asset
 - 🌐 **One prep, every target** — Export the same asset to 16 platforms without re-scraping
- ✅ **Battle-tested** — 1,880+ tests, 24+ framework presets, production-ready
+- ✅ **Battle-tested** — 2,540+ tests, 24+ framework presets, production-ready

 ## 🚀 Quick Start (3 Commands)

@@ -110,7 +111,7 @@ done

 ## What is Skill Seekers?

-Skill Seekers is the **data layer for AI systems**. It transforms documentation websites, GitHub repositories, and PDF files into structured knowledge assets for every AI target:
+Skill Seekers is the **data layer for AI systems**. It transforms documentation websites, GitHub repositories, PDF files, and videos into structured knowledge assets for every AI target:

 | Use Case | What you get | Examples |
 |----------|-------------|---------|
@@ -136,7 +137,7 @@ Skill Seekers is the **data layer for AI systems**. It transforms documentation

 Instead of spending days on manual preprocessing, Skill Seekers:

-1. **Ingests** — docs, GitHub repos, local codebases, PDFs
+1. **Ingests** — docs, GitHub repos, local codebases, PDFs, videos
 2. **Analyzes** — deep AST parsing, pattern detection, API extraction
 3. **Structures** — categorized reference files with metadata
 4. **Enhances** — AI-powered SKILL.md generation (Claude, Gemini, or local)
@@ -157,7 +158,7 @@ Instead of spending days on manual preprocessing, Skill Seekers:
 - 🤖 **RAG-ready data** — Pre-chunked LangChain `Documents`, LlamaIndex `TextNodes`, Haystack `Documents`
 - 🚀 **99% faster** — Days of preprocessing → 15–45 minutes
 - 📊 **Smart metadata** — Categories, sources, types → better retrieval accuracy
- 🔄 **Multi-source** — Combine docs + GitHub + PDFs in one pipeline
+- 🔄 **Multi-source** — Combine docs + GitHub + PDFs + videos in one pipeline
 - 🌐 **Platform-agnostic** — Export to any vector DB or framework without re-scraping

 ### For AI Coding Assistant Users
@@ -183,6 +184,15 @@ Instead of spending days on manual preprocessing, Skill Seekers:
 - ✅ **Parallel Processing** - 3x faster for large PDFs
 - ✅ **Intelligent Caching** - 50% faster on re-runs

+### 🎬 Video Extraction
+- ✅ **YouTube & Local Videos** - Extract transcripts, on-screen code, and structured knowledge from videos
+- ✅ **Visual Frame Analysis** - OCR extraction from code editors, terminals, slides, and diagrams
+- ✅ **GPU Auto-Detection** - Automatically installs correct PyTorch build (CUDA/ROCm/MPS/CPU)
+- ✅ **AI Enhancement** - Two-pass: clean OCR artifacts + generate polished SKILL.md
+- ✅ **Time Clipping** - Extract specific sections with `--start-time` and `--end-time`
+- ✅ **Playlist Support** - Batch process all videos in a YouTube playlist
+- ✅ **Vision API Fallback** - Use Claude Vision for low-confidence OCR frames
+
 ### 🐙 GitHub Repository Analysis
 - ✅ **Deep Code Analysis** - AST parsing for Python, JavaScript, TypeScript, Java, C++, Go
 - ✅ **API Extraction** - Functions, classes, methods with parameters and types
@@ -564,7 +574,7 @@ stages:
 - ✅ **Caching System** - Scrape once, rebuild instantly

 ### ✅ Quality Assurance
- ✅ **Fully Tested** - 1,880+ tests with comprehensive coverage
+- ✅ **Fully Tested** - 2,540+ tests with comprehensive coverage

 ---

@@ -645,10 +655,10 @@ skill-seekers install --config react --dry-run

 ## 📊 Feature Matrix

-Skill Seekers supports **4 LLM platforms** and **5 skill modes** with full feature parity.
+Skill Seekers supports **4 LLM platforms** and **6 skill modes** with full feature parity.

 **Platforms:** Claude AI, Google Gemini, OpenAI ChatGPT, Generic Markdown
-**Skill Modes:** Documentation, GitHub, PDF, Unified Multi-Source, Local Repository
+**Skill Modes:** Documentation, GitHub, PDF, Video, Unified Multi-Source, Local Repository

 See [Complete Feature Matrix](docs/FEATURE_MATRIX.md) for detailed platform and feature support.

@@ -1078,6 +1088,8 @@ skill-seekers config --github
 | Re-building | <1 min | With `--skip-scrape` |
 | Enhancement (LOCAL) | 30-60 sec | Uses Claude Code Max |
 | Enhancement (API) | 20-40 sec | Requires API key |
+| Video (transcript) | 1-3 min | YouTube/local, transcript only |
+| Video (visual) | 5-15 min | + OCR frame extraction |
 | Packaging | 5-10 sec | Final .zip creation |

 ---
@@ -1096,6 +1108,7 @@ skill-seekers config --github
 - **[docs/ENHANCEMENT_MODES.md](docs/ENHANCEMENT_MODES.md)** - AI enhancement modes guide
 - **[docs/MCP_SETUP.md](docs/MCP_SETUP.md)** - MCP integration setup
 - **[docs/UNIFIED_SCRAPING.md](docs/UNIFIED_SCRAPING.md)** - Multi-source scraping
+- **[docs/VIDEO_GUIDE.md](docs/VIDEO_GUIDE.md)** - Video extraction guide

 ### Integration Guides
 - **[docs/integrations/LANGCHAIN.md](docs/integrations/LANGCHAIN.md)** - LangChain RAG
--- a/README.zh-CN.md
+++ b/README.zh-CN.md
@@ -14,7 +14,7 @@
 [![许可证: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
 [![MCP 集成](https://img.shields.io/badge/MCP-Integrated-blue.svg)](https://modelcontextprotocol.io)
-[![测试通过](https://img.shields.io/badge/Tests-1880%2B%20Passing-brightgreen.svg)](tests/)
+[![测试通过](https://img.shields.io/badge/Tests-2540%2B%20Passing-brightgreen.svg)](tests/)
 [![项目看板](https://img.shields.io/badge/Project-Board-purple.svg)](https://github.com/users/yusufkaraaslan/projects/2)
 [![PyPI 版本](https://badge.fury.io/py/skill-seekers.svg)](https://pypi.org/project/skill-seekers/)
 [![PyPI - 下载量](https://img.shields.io/pypi/dm/skill-seekers.svg)](https://pypi.org/project/skill-seekers/)
@@ -23,7 +23,7 @@
 [![关注 Twitter](https://img.shields.io/twitter/follow/_yUSyUS_?style=social)](https://x.com/_yUSyUS_)
 [![GitHub Stars](https://img.shields.io/github/stars/yusufkaraaslan/Skill_Seekers?style=social)](https://github.com/yusufkaraaslan/Skill_Seekers)

-**🧠 AI 系统的数据层。** Skill Seekers 将任何文档、GitHub 仓库或 PDF 转换为结构化知识资产——可在几分钟内为 AI 技能（Claude、Gemini、OpenAI）、RAG 流水线（LangChain、LlamaIndex、Pinecone）和 AI 编程助手（Cursor、Windsurf、Cline）提供支持。
+**🧠 AI 系统的数据层。** Skill Seekers 将任何文档、GitHub 仓库、PDF 或视频转换为结构化知识资产——可在几分钟内为 AI 技能（Claude、Gemini、OpenAI）、RAG 流水线（LangChain、LlamaIndex、Pinecone）和 AI 编程助手（Cursor、Windsurf、Cline）提供支持。

 > 🌐 **[访问 SkillSeekersWeb.com](https://skillseekersweb.com/)** - 浏览 24+ 个预设配置，分享您的配置，访问完整文档！

@@ -68,7 +68,8 @@ skill-seekers package output/react --target cursor      # → .cursorrules
 - 📊 **RAG 就绪的分块** — 智能分块保留代码块并维护上下文
 - 🔄 **多源支持** — 将文档 + GitHub + PDF 合并为一个知识资产
 - 🌐 **一次准备，导出所有目标** — 无需重新抓取即可导出到 16 个平台
- ✅ **久经考验** — 1,880+ 测试，24+ 框架预设，生产就绪
+- 🎬 **视频** — 从 YouTube 和本地视频提取代码、字幕和结构化知识
+- ✅ **久经考验** — 2,540+ 测试，24+ 框架预设，生产就绪

 ## 快速开始

@@ -99,7 +100,7 @@ skill-seekers package output/django --target cursor     # Cursor IDE 上下文

 ## 什么是 Skill Seekers？

-Skill Seekers 是 **AI 系统的数据层**，将文档网站、GitHub 仓库和 PDF 文件转换为适用于所有 AI 目标的结构化知识资产：
+Skill Seekers 是 **AI 系统的数据层**，将文档网站、GitHub 仓库、PDF 文件和视频转换为适用于所有 AI 目标的结构化知识资产：

 | 使用场景 | 获得的内容 | 示例 |
 |---------|-----------|------|
@@ -110,7 +111,7 @@ Skill Seekers 是 **AI 系统的数据层**，将文档网站、GitHub 仓库和

 Skill Seekers 通过以下步骤代替数天的手动预处理工作：

-1. **采集** — 文档、GitHub 仓库、本地代码库、PDF
+1. **采集** — 文档、GitHub 仓库、本地代码库、PDF、视频
 2. **分析** — 深度 AST 解析、模式检测、API 提取
 3. **结构化** — 带元数据的分类参考文件
 4. **增强** — AI 驱动的 SKILL.md 生成（Claude、Gemini 或本地）
@@ -157,8 +158,8 @@ Skill Seekers 通过以下步骤代替数天的手动预处理工作：
 - ✅ **并行处理** - 大型 PDF 快 3 倍
 - ✅ **智能缓存** - 重复运行快 50%

-### 🎬 视频教程提取
- ✅ **YouTube 和本地视频** - 从视频教程提取字幕、代码和结构化知识
+### 🎬 视频提取
+- ✅ **YouTube 和本地视频** - 从视频提取字幕、代码和结构化知识
 - ✅ **视觉帧分析** - 屏幕 OCR 提取代码编辑器、终端和幻灯片内容
 - ✅ **GPU 自动检测** - 自动安装正确的 PyTorch 版本（CUDA/ROCm/MPS/CPU）
 - ✅ **AI 增强** - 两阶段增强：清理 OCR + 生成精美 SKILL.md
@@ -489,7 +490,7 @@ stages:
 - ✅ **缓存系统** - 抓取一次，即时重建

 ### ✅ 质量保证
- ✅ **全面测试** - 1,880+ 测试，全面覆盖
+- ✅ **全面测试** - 2,540+ 测试，全面覆盖

 ---

@@ -613,7 +614,7 @@ skill-seekers pdf --pdf docs/manual.pdf --name myskill \
 skill-seekers pdf --pdf docs/scanned.pdf --name myskill --ocr
 ```

-### 视频教程提取
+### 视频提取

 ```bash
 # 安装视频支持
@@ -1013,7 +1014,7 @@ skill-seekers config --github
 - **[docs/ENHANCEMENT_MODES.md](docs/ENHANCEMENT_MODES.md)** - AI 增强模式指南
 - **[docs/MCP_SETUP.md](docs/MCP_SETUP.md)** - MCP 集成设置
 - **[docs/UNIFIED_SCRAPING.md](docs/UNIFIED_SCRAPING.md)** - 多源抓取
- **[docs/VIDEO_GUIDE.md](docs/VIDEO_GUIDE.md)** - 视频教程提取完整指南
+- **[docs/VIDEO_GUIDE.md](docs/VIDEO_GUIDE.md)** - 视频提取完整指南

 ### 集成指南
 - **[docs/integrations/LANGCHAIN.md](docs/integrations/LANGCHAIN.md)** - LangChain RAG
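
The badge hunks in this commit show that the two READMEs had drifted apart before the change (2283+ in `README.md`, 1880+ in `README.zh-CN.md`), and both now read 2540+. Below is a minimal sketch of a consistency check that could catch such drift. It assumes the badge URL keeps the `Tests-<count>%2B%20Passing` form seen in the diff. The helper names `badge_count` and `badges_in_sync` are hypothetical and are not part of Skill Seekers.

```python
import re
from typing import Optional

# Matches the shields.io test badge as it appears in the diff,
# e.g. "badge/Tests-2540%2B%20Passing" (assumed to stay in this form).
BADGE_RE = re.compile(r"badge/Tests-(\d+)%2B%20Passing")


def badge_count(readme_text: str) -> Optional[int]:
    """Return the test count advertised by the badge, or None if no badge is found."""
    match = BADGE_RE.search(readme_text)
    return int(match.group(1)) if match else None


def badges_in_sync(*readme_texts: str) -> bool:
    """True only when every README has a badge and all counts are equal."""
    counts = [badge_count(text) for text in readme_texts]
    return None not in counts and len(set(counts)) == 1
```

In practice the two arguments would be the contents of `README.md` and `README.zh-CN.md`, read from disk before calling `badges_in_sync`.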