docs: Add GitHub scraper usage examples to README

- Added Option 4 section with CLI usage examples
- Included basic scraping, config file, and authentication examples
- Added MCP usage example
- Listed extracted content types (Issues, CHANGELOG, Releases)
- Completed Phase 7 documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-26 14:22:08 +03:00
parent 01c14d0e9c
commit c013c5bdf4
1 changed file with 47 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -48,6 +48,14 @@ Skill Seeker is an automated tool that transforms any documentation website into
 - ✅ **Parallel Processing** - 3x faster for large PDFs
 - ✅ **Intelligent Caching** - 50% faster on re-runs

+### 🐙 GitHub Repository Scraping (**NEW - v1.4.0**)
+- ✅ **Repository Structure** - Extract README, file tree, and language breakdown
+- ✅ **GitHub Issues** - Fetch open/closed issues with labels and milestones
+- ✅ **CHANGELOG Extraction** - Automatically find and extract version history
+- ✅ **Release Notes** - Pull GitHub Releases with full version history
+- ✅ **Surface Layer Approach** - API signatures and docs (no implementation dumps)
+- ✅ **MCP Integration** - Natural language: "Scrape GitHub repo facebook/react"
+
 ### 🤖 AI & Enhancement
 - ✅ **AI-Powered Enhancement** - Transforms basic templates into comprehensive guides
 - ✅ **No API Costs** - FREE local enhancement using Claude Code Max
@@ -126,6 +134,45 @@ python3 cli/pdf_scraper.py --pdf docs/encrypted.pdf --name myskill --password my
 - ✅ Parallel processing (3x faster)
 - ✅ Intelligent caching

+### Option 4: Use CLI for GitHub Repository
+
+```bash
+# Install GitHub support
+pip3 install PyGithub
+
+# Basic repository scraping
+python3 cli/github_scraper.py --repo facebook/react
+
+# Using a config file
+python3 cli/github_scraper.py --config configs/react_github.json
+
+# With authentication (higher rate limits)
+export GITHUB_TOKEN=ghp_your_token_here
+python3 cli/github_scraper.py --repo facebook/react
+
+# Customize what to include: issues (capped at 100), CHANGELOG.md, and releases
+python3 cli/github_scraper.py --repo django/django \
+    --include-issues \
+    --max-issues 100 \
+    --include-changelog \
+    --include-releases
+
+# MCP usage in Claude Code (a natural-language prompt, not a shell command):
+#   "Scrape GitHub repository facebook/react"
+
+# Upload output/react.zip to Claude - Done!
+```
+
+**Time:** ~5-10 minutes | **Quality:** Production-ready | **Cost:** Free
+
+**What Gets Extracted:**
+- ✅ README.md and documentation files
+- ✅ GitHub Issues (open/closed, labels, milestones)
+- ✅ CHANGELOG.md and version history
+- ✅ GitHub Releases with release notes
+- ✅ Repository metadata (stars, language, topics)
+- ✅ File structure and language breakdown
+
 ## How It Works

 ```mermaid