docs: Add GitHub scraper usage examples to README
- Added Option 4 section with CLI usage examples
- Included basic scraping, config file, and authentication examples
- Added MCP usage example
- Listed extracted content types (Issues, CHANGELOG, Releases)
- Completed Phase 7 documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
47
README.md
@@ -48,6 +48,14 @@ Skill Seeker is an automated tool that transforms any documentation website into
- ✅ **Parallel Processing** - 3x faster for large PDFs
- ✅ **Intelligent Caching** - 50% faster on re-runs

### 🐙 GitHub Repository Scraping (**NEW - v1.4.0**)
- ✅ **Repository Structure** - Extract README, file tree, and language breakdown
- ✅ **GitHub Issues** - Fetch open/closed issues with labels and milestones
- ✅ **CHANGELOG Extraction** - Automatically find and extract version history
- ✅ **Release Notes** - Pull GitHub Releases with full version history
- ✅ **Surface Layer Approach** - API signatures and docs (no implementation dumps)
- ✅ **MCP Integration** - Natural language: "Scrape GitHub repo facebook/react"
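
One way to read the "Surface Layer Approach" bullet above is: keep signatures and docstrings, drop function bodies. The sketch below illustrates that idea for Python source only, using the standard-library `ast` module. It is not the scraper's actual implementation, and the `surface_of` name and the output format are hypothetical.

```python
import ast


def surface_of(source: str) -> list[str]:
    """Return one 'signature: docstring summary' entry per function or class.

    Only names, arguments, and docstrings are read; bodies are never copied.
    Hypothetical helper for illustration, not part of the scraper.
    """
    entries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.unparse (Python 3.9+) renders the argument list as source text.
            signature = f"def {node.name}({ast.unparse(node.args)})"
        elif isinstance(node, ast.ClassDef):
            signature = f"class {node.name}"
        else:
            continue
        doc = ast.get_docstring(node) or ""
        summary = doc.splitlines()[0] if doc else "(no docstring)"
        entries.append(f"{signature}: {summary}")
    return entries


EXAMPLE = '''
def add(a, b=1):
    """Add two numbers.

    The body below never reaches the output.
    """
    return a + b
'''

for line in surface_of(EXAMPLE):
    print(line)
```

The same shape (parse, keep declarations, skip bodies) would need a different parser for each non-Python language a repository contains.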

### 🤖 AI & Enhancement
- ✅ **AI-Powered Enhancement** - Transforms basic templates into comprehensive guides
- ✅ **No API Costs** - FREE local enhancement using Claude Code Max
@@ -126,6 +134,45 @@ python3 cli/pdf_scraper.py --pdf docs/encrypted.pdf --name myskill --password my
- ✅ Parallel processing (3x faster)
- ✅ Intelligent caching

### Option 4: Use CLI for GitHub Repository

```bash
# Install GitHub support
pip3 install PyGithub

# Basic repository scraping
python3 cli/github_scraper.py --repo facebook/react

# Using a config file
python3 cli/github_scraper.py --config configs/react_github.json

# With authentication (raises the GitHub API limit from 60 to 5,000 requests/hour)
export GITHUB_TOKEN=ghp_your_token_here
python3 cli/github_scraper.py --repo facebook/react

# Customize what to include: issues (capped at 100), CHANGELOG.md, and releases
python3 cli/github_scraper.py --repo django/django \
  --include-issues \
  --max-issues 100 \
  --include-changelog \
  --include-releases

# MCP usage in Claude Code (a chat prompt, not a shell command)
# "Scrape GitHub repository facebook/react"

# Upload output/react.zip to Claude - Done!
```
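
To run the same command over several repositories, the flags shown above can be assembled programmatically. The following is a minimal sketch, assuming the scraper accepts exactly the flags listed in this section and reads `GITHUB_TOKEN` from the environment, as the `export` example suggests. The `build_command` and `scrape_all` helpers are hypothetical and not part of the project.

```python
import os
import subprocess
import sys

SCRAPER = "cli/github_scraper.py"  # path as used in the README examples


def build_command(repo, include_issues=False, max_issues=None,
                  include_changelog=False, include_releases=False):
    """Build the argv list for one scraper run, using only the README's flags."""
    cmd = [sys.executable, SCRAPER, "--repo", repo]
    if include_issues:
        cmd.append("--include-issues")
    if max_issues is not None:
        cmd += ["--max-issues", str(max_issues)]
    if include_changelog:
        cmd.append("--include-changelog")
    if include_releases:
        cmd.append("--include-releases")
    return cmd


def scrape_all(repos, token=None, **options):
    """Run the scraper once per repository, stopping at the first failure."""
    env = dict(os.environ)
    if token:
        # Assumption: the scraper picks the token up from this variable.
        env["GITHUB_TOKEN"] = token
    for repo in repos:
        subprocess.run(build_command(repo, **options), env=env, check=True)


# Show the command without running it.
print(" ".join(build_command("django/django", include_issues=True, max_issues=100)))
```

Calling `scrape_all(["facebook/react", "django/django"], include_releases=True)` from the repository root would then run both scrapes in sequence. Passing an argument list instead of a shell string avoids quoting problems with unusual repository names.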

**Time:** ~5-10 minutes | **Quality:** Production-ready | **Cost:** Free

**What Gets Extracted:**
- ✅ README.md and documentation files
- ✅ GitHub Issues (open/closed, labels, milestones)
- ✅ CHANGELOG.md and version history
- ✅ GitHub Releases with release notes
- ✅ Repository metadata (stars, language, topics)
- ✅ File structure and language breakdown
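
For orientation, several of the items listed above map onto PyGithub `Repository` calls. The sketch below shows one plausible way to collect a subset of them (README, issues, releases, metadata, language breakdown). It is not the scraper's own code: `summarize` and `language_breakdown` are hypothetical names, and CHANGELOG and file-tree extraction are left out.

```python
from itertools import islice


def language_breakdown(byte_counts):
    """Convert {language: bytes} into {language: percent}, rounded to 0.1."""
    total = sum(byte_counts.values())
    if total == 0:
        return {}
    return {lang: round(100 * n / total, 1) for lang, n in byte_counts.items()}


def summarize(repo, max_issues=100):
    """Collect a subset of the listed items from a PyGithub-style repo object."""
    issues = [
        {
            "title": issue.title,
            "state": issue.state,
            "labels": [label.name for label in issue.labels],
            "milestone": issue.milestone.title if issue.milestone else None,
        }
        # GitHub's issues listing also includes pull requests;
        # this sketch does not filter them out.
        for issue in islice(repo.get_issues(state="all"), max_issues)
    ]
    return {
        "readme": repo.get_readme().decoded_content.decode("utf-8"),
        "issues": issues,
        "releases": [{"tag": r.tag_name, "notes": r.body}
                     for r in repo.get_releases()],
        "metadata": {
            "stars": repo.stargazers_count,
            "language": repo.language,
            "topics": repo.get_topics(),
        },
        "languages": language_breakdown(repo.get_languages()),
    }


# Real usage (needs network access and a recent PyGithub for Auth.Token):
#   import os
#   from github import Auth, Github
#   gh = Github(auth=Auth.Token(os.environ["GITHUB_TOKEN"]))
#   print(summarize(gh.get_repo("facebook/react"), max_issues=10)["metadata"])
```

Capping issues with `islice` stops pagination early, which keeps API usage predictable on repositories with thousands of issues.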

## How It Works

```mermaid