From c013c5bdf44a6a715c0c874022549df7f2978113 Mon Sep 17 00:00:00 2001
From: yusyus
Date: Sun, 26 Oct 2025 14:22:08 +0300
Subject: [PATCH] docs: Add GitHub scraper usage examples to README
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Added Option 4 section with CLI usage examples
- Included basic scraping, config file, and authentication examples
- Added MCP usage example
- Listed extracted content types (Issues, CHANGELOG, Releases)
- Completed Phase 7 documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude
---
 README.md | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/README.md b/README.md
index 4e2ee42..c3095ed 100644
--- a/README.md
+++ b/README.md
@@ -48,6 +48,14 @@ Skill Seeker is an automated tool that transforms any documentation website into
 - ✅ **Parallel Processing** - 3x faster for large PDFs
 - ✅ **Intelligent Caching** - 50% faster on re-runs
 
+### 🐙 GitHub Repository Scraping (**NEW - v1.4.0**)
+- ✅ **Repository Structure** - Extract README, file tree, and language breakdown
+- ✅ **GitHub Issues** - Fetch open/closed issues with labels and milestones
+- ✅ **CHANGELOG Extraction** - Automatically find and extract version history
+- ✅ **Release Notes** - Pull GitHub Releases with full version history
+- ✅ **Surface Layer Approach** - API signatures and docs (no implementation dumps)
+- ✅ **MCP Integration** - Natural language: "Scrape GitHub repo facebook/react"
+
 ### 🤖 AI & Enhancement
 - ✅ **AI-Powered Enhancement** - Transforms basic templates into comprehensive guides
 - ✅ **No API Costs** - FREE local enhancement using Claude Code Max
@@ -126,6 +134,45 @@ python3 cli/pdf_scraper.py --pdf docs/encrypted.pdf --name myskill --password my
 - ✅ Parallel processing (3x faster)
 - ✅ Intelligent caching
 
+### Option 4: Use CLI for GitHub Repository
+
+```bash
+# Install GitHub support
+pip3 install PyGithub
+
+# Basic repository scraping
+python3 cli/github_scraper.py --repo facebook/react
+
+# Using a config file
+python3 cli/github_scraper.py --config configs/react_github.json
+
+# With authentication (higher rate limits)
+export GITHUB_TOKEN=ghp_your_token_here
+python3 cli/github_scraper.py --repo facebook/react
+
+# Customize what to include (issues capped at 100, CHANGELOG.md, releases)
+python3 cli/github_scraper.py --repo django/django \
+    --include-issues \
+    --max-issues 100 \
+    --include-changelog \
+    --include-releases
+
+# MCP usage in Claude Code (a natural-language prompt, not a shell command)
+"Scrape GitHub repository facebook/react"
+
+# Upload output/react.zip to Claude - Done!
+```
+
+**Time:** ~5-10 minutes | **Quality:** Production-ready | **Cost:** Free
+
+**What Gets Extracted:**
+- ✅ README.md and documentation files
+- ✅ GitHub Issues (open/closed, labels, milestones)
+- ✅ CHANGELOG.md and version history
+- ✅ GitHub Releases with release notes
+- ✅ Repository metadata (stars, language, topics)
+- ✅ File structure and language breakdown
+
 ## How It Works
 
 ```mermaid