docs: Update README with modern Python packaging instructions

Added comprehensive Quick Start section showing:
- **Option 1**: uv tool install (recommended, modern Python)
- **Option 2**: pip install (traditional)
- **Option 3**: Development install (from source)
- **Option 4**: MCP integration (Claude Code)
- **Option 5**: Legacy CLI (backwards compatible)

Updated all usage examples to use new unified CLI:
- python3 cli/doc_scraper.py → skill-seekers scrape
- python3 cli/github_scraper.py → skill-seekers github
- python3 cli/pdf_scraper.py → skill-seekers pdf
- python3 cli/unified_scraper.py → skill-seekers unified
- python3 cli/package_skill.py → skill-seekers package

Highlights:
- uv tool install skill-seekers (no cloning needed!)
- uv tool run --from skill-seekers (run without installing)
- Clean, simple commands: skill-seekers <command>
- Backwards compatible with old method

Addresses issue #168 - Modern Python packaging with uv support

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
128
README.md
@@ -85,9 +85,52 @@ Skill Seeker is an automated tool that transforms documentation websites, GitHub
### ✅ Quality Assurance
- ✅ **Fully Tested** - 299 tests with 100% pass rate

## Quick Example
## Quick Start

### Option 1: Use from Claude Code (Recommended)
### Option 1: Install via uv (Recommended - Modern Python)

```bash
# Install with uv (no cloning needed!)
uv tool install skill-seekers

# Or run directly without installing
uv tool run --from skill-seekers skill-seekers scrape --config https://raw.githubusercontent.com/yusufkaraaslan/Skill_Seekers/main/configs/react.json

# Unified CLI - simple commands
skill-seekers scrape --config configs/react.json
skill-seekers github --repo facebook/react
skill-seekers package output/react/
```

**Time:** ~25 minutes | **Quality:** Production-ready | **Cost:** Free

### Option 2: Install via pip (Traditional)

```bash
# Install from PyPI
pip install skill-seekers

# Use the unified CLI
skill-seekers scrape --config configs/react.json
skill-seekers enhance output/react/
skill-seekers package output/react/
```

**Time:** ~25 minutes | **Quality:** Production-ready | **Cost:** Free

### Option 3: Development Install (From Source)

```bash
# Clone and install in editable mode
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers
pip install -e .

# Use the unified CLI
skill-seekers scrape --config configs/react.json
```

### Option 4: Use from Claude Code (MCP Integration)

```bash
# One-time setup (5 minutes)
@@ -100,100 +143,91 @@ Skill Seeker is an automated tool that transforms documentation websites, GitHub

**Time:** Automated | **Quality:** Production-ready | **Cost:** Free

### Option 2: Use CLI Directly (HTML Docs)
### Option 5: Legacy CLI (Backwards Compatible)

```bash
# Install dependencies (2 pip packages)
# Install dependencies
pip3 install requests beautifulsoup4

# Generate a React skill in one command
python3 cli/doc_scraper.py --config configs/react.json --enhance-local
# Run scripts directly (old method)
python3 src/skill_seekers/cli/doc_scraper.py --config configs/react.json

# Upload output/react.zip to Claude - Done!
```
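For anyone updating old scripts, the legacy-to-unified mapping listed in the commit message can be sketched as a small shell function. `legacy_to_unified` is a hypothetical helper for illustration, not something the package ships; it only prints the replacement command and runs nothing.

```bash
# Hypothetical helper (not part of skill-seekers): print the unified
# command that replaces a legacy script, per the commit message mapping.
legacy_to_unified() {
    # ${1##*/} drops any leading directory, so cli/doc_scraper.py and
    # src/skill_seekers/cli/doc_scraper.py are treated alike.
    case "${1##*/}" in
        doc_scraper.py)     echo "skill-seekers scrape" ;;
        github_scraper.py)  echo "skill-seekers github" ;;
        pdf_scraper.py)     echo "skill-seekers pdf" ;;
        unified_scraper.py) echo "skill-seekers unified" ;;
        package_skill.py)   echo "skill-seekers package" ;;
        *)                  echo "unknown legacy script: $1" >&2; return 1 ;;
    esac
}

# Usage: look up the replacement for the old documentation scraper.
legacy_to_unified cli/doc_scraper.py
```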

**Time:** ~25 minutes | **Quality:** Production-ready | **Cost:** Free

### Option 3: Use CLI for PDF Documentation
## Usage Examples

### Documentation Scraping

```bash
# Install PDF support
pip3 install PyMuPDF
# Scrape documentation website
skill-seekers scrape --config configs/react.json

# Quick scrape without config
skill-seekers scrape --url https://react.dev --name react

# With async mode (3x faster)
skill-seekers scrape --config configs/godot.json --async --workers 8
```

### PDF Extraction

```bash
# Basic PDF extraction
python3 cli/pdf_scraper.py --pdf docs/manual.pdf --name myskill
skill-seekers pdf --pdf docs/manual.pdf --name myskill

# Advanced features
python3 cli/pdf_scraper.py --pdf docs/manual.pdf --name myskill \
skill-seekers pdf --pdf docs/manual.pdf --name myskill \
  --extract-tables \ # Extract tables
  --parallel \ # Fast parallel processing
  --workers 8 # Use 8 CPU cores

# Scanned PDFs (requires: pip install pytesseract Pillow)
python3 cli/pdf_scraper.py --pdf docs/scanned.pdf --name myskill --ocr
skill-seekers pdf --pdf docs/scanned.pdf --name myskill --ocr

# Password-protected PDFs
python3 cli/pdf_scraper.py --pdf docs/encrypted.pdf --name myskill --password mypassword

# Upload output/myskill.zip to Claude - Done!
skill-seekers pdf --pdf docs/encrypted.pdf --name myskill --password mypassword
```
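A backslash continues a shell line only when it is the very last character, so the inline `# ...` annotations in the advanced example above are for reading rather than pasting. One way to keep a comment per flag in a runnable script is to build the argument list step by step. This is a minimal sketch: `pdf_args` is a hypothetical helper name, the flags are the ones shown above, and the function only prints the arguments.

```bash
# Hypothetical helper: assemble the advanced PDF flags one per line so that
# each can carry a comment. Nothing is executed; the arguments are printed.
pdf_args() {
    set -- pdf --pdf docs/manual.pdf --name myskill
    set -- "$@" --extract-tables   # Extract tables
    set -- "$@" --parallel         # Fast parallel processing
    set -- "$@" --workers 8        # Use 8 CPU cores
    printf '%s\n' "$*"
}

pdf_args
# To run it for real (assumes skill-seekers is installed; plain word
# splitting is safe here only because no argument contains spaces):
#   skill-seekers $(pdf_args)
```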

**Time:** ~5-15 minutes (or 2-5 minutes with parallel) | **Quality:** Production-ready | **Cost:** Free

**Advanced Features:**
- ✅ OCR for scanned PDFs (requires pytesseract)
- ✅ Password-protected PDF support
- ✅ Table extraction
- ✅ Parallel processing (3x faster)
- ✅ Intelligent caching

### Option 4: Use CLI for GitHub Repository
### GitHub Repository Scraping

```bash
# Install GitHub support
pip3 install PyGithub

# Basic repository scraping
python3 cli/github_scraper.py --repo facebook/react
skill-seekers github --repo facebook/react

# Using a config file
python3 cli/github_scraper.py --config configs/react_github.json
skill-seekers github --config configs/react_github.json

# With authentication (higher rate limits)
export GITHUB_TOKEN=ghp_your_token_here
python3 cli/github_scraper.py --repo facebook/react
skill-seekers github --repo facebook/react

# Customize what to include
python3 cli/github_scraper.py --repo django/django \
skill-seekers github --repo django/django \
  --include-issues \ # Extract GitHub Issues
  --max-issues 100 \ # Limit issue count
  --include-changelog \ # Extract CHANGELOG.md
  --include-releases # Extract GitHub Releases

# MCP usage in Claude Code
"Scrape GitHub repository facebook/react"

# Upload output/react.zip to Claude - Done!
```
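Since the authentication example above relies on an exported `GITHUB_TOKEN` for the higher rate limits, a wrapper script may want to check for it before starting a long scrape. This is a minimal sketch: `github_auth_mode` is a hypothetical helper and makes no claim about how skill-seekers itself reads the token.

```bash
# Hypothetical helper: report whether a GitHub token is present in the
# environment. It inspects the variable only and makes no network calls.
github_auth_mode() {
    # ${GITHUB_TOKEN:-} expands to an empty string when the variable is
    # unset or empty, so the test is safe under `set -u` as well.
    if [ -n "${GITHUB_TOKEN:-}" ]; then
        echo "authenticated"
    else
        echo "anonymous"
    fi
}

echo "GitHub access: $(github_auth_mode)"
```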

**Time:** ~5-10 minutes | **Quality:** Production-ready | **Cost:** Free

**What Gets Extracted:**
- ✅ README.md and documentation files
- ✅ GitHub Issues (open/closed, labels, milestones)
- ✅ CHANGELOG.md and version history
- ✅ GitHub Releases with release notes
- ✅ Repository metadata (stars, language, topics)
- ✅ File structure and language breakdown

### Option 5: Unified Multi-Source Scraping (**NEW - v2.0.0**)
### Unified Multi-Source Scraping (**NEW - v2.0.0**)

**The Problem:** Documentation and code often drift apart. Docs might be outdated, missing features that exist in code, or documenting features that were removed.

**The Solution:** Combine documentation + GitHub + PDF into one unified skill that shows BOTH what's documented AND what actually exists, with clear warnings about discrepancies.

```bash
# Create unified config (mix documentation + GitHub)
# Use existing unified configs
skill-seekers unified --config configs/react_unified.json
skill-seekers unified --config configs/django_unified.json

# Or create unified config (mix documentation + GitHub)
cat > configs/myframework_unified.json << 'EOF'
{
"name": "myframework",
@@ -217,8 +251,10 @@ cat > configs/myframework_unified.json << 'EOF'
EOF

# Run unified scraper
python3 cli/unified_scraper.py --config configs/myframework_unified.json
skill-seekers unified --config configs/myframework_unified.json

# Package and upload
skill-seekers package output/myframework/
# Upload output/myframework.zip to Claude - Done!
```