# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## 🎯 Current Status (January 8, 2026)

**Version:** v2.6.0 (Three-Stream GitHub Architecture - Phases 1-5 Complete!)
**Active Development:** Phase 6 pending (Documentation & Examples)

### Recent Updates (January 2026):

**🚀 MAJOR RELEASE: Three-Stream GitHub Architecture (v2.6.0)**
- **✅ Phases 1-5 Complete** (26 hours implementation, 81 tests passing)
- **NEW: GitHub Three-Stream Fetcher** - Split repos into Code, Docs, Insights streams
- **NEW: Unified Codebase Analyzer** - Works with GitHub URLs + local paths, C3.x as analysis depth
- **ENHANCED: Source Merging** - Multi-layer merge with GitHub docs and insights
- **ENHANCED: Router Generation** - GitHub metadata, README quick start, common issues
- **CRITICAL FIX: Actual C3.x Integration** - Real pattern detection (not placeholders)
- **Quality Metrics**: GitHub overhead 20-60 lines, router size 60-250 lines
- **Documentation**: Complete implementation summary and E2E tests

### Recent Updates (December 2025):

**🎉 MAJOR RELEASE: Multi-Platform Feature Parity! (v2.5.0)**
- **🌐 Multi-LLM Support**: Full support for 12 platforms - Claude AI, Google Gemini, OpenAI ChatGPT, MiniMax AI, OpenCode, Kimi, DeepSeek, Qwen, OpenRouter, Together AI, Fireworks AI, Generic Markdown
- **🔄 Complete Feature Parity**: All skill modes work with all platforms
- **🏗️ Platform Adaptors**: Clean architecture with platform-specific implementations
- **✨ 26 MCP Tools**: Enhanced with multi-platform support (package, upload, enhance)
- **📚 Comprehensive Documentation**: Complete guides for all platforms
- **🧪 Test Coverage**: 1,880+ tests passing, extensive platform compatibility testing

**🚀 NEW: Three-Stream GitHub Architecture (v2.6.0)**
- **📊 Three-Stream Fetcher**: Split GitHub repos into Code, Docs, and Insights streams
- **🔬 Unified Codebase Analyzer**: Works with GitHub URLs and local paths
- **🎯 Enhanced Router Generation**: GitHub insights + C3.x patterns for better routing
- **📝 GitHub Issue Integration**: Common problems and solutions in sub-skills
- **✅ 81 Tests Passing**: Comprehensive E2E validation (0.43 seconds)

## Three-Stream GitHub Architecture

**New in v2.6.0**: GitHub repositories are now analyzed using a three-stream architecture:

**STREAM 1: Code** (for C3.x analysis)
- Files: `*.py`, `*.js`, `*.ts`, `*.go`, `*.rs`, `*.java`, etc.
- Purpose: Deep code analysis with C3.x components
- Time: 20-60 minutes
- Components: Patterns (C3.1), Examples (C3.2), Guides (C3.3), Configs (C3.4), Architecture (C3.7)

**STREAM 2: Documentation** (from repository)
- Files: `README.md`, `CONTRIBUTING.md`, `docs/*.md`
- Purpose: Quick start guides and official documentation
- Time: 1-2 minutes

**STREAM 3: GitHub Insights** (metadata & community)
- Data: Open issues, closed issues, labels, stars, forks
- Purpose: Real user problems and known solutions
- Time: 1-2 minutes

### Usage Example

```python
from skill_seekers.cli.unified_codebase_analyzer import UnifiedCodebaseAnalyzer

# Analyze GitHub repo with three streams
analyzer = UnifiedCodebaseAnalyzer()
result = analyzer.analyze(
    source="https://github.com/facebook/react",
    depth="c3x",  # or "basic"
    fetch_github_metadata=True
)

# Access all three streams
print(f"Files: {len(result.code_analysis['files'])}")
print(f"README: {result.github_docs['readme'][:100]}")
print(f"Stars: {result.github_insights['metadata']['stars']}")
print(f"C3.x Patterns: {len(result.code_analysis['c3_1_patterns'])}")
```

### Router Generation with GitHub

```python
from skill_seekers.cli.generate_router import RouterGenerator
from skill_seekers.cli.github_fetcher import GitHubThreeStreamFetcher

# Fetch GitHub repo with three streams
fetcher = GitHubThreeStreamFetcher("https://github.com/jlowin/fastmcp")
three_streams = fetcher.fetch()

# Generate router with GitHub integration
generator = RouterGenerator(
    ['configs/fastmcp-oauth.json', 'configs/fastmcp-async.json'],
    github_streams=three_streams
)

# Result includes:
# - Repository stats (stars, language)
# - README quick start
# - Common issues from GitHub
# - Enhanced routing keywords (GitHub labels with 2x weight)
skill_md = generator.generate_skill_md()
```

**See full documentation**: [Three-Stream Implementation Summary](IMPLEMENTATION_SUMMARY_THREE_STREAM.md)

## Overview

This is a Python-based documentation scraper that converts ANY documentation website into a Claude skill. Its core is a single-file tool (`cli/doc_scraper.py`) that scrapes documentation, extracts code patterns, detects programming languages, and generates structured skill files ready for use with Claude.

## Dependencies

```bash
pip3 install requests beautifulsoup4
```

## Core Commands

### Run with a preset configuration
```bash
python3 cli/doc_scraper.py --config configs/godot.json
python3 cli/doc_scraper.py --config configs/react.json
python3 cli/doc_scraper.py --config configs/vue.json
python3 cli/doc_scraper.py --config configs/django.json
python3 cli/doc_scraper.py --config configs/fastapi.json
```

### Interactive mode (for new frameworks)
```bash
python3 cli/doc_scraper.py --interactive
```

### Quick mode (minimal config)
```bash
python3 cli/doc_scraper.py --name react --url https://react.dev/ --description "React framework"
```

### Skip scraping (use cached data)
```bash
python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape
```

### Resume interrupted scrapes
```bash
# If scrape was interrupted
python3 cli/doc_scraper.py --config configs/godot.json --resume

# Start fresh (clear checkpoint)
python3 cli/doc_scraper.py --config configs/godot.json --fresh
```

### Large documentation (10K-40K+ pages)
```bash
# 1. Estimate page count
python3 cli/estimate_pages.py configs/godot.json

# 2. Split into focused sub-skills
python3 cli/split_config.py configs/godot.json --strategy router

# 3. Generate router skill
python3 cli/generate_router.py configs/godot-*.json

# 4. Package multiple skills
python3 cli/package_multi.py output/godot*/
```

### AI-powered SKILL.md enhancement
```bash
# Option 1: During scraping (API-based, requires ANTHROPIC_API_KEY)
pip3 install anthropic
export ANTHROPIC_API_KEY=sk-ant-...
python3 cli/doc_scraper.py --config configs/react.json --enhance

# Option 2: During scraping (LOCAL, no API key - uses Claude Code Max)
python3 cli/doc_scraper.py --config configs/react.json --enhance-local

# Option 3: Standalone after scraping (API-based)
python3 cli/enhance_skill.py output/react/

# Option 4: Standalone after scraping (LOCAL, no API key)
python3 cli/enhance_skill_local.py output/react/
```

The LOCAL enhancement option (`--enhance-local` or `enhance_skill_local.py`) opens a new terminal with Claude Code, which analyzes reference files and enhances SKILL.md automatically. This requires a Claude Code Max plan but no API key.

### MCP Integration (Claude Code)

```bash
# One-time setup
./setup_mcp.sh

# Then in Claude Code, use natural language:
"List all available configs"
"Generate config for Tailwind at https://tailwindcss.com/docs"
"Split configs/godot.json using router strategy"
"Generate router for configs/godot-*.json"
"Package skill at output/react/"
```

26 MCP tools are available with multi-platform support, including: list_configs, generate_config, validate_config, fetch_config, estimate_pages, scrape_docs, scrape_github, scrape_pdf, package_skill, upload_skill, enhance_skill (NEW), install_skill, split_config, generate_router, add_config_source, list_config_sources, remove_config_source, submit_config

### Test with limited pages (edit config first)

Set `"max_pages": 20` in the config file to test with fewer pages, as in the sketch below.

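For a quick smoke test, one minimal way to cap a preset config (a sketch assuming the standard `configs/*.json` layout; `configs/react.json` is just an example here):

```python
import json
from pathlib import Path

config_path = Path("configs/react.json")
config = json.loads(config_path.read_text())

# Cap the crawl for a quick test run; restore the original value afterwards.
config["max_pages"] = 20
config_path.write_text(json.dumps(config, indent=2))
```
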
## Multi-Platform Support (v2.5.0+)

**12 Platforms Fully Supported:**
- **Claude AI** (default) - ZIP format, Skills API, MCP integration
- **Google Gemini** - tar.gz format, Files API, 1M token context
- **OpenAI ChatGPT** - ZIP format, Assistants API, Vector Store
- **Generic Markdown** - ZIP format, universal compatibility
- Plus: MiniMax AI, OpenCode, Kimi, DeepSeek, Qwen, OpenRouter, Together AI, Fireworks AI

**All skill modes work with all platforms:**
- Documentation scraping
- GitHub repository analysis
- PDF extraction
- Unified multi-source
- Local repository analysis

**Use the `--target` parameter for packaging, upload, and enhancement:**
```bash
# Package for different platforms
skill-seekers package output/react/ --target claude    # Default
skill-seekers package output/react/ --target gemini
skill-seekers package output/react/ --target openai
skill-seekers package output/react/ --target markdown

# Upload to platforms (requires API keys)
skill-seekers upload output/react.zip --target claude
skill-seekers upload output/react-gemini.tar.gz --target gemini
skill-seekers upload output/react-openai.zip --target openai

# Enhance with platform-specific AI
skill-seekers enhance output/react/ --target claude             # Sonnet 4
skill-seekers enhance output/react/ --target gemini --mode api  # Gemini 2.0
skill-seekers enhance output/react/ --target openai --mode api  # GPT-4o
```

See [Multi-Platform Guide](UPLOAD_GUIDE.md) and [Feature Matrix](FEATURE_MATRIX.md) for complete details.

## Architecture

### Single-File Design
The core of the tool is contained in `cli/doc_scraper.py` (~737 lines). It follows a class-based architecture with a single `DocToSkillConverter` class that handles:
- **Web scraping**: BFS traversal with URL validation
- **Content extraction**: CSS selectors for title, content, code blocks
- **Language detection**: Heuristic-based detection from code samples (Python, JavaScript, GDScript, C++, etc.)
- **Pattern extraction**: Identifies common coding patterns from documentation
- **Categorization**: Smart categorization using URL structure, page titles, and content keywords with scoring
- **Skill generation**: Creates SKILL.md with real code examples and categorized reference files

### Data Flow
1. **Scrape Phase** (sketched below):
   - Input: Config JSON (name, base_url, selectors, url_patterns, categories, rate_limit, max_pages)
   - Process: BFS traversal starting from base_url, respecting include/exclude patterns
   - Output: `output/{name}_data/pages/*.json` + `summary.json`

2. **Build Phase**:
   - Input: Scraped JSON data from `output/{name}_data/`
   - Process: Load pages → Smart categorize → Extract patterns → Generate references
   - Output: `output/{name}/SKILL.md` + `output/{name}/references/*.md`

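A minimal sketch of the scrape-phase BFS. This is illustrative only: the method names follow the Key Code Locations list below, the `links` field is an assumption, and link handling, checkpointing, and error handling are simplified:

```python
import time
from collections import deque
from urllib.parse import urljoin

def scrape_all(converter, config):
    """Breadth-first crawl from base_url, respecting include/exclude patterns."""
    queue = deque([config["base_url"]])
    seen = {config["base_url"]}
    pages = []
    while queue and len(pages) < config["max_pages"]:
        url = queue.popleft()
        if not converter.is_valid_url(url):        # include/exclude filtering
            continue
        page = converter.extract_content(url)      # title, content, code blocks
        pages.append(page)
        for link in page.get("links", []):
            absolute = urljoin(url, link)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
        time.sleep(config["rate_limit"])           # be polite to the docs server
    return pages
```
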
### Directory Structure

```
Skill_Seekers/
├── cli/                          # CLI tools
│   ├── doc_scraper.py            # Main scraping & building tool
│   ├── enhance_skill.py          # AI enhancement (API-based)
│   ├── enhance_skill_local.py    # AI enhancement (LOCAL, no API)
│   ├── estimate_pages.py         # Page count estimator
│   ├── split_config.py           # Large docs splitter (NEW)
│   ├── generate_router.py        # Router skill generator (NEW)
│   ├── package_skill.py          # Single skill packager
│   └── package_multi.py          # Multi-skill packager (NEW)
├── mcp/                          # MCP server
│   ├── server.py                 # 26 MCP tools (includes upload)
│   └── README.md
├── configs/                      # Preset configurations
│   ├── godot.json
│   ├── godot-large-example.json  # Large docs example (NEW)
│   ├── react.json
│   └── ...
├── docs/                         # Documentation
│   ├── CLAUDE.md                 # Technical architecture (this file)
│   ├── LARGE_DOCUMENTATION.md    # Large docs guide (NEW)
│   ├── ENHANCEMENT.md
│   ├── MCP_SETUP.md
│   └── ...
└── output/                       # Generated output (git-ignored)
    ├── {name}_data/              # Raw scraped data (cached)
    │   ├── pages/                # Individual page JSONs
    │   ├── summary.json          # Scraping summary
    │   └── checkpoint.json       # Resume checkpoint (NEW)
    └── {name}/                   # Generated skill
        ├── SKILL.md              # Main skill file with examples
        ├── SKILL.md.backup       # Backup (if enhanced)
        ├── references/           # Categorized documentation
        │   ├── index.md
        │   ├── getting_started.md
        │   ├── api.md
        │   └── ...
        ├── scripts/              # Empty (for user scripts)
        └── assets/               # Empty (for user assets)
```

### Configuration Format

Config files in `configs/*.json` contain:
- `name`: Skill identifier (e.g., "godot", "react")
- `description`: When to use this skill
- `base_url`: Starting URL for scraping
- `selectors`: CSS selectors for content extraction
  - `main_content`: Main documentation content (e.g., "article", "div[role='main']")
  - `title`: Page title selector
  - `code_blocks`: Code sample selector (e.g., "pre code", "pre")
- `url_patterns`: URL filtering
  - `include`: Only scrape URLs containing these patterns
  - `exclude`: Skip URLs containing these patterns
- `categories`: Keyword-based categorization mapping
- `rate_limit`: Delay between requests (seconds)
- `max_pages`: Maximum pages to scrape
- `split_strategy`: (Optional) How to split large docs: "auto", "category", "router", "size"
- `split_config`: (Optional) Split configuration
  - `target_pages_per_skill`: Pages per sub-skill (default: 5000)
  - `create_router`: Create router/hub skill (default: true)
  - `split_by_categories`: Category names to split by
- `checkpoint`: (Optional) Checkpoint/resume configuration
  - `enabled`: Enable checkpointing (default: false)
  - `interval`: Save every N pages (default: 1000)

A worked example follows this list.

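For illustration, here is a minimal config assembled in Python and written to disk. The values are hypothetical; only the keys documented above are used:

```python
import json

# Hypothetical config for a fictional framework; keys mirror the list above.
config = {
    "name": "myframework",
    "description": "MyFramework web framework documentation",
    "base_url": "https://docs.myframework.example/",
    "selectors": {
        "main_content": "article",
        "title": "h1",
        "code_blocks": "pre code",
    },
    "url_patterns": {
        "include": ["/docs/", "/guide/"],
        "exclude": ["/blog/", "/changelog/"],
    },
    "categories": {
        "getting_started": ["install", "tutorial", "quickstart"],
        "api": ["reference", "api"],
    },
    "rate_limit": 0.5,
    "max_pages": 500,
}

with open("configs/myframework.json", "w") as f:
    json.dump(config, f, indent=2)
```
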
### Key Features

**Auto-detect existing data**: The tool checks for `output/{name}_data/` and prompts to reuse it, avoiding re-scraping.

**Language detection**: Detects code languages from:
1. CSS class attributes (`language-*`, `lang-*`)
2. Heuristics (keywords like `def`, `const`, `func`, etc.); see the sketch below

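A minimal sketch of the two-step detection. The heuristics here are simplified assumptions; the real `detect_language()` covers more languages and signals:

```python
def detect_language(code: str, css_classes: list[str]) -> str:
    """Guess a code sample's language from CSS classes, then keyword heuristics."""
    # Step 1: trust explicit class attributes like "language-python" or "lang-js"
    for cls in css_classes:
        for prefix in ("language-", "lang-"):
            if cls.startswith(prefix):
                return cls[len(prefix):]

    # Step 2: fall back to keyword heuristics (illustrative subset)
    if "def " in code or "import " in code:
        return "python"
    if "const " in code or "=>" in code:
        return "javascript"
    if "func " in code and "extends " in code:
        return "gdscript"
    return "unknown"
```
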

**Pattern extraction**: Looks for "Example:", "Pattern:", "Usage:" markers in content and extracts the code blocks that follow them (up to 5 per page), as in the sketch below.

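An illustrative sketch of the marker-based extraction. The `segments` representation (ordered text/code pairs per page) is an assumption made for the sketch; the real logic lives in `extract_patterns()`:

```python
MARKERS = ("Example:", "Pattern:", "Usage:")

def extract_patterns(segments: list[tuple[str, str]], limit: int = 5) -> list[str]:
    """Collect up to `limit` code blocks that directly follow a marker line.

    `segments` is an ordered list of ("text" | "code", content) pairs per page.
    """
    patterns = []
    expecting_code = False
    for kind, content in segments:
        if kind == "text":
            # A marker arms extraction for the next code block only.
            expecting_code = any(marker in content for marker in MARKERS)
        elif kind == "code" and expecting_code:
            patterns.append(content)
            expecting_code = False
            if len(patterns) >= limit:
                break
    return patterns
```
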

**Smart categorization** (sketched below):
- Scores pages against category keywords (3 points for a URL match, 2 for title, 1 for content)
- Threshold of 2+ for categorization
- Auto-infers categories from URL segments if none are provided
- Falls back to the "other" category

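A minimal sketch of the scoring rule, assuming a page dict with `url`, `title`, and `content` fields (illustrative; `smart_categorize()` is the real implementation):

```python
def smart_categorize(page: dict, categories: dict[str, list[str]]) -> str:
    """Score each category's keywords against URL (3), title (2), and content (1)."""
    best_category, best_score = "other", 0
    for category, keywords in categories.items():
        score = 0
        for kw in keywords:
            if kw in page["url"]:
                score += 3
            if kw in page["title"].lower():
                score += 2
            if kw in page["content"].lower():
                score += 1
        if score > best_score:
            best_category, best_score = category, score
    # Require a minimum score of 2 before trusting the match
    return best_category if best_score >= 2 else "other"
```
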

**Enhanced SKILL.md**: Generated with:
- Real code examples from documentation (language-annotated)
- Quick reference patterns extracted from docs
- A common patterns section
- Category file listings

**AI-Powered Enhancement**: Two scripts to dramatically improve SKILL.md quality:
- `enhance_skill.py`: Uses the Anthropic API (~$0.15-$0.30 per skill, requires an API key)
- `enhance_skill_local.py`: Uses Claude Code Max (free, no API key needed)
- Transforms generic 75-line templates into comprehensive 500+ line guides
- Extracts the best examples, explains key concepts, adds navigation guidance
- Success rate: 9/10 quality (based on the steam-economy test)


**Large Documentation Support (NEW)**: Handles 10K-40K+ page documentation:
- `split_config.py`: Splits large configs into multiple focused sub-skills
- `generate_router.py`: Creates intelligent router/hub skills that direct queries
- `package_multi.py`: Packages multiple skills at once
- 4 split strategies: auto, category, router, size (see the sketch after this list)
- Parallel scraping support for faster processing
- MCP integration for natural language usage

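As an illustration of the "size" strategy only, a simplified greedy packer that groups categories into sub-skills of roughly `target_pages_per_skill` pages. The helper is hypothetical; `split_config.py` implements the real strategies:

```python
def split_by_size(categories: dict[str, int], target: int = 5000) -> list[list[str]]:
    """Greedily pack categories (name -> estimated page count) into sub-skills."""
    groups, current, current_pages = [], [], 0
    for name, pages in sorted(categories.items(), key=lambda kv: -kv[1]):
        if current and current_pages + pages > target:
            groups.append(current)          # close the current sub-skill
            current, current_pages = [], 0
        current.append(name)
        current_pages += pages
    if current:
        groups.append(current)
    return groups

# e.g. split_by_size({"2d": 8000, "3d": 7000, "scripting": 4000})
# -> [["2d"], ["3d"], ["scripting"]]
```
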

**Checkpoint/Resume (NEW)**: Never lose progress on long scrapes:
- Auto-saves every N pages (configurable, default: 1000)
- Resume with the `--resume` flag
- Clear the checkpoint with the `--fresh` flag
- Saves on interruption (Ctrl+C), as sketched below

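A minimal sketch of the checkpoint loop. The `checkpoint.json` field names are hypothetical; only the save interval and Ctrl+C behavior follow the list above:

```python
import json
from pathlib import Path

def save_checkpoint(data_dir: Path, state: dict) -> None:
    """Persist crawl state so --resume can pick up where the scrape stopped."""
    (data_dir / "checkpoint.json").write_text(json.dumps(state))

def crawl(urls: list[str], data_dir: Path, interval: int = 1000) -> None:
    visited: list[str] = []
    try:
        for i, url in enumerate(urls, start=1):
            visited.append(url)                          # stand-in for real scraping
            if i % interval == 0:                        # auto-save every N pages
                save_checkpoint(data_dir, {"visited": visited})
    except KeyboardInterrupt:                            # also save on Ctrl+C
        save_checkpoint(data_dir, {"visited": visited})
        raise
    save_checkpoint(data_dir, {"visited": visited})      # final save
```
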
## Key Code Locations

- **URL validation**: `is_valid_url()` doc_scraper.py:47-62
- **Content extraction**: `extract_content()` doc_scraper.py:64-131
- **Language detection**: `detect_language()` doc_scraper.py:133-163
- **Pattern extraction**: `extract_patterns()` doc_scraper.py:165-181
- **Smart categorization**: `smart_categorize()` doc_scraper.py:280-321
- **Category inference**: `infer_categories()` doc_scraper.py:323-349
- **Quick reference generation**: `generate_quick_reference()` doc_scraper.py:351-370
- **SKILL.md generation**: `create_enhanced_skill_md()` doc_scraper.py:424-540
- **Scraping loop**: `scrape_all()` doc_scraper.py:226-249
- **Main workflow**: `main()` doc_scraper.py:661-733

## Workflow Examples

### First time scraping (with scraping)
```bash
# 1. Scrape + Build
python3 cli/doc_scraper.py --config configs/godot.json
# Time: 20-40 minutes

# 2. Package
python3 cli/package_skill.py output/godot/

# Result: godot.zip
```

### Using cached data (fast iteration)
```bash
# 1. Use existing data
python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape
# Time: 1-3 minutes

# 2. Package
python3 cli/package_skill.py output/godot/
```

### Creating a new framework config
```bash
# Option 1: Interactive
python3 cli/doc_scraper.py --interactive

# Option 2: Copy and modify
cp configs/react.json configs/myframework.json
# Edit configs/myframework.json
python3 cli/doc_scraper.py --config configs/myframework.json
```

### Large documentation workflow (40K pages)
```bash
# 1. Estimate page count (fast, 1-2 minutes)
python3 cli/estimate_pages.py configs/godot.json

# 2. Split into focused sub-skills
python3 cli/split_config.py configs/godot.json --strategy router --target-pages 5000
# Creates: godot-scripting.json, godot-2d.json, godot-3d.json, etc.

# 3. Scrape all in parallel (4-8 hours instead of 20-40!)
for config in configs/godot-*.json; do
    python3 cli/doc_scraper.py --config "$config" &
done
wait

# 4. Generate intelligent router skill
python3 cli/generate_router.py configs/godot-*.json

# 5. Package all skills
python3 cli/package_multi.py output/godot*/

# 6. Upload all .zip files to Claude
# Result: Router automatically directs queries to the right sub-skill!
```

**Time savings:** Parallel scraping reduces 20-40 hours to 4-8 hours.

**See full guide:** [Large Documentation Guide](LARGE_DOCUMENTATION.md)

## Testing Selectors

To find the right CSS selectors for a documentation site:

```python
import requests
from bs4 import BeautifulSoup

url = "https://docs.example.com/page"
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Try different selectors
print(soup.select_one('article'))
print(soup.select_one('main'))
print(soup.select_one('div[role="main"]'))
```

## Running Tests

**IMPORTANT: You must install the package before running tests.**

```bash
# 1. Install the package in editable mode (one-time setup)
pip install -e .

# 2. Run all tests
pytest

# 3. Run specific test files
pytest tests/test_config_validation.py
pytest tests/test_github_scraper.py

# 4. Run with verbose output
pytest -v

# 5. Run with a coverage report
pytest --cov=src/skill_seekers --cov-report=html
```

**Why install first?**
- Tests import from `skill_seekers.cli`, which requires the package to be installed
- This follows modern Python packaging best practice (PEP 517/518)
- CI/CD automatically installs with `pip install -e .`
- conftest.py shows a helpful error if the package is not installed

**Test Coverage:**
- 391+ tests passing
- 39% code coverage
- All core features tested
- CI/CD tests on Ubuntu + macOS with Python 3.10-3.12

## Troubleshooting

**No content extracted**: Check the `main_content` selector. Common values: `article`, `main`, `div[role="main"]`, `div.content`

**Poor categorization**: Edit the `categories` section in the config with better keywords specific to the documentation structure

**Force re-scrape**: Delete cached data with `rm -rf output/{name}_data/`

**Rate limiting issues**: Increase the `rate_limit` value in the config (e.g., from 0.5 to 1.0 seconds)

## Output Quality Checks

After building, verify quality:
```bash
cat output/godot/SKILL.md              # Should have real code examples
cat output/godot/references/index.md   # Should show categories
ls output/godot/references/            # Should have category .md files
```

## llms.txt Support

Skill_Seekers automatically detects llms.txt files before HTML scraping (see the probe sketch below):

### Detection Order
1. `{base_url}/llms-full.txt` (complete documentation)
2. `{base_url}/llms.txt` (standard version)
3. `{base_url}/llms-small.txt` (quick reference)

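A minimal sketch of the detection probe, assuming one plain HTTP GET per candidate file (illustrative, not the tool's exact code):

```python
import requests

def find_llms_txt(base_url: str) -> str | None:
    """Return the first llms.txt variant that exists, in preference order."""
    for name in ("llms-full.txt", "llms.txt", "llms-small.txt"):
        url = f"{base_url.rstrip('/')}/{name}"
        response = requests.get(url, timeout=10)
        if response.ok and response.text.strip():
            return url
    return None  # caller falls back to HTML scraping
```
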
### Benefits
- ⚡ 10x faster (< 5 seconds vs 20-60 seconds)
- ✅ More reliable (maintained by docs authors)
- 🎯 Better quality (pre-formatted for LLMs)
- 🚫 No rate limiting needed

### Example Sites
- Hono: https://hono.dev/llms-full.txt

If no llms.txt is found, the tool automatically falls back to HTML scraping.