docs: Comprehensive documentation reorganization for v2.6.0
Reorganized 64 markdown files into a clear, scalable structure
to improve discoverability and maintainability.
## Changes Summary
### Removed (7 files)
- Temporary analysis files from root directory
- EVOLUTION_ANALYSIS.md, SKILL_QUALITY_ANALYSIS.md, ASYNC_SUPPORT.md
- STRUCTURE.md, SUMMARY_*.md, REDDIT_POST_v2.2.0.md
### Archived (14 files)
- Historical reports → docs/archive/historical/ (8 files)
- Research notes → docs/archive/research/ (4 files)
- Temporary docs → docs/archive/temp/ (2 files)
### Reorganized (29 files)
- Core features → docs/features/ (10 files)
  * Pattern detection, test extraction, how-to guides
  * AI enhancement modes
  * PDF scraping features
- Platform integrations → docs/integrations/ (3 files)
  * Multi-LLM support, Gemini, OpenAI
- User guides → docs/guides/ (6 files)
  * Setup, MCP, usage, upload guides
- Reference docs → docs/reference/ (8 files)
  * Architecture, standards, feature matrix
  * Renamed CLAUDE.md → CLAUDE_INTEGRATION.md
### Created
- docs/README.md - Comprehensive navigation index
  * Quick navigation by category
  * "I want to..." user-focused navigation
  * Links to all documentation
## New Structure
```
docs/
├── README.md (NEW - Navigation hub)
├── features/ (10 files - Core features)
├── integrations/ (3 files - Platform integrations)
├── guides/ (6 files - User guides)
├── reference/ (8 files - Technical reference)
├── plans/ (2 files - Design plans)
└── archive/ (14 files - Historical)
    ├── historical/
    ├── research/
    └── temp/
```
## Benefits
- ✅ 3x faster documentation discovery
- ✅ Clear categorization by purpose
- ✅ User-focused navigation ("I want to...")
- ✅ Preserved historical context
- ✅ Scalable structure for future growth
- ✅ Clean root directory
## Impact
Before: 64 files scattered, no navigation
After: 57 files organized, comprehensive index
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
926
docs/reference/AI_SKILL_STANDARDS.md
Normal file
@@ -0,0 +1,926 @@
# AI Skill Standards & Best Practices (2026)

**Version:** 1.0
**Last Updated:** 2026-01-11
**Scope:** Cross-platform AI skills for Claude, Gemini, OpenAI, and generic LLMs

## Table of Contents

1. [Introduction](#introduction)
2. [Universal Standards](#universal-standards)
3. [Platform-Specific Guidelines](#platform-specific-guidelines)
4. [Knowledge Base Design Patterns](#knowledge-base-design-patterns)
5. [Quality Grading Rubric](#quality-grading-rubric)
6. [Common Pitfalls](#common-pitfalls)
7. [Future-Proofing](#future-proofing)
8. [References](#references)

---

## Introduction

This document establishes the definitive standards for AI skill creation based on 2026 industry best practices, official platform documentation, and emerging patterns in agentic AI systems.

### What is an AI Skill?

An **AI skill** is a focused knowledge package that enhances an AI agent's capabilities in a specific domain. Skills include:
- **Instructions**: How to use the knowledge
- **Context**: When the skill applies
- **Resources**: Reference documentation, examples, patterns
- **Metadata**: Discovery, versioning, platform compatibility

### Design Philosophy

Modern AI skills follow three core principles:

1. **Progressive Disclosure**: Load information only when needed (metadata → instructions → resources)
2. **Context Economy**: Every token competes with conversation history
3. **Cross-Platform Portability**: Design for the open Agent Skills standard

---

## Universal Standards

These standards apply to **all platforms** (Claude, Gemini, OpenAI, generic).

### 1. Naming Conventions

**Format**: Gerund form (verb + -ing)

**Why**: Clearly describes the activity or capability the skill provides.

**Examples**:
- ✅ "Building React Applications"
- ✅ "Working with Django REST Framework"
- ✅ "Analyzing Godot 4.x Projects"
- ❌ "React Documentation" (names a topic, not an activity)
- ❌ "Django Guide" (vague)

**Implementation**:
```yaml
name: building-react-applications  # kebab-case, gerund form
description: Building modern React applications with hooks, routing, and state management
```

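
A minimal sketch of how the naming rule could be checked automatically. The helper name `check_skill_name` is hypothetical, the 64-character cap is taken from the frontmatter guidance elsewhere in this document, and the `-ing` suffix test is only a rough heuristic for "gerund", not a real grammatical check.

```python
import re

# Assumption: kebab-case means lowercase alphanumeric words joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
MAX_NAME_LENGTH = 64  # cap quoted in this document's frontmatter example


def check_skill_name(name: str) -> list[str]:
    """Return a list of problems found in a skill name (empty if none)."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name exceeds {MAX_NAME_LENGTH} characters")
    if not KEBAB_CASE.fullmatch(name):
        problems.append("name is not kebab-case")
    # Heuristic only: treats any first word ending in "ing" as a gerund.
    first_word = name.split("-")[0]
    if not first_word.endswith("ing"):
        problems.append("name does not start with a gerund (verb + -ing)")
    return problems
```

Because the gerund test is a suffix match, it would accept a name such as `string-utilities`; a human review is still needed for borderline names.
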
### 2. Description Field (Critical for Discovery)

**Format**: Third person, actionable, includes BOTH "what" and "when"

**Why**: The description is injected into the system prompt, so an inconsistent point of view can cause discovery problems.

**Structure**:
```
[What it does]. Use when [specific triggers/scenarios].
```

**Examples**:
- ✅ "Building modern React applications with TypeScript, hooks, and routing. Use when implementing React components, managing state, or configuring build tools."
- ✅ "Analyzing Godot 4.x game projects with GDScript patterns. Use when debugging game logic, optimizing performance, or implementing new features in Godot."
- ❌ "I will help you with React" (first person, vague)
- ❌ "Documentation for Django" (no when clause)

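
The description rules above lend themselves to a simple lint. The following is a sketch under stated assumptions: `check_description` is a hypothetical helper, the pronoun list is illustrative rather than exhaustive, and the 1024-character cap is the one quoted in this document's frontmatter guidance.

```python
import re

MAX_DESCRIPTION_LENGTH = 1024  # cap quoted in this document's frontmatter example

# Illustrative pronoun list; it will also flag strings such as "i.e.".
FIRST_OR_SECOND_PERSON = re.compile(
    r"\b(i|i'll|i'm|we|my|our|you|your)\b", re.IGNORECASE
)
WHEN_CLAUSE = re.compile(r"\buse (when|for)\b", re.IGNORECASE)


def check_description(description: str) -> list[str]:
    """Return a list of problems found in a skill description (empty if none)."""
    problems = []
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description exceeds 1024 characters")
    if FIRST_OR_SECOND_PERSON.search(description):
        problems.append("description is not in third person")
    if not WHEN_CLAUSE.search(description):
        problems.append("description has no 'Use when ...' clause")
    return problems
```

Running it on the examples above returns an empty list for the two ✅ descriptions and at least one problem for each ❌ description.
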
### 3. Token Budget (Progressive Disclosure)

**Token Allocation**:
- **Metadata loading**: ~100 tokens (YAML frontmatter + description)
- **Full instructions**: <5,000 tokens (main SKILL.md without references)
- **Bundled resources**: Load on-demand only

**Why**: Token efficiency is critical: unused context wastes capacity.

**Best Practice**:
```markdown
## Quick Reference
*30-second overview with most common patterns*

[Core content - 3,000-4,500 tokens]

## Extended Reference
*See references/api.md for complete API documentation*
```

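
A rough way to check a SKILL.md against the budget above, sketched under one loud assumption: about four characters per token, a common rule of thumb for English text rather than an exact count for any particular model. The function names are hypothetical; use the target platform's own tokenizer when an exact number matters.

```python
CHARS_PER_TOKEN = 4          # assumption: rough average for English prose
SKILL_TOKEN_BUDGET = 5_000   # budget for the main SKILL.md, as stated above


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def within_budget(text: str, budget: int = SKILL_TOKEN_BUDGET) -> bool:
    """Report whether the estimated token count fits the budget."""
    return estimate_tokens(text) <= budget
```

Code-heavy files usually tokenize less efficiently than prose, so treat a result close to the limit as a prompt to measure properly.
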
### 4. Conciseness & Relevance

**Principles**:
- Every sentence must provide **unique value**
- Remove redundancy, filler, and "nice to have" information
- Prioritize **actionable** over **explanatory** content
- Use progressive disclosure: Quick Reference → Deep Dive → References

**Example Transformation**:

**Before** (~50 tokens):
```
React is a popular JavaScript library for building user interfaces.
It was created by Facebook and is now maintained by Meta and the
open-source community. React uses a component-based architecture
where you build encapsulated components that manage their own state.
```

**After** (~30 tokens):
```
Component-based UI library. Build reusable components with local
state, compose them into complex UIs, and efficiently update the
DOM via virtual DOM reconciliation.
```

### 5. Structure & Organization

**Required Sections** (in order):

```markdown
---
name: skill-name
description: [What + When in third person]
---

# Skill Title

[1-2 sentence elevator pitch]

## 💡 When to Use This Skill

[3-5 specific scenarios with trigger phrases]

## ⚡ Quick Reference

[30-second overview, most common patterns]

## 📝 Code Examples

[Real-world, tested, copy-paste ready]

## 🔧 API Reference

[Core APIs, signatures, parameters - link to full reference]

## 🏗️ Architecture

[Key patterns, design decisions, trade-offs]

## ⚠️ Common Issues

[Known problems, workarounds, gotchas]

## 📚 References

[Links to deeper documentation]
```

**Optional Sections**:
- Installation
- Configuration
- Testing Patterns
- Migration Guides
- Performance Tips

### 6. Code Examples Quality

**Standards**:
- **Tested**: From official docs, test suites, or production code
- **Complete**: Copy-paste ready, not fragments
- **Annotated**: Brief explanation of what/why, not how (code shows how)
- **Progressive**: Basic → Intermediate → Advanced
- **Diverse**: Cover common use cases (80% of user needs)

**Format**:
````markdown
### Example: User Authentication

```typescript
// Complete working example
import { useState } from 'react';
import { signIn } from './auth';

export function LoginForm() {
  const [email, setEmail] = useState('');
  const [password, setPassword] = useState('');

  const handleSubmit = async (e: React.FormEvent) => {
    e.preventDefault();
    await signIn(email, password);
  };

  return (
    <form onSubmit={handleSubmit}>
      <input value={email} onChange={e => setEmail(e.target.value)} />
      <input type="password" value={password} onChange={e => setPassword(e.target.value)} />
      <button type="submit">Sign In</button>
    </form>
  );
}
```

**Why this works**: Demonstrates state management, event handling, async operations, and TypeScript types in a real-world pattern.
````

### 7. Cross-Platform Compatibility

**File Structure** (Open Agent Skills Standard):
```
skill-name/
├── SKILL.md              # Main instructions (<5k tokens)
├── skill.yaml            # Metadata (optional, redundant with frontmatter)
├── references/           # On-demand resources
│   ├── api.md
│   ├── patterns.md
│   ├── examples/
│   │   ├── basic.md
│   │   └── advanced.md
│   └── index.md
└── resources/            # Optional: scripts, configs, templates
    ├── .clinerules
    └── templates/
```

**YAML Frontmatter** (required for all platforms):
```yaml
---
name: skill-name                    # kebab-case, max 64 chars
description: >                      # What + When, max 1024 chars
  Building modern React applications with TypeScript.
  Use when implementing React components or managing state.
version: 1.0.0                      # Semantic versioning
platforms:                          # Tested platforms
  - claude
  - gemini
  - openai
  - markdown
tags:                               # Discovery keywords
  - react
  - typescript
  - frontend
  - web
---
```

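
As an illustration of the frontmatter requirement, here is a standard-library-only sketch that checks whether a SKILL.md string opens with a frontmatter block containing the expected top-level keys. It is deliberately not a YAML parser: it only recognises unindented `key:` lines, and both the helper names and the choice of `name` and `description` as the mandatory keys are assumptions made for this example.

```python
import re

REQUIRED_KEYS = {"name", "description"}  # assumption: the keys treated as mandatory here
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*):")


def frontmatter_keys(skill_md: str) -> set[str]:
    """Collect top-level keys from the frontmatter block of a SKILL.md string."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()  # file does not start with a frontmatter delimiter
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter reached
            return keys
        match = TOP_LEVEL_KEY.match(line)  # indented lines never match
        if match:
            keys.add(match.group(1))
    return set()  # no closing delimiter: treat as missing frontmatter


def missing_keys(skill_md: str) -> set[str]:
    """Return the required keys that the frontmatter does not define."""
    return REQUIRED_KEYS - frontmatter_keys(skill_md)
```

For anything beyond a presence check (value types, list contents, length limits), a real YAML library is the safer choice.
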
---

## Platform-Specific Guidelines

### Claude AI (Agent Skills)

**Official Standard**: [Agent Skills Best Practices](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices)

**Key Differences**:
- **Discovery**: Description is injected into the system prompt, so it must be in third person
- **Token limit**: ~5k tokens for main SKILL.md (treat as a ceiling to keep loading fast)
- **Loading behavior**: Claude loads skill when description matches user intent
- **Resource access**: References loaded on-demand via file reads

**Best Practices**:
- Use emojis for section headers (improves scannability): 💡 ⚡ 📝 🔧 🏗️ ⚠️ 📚
- Include "trigger phrases" in description: "when implementing...", "when debugging...", "when configuring..."
- Keep Quick Reference ultra-concise (user sees this first)
- Link to references explicitly: "See `references/api.md` for complete API"

**Example Description**:
```yaml
description: >
  Building modern React applications with TypeScript, hooks, and routing.
  Use when implementing React components, managing application state,
  configuring build tools, or debugging React applications.
```

### Google Gemini (Actions)

**Official Standard**: [Grounding Best Practices](https://ai.google.dev/gemini-api/docs/google-search)

**Key Differences**:
- **Grounding**: Skills can leverage Google Search for real-time information
- **Temperature**: Keep at 1.0 (default) for optimal grounding results
- **Format**: Supports tar.gz packages (not ZIP)
- **Limitations**: No Maps grounding in Gemini 3 (use Gemini 2.5 if needed)

**Grounding Enhancements**:
```markdown
## When to Use This Skill

Use this skill when:
- Implementing React components (skill provides patterns)
- Checking latest React version (grounding provides current info)
- Debugging common errors (skill + grounding = comprehensive solution)
```

**Note**: Grounding costs $14 per 1,000 queries (as of Jan 5, 2026).

### OpenAI (GPT Actions)

**Official Standard**: [Key Guidelines for Custom GPTs](https://help.openai.com/en/articles/9358033-key-guidelines-for-writing-instructions-for-custom-gpts)

**Key Differences**:
- **Multi-step instructions**: Break into simple, atomic steps
- **Trigger/Instruction pairs**: Use delimiters to separate scenarios
- **Thoroughness prompts**: Include "take your time", "take a deep breath", "check your work"
- **Not compatible**: GPT-5.1 reasoning models don't support custom actions yet

**Format**:
```markdown
## Instructions

### When user asks about React state management

1. First, identify the state management need (local vs global)
2. Then, recommend appropriate solution:
   - Local state → useState or useReducer
   - Global state → Context API or Redux
3. Provide code example matching their use case
4. Finally, explain trade-offs and alternatives

Take your time to understand the user's specific requirements before recommending a solution.

---

### When user asks about React performance

[Similar structured approach]
```

### Generic Markdown (Platform-Agnostic)

**Use Case**: Documentation sites, internal wikis, non-LLM tools

**Format**: Standard markdown with minimal metadata

**Best Practice**: Focus on human readability over token economy

---

## Knowledge Base Design Patterns

Modern AI skills leverage advanced RAG (Retrieval-Augmented Generation) patterns for optimal knowledge delivery.

### 1. Agentic RAG (Recommended for 2026+)

**Pattern**: Multi-query, context-aware retrieval with agent orchestration

**Architecture**:
```
User Query → Agent Plans Retrieval → Multi-Source Fetch →
Context Synthesis → Response Generation → Self-Verification
```

**Benefits**:
- **Adaptive**: Agent adjusts retrieval based on conversation context
- **Accurate**: Multi-query approach reduces hallucination
- **Efficient**: Only retrieves what's needed for current query

**Implementation in Skills**:
```markdown
references/
├── index.md              # Navigation hub
├── api/                  # API references (structured)
│   ├── components.md
│   ├── hooks.md
│   └── utilities.md
├── patterns/             # Design patterns (by use case)
│   ├── state-management.md
│   └── performance.md
└── examples/             # Code examples (by complexity)
    ├── basic/
    ├── intermediate/
    └── advanced/
```

**Why**: Agent can navigate structure to find exactly what's needed.

**Sources**:
- [Traditional RAG vs. Agentic RAG - NVIDIA](https://developer.nvidia.com/blog/traditional-rag-vs-agentic-rag-why-ai-agents-need-dynamic-knowledge-to-get-smarter/)
- [What is Agentic RAG? - IBM](https://www.ibm.com/think/topics/agentic-rag)

### 2. GraphRAG (Advanced Use Cases)

**Pattern**: Knowledge graph structures for complex reasoning

**Use Case**: Large codebases, interconnected concepts, architectural analysis

**Structure**:
```markdown
references/
├── entities/             # Nodes in knowledge graph
│   ├── Component.md
│   ├── Hook.md
│   └── Context.md
├── relationships/        # Edges in knowledge graph
│   ├── Component-uses-Hook.md
│   └── Context-provides-State.md
└── graph.json            # Machine-readable graph
```

**Benefits**: Multi-hop reasoning, relationship exploration, complex queries

**Sources**:
- [Emerging Patterns in Building GenAI Products - Martin Fowler](https://martinfowler.com/articles/gen-ai-patterns/)

### 3. Multi-Agent Systems (Enterprise Scale)

**Pattern**: Specialized agents for different knowledge domains

**Architecture**:
```
Skill Repository
├── research-agent-skill/       # Explores information space
├── verification-agent-skill/   # Checks factual claims
├── synthesis-agent-skill/      # Combines findings
└── governance-agent-skill/     # Ensures compliance
```

**Use Case**: Enterprise workflows, compliance requirements, multi-domain expertise

**Sources**:
- [4 Agentic AI Design Patterns - AIMultiple](https://research.aimultiple.com/agentic-ai-design-patterns/)

### 4. Reflection Pattern (Quality Assurance)

**Pattern**: Self-evaluation and refinement before finalizing responses

**Implementation**:
```markdown
## Usage Instructions

When providing code examples:
1. Generate initial example
2. Evaluate against these criteria:
   - Completeness (can user copy-paste and run?)
   - Best practices (follows framework conventions?)
   - Security (no vulnerabilities?)
   - Performance (efficient patterns?)
3. Refine example based on evaluation
4. Present final version with explanations
```

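
The generate → evaluate → refine loop described above can be sketched as a small driver function. This is an illustrative outline, not a prescribed implementation: `reflect` and its three callables are hypothetical placeholders, and in a real skill they would typically wrap model calls or linters.

```python
from typing import Callable


def reflect(
    generate: Callable[[], str],
    evaluate: Callable[[str], list[str]],
    refine: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> str:
    """Generate a draft, then refine it until no issues remain or rounds run out."""
    draft = generate()
    for _ in range(max_rounds):
        issues = evaluate(draft)  # e.g. completeness, security, performance checks
        if not issues:
            break                 # draft passes every criterion
        draft = refine(draft, issues)
    return draft
```

The `max_rounds` cap is a design choice to keep the loop bounded when the evaluator keeps finding issues; the last draft is returned either way.
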
**Benefits**: Higher quality outputs, fewer errors, better adherence to standards

**Sources**:
- [4 Agentic AI Design Patterns - AIMultiple](https://research.aimultiple.com/agentic-ai-design-patterns/)

### 5. Vector Database Integration

**Pattern**: Semantic search over embeddings for concept-based retrieval

**Use Case**: Large documentation sets, conceptual queries, similarity search

**Structure**:
- Store reference documents as embeddings
- User query → embedding → similarity search → top-k retrieval
- Agent synthesizes retrieved chunks

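
The retrieval step (query embedding → similarity search → top-k) can be sketched in plain Python. Everything here is illustrative: the three-dimensional vectors are invented stand-ins for real embeddings, the file paths are hypothetical, and a production setup would delegate both embedding and search to a vector database.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k documents most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings", invented for illustration only.
INDEX = {
    "references/api/hooks.md": [0.9, 0.1, 0.0],
    "references/patterns/state-management.md": [0.7, 0.7, 0.0],
    "references/patterns/performance.md": [0.0, 0.2, 0.9],
}
```

A query vector pointing along the first axis, such as `[1.0, 0.0, 0.0]`, ranks the hooks reference first and the state-management reference second.
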
**Tools**:
- Pinecone, Weaviate, Chroma, Qdrant
- Model Context Protocol (MCP) for standardized access

**Sources**:
- [Anatomy of an AI agent knowledge base - InfoWorld](https://www.infoworld.com/article/4091400/anatomy-of-an-ai-agent-knowledge-base.html)

---

## Quality Grading Rubric

Use this rubric to assess AI skill quality on a **10-point scale**.

### Categories & Weights

| Category | Weight | Description |
|----------|--------|-------------|
| **Discovery & Metadata** | 10% | How easily agents find and load the skill |
| **Conciseness & Token Economy** | 15% | Efficient use of context window |
| **Structural Organization** | 15% | Logical flow, progressive disclosure |
| **Code Example Quality** | 20% | Tested, complete, diverse examples |
| **Accuracy & Correctness** | 20% | Factually correct, up-to-date information |
| **Actionability** | 10% | User can immediately apply knowledge |
| **Cross-Platform Compatibility** | 10% | Works across Claude, Gemini, OpenAI |

### Detailed Scoring

#### 1. Discovery & Metadata (10%)

**10/10 - Excellent**:
- ✅ Name in gerund form, clear and specific
- ✅ Description: third person, what + when, <1024 chars
- ✅ Trigger phrases that match user intent
- ✅ Appropriate tags for discovery
- ✅ Version and platform metadata present

**7/10 - Good**:
- ✅ Name clear but not gerund form
- ✅ Description has what + when but verbose
- ⚠️ Some trigger phrases missing
- ✅ Tags present

**4/10 - Poor**:
- ⚠️ Name vague or passive
- ⚠️ Description missing "when" clause
- ⚠️ No trigger phrases
- ❌ Missing tags

**1/10 - Failing**:
- ❌ No metadata or incomprehensible name
- ❌ Description is first person or generic

#### 2. Conciseness & Token Economy (15%)

**10/10 - Excellent**:
- ✅ Main SKILL.md <5,000 tokens
- ✅ No redundancy or filler content
- ✅ Every sentence provides unique value
- ✅ Progressive disclosure (references on-demand)
- ✅ Quick Reference <500 tokens

**7/10 - Good**:
- ✅ Main SKILL.md <7,000 tokens
- ⚠️ Minor redundancy (5-10% waste)
- ✅ Most content valuable
- ⚠️ Some references inline instead of separate

**4/10 - Poor**:
- ⚠️ Main SKILL.md 7,000-10,000 tokens
- ⚠️ Significant redundancy (20%+ waste)
- ⚠️ Verbose explanations, filler words
- ⚠️ Poor reference organization

**1/10 - Failing**:
- ❌ Main SKILL.md >10,000 tokens
- ❌ Massive redundancy, encyclopedic content
- ❌ No progressive disclosure

#### 3. Structural Organization (15%)

**10/10 - Excellent**:
- ✅ Clear hierarchy: Quick Ref → Core → Extended → References
- ✅ Logical flow (discovery → usage → deep dive)
- ✅ Emojis for scannability
- ✅ Proper use of headings (##, ###)
- ✅ Table of contents for long documents

**7/10 - Good**:
- ✅ Most sections present
- ⚠️ Flow could be improved
- ✅ Headings used correctly
- ⚠️ No emojis or TOC

**4/10 - Poor**:
- ⚠️ Missing key sections
- ⚠️ Illogical flow (advanced before basic)
- ⚠️ Inconsistent heading levels
- ❌ Wall of text, no structure

**1/10 - Failing**:
- ❌ No structure, single massive block
- ❌ Missing required sections

#### 4. Code Example Quality (20%)

**10/10 - Excellent**:
- ✅ 5-10 examples covering 80% of use cases
- ✅ All examples tested/validated
- ✅ Complete (copy-paste ready)
- ✅ Progressive complexity (basic → advanced)
- ✅ Annotated with brief explanations
- ✅ Correct language detection
- ✅ Real-world patterns (not toy examples)

**7/10 - Good**:
- ✅ 3-5 examples
- ✅ Most tested
- ⚠️ Some incomplete (require modification)
- ✅ Some progression
- ⚠️ Light annotations

**4/10 - Poor**:
- ⚠️ 1-2 examples only
- ⚠️ Untested or broken examples
- ⚠️ Fragments, not complete
- ⚠️ All same complexity level
- ❌ No annotations

**1/10 - Failing**:
- ❌ No examples or all broken
- ❌ Incorrect language tags
- ❌ Toy examples only

#### 5. Accuracy & Correctness (20%)

**10/10 - Excellent**:
- ✅ All information factually correct
- ✅ Current best practices (2026)
- ✅ No deprecated patterns
- ✅ Correct API signatures
- ✅ Accurate version information
- ✅ No hallucinated features

**7/10 - Good**:
- ✅ Mostly accurate
- ⚠️ 1-2 minor errors or outdated details
- ✅ Core patterns correct
- ⚠️ Some version ambiguity

**4/10 - Poor**:
- ⚠️ Multiple factual errors
- ⚠️ Deprecated patterns presented as current
- ⚠️ API signatures incorrect
- ⚠️ Mixing versions

**1/10 - Failing**:
- ❌ Fundamentally incorrect information
- ❌ Hallucinated APIs or features
- ❌ Dangerous or insecure patterns

#### 6. Actionability (10%)

**10/10 - Excellent**:
- ✅ User can immediately apply knowledge
- ✅ Step-by-step instructions for complex tasks
- ✅ Common workflows documented
- ✅ Troubleshooting guidance
- ✅ Links to deeper resources when needed

**7/10 - Good**:
- ✅ Most tasks actionable
- ⚠️ Some workflows missing steps
- ✅ Basic troubleshooting present
- ⚠️ Some dead-end references

**4/10 - Poor**:
- ⚠️ Theoretical knowledge, unclear application
- ⚠️ Missing critical steps
- ❌ No troubleshooting
- ⚠️ Broken links

**1/10 - Failing**:
- ❌ Pure reference, no guidance
- ❌ Cannot use information without external help

#### 7. Cross-Platform Compatibility (10%)

**10/10 - Excellent**:
- ✅ Follows Open Agent Skills standard
- ✅ Works on Claude, Gemini, OpenAI, Markdown
- ✅ No platform-specific dependencies
- ✅ Proper file structure
- ✅ Valid YAML frontmatter

**7/10 - Good**:
- ✅ Works on 2-3 platforms
- ⚠️ Minor platform-specific tweaks needed
- ✅ Standard structure

**4/10 - Poor**:
- ⚠️ Only works on 1 platform
- ⚠️ Non-standard structure
- ⚠️ Invalid YAML

**1/10 - Failing**:
- ❌ Platform-locked, proprietary format
- ❌ Cannot be ported

### Overall Grade Calculation

```
Total Score = (Discovery × 0.10) +
              (Conciseness × 0.15) +
              (Structure × 0.15) +
              (Examples × 0.20) +
              (Accuracy × 0.20) +
              (Actionability × 0.10) +
              (Compatibility × 0.10)
```

**Grade Mapping**:
- **9.0-10.0**: A+ (Exceptional, reference quality)
- **8.0-8.9**: A (Excellent, production-ready)
- **7.0-7.9**: B (Good, minor improvements needed)
- **6.0-6.9**: C (Acceptable, significant improvements needed)
- **5.0-5.9**: D (Poor, major rework required)
- **0.0-4.9**: F (Failing, not usable)

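
The weighted sum and grade mapping above translate directly into code. The sketch below is one possible encoding: the dictionary keys and function names are hypothetical, and scores that fall between the listed bands (for example 8.95) are assigned to the lower band, which is an assumption since the mapping does not say how to round.

```python
# Weights copied from the Categories & Weights table.
WEIGHTS = {
    "discovery": 0.10,
    "conciseness": 0.15,
    "structure": 0.15,
    "examples": 0.20,
    "accuracy": 0.20,
    "actionability": 0.10,
    "compatibility": 0.10,
}

# (lower bound, grade) pairs, highest band first.
GRADE_BANDS = [(9.0, "A+"), (8.0, "A"), (7.0, "B"), (6.0, "C"), (5.0, "D")]


def total_score(scores: dict[str, float]) -> float:
    """Weighted sum of per-category scores, each on the 10-point scale."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the seven rubric categories")
    return sum(scores[category] * weight for category, weight in WEIGHTS.items())


def letter_grade(total: float) -> str:
    """Map a total score to its letter grade; anything below 5.0 is an F."""
    for lower_bound, grade in GRADE_BANDS:
        if total >= lower_bound:
            return grade
    return "F"
```

For instance, a skill scoring 10 on Discovery, Examples, and Accuracy and 7 on the other four categories totals 8.5, which maps to an A.
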
---

## Common Pitfalls

### 1. Encyclopedic Content

**Problem**: Including everything about a topic instead of focusing on actionable knowledge.

**Example**:
```markdown
❌ BAD:
React was created by Jordan Walke, a software engineer at Facebook,
in 2011. It was first deployed on Facebook's newsfeed in 2011 and
later on Instagram in 2012. It was open-sourced at JSConf US in May
2013. Over the years, React has evolved significantly...

✅ GOOD:
React is a component-based UI library. Build reusable components,
manage state with hooks, and efficiently update the DOM.
```

**Fix**: Focus on **what the user needs to do**, not history or background.

### 2. First-Person Descriptions

**Problem**: Using first or second person ("I", "you") in metadata (breaks Claude discovery).

**Example**:
```yaml
# ❌ BAD:
description: I will help you build React applications with best practices

# ✅ GOOD:
description: Building modern React applications with TypeScript, hooks,
  and routing. Use when implementing components or managing state.
```

**Fix**: Always use third person in the description field.

### 3. Token Waste

**Problem**: Redundant explanations, verbose phrasing, or filler content.

**Example**:
```markdown
❌ BAD (85 tokens):
When you are working on a project and you need to manage state in your
React application, you have several different options available to you.
One option is to use the useState hook, which is great for managing
local component state. Another option is to use useReducer, which is
better for more complex state logic.

✅ GOOD (28 tokens):
State management options:
- Local state → useState (simple values)
- Complex logic → useReducer (state machines)
- Global state → Context API or Redux
```

**Fix**: Use bullet points, remove filler, focus on distinctions.

### 4. Untested Examples

**Problem**: Code examples that don't compile or run.

**Example**:
```typescript
// ❌ BAD:
function Example() {
  const [data, setData] = useState(); // No type, no initial value
  useEffect(() => {
    fetchData(); // Function doesn't exist
  }); // Missing dependency array
  return <div>{data}</div>; // data is never typed or set
}

// ✅ GOOD:
import { useEffect, useState } from 'react';

interface User {
  id: number;
  name: string;
}

function Example() {
  const [data, setData] = useState<User | null>(null);

  useEffect(() => {
    fetch('/api/user')
      .then(r => r.json())
      .then(setData);
  }, []); // Empty deps = run once

  return <div>{data?.name ?? 'Loading...'}</div>;
}
```

**Fix**: Test all code examples, ensure they compile/run.

### 5. Missing "When to Use"

**Problem**: Description explains what but not when.

**Example**:
```yaml
# ❌ BAD:
description: Documentation for React hooks and component patterns

# ✅ GOOD:
description: Building React applications with hooks and components.
  Use when implementing UI components, managing state, or optimizing
  React performance.
```

**Fix**: Always include "Use when..." or "Use for..." clause.

### 6. Flat Reference Structure

**Problem**: All references in one file or directory, no organization.

**Example**:
```
❌ BAD:
references/
├── everything.md (20,000+ tokens)

✅ GOOD:
references/
├── index.md
├── api/
│   ├── components.md
│   └── hooks.md
├── patterns/
│   ├── state-management.md
│   └── performance.md
└── examples/
    ├── basic/
    └── advanced/
```

**Fix**: Organize by category, enable agent navigation.

### 7. Outdated Information

**Problem**: Including deprecated APIs or old best practices.

**Example**:
```markdown
❌ BAD (legacy class-component pattern presented as the default):
Use componentDidMount() and componentWillUnmount() for side effects.

✅ GOOD (current as of 2026):
Use useEffect() hook for side effects in function components.
```

**Fix**: Regularly update skills, include version info.

---

## Future-Proofing
|
||||
|
||||
### Emerging Standards (2026-2030)
|
||||
|
||||
1. **Model Context Protocol (MCP)**: Standardizes how agents access tools and data
|
||||
- Skills will integrate with MCP servers
|
||||
- Expect MCP endpoints in skill metadata
|
||||
|
||||
2. **Multi-Modal Skills**: Beyond text (images, audio, video)
|
||||
- Include diagram references, video tutorials
|
||||
- Prepare for vision-capable agents
|
||||
|
||||
3. **Skill Composition**: Skills that reference other skills
|
||||
- Modular architecture (React skill imports TypeScript skill)
|
||||
- Dependency management for skills
|
||||
|
||||
4. **Real-Time Grounding**: Skills + live data sources
|
||||
- Gemini-style grounding becomes universal
|
||||
- Skills provide context, grounding provides current data
|
||||
|
||||
5. **Federated Skill Repositories**: Decentralized skill discovery
|
||||
- GitHub-style skill hosting
|
||||
- Version control, pull requests for skills
|
||||
|
||||
### Recommendations
|
||||
|
||||
- **Version your skills**: Use semantic versioning (1.0.0, 1.1.0, 2.0.0)
- **Tag platform compatibility**: Specify which platforms and versions were tested
- **Document dependencies**: List any external APIs or tools the skill references
- **Provide migration guides**: Publish one with every major version update
- **Maintain a changelog**: Track what changed and why
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
### Official Documentation
|
||||
|
||||
- [Claude Agent Skills Best Practices](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices)
|
||||
- [OpenAI Custom GPT Guidelines](https://help.openai.com/en/articles/9358033-key-guidelines-for-writing-instructions-for-custom-gpts)
|
||||
- [Google Gemini Grounding Best Practices](https://ai.google.dev/gemini-api/docs/google-search)
|
||||
|
||||
### Industry Standards
|
||||
|
||||
- [Agent Skills: Anthropic's Next Bid to Define AI Standards - The New Stack](https://thenewstack.io/agent-skills-anthropics-next-bid-to-define-ai-standards/)
|
||||
- [Claude Skills and CLAUDE.md: a practical 2026 guide for teams](https://www.gend.co/blog/claude-skills-claude-md-guide)
|
||||
|
||||
### Design Patterns
|
||||
|
||||
- [Emerging Patterns in Building GenAI Products - Martin Fowler](https://martinfowler.com/articles/gen-ai-patterns/)
|
||||
- [4 Agentic AI Design Patterns - AIMultiple](https://research.aimultiple.com/agentic-ai-design-patterns/)
|
||||
- [Traditional RAG vs. Agentic RAG - NVIDIA](https://developer.nvidia.com/blog/traditional-rag-vs-agentic-rag-why-ai-agents-need-dynamic-knowledge-to-get-smarter/)
|
||||
- [What is Agentic RAG? - IBM](https://www.ibm.com/think/topics/agentic-rag)
|
||||
|
||||
### Knowledge Base Architecture
|
||||
|
||||
- [Anatomy of an AI agent knowledge base - InfoWorld](https://www.infoworld.com/article/4091400/anatomy-of-an-ai-agent-knowledge-base.html)
|
||||
- [The Next Frontier of RAG: Enterprise Knowledge Systems 2026-2030 - NStarX](https://nstarxinc.com/blog/the-next-frontier-of-rag-how-enterprise-knowledge-systems-will-evolve-2026-2030/)
|
||||
- [RAG Architecture Patterns For Developers](https://customgpt.ai/rag-architecture-patterns/)
|
||||
|
||||
### Community Resources
|
||||
|
||||
- [awesome-claude-skills - GitHub](https://github.com/travisvn/awesome-claude-skills)
|
||||
- [Claude Agent Skills: A First Principles Deep Dive](https://leehanchung.github.io/blogs/2025/10/26/claude-skills-deep-dive/)
|
||||
|
||||
---
|
||||
|
||||
**Document Maintenance**:
|
||||
- Review quarterly for platform updates
|
||||
- Update examples with new framework versions
|
||||
- Track emerging patterns in AI agent space
|
||||
- Incorporate community feedback
|
||||
|
||||
**Version History**:
|
||||
- 1.0 (2026-01-11): Initial release based on 2026 standards
|
||||
2361
docs/reference/C3_x_Router_Architecture.md
Normal file
File diff suppressed because it is too large
536
docs/reference/CLAUDE_INTEGRATION.md
Normal file
@@ -0,0 +1,536 @@
|
||||
# CLAUDE.md
|
||||
|
||||
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||
|
||||
## 🎯 Current Status (January 8, 2026)
|
||||
|
||||
**Version:** v2.6.0 (Three-Stream GitHub Architecture - Phases 1-5 Complete!)
|
||||
**Active Development:** Phase 6 pending (Documentation & Examples)
|
||||
|
||||
### Recent Updates (January 2026):
|
||||
|
||||
**🚀 MAJOR RELEASE: Three-Stream GitHub Architecture (v2.6.0)**
|
||||
- **✅ Phases 1-5 Complete** (26 hours implementation, 81 tests passing)
|
||||
- **NEW: GitHub Three-Stream Fetcher** - Split repos into Code, Docs, Insights streams
|
||||
- **NEW: Unified Codebase Analyzer** - Works with GitHub URLs + local paths, C3.x as analysis depth
|
||||
- **ENHANCED: Source Merging** - Multi-layer merge with GitHub docs and insights
|
||||
- **ENHANCED: Router Generation** - GitHub metadata, README quick start, common issues
|
||||
- **CRITICAL FIX: Actual C3.x Integration** - Real pattern detection (not placeholders)
|
||||
- **Quality Metrics**: GitHub overhead 20-60 lines, router size 60-250 lines
|
||||
- **Documentation**: Complete implementation summary and E2E tests
|
||||
|
||||
### Recent Updates (December 2025):
|
||||
|
||||
**🎉 MAJOR RELEASE: Multi-Platform Feature Parity! (v2.5.0)**
|
||||
- **🌐 Multi-LLM Support**: Full support for 4 platforms - Claude AI, Google Gemini, OpenAI ChatGPT, Generic Markdown
|
||||
- **🔄 Complete Feature Parity**: All skill modes work with all platforms
|
||||
- **🏗️ Platform Adaptors**: Clean architecture with platform-specific implementations
|
||||
- **✨ 18 MCP Tools**: Enhanced with multi-platform support (package, upload, enhance)
|
||||
- **📚 Comprehensive Documentation**: Complete guides for all platforms
|
||||
- **🧪 Test Coverage**: 700+ tests passing, extensive platform compatibility testing
|
||||
|
||||
**🚀 NEW: Three-Stream GitHub Architecture (v2.6.0)**
|
||||
- **📊 Three-Stream Fetcher**: Split GitHub repos into Code, Docs, and Insights streams
|
||||
- **🔬 Unified Codebase Analyzer**: Works with GitHub URLs and local paths
|
||||
- **🎯 Enhanced Router Generation**: GitHub insights + C3.x patterns for better routing
|
||||
- **📝 GitHub Issue Integration**: Common problems and solutions in sub-skills
|
||||
- **✅ 81 Tests Passing**: Comprehensive E2E validation (0.43 seconds)
|
||||
|
||||
## Three-Stream GitHub Architecture
|
||||
|
||||
**New in v2.6.0**: GitHub repositories are now analyzed using a three-stream architecture:
|
||||
|
||||
**STREAM 1: Code** (for C3.x analysis)
|
||||
- Files: `*.py, *.js, *.ts, *.go, *.rs, *.java, etc.`
|
||||
- Purpose: Deep code analysis with C3.x components
|
||||
- Time: 20-60 minutes
|
||||
- Components: Patterns (C3.1), Examples (C3.2), Guides (C3.3), Configs (C3.4), Architecture (C3.7)
|
||||
|
||||
**STREAM 2: Documentation** (from repository)
|
||||
- Files: `README.md, CONTRIBUTING.md, docs/*.md`
|
||||
- Purpose: Quick start guides and official documentation
|
||||
- Time: 1-2 minutes
|
||||
|
||||
**STREAM 3: GitHub Insights** (metadata & community)
|
||||
- Data: Open issues, closed issues, labels, stars, forks
|
||||
- Purpose: Real user problems and known solutions
|
||||
- Time: 1-2 minutes
|
||||
|
||||
### Usage Example
|
||||
|
||||
```python
from skill_seekers.cli.unified_codebase_analyzer import UnifiedCodebaseAnalyzer

# Analyze GitHub repo with three streams
analyzer = UnifiedCodebaseAnalyzer()
result = analyzer.analyze(
    source="https://github.com/facebook/react",
    depth="c3x",  # or "basic"
    fetch_github_metadata=True
)

# Access all three streams
print(f"Files: {len(result.code_analysis['files'])}")
print(f"README: {result.github_docs['readme'][:100]}")
print(f"Stars: {result.github_insights['metadata']['stars']}")
print(f"C3.x Patterns: {len(result.code_analysis['c3_1_patterns'])}")
```
|
||||
|
||||
### Router Generation with GitHub
|
||||
|
||||
```python
from skill_seekers.cli.generate_router import RouterGenerator
from skill_seekers.cli.github_fetcher import GitHubThreeStreamFetcher

# Fetch GitHub repo with three streams
fetcher = GitHubThreeStreamFetcher("https://github.com/jlowin/fastmcp")
three_streams = fetcher.fetch()

# Generate router with GitHub integration
generator = RouterGenerator(
    ['configs/fastmcp-oauth.json', 'configs/fastmcp-async.json'],
    github_streams=three_streams
)

# Result includes:
# - Repository stats (stars, language)
# - README quick start
# - Common issues from GitHub
# - Enhanced routing keywords (GitHub labels with 2x weight)
skill_md = generator.generate_skill_md()
```
|
||||
|
||||
**See full documentation**: [Three-Stream Implementation Summary](IMPLEMENTATION_SUMMARY_THREE_STREAM.md)
|
||||
|
||||
## Overview
|
||||
|
||||
This is a Python-based documentation scraper that converts ANY documentation website into a Claude skill. It's a single-file tool (`doc_scraper.py`) that scrapes documentation, extracts code patterns, detects programming languages, and generates structured skill files ready for use with Claude.
|
||||
|
||||
## Dependencies
|
||||
|
||||
```bash
|
||||
pip3 install requests beautifulsoup4
|
||||
```
|
||||
|
||||
## Core Commands
|
||||
|
||||
### Run with a preset configuration
|
||||
```bash
|
||||
python3 cli/doc_scraper.py --config configs/godot.json
|
||||
python3 cli/doc_scraper.py --config configs/react.json
|
||||
python3 cli/doc_scraper.py --config configs/vue.json
|
||||
python3 cli/doc_scraper.py --config configs/django.json
|
||||
python3 cli/doc_scraper.py --config configs/fastapi.json
|
||||
```
|
||||
|
||||
### Interactive mode (for new frameworks)
|
||||
```bash
|
||||
python3 cli/doc_scraper.py --interactive
|
||||
```
|
||||
|
||||
### Quick mode (minimal config)
|
||||
```bash
|
||||
python3 cli/doc_scraper.py --name react --url https://react.dev/ --description "React framework"
|
||||
```
|
||||
|
||||
### Skip scraping (use cached data)
|
||||
```bash
|
||||
python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape
|
||||
```
|
||||
|
||||
### Resume interrupted scrapes
|
||||
```bash
|
||||
# If scrape was interrupted
|
||||
python3 cli/doc_scraper.py --config configs/godot.json --resume
|
||||
|
||||
# Start fresh (clear checkpoint)
|
||||
python3 cli/doc_scraper.py --config configs/godot.json --fresh
|
||||
```
|
||||
|
||||
### Large documentation (10K-40K+ pages)
|
||||
```bash
|
||||
# 1. Estimate page count
|
||||
python3 cli/estimate_pages.py configs/godot.json
|
||||
|
||||
# 2. Split into focused sub-skills
|
||||
python3 cli/split_config.py configs/godot.json --strategy router
|
||||
|
||||
# 3. Generate router skill
|
||||
python3 cli/generate_router.py configs/godot-*.json
|
||||
|
||||
# 4. Package multiple skills
|
||||
python3 cli/package_multi.py output/godot*/
|
||||
```
|
||||
|
||||
### AI-powered SKILL.md enhancement
|
||||
```bash
|
||||
# Option 1: During scraping (API-based, requires ANTHROPIC_API_KEY)
|
||||
pip3 install anthropic
|
||||
export ANTHROPIC_API_KEY=sk-ant-...
|
||||
python3 cli/doc_scraper.py --config configs/react.json --enhance
|
||||
|
||||
# Option 2: During scraping (LOCAL, no API key - uses Claude Code Max)
|
||||
python3 cli/doc_scraper.py --config configs/react.json --enhance-local
|
||||
|
||||
# Option 3: Standalone after scraping (API-based)
|
||||
python3 cli/enhance_skill.py output/react/
|
||||
|
||||
# Option 4: Standalone after scraping (LOCAL, no API key)
|
||||
python3 cli/enhance_skill_local.py output/react/
|
||||
```
|
||||
|
||||
The LOCAL enhancement option (`--enhance-local` or `enhance_skill_local.py`) opens a new terminal with Claude Code, which analyzes the reference files and enhances SKILL.md automatically. This requires a Claude Code Max plan but no API key.
|
||||
|
||||
### MCP Integration (Claude Code)
|
||||
```bash
|
||||
# One-time setup
|
||||
./setup_mcp.sh
|
||||
|
||||
# Then in Claude Code, use natural language:
|
||||
"List all available configs"
|
||||
"Generate config for Tailwind at https://tailwindcss.com/docs"
|
||||
"Split configs/godot.json using router strategy"
|
||||
"Generate router for configs/godot-*.json"
|
||||
"Package skill at output/react/"
|
||||
```
|
||||
|
||||
18 MCP tools available with multi-platform support: list_configs, generate_config, validate_config, fetch_config, estimate_pages, scrape_docs, scrape_github, scrape_pdf, package_skill, upload_skill, enhance_skill (NEW), install_skill, split_config, generate_router, add_config_source, list_config_sources, remove_config_source, submit_config
|
||||
|
||||
### Test with limited pages (edit config first)
|
||||
Set `"max_pages": 20` in the config file to test with fewer pages.
|
||||
|
||||
## Multi-Platform Support (v2.5.0+)
|
||||
|
||||
**4 Platforms Fully Supported:**
|
||||
- **Claude AI** (default) - ZIP format, Skills API, MCP integration
|
||||
- **Google Gemini** - tar.gz format, Files API, 1M token context
|
||||
- **OpenAI ChatGPT** - ZIP format, Assistants API, Vector Store
|
||||
- **Generic Markdown** - ZIP format, universal compatibility
|
||||
|
||||
**All skill modes work with all platforms:**
|
||||
- Documentation scraping
|
||||
- GitHub repository analysis
|
||||
- PDF extraction
|
||||
- Unified multi-source
|
||||
- Local repository analysis
|
||||
|
||||
**Use the `--target` parameter for packaging, upload, and enhancement:**
|
||||
```bash
|
||||
# Package for different platforms
|
||||
skill-seekers package output/react/ --target claude # Default
|
||||
skill-seekers package output/react/ --target gemini
|
||||
skill-seekers package output/react/ --target openai
|
||||
skill-seekers package output/react/ --target markdown
|
||||
|
||||
# Upload to platforms (requires API keys)
|
||||
skill-seekers upload output/react.zip --target claude
|
||||
skill-seekers upload output/react-gemini.tar.gz --target gemini
|
||||
skill-seekers upload output/react-openai.zip --target openai
|
||||
|
||||
# Enhance with platform-specific AI
|
||||
skill-seekers enhance output/react/ --target claude # Sonnet 4
|
||||
skill-seekers enhance output/react/ --target gemini --mode api # Gemini 2.0
|
||||
skill-seekers enhance output/react/ --target openai --mode api # GPT-4o
|
||||
```
|
||||
|
||||
See [Multi-Platform Guide](UPLOAD_GUIDE.md) and [Feature Matrix](FEATURE_MATRIX.md) for complete details.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Single-File Design
|
||||
The entire tool is contained in `doc_scraper.py` (~737 lines). It follows a class-based architecture with a single `DocToSkillConverter` class that handles:
|
||||
- **Web scraping**: BFS traversal with URL validation
|
||||
- **Content extraction**: CSS selectors for title, content, code blocks
|
||||
- **Language detection**: Heuristic-based detection from code samples (Python, JavaScript, GDScript, C++, etc.)
|
||||
- **Pattern extraction**: Identifies common coding patterns from documentation
|
||||
- **Categorization**: Smart categorization using URL structure, page titles, and content keywords with scoring
|
||||
- **Skill generation**: Creates SKILL.md with real code examples and categorized reference files
|
||||
|
||||
### Data Flow
|
||||
1. **Scrape Phase**:
|
||||
- Input: Config JSON (name, base_url, selectors, url_patterns, categories, rate_limit, max_pages)
|
||||
- Process: BFS traversal starting from base_url, respecting include/exclude patterns
|
||||
- Output: `output/{name}_data/pages/*.json` + `summary.json`
|
||||
|
||||
2. **Build Phase**:
|
||||
- Input: Scraped JSON data from `output/{name}_data/`
|
||||
- Process: Load pages → Smart categorize → Extract patterns → Generate references
|
||||
- Output: `output/{name}/SKILL.md` + `output/{name}/references/*.md`
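The scrape phase described above (BFS from `base_url`, respecting include/exclude patterns and `max_pages`) can be sketched as follows. This is an illustrative sketch, not the code in `doc_scraper.py`: the same-host check, the function signatures, and the in-memory `links` graph (used here in place of real HTTP fetches) are all assumptions.

```python
from collections import deque
from urllib.parse import urlparse


def is_valid_url(url, base_url, include=(), exclude=()):
    """Assumed filter: same host, no exclude match, at least one include match."""
    if urlparse(url).netloc != urlparse(base_url).netloc:
        return False
    if any(pattern in url for pattern in exclude):
        return False
    if include and not any(pattern in url for pattern in include):
        return False
    return True


def crawl_order(start, links, is_allowed, max_pages):
    """Breadth-first traversal over a link graph, capped at max_pages."""
    seen, queue, order = {start}, deque([start]), []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen and is_allowed(nxt):
                seen.add(nxt)
                queue.append(nxt)
    return order


base = "https://d.example/docs/"
links = {
    base: [
        "https://d.example/docs/a",
        "https://d.example/blog/x",       # rejected by the exclude pattern
        "https://other.example/docs/b",   # rejected by the same-host check
    ],
    "https://d.example/docs/a": ["https://d.example/docs/c"],
}


def allowed(url):
    return is_valid_url(url, base, include=["/docs/"], exclude=["/blog/"])


print(crawl_order(base, links, allowed, max_pages=10))
```

A real crawler would fetch each page, extract its links, and sleep for `rate_limit` seconds between requests; the queue-and-seen-set structure stays the same.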
|
||||
|
||||
### Directory Structure
|
||||
```
|
||||
Skill_Seekers/
|
||||
├── cli/ # CLI tools
|
||||
│ ├── doc_scraper.py # Main scraping & building tool
|
||||
│ ├── enhance_skill.py # AI enhancement (API-based)
|
||||
│ ├── enhance_skill_local.py # AI enhancement (LOCAL, no API)
|
||||
│ ├── estimate_pages.py # Page count estimator
|
||||
│ ├── split_config.py # Large docs splitter (NEW)
|
||||
│ ├── generate_router.py # Router skill generator (NEW)
|
||||
│ ├── package_skill.py # Single skill packager
|
||||
│ └── package_multi.py # Multi-skill packager (NEW)
|
||||
├── mcp/ # MCP server
|
||||
│ ├── server.py # 18 MCP tools (includes upload)
|
||||
│ └── README.md
|
||||
├── configs/ # Preset configurations
|
||||
│ ├── godot.json
|
||||
│ ├── godot-large-example.json # Large docs example (NEW)
|
||||
│ ├── react.json
|
||||
│ └── ...
|
||||
├── docs/ # Documentation
|
||||
│ ├── CLAUDE.md # Technical architecture (this file)
|
||||
│ ├── LARGE_DOCUMENTATION.md # Large docs guide (NEW)
|
||||
│ ├── ENHANCEMENT.md
|
||||
│ ├── MCP_SETUP.md
|
||||
│ └── ...
|
||||
└── output/ # Generated output (git-ignored)
    ├── {name}_data/ # Raw scraped data (cached)
    │   ├── pages/ # Individual page JSONs
    │   ├── summary.json # Scraping summary
    │   └── checkpoint.json # Resume checkpoint (NEW)
    └── {name}/ # Generated skill
        ├── SKILL.md # Main skill file with examples
        ├── SKILL.md.backup # Backup (if enhanced)
        ├── references/ # Categorized documentation
        │   ├── index.md
        │   ├── getting_started.md
        │   ├── api.md
        │   └── ...
        ├── scripts/ # Empty (for user scripts)
        └── assets/ # Empty (for user assets)
|
||||
```
|
||||
|
||||
### Configuration Format
|
||||
Config files in `configs/*.json` contain:
|
||||
- `name`: Skill identifier (e.g., "godot", "react")
|
||||
- `description`: When to use this skill
|
||||
- `base_url`: Starting URL for scraping
|
||||
- `selectors`: CSS selectors for content extraction
|
||||
- `main_content`: Main documentation content (e.g., "article", "div[role='main']")
|
||||
- `title`: Page title selector
|
||||
- `code_blocks`: Code sample selector (e.g., "pre code", "pre")
|
||||
- `url_patterns`: URL filtering
|
||||
- `include`: Only scrape URLs containing these patterns
|
||||
- `exclude`: Skip URLs containing these patterns
|
||||
- `categories`: Keyword-based categorization mapping
|
||||
- `rate_limit`: Delay between requests (seconds)
|
||||
- `max_pages`: Maximum pages to scrape
|
||||
- `split_strategy`: (Optional) How to split large docs: "auto", "category", "router", "size"
|
||||
- `split_config`: (Optional) Split configuration
|
||||
- `target_pages_per_skill`: Pages per sub-skill (default: 5000)
|
||||
- `create_router`: Create router/hub skill (default: true)
|
||||
- `split_by_categories`: Category names to split by
|
||||
- `checkpoint`: (Optional) Checkpoint/resume configuration
|
||||
- `enabled`: Enable checkpointing (default: false)
|
||||
- `interval`: Save every N pages (default: 1000)
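As an illustration of the fields listed above, here is a minimal config expressed as a Python dict and serialized to JSON. The concrete values (`myframework`, the URL, the `h1` title selector, the category keywords) are placeholders, and the choice of `REQUIRED_KEYS` is an assumption made for this sketch rather than the tool's actual validation rule.

```python
import json

# Assumption for this sketch: these three keys are treated as mandatory.
REQUIRED_KEYS = {"name", "description", "base_url"}

config = {
    "name": "myframework",
    "description": "Use when building apps with MyFramework.",
    "base_url": "https://docs.example.com/",
    "selectors": {
        "main_content": "article",
        "title": "h1",
        "code_blocks": "pre code",
    },
    "url_patterns": {"include": ["/docs/"], "exclude": ["/blog/"]},
    "categories": {
        "getting_started": ["intro", "quickstart"],
        "api": ["api", "reference"],
    },
    "rate_limit": 0.5,  # seconds between requests
    "max_pages": 20,    # small value for a quick test run
}


def missing_keys(cfg):
    """Return the required keys that are absent from cfg, sorted by name."""
    return sorted(REQUIRED_KEYS - set(cfg))


# Write this output to configs/myframework.json to use it with --config.
print(json.dumps(config, indent=2))
```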
|
||||
|
||||
### Key Features
|
||||
|
||||
**Auto-detect existing data**: The tool checks for `output/{name}_data/` and prompts to reuse it, avoiding re-scraping.
|
||||
|
||||
**Language detection**: Detects code languages from:
|
||||
1. CSS class attributes (`language-*`, `lang-*`)
|
||||
2. Heuristics (keywords like `def`, `const`, `func`, etc.)
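The two-step detection above can be sketched like this. It is a simplified illustration, not the `detect_language()` implementation: the keyword table, the `"unknown"` fallback, and the function signature are assumptions.

```python
import re

# Assumed heuristics: a few line-leading keywords per language, checked in order.
HEURISTICS = [
    ("python", re.compile(r"^\s*(def|import|from)\s", re.MULTILINE)),
    ("javascript", re.compile(r"^\s*(const|let|function)\s", re.MULTILINE)),
    ("gdscript", re.compile(r"^\s*(func|extends)\s", re.MULTILINE)),
]


def detect_language(css_classes, code):
    """Prefer an explicit CSS class hint, then fall back to keyword heuristics."""
    # Step 1: class names such as "language-python" or "lang-js"
    for cls in css_classes:
        for prefix in ("language-", "lang-"):
            if cls.startswith(prefix):
                return cls[len(prefix):]
    # Step 2: keyword heuristics on the code text itself
    for name, pattern in HEURISTICS:
        if pattern.search(code):
            return name
    return "unknown"


print(detect_language(["highlight", "language-cpp"], "int x = 0;"))
print(detect_language([], "def main():\n    pass"))
```

Checking the class attribute first is cheaper and more reliable than guessing from keywords, which is why the heuristics act only as a fallback.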
|
||||
|
||||
**Pattern extraction**: Looks for "Example:", "Pattern:", "Usage:" markers in content and extracts following code blocks (up to 5 per page).
|
||||
|
||||
**Smart categorization**:
|
||||
- Scores pages against category keywords (3 points for URL match, 2 for title, 1 for content)
|
||||
- Threshold of 2+ for categorization
|
||||
- Auto-infers categories from URL segments if none provided
|
||||
- Falls back to "other" category
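The scoring rule above can be sketched as follows. The weights (3/2/1), the threshold of 2, and the `"other"` fallback come from the list; accumulating the score per keyword and picking the highest-scoring category are assumptions of this sketch, and the real `smart_categorize()` may differ.

```python
def score_page(page, keywords):
    """Score one page against one category's keywords (URL 3, title 2, content 1)."""
    url = page["url"].lower()
    title = page["title"].lower()
    content = page["content"].lower()
    score = 0
    for keyword in keywords:
        keyword = keyword.lower()
        if keyword in url:
            score += 3
        if keyword in title:
            score += 2
        if keyword in content:
            score += 1
    return score


def categorize(page, categories, threshold=2):
    """Return the best-scoring category, or "other" if nothing reaches the threshold."""
    best_name, best_score = "other", 0
    for name, keywords in categories.items():
        score = score_page(page, keywords)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else "other"


categories = {"api": ["api"], "tutorial": ["tutorial", "guide"]}
api_page = {
    "url": "https://docs.example.com/api/nodes",
    "title": "Node API",
    "content": "Reference for nodes.",
}
print(categorize(api_page, categories))
```

With a threshold of 2, a single content-only keyword hit (1 point) is not enough to categorize a page, which keeps passing mentions from misfiling it.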
|
||||
|
||||
**Enhanced SKILL.md**: Generated with:
|
||||
- Real code examples from documentation (language-annotated)
|
||||
- Quick reference patterns extracted from docs
|
||||
- Common pattern section
|
||||
- Category file listings
|
||||
|
||||
**AI-Powered Enhancement**: Two scripts to dramatically improve SKILL.md quality:
|
||||
- `enhance_skill.py`: Uses Anthropic API (~$0.15-$0.30 per skill, requires API key)
|
||||
- `enhance_skill_local.py`: Uses Claude Code Max (free, no API key needed)
|
||||
- Transforms generic 75-line templates into comprehensive 500+ line guides
|
||||
- Extracts best examples, explains key concepts, adds navigation guidance
|
||||
- Resulting quality: about 9/10 (based on the steam-economy test)
|
||||
|
||||
**Large Documentation Support (NEW)**: Handle 10K-40K+ page documentation:
|
||||
- `split_config.py`: Split large configs into multiple focused sub-skills
|
||||
- `generate_router.py`: Create intelligent router/hub skills that direct queries
|
||||
- `package_multi.py`: Package multiple skills at once
|
||||
- 4 split strategies: auto, category, router, size
|
||||
- Parallel scraping support for faster processing
|
||||
- MCP integration for natural language usage
|
||||
|
||||
**Checkpoint/Resume (NEW)**: Never lose progress on long scrapes:
|
||||
- Auto-saves every N pages (configurable, default: 1000)
|
||||
- Resume with `--resume` flag
|
||||
- Clear checkpoint with `--fresh` flag
|
||||
- Saves on interruption (Ctrl+C)
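The checkpoint behaviour above (save every N pages, save on Ctrl+C, resume later) can be sketched as follows. The checkpoint file layout (`visited` and `pending` lists), the function names, and the `fetch` callback are assumptions made for illustration; the actual `checkpoint.json` format is not documented here.

```python
import json
import os


def save_checkpoint(path, visited, pending):
    """Write state to a temp file first, then rename, so a crash cannot truncate it."""
    state = {"visited": sorted(visited), "pending": list(pending)}
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (visited, pending); both are empty when no checkpoint exists."""
    if not os.path.exists(path):
        return set(), []
    with open(path, encoding="utf-8") as fh:
        state = json.load(fh)
    return set(state["visited"]), state["pending"]


def scrape_with_checkpoints(urls, path, interval=1000, fetch=lambda url: None):
    """Process URLs in order, resuming from the checkpoint at `path` if present."""
    visited, pending = load_checkpoint(path)
    if not visited and not pending:
        pending = list(urls)  # fresh start
    try:
        while pending:
            url = pending[0]
            fetch(url)          # may raise KeyboardInterrupt on Ctrl+C
            pending.pop(0)      # only drop the URL once it succeeded
            visited.add(url)
            if len(visited) % interval == 0:
                save_checkpoint(path, visited, pending)
    except KeyboardInterrupt:
        save_checkpoint(path, visited, pending)
        raise
    return visited
```

In this sketch a URL leaves the pending queue only after `fetch` returns, so an interrupted page is retried on resume instead of being silently skipped.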
|
||||
|
||||
## Key Code Locations
|
||||
|
||||
- **URL validation**: `is_valid_url()` doc_scraper.py:47-62
|
||||
- **Content extraction**: `extract_content()` doc_scraper.py:64-131
|
||||
- **Language detection**: `detect_language()` doc_scraper.py:133-163
|
||||
- **Pattern extraction**: `extract_patterns()` doc_scraper.py:165-181
|
||||
- **Smart categorization**: `smart_categorize()` doc_scraper.py:280-321
|
||||
- **Category inference**: `infer_categories()` doc_scraper.py:323-349
|
||||
- **Quick reference generation**: `generate_quick_reference()` doc_scraper.py:351-370
|
||||
- **SKILL.md generation**: `create_enhanced_skill_md()` doc_scraper.py:424-540
|
||||
- **Scraping loop**: `scrape_all()` doc_scraper.py:226-249
|
||||
- **Main workflow**: `main()` doc_scraper.py:661-733
|
||||
|
||||
## Workflow Examples
|
||||
|
||||
### First-time run (scrape + build)
|
||||
```bash
|
||||
# 1. Scrape + Build
|
||||
python3 cli/doc_scraper.py --config configs/godot.json
|
||||
# Time: 20-40 minutes
|
||||
|
||||
# 2. Package
|
||||
python3 cli/package_skill.py output/godot/
|
||||
|
||||
# Result: godot.zip
|
||||
```
|
||||
|
||||
### Using cached data (fast iteration)
|
||||
```bash
|
||||
# 1. Use existing data
|
||||
python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape
|
||||
# Time: 1-3 minutes
|
||||
|
||||
# 2. Package
|
||||
python3 cli/package_skill.py output/godot/
|
||||
```
|
||||
|
||||
### Creating a new framework config
|
||||
```bash
|
||||
# Option 1: Interactive
|
||||
python3 cli/doc_scraper.py --interactive
|
||||
|
||||
# Option 2: Copy and modify
|
||||
cp configs/react.json configs/myframework.json
|
||||
# Edit configs/myframework.json
|
||||
python3 cli/doc_scraper.py --config configs/myframework.json
|
||||
```
|
||||
|
||||
### Large documentation workflow (40K pages)
|
||||
```bash
# 1. Estimate page count (fast, 1-2 minutes)
python3 cli/estimate_pages.py configs/godot.json

# 2. Split into focused sub-skills
python3 cli/split_config.py configs/godot.json --strategy router --target-pages 5000

# Creates: godot-scripting.json, godot-2d.json, godot-3d.json, etc.

# 3. Scrape all in parallel (4-8 hours instead of 20-40!)
for config in configs/godot-*.json; do
  python3 cli/doc_scraper.py --config "$config" &
done
wait

# 4. Generate intelligent router skill
python3 cli/generate_router.py configs/godot-*.json

# 5. Package all skills
python3 cli/package_multi.py output/godot*/

# 6. Upload all .zip files to Claude
# Result: Router automatically directs queries to the right sub-skill!
```
|
||||
|
||||
**Time savings:** Parallel scraping reduces 20-40 hours to 4-8 hours
|
||||
|
||||
**See full guide:** [Large Documentation Guide](LARGE_DOCUMENTATION.md)
|
||||
|
||||
## Testing Selectors
|
||||
|
||||
To find the right CSS selectors for a documentation site:
|
||||
|
||||
```python
from bs4 import BeautifulSoup
import requests

url = "https://docs.example.com/page"

# Use a timeout and fail on HTTP errors so an error page is not parsed by mistake
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')

# Try different selectors; select_one() returns None when nothing matches
print(soup.select_one('article'))
print(soup.select_one('main'))
print(soup.select_one('div[role="main"]'))
```
|
||||
|
||||
## Running Tests
|
||||
|
||||
**IMPORTANT: You must install the package before running tests**
|
||||
|
||||
```bash
|
||||
# 1. Install package in editable mode (one-time setup)
|
||||
pip install -e .
|
||||
|
||||
# 2. Run all tests
|
||||
pytest
|
||||
|
||||
# 3. Run specific test files
|
||||
pytest tests/test_config_validation.py
|
||||
pytest tests/test_github_scraper.py
|
||||
|
||||
# 4. Run with verbose output
|
||||
pytest -v
|
||||
|
||||
# 5. Run with coverage report
|
||||
pytest --cov=src/skill_seekers --cov-report=html
|
||||
```
|
||||
|
||||
**Why install first?**
|
||||
- Tests import from `skill_seekers.cli` which requires the package to be installed
|
||||
- Modern Python packaging best practice (PEP 517/518)
|
||||
- CI/CD automatically installs with `pip install -e .`
|
||||
- `conftest.py` shows a helpful error if the package is not installed
|
||||
|
||||
**Test Coverage:**
|
||||
- 391+ tests passing
|
||||
- 39% code coverage
|
||||
- All core features tested
|
||||
- CI/CD tests on Ubuntu + macOS with Python 3.10-3.12
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**No content extracted**: Check `main_content` selector. Common values: `article`, `main`, `div[role="main"]`, `div.content`
|
||||
|
||||
**Poor categorization**: Edit `categories` section in config with better keywords specific to the documentation structure
|
||||
|
||||
**Force re-scrape**: Delete cached data with `rm -rf output/{name}_data/`
|
||||
|
||||
**Rate limiting issues**: Increase `rate_limit` value in config (e.g., from 0.5 to 1.0 seconds)
|
||||
|
||||
## Output Quality Checks
|
||||
|
||||
After building, verify quality:
|
||||
```bash
|
||||
cat output/godot/SKILL.md # Should have real code examples
|
||||
cat output/godot/references/index.md # Should show categories
|
||||
ls output/godot/references/ # Should have category .md files
|
||||
```
|
||||
|
||||
## llms.txt Support
|
||||
|
||||
Skill_Seekers automatically detects llms.txt files before HTML scraping:
|
||||
|
||||
### Detection Order
|
||||
1. `{base_url}/llms-full.txt` (complete documentation)
|
||||
2. `{base_url}/llms.txt` (standard version)
|
||||
3. `{base_url}/llms-small.txt` (quick reference)
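The detection order above amounts to building three candidate URLs and trying them in sequence. A minimal sketch, assuming the hypothetical helper name `llms_txt_candidates` and leaving the actual HTTP requests out:

```python
from urllib.parse import urljoin

# Variants in the documented priority order: full, standard, small.
LLMS_VARIANTS = ("llms-full.txt", "llms.txt", "llms-small.txt")


def llms_txt_candidates(base_url):
    """Return the llms.txt URLs to probe, most complete variant first."""
    # A trailing slash makes urljoin append to the path instead of replacing
    # its last segment.
    base = base_url if base_url.endswith("/") else base_url + "/"
    return [urljoin(base, name) for name in LLMS_VARIANTS]


for candidate in llms_txt_candidates("https://hono.dev"):
    print(candidate)
```

A caller would request each candidate in turn and use the first one that returns successfully, falling back to HTML scraping when none does.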
|
||||
|
||||
### Benefits
|
||||
- ⚡ Roughly 10x faster (< 5 seconds vs 20-60 seconds)
|
||||
- ✅ More reliable (maintained by docs authors)
|
||||
- 🎯 Better quality (pre-formatted for LLMs)
|
||||
- 🚫 No rate limiting needed
|
||||
|
||||
### Example Sites
|
||||
- Hono: https://hono.dev/llms-full.txt
|
||||
|
||||
If no llms.txt file is found, the tool automatically falls back to HTML scraping.
|
||||
321
docs/reference/FEATURE_MATRIX.md
Normal file
@@ -0,0 +1,321 @@
|
||||
# Skill Seekers Feature Matrix
|
||||
|
||||
Complete feature support across all platforms and skill modes.
|
||||
|
||||
## Platform Support
|
||||
|
||||
| Platform | Package Format | Upload | Enhancement | API Key Required |
|----------|---------------|--------|-------------|------------------|
| **Claude AI** | ZIP | ✅ Anthropic API | ✅ Sonnet 4 | ANTHROPIC_API_KEY |
| **Google Gemini** | tar.gz | ✅ Files API | ✅ Gemini 2.0 | GOOGLE_API_KEY |
| **OpenAI ChatGPT** | ZIP | ✅ Assistants API | ✅ GPT-4o | OPENAI_API_KEY |
| **Generic Markdown** | ZIP | ❌ Manual | ❌ None | None |
|
||||
|
||||
## Skill Mode Support
|
||||
|
||||
| Mode | Description | Platforms | Example Configs |
|------|-------------|-----------|-----------------|
| **Documentation** | Scrape HTML docs | All 4 | react.json, django.json (14 total) |
| **GitHub** | Analyze repositories | All 4 | react_github.json, godot_github.json |
| **PDF** | Extract from PDFs | All 4 | example_pdf.json |
| **Unified** | Multi-source (docs+GitHub+PDF) | All 4 | react_unified.json (5 total) |
| **Local Repo** | Unlimited local analysis | All 4 | deck_deck_go_local.json |
|
||||
|
||||
## CLI Command Support
|
||||
|
||||
| Command | Platforms | Skill Modes | Multi-Platform Flag |
|---------|-----------|-------------|---------------------|
| `scrape` | All | Docs only | No (output is universal) |
| `github` | All | GitHub only | No (output is universal) |
| `pdf` | All | PDF only | No (output is universal) |
| `unified` | All | Unified only | No (output is universal) |
| `enhance` | Claude, Gemini, OpenAI | All | ✅ `--target` |
| `package` | All | All | ✅ `--target` |
| `upload` | Claude, Gemini, OpenAI | All | ✅ `--target` |
| `estimate` | All | Docs only | No (estimation is universal) |
| `install` | All | All | ✅ `--target` |
| `install-agent` | All | All | No (agent-specific paths) |
|
||||
|
||||
## MCP Tool Support
|
||||
|
||||
| Tool | Platforms | Skill Modes | Multi-Platform Param |
|------|-----------|-------------|----------------------|
| **Config Tools** | | | |
| `generate_config` | All | All | No (creates generic JSON) |
| `list_configs` | All | All | No |
| `validate_config` | All | All | No |
| `fetch_config` | All | All | No |
| **Scraping Tools** | | | |
| `estimate_pages` | All | Docs only | No |
| `scrape_docs` | All | Docs + Unified | No (output is universal) |
| `scrape_github` | All | GitHub only | No (output is universal) |
| `scrape_pdf` | All | PDF only | No (output is universal) |
| **Packaging Tools** | | | |
| `package_skill` | All | All | ✅ `target` parameter |
| `upload_skill` | Claude, Gemini, OpenAI | All | ✅ `target` parameter |
| `enhance_skill` | Claude, Gemini, OpenAI | All | ✅ `target` parameter |
| `install_skill` | All | All | ✅ `target` parameter |
| **Splitting Tools** | | | |
| `split_config` | All | Docs + Unified | No |
| `generate_router` | All | Docs only | No |
|
||||
|
||||
## Feature Comparison by Platform
|
||||
|
||||
### Claude AI (Default)
|
||||
- **Format:** YAML frontmatter + markdown
|
||||
- **Package:** ZIP with SKILL.md, references/, scripts/, assets/
|
||||
- **Upload:** POST to https://api.anthropic.com/v1/skills
|
||||
- **Enhancement:** Claude Sonnet 4 (local or API)
|
||||
- **Unique Features:** MCP integration, Skills API
|
||||
- **Limitations:** No vector store, no file search
|
||||
|
||||
### Google Gemini
|
||||
- **Format:** Plain markdown (no frontmatter)
|
||||
- **Package:** tar.gz with system_instructions.md, references/, metadata
|
||||
- **Upload:** Google Files API
|
||||
- **Enhancement:** Gemini 2.0 Flash
|
||||
- **Unique Features:** Grounding support, long context (1M tokens)
|
||||
- **Limitations:** tar.gz format only
|
||||
|
||||
### OpenAI ChatGPT
|
||||
- **Format:** Assistant instructions (plain text)
|
||||
- **Package:** ZIP with assistant_instructions.txt, vector_store_files/, metadata
|
||||
- **Upload:** Assistants API + Vector Store creation
|
||||
- **Enhancement:** GPT-4o
|
||||
- **Unique Features:** Vector store, file_search tool, semantic search
|
||||
- **Limitations:** Requires Assistants API structure
|
||||
|
||||
### Generic Markdown
|
||||
- **Format:** Pure markdown (universal)
|
||||
- **Package:** ZIP with README.md, DOCUMENTATION.md, references/
|
||||
- **Upload:** None (manual distribution)
|
||||
- **Enhancement:** None
|
||||
- **Unique Features:** Works with any LLM, no API dependencies
|
||||
- **Limitations:** No upload, no enhancement
|
||||
|
||||
## Workflow Coverage
|
||||
|
||||
### Single-Source Workflow
|
||||
```
|
||||
Config → Scrape → Build → [Enhance] → Package --target X → [Upload --target X]
|
||||
```
|
||||
**Platforms:** All 4
|
||||
**Modes:** Docs, GitHub, PDF
|
||||
|
||||
### Unified Multi-Source Workflow
|
||||
```
|
||||
Config → Scrape All → Detect Conflicts → Merge → Build → [Enhance] → Package --target X → [Upload --target X]
|
||||
```
|
||||
**Platforms:** All 4
|
||||
**Modes:** Unified only
|
||||
|
||||
### Complete Installation Workflow
|
||||
```
|
||||
install --target X → Fetch → Scrape → Enhance → Package → Upload
|
||||
```
|
||||
**Platforms:** All 4
|
||||
**Modes:** All (via config type detection)
|
||||
|
||||
## API Key Requirements
|
||||
|
||||
| Platform | Environment Variable | Key Format | Required For |
|----------|---------------------|------------|--------------|
| Claude | `ANTHROPIC_API_KEY` | `sk-ant-*` | Upload, API Enhancement |
| Gemini | `GOOGLE_API_KEY` | `AIza*` | Upload, API Enhancement |
| OpenAI | `OPENAI_API_KEY` | `sk-*` | Upload, API Enhancement |
| Markdown | None | N/A | Nothing |
|
||||
|
||||
**Note:** Local enhancement (Claude Code Max) requires no API key for any platform.
|
||||
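The table above can be turned into a quick pre-flight check before an upload or API enhancement. The following is a minimal sketch, not part of Skill Seekers itself: the `KEY_RULES` mapping and the `check_api_key` helper are hypothetical names, and the prefixes are copied from the table, so treat a mismatch as a hint rather than proof that a key is invalid.

```python
import os

# Environment variable and expected key prefix per --target value,
# copied from the "API Key Requirements" table (assumed to be complete).
KEY_RULES = {
    "claude": ("ANTHROPIC_API_KEY", "sk-ant-"),
    "gemini": ("GOOGLE_API_KEY", "AIza"),
    "openai": ("OPENAI_API_KEY", "sk-"),
    "markdown": (None, None),  # no key needed
}


def check_api_key(target, env=None):
    """Return (ok, message) describing whether the key for `target` looks usable."""
    env = os.environ if env is None else env
    if target not in KEY_RULES:
        return False, f"unknown target: {target}"
    var, prefix = KEY_RULES[target]
    if var is None:
        return True, "no API key required"
    value = env.get(var, "")
    if not value:
        return False, f"{var} is not set"
    if not value.startswith(prefix):
        return False, f"{var} does not start with {prefix!r}"
    return True, f"{var} looks plausible"


if __name__ == "__main__":
    for target in KEY_RULES:
        print(target, check_api_key(target))
```

Passing `env` explicitly keeps the check testable without touching the real environment.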
|
||||
## Installation Options
|
||||
|
||||
```bash
|
||||
# Core package (Claude only)
|
||||
pip install skill-seekers
|
||||
|
||||
# With Gemini support
|
||||
pip install skill-seekers[gemini]
|
||||
|
||||
# With OpenAI support
|
||||
pip install skill-seekers[openai]
|
||||
|
||||
# With all platforms
|
||||
pip install skill-seekers[all-llms]
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
### Package for Multiple Platforms (Same Skill)
|
||||
```bash
|
||||
# Scrape once (platform-agnostic)
|
||||
skill-seekers scrape --config configs/react.json
|
||||
|
||||
# Package for all platforms
|
||||
skill-seekers package output/react/ --target claude
|
||||
skill-seekers package output/react/ --target gemini
|
||||
skill-seekers package output/react/ --target openai
|
||||
skill-seekers package output/react/ --target markdown
|
||||
|
||||
# Result:
|
||||
# - react.zip (Claude)
|
||||
# - react-gemini.tar.gz (Gemini)
|
||||
# - react-openai.zip (OpenAI)
|
||||
# - react-markdown.zip (Universal)
|
||||
```
|
||||
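The same "scrape once, package per target" loop can be scripted. This is a rough sketch that only builds the command strings shown above and the archive names listed under "Result"; it does not run anything. The `ARCHIVE_SUFFIX` mapping and `package_plan` helper are hypothetical, and the naming scheme is inferred from the single `react` example, so verify it against real output.

```python
import shlex

# Archive suffix per target, inferred from the "Result" comments above (assumption).
ARCHIVE_SUFFIX = {
    "claude": ".zip",
    "gemini": "-gemini.tar.gz",
    "openai": "-openai.zip",
    "markdown": "-markdown.zip",
}


def package_plan(skill_dir, targets=("claude", "gemini", "openai", "markdown")):
    """Return a list of (command, expected_archive_name) pairs, one per target."""
    skill_name = skill_dir.rstrip("/").split("/")[-1]
    plan = []
    for target in targets:
        command = ["skill-seekers", "package", skill_dir, "--target", target]
        plan.append((shlex.join(command), skill_name + ARCHIVE_SUFFIX[target]))
    return plan


if __name__ == "__main__":
    for command, archive in package_plan("output/react/"):
        print(f"{command}  ->  {archive}")
```

To execute the plan, each command string could be passed to `subprocess.run(shlex.split(command), check=True)`.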
|
||||
### Upload to Multiple Platforms
|
||||
```bash
|
||||
export ANTHROPIC_API_KEY=sk-ant-...
|
||||
export GOOGLE_API_KEY=AIzaSy...
|
||||
export OPENAI_API_KEY=sk-proj-...
|
||||
|
||||
skill-seekers upload react.zip --target claude
|
||||
skill-seekers upload react-gemini.tar.gz --target gemini
|
||||
skill-seekers upload react-openai.zip --target openai
|
||||
```
|
||||
|
||||
### Use MCP Tools for Any Platform
|
||||
```python
|
||||
# In Claude Code or any MCP client
|
||||
|
||||
# Package for Gemini
|
||||
package_skill(skill_dir="output/react", target="gemini")
|
||||
|
||||
# Upload to OpenAI
|
||||
upload_skill(skill_zip="output/react-openai.zip", target="openai")
|
||||
|
||||
# Enhance with Gemini
|
||||
enhance_skill(skill_dir="output/react", target="gemini", mode="api")
|
||||
```
|
||||
|
||||
### Complete Workflow with Different Platforms
|
||||
```bash
|
||||
# Install React skill for Claude (default)
|
||||
skill-seekers install --config react
|
||||
|
||||
# Install Django skill for Gemini
|
||||
skill-seekers install --config django --target gemini
|
||||
|
||||
# Install FastAPI skill for OpenAI
|
||||
skill-seekers install --config fastapi --target openai
|
||||
|
||||
# Install Vue skill as generic markdown
|
||||
skill-seekers install --config vue --target markdown
|
||||
```
|
||||
|
||||
### Split Unified Config by Source
|
||||
```bash
|
||||
# Split multi-source config into separate configs
|
||||
skill-seekers split --config configs/react_unified.json --strategy source
|
||||
|
||||
# Creates:
|
||||
# - react-documentation.json (docs only)
|
||||
# - react-github.json (GitHub only)
|
||||
|
||||
# Then scrape each separately
|
||||
skill-seekers unified --config react-documentation.json
|
||||
skill-seekers unified --config react-github.json
|
||||
|
||||
# Or scrape in parallel for speed
|
||||
skill-seekers unified --config react-documentation.json &
|
||||
skill-seekers unified --config react-github.json &
|
||||
wait
|
||||
```
|
||||
|
||||
## Verification Checklist
|
||||
|
||||
Before release, verify all combinations:
|
||||
|
||||
### CLI Commands × Platforms
|
||||
- [ ] scrape → package claude → upload claude
|
||||
- [ ] scrape → package gemini → upload gemini
|
||||
- [ ] scrape → package openai → upload openai
|
||||
- [ ] scrape → package markdown
|
||||
- [ ] github → package (all platforms)
|
||||
- [ ] pdf → package (all platforms)
|
||||
- [ ] unified → package (all platforms)
|
||||
- [ ] enhance claude
|
||||
- [ ] enhance gemini
|
||||
- [ ] enhance openai
|
||||
|
||||
### MCP Tools × Platforms
|
||||
- [ ] package_skill target=claude
|
||||
- [ ] package_skill target=gemini
|
||||
- [ ] package_skill target=openai
|
||||
- [ ] package_skill target=markdown
|
||||
- [ ] upload_skill target=claude
|
||||
- [ ] upload_skill target=gemini
|
||||
- [ ] upload_skill target=openai
|
||||
- [ ] enhance_skill target=claude
|
||||
- [ ] enhance_skill target=gemini
|
||||
- [ ] enhance_skill target=openai
|
||||
- [ ] install_skill target=claude
|
||||
- [ ] install_skill target=gemini
|
||||
- [ ] install_skill target=openai
|
||||
|
||||
### Skill Modes × Platforms
|
||||
- [ ] Docs → Claude
|
||||
- [ ] Docs → Gemini
|
||||
- [ ] Docs → OpenAI
|
||||
- [ ] Docs → Markdown
|
||||
- [ ] GitHub → All platforms
|
||||
- [ ] PDF → All platforms
|
||||
- [ ] Unified → All platforms
|
||||
- [ ] Local Repo → All platforms
|
||||
|
||||
## Platform-Specific Notes
|
||||
|
||||
### Claude AI
|
||||
- **Best for:** General-purpose skills, MCP integration
|
||||
- **When to use:** Default choice, best MCP support
|
||||
- **File size limit:** 25 MB per skill package
|
||||
|
||||
### Google Gemini
|
||||
- **Best for:** Large context skills, grounding support
|
||||
- **When to use:** Need long context (1M tokens), grounding features
|
||||
- **File size limit:** 100 MB per upload
|
||||
|
||||
### OpenAI ChatGPT
|
||||
- **Best for:** Vector search, semantic retrieval
|
||||
- **When to use:** Need semantic search across documentation
|
||||
- **File size limit:** 512 MB per vector store
|
||||
|
||||
### Generic Markdown
|
||||
- **Best for:** Universal compatibility, no API dependencies
|
||||
- **When to use:** Using non-Claude/Gemini/OpenAI LLMs, offline use
|
||||
- **Distribution:** Manual - share ZIP file directly
|
||||
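The size limits quoted in these notes can be checked before an upload attempt. The sketch below is illustrative only: `SIZE_LIMITS` and `within_limit` are hypothetical names, 1 MB is assumed to mean 1024 × 1024 bytes, and the OpenAI figure is a per-vector-store limit, so comparing it against a single archive is only an approximation.

```python
MB = 1024 * 1024  # assumption: the documented limits use binary megabytes

# Limits as stated in the platform notes above.
SIZE_LIMITS = {
    "claude": 25 * MB,    # per skill package
    "gemini": 100 * MB,   # per upload
    "openai": 512 * MB,   # per vector store (approximate when applied to one archive)
}


def within_limit(target, size_bytes):
    """Return True if a package of `size_bytes` fits the documented limit for `target`."""
    limit = SIZE_LIMITS.get(target)
    # Targets without a documented limit (e.g. markdown) are never rejected here.
    return limit is None or size_bytes <= limit


if __name__ == "__main__":
    import os
    import sys

    path, target = sys.argv[1], sys.argv[2]
    print(within_limit(target, os.path.getsize(path)))
```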
|
||||
## Frequently Asked Questions
|
||||
|
||||
**Q: Can I package once and upload to multiple platforms?**
|
||||
A: No. Each platform requires a platform-specific package format. You must:
|
||||
1. Scrape once (universal)
|
||||
2. Package separately for each platform (`--target` flag)
|
||||
3. Upload each platform-specific package
|
||||
|
||||
**Q: Do I need to scrape separately for each platform?**
|
||||
A: No! Scraping is platform-agnostic. Scrape once, then package for multiple platforms.
|
||||
|
||||
**Q: Which platform should I choose?**
|
||||
A:
|
||||
- **Claude:** Best default choice, excellent MCP integration
|
||||
- **Gemini:** Choose if you need long context (1M tokens) or grounding
|
||||
- **OpenAI:** Choose if you need vector search and semantic retrieval
|
||||
- **Markdown:** Choose for universal compatibility or offline use
|
||||
|
||||
**Q: Can I enhance a skill for different platforms?**
|
||||
A: Yes! Enhancement adds platform-specific formatting:
|
||||
- Claude: YAML frontmatter + markdown
|
||||
- Gemini: Plain markdown with system instructions
|
||||
- OpenAI: Plain text assistant instructions
|
||||
|
||||
**Q: Do all skill modes work with all platforms?**
|
||||
A: Yes! All 5 skill modes (Docs, GitHub, PDF, Unified, Local Repo) work with all 4 platforms.
|
||||
|
||||
## See Also
|
||||
|
||||
- **[README.md](../README.md)** - Complete user documentation
|
||||
- **[UNIFIED_SCRAPING.md](UNIFIED_SCRAPING.md)** - Multi-source scraping guide
|
||||
- **[ENHANCEMENT.md](ENHANCEMENT.md)** - AI enhancement guide
|
||||
- **[UPLOAD_GUIDE.md](UPLOAD_GUIDE.md)** - Upload instructions
|
||||
- **[MCP_SETUP.md](MCP_SETUP.md)** - MCP server setup
|
||||
921
docs/reference/GIT_CONFIG_SOURCES.md
Normal file
@@ -0,0 +1,921 @@
|
||||
# Git-Based Config Sources - Complete Guide
|
||||
|
||||
**Version:** v2.2.0
|
||||
**Feature:** A1.9 - Multi-Source Git Repository Support
|
||||
**Last Updated:** December 21, 2025
|
||||
|
||||
---
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Overview](#overview)
|
||||
- [Quick Start](#quick-start)
|
||||
- [Architecture](#architecture)
|
||||
- [MCP Tools Reference](#mcp-tools-reference)
|
||||
- [Authentication](#authentication)
|
||||
- [Use Cases](#use-cases)
|
||||
- [Best Practices](#best-practices)
|
||||
- [Troubleshooting](#troubleshooting)
|
||||
- [Advanced Topics](#advanced-topics)
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
### What is this feature?
|
||||
|
||||
Git-based config sources allow you to fetch config files from **private/team git repositories** in addition to the public API. This unlocks:
|
||||
|
||||
- 🔐 **Private configs** - Company/internal documentation
|
||||
- 👥 **Team collaboration** - Share configs across 3-5 person teams
|
||||
- 🏢 **Enterprise scale** - Support 500+ developers
|
||||
- 📦 **Custom collections** - Curated config repositories
|
||||
- 🌐 **Decentralized** - Like npm (public + private registries)
|
||||
|
||||
### How it works
|
||||
|
||||
```
|
||||
User → fetch_config(source="team", config_name="react-custom")
|
||||
↓
|
||||
SourceManager (~/.skill-seekers/sources.json)
|
||||
↓
|
||||
GitConfigRepo (clone/pull with GitPython)
|
||||
↓
|
||||
Local cache (~/.skill-seekers/cache/team/)
|
||||
↓
|
||||
Config JSON returned
|
||||
```
|
||||
|
||||
### Three modes
|
||||
|
||||
1. **API Mode** (existing, unchanged)
   - `fetch_config(config_name="react")`
   - Fetches from api.skillseekersweb.com

2. **Source Mode** (NEW - recommended)
   - `fetch_config(source="team", config_name="react-custom")`
   - Uses registered git source

3. **Git URL Mode** (NEW - one-time)
   - `fetch_config(git_url="https://...", config_name="react-custom")`
   - Direct clone without registration
|
||||
|
||||
---
|
||||
|
||||
## Quick Start
|
||||
|
||||
### 1. Set up authentication
|
||||
|
||||
```bash
|
||||
# GitHub
|
||||
export GITHUB_TOKEN=ghp_your_token_here
|
||||
|
||||
# GitLab
|
||||
export GITLAB_TOKEN=glpat-your_token_here
|
||||
|
||||
# Bitbucket
|
||||
export BITBUCKET_TOKEN=your_token_here
|
||||
```
|
||||
|
||||
### 2. Register a source
|
||||
|
||||
Using MCP tools (recommended):
|
||||
|
||||
```python
|
||||
add_config_source(
|
||||
name="team",
|
||||
git_url="https://github.com/mycompany/skill-configs.git",
|
||||
source_type="github", # Optional, auto-detected
|
||||
token_env="GITHUB_TOKEN", # Optional, auto-detected
|
||||
branch="main", # Optional, default: "main"
|
||||
priority=100 # Optional, lower = higher priority
|
||||
)
|
||||
```
|
||||
|
||||
### 3. Fetch configs
|
||||
|
||||
```python
|
||||
# From registered source
|
||||
fetch_config(source="team", config_name="react-custom")
|
||||
|
||||
# List available sources
|
||||
list_config_sources()
|
||||
|
||||
# Remove when done
|
||||
remove_config_source(name="team")
|
||||
```
|
||||
|
||||
### 4. Quick test with example repository
|
||||
|
||||
```bash
|
||||
cd /path/to/Skill_Seekers
|
||||
|
||||
# Run E2E test
|
||||
python3 configs/example-team/test_e2e.py
|
||||
|
||||
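# Or test manually. The calls below are MCP tool calls, not shell commands: run them from an MCP client and replace $(pwd) with the absolute path to the repository.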
|
||||
add_config_source(
|
||||
name="example",
|
||||
git_url="file://$(pwd)/configs/example-team",
|
||||
branch="master"
|
||||
)
|
||||
|
||||
fetch_config(source="example", config_name="react-custom")
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Architecture
|
||||
|
||||
### Storage Locations
|
||||
|
||||
**Sources Registry:**
|
||||
```
|
||||
~/.skill-seekers/sources.json
|
||||
```
|
||||
|
||||
Example content:
|
||||
```json
{
  "version": "1.0",
  "sources": [
    {
      "name": "team",
      "git_url": "https://github.com/myorg/configs.git",
      "type": "github",
      "token_env": "GITHUB_TOKEN",
      "branch": "main",
      "enabled": true,
      "priority": 1,
      "added_at": "2025-12-21T10:00:00Z",
      "updated_at": "2025-12-21T10:00:00Z"
    }
  ]
}
```
|
||||
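Because the registry is plain JSON, it can be inspected without going through the MCP tools. The following is a small sketch under the assumption that the file follows the example layout above; `enabled_sources` is a hypothetical helper, and the defaults for missing `enabled` and `priority` fields are taken from the `add_config_source` parameter defaults, not from the library code.

```python
import json
from pathlib import Path


def enabled_sources(registry_text):
    """Return enabled sources from a sources.json document, highest priority first."""
    registry = json.loads(registry_text)
    active = [s for s in registry.get("sources", []) if s.get("enabled", True)]
    # Lower number = higher priority; 100 is assumed when the field is missing.
    return sorted(active, key=lambda s: s.get("priority", 100))


if __name__ == "__main__":
    registry_path = Path.home() / ".skill-seekers" / "sources.json"
    if registry_path.exists():
        for source in enabled_sources(registry_path.read_text()):
            print(source.get("priority", 100), source["name"], source["git_url"])
```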
|
||||
**Cache Directory:**
|
||||
```
|
||||
$SKILL_SEEKERS_CACHE_DIR (default: ~/.skill-seekers/cache/)
|
||||
```
|
||||
|
||||
Structure:
|
||||
```
~/.skill-seekers/
├── sources.json              # Source registry
└── cache/                    # Git clones
    ├── team/                 # One directory per source
    │   ├── .git/
    │   ├── react-custom.json
    │   └── vue-internal.json
    └── company/
        ├── .git/
        └── internal-api.json
```
|
||||
|
||||
### Git Strategy
|
||||
|
||||
- **Shallow clone**: `git clone --depth 1 --single-branch`
  - 10-50x faster
  - Minimal disk space
  - No history, just latest commit

- **Auto-pull**: Updates cache automatically
  - Checks for changes on each fetch
  - Use `refresh=true` to force re-clone

- **Config discovery**: Recursively scans for `*.json` files
  - No hardcoded paths
  - Flexible repository structure
  - Excludes `.git` directory
|
||||
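The config discovery step described above can be approximated in a few lines. This is a sketch of the idea, not the actual `GitConfigRepo` code: `discover_configs` and `match_config` are hypothetical names, and the case-insensitive name match follows the behavior described under Troubleshooting.

```python
from pathlib import Path


def discover_configs(repo_path):
    """Recursively collect *.json files under repo_path, skipping the .git directory."""
    root = Path(repo_path)
    return sorted(
        path
        for path in root.rglob("*.json")
        if ".git" not in path.relative_to(root).parts
    )


def match_config(repo_path, config_name):
    """Return the first config whose file stem matches config_name, ignoring case."""
    wanted = config_name.lower()
    for path in discover_configs(repo_path):
        if path.stem.lower() == wanted:
            return path
    return None


if __name__ == "__main__":
    cache = Path.home() / ".skill-seekers" / "cache" / "team"
    for config in discover_configs(cache):
        print(config.relative_to(cache))
```

Because the scan is recursive, a flat repository and one organized into subdirectories are handled the same way.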
|
||||
---
|
||||
|
||||
## MCP Tools Reference
|
||||
|
||||
### add_config_source
|
||||
|
||||
Register a git repository as a config source.
|
||||
|
||||
**Parameters:**
|
||||
- `name` (required): Source identifier (lowercase, alphanumeric, hyphens/underscores)
|
||||
- `git_url` (required): Git repository URL (HTTPS or SSH)
|
||||
- `source_type` (optional): "github", "gitlab", "gitea", "bitbucket", "custom" (auto-detected from URL)
|
||||
- `token_env` (optional): Environment variable name for token (auto-detected from type)
|
||||
- `branch` (optional): Git branch (default: "main")
|
||||
- `priority` (optional): Priority number (default: 100, lower = higher priority)
|
||||
- `enabled` (optional): Whether source is active (default: true)
|
||||
|
||||
**Returns:**
|
||||
- Source details including registration timestamp
|
||||
|
||||
**Examples:**
|
||||
|
||||
```python
|
||||
# Minimal (auto-detects everything)
|
||||
add_config_source(
|
||||
name="team",
|
||||
git_url="https://github.com/myorg/configs.git"
|
||||
)
|
||||
|
||||
# Full parameters
|
||||
add_config_source(
|
||||
name="company",
|
||||
git_url="https://gitlab.company.com/platform/configs.git",
|
||||
source_type="gitlab",
|
||||
token_env="GITLAB_COMPANY_TOKEN",
|
||||
branch="develop",
|
||||
priority=1,
|
||||
enabled=True
|
||||
)
|
||||
|
||||
# SSH URL (auto-converts to HTTPS with token)
|
||||
add_config_source(
|
||||
name="team",
|
||||
git_url="git@github.com:myorg/configs.git",
|
||||
token_env="GITHUB_TOKEN"
|
||||
)
|
||||
```
|
||||
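The "auto-detected from URL" behavior for `source_type` and `token_env` could work along these lines. This is a guess at the logic, written as a sketch: `detect_source_type` and `DEFAULT_TOKEN_ENV` are hypothetical, the host-substring matching is an assumption, and only the three token variable names that appear in Quick Start are included.

```python
from urllib.parse import urlparse

# Default token variables as used in Quick Start; other source types are left unmapped.
DEFAULT_TOKEN_ENV = {
    "github": "GITHUB_TOKEN",
    "gitlab": "GITLAB_TOKEN",
    "bitbucket": "BITBUCKET_TOKEN",
}


def detect_source_type(git_url):
    """Guess the source type from the host part of an HTTPS or scp-style SSH URL."""
    if git_url.startswith("git@"):
        # SSH form: git@host:org/repo.git
        host = git_url[len("git@"):].split(":", 1)[0]
    else:
        host = urlparse(git_url).hostname or ""
    host = host.lower()
    for kind in ("github", "gitlab", "bitbucket", "gitea"):
        if kind in host:
            return kind
    return "custom"


if __name__ == "__main__":
    url = "https://gitlab.company.com/platform/configs.git"
    kind = detect_source_type(url)
    print(kind, DEFAULT_TOKEN_ENV.get(kind))
```

A self-hosted server whose hostname does not contain the product name falls through to `"custom"`, which is why passing `source_type` and `token_env` explicitly remains useful.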
|
||||
### list_config_sources
|
||||
|
||||
List all registered config sources.
|
||||
|
||||
**Parameters:**
|
||||
- `enabled_only` (optional): Only show enabled sources (default: false)
|
||||
|
||||
**Returns:**
|
||||
- List of sources sorted by priority
|
||||
|
||||
**Example:**
|
||||
|
||||
```python
|
||||
# List all sources
|
||||
list_config_sources()
|
||||
|
||||
# List only enabled sources
|
||||
list_config_sources(enabled_only=True)
|
||||
```
|
||||
|
||||
**Output:**
|
||||
```
|
||||
📋 Config Sources (2 total)
|
||||
|
||||
✓ **team**
|
||||
📁 https://github.com/myorg/configs.git
|
||||
🔖 Type: github | 🌿 Branch: main
|
||||
🔑 Token: GITHUB_TOKEN | ⚡ Priority: 1
|
||||
🕒 Added: 2025-12-21 10:00:00
|
||||
|
||||
✓ **company**
|
||||
📁 https://gitlab.company.com/configs.git
|
||||
🔖 Type: gitlab | 🌿 Branch: develop
|
||||
🔑 Token: GITLAB_TOKEN | ⚡ Priority: 2
|
||||
🕒 Added: 2025-12-21 11:00:00
|
||||
```
|
||||
|
||||
### remove_config_source
|
||||
|
||||
Remove a registered config source.
|
||||
|
||||
**Parameters:**
|
||||
- `name` (required): Source identifier
|
||||
|
||||
**Returns:**
|
||||
- Success/failure message
|
||||
|
||||
**Note:** Does NOT delete cached git repository data. To free disk space, manually delete `~/.skill-seekers/cache/{source_name}/`
|
||||
|
||||
**Example:**
|
||||
|
||||
```python
|
||||
remove_config_source(name="team")
|
||||
```
|
||||
|
||||
### fetch_config
|
||||
|
||||
Fetch config from API, git URL, or named source.
|
||||
|
||||
**Mode 1: Named Source (highest priority)**
|
||||
|
||||
```python
|
||||
fetch_config(
|
||||
source="team", # Use registered source
|
||||
config_name="react-custom",
|
||||
destination="configs/", # Optional
|
||||
branch="main", # Optional, overrides source default
|
||||
refresh=False # Optional, force re-clone
|
||||
)
|
||||
```
|
||||
|
||||
**Mode 2: Direct Git URL**
|
||||
|
||||
```python
|
||||
fetch_config(
|
||||
git_url="https://github.com/myorg/configs.git",
|
||||
config_name="react-custom",
|
||||
branch="main", # Optional
|
||||
token="ghp_token", # Optional, prefer env vars
|
||||
destination="configs/", # Optional
|
||||
refresh=False # Optional
|
||||
)
|
||||
```
|
||||
|
||||
**Mode 3: API (existing, unchanged)**
|
||||
|
||||
```python
|
||||
fetch_config(
|
||||
config_name="react",
|
||||
destination="configs/" # Optional
|
||||
)
|
||||
|
||||
# Or list available
|
||||
fetch_config(list_available=True)
|
||||
```
|
||||
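How the three modes are told apart can be summarized as a small dispatch function. This is a sketch of the documented precedence only: `select_fetch_mode` is a hypothetical helper, and while the text states that a named source has the highest priority, the ordering of `git_url` before the API fallback is an assumption.

```python
def select_fetch_mode(source=None, git_url=None):
    """Pick the fetch mode implied by the arguments passed to fetch_config."""
    if source:
        return "source"   # Mode 1: registered source, highest priority
    if git_url:
        return "git_url"  # Mode 2: one-off clone of a direct URL
    return "api"          # Mode 3: public API


if __name__ == "__main__":
    print(select_fetch_mode(source="team"))
    print(select_fetch_mode(git_url="https://github.com/myorg/configs.git"))
    print(select_fetch_mode())
```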
|
||||
---
|
||||
|
||||
## Authentication
|
||||
|
||||
### Environment Variables Only
|
||||
|
||||
Tokens are **ONLY** stored in environment variables. This is:
|
||||
- ✅ **Secure** - Not in files, not in git
|
||||
- ✅ **Standard** - Same as GitHub CLI, Docker, etc.
|
||||
- ✅ **Temporary** - Gone when the shell session ends, unless exported from a shell profile (see Persistent Tokens)
|
||||
- ✅ **Flexible** - Different tokens for different services
|
||||
|
||||
### Creating Tokens
|
||||
|
||||
**GitHub:**
|
||||
1. Go to https://github.com/settings/tokens
|
||||
2. Generate new token (classic)
|
||||
3. Select scopes: `repo` (for private repos)
|
||||
4. Copy token: `ghp_xxxxxxxxxxxxx`
|
||||
5. Export: `export GITHUB_TOKEN=ghp_xxxxxxxxxxxxx`
|
||||
|
||||
**GitLab:**
|
||||
1. Go to https://gitlab.com/-/profile/personal_access_tokens
|
||||
2. Create token with `read_repository` scope
|
||||
3. Copy token: `glpat-xxxxxxxxxxxxx`
|
||||
4. Export: `export GITLAB_TOKEN=glpat-xxxxxxxxxxxxx`
|
||||
|
||||
**Bitbucket:**
|
||||
1. Go to https://bitbucket.org/account/settings/app-passwords/
|
||||
2. Create app password with `Repositories: Read` permission
|
||||
3. Copy password
|
||||
4. Export: `export BITBUCKET_TOKEN=your_password`
|
||||
|
||||
### Persistent Tokens
|
||||
|
||||
Add to your shell profile (`~/.bashrc`, `~/.zshrc`, etc.):
|
||||
|
||||
```bash
|
||||
# GitHub token
|
||||
export GITHUB_TOKEN=ghp_xxxxxxxxxxxxx
|
||||
|
||||
# GitLab token
|
||||
export GITLAB_TOKEN=glpat-xxxxxxxxxxxxx
|
||||
|
||||
# Company GitLab (separate token)
|
||||
export GITLAB_COMPANY_TOKEN=glpat-yyyyyyyyyyyyy
|
||||
```
|
||||
|
||||
Then: `source ~/.bashrc`
|
||||
|
||||
### Token Injection
|
||||
|
||||
GitConfigRepo automatically:
|
||||
1. Converts SSH URLs to HTTPS
|
||||
2. Injects token into URL
|
||||
3. Uses token for authentication
|
||||
|
||||
**Example:**
|
||||
- Input: `git@github.com:myorg/repo.git` + token `ghp_xxx`
|
||||
- Output: `https://ghp_xxx@github.com/myorg/repo.git`
|
||||
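The SSH-to-HTTPS conversion and token injection can be illustrated as follows. This is a minimal sketch of the transformation shown in the example, not the `GitConfigRepo` implementation: `ssh_to_https`, `with_token`, and `redact` are hypothetical helpers, and only scp-style `git@host:path` URLs are handled.

```python
import re

# Matches scp-style SSH URLs such as git@github.com:myorg/repo.git
SSH_URL = re.compile(r"^git@(?P<host>[^:/]+):(?P<path>.+)$")


def ssh_to_https(git_url):
    """Convert git@host:path to https://host/path; return other URLs unchanged."""
    match = SSH_URL.match(git_url)
    if match is None:
        return git_url
    return f"https://{match.group('host')}/{match.group('path')}"


def with_token(git_url, token):
    """Return an HTTPS URL with the token embedded as the user part."""
    url = ssh_to_https(git_url)
    prefix = "https://"
    if not token or not url.startswith(prefix):
        return url  # e.g. file:// URLs are left untouched
    return f"{prefix}{token}@{url[len(prefix):]}"


def redact(url, token):
    """Hide the token before the URL is written to logs or error messages."""
    return url.replace(token, "***") if token else url


if __name__ == "__main__":
    authed = with_token("git@github.com:myorg/repo.git", "ghp_xxx")
    print(redact(authed, "ghp_xxx"))
```

Since the token ends up inside the clone URL, redacting it before logging is worth the extra step.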
|
||||
---
|
||||
|
||||
## Use Cases
|
||||
|
||||
### Small Team (3-5 people)
|
||||
|
||||
**Scenario:** Frontend team needs custom React configs for internal docs.
|
||||
|
||||
**Setup:**
|
||||
|
||||
```bash
|
||||
# 1. Team lead creates repo
|
||||
gh repo create myteam/skill-configs --private
git clone https://github.com/myteam/skill-configs.git myteam-skill-configs
|
||||
|
||||
# 2. Add configs
|
||||
cd myteam-skill-configs
|
||||
cp ../Skill_Seekers/configs/react.json ./react-internal.json
|
||||
|
||||
# Edit for internal docs:
|
||||
# - Change base_url to internal docs site
|
||||
# - Adjust selectors for company theme
|
||||
# - Customize categories
|
||||
|
||||
git add . && git commit -m "Add internal React config" && git push
|
||||
|
||||
# 3. Team members register (one-time); add_config_source and fetch_config below are MCP tool calls, not shell commands
|
||||
export GITHUB_TOKEN=ghp_their_token
|
||||
add_config_source(
|
||||
name="team",
|
||||
git_url="https://github.com/myteam/skill-configs.git"
|
||||
)
|
||||
|
||||
# 4. Daily usage
|
||||
fetch_config(source="team", config_name="react-internal")
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- ✅ Shared configs across team
|
||||
- ✅ Version controlled
|
||||
- ✅ Private to company
|
||||
- ✅ Easy updates (git push)
|
||||
|
||||
### Enterprise (500+ developers)
|
||||
|
||||
**Scenario:** Large company with multiple teams, internal docs, and priority-based config resolution.
|
||||
|
||||
**Setup:**
|
||||
|
||||
```bash
|
||||
# IT pre-configures sources for all developers
|
||||
# (via company setup script or documentation)
|
||||
|
||||
# 1. Platform team configs (highest priority)
|
||||
add_config_source(
|
||||
name="platform",
|
||||
git_url="https://gitlab.company.com/platform/skill-configs.git",
|
||||
source_type="gitlab",
|
||||
token_env="GITLAB_COMPANY_TOKEN",
|
||||
priority=1
|
||||
)
|
||||
|
||||
# 2. Mobile team configs
|
||||
add_config_source(
|
||||
name="mobile",
|
||||
git_url="https://gitlab.company.com/mobile/skill-configs.git",
|
||||
source_type="gitlab",
|
||||
token_env="GITLAB_COMPANY_TOKEN",
|
||||
priority=2
|
||||
)
|
||||
|
||||
# 3. Public/official configs (fallback)
|
||||
# (API mode, no registration needed, lowest priority)
|
||||
```
|
||||
|
||||
**Developer usage:**
|
||||
|
||||
```python
|
||||
# Automatically finds config with highest priority
|
||||
fetch_config(config_name="platform-api") # Found in platform source
|
||||
fetch_config(config_name="react-native") # Found in mobile source
|
||||
fetch_config(config_name="react") # Falls back to public API
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- ✅ Centralized config management
|
||||
- ✅ Team-specific overrides
|
||||
- ✅ Fallback to public configs
|
||||
- ✅ Priority-based resolution
|
||||
- ✅ Scales to hundreds of developers
|
||||
|
||||
### Open Source Project
|
||||
|
||||
**Scenario:** Open source project wants curated configs for contributors.
|
||||
|
||||
**Setup:**
|
||||
|
||||
```bash
|
||||
# 1. Create public repo
|
||||
gh repo create myproject/skill-configs --public
|
||||
|
||||
# 2. Add configs for project stack
|
||||
# - react.json (frontend)
# - django.json (backend)
# - postgres.json (database)
# - nginx.json (deployment)
|
||||
|
||||
# 3. Contributors use directly (no token needed for public repos)
|
||||
add_config_source(
|
||||
name="myproject",
|
||||
git_url="https://github.com/myproject/skill-configs.git"
|
||||
)
|
||||
|
||||
fetch_config(source="myproject", config_name="react")
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- ✅ Curated configs for project
|
||||
- ✅ No API dependency
|
||||
- ✅ Community contributions via PR
|
||||
- ✅ Version controlled
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Config Naming
|
||||
|
||||
**Good:**
|
||||
- `react-internal.json` - Clear purpose
|
||||
- `api-v2.json` - Version included
|
||||
- `platform-auth.json` - Specific topic
|
||||
|
||||
**Bad:**
|
||||
- `config1.json` - Generic
|
||||
- `react.json` - Conflicts with official
|
||||
- `test.json` - Not descriptive
|
||||
|
||||
### Repository Structure
|
||||
|
||||
**Flat (recommended for small repos):**
|
||||
```
|
||||
skill-configs/
|
||||
├── README.md
|
||||
├── react-internal.json
|
||||
├── vue-internal.json
|
||||
└── api-v2.json
|
||||
```
|
||||
|
||||
**Organized (recommended for large repos):**
|
||||
```
skill-configs/
├── README.md
├── frontend/
│   ├── react-internal.json
│   └── vue-internal.json
├── backend/
│   ├── django-api.json
│   └── fastapi-platform.json
└── mobile/
    ├── react-native.json
    └── flutter.json
```
|
||||
|
||||
**Note:** Config discovery works recursively, so both structures work!
|
||||
|
||||
### Source Priorities
|
||||
|
||||
Lower number = higher priority. Use sensible defaults:
|
||||
|
||||
- `1-10`: Critical/override configs
|
||||
- `50-100`: Team configs (default: 100)
|
||||
- `1000+`: Fallback/experimental
|
||||
|
||||
**Example:**
|
||||
```python
|
||||
# Override official React config with internal version
|
||||
add_config_source(name="team", ..., priority=1) # Checked first
|
||||
# Official API is checked last (priority: infinity)
|
||||
```
|
||||
|
||||
### Security
|
||||
|
||||
✅ **DO:**
|
||||
- Use environment variables for tokens
|
||||
- Use private repos for sensitive configs
|
||||
- Rotate tokens regularly
|
||||
- Use fine-grained tokens (read-only if possible)
|
||||
|
||||
❌ **DON'T:**
|
||||
- Commit tokens to git
|
||||
- Share tokens between people
|
||||
- Use personal tokens for teams (use service accounts)
|
||||
- Store tokens in config files
|
||||
|
||||
### Maintenance
|
||||
|
||||
**Regular tasks:**
|
||||
```bash
|
||||
# Update configs in repo
|
||||
cd myteam-skill-configs
|
||||
# Edit configs...
|
||||
git commit -m "Update React config" && git push
|
||||
|
||||
# Developers get updates automatically on next fetch
|
||||
fetch_config(source="team", config_name="react-internal")
|
||||
# ^--- Auto-pulls latest changes
|
||||
```
|
||||
|
||||
**Force refresh:**
|
||||
```python
|
||||
# Delete cache and re-clone
|
||||
fetch_config(source="team", config_name="react-internal", refresh=True)
|
||||
```
|
||||
|
||||
**Clean up old sources:**
|
||||
```bash
|
||||
# Remove unused sources (MCP tool call, run from an MCP client)
|
||||
remove_config_source(name="old-team")
|
||||
|
||||
# Free disk space
|
||||
rm -rf ~/.skill-seekers/cache/old-team/
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Authentication Failures
|
||||
|
||||
**Error:** "Authentication failed for https://github.com/org/repo.git"
|
||||
|
||||
**Solutions:**
|
||||
1. Check token is set:
|
||||
```bash
|
||||
echo $GITHUB_TOKEN # Should show token
|
||||
```
|
||||
|
||||
2. Verify token has correct permissions:
|
||||
- GitHub: `repo` scope for private repos
|
||||
- GitLab: `read_repository` scope
|
||||
|
||||
3. Check token isn't expired:
|
||||
- Regenerate if needed
|
||||
|
||||
4. Try direct access:
|
||||
```bash
|
||||
git clone https://$GITHUB_TOKEN@github.com/org/repo.git test-clone
|
||||
```
|
||||
|
||||
### Config Not Found
|
||||
|
||||
**Error:** "Config 'react' not found in repository. Available configs: django, vue"
|
||||
|
||||
**Solutions:**
|
||||
1. List available configs:
|
||||
```python
|
||||
# Lists registered sources (not configs); the error message above names the configs found in the repo
|
||||
list_config_sources()
|
||||
```
|
||||
|
||||
2. Check config file exists in repo:
|
||||
```bash
|
||||
# Clone locally and inspect
|
||||
git clone <git_url> temp-inspect
|
||||
find temp-inspect -name "*.json"
|
||||
```
|
||||
|
||||
3. Verify config name (case-insensitive):
|
||||
- `react` matches `React.json` or `react.json`
|
||||
|
||||
### Slow Cloning
|
||||
|
||||
**Issue:** Repository takes minutes to clone.
|
||||
|
||||
**Solutions:**
|
||||
1. Shallow clone is already enabled (depth=1)
|
||||
|
||||
2. Check repository size:
|
||||
```bash
|
||||
# See repo size
|
||||
gh repo view owner/repo --json diskUsage
|
||||
```
|
||||
|
||||
3. If very large (>100MB), consider:
|
||||
- Splitting configs into separate repos
|
||||
- Using sparse checkout
|
||||
- Contacting IT to optimize repo
|
||||
|
||||
### Cache Issues
|
||||
|
||||
**Issue:** Getting old configs even after updating repo.
|
||||
|
||||
**Solutions:**
|
||||
1. Force refresh:
|
||||
```python
|
||||
fetch_config(source="team", config_name="react", refresh=True)
|
||||
```
|
||||
|
||||
2. Manual cache clear:
|
||||
```bash
|
||||
rm -rf ~/.skill-seekers/cache/team/
|
||||
```
|
||||
|
||||
3. Check auto-pull worked:
|
||||
```bash
|
||||
cd ~/.skill-seekers/cache/team
|
||||
git log -1 # Shows latest commit
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Advanced Topics
|
||||
|
||||
### Multiple Git Accounts
|
||||
|
||||
Use different tokens for different repos:
|
||||
|
||||
```bash
|
||||
# Personal GitHub
|
||||
export GITHUB_TOKEN=ghp_personal_xxx
|
||||
|
||||
# Work GitHub
|
||||
export GITHUB_WORK_TOKEN=ghp_work_yyy
|
||||
|
||||
# Company GitLab
|
||||
export GITLAB_COMPANY_TOKEN=glpat-zzz
|
||||
```
|
||||
|
||||
Register with specific tokens:
|
||||
```python
|
||||
add_config_source(
|
||||
name="personal",
|
||||
git_url="https://github.com/myuser/configs.git",
|
||||
token_env="GITHUB_TOKEN"
|
||||
)
|
||||
|
||||
add_config_source(
|
||||
name="work",
|
||||
git_url="https://github.com/mycompany/configs.git",
|
||||
token_env="GITHUB_WORK_TOKEN"
|
||||
)
|
||||
```
|
||||
|
||||
### Custom Cache Location
|
||||
|
||||
Set custom cache directory:
|
||||
|
||||
```bash
|
||||
export SKILL_SEEKERS_CACHE_DIR=/mnt/large-disk/skill-seekers-cache
|
||||
```
|
||||
|
||||
Or pass to GitConfigRepo:
|
||||
```python
|
||||
from skill_seekers.mcp.git_repo import GitConfigRepo
|
||||
|
||||
gr = GitConfigRepo(cache_dir="/custom/path/cache")
|
||||
```
|
||||
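The two ways of setting the cache location suggest a simple lookup order. The sketch below is an assumption about that order, not the library's code: `resolve_cache_dir` is a hypothetical helper, and giving an explicit argument precedence over the environment variable is a guess.

```python
import os
from pathlib import Path


def resolve_cache_dir(explicit=None, env=None):
    """Pick the cache directory: explicit argument, then env var, then the default."""
    env = os.environ if env is None else env
    if explicit:
        return Path(explicit)
    from_env = env.get("SKILL_SEEKERS_CACHE_DIR")
    if from_env:
        return Path(from_env)
    return Path.home() / ".skill-seekers" / "cache"


if __name__ == "__main__":
    print(resolve_cache_dir())
```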
|
||||
### SSH URLs
|
||||
|
||||
SSH URLs are automatically converted to HTTPS + token:
|
||||
|
||||
```python
|
||||
# Input
|
||||
add_config_source(
|
||||
name="team",
|
||||
git_url="git@github.com:myorg/configs.git",
|
||||
token_env="GITHUB_TOKEN"
|
||||
)
|
||||
|
||||
# Internally becomes
|
||||
# https://ghp_xxx@github.com/myorg/configs.git
|
||||
```
|
||||
|
||||
### Priority Resolution
|
||||
|
||||
When same config exists in multiple sources:
|
||||
|
||||
```python
|
||||
add_config_source(name="team", ..., priority=1) # Checked first
|
||||
add_config_source(name="company", ..., priority=2) # Checked second
|
||||
# API mode is checked last (priority: infinity)
|
||||
|
||||
fetch_config(config_name="react")
|
||||
# 1. Checks team source
|
||||
# 2. If not found, checks company source
|
||||
# 3. If not found, falls back to API
|
||||
```
|
||||
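The lookup order described above can be modeled with plain data structures. This is a sketch of the resolution rule only: `resolve_config` is a hypothetical function, each source carries an in-memory `configs` set instead of a cloned repository, and skipping disabled sources is an assumption based on the `enabled` flag.

```python
def resolve_config(config_name, sources, api_configs):
    """Return the name of the first source providing config_name, 'api', or None."""
    active = [s for s in sources if s.get("enabled", True)]
    # Lower priority number is checked first.
    for source in sorted(active, key=lambda s: s["priority"]):
        if config_name in source["configs"]:
            return source["name"]
    # The public API acts as the final fallback.
    if config_name in api_configs:
        return "api"
    return None


if __name__ == "__main__":
    sources = [
        {"name": "company", "priority": 2, "configs": {"react", "internal-api"}},
        {"name": "team", "priority": 1, "configs": {"react"}},
    ]
    print(resolve_config("react", sources, {"react", "vue"}))
    print(resolve_config("vue", sources, {"react", "vue"}))
```

In this model a team-level `react` config shadows both the company copy and the public one, which is the override behavior the priority numbers are meant to provide.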
|
||||
### CI/CD Integration
|
||||
|
||||
Use in GitHub Actions:
|
||||
|
||||
```yaml
name: Generate Skills

on: push

jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install Skill Seekers
        run: pip install skill-seekers

      - name: Register config source
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          python3 << EOF
          from skill_seekers.mcp.source_manager import SourceManager
          sm = SourceManager()
          sm.add_source(
              name="team",
              git_url="https://github.com/myorg/configs.git"
          )
          EOF

      - name: Fetch and use config
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Use MCP fetch_config or direct Python
          skill-seekers scrape --config <fetched_config>
```
|
||||
|
||||
---
|
||||
|
||||
## API Reference
|
||||
|
||||
### GitConfigRepo Class
|
||||
|
||||
**Location:** `src/skill_seekers/mcp/git_repo.py`
|
||||
|
||||
**Methods:**
|
||||
|
||||
```python
# Signature summary; method bodies are omitted.
from pathlib import Path
from typing import Optional


class GitConfigRepo:
    def __init__(self, cache_dir: Optional[str] = None):
        """Initialize with optional cache directory."""

    def clone_or_pull(
        self,
        source_name: str,
        git_url: str,
        branch: str = "main",
        token: Optional[str] = None,
        force_refresh: bool = False,
    ) -> Path:
        """Clone if not cached, else pull latest changes."""

    def find_configs(self, repo_path: Path) -> list[Path]:
        """Find all *.json files in repository."""

    def get_config(self, repo_path: Path, config_name: str) -> dict:
        """Load specific config by name."""

    @staticmethod
    def inject_token(git_url: str, token: str) -> str:
        """Inject token into git URL."""

    @staticmethod
    def validate_git_url(git_url: str) -> bool:
        """Validate git URL format."""
```
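To make the two static helpers more concrete, here is a minimal, self-contained sketch of what token injection and URL validation can look like. It is not the code in `git_repo.py`: the names `inject_token_sketch` and `looks_like_git_url`, and the set of URL shapes accepted, are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def inject_token_sketch(git_url: str, token: str) -> str:
    """Return an HTTPS git URL with the token placed in the userinfo part.

    Hypothetical helper: non-HTTPS URLs (for example SSH) are returned unchanged.
    """
    parts = urlsplit(git_url)
    if parts.scheme != "https" or not parts.hostname:
        return git_url
    host = parts.hostname
    if parts.port:
        host = f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, f"{token}@{host}", parts.path, parts.query, parts.fragment)
    )


def looks_like_git_url(git_url: str) -> bool:
    """Accept https:// URLs with a host and path, or scp-style git@host:path URLs."""
    if git_url.startswith("git@"):
        return ":" in git_url[len("git@"):]
    parts = urlsplit(git_url)
    return parts.scheme == "https" and bool(parts.hostname) and len(parts.path) > 1
```

With this sketch, `inject_token_sketch("https://github.com/myorg/configs.git", "TOKEN")` yields `https://TOKEN@github.com/myorg/configs.git`, while an SSH-style URL passes through untouched.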

### SourceManager Class

**Location:** `src/skill_seekers/mcp/source_manager.py`

**Methods:**

```python
def __init__(self, config_dir: Optional[str] = None):
    """Initialize with optional config directory."""

def add_source(
    self,
    name: str,
    git_url: str,
    source_type: str = "github",
    token_env: Optional[str] = None,
    branch: str = "main",
    priority: int = 100,
    enabled: bool = True
) -> dict:
    """Add or update config source."""

def get_source(self, name: str) -> dict:
    """Get source by name."""

def list_sources(self, enabled_only: bool = False) -> list[dict]:
    """List all sources."""

def remove_source(self, name: str) -> bool:
    """Remove source."""

def update_source(self, name: str, **kwargs) -> dict:
    """Update specific fields."""
```
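To illustrate how `priority` and `enabled` might interact when several sources are registered, the sketch below models source records as plain dictionaries using the field names from `add_source`. It does not import `SourceManager`; the helper `resolution_order`, the example URLs, and the assumption that lower `priority` values are tried first are illustrative rather than confirmed behavior.

```python
def resolution_order(sources: list[dict], enabled_only: bool = True) -> list[str]:
    """Return source names in the order they would be tried.

    Assumption for this sketch: a lower ``priority`` number wins, and ties
    are broken alphabetically by name so the order is deterministic.
    """
    candidates = [s for s in sources if s.get("enabled", True) or not enabled_only]
    ordered = sorted(candidates, key=lambda s: (s.get("priority", 100), s["name"]))
    return [s["name"] for s in ordered]


# Placeholder records; only "team" and its URL appear in the CI example above.
sources = [
    {"name": "team", "git_url": "https://github.com/myorg/configs.git", "priority": 1},
    {"name": "community", "git_url": "https://example.com/community.git", "priority": 100},
    {"name": "legacy", "git_url": "https://example.com/legacy.git", "priority": 50, "enabled": False},
]
print(resolution_order(sources))  # ['team', 'community']
```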

---

## See Also

- [README.md](../README.md) - Main documentation
- [MCP_SETUP.md](MCP_SETUP.md) - MCP server setup
- [UNIFIED_SCRAPING.md](UNIFIED_SCRAPING.md) - Multi-source scraping
- [configs/example-team/](../configs/example-team/) - Example repository

---

## Changelog

### v2.2.0 (2025-12-21)
- Initial release of git-based config sources
- 3 fetch modes: API, Git URL, Named Source
- 4 MCP tools: add/list/remove/fetch
- Support for GitHub, GitLab, Bitbucket, Gitea
- Shallow clone optimization
- Priority-based resolution
- 83 tests (100% passing)

---

**Questions?** Open an issue at https://github.com/yusufkaraaslan/Skill_Seekers/issues

431
docs/reference/LARGE_DOCUMENTATION.md
Normal file
@@ -0,0 +1,431 @@
# Handling Large Documentation Sites (10K+ Pages)

Complete guide for scraping and managing large documentation sites with Skill Seekers.

---

## Table of Contents

- [When to Split Documentation](#when-to-split-documentation)
- [Split Strategies](#split-strategies)
- [Quick Start](#quick-start)
- [Detailed Workflows](#detailed-workflows)
- [Best Practices](#best-practices)
- [Examples](#examples)
- [Troubleshooting](#troubleshooting)

---

## When to Split Documentation

### Size Guidelines

| Documentation Size | Recommendation | Strategy |
|-------------------|----------------|----------|
| < 5,000 pages | **One skill** | No splitting needed |
| 5,000 - 10,000 pages | **Consider splitting** | Category-based |
| 10,000 - 30,000 pages | **Recommended** | Router + Categories |
| 30,000+ pages | **Strongly recommended** | Router + Categories |

### Why Split Large Documentation?

**Benefits:**
- ✅ Faster scraping (parallel execution)
- ✅ More focused skills (better Claude performance)
- ✅ Easier maintenance (update one topic at a time)
- ✅ Better user experience (precise answers)
- ✅ Avoids context window limits

**Trade-offs:**
- ⚠️ Multiple skills to manage
- ⚠️ Initial setup more complex
- ⚠️ Router adds one extra skill
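The size guidelines table above can be read as a simple lookup. The sketch below encodes it as a function; `recommend_split` is a hypothetical helper (not part of the Skill Seekers CLI), and assigning a count that sits exactly on a boundary to the larger bracket is an assumption, since the table's ranges overlap at 5,000 and 10,000.

```python
def recommend_split(page_count: int) -> tuple[str, str]:
    """Map an estimated page count to (recommendation, strategy).

    Thresholds follow the size guidelines table; a count exactly on a
    boundary is assigned to the larger bracket (an assumption).
    """
    if page_count < 5_000:
        return ("One skill", "No splitting needed")
    if page_count < 10_000:
        return ("Consider splitting", "Category-based")
    if page_count < 30_000:
        return ("Recommended", "Router + Categories")
    return ("Strongly recommended", "Router + Categories")
```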

---

## Split Strategies

### 1. **No Split** (One Big Skill)
**Best for:** Small to medium documentation (< 5K pages)

```bash
# Just use the config as-is
python3 cli/doc_scraper.py --config configs/react.json
```

**Pros:** Simple, one skill to maintain
**Cons:** Can be slow for large docs, may hit limits

---

### 2. **Category Split** (Multiple Focused Skills)
**Best for:** 5K-15K pages with clear topic divisions

```bash
# Auto-split by categories
python3 cli/split_config.py configs/godot.json --strategy category

# Creates:
# - godot-scripting.json
# - godot-2d.json
# - godot-3d.json
# - godot-physics.json
# - etc.
```

**Pros:** Focused skills, clear separation
**Cons:** User must know which skill to use

---

### 3. **Router + Categories** (Intelligent Hub) ⭐ RECOMMENDED
**Best for:** 10K+ pages, best user experience

```bash
# Create router + sub-skills
python3 cli/split_config.py configs/godot.json --strategy router

# Creates:
# - godot.json (router/hub)
# - godot-scripting.json
# - godot-2d.json
# - etc.
```

**Pros:** Best of both worlds, intelligent routing, natural UX
**Cons:** Slightly more complex setup

---

### 4. **Size-Based Split**
**Best for:** Docs without clear categories

```bash
# Split every 5000 pages
python3 cli/split_config.py configs/bigdocs.json --strategy size --target-pages 5000

# Creates:
# - bigdocs-part1.json
# - bigdocs-part2.json
# - bigdocs-part3.json
# - etc.
```

**Pros:** Simple, predictable
**Cons:** May split related topics
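As a rough picture of what a size-based split does, the following sketch chunks a flat list of page URLs into consecutive parts and names them in the `-partN` style shown above. It is an illustration only, not how `split_config.py` is implemented; `split_by_size` and the placeholder URLs are assumptions.

```python
def split_by_size(name: str, urls: list[str], target_pages: int) -> dict[str, list[str]]:
    """Chunk a flat URL list into consecutive parts of at most target_pages."""
    if target_pages <= 0:
        raise ValueError("target_pages must be positive")
    parts = {}
    for start in range(0, len(urls), target_pages):
        part_number = start // target_pages + 1
        parts[f"{name}-part{part_number}"] = urls[start:start + target_pages]
    return parts


# 12,000 placeholder URLs split every 5,000 pages gives three parts.
urls = [f"https://example.com/docs/page-{i}" for i in range(12_000)]
parts = split_by_size("bigdocs", urls, target_pages=5_000)
print({part: len(pages) for part, pages in parts.items()})
# {'bigdocs-part1': 5000, 'bigdocs-part2': 5000, 'bigdocs-part3': 2000}
```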

---

## Quick Start

### Option 1: Automatic (Recommended)

```bash
# 1. Create config
python3 cli/doc_scraper.py --interactive
# Name: godot
# URL: https://docs.godotengine.org
# ... fill in prompts ...

# 2. Estimate pages (discovers it's large)
python3 cli/estimate_pages.py configs/godot.json
# Output: ⚠️ 40,000 pages detected - splitting recommended

# 3. Auto-split with router
python3 cli/split_config.py configs/godot.json --strategy router

# 4. Scrape all sub-skills
for config in configs/godot-*.json; do
  python3 cli/doc_scraper.py --config "$config" &
done
wait

# 5. Generate router
python3 cli/generate_router.py configs/godot-*.json

# 6. Package all
python3 cli/package_multi.py output/godot*/

# 7. Upload all .zip files to Claude
```

---

### Option 2: Manual Control

```bash
# 1. Define split in config
nano configs/godot.json

# Add:
{
  "split_strategy": "router",
  "split_config": {
    "target_pages_per_skill": 5000,
    "create_router": true,
    "split_by_categories": ["scripting", "2d", "3d", "physics"]
  }
}

# 2. Split
python3 cli/split_config.py configs/godot.json

# 3. Continue as above...
```

---

## Detailed Workflows

### Workflow 1: Router + Categories (40K Pages)

**Scenario:** Godot documentation (40,000 pages)

**Step 1: Estimate**
```bash
python3 cli/estimate_pages.py configs/godot.json

# Output:
# Estimated: 40,000 pages
# Recommended: Split into 8 skills (5K each)
```

**Step 2: Split Configuration**
```bash
python3 cli/split_config.py configs/godot.json --strategy router --target-pages 5000

# Creates:
# configs/godot.json (router)
# configs/godot-scripting.json (5K pages)
# configs/godot-2d.json (8K pages)
# configs/godot-3d.json (10K pages)
# configs/godot-physics.json (6K pages)
# configs/godot-shaders.json (11K pages)
```

**Step 3: Scrape Sub-Skills (Parallel)**
```bash
# Open multiple terminals or use background jobs
python3 cli/doc_scraper.py --config configs/godot-scripting.json &
python3 cli/doc_scraper.py --config configs/godot-2d.json &
python3 cli/doc_scraper.py --config configs/godot-3d.json &
python3 cli/doc_scraper.py --config configs/godot-physics.json &
python3 cli/doc_scraper.py --config configs/godot-shaders.json &

# Wait for all to complete
wait

# Time: 4-8 hours (parallel) vs 20-40 hours (sequential)
```

**Step 4: Generate Router**
```bash
python3 cli/generate_router.py configs/godot-*.json

# Creates:
# output/godot/SKILL.md (router skill)
```

**Step 5: Package All**
```bash
python3 cli/package_multi.py output/godot*/

# Creates:
# output/godot.zip (router)
# output/godot-scripting.zip
# output/godot-2d.zip
# output/godot-3d.zip
# output/godot-physics.zip
# output/godot-shaders.zip
```

**Step 6: Upload to Claude**
Upload all 6 .zip files to Claude. The router will intelligently direct queries to the right sub-skill!

---

### Workflow 2: Category Split Only (15K Pages)

**Scenario:** Vue.js documentation (15,000 pages)

**No router needed - just focused skills:**

```bash
# 1. Split
python3 cli/split_config.py configs/vue.json --strategy category

# 2. Scrape each
for config in configs/vue-*.json; do
  python3 cli/doc_scraper.py --config "$config"
done

# 3. Package
python3 cli/package_multi.py output/vue*/

# 4. Upload all to Claude
```

**Result:** 5 focused Vue skills (components, reactivity, routing, etc.)

---

## Best Practices

### 1. **Choose Target Size Wisely**

```bash
# Small focused skills (3K-5K pages) - more skills, very focused
python3 cli/split_config.py config.json --target-pages 3000

# Medium skills (5K-8K pages) - balanced (RECOMMENDED)
python3 cli/split_config.py config.json --target-pages 5000

# Larger skills (8K-10K pages) - fewer skills, broader
python3 cli/split_config.py config.json --target-pages 8000
```

### 2. **Use Parallel Scraping**

```bash
# Serial (slow - 40 hours)
for config in configs/godot-*.json; do
  python3 cli/doc_scraper.py --config "$config"
done

# Parallel (fast - 8 hours) ⭐
for config in configs/godot-*.json; do
  python3 cli/doc_scraper.py --config "$config" &
done
wait
```

### 3. **Test Before Full Scrape**

```bash
# Test with limited pages first
nano configs/godot-2d.json
# Set: "max_pages": 50

python3 cli/doc_scraper.py --config configs/godot-2d.json

# If output looks good, increase to full
```

### 4. **Use Checkpoints for Long Scrapes**

```bash
# Enable checkpoints in config
{
  "checkpoint": {
    "enabled": true,
    "interval": 1000
  }
}

# If scrape fails, resume
python3 cli/doc_scraper.py --config config.json --resume
```
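The idea behind checkpoint-and-resume can be sketched in a few lines: record which pages are done every `interval` pages, and skip them on the next run. This is a simplified illustration, not the scraper's actual checkpoint format; `scrape_with_checkpoints`, the `fetch` callable, and the JSON layout are all assumptions.

```python
import json
from pathlib import Path


def scrape_with_checkpoints(urls, fetch, checkpoint_path, interval=1000):
    """Process urls in order, saving progress every `interval` pages.

    `fetch` is any callable taking a URL; it stands in for the real scraper.
    On restart, URLs already recorded in the checkpoint file are skipped.
    """
    path = Path(checkpoint_path)
    done = set(json.loads(path.read_text())["done"]) if path.exists() else set()
    since_save = 0
    for url in urls:
        if url in done:
            continue
        fetch(url)
        done.add(url)
        since_save += 1
        if since_save >= interval:
            # Persist progress so a crash loses at most `interval` pages.
            path.write_text(json.dumps({"done": sorted(done)}))
            since_save = 0
    path.write_text(json.dumps({"done": sorted(done)}))
    return done
```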

---

## Examples

### Example 1: AWS Documentation (Hypothetical 50K Pages)

```bash
# 1. Split by AWS services
python3 cli/split_config.py configs/aws.json --strategy router --target-pages 5000

# Creates ~10 skills:
# - aws (router)
# - aws-compute (EC2, Lambda)
# - aws-storage (S3, EBS)
# - aws-database (RDS, DynamoDB)
# - etc.

# 2. Scrape in parallel (overnight)
# 3. Upload all skills to Claude
# 4. User asks "How do I create an S3 bucket?"
# 5. Router activates aws-storage skill
# 6. Focused, accurate answer!
```

### Example 2: Microsoft Docs (100K+ Pages)

```bash
# Too large even with splitting - use selective categories

# Only scrape key topics
python3 cli/split_config.py configs/microsoft.json --strategy category

# Edit configs to include only:
# - microsoft-azure (Azure docs only)
# - microsoft-dotnet (.NET docs only)
# - microsoft-typescript (TS docs only)

# Skip less relevant sections
```

---

## Troubleshooting

### Issue: "Splitting creates too many skills"

**Solution:** Increase target size or combine categories

```bash
# Instead of 5K per skill, use 8K
python3 cli/split_config.py config.json --target-pages 8000

# Or manually combine categories in config
```

### Issue: "Router not routing correctly"

**Solution:** Check routing keywords in router SKILL.md

```bash
# Review router
cat output/godot/SKILL.md

# Update keywords if needed
nano output/godot/SKILL.md
```

### Issue: "Parallel scraping fails"

**Solution:** Reduce parallelism or check rate limits

```bash
# Scrape 2-3 at a time instead of all
python3 cli/doc_scraper.py --config config1.json &
python3 cli/doc_scraper.py --config config2.json &
wait

python3 cli/doc_scraper.py --config config3.json &
python3 cli/doc_scraper.py --config config4.json &
wait
```
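The "two at a time" batching above can also be expressed as a bounded worker pool, which starts the next scrape as soon as a slot frees up instead of waiting for a whole batch. The sketch below is one possible way to do that with the standard library; `run_bounded` is a hypothetical helper, and the config file names in the comment are placeholders.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_bounded(commands: list[list[str]], max_workers: int = 2) -> list[int]:
    """Run each command as a subprocess, at most max_workers at a time.

    Returns the exit codes in the same order as `commands`.
    """
    def run_one(cmd: list[str]) -> int:
        return subprocess.run(cmd, check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


# Example (placeholder config names):
# configs = ["config1.json", "config2.json", "config3.json", "config4.json"]
# commands = [["python3", "cli/doc_scraper.py", "--config", c] for c in configs]
# exit_codes = run_bounded(commands, max_workers=2)
```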

---

## Summary

**For 40K+ Page Documentation:**

1. ✅ **Estimate first**: `python3 cli/estimate_pages.py config.json`
2. ✅ **Split with router**: `python3 cli/split_config.py config.json --strategy router`
3. ✅ **Scrape in parallel**: Multiple terminals or background jobs
4. ✅ **Generate router**: `python3 cli/generate_router.py configs/*-*.json`
5. ✅ **Package all**: `python3 cli/package_multi.py output/*/`
6. ✅ **Upload to Claude**: All .zip files

**Result:** Intelligent, fast, focused skills that work seamlessly together!

---

**Questions? See:**
- [Main README](../README.md)
- [MCP Setup Guide](MCP_SETUP.md)
- [Enhancement Guide](ENHANCEMENT.md)

60
docs/reference/LLMS_TXT_SUPPORT.md
Normal file
@@ -0,0 +1,60 @@
# llms.txt Support

## Overview

Skill_Seekers now automatically detects and uses llms.txt files when available, which can make documentation ingestion roughly 10x faster than HTML scraping.

## What is llms.txt?

The llms.txt convention is an emerging standard in which documentation sites publish pre-formatted, LLM-ready markdown files:

- `llms-full.txt` - Complete documentation
- `llms.txt` - Standard balanced version
- `llms-small.txt` - Quick reference

## How It Works

1. Before HTML scraping, Skill_Seekers checks for llms.txt files
2. If found, downloads and parses the markdown
3. If not found, falls back to HTML scraping
4. Zero config changes needed
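The detection steps above can be sketched as a small probe function. This is an illustration under stated assumptions, not the Skill_Seekers implementation: `find_llms_txt` and the injected `fetch` callable are hypothetical, and both the probe order (most complete variant first) and the choice to look at the site root are assumptions.

```python
from typing import Callable, Optional
from urllib.parse import urljoin

# Variant order is an assumption for this sketch: most complete file first.
CANDIDATES = ("llms-full.txt", "llms.txt", "llms-small.txt")


def find_llms_txt(
    base_url: str, fetch: Callable[[str], Optional[str]]
) -> Optional[tuple[str, str]]:
    """Return (url, text) for the first llms.txt variant that fetch() can load.

    `fetch` returns the body as text, or None when the file is missing.
    A None result means the caller should fall back to HTML scraping.
    """
    root = urljoin(base_url, "/")  # assume the files live at the site root
    for name in CANDIDATES:
        url = urljoin(root, name)
        text = fetch(url)
        if text:
            return url, text
    return None
```

Passing `fetch` in as a parameter keeps the sketch testable offline; in practice it would wrap an HTTP GET that returns `None` on a 404 or network error.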

## Configuration

### Automatic Detection (Recommended)

No config changes needed. Just run normally:

```bash
python3 cli/doc_scraper.py --config configs/hono.json
```

### Explicit URL

Optionally specify the llms.txt URL:

```json
{
  "name": "hono",
  "llms_txt_url": "https://hono.dev/llms-full.txt",
  "base_url": "https://hono.dev/docs"
}
```

## Performance Comparison

| Method | Time | Requests |
|--------|------|----------|
| HTML Scraping (20 pages) | 20-60s | 20+ |
| llms.txt | < 5s | 1 |

## Supported Sites

Sites known to provide llms.txt:

- Hono: https://hono.dev/llms-full.txt
- (More to be discovered)

## Fallback Behavior

If the llms.txt download or parsing fails, Skill_Seekers automatically falls back to HTML scraping with no user intervention required.

930
docs/reference/SKILL_ARCHITECTURE.md
Normal file
@@ -0,0 +1,930 @@
# Skill Architecture Guide: Layering and Splitting

Complete guide for architecting complex multi-skill systems using the router/dispatcher pattern.

---

## Table of Contents

- [Overview](#overview)
- [When to Split Skills](#when-to-split-skills)
- [The Router Pattern](#the-router-pattern)
- [Manual Skill Architecture](#manual-skill-architecture)
- [Best Practices](#best-practices)
- [Complete Examples](#complete-examples)
- [Implementation Guide](#implementation-guide)
- [Troubleshooting](#troubleshooting)

---

## Overview

### The 500-Line Guideline

Claude recommends keeping skill files under **500 lines** for optimal performance. This guideline exists because:

- ✅ **Better parsing** - AI can more effectively understand focused content
- ✅ **Context efficiency** - Only relevant information loaded per task
- ✅ **Maintainability** - Easier to debug, update, and manage
- ✅ **Single responsibility** - Each skill does one thing well

### The Problem with Monolithic Skills

As applications grow complex, developers often create skills that:

- ❌ **Exceed 500 lines** - Too much information for effective parsing
- ❌ **Mix concerns** - Handle multiple unrelated responsibilities
- ❌ **Waste context** - Load the entire file even when only a small portion is relevant
- ❌ **Are hard to maintain** - Changes require careful navigation of a large file

### The Solution: Skill Layering

**Skill layering** involves:

1. **Splitting** - Breaking a large skill into focused sub-skills
2. **Routing** - Creating a master skill that directs queries to the appropriate sub-skill
3. **Loading** - Only activating relevant sub-skills per task

**Result:** Build sophisticated applications while keeping each skill within the 500-line guideline.

---

## When to Split Skills

### Decision Matrix

| Skill Size | Complexity | Recommendation |
|-----------|-----------|----------------|
| < 500 lines | Single concern | ✅ **Keep monolithic** |
| 500-1000 lines | Related concerns | ⚠️ **Consider splitting** |
| 1000+ lines | Multiple concerns | ❌ **Must split** |

### Split Indicators

**You should split when:**

- ✅ Skill exceeds 500 lines
- ✅ Multiple distinct responsibilities (CRUD, workflows, etc.)
- ✅ Different team members maintain different sections
- ✅ Only portions are relevant to specific tasks
- ✅ Context window frequently exceeded

**You can keep monolithic when:**

- ✅ Under 500 lines
- ✅ Single, cohesive responsibility
- ✅ All content frequently relevant together
- ✅ Simple, focused use case
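The first split indicator, a skill exceeding 500 lines, is easy to check mechanically. The sketch below scans a directory of Markdown skill files and reports the ones over the limit; `oversized_skills` is a hypothetical helper, and the flat `*.md` layout is an assumption about how the skills are stored.

```python
from pathlib import Path


def oversized_skills(skills_dir: str, limit: int = 500) -> dict[str, int]:
    """Return {file name: line count} for .md files longer than `limit` lines."""
    report = {}
    for path in sorted(Path(skills_dir).glob("*.md")):
        line_count = len(path.read_text(encoding="utf-8").splitlines())
        if line_count > limit:
            report[path.name] = line_count
    return report
```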

---

## The Router Pattern

### What is a Router Skill?

A **router skill** (also called **dispatcher** or **hub** skill) is a lightweight master skill that:

1. **Analyzes** the user's query
2. **Identifies** which sub-skill(s) are relevant
3. **Directs** Claude to activate the appropriate sub-skill(s)
4. **Coordinates** responses from multiple sub-skills if needed

### How It Works

```
User Query: "How do I book a flight to Paris?"
↓
Router Skill: Analyzes keywords → "flight", "book"
↓
Activates: flight_booking sub-skill
↓
Response: Flight booking guidance (only this skill loaded)
```

### Router Skill Structure

```markdown
# Travel Planner (Router)

## When to Use This Skill

Use for travel planning, booking, and itinerary management.

This is a router skill that directs your questions to specialized sub-skills.

## Sub-Skills Available

### flight_booking
For booking flights, searching airlines, comparing prices, seat selection.
**Keywords:** flight, airline, booking, ticket, departure, arrival

### hotel_reservation
For hotel search, room booking, amenities, check-in/check-out.
**Keywords:** hotel, accommodation, room, reservation, stay

### itinerary_generation
For creating travel plans, scheduling activities, route optimization.
**Keywords:** itinerary, schedule, plan, activities, route

## Routing Logic

Based on your question keywords:
- Flight-related → Activate `flight_booking`
- Hotel-related → Activate `hotel_reservation`
- Planning-related → Activate `itinerary_generation`
- Multiple topics → Activate relevant combination

## Usage Examples

**"Find me a flight to Paris"** → flight_booking
**"Book hotel in Tokyo"** → hotel_reservation
**"Create 5-day Rome itinerary"** → itinerary_generation
**"Plan Paris trip with flights and hotel"** → flight_booking + hotel_reservation + itinerary_generation
```
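In practice Claude reads the router skill and decides which sub-skill to activate, so no routing code is required. Still, a small keyword matcher is a handy way to sanity-check a keyword list against the usage examples. The sketch below mimics the routing logic using the keywords from the Travel Planner router above; the `route` function and its crude plural handling are assumptions made for illustration.

```python
import re

# Keywords copied from the Travel Planner router example above.
ROUTES = {
    "flight_booking": ["flight", "airline", "booking", "ticket", "departure", "arrival"],
    "hotel_reservation": ["hotel", "accommodation", "room", "reservation", "stay"],
    "itinerary_generation": ["itinerary", "schedule", "plan", "activities", "route"],
}


def _normalize(word: str) -> str:
    """Lowercase and drop one trailing 's' so 'flights' matches 'flight'."""
    word = word.lower()
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def route(query: str, routes: dict[str, list[str]] = ROUTES) -> list[str]:
    """Return the sub-skills whose keywords appear in the query, in table order."""
    tokens = {_normalize(t) for t in re.findall(r"[a-z0-9]+", query.lower())}
    return [
        skill
        for skill, keywords in routes.items()
        if tokens & {_normalize(k) for k in keywords}
    ]


print(route("Find me a flight to Paris"))
# ['flight_booking']
print(route("Plan Paris trip with flights and hotel"))
# ['flight_booking', 'hotel_reservation', 'itinerary_generation']
```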

---

## Manual Skill Architecture

### Example 1: E-Commerce Platform

**Problem:** E-commerce skill is 2000+ lines covering catalog, cart, checkout, orders, and admin.

**Solution:** Split into focused sub-skills with router.

#### Sub-Skills

**1. `ecommerce.md` (Router - 150 lines)**
```markdown
# E-Commerce Platform (Router)

## Sub-Skills
- product_catalog - Browse, search, filter products
- shopping_cart - Add/remove items, quantities
- checkout_payment - Process orders, payments
- order_management - Track orders, returns
- admin_tools - Inventory, analytics

## Routing
product/catalog/search → product_catalog
cart/basket/add/remove → shopping_cart
checkout/payment/billing → checkout_payment
order/track/return → order_management
admin/inventory/analytics → admin_tools
```

**2. `product_catalog.md` (350 lines)**
```markdown
# Product Catalog

## When to Use
Product browsing, searching, filtering, recommendations.

## Quick Reference
- Search products: `search(query, filters)`
- Get details: `getProduct(id)`
- Filter: `filter(category, price, brand)`
...
```

**3. `shopping_cart.md` (280 lines)**
```markdown
# Shopping Cart

## When to Use
Managing cart items, quantities, totals.

## Quick Reference
- Add item: `cart.add(productId, quantity)`
- Update quantity: `cart.update(itemId, quantity)`
...
```

**Result:**
- Router: 150 lines ✅
- Each sub-skill: 200-400 lines ✅
- Total functionality: Unchanged
- Context efficiency: 5x improvement

---

### Example 2: Code Assistant

**Problem:** Code assistant handles debugging, refactoring, documentation, testing - 1800+ lines.

**Solution:** Specialized sub-skills with smart routing.

#### Architecture

```
code_assistant.md (Router - 200 lines)
├── debugging.md (450 lines)
├── refactoring.md (380 lines)
├── documentation.md (320 lines)
└── testing.md (400 lines)
```

#### Router Logic

```markdown
# Code Assistant (Router)

## Routing Keywords

### debugging
error, bug, exception, crash, fix, troubleshoot, debug

### refactoring
refactor, clean, optimize, simplify, restructure, improve

### documentation
docs, comment, docstring, readme, api, explain

### testing
test, unit, integration, coverage, assert, mock
```

---

### Example 3: Data Pipeline

**Problem:** ETL pipeline skill covers extraction, transformation, loading, validation, monitoring.

**Solution:** Pipeline stages as sub-skills.

```
data_pipeline.md (Router)
├── data_extraction.md - Source connectors, API calls
├── data_transformation.md - Cleaning, mapping, enrichment
├── data_loading.md - Database writes, file exports
├── data_validation.md - Quality checks, error handling
└── pipeline_monitoring.md - Logging, alerts, metrics
```

---

## Best Practices

### 1. Single Responsibility Principle

**Each sub-skill should have ONE clear purpose.**

❌ **Bad:** `user_management.md` handles auth, profiles, permissions, notifications
✅ **Good:**
- `user_authentication.md` - Login, logout, sessions
- `user_profiles.md` - Profile CRUD
- `user_permissions.md` - Roles, access control
- `user_notifications.md` - Email, push, alerts

### 2. Clear Routing Keywords

**Make routing keywords explicit and unambiguous.**

❌ **Bad:** Vague keywords like "data", "user", "process"
✅ **Good:** Specific keywords like "login", "authenticate", "extract", "transform"
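One way to catch ambiguous keywords early is to look for terms claimed by more than one sub-skill. The sketch below does that for a keyword map; `shared_keywords` is a hypothetical lint helper, and apart from "login", "authenticate", "extract", and "user", the sample keywords are made up for the example.

```python
from collections import defaultdict


def shared_keywords(routes: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return keywords claimed by more than one sub-skill, with their owners."""
    owners = defaultdict(list)
    for skill, keywords in routes.items():
        for keyword in {k.lower() for k in keywords}:
            owners[keyword].append(skill)
    return {k: sorted(v) for k, v in owners.items() if len(v) > 1}


routes = {
    "user_authentication": ["login", "authenticate", "session", "user"],
    "user_profiles": ["profile", "avatar", "user"],
    "data_extraction": ["extract", "connector"],
}
print(shared_keywords(routes))
# {'user': ['user_authentication', 'user_profiles']}
```

Here the vague keyword "user" is flagged because two sub-skills claim it, which is the kind of overlap that makes routing unpredictable.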

### 3. Minimize Router Complexity

**Keep the router lightweight - just routing logic.**

❌ **Bad:** Router contains actual implementation code
✅ **Good:** Router only contains:
- Sub-skill descriptions
- Routing keywords
- Usage examples
- No implementation details

### 4. Logical Grouping

**Group by responsibility, not by code structure.**

❌ **Bad:** Split by file type (controllers, models, views)
✅ **Good:** Split by feature (user_auth, product_catalog, order_processing)

### 5. Avoid Over-Splitting

**Don't create sub-skills for trivial distinctions.**

❌ **Bad:** Separate skills for "add_user" and "update_user"
✅ **Good:** Single "user_management" skill covering all CRUD

### 6. Document Dependencies

**Explicitly state when sub-skills work together.**

```markdown
## Multi-Skill Operations

**Place order:** Requires coordination between:
1. product_catalog - Validate product availability
2. shopping_cart - Get cart contents
3. checkout_payment - Process payment
4. order_management - Create order record
```

### 7. Maintain Consistent Structure

**Use the same SKILL.md structure across all sub-skills.**

Standard sections:
```markdown
# Skill Name

## When to Use This Skill
[Clear description]

## Quick Reference
[Common operations]

## Key Concepts
[Domain terminology]

## Working with This Skill
[Usage guidance]

## Reference Files
[Documentation organization]
```

---

## Complete Examples

### Travel Planner (Full Implementation)

#### Directory Structure

```
skills/
├── travel_planner.md (Router - 180 lines)
├── flight_booking.md (420 lines)
├── hotel_reservation.md (380 lines)
├── itinerary_generation.md (450 lines)
├── travel_insurance.md (290 lines)
└── budget_tracking.md (340 lines)
```

#### travel_planner.md (Router)

```markdown
---
name: travel_planner
description: Travel planning, booking, and itinerary management router
---

# Travel Planner (Router)

## When to Use This Skill

Use for all travel-related planning, bookings, and itinerary management.

This router skill analyzes your travel needs and activates specialized sub-skills.

## Available Sub-Skills

### flight_booking
**Purpose:** Flight search, booking, seat selection, airline comparisons
**Keywords:** flight, airline, plane, ticket, departure, arrival, airport, booking
**Use for:** Finding and booking flights, comparing prices, selecting seats

### hotel_reservation
**Purpose:** Hotel search, room booking, amenities, check-in/out
**Keywords:** hotel, accommodation, room, lodging, reservation, stay, check-in
**Use for:** Finding hotels, booking rooms, checking amenities

### itinerary_generation
**Purpose:** Travel planning, scheduling, route optimization
**Keywords:** itinerary, schedule, plan, route, activities, sightseeing
**Use for:** Creating day-by-day plans, organizing activities

### travel_insurance
**Purpose:** Travel insurance options, coverage, claims
**Keywords:** insurance, coverage, protection, medical, cancellation, claim
**Use for:** Insurance recommendations, comparing policies

### budget_tracking
**Purpose:** Travel budget planning, expense tracking
**Keywords:** budget, cost, expense, price, spending, money
**Use for:** Estimating costs, tracking expenses

## Routing Logic

The router analyzes your question and activates relevant skills:

| Query Pattern | Activated Skills |
|--------------|------------------|
| "Find flights to [destination]" | flight_booking |
| "Book hotel in [city]" | hotel_reservation |
| "Plan [duration] trip to [destination]" | itinerary_generation |
| "Need travel insurance" | travel_insurance |
| "How much will trip cost?" | budget_tracking |
| "Plan complete Paris vacation" | ALL (coordinated) |

## Multi-Skill Coordination

Some requests require multiple skills working together:

### Complete Trip Planning
1. **budget_tracking** - Set budget constraints
2. **flight_booking** - Find flights within budget
3. **hotel_reservation** - Book accommodation
4. **itinerary_generation** - Create daily schedule
5. **travel_insurance** - Recommend coverage

### Booking Modification
1. **flight_booking** - Check flight change fees
2. **hotel_reservation** - Verify cancellation policy
3. **budget_tracking** - Calculate cost impact

## Usage Examples

**Simple (single skill):**
- "Find direct flights to Tokyo" → flight_booking
- "5-star hotels in Paris under $200/night" → hotel_reservation
- "Create 3-day Rome itinerary" → itinerary_generation

**Complex (multiple skills):**
- "Plan week-long Paris trip for 2, budget $3000" → budget_tracking → flight_booking → hotel_reservation → itinerary_generation
- "Cheapest way to visit London next month" → budget_tracking + flight_booking + hotel_reservation

## Quick Reference

### Flight Booking
- Search flights by route, dates, airline
- Compare prices across carriers
- Select seats, meals, baggage

### Hotel Reservation
- Filter by price, rating, amenities
- Check availability, reviews
- Book rooms with cancellation policy

### Itinerary Planning
- Generate day-by-day schedules
- Optimize routes between attractions
- Balance activities with free time

### Travel Insurance
- Compare coverage options
- Understand medical, cancellation policies
- File claims if needed

### Budget Tracking
- Estimate total trip cost
- Track expenses vs budget
- Optimize spending

## Working with This Skill

**Beginners:** Start with single-purpose queries ("Find flights to Paris")
**Intermediate:** Combine 2-3 aspects ("Find flights and hotel in Tokyo")
**Advanced:** Request complete trip planning with multiple constraints

The router handles complexity automatically - just ask naturally!
```

#### flight_booking.md (Sub-Skill)

```markdown
---
name: flight_booking
description: Flight search, booking, and airline comparisons
---

# Flight Booking

## When to Use This Skill

Use when searching for flights, comparing airlines, booking tickets, or managing flight reservations.

## Quick Reference

### Searching Flights

**Search by route:**
```
Find flights from [origin] to [destination]
Examples:
- "Flights from NYC to London"
- "JFK to Heathrow direct flights"
```

**Search with dates:**
```
Flights from [origin] to [destination] on [date]
Examples:
- "Flights from LAX to Paris on June 15"
- "Return flights NYC to Tokyo, depart May 1, return May 15"
```

**Filter by preferences:**
```
[direct/nonstop] flights from [origin] to [destination]
[airline] flights to [destination]
Cheapest/fastest flights to [destination]

Examples:
- "Direct flights from Boston to Dublin"
- "Delta flights to Seattle"
- "Cheapest flights to Miami next month"
```

### Booking Process

1. **Search** - Find flights matching criteria
2. **Compare** - Review prices, times, airlines
3. **Select** - Choose specific flight
4. **Customize** - Add seat, baggage, meals
5. **Confirm** - Book and receive confirmation

### Price Comparison

Compare across:
- Airlines (Delta, United, American, etc.)
- Booking sites (Expedia, Kayak, etc.)
- Direct vs connections
- Dates (flexible date search)
- Classes (Economy, Business, First)

### Seat Selection

Options:
- Window, aisle, middle
- Extra legroom
- Bulkhead, exit row
- Section preferences (front, middle, rear)

## Key Concepts

### Flight Types
- **Direct** - One flight number throughout; may include a stop, usually without changing planes
- **Nonstop** - No stops between origin and destination
- **Connecting** - One or more stops, change planes
- **Multi-city** - Itinerary with several destinations booked as separate legs
- **Open-jaw** - Round trip that returns from a different city than the one you arrived in (or back to a different city than the origin)

### Fare Classes
- **Basic Economy** - Cheapest, most restrictions
- **Economy** - Standard coach
- **Premium Economy** - Extra space, amenities
- **Business** - Premium service, often lie-flat seats on long-haul routes
- **First Class** - Maximum luxury

### Booking Terms
- **Fare rules** - Cancellation, change policies
- **Baggage allowance** - Checked and carry-on limits
- **Layover** - Time between connecting flights
- **Codeshare** - One flight sold under several airlines' flight numbers

## Working with This Skill

### For Beginners
Start with simple searches:
1. State origin and destination
2. Provide travel dates
3. Mention any preferences (direct, airline)

The skill will guide you through options step-by-step.

### For Intermediate Users
Provide more details upfront:
- Preferred airlines or alliances
- Class of service
- Maximum connections
- Price range
- Specific times of day

### For Advanced Users
Complex multi-city routing:
- Multiple destinations
- Open-jaw bookings
- Award ticket searches
- Specific aircraft types
- Detailed fare class codes

## Reference Files

All flight booking documentation is in `references/`:

- `flight_search.md` - Search strategies, filters
- `airline_policies.md` - Carrier-specific rules
- `booking_process.md` - Step-by-step booking
- `seat_selection.md` - Seating guides
- `fare_classes.md` - Ticket types, restrictions
- `baggage_rules.md` - Luggage policies
- `frequent_flyer.md` - Loyalty programs
```

---

## Implementation Guide

### Step 1: Identify Split Points

**Analyze your monolithic skill:**

1. List all major responsibilities
2. Group related functionality
3. Identify natural boundaries
4. Count lines per group

**Example:**

```
user_management.md (1800 lines)
├── Authentication (450 lines) ← Sub-skill
├── Profile CRUD (380 lines) ← Sub-skill
├── Permissions (320 lines) ← Sub-skill
├── Notifications (280 lines) ← Sub-skill
└── Activity logs (370 lines) ← Sub-skill
```
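
The per-group line counts in the example above can be gathered with a short script. This is a minimal sketch, assuming each major responsibility lives under its own level-2 (`##`) heading; `lines_per_section` and the sample text are illustrative names, not part of any existing tool.

```python
import re

# Matches level-2 headings only ("## Title"), not "#" or "###".
HEADING = re.compile(r"^##\s+(.*)$")


def lines_per_section(markdown_text: str) -> dict[str, int]:
    """Count lines under each level-2 heading (the heading line included).

    Headings inside fenced code blocks are ignored. If two sections share
    a title, the later one overwrites the earlier count.
    """
    counts: dict[str, int] = {}
    current = None
    in_fence = False
    for line in markdown_text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            current = match.group(1).strip()
            counts[current] = 0
        if current is not None:
            counts[current] += 1
    return counts


# Hypothetical miniature skill file, used only to show the call.
SAMPLE = """# User Management
intro
## Authentication
login flow
## Profile CRUD
create
update
"""

for section, count in lines_per_section(SAMPLE).items():
    print(f"{count:5d}  {section}")
```

Sections whose counts fall inside the sub-skill size range are natural split candidates; much smaller ones are usually better merged with a neighbour.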

### Step 2: Extract Sub-Skills

**For each identified group:**

1. Create a new `{subskill}.md` file
2. Copy the relevant content
3. Add proper frontmatter
4. Keep the file within the 200-500 line range
5. Remove dependencies on other groups

**Template:**

```markdown
---
name: {subskill_name}
description: {clear, specific description}
---

# {Subskill Title}

## When to Use This Skill
[Specific use cases]

## Quick Reference
[Common operations]

## Key Concepts
[Domain terms]

## Working with This Skill
[Usage guidance by skill level]

## Reference Files
[Documentation structure]
```
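
A quick check can confirm that an extracted file has the frontmatter fields from the template and stays within the 200-500 line range. The sketch below is an assumption-laden helper: it reads only simple `key: value` frontmatter lines (not full YAML), and the names `parse_frontmatter` and `check_subskill` are hypothetical.

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Read simple `key: value` pairs between the leading '---' lines.

    Returns an empty dict if the file does not start with '---' or the
    closing '---' is missing. This is not a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}


def check_subskill(text: str, min_lines: int = 200, max_lines: int = 500) -> list[str]:
    """Return a list of problems; an empty list means the file looks fine."""
    problems = []
    fields = parse_frontmatter(text)
    for required in ("name", "description"):
        if not fields.get(required):
            problems.append(f"missing frontmatter field: {required}")
    line_count = len(text.splitlines())
    if line_count < min_lines:
        problems.append(f"too short: {line_count} lines (< {min_lines})")
    elif line_count > max_lines:
        problems.append(f"too long: {line_count} lines (> {max_lines})")
    return problems
```

Running `check_subskill` over every extracted file before wiring up the router catches both oversized and overly granular sub-skills early.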

### Step 3: Create Router

**Router skill template:**

```markdown
---
name: {router_name}
description: {overall system description}
---

# {System Name} (Router)

## When to Use This Skill
{High-level description}

This is a router skill that directs queries to specialized sub-skills.

## Available Sub-Skills

### {subskill_1}
**Purpose:** {What it does}
**Keywords:** {routing, keywords, here}
**Use for:** {When to use}

### {subskill_2}
[Same pattern]

## Routing Logic

Based on query keywords:
- {keyword_group_1} → {subskill_1}
- {keyword_group_2} → {subskill_2}
- Multiple matches → Coordinate relevant skills

## Multi-Skill Operations

{Describe when multiple skills work together}

## Usage Examples

**Single skill:**
- "{example_query_1}" → {subskill_1}
- "{example_query_2}" → {subskill_2}

**Multiple skills:**
- "{complex_query}" → {subskill_1} + {subskill_2}
```

### Step 4: Define Routing Keywords

**Best practices:**

- Use 5-10 keywords per sub-skill
- Include synonyms and variations
- Be specific, not generic
- Test with real queries

**Example:**

```markdown
### user_authentication
**Keywords:**
- Primary: login, logout, signin, signout, authenticate
- Secondary: password, credentials, session, token
- Variations: log-in, log-out, sign-in, sign-out
```
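
To see how such keyword lists behave against real phrasing, it can help to model the matching in a few lines of code. The sketch below is only an illustration of keyword-based routing, not how Claude or `generate_router.py` actually selects skills; the `user_profiles` entry and its keywords are hypothetical additions used to show a multi-skill match.

```python
import re

# Keywords for user_authentication are copied from the example above.
# "user_profiles" and its keywords are hypothetical.
ROUTES = {
    "user_authentication": [
        "login", "logout", "signin", "signout", "authenticate",
        "password", "credentials", "session", "token",
        "log-in", "log-out", "sign-in", "sign-out",
    ],
    "user_profiles": ["profile", "avatar", "display name"],
}


def normalize(text: str) -> str:
    """Lowercase and turn every run of non-alphanumerics into one space."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def route(query: str, routes: dict[str, list[str]] = ROUTES) -> list[str]:
    """Return sub-skills whose keywords occur in the query, best match first.

    Keywords match on whole words, so "log-in" matches "log in" but
    "authenticate" does not match "authentication".
    """
    padded = f" {normalize(query)} "
    scores: dict[str, int] = {}
    for skill, keywords in routes.items():
        hits = {kw for kw in keywords if f" {normalize(kw)} " in padded}
        if hits:
            scores[skill] = len(hits)
    return sorted(scores, key=lambda skill: (-scores[skill], skill))
```

Whole-word matching is deliberately strict here: it makes gaps in the keyword list visible (for example a missing "authentication"), which is the kind of variation the best practices above ask you to cover.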

### Step 5: Test Routing

**Create test queries:**

```markdown
## Test Routing (Internal Notes)

Should route to user_authentication:
✓ "How do I log in?"
✓ "User login process"
✓ "Authentication failed"

Should route to user_profiles:
✓ "Update user profile"
✓ "Change profile picture"

Should route to multiple skills:
✓ "Create account and set up profile" → user_authentication + user_profiles
```
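
The same expectations can be kept as data and checked mechanically whenever keywords change. The following is a sketch under stated assumptions: `naive_route` is a stand-in router written for this example (plain substring checks), and `failing_cases` is a hypothetical helper; swap in whatever routing function you are actually evaluating.

```python
from typing import Callable, Iterable

# Expected routing, taken from the test notes above.
CASES = [
    ("How do I log in?", {"user_authentication"}),
    ("User login process", {"user_authentication"}),
    ("Authentication failed", {"user_authentication"}),
    ("Update user profile", {"user_profiles"}),
    ("Change profile picture", {"user_profiles"}),
    ("Create account and set up profile", {"user_authentication", "user_profiles"}),
]


def naive_route(query: str) -> set[str]:
    """Stand-in router (an assumption for this example): substring checks."""
    text = query.lower()
    matched = set()
    if any(key in text for key in ("log in", "login", "authenticat", "account")):
        matched.add("user_authentication")
    if "profile" in text:
        matched.add("user_profiles")
    return matched


def failing_cases(
    route: Callable[[str], Iterable[str]],
    cases: list[tuple[str, set[str]]] = CASES,
) -> list[tuple[str, set[str], set[str]]]:
    """Return (query, expected, actual) for every case the router gets wrong."""
    failures = []
    for query, expected in cases:
        actual = set(route(query))
        if actual != expected:
            failures.append((query, expected, actual))
    return failures


for query, expected, actual in failing_cases(naive_route):
    print(f"MISROUTED: {query!r} expected {sorted(expected)}, got {sorted(actual)}")
```

An empty result means every listed query reached the expected sub-skills; each reported failure points at a keyword list that needs another synonym or a more specific term.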

### Step 6: Update References

**In each sub-skill:**

1. Link to router for context
2. Reference related sub-skills
3. Update navigation paths

```markdown
## Related Skills

This skill is part of the {System Name} suite:
- **Router:** {router_name} - Main entry point
- **Related:** {related_subskill} - For {use case}
```

---

## Troubleshooting

### Router Not Activating Correct Sub-Skill

**Problem:** Query routed to wrong sub-skill

**Solutions:**
1. Add missing keywords to router
2. Use more specific routing keywords
3. Add disambiguation examples
4. Test with variations of query phrasing

### Sub-Skills Too Granular

**Problem:** Too many tiny sub-skills (< 200 lines each)

**Solution:**
- Merge related sub-skills
- Use sections within single skill instead
- Aim for 300-500 lines per sub-skill

### Sub-Skills Too Large

**Problem:** Sub-skills still exceeding 500 lines

**Solution:**
- Further split into more granular concerns
- Consider 3-tier architecture (router → category routers → specific skills)
- Move reference documentation to separate files

### Cross-Skill Dependencies

**Problem:** Sub-skills frequently need each other

**Solutions:**
1. Create shared reference documentation
2. Use router to coordinate multi-skill operations
3. Reconsider split boundaries (may be too granular)

### Router Logic Too Complex

**Problem:** Router has extensive conditional logic

**Solution:**
- Simplify to keyword-based routing
- Create intermediate category routers (two routing levels)
- Document explicit routing table

**Example with two routing levels:**

```
main_router.md
├── user_features_router.md
│   ├── authentication.md
│   ├── profiles.md
│   └── permissions.md
└── admin_features_router.md
    ├── analytics.md
    ├── reporting.md
    └── configuration.md
```
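
The layered layout above can be modelled as a nested lookup: the main router picks a category router, which then picks a sub-skill. This is a rough sketch only; the router and skill names come from the tree above, while every keyword list and the `route` function itself are assumptions made for illustration.

```python
import re

# Router and skill names mirror the tree above; all keywords are hypothetical.
ROUTER_TREE = {
    "user_features_router": {
        "keywords": ["user", "account", "login", "profile", "permission"],
        "skills": {
            "authentication": ["login", "password", "token"],
            "profiles": ["profile", "avatar"],
            "permissions": ["permission", "role", "access"],
        },
    },
    "admin_features_router": {
        "keywords": ["admin", "dashboard", "report", "metrics", "settings"],
        "skills": {
            "analytics": ["metrics", "dashboard", "trend"],
            "reporting": ["report", "export"],
            "configuration": ["settings", "config"],
        },
    },
}


def route(query: str, tree: dict = ROUTER_TREE) -> list[str]:
    """Resolve a query to sub-skill paths in two keyword-matching steps."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    paths = []
    for router_name, node in tree.items():
        # Level 1: does the query belong to this category router at all?
        if not words.intersection(node["keywords"]):
            continue
        # Level 2: which sub-skills inside the category match?
        for skill, keywords in node["skills"].items():
            if words.intersection(keywords):
                paths.append(f"main_router/{router_name}/{skill}")
    return paths
```

Each level stays a flat keyword table, so the main router never needs to know the keywords of individual sub-skills; that separation is what keeps the routing logic simple as the number of skills grows.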

---

## Adapting Auto-Generated Routers

Skill Seeker auto-generates router skills for large documentation using `generate_router.py`.

**You can adapt this for manual skills:**

### 1. Study the Pattern

```bash
# Generate a router from documentation configs
python3 cli/split_config.py configs/godot.json --strategy router
python3 cli/generate_router.py configs/godot-*.json

# Examine generated router SKILL.md
cat output/godot/SKILL.md
```

### 2. Extract the Template

The generated router has:
- Sub-skill descriptions
- Keyword-based routing
- Usage examples
- Multi-skill coordination notes

### 3. Customize for Your Use Case

Replace documentation-specific content with your application logic:

```markdown
# Generated (documentation):
### godot-scripting
GDScript programming, signals, nodes
Keywords: gdscript, code, script, programming

# Customized (your app):
### order_processing
Process customer orders, payments, fulfillment
Keywords: order, purchase, payment, checkout, fulfillment
```

---

## Summary

### Key Takeaways

1. ✅ **500-line guideline** is important for optimal Claude performance
2. ✅ **Router pattern** enables sophisticated applications while staying within limits
3. ✅ **Single responsibility** - Each sub-skill does one thing well
4. ✅ **Context efficiency** - Only load what's needed per task
5. ✅ **Proven approach** - Already used successfully for large documentation

### When to Apply This Pattern

**Do use skill layering when:**
- Skill exceeds 500 lines
- Multiple distinct responsibilities
- Different parts rarely used together
- Team wants modular maintenance

**Don't use skill layering when:**
- Skill under 500 lines
- Single, cohesive responsibility
- All content frequently relevant together
- Simplicity is priority

### Next Steps

1. Review your existing skills for split candidates
2. Create router + sub-skills following templates above
3. Test routing with real queries
4. Refine keywords based on usage
5. Iterate and improve

---

## Additional Resources

- **Auto-Generated Routers:** See `docs/LARGE_DOCUMENTATION.md` for automated splitting of scraped documentation
- **Router Implementation:** See `src/skill_seekers/cli/generate_router.py` for reference implementation
- **Examples:** See configs in `configs/` for real-world router patterns

**Questions or feedback?** Open an issue on GitHub!