This patch release fixes the broken Chinese language selector link on PyPI by using absolute GitHub URLs instead of relative paths. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
1027 lines
28 KiB
Markdown
1027 lines
28 KiB
Markdown
# Skill Seekers Intelligence System - Roadmap
|
|
|
|
**Status:** 🔬 Research & Design Phase
|
|
**Target:** Open Source, Individual Developers
|
|
**Timeline:** 6-12 months (iterative releases)
|
|
**Version:** 1.0 (Initial Design)
|
|
**Last Updated:** 2026-01-20
|
|
|
|
---
|
|
|
|
## 🎯 Vision
|
|
|
|
Build an **auto-updating, context-aware, multi-skill codebase intelligence system** that:
|
|
|
|
1. **Detects** your tech stack automatically
|
|
2. **Generates** separate skills for libraries and codebase modules
|
|
3. **Updates** skills when branches merge (git-based triggers)
|
|
4. **Clusters** skills intelligently based on what you're working on
|
|
5. **Integrates** with Claude Code via plugin architecture
|
|
|
|
**Think of it as:** A self-maintaining RAG system for your codebase that knows exactly which knowledge to load based on context.
|
|
|
|
---
|
|
|
|
## 🏗️ System Architecture Overview
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────┐
|
|
│ Skill Seekers Intelligence System │
|
|
├─────────────────────────────────────────────────────────────┤
|
|
│ │
|
|
│ Layer 1: PROJECT CONFIGURATION │
|
|
│ ┌──────────────────────────────────────────┐ │
|
|
│ │ .skill-seekers/ │ │
|
|
│ │ ├── config.yml (user editable) │ │
|
|
│ │ ├── skills/ (auto-generated) │ │
|
|
│ │ ├── cache/ (embeddings) │ │
|
|
│ │ └── hooks/ (git triggers) │ │
|
|
│ └──────────────────────────────────────────┘ │
|
|
│ │
|
|
│ Layer 2: SKILL GENERATION ENGINE │
|
|
│ ┌──────────────────────────────────────────┐ │
|
|
│ │ • Tech Stack Detector │ │
|
|
│ │ • Modular Codebase Analyzer (C3.x) │ │
|
|
│ │ • Library Skill Downloader │ │
|
|
│ │ • Git-Based Trigger System │ │
|
|
│ └──────────────────────────────────────────┘ │
|
|
│ │
|
|
│ Layer 3: SKILL CLUSTERING ENGINE │
|
|
│ ┌──────────────────────────────────────────┐ │
|
|
│ │ Phase 1: Import-Based (deterministic) │ │
|
|
│ │ Phase 2: Embedding-Based (experimental) │ │
|
|
│ └──────────────────────────────────────────┘ │
|
|
│ │
|
|
│ Layer 4: CLAUDE CODE PLUGIN │
|
|
│ ┌──────────────────────────────────────────┐ │
|
|
│ │ • File Open Handler │ │
|
|
│ │ • Branch Merge Listener │ │
|
|
│ │ • Context Manager │ │
|
|
│ │ • Skill Loader │ │
|
|
│ └──────────────────────────────────────────┘ │
|
|
│ │
|
|
└─────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
---
|
|
|
|
## 📋 Development Phases
|
|
|
|
### Phase 0: Research & Validation (2-3 weeks)
|
|
**Status:** 🔬 Current Phase
|
|
**Goal:** Validate core assumptions, design architecture
|
|
|
|
**Deliverables:**
|
|
- ✅ Technical architecture document
|
|
- ✅ Roadmap document (this file)
|
|
- ✅ POC design for Phase 1
|
|
- ✅ Research clustering algorithms
|
|
- ✅ Design config schema
|
|
|
|
**Success Criteria:**
|
|
- Clear technical direction
|
|
- Validated assumptions (import analysis works, etc.)
|
|
- Ready to build Phase 1
|
|
|
|
---
|
|
|
|
### Phase 1: Git-Based Auto-Generation (3-4 weeks)
|
|
**Status:** 📅 Planned
|
|
**Goal:** Auto-generate skills on branch merges
|
|
|
|
#### Milestones
|
|
|
|
**Milestone 1.1: Project Initialization (Week 1)**
|
|
```bash
|
|
# Command
|
|
skill-seekers init-project --directory .
|
|
|
|
# Creates
|
|
.skill-seekers/
|
|
├── config.yml # Project configuration
|
|
├── hooks/
|
|
│ ├── post-merge # Git hook
|
|
│ └── post-commit # Optional
|
|
└── skills/
|
|
├── libraries/ # Empty (Phase 2)
|
|
└── codebase/ # Will be generated
|
|
```
|
|
|
|
**Config Schema (v1.0):**
|
|
```yaml
|
|
# .skill-seekers/config.yml
|
|
version: "1.0"
|
|
project_name: skill-seekers
|
|
watch_branches:
|
|
- main
|
|
- development
|
|
|
|
# Phase 1: Simple, no modules yet
|
|
skill_generation:
|
|
enabled: true
|
|
output_dir: .skill-seekers/skills/codebase
|
|
|
|
git_hooks:
|
|
enabled: true
|
|
trigger_on:
|
|
- post-merge
|
|
- post-commit # optional
|
|
```
|
|
|
|
**Deliverables:**
|
|
- [ ] `skill-seekers init-project` command
|
|
- [ ] Config schema v1.0
|
|
- [ ] Git hook installer
|
|
- [ ] Project directory structure creator
|
|
|
|
**Success Criteria:**
|
|
- Running `init-project` sets up directory structure
|
|
- Git hooks are installed correctly
|
|
- Config file is created with sensible defaults
|
|
|
|
---
|
|
|
|
**Milestone 1.2: Git Hook Integration (Week 2)**
|
|
|
|
**Git Hook Logic:**
|
|
```bash
|
|
#!/bin/bash
|
|
# .skill-seekers/hooks/post-merge
|
|
|
|
# Check if we're on a watched branch
|
|
CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD)
|
|
WATCH_BRANCHES=$(yq '.watch_branches[]' .skill-seekers/config.yml)
|
|
|
|
if echo "$WATCH_BRANCHES" | grep -q "$CURRENT_BRANCH"; then
|
|
echo "🔄 Branch merge detected on $CURRENT_BRANCH"
|
|
echo "🚀 Regenerating skills..."
|
|
|
|
skill-seekers regenerate-skills --branch "$CURRENT_BRANCH"
|
|
|
|
echo "✅ Skills updated"
|
|
fi
|
|
```
|
|
|
|
**Deliverables:**
|
|
- [ ] Git hook templates
|
|
- [ ] Hook installer/uninstaller
|
|
- [ ] Branch detection logic
|
|
- [ ] Hook execution logging
|
|
|
|
**Success Criteria:**
|
|
- Merging to watched branch triggers skill regeneration
|
|
- Only watched branches trigger updates
|
|
- Hooks can be enabled/disabled via config
|
|
|
|
---
|
|
|
|
**Milestone 1.3: Basic Skill Regeneration (Week 3)**
|
|
|
|
**Command:**
|
|
```bash
|
|
skill-seekers regenerate-skills --branch main
|
|
|
|
# Runs:
|
|
# 1. Detects changed files since last generation
|
|
# 2. Runs codebase analysis (existing C3.x features)
|
|
# 3. Generates single skill: codebase.skill
|
|
# 4. Updates .skill-seekers/skills/codebase/codebase.skill
|
|
```
|
|
|
|
**Phase 1 Scope (Simple):**
|
|
- Single skill for entire codebase
|
|
- No modularization yet (Phase 3)
|
|
- No library skills yet (Phase 2)
|
|
- No clustering yet (Phase 4)
|
|
|
|
**Deliverables:**
|
|
- [ ] `regenerate-skills` command
|
|
- [ ] Change detection (git diff)
|
|
- [ ] Incremental vs full regeneration logic
|
|
- [ ] Skill versioning (timestamp)
|
|
|
|
**Success Criteria:**
|
|
- Manual regeneration works
|
|
- Git hook triggers regeneration
|
|
- Skill is usable in Claude Code
|
|
|
|
---
|
|
|
|
**Milestone 1.4: Dogfooding & Testing (Week 4)**
|
|
|
|
**Test on skill-seekers itself:**
|
|
```bash
|
|
cd Skill_Seekers/
|
|
skill-seekers init-project --directory .
|
|
|
|
# Make code change
|
|
git checkout -b test-auto-regen
|
|
echo "# Test" >> README.md
|
|
git commit -am "test: Auto-regen test"
|
|
|
|
# Merge to development
|
|
git checkout development
|
|
git merge test-auto-regen
|
|
# → Should trigger skill regeneration
|
|
|
|
# Verify
|
|
cat .skill-seekers/skills/codebase/codebase.skill
|
|
```
|
|
|
|
**Deliverables:**
|
|
- [ ] End-to-end test on skill-seekers
|
|
- [ ] Performance benchmarks
|
|
- [ ] Bug fixes
|
|
- [ ] Documentation updates
|
|
|
|
**Success Criteria:**
|
|
- Works on skill-seekers codebase
|
|
- Regeneration completes in <5 minutes
|
|
- Generated skill is high quality
|
|
- No major bugs
|
|
|
|
---
|
|
|
|
### Phase 2: Tech Stack Detection & Library Skills (2-3 weeks)
|
|
**Status:** 📅 Planned (After Phase 1)
|
|
**Goal:** Auto-detect tech stack and download library skills
|
|
|
|
#### Milestones
|
|
|
|
**Milestone 2.1: Tech Stack Detector (Week 1)**
|
|
|
|
**Detection Strategy:**
|
|
```python
|
|
# src/skill_seekers/intelligence/stack_detector.py
|
|
|
|
class TechStackDetector:
|
|
"""Detect tech stack from project files"""
|
|
|
|
def detect(self, project_dir: Path) -> dict:
|
|
stack = {
|
|
"languages": [],
|
|
"frameworks": [],
|
|
"databases": [],
|
|
"tools": []
|
|
}
|
|
|
|
# Python ecosystem
|
|
if (project_dir / "requirements.txt").exists():
|
|
stack["languages"].append("Python")
|
|
deps = self._parse_requirements()
|
|
|
|
if "fastapi" in deps:
|
|
stack["frameworks"].append("FastAPI")
|
|
if "django" in deps:
|
|
stack["frameworks"].append("Django")
|
|
if "flask" in deps:
|
|
stack["frameworks"].append("Flask")
|
|
|
|
# JavaScript/TypeScript ecosystem
|
|
if (project_dir / "package.json").exists():
|
|
deps = self._parse_package_json()
|
|
|
|
if "typescript" in deps:
|
|
stack["languages"].append("TypeScript")
|
|
else:
|
|
stack["languages"].append("JavaScript")
|
|
|
|
if "react" in deps:
|
|
stack["frameworks"].append("React")
|
|
if "vue" in deps:
|
|
stack["frameworks"].append("Vue")
|
|
if "next" in deps:
|
|
stack["frameworks"].append("Next.js")
|
|
|
|
# Database detection
|
|
if (project_dir / ".env").exists():
|
|
env = self._parse_env()
|
|
db_url = env.get("DATABASE_URL", "")
|
|
|
|
if "postgres" in db_url:
|
|
stack["databases"].append("PostgreSQL")
|
|
if "mysql" in db_url:
|
|
stack["databases"].append("MySQL")
|
|
if "mongodb" in db_url:
|
|
stack["databases"].append("MongoDB")
|
|
|
|
# Docker services
|
|
if (project_dir / "docker-compose.yml").exists():
|
|
services = self._parse_docker_compose()
|
|
stack["tools"].extend(services)
|
|
|
|
return stack
|
|
```
|
|
|
|
**Supported Ecosystems (v1.0):**
|
|
- **Python:** FastAPI, Django, Flask, SQLAlchemy
|
|
- **JavaScript/TypeScript:** React, Vue, Next.js, Express
|
|
- **Databases:** PostgreSQL, MySQL, MongoDB, Redis
|
|
- **Tools:** Docker, Nginx, Celery
|
|
|
|
**Deliverables:**
|
|
- [ ] `TechStackDetector` class
|
|
- [ ] Parsers for common config files
|
|
- [ ] Detection accuracy tests
|
|
- [ ] `skill-seekers detect-stack` command
|
|
|
|
**Success Criteria:**
|
|
- 90%+ accuracy on common stacks
|
|
- Fast (<1 second)
|
|
- Extensible (easy to add new detectors)
|
|
|
|
---
|
|
|
|
**Milestone 2.2: Library Skill Downloader (Week 2)**
|
|
|
|
**Architecture:**
|
|
```python
|
|
# src/skill_seekers/intelligence/library_manager.py
|
|
|
|
class LibrarySkillManager:
|
|
"""Download and cache library skills"""
|
|
|
|
def download_skills(self, tech_stack: dict) -> list[Path]:
|
|
skills = []
|
|
|
|
for framework in tech_stack["frameworks"]:
|
|
skill_path = self._download_skill(framework)
|
|
skills.append(skill_path)
|
|
|
|
return skills
|
|
|
|
def _download_skill(self, name: str) -> Path:
|
|
# Try skillseekersweb.com API first
|
|
skill = self._fetch_from_api(name)
|
|
|
|
if not skill:
|
|
# Fallback: generate from GitHub repo
|
|
skill = self._generate_from_github(name)
|
|
|
|
# Cache locally
|
|
cache_path = Path(f".skill-seekers/skills/libraries/{name}.skill")
|
|
cache_path.write_text(skill)
|
|
|
|
return cache_path
|
|
```
|
|
|
|
**Library Skill Sources:**
|
|
1. **SkillSeekersWeb.com API** (preferred)
|
|
- Pre-generated skills for popular frameworks
|
|
- Curated, high-quality
|
|
- Fast download
|
|
|
|
2. **On-Demand Generation** (fallback)
|
|
- Generate from framework's GitHub repo
|
|
- Uses existing `github_scraper.py`
|
|
- Cached after first generation
|
|
|
|
**Deliverables:**
|
|
- [ ] `LibrarySkillManager` class
|
|
- [ ] API client for skillseekersweb.com
|
|
- [ ] Caching system
|
|
- [ ] `skill-seekers download-libraries` command
|
|
|
|
**Success Criteria:**
|
|
- Downloads skills for detected frameworks
|
|
- Caching works (no duplicate downloads)
|
|
- Handles missing skills gracefully
|
|
|
|
---
|
|
|
|
**Milestone 2.3: Config Schema v2.0 (Week 3)**
|
|
|
|
**Updated Config:**
|
|
```yaml
|
|
# .skill-seekers/config.yml
|
|
version: "2.0"
|
|
project_name: skill-seekers
|
|
watch_branches:
|
|
- main
|
|
- development
|
|
|
|
# NEW: Tech stack configuration
|
|
tech_stack:
|
|
auto_detect: true
|
|
frameworks:
|
|
- FastAPI
|
|
- React
|
|
- PostgreSQL
|
|
|
|
# Override auto-detection
|
|
custom:
|
|
- name: "Internal Framework"
|
|
skill_url: "https://internal.com/skills/framework.skill"
|
|
|
|
# Library skills
|
|
library_skills:
|
|
enabled: true
|
|
source: "skillseekersweb.com"
|
|
cache_dir: .skill-seekers/skills/libraries
|
|
update_frequency: "weekly" # or: never, daily, on-branch-merge
|
|
|
|
skill_generation:
|
|
enabled: true
|
|
output_dir: .skill-seekers/skills/codebase
|
|
|
|
git_hooks:
|
|
enabled: true
|
|
trigger_on:
|
|
- post-merge
|
|
```
|
|
|
|
**Deliverables:**
|
|
- [ ] Config schema v2.0
|
|
- [ ] Migration from v1.0 to v2.0
|
|
- [ ] Validation logic
|
|
- [ ] Documentation
|
|
|
|
**Success Criteria:**
|
|
- Backward compatible with v1.0
|
|
- Clear upgrade path
|
|
- Well documented
|
|
|
|
---
|
|
|
|
### Phase 3: Modular Skill Splitting (3-4 weeks)
|
|
**Status:** 📅 Planned (After Phase 2)
|
|
**Goal:** Split codebase into modular skills based on config
|
|
|
|
#### Milestones
|
|
|
|
**Milestone 3.1: Module Configuration (Week 1)**
|
|
|
|
**Config Schema v3.0:**
|
|
```yaml
|
|
# .skill-seekers/config.yml
|
|
version: "3.0"
|
|
project_name: skill-seekers
|
|
|
|
# ... (previous config)
|
|
|
|
# NEW: Module definitions
|
|
modules:
|
|
backend:
|
|
path: src/skill_seekers/
|
|
split_by: namespace # or: directory, feature, custom
|
|
|
|
skills:
|
|
- name: cli
|
|
description: "Command-line interface tools"
|
|
include:
|
|
- "cli/**/*.py"
|
|
exclude:
|
|
- "cli/**/*_test.py"
|
|
|
|
- name: scrapers
|
|
description: "Web scraping and analysis"
|
|
include:
|
|
- "cli/doc_scraper.py"
|
|
- "cli/github_scraper.py"
|
|
- "cli/pdf_scraper.py"
|
|
|
|
- name: adaptors
|
|
description: "Platform adaptor system"
|
|
include:
|
|
- "cli/adaptors/**/*.py"
|
|
|
|
- name: mcp
|
|
description: "MCP server integration"
|
|
include:
|
|
- "mcp/**/*.py"
|
|
|
|
tests:
|
|
path: tests/
|
|
split_by: directory
|
|
skills:
|
|
- name: unit-tests
|
|
include: ["test_*.py"]
|
|
```
|
|
|
|
**Splitting Strategies:**
|
|
```python
|
|
class ModuleSplitter:
|
|
"""Split codebase into modular skills"""
|
|
|
|
STRATEGIES = {
|
|
"namespace": self._split_by_namespace,
|
|
"directory": self._split_by_directory,
|
|
"feature": self._split_by_feature,
|
|
"custom": self._split_by_custom,
|
|
}
|
|
|
|
def _split_by_namespace(self, module_config: dict) -> list[Skill]:
|
|
# Python: package.module.submodule
|
|
# JS: import { X } from './path/to/module'
|
|
pass
|
|
|
|
def _split_by_directory(self, module_config: dict) -> list[Skill]:
|
|
# One skill per top-level directory
|
|
pass
|
|
|
|
def _split_by_feature(self, module_config: dict) -> list[Skill]:
|
|
# Group by feature (auth, api, models, etc.)
|
|
pass
|
|
```
|
|
|
|
**Deliverables:**
|
|
- [ ] Module splitting engine
|
|
- [ ] Config schema v3.0
|
|
- [ ] Support for glob patterns
|
|
- [ ] Validation logic
|
|
|
|
**Success Criteria:**
|
|
- Can split skill-seekers into 4-5 modules
|
|
- Each module is focused and cohesive
|
|
- User has full control via config
|
|
|
|
---
|
|
|
|
**Milestone 3.2: Modular Skill Generation (Week 2-3)**
|
|
|
|
**Output Structure:**
|
|
```
|
|
.skill-seekers/skills/
|
|
├── libraries/
|
|
│ ├── fastapi.skill
|
|
│ ├── anthropic.skill
|
|
│ └── beautifulsoup.skill
|
|
│
|
|
└── codebase/
|
|
├── cli.skill # CLI tools
|
|
├── scrapers.skill # Scraping logic
|
|
├── adaptors.skill # Platform adaptors
|
|
├── mcp.skill # MCP server
|
|
└── tests.skill # Test suite
|
|
```
|
|
|
|
**Each skill contains:**
|
|
- Focused documentation (one module only)
|
|
- API reference for that module
|
|
- Design patterns in that module
|
|
- Test examples for that module
|
|
- Cross-references to related skills
|
|
|
|
**Deliverables:**
|
|
- [ ] Modular skill generator
|
|
- [ ] Cross-reference system
|
|
- [ ] Skill metadata (dependencies, related skills)
|
|
- [ ] Update generation pipeline
|
|
|
|
**Success Criteria:**
|
|
- Generates 4-5 focused skills for skill-seekers
|
|
- Each skill is 50-200 lines (not too big)
|
|
- Cross-references work
|
|
|
|
---
|
|
|
|
**Milestone 3.3: Testing & Iteration (Week 4)**
|
|
|
|
**Test Plan:**
|
|
1. Generate modular skills for skill-seekers
|
|
2. Use in Claude Code for 1 week
|
|
3. Compare vs single skill (Phase 1)
|
|
4. Iterate on module boundaries
|
|
|
|
**Success Criteria:**
|
|
- Modular skills are more useful than single skill
|
|
- Module boundaries make sense
|
|
- Performance is acceptable
|
|
|
|
---
|
|
|
|
### Phase 4: Import-Based Clustering (2-3 weeks)
|
|
**Status:** 📅 Planned (After Phase 3)
|
|
**Goal:** Load only relevant skills based on current file
|
|
|
|
#### Milestones
|
|
|
|
**Milestone 4.1: Import Analyzer (Week 1)**
|
|
|
|
**Algorithm:**
|
|
```python
|
|
# src/skill_seekers/intelligence/import_analyzer.py
|
|
|
|
class ImportAnalyzer:
|
|
"""Analyze imports to find relevant skills"""
|
|
|
|
def find_relevant_skills(
|
|
self,
|
|
current_file: Path,
|
|
available_skills: list[SkillMetadata]
|
|
) -> list[Path]:
|
|
# 1. Parse imports from current file
|
|
imports = self._parse_imports(current_file)
|
|
# Example: editing src/cli/doc_scraper.py
|
|
# Imports:
|
|
# - from anthropic import Anthropic
|
|
# - from bs4 import BeautifulSoup
|
|
# - from skill_seekers.cli.adaptors import get_adaptor
|
|
|
|
# 2. Map imports to skills
|
|
relevant = []
|
|
|
|
for imp in imports:
|
|
# External library?
|
|
if self._is_external(imp):
|
|
library_skill = self._find_library_skill(imp)
|
|
if library_skill:
|
|
relevant.append(library_skill)
|
|
|
|
# Internal module?
|
|
else:
|
|
module_skill = self._find_module_skill(imp, available_skills)
|
|
if module_skill:
|
|
relevant.append(module_skill)
|
|
|
|
# 3. Add current module's skill
|
|
current_skill = self._find_skill_for_file(current_file, available_skills)
|
|
if current_skill:
|
|
relevant.insert(0, current_skill) # First in list
|
|
|
|
# 4. Deduplicate and rank
|
|
return self._deduplicate(relevant)[:5] # Max 5 skills
|
|
```
|
|
|
|
**Example Output:**
|
|
```python
|
|
# Editing: src/cli/doc_scraper.py
|
|
find_relevant_skills("src/cli/doc_scraper.py")
|
|
|
|
# Returns:
|
|
[
|
|
"codebase/scrapers.skill", # Current module (highest priority)
|
|
"libraries/beautifulsoup.skill", # External import
|
|
"libraries/anthropic.skill", # External import
|
|
"codebase/adaptors.skill", # Internal import
|
|
]
|
|
```
|
|
|
|
**Deliverables:**
|
|
- [ ] `ImportAnalyzer` class
|
|
- [ ] Python import parser (AST-based)
|
|
- [ ] JavaScript import parser (regex-based)
|
|
- [ ] Import-to-skill mapping logic
|
|
|
|
**Success Criteria:**
|
|
- Correctly identifies imports from files
|
|
- Maps imports to skills accurately
|
|
- Fast (<100ms for typical file)
|
|
|
|
---
|
|
|
|
**Milestone 4.2: Claude Code Plugin (Week 2)**
|
|
|
|
**Plugin Architecture:**
|
|
```python
|
|
# claude_plugins/skill-seekers-intelligence/agent.py
|
|
|
|
class SkillSeekersIntelligenceAgent:
|
|
"""
|
|
Claude Code plugin that manages skill loading
|
|
"""
|
|
|
|
def __init__(self):
|
|
self.config = self._load_config()
|
|
self.import_analyzer = ImportAnalyzer()
|
|
self.current_skills = []
|
|
|
|
async def on_file_open(self, file_path: str):
|
|
"""
|
|
Hook: User opens a file
|
|
Action: Load relevant skills
|
|
"""
|
|
# Find relevant skills
|
|
relevant = self.import_analyzer.find_relevant_skills(
|
|
file_path,
|
|
self.config.available_skills
|
|
)
|
|
|
|
# Load into Claude context
|
|
self.load_skills(relevant)
|
|
|
|
# Notify user
|
|
print(f"📚 Loaded {len(relevant)} relevant skills:")
|
|
for skill in relevant:
|
|
print(f" - {skill.name}")
|
|
|
|
async def on_branch_merge(self, branch: str):
|
|
"""
|
|
Hook: Branch merged
|
|
Action: Regenerate skills if needed
|
|
"""
|
|
if branch in self.config.watch_branches:
|
|
print(f"🔄 Regenerating skills for {branch}...")
|
|
await self.regenerate_skills(branch)
|
|
print("✅ Skills updated")
|
|
|
|
def load_skills(self, skills: list[Path]):
|
|
"""Load skills into Claude context"""
|
|
self.current_skills = skills
|
|
|
|
# Tell Claude which skills are loaded
|
|
# (Implementation depends on Claude Code API)
|
|
```
|
|
|
|
**Plugin Hooks:**
|
|
- `on_file_open` - Load relevant skills
|
|
- `on_file_save` - Update skills if needed
|
|
- `on_branch_merge` - Regenerate skills
|
|
- `on_branch_checkout` - Switch skill set
|
|
|
|
**Deliverables:**
|
|
- [ ] Claude Code plugin skeleton
|
|
- [ ] File open handler
|
|
- [ ] Branch merge listener
|
|
- [ ] Skill loader integration
|
|
|
|
**Success Criteria:**
|
|
- Plugin loads in Claude Code
|
|
- File opens trigger skill loading
|
|
- Branch merges trigger regeneration
|
|
- User sees which skills are loaded
|
|
|
|
---
|
|
|
|
**Milestone 4.3: Testing & Dogfooding (Week 3)**
|
|
|
|
**Test Plan:**
|
|
1. Install plugin in Claude Code
|
|
2. Open skill-seekers codebase
|
|
3. Navigate files, observe skill loading
|
|
4. Make changes, merge branch, observe regeneration
|
|
|
|
**Success Criteria:**
|
|
- Correct skills load for each file
|
|
- No performance issues
|
|
- User experience is smooth
|
|
|
|
---
|
|
|
|
### Phase 5: Embedding-Based Clustering (3-4 weeks)
|
|
**Status:** 🔬 Experimental (After Phase 4)
|
|
**Goal:** Smarter clustering using semantic similarity
|
|
|
|
#### Milestones
|
|
|
|
**Milestone 5.1: Embedding Generation (Week 1-2)**
|
|
|
|
**Architecture:**
|
|
```python
|
|
# src/skill_seekers/intelligence/embeddings.py
|
|
|
|
class SkillEmbedder:
|
|
"""Generate and cache embeddings for skills and files"""
|
|
|
|
def __init__(self):
|
|
# Use lightweight model for speed
|
|
# Options: sentence-transformers, OpenAI, Anthropic
|
|
self.model = "all-MiniLM-L6-v2" # Fast, good quality
|
|
|
|
def embed_skill(self, skill_path: Path) -> np.ndarray:
|
|
"""Generate embedding for entire skill"""
|
|
content = skill_path.read_text()
|
|
|
|
# Extract key sections
|
|
api_ref = self._extract_section(content, "API Reference")
|
|
examples = self._extract_section(content, "Examples")
|
|
|
|
# Embed combined text
|
|
text = f"{api_ref}\n{examples}"
|
|
embedding = self.model.encode(text)
|
|
|
|
# Cache for reuse
|
|
self._cache_embedding(skill_path, embedding)
|
|
|
|
return embedding
|
|
|
|
def embed_file(self, file_path: Path) -> np.ndarray:
|
|
"""Generate embedding for current file"""
|
|
content = file_path.read_text()
|
|
|
|
# Embed full content or summary
|
|
embedding = self.model.encode(content[:5000]) # First 5K chars
|
|
|
|
return embedding
|
|
```
|
|
|
|
**Embedding Strategy:**
|
|
- **Skills:** Embed once, cache forever (until skill updates)
|
|
- **Files:** Embed on-demand (or cache for open files)
|
|
- **Model:** Lightweight (all-MiniLM-L6-v2 is 80MB, fast)
|
|
- **Storage:** `.skill-seekers/cache/embeddings/`
|
|
|
|
**Deliverables:**
|
|
- [ ] `SkillEmbedder` class
|
|
- [ ] Embedding cache system
|
|
- [ ] Similarity search (cosine similarity)
|
|
- [ ] Benchmark performance
|
|
|
|
**Success Criteria:**
|
|
- Fast embedding (<100ms per file)
|
|
- Accurate similarity (>80% precision)
|
|
- Reasonable storage (<100MB for typical project)
|
|
|
|
---
|
|
|
|
**Milestone 5.2: Hybrid Clustering (Week 3)**
|
|
|
|
**Algorithm:**
|
|
```python
|
|
class HybridClusteringEngine:
|
|
"""
|
|
Combine import-based (fast, deterministic)
|
|
with embedding-based (smart, flexible)
|
|
"""
|
|
|
|
def find_relevant_skills(
|
|
self,
|
|
current_file: Path,
|
|
available_skills: list[SkillMetadata]
|
|
) -> list[Path]:
|
|
# Method 1: Import-based (weight: 0.7)
|
|
import_skills = self.import_analyzer.find_relevant_skills(
|
|
current_file, available_skills
|
|
)
|
|
|
|
# Method 2: Embedding-based (weight: 0.3)
|
|
file_embedding = self.embedder.embed_file(current_file)
|
|
similar_skills = self._find_similar_skills(
|
|
file_embedding, available_skills
|
|
)
|
|
|
|
# Combine with weighted ranking
|
|
combined = self._weighted_merge(
|
|
import_skills, similar_skills,
|
|
weights=[0.7, 0.3]
|
|
)
|
|
|
|
return combined[:5] # Top 5
|
|
```
|
|
|
|
**Why Hybrid?**
|
|
- Import-based: Precise but misses semantic similarity
|
|
- Embedding-based: Flexible but sometimes wrong
|
|
- Hybrid: Best of both worlds
|
|
|
|
**Deliverables:**
|
|
- [ ] Hybrid clustering algorithm
|
|
- [ ] Weighted ranking system
|
|
- [ ] A/B testing framework
|
|
- [ ] Performance comparison
|
|
|
|
**Success Criteria:**
|
|
- Better than import-only (A/B test)
|
|
- Not significantly slower (<200ms)
|
|
- Handles edge cases well
|
|
|
|
---
|
|
|
|
**Milestone 5.3: Experimental Features (Week 4)**
|
|
|
|
**Ideas to Explore:**
|
|
1. **Dynamic Skill Loading:** Load skills as conversation progresses
|
|
2. **Conversation Context:** Use chat history to refine clustering
|
|
3. **Feedback Loop:** Learn from user corrections
|
|
4. **Skill Ranking:** Rank skills by usefulness
|
|
|
|
**Deliverables:**
|
|
- [ ] Experimental features (optional)
|
|
- [ ] Documentation of learnings
|
|
- [ ] Recommendations for v2.0
|
|
|
|
**Success Criteria:**
|
|
- Identified valuable experimental features
|
|
- Documented what works and what doesn't
|
|
|
|
---
|
|
|
|
## 📊 Success Metrics
|
|
|
|
### Phase 1 Metrics
|
|
- ✅ Auto-regeneration works on branch merge
|
|
- ✅ <5 minutes to regenerate skills
|
|
- ✅ Git hooks work reliably
|
|
|
|
### Phase 2 Metrics
|
|
- ✅ 90%+ accuracy on tech stack detection
|
|
- ✅ Library skills downloaded successfully
|
|
- ✅ <2 seconds to download cached skill
|
|
|
|
### Phase 3 Metrics
|
|
- ✅ Modular skills are 50-200 lines each
|
|
- ✅ User can configure module boundaries
|
|
- ✅ Cross-references work
|
|
|
|
### Phase 4 Metrics
|
|
- ✅ Correct skills load 85%+ of the time
|
|
- ✅ <100ms to find relevant skills
|
|
- ✅ Plugin works smoothly in Claude Code
|
|
|
|
### Phase 5 Metrics
|
|
- ✅ Hybrid clustering beats import-only
|
|
- ✅ <200ms to cluster with embeddings
|
|
- ✅ Embedding cache < 100MB
|
|
|
|
---
|
|
|
|
## 🎯 Target Users
|
|
|
|
### Primary: Individual Open Source Developers
|
|
- Working on their own projects
|
|
- Want better codebase understanding
|
|
- Use Claude Code for development
|
|
- Value automation over manual work
|
|
|
|
### Secondary: Small Teams
|
|
- Onboarding new developers
|
|
- Maintaining large codebases
|
|
- Need consistent documentation
|
|
|
|
### Future: Enterprise
|
|
- Large codebases (1M+ LOC)
|
|
- Multiple microservices
|
|
- Advanced clustering requirements
|
|
|
|
---
|
|
|
|
## 📦 Deliverables
|
|
|
|
### User-Facing
|
|
- [ ] CLI commands (init, regenerate, detect, download)
|
|
- [ ] Claude Code plugin
|
|
- [ ] Configuration system (.skill-seekers/config.yml)
|
|
- [ ] Documentation (user guide, tutorial)
|
|
|
|
### Developer-Facing
|
|
- [ ] Python library (skill_seekers.intelligence)
|
|
- [ ] Plugin SDK (for extending)
|
|
- [ ] API documentation
|
|
- [ ] Architecture documentation
|
|
|
|
### Infrastructure
|
|
- [ ] Git hooks
|
|
- [ ] CI/CD integration
|
|
- [ ] Embedding cache system
|
|
- [ ] Skill registry
|
|
|
|
---
|
|
|
|
## 🚧 Known Challenges
|
|
|
|
### Technical
|
|
1. **Context Window Limits:** Even with clustering, large projects may exceed limits
|
|
2. **Embedding Performance:** Need fast, lightweight models
|
|
3. **Accuracy:** Import analysis may miss implicit dependencies
|
|
4. **Versioning:** Skills must stay in sync with code
|
|
|
|
### Product
|
|
1. **Onboarding:** Complex system needs good UX
|
|
2. **Configuration:** Balance power vs simplicity
|
|
3. **Debugging:** When clustering fails, hard to debug
|
|
|
|
### Operational
|
|
1. **Maintenance:** More components = more maintenance
|
|
2. **Testing:** Hard to test context-aware features
|
|
3. **Documentation:** Need excellent docs for adoption
|
|
|
|
---
|
|
|
|
## 🔮 Future Ideas (Post v1.0)
|
|
|
|
### Advanced Clustering
|
|
- [ ] Multi-file context (editing 3 files → load related skills)
|
|
- [ ] Conversation-aware clustering (use chat history)
|
|
- [ ] Feedback loop (learn from corrections)
|
|
|
|
### Multi-Project
|
|
- [ ] Workspace support (multiple projects)
|
|
- [ ] Cross-project skills (shared libraries)
|
|
- [ ] Monorepo support
|
|
|
|
### Integrations
|
|
- [ ] VS Code extension
|
|
- [ ] IntelliJ plugin
|
|
- [ ] Web dashboard
|
|
|
|
### Advanced Features
|
|
- [ ] Skill versioning (track changes over time)
|
|
- [ ] Skill diff (compare versions)
|
|
- [ ] Skill analytics (usage tracking)
|
|
|
|
---
|
|
|
|
## 📚 References
|
|
|
|
- **Existing Features:** C3.x Codebase Analysis (patterns, examples, architecture)
|
|
- **Platform:** Claude Code plugin system
|
|
- **Similar Tools:** GitHub Copilot, Cursor, Tabnine
|
|
- **Research:** RAG systems, semantic search, code embeddings
|
|
|
|
---
|
|
|
|
**Version:** 1.0
|
|
**Status:** Research & Design Phase
|
|
**Next Review:** After Phase 0 completion
|
|
**Owner:** Yusuf Karaaslan
|