Skill Seekers Intelligence System - Roadmap
Status: 🔬 Research & Design Phase
Target: Open Source, Individual Developers
Timeline: 6-12 months (iterative releases)
Version: 1.0 (Initial Design)
Last Updated: 2026-01-20
🎯 Vision
Build an auto-updating, context-aware, multi-skill codebase intelligence system that:
- Detects your tech stack automatically
- Generates separate skills for libraries and codebase modules
- Updates skills when branches merge (git-based triggers)
- Clusters skills intelligently based on what you're working on
- Integrates with Claude Code via plugin architecture
Think of it as: A self-maintaining RAG system for your codebase that knows exactly which knowledge to load based on context.
🏗️ System Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│ Skill Seekers Intelligence System │
├─────────────────────────────────────────────────────────────┤
│ │
│ Layer 1: PROJECT CONFIGURATION │
│ ┌──────────────────────────────────────────┐ │
│ │ .skill-seekers/ │ │
│ │ ├── config.yml (user editable) │ │
│ │ ├── skills/ (auto-generated) │ │
│ │ ├── cache/ (embeddings) │ │
│ │ └── hooks/ (git triggers) │ │
│ └──────────────────────────────────────────┘ │
│ │
│ Layer 2: SKILL GENERATION ENGINE │
│ ┌──────────────────────────────────────────┐ │
│ │ • Tech Stack Detector │ │
│ │ • Modular Codebase Analyzer (C3.x) │ │
│ │ • Library Skill Downloader │ │
│ │ • Git-Based Trigger System │ │
│ └──────────────────────────────────────────┘ │
│ │
│ Layer 3: SKILL CLUSTERING ENGINE │
│ ┌──────────────────────────────────────────┐ │
│ │ Phase 1: Import-Based (deterministic) │ │
│ │ Phase 2: Embedding-Based (experimental) │ │
│ └──────────────────────────────────────────┘ │
│ │
│ Layer 4: CLAUDE CODE PLUGIN │
│ ┌──────────────────────────────────────────┐ │
│ │ • File Open Handler │ │
│ │ • Branch Merge Listener │ │
│ │ • Context Manager │ │
│ │ • Skill Loader │ │
│ └──────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
📋 Development Phases
Phase 0: Research & Validation (2-3 weeks)
Status: 🔬 Current Phase
Goal: Validate core assumptions, design architecture
Deliverables:
- ✅ Technical architecture document
- ✅ Roadmap document (this file)
- ✅ POC design for Phase 1
- ✅ Research clustering algorithms
- ✅ Design config schema
Success Criteria:
- Clear technical direction
- Validated assumptions (import analysis works, etc.)
- Ready to build Phase 1
Phase 1: Git-Based Auto-Generation (3-4 weeks)
Status: 📅 Planned
Goal: Auto-generate skills on branch merges
Milestones
Milestone 1.1: Project Initialization (Week 1)
# Command
skill-seekers init-project --directory .
# Creates
.skill-seekers/
├── config.yml # Project configuration
├── hooks/
│ ├── post-merge # Git hook
│ └── post-commit # Optional
└── skills/
├── libraries/ # Empty (Phase 2)
└── codebase/ # Will be generated
Config Schema (v1.0):
# .skill-seekers/config.yml
version: "1.0"
project_name: skill-seekers
watch_branches:
- main
- development
# Phase 1: Simple, no modules yet
skill_generation:
enabled: true
output_dir: .skill-seekers/skills/codebase
git_hooks:
enabled: true
trigger_on:
- post-merge
- post-commit # optional
Deliverables:
- `skill-seekers init-project` command
- Config schema v1.0
- Git hook installer
- Project directory structure creator
Success Criteria:
- Running `init-project` sets up the directory structure
- Git hooks are installed correctly
- Config file is created with sensible defaults
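The initialization step above could be sketched as follows. This is a minimal illustration, not the actual CLI implementation; `init_project` and `DEFAULT_CONFIG` are hypothetical names, and the defaults mirror the v1.0 schema shown above.

```python
from pathlib import Path

# Defaults mirror the config schema v1.0 above (illustrative sketch).
DEFAULT_CONFIG = """\
version: "1.0"
project_name: my-project
watch_branches:
  - main
skill_generation:
  enabled: true
  output_dir: .skill-seekers/skills/codebase
git_hooks:
  enabled: true
  trigger_on:
    - post-merge
"""

def init_project(directory: Path) -> Path:
    """Create the .skill-seekers/ layout with sensible defaults."""
    root = directory / ".skill-seekers"
    for sub in ("hooks", "skills/libraries", "skills/codebase"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    config = root / "config.yml"
    if not config.exists():  # never clobber user edits
        config.write_text(DEFAULT_CONFIG)
    return root
```

Keeping the write guarded by `config.exists()` makes the command idempotent: re-running `init-project` repairs missing directories without overwriting a user-edited config.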
Milestone 1.2: Git Hook Integration (Week 2)
Git Hook Logic:
#!/bin/bash
# .skill-seekers/hooks/post-merge
# Check if we're on a watched branch
CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD)
WATCH_BRANCHES=$(yq '.watch_branches[]' .skill-seekers/config.yml)
if echo "$WATCH_BRANCHES" | grep -q "$CURRENT_BRANCH"; then
echo "🔄 Branch merge detected on $CURRENT_BRANCH"
echo "🚀 Regenerating skills..."
skill-seekers regenerate-skills --branch "$CURRENT_BRANCH"
echo "✅ Skills updated"
fi
Deliverables:
- Git hook templates
- Hook installer/uninstaller
- Branch detection logic
- Hook execution logging
Success Criteria:
- Merging to watched branch triggers skill regeneration
- Only watched branches trigger updates
- Hooks can be enabled/disabled via config
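The hook installer deliverable could look roughly like this sketch, which copies a template from `.skill-seekers/hooks/` into `.git/hooks/` and marks it executable. The function name `install_hook` and the refuse-to-overwrite policy are assumptions, not settled design.

```python
import shutil
from pathlib import Path

def install_hook(repo_dir: Path, name: str = "post-merge") -> Path:
    """Copy a hook template from .skill-seekers/hooks/ into .git/hooks/
    and mark it executable. Refuses to overwrite an existing hook."""
    src = repo_dir / ".skill-seekers" / "hooks" / name
    dst = repo_dir / ".git" / "hooks" / name
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; remove it first")
    shutil.copy(src, dst)
    dst.chmod(dst.stat().st_mode | 0o111)  # add execute bits for git to run it
    return dst
```

An uninstaller would be the mirror image: delete `.git/hooks/<name>` only if its content matches the template, so a user's custom hook is never silently removed.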
Milestone 1.3: Basic Skill Regeneration (Week 3)
Command:
skill-seekers regenerate-skills --branch main
# Runs:
# 1. Detects changed files since last generation
# 2. Runs codebase analysis (existing C3.x features)
# 3. Generates single skill: codebase.skill
# 4. Updates .skill-seekers/skills/codebase/codebase.skill
Phase 1 Scope (Simple):
- Single skill for entire codebase
- No modularization yet (Phase 3)
- No library skills yet (Phase 2)
- No clustering yet (Phase 4)
Deliverables:
- `regenerate-skills` command
- Change detection (git diff)
- Incremental vs full regeneration logic
- Skill versioning (timestamp)
Success Criteria:
- Manual regeneration works
- Git hook triggers regeneration
- Skill is usable in Claude Code
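Change detection can stay a thin wrapper over `git diff --name-only <last-generation-ref>..HEAD`. A sketch of the filtering step (how the last-generation ref is tracked is left open; the function name is illustrative):

```python
def changed_source_files(
    git_diff_output: str,
    exts: tuple[str, ...] = (".py", ".md"),
) -> list[str]:
    """Filter `git diff --name-only` output down to files worth reanalyzing."""
    files = [line.strip() for line in git_diff_output.splitlines() if line.strip()]
    return [f for f in files if f.endswith(exts)]
```

The caller would then decide between incremental and full regeneration, e.g. falling back to a full run when the config file itself changed or when no previous generation ref exists.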
Milestone 1.4: Dogfooding & Testing (Week 4)
Test on skill-seekers itself:
cd Skill_Seekers/
skill-seekers init-project --directory .
# Make code change
git checkout -b test-auto-regen
echo "# Test" >> README.md
git commit -am "test: Auto-regen test"
# Merge to development
git checkout development
git merge test-auto-regen
# → Should trigger skill regeneration
# Verify
cat .skill-seekers/skills/codebase/codebase.skill
Deliverables:
- End-to-end test on skill-seekers
- Performance benchmarks
- Bug fixes
- Documentation updates
Success Criteria:
- Works on skill-seekers codebase
- Regeneration completes in <5 minutes
- Generated skill is high quality
- No major bugs
Phase 2: Tech Stack Detection & Library Skills (2-3 weeks)
Status: 📅 Planned (After Phase 1)
Goal: Auto-detect the tech stack and download library skills
Milestones
Milestone 2.1: Tech Stack Detector (Week 1)
Detection Strategy:
# src/skill_seekers/intelligence/stack_detector.py
class TechStackDetector:
"""Detect tech stack from project files"""
def detect(self, project_dir: Path) -> dict:
stack = {
"languages": [],
"frameworks": [],
"databases": [],
"tools": []
}
# Python ecosystem
if (project_dir / "requirements.txt").exists():
stack["languages"].append("Python")
deps = self._parse_requirements()
if "fastapi" in deps:
stack["frameworks"].append("FastAPI")
if "django" in deps:
stack["frameworks"].append("Django")
if "flask" in deps:
stack["frameworks"].append("Flask")
# JavaScript/TypeScript ecosystem
if (project_dir / "package.json").exists():
deps = self._parse_package_json()
if "typescript" in deps:
stack["languages"].append("TypeScript")
else:
stack["languages"].append("JavaScript")
if "react" in deps:
stack["frameworks"].append("React")
if "vue" in deps:
stack["frameworks"].append("Vue")
if "next" in deps:
stack["frameworks"].append("Next.js")
# Database detection
if (project_dir / ".env").exists():
env = self._parse_env()
db_url = env.get("DATABASE_URL", "")
if "postgres" in db_url:
stack["databases"].append("PostgreSQL")
if "mysql" in db_url:
stack["databases"].append("MySQL")
if "mongodb" in db_url:
stack["databases"].append("MongoDB")
# Docker services
if (project_dir / "docker-compose.yml").exists():
services = self._parse_docker_compose()
stack["tools"].extend(services)
return stack
Supported Ecosystems (v1.0):
- Python: FastAPI, Django, Flask, SQLAlchemy
- JavaScript/TypeScript: React, Vue, Next.js, Express
- Databases: PostgreSQL, MySQL, MongoDB, Redis
- Tools: Docker, Nginx, Celery
Deliverables:
- `TechStackDetector` class
- Parsers for common config files
- Detection accuracy tests
- `skill-seekers detect-stack` command
Success Criteria:
- 90%+ accuracy on common stacks
- Fast (<1 second)
- Extensible (easy to add new detectors)
Milestone 2.2: Library Skill Downloader (Week 2)
Architecture:
# src/skill_seekers/intelligence/library_manager.py
class LibrarySkillManager:
"""Download and cache library skills"""
def download_skills(self, tech_stack: dict) -> list[Path]:
skills = []
for framework in tech_stack["frameworks"]:
skill_path = self._download_skill(framework)
skills.append(skill_path)
return skills
    def _download_skill(self, name: str) -> Path:
        cache_path = Path(f".skill-seekers/skills/libraries/{name}.skill")
        if cache_path.exists():
            return cache_path  # already cached; avoid duplicate downloads
        # Try skillseekersweb.com API first
        skill = self._fetch_from_api(name)
        if not skill:
            # Fallback: generate from GitHub repo
            skill = self._generate_from_github(name)
        # Cache locally
        cache_path.write_text(skill)
        return cache_path
Library Skill Sources:
1. SkillSeekersWeb.com API (preferred)
   - Pre-generated skills for popular frameworks
   - Curated, high quality
   - Fast download
2. On-Demand Generation (fallback)
   - Generate from the framework's GitHub repo
   - Uses the existing `github_scraper.py`
   - Cached after first generation
Deliverables:
- `LibrarySkillManager` class
- API client for skillseekersweb.com
- Caching system
- `skill-seekers download-libraries` command
Success Criteria:
- Downloads skills for detected frameworks
- Caching works (no duplicate downloads)
- Handles missing skills gracefully
Milestone 2.3: Config Schema v2.0 (Week 3)
Updated Config:
# .skill-seekers/config.yml
version: "2.0"
project_name: skill-seekers
watch_branches:
- main
- development
# NEW: Tech stack configuration
tech_stack:
auto_detect: true
frameworks:
- FastAPI
- React
- PostgreSQL
# Override auto-detection
custom:
- name: "Internal Framework"
skill_url: "https://internal.com/skills/framework.skill"
# Library skills
library_skills:
enabled: true
source: "skillseekersweb.com"
cache_dir: .skill-seekers/skills/libraries
update_frequency: "weekly" # or: never, daily, on-branch-merge
skill_generation:
enabled: true
output_dir: .skill-seekers/skills/codebase
git_hooks:
enabled: true
trigger_on:
- post-merge
Deliverables:
- Config schema v2.0
- Migration from v1.0 to v2.0
- Validation logic
- Documentation
Success Criteria:
- Backward compatible with v1.0
- Clear upgrade path
- Well documented
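The v1.0 → v2.0 migration could be handled as a pure-dict transform; a sketch follows. Field names come from the schemas above, but the exact defaults and the `migrate_v1_to_v2` name are assumptions.

```python
def migrate_v1_to_v2(config: dict) -> dict:
    """Upgrade a v1.0 config dict in a backward-compatible way:
    existing keys are preserved, new sections get safe defaults."""
    if config.get("version") != "1.0":
        return config  # already migrated, or an unknown version
    migrated = dict(config)
    migrated["version"] = "2.0"
    migrated.setdefault("tech_stack", {"auto_detect": True})
    migrated.setdefault("library_skills", {
        "enabled": True,
        "source": "skillseekersweb.com",
        "cache_dir": ".skill-seekers/skills/libraries",
        "update_frequency": "weekly",
    })
    return migrated
```

Because only missing sections are filled in, a v1.0 file keeps working untouched, which is exactly the backward-compatibility criterion above.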
Phase 3: Modular Skill Splitting (3-4 weeks)
Status: 📅 Planned (After Phase 2)
Goal: Split the codebase into modular skills based on config
Milestones
Milestone 3.1: Module Configuration (Week 1)
Config Schema v3.0:
# .skill-seekers/config.yml
version: "3.0"
project_name: skill-seekers
# ... (previous config)
# NEW: Module definitions
modules:
backend:
path: src/skill_seekers/
split_by: namespace # or: directory, feature, custom
skills:
- name: cli
description: "Command-line interface tools"
include:
- "cli/**/*.py"
exclude:
- "cli/**/*_test.py"
- name: scrapers
description: "Web scraping and analysis"
include:
- "cli/doc_scraper.py"
- "cli/github_scraper.py"
- "cli/pdf_scraper.py"
- name: adaptors
description: "Platform adaptor system"
include:
- "cli/adaptors/**/*.py"
- name: mcp
description: "MCP server integration"
include:
- "mcp/**/*.py"
tests:
path: tests/
split_by: directory
skills:
- name: unit-tests
include: ["test_*.py"]
Splitting Strategies:
class ModuleSplitter:
    """Split codebase into modular skills"""

    def split(self, module_config: dict) -> list[Skill]:
        # Strategy dispatch lives in a method: referencing self in a
        # class-body dict would raise a NameError at class definition time.
        strategies = {
            "namespace": self._split_by_namespace,
            "directory": self._split_by_directory,
            "feature": self._split_by_feature,
            "custom": self._split_by_custom,
        }
        return strategies[module_config["split_by"]](module_config)
def _split_by_namespace(self, module_config: dict) -> list[Skill]:
# Python: package.module.submodule
# JS: import { X } from './path/to/module'
pass
def _split_by_directory(self, module_config: dict) -> list[Skill]:
# One skill per top-level directory
pass
def _split_by_feature(self, module_config: dict) -> list[Skill]:
# Group by feature (auth, api, models, etc.)
pass
Deliverables:
- Module splitting engine
- Config schema v3.0
- Support for glob patterns
- Validation logic
Success Criteria:
- Can split skill-seekers into 4-5 modules
- Each module is focused and cohesive
- User has full control via config
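Include/exclude matching for the glob patterns above could be built on the stdlib `fnmatch`, as sketched below. Caveat: `fnmatch`'s `*` crosses path separators, so `cli/**/*.py` here requires at least one subdirectory between `cli/` and the file; a production implementation might prefer a path-aware library such as `pathspec` (an assumption, not a stated dependency).

```python
from fnmatch import fnmatch

def file_in_skill(path: str, include: list[str], exclude: list[str] = ()) -> bool:
    """True if `path` matches any include pattern and no exclude pattern."""
    if not any(fnmatch(path, pat) for pat in include):
        return False
    return not any(fnmatch(path, pat) for pat in exclude)
```

Running exclusion after inclusion matches the config semantics above: a file listed by `include` can still be filtered out by a more specific `exclude` pattern.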
Milestone 3.2: Modular Skill Generation (Week 2-3)
Output Structure:
.skill-seekers/skills/
├── libraries/
│ ├── fastapi.skill
│ ├── anthropic.skill
│ └── beautifulsoup.skill
│
└── codebase/
├── cli.skill # CLI tools
├── scrapers.skill # Scraping logic
├── adaptors.skill # Platform adaptors
├── mcp.skill # MCP server
└── tests.skill # Test suite
Each skill contains:
- Focused documentation (one module only)
- API reference for that module
- Design patterns in that module
- Test examples for that module
- Cross-references to related skills
Deliverables:
- Modular skill generator
- Cross-reference system
- Skill metadata (dependencies, related skills)
- Update generation pipeline
Success Criteria:
- Generates 4-5 focused skills for skill-seekers
- Each skill is 50-200 lines (not too big)
- Cross-references work
Milestone 3.3: Testing & Iteration (Week 4)
Test Plan:
- Generate modular skills for skill-seekers
- Use in Claude Code for 1 week
- Compare vs single skill (Phase 1)
- Iterate on module boundaries
Success Criteria:
- Modular skills are more useful than single skill
- Module boundaries make sense
- Performance is acceptable
Phase 4: Import-Based Clustering (2-3 weeks)
Status: 📅 Planned (After Phase 3)
Goal: Load only the relevant skills based on the current file
Milestones
Milestone 4.1: Import Analyzer (Week 1)
Algorithm:
# src/skill_seekers/intelligence/import_analyzer.py
class ImportAnalyzer:
"""Analyze imports to find relevant skills"""
def find_relevant_skills(
self,
current_file: Path,
available_skills: list[SkillMetadata]
) -> list[Path]:
# 1. Parse imports from current file
imports = self._parse_imports(current_file)
# Example: editing src/cli/doc_scraper.py
# Imports:
# - from anthropic import Anthropic
# - from bs4 import BeautifulSoup
# - from skill_seekers.cli.adaptors import get_adaptor
# 2. Map imports to skills
relevant = []
for imp in imports:
# External library?
if self._is_external(imp):
library_skill = self._find_library_skill(imp)
if library_skill:
relevant.append(library_skill)
# Internal module?
else:
module_skill = self._find_module_skill(imp, available_skills)
if module_skill:
relevant.append(module_skill)
# 3. Add current module's skill
current_skill = self._find_skill_for_file(current_file, available_skills)
if current_skill:
relevant.insert(0, current_skill) # First in list
# 4. Deduplicate and rank
return self._deduplicate(relevant)[:5] # Max 5 skills
Example Output:
# Editing: src/cli/doc_scraper.py
find_relevant_skills("src/cli/doc_scraper.py")
# Returns:
[
"codebase/scrapers.skill", # Current module (highest priority)
"libraries/beautifulsoup.skill", # External import
"libraries/anthropic.skill", # External import
"codebase/adaptors.skill", # Internal import
]
Deliverables:
- `ImportAnalyzer` class
- Python import parser (AST-based)
- JavaScript import parser (regex-based)
- Import-to-skill mapping logic
Success Criteria:
- Correctly identifies imports from files
- Maps imports to skills accurately
- Fast (<100ms for typical file)
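The Python side of `_parse_imports` can lean on the stdlib `ast` module; a sketch that returns top-level package names (the standalone function form is for illustration):

```python
import ast

def parse_imports(source: str) -> set[str]:
    """Return the top-level package names imported by a Python source string."""
    tree = ast.parse(source)
    packages: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                packages.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they stay within the project
            if node.module and node.level == 0:
                packages.add(node.module.split(".")[0])
    return packages
```

Mapping the result to skills is then a lookup: `bs4` → `libraries/beautifulsoup.skill`, `skill_seekers` submodules → the corresponding codebase skill.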
Milestone 4.2: Claude Code Plugin (Week 2)
Plugin Architecture:
# claude_plugins/skill-seekers-intelligence/agent.py
class SkillSeekersIntelligenceAgent:
"""
Claude Code plugin that manages skill loading
"""
def __init__(self):
self.config = self._load_config()
self.import_analyzer = ImportAnalyzer()
self.current_skills = []
async def on_file_open(self, file_path: str):
"""
Hook: User opens a file
Action: Load relevant skills
"""
# Find relevant skills
relevant = self.import_analyzer.find_relevant_skills(
file_path,
self.config.available_skills
)
# Load into Claude context
self.load_skills(relevant)
# Notify user
print(f"📚 Loaded {len(relevant)} relevant skills:")
for skill in relevant:
print(f" - {skill.name}")
async def on_branch_merge(self, branch: str):
"""
Hook: Branch merged
Action: Regenerate skills if needed
"""
if branch in self.config.watch_branches:
print(f"🔄 Regenerating skills for {branch}...")
await self.regenerate_skills(branch)
print("✅ Skills updated")
def load_skills(self, skills: list[Path]):
"""Load skills into Claude context"""
self.current_skills = skills
# Tell Claude which skills are loaded
# (Implementation depends on Claude Code API)
Plugin Hooks:
- `on_file_open`: load relevant skills
- `on_file_save`: update skills if needed
- `on_branch_merge`: regenerate skills
- `on_branch_checkout`: switch skill set
Deliverables:
- Claude Code plugin skeleton
- File open handler
- Branch merge listener
- Skill loader integration
Success Criteria:
- Plugin loads in Claude Code
- File opens trigger skill loading
- Branch merges trigger regeneration
- User sees which skills are loaded
Milestone 4.3: Testing & Dogfooding (Week 3)
Test Plan:
- Install plugin in Claude Code
- Open skill-seekers codebase
- Navigate files, observe skill loading
- Make changes, merge branch, observe regeneration
Success Criteria:
- Correct skills load for each file
- No performance issues
- User experience is smooth
Phase 5: Embedding-Based Clustering (3-4 weeks)
Status: 🔬 Experimental (After Phase 4)
Goal: Smarter clustering using semantic similarity
Milestones
Milestone 5.1: Embedding Generation (Week 1-2)
Architecture:
# src/skill_seekers/intelligence/embeddings.py
import numpy as np
from sentence_transformers import SentenceTransformer

class SkillEmbedder:
    """Generate and cache embeddings for skills and files"""

    def __init__(self):
        # Use a lightweight model for speed
        # Alternatives: OpenAI or Anthropic embedding APIs
        self.model = SentenceTransformer("all-MiniLM-L6-v2")  # fast, good quality
def embed_skill(self, skill_path: Path) -> np.ndarray:
"""Generate embedding for entire skill"""
content = skill_path.read_text()
# Extract key sections
api_ref = self._extract_section(content, "API Reference")
examples = self._extract_section(content, "Examples")
# Embed combined text
text = f"{api_ref}\n{examples}"
embedding = self.model.encode(text)
# Cache for reuse
self._cache_embedding(skill_path, embedding)
return embedding
def embed_file(self, file_path: Path) -> np.ndarray:
"""Generate embedding for current file"""
content = file_path.read_text()
# Embed full content or summary
embedding = self.model.encode(content[:5000]) # First 5K chars
return embedding
Embedding Strategy:
- Skills: Embed once, cache forever (until skill updates)
- Files: Embed on-demand (or cache for open files)
- Model: Lightweight (all-MiniLM-L6-v2 is 80MB, fast)
- Storage: `.skill-seekers/cache/embeddings/`
Deliverables:
- `SkillEmbedder` class
- Embedding cache system
- Similarity search (cosine similarity)
- Benchmark performance
Success Criteria:
- Fast embedding (<100ms per file)
- Accurate similarity (>80% precision)
- Reasonable storage (<100MB for typical project)
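The similarity search itself is small. A pure-Python sketch is shown below for clarity; a real implementation would vectorize the same math with NumPy over the cached embedding matrix.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: list[float], skills: dict[str, list[float]], k: int = 5) -> list[str]:
    """Rank cached skill embeddings by cosine similarity to the query."""
    ranked = sorted(skills, key=lambda name: cosine(query, skills[name]), reverse=True)
    return ranked[:k]
```

With the embeddings cached on disk, this search is linear in the number of skills, which stays cheap for the project sizes targeted here.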
Milestone 5.2: Hybrid Clustering (Week 3)
Algorithm:
class HybridClusteringEngine:
"""
Combine import-based (fast, deterministic)
with embedding-based (smart, flexible)
"""
def find_relevant_skills(
self,
current_file: Path,
available_skills: list[SkillMetadata]
) -> list[Path]:
# Method 1: Import-based (weight: 0.7)
import_skills = self.import_analyzer.find_relevant_skills(
current_file, available_skills
)
# Method 2: Embedding-based (weight: 0.3)
file_embedding = self.embedder.embed_file(current_file)
similar_skills = self._find_similar_skills(
file_embedding, available_skills
)
# Combine with weighted ranking
combined = self._weighted_merge(
import_skills, similar_skills,
weights=[0.7, 0.3]
)
return combined[:5] # Top 5
Why Hybrid?
- Import-based: Precise but misses semantic similarity
- Embedding-based: Flexible but sometimes wrong
- Hybrid: Best of both worlds
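The `_weighted_merge` step could be a simple rank-based score combination; a sketch follows (the `weight / (rank + 1)` scoring scheme is an assumption, one of several reasonable choices):

```python
def weighted_merge(lists: list[list[str]], weights: list[float], top_n: int = 5) -> list[str]:
    """Merge ranked lists: each item scores weight / (rank + 1), summed across lists."""
    scores: dict[str, float] = {}
    for ranked, w in zip(lists, weights):
        for rank, item in enumerate(ranked):
            scores[item] = scores.get(item, 0.0) + w / (rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

A skill that both methods agree on accumulates score from both lists, so agreement naturally outranks either signal alone.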
Deliverables:
- Hybrid clustering algorithm
- Weighted ranking system
- A/B testing framework
- Performance comparison
Success Criteria:
- Better than import-only (A/B test)
- Not significantly slower (<200ms)
- Handles edge cases well
Milestone 5.3: Experimental Features (Week 4)
Ideas to Explore:
- Dynamic Skill Loading: Load skills as conversation progresses
- Conversation Context: Use chat history to refine clustering
- Feedback Loop: Learn from user corrections
- Skill Ranking: Rank skills by usefulness
Deliverables:
- Experimental features (optional)
- Documentation of learnings
- Recommendations for v2.0
Success Criteria:
- Identified valuable experimental features
- Documented what works and what doesn't
📊 Success Metrics
Phase 1 Metrics
- ✅ Auto-regeneration works on branch merge
- ✅ <5 minutes to regenerate skills
- ✅ Git hooks work reliably
Phase 2 Metrics
- ✅ 90%+ accuracy on tech stack detection
- ✅ Library skills downloaded successfully
- ✅ <2 seconds to download cached skill
Phase 3 Metrics
- ✅ Modular skills are 50-200 lines each
- ✅ User can configure module boundaries
- ✅ Cross-references work
Phase 4 Metrics
- ✅ Correct skills load 85%+ of the time
- ✅ <100ms to find relevant skills
- ✅ Plugin works smoothly in Claude Code
Phase 5 Metrics
- ✅ Hybrid clustering beats import-only
- ✅ <200ms to cluster with embeddings
- ✅ Embedding cache < 100MB
🎯 Target Users
Primary: Individual Open Source Developers
- Working on their own projects
- Want better codebase understanding
- Use Claude Code for development
- Value automation over manual work
Secondary: Small Teams
- Onboarding new developers
- Maintaining large codebases
- Need consistent documentation
Future: Enterprise
- Large codebases (1M+ LOC)
- Multiple microservices
- Advanced clustering requirements
📦 Deliverables
User-Facing
- CLI commands (init, regenerate, detect, download)
- Claude Code plugin
- Configuration system (.skill-seekers/config.yml)
- Documentation (user guide, tutorial)
Developer-Facing
- Python library (skill_seekers.intelligence)
- Plugin SDK (for extending)
- API documentation
- Architecture documentation
Infrastructure
- Git hooks
- CI/CD integration
- Embedding cache system
- Skill registry
🚧 Known Challenges
Technical
- Context Window Limits: Even with clustering, large projects may exceed limits
- Embedding Performance: Need fast, lightweight models
- Accuracy: Import analysis may miss implicit dependencies
- Versioning: Skills must stay in sync with code
Product
- Onboarding: Complex system needs good UX
- Configuration: Balance power vs simplicity
- Debugging: When clustering fails, hard to debug
Operational
- Maintenance: More components = more maintenance
- Testing: Hard to test context-aware features
- Documentation: Need excellent docs for adoption
🔮 Future Ideas (Post v1.0)
Advanced Clustering
- Multi-file context (editing 3 files → load related skills)
- Conversation-aware clustering (use chat history)
- Feedback loop (learn from corrections)
Multi-Project
- Workspace support (multiple projects)
- Cross-project skills (shared libraries)
- Monorepo support
Integrations
- VS Code extension
- IntelliJ plugin
- Web dashboard
Advanced Features
- Skill versioning (track changes over time)
- Skill diff (compare versions)
- Skill analytics (usage tracking)
📚 References
- Existing Features: C3.x Codebase Analysis (patterns, examples, architecture)
- Platform: Claude Code plugin system
- Similar Tools: GitHub Copilot, Cursor, Tabnine
- Research: RAG systems, semantic search, code embeddings
Version: 1.0 Status: Research & Design Phase Next Review: After Phase 0 completion Owner: Yusuf Karaaslan