# Skill Seekers Intelligence System - Roadmap **Status:** 🔬 Research & Design Phase **Target:** Open Source, Individual Developers **Timeline:** 6-12 months (iterative releases) **Version:** 1.0 (Initial Design) **Last Updated:** 2026-01-20 --- ## 🎯 Vision Build an **auto-updating, context-aware, multi-skill codebase intelligence system** that: 1. **Detects** your tech stack automatically 2. **Generates** separate skills for libraries and codebase modules 3. **Updates** skills when branches merge (git-based triggers) 4. **Clusters** skills intelligently based on what you're working on 5. **Integrates** with Claude Code via plugin architecture **Think of it as:** A self-maintaining RAG system for your codebase that knows exactly which knowledge to load based on context. --- ## 🏗️ System Architecture Overview ``` ┌─────────────────────────────────────────────────────────────┐ │ Skill Seekers Intelligence System │ ├─────────────────────────────────────────────────────────────┤ │ │ │ Layer 1: PROJECT CONFIGURATION │ │ ┌──────────────────────────────────────────┐ │ │ │ .skill-seekers/ │ │ │ │ ├── config.yml (user editable) │ │ │ │ ├── skills/ (auto-generated) │ │ │ │ ├── cache/ (embeddings) │ │ │ │ └── hooks/ (git triggers) │ │ │ └──────────────────────────────────────────┘ │ │ │ │ Layer 2: SKILL GENERATION ENGINE │ │ ┌──────────────────────────────────────────┐ │ │ │ • Tech Stack Detector │ │ │ │ • Modular Codebase Analyzer (C3.x) │ │ │ │ • Library Skill Downloader │ │ │ │ • Git-Based Trigger System │ │ │ └──────────────────────────────────────────┘ │ │ │ │ Layer 3: SKILL CLUSTERING ENGINE │ │ ┌──────────────────────────────────────────┐ │ │ │ Phase 1: Import-Based (deterministic) │ │ │ │ Phase 2: Embedding-Based (experimental) │ │ │ └──────────────────────────────────────────┘ │ │ │ │ Layer 4: CLAUDE CODE PLUGIN │ │ ┌──────────────────────────────────────────┐ │ │ │ • File Open Handler │ │ │ │ • Branch Merge Listener │ │ │ │ • Context Manager │ │ │ │ • Skill Loader │ │ │ └──────────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────┘ ``` --- ## 📋 Development Phases ### Phase 0: Research & Validation (2-3 weeks) **Status:** 🔬 Current Phase **Goal:** Validate core assumptions, design architecture **Deliverables:** - ✅ Technical architecture document - ✅ Roadmap document (this file) - ✅ POC design for Phase 1 - ✅ Research clustering algorithms - ✅ Design config schema **Success Criteria:** - Clear technical direction - Validated assumptions (import analysis works, etc.) - Ready to build Phase 1 --- ### Phase 1: Git-Based Auto-Generation (3-4 weeks) **Status:** 📅 Planned **Goal:** Auto-generate skills on branch merges #### Milestones **Milestone 1.1: Project Initialization (Week 1)** ```bash # Command skill-seekers init-project --directory . # Creates .skill-seekers/ ├── config.yml # Project configuration ├── hooks/ │ ├── post-merge # Git hook │ └── post-commit # Optional └── skills/ ├── libraries/ # Empty (Phase 2) └── codebase/ # Will be generated ``` **Config Schema (v1.0):** ```yaml # .skill-seekers/config.yml version: "1.0" project_name: skill-seekers watch_branches: - main - development # Phase 1: Simple, no modules yet skill_generation: enabled: true output_dir: .skill-seekers/skills/codebase git_hooks: enabled: true trigger_on: - post-merge - post-commit # optional ``` **Deliverables:** - [ ] `skill-seekers init-project` command - [ ] Config schema v1.0 - [ ] Git hook installer - [ ] Project directory structure creator **Success Criteria:** - Running `init-project` sets up directory structure - Git hooks are installed correctly - Config file is created with sensible defaults --- **Milestone 1.2: Git Hook Integration (Week 2)** **Git Hook Logic:** ```bash #!/bin/bash # .skill-seekers/hooks/post-merge # Check if we're on a watched branch CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD) WATCH_BRANCHES=$(yq '.watch_branches[]' .skill-seekers/config.yml) if echo "$WATCH_BRANCHES" | grep -q "$CURRENT_BRANCH"; then echo "🔄 Branch merge detected on $CURRENT_BRANCH" echo "🚀 Regenerating skills..." skill-seekers regenerate-skills --branch "$CURRENT_BRANCH" echo "✅ Skills updated" fi ``` **Deliverables:** - [ ] Git hook templates - [ ] Hook installer/uninstaller - [ ] Branch detection logic - [ ] Hook execution logging **Success Criteria:** - Merging to watched branch triggers skill regeneration - Only watched branches trigger updates - Hooks can be enabled/disabled via config --- **Milestone 1.3: Basic Skill Regeneration (Week 3)** **Command:** ```bash skill-seekers regenerate-skills --branch main # Runs: # 1. Detects changed files since last generation # 2. Runs codebase analysis (existing C3.x features) # 3. Generates single skill: codebase.skill # 4. Updates .skill-seekers/skills/codebase/codebase.skill ``` **Phase 1 Scope (Simple):** - Single skill for entire codebase - No modularization yet (Phase 3) - No library skills yet (Phase 2) - No clustering yet (Phase 4) **Deliverables:** - [ ] `regenerate-skills` command - [ ] Change detection (git diff) - [ ] Incremental vs full regeneration logic - [ ] Skill versioning (timestamp) **Success Criteria:** - Manual regeneration works - Git hook triggers regeneration - Skill is usable in Claude Code --- **Milestone 1.4: Dogfooding & Testing (Week 4)** **Test on skill-seekers itself:** ```bash cd Skill_Seekers/ skill-seekers init-project --directory . # Make code change git checkout -b test-auto-regen echo "# Test" >> README.md git commit -am "test: Auto-regen test" # Merge to development git checkout development git merge test-auto-regen # → Should trigger skill regeneration # Verify cat .skill-seekers/skills/codebase/codebase.skill ``` **Deliverables:** - [ ] End-to-end test on skill-seekers - [ ] Performance benchmarks - [ ] Bug fixes - [ ] Documentation updates **Success Criteria:** - Works on skill-seekers codebase - Regeneration completes in <5 minutes - Generated skill is high quality - No major bugs --- ### Phase 2: Tech Stack Detection & Library Skills (2-3 weeks) **Status:** 📅 Planned (After Phase 1) **Goal:** Auto-detect tech stack and download library skills #### Milestones **Milestone 2.1: Tech Stack Detector (Week 1)** **Detection Strategy:** ```python # src/skill_seekers/intelligence/stack_detector.py class TechStackDetector: """Detect tech stack from project files""" def detect(self, project_dir: Path) -> dict: stack = { "languages": [], "frameworks": [], "databases": [], "tools": [] } # Python ecosystem if (project_dir / "requirements.txt").exists(): stack["languages"].append("Python") deps = self._parse_requirements() if "fastapi" in deps: stack["frameworks"].append("FastAPI") if "django" in deps: stack["frameworks"].append("Django") if "flask" in deps: stack["frameworks"].append("Flask") # JavaScript/TypeScript ecosystem if (project_dir / "package.json").exists(): deps = self._parse_package_json() if "typescript" in deps: stack["languages"].append("TypeScript") else: stack["languages"].append("JavaScript") if "react" in deps: stack["frameworks"].append("React") if "vue" in deps: stack["frameworks"].append("Vue") if "next" in deps: stack["frameworks"].append("Next.js") # Database detection if (project_dir / ".env").exists(): env = self._parse_env() db_url = env.get("DATABASE_URL", "") if "postgres" in db_url: stack["databases"].append("PostgreSQL") if "mysql" in db_url: stack["databases"].append("MySQL") if "mongodb" in db_url: stack["databases"].append("MongoDB") # Docker services if (project_dir / "docker-compose.yml").exists(): services = self._parse_docker_compose() stack["tools"].extend(services) return stack ``` **Supported Ecosystems (v1.0):** - **Python:** FastAPI, Django, Flask, SQLAlchemy - **JavaScript/TypeScript:** React, Vue, Next.js, Express - **Databases:** PostgreSQL, MySQL, MongoDB, Redis - **Tools:** Docker, Nginx, Celery **Deliverables:** - [ ] `TechStackDetector` class - [ ] Parsers for common config files - [ ] Detection accuracy tests - [ ] `skill-seekers detect-stack` command **Success Criteria:** - 90%+ accuracy on common stacks - Fast (<1 second) - Extensible (easy to add new detectors) --- **Milestone 2.2: Library Skill Downloader (Week 2)** **Architecture:** ```python # src/skill_seekers/intelligence/library_manager.py class LibrarySkillManager: """Download and cache library skills""" def download_skills(self, tech_stack: dict) -> list[Path]: skills = [] for framework in tech_stack["frameworks"]: skill_path = self._download_skill(framework) skills.append(skill_path) return skills def _download_skill(self, name: str) -> Path: # Try skillseekersweb.com API first skill = self._fetch_from_api(name) if not skill: # Fallback: generate from GitHub repo skill = self._generate_from_github(name) # Cache locally cache_path = Path(f".skill-seekers/skills/libraries/{name}.skill") cache_path.write_text(skill) return cache_path ``` **Library Skill Sources:** 1. **SkillSeekersWeb.com API** (preferred) - Pre-generated skills for popular frameworks - Curated, high-quality - Fast download 2. **On-Demand Generation** (fallback) - Generate from framework's GitHub repo - Uses existing `github_scraper.py` - Cached after first generation **Deliverables:** - [ ] `LibrarySkillManager` class - [ ] API client for skillseekersweb.com - [ ] Caching system - [ ] `skill-seekers download-libraries` command **Success Criteria:** - Downloads skills for detected frameworks - Caching works (no duplicate downloads) - Handles missing skills gracefully --- **Milestone 2.3: Config Schema v2.0 (Week 3)** **Updated Config:** ```yaml # .skill-seekers/config.yml version: "2.0" project_name: skill-seekers watch_branches: - main - development # NEW: Tech stack configuration tech_stack: auto_detect: true frameworks: - FastAPI - React - PostgreSQL # Override auto-detection custom: - name: "Internal Framework" skill_url: "https://internal.com/skills/framework.skill" # Library skills library_skills: enabled: true source: "skillseekersweb.com" cache_dir: .skill-seekers/skills/libraries update_frequency: "weekly" # or: never, daily, on-branch-merge skill_generation: enabled: true output_dir: .skill-seekers/skills/codebase git_hooks: enabled: true trigger_on: - post-merge ``` **Deliverables:** - [ ] Config schema v2.0 - [ ] Migration from v1.0 to v2.0 - [ ] Validation logic - [ ] Documentation **Success Criteria:** - Backward compatible with v1.0 - Clear upgrade path - Well documented --- ### Phase 3: Modular Skill Splitting (3-4 weeks) **Status:** 📅 Planned (After Phase 2) **Goal:** Split codebase into modular skills based on config #### Milestones **Milestone 3.1: Module Configuration (Week 1)** **Config Schema v3.0:** ```yaml # .skill-seekers/config.yml version: "3.0" project_name: skill-seekers # ... (previous config) # NEW: Module definitions modules: backend: path: src/skill_seekers/ split_by: namespace # or: directory, feature, custom skills: - name: cli description: "Command-line interface tools" include: - "cli/**/*.py" exclude: - "cli/**/*_test.py" - name: scrapers description: "Web scraping and analysis" include: - "cli/doc_scraper.py" - "cli/github_scraper.py" - "cli/pdf_scraper.py" - name: adaptors description: "Platform adaptor system" include: - "cli/adaptors/**/*.py" - name: mcp description: "MCP server integration" include: - "mcp/**/*.py" tests: path: tests/ split_by: directory skills: - name: unit-tests include: ["test_*.py"] ``` **Splitting Strategies:** ```python class ModuleSplitter: """Split codebase into modular skills""" STRATEGIES = { "namespace": self._split_by_namespace, "directory": self._split_by_directory, "feature": self._split_by_feature, "custom": self._split_by_custom, } def _split_by_namespace(self, module_config: dict) -> list[Skill]: # Python: package.module.submodule # JS: import { X } from './path/to/module' pass def _split_by_directory(self, module_config: dict) -> list[Skill]: # One skill per top-level directory pass def _split_by_feature(self, module_config: dict) -> list[Skill]: # Group by feature (auth, api, models, etc.) pass ``` **Deliverables:** - [ ] Module splitting engine - [ ] Config schema v3.0 - [ ] Support for glob patterns - [ ] Validation logic **Success Criteria:** - Can split skill-seekers into 4-5 modules - Each module is focused and cohesive - User has full control via config --- **Milestone 3.2: Modular Skill Generation (Week 2-3)** **Output Structure:** ``` .skill-seekers/skills/ ├── libraries/ │ ├── fastapi.skill │ ├── anthropic.skill │ └── beautifulsoup.skill │ └── codebase/ ├── cli.skill # CLI tools ├── scrapers.skill # Scraping logic ├── adaptors.skill # Platform adaptors ├── mcp.skill # MCP server └── tests.skill # Test suite ``` **Each skill contains:** - Focused documentation (one module only) - API reference for that module - Design patterns in that module - Test examples for that module - Cross-references to related skills **Deliverables:** - [ ] Modular skill generator - [ ] Cross-reference system - [ ] Skill metadata (dependencies, related skills) - [ ] Update generation pipeline **Success Criteria:** - Generates 4-5 focused skills for skill-seekers - Each skill is 50-200 lines (not too big) - Cross-references work --- **Milestone 3.3: Testing & Iteration (Week 4)** **Test Plan:** 1. Generate modular skills for skill-seekers 2. Use in Claude Code for 1 week 3. Compare vs single skill (Phase 1) 4. Iterate on module boundaries **Success Criteria:** - Modular skills are more useful than single skill - Module boundaries make sense - Performance is acceptable --- ### Phase 4: Import-Based Clustering (2-3 weeks) **Status:** 📅 Planned (After Phase 3) **Goal:** Load only relevant skills based on current file #### Milestones **Milestone 4.1: Import Analyzer (Week 1)** **Algorithm:** ```python # src/skill_seekers/intelligence/import_analyzer.py class ImportAnalyzer: """Analyze imports to find relevant skills""" def find_relevant_skills( self, current_file: Path, available_skills: list[SkillMetadata] ) -> list[Path]: # 1. Parse imports from current file imports = self._parse_imports(current_file) # Example: editing src/cli/doc_scraper.py # Imports: # - from anthropic import Anthropic # - from bs4 import BeautifulSoup # - from skill_seekers.cli.adaptors import get_adaptor # 2. Map imports to skills relevant = [] for imp in imports: # External library? if self._is_external(imp): library_skill = self._find_library_skill(imp) if library_skill: relevant.append(library_skill) # Internal module? else: module_skill = self._find_module_skill(imp, available_skills) if module_skill: relevant.append(module_skill) # 3. Add current module's skill current_skill = self._find_skill_for_file(current_file, available_skills) if current_skill: relevant.insert(0, current_skill) # First in list # 4. Deduplicate and rank return self._deduplicate(relevant)[:5] # Max 5 skills ``` **Example Output:** ```python # Editing: src/cli/doc_scraper.py find_relevant_skills("src/cli/doc_scraper.py") # Returns: [ "codebase/scrapers.skill", # Current module (highest priority) "libraries/beautifulsoup.skill", # External import "libraries/anthropic.skill", # External import "codebase/adaptors.skill", # Internal import ] ``` **Deliverables:** - [ ] `ImportAnalyzer` class - [ ] Python import parser (AST-based) - [ ] JavaScript import parser (regex-based) - [ ] Import-to-skill mapping logic **Success Criteria:** - Correctly identifies imports from files - Maps imports to skills accurately - Fast (<100ms for typical file) --- **Milestone 4.2: Claude Code Plugin (Week 2)** **Plugin Architecture:** ```python # claude_plugins/skill-seekers-intelligence/agent.py class SkillSeekersIntelligenceAgent: """ Claude Code plugin that manages skill loading """ def __init__(self): self.config = self._load_config() self.import_analyzer = ImportAnalyzer() self.current_skills = [] async def on_file_open(self, file_path: str): """ Hook: User opens a file Action: Load relevant skills """ # Find relevant skills relevant = self.import_analyzer.find_relevant_skills( file_path, self.config.available_skills ) # Load into Claude context self.load_skills(relevant) # Notify user print(f"📚 Loaded {len(relevant)} relevant skills:") for skill in relevant: print(f" - {skill.name}") async def on_branch_merge(self, branch: str): """ Hook: Branch merged Action: Regenerate skills if needed """ if branch in self.config.watch_branches: print(f"🔄 Regenerating skills for {branch}...") await self.regenerate_skills(branch) print("✅ Skills updated") def load_skills(self, skills: list[Path]): """Load skills into Claude context""" self.current_skills = skills # Tell Claude which skills are loaded # (Implementation depends on Claude Code API) ``` **Plugin Hooks:** - `on_file_open` - Load relevant skills - `on_file_save` - Update skills if needed - `on_branch_merge` - Regenerate skills - `on_branch_checkout` - Switch skill set **Deliverables:** - [ ] Claude Code plugin skeleton - [ ] File open handler - [ ] Branch merge listener - [ ] Skill loader integration **Success Criteria:** - Plugin loads in Claude Code - File opens trigger skill loading - Branch merges trigger regeneration - User sees which skills are loaded --- **Milestone 4.3: Testing & Dogfooding (Week 3)** **Test Plan:** 1. Install plugin in Claude Code 2. Open skill-seekers codebase 3. Navigate files, observe skill loading 4. Make changes, merge branch, observe regeneration **Success Criteria:** - Correct skills load for each file - No performance issues - User experience is smooth --- ### Phase 5: Embedding-Based Clustering (3-4 weeks) **Status:** 🔬 Experimental (After Phase 4) **Goal:** Smarter clustering using semantic similarity #### Milestones **Milestone 5.1: Embedding Generation (Week 1-2)** **Architecture:** ```python # src/skill_seekers/intelligence/embeddings.py class SkillEmbedder: """Generate and cache embeddings for skills and files""" def __init__(self): # Use lightweight model for speed # Options: sentence-transformers, OpenAI, Anthropic self.model = "all-MiniLM-L6-v2" # Fast, good quality def embed_skill(self, skill_path: Path) -> np.ndarray: """Generate embedding for entire skill""" content = skill_path.read_text() # Extract key sections api_ref = self._extract_section(content, "API Reference") examples = self._extract_section(content, "Examples") # Embed combined text text = f"{api_ref}\n{examples}" embedding = self.model.encode(text) # Cache for reuse self._cache_embedding(skill_path, embedding) return embedding def embed_file(self, file_path: Path) -> np.ndarray: """Generate embedding for current file""" content = file_path.read_text() # Embed full content or summary embedding = self.model.encode(content[:5000]) # First 5K chars return embedding ``` **Embedding Strategy:** - **Skills:** Embed once, cache forever (until skill updates) - **Files:** Embed on-demand (or cache for open files) - **Model:** Lightweight (all-MiniLM-L6-v2 is 80MB, fast) - **Storage:** `.skill-seekers/cache/embeddings/` **Deliverables:** - [ ] `SkillEmbedder` class - [ ] Embedding cache system - [ ] Similarity search (cosine similarity) - [ ] Benchmark performance **Success Criteria:** - Fast embedding (<100ms per file) - Accurate similarity (>80% precision) - Reasonable storage (<100MB for typical project) --- **Milestone 5.2: Hybrid Clustering (Week 3)** **Algorithm:** ```python class HybridClusteringEngine: """ Combine import-based (fast, deterministic) with embedding-based (smart, flexible) """ def find_relevant_skills( self, current_file: Path, available_skills: list[SkillMetadata] ) -> list[Path]: # Method 1: Import-based (weight: 0.7) import_skills = self.import_analyzer.find_relevant_skills( current_file, available_skills ) # Method 2: Embedding-based (weight: 0.3) file_embedding = self.embedder.embed_file(current_file) similar_skills = self._find_similar_skills( file_embedding, available_skills ) # Combine with weighted ranking combined = self._weighted_merge( import_skills, similar_skills, weights=[0.7, 0.3] ) return combined[:5] # Top 5 ``` **Why Hybrid?** - Import-based: Precise but misses semantic similarity - Embedding-based: Flexible but sometimes wrong - Hybrid: Best of both worlds **Deliverables:** - [ ] Hybrid clustering algorithm - [ ] Weighted ranking system - [ ] A/B testing framework - [ ] Performance comparison **Success Criteria:** - Better than import-only (A/B test) - Not significantly slower (<200ms) - Handles edge cases well --- **Milestone 5.3: Experimental Features (Week 4)** **Ideas to Explore:** 1. **Dynamic Skill Loading:** Load skills as conversation progresses 2. **Conversation Context:** Use chat history to refine clustering 3. **Feedback Loop:** Learn from user corrections 4. **Skill Ranking:** Rank skills by usefulness **Deliverables:** - [ ] Experimental features (optional) - [ ] Documentation of learnings - [ ] Recommendations for v2.0 **Success Criteria:** - Identified valuable experimental features - Documented what works and what doesn't --- ## 📊 Success Metrics ### Phase 1 Metrics - ✅ Auto-regeneration works on branch merge - ✅ <5 minutes to regenerate skills - ✅ Git hooks work reliably ### Phase 2 Metrics - ✅ 90%+ accuracy on tech stack detection - ✅ Library skills downloaded successfully - ✅ <2 seconds to download cached skill ### Phase 3 Metrics - ✅ Modular skills are 50-200 lines each - ✅ User can configure module boundaries - ✅ Cross-references work ### Phase 4 Metrics - ✅ Correct skills load 85%+ of the time - ✅ <100ms to find relevant skills - ✅ Plugin works smoothly in Claude Code ### Phase 5 Metrics - ✅ Hybrid clustering beats import-only - ✅ <200ms to cluster with embeddings - ✅ Embedding cache < 100MB --- ## 🎯 Target Users ### Primary: Individual Open Source Developers - Working on their own projects - Want better codebase understanding - Use Claude Code for development - Value automation over manual work ### Secondary: Small Teams - Onboarding new developers - Maintaining large codebases - Need consistent documentation ### Future: Enterprise - Large codebases (1M+ LOC) - Multiple microservices - Advanced clustering requirements --- ## 📦 Deliverables ### User-Facing - [ ] CLI commands (init, regenerate, detect, download) - [ ] Claude Code plugin - [ ] Configuration system (.skill-seekers/config.yml) - [ ] Documentation (user guide, tutorial) ### Developer-Facing - [ ] Python library (skill_seekers.intelligence) - [ ] Plugin SDK (for extending) - [ ] API documentation - [ ] Architecture documentation ### Infrastructure - [ ] Git hooks - [ ] CI/CD integration - [ ] Embedding cache system - [ ] Skill registry --- ## 🚧 Known Challenges ### Technical 1. **Context Window Limits:** Even with clustering, large projects may exceed limits 2. **Embedding Performance:** Need fast, lightweight models 3. **Accuracy:** Import analysis may miss implicit dependencies 4. **Versioning:** Skills must stay in sync with code ### Product 1. **Onboarding:** Complex system needs good UX 2. **Configuration:** Balance power vs simplicity 3. **Debugging:** When clustering fails, hard to debug ### Operational 1. **Maintenance:** More components = more maintenance 2. **Testing:** Hard to test context-aware features 3. **Documentation:** Need excellent docs for adoption --- ## 🔮 Future Ideas (Post v1.0) ### Advanced Clustering - [ ] Multi-file context (editing 3 files → load related skills) - [ ] Conversation-aware clustering (use chat history) - [ ] Feedback loop (learn from corrections) ### Multi-Project - [ ] Workspace support (multiple projects) - [ ] Cross-project skills (shared libraries) - [ ] Monorepo support ### Integrations - [ ] VS Code extension - [ ] IntelliJ plugin - [ ] Web dashboard ### Advanced Features - [ ] Skill versioning (track changes over time) - [ ] Skill diff (compare versions) - [ ] Skill analytics (usage tracking) --- ## 📚 References - **Existing Features:** C3.x Codebase Analysis (patterns, examples, architecture) - **Platform:** Claude Code plugin system - **Similar Tools:** GitHub Copilot, Cursor, Tabnine - **Research:** RAG systems, semantic search, code embeddings --- **Version:** 1.0 **Status:** Research & Design Phase **Next Review:** After Phase 0 completion **Owner:** Yusuf Karaaslan