fix(A1.3): Add comprehensive validation to submit_config MCP tool

Issue: #11 (A1.3 - Add MCP tool to submit custom configs) ## Summary Fixed submit_config MCP tool to use ConfigValidator for comprehensive validation instead of basic 3-field checks. Now supports both legacy and unified config formats with detailed error messages and validation warnings. ## Critical Gaps Fixed (6 total) 1. ✅ Missing comprehensive validation (HIGH) - Only checked 3 fields 2. ✅ No unified config support (HIGH) - Couldn't handle multi-source configs 3. ✅ No test coverage (MEDIUM) - Zero tests for submit_config_tool 4. ✅ No URL format validation (MEDIUM) - Accepted malformed URLs 5. ✅ No warnings for unlimited scraping (LOW) - Silent config issues 6. ✅ No url_patterns validation (MEDIUM) - No selector structure checks ## Changes Made ### Phase 1: Validation Logic (server.py lines 1224-1380) - Added ConfigValidator import with graceful degradation - Replaced basic validation (3 fields) with comprehensive ConfigValidator.validate() - Enhanced category detection for unified multi-source configs - Added validation warnings collection (unlimited scraping, missing max_pages) - Updated GitHub issue template with: * Config format type (Unified vs Legacy) * Validation warnings section * Updated documentation URL handling for unified configs * Checklist showing "Config validated with ConfigValidator" ### Phase 2: Test Coverage (test_mcp_server.py lines 617-769) Added 8 comprehensive test cases: 1. test_submit_config_requires_token - GitHub token requirement 2. test_submit_config_validates_required_fields - Required field validation 3. test_submit_config_validates_name_format - Name format validation 4. test_submit_config_validates_url_format - URL format validation 5. test_submit_config_accepts_legacy_format - Legacy config acceptance 6. test_submit_config_accepts_unified_format - Unified config acceptance 7. test_submit_config_from_file_path - File path input support 8. test_submit_config_detects_category - Category auto-detection ### Phase 3: Documentation Updates - Updated Issue #11 with completion notes - Updated tool description to mention format support - Updated CHANGELOG.md with fix details - Added EVOLUTION_ANALYSIS.md for deep architecture analysis ## Validation Improvements ### Before: ```python required_fields = ["name", "description", "base_url"] missing_fields = [field for field in required_fields if field not in config_data] if missing_fields: return error ``` ### After: ```python validator = ConfigValidator(config_data) validator.validate() # Comprehensive validation: # - Name format (alphanumeric, hyphens, underscores only) # - URL formats (must start with http:// or https://) # - Selectors structure (dict with proper keys) # - Rate limits (non-negative numbers) # - Max pages (positive integer or -1) # - Supports both legacy AND unified formats # - Provides detailed error messages with examples ``` ## Test Results ✅ All 427 tests passing (no regressions) ✅ 8 new tests for submit_config_tool ✅ No breaking changes ## Files Modified - src/skill_seekers/mcp/server.py (157 lines changed) - tests/test_mcp_server.py (157 lines added) - CHANGELOG.md (12 lines added) - EVOLUTION_ANALYSIS.md (500+ lines, new file) ## Issue Resolution Closes #11 - A1.3 now fully implemented with comprehensive validation, test coverage, and support for both config formats. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-21 18:32:20 +03:00
parent 1e50290fc7
commit cee3fcf025
4 changed files with 963 additions and 19 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+### Fixed
+- **submit_config MCP tool** - Comprehensive validation and format support ([#11](https://github.com/yusufkaraaslan/Skill_Seekers/issues/11))
+  - Now uses ConfigValidator for comprehensive validation (previously only checked 3 fields)
+  - Validates name format (alphanumeric, hyphens, underscores only)
+  - Validates URL formats (must start with http:// or https://)
+  - Validates selectors, patterns, rate limits, and max_pages
+  - **Supports both legacy and unified config formats**
+  - Provides detailed error messages with validation failures and examples
+  - Adds warnings for unlimited scraping configurations
+  - Enhanced category detection for multi-source configs
+  - 8 comprehensive test cases added to test_mcp_server.py
+  - Updated GitHub issue template with format type and validation warnings
+
 ---

 ## [2.1.1] - 2025-11-30
--- a/EVOLUTION_ANALYSIS.md
+++ b/EVOLUTION_ANALYSIS.md
@@ -0,0 +1,710 @@
+# Skill Seekers Evolution Analysis
+**Date**: 2025-12-21
+**Focus**: A1.3 Completion + A1.9 Multi-Source Architecture
+
+---
+
+## 🔍 Part 1: A1.3 Implementation Gap Analysis
+
+### What We Built vs What Was Required
+
+#### ✅ **Completed Requirements:**
+1. MCP tool `submit_config` - ✅ DONE
+2. Creates GitHub issue in skill-seekers-configs repo - ✅ DONE
+3. Uses issue template format - ✅ DONE
+4. Auto-labels (config-submission, needs-review) - ✅ DONE
+5. Returns GitHub issue URL - ✅ DONE
+6. Accepts config_path or config_json - ✅ DONE
+7. Validates required fields - ✅ DONE (basic)
+
+#### ❌ **Missing/Incomplete:**
+1. **Robust Validation** - Issue says "same validation as `validate_config` tool"
+   - **Current**: Only checks `name`, `description`, `base_url` exist
+   - **Should**: Use `config_validator.py` which validates:
+     - URL formats (http/https)
+     - Selector structure
+     - Pattern arrays
+     - Unified vs legacy format
+     - Source types (documentation, github, pdf)
+     - Merge modes
+     - All nested fields
+
+2. **URL Validation** - Not checking if URLs are actually valid
+   - **Current**: Just checks if `base_url` exists
+   - **Should**: Validate URL format, check reachability (optional)
+
+3. **Schema Validation** - Not using the full validator
+   - **Current**: Manual field checks
+   - **Should**: `ConfigValidator(config_data).validate()`
+
+### 🔧 **What Needs to be Fixed:**
+
+```python
+# CURRENT (submit_config_tool):
+required_fields = ["name", "description", "base_url"]
+missing_fields = [field for field in required_fields if field not in config_data]
+# Basic but incomplete
+
+# SHOULD BE:
+from config_validator import ConfigValidator
+validator = ConfigValidator(config_data)
+try:
+    validator.validate()  # Comprehensive validation
+except ValueError as e:
+    return error_message(str(e))
+```
+
+---
+
+## 🚀 Part 2: A1.9 Multi-Source Architecture - The Big Picture
+
+### Current State: Single Source System
+
+```
+User → fetch_config → API → skill-seekers-configs (GitHub) → Download
+```
+
+**Limitations:**
+- Only ONE source of configs (official public repo)
+- Can't use private configs
+- Can't share configs within teams
+- Can't create custom collections
+- Centralized dependency
+
+### Future State: Multi-Source Federation
+
+```
+User → fetch_config → Source Manager → [
+    Priority 1: Official (public)
+    Priority 2: Team Private Repo
+    Priority 3: Personal Configs
+    Priority 4: Custom Collections
+] → Download
+```
+
+**Capabilities:**
+- Multiple config sources
+- Public + Private repos
+- Team collaboration
+- Personal configs
+- Custom curated collections
+- Decentralized, federated system
+
+---
+
+## 🎯 Part 3: Evolution Vision - The Three Horizons
+
+### **Horizon 1: Official Configs (CURRENT - A1.1 to A1.3)**
+✅ **Status**: Complete
+**What**: Single public repository (skill-seekers-configs)
+**Users**: Everyone, public community
+**Paradigm**: Centralized, curated, verified configs
+
+### **Horizon 2: Multi-Source Federation (A1.9)**
+🔨 **Status**: Proposed
+**What**: Support multiple git repositories as config sources
+**Users**: Teams (3-5 people), organizations, individuals
+**Paradigm**: Decentralized, federated, user-controlled
+
+**Key Features:**
+- Direct git URL support
+- Named sources (register once, use many times)
+- Authentication (GitHub/GitLab/Bitbucket tokens)
+- Caching (local clones)
+- Priority-based resolution
+- Public OR private repos
+
+**Implementation:**
+```python
+# Option 1: Direct URL (one-off)
+fetch_config(
+    git_url='https://github.com/myteam/configs.git',
+    config_name='internal-api',
+    token='$GITHUB_TOKEN'
+)
+
+# Option 2: Named source (reusable)
+add_config_source(
+    name='team',
+    git_url='https://github.com/myteam/configs.git',
+    token='$GITHUB_TOKEN'
+)
+fetch_config(source='team', config_name='internal-api')
+
+# Option 3: Config file
+# ~/.skill-seekers/sources.json
+{
+  "sources": [
+    {"name": "official", "git_url": "...", "priority": 1},
+    {"name": "team", "git_url": "...", "priority": 2, "token": "$TOKEN"}
+  ]
+}
+```
+
+### **Horizon 3: Skill Marketplace (Future - A1.13+)**
+💭 **Status**: Vision
+**What**: Full ecosystem of shareable configs AND skills
+**Users**: Entire community, marketplace dynamics
+**Paradigm**: Platform, network effects, curation
+
+**Key Features:**
+- Browse all public sources
+- Star/rate configs
+- Download counts, popularity
+- Verified configs (badge system)
+- Share built skills (not just configs)
+- Continuous updates (watch repos)
+- Notifications
+
+---
+
+## 🏗️ Part 4: Technical Architecture for A1.9
+
+### **Layer 1: Source Management**
+
+```python
+# ~/.skill-seekers/sources.json
+{
+  "version": "1.0",
+  "default_source": "official",
+  "sources": [
+    {
+      "name": "official",
+      "type": "git",
+      "git_url": "https://github.com/yusufkaraaslan/skill-seekers-configs.git",
+      "branch": "main",
+      "enabled": true,
+      "priority": 1,
+      "cache_ttl": 86400  # 24 hours
+    },
+    {
+      "name": "team",
+      "type": "git",
+      "git_url": "https://github.com/myteam/private-configs.git",
+      "branch": "main",
+      "token_env": "TEAM_GITHUB_TOKEN",
+      "enabled": true,
+      "priority": 2,
+      "cache_ttl": 3600  # 1 hour
+    }
+  ]
+}
+```
+
+**Source Manager Class:**
+```python
+class SourceManager:
+    def __init__(self, config_file="~/.skill-seekers/sources.json"):
+        self.config_file = Path(config_file).expanduser()
+        self.sources = self.load_sources()
+
+    def add_source(self, name, git_url, token=None, priority=None):
+        """Register a new config source"""
+
+    def remove_source(self, name):
+        """Remove a registered source"""
+
+    def list_sources(self):
+        """List all registered sources"""
+
+    def get_source(self, name):
+        """Get source by name"""
+
+    def search_config(self, config_name):
+        """Search for config across all sources (priority order)"""
+```
+
+### **Layer 2: Git Operations**
+
+```python
+class GitConfigRepo:
+    def __init__(self, source_config):
+        self.url = source_config['git_url']
+        self.branch = source_config.get('branch', 'main')
+        self.cache_dir = Path("~/.skill-seekers/cache") / source_config['name']
+        self.token = self._get_token(source_config)
+
+    def clone_or_update(self):
+        """Clone if not exists, else pull"""
+        if not self.cache_dir.exists():
+            self._clone()
+        else:
+            self._pull()
+
+    def _clone(self):
+        """Shallow clone for efficiency"""
+        # git clone --depth 1 --branch {branch} {url} {cache_dir}
+
+    def _pull(self):
+        """Update existing clone"""
+        # git -C {cache_dir} pull
+
+    def list_configs(self):
+        """Scan cache_dir for .json files"""
+
+    def get_config(self, config_name):
+        """Read specific config file"""
+```
+
+**Library Choice:**
+- **GitPython**: High-level, Pythonic API ✅ RECOMMENDED
+- **pygit2**: Low-level, faster, complex
+- **subprocess**: Simple, works everywhere
+
+### **Layer 3: Config Discovery & Resolution**
+
+```python
+class ConfigDiscovery:
+    def __init__(self, source_manager):
+        self.source_manager = source_manager
+
+    def find_config(self, config_name, source=None):
+        """
+        Find config across sources
+
+        Args:
+            config_name: Name of config to find
+            source: Optional specific source name
+
+        Returns:
+            (source_name, config_path, config_data)
+        """
+        if source:
+            # Search in specific source only
+            return self._search_source(source, config_name)
+        else:
+            # Search all sources in priority order
+            for src in self.source_manager.get_sources_by_priority():
+                result = self._search_source(src['name'], config_name)
+                if result:
+                    return result
+            return None
+
+    def list_all_configs(self, source=None):
+        """List configs from one or all sources"""
+
+    def resolve_conflicts(self, config_name):
+        """Find all sources that have this config"""
+```
+
+### **Layer 4: Authentication & Security**
+
+```python
+class TokenManager:
+    def __init__(self):
+        self.use_keyring = self._check_keyring()
+
+    def _check_keyring(self):
+        """Check if keyring library available"""
+        try:
+            import keyring
+            return True
+        except ImportError:
+            return False
+
+    def store_token(self, source_name, token):
+        """Store token securely"""
+        if self.use_keyring:
+            import keyring
+            keyring.set_password("skill-seekers", source_name, token)
+        else:
+            # Fall back to env var prompt
+            print(f"Set environment variable: {source_name.upper()}_TOKEN")
+
+    def get_token(self, source_name, env_var=None):
+        """Retrieve token"""
+        # Try keyring first
+        if self.use_keyring:
+            import keyring
+            token = keyring.get_password("skill-seekers", source_name)
+            if token:
+                return token
+
+        # Try environment variable
+        if env_var:
+            return os.environ.get(env_var)
+
+        # Try default patterns
+        return os.environ.get(f"{source_name.upper()}_TOKEN")
+```
+
+---
+
+## 📊 Part 5: Use Case Matrix
+
+| Use Case | Users | Visibility | Auth | Priority |
+|----------|-------|------------|------|----------|
+| **Official Configs** | Everyone | Public | None | High |
+| **Team Configs** | 3-5 people | Private | GitHub Token | Medium |
+| **Personal Configs** | Individual | Private | GitHub Token | Low |
+| **Public Collections** | Community | Public | None | Medium |
+| **Enterprise Configs** | Organization | Private | GitLab Token | High |
+
+### **Scenario 1: Startup Team (5 developers)**
+
+**Setup:**
+```bash
+# Team lead creates private repo
+gh repo create startup/skill-configs --private
+cd startup-skill-configs
+mkdir -p official/internal-apis
+# Add configs for internal services
+git add . && git commit -m "Add internal API configs"
+git push
+```
+
+**Team Usage:**
+```python
+# Each developer adds source (one-time)
+add_config_source(
+    name='startup',
+    git_url='https://github.com/startup/skill-configs.git',
+    token='$GITHUB_TOKEN'
+)
+
+# Daily usage
+fetch_config(source='startup', config_name='backend-api')
+fetch_config(source='startup', config_name='frontend-components')
+fetch_config(source='startup', config_name='mobile-api')
+
+# Also use official configs
+fetch_config(config_name='react')  # From official
+```
+
+### **Scenario 2: Enterprise (500+ developers)**
+
+**Setup:**
+```bash
+# Multiple teams, multiple repos
+# Platform team
+gitlab.company.com/platform/skill-configs
+
+# Mobile team
+gitlab.company.com/mobile/skill-configs
+
+# Data team
+gitlab.company.com/data/skill-configs
+```
+
+**Usage:**
+```python
+# Central IT pre-configures sources
+add_config_source('official', '...', priority=1)
+add_config_source('platform', 'gitlab.company.com/platform/...', priority=2)
+add_config_source('mobile', 'gitlab.company.com/mobile/...', priority=3)
+add_config_source('data', 'gitlab.company.com/data/...', priority=4)
+
+# Developers use transparently
+fetch_config('internal-platform')  # Found in platform source
+fetch_config('react')  # Found in official
+fetch_config('company-data-api')  # Found in data source
+```
+
+### **Scenario 3: Open Source Curator**
+
+**Setup:**
+```bash
+# Community member creates curated collection
+gh repo create awesome-ai/skill-configs --public
+# Adds 50+ AI framework configs
+```
+
+**Community Usage:**
+```python
+# Anyone can add this public collection
+add_config_source(
+    name='ai-frameworks',
+    git_url='https://github.com/awesome-ai/skill-configs.git'
+)
+
+# Access curated configs
+fetch_config(source='ai-frameworks', list_available=true)
+# Shows: tensorflow, pytorch, jax, keras, transformers, etc.
+```
+
+---
+
+## 🎨 Part 6: Design Decisions & Trade-offs
+
+### **Decision 1: Git vs API vs Database**
+
+| Approach | Pros | Cons | Verdict |
+|----------|------|------|---------|
+| **Git repos** | - Version control<br>- Existing auth<br>- Offline capable<br>- Familiar | - Git dependency<br>- Clone overhead<br>- Disk space | ✅ **CHOOSE THIS** |
+| **Central API** | - Fast<br>- No git needed<br>- Easy search | - Single point of failure<br>- No offline<br>- Server costs | ❌ Not decentralized |
+| **Database** | - Fast queries<br>- Advanced search | - Complex setup<br>- Not portable | ❌ Over-engineered |
+
+**Winner**: Git repositories - aligns with developer workflows, decentralized, free hosting
+
+### **Decision 2: Caching Strategy**
+
+| Strategy | Disk Usage | Speed | Freshness | Verdict |
+|----------|------------|-------|-----------|---------|
+| **No cache** | None | Slow (clone each time) | Always fresh | ❌ Too slow |
+| **Full clone** | High (~50MB per repo) | Medium | Manual refresh | ⚠️ Acceptable |
+| **Shallow clone** | Low (~5MB per repo) | Fast | Manual refresh | ✅ **BEST** |
+| **Sparse checkout** | Minimal (~1MB) | Fast | Manual refresh | ✅ **IDEAL** |
+
+**Winner**: Shallow clone with TTL-based auto-refresh
+
+### **Decision 3: Token Storage**
+
+| Method | Security | Ease | Cross-platform | Verdict |
+|--------|----------|------|----------------|---------|
+| **Plain text** | ❌ Insecure | ✅ Easy | ✅ Yes | ❌ NO |
+| **Keyring** | ✅ Secure | ⚠️ Medium | ⚠️ Mostly | ✅ **PRIMARY** |
+| **Env vars only** | ⚠️ OK | ✅ Easy | ✅ Yes | ✅ **FALLBACK** |
+| **Encrypted file** | ⚠️ OK | ❌ Complex | ✅ Yes | ❌ Over-engineered |
+
+**Winner**: Keyring (primary) + Environment variables (fallback)
+
+---
+
+## 🛣️ Part 7: Implementation Roadmap
+
+### **Phase 1: Prototype (1-2 hours)**
+**Goal**: Prove the concept works
+
+```python
+# Just add git_url parameter to fetch_config
+fetch_config(
+    git_url='https://github.com/user/configs.git',
+    config_name='test'
+)
+# Temp clone, no caching, basic only
+```
+
+**Deliverable**: Working proof-of-concept
+
+### **Phase 2: Basic Multi-Source (3-4 hours) - A1.9**
+**Goal**: Production-ready multi-source support
+
+**New MCP Tools:**
+1. `add_config_source` - Register sources
+2. `list_config_sources` - Show registered sources
+3. `remove_config_source` - Unregister sources
+
+**Enhanced `fetch_config`:**
+- Add `source` parameter
+- Add `git_url` parameter
+- Add `branch` parameter
+- Add `token` parameter
+- Add `refresh` parameter
+
+**Infrastructure:**
+- SourceManager class
+- GitConfigRepo class
+- ~/.skill-seekers/sources.json
+- Shallow clone caching
+
+**Deliverable**: Team-ready multi-source system
+
+### **Phase 3: Advanced Features (4-6 hours)**
+**Goal**: Enterprise features
+
+**Features:**
+1. **Multi-source search**: Search config across all sources
+2. **Conflict resolution**: Show all sources with same config name
+3. **Token management**: Keyring integration
+4. **Auto-refresh**: TTL-based cache updates
+5. **Offline mode**: Work without network
+
+**Deliverable**: Enterprise-ready system
+
+### **Phase 4: Polish & UX (2-3 hours)**
+**Goal**: Great user experience
+
+**Features:**
+1. Better error messages
+2. Progress indicators for git ops
+3. Source validation (check URL before adding)
+4. Migration tool (convert old to new)
+5. Documentation & examples
+
+---
+
+## 🔒 Part 8: Security Considerations
+
+### **Threat Model**
+
+| Threat | Impact | Mitigation |
+|--------|--------|------------|
+| **Malicious git URL** | Code execution via git exploits | URL validation, shallow clone, sandboxing |
+| **Token exposure** | Unauthorized repo access | Keyring storage, never log tokens |
+| **Supply chain attack** | Malicious configs | Config validation, source trust levels |
+| **MITM attacks** | Token interception | HTTPS only, certificate verification |
+
+### **Security Measures**
+
+1. **URL Validation**:
+   ```python
+   def validate_git_url(url):
+       # Only allow https://, git@, file:// (file only in dev mode)
+       # Block suspicious patterns
+       # DNS lookup to prevent SSRF
+   ```
+
+2. **Token Handling**:
+   ```python
+   # NEVER do this:
+   logger.info(f"Using token: {token}")  # ❌
+
+   # DO this:
+   logger.info("Using token: <redacted>")  # ✅
+   ```
+
+3. **Config Sandboxing**:
+   ```python
+   # Validate configs from untrusted sources
+   ConfigValidator(untrusted_config).validate()
+   # Check for suspicious patterns
+   ```
+
+---
+
+## 💡 Part 9: Key Insights & Recommendations
+
+### **What Makes This Powerful**
+
+1. **Network Effects**: More sources → More configs → More value
+2. **Zero Lock-in**: Use any git hosting (GitHub, GitLab, Bitbucket, self-hosted)
+3. **Privacy First**: Keep sensitive configs private
+4. **Team-Friendly**: Perfect for 3-5 person teams
+5. **Decentralized**: No single point of failure
+
+### **Competitive Advantage**
+
+This makes Skill Seekers similar to:
+- **npm**: Multiple registries (npmjs.com + private)
+- **Docker**: Multiple registries (Docker Hub + private)
+- **PyPI**: Public + private package indexes
+- **Git**: Multiple remotes
+
+**But for CONFIG FILES instead of packages!**
+
+### **Business Model Implications**
+
+- **Official repo**: Free, public, community-driven
+- **Private repos**: Users bring their own (GitHub, GitLab)
+- **Enterprise features**: Could offer sync services, mirrors, caching
+- **Marketplace**: Future monetization via verified configs, premium features
+
+### **What to Build NEXT**
+
+**Immediate Priority:**
+1. **Fix A1.3**: Use proper ConfigValidator for submit_config
+2. **Start A1.9 Phase 1**: Prototype git_url parameter
+3. **Test with public repos**: Prove concept before private repos
+
+**This Week:**
+- A1.3 validation fix (30 minutes)
+- A1.9 Phase 1 prototype (2 hours)
+- A1.9 Phase 2 implementation (3-4 hours)
+
+**This Month:**
+- A1.9 Phase 3 (advanced features)
+- A1.7 (install_skill workflow)
+- Documentation & examples
+
+---
+
+## 🎯 Part 10: Action Items
+
+### **Critical (Do Now):**
+
+1. **Fix A1.3 Validation** ⚠️ HIGH PRIORITY
+   ```python
+   # In submit_config_tool, replace basic validation with:
+   from config_validator import ConfigValidator
+
+   try:
+       validator = ConfigValidator(config_data)
+       validator.validate()
+   except ValueError as e:
+       return error_with_details(e)
+   ```
+
+2. **Test A1.9 Concept**
+   ```python
+   # Quick prototype - add to fetch_config:
+   if git_url:
+       temp_dir = tempfile.mkdtemp()
+       subprocess.run(['git', 'clone', '--depth', '1', git_url, temp_dir])
+       # Read config from temp_dir
+   ```
+
+### **High Priority (This Week):**
+
+3. **Implement A1.9 Phase 2**
+   - SourceManager class
+   - add_config_source tool
+   - Enhanced fetch_config
+   - Caching infrastructure
+
+4. **Documentation**
+   - Update A1.9 issue with implementation plan
+   - Create MULTI_SOURCE_GUIDE.md
+   - Update README with examples
+
+### **Medium Priority (This Month):**
+
+5. **A1.7 - install_skill** (most user value!)
+6. **A1.4 - Static website** (visibility)
+7. **Polish & testing**
+
+---
+
+## 🤔 Open Questions for Discussion
+
+1. **Validation**: Should submit_config use full ConfigValidator or keep it simple?
+2. **Caching**: 24-hour TTL too long/short for team repos?
+3. **Priority**: Should A1.7 (install_skill) come before A1.9?
+4. **Security**: Keyring mandatory or optional?
+5. **UX**: Auto-refresh on every fetch vs manual refresh command?
+6. **Migration**: How to migrate existing users to multi-source model?
+
+---
+
+## 📈 Success Metrics
+
+### **A1.9 Success Criteria:**
+
+- [ ] Can add custom git repo as source
+- [ ] Can fetch config from private GitHub repo
+- [ ] Can fetch config from private GitLab repo
+- [ ] Caching works (no repeated clones)
+- [ ] Token auth works (HTTPS + token)
+- [ ] Multiple sources work simultaneously
+- [ ] Priority resolution works correctly
+- [ ] Offline mode works with cache
+- [ ] Documentation complete
+- [ ] Tests pass
+
+### **Adoption Goals:**
+
+- **Week 1**: 5 early adopters test private repos
+- **Month 1**: 10 teams using team-shared configs
+- **Month 3**: 50+ custom config sources registered
+- **Month 6**: Feature parity with npm's registry system
+
+---
+
+## 🎉 Conclusion
+
+**The Evolution:**
+```
+Current:  ONE official public repo
+↓
+A1.9:     MANY repos (public + private)
+↓
+Future:   ECOSYSTEM (marketplace, ratings, continuous updates)
+```
+
+**The Vision:**
+Transform Skill Seekers from a "tool with configs" into a "platform for config sharing" - the npm/PyPI of documentation configs.
+
+**Next Steps:**
+1. Fix A1.3 validation (30 min)
+2. Prototype A1.9 (2 hours)
+3. Implement A1.9 Phase 2 (3-4 hours)
+4. Merge and deploy! 🚀
--- a/src/skill_seekers/mcp/server.py
+++ b/src/skill_seekers/mcp/server.py
@@ -39,6 +39,13 @@ app = Server("skill-seeker") if MCP_AVAILABLE and Server is not None else None
 # Path to CLI tools
 CLI_DIR = Path(__file__).parent.parent / "cli"

+# Import config validator for submit_config validation
+sys.path.insert(0, str(CLI_DIR))
+try:
+    from config_validator import ConfigValidator
+except ImportError:
+    ConfigValidator = None  # Graceful degradation if not available
+
 # Helper decorator that works even when app is None
 def safe_decorator(decorator_func):
    """Returns the decorator if MCP is available, otherwise returns a no-op"""
@@ -440,7 +447,7 @@ async def list_tools() -> list[Tool]:
        ),
        Tool(
            name="submit_config",
-            description="Submit a custom config file to the community. Creates a GitHub issue in skill-seekers-configs repo for review.",
+            description="Submit a custom config file to the community. Validates config (legacy or unified format) and creates a GitHub issue in skill-seekers-configs repo for review.",
            inputSchema={
                "type": "object",
                "properties": {
@@ -1255,24 +1262,77 @@ async def submit_config_tool(args: dict) -> list[TextContent]:
        else:
            return [TextContent(type="text", text="❌ Error: Must provide either config_path or config_json")]

-        # Validate required fields
-        required_fields = ["name", "description", "base_url"]
-        missing_fields = [field for field in required_fields if field not in config_data]
+        # Use ConfigValidator for comprehensive validation
+        if ConfigValidator is None:
+            return [TextContent(type="text", text="❌ Error: ConfigValidator not available. Please ensure config_validator.py is in the CLI directory.")]

-        if missing_fields:
-            return [TextContent(type="text", text=f"❌ Error: Missing required fields: {', '.join(missing_fields)}\n\nRequired: name, description, base_url")]
+        try:
+            validator = ConfigValidator(config_data)
+            validator.validate()

-        # Detect category
-        name_lower = config_name.lower()
-        category = "other"
-        if any(x in name_lower for x in ["react", "vue", "django", "laravel", "fastapi", "astro", "hono"]):
-            category = "web-frameworks"
-        elif any(x in name_lower for x in ["godot", "unity", "unreal"]):
-            category = "game-engines"
-        elif any(x in name_lower for x in ["kubernetes", "ansible", "docker"]):
-            category = "devops"
-        elif any(x in name_lower for x in ["tailwind", "bootstrap", "bulma"]):
-            category = "css-frameworks"
+            # Get format info
+            is_unified = validator.is_unified
+            config_name = config_data.get("name", "unnamed")
+
+        except ValueError as validation_error:
+            # Provide detailed validation feedback
+            error_msg = f"""❌ Config validation failed:
+
+{str(validation_error)}
+
+Please fix these issues and try again.
+
+💡 Validation help:
+- Names: alphanumeric, hyphens, underscores only (e.g., "my-framework", "react_docs")
+- URLs: must start with http:// or https://
+- Selectors: should be a dict with keys like 'main_content', 'title', 'code_blocks'
+- Rate limit: non-negative number (default: 0.5)
+- Max pages: positive integer or -1 for unlimited
+
+📚 Example configs: https://github.com/yusufkaraaslan/skill-seekers-configs/tree/main/official
+"""
+            return [TextContent(type="text", text=error_msg)]
+
+        # Detect category based on config format and content
+        if is_unified:
+            # For unified configs, look at source types
+            source_types = [src.get('type') for src in config_data.get('sources', [])]
+            if 'documentation' in source_types and 'github' in source_types:
+                category = "multi-source"
+            elif 'documentation' in source_types and 'pdf' in source_types:
+                category = "multi-source"
+            elif len(source_types) > 1:
+                category = "multi-source"
+            else:
+                category = "unified"
+        else:
+            # For legacy configs, use name-based detection
+            name_lower = config_name.lower()
+            category = "other"
+            if any(x in name_lower for x in ["react", "vue", "django", "laravel", "fastapi", "astro", "hono"]):
+                category = "web-frameworks"
+            elif any(x in name_lower for x in ["godot", "unity", "unreal"]):
+                category = "game-engines"
+            elif any(x in name_lower for x in ["kubernetes", "ansible", "docker"]):
+                category = "devops"
+            elif any(x in name_lower for x in ["tailwind", "bootstrap", "bulma"]):
+                category = "css-frameworks"
+
+        # Collect validation warnings
+        warnings = []
+        if not is_unified:
+            # Legacy config warnings
+            if 'max_pages' not in config_data:
+                warnings.append("⚠️ No max_pages set - will use default (100)")
+            elif config_data.get('max_pages') in (None, -1):
+                warnings.append("⚠️ Unlimited scraping enabled - may scrape thousands of pages and take hours")
+        else:
+            # Unified config warnings
+            for src in config_data.get('sources', []):
+                if src.get('type') == 'documentation' and 'max_pages' not in src:
+                    warnings.append(f"⚠️ No max_pages set for documentation source - will use default (100)")
+                elif src.get('type') == 'documentation' and src.get('max_pages') in (None, -1):
+                    warnings.append(f"⚠️ Unlimited scraping enabled for documentation source")

        # Check for GitHub token
        if not github_token:
@@ -1292,6 +1352,9 @@ async def submit_config_tool(args: dict) -> list[TextContent]:
 ### Category
 {category}

+### Config Format
+{"Unified (multi-source)" if is_unified else "Legacy (single-source)"}
+
 ### Configuration JSON
 ```json
 {config_json_str}
@@ -1301,12 +1364,15 @@ async def submit_config_tool(args: dict) -> list[TextContent]:
 {testing_notes if testing_notes else "Not provided"}

 ### Documentation URL
-{config_data.get('base_url', 'N/A')}
+{config_data.get('base_url') if not is_unified else 'See sources in config'}
+
+{"### Validation Warnings" if warnings else ""}
+{chr(10).join(f"- {w}" for w in warnings) if warnings else ""}

 ---

 ### Checklist
- [ ] Config validated
+- [x] Config validated with ConfigValidator
 - [ ] Test scraping completed
 - [ ] Added to appropriate category
 - [ ] API updated
--- a/tests/test_mcp_server.py
+++ b/tests/test_mcp_server.py
@@ -614,5 +614,160 @@ class TestMCPServerIntegration(unittest.IsolatedAsyncioTestCase):
            shutil.rmtree(temp_dir, ignore_errors=True)


+@unittest.skipUnless(MCP_AVAILABLE, "MCP package not installed")
+class TestSubmitConfigTool(unittest.IsolatedAsyncioTestCase):
+    """Test submit_config MCP tool"""
+
+    async def test_submit_config_requires_token(self):
+        """Should error without GitHub token"""
+        args = {
+            "config_json": '{"name": "test", "description": "Test", "base_url": "https://example.com"}'
+        }
+        result = await skill_seeker_server.submit_config_tool(args)
+        self.assertIn("GitHub token required", result[0].text)
+
+    async def test_submit_config_validates_required_fields(self):
+        """Should reject config missing required fields"""
+        args = {
+            "config_json": '{"name": "test"}',  # Missing description, base_url
+            "github_token": "fake_token"
+        }
+        result = await skill_seeker_server.submit_config_tool(args)
+        self.assertIn("validation failed", result[0].text.lower())
+        self.assertIn("description", result[0].text)
+
+    async def test_submit_config_validates_name_format(self):
+        """Should reject invalid name characters"""
+        args = {
+            "config_json": '{"name": "React@2024!", "description": "Test", "base_url": "https://example.com"}',
+            "github_token": "fake_token"
+        }
+        result = await skill_seeker_server.submit_config_tool(args)
+        self.assertIn("validation failed", result[0].text.lower())
+
+    async def test_submit_config_validates_url_format(self):
+        """Should reject invalid URL format"""
+        args = {
+            "config_json": '{"name": "test", "description": "Test", "base_url": "not-a-url"}',
+            "github_token": "fake_token"
+        }
+        result = await skill_seeker_server.submit_config_tool(args)
+        self.assertIn("validation failed", result[0].text.lower())
+
+    async def test_submit_config_accepts_legacy_format(self):
+        """Should accept valid legacy config"""
+        valid_config = {
+            "name": "testframework",
+            "description": "Test framework docs",
+            "base_url": "https://docs.test.com/",
+            "selectors": {
+                "main_content": "article",
+                "title": "h1",
+                "code_blocks": "pre code"
+            },
+            "max_pages": 100
+        }
+        args = {
+            "config_json": json.dumps(valid_config),
+            "github_token": "fake_token"
+        }
+
+        # Mock GitHub API call
+        with patch('github.Github') as mock_gh:
+            mock_repo = MagicMock()
+            mock_issue = MagicMock()
+            mock_issue.html_url = "https://github.com/test/issue/1"
+            mock_issue.number = 1
+            mock_repo.create_issue.return_value = mock_issue
+            mock_gh.return_value.get_repo.return_value = mock_repo
+
+            result = await skill_seeker_server.submit_config_tool(args)
+            self.assertIn("Config submitted successfully", result[0].text)
+            self.assertIn("https://github.com", result[0].text)
+
+    async def test_submit_config_accepts_unified_format(self):
+        """Should accept valid unified config"""
+        unified_config = {
+            "name": "testunified",
+            "description": "Test unified config",
+            "merge_mode": "rule-based",
+            "sources": [
+                {
+                    "type": "documentation",
+                    "base_url": "https://docs.test.com/",
+                    "max_pages": 100
+                },
+                {
+                    "type": "github",
+                    "repo": "testorg/testrepo"
+                }
+            ]
+        }
+        args = {
+            "config_json": json.dumps(unified_config),
+            "github_token": "fake_token"
+        }
+
+        with patch('github.Github') as mock_gh:
+            mock_repo = MagicMock()
+            mock_issue = MagicMock()
+            mock_issue.html_url = "https://github.com/test/issue/2"
+            mock_issue.number = 2
+            mock_repo.create_issue.return_value = mock_issue
+            mock_gh.return_value.get_repo.return_value = mock_repo
+
+            result = await skill_seeker_server.submit_config_tool(args)
+            self.assertIn("Config submitted successfully", result[0].text)
+            self.assertTrue("Unified" in result[0].text or "multi-source" in result[0].text)
+
+    async def test_submit_config_from_file_path(self):
+        """Should accept config_path parameter"""
+        with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
+            json.dump({
+                "name": "testfile",
+                "description": "From file",
+                "base_url": "https://test.com/"
+            }, f)
+            temp_path = f.name
+
+        try:
+            args = {
+                "config_path": temp_path,
+                "github_token": "fake_token"
+            }
+
+            with patch('github.Github') as mock_gh:
+                mock_repo = MagicMock()
+                mock_issue = MagicMock()
+                mock_issue.html_url = "https://github.com/test/issue/3"
+                mock_issue.number = 3
+                mock_repo.create_issue.return_value = mock_issue
+                mock_gh.return_value.get_repo.return_value = mock_repo
+
+                result = await skill_seeker_server.submit_config_tool(args)
+                self.assertIn("Config submitted successfully", result[0].text)
+        finally:
+            os.unlink(temp_path)
+
+    async def test_submit_config_detects_category(self):
+        """Should auto-detect category from config name"""
+        args = {
+            "config_json": '{"name": "react-test", "description": "React", "base_url": "https://react.dev/"}',
+            "github_token": "fake_token"
+        }
+
+        with patch('github.Github') as mock_gh:
+            mock_repo = MagicMock()
+            mock_issue = MagicMock()
+            mock_issue.html_url = "https://github.com/test/issue/4"
+            mock_issue.number = 4
+            mock_repo.create_issue.return_value = mock_issue
+            mock_gh.return_value.get_repo.return_value = mock_repo
+
+            result = await skill_seeker_server.submit_config_tool(args)
+            # Verify category appears in result
+            self.assertTrue("web-frameworks" in result[0].text or "Category" in result[0].text)
+
+
 if __name__ == '__main__':
    unittest.main()