diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9d0141b..3694324 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,19 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
+### Fixed
+- **submit_config MCP tool** - Comprehensive validation and format support ([#11](https://github.com/yusufkaraaslan/Skill_Seekers/issues/11))
+ - Now uses ConfigValidator for comprehensive validation (previously only checked 3 fields)
+ - Validates name format (alphanumeric, hyphens, underscores only)
+ - Validates URL formats (must start with http:// or https://)
+ - Validates selectors, patterns, rate limits, and max_pages
+ - **Supports both legacy and unified config formats**
+ - Provides detailed error messages with validation failures and examples
+ - Adds warnings for unlimited scraping configurations
+ - Enhanced category detection for multi-source configs
+ - 8 comprehensive test cases added to test_mcp_server.py
+ - Updated GitHub issue template with format type and validation warnings
+
---
## [2.1.1] - 2025-11-30
diff --git a/EVOLUTION_ANALYSIS.md b/EVOLUTION_ANALYSIS.md
new file mode 100644
index 0000000..fd34211
--- /dev/null
+++ b/EVOLUTION_ANALYSIS.md
@@ -0,0 +1,710 @@
+# Skill Seekers Evolution Analysis
+**Date**: 2025-12-21
+**Focus**: A1.3 Completion + A1.9 Multi-Source Architecture
+
+---
+
+## 🔍 Part 1: A1.3 Implementation Gap Analysis
+
+### What We Built vs What Was Required
+
+#### ✅ **Completed Requirements:**
+1. MCP tool `submit_config` - ✅ DONE
+2. Creates GitHub issue in skill-seekers-configs repo - ✅ DONE
+3. Uses issue template format - ✅ DONE
+4. Auto-labels (config-submission, needs-review) - ✅ DONE
+5. Returns GitHub issue URL - ✅ DONE
+6. Accepts config_path or config_json - ✅ DONE
+7. Validates required fields - ✅ DONE (basic)
+
+#### ❌ **Missing/Incomplete:**
+1. **Robust Validation** - Issue says "same validation as `validate_config` tool"
+ - **Current**: Only checks `name`, `description`, `base_url` exist
+ - **Should**: Use `config_validator.py` which validates:
+ - URL formats (http/https)
+ - Selector structure
+ - Pattern arrays
+ - Unified vs legacy format
+ - Source types (documentation, github, pdf)
+ - Merge modes
+ - All nested fields
+
+2. **URL Validation** - Not checking if URLs are actually valid
+ - **Current**: Just checks if `base_url` exists
+ - **Should**: Validate URL format, check reachability (optional)
+
+3. **Schema Validation** - Not using the full validator
+ - **Current**: Manual field checks
+ - **Should**: `ConfigValidator(config_data).validate()`
+
+### 🔧 **What Needs to be Fixed:**
+
+```python
+# CURRENT (submit_config_tool):
+required_fields = ["name", "description", "base_url"]
+missing_fields = [field for field in required_fields if field not in config_data]
+# Basic but incomplete
+
+# SHOULD BE:
+from config_validator import ConfigValidator
+validator = ConfigValidator(config_data)
+try:
+ validator.validate() # Comprehensive validation
+except ValueError as e:
+ return error_message(str(e))
+```
+
+---
+
+## 🚀 Part 2: A1.9 Multi-Source Architecture - The Big Picture
+
+### Current State: Single Source System
+
+```
+User → fetch_config → API → skill-seekers-configs (GitHub) → Download
+```
+
+**Limitations:**
+- Only ONE source of configs (official public repo)
+- Can't use private configs
+- Can't share configs within teams
+- Can't create custom collections
+- Centralized dependency
+
+### Future State: Multi-Source Federation
+
+```
+User → fetch_config → Source Manager → [
+ Priority 1: Official (public)
+ Priority 2: Team Private Repo
+ Priority 3: Personal Configs
+ Priority 4: Custom Collections
+] → Download
+```
+
+**Capabilities:**
+- Multiple config sources
+- Public + Private repos
+- Team collaboration
+- Personal configs
+- Custom curated collections
+- Decentralized, federated system
+
+---
+
+## 🎯 Part 3: Evolution Vision - The Three Horizons
+
+### **Horizon 1: Official Configs (CURRENT - A1.1 to A1.3)**
+✅ **Status**: Complete
+**What**: Single public repository (skill-seekers-configs)
+**Users**: Everyone, public community
+**Paradigm**: Centralized, curated, verified configs
+
+### **Horizon 2: Multi-Source Federation (A1.9)**
+🔨 **Status**: Proposed
+**What**: Support multiple git repositories as config sources
+**Users**: Teams (3-5 people), organizations, individuals
+**Paradigm**: Decentralized, federated, user-controlled
+
+**Key Features:**
+- Direct git URL support
+- Named sources (register once, use many times)
+- Authentication (GitHub/GitLab/Bitbucket tokens)
+- Caching (local clones)
+- Priority-based resolution
+- Public OR private repos
+
+**Implementation:**
+```python
+# Option 1: Direct URL (one-off)
+fetch_config(
+ git_url='https://github.com/myteam/configs.git',
+ config_name='internal-api',
+ token='$GITHUB_TOKEN'
+)
+
+# Option 2: Named source (reusable)
+add_config_source(
+ name='team',
+ git_url='https://github.com/myteam/configs.git',
+ token='$GITHUB_TOKEN'
+)
+fetch_config(source='team', config_name='internal-api')
+
+# Option 3: Config file
+# ~/.skill-seekers/sources.json
+{
+ "sources": [
+ {"name": "official", "git_url": "...", "priority": 1},
+ {"name": "team", "git_url": "...", "priority": 2, "token": "$TOKEN"}
+ ]
+}
+```
+
+### **Horizon 3: Skill Marketplace (Future - A1.13+)**
+💭 **Status**: Vision
+**What**: Full ecosystem of shareable configs AND skills
+**Users**: Entire community, marketplace dynamics
+**Paradigm**: Platform, network effects, curation
+
+**Key Features:**
+- Browse all public sources
+- Star/rate configs
+- Download counts, popularity
+- Verified configs (badge system)
+- Share built skills (not just configs)
+- Continuous updates (watch repos)
+- Notifications
+
+---
+
+## 🏗️ Part 4: Technical Architecture for A1.9
+
+### **Layer 1: Source Management**
+
+```python
+# ~/.skill-seekers/sources.json
+{
+ "version": "1.0",
+ "default_source": "official",
+ "sources": [
+ {
+ "name": "official",
+ "type": "git",
+ "git_url": "https://github.com/yusufkaraaslan/skill-seekers-configs.git",
+ "branch": "main",
+ "enabled": true,
+ "priority": 1,
+ "cache_ttl": 86400 # 24 hours
+ },
+ {
+ "name": "team",
+ "type": "git",
+ "git_url": "https://github.com/myteam/private-configs.git",
+ "branch": "main",
+ "token_env": "TEAM_GITHUB_TOKEN",
+ "enabled": true,
+ "priority": 2,
+ "cache_ttl": 3600 # 1 hour
+ }
+ ]
+}
+```
+
+**Source Manager Class:**
+```python
+class SourceManager:
+ def __init__(self, config_file="~/.skill-seekers/sources.json"):
+ self.config_file = Path(config_file).expanduser()
+ self.sources = self.load_sources()
+
+ def add_source(self, name, git_url, token=None, priority=None):
+ """Register a new config source"""
+
+ def remove_source(self, name):
+ """Remove a registered source"""
+
+ def list_sources(self):
+ """List all registered sources"""
+
+ def get_source(self, name):
+ """Get source by name"""
+
+ def search_config(self, config_name):
+ """Search for config across all sources (priority order)"""
+```
+
+### **Layer 2: Git Operations**
+
+```python
+class GitConfigRepo:
+ def __init__(self, source_config):
+ self.url = source_config['git_url']
+ self.branch = source_config.get('branch', 'main')
+ self.cache_dir = Path("~/.skill-seekers/cache") / source_config['name']
+ self.token = self._get_token(source_config)
+
+ def clone_or_update(self):
+ """Clone if not exists, else pull"""
+ if not self.cache_dir.exists():
+ self._clone()
+ else:
+ self._pull()
+
+ def _clone(self):
+ """Shallow clone for efficiency"""
+ # git clone --depth 1 --branch {branch} {url} {cache_dir}
+
+ def _pull(self):
+ """Update existing clone"""
+ # git -C {cache_dir} pull
+
+ def list_configs(self):
+ """Scan cache_dir for .json files"""
+
+ def get_config(self, config_name):
+ """Read specific config file"""
+```
+
+**Library Choice:**
+- **GitPython**: High-level, Pythonic API ✅ RECOMMENDED
+- **pygit2**: Low-level, faster, complex
+- **subprocess**: Simple, works everywhere
+
+### **Layer 3: Config Discovery & Resolution**
+
+```python
+class ConfigDiscovery:
+ def __init__(self, source_manager):
+ self.source_manager = source_manager
+
+ def find_config(self, config_name, source=None):
+ """
+ Find config across sources
+
+ Args:
+ config_name: Name of config to find
+ source: Optional specific source name
+
+ Returns:
+ (source_name, config_path, config_data)
+ """
+ if source:
+ # Search in specific source only
+ return self._search_source(source, config_name)
+ else:
+ # Search all sources in priority order
+ for src in self.source_manager.get_sources_by_priority():
+ result = self._search_source(src['name'], config_name)
+ if result:
+ return result
+ return None
+
+ def list_all_configs(self, source=None):
+ """List configs from one or all sources"""
+
+ def resolve_conflicts(self, config_name):
+ """Find all sources that have this config"""
+```
+
+### **Layer 4: Authentication & Security**
+
+```python
+class TokenManager:
+ def __init__(self):
+ self.use_keyring = self._check_keyring()
+
+ def _check_keyring(self):
+ """Check if keyring library available"""
+ try:
+ import keyring
+ return True
+ except ImportError:
+ return False
+
+ def store_token(self, source_name, token):
+ """Store token securely"""
+ if self.use_keyring:
+ import keyring
+ keyring.set_password("skill-seekers", source_name, token)
+ else:
+ # Fall back to env var prompt
+ print(f"Set environment variable: {source_name.upper()}_TOKEN")
+
+ def get_token(self, source_name, env_var=None):
+ """Retrieve token"""
+ # Try keyring first
+ if self.use_keyring:
+ import keyring
+ token = keyring.get_password("skill-seekers", source_name)
+ if token:
+ return token
+
+ # Try environment variable
+ if env_var:
+ return os.environ.get(env_var)
+
+ # Try default patterns
+ return os.environ.get(f"{source_name.upper()}_TOKEN")
+```
+
+---
+
+## 📊 Part 5: Use Case Matrix
+
+| Use Case | Users | Visibility | Auth | Priority |
+|----------|-------|------------|------|----------|
+| **Official Configs** | Everyone | Public | None | High |
+| **Team Configs** | 3-5 people | Private | GitHub Token | Medium |
+| **Personal Configs** | Individual | Private | GitHub Token | Low |
+| **Public Collections** | Community | Public | None | Medium |
+| **Enterprise Configs** | Organization | Private | GitLab Token | High |
+
+### **Scenario 1: Startup Team (5 developers)**
+
+**Setup:**
+```bash
+# Team lead creates private repo
+gh repo create startup/skill-configs --private
+cd startup-skill-configs
+mkdir -p official/internal-apis
+# Add configs for internal services
+git add . && git commit -m "Add internal API configs"
+git push
+```
+
+**Team Usage:**
+```python
+# Each developer adds source (one-time)
+add_config_source(
+ name='startup',
+ git_url='https://github.com/startup/skill-configs.git',
+ token='$GITHUB_TOKEN'
+)
+
+# Daily usage
+fetch_config(source='startup', config_name='backend-api')
+fetch_config(source='startup', config_name='frontend-components')
+fetch_config(source='startup', config_name='mobile-api')
+
+# Also use official configs
+fetch_config(config_name='react') # From official
+```
+
+### **Scenario 2: Enterprise (500+ developers)**
+
+**Setup:**
+```bash
+# Multiple teams, multiple repos
+# Platform team
+gitlab.company.com/platform/skill-configs
+
+# Mobile team
+gitlab.company.com/mobile/skill-configs
+
+# Data team
+gitlab.company.com/data/skill-configs
+```
+
+**Usage:**
+```python
+# Central IT pre-configures sources
+add_config_source('official', '...', priority=1)
+add_config_source('platform', 'gitlab.company.com/platform/...', priority=2)
+add_config_source('mobile', 'gitlab.company.com/mobile/...', priority=3)
+add_config_source('data', 'gitlab.company.com/data/...', priority=4)
+
+# Developers use transparently
+fetch_config('internal-platform') # Found in platform source
+fetch_config('react') # Found in official
+fetch_config('company-data-api') # Found in data source
+```
+
+### **Scenario 3: Open Source Curator**
+
+**Setup:**
+```bash
+# Community member creates curated collection
+gh repo create awesome-ai/skill-configs --public
+# Adds 50+ AI framework configs
+```
+
+**Community Usage:**
+```python
+# Anyone can add this public collection
+add_config_source(
+ name='ai-frameworks',
+ git_url='https://github.com/awesome-ai/skill-configs.git'
+)
+
+# Access curated configs
+fetch_config(source='ai-frameworks', list_available=true)
+# Shows: tensorflow, pytorch, jax, keras, transformers, etc.
+```
+
+---
+
+## 🎨 Part 6: Design Decisions & Trade-offs
+
+### **Decision 1: Git vs API vs Database**
+
+| Approach | Pros | Cons | Verdict |
+|----------|------|------|---------|
+| **Git repos** | - Version control
- Existing auth
- Offline capable
- Familiar | - Git dependency
- Clone overhead
- Disk space | ✅ **CHOOSE THIS** |
+| **Central API** | - Fast
- No git needed
- Easy search | - Single point of failure
- No offline
- Server costs | ❌ Not decentralized |
+| **Database** | - Fast queries
- Advanced search | - Complex setup
- Not portable | ❌ Over-engineered |
+
+**Winner**: Git repositories - aligns with developer workflows, decentralized, free hosting
+
+### **Decision 2: Caching Strategy**
+
+| Strategy | Disk Usage | Speed | Freshness | Verdict |
+|----------|------------|-------|-----------|---------|
+| **No cache** | None | Slow (clone each time) | Always fresh | ❌ Too slow |
+| **Full clone** | High (~50MB per repo) | Medium | Manual refresh | ⚠️ Acceptable |
+| **Shallow clone** | Low (~5MB per repo) | Fast | Manual refresh | ✅ **BEST** |
+| **Sparse checkout** | Minimal (~1MB) | Fast | Manual refresh | ✅ **IDEAL** |
+
+**Winner**: Shallow clone with TTL-based auto-refresh
+
+### **Decision 3: Token Storage**
+
+| Method | Security | Ease | Cross-platform | Verdict |
+|--------|----------|------|----------------|---------|
+| **Plain text** | ❌ Insecure | ✅ Easy | ✅ Yes | ❌ NO |
+| **Keyring** | ✅ Secure | ⚠️ Medium | ⚠️ Mostly | ✅ **PRIMARY** |
+| **Env vars only** | ⚠️ OK | ✅ Easy | ✅ Yes | ✅ **FALLBACK** |
+| **Encrypted file** | ⚠️ OK | ❌ Complex | ✅ Yes | ❌ Over-engineered |
+
+**Winner**: Keyring (primary) + Environment variables (fallback)
+
+---
+
+## 🛣️ Part 7: Implementation Roadmap
+
+### **Phase 1: Prototype (1-2 hours)**
+**Goal**: Prove the concept works
+
+```python
+# Just add git_url parameter to fetch_config
+fetch_config(
+ git_url='https://github.com/user/configs.git',
+ config_name='test'
+)
+# Temp clone, no caching, basic only
+```
+
+**Deliverable**: Working proof-of-concept
+
+### **Phase 2: Basic Multi-Source (3-4 hours) - A1.9**
+**Goal**: Production-ready multi-source support
+
+**New MCP Tools:**
+1. `add_config_source` - Register sources
+2. `list_config_sources` - Show registered sources
+3. `remove_config_source` - Unregister sources
+
+**Enhanced `fetch_config`:**
+- Add `source` parameter
+- Add `git_url` parameter
+- Add `branch` parameter
+- Add `token` parameter
+- Add `refresh` parameter
+
+**Infrastructure:**
+- SourceManager class
+- GitConfigRepo class
+- ~/.skill-seekers/sources.json
+- Shallow clone caching
+
+**Deliverable**: Team-ready multi-source system
+
+### **Phase 3: Advanced Features (4-6 hours)**
+**Goal**: Enterprise features
+
+**Features:**
+1. **Multi-source search**: Search config across all sources
+2. **Conflict resolution**: Show all sources with same config name
+3. **Token management**: Keyring integration
+4. **Auto-refresh**: TTL-based cache updates
+5. **Offline mode**: Work without network
+
+**Deliverable**: Enterprise-ready system
+
+### **Phase 4: Polish & UX (2-3 hours)**
+**Goal**: Great user experience
+
+**Features:**
+1. Better error messages
+2. Progress indicators for git ops
+3. Source validation (check URL before adding)
+4. Migration tool (convert old to new)
+5. Documentation & examples
+
+---
+
+## 🔒 Part 8: Security Considerations
+
+### **Threat Model**
+
+| Threat | Impact | Mitigation |
+|--------|--------|------------|
+| **Malicious git URL** | Code execution via git exploits | URL validation, shallow clone, sandboxing |
+| **Token exposure** | Unauthorized repo access | Keyring storage, never log tokens |
+| **Supply chain attack** | Malicious configs | Config validation, source trust levels |
+| **MITM attacks** | Token interception | HTTPS only, certificate verification |
+
+### **Security Measures**
+
+1. **URL Validation**:
+ ```python
+ def validate_git_url(url):
+ # Only allow https://, git@, file:// (file only in dev mode)
+ # Block suspicious patterns
+ # DNS lookup to prevent SSRF
+ ```
+
+2. **Token Handling**:
+ ```python
+ # NEVER do this:
+ logger.info(f"Using token: {token}") # ❌
+
+ # DO this:
+ logger.info("Using token: ") # ✅
+ ```
+
+3. **Config Sandboxing**:
+ ```python
+ # Validate configs from untrusted sources
+ ConfigValidator(untrusted_config).validate()
+ # Check for suspicious patterns
+ ```
+
+---
+
+## 💡 Part 9: Key Insights & Recommendations
+
+### **What Makes This Powerful**
+
+1. **Network Effects**: More sources → More configs → More value
+2. **Zero Lock-in**: Use any git hosting (GitHub, GitLab, Bitbucket, self-hosted)
+3. **Privacy First**: Keep sensitive configs private
+4. **Team-Friendly**: Perfect for 3-5 person teams
+5. **Decentralized**: No single point of failure
+
+### **Competitive Advantage**
+
+This makes Skill Seekers similar to:
+- **npm**: Multiple registries (npmjs.com + private)
+- **Docker**: Multiple registries (Docker Hub + private)
+- **PyPI**: Public + private package indexes
+- **Git**: Multiple remotes
+
+**But for CONFIG FILES instead of packages!**
+
+### **Business Model Implications**
+
+- **Official repo**: Free, public, community-driven
+- **Private repos**: Users bring their own (GitHub, GitLab)
+- **Enterprise features**: Could offer sync services, mirrors, caching
+- **Marketplace**: Future monetization via verified configs, premium features
+
+### **What to Build NEXT**
+
+**Immediate Priority:**
+1. **Fix A1.3**: Use proper ConfigValidator for submit_config
+2. **Start A1.9 Phase 1**: Prototype git_url parameter
+3. **Test with public repos**: Prove concept before private repos
+
+**This Week:**
+- A1.3 validation fix (30 minutes)
+- A1.9 Phase 1 prototype (2 hours)
+- A1.9 Phase 2 implementation (3-4 hours)
+
+**This Month:**
+- A1.9 Phase 3 (advanced features)
+- A1.7 (install_skill workflow)
+- Documentation & examples
+
+---
+
+## 🎯 Part 10: Action Items
+
+### **Critical (Do Now):**
+
+1. **Fix A1.3 Validation** ⚠️ HIGH PRIORITY
+ ```python
+ # In submit_config_tool, replace basic validation with:
+ from config_validator import ConfigValidator
+
+ try:
+ validator = ConfigValidator(config_data)
+ validator.validate()
+ except ValueError as e:
+ return error_with_details(e)
+ ```
+
+2. **Test A1.9 Concept**
+ ```python
+ # Quick prototype - add to fetch_config:
+ if git_url:
+ temp_dir = tempfile.mkdtemp()
+ subprocess.run(['git', 'clone', '--depth', '1', git_url, temp_dir])
+ # Read config from temp_dir
+ ```
+
+### **High Priority (This Week):**
+
+3. **Implement A1.9 Phase 2**
+ - SourceManager class
+ - add_config_source tool
+ - Enhanced fetch_config
+ - Caching infrastructure
+
+4. **Documentation**
+ - Update A1.9 issue with implementation plan
+ - Create MULTI_SOURCE_GUIDE.md
+ - Update README with examples
+
+### **Medium Priority (This Month):**
+
+5. **A1.7 - install_skill** (most user value!)
+6. **A1.4 - Static website** (visibility)
+7. **Polish & testing**
+
+---
+
+## 🤔 Open Questions for Discussion
+
+1. **Validation**: Should submit_config use full ConfigValidator or keep it simple?
+2. **Caching**: 24-hour TTL too long/short for team repos?
+3. **Priority**: Should A1.7 (install_skill) come before A1.9?
+4. **Security**: Keyring mandatory or optional?
+5. **UX**: Auto-refresh on every fetch vs manual refresh command?
+6. **Migration**: How to migrate existing users to multi-source model?
+
+---
+
+## 📈 Success Metrics
+
+### **A1.9 Success Criteria:**
+
+- [ ] Can add custom git repo as source
+- [ ] Can fetch config from private GitHub repo
+- [ ] Can fetch config from private GitLab repo
+- [ ] Caching works (no repeated clones)
+- [ ] Token auth works (HTTPS + token)
+- [ ] Multiple sources work simultaneously
+- [ ] Priority resolution works correctly
+- [ ] Offline mode works with cache
+- [ ] Documentation complete
+- [ ] Tests pass
+
+### **Adoption Goals:**
+
+- **Week 1**: 5 early adopters test private repos
+- **Month 1**: 10 teams using team-shared configs
+- **Month 3**: 50+ custom config sources registered
+- **Month 6**: Feature parity with npm's registry system
+
+---
+
+## 🎉 Conclusion
+
+**The Evolution:**
+```
+Current: ONE official public repo
+↓
+A1.9: MANY repos (public + private)
+↓
+Future: ECOSYSTEM (marketplace, ratings, continuous updates)
+```
+
+**The Vision:**
+Transform Skill Seekers from a "tool with configs" into a "platform for config sharing" - the npm/PyPI of documentation configs.
+
+**Next Steps:**
+1. Fix A1.3 validation (30 min)
+2. Prototype A1.9 (2 hours)
+3. Implement A1.9 Phase 2 (3-4 hours)
+4. Merge and deploy! 🚀
diff --git a/src/skill_seekers/mcp/server.py b/src/skill_seekers/mcp/server.py
index da4f4c3..27686fd 100644
--- a/src/skill_seekers/mcp/server.py
+++ b/src/skill_seekers/mcp/server.py
@@ -39,6 +39,13 @@ app = Server("skill-seeker") if MCP_AVAILABLE and Server is not None else None
# Path to CLI tools
CLI_DIR = Path(__file__).parent.parent / "cli"
+# Import config validator for submit_config validation
+sys.path.insert(0, str(CLI_DIR))
+try:
+ from config_validator import ConfigValidator
+except ImportError:
+ ConfigValidator = None # Graceful degradation if not available
+
# Helper decorator that works even when app is None
def safe_decorator(decorator_func):
"""Returns the decorator if MCP is available, otherwise returns a no-op"""
@@ -440,7 +447,7 @@ async def list_tools() -> list[Tool]:
),
Tool(
name="submit_config",
- description="Submit a custom config file to the community. Creates a GitHub issue in skill-seekers-configs repo for review.",
+ description="Submit a custom config file to the community. Validates config (legacy or unified format) and creates a GitHub issue in skill-seekers-configs repo for review.",
inputSchema={
"type": "object",
"properties": {
@@ -1255,24 +1262,77 @@ async def submit_config_tool(args: dict) -> list[TextContent]:
else:
return [TextContent(type="text", text="❌ Error: Must provide either config_path or config_json")]
- # Validate required fields
- required_fields = ["name", "description", "base_url"]
- missing_fields = [field for field in required_fields if field not in config_data]
+ # Use ConfigValidator for comprehensive validation
+ if ConfigValidator is None:
+ return [TextContent(type="text", text="❌ Error: ConfigValidator not available. Please ensure config_validator.py is in the CLI directory.")]
- if missing_fields:
- return [TextContent(type="text", text=f"❌ Error: Missing required fields: {', '.join(missing_fields)}\n\nRequired: name, description, base_url")]
+ try:
+ validator = ConfigValidator(config_data)
+ validator.validate()
- # Detect category
- name_lower = config_name.lower()
- category = "other"
- if any(x in name_lower for x in ["react", "vue", "django", "laravel", "fastapi", "astro", "hono"]):
- category = "web-frameworks"
- elif any(x in name_lower for x in ["godot", "unity", "unreal"]):
- category = "game-engines"
- elif any(x in name_lower for x in ["kubernetes", "ansible", "docker"]):
- category = "devops"
- elif any(x in name_lower for x in ["tailwind", "bootstrap", "bulma"]):
- category = "css-frameworks"
+ # Get format info
+ is_unified = validator.is_unified
+ config_name = config_data.get("name", "unnamed")
+
+ except ValueError as validation_error:
+ # Provide detailed validation feedback
+ error_msg = f"""❌ Config validation failed:
+
+{str(validation_error)}
+
+Please fix these issues and try again.
+
+💡 Validation help:
+- Names: alphanumeric, hyphens, underscores only (e.g., "my-framework", "react_docs")
+- URLs: must start with http:// or https://
+- Selectors: should be a dict with keys like 'main_content', 'title', 'code_blocks'
+- Rate limit: non-negative number (default: 0.5)
+- Max pages: positive integer or -1 for unlimited
+
+📚 Example configs: https://github.com/yusufkaraaslan/skill-seekers-configs/tree/main/official
+"""
+ return [TextContent(type="text", text=error_msg)]
+
+ # Detect category based on config format and content
+ if is_unified:
+ # For unified configs, look at source types
+ source_types = [src.get('type') for src in config_data.get('sources', [])]
+ if 'documentation' in source_types and 'github' in source_types:
+ category = "multi-source"
+ elif 'documentation' in source_types and 'pdf' in source_types:
+ category = "multi-source"
+ elif len(source_types) > 1:
+ category = "multi-source"
+ else:
+ category = "unified"
+ else:
+ # For legacy configs, use name-based detection
+ name_lower = config_name.lower()
+ category = "other"
+ if any(x in name_lower for x in ["react", "vue", "django", "laravel", "fastapi", "astro", "hono"]):
+ category = "web-frameworks"
+ elif any(x in name_lower for x in ["godot", "unity", "unreal"]):
+ category = "game-engines"
+ elif any(x in name_lower for x in ["kubernetes", "ansible", "docker"]):
+ category = "devops"
+ elif any(x in name_lower for x in ["tailwind", "bootstrap", "bulma"]):
+ category = "css-frameworks"
+
+ # Collect validation warnings
+ warnings = []
+ if not is_unified:
+ # Legacy config warnings
+ if 'max_pages' not in config_data:
+ warnings.append("⚠️ No max_pages set - will use default (100)")
+ elif config_data.get('max_pages') in (None, -1):
+ warnings.append("⚠️ Unlimited scraping enabled - may scrape thousands of pages and take hours")
+ else:
+ # Unified config warnings
+ for src in config_data.get('sources', []):
+ if src.get('type') == 'documentation' and 'max_pages' not in src:
+ warnings.append(f"⚠️ No max_pages set for documentation source - will use default (100)")
+ elif src.get('type') == 'documentation' and src.get('max_pages') in (None, -1):
+ warnings.append(f"⚠️ Unlimited scraping enabled for documentation source")
# Check for GitHub token
if not github_token:
@@ -1292,6 +1352,9 @@ async def submit_config_tool(args: dict) -> list[TextContent]:
### Category
{category}
+### Config Format
+{"Unified (multi-source)" if is_unified else "Legacy (single-source)"}
+
### Configuration JSON
```json
{config_json_str}
@@ -1301,12 +1364,15 @@ async def submit_config_tool(args: dict) -> list[TextContent]:
{testing_notes if testing_notes else "Not provided"}
### Documentation URL
-{config_data.get('base_url', 'N/A')}
+{config_data.get('base_url') if not is_unified else 'See sources in config'}
+
+{"### Validation Warnings" if warnings else ""}
+{chr(10).join(f"- {w}" for w in warnings) if warnings else ""}
---
### Checklist
-- [ ] Config validated
+- [x] Config validated with ConfigValidator
- [ ] Test scraping completed
- [ ] Added to appropriate category
- [ ] API updated
diff --git a/tests/test_mcp_server.py b/tests/test_mcp_server.py
index 421cb56..3093e08 100644
--- a/tests/test_mcp_server.py
+++ b/tests/test_mcp_server.py
@@ -614,5 +614,160 @@ class TestMCPServerIntegration(unittest.IsolatedAsyncioTestCase):
shutil.rmtree(temp_dir, ignore_errors=True)
+@unittest.skipUnless(MCP_AVAILABLE, "MCP package not installed")
+class TestSubmitConfigTool(unittest.IsolatedAsyncioTestCase):
+ """Test submit_config MCP tool"""
+
+ async def test_submit_config_requires_token(self):
+ """Should error without GitHub token"""
+ args = {
+ "config_json": '{"name": "test", "description": "Test", "base_url": "https://example.com"}'
+ }
+ result = await skill_seeker_server.submit_config_tool(args)
+ self.assertIn("GitHub token required", result[0].text)
+
+ async def test_submit_config_validates_required_fields(self):
+ """Should reject config missing required fields"""
+ args = {
+ "config_json": '{"name": "test"}', # Missing description, base_url
+ "github_token": "fake_token"
+ }
+ result = await skill_seeker_server.submit_config_tool(args)
+ self.assertIn("validation failed", result[0].text.lower())
+ self.assertIn("description", result[0].text)
+
+ async def test_submit_config_validates_name_format(self):
+ """Should reject invalid name characters"""
+ args = {
+ "config_json": '{"name": "React@2024!", "description": "Test", "base_url": "https://example.com"}',
+ "github_token": "fake_token"
+ }
+ result = await skill_seeker_server.submit_config_tool(args)
+ self.assertIn("validation failed", result[0].text.lower())
+
+ async def test_submit_config_validates_url_format(self):
+ """Should reject invalid URL format"""
+ args = {
+ "config_json": '{"name": "test", "description": "Test", "base_url": "not-a-url"}',
+ "github_token": "fake_token"
+ }
+ result = await skill_seeker_server.submit_config_tool(args)
+ self.assertIn("validation failed", result[0].text.lower())
+
+ async def test_submit_config_accepts_legacy_format(self):
+ """Should accept valid legacy config"""
+ valid_config = {
+ "name": "testframework",
+ "description": "Test framework docs",
+ "base_url": "https://docs.test.com/",
+ "selectors": {
+ "main_content": "article",
+ "title": "h1",
+ "code_blocks": "pre code"
+ },
+ "max_pages": 100
+ }
+ args = {
+ "config_json": json.dumps(valid_config),
+ "github_token": "fake_token"
+ }
+
+ # Mock GitHub API call
+ with patch('github.Github') as mock_gh:
+ mock_repo = MagicMock()
+ mock_issue = MagicMock()
+ mock_issue.html_url = "https://github.com/test/issue/1"
+ mock_issue.number = 1
+ mock_repo.create_issue.return_value = mock_issue
+ mock_gh.return_value.get_repo.return_value = mock_repo
+
+ result = await skill_seeker_server.submit_config_tool(args)
+ self.assertIn("Config submitted successfully", result[0].text)
+ self.assertIn("https://github.com", result[0].text)
+
+ async def test_submit_config_accepts_unified_format(self):
+ """Should accept valid unified config"""
+ unified_config = {
+ "name": "testunified",
+ "description": "Test unified config",
+ "merge_mode": "rule-based",
+ "sources": [
+ {
+ "type": "documentation",
+ "base_url": "https://docs.test.com/",
+ "max_pages": 100
+ },
+ {
+ "type": "github",
+ "repo": "testorg/testrepo"
+ }
+ ]
+ }
+ args = {
+ "config_json": json.dumps(unified_config),
+ "github_token": "fake_token"
+ }
+
+ with patch('github.Github') as mock_gh:
+ mock_repo = MagicMock()
+ mock_issue = MagicMock()
+ mock_issue.html_url = "https://github.com/test/issue/2"
+ mock_issue.number = 2
+ mock_repo.create_issue.return_value = mock_issue
+ mock_gh.return_value.get_repo.return_value = mock_repo
+
+ result = await skill_seeker_server.submit_config_tool(args)
+ self.assertIn("Config submitted successfully", result[0].text)
+ self.assertTrue("Unified" in result[0].text or "multi-source" in result[0].text)
+
+ async def test_submit_config_from_file_path(self):
+ """Should accept config_path parameter"""
+ with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
+ json.dump({
+ "name": "testfile",
+ "description": "From file",
+ "base_url": "https://test.com/"
+ }, f)
+ temp_path = f.name
+
+ try:
+ args = {
+ "config_path": temp_path,
+ "github_token": "fake_token"
+ }
+
+ with patch('github.Github') as mock_gh:
+ mock_repo = MagicMock()
+ mock_issue = MagicMock()
+ mock_issue.html_url = "https://github.com/test/issue/3"
+ mock_issue.number = 3
+ mock_repo.create_issue.return_value = mock_issue
+ mock_gh.return_value.get_repo.return_value = mock_repo
+
+ result = await skill_seeker_server.submit_config_tool(args)
+ self.assertIn("Config submitted successfully", result[0].text)
+ finally:
+ os.unlink(temp_path)
+
+ async def test_submit_config_detects_category(self):
+ """Should auto-detect category from config name"""
+ args = {
+ "config_json": '{"name": "react-test", "description": "React", "base_url": "https://react.dev/"}',
+ "github_token": "fake_token"
+ }
+
+ with patch('github.Github') as mock_gh:
+ mock_repo = MagicMock()
+ mock_issue = MagicMock()
+ mock_issue.html_url = "https://github.com/test/issue/4"
+ mock_issue.number = 4
+ mock_repo.create_issue.return_value = mock_issue
+ mock_gh.return_value.get_repo.return_value = mock_repo
+
+ result = await skill_seeker_server.submit_config_tool(args)
+ # Verify category appears in result
+ self.assertTrue("web-frameworks" in result[0].text or "Category" in result[0].text)
+
+
if __name__ == '__main__':
unittest.main()