fix(A1.3): Add comprehensive validation to submit_config MCP tool

Issue: #11 (A1.3 - Add MCP tool to submit custom configs) ## Summary Fixed submit_config MCP tool to use ConfigValidator for comprehensive validation instead of basic 3-field checks. Now supports both legacy and unified config formats with detailed error messages and validation warnings. ## Critical Gaps Fixed (6 total) 1. ✅ Missing comprehensive validation (HIGH) - Only checked 3 fields 2. ✅ No unified config support (HIGH) - Couldn't handle multi-source configs 3. ✅ No test coverage (MEDIUM) - Zero tests for submit_config_tool 4. ✅ No URL format validation (MEDIUM) - Accepted malformed URLs 5. ✅ No warnings for unlimited scraping (LOW) - Silent config issues 6. ✅ No url_patterns validation (MEDIUM) - No selector structure checks ## Changes Made ### Phase 1: Validation Logic (server.py lines 1224-1380) - Added ConfigValidator import with graceful degradation - Replaced basic validation (3 fields) with comprehensive ConfigValidator.validate() - Enhanced category detection for unified multi-source configs - Added validation warnings collection (unlimited scraping, missing max_pages) - Updated GitHub issue template with: * Config format type (Unified vs Legacy) * Validation warnings section * Updated documentation URL handling for unified configs * Checklist showing "Config validated with ConfigValidator" ### Phase 2: Test Coverage (test_mcp_server.py lines 617-769) Added 8 comprehensive test cases: 1. test_submit_config_requires_token - GitHub token requirement 2. test_submit_config_validates_required_fields - Required field validation 3. test_submit_config_validates_name_format - Name format validation 4. test_submit_config_validates_url_format - URL format validation 5. test_submit_config_accepts_legacy_format - Legacy config acceptance 6. test_submit_config_accepts_unified_format - Unified config acceptance 7. test_submit_config_from_file_path - File path input support 8. test_submit_config_detects_category - Category auto-detection ### Phase 3: Documentation Updates - Updated Issue #11 with completion notes - Updated tool description to mention format support - Updated CHANGELOG.md with fix details - Added EVOLUTION_ANALYSIS.md for deep architecture analysis ## Validation Improvements ### Before: ```python required_fields = ["name", "description", "base_url"] missing_fields = [field for field in required_fields if field not in config_data] if missing_fields: return error ``` ### After: ```python validator = ConfigValidator(config_data) validator.validate() # Comprehensive validation: # - Name format (alphanumeric, hyphens, underscores only) # - URL formats (must start with http:// or https://) # - Selectors structure (dict with proper keys) # - Rate limits (non-negative numbers) # - Max pages (positive integer or -1) # - Supports both legacy AND unified formats # - Provides detailed error messages with examples ``` ## Test Results ✅ All 427 tests passing (no regressions) ✅ 8 new tests for submit_config_tool ✅ No breaking changes ## Files Modified - src/skill_seekers/mcp/server.py (157 lines changed) - tests/test_mcp_server.py (157 lines added) - CHANGELOG.md (12 lines added) - EVOLUTION_ANALYSIS.md (500+ lines, new file) ## Issue Resolution Closes #11 - A1.3 now fully implemented with comprehensive validation, test coverage, and support for both config formats. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-21 18:32:20 +03:00
parent 1e50290fc7
commit cee3fcf025
4 changed files with 963 additions and 19 deletions
--- a/src/skill_seekers/mcp/server.py
+++ b/src/skill_seekers/mcp/server.py
@@ -39,6 +39,13 @@ app = Server("skill-seeker") if MCP_AVAILABLE and Server is not None else None
 # Path to CLI tools
 CLI_DIR = Path(__file__).parent.parent / "cli"

+# Import config validator for submit_config validation
+sys.path.insert(0, str(CLI_DIR))
+try:
+    from config_validator import ConfigValidator
+except ImportError:
+    ConfigValidator = None  # Graceful degradation if not available
+
 # Helper decorator that works even when app is None
 def safe_decorator(decorator_func):
    """Returns the decorator if MCP is available, otherwise returns a no-op"""
@@ -440,7 +447,7 @@ async def list_tools() -> list[Tool]:
        ),
        Tool(
            name="submit_config",
-            description="Submit a custom config file to the community. Creates a GitHub issue in skill-seekers-configs repo for review.",
+            description="Submit a custom config file to the community. Validates config (legacy or unified format) and creates a GitHub issue in skill-seekers-configs repo for review.",
            inputSchema={
                "type": "object",
                "properties": {
@@ -1255,24 +1262,77 @@ async def submit_config_tool(args: dict) -> list[TextContent]:
        else:
            return [TextContent(type="text", text="❌ Error: Must provide either config_path or config_json")]

-        # Validate required fields
-        required_fields = ["name", "description", "base_url"]
-        missing_fields = [field for field in required_fields if field not in config_data]
+        # Use ConfigValidator for comprehensive validation
+        if ConfigValidator is None:
+            return [TextContent(type="text", text="❌ Error: ConfigValidator not available. Please ensure config_validator.py is in the CLI directory.")]

-        if missing_fields:
-            return [TextContent(type="text", text=f"❌ Error: Missing required fields: {', '.join(missing_fields)}\n\nRequired: name, description, base_url")]
+        try:
+            validator = ConfigValidator(config_data)
+            validator.validate()

-        # Detect category
-        name_lower = config_name.lower()
-        category = "other"
-        if any(x in name_lower for x in ["react", "vue", "django", "laravel", "fastapi", "astro", "hono"]):
-            category = "web-frameworks"
-        elif any(x in name_lower for x in ["godot", "unity", "unreal"]):
-            category = "game-engines"
-        elif any(x in name_lower for x in ["kubernetes", "ansible", "docker"]):
-            category = "devops"
-        elif any(x in name_lower for x in ["tailwind", "bootstrap", "bulma"]):
-            category = "css-frameworks"
+            # Get format info
+            is_unified = validator.is_unified
+            config_name = config_data.get("name", "unnamed")
+
+        except ValueError as validation_error:
+            # Provide detailed validation feedback
+            error_msg = f"""❌ Config validation failed:
+
+{str(validation_error)}
+
+Please fix these issues and try again.
+
+💡 Validation help:
+- Names: alphanumeric, hyphens, underscores only (e.g., "my-framework", "react_docs")
+- URLs: must start with http:// or https://
+- Selectors: should be a dict with keys like 'main_content', 'title', 'code_blocks'
+- Rate limit: non-negative number (default: 0.5)
+- Max pages: positive integer or -1 for unlimited
+
+📚 Example configs: https://github.com/yusufkaraaslan/skill-seekers-configs/tree/main/official
+"""
+            return [TextContent(type="text", text=error_msg)]
+
+        # Detect category based on config format and content
+        if is_unified:
+            # For unified configs, look at source types
+            source_types = [src.get('type') for src in config_data.get('sources', [])]
+            if 'documentation' in source_types and 'github' in source_types:
+                category = "multi-source"
+            elif 'documentation' in source_types and 'pdf' in source_types:
+                category = "multi-source"
+            elif len(source_types) > 1:
+                category = "multi-source"
+            else:
+                category = "unified"
+        else:
+            # For legacy configs, use name-based detection
+            name_lower = config_name.lower()
+            category = "other"
+            if any(x in name_lower for x in ["react", "vue", "django", "laravel", "fastapi", "astro", "hono"]):
+                category = "web-frameworks"
+            elif any(x in name_lower for x in ["godot", "unity", "unreal"]):
+                category = "game-engines"
+            elif any(x in name_lower for x in ["kubernetes", "ansible", "docker"]):
+                category = "devops"
+            elif any(x in name_lower for x in ["tailwind", "bootstrap", "bulma"]):
+                category = "css-frameworks"
+
+        # Collect validation warnings
+        warnings = []
+        if not is_unified:
+            # Legacy config warnings
+            if 'max_pages' not in config_data:
+                warnings.append("⚠️ No max_pages set - will use default (100)")
+            elif config_data.get('max_pages') in (None, -1):
+                warnings.append("⚠️ Unlimited scraping enabled - may scrape thousands of pages and take hours")
+        else:
+            # Unified config warnings
+            for src in config_data.get('sources', []):
+                if src.get('type') == 'documentation' and 'max_pages' not in src:
+                    warnings.append(f"⚠️ No max_pages set for documentation source - will use default (100)")
+                elif src.get('type') == 'documentation' and src.get('max_pages') in (None, -1):
+                    warnings.append(f"⚠️ Unlimited scraping enabled for documentation source")

        # Check for GitHub token
        if not github_token:
@@ -1292,6 +1352,9 @@ async def submit_config_tool(args: dict) -> list[TextContent]:
 ### Category
 {category}

+### Config Format
+{"Unified (multi-source)" if is_unified else "Legacy (single-source)"}
+
 ### Configuration JSON
 ```json
 {config_json_str}
@@ -1301,12 +1364,15 @@ async def submit_config_tool(args: dict) -> list[TextContent]:
 {testing_notes if testing_notes else "Not provided"}

 ### Documentation URL
-{config_data.get('base_url', 'N/A')}
+{config_data.get('base_url') if not is_unified else 'See sources in config'}
+
+{"### Validation Warnings" if warnings else ""}
+{chr(10).join(f"- {w}" for w in warnings) if warnings else ""}

 ---

 ### Checklist
- [ ] Config validated
+- [x] Config validated with ConfigValidator
 - [ ] Test scraping completed
 - [ ] Added to appropriate category
 - [ ] API updated