Issue: #11 (A1.3 - Add MCP tool to submit custom configs)

## Summary

Fixed submit_config MCP tool to use ConfigValidator for comprehensive validation instead of basic 3-field checks. Now supports both legacy and unified config formats with detailed error messages and validation warnings.

## Critical Gaps Fixed (6 total)

1. ✅ Missing comprehensive validation (HIGH) - Only checked 3 fields
2. ✅ No unified config support (HIGH) - Couldn't handle multi-source configs
3. ✅ No test coverage (MEDIUM) - Zero tests for submit_config_tool
4. ✅ No URL format validation (MEDIUM) - Accepted malformed URLs
5. ✅ No warnings for unlimited scraping (LOW) - Silent config issues
6. ✅ No url_patterns validation (MEDIUM) - No selector structure checks

## Changes Made

### Phase 1: Validation Logic (server.py lines 1224-1380)

- Added ConfigValidator import with graceful degradation
- Replaced basic validation (3 fields) with comprehensive ConfigValidator.validate()
- Enhanced category detection for unified multi-source configs
- Added validation warnings collection (unlimited scraping, missing max_pages)
- Updated GitHub issue template with:
  * Config format type (Unified vs Legacy)
  * Validation warnings section
  * Updated documentation URL handling for unified configs
  * Checklist showing "Config validated with ConfigValidator"

### Phase 2: Test Coverage (test_mcp_server.py lines 617-769)

Added 8 comprehensive test cases:

1. test_submit_config_requires_token - GitHub token requirement
2. test_submit_config_validates_required_fields - Required field validation
3. test_submit_config_validates_name_format - Name format validation
4. test_submit_config_validates_url_format - URL format validation
5. test_submit_config_accepts_legacy_format - Legacy config acceptance
6. test_submit_config_accepts_unified_format - Unified config acceptance
7. test_submit_config_from_file_path - File path input support
8. test_submit_config_detects_category - Category auto-detection

### Phase 3: Documentation Updates

- Updated Issue #11 with completion notes
- Updated tool description to mention format support
- Updated CHANGELOG.md with fix details
- Added EVOLUTION_ANALYSIS.md for deep architecture analysis

## Validation Improvements

### Before:

```python
required_fields = ["name", "description", "base_url"]
missing_fields = [field for field in required_fields if field not in config_data]
if missing_fields:
    return error
```

### After:

```python
validator = ConfigValidator(config_data)
validator.validate()
# Comprehensive validation:
# - Name format (alphanumeric, hyphens, underscores only)
# - URL formats (must start with http:// or https://)
# - Selectors structure (dict with proper keys)
# - Rate limits (non-negative numbers)
# - Max pages (positive integer or -1)
# - Supports both legacy AND unified formats
# - Provides detailed error messages with examples
```

## Test Results

✅ All 427 tests passing (no regressions)
✅ 8 new tests for submit_config_tool
✅ No breaking changes

## Files Modified

- src/skill_seekers/mcp/server.py (157 lines changed)
- tests/test_mcp_server.py (157 lines added)
- CHANGELOG.md (12 lines added)
- EVOLUTION_ANALYSIS.md (500+ lines, new file)

## Issue Resolution

Closes #11 - A1.3 now fully implemented with comprehensive validation, test coverage, and support for both config formats.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Skill Seekers Evolution Analysis
Date: 2025-12-21
Focus: A1.3 Completion + A1.9 Multi-Source Architecture
🔍 Part 1: A1.3 Implementation Gap Analysis
What We Built vs What Was Required
✅ Completed Requirements:
- MCP tool submit_config - ✅ DONE
- Creates GitHub issue in skill-seekers-configs repo - ✅ DONE
- Uses issue template format - ✅ DONE
- Auto-labels (config-submission, needs-review) - ✅ DONE
- Returns GitHub issue URL - ✅ DONE
- Accepts config_path or config_json - ✅ DONE
- Validates required fields - ✅ DONE (basic)
❌ Missing/Incomplete:
- Robust Validation - Issue says "same validation as validate_config tool"
  - Current: Only checks that name, description, base_url exist
  - Should: Use config_validator.py, which validates:
    - URL formats (http/https)
    - Selector structure
    - Pattern arrays
    - Unified vs legacy format
    - Source types (documentation, github, pdf)
    - Merge modes
    - All nested fields
- URL Validation - Not checking if URLs are actually valid
  - Current: Just checks if base_url exists
  - Should: Validate URL format, check reachability (optional)
- Schema Validation - Not using the full validator
  - Current: Manual field checks
  - Should: ConfigValidator(config_data).validate()
🔧 What Needs to be Fixed:
# CURRENT (submit_config_tool):
required_fields = ["name", "description", "base_url"]
missing_fields = [field for field in required_fields if field not in config_data]
# Basic but incomplete
# SHOULD BE:
from config_validator import ConfigValidator
validator = ConfigValidator(config_data)
try:
    validator.validate()  # Comprehensive validation
except ValueError as e:
    return error_message(str(e))
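To make the difference concrete, here is a minimal stand-in for the kind of checks described above. It is not the project's ConfigValidator: the function name `validate_config_sketch`, the error wording, and the exact rules (name characters, http/https URLs, max_pages positive or -1) are assumptions drawn from the validation list in this document, and it only covers the legacy single-`base_url` shape.

```python
import re
from urllib.parse import urlparse

# Hypothetical rule: letters, digits, hyphens, underscores only.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")


def validate_config_sketch(config):
    """Return a list of problems; an empty list means the config passed."""
    errors = []

    # The three fields the basic check already looked at.
    for field in ("name", "description", "base_url"):
        if field not in config:
            errors.append(f"missing required field: {field}")

    name = config.get("name")
    if name is not None and not NAME_PATTERN.match(str(name)):
        errors.append(f"invalid name {name!r}: use letters, digits, hyphens, underscores")

    base_url = config.get("base_url")
    if base_url is not None:
        parsed = urlparse(str(base_url))
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append(f"invalid base_url {base_url!r}: must start with http:// or https://")

    max_pages = config.get("max_pages")
    if max_pages is not None:
        is_int = isinstance(max_pages, int) and not isinstance(max_pages, bool)
        if not is_int or (max_pages < 1 and max_pages != -1):
            errors.append("max_pages must be a positive integer or -1")

    return errors
```

Returning a list rather than raising on the first failure lets a caller report every problem in one response; the real validator may behave differently.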
🚀 Part 2: A1.9 Multi-Source Architecture - The Big Picture
Current State: Single Source System
User → fetch_config → API → skill-seekers-configs (GitHub) → Download
Limitations:
- Only ONE source of configs (official public repo)
- Can't use private configs
- Can't share configs within teams
- Can't create custom collections
- Centralized dependency
Future State: Multi-Source Federation
User → fetch_config → Source Manager → [
    Priority 1: Official (public)
    Priority 2: Team Private Repo
    Priority 3: Personal Configs
    Priority 4: Custom Collections
] → Download
Capabilities:
- Multiple config sources
- Public + Private repos
- Team collaboration
- Personal configs
- Custom curated collections
- Decentralized, federated system
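The priority-ordered lookup in the diagram above could look roughly like this. The sketch assumes each source is already cached at `<cache_root>/<source name>/`, that configs sit flat in that directory as `<config_name>.json`, and that a lower priority number wins. The function names and the directory layout are illustrative assumptions, not the project's actual design.

```python
import json
from pathlib import Path


def find_config(config_name, sources, cache_root):
    """Return (source_name, config_path, config_data) for the first match, else None.

    `sources` is a list of dicts with "name" and "priority" keys (assumed shape).
    """
    for src in sorted(sources, key=lambda s: s["priority"]):
        candidate = Path(cache_root) / src["name"] / f"{config_name}.json"
        if candidate.is_file():
            data = json.loads(candidate.read_text(encoding="utf-8"))
            return src["name"], candidate, data
    return None


def resolve_conflicts(config_name, sources, cache_root):
    """List every source that provides the config, highest priority first."""
    return [
        src["name"]
        for src in sorted(sources, key=lambda s: s["priority"])
        if (Path(cache_root) / src["name"] / f"{config_name}.json").is_file()
    ]
```

With this shape, a team source can shadow an official config only if it is given a lower priority number, which keeps the resolution order explicit.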
🎯 Part 3: Evolution Vision - The Three Horizons
Horizon 1: Official Configs (CURRENT - A1.1 to A1.3)
✅ Status: Complete
What: Single public repository (skill-seekers-configs)
Users: Everyone, public community
Paradigm: Centralized, curated, verified configs
Horizon 2: Multi-Source Federation (A1.9)
🔨 Status: Proposed
What: Support multiple git repositories as config sources
Users: Teams (3-5 people), organizations, individuals
Paradigm: Decentralized, federated, user-controlled
Key Features:
- Direct git URL support
- Named sources (register once, use many times)
- Authentication (GitHub/GitLab/Bitbucket tokens)
- Caching (local clones)
- Priority-based resolution
- Public OR private repos
Implementation:
# Option 1: Direct URL (one-off)
fetch_config(
    git_url='https://github.com/myteam/configs.git',
    config_name='internal-api',
    token='$GITHUB_TOKEN'
)
# Option 2: Named source (reusable)
add_config_source(
    name='team',
    git_url='https://github.com/myteam/configs.git',
    token='$GITHUB_TOKEN'
)
fetch_config(source='team', config_name='internal-api')
# Option 3: Config file
# ~/.skill-seekers/sources.json
{
  "sources": [
    {"name": "official", "git_url": "...", "priority": 1},
    {"name": "team", "git_url": "...", "priority": 2, "token": "$TOKEN"}
  ]
}
Horizon 3: Skill Marketplace (Future - A1.13+)
💭 Status: Vision
What: Full ecosystem of shareable configs AND skills
Users: Entire community, marketplace dynamics
Paradigm: Platform, network effects, curation
Key Features:
- Browse all public sources
- Star/rate configs
- Download counts, popularity
- Verified configs (badge system)
- Share built skills (not just configs)
- Continuous updates (watch repos)
- Notifications
🏗️ Part 4: Technical Architecture for A1.9
Layer 1: Source Management
~/.skill-seekers/sources.json (cache_ttl is in seconds: 86400 is 24 hours, 3600 is 1 hour):
{
  "version": "1.0",
  "default_source": "official",
  "sources": [
    {
      "name": "official",
      "type": "git",
      "git_url": "https://github.com/yusufkaraaslan/skill-seekers-configs.git",
      "branch": "main",
      "enabled": true,
      "priority": 1,
      "cache_ttl": 86400
    },
    {
      "name": "team",
      "type": "git",
      "git_url": "https://github.com/myteam/private-configs.git",
      "branch": "main",
      "token_env": "TEAM_GITHUB_TOKEN",
      "enabled": true,
      "priority": 2,
      "cache_ttl": 3600
    }
  ]
}
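Reading this file and ordering its sources can be sketched as follows. The helper names `load_sources` and `get_sources_by_priority` mirror names used elsewhere in this document, but their bodies here are assumptions: a missing file is treated as "no sources", a missing `enabled` key as enabled, and a missing `priority` as lowest priority.

```python
import json
from pathlib import Path


def load_sources(config_file="~/.skill-seekers/sources.json"):
    """Read the "sources" array; return an empty list if the file does not exist."""
    path = Path(config_file).expanduser()
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return json.load(fh).get("sources", [])


def get_sources_by_priority(sources):
    """Enabled sources only, lowest priority number first."""
    enabled = [s for s in sources if s.get("enabled", True)]
    # Sources without a priority sort after every numbered source.
    return sorted(enabled, key=lambda s: s.get("priority", float("inf")))
```

Note that the file must be strict JSON for `json.load` to accept it, so explanatory comments belong outside the file.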
Source Manager Class:
from pathlib import Path

class SourceManager:
    def __init__(self, config_file="~/.skill-seekers/sources.json"):
        self.config_file = Path(config_file).expanduser()
        self.sources = self.load_sources()

    def add_source(self, name, git_url, token=None, priority=None):
        """Register a new config source"""

    def remove_source(self, name):
        """Remove a registered source"""

    def list_sources(self):
        """List all registered sources"""

    def get_source(self, name):
        """Get source by name"""

    def search_config(self, config_name):
        """Search for config across all sources (priority order)"""
Layer 2: Git Operations
from pathlib import Path

class GitConfigRepo:
    def __init__(self, source_config):
        self.url = source_config['git_url']
        self.branch = source_config.get('branch', 'main')
        # expanduser() is required: Path does not expand "~" on its own
        self.cache_dir = Path("~/.skill-seekers/cache").expanduser() / source_config['name']
        self.token = self._get_token(source_config)

    def clone_or_update(self):
        """Clone if not exists, else pull"""
        if not self.cache_dir.exists():
            self._clone()
        else:
            self._pull()

    def _clone(self):
        """Shallow clone for efficiency"""
        # git clone --depth 1 --branch {branch} {url} {cache_dir}

    def _pull(self):
        """Update existing clone"""
        # git -C {cache_dir} pull

    def list_configs(self):
        """Scan cache_dir for .json files"""

    def get_config(self, config_name):
        """Read specific config file"""
Library Choice:
- GitPython: High-level, Pythonic API ✅ RECOMMENDED
- pygit2: Low-level, faster, complex
- subprocess: Simple, works everywhere
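For comparison with the recommended GitPython route, the subprocess option can be sketched in a few lines using the two git commands quoted in the class above. The function names and the injectable `runner` parameter are assumptions added for illustration and testability; token handling is deliberately left out.

```python
import subprocess
from pathlib import Path


def build_clone_command(url, branch, cache_dir):
    """Shallow, single-branch clone, as in the _clone comment above."""
    return ["git", "clone", "--depth", "1", "--branch", branch, url, str(cache_dir)]


def build_pull_command(cache_dir):
    """Update an existing clone in place, as in the _pull comment above."""
    return ["git", "-C", str(cache_dir), "pull"]


def clone_or_update(url, branch, cache_dir, runner=subprocess.run):
    """Clone when no repository exists at cache_dir, otherwise pull.

    Passing the command as a list (no shell=True) keeps the URL from being
    interpreted by a shell. `runner` defaults to subprocess.run and can be
    replaced with a fake in tests.
    """
    cache_dir = Path(cache_dir).expanduser()
    if (cache_dir / ".git").exists():
        cmd = build_pull_command(cache_dir)
    else:
        cmd = build_clone_command(url, branch, cache_dir)
    runner(cmd, check=True)  # check=True raises CalledProcessError on failure
    return cmd
```

Checking for `.git` rather than the directory itself avoids pulling inside a leftover empty directory from a failed clone.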
Layer 3: Config Discovery & Resolution
class ConfigDiscovery:
    def __init__(self, source_manager):
        self.source_manager = source_manager

    def find_config(self, config_name, source=None):
        """
        Find config across sources

        Args:
            config_name: Name of config to find
            source: Optional specific source name

        Returns:
            (source_name, config_path, config_data)
        """
        if source:
            # Search in specific source only
            return self._search_source(source, config_name)
        else:
            # Search all sources in priority order
            for src in self.source_manager.get_sources_by_priority():
                result = self._search_source(src['name'], config_name)
                if result:
                    return result
            return None

    def list_all_configs(self, source=None):
        """List configs from one or all sources"""

    def resolve_conflicts(self, config_name):
        """Find all sources that have this config"""
Layer 4: Authentication & Security
import os

class TokenManager:
    def __init__(self):
        self.use_keyring = self._check_keyring()

    def _check_keyring(self):
        """Check if keyring library available"""
        try:
            import keyring
            return True
        except ImportError:
            return False

    def store_token(self, source_name, token):
        """Store token securely"""
        if self.use_keyring:
            import keyring
            keyring.set_password("skill-seekers", source_name, token)
        else:
            # Fall back to env var prompt
            print(f"Set environment variable: {source_name.upper()}_TOKEN")

    def get_token(self, source_name, env_var=None):
        """Retrieve token"""
        # Try keyring first
        if self.use_keyring:
            import keyring
            token = keyring.get_password("skill-seekers", source_name)
            if token:
                return token
        # Try the explicitly configured environment variable;
        # fall through if it is unset so the default pattern still gets a chance
        if env_var and os.environ.get(env_var):
            return os.environ[env_var]
        # Try default patterns
        return os.environ.get(f"{source_name.upper()}_TOKEN")
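One detail the default pattern leaves open is source names that are not valid in environment variable names, such as `ai-frameworks`, which would yield `AI-FRAMEWORKS_TOKEN`. A possible normalization is sketched below; the helper names and the rule of replacing every non-alphanumeric run with an underscore are assumptions, not part of the design above.

```python
import os
import re


def default_token_env_var(source_name):
    """Build a shell-safe variable name, e.g. "ai-frameworks" -> "AI_FRAMEWORKS_TOKEN"."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", source_name).strip("_").upper()
    if not cleaned:
        raise ValueError("source name has no usable characters")
    if cleaned[0].isdigit():
        # Variable names cannot start with a digit in POSIX shells.
        cleaned = "_" + cleaned
    return f"{cleaned}_TOKEN"


def get_token_from_env(source_name, env_var=None, environ=os.environ):
    """Try the explicit variable first, then the derived default; None if neither is set."""
    for candidate in (env_var, default_token_env_var(source_name)):
        if candidate and environ.get(candidate):
            return environ[candidate]
    return None
```

The `environ` parameter exists only so the lookup can be exercised with a plain dict instead of the real process environment.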
📊 Part 5: Use Case Matrix
| Use Case | Users | Visibility | Auth | Priority |
|---|---|---|---|---|
| Official Configs | Everyone | Public | None | High |
| Team Configs | 3-5 people | Private | GitHub Token | Medium |
| Personal Configs | Individual | Private | GitHub Token | Low |
| Public Collections | Community | Public | None | Medium |
| Enterprise Configs | Organization | Private | GitLab Token | High |
Scenario 1: Startup Team (5 developers)
Setup:
# Team lead creates private repo and clones it locally
gh repo create startup/skill-configs --private --clone
cd skill-configs
mkdir -p official/internal-apis
# Add configs for internal services
git add . && git commit -m "Add internal API configs"
git push -u origin HEAD
Team Usage:
# Each developer adds source (one-time)
add_config_source(
    name='startup',
    git_url='https://github.com/startup/skill-configs.git',
    token='$GITHUB_TOKEN'
)
# Daily usage
fetch_config(source='startup', config_name='backend-api')
fetch_config(source='startup', config_name='frontend-components')
fetch_config(source='startup', config_name='mobile-api')
# Also use official configs
fetch_config(config_name='react') # From official
Scenario 2: Enterprise (500+ developers)
Setup:
# Multiple teams, multiple repos
# Platform team
gitlab.company.com/platform/skill-configs
# Mobile team
gitlab.company.com/mobile/skill-configs
# Data team
gitlab.company.com/data/skill-configs
Usage:
# Central IT pre-configures sources
add_config_source('official', '...', priority=1)
add_config_source('platform', 'gitlab.company.com/platform/...', priority=2)
add_config_source('mobile', 'gitlab.company.com/mobile/...', priority=3)
add_config_source('data', 'gitlab.company.com/data/...', priority=4)
# Developers use transparently
fetch_config('internal-platform') # Found in platform source
fetch_config('react') # Found in official
fetch_config('company-data-api') # Found in data source
Scenario 3: Open Source Curator
Setup:
# Community member creates curated collection
gh repo create awesome-ai/skill-configs --public
# Adds 50+ AI framework configs
Community Usage:
# Anyone can add this public collection
add_config_source(
    name='ai-frameworks',
    git_url='https://github.com/awesome-ai/skill-configs.git'
)
# Access curated configs
fetch_config(source='ai-frameworks', list_available=True)
# Shows: tensorflow, pytorch, jax, keras, transformers, etc.
🎨 Part 6: Design Decisions & Trade-offs
Decision 1: Git vs API vs Database
| Approach | Pros | Cons | Verdict |
|---|---|---|---|
| Git repos | Version control; existing auth; offline capable; familiar | Git dependency; clone overhead; disk space | ✅ CHOOSE THIS |
| Central API | Fast; no git needed; easy search | Single point of failure; no offline; server costs | ❌ Not decentralized |
| Database | Fast queries; advanced search | Complex setup; not portable | ❌ Over-engineered |
Winner: Git repositories - aligns with developer workflows, decentralized, free hosting
Decision 2: Caching Strategy
| Strategy | Disk Usage | Speed | Freshness | Verdict |
|---|---|---|---|---|
| No cache | None | Slow (clone each time) | Always fresh | ❌ Too slow |
| Full clone | High (~50MB per repo) | Medium | Manual refresh | ⚠️ Acceptable |
| Shallow clone | Low (~5MB per repo) | Fast | Manual refresh | ✅ BEST |
| Sparse checkout | Minimal (~1MB) | Fast | Manual refresh | ✅ IDEAL |
Winner: Shallow clone with TTL-based auto-refresh
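The TTL half of that decision reduces to one comparison. The sketch below assumes the time of the last successful fetch is recorded somewhere per source (how and where is not specified in this document) and that `cache_ttl` is in seconds, as in sources.json; the function name is hypothetical.

```python
import time


def is_cache_stale(last_fetch_epoch, cache_ttl, now=None):
    """True when the cached clone should be refreshed.

    last_fetch_epoch: Unix timestamp of the last successful fetch, or None
                      if the source has never been fetched.
    cache_ttl:        allowed age in seconds (e.g. 86400 for 24 hours).
    now:              injectable clock for testing; defaults to time.time().
    """
    if last_fetch_epoch is None:
        return True
    current = time.time() if now is None else now
    return (current - last_fetch_epoch) >= cache_ttl
```

An explicit refresh option can then bypass this check, while offline use can ignore a stale result and serve whatever is cached.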
Decision 3: Token Storage
| Method | Security | Ease | Cross-platform | Verdict |
|---|---|---|---|---|
| Plain text | ❌ Insecure | ✅ Easy | ✅ Yes | ❌ NO |
| Keyring | ✅ Secure | ⚠️ Medium | ⚠️ Mostly | ✅ PRIMARY |
| Env vars only | ⚠️ OK | ✅ Easy | ✅ Yes | ✅ FALLBACK |
| Encrypted file | ⚠️ OK | ❌ Complex | ✅ Yes | ❌ Over-engineered |
Winner: Keyring (primary) + Environment variables (fallback)
🛣️ Part 7: Implementation Roadmap
Phase 1: Prototype (1-2 hours)
Goal: Prove the concept works
# Just add git_url parameter to fetch_config
fetch_config(
    git_url='https://github.com/user/configs.git',
    config_name='test'
)
# Temp clone, no caching, basic only
Deliverable: Working proof-of-concept
Phase 2: Basic Multi-Source (3-4 hours) - A1.9
Goal: Production-ready multi-source support
New MCP Tools:
- add_config_source - Register sources
- list_config_sources - Show registered sources
- remove_config_source - Unregister sources
Enhanced fetch_config:
- Add source parameter
- Add git_url parameter
- Add branch parameter
- Add token parameter
- Add refresh parameter
Infrastructure:
- SourceManager class
- GitConfigRepo class
- ~/.skill-seekers/sources.json
- Shallow clone caching
Deliverable: Team-ready multi-source system
Phase 3: Advanced Features (4-6 hours)
Goal: Enterprise features
Features:
- Multi-source search: Search config across all sources
- Conflict resolution: Show all sources with same config name
- Token management: Keyring integration
- Auto-refresh: TTL-based cache updates
- Offline mode: Work without network
Deliverable: Enterprise-ready system
Phase 4: Polish & UX (2-3 hours)
Goal: Great user experience
Features:
- Better error messages
- Progress indicators for git ops
- Source validation (check URL before adding)
- Migration tool (convert old to new)
- Documentation & examples
🔒 Part 8: Security Considerations
Threat Model
| Threat | Impact | Mitigation |
|---|---|---|
| Malicious git URL | Code execution via git exploits | URL validation, shallow clone, sandboxing |
| Token exposure | Unauthorized repo access | Keyring storage, never log tokens |
| Supply chain attack | Malicious configs | Config validation, source trust levels |
| MITM attacks | Token interception | HTTPS only, certificate verification |
Security Measures
- URL Validation:
    def validate_git_url(url):
        # Only allow https://, git@, file:// (file only in dev mode)
        # Block suspicious patterns
        # DNS lookup to prevent SSRF
- Token Handling:
    # NEVER do this:
    logger.info(f"Using token: {token}")  # ❌
    # DO this:
    logger.info("Using token: <redacted>")  # ✅
- Config Sandboxing:
    # Validate configs from untrusted sources
    ConfigValidator(untrusted_config).validate()
    # Check for suspicious patterns
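The first two measures can be sketched with the standard library alone. This is a partial illustration under stated assumptions: the `allow_file` flag stands in for "dev mode", `redact_url` is a hypothetical helper for logging, and the DNS/SSRF check mentioned above is not implemented here.

```python
from urllib.parse import urlparse, urlunparse


def validate_git_url(url, allow_file=False):
    """Return url unchanged if it uses an allowed form, else raise ValueError."""
    if url.startswith("git@"):
        # scp-like SSH syntax: git@host:owner/repo.git
        host, sep, path = url[len("git@"):].partition(":")
        if not (sep and host and path):
            raise ValueError("malformed git@ URL")
        return url
    parsed = urlparse(url)
    allowed = {"https", "file"} if allow_file else {"https"}
    if parsed.scheme not in allowed:
        # Also rejects option-like strings such as "--upload-pack=...",
        # which have no scheme at all.
        raise ValueError(f"scheme not allowed: {parsed.scheme or '(none)'}")
    if parsed.scheme == "https" and not parsed.hostname:
        raise ValueError("https URL has no host")
    return url


def redact_url(url):
    """Replace embedded credentials (user:token@host) before logging a URL."""
    parsed = urlparse(url)
    if parsed.username is None and parsed.password is None:
        return url
    host = parsed.hostname or ""
    if parsed.port:
        host = f"{host}:{parsed.port}"
    return urlunparse(parsed._replace(netloc=f"<redacted>@{host}"))
```

Rejecting plain `http://` follows the "HTTPS only" mitigation in the threat model table; a deployment that needs other schemes would have to widen the allow-list deliberately.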
💡 Part 9: Key Insights & Recommendations
What Makes This Powerful
- Network Effects: More sources → More configs → More value
- Zero Lock-in: Use any git hosting (GitHub, GitLab, Bitbucket, self-hosted)
- Privacy First: Keep sensitive configs private
- Team-Friendly: Perfect for 3-5 person teams
- Decentralized: No single point of failure
Competitive Advantage
This makes Skill Seekers similar to:
- npm: Multiple registries (npmjs.com + private)
- Docker: Multiple registries (Docker Hub + private)
- PyPI: Public + private package indexes
- Git: Multiple remotes
But for CONFIG FILES instead of packages!
Business Model Implications
- Official repo: Free, public, community-driven
- Private repos: Users bring their own (GitHub, GitLab)
- Enterprise features: Could offer sync services, mirrors, caching
- Marketplace: Future monetization via verified configs, premium features
What to Build NEXT
Immediate Priority:
- Fix A1.3: Use proper ConfigValidator for submit_config
- Start A1.9 Phase 1: Prototype git_url parameter
- Test with public repos: Prove concept before private repos
This Week:
- A1.3 validation fix (30 minutes)
- A1.9 Phase 1 prototype (2 hours)
- A1.9 Phase 2 implementation (3-4 hours)
This Month:
- A1.9 Phase 3 (advanced features)
- A1.7 (install_skill workflow)
- Documentation & examples
🎯 Part 10: Action Items
Critical (Do Now):
- Fix A1.3 Validation ⚠️ HIGH PRIORITY
    # In submit_config_tool, replace basic validation with:
    from config_validator import ConfigValidator
    try:
        validator = ConfigValidator(config_data)
        validator.validate()
    except ValueError as e:
        return error_with_details(e)
- Test A1.9 Concept
    # Quick prototype - add to fetch_config:
    if git_url:
        temp_dir = tempfile.mkdtemp()
        subprocess.run(['git', 'clone', '--depth', '1', git_url, temp_dir])
        # Read config from temp_dir
High Priority (This Week):
-
Implement A1.9 Phase 2
- SourceManager class
- add_config_source tool
- Enhanced fetch_config
- Caching infrastructure
-
Documentation
- Update A1.9 issue with implementation plan
- Create MULTI_SOURCE_GUIDE.md
- Update README with examples
Medium Priority (This Month):
- A1.7 - install_skill (most user value!)
- A1.4 - Static website (visibility)
- Polish & testing
🤔 Open Questions for Discussion
- Validation: Should submit_config use full ConfigValidator or keep it simple?
- Caching: 24-hour TTL too long/short for team repos?
- Priority: Should A1.7 (install_skill) come before A1.9?
- Security: Keyring mandatory or optional?
- UX: Auto-refresh on every fetch vs manual refresh command?
- Migration: How to migrate existing users to multi-source model?
📈 Success Metrics
A1.9 Success Criteria:
- Can add custom git repo as source
- Can fetch config from private GitHub repo
- Can fetch config from private GitLab repo
- Caching works (no repeated clones)
- Token auth works (HTTPS + token)
- Multiple sources work simultaneously
- Priority resolution works correctly
- Offline mode works with cache
- Documentation complete
- Tests pass
Adoption Goals:
- Week 1: 5 early adopters test private repos
- Month 1: 10 teams using team-shared configs
- Month 3: 50+ custom config sources registered
- Month 6: Feature parity with npm's registry system
🎉 Conclusion
The Evolution:
Current: ONE official public repo
↓
A1.9: MANY repos (public + private)
↓
Future: ECOSYSTEM (marketplace, ratings, continuous updates)
The Vision: Transform Skill Seekers from a "tool with configs" into a "platform for config sharing" - the npm/PyPI of documentation configs.
Next Steps:
- Fix A1.3 validation (30 min)
- Prototype A1.9 (2 hours)
- Implement A1.9 Phase 2 (3-4 hours)
- Merge and deploy! 🚀