Phase 4 Complete: - Updated README.md with git source usage examples and use cases - Created docs/GIT_CONFIG_SOURCES.md (800+ lines comprehensive guide) - Updated CHANGELOG.md with v2.2.0 release notes - Added configs/example-team/ example repository with E2E test Documentation covers: - Quick start and architecture - MCP tools reference (4 tools with examples) - Authentication for GitHub, GitLab, Bitbucket - Use cases (small teams, enterprise, open source) - Best practices, troubleshooting, advanced topics - Complete API reference Example repository includes: - 3 example configs (react-custom, vue-internal, company-api) - README with usage guide - E2E test script (7 steps, 100% passing) 🤖 Generated with Claude Code Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
20 KiB
Git-Based Config Sources - Complete Guide
Version: v2.2.0 Feature: A1.9 - Multi-Source Git Repository Support Last Updated: December 21, 2025
Table of Contents
- Overview
- Quick Start
- Architecture
- MCP Tools Reference
- Authentication
- Use Cases
- Best Practices
- Troubleshooting
- Advanced Topics
Overview
What is this feature?
Git-based config sources allow you to fetch config files from private/team git repositories in addition to the public API. This unlocks:
- 🔐 Private configs - Company/internal documentation
- 👥 Team collaboration - Share configs across 3-5 person teams
- 🏢 Enterprise scale - Support 500+ developers
- 📦 Custom collections - Curated config repositories
- 🌐 Decentralized - Like npm (public + private registries)
How it works
User → fetch_config(source="team", config_name="react-custom")
↓
SourceManager (~/.skill-seekers/sources.json)
↓
GitConfigRepo (clone/pull with GitPython)
↓
Local cache (~/.skill-seekers/cache/team/)
↓
Config JSON returned
Three modes
-
API Mode (existing, unchanged)
fetch_config(config_name="react")- Fetches from api.skillseekersweb.com
-
Source Mode (NEW - recommended)
fetch_config(source="team", config_name="react-custom")- Uses registered git source
-
Git URL Mode (NEW - one-time)
fetch_config(git_url="https://...", config_name="react-custom")- Direct clone without registration
Quick Start
1. Set up authentication
# GitHub
export GITHUB_TOKEN=ghp_your_token_here
# GitLab
export GITLAB_TOKEN=glpat_your_token_here
# Bitbucket
export BITBUCKET_TOKEN=your_token_here
2. Register a source
Using MCP tools (recommended):
add_config_source(
name="team",
git_url="https://github.com/mycompany/skill-configs.git",
source_type="github", # Optional, auto-detected
token_env="GITHUB_TOKEN", # Optional, auto-detected
branch="main", # Optional, default: "main"
priority=100 # Optional, lower = higher priority
)
3. Fetch configs
# From registered source
fetch_config(source="team", config_name="react-custom")
# List available sources
list_config_sources()
# Remove when done
remove_config_source(name="team")
4. Quick test with example repository
cd /path/to/Skill_Seekers
# Run E2E test
python3 configs/example-team/test_e2e.py
# Or test manually
add_config_source(
name="example",
git_url="file://$(pwd)/configs/example-team",
branch="master"
)
fetch_config(source="example", config_name="react-custom")
Architecture
Storage Locations
Sources Registry:
~/.skill-seekers/sources.json
Example content:
{
"version": "1.0",
"sources": [
{
"name": "team",
"git_url": "https://github.com/myorg/configs.git",
"type": "github",
"token_env": "GITHUB_TOKEN",
"branch": "main",
"enabled": true,
"priority": 1,
"added_at": "2025-12-21T10:00:00Z",
"updated_at": "2025-12-21T10:00:00Z"
}
]
}
Cache Directory:
$SKILL_SEEKERS_CACHE_DIR (default: ~/.skill-seekers/cache/)
Structure:
~/.skill-seekers/
├── sources.json # Source registry
└── cache/ # Git clones
├── team/ # One directory per source
│ ├── .git/
│ ├── react-custom.json
│ └── vue-internal.json
└── company/
├── .git/
└── internal-api.json
Git Strategy
-
Shallow clone:
git clone --depth 1 --single-branch- 10-50x faster
- Minimal disk space
- No history, just latest commit
-
Auto-pull: Updates cache automatically
- Checks for changes on each fetch
- Use
refresh=trueto force re-clone
-
Config discovery: Recursively scans for
*.jsonfiles- No hardcoded paths
- Flexible repository structure
- Excludes
.gitdirectory
MCP Tools Reference
add_config_source
Register a git repository as a config source.
Parameters:
name(required): Source identifier (lowercase, alphanumeric, hyphens/underscores)git_url(required): Git repository URL (HTTPS or SSH)source_type(optional): "github", "gitlab", "gitea", "bitbucket", "custom" (auto-detected from URL)token_env(optional): Environment variable name for token (auto-detected from type)branch(optional): Git branch (default: "main")priority(optional): Priority number (default: 100, lower = higher priority)enabled(optional): Whether source is active (default: true)
Returns:
- Source details including registration timestamp
Examples:
# Minimal (auto-detects everything)
add_config_source(
name="team",
git_url="https://github.com/myorg/configs.git"
)
# Full parameters
add_config_source(
name="company",
git_url="https://gitlab.company.com/platform/configs.git",
source_type="gitlab",
token_env="GITLAB_COMPANY_TOKEN",
branch="develop",
priority=1,
enabled=true
)
# SSH URL (auto-converts to HTTPS with token)
add_config_source(
name="team",
git_url="git@github.com:myorg/configs.git",
token_env="GITHUB_TOKEN"
)
list_config_sources
List all registered config sources.
Parameters:
enabled_only(optional): Only show enabled sources (default: false)
Returns:
- List of sources sorted by priority
Example:
# List all sources
list_config_sources()
# List only enabled sources
list_config_sources(enabled_only=true)
Output:
📋 Config Sources (2 total)
✓ **team**
📁 https://github.com/myorg/configs.git
🔖 Type: github | 🌿 Branch: main
🔑 Token: GITHUB_TOKEN | ⚡ Priority: 1
🕒 Added: 2025-12-21 10:00:00
✓ **company**
📁 https://gitlab.company.com/configs.git
🔖 Type: gitlab | 🌿 Branch: develop
🔑 Token: GITLAB_TOKEN | ⚡ Priority: 2
🕒 Added: 2025-12-21 11:00:00
remove_config_source
Remove a registered config source.
Parameters:
name(required): Source identifier
Returns:
- Success/failure message
Note: Does NOT delete cached git repository data. To free disk space, manually delete ~/.skill-seekers/cache/{source_name}/
Example:
remove_config_source(name="team")
fetch_config
Fetch config from API, git URL, or named source.
Mode 1: Named Source (highest priority)
fetch_config(
source="team", # Use registered source
config_name="react-custom",
destination="configs/", # Optional
branch="main", # Optional, overrides source default
refresh=false # Optional, force re-clone
)
Mode 2: Direct Git URL
fetch_config(
git_url="https://github.com/myorg/configs.git",
config_name="react-custom",
branch="main", # Optional
token="ghp_token", # Optional, prefer env vars
destination="configs/", # Optional
refresh=false # Optional
)
Mode 3: API (existing, unchanged)
fetch_config(
config_name="react",
destination="configs/" # Optional
)
# Or list available
fetch_config(list_available=true)
Authentication
Environment Variables Only
Tokens are ONLY stored in environment variables. This is:
- ✅ Secure - Not in files, not in git
- ✅ Standard - Same as GitHub CLI, Docker, etc.
- ✅ Temporary - Cleared on logout
- ✅ Flexible - Different tokens for different services
Creating Tokens
GitHub:
- Go to https://github.com/settings/tokens
- Generate new token (classic)
- Select scopes:
repo(for private repos) - Copy token:
ghp_xxxxxxxxxxxxx - Export:
export GITHUB_TOKEN=ghp_xxxxxxxxxxxxx
GitLab:
- Go to https://gitlab.com/-/profile/personal_access_tokens
- Create token with
read_repositoryscope - Copy token:
glpat-xxxxxxxxxxxxx - Export:
export GITLAB_TOKEN=glpat-xxxxxxxxxxxxx
Bitbucket:
- Go to https://bitbucket.org/account/settings/app-passwords/
- Create app password with
Repositories: Readpermission - Copy password
- Export:
export BITBUCKET_TOKEN=your_password
Persistent Tokens
Add to your shell profile (~/.bashrc, ~/.zshrc, etc.):
# GitHub token
export GITHUB_TOKEN=ghp_xxxxxxxxxxxxx
# GitLab token
export GITLAB_TOKEN=glpat-xxxxxxxxxxxxx
# Company GitLab (separate token)
export GITLAB_COMPANY_TOKEN=glpat-yyyyyyyyyyyyy
Then: source ~/.bashrc
Token Injection
GitConfigRepo automatically:
- Converts SSH URLs to HTTPS
- Injects token into URL
- Uses token for authentication
Example:
- Input:
git@github.com:myorg/repo.git+ tokenghp_xxx - Output:
https://ghp_xxx@github.com/myorg/repo.git
Use Cases
Small Team (3-5 people)
Scenario: Frontend team needs custom React configs for internal docs.
Setup:
# 1. Team lead creates repo
gh repo create myteam/skill-configs --private
# 2. Add configs
cd myteam-skill-configs
cp ../Skill_Seekers/configs/react.json ./react-internal.json
# Edit for internal docs:
# - Change base_url to internal docs site
# - Adjust selectors for company theme
# - Customize categories
git add . && git commit -m "Add internal React config" && git push
# 3. Team members register (one-time)
export GITHUB_TOKEN=ghp_their_token
add_config_source(
name="team",
git_url="https://github.com/myteam/skill-configs.git"
)
# 4. Daily usage
fetch_config(source="team", config_name="react-internal")
Benefits:
- ✅ Shared configs across team
- ✅ Version controlled
- ✅ Private to company
- ✅ Easy updates (git push)
Enterprise (500+ developers)
Scenario: Large company with multiple teams, internal docs, and priority-based config resolution.
Setup:
# IT pre-configures sources for all developers
# (via company setup script or documentation)
# 1. Platform team configs (highest priority)
add_config_source(
name="platform",
git_url="https://gitlab.company.com/platform/skill-configs.git",
source_type="gitlab",
token_env="GITLAB_COMPANY_TOKEN",
priority=1
)
# 2. Mobile team configs
add_config_source(
name="mobile",
git_url="https://gitlab.company.com/mobile/skill-configs.git",
source_type="gitlab",
token_env="GITLAB_COMPANY_TOKEN",
priority=2
)
# 3. Public/official configs (fallback)
# (API mode, no registration needed, lowest priority)
Developer usage:
# Automatically finds config with highest priority
fetch_config(config_name="platform-api") # Found in platform source
fetch_config(config_name="react-native") # Found in mobile source
fetch_config(config_name="react") # Falls back to public API
Benefits:
- ✅ Centralized config management
- ✅ Team-specific overrides
- ✅ Fallback to public configs
- ✅ Priority-based resolution
- ✅ Scales to hundreds of developers
Open Source Project
Scenario: Open source project wants curated configs for contributors.
Setup:
# 1. Create public repo
gh repo create myproject/skill-configs --public
# 2. Add configs for project stack
- react.json (frontend)
- django.json (backend)
- postgres.json (database)
- nginx.json (deployment)
# 3. Contributors use directly (no token needed for public repos)
add_config_source(
name="myproject",
git_url="https://github.com/myproject/skill-configs.git"
)
fetch_config(source="myproject", config_name="react")
Benefits:
- ✅ Curated configs for project
- ✅ No API dependency
- ✅ Community contributions via PR
- ✅ Version controlled
Best Practices
Config Naming
Good:
react-internal.json- Clear purposeapi-v2.json- Version includedplatform-auth.json- Specific topic
Bad:
config1.json- Genericreact.json- Conflicts with officialtest.json- Not descriptive
Repository Structure
Flat (recommended for small repos):
skill-configs/
├── README.md
├── react-internal.json
├── vue-internal.json
└── api-v2.json
Organized (recommended for large repos):
skill-configs/
├── README.md
├── frontend/
│ ├── react-internal.json
│ └── vue-internal.json
├── backend/
│ ├── django-api.json
│ └── fastapi-platform.json
└── mobile/
├── react-native.json
└── flutter.json
Note: Config discovery works recursively, so both structures work!
Source Priorities
Lower number = higher priority. Use sensible defaults:
1-10: Critical/override configs50-100: Team configs (default: 100)1000+: Fallback/experimental
Example:
# Override official React config with internal version
add_config_source(name="team", ..., priority=1) # Checked first
# Official API is checked last (priority: infinity)
Security
✅ DO:
- Use environment variables for tokens
- Use private repos for sensitive configs
- Rotate tokens regularly
- Use fine-grained tokens (read-only if possible)
❌ DON'T:
- Commit tokens to git
- Share tokens between people
- Use personal tokens for teams (use service accounts)
- Store tokens in config files
Maintenance
Regular tasks:
# Update configs in repo
cd myteam-skill-configs
# Edit configs...
git commit -m "Update React config" && git push
# Developers get updates automatically on next fetch
fetch_config(source="team", config_name="react-internal")
# ^--- Auto-pulls latest changes
Force refresh:
# Delete cache and re-clone
fetch_config(source="team", config_name="react-internal", refresh=true)
Clean up old sources:
# Remove unused sources
remove_config_source(name="old-team")
# Free disk space
rm -rf ~/.skill-seekers/cache/old-team/
Troubleshooting
Authentication Failures
Error: "Authentication failed for https://github.com/org/repo.git"
Solutions:
-
Check token is set:
echo $GITHUB_TOKEN # Should show token -
Verify token has correct permissions:
- GitHub:
reposcope for private repos - GitLab:
read_repositoryscope
- GitHub:
-
Check token isn't expired:
- Regenerate if needed
-
Try direct access:
git clone https://$GITHUB_TOKEN@github.com/org/repo.git test-clone
Config Not Found
Error: "Config 'react' not found in repository. Available configs: django, vue"
Solutions:
-
List available configs:
# Shows what's actually in the repo list_config_sources() -
Check config file exists in repo:
# Clone locally and inspect git clone <git_url> temp-inspect find temp-inspect -name "*.json" -
Verify config name (case-insensitive):
reactmatchesReact.jsonorreact.json
Slow Cloning
Issue: Repository takes minutes to clone.
Solutions:
-
Shallow clone is already enabled (depth=1)
-
Check repository size:
# See repo size gh repo view owner/repo --json diskUsage -
If very large (>100MB), consider:
- Splitting configs into separate repos
- Using sparse checkout
- Contacting IT to optimize repo
Cache Issues
Issue: Getting old configs even after updating repo.
Solutions:
-
Force refresh:
fetch_config(source="team", config_name="react", refresh=true) -
Manual cache clear:
rm -rf ~/.skill-seekers/cache/team/ -
Check auto-pull worked:
cd ~/.skill-seekers/cache/team git log -1 # Shows latest commit
Advanced Topics
Multiple Git Accounts
Use different tokens for different repos:
# Personal GitHub
export GITHUB_TOKEN=ghp_personal_xxx
# Work GitHub
export GITHUB_WORK_TOKEN=ghp_work_yyy
# Company GitLab
export GITLAB_COMPANY_TOKEN=glpat-zzz
Register with specific tokens:
add_config_source(
name="personal",
git_url="https://github.com/myuser/configs.git",
token_env="GITHUB_TOKEN"
)
add_config_source(
name="work",
git_url="https://github.com/mycompany/configs.git",
token_env="GITHUB_WORK_TOKEN"
)
Custom Cache Location
Set custom cache directory:
export SKILL_SEEKERS_CACHE_DIR=/mnt/large-disk/skill-seekers-cache
Or pass to GitConfigRepo:
from skill_seekers.mcp.git_repo import GitConfigRepo
gr = GitConfigRepo(cache_dir="/custom/path/cache")
SSH URLs
SSH URLs are automatically converted to HTTPS + token:
# Input
add_config_source(
name="team",
git_url="git@github.com:myorg/configs.git",
token_env="GITHUB_TOKEN"
)
# Internally becomes
# https://ghp_xxx@github.com/myorg/configs.git
Priority Resolution
When same config exists in multiple sources:
add_config_source(name="team", ..., priority=1) # Checked first
add_config_source(name="company", ..., priority=2) # Checked second
# API mode is checked last (priority: infinity)
fetch_config(config_name="react")
# 1. Checks team source
# 2. If not found, checks company source
# 3. If not found, falls back to API
CI/CD Integration
Use in GitHub Actions:
name: Generate Skills
on: push
jobs:
generate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install Skill Seekers
run: pip install skill-seekers
- name: Register config source
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
python3 << EOF
from skill_seekers.mcp.source_manager import SourceManager
sm = SourceManager()
sm.add_source(
name="team",
git_url="https://github.com/myorg/configs.git"
)
EOF
- name: Fetch and use config
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
# Use MCP fetch_config or direct Python
skill-seekers scrape --config <fetched_config>
API Reference
GitConfigRepo Class
Location: src/skill_seekers/mcp/git_repo.py
Methods:
def __init__(cache_dir: Optional[str] = None)
"""Initialize with optional cache directory."""
def clone_or_pull(
source_name: str,
git_url: str,
branch: str = "main",
token: Optional[str] = None,
force_refresh: bool = False
) -> Path:
"""Clone if not cached, else pull latest changes."""
def find_configs(repo_path: Path) -> list[Path]:
"""Find all *.json files in repository."""
def get_config(repo_path: Path, config_name: str) -> dict:
"""Load specific config by name."""
@staticmethod
def inject_token(git_url: str, token: str) -> str:
"""Inject token into git URL."""
@staticmethod
def validate_git_url(git_url: str) -> bool:
"""Validate git URL format."""
SourceManager Class
Location: src/skill_seekers/mcp/source_manager.py
Methods:
def __init__(config_dir: Optional[str] = None)
"""Initialize with optional config directory."""
def add_source(
name: str,
git_url: str,
source_type: str = "github",
token_env: Optional[str] = None,
branch: str = "main",
priority: int = 100,
enabled: bool = True
) -> dict:
"""Add or update config source."""
def get_source(name: str) -> dict:
"""Get source by name."""
def list_sources(enabled_only: bool = False) -> list[dict]:
"""List all sources."""
def remove_source(name: str) -> bool:
"""Remove source."""
def update_source(name: str, **kwargs) -> dict:
"""Update specific fields."""
See Also
- README.md - Main documentation
- MCP_SETUP.md - MCP server setup
- UNIFIED_SCRAPING.md - Multi-source scraping
- configs/example-team/ - Example repository
Changelog
v2.2.0 (2025-12-21)
- Initial release of git-based config sources
- 3 fetch modes: API, Git URL, Named Source
- 4 MCP tools: add/list/remove/fetch
- Support for GitHub, GitLab, Bitbucket, Gitea
- Shallow clone optimization
- Priority-based resolution
- 83 tests (100% passing)
Questions? Open an issue at https://github.com/yusufkaraaslan/Skill_Seekers/issues