Files
skill-seekers-reference/tests/test_git_sources_e2e.py
yusyus 6d37e43b83 feat: Grand Unification — one command, one interface, direct converters (#346)
* fix: resolve 8 pipeline bugs found during skill quality review

- Fix 0 APIs extracted from documentation by enriching summary.json
  with individual page file content before conflict detection
- Fix all "Unknown" entries in merged_api.md by injecting dict keys
  as API names and falling back to AI merger field names
- Fix frontmatter using raw slugs instead of config name by
  normalizing frontmatter after SKILL.md generation
- Fix leaked absolute filesystem paths in patterns/index.md by
  stripping .skillseeker-cache repo clone prefixes
- Fix ARCHITECTURE.md file count always showing "1 files" by
  counting files per language from code_analysis data
- Fix YAML parse errors on GitHub Actions workflows by converting
  boolean keys (on: true) to strings
- Fix false React/Vue.js framework detection in C# projects by
  filtering web frameworks based on primary language
- Improve how-to guide generation by broadening workflow example
  filter to include setup/config examples with sufficient complexity
- Fix test_git_sources_e2e failures caused by git init default
  branch being 'main' instead of 'master'

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address 6 review issues in ExecutionContext implementation

Fixes from code review:

1. Mode resolution (#3 critical): _args_to_data no longer unconditionally
   overwrites mode. Only writes mode="api" when --api-key explicitly passed.
   Env-var-based mode detection moved to _default_data() as lowest priority.

2. Re-initialization warning (#4): initialize() now logs debug message
   when called a second time instead of silently returning stale instance.

3. _raw_args preserved in override (#5): temp context now copies _raw_args
   from parent so get_raw() works correctly inside override blocks.

4. test_local_mode_detection env cleanup (#7): test now saves/restores
   API key env vars to prevent failures when ANTHROPIC_API_KEY is set.

5. _load_config_file error handling (#8): wraps FileNotFoundError and
   JSONDecodeError with user-friendly ValueError messages.

6. Lint fixes: added logging import, fixed Generator import from
   collections.abc, fixed AgentClient return type annotation.

Remaining P2/P3 items (documented, not blocking):
- Lock TOCTOU in override() — safe on CPython, needs fix for no-GIL
- get() reads _instance without lock — same CPython caveat
- config_path not stored on instance
- AnalysisSettings.depth not Literal constrained

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address all remaining P2/P3 review issues in ExecutionContext

1. Thread safety: get() now acquires _lock before reading _instance (#2)
2. Thread safety: override() saves/restores _initialized flag to prevent
   re-init during override blocks (#10)
3. Config path stored: _config_path PrivateAttr + config_path property (#6)
4. Literal validation: AnalysisSettings.depth now uses
   Literal["surface", "deep", "full"] — rejects invalid values (#9)
5. Test updated: test_analysis_depth_choices now expects ValidationError
   for invalid depth, added test_analysis_depth_valid_choices
6. Lint cleanup: removed unused imports, fixed whitespace in tests

All 10 previously reported issues now resolved.
26 tests pass, lint clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restore 5 truncated scrapers, migrate unified_scraper, fix context init

5 scrapers had main() truncated with "# Original main continues here..."
after Kimi's migration — business logic was never connected:
- html_scraper.py — restored HtmlToSkillConverter extraction + build
- pptx_scraper.py — restored PptxToSkillConverter extraction + build
- confluence_scraper.py — restored ConfluenceToSkillConverter with 3 modes
- notion_scraper.py — restored NotionToSkillConverter with 4 sources
- chat_scraper.py — restored ChatToSkillConverter extraction + build

unified_scraper.py — migrated main() to context-first pattern with argv fallback

Fixed context initialization chain:
- main.py no longer initializes ExecutionContext (was stealing init from commands)
- create_command.py now passes config_path from source_info.parsed
- execution_context.py handles SourceInfo.raw_input (not raw_source)

All 18 scrapers now genuinely migrated. 26 tests pass, lint clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve 7 data flow conflicts between ExecutionContext and legacy paths

Critical fixes (CLI args silently lost):
- unified_scraper Phase 6: reads ctx.enhancement.level instead of raw JSON
  when args=None (#3, #4)
- unified_scraper Phase 6 agent: reads ctx.enhancement.agent instead of
  3 independent env var lookups (#5)
- doc_scraper._run_enhancement: uses agent_client.api_key instead of raw
  os.environ.get() — respects config file api_key (#1)

Important fixes:
- main._handle_analyze_command: populates _fake_args from ExecutionContext
  so --agent and --api-key aren't lost in analyze→enhance path (#6)
- doc_scraper type annotations: replaced forward refs with Any to avoid
  F821 undefined name errors

All changes include RuntimeError fallback for backward compatibility when
ExecutionContext isn't initialized.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: 3 crashes + 1 stub in migrated scrapers found by deep scan

1. github_scraper.py: args.scrape_only and args.enhance_level crash when
   args=None (context path). Guarded with if args and getattr(). Also
   fixed agent fallback to read ctx.enhancement.agent.

2. codebase_scraper.py: args.output and args.skip_api_reference crash in
   summary block when args=None. Replaced with output_dir local var and
   ctx.analysis.skip_api_reference.

3. epub_scraper.py: main() was still a stub ending with "# Rest of main()
   continues..." — restored full extraction + build + enhancement logic
   using ctx values exclusively.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: complete ExecutionContext migration for remaining scrapers

Kimi's Phase 4 scraper migrations + Claude's review fixes.
All 18 scrapers now use context-first pattern with argv fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Phase 1 — ExecutionContext.get() always returns context (no RuntimeError)

get() now returns a default context instead of raising RuntimeError when
not explicitly initialized. This eliminates the need for try/except
RuntimeError blocks in all 18 scrapers.

Components can always call ExecutionContext.get() safely — it returns
defaults if not initialized, or the explicitly initialized instance.

Updated tests: test_get_returns_defaults_when_not_initialized,
test_reset_clears_instance (no longer expects RuntimeError).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Phase 2a-c — remove 16 individual scraper CLI commands

Removed individual scraper commands from:
- COMMAND_MODULES in main.py (16 entries: scrape, github, pdf, word,
  epub, video, jupyter, html, openapi, asciidoc, pptx, rss, manpage,
  confluence, notion, chat)
- pyproject.toml entry points (16 skill-seekers-<type> binaries)
- parsers/__init__.py (16 parser registrations)

All source types now accessed via: skill-seekers create <source>
Kept: create, unified, analyze, enhance, package, upload, install,
      install-agent, config, doctor, and utility commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: create SkillConverter base class + converter registry

New base interface that all 17 converters will inherit:
- SkillConverter.run() — extract + build (same call for all types)
- SkillConverter.extract() — override in subclass
- SkillConverter.build_skill() — override in subclass
- get_converter(source_type, config) — factory from registry
- CONVERTER_REGISTRY — maps source type → (module, class)

create_command will use get_converter() instead of _call_module().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Grand Unification — one command, one interface, direct converters

Complete the Grand Unification refactor: `skill-seekers create` is now
the single entry point for all 18 source types. Individual scraper CLI
commands (scrape, github, pdf, analyze, unified, etc.) are removed.

## Architecture changes

- **18 SkillConverter subclasses**: Every scraper now inherits SkillConverter
  with extract() + build_skill() + SOURCE_TYPE. Factory via get_converter().
- **create_command.py rewritten**: _build_config() constructs config dicts
  from ExecutionContext for each source type. Direct converter.run() calls
  replace the old _build_argv() + sys.argv swap + _call_module() machinery.
- **main.py simplified**: create command bypasses _reconstruct_argv entirely,
  calls CreateCommand(args).execute() directly. analyze/unified commands
  removed (create handles both via auto-detection).
- **CreateParser mode="all"**: Top-level parser now accepts all 120+ flags
  (--browser, --max-pages, --depth, etc.) since create is the only entry.
- **Centralized enhancement**: Runs once in create_command after converter,
  not duplicated in each scraper.
- **MCP tools use converters**: 5 scraping tools call get_converter()
  directly instead of subprocess. Config type auto-detected from keys.
- **ConfigValidator → UniSkillConfigValidator**: Renamed with backward-
  compat alias.
- **Data flow**: AgentClient + LocalSkillEnhancer read ExecutionContext
  first, env vars as fallback.

## What was removed

- main() from all 18 scraper files (~3400 lines)
- 18 CLI commands from COMMAND_MODULES + pyproject.toml entry points
- analyze + unified parsers from parser registry
- _build_argv, _call_module, _SKIP_ARGS, _DEST_TO_FLAG, all _route_*()
- setup_argument_parser, get_configuration, _check_deprecated_flags
- Tests referencing removed commands/functions

## Net impact

51 files changed, ~6000 lines removed. 2996 tests pass, 0 failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: review fixes for Grand Unification PR

- Add autouse conftest fixture to reset ExecutionContext singleton between tests
- Replace hardcoded defaults in _is_explicitly_set() with parser-derived defaults
- Upgrade ExecutionContext double-init log from debug to info
- Use logger.exception() in SkillConverter.run() to preserve tracebacks
- Fix docstring "17 types" → "18 types" in skill_converter.py
- DRY up 10 copy-paste help handlers into dict + loop (~100 lines removed)
- Fix 2 CI workflows still referencing removed `skill-seekers scrape` command
- Remove broken pyproject.toml entry point for codebase_scraper:main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve 12 logic/flow issues found in deep review

Critical fixes:
- UnifiedScraper.run(): replace sys.exit(1) with return 1, add return 0
- doc_scraper: use ExecutionContext.get() when already initialized instead
  of re-calling initialize() which silently discards new config
- unified_scraper: define enhancement_config before try/except to prevent
  UnboundLocalError in LOCAL enhancement timeout read

Important fixes:
- override(): cleaner tuple save/restore for singleton swap
- --agent without --api-key now sets mode="local" so env API key doesn't
  override explicit agent choice
- Remove DeprecationWarning from _reconstruct_argv (fires on every
  non-create command in production)
- Rewrite scrape_generic_tool to use get_converter() instead of subprocess
  calls to removed main() functions
- SkillConverter.run() checks build_skill() return value, returns 1 if False
- estimate_pages_tool uses -m module invocation instead of .py file path

Low-priority fixes:
- get_converter() raises descriptive ValueError on class name typo
- test_default_values: save/clear API key env vars before asserting mode
- test_get_converter_pdf: fix config key "path" → "pdf_path"

3056 passed, 4 failed (pre-existing dep version issues), 32 skipped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update MCP server tests to mock converter instead of subprocess

scrape_docs_tool now uses get_converter() + _run_converter() in-process
instead of run_subprocess_with_streaming. Update 4 TestScrapeDocsTool
tests to mock the converter layer instead of the removed subprocess path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: YusufKaraaslanSpyke <yusuf@spykegames.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 23:00:52 +03:00

880 lines
30 KiB
Python

#!/usr/bin/env python3
"""
E2E Tests for A1.9 Git Source Features
Tests the complete workflow with temporary files and repositories:
1. GitConfigRepo - clone/pull operations
2. SourceManager - registry CRUD operations
3. MCP Tools - all 4 git-related tools
4. Integration - complete user workflows
5. Error handling - authentication, not found, etc.
All tests use temporary directories and actual git repositories.
"""
import json
import os
import shutil
import tempfile
from pathlib import Path
import git
import pytest
from skill_seekers.mcp.git_repo import GitConfigRepo
from skill_seekers.mcp.source_manager import SourceManager
# Check if MCP is available
try:
import mcp # noqa: F401
from mcp.types import TextContent # noqa: F401
MCP_AVAILABLE = True
except ImportError:
MCP_AVAILABLE = False
class TestGitSourcesE2E:
"""End-to-end tests for git source features."""
@pytest.fixture
def temp_dirs(self):
"""Create temporary directories for cache and config."""
cache_dir = tempfile.mkdtemp(prefix="ss_cache_")
config_dir = tempfile.mkdtemp(prefix="ss_config_")
yield cache_dir, config_dir
# Cleanup
shutil.rmtree(cache_dir, ignore_errors=True)
shutil.rmtree(config_dir, ignore_errors=True)
@pytest.fixture
def temp_git_repo(self):
"""Create a temporary git repository with sample configs."""
repo_dir = tempfile.mkdtemp(prefix="ss_repo_")
# Initialize git repository with 'master' branch for test consistency
repo = git.Repo.init(repo_dir, initial_branch="master")
# Create sample config files
configs = {
"react.json": {
"name": "react",
"description": "React framework for UIs",
"base_url": "https://react.dev/",
"selectors": {"main_content": "article", "title": "h1", "code_blocks": "pre code"},
"url_patterns": {"include": [], "exclude": []},
"categories": {"getting_started": ["learn", "start"], "api": ["reference", "api"]},
"rate_limit": 0.5,
"max_pages": 100,
},
"vue.json": {
"name": "vue",
"description": "Vue.js progressive framework",
"base_url": "https://vuejs.org/",
"selectors": {"main_content": "main", "title": "h1"},
"url_patterns": {"include": [], "exclude": []},
"categories": {},
"rate_limit": 0.5,
"max_pages": 50,
},
"django.json": {
"name": "django",
"description": "Django web framework",
"base_url": "https://docs.djangoproject.com/",
"selectors": {"main_content": "div[role='main']", "title": "h1"},
"url_patterns": {"include": [], "exclude": []},
"categories": {},
"rate_limit": 0.5,
"max_pages": 200,
},
}
# Write config files
for filename, config_data in configs.items():
config_path = Path(repo_dir) / filename
with open(config_path, "w") as f:
json.dump(config_data, f, indent=2)
# Add and commit
repo.index.add(["*.json"])
repo.index.commit("Initial commit with sample configs")
yield repo_dir, repo
# Cleanup
shutil.rmtree(repo_dir, ignore_errors=True)
def test_e2e_workflow_direct_git_url(self, temp_dirs, temp_git_repo):
"""
E2E Test 1: Direct git URL workflow (no source registration)
Steps:
1. Clone repository via direct git URL
2. List available configs
3. Fetch specific config
4. Verify config content
"""
cache_dir, config_dir = temp_dirs
repo_dir, repo = temp_git_repo
git_url = f"file://{repo_dir}"
# Step 1: Clone repository
git_repo = GitConfigRepo(cache_dir=cache_dir)
repo_path = git_repo.clone_or_pull(
source_name="test-direct",
git_url=git_url,
branch="master", # git.Repo.init creates 'master' by default
)
assert repo_path.exists()
assert (repo_path / ".git").exists()
# Step 2: List available configs
configs = git_repo.find_configs(repo_path)
assert len(configs) == 3
config_names = [c.stem for c in configs]
assert set(config_names) == {"react", "vue", "django"}
# Step 3: Fetch specific config
config = git_repo.get_config(repo_path, "react")
# Step 4: Verify config content
assert config["name"] == "react"
assert config["description"] == "React framework for UIs"
assert config["base_url"] == "https://react.dev/"
assert "selectors" in config
assert "categories" in config
assert config["max_pages"] == 100
def test_e2e_workflow_with_source_registration(self, temp_dirs, temp_git_repo):
"""
E2E Test 2: Complete workflow with source registration
Steps:
1. Add source to registry
2. List sources
3. Get source details
4. Clone via source name
5. Fetch config
6. Update source (re-add with different priority)
7. Remove source
8. Verify removal
"""
cache_dir, config_dir = temp_dirs
repo_dir, repo = temp_git_repo
git_url = f"file://{repo_dir}"
# Step 1: Add source to registry
source_manager = SourceManager(config_dir=config_dir)
source = source_manager.add_source(
name="team-configs", git_url=git_url, source_type="custom", branch="master", priority=10
)
assert source["name"] == "team-configs"
assert source["git_url"] == git_url
assert source["type"] == "custom"
assert source["branch"] == "master"
assert source["priority"] == 10
assert source["enabled"] is True
# Step 2: List sources
sources = source_manager.list_sources()
assert len(sources) == 1
assert sources[0]["name"] == "team-configs"
# Step 3: Get source details
retrieved_source = source_manager.get_source("team-configs")
assert retrieved_source["git_url"] == git_url
# Step 4: Clone via source name
git_repo = GitConfigRepo(cache_dir=cache_dir)
repo_path = git_repo.clone_or_pull(
source_name=source["name"], git_url=source["git_url"], branch=source["branch"]
)
assert repo_path.exists()
# Step 5: Fetch config
config = git_repo.get_config(repo_path, "vue")
assert config["name"] == "vue"
assert config["base_url"] == "https://vuejs.org/"
# Step 6: Update source (re-add with different priority)
updated_source = source_manager.add_source(
name="team-configs",
git_url=git_url,
source_type="custom",
branch="master",
priority=5, # Changed priority
)
assert updated_source["priority"] == 5
# Step 7: Remove source
removed = source_manager.remove_source("team-configs")
assert removed is True
# Step 8: Verify removal
sources = source_manager.list_sources()
assert len(sources) == 0
with pytest.raises(KeyError, match="Source 'team-configs' not found"):
source_manager.get_source("team-configs")
def test_e2e_multiple_sources_priority_resolution(self, temp_dirs, temp_git_repo):
"""
E2E Test 3: Multiple sources with priority resolution
Steps:
1. Add multiple sources with different priorities
2. Verify sources are sorted by priority
3. Enable/disable sources
4. List enabled sources only
"""
cache_dir, config_dir = temp_dirs
repo_dir, repo = temp_git_repo
git_url = f"file://{repo_dir}"
source_manager = SourceManager(config_dir=config_dir)
# Step 1: Add multiple sources with different priorities
source_manager.add_source(name="low-priority", git_url=git_url, priority=100)
source_manager.add_source(name="high-priority", git_url=git_url, priority=1)
source_manager.add_source(name="medium-priority", git_url=git_url, priority=50)
# Step 2: Verify sources are sorted by priority
sources = source_manager.list_sources()
assert len(sources) == 3
assert sources[0]["name"] == "high-priority"
assert sources[1]["name"] == "medium-priority"
assert sources[2]["name"] == "low-priority"
# Step 3: Enable/disable sources
source_manager.add_source(name="high-priority", git_url=git_url, priority=1, enabled=False)
# Step 4: List enabled sources only
enabled_sources = source_manager.list_sources(enabled_only=True)
assert len(enabled_sources) == 2
assert all(s["enabled"] for s in enabled_sources)
assert "high-priority" not in [s["name"] for s in enabled_sources]
def test_e2e_pull_existing_repository(self, temp_dirs, temp_git_repo):
"""
E2E Test 4: Pull updates from existing repository
Steps:
1. Clone repository
2. Add new commit to original repo
3. Pull updates
4. Verify new config is available
"""
cache_dir, config_dir = temp_dirs
repo_dir, repo = temp_git_repo
git_url = f"file://{repo_dir}"
git_repo = GitConfigRepo(cache_dir=cache_dir)
# Step 1: Clone repository
repo_path = git_repo.clone_or_pull(
source_name="test-pull", git_url=git_url, branch="master"
)
initial_configs = git_repo.find_configs(repo_path)
assert len(initial_configs) == 3
# Step 2: Add new commit to original repo
new_config = {
"name": "fastapi",
"description": "FastAPI framework",
"base_url": "https://fastapi.tiangolo.com/",
"selectors": {"main_content": "article"},
"url_patterns": {"include": [], "exclude": []},
"categories": {},
"rate_limit": 0.5,
"max_pages": 150,
}
new_config_path = Path(repo_dir) / "fastapi.json"
with open(new_config_path, "w") as f:
json.dump(new_config, f, indent=2)
repo.index.add(["fastapi.json"])
repo.index.commit("Add FastAPI config")
# Step 3: Pull updates
updated_repo_path = git_repo.clone_or_pull(
source_name="test-pull",
git_url=git_url,
branch="master",
force_refresh=False, # Should pull, not re-clone
)
# Step 4: Verify new config is available
updated_configs = git_repo.find_configs(updated_repo_path)
assert len(updated_configs) == 4
fastapi_config = git_repo.get_config(updated_repo_path, "fastapi")
assert fastapi_config["name"] == "fastapi"
assert fastapi_config["max_pages"] == 150
def test_e2e_force_refresh(self, temp_dirs, temp_git_repo):
"""
E2E Test 5: Force refresh (delete and re-clone)
Steps:
1. Clone repository
2. Modify local cache manually
3. Force refresh
4. Verify cache was reset
"""
cache_dir, config_dir = temp_dirs
repo_dir, repo = temp_git_repo
git_url = f"file://{repo_dir}"
git_repo = GitConfigRepo(cache_dir=cache_dir)
# Step 1: Clone repository
repo_path = git_repo.clone_or_pull(
source_name="test-refresh", git_url=git_url, branch="master"
)
# Step 2: Modify local cache manually
corrupt_file = repo_path / "CORRUPTED.txt"
with open(corrupt_file, "w") as f:
f.write("This file should not exist after refresh")
assert corrupt_file.exists()
# Step 3: Force refresh
refreshed_repo_path = git_repo.clone_or_pull(
source_name="test-refresh",
git_url=git_url,
branch="master",
force_refresh=True, # Delete and re-clone
)
# Step 4: Verify cache was reset
assert not corrupt_file.exists()
configs = git_repo.find_configs(refreshed_repo_path)
assert len(configs) == 3
def test_e2e_config_not_found(self, temp_dirs, temp_git_repo):
"""
E2E Test 6: Error handling - config not found
Steps:
1. Clone repository
2. Try to fetch non-existent config
3. Verify helpful error message with suggestions
"""
cache_dir, config_dir = temp_dirs
repo_dir, repo = temp_git_repo
git_url = f"file://{repo_dir}"
git_repo = GitConfigRepo(cache_dir=cache_dir)
# Step 1: Clone repository
repo_path = git_repo.clone_or_pull(
source_name="test-not-found", git_url=git_url, branch="master"
)
# Step 2: Try to fetch non-existent config
with pytest.raises(FileNotFoundError) as exc_info:
git_repo.get_config(repo_path, "nonexistent")
# Step 3: Verify helpful error message with suggestions
error_msg = str(exc_info.value)
assert "nonexistent.json" in error_msg
assert "not found" in error_msg
assert "react" in error_msg # Should suggest available configs
assert "vue" in error_msg
assert "django" in error_msg
def test_e2e_invalid_git_url(self, temp_dirs):
"""
E2E Test 7: Error handling - invalid git URL
Steps:
1. Try to clone with invalid URL
2. Verify validation error
"""
cache_dir, config_dir = temp_dirs
git_repo = GitConfigRepo(cache_dir=cache_dir)
# Invalid URLs
invalid_urls = ["", "not-a-url", "ftp://invalid.com/repo.git", "javascript:alert('xss')"]
for invalid_url in invalid_urls:
with pytest.raises(ValueError, match="Invalid git URL"):
git_repo.clone_or_pull(
source_name="test-invalid", git_url=invalid_url, branch="master"
)
def test_e2e_source_name_validation(self, temp_dirs):
"""
E2E Test 8: Error handling - invalid source names
Steps:
1. Try to add sources with invalid names
2. Verify validation errors
"""
cache_dir, config_dir = temp_dirs
source_manager = SourceManager(config_dir=config_dir)
# Invalid source names
invalid_names = [
"",
"name with spaces",
"name/with/slashes",
"name@with@symbols",
"name.with.dots",
"123-only-numbers-start-is-ok", # This should actually work
"name!exclamation",
]
valid_git_url = "https://github.com/test/repo.git"
for invalid_name in invalid_names[:-2]: # Skip the valid one
if invalid_name == "123-only-numbers-start-is-ok":
continue
with pytest.raises(ValueError, match="Invalid source name"):
source_manager.add_source(name=invalid_name, git_url=valid_git_url)
def test_e2e_registry_persistence(self, temp_dirs, temp_git_repo):
"""
E2E Test 9: Registry persistence across instances
Steps:
1. Add source with one SourceManager instance
2. Create new SourceManager instance
3. Verify source persists
4. Modify source with new instance
5. Verify changes persist
"""
cache_dir, config_dir = temp_dirs
repo_dir, repo = temp_git_repo
git_url = f"file://{repo_dir}"
# Step 1: Add source with one instance
manager1 = SourceManager(config_dir=config_dir)
manager1.add_source(name="persistent-source", git_url=git_url, priority=25)
# Step 2: Create new instance
manager2 = SourceManager(config_dir=config_dir)
# Step 3: Verify source persists
sources = manager2.list_sources()
assert len(sources) == 1
assert sources[0]["name"] == "persistent-source"
assert sources[0]["priority"] == 25
# Step 4: Modify source with new instance
manager2.add_source(
name="persistent-source",
git_url=git_url,
priority=50, # Changed
)
# Step 5: Verify changes persist
manager3 = SourceManager(config_dir=config_dir)
source = manager3.get_source("persistent-source")
assert source["priority"] == 50
def test_e2e_cache_isolation(self, temp_dirs, temp_git_repo):
"""
E2E Test 10: Cache isolation between different cache directories
Steps:
1. Clone to cache_dir_1
2. Clone same repo to cache_dir_2
3. Verify both caches are independent
4. Modify one cache
5. Verify other cache is unaffected
"""
_config_dir = temp_dirs[1]
repo_dir, repo = temp_git_repo
cache_dir_1 = tempfile.mkdtemp(prefix="ss_cache1_")
cache_dir_2 = tempfile.mkdtemp(prefix="ss_cache2_")
try:
git_url = f"file://{repo_dir}"
# Step 1: Clone to cache_dir_1
git_repo_1 = GitConfigRepo(cache_dir=cache_dir_1)
repo_path_1 = git_repo_1.clone_or_pull(
source_name="test-source", git_url=git_url, branch="master"
)
# Step 2: Clone same repo to cache_dir_2
git_repo_2 = GitConfigRepo(cache_dir=cache_dir_2)
repo_path_2 = git_repo_2.clone_or_pull(
source_name="test-source", git_url=git_url, branch="master"
)
# Step 3: Verify both caches are independent
assert repo_path_1 != repo_path_2
assert repo_path_1.exists()
assert repo_path_2.exists()
# Step 4: Modify one cache
marker_file = repo_path_1 / "MARKER.txt"
with open(marker_file, "w") as f:
f.write("Cache 1 marker")
# Step 5: Verify other cache is unaffected
assert marker_file.exists()
assert not (repo_path_2 / "MARKER.txt").exists()
configs_1 = git_repo_1.find_configs(repo_path_1)
configs_2 = git_repo_2.find_configs(repo_path_2)
assert len(configs_1) == len(configs_2) == 3
finally:
shutil.rmtree(cache_dir_1, ignore_errors=True)
shutil.rmtree(cache_dir_2, ignore_errors=True)
def test_e2e_auto_detect_token_env(self, temp_dirs):
"""
E2E Test 11: Auto-detect token_env based on source type
Steps:
1. Add GitHub source without token_env
2. Verify GITHUB_TOKEN was auto-detected
3. Add GitLab source without token_env
4. Verify GITLAB_TOKEN was auto-detected
"""
cache_dir, config_dir = temp_dirs
source_manager = SourceManager(config_dir=config_dir)
# Step 1: Add GitHub source
github_source = source_manager.add_source(
name="github-test",
git_url="https://github.com/test/repo.git",
source_type="github",
# No token_env specified
)
# Step 2: Verify GITHUB_TOKEN was auto-detected
assert github_source["token_env"] == "GITHUB_TOKEN"
# Step 3: Add GitLab source
gitlab_source = source_manager.add_source(
name="gitlab-test",
git_url="https://gitlab.com/test/repo.git",
source_type="gitlab",
# No token_env specified
)
# Step 4: Verify GITLAB_TOKEN was auto-detected
assert gitlab_source["token_env"] == "GITLAB_TOKEN"
# Also test custom type (defaults to GIT_TOKEN)
custom_source = source_manager.add_source(
name="custom-test", git_url="https://custom.com/test/repo.git", source_type="custom"
)
assert custom_source["token_env"] == "GIT_TOKEN"
def test_e2e_complete_user_workflow(self, temp_dirs, temp_git_repo):
"""
E2E Test 12: Complete real-world user workflow
Simulates a team using the feature end-to-end:
1. Team lead creates config repository
2. Team lead registers source
3. Developer 1 clones and uses config
4. Developer 2 uses same source (cached)
5. Team lead updates repository
6. Developers pull updates
7. Config is removed from repo
8. Error handling works correctly
"""
cache_dir, config_dir = temp_dirs
repo_dir, repo = temp_git_repo
git_url = f"file://{repo_dir}"
# Step 1: Team lead creates repository (already done by fixture)
# Step 2: Team lead registers source
source_manager = SourceManager(config_dir=config_dir)
source_manager.add_source(
name="team-configs", git_url=git_url, source_type="custom", branch="master", priority=1
)
# Step 3: Developer 1 clones and uses config
git_repo = GitConfigRepo(cache_dir=cache_dir)
source = source_manager.get_source("team-configs")
repo_path = git_repo.clone_or_pull(
source_name=source["name"], git_url=source["git_url"], branch=source["branch"]
)
react_config = git_repo.get_config(repo_path, "react")
assert react_config["name"] == "react"
# Step 4: Developer 2 uses same source (should use cache, not re-clone)
# Simulate by checking if pull works (not re-clone)
repo_path_2 = git_repo.clone_or_pull(
source_name=source["name"], git_url=source["git_url"], branch=source["branch"]
)
assert repo_path == repo_path_2
# Step 5: Team lead updates repository
updated_react_config = react_config.copy()
updated_react_config["max_pages"] = 500 # Increased limit
react_config_path = Path(repo_dir) / "react.json"
with open(react_config_path, "w") as f:
json.dump(updated_react_config, f, indent=2)
repo.index.add(["react.json"])
repo.index.commit("Increase React config max_pages to 500")
# Step 6: Developers pull updates
git_repo.clone_or_pull(
source_name=source["name"], git_url=source["git_url"], branch=source["branch"]
)
updated_config = git_repo.get_config(repo_path, "react")
assert updated_config["max_pages"] == 500
# Step 7: Config is removed from repo
react_config_path.unlink()
repo.index.remove(["react.json"])
repo.index.commit("Remove react.json")
git_repo.clone_or_pull(
source_name=source["name"], git_url=source["git_url"], branch=source["branch"]
)
# Step 8: Error handling works correctly
with pytest.raises(FileNotFoundError, match="react.json"):
git_repo.get_config(repo_path, "react")
# But other configs still work
vue_config = git_repo.get_config(repo_path, "vue")
assert vue_config["name"] == "vue"
@pytest.mark.skipif(not MCP_AVAILABLE, reason="MCP not installed")
class TestMCPToolsE2E:
"""E2E tests for MCP tools integration."""
@pytest.fixture
def temp_dirs(self):
"""Create temporary directories for cache and config."""
cache_dir = tempfile.mkdtemp(prefix="ss_mcp_cache_")
config_dir = tempfile.mkdtemp(prefix="ss_mcp_config_")
# Set environment variables for tools to use
os.environ["SKILL_SEEKERS_CACHE_DIR"] = cache_dir
os.environ["SKILL_SEEKERS_CONFIG_DIR"] = config_dir
yield cache_dir, config_dir
# Cleanup
os.environ.pop("SKILL_SEEKERS_CACHE_DIR", None)
os.environ.pop("SKILL_SEEKERS_CONFIG_DIR", None)
shutil.rmtree(cache_dir, ignore_errors=True)
shutil.rmtree(config_dir, ignore_errors=True)
@pytest.fixture
def temp_git_repo(self):
"""Create a temporary git repository with sample configs."""
repo_dir = tempfile.mkdtemp(prefix="ss_mcp_repo_")
# Initialize git repository with 'master' branch for test consistency
repo = git.Repo.init(repo_dir, initial_branch="master")
# Create sample config
config = {
"name": "test-framework",
"description": "Test framework for E2E",
"base_url": "https://example.com/docs/",
"selectors": {"main_content": "article", "title": "h1"},
"url_patterns": {"include": [], "exclude": []},
"categories": {},
"rate_limit": 0.5,
"max_pages": 50,
}
config_path = Path(repo_dir) / "test-framework.json"
with open(config_path, "w") as f:
json.dump(config, f, indent=2)
repo.index.add(["*.json"])
repo.index.commit("Initial commit")
yield repo_dir, repo
shutil.rmtree(repo_dir, ignore_errors=True)
@pytest.mark.asyncio
async def test_mcp_add_list_remove_source_e2e(self, temp_dirs, temp_git_repo):
"""
MCP E2E Test 1: Complete add/list/remove workflow via MCP tools
"""
from skill_seekers.mcp.server import (
add_config_source_tool,
list_config_sources_tool,
remove_config_source_tool,
)
cache_dir, config_dir = temp_dirs
repo_dir, repo = temp_git_repo
git_url = f"file://{repo_dir}"
# Add source
add_result = await add_config_source_tool(
{
"name": "mcp-test-source",
"git_url": git_url,
"source_type": "custom",
"branch": "master",
}
)
assert len(add_result) == 1
assert "" in add_result[0].text
assert "mcp-test-source" in add_result[0].text
# List sources
list_result = await list_config_sources_tool({})
assert len(list_result) == 1
assert "mcp-test-source" in list_result[0].text
# Remove source
remove_result = await remove_config_source_tool({"name": "mcp-test-source"})
assert len(remove_result) == 1
assert "" in remove_result[0].text
assert "removed" in remove_result[0].text.lower()
@pytest.mark.asyncio
async def test_mcp_fetch_config_git_url_mode_e2e(self, temp_dirs, temp_git_repo):
"""
MCP E2E Test 2: fetch_config with direct git URL
"""
from skill_seekers.mcp.server import fetch_config_tool
cache_dir, config_dir = temp_dirs
repo_dir, repo = temp_git_repo
git_url = f"file://{repo_dir}"
# Create destination directory
dest_dir = Path(config_dir) / "configs"
dest_dir.mkdir(parents=True, exist_ok=True)
result = await fetch_config_tool(
{
"config_name": "test-framework",
"git_url": git_url,
"branch": "master",
"destination": str(dest_dir),
}
)
assert len(result) == 1
assert "" in result[0].text
assert "test-framework" in result[0].text
# Verify config was saved
saved_config = dest_dir / "test-framework.json"
assert saved_config.exists()
with open(saved_config) as f:
config_data = json.load(f)
assert config_data["name"] == "test-framework"
@pytest.mark.asyncio
async def test_mcp_fetch_config_source_mode_e2e(self, temp_dirs, temp_git_repo):
"""
MCP E2E Test 3: fetch_config with registered source
"""
from skill_seekers.mcp.server import add_config_source_tool, fetch_config_tool
cache_dir, config_dir = temp_dirs
repo_dir, repo = temp_git_repo
git_url = f"file://{repo_dir}"
# Register source first
await add_config_source_tool(
{"name": "test-source", "git_url": git_url, "source_type": "custom", "branch": "master"}
)
# Fetch via source name
dest_dir = Path(config_dir) / "configs"
dest_dir.mkdir(parents=True, exist_ok=True)
result = await fetch_config_tool(
{"config_name": "test-framework", "source": "test-source", "destination": str(dest_dir)}
)
assert len(result) == 1
assert "" in result[0].text
assert "test-framework" in result[0].text
# Verify config was saved
saved_config = dest_dir / "test-framework.json"
assert saved_config.exists()
@pytest.mark.asyncio
async def test_mcp_error_handling_e2e(self, temp_dirs, temp_git_repo):
"""
MCP E2E Test 4: Error handling across all tools
"""
from skill_seekers.mcp.server import (
add_config_source_tool,
fetch_config_tool,
remove_config_source_tool,
)
cache_dir, config_dir = temp_dirs
repo_dir, repo = temp_git_repo
git_url = f"file://{repo_dir}"
# Test 1: Add source without name
result = await add_config_source_tool({"git_url": git_url})
assert "" in result[0].text
assert "name" in result[0].text.lower()
# Test 2: Add source without git_url
result = await add_config_source_tool({"name": "test"})
assert "" in result[0].text
assert "git_url" in result[0].text.lower()
# Test 3: Remove non-existent source
result = await remove_config_source_tool({"name": "non-existent"})
assert "" in result[0].text or "not found" in result[0].text.lower()
# Test 4: Fetch config from non-existent source
dest_dir = Path(config_dir) / "configs"
dest_dir.mkdir(parents=True, exist_ok=True)
result = await fetch_config_tool(
{"config_name": "test", "source": "non-existent-source", "destination": str(dest_dir)}
)
assert "" in result[0].text or "not found" in result[0].text.lower()
# Test 5: Fetch non-existent config from valid source
await add_config_source_tool(
{"name": "valid-source", "git_url": git_url, "branch": "master"}
)
result = await fetch_config_tool(
{
"config_name": "non-existent-config",
"source": "valid-source",
"destination": str(dest_dir),
}
)
assert "" in result[0].text or "not found" in result[0].text.lower()
if __name__ == "__main__":
pytest.main([__file__, "-v", "--tb=short"])