Files
skill-seekers-reference/tests/test_create_integration_basic.py
yusyus 6d37e43b83 feat: Grand Unification — one command, one interface, direct converters (#346)
* fix: resolve 8 pipeline bugs found during skill quality review

- Fix 0 APIs extracted from documentation by enriching summary.json
  with individual page file content before conflict detection
- Fix all "Unknown" entries in merged_api.md by injecting dict keys
  as API names and falling back to AI merger field names
- Fix frontmatter using raw slugs instead of config name by
  normalizing frontmatter after SKILL.md generation
- Fix leaked absolute filesystem paths in patterns/index.md by
  stripping .skillseeker-cache repo clone prefixes
- Fix ARCHITECTURE.md file count always showing "1 files" by
  counting files per language from code_analysis data
- Fix YAML parse errors on GitHub Actions workflows by converting
  boolean keys (on: true) to strings
- Fix false React/Vue.js framework detection in C# projects by
  filtering web frameworks based on primary language
- Improve how-to guide generation by broadening workflow example
  filter to include setup/config examples with sufficient complexity
- Fix test_git_sources_e2e failures caused by git init default
  branch being 'main' instead of 'master'

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address 6 review issues in ExecutionContext implementation

Fixes from code review:

1. Mode resolution (#3 critical): _args_to_data no longer unconditionally
   overwrites mode. Only writes mode="api" when --api-key explicitly passed.
   Env-var-based mode detection moved to _default_data() as lowest priority.

2. Re-initialization warning (#4): initialize() now logs debug message
   when called a second time instead of silently returning stale instance.

3. _raw_args preserved in override (#5): temp context now copies _raw_args
   from parent so get_raw() works correctly inside override blocks.

4. test_local_mode_detection env cleanup (#7): test now saves/restores
   API key env vars to prevent failures when ANTHROPIC_API_KEY is set.

5. _load_config_file error handling (#8): wraps FileNotFoundError and
   JSONDecodeError with user-friendly ValueError messages.

6. Lint fixes: added logging import, fixed Generator import from
   collections.abc, fixed AgentClient return type annotation.

Remaining P2/P3 items (documented, not blocking):
- Lock TOCTOU in override() — safe on CPython, needs fix for no-GIL
- get() reads _instance without lock — same CPython caveat
- config_path not stored on instance
- AnalysisSettings.depth not Literal constrained

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address all remaining P2/P3 review issues in ExecutionContext

1. Thread safety: get() now acquires _lock before reading _instance (#2)
2. Thread safety: override() saves/restores _initialized flag to prevent
   re-init during override blocks (#10)
3. Config path stored: _config_path PrivateAttr + config_path property (#6)
4. Literal validation: AnalysisSettings.depth now uses
   Literal["surface", "deep", "full"] — rejects invalid values (#9)
5. Test updated: test_analysis_depth_choices now expects ValidationError
   for invalid depth, added test_analysis_depth_valid_choices
6. Lint cleanup: removed unused imports, fixed whitespace in tests

All 10 previously reported issues now resolved.
26 tests pass, lint clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restore 5 truncated scrapers, migrate unified_scraper, fix context init

5 scrapers had main() truncated with "# Original main continues here..."
after Kimi's migration — business logic was never connected:
- html_scraper.py — restored HtmlToSkillConverter extraction + build
- pptx_scraper.py — restored PptxToSkillConverter extraction + build
- confluence_scraper.py — restored ConfluenceToSkillConverter with 3 modes
- notion_scraper.py — restored NotionToSkillConverter with 4 sources
- chat_scraper.py — restored ChatToSkillConverter extraction + build

unified_scraper.py — migrated main() to context-first pattern with argv fallback

Fixed context initialization chain:
- main.py no longer initializes ExecutionContext (was stealing init from commands)
- create_command.py now passes config_path from source_info.parsed
- execution_context.py handles SourceInfo.raw_input (not raw_source)

All 18 scrapers now genuinely migrated. 26 tests pass, lint clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve 7 data flow conflicts between ExecutionContext and legacy paths

Critical fixes (CLI args silently lost):
- unified_scraper Phase 6: reads ctx.enhancement.level instead of raw JSON
  when args=None (#3, #4)
- unified_scraper Phase 6 agent: reads ctx.enhancement.agent instead of
  3 independent env var lookups (#5)
- doc_scraper._run_enhancement: uses agent_client.api_key instead of raw
  os.environ.get() — respects config file api_key (#1)

Important fixes:
- main._handle_analyze_command: populates _fake_args from ExecutionContext
  so --agent and --api-key aren't lost in analyze→enhance path (#6)
- doc_scraper type annotations: replaced forward refs with Any to avoid
  F821 undefined name errors

All changes include RuntimeError fallback for backward compatibility when
ExecutionContext isn't initialized.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: 3 crashes + 1 stub in migrated scrapers found by deep scan

1. github_scraper.py: args.scrape_only and args.enhance_level crash when
   args=None (context path). Guarded with if args and getattr(). Also
   fixed agent fallback to read ctx.enhancement.agent.

2. codebase_scraper.py: args.output and args.skip_api_reference crash in
   summary block when args=None. Replaced with output_dir local var and
   ctx.analysis.skip_api_reference.

3. epub_scraper.py: main() was still a stub ending with "# Rest of main()
   continues..." — restored full extraction + build + enhancement logic
   using ctx values exclusively.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: complete ExecutionContext migration for remaining scrapers

Kimi's Phase 4 scraper migrations + Claude's review fixes.
All 18 scrapers now use context-first pattern with argv fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Phase 1 — ExecutionContext.get() always returns context (no RuntimeError)

get() now returns a default context instead of raising RuntimeError when
not explicitly initialized. This eliminates the need for try/except
RuntimeError blocks in all 18 scrapers.

Components can always call ExecutionContext.get() safely — it returns
defaults if not initialized, or the explicitly initialized instance.

Updated tests: test_get_returns_defaults_when_not_initialized,
test_reset_clears_instance (no longer expects RuntimeError).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Phase 2a-c — remove 16 individual scraper CLI commands

Removed individual scraper commands from:
- COMMAND_MODULES in main.py (16 entries: scrape, github, pdf, word,
  epub, video, jupyter, html, openapi, asciidoc, pptx, rss, manpage,
  confluence, notion, chat)
- pyproject.toml entry points (16 skill-seekers-<type> binaries)
- parsers/__init__.py (16 parser registrations)

All source types now accessed via: skill-seekers create <source>
Kept: create, unified, analyze, enhance, package, upload, install,
      install-agent, config, doctor, and utility commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: create SkillConverter base class + converter registry

New base interface that all 17 converters will inherit:
- SkillConverter.run() — extract + build (same call for all types)
- SkillConverter.extract() — override in subclass
- SkillConverter.build_skill() — override in subclass
- get_converter(source_type, config) — factory from registry
- CONVERTER_REGISTRY — maps source type → (module, class)

create_command will use get_converter() instead of _call_module().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Grand Unification — one command, one interface, direct converters

Complete the Grand Unification refactor: `skill-seekers create` is now
the single entry point for all 18 source types. Individual scraper CLI
commands (scrape, github, pdf, analyze, unified, etc.) are removed.

## Architecture changes

- **18 SkillConverter subclasses**: Every scraper now inherits SkillConverter
  with extract() + build_skill() + SOURCE_TYPE. Factory via get_converter().
- **create_command.py rewritten**: _build_config() constructs config dicts
  from ExecutionContext for each source type. Direct converter.run() calls
  replace the old _build_argv() + sys.argv swap + _call_module() machinery.
- **main.py simplified**: create command bypasses _reconstruct_argv entirely,
  calls CreateCommand(args).execute() directly. analyze/unified commands
  removed (create handles both via auto-detection).
- **CreateParser mode="all"**: Top-level parser now accepts all 120+ flags
  (--browser, --max-pages, --depth, etc.) since create is the only entry.
- **Centralized enhancement**: Runs once in create_command after converter,
  not duplicated in each scraper.
- **MCP tools use converters**: 5 scraping tools call get_converter()
  directly instead of subprocess. Config type auto-detected from keys.
- **ConfigValidator → UniSkillConfigValidator**: Renamed with backward-
  compat alias.
- **Data flow**: AgentClient + LocalSkillEnhancer read ExecutionContext
  first, env vars as fallback.

## What was removed

- main() from all 18 scraper files (~3400 lines)
- 18 CLI commands from COMMAND_MODULES + pyproject.toml entry points
- analyze + unified parsers from parser registry
- _build_argv, _call_module, _SKIP_ARGS, _DEST_TO_FLAG, all _route_*()
- setup_argument_parser, get_configuration, _check_deprecated_flags
- Tests referencing removed commands/functions

## Net impact

51 files changed, ~6000 lines removed. 2996 tests pass, 0 failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: review fixes for Grand Unification PR

- Add autouse conftest fixture to reset ExecutionContext singleton between tests
- Replace hardcoded defaults in _is_explicitly_set() with parser-derived defaults
- Upgrade ExecutionContext double-init log from debug to info
- Use logger.exception() in SkillConverter.run() to preserve tracebacks
- Fix docstring "17 types" → "18 types" in skill_converter.py
- DRY up 10 copy-paste help handlers into dict + loop (~100 lines removed)
- Fix 2 CI workflows still referencing removed `skill-seekers scrape` command
- Remove broken pyproject.toml entry point for codebase_scraper:main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve 12 logic/flow issues found in deep review

Critical fixes:
- UnifiedScraper.run(): replace sys.exit(1) with return 1, add return 0
- doc_scraper: use ExecutionContext.get() when already initialized instead
  of re-calling initialize() which silently discards new config
- unified_scraper: define enhancement_config before try/except to prevent
  UnboundLocalError in LOCAL enhancement timeout read

Important fixes:
- override(): cleaner tuple save/restore for singleton swap
- --agent without --api-key now sets mode="local" so env API key doesn't
  override explicit agent choice
- Remove DeprecationWarning from _reconstruct_argv (fires on every
  non-create command in production)
- Rewrite scrape_generic_tool to use get_converter() instead of subprocess
  calls to removed main() functions
- SkillConverter.run() checks build_skill() return value, returns 1 if False
- estimate_pages_tool uses -m module invocation instead of .py file path

Low-priority fixes:
- get_converter() raises descriptive ValueError on class name typo
- test_default_values: save/clear API key env vars before asserting mode
- test_get_converter_pdf: fix config key "path" → "pdf_path"

3056 passed, 4 failed (pre-existing dep version issues), 32 skipped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update MCP server tests to mock converter instead of subprocess

scrape_docs_tool now uses get_converter() + _run_converter() in-process
instead of run_subprocess_with_streaming. Update 4 TestScrapeDocsTool
tests to mock the converter layer instead of the removed subprocess path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: YusufKaraaslanSpyke <yusuf@spykegames.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 23:00:52 +03:00

242 lines
7.9 KiB
Python

"""Basic integration tests for create command.
Tests that the create command properly detects source types
and routes to the correct scrapers without actually scraping.
"""
import pytest
class TestCreateCommandBasic:
"""Basic integration tests for create command (dry-run mode)."""
def test_create_command_help(self):
"""Test that create command help works."""
import subprocess
result = subprocess.run(
["skill-seekers", "create", "--help"], capture_output=True, text=True
)
assert result.returncode == 0
assert "Auto-detects source type" in result.stdout
assert "auto-detected" in result.stdout
assert "--help-web" in result.stdout
def test_create_detects_web_url(self):
"""Test that web URLs are detected and routed correctly."""
from skill_seekers.cli.source_detector import SourceDetector
info = SourceDetector.detect("https://docs.react.dev/")
assert info.type == "web"
assert info.parsed["url"] == "https://docs.react.dev/"
assert info.suggested_name # non-empty
# Plain domain should also be treated as web
info2 = SourceDetector.detect("docs.example.com")
assert info2.type == "web"
def test_create_detects_github_repo(self):
"""Test that GitHub repos are detected."""
import subprocess
result = subprocess.run(
["skill-seekers", "create", "facebook/react", "--help"],
capture_output=True,
text=True,
timeout=10,
)
# Just verify help works - actual scraping would need API token
assert result.returncode in [0, 2] # 0 for success, 2 for argparse help
def test_create_detects_local_directory(self, tmp_path):
"""Test that local directories are detected."""
import subprocess
# Create a test directory
test_dir = tmp_path / "test_project"
test_dir.mkdir()
result = subprocess.run(
["skill-seekers", "create", str(test_dir), "--help"],
capture_output=True,
text=True,
timeout=10,
)
# Verify help works
assert result.returncode in [0, 2]
def test_create_detects_pdf_file(self, tmp_path):
"""Test that PDF files are detected."""
import subprocess
# Create a dummy PDF file
pdf_file = tmp_path / "test.pdf"
pdf_file.touch()
result = subprocess.run(
["skill-seekers", "create", str(pdf_file), "--help"],
capture_output=True,
text=True,
timeout=10,
)
# Verify help works
assert result.returncode in [0, 2]
def test_create_detects_config_file(self, tmp_path):
"""Test that config files are detected."""
import subprocess
import json
# Create a minimal config file
config_file = tmp_path / "test.json"
config_data = {"name": "test", "base_url": "https://example.com/"}
config_file.write_text(json.dumps(config_data))
result = subprocess.run(
["skill-seekers", "create", str(config_file), "--help"],
capture_output=True,
text=True,
timeout=10,
)
# Verify help works
assert result.returncode in [0, 2]
class TestCreateCommandConverterRouting:
"""Tests that create command routes to correct converters."""
def test_get_converter_web(self):
"""Test that get_converter returns DocToSkillConverter for web."""
from skill_seekers.cli.skill_converter import get_converter
config = {"name": "test", "base_url": "https://example.com"}
converter = get_converter("web", config)
assert converter.SOURCE_TYPE == "web"
assert converter.name == "test"
def test_get_converter_github(self):
"""Test that get_converter returns GitHubScraper for github."""
from skill_seekers.cli.skill_converter import get_converter
config = {"name": "test", "repo": "owner/repo"}
converter = get_converter("github", config)
assert converter.SOURCE_TYPE == "github"
assert converter.name == "test"
def test_get_converter_pdf(self):
"""Test that get_converter returns PDFToSkillConverter for pdf."""
from skill_seekers.cli.skill_converter import get_converter
config = {"name": "test", "pdf_path": "/tmp/test.pdf"}
converter = get_converter("pdf", config)
assert converter.SOURCE_TYPE == "pdf"
assert converter.name == "test"
def test_get_converter_unknown_raises(self):
"""Test that get_converter raises ValueError for unknown type."""
from skill_seekers.cli.skill_converter import get_converter
with pytest.raises(ValueError, match="Unknown source type"):
get_converter("unknown_type", {})
class TestExecutionContextIntegration:
"""Tests that ExecutionContext flows correctly through the system."""
def test_execution_context_auto_initializes(self):
"""ExecutionContext.get() returns defaults without explicit init."""
from skill_seekers.cli.execution_context import ExecutionContext
# Reset to ensure clean state
ExecutionContext.reset()
# Should not raise - returns default context
ctx = ExecutionContext.get()
assert ctx is not None
assert ctx.output.name is None # Default value
ExecutionContext.reset()
def test_execution_context_values_preserved(self):
"""Values set in context are preserved and accessible."""
from skill_seekers.cli.execution_context import ExecutionContext
import argparse
ExecutionContext.reset()
args = argparse.Namespace(
source="https://example.com",
name="test_skill",
enhance_level=3,
dry_run=True,
)
ctx = ExecutionContext.initialize(args=args)
assert ctx.output.name == "test_skill"
assert ctx.enhancement.level == 3
assert ctx.output.dry_run is True
# Getting context again returns same values
ctx2 = ExecutionContext.get()
assert ctx2.output.name == "test_skill"
ExecutionContext.reset()
class TestUnifiedCommands:
"""Test that unified commands still work."""
def test_main_help_shows_available_commands(self):
"""Main help should show available commands."""
import subprocess
result = subprocess.run(
["skill-seekers", "--help"], capture_output=True, text=True, timeout=10
)
assert result.returncode == 0
# Should show create command
assert "create" in result.stdout
# Should show enhance command
assert "enhance" in result.stdout
def test_workflows_command_still_works(self):
"""The workflows subcommand is accessible via the main CLI."""
import subprocess
result = subprocess.run(
["skill-seekers", "workflows", "--help"],
capture_output=True,
text=True,
timeout=10,
)
assert result.returncode == 0
class TestRemovedCommands:
"""Test that old individual scraper commands are properly removed."""
def test_scrape_command_removed(self):
"""Old scrape command should not exist."""
import subprocess
result = subprocess.run(
["skill-seekers", "scrape", "--help"], capture_output=True, text=True, timeout=10
)
# Should fail - command removed
assert result.returncode == 2
assert "invalid choice" in result.stderr
def test_github_command_removed(self):
"""Old github command should not exist."""
import subprocess
result = subprocess.run(
["skill-seekers", "github", "--help"], capture_output=True, text=True, timeout=10
)
# Should fail - command removed
assert result.returncode == 2
assert "invalid choice" in result.stderr