feat: Grand Unification — one command, one interface, direct converters (#346)

* fix: resolve 8 pipeline bugs found during skill quality review

  - Fix 0 APIs extracted from documentation by enriching summary.json with individual page file content before conflict detection
  - Fix all "Unknown" entries in merged_api.md by injecting dict keys as API names and falling back to AI merger field names
  - Fix frontmatter using raw slugs instead of the config name by normalizing frontmatter after SKILL.md generation
  - Fix leaked absolute filesystem paths in patterns/index.md by stripping .skillseeker-cache repo clone prefixes
  - Fix ARCHITECTURE.md file count always showing "1 files" by counting files per language from code_analysis data
  - Fix YAML parse errors on GitHub Actions workflows by converting boolean keys (on: true) to strings
  - Fix false React/Vue.js framework detection in C# projects by filtering web frameworks based on primary language
  - Improve how-to guide generation by broadening the workflow example filter to include setup/config examples with sufficient complexity
  - Fix test_git_sources_e2e failures caused by the git init default branch being 'main' instead of 'master'

* fix: address 6 review issues in ExecutionContext implementation

  1. Mode resolution (#3, critical): _args_to_data no longer unconditionally overwrites mode; it only writes mode="api" when --api-key is explicitly passed. Env-var-based mode detection moved to _default_data() as the lowest priority.
  2. Re-initialization warning (#4): initialize() now logs a debug message when called a second time instead of silently returning the stale instance.
  3. _raw_args preserved in override (#5): the temp context now copies _raw_args from the parent so get_raw() works correctly inside override blocks.
  4. test_local_mode_detection env cleanup (#7): the test now saves/restores API key env vars to prevent failures when ANTHROPIC_API_KEY is set.
  5. _load_config_file error handling (#8): wraps FileNotFoundError and JSONDecodeError with user-friendly ValueError messages.
  6. Lint fixes: added the logging import, fixed the Generator import from collections.abc, fixed the AgentClient return type annotation.

  Remaining P2/P3 items (documented, not blocking):
  - Lock TOCTOU in override() — safe on CPython, needs a fix for no-GIL builds
  - get() reads _instance without the lock — same CPython caveat
  - config_path not stored on the instance
  - AnalysisSettings.depth not Literal-constrained

* fix: address all remaining P2/P3 review issues in ExecutionContext

  1. Thread safety: get() now acquires _lock before reading _instance (#2)
  2. Thread safety: override() saves/restores the _initialized flag to prevent re-init during override blocks (#10)
  3. Config path stored: _config_path PrivateAttr + config_path property (#6)
  4. Literal validation: AnalysisSettings.depth now uses Literal["surface", "deep", "full"] and rejects invalid values (#9)
  5. Test updated: test_analysis_depth_choices now expects ValidationError for an invalid depth; added test_analysis_depth_valid_choices
  6. Lint cleanup: removed unused imports, fixed whitespace in tests

  All 10 previously reported issues are now resolved. 26 tests pass, lint clean.

* fix: restore 5 truncated scrapers, migrate unified_scraper, fix context init

  5 scrapers had main() truncated with "# Original main continues here..." after Kimi's migration — the business logic was never connected:
  - html_scraper.py — restored HtmlToSkillConverter extraction + build
  - pptx_scraper.py — restored PptxToSkillConverter extraction + build
  - confluence_scraper.py — restored ConfluenceToSkillConverter with 3 modes
  - notion_scraper.py — restored NotionToSkillConverter with 4 sources
  - chat_scraper.py — restored ChatToSkillConverter extraction + build

  unified_scraper.py — migrated main() to the context-first pattern with argv fallback

  Fixed the context initialization chain:
  - main.py no longer initializes ExecutionContext (it was stealing init from commands)
  - create_command.py now passes config_path from source_info.parsed
  - execution_context.py handles SourceInfo.raw_input (not raw_source)

  All 18 scrapers are now genuinely migrated. 26 tests pass, lint clean.

* fix: resolve 7 data flow conflicts between ExecutionContext and legacy paths

  Critical fixes (CLI args silently lost):
  - unified_scraper Phase 6: reads ctx.enhancement.level instead of raw JSON when args=None (#3, #4)
  - unified_scraper Phase 6 agent: reads ctx.enhancement.agent instead of 3 independent env var lookups (#5)
  - doc_scraper._run_enhancement: uses agent_client.api_key instead of raw os.environ.get(), so the config file api_key is respected (#1)

  Important fixes:
  - main._handle_analyze_command: populates _fake_args from ExecutionContext so --agent and --api-key aren't lost in the analyze→enhance path (#6)
  - doc_scraper type annotations: replaced forward refs with Any to avoid F821 undefined name errors

  All changes include a RuntimeError fallback for backward compatibility when ExecutionContext isn't initialized.

* fix: 3 crashes + 1 stub in migrated scrapers found by deep scan

  1. github_scraper.py: args.scrape_only and args.enhance_level crash when args=None (context path). Guarded with if args and getattr(). Also fixed the agent fallback to read ctx.enhancement.agent.
  2. codebase_scraper.py: args.output and args.skip_api_reference crash in the summary block when args=None. Replaced with an output_dir local var and ctx.analysis.skip_api_reference.
  3. epub_scraper.py: main() was still a stub ending with "# Rest of main() continues..." — restored the full extraction + build + enhancement logic using ctx values exclusively.

* feat: complete ExecutionContext migration for remaining scrapers

  Kimi's Phase 4 scraper migrations + Claude's review fixes. All 18 scrapers now use the context-first pattern with argv fallback.

* feat: Phase 1 — ExecutionContext.get() always returns a context (no RuntimeError)

  get() now returns a default context instead of raising RuntimeError when not explicitly initialized. This eliminates the need for try/except RuntimeError blocks in all 18 scrapers. Components can always call ExecutionContext.get() safely — it returns defaults if not initialized, or the explicitly initialized instance.

  Updated tests: test_get_returns_defaults_when_not_initialized, test_reset_clears_instance (no longer expects RuntimeError).

* feat: Phase 2a-c — remove 16 individual scraper CLI commands

  Removed individual scraper commands from:
  - COMMAND_MODULES in main.py (16 entries: scrape, github, pdf, word, epub, video, jupyter, html, openapi, asciidoc, pptx, rss, manpage, confluence, notion, chat)
  - pyproject.toml entry points (16 skill-seekers-<type> binaries)
  - parsers/__init__.py (16 parser registrations)

  All source types are now accessed via: skill-seekers create <source>

  Kept: create, unified, analyze, enhance, package, upload, install, install-agent, config, doctor, and utility commands.

* feat: create SkillConverter base class + converter registry

  New base interface that all 17 converters will inherit:
  - SkillConverter.run() — extract + build (same call for all types)
  - SkillConverter.extract() — override in subclass
  - SkillConverter.build_skill() — override in subclass
  - get_converter(source_type, config) — factory from registry
  - CONVERTER_REGISTRY — maps source type → (module, class)

  create_command will use get_converter() instead of _call_module().

* feat: Grand Unification — one command, one interface, direct converters

  Complete the Grand Unification refactor: `skill-seekers create` is now the single entry point for all 18 source types. Individual scraper CLI commands (scrape, github, pdf, analyze, unified, etc.) are removed.

  ## Architecture changes

  - **18 SkillConverter subclasses**: every scraper now inherits SkillConverter with extract() + build_skill() + SOURCE_TYPE. Factory via get_converter().
  - **create_command.py rewritten**: _build_config() constructs config dicts from ExecutionContext for each source type. Direct converter.run() calls replace the old _build_argv() + sys.argv swap + _call_module() machinery.
  - **main.py simplified**: the create command bypasses _reconstruct_argv entirely and calls CreateCommand(args).execute() directly. The analyze/unified commands are removed (create handles both via auto-detection).
  - **CreateParser mode="all"**: the top-level parser now accepts all 120+ flags (--browser, --max-pages, --depth, etc.) since create is the only entry.
  - **Centralized enhancement**: runs once in create_command after the converter, not duplicated in each scraper.
  - **MCP tools use converters**: 5 scraping tools call get_converter() directly instead of subprocess. Config type is auto-detected from keys.
  - **ConfigValidator → UniSkillConfigValidator**: renamed with a backward-compat alias.
  - **Data flow**: AgentClient + LocalSkillEnhancer read ExecutionContext first, env vars as fallback.

  ## What was removed

  - main() from all 18 scraper files (~3400 lines)
  - 18 CLI commands from COMMAND_MODULES + pyproject.toml entry points
  - analyze + unified parsers from the parser registry
  - _build_argv, _call_module, _SKIP_ARGS, _DEST_TO_FLAG, all _route_*()
  - setup_argument_parser, get_configuration, _check_deprecated_flags
  - Tests referencing removed commands/functions

  ## Net impact

  51 files changed, ~6000 lines removed. 2996 tests pass, 0 failures.

* fix: review fixes for Grand Unification PR

  - Add an autouse conftest fixture to reset the ExecutionContext singleton between tests
  - Replace hardcoded defaults in _is_explicitly_set() with parser-derived defaults
  - Upgrade the ExecutionContext double-init log from debug to info
  - Use logger.exception() in SkillConverter.run() to preserve tracebacks
  - Fix docstring "17 types" → "18 types" in skill_converter.py
  - DRY up 10 copy-paste help handlers into a dict + loop (~100 lines removed)
  - Fix 2 CI workflows still referencing the removed `skill-seekers scrape` command
  - Remove the broken pyproject.toml entry point for codebase_scraper:main

* fix: resolve 12 logic/flow issues found in deep review

  Critical fixes:
  - UnifiedScraper.run(): replace sys.exit(1) with return 1, add return 0
  - doc_scraper: use ExecutionContext.get() when already initialized instead of re-calling initialize(), which silently discards the new config
  - unified_scraper: define enhancement_config before the try/except to prevent UnboundLocalError in the LOCAL enhancement timeout read

  Important fixes:
  - override(): cleaner tuple save/restore for the singleton swap
  - --agent without --api-key now sets mode="local" so an env API key doesn't override an explicit agent choice
  - Remove DeprecationWarning from _reconstruct_argv (it fires on every non-create command in production)
  - Rewrite scrape_generic_tool to use get_converter() instead of subprocess calls to removed main() functions
  - SkillConverter.run() checks the build_skill() return value and returns 1 if False
  - estimate_pages_tool uses -m module invocation instead of a .py file path

  Low-priority fixes:
  - get_converter() raises a descriptive ValueError on a class name typo
  - test_default_values: save/clear API key env vars before asserting mode
  - test_get_converter_pdf: fix config key "path" → "pdf_path"

  3056 passed, 4 failed (pre-existing dep version issues), 32 skipped.

* fix: update MCP server tests to mock the converter instead of subprocess

  scrape_docs_tool now uses get_converter() + _run_converter() in-process instead of run_subprocess_with_streaming. Updated 4 TestScrapeDocsTool tests to mock the converter layer instead of the removed subprocess path.

Co-authored-by: YusufKaraaslanSpyke <yusuf@spykegames.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
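The SkillConverter base class and registry described in the commit message can be sketched roughly as follows. This is a minimal illustration built only from the names the message itself uses (run(), extract(), build_skill(), SOURCE_TYPE, get_converter(), CONVERTER_REGISTRY, the logger.exception() traceback fix, the build_skill() return-value check, and the ValueError on unknown types); the real skill_converter.py differs in detail and registers all 18 types.

```python
import importlib
import logging

logger = logging.getLogger(__name__)

# Maps source type -> (module path, class name). Entries are illustrative;
# the real registry in skill_converter.py lists all 18 source types.
CONVERTER_REGISTRY = {
    "asciidoc": ("skill_seekers.cli.asciidoc_scraper", "AsciiDocToSkillConverter"),
}


class SkillConverter:
    """Base interface: run() = extract() + build_skill(), same call for all types."""

    SOURCE_TYPE = ""  # set by each subclass

    def __init__(self, config: dict) -> None:
        self.config = config

    def extract(self):
        """Override in subclass: pull content from the source."""
        raise NotImplementedError

    def build_skill(self):
        """Override in subclass: write the skill directory. May return False on failure."""
        raise NotImplementedError

    def run(self) -> int:
        """Template method: extract, then build; 0 on success, 1 on failure."""
        try:
            self.extract()
            if self.build_skill() is False:  # run() checks the return value
                return 1
            return 0
        except Exception:
            logger.exception("Conversion failed")  # preserve the traceback
            return 1


def get_converter(source_type: str, config: dict) -> SkillConverter:
    """Factory lookup from the registry."""
    try:
        module_path, class_name = CONVERTER_REGISTRY[source_type]
    except KeyError:
        raise ValueError(f"Unknown source type: {source_type!r}")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(config)
```

Under this scheme create_command only ever sees the base interface: it builds a config dict, calls get_converter(detected_type, config), then converter.run().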
2 .github/workflows/scheduled-updates.yml (vendored)
@@ -122,7 +122,7 @@ jobs:
           fi

           # Use streaming ingestion for large docs
-          skill-seekers scrape --config "$CONFIG_FILE" --streaming --max-pages 200
+          skill-seekers create "$CONFIG_FILE" --max-pages 200

       - name: Generate quality report
         if: steps.should_update.outputs.update == 'true'
2 .github/workflows/vector-db-export.yml (vendored)
@@ -74,7 +74,7 @@ jobs:
         if: steps.check_config.outputs.exists == 'true'
         run: |
           echo "📥 Scraping documentation for $SKILL_NAME..."
-          skill-seekers scrape --config "${{ steps.config.outputs.path }}" --max-pages 100
+          skill-seekers create "${{ steps.config.outputs.path }}" --max-pages 100
         continue-on-error: true

       - name: Determine export targets
36 CLAUDE.md
@@ -52,17 +52,26 @@ Runs on push/PR to `main` or `development`. Lint job (Python 3.12, Ubuntu) + Tes

 ## Architecture

-### CLI: Git-style dispatcher
+### CLI: Unified create command

-Entry point `src/skill_seekers/cli/main.py` maps subcommands to modules. The `create` command auto-detects source type and is the recommended entry point for users.
+Entry point `src/skill_seekers/cli/main.py`. The `create` command is the **only** entry point for skill creation — it auto-detects source type and routes to the appropriate `SkillConverter`.

 ```
 skill-seekers create <source>   # Auto-detect: URL, owner/repo, ./path, file.pdf, etc.
-skill-seekers <type> [options]  # Direct: scrape, github, pdf, word, epub, video, jupyter, html, openapi, asciidoc, pptx, rss, manpage, confluence, notion, chat
-skill-seekers analyze <dir>     # Analyze local codebase (C3.x pipeline)
 skill-seekers package <dir>     # Package for platform (--target claude/gemini/openai/markdown/minimax/opencode/kimi/deepseek/qwen/openrouter/together/fireworks, --format langchain/llama-index/haystack/chroma/faiss/weaviate/qdrant/pinecone)
 ```

+### SkillConverter Pattern (Template Method + Factory)
+
+All 18 source types implement the `SkillConverter` base class (`skill_converter.py`):
+
+```python
+converter = get_converter("web", config)  # Factory lookup
+converter.run()                           # Template: extract() → build_skill()
+```
+
+Registry in `CONVERTER_REGISTRY` maps source type → (module, class). `create_command.py` builds config from `ExecutionContext`, calls `get_converter()`, then runs centralized enhancement.
+
 ### Data Flow (5 phases)

 1. **Scrape** - Source-specific scraper extracts content to `output/{name}_data/pages/*.json`
@@ -105,9 +114,9 @@ src/skill_seekers/cli/adaptors/

 `--target` = LLM platforms, `--format` = RAG/vector DBs. All adaptors are imported with `try/except ImportError` so missing optional deps don't break the registry.

-### 17 Source Type Scrapers
+### 18 Source Type Converters

-Each in `src/skill_seekers/cli/{type}_scraper.py` with a `main()` entry point. The `create_command.py` uses `source_detector.py` to auto-route. New scrapers added in v3.2.0+: jupyter, html, openapi, asciidoc, pptx, rss, manpage, confluence, notion, chat.
+Each in `src/skill_seekers/cli/{type}_scraper.py` as a `SkillConverter` subclass (no `main()`). The `create_command.py` uses `source_detector.py` to auto-detect, then calls `get_converter()`. Converters: web (doc_scraper), github, pdf, word, epub, video, local (codebase_scraper), jupyter, html, openapi, asciidoc, pptx, rss, manpage, confluence, notion, chat, config (unified_scraper).

 ### CLI Argument System
@@ -228,13 +237,14 @@ GITHUB_TOKEN=ghp_... # Higher GitHub rate limits
 3. Add optional dep to `pyproject.toml`
 4. Add tests in `tests/`

-### New source type scraper
-1. Create `src/skill_seekers/cli/{type}_scraper.py` with `main()`
-2. Add to `COMMAND_MODULES` in `cli/main.py`
-3. Add entry point in `pyproject.toml` `[project.scripts]`
-4. Add auto-detection in `source_detector.py`
-5. Add optional dep if needed
-6. Add tests
+### New source type converter
+1. Create `src/skill_seekers/cli/{type}_scraper.py` with a class inheriting `SkillConverter`
+2. Implement `extract()` and `build_skill()` methods, set `SOURCE_TYPE`
+3. Register in `CONVERTER_REGISTRY` in `skill_converter.py`
+4. Add source type config building in `create_command.py:_build_config()`
+5. Add auto-detection in `source_detector.py`
+6. Add optional dep if needed
+7. Add tests

 ### New CLI argument
 - Universal: `UNIVERSAL_ARGUMENTS` in `arguments/create.py`
@@ -299,32 +299,33 @@ Documentation = "https://skillseekersweb.com/"
 "Homebrew Tap" = "https://github.com/yusufkaraaslan/homebrew-skill-seekers"

 [project.scripts]
-# Main unified CLI
+# Main CLI entry point
 skill-seekers = "skill_seekers.cli.main:main"

-# Individual tool entry points
-skill-seekers-create = "skill_seekers.cli.create_command:main"  # NEW: Unified create command
-skill-seekers-doctor = "skill_seekers.cli.doctor:main"
-skill-seekers-config = "skill_seekers.cli.config_command:main"
-skill-seekers-resume = "skill_seekers.cli.resume_command:main"
-skill-seekers-scrape = "skill_seekers.cli.doc_scraper:main"
-skill-seekers-github = "skill_seekers.cli.github_scraper:main"
-skill-seekers-pdf = "skill_seekers.cli.pdf_scraper:main"
-skill-seekers-word = "skill_seekers.cli.word_scraper:main"
-skill-seekers-epub = "skill_seekers.cli.epub_scraper:main"
-skill-seekers-video = "skill_seekers.cli.video_scraper:main"
-skill-seekers-unified = "skill_seekers.cli.unified_scraper:main"
+# Core commands
+skill-seekers-create = "skill_seekers.cli.create_command:main"
 skill-seekers-enhance = "skill_seekers.cli.enhance_command:main"
 skill-seekers-enhance-status = "skill_seekers.cli.enhance_status:main"
 skill-seekers-package = "skill_seekers.cli.package_skill:main"
 skill-seekers-upload = "skill_seekers.cli.upload_skill:main"
-skill-seekers-estimate = "skill_seekers.cli.estimate_pages:main"
 skill-seekers-install = "skill_seekers.cli.install_skill:main"
 skill-seekers-install-agent = "skill_seekers.cli.install_agent:main"
-skill-seekers-codebase = "skill_seekers.cli.codebase_scraper:main"

+# Analysis & utilities
+skill-seekers-estimate = "skill_seekers.cli.estimate_pages:main"
+skill-seekers-patterns = "skill_seekers.cli.pattern_recognizer:main"
+skill-seekers-how-to-guides = "skill_seekers.cli.how_to_guide_builder:main"
+skill-seekers-quality = "skill_seekers.cli.quality_metrics:main"
+skill-seekers-workflows = "skill_seekers.cli.workflows_command:main"
+
+# Configuration & setup
+skill-seekers-config = "skill_seekers.cli.config_command:main"
+skill-seekers-doctor = "skill_seekers.cli.doctor:main"
+skill-seekers-setup = "skill_seekers.cli.setup_wizard:main"
+skill-seekers-resume = "skill_seekers.cli.resume_command:main"
+skill-seekers-sync-config = "skill_seekers.cli.sync_config:main"

 # Advanced
 skill-seekers-cloud = "skill_seekers.cli.cloud_storage_cli:main"
 skill-seekers-embed = "skill_seekers.embedding.server:main"
 skill-seekers-sync = "skill_seekers.cli.sync_cli:main"
@@ -332,22 +333,6 @@ skill-seekers-benchmark = "skill_seekers.cli.benchmark_cli:main"
 skill-seekers-stream = "skill_seekers.cli.streaming_ingest:main"
 skill-seekers-update = "skill_seekers.cli.incremental_updater:main"
 skill-seekers-multilang = "skill_seekers.cli.multilang_support:main"
-skill-seekers-quality = "skill_seekers.cli.quality_metrics:main"
-skill-seekers-workflows = "skill_seekers.cli.workflows_command:main"
-skill-seekers-sync-config = "skill_seekers.cli.sync_config:main"
-
-# New source type entry points (v3.2.0+)
-skill-seekers-jupyter = "skill_seekers.cli.jupyter_scraper:main"
-skill-seekers-html = "skill_seekers.cli.html_scraper:main"
-skill-seekers-openapi = "skill_seekers.cli.openapi_scraper:main"
-skill-seekers-asciidoc = "skill_seekers.cli.asciidoc_scraper:main"
-skill-seekers-pptx = "skill_seekers.cli.pptx_scraper:main"
-skill-seekers-rss = "skill_seekers.cli.rss_scraper:main"
-skill-seekers-manpage = "skill_seekers.cli.man_scraper:main"
-skill-seekers-confluence = "skill_seekers.cli.confluence_scraper:main"
-skill-seekers-notion = "skill_seekers.cli.notion_scraper:main"
-skill-seekers-chat = "skill_seekers.cli.chat_scraper:main"
 skill-seekers-opencode-split = "skill_seekers.cli.opencode_skill_splitter:main"

 [tool.setuptools]
 package-dir = {"" = "src"}
@@ -37,8 +37,8 @@ echo "✓ Done"
 # Step 2: Run codebase analysis
 echo "Step 2: Analyzing codebase..."
 rm -rf "$OUTPUT_DIR" 2>/dev/null || true
-uv run skill-seekers analyze \
-    --directory "$PROJECT_ROOT" \
+uv run skill-seekers create "$PROJECT_ROOT" \
     --name "$SKILL_NAME" \
     --output "$OUTPUT_DIR" 2>&1 | grep -E "^(INFO|✅)" || true
 echo "✓ Done"

@@ -16,16 +16,16 @@ pip install skill-seekers

 | Source | Command |
 |--------|---------|
-| Local code | `skill-seekers analyze --directory ./path` |
-| Docs URL | `skill-seekers scrape --url https://...` |
-| GitHub | `skill-seekers github --repo owner/repo` |
-| PDF | `skill-seekers pdf --file doc.pdf` |
+| Local code | `skill-seekers create ./path` |
+| Docs URL | `skill-seekers create https://docs.example.com` |
+| GitHub | `skill-seekers create owner/repo` |
+| PDF | `skill-seekers create document.pdf` |

 ## Quick Start

 ```bash
 # Analyze local codebase
-skill-seekers analyze --directory /path/to/project --output output/my-skill/
+skill-seekers create /path/to/project --name my-skill

 # Package for Claude
 yes | skill-seekers package output/my-skill/ --no-open
@@ -21,6 +21,9 @@ from .llms_txt_detector import LlmsTxtDetector
 from .llms_txt_downloader import LlmsTxtDownloader
 from .llms_txt_parser import LlmsTxtParser

+# ExecutionContext - single source of truth for all configuration
+from .execution_context import ExecutionContext, get_context
+
 try:
     from .utils import open_folder, read_reference_files
 except ImportError:
@@ -35,6 +38,8 @@ __all__ = [
     "LlmsTxtDetector",
     "LlmsTxtDownloader",
     "LlmsTxtParser",
+    "ExecutionContext",
+    "get_context",
     "open_folder",
     "read_reference_files",
     "__version__",
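The try/except ImportError around `.utils` above is the package's standard guard for optional helpers, the same pattern CLAUDE.md describes for adaptors. A self-contained sketch of the idiom (`some_optional_helper` is a hypothetical module name, not a real dependency):

```python
try:
    import some_optional_helper  # hypothetical optional dependency
    HAS_HELPER = True
except ImportError:
    HAS_HELPER = False


def open_folder(path: str) -> None:
    """Degrade gracefully when the optional helper is missing."""
    if not HAS_HELPER:
        return  # no-op fallback keeps imports and __all__ working
    some_optional_helper.open(path)
```

The point of the pattern is that a missing optional dependency fails at call time (or not at all), never at import time, so the package registry stays importable in minimal installs.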
@@ -164,9 +164,16 @@ class AgentClient:
                 Resolved from: arg → env SKILL_SEEKER_AGENT → "claude"
             api_key: API key override. If None, auto-detected from env vars.
         """
-        # Resolve agent name
+        # Resolve agent name: param > ExecutionContext > env var > default
+        try:
+            from skill_seekers.cli.execution_context import ExecutionContext
+
+            ctx = ExecutionContext.get()
+            ctx_agent = ctx.enhancement.agent or ""
+        except Exception:
+            ctx_agent = ""
         env_agent = os.environ.get("SKILL_SEEKER_AGENT", "").strip()
-        self.agent = normalize_agent_name(agent or env_agent or "claude")
+        self.agent = normalize_agent_name(agent or ctx_agent or env_agent or "claude")
         self.agent_display = AGENT_PRESETS.get(self.agent, {}).get("display_name", self.agent)

         # Detect API key and provider
@@ -139,6 +139,10 @@ class ArchitecturalPatternDetector:
         "Laravel": ["laravel", "illuminate", "artisan", "app/Http/Controllers", "app/Models"],
     }

+    # Web frameworks should only match for web-language projects
+    _WEB_FRAMEWORKS = {"React", "Vue.js", "Express", "Angular"}
+    _WEB_LANGUAGES = {"JavaScript", "TypeScript", "Python", "PHP", "Ruby"}
+
     def __init__(self, enhance_with_ai: bool = True, agent: str | None = None):
         """
         Initialize detector.
@@ -268,11 +272,26 @@ class ArchitecturalPatternDetector:
             # Return early to prevent web framework false positives
             return detected

+        # Determine primary language to filter out impossible framework matches
+        # e.g., C#/C++ projects should not match React/Vue.js/Express
+        lang_counts: dict[str, int] = {}
+        for file_data in files:
+            lang = file_data.get("language", "")
+            if lang:
+                lang_counts[lang] = lang_counts.get(lang, 0) + 1
+        primary_lang = max(lang_counts, key=lang_counts.get) if lang_counts else ""
+
+        skip_web = primary_lang and primary_lang not in self._WEB_LANGUAGES
+
         # Check other frameworks (including imports - fixes #239)
         for framework, markers in self.FRAMEWORK_MARKERS.items():
             if framework in ["Unity", "Unreal", "Godot"]:
                 continue  # Already checked

+            # Skip web frameworks for non-web language projects
+            if skip_web and framework in self._WEB_FRAMEWORKS:
+                continue
+
             # Check in file paths, directory structure, AND imports
             path_matches = sum(1 for marker in markers if marker.lower() in all_content.lower())
             dir_matches = sum(1 for marker in markers if marker.lower() in dir_content.lower())
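The primary-language filter added above can be exercised on its own. This standalone sketch (a hypothetical free function, but the same counting and filtering logic as the method) shows why a C# project no longer matches React:

```python
WEB_FRAMEWORKS = {"React", "Vue.js", "Express", "Angular"}
WEB_LANGUAGES = {"JavaScript", "TypeScript", "Python", "PHP", "Ruby"}


def frameworks_to_check(files: list[dict], candidates: list[str]) -> list[str]:
    """Drop web frameworks when the project's primary language isn't a web language."""
    # Count files per language; the most common language is "primary"
    lang_counts: dict[str, int] = {}
    for file_data in files:
        lang = file_data.get("language", "")
        if lang:
            lang_counts[lang] = lang_counts.get(lang, 0) + 1
    primary_lang = max(lang_counts, key=lang_counts.get) if lang_counts else ""

    skip_web = bool(primary_lang) and primary_lang not in WEB_LANGUAGES
    return [f for f in candidates if not (skip_web and f in WEB_FRAMEWORKS)]
```

A stray `.jsx` file in a C# repo no longer flips detection: the filter keys off the dominant language, not the presence of any single marker.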
@@ -938,3 +938,14 @@ def add_create_arguments(parser: argparse.ArgumentParser, mode: str = "default")
         action="store_true",
         help=argparse.SUPPRESS,
     )
+
+
+def get_create_defaults() -> dict[str, Any]:
+    """Build a defaults dict from a throwaway parser with all create arguments.
+
+    Used by CreateCommand._is_explicitly_set() to compare argument values
+    against their registered defaults instead of hardcoded values.
+    """
+    temp = argparse.ArgumentParser(add_help=False)
+    add_create_arguments(temp, mode="all")
+    return {action.dest: action.default for action in temp._actions if action.dest != "help"}
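The helper above supports a common argparse technique: deciding whether the user explicitly passed a flag by comparing the parsed value against the parser's registered default, instead of against a hardcoded copy of that default. A self-contained sketch with a toy two-flag parser (flag names hypothetical):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(add_help=False)
    p.add_argument("--max-pages", type=int, default=100)
    p.add_argument("--enhance-level", type=int, default=0)
    return p


def get_defaults() -> dict:
    """Collect registered defaults from a throwaway parser."""
    temp = build_parser()
    return {a.dest: a.default for a in temp._actions if a.dest != "help"}


def is_explicitly_set(args: argparse.Namespace, dest: str) -> bool:
    # True when the parsed value differs from the registered default
    return getattr(args, dest) != get_defaults()[dest]
```

One known limitation: passing a flag with exactly its default value (`--max-pages 100`) is indistinguishable from not passing it at all, which is usually acceptable since both cases produce identical behavior.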
@@ -15,12 +15,10 @@ Usage:
     skill-seekers asciidoc --from-json doc_extracted.json
 """

-import argparse
 import json
 import logging
 import os
 import re
-import sys
 from pathlib import Path

 # Optional dependency guard — asciidoc library for HTML conversion
@@ -31,6 +29,8 @@ try:
 except ImportError:
     ASCIIDOC_AVAILABLE = False

+from skill_seekers.cli.skill_converter import SkillConverter
+
 logger = logging.getLogger(__name__)

 ASCIIDOC_EXTENSIONS = {".adoc", ".asciidoc", ".asc", ".ad"}
@@ -112,7 +112,7 @@ def _score_code_quality(code: str) -> float:
     return min(10.0, max(0.0, score))


-class AsciiDocToSkillConverter:
+class AsciiDocToSkillConverter(SkillConverter):
     """Convert AsciiDoc documentation to an AI-ready skill.

     Handles single ``.adoc`` files and directories. Content is parsed into
@@ -120,7 +120,10 @@ class AsciiDocToSkillConverter:
     directory layout (SKILL.md, references/, etc.).
     """

+    SOURCE_TYPE = "asciidoc"
+
     def __init__(self, config: dict) -> None:
+        super().__init__(config)
         self.config = config
         self.name: str = config["name"]
         self.asciidoc_path: str = config.get("asciidoc_path", "")
@@ -132,6 +135,10 @@ class AsciiDocToSkillConverter:
         self.categories: dict = config.get("categories", {})
         self.extracted_data: dict | None = None

+    def extract(self):
+        """Extract content from AsciiDoc files (SkillConverter interface)."""
+        self.extract_asciidoc()
+
     # ------------------------------------------------------------------
     # Extraction
     # ------------------------------------------------------------------
@@ -943,147 +950,3 @@ class AsciiDocToSkillConverter:
|
||||
def _in_range(pos: int, ranges: list[tuple[int, int]]) -> bool:
|
||||
"""Check whether pos falls within any consumed range."""
|
||||
return any(s <= pos < e for s, e in ranges)
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# CLI entry point
|
||||
# ============================================================================
|
||||
|
||||
|
||||
def main() -> int:
|
||||
"""CLI entry point for AsciiDoc scraper."""
|
||||
from skill_seekers.cli.arguments.asciidoc import add_asciidoc_arguments
|
||||
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Convert AsciiDoc documentation to skill",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
)
|
||||
|
||||
add_asciidoc_arguments(parser)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# Set logging level
|
||||
if getattr(args, "quiet", False):
|
||||
logging.getLogger().setLevel(logging.WARNING)
|
||||
elif getattr(args, "verbose", False):
|
||||
logging.getLogger().setLevel(logging.DEBUG)
|
||||
|
||||
# Handle --dry-run
|
||||
if getattr(args, "dry_run", False):
|
||||
source = (
|
||||
getattr(args, "asciidoc_path", None) or getattr(args, "from_json", None) or "(none)"
|
||||
)
|
||||
print(f"\n{'=' * 60}")
|
||||
print("DRY RUN: AsciiDoc Extraction")
|
||||
print(f"{'=' * 60}")
|
||||
print(f"Source: {source}")
|
||||
print(f"Name: {getattr(args, 'name', None) or '(auto-detect)'}")
|
||||
print(f"Enhance level: {getattr(args, 'enhance_level', 0)}")
|
||||
print(f"\n✅ Dry run complete")
|
||||
return 0
|
||||
|
||||
# Validate inputs
|
||||
if not (getattr(args, "asciidoc_path", None) or getattr(args, "from_json", None)):
|
||||
parser.error("Must specify --asciidoc-path or --from-json")
|
||||
|
||||
# Build from JSON workflow
|
||||
if getattr(args, "from_json", None):
|
||||
name = Path(args.from_json).stem.replace("_extracted", "")
|
||||
config = {
|
||||
"name": getattr(args, "name", None) or name,
|
||||
"description": getattr(args, "description", None)
|
||||
or f"Use when referencing {name} documentation",
|
||||
}
|
||||
try:
|
||||
converter = AsciiDocToSkillConverter(config)
|
||||
converter.load_extracted_data(args.from_json)
|
||||
converter.build_skill()
|
||||
except Exception as e:
|
||||
print(f"\n❌ Error: {e}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
return 0
|
||||
|
||||
# Direct AsciiDoc mode
|
||||
if not getattr(args, "name", None):
|
||||
p = Path(args.asciidoc_path)
|
||||
args.name = p.stem if p.is_file() else p.name

    config = {
        "name": args.name,
        "asciidoc_path": args.asciidoc_path,
        "description": getattr(args, "description", None),
    }

    try:
        converter = AsciiDocToSkillConverter(config)

        # Extract
        if not converter.extract_asciidoc():
            print("\n❌ AsciiDoc extraction failed - see error above", file=sys.stderr)
            sys.exit(1)

        # Build skill
        converter.build_skill()

        # Enhancement Workflow Integration
        from skill_seekers.cli.workflow_runner import run_workflows

        workflow_executed, workflow_names = run_workflows(args)
        workflow_name = ", ".join(workflow_names) if workflow_names else None

        # Traditional enhancement (complements workflow system)
        if getattr(args, "enhance_level", 0) > 0:
            api_key = getattr(args, "api_key", None) or os.environ.get("ANTHROPIC_API_KEY")
            mode = "API" if api_key else "LOCAL"

            print("\n" + "=" * 80)
            print(f"🤖 Traditional AI Enhancement ({mode} mode, level {args.enhance_level})")
            print("=" * 80)
            if workflow_executed:
                print(f" Running after workflow: {workflow_name}")
                print(
                    " (Workflow provides specialized analysis,"
                    " enhancement provides general improvements)"
                )
                print("")

            skill_dir = converter.skill_dir
            if api_key:
                try:
                    from skill_seekers.cli.enhance_skill import enhance_skill_md

                    enhance_skill_md(skill_dir, api_key)
                    print("✅ API enhancement complete!")
                except ImportError:
                    print("❌ API enhancement not available. Falling back to LOCAL mode...")
                    from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer

                    agent = getattr(args, "agent", None) if args else None
                    agent_cmd = getattr(args, "agent_cmd", None) if args else None
                    enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                    enhancer.run(headless=True)
                    print("✅ Local enhancement complete!")
            else:
                from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer

                agent = getattr(args, "agent", None) if args else None
                agent_cmd = getattr(args, "agent_cmd", None) if args else None
                enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                enhancer.run(headless=True)
                print("✅ Local enhancement complete!")

    except (FileNotFoundError, ValueError, RuntimeError) as e:
        print(f"\n❌ Error: {e}", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"\n❌ Unexpected error during AsciiDoc processing: {e}", file=sys.stderr)
        import traceback

        traceback.print_exc()
        sys.exit(1)
    return 0


if __name__ == "__main__":
    sys.exit(main())

@@ -34,16 +34,16 @@ Usage:
    skill-seekers chat --from-json myteam_extracted.json --name myteam
"""

import argparse
import json
import logging
import os
import re
import sys
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path

from skill_seekers.cli.skill_converter import SkillConverter

# Optional dependency guard — Slack SDK
try:
    from slack_sdk import WebClient
@@ -243,7 +243,7 @@ def _score_code_quality(code: str) -> float:
# ---------------------------------------------------------------------------


class ChatToSkillConverter:
class ChatToSkillConverter(SkillConverter):
    """Convert Slack or Discord chat history into an AI-ready skill.

    Follows the same pipeline pattern as the EPUB, Jupyter, and PPTX scrapers:
@@ -261,6 +261,8 @@ class ChatToSkillConverter:
    channel, date range, and detected topic.
    """

    SOURCE_TYPE = "chat"

    def __init__(self, config: dict) -> None:
        """Initialize the converter with a configuration dictionary.

@@ -276,6 +278,7 @@ class ChatToSkillConverter:
            - description (str): Skill description (optional, inferred
              if absent).
        """
        super().__init__(config)
        self.config = config
        self.name: str = config["name"]
        self.export_path: str = config.get("export_path", "")
@@ -294,6 +297,10 @@ class ChatToSkillConverter:
        # Extracted data (populated by extract_chat or load_extracted_data)
        self.extracted_data: dict | None = None

    def extract(self):
        """Extract content from chat history (SkillConverter interface)."""
        self.extract_chat()

    # ------------------------------------------------------------------
    # Extraction — public entry point
    # ------------------------------------------------------------------
@@ -1730,195 +1737,3 @@ class ChatToSkillConverter:
        """
        safe = re.sub(r"[^\w\s-]", "", name.lower())
        return re.sub(r"[-\s]+", "_", safe)
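The two-step sanitizer above (strip punctuation, then collapse runs of whitespace and hyphens into single underscores) behaves like this when run standalone:

```python
import re


def sanitize(name: str) -> str:
    # Step 1: drop everything that is not a word char, space, or hyphen.
    safe = re.sub(r"[^\w\s-]", "", name.lower())
    # Step 2: collapse runs of hyphens/whitespace into single underscores.
    return re.sub(r"[-\s]+", "_", safe)


print(sanitize("My Team -- Q&A Channel!"))  # my_team_qa_channel
```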


# ---------------------------------------------------------------------------
# CLI entry point
# ---------------------------------------------------------------------------


def main() -> int:
    """CLI entry point for the Slack/Discord chat scraper.

    Parses command-line arguments and runs the extraction and
    skill-building pipeline. Supports export import, API fetch,
    and loading from previously extracted JSON.

    Returns:
        Exit code (0 for success, non-zero for errors).
    """
    from .arguments.chat import add_chat_arguments

    parser = argparse.ArgumentParser(
        description="Convert Slack/Discord chat history to AI-ready skill",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Slack workspace export
  %(prog)s --export-path ./slack-export/ --platform slack --name myteam

  # Slack API
  %(prog)s --platform slack --token xoxb-... --channel C01234 --name myteam

  # Discord export (DiscordChatExporter)
  %(prog)s --export-path ./discord-export.json --platform discord --name myserver

  # Discord API
  %(prog)s --platform discord --token Bot-token --channel 12345 --name myserver

  # From previously extracted JSON
  %(prog)s --from-json myteam_extracted.json --name myteam
""",
    )

    add_chat_arguments(parser)

    args = parser.parse_args()

    # Set logging level
    if getattr(args, "quiet", False):
        logging.getLogger().setLevel(logging.WARNING)
    elif getattr(args, "verbose", False):
        logging.getLogger().setLevel(logging.DEBUG)

    # Handle --dry-run
    if args.dry_run:
        source = args.export_path or args.from_json or f"{args.platform}-api"
        print(f"\n{'=' * 60}")
        print("DRY RUN: Chat Extraction")
        print(f"{'=' * 60}")
        print(f"Platform: {args.platform}")
        print(f"Source: {source}")
        print(f"Name: {args.name or '(auto-detect)'}")
        print(f"Channel: {args.channel or '(all)'}")
        print(f"Max messages: {args.max_messages}")
        print(f"Enhance level: {args.enhance_level}")
        print("\n✅ Dry run complete")
        return 0

    # Validate inputs
    if args.from_json:
        # Build from previously extracted JSON
        name = args.name or Path(args.from_json).stem.replace("_extracted", "")
        config = {
            "name": name,
            "description": (args.description or f"Use when referencing {name} chat knowledge base"),
        }
        try:
            converter = ChatToSkillConverter(config)
            converter.load_extracted_data(args.from_json)
            converter.build_skill()
        except Exception as e:
            print(f"\n❌ Error: {e}", file=sys.stderr)
            sys.exit(1)
        return 0

    # Require either --export-path or --token for extraction
    if not args.export_path and not args.token:
        parser.error(
            "Must specify --export-path (export mode), --token (API mode), "
            "or --from-json (build from extracted data)"
        )

    if not args.name:
        if args.export_path:
            args.name = Path(args.export_path).stem
        else:
            args.name = f"{args.platform}_chat"

    config = {
        "name": args.name,
        "export_path": args.export_path or "",
        "platform": args.platform,
        "token": args.token or "",
        "channel": args.channel or "",
        "max_messages": args.max_messages,
        "description": args.description,
    }

    try:
        converter = ChatToSkillConverter(config)

        # Extract
        if not converter.extract_chat():
            print(
                "\n❌ Chat extraction failed - see error above",
                file=sys.stderr,
            )
            sys.exit(1)

        # Build skill
        converter.build_skill()

        # Enhancement Workflow Integration
        from skill_seekers.cli.workflow_runner import run_workflows

        workflow_executed, workflow_names = run_workflows(args)
        workflow_name = ", ".join(workflow_names) if workflow_names else None

        # Traditional enhancement (complements workflow system)
        if getattr(args, "enhance_level", 0) > 0:
            api_key = getattr(args, "api_key", None) or os.environ.get("ANTHROPIC_API_KEY")
            mode = "API" if api_key else "LOCAL"

            print("\n" + "=" * 80)
            print(f"🤖 Traditional AI Enhancement ({mode} mode, level {args.enhance_level})")
            print("=" * 80)
            if workflow_executed:
                print(f" Running after workflow: {workflow_name}")
                print(
                    " (Workflow provides specialized analysis, "
                    "enhancement provides general improvements)"
                )
                print("")

            skill_dir = converter.skill_dir
            if api_key:
                try:
                    from skill_seekers.cli.enhance_skill import enhance_skill_md

                    enhance_skill_md(skill_dir, api_key)
                    print("✅ API enhancement complete!")
                except ImportError:
                    print("❌ API enhancement not available. Falling back to LOCAL mode...")
                    from skill_seekers.cli.enhance_skill_local import (
                        LocalSkillEnhancer,
                    )

                    agent = getattr(args, "agent", None) if args else None
                    agent_cmd = getattr(args, "agent_cmd", None) if args else None
                    enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                    enhancer.run(headless=True)
                    print("✅ Local enhancement complete!")
            else:
                from skill_seekers.cli.enhance_skill_local import (
                    LocalSkillEnhancer,
                )

                agent = getattr(args, "agent", None) if args else None
                agent_cmd = getattr(args, "agent_cmd", None) if args else None
                enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                enhancer.run(headless=True)
                print("✅ Local enhancement complete!")

    except (FileNotFoundError, ValueError) as e:
        print(f"\n❌ Input error: {e}", file=sys.stderr)
        sys.exit(1)
    except RuntimeError as e:
        print(f"\n❌ Error: {e}", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(
            f"\n❌ Unexpected error during chat processing: {e}",
            file=sys.stderr,
        )
        import traceback

        traceback.print_exc()
        sys.exit(1)

    return 0


if __name__ == "__main__":
    sys.exit(main())

@@ -24,12 +24,10 @@ Credits:
- pathspec for .gitignore support: https://pypi.org/project/pathspec/
"""

import argparse
import json
import logging
import os
import re
import sys
from pathlib import Path
from typing import Any

@@ -38,7 +36,7 @@ from skill_seekers.cli.code_analyzer import CodeAnalyzer
from skill_seekers.cli.config_extractor import ConfigExtractor
from skill_seekers.cli.dependency_analyzer import DependencyAnalyzer
from skill_seekers.cli.signal_flow_analyzer import SignalFlowAnalyzer
from skill_seekers.cli.utils import setup_logging
from skill_seekers.cli.skill_converter import SkillConverter

# Try to import pathspec for .gitignore support
try:
@@ -2147,278 +2145,56 @@ def _generate_references(output_dir: Path):
    logger.info(f"✅ Generated references directory: {references_dir}")


def _check_deprecated_flags(args):
    """Check for deprecated flags and show migration warnings."""
    warnings = []
class CodebaseAnalyzer(SkillConverter):
    """SkillConverter wrapper around the analyze_codebase / _generate_skill_md functions."""

    # Deprecated: --depth
    if hasattr(args, "depth") and args.depth:
        preset_map = {
            "surface": "quick",
            "deep": "standard",
            "full": "comprehensive",
        }
        suggested_preset = preset_map.get(args.depth, "standard")
        warnings.append(
            f"⚠️ DEPRECATED: --depth {args.depth} → use --preset {suggested_preset} instead"
    SOURCE_TYPE = "local"

    def __init__(self, config: dict[str, Any]):
        super().__init__(config)
        self.directory = Path(config.get("directory", ".")).resolve()
        self.output_dir = Path(config.get("output_dir", f"output/{self.name}"))
        self.depth = config.get("depth", "deep")
        self.languages = config.get("languages")
        self.file_patterns = config.get("file_patterns")
        self.build_api_reference = config.get("build_api_reference", True)
        self.extract_comments = config.get("extract_comments", True)
        self.build_dependency_graph = config.get("build_dependency_graph", True)
        self.detect_patterns = config.get("detect_patterns", True)
        self.extract_test_examples = config.get("extract_test_examples", True)
        self.build_how_to_guides = config.get("build_how_to_guides", True)
        self.extract_config_patterns = config.get("extract_config_patterns", True)
        self.extract_docs = config.get("extract_docs", True)
        self.enhance_level = config.get("enhance_level", 0)
        self.skill_name = config.get("skill_name") or self.name
        self.skill_description = config.get("skill_description")
        self.doc_version = config.get("doc_version", "")
        self._results: dict[str, Any] | None = None

    def extract(self):
        """SkillConverter interface — delegates to analyze_codebase()."""
        self._results = analyze_codebase(
            directory=self.directory,
            output_dir=self.output_dir,
            depth=self.depth,
            languages=self.languages,
            file_patterns=self.file_patterns,
            build_api_reference=self.build_api_reference,
            extract_comments=self.extract_comments,
            build_dependency_graph=self.build_dependency_graph,
            detect_patterns=self.detect_patterns,
            extract_test_examples=self.extract_test_examples,
            build_how_to_guides=self.build_how_to_guides,
            extract_config_patterns=self.extract_config_patterns,
            extract_docs=self.extract_docs,
            enhance_level=self.enhance_level,
            skill_name=self.skill_name,
            skill_description=self.skill_description,
            doc_version=self.doc_version,
        )

    # Deprecated: --ai-mode
    if hasattr(args, "ai_mode") and args.ai_mode and args.ai_mode != "auto":
        if args.ai_mode == "api":
            warnings.append(
                "⚠️ DEPRECATED: --ai-mode api → use --enhance-level with ANTHROPIC_API_KEY set instead"
            )
        elif args.ai_mode == "local":
            warnings.append(
                "⚠️ DEPRECATED: --ai-mode local → use --enhance-level without API key instead"
            )
        elif args.ai_mode == "none":
            warnings.append("⚠️ DEPRECATED: --ai-mode none → use --enhance-level 0 instead")

    # Deprecated: --quick flag
    if hasattr(args, "quick") and args.quick:
        warnings.append("⚠️ DEPRECATED: --quick → use --preset quick instead")

    # Deprecated: --comprehensive flag
    if hasattr(args, "comprehensive") and args.comprehensive:
        warnings.append("⚠️ DEPRECATED: --comprehensive → use --preset comprehensive instead")

    # Show warnings if any found
    if warnings:
        print("\n" + "=" * 70)
        for warning in warnings:
            print(warning)
        print("\n💡 MIGRATION TIP:")
        print(" --preset quick (1-2 min, basic features)")
        print(" --preset standard (5-10 min, core features, DEFAULT)")
        print(" --preset comprehensive (20-60 min, all features + AI)")
        print(" --enhance-level 0-3 (granular AI enhancement control)")
        print("\n⚠️ Deprecated flags will be removed in v4.0.0")
        print("=" * 70 + "\n")
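The `--depth` to `--preset` migration suggested in the warnings above is a plain dict lookup with a safe fallback:

```python
# Mapping used to suggest the replacement preset for a deprecated
# --depth value; unknown values fall back to "standard".
preset_map = {
    "surface": "quick",
    "deep": "standard",
    "full": "comprehensive",
}

print(preset_map.get("full", "standard"))   # comprehensive
print(preset_map.get("bogus", "standard"))  # standard
```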


def main():
    """Command-line interface for codebase analysis."""
    from skill_seekers.cli.arguments.analyze import add_analyze_arguments

    parser = argparse.ArgumentParser(
        description="Analyze local codebases and extract code knowledge",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Analyze current directory
  codebase-scraper --directory . --output output/codebase/

  # Deep analysis with API reference and dependency graph
  codebase-scraper --directory /path/to/repo --depth deep --build-api-reference --build-dependency-graph

  # Analyze only Python and JavaScript
  codebase-scraper --directory . --languages Python,JavaScript

  # Use file patterns
  codebase-scraper --directory . --file-patterns "*.py,src/**/*.js"

  # Full analysis with all features (default)
  codebase-scraper --directory . --depth deep

  # Surface analysis (fast, skip all analysis features)
  codebase-scraper --directory . --depth surface --skip-api-reference --skip-dependency-graph --skip-patterns --skip-test-examples

  # Skip specific features
  codebase-scraper --directory . --skip-patterns --skip-test-examples
""",
    )

    # Register all args from the shared definitions module
    add_analyze_arguments(parser)

    # Extra legacy arg only used by standalone CLI (not in arguments/analyze.py)
    parser.add_argument(
        "--ai-mode",
        choices=["auto", "api", "local", "none"],
        default="auto",
        help=(
            "AI enhancement mode for how-to guides: "
            "auto (auto-detect: API if ANTHROPIC_API_KEY set, else LOCAL), "
            "api (Anthropic API, requires ANTHROPIC_API_KEY), "
            "local (coding agent CLI, FREE, no API key), "
            "none (disable AI enhancement). "
            "💡 TIP: Use --enhance flag instead for simpler UX!"
        ),
    )

    # Check for deprecated flags
    deprecated_flags = {
        "--build-api-reference": "--skip-api-reference",
        "--build-dependency-graph": "--skip-dependency-graph",
        "--detect-patterns": "--skip-patterns",
        "--extract-test-examples": "--skip-test-examples",
        "--build-how-to-guides": "--skip-how-to-guides",
        "--extract-config-patterns": "--skip-config-patterns",
    }

    for old_flag, new_flag in deprecated_flags.items():
        if old_flag in sys.argv:
            logger.warning(
                f"⚠️ DEPRECATED: {old_flag} is deprecated. "
                f"All features are now enabled by default. "
                f"Use {new_flag} to disable this feature."
            )

    # Handle --preset-list flag BEFORE parse_args() to avoid required --directory validation
    if "--preset-list" in sys.argv:
        from skill_seekers.cli.presets import PresetManager

        print(PresetManager.format_preset_help())
        return 0

    args = parser.parse_args()

    # Check for deprecated flags and show warnings
    _check_deprecated_flags(args)

    # Handle presets using formal preset system
    preset_name = None
    if hasattr(args, "preset") and args.preset:
        # New --preset flag (recommended)
        preset_name = args.preset
    elif hasattr(args, "quick") and args.quick:
        # Legacy --quick flag (backward compatibility)
        preset_name = "quick"
    elif hasattr(args, "comprehensive") and args.comprehensive:
        # Legacy --comprehensive flag (backward compatibility)
        preset_name = "comprehensive"
    else:
        # Default preset if none specified
        preset_name = "standard"

    # Apply preset using PresetManager
    if preset_name:
        from skill_seekers.cli.presets import PresetManager

        try:
            preset_args = PresetManager.apply_preset(preset_name, vars(args))
            # Update args with preset values
            for key, value in preset_args.items():
                setattr(args, key, value)

            preset = PresetManager.get_preset(preset_name)
            logger.info(f"{preset.icon} {preset.name} analysis mode: {preset.description}")
        except ValueError as e:
            logger.error(f"❌ {e}")
            return 1

    # Apply default depth if not set by preset or CLI
    if args.depth is None:
        args.depth = "deep"  # Default depth

    setup_logging(verbose=args.verbose, quiet=getattr(args, "quiet", False))

    # Handle --dry-run
    if getattr(args, "dry_run", False):
        directory = Path(args.directory)
        print(f"\n{'=' * 60}")
        print("DRY RUN: Codebase Analysis")
        print(f"{'=' * 60}")
        print(f"Directory: {directory.resolve()}")
        print(f"Output: {args.output}")
        print(f"Preset: {preset_name}")
        print(f"Depth: {args.depth or 'deep (default)'}")
        print(f"Name: {getattr(args, 'name', None) or directory.name}")
        print(f"Enhance: level {args.enhance_level}")
        print("Skip flags: ", end="")
        skips = []
        for flag in [
            "skip_api_reference",
            "skip_dependency_graph",
            "skip_patterns",
            "skip_test_examples",
            "skip_how_to_guides",
            "skip_config_patterns",
            "skip_docs",
        ]:
            if getattr(args, flag, False):
                skips.append(f"--{flag.replace('_', '-')}")
        print(", ".join(skips) if skips else "(none)")
        print("\n✅ Dry run complete")
        return 0

    # Validate directory
    directory = Path(args.directory)
    if not directory.exists():
        logger.error(f"Directory not found: {directory}")
        return 1

    if not directory.is_dir():
        logger.error(f"Not a directory: {directory}")
        return 1

    # Parse languages
    languages = None
    if args.languages:
        languages = [lang.strip() for lang in args.languages.split(",")]

    # Parse file patterns
    file_patterns = None
    if args.file_patterns:
        file_patterns = [p.strip() for p in args.file_patterns.split(",")]

    # Analyze codebase
    try:
        results = analyze_codebase(
            directory=directory,
            output_dir=Path(args.output),
            depth=args.depth,
            languages=languages,
            file_patterns=file_patterns,
            build_api_reference=not args.skip_api_reference,
            extract_comments=not args.no_comments,
            build_dependency_graph=not args.skip_dependency_graph,
            detect_patterns=not args.skip_patterns,
            extract_test_examples=not args.skip_test_examples,
            build_how_to_guides=not args.skip_how_to_guides,
            extract_config_patterns=not args.skip_config_patterns,
            extract_docs=not args.skip_docs,
            enhance_level=args.enhance_level,  # AI enhancement level (0-3)
            skill_name=getattr(args, "name", None),
            skill_description=getattr(args, "description", None),
            doc_version=getattr(args, "doc_version", ""),
        )

        # ============================================================
        # WORKFLOW SYSTEM INTEGRATION (Phase 2)
        # ============================================================
        from skill_seekers.cli.workflow_runner import run_workflows

        workflow_executed, workflow_names = run_workflows(args)

        # Print summary
        print(f"\n{'=' * 60}")
        print("CODEBASE ANALYSIS COMPLETE")
        if workflow_executed:
            print(f" + {len(workflow_names)} ENHANCEMENT WORKFLOW(S) EXECUTED")
        print(f"{'=' * 60}")
        print(f"Files analyzed: {len(results['files'])}")
        print(f"Output directory: {args.output}")
        if not args.skip_api_reference:
            print(f"API reference: {Path(args.output) / 'api_reference'}")
        if workflow_executed:
            print(f"Workflows applied: {', '.join(workflow_names)}")
        print(f"{'=' * 60}\n")

        return 0

    except KeyboardInterrupt:
        logger.error("\nAnalysis interrupted by user")
        return 130
    except Exception as e:
        logger.error(f"Analysis failed: {e}")
        import traceback

        traceback.print_exc()
        return 1


if __name__ == "__main__":
    sys.exit(main())
    def build_skill(self):
        """SkillConverter interface — no-op because analyze_codebase() already calls _generate_skill_md()."""
        # analyze_codebase() generates SKILL.md internally via _generate_skill_md(),
        # so there is nothing additional to do here.
        pass
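The no-op `build_skill` above completes the `SkillConverter` contract. The base class itself is not shown in this diff, so the following is only an illustrative sketch of the pattern the converters follow: `SOURCE_TYPE`, `extract()`, and `build_skill()` are names taken from the diff, everything else is hypothetical.

```python
from abc import ABC, abstractmethod


class BaseConverter(ABC):
    """Illustrative stand-in for SkillConverter: subclasses declare a
    SOURCE_TYPE and implement extract(); build_skill() may be a no-op
    when extraction already writes SKILL.md as a side effect."""

    SOURCE_TYPE = "base"

    def __init__(self, config: dict):
        self.config = config
        self.name = config["name"]

    @abstractmethod
    def extract(self): ...

    def build_skill(self):
        pass


class LocalAnalyzer(BaseConverter):
    SOURCE_TYPE = "local"

    def extract(self):
        # The real class delegates to analyze_codebase(), which generates
        # SKILL.md internally, hence the no-op build_skill() above.
        return {"analyzed": self.name}


print(LocalAnalyzer({"name": "demo"}).extract())  # {'analyzed': 'demo'}
```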

@@ -627,15 +627,17 @@ class ConfigParser:
        parent_path = []

        for key, value in data.items():
            # YAML parses 'on:' as boolean True; convert non-string keys
            str_key = str(key) if not isinstance(key, str) else key
            if isinstance(value, dict):
                # Recurse into nested dicts
                self._extract_settings_from_dict(value, config_file, parent_path + [key])
                self._extract_settings_from_dict(value, config_file, parent_path + [str_key])
            else:
                setting = ConfigSetting(
                    key=".".join(parent_path + [key]) if parent_path else key,
                    key=".".join(parent_path + [str_key]) if parent_path else str_key,
                    value=value,
                    value_type=self._infer_type(value),
                    nested_path=parent_path + [key],
                    nested_path=parent_path + [str_key],
                )
                config_file.settings.append(setting)


@@ -1,8 +1,8 @@
#!/usr/bin/env python3
"""
Unified Config Validator
UniSkillConfig Validator

Validates unified config format that supports multiple sources:
Validates uni_skill_config format that supports multiple sources:
- documentation (website scraping)
- github (repository scraping)
- pdf (PDF document scraping)
@@ -34,9 +34,9 @@ logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class ConfigValidator:
class UniSkillConfigValidator:
    """
    Validates unified config format (legacy support removed in v2.11.0).
    Validates uni_skill_config format (legacy support removed in v2.11.0).
    """

    # Valid source types
@@ -100,7 +100,7 @@ class ConfigValidator:

    def validate(self) -> bool:
        """
        Validate unified config format.
        Validate uni_skill_config format.

        Returns:
            True if valid
@@ -136,8 +136,8 @@ class ConfigValidator:
        return self._validate_unified()

    def _validate_unified(self) -> bool:
        """Validate unified config format."""
        logger.info("Validating unified config format...")
        """Validate uni_skill_config format."""
        logger.info("Validating uni_skill_config format...")

        # Required top-level fields
        if "name" not in self.config:
@@ -483,7 +483,11 @@ class ConfigValidator:
        return has_docs_api and has_github_code


def validate_config(config_path: str) -> ConfigValidator:
# Backward-compat alias
ConfigValidator = UniSkillConfigValidator


def validate_config(config_path: str) -> UniSkillConfigValidator:
    """
    Validate config file and return validator instance.

@@ -491,12 +495,12 @@ def validate_config(config_path: str) -> ConfigValidator:
        config_path: Path to config JSON file

    Returns:
        ConfigValidator instance
        UniSkillConfigValidator instance

    Raises:
        ValueError if config is invalid
    """
    validator = ConfigValidator(config_path)
    validator = UniSkillConfigValidator(config_path)
    validator.validate()
    return validator
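The rename keeps old import sites working through the module-level alias; the mechanism in isolation (the `validate` body here is a stub for illustration):

```python
class UniSkillConfigValidator:
    """New class name after the rename."""

    def validate(self) -> bool:
        return True  # stub; the real method checks required fields


# Backward-compat alias: `from module import ConfigValidator` still
# resolves, and isinstance checks against either name agree, because
# both names are bound to the very same class object.
ConfigValidator = UniSkillConfigValidator

print(ConfigValidator is UniSkillConfigValidator)  # True
print(ConfigValidator().validate())  # True
```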
|
||||
|
||||
|
||||
@@ -31,15 +31,15 @@ Usage:
|
||||
--space-key DEV --name dev-wiki --max-pages 200
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import re
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from skill_seekers.cli.skill_converter import SkillConverter
|
||||
|
||||
# Optional dependency guard for atlassian-python-api
|
||||
try:
|
||||
from atlassian import Confluence
|
||||
@@ -177,7 +177,7 @@ def infer_description_from_confluence(
|
||||
)
|
||||
|
||||
|
||||
class ConfluenceToSkillConverter:
|
||||
class ConfluenceToSkillConverter(SkillConverter):
|
||||
"""Convert Confluence space documentation to an AI-ready skill.
|
||||
|
||||
Supports two extraction modes:
|
||||
@@ -209,6 +209,8 @@ class ConfluenceToSkillConverter:
|
||||
extracted_data: Structured extraction results dict.
|
||||
"""
|
||||
|
||||
SOURCE_TYPE = "confluence"
|
||||
|
||||
def __init__(self, config: dict) -> None:
|
||||
"""Initialize the Confluence to skill converter.
|
||||
|
||||
@@ -223,6 +225,7 @@ class ConfluenceToSkillConverter:
|
||||
- description (str): Skill description (optional).
|
||||
- max_pages (int): Maximum pages to fetch, default 500.
|
||||
"""
|
||||
super().__init__(config)
|
||||
self.config = config
|
||||
self.name: str = config["name"]
|
||||
self.base_url: str = config.get("base_url", "")
|
||||
@@ -242,6 +245,10 @@ class ConfluenceToSkillConverter:
|
||||
# Extracted data storage
|
||||
self.extracted_data: dict[str, Any] | None = None
|
||||
|
||||
def extract(self):
|
||||
"""Extract content from Confluence (SkillConverter interface)."""
|
||||
self.extract_confluence()
|
||||
|
||||
# ──────────────────────────────────────────────────────────────────────
|
||||
# Extraction dispatcher
|
||||
# ──────────────────────────────────────────────────────────────────────
|
||||
@@ -1916,255 +1923,3 @@ def _score_code_quality(code: str) -> float:
            score -= 2.0

    return min(10.0, max(0.0, score))


# ──────────────────────────────────────────────────────────────────────────────
# CLI entry point
# ──────────────────────────────────────────────────────────────────────────────


def main() -> int:
    """CLI entry point for the Confluence scraper.

    Parses command-line arguments and runs the extraction/build pipeline.
    Supports three workflows:

    1. **API mode**: ``--base-url URL --space-key KEY --name my-skill``
    2. **Export mode**: ``--export-path ./export-dir/ --name my-skill``
    3. **Build from JSON**: ``--from-json my-skill_extracted.json``

    Returns:
        Exit code (0 for success, non-zero for failure).
    """
    parser = argparse.ArgumentParser(
        description="Convert Confluence documentation to AI-ready skills",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=(
            "Examples:\n"
            " %(prog)s --base-url https://wiki.example.com "
            "--space-key PROJ --name my-wiki\n"
            " %(prog)s --export-path ./confluence-export/ --name my-wiki\n"
            " %(prog)s --from-json my-wiki_extracted.json\n"
        ),
    )

    # Standard shared arguments
    from .arguments.common import add_all_standard_arguments

    add_all_standard_arguments(parser)

    # Override enhance-level default to 0 for Confluence
    for action in parser._actions:
        if hasattr(action, "dest") and action.dest == "enhance_level":
            action.default = 0
            action.help = (
                "AI enhancement level (auto-detects API vs LOCAL mode): "
                "0=disabled (default for Confluence), 1=SKILL.md only, "
                "2=+architecture/config, 3=full enhancement. "
                "Mode selection: uses API if ANTHROPIC_API_KEY is set, "
                "otherwise LOCAL (Claude Code, Kimi, etc.)"
            )
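The loop above mutates `action.default` directly because it also needs to rewrite the help text. When only the default needs changing, argparse offers a supported shortcut, `parser.set_defaults`. A minimal standalone sketch (toy parser, not the real shared-arguments module):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--enhance-level", type=int, default=1, dest="enhance_level")

# set_defaults changes the stored default without touching the help string
parser.set_defaults(enhance_level=0)

args = parser.parse_args([])                          # no flag passed -> new default applies
explicit = parser.parse_args(["--enhance-level", "2"])  # explicit flag still wins
```

The direct `_actions` walk used here is the private-API route; it is needed only because the help text mentions the default and must stay in sync.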

    # Confluence-specific arguments
    parser.add_argument(
        "--base-url",
        type=str,
        help="Confluence instance base URL (e.g., https://wiki.example.com)",
        metavar="URL",
    )
    parser.add_argument(
        "--space-key",
        type=str,
        help="Confluence space key to extract (e.g., PROJ, DEV)",
        metavar="KEY",
    )
    parser.add_argument(
        "--export-path",
        type=str,
        help="Path to Confluence HTML/XML export directory",
        metavar="PATH",
    )
    parser.add_argument(
        "--username",
        type=str,
        help=("Confluence username / email for API auth (or set CONFLUENCE_USERNAME env var)"),
        metavar="USER",
    )
    parser.add_argument(
        "--token",
        type=str,
        help=("Confluence API token for API auth (or set CONFLUENCE_TOKEN env var)"),
        metavar="TOKEN",
    )
    parser.add_argument(
        "--max-pages",
        type=int,
        default=500,
        help="Maximum number of pages to fetch (default: 500)",
        metavar="N",
    )
    parser.add_argument(
        "--from-json",
        type=str,
        help="Build skill from previously extracted JSON data",
        metavar="FILE",
    )

    args = parser.parse_args()

    # Setup logging
    if getattr(args, "quiet", False):
        logging.basicConfig(level=logging.WARNING, format="%(message)s")
    elif getattr(args, "verbose", False):
        logging.basicConfig(level=logging.DEBUG, format="%(levelname)s: %(message)s")
    else:
        logging.basicConfig(level=logging.INFO, format="%(message)s")

    # Handle --dry-run
    if getattr(args, "dry_run", False):
        source = (
            getattr(args, "base_url", None)
            or getattr(args, "export_path", None)
            or getattr(args, "from_json", None)
            or "(none)"
        )
        print(f"\n{'=' * 60}")
        print("DRY RUN: Confluence Extraction")
        print(f"{'=' * 60}")
        print(f"Source: {source}")
        print(f"Space key: {getattr(args, 'space_key', None) or '(N/A)'}")
        print(f"Name: {getattr(args, 'name', None) or '(auto-detect)'}")
        print(f"Max pages: {getattr(args, 'max_pages', 500)}")
        print(f"Enhance level: {getattr(args, 'enhance_level', 0)}")
        print("\n Dry run complete")
        return 0

    # Validate inputs
    has_api = getattr(args, "base_url", None) and getattr(args, "space_key", None)
    has_export = getattr(args, "export_path", None)
    has_json = getattr(args, "from_json", None)

    if not (has_api or has_export or has_json):
        parser.error(
            "Must specify one of:\n"
            " --base-url URL --space-key KEY (API mode)\n"
            " --export-path PATH (export mode)\n"
            " --from-json FILE (build from JSON)"
        )

    # Build from pre-extracted JSON
    if has_json:
        name = getattr(args, "name", None) or Path(args.from_json).stem.replace("_extracted", "")
        config: dict[str, Any] = {
            "name": name,
            "description": (
                getattr(args, "description", None) or f"Use when referencing {name} documentation"
            ),
        }
        try:
            converter = ConfluenceToSkillConverter(config)
            converter.load_extracted_data(args.from_json)
            converter.build_skill()
        except Exception as e:
            print(f"\n Error: {e}", file=sys.stderr)
            sys.exit(1)
        return 0

    # Determine name
    if not getattr(args, "name", None):
        if has_api:
            args.name = args.space_key.lower()
        elif has_export:
            args.name = Path(args.export_path).name
        else:
            args.name = "confluence-skill"

    # Build config
    config = {
        "name": args.name,
        "base_url": getattr(args, "base_url", "") or "",
        "space_key": getattr(args, "space_key", "") or "",
        "export_path": getattr(args, "export_path", "") or "",
        "username": getattr(args, "username", "") or "",
        "token": getattr(args, "token", "") or "",
        "max_pages": getattr(args, "max_pages", 500),
    }
    if getattr(args, "description", None):
        config["description"] = args.description

    # Create converter and run
    try:
        converter = ConfluenceToSkillConverter(config)

        if not converter.extract_confluence():
            print("\n Confluence extraction failed", file=sys.stderr)
            sys.exit(1)

        converter.build_skill()

        # Enhancement workflow integration
        from skill_seekers.cli.workflow_runner import run_workflows

        workflow_executed, workflow_names = run_workflows(args)
        workflow_name = ", ".join(workflow_names) if workflow_names else None

        # Traditional enhancement (complements workflow system)
        if getattr(args, "enhance_level", 0) > 0:
            api_key = getattr(args, "api_key", None) or os.environ.get("ANTHROPIC_API_KEY")
            mode = "API" if api_key else "LOCAL"

            print("\n" + "=" * 80)
            print(f" AI Enhancement ({mode} mode, level {args.enhance_level})")
            print("=" * 80)
            if workflow_executed:
                print(f" Running after workflow: {workflow_name}")
                print(
                    " (Workflow provides specialised analysis,"
                    " enhancement provides general improvements)"
                )
                print("")

            skill_dir = converter.skill_dir
            if api_key:
                try:
                    from skill_seekers.cli.enhance_skill import enhance_skill_md

                    enhance_skill_md(skill_dir, api_key)
                    print(" API enhancement complete!")
                except ImportError:
                    print(" API enhancement not available. Falling back to LOCAL mode...")
                    from skill_seekers.cli.enhance_skill_local import (
                        LocalSkillEnhancer,
                    )

                    agent = getattr(args, "agent", None) if args else None
                    agent_cmd = getattr(args, "agent_cmd", None) if args else None
                    enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                    enhancer.run(headless=True)
                    print(" Local enhancement complete!")
            else:
                from skill_seekers.cli.enhance_skill_local import (
                    LocalSkillEnhancer,
                )

                agent = getattr(args, "agent", None) if args else None
                agent_cmd = getattr(args, "agent_cmd", None) if args else None
                enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                enhancer.run(headless=True)
                print(" Local enhancement complete!")

    except (ValueError, RuntimeError, FileNotFoundError) as e:
        print(f"\n Error: {e}", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"\n Unexpected error during Confluence processing: {e}", file=sys.stderr)
        import traceback

        traceback.print_exc()
        sys.exit(1)

    return 0


if __name__ == "__main__":
    sys.exit(main())

@@ -1,19 +1,22 @@
"""Unified create command - single entry point for skill creation.

Auto-detects source type (web, GitHub, local, PDF, config) and routes
to appropriate scraper while maintaining full backward compatibility.
to appropriate converter via get_converter().
"""

import sys
import logging
import argparse
from typing import Any

from skill_seekers.cli.source_detector import SourceDetector, SourceInfo
from skill_seekers.cli.execution_context import ExecutionContext
from skill_seekers.cli.skill_converter import get_converter
from skill_seekers.cli.arguments.create import (
    get_compatible_arguments,
    get_create_defaults,
    get_universal_argument_names,
)
from skill_seekers.cli.arguments.common import DEFAULT_CHUNK_TOKENS, DEFAULT_CHUNK_OVERLAP_TOKENS

logger = logging.getLogger(__name__)

@@ -21,14 +24,20 @@ logger = logging.getLogger(__name__)
class CreateCommand:
    """Unified create command implementation."""

    def __init__(self, args: argparse.Namespace):
    def __init__(self, args: argparse.Namespace, parser_defaults: dict[str, Any] | None = None):
        """Initialize create command.

        Args:
            args: Parsed command-line arguments
            parser_defaults: Default values from the argument parser. Used by
                _is_explicitly_set() to detect which args the user actually
                provided on the command line vs. which are just defaults.
        """
        self.args = args
        self.source_info: SourceInfo | None = None
        self._parser_defaults = (
            parser_defaults if parser_defaults is not None else get_create_defaults()
        )

    def execute(self) -> int:
        """Execute the create command.

@@ -52,12 +61,36 @@ class CreateCommand:
            logger.error(f"Source validation failed: {e}")
            return 1

        # 3. Validate and warn about incompatible arguments
        # 3. Initialize ExecutionContext with source info
        # This provides a single source of truth for all configuration
        # Resolve config path from args or source detection
        config_path = getattr(self.args, "config", None) or (
            self.source_info.parsed.get("config_path") if self.source_info else None
        )
        ExecutionContext.initialize(
            args=self.args,
            config_path=config_path,
            source_info=self.source_info,
        )

        # 4. Validate and warn about incompatible arguments
        self._validate_arguments()

        # 4. Route to appropriate scraper
        logger.info(f"Routing to {self.source_info.type} scraper...")
        return self._route_to_scraper()
        # 5. Route to appropriate converter
        logger.info(f"Routing to {self.source_info.type} converter...")
        result = self._route_to_scraper()
        if result != 0:
            return result

        # 6. Centralized enhancement (runs after converter, not inside each scraper)
        ctx = ExecutionContext.get()
        if ctx.enhancement.enabled and ctx.enhancement.level > 0:
            self._run_enhancement(ctx)

        # 7. Centralized workflows
        self._run_workflows()

        return 0
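The real `ExecutionContext` lives in `skill_seekers.cli.execution_context`; the `initialize()`/`get()` calls above follow a lock-guarded singleton pattern. A minimal sketch of that pattern only (names and fields simplified, not the actual implementation):

```python
import threading


class Context:
    """Process-wide singleton holding resolved configuration."""
    _instance = None
    _lock = threading.Lock()

    def __init__(self, data: dict):
        self.data = data

    @classmethod
    def initialize(cls, data: dict) -> "Context":
        # First call wins; later calls return the existing instance
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls(data)
            return cls._instance

    @classmethod
    def get(cls) -> "Context":
        # Guard the read too, matching the review fix for get()
        with cls._lock:
            if cls._instance is None:
                raise RuntimeError("Context.initialize() must be called first")
            return cls._instance


ctx = Context.initialize({"mode": "local"})
```

The second `initialize()` returning the stale instance is exactly the behaviour the review flagged, which is why the real class logs a debug message on re-initialization.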

    def _validate_arguments(self) -> None:
        """Validate arguments and warn about incompatible ones."""

@@ -86,313 +119,338 @@ class CreateCommand:
                f"{self.source_info.type} sources and will be ignored"
            )

    def _is_explicitly_set(self, arg_name: str, arg_value: any) -> bool:
    def _is_explicitly_set(self, arg_name: str, arg_value: Any) -> bool:
        """Check if an argument was explicitly set by the user.

        Compares the current value against the parser's registered default.
        This avoids hardcoding default values that can drift out of sync.

        Args:
            arg_name: Argument name
            arg_value: Argument value
            arg_name: Argument destination name
            arg_value: Current argument value

        Returns:
            True if user explicitly set this argument
        """
        # Boolean flags - True means it was set
        if isinstance(arg_value, bool):
            return arg_value

        # None means not set
        if arg_value is None:
            return False

        # Check against common defaults — args with these values were NOT
        # explicitly set by the user and should not be forwarded.
        defaults = {
            "max_issues": 100,
            "chunk_tokens": DEFAULT_CHUNK_TOKENS,
            "chunk_overlap_tokens": DEFAULT_CHUNK_OVERLAP_TOKENS,
            "output": None,
            "doc_version": "",
            "video_languages": "en",
            "whisper_model": "base",
            "platform": "slack",
            "visual_interval": 0.7,
            "visual_min_gap": 0.5,
            "visual_similarity": 3.0,
        }
        # Boolean flags: True means explicitly set (store_true defaults to False)
        if isinstance(arg_value, bool):
            return arg_value

        if arg_name in defaults:
            return arg_value != defaults[arg_name]
        # Compare against parser default if available
        if arg_name in self._parser_defaults:
            return arg_value != self._parser_defaults[arg_name]

        # Any other non-None value means it was set
        # No registered default and non-None → user must have set it
        return True
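Comparing a parsed value against the parser's registered default, as `_is_explicitly_set` now does, can be sketched standalone with argparse's public `get_default()` (flag names here are illustrative, not the real argument set):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--max-pages", type=int, default=500)
parser.add_argument("--fresh", action="store_true")


def is_explicitly_set(parser: argparse.ArgumentParser,
                      args: argparse.Namespace, dest: str) -> bool:
    value = getattr(args, dest)
    if isinstance(value, bool):   # store_true flags: True means the flag was passed
        return value
    if value is None:             # None means not set
        return False
    return value != parser.get_default(dest)


args = parser.parse_args(["--max-pages", "200"])
```

One inherent limitation, shared with the real method: a user who explicitly passes the default value (`--max-pages 500`) is indistinguishable from one who passed nothing.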

    def _route_to_scraper(self) -> int:
        """Route to appropriate scraper based on source type.
        """Route to appropriate converter based on source type.

        Builds a config dict from ExecutionContext + source_info, then
        calls converter.run() directly — no sys.argv swap needed.

        Returns:
            Exit code from scraper
            Exit code from converter
        """
        if self.source_info.type == "web":
            return self._route_web()
        elif self.source_info.type == "github":
            return self._route_github()
        elif self.source_info.type == "local":
            return self._route_local()
        elif self.source_info.type == "pdf":
            return self._route_pdf()
        elif self.source_info.type == "word":
            return self._route_word()
        elif self.source_info.type == "epub":
            return self._route_epub()
        elif self.source_info.type == "video":
            return self._route_video()
        elif self.source_info.type == "config":
            return self._route_config()
        elif self.source_info.type == "jupyter":
            return self._route_generic("jupyter_scraper", "--notebook")
        elif self.source_info.type == "html":
            return self._route_generic("html_scraper", "--html-path")
        elif self.source_info.type == "openapi":
            return self._route_generic("openapi_scraper", "--spec")
        elif self.source_info.type == "asciidoc":
            return self._route_generic("asciidoc_scraper", "--asciidoc-path")
        elif self.source_info.type == "pptx":
            return self._route_generic("pptx_scraper", "--pptx")
        elif self.source_info.type == "rss":
            return self._route_generic("rss_scraper", "--feed-path")
        elif self.source_info.type == "manpage":
            return self._route_generic("man_scraper", "--man-path")
        elif self.source_info.type == "confluence":
            return self._route_generic("confluence_scraper", "--export-path")
        elif self.source_info.type == "notion":
            return self._route_generic("notion_scraper", "--export-path")
        elif self.source_info.type == "chat":
            return self._route_generic("chat_scraper", "--export-path")
        else:
            logger.error(f"Unknown source type: {self.source_info.type}")
            return 1
        source_type = self.source_info.type
        ctx = ExecutionContext.get()

    # ── Dynamic argument forwarding ──────────────────────────────────────
    #
    # Instead of manually checking each flag in every _route_*() method,
    # _build_argv() dynamically iterates vars(self.args) and forwards all
    # explicitly-set arguments. This is the same pattern used by
    # main.py::_reconstruct_argv() and eliminates ~40 missing-flag gaps.
        # UnifiedScraper is special — it takes config_path, not a config dict
        if source_type == "config":
            from skill_seekers.cli.unified_scraper import UnifiedScraper

    # Dest names that differ from their CLI flag (dest → flag)
    _DEST_TO_FLAG = {
        "async_mode": "--async",
        "video_url": "--url",
        "video_playlist": "--playlist",
        "video_languages": "--languages",
        "skip_config": "--skip-config-patterns",
    }
            config_path = self.source_info.parsed["config_path"]
            merge_mode = getattr(self.args, "merge_mode", None)
            converter = UnifiedScraper(config_path, merge_mode=merge_mode)
            return converter.run()

    # Internal args that should never be forwarded to sub-scrapers.
    # video_url/video_playlist/video_file are handled as positionals by _route_video().
    # config is forwarded manually only by routes that need it (web, github).
    _SKIP_ARGS = frozenset(
        {
            "source",
            "func",
            "subcommand",
            "command",
            "config",
            "video_url",
            "video_playlist",
            "video_file",
        }
    )
        config = self._build_config(source_type, ctx)
        converter = get_converter(source_type, config)
        return converter.run()

    def _build_argv(
        self,
        module_name: str,
        positional_args: list[str],
        allowlist: frozenset[str] | None = None,
    ) -> list[str]:
        """Build argv dynamically by forwarding all explicitly-set arguments.
    def _build_config(self, source_type: str, ctx: ExecutionContext) -> dict[str, Any]:
        """Build a config dict for the converter from ExecutionContext.

        Uses the same pattern as main.py::_reconstruct_argv().
        Replaces manual per-flag checking in _route_*() and _add_common_args().
        Each converter reads specific keys from the config dict passed to
        its __init__. This method constructs that dict from the centralized
        ExecutionContext, which already holds all CLI args + config file values.

        Args:
            module_name: Scraper module name (e.g., "doc_scraper")
            positional_args: Positional arguments to prepend (e.g., [url] or ["--repo", repo])
            allowlist: If provided, ONLY forward args in this set (overrides _SKIP_ARGS).
                Used for targets with strict arg sets like unified_scraper.
            source_type: Detected source type (web, github, pdf, etc.)
            ctx: Initialized ExecutionContext

        Returns:
            Complete argv list for the scraper
            Config dict suitable for the converter's __init__.
        """
        argv = [module_name] + positional_args

        # Auto-add suggested name if user didn't provide one (skip for allowlisted targets)
        if not allowlist and not self.args.name and self.source_info:
            argv.extend(["--name", self.source_info.suggested_name])

        for key, value in vars(self.args).items():
            # If allowlist provided, only forward args in the allowlist
            if allowlist is not None:
                if key not in allowlist:
                    continue
            elif key in self._SKIP_ARGS or key.startswith("_help_"):
                continue
            if not self._is_explicitly_set(key, value):
                continue

            # Use translation map for mismatched dest→flag names, else derive from key
            if key in self._DEST_TO_FLAG:
                arg_flag = self._DEST_TO_FLAG[key]
            else:
                arg_flag = f"--{key.replace('_', '-')}"

            if isinstance(value, bool):
                if value:
                    argv.append(arg_flag)
            elif isinstance(value, list):
                for item in value:
                    argv.extend([arg_flag, str(item)])
            elif value is not None:
                argv.extend([arg_flag, str(value)])

        return argv
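The dest→flag derivation and value forwarding in `_build_argv` can be exercised in isolation. A sketch under the same conventions (the translation map and the argument dict are illustrative; the real method reads `vars(self.args)` and consults `_is_explicitly_set`):

```python
# Dest names whose CLI flag differs from a plain --dest-with-dashes spelling
DEST_TO_FLAG = {"async_mode": "--async"}


def forward(explicit_args: dict) -> list[str]:
    argv: list[str] = []
    for key, value in explicit_args.items():
        flag = DEST_TO_FLAG.get(key, f"--{key.replace('_', '-')}")
        if isinstance(value, bool):
            if value:
                argv.append(flag)              # store_true flags carry no value
        elif isinstance(value, list):
            for item in value:                 # list values become repeated flags
                argv.extend([flag, str(item)])
        elif value is not None:
            argv.extend([flag, str(value)])    # None means "not set", skip
    return argv


result = forward({"max_pages": 200, "async_mode": True,
                  "var": ["a=1", "b=2"], "output": None})
```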

    def _call_module(self, module, argv: list[str]) -> int:
        """Call a scraper module with the given argv.

        Swaps sys.argv, calls module.main(), restores sys.argv.
        """
        logger.debug(f"Calling {argv[0]} with argv: {argv}")
        original_argv = sys.argv
        try:
            sys.argv = argv
            result = module.main()
            if result is None:
                logger.warning("Module returned None exit code, treating as success")
                return 0
            return result
        finally:
            sys.argv = original_argv
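The save/swap/restore around `module.main()` is a standard pattern for driving an argparse-based entry point programmatically; the `finally` guarantees `sys.argv` is restored even when the callee raises. A self-contained sketch with a stand-in main (the real callees parse `sys.argv` via argparse):

```python
import sys


def call_with_argv(func, argv: list[str]) -> int:
    """Run func with sys.argv temporarily replaced, restoring it afterwards."""
    original = sys.argv
    try:
        sys.argv = argv
        result = func()
        return 0 if result is None else result  # None from main() counts as success
    finally:
        sys.argv = original


def fake_main() -> int:
    # Stand-in for a scraper's main(); counts its own CLI arguments
    return len(sys.argv) - 1


saved = list(sys.argv)
code = call_with_argv(fake_main, ["tool", "--name", "demo"])
```

Note this pattern is not thread-safe (`sys.argv` is process-global), which is acceptable here because routing happens once per process.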

    def _route_web(self) -> int:
        """Route to web documentation scraper (doc_scraper.py)."""
        from skill_seekers.cli import doc_scraper

        url = self.source_info.parsed.get("url", self.source_info.raw_source)
        argv = self._build_argv("doc_scraper", [url])

        # Forward config if set (not in _build_argv since it's in SKIP_ARGS
        # to avoid double-forwarding for config-type sources)
        if self.args.config:
            argv.extend(["--config", self.args.config])

        return self._call_module(doc_scraper, argv)

    def _route_github(self) -> int:
        """Route to GitHub repository scraper (github_scraper.py)."""
        from skill_seekers.cli import github_scraper

        repo = self.source_info.parsed.get("repo", self.source_info.raw_source)
        argv = self._build_argv("github_scraper", ["--repo", repo])

        if self.args.config:
            argv.extend(["--config", self.args.config])

        return self._call_module(github_scraper, argv)

    def _route_local(self) -> int:
        """Route to local codebase analyzer (codebase_scraper.py)."""
        from skill_seekers.cli import codebase_scraper

        directory = self.source_info.parsed.get("directory", self.source_info.raw_source)
        argv = self._build_argv("codebase_scraper", ["--directory", directory])
        return self._call_module(codebase_scraper, argv)

    def _route_pdf(self) -> int:
        """Route to PDF scraper (pdf_scraper.py)."""
        from skill_seekers.cli import pdf_scraper

        file_path = self.source_info.parsed.get("file_path", self.source_info.raw_source)
        argv = self._build_argv("pdf_scraper", ["--pdf", file_path])
        return self._call_module(pdf_scraper, argv)

    def _route_word(self) -> int:
        """Route to Word document scraper (word_scraper.py)."""
        from skill_seekers.cli import word_scraper

        file_path = self.source_info.parsed.get("file_path", self.source_info.raw_source)
        argv = self._build_argv("word_scraper", ["--docx", file_path])
        return self._call_module(word_scraper, argv)

    def _route_epub(self) -> int:
        """Route to EPUB scraper (epub_scraper.py)."""
        from skill_seekers.cli import epub_scraper

        file_path = self.source_info.parsed.get("file_path", self.source_info.raw_source)
        argv = self._build_argv("epub_scraper", ["--epub", file_path])
        return self._call_module(epub_scraper, argv)

    def _route_video(self) -> int:
        """Route to video scraper (video_scraper.py)."""
        from skill_seekers.cli import video_scraper

        parsed = self.source_info.parsed
        if parsed.get("source_kind") == "file":
            positional = ["--video-file", parsed["file_path"]]
        elif parsed.get("url"):
            url = parsed["url"]
            flag = "--playlist" if "playlist" in url.lower() else "--url"
            positional = [flag, url]
        else:
            positional = []
        name = ctx.output.name or self.source_info.suggested_name

        argv = self._build_argv("video_scraper", positional)
        return self._call_module(video_scraper, argv)

    # Args accepted by unified_scraper (allowlist for config route)
    _UNIFIED_SCRAPER_ARGS = frozenset(
        {
            "merge_mode",
            "skip_codebase_analysis",
            "fresh",
            "dry_run",
            "enhance_workflow",
            "enhance_stage",
            "var",
            "workflow_dry_run",
            "api_key",
            "enhance_level",
            "agent",
            "agent_cmd",
        # Common keys shared by all converters
        config: dict[str, Any] = {
            "name": name,
            "description": getattr(self.args, "description", None)
            or f"Use when working with {name}",
        }
    )

    def _route_config(self) -> int:
        """Route to unified scraper for config files (unified_scraper.py)."""
        from skill_seekers.cli import unified_scraper
        if source_type == "web":
            url = parsed.get("url", parsed.get("base_url", self.source_info.raw_input))
            config.update(
                {
                    "base_url": url,
                    "doc_version": ctx.output.doc_version,
                    "max_pages": ctx.scraping.max_pages,
                    "rate_limit": ctx.scraping.rate_limit,
                    "browser": ctx.scraping.browser,
                    "browser_wait_until": ctx.scraping.browser_wait_until,
                    "browser_extra_wait": ctx.scraping.browser_extra_wait,
                    "workers": ctx.scraping.workers,
                    "async_mode": ctx.scraping.async_mode,
                    "resume": ctx.scraping.resume,
                    "fresh": ctx.scraping.fresh,
                    "skip_scrape": ctx.scraping.skip_scrape,
                    "selectors": {"title": "title", "code_blocks": "pre code"},
                    "url_patterns": {"include": [], "exclude": []},
                }
            )
            # Load from config file if provided
            config_path = getattr(self.args, "config", None)
            if config_path:
                self._merge_json_config(config, config_path)

        config_path = self.source_info.parsed["config_path"]
        argv = self._build_argv(
            "unified_scraper",
            ["--config", config_path],
            allowlist=self._UNIFIED_SCRAPER_ARGS,
        )
        return self._call_module(unified_scraper, argv)
        elif source_type == "github":
            repo = parsed.get("repo", self.source_info.raw_input)
            config.update(
                {
                    "repo": repo,
                    "local_repo_path": getattr(self.args, "local_repo_path", None),
                    "include_issues": getattr(self.args, "include_issues", True),
                    "max_issues": getattr(self.args, "max_issues", 100),
                    "include_changelog": getattr(self.args, "include_changelog", True),
                    "include_releases": getattr(self.args, "include_releases", True),
                    "include_code": getattr(self.args, "include_code", False),
                }
            )
            config_path = getattr(self.args, "config", None)
            if config_path:
                self._merge_json_config(config, config_path)

    def _route_generic(self, module_name: str, file_flag: str) -> int:
        """Generic routing for new source types.
        elif source_type == "local":
            directory = parsed.get("directory", self.source_info.raw_input)
            config.update(
                {
                    "directory": directory,
                    "depth": ctx.analysis.depth,
                    "output_dir": ctx.output.output_dir or f"output/{name}",
                    "languages": getattr(self.args, "languages", None),
                    "file_patterns": ctx.analysis.file_patterns,
                    "detect_patterns": not ctx.analysis.skip_patterns,
                    "extract_test_examples": not ctx.analysis.skip_test_examples,
                    "build_how_to_guides": not ctx.analysis.skip_how_to_guides,
                    "extract_config_patterns": not ctx.analysis.skip_config_patterns,
                    "build_api_reference": not ctx.analysis.skip_api_reference,
                    "build_dependency_graph": not ctx.analysis.skip_dependency_graph,
                    "extract_docs": not ctx.analysis.skip_docs,
                    "extract_comments": not ctx.analysis.no_comments,
                    "enhance_level": ctx.enhancement.level if ctx.enhancement.enabled else 0,
                    "skill_name": name,
                    "doc_version": ctx.output.doc_version,
                }
            )

        All new source types (jupyter, html, openapi, asciidoc, pptx, rss,
        manpage, confluence, notion, chat) use dynamic argument forwarding.
        elif source_type == "pdf":
            config.update(
                {
                    "pdf_path": parsed.get("file_path", self.source_info.raw_input),
                    "extract_options": {
                        "chunk_size": 10,
                        "min_quality": 5.0,
                        "extract_images": True,
                        "min_image_size": 100,
                    },
                }
            )

        elif source_type == "word":
            config["docx_path"] = parsed.get("file_path", self.source_info.raw_input)

        elif source_type == "epub":
            config["epub_path"] = parsed.get("file_path", self.source_info.raw_input)

        elif source_type == "video":
            config.update(
                {
                    "languages": getattr(self.args, "video_languages", "en"),
                    "visual": getattr(self.args, "visual", False),
                    "whisper_model": getattr(self.args, "whisper_model", "base"),
                    "visual_interval": getattr(self.args, "visual_interval", 0.7),
                    "visual_min_gap": getattr(self.args, "visual_min_gap", 0.5),
                    "visual_similarity": getattr(self.args, "visual_similarity", 3.0),
                }
            )
            # Video source can be URL, playlist, or file
            if parsed.get("source_kind") == "file":
                config["video_file"] = parsed["file_path"]
            elif parsed.get("url"):
                url = parsed["url"]
                if "playlist" in url.lower():
                    config["playlist"] = url
                else:
                    config["url"] = url
            else:
                # Fallback: treat raw input as URL
                config["url"] = self.source_info.raw_input

        elif source_type == "jupyter":
            config["notebook_path"] = parsed.get("file_path", self.source_info.raw_input)

        elif source_type == "html":
            config["html_path"] = parsed.get("file_path", self.source_info.raw_input)

        elif source_type == "openapi":
            file_path = parsed.get("file_path", self.source_info.raw_input)
            if file_path.startswith(("http://", "https://")):
                config["spec_url"] = file_path
            else:
                config["spec_path"] = file_path

        elif source_type == "asciidoc":
            config["asciidoc_path"] = parsed.get("file_path", self.source_info.raw_input)

        elif source_type == "pptx":
            config["pptx_path"] = parsed.get("file_path", self.source_info.raw_input)

        elif source_type == "rss":
            file_path = parsed.get("file_path", self.source_info.raw_input)
            if file_path.startswith(("http://", "https://")):
                config["feed_url"] = file_path
            else:
                config["feed_path"] = file_path
            config["follow_links"] = getattr(self.args, "follow_links", True)
            config["max_articles"] = getattr(self.args, "max_articles", 50)

        elif source_type == "manpage":
            file_path = parsed.get("file_path", "")
            if file_path:
                config["man_path"] = file_path
            man_names = parsed.get("man_names", [])
            if man_names:
                config["man_names"] = man_names

        elif source_type == "confluence":
            config.update(
                {
                    "export_path": parsed.get("file_path", ""),
                    "base_url": getattr(self.args, "confluence_url", ""),
                    "space_key": getattr(self.args, "space_key", ""),
                    "username": getattr(self.args, "username", ""),
                    "token": getattr(self.args, "token", ""),
                    "max_pages": getattr(self.args, "max_pages", 500),
                }
            )

        elif source_type == "notion":
            config.update(
                {
                    "export_path": parsed.get("file_path"),
                    "database_id": getattr(self.args, "database_id", None),
                    "page_id": getattr(self.args, "page_id", None),
                    "token": getattr(self.args, "notion_token", None),
                    "max_pages": getattr(self.args, "max_pages", 100),
                }
            )

        elif source_type == "chat":
            config.update(
                {
                    "export_path": parsed.get("file_path", ""),
                    "platform": getattr(self.args, "platform", "slack"),
                    "token": getattr(self.args, "token", ""),
                    "channel": getattr(self.args, "channel", ""),
                    "max_messages": getattr(self.args, "max_messages", 1000),
                }
            )

        return config

    @staticmethod
    def _merge_json_config(config: dict[str, Any], config_path: str) -> None:
        """Merge a JSON config file into the config dict.

        Config file values are used as defaults — CLI args (already in config) take precedence.
        """
        import importlib
        import json

        module = importlib.import_module(f"skill_seekers.cli.{module_name}")
        try:
            with open(config_path, encoding="utf-8") as f:
                file_config = json.load(f)
            # Only set keys that aren't already in config
            for key, value in file_config.items():
                if key not in config:
                    config[key] = value
        except (FileNotFoundError, json.JSONDecodeError) as e:
            logger.warning(f"Could not load config file {config_path}: {e}")
|
||||
|
||||
file_path = self.source_info.parsed.get("file_path", "")
|
||||
positional = [file_flag, file_path] if file_path else []
|
||||
argv = self._build_argv(module_name, positional)
|
||||
return self._call_module(module, argv)
|
||||
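The merge helper above implements defaults-style precedence: values already in the dict (placed there by CLI flags) win, and the file only fills gaps. A small self-contained sketch of the same behavior using `dict.setdefault`; the file layout and keys are illustrative:

```python
import json
import tempfile


def merge_defaults(config: dict, config_path: str) -> None:
    """File values are defaults; keys already present (CLI args) win."""
    try:
        with open(config_path, encoding="utf-8") as f:
            file_config = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return  # mirror the warn-and-continue behavior of the helper
    for key, value in file_config.items():
        config.setdefault(key, value)


# Write a throwaway config file to merge from.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"max_pages": 500, "rate_limit": 1.0}, tmp)
    path = tmp.name

config = {"max_pages": 100}  # pretend this came from a CLI flag
merge_defaults(config, path)
```

After the merge, `max_pages` keeps the CLI value and `rate_limit` is filled in from the file.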
    def _run_enhancement(self, ctx: ExecutionContext) -> None:
        """Run centralized AI enhancement after converter completes."""
        from pathlib import Path

        name = ctx.output.name or (
            self.source_info.suggested_name if self.source_info else "unnamed"
        )
        skill_dir = ctx.output.output_dir or f"output/{name}"

        logger.info("\n" + "=" * 60)
        logger.info(f"Enhancing SKILL.md (level {ctx.enhancement.level})")
        logger.info("=" * 60)

        try:
            from skill_seekers.cli.agent_client import AgentClient

            client = AgentClient(
                mode=ctx.enhancement.mode,
                agent=ctx.enhancement.agent,
                api_key=ctx.enhancement.api_key,
            )

            if client.mode == "api" and client.client:
                from skill_seekers.cli.enhance_skill import enhance_skill_md

                api_key = ctx.enhancement.api_key or client.api_key
                if api_key:
                    enhance_skill_md(skill_dir, api_key)
                    logger.info("API enhancement complete!")
                else:
                    logger.warning("No API key available for enhancement")
            else:
                from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer

                enhancer = LocalSkillEnhancer(
                    Path(skill_dir),
                    agent=ctx.enhancement.agent,
                    agent_cmd=ctx.enhancement.agent_cmd,
                )
                success = enhancer.run(headless=True, timeout=ctx.enhancement.timeout)
                if success:
                    agent_name = ctx.enhancement.agent or "claude"
                    logger.info(f"Local enhancement complete! (via {agent_name})")
                else:
                    logger.warning("Local enhancement did not complete")
        except Exception as e:
            logger.warning(f"Enhancement failed: {e}")

    def _run_workflows(self) -> None:
        """Run enhancement workflows if configured."""
        try:
            from skill_seekers.cli.workflow_runner import run_workflows

            run_workflows(self.args)
        except ImportError:
            pass
        except Exception as e:
            logger.warning(f"Workflow execution failed: {e}")


def main() -> int:
@@ -492,97 +550,28 @@ Common Workflows:
    args = parser.parse_args()

    # Handle source-specific help modes
    if args._help_web:
        # Recreate parser with web-specific arguments
        parser_web = argparse.ArgumentParser(
            prog="skill-seekers create",
            description="Create skill from web documentation",
            formatter_class=argparse.RawDescriptionHelpFormatter,
        )
        add_create_arguments(parser_web, mode="web")
        parser_web.print_help()
        return 0
    elif args._help_github:
        parser_github = argparse.ArgumentParser(
            prog="skill-seekers create",
            description="Create skill from GitHub repository",
            formatter_class=argparse.RawDescriptionHelpFormatter,
        )
        add_create_arguments(parser_github, mode="github")
        parser_github.print_help()
        return 0
    elif args._help_local:
        parser_local = argparse.ArgumentParser(
            prog="skill-seekers create",
            description="Create skill from local codebase",
            formatter_class=argparse.RawDescriptionHelpFormatter,
        )
        add_create_arguments(parser_local, mode="local")
        parser_local.print_help()
        return 0
    elif args._help_pdf:
        parser_pdf = argparse.ArgumentParser(
            prog="skill-seekers create",
            description="Create skill from PDF file",
            formatter_class=argparse.RawDescriptionHelpFormatter,
        )
        add_create_arguments(parser_pdf, mode="pdf")
        parser_pdf.print_help()
        return 0
    elif args._help_word:
        parser_word = argparse.ArgumentParser(
            prog="skill-seekers create",
            description="Create skill from Word document (.docx)",
            formatter_class=argparse.RawDescriptionHelpFormatter,
        )
        add_create_arguments(parser_word, mode="word")
        parser_word.print_help()
        return 0
    elif args._help_epub:
        parser_epub = argparse.ArgumentParser(
            prog="skill-seekers create",
            description="Create skill from EPUB e-book (.epub)",
            formatter_class=argparse.RawDescriptionHelpFormatter,
        )
        add_create_arguments(parser_epub, mode="epub")
        parser_epub.print_help()
        return 0
    elif args._help_video:
        parser_video = argparse.ArgumentParser(
            prog="skill-seekers create",
            description="Create skill from video (YouTube, Vimeo, local files)",
            formatter_class=argparse.RawDescriptionHelpFormatter,
        )
        add_create_arguments(parser_video, mode="video")
        parser_video.print_help()
        return 0
    elif args._help_config:
        parser_config = argparse.ArgumentParser(
            prog="skill-seekers create",
            description="Create skill from multi-source config file (unified scraper)",
            formatter_class=argparse.RawDescriptionHelpFormatter,
        )
        add_create_arguments(parser_config, mode="config")
        parser_config.print_help()
        return 0
    elif args._help_advanced:
        parser_advanced = argparse.ArgumentParser(
            prog="skill-seekers create",
            description="Create skill - advanced options",
            formatter_class=argparse.RawDescriptionHelpFormatter,
        )
        add_create_arguments(parser_advanced, mode="advanced")
        parser_advanced.print_help()
        return 0
    elif args._help_all:
        parser_all = argparse.ArgumentParser(
            prog="skill-seekers create",
            description="Create skill - all options",
            formatter_class=argparse.RawDescriptionHelpFormatter,
        )
        add_create_arguments(parser_all, mode="all")
        parser_all.print_help()
        return 0
    _HELP_MODES = {
        "_help_web": ("web", "Create skill from web documentation"),
        "_help_github": ("github", "Create skill from GitHub repository"),
        "_help_local": ("local", "Create skill from local codebase"),
        "_help_pdf": ("pdf", "Create skill from PDF file"),
        "_help_word": ("word", "Create skill from Word document (.docx)"),
        "_help_epub": ("epub", "Create skill from EPUB e-book (.epub)"),
        "_help_video": ("video", "Create skill from video (YouTube, Vimeo, local files)"),
        "_help_config": ("config", "Create skill from multi-source config file (unified scraper)"),
        "_help_advanced": ("advanced", "Create skill - advanced options"),
        "_help_all": ("all", "Create skill - all options"),
    }
    for attr, (mode, description) in _HELP_MODES.items():
        if getattr(args, attr, False):
            help_parser = argparse.ArgumentParser(
                prog="skill-seekers create",
                description=description,
                formatter_class=argparse.RawDescriptionHelpFormatter,
            )
            add_create_arguments(help_parser, mode=mode)
            help_parser.print_help()
            return 0

    # Setup logging
    log_level = logging.DEBUG if args.verbose else (logging.WARNING if args.quiet else logging.INFO)
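The refactor above replaces ten near-identical `elif` blocks with a `_HELP_MODES` table and a single loop. The same table-driven dispatch can be sketched in isolation; the attribute names here are hypothetical stand-ins for the real `_help_*` flags:

```python
import argparse

HELP_MODES = {
    "help_web": ("web", "Create skill from web documentation"),
    "help_pdf": ("pdf", "Create skill from PDF file"),
}


def pick_help_mode(args: argparse.Namespace):
    # One loop over the table replaces a chain of elif blocks; adding a
    # new source type becomes a one-line dict entry.
    for attr, (mode, description) in HELP_MODES.items():
        if getattr(args, attr, False):
            return mode, description
    return None


args = argparse.Namespace(help_web=False, help_pdf=True)
print(pick_help_mode(args))  # -> ("pdf", "Create skill from PDF file")
```

In the real code the loop also builds an `ArgumentParser` and prints help; the selection logic is the part worth isolating.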
570
src/skill_seekers/cli/doc_scraper.py
Executable file → Normal file
@@ -46,7 +46,7 @@ from skill_seekers.cli.language_detector import LanguageDetector
from skill_seekers.cli.llms_txt_detector import LlmsTxtDetector
from skill_seekers.cli.llms_txt_downloader import LlmsTxtDownloader
from skill_seekers.cli.llms_txt_parser import LlmsTxtParser
from skill_seekers.cli.arguments.scrape import add_scrape_arguments
from skill_seekers.cli.skill_converter import SkillConverter
from skill_seekers.cli.utils import sanitize_url, setup_logging

# Configure logging
@@ -152,8 +152,11 @@ def infer_description_from_docs(
)


class DocToSkillConverter:
class DocToSkillConverter(SkillConverter):
    SOURCE_TYPE = "web"

    def __init__(self, config: dict[str, Any], dry_run: bool = False, resume: bool = False) -> None:
        super().__init__(config)
        self.config = config
        self.name = config["name"]
        self.base_url = config["base_url"]
@@ -1943,6 +1946,10 @@ To refresh this skill with updated documentation:

        logger.info("  ✓ index.md")

    def extract(self):
        """SkillConverter interface — delegates to scrape_all()."""
        self.scrape_all()

    def build_skill(self) -> bool:
        """Build the skill from scraped data.

@@ -2209,495 +2216,130 @@ def load_config(config_path: str) -> dict[str, Any]:
    return config


def interactive_config() -> dict[str, Any]:
    """Interactive configuration wizard for creating new configs.

    Prompts user for all required configuration fields step-by-step
    and returns a complete configuration dictionary.

    Returns:
        dict: Complete configuration dictionary with user-provided values

    Example:
        >>> config = interactive_config()
        # User enters: name=react, url=https://react.dev, etc.
        >>> config['name']
        'react'
    """
    logger.info("\n" + "=" * 60)
    logger.info("Documentation to Skill Converter")
    logger.info("=" * 60 + "\n")

    config: dict[str, Any] = {}

    # Basic info
    config["name"] = input("Skill name (e.g., 'react', 'godot'): ").strip()
    config["description"] = input("Skill description: ").strip()
    config["base_url"] = input("Base URL (e.g., https://docs.example.com/): ").strip()

    if not config["base_url"].endswith("/"):
        config["base_url"] += "/"

    # Selectors
    logger.info("\nCSS Selectors (press Enter for defaults):")
    selectors = {}
    selectors["main_content"] = (
        input("  Main content [div[role='main']]: ").strip() or "div[role='main']"
    )
    selectors["title"] = input("  Title [title]: ").strip() or "title"
    selectors["code_blocks"] = input("  Code blocks [pre code]: ").strip() or "pre code"
    config["selectors"] = selectors

    # URL patterns
    logger.info("\nURL Patterns (comma-separated, optional):")
    include = input("  Include: ").strip()
    exclude = input("  Exclude: ").strip()
    config["url_patterns"] = {
        "include": [p.strip() for p in include.split(",") if p.strip()],
        "exclude": [p.strip() for p in exclude.split(",") if p.strip()],
    }

    # Settings
    rate = input(f"\nRate limit (seconds) [{DEFAULT_RATE_LIMIT}]: ").strip()
    config["rate_limit"] = float(rate) if rate else DEFAULT_RATE_LIMIT

    max_p = input(f"Max pages [{DEFAULT_MAX_PAGES}]: ").strip()
    config["max_pages"] = int(max_p) if max_p else DEFAULT_MAX_PAGES

    return config


def check_existing_data(name: str) -> tuple[bool, int]:
    """Check if scraped data already exists for a skill.

    Args:
        name (str): Skill name to check

    Returns:
        tuple: (exists, page_count) where exists is bool and page_count is int

    Example:
        >>> exists, count = check_existing_data('react')
        >>> if exists:
        ...     print(f"Found {count} existing pages")
    """
    data_dir = f"output/{name}_data"
    if os.path.exists(data_dir) and os.path.exists(f"{data_dir}/summary.json"):
        with open(f"{data_dir}/summary.json", encoding="utf-8") as f:
            summary = json.load(f)
        return True, summary.get("total_pages", 0)
    return False, 0


def scrape_documentation(
    config: dict[str, Any],
    ctx: Any | None = None,
    verbose: bool = False,
    quiet: bool = False,
) -> int:
    """Scrape documentation using config and optional context.

    This is the main entry point for programmatic use. CLI main() is a thin
    wrapper around this function.

    Args:
        config: Configuration dictionary with required fields (name, base_url, etc.)
        ctx: Optional ExecutionContext for shared configuration
        verbose: Enable verbose logging
        quiet: Minimize logging output

    Returns:
        Exit code (0 for success, non-zero for error)
    """
    from skill_seekers.cli.execution_context import ExecutionContext

    # Setup logging
    setup_logging(verbose=verbose, quiet=quiet)

    # Use existing context if already initialized, otherwise create one
    if ctx is None:
        if ExecutionContext._initialized:
            ctx = ExecutionContext.get()
        else:
            ctx = ExecutionContext.initialize(args=argparse.Namespace(**config))

    # Build converter and execute
    try:
        converter = _run_scraping(config)
        if converter is None:
            return 1

        # Handle enhancement if enabled
        if ctx.enhancement.enabled and ctx.enhancement.level > 0:
            _run_enhancement(config, ctx, converter)

        return 0
    except Exception as e:
        logger.error(f"Scraping failed: {e}")
        return 1


def setup_argument_parser() -> argparse.ArgumentParser:
    """Setup and configure command-line argument parser.

    Creates an ArgumentParser with all CLI options for the doc scraper tool,
    including configuration, scraping, enhancement, and performance options.

    All arguments are defined in skill_seekers.cli.arguments.scrape to ensure
    consistency between the standalone scraper and unified CLI.

    Returns:
        argparse.ArgumentParser: Configured argument parser

    Example:
        >>> parser = setup_argument_parser()
        >>> args = parser.parse_args(['--config', 'configs/react.json'])
        >>> print(args.config)
        configs/react.json
    """
    parser = argparse.ArgumentParser(
        description="Convert documentation websites to AI skills",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )

    # Add all scrape arguments from shared definitions
    # This ensures the standalone scraper and unified CLI stay in sync
    add_scrape_arguments(parser)

    return parser


def get_configuration(args: argparse.Namespace) -> dict[str, Any]:
    """Load or create configuration from command-line arguments.

    Handles three configuration modes:
    1. Load from JSON file (--config)
    2. Interactive configuration wizard (--interactive or missing args)
    3. Quick mode from command-line arguments (--name, --url)

    Also applies CLI overrides for rate limiting and worker count.

    Args:
        args: Parsed command-line arguments from argparse

    Returns:
        dict: Configuration dictionary with all required fields

    Example:
        >>> args = parser.parse_args(['--name', 'react', '--url', 'https://react.dev'])
        >>> config = get_configuration(args)
        >>> print(config['name'])
        react
    """
    # Handle URL from either positional argument or --url flag
    # Positional 'url' takes precedence, then --url flag
    effective_url = getattr(args, "url", None)

    # Get base configuration
    if args.config:
        config = load_config(args.config)
    elif args.interactive or not (args.name and effective_url):
        config = interactive_config()
    else:
        config = {
            "name": args.name,
            "description": args.description or f"Use when working with {args.name}",
            "base_url": effective_url,
            "selectors": {
                "title": "title",
                "code_blocks": "pre code",
            },
            "url_patterns": {"include": [], "exclude": []},
            "rate_limit": DEFAULT_RATE_LIMIT,
            "max_pages": DEFAULT_MAX_PAGES,
        }

    # Apply CLI override for doc_version (works for all config modes)
    cli_doc_version = getattr(args, "doc_version", "")
    if cli_doc_version:
        config["doc_version"] = cli_doc_version

    # Apply CLI overrides for rate limiting
    if args.no_rate_limit:
        config["rate_limit"] = 0
        logger.info("⚡ Rate limiting disabled")
    elif args.rate_limit is not None:
        config["rate_limit"] = args.rate_limit
        if args.rate_limit == 0:
            logger.info("⚡ Rate limiting disabled")
        else:
            logger.info("⚡ Rate limit override: %ss per page", args.rate_limit)

    # Apply CLI overrides for worker count
    if args.workers:
        # Validate workers count
        if args.workers < 1:
            logger.error("❌ Error: --workers must be at least 1 (got %d)", args.workers)
            logger.error("   Suggestion: Use --workers 1 (default) or omit the flag")
            sys.exit(1)
        if args.workers > 10:
            logger.warning("⚠️  Warning: --workers capped at 10 (requested %d)", args.workers)
            args.workers = 10
        config["workers"] = args.workers
        if args.workers > 1:
            logger.info("🚀 Parallel scraping enabled: %d workers", args.workers)

    # Apply CLI override for async mode
    if args.async_mode:
        config["async_mode"] = True
        if config.get("workers", 1) > 1:
            logger.info("⚡ Async mode enabled (2-3x faster than threads)")
        else:
            logger.warning(
                "⚠️  Async mode enabled but workers=1. Consider using --workers 4 for better performance"
            )

    # Apply CLI override for browser mode
    if getattr(args, "browser", False):
        config["browser"] = True
        logger.info("🌐 Browser mode enabled (Playwright headless Chromium)")

    # Apply CLI override for max_pages
    if args.max_pages is not None:
        old_max = config.get("max_pages", DEFAULT_MAX_PAGES)
        config["max_pages"] = args.max_pages

        # Warnings for --max-pages usage
        if args.max_pages > 1000:
            logger.warning(
                "⚠️  --max-pages=%d is very high - scraping may take hours", args.max_pages
            )
            logger.warning("   Recommendation: Use configs with reasonable limits for production")
        elif args.max_pages < 10:
            logger.warning(
                "⚠️  --max-pages=%d is very low - may result in incomplete skill", args.max_pages
            )

        if old_max and old_max != args.max_pages:
            logger.info(
                "📊 Max pages override: %d → %d (from --max-pages flag)", old_max, args.max_pages
            )
        else:
            logger.info("📊 Max pages set to: %d (from --max-pages flag)", args.max_pages)

    return config
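The get-or-initialize dance above (`ExecutionContext.get()` when `_initialized`, else `ExecutionContext.initialize(...)`) is a lock-guarded singleton. A minimal sketch of that shape under stated assumptions; this is not the project's actual class, just the pattern:

```python
import threading


class Context:
    _instance = None
    _initialized = False
    _lock = threading.Lock()

    def __init__(self, settings: dict):
        self.settings = settings

    @classmethod
    def initialize(cls, settings: dict) -> "Context":
        # Lock so concurrent initializers cannot race on _instance.
        with cls._lock:
            cls._instance = cls(settings)
            cls._initialized = True
            return cls._instance

    @classmethod
    def get(cls) -> "Context":
        # Reading under the lock mirrors the thread-safety fix described
        # in this PR's review notes.
        with cls._lock:
            if cls._instance is None:
                raise RuntimeError("Context not initialized")
            return cls._instance


ctx = Context.get() if Context._initialized else Context.initialize({"mode": "local"})
```

Callers that may run either standalone or inside the unified CLI use the conditional on the last line rather than unconditionally re-initializing.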
def execute_scraping_and_building(
    config: dict[str, Any], args: argparse.Namespace
) -> Optional["DocToSkillConverter"]:
    """Execute the scraping and skill building process.

    Handles dry run mode, existing data checks, scraping with checkpoints,
    keyboard interrupts, and skill building. This is the core workflow
    orchestration for the scraping phase.

    Args:
        config (dict): Configuration dictionary with scraping parameters
        args: Parsed command-line arguments

    Returns:
        DocToSkillConverter: The converter instance after scraping/building,
            or None if process was aborted

    Example:
        >>> config = {'name': 'react', 'base_url': 'https://react.dev'}
        >>> converter = execute_scraping_and_building(config, args)
        >>> if converter:
        ...     print("Scraping complete!")
    """
    # Dry run mode - preview only
    if args.dry_run:
        logger.info("\n" + "=" * 60)
        logger.info("DRY RUN MODE")
        logger.info("=" * 60)
        logger.info("This will show what would be scraped without saving anything.\n")

        converter = DocToSkillConverter(config, dry_run=True)
        converter.scrape_all()

        logger.info("\n📋 Configuration Summary:")
        logger.info("   Name: %s", config["name"])
        logger.info("   Base URL: %s", config["base_url"])
        logger.info("   Max pages: %d", config.get("max_pages", DEFAULT_MAX_PAGES))
        logger.info("   Rate limit: %ss", config.get("rate_limit", DEFAULT_RATE_LIMIT))
        logger.info("   Categories: %d", len(config.get("categories", {})))
        return None

    # Check for existing data
    exists, page_count = check_existing_data(config["name"])

    if exists and not args.skip_scrape and not args.fresh:
        # Check force_rescrape flag from config
        if config.get("force_rescrape", False):
            # Auto-delete cached data and rescrape
            logger.info("\n✓ Found existing data: %d pages", page_count)
            logger.info("   force_rescrape enabled - deleting cached data and re-scraping")
            import shutil

            data_dir = f"output/{config['name']}_data"
            if os.path.exists(data_dir):
                shutil.rmtree(data_dir)
                logger.info(f"   Deleted: {data_dir}")
        else:
            # Only prompt if force_rescrape is False
            logger.info("\n✓ Found existing data: %d pages", page_count)
            response = input("Use existing data? (y/n): ").strip().lower()
            if response == "y":
                args.skip_scrape = True
    elif exists and args.fresh:
        logger.info("\n✓ Found existing data: %d pages", page_count)
        logger.info("   --fresh flag set, will re-scrape from scratch")

    # Create converter
    converter = DocToSkillConverter(config, resume=args.resume)

    # Initialize workflow tracking (will be updated if workflow runs)
    converter.workflow_executed = False
    converter.workflow_name = None

    # Handle fresh start (clear checkpoint)
    if args.fresh:
        converter.clear_checkpoint()

    # Scrape or skip
    if not args.skip_scrape:
        try:
            converter.scrape_all()
            # Save final checkpoint
            if converter.checkpoint_enabled:
                converter.save_checkpoint()
                logger.info("\n💾 Final checkpoint saved")
                # Clear checkpoint after successful completion
                converter.clear_checkpoint()
                logger.info("✅ Scraping complete - checkpoint cleared")
        except KeyboardInterrupt:
            logger.warning("\n\nScraping interrupted.")
            if converter.checkpoint_enabled:
                converter.save_checkpoint()
                logger.info("💾 Progress saved to checkpoint")
                logger.info(
                    "   Resume with: --config %s --resume",
                    args.config if args.config else "config.json",
                )
            response = input("Continue with skill building? (y/n): ").strip().lower()
            if response != "y":
                return None
    else:
        logger.info("\n⏭️  Skipping scrape, using existing data")

    # Build skill
    success = converter.build_skill()

    if not success:
        sys.exit(1)

    # RAG chunking (optional - NEW v2.10.0)
    if args.chunk_for_rag:
        logger.info("\n" + "=" * 60)
        logger.info("🔪 Generating RAG chunks...")
        logger.info("=" * 60)

        from skill_seekers.cli.rag_chunker import RAGChunker

        chunker = RAGChunker(
            chunk_size=args.chunk_tokens,
            chunk_overlap=args.chunk_overlap_tokens,
            preserve_code_blocks=not args.no_preserve_code_blocks,
            preserve_paragraphs=not args.no_preserve_paragraphs,
        )

        # Chunk the skill
        skill_dir = Path(converter.skill_dir)
        chunks = chunker.chunk_skill(skill_dir)

        # Save chunks
        chunks_path = skill_dir / "rag_chunks.json"
        chunker.save_chunks(chunks, chunks_path)

        logger.info(f"✅ Generated {len(chunks)} RAG chunks")
        logger.info(f"📄 Saved to: {chunks_path}")
        logger.info("💡 Use with LangChain: --target langchain")
        logger.info("💡 Use with LlamaIndex: --target llama-index")

    # ============================================================
    # WORKFLOW SYSTEM INTEGRATION (Phase 2 - doc_scraper)
    # ============================================================
    from skill_seekers.cli.workflow_runner import run_workflows

    # Pass doc-scraper-specific context to workflows
    doc_context = {
        "name": config["name"],
        "base_url": config.get("base_url", ""),
        "description": config.get("description", ""),
    }

    workflow_executed, workflow_names = run_workflows(args, context=doc_context)

    # Store workflow execution status on converter for execute_enhancement() to access
    converter.workflow_executed = workflow_executed
    converter.workflow_name = ", ".join(workflow_names) if workflow_names else None

    return converter


def _run_scraping(config: dict[str, Any]) -> Optional["DocToSkillConverter"]:
    """Run the scraping process."""
    converter = DocToSkillConverter(config)

    # Check for resume
    if config.get("resume") and converter.checkpoint_exists():
        logger.info("📂 Resuming from checkpoint...")
        converter.load_checkpoint()

    # Clear checkpoint if fresh start
    if config.get("fresh"):
        converter.clear_checkpoint()

    # Scrape
    if not config.get("skip_scrape"):
        logger.info("\n🔍 Starting scrape...")
        try:
            asyncio.run(converter.scrape())
        except KeyboardInterrupt:
            logger.info("\n\n⚠️  Interrupted by user")
            converter.save_checkpoint()
            logger.info("💾 Checkpoint saved. Resume with --resume")
            return None

    logger.info("\n📦 Building skill...")
    converter.build_skill()

    return converter
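The resume logic above rests on a small checkpoint contract: save on interrupt, load when resume is requested and a checkpoint exists, clear after success. A toy version of that contract, with a JSON file standing in for the real checkpoint store:

```python
import json
import tempfile
from pathlib import Path


class Checkpoint:
    """Illustrative checkpoint store; the real converter's API differs."""

    def __init__(self, path: Path):
        self.path = path

    def save(self, scraped: list[str]) -> None:
        self.path.write_text(json.dumps({"scraped": scraped}))

    def exists(self) -> bool:
        return self.path.exists()

    def load(self) -> list[str]:
        return json.loads(self.path.read_text())["scraped"]

    def clear(self) -> None:
        self.path.unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as d:
    cp = Checkpoint(Path(d) / "checkpoint.json")
    cp.save(["page1", "page2"])            # simulate an interrupted run
    resumed = cp.load() if cp.exists() else []
    cp.clear()                             # cleared after successful completion
```

The same three verbs (save on interrupt, load on resume, clear on success) appear in both the old `execute_scraping_and_building` and the new `_run_scraping`.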
def _run_enhancement(
    config: dict[str, Any],
    ctx: Any,
    _converter: Any,
) -> None:
    """Run enhancement using context settings."""
    from pathlib import Path

    skill_dir = f"output/{config['name']}"

    logger.info("\n" + "=" * 60)
    logger.info(f"🤖 Enhancing SKILL.md (level {ctx.enhancement.level})")
    logger.info("=" * 60)

    # Use AgentClient from context
    try:
        agent_client = ctx.get_agent_client()

        # Run enhancement based on mode
        if agent_client.mode == "api" and agent_client.client:
            # API mode enhancement
            from skill_seekers.cli.enhance_skill import enhance_skill_md

            # Use AgentClient's API key detection (respects priority: CLI > config > env)
            api_key = ctx.enhancement.api_key or agent_client.api_key
            if api_key:
                enhance_skill_md(skill_dir, api_key)
                logger.info("✅ API enhancement complete!")
            else:
                logger.warning("⚠️  No API key available for enhancement")
        else:
            # Local mode enhancement
            from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer

            enhancer = LocalSkillEnhancer(
                Path(skill_dir),
                agent=ctx.enhancement.agent,
                agent_cmd=ctx.enhancement.agent_cmd,
            )
            success = enhancer.run(headless=True, timeout=ctx.enhancement.timeout)
            if success:
                agent_name = ctx.enhancement.agent or "claude"
                logger.info(f"✅ Local enhancement complete! (via {agent_name})")
            else:
                logger.warning("⚠️  Local enhancement did not complete")
    except Exception as e:
        logger.warning(f"⚠️  Enhancement failed: {e}")


def execute_enhancement(config: dict[str, Any], args: argparse.Namespace, converter=None) -> None:
    """Execute optional SKILL.md enhancement with AI.

    Supports two enhancement modes:
    1. API-based enhancement (requires ANTHROPIC_API_KEY)
    2. Local enhancement using a coding agent CLI (no API key needed)

    Prints appropriate messages and suggestions based on whether
    enhancement was requested and whether it succeeded.

    Args:
        config (dict): Configuration dictionary with skill name
        args: Parsed command-line arguments with enhancement flags
        converter: Optional DocToSkillConverter instance (to check workflow status)

    Example:
        >>> execute_enhancement(config, args)
        # Runs enhancement if --enhance or --enhance-local flag is set
    """
    import subprocess

    # Check if workflow was already executed (for logging context)
    workflow_executed = (
        converter and hasattr(converter, "workflow_executed") and converter.workflow_executed
    )
    workflow_name = converter.workflow_name if workflow_executed else None

    # Optional enhancement with auto-detected mode (API or LOCAL)
    # Note: Runs independently of workflow system (they complement each other)
    if getattr(args, "enhance_level", 0) > 0:
        import os

        has_api_key = bool(os.environ.get("ANTHROPIC_API_KEY") or args.api_key)
        mode = "API" if has_api_key else "LOCAL"

        logger.info("\n" + "=" * 80)
        logger.info(f"🤖 Traditional AI Enhancement ({mode} mode, level {args.enhance_level})")
        logger.info("=" * 80)
        if workflow_executed:
            logger.info(f"   Running after workflow: {workflow_name}")
            logger.info(
                "   (Workflow provides specialized analysis, enhancement provides general improvements)"
            )
        logger.info("")

        try:
            enhance_cmd = ["skill-seekers-enhance", f"output/{config['name']}/"]

            if args.api_key:
                enhance_cmd.extend(["--api-key", args.api_key])
            if getattr(args, "agent", None):
                enhance_cmd.extend(["--agent", args.agent])
            if getattr(args, "interactive_enhancement", False):
                enhance_cmd.append("--interactive-enhancement")

            result = subprocess.run(enhance_cmd, check=True)
            if result.returncode == 0:
                logger.info("\n✅ Enhancement complete!")
        except subprocess.CalledProcessError:
            logger.warning("\n⚠ Enhancement failed, but skill was still built")
        except FileNotFoundError:
            logger.warning("\n⚠ skill-seekers-enhance command not found. Run manually:")
            logger.info(
                "   skill-seekers enhance output/%s/",
                config["name"],
            )

    # Print packaging instructions
    logger.info("\n📦 Package your skill:")
    logger.info("   skill-seekers-package output/%s/", config["name"])

    # Suggest enhancement if not done
    if getattr(args, "enhance_level", 0) == 0:
        logger.info("\n💡 Optional: Enhance SKILL.md with AI:")
        logger.info("   skill-seekers enhance output/%s/", config["name"])
        logger.info("   or re-run with: --enhance-level 2 (auto-detects API vs LOCAL mode)")
        logger.info(
            "   API-based: skill-seekers-enhance-api output/%s/",
            config["name"],
        )
        logger.info("   or re-run with: --enhance")
        logger.info(
            "\n💡 Tip: Use --interactive-enhancement with --enhance-local to open terminal window"
        )


def main() -> None:
    parser = setup_argument_parser()
    args = parser.parse_args()

    # Setup logging based on verbosity flags
    setup_logging(verbose=args.verbose, quiet=args.quiet)

    config = get_configuration(args)

    # Execute scraping and building
    converter = execute_scraping_and_building(config, args)

    # Exit if dry run or aborted
    if converter is None:
        return

    # Execute enhancement and print instructions (pass converter for workflow status check)
    execute_enhancement(config, args, converter)


if __name__ == "__main__":
    main()
@@ -200,11 +200,22 @@ class LocalSkillEnhancer:
            raise ValueError(f"Executable '{executable}' not found in PATH")

    def _resolve_agent(self, agent, agent_cmd):
        # Priority: explicit param > ExecutionContext > env var > default
        try:
            from skill_seekers.cli.execution_context import ExecutionContext

            ctx = ExecutionContext.get()
            ctx_agent = ctx.enhancement.agent or ""
            ctx_cmd = ctx.enhancement.agent_cmd or ""
        except Exception:
            ctx_agent = ""
            ctx_cmd = ""

        env_agent = os.environ.get("SKILL_SEEKER_AGENT", "").strip()
        env_cmd = os.environ.get("SKILL_SEEKER_AGENT_CMD", "").strip()

        agent_name = _normalize_agent_name(agent or env_agent or "claude")
        cmd_override = agent_cmd or env_cmd or None
        agent_name = _normalize_agent_name(agent or ctx_agent or env_agent or "claude")
        cmd_override = agent_cmd or ctx_cmd or env_cmd or None

        if agent_name == "custom":
            if not cmd_override:
@@ -10,12 +10,10 @@ Usage:
    skill-seekers epub --from-json book_extracted.json
"""

import argparse
import json
import logging
import os
import re
import sys
from pathlib import Path

# Optional dependency guard
@@ -30,6 +28,8 @@ except ImportError:
# BeautifulSoup is a core dependency (always available)
from bs4 import BeautifulSoup, Comment

from .skill_converter import SkillConverter

logger = logging.getLogger(__name__)


@@ -68,10 +68,13 @@ def infer_description_from_epub(metadata: dict | None = None, name: str = "") ->
    )


-class EpubToSkillConverter:
+class EpubToSkillConverter(SkillConverter):
    """Convert EPUB e-book to AI skill."""

    SOURCE_TYPE = "epub"

    def __init__(self, config):
        super().__init__(config)
        self.config = config
        self.name = config["name"]
        self.epub_path = config.get("epub_path", "")
@@ -89,6 +92,10 @@ class EpubToSkillConverter:
        # Extracted data
        self.extracted_data = None

    def extract(self):
        """SkillConverter interface — delegates to extract_epub()."""
        return self.extract_epub()

    def extract_epub(self):
        """Extract content from EPUB file.

@@ -1068,143 +1075,3 @@ def _score_code_quality(code: str) -> float:
        score -= 2.0

    return min(10.0, max(0.0, score))


def main():
    from .arguments.epub import add_epub_arguments

    parser = argparse.ArgumentParser(
        description="Convert EPUB e-book to skill",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )

    add_epub_arguments(parser)

    args = parser.parse_args()

    # Set logging level
    if getattr(args, "quiet", False):
        logging.getLogger().setLevel(logging.WARNING)
    elif getattr(args, "verbose", False):
        logging.getLogger().setLevel(logging.DEBUG)

    # Handle --dry-run
    if getattr(args, "dry_run", False):
        source = getattr(args, "epub", None) or getattr(args, "from_json", None) or "(none)"
        print(f"\n{'=' * 60}")
        print("DRY RUN: EPUB Extraction")
        print(f"{'=' * 60}")
        print(f"Source: {source}")
        print(f"Name: {getattr(args, 'name', None) or '(auto-detect)'}")
        print(f"Enhance level: {getattr(args, 'enhance_level', 0)}")
        print(f"\n✅ Dry run complete")
        return 0

    # Validate inputs
    if not (getattr(args, "epub", None) or getattr(args, "from_json", None)):
        parser.error("Must specify --epub or --from-json")

    # Build from JSON workflow
    if getattr(args, "from_json", None):
        name = Path(args.from_json).stem.replace("_extracted", "")
        config = {
            "name": getattr(args, "name", None) or name,
            "description": getattr(args, "description", None)
            or f"Use when referencing {name} documentation",
        }
        try:
            converter = EpubToSkillConverter(config)
            converter.load_extracted_data(args.from_json)
            converter.build_skill()
        except Exception as e:
            print(f"\n❌ Error: {e}", file=sys.stderr)
            sys.exit(1)
        return 0

    # Direct EPUB mode
    if not getattr(args, "name", None):
        # Auto-detect name from filename
        args.name = Path(args.epub).stem

    config = {
        "name": args.name,
        "epub_path": args.epub,
        # Pass None so extract_epub() can infer from EPUB metadata
        "description": getattr(args, "description", None),
    }

    try:
        converter = EpubToSkillConverter(config)

        # Extract
        if not converter.extract_epub():
            print("\n❌ EPUB extraction failed - see error above", file=sys.stderr)
            sys.exit(1)

        # Build skill
        converter.build_skill()

        # Enhancement Workflow Integration
        from skill_seekers.cli.workflow_runner import run_workflows

        workflow_executed, workflow_names = run_workflows(args)
        workflow_name = ", ".join(workflow_names) if workflow_names else None

        # Traditional enhancement (complements workflow system)
        if getattr(args, "enhance_level", 0) > 0:
            import os

            api_key = getattr(args, "api_key", None) or os.environ.get("ANTHROPIC_API_KEY")
            mode = "API" if api_key else "LOCAL"

            print("\n" + "=" * 80)
            print(f"🤖 Traditional AI Enhancement ({mode} mode, level {args.enhance_level})")
            print("=" * 80)
            if workflow_executed:
                print(f" Running after workflow: {workflow_name}")
                print(
                    " (Workflow provides specialized analysis,"
                    " enhancement provides general improvements)"
                )
            print("")

            skill_dir = converter.skill_dir
            if api_key:
                try:
                    from skill_seekers.cli.enhance_skill import enhance_skill_md

                    enhance_skill_md(skill_dir, api_key)
                    print("✅ API enhancement complete!")
                except ImportError:
                    print("❌ API enhancement not available. Falling back to LOCAL mode...")
                    from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer

                    agent = getattr(args, "agent", None) if args else None
                    agent_cmd = getattr(args, "agent_cmd", None) if args else None
                    enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                    enhancer.run(headless=True)
                    print("✅ Local enhancement complete!")
            else:
                from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer

                agent = getattr(args, "agent", None) if args else None
                agent_cmd = getattr(args, "agent_cmd", None) if args else None
                enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                enhancer.run(headless=True)
                print("✅ Local enhancement complete!")

    except RuntimeError as e:
        print(f"\n❌ Error: {e}", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"\n❌ Unexpected error during EPUB processing: {e}", file=sys.stderr)
        import traceback

        traceback.print_exc()
        sys.exit(1)

    return 0


if __name__ == "__main__":
    sys.exit(main())
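The converters in this PR now share a two-method contract (`extract()` plus `build_skill()`, tagged by `SOURCE_TYPE`). A toy sketch of that shape, assuming a minimal base class — the real `SkillConverter` lives in `skill_seekers/cli/skill_converter.py` and is not shown in this diff, so the base below is hypothetical:

```python
from abc import ABC, abstractmethod
from typing import Any


class SkillConverter(ABC):
    """Minimal stand-in for the shared converter base class (hypothetical)."""

    SOURCE_TYPE = "unknown"

    def __init__(self, config: dict[str, Any]):
        self.config = config

    @abstractmethod
    def extract(self) -> Any:
        """Pull raw content from the source."""

    @abstractmethod
    def build_skill(self) -> None:
        """Turn extracted content into a skill directory."""


class DemoConverter(SkillConverter):
    SOURCE_TYPE = "demo"

    def extract(self) -> dict:
        return {"chapters": 3}

    def build_skill(self) -> None:
        print(f"building '{self.config['name']}' from {self.SOURCE_TYPE}")


converter = DemoConverter({"name": "example"})
data = converter.extract()
converter.build_skill()
```

This is the shape that lets the unified CLI drive EPUB and GitHub sources through one code path: call `extract()`, then `build_skill()`, without caring which subclass it holds.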
src/skill_seekers/cli/execution_context.py (new file, +549 lines)
@@ -0,0 +1,549 @@
"""ExecutionContext - Single source of truth for all configuration.

This module provides a singleton context object that holds all resolved
configuration from CLI args, config files, and environment variables.
All components read from this context instead of parsing their own argv.

Example:
    >>> from skill_seekers.cli.execution_context import ExecutionContext
    >>> ctx = ExecutionContext.initialize(args=parsed_args)
    >>> ctx = ExecutionContext.get()  # Get initialized instance
    >>> print(ctx.output.name)
    >>> print(ctx.enhancement.agent)
"""

from __future__ import annotations

import contextlib
import json
import logging
import os
import threading
from pathlib import Path
from typing import Any, ClassVar, Literal
from collections.abc import Generator

from pydantic import BaseModel, Field, PrivateAttr

logger = logging.getLogger(__name__)


class SourceInfoConfig(BaseModel):
    """Source detection results."""

    type: str = Field(..., description="Source type (web, github, pdf, etc.)")
    raw_source: str = Field(..., description="Original user input")
    parsed: dict[str, Any] = Field(default_factory=dict, description="Parsed source details")
    suggested_name: str = Field(default="", description="Auto-generated skill name")


class EnhancementSettings(BaseModel):
    """AI enhancement configuration."""

    model_config = {
        "json_schema_extra": {
            "example": {
                "enabled": True,
                "level": 2,
                "mode": "auto",
                "agent": "kimi",
                "timeout": 2700,
            }
        }
    }

    enabled: bool = Field(default=True, description="Whether enhancement is enabled")
    level: int = Field(default=2, ge=0, le=3, description="Enhancement level (0-3)")
    mode: str = Field(default="auto", description="Mode: api, local, or auto")
    agent: str | None = Field(default=None, description="Local agent name (claude, kimi, etc.)")
    agent_cmd: str | None = Field(default=None, description="Custom agent command override")
    api_key: str | None = Field(default=None, description="API key for enhancement")
    timeout: int = Field(default=2700, description="Timeout in seconds (default: 45min)")
    workflows: list[str] = Field(default_factory=list, description="Enhancement workflow names")
    stages: list[str] = Field(default_factory=list, description="Inline enhancement stages")
    workflow_vars: dict[str, str] = Field(default_factory=dict, description="Workflow variables")


class OutputSettings(BaseModel):
    """Output configuration."""

    model_config = {
        "json_schema_extra": {
            "example": {
                "name": "react-docs",
                "doc_version": "18.2",
                "dry_run": False,
            }
        }
    }

    name: str | None = Field(default=None, description="Skill name")
    output_dir: str | None = Field(default=None, description="Output directory override")
    doc_version: str = Field(default="", description="Documentation version tag")
    dry_run: bool = Field(default=False, description="Preview mode without execution")


class ScrapingSettings(BaseModel):
    """Web scraping configuration."""

    max_pages: int | None = Field(default=None, description="Maximum pages to scrape")
    rate_limit: float | None = Field(default=None, description="Rate limit in seconds")
    browser: bool = Field(default=False, description="Use headless browser for JS sites")
    browser_wait_until: str = Field(
        default="domcontentloaded", description="Browser wait condition"
    )
    browser_extra_wait: int = Field(default=0, description="Extra wait time in ms after page load")
    workers: int = Field(default=1, description="Number of parallel workers")
    async_mode: bool = Field(default=False, description="Enable async mode")
    resume: bool = Field(default=False, description="Resume from checkpoint")
    fresh: bool = Field(default=False, description="Clear checkpoint and start fresh")
    skip_scrape: bool = Field(default=False, description="Skip scraping, use existing data")
    languages: list[str] = Field(default_factory=lambda: ["en"], description="Language preferences")


class AnalysisSettings(BaseModel):
    """Code analysis configuration."""

    depth: Literal["surface", "deep", "full"] = Field(
        default="surface", description="Analysis depth: surface, deep, full"
    )
    skip_patterns: bool = Field(default=False, description="Skip design pattern detection")
    skip_test_examples: bool = Field(default=False, description="Skip test example extraction")
    skip_how_to_guides: bool = Field(default=False, description="Skip how-to guide generation")
    skip_config_patterns: bool = Field(default=False, description="Skip config pattern extraction")
    skip_api_reference: bool = Field(default=False, description="Skip API reference generation")
    skip_dependency_graph: bool = Field(default=False, description="Skip dependency graph")
    skip_docs: bool = Field(default=False, description="Skip documentation extraction")
    no_comments: bool = Field(default=False, description="Skip comment extraction")
    file_patterns: list[str] | None = Field(default=None, description="File patterns to analyze")


class RAGSettings(BaseModel):
    """RAG (Retrieval-Augmented Generation) configuration."""

    chunk_for_rag: bool = Field(default=False, description="Enable semantic chunking")
    chunk_tokens: int = Field(default=512, description="Chunk size in tokens")
    chunk_overlap_tokens: int = Field(default=50, description="Overlap between chunks")
    preserve_code_blocks: bool = Field(default=True, description="Don't split code blocks")
    preserve_paragraphs: bool = Field(default=True, description="Respect paragraph boundaries")


class ExecutionContext(BaseModel):
    """Single source of truth for all execution configuration.

    This is a singleton - use ExecutionContext.get() to access the instance.
    Initialize once at entry point with ExecutionContext.initialize().

    Example:
        >>> ctx = ExecutionContext.initialize(args=parsed_args)
        >>> ctx = ExecutionContext.get()  # Get initialized instance
        >>> print(ctx.output.name)
    """

    model_config = {
        "json_schema_extra": {
            "example": {
                "source": {"type": "web", "raw_source": "https://react.dev/"},
                "enhancement": {"level": 2, "agent": "kimi"},
                "output": {"name": "react-docs"},
            }
        }
    }

    # Configuration sections
    source: SourceInfoConfig | None = Field(default=None, description="Source information")
    enhancement: EnhancementSettings = Field(default_factory=EnhancementSettings)
    output: OutputSettings = Field(default_factory=OutputSettings)
    scraping: ScrapingSettings = Field(default_factory=ScrapingSettings)
    analysis: AnalysisSettings = Field(default_factory=AnalysisSettings)
    rag: RAGSettings = Field(default_factory=RAGSettings)

    # Private attributes
    _raw_args: dict[str, Any] = PrivateAttr(default_factory=dict)
    _config_path: str | None = PrivateAttr(default=None)

    # Singleton storage (class-level)
    _instance: ClassVar[ExecutionContext | None] = None
    _lock: ClassVar[threading.Lock] = threading.Lock()
    _initialized: ClassVar[bool] = False

    @classmethod
    def get(cls) -> ExecutionContext:
        """Get the singleton instance (thread-safe).

        Returns a default context if not explicitly initialized.
        This ensures components can always read from the context
        without try/except blocks.
        """
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
                logger.debug("ExecutionContext auto-initialized with defaults")
            return cls._instance

    @classmethod
    def initialize(
        cls,
        args: Any | None = None,
        config_path: str | None = None,
        source_info: Any | None = None,
    ) -> ExecutionContext:
        """Initialize the singleton context.

        Priority (highest to lowest):
        1. CLI args (explicit user input)
        2. Config file (JSON config)
        3. Environment variables
        4. Defaults

        Args:
            args: Parsed argparse.Namespace
            config_path: Path to config JSON file
            source_info: SourceInfo from source_detector

        Returns:
            Initialized ExecutionContext instance
        """
        with cls._lock:
            if cls._initialized:
                logger.info(
                    "ExecutionContext.initialize() called again — returning existing instance. "
                    "Use ExecutionContext.reset() first if re-initialization is intended."
                )
                return cls._instance

            context_data = cls._build_from_sources(args, config_path, source_info)
            cls._instance = cls.model_validate(context_data)
            if args:
                cls._instance._raw_args = vars(args)
            cls._instance._config_path = config_path
            cls._initialized = True
            return cls._instance

    @classmethod
    def reset(cls) -> None:
        """Reset the singleton (mainly for testing)."""
        with cls._lock:
            cls._instance = None
            cls._initialized = False

    @classmethod
    def _build_from_sources(
        cls,
        args: Any | None,
        config_path: str | None,
        source_info: Any | None,
    ) -> dict[str, Any]:
        """Build context dict from all configuration sources."""
        # Start with defaults
        data = cls._default_data()

        # Layer 1: Config file
        if config_path:
            file_config = cls._load_config_file(config_path)
            data = cls._deep_merge(data, file_config)

        # Layer 2: CLI args (override config file)
        if args:
            arg_config = cls._args_to_data(args)
            data = cls._deep_merge(data, arg_config)

        # Layer 3: Source info
        if source_info:
            data["source"] = {
                "type": source_info.type,
                "raw_source": getattr(source_info, "raw_source", None)
                or getattr(source_info, "raw_input", ""),
                "parsed": source_info.parsed,
                "suggested_name": source_info.suggested_name,
            }

        return data

    @classmethod
    def _default_data(cls) -> dict[str, Any]:
        """Get default configuration."""
        from skill_seekers.cli.agent_client import get_default_timeout

        return {
            "enhancement": {
                "enabled": True,
                "level": 2,
                # Env-var-based mode detection (lowest priority — CLI and config override this)
                "mode": "api"
                if any(
                    os.environ.get(k)
                    for k in (
                        "ANTHROPIC_API_KEY",
                        "OPENAI_API_KEY",
                        "MOONSHOT_API_KEY",
                        "GOOGLE_API_KEY",
                    )
                )
                else "auto",
                "agent": os.environ.get("SKILL_SEEKER_AGENT"),
                "agent_cmd": None,
                "api_key": None,
                "timeout": get_default_timeout(),
                "workflows": [],
                "stages": [],
                "workflow_vars": {},
            },
            "output": {
                "name": None,
                "output_dir": None,
                "doc_version": "",
                "dry_run": False,
            },
            "scraping": {
                "max_pages": None,
                "rate_limit": None,
                "browser": False,
                "browser_wait_until": "domcontentloaded",
                "browser_extra_wait": 0,
                "workers": 1,
                "async_mode": False,
                "resume": False,
                "fresh": False,
                "skip_scrape": False,
                "languages": ["en"],
            },
            "analysis": {
                "depth": "surface",
                "skip_patterns": False,
                "skip_test_examples": False,
                "skip_how_to_guides": False,
                "skip_config_patterns": False,
                "skip_api_reference": False,
                "skip_dependency_graph": False,
                "skip_docs": False,
                "no_comments": False,
                "file_patterns": None,
            },
            "rag": {
                "chunk_for_rag": False,
                "chunk_tokens": 512,
                "chunk_overlap_tokens": 50,
                "preserve_code_blocks": True,
                "preserve_paragraphs": True,
            },
        }

    @classmethod
    def _load_config_file(cls, config_path: str) -> dict[str, Any]:
        """Load and normalize config file."""
        path = Path(config_path)
        try:
            with open(path, encoding="utf-8") as f:
                file_data = json.load(f)
        except FileNotFoundError:
            raise ValueError(f"Config file not found: {config_path}") from None
        except json.JSONDecodeError as e:
            raise ValueError(f"Invalid JSON in config file {config_path}: {e}") from None

        config: dict[str, Any] = {}

        # Unified config format (sources array)
        if "sources" in file_data:
            enhancement = file_data.get("enhancement", {})

            # Handle timeout field (can be "unlimited" or integer)
            timeout_val = enhancement.get("timeout", 2700)
            if isinstance(timeout_val, str) and timeout_val.lower() in ("unlimited", "none"):
                from skill_seekers.cli.agent_client import UNLIMITED_TIMEOUT

                timeout_val = UNLIMITED_TIMEOUT

            config["output"] = {
                "name": file_data.get("name"),
                "doc_version": file_data.get("version", ""),
            }
            config["enhancement"] = {
                "enabled": enhancement.get("enabled", True),
                "level": enhancement.get("level", 2),
                "mode": enhancement.get("mode", "auto").lower(),
                "agent": enhancement.get("agent"),
                "timeout": timeout_val,
                "workflows": file_data.get("workflows", []),
                "stages": file_data.get("workflow_stages", []),
                "workflow_vars": file_data.get("workflow_vars", {}),
            }

        # Simple web config format
        elif "base_url" in file_data:
            config["output"] = {
                "name": file_data.get("name"),
                "doc_version": file_data.get("version", ""),
            }
            config["scraping"] = {
                "max_pages": file_data.get("max_pages"),
                "rate_limit": file_data.get("rate_limit"),
                "browser": file_data.get("browser", False),
            }

        return config

    @classmethod
    def _args_to_data(cls, args: Any) -> dict[str, Any]:
        """Convert argparse.Namespace to config dict."""
        config: dict[str, Any] = {}

        # Output
        if hasattr(args, "name") and args.name is not None:
            config.setdefault("output", {})["name"] = args.name
        if hasattr(args, "output") and args.output is not None:
            config.setdefault("output", {})["output_dir"] = args.output
        if hasattr(args, "doc_version") and args.doc_version:
            config.setdefault("output", {})["doc_version"] = args.doc_version
        if getattr(args, "dry_run", False):
            config.setdefault("output", {})["dry_run"] = True

        # Enhancement
        if hasattr(args, "enhance_level") and args.enhance_level is not None:
            config.setdefault("enhancement", {})["level"] = args.enhance_level
        if getattr(args, "agent", None):
            config.setdefault("enhancement", {})["agent"] = args.agent
        if getattr(args, "agent_cmd", None):
            config.setdefault("enhancement", {})["agent_cmd"] = args.agent_cmd
        if getattr(args, "api_key", None):
            config.setdefault("enhancement", {})["api_key"] = args.api_key

        # Resolve mode from explicit CLI flags:
        # --api-key → "api", --agent (without --api-key) → "local".
        # Env-var-based mode detection belongs in _default_data(), not here,
        # to preserve the priority: CLI args > Config file > Env vars > Defaults.
        if getattr(args, "api_key", None):
            config.setdefault("enhancement", {})["mode"] = "api"
        elif getattr(args, "agent", None):
            config.setdefault("enhancement", {})["mode"] = "local"

        # Workflows
        if getattr(args, "enhance_workflow", None):
            config.setdefault("enhancement", {})["workflows"] = list(args.enhance_workflow)
        if getattr(args, "enhance_stage", None):
            config.setdefault("enhancement", {})["stages"] = list(args.enhance_stage)
        if getattr(args, "var", None):
            config.setdefault("enhancement", {})["workflow_vars"] = cls._parse_vars(args.var)

        # Scraping
        if hasattr(args, "max_pages") and args.max_pages is not None:
            config.setdefault("scraping", {})["max_pages"] = args.max_pages
        if hasattr(args, "rate_limit") and args.rate_limit is not None:
            config.setdefault("scraping", {})["rate_limit"] = args.rate_limit
        if getattr(args, "browser", False):
            config.setdefault("scraping", {})["browser"] = True
        if hasattr(args, "workers") and args.workers:
            config.setdefault("scraping", {})["workers"] = args.workers
        if getattr(args, "async_mode", False):
            config.setdefault("scraping", {})["async_mode"] = True
        if getattr(args, "resume", False):
            config.setdefault("scraping", {})["resume"] = True
        if getattr(args, "fresh", False):
            config.setdefault("scraping", {})["fresh"] = True
        if getattr(args, "skip_scrape", False):
            config.setdefault("scraping", {})["skip_scrape"] = True

        # Analysis
        if getattr(args, "depth", None):
            config.setdefault("analysis", {})["depth"] = args.depth
        if getattr(args, "skip_patterns", False):
            config.setdefault("analysis", {})["skip_patterns"] = True
        if getattr(args, "skip_test_examples", False):
            config.setdefault("analysis", {})["skip_test_examples"] = True
        if getattr(args, "skip_how_to_guides", False):
            config.setdefault("analysis", {})["skip_how_to_guides"] = True
        if getattr(args, "file_patterns", None):
            config.setdefault("analysis", {})["file_patterns"] = [
                p.strip() for p in args.file_patterns.split(",")
            ]

        # RAG
        if getattr(args, "chunk_for_rag", False):
            config.setdefault("rag", {})["chunk_for_rag"] = True
        if hasattr(args, "chunk_tokens") and args.chunk_tokens is not None:
            config.setdefault("rag", {})["chunk_tokens"] = args.chunk_tokens

        return config

    @staticmethod
    def _deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
        """Deep merge override into base."""
        result = base.copy()
        for key, value in override.items():
            if isinstance(value, dict) and key in result and isinstance(result[key], dict):
                result[key] = ExecutionContext._deep_merge(result[key], value)
            else:
                result[key] = value
        return result

    @staticmethod
    def _parse_vars(var_list: list[str]) -> dict[str, str]:
        """Parse --var key=value into dict."""
        result = {}
        for var in var_list:
            if "=" in var:
                key, value = var.split("=", 1)
                result[key] = value
        return result
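The defaults < config file < CLI args layering in `_build_from_sources` hinges on `_deep_merge` replacing only the keys an overlay actually sets. A stand-alone demo (a module-level copy of the same merge, lifted out of the class so it runs on its own):

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Same algorithm as ExecutionContext._deep_merge, copied for the demo."""
    result = base.copy()
    for key, value in override.items():
        if isinstance(value, dict) and key in result and isinstance(result[key], dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


defaults = {"enhancement": {"level": 2, "mode": "auto", "agent": None}}
file_cfg = {"enhancement": {"level": 3}}     # config file sets only level
cli_args = {"enhancement": {"mode": "api"}}  # CLI sets only mode

# Apply layers lowest-priority first; later layers win per key, not per section.
merged = deep_merge(deep_merge(defaults, file_cfg), cli_args)
assert merged == {"enhancement": {"level": 3, "mode": "api", "agent": None}}
```

Because the recursion only descends into dicts present on both sides, a config file that sets one scraping key does not wipe out the default values of its siblings.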

    @property
    def config_path(self) -> str | None:
        """Path to the config file used for initialization, if any."""
        return self._config_path

    def get_raw(self, name: str, default: Any = None) -> Any:
        """Get raw argument value (backward compatibility)."""
        return self._raw_args.get(name, default)

    def get_agent_client(self) -> Any:
        """Get configured AgentClient from context."""
        from skill_seekers.cli.agent_client import AgentClient

        return AgentClient(mode=self.enhancement.mode, agent=self.enhancement.agent)

    @contextlib.contextmanager
    def override(self, **kwargs: Any) -> Generator[ExecutionContext, None, None]:
        """Temporarily override context values.

        Thread-safe: uses an override stack so nested/concurrent overrides
        restore correctly regardless of ordering.

        Usage:
            with ctx.override(enhancement__level=3):
                run_workflow()  # Uses level 3
            # Original values restored
        """
        # Create new data with overrides
        current_data = self.model_dump(exclude={"_raw_args"})

        for key, value in kwargs.items():
            if "__" in key:
                parts = key.split("__")
                target = current_data
                for part in parts[:-1]:
                    target = target.setdefault(part, {})
                target[parts[-1]] = value
            else:
                current_data[key] = value

        # Create temporary instance and preserve _raw_args
        temp_ctx = self.__class__.model_validate(current_data)
        temp_ctx._raw_args = dict(self._raw_args)  # Copy raw args to temp context

        # Swap singleton atomically and save previous state on a stack
        # so nested/concurrent overrides restore in the correct order.
        with self.__class__._lock:
            saved = (self.__class__._instance, self.__class__._initialized)
            self.__class__._instance = temp_ctx
            self.__class__._initialized = True
        try:
            yield temp_ctx
        finally:
            with self.__class__._lock:
                self.__class__._instance = saved[0]
                self.__class__._initialized = saved[1]


def get_context() -> ExecutionContext:
    """Shortcut for ExecutionContext.get()."""
    return ExecutionContext.get()
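The swap-and-restore dance in `override()` is easiest to see on a stripped-down singleton. This toy class mirrors only the locking and save/restore behavior, not the pydantic validation or the `__`-path handling of the real `ExecutionContext`:

```python
import contextlib
import threading


class ToyContext:
    """Stripped-down singleton mirroring ExecutionContext.override() (toy only)."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self, level: int = 2):
        self.level = level

    @classmethod
    def get(cls) -> "ToyContext":
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance

    @contextlib.contextmanager
    def override(self, **kwargs):
        temp = self.__class__(**kwargs)
        with self.__class__._lock:      # swap in the temporary instance
            saved = self.__class__._instance
            self.__class__._instance = temp
        try:
            yield temp
        finally:
            with self.__class__._lock:  # always restore, even on exceptions
                self.__class__._instance = saved


seen = []
ctx = ToyContext.get()
with ctx.override(level=3):
    seen.append(ToyContext.get().level)  # readers see the override
seen.append(ToyContext.get().level)      # original restored afterwards
assert seen == [3, 2]
```

The `try/finally` is what guarantees the singleton is restored even when the body raises, which is why the real implementation also saves and restores the `_initialized` flag alongside the instance.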
@@ -14,7 +14,6 @@ Usage:
|
||||
skill-seekers github --repo owner/repo --token $GITHUB_TOKEN
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import fnmatch
|
||||
import itertools
|
||||
import json
|
||||
@@ -32,8 +31,7 @@ except ImportError:
|
||||
print("Error: PyGithub not installed. Run: pip install PyGithub")
|
||||
sys.exit(1)
|
||||
|
||||
from skill_seekers.cli.arguments.github import add_github_arguments
|
||||
from skill_seekers.cli.utils import setup_logging
|
||||
from skill_seekers.cli.skill_converter import SkillConverter
|
||||
|
||||
# Try to import pathspec for .gitignore support
|
||||
try:
|
||||
@@ -183,7 +181,7 @@ def extract_description_from_readme(readme_content: str, repo_name: str) -> str:
|
||||
return f"Use when working with {project_name}"
|
||||
|
||||
|
||||
class GitHubScraper:
|
||||
class GitHubScraper(SkillConverter):
|
||||
"""
|
||||
GitHub Repository Scraper (C1.1-C1.9)
|
||||
|
||||
@@ -199,8 +197,11 @@ class GitHubScraper:
|
||||
- Releases
|
||||
"""
|
||||
|
||||
SOURCE_TYPE = "github"
|
||||
|
||||
def __init__(self, config: dict[str, Any], local_repo_path: str | None = None):
|
||||
"""Initialize GitHub scraper with configuration."""
|
||||
super().__init__(config)
|
||||
self.config = config
|
||||
self.repo_name = config["repo"]
|
||||
self.name = config.get("name", self.repo_name.split("/")[-1])
|
||||
@@ -353,6 +354,15 @@ class GitHubScraper:
|
||||
logger.error(f"Unexpected error during scraping: {e}")
|
||||
raise
|
||||
|
||||
def extract(self):
|
||||
"""SkillConverter interface — delegates to scrape()."""
|
||||
self.scrape()
|
||||
|
||||
def build_skill(self):
|
||||
"""SkillConverter interface — delegates to GitHubToSkillConverter."""
|
||||
converter = GitHubToSkillConverter(self.config)
|
||||
converter.build_skill()
|
||||
|
||||
def _fetch_repository(self):
|
||||
"""C1.1: Fetch repository structure using GitHub API."""
|
||||
logger.info(f"Fetching repository: {self.repo_name}")
|
||||
@@ -1379,186 +1389,3 @@ Use this skill when you need to:
        with open(structure_path, "w", encoding="utf-8") as f:
            f.write(content)
        logger.info(f"Generated: {structure_path}")


def setup_argument_parser() -> argparse.ArgumentParser:
    """Set up and configure the command-line argument parser.

    Creates an ArgumentParser with all CLI options for the github scraper.
    All arguments are defined in skill_seekers.cli.arguments.github to ensure
    consistency between the standalone scraper and unified CLI.

    Returns:
        argparse.ArgumentParser: Configured argument parser
    """
    parser = argparse.ArgumentParser(
        description="GitHub Repository to AI Skill Converter",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  skill-seekers github --repo facebook/react
  skill-seekers github --config configs/react_github.json
  skill-seekers github --repo owner/repo --token $GITHUB_TOKEN
""",
    )

    # Add all github arguments from shared definitions
    # This ensures the standalone scraper and unified CLI stay in sync
    add_github_arguments(parser)

    return parser


def main():
    """C1.10: CLI tool entry point."""
    parser = setup_argument_parser()
    args = parser.parse_args()

    setup_logging(verbose=getattr(args, "verbose", False), quiet=getattr(args, "quiet", False))

    # Handle --dry-run
    if getattr(args, "dry_run", False):
        repo = args.repo or (args.config and "(from config)")
        print(f"\n{'=' * 60}")
        print("DRY RUN: GitHub Repository Analysis")
        print(f"{'=' * 60}")
        print(f"Repository: {repo}")
        print(f"Name: {getattr(args, 'name', None) or '(auto-detect)'}")
        print(f"Include issues: {not getattr(args, 'no_issues', False)}")
        print(f"Include releases: {not getattr(args, 'no_releases', False)}")
        print(f"Include changelog: {not getattr(args, 'no_changelog', False)}")
        print(f"Max issues: {getattr(args, 'max_issues', 100)}")
        print(f"Enhance level: {getattr(args, 'enhance_level', 0)}")
        print(f"Profile: {getattr(args, 'profile', None) or '(default)'}")
        print("\n✅ Dry run complete")
        return 0

    # Build config from args or file
    if args.config:
        with open(args.config, encoding="utf-8") as f:
            config = json.load(f)
        # Override with CLI args if provided
        if args.non_interactive:
            config["interactive"] = False
        if args.profile:
            config["github_profile"] = args.profile
    elif args.repo:
        config = {
            "repo": args.repo,
            "name": args.name or args.repo.split("/")[-1],
            "description": args.description or f"Use when working with {args.repo.split('/')[-1]}",
            "github_token": args.token,
            "include_issues": not args.no_issues,
            "include_changelog": not args.no_changelog,
            "include_releases": not args.no_releases,
            "max_issues": args.max_issues,
            "interactive": not args.non_interactive,
            "github_profile": args.profile,
            "local_repo_path": getattr(args, "local_repo_path", None),
        }
    else:
        parser.error("Either --repo or --config is required")

    try:
        # Phase 1: Scrape GitHub repository
        scraper = GitHubScraper(config)
        scraper.scrape()

        if args.scrape_only:
            logger.info("Scrape complete (--scrape-only mode)")
            return

        # Phase 2: Build skill
        converter = GitHubToSkillConverter(config)
        converter.build_skill()

        skill_name = config.get("name", config["repo"].split("/")[-1])
        skill_dir = f"output/{skill_name}"

        # ============================================================
        # WORKFLOW SYSTEM INTEGRATION (Phase 2 - github_scraper)
        # ============================================================
        from skill_seekers.cli.workflow_runner import run_workflows

        # Pass GitHub-specific context to workflows
        github_context = {
            "repo": config.get("repo", ""),
            "name": skill_name,
            "description": config.get("description", ""),
        }

        workflow_executed, workflow_names = run_workflows(args, context=github_context)
        workflow_name = ", ".join(workflow_names) if workflow_names else None

        # Phase 3: Optional enhancement with auto-detected mode
        # Note: Runs independently of workflow system (they complement each other)
        if getattr(args, "enhance_level", 0) > 0:
            import os

            # Auto-detect mode based on API key availability
            api_key = args.api_key or os.environ.get("ANTHROPIC_API_KEY")
            mode = "API" if api_key else "LOCAL"

            logger.info("\n" + "=" * 80)
            logger.info(f"🤖 Traditional AI Enhancement ({mode} mode, level {args.enhance_level})")
            logger.info("=" * 80)
            if workflow_executed:
                logger.info(f"   Running after workflow: {workflow_name}")
                logger.info(
                    "   (Workflow provides specialized analysis, enhancement provides general improvements)"
                )
            logger.info("")

            if api_key:
                # API-based enhancement
                try:
                    from skill_seekers.cli.enhance_skill import enhance_skill_md

                    enhance_skill_md(skill_dir, api_key)
                    logger.info("✅ API enhancement complete!")
                except ImportError:
                    logger.error("❌ API enhancement not available. Install: pip install anthropic")
                    logger.info("💡 Falling back to LOCAL mode...")
                    # Fall back to LOCAL mode
                    from pathlib import Path

                    from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer

                    agent = getattr(args, "agent", None) if args else None
                    agent_cmd = getattr(args, "agent_cmd", None) if args else None
                    enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                    enhancer.run(headless=True)
                    agent_name = agent or "claude"
                    logger.info(f"✅ Local enhancement complete! (via {agent_name})")
            else:
                # LOCAL enhancement (no API key)
                from pathlib import Path

                from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer

                agent = getattr(args, "agent", None) if args else None
                agent_cmd = getattr(args, "agent_cmd", None) if args else None
                enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                enhancer.run(headless=True)
                agent_name = agent or "claude"
                logger.info(f"✅ Local enhancement complete! (via {agent_name})")

        logger.info(f"\n✅ Success! Skill created at: {skill_dir}/")

        # Only suggest enhancement if neither workflow nor traditional enhancement was done
        if not workflow_executed and getattr(args, "enhance_level", 0) == 0:
            logger.info("\n💡 Optional: Enhance SKILL.md with AI:")
            logger.info(f"   skill-seekers enhance {skill_dir}/ --enhance-level 2")
            logger.info("   (auto-detects API vs LOCAL mode based on ANTHROPIC_API_KEY)")
            logger.info("\n💡 Or use a workflow:")
            logger.info(
                f"   skill-seekers github --repo {config['repo']} --enhance-workflow architecture-comprehensive"
            )

        logger.info(f"\nNext step: skill-seekers package {skill_dir}/")

    except Exception as e:
        logger.error(f"Error: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
@@ -908,8 +908,29 @@ class HowToGuideBuilder:
        return collection

    def _extract_workflow_examples(self, examples: list[dict]) -> list[dict]:
-        """Filter to workflow category only"""
-        return [ex for ex in examples if isinstance(ex, dict) and ex.get("category") == "workflow"]
+        """Filter to examples suitable for guide generation.

        Includes:
        - All workflow-category examples
        - Setup/config examples with sufficient complexity (4+ steps or high confidence)
        - Instantiation examples with high confidence and multiple dependencies
        """
        guide_worthy = []
        for ex in examples:
            if not isinstance(ex, dict):
                continue
            category = ex.get("category", "")
            complexity = ex.get("complexity_score", 0)
            confidence = ex.get("confidence", 0)

            if category == "workflow":
                guide_worthy.append(ex)
            elif category in ("setup", "config") and (complexity >= 0.4 or confidence >= 0.7):
                guide_worthy.append(ex)
            elif category == "instantiation" and complexity >= 0.6 and confidence >= 0.7:
                guide_worthy.append(ex)

        return guide_worthy

    def _create_guide(self, title: str, workflows: list[dict], enhancer=None) -> HowToGuide:
        """
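The broadened filter can be exercised in isolation to confirm the thresholds behave as the docstring claims. A self-contained copy of the logic with sample data (the sample records are invented for the test; the thresholds are the ones from the diff):

```python
# Standalone copy of the guide-worthiness filter so the category thresholds
# can be checked against hand-built sample examples.
def filter_guide_worthy(examples: list) -> list[dict]:
    guide_worthy = []
    for ex in examples:
        if not isinstance(ex, dict):
            continue
        category = ex.get("category", "")
        complexity = ex.get("complexity_score", 0)
        confidence = ex.get("confidence", 0)
        if category == "workflow":
            guide_worthy.append(ex)
        elif category in ("setup", "config") and (complexity >= 0.4 or confidence >= 0.7):
            guide_worthy.append(ex)
        elif category == "instantiation" and complexity >= 0.6 and confidence >= 0.7:
            guide_worthy.append(ex)
    return guide_worthy


samples = [
    {"category": "workflow"},                                             # always kept
    {"category": "setup", "complexity_score": 0.5},                       # kept: complexity
    {"category": "config", "complexity_score": 0.1, "confidence": 0.2},   # dropped: too simple
    {"category": "instantiation", "complexity_score": 0.7, "confidence": 0.8},  # kept: both high
    "not-a-dict",                                                         # ignored
]
print(len(filter_guide_worthy(samples)))  # 3
```

Note that setup/config needs only one of the two thresholds, while instantiation needs both; that asymmetry is what keeps trivial constructor snippets out of the how-to guides.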
@@ -16,17 +16,17 @@ Usage:
    skill-seekers html --from-json page_extracted.json
"""

import argparse
import json
import logging
import os
import re
import sys
from pathlib import Path

# BeautifulSoup is a core dependency (always available)
from bs4 import BeautifulSoup, Comment, Tag

from .skill_converter import SkillConverter

logger = logging.getLogger(__name__)

# File extensions treated as HTML
@@ -95,7 +95,7 @@ def _collect_html_files(html_path: str) -> list[Path]:
    raise ValueError(f"Path is neither a file nor a directory: {html_path}")


-class HtmlToSkillConverter:
+class HtmlToSkillConverter(SkillConverter):
    """Convert local HTML files to a skill.

    Supports single HTML files and directories of HTML files. Parses document
@@ -112,6 +112,8 @@ class HtmlToSkillConverter:
        extracted_data: Parsed extraction results dict.
    """

    SOURCE_TYPE = "html"

    def __init__(self, config: dict) -> None:
        """Initialize the HTML to skill converter.

@@ -122,6 +124,7 @@ class HtmlToSkillConverter:
            - description (str): Skill description (optional).
            - categories (dict): Category definitions for content grouping.
        """
        super().__init__(config)
        self.config = config
        self.name: str = config["name"]
        self.html_path: str = config.get("html_path", "")
@@ -139,6 +142,10 @@ class HtmlToSkillConverter:
        # Extracted data
        self.extracted_data: dict | None = None

    def extract(self):
        """SkillConverter interface — delegates to extract_html()."""
        return self.extract_html()

    # ------------------------------------------------------------------
    # Extraction
    # ------------------------------------------------------------------
@@ -1742,205 +1749,3 @@ def _score_code_quality(code: str) -> float:
        score -= 2.0

    return min(10.0, max(0.0, score))


# ---------------------------------------------------------------------------
# CLI entry point
# ---------------------------------------------------------------------------


def main() -> int:
    """CLI entry point for the HTML scraper.

    Parses command-line arguments and runs the extraction/build pipeline.
    Supports two workflows:
    1. Direct HTML extraction: ``--html-path page.html --name myskill``
    2. Build from JSON: ``--from-json page_extracted.json``

    Returns:
        Exit code (0 for success, non-zero for failure).
    """
    parser = argparse.ArgumentParser(
        description="Convert local HTML files to skill",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=(
            "Examples:\n"
            "  %(prog)s --html-path page.html --name myskill\n"
            "  %(prog)s --html-path ./docs/ --name myskill\n"
            "  %(prog)s --from-json page_extracted.json\n"
        ),
    )

    # Shared universal args
    from .arguments.common import add_all_standard_arguments

    add_all_standard_arguments(parser)

    # Override enhance-level default to 0 for HTML
    for action in parser._actions:
        if hasattr(action, "dest") and action.dest == "enhance_level":
            action.default = 0
            action.help = (
                "AI enhancement level (auto-detects API vs LOCAL mode): "
                "0=disabled (default for HTML), 1=SKILL.md only, "
                "2=+architecture/config, 3=full enhancement. "
                "Mode selection: uses API if ANTHROPIC_API_KEY is set, "
                "otherwise LOCAL (Claude Code, Kimi, etc.)"
            )

    # HTML-specific args
    parser.add_argument(
        "--html-path",
        type=str,
        help="Path to HTML file or directory of HTML files",
        metavar="PATH",
    )
    parser.add_argument(
        "--from-json",
        type=str,
        help="Build skill from previously extracted JSON",
        metavar="FILE",
    )

    args = parser.parse_args()

    # Set logging level
    if getattr(args, "quiet", False):
        logging.getLogger().setLevel(logging.WARNING)
    elif getattr(args, "verbose", False):
        logging.getLogger().setLevel(logging.DEBUG)

    # Handle --dry-run
    if getattr(args, "dry_run", False):
        source = getattr(args, "html_path", None) or getattr(args, "from_json", None) or "(none)"
        print(f"\n{'=' * 60}")
        print("DRY RUN: HTML Extraction")
        print(f"{'=' * 60}")
        print(f"Source: {source}")
        print(f"Name: {getattr(args, 'name', None) or '(auto-detect)'}")
        print(f"Enhance level: {getattr(args, 'enhance_level', 0)}")
        print("\n✅ Dry run complete")
        return 0

    # Validate inputs
    if not (getattr(args, "html_path", None) or getattr(args, "from_json", None)):
        parser.error("Must specify --html-path or --from-json")

    # Build from JSON workflow
    if getattr(args, "from_json", None):
        name = Path(args.from_json).stem.replace("_extracted", "")
        config = {
            "name": getattr(args, "name", None) or name,
            "description": getattr(args, "description", None)
            or f"Use when referencing {name} documentation",
        }
        try:
            converter = HtmlToSkillConverter(config)
            converter.load_extracted_data(args.from_json)
            converter.build_skill()
        except Exception as e:
            print(f"\n❌ Error: {e}", file=sys.stderr)
            sys.exit(1)
        return 0

    # Direct HTML mode
    if not getattr(args, "name", None):
        # Auto-detect name from path
        path = Path(args.html_path)
        args.name = path.stem if path.is_file() else path.name

    config = {
        "name": args.name,
        "html_path": args.html_path,
        # Pass None so extract_html() can infer from HTML metadata
        "description": getattr(args, "description", None),
    }

    try:
        converter = HtmlToSkillConverter(config)

        # Extract
        if not converter.extract_html():
            print(
                "\n❌ HTML extraction failed - see error above",
                file=sys.stderr,
            )
            sys.exit(1)

        # Build skill
        converter.build_skill()

        # Enhancement Workflow Integration
        from skill_seekers.cli.workflow_runner import run_workflows

        workflow_executed, workflow_names = run_workflows(args)
        workflow_name = ", ".join(workflow_names) if workflow_names else None

        # Traditional enhancement (complements workflow system)
        if getattr(args, "enhance_level", 0) > 0:
            api_key = getattr(args, "api_key", None) or os.environ.get("ANTHROPIC_API_KEY")
            mode = "API" if api_key else "LOCAL"

            print("\n" + "=" * 80)
            print(f"🤖 Traditional AI Enhancement ({mode} mode, level {args.enhance_level})")
            print("=" * 80)
            if workflow_executed:
                print(f"   Running after workflow: {workflow_name}")
                print(
                    "   (Workflow provides specialized analysis,"
                    " enhancement provides general improvements)"
                )
            print("")

            skill_dir = converter.skill_dir
            if api_key:
                try:
                    from skill_seekers.cli.enhance_skill import (
                        enhance_skill_md,
                    )

                    enhance_skill_md(skill_dir, api_key)
                    print("✅ API enhancement complete!")
                except ImportError:
                    print("❌ API enhancement not available. Falling back to LOCAL mode...")
                    from skill_seekers.cli.enhance_skill_local import (
                        LocalSkillEnhancer,
                    )

                    agent = getattr(args, "agent", None) if args else None
                    agent_cmd = getattr(args, "agent_cmd", None) if args else None
                    enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                    enhancer.run(headless=True)
                    print("✅ Local enhancement complete!")
            else:
                from skill_seekers.cli.enhance_skill_local import (
                    LocalSkillEnhancer,
                )

                agent = getattr(args, "agent", None) if args else None
                agent_cmd = getattr(args, "agent_cmd", None) if args else None
                enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                enhancer.run(headless=True)
                print("✅ Local enhancement complete!")

    except (FileNotFoundError, ValueError, RuntimeError) as e:
        print(f"\n❌ Error: {e}", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(
            f"\n❌ Unexpected error during HTML processing: {e}",
            file=sys.stderr,
        )
        import traceback

        traceback.print_exc()
        sys.exit(1)

    return 0


if __name__ == "__main__":
    sys.exit(main())
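The enhance-level override above relies on a small argparse trick: after `add_all_standard_arguments` registers a shared option, its `Action` object can still be found in `parser._actions` and its default retuned per scraper. A minimal sketch (note `_actions` is a private attribute, so this leans on CPython argparse internals rather than documented API):

```python
# Sketch of retuning a shared argument's default after registration,
# the pattern the HTML scraper uses to disable enhancement by default.
import argparse

parser = argparse.ArgumentParser()
# Stand-in for the shared definition in arguments.common (default 1 is assumed)
parser.add_argument("--enhance-level", type=int, default=1)

for action in parser._actions:
    if getattr(action, "dest", None) == "enhance_level":
        action.default = 0  # HTML scraper: enhancement off unless requested

args = parser.parse_args([])
print(args.enhance_level)  # 0
```

The alternative would be a parameter on the shared registration function; mutating the action keeps the shared definition untouched at the cost of depending on argparse internals.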
@@ -14,12 +14,10 @@ Usage:
    skill-seekers jupyter --from-json notebook_extracted.json
"""

import argparse
import json
import logging
import os
import re
import sys
from pathlib import Path

# Optional dependency guard
@@ -30,6 +28,8 @@ try:
except ImportError:
    JUPYTER_AVAILABLE = False

from .skill_converter import SkillConverter

logger = logging.getLogger(__name__)

# Import pattern categories for code analysis
@@ -199,10 +199,13 @@ def infer_description_from_notebook(metadata: dict | None = None, name: str = ""
    )


-class JupyterToSkillConverter:
+class JupyterToSkillConverter(SkillConverter):
    """Convert Jupyter Notebook (.ipynb) to skill."""

    SOURCE_TYPE = "jupyter"

    def __init__(self, config: dict):
        super().__init__(config)
        self.config = config
        self.name = config["name"]
        self.notebook_path = config.get("notebook_path", "")
@@ -214,6 +217,10 @@ class JupyterToSkillConverter:
        self.categories = config.get("categories", {})
        self.extracted_data: dict | None = None

    def extract(self):
        """SkillConverter interface — delegates to extract_notebook()."""
        return self.extract_notebook()

    # ------------------------------------------------------------------
    # Extraction
    # ------------------------------------------------------------------
@@ -1082,132 +1089,3 @@ def _score_code_quality(code: str) -> float:
    if line_count > 0 and not non_magic:
        score -= 1.0
    return min(10.0, max(0.0, score))


# ---------------------------------------------------------------------------
# CLI entry point
# ---------------------------------------------------------------------------


def main() -> int:
    """Standalone CLI entry point for the Jupyter Notebook scraper."""
    from .arguments.jupyter import add_jupyter_arguments

    parser = argparse.ArgumentParser(
        description="Convert Jupyter Notebook (.ipynb) to skill",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    add_jupyter_arguments(parser)
    args = parser.parse_args()

    if getattr(args, "quiet", False):
        logging.getLogger().setLevel(logging.WARNING)
    elif getattr(args, "verbose", False):
        logging.getLogger().setLevel(logging.DEBUG)

    if getattr(args, "dry_run", False):
        source = getattr(args, "notebook", None) or getattr(args, "from_json", None) or "(none)"
        print(f"\n{'=' * 60}")
        print("DRY RUN: Jupyter Notebook Extraction")
        print(f"{'=' * 60}")
        print(f"Source: {source}")
        print(f"Name: {getattr(args, 'name', None) or '(auto-detect)'}")
        print(f"Enhance level: {getattr(args, 'enhance_level', 0)}")
        print("\n✅ Dry run complete")
        return 0

    if not (getattr(args, "notebook", None) or getattr(args, "from_json", None)):
        parser.error("Must specify --notebook or --from-json")

    if getattr(args, "from_json", None):
        name = Path(args.from_json).stem.replace("_extracted", "")
        config = {
            "name": getattr(args, "name", None) or name,
            "description": getattr(args, "description", None)
            or f"Use when referencing {name} notebook documentation",
        }
        try:
            converter = JupyterToSkillConverter(config)
            converter.load_extracted_data(args.from_json)
            converter.build_skill()
        except Exception as e:
            print(f"\n❌ Error: {e}", file=sys.stderr)
            sys.exit(1)
        return 0

    # Direct notebook mode
    if not getattr(args, "name", None):
        nb_path = Path(args.notebook)
        args.name = nb_path.stem if nb_path.is_file() else (nb_path.name or "notebooks")

    config = {
        "name": args.name,
        "notebook_path": args.notebook,
        "description": getattr(args, "description", None),
    }

    try:
        converter = JupyterToSkillConverter(config)
        if not converter.extract_notebook():
            print("\n❌ Notebook extraction failed - see error above", file=sys.stderr)
            sys.exit(1)
        converter.build_skill()

        from skill_seekers.cli.workflow_runner import run_workflows

        workflow_executed, workflow_names = run_workflows(args)
        workflow_name = ", ".join(workflow_names) if workflow_names else None

        if getattr(args, "enhance_level", 0) > 0:
            api_key = getattr(args, "api_key", None) or os.environ.get("ANTHROPIC_API_KEY")
            mode = "API" if api_key else "LOCAL"
            print("\n" + "=" * 80)
            print(f"🤖 Traditional AI Enhancement ({mode} mode, level {args.enhance_level})")
            print("=" * 80)
            if workflow_executed:
                print(f"   Running after workflow: {workflow_name}")
                print(
                    "   (Workflow provides specialized analysis, "
                    "enhancement provides general improvements)"
                )
            print("")
            skill_dir = converter.skill_dir
            if api_key:
                try:
                    from skill_seekers.cli.enhance_skill import enhance_skill_md

                    enhance_skill_md(skill_dir, api_key)
                    print("✅ API enhancement complete!")
                except ImportError:
                    print("❌ API enhancement not available. Falling back to LOCAL mode...")
                    from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer

                    agent = getattr(args, "agent", None) if args else None
                    agent_cmd = getattr(args, "agent_cmd", None) if args else None
                    enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                    enhancer.run(headless=True)
                    print("✅ Local enhancement complete!")
            else:
                from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer

                agent = getattr(args, "agent", None) if args else None
                agent_cmd = getattr(args, "agent_cmd", None) if args else None
                enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                enhancer.run(headless=True)
                print("✅ Local enhancement complete!")

    except RuntimeError as e:
        print(f"\n❌ Error: {e}", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"\n❌ Unexpected error during Jupyter processing: {e}", file=sys.stderr)
        import traceback

        traceback.print_exc()
        sys.exit(1)

    return 0


if __name__ == "__main__":
    sys.exit(main())
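The "Optional dependency guard" hunk above truncates the guarded import itself. The pattern it names is standard: attempt the import at module load, record a boolean, and raise a clear error only when the feature is used. A sketch under the assumption that `nbformat` is the guarded package (the diff elides the actual name, so treat it as illustrative):

```python
# Optional-dependency guard pattern, as referenced by the Jupyter scraper.
# The guarded package name (nbformat) is an assumption; the diff only shows
# the except branch setting JUPYTER_AVAILABLE = False.
try:
    import nbformat  # noqa: F401

    JUPYTER_AVAILABLE = True
except ImportError:
    JUPYTER_AVAILABLE = False


def require_jupyter() -> None:
    """Fail with an actionable message only when the feature is used."""
    if not JUPYTER_AVAILABLE:
        raise RuntimeError("Jupyter support requires nbformat: pip install nbformat")


print(isinstance(JUPYTER_AVAILABLE, bool))  # True
```

Deferring the failure to call time keeps `import skill_seekers` working for users who never touch notebook sources.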
@@ -2,97 +2,67 @@
"""
Skill Seekers - Unified CLI Entry Point

Provides a git-style unified command-line interface for all Skill Seekers tools.
Convert documentation, codebases, and repositories into AI skills.

Usage:
    skill-seekers <command> [options]

Commands:
    config                Configure GitHub tokens, API keys, and settings
    scrape                Scrape documentation website
    github                Scrape GitHub repository
    pdf                   Extract from PDF file
    word                  Extract from Word (.docx) file
    epub                  Extract from EPUB e-book (.epub)
    video                 Extract from video (YouTube or local)
    jupyter               Extract from Jupyter Notebook (.ipynb)
    html                  Extract from local HTML files
    openapi               Extract from OpenAPI/Swagger spec
    asciidoc              Extract from AsciiDoc documents (.adoc)
    pptx                  Extract from PowerPoint (.pptx)
    rss                   Extract from RSS/Atom feeds
    manpage               Extract from man pages
    confluence            Extract from Confluence wiki
    notion                Extract from Notion pages
    chat                  Extract from Slack/Discord chat exports
    unified               Multi-source scraping (docs + GitHub + PDF + more)
    analyze               Analyze local codebase and extract code knowledge
    create                Create skill from any source (auto-detects type)
    enhance               AI-powered enhancement (auto: API or LOCAL mode)
    enhance-status        Check enhancement status (for background/daemon modes)
    package               Package skill into .zip file
    upload                Upload skill to target platform
    install               One-command workflow (scrape + enhance + package + upload)
    install-agent         Install skill to AI agent directories
    estimate              Estimate page count before scraping
    extract-test-examples Extract usage examples from test files
    resume                Resume interrupted scraping job
    doctor                Health check for dependencies and configuration

Examples:
    skill-seekers scrape --config configs/react.json
    skill-seekers github --repo microsoft/TypeScript
    skill-seekers unified --config configs/react_unified.json
    skill-seekers extract-test-examples tests/ --language python
    skill-seekers create https://react.dev
    skill-seekers create owner/repo
    skill-seekers create ./document.pdf
    skill-seekers create configs/unity-spine.json
    skill-seekers create configs/unity-spine.json --enhance-workflow unity-game-dev
    skill-seekers enhance output/react/
    skill-seekers package output/react/
    skill-seekers install-agent output/react/ --agent cursor
"""

import argparse
import importlib
import os
import sys
from pathlib import Path

from skill_seekers.cli import __version__


# Command module mapping (command name -> module path)
COMMAND_MODULES = {
    "scrape": "skill_seekers.cli.doc_scraper",
    "github": "skill_seekers.cli.github_scraper",
    "pdf": "skill_seekers.cli.pdf_scraper",
    "word": "skill_seekers.cli.word_scraper",
    "epub": "skill_seekers.cli.epub_scraper",
    "video": "skill_seekers.cli.video_scraper",
    "unified": "skill_seekers.cli.unified_scraper",
    # Skill creation — unified entry point for all 18 source types
    "create": "skill_seekers.cli.create_command",
    # Enhancement & packaging
    "enhance": "skill_seekers.cli.enhance_command",
    "enhance-status": "skill_seekers.cli.enhance_status",
    "package": "skill_seekers.cli.package_skill",
    "upload": "skill_seekers.cli.upload_skill",
    "install": "skill_seekers.cli.install_skill",
    "install-agent": "skill_seekers.cli.install_agent",
    # Utilities
    "estimate": "skill_seekers.cli.estimate_pages",
    "extract-test-examples": "skill_seekers.cli.test_example_extractor",
    "analyze": "skill_seekers.cli.codebase_scraper",
    "resume": "skill_seekers.cli.resume_command",
    "quality": "skill_seekers.cli.quality_metrics",
    # Configuration & workflows
    "config": "skill_seekers.cli.config_command",
    "doctor": "skill_seekers.cli.doctor",
    "workflows": "skill_seekers.cli.workflows_command",
    "sync-config": "skill_seekers.cli.sync_config",
    # Advanced (less common)
    "stream": "skill_seekers.cli.streaming_ingest",
    "update": "skill_seekers.cli.incremental_updater",
    "multilang": "skill_seekers.cli.multilang_support",
    # New source types (v3.2.0+)
    "jupyter": "skill_seekers.cli.jupyter_scraper",
    "html": "skill_seekers.cli.html_scraper",
    "openapi": "skill_seekers.cli.openapi_scraper",
    "asciidoc": "skill_seekers.cli.asciidoc_scraper",
    "pptx": "skill_seekers.cli.pptx_scraper",
    "rss": "skill_seekers.cli.rss_scraper",
    "manpage": "skill_seekers.cli.man_scraper",
    "confluence": "skill_seekers.cli.confluence_scraper",
    "notion": "skill_seekers.cli.notion_scraper",
    "chat": "skill_seekers.cli.chat_scraper",
}
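The command table drives a lazy, git-style dispatch: resolve the module path by command name and import it only when that command runs, so startup cost stays flat as source types are added. A minimal sketch (the demo table maps to the stdlib `json` module purely so the example is runnable; the real table maps to `skill_seekers.cli.*` modules):

```python
# Minimal sketch of the lazy dispatch the COMMAND_MODULES table enables.
import importlib

COMMAND_MODULES = {"demo": "json"}  # stand-in for the real command table


def dispatch(command: str):
    module_name = COMMAND_MODULES.get(command)
    if module_name is None:
        raise SystemExit(f"Unknown command: {command}")
    return importlib.import_module(module_name)


module = dispatch("demo")
print(module.__name__)  # json
```

Because `importlib.import_module` caches in `sys.modules`, repeated invocations of the same command pay the import cost once per process.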
|
||||
|
||||
@@ -106,14 +76,14 @@ def create_parser() -> argparse.ArgumentParser:
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
Examples:
|
||||
# Scrape documentation
|
||||
skill-seekers scrape --config configs/react.json
|
||||
# Create skill from documentation (auto-detects source type)
|
||||
skill-seekers create https://docs.react.dev --name react
|
||||
|
||||
# Scrape GitHub repository
|
||||
skill-seekers github --repo microsoft/TypeScript --name typescript
|
||||
# Create skill from GitHub repository
|
||||
skill-seekers create microsoft/TypeScript --name typescript
|
||||
|
||||
# Multi-source scraping (unified)
|
||||
skill-seekers unified --config configs/react_unified.json
|
||||
# Create skill from PDF file
|
||||
skill-seekers create ./documentation.pdf --name mydocs
|
||||
|
||||
# AI-powered enhancement
|
||||
skill-seekers enhance output/react/
|
||||
@@ -145,6 +115,9 @@ For more information: https://github.com/yusufkaraaslan/Skill_Seekers
|
||||
def _reconstruct_argv(command: str, args: argparse.Namespace) -> list[str]:
|
||||
"""Reconstruct sys.argv from args namespace for command module.
|
||||
|
||||
DEPRECATED: Use ExecutionContext instead. This function is kept for
|
||||
backward compatibility and will be removed in a future version.
|
||||
|
||||
Args:
|
||||
command: Command name
|
||||
args: Parsed arguments namespace
|
||||
@@ -206,18 +179,8 @@ def main(argv: list[str] | None = None) -> int:
|
||||
Returns:
|
||||
Exit code (0 for success, non-zero for error)
|
||||
"""
|
||||
# Special handling for analyze --preset-list (no directory required)
|
||||
if argv is None:
|
||||
argv = sys.argv[1:]
|
||||
if len(argv) >= 2 and argv[0] == "analyze" and "--preset-list" in argv:
|
||||
from skill_seekers.cli.codebase_scraper import main as analyze_main
|
||||
|
||||
original_argv = sys.argv.copy()
|
||||
sys.argv = ["codebase_scraper.py", "--preset-list"]
|
||||
try:
|
||||
return analyze_main() or 0
|
||||
finally:
|
||||
sys.argv = original_argv
|
||||
|
||||
parser = create_parser()
|
||||
args = parser.parse_args(argv)
|
||||
@@ -226,6 +189,10 @@ def main(argv: list[str] | None = None) -> int:
        parser.print_help()
        return 1

    # Note: ExecutionContext is initialized by individual commands (e.g., create_command,
    # enhance_command) with the correct config_path and source_info. Do NOT initialize
    # it here — commands need to set config_path which requires source detection first.

    # Get command module
    module_name = COMMAND_MODULES.get(args.command)
    if not module_name:
@@ -233,9 +200,38 @@ def main(argv: list[str] | None = None) -> int:
        parser.print_help()
        return 1

    # Special handling for 'analyze' command (has post-processing)
    if args.command == "analyze":
        return _handle_analyze_command(args)
    # create command: call directly with parsed args (no argv reconstruction)
    if args.command == "create":
        # Handle --help-* flags before execute (no source needed for help)
        from skill_seekers.cli.arguments.create import add_create_arguments

        help_modes = {
            "_help_web": "web",
            "_help_github": "github",
            "_help_local": "local",
            "_help_pdf": "pdf",
            "_help_word": "word",
            "_help_epub": "epub",
            "_help_video": "video",
            "_help_config": "config",
            "_help_advanced": "advanced",
            "_help_all": "all",
        }
        for attr, mode in help_modes.items():
            if getattr(args, attr, False):
                help_parser = argparse.ArgumentParser(
                    prog="skill-seekers create",
                    description=f"Create skill — {mode} options",
                    formatter_class=argparse.RawDescriptionHelpFormatter,
                )
                add_create_arguments(help_parser, mode=mode)
                help_parser.print_help()
                return 0

        from skill_seekers.cli.create_command import CreateCommand

        command = CreateCommand(args)
        return command.execute()

    # Standard delegation for all other commands
    try:
@@ -269,165 +265,5 @@ def main(argv: list[str] | None = None) -> int:
        return 1


def _handle_analyze_command(args: argparse.Namespace) -> int:
    """Handle analyze command with special post-processing logic.

    Args:
        args: Parsed arguments

    Returns:
        Exit code
    """
    from skill_seekers.cli.codebase_scraper import main as analyze_main

    # Reconstruct sys.argv for analyze command
    original_argv = sys.argv.copy()
    sys.argv = ["codebase_scraper.py", "--directory", args.directory]

    if args.output:
        sys.argv.extend(["--output", args.output])

    # Handle preset flags (depth and features)
    if args.quick:
        sys.argv.extend(
            [
                "--depth",
                "surface",
                "--skip-patterns",
                "--skip-test-examples",
                "--skip-how-to-guides",
                "--skip-config-patterns",
            ]
        )
    elif args.comprehensive:
        sys.argv.extend(["--depth", "full"])
    elif args.depth:
        sys.argv.extend(["--depth", args.depth])

    # Determine enhance_level (simplified - use default or override)
    enhance_level = getattr(args, "enhance_level", 2)  # Default is 2
    if getattr(args, "quick", False):
        enhance_level = 0  # Quick mode disables enhancement

    sys.argv.extend(["--enhance-level", str(enhance_level)])

    # Pass through remaining arguments
    if args.languages:
        sys.argv.extend(["--languages", args.languages])
    if args.file_patterns:
        sys.argv.extend(["--file-patterns", args.file_patterns])
    if args.skip_api_reference:
        sys.argv.append("--skip-api-reference")
    if args.skip_dependency_graph:
        sys.argv.append("--skip-dependency-graph")
    if args.skip_patterns:
        sys.argv.append("--skip-patterns")
    if args.skip_test_examples:
        sys.argv.append("--skip-test-examples")
    if args.skip_how_to_guides:
        sys.argv.append("--skip-how-to-guides")
    if args.skip_config_patterns:
        sys.argv.append("--skip-config-patterns")
    if args.skip_docs:
        sys.argv.append("--skip-docs")
    if args.no_comments:
        sys.argv.append("--no-comments")
    if args.verbose:
        sys.argv.append("--verbose")
    if getattr(args, "quiet", False):
        sys.argv.append("--quiet")
    if getattr(args, "dry_run", False):
        sys.argv.append("--dry-run")
    if getattr(args, "preset", None):
        sys.argv.extend(["--preset", args.preset])
    if getattr(args, "name", None):
        sys.argv.extend(["--name", args.name])
    if getattr(args, "description", None):
        sys.argv.extend(["--description", args.description])
    if getattr(args, "api_key", None):
        sys.argv.extend(["--api-key", args.api_key])
    # Enhancement Workflow arguments
    if getattr(args, "enhance_workflow", None):
        for wf in args.enhance_workflow:
            sys.argv.extend(["--enhance-workflow", wf])
    if getattr(args, "enhance_stage", None):
        for stage in args.enhance_stage:
            sys.argv.extend(["--enhance-stage", stage])
    if getattr(args, "var", None):
        for var in args.var:
            sys.argv.extend(["--var", var])
    if getattr(args, "workflow_dry_run", False):
        sys.argv.append("--workflow-dry-run")

    try:
        result = analyze_main() or 0

        # Enhance SKILL.md if enhance_level >= 1
        if result == 0 and enhance_level >= 1:
            skill_dir = Path(args.output)
            skill_md = skill_dir / "SKILL.md"

            if skill_md.exists():
                print("\n" + "=" * 60)
                print(f"ENHANCING SKILL.MD WITH AI (Level {enhance_level})")
                print("=" * 60 + "\n")

                try:
                    from skill_seekers.cli.enhance_command import (
                        _is_root,
                        _pick_mode,
                        _run_api_mode,
                        _run_local_mode,
                    )
                    import argparse as _ap

                    _fake_args = _ap.Namespace(
                        skill_directory=str(skill_dir),
                        target=None,
                        api_key=None,
                        dry_run=False,
                        agent=None,
                        agent_cmd=None,
                        interactive_enhancement=False,
                        background=False,
                        daemon=False,
                        no_force=False,
                        timeout=2700,
                    )
                    _mode, _target = _pick_mode(_fake_args)

                    if _mode == "api":
                        print(f"\n🤖 Enhancement mode: API ({_target})")
                        success = _run_api_mode(_fake_args, _target) == 0
                    elif _is_root():
                        print("\n⚠️ Skipping SKILL.md enhancement: running as root")
                        print("  Set ANTHROPIC_API_KEY / GOOGLE_API_KEY to enable API mode")
                        success = False
                    else:
                        agent_name = (
                            os.environ.get("SKILL_SEEKER_AGENT", "claude").strip() or "claude"
                        )
                        print(f"\n🤖 Enhancement mode: LOCAL ({agent_name})")
                        success = _run_local_mode(_fake_args) == 0

                    if success:
                        print("\n✅ SKILL.md enhancement complete!")
                        with open(skill_md) as f:
                            lines = len(f.readlines())
                        print(f"  Enhanced SKILL.md: {lines} lines")
                    else:
                        print("\n⚠️ SKILL.md enhancement did not complete")
                        print("  You can retry with: skill-seekers enhance " + str(skill_dir))
                except Exception as e:
                    print(f"\n⚠️ SKILL.md enhancement failed: {e}")
                    print("  You can retry with: skill-seekers enhance " + str(skill_dir))
            else:
                print(f"\n⚠️ SKILL.md not found at {skill_md}, skipping enhancement")

        return result
    finally:
        sys.argv = original_argv


if __name__ == "__main__":
    sys.exit(main())

@@ -20,15 +20,15 @@ Usage:
    skill-seekers man --from-json unix-tools_extracted.json
"""

import argparse
import json
import logging
import os
import re
import subprocess
import sys
from pathlib import Path

from skill_seekers.cli.skill_converter import SkillConverter

logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------
@@ -116,7 +116,7 @@ def infer_description_from_manpages(
    )


-class ManPageToSkillConverter:
+class ManPageToSkillConverter(SkillConverter):
    """Convert Unix man pages into a skill directory structure.

    Supports extraction via the ``man`` command or by reading raw man-page
@@ -125,6 +125,8 @@ class ManPageToSkillConverter:
    from skill generation.
    """

    SOURCE_TYPE = "manpage"

    def __init__(self, config: dict) -> None:
        """Initialise the converter from a configuration dictionary.

@@ -137,6 +139,7 @@ class ManPageToSkillConverter:
        - ``description`` -- explicit description (optional)
        - ``categories`` -- keyword-based categorisation map (optional)
        """
        super().__init__(config)
        self.config = config
        self.name: str = config["name"]
        self.man_names: list[str] = config.get("man_names", [])
@@ -156,6 +159,10 @@ class ManPageToSkillConverter:
        # Extracted data placeholder
        self.extracted_data: dict | None = None

    def extract(self):
        """Extract content from man pages (SkillConverter interface)."""
        self.extract_manpages()

    # ------------------------------------------------------------------
    # Extraction
    # ------------------------------------------------------------------
@@ -1285,233 +1292,3 @@ class ManPageToSkillConverter:
        safe = re.sub(r"[^\w\s-]", "", name.lower())
        safe = re.sub(r"[-\s]+", "_", safe)
        return safe


# ---------------------------------------------------------------------------
# CLI entry point
# ---------------------------------------------------------------------------


def main() -> int:
    """CLI entry point for the man page scraper.

    Supports three workflows:

    1. ``--man-names git,curl`` -- extract named man pages via the ``man``
       command.
    2. ``--man-path /usr/share/man/man1`` -- read man page files from a
       directory.
    3. ``--from-json data.json`` -- reload previously extracted data and
       rebuild the skill.

    Returns:
        Exit code (0 on success, non-zero on error).
    """
    parser = argparse.ArgumentParser(
        description="Convert Unix man pages to a skill",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=(
            "Examples:\n"
            "  %(prog)s --man-names git,curl --name unix-tools\n"
            "  %(prog)s --man-path /usr/share/man/man1 --name coreutils\n"
            "  %(prog)s --from-json unix-tools_extracted.json\n"
        ),
    )

    # Standard arguments (name, description, output, enhance-level, etc.)
    from .arguments.common import add_all_standard_arguments

    add_all_standard_arguments(parser)

    # Override enhance-level default to 0 for man pages
    for action in parser._actions:
        if hasattr(action, "dest") and action.dest == "enhance_level":
            action.default = 0
            action.help = (
                "AI enhancement level (auto-detects API vs LOCAL mode): "
                "0=disabled (default for man), 1=SKILL.md only, "
                "2=+architecture/config, 3=full enhancement. "
                "Mode selection: uses API if ANTHROPIC_API_KEY is set, "
                "otherwise LOCAL (Claude Code, Kimi, etc.)"
            )

    # Man-specific arguments
    parser.add_argument(
        "--man-names",
        type=str,
        help="Comma-separated list of man page names (e.g. git,curl,grep)",
        metavar="NAMES",
    )
    parser.add_argument(
        "--man-path",
        type=str,
        help="Directory containing man page files (.1-.8, .man, .gz)",
        metavar="DIR",
    )
    parser.add_argument(
        "--sections",
        type=str,
        help="Comma-separated list of man section numbers to extract (e.g. 1,3,8)",
        metavar="NUMS",
    )
    parser.add_argument(
        "--from-json",
        type=str,
        help="Build skill from previously extracted JSON",
        metavar="FILE",
    )

    args = parser.parse_args()

    # Logging level
    if getattr(args, "quiet", False):
        logging.getLogger().setLevel(logging.WARNING)
    elif getattr(args, "verbose", False):
        logging.getLogger().setLevel(logging.DEBUG)

    # Dry run
    if getattr(args, "dry_run", False):
        source = (
            getattr(args, "man_names", None)
            or getattr(args, "man_path", None)
            or getattr(args, "from_json", None)
            or "(none)"
        )
        print(f"\n{'=' * 60}")
        print("DRY RUN: Man Page Extraction")
        print(f"{'=' * 60}")
        print(f"Source: {source}")
        print(f"Name: {getattr(args, 'name', None) or '(auto-detect)'}")
        print(f"Sections: {getattr(args, 'sections', None) or 'all'}")
        print(f"Enhance level: {getattr(args, 'enhance_level', 0)}")
        print(f"\n✅ Dry run complete")
        return 0

    # Validate: must have at least one source
    if not (
        getattr(args, "man_names", None)
        or getattr(args, "man_path", None)
        or getattr(args, "from_json", None)
    ):
        parser.error("Must specify --man-names, --man-path, or --from-json")

    # Parse section numbers
    section_list: list[int] = []
    if getattr(args, "sections", None):
        try:
            section_list = [int(s.strip()) for s in args.sections.split(",") if s.strip()]
        except ValueError:
            parser.error("--sections must be comma-separated integers (e.g. 1,3,8)")

    # Parse man names
    man_name_list: list[str] = []
    if getattr(args, "man_names", None):
        man_name_list = [n.strip() for n in args.man_names.split(",") if n.strip()]

    # Build from JSON workflow
    if getattr(args, "from_json", None):
        name = Path(args.from_json).stem.replace("_extracted", "")
        config = {
            "name": getattr(args, "name", None) or name,
            "description": getattr(args, "description", None)
            or f"Use when referencing {name} documentation",
        }
        try:
            converter = ManPageToSkillConverter(config)
            converter.load_extracted_data(args.from_json)
            converter.build_skill()
        except Exception as e:
            print(f"\n❌ Error: {e}", file=sys.stderr)
            sys.exit(1)
        return 0

    # Auto-detect name from man names or path
    if not getattr(args, "name", None):
        if man_name_list:
            args.name = man_name_list[0] if len(man_name_list) == 1 else "man-pages"
        elif getattr(args, "man_path", None):
            args.name = Path(args.man_path).name
        else:
            args.name = "man-pages"

    config = {
        "name": args.name,
        "man_names": man_name_list,
        "man_path": getattr(args, "man_path", ""),
        "sections": section_list,
        "description": getattr(args, "description", None),
    }

    try:
        converter = ManPageToSkillConverter(config)

        # Extract
        if not converter.extract_manpages():
            print("\n❌ Man page extraction failed -- see error above", file=sys.stderr)
            sys.exit(1)

        # Build skill
        converter.build_skill()

        # Enhancement Workflow Integration
        from skill_seekers.cli.workflow_runner import run_workflows

        workflow_executed, workflow_names = run_workflows(args)
        workflow_name = ", ".join(workflow_names) if workflow_names else None

        # Traditional enhancement (complements workflow system)
        if getattr(args, "enhance_level", 0) > 0:
            api_key = getattr(args, "api_key", None) or os.environ.get("ANTHROPIC_API_KEY")
            mode = "API" if api_key else "LOCAL"

            print("\n" + "=" * 80)
            print(f"🤖 Traditional AI Enhancement ({mode} mode, level {args.enhance_level})")
            print("=" * 80)
            if workflow_executed:
                print(f"  Running after workflow: {workflow_name}")
                print(
                    "  (Workflow provides specialized analysis,"
                    " enhancement provides general improvements)"
                )
            print("")

            skill_dir = converter.skill_dir
            if api_key:
                try:
                    from skill_seekers.cli.enhance_skill import enhance_skill_md

                    enhance_skill_md(skill_dir, api_key)
                    print("✅ API enhancement complete!")
                except ImportError:
                    print("❌ API enhancement not available. Falling back to LOCAL mode...")
                    from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer

                    agent = getattr(args, "agent", None) if args else None
                    agent_cmd = getattr(args, "agent_cmd", None) if args else None
                    enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                    enhancer.run(headless=True)
                    print("✅ Local enhancement complete!")
            else:
                from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer

                agent = getattr(args, "agent", None) if args else None
                agent_cmd = getattr(args, "agent_cmd", None) if args else None
                enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                enhancer.run(headless=True)
                print("✅ Local enhancement complete!")

    except RuntimeError as e:
        print(f"\n❌ Error: {e}", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"\n❌ Unexpected error during man page processing: {e}", file=sys.stderr)
        import traceback

        traceback.print_exc()
        sys.exit(1)

    return 0


if __name__ == "__main__":
    sys.exit(main())

@@ -16,17 +16,17 @@ Usage:
    skill-seekers notion --from-json output/myskill_notion_data.json --name myskill
"""

import argparse
import csv
import json
import logging
import os
import re
import sys
import time
from pathlib import Path
from typing import Any

from skill_seekers.cli.skill_converter import SkillConverter

# Optional dependency guard — notion-client is not a core dependency
try:
    from notion_client import Client as NotionClient
@@ -71,7 +71,7 @@ def infer_description_from_notion(metadata: dict | None = None, name: str = "")
    )


-class NotionToSkillConverter:
+class NotionToSkillConverter(SkillConverter):
    """Convert Notion workspace content (database or page tree) to a skill.

    Args:
@@ -79,7 +79,10 @@ class NotionToSkillConverter:
        token, description, max_pages.
    """

    SOURCE_TYPE = "notion"

    def __init__(self, config: dict) -> None:
        super().__init__(config)
        self.config = config
        self.name: str = config["name"]
        self.database_id: str | None = config.get("database_id")
@@ -109,6 +112,10 @@ class NotionToSkillConverter:
        logger.info("Notion API client initialised")
        return self._client

    def extract(self):
        """Extract content from Notion (SkillConverter interface)."""
        self.extract_notion()

    # -- Public extraction -----------------------------------------------

    def extract_notion(self) -> bool:
@@ -857,173 +864,3 @@ class NotionToSkillConverter:
        """Strip trailing Notion hex IDs from export filenames."""
        cleaned = re.sub(r"\s+[0-9a-f]{16,}$", "", stem)
        return cleaned.strip() or stem


# ---------------------------------------------------------------------------
# CLI entry point
# ---------------------------------------------------------------------------


def main() -> int:
    """CLI entry point for the Notion scraper."""
    from .arguments.common import add_all_standard_arguments

    parser = argparse.ArgumentParser(
        description="Convert Notion workspace content to AI-ready skill",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=(
            "Examples:\n"
            "  skill-seekers notion --database-id ID --token $NOTION_TOKEN --name myskill\n"
            "  skill-seekers notion --page-id ID --token $NOTION_TOKEN --name myskill\n"
            "  skill-seekers notion --export-path ./export/ --name myskill\n"
            "  skill-seekers notion --from-json output/myskill_notion_data.json --name myskill"
        ),
    )
    add_all_standard_arguments(parser)

    # Override enhance-level default to 0 for Notion
    for action in parser._actions:
        if hasattr(action, "dest") and action.dest == "enhance_level":
            action.default = 0

    # Notion-specific arguments
    parser.add_argument(
        "--database-id", type=str, help="Notion database ID (API mode)", metavar="ID"
    )
    parser.add_argument(
        "--page-id", type=str, help="Notion page ID (API mode, recursive)", metavar="ID"
    )
    parser.add_argument(
        "--export-path", type=str, help="Notion export directory (export mode)", metavar="PATH"
    )
    parser.add_argument(
        "--token", type=str, help="Notion integration token (or NOTION_TOKEN env)", metavar="TOKEN"
    )
    parser.add_argument(
        "--max-pages",
        type=int,
        default=DEFAULT_MAX_PAGES,
        help=f"Maximum pages to extract (default: {DEFAULT_MAX_PAGES})",
        metavar="N",
    )
    parser.add_argument(
        "--from-json", type=str, help="Build from previously extracted JSON", metavar="FILE"
    )

    args = parser.parse_args()

    # Logging
    level = (
        logging.WARNING
        if getattr(args, "quiet", False)
        else (logging.DEBUG if getattr(args, "verbose", False) else logging.INFO)
    )
    logging.basicConfig(level=level, format="%(message)s", force=True)

    # Dry run
    if getattr(args, "dry_run", False):
        source = (
            getattr(args, "database_id", None)
            or getattr(args, "page_id", None)
            or getattr(args, "export_path", None)
            or getattr(args, "from_json", None)
            or "(none)"
        )
        print(f"\n{'=' * 60}\nDRY RUN: Notion Extraction\n{'=' * 60}")
        print(
            f"Source: {source}\nName: {getattr(args, 'name', None) or '(auto)'}\nMax pages: {args.max_pages}"
        )
        return 0

    # Validate
    has_source = any(
        getattr(args, a, None) for a in ("database_id", "page_id", "export_path", "from_json")
    )
    if not has_source:
        parser.error("Must specify --database-id, --page-id, --export-path, or --from-json")
    if not getattr(args, "name", None):
        if getattr(args, "from_json", None):
            args.name = Path(args.from_json).stem.replace("_notion_data", "")
        elif getattr(args, "export_path", None):
            args.name = Path(args.export_path).stem
        else:
            parser.error("--name is required when using --database-id or --page-id")

    # --from-json: build only
    if getattr(args, "from_json", None):
        config = {
            "name": args.name,
            "description": getattr(args, "description", None),
            "max_pages": args.max_pages,
        }
        try:
            conv = NotionToSkillConverter(config)
            conv.load_extracted_data(args.from_json)
            conv.build_skill()
        except Exception as e:
            print(f"\n❌ Error: {e}", file=sys.stderr)
            sys.exit(1)  # noqa: E702
        return 0

    # Full extract + build
    config: dict[str, Any] = {
        "name": args.name,
        "database_id": getattr(args, "database_id", None),
        "page_id": getattr(args, "page_id", None),
        "export_path": getattr(args, "export_path", None),
        "token": getattr(args, "token", None),
        "description": getattr(args, "description", None),
        "max_pages": args.max_pages,
    }
    try:
        conv = NotionToSkillConverter(config)
        if not conv.extract_notion():
            print("\n❌ Notion extraction failed", file=sys.stderr)
            sys.exit(1)  # noqa: E702
        conv.build_skill()

        # Run enhancement workflows if specified
        try:
            from skill_seekers.cli.workflow_runner import run_workflows

            run_workflows(args)
        except (ImportError, AttributeError):
            pass

        # Traditional AI enhancement
        if getattr(args, "enhance_level", 0) > 0:
            api_key = getattr(args, "api_key", None) or os.environ.get("ANTHROPIC_API_KEY")
            skill_dir = conv.skill_dir
            if api_key:
                try:
                    from skill_seekers.cli.enhance_skill import enhance_skill_md

                    enhance_skill_md(skill_dir, api_key)
                except ImportError:
                    from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer

                    agent = getattr(args, "agent", None) if args else None
                    agent_cmd = getattr(args, "agent_cmd", None) if args else None
                    enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                    enhancer.run(headless=True)
            else:
                from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer

                agent = getattr(args, "agent", None) if args else None
                agent_cmd = getattr(args, "agent_cmd", None) if args else None
                enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                enhancer.run(headless=True)
    except RuntimeError as e:
        print(f"\n❌ Error: {e}", file=sys.stderr)
        sys.exit(1)  # noqa: E702
    except Exception as e:
        print(f"\n❌ Unexpected error: {e}", file=sys.stderr)
        import traceback

        traceback.print_exc()
        sys.exit(1)  # noqa: E702
    return 0


if __name__ == "__main__":
    sys.exit(main())

@@ -22,14 +22,11 @@ Usage:
    python3 -m skill_seekers.cli.openapi_scraper --spec spec.yaml --name my-api
"""

import argparse
import copy
import json
import logging
import os
import re
import sys
from pathlib import Path
from typing import Any

# Optional dependency guard
@@ -40,6 +37,8 @@ try:
except ImportError:
    YAML_AVAILABLE = False

from skill_seekers.cli.skill_converter import SkillConverter

logger = logging.getLogger(__name__)

# HTTP methods recognized in OpenAPI path items
@@ -90,7 +89,7 @@ def infer_description_from_spec(info: dict | None = None, name: str = "") -> str
    return f"Use when working with the {name} API" if name else "Use when working with this API"


-class OpenAPIToSkillConverter:
+class OpenAPIToSkillConverter(SkillConverter):
    """Convert OpenAPI/Swagger specifications to AI-ready skills.

    Supports OpenAPI 2.0 (Swagger), 3.0, and 3.1 specifications in both
@@ -111,6 +110,8 @@
        extracted_data: Structured extraction result with endpoints, schemas, etc.
    """

    SOURCE_TYPE = "openapi"

    def __init__(self, config: dict) -> None:
        """Initialize the converter with configuration.

@@ -125,6 +126,7 @@
            ValueError: If neither spec_path nor spec_url is provided and
                no from_json workflow is intended.
        """
        super().__init__(config)
        self.config = config
        self.name = config["name"]
        self.spec_path: str = config.get("spec_path", "")
@@ -142,6 +144,10 @@
        self.extracted_data: dict[str, Any] = {}
        self.openapi_version: str = ""

    def extract(self):
        """Extract content from OpenAPI spec (SkillConverter interface)."""
        self.extract_spec()

    # ──────────────────────────────────────────────────────────────────────
    # Spec loading
    # ──────────────────────────────────────────────────────────────────────
@@ -1772,192 +1778,3 @@
        safe = re.sub(r"[^\w\s-]", "", name.lower())
        safe = re.sub(r"[-\s]+", "_", safe)
        return safe


# ──────────────────────────────────────────────────────────────────────────────
# CLI entry point
# ──────────────────────────────────────────────────────────────────────────────


def main() -> int:
    """CLI entry point for the OpenAPI scraper.

    Supports three input modes:
    1. Local spec file: --spec path/to/spec.yaml
    2. Remote spec URL: --spec-url https://example.com/openapi.json
    3. Pre-extracted JSON: --from-json extracted.json

    Standard arguments (--name, --description, --verbose, --quiet, --dry-run)
    are provided by the shared argument system.
    """
    _check_yaml_deps()

    parser = argparse.ArgumentParser(
        description="Convert OpenAPI/Swagger specifications to AI-ready skills",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s --spec petstore.yaml --name petstore-api
  %(prog)s --spec-url https://petstore3.swagger.io/api/v3/openapi.json --name petstore
  %(prog)s --from-json petstore_extracted.json
""",
    )

    # Standard shared arguments
    from .arguments.common import add_all_standard_arguments

    add_all_standard_arguments(parser)

    # Override enhance-level default to 0 for OpenAPI
    for action in parser._actions:
        if hasattr(action, "dest") and action.dest == "enhance_level":
            action.default = 0
            action.help = (
                "AI enhancement level (auto-detects API vs LOCAL mode): "
                "0=disabled (default for OpenAPI), 1=SKILL.md only, "
                "2=+architecture/config, 3=full enhancement. "
                "Mode selection: uses API if ANTHROPIC_API_KEY is set, "
                "otherwise LOCAL (Claude Code, Kimi, etc.)"
            )

    # OpenAPI-specific arguments
    parser.add_argument(
        "--spec",
        type=str,
        help="Local path to OpenAPI/Swagger spec file (YAML or JSON)",
        metavar="PATH",
    )
    parser.add_argument(
        "--spec-url",
        type=str,
        help="Remote URL to fetch OpenAPI/Swagger spec from",
        metavar="URL",
    )
    parser.add_argument(
        "--from-json",
        type=str,
        help="Build skill from previously extracted JSON data",
        metavar="FILE",
    )

    args = parser.parse_args()

    # Setup logging
    if getattr(args, "quiet", False):
        logging.basicConfig(level=logging.WARNING, format="%(message)s")
    elif getattr(args, "verbose", False):
        logging.basicConfig(level=logging.DEBUG, format="%(levelname)s: %(message)s")
    else:
        logging.basicConfig(level=logging.INFO, format="%(message)s")

    # Handle --dry-run
    if getattr(args, "dry_run", False):
        source = args.spec or args.spec_url or args.from_json or "(none)"
        print(f"\n{'=' * 60}")
        print("DRY RUN: OpenAPI Specification Extraction")
        print(f"{'=' * 60}")
        print(f"Source: {source}")
        print(f"Name: {getattr(args, 'name', None) or '(auto-detect)'}")
        print(f"Enhance level: {getattr(args, 'enhance_level', 0)}")
        print(f"\n✅ Dry run complete")
        return 0

    # Validate inputs
    if not (args.spec or args.spec_url or args.from_json):
        parser.error("Must specify --spec (file path), --spec-url (URL), or --from-json")

    # Build from pre-extracted JSON
    if args.from_json:
        name = args.name or Path(args.from_json).stem.replace("_extracted", "")
        config: dict[str, Any] = {
            "name": name,
            "description": (args.description or f"Use when working with the {name} API"),
        }
        converter = OpenAPIToSkillConverter(config)
        converter.load_extracted_data(args.from_json)
        converter.build_skill()
        return 0

    # Determine name
    if not args.name:
        if args.spec:
            name = Path(args.spec).stem
        elif args.spec_url:
            # Derive name from URL
            from urllib.parse import urlparse

            url_path = urlparse(args.spec_url).path
            name = Path(url_path).stem if url_path else "api"
        else:
            name = "api"
    else:
        name = args.name

    # Build config
    config = {
        "name": name,
        "spec_path": args.spec or "",
        "spec_url": args.spec_url or "",
    }
    if args.description:
        config["description"] = args.description

    # Create converter and run
    try:
        converter = OpenAPIToSkillConverter(config)

        if not converter.extract_spec():
print("\n OpenAPI extraction failed", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
converter.build_skill()
|
||||
|
||||
# Enhancement workflow integration
|
||||
if getattr(args, "enhance_level", 0) > 0:
|
||||
api_key = getattr(args, "api_key", None) or os.environ.get("ANTHROPIC_API_KEY")
|
||||
mode = "API" if api_key else "LOCAL"
|
||||
|
||||
print(f"\n{'=' * 80}")
|
||||
print(f" AI Enhancement ({mode} mode, level {args.enhance_level})")
|
||||
print("=" * 80)
|
||||
|
||||
skill_dir = converter.skill_dir
|
||||
if api_key:
|
||||
try:
|
||||
from skill_seekers.cli.enhance_skill import enhance_skill_md
|
||||
|
||||
enhance_skill_md(skill_dir, api_key)
|
||||
print(" API enhancement complete!")
|
||||
except ImportError:
|
||||
print(" API enhancement not available. Falling back to LOCAL mode...")
|
||||
from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer
|
||||
|
||||
agent = getattr(args, "agent", None) if args else None
|
||||
agent_cmd = getattr(args, "agent_cmd", None) if args else None
|
||||
enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
|
||||
enhancer.run(headless=True)
|
||||
print(" Local enhancement complete!")
|
||||
else:
|
||||
from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer
|
||||
|
||||
agent = getattr(args, "agent", None) if args else None
|
||||
agent_cmd = getattr(args, "agent_cmd", None) if args else None
|
||||
enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
|
||||
enhancer.run(headless=True)
|
||||
print(" Local enhancement complete!")
|
||||
|
||||
except (ValueError, RuntimeError) as e:
|
||||
print(f"\n Error: {e}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
except Exception as e:
|
||||
print(f"\n Unexpected error during OpenAPI processing: {e}", file=sys.stderr)
|
||||
import traceback
|
||||
|
||||
traceback.print_exc()
|
||||
sys.exit(1)
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
|
||||
@@ -1,21 +1,15 @@
"""Parser registry and factory.

This module registers all subcommand parsers and provides a factory
-function to create them.
+function to create them. Individual scraper commands have been removed —
+use `skill-seekers create <source>` for all source types.
"""

from .base import SubcommandParser

-# Import all parser classes
-from .create_parser import CreateParser  # NEW: Unified create command
+# Import parser classes (scrapers removed — use create command)
+from .create_parser import CreateParser
from .config_parser import ConfigParser
-from .scrape_parser import ScrapeParser
-from .github_parser import GitHubParser
-from .pdf_parser import PDFParser
-from .word_parser import WordParser
-from .epub_parser import EpubParser
-from .video_parser import VideoParser
-from .unified_parser import UnifiedParser
from .enhance_parser import EnhanceParser
from .enhance_status_parser import EnhanceStatusParser
from .package_parser import PackageParser
@@ -23,7 +17,6 @@ from .upload_parser import UploadParser
from .estimate_parser import EstimateParser
from .test_examples_parser import TestExamplesParser
from .install_agent_parser import InstallAgentParser
from .analyze_parser import AnalyzeParser
from .install_parser import InstallParser
from .resume_parser import ResumeParser
from .stream_parser import StreamParser
@@ -34,57 +27,26 @@ from .workflows_parser import WorkflowsParser
from .sync_config_parser import SyncConfigParser
from .doctor_parser import DoctorParser

# New source type parsers (v3.2.0+)
from .jupyter_parser import JupyterParser
from .html_parser import HtmlParser
from .openapi_parser import OpenAPIParser
from .asciidoc_parser import AsciiDocParser
from .pptx_parser import PptxParser
from .rss_parser import RssParser
from .manpage_parser import ManPageParser
from .confluence_parser import ConfluenceParser
from .notion_parser import NotionParser
from .chat_parser import ChatParser

-# Registry of all parsers (in order of usage frequency)
+# Registry of all parsers
PARSERS = [
-    CreateParser(),  # NEW: Unified create command (placed first for prominence)
+    CreateParser(),
    DoctorParser(),
    ConfigParser(),
-    ScrapeParser(),
-    GitHubParser(),
    PackageParser(),
    UploadParser(),
    AnalyzeParser(),
    EnhanceParser(),
    EnhanceStatusParser(),
-    PDFParser(),
-    WordParser(),
-    EpubParser(),
-    VideoParser(),
-    UnifiedParser(),
    PackageParser(),
    UploadParser(),
    EstimateParser(),
    InstallParser(),
    InstallAgentParser(),
    TestExamplesParser(),
    ResumeParser(),
    StreamParser(),
    UpdateParser(),
    MultilangParser(),
    QualityParser(),
    WorkflowsParser(),
    SyncConfigParser(),
    # New source types (v3.2.0+)
    JupyterParser(),
    HtmlParser(),
    OpenAPIParser(),
    AsciiDocParser(),
    PptxParser(),
    RssParser(),
    ManPageParser(),
    ConfluenceParser(),
    NotionParser(),
    ChatParser(),
    StreamParser(),
    UpdateParser(),
    MultilangParser(),
]

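The registry above is a flat list of parser instances behind a single `create` entry point. A hypothetical sketch of how such a registry can be wired into argparse — `PARSERS` and `SubcommandParser` are names from the file; the `name`/`help` attributes and the wiring loop are illustrative assumptions, not the project's actual factory code:

```python
import argparse


# Minimal stand-ins for the registry pattern shown above (illustrative only).
class SubcommandParser:
    name = "base"
    help = ""

    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
        pass


class CreateParser(SubcommandParser):
    name = "create"
    help = "Create a skill from any source type"

    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
        parser.add_argument("source")


PARSERS = [CreateParser()]


def build_cli() -> argparse.ArgumentParser:
    """Register every parser in the list as a subcommand of one root parser."""
    root = argparse.ArgumentParser(prog="skill-seekers")
    subs = root.add_subparsers(dest="command")
    for p in PARSERS:
        sub = subs.add_parser(p.name, help=p.help)
        p.add_arguments(sub)
    return root


args = build_cli().parse_args(["create", "https://example.com/docs"])
print(args.command, args.source)
```

Keeping the registry as plain instances (rather than classes) lets the list order double as the order subcommands appear in `--help`.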
@@ -48,14 +48,13 @@ Presets: -p quick (1-2min) | -p standard (5-10min) | -p comprehensive (20-60min)
    def add_arguments(self, parser):
        """Add create-specific arguments.

        Uses shared argument definitions with progressive disclosure.
-        Default mode shows only universal arguments (15 flags).
-
-        Multi-mode help handled via custom flags detected in argument parsing.
+        Registers ALL arguments (120+ flags) so the top-level parser
+        accepts source-specific flags like --browser, --max-pages, etc.
+        Default help still shows only universal args; use --help-all for full list.
        """
-        # Add all arguments in 'default' mode (universal only)
-        # This keeps help text clean and focused
-        add_create_arguments(parser, mode="default")
+        # Register all arguments so source-specific flags are accepted
+        # by the top-level parser (create is the only entry point now)
+        add_create_arguments(parser, mode="all")

        # Add hidden help mode flags
        # These won't show in default help but can be used to get source-specific help

@@ -11,16 +11,14 @@ Usage:
    python3 pdf_scraper.py --from-json manual_extracted.json
"""

import argparse
import json
import logging
import os
import re
import sys
from pathlib import Path

# Import the PDF extractor
from .pdf_extractor_poc import PDFExtractor
+from .skill_converter import SkillConverter


def infer_description_from_pdf(pdf_metadata: dict = None, name: str = "") -> str:
@@ -62,10 +60,13 @@ def infer_description_from_pdf(pdf_metadata: dict = None, name: str = "") -> str
    )


-class PDFToSkillConverter:
+class PDFToSkillConverter(SkillConverter):
    """Convert PDF documentation to AI skill"""

+    SOURCE_TYPE = "pdf"
+
    def __init__(self, config):
+        super().__init__(config)
        self.config = config
        self.name = config["name"]
        self.pdf_path = config.get("pdf_path", "")
@@ -87,6 +88,10 @@ class PDFToSkillConverter:
        # Extracted data
        self.extracted_data = None

+    def extract(self):
+        """SkillConverter interface — delegates to extract_pdf()."""
+        return self.extract_pdf()
+
    def extract_pdf(self):
        """Extract content from PDF using pdf_extractor_poc.py"""
        print(f"\n🔍 Extracting from PDF: {self.pdf_path}")
@@ -631,151 +636,3 @@ class PDFToSkillConverter:
        safe = re.sub(r"[^\w\s-]", "", name.lower())
        safe = re.sub(r"[-\s]+", "_", safe)
        return safe
-
-
-def main():
-    from .arguments.pdf import add_pdf_arguments
-
-    parser = argparse.ArgumentParser(
-        description="Convert PDF documentation to AI skill",
-        formatter_class=argparse.RawDescriptionHelpFormatter,
-    )
-
-    add_pdf_arguments(parser)
-
-    args = parser.parse_args()
-
-    # Set logging level from behavior args
-    if getattr(args, "quiet", False):
-        logging.getLogger().setLevel(logging.WARNING)
-    elif getattr(args, "verbose", False):
-        logging.getLogger().setLevel(logging.DEBUG)
-
-    # Handle --dry-run
-    if getattr(args, "dry_run", False):
-        source = args.pdf or args.config or args.from_json or "(none)"
-        print(f"\n{'=' * 60}")
-        print(f"DRY RUN: PDF Extraction")
-        print(f"{'=' * 60}")
-        print(f"Source: {source}")
-        print(f"Name: {getattr(args, 'name', None) or '(auto-detect)'}")
-        print(f"Enhance level: {getattr(args, 'enhance_level', 0)}")
-        print(f"\n✅ Dry run complete")
-        return
-
-    # Validate inputs
-    if not (args.config or args.pdf or args.from_json):
-        parser.error("Must specify --config, --pdf, or --from-json")
-
-    # Load or create config
-    if args.config:
-        with open(args.config) as f:
-            config = json.load(f)
-    elif args.from_json:
-        # Build from extracted JSON
-        name = Path(args.from_json).stem.replace("_extracted", "")
-        config = {
-            "name": name,
-            "description": args.description or f"Use when referencing {name} documentation",
-        }
-        converter = PDFToSkillConverter(config)
-        converter.load_extracted_data(args.from_json)
-        converter.build_skill()
-        return
-    else:
-        # Direct PDF mode
-        if not args.name:
-            parser.error("Must specify --name with --pdf")
-        config = {
-            "name": args.name,
-            "pdf_path": args.pdf,
-            "description": args.description or f"Use when referencing {args.name} documentation",
-            "extract_options": {
-                "chunk_size": 10,
-                "min_quality": 5.0,
-                "extract_images": True,
-                "min_image_size": 100,
-            },
-        }
-
-    # Create converter
-    try:
-        converter = PDFToSkillConverter(config)
-
-        # Extract if needed
-        if config.get("pdf_path") and not converter.extract_pdf():
-            print("\n❌ PDF extraction failed - see error above", file=sys.stderr)
-            sys.exit(1)
-
-        # Build skill
-        converter.build_skill()
-
-        # ═══════════════════════════════════════════════════════════════════════════
-        # Enhancement Workflow Integration (Phase 2 - PDF Support)
-        # ═══════════════════════════════════════════════════════════════════════════
-        from skill_seekers.cli.workflow_runner import run_workflows
-
-        workflow_executed, workflow_names = run_workflows(args)
-        workflow_name = ", ".join(workflow_names) if workflow_names else None
-
-        # ═══════════════════════════════════════════════════════════════════════════
-        # Traditional Enhancement (complements workflow system)
-        # ═══════════════════════════════════════════════════════════════════════════
-        if getattr(args, "enhance_level", 0) > 0:
-            import os
-
-            api_key = getattr(args, "api_key", None) or os.environ.get("ANTHROPIC_API_KEY")
-            mode = "API" if api_key else "LOCAL"
-
-            print("\n" + "=" * 80)
-            print(f"🤖 Traditional AI Enhancement ({mode} mode, level {args.enhance_level})")
-            print("=" * 80)
-            if workflow_executed:
-                print(f" Running after workflow: {workflow_name}")
-                print(
-                    " (Workflow provides specialized analysis, enhancement provides general improvements)"
-                )
-                print("")
-
-            skill_dir = converter.skill_dir
-            if api_key:
-                try:
-                    from skill_seekers.cli.enhance_skill import enhance_skill_md
-
-                    enhance_skill_md(skill_dir, api_key)
-                    print("✅ API enhancement complete!")
-                except ImportError:
-                    print("❌ API enhancement not available. Falling back to LOCAL mode...")
-                    from pathlib import Path
-                    from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer
-
-                    agent = getattr(args, "agent", None) if args else None
-                    agent_cmd = getattr(args, "agent_cmd", None) if args else None
-                    enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
-                    enhancer.run(headless=True)
-                    agent_name = agent or "claude"
-                    print(f"✅ Local enhancement complete! (via {agent_name})")
-            else:
-                from pathlib import Path
-                from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer
-
-                agent = getattr(args, "agent", None) if args else None
-                agent_cmd = getattr(args, "agent_cmd", None) if args else None
-                enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
-                enhancer.run(headless=True)
-                agent_name = agent or "claude"
-                print(f"✅ Local enhancement complete! (via {agent_name})")
-
-    except RuntimeError as e:
-        print(f"\n❌ Error: {e}", file=sys.stderr)
-        sys.exit(1)
-    except Exception as e:
-        print(f"\n❌ Unexpected error during PDF processing: {e}", file=sys.stderr)
-        import traceback
-
-        traceback.print_exc()
-        sys.exit(1)
-
-
-if __name__ == "__main__":
-    main()

@@ -15,12 +15,10 @@ Usage:
    skill-seekers pptx --from-json presentation_extracted.json
"""

import argparse
import json
import logging
import os
import re
import sys
from pathlib import Path

# Optional dependency guard
@@ -33,6 +31,8 @@ try:
except ImportError:
    PPTX_AVAILABLE = False

+from skill_seekers.cli.skill_converter import SkillConverter
+
logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------
@@ -147,7 +147,7 @@ def infer_description_from_pptx(
# ---------------------------------------------------------------------------


-class PptxToSkillConverter:
+class PptxToSkillConverter(SkillConverter):
    """Convert PowerPoint presentation (.pptx) to an AI-ready skill.

    Follows the same pipeline pattern as the Word, EPUB, and PDF scrapers:
@@ -165,6 +165,8 @@ class PptxToSkillConverter:
    .pptx files (merged into a single skill).
    """

+    SOURCE_TYPE = "pptx"
+
    def __init__(self, config: dict) -> None:
        """Initialize the converter with a configuration dictionary.

@@ -175,6 +177,7 @@ class PptxToSkillConverter:
        - description (str): Skill description (optional, inferred if absent)
        - categories (dict): Manual category assignments (optional)
        """
+        super().__init__(config)
        self.config = config
        self.name: str = config["name"]
        self.pptx_path: str = config.get("pptx_path", "")
@@ -192,6 +195,10 @@ class PptxToSkillConverter:
        # Extracted data (populated by extract_pptx or load_extracted_data)
        self.extracted_data: dict | None = None

+    def extract(self):
+        """Extract content from PowerPoint files (SkillConverter interface)."""
+        self.extract_pptx()
+
    # ------------------------------------------------------------------
    # Extraction
    # ------------------------------------------------------------------
@@ -1661,165 +1668,3 @@ def _score_code_quality(code: str) -> float:
        score -= 2.0

    return min(10.0, max(0.0, score))
-
-
-# ---------------------------------------------------------------------------
-# CLI entry point
-# ---------------------------------------------------------------------------
-
-
-def main() -> int:
-    """CLI entry point for the PowerPoint scraper.
-
-    Parses command-line arguments and runs the extraction and skill-building
-    pipeline. Supports direct .pptx input, directory input, and loading from
-    previously extracted JSON.
-
-    Returns:
-        Exit code (0 for success, non-zero for errors).
-    """
-    from skill_seekers.cli.arguments.pptx import add_pptx_arguments
-
-    parser = argparse.ArgumentParser(
-        description="Convert PowerPoint presentation (.pptx) to skill",
-        formatter_class=argparse.RawDescriptionHelpFormatter,
-    )
-
-    add_pptx_arguments(parser)
-
-    args = parser.parse_args()
-
-    # Set logging level
-    if getattr(args, "quiet", False):
-        logging.getLogger().setLevel(logging.WARNING)
-    elif getattr(args, "verbose", False):
-        logging.getLogger().setLevel(logging.DEBUG)
-
-    # Handle --dry-run
-    if getattr(args, "dry_run", False):
-        source = getattr(args, "pptx", None) or getattr(args, "from_json", None) or "(none)"
-        print(f"\n{'=' * 60}")
-        print("DRY RUN: PowerPoint Extraction")
-        print(f"{'=' * 60}")
-        print(f"Source: {source}")
-        print(f"Name: {getattr(args, 'name', None) or '(auto-detect)'}")
-        print(f"Enhance level: {getattr(args, 'enhance_level', 0)}")
-        print(f"\n✅ Dry run complete")
-        return 0
-
-    # Validate inputs
-    if not (getattr(args, "pptx", None) or getattr(args, "from_json", None)):
-        parser.error("Must specify --pptx or --from-json")
-
-    # Build from JSON workflow
-    if getattr(args, "from_json", None):
-        name = Path(args.from_json).stem.replace("_extracted", "")
-        config = {
-            "name": getattr(args, "name", None) or name,
-            "description": getattr(args, "description", None)
-            or f"Use when referencing {name} presentation",
-        }
-        try:
-            converter = PptxToSkillConverter(config)
-            converter.load_extracted_data(args.from_json)
-            converter.build_skill()
-        except Exception as e:
-            print(f"\n❌ Error: {e}", file=sys.stderr)
-            sys.exit(1)
-        return 0
-
-    # Direct PPTX mode
-    if not getattr(args, "name", None):
-        # Auto-detect name from filename or directory name
-        pptx_path = Path(args.pptx)
-        args.name = pptx_path.stem if pptx_path.is_file() else pptx_path.name
-
-    config = {
-        "name": args.name,
-        "pptx_path": args.pptx,
-        # Pass None so extract_pptx() can infer from presentation metadata
-        "description": getattr(args, "description", None),
-    }
-
-    try:
-        converter = PptxToSkillConverter(config)
-
-        # Extract
-        if not converter.extract_pptx():
-            print(
-                "\n❌ PowerPoint extraction failed - see error above",
-                file=sys.stderr,
-            )
-            sys.exit(1)
-
-        # Build skill
-        converter.build_skill()
-
-        # Enhancement Workflow Integration
-        from skill_seekers.cli.workflow_runner import run_workflows
-
-        workflow_executed, workflow_names = run_workflows(args)
-        workflow_name = ", ".join(workflow_names) if workflow_names else None
-
-        # Traditional enhancement (complements workflow system)
-        if getattr(args, "enhance_level", 0) > 0:
-            api_key = getattr(args, "api_key", None) or os.environ.get("ANTHROPIC_API_KEY")
-            mode = "API" if api_key else "LOCAL"
-
-            print("\n" + "=" * 80)
-            print(f"🤖 Traditional AI Enhancement ({mode} mode, level {args.enhance_level})")
-            print("=" * 80)
-            if workflow_executed:
-                print(f" Running after workflow: {workflow_name}")
-                print(
-                    " (Workflow provides specialized analysis,"
-                    " enhancement provides general improvements)"
-                )
-                print("")
-
-            skill_dir = converter.skill_dir
-            if api_key:
-                try:
-                    from skill_seekers.cli.enhance_skill import enhance_skill_md
-
-                    enhance_skill_md(skill_dir, api_key)
-                    print("✅ API enhancement complete!")
-                except ImportError:
-                    print("❌ API enhancement not available. Falling back to LOCAL mode...")
-                    from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer
-
-                    agent = getattr(args, "agent", None) if args else None
-                    agent_cmd = getattr(args, "agent_cmd", None) if args else None
-                    enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
-                    enhancer.run(headless=True)
-                    print("✅ Local enhancement complete!")
-            else:
-                from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer
-
-                agent = getattr(args, "agent", None) if args else None
-                agent_cmd = getattr(args, "agent_cmd", None) if args else None
-                enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
-                enhancer.run(headless=True)
-                print("✅ Local enhancement complete!")
-
-    except (FileNotFoundError, ValueError) as e:
-        print(f"\n❌ Input error: {e}", file=sys.stderr)
-        sys.exit(1)
-    except RuntimeError as e:
-        print(f"\n❌ Error: {e}", file=sys.stderr)
-        sys.exit(1)
-    except Exception as e:
-        print(
-            f"\n❌ Unexpected error during PowerPoint processing: {e}",
-            file=sys.stderr,
-        )
-        import traceback
-
-        traceback.print_exc()
-        sys.exit(1)
-
-    return 0
-
-
-if __name__ == "__main__":
-    sys.exit(main())

@@ -19,16 +19,13 @@ Usage:
    python3 -m skill_seekers.cli.rss_scraper --feed-url https://example.com/atom.xml --name myblog
"""

import argparse
import hashlib
import json
import logging
import os
import re
import sys
import time
from datetime import datetime
from pathlib import Path
from typing import Any

# Optional dependency guard — feedparser is not in core deps
@@ -42,6 +39,8 @@ except ImportError:
# BeautifulSoup is a core dependency (always available)
from bs4 import BeautifulSoup, Comment, Tag

+from skill_seekers.cli.skill_converter import SkillConverter
+
logger = logging.getLogger(__name__)

# Feed type constants
@@ -109,7 +108,7 @@ def infer_description_from_feed(
)


-class RssToSkillConverter:
+class RssToSkillConverter(SkillConverter):
    """Convert RSS/Atom feeds to AI-ready skills.

    Parses RSS 2.0, RSS 1.0 (RDF), and Atom feeds using feedparser.
@@ -117,6 +116,8 @@ class RssToSkillConverter:
    requests + BeautifulSoup.
    """

+    SOURCE_TYPE = "rss"
+
    def __init__(self, config: dict[str, Any]) -> None:
        """Initialize the converter with configuration.

@@ -125,6 +126,7 @@ class RssToSkillConverter:
        follow_links (default True), max_articles (default 50),
        and description (optional).
        """
+        super().__init__(config)
        self.config = config
        self.name: str = config["name"]
        self.feed_url: str = config.get("feed_url", "")
@@ -142,6 +144,10 @@ class RssToSkillConverter:
        # Internal state
        self.extracted_data: dict[str, Any] | None = None

+    def extract(self):
+        """Extract content from RSS/Atom feed (SkillConverter interface)."""
+        self.extract_feed()
+
    # ──────────────────────────────────────────────────────────────────────
    # Public API
    # ──────────────────────────────────────────────────────────────────────
@@ -865,227 +871,3 @@ class RssToSkillConverter:
        safe = re.sub(r"[^\w\s-]", "", name.lower())
        safe = re.sub(r"[-\s]+", "_", safe)
        return safe or "unnamed"
-
-
-# ──────────────────────────────────────────────────────────────────────────
-# CLI entry point
-# ──────────────────────────────────────────────────────────────────────────
-
-
-def main() -> int:
-    """CLI entry point for the RSS/Atom feed scraper."""
-    from .arguments.common import add_all_standard_arguments
-
-    parser = argparse.ArgumentParser(
-        description="Convert RSS/Atom feed to AI-ready skill",
-        formatter_class=argparse.RawDescriptionHelpFormatter,
-        epilog=(
-            "Examples:\n"
-            "  %(prog)s --feed-url https://example.com/feed.xml --name myblog\n"
-            "  %(prog)s --feed-path ./feed.xml --name myblog\n"
-            "  %(prog)s --feed-url https://example.com/rss --no-follow-links --name myblog\n"
-            "  %(prog)s --from-json myblog_extracted.json\n"
-        ),
-    )
-
-    # Standard arguments (name, description, output, enhance-level, etc.)
-    add_all_standard_arguments(parser)
-
-    # Override enhance-level default to 0 for RSS
-    for action in parser._actions:
-        if hasattr(action, "dest") and action.dest == "enhance_level":
-            action.default = 0
-            action.help = (
-                "AI enhancement level (auto-detects API vs LOCAL mode): "
-                "0=disabled (default for RSS), 1=SKILL.md only, "
-                "2=+architecture/config, 3=full enhancement. "
-                "Mode selection: uses API if ANTHROPIC_API_KEY is set, "
-                "otherwise LOCAL (Claude Code, Kimi, etc.)"
-            )
-
-    # RSS-specific arguments
-    parser.add_argument(
-        "--feed-url",
-        type=str,
-        help="URL of the RSS/Atom feed to scrape",
-        metavar="URL",
-    )
-    parser.add_argument(
-        "--feed-path",
-        type=str,
-        help="Local file path to an RSS/Atom XML file",
-        metavar="PATH",
-    )
-    parser.add_argument(
-        "--follow-links",
-        action="store_true",
-        default=True,
-        dest="follow_links",
-        help="Follow article links to scrape full content (default: enabled)",
-    )
-    parser.add_argument(
-        "--no-follow-links",
-        action="store_false",
-        dest="follow_links",
-        help="Do not follow article links — use feed content only",
-    )
-    parser.add_argument(
-        "--max-articles",
-        type=int,
-        default=50,
-        metavar="N",
-        help="Maximum number of articles to process (default: 50)",
-    )
-    parser.add_argument(
-        "--from-json",
-        type=str,
-        help="Build skill from previously extracted JSON file",
-        metavar="FILE",
-    )
-
-    args = parser.parse_args()
-
-    # Set logging level
-    if getattr(args, "quiet", False):
-        logging.getLogger().setLevel(logging.WARNING)
-    elif getattr(args, "verbose", False):
-        logging.getLogger().setLevel(logging.DEBUG)
-
-    # Handle --dry-run
-    if getattr(args, "dry_run", False):
-        source = (
-            getattr(args, "feed_url", None)
-            or getattr(args, "feed_path", None)
-            or getattr(args, "from_json", None)
-            or "(none)"
-        )
-        print(f"\n{'=' * 60}")
-        print("DRY RUN: RSS/Atom Feed Extraction")
-        print(f"{'=' * 60}")
-        print(f"Source: {source}")
-        print(f"Name: {getattr(args, 'name', None) or '(auto-detect)'}")
-        print(f"Follow links: {getattr(args, 'follow_links', True)}")
-        print(f"Max articles: {getattr(args, 'max_articles', 50)}")
-        print(f"Enhance level: {getattr(args, 'enhance_level', 0)}")
-        print(f"\n✅ Dry run complete")
-        return 0
-
-    # Validate inputs
-    has_source = (
-        getattr(args, "feed_url", None)
-        or getattr(args, "feed_path", None)
-        or getattr(args, "from_json", None)
-    )
-    if not has_source:
-        parser.error("Must specify --feed-url, --feed-path, or --from-json")
-
-    # Build from JSON workflow
-    if getattr(args, "from_json", None):
-        name = Path(args.from_json).stem.replace("_extracted", "")
-        config: dict[str, Any] = {
-            "name": getattr(args, "name", None) or name,
-            "description": getattr(args, "description", None)
-            or f"Use when referencing {name} feed content",
-        }
-        try:
-            converter = RssToSkillConverter(config)
-            converter.load_extracted_data(args.from_json)
-            converter.build_skill()
-        except Exception as e:
-            print(f"\n❌ Error: {e}", file=sys.stderr)
-            sys.exit(1)
-        return 0
-
-    # Feed extraction workflow
-    if not getattr(args, "name", None):
-        # Auto-detect name from URL or file path
-        if getattr(args, "feed_url", None):
-            from urllib.parse import urlparse
-
-            parsed_url = urlparse(args.feed_url)
-            args.name = parsed_url.hostname.replace(".", "-") if parsed_url.hostname else "feed"
-        elif getattr(args, "feed_path", None):
-            args.name = Path(args.feed_path).stem
-
-    config = {
-        "name": args.name,
-        "feed_url": getattr(args, "feed_url", "") or "",
-        "feed_path": getattr(args, "feed_path", "") or "",
-        "follow_links": getattr(args, "follow_links", True),
-        "max_articles": getattr(args, "max_articles", 50),
-        "description": getattr(args, "description", None),
-    }
-
-    try:
-        converter = RssToSkillConverter(config)
-
-        # Extract feed
-        if not converter.extract_feed():
-            print("\n❌ Feed extraction failed — see error above", file=sys.stderr)
-            sys.exit(1)
-
-        # Build skill
-        converter.build_skill()
-
-        # Enhancement Workflow Integration
-        from skill_seekers.cli.workflow_runner import run_workflows
-
-        workflow_executed, workflow_names = run_workflows(args)
-        workflow_name = ", ".join(workflow_names) if workflow_names else None
-
-        # Traditional enhancement (complements workflow system)
-        if getattr(args, "enhance_level", 0) > 0:
-            api_key = getattr(args, "api_key", None) or os.environ.get("ANTHROPIC_API_KEY")
-            mode = "API" if api_key else "LOCAL"
-
-            print("\n" + "=" * 80)
-            print(f"🤖 Traditional AI Enhancement ({mode} mode, level {args.enhance_level})")
-            print("=" * 80)
-            if workflow_executed:
-                print(f" Running after workflow: {workflow_name}")
-                print(
-                    " (Workflow provides specialized analysis, "
-                    "enhancement provides general improvements)"
-                )
-                print("")
-
-            skill_dir = converter.skill_dir
-            if api_key:
-                try:
-                    from skill_seekers.cli.enhance_skill import enhance_skill_md
-
-                    enhance_skill_md(skill_dir, api_key)
-                    print("✅ API enhancement complete!")
-                except ImportError:
-                    print("❌ API enhancement not available. Falling back to LOCAL mode...")
-                    from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer
-
-                    agent = getattr(args, "agent", None) if args else None
-                    agent_cmd = getattr(args, "agent_cmd", None) if args else None
-                    enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
-                    enhancer.run(headless=True)
-                    print("✅ Local enhancement complete!")
-            else:
-                from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer
-
-                agent = getattr(args, "agent", None) if args else None
-                agent_cmd = getattr(args, "agent_cmd", None) if args else None
-                enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
-                enhancer.run(headless=True)
-                print("✅ Local enhancement complete!")
-
-    except RuntimeError as e:
-        print(f"\n❌ Error: {e}", file=sys.stderr)
-        sys.exit(1)
-    except Exception as e:
-        print(f"\n❌ Unexpected error during feed processing: {e}", file=sys.stderr)
-        import traceback
-
-        traceback.print_exc()
-        sys.exit(1)
-
-    return 0
-
-
-if __name__ == "__main__":
-    sys.exit(main())

src/skill_seekers/cli/skill_converter.py (new file, 115 lines)
@@ -0,0 +1,115 @@
"""
|
||||
SkillConverter — Base interface for all source type converters.
|
||||
|
||||
Every scraper/converter inherits this and implements extract().
|
||||
The create command calls converter.run() — same interface for all 18 types.
|
||||
|
||||
Usage:
|
||||
converter = get_converter("web", config)
|
||||
converter.run() # extract + build + return exit code
|
||||
"""
|
||||
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class SkillConverter:
|
||||
"""Base interface for all skill converters.
|
||||
|
||||
Subclasses must implement extract() at minimum.
|
||||
build_skill() has a default implementation that most converters override.
|
||||
"""
|
||||
|
||||
# Override in subclass
|
||||
SOURCE_TYPE: str = "unknown"
|
||||
|
||||
def __init__(self, config: dict[str, Any]):
|
||||
self.config = config
|
||||
self.name = config.get("name", "unnamed")
|
||||
self.skill_dir = f"output/{self.name}"
|
||||
|
||||
def run(self) -> int:
|
||||
"""Main entry point — extract source and build skill.
|
||||
|
||||
Returns:
|
||||
Exit code (0 for success, non-zero for failure).
|
||||
"""
|
||||
try:
|
||||
logger.info(f"Extracting from {self.SOURCE_TYPE} source: {self.name}")
|
||||
self.extract()
|
||||
result = self.build_skill()
|
||||
if result is False:
|
||||
logger.error(f"❌ {self.SOURCE_TYPE} build_skill() reported failure")
|
||||
return 1
|
||||
logger.info(f"✅ Skill built: {self.skill_dir}/")
|
||||
return 0
|
||||
except Exception as e:
|
||||
logger.exception(f"❌ {self.SOURCE_TYPE} extraction failed: {e}")
|
||||
return 1
|
||||
|
||||
def extract(self):
|
||||
"""Extract content from source. Override in subclass."""
|
||||
raise NotImplementedError(f"{self.__class__.__name__} must implement extract()")
|
||||
|
||||
def build_skill(self):
|
||||
"""Build SKILL.md from extracted data. Override in subclass."""
|
||||
raise NotImplementedError(f"{self.__class__.__name__} must implement build_skill()")
|
||||
|
||||
|
||||
# Registry mapping source type → (module_path, class_name)
|
||||
CONVERTER_REGISTRY: dict[str, tuple[str, str]] = {
|
||||
"web": ("skill_seekers.cli.doc_scraper", "DocToSkillConverter"),
|
||||
"github": ("skill_seekers.cli.github_scraper", "GitHubScraper"),
|
||||
"pdf": ("skill_seekers.cli.pdf_scraper", "PDFToSkillConverter"),
|
||||
"word": ("skill_seekers.cli.word_scraper", "WordToSkillConverter"),
|
||||
"epub": ("skill_seekers.cli.epub_scraper", "EpubToSkillConverter"),
|
||||
"video": ("skill_seekers.cli.video_scraper", "VideoToSkillConverter"),
|
||||
"local": ("skill_seekers.cli.codebase_scraper", "CodebaseAnalyzer"),
|
||||
"jupyter": ("skill_seekers.cli.jupyter_scraper", "JupyterToSkillConverter"),
|
||||
"html": ("skill_seekers.cli.html_scraper", "HtmlToSkillConverter"),
|
||||
"openapi": ("skill_seekers.cli.openapi_scraper", "OpenAPIToSkillConverter"),
|
||||
"asciidoc": ("skill_seekers.cli.asciidoc_scraper", "AsciiDocToSkillConverter"),
|
||||
"pptx": ("skill_seekers.cli.pptx_scraper", "PptxToSkillConverter"),
|
||||
"rss": ("skill_seekers.cli.rss_scraper", "RssToSkillConverter"),
|
||||
"manpage": ("skill_seekers.cli.man_scraper", "ManPageToSkillConverter"),
|
||||
"confluence": ("skill_seekers.cli.confluence_scraper", "ConfluenceToSkillConverter"),
|
||||
"notion": ("skill_seekers.cli.notion_scraper", "NotionToSkillConverter"),
|
||||
"chat": ("skill_seekers.cli.chat_scraper", "ChatToSkillConverter"),
|
||||
# NOTE: UnifiedScraper takes (config_path: str), not (config: dict).
|
||||
# Callers must construct it directly, not via get_converter().
|
||||
"config": ("skill_seekers.cli.unified_scraper", "UnifiedScraper"),
|
||||
}
|
||||
|
||||
|
||||
def get_converter(source_type: str, config: dict[str, Any]) -> SkillConverter:
|
||||
"""Get the appropriate converter for a source type.
|
||||
|
||||
Args:
|
||||
source_type: Source type from SourceDetector (web, github, pdf, etc.)
|
||||
config: Configuration dict for the converter.
|
||||
|
||||
Returns:
|
||||
Initialized converter instance.
|
||||
|
||||
Raises:
|
||||
ValueError: If source type is not supported.
|
||||
"""
|
||||
import importlib
|
||||
|
||||
if source_type not in CONVERTER_REGISTRY:
|
||||
raise ValueError(
|
||||
f"Unknown source type: {source_type}. "
|
||||
f"Supported: {', '.join(sorted(CONVERTER_REGISTRY))}"
|
||||
)
|
||||
|
||||
module_path, class_name = CONVERTER_REGISTRY[source_type]
|
||||
module = importlib.import_module(module_path)
|
||||
converter_class = getattr(module, class_name, None)
|
||||
if converter_class is None:
|
||||
raise ValueError(
|
||||
f"Class '{class_name}' not found in module '{module_path}'. "
|
||||
f"Check CONVERTER_REGISTRY entry for '{source_type}'."
|
||||
)
|
||||
return converter_class(config)
|
||||
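The registry-plus-importlib lookup used by `get_converter()` can be sketched in isolation. This is a minimal stand-in, not the real module: the registry entries below point at stdlib classes instead of the actual `skill_seekers` converter modules, purely so the pattern is runnable.

```python
import importlib

# Hypothetical mini-registry mirroring CONVERTER_REGISTRY's shape:
# source type -> (module_path, class_name). Stdlib classes stand in
# for the real converter classes.
REGISTRY: dict[str, tuple[str, str]] = {
    "json": ("json", "JSONDecoder"),
    "config": ("configparser", "ConfigParser"),
}


def resolve(source_type: str):
    """Resolve a registry entry to a class, the way get_converter() does."""
    if source_type not in REGISTRY:
        raise ValueError(f"Unknown source type: {source_type}")
    module_path, class_name = REGISTRY[source_type]
    module = importlib.import_module(module_path)
    cls = getattr(module, class_name, None)
    if cls is None:
        raise ValueError(f"Class '{class_name}' not found in '{module_path}'")
    return cls


print(resolve("json").__name__)  # JSONDecoder
```

The lazy `import_module` call is the point of the design: none of the 18 converter modules (with their heavy optional dependencies) are imported until a matching source type is actually requested.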
@@ -12,7 +12,6 @@ Usage:
    skill-seekers unified --config configs/react_unified.json --merge-mode ai-enhanced
"""

import argparse
import json
import logging
import os
@@ -28,8 +27,8 @@ try:
    from skill_seekers.cli.config_validator import validate_config
    from skill_seekers.cli.conflict_detector import ConflictDetector
    from skill_seekers.cli.merge_sources import AIEnhancedMerger, RuleBasedMerger
    from skill_seekers.cli.skill_converter import SkillConverter
    from skill_seekers.cli.unified_skill_builder import UnifiedSkillBuilder
    from skill_seekers.cli.utils import setup_logging
except ImportError as e:
    print(f"Error importing modules: {e}")
    print("Make sure you're running from the project root directory")
@@ -38,7 +37,7 @@ except ImportError as e:
logger = logging.getLogger(__name__)


class UnifiedScraper:
class UnifiedScraper(SkillConverter):
    """
    Orchestrates multi-source scraping and merging.

@@ -50,6 +49,8 @@ class UnifiedScraper:
    5. Build unified skill
    """

    SOURCE_TYPE = "config"

    def __init__(self, config_path: str, merge_mode: str | None = None):
        """
        Initialize unified scraper.

@@ -58,6 +59,7 @@ class UnifiedScraper:
            config_path: Path to unified config JSON
            merge_mode: Override config merge_mode ('rule-based' or 'claude-enhanced')
        """
        super().__init__({"name": "unified", "config_path": config_path})
        self.config_path = config_path

        # Validate and load config
@@ -157,6 +159,42 @@ class UnifiedScraper:
        logger.info(f"📝 Logging to: {log_file}")
        logger.info(f"🗂️ Cache directory: {self.cache_dir}")

    @staticmethod
    def _enrich_docs_json(docs_json: dict, data_file_path: str) -> dict:
        """Enrich docs summary with page content from individual page files.

        summary.json only has {title, url} per page; full content lives in pages/*.json.
        ConflictDetector needs content to extract APIs, so we load page files and convert
        to the dict format {url: page_data} that the detector's dict branch understands.
        """
        pages = docs_json.get("pages", [])
        if not isinstance(pages, list) or not pages or "content" in pages[0]:
            return docs_json

        pages_dir = os.path.join(os.path.dirname(data_file_path), "pages")
        if not os.path.isdir(pages_dir):
            return docs_json

        enriched_pages = {}
        for page_file in os.listdir(pages_dir):
            if page_file.endswith(".json"):
                try:
                    with open(os.path.join(pages_dir, page_file), encoding="utf-8") as pf:
                        page_data = json.load(pf)
                    url = page_data.get("url", "")
                    if url:
                        enriched_pages[url] = page_data
                except (json.JSONDecodeError, OSError):
                    continue

        if enriched_pages:
            docs_json = {**docs_json, "pages": enriched_pages}
            logger.info(
                f"Enriched docs data with {len(enriched_pages)} page files for API extraction"
            )

        return docs_json
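The enrichment step above can be exercised standalone. This sketch reimplements the same list-to-dict swap against a temporary directory; the file names and page fields are invented for the demonstration, only the shape ({title, url} summary entries vs. {url: full_page} dict) comes from the docstring above.

```python
import json
import os
import tempfile


def enrich_docs_json(docs_json: dict, data_file_path: str) -> dict:
    """Standalone sketch of _enrich_docs_json: replace the summary's
    [{title, url}, ...] page list with a {url: full_page_data} dict."""
    pages = docs_json.get("pages", [])
    if not isinstance(pages, list) or not pages or "content" in pages[0]:
        return docs_json  # already enriched, or nothing to enrich
    pages_dir = os.path.join(os.path.dirname(data_file_path), "pages")
    if not os.path.isdir(pages_dir):
        return docs_json
    enriched = {}
    for name in os.listdir(pages_dir):
        if name.endswith(".json"):
            with open(os.path.join(pages_dir, name), encoding="utf-8") as pf:
                page = json.load(pf)
            if page.get("url"):
                enriched[page["url"]] = page
    return {**docs_json, "pages": enriched} if enriched else docs_json


with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "pages"))
    with open(os.path.join(d, "pages", "p1.json"), "w", encoding="utf-8") as f:
        json.dump({"url": "https://x/a", "title": "A", "content": "full text"}, f)
    summary = {"pages": [{"title": "A", "url": "https://x/a"}]}
    out = enrich_docs_json(summary, os.path.join(d, "summary.json"))
    print(sorted(out["pages"]))  # ['https://x/a']
```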

    def scrape_all_sources(self):
        """
        Scrape all configured sources.

@@ -259,50 +297,41 @@ class UnifiedScraper:
            "sources": [doc_source],
        }

        # Write temporary config
        temp_config_path = os.path.join(self.data_dir, "temp_docs_config.json")
        with open(temp_config_path, "w", encoding="utf-8") as f:
            json.dump(doc_config, f, indent=2)

        # Run doc_scraper as subprocess
        # Run doc_scraper directly (no subprocess needed with ExecutionContext)
        logger.info(f"Scraping documentation from {source['base_url']}")

        doc_scraper_path = Path(__file__).parent / "doc_scraper.py"
        cmd = [sys.executable, str(doc_scraper_path), "--config", temp_config_path, "--fresh"]

        # Forward agent-related CLI args so doc scraper enhancement respects
        # the user's chosen agent instead of defaulting to claude.
        cli_args = getattr(self, "_cli_args", None)
        if cli_args is not None:
            if getattr(cli_args, "agent", None):
                cmd.extend(["--agent", cli_args.agent])
            if getattr(cli_args, "agent_cmd", None):
                cmd.extend(["--agent-cmd", cli_args.agent_cmd])
            if getattr(cli_args, "api_key", None):
                cmd.extend(["--api-key", cli_args.api_key])

        # Support "browser": true in source config for JavaScript SPA sites
        if source.get("browser", False):
            cmd.append("--browser")
            doc_config["browser"] = True
            logger.info(" 🌐 Browser mode enabled (JavaScript rendering via Playwright)")

        # Import and call directly
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, stdin=subprocess.DEVNULL, timeout=3600
            from skill_seekers.cli.doc_scraper import scrape_documentation
            from skill_seekers.cli.execution_context import ExecutionContext

            # Create child context with doc-specific overrides
            doc_ctx = ExecutionContext.get().override(
                output__name=f"{self.name}_docs",
                scraping__max_pages=source.get("max_pages", 500),
            )
        except subprocess.TimeoutExpired:
            logger.error("Documentation scraping timed out after 60 minutes")
            return

        if result.returncode != 0:
            logger.error(f"Documentation scraping failed with return code {result.returncode}")
            logger.error(f"STDERR: {result.stderr}")
            logger.error(f"STDOUT: {result.stdout}")
            return
            with doc_ctx:
                result = scrape_documentation(
                    config=doc_config,
                    ctx=ExecutionContext.get(),
                )

        # Log subprocess output for debugging
        if result.stdout:
            logger.info(f"Doc scraper output: {result.stdout[-500:]}")  # Last 500 chars
            if result != 0:
                logger.error(f"Documentation scraping failed with return code {result}")
                return

        except Exception as e:
            logger.error(f"Documentation scraping failed: {e}")
            import traceback

            logger.debug(f"Traceback: {traceback.format_exc()}")
            return

        # Load scraped data
        docs_data_file = f"output/{doc_config['name']}_data/summary.json"
@@ -327,10 +356,6 @@ class UnifiedScraper:
        else:
            logger.warning("Documentation data file not found")

        # Clean up temp config
        if os.path.exists(temp_config_path):
            os.remove(temp_config_path)

        # Move intermediate files to cache to keep output/ clean
        docs_output_dir = f"output/{doc_config['name']}"
        docs_data_dir = f"output/{doc_config['name']}_data"
@@ -1704,13 +1729,17 @@ class UnifiedScraper:
        docs_data = docs_list[0]
        github_data = github_list[0]

        # Load data files
        # Load data files (cached for reuse in merge_sources)
        with open(docs_data["data_file"], encoding="utf-8") as f:
            docs_json = json.load(f)
        docs_json = self._enrich_docs_json(docs_json, docs_data["data_file"])

        with open(github_data["data_file"], encoding="utf-8") as f:
            github_json = json.load(f)

        self._cached_docs_json = docs_json
        self._cached_github_json = github_json

        # Detect conflicts
        detector = ConflictDetector(docs_json, github_json)
        conflicts = detector.detect_all_conflicts()
@@ -1758,15 +1787,18 @@ class UnifiedScraper:
            logger.warning("Missing documentation or GitHub data for merging")
            return None

        docs_data = docs_list[0]
        github_data = github_list[0]
        # Reuse cached data from detect_conflicts() to avoid redundant disk I/O
        docs_json = getattr(self, "_cached_docs_json", None)
        github_json = getattr(self, "_cached_github_json", None)

        # Load data
        with open(docs_data["data_file"], encoding="utf-8") as f:
            docs_json = json.load(f)

        with open(github_data["data_file"], encoding="utf-8") as f:
            github_json = json.load(f)
        if docs_json is None or github_json is None:
            docs_data = docs_list[0]
            github_data = github_list[0]
            with open(docs_data["data_file"], encoding="utf-8") as f:
                docs_json = json.load(f)
            docs_json = self._enrich_docs_json(docs_json, docs_data["data_file"])
            with open(github_data["data_file"], encoding="utf-8") as f:
                github_json = json.load(f)
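The cache-then-fallback dance in this hunk is a small pattern worth isolating: an earlier phase stashes its parsed data on the instance, and a later phase reuses it instead of re-reading disk. A minimal runnable sketch, with a hypothetical counter standing in for the expensive JSON load:

```python
class MergePhase:
    """Sketch of the _cached_docs_json reuse: detect_conflicts() populates
    the cache, merge_sources() reuses it rather than re-reading from disk."""

    def __init__(self):
        self.loads = 0  # counts "expensive" loads, for demonstration only

    def _load(self) -> dict:
        self.loads += 1
        return {"apis": {"useState": {}}}

    def detect_conflicts(self):
        self._cached_docs_json = self._load()

    def merge_sources(self) -> dict:
        docs_json = getattr(self, "_cached_docs_json", None)
        if docs_json is None:  # cache miss: fall back to loading
            docs_json = self._load()
        return docs_json


phase = MergePhase()
phase.detect_conflicts()
phase.merge_sources()
print(phase.loads)  # 1
```

`getattr(self, "_cached_docs_json", None)` keeps merge_sources() safe to call on its own: if detect_conflicts() never ran, the attribute simply does not exist and the load happens then.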

        # Choose merger
        if self.merge_mode in ("ai-enhanced", "claude-enhanced"):
@@ -1786,6 +1818,10 @@ class UnifiedScraper:

        return merged_data

    def extract(self):
        """SkillConverter interface — delegates to scrape_all_sources()."""
        self.scrape_all_sources()

    def build_skill(self, merged_data: dict | None = None):
        """
        Build final unified skill.

@@ -1892,17 +1928,28 @@ class UnifiedScraper:
            run_workflows(effective_args, context=unified_context)

        # Phase 6: AI Enhancement of SKILL.md
        # Triggered by config "enhancement" block or CLI --enhance-level
        # Read from ExecutionContext first (has correct priority resolution),
        # fall back to raw config dict for backward compatibility.
        enhancement_config = self.config.get("enhancement", {})
        enhancement_enabled = enhancement_config.get("enabled", False)
        enhancement_level = enhancement_config.get("level", 0)
        enhancement_mode = enhancement_config.get("mode", "AUTO").upper()
        try:
            from skill_seekers.cli.execution_context import ExecutionContext

        # CLI --enhance-level overrides config
        cli_enhance_level = getattr(args, "enhance_level", None) if args is not None else None
        if cli_enhance_level is not None:
            enhancement_enabled = cli_enhance_level > 0
            enhancement_level = cli_enhance_level
            ctx = ExecutionContext.get()
            enhancement_enabled = ctx.enhancement.enabled
            enhancement_level = ctx.enhancement.level
            enhancement_mode = ctx.enhancement.mode.upper()
        except (RuntimeError, Exception):
            # Fallback to raw config + args
            enhancement_enabled = enhancement_config.get("enabled", False)
            enhancement_level = enhancement_config.get("level", 0)
            enhancement_mode = enhancement_config.get("mode", "AUTO").upper()

            cli_enhance_level = (
                getattr(args, "enhance_level", None) if args is not None else None
            )
            if cli_enhance_level is not None:
                enhancement_enabled = cli_enhance_level > 0
                enhancement_level = cli_enhance_level

        if enhancement_enabled and enhancement_level > 0:
            logger.info("\n" + "=" * 60)
@@ -1918,16 +1965,19 @@ class UnifiedScraper:
            try:
                from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer

                # Get agent from CLI args, config enhancement block, or env var
                agent = None
                agent_cmd = None
                if args is not None:
                    agent = getattr(args, "agent", None)
                    agent_cmd = getattr(args, "agent_cmd", None)
                if not agent:
                    agent = enhancement_config.get("agent", None)
                if not agent:
                    agent = os.environ.get("SKILL_SEEKER_AGENT", "").strip() or None
                # Get agent from ExecutionContext (already resolved with correct priority)
                try:
                    ctx = ExecutionContext.get()
                    agent = ctx.enhancement.agent
                    agent_cmd = ctx.enhancement.agent_cmd
                except (RuntimeError, Exception):
                    agent = None
                    agent_cmd = None
                    if args is not None:
                        agent = getattr(args, "agent", None)
                        agent_cmd = getattr(args, "agent_cmd", None)
                    if not agent:
                        agent = os.environ.get("SKILL_SEEKER_AGENT", "").strip() or None

                # Read timeout from config enhancement block
                timeout_val = enhancement_config.get("timeout")
@@ -2016,183 +2066,14 @@ class UnifiedScraper:
            logger.info(f"📁 Output: {self.output_dir}/")
            logger.info(f"📁 Data: {self.data_dir}/")

            return 0

        except KeyboardInterrupt:
            logger.info("\n\n⚠️ Scraping interrupted by user")
            sys.exit(1)
            return 130
        except Exception as e:
            logger.error(f"\n\n❌ Error during scraping: {e}")
            import traceback

            traceback.print_exc()
            sys.exit(1)


def main():
    """Main entry point."""
    parser = argparse.ArgumentParser(
        description="Unified multi-source scraper",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Basic usage with unified config
  skill-seekers unified --config configs/godot_unified.json

  # Override merge mode
  skill-seekers unified --config configs/react_unified.json --merge-mode ai-enhanced

  # Backward compatible with legacy configs
  skill-seekers unified --config configs/react.json
""",
    )

    parser.add_argument("--config", "-c", required=True, help="Path to unified config JSON file")
    parser.add_argument(
        "--merge-mode",
        "-m",
        choices=["rule-based", "ai-enhanced", "claude-enhanced"],
        help="Override config merge mode (ai-enhanced or rule-based). 'claude-enhanced' accepted as alias.",
    )
    parser.add_argument(
        "--skip-codebase-analysis",
        action="store_true",
        help="Skip C3.x codebase analysis for GitHub sources (default: enabled)",
    )
    parser.add_argument(
        "--fresh",
        action="store_true",
        help="Clear any existing data and start fresh (ignore checkpoints)",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Preview what will be scraped without actually scraping",
    )
    # Enhancement Workflow arguments (mirrors scrape/github/pdf/codebase scrapers)
    parser.add_argument(
        "--enhance-workflow",
        action="append",
        dest="enhance_workflow",
        help="Apply enhancement workflow (file path or preset). Can use multiple times to chain workflows.",
        metavar="WORKFLOW",
    )
    parser.add_argument(
        "--enhance-stage",
        action="append",
        dest="enhance_stage",
        help="Add inline enhancement stage (format: 'name:prompt'). Can be used multiple times.",
        metavar="STAGE",
    )
    parser.add_argument(
        "--var",
        action="append",
        dest="var",
        help="Override workflow variable (format: 'key=value'). Can be used multiple times.",
        metavar="VAR",
    )
    parser.add_argument(
        "--workflow-dry-run",
        action="store_true",
        dest="workflow_dry_run",
        help="Preview workflow stages without executing (requires --enhance-workflow)",
    )
    parser.add_argument(
        "--api-key",
        type=str,
        metavar="KEY",
        help="Anthropic API key (or set ANTHROPIC_API_KEY env var)",
    )
    parser.add_argument(
        "--enhance-level",
        type=int,
        choices=[0, 1, 2, 3],
        default=None,
        metavar="LEVEL",
        help=(
            "Global AI enhancement level override for all sources "
            "(0=off, 1=SKILL.md, 2=+arch/config, 3=full). "
            "Overrides per-source enhance_level in config."
        ),
    )
    parser.add_argument(
        "--agent",
        type=str,
        choices=["claude", "codex", "copilot", "opencode", "kimi", "custom"],
        metavar="AGENT",
        help="Local coding agent for enhancement (default: AI agent from SKILL_SEEKER_AGENT env var)",
    )
    parser.add_argument(
        "--agent-cmd",
        type=str,
        metavar="CMD",
        help="Override agent command template (advanced)",
    )

    args = parser.parse_args()
    setup_logging()

    # Create scraper
    scraper = UnifiedScraper(args.config, args.merge_mode)

    # Disable codebase analysis if requested
    if args.skip_codebase_analysis:
        for source in scraper.config.get("sources", []):
            if source["type"] == "github":
                source["enable_codebase_analysis"] = False
                logger.info(
                    f"⏭️ Skipping codebase analysis for GitHub source: {source.get('repo', 'unknown')}"
                )

    # Handle --fresh flag (clear cache)
    if args.fresh:
        import shutil

        if os.path.exists(scraper.cache_dir):
            logger.info(f"🧹 Clearing cache: {scraper.cache_dir}")
            shutil.rmtree(scraper.cache_dir)
            # Recreate directories
            os.makedirs(scraper.sources_dir, exist_ok=True)
            os.makedirs(scraper.data_dir, exist_ok=True)
            os.makedirs(scraper.repos_dir, exist_ok=True)
            os.makedirs(scraper.logs_dir, exist_ok=True)

    # Handle --dry-run flag
    if args.dry_run:
        logger.info("🔍 DRY RUN MODE - Preview only, no scraping will occur")
        logger.info(f"\nWould scrape {len(scraper.config.get('sources', []))} sources:")
        # Source type display config: type -> (label, key for detail)
        _SOURCE_DISPLAY = {
            "documentation": ("Documentation", "base_url"),
            "github": ("GitHub", "repo"),
            "pdf": ("PDF", "path"),
            "word": ("Word", "path"),
            "epub": ("EPUB", "path"),
            "video": ("Video", "url"),
            "local": ("Local Codebase", "path"),
            "jupyter": ("Jupyter Notebook", "path"),
            "html": ("HTML", "path"),
            "openapi": ("OpenAPI Spec", "path"),
            "asciidoc": ("AsciiDoc", "path"),
            "pptx": ("PowerPoint", "path"),
            "confluence": ("Confluence", "base_url"),
            "notion": ("Notion", "page_id"),
            "rss": ("RSS/Atom Feed", "url"),
            "manpage": ("Man Page", "names"),
            "chat": ("Chat Export", "path"),
        }
        for idx, source in enumerate(scraper.config.get("sources", []), 1):
            source_type = source.get("type", "unknown")
            label, key = _SOURCE_DISPLAY.get(source_type, (source_type.title(), "path"))
            detail = source.get(key, "N/A")
            if isinstance(detail, list):
                detail = ", ".join(str(d) for d in detail)
            logger.info(f" {idx}. {label}: {detail}")
        logger.info(f"\nOutput directory: {scraper.output_dir}")
        logger.info(f"Merge mode: {scraper.merge_mode}")
        return

    # Run scraper (pass args for workflow integration)
    scraper.run(args=args)


if __name__ == "__main__":
    main()
    return 1

@@ -15,6 +15,7 @@ discrepancies transparently.
import json
import logging
import os
import re
import shutil
from pathlib import Path

@@ -532,12 +533,49 @@ This skill synthesizes knowledge from multiple sources:
            logger.warning("No source SKILL.md files found, generating minimal SKILL.md (legacy)")
            content = self._generate_minimal_skill_md()

        # Ensure frontmatter uses config name/description, not auto-generated slugs
        content = self._normalize_frontmatter(content)

        # Write final content
        with open(skill_path, "w", encoding="utf-8") as f:
            f.write(content)

        logger.info(f"Created SKILL.md ({len(content)} chars, ~{len(content.split())} words)")

    def _normalize_frontmatter(self, content: str) -> str:
        """Ensure SKILL.md frontmatter uses the config name and description.

        Standalone source SKILL.md files may have auto-generated slugs
        (e.g., 'primetween-github-0-kyrylokuzyk-primetween'). This replaces
        the name and description with the canonical values from the config.
        """
        if not content.startswith("---"):
            return content

        end = content.find("---", 3)
        if end == -1:
            return content

        frontmatter = content[3:end]
        body = content[end + 3 :]

        canonical_name = self.name.lower().replace("_", "-").replace(" ", "-")[:64]
        frontmatter = re.sub(
            r"^name:.*$", f"name: {canonical_name}", frontmatter, count=1, flags=re.MULTILINE
        )

        # Handle both single-line and multiline YAML description values
        desc = self.description[:1024] if len(self.description) > 1024 else self.description
        frontmatter = re.sub(
            r"^description:.*(?:\n[ \t]+.*)*$",
            f"description: {desc}",
            frontmatter,
            count=1,
            flags=re.MULTILINE,
        )

        return f"---{frontmatter}---{body}"
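The frontmatter rewrite above is easy to verify in isolation. This standalone sketch repeats the same slicing-and-regex logic as a free function (the sample document and names are invented for the demonstration):

```python
import re


def normalize_frontmatter(content: str, name: str, description: str) -> str:
    """Standalone sketch of _normalize_frontmatter: rewrite the YAML
    name/description fields between the first pair of '---' markers."""
    if not content.startswith("---"):
        return content
    end = content.find("---", 3)
    if end == -1:
        return content
    frontmatter, body = content[3:end], content[end + 3 :]
    canonical = name.lower().replace("_", "-").replace(" ", "-")[:64]
    frontmatter = re.sub(
        r"^name:.*$", f"name: {canonical}", frontmatter, count=1, flags=re.MULTILINE
    )
    # The (?:\n[ \t]+.*)* tail also swallows indented multiline descriptions
    frontmatter = re.sub(
        r"^description:.*(?:\n[ \t]+.*)*$",
        f"description: {description[:1024]}",
        frontmatter,
        count=1,
        flags=re.MULTILINE,
    )
    return f"---{frontmatter}---{body}"


doc = "---\nname: primetween-github-0-kyrylokuzyk-primetween\ndescription: auto slug\n---\n# Body\n"
print(normalize_frontmatter(doc, "Prime_Tween", "Tweening library skill"))
```

Note the deliberate asymmetry: only the frontmatter slice is passed through `re.sub`, so a `name:` or `description:` heading appearing in the body is never touched.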

    def _synthesize_docs_pdf(self, skill_mds: dict[str, str]) -> str:
        """Synthesize documentation + PDF sources.

@@ -958,8 +996,8 @@ This skill combines knowledge from multiple sources:
    def _format_api_entry(self, api_data: dict, inline_conflict: bool = False) -> str:
        """Format a single API entry."""
        name = api_data.get("name", "Unknown")
        signature = api_data.get("merged_signature", name)
        description = api_data.get("merged_description", "")
        signature = api_data.get("merged_signature", api_data.get("signature", name))
        description = api_data.get("merged_description", api_data.get("description", ""))
        warning = api_data.get("warning", "")

        entry = f"#### `{signature}`\n\n"
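The widened fallback chain in this hunk is what eliminates the "Unknown" entries: an API record that never went through the AI merger has only raw `signature`/`description` fields, so the formatter now falls back to those before giving up and using the name. A minimal sketch (the sample entry is invented):

```python
def format_api_entry(api_data: dict) -> str:
    """Sketch of the widened fallback chain in _format_api_entry:
    prefer merged fields, fall back to raw ones, then to the name."""
    name = api_data.get("name", "Unknown")
    signature = api_data.get("merged_signature", api_data.get("signature", name))
    description = api_data.get("merged_description", api_data.get("description", ""))
    return f"#### `{signature}`\n\n{description}\n"


# An entry the merger never touched: previously rendered with just the name
print(format_api_entry({"name": "useState", "signature": "useState(initial)"}))
```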

@@ -1302,7 +1340,7 @@ This skill combines knowledge from multiple sources:
        apis = self.merged_data.get("apis", {})

        for api_name in sorted(apis.keys()):
            api_data = apis[api_name]
            api_data = {**apis[api_name], "name": api_name}
            entry = self._format_api_entry(api_data, inline_conflict=True)
            f.write(entry)

@@ -1386,16 +1424,33 @@ This skill combines knowledge from multiple sources:
        if c3_data.get("architecture"):
            languages = c3_data["architecture"].get("languages", {})

        # If no languages from C3.7, try to get from GitHub data
        # github_data already available from method scope
        if not languages and github_data.get("languages"):
            # GitHub data has languages as list, convert to dict with count 1
            languages = dict.fromkeys(github_data["languages"], 1)
        # If no languages from C3.7, try to get from code_analysis or GitHub data
        if not languages:
            code_analysis = github_data.get("code_analysis", {})
            if code_analysis.get("files_analyzed") and code_analysis.get("languages_analyzed"):
                # Use code_analysis file counts per language
                files = code_analysis.get("files", [])
                lang_counts = {}
                for file_info in files:
                    lang = file_info.get("language", "Unknown")
                    lang_counts[lang] = lang_counts.get(lang, 0) + 1
                if lang_counts:
                    languages = lang_counts
                else:
                    # Fallback: total count attributed to primary language
                    for lang in code_analysis["languages_analyzed"]:
                        languages[lang] = code_analysis["files_analyzed"]
            elif github_data.get("languages"):
                gh_langs = github_data["languages"]
                if isinstance(gh_langs, dict):
                    languages = dict.fromkeys(gh_langs, 0)
                elif isinstance(gh_langs, list):
                    languages = dict.fromkeys(gh_langs, 0)

        if languages:
            f.write("**Languages Detected**:\n")
            for lang, count in sorted(languages.items(), key=lambda x: x[1], reverse=True)[:5]:
                if isinstance(count, int):
                if isinstance(count, int) and count > 0:
                    f.write(f"- {lang}: {count} files\n")
                else:
                    f.write(f"- {lang}\n")

@@ -1534,6 +1589,24 @@ This skill combines knowledge from multiple sources:

        logger.info("📐 Created ARCHITECTURE.md")

    @staticmethod
    def _make_path_relative(file_path: str) -> str:
        """Strip absolute path prefixes, keeping only the repo-relative path."""
        # Strip .skillseeker-cache repo clone paths
        if ".skillseeker-cache" in file_path:
            # Pattern: ...repos/{idx}_{owner}_{repo}/relative/path
            parts = file_path.split("/repos/")
            if len(parts) > 1:
                # Skip the repo dir name (e.g., '0_Owner_Repo/')
                remainder = parts[1]
                slash_idx = remainder.find("/")
                if slash_idx != -1:
                    return remainder[slash_idx + 1 :]
        # Generic: if it looks absolute, try to make it relative
        if os.path.isabs(file_path):
            return os.path.basename(file_path)
        return file_path
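The path-stripping above can be checked against a representative input. This is a standalone copy of the same logic; the example cache path is invented for illustration:

```python
import os


def make_path_relative(file_path: str) -> str:
    """Standalone copy of _make_path_relative for illustration."""
    if ".skillseeker-cache" in file_path:
        # Pattern: ...repos/{idx}_{owner}_{repo}/relative/path
        parts = file_path.split("/repos/")
        if len(parts) > 1:
            remainder = parts[1]
            slash_idx = remainder.find("/")  # skip the repo dir name
            if slash_idx != -1:
                return remainder[slash_idx + 1 :]
    if os.path.isabs(file_path):
        return os.path.basename(file_path)
    return file_path


p = "/home/ci/.skillseeker-cache/repos/0_Owner_Repo/src/tween/PrimeTween.cs"
print(make_path_relative(p))  # src/tween/PrimeTween.cs
```

Already-relative paths pass through unchanged, so running the cleanup twice is harmless.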

    def _generate_pattern_references(self, c3_dir: str, patterns_data: dict):
        """Generate design pattern references (C3.1)."""
        if not patterns_data:
@@ -1556,7 +1629,8 @@ This skill combines knowledge from multiple sources:
        for file_data in patterns_data:
            patterns = file_data.get("patterns", [])
            if patterns:
                f.write(f"## {file_data['file_path']}\n\n")
                display_path = self._make_path_relative(file_data["file_path"])
                f.write(f"## {display_path}\n\n")
                for p in patterns:
                    f.write(f"### {p['pattern_type']}\n\n")
                    if p.get("class_name"):

@@ -14,14 +14,13 @@ Usage:
|
||||
python3 video_scraper.py --from-json video_extracted.json
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import re
|
||||
import sys
|
||||
import time
|
||||
|
||||
from skill_seekers.cli.skill_converter import SkillConverter
|
||||
from skill_seekers.cli.video_models import (
|
||||
AudioVisualAlignment,
|
||||
TextGroupTimeline,
|
||||
@@ -318,9 +317,11 @@ def _ai_clean_reference(ref_path: str, content: str, api_key: str | None = None)
|
||||
# =============================================================================
|
||||
|
||||
|
||||
class VideoToSkillConverter:
|
||||
class VideoToSkillConverter(SkillConverter):
|
||||
"""Convert video content to AI skill."""
|
||||
|
||||
SOURCE_TYPE = "video"
|
||||
|
||||
def __init__(self, config: dict):
|
||||
"""Initialize converter.
|
||||
|
||||
@@ -333,6 +334,7 @@ class VideoToSkillConverter:
|
||||
- visual: Whether to enable visual extraction
|
||||
- whisper_model: Whisper model size
|
||||
"""
|
||||
super().__init__(config)
|
||||
self.config = config
|
||||
self.name = config["name"]
|
||||
self.description = config.get("description", "")
|
||||
@@ -355,6 +357,10 @@ class VideoToSkillConverter:
|
||||
# Results
|
||||
self.result: VideoScraperResult | None = None
|
||||
|
||||
def extract(self):
|
||||
"""Extract content from video source (SkillConverter interface)."""
|
||||
self.process()
|
||||
|
||||
def process(self) -> VideoScraperResult:
|
||||
"""Run the full video processing pipeline.

@@ -1015,241 +1021,3 @@ class VideoToSkillConverter:
            lines.append(f"- [{video.title}](references/{ref_filename})")

        return "\n".join(lines)


# =============================================================================
# CLI Entry Point
# =============================================================================


def main() -> int:
    """Entry point for video scraper CLI.

    Returns:
        Exit code (0 for success, non-zero for error).
    """
    from skill_seekers.cli.arguments.video import add_video_arguments

    parser = argparse.ArgumentParser(
        prog="skill-seekers-video",
        description="Extract transcripts and metadata from videos and generate skill",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""\
Examples:
  skill-seekers video --url https://www.youtube.com/watch?v=...
  skill-seekers video --video-file recording.mp4
  skill-seekers video --playlist https://www.youtube.com/playlist?list=...
  skill-seekers video --from-json video_extracted.json
  skill-seekers video --url https://youtu.be/... --languages en,es
""",
    )

    add_video_arguments(parser)
    args = parser.parse_args()

    # --setup: run GPU detection + dependency installation, then exit
    if getattr(args, "setup", False):
        from skill_seekers.cli.video_setup import run_setup

        return run_setup(interactive=True)

    # Setup logging
    log_level = logging.DEBUG if args.verbose else (logging.WARNING if args.quiet else logging.INFO)
    logging.basicConfig(level=log_level, format="%(levelname)s: %(message)s")

    # Validate inputs
    has_source = any(
        [
            getattr(args, "url", None),
            getattr(args, "video_file", None),
            getattr(args, "playlist", None),
        ]
    )
    has_json = getattr(args, "from_json", None)

    if not has_source and not has_json:
        parser.error("Must specify --url, --video-file, --playlist, or --from-json")

    # Parse and validate time clipping
    raw_start = getattr(args, "start_time", None)
    raw_end = getattr(args, "end_time", None)
    clip_start: float | None = None
    clip_end: float | None = None

    if raw_start is not None:
        try:
            clip_start = parse_time_to_seconds(raw_start)
        except ValueError as exc:
            parser.error(f"--start-time: {exc}")
    if raw_end is not None:
        try:
            clip_end = parse_time_to_seconds(raw_end)
        except ValueError as exc:
            parser.error(f"--end-time: {exc}")

    if clip_start is not None or clip_end is not None:
        if getattr(args, "playlist", None):
            parser.error("--start-time/--end-time cannot be used with --playlist")
        if clip_start is not None and clip_end is not None and clip_start >= clip_end:
            parser.error(f"--start-time ({clip_start}s) must be before --end-time ({clip_end}s)")
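The clip validation relies on `parse_time_to_seconds`, which is defined elsewhere in the module and not shown in this hunk. A plausible sketch of its contract, assuming it accepts plain seconds as well as MM:SS and HH:MM:SS forms (semantics assumed, not taken from the source):

```python
def parse_time_to_seconds(value: str) -> float:
    """Parse '90', '1:30', or '0:01:30' into seconds (assumed semantics)."""
    parts = value.strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"invalid time: {value!r}")
    try:
        nums = [float(p) for p in parts]
    except ValueError:
        raise ValueError(f"invalid time: {value!r}") from None
    seconds = 0.0
    for n in nums:  # fold H:M:S left to right
        seconds = seconds * 60 + n
    return seconds


assert parse_time_to_seconds("90") == 90.0
assert parse_time_to_seconds("1:30") == 90.0
assert parse_time_to_seconds("0:01:30") == 90.0
```

Raising `ValueError` (rather than returning a sentinel) matches the `except ValueError as exc: parser.error(...)` handling above.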

    # Build config
    config = {
        "name": args.name or "video_skill",
        "description": getattr(args, "description", None) or "",
        "output": getattr(args, "output", None),
        "url": getattr(args, "url", None),
        "video_file": getattr(args, "video_file", None),
        "playlist": getattr(args, "playlist", None),
        "languages": getattr(args, "languages", "en"),
        "visual": getattr(args, "visual", False),
        "whisper_model": getattr(args, "whisper_model", "base"),
        "visual_interval": getattr(args, "visual_interval", 0.7),
        "visual_min_gap": getattr(args, "visual_min_gap", 0.5),
        "visual_similarity": getattr(args, "visual_similarity", 3.0),
        "vision_ocr": getattr(args, "vision_ocr", False),
        "start_time": clip_start,
        "end_time": clip_end,
    }

    converter = VideoToSkillConverter(config)

    # Dry run
    if args.dry_run:
        logger.info("DRY RUN — would process:")
        for key in ["url", "video_file", "playlist"]:
            if config.get(key):
                logger.info(f"  {key}: {config[key]}")
        logger.info(f"  name: {config['name']}")
        logger.info(f"  languages: {config['languages']}")
        logger.info(f"  visual: {config['visual']}")
        if clip_start is not None or clip_end is not None:
            start_str = _format_duration(clip_start) if clip_start is not None else "start"
            end_str = _format_duration(clip_end) if clip_end is not None else "end"
            logger.info(f"  clip range: {start_str} - {end_str}")
        return 0

    # Workflow 1: Build from JSON
    if has_json:
        logger.info(f"Loading extracted data from {args.from_json}")
        converter.load_extracted_data(args.from_json)
        converter.build_skill()
        logger.info(f"Skill built at {converter.skill_dir}")
        return 0

    # Workflow 2: Full extraction + build
    try:
        result = converter.process()
        if not result.videos:
            logger.error("No videos were successfully processed")
            if result.errors:
                for err in result.errors:
                    logger.error(f"  {err['source']}: {err['error']}")
            return 1

        converter.save_extracted_data()
        converter.build_skill()

        logger.info(f"\nSkill built successfully at {converter.skill_dir}")
        logger.info(f"  Videos: {len(result.videos)}")
        logger.info(f"  Segments: {result.total_segments}")
        logger.info(f"  Duration: {_format_duration(result.total_duration_seconds)}")
        logger.info(f"  Processing time: {result.processing_time_seconds:.1f}s")

        if result.warnings:
            for w in result.warnings:
                logger.warning(f"  {w}")

    except RuntimeError as e:
        logger.error(str(e))
        return 1

    # Enhancement
    enhance_level = getattr(args, "enhance_level", 0)
    if enhance_level > 0:
        # Pass 1: Clean reference files (Code Timeline reconstruction)
        converter._enhance_reference_files(enhance_level, args)

        # Auto-inject video-tutorial workflow if no workflow specified
        if not getattr(args, "enhance_workflow", None):
            args.enhance_workflow = ["video-tutorial"]

        # Pass 2: Run workflow stages (specialized video analysis)
        try:
            from skill_seekers.cli.workflow_runner import run_workflows

            video_context = {
                "skill_name": converter.name,
                "skill_dir": converter.skill_dir,
                "source_type": "video_tutorial",
            }
            run_workflows(args, context=video_context)
        except ImportError:
            logger.debug("Workflow runner not available, skipping workflow stages")

        # Run traditional SKILL.md enhancement (reads references + rewrites)
        _run_video_enhancement(converter.skill_dir, enhance_level, args)

    return 0


def _run_video_enhancement(skill_dir: str, enhance_level: int, args) -> None:
    """Run traditional SKILL.md enhancement with video-aware prompt.

    This calls the same SkillEnhancer used by other scrapers, but the prompt
    auto-detects video_tutorial source type and uses a video-specific prompt.
    """
    import os
    import subprocess

    has_api_key = bool(
        os.environ.get("ANTHROPIC_API_KEY")
        or os.environ.get("ANTHROPIC_AUTH_TOKEN")
        or getattr(args, "api_key", None)
        or os.environ.get("MOONSHOT_API_KEY")
    )

    agent = getattr(args, "agent", None)

    if not has_api_key and not agent:
        logger.info("\n💡 Enhance your video skill with AI:")
        logger.info("  export ANTHROPIC_API_KEY=sk-ant-...")
        logger.info(f"  skill-seekers enhance {skill_dir} --enhance-level {enhance_level}")
        return

    logger.info(f"\n🤖 Running video-aware SKILL.md enhancement (level {enhance_level})...")

    try:
        enhance_cmd = ["skill-seekers-enhance", skill_dir]
        api_key = getattr(args, "api_key", None)
        if api_key:
            enhance_cmd.extend(["--api-key", api_key])
        if agent:
            enhance_cmd.extend(["--agent", agent])

        logger.info(
            "Starting video skill enhancement (this may take 10+ minutes "
            "for large videos with AI enhancement)..."
        )
        subprocess.run(enhance_cmd, check=True, timeout=1800)
        logger.info("Video skill enhancement complete!")
    except subprocess.TimeoutExpired:
        logger.warning(
            "⚠ Enhancement timed out after 30 minutes. "
            "The skill was still built without enhancement. "
            "You can retry manually with:\n"
            f"  skill-seekers enhance {skill_dir} --enhance-level {enhance_level}"
        )
    except subprocess.CalledProcessError as exc:
        logger.warning(
            f"⚠ Enhancement failed (exit code {exc.returncode}), "
            "but skill was still built. You can retry manually with:\n"
            f"  skill-seekers enhance {skill_dir} --enhance-level {enhance_level}"
        )
    except FileNotFoundError:
        logger.warning("⚠ skill-seekers-enhance not found. Run manually:")
        logger.info(f"  skill-seekers enhance {skill_dir} --enhance-level {enhance_level}")


if __name__ == "__main__":
    sys.exit(main())

@@ -10,12 +10,10 @@ Usage:
    python3 word_scraper.py --from-json document_extracted.json
"""

import argparse
import json
import logging
import os
import re
import sys
from pathlib import Path

# Optional dependency guard
@@ -27,6 +25,8 @@ try:
except ImportError:
    WORD_AVAILABLE = False

from .skill_converter import SkillConverter

logger = logging.getLogger(__name__)


@@ -72,10 +72,13 @@ def infer_description_from_word(metadata: dict = None, name: str = "") -> str:
    )


class WordToSkillConverter:
class WordToSkillConverter(SkillConverter):
    """Convert Word document (.docx) to AI skill."""

    SOURCE_TYPE = "word"

    def __init__(self, config):
        super().__init__(config)
        self.config = config
        self.name = config["name"]
        self.docx_path = config.get("docx_path", "")
@@ -93,6 +96,10 @@ class WordToSkillConverter:
        # Extracted data
        self.extracted_data = None

    def extract(self):
        """SkillConverter interface — delegates to extract_docx()."""
        return self.extract_docx()

    def extract_docx(self):
        """Extract content from Word document using mammoth + python-docx.

@@ -918,146 +925,3 @@ def _score_code_quality(code: str) -> float:
        score -= 2.0

    return min(10.0, max(0.0, score))


def main():
    from .arguments.word import add_word_arguments

    parser = argparse.ArgumentParser(
        description="Convert Word document (.docx) to AI skill",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )

    add_word_arguments(parser)

    args = parser.parse_args()

    # Set logging level
    if getattr(args, "quiet", False):
        logging.getLogger().setLevel(logging.WARNING)
    elif getattr(args, "verbose", False):
        logging.getLogger().setLevel(logging.DEBUG)

    # Handle --dry-run
    if getattr(args, "dry_run", False):
        source = getattr(args, "docx", None) or getattr(args, "from_json", None) or "(none)"
        print(f"\n{'=' * 60}")
        print("DRY RUN: Word Document Extraction")
        print(f"{'=' * 60}")
        print(f"Source: {source}")
        print(f"Name: {getattr(args, 'name', None) or '(auto-detect)'}")
        print(f"Enhance level: {getattr(args, 'enhance_level', 0)}")
        print("\n✅ Dry run complete")
        return 0

    # Validate inputs
    if not (getattr(args, "docx", None) or getattr(args, "from_json", None)):
        parser.error("Must specify --docx or --from-json")

    # Build from JSON workflow
    if getattr(args, "from_json", None):
        name = Path(args.from_json).stem.replace("_extracted", "")
        config = {
            "name": getattr(args, "name", None) or name,
            "description": getattr(args, "description", None)
            or f"Use when referencing {name} documentation",
        }
        try:
            converter = WordToSkillConverter(config)
            converter.load_extracted_data(args.from_json)
            converter.build_skill()
        except Exception as e:
            print(f"\n❌ Error: {e}", file=sys.stderr)
            sys.exit(1)
        return 0

    # Direct DOCX mode
    if not getattr(args, "name", None):
        # Auto-detect name from filename
        args.name = Path(args.docx).stem

    config = {
        "name": args.name,
        "docx_path": args.docx,
        # Pass None so extract_docx() can infer from document metadata (subject/title)
        "description": getattr(args, "description", None),
    }
    if getattr(args, "categories", None):
        config["categories"] = args.categories

    try:
        converter = WordToSkillConverter(config)

        # Extract
        if not converter.extract_docx():
            print("\n❌ Word extraction failed - see error above", file=sys.stderr)
            sys.exit(1)

        # Build skill
        converter.build_skill()

        # Enhancement Workflow Integration
        from skill_seekers.cli.workflow_runner import run_workflows

        workflow_executed, workflow_names = run_workflows(args)
        workflow_name = ", ".join(workflow_names) if workflow_names else None

        # Traditional enhancement (complements workflow system)
        if getattr(args, "enhance_level", 0) > 0:
            import os

            api_key = getattr(args, "api_key", None) or os.environ.get("ANTHROPIC_API_KEY")
            mode = "API" if api_key else "LOCAL"

            print("\n" + "=" * 80)
            print(f"🤖 Traditional AI Enhancement ({mode} mode, level {args.enhance_level})")
            print("=" * 80)
            if workflow_executed:
                print(f"  Running after workflow: {workflow_name}")
                print(
                    "  (Workflow provides specialized analysis, enhancement provides general improvements)"
                )
                print("")

            skill_dir = converter.skill_dir
            if api_key:
                try:
                    from skill_seekers.cli.enhance_skill import enhance_skill_md

                    enhance_skill_md(skill_dir, api_key)
                    print("✅ API enhancement complete!")
                except ImportError:
                    print("❌ API enhancement not available. Falling back to LOCAL mode...")
                    from pathlib import Path
                    from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer

                    agent = getattr(args, "agent", None) if args else None
                    agent_cmd = getattr(args, "agent_cmd", None) if args else None
                    enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                    enhancer.run(headless=True)
                    print("✅ Local enhancement complete!")
            else:
                from pathlib import Path
                from skill_seekers.cli.enhance_skill_local import LocalSkillEnhancer

                agent = getattr(args, "agent", None) if args else None
                agent_cmd = getattr(args, "agent_cmd", None) if args else None
                enhancer = LocalSkillEnhancer(Path(skill_dir), agent=agent, agent_cmd=agent_cmd)
                enhancer.run(headless=True)
                print("✅ Local enhancement complete!")

    except RuntimeError as e:
        print(f"\n❌ Error: {e}", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"\n❌ Unexpected error during Word processing: {e}", file=sys.stderr)
        import traceback

        traceback.print_exc()
        sys.exit(1)

    return 0


if __name__ == "__main__":
    sys.exit(main())

@@ -13,7 +13,9 @@ This module contains all scraping-related MCP tool implementations:
Extracted from server.py for better modularity and organization.
"""

import io
import json
import logging
import sys
from pathlib import Path

@@ -34,6 +36,48 @@ except ImportError:
CLI_DIR = Path(__file__).parent.parent.parent / "cli"


def _run_converter(converter, progress_msg: str) -> list:
    """Run a converter in-process with log capture.

    Args:
        converter: An initialized SkillConverter instance.
        progress_msg: Progress message to prepend to output.

    Returns:
        List[TextContent] with success/error message.
    """
    log_capture = io.StringIO()
    handler = logging.StreamHandler(log_capture)
    handler.setLevel(logging.INFO)
    sk_logger = logging.getLogger("skill_seekers")
    sk_logger.addHandler(handler)
    try:
        result = converter.run()
    except Exception as exc:
        captured = log_capture.getvalue()
        return [
            TextContent(
                type="text",
                text=f"{progress_msg}{captured}\n\n❌ Converter raised an exception:\n{exc}",
            )
        ]
    finally:
        sk_logger.removeHandler(handler)

    captured = log_capture.getvalue()
    output = progress_msg + captured

    if result == 0:
        return [TextContent(type="text", text=output)]
    else:
        return [
            TextContent(
                type="text",
                text=f"{output}\n\n❌ Converter returned non-zero exit code ({result})",
            )
        ]
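The capture pattern used by `_run_converter` (attach a `StringIO`-backed handler, run the work, detach in `finally`) can be demonstrated in isolation; the helper name and demo logger below are illustrative:

```python
import io
import logging


def run_with_log_capture(func, logger_name: str) -> tuple:
    """Run func() while capturing that logger's INFO+ output as a string."""
    log_capture = io.StringIO()
    handler = logging.StreamHandler(log_capture)
    handler.setLevel(logging.INFO)
    target = logging.getLogger(logger_name)
    target.addHandler(handler)
    try:
        result = func()
    finally:
        target.removeHandler(handler)  # always detach, even if func() raises
    return result, log_capture.getvalue()


demo = logging.getLogger("demo")
demo.setLevel(logging.INFO)
result, captured = run_with_log_capture(lambda: demo.info("converting...") or 0, "demo")
print(result, captured.strip())
# → 0 converting...
```

Detaching in `finally` matters: a handler left attached would keep appending to the buffer (and hold a reference to it) on every later call.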


def run_subprocess_with_streaming(cmd: list[str], timeout: int = None) -> tuple:
    """
    Run subprocess with real-time output streaming.
@@ -141,10 +185,11 @@ async def estimate_pages_tool(args: dict) -> list[TextContent]:
    # Estimate: 0.5s per page discovered
    timeout = max(300, max_discovery // 2)  # Minimum 5 minutes

    # Run estimate_pages.py
    # Run estimate_pages module
    cmd = [
        sys.executable,
        str(CLI_DIR / "estimate_pages.py"),
        "-m",
        "skill_seekers.cli.estimate_pages",
        config_path,
        "--max-discovery",
        str(max_discovery),
@@ -185,8 +230,6 @@ async def scrape_docs_tool(args: dict) -> list[TextContent]:
    """
    config_path = args["config_path"]
    unlimited = args.get("unlimited", False)
    enhance_local = args.get("enhance_local", False)
    skip_scrape = args.get("skip_scrape", False)
    dry_run = args.get("dry_run", False)
    merge_mode = args.get("merge_mode")

@@ -218,80 +261,52 @@ async def scrape_docs_tool(args: dict) -> list[TextContent]:
    else:
        config_to_use = config_path

    # Choose scraper based on format
    # Build progress message
    if is_unified:
        scraper_script = "unified_scraper.py"
        progress_msg = "🔄 Starting unified multi-source scraping...\n"
        progress_msg += "📦 Config format: Unified (multiple sources)\n"
    else:
        scraper_script = "doc_scraper.py"
        progress_msg = "🔄 Starting scraping process...\n"
        progress_msg += "📦 Config format: Legacy (single source)\n"

    # Build command
    cmd = [sys.executable, str(CLI_DIR / scraper_script), "--config", config_to_use]

    # Add merge mode for unified configs
    if is_unified and merge_mode:
        cmd.extend(["--merge-mode", merge_mode])

    # Add --fresh to avoid user input prompts when existing data found
    if not skip_scrape:
        cmd.append("--fresh")

    if enhance_local:
        cmd.append("--enhance-local")
    if skip_scrape:
        cmd.append("--skip-scrape")
    if dry_run:
        cmd.append("--dry-run")

    # Determine timeout based on operation type
    if dry_run:
        timeout = 300  # 5 minutes for dry run
    elif skip_scrape:
        timeout = 600  # 10 minutes for building from cache
    elif unlimited:
        timeout = None  # No timeout for unlimited mode (user explicitly requested)
    else:
        # Read config to estimate timeout
        try:
            if is_unified:
                # For unified configs, estimate based on all sources
                total_pages = 0
                for source in config.get("sources", []):
                    if source.get("type") == "documentation":
                        total_pages += source.get("max_pages", 500)
                max_pages = total_pages or 500
            else:
                max_pages = config.get("max_pages", 500)

            # Estimate: 30s per page + buffer
            timeout = max(3600, max_pages * 35)  # Minimum 1 hour, or 35s per page
        except Exception:
            timeout = 14400  # Default: 4 hours
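The page-based timeout heuristic above is a pure function of the page count and can be sketched as (helper name illustrative):

```python
def estimate_timeout(max_pages: int) -> int:
    """35 seconds per page with a one-hour floor, as in the tool above."""
    return max(3600, max_pages * 35)


assert estimate_timeout(50) == 3600     # small sites hit the one-hour floor
assert estimate_timeout(500) == 17500   # 500 pages × 35 s
```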

    # Add progress message
    if timeout:
        progress_msg += f"⏱️ Maximum time allowed: {timeout // 60} minutes\n"
    else:
        progress_msg += "⏱️ Unlimited mode - no timeout\n"
    progress_msg += "📝 Progress will be shown below:\n\n"

    # Run scraper with streaming
    stdout, stderr, returncode = run_subprocess_with_streaming(cmd, timeout=timeout)
    # Run converter in-process
    try:
        if is_unified:
            from skill_seekers.cli.unified_scraper import UnifiedScraper

            # Clean up temporary config
            if unlimited and Path(config_to_use).exists():
                Path(config_to_use).unlink()
            converter = UnifiedScraper(config_to_use, merge_mode=merge_mode)
        else:
            from skill_seekers.cli.skill_converter import get_converter

            output = progress_msg + stdout
            # For legacy format, detect type from config keys
            with open(config_to_use) as f:
                config_to_pass = json.load(f)

            if returncode == 0:
                return [TextContent(type="text", text=output)]
            else:
                error_output = output + f"\n\n❌ Error:\n{stderr}"
                return [TextContent(type="text", text=error_output)]
            # Detect source type from config content
            if "base_url" in config_to_pass:
                source_type = "web"
            elif "repo" in config_to_pass:
                source_type = "github"
            elif "pdf_path" in config_to_pass:
                source_type = "pdf"
            elif "directory" in config_to_pass:
                source_type = "local"
            else:
                source_type = "web"  # default fallback
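The same key-based dispatch could be factored into a small helper; a sketch (the function name `detect_source_type` is illustrative, the shipped code keeps the chain inline):

```python
def detect_source_type(config: dict) -> str:
    # Order matters: a config could contain several of these keys,
    # and the first match wins, mirroring the if/elif chain.
    if "base_url" in config:
        return "web"
    if "repo" in config:
        return "github"
    if "pdf_path" in config:
        return "pdf"
    if "directory" in config:
        return "local"
    return "web"  # default fallback


assert detect_source_type({"repo": "owner/name"}) == "github"
assert detect_source_type({"pdf_path": "manual.pdf"}) == "pdf"
assert detect_source_type({}) == "web"
```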

            converter = get_converter(source_type, config_to_pass)
            if dry_run:
                converter.dry_run = True

        result = _run_converter(converter, progress_msg)
    finally:
        # Clean up temporary config
        if unlimited and Path(config_to_use).exists():
            Path(config_to_use).unlink()

    return result


async def scrape_pdf_tool(args: dict) -> list[TextContent]:
@@ -318,44 +333,48 @@ async def scrape_pdf_tool(args: dict) -> list[TextContent]:
    description = args.get("description")
    from_json = args.get("from_json")

    # Build command
    cmd = [sys.executable, str(CLI_DIR / "pdf_scraper.py")]
    progress_msg = "📄 Scraping PDF documentation...\n\n"

    # Mode 1: Config file
    if config_path:
        cmd.extend(["--config", config_path])
        with open(config_path) as f:
            pdf_config = json.load(f)

    # Mode 2: Direct PDF
    elif pdf_path and name:
        cmd.extend(["--pdf", pdf_path, "--name", name])
        pdf_config = {"name": name, "pdf_path": pdf_path}
        if description:
            cmd.extend(["--description", description])
            pdf_config["description"] = description

    # Mode 3: From JSON
    # Mode 3: From JSON — use PDFToSkillConverter.load_extracted_data
    elif from_json:
        cmd.extend(["--from-json", from_json])
        from skill_seekers.cli.pdf_scraper import PDFToSkillConverter

        # Build a minimal config; name is derived from the JSON filename
        json_name = Path(from_json).stem.replace("_extracted", "")
        pdf_config = {"name": json_name}
        converter = PDFToSkillConverter(pdf_config)
        converter.load_extracted_data(from_json)
        converter.build_skill()
        return [
            TextContent(
                type="text",
                text=f"{progress_msg}✅ Skill built from extracted JSON: {from_json}",
            )
        ]

    else:
        return [
            TextContent(
                type="text", text="❌ Error: Must specify --config, --pdf + --name, or --from-json"
                type="text",
                text="❌ Error: Must specify --config, --pdf + --name, or --from-json",
            )
        ]

    # Run pdf_scraper.py with streaming (can take a while)
    timeout = 600  # 10 minutes for PDF extraction
    from skill_seekers.cli.skill_converter import get_converter

    progress_msg = "📄 Scraping PDF documentation...\n"
    progress_msg += f"⏱️ Maximum time: {timeout // 60} minutes\n\n"

    stdout, stderr, returncode = run_subprocess_with_streaming(cmd, timeout=timeout)

    output = progress_msg + stdout

    if returncode == 0:
        return [TextContent(type="text", text=output)]
    else:
        return [TextContent(type="text", text=f"{output}\n\n❌ Error:\n{stderr}")]
    converter = get_converter("pdf", pdf_config)
    return _run_converter(converter, progress_msg)


async def scrape_video_tool(args: dict) -> list[TextContent]:
@@ -411,29 +430,23 @@ async def scrape_video_tool(args: dict) -> list[TextContent]:
    start_time = args.get("start_time")
    end_time = args.get("end_time")

    # Build command
    cmd = [sys.executable, str(CLI_DIR / "video_scraper.py")]
    # Build config dict for the converter
    video_config: dict = {}

    if from_json:
        cmd.extend(["--from-json", from_json])
        video_config["from_json"] = from_json
        video_config["name"] = name or Path(from_json).stem.replace("_video_extracted", "")
    elif url:
        cmd.extend(["--url", url])
        if name:
            cmd.extend(["--name", name])
        if description:
            cmd.extend(["--description", description])
        if languages:
            cmd.extend(["--languages", languages])
        video_config["url"] = url
        if not name:
            return [TextContent(type="text", text="❌ Error: --name is required with --url")]
        video_config["name"] = name
    elif video_file:
        cmd.extend(["--video-file", video_file])
        if name:
            cmd.extend(["--name", name])
        if description:
            cmd.extend(["--description", description])
        video_config["video_file"] = video_file
        video_config["name"] = name or Path(video_file).stem
    elif playlist:
        cmd.extend(["--playlist", playlist])
        if name:
            cmd.extend(["--name", name])
        video_config["playlist"] = playlist
        video_config["name"] = name or "playlist"
    else:
        return [
            TextContent(
@@ -442,38 +455,31 @@ async def scrape_video_tool(args: dict) -> list[TextContent]:
            )
        ]

    # Visual extraction parameters
    if visual:
        cmd.append("--visual")
    if description:
        video_config["description"] = description
    if languages:
        video_config["languages"] = languages
    video_config["visual"] = visual
    if whisper_model:
        cmd.extend(["--whisper-model", whisper_model])
        video_config["whisper_model"] = whisper_model
    if visual_interval is not None:
        cmd.extend(["--visual-interval", str(visual_interval)])
        video_config["visual_interval"] = visual_interval
    if visual_min_gap is not None:
        cmd.extend(["--visual-min-gap", str(visual_min_gap)])
        video_config["visual_min_gap"] = visual_min_gap
    if visual_similarity is not None:
        cmd.extend(["--visual-similarity", str(visual_similarity)])
    if vision_ocr:
        cmd.append("--vision-ocr")
        video_config["visual_similarity"] = visual_similarity
    video_config["vision_ocr"] = vision_ocr
    if start_time:
        cmd.extend(["--start-time", str(start_time)])
        video_config["start_time"] = start_time
    if end_time:
        cmd.extend(["--end-time", str(end_time)])
        video_config["end_time"] = end_time

    # Run video_scraper.py with streaming
    timeout = 600  # 10 minutes for video extraction
    progress_msg = "🎬 Scraping video content...\n\n"

    progress_msg = "🎬 Scraping video content...\n"
    progress_msg += f"⏱️ Maximum time: {timeout // 60} minutes\n\n"
    from skill_seekers.cli.skill_converter import get_converter

    stdout, stderr, returncode = run_subprocess_with_streaming(cmd, timeout=timeout)

    output = progress_msg + stdout

    if returncode == 0:
        return [TextContent(type="text", text=output)]
    else:
        return [TextContent(type="text", text=f"{output}\n\n❌ Error:\n{stderr}")]
    converter = get_converter("video", video_config)
    return _run_converter(converter, progress_msg)
|
||||
|
||||
|
||||
async def scrape_github_tool(args: dict) -> list[TextContent]:
|
||||
@@ -510,50 +516,37 @@ async def scrape_github_tool(args: dict) -> list[TextContent]:
     max_issues = args.get("max_issues", 100)
     scrape_only = args.get("scrape_only", False)
 
-    # Build command
-    cmd = [sys.executable, str(CLI_DIR / "github_scraper.py")]
-
-    # Mode 1: Config file
+    # Build config dict for the converter
     if config_path:
-        cmd.extend(["--config", config_path])
-
-    # Mode 2: Direct repo
+        with open(config_path) as f:
+            github_config = json.load(f)
     elif repo:
-        cmd.extend(["--repo", repo])
+        github_config: dict = {"repo": repo}
         if name:
-            cmd.extend(["--name", name])
+            github_config["name"] = name
         if description:
-            cmd.extend(["--description", description])
+            github_config["description"] = description
         if token:
-            cmd.extend(["--token", token])
+            github_config["token"] = token
        if no_issues:
-            cmd.append("--no-issues")
+            github_config["no_issues"] = True
        if no_changelog:
-            cmd.append("--no-changelog")
+            github_config["no_changelog"] = True
        if no_releases:
-            cmd.append("--no-releases")
+            github_config["no_releases"] = True
        if max_issues != 100:
-            cmd.extend(["--max-issues", str(max_issues)])
+            github_config["max_issues"] = max_issues
        if scrape_only:
-            cmd.append("--scrape-only")
+            github_config["scrape_only"] = True
     else:
         return [TextContent(type="text", text="❌ Error: Must specify --repo or --config")]
 
-    # Run github_scraper.py with streaming (can take a while)
-    timeout = 600  # 10 minutes for GitHub scraping
-    progress_msg = "🐙 Scraping GitHub repository...\n\n"
-    progress_msg += f"⏱️ Maximum time: {timeout // 60} minutes\n\n"
-
-    stdout, stderr, returncode = run_subprocess_with_streaming(cmd, timeout=timeout)
-
-    output = progress_msg + stdout
-
-    if returncode == 0:
-        return [TextContent(type="text", text=output)]
-    else:
-        return [TextContent(type="text", text=f"{output}\n\n❌ Error:\n{stderr}")]
+    progress_msg = "🐙 Scraping GitHub repository...\n"
+
+    from skill_seekers.cli.skill_converter import get_converter
+
+    converter = get_converter("github", github_config)
+    return _run_converter(converter, progress_msg)
 
 
 async def scrape_codebase_tool(args: dict) -> list[TextContent]:
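The hunk above swaps a subprocess invocation of `github_scraper.py` for a direct converter call. A minimal sketch of the factory pattern involved, assuming only the names visible in the diff (`get_converter`, a per-source converter); the class body is an illustrative stand-in, not the real implementation:

```python
# Registry-based converter factory: each source type maps to a converter
# class, and get_converter() builds one from a plain config dict.
# GitHubConverter here is a toy stand-in for illustration only.

class GitHubConverter:
    def __init__(self, config: dict):
        self.repo = config["repo"]
        self.scrape_only = config.get("scrape_only", False)

    def run(self) -> str:
        # The real converter scrapes and builds a skill; this just reports.
        return f"converted {self.repo}"


_CONVERTERS = {"github": GitHubConverter}


def get_converter(source_type: str, config: dict):
    try:
        return _CONVERTERS[source_type](config)
    except KeyError:
        raise ValueError(f"unknown source type: {source_type}") from None
```

With this shape, the MCP tool only needs to build a config dict and hand it to a shared runner, instead of assembling argv lists and parsing subprocess output.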
@@ -605,7 +598,7 @@ async def scrape_codebase_tool(args: dict) -> list[TextContent]:
     if not directory:
         return [TextContent(type="text", text="❌ Error: directory parameter is required")]
 
-    output = args.get("output", "output/codebase/")
+    output_dir = args.get("output", "output/codebase/")
     depth = args.get("depth", "deep")
     languages = args.get("languages", "")
     file_patterns = args.get("file_patterns", "")
@@ -620,43 +613,28 @@ async def scrape_codebase_tool(args: dict) -> list[TextContent]:
     skip_config_patterns = args.get("skip_config_patterns", False)
     skip_docs = args.get("skip_docs", False)
 
-    # Build command
-    cmd = [sys.executable, "-m", "skill_seekers.cli.codebase_scraper"]
-    cmd.extend(["--directory", directory])
+    # Derive a name from the directory for the converter
+    dir_name = Path(directory).resolve().name or "codebase"
 
-    if output:
-        cmd.extend(["--output", output])
-    if depth:
-        cmd.extend(["--depth", depth])
+    # Build config dict for CodebaseAnalyzer
+    codebase_config: dict = {
+        "name": dir_name,
+        "directory": directory,
+        "output_dir": output_dir,
+        "depth": depth,
+        "enhance_level": enhance_level,
+        "build_api_reference": not skip_api_reference,
+        "build_dependency_graph": not skip_dependency_graph,
+        "detect_patterns": not skip_patterns,
+        "extract_test_examples": not skip_test_examples,
+        "build_how_to_guides": not skip_how_to_guides,
+        "extract_config_patterns": not skip_config_patterns,
+        "extract_docs": not skip_docs,
+    }
     if languages:
-        cmd.extend(["--languages", languages])
+        codebase_config["languages"] = languages
     if file_patterns:
-        cmd.extend(["--file-patterns", file_patterns])
-    if enhance_level > 0:
-        cmd.extend(["--enhance-level", str(enhance_level)])
-
-    # Skip flags
-    if skip_api_reference:
-        cmd.append("--skip-api-reference")
-    if skip_dependency_graph:
-        cmd.append("--skip-dependency-graph")
-    if skip_patterns:
-        cmd.append("--skip-patterns")
-    if skip_test_examples:
-        cmd.append("--skip-test-examples")
-    if skip_how_to_guides:
-        cmd.append("--skip-how-to-guides")
-    if skip_config_patterns:
-        cmd.append("--skip-config-patterns")
-    if skip_docs:
-        cmd.append("--skip-docs")
-
-    # Adjust timeout based on enhance_level
-    timeout = 600  # 10 minutes base
-    if enhance_level >= 2:
-        timeout = 1200  # 20 minutes with AI enhancement
-    if enhance_level >= 3:
-        timeout = 3600  # 60 minutes for full enhancement
+        codebase_config["file_patterns"] = file_patterns
 
     level_names = {0: "off", 1: "SKILL.md only", 2: "standard", 3: "full"}
     progress_msg = "🔍 Analyzing local codebase...\n"
@@ -664,16 +642,12 @@ async def scrape_codebase_tool(args: dict) -> list[TextContent]:
     progress_msg += f"📊 Depth: {depth}\n"
     if enhance_level > 0:
         progress_msg += f"🤖 AI Enhancement: Level {enhance_level} ({level_names.get(enhance_level, 'unknown')})\n"
-    progress_msg += f"⏱️ Maximum time: {timeout // 60} minutes\n\n"
+    progress_msg += "\n"
 
-    stdout, stderr, returncode = run_subprocess_with_streaming(cmd, timeout=timeout)
+    from skill_seekers.cli.skill_converter import get_converter
 
-    output_text = progress_msg + stdout
-
-    if returncode == 0:
-        return [TextContent(type="text", text=output_text)]
-    else:
-        return [TextContent(type="text", text=f"{output_text}\n\n❌ Error:\n{stderr}")]
+    converter = get_converter("local", codebase_config)
+    return _run_converter(converter, progress_msg)
 
 
 async def detect_patterns_tool(args: dict) -> list[TextContent]:
@@ -1096,51 +1070,35 @@ async def scrape_generic_tool(args: dict) -> list[TextContent]:
             )
         ]
 
-    # Build the subprocess command
-    # Map source type to module name (most are <type>_scraper, but some differ)
-    _MODULE_NAMES = {
-        "manpage": "man_scraper",
-    }
-    module_name = _MODULE_NAMES.get(source_type, f"{source_type}_scraper")
-    cmd = [sys.executable, "-m", f"skill_seekers.cli.{module_name}"]
-
-    # Map source type to the correct CLI flag for file/path input and URL input.
-    # Each scraper has its own flag name — using a generic --path or --url would fail.
-    _PATH_FLAGS: dict[str, str] = {
-        "jupyter": "--notebook",
-        "html": "--html-path",
-        "openapi": "--spec",
-        "asciidoc": "--asciidoc-path",
-        "pptx": "--pptx",
-        "manpage": "--man-path",
-        "confluence": "--export-path",
-        "notion": "--export-path",
-        "rss": "--feed-path",
-        "chat": "--export-path",
-    }
-    _URL_FLAGS: dict[str, str] = {
-        "confluence": "--base-url",
-        "notion": "--page-id",
-        "rss": "--feed-url",
-        "openapi": "--spec-url",
-    }
+    # Build config dict for the converter — map MCP args to the keys
+    # each converter expects in its __init__.
+    _CONFIG_KEY: dict[str, str] = {
+        "jupyter": "notebook_path",
+        "html": "html_path",
+        "openapi": "spec_path",
+        "asciidoc": "asciidoc_path",
+        "pptx": "pptx_path",
+        "manpage": "man_path",
+        "confluence": "export_path",
+        "notion": "export_path",
+        "rss": "feed_path",
+        "chat": "export_path",
+    }
+    _URL_CONFIG_KEY: dict[str, str] = {
+        "confluence": "base_url",
+        "notion": "page_id",
+        "rss": "feed_url",
+        "openapi": "spec_url",
+    }
 
-    # Determine the input flag based on source type
+    config: dict = {"name": name}
 
     if source_type in _URL_BASED_TYPES and url:
-        url_flag = _URL_FLAGS.get(source_type, "--url")
-        cmd.extend([url_flag, url])
+        config[_URL_CONFIG_KEY.get(source_type, "url")] = url
     elif path:
-        path_flag = _PATH_FLAGS.get(source_type, "--path")
-        cmd.extend([path_flag, path])
+        config[_CONFIG_KEY.get(source_type, "path")] = path
     elif url:
         # Allow url fallback for file-based types (some may accept URLs too)
-        url_flag = _URL_FLAGS.get(source_type, "--url")
-        cmd.extend([url_flag, url])
-
-    cmd.extend(["--name", name])
-
-    # Set a reasonable timeout
-    timeout = 600  # 10 minutes
+        config[_URL_CONFIG_KEY.get(source_type, "url")] = url
 
     emoji = _SOURCE_EMOJIS.get(source_type, "🔧")
     progress_msg = f"{emoji} Scraping {source_type} source...\n"
@@ -1148,14 +1106,9 @@ async def scrape_generic_tool(args: dict) -> list[TextContent]:
         progress_msg += f"📁 Path: {path}\n"
     if url:
         progress_msg += f"🔗 URL: {url}\n"
-    progress_msg += f"📛 Name: {name}\n"
-    progress_msg += f"⏱️ Maximum time: {timeout // 60} minutes\n\n"
+    progress_msg += f"📛 Name: {name}\n\n"
 
-    stdout, stderr, returncode = run_subprocess_with_streaming(cmd, timeout=timeout)
+    from skill_seekers.cli.skill_converter import get_converter
 
-    output = progress_msg + stdout
-
-    if returncode == 0:
-        return [TextContent(type="text", text=output)]
-    else:
-        return [TextContent(type="text", text=f"{output}\n\n❌ Error:\n{stderr}")]
+    converter = get_converter(source_type, config)
+    return _run_converter(converter, progress_msg)
@@ -29,3 +29,17 @@ def pytest_configure(config):  # noqa: ARG001
 def anyio_backend():
     """Override anyio backend to only use asyncio (not trio)."""
     return "asyncio"
+
+
+@pytest.fixture(autouse=True)
+def _reset_execution_context():
+    """Reset the ExecutionContext singleton before and after every test.
+
+    Without this, a test that calls ExecutionContext.initialize() poisons
+    all subsequent tests in the same process.
+    """
+    from skill_seekers.cli.execution_context import ExecutionContext
+
+    ExecutionContext.reset()
+    yield
+    ExecutionContext.reset()
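The autouse fixture above guards against singleton state leaking between tests. A minimal sketch of the failure mode and the reset pattern, using a toy singleton as a stand-in for the real `ExecutionContext`:

```python
# Toy process-wide singleton: initialize() caches the first instance, so
# state set in one test would leak into the next without an explicit reset.

class Context:
    _instance = None

    def __init__(self, mode: str):
        self.mode = mode

    @classmethod
    def initialize(cls, mode: str = "local"):
        if cls._instance is None:
            cls._instance = cls(mode)
        return cls._instance

    @classmethod
    def reset(cls):
        cls._instance = None


# Simulate two "tests": without the reset in between, the second call
# would silently return the cached instance with mode="api".
first = Context.initialize("api")
Context.reset()  # what the autouse fixture does between tests
second = Context.initialize("local")
```

Running the reset both before and after the `yield`, as the fixture does, also protects against a prior test that initialized the singleton without any fixture at all.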
@@ -1,263 +0,0 @@
-#!/usr/bin/env python3
-"""Tests for analyze subcommand integration in main CLI."""
-
-import sys
-import unittest
-from pathlib import Path
-
-sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
-
-from skill_seekers.cli.main import create_parser
-
-
-class TestAnalyzeSubcommand(unittest.TestCase):
-    """Test analyze subcommand registration and argument parsing."""
-
-    def setUp(self):
-        """Create parser for testing."""
-        self.parser = create_parser()
-
-    def test_analyze_subcommand_exists(self):
-        """Test that analyze subcommand is registered."""
-        args = self.parser.parse_args(["analyze", "--directory", "."])
-        self.assertEqual(args.command, "analyze")
-        self.assertEqual(args.directory, ".")
-
-    def test_analyze_with_output_directory(self):
-        """Test analyze with custom output directory."""
-        args = self.parser.parse_args(["analyze", "--directory", ".", "--output", "custom/"])
-        self.assertEqual(args.output, "custom/")
-
-    def test_quick_preset_flag(self):
-        """Test --quick preset flag parsing."""
-        args = self.parser.parse_args(["analyze", "--directory", ".", "--quick"])
-        self.assertTrue(args.quick)
-        self.assertFalse(args.comprehensive)
-
-    def test_comprehensive_preset_flag(self):
-        """Test --comprehensive preset flag parsing."""
-        args = self.parser.parse_args(["analyze", "--directory", ".", "--comprehensive"])
-        self.assertTrue(args.comprehensive)
-        self.assertFalse(args.quick)
-
-    def test_quick_and_comprehensive_mutually_exclusive(self):
-        """Test that both flags can be parsed (mutual exclusion enforced at runtime)."""
-        # The parser allows both flags; runtime logic prevents simultaneous use
-        args = self.parser.parse_args(["analyze", "--directory", ".", "--quick", "--comprehensive"])
-        self.assertTrue(args.quick)
-        self.assertTrue(args.comprehensive)
-        # Note: Runtime will catch this and return error code 1
-
-    def test_enhance_level_flag(self):
-        """Test --enhance-level flag parsing."""
-        args = self.parser.parse_args(["analyze", "--directory", ".", "--enhance-level", "2"])
-        self.assertEqual(args.enhance_level, 2)
-
-    def test_skip_flags_passed_through(self):
-        """Test that skip flags are recognized."""
-        args = self.parser.parse_args(
-            ["analyze", "--directory", ".", "--skip-patterns", "--skip-test-examples"]
-        )
-        self.assertTrue(args.skip_patterns)
-        self.assertTrue(args.skip_test_examples)
-
-    def test_all_skip_flags(self):
-        """Test all skip flags are properly parsed."""
-        args = self.parser.parse_args(
-            [
-                "analyze",
-                "--directory",
-                ".",
-                "--skip-api-reference",
-                "--skip-dependency-graph",
-                "--skip-patterns",
-                "--skip-test-examples",
-                "--skip-how-to-guides",
-                "--skip-config-patterns",
-                "--skip-docs",
-            ]
-        )
-        self.assertTrue(args.skip_api_reference)
-        self.assertTrue(args.skip_dependency_graph)
-        self.assertTrue(args.skip_patterns)
-        self.assertTrue(args.skip_test_examples)
-        self.assertTrue(args.skip_how_to_guides)
-        self.assertTrue(args.skip_config_patterns)
-        self.assertTrue(args.skip_docs)
-
-    def test_backward_compatible_depth_flag(self):
-        """Test that deprecated --depth flag still works."""
-        args = self.parser.parse_args(["analyze", "--directory", ".", "--depth", "full"])
-        self.assertEqual(args.depth, "full")
-
-    def test_depth_flag_choices(self):
-        """Test that depth flag accepts correct values."""
-        for depth in ["surface", "deep", "full"]:
-            args = self.parser.parse_args(["analyze", "--directory", ".", "--depth", depth])
-            self.assertEqual(args.depth, depth)
-
-    def test_languages_flag(self):
-        """Test languages flag parsing."""
-        args = self.parser.parse_args(
-            ["analyze", "--directory", ".", "--languages", "Python,JavaScript"]
-        )
-        self.assertEqual(args.languages, "Python,JavaScript")
-
-    def test_file_patterns_flag(self):
-        """Test file patterns flag parsing."""
-        args = self.parser.parse_args(
-            ["analyze", "--directory", ".", "--file-patterns", "*.py,src/**/*.js"]
-        )
-        self.assertEqual(args.file_patterns, "*.py,src/**/*.js")
-
-    def test_no_comments_flag(self):
-        """Test no-comments flag parsing."""
-        args = self.parser.parse_args(["analyze", "--directory", ".", "--no-comments"])
-        self.assertTrue(args.no_comments)
-
-    def test_verbose_flag(self):
-        """Test verbose flag parsing."""
-        args = self.parser.parse_args(["analyze", "--directory", ".", "--verbose"])
-        self.assertTrue(args.verbose)
-
-    def test_complex_command_combination(self):
-        """Test complex command with multiple flags."""
-        args = self.parser.parse_args(
-            [
-                "analyze",
-                "--directory",
-                "./src",
-                "--output",
-                "analysis/",
-                "--quick",
-                "--languages",
-                "Python",
-                "--skip-patterns",
-                "--verbose",
-            ]
-        )
-        self.assertEqual(args.directory, "./src")
-        self.assertEqual(args.output, "analysis/")
-        self.assertTrue(args.quick)
-        self.assertEqual(args.languages, "Python")
-        self.assertTrue(args.skip_patterns)
-        self.assertTrue(args.verbose)
-
-    def test_directory_is_required(self):
-        """Test that directory argument is required."""
-        with self.assertRaises(SystemExit):
-            self.parser.parse_args(["analyze"])
-
-    def test_default_output_directory(self):
-        """Test default output directory value."""
-        args = self.parser.parse_args(["analyze", "--directory", "."])
-        self.assertEqual(args.output, "output/codebase/")
-
-
-class TestAnalyzePresetBehavior(unittest.TestCase):
-    """Test preset flag behavior and argument transformation."""
-
-    def setUp(self):
-        """Create parser for testing."""
-        self.parser = create_parser()
-
-    def test_quick_preset_implies_surface_depth(self):
-        """Test that --quick preset should trigger surface depth."""
-        args = self.parser.parse_args(["analyze", "--directory", ".", "--quick"])
-        self.assertTrue(args.quick)
-        # Note: Depth transformation happens in dispatch handler
-
-    def test_comprehensive_preset_implies_full_depth(self):
-        """Test that --comprehensive preset should trigger full depth."""
-        args = self.parser.parse_args(["analyze", "--directory", ".", "--comprehensive"])
-        self.assertTrue(args.comprehensive)
-        # Note: Depth transformation happens in dispatch handler
-
-    def test_enhance_level_standalone(self):
-        """Test --enhance-level can be used without presets."""
-        args = self.parser.parse_args(["analyze", "--directory", ".", "--enhance-level", "3"])
-        self.assertEqual(args.enhance_level, 3)
-        self.assertFalse(args.quick)
-        self.assertFalse(args.comprehensive)
-
-
-class TestAnalyzeWorkflowFlags(unittest.TestCase):
-    """Test workflow and parity flags added to the analyze subcommand."""
-
-    def setUp(self):
-        """Create parser for testing."""
-        self.parser = create_parser()
-
-    def test_enhance_workflow_accepted_as_list(self):
-        """Test --enhance-workflow is accepted and stored as a list."""
-        args = self.parser.parse_args(
-            ["analyze", "--directory", ".", "--enhance-workflow", "security-focus"]
-        )
-        self.assertEqual(args.enhance_workflow, ["security-focus"])
-
-    def test_enhance_workflow_chained_twice(self):
-        """Test --enhance-workflow can be chained to produce a two-item list."""
-        args = self.parser.parse_args(
-            [
-                "analyze",
-                "--directory",
-                ".",
-                "--enhance-workflow",
-                "security-focus",
-                "--enhance-workflow",
-                "minimal",
-            ]
-        )
-        self.assertEqual(args.enhance_workflow, ["security-focus", "minimal"])
-
-    def test_enhance_stage_accepted_as_list(self):
-        """Test --enhance-stage is accepted with action=append."""
-        args = self.parser.parse_args(
-            ["analyze", "--directory", ".", "--enhance-stage", "sec:Analyze security"]
-        )
-        self.assertEqual(args.enhance_stage, ["sec:Analyze security"])
-
-    def test_var_accepted_as_list(self):
-        """Test --var is accepted with action=append (dest is 'var')."""
-        args = self.parser.parse_args(["analyze", "--directory", ".", "--var", "focus=performance"])
-        self.assertEqual(args.var, ["focus=performance"])
-
-    def test_workflow_dry_run_flag(self):
-        """Test --workflow-dry-run sets the flag."""
-        args = self.parser.parse_args(["analyze", "--directory", ".", "--workflow-dry-run"])
-        self.assertTrue(args.workflow_dry_run)
-
-    def test_api_key_stored_correctly(self):
-        """Test --api-key is stored in args."""
-        args = self.parser.parse_args(["analyze", "--directory", ".", "--api-key", "sk-ant-test"])
-        self.assertEqual(args.api_key, "sk-ant-test")
-
-    def test_dry_run_stored_correctly(self):
-        """Test --dry-run is stored in args."""
-        args = self.parser.parse_args(["analyze", "--directory", ".", "--dry-run"])
-        self.assertTrue(args.dry_run)
-
-    def test_workflow_flags_combined(self):
-        """Test workflow flags can be combined with other analyze flags."""
-        args = self.parser.parse_args(
-            [
-                "analyze",
-                "--directory",
-                ".",
-                "--enhance-workflow",
-                "security-focus",
-                "--api-key",
-                "sk-ant-test",
-                "--dry-run",
-                "--enhance-level",
-                "1",
-            ]
-        )
-        self.assertEqual(args.enhance_workflow, ["security-focus"])
-        self.assertEqual(args.api_key, "sk-ant-test")
-        self.assertTrue(args.dry_run)
-        self.assertEqual(args.enhance_level, 1)
-
-
-if __name__ == "__main__":
-    unittest.main()
@@ -1,344 +0,0 @@
-#!/usr/bin/env python3
-"""
-End-to-End tests for the new 'analyze' command.
-Tests real-world usage scenarios with actual command execution.
-"""
-
-import json
-import shutil
-import subprocess
-import sys
-import tempfile
-import unittest
-from pathlib import Path
-
-sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
-
-
-class TestAnalyzeCommandE2E(unittest.TestCase):
-    """End-to-end tests for skill-seekers analyze command."""
-
-    @classmethod
-    def setUpClass(cls):
-        """Set up test fixtures once for all tests."""
-        cls.test_dir = Path(tempfile.mkdtemp(prefix="analyze_e2e_"))
-        cls.create_sample_codebase()
-
-    @classmethod
-    def tearDownClass(cls):
-        """Clean up test directory."""
-        if cls.test_dir.exists():
-            shutil.rmtree(cls.test_dir)
-
-    @classmethod
-    def create_sample_codebase(cls):
-        """Create a sample Python codebase for testing."""
-        # Create directory structure
-        (cls.test_dir / "src").mkdir()
-        (cls.test_dir / "tests").mkdir()
-
-        # Create sample Python files
-        (cls.test_dir / "src" / "__init__.py").write_text("")
-
-        (cls.test_dir / "src" / "main.py").write_text('''
-"""Main application module."""
-
-class Application:
-    """Main application class."""
-
-    def __init__(self, name: str):
-        """Initialize application.
-
-        Args:
-            name: Application name
-        """
-        self.name = name
-
-    def run(self):
-        """Run the application."""
-        print(f"Running {self.name}")
-        return True
-''')
-
-        (cls.test_dir / "tests" / "test_main.py").write_text('''
-"""Tests for main module."""
-import unittest
-from src.main import Application
-
-class TestApplication(unittest.TestCase):
-    """Test Application class."""
-
-    def test_init(self):
-        """Test application initialization."""
-        app = Application("Test")
-        self.assertEqual(app.name, "Test")
-
-    def test_run(self):
-        """Test application run."""
-        app = Application("Test")
-        self.assertTrue(app.run())
-''')
-
-    def run_command(self, *args, timeout=120):
-        """Run skill-seekers command and return result."""
-        cmd = ["skill-seekers"] + list(args)
-        result = subprocess.run(
-            cmd, capture_output=True, text=True, timeout=timeout, cwd=str(self.test_dir)
-        )
-        return result
-
-    def test_analyze_help_shows_command(self):
-        """Test that analyze command appears in main help."""
-        result = self.run_command("--help", timeout=5)
-        self.assertEqual(result.returncode, 0, f"Help failed: {result.stderr}")
-        self.assertIn("analyze", result.stdout)
-        self.assertIn("Analyze local codebase", result.stdout)
-
-    def test_analyze_subcommand_help(self):
-        """Test that analyze subcommand has proper help."""
-        result = self.run_command("analyze", "--help", timeout=5)
-        self.assertEqual(result.returncode, 0, f"Analyze help failed: {result.stderr}")
-        self.assertIn("--quick", result.stdout)
-        self.assertIn("--comprehensive", result.stdout)
-        self.assertIn("--enhance", result.stdout)
-        self.assertIn("--directory", result.stdout)
-
-    def test_analyze_quick_preset(self):
-        """Test quick analysis preset (real execution)."""
-        output_dir = self.test_dir / "output_quick"
-
-        result = self.run_command(
-            "analyze", "--directory", str(self.test_dir), "--output", str(output_dir), "--quick"
-        )
-
-        # Check command succeeded
-        self.assertEqual(
-            result.returncode,
-            0,
-            f"Quick analysis failed:\nSTDOUT: {result.stdout}\nSTDERR: {result.stderr}",
-        )
-
-        # Verify output directory was created
-        self.assertTrue(output_dir.exists(), "Output directory not created")
-
-        # Verify SKILL.md was generated
-        skill_file = output_dir / "SKILL.md"
-        self.assertTrue(skill_file.exists(), "SKILL.md not generated")
-
-        # Verify SKILL.md has content and valid structure
-        skill_content = skill_file.read_text()
-        self.assertGreater(len(skill_content), 100, "SKILL.md is too short")
-
-        # Check for expected structure (works even with 0 files analyzed)
-        self.assertIn("Codebase", skill_content, "Missing codebase header")
-        self.assertIn("Analysis", skill_content, "Missing analysis section")
-
-        # Verify it's valid markdown with frontmatter
-        self.assertTrue(skill_content.startswith("---"), "Missing YAML frontmatter")
-        self.assertIn("name:", skill_content, "Missing name in frontmatter")
-
-    def test_analyze_with_custom_output(self):
-        """Test analysis with custom output directory."""
-        output_dir = self.test_dir / "custom_output"
-
-        result = self.run_command(
-            "analyze", "--directory", str(self.test_dir), "--output", str(output_dir), "--quick"
-        )
-
-        self.assertEqual(result.returncode, 0, f"Analysis failed: {result.stderr}")
-        self.assertTrue(output_dir.exists(), "Custom output directory not created")
-        self.assertTrue((output_dir / "SKILL.md").exists(), "SKILL.md not in custom directory")
-
-    def test_analyze_skip_flags_work(self):
-        """Test that skip flags are properly handled."""
-        output_dir = self.test_dir / "output_skip"
-
-        result = self.run_command(
-            "analyze",
-            "--directory",
-            str(self.test_dir),
-            "--output",
-            str(output_dir),
-            "--quick",
-            "--skip-patterns",
-            "--skip-test-examples",
-        )
-
-        self.assertEqual(result.returncode, 0, f"Analysis with skip flags failed: {result.stderr}")
-        self.assertTrue(
-            (output_dir / "SKILL.md").exists(), "SKILL.md not generated with skip flags"
-        )
-
-    def test_analyze_invalid_directory(self):
-        """Test analysis with non-existent directory."""
-        result = self.run_command(
-            "analyze", "--directory", "/nonexistent/directory/path", "--quick", timeout=10
-        )
-
-        # Should fail with error
-        self.assertNotEqual(result.returncode, 0, "Should fail with invalid directory")
-        self.assertTrue(
-            "not found" in result.stderr.lower() or "does not exist" in result.stderr.lower(),
-            f"Expected directory error, got: {result.stderr}",
-        )
-
-    def test_analyze_missing_directory_arg(self):
-        """Test that --directory is required."""
-        result = self.run_command("analyze", "--quick", timeout=5)
-
-        # Should fail without --directory
-        self.assertNotEqual(result.returncode, 0, "Should fail without --directory")
-        self.assertTrue(
-            "required" in result.stderr.lower() or "directory" in result.stderr.lower(),
-            f"Expected missing argument error, got: {result.stderr}",
-        )
-
-    def test_backward_compatibility_depth_flag(self):
-        """Test that old --depth flag still works."""
-        output_dir = self.test_dir / "output_depth"
-
-        result = self.run_command(
-            "analyze",
-            "--directory",
-            str(self.test_dir),
-            "--output",
-            str(output_dir),
-            "--depth",
-            "surface",
-        )
-
-        self.assertEqual(result.returncode, 0, f"Depth flag failed: {result.stderr}")
-        self.assertTrue((output_dir / "SKILL.md").exists(), "SKILL.md not generated with --depth")
-
-    def test_analyze_generates_references(self):
-        """Test that references directory is created."""
-        output_dir = self.test_dir / "output_refs"
-
-        result = self.run_command(
-            "analyze", "--directory", str(self.test_dir), "--output", str(output_dir), "--quick"
-        )
-
-        self.assertEqual(result.returncode, 0, f"Analysis failed: {result.stderr}")
-
-        # Check for references directory
-        refs_dir = output_dir / "references"
-        if refs_dir.exists():  # Optional, depends on content
-            self.assertTrue(refs_dir.is_dir(), "References is not a directory")
-
-    def test_analyze_output_structure(self):
-        """Test that output has expected structure."""
-        output_dir = self.test_dir / "output_structure"
-
-        result = self.run_command(
-            "analyze", "--directory", str(self.test_dir), "--output", str(output_dir), "--quick"
-        )
-
-        self.assertEqual(result.returncode, 0, f"Analysis failed: {result.stderr}")
-
-        # Verify expected files/directories
-        self.assertTrue((output_dir / "SKILL.md").exists(), "SKILL.md missing")
-
-        # Check for code_analysis.json if it exists
-        analysis_file = output_dir / "code_analysis.json"
-        if analysis_file.exists():
-            # Verify it's valid JSON
-            with open(analysis_file) as f:
-                data = json.load(f)
-            self.assertIsInstance(data, (dict, list), "code_analysis.json is not valid JSON")
-
-
-class TestAnalyzeOldCommand(unittest.TestCase):
-    """Test that old skill-seekers-codebase command still works."""
-
-    def test_old_command_still_exists(self):
-        """Test that skill-seekers-codebase still exists."""
-        result = subprocess.run(
-            ["skill-seekers-codebase", "--help"], capture_output=True, text=True, timeout=5
-        )
-
-        # Command should exist and show help
-        self.assertEqual(result.returncode, 0, f"Old command doesn't work: {result.stderr}")
-        self.assertIn("--directory", result.stdout)
-
-
-class TestAnalyzeIntegration(unittest.TestCase):
-    """Integration tests for analyze command with other features."""
-
-    def setUp(self):
-        """Set up test directory."""
-        self.test_dir = Path(tempfile.mkdtemp(prefix="analyze_int_"))
-
-        # Create minimal Python project
-        (self.test_dir / "main.py").write_text('''
-def hello():
-    """Say hello."""
-    return "Hello, World!"
-''')
-
-    def tearDown(self):
-        """Clean up test directory."""
-        if self.test_dir.exists():
-            shutil.rmtree(self.test_dir)
-
-    def test_analyze_then_check_output(self):
-        """Test analyzing and verifying output can be read."""
-        output_dir = self.test_dir / "output"
-
-        # Run analysis
-        result = subprocess.run(
-            [
-                "skill-seekers",
-                "analyze",
-                "--directory",
-                str(self.test_dir),
-                "--output",
-                str(output_dir),
-                "--quick",
-            ],
-            capture_output=True,
-            text=True,
-            timeout=120,
-        )
-
-        self.assertEqual(result.returncode, 0, f"Analysis failed: {result.stderr}")
-
-        # Read and verify SKILL.md
-        skill_file = output_dir / "SKILL.md"
-        self.assertTrue(skill_file.exists(), "SKILL.md not created")
-
-        content = skill_file.read_text()
-        # Check for valid structure instead of specific content
-        # (file detection may vary in temp directories)
-        self.assertGreater(len(content), 50, "Output too short")
-        self.assertIn("Codebase", content, "Missing codebase header")
-        self.assertTrue(content.startswith("---"), "Missing YAML frontmatter")
-
-    def test_analyze_verbose_flag(self):
-        """Test that verbose flag works."""
-        output_dir = self.test_dir / "output"
-
-        result = subprocess.run(
-            [
-                "skill-seekers",
-                "analyze",
-                "--directory",
-                str(self.test_dir),
-                "--output",
-                str(output_dir),
-                "--quick",
-                "--verbose",
-            ],
-            capture_output=True,
-            text=True,
-            timeout=120,
-        )
-
-        self.assertEqual(result.returncode, 0, f"Verbose analysis failed: {result.stderr}")
-
-        # Verbose should produce more output
-        combined_output = result.stdout + result.stderr
-        self.assertGreater(len(combined_output), 100, "Verbose mode didn't produce extra output")
-
-
-if __name__ == "__main__":
-    unittest.main()
@@ -37,9 +37,7 @@ class TestBootstrapSkillScript:

        # Must have commands table
        assert "## Commands" in content, "Header must have Commands section"
        assert "skill-seekers analyze" in content, "Header must mention analyze command"
        assert "skill-seekers scrape" in content, "Header must mention scrape command"
        assert "skill-seekers github" in content, "Header must mention github command"
        assert "skill-seekers create" in content, "Header must mention create command"

    def test_header_has_yaml_frontmatter(self, project_root):
        """Test that header has valid YAML frontmatter."""
@@ -147,18 +147,31 @@ class TestDocScraperBrowserIntegration:
 
 
 class TestBrowserArgument:
-    """Test --browser argument is registered in CLI."""
+    """Test --browser argument is accepted by DocToSkillConverter config."""
 
-    def test_scrape_parser_accepts_browser_flag(self):
-        from skill_seekers.cli.doc_scraper import setup_argument_parser
-
-        parser = setup_argument_parser()
-        args = parser.parse_args(["--name", "test", "--url", "https://example.com", "--browser"])
-        assert args.browser is True
+    def test_browser_config_true(self):
+        """Test that DocToSkillConverter accepts browser=True in config."""
+        from skill_seekers.cli.doc_scraper import DocToSkillConverter
+
+        config = {
+            "name": "test",
+            "base_url": "https://example.com",
+            "browser": True,
+            "selectors": {},
+            "url_patterns": {"include": [], "exclude": []},
+        }
+        scraper = DocToSkillConverter(config)
+        assert scraper.browser_mode is True
 
-    def test_scrape_parser_browser_default_false(self):
-        from skill_seekers.cli.doc_scraper import setup_argument_parser
-
-        parser = setup_argument_parser()
-        args = parser.parse_args(["--name", "test", "--url", "https://example.com"])
-        assert args.browser is False
+    def test_browser_config_default_false(self):
+        """Test that DocToSkillConverter defaults browser to False."""
+        from skill_seekers.cli.doc_scraper import DocToSkillConverter
+
+        config = {
+            "name": "test",
+            "base_url": "https://example.com",
+            "selectors": {},
+            "url_patterns": {"include": [], "exclude": []},
+        }
+        scraper = DocToSkillConverter(config)
+        assert scraper.browser_mode is False
@@ -14,8 +14,6 @@ from skill_seekers.cli.parsers import (
    get_parser_names,
    register_parsers,
)
from skill_seekers.cli.parsers.scrape_parser import ScrapeParser
from skill_seekers.cli.parsers.github_parser import GitHubParser
from skill_seekers.cli.parsers.package_parser import PackageParser

@@ -24,20 +22,17 @@ class TestParserRegistry:

    def test_all_parsers_registered(self):
        """Test that all parsers are registered."""
        assert len(PARSERS) == 36, f"Expected 36 parsers, got {len(PARSERS)}"
        assert len(PARSERS) == 18, f"Expected 18 parsers, got {len(PARSERS)}"

    def test_get_parser_names(self):
        """Test getting list of parser names."""
        names = get_parser_names()
        assert len(names) == 36
        assert "scrape" in names
        assert "github" in names
        assert len(names) == 18
        assert "create" in names
        assert "package" in names
        assert "upload" in names
        assert "analyze" in names
        assert "config" in names
        assert "workflows" in names
        assert "video" in names

    def test_all_parsers_are_subcommand_parsers(self):
        """Test that all parsers inherit from SubcommandParser."""
@@ -71,29 +66,6 @@ class TestParserRegistry:
class TestParserCreation:
    """Test parser creation functionality."""

    def test_scrape_parser_creates_subparser(self):
        """Test that ScrapeParser creates valid subparser."""
        main_parser = argparse.ArgumentParser()
        subparsers = main_parser.add_subparsers()

        scrape_parser = ScrapeParser()
        subparser = scrape_parser.create_parser(subparsers)

        assert subparser is not None
        assert scrape_parser.name == "scrape"
        assert scrape_parser.help == "Scrape documentation website"

    def test_github_parser_creates_subparser(self):
        """Test that GitHubParser creates valid subparser."""
        main_parser = argparse.ArgumentParser()
        subparsers = main_parser.add_subparsers()

        github_parser = GitHubParser()
        subparser = github_parser.create_parser(subparsers)

        assert subparser is not None
        assert github_parser.name == "github"

    def test_package_parser_creates_subparser(self):
        """Test that PackageParser creates valid subparser."""
        main_parser = argparse.ArgumentParser()
@@ -106,21 +78,18 @@ class TestParserCreation:
        assert package_parser.name == "package"

    def test_register_parsers_creates_all_subcommands(self):
        """Test that register_parsers creates all 19 subcommands."""
        """Test that register_parsers creates all subcommands."""
        main_parser = argparse.ArgumentParser()
        subparsers = main_parser.add_subparsers(dest="command")

        # Register all parsers
        register_parsers(subparsers)

        # Test that all commands can be parsed
        # Test that existing commands can be parsed
        test_commands = [
            "config --show",
            "scrape --config test.json",
            "github --repo owner/repo",
            "package output/test/",
            "upload test.zip",
            "analyze --directory .",
            "enhance output/test/",
            "estimate test.json",
        ]
@@ -133,40 +102,6 @@
class TestSpecificParsers:
    """Test specific parser implementations."""

    def test_scrape_parser_arguments(self):
        """Test ScrapeParser has correct arguments."""
        main_parser = argparse.ArgumentParser()
        subparsers = main_parser.add_subparsers(dest="command")

        scrape_parser = ScrapeParser()
        scrape_parser.create_parser(subparsers)

        # Test various argument combinations
        args = main_parser.parse_args(["scrape", "--config", "test.json"])
        assert args.command == "scrape"
        assert args.config == "test.json"

        args = main_parser.parse_args(["scrape", "--config", "test.json", "--max-pages", "100"])
        assert args.max_pages == 100

        args = main_parser.parse_args(["scrape", "--enhance-level", "2"])
        assert args.enhance_level == 2

    def test_github_parser_arguments(self):
        """Test GitHubParser has correct arguments."""
        main_parser = argparse.ArgumentParser()
        subparsers = main_parser.add_subparsers(dest="command")

        github_parser = GitHubParser()
        github_parser.create_parser(subparsers)

        args = main_parser.parse_args(["github", "--repo", "owner/repo"])
        assert args.command == "github"
        assert args.repo == "owner/repo"

        args = main_parser.parse_args(["github", "--repo", "owner/repo", "--non-interactive"])
        assert args.non_interactive is True

    def test_package_parser_arguments(self):
        """Test PackageParser has correct arguments."""
        main_parser = argparse.ArgumentParser()
@@ -185,44 +120,19 @@ class TestSpecificParsers:
        args = main_parser.parse_args(["package", "output/test/", "--no-open"])
        assert args.no_open is True

    def test_analyze_parser_arguments(self):
        """Test AnalyzeParser has correct arguments."""
        main_parser = argparse.ArgumentParser()
        subparsers = main_parser.add_subparsers(dest="command")

        from skill_seekers.cli.parsers.analyze_parser import AnalyzeParser
class TestCurrentCommands:
    """Test current CLI commands after Grand Unification."""

        analyze_parser = AnalyzeParser()
        analyze_parser.create_parser(subparsers)

        args = main_parser.parse_args(["analyze", "--directory", "."])
        assert args.command == "analyze"
        assert args.directory == "."

        args = main_parser.parse_args(["analyze", "--directory", ".", "--quick"])
        assert args.quick is True

        args = main_parser.parse_args(["analyze", "--directory", ".", "--comprehensive"])
        assert args.comprehensive is True

        args = main_parser.parse_args(["analyze", "--directory", ".", "--skip-patterns"])
        assert args.skip_patterns is True


class TestBackwardCompatibility:
    """Test backward compatibility with old CLI."""

    def test_all_original_commands_still_work(self):
        """Test that all original commands are still registered."""
    def test_all_current_commands_registered(self):
        """Test that all current commands are registered."""
        names = get_parser_names()

        # Original commands from old main.py
        original_commands = [
        # Commands that survived the Grand Unification
        # (individual scraper commands removed; use 'create' instead)
        current_commands = [
            "config",
            "scrape",
            "github",
            "pdf",
            "unified",
            "create",
            "enhance",
            "enhance-status",
            "package",
@@ -230,22 +140,50 @@ class TestBackwardCompatibility:
            "estimate",
            "extract-test-examples",
            "install-agent",
            "analyze",
            "install",
            "resume",
            "stream",
            "update",
            "multilang",
            "quality",
            "doctor",
            "workflows",
            "sync-config",
        ]

        for cmd in original_commands:
        for cmd in current_commands:
            assert cmd in names, f"Command '{cmd}' not found in parser registry!"

    def test_removed_scraper_commands_not_present(self):
        """Test that individual scraper commands were removed."""
        names = get_parser_names()

        removed_commands = [
            "scrape",
            "github",
            "pdf",
            "video",
            "word",
            "epub",
            "jupyter",
            "html",
            "openapi",
            "asciidoc",
            "pptx",
            "rss",
            "manpage",
            "confluence",
            "notion",
            "chat",
        ]

        for cmd in removed_commands:
            assert cmd not in names, f"Removed command '{cmd}' still in parser registry!"

    def test_command_count_matches(self):
        """Test that we have exactly 35 commands (25 original + 10 new source types)."""
        assert len(PARSERS) == 36
        assert len(get_parser_names()) == 36
        """Test that we have exactly 18 commands."""
        assert len(PARSERS) == 18
        assert len(get_parser_names()) == 18


if __name__ == "__main__":

@@ -14,152 +14,6 @@ import subprocess
import argparse


class TestParserSync:
    """E2E tests for parser synchronization (Issue #285)."""

    def test_scrape_interactive_flag_works(self):
        """Test that --interactive flag (previously missing) now works."""
        result = subprocess.run(
            ["skill-seekers", "scrape", "--interactive", "--help"], capture_output=True, text=True
        )
        assert result.returncode == 0, "Command should execute successfully"
        assert "--interactive" in result.stdout, "Help should show --interactive flag"
        assert "-i" in result.stdout, "Help should show short form -i"

    def test_scrape_chunk_for_rag_flag_works(self):
        """Test that --chunk-for-rag flag (previously missing) now works."""
        result = subprocess.run(
            ["skill-seekers", "scrape", "--help"], capture_output=True, text=True
        )
        assert "--chunk-for-rag" in result.stdout, "Help should show --chunk-for-rag flag"
        assert "--chunk-tokens" in result.stdout, "Help should show --chunk-tokens flag"
        assert "--chunk-overlap-tokens" in result.stdout, (
            "Help should show --chunk-overlap-tokens flag"
        )

    def test_scrape_verbose_flag_works(self):
        """Test that --verbose flag (previously missing) now works."""
        result = subprocess.run(
            ["skill-seekers", "scrape", "--help"], capture_output=True, text=True
        )
        assert "--verbose" in result.stdout, "Help should show --verbose flag"
        assert "-v" in result.stdout, "Help should show short form -v"

    def test_scrape_url_flag_works(self):
        """Test that --url flag (previously missing) now works."""
        result = subprocess.run(
            ["skill-seekers", "scrape", "--help"], capture_output=True, text=True
        )
        assert "--url URL" in result.stdout, "Help should show --url flag"

    def test_github_all_flags_present(self):
        """Test that github command has all expected flags."""
        result = subprocess.run(
            ["skill-seekers", "github", "--help"], capture_output=True, text=True
        )
        # Key github flags that should be present
        expected_flags = [
            "--repo",
            "--api-key",
            "--profile",
            "--non-interactive",
        ]
        for flag in expected_flags:
            assert flag in result.stdout, f"Help should show {flag} flag"


class TestPresetSystem:
    """E2E tests for preset system (Issue #268)."""

    def test_analyze_preset_flag_exists(self):
        """Test that analyze command has --preset flag."""
        result = subprocess.run(
            ["skill-seekers", "analyze", "--help"], capture_output=True, text=True
        )
        assert "--preset" in result.stdout, "Help should show --preset flag"
        assert "quick" in result.stdout, "Help should mention 'quick' preset"
        assert "standard" in result.stdout, "Help should mention 'standard' preset"
        assert "comprehensive" in result.stdout, "Help should mention 'comprehensive' preset"

    def test_analyze_preset_list_flag_exists(self):
        """Test that analyze command has --preset-list flag."""
        result = subprocess.run(
            ["skill-seekers", "analyze", "--help"], capture_output=True, text=True
        )
        assert "--preset-list" in result.stdout, "Help should show --preset-list flag"

    def test_preset_list_shows_presets(self):
        """Test that --preset-list shows all available presets."""
        result = subprocess.run(
            ["skill-seekers", "analyze", "--preset-list"], capture_output=True, text=True
        )
        assert result.returncode == 0, "Command should execute successfully"
        assert "Available presets" in result.stdout, "Should show preset list header"
        assert "quick" in result.stdout, "Should show quick preset"
        assert "standard" in result.stdout, "Should show standard preset"
        assert "comprehensive" in result.stdout, "Should show comprehensive preset"
        assert "1-2 minutes" in result.stdout, "Should show time estimates"

    def test_deprecated_quick_flag_shows_warning(self, tmp_path):
        """Test that --quick flag shows deprecation warning."""
        result = subprocess.run(
            ["skill-seekers", "analyze", "--directory", str(tmp_path), "--quick"],
            capture_output=True,
            text=True,
        )
        # Note: Deprecation warnings go to stderr or stdout
        output = result.stdout + result.stderr
        assert "DEPRECATED" in output, "Should show deprecation warning"
        assert "--preset quick" in output, "Should suggest alternative"

    def test_deprecated_comprehensive_flag_shows_warning(self, tmp_path):
        """Test that --comprehensive flag shows deprecation warning."""
        result = subprocess.run(
            ["skill-seekers", "analyze", "--directory", str(tmp_path), "--comprehensive"],
            capture_output=True,
            text=True,
        )
        output = result.stdout + result.stderr
        assert "DEPRECATED" in output, "Should show deprecation warning"
        assert "--preset comprehensive" in output, "Should suggest alternative"


class TestBackwardCompatibility:
    """E2E tests for backward compatibility."""

    def test_old_scrape_command_still_works(self):
        """Test that old scrape command invocations still work."""
        result = subprocess.run(["skill-seekers-scrape", "--help"], capture_output=True, text=True)
        assert result.returncode == 0, "Old command should still work"
        assert "documentation" in result.stdout.lower(), "Help should mention documentation"

    def test_unified_cli_and_standalone_have_same_args(self):
        """Test that unified CLI and standalone have identical arguments."""
        # Get help from unified CLI
        unified_result = subprocess.run(
            ["skill-seekers", "scrape", "--help"], capture_output=True, text=True
        )

        # Get help from standalone
        standalone_result = subprocess.run(
            ["skill-seekers-scrape", "--help"], capture_output=True, text=True
        )

        # Both should have the same key flags
        key_flags = [
            "--interactive",
            "--url",
            "--verbose",
            "--chunk-for-rag",
            "--config",
            "--max-pages",
        ]

        for flag in key_flags:
            assert flag in unified_result.stdout, f"Unified should have {flag}"
            assert flag in standalone_result.stdout, f"Standalone should have {flag}"


class TestProgrammaticAPI:
    """Test that the shared argument functions work programmatically."""

@@ -211,11 +65,7 @@ class TestIntegration:

        # All major commands should be listed
        expected_commands = [
            "scrape",
            "github",
            "pdf",
            "unified",
            "analyze",
            "create",
            "enhance",
            "package",
            "upload",
@@ -224,75 +74,6 @@ class TestIntegration:
        for cmd in expected_commands:
            assert cmd in result.stdout, f"Should list {cmd} command"

    def test_scrape_help_detailed(self):
        """Test that scrape help shows all argument details."""
        result = subprocess.run(
            ["skill-seekers", "scrape", "--help"], capture_output=True, text=True
        )

        # Check for argument categories
        assert "url" in result.stdout.lower(), "Should show url argument"
        assert "scraping options" in result.stdout.lower() or "options" in result.stdout.lower()
        assert "enhancement" in result.stdout.lower(), "Should mention enhancement options"

    def test_analyze_help_shows_presets(self):
        """Test that analyze help prominently shows preset information."""
        result = subprocess.run(
            ["skill-seekers", "analyze", "--help"], capture_output=True, text=True
        )

        assert "--preset" in result.stdout, "Should show --preset flag"
        assert "DEFAULT" in result.stdout or "default" in result.stdout, (
            "Should indicate default preset"
        )


class TestE2EWorkflow:
    """End-to-end workflow tests."""

    @pytest.mark.slow
    def test_dry_run_scrape_with_new_args(self, tmp_path):
        """Test scraping with previously missing arguments (dry run)."""
        result = subprocess.run(
            [
                "skill-seekers",
                "scrape",
                "--url",
                "https://example.com",
                "--interactive",
                "false",  # Would fail if arg didn't exist
                "--verbose",  # Would fail if arg didn't exist
                "--dry-run",
            ],
            capture_output=True,
            text=True,
            timeout=10,
        )

        # Dry run should complete without errors
        # (it may return non-zero if --interactive false isn't valid,
        # but it shouldn't crash with "unrecognized arguments")
        assert "unrecognized arguments" not in result.stderr.lower()

    @pytest.mark.slow
    def test_analyze_with_preset_flag(self, tmp_path):
        """Test analyze with preset flag (no dry-run available)."""
        # Create a dummy directory to analyze
        test_dir = tmp_path / "test_code"
        test_dir.mkdir()
        (test_dir / "test.py").write_text("def hello(): pass")

        # Just verify the flag is recognized (no execution)
        result = subprocess.run(
            ["skill-seekers", "analyze", "--help"],
            capture_output=True,
            text=True,
        )

        # Verify preset flag exists
        assert "--preset" in result.stdout, "Should have --preset flag"
        assert "unrecognized arguments" not in result.stderr.lower()


class TestVarFlagRouting:
    """Test that --var flag is correctly routed through create command."""
@@ -306,15 +87,6 @@ class TestVarFlagRouting:
        )
        assert "--var" in result.stdout, "create --help should show --var flag"

    def test_var_flag_accepted_by_analyze(self):
        """Test that --var flag is accepted by analyze command."""
        result = subprocess.run(
            ["skill-seekers", "analyze", "--help"],
            capture_output=True,
            text=True,
        )
        assert "--var" in result.stdout, "analyze --help should show --var flag"

    @pytest.mark.slow
    def test_var_flag_not_rejected_in_create_local(self, tmp_path):
        """Test --var KEY=VALUE doesn't cause 'unrecognized arguments' in create."""
@@ -354,15 +126,6 @@ class TestBackwardCompatibleFlags:
        # but should not cause an error if used
        assert result.returncode == 0

    def test_no_preserve_code_alias_accepted_by_scrape(self):
        """Test --no-preserve-code (old name) is still accepted by scrape command."""
        result = subprocess.run(
            ["skill-seekers", "scrape", "--help"],
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0

    def test_no_preserve_code_alias_accepted_by_create(self):
        """Test --no-preserve-code (old name) is still accepted by create command."""
        result = subprocess.run(

@@ -101,395 +101,96 @@ class TestCreateCommandBasic:
|
||||
# Verify help works
|
||||
assert result.returncode in [0, 2]
|
||||
|
||||
def test_create_invalid_source_shows_error(self):
|
||||
"""Test that invalid sources raise a helpful ValueError."""
|
||||
from skill_seekers.cli.source_detector import SourceDetector
|
||||
|
||||
with pytest.raises(ValueError) as exc_info:
|
||||
SourceDetector.detect("not_a_valid_source_123_xyz")
|
||||
class TestCreateCommandConverterRouting:
|
||||
"""Tests that create command routes to correct converters."""
|
||||
|
||||
error_message = str(exc_info.value)
|
||||
assert "Cannot determine source type" in error_message
|
||||
# Error should include helpful examples
|
||||
assert "https://" in error_message or "github" in error_message.lower()
|
||||
def test_get_converter_web(self):
|
||||
"""Test that get_converter returns DocToSkillConverter for web."""
|
||||
from skill_seekers.cli.skill_converter import get_converter
|
||||
|
||||
def test_create_supports_universal_flags(self):
|
||||
"""Test that universal flags are accepted."""
|
||||
import subprocess
|
||||
config = {"name": "test", "base_url": "https://example.com"}
|
||||
converter = get_converter("web", config)
|
||||
|
||||
result = subprocess.run(
|
||||
["skill-seekers", "create", "--help"], capture_output=True, text=True, timeout=10
|
||||
)
|
||||
assert result.returncode == 0
|
||||
assert converter.SOURCE_TYPE == "web"
|
||||
assert converter.name == "test"
|
||||
|
||||
# Check that universal flags are present
|
||||
assert "--name" in result.stdout
|
||||
assert "--enhance" in result.stdout
|
||||
assert "--chunk-for-rag" in result.stdout
|
||||
assert "--preset" in result.stdout
|
||||
assert "--dry-run" in result.stdout
|
||||
def test_get_converter_github(self):
|
||||
"""Test that get_converter returns GitHubScraper for github."""
|
||||
from skill_seekers.cli.skill_converter import get_converter
|
||||
|
||||
config = {"name": "test", "repo": "owner/repo"}
|
||||
converter = get_converter("github", config)
|
||||
|
||||
assert converter.SOURCE_TYPE == "github"
|
||||
assert converter.name == "test"
|
||||
|
||||
def test_get_converter_pdf(self):
|
||||
"""Test that get_converter returns PDFToSkillConverter for pdf."""
|
||||
from skill_seekers.cli.skill_converter import get_converter
|
||||
|
||||
config = {"name": "test", "pdf_path": "/tmp/test.pdf"}
|
||||
converter = get_converter("pdf", config)
|
||||
|
||||
assert converter.SOURCE_TYPE == "pdf"
|
||||
assert converter.name == "test"
|
||||
|
||||
def test_get_converter_unknown_raises(self):
|
||||
"""Test that get_converter raises ValueError for unknown type."""
|
||||
from skill_seekers.cli.skill_converter import get_converter
|
||||
|
||||
with pytest.raises(ValueError, match="Unknown source type"):
|
||||
get_converter("unknown_type", {})
|
||||
|
||||
|
||||
class TestCreateCommandArgvForwarding:
|
||||
"""Unit tests for _build_argv argument forwarding."""
|
||||
class TestExecutionContextIntegration:
|
||||
"""Tests that ExecutionContext flows correctly through the system."""
|
||||
|
||||
def _make_args(self, **kwargs):
|
||||
def test_execution_context_auto_initializes(self):
|
||||
"""ExecutionContext.get() returns defaults without explicit init."""
|
||||
from skill_seekers.cli.execution_context import ExecutionContext
|
||||
|
||||
# Reset to ensure clean state
|
||||
ExecutionContext.reset()
|
||||
|
||||
# Should not raise - returns default context
|
||||
ctx = ExecutionContext.get()
|
||||
assert ctx is not None
|
||||
assert ctx.output.name is None # Default value
|
||||
|
||||
ExecutionContext.reset()
|
||||
|
||||
def test_execution_context_values_preserved(self):
|
||||
"""Values set in context are preserved and accessible."""
|
||||
from skill_seekers.cli.execution_context import ExecutionContext
|
||||
import argparse
|
||||
|
||||
defaults = {
|
||||
"source": "https://example.com",
|
||||
"enhance_workflow": None,
|
||||
"enhance_stage": None,
|
||||
"var": None,
|
||||
"workflow_dry_run": False,
|
||||
"enhance_level": 2,
|
||||
"output": None,
|
||||
"name": None,
|
||||
"description": None,
|
||||
"config": None,
|
||||
"api_key": None,
|
||||
"dry_run": False,
|
||||
"verbose": False,
|
||||
"quiet": False,
|
||||
"chunk_for_rag": False,
|
||||
"chunk_size": 512,
|
||||
"chunk_overlap": 50,
|
||||
"preset": None,
|
||||
"no_preserve_code_blocks": False,
|
||||
"no_preserve_paragraphs": False,
|
||||
"interactive_enhancement": False,
|
||||
"agent": None,
|
||||
"agent_cmd": None,
|
||||
"doc_version": "",
|
||||
}
|
||||
defaults.update(kwargs)
|
||||
return argparse.Namespace(**defaults)
|
||||
ExecutionContext.reset()
|
||||
|
||||
def _collect_argv(self, args):
|
||||
from skill_seekers.cli.create_command import CreateCommand
|
||||
from skill_seekers.cli.source_detector import SourceDetector
|
||||
|
||||
cmd = CreateCommand(args)
|
||||
cmd.source_info = SourceDetector.detect(args.source)
|
||||
return cmd._build_argv("test_module", [])
|
||||
|
||||
def test_single_enhance_workflow_forwarded(self):
|
||||
args = self._make_args(enhance_workflow=["security-focus"])
|
||||
argv = self._collect_argv(args)
|
||||
assert argv.count("--enhance-workflow") == 1
|
||||
assert "security-focus" in argv
|
||||
|
||||
def test_multiple_enhance_workflows_all_forwarded(self):
|
||||
"""Each workflow must appear as a separate --enhance-workflow flag."""
|
||||
args = self._make_args(enhance_workflow=["security-focus", "minimal"])
|
||||
argv = self._collect_argv(args)
|
||||
assert argv.count("--enhance-workflow") == 2
|
||||
idx1 = argv.index("security-focus")
|
||||
idx2 = argv.index("minimal")
|
||||
assert argv[idx1 - 1] == "--enhance-workflow"
|
||||
assert argv[idx2 - 1] == "--enhance-workflow"
|
||||
|
||||
def test_no_enhance_workflow_not_forwarded(self):
|
||||
args = self._make_args(enhance_workflow=None)
|
||||
argv = self._collect_argv(args)
|
||||
assert "--enhance-workflow" not in argv
|
||||
|
||||
# ── enhance_stage ────────────────────────────────────────────────────────
|
||||
|
||||
def test_single_enhance_stage_forwarded(self):
|
||||
args = self._make_args(enhance_stage=["security:Check for vulnerabilities"])
|
||||
argv = self._collect_argv(args)
|
||||
assert "--enhance-stage" in argv
|
||||
assert "security:Check for vulnerabilities" in argv
|
||||
|
||||
def test_multiple_enhance_stages_all_forwarded(self):
|
||||
stages = ["sec:Check security", "cleanup:Remove boilerplate"]
|
||||
args = self._make_args(enhance_stage=stages)
|
||||
argv = self._collect_argv(args)
|
||||
assert argv.count("--enhance-stage") == 2
|
||||
for stage in stages:
|
||||
assert stage in argv
|
||||
|
||||
def test_enhance_stage_none_not_forwarded(self):
|
||||
args = self._make_args(enhance_stage=None)
|
||||
argv = self._collect_argv(args)
|
||||
assert "--enhance-stage" not in argv
|
||||
|
||||
# ── var ──────────────────────────────────────────────────────────────────
|
||||
|
||||
def test_single_var_forwarded(self):
|
||||
args = self._make_args(var=["depth=comprehensive"])
|
||||
argv = self._collect_argv(args)
|
||||
assert "--var" in argv
|
||||
assert "depth=comprehensive" in argv
|
||||
|
||||
def test_multiple_vars_all_forwarded(self):
|
||||
args = self._make_args(var=["depth=comprehensive", "focus=security"])
|
||||
argv = self._collect_argv(args)
|
||||
assert argv.count("--var") == 2
|
||||
assert "depth=comprehensive" in argv
|
||||
assert "focus=security" in argv
|
||||
|
||||
def test_var_none_not_forwarded(self):
|
||||
args = self._make_args(var=None)
|
||||
argv = self._collect_argv(args)
|
||||
assert "--var" not in argv
|
||||
|
||||
# ── workflow_dry_run ─────────────────────────────────────────────────────
|
||||
|
||||
def test_workflow_dry_run_forwarded(self):
|
||||
args = self._make_args(workflow_dry_run=True)
|
||||
argv = self._collect_argv(args)
|
||||
assert "--workflow-dry-run" in argv
|
||||
|
||||
def test_workflow_dry_run_false_not_forwarded(self):
|
||||
args = self._make_args(workflow_dry_run=False)
|
||||
argv = self._collect_argv(args)
|
||||
assert "--workflow-dry-run" not in argv
|
||||
|
||||
# ── mixed ────────────────────────────────────────────────────────────────
|
||||
|
||||
def test_workflow_and_stage_both_forwarded(self):
|
||||
args = self._make_args(
|
||||
enhance_workflow=["security-focus"],
|
||||
enhance_stage=["cleanup:Remove boilerplate"],
|
||||
var=["depth=basic"],
|
||||
workflow_dry_run=True,
|
||||
)
|
||||
argv = self._collect_argv(args)
|
||||
assert "--enhance-workflow" in argv
|
||||
assert "security-focus" in argv
|
||||
assert "--enhance-stage" in argv
|
||||
assert "--var" in argv
|
||||
assert "--workflow-dry-run" in argv
|
||||
|
||||
# ── _SKIP_ARGS exclusion ────────────────────────────────────────────────
|
||||
|
||||
def test_source_never_forwarded(self):
|
||||
"""'source' is in _SKIP_ARGS and must never appear in argv."""
|
||||
args = self._make_args(source="https://example.com")
|
||||
argv = self._collect_argv(args)
|
||||
assert "--source" not in argv
|
||||
|
||||
def test_func_never_forwarded(self):
|
||||
"""'func' is in _SKIP_ARGS and must never appear in argv."""
|
||||
args = self._make_args(func=lambda: None)
|
||||
argv = self._collect_argv(args)
|
||||
assert "--func" not in argv
|
||||
|
||||
def test_config_never_forwarded_by_build_argv(self):
|
||||
"""'config' is in _SKIP_ARGS; forwarded manually by specific routes."""
|
||||
args = self._make_args(config="/path/to/config.json")
|
||||
argv = self._collect_argv(args)
|
||||
assert "--config" not in argv
|
||||
|
||||
def test_subcommand_never_forwarded(self):
|
||||
"""'subcommand' is in _SKIP_ARGS."""
|
||||
args = self._make_args(subcommand="create")
|
||||
argv = self._collect_argv(args)
|
||||
assert "--subcommand" not in argv
|
||||
|
||||
def test_command_never_forwarded(self):
|
||||
"""'command' is in _SKIP_ARGS."""
|
||||
args = self._make_args(command="create")
|
||||
argv = self._collect_argv(args)
|
||||
assert "--command" not in argv

    # ── _DEST_TO_FLAG mapping ───────────────────────────────────────────────

    def test_async_mode_maps_to_async_flag(self):
        """async_mode dest should produce --async flag, not --async-mode."""
        args = self._make_args(async_mode=True)
        argv = self._collect_argv(args)
        assert "--async" in argv
        assert "--async-mode" not in argv

    def test_skip_config_maps_to_skip_config_patterns(self):
        """skip_config dest should produce --skip-config-patterns flag."""
        args = self._make_args(skip_config=True)
        argv = self._collect_argv(args)
        assert "--skip-config-patterns" in argv
        assert "--skip-config" not in argv
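The two tests above imply a small dest-to-flag override table with a dashed-name fallback. A minimal sketch of that translation — `_DEST_TO_FLAG` and `dest_to_flag` are illustrative names here, not necessarily the project's actual identifiers:

```python
# Hypothetical sketch of the dest-to-flag translation the tests above exercise.
_DEST_TO_FLAG = {
    "async_mode": "--async",
    "skip_config": "--skip-config-patterns",
}


def dest_to_flag(dest: str) -> str:
    """Map an argparse dest back to its CLI flag, defaulting to --dest-with-dashes."""
    return _DEST_TO_FLAG.get(dest, "--" + dest.replace("_", "-"))
```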

    # ── Boolean arg forwarding ──────────────────────────────────────────────

    def test_boolean_true_appends_flag(self):
        args = self._make_args(dry_run=True)
        argv = self._collect_argv(args)
        assert "--dry-run" in argv

    def test_boolean_false_does_not_append_flag(self):
        args = self._make_args(dry_run=False)
        argv = self._collect_argv(args)
        assert "--dry-run" not in argv

    def test_verbose_true_forwarded(self):
        args = self._make_args(verbose=True)
        argv = self._collect_argv(args)
        assert "--verbose" in argv

    def test_quiet_true_forwarded(self):
        args = self._make_args(quiet=True)
        argv = self._collect_argv(args)
        assert "--quiet" in argv

    # ── List arg forwarding ─────────────────────────────────────────────────

    def test_list_arg_each_item_gets_separate_flag(self):
        """Each list item gets its own --flag value pair."""
        args = self._make_args(enhance_workflow=["a", "b", "c"])
        argv = self._collect_argv(args)
        assert argv.count("--enhance-workflow") == 3
        for item in ["a", "b", "c"]:
            idx = argv.index(item)
            assert argv[idx - 1] == "--enhance-workflow"
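The boolean and list forwarding rules tested above can be sketched in one small loop. This is a hedged illustration of the behavior the tests pin down, not the real `_build_argv` implementation; `forward_args` is a hypothetical name:

```python
def forward_args(values: dict) -> list:
    """Build an argv fragment: booleans become bare flags only when True,
    lists repeat the flag once per item, other non-None values get flag + value."""
    argv = []
    for dest, value in values.items():
        flag = "--" + dest.replace("_", "-")
        if isinstance(value, bool):
            if value:
                argv.append(flag)
        elif isinstance(value, list):
            for item in value:
                argv.extend([flag, str(item)])
        elif value is not None:
            argv.extend([flag, str(value)])
    return argv
```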

    # ── _is_explicitly_set ──────────────────────────────────────────────────

    def test_is_explicitly_set_none_is_not_set(self):
        """None values should NOT be considered explicitly set."""
        from skill_seekers.cli.create_command import CreateCommand

        args = self._make_args()
        cmd = CreateCommand(args)
        assert cmd._is_explicitly_set("name", None) is False

    def test_is_explicitly_set_bool_true_is_set(self):
        from skill_seekers.cli.create_command import CreateCommand

        args = self._make_args()
        cmd = CreateCommand(args)
        assert cmd._is_explicitly_set("dry_run", True) is True

    def test_is_explicitly_set_bool_false_is_not_set(self):
        from skill_seekers.cli.create_command import CreateCommand

        args = self._make_args()
        cmd = CreateCommand(args)
        assert cmd._is_explicitly_set("dry_run", False) is False

    def test_is_explicitly_set_default_doc_version_empty_not_set(self):
        """doc_version defaults to '' which means not explicitly set."""
        from skill_seekers.cli.create_command import CreateCommand

        args = self._make_args()
        cmd = CreateCommand(args)
        assert cmd._is_explicitly_set("doc_version", "") is False

    def test_is_explicitly_set_nonempty_string_is_set(self):
        from skill_seekers.cli.create_command import CreateCommand

        args = self._make_args()
        cmd = CreateCommand(args)
        assert cmd._is_explicitly_set("name", "my-skill") is True

    def test_is_explicitly_set_non_default_value_is_set(self):
        """A value that differs from the known default IS explicitly set."""
        from skill_seekers.cli.create_command import CreateCommand

        args = self._make_args()
        cmd = CreateCommand(args)
        # max_issues default is 100; setting to 50 means explicitly set
        assert cmd._is_explicitly_set("max_issues", 50) is True
        # Setting to default value means NOT explicitly set
        assert cmd._is_explicitly_set("max_issues", 100) is False
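The semantics pinned down by these tests — None and False are never "set", empty string means default, and other values count only when they differ from the known default — can be sketched as follows. The `_KNOWN_DEFAULTS` table and standalone `is_explicitly_set` function are assumptions for illustration; the real method lives on `CreateCommand`:

```python
# Assumed defaults table, inferred from the tests above (not the real one).
_KNOWN_DEFAULTS = {"max_issues": 100, "doc_version": ""}


def is_explicitly_set(dest: str, value) -> bool:
    """Return True only when the user plausibly set this value on purpose."""
    if value is None:
        return False
    if isinstance(value, bool):
        return value  # False is the argparse default for store_true flags
    if dest in _KNOWN_DEFAULTS:
        return value != _KNOWN_DEFAULTS[dest]
    return value != ""  # empty string treated as "left at default"
```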

    # ── Allowlist filtering ─────────────────────────────────────────────────

    def test_allowlist_only_forwards_allowed_args(self):
        """When allowlist is provided, only those args are forwarded."""
        from skill_seekers.cli.create_command import CreateCommand
        from skill_seekers.cli.source_detector import SourceDetector

        args = self._make_args(
            dry_run=True,
            verbose=True,
            name="test-skill",
        )
        cmd = CreateCommand(args)
        cmd.source_info = SourceDetector.detect(args.source)

        # Only allow dry_run in the allowlist
        allowlist = frozenset({"dry_run"})
        argv = cmd._build_argv("test_module", [], allowlist=allowlist)

        assert "--dry-run" in argv
        assert "--verbose" not in argv
        assert "--name" not in argv

    def test_allowlist_skips_non_allowed_even_if_set(self):
        """Args not in the allowlist are excluded even if explicitly set."""
        from skill_seekers.cli.create_command import CreateCommand
        from skill_seekers.cli.source_detector import SourceDetector

        args = self._make_args(
            enhance_workflow=["security-focus"],
            quiet=True,
        )
        cmd = CreateCommand(args)
        cmd.source_info = SourceDetector.detect(args.source)

        allowlist = frozenset({"quiet"})
        argv = cmd._build_argv("test_module", [], allowlist=allowlist)

        assert "--quiet" in argv
        assert "--enhance-workflow" not in argv

    def test_allowlist_empty_forwards_nothing(self):
        """Empty allowlist should forward no user args (auto-name may still be added)."""
        from skill_seekers.cli.create_command import CreateCommand
        from skill_seekers.cli.source_detector import SourceDetector

        args = self._make_args(dry_run=True, verbose=True)
        cmd = CreateCommand(args)
        cmd.source_info = SourceDetector.detect(args.source)

        allowlist = frozenset()
        argv = cmd._build_argv("test_module", ["pos"], allowlist=allowlist)

        # User-set args (dry_run, verbose) should NOT be forwarded
        assert "--dry-run" not in argv
        assert "--verbose" not in argv
        # Only module name, positional, and possibly auto-added --name
        assert argv[0] == "test_module"
        assert "pos" in argv
        ExecutionContext.reset()
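The allowlist behavior these tests describe — `None` meaning "forward everything", any set (including the empty set) meaning "forward only these" — reduces to a small filter. A sketch under that assumption, with `filter_by_allowlist` as a hypothetical helper name:

```python
def filter_by_allowlist(values: dict, allowlist=None) -> dict:
    """Drop args not in the allowlist; None means 'no filtering at all'."""
    if allowlist is None:
        return dict(values)
    return {k: v for k, v in values.items() if k in allowlist}
```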


class TestUnifiedCommands:
    """Test that unified commands still work."""

    def test_scrape_command_still_works(self):
        """Old scrape command should still function."""
        import subprocess

        result = subprocess.run(
            ["skill-seekers", "scrape", "--help"], capture_output=True, text=True, timeout=10
        )
        assert result.returncode == 0
        assert "scrape" in result.stdout.lower()

    def test_github_command_still_works(self):
        """Old github command should still function."""
        import subprocess

        result = subprocess.run(
            ["skill-seekers", "github", "--help"], capture_output=True, text=True, timeout=10
        )
        assert result.returncode == 0
        assert "github" in result.stdout.lower()

    def test_analyze_command_still_works(self):
        """Old analyze command should still function."""
        import subprocess

        result = subprocess.run(
            ["skill-seekers", "analyze", "--help"], capture_output=True, text=True, timeout=10
        )
        assert result.returncode == 0
        assert "analyze" in result.stdout.lower()

    def test_main_help_shows_available_commands(self):
        """Main help should show available commands."""
        import subprocess

        result = subprocess.run(
@@ -498,14 +199,11 @@ class TestBackwardCompatibility:
        assert result.returncode == 0
        # Should show create command
        assert "create" in result.stdout

        # Should still show old commands
        assert "scrape" in result.stdout
        assert "github" in result.stdout
        assert "analyze" in result.stdout
        # Should show enhance command
        assert "enhance" in result.stdout

    def test_workflows_command_still_works(self):
        """The workflows subcommand is accessible via the main CLI."""
        import subprocess

        result = subprocess.run(
@@ -515,4 +213,29 @@ class TestBackwardCompatibility:
            timeout=10,
        )
        assert result.returncode == 0
        assert "workflow" in result.stdout.lower()


class TestRemovedCommands:
    """Test that old individual scraper commands are properly removed."""

    def test_scrape_command_removed(self):
        """Old scrape command should not exist."""
        import subprocess

        result = subprocess.run(
            ["skill-seekers", "scrape", "--help"], capture_output=True, text=True, timeout=10
        )
        # Should fail - command removed
        assert result.returncode == 2
        assert "invalid choice" in result.stderr

    def test_github_command_removed(self):
        """Old github command should not exist."""
        import subprocess

        result = subprocess.run(
            ["skill-seekers", "github", "--help"], capture_output=True, text=True, timeout=10
        )
        # Should fail - command removed
        assert result.returncode == 2
        assert "invalid choice" in result.stderr

tests/test_execution_context.py (new file, 511 lines)
@@ -0,0 +1,511 @@
"""Tests for ExecutionContext singleton.

This module tests the ExecutionContext class which provides a single source
of truth for all configuration in Skill Seekers.
"""

import argparse
import json
import os
import tempfile

import pytest

from skill_seekers.cli.execution_context import (
    ExecutionContext,
    get_context,
)


class TestExecutionContextBasics:
    """Basic functionality tests."""

    def setup_method(self):
        """Reset singleton before each test."""
        ExecutionContext.reset()

    def teardown_method(self):
        """Clean up after each test."""
        ExecutionContext.reset()

    def test_get_returns_defaults_when_not_initialized(self):
        """Should return default context when not explicitly initialized."""
        ctx = ExecutionContext.get()
        assert ctx is not None
        assert ctx.enhancement.level == 2  # default
        assert ctx.output.name is None  # default

    def test_get_context_shortcut(self):
        """get_context() should be equivalent to ExecutionContext.get()."""
        args = argparse.Namespace(name="test-skill")
        ExecutionContext.initialize(args=args)

        ctx = get_context()
        assert ctx.output.name == "test-skill"

    def test_initialize_returns_instance(self):
        """initialize() should return the context instance."""
        args = argparse.Namespace(name="test")
        ctx = ExecutionContext.initialize(args=args)

        assert isinstance(ctx, ExecutionContext)
        assert ctx.output.name == "test"

    def test_singleton_behavior(self):
        """Multiple calls should return same instance."""
        args = argparse.Namespace(name="first")
        ctx1 = ExecutionContext.initialize(args=args)
        ctx2 = ExecutionContext.get()

        assert ctx1 is ctx2

    def test_reset_clears_instance(self):
        """reset() should clear the initialized instance, get() returns fresh defaults."""
        args = argparse.Namespace(name="test-skill")
        ExecutionContext.initialize(args=args)
        assert ExecutionContext.get().output.name == "test-skill"

        ExecutionContext.reset()

        # After reset, get() returns default context (not the old one)
        ctx = ExecutionContext.get()
        assert ctx.output.name is None  # default, not "test-skill"


class TestExecutionContextFromArgs:
    """Tests for building context from CLI args."""

    def setup_method(self):
        ExecutionContext.reset()

    def teardown_method(self):
        ExecutionContext.reset()

    def test_basic_args(self):
        """Should extract basic args correctly."""
        args = argparse.Namespace(
            name="react-docs",
            output="custom/output",
            doc_version="18.2",
            dry_run=True,
            enhance_level=3,
            agent="kimi",
        )

        ctx = ExecutionContext.initialize(args=args)

        assert ctx.output.name == "react-docs"
        assert ctx.output.output_dir == "custom/output"
        assert ctx.output.doc_version == "18.2"
        assert ctx.output.dry_run is True
        assert ctx.enhancement.level == 3
        assert ctx.enhancement.agent == "kimi"

    def test_scraping_args(self):
        """Should extract scraping args correctly."""
        args = argparse.Namespace(
            name="test",
            max_pages=100,
            rate_limit=1.5,
            browser=True,
            workers=4,
            async_mode=True,
            resume=True,
            fresh=False,
            skip_scrape=True,
        )

        ctx = ExecutionContext.initialize(args=args)

        assert ctx.scraping.max_pages == 100
        assert ctx.scraping.rate_limit == 1.5
        assert ctx.scraping.browser is True
        assert ctx.scraping.workers == 4
        assert ctx.scraping.async_mode is True
        assert ctx.scraping.resume is True
        assert ctx.scraping.skip_scrape is True

    def test_analysis_args(self):
        """Should extract analysis args correctly."""
        args = argparse.Namespace(
            name="test",
            depth="full",
            skip_patterns=True,
            skip_test_examples=True,
            skip_how_to_guides=True,
            file_patterns="*.py,*.js",
        )

        ctx = ExecutionContext.initialize(args=args)

        assert ctx.analysis.depth == "full"
        assert ctx.analysis.skip_patterns is True
        assert ctx.analysis.skip_test_examples is True
        assert ctx.analysis.skip_how_to_guides is True
        assert ctx.analysis.file_patterns == ["*.py", "*.js"]
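The last assertion shows the comma-separated `file_patterns` string being normalized into a list. A minimal sketch of that normalization, with `parse_file_patterns` as a hypothetical helper name:

```python
def parse_file_patterns(raw) -> list:
    """Split a comma-separated pattern string into a clean list;
    pass lists through unchanged and treat None as 'no patterns'."""
    if raw is None:
        return []
    if isinstance(raw, list):
        return raw
    return [p.strip() for p in raw.split(",") if p.strip()]
```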

    def test_workflow_args(self):
        """Should extract workflow args correctly."""
        args = argparse.Namespace(
            name="test",
            enhance_workflow=["security-focus", "api-docs"],
            enhance_stage=["stage1:prompt1"],
            var=["key1=value1", "key2=value2"],
        )

        ctx = ExecutionContext.initialize(args=args)

        assert ctx.enhancement.workflows == ["security-focus", "api-docs"]
        assert ctx.enhancement.stages == ["stage1:prompt1"]
        assert ctx.enhancement.workflow_vars == {"key1": "value1", "key2": "value2"}
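The `var=["key1=value1", ...]` list (repeated `--var` flags) ends up as the `workflow_vars` dict. A sketch of that parse, assuming the name `parse_workflow_vars` (not the project's actual function):

```python
def parse_workflow_vars(pairs) -> dict:
    """Turn ["key=value", ...] from repeated --var flags into a dict,
    silently skipping entries without an '='."""
    out = {}
    for pair in pairs or []:
        key, sep, value = pair.partition("=")
        if sep:
            out[key] = value
    return out
```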

    def test_rag_args(self):
        """Should extract RAG args correctly."""
        args = argparse.Namespace(
            name="test",
            chunk_for_rag=True,
            chunk_tokens=1024,
        )

        ctx = ExecutionContext.initialize(args=args)

        assert ctx.rag.chunk_for_rag is True
        assert ctx.rag.chunk_tokens == 1024

    def test_api_mode_detection(self):
        """Should detect API mode from api_key."""
        args = argparse.Namespace(
            name="test",
            api_key="test-key",
        )

        ctx = ExecutionContext.initialize(args=args)

        assert ctx.enhancement.mode == "api"

    def test_local_mode_detection(self):
        """Should default to local/auto mode without API key."""
        # Clean API key env vars to ensure test isolation
        api_keys = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY", "MOONSHOT_API_KEY", "GOOGLE_API_KEY"]
        saved = {k: os.environ.pop(k, None) for k in api_keys}
        try:
            args = argparse.Namespace(name="test")
            ctx = ExecutionContext.initialize(args=args)
            assert ctx.enhancement.mode in ("local", "auto")
        finally:
            for k, v in saved.items():
                if v is not None:
                    os.environ[k] = v
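The mode-resolution order these two tests (and the commit notes) describe — an explicit `--api-key` forces API mode, otherwise known provider env vars imply it, otherwise fall back to auto — can be sketched as below. `detect_mode` is a hypothetical name; the env var list is taken from the test above:

```python
import os

_API_KEY_VARS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "MOONSHOT_API_KEY", "GOOGLE_API_KEY")


def detect_mode(cli_api_key=None, env=None) -> str:
    """Resolve enhancement mode: explicit --api-key wins, then env vars, else auto."""
    env = os.environ if env is None else env
    if cli_api_key:
        return "api"
    if any(env.get(var) for var in _API_KEY_VARS):
        return "api"
    return "auto"
```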

    def test_raw_args_access(self):
        """Should provide access to raw args for backward compatibility."""
        args = argparse.Namespace(
            name="test",
            custom_field="custom_value",
        )

        ctx = ExecutionContext.initialize(args=args)

        assert ctx.get_raw("name") == "test"
        assert ctx.get_raw("custom_field") == "custom_value"
        assert ctx.get_raw("nonexistent", "default") == "default"


class TestExecutionContextFromConfigFile:
    """Tests for building context from config files."""

    def setup_method(self):
        ExecutionContext.reset()

    def teardown_method(self):
        ExecutionContext.reset()

    def test_unified_config_format(self):
        """Should load unified config with sources array."""
        config = {
            "name": "unity-docs",
            "version": "2022.3",
            "enhancement": {
                "enabled": True,
                "level": 2,
                "mode": "local",
                "agent": "kimi",
                "timeout": "unlimited",
            },
            "workflows": ["unity-game-dev"],
            "workflow_stages": ["custom:stage"],
            "workflow_vars": {"var1": "value1"},
            "sources": [{"type": "documentation", "base_url": "https://docs.unity3d.com/"}],
        }

        with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
            json.dump(config, f)
            config_path = f.name

        try:
            ctx = ExecutionContext.initialize(config_path=config_path)

            assert ctx.output.name == "unity-docs"
            assert ctx.output.doc_version == "2022.3"
            assert ctx.enhancement.enabled is True
            assert ctx.enhancement.level == 2
            assert ctx.enhancement.mode == "local"
            assert ctx.enhancement.agent == "kimi"
            assert ctx.enhancement.workflows == ["unity-game-dev"]
            assert ctx.enhancement.stages == ["custom:stage"]
            assert ctx.enhancement.workflow_vars == {"var1": "value1"}
        finally:
            os.unlink(config_path)

    def test_simple_web_config_format(self):
        """Should load simple web config format."""
        config = {
            "name": "react-docs",
            "version": "18.2",
            "base_url": "https://react.dev/",
            "max_pages": 500,
            "rate_limit": 0.5,
            "browser": True,
        }

        with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
            json.dump(config, f)
            config_path = f.name

        try:
            ctx = ExecutionContext.initialize(config_path=config_path)

            assert ctx.output.name == "react-docs"
            assert ctx.output.doc_version == "18.2"
            assert ctx.scraping.max_pages == 500
            assert ctx.scraping.rate_limit == 0.5
            assert ctx.scraping.browser is True
        finally:
            os.unlink(config_path)

    def test_timeout_integer(self):
        """Should handle integer timeout in config."""
        config = {
            "name": "test",
            "enhancement": {"timeout": 3600},
            "sources": [],
        }

        with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
            json.dump(config, f)
            config_path = f.name

        try:
            ctx = ExecutionContext.initialize(config_path=config_path)
            assert ctx.enhancement.timeout == 3600
        finally:
            os.unlink(config_path)


class TestExecutionContextPriority:
    """Tests for configuration priority (CLI > Config > Env > Defaults)."""

    def setup_method(self):
        ExecutionContext.reset()
        self._original_env = {}

    def teardown_method(self):
        ExecutionContext.reset()
        # Restore env vars
        for key, value in self._original_env.items():
            if value is not None:
                os.environ[key] = value
            else:
                os.environ.pop(key, None)

    def test_cli_overrides_config(self):
        """CLI args should override config file values."""
        config = {"name": "config-name", "sources": []}

        with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
            json.dump(config, f)
            config_path = f.name

        try:
            args = argparse.Namespace(name="cli-name")
            ctx = ExecutionContext.initialize(args=args, config_path=config_path)

            # CLI should win
            assert ctx.output.name == "cli-name"
        finally:
            os.unlink(config_path)

    def test_config_overrides_defaults(self):
        """Config file should override default values."""
        config = {
            "name": "config-name",
            "enhancement": {"level": 3},
            "sources": [],
        }

        with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
            json.dump(config, f)
            config_path = f.name

        try:
            ctx = ExecutionContext.initialize(config_path=config_path)

            # Config should override default (level=2)
            assert ctx.enhancement.level == 3
        finally:
            os.unlink(config_path)

    def test_env_overrides_defaults(self):
        """Environment variables should override defaults."""
        self._original_env["SKILL_SEEKER_AGENT"] = os.environ.get("SKILL_SEEKER_AGENT")
        os.environ["SKILL_SEEKER_AGENT"] = "claude"

        ctx = ExecutionContext.initialize()

        # Env var should override default (None)
        assert ctx.enhancement.agent == "claude"
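The priority chain these tests verify (CLI > config file > env > defaults) amounts to layered dict merging where each higher-priority layer only overrides keys it actually provides. A sketch under that assumption — `merge_config` is an illustrative name, not the real builder:

```python
def merge_config(defaults, env=None, config=None, cli=None) -> dict:
    """Merge config layers with priority CLI > config file > env > defaults.
    Later layers only override keys they explicitly provide (non-None)."""
    merged = dict(defaults)
    for layer in (env, config, cli):  # lowest to highest priority
        for key, value in (layer or {}).items():
            if value is not None:
                merged[key] = value
    return merged
```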


class TestExecutionContextSourceInfo:
    """Tests for source info integration."""

    def setup_method(self):
        ExecutionContext.reset()

    def teardown_method(self):
        ExecutionContext.reset()

    def test_source_info_integration(self):
        """Should integrate source info from source_detector."""

        class MockSourceInfo:
            type = "web"
            raw_source = "https://react.dev/"
            parsed = {"url": "https://react.dev/"}
            suggested_name = "react"

        ctx = ExecutionContext.initialize(source_info=MockSourceInfo())

        assert ctx.source is not None
        assert ctx.source.type == "web"
        assert ctx.source.raw_source == "https://react.dev/"
        assert ctx.source.suggested_name == "react"


class TestExecutionContextOverride:
    """Tests for the override context manager."""

    def setup_method(self):
        ExecutionContext.reset()

    def teardown_method(self):
        ExecutionContext.reset()

    def test_override_temporarily_changes_values(self):
        """override() should temporarily change values."""
        args = argparse.Namespace(name="original", enhance_level=2)
        ctx = ExecutionContext.initialize(args=args)

        assert ctx.enhancement.level == 2

        with ctx.override(enhancement__level=3):
            ctx_from_get = ExecutionContext.get()
            assert ctx_from_get.enhancement.level == 3

        # After exit, original value restored
        assert ExecutionContext.get().enhancement.level == 2

    def test_override_restores_on_exception(self):
        """override() should restore values even on exception."""
        args = argparse.Namespace(name="original", enhance_level=2)
        ctx = ExecutionContext.initialize(args=args)

        try:
            with ctx.override(enhancement__level=3):
                assert ExecutionContext.get().enhancement.level == 3
                raise ValueError("Test error")
        except ValueError:
            pass

        # Should still be restored
        assert ExecutionContext.get().enhancement.level == 2
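The restore-on-exception guarantee tested above is the classic save/try/finally pattern behind a `contextmanager`. A self-contained sketch on a toy `Settings` class (not the real `ExecutionContext`, which also swaps the singleton and its `_initialized` flag under a lock per the review notes):

```python
from contextlib import contextmanager


class Settings:
    def __init__(self):
        self.level = 2

    @contextmanager
    def override(self, **changes):
        """Temporarily set attributes, restoring originals even on exception."""
        saved = {k: getattr(self, k) for k in changes}
        try:
            for k, v in changes.items():
                setattr(self, k, v)
            yield self
        finally:
            for k, v in saved.items():
                setattr(self, k, v)
```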


class TestExecutionContextValidation:
    """Tests for Pydantic validation."""

    def setup_method(self):
        ExecutionContext.reset()

    def teardown_method(self):
        ExecutionContext.reset()

    def test_enhancement_level_bounds(self):
        """Enhancement level should be 0-3."""
        args = argparse.Namespace(name="test", enhance_level=5)

        with pytest.raises(ValueError) as exc_info:
            ExecutionContext.initialize(args=args)

        assert "level" in str(exc_info.value)

    def test_analysis_depth_choices(self):
        """Analysis depth should reject invalid values."""
        import pydantic

        args = argparse.Namespace(name="test", depth="invalid")
        with pytest.raises(pydantic.ValidationError):
            ExecutionContext.initialize(args=args)

    def test_analysis_depth_valid_choices(self):
        """Analysis depth should accept surface, deep, full."""
        for depth in ("surface", "deep", "full"):
            ExecutionContext.reset()
            args = argparse.Namespace(name="test", depth=depth)
            ctx = ExecutionContext.initialize(args=args)
            assert ctx.analysis.depth == depth


class TestExecutionContextDefaults:
    """Tests for default values."""

    def setup_method(self):
        ExecutionContext.reset()

    def teardown_method(self):
        ExecutionContext.reset()

    def test_default_values(self):
        """Should have sensible defaults."""
        # Clear API key env vars so mode defaults to "auto" regardless of environment
        api_keys = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "MOONSHOT_API_KEY", "GOOGLE_API_KEY")
        saved = {k: os.environ.pop(k, None) for k in api_keys}
        try:
            ctx = ExecutionContext.initialize()

            # Enhancement defaults
            assert ctx.enhancement.enabled is True
            assert ctx.enhancement.level == 2
            assert ctx.enhancement.mode == "auto"  # Default is auto, resolved at runtime
            assert ctx.enhancement.timeout == 2700  # 45 minutes
        finally:
            for k, v in saved.items():
                if v is not None:
                    os.environ[k] = v

        # Output defaults
        assert ctx.output.name is None
        assert ctx.output.dry_run is False

        # Scraping defaults
        assert ctx.scraping.browser is False
        assert ctx.scraping.workers == 1
        assert ctx.scraping.languages == ["en"]

        # Analysis defaults
        assert ctx.analysis.depth == "surface"
        assert ctx.analysis.skip_patterns is False

        # RAG defaults
        assert ctx.rag.chunk_for_rag is False
        assert ctx.rag.chunk_tokens == 512

@@ -46,31 +46,20 @@ class TestFrameworkDetection(unittest.TestCase):
            "    return render_template('index.html')\n"
        )

        # Run codebase analyzer directly
        from skill_seekers.cli.codebase_scraper import analyze_codebase

        analyze_codebase(
            directory=self.test_project,
            output_dir=self.output_dir,
            depth="deep",
            enhance_level=0,
            detect_patterns=False,
            extract_test_examples=False,
            build_how_to_guides=False,
            extract_config_patterns=False,
            extract_docs=False,
        )

        # Verify Flask was detected
        arch_file = self.output_dir / "references" / "architecture" / "architectural_patterns.json"

@@ -91,26 +80,15 @@ class TestFrameworkDetection(unittest.TestCase):
            "import django\nfrom flask import Flask\nimport requests"
        )

        # Run codebase analyzer directly
        from skill_seekers.cli.codebase_scraper import analyze_codebase

        analyze_codebase(
            directory=self.test_project,
            output_dir=self.output_dir,
            depth="deep",
            enhance_level=0,
        )

        # Verify file was analyzed
        code_analysis = self.output_dir / "code_analysis.json"

@@ -143,26 +121,15 @@ class TestFrameworkDetection(unittest.TestCase):
        # File with no framework imports
        (app_dir / "utils.py").write_text("def my_function():\n    return 'hello'\n")

        # Run codebase analyzer directly
        from skill_seekers.cli.codebase_scraper import analyze_codebase

        analyze_codebase(
            directory=self.test_project,
            output_dir=self.output_dir,
            depth="deep",
            enhance_level=0,
        )

        # Check frameworks detected
        arch_file = self.output_dir / "references" / "architecture" / "architectural_patterns.json"

@@ -52,8 +52,8 @@ class TestGitSourcesE2E:
        """Create a temporary git repository with sample configs."""
        repo_dir = tempfile.mkdtemp(prefix="ss_repo_")

        # Initialize git repository with 'master' branch for test consistency
        repo = git.Repo.init(repo_dir, initial_branch="master")

        # Create sample config files
        configs = {
@@ -685,8 +685,8 @@ class TestMCPToolsE2E:
        """Create a temporary git repository with sample configs."""
        repo_dir = tempfile.mkdtemp(prefix="ss_mcp_repo_")

        # Initialize git repository with 'master' branch for test consistency
        repo = git.Repo.init(repo_dir, initial_branch="master")

        # Create sample config
        config = {
|
||||
|
||||
@@ -8,7 +8,6 @@ Tests verify complete fixes for:
 3. Custom API endpoint support (ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN)
 """
 
-import contextlib
 import os
 import shutil
 import subprocess
@@ -117,82 +116,48 @@ class TestIssue219Problem1LargeFiles(unittest.TestCase):
 
 
 class TestIssue219Problem2CLIFlags(unittest.TestCase):
-    """E2E Test: Problem #2 - CLI flags working through main.py dispatcher"""
+    """E2E Test: Problem #2 - CLI flags working through create command"""
 
-    def test_github_command_has_enhancement_flags(self):
-        """E2E: Verify --enhance-level flag exists in github command help"""
+    def test_create_command_has_enhancement_flags(self):
+        """E2E: Verify --enhance-level flag exists in create command help"""
         result = subprocess.run(
-            ["skill-seekers", "github", "--help"], capture_output=True, text=True
+            ["skill-seekers", "create", "--help"], capture_output=True, text=True
         )
 
         # VERIFY: Command succeeds
-        self.assertEqual(result.returncode, 0, "github --help should succeed")
+        self.assertEqual(result.returncode, 0, "create --help should succeed")
 
         # VERIFY: Enhancement flags present
         self.assertIn("--enhance-level", result.stdout, "Missing --enhance-level flag")
         self.assertIn("--api-key", result.stdout, "Missing --api-key flag")
 
-    def test_github_command_accepts_enhance_level_flag(self):
-        """E2E: Verify --enhance-level flag doesn't cause 'unrecognized arguments' error"""
-        # Strategy: Parse arguments directly without executing to avoid network hangs on CI
-        # This tests that the CLI accepts the flag without actually running the command
-        import argparse
+    def test_enhance_level_flag_accepted_by_create(self):
+        """E2E: Verify --enhance-level flag is accepted by create command parser"""
+        from skill_seekers.cli.main import create_parser
 
-        # Get the argument parser from github_scraper
-        parser = argparse.ArgumentParser()
-        # Add the same arguments as github_scraper.main()
-        parser.add_argument("--repo", required=True)
-        parser.add_argument("--enhance-level", type=int, choices=[0, 1, 2, 3], default=2)
-        parser.add_argument("--api-key")
+        parser = create_parser()
 
         # VERIFY: Parsing succeeds without "unrecognized arguments" error
         try:
-            args = parser.parse_args(["--repo", "test/test", "--enhance-level", "2"])
-            # If we get here, argument parsing succeeded
+            args = parser.parse_args(["create", "owner/repo", "--enhance-level", "2"])
             self.assertEqual(args.enhance_level, 2, "Flag should be parsed as 2")
-            self.assertEqual(args.repo, "test/test")
         except SystemExit as e:
-            # Argument parsing failed
             self.fail(f"Argument parsing failed with: {e}")
 
-    def test_cli_dispatcher_forwards_flags_to_github_scraper(self):
-        """E2E: Verify main.py dispatcher forwards flags to github_scraper.py"""
-        from skill_seekers.cli import main
+    def test_github_scraper_class_accepts_enhance_level(self):
+        """E2E: Verify GitHubScraper config accepts enhance_level."""
+        from skill_seekers.cli.github_scraper import GitHubScraper
 
-        # Mock sys.argv to simulate CLI call
-        test_args = [
-            "skill-seekers",
-            "github",
-            "--repo",
-            "test/test",
-            "--name",
-            "test",
-            "--enhance-level",
-            "2",
-        ]
+        config = {
+            "repo": "test/test",
+            "name": "test",
+            "github_token": None,
+            "enhance_level": 2,
+        }
 
-        with (
-            patch("sys.argv", test_args),
-            patch("skill_seekers.cli.github_scraper.main") as mock_github_main,
-        ):
-            mock_github_main.return_value = 0
-
-            # Call main dispatcher
-            with patch("sys.exit"), contextlib.suppress(SystemExit):
-                main.main()
-
-            # VERIFY: github_scraper.main was called
-            mock_github_main.assert_called_once()
-
-            # VERIFY: sys.argv contains --enhance-level flag
-            # (main.py should have added it before calling github_scraper)
-            called_with_enhance = any(
-                "--enhance-level" in str(call) for call in mock_github_main.call_args_list
-            )
-            self.assertTrue(
-                called_with_enhance or "--enhance-level" in sys.argv,
-                "Flag should be forwarded to github_scraper",
-            )
+        with patch("skill_seekers.cli.github_scraper.Github"):
+            scraper = GitHubScraper(config)
+            # Just verify it doesn't crash with enhance_level in config
+            self.assertIsNotNone(scraper)
 
 
 @unittest.skipIf(not ANTHROPIC_AVAILABLE, "anthropic package not installed")
@@ -338,17 +303,16 @@ class TestIssue219IntegrationAll(unittest.TestCase):
     def test_all_fixes_work_together(self):
         """E2E: Verify all 3 fixes work in combination"""
         # This test verifies the complete workflow:
-        # 1. CLI accepts --enhance-level
+        # 1. CLI accepts --enhance-level via create command
        # 2. Large files are downloaded
        # 3. Custom API endpoints work
 
        result = subprocess.run(
-            ["skill-seekers", "github", "--help"], capture_output=True, text=True
+            ["skill-seekers", "create", "--help"], capture_output=True, text=True
        )
 
        # Enhancement flags present
        self.assertIn("--enhance-level", result.stdout)
        self.assertIn("--api-key", result.stdout)
 
        # Verify we can import all fixed modules
        try:
 
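The rewritten flag tests above parse the argv list without ever executing the command, so CI cannot hang on network calls. A minimal sketch of that strategy with a stand-in parser; `build_demo_parser` and its flag set mimic, but are not, the real `create_parser()`:

```python
import argparse

def build_demo_parser():
    """Stand-in for skill-seekers' create_parser(); the real one has many more flags."""
    parser = argparse.ArgumentParser(prog="skill-seekers")
    sub = parser.add_subparsers(dest="command")
    create = sub.add_parser("create")
    create.add_argument("source")  # e.g. owner/repo or a docs URL
    create.add_argument("--enhance-level", type=int, choices=[0, 1, 2, 3])
    create.add_argument("--api-key")
    return parser

# Parsing alone proves the flags are wired up; nothing is executed.
args = build_demo_parser().parse_args(["create", "owner/repo", "--enhance-level", "2"])
```

An unrecognized flag would make `parse_args` raise `SystemExit`, which is exactly what the unittest above converts into a test failure.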
@@ -280,58 +280,69 @@ class TestScrapeDocsTool(unittest.IsolatedAsyncioTestCase):
         os.chdir(self.original_cwd)
         shutil.rmtree(self.temp_dir, ignore_errors=True)
 
-    @patch("skill_seekers.mcp.tools.scraping_tools.run_subprocess_with_streaming")
-    async def test_scrape_docs_basic(self, mock_streaming):
-        """Test basic documentation scraping"""
-        # Mock successful subprocess run with streaming
-        mock_streaming.return_value = ("Scraping completed successfully", "", 0)
+    @patch("skill_seekers.mcp.tools.scraping_tools._run_converter")
+    @patch("skill_seekers.cli.skill_converter.get_converter")
+    async def test_scrape_docs_basic(self, mock_get_converter, mock_run_converter):
+        """Test basic documentation scraping via in-process converter"""
+        from skill_seekers.mcp.tools.scraping_tools import TextContent
+
+        mock_run_converter.return_value = [
+            TextContent(type="text", text="Scraping completed successfully")
+        ]
 
         args = {"config_path": str(self.config_path)}
 
         result = await skill_seeker_server.scrape_docs_tool(args)
 
         self.assertIsInstance(result, list)
         self.assertIn("success", result[0].text.lower())
+        mock_get_converter.assert_called_once()
+        mock_run_converter.assert_called_once()
 
-    @patch("skill_seekers.mcp.tools.scraping_tools.run_subprocess_with_streaming")
-    async def test_scrape_docs_with_skip_scrape(self, mock_streaming):
+    @patch("skill_seekers.mcp.tools.scraping_tools._run_converter")
+    @patch("skill_seekers.cli.skill_converter.get_converter")
+    async def test_scrape_docs_with_skip_scrape(self, mock_get_converter, mock_run_converter):
         """Test scraping with skip_scrape flag"""
-        # Mock successful subprocess run with streaming
-        mock_streaming.return_value = ("Using cached data", "", 0)
+        from skill_seekers.mcp.tools.scraping_tools import TextContent
+
+        mock_run_converter.return_value = [TextContent(type="text", text="Using cached data")]
 
         args = {"config_path": str(self.config_path), "skip_scrape": True}
-        result = await skill_seeker_server.scrape_docs_tool(args)
-
-        self.assertIsInstance(result, list)
-
-        # Verify --skip-scrape was passed
-        call_args = mock_streaming.call_args[0][0]
-        self.assertIn("--skip-scrape", call_args)
+
+        _result = await skill_seeker_server.scrape_docs_tool(args)
+        mock_get_converter.assert_called_once()
 
-    @patch("skill_seekers.mcp.tools.scraping_tools.run_subprocess_with_streaming")
-    async def test_scrape_docs_with_dry_run(self, mock_streaming):
-        """Test scraping with dry_run flag"""
-        # Mock successful subprocess run with streaming
-        mock_streaming.return_value = ("Dry run completed", "", 0)
-
-        args = {"config_path": str(self.config_path), "dry_run": True}
-        result = await skill_seeker_server.scrape_docs_tool(args)
-
-        self.assertIsInstance(result, list)
-
-        call_args = mock_streaming.call_args[0][0]
-        self.assertIn("--dry-run", call_args)
+    @patch("skill_seekers.mcp.tools.scraping_tools._run_converter")
+    @patch("skill_seekers.cli.skill_converter.get_converter")
+    async def test_scrape_docs_with_dry_run(self, mock_get_converter, mock_run_converter):
+        """Test scraping with dry_run flag sets converter.dry_run"""
+        from skill_seekers.mcp.tools.scraping_tools import TextContent
+
+        mock_converter = mock_get_converter.return_value
+        mock_run_converter.return_value = [TextContent(type="text", text="Dry run completed")]
+
+        args = {"config_path": str(self.config_path), "dry_run": True}
+
+        _result = await skill_seeker_server.scrape_docs_tool(args)
+        # Verify dry_run was set on the converter instance
+        self.assertTrue(mock_converter.dry_run)
 
-    @patch("skill_seekers.mcp.tools.scraping_tools.run_subprocess_with_streaming")
-    async def test_scrape_docs_with_enhance_local(self, mock_streaming):
-        """Test scraping with local enhancement"""
-        # Mock successful subprocess run with streaming
-        mock_streaming.return_value = ("Scraping with enhancement", "", 0)
-
-        args = {"config_path": str(self.config_path), "enhance_local": True}
-        result = await skill_seeker_server.scrape_docs_tool(args)
-
-        call_args = mock_streaming.call_args[0][0]
-        self.assertIn("--enhance-local", call_args)
+    @patch("skill_seekers.mcp.tools.scraping_tools._run_converter")
+    @patch("skill_seekers.cli.skill_converter.get_converter")
+    async def test_scrape_docs_with_enhance_local(self, mock_get_converter, mock_run_converter):
+        """Test scraping with local enhancement flag"""
+        from skill_seekers.mcp.tools.scraping_tools import TextContent
+
+        mock_run_converter.return_value = [
+            TextContent(type="text", text="Scraping with enhancement")
+        ]
+
+        args = {"config_path": str(self.config_path), "enhance_local": True}
+
+        _result = await skill_seeker_server.scrape_docs_tool(args)
+
+        self.assertIsInstance(result, list)
+        mock_get_converter.assert_called_once()
 
 
 @unittest.skipUnless(MCP_AVAILABLE, "MCP package not installed")
 
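The converted tests above no longer mock a subprocess helper; they patch the in-process seam (`_run_converter`) that the tool delegates to. A toy reconstruction of that pattern with stand-in names rather than the real `scraping_tools` module:

```python
import asyncio
import types
from unittest.mock import patch

# Stand-in for the scraping_tools module; in the real tests the seam lives
# on "skill_seekers.mcp.tools.scraping_tools".
tools = types.SimpleNamespace()

def _run_converter(config_path):
    """Stand-in converter entry point; must never run during tests."""
    raise RuntimeError("real converter invoked")

tools._run_converter = _run_converter

async def scrape_docs_tool(args):
    # The attribute is looked up at call time, so patching the module
    # attribute redirects the tool without touching the tool itself.
    return [tools._run_converter(args["config_path"])]

with patch.object(tools, "_run_converter", return_value="ok") as mock_run:
    result = asyncio.run(scrape_docs_tool({"config_path": "cfg.json"}))
```

Patching the call-time lookup is why the decorator order matters in the real tests: `@patch` decorators apply bottom-up, so the innermost patch maps to the first mock argument.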
@@ -13,8 +13,6 @@ import textwrap
 import pytest
 
 from skill_seekers.cli.config_validator import ConfigValidator
-from skill_seekers.cli.main import COMMAND_MODULES
-from skill_seekers.cli.parsers import PARSERS, get_parser_names
 from skill_seekers.cli.source_detector import SourceDetector, SourceInfo
 from skill_seekers.cli.unified_skill_builder import UnifiedSkillBuilder
 
@@ -554,58 +552,11 @@ class TestUnifiedSkillBuilderGenericMerge:
 
 
 # ---------------------------------------------------------------------------
-# 4. COMMAND_MODULES and parser wiring
+# 4. New source types accessible via 'create' command
 # ---------------------------------------------------------------------------
 
 
-class TestCommandModules:
-    """Test that all 10 new source types are wired into CLI."""
-
-    NEW_COMMAND_NAMES = [
-        "jupyter",
-        "html",
-        "openapi",
-        "asciidoc",
-        "pptx",
-        "rss",
-        "manpage",
-        "confluence",
-        "notion",
-        "chat",
-    ]
-
-    def test_new_types_in_command_modules(self):
-        """Test all 10 new source types are in COMMAND_MODULES."""
-        for cmd in self.NEW_COMMAND_NAMES:
-            assert cmd in COMMAND_MODULES, f"'{cmd}' not in COMMAND_MODULES"
-
-    def test_command_modules_values_are_module_paths(self):
-        """Test COMMAND_MODULES values look like importable module paths."""
-        for cmd in self.NEW_COMMAND_NAMES:
-            module_path = COMMAND_MODULES[cmd]
-            assert module_path.startswith("skill_seekers.cli."), (
-                f"Module path for '{cmd}' doesn't start with 'skill_seekers.cli.'"
-            )
-
-    def test_new_parser_names_include_all_10(self):
-        """Test that get_parser_names() includes all 10 new source types."""
-        names = get_parser_names()
-        for cmd in self.NEW_COMMAND_NAMES:
-            assert cmd in names, f"Parser '{cmd}' not registered"
-
-    def test_total_parser_count(self):
-        """Test total PARSERS count is 36 (25 original + 10 new + 1 doctor)."""
-        assert len(PARSERS) == 36
-
-    def test_no_duplicate_parser_names(self):
-        """Test no duplicate parser names exist."""
-        names = get_parser_names()
-        assert len(names) == len(set(names)), "Duplicate parser names found!"
-
-    def test_command_module_count(self):
-        """Test COMMAND_MODULES has expected number of entries."""
-        # 25 original + 10 new + 1 doctor = 36
-        assert len(COMMAND_MODULES) == 36
+# Individual scraper CLI commands (jupyter, html, etc.) were removed in the
+# Grand Unification refactor. All 17 source types are now accessed via
+# `skill-seekers create`. The routing is tested in TestCreateCommandRouting.
 
 
 # ---------------------------------------------------------------------------
@@ -769,29 +720,37 @@ class TestSourceDetectorValidation:
 
 
 class TestCreateCommandRouting:
-    """Test that CreateCommand._route_to_scraper maps new types to _route_generic."""
+    """Test that CreateCommand uses get_converter for all source types."""
 
-    # We can't easily call _route_to_scraper (it imports real scrapers),
-    # but we verify the routing table is correct by checking the method source.
-    GENERIC_ROUTES = {
-        "jupyter": ("jupyter_scraper", "--notebook"),
-        "html": ("html_scraper", "--html-path"),
-        "openapi": ("openapi_scraper", "--spec"),
-        "asciidoc": ("asciidoc_scraper", "--asciidoc-path"),
-        "pptx": ("pptx_scraper", "--pptx"),
-        "rss": ("rss_scraper", "--feed-path"),
-        "manpage": ("man_scraper", "--man-path"),
-        "confluence": ("confluence_scraper", "--export-path"),
-        "notion": ("notion_scraper", "--export-path"),
-        "chat": ("chat_scraper", "--export-path"),
-    }
+    NEW_SOURCE_TYPES = [
+        "jupyter",
+        "html",
+        "openapi",
+        "asciidoc",
+        "pptx",
+        "rss",
+        "manpage",
+        "confluence",
+        "notion",
+        "chat",
+    ]
 
-    def test_route_to_scraper_source_coverage(self):
-        """Test _route_to_scraper method handles all 10 new types.
-
-        We inspect the method source to verify each type has a branch.
-        """
+    def test_get_converter_handles_all_new_types(self):
+        """Test get_converter returns a converter for each new source type."""
+        from skill_seekers.cli.skill_converter import get_converter
+
+        for source_type in self.NEW_SOURCE_TYPES:
+            # get_converter should not raise for known types
+            # (it may raise ImportError for missing optional deps, which is OK)
+            try:
+                converter_cls = get_converter(source_type, {"name": "test"})
+                assert converter_cls is not None, f"get_converter returned None for '{source_type}'"
+            except ImportError:
+                # Optional dependency not installed - that's fine
+                pass
 
+    def test_route_to_scraper_uses_get_converter(self):
+        """Test _route_to_scraper delegates to get_converter (not per-type branches)."""
         import inspect
 
         source = inspect.getsource(
@@ -800,24 +759,9 @@ class TestCreateCommandRouting:
                 fromlist=["CreateCommand"],
             ).CreateCommand._route_to_scraper
         )
-        for source_type in self.GENERIC_ROUTES:
-            assert f'"{source_type}"' in source, (
-                f"_route_to_scraper missing branch for '{source_type}'"
-            )
-
-    def test_generic_route_module_names(self):
-        """Test _route_generic is called with correct module names."""
-        import inspect
-
-        source = inspect.getsource(
-            __import__(
-                "skill_seekers.cli.create_command",
-                fromlist=["CreateCommand"],
-            ).CreateCommand._route_to_scraper
-        )
+        assert "get_converter" in source, (
+            "_route_to_scraper should use get_converter for unified routing"
+        )
-        for source_type, (module, flag) in self.GENERIC_ROUTES.items():
-            assert f'"{module}"' in source, f"Module name '{module}' not found for '{source_type}'"
-            assert f'"{flag}"' in source, f"Flag '{flag}' not found for '{source_type}'"
 
 
 if __name__ == "__main__":
 
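The new tests treat `get_converter` as the single routing point, replacing the old per-type `if`/`elif` branches in `_route_to_scraper`. A hypothetical sketch of what such a registry-based dispatcher can look like; the mapping and converter names below are illustrative, not skill_seekers' actual table:

```python
# Illustrative registry: source type -> converter factory.
CONVERTERS = {
    "jupyter": lambda config: ("JupyterConverter", config),
    "html": lambda config: ("HTMLConverter", config),
    "openapi": lambda config: ("OpenAPIConverter", config),
}

def get_converter(source_type, config):
    """Single dispatch point: one dict lookup instead of per-type branches."""
    try:
        factory = CONVERTERS[source_type]
    except KeyError:
        raise ValueError(f"unknown source type: {source_type}") from None
    return factory(config)

converter = get_converter("jupyter", {"name": "test"})
```

With a registry, adding a source type means adding one entry, and a coverage test reduces to iterating the type list, as `test_get_converter_handles_all_new_types` does above.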
@@ -1,188 +0,0 @@
-"""Test that unified CLI parsers stay in sync with scraper modules.
-
-This test ensures that the unified CLI (skill-seekers <command>) has exactly
-the same arguments as the standalone scraper modules. This prevents the
-parsers from drifting out of sync (Issue #285).
-"""
-
-import argparse
-
-
-class TestScrapeParserSync:
-    """Ensure scrape_parser has all arguments from doc_scraper."""
-
-    def test_scrape_argument_count_matches(self):
-        """Verify unified CLI parser has same argument count as doc_scraper."""
-        from skill_seekers.cli.doc_scraper import setup_argument_parser
-        from skill_seekers.cli.parsers.scrape_parser import ScrapeParser
-
-        # Get source arguments from doc_scraper
-        source_parser = setup_argument_parser()
-        source_count = len([a for a in source_parser._actions if a.dest != "help"])
-
-        # Get target arguments from unified CLI parser
-        target_parser = argparse.ArgumentParser()
-        ScrapeParser().add_arguments(target_parser)
-        target_count = len([a for a in target_parser._actions if a.dest != "help"])
-
-        assert source_count == target_count, (
-            f"Argument count mismatch: doc_scraper has {source_count}, "
-            f"but unified CLI parser has {target_count}"
-        )
-
-    def test_scrape_argument_dests_match(self):
-        """Verify unified CLI parser has same argument destinations as doc_scraper."""
-        from skill_seekers.cli.doc_scraper import setup_argument_parser
-        from skill_seekers.cli.parsers.scrape_parser import ScrapeParser
-
-        # Get source arguments from doc_scraper
-        source_parser = setup_argument_parser()
-        source_dests = {a.dest for a in source_parser._actions if a.dest != "help"}
-
-        # Get target arguments from unified CLI parser
-        target_parser = argparse.ArgumentParser()
-        ScrapeParser().add_arguments(target_parser)
-        target_dests = {a.dest for a in target_parser._actions if a.dest != "help"}
-
-        # Check for missing arguments
-        missing = source_dests - target_dests
-        extra = target_dests - source_dests
-
-        assert not missing, f"scrape_parser missing arguments: {missing}"
-        assert not extra, f"scrape_parser has extra arguments not in doc_scraper: {extra}"
-
-    def test_scrape_specific_arguments_present(self):
-        """Verify key scrape arguments are present in unified CLI."""
-        from skill_seekers.cli.main import create_parser
-
-        parser = create_parser()
-
-        # Get the scrape subparser
-        subparsers_action = None
-        for action in parser._actions:
-            if isinstance(action, argparse._SubParsersAction):
-                subparsers_action = action
-                break
-
-        assert subparsers_action is not None, "No subparsers found"
-        assert "scrape" in subparsers_action.choices, "scrape subparser not found"
-
-        scrape_parser = subparsers_action.choices["scrape"]
-        arg_dests = {a.dest for a in scrape_parser._actions if a.dest != "help"}
-
-        # Check key arguments that were missing in Issue #285
-        required_args = [
-            "interactive",
-            "url",
-            "verbose",
-            "quiet",
-            "resume",
-            "fresh",
-            "rate_limit",
-            "no_rate_limit",
-            "chunk_for_rag",
-        ]
-
-        for arg in required_args:
-            assert arg in arg_dests, f"Required argument '{arg}' missing from scrape parser"
-
-
-class TestGitHubParserSync:
-    """Ensure github_parser has all arguments from github_scraper."""
-
-    def test_github_argument_count_matches(self):
-        """Verify unified CLI parser has same argument count as github_scraper."""
-        from skill_seekers.cli.github_scraper import setup_argument_parser
-        from skill_seekers.cli.parsers.github_parser import GitHubParser
-
-        # Get source arguments from github_scraper
-        source_parser = setup_argument_parser()
-        source_count = len([a for a in source_parser._actions if a.dest != "help"])
-
-        # Get target arguments from unified CLI parser
-        target_parser = argparse.ArgumentParser()
-        GitHubParser().add_arguments(target_parser)
-        target_count = len([a for a in target_parser._actions if a.dest != "help"])
-
-        assert source_count == target_count, (
-            f"Argument count mismatch: github_scraper has {source_count}, "
-            f"but unified CLI parser has {target_count}"
-        )
-
-    def test_github_argument_dests_match(self):
-        """Verify unified CLI parser has same argument destinations as github_scraper."""
-        from skill_seekers.cli.github_scraper import setup_argument_parser
-        from skill_seekers.cli.parsers.github_parser import GitHubParser
-
-        # Get source arguments from github_scraper
-        source_parser = setup_argument_parser()
-        source_dests = {a.dest for a in source_parser._actions if a.dest != "help"}
-
-        # Get target arguments from unified CLI parser
-        target_parser = argparse.ArgumentParser()
-        GitHubParser().add_arguments(target_parser)
-        target_dests = {a.dest for a in target_parser._actions if a.dest != "help"}
-
-        # Check for missing arguments
-        missing = source_dests - target_dests
-        extra = target_dests - source_dests
-
-        assert not missing, f"github_parser missing arguments: {missing}"
-        assert not extra, f"github_parser has extra arguments not in github_scraper: {extra}"
-
-
-class TestUnifiedCLI:
-    """Test the unified CLI main parser."""
-
-    def test_main_parser_creates_successfully(self):
-        """Verify the main parser can be created without errors."""
-        from skill_seekers.cli.main import create_parser
-
-        parser = create_parser()
-        assert parser is not None
-
-    def test_all_subcommands_present(self):
-        """Verify all expected subcommands are present."""
-        from skill_seekers.cli.main import create_parser
-
-        parser = create_parser()
-
-        # Find subparsers action
-        subparsers_action = None
-        for action in parser._actions:
-            if isinstance(action, argparse._SubParsersAction):
-                subparsers_action = action
-                break
-
-        assert subparsers_action is not None, "No subparsers found"
-
-        # Check expected subcommands
-        expected_commands = ["scrape", "github"]
-        for cmd in expected_commands:
-            assert cmd in subparsers_action.choices, f"Subcommand '{cmd}' not found"
-
-    def test_scrape_help_works(self):
-        """Verify scrape subcommand help can be generated."""
-        from skill_seekers.cli.main import create_parser
-
-        parser = create_parser()
-
-        # This should not raise an exception
-        try:
-            parser.parse_args(["scrape", "--help"])
-        except SystemExit as e:
-            # --help causes SystemExit(0) which is expected
-            assert e.code == 0
-
-    def test_github_help_works(self):
-        """Verify github subcommand help can be generated."""
-        from skill_seekers.cli.main import create_parser
-
-        parser = create_parser()
-
-        # This should not raise an exception
-        try:
-            parser.parse_args(["github", "--help"])
-        except SystemExit as e:
-            # --help causes SystemExit(0) which is expected
-            assert e.code == 0
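The deleted sync tests above compared two argparse parsers through their registered destinations (`parser._actions`, a private but widely used attribute). A self-contained sketch of that drift check, using two toy parsers rather than the real CLIs:

```python
import argparse

def arg_dests(parser):
    """Set of argument destinations, ignoring the implicit help action."""
    return {action.dest for action in parser._actions if action.dest != "help"}

# Two illustrative parsers that have drifted apart.
source = argparse.ArgumentParser()
source.add_argument("--url")
source.add_argument("--verbose", action="store_true")

target = argparse.ArgumentParser()
target.add_argument("--url")

# Non-empty sets here are exactly the "missing"/"extra" failures
# the removed tests asserted on.
missing = arg_dests(source) - arg_dests(target)
extra = arg_dests(target) - arg_dests(source)
```

With the Grand Unification collapsing everything into one `create` command, there are no longer two parsers to keep in sync, which is why the whole file could be removed.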
@@ -519,38 +519,5 @@ class TestJSONWorkflow(unittest.TestCase):
         self.assertEqual(converter.extracted_data["total_pages"], 1)
 
 
-class TestPDFCLIArguments(unittest.TestCase):
-    """Test PDF subcommand CLI argument parsing via the main CLI."""
-
-    def setUp(self):
-        import sys
-        from pathlib import Path
-
-        sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
-        from skill_seekers.cli.main import create_parser
-
-        self.parser = create_parser()
-
-    def test_api_key_stored_correctly(self):
-        """Test --api-key is accepted and stored correctly after switching to add_pdf_arguments."""
-        args = self.parser.parse_args(["pdf", "--pdf", "test.pdf", "--api-key", "sk-ant-test"])
-        self.assertEqual(args.api_key, "sk-ant-test")
-
-    def test_enhance_level_accepted(self):
-        """Test --enhance-level is accepted for pdf subcommand."""
-        args = self.parser.parse_args(["pdf", "--pdf", "test.pdf", "--enhance-level", "1"])
-        self.assertEqual(args.enhance_level, 1)
-
-    def test_enhance_workflow_accepted(self):
-        """Test --enhance-workflow is accepted and stores a list."""
-        args = self.parser.parse_args(["pdf", "--pdf", "test.pdf", "--enhance-workflow", "minimal"])
-        self.assertEqual(args.enhance_workflow, ["minimal"])
-
-    def test_workflow_dry_run_accepted(self):
-        """Test --workflow-dry-run is accepted."""
-        args = self.parser.parse_args(["pdf", "--pdf", "test.pdf", "--workflow-dry-run"])
-        self.assertTrue(args.workflow_dry_run)
-
 
 if __name__ == "__main__":
     unittest.main()
 
@@ -207,100 +207,6 @@ class TestPresetApplication:
             PresetManager.apply_preset("nonexistent", args)
 
 
-class TestDeprecationWarnings:
-    """Test deprecation warning functionality."""
-
-    def test_check_deprecated_flags_quick(self, capsys):
-        """Test deprecation warning for --quick flag."""
-        from skill_seekers.cli.codebase_scraper import _check_deprecated_flags
-        import argparse
-
-        args = argparse.Namespace(quick=True, comprehensive=False, depth=None, ai_mode="auto")
-
-        _check_deprecated_flags(args)
-
-        captured = capsys.readouterr()
-        assert "DEPRECATED" in captured.out
-        assert "--quick" in captured.out
-        assert "--preset quick" in captured.out
-        assert "v4.0.0" in captured.out
-
-    def test_check_deprecated_flags_comprehensive(self, capsys):
-        """Test deprecation warning for --comprehensive flag."""
-        from skill_seekers.cli.codebase_scraper import _check_deprecated_flags
-        import argparse
-
-        args = argparse.Namespace(quick=False, comprehensive=True, depth=None, ai_mode="auto")
-
-        _check_deprecated_flags(args)
-
-        captured = capsys.readouterr()
-        assert "DEPRECATED" in captured.out
-        assert "--comprehensive" in captured.out
-        assert "--preset comprehensive" in captured.out
-        assert "v4.0.0" in captured.out
-
-    def test_check_deprecated_flags_depth(self, capsys):
-        """Test deprecation warning for --depth flag."""
-        from skill_seekers.cli.codebase_scraper import _check_deprecated_flags
-        import argparse
-
-        args = argparse.Namespace(quick=False, comprehensive=False, depth="full", ai_mode="auto")
-
-        _check_deprecated_flags(args)
-
-        captured = capsys.readouterr()
-        assert "DEPRECATED" in captured.out
-        assert "--depth full" in captured.out
-        assert "--preset comprehensive" in captured.out
-        assert "v4.0.0" in captured.out
-
-    def test_check_deprecated_flags_ai_mode(self, capsys):
-        """Test deprecation warning for --ai-mode flag."""
-        from skill_seekers.cli.codebase_scraper import _check_deprecated_flags
-        import argparse
-
-        args = argparse.Namespace(quick=False, comprehensive=False, depth=None, ai_mode="api")
-
-        _check_deprecated_flags(args)
-
-        captured = capsys.readouterr()
-        assert "DEPRECATED" in captured.out
-        assert "--ai-mode api" in captured.out
-        assert "--enhance-level" in captured.out
-        assert "v4.0.0" in captured.out
-
-    def test_check_deprecated_flags_multiple(self, capsys):
-        """Test deprecation warnings for multiple flags."""
-        from skill_seekers.cli.codebase_scraper import _check_deprecated_flags
-        import argparse
-
-        args = argparse.Namespace(quick=True, comprehensive=False, depth="surface", ai_mode="local")
-
-        _check_deprecated_flags(args)
-
-        captured = capsys.readouterr()
-        assert "DEPRECATED" in captured.out
-        assert "--depth surface" in captured.out
-        assert "--ai-mode local" in captured.out
-        assert "--quick" in captured.out
-        assert "MIGRATION TIP" in captured.out
-        assert "v4.0.0" in captured.out
-
-    def test_check_deprecated_flags_none(self, capsys):
-        """Test no warnings when no deprecated flags used."""
-        from skill_seekers.cli.codebase_scraper import _check_deprecated_flags
-        import argparse
-
-        args = argparse.Namespace(quick=False, comprehensive=False, depth=None, ai_mode="auto")
-
-        _check_deprecated_flags(args)
-
-        captured = capsys.readouterr()
-        assert "DEPRECATED" not in captured.out
-        assert "v4.0.0" not in captured.out
-
 
 class TestBackwardCompatibility:
     """Test backward compatibility with old flags."""
 
@@ -574,62 +574,6 @@ def test_config_file_validation():
         os.unlink(config_path)
 
 
-# ===========================
-# Unified CLI Argument Tests
-# ===========================
-
-
-class TestUnifiedCLIArguments:
-    """Test that unified subcommand parser exposes the expected CLI flags."""
-
-    @pytest.fixture
-    def parser(self):
-        import sys
-
-        sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
-        from skill_seekers.cli.main import create_parser
-
-        return create_parser()
-
-    def test_api_key_stored_correctly(self, parser):
-        """Test --api-key KEY is stored in args."""
-        args = parser.parse_args(["unified", "--config", "my.json", "--api-key", "sk-ant-test"])
-        assert args.api_key == "sk-ant-test"
-
-    def test_enhance_level_stored_correctly(self, parser):
-        """Test --enhance-level 2 is stored in args."""
-        args = parser.parse_args(["unified", "--config", "my.json", "--enhance-level", "2"])
-        assert args.enhance_level == 2
-
-    def test_enhance_level_default_is_none(self, parser):
-        """Test --enhance-level defaults to None (per-source values apply)."""
-        args = parser.parse_args(["unified", "--config", "my.json"])
-        assert args.enhance_level is None
-
-    def test_enhance_level_all_choices(self, parser):
-        """Test all valid --enhance-level choices are accepted."""
-        for level in [0, 1, 2, 3]:
-            args = parser.parse_args(
-                ["unified", "--config", "my.json", "--enhance-level", str(level)]
-            )
-            assert args.enhance_level == level
-
-    def test_enhance_workflow_accepted(self, parser):
-        """Test --enhance-workflow is accepted."""
-        args = parser.parse_args(
-            ["unified", "--config", "my.json", "--enhance-workflow", "security-focus"]
-        )
-        assert args.enhance_workflow == ["security-focus"]
-
-    def test_api_key_and_enhance_level_combined(self, parser):
-        """Test --api-key and --enhance-level can be combined."""
-        args = parser.parse_args(
-            ["unified", "--config", "my.json", "--api-key", "sk-ant-test", "--enhance-level", "3"]
-        )
-        assert args.api_key == "sk-ant-test"
-        assert args.enhance_level == 3
-
 
 # ===========================
 # Workflow JSON Config Tests
 # ===========================
 
@@ -168,35 +168,32 @@ class TestScrapeAllSourcesRouting:


 class TestScrapeDocumentation:
-    """_scrape_documentation() writes a temp config and runs doc_scraper as subprocess."""
+    """_scrape_documentation() calls scrape_documentation() directly."""

-    def test_subprocess_called_with_config_and_fresh_flag(self, tmp_path):
-        """subprocess.run is called with --config and --fresh for the doc scraper."""
+    def test_scrape_documentation_called_directly(self, tmp_path):
+        """scrape_documentation is called directly (not via subprocess)."""
         scraper = _make_scraper(tmp_path=tmp_path)
         source = {"base_url": "https://docs.example.com/", "type": "documentation"}

-        with patch("skill_seekers.cli.unified_scraper.subprocess.run") as mock_run:
-            mock_run.return_value = MagicMock(returncode=1, stdout="", stderr="error")
+        with patch("skill_seekers.cli.doc_scraper.scrape_documentation") as mock_scrape:
+            mock_scrape.return_value = 1  # simulate failure
             scraper._scrape_documentation(source)

-        assert mock_run.called
-        cmd_args = mock_run.call_args[0][0]
-        assert "--fresh" in cmd_args
-        assert "--config" in cmd_args
+        assert mock_scrape.called

-    def test_nothing_appended_on_subprocess_failure(self, tmp_path):
-        """If subprocess returns non-zero, scraped_data["documentation"] stays empty."""
+    def test_nothing_appended_on_scrape_failure(self, tmp_path):
+        """If scrape_documentation returns non-zero, scraped_data["documentation"] stays empty."""
         scraper = _make_scraper(tmp_path=tmp_path)
         source = {"base_url": "https://docs.example.com/", "type": "documentation"}

-        with patch("skill_seekers.cli.unified_scraper.subprocess.run") as mock_run:
-            mock_run.return_value = MagicMock(returncode=1, stdout="", stderr="err")
+        with patch("skill_seekers.cli.doc_scraper.scrape_documentation") as mock_scrape:
+            mock_scrape.return_value = 1
             scraper._scrape_documentation(source)

         assert scraper.scraped_data["documentation"] == []

     def test_llms_txt_url_forwarded_to_doc_config(self, tmp_path):
-        """llms_txt_url from source is forwarded to the temporary doc config."""
+        """llms_txt_url from source is forwarded to the doc config."""
         scraper = _make_scraper(tmp_path=tmp_path)
         source = {
             "base_url": "https://docs.example.com/",
@@ -204,30 +201,21 @@ class TestScrapeDocumentation:
             "llms_txt_url": "https://docs.example.com/llms.txt",
         }

-        written_configs = []
+        captured_config = {}

-        original_json_dump = json.dumps
+        def fake_scrape(config, ctx=None):  # noqa: ARG001
+            captured_config.update(config)
+            return 1  # fail so we don't need to set up output files

-        def capture_dump(obj, f, **kwargs):
-            if isinstance(f, str):
-                return original_json_dump(obj, f, **kwargs)
-            written_configs.append(obj)
-            return original_json_dump(obj)
-
-        with (
-            patch("skill_seekers.cli.unified_scraper.subprocess.run") as mock_run,
-            patch(
-                "skill_seekers.cli.unified_scraper.json.dump",
-                side_effect=lambda obj, _f, **_kw: written_configs.append(obj),
-            ),
-        ):
-            mock_run.return_value = MagicMock(returncode=1, stdout="", stderr="")
+        with patch("skill_seekers.cli.doc_scraper.scrape_documentation", side_effect=fake_scrape):
             scraper._scrape_documentation(source)

-        assert any("llms_txt_url" in s for c in written_configs for s in c.get("sources", [c]))
+        # The llms_txt_url should be in the sources list of the doc config
+        sources = captured_config.get("sources", [])
+        assert any("llms_txt_url" in s for s in sources)

     def test_start_urls_forwarded_to_doc_config(self, tmp_path):
-        """start_urls from source is forwarded to the temporary doc config."""
+        """start_urls from source is forwarded to the doc config."""
         scraper = _make_scraper(tmp_path=tmp_path)
         source = {
             "base_url": "https://docs.example.com/",
@@ -235,19 +223,17 @@ class TestScrapeDocumentation:
             "start_urls": ["https://docs.example.com/intro"],
         }

-        written_configs = []
+        captured_config = {}

-        with (
-            patch("skill_seekers.cli.unified_scraper.subprocess.run") as mock_run,
-            patch(
-                "skill_seekers.cli.unified_scraper.json.dump",
-                side_effect=lambda obj, _f, **_kw: written_configs.append(obj),
-            ),
-        ):
-            mock_run.return_value = MagicMock(returncode=1, stdout="", stderr="")
+        def fake_scrape(config, ctx=None):  # noqa: ARG001
+            captured_config.update(config)
+            return 1
+
+        with patch("skill_seekers.cli.doc_scraper.scrape_documentation", side_effect=fake_scrape):
             scraper._scrape_documentation(source)

-        assert any("start_urls" in s for c in written_configs for s in c.get("sources", [c]))
+        sources = captured_config.get("sources", [])
+        assert any("start_urls" in s for s in sources)


# ===========================================================================
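The refactor above swaps subprocess + `json.dump` interception for an in-process call whose config can be captured with a `side_effect` fake. A minimal sketch of the direct-call shape these tests pin down — `scrape_docs_source` and `scrape_fn` are hypothetical stand-ins for the real `unified_scraper._scrape_documentation` and `doc_scraper.scrape_documentation`:

```python
def scrape_docs_source(source: dict, scraped_data: dict, scrape_fn) -> None:
    """Build a doc config from one source and run the scraper in-process.

    Hypothetical stand-in for unified_scraper._scrape_documentation.
    """
    doc_source = {"base_url": source["base_url"], "type": "documentation"}
    # Forward optional keys verbatim so the doc scraper sees them.
    for key in ("llms_txt_url", "start_urls"):
        if key in source:
            doc_source[key] = source[key]
    config = {"sources": [doc_source]}
    if scrape_fn(config) != 0:
        return  # non-zero: leave scraped_data["documentation"] untouched
    scraped_data["documentation"].append(config)


# Capture the config exactly as the tests above do, via a failing fake.
captured_config: dict = {}


def fake_scrape(config):
    captured_config.update(config)
    return 1  # simulated failure


scraped_data = {"documentation": []}
scrape_docs_source(
    {"base_url": "https://docs.example.com/",
     "llms_txt_url": "https://docs.example.com/llms.txt"},
    scraped_data,
    fake_scrape,
)
```

Because the fake returns non-zero, nothing is appended, yet `captured_config` still holds the forwarded `llms_txt_url` — which is exactly why the tests can assert on the config without setting up output files.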
@@ -728,13 +728,12 @@ class TestVideoArguments(unittest.TestCase):
         args = parser.parse_args([])
         self.assertEqual(args.enhance_level, 0)

-    def test_unified_parser_has_video(self):
-        """Test video subcommand is registered in main parser."""
-        from skill_seekers.cli.main import create_parser
+    def test_video_accessible_via_create(self):
+        """Test video source is accessible via 'create' command (not as subcommand)."""
+        from skill_seekers.cli.source_detector import SourceDetector

-        parser = create_parser()
-        args = parser.parse_args(["video", "--url", "https://youtube.com/watch?v=test"])
-        self.assertEqual(args.url, "https://youtube.com/watch?v=test")
+        info = SourceDetector.detect("https://youtube.com/watch?v=test")
+        self.assertEqual(info.type, "video")


# =============================================================================
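The new test relies on `SourceDetector.detect()` classifying a URL instead of a dedicated `video` subcommand. A hypothetical minimal sketch of that detection shape — the class body, `SourceInfo`, and the host list are assumptions for illustration; the real detector is `skill_seekers.cli.source_detector.SourceDetector`:

```python
# Hypothetical sketch of URL-based source detection as the test uses it.
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class SourceInfo:
    type: str
    url: str


class SourceDetector:
    # Assumed host list for illustration only.
    VIDEO_HOSTS = {"youtube.com", "www.youtube.com", "youtu.be"}

    @classmethod
    def detect(cls, url: str) -> SourceInfo:
        host = urlparse(url).netloc
        kind = "video" if host in cls.VIDEO_HOSTS else "documentation"
        return SourceInfo(type=kind, url=url)


info = SourceDetector.detect("https://youtube.com/watch?v=test")
```

Routing on the detected type, rather than a per-source subcommand, is what lets a single `create` command accept any source URL.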
@@ -711,21 +711,16 @@ class TestVideoArgumentSetup(unittest.TestCase):


 class TestVideoScraperSetupEarlyExit(unittest.TestCase):
-    """Test that --setup exits before source validation."""
+    """Test that --setup triggers run_setup via video setup module."""

     @patch("skill_seekers.cli.video_setup.run_setup", return_value=0)
-    def test_setup_skips_source_validation(self, mock_setup):
-        """--setup without --url should NOT error about missing source."""
-        from skill_seekers.cli.video_scraper import main
+    def test_setup_runs_successfully(self, mock_setup):
+        """run_setup(interactive=True) should return 0 on success."""
+        from skill_seekers.cli.video_setup import run_setup

-        old_argv = sys.argv
-        try:
-            sys.argv = ["video_scraper", "--setup"]
-            rc = main()
-            assert rc == 0
-            mock_setup.assert_called_once_with(interactive=True)
-        finally:
-            sys.argv = old_argv
+        rc = run_setup(interactive=True)
+        assert rc == 0
+        mock_setup.assert_called_once_with(interactive=True)


 if __name__ == "__main__":
@@ -572,61 +572,6 @@ class TestWordJSONWorkflow(unittest.TestCase):
         self.assertTrue(skill_md.exists())


-class TestWordCLIArguments(unittest.TestCase):
-    """Test word subcommand CLI argument parsing via the main CLI."""
-
-    def setUp(self):
-        import sys
-        from pathlib import Path as P
-
-        sys.path.insert(0, str(P(__file__).parent.parent / "src"))
-        from skill_seekers.cli.main import create_parser
-
-        self.parser = create_parser()
-
-    def test_docx_argument_accepted(self):
-        """--docx flag is accepted for the word subcommand."""
-        args = self.parser.parse_args(["word", "--docx", "test.docx"])
-        self.assertEqual(args.docx, "test.docx")
-
-    def test_api_key_accepted(self):
-        """--api-key is accepted for word subcommand."""
-        args = self.parser.parse_args(["word", "--docx", "test.docx", "--api-key", "sk-ant-test"])
-        self.assertEqual(args.api_key, "sk-ant-test")
-
-    def test_enhance_level_accepted(self):
-        """--enhance-level is accepted for word subcommand."""
-        args = self.parser.parse_args(["word", "--docx", "test.docx", "--enhance-level", "1"])
-        self.assertEqual(args.enhance_level, 1)
-
-    def test_enhance_workflow_accepted(self):
-        """--enhance-workflow is accepted and stores a list."""
-        args = self.parser.parse_args(
-            ["word", "--docx", "test.docx", "--enhance-workflow", "minimal"]
-        )
-        self.assertEqual(args.enhance_workflow, ["minimal"])
-
-    def test_workflow_dry_run_accepted(self):
-        """--workflow-dry-run is accepted."""
-        args = self.parser.parse_args(["word", "--docx", "test.docx", "--workflow-dry-run"])
-        self.assertTrue(args.workflow_dry_run)
-
-    def test_dry_run_accepted(self):
-        """--dry-run is accepted for word subcommand."""
-        args = self.parser.parse_args(["word", "--docx", "test.docx", "--dry-run"])
-        self.assertTrue(args.dry_run)
-
-    def test_from_json_accepted(self):
-        """--from-json is accepted."""
-        args = self.parser.parse_args(["word", "--from-json", "data.json"])
-        self.assertEqual(args.from_json, "data.json")
-
-    def test_name_accepted(self):
-        """--name is accepted."""
-        args = self.parser.parse_args(["word", "--docx", "test.docx", "--name", "myskill"])
-        self.assertEqual(args.name, "myskill")
-
-
 class TestWordHelperFunctions(unittest.TestCase):
     """Test module-level helper functions."""