12 Commits

Author SHA1 Message Date
yusyus
c6a6db01bf feat: agent-agnostic refactor, smart SPA discovery, marketplace pipeline (#336)
* feat: fix unified scraper pipeline gaps, add multi-agent support, and Unity skill configs

Fix multiple bugs in the unified scraper pipeline discovered while creating Unity skills
(Spine, Addressables, DOTween):

- Fix doc scraper KeyError by passing base_url in temp config
- Fix scraped_data list-vs-dict bug in detect_conflicts() and merge_sources()
- Add Phase 6 auto-enhancement from config "enhancement" block (LOCAL + API mode)
- Add "browser": true config support for JavaScript SPA documentation sites
- Add Phase 3 skip message for better UX
- Add subprocess timeout (3600s) for doc scraper
- Fix SkillEnhancer missing skill_dir argument in API mode
- Fix browser renderer defaults (60s timeout, domcontentloaded wait condition)
- Fix C3.x JSON filename mismatch (design_patterns.json → all_patterns.json)
- Fix workflow builtin target handling when no pattern data available
- Make AI enhancement timeout configurable via SKILL_SEEKER_ENHANCE_TIMEOUT env var (300s default)
- Add C#, Go, Rust, Swift, Ruby, PHP, GDScript to GitHub scraper extension map
- Add multi-agent LOCAL mode support across all 17 scrapers (--agent flag)
- Add Kimi/Moonshot platform support (API keys, agent presets, config wizard)
- Add unity-game-dev.yaml workflow (7 stages covering Unity-specific patterns)
- Add 3 Unity skill configs (Spine, Addressables, DOTween)
- Add comprehensive Claude bias audit report

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: create AgentClient abstraction, remove hardcoded Claude from 5 enhancers (#334)

Phase 1 of the full agent-agnostic refactor. Creates a centralized AgentClient
that all enhancers use instead of hardcoded subprocess calls and model names.

New file:
- agent_client.py: Unified AI client supporting API mode (Anthropic, Moonshot,
  Google, OpenAI) and LOCAL mode (Claude Code, Kimi, Codex, Copilot, OpenCode,
  custom agents). Provides detect_api_key(), get_model(), detect_default_target().

Refactored (removed all hardcoded ["claude", ...] subprocess calls):
- ai_enhancer.py: -140 lines, delegates to AgentClient
- config_enhancer.py: -150 lines, removed _run_claude_cli()
- guide_enhancer.py: -120 lines, removed _check_claude_cli(), _call_claude_*()
- unified_enhancer.py: -100 lines, removed _check_claude_cli(), _call_claude_*()
- codebase_scraper.py: collapsed 3 functions into 1 using AgentClient

Fixed:
- utils.py: has_api_key()/get_api_key() now check all providers
- enhance_skill.py, video_scraper.py, video_visual.py: model names configurable
  via ANTHROPIC_MODEL env var
- enhancement_workflow.py: uses call() with _call_claude() fallback

Net: -153 lines of code while adding full multi-agent support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Phase 2 agent-agnostic refactor — defaults, help text, merge mode, MCP (#334)

Phase 2 of the full agent-agnostic refactor:

Default targets:
- Changed default="claude" to auto-detect from API keys in 5 argument files
  and 3 CLI scripts (install_skill, upload_skill, enhance_skill)
- Added AgentClient.detect_default_target() for runtime resolution
- MCP server functions now use "auto" default with runtime detection

Help text (16+ argument files):
- Replaced "ANTHROPIC_API_KEY" / "Claude Code" with agent-neutral wording
- Now mentions all API keys (ANTHROPIC, MOONSHOT, etc.) and "AI coding agent"

Log messages:
- main.py, enhance_command.py: "Claude Code CLI" → dynamic agent name
- enhance_command.py docstring: "Claude Code" → "AI coding agent"

Merge mode rename:
- Added "ai-enhanced" as the preferred merge mode name
- "claude-enhanced" kept as backward-compatible alias
- Renamed ClaudeEnhancedMerger → AIEnhancedMerger (with alias)
- Updated choices, validators, and descriptions

MCP server descriptions:
- server_fastmcp.py: "Claude AI skills" → "LLM skills" in tool descriptions
- packaging_tools.py: Updated defaults and dry-run messages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Phase 3 agent-agnostic refactor — docstrings, MCP descriptions, README (#334)

Phase 3 of the full agent-agnostic refactor:

Module docstrings (17+ scraper files):
- "Claude Skill Converter" → "AI Skill Converter"
- "Build Claude skill" → "Build AI/LLM skill"
- "Asking Claude" → "Asking AI"
- Updated doc_scraper, github_scraper, pdf_scraper, word_scraper,
  epub_scraper, video_scraper, enhance_skill, enhance_skill_local,
  unified_scraper, and others

MCP server_legacy.py (30+ fixes):
- All tool descriptions: "Claude skill" → "LLM skill"
- "Upload to Claude" → "Upload skill"
- "enhance with Claude Code" → "enhance with AI agent"
- Kept claude.ai/skills URLs (platform-specific, correct)

MCP README.md:
- Added multi-agent support note at top
- "Claude AI skills" → "LLM skills" throughout
- Updated examples to show multi-platform usage
- Kept Claude Code in supported agents list (accurate)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Phase 3 continued — remaining docstring and comment fixes (#334)

Additional agent-neutral text fixes in 8 files that were missed in the
initial Phase 3 commit:
- config_extractor.py, config_manager.py, constants.py: comments
- enhance_command.py: docstring and print messages
- guide_enhancer.py: class/module docstrings
- parsers/enhance_parser.py, install_parser.py: help text
- signal_flow_analyzer.py: docstring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* workflow added

* fix: address code review issues in AgentClient and Phase 6 (#334)

Fixes found during commit review:

1. AgentClient._call_local: Only append "Write your response to:" when
   caller explicitly passes output_file (was always appending)

2. Codex agent: Added uses_stdin flag to preset, pipe prompt via stdin
   instead of DEVNULL (codex reads from stdin with "-" arg)

3. Provider detection: Added _detect_provider_from_key() to detect
   provider from API key prefix (sk-ant- → anthropic, AIza → google)
   instead of always assuming anthropic

4. Phase 6 API mode: Replaced direct SkillEnhancer/ANTHROPIC_API_KEY
   with AgentClient for multi-provider support (Moonshot, Google, OpenAI)

5. config_enhancer: Removed output_file path from prompt — AgentClient
   manages temp files and output detection
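
Prefix-based provider detection (item 3) can be sketched as follows. The helper name `detect_provider_from_key` is hypothetical. Returning `None` for an unrecognized prefix is also an assumption: the real `_detect_provider_from_key()` may fall back differently. Only the two prefixes named above are listed.

```python
from typing import Optional

# Hypothetical prefix table: only the two prefixes named in the commit message.
# If it is extended, list more specific prefixes before shorter ones.
_KEY_PREFIXES = (
    ("sk-ant-", "anthropic"),
    ("AIza", "google"),
)


def detect_provider_from_key(api_key: str) -> Optional[str]:
    """Guess the provider from an API key prefix, or None if it is unknown."""
    key = api_key.strip()
    for prefix, provider in _KEY_PREFIXES:
        if key.startswith(prefix):
            return provider
    # Unknown prefix: let the caller pick a fallback instead of
    # silently assuming anthropic.
    return None


print(detect_provider_from_key("sk-ant-api03-example"))  # anthropic
print(detect_provider_from_key("AIzaSyExample"))  # google
print(detect_provider_from_key("some-other-key"))  # None
```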

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: make claude adaptor model name configurable via ANTHROPIC_MODEL env var

Missed in the Phase 1 refactor — adaptors/claude.py:381 had a hardcoded
model name without the os.environ.get() wrapper that all other files use.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add copilot stdin support, custom agent, and kimi aliases (#334)

Additional agent improvements from Kimi review:
- Added uses_stdin: True to copilot agent preset (reads from stdin like codex)
- Added custom agent support via SKILL_SEEKER_AGENT_CMD env var in _call_local()
- Added kimi_code/kimi-code aliases in normalize_agent_name()
- Added "kimi" to --target choices in enhance arguments
- Updated help text with MOONSHOT_API_KEY across argument files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Kimi CLI integration — add uses_stdin and output parsing (#334)

Kimi CLI's --print mode requires stdin piping and outputs structured
protocol messages (TurnBegin, TextPart, etc.) instead of plain text.

Fixes:
- Added uses_stdin: True to kimi preset (was not piping prompt)
- Added parse_output: "kimi" flag to preset
- Added _parse_kimi_output() to extract text from TextPart lines
- Kimi now returns clean text instead of raw protocol dump

Tested: kimi returns '{"status": "ok"}' correctly via AgentClient.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Kimi CLI in enhance_skill_local — remove wrong skip-permissions, use absolute path

Two bugs in enhance_skill_local.py AGENT_PRESETS for Kimi:
1. supports_skip_permissions was True — Kimi doesn't support
   --dangerously-skip-permissions, only Claude does. Fixed to False.
2. {skill_dir} was resolved as relative path — Kimi CLI requires
   absolute paths for --work-dir. Fixed with .resolve().

Tested: `skill-seekers enhance output/test-e2e/ --agent kimi`
now works end-to-end (107s, 9233 bytes output).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove invalid --enhance-level flag from enhance subprocess calls

doc_scraper.py and video_scraper.py were passing --enhance-level to
skill-seekers-enhance, which doesn't accept that flag. This caused
enhancement to fail silently after scraping completed.

Fixes:
- Removed --enhance-level from enhance subprocess calls
- Added --agent passthrough in doc_scraper.py
- Fixed log messages to show correct command

Tested: `skill-seekers create <url> --enhance-level 1` now chains
scrape → enhance successfully.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add --agent and --agent-cmd to create command UNIVERSAL_ARGUMENTS

The --agent flag was defined in common.py but not imported into the
create command's UNIVERSAL_ARGUMENTS, so it wasn't available when using
`skill-seekers create <source> --agent kimi`. Now all 17 source types
support the --agent flag via the create command.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update docs data_file path after moving to cache directory

The scraped_data["documentation"] stored the original output/ path for
data_file, but the directory was moved to .skillseeker-cache/ afterward.
Phase 2 conflict detection then failed with FileNotFoundError trying to
read the old path.

Now updates data_file to point to the cache location after the move.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: multi-language code signature extraction in GitHub scraper

The GitHub scraper only analyzed files matching the primary language
(by bytes). For multi-language repos like spine-runtimes (C++ primary
but C# is the target), this meant 0 C# files were analyzed.

Fix: Analyze top 3 languages with known extension mappings instead of
just the primary. Also support "language" field in config source to
explicitly target specific languages (e.g., "language": "C#").

Updated Unity configs to specify language: "C#" for focused analysis.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: per-file language detection + remove artificial analysis limits

Rewrites GitHub scraper's _extract_signatures_and_tests() to detect
language per-file from extension instead of only analyzing the primary
language. This fixes multi-language repos like spine-runtimes (C++ primary)
where C# files were never analyzed.

Changes:
- Build reverse ext→language map, detect language per-file
- Analyze ALL files with known extensions (not just primary language)
- Config "language" field works as optional filter, not a workaround
- Store per-file language + languages_analyzed in output
- Remove 50-file API mode limit (rate limiting already handles this)
- Remove 100-file default config extraction limit (now unlimited by default)
- Fix unified scraper default max_pages from 100 to 500 (matches constants.py)
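
A minimal sketch of per-file detection through a reverse ext→language map. `LANGUAGE_EXTENSIONS`, `detect_language()` and `select_files()` are hypothetical names, and the map is deliberately tiny. The optional filter plays the role of the config "language" field.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional, Tuple

# Hypothetical language -> extensions map; the scraper's real map is larger.
LANGUAGE_EXTENSIONS: Dict[str, List[str]] = {
    "Python": [".py"],
    "C#": [".cs"],
    "C++": [".cpp", ".cc", ".hpp"],
    "Go": [".go"],
}

# Reverse map built once: extension -> language.
EXT_TO_LANGUAGE: Dict[str, str] = {
    ext: lang for lang, exts in LANGUAGE_EXTENSIONS.items() for ext in exts
}


def detect_language(path: str) -> Optional[str]:
    """Detect a file's language from its extension (case-insensitive)."""
    return EXT_TO_LANGUAGE.get(PurePosixPath(path).suffix.lower())


def select_files(paths: List[str], language_filter: str = "") -> List[Tuple[str, str]]:
    """Return (path, language) for every file with a known extension.

    When language_filter is non-empty, only files detected as that
    language are kept (an optional filter, not a requirement).
    """
    selected = []
    for path in paths:
        lang = detect_language(path)
        if lang is None:
            continue  # unknown extension: skip rather than guess
        if language_filter and lang != language_filter:
            continue
        selected.append((path, lang))
    return selected


files = ["cpp/Skeleton.cpp", "csharp/Skeleton.cs", "README.md"]
selected = select_files(files)
languages_analyzed = sorted({lang for _, lang in selected})
print(languages_analyzed)  # ['C#', 'C++']
```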

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove remaining 100-file limit in config_extractor.extract_from_directory

The find_config_files default was changed to unlimited but
extract_from_directory and CLI --max-files still defaulted to 100.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: replace interactive terminal merge with automated AgentClient call

AIEnhancedMerger._launch_claude_merge() used to open a terminal window,
run a bash script, and poll for a file — requiring manual interaction.

Now uses AgentClient.call() to send the merge prompt directly and parse
the JSON response. Fully automated, no terminal needed, works with any
configured AI agent (Claude, Kimi, etc.).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add marketplace pipeline for publishing skills to Claude Code plugin repos

Connect the three-repo pipeline: configs repo → Skill Seekers engine → plugin
marketplace repos. Enables automated publishing of generated skills directly
into Claude Code plugin repositories with proper plugin.json and marketplace.json
structure.

New components:
- MarketplaceManager: Registry for plugin marketplace repos at
  ~/.skill-seekers/marketplaces.json with per-repo git tokens, branch config,
  and default author metadata
- MarketplacePublisher: Clones marketplace repo, creates plugin directory
  structure (skills/, .claude-plugin/plugin.json), updates marketplace.json,
  commits and pushes. Includes skill_name validation to prevent path traversal,
  and cleanup of partial state on git failures
- 4 MCP tools: add_marketplace, list_marketplaces, remove_marketplace,
  publish_to_marketplace — registered in FastMCP server
- Phase 6 in install workflow: automatic marketplace publishing after packaging,
  triggered by --marketplace CLI arg or marketplace_targets config field

CLI additions:
- --marketplace NAME: publish to registered marketplace after packaging
- --marketplace-category CAT: plugin category (default: development)
- --create-branch: create feature branch instead of committing to main

Security:
- Skill name regex validation (^[a-zA-Z0-9][a-zA-Z0-9._-]*$) prevents path
  traversal attacks via malicious SKILL.md frontmatter
- has_api_key variable scoping fix in install workflow summary
- try/finally cleanup of partial plugin directories on publish failure
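
A sketch of the skill-name check using the regex quoted above. `validate_skill_name` is a hypothetical helper. `fullmatch()` is used here because `$` also matches just before a trailing newline, so `re.match()` alone would accept a name ending in `\n`.

```python
import re

# Pattern quoted from the commit message.
SKILL_NAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9._-]*$")


def validate_skill_name(name: str) -> str:
    """Reject names that could escape the plugin directory.

    No "/" is allowed, and the first character cannot be "." or "-",
    so "../x", "a/b" and ".hidden" are all rejected.
    """
    if not SKILL_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid skill name: {name!r}")
    return name


for candidate in ["unity-spine", "../../etc", "a/b", ".hidden"]:
    try:
        validate_skill_name(candidate)
        print(f"{candidate!r}: ok")
    except ValueError as exc:
        print(exc)
```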

Config schema:
- Optional marketplace_targets field in config JSON for multi-marketplace
  auto-publishing: [{"marketplace": "spyke", "category": "development"}]
- Backward compatible — ignored by older versions

Tests: 58 tests (36 manager + 22 publisher, including 2 integration tests
that use the file:// git protocol to cover the full publish success path)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: thread agent selection through entire enhancement pipeline

Propagates the --agent and --agent-cmd CLI parameters through all
enhancement components so users can use any supported coding agent
(kimi, claude, copilot, codex, opencode) consistently across the
full pipeline, not just in top-level enhancement.

Agent parameter threading:
- AIEnhancer: accepts agent param, passes to AgentClient
- ConfigEnhancer: accepts agent param, passes to AgentClient
- WorkflowEngine: accepts agent param, passes to sub-enhancers
  (PatternEnhancer, TestExampleEnhancer, AIEnhancer)
- ArchitecturalPatternDetector: accepts agent param for AI enhancement
- analyze_codebase(): accepts agent/agent_cmd, forwards to
  ConfigEnhancer, ArchitecturalPatternDetector, and doc processing
- UnifiedScraper: reads agent from CLI args, forwards to doc scraper
  subprocess, C3.x analysis, and LOCAL enhancement
- CreateCommand: forwards --agent and --agent-cmd to subprocess argv
- workflow_runner: passes agent to WorkflowEngine for inline/named workflows

Timeout improvements:
- Default enhancement timeout increased from 300s (5min) to 2700s (45min)
  to accommodate large skill generation with local agents
- New get_default_timeout() in agent_client.py with env var override
  (SKILL_SEEKER_ENHANCE_TIMEOUT) supporting 'unlimited' value
- Config enhancement block supports "timeout": "unlimited" field
- Removed hardcoded timeout=300 and timeout=600 calls in config_enhancer
  and merge_sources, now using centralized default

CLI additions (unified_scraper):
- --agent AGENT: select local coding agent for enhancement
- --agent-cmd CMD: override agent command template (advanced)

Config: unity-dotween.json updated with agent=kimi, timeout=unlimited,
removed unused file_patterns

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add claude-code unified config for Claude Code CLI skill generation

Unified config combining official Claude Code documentation and source
code analysis. Covers internals, architecture, tools, commands, IDE
integrations, MCP, plugins, skills, and development workflows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add multi-agent support verification report and test artifacts

- AGENT_SUPPORT_VERIFICATION.md: verification report confirming agent
  parameter threading works across all enhancement components
- END_TO_END_EXAMPLES.md: complete workflows for all 17 source types
  with both Claude and Kimi agents
- test_agents.sh: shell script for real-world testing of agent support
  across major CLI commands with both agents
- test_realworld.md: real-world test scenarios for manual QA

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add .env to .gitignore to prevent secret exposure

The .env file containing API keys (ANTHROPIC_API_KEY, GITHUB_TOKEN, etc.)
was not in .gitignore, causing it to appear as untracked and risking
accidental commit. Added .env, .env.local, and .env.*.local patterns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: URL filtering uses base directory instead of full page URL (#331)

is_valid_url() checked url.startswith(self.base_url) where base_url
could be a full page path like ".../manual/index.html". Sibling pages
like ".../manual/LoadingAssets.html" failed the check because they
don't start with ".../index.html".

Now strips the filename to get the directory prefix:
  "https://example.com/docs/index.html" → "https://example.com/docs/"

This fixes SPA sites like Unity's DocFX docs where browser mode renders
the page but sibling links were filtered out.
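
A minimal sketch of the directory-prefix check using `urllib.parse`. `base_directory()` is a hypothetical helper, and `is_valid_url()` is shown as a free function rather than a method. Dropping the query string and fragment from the prefix is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def base_directory(url: str) -> str:
    """Strip the last path segment: .../docs/index.html -> .../docs/."""
    parts = urlsplit(url)
    path = parts.path or "/"
    directory = path[: path.rfind("/") + 1]  # keep everything up to the last "/"
    # Query and fragment are dropped (assumption: they never belong in a prefix).
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


def is_valid_url(url: str, base_url: str) -> bool:
    """Accept pages that live under the base URL's directory."""
    return url.startswith(base_directory(base_url))


base = "https://example.com/manual/index.html"
print(is_valid_url("https://example.com/manual/LoadingAssets.html", base))  # True
print(is_valid_url("https://example.com/blog/post.html", base))  # False
```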

Closes #331

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: pass language config through to GitHub scraper in unified flow

The unified scraper built github_config from source fields but didn't
include the "language" field. The GitHub scraper's per-file detection
read self.config.get("language", "") which was always empty, so it
fell back to analyzing all languages instead of the focused C# filter.

For DOTween (a C#-only repo), this resulted in 0 files analyzed: without
the language filter, the scraper analyzed the top 3 languages, but the
file tree matching failed silently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: centralize all enhancement timeouts to 45min default with unlimited support

All enhancement/AI timeouts now use get_default_timeout() from
agent_client.py instead of scattered hardcoded values (120s, 300s, 600s).

Default: 2700s (45 minutes)
Override: SKILL_SEEKER_ENHANCE_TIMEOUT env var
Unlimited: Set to "unlimited", "none", or "0"
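
The rules above could look roughly like this. The body is a sketch and not the project's `get_default_timeout()`. Returning `None` for "unlimited" and falling back to the default for malformed or negative values are assumptions.

```python
import os
from typing import Optional

DEFAULT_TIMEOUT_SECONDS = 2700  # 45 minutes, per the commit message
_UNLIMITED_VALUES = {"unlimited", "none", "0"}


def get_default_timeout() -> Optional[int]:
    """Seconds to wait for an enhancement call; None means no timeout."""
    raw = os.environ.get("SKILL_SEEKER_ENHANCE_TIMEOUT", "").strip().lower()
    if not raw:
        return DEFAULT_TIMEOUT_SECONDS
    if raw in _UNLIMITED_VALUES:
        return None
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_TIMEOUT_SECONDS  # assumption: ignore malformed values
    # Assumption: negative values are treated as malformed.
    return value if value > 0 else DEFAULT_TIMEOUT_SECONDS
```

Using `None` for "unlimited" works directly with `subprocess.run(..., timeout=None)`, which waits without a limit.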

Updated: agent_client.py, enhance_skill_local.py, arguments/enhance.py,
enhance_command.py, unified_enhancer.py, unified_scraper.py

Not changed (different purposes):
- Browser page load timeout (60s)
- API HTTP request timeout (120s)
- Doc scraper subprocess timeout (3600s)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add browser_wait_until and browser_extra_wait config for SPA docs

DocFX sites (Unity docs) render navigation via JavaScript after initial
page load. With domcontentloaded, only 1 link was found. With
networkidle + 5s extra wait, 95 content pages are discovered.

New config options for documentation sources:
- browser_wait_until: "networkidle" | "load" | "domcontentloaded"
- browser_extra_wait: milliseconds to wait after page load for lazy nav

Updated Addressables config to use networkidle + 5000ms extra wait.
Pass browser settings through unified scraper to doc scraper config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: three-layer smart discovery engine for SPA documentation sites

Replaces the browser_wait_until/browser_extra_wait config hacks with
a proper discovery engine that runs before the BFS crawl loop:

Layer 1: sitemap.xml — checks domain root for sitemap, parses <loc> tags
Layer 2: llms.txt — existing mechanism (unchanged)
Layer 3: SPA nav — renders index page with networkidle via Playwright,
         extracts all links from the fully-rendered DOM sidebar/TOC

The BFS crawl then uses domcontentloaded (fast) since all pages are
already discovered. No config hacks needed — browser mode automatically
triggers SPA discovery when only 1 page is found.
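
Layer 1 (collecting `<loc>` URLs from sitemap.xml) could be sketched as follows. `parse_sitemap_locs` is a hypothetical name. The stdlib parser keeps the example self-contained; for untrusted input, `defusedxml.ElementTree.fromstring` takes the same call.

```python
import xml.etree.ElementTree as ET  # use defusedxml.ElementTree on untrusted input
from typing import List


def parse_sitemap_locs(xml_text: str, prefix: str = "") -> List[str]:
    """Return each distinct <loc> URL in a sitemap that starts with prefix."""
    root = ET.fromstring(xml_text)
    urls: List[str] = []
    for elem in root.iter():
        # Tags usually carry the sitemap namespace, e.g.
        # "{http://www.sitemaps.org/schemas/sitemap/0.9}loc".
        if elem.tag.rsplit("}", 1)[-1] != "loc":
            continue
        url = (elem.text or "").strip()
        if url and url.startswith(prefix) and url not in urls:
            urls.append(url)
    return urls


sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/docs/a.html</loc></url>"
    "<url><loc>https://example.com/blog/b.html</loc></url>"
    "</urlset>"
)
print(parse_sitemap_locs(sitemap, "https://example.com/docs/"))
```

This sketch does not follow a `<sitemapindex>`. There, the `<loc>` entries point to child sitemaps, which would need to be fetched in turn.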

Tested: Unity Addressables DocFX site now discovers 95 pages (was 1).
Removed browser_wait_until/browser_extra_wait from Addressables config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: replace manual arg forwarding with dynamic routing in create command

The create command manually hardcoded ~60% of scraper flags in _route_*()
methods, causing ~40 flags to be silently dropped. Every new flag required
edits in 2 places (arguments/create.py + create_command.py), guaranteed
to drift.

Replaced with _build_argv() — a dynamic forwarder that iterates
vars(self.args) and forwards all explicitly-set arguments automatically,
using the same pattern as main.py::_reconstruct_argv(). This eliminates
the root cause of all flag gaps.

Changes in create_command.py (-380 lines, +175 lines = net -205):
- Added _build_argv() dynamic arg forwarder with dest→flag translation
  map for mismatched names (async_mode→--async, video_playlist→--playlist,
  skip_config→--skip-config-patterns, workflow_var→--var)
- Added _call_module() helper (dedup sys.argv swap pattern)
- Simplified all _route_*() methods from 50-70 lines to 5-10 lines each
- Deleted _add_common_args() entirely (subsumed by _build_argv)
- _route_generic() now forwards ALL args, not just universal ones
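
A minimal sketch of the dynamic forwarding idea: iterate `vars(args)`, translate mismatched dest names, and skip `None`/`False` values and untouched defaults. The translation entries are the ones quoted above. `_DEFAULTS`, `_SKIP_ARGS`, `build_argv` and the repeat-flag handling for lists are illustrative assumptions.

```python
import argparse
from typing import List

# Entries quoted from the commit message; the real map may hold more.
_DEST_TO_FLAG = {
    "async_mode": "--async",
    "video_playlist": "--playlist",
    "skip_config": "--skip-config-patterns",
}
_DEFAULTS = {"enhance_level": 2, "video_languages": "en"}  # hypothetical subset
_SKIP_ARGS = {"source"}  # hypothetical: handled positionally by the router


def build_argv(args: argparse.Namespace) -> List[str]:
    argv: List[str] = []
    for dest, value in vars(args).items():
        if dest in _SKIP_ARGS or value is None or value is False:
            continue
        if dest in _DEFAULTS and value == _DEFAULTS[dest]:
            continue  # looks like an untouched parser default
        flag = _DEST_TO_FLAG.get(dest, "--" + dest.replace("_", "-"))
        if value is True:
            argv.append(flag)  # store_true flag
        elif isinstance(value, list):
            for item in value:  # assumption: append-style flags repeat
                argv.extend([flag, str(item)])
        else:
            argv.extend([flag, str(value)])
    return argv
```

Comparing against a default value cannot tell an explicit `--enhance-level 2` apart from the default, and a later commit in this log fixes exactly that drop. One option is to give such arguments `default=argparse.SUPPRESS`, so they only appear on the namespace when the user passes them.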

New flags accessible via create command:
- --from-json: build skill from pre-extracted JSON (all source types)
- --skip-api-reference: skip API reference generation (local codebase)
- --skip-dependency-graph: skip dependency analysis (local codebase)
- --skip-config-patterns: skip config pattern extraction (local codebase)
- --no-comments: skip comment extraction (local codebase)
- --depth: analysis depth control (local codebase, deprecated)
- --setup: auto-detect GPU/install video deps (video)

Bug fix in unified_scraper.py:
- Fixed C3.x pattern data loss: unified_scraper read patterns/detected_patterns.json
  but codebase_scraper writes patterns/all_patterns.json. Changed both read
  locations (line 828 for local sources, line 1597 for GitHub C3.x) to use
  the correct filename. This was causing 100% loss of design pattern data
  (e.g., 905 patterns detected but 0 included in final skill).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address 5 code review issues in marketplace and package pipeline

Fixes found by automated code review of the marketplace feature and
package command:

1. --marketplace flag silently ignored in package_skill.py CLI
   Added MarketplacePublisher invocation after successful packaging
   when --marketplace is provided. Previously the flag was parsed
   but never acted on.

2. Missing 7 platform choices in --target (package.py)
   Added minimax, opencode, deepseek, qwen, openrouter, together,
   fireworks to the argparse choices list. These platforms have
   registered adaptors but were rejected by the argument parser.

3. is_update always True for new marketplace registrations
   Two separate datetime.now() calls produced different microsecond
   timestamps, making added_at != updated_at always. Fixed by
   assigning a single timestamp to both fields.

4. Shallow clone (depth=1) caused push failures for marketplace repos
   MarketplacePublisher now does full clones instead of using
   GitConfigRepo's shallow clone (which is designed for read-only
   config fetching). Full clone is required for commit+push workflow.

5. Partial plugin dir not cleaned on force=True failure
   Removed the `and not force` guard from cleanup logic — if an
   operation fails midway, the partial directory should be cleaned
   regardless of whether force was set.
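
The fix for item 3 amounts to calling `datetime.now()` once. The entry fields other than `added_at`/`updated_at`, and the helper names, are hypothetical.

```python
from datetime import datetime


def new_marketplace_entry(name: str, repo_url: str) -> dict:
    """Build a registry entry (hypothetical field set) with one shared timestamp."""
    now = datetime.now().isoformat()  # called once, reused for both fields
    return {"name": name, "repo_url": repo_url, "added_at": now, "updated_at": now}


def is_update(entry: dict) -> bool:
    return entry["added_at"] != entry["updated_at"]


entry = new_marketplace_entry("spyke", "https://example.com/spyke/plugins.git")
print(is_update(entry))  # False
```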

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address dynamic routing edge cases in create_command

Fixes from code review of the _build_argv() refactor:

1. Non-None defaults forwarded unconditionally — added enhance_level=2,
   doc_version="", video_languages="en", whisper_model="base",
   platform="slack", visual_interval=0.7, visual_min_gap=0.5,
   visual_similarity=3.0 to the defaults dict so they're only forwarded
   when the user explicitly overrides them. This fixes video sources
   incorrectly getting --enhance-level 2 (video default is 0).

2. video_url dest not translated — added "video_url": "--url" to
   _DEST_TO_FLAG so create correctly forwards --video-url as --url
   to video_scraper.py.

3. Video positional args double-forwarded — added video_url,
   video_playlist, video_file to _SKIP_ARGS since _route_video()
   already handles them via positional args from source detection.

4. Removed dead workflow_var entry from _DEST_TO_FLAG — the create
   parser uses key "var" not "workflow_var", so the translation
   was never triggered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve 15 broken tests and --from-json crash bug in create command

Fixes found by Kimi code review of the dynamic routing refactor:

1. 3 test_create_arguments.py failures — UNIVERSAL_ARGUMENTS count
   changed from 19 to 21 (added agent, agent_cmd). Updated expected
   count and name set. Moved from_json out of UNIVERSAL to
   ADVANCED_ARGUMENTS since not all scrapers support it.

2. 12 test_create_integration_basic.py failures — tests called
   _add_common_args() which was deleted in the refactor. Rewrote
   _collect_argv() to use _build_argv() via CreateCommand with
   SourceDetector. Updated _make_args defaults to match new
   parameter set.

3. --from-json crash bug — was in UNIVERSAL_ARGUMENTS so create
   accepted it for all source types, but web/github/local scrapers
   don't support it. Forwarding it caused argparse "unrecognized
   arguments" errors. Moved to ADVANCED_ARGUMENTS with documentation
   listing which source types support it.

4. Additional _is_explicitly_set defaults — added enhance_level=2,
   doc_version="", video_languages="en", whisper_model="base",
   platform="slack", visual_interval/min_gap/similarity defaults to
   prevent unconditional forwarding of parser defaults.

5. Video arg handling — added video_url to _DEST_TO_FLAG translation
   map, added video_url/video_playlist/video_file to _SKIP_ARGS
   (handled as positionals by _route_video).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: C3.x analysis data loss — read from references/ after _generate_references() cleanup

Root cause: _generate_references() in codebase_scraper.py copies analysis
directories (patterns/, test_examples/, config_patterns/, architecture/,
dependencies/, api_reference/) into references/ then DELETES the originals
to avoid duplication (Issue #279). But unified_scraper.py reads from the
original paths after analyze_codebase() returns — by which time the
originals are gone.

This caused 100% data loss for all 6 C3.x data types (design patterns,
test examples, config patterns, architecture, dependencies, API reference)
in the unified scraper pipeline. The data was correctly detected (e.g.,
905 patterns in 510 files) but never made it into the final skill.

Fix: Added _load_json_fallback() method that checks references/{subdir}/
first (where _generate_references moves the data), falling back to the
original path. Applied to both GitHub C3.x analysis (line ~1599) and
local source analysis (line ~828).
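
The fallback lookup could be sketched like this. The signature and the choice to return `None` when neither file exists are assumptions, not the actual `_load_json_fallback()`.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_json_fallback(output_dir: Path, subdir: str, filename: str) -> Optional[Any]:
    """Load subdir/filename, preferring the copy under references/."""
    candidates = [
        output_dir / "references" / subdir / filename,  # where the data is moved
        output_dir / subdir / filename,  # original location, if still present
    ]
    for path in candidates:
        if path.is_file():
            with path.open(encoding="utf-8") as fh:
                return json.load(fh)
    return None  # assumption: caller treats missing data as "no results"
```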

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add allowlist to _build_argv for config route to unified_scraper

_build_argv() was forwarding all CLI args (--name, --doc-version, etc.)
to unified_scraper which doesn't accept them. Added allowlist parameter
to _build_argv() — when provided, ONLY args in the allowlist are forwarded.

The config route now uses _UNIFIED_SCRAPER_ARGS allowlist with the exact
set of flags unified_scraper accepts.
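
A sketch of the allowlist behaviour. When an allowlist is given, every other dest is dropped before flags are built. The contents of `UNIFIED_SCRAPER_ARGS` here are hypothetical and not the real `_UNIFIED_SCRAPER_ARGS`.

```python
import argparse
from typing import FrozenSet, List, Optional

# Hypothetical allowlist, for illustration only.
UNIFIED_SCRAPER_ARGS = frozenset({"config", "agent", "agent_cmd", "dry_run"})


def build_argv(args: argparse.Namespace, allowlist: Optional[FrozenSet[str]] = None) -> List[str]:
    argv: List[str] = []
    for dest, value in vars(args).items():
        if allowlist is not None and dest not in allowlist:
            continue  # the target script does not accept this flag
        if value is None or value is False:
            continue
        flag = "--" + dest.replace("_", "-")
        if value is True:
            argv.append(flag)
        else:
            argv.extend([flag, str(value)])
    return argv
```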

This is a targeted patch — the proper fix is the ExecutionContext singleton
refactor planned separately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add force=True to marketplace publish from package CLI

The package command's --marketplace flag didn't pass force=True to
MarketplacePublisher, so re-publishing an existing skill would fail
with "already exists" error instead of overwriting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add push_config tool for publishing configs to registered source repos

New ConfigPublisher class that validates configs, places them in the
correct category directory, commits, and pushes to registered source
repositories. Follows the MarketplacePublisher pattern.

Features:
- Auto-detect category from config name/description
- Validate via ConfigValidator + repo's validate-config.py
- Support feature branch or direct push
- Force overwrite existing configs
- MCP tool: push_config(config_path, source_name, category)

Usage:
  push_config(config_path="configs/unity-spine.json", source_name="spyke")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: security hardening, error handling, tests, and cleanup

Security:
- Remove command injection via cloned repo script execution (config_publisher)
- Replace git add -A with targeted staging (marketplace_publisher)
- Clear auth tokens from cached .git/config after clone
- Use defusedxml for sitemap XML parsing (XXE protection)
- Add path traversal validation for config names

Error handling:
- AgentClient: specific exception handling for rate limit, auth, connection errors
- AgentClient: log subprocess stderr on non-zero exit, raise on explicit API mode failure
- config_publisher: only catch ValueError for validation warnings

Logic bugs:
- Fix _build_argv silently dropping --enhance-level 2 (matched default)
- Fix URL filtering over-broadening (strip to parent instead of adding /)
- Log warning when _call_module returns None exit code

Tests (134 new):
- test_agent_client.py: 71 tests for normalize, detect, init, timeout, model
- test_config_publisher.py: 23 tests for detect_category, publish, errors
- test_create_integration_basic.py: 20 tests for _build_argv routing
- Fix 11 pre-existing failures (guide_enhancer, doctor, install_skill, marketplace)

Cleanup:
- Remove 5 dev artifact files (-1405 lines)
- Rename _launch_claude_merge to _launch_ai_merge

All 3194 tests pass, 39 expected skips.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: pin ruff==0.15.8 in CI and reformat packaging_tools.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add missing pytest install to vector DB adaptor test jobs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: reformat 7 files for ruff 0.15.8 and fix vector DB test path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove test-week2-integration job referencing missing script

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update e2e test to accept dynamic platform name in upload phase

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: YusufKaraaslanSpyke <yusuf@spykegames.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 04:50:15 +03:00
yusyus
0265de5816 style: Format all Python files with ruff
- Formatted 103 files to comply with ruff format requirements
- No code logic changes, only formatting/whitespace
- Fixes CI formatting check failures
2026-02-08 14:42:27 +03:00
yusyus
fda3712367 feat: Extend framework detection to 5 languages (JavaScript, Java, Ruby, PHP, C#)
## Summary
Framework detection now works for **6 languages** (up from 1):
- Python (original)
- JavaScript/TypeScript (new)
- Java (new)
- Ruby (new)
- PHP (new)
- C# (new)

## Changes

### 1. JavaScript/TypeScript Import Extraction (code_analyzer.py:361-386)
Detects:
- ES6 imports: `import React from 'react'`
- Side-effect imports: `import 'style.css'`
- CommonJS: `const foo = require('bar')`

Extracts package names: `react`, `vue`, `angular`, `express`, `axios`, etc.
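The following is a minimal regex-based sketch of the three JavaScript forms listed above. It does not reproduce the code at `code_analyzer.py:361-386`, which this log does not show. The function names are hypothetical. Reducing a specifier to its first path segment (or `@scope/name` for scoped packages) and skipping relative paths are assumptions.

```python
import re
from typing import Optional

# ES6 "import X from 'pkg'", "import { a } from 'pkg'" and side-effect "import 'pkg'"
_ES6_IMPORT = re.compile(r"""^\s*import\s+(?:[\w*{}\s,]+\s+from\s+)?['"]([^'"]+)['"]""", re.MULTILINE)
# CommonJS "require('pkg')"
_REQUIRE = re.compile(r"""require\(\s*['"]([^'"]+)['"]\s*\)""")


def package_name(specifier: str) -> Optional[str]:
    """Reduce a module specifier to a package name; relative paths are skipped."""
    if specifier.startswith((".", "/")):
        return None
    parts = specifier.split("/")
    if specifier.startswith("@") and len(parts) >= 2:
        return "/".join(parts[:2])  # scoped package, e.g. "@angular/core"
    return parts[0]


def extract_js_imports(source: str) -> list[str]:
    found = _ES6_IMPORT.findall(source) + _REQUIRE.findall(source)
    return sorted({name for name in map(package_name, found) if name})
```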

### 2. Java Import Extraction (code_analyzer.py:1093-1110)
Detects:
- Package imports: `import org.springframework.boot.*;`
- Static imports: `import static com.example.Util.*;`

Extracts base packages: `org.springframework`, `com.google`, etc.
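The Java rule can be sketched the same way. The code assumes that a "base package" means the first two dotted segments. The log only gives the examples `org.springframework` and `com.google`, so the `depth` parameter and the function name are hypothetical.

```python
import re

# "import a.b.C;", "import a.b.*;" and "import static a.b.C.*;"
_JAVA_IMPORT = re.compile(r"^\s*import\s+(?:static\s+)?([\w.]+)(?:\.\*)?\s*;", re.MULTILINE)


def extract_java_base_packages(source: str, depth: int = 2) -> list[str]:
    # Keep only the leading `depth` segments, e.g. org.springframework.boot -> org.springframework
    return sorted({".".join(name.split(".")[:depth]) for name in _JAVA_IMPORT.findall(source)})
```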

### 3. Ruby Import Extraction (code_analyzer.py:1245-1258)
Detects:
- Require: `require 'rails'`
- Require relative: `require_relative 'config'`

Extracts gem names: `rails`, `sinatra`, etc.
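Below is a hedged sketch of the Ruby case. It assumes that `require_relative` targets are local files, so they are skipped as gems. It also assumes that a nested path such as `sinatra/base` maps to its first segment. Neither detail is confirmed by the log, and `extract_ruby_gems` is a hypothetical name.

```python
import re

# "require 'gem'", "require('gem')" and "require_relative 'file'"
_RUBY_REQUIRE = re.compile(r"""^\s*(require_relative|require)\s*\(?\s*['"]([^'"]+)['"]""", re.MULTILINE)


def extract_ruby_gems(source: str) -> list[str]:
    gems = set()
    for kind, target in _RUBY_REQUIRE.findall(source):
        if kind == "require_relative":
            continue  # assumed to be a project file, not a gem
        gems.add(target.split("/")[0])  # "sinatra/base" -> "sinatra"
    return sorted(gems)
```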

### 4. PHP Import Extraction (code_analyzer.py:1368-1381)
Detects:
- Namespace use: `use Laravel\Framework\App;`
- Aliased use: `use Foo\Bar as Baz;`

Extracts vendor names: `laravel`, `symfony`, etc.
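For PHP, here is a minimal sketch that lowercases the first namespace segment as the vendor name. The lowercasing is an assumption, chosen to match the lowercase `laravel`/`illuminate` markers listed further down, and the function name is hypothetical. The pattern also matches trait imports such as `use HasFactory;` inside a class body, which a real extractor may need to filter out.

```python
import re

# "use Vendor\Package\Class;" and "use Vendor\Package\Class as Alias;"
_PHP_USE = re.compile(r"^\s*use\s+\\?([A-Za-z_][\w\\]*)(?:\s+as\s+\w+)?\s*;", re.MULTILINE)


def extract_php_vendors(source: str) -> list[str]:
    # "Laravel\Framework\App" -> "laravel"
    return sorted({name.split("\\")[0].lower() for name in _PHP_USE.findall(source)})
```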

### 5. C# Import Extraction (code_analyzer.py:677-696)
Detects:
- Using directives: `using System.Collections.Generic;`
- Static using: `using static System.Math;`

Extracts namespaces: `System.Collections`, `Microsoft.AspNetCore`, etc.
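The C# rule, sketched under the same two-segment assumption used for Java, looks like this. Alias directives (`using Json = Newtonsoft.Json;`) and `using` statements are deliberately not matched here. The real extractor at `code_analyzer.py:677-696` may treat them differently, and `extract_cs_namespaces` is a hypothetical name.

```python
import re

# "using System.Collections.Generic;" and "using static System.Math;"
_CS_USING = re.compile(r"^\s*using\s+(?:static\s+)?([A-Za-z_][\w.]*)\s*;", re.MULTILINE)


def extract_cs_namespaces(source: str, depth: int = 2) -> list[str]:
    # Keep the leading `depth` segments, e.g. Microsoft.AspNetCore.Mvc -> Microsoft.AspNetCore
    return sorted({".".join(ns.split(".")[:depth]) for ns in _CS_USING.findall(source)})
```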

### 6. Enhanced Framework Markers (architectural_pattern_detector.py:104-111)
Added import-based markers for better detection:
- **Spring**: Added `org.springframework`
- **ASP.NET**: Added `Microsoft.AspNetCore`, `System.Web`
- **Rails**: Added `action` (for ActionController, ActionMailer)
- **Angular**: Added `@angular`, `angular`
- **Laravel**: Added `illuminate`, `laravel`
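One way such import markers could be checked is sketched below. `IMPORT_MARKERS` is a hypothetical subset built from the additions listed above, plus an assumed pre-existing `rails` marker. Prefix matching is an assumption: it lets `action` cover `action_controller` and `actionmailer`.

```python
# Hypothetical subset of import-based markers; prefix matching is an assumption.
IMPORT_MARKERS = {
    "Spring": ["org.springframework"],
    "ASP.NET": ["Microsoft.AspNetCore", "System.Web"],
    "Rails": ["rails", "action"],
    "Angular": ["@angular", "angular"],
    "Laravel": ["illuminate", "laravel"],
}


def frameworks_from_imports(imports: list[str]) -> list[str]:
    detected = []
    for framework, markers in IMPORT_MARKERS.items():
        # A framework counts as detected if any collected import starts with one of its markers.
        if any(imp.startswith(marker) for imp in imports for marker in markers):
            detected.append(framework)
    return detected
```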

### 7. Multi-Language Support (architectural_pattern_detector.py:202-210)
Framework detector now:
- Collects imports from **all languages** (not just Python)
- Logs: "Collected N imports from M files"
- Detects frameworks across polyglot projects
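Collecting imports across languages can be sketched as a pass over the per-file analysis results. It assumes each result is a dict with an optional `imports` list, as the #239 fix describes for Python. The function name is hypothetical, and so is the decision to count unique imports in the log message.

```python
import logging

logger = logging.getLogger(__name__)


def collect_imports(files_analysis: list[dict]) -> list[str]:
    """Gather imports from every analyzed file, whatever its language."""
    imports: set[str] = set()
    files_with_imports = 0
    for file_info in files_analysis:
        file_imports = file_info.get("imports") or []
        if file_imports:
            files_with_imports += 1
            imports.update(file_imports)
    logger.debug("Collected %d imports from %d files", len(imports), files_with_imports)
    return sorted(imports)
```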

## Test Results

**Multi-language test project:**
```
react_app/App.jsx            → React detected
spring_app/Application.java  → Spring detected
rails_app/controller.rb      → Rails detected
```

**Output:**
```json
{
  "frameworks_detected": ["Spring", "Rails", "React"]
}
```

**All tests passing:**
- 95 tests (38 + 54 + 3)
- No breaking changes
- Backward compatible

## Impact

### What This Enables

1. **Polyglot project support** - Detect multiple frameworks in monorepos
2. **Better accuracy** - Import-based detection is more reliable than path-based
3. **Technology Stack insights** - ARCHITECTURE.md now shows all frameworks used
4. **Multi-platform coverage** - Works for web, mobile, backend, enterprise

### Supported Frameworks by Language

**JavaScript/TypeScript:**
- React, Vue.js, Angular (frontend)
- Express, Nest.js (backend)

**Java:**
- Spring Framework (Spring Boot, Spring MVC, etc.)

**Ruby:**
- Ruby on Rails

**PHP:**
- Laravel

**C#:**
- ASP.NET (Core, MVC, Web API)

**Python:**
- Django, Flask

### Example Use Cases

**Full-stack project:**
```
frontend/ (React)     → React detected
backend/ (Spring)     → Spring detected
Result: ["React", "Spring"]
```

**Microservices:**
```
api-gateway/ (Express)  → Express detected
auth-service/ (Spring)  → Spring detected
user-service/ (Rails)   → Rails detected
Result: ["Express", "Spring", "Rails"]
```

## Future Extensions

Ready to add:
- Go: `import "github.com/gin-gonic/gin"`
- Rust: `use actix_web::*;`
- Swift: `import SwiftUI`
- Kotlin: `import kotlinx.coroutines.*`

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-05 22:08:37 +03:00
yusyus
a565b87a90 fix: Framework detection now works by including import-only files (fixes #239)
## Problem
Framework detection was broken because files with only imports (no
classes/functions) were excluded from analysis. The architectural pattern
detector received empty file lists, resulting in 0 frameworks detected.

## Root Cause
In codebase_scraper.py:873-881, the has_content check filtered out files
that didn't have classes, functions, or other structural elements. This
excluded simple __init__.py files that only contained import statements,
which are critical for framework detection.

## Solution (3 parts)

1. **Extract imports from Python files** (code_analyzer.py:140-178)
   - Added import extraction using AST (ast.Import, ast.ImportFrom)
   - Returns imports list in analysis results
   - Now captures: "from flask import Flask" → ["flask"]

2. **Include import-only files** (codebase_scraper.py:873-881)
   - Updated has_content check to include files with imports
   - Files with imports are now included in analysis results
   - Comment added: "IMPORTANT: Include files with imports for framework
     detection (fixes #239)"

3. **Enhance framework detection** (architectural_pattern_detector.py:195-240)
   - Extract imports from all Python files in analysis
   - Check imports in addition to file paths and directory structure
   - Prioritize import-based detection (high confidence)
   - Require 2+ matches for path-based detection (avoid false positives)
   - Added debug logging: "Collected N imports for framework detection"
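The three parts of the fix can be sketched together as follows:

- `extract_python_imports` uses the standard-library `ast` module with `ast.Import` and `ast.ImportFrom`, as the commit describes.
- `has_content` keeps import-only files.
- `detect_framework` trusts import evidence on its own but needs two path hits.

`has_content` and `detect_framework` are simplified stand-ins with hypothetical signatures. Reading "2+ matches" as two distinct path markers is an assumption.

```python
import ast


def extract_python_imports(source: str) -> list[str]:
    """Top-level module names imported by a Python file ("from flask import Flask" -> "flask")."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])  # relative imports (level > 0) are skipped
    return sorted(names)


def has_content(analysis: dict) -> bool:
    # Import-only files such as a package __init__.py are kept for framework detection.
    return bool(analysis.get("classes") or analysis.get("functions") or analysis.get("imports"))


def detect_framework(imports: list[str], paths: list[str],
                     import_markers: list[str], path_markers: list[str]) -> bool:
    # Import evidence is high confidence on its own.
    if any(imp in import_markers for imp in imports):
        return True
    # Path evidence alone needs at least two distinct marker hits.
    hits = {marker for marker in path_markers for path in paths if marker in path}
    return len(hits) >= 2
```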

## Results

**Before fix:**
- Test Flask project: 0 files analyzed, 0 frameworks detected
- Files with imports: excluded from analysis
- Framework detection: completely broken

**After fix:**
- Test Flask project: 3 files analyzed, Flask detected 
- Files with imports: included in analysis
- Framework detection: working correctly
- No false positives (ASP.NET, Rails, etc.)

## Testing

Added comprehensive test suite (tests/test_framework_detection.py):
- test_flask_framework_detection_from_imports
- test_files_with_imports_are_included
- test_no_false_positive_frameworks

All existing tests pass:
- 38 tests in test_codebase_scraper.py
- 54 tests in test_code_analyzer.py
- 3 new tests in test_framework_detection.py

## Impact

- Fixes issue #239 completely
- Framework detection now works for Python projects
- Import-only files (common in Python packages) are properly analyzed
- No performance impact (import extraction is fast)
- No breaking changes to existing functionality

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-05 22:02:06 +03:00
yusyus
4e8ad835ed style: Format code with ruff formatter
- Auto-format 11 files to comply with ruff formatting standards
- Fixes CI/CD formatter check failures

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 21:37:54 +03:00
yusyus
50b28fe561 fix: Framework detection, circular deps, and GDScript test discovery
FIXES:

1. Framework Detection (Unity → Godot)
   PROBLEM: Detected Unity instead of Godot due to generic "Assets" marker
   - "Assets" appears in comments: "// TODO: Replace with actual music assets"
   - Triggered false positive for Unity framework

   SOLUTION: Made Unity markers more specific
   - Before: "Assets", "ProjectSettings" (too generic)
   - After: "Assembly-CSharp.csproj", "UnityEngine.dll", "Library/" (specific)
   - Godot markers: "project.godot", ".godot", ".tscn", ".tres", ".gd"

   FILE: architectural_pattern_detector.py lines 92-94

2. Circular Dependencies (Self-References)
   PROBLEM: Files showing circular dependency to themselves
   - WARNING: Cycle: analysis-config.gd -> analysis-config.gd
   - 3 self-referential cycles detected

   ROOT CAUSE: No self-loop filtering in build_graph()
   - File resolves class_name to itself
   - Edge created from file to same file

   SOLUTION: Skip self-dependencies in build_graph()
   - Added check: `target != file_path`
   - Prevents file from depending on itself

   FILE: dependency_analyzer.py line 728

3. GDScript Test File Detection
   PROBLEM: Found 0 test files (expected 20 GUT test files containing 396 tests)
   - TEST_PATTERNS missing GDScript patterns
   - Only had: test_*.py, *_test.go, Test*.java, etc.

   SOLUTION: Added GDScript test patterns
   - Added: "test_*.gd", "*_test.gd" (GUT, gdUnit4, WAT)
   - Added ".gd": "GDScript" to LANGUAGE_MAP

   FILES:
   - test_example_extractor.py lines 886-887
   - test_example_extractor.py line 901
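Fixes 2 and 3 can be sketched in a few lines. `build_dependency_map` is a hypothetical, simplified stand-in for `build_graph()`: it shows only the `target != file_path` check. `EXAMPLE_TEST_PATTERNS` is an illustrative subset of `TEST_PATTERNS`, matched here with the standard-library `fnmatch`, which is an assumption about how the patterns are applied.

```python
from fnmatch import fnmatch


def build_dependency_map(resolved_deps: dict[str, list[str]]) -> dict[str, set[str]]:
    """File -> dependencies, skipping edges from a file to itself."""
    return {
        file_path: {target for target in targets if target != file_path}
        for file_path, targets in resolved_deps.items()
    }


# Illustrative subset, including the new GDScript patterns (GUT, gdUnit4, WAT).
EXAMPLE_TEST_PATTERNS = ["test_*.py", "*_test.go", "Test*.java", "test_*.gd", "*_test.gd"]


def is_test_file(filename: str) -> bool:
    return any(fnmatch(filename, pattern) for pattern in EXAMPLE_TEST_PATTERNS)
```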

IMPACT:
- Godot projects correctly detected as "Godot" (not Unity)
- No more false circular dependency warnings
- GUT/gdUnit4/WAT test files now discovered and analyzed
- Better test example extraction for Godot projects

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-02 22:11:38 +03:00
yusyus
6fe3e48b8a fix: Framework detection now checks directory structure for game engines
**Problem:**
Framework detection only checked analyzed source files, missing game
engine marker files like project.godot, .unity, .uproject (config files).

**Root Cause:**
_detect_frameworks() only scanned files_analysis list which contains
source code (.cs, .py, .js) but not config files.

**Solution:**
- Now scans actual directory structure using directory.iterdir()
- Checks BOTH analyzed files AND directory contents
- Game engines checked FIRST with priority (prevents false positives)
- Returns early if game engine found (avoids Unity→ASP.NET confusion)
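Here is a minimal sketch of the engine-first scan. It uses `Path.iterdir()` as the commit describes. Both `ENGINE_MARKERS` and `WEB_MARKERS` are hypothetical subsets. `WEB_MARKERS` is only a toy stand-in for the regular framework checks, so that the early return has something to short-circuit.

```python
from pathlib import Path

# Hypothetical marker subsets for illustration.
ENGINE_MARKERS = {
    "Godot": ["project.godot"],
    "Unity": ["Assembly-CSharp.csproj"],
    "Unreal": [".uproject"],
}
WEB_MARKERS = {"ASP.NET": ["Controllers/", "Startup.cs"]}  # toy stand-in for regular checks


def detect_frameworks(directory: Path, analyzed_files: list[str]) -> list[str]:
    top_level = [entry.name for entry in directory.iterdir()]
    # Game engines are checked first; a hit returns early so that, for example,
    # a Unity project's C# controllers are not also reported as ASP.NET.
    for engine, markers in ENGINE_MARKERS.items():
        if any(name.endswith(marker) for name in top_level for marker in markers):
            return [engine]
    return [fw for fw, markers in WEB_MARKERS.items()
            if any(marker in path for path in analyzed_files for marker in markers)]
```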

**Test Results:**
Before: frameworks_detected: []
After:  frameworks_detected: ["Godot"] 

Tested with: Cosmic Ideler (Godot 4.6 RC2 project)
- Correctly detects project.godot file
- No longer requires source code to have "godot" in paths

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-02 21:20:17 +03:00
yusyus
32e080da1f feat: Complete Unity/game engine support and local source type validation
Completes the implementation for Unity/Unreal/Godot game engine support
and adds missing "local" source type validation.

Changes:
- Add "local" to VALID_SOURCE_TYPES in config_validator.py
- Add _validate_local_source() method with full validation
- Add Unity/Unreal/Godot to FRAMEWORK_MARKERS for priority detection
- Add game engine directory exclusions to all 3 scrapers:
  * Unity: Library/, Temp/, Logs/, UserSettings/, etc.
  * Unreal: Intermediate/, Saved/, DerivedDataCache/
  * Godot: .godot/, .import/
- Prevents scanning massive build cache directories (saves GBs + hours)
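The exclusions can be sketched with `os.walk`. Pruning `dirnames` in place stops the walk from ever descending into a cache directory, which is where the time savings come from. `EXCLUDED_DIRS` is a hypothetical subset of the directories listed above, and `iter_source_files` is a hypothetical name. The sketch matches names at any depth, which could also skip an unrelated folder named `Library`. A real scraper might restrict the check to the project root.

```python
import os

# Hypothetical subset of the engine cache directories named above.
EXCLUDED_DIRS = {
    "Library", "Temp", "Logs", "UserSettings",    # Unity
    "Intermediate", "Saved", "DerivedDataCache",  # Unreal
    ".godot", ".import",                          # Godot
}


def iter_source_files(root: str):
    for dirpath, dirnames, filenames in os.walk(root):
        # In-place pruning: os.walk will not descend into the removed directories.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            yield os.path.join(dirpath, name)
```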

This completes all features mentioned in PR #278:
- Unity/Unreal/Godot framework detection with priority
- Pattern enhancement performance fix (grouped approach)
- Game engine directory exclusions
- Phase 5 SKILL.md AI enhancement
- Local source references copying
- "local" source type validation
- Config field name compatibility
- C# test example extraction

Tested:
- All unified config tests pass (18/18)
- All config validation tests pass (28/28)
- Ready for Unity project testing

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-02 21:06:01 +03:00
yusyus
596b219599 fix: Resolve remaining 188 linting errors (249 total fixed)
Second batch of comprehensive linting fixes:

Unused Arguments/Variables (136 errors):
- ARG002/ARG001 (91 errors): Prefixed unused method/function arguments with '_'
  - Interface methods in adaptors (base.py, gemini.py, markdown.py)
  - AST analyzer methods maintaining signatures (code_analyzer.py)
  - Test fixtures and hooks (conftest.py)
  - Added noqa: ARG001/ARG002 for pytest hooks requiring exact names
- F841 (45 errors): Prefixed unused local variables with '_'
  - Tuple unpacking where some values aren't needed
  - Variables assigned but not referenced

Loop & Boolean Quality (28 errors):
- B007 (18 errors): Prefixed unused loop control variables with '_'
  - enumerate() loops where index not used
  - for-in loops where loop variable not referenced
- E712 (10 errors): Simplified boolean comparisons
  - Changed '== True' to direct boolean check
  - Changed '== False' to 'not' expression
  - Improved test readability
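The rule fixes above follow the ruff codes listed. The functions below are hypothetical, written only to show each pattern after the fix, with the pre-fix form in comments. `if flag:` is not exactly equivalent to `flag == True` for non-boolean values. Where the distinction matters, `is True` is the stricter replacement.

```python
def count_enabled(flags: list[bool]) -> int:
    total = 0
    # B007: unused loop variable prefixed with "_" (before: "for i, flag in enumerate(flags):").
    for _index, flag in enumerate(flags):
        # E712: direct boolean check (before: "if flag == True:").
        if flag:
            total += 1
    return total


def first_and_last(triple: tuple[str, str, str]) -> tuple[str, str]:
    # F841: unpacked value that is never read gets a leading underscore.
    first, _middle, last = triple
    return first, last
```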

Code Quality (24 errors):
- SIM201 (4 errors): Already fixed in previous commit
- SIM118 (2 errors): Already fixed in previous commit
- E741 (4 errors): Already fixed in previous commit
- Config manager loop variable fix (1 error)

All Tests Passing:
- test_scraper_features.py: 42 passed
- test_integration.py: 51 passed
- test_architecture_scenarios.py: 11 passed
- test_real_world_fastmcp.py: 19 passed, 1 skipped

Note: Some SIM errors (nested if, multiple with) remain unfixed as they
would require non-trivial refactoring. Focus was on functional correctness.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-17 23:02:11 +03:00
Pablo Estevez
c33c6f9073 change max length 2026-01-17 17:48:15 +00:00
Pablo Estevez
5ed767ff9a run ruff 2026-01-17 17:29:21 +00:00
yusyus
73758182ac feat: C3.6 AI Enhancement + C3.7 Architectural Pattern Detection
Implemented two major features to enhance codebase analysis with intelligent,
automatic AI integration and architectural understanding.

## C3.6: AI Enhancement (Automatic & Smart)

Enhances C3.1 (Pattern Detection) and C3.2 (Test Examples) with AI-powered
insights using Claude API - works automatically when API key is available.

**Pattern Enhancement:**
- Explains WHY each pattern was detected (evidence-based reasoning)
- Suggests improvements and identifies potential issues
- Recommends related patterns
- Adjusts confidence scores based on AI analysis

**Test Example Enhancement:**
- Adds educational context to each example
- Groups examples into tutorial categories
- Identifies best practices demonstrated
- Highlights common mistakes to avoid

**Smart Auto-Activation:**
- ZERO configuration - just set ANTHROPIC_API_KEY environment variable
- NO special flags needed - works automatically
- Graceful degradation - works offline without API key
- Batch processing (5 items/call) minimizes API costs
- Self-disabling if API unavailable or key missing
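The auto-activation behaviour can be sketched as a small class. It is enabled only when `ANTHROPIC_API_KEY` is set, sends items in batches of 5, and falls back to unenhanced items if a call fails. `AutoEnhancer` is a hypothetical name, not the real `AIEnhancer`. `call_api` is an injected callable standing in for the Claude request, so no SDK call is shown. With 12 items the sketch makes 3 calls (5 + 5 + 2).

```python
import os
from typing import Callable, Sequence


class AutoEnhancer:
    """Hypothetical sketch: batch enhancement that activates only when an API key exists."""

    def __init__(self, call_api: Callable[[Sequence[dict]], list], batch_size: int = 5):
        # Auto-activation: no flag, just the presence of the environment variable.
        self.enabled = bool(os.environ.get("ANTHROPIC_API_KEY"))
        self.call_api = call_api  # stand-in for the real API request
        self.batch_size = batch_size

    def enhance(self, items: list) -> list:
        enhanced: list = []
        for start in range(0, len(items), self.batch_size):
            chunk = items[start:start + self.batch_size]
            if not self.enabled:
                enhanced.extend(chunk)  # graceful degradation: keep items unenhanced
                continue
            try:
                enhanced.extend(self.call_api(chunk))  # one call per batch
            except Exception:
                self.enabled = False  # self-disable after the first failure
                enhanced.extend(chunk)
        return enhanced
```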

**Implementation:**
- NEW: src/skill_seekers/cli/ai_enhancer.py
  - PatternEnhancer: Enhances detected design patterns
  - TestExampleEnhancer: Enhances test examples with context
  - AIEnhancer base class with auto-detection
- Modified: pattern_recognizer.py (enhance_with_ai=True by default)
- Modified: test_example_extractor.py (enhance_with_ai=True by default)
- Modified: codebase_scraper.py (always passes enhance_with_ai=True)

## C3.7: Architectural Pattern Detection

Detects high-level architectural patterns by analyzing multi-file relationships,
directory structures, and framework conventions.

**Detected Patterns (8):**
1. MVC (Model-View-Controller)
2. MVVM (Model-View-ViewModel)
3. MVP (Model-View-Presenter)
4. Repository Pattern
5. Service Layer Pattern
6. Layered Architecture (3-tier, N-tier)
7. Clean Architecture
8. Hexagonal/Ports & Adapters

**Framework Detection (10+):**
- Backend: Django, Flask, Spring, ASP.NET, Rails, Laravel, Express
- Frontend: Angular, React, Vue.js

**Features:**
- Multi-file analysis (analyzes entire codebase structure)
- Directory structure pattern matching
- Evidence-based detection with confidence scoring
- AI-enhanced architectural insights (integrates with C3.6)
- Always enabled (provides valuable high-level overview)
- Output: output/codebase/architecture/architectural_patterns.json
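The following is a hedged sketch of directory-based MVC detection with evidence and a confidence score, shaped like the example output further below. The weights, the 0.5 threshold, the framework bonus and the list of MVC frameworks are all illustrative assumptions, not values from `architectural_pattern_detector.py`.

```python
from pathlib import PurePosixPath
from typing import Optional

# Illustrative weights and threshold, not the detector's real values.
MVC_DIR_WEIGHTS = {"models": 0.25, "views": 0.25, "controllers": 0.25}
MVC_FRAMEWORKS = {"Django", "Rails", "Laravel", "ASP.NET"}
FRAMEWORK_BONUS = 0.15
THRESHOLD = 0.5


def detect_mvc(file_paths: list, framework: Optional[str] = None) -> Optional[dict]:
    # Every directory name appearing in any analyzed path.
    dir_names = {part.lower() for p in file_paths for part in PurePosixPath(p).parts[:-1]}
    found = [name for name in MVC_DIR_WEIGHTS if name in dir_names]
    evidence = [f"{name.capitalize()} directory present" for name in found]
    confidence = sum(MVC_DIR_WEIGHTS[name] for name in found)
    if framework in MVC_FRAMEWORKS:
        evidence.append(f"{framework} framework detected (uses MVC)")
        confidence += FRAMEWORK_BONUS
    if confidence < THRESHOLD:
        return None
    return {
        "pattern_name": "MVC (Model-View-Controller)",
        "confidence": round(min(confidence, 1.0), 2),
        "evidence": evidence,
        "framework": framework,
    }
```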

**Implementation:**
- NEW: src/skill_seekers/cli/architectural_pattern_detector.py
  - ArchitecturalPatternDetector class
  - Framework detection engine
  - Pattern-specific detectors (MVC, MVVM, Repository, etc.)
- Modified: codebase_scraper.py (integrated into main analysis flow)

## Integration & UX

**Seamless Integration:**
- C3.6 enhances C3.1, C3.2, AND C3.7 with AI insights
- C3.7 provides architectural context for detected patterns
- All work together automatically
- No configuration needed - just works!

**User Experience:**
- Set ANTHROPIC_API_KEY → Get AI insights automatically
- No API key → Features still work, just without AI enhancement
- No new flags to learn
- Maximum value with zero friction

## Example Output

**Pattern Detection (C3.1 + C3.6):**
```json
{
  "pattern_type": "Singleton",
  "confidence": 0.85,
  "evidence": ["Private constructor", "getInstance() method"],
  "ai_analysis": {
    "explanation": "Detected Singleton due to private constructor...",
    "issues": ["Not thread-safe - consider double-checked locking"],
    "recommendations": ["Add synchronized block", "Use enum-based singleton"],
    "related_patterns": ["Factory", "Object Pool"]
  }
}
```

**Architectural Detection (C3.7):**
```json
{
  "pattern_name": "MVC (Model-View-Controller)",
  "confidence": 0.9,
  "evidence": [
    "Models directory with 15 model classes",
    "Views directory with 23 view files",
    "Controllers directory with 12 controllers",
    "Django framework detected (uses MVC)"
  ],
  "framework": "Django"
}
```

## Testing

- AI enhancement tested with Claude Sonnet 4
- Architectural detection tested on Django, Spring Boot, React projects
- All existing tests passing (962/966 tests)
- Graceful degradation verified (works without API key)

## Roadmap Progress

- C3.1: Design Pattern Detection
- C3.2: Test Example Extraction
- C3.6: AI Enhancement (NEW!)
- C3.7: Architectural Pattern Detection (NEW!)
- 🔜 C3.3: Build "how to" guides
- 🔜 C3.4: Extract configuration patterns
- 🔜 C3.5: Create architectural overview

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-03 22:56:37 +03:00