* feat: fix unified scraper pipeline gaps, add multi-agent support, and Unity skill configs

  Fix multiple bugs in the unified scraper pipeline discovered while creating Unity skills (Spine, Addressables, DOTween):

  - Fix doc scraper KeyError by passing base_url in temp config
  - Fix scraped_data list-vs-dict bug in detect_conflicts() and merge_sources()
  - Add Phase 6 auto-enhancement from config "enhancement" block (LOCAL + API mode)
  - Add "browser": true config support for JavaScript SPA documentation sites
  - Add Phase 3 skip message for better UX
  - Add subprocess timeout (3600s) for doc scraper
  - Fix SkillEnhancer missing skill_dir argument in API mode
  - Fix browser renderer defaults (60s timeout, domcontentloaded wait condition)
  - Fix C3.x JSON filename mismatch (design_patterns.json → all_patterns.json)
  - Fix workflow builtin target handling when no pattern data available
  - Make AI enhancement timeout configurable via SKILL_SEEKER_ENHANCE_TIMEOUT env var (300s default)
  - Add C#, Go, Rust, Swift, Ruby, PHP, GDScript to GitHub scraper extension map
  - Add multi-agent LOCAL mode support across all 17 scrapers (--agent flag)
  - Add Kimi/Moonshot platform support (API keys, agent presets, config wizard)
  - Add unity-game-dev.yaml workflow (7 stages covering Unity-specific patterns)
  - Add 3 Unity skill configs (Spine, Addressables, DOTween)
  - Add comprehensive Claude bias audit report

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: create AgentClient abstraction, remove hardcoded Claude from 5 enhancers (#334)

  Phase 1 of the full agent-agnostic refactor. Creates a centralized AgentClient that all enhancers use instead of hardcoded subprocess calls and model names.

  New file:
  - agent_client.py: Unified AI client supporting API mode (Anthropic, Moonshot, Google, OpenAI) and LOCAL mode (Claude Code, Kimi, Codex, Copilot, OpenCode, custom agents). Provides detect_api_key(), get_model(), detect_default_target().

  Refactored (removed all hardcoded ["claude", ...] subprocess calls):
  - ai_enhancer.py: -140 lines, delegates to AgentClient
  - config_enhancer.py: -150 lines, removed _run_claude_cli()
  - guide_enhancer.py: -120 lines, removed _check_claude_cli(), _call_claude_*()
  - unified_enhancer.py: -100 lines, removed _check_claude_cli(), _call_claude_*()
  - codebase_scraper.py: collapsed 3 functions into 1 using AgentClient

  Fixed:
  - utils.py: has_api_key()/get_api_key() now check all providers
  - enhance_skill.py, video_scraper.py, video_visual.py: model names configurable via ANTHROPIC_MODEL env var
  - enhancement_workflow.py: uses call() with _call_claude() fallback

  Net: -153 lines of code while adding full multi-agent support.

* feat: Phase 2 agent-agnostic refactor — defaults, help text, merge mode, MCP (#334)

  Default targets:
  - Changed default="claude" to auto-detect from API keys in 5 argument files and 3 CLI scripts (install_skill, upload_skill, enhance_skill)
  - Added AgentClient.detect_default_target() for runtime resolution
  - MCP server functions now use "auto" default with runtime detection

  Help text (16+ argument files):
  - Replaced "ANTHROPIC_API_KEY" / "Claude Code" with agent-neutral wording
  - Now mentions all API keys (ANTHROPIC, MOONSHOT, etc.) and "AI coding agent"

  Log messages:
  - main.py, enhance_command.py: "Claude Code CLI" → dynamic agent name
  - enhance_command.py docstring: "Claude Code" → "AI coding agent"

  Merge mode rename:
  - Added "ai-enhanced" as the preferred merge mode name
  - "claude-enhanced" kept as backward-compatible alias
  - Renamed ClaudeEnhancedMerger → AIEnhancedMerger (with alias)
  - Updated choices, validators, and descriptions

  MCP server descriptions:
  - server_fastmcp.py: "Claude AI skills" → "LLM skills" in tool descriptions
  - packaging_tools.py: Updated defaults and dry-run messages

* feat: Phase 3 agent-agnostic refactor — docstrings, MCP descriptions, README (#334)

  Module docstrings (17+ scraper files):
  - "Claude Skill Converter" → "AI Skill Converter"
  - "Build Claude skill" → "Build AI/LLM skill"
  - "Asking Claude" → "Asking AI"
  - Updated doc_scraper, github_scraper, pdf_scraper, word_scraper, epub_scraper, video_scraper, enhance_skill, enhance_skill_local, unified_scraper, and others

  MCP server_legacy.py (30+ fixes):
  - All tool descriptions: "Claude skill" → "LLM skill"
  - "Upload to Claude" → "Upload skill"
  - "enhance with Claude Code" → "enhance with AI agent"
  - Kept claude.ai/skills URLs (platform-specific, correct)

  MCP README.md:
  - Added multi-agent support note at top
  - "Claude AI skills" → "LLM skills" throughout
  - Updated examples to show multi-platform usage
  - Kept Claude Code in supported agents list (accurate)

* feat: Phase 3 continued — remaining docstring and comment fixes (#334)

  Additional agent-neutral text fixes in 8 files missed from the initial Phase 3 commit:
  - config_extractor.py, config_manager.py, constants.py: comments
  - enhance_command.py: docstring and print messages
  - guide_enhancer.py: class/module docstrings
  - parsers/enhance_parser.py, install_parser.py: help text
  - signal_flow_analyzer.py: docstring

* workflow added

* fix: address code review issues in AgentClient and Phase 6 (#334)

  Fixes found during commit review:
  1. AgentClient._call_local: Only append "Write your response to:" when caller explicitly passes output_file (was always appending)
  2. Codex agent: Added uses_stdin flag to preset, pipe prompt via stdin instead of DEVNULL (codex reads from stdin with "-" arg)
  3. Provider detection: Added _detect_provider_from_key() to detect provider from API key prefix (sk-ant- → anthropic, AIza → google) instead of always assuming anthropic
  4. Phase 6 API mode: Replaced direct SkillEnhancer/ANTHROPIC_API_KEY with AgentClient for multi-provider support (Moonshot, Google, OpenAI)
  5. config_enhancer: Removed output_file path from prompt — AgentClient manages temp files and output detection

* fix: make claude adaptor model name configurable via ANTHROPIC_MODEL env var

  Missed in the Phase 1 refactor — adaptors/claude.py:381 had a hardcoded model name without the os.environ.get() wrapper that all other files use.

* feat: add copilot stdin support, custom agent, and kimi aliases (#334)

  Additional agent improvements from Kimi review:
  - Added uses_stdin: True to copilot agent preset (reads from stdin like codex)
  - Added custom agent support via SKILL_SEEKER_AGENT_CMD env var in _call_local()
  - Added kimi_code/kimi-code aliases in normalize_agent_name()
  - Added "kimi" to --target choices in enhance arguments
  - Updated help text with MOONSHOT_API_KEY across argument files

* fix: Kimi CLI integration — add uses_stdin and output parsing (#334)

  Kimi CLI's --print mode requires stdin piping and outputs structured protocol messages (TurnBegin, TextPart, etc.) instead of plain text.

  Fixes:
  - Added uses_stdin: True to kimi preset (was not piping prompt)
  - Added parse_output: "kimi" flag to preset
  - Added _parse_kimi_output() to extract text from TextPart lines
  - Kimi now returns clean text instead of raw protocol dump

  Tested: kimi returns '{"status": "ok"}' correctly via AgentClient.

* fix: Kimi CLI in enhance_skill_local — remove wrong skip-permissions, use absolute path

  Two bugs in enhance_skill_local.py AGENT_PRESETS for Kimi:
  1. supports_skip_permissions was True — Kimi doesn't support --dangerously-skip-permissions, only Claude does. Fixed to False.
  2. {skill_dir} was resolved as a relative path — Kimi CLI requires absolute paths for --work-dir. Fixed with .resolve().

  Tested: `skill-seekers enhance output/test-e2e/ --agent kimi` now works end-to-end (107s, 9233 bytes output).

* fix: remove invalid --enhance-level flag from enhance subprocess calls

  doc_scraper.py and video_scraper.py were passing --enhance-level to skill-seekers-enhance, which doesn't accept that flag. This caused enhancement to fail silently after scraping completed.

  Fixes:
  - Removed --enhance-level from enhance subprocess calls
  - Added --agent passthrough in doc_scraper.py
  - Fixed log messages to show correct command

  Tested: `skill-seekers create <url> --enhance-level 1` now chains scrape → enhance successfully.

* feat: add --agent and --agent-cmd to create command UNIVERSAL_ARGUMENTS

  The --agent flag was defined in common.py but not imported into the create command's UNIVERSAL_ARGUMENTS, so it wasn't available when using `skill-seekers create <source> --agent kimi`. Now all 17 source types support the --agent flag via the create command.

* fix: update docs data_file path after moving to cache directory

  The scraped_data["documentation"] stored the original output/ path for data_file, but the directory was moved to .skillseeker-cache/ afterward. Phase 2 conflict detection then failed with FileNotFoundError trying to read the old path. Now updates data_file to point to the cache location after the move.

* feat: multi-language code signature extraction in GitHub scraper

  The GitHub scraper only analyzed files matching the primary language (by bytes). For multi-language repos like spine-runtimes (C++ primary but C# is the target), this meant 0 C# files were analyzed.

  Fix: Analyze top 3 languages with known extension mappings instead of just the primary. Also support a "language" field in config source to explicitly target specific languages (e.g., "language": "C#"). Updated Unity configs to specify language: "C#" for focused analysis.

* feat: per-file language detection + remove artificial analysis limits

  Rewrites GitHub scraper's _extract_signatures_and_tests() to detect language per-file from extension instead of only analyzing the primary language. This fixes multi-language repos like spine-runtimes (C++ primary) where C# files were never analyzed.

  Changes:
  - Build reverse ext→language map, detect language per-file
  - Analyze ALL files with known extensions (not just primary language)
  - Config "language" field works as optional filter, not a workaround
  - Store per-file language + languages_analyzed in output
  - Remove 50-file API mode limit (rate limiting already handles this)
  - Remove 100-file default config extraction limit (now unlimited by default)
  - Fix unified scraper default max_pages from 100 to 500 (matches constants.py)

* fix: remove remaining 100-file limit in config_extractor.extract_from_directory

  The find_config_files default was changed to unlimited but extract_from_directory and CLI --max-files still defaulted to 100.

* feat: replace interactive terminal merge with automated AgentClient call

  AIEnhancedMerger._launch_claude_merge() used to open a terminal window, run a bash script, and poll for a file — requiring manual interaction. Now uses AgentClient.call() to send the merge prompt directly and parse the JSON response. Fully automated, no terminal needed, works with any configured AI agent (Claude, Kimi, etc.).

* feat: add marketplace pipeline for publishing skills to Claude Code plugin repos

  Connect the three-repo pipeline: configs repo → Skill Seekers engine → plugin marketplace repos. Enables automated publishing of generated skills directly into Claude Code plugin repositories with proper plugin.json and marketplace.json structure.

  New components:
  - MarketplaceManager: Registry for plugin marketplace repos at ~/.skill-seekers/marketplaces.json with per-repo git tokens, branch config, and default author metadata
  - MarketplacePublisher: Clones marketplace repo, creates plugin directory structure (skills/, .claude-plugin/plugin.json), updates marketplace.json, commits and pushes. Includes skill_name validation to prevent path traversal, and cleanup of partial state on git failures
  - 4 MCP tools: add_marketplace, list_marketplaces, remove_marketplace, publish_to_marketplace — registered in FastMCP server
  - Phase 6 in install workflow: automatic marketplace publishing after packaging, triggered by --marketplace CLI arg or marketplace_targets config field

  CLI additions:
  - --marketplace NAME: publish to registered marketplace after packaging
  - --marketplace-category CAT: plugin category (default: development)
  - --create-branch: create feature branch instead of committing to main

  Security:
  - Skill name regex validation (^[a-zA-Z0-9][a-zA-Z0-9._-]*$) prevents path traversal attacks via malicious SKILL.md frontmatter
  - has_api_key variable scoping fix in install workflow summary
  - try/finally cleanup of partial plugin directories on publish failure

  Config schema:
  - Optional marketplace_targets field in config JSON for multi-marketplace auto-publishing: [{"marketplace": "spyke", "category": "development"}]
  - Backward compatible — ignored by older versions

  Tests: 58 tests (36 manager + 22 publisher, including 2 integration tests using the file:// git protocol for the full publish success path)

* feat: thread agent selection through entire enhancement pipeline

  Propagates the --agent and --agent-cmd CLI parameters through all enhancement components so users can use any supported coding agent (kimi, claude, copilot, codex, opencode) consistently across the full pipeline, not just in top-level enhancement.

  Agent parameter threading:
  - AIEnhancer: accepts agent param, passes to AgentClient
  - ConfigEnhancer: accepts agent param, passes to AgentClient
  - WorkflowEngine: accepts agent param, passes to sub-enhancers (PatternEnhancer, TestExampleEnhancer, AIEnhancer)
  - ArchitecturalPatternDetector: accepts agent param for AI enhancement
  - analyze_codebase(): accepts agent/agent_cmd, forwards to ConfigEnhancer, ArchitecturalPatternDetector, and doc processing
  - UnifiedScraper: reads agent from CLI args, forwards to doc scraper subprocess, C3.x analysis, and LOCAL enhancement
  - CreateCommand: forwards --agent and --agent-cmd to subprocess argv
  - workflow_runner: passes agent to WorkflowEngine for inline/named workflows

  Timeout improvements:
  - Default enhancement timeout increased from 300s (5min) to 2700s (45min) to accommodate large skill generation with local agents
  - New get_default_timeout() in agent_client.py with env var override (SKILL_SEEKER_ENHANCE_TIMEOUT) supporting an 'unlimited' value
  - Config enhancement block supports "timeout": "unlimited" field
  - Removed hardcoded timeout=300 and timeout=600 calls in config_enhancer and merge_sources, now using centralized default

  CLI additions (unified_scraper):
  - --agent AGENT: select local coding agent for enhancement
  - --agent-cmd CMD: override agent command template (advanced)

  Config: unity-dotween.json updated with agent=kimi, timeout=unlimited, removed unused file_patterns

* feat: add claude-code unified config for Claude Code CLI skill generation

  Unified config combining official Claude Code documentation and source code analysis. Covers internals, architecture, tools, commands, IDE integrations, MCP, plugins, skills, and development workflows.

* docs: add multi-agent support verification report and test artifacts

  - AGENT_SUPPORT_VERIFICATION.md: verification report confirming agent parameter threading works across all enhancement components
  - END_TO_END_EXAMPLES.md: complete workflows for all 17 source types with both Claude and Kimi agents
  - test_agents.sh: shell script for real-world testing of agent support across major CLI commands with both agents
  - test_realworld.md: real-world test scenarios for manual QA

* fix: add .env to .gitignore to prevent secret exposure

  The .env file containing API keys (ANTHROPIC_API_KEY, GITHUB_TOKEN, etc.) was not in .gitignore, causing it to appear as untracked and risking accidental commit. Added .env, .env.local, and .env.*.local patterns.

* fix: URL filtering uses base directory instead of full page URL (#331)

  is_valid_url() checked url.startswith(self.base_url) where base_url could be a full page path like ".../manual/index.html". Sibling pages like ".../manual/LoadingAssets.html" failed the check because they don't start with ".../index.html".

  Now strips the filename to get the directory prefix: "https://example.com/docs/index.html" → "https://example.com/docs/"

  This fixes SPA sites like Unity's DocFX docs where browser mode renders the page but sibling links were filtered out.

  Closes #331

* fix: pass language config through to GitHub scraper in unified flow

  The unified scraper built github_config from source fields but didn't include the "language" field. The GitHub scraper's per-file detection read self.config.get("language", "") which was always empty, so it fell back to analyzing all languages instead of the focused C# filter. For DOTween (a C#-only repo), this caused 0 files analyzed because without the language filter it analyzed the top 3 languages, but the file tree matching failed silently.

* feat: centralize all enhancement timeouts to 45min default with unlimited support

  All enhancement/AI timeouts now use get_default_timeout() from agent_client.py instead of scattered hardcoded values (120s, 300s, 600s).

  - Default: 2700s (45 minutes)
  - Override: SKILL_SEEKER_ENHANCE_TIMEOUT env var
  - Unlimited: Set to "unlimited", "none", or "0"

  Updated: agent_client.py, enhance_skill_local.py, arguments/enhance.py, enhance_command.py, unified_enhancer.py, unified_scraper.py

  Not changed (different purposes):
  - Browser page load timeout (60s)
  - API HTTP request timeout (120s)
  - Doc scraper subprocess timeout (3600s)

* feat: add browser_wait_until and browser_extra_wait config for SPA docs

  DocFX sites (Unity docs) render navigation via JavaScript after the initial page load. With domcontentloaded, only 1 link was found. With networkidle + 5s extra wait, 95 content pages are discovered.

  New config options for documentation sources:
  - browser_wait_until: "networkidle" | "load" | "domcontentloaded"
  - browser_extra_wait: milliseconds to wait after page load for lazy nav

  Updated Addressables config to use networkidle + 5000ms extra wait. Pass browser settings through unified scraper to doc scraper config.

* feat: three-layer smart discovery engine for SPA documentation sites

  Replaces the browser_wait_until/browser_extra_wait config hacks with a proper discovery engine that runs before the BFS crawl loop:
  - Layer 1: sitemap.xml — checks domain root for a sitemap, parses <loc> tags
  - Layer 2: llms.txt — existing mechanism (unchanged)
  - Layer 3: SPA nav — renders index page with networkidle via Playwright, extracts all links from the fully-rendered DOM sidebar/TOC

  The BFS crawl then uses domcontentloaded (fast) since all pages are already discovered. No config hacks needed — browser mode automatically triggers SPA discovery when only 1 page is found.

  Tested: Unity Addressables DocFX site now discovers 95 pages (was 1). Removed browser_wait_until/browser_extra_wait from Addressables config.

* refactor: replace manual arg forwarding with dynamic routing in create command

  The create command manually hardcoded ~60% of scraper flags in _route_*() methods, causing ~40 flags to be silently dropped. Every new flag required edits in 2 places (arguments/create.py + create_command.py), guaranteed to drift.

  Replaced with _build_argv() — a dynamic forwarder that iterates vars(self.args) and forwards all explicitly-set arguments automatically, using the same pattern as main.py::_reconstruct_argv(). This eliminates the root cause of all flag gaps.

  Changes in create_command.py (-380 lines, +175 lines = net -205):
  - Added _build_argv() dynamic arg forwarder with dest→flag translation map for mismatched names (async_mode→--async, video_playlist→--playlist, skip_config→--skip-config-patterns, workflow_var→--var)
  - Added _call_module() helper (dedup sys.argv swap pattern)
  - Simplified all _route_*() methods from 50-70 lines to 5-10 lines each
  - Deleted _add_common_args() entirely (subsumed by _build_argv)
  - _route_generic() now forwards ALL args, not just universal ones

  New flags accessible via create command:
  - --from-json: build skill from pre-extracted JSON (all source types)
  - --skip-api-reference: skip API reference generation (local codebase)
  - --skip-dependency-graph: skip dependency analysis (local codebase)
  - --skip-config-patterns: skip config pattern extraction (local codebase)
  - --no-comments: skip comment extraction (local codebase)
  - --depth: analysis depth control (local codebase, deprecated)
  - --setup: auto-detect GPU/install video deps (video)

  Bug fix in unified_scraper.py:
  - Fixed C3.x pattern data loss: unified_scraper read patterns/detected_patterns.json but codebase_scraper writes patterns/all_patterns.json. Changed both read locations (line 828 for local sources, line 1597 for GitHub C3.x) to use the correct filename. This was causing 100% loss of design pattern data (e.g., 905 patterns detected but 0 included in final skill).

* fix: address 5 code review issues in marketplace and package pipeline

  Fixes found by automated code review of the marketplace feature and package command:
  1. --marketplace flag silently ignored in package_skill.py CLI: Added MarketplacePublisher invocation after successful packaging when --marketplace is provided. Previously the flag was parsed but never acted on.
  2. Missing 7 platform choices in --target (package.py): Added minimax, opencode, deepseek, qwen, openrouter, together, fireworks to the argparse choices list. These platforms have registered adaptors but were rejected by the argument parser.
  3. is_update always True for new marketplace registrations: Two separate datetime.now() calls produced different microsecond timestamps, making added_at != updated_at always. Fixed by assigning a single timestamp to both fields.
  4. Shallow clone (depth=1) caused push failures for marketplace repos: MarketplacePublisher now does full clones instead of using GitConfigRepo's shallow clone (which is designed for read-only config fetching). A full clone is required for the commit+push workflow.
  5. Partial plugin dir not cleaned on force=True failure: Removed the `and not force` guard from cleanup logic — if an operation fails midway, the partial directory should be cleaned regardless of whether force was set.

* fix: address dynamic routing edge cases in create_command

  Fixes from code review of the _build_argv() refactor:
  1. Non-None defaults forwarded unconditionally — added enhance_level=2, doc_version="", video_languages="en", whisper_model="base", platform="slack", visual_interval=0.7, visual_min_gap=0.5, visual_similarity=3.0 to the defaults dict so they're only forwarded when the user explicitly overrides them. This fixes video sources incorrectly getting --enhance-level 2 (the video default is 0).
  2. video_url dest not translated — added "video_url": "--url" to _DEST_TO_FLAG so create correctly forwards --video-url as --url to video_scraper.py.
  3. Video positional args double-forwarded — added video_url, video_playlist, video_file to _SKIP_ARGS since _route_video() already handles them via positional args from source detection.
  4. Removed dead workflow_var entry from _DEST_TO_FLAG — the create parser uses key "var" not "workflow_var", so the translation was never triggered.

* fix: resolve 15 broken tests and --from-json crash bug in create command

  Fixes found by Kimi code review of the dynamic routing refactor:
  1. 3 test_create_arguments.py failures — UNIVERSAL_ARGUMENTS count changed from 19 to 21 (added agent, agent_cmd). Updated expected count and name set. Moved from_json out of UNIVERSAL to ADVANCED_ARGUMENTS since not all scrapers support it.
  2. 12 test_create_integration_basic.py failures — tests called _add_common_args(), which was deleted in the refactor. Rewrote _collect_argv() to use _build_argv() via CreateCommand with SourceDetector. Updated _make_args defaults to match the new parameter set.
  3. --from-json crash bug — it was in UNIVERSAL_ARGUMENTS so create accepted it for all source types, but web/github/local scrapers don't support it. Forwarding it caused argparse "unrecognized arguments" errors. Moved to ADVANCED_ARGUMENTS with documentation listing which source types support it.
  4. Additional _is_explicitly_set defaults — added enhance_level=2, doc_version="", video_languages="en", whisper_model="base", platform="slack", visual_interval/min_gap/similarity defaults to prevent unconditional forwarding of parser defaults.
  5. Video arg handling — added video_url to the _DEST_TO_FLAG translation map, added video_url/video_playlist/video_file to _SKIP_ARGS (handled as positionals by _route_video).

* fix: C3.x analysis data loss — read from references/ after _generate_references() cleanup

  Root cause: _generate_references() in codebase_scraper.py copies analysis directories (patterns/, test_examples/, config_patterns/, architecture/, dependencies/, api_reference/) into references/ then DELETES the originals to avoid duplication (Issue #279). But unified_scraper.py reads from the original paths after analyze_codebase() returns — by which time the originals are gone.

  This caused 100% data loss for all 6 C3.x data types (design patterns, test examples, config patterns, architecture, dependencies, API reference) in the unified scraper pipeline. The data was correctly detected (e.g., 905 patterns in 510 files) but never made it into the final skill.

  Fix: Added a _load_json_fallback() method that checks references/{subdir}/ first (where _generate_references moves the data), falling back to the original path. Applied to both GitHub C3.x analysis (line ~1599) and local source analysis (line ~828).

* fix: add allowlist to _build_argv for config route to unified_scraper

  _build_argv() was forwarding all CLI args (--name, --doc-version, etc.) to unified_scraper, which doesn't accept them. Added an allowlist parameter to _build_argv() — when provided, ONLY args in the allowlist are forwarded. The config route now uses the _UNIFIED_SCRAPER_ARGS allowlist with the exact set of flags unified_scraper accepts.

  This is a targeted patch — the proper fix is the ExecutionContext singleton refactor planned separately.

* fix: add force=True to marketplace publish from package CLI

  The package command's --marketplace flag didn't pass force=True to MarketplacePublisher, so re-publishing an existing skill would fail with an "already exists" error instead of overwriting.

* feat: add push_config tool for publishing configs to registered source repos

  New ConfigPublisher class that validates configs, places them in the correct category directory, commits, and pushes to registered source repositories. Follows the MarketplacePublisher pattern.

  Features:
  - Auto-detect category from config name/description
  - Validate via ConfigValidator + the repo's validate-config.py
  - Support feature branch or direct push
  - Force overwrite existing configs
  - MCP tool: push_config(config_path, source_name, category)

  Usage: push_config(config_path="configs/unity-spine.json", source_name="spyke")

* fix: security hardening, error handling, tests, and cleanup

  Security:
  - Remove command injection via cloned repo script execution (config_publisher)
  - Replace git add -A with targeted staging (marketplace_publisher)
  - Clear auth tokens from cached .git/config after clone
  - Use defusedxml for sitemap XML parsing (XXE protection)
  - Add path traversal validation for config names

  Error handling:
  - AgentClient: specific exception handling for rate limit, auth, and connection errors
  - AgentClient: log subprocess stderr on non-zero exit, raise on explicit API mode failure
  - config_publisher: only catch ValueError for validation warnings

  Logic bugs:
  - Fix _build_argv silently dropping --enhance-level 2 (matched default)
  - Fix URL filtering over-broadening (strip to parent instead of adding /)
  - Log a warning when _call_module returns a None exit code

  Tests (134 new):
  - test_agent_client.py: 71 tests for normalize, detect, init, timeout, model
  - test_config_publisher.py: 23 tests for detect_category, publish, errors
  - test_create_integration_basic.py: 20 tests for _build_argv routing
  - Fix 11 pre-existing failures (guide_enhancer, doctor, install_skill, marketplace)

  Cleanup:
  - Remove 5 dev artifact files (-1405 lines)
  - Rename _launch_claude_merge to _launch_ai_merge

  All 3194 tests pass, 39 expected skips.

* fix: pin ruff==0.15.8 in CI and reformat packaging_tools.py

* fix: add missing pytest install to vector DB adaptor test jobs

* fix: reformat 7 files for ruff 0.15.8 and fix vector DB test path

* fix: remove test-week2-integration job referencing missing script

* fix: update e2e test to accept dynamic platform name in upload phase

---------

Co-authored-by: YusufKaraaslanSpyke <yusuf@spykegames.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Skill Seekers
English | 简体中文 | 日本語 | 한국어 | Español | Français | Deutsch | Português | Türkçe | العربية | हिन्दी | Русский
🧠 The data layer for AI systems. Skill Seekers turns documentation sites, GitHub repos, PDFs, videos, notebooks, wikis, and 10+ more source types into structured knowledge assets—ready to power AI Skills (Claude, Gemini, OpenAI), RAG pipelines (LangChain, LlamaIndex, Pinecone), and AI coding assistants (Cursor, Windsurf, Cline) in minutes, not hours.
🌐 Visit SkillSeekersWeb.com - Browse 24+ preset configs, share your configs, and access complete documentation!
📋 View Development Roadmap & Tasks - 134 tasks across 10 categories, pick any to contribute!
## 🌐 Ecosystem
Skill Seekers is a multi-repo project. Here's where everything lives:
| Repository | Description | Links |
|---|---|---|
| Skill_Seekers | Core CLI & MCP server (this repo) | PyPI |
| skillseekersweb | Website & documentation | Live |
| skill-seekers-configs | Community config repository | |
| skill-seekers-action | GitHub Action for CI/CD | |
| skill-seekers-plugin | Claude Code plugin | |
| homebrew-skill-seekers | Homebrew tap for macOS | |
Want to contribute? The website and configs repos are great starting points for new contributors!
🧠 The Data Layer for AI Systems
Skill Seekers is the universal preprocessing layer that sits between raw documentation and every AI system that consumes it. Whether you are building Claude skills, a LangChain RAG pipeline, or a Cursor .cursorrules file — the data preparation is identical. You do it once, and export to all targets.
# One command → structured knowledge asset
skill-seekers create https://react.dev/
# or: skill-seekers create facebook/react
# or: skill-seekers create ./my-project
# Export to any AI system
skill-seekers package output/react --target claude # → Claude AI Skill (ZIP)
skill-seekers package output/react --target langchain # → LangChain Documents
skill-seekers package output/react --target llama-index # → LlamaIndex TextNodes
skill-seekers package output/react --target cursor # → .cursorrules
What gets built
| Output | Target | What it powers |
|---|---|---|
| Claude Skill (ZIP + YAML) | `--target claude` | Claude Code, Claude API |
| Gemini Skill (tar.gz) | `--target gemini` | Google Gemini |
| OpenAI / Custom GPT (ZIP) | `--target openai` | GPT-4o, custom assistants |
| LangChain Documents | `--target langchain` | QA chains, agents, retrievers |
| LlamaIndex TextNodes | `--target llama-index` | Query engines, chat engines |
| Haystack Documents | `--target haystack` | Enterprise RAG pipelines |
| Pinecone-ready (Markdown) | `--target markdown` | Vector upsert |
| ChromaDB / FAISS / Qdrant | `--format chroma/faiss/qdrant` | Local vector DBs |
| Cursor .cursorrules | `--target claude` → copy | Cursor IDE AI context |
| Windsurf / Cline / Continue | `--target claude` → copy | VS Code, IntelliJ, Vim |
Why it matters
- ⚡ 99% faster — Days of manual data prep → 15–45 minutes
- 🎯 AI Skill quality — 500+ line SKILL.md files with examples, patterns, and guides
- 📊 RAG-ready chunks — Smart chunking preserves code blocks and maintains context
- 🎬 Videos — Extract code, transcripts, and structured knowledge from YouTube and local videos
- 🔄 Multi-source — Combine 17 source types (docs, GitHub, PDFs, videos, notebooks, wikis, and more) into one knowledge asset
- 🌐 One prep, every target — Export the same asset to 16 platforms without re-scraping
- ✅ Battle-tested — 2,540+ tests, 24+ framework presets, production-ready
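The "smart chunking preserves code blocks" claim above can be pictured with a minimal sketch — an illustrative greedy chunker that never splits inside a fenced code block, not the library's actual implementation:

```python
def chunk_markdown(text, max_chars=1200):
    """Greedy chunker that refuses to split inside a fenced code block.

    Illustrative only: the real Skill Seekers chunker also carries
    metadata and category information.
    """
    chunks, current, size, in_fence = [], [], 0, False
    for line in text.splitlines(keepends=True):
        fence = line.lstrip().startswith("```")
        # Only flush when outside a code block, so fenced code stays whole.
        if not in_fence and current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
        if fence:
            in_fence = not in_fence
    if current:
        chunks.append("".join(current))
    return chunks
```

A 500-character code block chunked at `max_chars=100` comes back as a single chunk rather than being torn apart mid-snippet.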
🚀 Quick Start (3 Commands)
# 1. Install
pip install skill-seekers
# 2. Create skill from any source
skill-seekers create https://docs.djangoproject.com/
# 3. Package for your AI platform
skill-seekers package output/django --target claude
That's it! You now have output/django-claude.zip ready to use.
Other Sources (17 Supported)
# GitHub repository
skill-seekers create facebook/react
# Local project
skill-seekers create ./my-project
# PDF document
skill-seekers create manual.pdf
# Word document
skill-seekers create report.docx
# EPUB e-book
skill-seekers create book.epub
# Jupyter Notebook
skill-seekers create notebook.ipynb
# OpenAPI spec
skill-seekers create openapi.yaml
# PowerPoint presentation
skill-seekers create presentation.pptx
# AsciiDoc document
skill-seekers create guide.adoc
# Local HTML file
skill-seekers create page.html
# RSS/Atom feed
skill-seekers create feed.rss
# Man page
skill-seekers create curl.1
# Video (YouTube, Vimeo, or local file — requires skill-seekers[video])
skill-seekers video --url https://www.youtube.com/watch?v=... --name mytutorial
# First time? Auto-install GPU-aware visual deps:
skill-seekers video --setup
# Confluence wiki
skill-seekers confluence --space TEAM --name wiki
# Notion pages
skill-seekers notion --database-id ... --name docs
# Slack/Discord chat export
skill-seekers chat --export-dir ./slack-export --name team-chat
Export Everywhere
# Package for multiple platforms
for platform in claude gemini openai langchain; do
skill-seekers package output/django --target $platform
done
What is Skill Seekers?
Skill Seekers is the data layer for AI systems. It transforms 17 source types—documentation websites, GitHub repositories, PDFs, videos, Jupyter Notebooks, Word/EPUB/AsciiDoc documents, OpenAPI specs, PowerPoint presentations, RSS feeds, man pages, Confluence wikis, Notion pages, Slack/Discord exports, and more—into structured knowledge assets for every AI target:
| Use Case | What you get | Examples |
|---|---|---|
| AI Skills | Comprehensive SKILL.md + references | Claude Code, Gemini, GPT |
| RAG Pipelines | Chunked documents with rich metadata | LangChain, LlamaIndex, Haystack |
| Vector Databases | Pre-formatted data ready for upsert | Pinecone, Chroma, Weaviate, FAISS |
| AI Coding Assistants | Context files your IDE AI reads automatically | Cursor, Windsurf, Cline, Continue.dev |
📚 Documentation
| I want to... | Read this |
|---|---|
| Get started quickly | Quick Start - 3 commands to first skill |
| Understand concepts | Core Concepts - How it works |
| Scrape sources | Scraping Guide - All source types |
| Enhance skills | Enhancement Guide - AI enhancement |
| Export skills | Packaging Guide - Platform export |
| Look up commands | CLI Reference - All 20 commands |
| Configure | Config Format - JSON specification |
| Fix issues | Troubleshooting - Common problems |
Complete documentation: docs/README.md
Instead of spending days on manual preprocessing, Skill Seekers:
- Ingests — docs, GitHub repos, local codebases, PDFs, videos, notebooks, wikis, and 10+ more source types
- Analyzes — deep AST parsing, pattern detection, API extraction
- Structures — categorized reference files with metadata
- Enhances — AI-powered SKILL.md generation (Claude, Gemini, or local)
- Exports — 16 platform-specific formats from one asset
Why Use This?
For AI Skill Builders (Claude, Gemini, OpenAI)
- 🎯 Production-grade Skills — 500+ line SKILL.md files with code examples, patterns, and guides
- 🔄 Enhancement Workflows — Apply `security-focus`, `architecture-comprehensive`, or custom YAML presets
- 🎮 Any Domain — Game engines (Godot, Unity), frameworks (React, Django), internal tools
- 🔧 Teams — Combine internal docs + code into a single source of truth
- 📚 Quality — AI-enhanced with examples, quick reference, and navigation guidance
For RAG Builders & AI Engineers
- 🤖 RAG-ready data — Pre-chunked LangChain `Document`s, LlamaIndex `TextNode`s, Haystack `Document`s
- 🚀 99% faster — Days of preprocessing → 15–45 minutes
- 📊 Smart metadata — Categories, sources, types → better retrieval accuracy
- 🔄 Multi-source — Combine docs + GitHub + PDFs + videos in one pipeline
- 🌐 Platform-agnostic — Export to any vector DB or framework without re-scraping
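To give a feel for consuming the exported data, here is a minimal loader sketch. It assumes the LangChain export is a JSON list of records shaped like LangChain Documents (`page_content` plus `metadata`, as described above); adapt the key names if your export differs:

```python
import json
from pathlib import Path

def load_exported_documents(path):
    """Read a Skill Seekers LangChain export into plain dicts.

    Assumption: the file is a JSON array of records, each carrying
    `page_content` and `metadata` keys.
    """
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    return [
        {"page_content": r.get("page_content", ""),
         "metadata": r.get("metadata", {})}
        for r in records
    ]
```

If LangChain is installed, each record can be passed straight to `langchain_core.documents.Document(**record)` and fed into any retriever or vector store.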
For AI Coding Assistant Users
- 💻 Cursor / Windsurf / Cline — Generate `.cursorrules` / `.windsurfrules` / `.clinerules` automatically
- 🎯 Persistent context — AI "knows" your frameworks without repeated prompting
- 📚 Always current — Update context in minutes when docs change
Key Features
🌐 Documentation Scraping
- ✅ llms.txt Support - Automatically detects and uses LLM-ready documentation files (10x faster)
- ✅ Universal Scraper - Works with ANY documentation website
- ✅ Smart Categorization - Automatically organizes content by topic
- ✅ Code Language Detection - Recognizes Python, JavaScript, C++, GDScript, etc.
- ✅ 24+ Ready-to-Use Presets - Godot, React, Vue, Django, FastAPI, and more
📄 PDF Support
- ✅ Basic PDF Extraction - Extract text, code, and images from PDF files
- ✅ OCR for Scanned PDFs - Extract text from scanned documents
- ✅ Password-Protected PDFs - Handle encrypted PDFs
- ✅ Table Extraction - Extract complex tables from PDFs
- ✅ Parallel Processing - 3x faster for large PDFs
- ✅ Intelligent Caching - 50% faster on re-runs
🎬 Video Extraction
- ✅ YouTube & Local Videos - Extract transcripts, on-screen code, and structured knowledge from videos
- ✅ Visual Frame Analysis - OCR extraction from code editors, terminals, slides, and diagrams
- ✅ GPU Auto-Detection - Automatically installs correct PyTorch build (CUDA/ROCm/MPS/CPU)
- ✅ AI Enhancement - Two-pass: clean OCR artifacts + generate polished SKILL.md
- ✅ Time Clipping - Extract specific sections with `--start-time` and `--end-time`
- ✅ Playlist Support - Batch process all videos in a YouTube playlist
- ✅ Vision API Fallback - Use Claude Vision for low-confidence OCR frames
🐙 GitHub Repository Analysis
- ✅ Deep Code Analysis - AST parsing for Python, JavaScript, TypeScript, Java, C++, Go
- ✅ API Extraction - Functions, classes, methods with parameters and types
- ✅ Repository Metadata - README, file tree, language breakdown, stars/forks
- ✅ GitHub Issues & PRs - Fetch open/closed issues with labels and milestones
- ✅ CHANGELOG & Releases - Automatically extract version history
- ✅ Conflict Detection - Compare documented APIs vs actual code implementation
- ✅ MCP Integration - Natural language: "Scrape GitHub repo facebook/react"
🔄 Unified Multi-Source Scraping
- ✅ Combine Multiple Sources - Mix documentation + GitHub + PDF in one skill
- ✅ Conflict Detection - Automatically finds discrepancies between docs and code
- ✅ Intelligent Merging - Rule-based or AI-powered conflict resolution
- ✅ Transparent Reporting - Side-by-side comparison with ⚠️ warnings
- ✅ Documentation Gap Analysis - Identifies outdated docs and undocumented features
- ✅ Single Source of Truth - One skill showing both intent (docs) and reality (code)
- ✅ Backward Compatible - Legacy single-source configs still work
🤖 Multi-LLM Platform Support
- ✅ 12 LLM Platforms - Claude AI, Google Gemini, OpenAI ChatGPT, MiniMax AI, Generic Markdown, OpenCode, Kimi (Moonshot AI), DeepSeek AI, Qwen (Alibaba), OpenRouter, Together AI, Fireworks AI
- ✅ Universal Scraping - Same documentation works for all platforms
- ✅ Platform-Specific Packaging - Optimized formats for each LLM
- ✅ One-Command Export - `--target` flag selects platform
- ✅ Optional Dependencies - Install only what you need
- ✅ 100% Backward Compatible - Existing Claude workflows unchanged
| Platform | Format | Upload | Enhancement | API Key | Custom Endpoint |
|---|---|---|---|---|---|
| Claude AI | ZIP + YAML | ✅ Auto | ✅ Yes | ANTHROPIC_API_KEY | ANTHROPIC_BASE_URL |
| Google Gemini | tar.gz | ✅ Auto | ✅ Yes | GOOGLE_API_KEY | - |
| OpenAI ChatGPT | ZIP + Vector Store | ✅ Auto | ✅ Yes | OPENAI_API_KEY | - |
| MiniMax AI | ZIP + Knowledge Files | ✅ Auto | ✅ Yes | MINIMAX_API_KEY | - |
| Generic Markdown | ZIP | ❌ Manual | ❌ No | - | - |
# Claude (default - no changes needed!)
skill-seekers package output/react/
skill-seekers upload react.zip
# Google Gemini
pip install skill-seekers[gemini]
skill-seekers package output/react/ --target gemini
skill-seekers upload react-gemini.tar.gz --target gemini
# OpenAI ChatGPT
pip install skill-seekers[openai]
skill-seekers package output/react/ --target openai
skill-seekers upload react-openai.zip --target openai
# MiniMax AI
pip install skill-seekers[minimax]
skill-seekers package output/react/ --target minimax
skill-seekers upload react-minimax.zip --target minimax
# Generic Markdown (universal export)
skill-seekers package output/react/ --target markdown
# Use the markdown files directly in any LLM
🔧 Environment Variables for Claude-Compatible APIs (e.g., GLM-4.7)
Skill Seekers supports any Claude-compatible API endpoint:
# Option 1: Official Anthropic API (default)
export ANTHROPIC_API_KEY=sk-ant-...
# Option 2: GLM-4.7 Claude-compatible API
export ANTHROPIC_API_KEY=your-glm-47-api-key
export ANTHROPIC_BASE_URL=https://glm-4-7-endpoint.com/v1
# All AI enhancement features will use the configured endpoint
skill-seekers enhance output/react/
skill-seekers analyze --directory . --enhance
Note: Setting ANTHROPIC_BASE_URL allows you to use any Claude-compatible API endpoint, such as GLM-4.7 (智谱 AI) or other compatible services.
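The endpoint fallback can be sketched in a couple of lines — an illustration of the resolution order, not the tool's actual code; the default shown is the public Anthropic API endpoint:

```python
import os

def resolve_anthropic_endpoint():
    """ANTHROPIC_BASE_URL wins when set; otherwise fall back to the
    official Anthropic API endpoint."""
    return os.environ.get("ANTHROPIC_BASE_URL", "https://api.anthropic.com")
```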
Installation:
# Install with Gemini support
pip install skill-seekers[gemini]
# Install with OpenAI support
pip install skill-seekers[openai]
# Install with MiniMax support
pip install skill-seekers[minimax]
# Install with all LLM platforms
pip install skill-seekers[all-llms]
🔗 RAG Framework Integrations
- ✅ LangChain Documents - Direct export to `Document` format with `page_content` + metadata
  - Perfect for: QA chains, retrievers, vector stores, agents
  - Example: LangChain RAG Pipeline
  - Guide: LangChain Integration
- ✅ LlamaIndex TextNodes - Export to `TextNode` format with unique IDs + embeddings
  - Perfect for: Query engines, chat engines, storage context
  - Example: LlamaIndex Query Engine
  - Guide: LlamaIndex Integration
- ✅ Pinecone-Ready Format - Optimized for vector database upsert
  - Perfect for: Production vector search, semantic search, hybrid search
  - Example: Pinecone Upsert
  - Guide: Pinecone Integration
Quick Export:
# LangChain Documents (JSON)
skill-seekers package output/django --target langchain
# → output/django-langchain.json
# LlamaIndex TextNodes (JSON)
skill-seekers package output/django --target llama-index
# → output/django-llama-index.json
# Markdown (Universal)
skill-seekers package output/django --target markdown
# → output/django-markdown/SKILL.md + references/
Complete RAG Pipeline Guide: RAG Pipelines Documentation
🧠 AI Coding Assistant Integrations
Transform any framework documentation into expert coding context for 4+ AI assistants:
- ✅ Cursor IDE - Generate `.cursorrules` for AI-powered code suggestions
  - Perfect for: Framework-specific code generation, consistent patterns
  - Works with: Cursor IDE (VS Code fork)
  - Guide: Cursor Integration
  - Example: Cursor React Skill
- ✅ Windsurf - Customize Windsurf's AI assistant context with `.windsurfrules`
  - Perfect for: IDE-native AI assistance, flow-based coding
  - Works with: Windsurf IDE by Codeium
  - Guide: Windsurf Integration
  - Example: Windsurf FastAPI Context
- ✅ Cline (VS Code) - System prompts + MCP for VS Code agent
  - Perfect for: Agentic code generation in VS Code
  - Works with: Cline extension for VS Code
  - Guide: Cline Integration
  - Example: Cline Django Assistant
- ✅ Continue.dev - Context servers for IDE-agnostic AI
  - Perfect for: Multi-IDE environments (VS Code, JetBrains, Vim), custom LLM providers
  - Works with: Any IDE with Continue.dev plugin
  - Guide: Continue Integration
  - Example: Continue Universal Context
Quick Export for AI Coding Tools:
# For any AI coding assistant (Cursor, Windsurf, Cline, Continue.dev)
skill-seekers scrape --config configs/django.json
skill-seekers package output/django --target claude # or --target markdown
# Copy to your project (example for Cursor)
cp output/django-claude/SKILL.md my-project/.cursorrules
# Or for Windsurf
cp output/django-claude/SKILL.md my-project/.windsurf/rules/django.md
# Or for Cline
cp output/django-claude/SKILL.md my-project/.clinerules
# Or for Continue.dev (HTTP server)
python examples/continue-dev-universal/context_server.py
# Configure in ~/.continue/config.json
Integration Hub: All AI System Integrations
🌊 Three-Stream GitHub Architecture
- ✅ Triple-Stream Analysis - Split GitHub repos into Code, Docs, and Insights streams
- ✅ Unified Codebase Analyzer - Works with GitHub URLs AND local paths
- ✅ C3.x as Analysis Depth - Choose 'basic' (1-2 min) or 'c3x' (20-60 min) analysis
- ✅ Enhanced Router Generation - GitHub metadata, README quick start, common issues
- ✅ Issue Integration - Top problems and solutions from GitHub issues
- ✅ Smart Routing Keywords - GitHub labels weighted 2x for better topic detection
Three Streams Explained:
- Stream 1: Code - Deep C3.x analysis (patterns, examples, guides, configs, architecture)
- Stream 2: Docs - Repository documentation (README, CONTRIBUTING, docs/*.md)
- Stream 3: Insights - Community knowledge (issues, labels, stars, forks)
from skill_seekers.cli.unified_codebase_analyzer import UnifiedCodebaseAnalyzer
# Analyze GitHub repo with all three streams
analyzer = UnifiedCodebaseAnalyzer()
result = analyzer.analyze(
    source="https://github.com/facebook/react",
    depth="c3x",  # or "basic" for fast analysis
    fetch_github_metadata=True,
)
# Access code stream (C3.x analysis)
print(f"Design patterns: {len(result.code_analysis['c3_1_patterns'])}")
print(f"Test examples: {result.code_analysis['c3_2_examples_count']}")
# Access docs stream (repository docs)
print(f"README: {result.github_docs['readme'][:100]}")
# Access insights stream (GitHub metadata)
print(f"Stars: {result.github_insights['metadata']['stars']}")
print(f"Common issues: {len(result.github_insights['common_problems'])}")
See complete documentation: Three-Stream Implementation Summary
🔐 Smart Rate Limit Management & Configuration
- ✅ Multi-Token Configuration System - Manage multiple GitHub accounts (personal, work, OSS)
  - Secure config storage at `~/.config/skill-seekers/config.json` (600 permissions)
  - Per-profile rate limit strategies: `prompt`, `wait`, `switch`, `fail`
  - Configurable timeout per profile (default: 30 min, prevents indefinite waits)
  - Smart fallback chain: CLI arg → Env var → Config file → Prompt
  - API key management for Claude, Gemini, OpenAI
- ✅ Interactive Configuration Wizard - Beautiful terminal UI for easy setup
- Browser integration for token creation (auto-opens GitHub, etc.)
- Token validation and connection testing
- Visual status display with color coding
- ✅ Intelligent Rate Limit Handler - No more indefinite waits!
- Upfront warning about rate limits (60/hour vs 5000/hour)
- Real-time detection from GitHub API responses
- Live countdown timers with progress
- Automatic profile switching when rate limited
- Four strategies: prompt (ask), wait (countdown), switch (try another), fail (abort)
- ✅ Resume Capability - Continue interrupted jobs
- Auto-save progress at configurable intervals (default: 60 sec)
- List all resumable jobs with progress details
- Auto-cleanup of old jobs (default: 7 days)
- ✅ CI/CD Support - Non-interactive mode for automation
  - `--non-interactive` flag fails fast without prompts
  - `--profile` flag to select specific GitHub account
  - Clear error messages for pipeline logs
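The "CLI arg → Env var → Config file → Prompt" fallback chain can be sketched as follows. The config schema used here (`{"profiles": {name: {"token": ...}}}`) is a guess for illustration only; the real file layout may differ:

```python
import json
import os
from pathlib import Path

def resolve_github_token(cli_token=None, profile="default",
                         config_path="~/.config/skill-seekers/config.json"):
    """Sketch of the token fallback chain described above."""
    if cli_token:                       # 1. explicit CLI argument wins
        return cli_token
    env_token = os.environ.get("GITHUB_TOKEN")
    if env_token:                       # 2. environment variable
        return env_token
    cfg_file = Path(config_path).expanduser()
    if cfg_file.exists():               # 3. per-profile config file
        cfg = json.loads(cfg_file.read_text(encoding="utf-8"))
        token = cfg.get("profiles", {}).get(profile, {}).get("token")
        if token:
            return token
    return None                         # 4. caller falls back to prompting
```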
Quick Setup:
# One-time configuration (5 minutes)
skill-seekers config --github
# Use specific profile for private repos
skill-seekers github --repo mycompany/private-repo --profile work
# CI/CD mode (fail fast, no prompts)
skill-seekers github --repo owner/repo --non-interactive
# Resume interrupted job
skill-seekers resume --list
skill-seekers resume github_react_20260117_143022
Rate Limit Strategies Explained:
- prompt (default) - Ask what to do when rate limited (wait, switch, setup token, cancel)
- wait - Automatically wait with countdown timer (respects timeout)
- switch - Automatically try next available profile (for multi-account setups)
- fail - Fail immediately with clear error (perfect for CI/CD)
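The four strategies can be pictured with a simplified dispatcher — an illustration of the behaviour described above, not the actual handler (the real one also runs countdown timers and interactive prompts):

```python
def handle_rate_limit(strategy, reset_in_seconds, profiles, current):
    """Simplified sketch of the prompt/wait/switch/fail strategies."""
    if strategy == "fail":
        # CI/CD mode: abort immediately with a clear error
        raise RuntimeError(f"Rate limited; resets in {reset_in_seconds}s")
    if strategy == "switch":
        # Try the next available profile, if any
        others = [p for p in profiles if p != current]
        if others:
            return ("switch", others[0])
        strategy = "wait"               # nothing to switch to, so wait
    if strategy == "wait":
        return ("wait", reset_in_seconds)  # caller sleeps with a countdown
    return ("prompt", None)             # ask the user what to do
```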
🎯 Bootstrap Skill - Self-Hosting
Generate skill-seekers as a Claude Code skill to use within Claude:
# Generate the skill
./scripts/bootstrap_skill.sh
# Install to Claude Code
cp -r output/skill-seekers ~/.claude/skills/
What you get:
- ✅ Complete skill documentation - All CLI commands and usage patterns
- ✅ CLI command reference - Every tool and its options documented
- ✅ Quick start examples - Common workflows and best practices
- ✅ Auto-generated API docs - Code analysis, patterns, and examples
🔐 Private Config Repositories
- ✅ Git-Based Config Sources - Fetch configs from private/team git repositories
- ✅ Multi-Source Management - Register unlimited GitHub, GitLab, Bitbucket repos
- ✅ Team Collaboration - Share custom configs across 3-5 person teams
- ✅ Enterprise Support - Scale to 500+ developers with priority-based resolution
- ✅ Secure Authentication - Environment variable tokens (GITHUB_TOKEN, GITLAB_TOKEN)
- ✅ Intelligent Caching - Clone once, pull updates automatically
- ✅ Offline Mode - Work with cached configs when offline
🤖 Codebase Analysis (C3.x)
C3.4: Configuration Pattern Extraction with AI Enhancement
- ✅ 9 Config Formats - JSON, YAML, TOML, ENV, INI, Python, JavaScript, Dockerfile, Docker Compose
- ✅ 7 Pattern Types - Database, API, logging, cache, email, auth, server configurations
- ✅ AI Enhancement - Optional dual-mode AI analysis (API + LOCAL)
- Explains what each config does
- Suggests best practices and improvements
- Security analysis - Finds hardcoded secrets, exposed credentials
- ✅ Auto-Documentation - Generates JSON + Markdown documentation of all configs
- ✅ MCP Integration - `extract_config_patterns` tool with enhancement support
C3.3: AI-Enhanced How-To Guides
- ✅ Comprehensive AI Enhancement - Transforms basic guides into professional tutorials
- ✅ 5 Automatic Improvements - Step descriptions, troubleshooting, prerequisites, next steps, use cases
- ✅ Dual-Mode Support - API mode (Claude API) or LOCAL mode (Claude Code CLI)
- ✅ No API Costs with LOCAL Mode - FREE enhancement using your Claude Code Max plan
- ✅ Quality Transformation - 75-line templates → 500+ line comprehensive guides
Usage:
# Quick analysis (1-2 min, basic features only)
skill-seekers analyze --directory tests/ --quick
# Comprehensive analysis with AI (20-60 min, all features)
skill-seekers analyze --directory tests/ --comprehensive
# With AI enhancement
skill-seekers analyze --directory tests/ --enhance
Full Documentation: docs/HOW_TO_GUIDES.md
🔄 Enhancement Workflow Presets
Reusable YAML-defined enhancement pipelines that control how AI transforms your raw documentation into a polished skill.
- ✅ 5 Bundled Presets — `default`, `minimal`, `security-focus`, `architecture-comprehensive`, `api-documentation`
- ✅ User-Defined Presets — add custom workflows to `~/.config/skill-seekers/workflows/`
- ✅ Multiple Workflows — chain two or more workflows in one command
- ✅ Fully Managed CLI — list, inspect, copy, add, remove, and validate workflows
# Apply a single workflow
skill-seekers create ./my-project --enhance-workflow security-focus
# Chain multiple workflows (applied in order)
skill-seekers create ./my-project \
--enhance-workflow security-focus \
--enhance-workflow minimal
# Manage presets
skill-seekers workflows list # List all (bundled + user)
skill-seekers workflows show security-focus # Print YAML content
skill-seekers workflows copy security-focus # Copy to user dir for editing
skill-seekers workflows add ./my-workflow.yaml # Install a custom preset
skill-seekers workflows remove my-workflow # Remove a user preset
skill-seekers workflows validate security-focus # Validate preset structure
# Copy multiple at once
skill-seekers workflows copy security-focus minimal api-documentation
# Add multiple files at once
skill-seekers workflows add ./wf-a.yaml ./wf-b.yaml
# Remove multiple at once
skill-seekers workflows remove my-wf-a my-wf-b
YAML preset format:
name: security-focus
description: "Security-focused review: vulnerabilities, auth, data handling"
version: "1.0"
stages:
  - name: vulnerabilities
    type: custom
    prompt: "Review for OWASP top 10 and common security vulnerabilities..."
  - name: auth-review
    type: custom
    prompt: "Examine authentication and authorisation patterns..."
    uses_history: true
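A structural check over a loaded preset might look like this — a minimal illustration based on the fields in the YAML format above, not the bundled `workflows validate` implementation:

```python
def validate_workflow(preset):
    """Minimal structural check for a workflow preset dict.

    Field names mirror the YAML format shown above; the real validator
    likely checks more (stage types, version strings, prompt length).
    """
    errors = []
    for key in ("name", "description", "stages"):
        if key not in preset:
            errors.append(f"missing top-level key: {key}")
    for i, stage in enumerate(preset.get("stages", [])):
        if "name" not in stage:
            errors.append(f"stage {i}: missing name")
        if stage.get("type") == "custom" and "prompt" not in stage:
            errors.append(f"stage {i}: custom stage needs a prompt")
    return errors
```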
⚡ Performance & Scale
- ✅ Async Mode - 2-3x faster scraping with async/await (use `--async` flag)
- ✅ Large Documentation Support - Handle 10K-40K+ page docs with intelligent splitting
- ✅ Router/Hub Skills - Intelligent routing to specialized sub-skills
- ✅ Parallel Scraping - Process multiple skills simultaneously
- ✅ Checkpoint/Resume - Never lose progress on long scrapes
- ✅ Caching System - Scrape once, rebuild instantly
✅ Quality Assurance
- ✅ Fully Tested - 2,540+ tests with comprehensive coverage
📦 Installation
# Basic install (documentation scraping, GitHub analysis, PDF, packaging)
pip install skill-seekers
# With all LLM platform support
pip install skill-seekers[all-llms]
# With MCP server
pip install skill-seekers[mcp]
# Everything
pip install skill-seekers[all]
Need help choosing? Run the setup wizard:
skill-seekers-setup
Installation Options
| Install | Features |
|---|---|
| `pip install skill-seekers` | Scraping, GitHub analysis, PDF, all platforms |
| `pip install skill-seekers[gemini]` | + Google Gemini support |
| `pip install skill-seekers[openai]` | + OpenAI ChatGPT support |
| `pip install skill-seekers[all-llms]` | + All LLM platforms |
| `pip install skill-seekers[mcp]` | + MCP server for Claude Code, Cursor, etc. |
| `pip install skill-seekers[video]` | + YouTube/Vimeo transcript & metadata extraction |
| `pip install skill-seekers[video-full]` | + Whisper transcription & visual frame extraction |
| `pip install skill-seekers[jupyter]` | + Jupyter Notebook support |
| `pip install skill-seekers[pptx]` | + PowerPoint support |
| `pip install skill-seekers[confluence]` | + Confluence wiki support |
| `pip install skill-seekers[notion]` | + Notion pages support |
| `pip install skill-seekers[rss]` | + RSS/Atom feed support |
| `pip install skill-seekers[chat]` | + Slack/Discord chat export support |
| `pip install skill-seekers[asciidoc]` | + AsciiDoc document support |
| `pip install skill-seekers[all]` | Everything enabled |
Video visual deps (GPU-aware): After installing `skill-seekers[video-full]`, run `skill-seekers video --setup` to auto-detect your GPU and install the correct PyTorch variant + easyocr. This is the recommended way to install visual extraction dependencies.
🚀 One-Command Install Workflow
The fastest way to go from config to uploaded skill - complete automation:
# Install React skill from official configs (auto-uploads to Claude)
skill-seekers install --config react
# Install from local config file
skill-seekers install --config configs/custom.json
# Install without uploading (package only)
skill-seekers install --config django --no-upload
# Preview workflow without executing
skill-seekers install --config react --dry-run
Time: 20-45 minutes total | Quality: Production-ready (9/10) | Cost: Free
Phases executed:
📥 PHASE 1: Fetch Config (if config name provided)
📖 PHASE 2: Scrape Documentation
✨ PHASE 3: AI Enhancement (MANDATORY - no skip option)
📦 PHASE 4: Package Skill
☁️ PHASE 5: Upload to Claude (optional, requires API key)
Requirements:
- ANTHROPIC_API_KEY environment variable (for auto-upload)
- Claude Code Max plan (for local AI enhancement)
📊 Feature Matrix
Skill Seekers supports 12 LLM platforms, 17 source types, and full feature parity across all targets.
Platforms: Claude AI, Google Gemini, OpenAI ChatGPT, MiniMax AI, Generic Markdown, OpenCode, Kimi (Moonshot AI), DeepSeek AI, Qwen (Alibaba), OpenRouter, Together AI, Fireworks AI
Source Types: Documentation websites, GitHub repos, PDFs, Word (.docx), EPUB, Video, Local codebases, Jupyter Notebooks, Local HTML, OpenAPI/Swagger, AsciiDoc, PowerPoint (.pptx), RSS/Atom feeds, Man pages, Confluence wikis, Notion pages, Slack/Discord chat exports
See Complete Feature Matrix for detailed platform and feature support.
Quick Platform Comparison
| Feature | Claude | Gemini | OpenAI | MiniMax | Markdown |
|---|---|---|---|---|---|
| Format | ZIP + YAML | tar.gz | ZIP + Vector | ZIP + Knowledge | ZIP |
| Upload | ✅ API | ✅ API | ✅ API | ✅ API | ❌ Manual |
| Enhancement | ✅ Sonnet 4 | ✅ 2.0 Flash | ✅ GPT-4o | ✅ M2.7 | ❌ None |
| All Skill Modes | ✅ | ✅ | ✅ | ✅ | ✅ |
Usage Examples
Documentation Scraping
# Scrape documentation website
skill-seekers scrape --config configs/react.json
# Quick scrape without config
skill-seekers scrape --url https://react.dev --name react
# With async mode (2-3x faster)
skill-seekers scrape --config configs/godot.json --async --workers 8
PDF Extraction
# Basic PDF extraction
skill-seekers pdf --pdf docs/manual.pdf --name myskill
# Advanced features: extract tables, parallel processing with 8 workers
skill-seekers pdf --pdf docs/manual.pdf --name myskill \
  --extract-tables --parallel --workers 8
# Scanned PDFs (requires: pip install pytesseract Pillow)
skill-seekers pdf --pdf docs/scanned.pdf --name myskill --ocr
Video Extraction
# Install video support
pip install skill-seekers[video] # Transcripts + metadata
pip install skill-seekers[video-full] # + Whisper + visual frame extraction
# Auto-detect GPU and install visual deps (PyTorch + easyocr)
skill-seekers video --setup
# Extract from YouTube video
skill-seekers video --url https://www.youtube.com/watch?v=dQw4w9WgXcQ --name mytutorial
# Extract from a YouTube playlist
skill-seekers video --playlist https://www.youtube.com/playlist?list=... --name myplaylist
# Extract from a local video file
skill-seekers video --video-file recording.mp4 --name myrecording
# Extract with visual frame analysis (requires video-full deps)
skill-seekers video --url https://www.youtube.com/watch?v=... --name mytutorial --visual
# With AI enhancement (cleans OCR + generates polished SKILL.md)
skill-seekers video --url https://www.youtube.com/watch?v=... --visual --enhance-level 2
# Clip a specific section of a video (supports seconds, MM:SS, HH:MM:SS)
skill-seekers video --url https://www.youtube.com/watch?v=... --start-time 1:30 --end-time 5:00
# Use Vision API for low-confidence OCR frames (requires ANTHROPIC_API_KEY)
skill-seekers video --url https://www.youtube.com/watch?v=... --visual --vision-ocr
# Re-build skill from previously extracted data (skip download)
skill-seekers video --from-json output/mytutorial/video_data/extracted_data.json --name mytutorial
Full guide: See docs/VIDEO_GUIDE.md for complete CLI reference, visual pipeline details, AI enhancement options, and troubleshooting.
GitHub Repository Analysis
# Basic repository scraping
skill-seekers github --repo facebook/react
# With authentication (higher rate limits)
export GITHUB_TOKEN=ghp_your_token_here
skill-seekers github --repo facebook/react
# Customize what to include: GitHub Issues (up to 100) and CHANGELOG.md
skill-seekers github --repo django/django \
  --include-issues --max-issues 100 --include-changelog
Unified Multi-Source Scraping
Combine documentation + GitHub + PDF into one unified skill with conflict detection:
# Use existing unified configs
skill-seekers unified --config configs/react_unified.json
skill-seekers unified --config configs/django_unified.json
# Or create unified config
cat > configs/myframework_unified.json << 'EOF'
{
  "name": "myframework",
  "merge_mode": "rule-based",
  "sources": [
    {
      "type": "documentation",
      "base_url": "https://docs.myframework.com/",
      "max_pages": 200
    },
    {
      "type": "github",
      "repo": "owner/myframework",
      "code_analysis_depth": "surface"
    }
  ]
}
EOF
skill-seekers unified --config configs/myframework_unified.json
Conflict Detection automatically finds:
- 🔴 Missing in code (high): Documented but not implemented
- 🟡 Missing in docs (medium): Implemented but not documented
- ⚠️ Signature mismatch: Different parameters/types
- ℹ️ Description mismatch: Different explanations
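The first two conflict classes reduce to set differences between documented and implemented API names — a simplified sketch of the idea, not the pipeline's actual detector (signature and description mismatches would need per-API metadata):

```python
def detect_api_conflicts(doc_apis, code_apis):
    """Compare documented API names against implemented ones."""
    documented, implemented = set(doc_apis), set(code_apis)
    return {
        "missing_in_code": sorted(documented - implemented),  # high severity
        "missing_in_docs": sorted(implemented - documented),  # medium severity
    }
```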
Full Guide: See docs/UNIFIED_SCRAPING.md for complete documentation.
Private Config Repositories
Share custom configs across teams using private git repositories:
# Option 1: Using MCP tools (recommended)
# Register your team's private repo
add_config_source(
    name="team",
    git_url="https://github.com/mycompany/skill-configs.git",
    token_env="GITHUB_TOKEN"
)
# Fetch config from team repo
fetch_config(source="team", config_name="internal-api")
Supported Platforms:
- GitHub (`GITHUB_TOKEN`), GitLab (`GITLAB_TOKEN`), Gitea (`GITEA_TOKEN`), Bitbucket (`BITBUCKET_TOKEN`)
Full Guide: See docs/GIT_CONFIG_SOURCES.md for complete documentation.
How It Works
graph LR
A[Documentation Website] --> B[Skill Seekers]
B --> C[Scraper]
B --> D[AI Enhancement]
B --> E[Packager]
C --> F[Organized References]
D --> F
F --> E
E --> G[Claude Skill .zip]
G --> H[Upload to Claude AI]
- Detect llms.txt: Checks for llms-full.txt, llms.txt, llms-small.txt first
- Scrape: Extracts all pages from documentation
- Categorize: Organizes content into topics (API, guides, tutorials, etc.)
- Enhance: AI analyzes docs and creates comprehensive SKILL.md with examples
- Package: Bundles everything into a Claude-ready `.zip` file
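The llms.txt detection step can be sketched as follows, with the candidate order taken from the list above — an illustration, not the scraper's actual probe logic:

```python
from urllib.parse import urljoin

def llms_txt_candidates(base_url):
    """Candidate URLs probed in order of preference."""
    return [urljoin(base_url, name)
            for name in ("llms-full.txt", "llms.txt", "llms-small.txt")]
```

The scraper would request each URL in turn and use the first that returns content, falling back to full-page scraping otherwise.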
Architecture
The system is organized into 8 core modules and 5 utility modules (~200 classes total):
| Module | Purpose | Key Classes |
|---|---|---|
| CLICore | Git-style command dispatcher | CLIDispatcher, SourceDetector, CreateCommand |
| Scrapers | 17 source-type extractors | DocToSkillConverter, GitHubScraper, UnifiedScraper |
| Adaptors | 20+ output platform formats | SkillAdaptor (ABC), ClaudeAdaptor, LangChainAdaptor |
| Analysis | C3.x codebase analysis pipeline | UnifiedCodebaseAnalyzer, PatternRecognizer, 10 GoF detectors |
| Enhancement | AI-powered skill improvement | AIEnhancer, UnifiedEnhancer, WorkflowEngine |
| Packaging | Package, upload, install skills | PackageSkill, InstallAgent |
| MCP | FastMCP server (26 tools) | SkillSeekerMCPServer, 8 tool modules |
| Sync | Doc change detection | ChangeDetector, SyncMonitor, Notifier |
Utility modules: Parsers (28 CLI parsers), Storage (S3/GCS/Azure), Embedding (multi-provider vectors), Benchmark (performance), Utilities (16 shared helpers).
Full UML diagrams: docs/UML_ARCHITECTURE.md | StarUML project: docs/UML/skill_seekers.mdj | HTML API reference: docs/UML/html/
📋 Prerequisites
Before you start, make sure you have:
- Python 3.10 or higher - Download | Check: `python3 --version`
- Git - Download | Check: `git --version`
- 15-30 minutes for first-time setup
First time user? → Start Here: Bulletproof Quick Start Guide 🎯
📤 Uploading Skills to Claude
Once your skill is packaged, you need to upload it to Claude:
Option 1: Automatic Upload (API-based)
```bash
# Set your API key (one-time)
export ANTHROPIC_API_KEY=sk-ant-...

# Package and upload automatically
skill-seekers package output/react/ --upload

# OR upload an existing .zip
skill-seekers upload output/react.zip
```
Option 2: Manual Upload (No API Key)
```bash
# Package skill
skill-seekers package output/react/
# → Creates output/react.zip
```

Then manually upload:
- Go to https://claude.ai/skills
- Click "Upload Skill"
- Select output/react.zip
Option 3: MCP (Claude Code)
In Claude Code, just ask:
"Package and upload the React skill"
🤖 Installing to AI Agents
Skill Seekers can automatically install skills to 18 AI coding agents.
```bash
# Install to a specific agent
skill-seekers install-agent output/react/ --agent cursor

# Install to all agents at once
skill-seekers install-agent output/react/ --agent all

# Preview without installing
skill-seekers install-agent output/react/ --agent cursor --dry-run
```
Supported Agents
| Agent | Path | Type |
|---|---|---|
| Claude Code | ~/.claude/skills/ | Global |
| Cursor | .cursor/skills/ | Project |
| VS Code / Copilot | .github/skills/ | Project |
| Amp | ~/.amp/skills/ | Global |
| Goose | ~/.config/goose/skills/ | Global |
| OpenCode | ~/.opencode/skills/ | Global |
| Windsurf | ~/.windsurf/skills/ | Global |
| Roo Code | .roo/skills/ | Project |
| Cline | .cline/skills/ | Project |
| Aider | ~/.aider/skills/ | Global |
| Bolt | .bolt/skills/ | Project |
| Kilo Code | .kilo/skills/ | Project |
| Continue | ~/.continue/skills/ | Global |
| Kimi Code | ~/.kimi/skills/ | Global |
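Resolving where a skill lands for a given agent boils down to the table above: paths starting with `~` are global installs, relative paths are project-local. A minimal sketch (the mapping is copied from the table; the resolver function itself is illustrative, not the project's `InstallAgent` code):

```python
from pathlib import Path

# Subset of the install paths from the table above; "~" marks global
# installs, relative paths are per-project. The resolver is illustrative.
AGENT_PATHS = {
    "claude-code": "~/.claude/skills/",
    "cursor": ".cursor/skills/",
    "copilot": ".github/skills/",
    "windsurf": "~/.windsurf/skills/",
    "cline": ".cline/skills/",
    "kimi-code": "~/.kimi/skills/",
}

def resolve_install_dir(agent: str, project_root: str = ".") -> Path:
    raw = AGENT_PATHS[agent]
    if raw.startswith("~"):
        return Path(raw).expanduser()   # global install (home directory)
    return Path(project_root) / raw     # project-local install

print(resolve_install_dir("cursor", "/repo"))  # /repo/.cursor/skills
```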
🔌 MCP Integration (26 Tools)
Skill Seekers ships an MCP server for use from Claude Code, Cursor, Windsurf, VS Code + Cline, or IntelliJ IDEA.
```bash
# stdio mode (Claude Code, VS Code + Cline)
python -m skill_seekers.mcp.server_fastmcp

# HTTP mode (Cursor, Windsurf, IntelliJ)
python -m skill_seekers.mcp.server_fastmcp --transport http --port 8765

# Auto-configure all agents at once
./setup_mcp.sh
```
All 26 tools available:
- Core (9): `list_configs`, `generate_config`, `validate_config`, `estimate_pages`, `scrape_docs`, `package_skill`, `upload_skill`, `enhance_skill`, `install_skill`
- Extended (10): `scrape_github`, `scrape_pdf`, `unified_scrape`, `merge_sources`, `detect_conflicts`, `add_config_source`, `fetch_config`, `list_config_sources`, `remove_config_source`, `split_config`
- Vector DB (4): `export_to_chroma`, `export_to_weaviate`, `export_to_faiss`, `export_to_qdrant`
- Cloud (3): `cloud_upload`, `cloud_download`, `cloud_list`
Full Guide: docs/MCP_SETUP.md
⚙️ Configuration
Available Presets (24+)
```bash
# List all presets
skill-seekers list-configs
```
| Category | Presets |
|---|---|
| Web Frameworks | react, vue, angular, svelte, nextjs |
| Python | django, flask, fastapi, sqlalchemy, pytest |
| Game Development | godot, pygame, unity |
| Tools & DevOps | docker, kubernetes, terraform, ansible |
| Unified (Docs + GitHub) | react-unified, vue-unified, nextjs-unified, and more |
Creating Your Own Config
```bash
# Option 1: Interactive
skill-seekers scrape --interactive

# Option 2: Copy and edit a preset
cp configs/react.json configs/myframework.json
nano configs/myframework.json
skill-seekers scrape --config configs/myframework.json
```
Config File Structure
```json
{
  "name": "myframework",
  "description": "When to use this skill",
  "base_url": "https://docs.myframework.com/",
  "selectors": {
    "main_content": "article",
    "title": "h1",
    "code_blocks": "pre code"
  },
  "url_patterns": {
    "include": ["/docs", "/guide"],
    "exclude": ["/blog", "/about"]
  },
  "categories": {
    "getting_started": ["intro", "quickstart"],
    "api": ["api", "reference"]
  },
  "rate_limit": 0.5,
  "max_pages": 500
}
```
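Before pointing the scraper at a hand-written config, a quick sanity check can catch typos in field names. A minimal sketch; the required/optional split below is an assumption based on the example structure, not the tool's actual schema (use `validate_config` via MCP for the real check):

```python
# Minimal config sanity check. The REQUIRED/OPTIONAL split is an
# assumption drawn from the example config, not the tool's real schema.
import json

REQUIRED = {"name", "base_url"}
OPTIONAL = {"description", "selectors", "url_patterns",
            "categories", "rate_limit", "max_pages"}

def check_config(text):
    cfg = json.loads(text)
    errors = [f"missing required field: {f}" for f in sorted(REQUIRED - cfg.keys())]
    errors += [f"unknown field: {f}" for f in sorted(cfg.keys() - REQUIRED - OPTIONAL)]
    return errors

print(check_config('{"name": "myframework"}'))  # ['missing required field: base_url']
```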
Where to Store Configs
The tool searches in this order:
1. Exact path as provided
2. `./configs/` (current directory)
3. `~/.config/skill-seekers/configs/` (user config directory)
4. SkillSeekersWeb.com API (preset configs)
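That lookup order can be sketched as a small resolver. The web-API fallback is left as a stub; the function is illustrative, not the project's actual resolution code:

```python
from pathlib import Path

# Search-order sketch for config resolution, following the list above.
# The web-API fallback is stubbed; this is illustrative, not the real code.
def find_config(name, cwd="."):
    candidates = [
        Path(name),                                            # 1. exact path
        Path(cwd) / "configs" / name,                          # 2. ./configs/
        Path.home() / ".config/skill-seekers/configs" / name,  # 3. user config dir
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None  # 4. would fall back to the SkillSeekersWeb.com API
```

For example, `find_config("react.json")` returns `./configs/react.json` when run from the repo root, and `None` (triggering the API fallback) otherwise.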
📊 What Gets Created
```text
output/
├── godot_data/              # Scraped raw data
│   ├── pages/               # JSON files (one per page)
│   └── summary.json         # Overview
│
└── godot/                   # The skill
    ├── SKILL.md             # Enhanced with real examples
    ├── references/          # Categorized docs
    │   ├── index.md
    │   ├── getting_started.md
    │   ├── scripting.md
    │   └── ...
    ├── scripts/             # Empty (add your own)
    └── assets/              # Empty (add your own)
```
🐛 Troubleshooting
No Content Extracted?
- Check your `main_content` selector
- Try: `article`, `main`, `div[role="main"]`
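To decide which selector to try, it can help to count candidate container tags in a saved page. A stdlib-only diagnostic sketch (the real scraper uses full CSS selectors; this only counts tag names):

```python
# Quick diagnostic: count candidate container tags in a page to pick a
# main_content selector. Stdlib-only sketch; the real scraper uses CSS selectors.
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1

page = ("<html><body><main><article><h1>Title</h1>"
        "<pre><code>x = 1</code></pre></article></main></body></html>")
counter = TagCounter()
counter.feed(page)
for candidate in ("article", "main", "div"):
    print(candidate, counter.counts.get(candidate, 0))
```

Here `article` and `main` each appear once and `div` not at all, so `article` would be a reasonable `main_content` choice for this page.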
Data Exists But Won't Use It?
```bash
# Force re-scrape
rm -rf output/myframework_data/
skill-seekers scrape --config configs/myframework.json
```
Categories Not Good?
Edit the config categories section with better keywords.
Want to Update Docs?
```bash
# Delete old data and re-scrape
rm -rf output/godot_data/
skill-seekers scrape --config configs/godot.json
```
Enhancement Not Working?
```bash
# Check whether the API key is set
echo $ANTHROPIC_API_KEY

# Try LOCAL mode instead (uses Claude Code Max, no API key needed)
skill-seekers enhance output/react/ --mode LOCAL

# Monitor background enhancement status
skill-seekers enhance-status output/react/ --watch
```
GitHub Rate Limit Issues?
```bash
# Set a GitHub token (5000 req/hour vs 60/hour anonymous)
export GITHUB_TOKEN=ghp_your_token_here

# Or configure multiple profiles
skill-seekers config --github
```
📈 Performance
| Task | Time | Notes |
|---|---|---|
| Scraping (sync) | 15-45 min | First time only, thread-based |
| Scraping (async) | 5-15 min | 2-3x faster with --async flag |
| Building | 1-3 min | Fast rebuild from cache |
| Re-building | <1 min | With --skip-scrape |
| Enhancement (LOCAL) | 30-60 sec | Uses Claude Code Max |
| Enhancement (API) | 20-40 sec | Requires API key |
| Video (transcript) | 1-3 min | YouTube/local, transcript only |
| Video (visual) | 5-15 min | + OCR frame extraction |
| Packaging | 5-10 sec | Final .zip creation |
📚 Documentation
Getting Started
- BULLETPROOF_QUICKSTART.md - 🎯 START HERE if you're new!
- QUICKSTART.md - Quick start for experienced users
- TROUBLESHOOTING.md - Common issues and solutions
- docs/QUICK_REFERENCE.md - One-page cheat sheet
Architecture
- docs/UML_ARCHITECTURE.md - UML architecture overview with 14 diagrams
- docs/UML/exports/ - PNG diagram exports (package overview + 13 class diagrams)
- docs/UML/html/ - Full HTML API reference (all classes, operations, attributes)
- docs/UML/skill_seekers.mdj - StarUML project file (open with StarUML)
Guides
- docs/LARGE_DOCUMENTATION.md - Handle 10K-40K+ page docs
- ASYNC_SUPPORT.md - Async mode guide (2-3x faster scraping)
- docs/ENHANCEMENT_MODES.md - AI enhancement modes guide
- docs/MCP_SETUP.md - MCP integration setup
- docs/UNIFIED_SCRAPING.md - Multi-source scraping
- docs/VIDEO_GUIDE.md - Video extraction guide
Integration Guides
- docs/integrations/LANGCHAIN.md - LangChain RAG
- docs/integrations/CURSOR.md - Cursor IDE
- docs/integrations/WINDSURF.md - Windsurf IDE
- docs/integrations/CLINE.md - Cline (VS Code)
- docs/integrations/RAG_PIPELINES.md - All RAG pipelines
📝 License
MIT License - see LICENSE file for details
Happy skill building! 🚀


