Merge branch 'development' into feature/router-quality-improvements

Integrated multi-source support from the development branch into the feature
branch's C3.x auto-cloning and cache system. This merge combines TWO major features:

FEATURE BRANCH (C3.x + Cache):
- Automatic GitHub repository cloning for C3.x analysis
- Hidden .skillseeker-cache/ directory for intermediate files
- Cache reuse for faster rebuilds
- AI skill quality improvements

DEVELOPMENT BRANCH (Multi-Source):
- Support for multiple sources of the same type (e.g. multiple GitHub repos or PDFs)
- List-based data storage with source indexing
- New configs: claude-code.json, medusa-mercurjs.json
- llms.txt downloader/parser enhancements
- New tests: test_markdown_parsing.py, test_multi_source.py

CONFLICT RESOLUTIONS:

1. configs/claude-code.json (COMPROMISE):
   - Feature branch had deleted it (config migration)
   - Development branch had enhanced it (47 Claude Code doc URLs)
   - Resolution: kept the file, with a _migration_note (preserves PR #244 work)

2. src/skill_seekers/cli/unified_scraper.py (INTEGRATED):
   Applied 8 changes for multi-source support, including:
   - List-based storage: {'github': [], 'documentation': [], 'pdf': []}
   - Source indexing with _source_counters
   - Unique naming: {name}_github_{idx}_{repo_id}
   - Unique data files: github_data_{idx}_{repo_id}.json
   - List append instead of dict assignment
   - Updated _clone_github_repo(repo_name, idx=0) signature
   - Applied same logic to _scrape_pdf()
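
The list-based storage and indexing scheme described above can be sketched roughly as follows. This is a minimal illustration, not the code in unified_scraper.py: the `SourceRegistry` class, the `add_github` method, and the way `repo_id` is derived from the repository name are all assumptions made for the example. Only the storage shape, the `_source_counters` name, and the two naming patterns come from the commit message.

```python
from collections import defaultdict


class SourceRegistry:
    """Hypothetical stand-in for the scraper's bookkeeping (not the project's class)."""

    def __init__(self, name):
        self.name = name
        # One list per source type, so a second repo is appended
        # instead of overwriting the first one.
        self.scraped_data = {"github": [], "documentation": [], "pdf": []}
        # Per-type counters that hand out idx values 0, 1, 2, ...
        self._source_counters = defaultdict(int)

    def add_github(self, repo, data):
        idx = self._source_counters["github"]
        self._source_counters["github"] += 1
        # Assumption: repo_id is the repo name with "/" replaced by "_".
        repo_id = repo.replace("/", "_")
        entry = {
            "idx": idx,
            "repo_id": repo_id,
            "skill_name": f"{self.name}_github_{idx}_{repo_id}",
            "data_file": f"github_data_{idx}_{repo_id}.json",
            "data": data,
        }
        # List append instead of dict assignment.
        self.scraped_data["github"].append(entry)
        return entry


registry = SourceRegistry("myskill")
first = registry.add_github("encode/httpx", {})
second = registry.add_github("facebook/react", {})
print(first["skill_name"])   # myskill_github_0_encode_httpx
print(second["data_file"])   # github_data_1_facebook_react.json
```

Under these assumptions a single-source config still yields idx 0, which is what keeps existing file names stable.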

3. src/skill_seekers/cli/unified_skill_builder.py (INTEGRATED):
   Applied 3 changes for multi-source synthesis:
   - _load_source_skill_mds(): Glob pattern for multiple sources
   - _generate_references(): Iterate through github_list
   - _generate_c3_analysis_references(repo_id): Per-repo C3.x references
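
A glob-based loader for several per-source SKILL.md files might look like the sketch below. The function name mirrors `_load_source_skill_mds()`, but the signature, the directory layout (`{name}_github_*` directories under one parent), and the returned dict are assumptions for illustration only.

```python
from pathlib import Path


def load_source_skill_mds(sources_dir, name):
    """Return {directory name: SKILL.md text} for every GitHub source directory.

    Assumes each source lives in <sources_dir>/<name>_github_<idx>_<repo_id>/.
    """
    skill_mds = {}
    pattern = f"{name}_github_*"
    # sorted() keeps the result deterministic across platforms.
    for source_dir in sorted(Path(sources_dir).glob(pattern)):
        skill_md = source_dir / "SKILL.md"
        if skill_md.is_file():
            skill_mds[source_dir.name] = skill_md.read_text(encoding="utf-8")
    return skill_mds
```

Directories without a SKILL.md are skipped rather than treated as errors, which seems a reasonable default when one of several sources fails to scrape.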

TESTING STRATEGY:

Backward Compatibility:
- Single-source configs work exactly as before (idx=0)

New Capabilities:
- Multiple GitHub repos: encode/httpx + facebook/react
- Multiple PDFs with unique indexing
- Mixed sources: docs + multiple GitHub repos

Pipeline Integrity:
- Scraper: Multi-source data collection with indexing
- Builder: Loads all source SKILL.md files
- Synthesis: Merges multiple sources with separators
- C3.x: Independent analysis per repo in unique subdirectories
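
The synthesis step, merging several sources with separators, could be sketched as below. The helper name `merge_skill_sections`, the `## Source:` heading, and the `---` separator are hypothetical choices; the commit message only states that multiple sources are merged with separators.

```python
def merge_skill_sections(skill_mds, separator="\n\n---\n\n"):
    """Join per-source SKILL.md texts into one document.

    skill_mds maps a source name to its SKILL.md text; insertion order is kept.
    """
    parts = []
    for source_name, text in skill_mds.items():
        # Label each section so the origin of the content stays visible.
        parts.append(f"## Source: {source_name}\n\n{text.strip()}")
    return separator.join(parts)
```

With a single source the separator never appears, so the output for existing single-source configs would differ only by the added heading.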

Result: Support MULTIPLE sources per type + C3.x analysis + cache system

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
yusyus
2026-01-12 00:11:31 +03:00
10 changed files with 1695 additions and 131 deletions

configs/claude-code.json (new file, 84 lines)

@@ -0,0 +1,84 @@
{
"_migration_note": "TODO: Migrate to external skill-seekers-configs repo. Kept temporarily to preserve PR #244 work.",
"name": "claude-code",
"description": "Claude Code CLI and development environment. Use for Claude Code features, tools, workflows, MCP integration, plugins, hooks, configuration, deployment, and AI-assisted development.",
"base_url": "https://code.claude.com/docs/en/",
"start_urls": [
"https://code.claude.com/docs/en/overview",
"https://code.claude.com/docs/en/quickstart",
"https://code.claude.com/docs/en/common-workflows",
"https://code.claude.com/docs/en/claude-code-on-the-web",
"https://code.claude.com/docs/en/desktop",
"https://code.claude.com/docs/en/chrome",
"https://code.claude.com/docs/en/vs-code",
"https://code.claude.com/docs/en/jetbrains",
"https://code.claude.com/docs/en/github-actions",
"https://code.claude.com/docs/en/gitlab-ci-cd",
"https://code.claude.com/docs/en/slack",
"https://code.claude.com/docs/en/sub-agents",
"https://code.claude.com/docs/en/plugins",
"https://code.claude.com/docs/en/discover-plugins",
"https://code.claude.com/docs/en/skills",
"https://code.claude.com/docs/en/output-styles",
"https://code.claude.com/docs/en/hooks-guide",
"https://code.claude.com/docs/en/headless",
"https://code.claude.com/docs/en/mcp",
"https://code.claude.com/docs/en/third-party-integrations",
"https://code.claude.com/docs/en/amazon-bedrock",
"https://code.claude.com/docs/en/google-vertex-ai",
"https://code.claude.com/docs/en/microsoft-foundry",
"https://code.claude.com/docs/en/network-config",
"https://code.claude.com/docs/en/llm-gateway",
"https://code.claude.com/docs/en/devcontainer",
"https://code.claude.com/docs/en/sandboxing",
"https://code.claude.com/docs/en/setup",
"https://code.claude.com/docs/en/iam",
"https://code.claude.com/docs/en/security",
"https://code.claude.com/docs/en/data-usage",
"https://code.claude.com/docs/en/monitoring-usage",
"https://code.claude.com/docs/en/costs",
"https://code.claude.com/docs/en/analytics",
"https://code.claude.com/docs/en/plugin-marketplaces",
"https://code.claude.com/docs/en/settings",
"https://code.claude.com/docs/en/terminal-config",
"https://code.claude.com/docs/en/model-config",
"https://code.claude.com/docs/en/memory",
"https://code.claude.com/docs/en/statusline",
"https://code.claude.com/docs/en/cli-reference",
"https://code.claude.com/docs/en/interactive-mode",
"https://code.claude.com/docs/en/slash-commands",
"https://code.claude.com/docs/en/checkpointing",
"https://code.claude.com/docs/en/hooks",
"https://code.claude.com/docs/en/plugins-reference",
"https://code.claude.com/docs/en/troubleshooting",
"https://code.claude.com/docs/en/legal-and-compliance"
],
"selectors": {
"main_content": "#content-area, #content-container, article, main",
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": ["/docs/en/"],
"exclude": [
"/docs/fr/", "/docs/de/", "/docs/it/", "/docs/ja/", "/docs/es/",
"/docs/ko/", "/docs/zh-CN/", "/docs/zh-TW/", "/docs/ru/",
"/docs/id/", "/docs/pt/", "/changelog", "github.com"
]
},
"categories": {
"getting_started": ["overview", "quickstart", "common-workflows"],
"ide_integrations": ["vs-code", "jetbrains", "desktop", "chrome", "claude-code-on-the-web", "slack"],
"ci_cd": ["github-actions", "gitlab-ci-cd"],
"building": ["sub-agents", "subagent", "plugins", "discover-plugins", "skills", "output-styles", "hooks-guide", "headless", "programmatic"],
"mcp": ["mcp", "model-context-protocol"],
"deployment": ["third-party-integrations", "amazon-bedrock", "google-vertex-ai", "microsoft-foundry", "network-config", "llm-gateway", "devcontainer", "sandboxing"],
"administration": ["setup", "iam", "security", "data-usage", "monitoring-usage", "costs", "analytics", "plugin-marketplaces"],
"configuration": ["settings", "terminal-config", "model-config", "memory", "statusline"],
"reference": ["cli-reference", "interactive-mode", "slash-commands", "checkpointing", "hooks", "plugins-reference"],
"troubleshooting": ["troubleshooting"],
"legal": ["legal-and-compliance"]
},
"rate_limit": 0.5,
"max_pages": 250
}
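
How `url_patterns` and `categories` are interpreted is not shown in this diff. The sketch below assumes plain substring matching, with exclusions winning over inclusions and the first matching category taking the page; both function names are hypothetical.

```python
def should_scrape(url, url_patterns):
    """Apply include/exclude substrings; an exclude match always wins."""
    include = url_patterns.get("include", [])
    exclude = url_patterns.get("exclude", [])
    if any(pattern in url for pattern in exclude):
        return False
    # An empty include list is treated as "allow everything".
    return not include or any(pattern in url for pattern in include)


def categorize(url, categories):
    """Return the first category whose keyword occurs in the URL's last segment."""
    slug = url.rstrip("/").rsplit("/", 1)[-1]
    for category, keywords in categories.items():
        if any(keyword in slug for keyword in keywords):
            return category
    return "other"


patterns = {"include": ["/docs/en/"], "exclude": ["/docs/fr/", "github.com"]}
categories = {"getting_started": ["overview", "quickstart"], "mcp": ["mcp"]}
print(should_scrape("https://code.claude.com/docs/en/mcp", patterns))          # True
print(should_scrape("https://code.claude.com/docs/fr/overview", patterns))     # False
print(categorize("https://code.claude.com/docs/en/quickstart", categories))    # getting_started
```

Under substring matching the order of the category keys matters: with the config above, "plugins-reference" would match the "plugins" keyword in "building" before it reaches "reference".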


@@ -0,0 +1,71 @@
{
"name": "medusa-mercurjs",
"description": "Complete Medusa v2 + MercurJS multi-vendor e-commerce framework knowledge. Use when building headless commerce applications, implementing multi-vendor marketplaces, or understanding Medusa modules/workflows.",
"merge_mode": "rule-based",
"sources": [
{
"type": "documentation",
"base_url": "https://docs.medusajs.com",
"llms_txt_url": "https://docs.medusajs.com/llms-full.txt",
"extract_api": true,
"selectors": {
"main_content": "main, article, .content",
"title": "h1",
"code_blocks": "pre"
},
"url_patterns": {
"include": [
"/learn",
"/resources"
],
"exclude": []
},
"categories": {
"installation": ["installation", "install", "docker", "update"],
"fundamentals": ["fundamentals", "api-routes", "data-models", "modules", "module-links", "workflows", "events-and-subscribers", "scheduled-jobs", "custom-cli-scripts", "admin", "environment-variables"],
"customization": ["customization", "custom-features", "extend-features", "integrate-systems", "customize-admin"],
"debugging_testing": ["debugging-and-testing", "logging", "testing", "test-tools", "instrumentation", "feature-flags", "debug-workflows"],
"deployment": ["deployment", "production", "deploy", "general"],
"commerce_modules": ["commerce-modules", "product", "cart", "order", "payment", "pricing", "tax", "inventory", "fulfillment", "customer", "promotion", "auth", "region", "currency", "sales-channel", "stock-location", "api-key", "user"],
"infrastructure_modules": ["infrastructure-modules", "caching", "event", "file", "locking", "notification", "workflow-engine", "analytics"],
"storefront": ["storefront-development", "publishable-api-keys", "checkout", "products", "customers", "regions"],
"integrations": ["integrations", "sanity", "contentful", "stripe", "paypal", "shipstation", "sentry"],
"cli_tools": ["medusa-cli", "commands", "build", "develop", "plugin", "db"],
"references": ["references", "medusa-workflows", "helper-steps", "service-factory-reference", "data-model-repository-reference", "test-tools-reference", "fulfillment", "auth", "notification-provider", "file-provider", "locking-service", "caching-service"],
"recipes": ["recipes", "erp", "marketplace", "b2b", "subscriptions", "digital-products", "bundled-products"],
"admin_components": ["admin-components", "widgets", "ui-routes"],
"examples": ["examples", "guides", "how-to-tutorials", "tutorials"]
},
"rate_limit": 0.3,
"max_pages": 500
},
{
"type": "documentation",
"base_url": "https://docs.mercurjs.com/",
"llms_txt_url": "https://docs.mercurjs.com/llms-full.txt",
"extract_api": true,
"selectors": {
"main_content": "main, article",
"title": "h1",
"code_blocks": "pre"
},
"url_patterns": {
"include": ["/"],
"exclude": []
},
"categories": {
"quick_start": ["introduction", "get-started"],
"components": ["components", "backend", "admin-panel", "vendor-panel", "storefront"],
"core_concepts": ["core-concepts", "seller", "commission", "payouts", "order-splitting", "reviews", "requests", "notifications", "marketplace-settings"],
"product": ["product", "core-commerce-modules", "core-infrastructure-modules", "framework"],
"integrations": ["integrations", "algolia", "resend", "stripe"],
"api_admin": ["api-reference/admin", "admin-algolia", "admin-api-keys", "admin-attributes", "admin-auth", "admin-campaigns", "admin-claims", "admin-collections", "admin-commission", "admin-currencies", "admin-customers", "admin-draft-orders", "admin-exchanges", "admin-fulfillment", "admin-inventory", "admin-invites", "admin-notifications", "admin-orders", "admin-payments", "admin-price-lists", "admin-products", "admin-promotions", "admin-regions", "admin-reservations", "admin-returns", "admin-sales-channels", "admin-sellers", "admin-shipping", "admin-stock-locations", "admin-stores", "admin-tax", "admin-uploads", "admin-users"],
"api_store": ["api-reference/store", "store-auth", "store-carts", "store-collections", "store-currencies", "store-customers", "store-fulfillment", "store-orders", "store-payment", "store-products", "store-regions", "store-returns"],
"api_vendor": ["api-reference/vendor", "vendor-auth", "vendor-fulfillment", "vendor-inventory", "vendor-orders", "vendor-payouts", "vendor-products", "vendor-returns", "vendor-sellers", "vendor-shipping", "vendor-stock-locations", "vendor-uploads"],
"help": ["help", "llm", "mcp", "support"]
},
"rate_limit": 0.3,
"max_pages": 300
}
]
}
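
The two configs in this diff have different shapes: claude-code.json keeps `base_url` at the top level, while medusa-mercurjs.json lists two "documentation" entries under `sources`. A loader that gives every source a per-type index might look like the sketch below. The function name `index_sources` and the rule that a config without `sources` is treated as one documentation source are assumptions, not the project's confirmed behavior.

```python
def index_sources(config):
    """Return a list of (source_type, idx, source) tuples.

    idx counts sources of the same type, starting at 0, so two
    "documentation" entries receive idx 0 and idx 1.
    """
    if "sources" in config:
        sources = config["sources"]
    else:
        # Assumed fallback: a flat config is one documentation source.
        legacy = dict(config)
        legacy.setdefault("type", "documentation")
        sources = [legacy]

    counters = {}
    indexed = []
    for source in sources:
        source_type = source["type"]
        idx = counters.get(source_type, 0)
        counters[source_type] = idx + 1
        indexed.append((source_type, idx, source))
    return indexed
```

Applied to a config shaped like medusa-mercurjs.json, this yields documentation indexes 0 and 1; a flat config yields a single entry with idx 0.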