CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
🎯 Project Overview
Skill Seekers is the universal documentation preprocessor for AI systems. It transforms documentation websites, GitHub repositories, and PDFs into production-ready formats for 16+ platforms: RAG pipelines (LangChain, LlamaIndex, Haystack), vector databases (Pinecone, Chroma, Weaviate, FAISS, Qdrant), AI coding assistants (Cursor, Windsurf, Cline, Continue.dev), and LLM platforms (Claude, Gemini, OpenAI).
Current Version: v3.1.0-dev
Python Version: 3.10+ required
Status: Production-ready, published on PyPI
Website: https://skillseekersweb.com/ - Browse configs, share, and access documentation
📚 Table of Contents
- First Time Here? - Start here!
- Quick Commands - Common workflows
- Architecture - How it works
- Development - Building & testing
- Testing - Test strategy
- Debugging - Troubleshooting
- Contributing - How to add features
👋 First Time Here?
Complete this 3-minute setup to start contributing:
# 1. Install package in editable mode (REQUIRED for development)
pip install -e .
# 2. Verify installation
python -c "import skill_seekers; print(skill_seekers.__version__)" # Should print: 3.1.0-dev
# 3. Run a quick test
pytest tests/test_scraper_features.py::test_detect_language -v
# 4. You're ready! Pick a task from the roadmap:
# https://github.com/users/yusufkaraaslan/projects/2
Quick Navigation:
- Building/Testing → Development Commands
- Architecture → Core Design Pattern
- Common Issues → Common Pitfalls
- Contributing → See CONTRIBUTING.md
⚡ Quick Command Reference (Most Used)
First time setup:
pip install -e . # REQUIRED before running tests or CLI
Running tests (NEVER skip - user requirement):
pytest tests/ -v # All tests
pytest tests/test_scraper_features.py -v # Single file
pytest tests/ --cov=src/skill_seekers --cov-report=html # With coverage
Code quality checks (matches CI):
ruff check src/ tests/ # Lint
ruff format src/ tests/ # Format
mypy src/skill_seekers # Type check
Common workflows:
# NEW unified create command (auto-detects source type)
skill-seekers create https://docs.react.dev/ -p quick
skill-seekers create facebook/react -p standard
skill-seekers create ./my-project -p comprehensive
skill-seekers create tutorial.pdf
# Legacy commands (still supported)
skill-seekers scrape --config configs/react.json
skill-seekers github --repo facebook/react
skill-seekers analyze --directory . --comprehensive
# Package for LLM platforms
skill-seekers package output/react/ --target claude
skill-seekers package output/react/ --target gemini
RAG Pipeline workflows:
# LangChain Documents
skill-seekers package output/react/ --format langchain
# LlamaIndex TextNodes
skill-seekers package output/react/ --format llama-index
# Haystack Documents
skill-seekers package output/react/ --format haystack
# ChromaDB direct upload
skill-seekers package output/react/ --format chroma --upload
# FAISS export
skill-seekers package output/react/ --format faiss
# Weaviate/Qdrant upload (requires API keys)
skill-seekers package output/react/ --format weaviate --upload
skill-seekers package output/react/ --format qdrant --upload
AI Coding Assistant workflows:
# Cursor IDE
skill-seekers package output/react/ --target claude
cp output/react-claude/SKILL.md .cursorrules
# Windsurf
cp output/react-claude/SKILL.md .windsurf/rules/react.md
# Cline (VS Code)
cp output/react-claude/SKILL.md .clinerules
# Continue.dev (universal IDE)
python examples/continue-dev-universal/context_server.py
# Configure in ~/.continue/config.json
Cloud Storage:
# Upload to S3
skill-seekers cloud upload --provider s3 --bucket my-skills output/react.zip
# Upload to GCS
skill-seekers cloud upload --provider gcs --bucket my-skills output/react.zip
# Upload to Azure
skill-seekers cloud upload --provider azure --container my-skills output/react.zip
🏗️ Architecture
Core Design Pattern: Platform Adaptors
The codebase uses the Strategy Pattern with a factory method to support 16 platforms across 5 categories:
src/skill_seekers/cli/adaptors/
├── __init__.py # Factory: get_adaptor(target/format)
├── base.py # Abstract base class
# LLM Platforms (3)
├── claude.py # Claude AI (ZIP + YAML)
├── gemini.py # Google Gemini (tar.gz)
├── openai.py # OpenAI ChatGPT (ZIP + Vector Store)
# RAG Frameworks (3)
├── langchain.py # LangChain Documents
├── llama_index.py # LlamaIndex TextNodes
├── haystack.py # Haystack Documents
# Vector Databases (5)
├── chroma.py # ChromaDB
├── faiss_helpers.py # FAISS
├── qdrant.py # Qdrant
├── weaviate.py # Weaviate
# AI Coding Assistants (4 - via Claude format + config files)
# - Cursor, Windsurf, Cline, Continue.dev
# Generic (1)
├── markdown.py # Generic Markdown (ZIP)
└── streaming_adaptor.py # Streaming data ingest
Key Methods:
- package(skill_dir, output_path) - Platform-specific packaging
- upload(package_path, api_key) - Platform-specific upload (where applicable)
- enhance(skill_dir, mode) - AI enhancement with platform-specific models
- export(skill_dir, format) - Export to RAG/vector DB formats
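The Strategy-plus-factory arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class names `SkillAdaptor`, `ClaudeSketch`, `GeminiSketch`, the `_REGISTRY` dict, and `get_adaptor_sketch` are all hypothetical stand-ins, and only `package()` is modelled.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class SkillAdaptor(ABC):
    """Hypothetical base class; each platform supplies its own strategy."""

    @abstractmethod
    def package(self, skill_dir: str, output_path: str) -> Path:
        """Return the archive path this platform would produce."""


class ClaudeSketch(SkillAdaptor):
    def package(self, skill_dir, output_path):
        # Assumed naming scheme: <skill>-claude.zip
        return Path(output_path) / f"{Path(skill_dir).name}-claude.zip"


class GeminiSketch(SkillAdaptor):
    def package(self, skill_dir, output_path):
        # Assumed naming scheme: <skill>-gemini.tar.gz
        return Path(output_path) / f"{Path(skill_dir).name}-gemini.tar.gz"


# Hypothetical registry mapping a target name to its strategy class.
_REGISTRY = {"claude": ClaudeSketch, "gemini": GeminiSketch}


def get_adaptor_sketch(target: str) -> SkillAdaptor:
    """Factory: look up the strategy for a target and instantiate it."""
    try:
        return _REGISTRY[target]()
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"Unknown target {target!r}; expected one of: {known}") from None


print(get_adaptor_sketch("gemini").package("output/react/", "output/"))
```

Callers only depend on the factory and the shared method names, so supporting another platform would mean adding one class and one registry entry.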
Data Flow (5 Phases)
1. Scrape Phase (doc_scraper.py:scrape_all())
   - BFS traversal from base_url
   - Output: output/{name}_data/pages/*.json
2. Build Phase (doc_scraper.py:build_skill())
   - Load pages → Categorize → Extract patterns
   - Output: output/{name}/SKILL.md + references/*.md
3. Enhancement Phase (optional, enhance_skill_local.py)
   - LLM analyzes references → Rewrites SKILL.md
   - Platform-specific models (Sonnet 4, Gemini 2.0, GPT-4o)
4. Package Phase (package_skill.py → adaptor)
   - Platform adaptor packages in appropriate format
   - Output: .zip or .tar.gz
5. Upload Phase (optional, upload_skill.py → adaptor)
   - Upload via platform API
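To make the "BFS traversal from base_url" step in the scrape phase concrete, here is a small sketch of breadth-first crawling over an in-memory link graph. It is an assumption-laden illustration: `bfs_order`, `get_links`, and the `site` dict are hypothetical, and the real scraper fetches pages over HTTP rather than reading a dict.

```python
from collections import deque


def bfs_order(start, get_links, max_pages=500):
    """Visit pages breadth-first, never queueing the same URL twice."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical site: each key is a page, each value the links found on it.
site = {
    "/docs": ["/docs/intro", "/docs/api"],
    "/docs/intro": ["/docs/api", "/docs/intro/install"],
    "/docs/api": ["/docs"],
    "/docs/intro/install": [],
}

print(bfs_order("/docs", lambda url: site.get(url, [])))
# → ['/docs', '/docs/intro', '/docs/api', '/docs/intro/install']
```

Breadth-first order reaches pages closest to the base URL first, so a `max_pages` cap tends to keep top-level pages and drop deep ones.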
File Structure (src/ layout) - Key Files Only
src/skill_seekers/
├── cli/ # All CLI commands
│ ├── main.py # ⭐ Git-style CLI dispatcher
│ ├── doc_scraper.py # ⭐ Main scraper (~790 lines)
│ │ ├── scrape_all() # BFS traversal engine
│ │ ├── smart_categorize() # Category detection
│ │ └── build_skill() # SKILL.md generation
│ ├── github_scraper.py # GitHub repo analysis
│ ├── codebase_scraper.py # ⭐ Local analysis (C2.x+C3.x)
│ ├── package_skill.py # Platform packaging
│ ├── unified_scraper.py # Multi-source scraping
│ ├── unified_codebase_analyzer.py # Three-stream GitHub+local analyzer
│ ├── enhance_skill_local.py # AI enhancement (LOCAL mode)
│ ├── enhance_status.py # Enhancement status monitoring
│ ├── upload_skill.py # Upload to platforms
│ ├── install_skill.py # Complete workflow automation
│ ├── install_agent.py # Install to AI agent directories
│ ├── pattern_recognizer.py # C3.1 Design pattern detection
│ ├── test_example_extractor.py # C3.2 Test example extraction
│ ├── how_to_guide_builder.py # C3.3 How-to guide generation
│ ├── config_extractor.py # C3.4 Configuration extraction
│ ├── generate_router.py # C3.5 Router skill generation
│ ├── code_analyzer.py # Multi-language code analysis
│ ├── api_reference_builder.py # API documentation builder
│ ├── dependency_analyzer.py # Dependency graph analysis
│ ├── signal_flow_analyzer.py # C3.10 Signal flow analysis (Godot)
│ ├── pdf_scraper.py # PDF extraction
│ └── adaptors/ # ⭐ Platform adaptor pattern
│ ├── __init__.py # Factory: get_adaptor()
│ ├── base_adaptor.py # Abstract base
│ ├── claude_adaptor.py # Claude AI
│ ├── gemini_adaptor.py # Google Gemini
│ ├── openai_adaptor.py # OpenAI ChatGPT
│ ├── markdown_adaptor.py # Generic Markdown
│ ├── langchain.py # LangChain RAG
│ ├── llama_index.py # LlamaIndex RAG
│ ├── haystack.py # Haystack RAG
│ ├── chroma.py # ChromaDB
│ ├── faiss_helpers.py # FAISS
│ ├── qdrant.py # Qdrant
│ ├── weaviate.py # Weaviate
│ └── streaming_adaptor.py # Streaming data ingest
└── mcp/ # MCP server (26 tools)
├── server_fastmcp.py # FastMCP server
└── tools/ # Tool implementations
Most Modified Files (when contributing):
- Platform adaptors: src/skill_seekers/cli/adaptors/{platform}.py
- Tests: tests/test_{feature}.py
- Configs: configs/{framework}.json
🛠️ Development Commands
Setup
# Install in editable mode (required before tests due to src/ layout)
pip install -e .
# Install with all platform dependencies
pip install -e ".[all-llms]"
# Install specific platforms
pip install -e ".[gemini]" # Google Gemini
pip install -e ".[openai]" # OpenAI ChatGPT
Running Tests
CRITICAL: Never skip tests - User requires all tests to pass before commits.
# All tests (must run pip install -e . first!)
pytest tests/ -v
# Specific test file
pytest tests/test_scraper_features.py -v
# Multi-platform tests
pytest tests/test_install_multiplatform.py -v
# With coverage
pytest tests/ --cov=src/skill_seekers --cov-report=term --cov-report=html
# Single test
pytest tests/test_scraper_features.py::test_detect_language -v
# MCP server tests
pytest tests/test_mcp_fastmcp.py -v
Test Architecture:
- 46 test files covering all features
- CI Matrix: Ubuntu + macOS, Python 3.10-3.13
- 2,121 tests passing (current v3.1.0), up from 700+ in v2.x
- Must run pip install -e . before tests (src/ layout requirement)
- Tests include create command integration tests, CLI refactor E2E tests
Building & Publishing
# Build package (using uv - recommended)
uv build
# Or using build
python -m build
# Publish to PyPI
uv publish
# Or using twine
python -m twine upload dist/*
Testing CLI Commands
# Test configuration wizard (NEW: v2.7.0)
skill-seekers config --show # Show current configuration
skill-seekers config --github # GitHub token setup
skill-seekers config --test # Test connections
# Test resume functionality (NEW: v2.7.0)
skill-seekers resume --list # List resumable jobs
skill-seekers resume --clean # Clean up old jobs
# Test GitHub scraping with profiles (NEW: v2.7.0)
skill-seekers github --repo facebook/react --profile personal # Use specific profile
skill-seekers github --repo owner/repo --non-interactive # CI/CD mode
# Test scraping (dry run)
skill-seekers scrape --config configs/react.json --dry-run
# Test codebase analysis (C2.x features)
skill-seekers analyze --directory . --output output/codebase/
# Test pattern detection (C3.1)
skill-seekers patterns --file src/skill_seekers/cli/code_analyzer.py
# Test how-to guide generation (C3.3)
skill-seekers how-to-guides output/test_examples.json --output output/guides/
# Test enhancement status monitoring
skill-seekers enhance-status output/react/ --watch
# Test multi-platform packaging
skill-seekers package output/react/ --target gemini --dry-run
# Test MCP server (stdio mode)
python -m skill_seekers.mcp.server_fastmcp
# Test MCP server (HTTP mode)
python -m skill_seekers.mcp.server_fastmcp --transport http --port 8765
New v3.0.0 CLI Commands
# Setup wizard (interactive configuration)
skill-seekers-setup
# Cloud storage operations
skill-seekers cloud upload --provider s3 --bucket my-bucket output/react.zip
skill-seekers cloud download --provider gcs --bucket my-bucket react.zip
skill-seekers cloud list --provider azure --container my-container
# Embedding server (for RAG pipelines)
skill-seekers embed --port 8080 --model sentence-transformers
# Sync & incremental updates
skill-seekers sync --source https://docs.react.dev/ --target output/react/
skill-seekers update --skill output/react/ --check-changes
# Quality metrics & benchmarking
skill-seekers quality --skill output/react/ --report
skill-seekers benchmark --config configs/react.json --compare-versions
# Multilingual support
skill-seekers multilang --detect output/react/
skill-seekers multilang --translate output/react/ --target zh-CN
# Streaming data ingest
skill-seekers stream --source docs/ --target output/streaming/
🔧 Key Implementation Details
CLI Architecture (Git-style)
Entry point: src/skill_seekers/cli/main.py
The unified CLI modifies sys.argv and calls existing main() functions to maintain backward compatibility:
# Example: skill-seekers scrape --config react.json
# Transforms to: doc_scraper.main() with modified sys.argv
Subcommands: create, scrape, github, pdf, unified, codebase, enhance, enhance-status, package, upload, estimate, install, install-agent, patterns, how-to-guides
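The sys.argv rewriting described above might look roughly like the sketch below. The names `scrape_main`, `github_main`, `COMMANDS`, and `dispatch` are hypothetical stand-ins for illustration; the actual dispatcher in main.py may be organised differently.

```python
import sys


def scrape_main():
    """Stand-in for an existing tool's main(); it reads sys.argv itself."""
    return ("scrape", sys.argv[1:])


def github_main():
    """Second stand-in, to show that dispatch picks by subcommand."""
    return ("github", sys.argv[1:])


# Hypothetical mapping from subcommand name to the legacy entry point.
COMMANDS = {"scrape": scrape_main, "github": github_main}


def dispatch(argv):
    """Rewrite sys.argv so the legacy main() sees only its own arguments."""
    if len(argv) < 2 or argv[1] not in COMMANDS:
        choices = ", ".join(COMMANDS)
        raise SystemExit(f"usage: {argv[0]} <{choices}> [options]")
    subcommand, rest = argv[1], argv[2:]
    saved = sys.argv
    sys.argv = [f"{argv[0]}-{subcommand}", *rest]
    try:
        return COMMANDS[subcommand]()
    finally:
        sys.argv = saved  # restore so later code sees the original argv


print(dispatch(["skill-seekers", "scrape", "--config", "react.json"]))
# → ('scrape', ['--config', 'react.json'])
```

Because each tool keeps parsing its own arguments, the standalone entry points and the unified CLI can share one code path.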
NEW: Unified create Command
The recommended way to create skills - Auto-detects source type and provides progressive help disclosure:
# Auto-detection examples
skill-seekers create https://docs.react.dev/ # → Web scraping
skill-seekers create facebook/react # → GitHub analysis
skill-seekers create ./my-project # → Local codebase
skill-seekers create tutorial.pdf # → PDF extraction
skill-seekers create configs/react.json # → Multi-source
# Progressive help system
skill-seekers create --help # Shows universal args only (13 flags)
skill-seekers create --help-web # Shows web-specific options
skill-seekers create --help-github # Shows GitHub-specific options
skill-seekers create --help-local # Shows local analysis options
skill-seekers create --help-pdf # Shows PDF extraction options
skill-seekers create --help-advanced # Shows advanced/rare options
skill-seekers create --help-all # Shows all 120+ flags
# Universal flags work for ALL sources
skill-seekers create <source> -p quick # Preset (-p shortcut)
skill-seekers create <source> --enhance-level 2 # AI enhancement (0-3)
skill-seekers create <source> --chunk-for-rag # RAG chunking
skill-seekers create <source> --dry-run # Preview
Key improvements:
- Single command replaces scrape/github/analyze for most use cases
- Smart detection - No need to specify source type
- Progressive disclosure - Default help shows 13 flags, detailed help available
- -p shortcut - Quick preset selection (-p quick|standard|comprehensive)
- Universal features - RAG chunking, dry-run, presets work everywhere
Recent Additions:
- create - NEW: Unified command with auto-detection and progressive help
- codebase - Local codebase analysis without GitHub API (C2.x + C3.x features)
- enhance-status - Monitor background/daemon enhancement processes
- patterns - Detect design patterns in code (C3.1)
- how-to-guides - Generate educational guides from tests (C3.3)
Platform Adaptor Usage
import os
from skill_seekers.cli.adaptors import get_adaptor
# Get platform-specific adaptor
adaptor = get_adaptor('gemini') # or 'claude', 'openai', 'markdown'
# Package skill
adaptor.package(skill_dir='output/react/', output_path='output/')
# Upload to platform
adaptor.upload(
package_path='output/react-gemini.tar.gz',
api_key=os.getenv('GOOGLE_API_KEY')
)
# AI enhancement
adaptor.enhance(skill_dir='output/react/', mode='api')
C3.x Codebase Analysis Features
The project has comprehensive codebase analysis capabilities (C3.1-C3.10):
C3.1 Design Pattern Detection (pattern_recognizer.py):
- Detects 10 common patterns: Singleton, Factory, Observer, Strategy, Decorator, Builder, Adapter, Command, Template Method, Chain of Responsibility
- Supports 9 languages: Python, JavaScript, TypeScript, C++, C, C#, Go, Rust, Java
- Three detection levels: surface (fast), deep (balanced), full (thorough)
- 87% precision, 80% recall on real-world projects
C3.2 Test Example Extraction (test_example_extractor.py):
- Extracts real usage examples from test files
- Categories: instantiation, method_call, config, setup, workflow
- AST-based for Python, regex-based for 8 other languages
- Quality filtering with confidence scoring
C3.3 How-To Guide Generation (how_to_guide_builder.py):
- Transforms test workflows into educational guides
- 5 AI enhancements: step descriptions, troubleshooting, prerequisites, next steps, use cases
- Dual-mode AI: API (fast) or LOCAL (free with Claude Code Max)
- 4 grouping strategies: AI tutorial group, file path, test name, complexity
C3.4 Configuration Pattern Extraction (config_extractor.py):
- Extracts configuration patterns from codebases
- Identifies config files, env vars, CLI arguments
- AI enhancement for better organization
C3.5 Architectural Overview (generate_router.py):
- Generates comprehensive ARCHITECTURE.md files
- Router skill generation for large documentation
- Quality improvements: 6.5/10 → 8.5/10 (+31%)
- Integrates GitHub metadata, issues, labels
C3.6 AI Enhancement (Claude API integration):
- Enhances C3.1-C3.5 with AI-powered insights
- Pattern explanations and improvement suggestions
- Test example context and best practices
- Guide enhancement with troubleshooting and prerequisites
C3.7 Architectural Pattern Detection (architectural_pattern_detector.py):
- Detects 8 architectural patterns (MVC, MVVM, MVP, Repository, etc.)
- Framework detection (Django, Flask, Spring, React, Angular, etc.)
- Multi-file analysis with directory structure patterns
- Evidence-based detection with confidence scoring
C3.8 Standalone Codebase Scraper (codebase_scraper.py):
# Quick analysis (1-2 min, basic features only)
skill-seekers analyze --directory /path/to/repo --quick
# Comprehensive analysis (20-60 min, all features + AI)
skill-seekers analyze --directory . --comprehensive
# With AI enhancement (auto-detects API or LOCAL)
skill-seekers analyze --directory . --enhance
# Granular AI enhancement control (NEW)
skill-seekers analyze --directory . --enhance-level 1 # SKILL.md only
skill-seekers analyze --directory . --enhance-level 2 # + Architecture + Config + Docs
skill-seekers analyze --directory . --enhance-level 3 # Full enhancement (all features)
# Disable specific features
skill-seekers analyze --directory . --skip-patterns --skip-how-to-guides
- Generates 300+ line standalone SKILL.md files from codebases
- All C3.x features integrated (patterns, tests, guides, config, architecture, docs)
- Complete codebase analysis without documentation scraping
- NEW: Granular AI enhancement control with --enhance-level (0-3)
C3.9 Project Documentation Extraction (codebase_scraper.py):
- Extracts and categorizes all markdown files from the project
- Auto-detects categories: overview, architecture, guides, workflows, features, etc.
- Integrates documentation into SKILL.md with summaries
- AI enhancement (level 2+) adds topic extraction and cross-references
- Controlled by depth: surface=raw copy, deep=parse+summarize, full=AI-enhanced
- Default ON, use --skip-docs to disable
C3.10 Signal Flow Analysis for Godot Projects (signal_flow_analyzer.py):
- Complete signal flow analysis system for event-driven Godot architectures
- Signal declaration extraction (detects signal keyword declarations)
- Connection mapping (tracks .connect() calls with targets and methods)
- Emission tracking (finds .emit() and emit_signal() calls)
- Real-world metrics: 208 signals, 634 connections, 298 emissions in test project
- Signal density metrics (signals per file)
- Event chain detection (signals triggering other signals)
- Signal pattern detection:
- EventBus Pattern (0.90 confidence): Centralized signal hub in autoload
- Observer Pattern (0.85 confidence): Multi-observer signals (3+ listeners)
- Event Chains (0.80 confidence): Cascading signal propagation
- Signal-based how-to guides (C3.10.1):
- AI-generated step-by-step usage guides (Connect → Emit → Handle)
- Real code examples from project
- Common usage locations
- Parameter documentation
- Outputs: signal_flow.json, signal_flow.mmd (Mermaid diagram), signal_reference.md, signal_how_to_guides.md
- Comprehensive Godot 4.x support:
- GDScript (.gd), Scene files (.tscn), Resources (.tres), Shaders (.gdshader)
- GDScript test extraction (GUT, gdUnit4, WAT frameworks)
- 396 test cases extracted in test project
- Framework detection (Unity, Unreal, Godot)
Key Architecture Decision (BREAKING in v2.5.2):
- Changed from opt-in (--build-*) to opt-out (--skip-*) flags
- All analysis features now ON by default for maximum value
- Backward compatibility warnings for deprecated flags
Smart Categorization Algorithm
Located in doc_scraper.py:smart_categorize():
- Scores pages against category keywords
- 3 points for URL match, 2 for title, 1 for content
- Threshold of 2+ for categorization
- Auto-infers categories from URL segments if none provided
- Falls back to "other" category
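The scoring rules above can be sketched as follows. This is only an illustration under stated assumptions: the source gives the weights (3/2/1) and the threshold (2), while summing per keyword, picking the highest-scoring category, and the names `score_page` and `categorize_sketch` are assumptions made here. Auto-inference from URL segments is left out.

```python
def score_page(page, keywords):
    """Score one page against one category's keywords (URL 3, title 2, content 1)."""
    url = page["url"].lower()
    title = page["title"].lower()
    content = page["content"].lower()
    score = 0
    for keyword in keywords:
        keyword = keyword.lower()
        if keyword in url:
            score += 3
        if keyword in title:
            score += 2
        if keyword in content:
            score += 1
    return score


def categorize_sketch(page, categories, threshold=2):
    """Return the best category at or above the threshold, else 'other'."""
    best_name, best_score = "other", 0
    for name, keywords in categories.items():
        score = score_page(page, keywords)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else "other"


categories = {
    "getting_started": ["intro", "quickstart"],
    "api": ["api", "reference"],
}
page = {
    "url": "https://docs.example.com/docs/api/hooks",
    "title": "Hooks API Reference",
    "content": "The hooks reference lists every hook.",
}
print(categorize_sketch(page, categories))  # → api
```

With a threshold of 2, a keyword that appears only in body text (1 point) is not enough on its own to assign a category.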
Language Detection
Located in doc_scraper.py:detect_language():
- CSS class attributes (language-*, lang-*)
- Heuristics (keywords like def, const, func)
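A two-stage detector along these lines could look like the sketch below: trust an explicit CSS class first, then fall back to keyword counting. The `KEYWORD_HINTS` table and the function name `detect_language_sketch` are hypothetical; the real detect_language() may use different keywords and tie-breaking.

```python
import re

# Matches class names such as "language-python" or "lang-js".
CLASS_PATTERN = re.compile(r"(?:language|lang)-(.+)")

# Hypothetical keyword table; a real heuristic would cover more languages.
KEYWORD_HINTS = {
    "python": ("def ", "import "),
    "javascript": ("const ", "function "),
    "go": ("func ", "package "),
}


def detect_language_sketch(css_classes, code):
    """Prefer an explicit CSS class; otherwise count keyword hits in the code."""
    for css_class in css_classes:
        match = CLASS_PATTERN.fullmatch(css_class)
        if match:
            return match.group(1).lower()
    hits = {
        lang: sum(code.count(keyword) for keyword in keywords)
        for lang, keywords in KEYWORD_HINTS.items()
    }
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else "unknown"


print(detect_language_sketch(["highlight", "language-python"], ""))  # → python
print(detect_language_sketch([], "func main() {}"))                  # → go
```

The class attribute is checked first because it is an explicit declaration by the documentation author, whereas keyword counting is only a guess.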
Configuration File Structure
Configs (configs/*.json) define scraping behavior:
{
"name": "framework-name",
"description": "When to use this skill",
"base_url": "https://docs.example.com/",
"selectors": {
"main_content": "article", // CSS selector
"title": "h1",
"code_blocks": "pre code"
},
"url_patterns": {
"include": ["/docs"],
"exclude": ["/blog"]
},
"categories": {
"getting_started": ["intro", "quickstart"],
"api": ["api", "reference"]
},
"rate_limit": 0.5,
"max_pages": 500
}
🧪 Testing Guidelines
Test Coverage Requirements
- Core features: 100% coverage required
- Platform adaptors: Each platform has dedicated tests
- MCP tools: All 18 tools must be tested
- Integration tests: End-to-end workflows
Test Markers (from pytest.ini_options)
The project uses pytest markers to categorize tests:
# Run only fast unit tests (default)
pytest tests/ -v
# Run only tests marked slow (>5 seconds)
pytest tests/ -v -m slow
# Run integration tests (requires external services)
pytest tests/ -v -m integration
# Run end-to-end tests (resource-intensive, creates files)
pytest tests/ -v -m e2e
# Run tests requiring virtual environment setup
pytest tests/ -v -m venv
# Run bootstrap feature tests
pytest tests/ -v -m bootstrap
# Skip slow and integration tests (fastest)
pytest tests/ -v -m "not slow and not integration"
Test Execution Strategy
By default, only fast tests run. Use markers to control test execution:
# Default: Only fast tests (skip slow/integration/e2e)
pytest tests/ -v
# Run only tests marked slow (>5 seconds)
pytest tests/ -v -m slow
# Run only integration tests (requires external services)
pytest tests/ -v -m integration
# Run only resource-intensive e2e tests (creates files)
pytest tests/ -v -m e2e
# Run ONLY fast tests (explicit)
pytest tests/ -v -m "not slow and not integration and not e2e"
# Run everything (CI does this)
pytest tests/ -v -m ""
When to use which:
- Local development: Default (fast tests only) - pytest tests/ -v
- Pre-commit: Fast tests - pytest tests/ -v
- Before PR: Include slow + integration - pytest tests/ -v -m "not e2e"
- CI validation: All tests run automatically
Key Test Files
- test_scraper_features.py - Core scraping functionality
- test_mcp_server.py - MCP integration (18 tools)
- test_mcp_fastmcp.py - FastMCP framework
- test_unified.py - Multi-source scraping
- test_github_scraper.py - GitHub analysis
- test_pdf_scraper.py - PDF extraction
- test_install_multiplatform.py - Multi-platform packaging
- test_integration.py - End-to-end workflows
- test_install_skill.py - One-command install
- test_install_agent.py - AI agent installation
- conftest.py - Test configuration (checks package installation)
🌐 Environment Variables
# Claude AI / Compatible APIs
# Option 1: Official Anthropic API (default)
export ANTHROPIC_API_KEY=sk-ant-...
# Option 2: GLM-4.7 Claude-compatible API (or any compatible endpoint)
export ANTHROPIC_API_KEY=your-api-key
export ANTHROPIC_BASE_URL=https://glm-4-7-endpoint.com/v1
# Google Gemini (optional)
export GOOGLE_API_KEY=AIza...
# OpenAI ChatGPT (optional)
export OPENAI_API_KEY=sk-...
# GitHub (for higher rate limits)
export GITHUB_TOKEN=ghp_...
# Private config repositories (optional)
export GITLAB_TOKEN=glpat-...
export GITEA_TOKEN=...
export BITBUCKET_TOKEN=...
All AI enhancement features respect these settings:
- enhance_skill.py - API mode SKILL.md enhancement
- ai_enhancer.py - C3.1/C3.2 pattern and test example enhancement
- guide_enhancer.py - C3.3 guide enhancement
- config_enhancer.py - C3.4 configuration enhancement
- adaptors/claude.py - Claude platform adaptor enhancement
Note: Setting ANTHROPIC_BASE_URL allows you to use any Claude-compatible API endpoint, such as GLM-4.7 (智谱 AI).
📦 Package Structure (pyproject.toml)
Entry Points
[project.scripts]
# Main unified CLI
skill-seekers = "skill_seekers.cli.main:main"
# Individual tool entry points (Core)
skill-seekers-config = "skill_seekers.cli.config_command:main" # v2.7.0 Configuration wizard
skill-seekers-resume = "skill_seekers.cli.resume_command:main" # v2.7.0 Resume interrupted jobs
skill-seekers-scrape = "skill_seekers.cli.doc_scraper:main"
skill-seekers-github = "skill_seekers.cli.github_scraper:main"
skill-seekers-pdf = "skill_seekers.cli.pdf_scraper:main"
skill-seekers-unified = "skill_seekers.cli.unified_scraper:main"
skill-seekers-codebase = "skill_seekers.cli.codebase_scraper:main" # C2.x Local codebase analysis
skill-seekers-enhance = "skill_seekers.cli.enhance_skill_local:main"
skill-seekers-enhance-status = "skill_seekers.cli.enhance_status:main" # Status monitoring
skill-seekers-package = "skill_seekers.cli.package_skill:main"
skill-seekers-upload = "skill_seekers.cli.upload_skill:main"
skill-seekers-estimate = "skill_seekers.cli.estimate_pages:main"
skill-seekers-install = "skill_seekers.cli.install_skill:main"
skill-seekers-install-agent = "skill_seekers.cli.install_agent:main"
skill-seekers-patterns = "skill_seekers.cli.pattern_recognizer:main" # C3.1 Pattern detection
skill-seekers-how-to-guides = "skill_seekers.cli.how_to_guide_builder:main" # C3.3 Guide generation
skill-seekers-workflows = "skill_seekers.cli.workflows_command:main" # NEW: Workflow preset management
# New v3.0.0 Entry Points
skill-seekers-setup = "skill_seekers.cli.setup_wizard:main" # NEW: v3.0.0 Setup wizard
skill-seekers-cloud = "skill_seekers.cli.cloud_storage_cli:main" # NEW: v3.0.0 Cloud storage
skill-seekers-embed = "skill_seekers.embedding.server:main" # NEW: v3.0.0 Embedding server
skill-seekers-sync = "skill_seekers.cli.sync_cli:main" # NEW: v3.0.0 Sync & monitoring
skill-seekers-benchmark = "skill_seekers.cli.benchmark_cli:main" # NEW: v3.0.0 Benchmarking
skill-seekers-stream = "skill_seekers.cli.streaming_ingest:main" # NEW: v3.0.0 Streaming ingest
skill-seekers-update = "skill_seekers.cli.incremental_updater:main" # NEW: v3.0.0 Incremental updates
skill-seekers-multilang = "skill_seekers.cli.multilang_support:main" # NEW: v3.0.0 Multilingual
skill-seekers-quality = "skill_seekers.cli.quality_metrics:main" # NEW: v3.0.0 Quality metrics
Optional Dependencies
Project uses PEP 735 [dependency-groups] (support depends on the installer, for example uv or pip 25.1+, not on the Python version):
- Replaces deprecated tool.uv.dev-dependencies
- Dev dependencies: [dependency-groups] dev = [...] in pyproject.toml
- Install with: pip install -e . (installs only core deps)
- Install dev deps: See CI workflow or manually install pytest, ruff, mypy
[project.optional-dependencies]
gemini = ["google-generativeai>=0.8.0"]
openai = ["openai>=1.0.0"]
all-llms = ["google-generativeai>=0.8.0", "openai>=1.0.0"]
[dependency-groups] # PEP 735 (replaces tool.uv.dev-dependencies)
dev = [
"pytest>=8.4.2",
"pytest-asyncio>=0.24.0",
"pytest-cov>=7.0.0",
"coverage>=7.11.0",
]
🚨 Critical Development Notes
Must Run Before Tests
# REQUIRED: Install package before running tests
pip install -e .
# Why: src/ layout requires package installation
# Without this, imports will fail
Never Skip Tests
Per user instructions in ~/.claude/CLAUDE.md:
- "never skip any test. always make sure all test pass"
- All 2,121 tests must pass before commits (v3.1.0)
- Run full test suite: pytest tests/ -v
- New tests added for create command and CLI refactor work
Platform-Specific Dependencies
Platform dependencies are optional (install only what you need):
# Install specific platform support
pip install -e ".[gemini]" # Google Gemini
pip install -e ".[openai]" # OpenAI ChatGPT
pip install -e ".[chroma]" # ChromaDB
pip install -e ".[weaviate]" # Weaviate
pip install -e ".[s3]" # AWS S3
pip install -e ".[gcs]" # Google Cloud Storage
pip install -e ".[azure]" # Azure Blob Storage
pip install -e ".[mcp]" # MCP integration
pip install -e ".[all]" # Everything (16 platforms + cloud + embedding)
# Or install from PyPI:
pip install skill-seekers[gemini] # Google Gemini support
pip install skill-seekers[openai] # OpenAI ChatGPT support
pip install skill-seekers[all-llms] # All LLM platforms
pip install skill-seekers[chroma] # ChromaDB support
pip install skill-seekers[weaviate] # Weaviate support
pip install skill-seekers[s3] # AWS S3 support
pip install skill-seekers[all] # All optional dependencies
AI Enhancement Modes
AI enhancement transforms basic skills (2-3/10) into production-ready skills (8-9/10). Two modes available:
API Mode (default if ANTHROPIC_API_KEY is set):
- Direct Claude API calls (fast, efficient)
- Cost: ~$0.15-$0.30 per skill
- Perfect for CI/CD automation
- Requires: export ANTHROPIC_API_KEY=sk-ant-...
LOCAL Mode (fallback if no API key):
- Uses Claude Code CLI (your existing Max plan)
- Free! No API charges
- 4 execution modes:
  - Headless (default): Foreground, waits for completion
  - Background (--background): Returns immediately
  - Daemon (--daemon): Fully detached with nohup
  - Terminal (--interactive-enhancement): Opens new terminal (macOS)
- Status monitoring: skill-seekers enhance-status output/react/ --watch
- Timeout configuration: --timeout 300 (seconds)
Enhancement Flag Consolidation (Phase 1)
IMPORTANT CHANGE: Three enhancement flags have been unified into a single granular control:
Old flags (deprecated):
- --enhance - Enable AI enhancement
- --enhance-local - Use LOCAL mode (Claude Code)
- --api-key KEY - Anthropic API key
New unified flag:
- --enhance-level LEVEL - Granular AI enhancement control (0-3, default: 2)
  - 0 - Disabled, no AI enhancement
  - 1 - SKILL.md only (core documentation)
  - 2 - + Architecture + Config + Docs (default, balanced)
  - 3 - Full enhancement (all features, comprehensive)
Auto-detection: Mode (API vs LOCAL) is auto-detected:
- If ANTHROPIC_API_KEY is set → API mode
- Otherwise → LOCAL mode (Claude Code Max)
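The auto-detection rule amounts to a single environment check, roughly as sketched below. The function name `detect_enhance_mode` and the choice to treat an empty value as unset are assumptions for illustration; the project's own detection logic may differ in detail.

```python
import os


def detect_enhance_mode(env=None):
    """Return 'API' when an Anthropic key is present, otherwise 'LOCAL'."""
    env = os.environ if env is None else env
    # Assumed: an empty string counts as "not set".
    return "API" if env.get("ANTHROPIC_API_KEY") else "LOCAL"


print(detect_enhance_mode({"ANTHROPIC_API_KEY": "sk-ant-example"}))  # → API
print(detect_enhance_mode({}))                                       # → LOCAL
```

Passing the environment as a parameter keeps the check easy to test without mutating os.environ.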
Examples:
# Auto-detect mode, default enhancement level (2)
skill-seekers create https://docs.react.dev/
# Disable enhancement
skill-seekers create facebook/react --enhance-level 0
# SKILL.md only (fast)
skill-seekers create ./my-project --enhance-level 1
# Full enhancement (comprehensive)
skill-seekers create tutorial.pdf --enhance-level 3
# Force LOCAL mode with specific level
skill-seekers enhance output/react/ --mode LOCAL --enhance-level 2
# Background with status monitoring
skill-seekers enhance output/react/ --background
skill-seekers enhance-status output/react/ --watch
Migration: Old flags still work with deprecation warnings, will be removed in v4.0.0.
See docs/ENHANCEMENT_MODES.md for detailed documentation.
Git Workflow
Git Workflow Notes:
- Main branch: main
- Development branch: development
- Always create feature branches from development
- Branch naming: feature/{task-id}-{description} or feature/{category}
To see current status: git status
CI/CD Pipeline
The project has GitHub Actions workflows in .github/workflows/:
tests.yml - Runs on every push and PR to main or development:
1. Lint Job (Python 3.12, Ubuntu):
   - ruff check src/ tests/ - Code linting with GitHub annotations
   - ruff format --check src/ tests/ - Format validation
   - mypy src/skill_seekers - Type checking (continue-on-error)
2. Test Job (Matrix):
   - OS: Ubuntu + macOS
   - Python: 3.10, 3.11, 3.12
   - Exclusions: macOS + Python 3.10 (speed optimization)
   - Steps:
     - Install dependencies + pip install -e .
     - Run CLI tests (scraper, config, integration)
     - Run MCP server tests
     - Generate coverage report → Upload to Codecov
3. Summary Job - Single status check for branch protection
   - Ensures both lint and test jobs succeed
   - Provides single "All Checks Complete" status
release.yml - Triggers on version tags (e.g., v2.9.0):
- Builds package with uv build
- Publishes to PyPI with uv publish
- Creates GitHub release
Local Pre-Commit Validation
Run the same checks as CI before pushing:
# 1. Code quality (matches lint job) - WITH AUTO-FIX
uvx ruff check --fix --unsafe-fixes src/ tests/ # Auto-fix issues
uvx ruff format src/ tests/ # Auto-format
uvx ruff check src/ tests/ # Verify clean
uvx ruff format --check src/ tests/ # Verify formatted
mypy src/skill_seekers
# 2. Tests (matches test job)
pip install -e .
pytest tests/ -v --cov=src/skill_seekers --cov-report=term
# 3. If all pass, you're good to push!
git add -A # Stage any auto-fixes
git commit --amend --no-edit # Add fixes to commit (or new commit)
git push origin feature/my-feature
Branch Protection Rules:
- main: Requires tests + 1 review, only maintainers merge
- development: Requires tests to pass, default target for PRs
Common CI Failure Patterns and Fixes
If CI fails after your changes, follow this debugging checklist:
# 1. Fix linting errors automatically
uvx ruff check --fix --unsafe-fixes src/ tests/
# 2. Fix formatting issues
uvx ruff format src/ tests/
# 3. Check for remaining issues
uvx ruff check src/ tests/
uvx ruff format --check src/ tests/
# 4. Verify tests pass locally
pip install -e .
pytest tests/ -v
# 5. Push fixes
git add -A
git commit -m "fix: resolve CI linting/formatting issues"
git push
Critical dependency patterns to check:
- MCP version mismatch: Ensure requirements.txt and pyproject.toml have matching MCP versions
- Missing module-level imports: If a tool file imports a module at top level (e.g., import yaml), that module MUST be in core dependencies
- Try/except ImportError: Silent failures in try/except blocks can hide missing dependencies
Timing-sensitive tests:
- Benchmark tests may fail on slower CI runners (macOS)
- If a test times out or exceeds threshold only in CI, consider relaxing the threshold
- Local passing doesn't guarantee CI passing for performance tests
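Relaxing a threshold only on CI can be done by scaling the time budget when the `CI` environment variable is present (GitHub Actions sets `CI=true`). The sketch below is an assumption about how such a test might look; the base threshold and the slowdown factor are made-up values, and the test body is a stand-in workload.

```python
import os
import time

BASE_THRESHOLD_S = 0.5      # hypothetical budget on a developer machine
CI_SLOWDOWN_FACTOR = 3.0    # hypothetical allowance for slower shared runners

def time_budget(env=None):
    """Return the allowed duration in seconds, relaxed when running on CI."""
    env = os.environ if env is None else env
    on_ci = env.get("CI", "").lower() == "true"
    return BASE_THRESHOLD_S * (CI_SLOWDOWN_FACTOR if on_ci else 1.0)

def test_sort_is_fast_enough():
    start = time.perf_counter()
    sorted(range(100_000), reverse=True)   # stand-in for the benchmarked code
    elapsed = time.perf_counter() - start
    assert elapsed < time_budget()
```

Keeping the factor in one place makes it easy to tighten again once the runners prove stable.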
🚨 Common Pitfalls & Solutions
1. Import Errors
Problem: ModuleNotFoundError: No module named 'skill_seekers'
Solution: Must install package first due to src/ layout
pip install -e .
Why: The src/ layout prevents imports from repo root. Package must be installed.
2. Tests Fail with "No module named..."
Problem: Package not installed in test environment
Solution: CI runs pip install -e . before tests - do the same locally
pip install -e .
pytest tests/ -v
3. Platform-Specific Dependencies Not Found
Problem: ModuleNotFoundError: No module named 'google.generativeai'
Solution: Install platform-specific dependencies
pip install -e ".[gemini]" # For Gemini
pip install -e ".[openai]" # For OpenAI
pip install -e ".[all-llms]" # For all platforms
4. Git Branch Confusion
Problem: PR targets main instead of development
Solution: Always create PRs targeting development branch
git checkout development
git pull upstream development
git checkout -b feature/my-feature
# ... make changes ...
git push origin feature/my-feature
# Create PR: feature/my-feature → development
Important: See CONTRIBUTING.md for complete branch workflow.
5. Tests Pass Locally But Fail in CI
Problem: Different Python version or missing dependency
Solution: Test with multiple Python versions locally
# CI tests: Python 3.10, 3.11, 3.12 on Ubuntu + macOS
# Use pyenv or docker to test locally:
pyenv install 3.10.13 3.11.7 3.12.1
pyenv local 3.10.13
pip install -e . && pytest tests/ -v
pyenv local 3.11.7
pip install -e . && pytest tests/ -v
pyenv local 3.12.1
pip install -e . && pytest tests/ -v
6. Enhancement Not Working
Problem: AI enhancement fails or hangs
Solutions:
# Check if API key is set
echo $ANTHROPIC_API_KEY
# Try LOCAL mode instead (uses Claude Code Max, no API key needed)
skill-seekers enhance output/react/ --mode LOCAL
# Monitor enhancement status for background jobs
skill-seekers enhance-status output/react/ --watch
7. Rate Limit Errors from GitHub
Problem: 403 Forbidden from GitHub API
Solutions:
# Check current rate limit
curl -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit
# Configure multiple GitHub profiles (recommended)
skill-seekers config --github
# Use specific profile
skill-seekers github --repo owner/repo --profile work
# Test all configured tokens
skill-seekers config --test
8. Confused About Command Options
Problem: "Too many flags!" or "Which flags work with which sources?"
Solution: Use the progressive disclosure help system in the create command:
# Start with universal options (13 flags)
skill-seekers create --help
# Need web scraping options?
skill-seekers create --help-web
# GitHub-specific flags?
skill-seekers create --help-github
# See ALL options (120+ flags)?
skill-seekers create --help-all
# Quick preset shortcut
skill-seekers create <source> -p quick
skill-seekers create <source> -p standard
skill-seekers create <source> -p comprehensive
Why: The create command shows only relevant flags by default to reduce cognitive load.
Legacy commands (scrape, github, analyze) show all flags in one help screen - use them if you prefer that style.
9. Checks Pass Locally But Fail in GitHub Actions
Problem: Ruff check/format or tests pass locally but fail in CI
Common causes:
- Dependency version mismatch - requirements.txt vs pyproject.toml conflicts
  # Check both files have matching versions for core deps
  grep "mcp" requirements.txt pyproject.toml
  grep "PyYAML" requirements.txt pyproject.toml
- Module imported but not declared - File imports module at top level but it's not in dependencies
  # Search for imports that might not be in dependencies
  grep -r "^import yaml" src/
  grep -r "^from yaml" src/
  # Ensure PyYAML is in pyproject.toml core dependencies
- Ruff version differences - Local ruff vs CI ruff may have different rules
  # Use uvx to match CI's ruff version
  uvx ruff check src/ tests/
  uvx ruff format src/ tests/
Solution:
# Run CI validation commands exactly as CI does
pip install -e . # Fresh install
uvx ruff check src/ tests/ # Use uvx, not local ruff
uvx ruff format --check src/ tests/
pytest tests/ -v
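The grep-based version check above can also be scripted. The following is a rough sketch using regular expressions rather than a real requirements or TOML parser; the helper name `pinned_specs` and the version numbers in the sample text are invented for illustration.

```python
import re

def pinned_specs(text, package):
    """Collect version specifiers that follow `package` at the start of a line."""
    pattern = re.compile(
        r"""^[\s"']*""" + re.escape(package) + r"""\s*([<>=!~][^"'#;\n]*)""",
        re.IGNORECASE | re.MULTILINE,
    )
    return {m.group(1).replace(" ", "") for m in pattern.finditer(text)}

# Sample file contents (illustrative versions, not the project's real pins).
requirements_txt = "mcp>=1.25,<2\nPyYAML>=6.0\n"
pyproject_toml = '[project]\ndependencies = [\n    "mcp>=1.0",\n    "PyYAML>=6.0",\n]\n'

for name in ("mcp", "PyYAML"):
    a = pinned_specs(requirements_txt, name)
    b = pinned_specs(pyproject_toml, name)
    print(name, "OK" if a == b else f"MISMATCH: {sorted(a)} vs {sorted(b)}")
```

With the sample text, `mcp` is reported as a mismatch and `PyYAML` as OK.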
🔌 MCP Integration
MCP Server (26 Tools)
Transport modes:
- stdio: Claude Code, VS Code + Cline
- HTTP: Cursor, Windsurf, IntelliJ IDEA
Core Tools (9):
1. list_configs - List preset configurations
2. generate_config - Generate config from docs URL
3. validate_config - Validate config structure
4. estimate_pages - Estimate page count
5. scrape_docs - Scrape documentation
6. package_skill - Package to format (supports --format and --target)
7. upload_skill - Upload to platform (supports --target)
8. enhance_skill - AI enhancement with platform support
9. install_skill - Complete workflow automation
Extended Tools (10):
10. scrape_github - GitHub repository analysis
11. scrape_pdf - PDF extraction
12. unified_scrape - Multi-source scraping
13. merge_sources - Merge docs + code
14. detect_conflicts - Find discrepancies
15. add_config_source - Register git repos
16. fetch_config - Fetch configs from git
17. list_config_sources - List registered sources
18. remove_config_source - Remove config source
19. split_config - Split large configs
NEW Vector DB Tools (4):
20. export_to_chroma - Export to ChromaDB
21. export_to_weaviate - Export to Weaviate
22. export_to_faiss - Export to FAISS
23. export_to_qdrant - Export to Qdrant
NEW Cloud Tools (3):
24. cloud_upload - Upload to S3/GCS/Azure
25. cloud_download - Download from cloud storage
26. cloud_list - List files in cloud storage
Starting MCP Server
# stdio mode (Claude Code, VS Code + Cline)
python -m skill_seekers.mcp.server_fastmcp
# HTTP mode (Cursor, Windsurf, IntelliJ)
python -m skill_seekers.mcp.server_fastmcp --transport http --port 8765
🤖 RAG Framework & Vector Database Integrations (NEW - v3.0.0)
Skill Seekers is now the universal preprocessor for RAG pipelines. Export documentation to any RAG framework or vector database with a single command.
RAG Frameworks
LangChain Documents:
# Export to LangChain Document format
skill-seekers package output/django --format langchain
# Output: output/django-langchain.json
# Format: Array of LangChain Document objects
# - page_content: Full text content
# - metadata: {source, category, type, url}
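# Use in LangChain (JSONLoader requires the jq package and a jq_schema argument):
from langchain_community.document_loaders import JSONLoader
loader = JSONLoader("output/django-langchain.json", jq_schema=".[]", content_key="page_content")
documents = loader.load()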
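Before wiring an export into a pipeline, it can help to sanity-check the file with the standard library alone. The sketch below assumes the export is a JSON array of objects with `page_content` and `metadata` keys, as described above; the helper name and the sample records are illustrative.

```python
import json

REQUIRED_KEYS = {"page_content", "metadata"}

def check_langchain_export(records):
    """Return the indexes of records that lack page_content or metadata."""
    return [
        i for i, rec in enumerate(records)
        if not isinstance(rec, dict) or not REQUIRED_KEYS.issubset(rec)
    ]

# Sample data standing in for output/django-langchain.json.
sample = json.loads("""[
  {"page_content": "Django models map to database tables.",
   "metadata": {"source": "django", "category": "models", "type": "documentation",
                "url": "https://docs.djangoproject.com/"}},
  {"page_content": "Record without metadata"}
]""")
print(check_langchain_export(sample))  # [1]
```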
LlamaIndex TextNodes:
# Export to LlamaIndex TextNode format
skill-seekers package output/django --format llama-index
# Output: output/django-llama-index.json
# Format: Array of LlamaIndex TextNode objects
# - text: Content
# - id_: Unique identifier
# - metadata: {source, category, type}
# - relationships: Document relationships
# Use in LlamaIndex (llama-index >= 0.10; older releases import from llama_index.schema):
import json
from llama_index.core.schema import TextNode
with open("output/django-llama-index.json") as f:
    nodes = [TextNode.from_dict(n) for n in json.load(f)]
Haystack Documents:
# Export to Haystack Document format
skill-seekers package output/django --format haystack
# Output: output/django-haystack.json
# Format: Haystack Document objects for pipelines
# Perfect for: Question answering, search, RAG pipelines
Vector Databases
ChromaDB (Direct Integration):
# Export and optionally upload to ChromaDB
skill-seekers package output/django --format chroma
# Output: output/django-chroma/ (ChromaDB collection)
# With direct upload (requires chromadb running):
skill-seekers package output/django --format chroma --upload
# Configuration via environment:
export CHROMA_HOST=localhost
export CHROMA_PORT=8000
FAISS (Facebook AI Similarity Search):
# Export to FAISS index format
skill-seekers package output/django --format faiss
# Output:
# - output/django-faiss.index (FAISS index)
# - output/django-faiss-metadata.json (Document metadata)
# Use with FAISS:
import faiss
index = faiss.read_index("output/django-faiss.index")
Weaviate:
# Export and upload to Weaviate
skill-seekers package output/django --format weaviate --upload
# Requires environment variables:
export WEAVIATE_URL=http://localhost:8080
export WEAVIATE_API_KEY=your-api-key
# Creates class "DjangoDoc" with schema
Qdrant:
# Export and upload to Qdrant
skill-seekers package output/django --format qdrant --upload
# Requires environment variables:
export QDRANT_URL=http://localhost:6333
export QDRANT_API_KEY=your-api-key
# Creates collection "django_docs"
Pinecone (via Markdown):
# Pinecone uses the markdown format
skill-seekers package output/django --target markdown
# Then use Pinecone's Python client for upsert
# See: docs/integrations/PINECONE.md
Complete RAG Pipeline Example
# 1. Scrape documentation
skill-seekers scrape --config configs/django.json
# 2. Export to your RAG stack
skill-seekers package output/django --format langchain # For LangChain
skill-seekers package output/django --format llama-index # For LlamaIndex
skill-seekers package output/django --format chroma --upload # Direct to ChromaDB
# 3. Use in your application
# See examples/:
# - examples/langchain-rag-pipeline/
# - examples/llama-index-query-engine/
# - examples/pinecone-upsert/
Integration Hub: docs/integrations/RAG_PIPELINES.md
🛠️ AI Coding Assistant Integrations (NEW - v3.0.0)
Transform any framework documentation into persistent expert context for 4+ AI coding assistants. Your IDE's AI now "knows" your frameworks without manual prompting.
Cursor IDE
Setup:
# 1. Generate skill
skill-seekers scrape --config configs/react.json
skill-seekers package output/react/ --target claude
# 2. Install to Cursor
cp output/react-claude/SKILL.md .cursorrules
# 3. Restart Cursor
# AI now has React expertise!
Benefits:
- ✅ AI suggests React-specific patterns
- ✅ No manual "use React hooks" prompts needed
- ✅ Consistent team patterns
- ✅ Works for ANY framework
Guide: docs/integrations/CURSOR.md Example: examples/cursor-react-skill/
Windsurf
Setup:
# 1. Generate skill
skill-seekers scrape --config configs/django.json
skill-seekers package output/django/ --target claude
# 2. Install to Windsurf
mkdir -p .windsurf/rules
cp output/django-claude/SKILL.md .windsurf/rules/django.md
# 3. Restart Windsurf
# AI now knows Django patterns!
Benefits:
- ✅ Flow-based coding with framework knowledge
- ✅ IDE-native AI assistance
- ✅ Persistent context across sessions
Guide: docs/integrations/WINDSURF.md Example: examples/windsurf-fastapi-context/
Cline (VS Code Extension)
Setup:
# 1. Generate skill
skill-seekers scrape --config configs/fastapi.json
skill-seekers package output/fastapi/ --target claude
# 2. Install to Cline
cp output/fastapi-claude/SKILL.md .clinerules
# 3. Reload VS Code
# Cline now has FastAPI expertise!
Benefits:
- ✅ Agentic code generation in VS Code
- ✅ Cursor Composer equivalent for VS Code
- ✅ System prompts + MCP integration
Guide: docs/integrations/CLINE.md Example: examples/cline-django-assistant/
Continue.dev (Universal IDE)
Setup:
# 1. Generate skill
skill-seekers scrape --config configs/react.json
skill-seekers package output/react/ --target claude
# 2. Start context server
cd examples/continue-dev-universal/
python context_server.py --port 8765
# 3. Configure in ~/.continue/config.json
{
"contextProviders": [
{
"name": "http",
"params": {
"url": "http://localhost:8765/context",
"title": "React Documentation"
}
}
]
}
# 4. Works in ALL IDEs!
# VS Code, JetBrains, Vim, Emacs...
Benefits:
- ✅ IDE-agnostic (works in VS Code, IntelliJ, Vim, Emacs)
- ✅ Custom LLM providers supported
- ✅ HTTP-based context serving
- ✅ Team consistency across mixed IDE environments
Guide: docs/integrations/CONTINUE_DEV.md Example: examples/continue-dev-universal/
Multi-IDE Team Setup
For teams using different IDEs (VS Code, IntelliJ, Vim):
# Use Continue.dev as universal context provider
skill-seekers scrape --config configs/react.json
python context_server.py --host 0.0.0.0 --port 8765
# ALL team members configure Continue.dev
# Result: Identical AI suggestions across all IDEs!
Integration Hub: docs/integrations/INTEGRATIONS.md
☁️ Cloud Storage Integration (NEW - v3.0.0)
Upload skills directly to cloud storage for team sharing and CI/CD pipelines.
Supported Providers
AWS S3:
# Upload skill
skill-seekers cloud upload --provider s3 --bucket my-skills output/react.zip
# Download skill
skill-seekers cloud download --provider s3 --bucket my-skills react.zip
# List skills
skill-seekers cloud list --provider s3 --bucket my-skills
# Environment variables:
export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
export AWS_REGION=us-east-1
Google Cloud Storage:
# Upload skill
skill-seekers cloud upload --provider gcs --bucket my-skills output/react.zip
# Download skill
skill-seekers cloud download --provider gcs --bucket my-skills react.zip
# List skills
skill-seekers cloud list --provider gcs --bucket my-skills
# Environment variables:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
Azure Blob Storage:
# Upload skill
skill-seekers cloud upload --provider azure --container my-skills output/react.zip
# Download skill
skill-seekers cloud download --provider azure --container my-skills react.zip
# List skills
skill-seekers cloud list --provider azure --container my-skills
# Environment variables:
export AZURE_STORAGE_CONNECTION_STRING=your-connection-string
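A quick pre-flight check can report which of the environment variables listed above are missing for a given provider. This is a hypothetical helper; in particular, treating every listed variable (including AWS_REGION) as mandatory is an assumption.

```python
import os

REQUIRED_ENV = {
    "s3": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
    "gcs": ["GOOGLE_APPLICATION_CREDENTIALS"],
    "azure": ["AZURE_STORAGE_CONNECTION_STRING"],
}

def missing_env(provider, env=None):
    """Return the variables from REQUIRED_ENV[provider] that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]

print(missing_env("s3", {"AWS_ACCESS_KEY_ID": "key", "AWS_SECRET_ACCESS_KEY": "secret"}))  # ['AWS_REGION']
```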
CI/CD Integration
# GitHub Actions example
- name: Upload skill to S3
run: |
skill-seekers scrape --config configs/react.json
skill-seekers package output/react/
skill-seekers cloud upload --provider s3 --bucket ci-skills output/react.zip
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
Guide: docs/integrations/CLOUD_STORAGE.md
📋 Common Workflows
Adding a New Platform
- Create adaptor in src/skill_seekers/cli/adaptors/{platform}_adaptor.py
- Inherit from BaseAdaptor
- Implement package(), upload(), enhance() methods
- Add to factory in adaptors/__init__.py
- Add optional dependency to pyproject.toml
- Add tests in tests/test_install_multiplatform.py
Adding a New Feature
- Implement in appropriate CLI module
- Add entry point to pyproject.toml if needed
- Add tests in tests/test_{feature}.py
- Run full test suite: pytest tests/ -v
- Update CHANGELOG.md
- Commit only when all tests pass
Debugging Common Issues
Import Errors:
# Always ensure package is installed first
pip install -e .
# Verify installation
python -c "import skill_seekers; print(skill_seekers.__version__)"
Rate Limit Issues:
# Check current GitHub rate limit status
curl -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit
# Configure multiple GitHub profiles
skill-seekers config --github
# Test your tokens
skill-seekers config --test
Enhancement Not Working:
# Check if API key is set
echo $ANTHROPIC_API_KEY
# Try LOCAL mode instead (uses Claude Code Max)
skill-seekers enhance output/react/ --mode LOCAL
# Monitor enhancement status
skill-seekers enhance-status output/react/ --watch
Test Failures:
# Run specific failing test with verbose output
pytest tests/test_file.py::test_name -vv
# Run with print statements visible
pytest tests/test_file.py -s
# Run with coverage to see what's not tested
pytest tests/test_file.py --cov=src/skill_seekers --cov-report=term-missing
# Run only unit tests (skip slow integration tests)
pytest tests/ -v -m "not slow and not integration"
Config Issues:
# Validate config structure
skill-seekers-validate configs/myconfig.json
# Show current configuration
skill-seekers config --show
# Estimate pages before scraping
skill-seekers estimate configs/myconfig.json
🎯 Where to Make Changes
This section helps you quickly locate the right files when implementing common changes.
Adding a New CLI Command
Files to modify:
- Create command file: src/skill_seekers/cli/my_command.py
  def main():
      """Entry point for my-command."""
      # Implementation
- Add entry point: pyproject.toml
  [project.scripts]
  skill-seekers-my-command = "skill_seekers.cli.my_command:main"
- Update unified CLI: src/skill_seekers/cli/main.py
  - Add subcommand handler to dispatcher
- Add tests: tests/test_my_command.py
  - Test main functionality
  - Test CLI argument parsing
  - Test error cases
- Update docs: CHANGELOG.md + README.md (if user-facing)
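A command module that is easy to test usually accepts `argv` explicitly and returns an exit code. The sketch below shows one possible shape for `my_command.py`; the arguments `source` and `--max-pages` are placeholders, not flags of any existing command.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        prog="skill-seekers-my-command",
        description="Hypothetical example command.",
    )
    parser.add_argument("source", help="Path or URL to process")
    parser.add_argument("--max-pages", type=int, default=50, help="Upper bound on pages")
    return parser

def main(argv=None):
    """Entry point; returns a process exit code (0 on success)."""
    args = build_parser().parse_args(argv)   # argv=None falls back to sys.argv[1:]
    if args.max_pages <= 0:
        print("error: --max-pages must be positive")
        return 2
    print(f"Processing {args.source} (max {args.max_pages} pages)")
    return 0
```

Tests can then call `main(["docs/", "--max-pages", "10"])` directly, which covers both argument parsing and error cases without spawning a subprocess.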
Adding a New Platform Adaptor
Files to modify:
- Create adaptor: src/skill_seekers/cli/adaptors/my_platform_adaptor.py
  from .base import BaseAdaptor
  class MyPlatformAdaptor(BaseAdaptor):
      def package(self, skill_dir, output_path, **kwargs):
          # Platform-specific packaging
          pass
      def upload(self, package_path, api_key=None, **kwargs):
          # Platform-specific upload (optional for some platforms)
          pass
      def export(self, skill_dir, format, **kwargs):
          # For RAG/vector DB adaptors: export to specific format
          pass
- Register in factory: src/skill_seekers/cli/adaptors/__init__.py
  def get_adaptor(target=None, format=None):
      # For LLM platforms (--target flag)
      target_adaptors = {
          'claude': ClaudeAdaptor,
          'gemini': GeminiAdaptor,
          'openai': OpenAIAdaptor,
          'markdown': MarkdownAdaptor,
          'myplatform': MyPlatformAdaptor,  # ADD THIS
      }
      # For RAG/vector DBs (--format flag)
      format_adaptors = {
          'langchain': LangChainAdaptor,
          'llama-index': LlamaIndexAdaptor,
          'chroma': ChromaAdaptor,
          # ... etc
      }
- Add optional dependency: pyproject.toml
  [project.optional-dependencies]
  myplatform = ["myplatform-sdk>=1.0.0"]
- Add tests: tests/test_adaptors/test_my_platform_adaptor.py
  - Test export format
  - Test upload (if applicable)
  - Test with real data
- Update documentation:
  - README.md - Platform comparison table
  - docs/integrations/MY_PLATFORM.md - Integration guide
  - examples/my-platform-example/ - Working example
Adding a New Config Preset
Files to modify:
- Create config: configs/my_framework.json
  {
    "name": "my_framework",
    "base_url": "https://docs.myframework.com/",
    "selectors": {...},
    "categories": {...}
  }
- Test locally:
  # Estimate first
  skill-seekers estimate configs/my_framework.json
  # Test scrape (small sample)
  skill-seekers scrape --config configs/my_framework.json --max-pages 50
- Add to README: Update presets table in README.md
- Submit to website: (Optional) Submit to SkillSeekersWeb.com
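Before running the estimate, a draft preset can be checked for the four top-level fields shown in the config above. This is a minimal sketch, not the project's validator; the type expectations and the URL check are assumptions.

```python
import json

# Assumed types for the four fields shown in the example config.
REQUIRED_FIELDS = {"name": str, "base_url": str, "selectors": dict, "categories": dict}

def config_problems(config):
    """Return a list of human-readable problems; an empty list means the draft looks sane."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in config:
            problems.append(f"missing field: {field}")
        elif not isinstance(config[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    base_url = config.get("base_url")
    if isinstance(base_url, str) and not base_url.startswith(("http://", "https://")):
        problems.append("base_url should start with http:// or https://")
    return problems

draft = json.loads('{"name": "my_framework", "base_url": "docs.myframework.com", "selectors": {}}')
print(config_problems(draft))
# ['missing field: categories', 'base_url should start with http:// or https://']
```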
Modifying Core Scraping Logic
Key files by feature:
| Feature | File | Size | Notes |
|---|---|---|---|
| Doc scraping | src/skill_seekers/cli/doc_scraper.py | ~90KB | Main scraper, BFS traversal |
| GitHub scraping | src/skill_seekers/cli/github_scraper.py | ~56KB | Repo analysis + metadata |
| GitHub API | src/skill_seekers/cli/github_fetcher.py | ~17KB | Rate limit handling |
| PDF extraction | src/skill_seekers/cli/pdf_scraper.py | Medium | PyMuPDF + OCR |
| Code analysis | src/skill_seekers/cli/code_analyzer.py | ~65KB | Multi-language AST parsing |
| Pattern detection | src/skill_seekers/cli/pattern_recognizer.py | Medium | C3.1 - 10 GoF patterns |
| Test extraction | src/skill_seekers/cli/test_example_extractor.py | Medium | C3.2 - 5 categories |
| Guide generation | src/skill_seekers/cli/how_to_guide_builder.py | ~45KB | C3.3 - AI-enhanced guides |
| Config extraction | src/skill_seekers/cli/config_extractor.py | ~32KB | C3.4 - 9 formats |
| Router generation | src/skill_seekers/cli/generate_router.py | ~43KB | C3.5 - Architecture docs |
| Signal flow | src/skill_seekers/cli/signal_flow_analyzer.py | Medium | C3.10 - Godot-specific |
Always add tests when modifying core logic!
Modifying the Unified Create Command
The create command uses a modular argument system:
Files involved:
- Parser: src/skill_seekers/cli/parsers/create_parser.py
  - Defines help text and formatter
  - Registers help mode flags (--help-web, --help-github, etc.)
  - Uses custom NoWrapFormatter for better help display
- Arguments: src/skill_seekers/cli/arguments/create.py
  - Three tiers of arguments:
    - UNIVERSAL_ARGUMENTS (13 flags) - Work for all sources
    - Source-specific dicts (WEB_ARGUMENTS, GITHUB_ARGUMENTS, etc.)
    - ADVANCED_ARGUMENTS - Rare/advanced options
  - add_create_arguments(parser, mode) - Multi-mode argument addition
- Source Detection: src/skill_seekers/cli/source_detector.py (if implemented)
  - Auto-detect source type from input
  - Pattern matching (URLs, GitHub repos, file extensions)
- Main Logic: src/skill_seekers/cli/create_command.py (if implemented)
  - Route to appropriate scraper based on detected type
  - Argument validation and compatibility checking
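The pattern-matching step of source detection might look roughly like the sketch below. It is an illustration only: the function name, the returned labels, and the rule order are assumptions, and a bare `owner/repo` string is inherently ambiguous with a relative directory, so a real implementation would probably also consult the filesystem.

```python
import re

def detect_source_type(source):
    """Classify a create-command input as "github", "web", "pdf", or "local"."""
    if re.match(r"^https?://(www\.)?github\.com/[^/]+/[^/]+", source):
        return "github"
    if re.match(r"^https?://", source):
        return "web"
    if source.lower().endswith(".pdf"):
        return "pdf"
    if source.startswith((".", "/", "~")):
        return "local"                      # explicit filesystem path
    if re.fullmatch(r"[\w-]+/[\w.-]+", source):
        return "github"                     # owner/repo shorthand
    return "local"

for example in ("https://docs.react.dev/", "facebook/react", "./my-project", "tutorial.pdf"):
    print(example, "->", detect_source_type(example))
```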
When adding new arguments:
- Universal args → UNIVERSAL_ARGUMENTS in arguments/create.py
- Source-specific → Appropriate dict (WEB_ARGUMENTS, etc.)
- Always update help text and add tests
Example: Adding a new universal flag:
# In arguments/create.py
UNIVERSAL_ARGUMENTS = {
# ... existing args ...
"my_flag": {
"flags": ("--my-flag", "-m"),
"kwargs": {
"action": "store_true",
"help": "Description of my flag",
},
},
}
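To show how such a dict entry could end up on the parser, here is a sketch of a function with the same name and signature as `add_create_arguments(parser, mode)`. The body is an assumption about how the dicts are consumed, and the `WEB_ARGUMENTS` entry and the mode names `"web"` and `"all"` are hypothetical.

```python
import argparse

UNIVERSAL_ARGUMENTS = {
    "my_flag": {
        "flags": ("--my-flag", "-m"),
        "kwargs": {"action": "store_true", "help": "Description of my flag"},
    },
}
WEB_ARGUMENTS = {  # hypothetical source-specific entry
    "max_pages": {
        "flags": ("--max-pages",),
        "kwargs": {"type": int, "default": 100, "help": "Page limit"},
    },
}

def add_create_arguments(parser, mode="default"):
    """Register universal flags, plus web flags when mode is "web" or "all"."""
    groups = [UNIVERSAL_ARGUMENTS]
    if mode in ("web", "all"):
        groups.append(WEB_ARGUMENTS)
    for group in groups:
        for spec in group.values():
            parser.add_argument(*spec["flags"], **spec["kwargs"])

parser = argparse.ArgumentParser(prog="create")
add_create_arguments(parser, mode="web")
args = parser.parse_args(["-m", "--max-pages", "25"])
print(args.my_flag, args.max_pages)  # True 25
```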
Adding MCP Tools
Files to modify:
- Add tool function: src/skill_seekers/mcp/tools/{category}_tools.py
- Register tool: src/skill_seekers/mcp/server_fastmcp.py
  @mcp.tool()
  def my_new_tool(param: str) -> str:
      """Tool description."""
      # Implementation
- Add tests: tests/test_mcp_fastmcp.py
- Update count: README.md (currently 26 tools)
📍 Key Files Quick Reference
| Task | File(s) | What to Modify |
|---|---|---|
| Add new CLI command | src/skill_seekers/cli/my_cmd.py, pyproject.toml | Create main() function; add entry point |
| Add platform adaptor | src/skill_seekers/cli/adaptors/my_platform.py, adaptors/__init__.py | Inherit BaseAdaptor; register in factory |
| Fix scraping logic | src/skill_seekers/cli/doc_scraper.py | scrape_all(), extract_content() |
| Add MCP tool | src/skill_seekers/mcp/server_fastmcp.py | Add @mcp.tool() function |
| Fix tests | tests/test_{feature}.py | Add/modify test functions |
| Add config preset | configs/{framework}.json | Create JSON config |
| Update CI | .github/workflows/tests.yml | Modify workflow steps |
📚 Key Code Locations
Documentation Scraper (src/skill_seekers/cli/doc_scraper.py):
- is_valid_url() - URL validation
- extract_content() - Content extraction
- detect_language() - Code language detection
- extract_patterns() - Pattern extraction
- smart_categorize() - Smart categorization
- infer_categories() - Category inference
- generate_quick_reference() - Quick reference generation
- create_enhanced_skill_md() - SKILL.md generation
- scrape_all() - Main scraping loop
- main() - Entry point
Codebase Analysis (src/skill_seekers/cli/):
- codebase_scraper.py - Main CLI for local codebase analysis
- code_analyzer.py - Multi-language AST parsing (9 languages)
- api_reference_builder.py - API documentation generation
- dependency_analyzer.py - NetworkX-based dependency graphs
- pattern_recognizer.py - C3.1 design pattern detection
- test_example_extractor.py - C3.2 test example extraction
- how_to_guide_builder.py - C3.3 guide generation
- config_extractor.py - C3.4 configuration extraction
- generate_router.py - C3.5 router skill generation
- signal_flow_analyzer.py - C3.10 signal flow analysis (Godot projects)
- unified_codebase_analyzer.py - Three-stream GitHub+local analyzer
AI Enhancement (src/skill_seekers/cli/):
- enhance_skill_local.py - LOCAL mode enhancement (4 execution modes)
- enhance_skill.py - API mode enhancement
- enhance_status.py - Status monitoring for background processes
- ai_enhancer.py - Shared AI enhancement logic
- guide_enhancer.py - C3.3 guide AI enhancement
- config_enhancer.py - C3.4 config AI enhancement
Platform Adaptors (src/skill_seekers/cli/adaptors/):
- __init__.py - Factory function
- base_adaptor.py - Abstract base class
- claude_adaptor.py - Claude AI implementation
- gemini_adaptor.py - Google Gemini implementation
- openai_adaptor.py - OpenAI ChatGPT implementation
- markdown_adaptor.py - Generic Markdown implementation
MCP Server (src/skill_seekers/mcp/):
- server.py - FastMCP-based server
- tools/ - 26 MCP tool implementations
Configuration & Rate Limit Management (NEW: v2.7.0 - src/skill_seekers/cli/):
- config_manager.py - Multi-token configuration system (~490 lines)
  - ConfigManager class - Singleton pattern for global config access
  - add_github_profile() - Add GitHub profile with token and strategy
  - get_github_token() - Smart fallback chain (CLI → Env → Config → Prompt)
  - get_next_profile() - Profile switching for rate limit handling
  - save_progress() / load_progress() - Job resumption support
  - cleanup_old_progress() - Auto-cleanup of old jobs (7 days default)
- config_command.py - Interactive configuration wizard (~400 lines)
  - main_menu() - 7-option main menu with navigation
  - github_token_menu() - GitHub profile management
  - add_github_profile() - Guided token setup with browser integration
  - api_keys_menu() - API key configuration for Claude/Gemini/OpenAI
  - test_connections() - Connection testing for tokens and API keys
- rate_limit_handler.py - Smart rate limit detection and handling (~450 lines)
  - RateLimitHandler class - Strategy pattern for rate limit handling
  - check_upfront() - Upfront rate limit check before starting
  - check_response() - Real-time detection from API responses
  - handle_rate_limit() - Execute strategy (prompt/wait/switch/fail)
  - try_switch_profile() - Automatic profile switching
  - wait_for_reset() - Countdown timer with live progress
  - show_countdown_timer() - Live terminal countdown display
- resume_command.py - Resume interrupted scraping jobs (~150 lines)
  - list_resumable_jobs() - Display all jobs with progress details
  - resume_job() - Resume from saved checkpoint
  - clean_old_jobs() - Cleanup old progress files
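The fallback chain named for get_github_token() (CLI → Env → Config → Prompt) can be pictured as an ordered list of lazy lookups. The sketch below is a standalone illustration, not the ConfigManager method: the parameters, the `{"token": ...}` config shape, and the use of a callable for the prompt are assumptions.

```python
import os

def get_github_token(cli_token=None, env=None, config=None, prompt=None):
    """Return the first token found, trying CLI, environment, config, then prompt."""
    env = os.environ if env is None else env
    candidates = (
        lambda: cli_token,
        lambda: env.get("GITHUB_TOKEN"),
        lambda: (config or {}).get("token"),
        lambda: prompt() if prompt else None,   # only called when everything else is empty
    )
    for candidate in candidates:
        token = candidate()
        if token:
            return token
    return None

print(get_github_token(env={"GITHUB_TOKEN": "from-env"}, config={"token": "from-config"}))  # from-env
```

Keeping the prompt last and lazy means non-interactive runs never block as long as any earlier source supplies a token.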
GitHub Integration (Modified for v2.7.0 - src/skill_seekers/cli/):
- github_fetcher.py - Integrated rate limit handler
  - Constructor now accepts interactive and profile_name parameters
  - fetch() - Added upfront rate limit check
  - All API calls check responses for rate limits
  - Raises RateLimitError when rate limit cannot be handled
- github_scraper.py - Added CLI flags
  - --non-interactive flag for CI/CD mode (fail fast)
  - --profile flag to select GitHub profile from config
  - Config supports interactive and github_profile keys
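Checking a response for a rate limit typically means looking at the status code together with GitHub's `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers (the latter is a UTC epoch timestamp). The helper below is a hedged sketch of that check, not the project's check_response(); it takes a plain dict, so header names are matched case-sensitively here, unlike the case-insensitive headers of an HTTP client.

```python
import time

def seconds_until_reset(status_code, headers, now=None):
    """Return seconds to wait if the response signals an exhausted rate limit, else None."""
    if status_code not in (403, 429):
        return None
    if headers.get("X-RateLimit-Remaining") != "0":
        return None  # rejected for another reason, e.g. missing permissions
    now = time.time() if now is None else now
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))
    return max(0, reset_at - int(now))

limited = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"}
print(seconds_until_reset(403, limited, now=1700000000))  # 60
```

A caller could then choose between waiting, switching profile, or failing fast based on the returned value.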
RAG & Vector Database Adaptors (NEW: v3.0.0 - src/skill_seekers/cli/adaptors/):
- langchain.py - LangChain Documents export (~250 lines)
  - Exports to LangChain Document format
  - Preserves metadata (source, category, type, url)
  - Smart chunking with overlap
- llama_index.py - LlamaIndex TextNodes export (~280 lines)
  - Exports to TextNode format with unique IDs
  - Relationship mapping between documents
  - Metadata preservation
- haystack.py - Haystack Documents export (~230 lines)
  - Pipeline-ready document format
  - Supports embeddings and filters
- chroma.py - ChromaDB integration (~350 lines)
  - Direct collection creation
  - Batch upsert with embeddings
  - Query interface
- weaviate.py - Weaviate vector search (~320 lines)
  - Schema creation with auto-detection
  - Batch import with error handling
- faiss_helpers.py - FAISS index generation (~280 lines)
  - Index building with metadata
  - Search utilities
- qdrant.py - Qdrant vector database (~300 lines)
  - Collection management
  - Payload indexing
- streaming_adaptor.py - Streaming data ingest (~200 lines)
  - Real-time data processing
  - Incremental updates
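"Chunking with overlap" means consecutive chunks share a tail so that context is not cut at a boundary. The sketch below shows the basic idea on characters; the adaptor's actual strategy (for example splitting on headings or tokens) is not documented here, so the function and its defaults are assumptions.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into chunks of up to chunk_size characters sharing `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break   # the last chunk already reaches the end of the text
    return chunks

print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```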
Cloud Storage & Infrastructure (NEW: v3.0.0 - src/skill_seekers/cli/):
- cloud_storage_cli.py - S3/GCS/Azure upload/download (~450 lines)
  - Multi-provider abstraction
  - Parallel uploads for large files
  - Retry logic with exponential backoff
- embedding_pipeline.py - Embedding generation for vectors (~320 lines)
  - Sentence-transformers integration
  - Batch processing
  - Multiple embedding models
- sync_cli.py - Continuous sync & monitoring (~380 lines)
  - File watching for changes
  - Automatic re-scraping
  - Smart diff detection
- incremental_updater.py - Smart incremental updates (~350 lines)
  - Change detection algorithms
  - Partial skill updates
  - Version tracking
- streaming_ingest.py - Real-time data streaming (~290 lines)
  - Stream processing pipelines
  - WebSocket support
- benchmark_cli.py - Performance benchmarking (~280 lines)
  - Scraping performance tests
  - Comparison reports
  - CI/CD integration
- quality_metrics.py - Quality analysis & reporting (~340 lines)
  - Completeness scoring
  - Link checking
  - Content quality metrics
- multilang_support.py - Internationalization support (~260 lines)
  - Language detection
  - Translation integration
  - Multi-locale skills
- setup_wizard.py - Interactive setup wizard (~220 lines)
  - Configuration management
  - Profile creation
  - First-time setup
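"Retry logic with exponential backoff", as listed for cloud_storage_cli.py, generally doubles the wait after each failed attempt up to a cap. The sketch below is a generic version under assumed defaults; which exceptions the real code retries on, and its delays, are not documented here.

```python
import random
import time

def retry_with_backoff(operation, attempts=4, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call operation(); on a transient error wait 0.5s, 1s, 2s, ... and try again."""
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise                                   # out of attempts: surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))  # small jitter to avoid lockstep retries

calls = []

def flaky_upload():
    """Stand-in for an upload that fails twice before succeeding."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient failure")
    return "uploaded"

print(retry_with_backoff(flaky_upload, sleep=lambda seconds: None))  # uploaded
```

Passing `sleep` as a parameter keeps tests fast, since they can record the delays instead of waiting.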
🎯 Project-Specific Best Practices
- Prefer the unified create command - Use skill-seekers create <source> over legacy commands for consistency
- Always use platform adaptors - Never hardcode platform-specific logic
- Test all platforms - Changes must work for all 16 platforms (was 4 in v2.x)
- Maintain backward compatibility - Legacy commands (scrape, github, analyze) must still work
- Document API changes - Update CHANGELOG.md for every release
- Keep dependencies optional - Platform-specific deps are optional (RAG, cloud, etc.)
- Use src/ layout - Proper package structure with pip install -e .
- Run tests before commits - Per user instructions, never skip tests (all 2,121 tests must pass)
- RAG-first mindset - v3.0.0 is the universal preprocessor for AI systems
- Export format clarity - Use --format for RAG/vector DBs, --target for LLM platforms
- Test with real integrations - Verify exports work with actual LangChain, ChromaDB, etc.
- Progressive disclosure - When adding flags, categorize as universal/source-specific/advanced
🐛 Debugging Tips
Enable Verbose Logging
# Set environment variable for debug output
export SKILL_SEEKERS_DEBUG=1
skill-seekers scrape --config configs/react.json
Test Single Function/Module
Run Python modules directly for debugging:
# Run modules with --help to see options
python -m skill_seekers.cli.doc_scraper --help
python -m skill_seekers.cli.github_scraper --repo facebook/react --dry-run
python -m skill_seekers.cli.package_skill --help
# Test MCP server directly
python -m skill_seekers.mcp.server_fastmcp
Use pytest with Debugging
# Drop into debugger on failure
pytest tests/test_scraper_features.py --pdb
# Show print statements (normally suppressed)
pytest tests/test_scraper_features.py -s
# Verbose test output (shows full diff, more details)
pytest tests/test_scraper_features.py -vv
# Run only failed tests from last run
pytest tests/ --lf
# Run until first failure (stop immediately)
pytest tests/ -x
# Show local variables on failure
pytest tests/ -l
Debug Specific Test
# Run single test with full output
pytest tests/test_scraper_features.py::test_detect_language -vv -s
# With debugger
pytest tests/test_scraper_features.py::test_detect_language --pdb
Check Package Installation
# Verify package is installed
pip list | grep skill-seekers
# Check installation mode (should show editable location)
pip show skill-seekers
# Verify imports work
python -c "import skill_seekers; print(skill_seekers.__version__)"
# Check CLI entry points
which skill-seekers
skill-seekers --version
Common Error Messages & Solutions
"ModuleNotFoundError: No module named 'skill_seekers'"
→ Solution: pip install -e .
→ Why: src/ layout requires package installation
"403 Forbidden" from GitHub API
→ Solution: Rate limit hit, set GITHUB_TOKEN or use skill-seekers config --github
→ Check limit: curl -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit
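The same check can be scripted. The sketch below assumes only the documented shape of GitHub's /rate_limit response (`resources.core.limit`, `remaining`, `reset`); the function names are hypothetical and not part of skill-seekers.

```python
import json
import os
import time
import urllib.request
from typing import Optional


def fetch_rate_limit(token: Optional[str] = None) -> dict:
    """Call GitHub's /rate_limit endpoint (requires network access)."""
    request = urllib.request.Request("https://api.github.com/rate_limit")
    if token:
        request.add_header("Authorization", f"token {token}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def summarize_core_limit(payload: dict, now: Optional[float] = None) -> str:
    """Summarize the 'core' REST quota from a /rate_limit response."""
    core = payload["resources"]["core"]
    now = time.time() if now is None else now
    wait_seconds = max(0, int(core["reset"] - now))  # 'reset' is a Unix timestamp
    return f"{core['remaining']}/{core['limit']} requests left, resets in {wait_seconds}s"


# Offline example using a payload shaped like GitHub's response.
sample = {"resources": {"core": {"limit": 60, "remaining": 0, "reset": 1_700_000_600}}}
print(summarize_core_limit(sample, now=1_700_000_000))
# prints: 0/60 requests left, resets in 600s

# Live check (needs network):
# print(summarize_core_limit(fetch_rate_limit(os.environ.get("GITHUB_TOKEN"))))
```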
"SKILL.md enhancement failed"
→ Solution: Check if ANTHROPIC_API_KEY is set, or use --mode LOCAL
→ Monitor: skill-seekers enhance-status output/react/ --watch
"No such file or directory: 'configs/myconfig.json'" → Solution: Config path resolution order:
- Exact path as provided
- ./configs/ (current directory)
- ~/.config/skill-seekers/configs/ (user config)
- SkillSeekersWeb.com API (presets)
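To see which location a config name would resolve to, the local part of the lookup order above can be sketched as follows. This is an illustration under stated assumptions, not the project's resolver: `candidate_config_paths` and `resolve_config` are hypothetical names, and the preset API fallback is left to the caller.

```python
from pathlib import Path
from typing import List, Optional


def candidate_config_paths(name: str, cwd: Path, home: Path) -> List[Path]:
    """Local lookup order from the list above (API presets excluded)."""
    given = Path(name)
    return [
        given if given.is_absolute() else cwd / given,  # 1. exact path as provided
        cwd / "configs" / given.name,  # 2. ./configs/
        home / ".config" / "skill-seekers" / "configs" / given.name,  # 3. user config
    ]


def resolve_config(
    name: str, cwd: Optional[Path] = None, home: Optional[Path] = None
) -> Optional[Path]:
    """Return the first existing candidate, or None so the caller can try presets."""
    cwd = cwd or Path.cwd()
    home = home or Path.home()
    for candidate in candidate_config_paths(name, cwd, home):
        if candidate.is_file():
            return candidate
    return None


# Show each candidate and whether it exists on this machine.
for path in candidate_config_paths("configs/myconfig.json", Path.cwd(), Path.home()):
    print(path, "(found)" if path.is_file() else "(missing)")
```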
"pytest: command not found" → Solution: Install dev dependencies
pip install pytest pytest-asyncio pytest-cov coverage
# Or: pip install -e ".[dev]" (if available)
"ruff: command not found" → Solution: Install ruff
pip install ruff
# Or use uvx: uvx ruff check src/
Debugging Scraping Issues
No content extracted?
# Test selectors in Python
from bs4 import BeautifulSoup
import requests
url = "https://docs.example.com/page"
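response = requests.get(url, timeout=30)
response.raise_for_status()  # Fail loudly on 4xx/5xx instead of parsing an error page
soup = BeautifulSoup(response.content, 'html.parser')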
# Try different selectors
print(soup.select_one('article'))
print(soup.select_one('main'))
print(soup.select_one('div[role="main"]'))
print(soup.select_one('.documentation-content'))
Categories not working?
- Check that categories in the config has the correct keywords
- Run with --dry-run to see categorization without scraping
- Enable debug mode: export SKILL_SEEKERS_DEBUG=1
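When keywords look right but pages still land in the wrong bucket, it can help to reproduce the matching by hand. The sketch below assumes that categories maps a category name to a list of keywords and that matching is a case-insensitive substring test on URL and title; the real categorizer may score matches differently, and `categorize` is a hypothetical name.

```python
from typing import Dict, List


def categorize(url: str, title: str, categories: Dict[str, List[str]]) -> str:
    """Return the first category whose keywords appear in the URL or title."""
    haystack = f"{url} {title}".lower()
    for name, keywords in categories.items():
        if any(keyword.lower() in haystack for keyword in keywords):
            return name
    return "other"  # no keyword matched


# Example config fragment and pages (made up for illustration).
categories = {
    "getting_started": ["intro", "quickstart", "tutorial"],
    "api": ["api", "reference"],
}
pages = [
    ("https://docs.example.com/quickstart", "Quick Start"),
    ("https://docs.example.com/reference/hooks", "Hooks"),
    ("https://docs.example.com/blog/2024", "Release notes"),
]
for url, title in pages:
    print(f"{categorize(url, title, categories):16} {url}")
```

Short keywords such as "api" also match inside longer words, which is a common reason for pages landing in an unexpected category.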
Profiling Performance
# Profile scraping performance
python -m cProfile -o profile.stats -m skill_seekers.cli.doc_scraper --config configs/react.json --max-pages 10
# Analyze profile
python -m pstats profile.stats
# In pstats shell:
# > sort cumtime
# > stats 20
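The interactive pstats session above can also be scripted with the standard library, which is convenient for comparing runs. In this sketch `slow_sum` is a stand-in workload, not a skill-seekers function.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Stand-in workload; replace with the call you want to profile."""
    return sum(i * i for i in range(n))


# Collect a profile in-process instead of via `python -m cProfile`.
profiler = cProfile.Profile()
profiler.enable()
slow_sum(200_000)
profiler.disable()

# Equivalent of `sort cumtime` followed by `stats 20` in the pstats shell.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumtime").print_stats(20)
report = buffer.getvalue()
print(report)

# For a saved file, load it by name instead:
# pstats.Stats("profile.stats").sort_stats("cumtime").print_stats(20)
```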
📖 Additional Documentation
Official Website:
- SkillSeekersWeb.com - Browse 24+ preset configs, share configs, complete documentation
For Users:
- README.md - Complete user documentation
- BULLETPROOF_QUICKSTART.md - Beginner guide
- TROUBLESHOOTING.md - Common issues
For Developers:
- CHANGELOG.md - Release history
- ROADMAP.md - 136 tasks across 10 categories
- docs/UNIFIED_SCRAPING.md - Multi-source scraping
- docs/MCP_SETUP.md - MCP server setup
- docs/ENHANCEMENT_MODES.md - AI enhancement modes
- docs/PATTERN_DETECTION.md - C3.1 pattern detection
- docs/THREE_STREAM_STATUS_REPORT.md - Three-stream architecture
- docs/MULTI_LLM_SUPPORT.md - Multi-platform support
🎓 Understanding the Codebase
Why src/ Layout?
Modern Python best practice (PEP 517/518):
- Prevents accidental imports from repo root
- Forces proper package installation
- Better isolation between package and tests
- Required: pip install -e . before running tests
Why Platform Adaptors?
Strategy pattern benefits:
- Single codebase supports 16 platforms (up from 4 in v2.x)
- Platform-specific optimizations (format, APIs, models)
- Easy to add new platforms (implement BaseAdaptor)
- Clean separation of concerns
- Testable in isolation
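A minimal sketch of the strategy pattern described above. Only the name BaseAdaptor comes from this document; the method `format_skill`, the `ADAPTORS` registry, `get_adaptor`, and the output formats are assumptions made for illustration and will differ from the real adaptor interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class BaseAdaptor(ABC):
    """Common interface; the method name here is illustrative, not the real API."""

    @abstractmethod
    def format_skill(self, title: str, body: str) -> str:
        """Render a skill in the platform's preferred layout."""


class ClaudeAdaptor(BaseAdaptor):
    def format_skill(self, title: str, body: str) -> str:
        # Hypothetical layout: front matter followed by the body.
        return f"---\nname: {title}\n---\n{body}"


class MarkdownAdaptor(BaseAdaptor):
    def format_skill(self, title: str, body: str) -> str:
        return f"# {title}\n\n{body}"


# Adding a platform means adding one class and one registry entry.
ADAPTORS: Dict[str, Type[BaseAdaptor]] = {
    "claude": ClaudeAdaptor,
    "markdown": MarkdownAdaptor,
}


def get_adaptor(target: str) -> BaseAdaptor:
    """Look up an adaptor by target name, with a helpful error for typos."""
    try:
        return ADAPTORS[target]()
    except KeyError:
        raise ValueError(
            f"Unknown target '{target}'. Available: {sorted(ADAPTORS)}"
        ) from None


for target in ("claude", "markdown"):
    print(get_adaptor(target).format_skill("react", "Hooks overview"))
```

Because callers depend only on the base interface, each adaptor can be unit-tested on its own without touching scraping or packaging code.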
Why Git-style CLI?
User experience benefits:
- Familiar to developers (like git)
- Single entry point: skill-seekers
- Backward compatible: individual tools still work
- Cleaner than multiple separate commands
- Easier to document and teach
Three-Stream GitHub Architecture
The unified_codebase_analyzer.py splits GitHub repositories into three independent streams:
Stream 1: Code Analysis (C3.x features)
- Deep AST parsing (9 languages)
- Design pattern detection (C3.1)
- Test example extraction (C3.2)
- How-to guide generation (C3.3)
- Configuration extraction (C3.4)
- Architectural overview (C3.5)
- API reference + dependency graphs
Stream 2: Documentation
- README, CONTRIBUTING, LICENSE
- docs/ directory markdown files
- Wiki pages (if available)
- CHANGELOG and version history
Stream 3: Community Insights
- GitHub metadata (stars, forks, watchers)
- Issue analysis (top problems and solutions)
- PR trends and contributor stats
- Release history
- Label-based topic detection
Key Benefits:
- Unified interface for GitHub URLs and local paths
- Analysis depth control: 'basic' (1-2 min) or 'c3x' (20-60 min)
- Enhanced router generation with GitHub context
- Smart keyword extraction weighted by GitHub labels (2x weight)
- 81 E2E tests passing (0.44 seconds)
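The label weighting mentioned above can be illustrated with a simple counter. This is a sketch under the assumption that each term from a GitHub label counts twice as much as a term from the documentation; `rank_keywords` and `LABEL_WEIGHT` are hypothetical names, and the actual extractor may combine scores differently.

```python
from collections import Counter
from typing import Iterable, List, Tuple

LABEL_WEIGHT = 2  # labels count double, per the "2x weight" note above


def rank_keywords(
    doc_terms: Iterable[str], label_terms: Iterable[str], top_n: int = 5
) -> List[Tuple[str, int]]:
    """Score terms from docs (weight 1) and GitHub labels (weight 2)."""
    scores: Counter = Counter()
    for term in doc_terms:
        scores[term.lower()] += 1
    for term in label_terms:
        scores[term.lower()] += LABEL_WEIGHT
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_n]


doc_terms = ["hooks", "state", "hooks", "routing"]
label_terms = ["routing", "performance"]
print(rank_keywords(doc_terms, label_terms))
# prints: [('routing', 3), ('hooks', 2), ('performance', 2), ('state', 1)]
```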
🔧 Helper Scripts
The scripts/ directory contains utility scripts:
# Bootstrap skill generation - self-hosting skill-seekers as a Claude skill
./scripts/bootstrap_skill.sh
# Start MCP server for HTTP transport
./scripts/start_mcp_server.sh
# Script templates are in scripts/skill_header.md
Bootstrap Skill Workflow:
- Analyzes skill-seekers codebase itself (dogfooding)
- Combines handcrafted header with auto-generated analysis
- Validates SKILL.md structure
- Outputs ready-to-use skill for Claude Code
🔍 Performance Characteristics
| Operation | Time | Notes |
|---|---|---|
| Scraping (sync) | 15-45 min | First time, thread-based |
| Scraping (async) | 5-15 min | 2-3x faster with --async |
| Building | 1-3 min | Fast rebuild from cache |
| Re-building | <1 min | With --skip-scrape |
| Enhancement (LOCAL) | 30-60 sec | Uses Claude Code Max |
| Enhancement (API) | 20-40 sec | Requires API key |
| Packaging | 5-10 sec | Final .zip creation |
🎉 Recent Achievements
v3.1.0 (In Development) - "Unified CLI & Developer Experience":
- 🎯 Unified create Command - Auto-detects source type (web/GitHub/local/PDF/config)
- 📋 Progressive Disclosure Help - Default shows 13 universal flags, detailed help available per source
- ⚡ -p Shortcut - Quick preset selection (-p quick|standard|comprehensive)
- 🔧 Enhancement Flag Consolidation - --enhance-level (0-3) replaces 3 separate flags
- 🎨 Smart Source Detection - No need to specify whether input is URL, repo, or directory
- 🔄 Enhancement Workflow Presets - YAML-based presets; skill-seekers workflows list/show/copy/add/remove/validate; bundled presets: default, minimal, security-focus, architecture-comprehensive, api-documentation
- 🔀 Multiple Workflows from CLI - --enhance-workflow wf-a --enhance-workflow wf-b chains presets in a single command; workflows copy/add/remove all accept multiple names/files at once
- 🐛 Bug Fix - create command now correctly forwards multiple --enhance-workflow flags to sub-scrapers
- ✅ 2,121 Tests Passing - All CLI refactor + workflow preset work verified
- 📚 Improved Documentation - CLAUDE.md, README, QUICK_REFERENCE updated with workflow preset details
v3.1.0 CI Stability (February 20, 2026):
- 🔧 Dependency Alignment - Fixed MCP version mismatch between requirements.txt (was 1.18.0) and pyproject.toml (>=1.25)
- 📦 PyYAML Core Dependency - Added PyYAML>=6.0 to core dependencies (required by workflow_tools.py module-level import)
- ⚡ Benchmark Stability - Relaxed timing-sensitive test thresholds for CI environment variability
- ✅ 2,121 Tests Passing - All CI matrix jobs passing (ubuntu 3.10/3.11/3.12, macos 3.11/3.12)
v3.0.0 (February 10, 2026) - "Universal Intelligence Platform":
- 🚀 16 Platform Adaptors - RAG frameworks (LangChain, LlamaIndex, Haystack), vector DBs (Chroma, FAISS, Weaviate, Qdrant), AI coding assistants (Cursor, Windsurf, Cline, Continue.dev), LLM platforms (Claude, Gemini, OpenAI)
- 🛠️ 26 MCP Tools (up from 18) - Complete automation for any AI system
- ✅ 1,852 Tests Passing (up from 700+) - Production-grade reliability
- ☁️ Cloud Storage - S3, GCS, Azure Blob Storage integration
- 🎯 AI Coding Assistants - Persistent context for Cursor, Windsurf, Cline, Continue.dev
- 📊 Quality Metrics - Automated completeness scoring and content analysis
- 🌐 Multilingual Support - Language detection and translation
- 🔄 Streaming Ingest - Real-time data processing pipelines
- 📈 Benchmarking Tools - Performance comparison and CI/CD integration
- 🔧 Setup Wizard - Interactive first-time configuration
- 📦 12 Example Projects - Complete working examples for every integration
- 📚 18 Integration Guides - Comprehensive documentation for all platforms
v2.9.0 (February 3, 2026):
- C3.10: Signal Flow Analysis - Complete signal flow analysis for Godot projects
- Comprehensive Godot 4.x support (GDScript, .tscn, .tres, .gdshader files)
- GDScript test extraction (GUT, gdUnit4, WAT frameworks)
- Signal pattern detection (EventBus, Observer, Event Chains)
- Signal-based how-to guides generation
v2.8.0 (February 1, 2026):
- C3.9: Project Documentation Extraction
- Granular AI enhancement control with --enhance-level (0-3)
v2.7.1 (January 18, 2026 - Hotfix):
- 🚨 Critical Bug Fix: Config download 404 errors resolved
- Fixed manual URL construction bug - now uses download_url from API response
- All 15 source tools tests + 8 fetch_config tests passing
v2.7.0 (January 18, 2026):
- 🔐 Smart Rate Limit Management - Multi-token GitHub configuration system
- 🧙 Interactive Configuration Wizard - Beautiful terminal UI (skill-seekers config)
- 🚦 Intelligent Rate Limit Handler - Four strategies (prompt/wait/switch/fail)
- 📥 Resume Capability - Continue interrupted jobs with progress tracking
- 🔧 CI/CD Support - Non-interactive mode for automation
- 🎯 Bootstrap Skill - Self-hosting skill-seekers as Claude Code skill
v2.6.0 (January 14, 2026):
- C3.x Codebase Analysis Suite Complete (C3.1-C3.8)
- Multi-platform support with platform adaptor architecture (4 platforms)
- 18 MCP tools fully functional
- 700+ tests passing
- Unified multi-source scraping maturity
C3.x Series (Complete - Code Analysis Features):
- C3.1: Design pattern detection (10 GoF patterns, 9 languages, 87% precision)
- C3.2: Test example extraction (5 categories, AST-based for Python)
- C3.3: How-to guide generation with AI enhancement (5 improvements)
- C3.4: Configuration pattern extraction (env vars, config files, CLI args)
- C3.5: Architectural overview & router skill generation
- C3.6: AI enhancement for patterns and test examples (Claude API integration)
- C3.7: Architectural pattern detection (8 patterns, framework-aware)
- C3.8: Standalone codebase scraper (300+ line SKILL.md from code alone)
- C3.9: Project documentation extraction (markdown categorization, AI enhancement)
- C3.10: Signal flow analysis (Godot event-driven architecture, pattern detection)
v2.5.2:
- UX Improvement: Analysis features now default ON with --skip-* flags (BREAKING)
- Router quality improvements: 6.5/10 → 8.5/10 (+31%)
- All 107 codebase analysis tests passing
v2.5.0:
- Multi-platform support (Claude, Gemini, OpenAI, Markdown)
- Platform adaptor architecture
- 18 MCP tools (up from 9)
- Complete feature parity across platforms
v2.1.0:
- Unified multi-source scraping (docs + GitHub + PDF)
- Conflict detection between sources
- 427 tests passing
v1.0.0:
- Production release with MCP integration
- Documentation scraping with smart categorization
- 12 preset configurations