skill-seekers-reference/CLAUDE.md
yusyus a99e22c639 feat: Multi-Source Synthesis Architecture - Rich Standalone Skills + Smart Combination
BREAKING CHANGE: Major architectural improvements to multi-source skill generation

This commit implements the complete "Multi-Source Synthesis Architecture" where
each source (documentation, GitHub, PDF) generates a rich standalone SKILL.md
file before being intelligently synthesized with source-specific formulas.

## 🎯 Core Architecture Changes

### 1. Rich Standalone SKILL.md Generation (Source Parity)

Each source now generates comprehensive, production-quality SKILL.md files that
can stand alone OR be synthesized with other sources.

**GitHub Scraper Enhancements** (+263 lines):
- Now generates 300+ line SKILL.md (was ~50 lines)
- Integrates C3.x codebase analysis data:
  - C2.5: API Reference extraction
  - C3.1: Design pattern detection (27 high-confidence patterns)
  - C3.2: Test example extraction (215 examples)
  - C3.7: Architectural pattern analysis
- Enhanced sections:
  - Quick Reference with pattern summaries
  - 📝 Code Examples from real repository tests
  - 🔧 API Reference from codebase analysis
  - 🏗️ Architecture Overview with design patterns
  - ⚠️ Known Issues from GitHub issues
- Location: src/skill_seekers/cli/github_scraper.py

**PDF Scraper Enhancements** (+205 lines):
- Now generates 200+ line SKILL.md (was ~50 lines)
- Enhanced content extraction:
  - 📖 Chapter Overview (PDF structure breakdown)
  - 🔑 Key Concepts (extracted from headings)
  - Quick Reference (pattern extraction)
  - 📝 Code Examples: Top 15 (was top 5), grouped by language
  - Quality scoring and intelligent truncation
- Better formatting and organization
- Location: src/skill_seekers/cli/pdf_scraper.py

**Result**: All 3 sources (docs, GitHub, PDF) now have equal capability to
generate rich, comprehensive standalone skills.

### 2. File Organization & Caching System

**Problem**: output/ directory cluttered with intermediate files, data, and logs.

**Solution**: New `.skillseeker-cache/` hidden directory for all intermediate files.

**New Structure**:
```
.skillseeker-cache/{skill_name}/
├── sources/          # Standalone SKILL.md from each source
│   ├── httpx_docs/
│   ├── httpx_github/
│   └── httpx_pdf/
├── data/             # Raw scraped data (JSON)
├── repos/            # Cloned GitHub repositories (cached for reuse)
└── logs/             # Session logs with timestamps

output/{skill_name}/  # CLEAN: Only final synthesized skill
├── SKILL.md
└── references/
```
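The cache layout above can be sketched in a few lines of Python. `ensure_cache_dirs` is a hypothetical helper for illustration, not the actual unified_scraper.py code:

```python
from pathlib import Path

def ensure_cache_dirs(skill_name: str, root: str = ".skillseeker-cache") -> dict:
    """Create the per-skill cache layout shown above. Illustrative sketch;
    the real unified_scraper.py may organize this differently."""
    base = Path(root) / skill_name
    dirs = {name: base / name for name in ("sources", "data", "repos", "logs")}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)  # idempotent: safe on re-runs
    return dirs

dirs = ensure_cache_dirs("httpx")
print(dirs["repos"])  # → .skillseeker-cache/httpx/repos
```

Because `mkdir(exist_ok=True)` is idempotent, re-running a scrape reuses the same directories, which is what makes the cached-repo reuse above cheap.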

**Benefits**:
- Clean output/ directory (only final product)
- Intermediate files preserved for debugging
- Repository clones cached and reused (faster re-runs)
- Timestamped logs for each scraping session
- All cache dirs added to .gitignore

**Changes**:
- .gitignore: Added `.skillseeker-cache/` entry
- unified_scraper.py: Complete reorganization (+238 lines)
  - Added cache directory structure
  - File logging with timestamps
  - Repository cloning with caching/reuse
  - Cleaner intermediate file management
  - Better subprocess logging and error handling

### 3. Config Repository Migration

**Moved to separate config repository**: https://github.com/yusufkaraaslan/skill-seekers-configs

**Deleted from this repo** (35 config files):
- ansible-core.json, astro.json, claude-code.json
- django.json, django_unified.json, fastapi.json, fastapi_unified.json
- godot.json, godot_unified.json, godot_github.json, godot-large-example.json
- react.json, react_unified.json, react_github.json, react_github_example.json
- vue.json, kubernetes.json, laravel.json, tailwind.json, hono.json
- svelte_cli_unified.json, steam-economy-complete.json
- deck_deck_go_local.json, python-tutorial-test.json, example_pdf.json
- test-manual.json, fastapi_unified_test.json, fastmcp_github_example.json
- example-team/ directory (4 files)

**Kept as reference example**:
- configs/httpx_comprehensive.json (complete multi-source example)

**Rationale**:
- Cleaner repository (979+ lines added, 1680 deleted)
- Configs managed separately with versioning
- Official presets available via `fetch-config` command
- Users can maintain private config repos

### 4. AI Enhancement Improvements

**enhance_skill.py** (+125 lines):
- Better integration with multi-source synthesis
- Enhanced prompt generation for synthesized skills
- Improved error handling and logging
- Support for source metadata in enhancement

### 5. Documentation Updates

**CLAUDE.md** (+252 lines):
- Comprehensive project documentation
- Architecture explanations
- Development workflow guidelines
- Testing requirements
- Multi-source synthesis patterns

**SKILL_QUALITY_ANALYSIS.md** (new):
- Quality assessment framework
- Before/after analysis of httpx skill
- Grading rubric for skill quality
- Metrics and benchmarks

### 6. Testing & Validation Scripts

**test_httpx_skill.sh** (new):
- Complete httpx skill generation test
- Multi-source synthesis validation
- Quality metrics verification

**test_httpx_quick.sh** (new):
- Quick validation script
- Subset of features for rapid testing

## 📊 Quality Improvements

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| GitHub SKILL.md lines | ~50 | 300+ | +500% |
| PDF SKILL.md lines | ~50 | 200+ | +300% |
| GitHub C3.x integration | No | Yes | New feature |
| PDF pattern extraction | No | Yes | New feature |
| File organization | Messy | Clean cache | Major improvement |
| Repository cloning | Always fresh | Cached reuse | Faster re-runs |
| Logging | Console only | Timestamped files | Better debugging |
| Config management | In-repo | Separate repo | Cleaner separation |

## 🧪 Testing

All existing tests pass:
- test_c3_integration.py: Updated for new architecture
- 700+ tests passing
- Multi-source synthesis validated with httpx example

## 🔧 Technical Details

**Modified Core Files**:
1. src/skill_seekers/cli/github_scraper.py (+263 lines)
   - _generate_skill_md(): Rich content with C3.x integration
   - _format_pattern_summary(): Design pattern summaries
   - _format_code_examples(): Test example formatting
   - _format_api_reference(): API reference from codebase
   - _format_architecture(): Architectural pattern analysis

2. src/skill_seekers/cli/pdf_scraper.py (+205 lines)
   - _generate_skill_md(): Enhanced with rich content
   - _format_key_concepts(): Extract concepts from headings
   - _format_patterns_from_content(): Pattern extraction
   - Code examples: Top 15, grouped by language, better quality scoring

3. src/skill_seekers/cli/unified_scraper.py (+238 lines)
   - __init__(): Cache directory structure
   - _setup_logging(): File logging with timestamps
   - _clone_github_repo(): Repository caching system
   - _scrape_documentation(): Move to cache, better logging
   - Better subprocess handling and error reporting

4. src/skill_seekers/cli/enhance_skill.py (+125 lines)
   - Multi-source synthesis awareness
   - Enhanced prompt generation
   - Better error handling

**Minor Updates**:
- src/skill_seekers/cli/codebase_scraper.py (+3 lines): Minor improvements
- src/skill_seekers/cli/test_example_extractor.py: Quality scoring adjustments
- tests/test_c3_integration.py: Test updates for new architecture

## 🚀 Migration Guide

**For users with existing configs**:
No action required - all existing configs continue to work.

**For users wanting official presets**:
```bash
# Fetch from official config repo
skill-seekers fetch-config --name react --target unified

# Or use existing local configs
skill-seekers unified --config configs/httpx_comprehensive.json
```

**Cache directory**:
New `.skillseeker-cache/` directory will be created automatically.
Safe to delete - will be regenerated on next run.

## 📈 Next Steps

This architecture enables:
- Source parity: All sources generate rich standalone skills
- Smart synthesis: Each source combination has an optimal formula
- Better debugging: Cached files and logs preserved
- Faster iteration: Repository caching, clean output
- 🔄 Future: Multi-platform enhancement (Gemini, GPT-4) - planned
- 🔄 Future: Conflict detection between sources - planned
- 🔄 Future: Source prioritization rules - planned

## 🎓 Example: httpx Skill Quality

**Before**: 186 lines, basic synthesis, missing data
**After**: 640 lines with AI enhancement, A- (9/10) quality

**What changed**:
- All C3.x analysis data integrated (patterns, tests, API, architecture)
- GitHub metadata included (stars, topics, languages)
- PDF chapter structure visible
- Professional formatting with emojis and clear sections
- Real-world code examples from test suite
- Design patterns explained with confidence scores
- Known issues with impact assessment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-11 23:01:07 +03:00


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

🎯 Project Overview

Skill Seekers is a Python tool that converts documentation websites, GitHub repositories, and PDFs into LLM skills. It supports 4 platforms: Claude AI, Google Gemini, OpenAI ChatGPT, and Generic Markdown.

Current Version: v2.5.2
Python Version: 3.10+ required
Status: Production-ready, published on PyPI

🏗️ Architecture

Core Design Pattern: Platform Adaptors

The codebase uses the Strategy Pattern with a factory method to support multiple LLM platforms:

src/skill_seekers/cli/adaptors/
├── __init__.py          # Factory: get_adaptor(target)
├── base_adaptor.py      # Abstract base class
├── claude_adaptor.py    # Claude AI (ZIP + YAML)
├── gemini_adaptor.py    # Google Gemini (tar.gz)
├── openai_adaptor.py    # OpenAI ChatGPT (ZIP + Vector Store)
└── markdown_adaptor.py  # Generic Markdown (ZIP)

Key Methods:

  • package(skill_dir, output_path) - Platform-specific packaging
  • upload(package_path, api_key) - Platform-specific upload
  • enhance(skill_dir, mode) - AI enhancement with platform-specific models
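The base class plus factory can be sketched as follows. Method names come from the list above; the signatures, the dummy markdown adaptor, and the registry dict are assumptions for illustration, not the real base_adaptor.py:

```python
from abc import ABC, abstractmethod

class BaseAdaptor(ABC):
    """Sketch of the adaptor interface; actual signatures may differ."""

    @abstractmethod
    def package(self, skill_dir: str, output_path: str) -> str: ...

    @abstractmethod
    def upload(self, package_path: str, api_key: str) -> None: ...

    @abstractmethod
    def enhance(self, skill_dir: str, mode: str = "api") -> None: ...

class MarkdownAdaptor(BaseAdaptor):
    """Dummy stand-in for markdown_adaptor.py."""
    def package(self, skill_dir: str, output_path: str) -> str:
        return f"{output_path}/skill.zip"   # real adaptor writes a ZIP archive
    def upload(self, package_path: str, api_key: str) -> None:
        pass                                # generic markdown has no upload API
    def enhance(self, skill_dir: str, mode: str = "api") -> None:
        pass

_ADAPTORS = {"markdown": MarkdownAdaptor}   # real factory registers all four

def get_adaptor(target: str) -> BaseAdaptor:
    if target not in _ADAPTORS:
        raise ValueError(f"Unknown platform: {target}")
    return _ADAPTORS[target]()
```

This is what makes adding a platform a matter of implementing three methods and registering one class.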

Data Flow (5 Phases)

  1. Scrape Phase (doc_scraper.py:scrape_all())

    • BFS traversal from base_url
    • Output: output/{name}_data/pages/*.json
  2. Build Phase (doc_scraper.py:build_skill())

    • Load pages → Categorize → Extract patterns
    • Output: output/{name}/SKILL.md + references/*.md
  3. Enhancement Phase (optional, enhance_skill_local.py)

    • LLM analyzes references → Rewrites SKILL.md
    • Platform-specific models (Sonnet 4, Gemini 2.0, GPT-4o)
  4. Package Phase (package_skill.py → adaptor)

    • Platform adaptor packages in appropriate format
    • Output: .zip or .tar.gz
  5. Upload Phase (optional, upload_skill.py → adaptor)

    • Upload via platform API
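The five phases chain together roughly as below. The stub functions stand in for the real modules (doc_scraper, enhance_skill_local, package_skill); names and return values are illustrative only:

```python
def scrape_all(config):            # Phase 1: BFS crawl -> page dicts
    return [{"url": config["base_url"], "title": "Intro"}]

def build_skill(pages):            # Phase 2: categorize -> skill directory
    return f"output/{len(pages)}-page-skill/"

def enhance_skill(skill_dir):      # Phase 3 (optional): LLM rewrite of SKILL.md
    return skill_dir

def package_skill(skill_dir):      # Phase 4: platform adaptor packaging
    return skill_dir.rstrip("/") + ".zip"

def run_pipeline(config, enhance=False):
    skill_dir = build_skill(scrape_all(config))
    if enhance:
        skill_dir = enhance_skill(skill_dir)
    return package_skill(skill_dir)  # Phase 5 (upload) omitted in this sketch

print(run_pipeline({"base_url": "https://docs.example.com/"}))
# → output/1-page-skill.zip
```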

File Structure (src/ layout)

src/skill_seekers/
├── cli/                              # CLI tools
│   ├── main.py                       # Git-style CLI dispatcher
│   ├── doc_scraper.py                # Main scraper (~790 lines)
│   ├── github_scraper.py             # GitHub repo analysis
│   ├── pdf_scraper.py                # PDF extraction
│   ├── unified_scraper.py            # Multi-source scraping
│   ├── codebase_scraper.py           # Local codebase analysis (C2.x)
│   ├── unified_codebase_analyzer.py  # Three-stream GitHub+local analyzer
│   ├── enhance_skill_local.py        # AI enhancement (LOCAL mode)
│   ├── enhance_status.py             # Enhancement status monitoring
│   ├── package_skill.py              # Skill packager
│   ├── upload_skill.py               # Upload to platforms
│   ├── install_skill.py              # Complete workflow automation
│   ├── install_agent.py              # Install to AI agent directories
│   ├── pattern_recognizer.py         # C3.1 Design pattern detection
│   ├── test_example_extractor.py     # C3.2 Test example extraction
│   ├── how_to_guide_builder.py       # C3.3 How-to guide generation
│   ├── config_extractor.py           # C3.4 Configuration extraction
│   ├── generate_router.py            # C3.5 Router skill generation
│   ├── code_analyzer.py              # Multi-language code analysis
│   ├── api_reference_builder.py      # API documentation builder
│   ├── dependency_analyzer.py        # Dependency graph analysis
│   └── adaptors/                     # Platform adaptor architecture
│       ├── __init__.py
│       ├── base_adaptor.py
│       ├── claude_adaptor.py
│       ├── gemini_adaptor.py
│       ├── openai_adaptor.py
│       └── markdown_adaptor.py
└── mcp/                              # MCP server integration
    ├── server.py                     # FastMCP server (stdio + HTTP)
    └── tools/                        # 18 MCP tool implementations

🛠️ Development Commands

Setup

# Install in editable mode (required before tests due to src/ layout)
pip install -e .

# Install with all platform dependencies
pip install -e ".[all-llms]"

# Install specific platforms
pip install -e ".[gemini]"   # Google Gemini
pip install -e ".[openai]"   # OpenAI ChatGPT

Running Tests

CRITICAL: Never skip tests - User requires all tests to pass before commits.

# All tests (must run pip install -e . first!)
pytest tests/ -v

# Specific test file
pytest tests/test_scraper_features.py -v

# Multi-platform tests
pytest tests/test_install_multiplatform.py -v

# With coverage
pytest tests/ --cov=src/skill_seekers --cov-report=term --cov-report=html

# Single test
pytest tests/test_scraper_features.py::test_detect_language -v

# MCP server tests
pytest tests/test_mcp_fastmcp.py -v

Test Architecture:

  • 46 test files covering all features
  • CI Matrix: Ubuntu + macOS, Python 3.10-3.13
  • 700+ tests passing
  • Must run pip install -e . before tests (src/ layout requirement)

Building & Publishing

# Build package (using uv - recommended)
uv build

# Or using build
python -m build

# Publish to PyPI
uv publish

# Or using twine
python -m twine upload dist/*

Testing CLI Commands

# Test scraping (dry run)
skill-seekers scrape --config configs/react.json --dry-run

# Test codebase analysis (C2.x features)
skill-seekers codebase --directory . --output output/codebase/

# Test pattern detection (C3.1)
skill-seekers patterns --file src/skill_seekers/cli/code_analyzer.py

# Test how-to guide generation (C3.3)
skill-seekers how-to-guides output/test_examples.json --output output/guides/

# Test enhancement status monitoring
skill-seekers enhance-status output/react/ --watch

# Test multi-platform packaging
skill-seekers package output/react/ --target gemini --dry-run

# Test MCP server (stdio mode)
python -m skill_seekers.mcp.server

# Test MCP server (HTTP mode)
python -m skill_seekers.mcp.server --transport http --port 8765

🔧 Key Implementation Details

CLI Architecture (Git-style)

Entry point: src/skill_seekers/cli/main.py

The unified CLI modifies sys.argv and calls existing main() functions to maintain backward compatibility:

# Example: skill-seekers scrape --config react.json
# Transforms to: doc_scraper.main() with modified sys.argv

Subcommands: scrape, github, pdf, unified, codebase, enhance, enhance-status, package, upload, estimate, install, install-agent, patterns, how-to-guides
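A minimal sketch of that sys.argv rewrite; the stub `doc_scraper_main` and the dispatch table are hypothetical, while the real main.py registers every subcommand listed above:

```python
import sys

def doc_scraper_main():
    """Stub standing in for skill_seekers.cli.doc_scraper.main()."""
    return sys.argv[1:]

TOOL_MAINS = {"scrape": doc_scraper_main}  # real dispatcher maps all subcommands

def dispatch(argv):
    """Drop the subcommand, rewrite sys.argv, call the tool's own main().
    The called main() parses arguments exactly as if invoked standalone."""
    subcommand, rest = argv[1], argv[2:]
    sys.argv = [f"skill-seekers-{subcommand}"] + rest
    return TOOL_MAINS[subcommand]()

print(dispatch(["skill-seekers", "scrape", "--config", "react.json"]))
# → ['--config', 'react.json']
```

Because each tool's `main()` still reads sys.argv itself, the standalone entry points (`skill-seekers-scrape`, etc.) keep working unchanged.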

New in v2.5.2:

  • codebase - Local codebase analysis without GitHub API (C2.x features)
  • enhance-status - Monitor background/daemon enhancement processes
  • patterns - Detect design patterns in code (C3.1)
  • how-to-guides - Generate educational guides from tests (C3.3)

Platform Adaptor Usage

from skill_seekers.cli.adaptors import get_adaptor

# Get platform-specific adaptor
adaptor = get_adaptor('gemini')  # or 'claude', 'openai', 'markdown'

# Package skill
adaptor.package(skill_dir='output/react/', output_path='output/')

# Upload to platform
adaptor.upload(
    package_path='output/react-gemini.tar.gz',
    api_key=os.getenv('GOOGLE_API_KEY')
)

# AI enhancement
adaptor.enhance(skill_dir='output/react/', mode='api')

C3.x Codebase Analysis Features

The project has comprehensive codebase analysis capabilities (C3.1-C3.7):

C3.1 Design Pattern Detection (pattern_recognizer.py):

  • Detects 10 common patterns: Singleton, Factory, Observer, Strategy, Decorator, Builder, Adapter, Command, Template Method, Chain of Responsibility
  • Supports 9 languages: Python, JavaScript, TypeScript, C++, C, C#, Go, Rust, Java
  • Three detection levels: surface (fast), deep (balanced), full (thorough)
  • 87% precision, 80% recall on real-world projects

C3.2 Test Example Extraction (test_example_extractor.py):

  • Extracts real usage examples from test files
  • Categories: instantiation, method_call, config, setup, workflow
  • AST-based for Python, regex-based for 8 other languages
  • Quality filtering with confidence scoring

C3.3 How-To Guide Generation (how_to_guide_builder.py):

  • Transforms test workflows into educational guides
  • 5 AI enhancements: step descriptions, troubleshooting, prerequisites, next steps, use cases
  • Dual-mode AI: API (fast) or LOCAL (free with Claude Code Max)
  • 4 grouping strategies: AI tutorial group, file path, test name, complexity

C3.4 Configuration Pattern Extraction (config_extractor.py):

  • Extracts configuration patterns from codebases
  • Identifies config files, env vars, CLI arguments
  • AI enhancement for better organization

C3.5 Router Skill Generation (generate_router.py):

  • Creates meta-skills that route to specialized skills
  • Quality improvements: 6.5/10 → 8.5/10 (+31%)
  • Integrates GitHub metadata, issues, labels

Codebase Scraper Integration (codebase_scraper.py):

# All C3.x features enabled by default, use --skip-* to disable
skill-seekers codebase --directory /path/to/repo

# Disable specific features
skill-seekers codebase --directory . --skip-patterns --skip-how-to-guides

# Legacy flags (deprecated but still work)
skill-seekers codebase --directory . --build-api-reference --build-dependency-graph

Key Architecture Decision (v2.5.2):

  • Changed from opt-in (--build-*) to opt-out (--skip-*) flags
  • All analysis features now ON by default for maximum value
  • Backward compatibility warnings for deprecated flags

Smart Categorization Algorithm

Located in doc_scraper.py:smart_categorize():

  • Scores pages against category keywords
  • 3 points for URL match, 2 for title, 1 for content
  • Threshold of 2+ for categorization
  • Auto-infers categories from URL segments if none provided
  • Falls back to "other" category
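The scoring rules above can be sketched as follows. This is an approximation: the real smart_categorize() also auto-infers categories from URL segments, which is omitted here:

```python
def smart_categorize(page, categories):
    """Score a page against category keywords: 3 points for a URL match,
    2 for a title match, 1 for a content match; threshold of 2;
    falls back to "other"."""
    best, best_score = "other", 0
    for category, keywords in categories.items():
        score = 0
        for kw in keywords:
            if kw in page["url"].lower():
                score += 3
            if kw in page["title"].lower():
                score += 2
            if kw in page["content"].lower():
                score += 1
        if score > best_score:
            best, best_score = category, score
    return best if best_score >= 2 else "other"

page = {"url": "/docs/quickstart", "title": "Quickstart", "content": "Install it"}
print(smart_categorize(page, {"getting_started": ["quickstart", "intro"]}))
# → getting_started
```

Note the threshold: a keyword appearing only in page content scores 1 and is not enough on its own to categorize the page.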

Language Detection

Located in doc_scraper.py:detect_language():

  1. CSS class attributes (language-*, lang-*)
  2. Heuristics (keywords like def, const, func)
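A sketch of the keyword-heuristic fallback (step 2); the specific regexes and language set here are assumptions for illustration, not the real detect_language() implementation:

```python
import re

# First-match-wins keyword hints; the real scraper checks CSS classes first.
KEYWORD_HINTS = {
    "python":     re.compile(r"\bdef \w+\(|\bimport \w+"),
    "javascript": re.compile(r"\bconst \w+ =|\bfunction \w+\("),
    "go":         re.compile(r"\bfunc \w+\(|\bpackage \w+"),
}

def detect_language_heuristic(code: str) -> str:
    for lang, pattern in KEYWORD_HINTS.items():
        if pattern.search(code):
            return lang
    return "text"  # unknown: fall through to a neutral label

print(detect_language_heuristic("def greet(name):\n    return name"))
# → python
```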

Configuration File Structure

Configs (configs/*.json) define scraping behavior:

{
  "name": "framework-name",
  "description": "When to use this skill",
  "base_url": "https://docs.example.com/",
  "selectors": {
    "main_content": "article",  // CSS selector
    "title": "h1",
    "code_blocks": "pre code"
  },
  "url_patterns": {
    "include": ["/docs"],
    "exclude": ["/blog"]
  },
  "categories": {
    "getting_started": ["intro", "quickstart"],
    "api": ["api", "reference"]
  },
  "rate_limit": 0.5,
  "max_pages": 500
}
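A minimal sketch of loading and sanity-checking such a config. The required-key set and the defaults are assumptions drawn from the example above; the real `validate_config` tool performs deeper structural checks:

```python
import json

REQUIRED_KEYS = {"name", "base_url", "selectors"}

def load_config(path: str) -> dict:
    """Load a scraping config and fail fast on missing required keys.
    Illustrative only; defaults here are assumed, not guaranteed."""
    with open(path) as f:
        config = json.load(f)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"config missing required keys: {sorted(missing)}")
    config.setdefault("rate_limit", 0.5)   # values mirror the example above
    config.setdefault("max_pages", 500)
    return config
```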

🧪 Testing Guidelines

Test Coverage Requirements

  • Core features: 100% coverage required
  • Platform adaptors: Each platform has dedicated tests
  • MCP tools: All 18 tools must be tested
  • Integration tests: End-to-end workflows

Key Test Files

  • test_scraper_features.py - Core scraping functionality
  • test_mcp_server.py - MCP integration (18 tools)
  • test_mcp_fastmcp.py - FastMCP framework
  • test_unified.py - Multi-source scraping
  • test_github_scraper.py - GitHub analysis
  • test_pdf_scraper.py - PDF extraction
  • test_install_multiplatform.py - Multi-platform packaging
  • test_integration.py - End-to-end workflows
  • test_install_skill.py - One-command install
  • test_install_agent.py - AI agent installation

🌐 Environment Variables

# Claude AI (default platform)
export ANTHROPIC_API_KEY=sk-ant-...

# Google Gemini (optional)
export GOOGLE_API_KEY=AIza...

# OpenAI ChatGPT (optional)
export OPENAI_API_KEY=sk-...

# GitHub (for higher rate limits)
export GITHUB_TOKEN=ghp_...

# Private config repositories (optional)
export GITLAB_TOKEN=glpat-...
export GITEA_TOKEN=...
export BITBUCKET_TOKEN=...

📦 Package Structure (pyproject.toml)

Entry Points

[project.scripts]
# Main unified CLI
skill-seekers = "skill_seekers.cli.main:main"

# Individual tool entry points
skill-seekers-scrape = "skill_seekers.cli.doc_scraper:main"
skill-seekers-github = "skill_seekers.cli.github_scraper:main"
skill-seekers-pdf = "skill_seekers.cli.pdf_scraper:main"
skill-seekers-unified = "skill_seekers.cli.unified_scraper:main"
skill-seekers-codebase = "skill_seekers.cli.codebase_scraper:main"           # NEW: C2.x
skill-seekers-enhance = "skill_seekers.cli.enhance_skill_local:main"
skill-seekers-enhance-status = "skill_seekers.cli.enhance_status:main"       # NEW: Status monitoring
skill-seekers-package = "skill_seekers.cli.package_skill:main"
skill-seekers-upload = "skill_seekers.cli.upload_skill:main"
skill-seekers-estimate = "skill_seekers.cli.estimate_pages:main"
skill-seekers-install = "skill_seekers.cli.install_skill:main"
skill-seekers-install-agent = "skill_seekers.cli.install_agent:main"
skill-seekers-patterns = "skill_seekers.cli.pattern_recognizer:main"         # NEW: C3.1
skill-seekers-how-to-guides = "skill_seekers.cli.how_to_guide_builder:main" # NEW: C3.3

Optional Dependencies

[project.optional-dependencies]
gemini = ["google-generativeai>=0.8.0"]
openai = ["openai>=1.0.0"]
all-llms = ["google-generativeai>=0.8.0", "openai>=1.0.0"]

[dependency-groups]  # PEP 735 (replaces tool.uv.dev-dependencies)
dev = [
    "pytest>=8.4.2",
    "pytest-asyncio>=0.24.0",
    "pytest-cov>=7.0.0",
    "coverage>=7.11.0",
]

Note: Project uses PEP 735 dependency-groups instead of deprecated tool.uv.dev-dependencies.

🚨 Critical Development Notes

Must Run Before Tests

# REQUIRED: Install package before running tests
pip install -e .

# Why: src/ layout requires package installation
# Without this, imports will fail

Never Skip Tests

Per user instructions in ~/.claude/CLAUDE.md:

  • "never skipp any test. always make sure all test pass"
  • All 700+ tests must pass before commits
  • Run full test suite: pytest tests/ -v

Platform-Specific Dependencies

Platform dependencies are optional:

# Install only what you need
pip install skill-seekers[gemini]  # Gemini support
pip install skill-seekers[openai]  # OpenAI support
pip install skill-seekers[all-llms]  # All platforms

AI Enhancement Modes

AI enhancement transforms basic skills (2-3/10) into production-ready skills (8-9/10). Two modes available:

API Mode (default if ANTHROPIC_API_KEY is set):

  • Direct Claude API calls (fast, efficient)
  • Cost: ~$0.15-$0.30 per skill
  • Perfect for CI/CD automation
  • Requires: export ANTHROPIC_API_KEY=sk-ant-...

LOCAL Mode (fallback if no API key):

  • Uses Claude Code CLI (your existing Max plan)
  • Free! No API charges
  • 4 execution modes:
    • Headless (default): Foreground, waits for completion
    • Background (--background): Returns immediately
    • Daemon (--daemon): Fully detached with nohup
    • Terminal (--interactive-enhancement): Opens new terminal (macOS)
  • Status monitoring: skill-seekers enhance-status output/react/ --watch
  • Timeout configuration: --timeout 300 (seconds)

Force Mode (default ON since v2.5.2):

  • Skip all confirmations automatically
  • Perfect for CI/CD, batch processing
  • Use --no-force to enable prompts if needed
# API mode (if ANTHROPIC_API_KEY is set)
skill-seekers enhance output/react/

# LOCAL mode (no API key needed)
skill-seekers enhance output/react/ --mode LOCAL

# Background with status monitoring
skill-seekers enhance output/react/ --background
skill-seekers enhance-status output/react/ --watch

# Force mode OFF (enable prompts)
skill-seekers enhance output/react/ --no-force

See docs/ENHANCEMENT_MODES.md for detailed documentation.

Git Workflow

  • Main branch: main
  • Current branch: development
  • Always create feature branches from development
  • Feature branch naming: feature/{task-id}-{description} or feature/{category}

🔌 MCP Integration

MCP Server (18 Tools)

Transport modes:

  • stdio: Claude Code, VS Code + Cline
  • HTTP: Cursor, Windsurf, IntelliJ IDEA

Core Tools (9):

  1. list_configs - List preset configurations
  2. generate_config - Generate config from docs URL
  3. validate_config - Validate config structure
  4. estimate_pages - Estimate page count
  5. scrape_docs - Scrape documentation
  6. package_skill - Package to .zip (supports --target)
  7. upload_skill - Upload to platform (supports --target)
  8. enhance_skill - AI enhancement with platform support
  9. install_skill - Complete workflow automation

Extended Tools (9):

  10. scrape_github - GitHub repository analysis
  11. scrape_pdf - PDF extraction
  12. unified_scrape - Multi-source scraping
  13. merge_sources - Merge docs + code
  14. detect_conflicts - Find discrepancies
  15. split_config - Split large configs
  16. generate_router - Generate router skills
  17. add_config_source - Register git repos
  18. fetch_config - Fetch configs from git

Starting MCP Server

# stdio mode (Claude Code, VS Code + Cline)
python -m skill_seekers.mcp.server

# HTTP mode (Cursor, Windsurf, IntelliJ)
python -m skill_seekers.mcp.server --transport http --port 8765

📋 Common Workflows

Adding a New Platform

  1. Create adaptor in src/skill_seekers/cli/adaptors/{platform}_adaptor.py
  2. Inherit from BaseAdaptor
  3. Implement package(), upload(), enhance() methods
  4. Add to factory in adaptors/__init__.py
  5. Add optional dependency to pyproject.toml
  6. Add tests in tests/test_install_multiplatform.py

Adding a New Feature

  1. Implement in appropriate CLI module
  2. Add entry point to pyproject.toml if needed
  3. Add tests in tests/test_{feature}.py
  4. Run full test suite: pytest tests/ -v
  5. Update CHANGELOG.md
  6. Commit only when all tests pass

Debugging Test Failures

# Run specific failing test with verbose output
pytest tests/test_file.py::test_name -vv

# Run with print statements visible
pytest tests/test_file.py -s

# Run with coverage to see what's not tested
pytest tests/test_file.py --cov=src/skill_seekers --cov-report=term-missing

📚 Key Code Locations

Documentation Scraper (src/skill_seekers/cli/doc_scraper.py):

  • is_valid_url() - URL validation
  • extract_content() - Content extraction
  • detect_language() - Code language detection
  • extract_patterns() - Pattern extraction
  • smart_categorize() - Smart categorization
  • infer_categories() - Category inference
  • generate_quick_reference() - Quick reference generation
  • create_enhanced_skill_md() - SKILL.md generation
  • scrape_all() - Main scraping loop
  • main() - Entry point

Codebase Analysis (src/skill_seekers/cli/):

  • codebase_scraper.py - Main CLI for local codebase analysis
  • code_analyzer.py - Multi-language AST parsing (9 languages)
  • api_reference_builder.py - API documentation generation
  • dependency_analyzer.py - NetworkX-based dependency graphs
  • pattern_recognizer.py - C3.1 design pattern detection
  • test_example_extractor.py - C3.2 test example extraction
  • how_to_guide_builder.py - C3.3 guide generation
  • config_extractor.py - C3.4 configuration extraction
  • generate_router.py - C3.5 router skill generation
  • unified_codebase_analyzer.py - Three-stream GitHub+local analyzer

AI Enhancement (src/skill_seekers/cli/):

  • enhance_skill_local.py - LOCAL mode enhancement (4 execution modes)
  • enhance_skill.py - API mode enhancement
  • enhance_status.py - Status monitoring for background processes
  • ai_enhancer.py - Shared AI enhancement logic
  • guide_enhancer.py - C3.3 guide AI enhancement
  • config_enhancer.py - C3.4 config AI enhancement

Platform Adaptors (src/skill_seekers/cli/adaptors/):

  • __init__.py - Factory function
  • base_adaptor.py - Abstract base class
  • claude_adaptor.py - Claude AI implementation
  • gemini_adaptor.py - Google Gemini implementation
  • openai_adaptor.py - OpenAI ChatGPT implementation
  • markdown_adaptor.py - Generic Markdown implementation

MCP Server (src/skill_seekers/mcp/):

  • server.py - FastMCP-based server
  • tools/ - 18 MCP tool implementations

🎯 Project-Specific Best Practices

  1. Always use platform adaptors - Never hardcode platform-specific logic
  2. Test all platforms - Changes must work for all 4 platforms
  3. Maintain backward compatibility - Legacy configs must still work
  4. Document API changes - Update CHANGELOG.md for every release
  5. Keep dependencies optional - Platform-specific deps are optional
  6. Use src/ layout - Proper package structure with pip install -e .
  7. Run tests before commits - Per user instructions, never skip tests

📖 Additional Documentation

For Users:

For Developers:

🎓 Understanding the Codebase

Why src/ Layout?

Modern Python best practice (PEP 517/518):

  • Prevents accidental imports from repo root
  • Forces proper package installation
  • Better isolation between package and tests
  • Required: pip install -e . before running tests

Why Platform Adaptors?

Strategy pattern benefits:

  • Single codebase supports 4 platforms
  • Platform-specific optimizations (format, APIs, models)
  • Easy to add new platforms (implement BaseAdaptor)
  • Clean separation of concerns
  • Testable in isolation

Why Git-style CLI?

User experience benefits:

  • Familiar to developers (like git)
  • Single entry point: skill-seekers
  • Backward compatible: individual tools still work
  • Cleaner than multiple separate commands
  • Easier to document and teach

Three-Stream GitHub Architecture

The unified_codebase_analyzer.py splits GitHub repositories into three independent streams:

Stream 1: Code Analysis (C3.x features)

  • Deep AST parsing (9 languages)
  • Design pattern detection (C3.1)
  • Test example extraction (C3.2)
  • How-to guide generation (C3.3)
  • Configuration extraction (C3.4)
  • Architectural overview (C3.5)
  • API reference + dependency graphs

Stream 2: Documentation

  • README, CONTRIBUTING, LICENSE
  • docs/ directory markdown files
  • Wiki pages (if available)
  • CHANGELOG and version history

Stream 3: Community Insights

  • GitHub metadata (stars, forks, watchers)
  • Issue analysis (top problems and solutions)
  • PR trends and contributor stats
  • Release history
  • Label-based topic detection

Key Benefits:

  • Unified interface for GitHub URLs and local paths
  • Analysis depth control: 'basic' (1-2 min) or 'c3x' (20-60 min)
  • Enhanced router generation with GitHub context
  • Smart keyword extraction weighted by GitHub labels (2x weight)
  • 81 E2E tests passing (0.44 seconds)
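The label-weighting idea can be sketched as follows; this scoring is assumed for illustration and is not the analyzer's actual formula:

```python
from collections import Counter

def weighted_keywords(doc_terms, label_terms, top_n=5):
    """Rank keywords with GitHub label terms counting double
    (the "2x weight" noted above). Illustrative sketch only."""
    counts = Counter(t.lower() for t in doc_terms)
    for t in label_terms:
        counts[t.lower()] += 2   # label-derived terms carry extra weight
    return [term for term, _ in counts.most_common(top_n)]

print(weighted_keywords(["http", "client", "async", "http"], ["async", "pooling"], top_n=3))
# → ['async', 'http', 'pooling']
```

`most_common` is stable for ties, so equally-weighted terms keep their first-seen order.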

🔍 Performance Characteristics

| Operation | Time | Notes |
|-----------|------|-------|
| Scraping (sync) | 15-45 min | First time, thread-based |
| Scraping (async) | 5-15 min | 2-3x faster with --async |
| Building | 1-3 min | Fast rebuild from cache |
| Re-building | <1 min | With --skip-scrape |
| Enhancement (LOCAL) | 30-60 sec | Uses Claude Code Max |
| Enhancement (API) | 20-40 sec | Requires API key |
| Packaging | 5-10 sec | Final .zip creation |

🎉 Recent Achievements

v2.5.2 (Latest):

  • UX Improvement: Analysis features now default ON with --skip-* flags (BREAKING)
  • Changed from opt-in (--build-*) to opt-out (--skip-*) for better discoverability
  • Router quality improvements: 6.5/10 → 8.5/10 (+31%)
  • C3.5 Architectural Overview & Skill Integrator
  • All 107 codebase analysis tests passing

v2.5.1:

  • Fixed critical PyPI packaging bug (missing adaptors module)
  • 100% of multi-platform features working

v2.5.0:

  • Multi-platform support (4 LLM platforms)
  • Platform adaptor architecture
  • 18 MCP tools (up from 9)
  • Complete feature parity across platforms
  • 700+ tests passing

C3.x Series (Code Analysis Features):

  • C3.1: Design pattern detection (10 patterns, 9 languages, 87% precision)
  • C3.2: Test example extraction (AST-based, 19 tests)
  • C3.3: How-to guide generation with AI enhancement (5 improvements)
  • C3.4: Configuration pattern extraction
  • C3.5: Router skill generation
  • C3.6: AI enhancement (dual-mode: API + LOCAL)
  • C3.7: Architectural pattern detection

v2.0.0:

  • Unified multi-source scraping
  • Conflict detection between docs and code
  • 5 unified configs (React, Django, FastAPI, Godot)
  • 22 unified tests passing