BREAKING CHANGE: Major architectural improvements to multi-source skill generation This commit implements the complete "Multi-Source Synthesis Architecture" where each source (documentation, GitHub, PDF) generates a rich standalone SKILL.md file before being intelligently synthesized with source-specific formulas. ## 🎯 Core Architecture Changes ### 1. Rich Standalone SKILL.md Generation (Source Parity) Each source now generates comprehensive, production-quality SKILL.md files that can stand alone OR be synthesized with other sources. **GitHub Scraper Enhancements** (+263 lines): - Now generates 300+ line SKILL.md (was ~50 lines) - Integrates C3.x codebase analysis data: - C2.5: API Reference extraction - C3.1: Design pattern detection (27 high-confidence patterns) - C3.2: Test example extraction (215 examples) - C3.7: Architectural pattern analysis - Enhanced sections: - ⚡ Quick Reference with pattern summaries - 📝 Code Examples from real repository tests - 🔧 API Reference from codebase analysis - 🏗️ Architecture Overview with design patterns - ⚠️ Known Issues from GitHub issues - Location: src/skill_seekers/cli/github_scraper.py **PDF Scraper Enhancements** (+205 lines): - Now generates 200+ line SKILL.md (was ~50 lines) - Enhanced content extraction: - 📖 Chapter Overview (PDF structure breakdown) - 🔑 Key Concepts (extracted from headings) - ⚡ Quick Reference (pattern extraction) - 📝 Code Examples: Top 15 (was top 5), grouped by language - Quality scoring and intelligent truncation - Better formatting and organization - Location: src/skill_seekers/cli/pdf_scraper.py **Result**: All 3 sources (docs, GitHub, PDF) now have equal capability to generate rich, comprehensive standalone skills. ### 2. File Organization & Caching System **Problem**: output/ directory cluttered with intermediate files, data, and logs. **Solution**: New `.skillseeker-cache/` hidden directory for all intermediate files. **New Structure**: ``` .skillseeker-cache/{skill_name}/ ├── sources/ # Standalone SKILL.md from each source │ ├── httpx_docs/ │ ├── httpx_github/ │ └── httpx_pdf/ ├── data/ # Raw scraped data (JSON) ├── repos/ # Cloned GitHub repositories (cached for reuse) └── logs/ # Session logs with timestamps output/{skill_name}/ # CLEAN: Only final synthesized skill ├── SKILL.md └── references/ ``` **Benefits**: - ✅ Clean output/ directory (only final product) - ✅ Intermediate files preserved for debugging - ✅ Repository clones cached and reused (faster re-runs) - ✅ Timestamped logs for each scraping session - ✅ All cache dirs added to .gitignore **Changes**: - .gitignore: Added `.skillseeker-cache/` entry - unified_scraper.py: Complete reorganization (+238 lines) - Added cache directory structure - File logging with timestamps - Repository cloning with caching/reuse - Cleaner intermediate file management - Better subprocess logging and error handling ### 3. Config Repository Migration **Moved to separate config repository**: https://github.com/yusufkaraaslan/skill-seekers-configs **Deleted from this repo** (35 config files): - ansible-core.json, astro.json, claude-code.json - django.json, django_unified.json, fastapi.json, fastapi_unified.json - godot.json, godot_unified.json, godot_github.json, godot-large-example.json - react.json, react_unified.json, react_github.json, react_github_example.json - vue.json, kubernetes.json, laravel.json, tailwind.json, hono.json - svelte_cli_unified.json, steam-economy-complete.json - deck_deck_go_local.json, python-tutorial-test.json, example_pdf.json - test-manual.json, fastapi_unified_test.json, fastmcp_github_example.json - example-team/ directory (4 files) **Kept as reference example**: - configs/httpx_comprehensive.json (complete multi-source example) **Rationale**: - Cleaner repository (979+ lines added, 1680 deleted) - Configs managed separately with versioning - Official presets available via `fetch-config` command - Users can maintain private config repos ### 4. AI Enhancement Improvements **enhance_skill.py** (+125 lines): - Better integration with multi-source synthesis - Enhanced prompt generation for synthesized skills - Improved error handling and logging - Support for source metadata in enhancement ### 5. Documentation Updates **CLAUDE.md** (+252 lines): - Comprehensive project documentation - Architecture explanations - Development workflow guidelines - Testing requirements - Multi-source synthesis patterns **SKILL_QUALITY_ANALYSIS.md** (new): - Quality assessment framework - Before/after analysis of httpx skill - Grading rubric for skill quality - Metrics and benchmarks ### 6. Testing & Validation Scripts **test_httpx_skill.sh** (new): - Complete httpx skill generation test - Multi-source synthesis validation - Quality metrics verification **test_httpx_quick.sh** (new): - Quick validation script - Subset of features for rapid testing ## 📊 Quality Improvements | Metric | Before | After | Improvement | |--------|--------|-------|-------------| | GitHub SKILL.md lines | ~50 | 300+ | +500% | | PDF SKILL.md lines | ~50 | 200+ | +300% | | GitHub C3.x integration | ❌ No | ✅ Yes | New feature | | PDF pattern extraction | ❌ No | ✅ Yes | New feature | | File organization | Messy | Clean cache | Major improvement | | Repository cloning | Always fresh | Cached reuse | Faster re-runs | | Logging | Console only | Timestamped files | Better debugging | | Config management | In-repo | Separate repo | Cleaner separation | ## 🧪 Testing All existing tests pass: - test_c3_integration.py: Updated for new architecture - 700+ tests passing - Multi-source synthesis validated with httpx example ## 🔧 Technical Details **Modified Core Files**: 1. src/skill_seekers/cli/github_scraper.py (+263 lines) - _generate_skill_md(): Rich content with C3.x integration - _format_pattern_summary(): Design pattern summaries - _format_code_examples(): Test example formatting - _format_api_reference(): API reference from codebase - _format_architecture(): Architectural pattern analysis 2. src/skill_seekers/cli/pdf_scraper.py (+205 lines) - _generate_skill_md(): Enhanced with rich content - _format_key_concepts(): Extract concepts from headings - _format_patterns_from_content(): Pattern extraction - Code examples: Top 15, grouped by language, better quality scoring 3. src/skill_seekers/cli/unified_scraper.py (+238 lines) - __init__(): Cache directory structure - _setup_logging(): File logging with timestamps - _clone_github_repo(): Repository caching system - _scrape_documentation(): Move to cache, better logging - Better subprocess handling and error reporting 4. src/skill_seekers/cli/enhance_skill.py (+125 lines) - Multi-source synthesis awareness - Enhanced prompt generation - Better error handling **Minor Updates**: - src/skill_seekers/cli/codebase_scraper.py (+3 lines): Minor improvements - src/skill_seekers/cli/test_example_extractor.py: Quality scoring adjustments - tests/test_c3_integration.py: Test updates for new architecture ## 🚀 Migration Guide **For users with existing configs**: No action required - all existing configs continue to work. **For users wanting official presets**: ```bash # Fetch from official config repo skill-seekers fetch-config --name react --target unified # Or use existing local configs skill-seekers unified --config configs/httpx_comprehensive.json ``` **Cache directory**: New `.skillseeker-cache/` directory will be created automatically. Safe to delete - will be regenerated on next run. ## 📈 Next Steps This architecture enables: - ✅ Source parity: All sources generate rich standalone skills - ✅ Smart synthesis: Each combination has optimal formula - ✅ Better debugging: Cached files and logs preserved - ✅ Faster iteration: Repository caching, clean output - 🔄 Future: Multi-platform enhancement (Gemini, GPT-4) - planned - 🔄 Future: Conflict detection between sources - planned - 🔄 Future: Source prioritization rules - planned ## 🎓 Example: httpx Skill Quality **Before**: 186 lines, basic synthesis, missing data **After**: 640 lines with AI enhancement, A- (9/10) quality **What changed**: - All C3.x analysis data integrated (patterns, tests, API, architecture) - GitHub metadata included (stars, topics, languages) - PDF chapter structure visible - Professional formatting with emojis and clear sections - Real-world code examples from test suite - Design patterns explained with confidence scores - Known issues with impact assessment 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
388 lines
14 KiB
Python
388 lines
14 KiB
Python
#!/usr/bin/env python3
|
|
"""
|
|
Integration tests for C3.5 - Architectural Overview & Skill Integrator
|
|
|
|
Tests the integration of C3.x codebase analysis features into unified skills:
|
|
- Default ON behavior for enable_codebase_analysis
|
|
- --skip-codebase-analysis CLI flag
|
|
- ARCHITECTURE.md generation with 8 sections
|
|
- C3.x reference directory structure
|
|
- Graceful degradation on failures
|
|
"""
|
|
|
|
import os
|
|
import json
|
|
import pytest
|
|
import tempfile
|
|
import shutil
|
|
from pathlib import Path
|
|
from unittest.mock import Mock, patch, MagicMock
|
|
|
|
# Import modules to test
|
|
from skill_seekers.cli.unified_scraper import UnifiedScraper
|
|
from skill_seekers.cli.unified_skill_builder import UnifiedSkillBuilder
|
|
from skill_seekers.cli.config_validator import ConfigValidator
|
|
|
|
|
|
class TestC3Integration:
|
|
"""Test C3.5 integration features."""
|
|
|
|
@pytest.fixture
|
|
def temp_dir(self):
|
|
"""Create temporary directory for tests."""
|
|
temp = tempfile.mkdtemp()
|
|
yield temp
|
|
shutil.rmtree(temp, ignore_errors=True)
|
|
|
|
@pytest.fixture
|
|
def mock_config(self, temp_dir):
|
|
"""Create mock unified config with GitHub source."""
|
|
return {
|
|
'name': 'test-c3',
|
|
'description': 'Test C3.5 integration',
|
|
'merge_mode': 'rule-based',
|
|
'sources': [
|
|
{
|
|
'type': 'github',
|
|
'repo': 'test/repo',
|
|
'local_repo_path': temp_dir,
|
|
'enable_codebase_analysis': True,
|
|
'ai_mode': 'none'
|
|
}
|
|
]
|
|
}
|
|
|
|
@pytest.fixture
|
|
def mock_c3_data(self):
|
|
"""Create mock C3.x analysis data."""
|
|
return {
|
|
'patterns': [
|
|
{
|
|
'file_path': 'src/factory.py',
|
|
'patterns': [
|
|
{
|
|
'pattern_type': 'Factory',
|
|
'class_name': 'WidgetFactory',
|
|
'confidence': 0.95,
|
|
'indicators': ['create_method', 'product_interface']
|
|
}
|
|
]
|
|
}
|
|
],
|
|
'test_examples': {
|
|
'total_examples': 15,
|
|
'high_value_count': 9,
|
|
'examples': [
|
|
{
|
|
'description': 'Create widget instance',
|
|
'category': 'instantiation',
|
|
'confidence': 0.85,
|
|
'file_path': 'tests/test_widget.py',
|
|
'code_snippet': 'widget = Widget(name="test")'
|
|
}
|
|
],
|
|
'examples_by_category': {
|
|
'instantiation': 5,
|
|
'method_call': 6,
|
|
'workflow': 4
|
|
}
|
|
},
|
|
'how_to_guides': {
|
|
'guides': [
|
|
{
|
|
'id': 'create_widget',
|
|
'title': 'How to create a widget',
|
|
'description': 'Step-by-step guide',
|
|
'steps': [
|
|
{
|
|
'action': 'Import Widget class',
|
|
'code_example': 'from widgets import Widget',
|
|
'language': 'python'
|
|
}
|
|
]
|
|
}
|
|
],
|
|
'total_count': 1
|
|
},
|
|
'config_patterns': {
|
|
'config_files': [
|
|
{
|
|
'relative_path': 'config.json',
|
|
'type': 'json',
|
|
'purpose': 'Application configuration',
|
|
'settings': [
|
|
{'key': 'debug', 'value': 'true', 'value_type': 'boolean'}
|
|
]
|
|
}
|
|
],
|
|
'ai_enhancements': {
|
|
'overall_insights': {
|
|
'security_issues_found': 1,
|
|
'recommended_actions': ['Move secrets to .env']
|
|
}
|
|
}
|
|
},
|
|
'architecture': {
|
|
'patterns': [
|
|
{
|
|
'pattern_name': 'MVC',
|
|
'confidence': 0.89,
|
|
'framework': 'Flask',
|
|
'evidence': ['models/ directory', 'views/ directory', 'controllers/ directory']
|
|
}
|
|
],
|
|
'frameworks_detected': ['Flask', 'SQLAlchemy'],
|
|
'languages': {'python': 42, 'javascript': 8},
|
|
'directory_structure': {
|
|
'src': 25,
|
|
'tests': 15,
|
|
'docs': 3
|
|
}
|
|
}
|
|
}
|
|
|
|
def test_codebase_analysis_enabled_by_default(self, mock_config, temp_dir):
|
|
"""Test that enable_codebase_analysis defaults to True."""
|
|
# Config with GitHub source but no explicit enable_codebase_analysis
|
|
config_without_flag = {
|
|
'name': 'test',
|
|
'description': 'Test',
|
|
'sources': [
|
|
{
|
|
'type': 'github',
|
|
'repo': 'test/repo',
|
|
'local_repo_path': temp_dir
|
|
}
|
|
]
|
|
}
|
|
|
|
# Save config
|
|
config_path = os.path.join(temp_dir, 'config.json')
|
|
with open(config_path, 'w') as f:
|
|
json.dump(config_without_flag, f)
|
|
|
|
# Create scraper
|
|
scraper = UnifiedScraper(config_path)
|
|
|
|
# Verify default is True
|
|
github_source = scraper.config['sources'][0]
|
|
assert github_source.get('enable_codebase_analysis', True) == True
|
|
|
|
def test_skip_codebase_analysis_flag(self, mock_config, temp_dir):
|
|
"""Test --skip-codebase-analysis CLI flag disables analysis."""
|
|
# Save config
|
|
config_path = os.path.join(temp_dir, 'config.json')
|
|
with open(config_path, 'w') as f:
|
|
json.dump(mock_config, f)
|
|
|
|
# Create scraper
|
|
scraper = UnifiedScraper(config_path)
|
|
|
|
# Simulate --skip-codebase-analysis flag behavior
|
|
for source in scraper.config.get('sources', []):
|
|
if source['type'] == 'github':
|
|
source['enable_codebase_analysis'] = False
|
|
|
|
# Verify flag disabled it
|
|
github_source = scraper.config['sources'][0]
|
|
assert github_source['enable_codebase_analysis'] == False
|
|
|
|
def test_architecture_md_generation(self, mock_config, mock_c3_data, temp_dir):
|
|
"""Test ARCHITECTURE.md is generated with all 8 sections."""
|
|
# Create skill builder with C3.x data
|
|
scraped_data = {
|
|
'github': {
|
|
'data': {
|
|
'readme': 'Test README',
|
|
'c3_analysis': mock_c3_data
|
|
}
|
|
}
|
|
}
|
|
|
|
builder = UnifiedSkillBuilder(mock_config, scraped_data)
|
|
builder.skill_dir = temp_dir
|
|
|
|
# Generate C3.x references
|
|
c3_dir = os.path.join(temp_dir, 'references', 'codebase_analysis')
|
|
os.makedirs(c3_dir, exist_ok=True)
|
|
builder._generate_architecture_overview(c3_dir, mock_c3_data)
|
|
|
|
# Verify ARCHITECTURE.md exists
|
|
arch_file = os.path.join(c3_dir, 'ARCHITECTURE.md')
|
|
assert os.path.exists(arch_file)
|
|
|
|
# Read and verify content
|
|
with open(arch_file, 'r') as f:
|
|
content = f.read()
|
|
|
|
# Verify all 8 sections exist
|
|
assert '## 1. Overview' in content
|
|
assert '## 2. Architectural Patterns' in content
|
|
assert '## 3. Technology Stack' in content
|
|
assert '## 4. Design Patterns' in content
|
|
assert '## 5. Configuration Overview' in content
|
|
assert '## 6. Common Workflows' in content
|
|
assert '## 7. Usage Examples' in content
|
|
assert '## 8. Entry Points & Directory Structure' in content
|
|
|
|
# Verify specific data is present
|
|
assert 'MVC' in content
|
|
assert 'Flask' in content
|
|
assert 'Factory' in content
|
|
assert '15 usage example(s)' in content or '15 total' in content
|
|
assert 'Security Alert' in content
|
|
|
|
def test_c3_reference_directory_structure(self, mock_config, mock_c3_data, temp_dir):
|
|
"""Test correct C3.x reference directory structure is created."""
|
|
scraped_data = {
|
|
'github': {
|
|
'data': {
|
|
'readme': 'Test README',
|
|
'c3_analysis': mock_c3_data
|
|
}
|
|
}
|
|
}
|
|
|
|
builder = UnifiedSkillBuilder(mock_config, scraped_data)
|
|
builder.skill_dir = temp_dir
|
|
|
|
# Generate C3.x references
|
|
c3_dir = os.path.join(temp_dir, 'references', 'codebase_analysis')
|
|
os.makedirs(c3_dir, exist_ok=True)
|
|
|
|
builder._generate_architecture_overview(c3_dir, mock_c3_data)
|
|
builder._generate_pattern_references(c3_dir, mock_c3_data.get('patterns'))
|
|
builder._generate_example_references(c3_dir, mock_c3_data.get('test_examples'))
|
|
builder._generate_guide_references(c3_dir, mock_c3_data.get('how_to_guides'))
|
|
builder._generate_config_references(c3_dir, mock_c3_data.get('config_patterns'))
|
|
builder._copy_architecture_details(c3_dir, mock_c3_data.get('architecture'))
|
|
|
|
# Verify directory structure
|
|
assert os.path.exists(os.path.join(c3_dir, 'ARCHITECTURE.md'))
|
|
assert os.path.exists(os.path.join(c3_dir, 'patterns'))
|
|
assert os.path.exists(os.path.join(c3_dir, 'examples'))
|
|
assert os.path.exists(os.path.join(c3_dir, 'guides'))
|
|
assert os.path.exists(os.path.join(c3_dir, 'configuration'))
|
|
assert os.path.exists(os.path.join(c3_dir, 'architecture_details'))
|
|
|
|
# Verify index files
|
|
assert os.path.exists(os.path.join(c3_dir, 'patterns', 'index.md'))
|
|
assert os.path.exists(os.path.join(c3_dir, 'examples', 'index.md'))
|
|
assert os.path.exists(os.path.join(c3_dir, 'guides', 'index.md'))
|
|
assert os.path.exists(os.path.join(c3_dir, 'configuration', 'index.md'))
|
|
assert os.path.exists(os.path.join(c3_dir, 'architecture_details', 'index.md'))
|
|
|
|
# Verify JSON data files
|
|
assert os.path.exists(os.path.join(c3_dir, 'patterns', 'detected_patterns.json'))
|
|
assert os.path.exists(os.path.join(c3_dir, 'examples', 'test_examples.json'))
|
|
assert os.path.exists(os.path.join(c3_dir, 'configuration', 'config_patterns.json'))
|
|
|
|
def test_graceful_degradation_on_c3_failure(self, mock_config, temp_dir):
|
|
"""Test skill builds even if C3.x analysis fails."""
|
|
# Mock _run_c3_analysis to raise exception
|
|
with patch('skill_seekers.cli.unified_scraper.UnifiedScraper._run_c3_analysis') as mock_c3:
|
|
mock_c3.side_effect = Exception("C3.x analysis failed")
|
|
|
|
# Save config
|
|
config_path = os.path.join(temp_dir, 'config.json')
|
|
with open(config_path, 'w') as f:
|
|
json.dump(mock_config, f)
|
|
|
|
# Mock GitHubScraper
|
|
with patch('skill_seekers.cli.unified_scraper.GitHubScraper') as mock_github:
|
|
mock_github.return_value.scrape.return_value = {
|
|
'readme': 'Test README',
|
|
'issues': [],
|
|
'releases': []
|
|
}
|
|
|
|
scraper = UnifiedScraper(config_path)
|
|
|
|
# This should not raise an exception
|
|
try:
|
|
scraper._scrape_github(mock_config['sources'][0])
|
|
# If we get here, graceful degradation worked
|
|
assert True
|
|
except Exception as e:
|
|
pytest.fail(f"Should handle C3.x failure gracefully but raised: {e}")
|
|
|
|
def test_config_validator_accepts_c3_properties(self, temp_dir):
|
|
"""Test config validator accepts new C3.5 properties."""
|
|
config = {
|
|
'name': 'test',
|
|
'description': 'Test',
|
|
'sources': [
|
|
{
|
|
'type': 'github',
|
|
'repo': 'test/repo',
|
|
'enable_codebase_analysis': True,
|
|
'ai_mode': 'auto'
|
|
}
|
|
]
|
|
}
|
|
|
|
# Save config
|
|
config_path = os.path.join(temp_dir, 'config.json')
|
|
with open(config_path, 'w') as f:
|
|
json.dump(config, f)
|
|
|
|
# Validate
|
|
validator = ConfigValidator(config_path)
|
|
assert validator.validate() == True
|
|
|
|
def test_config_validator_rejects_invalid_ai_mode(self, temp_dir):
|
|
"""Test config validator rejects invalid ai_mode values."""
|
|
config = {
|
|
'name': 'test',
|
|
'description': 'Test',
|
|
'sources': [
|
|
{
|
|
'type': 'github',
|
|
'repo': 'test/repo',
|
|
'ai_mode': 'invalid_mode' # Invalid!
|
|
}
|
|
]
|
|
}
|
|
|
|
# Save config
|
|
config_path = os.path.join(temp_dir, 'config.json')
|
|
with open(config_path, 'w') as f:
|
|
json.dump(config, f)
|
|
|
|
# Validate should raise
|
|
validator = ConfigValidator(config_path)
|
|
with pytest.raises(ValueError, match="Invalid ai_mode"):
|
|
validator.validate()
|
|
|
|
def test_skill_md_includes_c3_summary(self, mock_config, mock_c3_data, temp_dir):
|
|
"""Test SKILL.md includes C3.x architecture summary."""
|
|
scraped_data = {
|
|
'github': {
|
|
'data': {
|
|
'readme': 'Test README',
|
|
'c3_analysis': mock_c3_data
|
|
}
|
|
}
|
|
}
|
|
|
|
builder = UnifiedSkillBuilder(mock_config, scraped_data)
|
|
builder.skill_dir = temp_dir
|
|
builder._generate_skill_md()
|
|
|
|
# Read SKILL.md
|
|
skill_file = os.path.join(temp_dir, 'SKILL.md')
|
|
with open(skill_file, 'r') as f:
|
|
content = f.read()
|
|
|
|
# Verify C3.x summary section exists
|
|
assert '## 🏗️ Architecture & Code Analysis' in content
|
|
assert 'Primary Architecture' in content
|
|
assert 'MVC' in content
|
|
assert 'Design Patterns' in content
|
|
assert 'Factory' in content
|
|
assert 'references/codebase_analysis/ARCHITECTURE.md' in content
|
|
|
|
|
|
if __name__ == '__main__':
|
|
pytest.main([__file__, '-v'])
|