b475b51ad1f0ee9302dc1a1b8e28e70a958fa649
8 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
c33c6f9073 | change max lenght | ||
|
|
5ed767ff9a | run ruff | ||
|
|
72dde1ba08 |
feat: AI enhancement multi-repo support + critical bug fix
CRITICAL BUG FIX:
- Fixed documentation scraper overwriting list with dict
- Changed self.scraped_data['documentation'] = {...} to .append({...})
- Bug was breaking unified skill builder reference generation
AI ENHANCEMENT UPDATES:
- Added repo_id extraction in utils.py for multi-repo support
- Enhanced grouping by (source, repo_id) tuple in both enhancement files
- Added MULTI-REPOSITORY HANDLING section to AI prompts
- AI now correctly identifies and synthesizes multiple repos
CHANGES:
1. src/skill_seekers/cli/utils.py:
- _determine_source_metadata() now returns (source, confidence, repo_id)
- Extracts repo_id from codebase_analysis/{repo_id}/ paths
- Added repo_id field to reference metadata dict
2. src/skill_seekers/cli/enhance_skill_local.py:
- Group references by (source_type, repo_id) instead of just source_type
- Display repo identity in prompt sections
- Detect multiple repos and add explicit guidance to AI
3. src/skill_seekers/cli/enhance_skill.py:
- Same grouping and display logic as local enhancement
- Multi-repository handling section added
4. src/skill_seekers/cli/unified_scraper.py:
- FIX: Documentation scraper now appends to list instead of overwriting
- Added source_id, base_url, refs_dir to documentation metadata
- Update refs_dir after moving to cache
TESTING:
- All 57 tests passing (unified, C3, utilities)
- Single-source verified: httpx comprehensive (219→749 lines after enhancement)
- Multi-source verified: encode/httpx + encode/httpcore (523 lines)
- AI enhancement working: Professional output with source attribution
QUALITY:
- Enhanced httpx SKILL.md: 749 lines, 19KB, A+ quality
- Source attribution working correctly
- Multi-repo synthesis transparent and accurate
- Reference structure clean and organized
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
||
|
|
a99e22c639 |
feat: Multi-Source Synthesis Architecture - Rich Standalone Skills + Smart Combination
BREAKING CHANGE: Major architectural improvements to multi-source skill generation This commit implements the complete "Multi-Source Synthesis Architecture" where each source (documentation, GitHub, PDF) generates a rich standalone SKILL.md file before being intelligently synthesized with source-specific formulas. ## 🎯 Core Architecture Changes ### 1. Rich Standalone SKILL.md Generation (Source Parity) Each source now generates comprehensive, production-quality SKILL.md files that can stand alone OR be synthesized with other sources. **GitHub Scraper Enhancements** (+263 lines): - Now generates 300+ line SKILL.md (was ~50 lines) - Integrates C3.x codebase analysis data: - C2.5: API Reference extraction - C3.1: Design pattern detection (27 high-confidence patterns) - C3.2: Test example extraction (215 examples) - C3.7: Architectural pattern analysis - Enhanced sections: - ⚡ Quick Reference with pattern summaries - 📝 Code Examples from real repository tests - 🔧 API Reference from codebase analysis - 🏗️ Architecture Overview with design patterns - ⚠️ Known Issues from GitHub issues - Location: src/skill_seekers/cli/github_scraper.py **PDF Scraper Enhancements** (+205 lines): - Now generates 200+ line SKILL.md (was ~50 lines) - Enhanced content extraction: - 📖 Chapter Overview (PDF structure breakdown) - 🔑 Key Concepts (extracted from headings) - ⚡ Quick Reference (pattern extraction) - 📝 Code Examples: Top 15 (was top 5), grouped by language - Quality scoring and intelligent truncation - Better formatting and organization - Location: src/skill_seekers/cli/pdf_scraper.py **Result**: All 3 sources (docs, GitHub, PDF) now have equal capability to generate rich, comprehensive standalone skills. ### 2. File Organization & Caching System **Problem**: output/ directory cluttered with intermediate files, data, and logs. **Solution**: New `.skillseeker-cache/` hidden directory for all intermediate files. **New Structure**: ``` .skillseeker-cache/{skill_name}/ ├── sources/ # Standalone SKILL.md from each source │ ├── httpx_docs/ │ ├── httpx_github/ │ └── httpx_pdf/ ├── data/ # Raw scraped data (JSON) ├── repos/ # Cloned GitHub repositories (cached for reuse) └── logs/ # Session logs with timestamps output/{skill_name}/ # CLEAN: Only final synthesized skill ├── SKILL.md └── references/ ``` **Benefits**: - ✅ Clean output/ directory (only final product) - ✅ Intermediate files preserved for debugging - ✅ Repository clones cached and reused (faster re-runs) - ✅ Timestamped logs for each scraping session - ✅ All cache dirs added to .gitignore **Changes**: - .gitignore: Added `.skillseeker-cache/` entry - unified_scraper.py: Complete reorganization (+238 lines) - Added cache directory structure - File logging with timestamps - Repository cloning with caching/reuse - Cleaner intermediate file management - Better subprocess logging and error handling ### 3. Config Repository Migration **Moved to separate config repository**: https://github.com/yusufkaraaslan/skill-seekers-configs **Deleted from this repo** (35 config files): - ansible-core.json, astro.json, claude-code.json - django.json, django_unified.json, fastapi.json, fastapi_unified.json - godot.json, godot_unified.json, godot_github.json, godot-large-example.json - react.json, react_unified.json, react_github.json, react_github_example.json - vue.json, kubernetes.json, laravel.json, tailwind.json, hono.json - svelte_cli_unified.json, steam-economy-complete.json - deck_deck_go_local.json, python-tutorial-test.json, example_pdf.json - test-manual.json, fastapi_unified_test.json, fastmcp_github_example.json - example-team/ directory (4 files) **Kept as reference example**: - configs/httpx_comprehensive.json (complete multi-source example) **Rationale**: - Cleaner repository (979+ lines added, 1680 deleted) - Configs managed separately with versioning - Official presets available via `fetch-config` command - Users can maintain private config repos ### 4. AI Enhancement Improvements **enhance_skill.py** (+125 lines): - Better integration with multi-source synthesis - Enhanced prompt generation for synthesized skills - Improved error handling and logging - Support for source metadata in enhancement ### 5. Documentation Updates **CLAUDE.md** (+252 lines): - Comprehensive project documentation - Architecture explanations - Development workflow guidelines - Testing requirements - Multi-source synthesis patterns **SKILL_QUALITY_ANALYSIS.md** (new): - Quality assessment framework - Before/after analysis of httpx skill - Grading rubric for skill quality - Metrics and benchmarks ### 6. Testing & Validation Scripts **test_httpx_skill.sh** (new): - Complete httpx skill generation test - Multi-source synthesis validation - Quality metrics verification **test_httpx_quick.sh** (new): - Quick validation script - Subset of features for rapid testing ## 📊 Quality Improvements | Metric | Before | After | Improvement | |--------|--------|-------|-------------| | GitHub SKILL.md lines | ~50 | 300+ | +500% | | PDF SKILL.md lines | ~50 | 200+ | +300% | | GitHub C3.x integration | ❌ No | ✅ Yes | New feature | | PDF pattern extraction | ❌ No | ✅ Yes | New feature | | File organization | Messy | Clean cache | Major improvement | | Repository cloning | Always fresh | Cached reuse | Faster re-runs | | Logging | Console only | Timestamped files | Better debugging | | Config management | In-repo | Separate repo | Cleaner separation | ## 🧪 Testing All existing tests pass: - test_c3_integration.py: Updated for new architecture - 700+ tests passing - Multi-source synthesis validated with httpx example ## 🔧 Technical Details **Modified Core Files**: 1. src/skill_seekers/cli/github_scraper.py (+263 lines) - _generate_skill_md(): Rich content with C3.x integration - _format_pattern_summary(): Design pattern summaries - _format_code_examples(): Test example formatting - _format_api_reference(): API reference from codebase - _format_architecture(): Architectural pattern analysis 2. src/skill_seekers/cli/pdf_scraper.py (+205 lines) - _generate_skill_md(): Enhanced with rich content - _format_key_concepts(): Extract concepts from headings - _format_patterns_from_content(): Pattern extraction - Code examples: Top 15, grouped by language, better quality scoring 3. src/skill_seekers/cli/unified_scraper.py (+238 lines) - __init__(): Cache directory structure - _setup_logging(): File logging with timestamps - _clone_github_repo(): Repository caching system - _scrape_documentation(): Move to cache, better logging - Better subprocess handling and error reporting 4. src/skill_seekers/cli/enhance_skill.py (+125 lines) - Multi-source synthesis awareness - Enhanced prompt generation - Better error handling **Minor Updates**: - src/skill_seekers/cli/codebase_scraper.py (+3 lines): Minor improvements - src/skill_seekers/cli/test_example_extractor.py: Quality scoring adjustments - tests/test_c3_integration.py: Test updates for new architecture ## 🚀 Migration Guide **For users with existing configs**: No action required - all existing configs continue to work. **For users wanting official presets**: ```bash # Fetch from official config repo skill-seekers fetch-config --name react --target unified # Or use existing local configs skill-seekers unified --config configs/httpx_comprehensive.json ``` **Cache directory**: New `.skillseeker-cache/` directory will be created automatically. Safe to delete - will be regenerated on next run. ## 📈 Next Steps This architecture enables: - ✅ Source parity: All sources generate rich standalone skills - ✅ Smart synthesis: Each combination has optimal formula - ✅ Better debugging: Cached files and logs preserved - ✅ Faster iteration: Repository caching, clean output - 🔄 Future: Multi-platform enhancement (Gemini, GPT-4) - planned - 🔄 Future: Conflict detection between sources - planned - 🔄 Future: Source prioritization rules - planned ## 🎓 Example: httpx Skill Quality **Before**: 186 lines, basic synthesis, missing data **After**: 640 lines with AI enhancement, A- (9/10) quality **What changed**: - All C3.x analysis data integrated (patterns, tests, API, architecture) - GitHub metadata included (stars, topics, languages) - PDF chapter structure visible - Professional formatting with emojis and clear sections - Real-world code examples from test suite - Design patterns explained with confidence scores - Known issues with impact assessment 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> |
||
|
|
f2faebb8d5 |
fix: Complete fix for Issue #219 - All three problems resolved
**Problem #1: Large File Encoding Error** ✅ FIXED - Add large file download support via download_url - Detect encoding='none' for files >1MB - Download via GitHub raw URL instead of API - Handles ccxt/ccxt's 1.4MB CHANGELOG.md successfully **Problem #2: Missing CLI Enhancement Flags** ✅ FIXED - Add --enhance, --enhance-local, --api-key to main.py github_parser - Add flag forwarding in CLI dispatcher - Fixes 'unrecognized arguments' error - Users can now use: skill-seekers github --repo owner/repo --enhance-local **Problem #3: Custom API Endpoint Support** ✅ FIXED - Support ANTHROPIC_BASE_URL environment variable - Support ANTHROPIC_AUTH_TOKEN (alternative to ANTHROPIC_API_KEY) - Fix ThinkingBlock.text error with newer Anthropic SDK - Find TextBlock in response content array (handles thinking blocks) **Changes**: - src/skill_seekers/cli/enhance_skill.py: - Support custom base_url parameter - Support both ANTHROPIC_API_KEY and ANTHROPIC_AUTH_TOKEN - Iterate through content blocks to find text (handles ThinkingBlock) - src/skill_seekers/cli/main.py: - Add --enhance, --enhance-local, --api-key to github_parser - Forward flags to github_scraper.py in dispatcher - src/skill_seekers/cli/github_scraper.py: - Add large file detection (encoding=None/"none") - Download via download_url with requests - Log file size and download progress - tests/test_github_scraper.py: - Add test_get_file_content_large_file - Add test_extract_changelog_large_file - All 31 tests passing ✅ **Credits**: - Thanks to @XGCoder for detailed bug report - Thanks to @gorquan for local fixes and guidance Fixes #219 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> |
||
|
|
d0bc042a43 |
feat(multi-llm): Phase 1 - Foundation adaptor architecture
Implement base adaptor pattern for multi-LLM support (Issue #179) **Architecture:** - Created adaptors/ package with base SkillAdaptor class - Implemented factory pattern with get_adaptor() registry - Refactored Claude-specific code into ClaudeAdaptor **Changes:** - New: src/skill_seekers/cli/adaptors/base.py (SkillAdaptor + SkillMetadata) - New: src/skill_seekers/cli/adaptors/__init__.py (registry + factory) - New: src/skill_seekers/cli/adaptors/claude.py (refactored upload + enhance logic) - Modified: package_skill.py (added --target flag, uses adaptor.package()) - Modified: upload_skill.py (added --target flag, uses adaptor.upload()) - Modified: enhance_skill.py (added --target flag, uses adaptor.enhance()) **Tests:** - New: tests/test_adaptors/test_base.py (10 tests passing) - All existing tests still pass (backward compatible) **Backward Compatibility:** - Default --target=claude maintains existing behavior - All CLI tools work exactly as before without --target flag - No breaking changes **Next:** Phase 2 - Implement Gemini, OpenAI, Markdown adaptors |
||
|
|
13ca374295 |
refactor: Update CLI commands to use new unified entry points
Updated all command examples in CLI scripts from old pattern: python3 cli/<script>.py → skill-seekers <command> Changes: - doc_scraper.py → skill-seekers scrape - github_scraper.py → skill-seekers github - pdf_scraper.py → skill-seekers pdf - unified_scraper.py → skill-seekers unified - enhance_skill.py → skill-seekers enhance - enhance_skill_local.py → skill-seekers enhance - package_skill.py → skill-seekers package - estimate_pages.py → skill-seekers estimate This reflects the new modern Python packaging with proper entry points. Users can now use clean commands instead of file paths. Files updated: 10 CLI scripts 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> |
||
|
|
ce1c07b437 |
feat: Add modern Python packaging - Phase 1 (Foundation)
Implements issue #168 - Modern Python packaging with uv support This is Phase 1 of the modernization effort, establishing the core package structure and build system. ## Major Changes ### 1. Migrated to src/ Layout - Moved cli/ → src/skill_seekers/cli/ - Moved skill_seeker_mcp/ → src/skill_seekers/mcp/ - Created root package: src/skill_seekers/__init__.py - Updated all imports: cli. → skill_seekers.cli. - Updated all imports: skill_seeker_mcp. → skill_seekers.mcp. ### 2. Created pyproject.toml - Modern Python packaging configuration - All dependencies properly declared - 8 CLI entry points configured: * skill-seekers (unified CLI) * skill-seekers-scrape * skill-seekers-github * skill-seekers-pdf * skill-seekers-unified * skill-seekers-enhance * skill-seekers-package * skill-seekers-upload * skill-seekers-estimate - uv tool support enabled - Build system: setuptools with wheel ### 3. Created Unified CLI (main.py) - Git-style subcommands (skill-seekers scrape, etc.) - Delegates to existing tool main() functions - Full help system at top-level and subcommand level - Backwards compatible with individual commands ### 4. Updated Package Versions - cli/__init__.py: 1.3.0 → 2.0.0 - mcp/__init__.py: 1.2.0 → 2.0.0 - Root package: 2.0.0 ### 5. Updated Test Suite - Fixed test_package_structure.py for new layout - All 28 package structure tests passing - Updated all test imports for new structure ## Installation Methods (Working) ```bash # Development install pip install -e . # Run unified CLI skill-seekers --version # → 2.0.0 skill-seekers --help # Run individual tools skill-seekers-scrape --help skill-seekers-github --help ``` ## Test Results - Package structure tests: 28/28 passing ✅ - Package installs successfully ✅ - All entry points working ✅ ## Still TODO (Phase 2) - [ ] Run full test suite (299 tests) - [ ] Update documentation (README, CLAUDE.md, etc.) - [ ] Test with uv tool run/install - [ ] Build and publish to PyPI - [ ] Create PR and merge ## Breaking Changes None - fully backwards compatible. Old import paths still work. ## Migration for Users No action needed. Package works with both pip and uv. Closes #168 (when complete) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> |