feat: AI enhancement multi-repo support + critical bug fix

CRITICAL BUG FIX:
- Fixed documentation scraper overwriting list with dict
- Changed self.scraped_data['documentation'] = {...} to .append({...})
- Bug was breaking unified skill builder reference generation

AI ENHANCEMENT UPDATES:
- Added repo_id extraction in utils.py for multi-repo support
- Enhanced grouping by (source, repo_id) tuple in both enhancement files
- Added MULTI-REPOSITORY HANDLING section to AI prompts
- AI now correctly identifies and synthesizes multiple repos

CHANGES:
1. src/skill_seekers/cli/utils.py:
   - _determine_source_metadata() now returns (source, confidence, repo_id)
   - Extracts repo_id from codebase_analysis/{repo_id}/ paths
   - Added repo_id field to reference metadata dict

2. src/skill_seekers/cli/enhance_skill_local.py:
   - Group references by (source_type, repo_id) instead of just source_type
   - Display repo identity in prompt sections
   - Detect multiple repos and add explicit guidance to AI

3. src/skill_seekers/cli/enhance_skill.py:
   - Same grouping and display logic as local enhancement
   - Multi-repository handling section added

4. src/skill_seekers/cli/unified_scraper.py:
   - FIX: Documentation scraper now appends to list instead of overwriting
   - Added source_id, base_url, refs_dir to documentation metadata
   - Update refs_dir after moving to cache

TESTING:
- All 57 tests passing (unified, C3, utilities)
- Single-source verified: httpx comprehensive (219→749 lines after enhancement)
- Multi-source verified: encode/httpx + encode/httpcore (523 lines)
- AI enhancement working: Professional output with source attribution

QUALITY:
- Enhanced httpx SKILL.md: 749 lines, 19KB, A+ quality
- Source attribution working correctly
- Multi-repo synthesis transparent and accurate
- Reference structure clean and organized

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
yusyus
2026-01-12 22:05:34 +03:00
parent 52cf99136a
commit 72dde1ba08
4 changed files with 151 additions and 47 deletions

View File

@@ -216,10 +216,15 @@ class UnifiedScraper:
with open(docs_data_file, 'r', encoding='utf-8') as f:
summary = json.load(f)
self.scraped_data['documentation'] = {
# Append to documentation list (multi-source support)
self.scraped_data['documentation'].append({
'source_id': doc_config['name'],
'base_url': source['base_url'],
'pages': summary.get('pages', []),
'data_file': docs_data_file
}
'total_pages': summary.get('total_pages', 0),
'data_file': docs_data_file,
'refs_dir': '' # Will be set after moving to cache
})
logger.info(f"✅ Documentation: {summary.get('total_pages', 0)} pages scraped")
else:
@@ -240,6 +245,11 @@ class UnifiedScraper:
shutil.move(docs_output_dir, cache_docs_dir)
logger.info(f"📦 Moved docs output to cache: {cache_docs_dir}")
# Update refs_dir in scraped_data with cache location
refs_dir_path = os.path.join(cache_docs_dir, 'references')
if self.scraped_data['documentation']:
self.scraped_data['documentation'][-1]['refs_dir'] = refs_dir_path
if os.path.exists(docs_data_dir):
cache_data_dir = os.path.join(self.data_dir, f"{doc_config['name']}_data")
if os.path.exists(cache_data_dir):