fix: Fix local repo extraction limitations (code analyzer, exclusions, enhancement)

This commit fixes three critical limitations discovered during local repository skill extraction testing:

**Fix 1: Code Analyzer Import Issue**
- Changed unified_scraper.py to use absolute package imports instead of bare module imports
- Fixed: `from github_scraper import` → `from skill_seekers.cli.github_scraper import`
- Fixed: `from pdf_scraper import` → `from skill_seekers.cli.pdf_scraper import`
- Result: CodeAnalyzer now available during extraction, deep analysis works
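
A minimal sketch of why a bare import such as `from github_scraper import` can fail once the module is loaded as part of a package, while the absolute form keeps working. The package name `absimport_demo_pkg` and the module `demo_helper_mod` are hypothetical and exist only for this demonstration; whether this is exactly how the failure showed up in unified_scraper.py is an assumption.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name, used only for this illustration.
PKG = "absimport_demo_pkg"


def try_import(module_name: str) -> str:
    """Return "ok" if the module imports, else the name of the ImportError subclass."""
    try:
        importlib.import_module(module_name)
        return "ok"
    except ImportError as exc:
        return type(exc).__name__


with tempfile.TemporaryDirectory() as tmp:
    cli = Path(tmp) / PKG / "cli"
    cli.mkdir(parents=True)
    (cli.parent / "__init__.py").write_text("")
    (cli / "__init__.py").write_text("")
    (cli / "demo_helper_mod.py").write_text("VALUE = 1\n")
    # Bare import: only resolves if the cli/ directory itself is on sys.path.
    (cli / "bare.py").write_text("from demo_helper_mod import VALUE\n")
    # Absolute import: resolves whenever the package root is importable.
    (cli / "absolute.py").write_text(
        f"from {PKG}.cli.demo_helper_mod import VALUE\n"
    )

    sys.path.insert(0, tmp)  # only the package root is importable
    importlib.invalidate_caches()
    try:
        bare_result = try_import(f"{PKG}.cli.bare")
        absolute_result = try_import(f"{PKG}.cli.absolute")
    finally:
        # Undo the changes to interpreter state made for the demo.
        sys.path.remove(tmp)
        for name in [m for m in sys.modules if m.startswith(PKG)]:
            del sys.modules[name]

print(bare_result, absolute_result)
```

If the surrounding code wraps the import in a `try`/`except ImportError` fallback, a failure like the first one would silently disable the optional feature rather than crash.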

**Fix 2: Unity Library Exclusions**
- Updated should_exclude_dir() to accept and check full directory paths
- Updated _extract_file_tree_local() to pass both dir name and full path
- Added exclusion config passing from unified_scraper to github_scraper
- Result: exclude_dirs_additional now works (297 files excluded in test)
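
The idea behind checking the full path can be sketched as follows. This is not the project's implementation: the signature, the default set, and the matching rule are assumptions chosen to show why a pattern such as `Library/PackageCache` cannot match when only the directory name is available.

```python
from typing import Iterable, Optional

# Assumed defaults for illustration; the project's real list may differ.
DEFAULT_EXCLUDED_DIRS = {".git", "node_modules", "__pycache__"}


def should_exclude_dir(
    dir_name: str,
    dir_path: Optional[str] = None,
    exclude_dirs_additional: Iterable[str] = (),
) -> bool:
    """Return True if a directory should be skipped while walking the tree."""
    if dir_name in DEFAULT_EXCLUDED_DIRS:
        return True

    # Normalize the path to "/a/b/c/" so patterns match on whole segments.
    normalized = None
    if dir_path is not None:
        normalized = "/" + dir_path.replace("\\", "/").strip("/") + "/"

    for pattern in exclude_dirs_additional:
        pattern = pattern.replace("\\", "/").strip("/")
        if not pattern:
            continue
        if "/" not in pattern:
            # Single-segment pattern: the directory name is enough.
            if dir_name == pattern:
                return True
        elif normalized is not None and f"/{pattern}/" in normalized:
            # Multi-segment pattern: needs the full path to match.
            return True
    return False


extra = ["Library/PackageCache"]
print(should_exclude_dir("PackageCache", "Library/PackageCache", extra))
print(should_exclude_dir("PackageCache", None, extra))
```

With only the name, the second call has nothing to compare the multi-segment pattern against, which mirrors the limitation described above.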

**Fix 3: AI Enhancement for Single Sources**
- Changed read_reference_files() to use rglob() for recursive search
- Now finds reference files in subdirectories (e.g., references/github/README.md)
- Result: AI enhancement works with unified skills that have nested references
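
The difference between `glob()` and `rglob()` can be shown with a small temporary tree. The helper `collect_reference_keys` is hypothetical and only mirrors the loop shape of read_reference_files(); it uses `as_posix()` for the keys so the output is the same on every platform, whereas the actual change uses `str(relative_path)`.

```python
import tempfile
from pathlib import Path
from typing import List


def collect_reference_keys(references_dir: Path, recursive: bool) -> List[str]:
    """List the keys that would be used for reference files, skipping index.md."""
    finder = references_dir.rglob if recursive else references_dir.glob
    keys = []
    for ref_file in sorted(finder("*.md")):
        if ref_file.name == "index.md":
            continue
        # Key by path relative to references_dir so nested files stay distinct.
        keys.append(ref_file.relative_to(references_dir).as_posix())
    return keys


with tempfile.TemporaryDirectory() as tmp:
    references_dir = Path(tmp) / "references"
    (references_dir / "github").mkdir(parents=True)
    (references_dir / "index.md").write_text("# Index\n")
    (references_dir / "api.md").write_text("# API\n")
    (references_dir / "github" / "README.md").write_text("# README\n")

    flat_keys = collect_reference_keys(references_dir, recursive=False)
    nested_keys = collect_reference_keys(references_dir, recursive=True)

print(flat_keys)    # ['api.md']
print(nested_keys)  # ['api.md', 'github/README.md']
```

Keying by relative path also avoids collisions when two subdirectories each contain a file with the same name.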

**Test Results:**
- Code Analyzer: Working (deep analysis running)
- Unity Exclusions: Working (297 files excluded from 679)
- AI Enhancement: Working (finds and reads nested references)

**Files Changed:**
- src/skill_seekers/cli/unified_scraper.py (Fix 1 & 2)
- src/skill_seekers/cli/github_scraper.py (Fix 2)
- src/skill_seekers/cli/utils.py (Fix 3)

**Test Artifacts:**
- configs/deck_deck_go_local.json (test configuration)
- docs/LOCAL_REPO_TEST_RESULTS.md (comprehensive test report)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
yusyus
2025-12-21 22:24:38 +03:00
parent ae69c507a0
commit 65ded6c07c
5 changed files with 567 additions and 21 deletions


@@ -203,7 +203,8 @@ def read_reference_files(skill_dir: Union[str, Path], max_chars: int = 100000, p
return references
total_chars = 0
-for ref_file in sorted(references_dir.glob("*.md")):
+# Search recursively for all .md files (including subdirectories like github/README.md)
+for ref_file in sorted(references_dir.rglob("*.md")):
if ref_file.name == "index.md":
continue
@@ -213,7 +214,9 @@ def read_reference_files(skill_dir: Union[str, Path], max_chars: int = 100000, p
if len(content) > preview_limit:
content = content[:preview_limit] + "\n\n[Content truncated...]"
-references[ref_file.name] = content
+# Use relative path from references_dir as key for nested files
+relative_path = ref_file.relative_to(references_dir)
+references[str(relative_path)] = content
total_chars += len(content)
# Stop if we've read enough