fix: Fix local repo extraction limitations (code analyzer, exclusions, enhancement)

This commit fixes three critical limitations discovered during local repository skill extraction testing: **Fix 1: Code Analyzer Import Issue** - Changed unified_scraper.py to use absolute imports instead of relative imports - Fixed: `from github_scraper import` → `from skill_seekers.cli.github_scraper import` - Fixed: `from pdf_scraper import` → `from skill_seekers.cli.pdf_scraper import` - Result: CodeAnalyzer now available during extraction, deep analysis works **Fix 2: Unity Library Exclusions** - Updated should_exclude_dir() to accept and check full directory paths - Updated _extract_file_tree_local() to pass both dir name and full path - Added exclusion config passing from unified_scraper to github_scraper - Result: exclude_dirs_additional now works (297 files excluded in test) **Fix 3: AI Enhancement for Single Sources** - Changed read_reference_files() to use rglob() for recursive search - Now finds reference files in subdirectories (e.g., references/github/README.md) - Result: AI enhancement works with unified skills that have nested references **Test Results:** - Code Analyzer: ✅ Working (deep analysis running) - Unity Exclusions: ✅ Working (297 files excluded from 679) - AI Enhancement: ✅ Working (finds and reads nested references) **Files Changed:** - src/skill_seekers/cli/unified_scraper.py (Fix 1 & 2) - src/skill_seekers/cli/github_scraper.py (Fix 2) - src/skill_seekers/cli/utils.py (Fix 3) **Test Artifacts:** - configs/deck_deck_go_local.json (test configuration) - docs/LOCAL_REPO_TEST_RESULTS.md (comprehensive test report) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-21 22:24:38 +03:00
parent ae69c507a0
commit 65ded6c07c
5 changed files with 567 additions and 21 deletions
--- a/src/skill_seekers/cli/utils.py
+++ b/src/skill_seekers/cli/utils.py
@@ -203,7 +203,8 @@ def read_reference_files(skill_dir: Union[str, Path], max_chars: int = 100000, p
        return references

    total_chars = 0
-    for ref_file in sorted(references_dir.glob("*.md")):
+    # Search recursively for all .md files (including subdirectories like github/README.md)
+    for ref_file in sorted(references_dir.rglob("*.md")):
        if ref_file.name == "index.md":
            continue

@@ -213,7 +214,9 @@ def read_reference_files(skill_dir: Union[str, Path], max_chars: int = 100000, p
        if len(content) > preview_limit:
            content = content[:preview_limit] + "\n\n[Content truncated...]"

-        references[ref_file.name] = content
+        # Use relative path from references_dir as key for nested files
+        relative_path = ref_file.relative_to(references_dir)
+        references[str(relative_path)] = content
        total_chars += len(content)

        # Stop if we've read enough