fix: Remove duplicate documentation directories to save disk space (fixes #279)

Problem:
The analyze command created duplicate documentation directories:
- output/skill-seekers/documentation/ (1.5MB) - Not referenced
- output/skill-seekers/references/documentation/ (1.5MB) - Referenced
The unreferenced copy wasted 1.5MB per skill: half of the documentation
output on disk was a duplicate.

Root Cause:
_generate_references() copied directories to references/ but never
cleaned up the source directories.

Solution:
After copying each directory to references/, immediately remove the
source directory using shutil.rmtree(). SKILL.md only references
references/{target}, making the source directories redundant.

Changes:
- Add cleanup in _generate_references() after each copytree operation
- Add 2 tests verifying that no duplicate directories remain after analysis
- Test results: 38/38 tests passing in test_codebase_scraper.py

Impact:
- Saves about 1.5MB per skill in the reported case (actual savings depend
  on documentation size)
- Each directory copied to references/ now exists once on disk instead of
  twice

Tests Added:
- test_no_duplicate_directories_created: Verifies source cleanup
- test_no_disk_space_wasted: Verifies single copy in references/
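
The kind of check these tests perform could look roughly like the sketch
below. This is an assumption about their shape, not the test code from
this commit; find_duplicate_dirs() is a hypothetical helper that reports
directory names present both at the top level of the output directory and
under references/.

```python
import tempfile
from pathlib import Path


def find_duplicate_dirs(output_dir: Path) -> list:
    """Return names of directories that exist both directly under
    output_dir and under output_dir/references (hypothetical helper)."""
    references = output_dir / "references"
    if not references.is_dir():
        return []
    return sorted(
        child.name
        for child in references.iterdir()
        if child.is_dir() and (output_dir / child.name).is_dir()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        (out / "references" / "documentation").mkdir(parents=True)
        # Only the references/ copy exists, so nothing is duplicated.
        print(find_duplicate_dirs(out))
```

An empty result means every directory under references/ is the only copy.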

Reported by: @yangshare via Issue #279

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
yusyus
2026-02-05 21:27:41 +03:00
parent 31d83245da
commit 5492fe3dc0
2 changed files with 85 additions and 0 deletions


@@ -1855,6 +1855,11 @@ def _generate_references(output_dir: Path):
shutil.copytree(source_dir, target_dir)
logger.debug(f"Copied {source} → references/{target}")
# Clean up source directory to avoid duplication (Issue #279)
# SKILL.md only references references/{target}, so source dir is redundant
shutil.rmtree(source_dir)
logger.debug(f"Cleaned up duplicate {source}/ directory")
logger.info(f"✅ Generated references directory: {references_dir}")