fix: Remove duplicate documentation directories to save disk space (fixes #279)
Problem:
The analyze command created duplicate documentation directories:
- output/skill-seekers/documentation/ (1.5MB) - Not referenced
- output/skill-seekers/references/documentation/ (1.5MB) - Referenced
This wasted 1.5MB per skill (50% duplication).
Root Cause:
_generate_references() copied directories to references/ but never
cleaned up the source directories.
Solution:
After copying each directory to references/, immediately remove the
source directory using shutil.rmtree(). SKILL.md only references
references/{target}, making the source directories redundant.
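The copy-then-remove step described above can be sketched in isolation as follows. This is a minimal illustration, not the project's actual `_generate_references()`: the helper name `copy_then_remove` and the directory layout in the demo are assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def copy_then_remove(output_dir: Path, source: str, target: str) -> Path:
    """Copy output_dir/source to output_dir/references/target, then delete the source.

    Hypothetical helper for illustration only.
    """
    source_dir = output_dir / source
    target_dir = output_dir / "references" / target
    target_dir.parent.mkdir(parents=True, exist_ok=True)

    # copytree raises on failure, so rmtree only runs after a successful copy.
    shutil.copytree(source_dir, target_dir)
    shutil.rmtree(source_dir)
    return target_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        (out / "documentation").mkdir()
        (out / "documentation" / "index.md").write_text("# Docs\n")

        copied = copy_then_remove(out, "documentation", "documentation")

        print(copied.relative_to(out).as_posix())  # references/documentation
        print((out / "documentation").exists())    # False
```

Because the source is removed only after `shutil.copytree` returns, a failed copy leaves the original directory in place. `shutil.move` would be another way to end up with a single copy, though that is not what this commit does.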
Changes:
- Add cleanup in _generate_references() after each copytree operation
- Add 2 comprehensive tests to verify no duplicate directories
- Test results: 38/38 tests passing in test_codebase_scraper.py
Impact:
- Saves about 1.5MB per skill in the reported case (actual savings scale with documentation size)
- Prevents the same duplication for every analysis output directory copied to references/
- Leaves a single copy of each directory on disk
Tests Added:
- test_no_duplicate_directories_created: Verifies source cleanup
- test_no_disk_space_wasted: Verifies single copy in references/
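The test bodies are not shown in this commit view. As a rough sketch of the kind of check they describe, the hypothetical helper below (`find_duplicate_dirs` is not part of the project) lists directory names that exist both at the top of the output directory and under references/:

```python
import tempfile
from pathlib import Path


def find_duplicate_dirs(output_dir: Path) -> list[str]:
    """Return names of directories present both in output_dir and in output_dir/references.

    Hypothetical helper for illustration only.
    """
    references_dir = output_dir / "references"
    if not references_dir.is_dir():
        return []
    return sorted(
        child.name
        for child in references_dir.iterdir()
        if child.is_dir() and (output_dir / child.name).is_dir()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)

        # Post-fix layout: only the copy under references/ exists.
        (out / "references" / "documentation").mkdir(parents=True)
        print(find_duplicate_dirs(out))  # []

        # Pre-fix layout: the source directory was left behind.
        (out / "documentation").mkdir()
        print(find_duplicate_dirs(out))  # ['documentation']
```

A test along these lines would run the analysis into a temporary directory and assert that the returned list is empty.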
Reported by: @yangshare via Issue #279
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@@ -1855,6 +1855,11 @@ def _generate_references(output_dir: Path):
 shutil.copytree(source_dir, target_dir)
 logger.debug(f"Copied {source} → references/{target}")
+
+# Clean up source directory to avoid duplication (Issue #279)
+# SKILL.md only references references/{target}, so source dir is redundant
+shutil.rmtree(source_dir)
+logger.debug(f"Cleaned up duplicate {source}/ directory")
 
 logger.info(f"✅ Generated references directory: {references_dir}")
 
 