feat: Make EXCLUDED_DIRS configurable for local repository analysis

Closes #203

Adds configuration options to customize directory exclusions during local
repository analysis, while maintaining backward compatibility with smart
defaults.

**New Config Options:**

1. `exclude_dirs_additional` - Extend defaults (most common)
   - Adds custom directories to default exclusions
   - Example: ["proprietary", "legacy", "third_party"]
   - Total exclusions = defaults + additional

2. `exclude_dirs` - Replace defaults (advanced users)
   - Completely overrides default exclusions
   - Example: ["node_modules", ".git", "custom_vendor"]
   - Gives full control over named exclusions and takes precedence when
     both options are set (dot-prefixed directories are still skipped)
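
The way the two options combine can be sketched as a small standalone function. This is an illustration only, not the code in this commit: `resolve_excluded_dirs` and `DEFAULT_EXCLUDED_DIRS` are hypothetical names, and the default set shown is an invented subset rather than the project's real `EXCLUDED_DIRS` list. The precedence rule (replace wins over extend) follows the `if`/`elif` order in the diff.

```python
# Hypothetical stand-in for the project's EXCLUDED_DIRS constant (illustrative subset).
DEFAULT_EXCLUDED_DIRS = {"node_modules", ".git", "venv", "__pycache__"}


def resolve_excluded_dirs(config: dict, defaults=DEFAULT_EXCLUDED_DIRS) -> set:
    """Return the effective exclusion set for a config dict."""
    if "exclude_dirs" in config:
        # Replace mode: only the listed directories are excluded by name.
        return set(config["exclude_dirs"])
    # Extend mode, or no option at all: defaults plus any additions.
    return set(defaults) | set(config.get("exclude_dirs_additional", []))


extend_cfg = {"exclude_dirs_additional": ["proprietary", "legacy", "third_party"]}
replace_cfg = {"exclude_dirs": ["node_modules", ".git", "custom_vendor"]}
both_cfg = {**extend_cfg, **replace_cfg}

extended = resolve_excluded_dirs(extend_cfg)   # defaults + 3 additions
replaced = resolve_excluded_dirs(replace_cfg)  # exactly the 3 listed names
combined = resolve_excluded_dirs(both_cfg)     # replace mode wins
```

An empty config returns the defaults unchanged, which is the backward-compatible path.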

**Implementation:**

- Modified GitHubScraper.__init__() to parse exclude_dirs config
- Changed should_exclude_dir() to use instance variable instead of global
- Added logging for custom exclusions (INFO for extend, WARNING for replace)
- Maintains backward compatibility (no config = use defaults)
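
To illustrate why the check moved from the module-level constant to an instance variable, here is a minimal sketch. `ScraperSketch` is a hypothetical stand-in for `GitHubScraper` (logging and everything unrelated to exclusions is omitted), and the `EXCLUDED_DIRS` value is an invented subset. It shows that each instance carries its own exclusion set, and that the `startswith('.')` rule from the diff still applies in replace mode.

```python
EXCLUDED_DIRS = {"node_modules", "venv", "__pycache__"}  # illustrative subset, not the real list


class ScraperSketch:
    """Hypothetical minimal stand-in for GitHubScraper."""

    def __init__(self, config: dict):
        # Copy the defaults so per-instance changes never touch the global.
        self.excluded_dirs = set(EXCLUDED_DIRS)
        if "exclude_dirs" in config:
            self.excluded_dirs = set(config["exclude_dirs"])
        elif "exclude_dirs_additional" in config:
            self.excluded_dirs |= set(config["exclude_dirs_additional"])

    def should_exclude_dir(self, dir_name: str) -> bool:
        # Dot-prefixed directories are excluded regardless of configuration.
        return dir_name in self.excluded_dirs or dir_name.startswith(".")


default_scraper = ScraperSketch({})
minimal_scraper = ScraperSketch({"exclude_dirs": ["build"]})
extended_scraper = ScraperSketch({"exclude_dirs_additional": ["legacy"]})
```

Here `minimal_scraper` no longer excludes `node_modules` but still skips `.git`, while `default_scraper` behaves as before the change.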

**Testing:**

- Added 12 comprehensive tests in test_excluded_dirs_config.py
  - 3 tests for defaults (backward compatibility)
  - 3 tests for extend mode
  - 3 tests for replace mode
  - 1 test for precedence
  - 2 tests for edge cases
- All 12 new tests passing
- All 22 existing github_scraper tests passing

**Documentation:**

- Updated CLAUDE.md config parameters section
- Added detailed "Configurable Directory Exclusions" feature section
- Included examples for both modes
- Listed common use cases (monorepos, enterprise, legacy codebases)

**Use Cases:**

- Monorepos with custom directory structures
- Enterprise projects with non-standard naming conventions
- Including directories that the defaults would otherwise exclude
- Minimal exclusions for small/simple projects
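
As a rough sketch of how an exclusion set affects a local directory walk in cases like these: the example below assumes an `os.walk`-based traversal, which is an assumption and not necessarily how `_extract_file_tree` is implemented. `walk_included_dirs` and the directory layout are hypothetical.

```python
import os
import tempfile


def walk_included_dirs(root: str, excluded_dirs: set) -> list:
    """Return relative paths of directories that survive exclusion."""
    kept = []
    for current, dirnames, _filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = sorted(
            d for d in dirnames if d not in excluded_dirs and not d.startswith(".")
        )
        for d in dirnames:
            kept.append(os.path.relpath(os.path.join(current, d), root))
    return sorted(kept)


with tempfile.TemporaryDirectory() as root:
    # Build a tiny monorepo-like tree (hypothetical layout).
    for path in ("src/core", "node_modules/pkg", "packages/app/vendor", ".git/objects"):
        os.makedirs(os.path.join(root, *path.split("/")))
    default_view = walk_included_dirs(root, {"node_modules", "vendor"})
    minimal_view = walk_included_dirs(root, set())
```

With the larger exclusion set, `node_modules` and the nested `vendor` directory are pruned; with an empty set, only dot-prefixed directories are skipped.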

**Backward Compatibility:**

- Fully backward compatible: existing configs work unchanged
- Smart defaults maintained when no config provided
- All existing tests pass

Co-authored-by: jimmy058910 <jimmy058910@users.noreply.github.com>
yusyus
2025-11-29 23:53:27 +03:00
parent bd20b32470
commit ea289cebe1
3 changed files with 308 additions and 1 deletions


@@ -87,6 +87,28 @@ class GitHubScraper:
             self.local_repo_path = os.path.expanduser(self.local_repo_path)
             logger.info(f"Local repository mode enabled: {self.local_repo_path}")
+        # Configure directory exclusions (smart defaults + optional customization)
+        self.excluded_dirs = set(EXCLUDED_DIRS)  # Start with smart defaults
+        # Option 1: Replace mode - Use only specified exclusions
+        if 'exclude_dirs' in config:
+            self.excluded_dirs = set(config['exclude_dirs'])
+            logger.warning(
+                f"Using custom directory exclusions ({len(self.excluded_dirs)} dirs) - "
+                "defaults overridden"
+            )
+            logger.debug(f"Custom exclusions: {sorted(self.excluded_dirs)}")
+        # Option 2: Extend mode - Add to default exclusions
+        elif 'exclude_dirs_additional' in config:
+            additional = set(config['exclude_dirs_additional'])
+            self.excluded_dirs = self.excluded_dirs.union(additional)
+            logger.info(
+                f"Added {len(additional)} custom directory exclusions "
+                f"(total: {len(self.excluded_dirs)})"
+            )
+            logger.debug(f"Additional exclusions: {sorted(additional)}")
         # GitHub client setup (C1.1)
         token = self._get_token()
         self.github = Github(token) if token else Github()
@@ -281,7 +303,7 @@ class GitHubScraper:
     def should_exclude_dir(self, dir_name: str) -> bool:
         """Check if directory should be excluded from analysis."""
-        return dir_name in EXCLUDED_DIRS or dir_name.startswith('.')
+        return dir_name in self.excluded_dirs or dir_name.startswith('.')
     def _extract_file_tree(self):
         """Extract repository file tree structure (dual-mode: GitHub API or local filesystem)."""