feat: Make EXCLUDED_DIRS configurable for local repository analysis
Closes #203

Adds configuration options to customize directory exclusions during local repository analysis, while maintaining backward compatibility with smart defaults.

**New Config Options:**

1. `exclude_dirs_additional` - Extend defaults (most common)
   - Adds custom directories to default exclusions
   - Example: ["proprietary", "legacy", "third_party"]
   - Total exclusions = defaults + additional

2. `exclude_dirs` - Replace defaults (advanced users)
   - Completely overrides default exclusions
   - Example: ["node_modules", ".git", "custom_vendor"]
   - Gives full control over exclusions

**Implementation:**

- Modified GitHubScraper.__init__() to parse exclude_dirs config
- Changed should_exclude_dir() to use instance variable instead of global
- Added logging for custom exclusions (INFO for extend, WARNING for replace)
- Maintains backward compatibility (no config = use defaults)

**Testing:**

- Added 12 comprehensive tests in test_excluded_dirs_config.py
  - 3 tests for defaults (backward compatibility)
  - 3 tests for extend mode
  - 3 tests for replace mode
  - 1 test for precedence
  - 2 tests for edge cases
- All 12 new tests passing ✅
- All 22 existing github_scraper tests passing ✅

**Documentation:**

- Updated CLAUDE.md config parameters section
- Added detailed "Configurable Directory Exclusions" feature section
- Included examples for both modes
- Listed common use cases (monorepos, enterprise, legacy codebases)

**Use Cases:**

- Monorepos with custom directory structures
- Enterprise projects with non-standard naming conventions
- Including unusual directories for analysis
- Minimal exclusions for small/simple projects

**Backward Compatibility:**

✅ Fully backward compatible - existing configs work unchanged
✅ Smart defaults maintained when no config provided
✅ All existing tests pass

Co-authored-by: jimmy058910 <jimmy058910@users.noreply.github.com>
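
The two modes, and which one wins when both keys are present, can be sketched as a standalone function. This is a minimal illustration of the behavior described above, not the project's code: `resolve_excluded_dirs` is a hypothetical helper, and `DEFAULT_EXCLUDED_DIRS` is a made-up stand-in for the real `EXCLUDED_DIRS` list, whose contents may differ.

```python
# Hypothetical stand-in for the project's EXCLUDED_DIRS; the real list may differ.
DEFAULT_EXCLUDED_DIRS = {"node_modules", ".git", "venv", "__pycache__"}


def resolve_excluded_dirs(config: dict, defaults=DEFAULT_EXCLUDED_DIRS) -> set:
    """Return the effective exclusion set for a config dict (illustrative only)."""
    # Replace mode: the given list is used as-is and the defaults are dropped.
    if "exclude_dirs" in config:
        return set(config["exclude_dirs"])
    # Extend mode: the defaults are kept and the extra names are added.
    if "exclude_dirs_additional" in config:
        return set(defaults) | set(config["exclude_dirs_additional"])
    # No exclusion keys: fall back to the defaults unchanged.
    return set(defaults)


extend = resolve_excluded_dirs({"exclude_dirs_additional": ["proprietary", "legacy"]})
replace = resolve_excluded_dirs({"exclude_dirs": ["node_modules", "custom_vendor"]})
both = resolve_excluded_dirs(
    {"exclude_dirs": ["only_this"], "exclude_dirs_additional": ["ignored"]}
)
```

Under these assumptions, `both` contains only `"only_this"`: when both keys are set, `exclude_dirs` takes precedence and `exclude_dirs_additional` is ignored.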
@@ -87,6 +87,28 @@ class GitHubScraper:
             self.local_repo_path = os.path.expanduser(self.local_repo_path)
             logger.info(f"Local repository mode enabled: {self.local_repo_path}")
 
+        # Configure directory exclusions (smart defaults + optional customization)
+        self.excluded_dirs = set(EXCLUDED_DIRS)  # Start with smart defaults
+
+        # Option 1: Replace mode - Use only specified exclusions
+        if 'exclude_dirs' in config:
+            self.excluded_dirs = set(config['exclude_dirs'])
+            logger.warning(
+                f"Using custom directory exclusions ({len(self.excluded_dirs)} dirs) - "
+                "defaults overridden"
+            )
+            logger.debug(f"Custom exclusions: {sorted(self.excluded_dirs)}")
+
+        # Option 2: Extend mode - Add to default exclusions
+        elif 'exclude_dirs_additional' in config:
+            additional = set(config['exclude_dirs_additional'])
+            self.excluded_dirs = self.excluded_dirs.union(additional)
+            logger.info(
+                f"Added {len(additional)} custom directory exclusions "
+                f"(total: {len(self.excluded_dirs)})"
+            )
+            logger.debug(f"Additional exclusions: {sorted(additional)}")
+
         # GitHub client setup (C1.1)
         token = self._get_token()
         self.github = Github(token) if token else Github()
@@ -281,7 +303,7 @@ class GitHubScraper:
 
     def should_exclude_dir(self, dir_name: str) -> bool:
         """Check if directory should be excluded from analysis."""
-        return dir_name in EXCLUDED_DIRS or dir_name.startswith('.')
+        return dir_name in self.excluded_dirs or dir_name.startswith('.')
 
     def _extract_file_tree(self):
         """Extract repository file tree structure (dual-mode: GitHub API or local filesystem)."""
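
The diff does not show how `should_exclude_dir()` is called during local traversal. As a rough sketch only, assuming an `os.walk`-style traversal, a per-instance exclusion set could be applied by pruning directory names in place. `walk_included_dirs` is a hypothetical helper, not part of `GitHubScraper`; its filter mirrors the predicate above, including the rule that dot-directories are always skipped.

```python
import os
import tempfile


def walk_included_dirs(root: str, excluded_dirs: set) -> list:
    """Return sorted relative paths of directories that survive exclusion."""
    kept = []
    for current, dirnames, _filenames in os.walk(root):
        # Prune in place so os.walk (top-down by default) never descends
        # into excluded or hidden directories.
        dirnames[:] = [
            d for d in dirnames
            if d not in excluded_dirs and not d.startswith(".")
        ]
        for d in dirnames:
            kept.append(os.path.relpath(os.path.join(current, d), root))
    return sorted(kept)


# Build a throwaway tree and walk it with a custom exclusion set.
with tempfile.TemporaryDirectory() as root:
    for path in ("src/core", "node_modules/pkg", ".git/objects", "legacy/old"):
        os.makedirs(os.path.join(root, path))
    result = walk_included_dirs(root, {"node_modules", "legacy"})
```

Because `dir_name.startswith('.')` is checked independently of the set, replace mode cannot re-include hidden directories such as `.git`. It only controls which non-hidden names are skipped.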