fix: Framework detection now works by including import-only files (fixes #239)

## Problem
Framework detection was broken because files with only imports (no
classes/functions) were excluded from analysis. The architectural pattern
detector received empty file lists, resulting in 0 frameworks detected.

## Root Cause
In codebase_scraper.py:873-881, the has_content check filtered out files
that didn't have classes, functions, or other structural elements. This
excluded simple __init__.py files that only contained import statements,
which are critical for framework detection.
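
A minimal sketch of the failure mode, assuming each analysis result is a dict with `classes` and `functions` lists. The name `old_has_content` and the dict keys are hypothetical; the actual check in codebase_scraper.py may look different.

```python
def old_has_content(analysis: dict) -> bool:
    """Hypothetical stand-in for the pre-fix check: structural elements only."""
    return bool(analysis.get("classes") or analysis.get("functions"))


# Assumed analysis result for an __init__.py that contains only
# `from flask import Flask`: no classes, no functions.
init_analysis = {"file": "app/__init__.py", "classes": [], "functions": []}

# The import-only file is filtered out, so the detector never sees it.
kept = [a for a in [init_analysis] if old_has_content(a)]
print(kept)  # []
```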

## Solution (3 parts)

1. **Extract imports from Python files** (code_analyzer.py:140-178)
   - Added import extraction using AST (ast.Import, ast.ImportFrom)
   - Returns imports list in analysis results
   - Now captures: "from flask import Flask" → ["flask"]

2. **Include import-only files** (codebase_scraper.py:873-881)
   - Updated the has_content check so that files containing imports count
     as having content and are kept in the analysis results
   - Comment added: "IMPORTANT: Include files with imports for framework
     detection (fixes #239)"

3. **Enhance framework detection** (architectural_pattern_detector.py:195-240)
   - Extract imports from all Python files in analysis
   - Check imports in addition to file paths and directory structure
   - Prioritize import-based detection: a single import match is enough
     (high confidence)
   - Keep the 2+ match requirement for path/directory-based detection
     (avoids false positives)
   - Added debug logging: "Collected N imports for framework detection"
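
Parts 1 and 2 can be sketched roughly as follows. This is an illustration under assumptions, not the code from this commit: `extract_imports` and `has_content` are hypothetical names, the result keys are assumed, and the real extractor may normalize module names differently.

```python
import ast


def extract_imports(source: str) -> list:
    """Collect imported module names from Python source via the AST."""
    imports = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import os, sys` -> ["os", "sys"]
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # `from flask import Flask` -> ["flask"]
            # Bare relative imports (`from . import x`) have module=None.
            imports.append(node.module)
    return imports


def has_content(analysis: dict) -> bool:
    """Assumed shape of the updated check: imports also count as content."""
    return bool(
        analysis.get("classes")
        or analysis.get("functions")
        or analysis.get("imports")
    )


source = "from flask import Flask\nimport os\n"
print(extract_imports(source))  # ['flask', 'os']

init_analysis = {"classes": [], "functions": [], "imports": extract_imports(source)}
print(has_content(init_analysis))  # True
```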

## Results

**Before fix:**
- Test Flask project: 0 files analyzed, 0 frameworks detected
- Files with imports: excluded from analysis
- Framework detection: the detector received an empty file list

**After fix:**
- Test Flask project: 3 files analyzed, Flask detected
- Files with imports: included in analysis
- Framework detection: Flask identified from its imports
- No false positives for other frameworks (ASP.NET, Rails, etc.)
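
The decision rule behind these results can be reproduced in isolation. The sketch below is a simplification: the marker table is a hypothetical two-entry stand-in for the real `FRAMEWORK_MARKERS`, `detect_frameworks` is an illustrative name, and directory matches are omitted.

```python
# Hypothetical marker table; the real FRAMEWORK_MARKERS is larger.
FRAMEWORK_MARKERS = {
    "Flask": ["flask", "app.py", "templates"],
    "Rails": ["rails", "Gemfile", "config/routes.rb"],
}


def detect_frameworks(paths, imports):
    """One import match is enough; otherwise require 2+ path matches."""
    path_content = " ".join(paths).lower()
    import_content = " ".join(imports).lower()
    detected = []
    for framework, markers in FRAMEWORK_MARKERS.items():
        import_matches = sum(1 for m in markers if m.lower() in import_content)
        path_matches = sum(1 for m in markers if m.lower() in path_content)
        if import_matches >= 1 or path_matches >= 2:
            detected.append(framework)
    return detected


paths = ["pkg/__init__.py", "pkg/views.py"]
print(detect_frameworks(paths, ["flask"]))  # ['Flask']
print(detect_frameworks(paths, []))         # []
```

With no imports collected, which is what happened before the fix, these paths alone do not reach the two-match threshold and nothing is detected.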

## Testing

Added a test suite (tests/test_framework_detection.py):
- test_flask_framework_detection_from_imports
- test_files_with_imports_are_included
- test_no_false_positive_frameworks

All tests pass:
- 38 existing tests in test_codebase_scraper.py
- 54 existing tests in test_code_analyzer.py
- 3 new tests in test_framework_detection.py

## Impact

- Fixes issue #239
- Framework detection now works for Python projects
- Import-only files (common in Python packages) are properly analyzed
- No noticeable performance impact expected (import extraction is fast)
- No breaking changes to existing functionality

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
yusyus
2026-02-05 22:02:06 +03:00
parent 5492fe3dc0
commit a565b87a90
4 changed files with 249 additions and 11 deletions

@@ -200,6 +200,16 @@ class ArchitecturalPatternDetector:
         all_paths = [str(f.get("file", "")) for f in files]
         all_content = " ".join(all_paths)
 
+        # Extract all imports from Python files (fixes #239)
+        all_imports = []
+        for file_data in files:
+            if file_data.get("language") == "Python" and file_data.get("imports"):
+                all_imports.extend(file_data["imports"])
+
+        # Create searchable import string
+        import_content = " ".join(all_imports)
+        logger.debug(f"Collected {len(all_imports)} imports for framework detection")
+
         # Also check actual directory structure for game engine markers
         # (project.godot, .unity, .uproject are config files, not in analyzed files)
         dir_files = []
@@ -227,15 +237,27 @@ class ArchitecturalPatternDetector:
                     # Return early to prevent web framework false positives
                     return detected
 
-        # Check other frameworks
+        # Check other frameworks (including imports - fixes #239)
         for framework, markers in self.FRAMEWORK_MARKERS.items():
             if framework in ["Unity", "Unreal", "Godot"]:
                 continue # Already checked
 
-            matches = sum(1 for marker in markers if marker.lower() in all_content.lower())
-            if matches >= 2:
+            # Check in file paths, directory structure, AND imports
+            path_matches = sum(1 for marker in markers if marker.lower() in all_content.lower())
+            dir_matches = sum(1 for marker in markers if marker.lower() in dir_content.lower())
+            import_matches = sum(1 for marker in markers if marker.lower() in import_content.lower())
+
+            # Strategy: Prioritize import-based detection (more accurate)
+            # If we have import matches, they're strong signals - use them alone
+            # Otherwise, require 2+ matches from paths/dirs
+            if import_matches >= 1:
+                # Import-based detection (high confidence)
                 detected.append(framework)
-                logger.info(f" 📦 Detected framework: {framework}")
+                logger.info(f" 📦 Detected framework: {framework} (imports:{import_matches})")
+            elif (path_matches + dir_matches) >= 2:
+                # Path/directory-based detection (requires 2+ matches)
+                detected.append(framework)
+                logger.info(f" 📦 Detected framework: {framework} (path:{path_matches} dir:{dir_matches})")
 
         return detected