feat: Add 6 new languages to codebase analysis system (C#, Go, Rust, Java, Ruby, PHP)

Expands language support from 3 to 9 languages across the entire codebase scraping system.

**New Languages Added:**
- C# (Unity/.NET support) - classes, methods, properties, async/await, XML docs
- Go - structs, functions, methods with receivers, multiple return values
- Rust - structs, functions, async functions, impl blocks
- Java - classes, methods, inheritance, interfaces, generics
- Ruby - classes, methods, inheritance, predicate methods
- PHP - classes, methods, namespaces, inheritance

**Code Analysis (code_analyzer.py):**
- Added 6 new language analyzers (~1000 lines)
- Regex-based parsers inspired by official language specs
- Extract classes, functions, and signatures; detect async functions
- Comprehensive comment extraction for all languages
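
A minimal sketch of what a regex-based analyzer of this kind might look like, using Go functions and methods with receivers as the example. The pattern and the `extract_go_functions` name are hypothetical and are not taken from code_analyzer.py; the sketch ignores generics and nested parentheses in parameter lists.

```python
import re
from typing import Dict, List, Optional

# Hypothetical pattern: matches `func Name(...)` and `func (r *T) Name(...)`.
GO_FUNC_RE = re.compile(
    r"^func\s+"
    r"(?:\((?P<receiver>[^)]*)\)\s*)?"  # optional receiver, e.g. (s *Server)
    r"(?P<name>\w+)\s*"                 # function or method name
    r"\((?P<params>[^)]*)\)",           # parameter list (no nested parens)
    re.MULTILINE,
)


def extract_go_functions(source: str) -> List[Dict[str, Optional[object]]]:
    """Return one dict per top-level Go function or method found in source."""
    functions = []
    for match in GO_FUNC_RE.finditer(source):
        receiver = match.group("receiver")
        functions.append({
            "name": match.group("name"),
            "receiver": receiver,               # None for plain functions
            "parameters": match.group("params").strip(),
            "is_method": receiver is not None,
        })
    return functions
```

Anchoring the pattern at the start of a line keeps it from matching `func` literals nested inside function bodies, at the cost of missing declarations that are indented.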

**Dependency Analysis (dependency_analyzer.py):**
- Added 6 new import extractors (~300 lines)
- C#: using statements, static using, aliases
- Go: import blocks, aliases
- Rust: use statements, curly braces, crate/super
- Java: import statements, static imports, wildcards
- Ruby: require, require_relative, load
- PHP: require/include, namespace use
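
As an illustration of the import extractors listed above, here is a rough sketch for Go import blocks and aliases. The function name `extract_go_imports` and all three patterns are assumptions made for this example, not the code in dependency_analyzer.py.

```python
import re
from typing import List, Optional, Tuple

# Single-line form: import "fmt"  or  import alias "path"
GO_SINGLE_IMPORT_RE = re.compile(
    r'^\s*import\s+(?:(?P<alias>[\w.]+)\s+)?"(?P<path>[^"]+)"',
    re.MULTILINE,
)
# Block form: import ( ... )
GO_IMPORT_BLOCK_RE = re.compile(
    r'^\s*import\s*\((?P<body>.*?)\)',
    re.MULTILINE | re.DOTALL,
)
# One entry inside a block, with an optional alias (including `_` and `.`)
GO_BLOCK_LINE_RE = re.compile(
    r'^\s*(?:(?P<alias>[\w.]+)\s+)?"(?P<path>[^"]+)"',
    re.MULTILINE,
)


def extract_go_imports(source: str) -> List[Tuple[str, Optional[str]]]:
    """Return (import_path, alias) pairs; alias is None when absent."""
    imports = []
    # Single-line imports are collected first, then block entries.
    for match in GO_SINGLE_IMPORT_RE.finditer(source):
        imports.append((match.group("path"), match.group("alias")))
    for block in GO_IMPORT_BLOCK_RE.finditer(source):
        for match in GO_BLOCK_LINE_RE.finditer(block.group("body")):
            imports.append((match.group("path"), match.group("alias")))
    return imports
```

The returned paths could then become edges in a dependency graph; how the real analyzer resolves them to files is not shown in this commit message.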

**File Extensions (codebase_scraper.py):**
- Added mappings: .cs, .go, .rs, .java, .rb, .php
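
The extension mappings can be pictured as a simple lookup table. This is only a sketch: `detect_language` is the function exercised by the tests in this commit, but the `LANGUAGE_EXTENSIONS` name, the lowercasing of the suffix, and the table layout are assumptions, and the previously supported languages are omitted apart from the C++ header extensions seen in the tests.

```python
from pathlib import Path

# Hypothetical table; the real mapping lives in codebase_scraper.py.
LANGUAGE_EXTENSIONS = {
    ".h": "C++",
    ".hpp": "C++",
    ".cs": "C#",
    ".go": "Go",
    ".rs": "Rust",
    ".java": "Java",
    ".rb": "Ruby",
    ".php": "PHP",
}


def detect_language(file_path: Path) -> str:
    """Return the language name for a path, or 'Unknown' if unmapped."""
    return LANGUAGE_EXTENSIONS.get(file_path.suffix.lower(), "Unknown")
```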

**Test Coverage:**
- Added 24 new tests for 6 languages (4 tests each)
- Added 19 dependency analyzer tests
- Added 6 language detection tests
- Total: 118 tests, 100% passing 

**Credits:**
- Regex patterns based on official language specifications:
  - Microsoft C# Language Specification
  - Go Language Specification
  - Rust Language Reference
  - Oracle Java Language Specification
  - Ruby Documentation
  - PHP Language Reference
- NetworkX for graph algorithms

**Issues Resolved:**
- Closes #166 (C# support request)
- Closes #140 (E1.7 MCP tool scrape_codebase)

**Test Results:**
- test_code_analyzer.py: 54 tests passing
- test_dependency_analyzer.py: 43 tests passing
- test_codebase_scraper.py: 21 tests passing
- Total execution: ~0.41s

🚀 Generated with Claude Code
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
yusyus
2026-01-02 21:28:21 +03:00
parent 0511486677
commit 3408315f40
6 changed files with 1978 additions and 14 deletions


@@ -51,9 +51,33 @@ class TestLanguageDetection(unittest.TestCase):
         self.assertEqual(detect_language(Path('test.h')), 'C++')
         self.assertEqual(detect_language(Path('test.hpp')), 'C++')
 
+    def test_csharp_detection(self):
+        """Test C# file detection."""
+        self.assertEqual(detect_language(Path('test.cs')), 'C#')
+
+    def test_go_detection(self):
+        """Test Go file detection."""
+        self.assertEqual(detect_language(Path('test.go')), 'Go')
+
+    def test_rust_detection(self):
+        """Test Rust file detection."""
+        self.assertEqual(detect_language(Path('test.rs')), 'Rust')
+
+    def test_java_detection(self):
+        """Test Java file detection."""
+        self.assertEqual(detect_language(Path('test.java')), 'Java')
+
+    def test_ruby_detection(self):
+        """Test Ruby file detection."""
+        self.assertEqual(detect_language(Path('test.rb')), 'Ruby')
+
+    def test_php_detection(self):
+        """Test PHP file detection."""
+        self.assertEqual(detect_language(Path('test.php')), 'PHP')
+
     def test_unknown_language(self):
         """Test unknown file extension."""
-        self.assertEqual(detect_language(Path('test.go')), 'Unknown')
+        self.assertEqual(detect_language(Path('test.swift')), 'Unknown')
         self.assertEqual(detect_language(Path('test.txt')), 'Unknown')