firefrost-gaming/skill-seekers-reference

yusyus d1a2df6dae feat: Add multi-level confidence filtering for pattern detection (fixes #240)

## Problem
Pattern detection was producing too many low-confidence patterns:
- 905 patterns detected (overwhelming)
- Many with confidence as low as 0.50
- 4,875 lines in patterns index.md
- Low signal-to-noise ratio

## Solution

### 1. Added Confidence Thresholds (pattern_recognizer.py)
```python
CONFIDENCE_THRESHOLDS = {
    'critical': 0.80,   # High-confidence for ARCHITECTURE.md
    'high': 0.70,       # Detailed analysis
    'medium': 0.60,     # Include with warning
    'low': 0.50,        # Minimum detection
}
```
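One way to read this table is as a ladder: a pattern belongs to the highest level whose threshold its confidence meets. The sketch below illustrates that reading only; `confidence_level` is a hypothetical helper, not a function from pattern_recognizer.py.

```python
from typing import Optional

CONFIDENCE_THRESHOLDS = {
    'critical': 0.80,
    'high': 0.70,
    'medium': 0.60,
    'low': 0.50,
}


def confidence_level(confidence: float) -> Optional[str]:
    """Return the highest level whose threshold `confidence` meets.

    Hypothetical helper for illustration; not part of the PR.
    """
    # Walk the levels from the strictest threshold to the loosest.
    ordered = sorted(
        CONFIDENCE_THRESHOLDS.items(), key=lambda item: item[1], reverse=True
    )
    for level, threshold in ordered:
        if confidence >= threshold:
            return level
    return None  # below the minimum detection threshold


print(confidence_level(0.85))  # critical
print(confidence_level(0.60))  # medium
print(confidence_level(0.42))  # None
```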

### 2. Created Filtering Utilities (pattern_recognizer.py:1650-1723)
- `filter_patterns_by_confidence()` - Filter by threshold
- `create_multi_level_report()` - Multi-level grouping with statistics

### 3. Multi-Level Output Files (codebase_scraper.py:1009-1055)
Now generates 4 output files:
- **all_patterns.json** - All detected patterns (unfiltered)
- **high_confidence_patterns.json** - Patterns ≥ 0.70 (for detailed analysis)
- **critical_patterns.json** - Patterns ≥ 0.80 (for ARCHITECTURE.md)
- **summary.json** - Statistics and thresholds
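
The write step could look roughly like the sketch below. `write_pattern_outputs` is a hypothetical helper, not the code in codebase_scraper.py, and it assumes each pattern is a dict with a `confidence` key. The summary is trimmed to three counters to keep the example short.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List

THRESHOLDS = {"critical": 0.80, "high": 0.70, "medium": 0.60, "low": 0.50}


def write_pattern_outputs(
    patterns: List[Dict[str, Any]], output_dir: Path
) -> List[str]:
    """Write the four output files and return their names (hypothetical helper)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    high = [p for p in patterns if p["confidence"] >= THRESHOLDS["high"]]
    critical = [p for p in patterns if p["confidence"] >= THRESHOLDS["critical"]]
    payloads = {
        "all_patterns.json": patterns,          # unfiltered
        "high_confidence_patterns.json": high,  # >= 0.70
        "critical_patterns.json": critical,     # >= 0.80
        "summary.json": {
            # Trimmed: the real summary also carries medium_count and low_count.
            "statistics": {
                "total": len(patterns),
                "critical_count": len(critical),
                "high_confidence_count": len(high),
            },
            "thresholds": THRESHOLDS,
        },
    }
    for name, payload in payloads.items():
        (output_dir / name).write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return sorted(payloads)


# Made-up sample data, written to a throwaway directory.
sample = [
    {"name": "Singleton", "confidence": 0.85},
    {"name": "Observer", "confidence": 0.60},
]
with tempfile.TemporaryDirectory() as tmp:
    names = write_pattern_outputs(sample, Path(tmp))
    critical_on_disk = json.loads(
        (Path(tmp) / "critical_patterns.json").read_text(encoding="utf-8")
    )

print(names)
print(critical_on_disk)
```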

### 4. Enhanced Logging
```
✅ Detected 4 patterns in 1 files
   🔴 Critical (≥0.80): 0 patterns
   🟠 High (≥0.70): 0 patterns
   🟡 Medium (≥0.60): 1 patterns
   ⚪ Low (<0.60): 3 patterns
```

## Results

**Before:**
- Single output file with all patterns
- No confidence-based filtering
- Overwhelming amount of data

**After:**
- 4 output files: the full pattern set, two confidence-filtered subsets, and a summary
- Clear quality indicators (🔴🟠🟡⚪)
- Easy to find high-quality patterns
- Statistics in summary.json

**Example Output:**
```json
{
  "statistics": {
    "total": 4,
    "critical_count": 0,
    "high_confidence_count": 0,
    "medium_count": 1,
    "low_count": 3
  },
  "thresholds": {
    "critical": 0.80,
    "high": 0.70,
    "medium": 0.60,
    "low": 0.50
  }
}
```

## Benefits

1. **Better Signal-to-Noise Ratio**
   - Focus on high-confidence patterns
   - Low-confidence patterns kept out of the filtered files

2. **Flexible Usage**
   - ARCHITECTURE.md uses critical_patterns.json
   - Detailed analysis uses high_confidence_patterns.json
   - Debug/research uses all_patterns.json

3. **Clear Quality Indicators**
   - Visual indicators (🔴🟠🟡⚪)
   - Explicit thresholds documented
   - Statistics for quick assessment

4. **Backward Compatible**
   - all_patterns.json maintains full data
   - No breaking changes to existing code
   - Additional files are opt-in
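
As a rough illustration of the "Flexible Usage" point, a consumer can pick its input file by use case. `load_patterns` and the use-case keys are hypothetical, and the sketch assumes each file holds a JSON list of pattern dicts.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List

# Consumer -> file mapping taken from the list above; the keys are made up.
FILE_FOR_USE_CASE = {
    "architecture": "critical_patterns.json",
    "analysis": "high_confidence_patterns.json",
    "debug": "all_patterns.json",
}


def load_patterns(output_dir: Path, use_case: str) -> List[Dict[str, Any]]:
    """Load the pattern file suited to use_case (hypothetical helper)."""
    path = output_dir / FILE_FOR_USE_CASE[use_case]
    return json.loads(path.read_text(encoding="utf-8"))


# Made-up files in a throwaway directory, standing in for a real run's output.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    everything = [
        {"name": "Observer", "confidence": 0.60},
        {"name": "Singleton", "confidence": 0.85},
    ]
    (out / "all_patterns.json").write_text(json.dumps(everything), encoding="utf-8")
    (out / "critical_patterns.json").write_text(
        json.dumps(everything[1:]), encoding="utf-8"
    )
    for_docs = load_patterns(out, "architecture")
    for_debug = load_patterns(out, "debug")

print(len(for_docs), len(for_debug))  # 1 2
```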

## Testing

**Test project:**
```python
# Class bodies elided; `...` keeps the snippet valid Python.
class SingletonDatabase: ...  # Detected with varying confidence
class UserFactory: ...        # Detected patterns
class Logger: ...             # Observer pattern (0.60 confidence)
```

**Results:**
- ✅ All 41 tests passing
- ✅ Multi-level filtering works correctly
- ✅ Statistics accurate
- ✅ Output files created properly

## Future Improvements (Not in this PR)

- Context-aware confidence boosting (pattern in design_patterns/ dir)
- Pattern count limits (top N per file/type)
- AI-enhanced confidence scoring
- Per-language threshold tuning

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>