# Implementation Plan: Arbitrary Limits & Dead Code

**Generated:** 2026-02-24  
**Scope:** Remove harmful arbitrary limits and implement critical TODO items  
**Priority:** P0 (Critical) - P3 (Backlog)

---

## Part 1: Arbitrary Limits to Remove

### 🔴 P0 - Critical (Fix Immediately)

#### 1.1 Enhancement Code Block Limit
**File:** `src/skill_seekers/cli/enhance_skill_local.py:341`  
**Current:**
```python
for _idx, block in code_blocks[:5]:  # Max 5 code blocks
```
**Problem:** AI enhancement only sees 5 code blocks regardless of skill size. A skill with 100 code examples has 95% ignored during enhancement.

**Solution:**
```python
# Option A: Remove limit, use token counting instead
max_tokens = 4000  # Claude 3.5 Sonnet context for enhancement
current_tokens = 0
selected_blocks = []
for idx, block in code_blocks:
    block_tokens = estimate_tokens(block)
    if current_tokens + block_tokens > max_tokens:
        break
    selected_blocks.append((idx, block))
    current_tokens += block_tokens

for _idx, block in selected_blocks:
```

**Effort:** 2 hours  
**Impact:** High - Massive improvement in enhancement quality  
**Breaking Change:** No

---

### 🟠 P1 - High Priority (Next Sprint)

#### 1.2 Reference File Code Truncation
**Files:**
- `src/skill_seekers/cli/codebase_scraper.py:422, 489, 575, 720, 746`
- `src/skill_seekers/cli/unified_skill_builder.py:1298`

**Current:**
```python
"code": code[:500],  # Truncate long code blocks
```

**Problem:** Reference files should be comprehensive. Truncating code blocks at 500 chars breaks copy-paste functionality and harms skill utility.

**Solution:**
```python
# Remove truncation from reference files
"code": code,  # Full code

# Keep truncation only for SKILL.md summaries (if needed)
```

**Effort:** 1 hour  
**Impact:** High - Reference files become actually usable  
**Breaking Change:** No (output improves)

---

#### 1.3 Table Row Limit in References
**File:** `src/skill_seekers/cli/word_scraper.py:595`  

**Current:**
```python
for row in rows[:5]:
```

**Problem:** Tables in reference files truncated to 5 rows.

**Solution:** Remove `[:5]` limit from reference file generation. Keep limit only for SKILL.md summaries.

**Effort:** 30 minutes  
**Impact:** Medium  
**Breaking Change:** No

---

### 🟡 P2 - Medium Priority (Backlog)

#### 1.4 Pattern/Example Limits in Analysis
**Files:**
- `src/skill_seekers/cli/codebase_scraper.py:1898` - `examples[:10]`
- `src/skill_seekers/cli/github_scraper.py:1145, 1169` - Pattern limits
- `src/skill_seekers/cli/doc_scraper.py:608` - `patterns[:5]`

**Problem:** Pattern detection limited arbitrarily, missing edge cases.

**Solution:** Make configurable via `--max-patterns` flag with sensible default (50 instead of 5-10).

**Effort:** 3 hours  
**Impact:** Medium - Better pattern coverage  
**Breaking Change:** No

---

#### 1.5 Issue/Release Limits in GitHub Scraper
**File:** `src/skill_seekers/cli/github_scraper.py`

**Current:**
```python
for release in releases[:3]:
for issue in issues[:5]:
for issue in open_issues[:20]:
```

**Problem:** Hard limits without user control.

**Solution:** Add CLI flags:
```python
parser.add_argument("--max-issues", type=int, default=50)
parser.add_argument("--max-releases", type=int, default=10)
```

**Effort:** 2 hours  
**Impact:** Medium - User control  
**Breaking Change:** No

---

#### 1.6 Config File Display Limits
**Files:**
- `src/skill_seekers/cli/config_manager.py:540` - `jobs[:5]`
- `src/skill_seekers/cli/config_enhancer.py:165, 302` - Config file limits

**Problem:** Display truncated for UX reasons, but should have `--verbose` override.

**Solution:** Add verbose mode check:
```python
if verbose:
    display_items = items
else:
    display_items = items[:5]  # Truncated for readability
```

**Effort:** 2 hours  
**Impact:** Low-Medium  
**Breaking Change:** No

---

### 🟢 P3 - Low Priority / Keep As Is

These limits are justified and should remain:

| Location | Limit | Justification |
|----------|-------|---------------|
| `word_scraper.py:553` | `all_code[:15]` | SKILL.md summary - full code in references |
| `word_scraper.py:567` | `examples[:5]` | Per-language summary in SKILL.md |
| `pdf_scraper.py:453, 472` | Same as above | Consistent with Word scraper |
| `word_scraper.py:658, 664` | `[:10], [:15]` | Key concepts list (justified for readability) |
| `adaptors/*.py` | `[:30000]` | API token limits (Claude/Gemini/OpenAI) |
| `base.py:208` | `[:500]` | Preview/summary text (not reference) |

---

## Part 1b: Hardcoded Language Issues

These are **data flow bugs** - the correct language is available upstream but hardcoded to `"python"` downstream.

### 🔴 P0 - Critical

#### 1.b.1 Test Example Code Snippets
**File:** `src/skill_seekers/cli/unified_skill_builder.py:1298`

**Current:**
```python
f.write(f"\n```python\n{ex['code_snippet'][:300]}\n```\n")
```

**Problem:** Hardcoded to `python` regardless of actual language.

**Available Data:** The `ex` dict from `TestExample.to_dict()` includes a `language` field.

**Fix:**
```python
lang = ex.get("language", "text")
f.write(f"\n```{lang}\n{ex['code_snippet'][:300]}\n```\n")
```

**Effort:** 1 minute  
**Impact:** Medium - Syntax highlighting now correct  
**Breaking Change:** No

---

#### 1.b.2 How-To Guide Language
**File:** `src/skill_seekers/cli/how_to_guide_builder.py:1018`

**Current:**
```python
"language": "python",  # TODO: Detect from code
```

**Problem:** Language hardcoded in guide data sent to AI enhancement.

**Solution (3 one-line changes):**

1. **Add field to dataclass** (around line 70):
```python
@dataclass
class HowToGuide:
    # ... existing fields ...
    language: str = "python"  # Source file language
```

2. **Set at creation** (line 955, in `_create_guide_from_workflow`):
```python
HowToGuide(
    # ... other fields ...
    language=primary_workflow.get("language", "python"),
)
```

3. **Use the field** (line 1018):
```python
"language": guide.language,
```

**Note:** The `primary_workflow` dict already carries the language field (populated by test example extractor upstream at line 169). Zero new imports needed.

**Effort:** 5 minutes  
**Impact:** Medium - AI receives correct language context  
**Breaking Change:** No

---

## Part 2: Dead Code / TODO Implementation

### 🔴 P0 - Critical TODOs (Implement Now)

#### 2.1 SMTP Email Notifications
**File:** `src/skill_seekers/sync/notifier.py:138`  

**Current:**
```python
# TODO: Implement SMTP email sending
```

**Implementation:**
```python
def _send_email_smtp(self, to_email: str, subject: str, body: str) -> bool:
    """Send email via SMTP."""
    import smtplib
    from email.mime.text import MIMEText
    
    smtp_host = os.environ.get("SKILL_SEEKERS_SMTP_HOST", "localhost")
    smtp_port = int(os.environ.get("SKILL_SEEKERS_SMTP_PORT", "587"))
    smtp_user = os.environ.get("SKILL_SEEKERS_SMTP_USER")
    smtp_pass = os.environ.get("SKILL_SEEKERS_SMTP_PASS")
    
    if not all([smtp_user, smtp_pass]):
        logger.warning("SMTP credentials not configured")
        return False
    
    try:
        msg = MIMEText(body)
        msg["Subject"] = subject
        msg["From"] = smtp_user
        msg["To"] = to_email
        
        with smtplib.SMTP(smtp_host, smtp_port) as server:
            server.starttls()
            server.login(smtp_user, smtp_pass)
            server.send_message(msg)
        return True
    except Exception as e:
        logger.error(f"Failed to send email: {e}")
        return False
```

**Effort:** 4 hours  
**Dependencies:** Environment variables for SMTP config  
**Breaking Change:** No

---

### 🟠 P1 - High Priority (Next Sprint)

#### 2.2 Auto-Update Integration
**File:** `src/skill_seekers/sync/monitor.py:201`

**Current:**
```python
# TODO: Integrate with doc_scraper to rebuild skill
```

**Implementation:** Call existing scraper commands when changes detected.

```python
def _rebuild_skill(self, config_path: str) -> bool:
    """Rebuild skill when changes detected."""
    import subprocess
    
    # Use existing create command
    result = subprocess.run(
        ["skill-seekers", "create", config_path, "--force"],
        capture_output=True,
        text=True,
        timeout=300  # 5 minute timeout
    )
    
    return result.returncode == 0
```

**Effort:** 3 hours  
**Dependencies:** Ensure `skill-seekers` CLI available in PATH  
**Breaking Change:** No

---

#### 2.3 Language Detection in How-To Guides
**File:** `src/skill_seekers/cli/how_to_guide_builder.py:1018`

**Current:**
```python
"language": "python",  # TODO: Detect from code
```

**Implementation:** Use existing `LanguageDetector`:

```python
from skill_seekers.cli.language_detector import LanguageDetector

detector = LanguageDetector(min_confidence=0.3)
language, confidence = detector.detect_from_text(code)
if confidence < 0.3:
    language = "text"  # Fallback
```

**Effort:** 1 hour  
**Dependencies:** Existing LanguageDetector class  
**Breaking Change:** No

---

### 🟡 P2 - Medium Priority (Backlog)

#### 2.4 Custom Transform System
**File:** `src/skill_seekers/cli/enhancement_workflow.py:439`

**Current:**
```python
# TODO: Implement custom transform system
```

**Purpose:** Allow users to define custom code transformations in workflow YAML.

**Implementation Sketch:**
```yaml
# Example workflow addition
transforms:
  - name: "Remove boilerplate"
    pattern: "Copyright \(c\) \d+"
    action: "remove"
  - name: "Normalize headers"
    pattern: "^#{1,6} "
    replacement: "## "
```

**Effort:** 8 hours  
**Impact:** Medium - Power user feature  
**Breaking Change:** No

---

#### 2.5 Vector Database Storage for Embeddings
**File:** `src/skill_seekers/embedding/server.py:268`

**Current:**
```python
# TODO: Store embeddings in vector database
```

**Implementation Options:**
- Option A: ChromaDB integration (already have adaptor)
- Option B: Qdrant integration (already have adaptor)
- Option C: SQLite with vector extension (simplest)

**Recommendation:** Start with SQLite + `sqlite-vec` for zero-config setup.

**Effort:** 6 hours  
**Dependencies:** New dependency `sqlite-vec`  
**Breaking Change:** No

---

### 🟢 P3 - Backlog / Low Priority

#### 2.6 URL Resolution in Sync Monitor
**File:** `src/skill_seekers/sync/monitor.py:136`

**Current:**
```python
# TODO: In real implementation, get actual URLs from scraper
```

**Note:** Current implementation uses placeholder URLs. Full implementation requires scraper to expose URL list.

**Effort:** 4 hours  
**Impact:** Low - Current implementation works for basic use  
**Breaking Change:** No

---

## Implementation Schedule

### Week 1: Critical Fixes
- [ ] Remove `[:5]` limit in `enhance_skill_local.py` (P0)
- [ ] Remove `[:500]` truncation from reference files (P1)
- [ ] Remove `[:5]` table row limit (P1)

### Week 2: Notifications & Integration
- [ ] Implement SMTP notifications (P0)
- [ ] Implement auto-update in sync monitor (P1)
- [ ] Fix language detection in how-to guides (P1)

### Week 3: Configurability
- [ ] Add `--max-patterns`, `--max-issues` CLI flags (P2)
- [ ] Add verbose mode for display limits (P2)
- [ ] Add `--max-code-blocks` for enhancement (P2)

### Backlog
- [ ] Custom transform system (P2)
- [ ] Vector DB storage for embeddings (P2)
- [ ] URL resolution in sync monitor (P3)

---

## Testing Strategy

### For Limit Removal
1. Create test skill with 100+ code blocks
2. Verify enhancement sees all code (or token-based limit)
3. Verify reference files contain complete code
4. Verify SKILL.md still has appropriate summaries

### For Hardcoded Language Fixes
1. Create skill from JavaScript/Go/Rust test examples
2. Verify `unified_skill_builder.py` outputs correct language tag in markdown
3. Verify `how_to_guide_builder.py` uses correct language in AI prompt

### For TODO Implementation
1. SMTP: Mock SMTP server test
2. Auto-update: Mock subprocess test
3. Language detection: Test with mixed-language code samples

---

## Success Metrics

| Metric | Before | After |
|--------|--------|-------|
| Code blocks in enhancement | 5 max | Token-based (40+) |
| Code truncation in refs | 500 chars | Full code |
| Table rows in refs | 5 max | All rows |
| Code snippet language | Always "python" | Correct language |
| Guide language | Always "python" | Source file language |
| Email notifications | Webhook only | SMTP + webhook |
| Auto-update | Manual only | Automatic |

---

## Appendix: Files Modified

### Limit Removals
- `src/skill_seekers/cli/enhance_skill_local.py`
- `src/skill_seekers/cli/codebase_scraper.py`
- `src/skill_seekers/cli/unified_skill_builder.py`
- `src/skill_seekers/cli/word_scraper.py`

### Hardcoded Language Fixes
- `src/skill_seekers/cli/unified_skill_builder.py` (line 1298)
- `src/skill_seekers/cli/how_to_guide_builder.py` (dataclass + lines 955, 1018)

### TODO Implementations
- `src/skill_seekers/sync/notifier.py`
- `src/skill_seekers/sync/monitor.py`
- `src/skill_seekers/cli/how_to_guide_builder.py`
- `src/skill_seekers/cli/github_scraper.py` (new flags)
- `src/skill_seekers/cli/config_manager.py` (verbose mode)

---

*This document should be reviewed and updated after each implementation phase.*