# AI Stack Resource Requirements

**Server:** TX1 Dallas (38.68.14.26)
**Purpose:** Resource allocation planning
**Last Updated:** 2026-02-18

---

## TX1 Server Specifications

**CPU:** 32 vCPU
**RAM:** 256GB
**Storage:** 1TB NVMe SSD
**Location:** Dallas, TX
**Network:** 1Gbps

**Current Usage (before AI stack):**
- Game servers: 6 Minecraft instances
- Management services: Minimal overhead
- Available for AI: Significant capacity

---

## Storage Requirements

### Component Breakdown

| Component | Size | Purpose |
|-----------|------|---------|
| **Qwen 2.5 Coder 72B** | ~40GB | Infrastructure/coding model |
| **Llama 3.3 70B** | ~40GB | General reasoning model |
| **Llama 3.2 Vision 11B** | ~7GB | Image analysis model |
| **Dify Services** | ~5GB | Docker containers, databases |
| **Knowledge Base** | ~5GB | Indexed docs, embeddings |
| **Logs & Temp** | ~2GB | Operational overhead |
| **Total** | **~99GB** | ✅ Well under 1TB limit |

### Storage Growth Estimate

**Year 1:**
- Models: 87GB (static, no growth unless upgrading)
- Knowledge base: 5GB → 8GB (as docs grow)
- Logs: 2GB → 5GB (6-month rotation)
- **Total Year 1:** ~100GB

**Storage is NOT a concern.**

---

## RAM Requirements

### Scenario 1: Normal Operations (Claude Available)

| Component | RAM Usage |
|-----------|-----------|
| **Dify Services** | ~4GB |
| **PostgreSQL** | ~2GB |
| **Redis** | ~1GB |
| **Ollama (idle)** | <1GB |
| **Total (idle)** | **~8GB** ✅ |

**Game servers have ~248GB available** (256GB - 8GB)

---

### Scenario 2: DERP Activated (Claude Down, Emergency)

**Load ONE large model at a time:**

| Component | RAM Usage |
|-----------|-----------|
| **Qwen 2.5 Coder 72B** OR **Llama 3.3 70B** | ~80GB |
| **Dify Services** | ~4GB |
| **PostgreSQL** | ~2GB |
| **Redis** | ~1GB |
| **Ollama Runtime** | ~2GB |
| **OS Overhead** | ~3GB |
| **Total (active DERP)** | **~92GB** ✅ |

**Game servers have ~164GB available** (256GB - 92GB)

**Critical:** DO NOT load both large models simultaneously (~160GB combined would impact game servers)
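
The warning above can be enforced with a small guard before any manual model load. This is a sketch: the helper names are ours, and it assumes `ollama ps` prints each loaded model's tag in its output.

```bash
# Guard: refuse to load a second 70B-class model while one is resident.
# Sketch only -- assumes `ollama ps` lists loaded model tags in its output.

large_model_loaded() {
  # $1: output of `ollama ps`; succeeds if a large model is already in RAM
  echo "$1" | grep -qE 'qwen2\.5-coder:72b|llama3\.3:70b'
}

load_large_model() {
  local model="$1"
  if large_model_loaded "$(ollama ps)"; then
    echo "REFUSED: a large model is already loaded; run 'ollama stop <model>' first" >&2
    return 1
  fi
  ollama run "$model" ""   # empty prompt just loads the model into RAM
}
```

Wiring Discord-bot or Dify-triggered loads through `load_large_model` keeps an accidental double load (and the resulting game-server starvation) from happening.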

---

### Scenario 3: Vision Model Only (Screenshot Analysis)

| Component | RAM Usage |
|-----------|-----------|
| **Llama 3.2 Vision 11B** | ~7GB |
| **Dify Services** | ~4GB |
| **Other Services** | ~3GB |
| **Total** | **~14GB** ✅ |

**Very lightweight; can run alongside game servers with no impact.**

---

## CPU Requirements

### Model Inference Performance

**TX1 has 32 vCPU (shared among all services).**

**Expected Inference Times:**

| Model | Token Generation Speed | Typical Response |
|-------|------------------------|------------------|
| **Qwen 2.5 Coder 72B** | ~3-5 tokens/second | 30-120 seconds |
| **Llama 3.3 70B** | ~3-5 tokens/second | 30-120 seconds |
| **Llama 3.2 Vision 11B** | ~8-12 tokens/second | 10-45 seconds |

**For comparison:**
- Claude API: 20-40 tokens/second
- **DERP is 5-10× slower** (this is expected and acceptable for emergency use)

**CPU Impact on Game Servers:**
- During DERP inference: ~70-80% CPU usage (temporary spikes)
- Game servers may experience brief lag during AI responses
- **Acceptable for emergency use** (not for normal operations)

---

## Network Requirements

### Initial Model Downloads (One-Time)

| Model | Size | Download Time (1Gbps) |
|-------|------|-----------------------|
| **Qwen 2.5 Coder 72B** | ~40GB | 5-10 minutes |
| **Llama 3.3 70B** | ~40GB | 5-10 minutes |
| **Llama 3.2 Vision 11B** | ~7GB | 1-2 minutes |
| **Total** | **~87GB** | **15-25 minutes** |

**Reality:** Download speeds vary; budget 2-4 hours for all models.

**Recommendation:** Download overnight to avoid impacting game server traffic.
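
The overnight download can be scripted as a sequential pull (a sketch: the model tags match the table above, and the `OLLAMA` override exists only so the script can be dry-run; sequential pulls avoid saturating the link while game servers are live):

```bash
# Sequential overnight pull of the three DERP models.
# OLLAMA can be overridden for dry runs (e.g. OLLAMA=echo).

pull_all_models() {
  local ollama_cmd="${OLLAMA:-ollama}"
  local model
  for model in qwen2.5-coder:72b llama3.3:70b llama3.2-vision:11b; do
    echo "pulling ${model}"
    "$ollama_cmd" pull "$model" || return 1
  done
  echo "done"
}
```

Run it detached so it survives the SSH session, e.g. `nohup bash -c 'source pull-models.sh; pull_all_models' > pull.log 2>&1 &`.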

---

### Ongoing Bandwidth

**Dify Web Interface:**
- Minimal (text-based queries)
- ~1-5 KB per query
- Negligible impact

**Discord Bot:**
- Text-based queries only
- ~1-5 KB per query
- Negligible impact

**Model Updates:**
- Infrequent (quarterly at most)
- Same as initial download (~87GB)
- Schedule during low-traffic periods

---

## Resource Allocation Strategy

### Priority Levels

**Priority 1 (Always):** Game Servers
**Priority 2 (Normal):** Management Services (Pterodactyl, Gitea, etc.)
**Priority 3 (Emergency Only):** DERP AI Stack

### RAM Allocation Rules

**Normal Operations:**
- Game servers: Up to 240GB
- Management: ~8GB
- AI Stack (idle): ~8GB
- **Total: 256GB** ✅

**DERP Emergency:**
- Game servers: Temporarily limited to 160GB
- Management: ~8GB
- AI Stack (active): ~92GB
- **Total: 260GB** ⚠️ (4GB overcommit acceptable for brief periods)

**If RAM pressure occurs during DERP:**
1. Unload one game server temporarily
2. Run AI query
3. Reload game server
4. **Total downtime per query: <5 minutes**
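
The steps above can be wrapped in one script. This is a sketch: `mc-survival.service` is a placeholder unit name (substitute however the game servers are actually managed, e.g. via Pterodactyl), and the `SYSTEMCTL`/`OLLAMA` overrides exist only to allow dry runs.

```bash
# Free RAM for one DERP query, then restore the game server.
# "mc-survival.service" is a placeholder unit name.

derp_query() {
  local prompt="$1"
  local systemctl_cmd="${SYSTEMCTL:-systemctl}"
  local ollama_cmd="${OLLAMA:-ollama}"

  "$systemctl_cmd" stop mc-survival.service      # 1. unload one game server
  "$ollama_cmd" run qwen2.5-coder:72b "$prompt"  # 2. run the AI query
  "$ollama_cmd" stop qwen2.5-coder:72b           #    free the model's ~80GB
  "$systemctl_cmd" start mc-survival.service     # 3. reload the game server
}
```

Announcing the brief restart in Discord before running it keeps the <5 minute downtime from surprising players.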

---

## Monitoring & Alerts

### Critical Thresholds

**RAM Usage:**
- **Warning:** >220GB used (~86%)
- **Critical:** >240GB used (~94%)
- **Action:** Defer DERP usage or unload a game server

**CPU Usage:**
- **Warning:** >80% sustained for >5 minutes
- **Critical:** >90% sustained for >2 minutes
- **Action:** Pause AI inference, prioritize game servers

**Storage:**
- **Warning:** >800GB used (80%)
- **Critical:** >900GB used (90%)
- **Action:** Clean up old logs and model cache
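
These thresholds can be wired into a cron-able check. A sketch: the cutoffs mirror the numbers above; the alerting hook (e.g. a Discord webhook) is left out, and the classifiers take used-GB arguments so the logic is testable in isolation.

```bash
# Classify current RAM/storage usage against the DERP thresholds.

ram_level() {
  local used_gb="$1"
  if   [ "$used_gb" -gt 240 ]; then echo critical
  elif [ "$used_gb" -gt 220 ]; then echo warning
  else echo ok
  fi
}

storage_level() {
  local used_gb="$1"
  if   [ "$used_gb" -gt 900 ]; then echo critical
  elif [ "$used_gb" -gt 800 ]; then echo warning
  else echo ok
  fi
}

# Live check: used RAM (GB) from `free`, used storage (GB) from `df`.
check_now() {
  local ram_used storage_used
  ram_used="$(free -g | awk '/^Mem:/ {print $3}')"
  storage_used="$(df -BG --output=used / | tail -1 | tr -dc '0-9')"
  echo "RAM: $(ram_level "$ram_used") (${ram_used}GB used)"
  echo "Storage: $(storage_level "$storage_used") (${storage_used}GB used)"
}
```

A cron entry such as `*/5 * * * * /opt/scripts/check-thresholds.sh` (path illustrative) would run `check_now` every five minutes.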

### Monitoring Commands

```bash
# Check RAM
free -h

# Check CPU
htop

# Check storage
df -h /

# Check Ollama status
ollama list
ollama ps   # Shows loaded models

# Check Dify
cd /opt/dify
docker-compose ps
docker stats   # Real-time resource usage
```

---

## Resource Optimization

### Unload Models When Not Needed

```bash
# Unload models individually (frees RAM)
ollama stop qwen2.5-coder:72b
ollama stop llama3.3:70b
ollama stop llama3.2-vision:11b

# Verify RAM was freed
free -h
```

### Preload Models for Faster Response

```bash
# Preload a model (takes ~30 seconds)
ollama run qwen2.5-coder:72b ""

# Model is now in RAM; queries will be faster
```
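
By default Ollama evicts an idle model after a few minutes, which would put a long reload in front of every query during an incident. Raising the keep-alive window for the duration of a DERP activation avoids that. `OLLAMA_KEEP_ALIVE` is Ollama's standard setting; the systemd drop-in assumes Ollama runs as the stock `ollama` service.

```bash
# Keep models resident during an active incident instead of letting
# Ollama evict them after its idle timeout (default: a few minutes).
sudo systemctl edit ollama
# ...then add under [Service]:
#   Environment="OLLAMA_KEEP_ALIVE=2h"
sudo systemctl restart ollama
```

Remember to revert (or set a short value) once the incident is over so the ~80GB model doesn't sit in RAM indefinitely.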

### Schedule Maintenance Windows

**Best time for model downloads/updates:**
- Tuesday/Wednesday 2-6 AM CST (lowest traffic)
- Announce in Discord 24 hours ahead
- Expected downtime: <10 minutes

---

## Capacity Planning

### Current State (Feb 2026)
- **Game servers:** 6 active
- **RAM available:** 256GB
- **Storage available:** 1TB
- **AI stack:** Fits comfortably

### Growth Scenarios

**Scenario 1: Add 6 more game servers (12 total)**
- Additional RAM needed: ~60GB
- Available for AI (normal): 248GB → 188GB ✅
- Available for AI (DERP): 164GB → 104GB ✅
- **Status:** Still viable

**Scenario 2: Add 12 more game servers (18 total)**
- Additional RAM needed: ~120GB
- Available for AI (normal): 248GB → 128GB ✅
- Available for AI (DERP): 164GB → 44GB ⚠️
- **Status:** DERP would require unloading 2 game servers

**Scenario 3: Upgrade to larger models (theoretical)**
- Qwen 3.0 Coder 170B: ~180GB RAM
- **Status:** Would NOT fit alongside game servers
- **Recommendation:** Stick with 72B models

### Upgrade Path

**If TX1 reaches capacity:**

**Option A: Add second dedicated AI server**
- Move AI stack to separate VPS
- TX1 focuses only on game servers
- Cost: ~$100-200/month (NOT DERP-compliant)

**Option B: Upgrade TX1 RAM**
- 256GB → 512GB
- Cost: Contact Hetzner for pricing
- **Preferred:** Maintains DERP compliance

**Option C: Use smaller AI models**
- Qwen 2.5 Coder 32B (~35GB RAM)
- Llama 3.1 8B (~8GB RAM)
- **Tradeoff:** Lower quality, but more capacity

---

## Disaster Recovery

### Backup Strategy

**What to backup:**
- ✅ Dify configuration files
- ✅ Knowledge base data
- ✅ Discord bot code
- ❌ Models (can re-download)

**Backup location:**
- Git repository (for configs/code)
- NC1 Charlotte (for knowledge base)

**Backup frequency:**
- Configurations: After every change
- Knowledge base: Weekly
- Models: No backup needed
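
The weekly knowledge-base backup can be sketched as follows. The source path and the NC1 destination are illustrative placeholders; configs and bot code go to Git separately, as noted above.

```bash
# Weekly knowledge-base backup to NC1 Charlotte.
# SRC default and the rsync destination are illustrative -- adjust to the real layout.

backup_knowledge_base() {
  local src="${1:-/opt/dify/volumes}"
  local archive="/tmp/dify-kb-$(date +%F).tar.gz"

  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
  echo "created $archive"
  # Ship offsite (NC1 hostname is a placeholder):
  # rsync -az "$archive" backup@nc1.example:/backups/dify/
}
```

A weekly cron entry running this (with the rsync line enabled) satisfies the frequency table above.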

### Recovery Procedure

**If TX1 fails completely:**

1. Deploy Dify on NC1 (temporary)
2. Restore knowledge base from backup
3. Re-download models (~4 hours)
4. Point Discord bot to NC1
5. **Downtime: 4-6 hours**

**Note:** This is acceptable for DERP (emergency-only system)

---

## Cost Analysis

### One-Time Costs
- Setup time: 6-8 hours (Michael's time)
- Model downloads: Bandwidth usage (included in hosting)
- **Total: $0** (sweat equity only)

### Monthly Costs
- Hosting: $0 (using existing TX1)
- Bandwidth: $0 (included in hosting)
- Maintenance: ~1 hour/month (Michael's time)
- **Total: $0/month** ✅

### Opportunity Cost
- RAM reserved for AI: ~8GB (idle) or ~92GB (active DERP)
- Could host 1-2 more game servers in that space
- **Acceptable tradeoff:** DERP independence is worth more than 2 game servers

---

**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️

**TX1 has the capacity. Resources are allocated wisely. $0 monthly cost maintained.**