firefrost-operations-manual/docs/tasks/self-hosted-ai-stack-on-tx1/resource-requirements.md
The Chronicler b32afdd1db Task #9: Rewrite AI Stack architecture for DERP compliance
Complete rewrite of self-hosted AI stack (Task #9) with new DERP-compliant architecture:

CHANGES:
- Architecture: AnythingLLM+OpenWebUI → Dify+Ollama (DERP-compliant)
- Cost model: $0/month additional (self-hosted on TX1, no external APIs)
- Usage tiers: Claude Projects (primary) → DERP backup (emergency) → Discord bots (staff/subscribers)
- Time estimate: 8-12hrs → 6-8hrs (more focused deployment)
- Resource allocation: 97GB storage, 92GB RAM when active (vs 150GB/110GB)

NEW DOCUMENTATION:
- README.md: Complete architecture rewrite with three-tier usage model
- deployment-plan.md: Step-by-step deployment (6 phases, all commands included)
- usage-guide.md: Decision tree for when to use Claude vs DERP vs bots
- resource-requirements.md: TX1 capacity planning, monitoring, disaster recovery

KEY FEATURES:
- Zero additional monthly cost (beyond existing $20 Claude Pro)
- True DERP compliance (fully self-hosted when Claude unavailable)
- Knowledge graph RAG (indexes entire 416-file repo)
- Discord bot integration (role-based staff/subscriber access)
- Emergency procedures documented
- Capacity planning for growth (up to 18 game servers)

MODELS:
- Qwen 2.5 Coder 72B (infrastructure/coding, 128K context)
- Llama 3.3 70B (general reasoning, 128K context)
- Llama 3.2 Vision 11B (screenshot analysis)

Updated tasks.md summary to reflect new architecture.

Status: Ready for deployment (pending medical clearance)

Fire + Frost + Foundation + DERP = True Independence 💙🔥❄️
2026-02-18 17:27:25 +00:00

# AI Stack Resource Requirements
**Server:** TX1 Dallas (38.68.14.26)
**Purpose:** Resource allocation planning
**Last Updated:** 2026-02-18
---
## TX1 Server Specifications
**CPU:** 32 vCPU
**RAM:** 256GB
**Storage:** 1TB NVMe SSD
**Location:** Dallas, TX
**Network:** 1Gbps
**Current Usage (before AI stack):**
- Game servers: 6 Minecraft instances
- Management services: Minimal overhead
- Available for AI: most of the 256GB remains free (see RAM scenarios below)
---
## Storage Requirements
### Component Breakdown
| Component | Size | Purpose |
|-----------|------|---------|
| **Qwen 2.5 Coder 72B** | ~40GB | Infrastructure/coding model |
| **Llama 3.3 70B** | ~40GB | General reasoning model |
| **Llama 3.2 Vision 11B** | ~7GB | Image analysis model |
| **Dify Services** | ~5GB | Docker containers, databases |
| **Knowledge Base** | ~5GB | Indexed docs, embeddings |
| **Logs & Temp** | ~2GB | Operational overhead |
| **Total** | **~99GB** | ✅ Well under 1TB limit |
### Storage Growth Estimate
**Year 1:**
- Models: 87GB (static, no growth unless upgrading)
- Knowledge base: 5GB → 8GB (as docs grow)
- Logs: 2GB → 5GB (6 months rotation)
- **Total Year 1:** ~100GB
**Storage is NOT a concern.**
---
## RAM Requirements
### Scenario 1: Normal Operations (Claude Available)
| Component | RAM Usage |
|-----------|-----------|
| **Dify Services** | ~4GB |
| **PostgreSQL** | ~2GB |
| **Redis** | ~1GB |
| **Ollama (idle)** | <1GB |
| **Total (idle)** | **~8GB** ✅ |
**Game servers have ~248GB available** (256GB - 8GB)
---
### Scenario 2: DERP Activated (Claude Down, Emergency)
**Load ONE large model at a time:**

| Component | RAM Usage |
|-----------|-----------|
| **Qwen 2.5 Coder 72B** OR **Llama 3.3 70B** | ~80GB |
| **Dify Services** | ~4GB |
| **PostgreSQL** | ~2GB |
| **Redis** | ~1GB |
| **Ollama Runtime** | ~2GB |
| **OS Overhead** | ~3GB |
| **Total (active DERP)** | **~92GB** ✅ |
**Game servers have ~164GB available** (256GB - 92GB)
**Critical:** DO NOT load both large models simultaneously (160GB would impact game servers)
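A small guard can enforce this rule before loading a model. This is a sketch, not part of the deployment plan: it assumes `ollama ps` prints the names of resident models, and it takes that output as an argument so the check itself is easy to test.

```shell
#!/usr/bin/env sh
# Refuse to load a second ~80GB model while one is already resident.
# The two model tags are this document's large models.
LARGE_MODELS="qwen2.5-coder:72b llama3.3:70b"

large_model_loaded() {
  loaded_list="$1"    # expected: output of `ollama ps`
  for m in $LARGE_MODELS; do
    case "$loaded_list" in
      *"$m"*) return 0 ;;   # a big model is already in RAM
    esac
  done
  return 1
}

# Usage sketch:
#   if large_model_loaded "$(ollama ps)"; then
#     echo "Unload the resident model first: ollama stop <model>" >&2
#   else
#     ollama run llama3.3:70b
#   fi
```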
---
### Scenario 3: Vision Model Only (Screenshot Analysis)
| Component | RAM Usage |
|-----------|-----------|
| **Llama 3.2 Vision 11B** | ~7GB |
| **Dify Services** | ~4GB |
| **Other Services** | ~3GB |
| **Total** | **~14GB** ✅ |
**Very lightweight, can run alongside game servers with no impact**
---
## CPU Requirements
### Model Inference Performance
**TX1 has 32 vCPU (shared among all services)**
**Expected Inference Times:**

| Model | Token Generation Speed | Typical Response |
|-------|----------------------|------------------|
| **Qwen 2.5 Coder 72B** | ~3-5 tokens/second | 30-120 seconds |
| **Llama 3.3 70B** | ~3-5 tokens/second | 30-120 seconds |
| **Llama 3.2 Vision 11B** | ~8-12 tokens/second | 10-45 seconds |
**For comparison:**
- Claude API: 20-40 tokens/second
- **DERP is 5-10× slower** (this is expected and acceptable for emergency use)
**CPU Impact on Game Servers:**
- During DERP inference: ~70-80% CPU usage (temporary spikes)
- Game servers may experience brief lag during AI responses
- **Acceptable for emergency use** (not for normal operations)
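One way to keep inference spikes from starving the game servers is a CPU quota on the Ollama service. The drop-in below is a hypothetical sketch: it assumes Ollama runs as a systemd service named `ollama`, and the 24-core cap (2400%) is a suggested value, not one taken from this plan.

```ini
# /etc/systemd/system/ollama.service.d/cpu-cap.conf (hypothetical path)
# Cap inference at 24 of TX1's 32 vCPUs so game servers keep headroom.
[Service]
CPUQuota=2400%
```

Apply with `systemctl daemon-reload && systemctl restart ollama`.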
---
## Network Requirements
### Initial Model Downloads (One-Time)
| Model | Size | Download Time (1Gbps) |
|-------|------|----------------------|
| **Qwen 2.5 Coder 72B** | ~40GB | 5-10 minutes |
| **Llama 3.3 70B** | ~40GB | 5-10 minutes |
| **Llama 3.2 Vision 11B** | ~7GB | 1-2 minutes |
| **Total** | **~87GB** | **15-25 minutes** |
**Reality:** Download speeds vary, budget 2-4 hours for all models.
**Recommendation:** Download overnight to avoid impacting game server traffic.
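The per-model figures above are straight size/bandwidth arithmetic. A quick sketch of the calculation (nominal 1Gbps line rate, ignoring the real-world slowdowns noted above):

```shell
#!/usr/bin/env sh
# Transfer time in whole minutes: GB -> gigabits (x8), divide by link
# speed in Gbps, then by 60. Integer math floors the result.
estimate_minutes() {
  size_gb=$1
  link_gbps=$2
  echo $(( size_gb * 8 / link_gbps / 60 ))
}

estimate_minutes 40 1   # one 72B/70B model: ~5 minutes at full line rate
estimate_minutes 87 1   # all three models back to back
```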
---
### Ongoing Bandwidth
**Dify Web Interface:**
- Minimal (text-based queries)
- ~1-5 KB per query
- Negligible impact
**Discord Bot:**
- Text-based queries only
- ~1-5 KB per query
- Negligible impact
**Model Updates:**
- Infrequent (quarterly at most)
- Same as initial download (~87GB)
- Schedule during low-traffic periods
---
## Resource Allocation Strategy
### Priority Levels
**Priority 1 (Always):** Game Servers
**Priority 2 (Normal):** Management Services (Pterodactyl, Gitea, etc.)
**Priority 3 (Emergency Only):** DERP AI Stack
### RAM Allocation Rules
**Normal Operations:**
- Game servers: Up to 240GB
- Management: ~8GB
- AI Stack (idle): ~8GB
- **Total: 256GB** ✅
**DERP Emergency:**
- Game servers: Temporarily limited to 160GB
- Management: ~8GB
- AI Stack (active): ~92GB
- **Total: 260GB** ⚠️ (4GB overcommit acceptable for brief periods)
**If RAM pressure occurs during DERP:**
1. Unload one game server temporarily
2. Run AI query
3. Reload game server
4. **Total downtime per query: <5 minutes**
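The steps above can be sketched as a single wrapper. The game-server start/stop calls are echo-stub placeholders; on this stack they would go through the Pterodactyl panel or API, which this document does not specify, and the real `ollama run` line is left commented out.

```shell
#!/usr/bin/env sh
# Unload -> query -> reload cycle for a DERP query under RAM pressure.
# stop/start are placeholder stubs; replace with the panel's real calls.
stop_game_server()  { echo "stub: stopping game server '$1'"; }
start_game_server() { echo "stub: restarting game server '$1'"; }

derp_query() {
  prompt="$1"
  stop_game_server smallest              # 1. free RAM temporarily
  # ollama run llama3.3:70b "$prompt"    # 2. run the actual query
  echo "stub: query would run here"
  start_game_server smallest             # 3. bring the server back
}

derp_query "why is TX1 swapping?"
```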
---
## Monitoring & Alerts
### Critical Thresholds
**RAM Usage:**
- **Warning:** >220GB used (86%)
- **Critical:** >240GB used (94%)
- **Action:** Defer DERP usage or unload game server
**CPU Usage:**
- **Warning:** >80% sustained for >5 minutes
- **Critical:** >90% sustained for >2 minutes
- **Action:** Pause AI inference, prioritize game servers
**Storage:**
- **Warning:** >800GB used (80%)
- **Critical:** >900GB used (90%)
- **Action:** Clean up old logs, model cache
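The RAM thresholds above can be wired into a simple check. A minimal sketch; feeding it from `free -g` (shown in the comment) assumes a Linux host:

```shell
#!/usr/bin/env sh
# Classify used RAM (in GB) against this document's 220/240 thresholds.
ram_status() {
  used_gb=$1
  if   [ "$used_gb" -gt 240 ]; then echo "CRITICAL"
  elif [ "$used_gb" -gt 220 ]; then echo "WARNING"
  else echo "OK"
  fi
}

# Live usage on Linux: ram_status "$(free -g | awk '/^Mem:/ {print $3}')"
ram_status 200   # OK: plenty of headroom
ram_status 230   # WARNING: defer DERP usage
```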
### Monitoring Commands
```bash
# Check RAM
free -h
# Check CPU
htop
# Check storage
df -h /
# Check Ollama status
ollama list
ollama ps # Shows loaded models
# Check Dify
cd /opt/dify
docker-compose ps
docker stats # Real-time resource usage
```
---
## Resource Optimization
### Unload Models When Not Needed
```bash
# Unload all models (frees RAM)
ollama stop qwen2.5-coder:72b
ollama stop llama3.3:70b
ollama stop llama3.2-vision:11b
# Verify RAM freed
free -h
```
### Preload Models for Faster Response
```bash
# Preload model (takes ~30 seconds)
ollama run qwen2.5-coder:72b ""
# Model now in RAM, queries will be faster
```
### Schedule Maintenance Windows
**Best time for model downloads/updates:**
- Tuesday/Wednesday 2-6 AM CST (lowest traffic)
- Announce in Discord 24 hours ahead
- Expected downtime: <10 minutes
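A hypothetical crontab entry for that window (TX1's clock assumed to be UTC, so 2 AM CST = 08:00 UTC; adjust if the host runs local time):

```cron
# m h dom mon dow  command — Tuesdays 2 AM CST: refresh both large models
0 8 * * 2  ollama pull qwen2.5-coder:72b && ollama pull llama3.3:70b
```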
---
## Capacity Planning
### Current State (Feb 2026)
- **Game servers:** 6 active
- **RAM available:** 256GB
- **Storage available:** 1TB
- **AI stack:** Fits comfortably
### Growth Scenarios
**Scenario 1: Add 6 more game servers (12 total)**
- Additional RAM needed: ~60GB
- Available for AI (normal): 248GB → 188GB ✅
- Available for AI (DERP): 164GB → 104GB ✅
- **Status:** Still viable
**Scenario 2: Add 12 more game servers (18 total)**
- Additional RAM needed: ~120GB
- Available for AI (normal): 248GB → 128GB ✅
- Available for AI (DERP): 164GB → 44GB ⚠️
- **Status:** DERP would require unloading 2 game servers
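The arithmetic behind Scenarios 1-2 can be checked in one line: each added Minecraft server is budgeted ~10GB (the 60GB-per-6-servers rate from Scenario 1), subtracted from the 164GB free during active DERP:

```shell
#!/usr/bin/env sh
# RAM left during active DERP after adding N servers beyond today's six,
# at ~10GB per server (the budget implied by Scenario 1's 60GB / 6).
derp_slack() {
  extra_servers=$1
  echo $(( 164 - extra_servers * 10 ))
}

derp_slack 6    # 12 servers total -> 104 GB, matches Scenario 1
derp_slack 12   # 18 servers total -> 44 GB, Scenario 2's tight margin
```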
**Scenario 3: Upgrade to larger models (theoretical)**
- Qwen 3.0 Coder 170B: ~180GB RAM
- **Status:** Would NOT fit alongside game servers
- **Recommendation:** Stick with 72B models
### Upgrade Path
**If TX1 reaches capacity:**
**Option A: Add second dedicated AI server**
- Move AI stack to separate VPS
- TX1 focuses only on game servers
- Cost: ~$100-200/month (NOT DERP-compliant)
**Option B: Upgrade TX1 RAM**
- 256GB → 512GB
- Cost: Contact Hetzner for pricing
- **Preferred:** Maintains DERP compliance
**Option C: Use smaller AI models**
- Qwen 2.5 Coder 32B (~35GB RAM)
- Llama 3.1 8B (~8GB RAM)
- **Tradeoff:** Lower quality, but more capacity
---
## Disaster Recovery
### Backup Strategy
**What to backup:**
- ✅ Dify configuration files
- ✅ Knowledge base data
- ✅ Discord bot code
- ❌ Models (can re-download)
**Backup location:**
- Git repository (for configs/code)
- NC1 Charlotte (for knowledge base)
**Backup frequency:**
- Configurations: After every change
- Knowledge base: Weekly
- Models: No backup needed
### Recovery Procedure
**If TX1 fails completely:**
1. Deploy Dify on NC1 (temporary)
2. Restore knowledge base from backup
3. Re-download models (~4 hours)
4. Point Discord bot to NC1
5. **Downtime: 4-6 hours**
**Note:** This is acceptable for DERP (emergency-only system)
---
## Cost Analysis
### One-Time Costs
- Setup time: 6-8 hours (Michael's time)
- Model downloads: Bandwidth usage (included in hosting)
- **Total: $0** (sweat equity only)
### Monthly Costs
- Hosting: $0 (using existing TX1)
- Bandwidth: $0 (included in hosting)
- Maintenance: ~1 hour/month (Michael's time)
- **Total: $0/month** ✅
### Opportunity Cost
- RAM reserved for AI: ~8GB (idle) or ~92GB (active DERP)
- Could host 1-2 more game servers in that space
- **Acceptable tradeoff:** DERP independence worth more than 2 game servers
---
**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️
**TX1 has the capacity. Resources are allocated wisely. $0 monthly cost maintained.**