# AI Stack Resource Requirements

**Server:** TX1 Dallas (38.68.14.26)
**Purpose:** Resource allocation planning
**Last Updated:** 2026-02-18

---

## TX1 Server Specifications

**CPU:** 32 vCPU
**RAM:** 256GB
**Storage:** 1TB NVMe SSD
**Location:** Dallas, TX
**Network:** 1Gbps

**Current Usage (before AI stack):**
- Game servers: 6 Minecraft instances
- Management services: minimal overhead
- Available for AI: significant capacity

---

## Storage Requirements

### Component Breakdown

| Component | Size | Purpose |
|-----------|------|---------|
| **Qwen 2.5 Coder 72B** | ~40GB | Infrastructure/coding model |
| **Llama 3.3 70B** | ~40GB | General reasoning model |
| **Llama 3.2 Vision 11B** | ~7GB | Image analysis model |
| **Dify Services** | ~5GB | Docker containers, databases |
| **Knowledge Base** | ~5GB | Indexed docs, embeddings |
| **Logs & Temp** | ~2GB | Operational overhead |
| **Total** | **~99GB** | ✅ Well under 1TB limit |

### Storage Growth Estimate

**Year 1:**
- Models: 87GB (static; no growth unless upgrading)
- Knowledge base: 5GB → 8GB (as docs grow)
- Logs: 2GB → 5GB (6-month rotation)
- **Total Year 1:** ~100GB

**Storage is NOT a concern.**

---

## RAM Requirements

### Scenario 1: Normal Operations (Claude Available)

| Component | RAM Usage |
|-----------|-----------|
| **Dify Services** | ~4GB |
| **PostgreSQL** | ~2GB |
| **Redis** | ~1GB |
| **Ollama (idle)** | <1GB |
| **Total (idle)** | **~8GB** ✅ |

**Game servers have ~248GB available** (256GB - 8GB)

---

### Scenario 2: DERP Activated (Claude Down, Emergency)

**Load ONE large model at a time:**

| Component | RAM Usage |
|-----------|-----------|
| **Qwen 2.5 Coder 72B** OR **Llama 3.3 70B** | ~80GB |
| **Dify Services** | ~4GB |
| **PostgreSQL** | ~2GB |
| **Redis** | ~1GB |
| **Ollama Runtime** | ~2GB |
| **OS Overhead** | ~3GB |
| **Total (active DERP)** | **~92GB** ✅ |

**Game servers have ~164GB available** (256GB - 92GB)

**Critical:** DO NOT load both large models simultaneously (~160GB combined would impact game servers).

---

### Scenario 3: Vision Model Only (Screenshot Analysis)

| Component | RAM Usage |
|-----------|-----------|
| **Llama 3.2 Vision 11B** | ~7GB |
| **Dify Services** | ~4GB |
| **Other Services** | ~3GB |
| **Total** | **~14GB** ✅ |

**Very lightweight; can run alongside game servers with no impact.**

---

## CPU Requirements

### Model Inference Performance

**TX1 has 32 vCPU (shared among all services).**

**Expected Inference Times:**

| Model | Token Generation Speed | Typical Response |
|-------|------------------------|------------------|
| **Qwen 2.5 Coder 72B** | ~3-5 tokens/second | 30-120 seconds |
| **Llama 3.3 70B** | ~3-5 tokens/second | 30-120 seconds |
| **Llama 3.2 Vision 11B** | ~8-12 tokens/second | 10-45 seconds |

**For comparison:**
- Claude API: 20-40 tokens/second
- **DERP is 5-10× slower** (expected and acceptable for emergency use)

**CPU Impact on Game Servers:**
- During DERP inference: ~70-80% CPU usage (temporary spikes)
- Game servers may experience brief lag during AI responses
- **Acceptable for emergency use** (not for normal operations)

---

## Network Requirements

### Initial Model Downloads (One-Time)

| Model | Size | Download Time (1Gbps) |
|-------|------|-----------------------|
| **Qwen 2.5 Coder 72B** | ~40GB | 5-10 minutes |
| **Llama 3.3 70B** | ~40GB | 5-10 minutes |
| **Llama 3.2 Vision 11B** | ~7GB | 1-2 minutes |
| **Total** | **~87GB** | **15-25 minutes** |

**Reality:** Download speeds vary; budget 2-4 hours for all models.

**Recommendation:** Download overnight to avoid impacting game server traffic.
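The line-rate figures in the download table can be sanity-checked with quick shell arithmetic. This is a minimal sketch assuming 1 Gbps ≈ 125 MB/s with no protocol overhead; model names and sizes come from the table above:

```bash
#!/usr/bin/env bash
# Sanity-check the theoretical 1Gbps download times quoted above.
# Assumption: 1 Gbps ~= 125 MB/s raw, ignoring protocol overhead.
rate_mb_s=125

for entry in "qwen2.5-coder:72b 40" "llama3.3:70b 40" "llama3.2-vision:11b 7"; do
  set -- $entry   # intentional word-splitting: $1 = model, $2 = size in GB
  model=$1
  size_gb=$2
  secs=$(( size_gb * 1024 / rate_mb_s ))
  printf '%-22s %2d GB ~= %d min %d s at line rate\n' \
    "$model" "$size_gb" $((secs / 60)) $((secs % 60))
done
```

This puts the theoretical floor at roughly 5.5 minutes per 40GB model and under a minute for the vision model, consistent with the 5-10 and 1-2 minute estimates; real mirrors rarely sustain line rate, which is why the "budget 2-4 hours" caveat stands.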
---

### Ongoing Bandwidth

**Dify Web Interface:**
- Minimal (text-based queries)
- ~1-5 KB per query
- Negligible impact

**Discord Bot:**
- Text-based queries only
- ~1-5 KB per query
- Negligible impact

**Model Updates:**
- Infrequent (quarterly at most)
- Same as initial download (~87GB)
- Schedule during low-traffic periods

---

## Resource Allocation Strategy

### Priority Levels

**Priority 1 (Always):** Game Servers
**Priority 2 (Normal):** Management Services (Pterodactyl, Gitea, etc.)
**Priority 3 (Emergency Only):** DERP AI Stack

### RAM Allocation Rules

**Normal Operations:**
- Game servers: up to 240GB
- Management: ~8GB
- AI Stack (idle): ~8GB
- **Total: 256GB** ✅

**DERP Emergency:**
- Game servers: temporarily limited to 160GB
- Management: ~8GB
- AI Stack (active): ~92GB
- **Total: 260GB** ⚠️ (4GB overcommit acceptable for brief periods)

**If RAM pressure occurs during DERP:**
1. Unload one game server temporarily
2. Run AI query
3. Reload game server
4. **Total downtime per query: <5 minutes**

---

## Monitoring & Alerts

### Critical Thresholds

**RAM Usage:**
- **Warning:** >220GB used (85%)
- **Critical:** >240GB used (93%)
- **Action:** Defer DERP usage or unload a game server

**CPU Usage:**
- **Warning:** >80% sustained for >5 minutes
- **Critical:** >90% sustained for >2 minutes
- **Action:** Pause AI inference; prioritize game servers

**Storage:**
- **Warning:** >800GB used (80%)
- **Critical:** >900GB used (90%)
- **Action:** Clean up old logs and model cache

### Monitoring Commands

```bash
# Check RAM
free -h

# Check CPU
htop

# Check storage
df -h /

# Check Ollama status
ollama list
ollama ps  # Shows loaded models

# Check Dify
cd /opt/dify
docker-compose ps
docker stats  # Real-time resource usage
```

---

## Resource Optimization

### Unload Models When Not Needed

```bash
# Unload all models (frees RAM)
ollama stop qwen2.5-coder:72b
ollama stop llama3.3:70b
ollama stop llama3.2-vision:11b

# Verify RAM freed
free -h
```

### Preload Models for Faster Response

```bash
# Preload model (takes ~30 seconds)
ollama run qwen2.5-coder:72b ""

# Model now in RAM; queries will be faster
```

### Schedule Maintenance Windows

**Best time for model downloads/updates:**
- Tuesday/Wednesday 2-6 AM CST (lowest traffic)
- Announce in Discord 24 hours ahead
- Expected downtime: <10 minutes

---

## Capacity Planning

### Current State (Feb 2026)

- **Game servers:** 6 active
- **RAM available:** 256GB
- **Storage available:** 1TB
- **AI stack:** Fits comfortably

### Growth Scenarios

**Scenario 1: Add 6 more game servers (12 total)**
- Additional RAM needed: ~60GB
- Available for AI (normal): 248GB → 188GB ✅
- Available for AI (DERP): 164GB → 104GB ✅
- **Status:** Still viable

**Scenario 2: Add 12 more game servers (18 total)**
- Additional RAM needed: ~120GB
- Available for AI (normal): 248GB → 128GB ✅
- Available for AI (DERP): 164GB → 44GB ⚠️
- **Status:** DERP would require unloading 2 game servers

**Scenario 3: Upgrade to larger models (theoretical)**
- Qwen 3.0 Coder 170B: ~180GB RAM
- **Status:** Would NOT fit alongside game servers
- **Recommendation:** Stick with 72B models

### Upgrade Path

**If TX1 reaches capacity:**

**Option A: Add second dedicated AI server**
- Move AI stack to separate VPS
- TX1 focuses only on game servers
- Cost: ~$100-200/month (NOT DERP-compliant)

**Option B: Upgrade TX1 RAM**
- 256GB → 512GB
- Cost: Contact Hetzner for pricing
- **Preferred:** Maintains DERP compliance

**Option C: Use smaller AI models**
- Qwen 2.5 Coder 32B (~35GB RAM)
- Llama 3.2 8B (~8GB RAM)
- **Tradeoff:** Lower quality, but more capacity

---

## Disaster Recovery

### Backup Strategy

**What to back up:**
- ✅ Dify configuration files
- ✅ Knowledge base data
- ✅ Discord bot code
- ❌ Models (can re-download)

**Backup location:**
- Git repository (for configs/code)
- NC1 Charlotte (for knowledge base)

**Backup frequency:**
- Configurations: after every change
- Knowledge base: weekly
- Models: no backup needed

### Recovery Procedure
**If TX1 fails completely:**
1. Deploy Dify on NC1 (temporary)
2. Restore knowledge base from backup
3. Re-download models (~4 hours)
4. Point Discord bot to NC1
5. **Downtime: 4-6 hours**

**Note:** This is acceptable for DERP (an emergency-only system).

---

## Cost Analysis

### One-Time Costs

- Setup time: 6-8 hours (Michael's time)
- Model downloads: bandwidth usage (included in hosting)
- **Total: $0** (sweat equity only)

### Monthly Costs

- Hosting: $0 (using existing TX1)
- Bandwidth: $0 (included in hosting)
- Maintenance: ~1 hour/month (Michael's time)
- **Total: $0/month** ✅

### Opportunity Cost

- RAM reserved for AI: ~8GB (idle) or ~92GB (active DERP)
- Could host 1-2 more game servers in that space
- **Acceptable tradeoff:** DERP independence is worth more than 2 game servers

---

**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️

**TX1 has the capacity. Resources are allocated wisely. $0 monthly cost maintained.**
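As a closing sketch, the RAM thresholds from Monitoring & Alerts can be wired into a cron-driven check. This is a minimal sketch, not production monitoring: the threshold values (220GB warning, 240GB critical) come from this document, and "used" is assumed to mean `MemTotal - MemAvailable` from `/proc/meminfo`:

```bash
#!/usr/bin/env bash
# Minimal RAM-threshold check for the alert levels defined above.
# Assumption: "used" = MemTotal - MemAvailable, read from /proc/meminfo.
warn_gb=220   # Warning threshold (~85% of 256GB)
crit_gb=240   # Critical threshold (~93% of 256GB)

# Read MemTotal and MemAvailable (both in kB) in one awk pass.
read -r total_kb avail_kb < <(
  awk '/^MemTotal/ {t=$2} /^MemAvailable/ {a=$2} END {print t, a}' /proc/meminfo
)
used_gb=$(( (total_kb - avail_kb) / 1024 / 1024 ))

if   [ "$used_gb" -gt "$crit_gb" ]; then
  echo "CRITICAL: ${used_gb}GB used - defer DERP, unload a game server"
elif [ "$used_gb" -gt "$warn_gb" ]; then
  echo "WARNING: ${used_gb}GB used - avoid loading large models"
else
  echo "OK: ${used_gb}GB used"
fi
```

Running it every few minutes from cron (with the echo swapped for a Discord webhook or mail alert) would cover the "Warning/Critical/Action" RAM rows without any extra tooling.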