firefrost-operations-manual/docs/tasks/_archive/self-hosted-ai-stack-on-tx1/resource-requirements.md

AI Stack Resource Requirements

Server: TX1 Dallas (38.68.14.26)
Purpose: Resource allocation planning
Last Updated: 2026-02-18


TX1 Server Specifications

CPU: 32 vCPU
RAM: 256GB
Storage: 1TB NVMe SSD
Location: Dallas, TX
Network: 1Gbps

Current Usage (before AI stack):

  • Game servers: 6 Minecraft instances
  • Management services: Minimal overhead
  • Available for AI: substantial headroom (the 6 game servers use roughly 60GB of 256GB RAM)

Storage Requirements

Component Breakdown

Component              Size    Purpose
Qwen 2.5 Coder 72B     ~40GB   Infrastructure/coding model
Llama 3.3 70B          ~40GB   General reasoning model
Llama 3.2 Vision 11B   ~7GB    Image analysis model
Dify Services          ~5GB    Docker containers, databases
Knowledge Base         ~5GB    Indexed docs, embeddings
Logs & Temp            ~2GB    Operational overhead
Total                  ~99GB   Well under 1TB limit

Storage Growth Estimate

Year 1:

  • Models: 87GB (static, no growth unless upgrading)
  • Knowledge base: 5GB → 8GB (as docs grow)
  • Logs: 2GB → 5GB (6 months rotation)
  • Total Year 1: ~100GB

Storage is NOT a concern.
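
A quick way to sanity-check that claim on the box itself (a sketch; the parsing assumes POSIX `df -P` output, and the 80% figure matches the storage warning threshold in the Monitoring & Alerts section):

```shell
# Report root-filesystem usage as a percentage and compare it against
# the ~80% storage warning threshold.
root_used_pct() {
  df -P / | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

pct=$(root_used_pct)
if [ "$pct" -lt 80 ]; then
  echo "storage OK (${pct}% used)"
else
  echo "storage WARNING (${pct}% used)"
fi
```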


RAM Requirements

Scenario 1: Normal Operations (Claude Available)

Component       RAM Usage
Dify Services   ~4GB
PostgreSQL      ~2GB
Redis           ~1GB
Ollama (idle)   <1GB
Total (idle)    ~8GB

Game servers have ~248GB available (256GB - 8GB)


Scenario 2: DERP Activated (Claude Down, Emergency)

Load ONE large model at a time:

Component                             RAM Usage
Qwen 2.5 Coder 72B OR Llama 3.3 70B   ~80GB
Dify Services                         ~4GB
PostgreSQL                            ~2GB
Redis                                 ~1GB
Ollama Runtime                        ~2GB
OS Overhead                           ~3GB
Total (active DERP)                   ~92GB

Game servers have ~164GB available (256GB - 92GB)

Critical: DO NOT load both large models simultaneously (160GB would impact game servers)
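
A small guard sketch can enforce that rule before loading a second model. It greps `ollama ps` for the two large models named above; the exact tag strings are assumptions about how the models were pulled:

```shell
# Count how many 70B-class models are currently loaded.
# `ollama ps` lists running models; if the command is unavailable
# or nothing is loaded, this counts zero.
large_loaded() {
  ollama ps 2>/dev/null | grep -Ec 'qwen2.5-coder:72b|llama3.3:70b'
}

if [ "$(large_loaded)" -gt 0 ]; then
  echo "Refusing to load a second large model: one is already in RAM." >&2
fi
```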


Scenario 3: Vision Model Only (Screenshot Analysis)

Component              RAM Usage
Llama 3.2 Vision 11B   ~7GB
Dify Services          ~4GB
Other Services         ~3GB
Total                  ~14GB

Very lightweight, can run alongside game servers with no impact


CPU Requirements

Model Inference Performance

TX1 has 32 vCPU (shared among all services)

Expected Inference Times:

Model                  Token Generation Speed   Typical Response
Qwen 2.5 Coder 72B     ~3-5 tokens/second       30-120 seconds
Llama 3.3 70B          ~3-5 tokens/second       30-120 seconds
Llama 3.2 Vision 11B   ~8-12 tokens/second      10-45 seconds

For comparison:

  • Claude API: 20-40 tokens/second
  • DERP is 5-10× slower (this is expected and acceptable for emergency use)
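
The typical-response column follows from simple arithmetic (response seconds ≈ output tokens ÷ tokens/second); a throwaway sketch, assuming a ~400-token answer:

```shell
# seconds = tokens / tokens_per_second (integer math is close enough here)
response_seconds() {
  echo $(( $1 / $2 ))
}

response_seconds 400 4    # 72B class at ~4 tok/s  -> 100 seconds
response_seconds 400 10   # 11B vision at ~10 tok/s -> 40 seconds
```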

CPU Impact on Game Servers:

  • During DERP inference: ~70-80% CPU usage (temporary spikes)
  • Game servers may experience brief lag during AI responses
  • Acceptable for emergency use (not for normal operations)

Network Requirements

Initial Model Downloads (One-Time)

Model                  Size    Download Time (1Gbps)
Qwen 2.5 Coder 72B     ~40GB   5-10 minutes
Llama 3.3 70B          ~40GB   5-10 minutes
Llama 3.2 Vision 11B   ~7GB    1-2 minutes
Total                  ~87GB   15-25 minutes

Reality: real-world download speeds vary widely; budget 2-4 hours for all models.

Recommendation: Download overnight to avoid impacting game server traffic.
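
The table's estimates come from minutes ≈ GB × 8 ÷ Gbps ÷ 60. A quick sketch (theoretical line rate only, which is why the 2-4 hour real-world budget applies):

```shell
# minutes = GB * 8 bits/byte / Gbps / 60; pure integer arithmetic
download_minutes() {
  gb=$1; gbps=$2
  echo $(( gb * 8 / gbps / 60 ))
}

download_minutes 40 1   # one 40GB model at a theoretical 1Gbps -> 5 minutes
download_minutes 87 1   # full 87GB model set                   -> 11 minutes
```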


Ongoing Bandwidth

Dify Web Interface:

  • Minimal (text-based queries)
  • ~1-5 KB per query
  • Negligible impact

Discord Bot:

  • Text-based queries only
  • ~1-5 KB per query
  • Negligible impact

Model Updates:

  • Infrequent (quarterly at most)
  • Same as initial download (~87GB)
  • Schedule during low-traffic periods

Resource Allocation Strategy

Priority Levels

Priority 1 (Always): Game Servers
Priority 2 (Normal): Management Services (Pterodactyl, Gitea, etc.)
Priority 3 (Emergency Only): DERP AI Stack

RAM Allocation Rules

Normal Operations:

  • Game servers: Up to 240GB
  • Management: ~8GB
  • AI Stack (idle): ~8GB
  • Total: 256GB

DERP Emergency:

  • Game servers: Temporarily limited to 160GB
  • Management: ~8GB
  • AI Stack (active): ~92GB
  • Total: 260GB ⚠️ (4GB overcommit acceptable for brief periods)

If RAM pressure occurs during DERP:

  1. Unload one game server temporarily
  2. Run AI query
  3. Reload game server
  4. Total downtime per query: <5 minutes
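
Steps 1-3 above can be sketched as a wrapper function. Every command name here is a placeholder, not a real CLI: substitute the actual panel stop/start commands and the real model query.

```shell
# Hypothetical wrapper around the unload -> query -> reload cycle.
# STOP_CMD / AI_CMD / START_CMD are injected so the cycle can be
# dry-run with echo stubs before wiring in real commands.
derp_query() {
  prompt=$1
  $STOP_CMD            # 1. unload one game server temporarily
  $AI_CMD "$prompt"    # 2. run the AI query
  $START_CMD           # 3. reload the game server
}

# Dry-run illustration with echo stubs:
STOP_CMD="echo stopping server"
AI_CMD="echo querying:"
START_CMD="echo restarting server"
derp_query "why is the panel down?"
```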

Monitoring & Alerts

Critical Thresholds

RAM Usage:

  • Warning: >220GB used (85%)
  • Critical: >240GB used (93%)
  • Action: Defer DERP usage or unload game server

CPU Usage:

  • Warning: >80% sustained for >5 minutes
  • Critical: >90% sustained for >2 minutes
  • Action: Pause AI inference, prioritize game servers

Storage:

  • Warning: >800GB used (80%)
  • Critical: >900GB used (90%)
  • Action: Clean up old logs, model cache
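
The RAM thresholds above can be expressed as a small check suitable for cron (a sketch; the /proc/meminfo parsing assumes Linux, and the alerting wiring is left out):

```shell
# Map used-RAM GB onto the warning/critical levels defined above.
ram_status() {
  used_gb=$1
  if   [ "$used_gb" -gt 240 ]; then echo "CRITICAL"
  elif [ "$used_gb" -gt 220 ]; then echo "WARNING"
  else                              echo "OK"
  fi
}

# Live usage in GB: MemTotal - MemAvailable, converted from kB
used=$(awk '/MemTotal/ {t=$2} /MemAvailable/ {a=$2} END {print int((t-a)/1048576)}' /proc/meminfo)
echo "RAM: ${used}GB used -> $(ram_status "$used")"
```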

Monitoring Commands

# Check RAM
free -h

# Check CPU
htop

# Check storage
df -h /

# Check Ollama status
ollama list
ollama ps  # Shows loaded models

# Check Dify
cd /opt/dify
docker-compose ps
docker stats  # Real-time resource usage

Resource Optimization

Unload Models When Not Needed

# Unload all models (frees RAM)
ollama stop qwen2.5-coder:72b
ollama stop llama3.3:70b
ollama stop llama3.2-vision:11b

# Verify RAM freed
free -h

Preload Models for Faster Response

# Preload model (takes ~30 seconds)
ollama run qwen2.5-coder:72b ""
# Model now in RAM, queries will be faster
# Note: Ollama unloads idle models after ~5 minutes by default; set the
# OLLAMA_KEEP_ALIVE environment variable to keep the model resident longer

Schedule Maintenance Windows

Best time for model downloads/updates:

  • Tuesday/Wednesday 2-6 AM CST (lowest traffic)
  • Announce in Discord 24 hours ahead
  • Expected downtime: <10 minutes

Capacity Planning

Current State (Feb 2026)

  • Game servers: 6 active
  • RAM available: 256GB
  • Storage available: 1TB
  • AI stack: Fits comfortably

Growth Scenarios

Scenario 1: Add 6 more game servers (12 total)

  • Additional RAM needed: ~60GB
  • Available for AI (normal): 248GB → 188GB
  • Available for AI (DERP): 164GB → 104GB
  • Status: Still viable

Scenario 2: Add 12 more game servers (18 total)

  • Additional RAM needed: ~120GB
  • Available for AI (normal): 248GB → 128GB
  • Available for AI (DERP): 164GB → 44GB ⚠️
  • Status: DERP would require unloading 2 game servers
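
The scenario numbers follow from ~10GB RAM per additional game server (per the 60GB-for-6 figure above), subtracted from the 164GB DERP-mode baseline; a quick check:

```shell
# DERP-mode headroom after adding N game servers:
# 164GB baseline minus ~10GB per additional server
derp_headroom() {
  echo $(( 164 - $1 * 10 ))
}

derp_headroom 6    # 12 servers total -> 104GB
derp_headroom 12   # 18 servers total -> 44GB
```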

Scenario 3: Upgrade to larger models (theoretical)

  • Qwen 3.0 Coder 170B: ~180GB RAM
  • Status: Would NOT fit alongside game servers
  • Recommendation: Stick with 72B models

Upgrade Path

If TX1 reaches capacity:

Option A: Add second dedicated AI server

  • Move AI stack to separate VPS
  • TX1 focuses only on game servers
  • Cost: ~$100-200/month (NOT DERP-compliant)

Option B: Upgrade TX1 RAM

  • 256GB → 512GB
  • Cost: Contact Hetzner for pricing
  • Preferred: Maintains DERP compliance

Option C: Use smaller AI models

  • Qwen 2.5 Coder 32B (~35GB RAM)
  • Llama 3.2 8B (~8GB RAM)
  • Tradeoff: Lower quality, but more capacity

Disaster Recovery

Backup Strategy

What to backup:

  • Dify configuration files
  • Knowledge base data
  • Discord bot code
  • Models (can re-download)

Backup location:

  • Git repository (for configs/code)
  • NC1 Charlotte (for knowledge base)

Backup frequency:

  • Configurations: After every change
  • Knowledge base: Weekly
  • Models: No backup needed

Recovery Procedure

If TX1 fails completely:

  1. Deploy Dify on NC1 (temporary)
  2. Restore knowledge base from backup
  3. Re-download models (~4 hours)
  4. Point Discord bot to NC1
  5. Downtime: 4-6 hours

Note: This is acceptable for DERP (emergency-only system)


Cost Analysis

One-Time Costs

  • Setup time: 6-8 hours (Michael's time)
  • Model downloads: Bandwidth usage (included in hosting)
  • Total: $0 (sweat equity only)

Monthly Costs

  • Hosting: $0 (using existing TX1)
  • Bandwidth: $0 (included in hosting)
  • Maintenance: ~1 hour/month (Michael's time)
  • Total: $0/month

Opportunity Cost

  • RAM reserved for AI: ~8GB (idle) or ~92GB (active DERP)
  • Could host 1-2 more game servers in that space
  • Acceptable tradeoff: DERP independence worth more than 2 game servers

Fire + Frost + Foundation + DERP = True Independence 💙🔥❄️

TX1 has the capacity. Resources are allocated wisely. $0 monthly cost maintained.