# AI Stack Resource Requirements

**Server:** TX1 Dallas (38.68.14.26)
**Purpose:** Resource allocation planning
**Last Updated:** 2026-02-18


## TX1 Server Specifications

- **CPU:** 32 vCPU
- **RAM:** 256GB
- **Storage:** 1TB NVMe SSD
- **Location:** Dallas, TX
- **Network:** 1Gbps

**Current usage (before AI stack):**

- Game servers: 6 Minecraft instances
- Management services: minimal overhead
- Available for AI: significant headroom

## Storage Requirements

### Component Breakdown

| Component | Size | Purpose |
|---|---|---|
| Qwen 2.5 Coder 72B | ~40GB | Infrastructure/coding model |
| Llama 3.3 70B | ~40GB | General reasoning model |
| Llama 3.2 Vision 11B | ~7GB | Image analysis model |
| Dify services | ~5GB | Docker containers, databases |
| Knowledge base | ~5GB | Indexed docs, embeddings |
| Logs & temp | ~2GB | Operational overhead |
| **Total** | **~99GB** | Well under the 1TB limit |

### Storage Growth Estimate

**Year 1:**

- Models: 87GB (static; no growth unless upgrading)
- Knowledge base: 5GB → 8GB (as docs grow)
- Logs: 2GB → 5GB (6-month rotation)
- **Total Year 1: ~100GB**

**Storage is not a concern.**


## RAM Requirements

### Scenario 1: Normal Operations (Claude Available)

| Component | RAM Usage |
|---|---|
| Dify services | ~4GB |
| PostgreSQL | ~2GB |
| Redis | ~1GB |
| Ollama (idle) | <1GB |
| **Total (idle)** | **~8GB** |

Game servers keep ~248GB available (256GB - 8GB).


### Scenario 2: DERP Activated (Claude Down, Emergency)

Load only ONE large model at a time:

| Component | RAM Usage |
|---|---|
| Qwen 2.5 Coder 72B OR Llama 3.3 70B | ~80GB |
| Dify services | ~4GB |
| PostgreSQL | ~2GB |
| Redis | ~1GB |
| Ollama runtime | ~2GB |
| OS overhead | ~3GB |
| **Total (active DERP)** | **~92GB** |

Game servers keep ~164GB available (256GB - 92GB).

**Critical:** do NOT load both large models simultaneously (~160GB of model weights would crowd out the game servers).
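A small pre-flight check can enforce this rule before any model load. This is a sketch, not part of Ollama: the 95GB floor (the ~92GB active-DERP footprint plus a margin) and the `free` parsing are assumptions to adapt.

```shell
#!/usr/bin/env bash
# Sketch: refuse to load a 72B model unless there is enough free RAM on TX1.
# 95GB floor = ~92GB active-DERP footprint + a small safety margin.

DERP_RAM_FLOOR_GB=95

can_load_derp_model() {
  local avail_gb=$1    # available RAM in GB, e.g.: free -g | awk '/^Mem:/ {print $7}'
  if [ "$avail_gb" -ge "$DERP_RAM_FLOOR_GB" ]; then
    echo "ok: ${avail_gb}GB free"
  else
    echo "refuse: only ${avail_gb}GB free, need ${DERP_RAM_FLOOR_GB}GB"
    return 1
  fi
}

# Usage on TX1 (load only when the check passes):
#   can_load_derp_model "$(free -g | awk '/^Mem:/ {print $7}')" \
#     && ollama run qwen2.5-coder:72b ""
```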


### Scenario 3: Vision Model Only (Screenshot Analysis)

| Component | RAM Usage |
|---|---|
| Llama 3.2 Vision 11B | ~7GB |
| Dify services | ~4GB |
| Other services | ~3GB |
| **Total** | **~14GB** |

Very lightweight; this can run alongside the game servers with no impact.


## CPU Requirements

### Model Inference Performance

TX1 has 32 vCPU, shared among all services.

**Expected inference times:**

| Model | Token Generation Speed | Typical Response |
|---|---|---|
| Qwen 2.5 Coder 72B | ~3-5 tokens/second | 30-120 seconds |
| Llama 3.3 70B | ~3-5 tokens/second | 30-120 seconds |
| Llama 3.2 Vision 11B | ~8-12 tokens/second | 10-45 seconds |

For comparison:

- Claude API: 20-40 tokens/second
- DERP is roughly 5-10× slower, which is expected and acceptable for emergency use
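The "Typical Response" column follows directly from answer length ÷ generation speed. A quick sketch of the arithmetic (the 400-token answer length is just an illustrative value):

```shell
# seconds ≈ response tokens / generation speed (integer math is close enough here)
estimate_response_seconds() {
  local tokens=$1 tokens_per_second=$2
  echo $(( tokens / tokens_per_second ))
}

estimate_response_seconds 400 4    # → 100  (72B class: inside the 30-120s band)
estimate_response_seconds 400 10   # → 40   (vision model: inside the 10-45s band)
```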

**CPU impact on game servers:**

- During DERP inference: ~70-80% CPU usage (temporary spikes)
- Game servers may experience brief lag during AI responses
- Acceptable for emergency use, not for normal operations
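One way to keep those inference spikes from starving the game servers is to cap Ollama at the OS level. Assuming Ollama runs as the stock `ollama.service` systemd unit (check first), a drop-in like this caps it to 24 of the 32 vCPUs and lowers its scheduling priority; the path and numbers are illustrative:

```ini
# /etc/systemd/system/ollama.service.d/cpu-limits.conf  (illustrative path)
[Service]
# Cap inference at 24 of TX1's 32 vCPUs (100% per vCPU)
CPUQuota=2400%
# Lower scheduling priority so game servers win CPU contention
Nice=10
```

Apply with `systemctl daemon-reload && systemctl restart ollama`. Inference gets a little slower, but the game servers keep their headroom.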

## Network Requirements

### Initial Model Downloads (One-Time)

| Model | Size | Download Time (1Gbps) |
|---|---|---|
| Qwen 2.5 Coder 72B | ~40GB | 5-10 minutes |
| Llama 3.3 70B | ~40GB | 5-10 minutes |
| Llama 3.2 Vision 11B | ~7GB | 1-2 minutes |
| **Total** | **~87GB** | 15-25 minutes |

**Reality:** download speeds vary; budget 2-4 hours for all models.

**Recommendation:** download overnight to avoid impacting game server traffic.


### Ongoing Bandwidth

**Dify web interface:**

- Minimal (text-based queries)
- ~1-5 KB per query
- Negligible impact

**Discord bot:**

- Text-based queries only
- ~1-5 KB per query
- Negligible impact

**Model updates:**

- Infrequent (quarterly at most)
- Same size as the initial download (~87GB)
- Schedule during low-traffic periods

## Resource Allocation Strategy

### Priority Levels

1. **Priority 1 (always):** game servers
2. **Priority 2 (normal):** management services (Pterodactyl, Gitea, etc.)
3. **Priority 3 (emergency only):** DERP AI stack

### RAM Allocation Rules

**Normal operations:**

- Game servers: up to 240GB
- Management: ~8GB
- AI stack (idle): ~8GB
- Total: 256GB

**DERP emergency:**

- Game servers: temporarily limited to 160GB
- Management: ~8GB
- AI stack (active): ~92GB
- Total: 260GB ⚠️ (a 4GB overcommit, acceptable for brief periods)

**If RAM pressure occurs during DERP:**

1. Unload one game server temporarily
2. Run the AI query
3. Reload the game server
4. Total downtime per query: <5 minutes
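The four steps above can be sketched as a wrapper. `game_server_power` is a hypothetical helper standing in for whatever the panel actually exposes (a Pterodactyl API call, `docker stop`, etc.), and the server name in the usage line is made up; adapt both before use.

```shell
#!/usr/bin/env bash
# Sketch: free RAM by stopping one game server, answer the query, restore it.

derp_query_with_headroom() {
  local game_server=$1; shift
  local prompt=$*

  game_server_power "$game_server" stop     # 1. unload one game server
  ollama run qwen2.5-coder:72b "$prompt"    # 2. run the AI query
  game_server_power "$game_server" start    # 3. reload the game server
}

# Usage:  derp_query_with_headroom smp-1 "why is the backup cron failing?"
```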

## Monitoring & Alerts

### Critical Thresholds

**RAM usage:**

- Warning: >220GB used (85%)
- Critical: >240GB used (93%)
- Action: defer DERP usage or unload a game server

**CPU usage:**

- Warning: >80% sustained for >5 minutes
- Critical: >90% sustained for >2 minutes
- Action: pause AI inference, prioritize game servers

**Storage:**

- Warning: >800GB used (80%)
- Critical: >900GB used (90%)
- Action: clean up old logs and model cache
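The RAM thresholds above, expressed as a small check that could feed an alert. This is a sketch (values in GB); the `free` parsing in the comment is the usual way to get live usage.

```shell
# Map used RAM (GB) to the alert levels defined above.
ram_status() {
  local used_gb=$1    # live value: free -g | awk '/^Mem:/ {print $3}'
  if   [ "$used_gb" -gt 240 ]; then echo "CRITICAL"
  elif [ "$used_gb" -gt 220 ]; then echo "WARNING"
  else                              echo "OK"
  fi
}

ram_status 98    # → OK        (normal ops + active DERP)
ram_status 230   # → WARNING   (defer DERP usage)
ram_status 245   # → CRITICAL  (unload a game server)
```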

### Monitoring Commands

```bash
# Check RAM
free -h

# Check CPU
htop

# Check storage
df -h /

# Check Ollama status
ollama list
ollama ps  # shows loaded models

# Check Dify
cd /opt/dify
docker-compose ps
docker stats  # real-time resource usage
```

## Resource Optimization

### Unload Models When Not Needed

```bash
# Unload all models (frees RAM)
ollama stop qwen2.5-coder:72b
ollama stop llama3.3:70b
ollama stop llama3.2-vision:11b

# Verify RAM was freed
free -h
```

### Preload Models for Faster Response

```bash
# Preload a model (takes ~30 seconds); the empty prompt loads it into RAM
ollama run qwen2.5-coder:72b ""
# The model is now resident, so subsequent queries respond faster
```

### Schedule Maintenance Windows

Best time for model downloads/updates:

- Tuesday/Wednesday 2-6 AM CST (lowest traffic)
- Announce in Discord 24 hours ahead
- Expected downtime: <10 minutes
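That window can be automated with a temporary crontab entry like the sketch below (model tag and log path are illustrative). Since updates are quarterly at most, add the entry only for an announced window and remove it once the update lands:

```
# crontab fragment (sketch): pull the refreshed model at 2 AM Wednesday
# during an announced maintenance window; remove afterwards.
0 2 * * 3  ollama pull qwen2.5-coder:72b >> /var/log/ollama-pull.log 2>&1
```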

## Capacity Planning

### Current State (Feb 2026)

- Game servers: 6 active
- Total RAM: 256GB
- Total storage: 1TB
- AI stack: fits comfortably

### Growth Scenarios

**Scenario 1: add 6 more game servers (12 total)**

- Additional RAM needed: ~60GB
- Remaining headroom (normal): 248GB → 188GB
- Remaining headroom (DERP active): 164GB → 104GB
- Status: still viable

**Scenario 2: add 12 more game servers (18 total)**

- Additional RAM needed: ~120GB
- Remaining headroom (normal): 248GB → 128GB
- Remaining headroom (DERP active): 164GB → 44GB ⚠️
- Status: DERP would require temporarily unloading 2 game servers

**Scenario 3: upgrade to larger models (theoretical)**

- Qwen 3.0 Coder 170B: ~180GB RAM
- Status: would NOT fit alongside game servers
- Recommendation: stick with the 72B models
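The scenario arithmetic above reduces to one line: headroom = 256GB total − 92GB active DERP − additional game-server RAM. As a quick sketch:

```shell
# RAM left over during an active DERP session, given extra game-server RAM in GB.
derp_headroom_gb() {
  local total_gb=256 derp_active_gb=92
  local extra_game_gb=$1
  echo $(( total_gb - derp_active_gb - extra_game_gb ))
}

derp_headroom_gb 0     # → 164  (today: 6 servers)
derp_headroom_gb 60    # → 104  (Scenario 1: 12 servers)
derp_headroom_gb 120   # → 44   (Scenario 2: 18 servers)
```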

### Upgrade Path

If TX1 reaches capacity:

**Option A: add a second dedicated AI server**

- Move the AI stack to a separate VPS
- TX1 focuses only on game servers
- Cost: ~$100-200/month (NOT DERP-compliant)

**Option B: upgrade TX1 RAM**

- 256GB → 512GB
- Cost: contact Hetzner for pricing
- Preferred: maintains DERP compliance

**Option C: use smaller AI models**

- Qwen 2.5 Coder 32B (~35GB RAM)
- Llama 3.2 8B (~8GB RAM)
- Tradeoff: lower quality, but more capacity

## Disaster Recovery

### Backup Strategy

**What to back up:**

- Dify configuration files
- Knowledge base data
- Discord bot code
- Models: no backup needed (they can be re-downloaded)

**Backup locations:**

- Git repository (configs and code)
- NC1 Charlotte (knowledge base)

**Backup frequency:**

- Configurations: after every change
- Knowledge base: weekly
- Models: never
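A minimal sketch of the weekly knowledge-base backup, staged locally before it is shipped to NC1. The `/opt/dify/volumes` path and the `nc1` SSH alias in the comments are assumptions; point them at the real layout.

```shell
#!/usr/bin/env bash
# Sketch: tar up a directory with a dated name, ready to rsync off-box.
set -euo pipefail

backup_ai_stack() {
  local src=$1        # e.g. the knowledge-base data directory
  local dest_dir=$2   # local staging directory for archives
  local stamp
  stamp=$(date +%Y%m%d)
  mkdir -p "$dest_dir"
  tar -czf "$dest_dir/ai-stack-$stamp.tar.gz" \
    -C "$(dirname "$src")" "$(basename "$src")"
  echo "$dest_dir/ai-stack-$stamp.tar.gz"
}

# Weekly usage (paths and host are illustrative):
#   archive=$(backup_ai_stack /opt/dify/volumes /var/backups/ai-stack)
#   rsync -a "$archive" nc1:/backups/ai-stack/
```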

### Recovery Procedure

If TX1 fails completely:

1. Deploy Dify on NC1 (temporary)
2. Restore the knowledge base from backup
3. Re-download the models (~4 hours)
4. Point the Discord bot at NC1
5. Total downtime: 4-6 hours

**Note:** this is acceptable for DERP, which is an emergency-only system.


## Cost Analysis

### One-Time Costs

- Setup time: 6-8 hours (Michael's time)
- Model downloads: bandwidth only (included in hosting)
- Total: $0 (sweat equity only)

### Monthly Costs

- Hosting: $0 (uses the existing TX1)
- Bandwidth: $0 (included in hosting)
- Maintenance: ~1 hour/month (Michael's time)
- Total: $0/month

### Opportunity Cost

- RAM reserved for AI: ~8GB (idle) or ~92GB (active DERP)
- That space could host 1-2 more game servers
- Acceptable tradeoff: DERP independence is worth more than 2 extra game servers

Fire + Frost + Foundation + DERP = True Independence 💙🔥❄️

TX1 has the capacity. Resources are allocated wisely. $0 monthly cost maintained.