
Self-Hosted AI Stack on TX1

Status: Blocked - Medical clearance
Priority: Tier 2 - Major Infrastructure
Time: 6-8 hours (3-4 active, rest downloads)
Location: TX1 Dallas
Last Updated: 2026-02-18
Updated By: The Chronicler


Overview

This task deploys DERP-compliant AI infrastructure on TX1 at zero additional monthly cost.

Three-tier usage model:

  1. Primary: Claude Projects (best experience, full repo context)
  2. DERP Backup: Self-hosted stack, used when Claude/Anthropic is unavailable
  3. Staff/Subscriber Bots: Discord + Wiki integration

Monthly Cost: $0 (beyond existing $20 Claude Pro subscription)


Architecture

Component 1: Dify (RAG Platform)

URL: ai.firefrostgaming.com
Purpose: Knowledge management, API backend
Features:

  • Multi-workspace (Operations, Brainstorming)
  • Knowledge graph indexing
  • Web interface
  • Discord bot API
  • Repository integration

Component 2: Ollama (Model Server)

Purpose: Local model hosting
Features:

  • Model management
  • API compatibility
  • Resource optimization
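
As a rough illustration of the "API compatibility" point, the sketch below builds a non-streaming request against Ollama's /api/generate endpoint using only the Python standard library. It assumes Ollama is listening on its default local port (11434); the model tag used in the usage note is an example and should be checked against whatever is actually pulled on TX1.

```python
import json
import urllib.request

# Assumption: Ollama runs on the same host at its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request for Ollama."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 600.0) -> str:
    """Send the request and return the generated text.

    The generous timeout is an assumption: large models on CPU can take
    minutes to answer.
    """
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

A call such as ask("llama3.3:70b", "Summarize the backup procedure") would only work once the model has been downloaded and Ollama is running.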

Component 3: Models (Self-Hosted)

Qwen 2.5 Coder 72B

  • Purpose: Infrastructure/coding specialist
  • Context: 128K tokens
  • RAM: ~80GB when loaded
  • Storage: ~40GB
  • Use: DERP strategic decisions

Llama 3.3 70B

  • Purpose: General reasoning
  • Context: 128K tokens
  • RAM: ~80GB when loaded
  • Storage: ~40GB
  • Use: DERP general queries

Llama 3.2 Vision 11B

  • Purpose: Screenshot/image analysis
  • RAM: ~7GB when loaded
  • Storage: ~7GB
  • Use: Visual troubleshooting

Component 4: Discord Bot

Purpose: Staff/subscriber interface
Features:

  • Role-based access (staff vs subscribers)
  • Calls Dify API
  • Commands: /ask, /operations, /brainstorm
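
The role-based access described above could be sketched as a simple lookup from command to permitted roles. The role names and the mapping itself are hypothetical: this document only states that staff and subscribers are distinguished and lists the three commands, so which role may run which command is an assumption to be replaced by the real policy.

```python
# Hypothetical mapping: the document does not say which roles may use
# which commands. Adjust to the actual Discord role names and policy.
COMMAND_ROLES = {
    "/ask": {"staff", "subscriber"},
    "/operations": {"staff"},
    "/brainstorm": {"staff"},
}


def can_run(command: str, user_roles: set[str]) -> bool:
    """Return True if any of the user's roles is permitted to run the command.

    Unknown commands are denied by default.
    """
    allowed = COMMAND_ROLES.get(command, set())
    return bool(allowed & user_roles)
```

Denying unknown commands by default keeps a newly added command private until someone explicitly grants it to a role.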

Usage Model

Tier 1: Claude Projects (Primary)

When: Normal operations
Experience: Best (full repo context, session continuity)
Cost: $20/month (already paying)

Tier 2: DERP Backup (Emergency)

When: Claude/Anthropic outage
Experience: Functional (knowledge graph + 128K context)
Cost: $0/month (self-hosted)

Tier 3: Staff/Subscriber Bots

When: Routine queries in Discord/Wiki
Experience: Fast, simple
Cost: $0/month (same infrastructure)
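
The three tiers above amount to a small decision rule, sketched below. The function name and its two inputs are illustrative assumptions; in particular, how a Claude/Anthropic outage is detected is not specified in this document.

```python
from enum import Enum


class Tier(Enum):
    CLAUDE_PROJECTS = 1  # Tier 1: normal operations
    DERP_BACKUP = 2      # Tier 2: self-hosted, used during an outage
    BOT = 3              # Tier 3: routine Discord/Wiki queries


def choose_tier(via_bot: bool, claude_available: bool) -> Tier:
    """Pick the tier for a query.

    via_bot: the query arrived through the Discord/Wiki bots.
    claude_available: result of some outage check (not defined here).
    """
    if via_bot:
        return Tier.BOT
    return Tier.CLAUDE_PROJECTS if claude_available else Tier.DERP_BACKUP
```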


Resource Requirements

Storage (TX1 has 1TB)

  • Qwen 2.5 Coder 72B: ~40GB
  • Llama 3.3 70B: ~40GB
  • Llama 3.2 Vision 11B: ~7GB
  • Dify + services: ~10GB
  • Total: ~97GB

RAM (TX1 has 256GB)

DERP Activated (one large model loaded):

  • Model: ~80GB (Qwen OR Llama 3.3)
  • Dify services: ~4GB
  • Overhead: ~8GB
  • Total: ~92GB

Normal Operations (models idle):

  • Minimal RAM usage
  • Available for game servers
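
The storage and RAM totals above can be re-derived with a few lines of arithmetic, which also shows the headroom left on TX1 while DERP is active. The figures are the estimates from this document; treating 1TB as 1000GB is an assumption.

```python
# Estimates taken from this document (GB).
STORAGE_GB = {
    "Qwen 2.5 Coder 72B": 40,
    "Llama 3.3 70B": 40,
    "Llama 3.2 Vision 11B": 7,
    "Dify + services": 10,
}
RAM_DERP_GB = {
    "Large model (Qwen OR Llama 3.3)": 80,
    "Dify services": 4,
    "Overhead": 8,
}

TX1_STORAGE_GB = 1000  # assumption: 1TB counted as 1000GB
TX1_RAM_GB = 256

storage_total = sum(STORAGE_GB.values())   # 97
ram_total = sum(RAM_DERP_GB.values())      # 92
ram_headroom = TX1_RAM_GB - ram_total      # 164

print(f"Storage: {storage_total}GB of {TX1_STORAGE_GB}GB")
print(f"RAM with DERP active: {ram_total}GB, leaving {ram_headroom}GB")
```

Note that the RAM budget covers one large model at a time; loading Qwen and Llama 3.3 together would roughly double the model line.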

CPU

  • 32 vCPU available
  • Inference on CPU is slower than the hosted Claude API
  • Functional for emergency use

Deployment Phases

Phase 1: Core Stack (2-3 hours)

  1. Deploy Dify via Docker Compose
  2. Install Ollama
  3. Download models (overnight - large files)
  4. Configure workspaces
  5. Index Git repository

Phase 2: Discord Bot (2-3 hours)

  1. Create Python bot
  2. Connect to Dify API
  3. Implement role-based access
  4. Test in Discord
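
For step 2, a minimal sketch of how the bot might call Dify is shown below. It assumes the Dify app is a chat application exposed through Dify's /v1/chat-messages endpoint with a per-app API key, and that the host from this document is served over HTTPS. The function name, key, and user ID are placeholders.

```python
import json
import urllib.request

# Host taken from this document; the https scheme is an assumption.
DIFY_BASE_URL = "https://ai.firefrostgaming.com"


def build_chat_request(api_key: str, query: str, user_id: str) -> urllib.request.Request:
    """Build (but do not send) a blocking chat request for a Dify app."""
    payload = {
        "inputs": {},
        "query": query,
        "response_mode": "blocking",  # wait for the full answer, no streaming
        "user": user_id,              # stable identifier, e.g. the Discord user ID
    }
    return urllib.request.Request(
        f"{DIFY_BASE_URL}/v1/chat-messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request with urllib.request.urlopen and reading the "answer" field of the JSON response would give the text to post back to Discord. Using one Dify app (and key) per workspace is one possible way to back /operations and /brainstorm separately.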

Phase 3: Documentation (1 hour)

  1. Usage guide (when to use what)
  2. Emergency DERP procedures
  3. Discord bot commands
  4. Staff training materials

Total Time: 6-8 hours (3-4 active, rest downloads)


Success Criteria

  • Dify deployed and indexing repo
  • Models loaded and operational
  • DERP backup tested (strategic query without Claude)
  • Discord bot functional (staff + subscriber access)
  • Documentation complete
  • Zero additional monthly cost

Related Documentation

  • deployment-plan.md - Step-by-step deployment guide
  • usage-guide.md - When to use Claude vs DERP vs bots
  • resource-requirements.md - Detailed TX1 resource allocation
  • discord-bot-setup.md - Bot configuration and commands

Fire + Frost + Foundation + DERP = True Independence 💙🔥❄️

Monthly Cost: $20 (no increase from current)