Commit b32afdd1db by The Chronicler: Task #9: Rewrite AI Stack architecture for DERP compliance
Complete rewrite of self-hosted AI stack (Task #9) with new DERP-compliant architecture:

CHANGES:
- Architecture: AnythingLLM+OpenWebUI → Dify+Ollama (DERP-compliant)
- Cost model: $0/month additional (self-hosted on TX1, no external APIs)
- Usage tiers: Claude Projects (primary) → DERP backup (emergency) → Discord bots (staff/subscribers)
- Time estimate: 8-12hrs → 6-8hrs (more focused deployment)
- Resource allocation: 97GB storage, 92GB RAM when active (vs 150GB/110GB)

NEW DOCUMENTATION:
- README.md: Complete architecture rewrite with three-tier usage model
- deployment-plan.md: Step-by-step deployment (6 phases, all commands included)
- usage-guide.md: Decision tree for when to use Claude vs DERP vs bots
- resource-requirements.md: TX1 capacity planning, monitoring, disaster recovery

KEY FEATURES:
- Zero additional monthly cost (beyond existing $20 Claude Pro)
- True DERP compliance (fully self-hosted when Claude unavailable)
- Knowledge graph RAG (indexes entire 416-file repo)
- Discord bot integration (role-based staff/subscriber access)
- Emergency procedures documented
- Capacity planning for growth (up to 18 game servers)

MODELS:
- Qwen 2.5 Coder 72B (infrastructure/coding, 128K context)
- Llama 3.3 70B (general reasoning, 128K context)
- Llama 3.2 Vision 11B (screenshot analysis)

Updated tasks.md summary to reflect new architecture.

Status: Ready for deployment (pending medical clearance)

Fire + Frost + Foundation + DERP = True Independence 💙🔥❄️
2026-02-18 17:27:25 +00:00


# Self-Hosted AI Stack on TX1
**Status:** Blocked (pending medical clearance)
**Priority:** Tier 2 - Major Infrastructure
**Time:** 6-8 hours (3-4 active; the rest is model downloads)
**Location:** TX1 Dallas
**Last Updated:** 2026-02-18
**Updated By:** The Chronicler

---
## Overview
**DERP-compliant AI infrastructure with zero additional monthly cost.**
Three-tier usage model:
1. **Primary:** Claude Projects (best experience, full repo context)
2. **DERP Backup:** Self-hosted when Claude/Anthropic unavailable
3. **Staff/Subscriber Bots:** Discord + Wiki integration
**Monthly Cost:** $0 (beyond existing $20 Claude Pro subscription)

---
## Architecture
### Component 1: Dify (RAG Platform)
**URL:** ai.firefrostgaming.com
**Purpose:** Knowledge management, API backend
**Features:**
- Multi-workspace (Operations, Brainstorming)
- Knowledge graph indexing
- Web interface
- Discord bot API
- Repository integration
### Component 2: Ollama (Model Server)
**Purpose:** Local model hosting
**Features:**
- Model management
- API compatibility
- Resource optimization
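Ollama exposes a local HTTP API (port 11434 by default), so Dify and the bots can call models with plain JSON requests. A minimal sketch using only the standard library; the model tag and context size are illustrative:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address

def build_generate_payload(model: str, prompt: str, num_ctx: int = 8192) -> dict:
    """Build a request body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,          # return one JSON object instead of a stream
        "options": {"num_ctx": num_ctx},
    }

def generate(model: str, prompt: str) -> str:
    """Send a one-shot completion request to the local Ollama server."""
    payload = json.dumps(build_generate_payload(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Keeping `stream: False` simplifies the bot code; streaming can be enabled later if response latency on CPU becomes noticeable.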
### Component 3: Models (Self-Hosted)
**Qwen 2.5 Coder 72B**
- Purpose: Infrastructure/coding specialist
- Context: 128K tokens
- RAM: ~80GB when loaded
- Storage: ~40GB
- Use: DERP strategic decisions
**Llama 3.3 70B**
- Purpose: General reasoning
- Context: 128K tokens
- RAM: ~80GB when loaded
- Storage: ~40GB
- Use: DERP general queries
**Llama 3.2 Vision 11B**
- Purpose: Screenshot/image analysis
- RAM: ~7GB when loaded
- Storage: ~7GB
- Use: Visual troubleshooting
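Routing between the three models can be a simple lookup; the Ollama tags below are plausible names for these models but should be confirmed against what `ollama pull` actually fetches:

```python
# Hypothetical Ollama tags for the three models above; verify the real tags.
MODELS = {
    "coding": "qwen2.5-coder:72b",    # infrastructure/coding specialist
    "general": "llama3.3:70b",        # general reasoning
    "vision": "llama3.2-vision:11b",  # screenshot/image analysis
}

def pick_model(task: str) -> str:
    """Route a task category to its model, defaulting to general reasoning."""
    return MODELS.get(task, MODELS["general"])
```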
### Component 4: Discord Bot
**Purpose:** Staff/subscriber interface
**Features:**
- Role-based access (staff vs subscribers)
- Calls Dify API
- Commands: `/ask`, `/operations`, `/brainstorm`
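The role gate at the heart of the bot can be sketched independently of any Discord library. Role names here are placeholders; the real Discord role names may differ:

```python
# Placeholder role names; substitute the actual Discord roles at deploy time.
STAFF_ROLES = {"Staff", "Admin"}
SUBSCRIBER_ROLES = {"Subscriber"}

# Commands each tier may run (staff inherit the subscriber command).
SUBSCRIBER_COMMANDS = {"/ask"}
STAFF_COMMANDS = SUBSCRIBER_COMMANDS | {"/operations", "/brainstorm"}

def allowed_commands(member_roles: set[str]) -> set[str]:
    """Return the slash commands a member may use, based on their roles."""
    if member_roles & STAFF_ROLES:
        return STAFF_COMMANDS
    if member_roles & SUBSCRIBER_ROLES:
        return SUBSCRIBER_COMMANDS
    return set()
```

Keeping this logic in one pure function makes it testable without a running bot and keeps the Dify API calls behind a single authorization check.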

---
## Usage Model
### Tier 1: Claude Projects (Primary)
**When:** Normal operations
**Experience:** Best (full repo context, session continuity)
**Cost:** $20/month (already paying)
### Tier 2: DERP Backup (Emergency)
**When:** Claude/Anthropic outage
**Experience:** Functional (knowledge graph + 128K context)
**Cost:** $0/month (self-hosted)
### Tier 3: Staff/Subscriber Bots
**When:** Routine queries in Discord/Wiki
**Experience:** Fast, simple
**Cost:** $0/month (same infrastructure)
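The three tiers above reduce to a single routing decision; a minimal sketch (tier names are illustrative labels, not configuration keys):

```python
def choose_tier(claude_available: bool, routine_discord_query: bool) -> str:
    """Map the three-tier usage model to a single routing decision."""
    if routine_discord_query:
        return "discord-bot"      # Tier 3: same infrastructure, $0
    if claude_available:
        return "claude-projects"  # Tier 1: best experience
    return "derp-backup"          # Tier 2: self-hosted emergency fallback
```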

---
## Resource Requirements
### Storage (TX1 has 1TB)
- Qwen 2.5 Coder 72B: ~40GB
- Llama 3.3 70B: ~40GB
- Llama 3.2 Vision 11B: ~7GB
- Dify + services: ~10GB
- **Total: ~97GB** ✅
### RAM (TX1 has 256GB)
**DERP Activated (one large model loaded):**
- Model: ~80GB (Qwen OR Llama 3.3)
- Dify services: ~4GB
- Overhead: ~8GB
- **Total: ~92GB** ✅
**Normal Operations (models idle):**
- Minimal RAM usage
- Available for game servers
### CPU
- 32 vCPUs available
- CPU-only inference is slower than the hosted Claude API
- Functional for emergency use
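The storage and RAM budgets above are simple sums, and encoding them as checks catches future additions that would blow past TX1's capacity:

```python
# Figures from the resource plan above, in GB.
STORAGE_GB = {
    "qwen2.5-coder-72b": 40,
    "llama3.3-70b": 40,
    "llama3.2-vision-11b": 7,
    "dify-and-services": 10,
}
RAM_ACTIVE_GB = {"large-model": 80, "dify-services": 4, "overhead": 8}

TX1_STORAGE_GB = 1000  # TX1 has 1TB storage
TX1_RAM_GB = 256       # TX1 has 256GB RAM

storage_total = sum(STORAGE_GB.values())   # 40 + 40 + 7 + 10 = 97
ram_total = sum(RAM_ACTIVE_GB.values())    # 80 + 4 + 8 = 92

assert storage_total <= TX1_STORAGE_GB, "storage plan exceeds TX1 capacity"
assert ram_total <= TX1_RAM_GB, "active RAM plan exceeds TX1 capacity"
```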
---
## Deployment Phases
### Phase 1: Core Stack (2-3 hours)
1. Deploy Dify via Docker Compose
2. Install Ollama
3. Download models (run overnight; ~87GB of model files)
4. Configure workspaces
5. Index Git repository
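Before kicking off the overnight downloads in step 3, a quick preflight can confirm TX1 has room. A standard-library sketch; the 20GB safety margin is an assumption, not a measured requirement:

```python
import shutil

REQUIRED_GB = 97       # total storage from the resource plan
SAFETY_MARGIN_GB = 20  # headroom for Docker layers and logs (assumption)

def enough_disk(path: str = "/", required_gb: int = REQUIRED_GB) -> bool:
    """Check free space before starting the overnight model downloads."""
    free_gb = shutil.disk_usage(path).free // 10**9
    return free_gb >= required_gb + SAFETY_MARGIN_GB
```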
### Phase 2: Discord Bot (2-3 hours)
1. Create Python bot
2. Connect to Dify API
3. Implement role-based access
4. Test in Discord
### Phase 3: Documentation (1 hour)
1. Usage guide (when to use what)
2. Emergency DERP procedures
3. Discord bot commands
4. Staff training materials
**Total Time:** 6-8 hours (active work)

---
## Success Criteria
- ✅ Dify deployed and indexing repo
- ✅ Models loaded and operational
- ✅ DERP backup tested (strategic query without Claude)
- ✅ Discord bot functional (staff + subscriber access)
- ✅ Documentation complete
- ✅ Zero additional monthly cost
---
## Related Documentation
- **deployment-plan.md** - Step-by-step deployment guide
- **usage-guide.md** - When to use Claude vs DERP vs bots
- **resource-requirements.md** - Detailed TX1 resource allocation
- **discord-bot-setup.md** - Bot configuration and commands
---
**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️
**Monthly Cost: $20 (no increase from current)**