Task #9: Rewrite AI Stack architecture for DERP compliance

Complete rewrite of self-hosted AI stack (Task #9) with new DERP-compliant architecture:

CHANGES:
- Architecture: AnythingLLM+OpenWebUI → Dify+Ollama (DERP-compliant)
- Cost model: $0/month additional (self-hosted on TX1, no external APIs)
- Usage tiers: Claude Projects (primary) → DERP backup (emergency) → Discord bots (staff/subscribers)
- Time estimate: 8-12hrs → 6-8hrs (more focused deployment)
- Resource allocation: 97GB storage, 92GB RAM when active (vs 150GB/110GB)

NEW DOCUMENTATION:
- README.md: Complete architecture rewrite with three-tier usage model
- deployment-plan.md: Step-by-step deployment (6 phases, all commands included)
- usage-guide.md: Decision tree for when to use Claude vs DERP vs bots
- resource-requirements.md: TX1 capacity planning, monitoring, disaster recovery

KEY FEATURES:
- Zero additional monthly cost (beyond existing $20 Claude Pro)
- True DERP compliance (fully self-hosted when Claude unavailable)
- Knowledge graph RAG (indexes entire 416-file repo)
- Discord bot integration (role-based staff/subscriber access)
- Emergency procedures documented
- Capacity planning for growth (up to 18 game servers)

MODELS:
- Qwen 2.5 Coder 72B (infrastructure/coding, 128K context)
- Llama 3.3 70B (general reasoning, 128K context)
- Llama 3.2 Vision 11B (screenshot analysis)

Updated tasks.md summary to reflect new architecture.

Status: Ready for deployment (pending medical clearance)

Fire + Frost + Foundation + DERP = True Independence 💙🔥❄️
Author: The Chronicler
Date: 2026-02-18 17:27:25 +00:00
Parent: fa42868b69
Commit: 96f20e8715
5 changed files with 1365 additions and 30 deletions


@@ -178,15 +178,17 @@ Professional @firefrostgaming.com email on NC1. Self-hosted, $120/year saved, el
---
### 9. Self-Hosted AI Stack on TX1
-**Time:** 8-12 hours (3-4 active, rest downloads)
+**Time:** 6-8 hours (3-4 active, rest downloads)
**Status:** BLOCKED - Medical clearance
**Documentation:** `docs/tasks/self-hosted-ai-stack-on-tx1/`
-Dual AI deployment: AnythingLLM (ops) + Open WebUI (staff). DERP backup, unlimited AI access.
+DERP-compliant AI infrastructure: Dify + Ollama + self-hosted models. Three-tier usage: Claude Projects (primary) → DERP backup (emergency) → Discord/Wiki bots (staff/subscribers).
**Architecture:** Dify with knowledge graph RAG, Ollama model server
**Models:** Qwen 2.5 Coder 72B, Llama 3.3 70B, Llama 3.2 Vision 11B
-**Storage:** ~150GB
-**RAM:** ~110GB when loaded
+**Storage:** ~97GB
+**RAM:** ~92GB when DERP activated, ~8GB idle
**Monthly Cost:** $0 (self-hosted, no additional cost beyond Claude Pro)
---


@@ -2,43 +2,167 @@
**Status:** Blocked - Medical clearance
**Priority:** Tier 2 - Major Infrastructure
-**Time:** 8-12 hours (3-4 active, rest downloads)
+**Time:** 6-8 hours (3-4 active, rest downloads)
**Location:** TX1 Dallas
-**Last Updated:** 2026-02-16
+**Last Updated:** 2026-02-18
**Updated By:** The Chronicler
---
## Overview
-Dual AI deployment: AnythingLLM (Michael/Meg, document-heavy) + Open WebUI (staff assistant). DERP backup, unlimited AI access, staff foundation.
+**DERP-compliant AI infrastructure with zero additional monthly cost.**
Three-tier usage model:
1. **Primary:** Claude Projects (best experience, full repo context)
2. **DERP Backup:** Self-hosted when Claude/Anthropic unavailable
3. **Staff/Subscriber Bots:** Discord + Wiki integration
**Monthly Cost:** $0 (beyond existing $20 Claude Pro subscription)
---
## Architecture
-**Primary: AnythingLLM** (ai.firefrostgaming.com)
-- 1,000+ document libraries
-- LanceDB vector database
-- Workspace isolation (Operations, Pokerole, Brainstorming)
-**Secondary: Open WebUI** (staff-ai.firefrostgaming.com)
-- Lighter for staff wiki
-- Chroma vector DB
-- ChatGPT-like interface
### Component 1: Dify (RAG Platform)
**URL:** ai.firefrostgaming.com
**Purpose:** Knowledge management, API backend
**Features:**
- Multi-workspace (Operations, Brainstorming)
- Knowledge graph indexing
- Web interface
- Discord bot API
- Repository integration
-## Phases
-**Phase 1:** Deploy stack (1-2 hours)
-**Phase 2:** Load models (6-8 hours overnight)
-**Phase 3:** Document ingestion (2-3 hours active, 6-8 total)
### Component 2: Ollama (Model Server)
**Purpose:** Local model hosting
**Features:**
- Model management
- API compatibility
- Resource optimization
-## Models
-- Qwen 2.5 Coder 72B (~40GB)
-- Llama 3.3 70B (~40GB)
-- Llama 3.2 Vision 11B (~7GB)
-- Embeddings: all-MiniLM-L6-v2 (~400MB)
### Component 3: Models (Self-Hosted)
**Total:** ~87GB model storage; ~80GB RAM with one large model loaded
**Qwen 2.5 Coder 72B**
- Purpose: Infrastructure/coding specialist
- Context: 128K tokens
- RAM: ~80GB when loaded
- Storage: ~40GB
- Use: DERP strategic decisions
**Llama 3.3 70B**
- Purpose: General reasoning
- Context: 128K tokens
- RAM: ~80GB when loaded
- Storage: ~40GB
- Use: DERP general queries
**Llama 3.2 Vision 11B**
- Purpose: Screenshot/image analysis
- RAM: ~7GB when loaded
- Storage: ~7GB
- Use: Visual troubleshooting
### Component 4: Discord Bot
**Purpose:** Staff/subscriber interface
**Features:**
- Role-based access (staff vs subscribers)
- Calls Dify API
- Commands: `/ask`, `/operations`, `/brainstorm`
---
## Usage Model
### Tier 1: Claude Projects (Primary)
**When:** Normal operations
**Experience:** Best (full repo context, session continuity)
**Cost:** $20/month (already paying)
### Tier 2: DERP Backup (Emergency)
**When:** Claude/Anthropic outage
**Experience:** Functional (knowledge graph + 128K context)
**Cost:** $0/month (self-hosted)
### Tier 3: Staff/Subscriber Bots
**When:** Routine queries in Discord/Wiki
**Experience:** Fast, simple
**Cost:** $0/month (same infrastructure)
---
## Resource Requirements
### Storage (TX1 has 1TB)
- Qwen 2.5 Coder 72B: ~40GB
- Llama 3.3 70B: ~40GB
- Llama 3.2 Vision 11B: ~7GB
- Dify + services: ~10GB
- **Total: ~97GB** ✅
### RAM (TX1 has 256GB)
**DERP Activated (one large model loaded):**
- Model: ~80GB (Qwen OR Llama 3.3)
- Dify services: ~4GB
- Overhead: ~8GB
- **Total: ~92GB** ✅
**Normal Operations (models idle):**
- Minimal RAM usage
- Available for game servers
### CPU
- 32 vCPU available
- Inference slower than API
- Functional for emergency use
---
## Deployment Phases
### Phase 1: Core Stack (2-3 hours)
1. Deploy Dify via Docker Compose
2. Install Ollama
3. Download models (overnight - large files)
4. Configure workspaces
5. Index Git repository
### Phase 2: Discord Bot (2-3 hours)
1. Create Python bot
2. Connect to Dify API
3. Implement role-based access
4. Test in Discord
### Phase 3: Documentation (1 hour)
1. Usage guide (when to use what)
2. Emergency DERP procedures
3. Discord bot commands
4. Staff training materials
**Total Time:** 6-8 hours (active work)
---
## Success Criteria
-- ✅ Both stacks deployed
+- ✅ Dify deployed and indexing repo
- ✅ Models loaded and operational
-- ✅ Documents ingested (Ops, Pokerole, Brainstorming)
-- ✅ DERP backup functional
+- ✅ DERP backup tested (strategic query without Claude)
- ✅ Discord bot functional (staff + subscriber access)
- ✅ Documentation complete
- ✅ Zero additional monthly cost
**See:** deployment-plan.md for detailed phases
---
-**Fire + Frost + Foundation** 💙🔥❄️
## Related Documentation
- **deployment-plan.md** - Step-by-step deployment guide
- **usage-guide.md** - When to use Claude vs DERP vs bots
- **resource-requirements.md** - Detailed TX1 resource allocation
- **discord-bot-setup.md** - Bot configuration and commands
---
**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️
**Monthly Cost: $20 (no increase from current)**


@@ -0,0 +1,500 @@
# Self-Hosted AI Stack - Deployment Plan
**Task:** Self-Hosted AI Stack on TX1
**Location:** TX1 Dallas (38.68.14.26)
**Total Time:** 6-8 hours (3-4 active, rest overnight downloads)
**Last Updated:** 2026-02-18
---
## Prerequisites
### Before Starting
- [ ] SSH access to TX1
- [ ] Docker installed on TX1
- [ ] Docker Compose installed
- [ ] Sufficient storage (~100GB free)
- [ ] No game servers under heavy load (model downloads are bandwidth-intensive)
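The checklist above can be spot-checked from TX1 before starting; a minimal Python sketch (the ~100GB threshold comes from this checklist, and `/opt` as the install volume is an assumption):

```python
import shutil
import subprocess

def enough_free_space(path="/opt", required_gb=100):
    """True if `path` has at least `required_gb` GiB free (checklist threshold)."""
    free_gb = shutil.disk_usage(path).free // 2**30
    return free_gb >= required_gb

def docker_available():
    """True if the docker CLI responds (Docker + Compose are prerequisites)."""
    try:
        subprocess.run(["docker", "--version"], capture_output=True, check=True)
        return True
    except (FileNotFoundError, subprocess.CalledProcessError):
        return False
```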
### Domain Configuration
- [ ] DNS A record: ai.firefrostgaming.com → 38.68.14.26
- [ ] SSL certificate ready (Let's Encrypt)
---
## Phase 1: Deploy Dify (2-3 hours)
### Step 1.1: Create Directory Structure
```bash
ssh root@38.68.14.26
cd /opt
mkdir -p dify
cd dify
```
### Step 1.2: Download Dify Docker Compose
```bash
wget https://raw.githubusercontent.com/langgenius/dify/main/docker/docker-compose.yaml
```
### Step 1.3: Configure Environment
```bash
# Create .env file
cat > .env << 'EOF'
# Dify Configuration
DIFY_VERSION=0.6.0
API_URL=https://ai.firefrostgaming.com
WEB_API_URL=https://ai.firefrostgaming.com
# Database
POSTGRES_PASSWORD=<generate_secure_password>
POSTGRES_DB=dify
# Redis
REDIS_PASSWORD=<generate_secure_password>
# Secret Key (generate with: openssl rand -base64 32)
SECRET_KEY=<generate_secret_key>
# Storage
STORAGE_TYPE=local
STORAGE_LOCAL_PATH=/app/storage
# Move Dify's bundled nginx off port 80 so the host nginx (Step 1.6) can own it
EXPOSE_NGINX_PORT=8080
EOF
```
### Step 1.4: Deploy Dify
```bash
docker-compose up -d
```
**Wait:** 5-10 minutes for all services to start
### Step 1.5: Verify Deployment
```bash
docker-compose ps
# All services should show "Up"
curl http://localhost/health
# Should return: {"status":"ok"}
```
### Step 1.6: Configure Nginx Reverse Proxy
```bash
# Create Nginx config
cat > /etc/nginx/sites-available/ai.firefrostgaming.com << 'EOF'
server {
    listen 80;
    server_name ai.firefrostgaming.com;

    location / {
        proxy_pass http://localhost:8080;  # Dify's bundled nginx, remapped off 80 to avoid a proxy loop
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
EOF
# Enable site
ln -s /etc/nginx/sites-available/ai.firefrostgaming.com /etc/nginx/sites-enabled/
nginx -t
systemctl reload nginx
# Get SSL certificate
certbot --nginx -d ai.firefrostgaming.com
```
### Step 1.7: Initial Configuration
1. Visit https://ai.firefrostgaming.com
2. Create admin account (Michael)
3. Configure workspaces:
- **Operations** (infrastructure docs)
- **Brainstorming** (creative docs)
---
## Phase 2: Install Ollama and Models (Overnight)
### Step 2.1: Install Ollama
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
### Step 2.2: Download Models (Overnight - Large Files)
**Download Qwen 2.5 Coder 72B:**
```bash
ollama pull qwen2.5-coder:72b
```
**Size:** ~40GB
**Time:** 2-4 hours (depending on connection)
**Download Llama 3.3 70B:**
```bash
ollama pull llama3.3:70b
```
**Size:** ~40GB
**Time:** 2-4 hours
**Download Llama 3.2 Vision 11B:**
```bash
ollama pull llama3.2-vision:11b
```
**Size:** ~7GB
**Time:** 30-60 minutes
**Total download time:** 6-8 hours (run overnight)
### Step 2.3: Verify Models
```bash
ollama list
# Should show all three models
# Test Qwen
ollama run qwen2.5-coder:72b "Write a bash script to check disk space"
# Should generate script
# Test Llama 3.3
ollama run llama3.3:70b "Explain Firefrost Gaming's Fire + Frost philosophy"
# Should respond (expect a generic answer; the model has no Firefrost context until Phase 3 indexing)
# Test Vision
ollama run llama3.2-vision:11b "Describe this image: /path/to/test/image.jpg"
# Should analyze image
```
### Step 2.4: Configure Ollama as Dify Backend
In Dify web interface:
1. Go to Settings → Model Providers
2. Add Ollama provider
3. URL: http://localhost:11434
4. Add models:
- qwen2.5-coder:72b
- llama3.3:70b
- llama3.2-vision:11b
5. Set Qwen as default for coding queries
6. Set Llama 3.3 as default for general queries
---
## Phase 3: Index Git Repository (1-2 hours)
### Step 3.1: Clone Operations Manual to TX1
```bash
cd /opt/dify
git clone https://git.firefrostgaming.com/firefrost-gaming/firefrost-operations-manual.git
```
### Step 3.2: Configure Dify Knowledge Base
**Operations Workspace:**
1. In Dify, go to Operations workspace
2. Create Knowledge Base: "Infrastructure Docs"
3. Upload folder: `/opt/dify/firefrost-operations-manual/docs/`
4. Processing: Automatic chunking with Q&A segmentation
5. Embedding model: Default (all-MiniLM-L6-v2)
**Brainstorming Workspace:**
1. Go to Brainstorming workspace
2. Create Knowledge Base: "Creative Docs"
3. Upload folder: `/opt/dify/firefrost-operations-manual/docs/planning/`
4. Same processing settings
**Wait:** 30-60 minutes for indexing (416 files)
### Step 3.3: Test Knowledge Retrieval
In Operations workspace:
- Query: "What is the Frostwall Protocol?"
- Should return relevant docs with citations
In Brainstorming workspace:
- Query: "What is the Terraria branding training arc?"
- Should return planning docs
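Beyond the web UI, retrieval can also be smoke-tested at the API level; a hedged sketch of the request body, assuming an API key from Dify's Settings → API Keys (the shape follows Dify's chat-messages API, and `smoke-test` is a placeholder user ID):

```python
import json

def chat_payload(query, user="smoke-test"):
    """Build the JSON body for a POST to /v1/chat-messages."""
    return json.dumps({
        "query": query,
        "user": user,
        "response_mode": "blocking",  # wait for the full answer instead of streaming
        "inputs": {},
    })

# POST this to https://ai.firefrostgaming.com/v1/chat-messages with an
# "Authorization: Bearer <API key>" header; the answer should cite indexed docs.
```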
---
## Phase 4: Discord Bot (2-3 hours)
### Step 4.1: Create Bot on Discord Developer Portal
1. Go to https://discord.com/developers/applications
2. Create new application: "Firefrost AI Assistant"
3. Go to Bot section
4. Create bot
5. Copy bot token
6. Enable Privileged Gateway Intents:
- Message Content Intent
- Server Members Intent
### Step 4.2: Install Bot Code on TX1
```bash
cd /opt
mkdir -p firefrost-discord-bot
cd firefrost-discord-bot
# Create requirements.txt
cat > requirements.txt << 'EOF'
discord.py==2.3.2
aiohttp==3.9.1
python-dotenv==1.0.0
EOF
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
### Step 4.3: Create Bot Script
```bash
cat > bot.py << 'EOF'
import discord
from discord.ext import commands
import aiohttp
import os
from dotenv import load_dotenv

load_dotenv()

TOKEN = os.getenv('DISCORD_TOKEN')
DIFY_API_URL = os.getenv('DIFY_API_URL')
DIFY_API_KEY = os.getenv('DIFY_API_KEY')

intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix='/', intents=intents)

@bot.event
async def on_ready():
    print(f'{bot.user} is now running!')

@bot.command(name='ask')
async def ask(ctx, *, question):
    """Ask the AI a question"""
    # Check user roles
    is_staff = any(role.name in ['Staff', 'Admin'] for role in ctx.author.roles)
    is_subscriber = any(role.name == 'Subscriber' for role in ctx.author.roles)
    if not (is_staff or is_subscriber):
        await ctx.send("You need Staff or Subscriber role to use this command.")
        return

    # Determine workspace based on role
    workspace = 'operations' if is_staff else 'general'
    await ctx.send("🤔 Thinking...")

    async with aiohttp.ClientSession() as session:
        async with session.post(
            f'{DIFY_API_URL}/v1/chat-messages',
            headers={
                'Authorization': f'Bearer {DIFY_API_KEY}',
                'Content-Type': 'application/json'
            },
            json={
                'query': question,
                'user': str(ctx.author.id),
                'response_mode': 'blocking',   # Dify requires blocking or streaming
                'conversation_id': '',
                'inputs': {'workspace': workspace}   # workspace routing via app inputs
            }
        ) as resp:
            if resp.status == 200:
                data = await resp.json()
                answer = data.get('answer', 'No response')
                # Discord caps messages at 2000 characters; split long responses
                if len(answer) > 2000:
                    chunks = [answer[i:i+2000] for i in range(0, len(answer), 2000)]
                    for chunk in chunks:
                        await ctx.send(chunk)
                else:
                    await ctx.send(answer)
            else:
                await ctx.send("❌ Error connecting to AI. Please try again.")

bot.run(TOKEN)
EOF
```
### Step 4.4: Configure Bot
```bash
# Create .env file
cat > .env << 'EOF'
DISCORD_TOKEN=<your_bot_token>
DIFY_API_URL=https://ai.firefrostgaming.com
DIFY_API_KEY=<get_from_dify_settings>
EOF
```
### Step 4.5: Create Systemd Service
```bash
cat > /etc/systemd/system/firefrost-discord-bot.service << 'EOF'
[Unit]
Description=Firefrost Discord Bot
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/opt/firefrost-discord-bot
Environment="PATH=/opt/firefrost-discord-bot/venv/bin"
ExecStart=/opt/firefrost-discord-bot/venv/bin/python bot.py
Restart=always

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable firefrost-discord-bot
systemctl start firefrost-discord-bot
```
### Step 4.6: Invite Bot to Discord
1. Go to OAuth2 → URL Generator
2. Select scopes: bot, applications.commands
3. Select permissions: Send Messages, Read Message History
4. Copy generated URL
5. Open in browser and invite to Firefrost Discord
### Step 4.7: Test Bot
In Discord:
```
/ask What is the Frostwall Protocol?
```
Should return answer from Operations workspace (staff only)
---
## Phase 5: Testing and Validation (30 minutes)
### Test 1: DERP Backup (Strategic Query)
**Simulate Claude outage:**
1. Load Qwen model: `ollama run qwen2.5-coder:72b`
2. In Dify Operations workspace, ask:
- "Should I deploy Mailcow before or after Frostwall Protocol?"
3. Verify:
- Response references both task docs
- Shows dependency understanding
- Recommends Frostwall first
### Test 2: Discord Bot (Staff Query)
As staff member in Discord:
```
/ask How many game servers are running?
```
Should return infrastructure details
### Test 3: Discord Bot (Subscriber Query)
As subscriber in Discord:
```
/ask What modpacks are available?
```
Should return modpack list (limited to public info)
### Test 4: Resource Monitoring
```bash
# Check RAM usage with model loaded
free -h
# Should show ~92GB used when Qwen loaded
# Check disk usage
df -h /opt/dify
# Should show ~97GB used
# Check Docker containers
docker ps
# All Dify services should be running
```
---
## Phase 6: Documentation (1 hour)
### Create Usage Guide
Document at `/opt/dify/USAGE-GUIDE.md`:
- When to use Claude (primary)
- When to use DERP (Claude down)
- When to use Discord bot (routine queries)
- Emergency procedures
### Update Operations Manual
Commit changes to Git:
- Task documentation updated
- Deployment plan complete
- Usage guide created
---
## Success Criteria Checklist
- [ ] Dify deployed and accessible at https://ai.firefrostgaming.com
- [ ] Ollama running with all 3 models loaded
- [ ] Operations workspace indexing complete (416 files)
- [ ] Brainstorming workspace indexing complete
- [ ] DERP backup tested (strategic query works)
- [ ] Discord bot deployed and running
- [ ] Staff can query via Discord (/ask command)
- [ ] Subscribers have limited access
- [ ] Resource usage within TX1 limits (~92GB RAM, ~97GB storage)
- [ ] Documentation complete and committed to Git
- [ ] Zero additional monthly cost confirmed
---
## Rollback Plan
If deployment fails:
```bash
# Stop all services
cd /opt/dify
docker-compose down
# Stop Discord bot
systemctl stop firefrost-discord-bot
systemctl disable firefrost-discord-bot
# Remove installation
rm -rf /opt/dify
rm -rf /opt/firefrost-discord-bot
rm /etc/systemd/system/firefrost-discord-bot.service
systemctl daemon-reload
# Remove Nginx config
rm /etc/nginx/sites-enabled/ai.firefrostgaming.com
rm /etc/nginx/sites-available/ai.firefrostgaming.com
nginx -t && systemctl reload nginx
# Uninstall Ollama (it ships no uninstall script; remove manually per Ollama's docs)
systemctl stop ollama
systemctl disable ollama
rm /etc/systemd/system/ollama.service
rm /usr/local/bin/ollama
rm -rf /usr/share/ollama   # model blobs live here
```
---
**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️


@@ -0,0 +1,367 @@
# AI Stack Resource Requirements
**Server:** TX1 Dallas (38.68.14.26)
**Purpose:** Resource allocation planning
**Last Updated:** 2026-02-18
---
## TX1 Server Specifications
**CPU:** 32 vCPU
**RAM:** 256GB
**Storage:** 1TB NVMe SSD
**Location:** Dallas, TX
**Network:** 1Gbps
**Current Usage (before AI stack):**
- Game servers: 6 Minecraft instances
- Management services: Minimal overhead
- Available for AI: Significant capacity
---
## Storage Requirements
### Component Breakdown
| Component | Size | Purpose |
|-----------|------|---------|
| **Qwen 2.5 Coder 72B** | ~40GB | Infrastructure/coding model |
| **Llama 3.3 70B** | ~40GB | General reasoning model |
| **Llama 3.2 Vision 11B** | ~7GB | Image analysis model |
| **Dify Services** | ~5GB | Docker containers, databases |
| **Knowledge Base** | ~5GB | Indexed docs, embeddings |
| **Logs & Temp** | ~2GB | Operational overhead |
| **Total** | **~99GB** | ✅ Well under 1TB limit |
### Storage Growth Estimate
**Year 1:**
- Models: 87GB (static, no growth unless upgrading)
- Knowledge base: 5GB → 8GB (as docs grow)
- Logs: 2GB → 5GB (6 months rotation)
- **Total Year 1:** ~100GB
**Storage is NOT a concern.**
---
## RAM Requirements
### Scenario 1: Normal Operations (Claude Available)
| Component | RAM Usage |
|-----------|-----------|
| **Dify Services** | ~4GB |
| **PostgreSQL** | ~2GB |
| **Redis** | ~1GB |
| **Ollama (idle)** | <1GB |
| **Total (idle)** | **~8GB** ✅ |
**Game servers have ~248GB available** (256GB - 8GB)
---
### Scenario 2: DERP Activated (Claude Down, Emergency)
**Load ONE large model at a time:**
| Component | RAM Usage |
|-----------|-----------|
| **Qwen 2.5 Coder 72B** OR **Llama 3.3 70B** | ~80GB |
| **Dify Services** | ~4GB |
| **PostgreSQL** | ~2GB |
| **Redis** | ~1GB |
| **Ollama Runtime** | ~2GB |
| **OS Overhead** | ~3GB |
| **Total (active DERP)** | **~92GB** ✅ |
**Game servers have ~164GB available** (256GB - 92GB)
**Critical:** DO NOT load both large models simultaneously (160GB would impact game servers)
---
### Scenario 3: Vision Model Only (Screenshot Analysis)
| Component | RAM Usage |
|-----------|-----------|
| **Llama 3.2 Vision 11B** | ~7GB |
| **Dify Services** | ~4GB |
| **Other Services** | ~3GB |
| **Total** | **~14GB** ✅ |
**Very lightweight, can run alongside game servers with no impact**
---
## CPU Requirements
### Model Inference Performance
**TX1 has 32 vCPU (shared among all services)**
**Expected Inference Times:**
| Model | Token Generation Speed | Typical Response |
|-------|----------------------|------------------|
| **Qwen 2.5 Coder 72B** | ~3-5 tokens/second | 30-120 seconds |
| **Llama 3.3 70B** | ~3-5 tokens/second | 30-120 seconds |
| **Llama 3.2 Vision 11B** | ~8-12 tokens/second | 10-45 seconds |
**For comparison:**
- Claude API: 20-40 tokens/second
- **DERP is 5-10× slower** (this is expected and acceptable for emergency use)
**CPU Impact on Game Servers:**
- During DERP inference: ~70-80% CPU usage (temporary spikes)
- Game servers may experience brief lag during AI responses
- **Acceptable for emergency use** (not for normal operations)
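The response-time band in the table above follows directly from the token-rate estimates; a quick sketch of the arithmetic (the rates are this document's estimates, not measurements):

```python
def eta_seconds(tokens_out, tokens_per_second):
    """Rough time-to-answer: output length divided by generation speed."""
    return tokens_out / tokens_per_second

# A ~400-token answer from a 72B model at ~4 tok/s:
eta_seconds(400, 4)   # 100.0 seconds, inside the 30-120s band
```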
---
## Network Requirements
### Initial Model Downloads (One-Time)
| Model | Size | Download Time (1Gbps) |
|-------|------|----------------------|
| **Qwen 2.5 Coder 72B** | ~40GB | 5-10 minutes |
| **Llama 3.3 70B** | ~40GB | 5-10 minutes |
| **Llama 3.2 Vision 11B** | ~7GB | 1-2 minutes |
| **Total** | **~87GB** | **15-25 minutes** |
**Reality:** Download speeds vary, budget 2-4 hours for all models.
**Recommendation:** Download overnight to avoid impacting game server traffic.
---
### Ongoing Bandwidth
**Dify Web Interface:**
- Minimal (text-based queries)
- ~1-5 KB per query
- Negligible impact
**Discord Bot:**
- Text-based queries only
- ~1-5 KB per query
- Negligible impact
**Model Updates:**
- Infrequent (quarterly at most)
- Same as initial download (~87GB)
- Schedule during low-traffic periods
---
## Resource Allocation Strategy
### Priority Levels
**Priority 1 (Always):** Game Servers
**Priority 2 (Normal):** Management Services (Pterodactyl, Gitea, etc.)
**Priority 3 (Emergency Only):** DERP AI Stack
### RAM Allocation Rules
**Normal Operations:**
- Game servers: Up to 240GB
- Management: ~8GB
- AI Stack (idle): ~8GB
- **Total: 256GB** ✅
**DERP Emergency:**
- Game servers: Temporarily limited to 160GB
- Management: ~8GB
- AI Stack (active): ~92GB
- **Total: 260GB** ⚠️ (4GB overcommit acceptable for brief periods)
**If RAM pressure occurs during DERP:**
1. Unload one game server temporarily
2. Run AI query
3. Reload game server
4. **Total downtime per query: <5 minutes**
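The procedure above implies a pre-flight RAM check before loading a model; a minimal sketch, assuming the check is scripted (the ~92GB figure is this document's DERP-active total):

```python
DERP_ACTIVE_GB = 92  # one large model + Dify services + overhead (figure from this doc)

def can_activate_derp(available_gb):
    """True if a 72B model can load without squeezing game servers."""
    return available_gb >= DERP_ACTIVE_GB

# Feed in the available column from `free -g` before running `ollama run ...`
```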
---
## Monitoring & Alerts
### Critical Thresholds
**RAM Usage:**
- **Warning:** >220GB used (85%)
- **Critical:** >240GB used (93%)
- **Action:** Defer DERP usage or unload game server
**CPU Usage:**
- **Warning:** >80% sustained for >5 minutes
- **Critical:** >90% sustained for >2 minutes
- **Action:** Pause AI inference, prioritize game servers
**Storage:**
- **Warning:** >800GB used (80%)
- **Critical:** >900GB used (90%)
- **Action:** Clean up old logs, model cache
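The RAM thresholds above can be encoded for a cron-driven alert; a small sketch using this section's numbers (the alerting hookup itself is left out):

```python
def ram_status(used_gb):
    """Classify RAM usage per the thresholds above (256GB total)."""
    if used_gb > 240:   # >93% critical
        return "critical"
    if used_gb > 220:   # >85% warning
        return "warning"
    return "ok"
```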
### Monitoring Commands
```bash
# Check RAM
free -h
# Check CPU
htop
# Check storage
df -h /
# Check Ollama status
ollama list
ollama ps # Shows loaded models
# Check Dify
cd /opt/dify
docker-compose ps
docker stats # Real-time resource usage
```
---
## Resource Optimization
### Unload Models When Not Needed
```bash
# Unload all models (frees RAM)
ollama stop qwen2.5-coder:72b
ollama stop llama3.3:70b
ollama stop llama3.2-vision:11b
# Verify RAM freed
free -h
```
### Preload Models for Faster Response
```bash
# Preload model (takes ~30 seconds)
ollama run qwen2.5-coder:72b ""
# Model now in RAM, queries will be faster
```
### Schedule Maintenance Windows
**Best time for model downloads/updates:**
- Tuesday/Wednesday 2-6 AM CST (lowest traffic)
- Announce in Discord 24 hours ahead
- Expected downtime: <10 minutes
---
## Capacity Planning
### Current State (Feb 2026)
- **Game servers:** 6 active
- **RAM available:** 256GB
- **Storage available:** 1TB
- **AI stack:** Fits comfortably
### Growth Scenarios
**Scenario 1: Add 6 more game servers (12 total)**
- Additional RAM needed: ~60GB
- Available for AI (normal): 248GB → 188GB ✅
- Available for AI (DERP): 164GB → 104GB ✅
- **Status:** Still viable
**Scenario 2: Add 12 more game servers (18 total)**
- Additional RAM needed: ~120GB
- Available for AI (normal): 248GB → 128GB ✅
- Available for AI (DERP): 164GB → 44GB ⚠️
- **Status:** DERP would require unloading 2 game servers
**Scenario 3: Upgrade to larger models (theoretical)**
- Qwen 3.0 Coder 170B: ~180GB RAM
- **Status:** Would NOT fit alongside game servers
- **Recommendation:** Stick with 72B models
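The scenario arithmetic above generalizes to a small helper; a sketch, assuming ~10GB per additional Minecraft instance (inferred from the "+6 servers → +60GB" line):

```python
TOTAL_GB = 256
IDLE_STACK_GB = 8    # Dify + idle Ollama
DERP_STACK_GB = 92   # one large model loaded
PER_SERVER_GB = 10   # assumed average per additional Minecraft instance

def ai_headroom_gb(extra_servers, derp_active=False):
    """RAM left over after the AI stack and `extra_servers` added game servers."""
    stack = DERP_STACK_GB if derp_active else IDLE_STACK_GB
    return TOTAL_GB - stack - extra_servers * PER_SERVER_GB
```

This reproduces the scenario figures: 188GB and 104GB with 6 extra servers, 44GB under DERP with 12 extra.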
### Upgrade Path
**If TX1 reaches capacity:**
**Option A: Add second dedicated AI server**
- Move AI stack to separate VPS
- TX1 focuses only on game servers
- Cost: ~$100-200/month (NOT DERP-compliant)
**Option B: Upgrade TX1 RAM**
- 256GB → 512GB
- Cost: Contact Hetzner for pricing
- **Preferred:** Maintains DERP compliance
**Option C: Use smaller AI models**
- Qwen 2.5 Coder 32B (~35GB RAM)
- Llama 3.2 8B (~8GB RAM)
- **Tradeoff:** Lower quality, but more capacity
---
## Disaster Recovery
### Backup Strategy
**What to backup:**
- ✅ Dify configuration files
- ✅ Knowledge base data
- ✅ Discord bot code
- ❌ Models (can re-download)
**Backup location:**
- Git repository (for configs/code)
- NC1 Charlotte (for knowledge base)
**Backup frequency:**
- Configurations: After every change
- Knowledge base: Weekly
- Models: No backup needed
### Recovery Procedure
**If TX1 fails completely:**
1. Deploy Dify on NC1 (temporary)
2. Restore knowledge base from backup
3. Re-download models (~4 hours)
4. Point Discord bot to NC1
5. **Downtime: 4-6 hours**
**Note:** This is acceptable for DERP (emergency-only system)
---
## Cost Analysis
### One-Time Costs
- Setup time: 6-8 hours (Michael's time)
- Model downloads: Bandwidth usage (included in hosting)
- **Total: $0** (sweat equity only)
### Monthly Costs
- Hosting: $0 (using existing TX1)
- Bandwidth: $0 (included in hosting)
- Maintenance: ~1 hour/month (Michael's time)
- **Total: $0/month** ✅
### Opportunity Cost
- RAM reserved for AI: ~8GB (idle) or ~92GB (active DERP)
- Could host 1-2 more game servers in that space
- **Acceptable tradeoff:** DERP independence worth more than 2 game servers
---
**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️
**TX1 has the capacity. Resources are allocated wisely. $0 monthly cost maintained.**


@@ -0,0 +1,342 @@
# AI Stack Usage Guide
**Purpose:** Know which AI system to use when
**Last Updated:** 2026-02-18
---
## The Three-Tier System
### Tier 1: Claude Projects (Primary) - **USE THIS FIRST**
**Who:** Michael + Meg
**Where:** claude.ai or Claude app
**Cost:** $20/month (already paying)
**When to use:**
- **Normal daily operations** (99% of the time)
- **Strategic decision-making** (deployment order, architecture)
- **Complex reasoning** (tradeoffs, dependencies)
- **Session continuity** (remembers context across days)
- **Best experience** (fastest, most capable)
**What Claude can do:**
- Search entire 416-file operations manual
- Write deployment scripts
- Review infrastructure decisions
- Generate documentation
- Debug issues
- Plan roadmaps
**Example queries:**
- "Should I deploy Mailcow or AI stack first?"
- "Write a script to deploy Frostwall Protocol"
- "What tasks depend on NC1 cleanup?"
- "Help me troubleshoot this Pterodactyl error"
**Limitations:**
- Requires internet connection
- Subject to Anthropic availability
---
### Tier 2: DERP Backup (Emergency Only) - **WHEN CLAUDE IS DOWN**
**Who:** Michael + Meg
**Where:** https://ai.firefrostgaming.com
**Cost:** $0/month (self-hosted on TX1)
**When to use:**
- **Not for normal operations** (Claude is faster/better)
- **Anthropic outage** (Claude unavailable for hours)
- **Emergency infrastructure decisions** (can't wait for Claude)
- **Critical troubleshooting** (server down, need immediate help)
**What DERP can do:**
- Query indexed operations manual (416 files)
- Strategic reasoning with 128K context
- Infrastructure troubleshooting
- Code generation
- Emergency deployment guidance
**Available models:**
- **Qwen 2.5 Coder 72B** - Infrastructure/coding questions
- **Llama 3.3 70B** - General reasoning
- **Llama 3.2 Vision 11B** - Screenshot analysis
**Example queries:**
- "Claude is down. What's the deployment order for Frostwall?"
- "Emergency: Mailcow not starting. Check logs and diagnose."
- "Need to deploy something NOW. What dependencies are missing?"
**Limitations:**
- Slower inference than Claude
- No session continuity
- Manual model selection
- Uses TX1 resources (~80GB RAM when active)
**How to activate:**
1. Verify Claude is unavailable (try multiple times)
2. Go to https://ai.firefrostgaming.com
3. Select workspace:
- **Operations** - Infrastructure decisions
- **Brainstorming** - Creative work
4. Select model:
- **Qwen 2.5 Coder** - For deployment/troubleshooting
- **Llama 3.3** - For general questions
5. Ask question
6. Copy/paste response as needed
**When to deactivate:**
- Claude comes back online
- Emergency resolved
- Free up TX1 RAM for game servers
---
### Tier 3: Discord Bot (Staff/Subscribers) - **ROUTINE QUERIES**
**Who:** Staff + Subscribers
**Where:** Firefrost Discord server
**Cost:** $0/month (same infrastructure)
**When to use:**
- **Routine questions** (daily operations)
- **Quick lookups** (server status, modpack info)
- **Staff training** (how-to queries)
- **Subscriber support** (basic info)
**Commands:**
**`/ask [question]`**
- Available to: Staff + Subscribers
- Searches: Operations workspace (staff) or public docs (subscribers)
- Rate limit: 10 queries/hour per user
**Example queries (Staff):**
```
/ask How many game servers are running?
/ask What's the Whitelist Manager deployment status?
/ask How do I restart a Minecraft server?
```
**Example queries (Subscribers):**
```
/ask What modpacks are available?
/ask How do I join a server?
/ask What's the difference between Fire and Frost paths?
```
**Role-based access:**
- **Staff:** Full Operations workspace access
- **Subscribers:** Public documentation only
- **No role:** Cannot use bot
**Limitations:**
- Simple queries only (no complex reasoning)
- No file uploads
- No strategic decisions
- Rate limited
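The 10-queries/hour limit is stated here but not shown in the deployment plan's bot script; a minimal sliding-window sketch of how it could be enforced inside bot.py (the class name and window size are assumptions, the limit comes from this guide):

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    def __init__(self, max_calls=10, window_s=3600):
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls = defaultdict(deque)   # user_id -> recent call timestamps

    def allow(self, user_id, now=None):
        """Record a call and return True, or False if the user is over the limit."""
        now = time.monotonic() if now is None else now
        q = self.calls[user_id]
        # Drop timestamps that have aged out of the sliding window
        while q and now - q[0] >= self.window_s:
            q.popleft()
        if len(q) >= self.max_calls:
            return False
        q.append(now)
        return True
```

In the `/ask` handler, a shared `RateLimiter()` would be checked with `limiter.allow(ctx.author.id)` before calling the Dify API.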
---
## Decision Tree
```
Do you need AI assistance?
└─ Is it urgent?
   ├─ NO  → Is Claude working?
   │        ├─ YES → Use Claude Projects
   │        └─ NO  → Wait for Claude (DERP is emergency-only)
   └─ YES → Is Claude available?
            ├─ YES → Use Claude Projects
            └─ NO  → Use DERP Backup
```
**For staff/subscribers:**
```
Simple routine query?
├─ YES → Use Discord bot: /ask
└─ NO  → Escalate to Michael/Meg (they have Claude)
```
---
## Emergency Procedures
### Scenario 1: Claude Down, Need Strategic Decision
**Problem:** Anthropic outage, need to deploy something NOW
**Solution:**
1. Verify Claude truly unavailable (try web + app)
2. Go to https://ai.firefrostgaming.com
3. Login with Michael's account
4. Select Operations workspace
5. Select Qwen 2.5 Coder model
6. Ask strategic question
7. Copy deployment commands
8. Execute carefully (no session memory!)
**Note:** DERP doesn't remember context. Be explicit in each query.
### Scenario 2: Discord Bot Down
**Problem:** Staff reporting bot not responding
**Check status:**
```bash
ssh root@38.68.14.26
systemctl status firefrost-discord-bot
```
**If stopped:**
```bash
systemctl start firefrost-discord-bot
```
**If errors:**
```bash
journalctl -u firefrost-discord-bot -f
# Check for API errors, token issues
```
**If Dify down:**
```bash
cd /opt/dify
docker-compose ps
# If services down:
docker-compose up -d
```
### Scenario 3: Model Won't Load
**Problem:** DERP system reports "model unavailable"
**Check Ollama:**
```bash
ollama list
# Should show: qwen2.5-coder:72b, llama3.3:70b, llama3.2-vision:11b
```
**If models missing:**
```bash
# Re-download
ollama pull qwen2.5-coder:72b
ollama pull llama3.3:70b
ollama pull llama3.2-vision:11b
```
**Check RAM:**
```bash
free -h
# If <90GB free, unload game servers temporarily
```
---
## Cost Tracking
### Monthly Costs
- **Claude Projects:** $20/month (primary system)
- **Dify:** $0/month (self-hosted)
- **Ollama:** $0/month (self-hosted)
- **Discord Bot:** $0/month (self-hosted)
- **Total:** $20/month ✅
### Resource Usage (TX1)
- **Storage:** ~97GB (one-time)
- **RAM (active DERP):** ~92GB (temporary)
- **RAM (idle):** ~8GB (normal; Dify services + idle Ollama)
- **Bandwidth:** Models downloaded once, minimal ongoing
---
## Performance Expectations
### Claude Projects (Primary)
- **Response time:** 5-30 seconds
- **Quality:** Excellent (GPT-4 class)
- **Context:** Full repo (416 files)
- **Session memory:** Yes
### DERP Backup (Emergency)
- **Response time:** 30-120 seconds (slower than Claude)
- **Quality:** Good (GPT-3.5 to GPT-4 class depending on model)
- **Context:** 128K tokens per query
- **Session memory:** No (each query independent)
### Discord Bot (Routine)
- **Response time:** 10-45 seconds
- **Quality:** Good for simple queries
- **Context:** Knowledge base search
- **Rate limit:** 10 queries/hour per user
---
## Best Practices
### For Michael + Meg:
1. **Always use Claude Projects first** (best experience)
2. **Only use DERP for true emergencies** (Claude unavailable)
3. **Document DERP usage** (so Claude can learn from it later)
4. **Free TX1 RAM after DERP use** (restart Ollama if needed)
### For Staff:
1. **Use Discord bot for quick lookups** (fast, simple)
2. **Ask Michael/Meg for complex questions** (they have Claude)
3. **Don't abuse rate limits** (10 queries/hour is generous)
4. **Report bot issues immediately** (don't let it stay broken)
### For Subscribers:
1. **Use Discord bot for server info** (join instructions, modpacks)
2. **Don't ask for staff-only info** (bot will decline)
3. **Be patient** (bot shares resources with staff)
---
## Training & Onboarding
### New Staff Training:
1. Introduce Discord bot commands (`/ask`)
2. Show example queries (moderation, server management)
3. Explain rate limits
4. When to escalate to Michael/Meg
### Subscriber Communication:
1. Announce bot in Discord
2. Pin message with `/ask` command
3. Example queries in welcome channel
4. FAQ: "What can the bot answer?"
---
**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️
**Remember: Claude first, DERP only when necessary, Discord bot for routine queries.**
**Monthly cost: $20 (no increase)**