Task #9: Rewrite AI Stack architecture for DERP compliance

Complete rewrite of self-hosted AI stack (Task #9) with new DERP-compliant architecture:

CHANGES:
- Architecture: AnythingLLM+OpenWebUI → Dify+Ollama (DERP-compliant)
- Cost model: $0/month additional (self-hosted on TX1, no external APIs)
- Usage tiers: Claude Projects (primary) → DERP backup (emergency) → Discord bots (staff/subscribers)
- Time estimate: 8-12hrs → 6-8hrs (more focused deployment)
- Resource allocation: 97GB storage, 92GB RAM when active (vs 150GB/110GB)

NEW DOCUMENTATION:
- README.md: Complete architecture rewrite with three-tier usage model
- deployment-plan.md: Step-by-step deployment (6 phases, all commands included)
- usage-guide.md: Decision tree for when to use Claude vs DERP vs bots
- resource-requirements.md: TX1 capacity planning, monitoring, disaster recovery

KEY FEATURES:
- Zero additional monthly cost (beyond existing $20 Claude Pro)
- True DERP compliance (fully self-hosted when Claude unavailable)
- Knowledge graph RAG (indexes entire 416-file repo)
- Discord bot integration (role-based staff/subscriber access)
- Emergency procedures documented
- Capacity planning for growth (up to 18 game servers)

MODELS:
- Qwen 2.5 Coder 72B (infrastructure/coding, 128K context)
- Llama 3.3 70B (general reasoning, 128K context)
- Llama 3.2 Vision 11B (screenshot analysis)

Updated tasks.md summary to reflect new architecture.

Status: Ready for deployment (pending medical clearance)

Fire + Frost + Foundation + DERP = True Independence 💙🔥❄️
Author: The Chronicler
Date: 2026-02-18 17:27:25 +00:00
Parent: fa42868b69
Commit: 96f20e8715
5 changed files with 1365 additions and 30 deletions


@@ -178,15 +178,17 @@ Professional @firefrostgaming.com email on NC1. Self-hosted, $120/year saved, el
---
### 9. Self-Hosted AI Stack on TX1
-**Time:** 8-12 hours (3-4 active, rest downloads)
+**Time:** 6-8 hours (3-4 active, rest downloads)
**Status:** BLOCKED - Medical clearance
**Documentation:** `docs/tasks/self-hosted-ai-stack-on-tx1/`
-Dual AI deployment: AnythingLLM (ops) + Open WebUI (staff). DERP backup, unlimited AI access.
+DERP-compliant AI infrastructure: Dify + Ollama + self-hosted models. Three-tier usage: Claude Projects (primary) → DERP backup (emergency) → Discord/Wiki bots (staff/subscribers).
**Architecture:** Dify with knowledge graph RAG, Ollama model server
**Models:** Qwen 2.5 Coder 72B, Llama 3.3 70B, Llama 3.2 Vision 11B
-**Storage:** ~150GB
-**RAM:** ~110GB when loaded
+**Storage:** ~97GB
+**RAM:** ~92GB when DERP activated, ~8GB idle
**Monthly Cost:** $0 (self-hosted, no additional cost beyond Claude Pro)
---


@@ -2,43 +2,167 @@
**Status:** Blocked - Medical clearance
**Priority:** Tier 2 - Major Infrastructure
-**Time:** 8-12 hours (3-4 active, rest downloads)
+**Time:** 6-8 hours (3-4 active, rest downloads)
**Location:** TX1 Dallas
-**Last Updated:** 2026-02-16
+**Last Updated:** 2026-02-18
**Updated By:** The Chronicler
---
## Overview
-Dual AI deployment: AnythingLLM (Michael/Meg, document-heavy) + Open WebUI (staff assistant). DERP backup, unlimited AI access, staff foundation.
+**DERP-compliant AI infrastructure with zero additional monthly cost.**
Three-tier usage model:
1. **Primary:** Claude Projects (best experience, full repo context)
2. **DERP Backup:** Self-hosted when Claude/Anthropic unavailable
3. **Staff/Subscriber Bots:** Discord + Wiki integration
**Monthly Cost:** $0 (beyond existing $20 Claude Pro subscription)
---
## Architecture
-**Primary: AnythingLLM** (ai.firefrostgaming.com)
-- 1,000+ document libraries
-- LanceDB vector database
-- Workspace isolation (Operations, Pokerole, Brainstorming)
-**Secondary: Open WebUI** (staff-ai.firefrostgaming.com)
-- Lighter for staff wiki
-- Chroma vector DB
-- ChatGPT-like interface
### Component 1: Dify (RAG Platform)
**URL:** ai.firefrostgaming.com
**Purpose:** Knowledge management, API backend
**Features:**
- Multi-workspace (Operations, Brainstorming)
- Knowledge graph indexing
- Web interface
- Discord bot API
- Repository integration
-## Phases
-**Phase 1:** Deploy stack (1-2 hours)
-**Phase 2:** Load models (6-8 hours overnight)
-**Phase 3:** Document ingestion (2-3 hours active, 6-8 total)
### Component 2: Ollama (Model Server)
**Purpose:** Local model hosting
**Features:**
- Model management
- API compatibility
- Resource optimization
-## Models
-- Qwen 2.5 Coder 72B (~40GB)
-- Llama 3.3 70B (~40GB)
-- Llama 3.2 Vision 11B (~7GB)
-- Embeddings: all-MiniLM-L6-v2 (~400MB)
### Component 3: Models (Self-Hosted)
**Total:** ~87GB model storage; ~80GB RAM with one large model loaded
**Qwen 2.5 Coder 72B**
- Purpose: Infrastructure/coding specialist
- Context: 128K tokens
- RAM: ~80GB when loaded
- Storage: ~40GB
- Use: DERP strategic decisions
**Llama 3.3 70B**
- Purpose: General reasoning
- Context: 128K tokens
- RAM: ~80GB when loaded
- Storage: ~40GB
- Use: DERP general queries
**Llama 3.2 Vision 11B**
- Purpose: Screenshot/image analysis
- RAM: ~7GB when loaded
- Storage: ~7GB
- Use: Visual troubleshooting
### Component 4: Discord Bot
**Purpose:** Staff/subscriber interface
**Features:**
- Role-based access (staff vs subscribers)
- Calls Dify API
- Commands: `/ask`, `/operations`, `/brainstorm`
---
## Usage Model
### Tier 1: Claude Projects (Primary)
**When:** Normal operations
**Experience:** Best (full repo context, session continuity)
**Cost:** $20/month (already paying)
### Tier 2: DERP Backup (Emergency)
**When:** Claude/Anthropic outage
**Experience:** Functional (knowledge graph + 128K context)
**Cost:** $0/month (self-hosted)
### Tier 3: Staff/Subscriber Bots
**When:** Routine queries in Discord/Wiki
**Experience:** Fast, simple
**Cost:** $0/month (same infrastructure)
---
## Resource Requirements
### Storage (TX1 has 1TB)
- Qwen 2.5 Coder 72B: ~40GB
- Llama 3.3 70B: ~40GB
- Llama 3.2 Vision 11B: ~7GB
- Dify + services: ~10GB
- **Total: ~97GB** ✅
### RAM (TX1 has 256GB)
**DERP Activated (one large model loaded):**
- Model: ~80GB (Qwen OR Llama 3.3)
- Dify services: ~4GB
- Overhead: ~8GB
- **Total: ~92GB** ✅
**Normal Operations (models idle):**
- Minimal RAM usage
- Available for game servers
### CPU
- 32 vCPU available
- Inference slower than API
- Functional for emergency use
---
## Deployment Phases
### Phase 1: Core Stack (2-3 hours)
1. Deploy Dify via Docker Compose
2. Install Ollama
3. Download models (overnight - large files)
4. Configure workspaces
5. Index Git repository
### Phase 2: Discord Bot (2-3 hours)
1. Create Python bot
2. Connect to Dify API
3. Implement role-based access
4. Test in Discord
### Phase 3: Documentation (1 hour)
1. Usage guide (when to use what)
2. Emergency DERP procedures
3. Discord bot commands
4. Staff training materials
**Total Time:** 6-8 hours (active work)
---
## Success Criteria
-- ✅ Both stacks deployed
+- ✅ Dify deployed and indexing repo
- ✅ Models loaded and operational
-- ✅ Documents ingested (Ops, Pokerole, Brainstorming)
-- ✅ DERP backup functional
+- ✅ DERP backup tested (strategic query without Claude)
- ✅ Discord bot functional (staff + subscriber access)
- ✅ Documentation complete
- ✅ Zero additional monthly cost
**See:** deployment-plan.md for detailed phases
---
-**Fire + Frost + Foundation** 💙🔥❄️
## Related Documentation
- **deployment-plan.md** - Step-by-step deployment guide
- **usage-guide.md** - When to use Claude vs DERP vs bots
- **resource-requirements.md** - Detailed TX1 resource allocation
- **discord-bot-setup.md** - Bot configuration and commands
---
**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️
**Monthly Cost: $20 (no increase from current)**


@@ -0,0 +1,500 @@
# Self-Hosted AI Stack - Deployment Plan
**Task:** Self-Hosted AI Stack on TX1
**Location:** TX1 Dallas (38.68.14.26)
**Total Time:** 6-8 hours (3-4 active, rest overnight downloads)
**Last Updated:** 2026-02-18
---
## Prerequisites
### Before Starting
- [ ] SSH access to TX1
- [ ] Docker installed on TX1
- [ ] Docker Compose installed
- [ ] Sufficient storage (~100GB free)
- [ ] No game servers under heavy load (model downloads are bandwidth-intensive)
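The checklist above can be spot-checked from TX1 before starting; a minimal Python sketch (the ~100GB threshold comes from this checklist, and `/opt` as the install volume is an assumption):

```python
import shutil
import subprocess

def enough_free_space(path="/opt", required_gb=100):
    """True if `path` has at least `required_gb` GiB free (checklist threshold)."""
    free_gb = shutil.disk_usage(path).free // 2**30
    return free_gb >= required_gb

def docker_available():
    """True if the docker CLI responds (Docker + Compose are prerequisites)."""
    try:
        subprocess.run(["docker", "--version"], capture_output=True, check=True)
        return True
    except (FileNotFoundError, subprocess.CalledProcessError):
        return False
```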
### Domain Configuration
- [ ] DNS A record: ai.firefrostgaming.com → 38.68.14.26
- [ ] SSL certificate ready (Let's Encrypt)
---
## Phase 1: Deploy Dify (2-3 hours)
### Step 1.1: Create Directory Structure
```bash
ssh root@38.68.14.26
cd /opt
mkdir -p dify
cd dify
```
### Step 1.2: Download Dify Docker Compose
```bash
wget https://raw.githubusercontent.com/langgenius/dify/main/docker/docker-compose.yaml
```
### Step 1.3: Configure Environment
```bash
# Create .env file
cat > .env << 'EOF'
# Dify Configuration
DIFY_VERSION=0.6.0
API_URL=https://ai.firefrostgaming.com
WEB_API_URL=https://ai.firefrostgaming.com
# Database
POSTGRES_PASSWORD=<generate_secure_password>
POSTGRES_DB=dify
# Redis
REDIS_PASSWORD=<generate_secure_password>
# Secret Key (generate with: openssl rand -base64 32)
SECRET_KEY=<generate_secret_key>
# Storage
STORAGE_TYPE=local
STORAGE_LOCAL_PATH=/app/storage
# Move Dify's bundled nginx off port 80 so the host nginx (Step 1.6) can own it
EXPOSE_NGINX_PORT=8080
EOF
```
### Step 1.4: Deploy Dify
```bash
docker-compose up -d
```
**Wait:** 5-10 minutes for all services to start
### Step 1.5: Verify Deployment
```bash
docker-compose ps
# All services should show "Up"
curl http://localhost/health
# Should return: {"status":"ok"}
```
### Step 1.6: Configure Nginx Reverse Proxy
```bash
# Create Nginx config
cat > /etc/nginx/sites-available/ai.firefrostgaming.com << 'EOF'
server {
    listen 80;
    server_name ai.firefrostgaming.com;

    location / {
        proxy_pass http://localhost:8080;  # Dify's bundled nginx, remapped off 80 to avoid a proxy loop
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
EOF
# Enable site
ln -s /etc/nginx/sites-available/ai.firefrostgaming.com /etc/nginx/sites-enabled/
nginx -t
systemctl reload nginx
# Get SSL certificate
certbot --nginx -d ai.firefrostgaming.com
```
### Step 1.7: Initial Configuration
1. Visit https://ai.firefrostgaming.com
2. Create admin account (Michael)
3. Configure workspaces:
- **Operations** (infrastructure docs)
- **Brainstorming** (creative docs)
---
## Phase 2: Install Ollama and Models (Overnight)
### Step 2.1: Install Ollama
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
### Step 2.2: Download Models (Overnight - Large Files)
**Download Qwen 2.5 Coder 72B:**
```bash
ollama pull qwen2.5-coder:72b
```
**Size:** ~40GB
**Time:** 2-4 hours (depending on connection)
**Download Llama 3.3 70B:**
```bash
ollama pull llama3.3:70b
```
**Size:** ~40GB
**Time:** 2-4 hours
**Download Llama 3.2 Vision 11B:**
```bash
ollama pull llama3.2-vision:11b
```
**Size:** ~7GB
**Time:** 30-60 minutes
**Total download time:** 6-8 hours (run overnight)
### Step 2.3: Verify Models
```bash
ollama list
# Should show all three models
# Test Qwen
ollama run qwen2.5-coder:72b "Write a bash script to check disk space"
# Should generate script
# Test Llama 3.3
ollama run llama3.3:70b "Explain Firefrost Gaming's Fire + Frost philosophy"
# Should respond (expect a generic answer; the model has no Firefrost context until Phase 3 indexing)
# Test Vision
ollama run llama3.2-vision:11b "Describe this image: /path/to/test/image.jpg"
# Should analyze image
```
### Step 2.4: Configure Ollama as Dify Backend
In Dify web interface:
1. Go to Settings → Model Providers
2. Add Ollama provider
3. URL: http://localhost:11434
4. Add models:
- qwen2.5-coder:72b
- llama3.3:70b
- llama3.2-vision:11b
5. Set Qwen as default for coding queries
6. Set Llama 3.3 as default for general queries
---
## Phase 3: Index Git Repository (1-2 hours)
### Step 3.1: Clone Operations Manual to TX1
```bash
cd /opt/dify
git clone https://git.firefrostgaming.com/firefrost-gaming/firefrost-operations-manual.git
```
### Step 3.2: Configure Dify Knowledge Base
**Operations Workspace:**
1. In Dify, go to Operations workspace
2. Create Knowledge Base: "Infrastructure Docs"
3. Upload folder: `/opt/dify/firefrost-operations-manual/docs/`
4. Processing: Automatic chunking with Q&A segmentation
5. Embedding model: Default (all-MiniLM-L6-v2)
**Brainstorming Workspace:**
1. Go to Brainstorming workspace
2. Create Knowledge Base: "Creative Docs"
3. Upload folder: `/opt/dify/firefrost-operations-manual/docs/planning/`
4. Same processing settings
**Wait:** 30-60 minutes for indexing (416 files)
### Step 3.3: Test Knowledge Retrieval
In Operations workspace:
- Query: "What is the Frostwall Protocol?"
- Should return relevant docs with citations
In Brainstorming workspace:
- Query: "What is the Terraria branding training arc?"
- Should return planning docs
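Beyond the web UI, retrieval can also be smoke-tested at the API level; a hedged sketch of the request body, assuming an API key from Dify's Settings → API Keys (the shape follows Dify's chat-messages API, and `smoke-test` is a placeholder user ID):

```python
import json

def chat_payload(query, user="smoke-test"):
    """Build the JSON body for a POST to /v1/chat-messages."""
    return json.dumps({
        "query": query,
        "user": user,
        "response_mode": "blocking",  # wait for the full answer instead of streaming
        "inputs": {},
    })

# POST this to https://ai.firefrostgaming.com/v1/chat-messages with an
# "Authorization: Bearer <API key>" header; the answer should cite indexed docs.
```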
---
## Phase 4: Discord Bot (2-3 hours)
### Step 4.1: Create Bot on Discord Developer Portal
1. Go to https://discord.com/developers/applications
2. Create new application: "Firefrost AI Assistant"
3. Go to Bot section
4. Create bot
5. Copy bot token
6. Enable Privileged Gateway Intents:
- Message Content Intent
- Server Members Intent
### Step 4.2: Install Bot Code on TX1
```bash
cd /opt
mkdir -p firefrost-discord-bot
cd firefrost-discord-bot
# Create requirements.txt
cat > requirements.txt << 'EOF'
discord.py==2.3.2
aiohttp==3.9.1
python-dotenv==1.0.0
EOF
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
### Step 4.3: Create Bot Script
```bash
cat > bot.py << 'EOF'
import discord
from discord.ext import commands
import aiohttp
import os
from dotenv import load_dotenv

load_dotenv()

TOKEN = os.getenv('DISCORD_TOKEN')
DIFY_API_URL = os.getenv('DIFY_API_URL')
DIFY_API_KEY = os.getenv('DIFY_API_KEY')

intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix='/', intents=intents)

@bot.event
async def on_ready():
    print(f'{bot.user} is now running!')

@bot.command(name='ask')
async def ask(ctx, *, question):
    """Ask the AI a question"""
    # Check user roles
    is_staff = any(role.name in ['Staff', 'Admin'] for role in ctx.author.roles)
    is_subscriber = any(role.name == 'Subscriber' for role in ctx.author.roles)
    if not (is_staff or is_subscriber):
        await ctx.send("You need Staff or Subscriber role to use this command.")
        return

    # Determine workspace based on role
    workspace = 'operations' if is_staff else 'general'
    await ctx.send("🤔 Thinking...")

    async with aiohttp.ClientSession() as session:
        async with session.post(
            f'{DIFY_API_URL}/v1/chat-messages',
            headers={
                'Authorization': f'Bearer {DIFY_API_KEY}',
                'Content-Type': 'application/json'
            },
            json={
                'query': question,
                'user': str(ctx.author.id),
                'response_mode': 'blocking',   # Dify requires blocking or streaming
                'conversation_id': '',
                'inputs': {'workspace': workspace}   # workspace routing via app inputs
            }
        ) as resp:
            if resp.status == 200:
                data = await resp.json()
                answer = data.get('answer', 'No response')
                # Discord caps messages at 2000 characters; split long responses
                if len(answer) > 2000:
                    chunks = [answer[i:i+2000] for i in range(0, len(answer), 2000)]
                    for chunk in chunks:
                        await ctx.send(chunk)
                else:
                    await ctx.send(answer)
            else:
                await ctx.send("❌ Error connecting to AI. Please try again.")

bot.run(TOKEN)
EOF
```
### Step 4.4: Configure Bot
```bash
# Create .env file
cat > .env << 'EOF'
DISCORD_TOKEN=<your_bot_token>
DIFY_API_URL=https://ai.firefrostgaming.com
DIFY_API_KEY=<get_from_dify_settings>
EOF
```
### Step 4.5: Create Systemd Service
```bash
cat > /etc/systemd/system/firefrost-discord-bot.service << 'EOF'
[Unit]
Description=Firefrost Discord Bot
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/opt/firefrost-discord-bot
Environment="PATH=/opt/firefrost-discord-bot/venv/bin"
ExecStart=/opt/firefrost-discord-bot/venv/bin/python bot.py
Restart=always

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable firefrost-discord-bot
systemctl start firefrost-discord-bot
```
### Step 4.6: Invite Bot to Discord
1. Go to OAuth2 → URL Generator
2. Select scopes: bot, applications.commands
3. Select permissions: Send Messages, Read Message History
4. Copy generated URL
5. Open in browser and invite to Firefrost Discord
### Step 4.7: Test Bot
In Discord:
```
/ask What is the Frostwall Protocol?
```
Should return answer from Operations workspace (staff only)
---
## Phase 5: Testing and Validation (30 minutes)
### Test 1: DERP Backup (Strategic Query)
**Simulate Claude outage:**
1. Load Qwen model: `ollama run qwen2.5-coder:72b`
2. In Dify Operations workspace, ask:
- "Should I deploy Mailcow before or after Frostwall Protocol?"
3. Verify:
- Response references both task docs
- Shows dependency understanding
- Recommends Frostwall first
### Test 2: Discord Bot (Staff Query)
As staff member in Discord:
```
/ask How many game servers are running?
```
Should return infrastructure details
### Test 3: Discord Bot (Subscriber Query)
As subscriber in Discord:
```
/ask What modpacks are available?
```
Should return modpack list (limited to public info)
### Test 4: Resource Monitoring
```bash
# Check RAM usage with model loaded
free -h
# Should show ~92GB used when Qwen loaded
# Check disk usage
df -h /opt/dify
# Should show ~97GB used
# Check Docker containers
docker ps
# All Dify services should be running
```
---
## Phase 6: Documentation (1 hour)
### Create Usage Guide
Document at `/opt/dify/USAGE-GUIDE.md`:
- When to use Claude (primary)
- When to use DERP (Claude down)
- When to use Discord bot (routine queries)
- Emergency procedures
### Update Operations Manual
Commit changes to Git:
- Task documentation updated
- Deployment plan complete
- Usage guide created
---
## Success Criteria Checklist
- [ ] Dify deployed and accessible at https://ai.firefrostgaming.com
- [ ] Ollama running with all 3 models loaded
- [ ] Operations workspace indexing complete (416 files)
- [ ] Brainstorming workspace indexing complete
- [ ] DERP backup tested (strategic query works)
- [ ] Discord bot deployed and running
- [ ] Staff can query via Discord (/ask command)
- [ ] Subscribers have limited access
- [ ] Resource usage within TX1 limits (~92GB RAM, ~97GB storage)
- [ ] Documentation complete and committed to Git
- [ ] Zero additional monthly cost confirmed
---
## Rollback Plan
If deployment fails:
```bash
# Stop all services
cd /opt/dify
docker-compose down
# Stop Discord bot
systemctl stop firefrost-discord-bot
systemctl disable firefrost-discord-bot
# Remove installation
rm -rf /opt/dify
rm -rf /opt/firefrost-discord-bot
rm /etc/systemd/system/firefrost-discord-bot.service
systemctl daemon-reload
# Remove Nginx config
rm /etc/nginx/sites-enabled/ai.firefrostgaming.com
rm /etc/nginx/sites-available/ai.firefrostgaming.com
nginx -t && systemctl reload nginx
# Uninstall Ollama (it ships no uninstall script; remove manually per Ollama's docs)
systemctl stop ollama
systemctl disable ollama
rm /etc/systemd/system/ollama.service
rm /usr/local/bin/ollama
rm -rf /usr/share/ollama   # model blobs live here
```
---
**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️


@@ -0,0 +1,367 @@
# AI Stack Resource Requirements
**Server:** TX1 Dallas (38.68.14.26)
**Purpose:** Resource allocation planning
**Last Updated:** 2026-02-18
---
## TX1 Server Specifications
**CPU:** 32 vCPU
**RAM:** 256GB
**Storage:** 1TB NVMe SSD
**Location:** Dallas, TX
**Network:** 1Gbps
**Current Usage (before AI stack):**
- Game servers: 6 Minecraft instances
- Management services: Minimal overhead
- Available for AI: Significant capacity
---
## Storage Requirements
### Component Breakdown
| Component | Size | Purpose |
|-----------|------|---------|
| **Qwen 2.5 Coder 72B** | ~40GB | Infrastructure/coding model |
| **Llama 3.3 70B** | ~40GB | General reasoning model |
| **Llama 3.2 Vision 11B** | ~7GB | Image analysis model |
| **Dify Services** | ~5GB | Docker containers, databases |
| **Knowledge Base** | ~5GB | Indexed docs, embeddings |
| **Logs & Temp** | ~2GB | Operational overhead |
| **Total** | **~99GB** | ✅ Well under 1TB limit |
### Storage Growth Estimate
**Year 1:**
- Models: 87GB (static, no growth unless upgrading)
- Knowledge base: 5GB → 8GB (as docs grow)
- Logs: 2GB → 5GB (6 months rotation)
- **Total Year 1:** ~100GB
**Storage is NOT a concern.**
---
## RAM Requirements
### Scenario 1: Normal Operations (Claude Available)
| Component | RAM Usage |
|-----------|-----------|
| **Dify Services** | ~4GB |
| **PostgreSQL** | ~2GB |
| **Redis** | ~1GB |
| **Ollama (idle)** | <1GB |
| **Total (idle)** | **~8GB** ✅ |
**Game servers have ~248GB available** (256GB - 8GB)
---
### Scenario 2: DERP Activated (Claude Down, Emergency)
**Load ONE large model at a time:**
| Component | RAM Usage |
|-----------|-----------|
| **Qwen 2.5 Coder 72B** OR **Llama 3.3 70B** | ~80GB |
| **Dify Services** | ~4GB |
| **PostgreSQL** | ~2GB |
| **Redis** | ~1GB |
| **Ollama Runtime** | ~2GB |
| **OS Overhead** | ~3GB |
| **Total (active DERP)** | **~92GB** ✅ |
**Game servers have ~164GB available** (256GB - 92GB)
**Critical:** DO NOT load both large models simultaneously (160GB would impact game servers)
---
### Scenario 3: Vision Model Only (Screenshot Analysis)
| Component | RAM Usage |
|-----------|-----------|
| **Llama 3.2 Vision 11B** | ~7GB |
| **Dify Services** | ~4GB |
| **Other Services** | ~3GB |
| **Total** | **~14GB** ✅ |
**Very lightweight, can run alongside game servers with no impact**
---
## CPU Requirements
### Model Inference Performance
**TX1 has 32 vCPU (shared among all services)**
**Expected Inference Times:**
| Model | Token Generation Speed | Typical Response |
|-------|----------------------|------------------|
| **Qwen 2.5 Coder 72B** | ~3-5 tokens/second | 30-120 seconds |
| **Llama 3.3 70B** | ~3-5 tokens/second | 30-120 seconds |
| **Llama 3.2 Vision 11B** | ~8-12 tokens/second | 10-45 seconds |
**For comparison:**
- Claude API: 20-40 tokens/second
- **DERP is 5-10× slower** (this is expected and acceptable for emergency use)
**CPU Impact on Game Servers:**
- During DERP inference: ~70-80% CPU usage (temporary spikes)
- Game servers may experience brief lag during AI responses
- **Acceptable for emergency use** (not for normal operations)
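The response-time band in the table above follows directly from the token-rate estimates; a quick sketch of the arithmetic (the rates are this document's estimates, not measurements):

```python
def eta_seconds(tokens_out, tokens_per_second):
    """Rough time-to-answer: output length divided by generation speed."""
    return tokens_out / tokens_per_second

# A ~400-token answer from a 72B model at ~4 tok/s:
eta_seconds(400, 4)   # 100.0 seconds, inside the 30-120s band
```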
---
## Network Requirements
### Initial Model Downloads (One-Time)
| Model | Size | Download Time (1Gbps) |
|-------|------|----------------------|
| **Qwen 2.5 Coder 72B** | ~40GB | 5-10 minutes |
| **Llama 3.3 70B** | ~40GB | 5-10 minutes |
| **Llama 3.2 Vision 11B** | ~7GB | 1-2 minutes |
| **Total** | **~87GB** | **15-25 minutes** |
**Reality:** Download speeds vary, budget 2-4 hours for all models.
**Recommendation:** Download overnight to avoid impacting game server traffic.
---
### Ongoing Bandwidth
**Dify Web Interface:**
- Minimal (text-based queries)
- ~1-5 KB per query
- Negligible impact
**Discord Bot:**
- Text-based queries only
- ~1-5 KB per query
- Negligible impact
**Model Updates:**
- Infrequent (quarterly at most)
- Same as initial download (~87GB)
- Schedule during low-traffic periods
---
## Resource Allocation Strategy
### Priority Levels
**Priority 1 (Always):** Game Servers
**Priority 2 (Normal):** Management Services (Pterodactyl, Gitea, etc.)
**Priority 3 (Emergency Only):** DERP AI Stack
### RAM Allocation Rules
**Normal Operations:**
- Game servers: Up to 240GB
- Management: ~8GB
- AI Stack (idle): ~8GB
- **Total: 256GB** ✅
**DERP Emergency:**
- Game servers: Temporarily limited to 160GB
- Management: ~8GB
- AI Stack (active): ~92GB
- **Total: 260GB** ⚠️ (4GB overcommit acceptable for brief periods)
**If RAM pressure occurs during DERP:**
1. Unload one game server temporarily
2. Run AI query
3. Reload game server
4. **Total downtime per query: <5 minutes**
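The procedure above implies a pre-flight RAM check before loading a model; a minimal sketch, assuming the check is scripted (the ~92GB figure is this document's DERP-active total):

```python
DERP_ACTIVE_GB = 92  # one large model + Dify services + overhead (figure from this doc)

def can_activate_derp(available_gb):
    """True if a 72B model can load without squeezing game servers."""
    return available_gb >= DERP_ACTIVE_GB

# Feed in the available column from `free -g` before running `ollama run ...`
```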
---
## Monitoring & Alerts
### Critical Thresholds
**RAM Usage:**
- **Warning:** >220GB used (85%)
- **Critical:** >240GB used (93%)
- **Action:** Defer DERP usage or unload game server
**CPU Usage:**
- **Warning:** >80% sustained for >5 minutes
- **Critical:** >90% sustained for >2 minutes
- **Action:** Pause AI inference, prioritize game servers
**Storage:**
- **Warning:** >800GB used (80%)
- **Critical:** >900GB used (90%)
- **Action:** Clean up old logs, model cache
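The RAM thresholds above can be encoded for a cron-driven alert; a small sketch using this section's numbers (the alerting hookup itself is left out):

```python
def ram_status(used_gb):
    """Classify RAM usage per the thresholds above (256GB total)."""
    if used_gb > 240:   # >93% critical
        return "critical"
    if used_gb > 220:   # >85% warning
        return "warning"
    return "ok"
```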
### Monitoring Commands
```bash
# Check RAM
free -h
# Check CPU
htop
# Check storage
df -h /
# Check Ollama status
ollama list
ollama ps # Shows loaded models
# Check Dify
cd /opt/dify
docker-compose ps
docker stats # Real-time resource usage
```
---
## Resource Optimization
### Unload Models When Not Needed
```bash
# Unload all models (frees RAM)
ollama stop qwen2.5-coder:72b
ollama stop llama3.3:70b
ollama stop llama3.2-vision:11b
# Verify RAM freed
free -h
```
### Preload Models for Faster Response
```bash
# Preload model (takes ~30 seconds)
ollama run qwen2.5-coder:72b ""
# Model now in RAM, queries will be faster
```
### Schedule Maintenance Windows
**Best time for model downloads/updates:**
- Tuesday/Wednesday 2-6 AM CST (lowest traffic)
- Announce in Discord 24 hours ahead
- Expected downtime: <10 minutes
---
## Capacity Planning
### Current State (Feb 2026)
- **Game servers:** 6 active
- **RAM available:** 256GB
- **Storage available:** 1TB
- **AI stack:** Fits comfortably
### Growth Scenarios
**Scenario 1: Add 6 more game servers (12 total)**
- Additional RAM needed: ~60GB
- Available for AI (normal): 248GB → 188GB ✅
- Available for AI (DERP): 164GB → 104GB ✅
- **Status:** Still viable
**Scenario 2: Add 12 more game servers (18 total)**
- Additional RAM needed: ~120GB
- Available for AI (normal): 248GB → 128GB ✅
- Available for AI (DERP): 164GB → 44GB ⚠️
- **Status:** DERP would require unloading 2 game servers
**Scenario 3: Upgrade to larger models (theoretical)**
- Qwen 3.0 Coder 170B: ~180GB RAM
- **Status:** Would NOT fit alongside game servers
- **Recommendation:** Stick with 72B models
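The scenario arithmetic above generalizes to a small helper; a sketch, assuming ~10GB per additional Minecraft instance (inferred from the "+6 servers → +60GB" line):

```python
TOTAL_GB = 256
IDLE_STACK_GB = 8    # Dify + idle Ollama
DERP_STACK_GB = 92   # one large model loaded
PER_SERVER_GB = 10   # assumed average per additional Minecraft instance

def ai_headroom_gb(extra_servers, derp_active=False):
    """RAM left over after the AI stack and `extra_servers` added game servers."""
    stack = DERP_STACK_GB if derp_active else IDLE_STACK_GB
    return TOTAL_GB - stack - extra_servers * PER_SERVER_GB
```

This reproduces the scenario figures: 188GB and 104GB with 6 extra servers, 44GB under DERP with 12 extra.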
### Upgrade Path
**If TX1 reaches capacity:**
**Option A: Add second dedicated AI server**
- Move AI stack to separate VPS
- TX1 focuses only on game servers
- Cost: ~$100-200/month (NOT DERP-compliant)
**Option B: Upgrade TX1 RAM**
- 256GB → 512GB
- Cost: Contact Hetzner for pricing
- **Preferred:** Maintains DERP compliance
**Option C: Use smaller AI models**
- Qwen 2.5 Coder 32B (~35GB RAM)
- Llama 3.2 8B (~8GB RAM)
- **Tradeoff:** Lower quality, but more capacity
---
## Disaster Recovery
### Backup Strategy
**What to backup:**
- ✅ Dify configuration files
- ✅ Knowledge base data
- ✅ Discord bot code
- ❌ Models (can re-download)
**Backup location:**
- Git repository (for configs/code)
- NC1 Charlotte (for knowledge base)
**Backup frequency:**
- Configurations: After every change
- Knowledge base: Weekly
- Models: No backup needed
### Recovery Procedure
**If TX1 fails completely:**
1. Deploy Dify on NC1 (temporary)
2. Restore knowledge base from backup
3. Re-download models (~4 hours)
4. Point Discord bot to NC1
5. **Downtime: 4-6 hours**
**Note:** This is acceptable for DERP (emergency-only system)
---
## Cost Analysis
### One-Time Costs
- Setup time: 6-8 hours (Michael's time)
- Model downloads: Bandwidth usage (included in hosting)
- **Total: $0** (sweat equity only)
### Monthly Costs
- Hosting: $0 (using existing TX1)
- Bandwidth: $0 (included in hosting)
- Maintenance: ~1 hour/month (Michael's time)
- **Total: $0/month** ✅
### Opportunity Cost
- RAM reserved for AI: ~8GB (idle) or ~92GB (active DERP)
- Could host 1-2 more game servers in that space
- **Acceptable tradeoff:** DERP independence worth more than 2 game servers
---
**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️
**TX1 has the capacity. Resources are allocated wisely. $0 monthly cost maintained.**


@@ -0,0 +1,342 @@
# AI Stack Usage Guide
**Purpose:** Know which AI system to use when
**Last Updated:** 2026-02-18
---
## The Three-Tier System
### Tier 1: Claude Projects (Primary) - **USE THIS FIRST**
**Who:** Michael + Meg
**Where:** claude.ai or Claude app
**Cost:** $20/month (already paying)
**When to use:**
- **Normal daily operations** (99% of the time)
- **Strategic decision-making** (deployment order, architecture)
- **Complex reasoning** (tradeoffs, dependencies)
- **Session continuity** (remembers context across days)
- **Best experience** (fastest, most capable)
**What Claude can do:**
- Search entire 416-file operations manual
- Write deployment scripts
- Review infrastructure decisions
- Generate documentation
- Debug issues
- Plan roadmaps
**Example queries:**
- "Should I deploy Mailcow or AI stack first?"
- "Write a script to deploy Frostwall Protocol"
- "What tasks depend on NC1 cleanup?"
- "Help me troubleshoot this Pterodactyl error"
**Limitations:**
- Requires internet connection
- Subject to Anthropic availability
---
### Tier 2: DERP Backup (Emergency Only) - **WHEN CLAUDE IS DOWN**
**Who:** Michael + Meg
**Where:** https://ai.firefrostgaming.com
**Cost:** $0/month (self-hosted on TX1)
**When to use:**
- **Not for normal operations** (Claude is faster/better)
- **Anthropic outage** (Claude unavailable for hours)
- **Emergency infrastructure decisions** (can't wait for Claude)
- **Critical troubleshooting** (server down, need immediate help)
**What DERP can do:**
- Query indexed operations manual (416 files)
- Strategic reasoning with 128K context
- Infrastructure troubleshooting
- Code generation
- Emergency deployment guidance
**Available models:**
- **Qwen 2.5 Coder 72B** - Infrastructure/coding questions
- **Llama 3.3 70B** - General reasoning
- **Llama 3.2 Vision 11B** - Screenshot analysis
**Example queries:**
- "Claude is down. What's the deployment order for Frostwall?"
- "Emergency: Mailcow not starting. Check logs and diagnose."
- "Need to deploy something NOW. What dependencies are missing?"
**Limitations:**
- Slower inference than Claude
- No session continuity
- Manual model selection
- Uses TX1 resources (~80GB RAM when active)
**How to activate:**
1. Verify Claude is unavailable (try multiple times)
2. Go to https://ai.firefrostgaming.com
3. Select workspace:
- **Operations** - Infrastructure decisions
- **Brainstorming** - Creative work
4. Select model:
- **Qwen 2.5 Coder** - For deployment/troubleshooting
- **Llama 3.3** - For general questions
5. Ask question
6. Copy/paste response as needed
**When to deactivate:**
- Claude comes back online
- Emergency resolved
- Free up TX1 RAM for game servers
---
### Tier 3: Discord Bot (Staff/Subscribers) - **ROUTINE QUERIES**
**Who:** Staff + Subscribers
**Where:** Firefrost Discord server
**Cost:** $0/month (same infrastructure)
**When to use:**
- **Routine questions** (daily operations)
- **Quick lookups** (server status, modpack info)
- **Staff training** (how-to queries)
- **Subscriber support** (basic info)
**Commands:**
**`/ask [question]`**
- Available to: Staff + Subscribers
- Searches: Operations workspace (staff) or public docs (subscribers)
- Rate limit: 10 queries/hour per user
**Example queries (Staff):**
```
/ask How many game servers are running?
/ask What's the Whitelist Manager deployment status?
/ask How do I restart a Minecraft server?
```
**Example queries (Subscribers):**
```
/ask What modpacks are available?
/ask How do I join a server?
/ask What's the difference between Fire and Frost paths?
```
**Role-based access:**
- **Staff:** Full Operations workspace access
- **Subscribers:** Public documentation only
- **No role:** Cannot use bot
**Limitations:**
- Simple queries only (no complex reasoning)
- No file uploads
- No strategic decisions
- Rate limited
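The 10-queries/hour limit is stated here but not shown in the deployment plan's bot script; a minimal sliding-window sketch of how it could be enforced inside bot.py (the class name and window size are assumptions, the limit comes from this guide):

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    def __init__(self, max_calls=10, window_s=3600):
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls = defaultdict(deque)   # user_id -> recent call timestamps

    def allow(self, user_id, now=None):
        """Record a call and return True, or False if the user is over the limit."""
        now = time.monotonic() if now is None else now
        q = self.calls[user_id]
        # Drop timestamps that have aged out of the sliding window
        while q and now - q[0] >= self.window_s:
            q.popleft()
        if len(q) >= self.max_calls:
            return False
        q.append(now)
        return True
```

In the `/ask` handler, a shared `RateLimiter()` would be checked with `limiter.allow(ctx.author.id)` before calling the Dify API.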
---
## Decision Tree
```
Do you need AI assistance?
└─ Is it urgent?
   ├─ NO  → Is Claude working?
   │        ├─ YES → Use Claude Projects
   │        └─ NO  → Wait for Claude (DERP is emergency-only)
   └─ YES → Is Claude available?
            ├─ YES → Use Claude Projects
            └─ NO  → Use DERP Backup
```
**For staff/subscribers:**
```
Simple routine query?
├─ YES → Use Discord bot: /ask
└─ NO  → Escalate to Michael/Meg (they have Claude)
```
---
## Emergency Procedures
### Scenario 1: Claude Down, Need Strategic Decision
**Problem:** Anthropic outage, need to deploy something NOW
**Solution:**
1. Verify Claude truly unavailable (try web + app)
2. Go to https://ai.firefrostgaming.com
3. Login with Michael's account
4. Select Operations workspace
5. Select Qwen 2.5 Coder model
6. Ask strategic question
7. Copy deployment commands
8. Execute carefully (no session memory!)
**Note:** DERP doesn't remember context. Be explicit in each query.
### Scenario 2: Discord Bot Down
**Problem:** Staff reporting bot not responding
**Check status:**
```bash
ssh root@38.68.14.26
systemctl status firefrost-discord-bot
```
**If stopped:**
```bash
systemctl start firefrost-discord-bot
```
**If errors:**
```bash
journalctl -u firefrost-discord-bot -f
# Check for API errors, token issues
```
**If Dify down:**
```bash
cd /opt/dify
docker-compose ps
# If services down:
docker-compose up -d
```
### Scenario 3: Model Won't Load
**Problem:** DERP system reports "model unavailable"
**Check Ollama:**
```bash
ollama list
# Should show: qwen2.5-coder:72b, llama3.3:70b, llama3.2-vision:11b
```
**If models missing:**
```bash
# Re-download
ollama pull qwen2.5-coder:72b
ollama pull llama3.3:70b
ollama pull llama3.2-vision:11b
```
**Check RAM:**
```bash
free -h
# If <90GB free, unload game servers temporarily
```
---
## Cost Tracking
### Monthly Costs
- **Claude Projects:** $20/month (primary system)
- **Dify:** $0/month (self-hosted)
- **Ollama:** $0/month (self-hosted)
- **Discord Bot:** $0/month (self-hosted)
- **Total:** $20/month ✅
### Resource Usage (TX1)
- **Storage:** ~97GB (one-time)
- **RAM (active DERP):** ~92GB (temporary)
- **RAM (idle):** ~8GB (normal; Dify services + idle Ollama)
- **Bandwidth:** Models downloaded once, minimal ongoing
---
## Performance Expectations
### Claude Projects (Primary)
- **Response time:** 5-30 seconds
- **Quality:** Excellent (GPT-4 class)
- **Context:** Full repo (416 files)
- **Session memory:** Yes
### DERP Backup (Emergency)
- **Response time:** 30-120 seconds (slower than Claude)
- **Quality:** Good (GPT-3.5 to GPT-4 class depending on model)
- **Context:** 128K tokens per query
- **Session memory:** No (each query independent)
### Discord Bot (Routine)
- **Response time:** 10-45 seconds
- **Quality:** Good for simple queries
- **Context:** Knowledge base search
- **Rate limit:** 10 queries/hour per user
---
## Best Practices
### For Michael + Meg:
1. **Always use Claude Projects first** (best experience)
2. **Only use DERP for true emergencies** (Claude unavailable)
3. **Document DERP usage** (so Claude can learn from it later)
4. **Free TX1 RAM after DERP use** (restart Ollama if needed)
### For Staff:
1. **Use Discord bot for quick lookups** (fast, simple)
2. **Ask Michael/Meg for complex questions** (they have Claude)
3. **Don't abuse rate limits** (10 queries/hour is generous)
4. **Report bot issues immediately** (don't let it stay broken)
### For Subscribers:
1. **Use Discord bot for server info** (join instructions, modpacks)
2. **Don't ask for staff-only info** (bot will decline)
3. **Be patient** (bot shares resources with staff)
---
## Training & Onboarding
### New Staff Training:
1. Introduce Discord bot commands (`/ask`)
2. Show example queries (moderation, server management)
3. Explain rate limits
4. When to escalate to Michael/Meg
### Subscriber Communication:
1. Announce bot in Discord
2. Pin message with `/ask` command
3. Example queries in welcome channel
4. FAQ: "What can the bot answer?"
---
**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️
**Remember: Claude first, DERP only when necessary, Discord bot for routine queries.**
**Monthly cost: $20 (no increase)**