Task #9: Rewrite AI Stack architecture for DERP compliance
Complete rewrite of self-hosted AI stack (Task #9) with new DERP-compliant architecture.

CHANGES:
- Architecture: AnythingLLM + Open WebUI → Dify + Ollama (DERP-compliant)
- Cost model: $0/month additional (self-hosted on TX1, no external APIs)
- Usage tiers: Claude Projects (primary) → DERP backup (emergency) → Discord bots (staff/subscribers)
- Time estimate: 8-12 hrs → 6-8 hrs (more focused deployment)
- Resource allocation: 97GB storage, 92GB RAM when active (vs 150GB/110GB)

NEW DOCUMENTATION:
- README.md: Complete architecture rewrite with three-tier usage model
- deployment-plan.md: Step-by-step deployment (6 phases, all commands included)
- usage-guide.md: Decision tree for when to use Claude vs DERP vs bots
- resource-requirements.md: TX1 capacity planning, monitoring, disaster recovery

KEY FEATURES:
- Zero additional monthly cost (beyond existing $20 Claude Pro)
- True DERP compliance (fully self-hosted when Claude unavailable)
- Knowledge graph RAG (indexes entire 416-file repo)
- Discord bot integration (role-based staff/subscriber access)
- Emergency procedures documented
- Capacity planning for growth (up to 18 game servers)

MODELS:
- Qwen 2.5 Coder 72B (infrastructure/coding, 128K context)
- Llama 3.3 70B (general reasoning, 128K context)
- Llama 3.2 Vision 11B (screenshot analysis)

Updated tasks.md summary to reflect new architecture.

Status: Ready for deployment (pending medical clearance)

Fire + Frost + Foundation + DERP = True Independence 💙🔥❄️
@@ -178,15 +178,17 @@ Professional @firefrostgaming.com email on NC1. Self-hosted, $120/year saved, el
### 9. Self-Hosted AI Stack on TX1

-**Time:** 8-12 hours (3-4 active, rest downloads)
**Time:** 6-8 hours (3-4 active, rest downloads)
**Status:** BLOCKED - Medical clearance
**Documentation:** `docs/tasks/self-hosted-ai-stack-on-tx1/`

-Dual AI deployment: AnythingLLM (ops) + Open WebUI (staff). DERP backup, unlimited AI access.
DERP-compliant AI infrastructure: Dify + Ollama + self-hosted models. Three-tier usage: Claude Projects (primary) → DERP backup (emergency) → Discord/Wiki bots (staff/subscribers).

**Architecture:** Dify with knowledge graph RAG, Ollama model server
**Models:** Qwen 2.5 Coder 72B, Llama 3.3 70B, Llama 3.2 Vision 11B
-**Storage:** ~150GB
-**RAM:** ~110GB when loaded
**Storage:** ~97GB
**RAM:** ~92GB when DERP activated, ~8GB idle
**Monthly Cost:** $0 (self-hosted, no additional cost beyond Claude Pro)

---
@@ -2,43 +2,167 @@
**Status:** Blocked - Medical clearance
**Priority:** Tier 2 - Major Infrastructure
-**Time:** 8-12 hours (3-4 active, rest downloads)
**Time:** 6-8 hours (3-4 active, rest downloads)
**Location:** TX1 Dallas
-**Last Updated:** 2026-02-16
**Last Updated:** 2026-02-18
**Updated By:** The Chronicler

---

## Overview

-Dual AI deployment: AnythingLLM (Michael/Meg, document-heavy) + Open WebUI (staff assistant). DERP backup, unlimited AI access, staff foundation.

**DERP-compliant AI infrastructure with zero additional monthly cost.**

Three-tier usage model:

1. **Primary:** Claude Projects (best experience, full repo context)
2. **DERP Backup:** Self-hosted when Claude/Anthropic unavailable
3. **Staff/Subscriber Bots:** Discord + Wiki integration

**Monthly Cost:** $0 (beyond existing $20 Claude Pro subscription)

---

## Architecture

-**Primary: AnythingLLM** (ai.firefrostgaming.com)
-- 1,000+ document libraries
-- LanceDB vector database
-- Workspace isolation (Operations, Pokerole, Brainstorming)
-
-**Secondary: Open WebUI** (staff-ai.firefrostgaming.com)
-- Lighter for staff wiki
-- Chroma vector DB
-- ChatGPT-like interface

### Component 1: Dify (RAG Platform)

**URL:** ai.firefrostgaming.com
**Purpose:** Knowledge management, API backend
**Features:**
- Multi-workspace (Operations, Brainstorming)
- Knowledge graph indexing
- Web interface
- Discord bot API
- Repository integration

-## Phases
-**Phase 1:** Deploy stack (1-2 hours)
-**Phase 2:** Load models (6-8 hours overnight)
-**Phase 3:** Document ingestion (2-3 hours active, 6-8 total)

### Component 2: Ollama (Model Server)

**Purpose:** Local model hosting
**Features:**
- Model management
- API compatibility
- Resource optimization

-## Models
-- Qwen 2.5 Coder 72B (~40GB)
-- Llama 3.3 70B (~40GB)
-- Llama 3.2 Vision 11B (~7GB)
-- Embeddings: all-MiniLM-L6-v2 (~400MB)
-
-**Total:** ~150GB storage, ~110GB RAM when loaded

### Component 3: Models (Self-Hosted)

**Qwen 2.5 Coder 72B**
- Purpose: Infrastructure/coding specialist
- Context: 128K tokens
- RAM: ~80GB when loaded
- Storage: ~40GB
- Use: DERP strategic decisions

**Llama 3.3 70B**
- Purpose: General reasoning
- Context: 128K tokens
- RAM: ~80GB when loaded
- Storage: ~40GB
- Use: DERP general queries

**Llama 3.2 Vision 11B**
- Purpose: Screenshot/image analysis
- RAM: ~7GB when loaded
- Storage: ~7GB
- Use: Visual troubleshooting

### Component 4: Discord Bot

**Purpose:** Staff/subscriber interface
**Features:**
- Role-based access (staff vs subscribers)
- Calls Dify API
- Commands: `/ask`, `/operations`, `/brainstorm`

---

## Usage Model

### Tier 1: Claude Projects (Primary)
**When:** Normal operations
**Experience:** Best (full repo context, session continuity)
**Cost:** $20/month (already paying)

### Tier 2: DERP Backup (Emergency)
**When:** Claude/Anthropic outage
**Experience:** Functional (knowledge graph + 128K context)
**Cost:** $0/month (self-hosted)

### Tier 3: Staff/Subscriber Bots
**When:** Routine queries in Discord/Wiki
**Experience:** Fast, simple
**Cost:** $0/month (same infrastructure)
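The tier choice above reduces to a simple rule. A minimal sketch of that decision tree (illustrative only; the function and argument names are invented for this example, the conditions mirror this section):

```python
def choose_tier(claude_available: bool, routine_query: bool) -> str:
    """Pick a usage tier per the three-tier model above."""
    if routine_query:
        # Tier 3 handles routine Discord/Wiki questions regardless of Claude status
        return "Tier 3: Staff/Subscriber Bots"
    if claude_available:
        # Best experience: full repo context, session continuity
        return "Tier 1: Claude Projects"
    # Claude/Anthropic outage: fall back to the self-hosted DERP stack
    return "Tier 2: DERP Backup"

print(choose_tier(claude_available=False, routine_query=False))
# Tier 2: DERP Backup
```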

---

## Resource Requirements

### Storage (TX1 has 1TB)
- Qwen 2.5 Coder 72B: ~40GB
- Llama 3.3 70B: ~40GB
- Llama 3.2 Vision 11B: ~7GB
- Dify + services: ~10GB
- **Total: ~97GB** ✅

### RAM (TX1 has 256GB)

**DERP Activated (one large model loaded):**
- Model: ~80GB (Qwen OR Llama 3.3)
- Dify services: ~4GB
- Overhead: ~8GB
- **Total: ~92GB** ✅

**Normal Operations (models idle):**
- Minimal RAM usage
- Available for game servers

### CPU
- 32 vCPU available
- Inference slower than API
- Functional for emergency use

---

## Deployment Phases

### Phase 1: Core Stack (2-3 hours)
1. Deploy Dify via Docker Compose
2. Install Ollama
3. Download models (overnight - large files)
4. Configure workspaces
5. Index Git repository

### Phase 2: Discord Bot (2-3 hours)
1. Create Python bot
2. Connect to Dify API
3. Implement role-based access
4. Test in Discord

### Phase 3: Documentation (1 hour)
1. Usage guide (when to use what)
2. Emergency DERP procedures
3. Discord bot commands
4. Staff training materials

**Total Time:** 6-8 hours (active work)

---

## Success Criteria
-- ✅ Both stacks deployed
- ✅ Dify deployed and indexing repo
- ✅ Models loaded and operational
-- ✅ Documents ingested (Ops, Pokerole, Brainstorming)
-- ✅ DERP backup functional
- ✅ DERP backup tested (strategic query without Claude)
- ✅ Discord bot functional (staff + subscriber access)
- ✅ Documentation complete
- ✅ Zero additional monthly cost

-**See:** deployment-plan.md for detailed phases

---

-**Fire + Frost + Foundation** 💙🔥❄️

## Related Documentation

- **deployment-plan.md** - Step-by-step deployment guide
- **usage-guide.md** - When to use Claude vs DERP vs bots
- **resource-requirements.md** - Detailed TX1 resource allocation
- **discord-bot-setup.md** - Bot configuration and commands

---

**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️

**Monthly Cost: $20 (no increase from current)**
500
docs/tasks/self-hosted-ai-stack-on-tx1/deployment-plan.md
Normal file
@@ -0,0 +1,500 @@
# Self-Hosted AI Stack - Deployment Plan

**Task:** Self-Hosted AI Stack on TX1
**Location:** TX1 Dallas (38.68.14.26)
**Total Time:** 6-8 hours (3-4 active, rest overnight downloads)
**Last Updated:** 2026-02-18

---

## Prerequisites

### Before Starting
- [ ] SSH access to TX1
- [ ] Docker installed on TX1
- [ ] Docker Compose installed
- [ ] Sufficient storage (~100GB free)
- [ ] No game servers under heavy load (model downloads are bandwidth-intensive)

### Domain Configuration
- [ ] DNS A record: ai.firefrostgaming.com → 38.68.14.26
- [ ] SSL certificate ready (Let's Encrypt)

---

## Phase 1: Deploy Dify (2-3 hours)

### Step 1.1: Create Directory Structure

```bash
ssh root@38.68.14.26
cd /opt
mkdir -p dify
cd dify
```

### Step 1.2: Download Dify Docker Compose

```bash
wget https://raw.githubusercontent.com/langgenius/dify/main/docker/docker-compose.yaml
```

### Step 1.3: Configure Environment

```bash
# Create .env file
cat > .env << 'EOF'
# Dify Configuration
DIFY_VERSION=0.6.0
API_URL=https://ai.firefrostgaming.com
WEB_API_URL=https://ai.firefrostgaming.com

# Database
POSTGRES_PASSWORD=<generate_secure_password>
POSTGRES_DB=dify

# Redis
REDIS_PASSWORD=<generate_secure_password>

# Secret Key (generate with: openssl rand -base64 32)
SECRET_KEY=<generate_secret_key>

# Storage
STORAGE_TYPE=local
STORAGE_LOCAL_PATH=/app/storage
EOF
```

### Step 1.4: Deploy Dify

```bash
docker-compose up -d
```

**Wait:** 5-10 minutes for all services to start

### Step 1.5: Verify Deployment

```bash
docker-compose ps
# All services should show "Up"

curl http://localhost/health
# Should return: {"status":"ok"}
```

### Step 1.6: Configure Nginx Reverse Proxy

```bash
# Create Nginx config
cat > /etc/nginx/sites-available/ai.firefrostgaming.com << 'EOF'
server {
    listen 80;
    server_name ai.firefrostgaming.com;

    location / {
        proxy_pass http://localhost:80;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
EOF

# Enable site
ln -s /etc/nginx/sites-available/ai.firefrostgaming.com /etc/nginx/sites-enabled/
nginx -t
systemctl reload nginx

# Get SSL certificate
certbot --nginx -d ai.firefrostgaming.com
```

### Step 1.7: Initial Configuration

1. Visit https://ai.firefrostgaming.com
2. Create admin account (Michael)
3. Configure workspaces:
   - **Operations** (infrastructure docs)
   - **Brainstorming** (creative docs)

---

## Phase 2: Install Ollama and Models (Overnight)

### Step 2.1: Install Ollama

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

### Step 2.2: Download Models (Overnight - Large Files)

**Download Qwen 2.5 Coder 72B:**
```bash
ollama pull qwen2.5-coder:72b
```
**Size:** ~40GB
**Time:** 2-4 hours (depending on connection)

**Download Llama 3.3 70B:**
```bash
ollama pull llama3.3:70b
```
**Size:** ~40GB
**Time:** 2-4 hours

**Download Llama 3.2 Vision 11B:**
```bash
ollama pull llama3.2-vision:11b
```
**Size:** ~7GB
**Time:** 30-60 minutes

**Total download time:** 6-8 hours (run overnight)

### Step 2.3: Verify Models

```bash
ollama list
# Should show all three models

# Test Qwen
ollama run qwen2.5-coder:72b "Write a bash script to check disk space"
# Should generate script

# Test Llama 3.3
ollama run llama3.3:70b "Explain Firefrost Gaming's Fire + Frost philosophy"
# Should respond

# Test Vision
ollama run llama3.2-vision:11b "Describe this image: /path/to/test/image.jpg"
# Should analyze image
```

### Step 2.4: Configure Ollama as Dify Backend

In Dify web interface:
1. Go to Settings → Model Providers
2. Add Ollama provider
3. URL: http://localhost:11434
4. Add models:
   - qwen2.5-coder:72b
   - llama3.3:70b
   - llama3.2-vision:11b
5. Set Qwen as default for coding queries
6. Set Llama 3.3 as default for general queries

---

## Phase 3: Index Git Repository (1-2 hours)

### Step 3.1: Clone Operations Manual to TX1

```bash
cd /opt/dify
git clone https://git.firefrostgaming.com/firefrost-gaming/firefrost-operations-manual.git
```

### Step 3.2: Configure Dify Knowledge Base

**Operations Workspace:**
1. In Dify, go to Operations workspace
2. Create Knowledge Base: "Infrastructure Docs"
3. Upload folder: `/opt/dify/firefrost-operations-manual/docs/`
4. Processing: Automatic chunking with Q&A segmentation
5. Embedding model: Default (all-MiniLM-L6-v2)

**Brainstorming Workspace:**
1. Go to Brainstorming workspace
2. Create Knowledge Base: "Creative Docs"
3. Upload folder: `/opt/dify/firefrost-operations-manual/docs/planning/`
4. Same processing settings

**Wait:** 30-60 minutes for indexing (416 files)

### Step 3.3: Test Knowledge Retrieval

In Operations workspace:
- Query: "What is the Frostwall Protocol?"
- Should return relevant docs with citations

In Brainstorming workspace:
- Query: "What is the Terraria branding training arc?"
- Should return planning docs

---

## Phase 4: Discord Bot (2-3 hours)

### Step 4.1: Create Bot on Discord Developer Portal

1. Go to https://discord.com/developers/applications
2. Create new application: "Firefrost AI Assistant"
3. Go to Bot section
4. Create bot
5. Copy bot token
6. Enable Privileged Gateway Intents:
   - Message Content Intent
   - Server Members Intent

### Step 4.2: Install Bot Code on TX1

```bash
cd /opt
mkdir -p firefrost-discord-bot
cd firefrost-discord-bot

# Create requirements.txt
cat > requirements.txt << 'EOF'
discord.py==2.3.2
aiohttp==3.9.1
python-dotenv==1.0.0
EOF

# Create virtual environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

### Step 4.3: Create Bot Script

```bash
cat > bot.py << 'EOF'
import discord
from discord.ext import commands
import aiohttp
import os
from dotenv import load_dotenv

load_dotenv()

TOKEN = os.getenv('DISCORD_TOKEN')
DIFY_API_URL = os.getenv('DIFY_API_URL')
DIFY_API_KEY = os.getenv('DIFY_API_KEY')

intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix='/', intents=intents)

@bot.event
async def on_ready():
    print(f'{bot.user} is now running!')

@bot.command(name='ask')
async def ask(ctx, *, question):
    """Ask the AI a question"""
    # Check user roles
    is_staff = any(role.name in ['Staff', 'Admin'] for role in ctx.author.roles)
    is_subscriber = any(role.name == 'Subscriber' for role in ctx.author.roles)

    if not (is_staff or is_subscriber):
        await ctx.send("You need Staff or Subscriber role to use this command.")
        return

    # Determine workspace based on role
    workspace = 'operations' if is_staff else 'general'

    await ctx.send("🤔 Thinking...")

    async with aiohttp.ClientSession() as session:
        async with session.post(
            f'{DIFY_API_URL}/v1/chat-messages',
            headers={
                'Authorization': f'Bearer {DIFY_API_KEY}',
                'Content-Type': 'application/json'
            },
            json={
                'inputs': {},
                'query': question,
                'response_mode': 'blocking',  # single JSON response instead of streaming
                'user': str(ctx.author.id),
                'conversation_id': '',
                'workspace': workspace
            }
        ) as resp:
            if resp.status == 200:
                data = await resp.json()
                answer = data.get('answer', 'No response')

                # Split long responses (Discord's message limit is 2000 chars)
                if len(answer) > 2000:
                    chunks = [answer[i:i+2000] for i in range(0, len(answer), 2000)]
                    for chunk in chunks:
                        await ctx.send(chunk)
                else:
                    await ctx.send(answer)
            else:
                await ctx.send("❌ Error connecting to AI. Please try again.")

bot.run(TOKEN)
EOF
```

### Step 4.4: Configure Bot

```bash
# Create .env file
cat > .env << 'EOF'
DISCORD_TOKEN=<your_bot_token>
DIFY_API_URL=https://ai.firefrostgaming.com
DIFY_API_KEY=<get_from_dify_settings>
EOF
```

### Step 4.5: Create Systemd Service

```bash
cat > /etc/systemd/system/firefrost-discord-bot.service << 'EOF'
[Unit]
Description=Firefrost Discord Bot
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/opt/firefrost-discord-bot
Environment="PATH=/opt/firefrost-discord-bot/venv/bin"
ExecStart=/opt/firefrost-discord-bot/venv/bin/python bot.py
Restart=always

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable firefrost-discord-bot
systemctl start firefrost-discord-bot
```

### Step 4.6: Invite Bot to Discord

1. Go to OAuth2 → URL Generator
2. Select scopes: bot, applications.commands
3. Select permissions: Send Messages, Read Message History
4. Copy generated URL
5. Open in browser and invite to Firefrost Discord

### Step 4.7: Test Bot

In Discord:
```
/ask What is the Frostwall Protocol?
```
Should return an answer from the Operations workspace (staff only).

---

## Phase 5: Testing and Validation (30 minutes)

### Test 1: DERP Backup (Strategic Query)

**Simulate Claude outage:**
1. Load Qwen model: `ollama run qwen2.5-coder:72b`
2. In Dify Operations workspace, ask:
   - "Should I deploy Mailcow before or after Frostwall Protocol?"
3. Verify:
   - Response references both task docs
   - Shows dependency understanding
   - Recommends Frostwall first

### Test 2: Discord Bot (Staff Query)

As staff member in Discord:
```
/ask How many game servers are running?
```
Should return infrastructure details.

### Test 3: Discord Bot (Subscriber Query)

As subscriber in Discord:
```
/ask What modpacks are available?
```
Should return the modpack list (limited to public info).

### Test 4: Resource Monitoring

```bash
# Check RAM usage with model loaded
free -h
# Should show ~92GB used when Qwen loaded

# Check disk usage
df -h /opt/dify
# Should show ~97GB used

# Check Docker containers
docker ps
# All Dify services should be running
```

---

## Phase 6: Documentation (1 hour)

### Create Usage Guide

Document at `/opt/dify/USAGE-GUIDE.md`:
- When to use Claude (primary)
- When to use DERP (Claude down)
- When to use Discord bot (routine queries)
- Emergency procedures

### Update Operations Manual

Commit changes to Git:
- Task documentation updated
- Deployment plan complete
- Usage guide created

---

## Success Criteria Checklist

- [ ] Dify deployed and accessible at https://ai.firefrostgaming.com
- [ ] Ollama running with all 3 models loaded
- [ ] Operations workspace indexing complete (416 files)
- [ ] Brainstorming workspace indexing complete
- [ ] DERP backup tested (strategic query works)
- [ ] Discord bot deployed and running
- [ ] Staff can query via Discord (/ask command)
- [ ] Subscribers have limited access
- [ ] Resource usage within TX1 limits (~92GB RAM, ~97GB storage)
- [ ] Documentation complete and committed to Git
- [ ] Zero additional monthly cost confirmed

---

## Rollback Plan

If deployment fails:

```bash
# Stop all services
cd /opt/dify
docker-compose down

# Stop Discord bot
systemctl stop firefrost-discord-bot
systemctl disable firefrost-discord-bot

# Remove installation
rm -rf /opt/dify
rm -rf /opt/firefrost-discord-bot
rm /etc/systemd/system/firefrost-discord-bot.service
systemctl daemon-reload

# Remove Nginx config
rm /etc/nginx/sites-enabled/ai.firefrostgaming.com
rm /etc/nginx/sites-available/ai.firefrostgaming.com
nginx -t && systemctl reload nginx

# Uninstall Ollama (no uninstall script ships; remove service and binary manually)
systemctl stop ollama
systemctl disable ollama
rm /etc/systemd/system/ollama.service
rm /usr/local/bin/ollama
rm -rf /usr/share/ollama
```

---

**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️
367
docs/tasks/self-hosted-ai-stack-on-tx1/resource-requirements.md
Normal file
@@ -0,0 +1,367 @@
# AI Stack Resource Requirements

**Server:** TX1 Dallas (38.68.14.26)
**Purpose:** Resource allocation planning
**Last Updated:** 2026-02-18

---

## TX1 Server Specifications

**CPU:** 32 vCPU
**RAM:** 256GB
**Storage:** 1TB NVMe SSD
**Location:** Dallas, TX
**Network:** 1Gbps

**Current Usage (before AI stack):**
- Game servers: 6 Minecraft instances
- Management services: Minimal overhead
- Available for AI: Significant capacity

---

## Storage Requirements

### Component Breakdown

| Component | Size | Purpose |
|-----------|------|---------|
| **Qwen 2.5 Coder 72B** | ~40GB | Infrastructure/coding model |
| **Llama 3.3 70B** | ~40GB | General reasoning model |
| **Llama 3.2 Vision 11B** | ~7GB | Image analysis model |
| **Dify Services** | ~5GB | Docker containers, databases |
| **Knowledge Base** | ~5GB | Indexed docs, embeddings |
| **Logs & Temp** | ~2GB | Operational overhead |
| **Total** | **~99GB** | ✅ Well under 1TB limit |

### Storage Growth Estimate

**Year 1:**
- Models: 87GB (static, no growth unless upgrading)
- Knowledge base: 5GB → 8GB (as docs grow)
- Logs: 2GB → 5GB (6-month rotation)
- **Total Year 1:** ~100GB

**Storage is NOT a concern.**
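The breakdown above can be sanity-checked by summing the table (sizes are the table's approximations, nothing here is measured):

```python
# Approximate sizes in GB, taken from the component breakdown table
storage_gb = {
    "qwen2.5-coder:72b": 40,
    "llama3.3:70b": 40,
    "llama3.2-vision:11b": 7,
    "dify_services": 5,
    "knowledge_base": 5,
    "logs_and_temp": 2,
}

total = sum(storage_gb.values())
print(f"Total: ~{total}GB ({total / 1000:.0%} of TX1's 1TB)")
# Total: ~99GB (10% of TX1's 1TB)
```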

---

## RAM Requirements

### Scenario 1: Normal Operations (Claude Available)

| Component | RAM Usage |
|-----------|-----------|
| **Dify Services** | ~4GB |
| **PostgreSQL** | ~2GB |
| **Redis** | ~1GB |
| **Ollama (idle)** | <1GB |
| **Total (idle)** | **~8GB** ✅ |

**Game servers have ~248GB available** (256GB - 8GB)

---

### Scenario 2: DERP Activated (Claude Down, Emergency)

**Load ONE large model at a time:**

| Component | RAM Usage |
|-----------|-----------|
| **Qwen 2.5 Coder 72B** OR **Llama 3.3 70B** | ~80GB |
| **Dify Services** | ~4GB |
| **PostgreSQL** | ~2GB |
| **Redis** | ~1GB |
| **Ollama Runtime** | ~2GB |
| **OS Overhead** | ~3GB |
| **Total (active DERP)** | **~92GB** ✅ |

**Game servers have ~164GB available** (256GB - 92GB)

**Critical:** DO NOT load both large models simultaneously (~160GB of model RAM alone would crowd out the game servers)

---

### Scenario 3: Vision Model Only (Screenshot Analysis)

| Component | RAM Usage |
|-----------|-----------|
| **Llama 3.2 Vision 11B** | ~7GB |
| **Dify Services** | ~4GB |
| **Other Services** | ~3GB |
| **Total** | **~14GB** ✅ |

**Very lightweight; can run alongside game servers with no impact.**

---

## CPU Requirements

### Model Inference Performance

**TX1 has 32 vCPU (shared among all services)**

**Expected Inference Times:**

| Model | Token Generation Speed | Typical Response |
|-------|----------------------|------------------|
| **Qwen 2.5 Coder 72B** | ~3-5 tokens/second | 30-120 seconds |
| **Llama 3.3 70B** | ~3-5 tokens/second | 30-120 seconds |
| **Llama 3.2 Vision 11B** | ~8-12 tokens/second | 10-45 seconds |

**For comparison:**
- Claude API: 20-40 tokens/second
- **DERP is 5-10× slower** (this is expected and acceptable for emergency use)

**CPU Impact on Game Servers:**
- During DERP inference: ~70-80% CPU usage (temporary spikes)
- Game servers may experience brief lag during AI responses
- **Acceptable for emergency use** (not for normal operations)
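The response-time column follows directly from token speed. A quick estimate (speeds are the rough CPU figures from the table above, not benchmarks):

```python
def response_seconds(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock estimate for generating `tokens` output tokens."""
    return tokens / tokens_per_second

# A ~400-token answer from a 72B model at the table's 3-5 tok/s range:
fast = response_seconds(400, 5)  # 80.0 seconds
slow = response_seconds(400, 3)  # ~133 seconds
print(f"~{fast:.0f}-{slow:.0f} seconds per answer")
```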

---

## Network Requirements

### Initial Model Downloads (One-Time)

| Model | Size | Download Time (1Gbps) |
|-------|------|----------------------|
| **Qwen 2.5 Coder 72B** | ~40GB | 5-10 minutes |
| **Llama 3.3 70B** | ~40GB | 5-10 minutes |
| **Llama 3.2 Vision 11B** | ~7GB | 1-2 minutes |
| **Total** | **~87GB** | **15-25 minutes** |

**Reality:** Download speeds vary; budget 2-4 hours for all models.

**Recommendation:** Download overnight to avoid impacting game server traffic.
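The gap between the table's theoretical times and the 2-4 hour budget is just sustained throughput. The math for the full ~87GB model set (link rates are illustrative assumptions, not measurements):

```python
def download_minutes(size_gb: float, rate_mbps: float) -> float:
    """Minutes to transfer size_gb at a sustained rate in megabits/second."""
    return size_gb * 8 * 1000 / rate_mbps / 60

print(f"{download_minutes(87, 1000):.0f} min at a full 1 Gbps")        # ~12 min
print(f"{download_minutes(87, 100) / 60:.1f} h at 100 Mbps sustained")  # ~1.9 h
```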

---

### Ongoing Bandwidth

**Dify Web Interface:**
- Minimal (text-based queries)
- ~1-5 KB per query
- Negligible impact

**Discord Bot:**
- Text-based queries only
- ~1-5 KB per query
- Negligible impact

**Model Updates:**
- Infrequent (quarterly at most)
- Same as initial download (~87GB)
- Schedule during low-traffic periods

---

## Resource Allocation Strategy

### Priority Levels

**Priority 1 (Always):** Game Servers
**Priority 2 (Normal):** Management Services (Pterodactyl, Gitea, etc.)
**Priority 3 (Emergency Only):** DERP AI Stack

### RAM Allocation Rules

**Normal Operations:**
- Game servers: Up to 240GB
- Management: ~8GB
- AI Stack (idle): ~8GB
- **Total: 256GB** ✅

**DERP Emergency:**
- Game servers: Temporarily limited to 160GB
- Management: ~8GB
- AI Stack (active): ~92GB
- **Total: 260GB** ⚠️ (4GB overcommit acceptable for brief periods)

**If RAM pressure occurs during DERP:**
1. Unload one game server temporarily
2. Run AI query
3. Reload game server
4. **Total downtime per query: <5 minutes**
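The two budgets above can be checked with a small helper (all figures are this section's estimates; the function is illustrative, not deployed tooling):

```python
def ram_budget(game_gb: int, mgmt_gb: int = 8, ai_gb: int = 8,
               capacity_gb: int = 256) -> tuple[int, int]:
    """Return (total_gb, overcommit_gb) for a TX1 RAM allocation."""
    total = game_gb + mgmt_gb + ai_gb
    return total, max(0, total - capacity_gb)

print(ram_budget(240))            # Normal ops: (256, 0) - fits exactly
print(ram_budget(160, ai_gb=92))  # DERP active: (260, 4) - brief 4GB overcommit
```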

---

## Monitoring & Alerts

### Critical Thresholds

**RAM Usage:**
- **Warning:** >220GB used (85%)
- **Critical:** >240GB used (93%)
- **Action:** Defer DERP usage or unload game server

**CPU Usage:**
- **Warning:** >80% sustained for >5 minutes
- **Critical:** >90% sustained for >2 minutes
- **Action:** Pause AI inference, prioritize game servers

**Storage:**
- **Warning:** >800GB used (80%)
- **Critical:** >900GB used (90%)
- **Action:** Clean up old logs, model cache
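These thresholds map onto a one-line classifier; a sketch of the alerting logic (threshold numbers are from this section, the function itself is hypothetical):

```python
def status(used: float, warning: float, critical: float) -> str:
    """Classify a resource reading against warning/critical thresholds."""
    if used > critical:
        return "CRITICAL"
    if used > warning:
        return "WARNING"
    return "OK"

print(status(230, warning=220, critical=240))  # RAM at 230GB -> WARNING
print(status(910, warning=800, critical=900))  # Storage at 910GB -> CRITICAL
```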
|
||||
|
||||
### Monitoring Commands
|
||||
|
||||
```bash
|
||||
# Check RAM
|
||||
free -h
|
||||
|
||||
# Check CPU
|
||||
htop
|
||||
|
||||
# Check storage
|
||||
df -h /
|
||||
|
||||
# Check Ollama status
|
||||
ollama list
|
||||
ollama ps # Shows loaded models
|
||||
|
||||
# Check Dify
|
||||
cd /opt/dify
|
||||
docker-compose ps
|
||||
docker stats # Real-time resource usage
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Resource Optimization
|
||||
|
||||
### Unload Models When Not Needed
|
||||
|
||||
```bash
|
||||
# Unload all models (frees RAM)
|
||||
ollama stop qwen2.5-coder:72b
|
||||
ollama stop llama3.3:70b
|
||||
ollama stop llama3.2-vision:11b
|
||||
|
||||
# Verify RAM freed
|
||||
free -h
|
||||
```
|
||||
|
||||
### Preload Models for Faster Response
|
||||
|
||||
```bash
|
||||
# Preload model (takes ~30 seconds)
|
||||
ollama run qwen2.5-coder:72b ""
|
||||
# Model now in RAM, queries will be faster
|
||||
```
|
||||
|
||||
### Schedule Maintenance Windows
|
||||
|
||||
**Best time for model downloads/updates:**
|
||||
- Tuesday/Wednesday 2-6 AM CST (lowest traffic)
|
||||
- Announce in Discord 24 hours ahead
|
||||
- Expected downtime: <10 minutes
|
||||
|
||||
---
|
||||
|
||||
## Capacity Planning

### Current State (Feb 2026)
- **Game servers:** 6 active
- **RAM available:** 256GB
- **Storage available:** 1TB
- **AI stack:** Fits comfortably

### Growth Scenarios

**Scenario 1: Add 6 more game servers (12 total)**
- Additional RAM needed: ~60GB
- Available for AI (normal): 248GB → 188GB ✅
- Available for AI (DERP): 164GB → 104GB ✅
- **Status:** Still viable

**Scenario 2: Add 12 more game servers (18 total)**
- Additional RAM needed: ~120GB
- Available for AI (normal): 248GB → 128GB ✅
- Available for AI (DERP): 164GB → 44GB ⚠️
- **Status:** DERP would require unloading 2 game servers

**Scenario 3: Upgrade to larger models (theoretical)**
- Qwen 3.0 Coder 170B: ~180GB RAM
- **Status:** Would NOT fit alongside game servers
- **Recommendation:** Stick with 72B models
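The scenario arithmetic above is just the baseline "available" figures minus the extra game-server RAM, and can be sanity-checked in a couple of lines of shell:

```shell
# Sanity-check scenarios 1 and 2: baseline available RAM minus extra game RAM
normal_base=248   # GB available for AI in normal operation
derp_base=164     # GB available for AI during active DERP
for extra in 60 120; do
    echo "+${extra}GB game servers -> normal=$((normal_base - extra))GB derp=$((derp_base - extra))GB"
done
# -> +60GB game servers -> normal=188GB derp=104GB
# -> +120GB game servers -> normal=128GB derp=44GB
```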
### Upgrade Path

**If TX1 reaches capacity:**

**Option A: Add second dedicated AI server**
- Move AI stack to separate VPS
- TX1 focuses only on game servers
- Cost: ~$100-200/month (NOT DERP-compliant)

**Option B: Upgrade TX1 RAM**
- 256GB → 512GB
- Cost: Contact Hetzner for pricing
- **Preferred:** Maintains DERP compliance

**Option C: Use smaller AI models**
- Qwen 2.5 Coder 32B (~35GB RAM)
- Llama 3.1 8B (~8GB RAM)
- **Tradeoff:** Lower quality, but more capacity
---
## Disaster Recovery

### Backup Strategy

**What to back up:**
- ✅ Dify configuration files
- ✅ Knowledge base data
- ✅ Discord bot code
- ❌ Models (can re-download)

**Backup location:**
- Git repository (for configs/code)
- NC1 Charlotte (for knowledge base)

**Backup frequency:**
- Configurations: After every change
- Knowledge base: Weekly
- Models: No backup needed
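The weekly knowledge-base backup can be scripted. A minimal sketch; the `/opt/dify` source path and the NC1 destination shown in the comment are assumptions:

```shell
#!/bin/sh
# Weekly backup sketch for the checked items above. Models are deliberately
# excluded (re-downloadable). Source path and NC1 destination are assumptions.
backup_ai_stack() {
    src=$1      # e.g. /opt/dify (configs + knowledge base volumes)
    outdir=$2   # local staging directory
    out="${outdir}/dify-$(date +%Y%m%d).tar.gz"
    tar czf "$out" -C "$src" . && echo "$out"
    # then ship off-box, e.g.: scp "$out" backup@nc1:/backups/ai-stack/
}
```

Keeping the archive step separate from the off-box copy means a failed transfer never leaves you without a local snapshot.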
### Recovery Procedure

**If TX1 fails completely:**

1. Deploy Dify on NC1 (temporary)
2. Restore knowledge base from backup
3. Re-download models (~4 hours)
4. Point Discord bot to NC1
5. **Downtime: 4-6 hours**

**Note:** This is acceptable for DERP (emergency-only system)
---
## Cost Analysis

### One-Time Costs
- Setup time: 6-8 hours (Michael's time)
- Model downloads: Bandwidth usage (included in hosting)
- **Total: $0** (sweat equity only)

### Monthly Costs
- Hosting: $0 (using existing TX1)
- Bandwidth: $0 (included in hosting)
- Maintenance: ~1 hour/month (Michael's time)
- **Total: $0/month** ✅

### Opportunity Cost
- RAM reserved for AI: ~8GB (idle) or ~92GB (active DERP)
- Could host 1-2 more game servers in that space
- **Acceptable tradeoff:** DERP independence worth more than 2 game servers

---

**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️

**TX1 has the capacity. Resources are allocated wisely. $0 monthly cost maintained.**
docs/tasks/self-hosted-ai-stack-on-tx1/usage-guide.md (new file, 342 lines)
@@ -0,0 +1,342 @@
# AI Stack Usage Guide

**Purpose:** Know which AI system to use when
**Last Updated:** 2026-02-18

---

## The Three-Tier System

### Tier 1: Claude Projects (Primary) - **USE THIS FIRST**

**Who:** Michael + Meg
**Where:** claude.ai or Claude app
**Cost:** $20/month (already paying)

**When to use:**
- ✅ **Normal daily operations** (99% of the time)
- ✅ **Strategic decision-making** (deployment order, architecture)
- ✅ **Complex reasoning** (tradeoffs, dependencies)
- ✅ **Session continuity** (remembers context across days)
- ✅ **Best experience** (fastest, most capable)

**What Claude can do:**
- Search entire 416-file operations manual
- Write deployment scripts
- Review infrastructure decisions
- Generate documentation
- Debug issues
- Plan roadmaps

**Example queries:**
- "Should I deploy Mailcow or the AI stack first?"
- "Write a script to deploy Frostwall Protocol"
- "What tasks depend on NC1 cleanup?"
- "Help me troubleshoot this Pterodactyl error"

**Limitations:**
- Requires internet connection
- Subject to Anthropic availability

---
### Tier 2: DERP Backup (Emergency Only) - **WHEN CLAUDE IS DOWN**

**Who:** Michael + Meg
**Where:** https://ai.firefrostgaming.com
**Cost:** $0/month (self-hosted on TX1)

**When to use:**
- ❌ **Not for normal operations** (Claude is faster/better)
- ✅ **Anthropic outage** (Claude unavailable for hours)
- ✅ **Emergency infrastructure decisions** (can't wait for Claude)
- ✅ **Critical troubleshooting** (server down, need immediate help)

**What DERP can do:**
- Query indexed operations manual (416 files)
- Strategic reasoning with 128K context
- Infrastructure troubleshooting
- Code generation
- Emergency deployment guidance

**Available models:**
- **Qwen 2.5 Coder 72B** - Infrastructure/coding questions
- **Llama 3.3 70B** - General reasoning
- **Llama 3.2 Vision 11B** - Screenshot analysis

**Example queries:**
- "Claude is down. What's the deployment order for Frostwall?"
- "Emergency: Mailcow not starting. Check logs and diagnose."
- "Need to deploy something NOW. What dependencies are missing?"

**Limitations:**
- Slower inference than Claude
- No session continuity
- Manual model selection
- Uses TX1 resources (~92GB RAM when active)

**How to activate:**
1. Verify Claude is unavailable (try multiple times)
2. Go to https://ai.firefrostgaming.com
3. Select workspace:
   - **Operations** - Infrastructure decisions
   - **Brainstorming** - Creative work
4. Select model:
   - **Qwen 2.5 Coder** - For deployment/troubleshooting
   - **Llama 3.3** - For general questions
5. Ask question
6. Copy/paste response as needed

**When to deactivate:**
- Claude comes back online
- Emergency resolved
- Free up TX1 RAM for game servers
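If the web UI itself is unreachable but TX1 is up, the models can still be queried over Ollama's local HTTP API. A sketch; the prompt is a placeholder and the failure message is illustrative:

```shell
# Fallback: hit Ollama's local API directly (default port 11434) when the
# Dify web UI is down. The model name must match `ollama list` output.
payload='{"model": "qwen2.5-coder:72b", "prompt": "Summarize the Frostwall deployment order.", "stream": false}'
curl -s --max-time 10 http://localhost:11434/api/generate -d "$payload" \
    || echo "ollama not reachable -- check: systemctl status ollama"
```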
---
### Tier 3: Discord Bot (Staff/Subscribers) - **ROUTINE QUERIES**

**Who:** Staff + Subscribers
**Where:** Firefrost Discord server
**Cost:** $0/month (same infrastructure)

**When to use:**
- ✅ **Routine questions** (daily operations)
- ✅ **Quick lookups** (server status, modpack info)
- ✅ **Staff training** (how-to queries)
- ✅ **Subscriber support** (basic info)

**Commands:**

**`/ask [question]`**
- Available to: Staff + Subscribers
- Searches: Operations workspace (staff) or public docs (subscribers)
- Rate limit: 10 queries/hour per user

**Example queries (Staff):**
```
/ask How many game servers are running?
/ask What's the Whitelist Manager deployment status?
/ask How do I restart a Minecraft server?
```

**Example queries (Subscribers):**
```
/ask What modpacks are available?
/ask How do I join a server?
/ask What's the difference between Fire and Frost paths?
```

**Role-based access:**
- **Staff:** Full Operations workspace access
- **Subscribers:** Public documentation only
- **No role:** Cannot use bot

**Limitations:**
- Simple queries only (no complex reasoning)
- No file uploads
- No strategic decisions
- Rate limited
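The 10-queries/hour limit needs nothing more than an hourly counter file. A minimal sketch of the limiting logic only; the bot wiring is not shown, and the `/tmp` path is an assumption:

```shell
# allow_query USER -> succeeds (exit 0) if USER is under 10 queries this hour.
# Counter files are keyed by hour, so limits roll over naturally.
allow_query() {
    user=$1
    limit=10
    bucket=$(date +%Y%m%d%H)                # one counter file per hour
    f="/tmp/ask-rate-${user}-${bucket}"
    count=$(cat "$f" 2>/dev/null || echo 0)
    [ "$count" -lt "$limit" ] || return 1   # over the limit
    echo $((count + 1)) > "$f"
}
```

Using the hour as part of the filename avoids any cleanup logic: an old counter simply stops being read.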
---
## Decision Tree

```
┌─────────────────────────────┐
│ Do you need AI assistance?  │
└──────────────┬──────────────┘
               │
               ▼
       ┌───────────────┐
       │ Is it urgent? │
       └───┬───────┬───┘
           │       │
         NO│       │YES
           │       │
           ▼       ▼
   ┌──────────┐  ┌──────────────┐
   │ Claude   │  │ Is Claude    │
   │ working? │  │ available?   │
   └────┬─────┘  └──┬────────┬──┘
        │           │        │
     YES│        YES│        │NO
        │           │        │
        ▼           ▼        ▼
 ┌────────────┐ ┌────────────┐ ┌────────────┐
 │ Use Claude │ │ Use Claude │ │ Use DERP   │
 │ Projects   │ │ Projects   │ │ Backup     │
 └────────────┘ └────────────┘ └────────────┘
```

**For staff/subscribers:**
```
┌───────────────────────┐
│ Simple routine query? │
└──────────┬────────────┘
           │
          YES
           │
           ▼
   ┌──────────────┐
   │ Use Discord  │
   │ Bot: /ask    │
   └──────────────┘
```

---
## Emergency Procedures

### Scenario 1: Claude Down, Need Strategic Decision

**Problem:** Anthropic outage, need to deploy something NOW

**Solution:**
1. Verify Claude is truly unavailable (try web + app)
2. Go to https://ai.firefrostgaming.com
3. Log in with Michael's account
4. Select the Operations workspace
5. Select the Qwen 2.5 Coder model
6. Ask the strategic question
7. Copy deployment commands
8. Execute carefully (no session memory!)

**Note:** DERP doesn't remember context. Be explicit in each query.

### Scenario 2: Discord Bot Down

**Problem:** Staff reporting bot not responding

**Check status:**
```bash
ssh root@38.68.14.26
systemctl status firefrost-discord-bot
```

**If stopped:**
```bash
systemctl start firefrost-discord-bot
```

**If errors:**
```bash
journalctl -u firefrost-discord-bot -f
# Check for API errors, token issues
```

**If Dify down:**
```bash
cd /opt/dify
docker-compose ps
# If services down:
docker-compose up -d
```

### Scenario 3: Model Won't Load

**Problem:** DERP system reports "model unavailable"

**Check Ollama:**
```bash
ollama list
# Should show: qwen2.5-coder:72b, llama3.3:70b, llama3.2-vision:11b
```

**If models missing:**
```bash
# Re-download
ollama pull qwen2.5-coder:72b
ollama pull llama3.3:70b
ollama pull llama3.2-vision:11b
```

**Check RAM:**
```bash
free -h
# If <90GB free, unload game servers temporarily
```

---
## Cost Tracking

### Monthly Costs
- **Claude Projects:** $20/month (primary system)
- **Dify:** $0/month (self-hosted)
- **Ollama:** $0/month (self-hosted)
- **Discord Bot:** $0/month (self-hosted)
- **Total:** $20/month ✅

### Resource Usage (TX1)
- **Storage:** ~97GB (one-time)
- **RAM (active DERP):** ~92GB (temporary)
- **RAM (idle):** <5GB (normal)
- **Bandwidth:** Models downloaded once, minimal ongoing

---
## Performance Expectations

### Claude Projects (Primary)
- **Response time:** 5-30 seconds
- **Quality:** Excellent (GPT-4 class)
- **Context:** Full repo (416 files)
- **Session memory:** Yes

### DERP Backup (Emergency)
- **Response time:** 30-120 seconds (slower than Claude)
- **Quality:** Good (GPT-3.5 to GPT-4 class, depending on model)
- **Context:** 128K tokens per query
- **Session memory:** No (each query independent)

### Discord Bot (Routine)
- **Response time:** 10-45 seconds
- **Quality:** Good for simple queries
- **Context:** Knowledge base search
- **Rate limit:** 10 queries/hour per user

---
## Best Practices

### For Michael + Meg:
1. ✅ **Always use Claude Projects first** (best experience)
2. ✅ **Only use DERP for true emergencies** (Claude unavailable)
3. ✅ **Document DERP usage** (so Claude can learn from it later)
4. ✅ **Free TX1 RAM after DERP use** (restart Ollama if needed)

### For Staff:
1. ✅ **Use the Discord bot for quick lookups** (fast, simple)
2. ✅ **Ask Michael/Meg for complex questions** (they have Claude)
3. ✅ **Don't abuse rate limits** (10 queries/hour is generous)
4. ✅ **Report bot issues immediately** (don't let it stay broken)

### For Subscribers:
1. ✅ **Use the Discord bot for server info** (join instructions, modpacks)
2. ✅ **Don't ask for staff-only info** (bot will decline)
3. ✅ **Be patient** (bot shares resources with staff)

---
## Training & Onboarding

### New Staff Training:
1. Introduce Discord bot commands (`/ask`)
2. Show example queries (moderation, server management)
3. Explain rate limits
4. Explain when to escalate to Michael/Meg

### Subscriber Communication:
1. Announce the bot in Discord
2. Pin a message with the `/ask` command
3. Post example queries in the welcome channel
4. FAQ: "What can the bot answer?"

---
**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️

**Remember: Claude first, DERP only when necessary, Discord bot for routine queries.**

**Monthly cost: $20 (no increase)**