# Self-Hosted AI Stack - Deployment Plan

**Task:** Self-Hosted AI Stack on TX1
**Location:** TX1 Dallas (38.68.14.26)
**Total Time:** 6-8 hours (3-4 hours active; the rest is overnight model downloads)
**Last Updated:** 2026-02-18
---

## Prerequisites

### Before Starting

- [ ] SSH access to TX1
- [ ] Docker installed on TX1
- [ ] Docker Compose installed
- [ ] Sufficient storage (~100GB free)
- [ ] No game servers under heavy load (model downloads are bandwidth-intensive)
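
The storage item above can be verified up front. A minimal pre-flight sketch (it assumes everything lands under `/opt` and that GNU `df` is available):

```shell
# Pre-flight check: warn if /opt has less than 100 GB free.
# The /opt path and the 100 GB threshold mirror the checklist above.
avail_gb=$(df --output=avail -BG /opt | tail -n 1 | tr -dc '0-9')
if [ "$avail_gb" -lt 100 ]; then
    echo "WARNING: only ${avail_gb}G free on /opt" >&2
fi
echo "available: ${avail_gb}G"
```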

### Domain Configuration

- [ ] DNS A record: ai.firefrostgaming.com → 38.68.14.26
- [ ] SSL certificate ready (Let's Encrypt)

---

## Phase 1: Deploy Dify (2-3 hours)

### Step 1.1: Create Directory Structure

```bash
ssh root@38.68.14.26
cd /opt
mkdir -p dify
cd dify
```

### Step 1.2: Download Dify Docker Compose

```bash
wget https://raw.githubusercontent.com/langgenius/dify/main/docker/docker-compose.yaml
```

### Step 1.3: Configure Environment

```bash
# Create .env file
cat > .env << 'EOF'
# Dify Configuration
DIFY_VERSION=0.6.0
API_URL=https://ai.firefrostgaming.com
WEB_API_URL=https://ai.firefrostgaming.com

# Database
POSTGRES_PASSWORD=<generate_secure_password>
POSTGRES_DB=dify

# Redis
REDIS_PASSWORD=<generate_secure_password>

# Secret Key (generate with: openssl rand -base64 32)
SECRET_KEY=<generate_secret_key>

# Storage
STORAGE_TYPE=local
STORAGE_LOCAL_PATH=/app/storage
EOF
```
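
The three placeholder values above should be generated, not hand-picked. A quick sketch using the `openssl rand` invocation the file itself suggests (paste the results into `.env` manually or with your editor):

```shell
# Generate one candidate value per placeholder:
# 32 random bytes, base64-encoded (always 44 characters).
PG_PASS=$(openssl rand -base64 32)
REDIS_PASS=$(openssl rand -base64 32)
SECRET=$(openssl rand -base64 32)
echo "POSTGRES_PASSWORD candidate: $PG_PASS"
echo "REDIS_PASSWORD candidate:    $REDIS_PASS"
echo "SECRET_KEY candidate:        $SECRET"
```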

### Step 1.4: Deploy Dify

```bash
docker-compose up -d
```

**Wait:** 5-10 minutes for all services to start

### Step 1.5: Verify Deployment

```bash
docker-compose ps
# All services should show "Up"

curl http://localhost/health
# Should return: {"status":"ok"}
```
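
Rather than guessing when the stack is up, the health check above can be polled. A small retry helper (a sketch; the commented line shows the intended TX1 usage):

```shell
# Poll until a command succeeds instead of sleeping a fixed 5-10 minutes.
# Usage: wait_for <attempts> <delay_seconds> <command...>
wait_for() {
    attempts=$1; delay=$2; shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then return 0; fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# On TX1 (commented out here; only meaningful while Dify is starting):
# wait_for 60 10 curl -fsS http://localhost/health
wait_for 3 0 true && echo "ok"
```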

### Step 1.6: Configure Nginx Reverse Proxy

```bash
# Create Nginx config
cat > /etc/nginx/sites-available/ai.firefrostgaming.com << 'EOF'
server {
    listen 80;
    server_name ai.firefrostgaming.com;

    location / {
        # Dify's bundled nginx cannot also bind host port 80 while this
        # server block owns it; publish Dify on another port (e.g. 8080,
        # via EXPOSE_NGINX_PORT in Dify's .env) and proxy to that port here.
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
EOF

# Enable site
ln -s /etc/nginx/sites-available/ai.firefrostgaming.com /etc/nginx/sites-enabled/
nginx -t
systemctl reload nginx

# Get SSL certificate
certbot --nginx -d ai.firefrostgaming.com
```

### Step 1.7: Initial Configuration

1. Visit https://ai.firefrostgaming.com
2. Create admin account (Michael)
3. Configure workspaces:
   - **Operations** (infrastructure docs)
   - **Brainstorming** (creative docs)

---

## Phase 2: Install Ollama and Models (Overnight)

### Step 2.1: Install Ollama

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

### Step 2.2: Download Models (Overnight - Large Files)

**Download Qwen 2.5 Coder 72B:**
```bash
ollama pull qwen2.5-coder:72b
```
**Size:** ~40GB
**Time:** 2-4 hours (depending on connection)

**Download Llama 3.3 70B:**
```bash
ollama pull llama3.3:70b
```
**Size:** ~40GB
**Time:** 2-4 hours

**Download Llama 3.2 Vision 11B:**
```bash
ollama pull llama3.2-vision:11b
```
**Size:** ~7GB
**Time:** 30-60 minutes

**Total download time:** 6-8 hours (run overnight)
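
The three pulls above can be queued sequentially in a detached script, so an overnight SSH disconnect doesn't kill the downloads. A sketch (the `/tmp` paths and `nohup` invocation are illustrative):

```shell
# Write the pulls to a script so they run one after another;
# sequential pulls avoid saturating the link with parallel downloads.
cat > /tmp/pull-models.sh << 'EOF'
#!/bin/sh
set -e
for model in qwen2.5-coder:72b llama3.3:70b llama3.2-vision:11b; do
    ollama pull "$model"
done
EOF
chmod +x /tmp/pull-models.sh

# Kick it off detached on TX1 (commented out here):
# nohup /tmp/pull-models.sh > /tmp/pull-models.log 2>&1 &
```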

### Step 2.3: Verify Models

```bash
ollama list
# Should show all three models

# Test Qwen
ollama run qwen2.5-coder:72b "Write a bash script to check disk space"
# Should generate script

# Test Llama 3.3
ollama run llama3.3:70b "Explain Firefrost Gaming's Fire + Frost philosophy"
# Should respond

# Test Vision
ollama run llama3.2-vision:11b "Describe this image: /path/to/test/image.jpg"
# Should analyze image
```

### Step 2.4: Configure Ollama as Dify Backend

In the Dify web interface:

1. Go to Settings → Model Providers
2. Add the Ollama provider
3. URL: http://localhost:11434
4. Add models:
   - qwen2.5-coder:72b
   - llama3.3:70b
   - llama3.2-vision:11b
5. Set Qwen as default for coding queries
6. Set Llama 3.3 as default for general queries
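
Before adding the provider in Dify, it's worth confirming the Ollama API answers at that URL. `GET /api/tags` is Ollama's endpoint for listing installed models:

```shell
# Query Ollama's model list; fall back to a marker if the daemon is down.
out=$(curl -s --max-time 5 http://localhost:11434/api/tags \
      || echo '{"error":"unreachable"}')
echo "$out"
```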

---

## Phase 3: Index Git Repository (1-2 hours)

### Step 3.1: Clone Operations Manual to TX1

```bash
cd /opt/dify
git clone https://git.firefrostgaming.com/firefrost-gaming/firefrost-operations-manual.git
```
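
A quick sanity check before configuring the knowledge bases: count what will actually be indexed (run from `/opt/dify`; it prints 0 anywhere the clone doesn't exist):

```shell
# Count the files under the docs tree that Dify will ingest.
count=$(find firefrost-operations-manual/docs -type f 2>/dev/null | wc -l)
echo "files to index: $count"
```

On TX1 this should land near the 416 files cited elsewhere in this plan.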

### Step 3.2: Configure Dify Knowledge Base

**Operations Workspace:**

1. In Dify, go to the Operations workspace
2. Create Knowledge Base: "Infrastructure Docs"
3. Upload folder: `/opt/dify/firefrost-operations-manual/docs/`
4. Processing: Automatic chunking with Q&A segmentation
5. Embedding model: Default (all-MiniLM-L6-v2)

**Brainstorming Workspace:**

1. Go to the Brainstorming workspace
2. Create Knowledge Base: "Creative Docs"
3. Upload folder: `/opt/dify/firefrost-operations-manual/docs/planning/`
4. Same processing settings

**Wait:** 30-60 minutes for indexing (416 files)

### Step 3.3: Test Knowledge Retrieval

In the Operations workspace:

- Query: "What is the Frostwall Protocol?"
- Should return relevant docs with citations

In the Brainstorming workspace:

- Query: "What is the Terraria branding training arc?"
- Should return planning docs

---

## Phase 4: Discord Bot (2-3 hours)

### Step 4.1: Create Bot on Discord Developer Portal

1. Go to https://discord.com/developers/applications
2. Create new application: "Firefrost AI Assistant"
3. Go to the Bot section
4. Create the bot
5. Copy the bot token
6. Enable Privileged Gateway Intents:
   - Message Content Intent
   - Server Members Intent

### Step 4.2: Install Bot Code on TX1

```bash
cd /opt
mkdir -p firefrost-discord-bot
cd firefrost-discord-bot

# Create requirements.txt
cat > requirements.txt << 'EOF'
discord.py==2.3.2
aiohttp==3.9.1
python-dotenv==1.0.0
EOF

# Create virtual environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

### Step 4.3: Create Bot Script

```bash
cat > bot.py << 'EOF'
import discord
from discord.ext import commands
import aiohttp
import os
from dotenv import load_dotenv

load_dotenv()

TOKEN = os.getenv('DISCORD_TOKEN')
DIFY_API_URL = os.getenv('DIFY_API_URL')
DIFY_API_KEY = os.getenv('DIFY_API_KEY')

intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix='/', intents=intents)

@bot.event
async def on_ready():
    print(f'{bot.user} is now running!')

@bot.command(name='ask')
async def ask(ctx, *, question):
    """Ask the AI a question"""
    # Check user roles
    is_staff = any(role.name in ['Staff', 'Admin'] for role in ctx.author.roles)
    is_subscriber = any(role.name == 'Subscriber' for role in ctx.author.roles)

    if not (is_staff or is_subscriber):
        await ctx.send("You need the Staff or Subscriber role to use this command.")
        return

    # Determine workspace based on role
    workspace = 'operations' if is_staff else 'general'

    await ctx.send("🤔 Thinking...")

    async with aiohttp.ClientSession() as session:
        async with session.post(
            f'{DIFY_API_URL}/v1/chat-messages',
            headers={
                'Authorization': f'Bearer {DIFY_API_KEY}',
                'Content-Type': 'application/json'
            },
            json={
                'inputs': {},
                'query': question,
                'response_mode': 'blocking',
                'user': str(ctx.author.id),
                # Dify scopes each request by API key (one app per key);
                # 'workspace' is extra metadata for app-side routing
                'workspace': workspace
            }
        ) as resp:
            if resp.status == 200:
                data = await resp.json()
                answer = data.get('answer', 'No response')

                # Split responses longer than Discord's 2000-character limit
                if len(answer) > 2000:
                    chunks = [answer[i:i+2000] for i in range(0, len(answer), 2000)]
                    for chunk in chunks:
                        await ctx.send(chunk)
                else:
                    await ctx.send(answer)
            else:
                await ctx.send("❌ Error connecting to AI. Please try again.")

bot.run(TOKEN)
EOF
```

### Step 4.4: Configure Bot

```bash
# Create .env file
cat > .env << 'EOF'
DISCORD_TOKEN=<your_bot_token>
DIFY_API_URL=https://ai.firefrostgaming.com
DIFY_API_KEY=<get_from_dify_settings>
EOF
```
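
A small guard (a sketch, not part of the bot itself) can catch a `.env` that still contains unfilled `<placeholders>` before the service starts:

```shell
# check_env fails if any value still looks like an unfilled <placeholder>.
check_env() {
    ! grep -q '<[a-z_]*>' "$1"
}

# Demo with a deliberately unfinished file (illustrative path):
printf 'DISCORD_TOKEN=<your_bot_token>\n' > /tmp/demo.env
check_env /tmp/demo.env && echo "env ok" || echo "placeholders remain"
# prints "placeholders remain"
```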

### Step 4.5: Create Systemd Service

```bash
cat > /etc/systemd/system/firefrost-discord-bot.service << 'EOF'
[Unit]
Description=Firefrost Discord Bot
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/opt/firefrost-discord-bot
Environment="PATH=/opt/firefrost-discord-bot/venv/bin"
ExecStart=/opt/firefrost-discord-bot/venv/bin/python bot.py
Restart=always

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable firefrost-discord-bot
systemctl start firefrost-discord-bot
```

### Step 4.6: Invite Bot to Discord

1. Go to OAuth2 → URL Generator
2. Select scopes: bot, applications.commands
3. Select permissions: Send Messages, Read Message History
4. Copy the generated URL
5. Open it in a browser and invite the bot to the Firefrost Discord

### Step 4.7: Test Bot

In Discord:

```
/ask What is the Frostwall Protocol?
```

Should return an answer from the Operations workspace (staff only).

---

## Phase 5: Testing and Validation (30 minutes)

### Test 1: DERP Backup (Strategic Query)

**Simulate Claude outage:**

1. Load the Qwen model: `ollama run qwen2.5-coder:72b`
2. In the Dify Operations workspace, ask:
   - "Should I deploy Mailcow before or after Frostwall Protocol?"
3. Verify:
   - Response references both task docs
   - Shows dependency understanding
   - Recommends Frostwall first

### Test 2: Discord Bot (Staff Query)

As a staff member in Discord:

```
/ask How many game servers are running?
```

Should return infrastructure details.

### Test 3: Discord Bot (Subscriber Query)

As a subscriber in Discord:

```
/ask What modpacks are available?
```

Should return the modpack list (limited to public info).

### Test 4: Resource Monitoring

```bash
# Check RAM usage with model loaded
free -h
# Should show ~92GB used when Qwen loaded

# Check disk usage
df -h /opt/dify
# Should show ~97GB used

# Check Docker containers
docker ps
# All Dify services should be running
```

---

## Phase 6: Documentation (1 hour)

### Create Usage Guide

Document at `/opt/dify/USAGE-GUIDE.md`:

- When to use Claude (primary)
- When to use DERP (Claude down)
- When to use the Discord bot (routine queries)
- Emergency procedures

### Update Operations Manual

Commit changes to Git:

- Task documentation updated
- Deployment plan complete
- Usage guide created

---

## Success Criteria Checklist

- [ ] Dify deployed and accessible at https://ai.firefrostgaming.com
- [ ] Ollama running with all 3 models loaded
- [ ] Operations workspace indexing complete (416 files)
- [ ] Brainstorming workspace indexing complete
- [ ] DERP backup tested (strategic query works)
- [ ] Discord bot deployed and running
- [ ] Staff can query via Discord (/ask command)
- [ ] Subscribers have limited access
- [ ] Resource usage within TX1 limits (~92GB RAM, ~97GB storage)
- [ ] Documentation complete and committed to Git
- [ ] Zero additional monthly cost confirmed

---

## Rollback Plan

If deployment fails:

```bash
# Stop all services
cd /opt/dify
docker-compose down

# Stop Discord bot
systemctl stop firefrost-discord-bot
systemctl disable firefrost-discord-bot

# Remove installation
rm -rf /opt/dify
rm -rf /opt/firefrost-discord-bot
rm /etc/systemd/system/firefrost-discord-bot.service
systemctl daemon-reload

# Remove Nginx config
rm /etc/nginx/sites-enabled/ai.firefrostgaming.com
rm /etc/nginx/sites-available/ai.firefrostgaming.com
nginx -t && systemctl reload nginx

# Uninstall Ollama (the installer does not ship an uninstall script;
# these are the documented manual removal steps)
systemctl stop ollama
systemctl disable ollama
rm /etc/systemd/system/ollama.service
rm "$(which ollama)"
rm -rf /usr/share/ollama
```

---

**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️