diff --git a/docs/core/tasks.md b/docs/core/tasks.md index 3480360..e14bac2 100644 --- a/docs/core/tasks.md +++ b/docs/core/tasks.md @@ -178,15 +178,17 @@ Professional @firefrostgaming.com email on NC1. Self-hosted, $120/year saved, el --- ### 9. Self-Hosted AI Stack on TX1 -**Time:** 8-12 hours (3-4 active, rest downloads) +**Time:** 6-8 hours (3-4 active, rest downloads) **Status:** BLOCKED - Medical clearance **Documentation:** `docs/tasks/self-hosted-ai-stack-on-tx1/` -Dual AI deployment: AnythingLLM (ops) + Open WebUI (staff). DERP backup, unlimited AI access. +DERP-compliant AI infrastructure: Dify + Ollama + self-hosted models. Three-tier usage: Claude Projects (primary) → DERP backup (emergency) → Discord/Wiki bots (staff/subscribers). +**Architecture:** Dify with knowledge graph RAG, Ollama model server **Models:** Qwen 2.5 Coder 72B, Llama 3.3 70B, Llama 3.2 Vision 11B -**Storage:** ~150GB -**RAM:** ~110GB when loaded +**Storage:** ~97GB +**RAM:** ~92GB when DERP activated, ~8GB idle +**Monthly Cost:** $0 (self-hosted, no additional cost beyond Claude Pro) --- diff --git a/docs/tasks/self-hosted-ai-stack-on-tx1/README.md b/docs/tasks/self-hosted-ai-stack-on-tx1/README.md index 516b8fe..7227a65 100644 --- a/docs/tasks/self-hosted-ai-stack-on-tx1/README.md +++ b/docs/tasks/self-hosted-ai-stack-on-tx1/README.md @@ -2,43 +2,167 @@ **Status:** Blocked - Medical clearance **Priority:** Tier 2 - Major Infrastructure -**Time:** 8-12 hours (3-4 active, rest downloads) +**Time:** 6-8 hours (3-4 active, rest downloads) **Location:** TX1 Dallas -**Last Updated:** 2026-02-16 +**Last Updated:** 2026-02-18 +**Updated By:** The Chronicler + +--- ## Overview -Dual AI deployment: AnythingLLM (Michael/Meg, document-heavy) + Open WebUI (staff assistant). DERP backup, unlimited AI access, staff foundation. + +**DERP-compliant AI infrastructure with zero additional monthly cost.** + +Three-tier usage model: +1. 
**Primary:** Claude Projects (best experience, full repo context) +2. **DERP Backup:** Self-hosted when Claude/Anthropic unavailable +3. **Staff/Subscriber Bots:** Discord + Wiki integration + +**Monthly Cost:** $0 (beyond existing $20 Claude Pro subscription) + +--- ## Architecture -**Primary: AnythingLLM** (ai.firefrostgaming.com) -- 1,000+ document libraries -- LanceDB vector database -- Workspace isolation (Operations, Pokerole, Brainstorming) -**Secondary: Open WebUI** (staff-ai.firefrostgaming.com) -- Lighter for staff wiki -- Chroma vector DB -- ChatGPT-like interface +### Component 1: Dify (RAG Platform) +**URL:** ai.firefrostgaming.com +**Purpose:** Knowledge management, API backend +**Features:** +- Multi-workspace (Operations, Brainstorming) +- Knowledge graph indexing +- Web interface +- Discord bot API +- Repository integration -## Phases -**Phase 1:** Deploy stack (1-2 hours) -**Phase 2:** Load models (6-8 hours overnight) -**Phase 3:** Document ingestion (2-3 hours active, 6-8 total) +### Component 2: Ollama (Model Server) +**Purpose:** Local model hosting +**Features:** +- Model management +- API compatibility +- Resource optimization -## Models -- Qwen 2.5 Coder 72B (~40GB) -- Llama 3.3 70B (~40GB) -- Llama 3.2 Vision 11B (~7GB) -- Embeddings: all-MiniLM-L6-v2 (~400MB) +### Component 3: Models (Self-Hosted) -**Total:** ~150GB storage, ~110GB RAM when loaded +**Qwen 2.5 Coder 72B** +- Purpose: Infrastructure/coding specialist +- Context: 128K tokens +- RAM: ~80GB when loaded +- Storage: ~40GB +- Use: DERP strategic decisions + +**Llama 3.3 70B** +- Purpose: General reasoning +- Context: 128K tokens +- RAM: ~80GB when loaded +- Storage: ~40GB +- Use: DERP general queries + +**Llama 3.2 Vision 11B** +- Purpose: Screenshot/image analysis +- RAM: ~7GB when loaded +- Storage: ~7GB +- Use: Visual troubleshooting + +### Component 4: Discord Bot +**Purpose:** Staff/subscriber interface +**Features:** +- Role-based access (staff vs subscribers) +- Calls Dify 
API +- Commands: `/ask`, `/operations`, `/brainstorm` + +--- + +## Usage Model + +### Tier 1: Claude Projects (Primary) +**When:** Normal operations +**Experience:** Best (full repo context, session continuity) +**Cost:** $20/month (already paying) + +### Tier 2: DERP Backup (Emergency) +**When:** Claude/Anthropic outage +**Experience:** Functional (knowledge graph + 128K context) +**Cost:** $0/month (self-hosted) + +### Tier 3: Staff/Subscriber Bots +**When:** Routine queries in Discord/Wiki +**Experience:** Fast, simple +**Cost:** $0/month (same infrastructure) + +--- + +## Resource Requirements + +### Storage (TX1 has 1TB) +- Qwen 2.5 Coder 72B: ~40GB +- Llama 3.3 70B: ~40GB +- Llama 3.2 Vision 11B: ~7GB +- Dify + services: ~10GB +- **Total: ~97GB** ✅ + +### RAM (TX1 has 256GB) +**DERP Activated (one large model loaded):** +- Model: ~80GB (Qwen OR Llama 3.3) +- Dify services: ~4GB +- Overhead: ~8GB +- **Total: ~92GB** ✅ + +**Normal Operations (models idle):** +- Minimal RAM usage +- Available for game servers + +### CPU +- 32 vCPU available +- Inference slower than API +- Functional for emergency use + +--- + +## Deployment Phases + +### Phase 1: Core Stack (2-3 hours) +1. Deploy Dify via Docker Compose +2. Install Ollama +3. Download models (overnight - large files) +4. Configure workspaces +5. Index Git repository + +### Phase 2: Discord Bot (2-3 hours) +1. Create Python bot +2. Connect to Dify API +3. Implement role-based access +4. Test in Discord + +### Phase 3: Documentation (1 hour) +1. Usage guide (when to use what) +2. Emergency DERP procedures +3. Discord bot commands +4. 
Staff training materials + +**Total Time:** 6-8 hours (active work) + +--- ## Success Criteria -- ✅ Both stacks deployed + +- ✅ Dify deployed and indexing repo - ✅ Models loaded and operational -- ✅ Documents ingested (Ops, Pokerole, Brainstorming) -- ✅ DERP backup functional +- ✅ DERP backup tested (strategic query without Claude) +- ✅ Discord bot functional (staff + subscriber access) +- ✅ Documentation complete +- ✅ Zero additional monthly cost -**See:** deployment-plan.md for detailed phases +--- -**Fire + Frost + Foundation** 💙🔥❄️ +## Related Documentation + +- **deployment-plan.md** - Step-by-step deployment guide +- **usage-guide.md** - When to use Claude vs DERP vs bots +- **resource-requirements.md** - Detailed TX1 resource allocation +- **discord-bot-setup.md** - Bot configuration and commands + +--- + +**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️ + +**Monthly Cost: $20 (no increase from current)** diff --git a/docs/tasks/self-hosted-ai-stack-on-tx1/deployment-plan.md b/docs/tasks/self-hosted-ai-stack-on-tx1/deployment-plan.md new file mode 100644 index 0000000..b9b1d57 --- /dev/null +++ b/docs/tasks/self-hosted-ai-stack-on-tx1/deployment-plan.md @@ -0,0 +1,500 @@ +# Self-Hosted AI Stack - Deployment Plan + +**Task:** Self-Hosted AI Stack on TX1 +**Location:** TX1 Dallas (38.68.14.26) +**Total Time:** 6-8 hours (3-4 active, rest overnight downloads) +**Last Updated:** 2026-02-18 + +--- + +## Prerequisites + +### Before Starting +- [ ] SSH access to TX1 +- [ ] Docker installed on TX1 +- [ ] Docker Compose installed +- [ ] Sufficient storage (~100GB free) +- [ ] No game servers under heavy load (model downloads are bandwidth-intensive) + +### Domain Configuration +- [ ] DNS A record: ai.firefrostgaming.com → 38.68.14.26 +- [ ] SSL certificate ready (Let's Encrypt) + +--- + +## Phase 1: Deploy Dify (2-3 hours) + +### Step 1.1: Create Directory Structure + +```bash +ssh root@38.68.14.26 +cd /opt +mkdir -p dify +cd dify +``` + +### Step 1.2: 
Download Dify Docker Compose + +```bash +wget https://raw.githubusercontent.com/langgenius/dify/main/docker/docker-compose.yaml +``` + +### Step 1.3: Configure Environment + +```bash +# Create .env file +cat > .env << 'EOF' +# Dify Configuration +DIFY_VERSION=0.6.0 +API_URL=https://ai.firefrostgaming.com +WEB_API_URL=https://ai.firefrostgaming.com + +# Database +POSTGRES_PASSWORD= +POSTGRES_DB=dify + +# Redis +REDIS_PASSWORD= + +# Secret Key (generate with: openssl rand -base64 32) +SECRET_KEY= + +# Storage +STORAGE_TYPE=local +STORAGE_LOCAL_PATH=/app/storage +EOF +``` + +### Step 1.4: Deploy Dify + +```bash +docker-compose up -d +``` + +**Wait:** 5-10 minutes for all services to start + +### Step 1.5: Verify Deployment + +```bash +docker-compose ps +# All services should show "Up" + +curl http://localhost/health +# Should return: {"status":"ok"} +``` + +### Step 1.6: Configure Nginx Reverse Proxy + +```bash +# Create Nginx config +cat > /etc/nginx/sites-available/ai.firefrostgaming.com << 'EOF' +server { + listen 80; + server_name ai.firefrostgaming.com; + + location / { + proxy_pass http://localhost:80; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } +} +EOF + +# Enable site +ln -s /etc/nginx/sites-available/ai.firefrostgaming.com /etc/nginx/sites-enabled/ +nginx -t +systemctl reload nginx + +# Get SSL certificate +certbot --nginx -d ai.firefrostgaming.com +``` + +### Step 1.7: Initial Configuration + +1. Visit https://ai.firefrostgaming.com +2. Create admin account (Michael) +3. 
Configure workspaces: + - **Operations** (infrastructure docs) + - **Brainstorming** (creative docs) + +--- + +## Phase 2: Install Ollama and Models (Overnight) + +### Step 2.1: Install Ollama + +```bash +curl -fsSL https://ollama.com/install.sh | sh +``` + +### Step 2.2: Download Models (Overnight - Large Files) + +**Download Qwen 2.5 Coder 72B:** +```bash +ollama pull qwen2.5-coder:72b +``` +**Size:** ~40GB +**Time:** 2-4 hours (depending on connection) + +**Download Llama 3.3 70B:** +```bash +ollama pull llama3.3:70b +``` +**Size:** ~40GB +**Time:** 2-4 hours + +**Download Llama 3.2 Vision 11B:** +```bash +ollama pull llama3.2-vision:11b +``` +**Size:** ~7GB +**Time:** 30-60 minutes + +**Total download time:** 6-8 hours (run overnight) + +### Step 2.3: Verify Models + +```bash +ollama list +# Should show all three models + +# Test Qwen +ollama run qwen2.5-coder:72b "Write a bash script to check disk space" +# Should generate script + +# Test Llama 3.3 +ollama run llama3.3:70b "Explain Firefrost Gaming's Fire + Frost philosophy" +# Should respond + +# Test Vision +ollama run llama3.2-vision:11b "Describe this image: /path/to/test/image.jpg" +# Should analyze image +``` + +### Step 2.4: Configure Ollama as Dify Backend + +In Dify web interface: +1. Go to Settings → Model Providers +2. Add Ollama provider +3. URL: http://localhost:11434 +4. Add models: + - qwen2.5-coder:72b + - llama3.3:70b + - llama3.2-vision:11b +5. Set Qwen as default for coding queries +6. Set Llama 3.3 as default for general queries + +--- + +## Phase 3: Index Git Repository (1-2 hours) + +### Step 3.1: Clone Operations Manual to TX1 + +```bash +cd /opt/dify +git clone https://git.firefrostgaming.com/firefrost-gaming/firefrost-operations-manual.git +``` + +### Step 3.2: Configure Dify Knowledge Base + +**Operations Workspace:** +1. In Dify, go to Operations workspace +2. Create Knowledge Base: "Infrastructure Docs" +3. Upload folder: `/opt/dify/firefrost-operations-manual/docs/` +4. 
Processing: Automatic chunking with Q&A segmentation +5. Embedding model: Default (all-MiniLM-L6-v2) + +**Brainstorming Workspace:** +1. Go to Brainstorming workspace +2. Create Knowledge Base: "Creative Docs" +3. Upload folder: `/opt/dify/firefrost-operations-manual/docs/planning/` +4. Same processing settings + +**Wait:** 30-60 minutes for indexing (416 files) + +### Step 3.3: Test Knowledge Retrieval + +In Operations workspace: +- Query: "What is the Frostwall Protocol?" +- Should return relevant docs with citations + +In Brainstorming workspace: +- Query: "What is the Terraria branding training arc?" +- Should return planning docs + +--- + +## Phase 4: Discord Bot (2-3 hours) + +### Step 4.1: Create Bot on Discord Developer Portal + +1. Go to https://discord.com/developers/applications +2. Create new application: "Firefrost AI Assistant" +3. Go to Bot section +4. Create bot +5. Copy bot token +6. Enable Privileged Gateway Intents: + - Message Content Intent + - Server Members Intent + +### Step 4.2: Install Bot Code on TX1 + +```bash +cd /opt +mkdir -p firefrost-discord-bot +cd firefrost-discord-bot + +# Create requirements.txt +cat > requirements.txt << 'EOF' +discord.py==2.3.2 +aiohttp==3.9.1 +python-dotenv==1.0.0 +EOF + +# Create virtual environment +python3 -m venv venv +source venv/bin/activate +pip install -r requirements.txt +``` + +### Step 4.3: Create Bot Script + +```bash +cat > bot.py << 'EOF' +import discord +from discord.ext import commands +import aiohttp +import os +from dotenv import load_dotenv + +load_dotenv() + +TOKEN = os.getenv('DISCORD_TOKEN') +DIFY_API_URL = os.getenv('DIFY_API_URL') +DIFY_API_KEY = os.getenv('DIFY_API_KEY') + +intents = discord.Intents.default() +intents.message_content = True +bot = commands.Bot(command_prefix='/', intents=intents) + +@bot.event +async def on_ready(): + print(f'{bot.user} is now running!') + +@bot.command(name='ask') +async def ask(ctx, *, question): + """Ask the AI a question""" + # Check user roles + 
is_staff = any(role.name in ['Staff', 'Admin'] for role in ctx.author.roles)
+    is_subscriber = any(role.name == 'Subscriber' for role in ctx.author.roles)
+
+    if not (is_staff or is_subscriber):
+        await ctx.send("You need Staff or Subscriber role to use this command.")
+        return
+
+    # Determine workspace based on role
+    workspace = 'operations' if is_staff else 'general'
+
+    await ctx.send("🤔 Thinking...")
+
+    async with aiohttp.ClientSession() as session:
+        async with session.post(
+            f'{DIFY_API_URL}/v1/chat-messages',
+            headers={
+                'Authorization': f'Bearer {DIFY_API_KEY}',
+                'Content-Type': 'application/json'
+            },
+            json={
+                'query': question,
+                'user': str(ctx.author.id),
+                # Dify's chat-messages API requires response_mode and a string
+                # conversation_id; workspace routing goes in as an app input
+                # variable (there is no top-level 'workspace' field)
+                'response_mode': 'blocking',
+                'conversation_id': '',
+                'inputs': {'workspace': workspace}
+            }
+        ) as resp:
+            if resp.status == 200:
+                data = await resp.json()
+                answer = data.get('answer', 'No response')
+
+                # Split long responses (Discord caps messages at 2,000 characters)
+                if len(answer) > 2000:
+                    chunks = [answer[i:i+2000] for i in range(0, len(answer), 2000)]
+                    for chunk in chunks:
+                        await ctx.send(chunk)
+                else:
+                    await ctx.send(answer)
+            else:
+                await ctx.send("❌ Error connecting to AI. Please try again.")
+
+bot.run(TOKEN)
+EOF
+```
+
+### Step 4.4: Configure Bot
+
+```bash
+# Create .env file
+cat > .env << 'EOF'
+DISCORD_TOKEN=
+DIFY_API_URL=https://ai.firefrostgaming.com
+DIFY_API_KEY=
+EOF
+```
+
+### Step 4.5: Create Systemd Service
+
+```bash
+cat > /etc/systemd/system/firefrost-discord-bot.service << 'EOF'
+[Unit]
+Description=Firefrost Discord Bot
+After=network.target
+
+[Service]
+Type=simple
+User=root
+WorkingDirectory=/opt/firefrost-discord-bot
+Environment="PATH=/opt/firefrost-discord-bot/venv/bin"
+ExecStart=/opt/firefrost-discord-bot/venv/bin/python bot.py
+Restart=always
+
+[Install]
+WantedBy=multi-user.target
+EOF
+
+systemctl daemon-reload
+systemctl enable firefrost-discord-bot
+systemctl start firefrost-discord-bot
+```
+
+### Step 4.6: Invite Bot to Discord
+
+1. Go to OAuth2 → URL Generator
+2. Select scopes: bot, applications.commands
+3. 
Select permissions: Send Messages, Read Message History +4. Copy generated URL +5. Open in browser and invite to Firefrost Discord + +### Step 4.7: Test Bot + +In Discord: +``` +/ask What is the Frostwall Protocol? +``` +Should return answer from Operations workspace (staff only) + +--- + +## Phase 5: Testing and Validation (30 minutes) + +### Test 1: DERP Backup (Strategic Query) + +**Simulate Claude outage:** +1. Load Qwen model: `ollama run qwen2.5-coder:72b` +2. In Dify Operations workspace, ask: + - "Should I deploy Mailcow before or after Frostwall Protocol?" +3. Verify: + - Response references both task docs + - Shows dependency understanding + - Recommends Frostwall first + +### Test 2: Discord Bot (Staff Query) + +As staff member in Discord: +``` +/ask How many game servers are running? +``` +Should return infrastructure details + +### Test 3: Discord Bot (Subscriber Query) + +As subscriber in Discord: +``` +/ask What modpacks are available? +``` +Should return modpack list (limited to public info) + +### Test 4: Resource Monitoring + +```bash +# Check RAM usage with model loaded +free -h +# Should show ~92GB used when Qwen loaded + +# Check disk usage +df -h /opt/dify +# Should show ~97GB used + +# Check Docker containers +docker ps +# All Dify services should be running +``` + +--- + +## Phase 6: Documentation (1 hour) + +### Create Usage Guide + +Document at `/opt/dify/USAGE-GUIDE.md`: +- When to use Claude (primary) +- When to use DERP (Claude down) +- When to use Discord bot (routine queries) +- Emergency procedures + +### Update Operations Manual + +Commit changes to Git: +- Task documentation updated +- Deployment plan complete +- Usage guide created + +--- + +## Success Criteria Checklist + +- [ ] Dify deployed and accessible at https://ai.firefrostgaming.com +- [ ] Ollama running with all 3 models loaded +- [ ] Operations workspace indexing complete (416 files) +- [ ] Brainstorming workspace indexing complete +- [ ] DERP backup tested (strategic 
query works)
+- [ ] Discord bot deployed and running
+- [ ] Staff can query via Discord (/ask command)
+- [ ] Subscribers have limited access
+- [ ] Resource usage within TX1 limits (~92GB RAM, ~97GB storage)
+- [ ] Documentation complete and committed to Git
+- [ ] Zero additional monthly cost confirmed
+
+---
+
+## Rollback Plan
+
+If deployment fails:
+
+```bash
+# Stop all services
+cd /opt/dify
+docker-compose down
+
+# Stop Discord bot
+systemctl stop firefrost-discord-bot
+systemctl disable firefrost-discord-bot
+
+# Remove installation
+rm -rf /opt/dify
+rm -rf /opt/firefrost-discord-bot
+rm /etc/systemd/system/firefrost-discord-bot.service
+systemctl daemon-reload
+
+# Remove Nginx config
+rm /etc/nginx/sites-enabled/ai.firefrostgaming.com
+rm /etc/nginx/sites-available/ai.firefrostgaming.com
+nginx -t && systemctl reload nginx
+
+# Uninstall Ollama (the installer ships no uninstall script;
+# remove the service, binary, and model store manually)
+systemctl stop ollama
+systemctl disable ollama
+rm /etc/systemd/system/ollama.service
+rm $(which ollama)
+rm -rf /usr/share/ollama
+```
+
+---
+
+**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️
diff --git a/docs/tasks/self-hosted-ai-stack-on-tx1/resource-requirements.md b/docs/tasks/self-hosted-ai-stack-on-tx1/resource-requirements.md
new file mode 100644
index 0000000..e668e5b
--- /dev/null
+++ b/docs/tasks/self-hosted-ai-stack-on-tx1/resource-requirements.md
@@ -0,0 +1,367 @@
+# AI Stack Resource Requirements
+
+**Server:** TX1 Dallas (38.68.14.26)
+**Purpose:** Resource allocation planning
+**Last Updated:** 2026-02-18
+
+---
+
+## TX1 Server Specifications
+
+**CPU:** 32 vCPU
+**RAM:** 256GB
+**Storage:** 1TB NVMe SSD
+**Location:** Dallas, TX
+**Network:** 1Gbps
+
+**Current Usage (before AI stack):**
+- Game servers: 6 Minecraft instances
+- Management services: Minimal overhead
+- Available for AI: Significant capacity
+
+---
+
+## Storage Requirements
+
+### Component Breakdown
+
+| Component | Size | Purpose |
+|-----------|------|---------|
+| **Qwen 2.5 Coder 72B** | ~40GB | Infrastructure/coding model |
+| **Llama 3.3 70B** | ~40GB | General reasoning model |
+| **Llama 3.2 
Vision 11B** | ~7GB | Image analysis model | +| **Dify Services** | ~5GB | Docker containers, databases | +| **Knowledge Base** | ~5GB | Indexed docs, embeddings | +| **Logs & Temp** | ~2GB | Operational overhead | +| **Total** | **~99GB** | ✅ Well under 1TB limit | + +### Storage Growth Estimate + +**Year 1:** +- Models: 87GB (static, no growth unless upgrading) +- Knowledge base: 5GB → 8GB (as docs grow) +- Logs: 2GB → 5GB (6 months rotation) +- **Total Year 1:** ~100GB + +**Storage is NOT a concern.** + +--- + +## RAM Requirements + +### Scenario 1: Normal Operations (Claude Available) + +| Component | RAM Usage | +|-----------|-----------| +| **Dify Services** | ~4GB | +| **PostgreSQL** | ~2GB | +| **Redis** | ~1GB | +| **Ollama (idle)** | <1GB | +| **Total (idle)** | **~8GB** ✅ | + +**Game servers have ~248GB available** (256GB - 8GB) + +--- + +### Scenario 2: DERP Activated (Claude Down, Emergency) + +**Load ONE large model at a time:** + +| Component | RAM Usage | +|-----------|-----------| +| **Qwen 2.5 Coder 72B** OR **Llama 3.3 70B** | ~80GB | +| **Dify Services** | ~4GB | +| **PostgreSQL** | ~2GB | +| **Redis** | ~1GB | +| **Ollama Runtime** | ~2GB | +| **OS Overhead** | ~3GB | +| **Total (active DERP)** | **~92GB** ✅ | + +**Game servers have ~164GB available** (256GB - 92GB) + +**Critical:** DO NOT load both large models simultaneously (160GB would impact game servers) + +--- + +### Scenario 3: Vision Model Only (Screenshot Analysis) + +| Component | RAM Usage | +|-----------|-----------| +| **Llama 3.2 Vision 11B** | ~7GB | +| **Dify Services** | ~4GB | +| **Other Services** | ~3GB | +| **Total** | **~14GB** ✅ | + +**Very lightweight, can run alongside game servers with no impact** + +--- + +## CPU Requirements + +### Model Inference Performance + +**TX1 has 32 vCPU (shared among all services)** + +**Expected Inference Times:** + +| Model | Token Generation Speed | Typical Response | +|-------|----------------------|------------------| +| **Qwen 2.5 
Coder 72B** | ~3-5 tokens/second | 30-120 seconds | +| **Llama 3.3 70B** | ~3-5 tokens/second | 30-120 seconds | +| **Llama 3.2 Vision 11B** | ~8-12 tokens/second | 10-45 seconds | + +**For comparison:** +- Claude API: 20-40 tokens/second +- **DERP is 5-10× slower** (this is expected and acceptable for emergency use) + +**CPU Impact on Game Servers:** +- During DERP inference: ~70-80% CPU usage (temporary spikes) +- Game servers may experience brief lag during AI responses +- **Acceptable for emergency use** (not for normal operations) + +--- + +## Network Requirements + +### Initial Model Downloads (One-Time) + +| Model | Size | Download Time (1Gbps) | +|-------|------|----------------------| +| **Qwen 2.5 Coder 72B** | ~40GB | 5-10 minutes | +| **Llama 3.3 70B** | ~40GB | 5-10 minutes | +| **Llama 3.2 Vision 11B** | ~7GB | 1-2 minutes | +| **Total** | **~87GB** | **15-25 minutes** | + +**Reality:** Download speeds vary, budget 2-4 hours for all models. + +**Recommendation:** Download overnight to avoid impacting game server traffic. + +--- + +### Ongoing Bandwidth + +**Dify Web Interface:** +- Minimal (text-based queries) +- ~1-5 KB per query +- Negligible impact + +**Discord Bot:** +- Text-based queries only +- ~1-5 KB per query +- Negligible impact + +**Model Updates:** +- Infrequent (quarterly at most) +- Same as initial download (~87GB) +- Schedule during low-traffic periods + +--- + +## Resource Allocation Strategy + +### Priority Levels + +**Priority 1 (Always):** Game Servers +**Priority 2 (Normal):** Management Services (Pterodactyl, Gitea, etc.) 
+**Priority 3 (Emergency Only):** DERP AI Stack + +### RAM Allocation Rules + +**Normal Operations:** +- Game servers: Up to 240GB +- Management: ~8GB +- AI Stack (idle): ~8GB +- **Total: 256GB** ✅ + +**DERP Emergency:** +- Game servers: Temporarily limited to 160GB +- Management: ~8GB +- AI Stack (active): ~92GB +- **Total: 260GB** ⚠️ (4GB overcommit acceptable for brief periods) + +**If RAM pressure occurs during DERP:** +1. Unload one game server temporarily +2. Run AI query +3. Reload game server +4. **Total downtime per query: <5 minutes** + +--- + +## Monitoring & Alerts + +### Critical Thresholds + +**RAM Usage:** +- **Warning:** >220GB used (85%) +- **Critical:** >240GB used (93%) +- **Action:** Defer DERP usage or unload game server + +**CPU Usage:** +- **Warning:** >80% sustained for >5 minutes +- **Critical:** >90% sustained for >2 minutes +- **Action:** Pause AI inference, prioritize game servers + +**Storage:** +- **Warning:** >800GB used (80%) +- **Critical:** >900GB used (90%) +- **Action:** Clean up old logs, model cache + +### Monitoring Commands + +```bash +# Check RAM +free -h + +# Check CPU +htop + +# Check storage +df -h / + +# Check Ollama status +ollama list +ollama ps # Shows loaded models + +# Check Dify +cd /opt/dify +docker-compose ps +docker stats # Real-time resource usage +``` + +--- + +## Resource Optimization + +### Unload Models When Not Needed + +```bash +# Unload all models (frees RAM) +ollama stop qwen2.5-coder:72b +ollama stop llama3.3:70b +ollama stop llama3.2-vision:11b + +# Verify RAM freed +free -h +``` + +### Preload Models for Faster Response + +```bash +# Preload model (takes ~30 seconds) +ollama run qwen2.5-coder:72b "" +# Model now in RAM, queries will be faster +``` + +### Schedule Maintenance Windows + +**Best time for model downloads/updates:** +- Tuesday/Wednesday 2-6 AM CST (lowest traffic) +- Announce in Discord 24 hours ahead +- Expected downtime: <10 minutes + +--- + +## Capacity Planning + +### Current State 
(Feb 2026) +- **Game servers:** 6 active +- **RAM available:** 256GB +- **Storage available:** 1TB +- **AI stack:** Fits comfortably + +### Growth Scenarios + +**Scenario 1: Add 6 more game servers (12 total)** +- Additional RAM needed: ~60GB +- Available for AI (normal): 248GB → 188GB ✅ +- Available for AI (DERP): 164GB → 104GB ✅ +- **Status:** Still viable + +**Scenario 2: Add 12 more game servers (18 total)** +- Additional RAM needed: ~120GB +- Available for AI (normal): 248GB → 128GB ✅ +- Available for AI (DERP): 164GB → 44GB ⚠️ +- **Status:** DERP would require unloading 2 game servers + +**Scenario 3: Upgrade to larger models (theoretical)** +- Qwen 3.0 Coder 170B: ~180GB RAM +- **Status:** Would NOT fit alongside game servers +- **Recommendation:** Stick with 72B models + +### Upgrade Path + +**If TX1 reaches capacity:** + +**Option A: Add second dedicated AI server** +- Move AI stack to separate VPS +- TX1 focuses only on game servers +- Cost: ~$100-200/month (NOT DERP-compliant) + +**Option B: Upgrade TX1 RAM** +- 256GB → 512GB +- Cost: Contact Hetzner for pricing +- **Preferred:** Maintains DERP compliance + +**Option C: Use smaller AI models** +- Qwen 2.5 Coder 32B (~35GB RAM) +- Llama 3.2 8B (~8GB RAM) +- **Tradeoff:** Lower quality, but more capacity + +--- + +## Disaster Recovery + +### Backup Strategy + +**What to backup:** +- ✅ Dify configuration files +- ✅ Knowledge base data +- ✅ Discord bot code +- ❌ Models (can re-download) + +**Backup location:** +- Git repository (for configs/code) +- NC1 Charlotte (for knowledge base) + +**Backup frequency:** +- Configurations: After every change +- Knowledge base: Weekly +- Models: No backup needed + +### Recovery Procedure + +**If TX1 fails completely:** + +1. Deploy Dify on NC1 (temporary) +2. Restore knowledge base from backup +3. Re-download models (~4 hours) +4. Point Discord bot to NC1 +5. 
**Downtime: 4-6 hours** + +**Note:** This is acceptable for DERP (emergency-only system) + +--- + +## Cost Analysis + +### One-Time Costs +- Setup time: 6-8 hours (Michael's time) +- Model downloads: Bandwidth usage (included in hosting) +- **Total: $0** (sweat equity only) + +### Monthly Costs +- Hosting: $0 (using existing TX1) +- Bandwidth: $0 (included in hosting) +- Maintenance: ~1 hour/month (Michael's time) +- **Total: $0/month** ✅ + +### Opportunity Cost +- RAM reserved for AI: ~8GB (idle) or ~92GB (active DERP) +- Could host 1-2 more game servers in that space +- **Acceptable tradeoff:** DERP independence worth more than 2 game servers + +--- + +**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️ + +**TX1 has the capacity. Resources are allocated wisely. $0 monthly cost maintained.** diff --git a/docs/tasks/self-hosted-ai-stack-on-tx1/usage-guide.md b/docs/tasks/self-hosted-ai-stack-on-tx1/usage-guide.md new file mode 100644 index 0000000..0967119 --- /dev/null +++ b/docs/tasks/self-hosted-ai-stack-on-tx1/usage-guide.md @@ -0,0 +1,342 @@ +# AI Stack Usage Guide + +**Purpose:** Know which AI system to use when +**Last Updated:** 2026-02-18 + +--- + +## The Three-Tier System + +### Tier 1: Claude Projects (Primary) - **USE THIS FIRST** + +**Who:** Michael + Meg +**Where:** claude.ai or Claude app +**Cost:** $20/month (already paying) + +**When to use:** +- ✅ **Normal daily operations** (99% of the time) +- ✅ **Strategic decision-making** (deployment order, architecture) +- ✅ **Complex reasoning** (tradeoffs, dependencies) +- ✅ **Session continuity** (remembers context across days) +- ✅ **Best experience** (fastest, most capable) + +**What Claude can do:** +- Search entire 416-file operations manual +- Write deployment scripts +- Review infrastructure decisions +- Generate documentation +- Debug issues +- Plan roadmaps + +**Example queries:** +- "Should I deploy Mailcow or AI stack first?" 
+- "Write a script to deploy Frostwall Protocol" +- "What tasks depend on NC1 cleanup?" +- "Help me troubleshoot this Pterodactyl error" + +**Limitations:** +- Requires internet connection +- Subject to Anthropic availability + +--- + +### Tier 2: DERP Backup (Emergency Only) - **WHEN CLAUDE IS DOWN** + +**Who:** Michael + Meg +**Where:** https://ai.firefrostgaming.com +**Cost:** $0/month (self-hosted on TX1) + +**When to use:** +- ❌ **Not for normal operations** (Claude is faster/better) +- ✅ **Anthropic outage** (Claude unavailable for hours) +- ✅ **Emergency infrastructure decisions** (can't wait for Claude) +- ✅ **Critical troubleshooting** (server down, need immediate help) + +**What DERP can do:** +- Query indexed operations manual (416 files) +- Strategic reasoning with 128K context +- Infrastructure troubleshooting +- Code generation +- Emergency deployment guidance + +**Available models:** +- **Qwen 2.5 Coder 72B** - Infrastructure/coding questions +- **Llama 3.3 70B** - General reasoning +- **Llama 3.2 Vision 11B** - Screenshot analysis + +**Example queries:** +- "Claude is down. What's the deployment order for Frostwall?" +- "Emergency: Mailcow not starting. Check logs and diagnose." +- "Need to deploy something NOW. What dependencies are missing?" + +**Limitations:** +- Slower inference than Claude +- No session continuity +- Manual model selection +- Uses TX1 resources (~80GB RAM when active) + +**How to activate:** +1. Verify Claude is unavailable (try multiple times) +2. Go to https://ai.firefrostgaming.com +3. Select workspace: + - **Operations** - Infrastructure decisions + - **Brainstorming** - Creative work +4. Select model: + - **Qwen 2.5 Coder** - For deployment/troubleshooting + - **Llama 3.3** - For general questions +5. Ask question +6. 
Copy/paste response as needed + +**When to deactivate:** +- Claude comes back online +- Emergency resolved +- Free up TX1 RAM for game servers + +--- + +### Tier 3: Discord Bot (Staff/Subscribers) - **ROUTINE QUERIES** + +**Who:** Staff + Subscribers +**Where:** Firefrost Discord server +**Cost:** $0/month (same infrastructure) + +**When to use:** +- ✅ **Routine questions** (daily operations) +- ✅ **Quick lookups** (server status, modpack info) +- ✅ **Staff training** (how-to queries) +- ✅ **Subscriber support** (basic info) + +**Commands:** + +**`/ask [question]`** +- Available to: Staff + Subscribers +- Searches: Operations workspace (staff) or public docs (subscribers) +- Rate limit: 10 queries/hour per user + +**Example queries (Staff):** +``` +/ask How many game servers are running? +/ask What's the Whitelist Manager deployment status? +/ask How do I restart a Minecraft server? +``` + +**Example queries (Subscribers):** +``` +/ask What modpacks are available? +/ask How do I join a server? +/ask What's the difference between Fire and Frost paths? +``` + +**Role-based access:** +- **Staff:** Full Operations workspace access +- **Subscribers:** Public documentation only +- **No role:** Cannot use bot + +**Limitations:** +- Simple queries only (no complex reasoning) +- No file uploads +- No strategic decisions +- Rate limited + +--- + +## Decision Tree + +``` +┌─────────────────────────────────────┐ +│ Do you need AI assistance? │ +└─────────────┬───────────────────────┘ + │ + ▼ + ┌───────────────┐ + │ Is it urgent? │ + └───┬───────┬───┘ + │ │ + NO│ │YES + │ │ + ▼ ▼ + ┌─────────┐ ┌──────────────┐ + │ Claude │ │ Is Claude │ + │ working?│ │ available? 
│ + └───┬─────┘ └──┬───────┬───┘ + │ │ │ + YES│ YES│ │NO + │ │ │ + ▼ ▼ ▼ + ┌──────────┐ ┌──────────┐ ┌─────────┐ + │Use Claude│ │Use Claude│ │Use DERP │ + │Projects │ │Projects │ │Backup │ + └──────────┘ └──────────┘ └─────────┘ +``` + +**For staff/subscribers:** +``` +┌────────────────────────────┐ +│ Simple routine query? │ +└──────────┬─────────────────┘ + │ + YES + │ + ▼ + ┌──────────────┐ + │ Use Discord │ + │ Bot: /ask │ + └──────────────┘ +``` + +--- + +## Emergency Procedures + +### Scenario 1: Claude Down, Need Strategic Decision + +**Problem:** Anthropic outage, need to deploy something NOW + +**Solution:** +1. Verify Claude truly unavailable (try web + app) +2. Go to https://ai.firefrostgaming.com +3. Login with Michael's account +4. Select Operations workspace +5. Select Qwen 2.5 Coder model +6. Ask strategic question +7. Copy deployment commands +8. Execute carefully (no session memory!) + +**Note:** DERP doesn't remember context. Be explicit in each query. + +### Scenario 2: Discord Bot Down + +**Problem:** Staff reporting bot not responding + +**Check status:** +```bash +ssh root@38.68.14.26 +systemctl status firefrost-discord-bot +``` + +**If stopped:** +```bash +systemctl start firefrost-discord-bot +``` + +**If errors:** +```bash +journalctl -u firefrost-discord-bot -f +# Check for API errors, token issues +``` + +**If Dify down:** +```bash +cd /opt/dify +docker-compose ps +# If services down: +docker-compose up -d +``` + +### Scenario 3: Model Won't Load + +**Problem:** DERP system reports "model unavailable" + +**Check Ollama:** +```bash +ollama list +# Should show: qwen2.5-coder:72b, llama3.3:70b, llama3.2-vision:11b +``` + +**If models missing:** +```bash +# Re-download +ollama pull qwen2.5-coder:72b +ollama pull llama3.3:70b +ollama pull llama3.2-vision:11b +``` + +**Check RAM:** +```bash +free -h +# If <90GB free, unload game servers temporarily +``` + +--- + +## Cost Tracking + +### Monthly Costs +- **Claude Projects:** $20/month (primary 
system)
+- **Dify:** $0/month (self-hosted)
+- **Ollama:** $0/month (self-hosted)
+- **Discord Bot:** $0/month (self-hosted)
+- **Total:** $20/month ✅
+
+### Resource Usage (TX1)
+- **Storage:** ~97GB (one-time)
+- **RAM (active DERP):** ~92GB (temporary)
+- **RAM (idle):** ~8GB (normal)
+- **Bandwidth:** Models downloaded once, minimal ongoing
+
+---
+
+## Performance Expectations
+
+### Claude Projects (Primary)
+- **Response time:** 5-30 seconds
+- **Quality:** Excellent (GPT-4 class)
+- **Context:** Full repo (416 files)
+- **Session memory:** Yes
+
+### DERP Backup (Emergency)
+- **Response time:** 30-120 seconds (slower than Claude)
+- **Quality:** Good (GPT-3.5 to GPT-4 class depending on model)
+- **Context:** 128K tokens per query
+- **Session memory:** No (each query independent)
+
+### Discord Bot (Routine)
+- **Response time:** 10-45 seconds
+- **Quality:** Good for simple queries
+- **Context:** Knowledge base search
+- **Rate limit:** 10 queries/hour per user
+
+---
+
+## Best Practices
+
+### For Michael + Meg:
+1. ✅ **Always use Claude Projects first** (best experience)
+2. ✅ **Only use DERP for true emergencies** (Claude unavailable)
+3. ✅ **Document DERP usage** (so the results can be fed back to Claude later)
+4. ✅ **Free TX1 RAM after DERP use** (restart Ollama if needed)
+
+### For Staff:
+1. ✅ **Use Discord bot for quick lookups** (fast, simple)
+2. ✅ **Ask Michael/Meg for complex questions** (they have Claude)
+3. ✅ **Don't abuse rate limits** (10 queries/hour is generous)
+4. ✅ **Report bot issues immediately** (don't let it stay broken)
+
+### For Subscribers:
+1. ✅ **Use Discord bot for server info** (join instructions, modpacks)
+2. ✅ **Don't ask for staff-only info** (bot will decline)
+3. ✅ **Be patient** (bot shares resources with staff)
+
+---
+
+## Training & Onboarding
+
+### New Staff Training:
+1. Introduce Discord bot commands (`/ask`)
+2. Show example queries (moderation, server management)
+3. Explain rate limits
+4. 
Explain when to escalate to Michael/Meg
+
+### Subscriber Communication:
+1. Announce the bot in Discord
+2. Pin a message with the `/ask` command
+3. Post example queries in the welcome channel
+4. Add an FAQ: "What can the bot answer?"
+
+---
+
+**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️
+
+**Remember: Claude first, DERP only when necessary, Discord bot for routine queries.**
+
+**Monthly cost: $20 (no increase)**
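
---

## Appendix: Access Control Sketch

The bot rules above (role-based workspace routing plus the 10 queries/hour limit) can be sketched as follows. This is an illustrative sketch only, not the deployed bot code: `RateLimiter`, `resolve_workspace`, and the role strings are hypothetical stand-ins for whatever the actual Discord integration uses.

```python
# Illustrative sketch of the documented bot rules: role-based workspace
# routing + a 10 queries/hour sliding-window rate limit. All names here
# are hypothetical -- not the deployed implementation.
import time
from collections import defaultdict, deque

MAX_QUERIES_PER_HOUR = 10  # documented rate limit per user
WINDOW_SECONDS = 3600


class RateLimiter:
    """Sliding-window limiter: at most `limit` queries per user per window."""

    def __init__(self, limit=MAX_QUERIES_PER_HOUR, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # user_id -> timestamps of recent queries

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        q = self.hits[user_id]
        # Expire timestamps that have fallen out of the window
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the per-hour limit
        q.append(now)
        return True


def resolve_workspace(roles):
    """Map Discord roles to a knowledge workspace, per the access table above."""
    if "staff" in roles:
        return "operations"   # staff: full Operations workspace
    if "subscriber" in roles:
        return "public-docs"  # subscribers: public documentation only
    return None               # no role: cannot use the bot
```

A `/ask` handler would call `resolve_workspace` first (declining users with no role), then `RateLimiter.allow` before forwarding the query to Dify.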