firefrost-operations-manual/docs/tasks/self-hosted-ai-stack-on-tx1/usage-guide.md
The Chronicler 96f20e8715 Task #9: Rewrite AI Stack architecture for DERP compliance
Complete rewrite of self-hosted AI stack (Task #9) with new DERP-compliant architecture:

CHANGES:
- Architecture: AnythingLLM+OpenWebUI → Dify+Ollama (DERP-compliant)
- Cost model: $0/month additional (self-hosted on TX1, no external APIs)
- Usage tiers: Claude Projects (primary) → DERP backup (emergency) → Discord bots (staff/subscribers)
- Time estimate: 8-12hrs → 6-8hrs (more focused deployment)
- Resource allocation: 97GB storage, 92GB RAM when active (vs 150GB/110GB)

NEW DOCUMENTATION:
- README.md: Complete architecture rewrite with three-tier usage model
- deployment-plan.md: Step-by-step deployment (6 phases, all commands included)
- usage-guide.md: Decision tree for when to use Claude vs DERP vs bots
- resource-requirements.md: TX1 capacity planning, monitoring, disaster recovery

KEY FEATURES:
- Zero additional monthly cost (beyond existing $20 Claude Pro)
- True DERP compliance (fully self-hosted when Claude unavailable)
- Knowledge graph RAG (indexes entire 416-file repo)
- Discord bot integration (role-based staff/subscriber access)
- Emergency procedures documented
- Capacity planning for growth (up to 18 game servers)

MODELS:
- Qwen 2.5 Coder 72B (infrastructure/coding, 128K context)
- Llama 3.3 70B (general reasoning, 128K context)
- Llama 3.2 Vision 11B (screenshot analysis)

Updated tasks.md summary to reflect new architecture.

Status: Ready for deployment (pending medical clearance)

Fire + Frost + Foundation + DERP = True Independence 💙🔥❄️
2026-02-18 17:27:25 +00:00


# AI Stack Usage Guide
**Purpose:** Know which AI system to use when
**Last Updated:** 2026-02-18
---
## The Three-Tier System
### Tier 1: Claude Projects (Primary) - **USE THIS FIRST**
**Who:** Michael + Meg
**Where:** claude.ai or Claude app
**Cost:** $20/month (already paying)
**When to use:**
- **Normal daily operations** (99% of the time)
- **Strategic decision-making** (deployment order, architecture)
- **Complex reasoning** (tradeoffs, dependencies)
- **Session continuity** (remembers context across days)
- **Best experience** (fastest, most capable)
**What Claude can do:**
- Search entire 416-file operations manual
- Write deployment scripts
- Review infrastructure decisions
- Generate documentation
- Debug issues
- Plan roadmaps
**Example queries:**
- "Should I deploy Mailcow or AI stack first?"
- "Write a script to deploy Frostwall Protocol"
- "What tasks depend on NC1 cleanup?"
- "Help me troubleshoot this Pterodactyl error"
**Limitations:**
- Requires internet connection
- Subject to Anthropic availability
---
### Tier 2: DERP Backup (Emergency Only) - **WHEN CLAUDE IS DOWN**
**Who:** Michael + Meg
**Where:** https://ai.firefrostgaming.com
**Cost:** $0/month (self-hosted on TX1)
**When to use:**
- **Not for normal operations** (Claude is faster/better)
- **Anthropic outage** (Claude unavailable for hours)
- **Emergency infrastructure decisions** (can't wait for Claude)
- **Critical troubleshooting** (server down, need immediate help)
**What DERP can do:**
- Query indexed operations manual (416 files)
- Strategic reasoning with 128K context
- Infrastructure troubleshooting
- Code generation
- Emergency deployment guidance
**Available models:**
- **Qwen 2.5 Coder 72B** - Infrastructure/coding questions
- **Llama 3.3 70B** - General reasoning
- **Llama 3.2 Vision 11B** - Screenshot analysis
**Example queries:**
- "Claude is down. What's the deployment order for Frostwall?"
- "Emergency: Mailcow not starting. Check logs and diagnose."
- "Need to deploy something NOW. What dependencies are missing?"
**Limitations:**
- Slower inference than Claude
- No session continuity
- Manual model selection
- Uses TX1 resources (~92GB RAM when active)
**How to activate:**
1. Verify Claude is unavailable (try multiple times)
2. Go to https://ai.firefrostgaming.com
3. Select workspace:
   - **Operations** - Infrastructure decisions
   - **Brainstorming** - Creative work
4. Select model:
   - **Qwen 2.5 Coder** - For deployment/troubleshooting
   - **Llama 3.3** - For general questions
5. Ask question
6. Copy/paste response as needed
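Step 1 ("try multiple times") can be sketched as a small helper. This is only an illustration: the probe URL, the attempt count, and the rule that "no response or a 5xx code means down" are assumptions, not part of the deployed stack. A reachable page also does not prove that Claude Projects itself is working, so treat the result as a hint.

```bash
#!/usr/bin/env bash
# Sketch: decide whether Claude looks unavailable before switching to DERP.
# CLAUDE_URL, the attempt count, and the delay are assumptions for illustration.
CLAUDE_URL="${CLAUDE_URL:-https://claude.ai}"

# Classify one HTTP status code as "up" or "down".
# curl reports 000 when it received no HTTP response at all.
classify_http_code() {
  case "$1" in
    000|5[0-9][0-9]) echo "down" ;;
    *)               echo "up" ;;
  esac
}

# Probe the URL several times; print "down" only if every attempt fails.
claude_status() {
  local attempts="${1:-3}" delay="${2:-20}" i=1 code
  while [ "$i" -le "$attempts" ]; do
    code="$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$CLAUDE_URL")"
    if [ "$(classify_http_code "$code")" = "up" ]; then
      echo "up"
      return 0
    fi
    # Wait before retrying, except after the final attempt.
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  echo "down"
}

# Example: [ "$(claude_status 3 20)" = "down" ] && echo "Switch to DERP"
```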
**When to deactivate:**
- Claude comes back online
- Emergency resolved
- Free up TX1 RAM for game servers
---
### Tier 3: Discord Bot (Staff/Subscribers) - **ROUTINE QUERIES**
**Who:** Staff + Subscribers
**Where:** Firefrost Discord server
**Cost:** $0/month (same infrastructure)
**When to use:**
- **Routine questions** (daily operations)
- **Quick lookups** (server status, modpack info)
- **Staff training** (how-to queries)
- **Subscriber support** (basic info)
**Commands:**
**`/ask [question]`**
- Available to: Staff + Subscribers
- Searches: Operations workspace (staff) or public docs (subscribers)
- Rate limit: 10 queries/hour per user
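One way to read "10 queries/hour per user" is as a sliding one-hour window. The sketch below assumes that interpretation; the guide does not say how the bot actually enforces the limit, and the function name is hypothetical.

```bash
# Sketch of a sliding-window check for "10 queries/hour per user".
# Assumption: the window slides; the real bot may use fixed hourly buckets.
# Usage: rate_limit_ok <now_epoch> <epoch_of_earlier_query>...
rate_limit_ok() {
  local now="$1" limit=10 window=3600 recent=0 ts
  shift
  for ts in "$@"; do
    # Count only queries made within the last hour.
    if [ $((now - ts)) -lt "$window" ]; then
      recent=$((recent + 1))
    fi
  done
  # Succeed (exit 0) only while the user is still under the limit.
  [ "$recent" -lt "$limit" ]
}

# Example: rate_limit_ok "$(date +%s)" 1700000000 1700000100 && echo "allowed"
```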
**Example queries (Staff):**
```
/ask How many game servers are running?
/ask What's the Whitelist Manager deployment status?
/ask How do I restart a Minecraft server?
```
**Example queries (Subscribers):**
```
/ask What modpacks are available?
/ask How do I join a server?
/ask What's the difference between Fire and Frost paths?
```
**Role-based access:**
- **Staff:** Full Operations workspace access
- **Subscribers:** Public documentation only
- **No role:** Cannot use bot
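The role rules above amount to a small lookup. As a rough sketch, with made-up role and workspace labels (the real Discord role names and Dify workspace IDs are not given here), a user with several roles gets the most privileged match:

```bash
# Sketch: map a user's Discord roles to the knowledge source /ask may search.
# Role names and workspace labels are illustrative assumptions.
# Usage: workspace_for_roles <role>...
workspace_for_roles() {
  local role result="denied"
  for role in "$@"; do
    case "$role" in
      staff)      echo "operations"; return 0 ;;  # staff wins immediately
      subscriber) result="public-docs" ;;         # kept unless staff is found
    esac
  done
  echo "$result"
}

# Example: workspace_for_roles subscriber staff   # prints "operations"
```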
**Limitations:**
- Simple queries only (no complex reasoning)
- No file uploads
- No strategic decisions
- Rate limited
---
## Decision Tree
```
┌─────────────────────────────────────┐
│ Do you need AI assistance?          │
└─────────────────┬───────────────────┘
                  │
                  ▼
          ┌───────────────┐
          │ Is it urgent? │
          └───┬───────┬───┘
              │       │
            NO│       │YES
              │       │
              ▼       ▼
      ┌─────────┐   ┌──────────────┐
      │ Claude  │   │ Is Claude    │
      │ working?│   │ available?   │
      └───┬─────┘   └──┬───────┬───┘
          │            │       │
       YES│         YES│       │NO
          │            │       │
          ▼            ▼       ▼
  ┌──────────┐ ┌──────────┐ ┌─────────┐
  │Use Claude│ │Use Claude│ │Use DERP │
  │Projects  │ │Projects  │ │Backup   │
  └──────────┘ └──────────┘ └─────────┘
```
**For staff/subscribers:**
```
┌────────────────────────────┐
│ Simple routine query?      │
└──────────┬─────────────────┘
           │
        YES│
           │
           ▼
    ┌──────────────┐
    │ Use Discord  │
    │ Bot: /ask    │
    └──────────────┘
```
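The two trees can also be written as one function, which makes the branches easier to check. This is a sketch with hypothetical labels (`owner` stands for Michael or Meg). The tree does not say what to do when a request is not urgent and Claude is down; the `wait-for-claude` result is an assumption based on the rule that DERP is for emergencies only.

```bash
# Sketch of the decision trees as a function.
# Usage: choose_ai_tier <role> <urgent: yes|no> <claude: up|down>
choose_ai_tier() {
  local role="$1" urgent="${2:-no}" claude="${3:-up}"
  case "$role" in
    staff|subscriber)
      echo "discord-bot" ;;
    owner)
      if [ "$claude" = "up" ]; then
        echo "claude-projects"
      elif [ "$urgent" = "yes" ]; then
        echo "derp-backup"
      else
        # Assumption: this branch is not covered by the tree.
        echo "wait-for-claude"
      fi ;;
    *)
      echo "none" ;;
  esac
}

# Example: choose_ai_tier owner yes down   # prints "derp-backup"
```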
---
## Emergency Procedures
### Scenario 1: Claude Down, Need Strategic Decision
**Problem:** Anthropic outage, need to deploy something NOW
**Solution:**
1. Verify Claude is truly unavailable (try web + app)
2. Go to https://ai.firefrostgaming.com
3. Login with Michael's account
4. Select Operations workspace
5. Select Qwen 2.5 Coder model
6. Ask strategic question
7. Copy deployment commands
8. Execute carefully (no session memory!)
**Note:** DERP doesn't remember context. Be explicit in each query.
### Scenario 2: Discord Bot Down
**Problem:** Staff reporting bot not responding
**Check status:**
```bash
ssh root@38.68.14.26
systemctl status firefrost-discord-bot
```
**If stopped:**
```bash
systemctl start firefrost-discord-bot
```
**If errors:**
```bash
journalctl -u firefrost-discord-bot -f
# Check for API errors, token issues
```
**If Dify down:**
```bash
cd /opt/dify
docker-compose ps
# If services down:
docker-compose up -d
```
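The checks in this scenario can be chained into a simple triage helper. The sketch below only maps the state printed by `systemctl is-active` to the next step listed above; the step labels are invented for illustration, and the choice to inspect logs and then Dify when the unit is active but unresponsive is an assumption.

```bash
# Sketch: pick the next Scenario 2 step from the unit state.
# Usage: bot_next_step "$(systemctl is-active firefrost-discord-bot)"
bot_next_step() {
  case "$1" in
    inactive) echo "start-service" ;;        # systemctl start firefrost-discord-bot
    failed)   echo "read-logs" ;;            # journalctl -u firefrost-discord-bot
    active)   echo "read-logs-then-dify" ;;  # unit runs but bot is silent
    *)        echo "read-logs" ;;            # activating, deactivating, unknown
  esac
}
```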
### Scenario 3: Model Won't Load
**Problem:** DERP system reports "model unavailable"
**Check Ollama:**
```bash
ollama list
# Should show: qwen2.5-coder:72b, llama3.3:70b, llama3.2-vision:11b
```
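Comparing the list by eye is error-prone, so the check can be scripted. This sketch assumes `ollama list` prints a header row followed by one model per line with the name in the first column; `missing_models` is a hypothetical helper name.

```bash
# Sketch: report which required models are absent from `ollama list` output.
REQUIRED_MODELS="qwen2.5-coder:72b llama3.3:70b llama3.2-vision:11b"

# Reads `ollama list` output on stdin; prints each missing model on its own line.
missing_models() {
  local installed model
  # Skip the header row and keep only the first column (the model name).
  installed="$(awk 'NR > 1 { print $1 }')"
  for model in $REQUIRED_MODELS; do
    if ! printf '%s\n' "$installed" | grep -qxF "$model"; then
      echo "$model"
    fi
  done
}

# Example: ollama list | missing_models
```

An empty result means all three models are present.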
**If models missing:**
```bash
# Re-download
ollama pull qwen2.5-coder:72b
ollama pull llama3.3:70b
ollama pull llama3.2-vision:11b
```
**Check RAM:**
```bash
free -h
# If <90GB free, unload game servers temporarily
```
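The same threshold can be checked without reading `free -h` by hand. This is a sketch that assumes a Linux `/proc/meminfo` and reads the 90GB figure as GiB of `MemAvailable`, which is usually a better guide than the "free" column; the helper name and path argument are illustrative.

```bash
# Sketch: succeed only if at least <min_gib> GiB of RAM is available.
# Usage: ram_ok <min_gib> [meminfo_path]
ram_ok() {
  local min_gib="$1" meminfo="${2:-/proc/meminfo}"
  # MemAvailable is reported in kB; 1 GiB = 1024 * 1024 kB.
  awk -v min="$min_gib" '
    /^MemAvailable:/ { avail_kib = $2; found = 1 }
    END { exit !(found && avail_kib >= min * 1024 * 1024) }
  ' "$meminfo"
}

# Example: ram_ok 90 || echo "Not enough RAM: stop game servers first"
```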
---
## Cost Tracking
### Monthly Costs
- **Claude Projects:** $20/month (primary system)
- **Dify:** $0/month (self-hosted)
- **Ollama:** $0/month (self-hosted)
- **Discord Bot:** $0/month (self-hosted)
- **Total:** $20/month ✅
### Resource Usage (TX1)
- **Storage:** ~97GB (one-time)
- **RAM (active DERP):** ~92GB (temporary)
- **RAM (idle):** <5GB (normal)
- **Bandwidth:** Models downloaded once, minimal ongoing
---
## Performance Expectations
### Claude Projects (Primary)
- **Response time:** 5-30 seconds
- **Quality:** Excellent (GPT-4 class)
- **Context:** Full repo (416 files)
- **Session memory:** Yes
### DERP Backup (Emergency)
- **Response time:** 30-120 seconds (slower than Claude)
- **Quality:** Good (GPT-3.5 to GPT-4 class depending on model)
- **Context:** 128K tokens per query
- **Session memory:** No (each query independent)
### Discord Bot (Routine)
- **Response time:** 10-45 seconds
- **Quality:** Good for simple queries
- **Context:** Knowledge base search
- **Rate limit:** 10 queries/hour per user
---
## Best Practices
### For Michael + Meg:
1. **Always use Claude Projects first** (best experience)
2. **Only use DERP for true emergencies** (Claude unavailable)
3. **Document DERP usage** (so Claude can learn from it later)
4. **Free TX1 RAM after DERP use** (restart Ollama if needed)
### For Staff:
1. **Use Discord bot for quick lookups** (fast, simple)
2. **Ask Michael/Meg for complex questions** (they have Claude)
3. **Don't abuse rate limits** (10 queries/hour is generous)
4. **Report bot issues immediately** (don't let it stay broken)
### For Subscribers:
1. **Use Discord bot for server info** (join instructions, modpacks)
2. **Don't ask for staff-only info** (bot will decline)
3. **Be patient** (bot shares resources with staff)
---
## Training & Onboarding
### New Staff Training:
1. Introduce Discord bot commands (`/ask`)
2. Show example queries (moderation, server management)
3. Explain rate limits
4. Explain when to escalate to Michael/Meg
### Subscriber Communication:
1. Announce bot in Discord
2. Pin message with `/ask` command
3. Example queries in welcome channel
4. FAQ: "What can the bot answer?"
---
**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️
**Remember: Claude first, DERP only when necessary, Discord bot for routine queries.**
**Monthly cost: $20 (no increase)**