# AI Stack Usage Guide

**Purpose:** Know which AI system to use when
**Last Updated:** 2026-02-18

---

## The Three-Tier System

### Tier 1: Claude Projects (Primary) - **USE THIS FIRST**

**Who:** Michael + Meg
**Where:** claude.ai or Claude app
**Cost:** $20/month (already paying)

**When to use:**
- ✅ **Normal daily operations** (99% of the time)
- ✅ **Strategic decision-making** (deployment order, architecture)
- ✅ **Complex reasoning** (tradeoffs, dependencies)
- ✅ **Session continuity** (remembers context across days)
- ✅ **Best experience** (fastest, most capable)

**What Claude can do:**
- Search entire 416-file operations manual
- Write deployment scripts
- Review infrastructure decisions
- Generate documentation
- Debug issues
- Plan roadmaps

**Example queries:**
- "Should I deploy Mailcow or AI stack first?"
- "Write a script to deploy Frostwall Protocol"
- "What tasks depend on NC1 cleanup?"
- "Help me troubleshoot this Pterodactyl error"

**Limitations:**
- Requires internet connection
- Subject to Anthropic availability

---

### Tier 2: DERP Backup (Emergency Only) - **WHEN CLAUDE IS DOWN**

**Who:** Michael + Meg
**Where:** https://ai.firefrostgaming.com
**Cost:** $0/month (self-hosted on TX1)

**When to use:**
- ❌ **Not for normal operations** (Claude is faster/better)
- ✅ **Anthropic outage** (Claude unavailable for hours)
- ✅ **Emergency infrastructure decisions** (can't wait for Claude)
- ✅ **Critical troubleshooting** (server down, need immediate help)

**What DERP can do:**
- Query indexed operations manual (416 files)
- Strategic reasoning with 128K context
- Infrastructure troubleshooting
- Code generation
- Emergency deployment guidance

**Available models:**
- **Qwen 2.5 Coder 72B** - Infrastructure/coding questions
- **Llama 3.3 70B** - General reasoning
- **Llama 3.2 Vision 11B** - Screenshot analysis

**Example queries:**
- "Claude is down. What's the deployment order for Frostwall?"
- "Emergency: Mailcow not starting. Check logs and diagnose."
- "Need to deploy something NOW. What dependencies are missing?"
**Limitations:**
- Slower inference than Claude
- No session continuity
- Manual model selection
- Uses TX1 resources (~92GB RAM when active)

**How to activate:**
1. Verify Claude is unavailable (try multiple times)
2. Go to https://ai.firefrostgaming.com
3. Select workspace:
   - **Operations** - Infrastructure decisions
   - **Brainstorming** - Creative work
4. Select model:
   - **Qwen 2.5 Coder** - For deployment/troubleshooting
   - **Llama 3.3** - For general questions
5. Ask question
6. Copy/paste response as needed
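
Step 1 ("try multiple times") can be scripted. The following is a minimal sketch, assuming `curl` is installed and treating any HTTP response from `https://claude.ai` as "reachable". The names `retry_check` and `claude_reachable`, the attempt count, and the delay are illustrative and not part of the deployment. A reachable site does not prove the service is healthy, so this complements a manual check in the web UI and the app rather than replacing it.

```bash
#!/usr/bin/env bash
# retry_check ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times; returns 0 on the first success,
# 1 if every attempt fails.
retry_check() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Hypothetical probe: succeeds if claude.ai answers within 10 seconds.
claude_reachable() {
  curl --silent --output /dev/null --max-time 10 https://claude.ai
}
```

For example, `retry_check 3 20 claude_reachable || echo "Claude appears unreachable"` makes three attempts 20 seconds apart before suggesting a switch to DERP.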

**When to deactivate:**
- Claude comes back online
- Emergency resolved
- Free up TX1 RAM for game servers
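
Freeing the RAM means unloading whatever models Ollama still holds in memory. A rough sketch, assuming a recent Ollama release that provides the `ollama ps` and `ollama stop` subcommands; the helper names `loaded_models` and `unload_all` are made up for this example.

```bash
#!/usr/bin/env bash
# loaded_models: reads `ollama ps` output on stdin and prints the NAME
# column, skipping the header row and any blank lines.
loaded_models() {
  awk 'NR > 1 && NF > 0 { print $1 }'
}

# unload_all: stops every model that Ollama currently has loaded.
unload_all() {
  local model
  ollama ps | loaded_models | while read -r model; do
    ollama stop "$model"
  done
}
```

Run `unload_all` on TX1 once the emergency is over, then confirm with `free -h` that the memory has been returned.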

---

### Tier 3: Discord Bot (Staff/Subscribers) - **ROUTINE QUERIES**

**Who:** Staff + Subscribers
**Where:** Firefrost Discord server
**Cost:** $0/month (same infrastructure)

**When to use:**
- ✅ **Routine questions** (daily operations)
- ✅ **Quick lookups** (server status, modpack info)
- ✅ **Staff training** (how-to queries)
- ✅ **Subscriber support** (basic info)

**Commands:**

**`/ask [question]`**
- Available to: Staff + Subscribers
- Searches: Operations workspace (staff) or public docs (subscribers)
- Rate limit: 10 queries/hour per user

**Example queries (Staff):**
```
/ask How many game servers are running?
/ask What's the Whitelist Manager deployment status?
/ask How do I restart a Minecraft server?
```

**Example queries (Subscribers):**
```
/ask What modpacks are available?
/ask How do I join a server?
/ask What's the difference between Fire and Frost paths?
```

**Role-based access:**
- **Staff:** Full Operations workspace access
- **Subscribers:** Public documentation only
- **No role:** Cannot use bot
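
The role rules above amount to a small lookup. As an illustration only (the bot's real implementation is not shown in this guide), the mapping could be sketched like this; the function name `knowledge_source` and the identifiers `operations` and `public-docs` are placeholders.

```bash
#!/usr/bin/env bash
# knowledge_source ROLE
# Prints the knowledge source the bot may search for ROLE, or returns 1
# when the role has no access. All identifiers here are placeholders.
knowledge_source() {
  case "$1" in
    staff)      echo "operations" ;;
    subscriber) echo "public-docs" ;;
    *)          return 1 ;;
  esac
}
```

Denying by default (the `*` branch) means a new or misspelled role gets no access until it is added explicitly.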

**Limitations:**
- Simple queries only (no complex reasoning)
- No file uploads
- No strategic decisions
- Rate limited

---

## Decision Tree

```
┌─────────────────────────────────────┐
│ Do you need AI assistance?          │
└─────────────┬───────────────────────┘
              │
              ▼
      ┌───────────────┐
      │ Is it urgent? │
      └───┬───────┬───┘
          │       │
        NO│       │YES
          │       │
          ▼       ▼
      ┌─────────┐ ┌──────────────┐
      │ Claude  │ │ Is Claude    │
      │ working?│ │ available?   │
      └───┬─────┘ └──┬───────┬───┘
          │          │       │
       YES│       YES│       │NO
          │          │       │
          ▼          ▼       ▼
  ┌──────────┐ ┌──────────┐ ┌─────────┐
  │Use Claude│ │Use Claude│ │Use DERP │
  │Projects  │ │Projects  │ │Backup   │
  └──────────┘ └──────────┘ └─────────┘
```
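
The same logic for Michael + Meg can be written as a small function. This is a sketch, not part of the deployed stack: the name `choose_tier` and the returned labels are invented here, and the "not urgent, Claude down" case is an assumption, since the tree has no branch for it.

```bash
#!/usr/bin/env bash
# choose_tier URGENT CLAUDE_UP   (each argument is "yes" or "no")
# Prints which system to use, following the decision tree.
choose_tier() {
  local urgent=$1 claude_up=$2
  if [ "$claude_up" = "yes" ]; then
    echo "claude-projects"
  elif [ "$urgent" = "yes" ]; then
    echo "derp-backup"
  else
    # Not covered by the tree; waiting is assumed from the rule
    # "DERP only when necessary".
    echo "wait-for-claude"
  fi
}
```

Written this way, it is clear that urgency only matters once Claude is unavailable: `choose_tier yes no` is the only call that selects DERP.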

**For staff/subscribers:**
```
┌────────────────────────────┐
│ Simple routine query?      │
└──────────┬─────────────────┘
           │
          YES
           │
           ▼
    ┌──────────────┐
    │ Use Discord  │
    │ Bot: /ask    │
    └──────────────┘
```

---

## Emergency Procedures

### Scenario 1: Claude Down, Need Strategic Decision

**Problem:** Anthropic outage, need to deploy something NOW

**Solution:**
1. Verify Claude is truly unavailable (try web + app)
2. Go to https://ai.firefrostgaming.com
3. Log in with Michael's account
4. Select the Operations workspace
5. Select the Qwen 2.5 Coder model
6. Ask the strategic question
7. Copy the deployment commands
8. Execute carefully (no session memory!)

**Note:** DERP doesn't remember context. Be explicit in each query.

### Scenario 2: Discord Bot Down

**Problem:** Staff reporting bot not responding

**Check status:**
```bash
ssh root@38.68.14.26
systemctl status firefrost-discord-bot
```

**If stopped:**
```bash
systemctl start firefrost-discord-bot
```

**If errors:**
```bash
journalctl -u firefrost-discord-bot -f
# Check for API errors, token issues
```

**If Dify down:**
```bash
cd /opt/dify
docker-compose ps
# If services down:
docker-compose up -d
```
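
The branches above can be tied together by reading the one-word state that `systemctl is-active` prints. A minimal sketch, assuming the unit name used above; the function name `next_step` and its messages are illustrative.

```bash
#!/usr/bin/env bash
# next_step STATE
# Maps the state printed by `systemctl is-active` to the matching
# troubleshooting step from this scenario.
next_step() {
  case "$1" in
    active)   echo "Bot is running; check Dify: cd /opt/dify && docker-compose ps" ;;
    inactive) echo "Bot is stopped; run: systemctl start firefrost-discord-bot" ;;
    failed)   echo "Bot failed; inspect: journalctl -u firefrost-discord-bot -n 50" ;;
    *)        echo "Unexpected state '$1'; run: systemctl status firefrost-discord-bot" ;;
  esac
}
```

On TX1 this would be called as `next_step "$(systemctl is-active firefrost-discord-bot)"`.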

### Scenario 3: Model Won't Load

**Problem:** DERP system reports "model unavailable"

**Check Ollama:**
```bash
ollama list
# Should show: qwen2.5-coder:72b, llama3.3:70b, llama3.2-vision:11b
```

**If models missing:**
```bash
# Re-download (if a pull is rejected as an unknown tag, check the exact name in the Ollama library)
ollama pull qwen2.5-coder:72b
ollama pull llama3.3:70b
ollama pull llama3.2-vision:11b
```
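
To avoid re-downloading models that are already present, the output of `ollama list` can be compared against the expected set first. A sketch under the assumption that the first column of `ollama list` is the model name and the first row is a header; `EXPECTED_MODELS` and `missing_models` are names invented for this example, and the tags are copied as-is from the check above.

```bash
#!/usr/bin/env bash
# The three models this guide expects, as named in `ollama list`.
EXPECTED_MODELS=("qwen2.5-coder:72b" "llama3.3:70b" "llama3.2-vision:11b")

# missing_models: reads `ollama list` output on stdin and prints each
# expected model that does not appear in the NAME column.
missing_models() {
  local installed model
  installed=$(awk 'NR > 1 { print $1 }')
  for model in "${EXPECTED_MODELS[@]}"; do
    # -F fixed string, -x whole-line match, -q quiet
    if ! grep -Fxq -- "$model" <<<"$installed"; then
      echo "$model"
    fi
  done
}
```

A possible use is `ollama list | missing_models | while read -r m; do ollama pull "$m"; done`, which pulls only what is absent.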

**Check RAM:**
```bash
free -h
# If less than 90GB is available (see the "available" column), stop game servers temporarily
```
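
The same check can be automated by reading `MemAvailable` from `/proc/meminfo`, which the kernel reports in kB. A minimal sketch for Linux; the helper names `available_gb` and `enough_ram` are illustrative, and the 90GB threshold in the usage note comes from the comment above.

```bash
#!/usr/bin/env bash
# available_gb: reads /proc/meminfo-style text on stdin and prints
# MemAvailable in whole GiB (the kernel reports the value in kB).
available_gb() {
  awk '/^MemAvailable:/ { print int($2 / 1024 / 1024) }'
}

# enough_ram REQUIRED_GB: succeeds when at least REQUIRED_GB is available.
enough_ram() {
  local required=$1 available
  available=$(available_gb < /proc/meminfo)
  [ "$available" -ge "$required" ]
}
```

For example, `enough_ram 90 || echo "Under 90GB available: stop game servers before loading a model"`.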

---

## Cost Tracking

### Monthly Costs
- **Claude Projects:** $20/month (primary system)
- **Dify:** $0/month (self-hosted)
- **Ollama:** $0/month (self-hosted)
- **Discord Bot:** $0/month (self-hosted)
- **Total:** $20/month ✅

### Resource Usage (TX1)
- **Storage:** ~97GB (one-time)
- **RAM (active DERP):** ~92GB (temporary)
- **RAM (idle):** <5GB (normal)
- **Bandwidth:** Models downloaded once, minimal ongoing

---

## Performance Expectations

### Claude Projects (Primary)
- **Response time:** 5-30 seconds
- **Quality:** Excellent (GPT-4 class)
- **Context:** Full repo (416 files)
- **Session memory:** Yes

### DERP Backup (Emergency)
- **Response time:** 30-120 seconds (slower than Claude)
- **Quality:** Good (GPT-3.5 to GPT-4 class depending on model)
- **Context:** 128K tokens per query
- **Session memory:** No (each query independent)

### Discord Bot (Routine)
- **Response time:** 10-45 seconds
- **Quality:** Good for simple queries
- **Context:** Knowledge base search
- **Rate limit:** 10 queries/hour per user

---

## Best Practices

### For Michael + Meg:
1. ✅ **Always use Claude Projects first** (best experience)
2. ✅ **Only use DERP for true emergencies** (Claude unavailable)
3. ✅ **Document DERP usage** (so Claude can learn from it later)
4. ✅ **Free TX1 RAM after DERP use** (restart Ollama if needed)

### For Staff:
1. ✅ **Use Discord bot for quick lookups** (fast, simple)
2. ✅ **Ask Michael/Meg for complex questions** (they have Claude)
3. ✅ **Don't abuse rate limits** (10 queries/hour is generous)
4. ✅ **Report bot issues immediately** (don't let it stay broken)

### For Subscribers:
1. ✅ **Use Discord bot for server info** (join instructions, modpacks)
2. ✅ **Don't ask for staff-only info** (bot will decline)
3. ✅ **Be patient** (bot shares resources with staff)

---

## Training & Onboarding

### New Staff Training:
1. Introduce Discord bot commands (`/ask`)
2. Show example queries (moderation, server management)
3. Explain rate limits
4. Explain when to escalate to Michael/Meg

### Subscriber Communication:
1. Announce bot in Discord
2. Pin message with `/ask` command
3. Post example queries in welcome channel
4. Add FAQ: "What can the bot answer?"

---

**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️

**Remember: Claude first, DERP only when necessary, Discord bot for routine queries.**

**Monthly cost: $20 (no increase)**