The Chronicler 96f20e8715 Task #9: Rewrite AI Stack architecture for DERP compliance
Complete rewrite of self-hosted AI stack (Task #9) with new DERP-compliant architecture:

CHANGES:
- Architecture: AnythingLLM+OpenWebUI → Dify+Ollama (DERP-compliant)
- Cost model: $0/month additional (self-hosted on TX1, no external APIs)
- Usage tiers: Claude Projects (primary) → DERP backup (emergency) → Discord bots (staff/subscribers)
- Time estimate: 8-12hrs → 6-8hrs (more focused deployment)
- Resource allocation: 97GB storage, 92GB RAM when active (vs 150GB/110GB)

NEW DOCUMENTATION:
- README.md: Complete architecture rewrite with three-tier usage model
- deployment-plan.md: Step-by-step deployment (6 phases, all commands included)
- usage-guide.md: Decision tree for when to use Claude vs DERP vs bots
- resource-requirements.md: TX1 capacity planning, monitoring, disaster recovery

KEY FEATURES:
- Zero additional monthly cost (beyond existing $20 Claude Pro)
- True DERP compliance (fully self-hosted when Claude unavailable)
- Knowledge graph RAG (indexes entire 416-file repo)
- Discord bot integration (role-based staff/subscriber access)
- Emergency procedures documented
- Capacity planning for growth (up to 18 game servers)

MODELS:
- Qwen 2.5 Coder 72B (infrastructure/coding, 128K context)
- Llama 3.3 70B (general reasoning, 128K context)
- Llama 3.2 Vision 11B (screenshot analysis)

Updated tasks.md summary to reflect new architecture.

Status: Ready for deployment (pending medical clearance)

Fire + Frost + Foundation + DERP = True Independence 💙🔥❄️
2026-02-18 17:27:25 +00:00


AI Stack Usage Guide

Purpose: Know which AI system to use, and when
Last Updated: 2026-02-18


The Three-Tier System

Tier 1: Claude Projects (Primary) - USE THIS FIRST

Who: Michael + Meg
Where: claude.ai or Claude app
Cost: $20/month (already paying)

When to use:

  • Normal daily operations (99% of the time)
  • Strategic decision-making (deployment order, architecture)
  • Complex reasoning (tradeoffs, dependencies)
  • Session continuity (remembers context across days)
  • Best experience (fastest, most capable)

What Claude can do:

  • Search entire 416-file operations manual
  • Write deployment scripts
  • Review infrastructure decisions
  • Generate documentation
  • Debug issues
  • Plan roadmaps

Example queries:

  • "Should I deploy Mailcow or AI stack first?"
  • "Write a script to deploy Frostwall Protocol"
  • "What tasks depend on NC1 cleanup?"
  • "Help me troubleshoot this Pterodactyl error"

Limitations:

  • Requires internet connection
  • Subject to Anthropic availability

Tier 2: DERP Backup (Emergency Only) - WHEN CLAUDE IS DOWN

Who: Michael + Meg
Where: https://ai.firefrostgaming.com
Cost: $0/month (self-hosted on TX1)

When to use:

  • Only when Claude is unavailable; not for normal operations (Claude is faster and more capable)
  • Anthropic outage (Claude unavailable for hours)
  • Emergency infrastructure decisions that can't wait for Claude
  • Critical troubleshooting (server down, need immediate help)

What DERP can do:

  • Query indexed operations manual (416 files)
  • Strategic reasoning with 128K context
  • Infrastructure troubleshooting
  • Code generation
  • Emergency deployment guidance

Available models:

  • Qwen 2.5 Coder 32B - Infrastructure/coding questions
  • Llama 3.3 70B - General reasoning
  • Llama 3.2 Vision 11B - Screenshot analysis

Example queries:

  • "Claude is down. What's the deployment order for Frostwall?"
  • "Emergency: Mailcow not starting. Check logs and diagnose."
  • "Need to deploy something NOW. What dependencies are missing?"

Limitations:

  • Slower inference than Claude
  • No session continuity
  • Manual model selection
  • Uses TX1 resources (~92GB RAM when active)

How to activate:

  1. Verify Claude is unavailable (try multiple times)
  2. Go to https://ai.firefrostgaming.com
  3. Select workspace:
    • Operations - Infrastructure decisions
    • Brainstorming - Creative work
  4. Select model:
    • Qwen 2.5 Coder - For deployment/troubleshooting
    • Llama 3.3 - For general questions
  5. Ask your question
  6. Copy/paste response as needed
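Step 1 can be partly automated. The sketch below assumes that Anthropic's status page at status.anthropic.com is an Atlassian Statuspage serving the standard `/api/v2/status.json` document, and that its `status.indicator` field is one of `none`, `minor`, `major` or `critical`. The URL and the helper names `parse_indicator`, `fetch_indicator` and `derp_recommendation` are assumptions for illustration. A green status page does not prove that Claude works for you, so still try the web and app clients before switching.

```shell
# Hypothetical helpers; the status URL and the indicator values are assumptions.
STATUS_URL="https://status.anthropic.com/api/v2/status.json"

# Read Statuspage JSON on stdin and print the first "indicator" value.
parse_indicator() {
  grep -o '"indicator":"[a-z]*"' | head -n 1 | cut -d '"' -f 4
}

# Fetch the live indicator (needs network access and curl).
fetch_indicator() {
  curl -fsS --max-time 10 "$STATUS_URL" | parse_indicator
}

# Map an indicator to a suggested next step.
derp_recommendation() {
  case "$1" in
    none|minor) echo "use-claude" ;;         # operational or minor issue: keep retrying Claude
    major|critical) echo "consider-derp" ;;  # broad outage: confirm by hand, then activate DERP
    *) echo "unknown" ;;                     # status unreadable: test Claude manually
  esac
}
```

On a machine with network access, `derp_recommendation "$(fetch_indicator)"` prints one of the three outcomes.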

When to deactivate:

  • Claude comes back online
  • Emergency resolved
  • Game servers need the TX1 RAM back
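Ollama normally unloads an idle model on its own once the keep-alive period expires, but you can also release the RAM immediately. The following is a minimal sketch. It assumes an Ollama release that includes the `ollama ps` and `ollama stop` subcommands, and that `ollama ps` prints a header row followed by one loaded model per line with the name in the first column. `loaded_models` and `unload_all` are hypothetical helper names.

```shell
# Print model names from `ollama ps` output on stdin (skip header, first column).
loaded_models() {
  awk 'NR > 1 && NF > 0 { print $1 }'
}

# Unload every model Ollama currently holds in memory.
unload_all() {
  ollama ps | loaded_models | while read -r model; do
    echo "Unloading $model"
    ollama stop "$model"
  done
}
```

After `unload_all`, running `ollama ps` again should list no models.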

Tier 3: Discord Bot (Staff/Subscribers) - ROUTINE QUERIES

Who: Staff + Subscribers
Where: Firefrost Discord server
Cost: $0/month (same infrastructure)

When to use:

  • Routine questions (daily operations)
  • Quick lookups (server status, modpack info)
  • Staff training (how-to queries)
  • Subscriber support (basic info)

Commands:

/ask [question]

  • Available to: Staff + Subscribers
  • Searches: Operations workspace (staff) or public docs (subscribers)
  • Rate limit: 10 queries/hour per user
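This guide does not say how the bot enforces the limit. One common approach is a sliding window: keep each user's recent query timestamps and refuse a new query when 10 of them fall inside the last 3600 seconds. Below is a stateless sketch with hypothetical names (`count_recent`, `allow_query`, `ASK_LIMIT`, `ASK_WINDOW`). Storing the timestamps per user is left to the bot.

```shell
ASK_LIMIT=10      # queries allowed per window
ASK_WINDOW=3600   # window length in seconds (one hour)

# count_recent WINDOW NOW TS...: print how many timestamps are younger than WINDOW.
count_recent() {
  local window=$1 now=$2 ts n=0
  shift 2
  for ts in "$@"; do
    if [ $((now - ts)) -lt "$window" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# allow_query NOW TS...: succeed if the user's recent timestamps leave room for one more.
allow_query() {
  local now=$1
  shift
  [ "$(count_recent "$ASK_WINDOW" "$now" "$@")" -lt "$ASK_LIMIT" ]
}
```

A sliding window differs from a counter that resets on the hour: it stops a user from sending 10 queries at 12:59 and another 10 at 13:00.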

Example queries (Staff):

/ask How many game servers are running?
/ask What's the Whitelist Manager deployment status?
/ask How do I restart a Minecraft server?

Example queries (Subscribers):

/ask What modpacks are available?
/ask How do I join a server?
/ask What's the difference between Fire and Frost paths?

Role-based access:

  • Staff: Full Operations workspace access
  • Subscribers: Public documentation only
  • No role: Cannot use bot
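The access rules above amount to a precedence check over a member's Discord roles. This is only a sketch. The role names `staff` and `subscriber` and the helper `ask_scope` are hypothetical, and the real bot may resolve roles differently.

```shell
# ask_scope ROLE...: print which knowledge base /ask may search for a member.
# Role names are hypothetical; staff wins if a member holds both roles.
ask_scope() {
  local role has_sub=0
  for role in "$@"; do
    case "$role" in
      staff) echo "operations"; return 0 ;;
      subscriber) has_sub=1 ;;
    esac
  done
  if [ "$has_sub" -eq 1 ]; then
    echo "public-docs"
    return 0
  fi
  echo "denied"   # no qualifying role: the bot refuses
  return 1
}
```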

Limitations:

  • Simple queries only (no complex reasoning)
  • No file uploads
  • No strategic decisions
  • Rate limited

Decision Tree

┌─────────────────────────────────────┐
│     Do you need AI assistance?      │
└─────────────┬───────────────────────┘
              │
              ▼
      ┌───────────────┐
      │ Is it urgent? │
      └───┬───────┬───┘
          │       │
        NO│       │YES
          │       │
          ▼       ▼
    ┌─────────┐ ┌──────────────┐
    │ Claude  │ │ Is Claude    │
    │ working?│ │ available?   │
    └───┬─────┘ └──┬─────────┬─┘
        │          │         │
     YES│       YES│         │NO
        │          │         │
        ▼          ▼         ▼
  ┌──────────┐ ┌──────────┐ ┌─────────┐
  │Use Claude│ │Use Claude│ │Use DERP │
  │Projects  │ │Projects  │ │Backup   │
  └──────────┘ └──────────┘ └─────────┘
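Read as code, the tree reduces to two questions. The following sketch uses a hypothetical `choose_ai` function. The tree does not draw the "not urgent, Claude down" path, so returning "wait for Claude" there is an assumption based on the Tier 2 note that DERP is not for normal operations.

```shell
# choose_ai URGENT CLAUDE_UP: both arguments are "yes" or "no".
choose_ai() {
  if [ "$2" = yes ]; then
    echo "claude-projects"   # every drawn path ends here while Claude works
  elif [ "$1" = yes ]; then
    echo "derp-backup"       # urgent and Claude unavailable
  else
    echo "wait-for-claude"   # not drawn in the tree; assumed from the Tier 2 guidance
  fi
}
```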

For staff/subscribers:

┌────────────────────────────┐
│   Simple routine query?    │
└──────────┬─────────────────┘
           │
          YES
           │
           ▼
   ┌──────────────┐
   │ Use Discord  │
   │ Bot: /ask    │
   └──────────────┘

Emergency Procedures

Scenario 1: Claude Down, Need Strategic Decision

Problem: Anthropic outage, need to deploy something NOW

Solution:

  1. Verify Claude is truly unavailable (try both web and app)
  2. Go to https://ai.firefrostgaming.com
  3. Log in with Michael's account
  4. Select the Operations workspace
  5. Select the Qwen 2.5 Coder model
  6. Ask your strategic question
  7. Copy the deployment commands
  8. Execute carefully (no session memory!)

Note: DERP doesn't remember context. Be explicit in each query.
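Because each DERP query stands alone, it helps to keep a short block of standing context and put it in front of every question. That context might cover which server is involved, what is already deployed and what just broke. This is a minimal sketch. `derp_prompt` is a hypothetical helper, and its output is meant to be pasted into the Dify chat box.

```shell
# derp_prompt CONTEXT QUESTION: print one self-contained query for DERP,
# restating the context every time because DERP keeps no session memory.
derp_prompt() {
  printf 'Context (restated because this assistant has no session memory):\n%s\n\nQuestion:\n%s\n' "$1" "$2"
}
```

For example, `derp_prompt "$(cat tx1-context.txt)" "Mailcow will not start; what should I check first?"` would work if you keep your notes in a file such as the hypothetical `tx1-context.txt`.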

Scenario 2: Discord Bot Down

Problem: Staff reporting bot not responding

Check status:

ssh root@38.68.14.26
systemctl status firefrost-discord-bot

If stopped:

systemctl start firefrost-discord-bot

If it reports errors:

journalctl -u firefrost-discord-bot -n 100 -f
# Look for API errors and token/authentication issues (Ctrl+C to stop following)

If Dify is down:

cd /opt/dify
docker-compose ps
# If any service is not running:
docker-compose up -d
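These checks can be folded into one helper. It reads the unit state from `systemctl is-active`, which prints states such as `active`, `inactive`, `failed` or `activating`, and suggests the next step. This is a sketch: `bot_action` and `triage_bot` are hypothetical. Mapping `active` to "check Dify" assumes that a bot which is running but silent is usually waiting on its backend.

```shell
# bot_action STATE: map a `systemctl is-active` state to the next step.
bot_action() {
  case "$1" in
    active) echo "check-dify" ;;    # bot is up but silent: check the Dify containers
    inactive) echo "start" ;;       # stopped: systemctl start firefrost-discord-bot
    failed) echo "read-logs" ;;     # crashed: read journalctl output before restarting
    activating|deactivating|reloading) echo "wait" ;;  # transitional: check again shortly
    *) echo "read-logs" ;;          # anything else: inspect the logs
  esac
}

# triage_bot: run on the bot host and print the suggested step.
triage_bot() {
  bot_action "$(systemctl is-active firefrost-discord-bot)"
}
```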

Scenario 3: Model Won't Load

Problem: DERP system reports "model unavailable"

Check Ollama:

ollama list
# Should show: qwen2.5-coder:32b, llama3.3:70b, llama3.2-vision:11b

If any models are missing:

# Re-download each missing model
ollama pull qwen2.5-coder:32b
ollama pull llama3.3:70b
ollama pull llama3.2-vision:11b
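The `ollama list` comparison can be scripted so that only absent models are pulled. This sketch assumes that `ollama list` prints a header row and then one model per line, with the full `name:tag` in the first column. `missing_models` is a hypothetical helper.

```shell
# missing_models MODEL...: read `ollama list` output on stdin and print each
# requested model that does not appear in the NAME column.
missing_models() {
  local installed model
  installed=$(awk 'NR > 1 && NF > 0 { print $1 }')
  for model in "$@"; do
    if ! printf '%s\n' "$installed" | grep -qxF -- "$model"; then
      echo "$model"
    fi
  done
}
```

To use it, pass the tags from the `ollama list` comment: `ollama list | missing_models TAG1 TAG2 TAG3 | while read -r m; do ollama pull "$m"; done`.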

Check RAM:

free -h
# If less than ~90GB is available, temporarily stop game servers
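`free -h` reports in GiB, and its "available" column comes from `MemAvailable` in `/proc/meminfo`. You can script the same check. This is a Linux-only sketch, with the 90 GiB default taken from the threshold above and `mem_available_kb` and `enough_ram_for_derp` as hypothetical names.

```shell
# mem_available_kb: print MemAvailable (in kB) from /proc/meminfo; Linux only.
mem_available_kb() {
  awk '/^MemAvailable:/ { print $2 }' /proc/meminfo
}

# enough_ram_for_derp AVAIL_KB [THRESHOLD_GIB]: succeed if AVAIL_KB meets the threshold.
# The default of 90 GiB matches the free -h check above.
enough_ram_for_derp() {
  local avail_kb=$1 threshold_gib=${2:-90}
  [ "$avail_kb" -ge $((threshold_gib * 1024 * 1024)) ]
}
```

On TX1, the check would read: `enough_ram_for_derp "$(mem_available_kb)" || echo "Stop game servers before loading DERP models"`.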

Cost Tracking

Monthly Costs

  • Claude Projects: $20/month (primary system)
  • Dify: $0/month (self-hosted)
  • Ollama: $0/month (self-hosted)
  • Discord Bot: $0/month (self-hosted)
  • Total: $20/month

Resource Usage (TX1)

  • Storage: ~97GB (one-time)
  • RAM (active DERP): ~92GB (temporary)
  • RAM (idle): <5GB (normal)
  • Bandwidth: Models downloaded once, minimal ongoing

Performance Expectations

Claude Projects (Primary)

  • Response time: 5-30 seconds
  • Quality: Excellent (GPT-4 class)
  • Context: Full repo (416 files)
  • Session memory: Yes

DERP Backup (Emergency)

  • Response time: 30-120 seconds (slower than Claude)
  • Quality: Good (GPT-3.5 to GPT-4 class depending on model)
  • Context: 128K tokens per query
  • Session memory: No (each query independent)

Discord Bot (Routine)

  • Response time: 10-45 seconds
  • Quality: Good for simple queries
  • Context: Knowledge base search
  • Rate limit: 10 queries/hour per user

Best Practices

For Michael + Meg:

  1. Always use Claude Projects first (best experience)
  2. Only use DERP for true emergencies (Claude unavailable)
  3. Document DERP usage (so the next Claude session has the context)
  4. Free TX1 RAM after DERP use (restart Ollama if needed)

For Staff:

  1. Use Discord bot for quick lookups (fast, simple)
  2. Ask Michael/Meg for complex questions (they have Claude)
  3. Don't abuse rate limits (10 queries/hour is generous)
  4. Report bot issues immediately (don't let it stay broken)

For Subscribers:

  1. Use Discord bot for server info (join instructions, modpacks)
  2. Don't ask for staff-only info (bot will decline)
  3. Be patient (bot shares resources with staff)

Training & Onboarding

New Staff Training:

  1. Introduce Discord bot commands (/ask)
  2. Show example queries (moderation, server management)
  3. Explain rate limits
  4. Explain when to escalate to Michael/Meg

Subscriber Communication:

  1. Announce bot in Discord
  2. Pin message with /ask command
  3. Example queries in welcome channel
  4. FAQ: "What can the bot answer?"

Fire + Frost + Foundation + DERP = True Independence 💙🔥❄️

Remember: Claude first, DERP only when necessary, Discord bot for routine queries.

Monthly cost: $20 (no increase)