Self-Hosted AI Stack on TX1

Status: Blocked (pending medical clearance)
Priority: Tier 2 - Major Infrastructure
Time: 6-8 hours (3-4 active, rest downloads)
Location: TX1 Dallas
Last Updated: 2026-02-18
Updated By: The Chronicler


Overview

DERP-compliant AI infrastructure with zero additional monthly cost.

Three-tier usage model:

  1. Primary: Claude Projects (best experience, full repo context)
  2. DERP Backup: Self-hosted when Claude/Anthropic unavailable
  3. Staff/Subscriber Bots: Discord + Wiki integration

Monthly Cost: $0 (beyond existing $20 Claude Pro subscription)
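
The three tiers above reduce to a simple routing rule, sketched below. The flag names and backend labels are illustrative assumptions, not a committed interface.

```python
def pick_backend(claude_available: bool, via_discord: bool) -> str:
    """Route a query to the right tier of the stack."""
    if via_discord:
        return "discord-bot"       # Tier 3: routine staff/subscriber queries
    if claude_available:
        return "claude-projects"   # Tier 1: best experience, full repo context
    return "derp-self-hosted"      # Tier 2: outage fallback on TX1
```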


Architecture

Component 1: Dify (RAG Platform)

URL: ai.firefrostgaming.com
Purpose: Knowledge management, API backend
Features:

  • Multi-workspace (Operations, Brainstorming)
  • Knowledge graph indexing
  • Web interface
  • Discord bot API
  • Repository integration
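
A minimal sketch of the chat call the Discord bot and other clients would make against this component, assuming Dify's standard `/v1/chat-messages` endpoint is exposed at ai.firefrostgaming.com; the app API key and example question are placeholders.

```python
import json
import urllib.request

DIFY_URL = "https://ai.firefrostgaming.com/v1/chat-messages"

def build_payload(query: str, user: str) -> dict:
    # "blocking" returns one JSON reply; "streaming" would return SSE chunks.
    return {"inputs": {}, "query": query,
            "response_mode": "blocking", "user": user}

def ask_dify(api_key: str, query: str, user: str = "ops") -> str:
    req = urllib.request.Request(
        DIFY_URL,
        data=json.dumps(build_payload(query, user)).encode(),
        headers={"Authorization": f"Bearer {api_key}",  # Dify app API key
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["answer"]

# Example (requires the deployed stack; key is a placeholder):
# ask_dify("app-PLACEHOLDER", "Where is the TX1 backup policy documented?")
```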

Component 2: Ollama (Model Server)

Purpose: Local model hosting
Features:

  • Model management
  • API compatibility
  • Resource optimization
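
Ollama serves an OpenAI-style HTTP API on localhost:11434 by default. A direct chat call can be sketched as below; the model tag is an assumption until the exact quantizations are pinned.

```python
import json
import urllib.request

OLLAMA_CHAT = "http://localhost:11434/api/chat"  # Ollama's default port

def build_chat_body(model: str, prompt: str) -> dict:
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False}  # one JSON response instead of streamed chunks

def chat(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_CHAT,
        data=json.dumps(build_chat_body(model, prompt)).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Example (requires Ollama running with the model pulled):
# chat("llama3.3:70b", "Summarize our reverse-proxy layout.")
```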

Component 3: Models (Self-Hosted)

Qwen 2.5 Coder 72B

  • Purpose: Infrastructure/coding specialist
  • Context: 128K tokens
  • RAM: ~80GB when loaded
  • Storage: ~40GB
  • Use: DERP strategic decisions

Llama 3.3 70B

  • Purpose: General reasoning
  • Context: 128K tokens
  • RAM: ~80GB when loaded
  • Storage: ~40GB
  • Use: DERP general queries

Llama 3.2 Vision 11B

  • Purpose: Screenshot/image analysis
  • RAM: ~7GB when loaded
  • Storage: ~7GB
  • Use: Visual troubleshooting
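
The storage figures above are consistent with 4-bit (Q4) quantized weights, roughly 4.5 bits per parameter for a typical Q4_K quant; loaded RAM is higher because the 128K-token KV cache and runtime buffers come on top of the weights. A rough back-of-envelope check:

```python
def q4_disk_gb(params_billions: float) -> float:
    # ~4.5 bits/weight for a typical Q4_K quant -> GB = params_B * 4.5 / 8
    return params_billions * 4.5 / 8

for name, b in [("Qwen 2.5 Coder 72B", 72),
                ("Llama 3.3 70B", 70),
                ("Llama 3.2 Vision 11B", 11)]:
    print(f"{name}: ~{q4_disk_gb(b):.0f} GB on disk")

# Loaded RAM ~= weights + KV cache + buffers; at long context the cache
# alone can add tens of GB, which is why ~40GB models budget ~80GB RAM.
```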

Component 4: Discord Bot

Purpose: Staff/subscriber interface
Features:

  • Role-based access (staff vs subscribers)
  • Calls Dify API
  • Commands: /ask, /operations, /brainstorm
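
The role gate reduces to a small pure function, shown here without the discord.py wiring so the sketch has no external dependencies; the role names and the command-to-role map are assumptions, not final configuration.

```python
# Assumed role names and command->roles map; adjust to the real server.
COMMAND_ROLES = {
    "/ask": {"Staff", "Admin", "Subscriber"},
    "/operations": {"Staff", "Admin"},
    "/brainstorm": {"Staff", "Admin"},
}

def allowed(command: str, member_roles: set[str]) -> bool:
    # Permit the command if the member holds any whitelisted role for it.
    return bool(COMMAND_ROLES.get(command, set()) & member_roles)

# In the real bot this check runs inside each slash-command handler,
# replying ephemerally (e.g. "Staff only.") when it returns False.
```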

Usage Model

Tier 1: Claude Projects (Primary)

When: Normal operations
Experience: Best (full repo context, session continuity)
Cost: $20/month (already paying)

Tier 2: DERP Backup (Emergency)

When: Claude/Anthropic outage
Experience: Functional (knowledge graph + 128K context)
Cost: $0/month (self-hosted)

Tier 3: Staff/Subscriber Bots

When: Routine queries in Discord/Wiki
Experience: Fast, simple
Cost: $0/month (same infrastructure)


Resource Requirements

Storage (TX1 has 1TB)

  • Qwen 2.5 Coder 72B: ~40GB
  • Llama 3.3 70B: ~40GB
  • Llama 3.2 Vision 11B: ~7GB
  • Dify + services: ~10GB
  • Total: ~97GB

RAM (TX1 has 256GB)

DERP Activated (one large model loaded):

  • Model: ~80GB (Qwen OR Llama 3.3)
  • Dify services: ~4GB
  • Overhead: ~8GB
  • Total: ~92GB

Normal Operations (models idle):

  • Minimal RAM usage
  • Available for game servers

CPU

  • 32 vCPU available
  • CPU-only inference is slower than a hosted API
  • Acceptable for emergency use

Deployment Phases

Phase 1: Core Stack (2-3 hours)

  1. Deploy Dify via Docker Compose
  2. Install Ollama
  3. Download models (overnight - large files)
  4. Configure workspaces
  5. Index Git repository
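
Step 3 can run unattended via Ollama's pull endpoint; pulling serially keeps the downloads from competing for bandwidth. The tags below are assumptions until the deployment plan pins exact quantizations.

```python
import json
import urllib.request

# Assumed model tags; confirm against `ollama list` after pulling.
MODELS = ["qwen2.5-coder:72b", "llama3.3:70b", "llama3.2-vision:11b"]

def pull(model: str, host: str = "http://localhost:11434") -> None:
    req = urllib.request.Request(
        f"{host}/api/pull",
        data=json.dumps({"model": model}).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # the endpoint streams JSON progress records
            print(model, json.loads(line).get("status", ""))

# Example (run overnight on TX1):
# for m in MODELS:
#     pull(m)
```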

Phase 2: Discord Bot (2-3 hours)

  1. Create Python bot
  2. Connect to Dify API
  3. Implement role-based access
  4. Test in Discord

Phase 3: Documentation (1 hour)

  1. Usage guide (when to use what)
  2. Emergency DERP procedures
  3. Discord bot commands
  4. Staff training materials

Total Time: 6-8 hours (3-4 hours active; model downloads run overnight)


Success Criteria

  • Dify deployed and indexing repo
  • Models loaded and operational
  • DERP backup tested (strategic query without Claude)
  • Discord bot functional (staff + subscriber access)
  • Documentation complete
  • Zero additional monthly cost

Related Documentation

  • deployment-plan.md - Step-by-step deployment guide
  • usage-guide.md - When to use Claude vs DERP vs bots
  • resource-requirements.md - Detailed TX1 resource allocation
  • discord-bot-setup.md - Bot configuration and commands

Fire + Frost + Foundation + DERP = True Independence 💙🔥❄️

Monthly Cost: $20 (no increase from current)