Self-Hosted AI Stack on TX1
Status: Blocked - Medical clearance
Priority: Tier 2 - Major Infrastructure
Time: 6-8 hours (3-4 active, rest downloads)
Location: TX1 Dallas
Last Updated: 2026-02-18
Updated By: The Chronicler
Changelog (Task #9)
Complete rewrite of the self-hosted AI stack with a new DERP-compliant architecture.
Changes:
- Architecture: AnythingLLM + OpenWebUI → Dify + Ollama (DERP-compliant)
- Cost model: $0/month additional (self-hosted on TX1, no external APIs)
- Usage tiers: Claude Projects (primary) → DERP backup (emergency) → Discord bots (staff/subscribers)
- Time estimate: 8-12 hours → 6-8 hours (more focused deployment)
- Resource allocation: 97GB storage, 92GB RAM when active (down from 150GB/110GB)
New documentation:
- README.md: Complete architecture rewrite with the three-tier usage model
- deployment-plan.md: Step-by-step deployment (6 phases, all commands included)
- usage-guide.md: Decision tree for when to use Claude vs DERP vs bots
- resource-requirements.md: TX1 capacity planning, monitoring, disaster recovery
Key features:
- Zero additional monthly cost (beyond the existing $20 Claude Pro subscription)
- True DERP compliance (fully self-hosted when Claude is unavailable)
- Knowledge graph RAG (indexes the entire 416-file repo)
- Discord bot integration (role-based staff/subscriber access)
- Documented emergency procedures
- Capacity planning for growth (up to 18 game servers)
Models:
- Qwen 2.5 Coder 72B (infrastructure/coding, 128K context)
- Llama 3.3 70B (general reasoning, 128K context)
- Llama 3.2 Vision 11B (screenshot analysis)
Also updated the tasks.md summary to reflect the new architecture.
Overview
DERP-compliant AI infrastructure with zero additional monthly cost.
Three-tier usage model:
- Primary: Claude Projects (best experience, full repo context)
- DERP Backup: Self-hosted when Claude/Anthropic unavailable
- Staff/Subscriber Bots: Discord + Wiki integration
Monthly Cost: $0 (beyond existing $20 Claude Pro subscription)
Architecture
Component 1: Dify (RAG Platform)
URL: ai.firefrostgaming.com
Purpose: Knowledge management, API backend
Features:
- Multi-workspace (Operations, Brainstorming)
- Knowledge graph indexing
- Web interface
- Discord bot API
- Repository integration
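Since Dify acts as the API backend for the bots, scripts talk to it over its chat-messages endpoint. A minimal stdlib-only sketch is below; the base URL, API key, and user IDs are assumptions for this deployment, not confirmed values.

```python
"""Sketch of querying the Dify backend from a script (stdlib only)."""
import json
import urllib.request

DIFY_BASE = "https://ai.firefrostgaming.com/v1"  # assumed deployment URL


def build_chat_request(query: str, user: str, api_key: str) -> urllib.request.Request:
    """Build a blocking chat-messages request for a Dify app."""
    payload = {
        "inputs": {},
        "query": query,
        "response_mode": "blocking",  # wait for the full answer
        "user": user,                 # stable ID so Dify keeps per-user context
    }
    return urllib.request.Request(
        f"{DIFY_BASE}/chat-messages",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The Discord bot would build the same request, just with the invoking member's ID as `user` so conversations stay separated per person.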
Component 2: Ollama (Model Server)
Purpose: Local model hosting
Features:
- Model management
- API compatibility
- Resource optimization
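Resource optimization mostly comes down to Ollama's `keep_alive` setting, which controls how long a model stays resident in RAM after a request. A hedged sketch of a generate call (the model tag is an assumption for this stack; Ollama listens on port 11434 by default):

```python
"""Sketch of driving Ollama's local REST API (stdlib only)."""
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str,
                           keep_alive: str = "5m") -> urllib.request.Request:
    """Build a non-streaming generate request.

    keep_alive controls how long the model stays loaded after the call;
    a short value frees RAM back to the game servers once DERP is idle.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Setting `keep_alive="0"` unloads the model immediately after the response, which matches the "models idle during normal operations" plan below.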
Component 3: Models (Self-Hosted)
Qwen 2.5 Coder 72B
- Purpose: Infrastructure/coding specialist
- Context: 128K tokens
- RAM: ~80GB when loaded
- Storage: ~40GB
- Use: DERP strategic decisions
Llama 3.3 70B
- Purpose: General reasoning
- Context: 128K tokens
- RAM: ~80GB when loaded
- Storage: ~40GB
- Use: DERP general queries
Llama 3.2 Vision 11B
- Purpose: Screenshot/image analysis
- RAM: ~7GB when loaded
- Storage: ~7GB
- Use: Visual troubleshooting
Component 4: Discord Bot
Purpose: Staff/subscriber interface
Features:
- Role-based access (staff vs subscribers)
- Calls Dify API
- Commands: /ask, /operations, /brainstorm
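The role gate can be kept framework-agnostic so the same check works from any Discord client library. A minimal sketch; the role names and the command-to-role mapping are assumptions for this server, not confirmed configuration:

```python
"""Sketch of the bot's role-based access check (framework-agnostic)."""

# command -> roles allowed to invoke it (assumed names)
COMMAND_ROLES: dict[str, set[str]] = {
    "ask": {"Staff", "Subscriber"},
    "operations": {"Staff"},
    "brainstorm": {"Staff"},
}


def can_invoke(command: str, member_roles: set[str]) -> bool:
    """True if any of the member's Discord roles may run the command."""
    allowed = COMMAND_ROLES.get(command)
    return bool(allowed and member_roles & allowed)
```

Each slash-command handler would call `can_invoke` with the invoking member's role names before forwarding the query to the Dify API.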
Usage Model
Tier 1: Claude Projects (Primary)
When: Normal operations
Experience: Best (full repo context, session continuity)
Cost: $20/month (already paying)
Tier 2: DERP Backup (Emergency)
When: Claude/Anthropic outage
Experience: Functional (knowledge graph + 128K context)
Cost: $0/month (self-hosted)
Tier 3: Staff/Subscriber Bots
When: Routine queries in Discord/Wiki
Experience: Fast, simple
Cost: $0/month (same infrastructure)
Resource Requirements
Storage (TX1 has 1TB)
- Qwen 2.5 Coder 72B: ~40GB
- Llama 3.3 70B: ~40GB
- Llama 3.2 Vision 11B: ~7GB
- Dify + services: ~10GB
- Total: ~97GB ✅
RAM (TX1 has 256GB)
DERP Activated (one large model loaded):
- Model: ~80GB (Qwen OR Llama 3.3)
- Dify services: ~4GB
- Overhead: ~8GB
- Total: ~92GB ✅
Normal Operations (models idle):
- Minimal RAM usage
- Available for game servers
CPU
- 32 vCPUs available
- CPU-only inference is slower than hosted API calls
- Sufficient for emergency use
Deployment Phases
Phase 1: Core Stack (2-3 hours)
- Deploy Dify via Docker Compose
- Install Ollama
- Download models (overnight - large files)
- Configure workspaces
- Index Git repository
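Indexing the Git repository starts with deciding which files are worth pushing into the Dify knowledge base. A hedged sketch of that selection step, assuming a local checkout; the extension list and skip list are assumptions to tune during deployment:

```python
"""Sketch: collect repository files to push into the Dify dataset."""
from pathlib import Path

SKIP_DIRS = {".git", "node_modules", "__pycache__"}        # assumed skip list
TEXT_EXTS = {".md", ".txt", ".yml", ".yaml", ".sh", ".py", ".conf"}


def indexable_files(repo_root: str):
    """Yield text files from the repo that are worth indexing."""
    for path in Path(repo_root).rglob("*"):
        # skip anything inside a vendored or VCS directory
        if any(part in SKIP_DIRS for part in path.parts):
            continue
        if path.is_file() and path.suffix in TEXT_EXTS:
            yield path
```

Each yielded file would then be uploaded through Dify's dataset/document API so the knowledge graph covers the full repo.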
Phase 2: Discord Bot (2-3 hours)
- Create Python bot
- Connect to Dify API
- Implement role-based access
- Test in Discord
Phase 3: Documentation (1 hour)
- Usage guide (when to use what)
- Emergency DERP procedures
- Discord bot commands
- Staff training materials
Total Time: 6-8 hours (3-4 active; model downloads run unattended)
Success Criteria
- ✅ Dify deployed and indexing repo
- ✅ Models loaded and operational
- ✅ DERP backup tested (strategic query without Claude)
- ✅ Discord bot functional (staff + subscriber access)
- ✅ Documentation complete
- ✅ Zero additional monthly cost
Related Documentation
- deployment-plan.md - Step-by-step deployment guide
- usage-guide.md - When to use Claude vs DERP vs bots
- resource-requirements.md - Detailed TX1 resource allocation
- discord-bot-setup.md - Bot configuration and commands
Monthly Cost: $20 (no increase from current)
Fire + Frost + Foundation + DERP = True Independence 💙🔥❄️