# Self-Hosted AI Stack on TX1

**Status:** Blocked - Medical clearance
**Priority:** Tier 2 - Major Infrastructure
**Time:** 6-8 hours (3-4 active, rest downloads)
**Location:** TX1 Dallas
**Last Updated:** 2026-02-18
**Updated By:** The Chronicler

---

## Overview

**DERP-compliant AI infrastructure with zero additional monthly cost.**

Three-tier usage model:

1. **Primary:** Claude Projects (best experience, full repo context)
2. **DERP Backup:** Self-hosted when Claude/Anthropic unavailable
3. **Staff/Subscriber Bots:** Discord + Wiki integration

**Monthly Cost:** $0 (beyond existing $20 Claude Pro subscription)

---

## Architecture

### Component 1: Dify (RAG Platform)

**URL:** ai.firefrostgaming.com
**Purpose:** Knowledge management, API backend

**Features:**
- Multi-workspace (Operations, Brainstorming)
- Knowledge graph indexing
- Web interface
- Discord bot API
- Repository integration

### Component 2: Ollama (Model Server)

**Purpose:** Local model hosting

**Features:**
- Model management
- API compatibility
- Resource optimization

### Component 3: Models (Self-Hosted)

**Qwen 2.5 Coder 72B**
- Purpose: Infrastructure/coding specialist
- Context: 128K tokens
- RAM: ~80GB when loaded
- Storage: ~40GB
- Use: DERP strategic decisions

**Llama 3.3 70B**
- Purpose: General reasoning
- Context: 128K tokens
- RAM: ~80GB when loaded
- Storage: ~40GB
- Use: DERP general queries

**Llama 3.2 Vision 11B**
- Purpose: Screenshot/image analysis
- RAM: ~7GB when loaded
- Storage: ~7GB
- Use: Visual troubleshooting

### Component 4: Discord Bot

**Purpose:** Staff/subscriber interface

**Features:**
- Role-based access (staff vs subscribers)
- Calls Dify API
- Commands: `/ask`, `/operations`, `/brainstorm`

---

## Usage Model

### Tier 1: Claude Projects (Primary)

**When:** Normal operations
**Experience:** Best (full repo context, session continuity)
**Cost:** $20/month (already paying)

### Tier 2: DERP Backup (Emergency)

**When:** Claude/Anthropic outage
**Experience:** Functional (knowledge graph + 128K context)
**Cost:** $0/month (self-hosted)

### Tier 3: Staff/Subscriber Bots

**When:** Routine queries in Discord/Wiki
**Experience:** Fast, simple
**Cost:** $0/month (same infrastructure)

---

## Resource Requirements

### Storage (TX1 has 1TB)

- Qwen 2.5 Coder 72B: ~40GB
- Llama 3.3 70B: ~40GB
- Llama 3.2 Vision 11B: ~7GB
- Dify + services: ~10GB
- **Total: ~97GB** ✅

### RAM (TX1 has 256GB)

**DERP Activated (one large model loaded):**
- Model: ~80GB (Qwen OR Llama 3.3)
- Dify services: ~4GB
- Overhead: ~8GB
- **Total: ~92GB** ✅

**Normal Operations (models idle):**
- Minimal RAM usage
- Available for game servers

### CPU

- 32 vCPU available
- Inference slower than API, but functional for emergency use

---

## Deployment Phases

### Phase 1: Core Stack (2-3 hours)

1. Deploy Dify via Docker Compose
2. Install Ollama
3. Download models (overnight - large files)
4. Configure workspaces
5. Index Git repository

### Phase 2: Discord Bot (2-3 hours)

1. Create Python bot
2. Connect to Dify API
3. Implement role-based access
4. Test in Discord

### Phase 3: Documentation (1 hour)

1. Usage guide (when to use what)
2. Emergency DERP procedures
3. Discord bot commands
4. Staff training materials

**Total Time:** 6-8 hours (active work)

---

## Success Criteria

- ✅ Dify deployed and indexing repo
- ✅ Models loaded and operational
- ✅ DERP backup tested (strategic query without Claude)
- ✅ Discord bot functional (staff + subscriber access)
- ✅ Documentation complete
- ✅ Zero additional monthly cost

---

## Related Documentation

- **deployment-plan.md** - Step-by-step deployment guide
- **usage-guide.md** - When to use Claude vs DERP vs bots
- **resource-requirements.md** - Detailed TX1 resource allocation
- **discord-bot-setup.md** - Bot configuration and commands

---

**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️

**Monthly Cost: $20 (no increase from current)**
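
## Appendix A: DERP Query Sketch (Ollama)

The Tier 2 path — one large model loaded into RAM, queried locally — can be sketched against Ollama's standard REST API (`POST /api/chat` on its default port 11434). This is a minimal illustration, not the deployed code; the model tags (`qwen2.5:72b`, `llama3.3:70b`, `llama3.2-vision:11b`) and the coding/general/vision routing are assumptions based on Component 3 above and would need to match whatever tags were actually pulled.

```python
"""Sketch: DERP emergency query against a local Ollama server.

Assumptions (not confirmed by the plan above): Ollama listens on its
default port 11434, and the models were pulled under the tags below.
"""
import json
import urllib.request

OLLAMA = "http://localhost:11434"

# Map DERP query types to the models described in Component 3.
MODELS = {
    "coding": "qwen2.5:72b",         # infrastructure/coding specialist
    "general": "llama3.3:70b",       # general reasoning
    "vision": "llama3.2-vision:11b", # screenshot/image analysis
}


def pick_model(query_type: str) -> str:
    """Choose a model tag; fall back to the general-reasoning model."""
    return MODELS.get(query_type, MODELS["general"])


def derp_query(prompt: str, query_type: str = "general") -> str:
    """Send one non-streaming chat request to Ollama, return the reply."""
    payload = {
        "model": pick_model(query_type),
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # single JSON response instead of a token stream
    }
    req = urllib.request.Request(
        f"{OLLAMA}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


if __name__ == "__main__":
    # Only runs when a model is loaded — i.e. with DERP activated.
    print(derp_query("Summarize the TX1 failover procedure.", "general"))
```

Note that with a 70B+ model, only one entry from `MODELS` should be resident at a time, matching the ~92GB "DERP Activated" RAM budget above.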
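
## Appendix B: Discord Bot Access Sketch

Component 4's role-based access and its Dify backend call can be sketched as two small pieces: a command-to-role table checked before dispatch, and the request body for Dify's `POST /v1/chat-messages` chat endpoint. The role names ("Staff", "Subscriber") and the per-command permissions are assumptions for illustration; the real bot would wire `can_invoke` into its discord.py command handlers and send `dify_payload` with the app's API key.

```python
"""Sketch: role gating for the bot's slash commands + Dify request body.

Assumptions: Discord roles named "Staff" and "Subscriber", and a Dify
app behind ai.firefrostgaming.com using its standard chat-messages API.
"""

# Which roles may invoke each command (Component 4: staff vs subscribers).
# The /operations restriction to staff is an assumed policy, not stated above.
COMMAND_ACCESS = {
    "/ask": {"Staff", "Subscriber"},         # routine queries, everyone
    "/operations": {"Staff"},                # Operations workspace
    "/brainstorm": {"Staff", "Subscriber"},  # Brainstorming workspace
}


def can_invoke(command: str, user_roles: set) -> bool:
    """True if any of the user's roles is allowed for the command."""
    return bool(COMMAND_ACCESS.get(command, set()) & user_roles)


def dify_payload(query: str, user_id: str) -> dict:
    """Body for Dify's blocking chat-messages endpoint."""
    return {
        "inputs": {},
        "query": query,
        "response_mode": "blocking",  # wait for the complete answer
        "user": user_id,              # lets Dify keep per-user sessions
    }
```

Keeping the permission table in one dict means adding a subscriber-only or staff-only command is a one-line change, and the same check can back the Wiki integration later.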