# Self-Hosted AI Stack on TX1

**Status:** Blocked pending medical clearance
**Priority:** Tier 2 - Major Infrastructure
**Time:** 6-8 hours (3-4 active, rest downloads)
**Location:** TX1 Dallas
**Last Updated:** 2026-02-18
**Updated By:** The Chronicler

---
## Overview

**DERP-compliant AI infrastructure with zero additional monthly cost.**

Three-tier usage model:

1. **Primary:** Claude Projects (best experience, full repo context)
2. **DERP Backup:** Self-hosted stack for when Claude/Anthropic is unavailable
3. **Staff/Subscriber Bots:** Discord + Wiki integration

**Monthly Cost:** $0 beyond the existing $20 Claude Pro subscription

---
## Architecture

### Component 1: Dify (RAG Platform)

**URL:** ai.firefrostgaming.com
**Purpose:** Knowledge management, API backend

**Features:**

- Multi-workspace (Operations, Brainstorming)
- Knowledge graph indexing
- Web interface
- Discord bot API
- Repository integration
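Both the web interface and the Discord bot would talk to Dify over its per-app REST API. A minimal sketch of building such a request, assuming the self-hosted base URL above and a placeholder app key (`app-xxxx`); the `/chat-messages` endpoint shape follows Dify's published chat API, but verify it against the deployed version:

```python
import json
import urllib.request

DIFY_BASE = "https://ai.firefrostgaming.com/v1"  # assumed self-hosted Dify URL
DIFY_API_KEY = "app-xxxx"                        # placeholder app API key

def build_chat_request(query: str, user: str, conversation_id: str = "") -> urllib.request.Request:
    """Build a request against Dify's chat-messages endpoint."""
    payload = {
        "inputs": {},
        "query": query,
        "response_mode": "blocking",   # wait for the full answer
        "conversation_id": conversation_id,
        "user": user,                  # stable id so Dify tracks the conversation
    }
    return urllib.request.Request(
        f"{DIFY_BASE}/chat-messages",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {DIFY_API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# req = build_chat_request("Which ports does the game server use?", user="staff-123")
# answer = json.loads(urllib.request.urlopen(req).read())["answer"]
```
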

### Component 2: Ollama (Model Server)

**Purpose:** Local model hosting

**Features:**

- Model management (pull, load, unload)
- OpenAI-compatible API
- Resource optimization (models are unloaded from RAM when idle)
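Ollama serves models over a local HTTP API (port 11434 by default). A sketch of a one-shot query, assuming a `llama3.3:70b` tag; the `/api/generate` endpoint and the `stream`/`response` fields match Ollama's documented API:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address

def build_generate_payload(model: str, prompt: str) -> dict:
    # stream=False returns one JSON object instead of a stream of NDJSON chunks
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """One-shot completion via Ollama's /api/generate endpoint."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_generate_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# answer = generate("llama3.3:70b", "Summarize the DERP escalation steps.")
```
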

### Component 3: Models (Self-Hosted)

**Qwen 2.5 Coder 72B**

- Purpose: Infrastructure/coding specialist
- Context: 128K tokens
- RAM: ~80GB when loaded
- Storage: ~40GB
- Use: DERP strategic decisions

**Llama 3.3 70B**

- Purpose: General reasoning
- Context: 128K tokens
- RAM: ~80GB when loaded
- Storage: ~40GB
- Use: DERP general queries

**Llama 3.2 Vision 11B**

- Purpose: Screenshot/image analysis
- RAM: ~7GB when loaded
- Storage: ~7GB
- Use: Visual troubleshooting
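TX1 can hold only one of the large models in RAM at a time, so any automation has to pick a model per task and respect the memory budget. A sketch using the RAM figures from the list above (the Ollama tags are assumptions, not confirmed names):

```python
# RAM figures (GB) from the model list above; tags are assumed Ollama names
MODELS = {
    "coding":    {"tag": "qwen2.5-coder:72b",   "ram_gb": 80},
    "reasoning": {"tag": "llama3.3:70b",        "ram_gb": 80},
    "vision":    {"tag": "llama3.2-vision:11b", "ram_gb": 7},
}

def pick_model(task: str, free_ram_gb: int) -> str:
    """Return the model tag for a task, refusing if RAM is insufficient."""
    entry = MODELS[task]
    if entry["ram_gb"] > free_ram_gb:
        raise MemoryError(
            f"{entry['tag']} needs ~{entry['ram_gb']}GB, only {free_ram_gb}GB free"
        )
    return entry["tag"]
```

The vision model is small enough to coexist with game servers; the two 70B-class models are not, which is why DERP activation is treated as an explicit mode switch.
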

### Component 4: Discord Bot

**Purpose:** Staff/subscriber interface

**Features:**

- Role-based access (staff vs. subscribers)
- Calls the Dify API
- Commands: `/ask`, `/operations`, `/brainstorm`
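Role gating can live in the bot itself, before any Dify call is made. A sketch of the permission check; the role names and the command-to-role mapping here are illustrative assumptions:

```python
# Role names and the command→role mapping are assumptions for illustration
COMMAND_ROLES = {
    "/ask":        {"Staff", "Subscriber"},  # routine Q&A, open to both tiers
    "/operations": {"Staff"},                # operations workspace, staff only
    "/brainstorm": {"Staff"},                # brainstorming workspace, staff only
}

def is_allowed(command: str, member_roles: set) -> bool:
    """True if any of the member's Discord roles may run the command."""
    allowed = COMMAND_ROLES.get(command, set())
    return bool(allowed & member_roles)
```

In a discord.py handler you would pass something like `{role.name for role in interaction.user.roles}` as `member_roles` and reply with an error when the check fails.
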

---

## Usage Model

### Tier 1: Claude Projects (Primary)

**When:** Normal operations
**Experience:** Best (full repo context, session continuity)
**Cost:** $20/month (already paying)

### Tier 2: DERP Backup (Emergency)

**When:** Claude/Anthropic outage
**Experience:** Functional (knowledge graph + 128K context)
**Cost:** $0/month (self-hosted)

### Tier 3: Staff/Subscriber Bots

**When:** Routine queries in Discord/Wiki
**Experience:** Fast, simple
**Cost:** $0/month (same infrastructure)
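The three tiers reduce to a simple routing rule, sketched here (the function name and inputs are illustrative, not part of any implementation):

```python
def choose_tier(claude_available: bool, routine_discord_query: bool) -> str:
    """Map the three-tier usage model to a single routing decision."""
    if routine_discord_query:
        return "discord-bot"      # Tier 3: staff/subscriber bots
    if claude_available:
        return "claude-projects"  # Tier 1: primary, best experience
    return "derp-backup"          # Tier 2: self-hosted emergency stack
```
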

---

## Resource Requirements

### Storage (TX1 has 1TB)

- Qwen 2.5 Coder 72B: ~40GB
- Llama 3.3 70B: ~40GB
- Llama 3.2 Vision 11B: ~7GB
- Dify + services: ~10GB
- **Total: ~97GB** ✅

### RAM (TX1 has 256GB)

**DERP activated (one large model loaded):**

- Model: ~80GB (Qwen OR Llama 3.3)
- Dify services: ~4GB
- Overhead: ~8GB
- **Total: ~92GB** ✅

**Normal operations (models idle):**

- Minimal RAM usage
- RAM stays available for game servers

### CPU

- 32 vCPU available
- CPU-only inference is slower than hosted APIs, but functional for emergency use
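The storage and RAM budgets above are easy to sanity-check in a few lines:

```python
# Figures (GB) from the storage and RAM breakdowns above
storage = {
    "qwen2.5-coder-72b": 40,
    "llama3.3-70b": 40,
    "llama3.2-vision-11b": 7,
    "dify-services": 10,
}
ram_active = {"loaded-model": 80, "dify-services": 4, "overhead": 8}

TX1_STORAGE_GB = 1000  # TX1 has 1TB of storage
TX1_RAM_GB = 256       # TX1 has 256GB of RAM

storage_total = sum(storage.values())
ram_total = sum(ram_active.values())

# Both totals must fit well within TX1's capacity
assert storage_total <= TX1_STORAGE_GB and ram_total <= TX1_RAM_GB
print(f"storage {storage_total}GB, RAM when active {ram_total}GB")
# → storage 97GB, RAM when active 92GB
```
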

---

## Deployment Phases

### Phase 1: Core Stack (2-3 hours)

1. Deploy Dify via Docker Compose
2. Install Ollama
3. Download models (overnight; large files)
4. Configure workspaces
5. Index the Git repository
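Step 3 is the long pole: the three models total roughly 87GB of downloads. A small helper that turns the model list into `ollama pull` commands (the tags are assumed names, not confirmed):

```python
# Assumed Ollama tags for the three models in this plan
MODEL_TAGS = ["qwen2.5-coder:72b", "llama3.3:70b", "llama3.2-vision:11b"]

def pull_commands(tags: list) -> list:
    """Shell commands to fetch each model; run these overnight."""
    return [f"ollama pull {tag}" for tag in tags]

for cmd in pull_commands(MODEL_TAGS):
    print(cmd)
```
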

### Phase 2: Discord Bot (2-3 hours)

1. Create the Python bot
2. Connect it to the Dify API
3. Implement role-based access
4. Test in Discord

### Phase 3: Documentation (1 hour)

1. Usage guide (when to use what)
2. Emergency DERP procedures
3. Discord bot commands
4. Staff training materials

**Total Time:** 6-8 hours (active work)

---

## Success Criteria

- ✅ Dify deployed and indexing the repo
- ✅ Models loaded and operational
- ✅ DERP backup tested (a strategic query answered without Claude)
- ✅ Discord bot functional (staff + subscriber access)
- ✅ Documentation complete
- ✅ Zero additional monthly cost

---

## Related Documentation

- **deployment-plan.md** - Step-by-step deployment guide
- **usage-guide.md** - When to use Claude vs. DERP vs. bots
- **resource-requirements.md** - Detailed TX1 resource allocation
- **discord-bot-setup.md** - Bot configuration and commands

---

**Fire + Frost + Foundation + DERP = True Independence** 💙🔥❄️

**Monthly Cost: $20 (no increase from current)**