diff --git a/docs/core/tasks.md b/docs/core/tasks.md
index 3e7e242..7327771 100644
--- a/docs/core/tasks.md
+++ b/docs/core/tasks.md
@@ -283,51 +283,175 @@
 Foundation is secure, now deploy major services.
 
 ---
 
 ### 9. Self-Hosted AI Stack on TX1
-**Time:** 4-5 hours total (30 min active work, rest is downloads/waiting)
+**Time:** 8-12 hours total (3-4 hours active work, rest is downloads/ingestion)
 **Depends On:** Medical clearance
 **Why Critical:** DERP backup, unlimited AI access, staff assistant foundation
-**Location:** TX1 Dallas
+**Location:** TX1 Dallas
+**Architecture:** Dual deployment (AnythingLLM primary + Open WebUI for staff)
 
 **Phase 0: NC1 Cleanup** ✅ (See Tier 0, Task #3)
 
-**Phase 1: Deploy Ollama + Open WebUI (30 min active)**
-- Install Ollama on TX1
-- Deploy Open WebUI via Docker
-- Configure Nginx reverse proxy
-- Test basic functionality
-- Document at `docs/deployment/ai-stack.md`
+**Phase 1: Deploy Stack (1-2 hours active)**
+
+**Primary: AnythingLLM (for Michael/Meg, document-heavy workloads)**
+- Purpose-built for 1,000+ document libraries
+- Built-in LanceDB vector database (proven to 5,000+ docs)
+- Workspace-based isolation (Operations, Pokerole, Brainstorming, Staff)
+- Domain: ai.firefrostgaming.com
+
+**Secondary: Open WebUI (for staff assistant Phase 4)**
+- Lighter weight for staff wiki only
+- Built-in Chroma vector DB (sufficient for smaller staff dataset)
+- ChatGPT-like interface (familiar to users)
+- Domain: staff-ai.firefrostgaming.com (when staff wiki deployed)
+
+**Deployment Steps:**
+1. Install Ollama on TX1
+2. Deploy AnythingLLM via Docker
+3. Deploy Open WebUI via Docker
+4. Configure Nginx reverse proxy for both
+5. Test basic functionality
+6. Document at `docs/deployment/ai-stack.md`
+
+---
 
 **Phase 2: Load LLM Models (6-8 hours download, overnight)**
-- Qwen 2.5 Coder 72B (~40GB) — Coding tasks
-- Llama 3.3 70B (~40GB) — Conversation, reasoning
-- Llama 3.2 Vision 11B (~7GB) — Image understanding
-- Total: ~90GB storage, ~90GB RAM when all loaded
 
-**Phase 3: Gitea Integration (1-2 hours)**
-- Configure AI to access `/mnt/gitea/operations-manual/`
-- Test full repo context loading
-- Implement RAG system for document retrieval
-- Verify DERP functionality (can reconstruct work from repo)
+**Core Models:**
+- Qwen 2.5 Coder 72B (~40GB) — Coding tasks, script generation
+- Llama 3.3 70B (~40GB) — Conversation, reasoning, decision-making
+- Llama 3.2 Vision 11B (~7GB) — Image understanding, photo processing
+
+**Embedding Model (for RAG):**
+- all-MiniLM-L6-v2 (~400MB) — Document embeddings for semantic search
+- Or: nomic-embed-text (~700MB) — Higher quality, slightly larger
+
+**Total Storage:** ~90-95GB for models + 50-60GB for embeddings/vector DB = ~150GB
+**Total RAM when loaded:** ~100-110GB (models + vector DB in memory)
+
+---
+
+**Phase 3: Document Ingestion & Vector Database (2-3 hours ACTIVE, 6-8 hours total processing)**
+
+**CRITICAL: Batch ingestion to prevent OOM (Out of Memory) killer**
+
+**Preparation:**
+```bash
+# Clone repos to local filesystem
+mkdir -p /opt/firefrost/repos
+cd /opt/firefrost/repos
+git clone https://git.firefrostgaming.com/firefrost-gaming/firefrost-operations-manual.git
+git clone https://git.firefrostgaming.com/firefrost-gaming/pokerole-project.git
+git clone https://git.firefrostgaming.com/firefrost-gaming/brainstorming.git
+```
+
+**Workspace Structure in AnythingLLM:**
+
+**Workspace 1: Firefrost Operations** (~500 files, growing)
+- Full operations-manual repo
+- Session transcripts
+- Deployment guides
+- Infrastructure docs
+- Consultant profiles
+
+**Workspace 2: Pokerole Project** (~200 files)
+- Separate context for Claudius
+- Holly collaboration docs
+- Pokédex content
+- Isolated from main operations (security)
+
+**Workspace 3: Brainstorming** (~100 files)
+- Sandbox sessions
+- Gemini brainstorms
+- Idea backlogs
+- Among Us planning
+
+**Workspace 4: Staff Wiki** (future, when deployed)
+- Staff-facing documentation only
+- Procedures, troubleshooting, policies
+- No access to operations manual (private)
+
+**Ingestion Process (BATCH 100 DOCS AT A TIME):**
+
+1. **Start with Firefrost Operations workspace**
+   - Upload first 100 markdown files from operations-manual
+   - Wait for embedding completion (~30-45 min)
+   - Verify search works before continuing
+
+2. **Continue in batches of 100**
+   - Monitor RAM usage (should stay under 150GB)
+   - If RAM spikes above 180GB, reduce batch size to 50
+   - Total ingestion: ~4-6 hours for 500 docs
+
+3. **Repeat for other workspaces**
+   - Pokerole: 2-3 hours (smaller dataset)
+   - Brainstorming: 1-2 hours
+   - Total: 6-8 hours processing time
+
+**Testing:**
+- Test semantic search: "What is the Frostwall Protocol?"
+- Test cross-document synthesis: "Summarize our infrastructure decisions"
+- Test DERP functionality: "Reconstruct session from Feb 13"
+- Verify workspace isolation (Pokerole can't see Operations)
+
+**Hardware Reality Check:**
+- Your TX1: 251GB RAM, 809GB storage, 32 cores
+- Requirements: 16-32GB RAM, 50GB storage, 4+ cores
+- **TX1 is massive overkill (good thing) ✅**
+
+---
 
 **Phase 4: Staff AI Assistant (2-3 hours)**
-- Deploy embedded chat widget OR dedicated portal
-- Configure knowledge base (staff wiki only, operations manual stays private)
-- Test 24/7 staff question answering
-- Document usage in staff wiki (when deployed)
-- Reduces Michael/Meg interruptions
-- Onboarding tool for future recruitment
-
-**TX1 Resources After Deployment:**
+**Deploy Open WebUI with staff wiki docs only**
+- Much smaller dataset (~50-100 docs when staff wiki exists)
+- Built-in Chroma vector DB sufficient (no need for external)
+- Embedded chat widget OR dedicated portal
+- Domain: staff-ai.firefrostgaming.com
+
+**Configuration:**
+1. Create "Staff Wiki" knowledge base in Open WebUI
+2. Upload staff-facing docs only (operations manual stays private in AnythingLLM)
+3. Configure access (staff accounts, not public)
+4. Test 24/7 staff question answering:
+   - "How do I restart a game server?"
+   - "What's the whitelist process?"
+   - "Who do I contact for billing issues?"
+5. Document usage in staff wiki
+6. Train Meg on basic usage
+
+**Benefits:**
+- Reduces Michael/Meg interruptions (staff self-serve)
+- 24/7 availability (AI doesn't sleep)
+- Onboarding tool for future recruitment
+- Consistent answers (no "telephone game")
+
+---
+
+**TX1 Resources After Full Deployment:**
+
+**Before AI Stack:**
 - Games: 20GB RAM, 24GB storage
-- AI Stack: 90GB RAM, 100GB storage
-- **Total: 110GB RAM / 251GB = 44% usage ✅**
-- **Total: 124GB storage / 809GB = 15% usage ✅**
+
+**After AI Stack:**
+- Games: 20GB RAM, 24GB storage
+- AI Models: 100GB RAM, 95GB storage
+- Vector DB + Embeddings: 20GB RAM, 60GB storage
+- **Total: 140GB RAM / 251GB = 56% usage ✅**
+- **Total: 179GB storage / 809GB = 22% usage ✅**
+
+**Plenty of headroom for growth.**
+
+---
 
 **After Completion:**
 - ✅ DERP backup operational (Claude.ai dies → self-hosted continues)
-- ✅ Unlimited AI access ($0 vs Claude Pro $20/month)
+- ✅ Unlimited AI access ($0 vs Claude Pro $20/month = $240/year saved)
 - ✅ Staff assistant reduces support burden
-- ✅ Full repo context for decision-making
+- ✅ Full repo context for decision-making (1,000+ docs semantic search)
+- ✅ Workspace isolation (Operations separate from Pokerole separate from Staff)
+- ✅ Proven to scale: 5,000+ document libraries in production
+- ✅ Self-healing via Git: repos update → workspaces re-ingest automatically
 
 ---
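The "Deploy via Docker" steps in Phase 1 might look roughly like the following sketch; the image names, host ports, and volume paths are assumptions to verify against each project's docs before running:

```bash
# Sketch only: image names/ports/paths are assumptions, not confirmed values.
docker run -d --name anythingllm \
  -p 3001:3001 \
  -v /opt/firefrost/anythingllm:/app/server/storage \
  mintplexlabs/anythingllm

docker run -d --name open-webui \
  -p 3000:8080 \
  -v /opt/firefrost/open-webui:/app/backend/data \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  ghcr.io/open-webui/open-webui:main
```

Persistent volumes matter here: workspaces and vector data should survive container upgrades.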
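A minimal sketch of the Phase 1 reverse-proxy step, assuming AnythingLLM is published on host port 3001 and Open WebUI on 3000 (match your actual `docker run -p` mappings; TLS/certbot setup omitted):

```nginx
# Assumed upstream ports; add TLS before exposing either domain.
server {
    listen 80;
    server_name ai.firefrostgaming.com;

    location / {
        proxy_pass http://127.0.0.1:3001;   # AnythingLLM container
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;     # keep streaming responses working
        proxy_set_header Connection "upgrade";
    }
}

server {
    listen 80;
    server_name staff-ai.firefrostgaming.com;

    location / {
        proxy_pass http://127.0.0.1:3000;   # Open WebUI container
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```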
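Phase 2's overnight downloads map onto `ollama pull` commands; the tags below are assumptions to double-check against the Ollama model library before starting (in particular, confirm which Qwen coder sizes are actually published):

```bash
# Pull overnight (roughly 90GB total). Tags are assumptions; verify first.
ollama pull llama3.3:70b          # conversation / reasoning
ollama pull llama3.2-vision:11b   # image understanding
ollama pull nomic-embed-text      # embeddings for RAG
# Confirm the available qwen2.5-coder tags and pull the largest that fits.
ollama list                       # verify what actually landed
```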
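The Phase 3 batching rule (100 docs at a time, dropping to 50 once used RAM passes the 180GB hard limit) can be sketched as a small shell helper; `plan_batches` is a hypothetical name, and a real run would replace the `echo` with the actual upload plus a wait for embedding to finish:

```shell
# Sketch of the batch-ingestion rule from Phase 3. The RAM value would come
# from a real probe (e.g. /proc/meminfo) in practice; here it is a parameter.
plan_batches() {   # usage: plan_batches TOTAL_DOCS USED_RAM_GB
    total=$1; ram=$2; i=0
    while [ "$i" -lt "$total" ]; do
        if [ "$ram" -gt 180 ]; then size=50; else size=100; fi
        if [ $(( i + size )) -gt "$total" ]; then size=$(( total - i )); fi
        echo "$size"              # one upload batch of this many docs
        i=$(( i + size ))
    done
}
```

For the 500-doc operations-manual ingest this yields five batches of 100 under normal load, and smaller batches automatically when memory is tight.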