AI Stack architecture enhanced: vector database scaling for 1,000+ documents

MAJOR CHANGES:
- Dual deployment: AnythingLLM (primary) + Open WebUI (staff)
- AnythingLLM with LanceDB vector database (proven to 5,000+ docs)
- Workspace-based isolation (Operations, Pokerole, Brainstorming, Staff)
- Batch ingestion strategy: 100 docs at a time (prevent OOM)
- Embedding model added: all-MiniLM-L6-v2 for semantic search
- Detailed ingestion process (6-8 hours total, 2-3 hours active)
- Hardware validation: TX1 can handle 1,000+ docs easily

SCALING STRATEGY:
- Phase 3 now includes proper RAG pipeline
- Vector DB for semantic search across full repo
- Workspace isolation prevents context bleed
- Auto-update via Git sync (repos update → workspaces re-ingest)

RESOURCE UPDATE:
- Total: 140GB RAM / 251GB = 56% usage (was 44%)
- Total: 179GB storage / 809GB = 22% usage (was 15%)
- Headroom for 5,000+ document growth

Based on research: AnythingLLM gold standard for document-heavy self-hosting

Updated by: Chronicler the Ninth
2026-02-15 12:44:08 -06:00
parent e1eb6661dc
commit c774b9ae3c


Foundation is secure, now deploy major services.
---
### 9. Self-Hosted AI Stack on TX1
**Time:** 8-12 hours total (3-4 hours active work, rest is downloads/ingestion)
**Depends On:** Medical clearance
**Why Critical:** DERP backup, unlimited AI access, staff assistant foundation
**Location:** TX1 Dallas
**Architecture:** Dual deployment (AnythingLLM primary + Open WebUI for staff)
**Phase 0: NC1 Cleanup** ✅ (See Tier 0, Task #3)
**Phase 1: Deploy Stack (1-2 hours active)**
**Primary: AnythingLLM (for Michael/Meg, document-heavy workloads)**
- Purpose-built for 1,000+ document libraries
- Built-in LanceDB vector database (proven to 5,000+ docs)
- Workspace-based isolation (Operations, Pokerole, Brainstorming, Staff)
- Domain: ai.firefrostgaming.com
**Secondary: Open WebUI (for staff assistant Phase 4)**
- Lighter weight for staff wiki only
- Built-in Chroma vector DB (sufficient for smaller staff dataset)
- ChatGPT-like interface (familiar to users)
- Domain: staff-ai.firefrostgaming.com (when staff wiki deployed)
**Deployment Steps:**
1. Install Ollama on TX1
2. Deploy AnythingLLM via Docker
3. Deploy Open WebUI via Docker
4. Configure Nginx reverse proxy for both
5. Test basic functionality
6. Document at `docs/deployment/ai-stack.md`
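The deployment steps above can be sketched as a single compose file. This is a sketch, not the deployed config: the image names and ports are the two projects' published defaults but the tags should be verified before deploying, and Ollama itself runs natively on TX1 (step 1), so the containers reach it through the Docker host gateway.

```bash
# Writes a docker-compose.yml for review; /tmp path is for illustration --
# use a real path (e.g. under /opt) on TX1, then `docker compose up -d`.
STACK_DIR="${STACK_DIR:-/tmp/ai-stack}"
mkdir -p "$STACK_DIR"
cat > "$STACK_DIR/docker-compose.yml" <<'EOF'
services:
  anythingllm:
    image: mintplexlabs/anythingllm   # verify tag; LanceDB is built in
    ports: ["3001:3001"]
    volumes: ["anythingllm_storage:/app/server/storage"]
    extra_hosts: ["host.docker.internal:host-gateway"]  # reach native Ollama
    restart: unless-stopped
  open-webui:
    image: ghcr.io/open-webui/open-webui:main   # verify tag
    ports: ["8080:8080"]
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    volumes: ["open_webui_data:/app/backend/data"]
    extra_hosts: ["host.docker.internal:host-gateway"]
    restart: unless-stopped
volumes:
  anythingllm_storage:
  open_webui_data:
EOF
echo "wrote $STACK_DIR/docker-compose.yml"
```

Nginx (step 4) then proxies ai.firefrostgaming.com to :3001 and staff-ai.firefrostgaming.com to :8080.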
---
**Phase 2: Load LLM Models (6-8 hours download, overnight)**
**Core Models:**
- Qwen 2.5 Coder 72B (~40GB) — Coding tasks, script generation
- Llama 3.3 70B (~40GB) — Conversation, reasoning, decision-making
- Llama 3.2 Vision 11B (~7GB) — Image understanding, photo processing
**Embedding Model (for RAG):**
- all-MiniLM-L6-v2 (~400MB) — Document embeddings for semantic search
- Or: nomic-embed-text (~700MB) — Higher quality, slightly larger
**Total Storage:** ~90-95GB for models + 50-60GB for embeddings/vector DB = ~150GB
**Total RAM when loaded:** ~100-110GB (models + vector DB in memory)
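A pull script for the overnight download, as a hedged sketch: the registry tag names below are assumptions and must be checked against the Ollama model library first (for example, a 72B coder variant may not exist under that exact tag).

```bash
# DRY_RUN=1 (default) only lists what would be pulled; set DRY_RUN=0 on TX1.
# Expect roughly 90GB of downloads -- run overnight.
MODELS="qwen2.5-coder:72b llama3.3:70b llama3.2-vision:11b all-minilm nomic-embed-text"
DRY_RUN="${DRY_RUN:-1}"
for m in $MODELS; do
  if [ "$DRY_RUN" = 1 ]; then
    echo "would pull: $m"
  else
    ollama pull "$m"
  fi
done
```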
---
**Phase 3: Document Ingestion & Vector Database (2-3 hours ACTIVE, 6-8 hours total processing)**
**CRITICAL: Ingest in batches to avoid triggering the OOM (Out of Memory) killer**
**Preparation:**
```bash
# Clone repos to local filesystem
mkdir -p /opt/firefrost/repos
cd /opt/firefrost/repos
git clone https://git.firefrostgaming.com/firefrost-gaming/firefrost-operations-manual.git
git clone https://git.firefrostgaming.com/firefrost-gaming/pokerole-project.git
git clone https://git.firefrostgaming.com/firefrost-gaming/brainstorming.git
```
**Workspace Structure in AnythingLLM:**
**Workspace 1: Firefrost Operations** (~500 files, growing)
- Full operations-manual repo
- Session transcripts
- Deployment guides
- Infrastructure docs
- Consultant profiles
**Workspace 2: Pokerole Project** (~200 files)
- Separate context for Claudius
- Holly collaboration docs
- Pokédex content
- Isolated from main operations (security)
**Workspace 3: Brainstorming** (~100 files)
- Sandbox sessions
- Gemini brainstorms
- Idea backlogs
- Among Us planning
**Workspace 4: Staff Wiki** (future, when deployed)
- Staff-facing documentation only
- Procedures, troubleshooting, policies
- No access to operations manual (private)
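The four workspaces above can also be pre-created from a script instead of the UI. This is a sketch under assumptions: AnythingLLM exposes a developer API, but the `/api/v1/workspace/new` route, payload shape, and bearer-token auth should be verified against your instance's API docs before use.

```bash
# DRY_RUN=1 (default) only prints what would be created.
create_workspace() {
  name="$1"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "would create workspace: $name"
  else
    # hypothetical endpoint -- confirm on your AnythingLLM instance
    curl -s -X POST -H "Authorization: Bearer $ANYTHINGLLM_API_KEY" \
      -H "Content-Type: application/json" \
      -d "{\"name\": \"$name\"}" \
      "$ANYTHINGLLM_URL/api/v1/workspace/new"
  fi
}
for ws in "Firefrost Operations" "Pokerole Project" "Brainstorming" "Staff Wiki"; do
  create_workspace "$ws"
done
```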
**Ingestion Process (BATCH 100 DOCS AT A TIME):**
1. **Start with Firefrost Operations workspace**
- Upload first 100 markdown files from operations-manual
- Wait for embedding completion (~30-45 min)
- Verify search works before continuing
2. **Continue in batches of 100**
- Monitor RAM usage (should stay under 150GB)
- If RAM spikes above 180GB, reduce batch size to 50
- Total ingestion: ~4-6 hours for 500 docs
3. **Repeat for other workspaces**
- Pokerole: 2-3 hours (smaller dataset)
- Brainstorming: 1-2 hours
- Total: 6-8 hours processing time
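The batching above can be scripted so no batch ever exceeds 100 files. A sketch under assumptions: the `/api/v1/document/upload` endpoint and bearer-token auth are taken from AnythingLLM's developer API and should be confirmed against your instance's docs, and the 30-minute sleep between batches is a placeholder for "wait for embedding completion".

```bash
# batch_upload <repo_dir> [batch_size] [dry_run]
# dry_run=1 (default) only prints what would be uploaded.
batch_upload() {
  repo_dir="$1"; batch_size="${2:-100}"; dry_run="${3:-1}"
  find "$repo_dir" -name '*.md' | sort > /tmp/doclist.txt
  rm -f /tmp/docbatch.*
  split -l "$batch_size" /tmp/doclist.txt /tmp/docbatch.
  n=0
  for batch in /tmp/docbatch.*; do
    [ -f "$batch" ] || continue
    n=$((n + 1))
    echo "batch $n: $(wc -l < "$batch") files"
    while IFS= read -r f; do
      if [ "$dry_run" = 1 ]; then
        echo "would upload: $f"
      else
        # hypothetical endpoint -- check AnythingLLM's developer API docs
        curl -s -H "Authorization: Bearer $ANYTHINGLLM_API_KEY" \
          -F "file=@$f" "$ANYTHINGLLM_URL/api/v1/document/upload" >/dev/null
      fi
    done < "$batch"
    # pause so embedding finishes before the next batch (tune to taste)
    [ "$dry_run" = 1 ] || sleep 1800
  done
  echo "total batches: $n"
}

# On TX1: batch_upload /opt/firefrost/repos/firefrost-operations-manual 100 0
```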
**Testing:**
- Test semantic search: "What is the Frostwall Protocol?"
- Test cross-document synthesis: "Summarize our infrastructure decisions"
- Test DERP functionality: "Reconstruct session from Feb 13"
- Verify workspace isolation (Pokerole can't see Operations)
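The search tests above can be run from the command line too. Hedged sketch: the `/api/v1/workspace/<slug>/chat` route and `"mode": "query"` payload are assumptions about AnythingLLM's developer API, and the workspace slug below is hypothetical -- verify both on your instance before relying on it.

```bash
# ask_workspace <workspace-slug> "<question>"
ask_workspace() {
  slug="$1"; question="$2"
  curl -s -X POST \
    -H "Authorization: Bearer $ANYTHINGLLM_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"message\": \"$question\", \"mode\": \"query\"}" \
    "$ANYTHINGLLM_URL/api/v1/workspace/$slug/chat"
}

# ask_workspace firefrost-operations "What is the Frostwall Protocol?"
```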
**Hardware Reality Check:**
- Your TX1: 251GB RAM, 809GB storage, 32 cores
- Requirements: 16-32GB RAM, 50GB storage, 4+ cores
- **TX1 is massive overkill (good thing) ✅**
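The reality check above can be confirmed on the host itself with standard tools (Linux-specific: `/proc/meminfo`, GNU `nproc`/`df`):

```bash
# Preflight: report RAM, cores, and free disk, and flag sub-minimum hosts.
mem_gb=$(awk '/MemTotal/ {printf "%d", $2/1024/1024}' /proc/meminfo 2>/dev/null || echo 0)
cores=$(nproc 2>/dev/null || echo 1)
disk_gb=$(df -BG --output=avail / 2>/dev/null | tail -1 | tr -dc '0-9')
echo "RAM: ${mem_gb}GB  cores: ${cores}  free disk on /: ${disk_gb}GB"
if [ "${mem_gb:-0}" -ge 16 ] && [ "$cores" -ge 4 ]; then
  echo "meets minimum spec"
else
  echo "WARNING: below 16GB RAM / 4 cores minimum"
fi
```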
---
**Phase 4: Staff AI Assistant (2-3 hours)**
**Deploy Open WebUI with staff wiki docs only**
- Much smaller dataset (~50-100 docs when staff wiki exists)
- Built-in Chroma vector DB sufficient (no need for external)
- Embedded chat widget OR dedicated portal
- Domain: staff-ai.firefrostgaming.com
**Configuration:**
1. Create "Staff Wiki" knowledge base in Open WebUI
2. Upload staff-facing docs only (operations manual stays private in AnythingLLM)
3. Configure access (staff accounts, not public)
4. Test 24/7 staff question answering:
- "How do I restart a game server?"
- "What's the whitelist process?"
- "Who do I contact for billing issues?"
5. Document usage in staff wiki
6. Train Meg on basic usage
**Benefits:**
- Reduces Michael/Meg interruptions (staff self-serve)
- 24/7 availability (AI doesn't sleep)
- Onboarding tool for future recruitment
- Consistent answers (no "telephone game")
---
**TX1 Resources After Full Deployment:**
**Before AI Stack:**
- Games: 20GB RAM, 24GB storage
**After AI Stack:**
- Games: 20GB RAM, 24GB storage
- AI Models: 100GB RAM, 95GB storage
- Vector DB + Embeddings: 20GB RAM, 60GB storage
- **Total: 140GB RAM / 251GB = 56% usage ✅**
- **Total: 179GB storage / 809GB = 22% usage ✅**
**Plenty of headroom for growth.**
---
**After Completion:**
- ✅ DERP backup operational (Claude.ai dies → self-hosted continues)
- ✅ Unlimited AI access ($0 vs Claude Pro $20/month = $240/year saved)
- ✅ Staff assistant reduces support burden
- ✅ Full repo context for decision-making (1,000+ docs semantic search)
- ✅ Workspace isolation (Operations separate from Pokerole separate from Staff)
- ✅ Proven to scale: 5,000+ document libraries in production
- ✅ Self-healing via Git: repos update → workspaces re-ingest automatically
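The Git-sync half of the self-healing loop can be a small cron job. Sketch only: this handles the `git pull`; whether re-ingestion then happens automatically depends on AnythingLLM's folder-watch/re-embed settings, which should be confirmed, and the cron path below is hypothetical.

```bash
# Pull every repo under a base directory; schedule e.g. via cron:
#   0 3 * * *  /usr/local/bin/firefrost-repo-sync.sh   (hypothetical path)
sync_repos() {
  base="${1:-/opt/firefrost/repos}"
  for repo in "$base"/*/; do
    [ -d "$repo/.git" ] || continue
    echo "syncing $repo"
    git -C "$repo" pull --ff-only || echo "pull failed for $repo"
  done
}

# sync_repos /opt/firefrost/repos
```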
---