AI Stack architecture enhanced: vector database scaling for 1,000+ documents
MAJOR CHANGES:
- Dual deployment: AnythingLLM (primary) + Open WebUI (staff)
- AnythingLLM with LanceDB vector database (proven to 5,000+ docs)
- Workspace-based isolation (Operations, Pokerole, Brainstorming, Staff)
- Batch ingestion strategy: 100 docs at a time (prevent OOM)
- Embedding model added: all-MiniLM-L6-v2 for semantic search
- Detailed ingestion process (6-8 hours total, 2-3 hours active)
- Hardware validation: TX1 can handle 1,000+ docs easily

SCALING STRATEGY:
- Phase 3 now includes proper RAG pipeline
- Vector DB for semantic search across full repo
- Workspace isolation prevents context bleed
- Auto-update via Git sync (repos update → workspaces re-ingest)

RESOURCE UPDATE:
- Total: 140GB RAM / 251GB = 56% usage (was 44%)
- Total: 179GB storage / 809GB = 22% usage (was 15%)
- Headroom for 5,000+ document growth

Based on research: AnythingLLM gold standard for document-heavy self-hosting

Updated by: Chronicler the Ninth
Foundation is secure, now deploy major services.
---

### 9. Self-Hosted AI Stack on TX1

**Time:** 8-12 hours total (3-4 hours active work, rest is downloads/ingestion)

**Depends On:** Medical clearance

**Why Critical:** DERP backup, unlimited AI access, staff assistant foundation

**Location:** TX1 Dallas

**Architecture:** Dual deployment (AnythingLLM primary + Open WebUI for staff)

**Phase 0: NC1 Cleanup** ✅ (See Tier 0, Task #3)
**Phase 1: Deploy Stack (1-2 hours active)**

**Primary: AnythingLLM (for Michael/Meg, document-heavy workloads)**
- Purpose-built for 1,000+ document libraries
- Built-in LanceDB vector database (proven to 5,000+ docs)
- Workspace-based isolation (Operations, Pokerole, Brainstorming, Staff)
- Domain: ai.firefrostgaming.com

**Secondary: Open WebUI (for staff assistant Phase 4)**
- Lighter weight for staff wiki only
- Built-in Chroma vector DB (sufficient for smaller staff dataset)
- ChatGPT-like interface (familiar to users)
- Domain: staff-ai.firefrostgaming.com (when staff wiki deployed)
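
The two domains above sit behind the Nginx reverse proxy called out in the deployment steps. A minimal sketch of the two server blocks, assuming AnythingLLM is published on host port 3001 and Open WebUI on 8080 (adjust to whatever ports the containers actually publish); TLS directives omitted for brevity:

```nginx
# ai.firefrostgaming.com -> AnythingLLM (assumed upstream port 3001)
server {
    listen 80;
    server_name ai.firefrostgaming.com;
    location / {
        proxy_pass http://127.0.0.1:3001;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;   # WebSocket support for streaming chat
        proxy_set_header Connection "upgrade";
    }
}

# staff-ai.firefrostgaming.com -> Open WebUI (assumed upstream port 8080)
server {
    listen 80;
    server_name staff-ai.firefrostgaming.com;
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```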

**Deployment Steps:**
1. Install Ollama on TX1
2. Deploy AnythingLLM via Docker
3. Deploy Open WebUI via Docker
4. Configure Nginx reverse proxy for both
5. Test basic functionality
6. Document at `docs/deployment/ai-stack.md`
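
Steps 2-3 might look like the following docker-compose sketch. Image tags, host ports, volume paths, and environment variables are assumptions to verify against each project's current docs before deploying; both containers talk to Ollama running on the TX1 host:

```yaml
# /opt/firefrost/ai-stack/docker-compose.yml (assumed path)
services:
  anythingllm:
    image: mintplexlabs/anythingllm
    ports:
      - "3001:3001"
    volumes:
      - /opt/firefrost/anythingllm:/app/server/storage   # persists LanceDB + workspaces
    environment:
      - STORAGE_DIR=/app/server/storage
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "8080:8080"
    volumes:
      - /opt/firefrost/open-webui:/app/backend/data      # persists Chroma DB + users
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434  # Ollama runs on the host
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped
```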

---

**Phase 2: Load LLM Models (6-8 hours download, overnight)**

**Core Models:**
- Qwen 2.5 Coder 72B (~40GB) — Coding tasks, script generation
- Llama 3.3 70B (~40GB) — Conversation, reasoning, decision-making
- Llama 3.2 Vision 11B (~7GB) — Image understanding, photo processing

**Embedding Model (for RAG):**
- all-MiniLM-L6-v2 (~400MB) — Document embeddings for semantic search
- Or: nomic-embed-text (~700MB) — Higher quality, slightly larger

**Total Storage:** ~90-95GB for models + 50-60GB for embeddings/vector DB = ~150GB
**Total RAM when loaded:** ~100-110GB (models + vector DB in memory)

---

**Phase 3: Document Ingestion & Vector Database (2-3 hours ACTIVE, 6-8 hours total processing)**

**CRITICAL: Batch ingestion to prevent OOM (Out of Memory) killer**

**Preparation:**
```bash
# Clone repos to local filesystem
mkdir -p /opt/firefrost/repos
cd /opt/firefrost/repos
git clone https://git.firefrostgaming.com/firefrost-gaming/firefrost-operations-manual.git
git clone https://git.firefrostgaming.com/firefrost-gaming/pokerole-project.git
git clone https://git.firefrostgaming.com/firefrost-gaming/brainstorming.git
```
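
With the repos cloned, a quick count confirms the expected workspace sizes (~500/200/100 files) and tells you how many 100-doc batches to plan. This is a local convenience sketch (`count_batches` and `REPO_ROOT` are our own names, not AnythingLLM tooling):

```shell
# Sketch: count markdown files per cloned repo to size the 100-doc batches.
count_batches() {
    root="$1"
    for repo in "$root"/*/; do
        [ -d "$repo" ] || continue
        count=$(find "$repo" -name '*.md' | wc -l | tr -d ' ')
        batches=$(( (count + 99) / 100 ))      # round up to whole batches
        printf '%s: %s markdown files -> %s batch(es)\n' \
            "$(basename "$repo")" "$count" "$batches"
    done
}

count_batches "${REPO_ROOT:-/opt/firefrost/repos}"
```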

**Workspace Structure in AnythingLLM:**

**Workspace 1: Firefrost Operations** (~500 files, growing)
- Full operations-manual repo
- Session transcripts
- Deployment guides
- Infrastructure docs
- Consultant profiles

**Workspace 2: Pokerole Project** (~200 files)
- Separate context for Claudius
- Holly collaboration docs
- Pokédex content
- Isolated from main operations (security)

**Workspace 3: Brainstorming** (~100 files)
- Sandbox sessions
- Gemini brainstorms
- Idea backlogs
- Among Us planning

**Workspace 4: Staff Wiki** (future, when deployed)
- Staff-facing documentation only
- Procedures, troubleshooting, policies
- No access to operations manual (private)

**Ingestion Process (BATCH 100 DOCS AT A TIME):**

1. **Start with Firefrost Operations workspace**
- Upload first 100 markdown files from operations-manual
- Wait for embedding completion (~30-45 min)
- Verify search works before continuing

2. **Continue in batches of 100**
- Monitor RAM usage (should stay under 150GB)
- If RAM spikes above 180GB, reduce batch size to 50
- Total ingestion: ~4-6 hours for 500 docs

3. **Repeat for other workspaces**
- Pokerole: 2-3 hours (smaller dataset)
- Brainstorming: 1-2 hours
- Total: 6-8 hours processing time
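
The batch discipline above can be staged on disk before any uploads. A sketch (`stage_batches` and `ram_check` are local helper names, nothing AnythingLLM-specific) that copies markdown files into `batch-NNN` folders of 100 so each drag-and-drop upload is exactly one batch; it assumes filenames are unique across subdirectories, since batches are flattened:

```shell
# Stage a repo's markdown files into batch-NNN folders of <size> docs each,
# so every AnythingLLM upload is one pre-counted batch.
# Note: batches are flattened, so duplicate basenames would collide.
stage_batches() {
    src="$1"; dst="$2"; size="${3:-100}"
    n=0
    find "$src" -name '*.md' | sort | while IFS= read -r f; do
        dir=$(printf '%s/batch-%03d' "$dst" $(( n / size + 1 )))
        mkdir -p "$dir"
        cp "$f" "$dir"/
        n=$(( n + 1 ))
    done
}

# Between batches, check available RAM before continuing; if usage climbs
# past ~180GB on the 251GB box, drop the batch size to 50 (Linux-only check).
ram_check() {
    awk '/MemAvailable/ {printf "RAM available: %.0f GB\n", $2/1024/1024}' /proc/meminfo
}
```

Typical flow: `stage_batches /opt/firefrost/repos/firefrost-operations-manual /opt/firefrost/staging/ops`, upload `batch-001`, run `ram_check`, then continue with the next folder.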

**Testing:**
- Test semantic search: "What is the Frostwall Protocol?"
- Test cross-document synthesis: "Summarize our infrastructure decisions"
- Test DERP functionality: "Reconstruct session from Feb 13"
- Verify workspace isolation (Pokerole can't see Operations)

**Hardware Reality Check:**
- Your TX1: 251GB RAM, 809GB storage, 32 cores
- Requirements: 16-32GB RAM, 50GB storage, 4+ cores
- **TX1 is massive overkill (good thing) ✅**
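
The reality check is scriptable as a preflight. A sketch using Linux coreutils only (thresholds mirror the stated minimums; it prints warnings rather than failing):

```shell
# Preflight: compare host resources against the AI stack's stated minimums
# (16-32GB RAM, 50GB storage, 4+ cores).
cores=$(nproc)
ram_gb=$(awk '/MemTotal/ {printf "%d", $2/1024/1024}' /proc/meminfo)
disk_gb=$(df -BG --output=avail / | tail -1 | tr -dc '0-9')
echo "cores=$cores ram=${ram_gb}GB disk_free=${disk_gb}GB"
[ "$cores" -ge 4 ]    || echo "WARN: fewer than 4 cores"
[ "$ram_gb" -ge 32 ]  || echo "WARN: less than 32GB RAM"
[ "$disk_gb" -ge 50 ] || echo "WARN: less than 50GB free"
```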

---

**Phase 4: Staff AI Assistant (2-3 hours)**

**Deploy Open WebUI with staff wiki docs only**
- Much smaller dataset (~50-100 docs when staff wiki exists)
- Built-in Chroma vector DB sufficient (no need for external)
- Embedded chat widget OR dedicated portal
- Domain: staff-ai.firefrostgaming.com

**Configuration:**
1. Create "Staff Wiki" knowledge base in Open WebUI
2. Upload staff-facing docs only (operations manual stays private in AnythingLLM)
3. Configure access (staff accounts, not public)
4. Test 24/7 staff question answering:
- "How do I restart a game server?"
- "What's the whitelist process?"
- "Who do I contact for billing issues?"
5. Document usage in staff wiki
6. Train Meg on basic usage

**Benefits:**
- Reduces Michael/Meg interruptions (staff self-serve)
- 24/7 availability (AI doesn't sleep)
- Onboarding tool for future recruitment
- Consistent answers (no "telephone game")

---

**TX1 Resources After Full Deployment:**

**Before AI Stack:**
- Games: 20GB RAM, 24GB storage

**After AI Stack:**
- Games: 20GB RAM, 24GB storage
- AI Models: 100GB RAM, 95GB storage
- Vector DB + Embeddings: 20GB RAM, 60GB storage
- **Total: 140GB RAM / 251GB = 56% usage ✅**
- **Total: 179GB storage / 809GB = 22% usage ✅**

**Plenty of headroom for growth.**

---

**After Completion:**
- ✅ DERP backup operational (Claude.ai dies → self-hosted continues)
- ✅ Unlimited AI access ($0 vs Claude Pro $20/month = $240/year saved)
- ✅ Staff assistant reduces support burden
- ✅ Full repo context for decision-making (1,000+ docs semantic search)
- ✅ Workspace isolation (Operations separate from Pokerole separate from Staff)
- ✅ Proven to scale: 5,000+ document libraries in production
- ✅ Self-healing via Git: repos update → workspaces re-ingest automatically
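
The Git side of that loop can be a simple cron entry per host (path and schedule below are assumptions). Whether changed files re-embed automatically depends on how each AnythingLLM workspace is configured to watch its documents, so budget a manual re-ingest check after large merges:

```
# /etc/cron.d/firefrost-repo-sync (assumed path) — pull all three repos every 30 min
*/30 * * * * root for d in /opt/firefrost/repos/*/; do git -C "$d" pull --quiet; done
```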

---