AI Stack architecture enhanced: vector database scaling for 1,000+ documents

MAJOR CHANGES:
- Dual deployment: AnythingLLM (primary) + Open WebUI (staff)
- AnythingLLM with LanceDB vector database (proven to 5,000+ docs)
- Workspace-based isolation (Operations, Pokerole, Brainstorming, Staff)
- Batch ingestion strategy: 100 docs at a time (prevent OOM)
- Embedding model added: all-MiniLM-L6-v2 for semantic search
- Detailed ingestion process (6-8 hours total, 2-3 hours active)
- Hardware validation: TX1 can handle 1,000+ docs easily

SCALING STRATEGY:
- Phase 3 now includes proper RAG pipeline
- Vector DB for semantic search across full repo
- Workspace isolation prevents context bleed
- Auto-update via Git sync (repos update → workspaces re-ingest)

RESOURCE UPDATE:
- Total: 140GB RAM / 251GB = 56% usage (was 44%)
- Total: 179GB storage / 809GB = 22% usage (was 15%)
- Headroom for 5,000+ document growth

Based on research: AnythingLLM gold standard for document-heavy self-hosting

Updated by: Chronicler the Ninth
2026-02-15 12:44:08 -06:00
parent e1eb6661dc
commit c774b9ae3c


Foundation is secure, now deploy major services.
---
### 9. Self-Hosted AI Stack on TX1
**Time:** 8-12 hours total (3-4 hours active work, rest is downloads/ingestion)
**Depends On:** Medical clearance
**Why Critical:** DERP backup, unlimited AI access, staff assistant foundation
**Location:** TX1 Dallas
**Architecture:** Dual deployment (AnythingLLM primary + Open WebUI for staff)
**Phase 0: NC1 Cleanup** ✅ (See Tier 0, Task #3)
**Phase 1: Deploy Stack (1-2 hours active)**
**Primary: AnythingLLM (for Michael/Meg, document-heavy workloads)**
- Purpose-built for 1,000+ document libraries
- Built-in LanceDB vector database (proven to 5,000+ docs)
- Workspace-based isolation (Operations, Pokerole, Brainstorming, Staff)
- Domain: ai.firefrostgaming.com
**Secondary: Open WebUI (for staff assistant Phase 4)**
- Lighter weight for staff wiki only
- Built-in Chroma vector DB (sufficient for smaller staff dataset)
- ChatGPT-like interface (familiar to users)
- Domain: staff-ai.firefrostgaming.com (when staff wiki deployed)
**Deployment Steps:**
1. Install Ollama on TX1
2. Deploy AnythingLLM via Docker
3. Deploy Open WebUI via Docker
4. Configure Nginx reverse proxy for both
5. Test basic functionality
6. Document at `docs/deployment/ai-stack.md`
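The deployment steps above can be sketched as a single compose file. This is a sketch, not the deployed config: the image names and ports are the two projects' published defaults but the tags should be verified before deploying, and Ollama itself runs natively on TX1 (step 1), so the containers reach it through the Docker host gateway.

```bash
# Writes a docker-compose.yml for review; /tmp path is for illustration --
# use a real path (e.g. under /opt) on TX1, then `docker compose up -d`.
STACK_DIR="${STACK_DIR:-/tmp/ai-stack}"
mkdir -p "$STACK_DIR"
cat > "$STACK_DIR/docker-compose.yml" <<'EOF'
services:
  anythingllm:
    image: mintplexlabs/anythingllm   # verify tag; LanceDB is built in
    ports: ["3001:3001"]
    volumes: ["anythingllm_storage:/app/server/storage"]
    extra_hosts: ["host.docker.internal:host-gateway"]  # reach native Ollama
    restart: unless-stopped
  open-webui:
    image: ghcr.io/open-webui/open-webui:main   # verify tag
    ports: ["8080:8080"]
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    volumes: ["open_webui_data:/app/backend/data"]
    extra_hosts: ["host.docker.internal:host-gateway"]
    restart: unless-stopped
volumes:
  anythingllm_storage:
  open_webui_data:
EOF
echo "wrote $STACK_DIR/docker-compose.yml"
```

Nginx (step 4) then proxies ai.firefrostgaming.com to :3001 and staff-ai.firefrostgaming.com to :8080.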
---
**Phase 2: Load LLM Models (6-8 hours download, overnight)**
**Core Models:**
- Qwen 2.5 Coder 72B (~40GB) — Coding tasks, script generation
- Llama 3.3 70B (~40GB) — Conversation, reasoning, decision-making
- Llama 3.2 Vision 11B (~7GB) — Image understanding, photo processing
**Embedding Model (for RAG):**
- all-MiniLM-L6-v2 (~400MB) — Document embeddings for semantic search
- Or: nomic-embed-text (~700MB) — Higher quality, slightly larger
**Total Storage:** ~90-95GB for models + 50-60GB for embeddings/vector DB = ~150GB
**Total RAM when loaded:** ~100-110GB (models + vector DB in memory)
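A pull script for the overnight download, as a hedged sketch: the registry tag names below are assumptions and must be checked against the Ollama model library first (for example, a 72B coder variant may not exist under that exact tag).

```bash
# DRY_RUN=1 (default) only lists what would be pulled; set DRY_RUN=0 on TX1.
# Expect roughly 90GB of downloads -- run overnight.
MODELS="qwen2.5-coder:72b llama3.3:70b llama3.2-vision:11b all-minilm nomic-embed-text"
DRY_RUN="${DRY_RUN:-1}"
for m in $MODELS; do
  if [ "$DRY_RUN" = 1 ]; then
    echo "would pull: $m"
  else
    ollama pull "$m"
  fi
done
```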
---
**Phase 3: Document Ingestion & Vector Database (2-3 hours ACTIVE, 6-8 hours total processing)**
**CRITICAL: Ingest in batches to avoid triggering the OOM (Out of Memory) killer**
**Preparation:**
```bash
# Clone repos to local filesystem
mkdir -p /opt/firefrost/repos
cd /opt/firefrost/repos
git clone https://git.firefrostgaming.com/firefrost-gaming/firefrost-operations-manual.git
git clone https://git.firefrostgaming.com/firefrost-gaming/pokerole-project.git
git clone https://git.firefrostgaming.com/firefrost-gaming/brainstorming.git
```
**Workspace Structure in AnythingLLM:**
**Workspace 1: Firefrost Operations** (~500 files, growing)
- Full operations-manual repo
- Session transcripts
- Deployment guides
- Infrastructure docs
- Consultant profiles
**Workspace 2: Pokerole Project** (~200 files)
- Separate context for Claudius
- Holly collaboration docs
- Pokédex content
- Isolated from main operations (security)
**Workspace 3: Brainstorming** (~100 files)
- Sandbox sessions
- Gemini brainstorms
- Idea backlogs
- Among Us planning
**Workspace 4: Staff Wiki** (future, when deployed)
- Staff-facing documentation only
- Procedures, troubleshooting, policies
- No access to operations manual (private)
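The four workspaces above can also be pre-created from a script instead of the UI. This is a sketch under assumptions: AnythingLLM exposes a developer API, but the `/api/v1/workspace/new` route, payload shape, and bearer-token auth should be verified against your instance's API docs before use.

```bash
# DRY_RUN=1 (default) only prints what would be created.
create_workspace() {
  name="$1"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "would create workspace: $name"
  else
    # hypothetical endpoint -- confirm on your AnythingLLM instance
    curl -s -X POST -H "Authorization: Bearer $ANYTHINGLLM_API_KEY" \
      -H "Content-Type: application/json" \
      -d "{\"name\": \"$name\"}" \
      "$ANYTHINGLLM_URL/api/v1/workspace/new"
  fi
}
for ws in "Firefrost Operations" "Pokerole Project" "Brainstorming" "Staff Wiki"; do
  create_workspace "$ws"
done
```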
**Ingestion Process (BATCH 100 DOCS AT A TIME):**
1. **Start with Firefrost Operations workspace**
- Upload first 100 markdown files from operations-manual
- Wait for embedding completion (~30-45 min)
- Verify search works before continuing
2. **Continue in batches of 100**
- Monitor RAM usage (should stay under 150GB)
- If RAM spikes above 180GB, reduce batch size to 50
- Total ingestion: ~4-6 hours for 500 docs
3. **Repeat for other workspaces**
- Pokerole: 2-3 hours (smaller dataset)
- Brainstorming: 1-2 hours
- Total: 6-8 hours processing time
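The batching above can be scripted so no batch ever exceeds 100 files. A sketch under assumptions: the `/api/v1/document/upload` endpoint and bearer-token auth are taken from AnythingLLM's developer API and should be confirmed against your instance's docs, and the 30-minute sleep between batches is a placeholder for "wait for embedding completion".

```bash
# batch_upload <repo_dir> [batch_size] [dry_run]
# dry_run=1 (default) only prints what would be uploaded.
batch_upload() {
  repo_dir="$1"; batch_size="${2:-100}"; dry_run="${3:-1}"
  find "$repo_dir" -name '*.md' | sort > /tmp/doclist.txt
  rm -f /tmp/docbatch.*
  split -l "$batch_size" /tmp/doclist.txt /tmp/docbatch.
  n=0
  for batch in /tmp/docbatch.*; do
    [ -f "$batch" ] || continue
    n=$((n + 1))
    echo "batch $n: $(wc -l < "$batch") files"
    while IFS= read -r f; do
      if [ "$dry_run" = 1 ]; then
        echo "would upload: $f"
      else
        # hypothetical endpoint -- check AnythingLLM's developer API docs
        curl -s -H "Authorization: Bearer $ANYTHINGLLM_API_KEY" \
          -F "file=@$f" "$ANYTHINGLLM_URL/api/v1/document/upload" >/dev/null
      fi
    done < "$batch"
    # pause so embedding finishes before the next batch (tune to taste)
    [ "$dry_run" = 1 ] || sleep 1800
  done
  echo "total batches: $n"
}

# On TX1: batch_upload /opt/firefrost/repos/firefrost-operations-manual 100 0
```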
**Testing:**
- Test semantic search: "What is the Frostwall Protocol?"
- Test cross-document synthesis: "Summarize our infrastructure decisions"
- Test DERP functionality: "Reconstruct session from Feb 13"
- Verify workspace isolation (Pokerole can't see Operations)
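The search tests above can be run from the command line too. Hedged sketch: the `/api/v1/workspace/<slug>/chat` route and `"mode": "query"` payload are assumptions about AnythingLLM's developer API, and the workspace slug below is hypothetical -- verify both on your instance before relying on it.

```bash
# ask_workspace <workspace-slug> "<question>"
ask_workspace() {
  slug="$1"; question="$2"
  curl -s -X POST \
    -H "Authorization: Bearer $ANYTHINGLLM_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"message\": \"$question\", \"mode\": \"query\"}" \
    "$ANYTHINGLLM_URL/api/v1/workspace/$slug/chat"
}

# ask_workspace firefrost-operations "What is the Frostwall Protocol?"
```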
**Hardware Reality Check:**
- Your TX1: 251GB RAM, 809GB storage, 32 cores
- Requirements: 16-32GB RAM, 50GB storage, 4+ cores
- **TX1 is massive overkill (good thing) ✅**
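The reality check above can be confirmed on the host itself with standard tools (Linux-specific: `/proc/meminfo`, GNU `nproc`/`df`):

```bash
# Preflight: report RAM, cores, and free disk, and flag sub-minimum hosts.
mem_gb=$(awk '/MemTotal/ {printf "%d", $2/1024/1024}' /proc/meminfo 2>/dev/null || echo 0)
cores=$(nproc 2>/dev/null || echo 1)
disk_gb=$(df -BG --output=avail / 2>/dev/null | tail -1 | tr -dc '0-9')
echo "RAM: ${mem_gb}GB  cores: ${cores}  free disk on /: ${disk_gb}GB"
if [ "${mem_gb:-0}" -ge 16 ] && [ "$cores" -ge 4 ]; then
  echo "meets minimum spec"
else
  echo "WARNING: below 16GB RAM / 4 cores minimum"
fi
```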
---
**Phase 4: Staff AI Assistant (2-3 hours)**
**Deploy Open WebUI with staff wiki docs only**
- Much smaller dataset (~50-100 docs when staff wiki exists)
- Built-in Chroma vector DB sufficient (no need for external)
- Embedded chat widget OR dedicated portal
- Domain: staff-ai.firefrostgaming.com
**Configuration:**
1. Create "Staff Wiki" knowledge base in Open WebUI
2. Upload staff-facing docs only (operations manual stays private in AnythingLLM)
3. Configure access (staff accounts, not public)
4. Test 24/7 staff question answering:
- "How do I restart a game server?"
- "What's the whitelist process?"
- "Who do I contact for billing issues?"
5. Document usage in staff wiki
6. Train Meg on basic usage
**Benefits:**
- Reduces Michael/Meg interruptions (staff self-serve)
- 24/7 availability (AI doesn't sleep)
- Onboarding tool for future recruitment
- Consistent answers (no "telephone game")
---
**TX1 Resources After Full Deployment:**
**Before AI Stack:**
- Games: 20GB RAM, 24GB storage
**After AI Stack:**
- Games: 20GB RAM, 24GB storage
- AI Models: 100GB RAM, 95GB storage
- Vector DB + Embeddings: 20GB RAM, 60GB storage
- **Total: 140GB RAM / 251GB = 56% usage ✅**
- **Total: 179GB storage / 809GB = 22% usage ✅**
**Plenty of headroom for growth.**
---
**After Completion:**
- ✅ DERP backup operational (Claude.ai dies → self-hosted continues)
- ✅ Unlimited AI access ($0 vs Claude Pro $20/month = $240/year saved)
- ✅ Staff assistant reduces support burden
- ✅ Full repo context for decision-making (1,000+ docs semantic search)
- ✅ Workspace isolation (Operations separate from Pokerole separate from Staff)
- ✅ Proven to scale: 5,000+ document libraries in production
- ✅ Self-healing via Git: repos update → workspaces re-ingest automatically
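The Git-sync half of the self-healing loop can be a small cron job. Sketch only: this handles the `git pull`; whether re-ingestion then happens automatically depends on AnythingLLM's folder-watch/re-embed settings, which should be confirmed, and the cron path below is hypothetical.

```bash
# Pull every repo under a base directory; schedule e.g. via cron:
#   0 3 * * *  /usr/local/bin/firefrost-repo-sync.sh   (hypothetical path)
sync_repos() {
  base="${1:-/opt/firefrost/repos}"
  for repo in "$base"/*/; do
    [ -d "$repo/.git" ] || continue
    echo "syncing $repo"
    git -C "$repo" pull --ff-only || echo "pull failed for $repo"
  done
}

# sync_repos /opt/firefrost/repos
```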
---