Claude (Chronicler #63) 0a9d7aab60 docs(consultations): Gemma 4 self-hosting for Trinity Codex
Gemini consultation on deploying Gemma 4 26B A4B (MoE) on TX1 Dallas:
- CPU-only inference with 251GB RAM = a good fit for the MoE architecture
- Only ~4B parameters active per token = fast inference
- Full 26B-parameter capacity retained for RAG accuracy
- Zero API costs, data never leaves infrastructure
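The sizing claims above can be sanity-checked with back-of-envelope arithmetic. A minimal sketch, assuming q8_0 stores roughly 1 byte per parameter (runtime buffers and KV cache add some overhead on top):

```python
# Rough sizing for the self-hosted MoE deployment described above.
# Assumption (not from the commit): ~1 byte per parameter at 8-bit quant.
TOTAL_PARAMS_B = 26   # total parameters, billions
ACTIVE_PARAMS_B = 4   # parameters active per token, billions
RAM_GB = 251          # available RAM on the host

weights_gb = TOTAL_PARAMS_B * 1.0   # ~1 byte/param -> ~26 GB, matching the commit
headroom_gb = RAM_GB - weights_gb   # room left for KV cache and OS

print(f"q8_0 weights: ~{weights_gb:.0f} GB")
print(f"RAM headroom: ~{headroom_gb:.0f} GB")
print(f"active fraction per token: {ACTIVE_PARAMS_B / TOTAL_PARAMS_B:.1%}")
```

The ~15% active fraction is why CPU-only throughput stays usable: per-token compute scales with the 4B active parameters, while the 251GB of RAM comfortably holds all 26B quantized weights.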

Deployment steps:
1. Update Ollama
2. Pull gemma4:26b-a4b-q8_0 (8-bit quant, ~26GB)
3. Benchmark tokens/second (t/s) throughput
4. Connect to Dify as model provider
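The four steps above can be sketched as shell commands. The model tag and host are from the commit; the install script URL, `--verbose` flag, and default Ollama port are standard Ollama/Dify usage, not taken from this document:

```shell
# 1. Update Ollama in place (official install script also upgrades)
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull the 8-bit quant (~26GB download)
ollama pull gemma4:26b-a4b-q8_0

# 3. --verbose prints timing stats, including eval rate in tokens/s
ollama run gemma4:26b-a4b-q8_0 --verbose "Summarize the Trinity Codex in one sentence."

# 4. In Dify: Settings -> Model Provider -> Ollama,
#    base URL http://localhost:11434, model name gemma4:26b-a4b-q8_0
```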

Updates Task #93 architecture from external API to self-hosted.

Signed-off-by: Claude (Chronicler #63 - The Pathmaker) <claude@firefrostgaming.com>
2026-04-06 14:19:11 +00:00