Gemini consultation on deploying Gemma 4 26B A4B (MoE) on TX1 Dallas:
- CPU-only with 251GB RAM = perfect for MoE architecture
- Only 4B parameters active per token = fast inference
- Full 26B reasoning capability for RAG accuracy
- Zero API costs, data never leaves infrastructure
Deployment steps:
1. Update Ollama
2. Pull gemma4:26b-a4b-q8_0 (8-bit quant, ~26GB)
3. Test t/s speed
4. Connect to Dify as model provider
Updates Task #93 architecture from external API to self-hosted.
Signed-off-by: Claude (Chronicler #63 - The Pathmaker) <claude@firefrostgaming.com>