Ollama 0.20.5 (updated from 0.16.2, fixed Docker networking)
Model: gemma4:26b-a4b-it-q8_0 (28GB, q8_0 quantization)
Speed: 14.4 tokens/sec on CPU-only
RAM: 93GB/251GB used, 157GB available for game servers
Remaining: Connect to Dify as model provider (web UI step)
Chronicler #78 | firefrost-operations-manual
Task #96: Gemma 4 Deployment Log
Date: April 11, 2026
Chronicler: #78
Status: Model deployed, Dify connection pending
Deployment Steps Completed
1. Ollama Update
- Before: Docker container, Ollama 0.16.2
- Problem: Container had broken bridge networking (no internet access)
- Fix: Recreated container with --network host
- After: Ollama 0.20.5 with full network access
2. Model Pull
- Tag: gemma4:26b-a4b-it-q8_0 (NOT 26b-a4b-q8_0 as in Gemini consultation)
- Size: 28GB
- Download speed: ~250 MB/s
- Total download time: ~3 minutes
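As a rough sanity check on the pull figures, the helper below estimates pure transfer time from size and sustained speed. It is a sketch only: `estimate_download_seconds` is a hypothetical name, and it assumes decimal units (1 GB = 1000 MB).

```shell
# Estimate transfer time in whole seconds.
# Args: size in GB (decimal, 1 GB = 1000 MB), sustained speed in MB/s.
estimate_download_seconds() {
    awk -v size_gb="$1" -v speed_mbs="$2" \
        'BEGIN { printf "%d\n", (size_gb * 1000) / speed_mbs }'
}

# Example: estimate_download_seconds 28 250
```

For 28GB at ~250 MB/s this gives about two minutes of raw transfer. The observed ~3 minutes likely includes time spent below peak speed or on post-download steps, though the log does not break that down.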
3. Inference Test
- First query response: "I am a large language model, trained by Google."
- Speed: 14.4 tokens/sec
- Total time: 13.1s for 175 eval tokens
- Load time: the first run took ~30 seconds to load the model into RAM
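The speed figure can be reproduced from the timing fields Ollama returns with a response (`eval_count` in tokens, `eval_duration` in nanoseconds). The sketch below assumes the 14.4 tokens/sec figure is that eval rate, which excludes prompt processing and model load. The function name and the sample duration are illustrative: ~12.15 s is back-calculated from 175 tokens at 14.4 tokens/sec, not taken from the log.

```shell
# Compute generation speed from Ollama-style timing fields.
# Args: eval_count (tokens), eval_duration (nanoseconds).
tokens_per_sec() {
    awk -v n="$1" -v ns="$2" \
        'BEGIN { printf "%.1f\n", n / (ns / 1000000000) }'
}

# Example with a back-calculated duration (hypothetical value):
#   tokens_per_sec 175 12150000000
```

Under that assumption, the 13.1s total is slightly longer than the generation time alone because it also covers prompt evaluation.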
4. RAM Impact
| Metric | Before | After | Available |
|---|---|---|---|
| Total RAM | 251GB | 251GB | — |
| Used | 65GB | 93GB | — |
| Available | 186GB | 157GB | 157GB |
Verdict: 28GB used by model, 157GB still available. Game servers unaffected.
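To re-check the available figure later, a minimal sketch that reads the kernel's MemAvailable estimate, assuming a Linux host. The function name is hypothetical. Note that "available" is the kernel's own estimate, so it need not equal total minus used exactly.

```shell
# Print MemAvailable in whole GiB from /proc/meminfo-formatted input on stdin.
# /proc/meminfo reports values in kB (1 GiB = 1048576 kB).
mem_available_gib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1048576 }'
}

# On the host:
#   mem_available_gib < /proc/meminfo
```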
5. Player Impact Check
- Queried all 20 Minecraft servers via MC ping protocol
- 0 players online at time of deployment (9:45 AM Saturday)
- No game server performance impact expected even under load
Docker Container Configuration
    docker run -d \
        --name ollama \
        --network host \
        -v /usr/share/ollama/.ollama:/root/.ollama \
        --restart unless-stopped \
        ollama/ollama
Key change from original: --network host instead of default bridge networking. Bridge mode had broken DNS/routing in the container.
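One way to confirm the recreated container is serving the expected version is to query Ollama's `/api/version` endpoint from the host. This is a sketch, assuming the default port 11434 and that `curl` is installed. Both function names are hypothetical.

```shell
# Extract the version string from an /api/version JSON reply on stdin.
parse_ollama_version() {
    sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Query the Ollama API; the base URL defaults to the local host port.
ollama_version() {
    curl -fsS "${1:-http://localhost:11434}/api/version" | parse_ollama_version
}

# On the host:
#   ollama_version
```

With host networking the API is reachable directly on the host's port, so no `-p` port mapping is needed in the `docker run` command.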
Remaining Steps
- Connect to Dify (requires web UI: Michael)
  - codex.firefrostgaming.com → Settings → Model Providers → Ollama
  - Model: gemma4:26b-a4b-it-q8_0
  - Base URL: http://host.docker.internal:11434 or http://172.17.0.1:11434
  - Context Length: 65536
- Test RAG queries against operations manual
- Benchmark quality: compare against previous Dify responses
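Which Base URL works depends on how the Dify containers are networked, so it may help to probe both candidates before entering one in the web UI. A minimal sketch, assuming `curl` is available wherever it runs (ideally inside a Dify container, since reachability from the host differs). The function name is hypothetical; `/api/tags` is Ollama's model-listing endpoint.

```shell
# Print the first candidate base URL whose /api/tags endpoint answers.
# Returns non-zero if none of the candidates respond.
first_reachable_base_url() {
    for url in "$@"; do
        if curl -fsS --max-time 5 -o /dev/null "$url/api/tags"; then
            printf '%s\n' "$url"
            return 0
        fi
    done
    return 1
}

# Example:
#   first_reachable_base_url \
#       http://host.docker.internal:11434 \
#       http://172.17.0.1:11434
```

On Linux, host.docker.internal resolves inside a container only if it was started with a host mapping such as `--add-host=host.docker.internal:host-gateway`. 172.17.0.1 is the usual gateway address of Docker's default bridge, which may differ if Dify runs on a custom network.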
Errata
Gemini consultation typo: The consultation at docs/consultations/gemini-gemma4-selfhosting-2026-04-06.md references gemma4:26b-a4b-q8_0. The correct Ollama tag is gemma4:26b-a4b-it-q8_0, where the "it" segment marks the instruction-tuned variant.
Fire + Frost + Foundation = Where Love Builds Legacy 💙🔥❄️