Task #96: Gemma 4 Deployment Log

Date: April 11, 2026
Chronicler: #78
Status: Model deployed, Dify connection pending


Deployment Steps Completed

1. Ollama Update

  • Before: Docker container, Ollama 0.16.2
  • Problem: Container had broken bridge networking (no internet access)
  • Fix: Recreated container with --network host
  • After: Ollama 0.20.5 with full network access
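The fix amounts to destroying and recreating the container; a sketch of the sequence (the volume path matches the Docker configuration later in this log, and the `ollama/ollama` image tag is left unpinned as in the original command):

```shell
# Stop and remove the old bridge-networked container. Models and config
# live in the host volume, so they survive the recreation.
docker stop ollama
docker rm ollama

# Recreate with host networking so the container shares the host's
# DNS and routing instead of the broken bridge network
docker run -d \
  --name ollama \
  --network host \
  -v /usr/share/ollama/.ollama:/root/.ollama \
  --restart unless-stopped \
  ollama/ollama
```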

2. Model Pull

  • Tag: gemma4:26b-a4b-it-q8_0 (not 26b-a4b-q8_0, as the Gemini consultation mistakenly gives; see Errata)
  • Size: 28GB
  • Download speed: ~250 MB/s
  • Total download time: ~3 minutes
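The pull is a single command; verifying the exact tag afterwards guards against the typo called out in the Errata:

```shell
# Pull the instruction-tuned q8_0 build (note the `-it` in the tag)
ollama pull gemma4:26b-a4b-it-q8_0

# Confirm the tag and on-disk size landed as expected
ollama list | grep gemma4
```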

3. Inference Test

  • First query response: "I am a large language model, trained by Google."
  • Speed: 14.4 tokens/sec
  • Total time: 13.1s for 175 eval tokens
  • Loading time: First run had ~30 second model load (loading into RAM)
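The tokens/sec figure follows from the stats Ollama reports: the `/api/generate` response includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds), and the rate is their ratio. A sketch of the arithmetic; `eval_duration_ns` below is an assumed value consistent with the logged 14.4 tokens/sec (the 13.1s total time additionally includes prompt evaluation):

```shell
# tokens/sec = eval_count / (eval_duration / 1e9)
# eval_duration_ns is an assumed sample value, not taken from the real response
eval_count=175
eval_duration_ns=12150000000
rate=$(awk -v c="$eval_count" -v d="$eval_duration_ns" \
  'BEGIN { printf "%.1f", c / (d / 1e9) }')
echo "${rate} tokens/sec"
```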

4. RAM Impact

| Metric    | Before | After |
|-----------|--------|-------|
| Total RAM | 251GB  | 251GB |
| Used      | 65GB   | 93GB  |
| Available | 186GB  | 157GB |

Verdict: 28GB used by model, 157GB still available. Game servers unaffected.
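These figures come from the `Mem:` row of `free -g`. A sketch of parsing it; the sample line is a stand-in mirroring the post-deployment numbers, and on the live host you would read the real `free -g` output instead:

```shell
# Columns of `free -g` row 2: Mem: total used free shared buff/cache available
# Sample stand-in line (assumed values for free/shared/buff-cache)
sample_mem_row="Mem:           251          93           1           0         156         157"
read -r _ total used _ _ _ avail <<EOF
$sample_mem_row
EOF
echo "total=${total}G used=${used}G available=${avail}G"
```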

5. Player Impact Check

  • Queried all 20 Minecraft servers via MC ping protocol
  • 0 players online at time of deployment (9:45 AM Saturday)
  • No game server performance impact expected even under load
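The per-server check can be scripted with the `mcstatus` CLI (from the Python `mcstatus` package, which speaks the MC ping protocol). The host and port range below are placeholder assumptions, not the actual server list:

```shell
# Assumption: the 20 servers listen on consecutive ports 25565-25584
# on localhost; substitute the real host/port list.
for port in $(seq 25565 25584); do
  mcstatus "localhost:${port}" status | grep -i players
done
```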

Docker Container Configuration

docker run -d \
  --name ollama \
  --network host \
  -v /usr/share/ollama/.ollama:/root/.ollama \
  --restart unless-stopped \
  ollama/ollama

Key change from original: --network host instead of default bridge networking. Bridge mode had broken DNS/routing in the container.
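A quick way to confirm the host-networked container is healthy, since with `--network host` Ollama binds directly to the host's port 11434:

```shell
# /api/version is part of Ollama's HTTP API; a response here means
# the service is up and reachable on the host network
curl -s http://localhost:11434/api/version
```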


Remaining Steps

  1. Connect to Dify (requires web UI — Michael)

    • codex.firefrostgaming.com → Settings → Model Providers → Ollama
    • Model: gemma4:26b-a4b-it-q8_0
    • Base URL: http://host.docker.internal:11434 or http://172.17.0.1:11434
    • Context Length: 65536
  2. Test RAG queries against operations manual

  3. Benchmark quality — compare against previous Dify responses
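Before saving the provider settings, it may be worth checking which of the two candidate base URLs is reachable from inside Dify's own containers. The container name `dify-api` is an assumption (check `docker ps`), and this presumes `curl` exists in that image:

```shell
# A non-empty model list from /api/tags means that base URL will work
# when entered into Dify's Ollama provider settings
docker exec dify-api curl -s http://172.17.0.1:11434/api/tags
```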


Errata

Gemini consultation typo: The consultation at docs/consultations/gemini-gemma4-selfhosting-2026-04-06.md references gemma4:26b-a4b-q8_0. The correct Ollama tag is gemma4:26b-a4b-it-q8_0 (the it suffix denotes the instruction-tuned variant).


Fire + Frost + Foundation = Where Love Builds Legacy 💙🔥❄️