Gemini consultation Round 2 response: embedding model, hybrid search, CPU pinning
Key findings from Round 2:

- snowflake-arctic-embed-m (1.5GB) wins over bge-m3
- Hybrid search works OOTB with Dify 1.12.0 + Qdrant
- Gitea plugin: strip OAuth, pin SHA, batch 10/1.5s
- CRITICAL: CPU pinning needed for Ollama vs game servers
- Awakened Concierge is Priority 1 for subscriber growth
- State of the Realm weekly report is feasible
- Keep current proxy architecture (don't add Workers)

Claude (Chronicler #82)
# Gemini Response: The Forge Ecosystem — Round 2

**Date:** April 12, 2026

**Summary:** snowflake-arctic-embed-m wins (1.5GB, lighter on CPU). Fresh KB required for model swap. Hybrid search works out of the box with Qdrant. Strip OAuth from Gitea plugin, pin to SHA for tree walks, batch 10 files at 1.5s delay. Awakened Concierge is Priority 1. CPU pinning is critical for TX1 when players are online. State of the Realm report is feasible via n8n→Dify→R2→Discord.

---
## Key Decisions Made
### Embedding Model: snowflake-arctic-embed-m
- 1.5GB RAM footprint (vs 2.5-3GB for bge-m3)
- Lighter on CPU inference — critical for TX1 thread economy
- 768-dimensional vectors (same dimensionality as nomic-embed-text, but the embedding spaces are incompatible)
- **Must rebuild knowledge base from scratch** — cannot migrate incrementally
- Zero-downtime swap: build new KB, test, swap dataset in app config
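Before kicking off the fresh KB build, it's worth confirming that the served model really emits 768-dim vectors. A minimal probe against Ollama's `/api/embed` endpoint (the host/port and the model tag are assumptions here; confirm the exact tag with `ollama list`):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embed"  # default Ollama port (assumption)
MODEL = "snowflake-arctic-embed-m"               # illustrative tag; confirm with `ollama list`

def embed_payload(model: str, texts: list[str]) -> bytes:
    """Build the JSON body for Ollama's /api/embed endpoint."""
    return json.dumps({"model": model, "input": texts}).encode()

def check_dims(expected: int = 768) -> int:
    """Embed a probe string and verify the vector dimensionality."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=embed_payload(MODEL, ["dimension probe"]),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        dims = len(json.load(resp)["embeddings"][0])
    assert dims == expected, f"expected {expected}-dim vectors, got {dims}"
    return dims

if __name__ == "__main__":
    print(check_dims())
```

Run this before the rebuild; the swap itself stays zero-downtime because the new dataset is built and tested before the app config is pointed at it.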
### Hybrid Search: Works Out of the Box
- Dify 1.12.0 auto-configures Qdrant BM25 sparse indices
- No manual Qdrant configuration needed
- Uses more RAM/disk in Qdrant — not a concern with 251GB pool
- Monitor Qdrant container memory as a precaution
### Gitea Plugin Architecture
- **Strip OAuth entirely** — use simple PAT (Personal Access Token)
- Auth header: `Authorization: token {api_token}` (not Bearer)
- **Always resolve branch to SHA first** — prevents mid-sync drift
  - Endpoint: `GET /api/v1/repos/:owner/:repo/branches/:branch` → `commit.id`
  - Then: `GET /api/v1/repos/:owner/:repo/git/trees/:sha?recursive=true`
- **Batch size: 10 files, 1.5s delay** — 114 docs in under a minute
- Filter to `.md` files only, optionally restrict to `docs/` directory
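The endpoints above wire into a small sync loop. A sketch, assuming a PAT in `GITEA_TOKEN` and the instance URL in `GITEA_URL` (both placeholders), with the actual KB upsert elided:

```python
import json
import os
import time
import urllib.request

GITEA = os.environ.get("GITEA_URL", "https://git.example.com")  # placeholder instance URL
TOKEN = os.environ.get("GITEA_TOKEN", "")                       # plain PAT, no OAuth

def api_get(path: str) -> dict:
    """GET a Gitea API path; note the `token` auth scheme, not `Bearer`."""
    req = urllib.request.Request(
        f"{GITEA}/api/v1{path}",
        headers={"Authorization": f"token {TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def resolve_sha(owner: str, repo: str, branch: str) -> str:
    """Pin the branch to a commit SHA so the tree walk can't drift mid-sync."""
    return api_get(f"/repos/{owner}/{repo}/branches/{branch}")["commit"]["id"]

def markdown_paths(owner: str, repo: str, sha: str) -> list[str]:
    """Walk the pinned tree and keep only .md blobs."""
    tree = api_get(f"/repos/{owner}/{repo}/git/trees/{sha}?recursive=true")
    return [e["path"] for e in tree["tree"]
            if e["type"] == "blob" and e["path"].endswith(".md")]

def batches(items: list, size: int = 10) -> list[list]:
    """Fixed-size batches: 114 docs -> 12 batches of <= 10 files each."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def sync(owner: str, repo: str, branch: str = "main") -> None:
    sha = resolve_sha(owner, repo, branch)
    for batch in batches(markdown_paths(owner, repo, sha)):
        for path in batch:
            ...  # fetch raw content at the pinned SHA, push to the Dify KB
        time.sleep(1.5)  # rate-limit courtesy between batches
```

Twelve batches with 1.5s pauses is about 18 seconds of sleep, which is where the "under a minute" figure comes from once fetch time is added.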
### Wild Ideas Priority Order
1. **Awakened Concierge** (FIRST — community growth funds RV dream)
   - Route through Dify API (not direct Gemma 4) for persona grounding
   - 60-word welcome ≈ 4-5 seconds at 14.4 t/s — feels like natural typing
   - Use Discord "bot is typing..." status during generation
2. **Jack Alert Override** (Quick win — simple n8n webhook + phone shortcut)
3. **Pterodactyl Auto-Janitor** (After launch — needs real crash data)
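The 4-5 second figure for the Concierge welcome checks out arithmetically (the tokens-per-word ratios below are assumptions; English prose typically runs around 1.0-1.3 tokens per word):

```python
def gen_seconds(words: int, tokens_per_word: float, tokens_per_sec: float) -> float:
    """Estimated generation time for a reply of the given length."""
    return words * tokens_per_word / tokens_per_sec

# A 60-word welcome at the measured 14.4 t/s:
fast = gen_seconds(60, 1.0, 14.4)  # ≈ 4.2 s
slow = gen_seconds(60, 1.2, 14.4)  # ≈ 5.0 s
```

In discord.py, that wait hides neatly behind `async with channel.typing():` wrapped around the Dify call, which keeps the typing indicator live until the reply sends.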
### Architecture Decisions
- **Keep current proxy route** (Trinity Console → nginx → Dify) — don't add Cloudflare Worker
- **CRITICAL: CPU pinning required** — Docker `cpuset-cpus` for Ollama/Dify
  - Ollama uses 100% of CPU threads it touches
  - Minecraft servers are single-thread dependent
  - If Gemma 4 generates during player activity → tick lag
- Reserve dedicated core block for AI, let Pterodactyl manage the rest
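A sketch of the pinning in Compose terms (the core numbers are placeholders; the right split depends on how many cores TX1 has and which ones Pterodactyl's game containers are allowed to schedule on):

```yaml
services:
  ollama:
    cpuset: "0-3"   # dedicated AI core block, never shared with game servers
  dify-api:
    cpuset: "0-3"   # keep Dify on the same block
```

For containers that are already running, `docker update --cpuset-cpus="0-3" <container>` applies the same constraint without a restart; game servers keep the remaining cores via Pterodactyl's per-server CPU pinning setting.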
### State of the Realm Report — Architecture
1. n8n cron trigger (weekly, e.g., Sunday 6 PM)
2. HTTP nodes gather: Pterodactyl API (uptime/CPU), Stripe API (subs), Discord API (members)
3. n8n bundles raw JSON into single text block
4. Sends to Dify with Chronicler persona prompt
5. Dify returns formatted Markdown
6. n8n pushes to Cloudflare R2 bucket
7. n8n fires Discord webhook with R2 link
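In n8n the steps above are a cron trigger plus HTTP Request nodes, but steps 3-5 are easy to sanity-check in isolation. A sketch (the `/v1/chat-messages` path and `blocking` response mode follow Dify's service API; the base URL and app key are read from placeholder environment variables):

```python
import json
import os
import urllib.request

def bundle_metrics(pterodactyl: dict, stripe: dict, discord: dict) -> str:
    """Step 3: flatten the raw API responses into one text block for the LLM."""
    sections = [
        ("Pterodactyl (uptime/CPU)", pterodactyl),
        ("Stripe (subscriptions)", stripe),
        ("Discord (members)", discord),
    ]
    return "\n\n".join(f"## {name}\n{json.dumps(data, indent=2)}"
                       for name, data in sections)

def chronicle(raw_block: str) -> str:
    """Steps 4-5: send the bundle to the Dify app, get formatted Markdown back."""
    req = urllib.request.Request(
        f"{os.environ['DIFY_URL']}/v1/chat-messages",
        data=json.dumps({
            "inputs": {},
            "query": raw_block,
            "response_mode": "blocking",
            "user": "n8n-weekly-report",
        }).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['DIFY_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["answer"]
```

The returned Markdown is what n8n then pushes to R2 (step 6) before firing the Discord webhook (step 7).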
---
## Conclusion
Gemini validated our approach across the board and added the critical CPU pinning insight that could have bitten us hard at launch. The roadmap is clear.