docs: Gemini consultation — Trinity Core MCP session persistence (launch eve)
# Gemini Consultation: Trinity Core MCP Session Persistence

**Date:** April 14, 2026, ~7:30 PM CDT
**From:** Michael (The Wizard) + Claude (Chronicler #90)
**To:** Gemini (Architectural Partner)
**Re:** Claude.ai MCP connector gets stuck on stale Trinity Core sessions; need a resilient fix before launch day

---
## Hey Gemini! 👋

It's the night before launch (April 15, 7 AM CDT) and Trinity Core's MCP transport is giving us trouble. The *service itself* is healthy: the REST API works, SSH execution works, and both MCP transports initialize correctly when tested directly via curl. But Claude.ai's built-in MCP connector keeps returning "Session terminated" and won't re-initialize.

We need your help designing a resilient solution so this doesn't keep biting us.

---
## The Situation

**Trinity Core v2.4.0** runs on a Raspberry Pi behind a Cloudflare Tunnel (`mcp.firefrostgaming.com` → `localhost:3000`). It supports two MCP transports:

1. **Streamable HTTP** (protocol 2025-11-25) at `POST /mcp`: creates a session with a UUID and tracks it in an in-memory `activeSessions` Map
2. **Legacy SSE** (protocol 2024-11-05) at `GET /mcp` + `POST /mcp/messages?sessionId=...`

When Trinity Core restarts (or the tunnel flaps), all in-memory sessions are lost. The code correctly returns 404 for stale session IDs:
```javascript
// Excerpt from the /mcp handler: a session ID was sent but is not in the Map
} else if (sessionId && !activeSessions.has(sessionId)) {
  log(`StreamableHTTP stale session ${sessionId} — returning 404`);
  return res.status(404).json({
    jsonrpc: '2.0',
    error: { code: -32001, message: 'Session not found. Please re-initialize.' },
    id: null
  });
}
```
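
For comparison, the Streamable HTTP transport spec expects a client that receives a 404 for a request carrying `Mcp-Session-Id` to start over with a new `initialize` request that has no session ID. The sketch below shows that expected recovery path in a few lines. It is only an illustration: `sendWithReinit`, the `state` object, and the injected `fetchFn` are hypothetical names, and this is not Claude.ai's connector code.

```javascript
// Hypothetical client-side helper (NOT Claude.ai's connector code).
// fetchFn is injected so the sketch can run against a stub or global fetch.
// state = { sessionId, initializeRequest } is an assumed shape.
async function sendWithReinit(fetchFn, url, state, message) {
  const post = (body, sessionId) =>
    fetchFn(url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Accept: 'application/json, text/event-stream',
        // Only attach the header when we actually hold a session ID.
        ...(sessionId ? { 'Mcp-Session-Id': sessionId } : {}),
      },
      body: JSON.stringify(body),
    });

  const res = await post(message, state.sessionId);
  if (res.status !== 404 || !state.sessionId) return res;

  // Stale session: forget the old ID and initialize again without one.
  state.sessionId = null;
  const init = await post(state.initializeRequest, null);
  state.sessionId = init.headers.get('mcp-session-id');
  // A real client would also send notifications/initialized at this point.

  // Retry the original message on the fresh session.
  return post(message, state.sessionId);
}
```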
**The problem:** Claude.ai's MCP connector receives the 404 but does NOT re-initialize. It keeps retrying the stale session ID and reports "Session terminated" to the conversation. Starting a new Claude.ai conversation sometimes helps (fresh connector = fresh session), but not always.

**What the logs show:**
```
[2026-04-15T00:15:28.905Z] AUTH FAILED from ::1
[2026-04-15T00:16:28.972Z] AUTH FAILED from ::1
... (every 60 seconds)
[2026-04-15T00:20:54.430Z] StreamableHTTP GET from ::1 session=703f0ece-8187-43fc-b3cb-6f714ae99c0c
[2026-04-15T00:20:54.431Z] StreamableHTTP stale session 703f0ece-... — returning 404
[2026-04-15T00:20:54.787Z] StreamableHTTP GET from ::1 session=703f0ece-...
[2026-04-15T00:20:54.787Z] StreamableHTTP stale session 703f0ece-... — returning 404
```
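
The `AUTH FAILED` lines above record only a source address, and `::1` is also what the Claude.ai requests show, so the address alone cannot identify the caller. One way to narrow it down might be to log the method, path, and a few headers on each failure. This is a minimal sketch: `describeAuthFailure` is a hypothetical helper, the field choices are assumptions, and `cf-connecting-ip` is the header Cloudflare normally adds to proxied requests (its absence would hint at a purely local caller).

```javascript
// Hypothetical helper: builds a richer AUTH FAILED log line from an
// Express-style request object. Field names and order are assumptions.
function describeAuthFailure(req) {
  const h = req.headers || {};
  return [
    'AUTH FAILED',
    `${req.method} ${req.originalUrl || req.url}`,
    `from=${req.ip || 'unknown'}`,
    // Present when the request came through Cloudflare; '-' otherwise.
    `cf-ip=${h['cf-connecting-ip'] || '-'}`,
    `ua=${h['user-agent'] || '-'}`,
    // Never log the token itself, only whether a header was sent.
    `auth-header=${h.authorization ? 'present' : 'missing'}`,
  ].join(' ');
}

// Possible use inside the existing auth middleware (its body is not shown
// in this document):
//   log(describeAuthFailure(req));
```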
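Two issues are visible in the log:

- **AUTH FAILED every 60 seconds from ::1**: something is hitting Trinity Core without a Bearer token (health check? keep-alive?). The Claude.ai requests in the same log also appear as `::1`, presumably because they arrive through the local `cloudflared` process, so the address alone doesn't prove the caller is a process on the Pi
- **Stale session `703f0ece-...`** keeps getting retried with `GET` requests and never re-initializes

---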
## Michael's Questions (Important Context)

Michael raised three possibilities we want your input on:

1. **He left Chronicler #87's session AND a Claude Code session open on his Nitro laptop at home.** Could those old browser tabs be holding the stale MCP session and preventing the new conversation from getting a clean connection? Claude.ai may share MCP connector state across tabs/sessions.

2. **Chronicler #87 was working on server health monitoring.** Could #87 have set up some polling mechanism that's now the source of the AUTH FAILED every 60 seconds?

3. **Chronicler #87 also made SSH key changes.** Could key modifications on the Pi or target servers affect MCP transport negotiation? (We think probably not, since our REST API SSH test to command-center succeeded, but we want your take.)

---
## What We're Trying to Do

Design Trinity Core to be resilient against stale sessions regardless of what Claude.ai's connector does. We can't control Claude.ai's reconnection behavior, so we need to handle it server-side.

**Our current workaround:** REST API wrapper via curl that bypasses MCP entirely. Works fine but loses the native MCP tool integration.

---
## Specific Questions

1. **Should we make Trinity Core transparently re-initialize when it receives a request with a stale session ID?** Instead of returning 404, detect the stale session + initialize request combo and create a new session on the fly. What are the risks?

2. **Should we add a `/health` endpoint that doesn't require auth?** Cloudflare Tunnel or some other process might be doing health checks that hit the auth middleware. Would an unauthenticated health route fix the AUTH FAILED spam?

3. **Is there a better session persistence strategy for a Pi?** We're using an in-memory Map. Should we persist sessions to a file or SQLite so they survive restarts? Or is that overengineering for an MCP server?

4. **Could Claude.ai's MCP connector share session state across browser tabs?** If Michael has session #87 open in one tab and session #90 in another, could the connector from #87's tab be monopolizing the MCP connection and blocking #90 from initializing fresh?

5. **What's the best practice for the AUTH FAILED spam?** Is it harmful, or just noisy? Should we add a health endpoint, or suppress the log for specific paths?

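
To make Question 1 concrete, the routing decision could be sketched as a small pure function. `classifyMcpRequest` and its return labels are hypothetical, and the `initialize` check is a simplified stand-in for the SDK's own check. In this sketch only an `initialize` request gets a new session when the ID is stale; any other message on a dead session still gets the 404, because silently creating a session for it would skip the initialize handshake. Note that the retries in the log are `GET` requests, which cannot carry an `initialize` message, so this change alone would still answer them with 404.

```javascript
// Hypothetical decision helper for the /mcp handler.
// method: HTTP method; sessionId: value of the mcp-session-id header;
// body: parsed JSON-RPC body; activeSessions: the in-memory Map.
function classifyMcpRequest(method, sessionId, body, activeSessions) {
  // Simplified stand-in for the SDK's initialize-request check.
  const isInit =
    method === 'POST' &&
    !!body &&
    !Array.isArray(body) &&
    body.method === 'initialize';

  // Known session: hand off to its existing transport.
  if (sessionId && activeSessions.has(sessionId)) return 'use-existing';
  // Initialize, with or without a stale ID attached: start a new session.
  if (isInit) return 'create-session';
  // Any other request on an unknown session keeps the current 404.
  if (sessionId) return 'stale-404';
  // GET without a session ID falls through to legacy SSE, as today.
  if (method === 'GET') return 'legacy-sse';
  return 'bad-request';
}
```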
---
## Context That Might Help

- **Pi hardware:** Raspberry Pi (ARM), runs Node.js, ~91MB RSS for the Trinity Core process
- **Tunnel:** Cloudflare Tunnel (`cloudflared`), has had "datagram manager failure" warnings in logs but reconnects
- **MCP SDK:** `@modelcontextprotocol/sdk` (the version used in the v2.4.0 build)
- **Claude.ai connector URL:** `https://mcp.firefrostgaming.com/mcp` (configured in Claude.ai settings)
- **OAuth flow:** Simplified. `/authorize` auto-redirects with code, `/token` returns the static API token. This was designed for Claude.ai's OAuth requirement.
- **The service is NOT managed by systemd.** It was started manually (`node index.js`) and is running as PID 7953 under `claude_executor`

---
## Current Trinity Core Source

Here's the routing skeleton for reference (handler bodies elided):
```javascript
// Streamable HTTP
app.all('/mcp', auth, async (req, res) => {
  const sessionId = req.headers['mcp-session-id'];
  // ... creates new session on POST without sessionId if isInitializeRequest
  // ... returns 404 for stale sessions
  // ... falls through to legacySSE for GET without sessionId
});

// Legacy SSE
async function legacySSE(req, res) {
  // GET /mcp with no session → opens SSE stream
  // Returns endpoint: /mcp/messages?sessionId=<uuid>
}

app.post('/mcp/messages', auth, async (req, res) => {
  // Legacy SSE message posting
});
```
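
Because the routing above attaches `auth` per route, an unauthenticated health route (Question 2) could be registered simply by leaving `auth` out. The sketch below assumes an Express-style `app`. `mountHealth` and the response shape are hypothetical, and nothing here confirms that the 60-second caller is actually a health check; the route would only help if that caller can be pointed at `/health`.

```javascript
// Hypothetical: register a health route with no auth middleware.
// The response deliberately exposes nothing sensitive, only liveness
// and uptime. startedAt defaults to the time of registration.
function mountHealth(app, startedAt = Date.now()) {
  app.get('/health', (req, res) => {
    res.status(200).json({
      status: 'ok',
      uptimeSeconds: Math.round((Date.now() - startedAt) / 1000),
    });
  });
}

// Possible wiring next to the existing routes:
//   mountHealth(app);
```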
---

Thanks Gemini! This is launch eve and Trinity Core is our lifeline for server management. Any architectural guidance on making this bulletproof would be huge. 🔥❄️

— Michael + Claude (Chronicler #90)