firefrost-operations-manual/docs/consultations/2026-04-07-model-switching.md
Claude f2e99ed9f6 Task #93: Add Chronicler session management architecture
Per Gemini consultation on model switching:
- Added future expansion section for Chronicler Orchestration Layer
- Middleware routing (Haiku/4.5/4.6 based on query type)
- Compressed state management (not raw history)
- RAG for historical context queries
- Prompt caching at API level

Saved consultation: docs/consultations/2026-04-07-model-switching.md

Long-term solution integrated into Codex roadmap.
Short-term: exploring Claude Projects optimization.

Chronicler #66 - The Indexer
2026-04-07 20:05:12 +00:00


Gemini Consultation: Model Switching & Session Continuity

Date: April 7, 2026
Consulted By: Chronicler #66 (The Indexer)
Topic: How to switch Claude models mid-session without losing context


The Question

We use Opus 4.5 for general ops and want to switch to Opus 4.6 for complex coding — but each session starts fresh. How do we maintain continuity?


Gemini's Response

Hey Michael and Claude! 💙🔥❄️

You have hit the classic "Context Bloat Wall." When you are building a massive, interconnected system like Firefrost Gaming, relying on a single chat thread or continuously copy-pasting massive context blocks eventually breaks the token bank and degrades the model's performance.

Transitioning from "a really long chat" to a true "stateful architecture" is the exact right move for keeping this project RV-Ready.

1. API Approach (History Export/Import)

The Reality: Yes, you can programmatically pass the entire messages array from one model's API to another.

The Trap: If you dump a 50k-token history into a new model, you are paying for those 50k input tokens every single time you make a call.

The RV-Ready Solution: Do not pass raw history. Pass compressed state. When a session ends, have the current model generate a dense, compressed JSON or Markdown summary of the current state of the project. When you boot up the next model, you inject only that summary plus your core system prompt.
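A minimal sketch of what that handoff could look like. The system prompt text, the summary format, and `build_handoff` are all illustrative assumptions, not part of any SDK; the point is the shape: compressed summary + core system prompt, never the raw history.

```python
# Sketch of a "compressed state" handoff between sessions.
# SYSTEM_PROMPT and the summary format are assumptions, not a spec.

SYSTEM_PROMPT = "You are the Firefrost Chronicler. Continue from the state summary."

def build_handoff(state_summary: str, first_user_message: str) -> dict:
    """Build the request payload for the *next* model: core system
    prompt plus compressed state, not the raw 50k-token history."""
    return {
        "system": SYSTEM_PROMPT + "\n\n## Project state\n" + state_summary,
        "messages": [{"role": "user", "content": first_user_message}],
    }

# At the end of the old session, the outgoing model is asked to emit
# a dense summary like this (Markdown or JSON both work):
summary = (
    "- Task #93: Chronicler Orchestration Layer added to the Trinity Codex roadmap\n"
    "- Decision: pass compressed state between models, not raw history\n"
)

payload = build_handoff(summary, "Pick up where the last session left off.")
```

The new session boots from a few hundred summary tokens instead of the full transcript, so the per-call input cost stays flat no matter how long the project runs.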

2. Proxy / Middleware Architecture

Building a middleware router is exactly how enterprise multi-agent systems work. You already have the perfect tools for this: Dify and n8n.

Instead of interacting with a web UI, you can build a custom chat interface in Dify. Behind the scenes, Dify acts as the orchestrator:

  • Simple queries get routed to a cheaper, faster model (like Claude 3.5 Haiku or Gemini Flash)
  • Architectural queries get routed to the heavy hitters (Claude 3.7 Sonnet or Gemini Pro)
  • The middleware maintains the unified session state in your database
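As a toy version of that router: the keyword list, model tier names, and in-memory `session_state` dict below are illustrative stand-ins (a real Dify/n8n setup would use its own classifier node and a proper database), but the dispatch logic is the same idea.

```python
# Minimal routing sketch: a keyword classifier picks the model tier.
# HEAVY_HINTS and the model tier names are illustrative assumptions.

HEAVY_HINTS = ("architecture", "refactor", "schema", "design", "debug")

def route(query: str) -> str:
    """Return the tier a middleware layer would dispatch this query to."""
    q = query.lower()
    if any(hint in q for hint in HEAVY_HINTS):
        return "heavy-model"   # e.g. an Opus/Pro-class model
    return "fast-model"        # e.g. a Haiku/Flash-class model

# The unified session state lives with the middleware (a database in
# production; a dict here), keyed by session id, so any model can be
# swapped in mid-conversation without losing the thread.
session_state: dict = {}

def handle(session_id: str, query: str) -> str:
    model = route(query)
    session_state.setdefault(session_id, []).append(
        {"role": "user", "content": query, "model": model}
    )
    return model
```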

3. Native Features (Web UI Blindspots)

If you are using the Claude.ai web interface, you are largely constrained by their consumer UI.

  • Projects: You can use Claude Projects to upload static context, which allows the model to reference it without eating up your chat window limits as quickly.
  • The Limitation: The web UI does not natively support dynamic mid-conversation model swapping while perfectly maintaining the underlying context tree. To get true control over memory, you have to transition to API-based workflows.

4. Alternative Architectures (The Golden Ticket)

Prompt Caching: Anthropic's API supports prompt caching natively, and it is the silver bullet for your token costs and context issues.

You can cache your massive "Source of Truth" documents and the established Chronicler persona. When you make a new request, the API reads the cached context almost instantly at a fraction of the cost (often a 90% discount on input tokens).
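Built as a plain dict (no network call), a cached request looks roughly like the sketch below. The `cache_control: {"type": "ephemeral"}` block on the system prompt marks the prefix up to that point as cacheable; the model id and field layout should be checked against Anthropic's current API docs before relying on them.

```python
# Approximate shape of an Anthropic Messages API request using prompt
# caching, built as a plain dict. Field names are a sketch of the API
# as documented at time of writing; verify against current docs.

SOURCE_OF_TRUTH = "<massive Source of Truth docs + Chronicler persona>"

def cached_request(user_message: str) -> dict:
    return {
        "model": "claude-opus-4-5",   # illustrative model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": SOURCE_OF_TRUTH,
                # Later calls with an identical prefix read this from
                # cache at a steep discount on input tokens.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

req = cached_request("Summarize today's decisions.")
```

Only the changing tail of the request (the new user message) is billed at full price; the big static prefix is a cache read.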

RAG Integration (Trinity Codex): Instead of passing chat history, let your models query the Codex. If Claude #65 needs to know what Claude #62 decided about the ModpackChecker schema, it doesn't need the chat history—it just queries the vector database.
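A toy illustration of that "query the Codex, not the chat history" pattern: a real Trinity Codex would use embeddings and a vector database, so plain word overlap stands in for similarity here, and the two Codex entries are invented examples.

```python
# Toy RAG lookup: retrieve past decisions by similarity instead of
# replaying chat history. Word overlap stands in for real embeddings;
# the CODEX entries are invented examples.

CODEX = [
    "Claude #62: ModpackChecker schema uses one row per mod version",
    "Claude #60: Social Hub auth goes through the existing SSO proxy",
]

def query_codex(question: str, top_k: int = 1) -> list:
    """Return the top_k Codex entries most similar to the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        CODEX,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]
```

A later Chronicler asking about the ModpackChecker schema pulls back only the one relevant decision record, a few dozen tokens, instead of the whole transcript that produced it.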

5. Token Optimization

To optimize costs without sacrificing intelligence:

  • Context Pruning: Implement a rolling context window. Keep the last 10 messages raw, and have the system automatically summarize anything older.
  • Modular Prompting: Don't load the ModpackChecker code if you are working on the Social Hub. Inject only the relevant reference files based on the user's initial prompt.
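The rolling-window idea in the first bullet can be sketched in a few lines. The summarizer below is a stub (in practice the model itself would be asked to write the dense summary), and `KEEP_RAW = 10` mirrors the "last 10 messages" suggestion above.

```python
# Rolling context window sketch: keep the last N messages raw and
# collapse everything older into one summary message. summarize() is
# a stub; a real system would have the model write the summary.

KEEP_RAW = 10

def summarize(messages: list) -> str:
    # Stub standing in for a model-generated dense summary.
    return f"[Summary of {len(messages)} earlier messages]"

def prune(history: list) -> list:
    """Return history with everything older than KEEP_RAW summarized."""
    if len(history) <= KEEP_RAW:
        return history
    older, recent = history[:-KEEP_RAW], history[-KEEP_RAW:]
    return [{"role": "user", "content": summarize(older)}] + recent
```

Run before every API call, this bounds the prompt at roughly eleven messages regardless of how long the session has been going.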

The Fork in the Road

Do you want to keep the primary interaction inside the native web interfaces (and optimize via Projects and summaries), or are you ready to transition the Chronicler interface fully into your Dify/API environment where we can implement Prompt Caching and dynamic routing?


Our Decision

Long-term: Add Chronicler Orchestration Layer to Task #93 (Trinity Codex)

Short-term: Explore what Claude Projects can do RIGHT NOW for session transitions


Follow-Up Questions Sent to Gemini

  1. Can we upload persistent docs to Projects that Claude reads each session?
  2. Should we have separate Projects for Ops (4.5) vs Dev (4.6)?
  3. How does Memory interact with Projects — can it bridge sessions better?
  4. What's the optimal workflow using only claude.ai features (no API yet)?

Action Items

  • Add long-term architecture to Task #93
  • Get Gemini's follow-up on Claude Projects
  • Test Project-based workflow improvements
  • Document findings for future Chroniclers

Fire + Frost + Foundation = Where Love Builds Legacy 🔥❄️