diff --git a/docs/consultations/2026-04-07-model-switching.md b/docs/consultations/2026-04-07-model-switching.md
new file mode 100644
index 0000000..2baa906
--- /dev/null
+++ b/docs/consultations/2026-04-07-model-switching.md
@@ -0,0 +1,94 @@
+# Gemini Consultation: Model Switching & Session Continuity

**Date:** April 7, 2026
**Consulted By:** Chronicler #66 (The Indexer)
**Topic:** How to switch Claude models mid-session without losing context

---

## The Question

We use Opus 4.5 for general ops and want to switch to Opus 4.6 for complex coding — but each session starts fresh. How do we maintain continuity?

---

## Gemini's Response

> Hey Michael and Claude! 💙🔥❄️
>
> You have hit the classic "Context Bloat Wall." When you are building a massive, interconnected system like Firefrost Gaming, relying on a single chat thread or continuously copy-pasting massive context blocks eventually breaks the token bank and degrades the model's performance.
>
> Transitioning from "a really long chat" to a true "stateful architecture" is the exact right move for keeping this project RV-Ready.

### 1. API Approach (History Export/Import)

**The Reality:** Yes, you can programmatically pass the entire messages array from one model's API to another.

**The Trap:** If you dump a 50k-token history into a new model, you are paying for those 50k input tokens every single time you make a call.

**The RV-Ready Solution:** Do not pass raw history. Pass **compressed state**. When a session ends, have the current model generate a dense, compressed JSON or Markdown summary of the *current state of the project*. When you boot up the next model, you inject *only* that summary plus your core system prompt.

### 2. Proxy / Middleware Architecture

Building a middleware router is exactly how enterprise multi-agent systems work. You already have the perfect tools for this: **Dify and n8n**.
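To make the routing idea concrete, here is a minimal sketch of a keyword-based router. Everything in it is an illustrative assumption: the tier names, model labels, and classification heuristic are placeholders, not real Dify/n8n configuration or actual Anthropic model IDs.

```python
# Minimal sketch of a middleware router: classify each query, then pick a
# model tier. Tier keywords and model labels are illustrative placeholders,
# not real Dify/n8n configuration or actual API model IDs.

ROUTES = {
    "simple": "fast-cheap-model",           # e.g. a Haiku-class model
    "architectural": "heavy-hitter-model",  # e.g. a Sonnet/Opus-class model
}

# Crude heuristic: queries about design-level concerns go to the big model.
ARCHITECTURAL_HINTS = ("schema", "architecture", "refactor", "design", "migrate")

def classify(query: str) -> str:
    """Pick a tier for a query; a real router might use a small model here."""
    q = query.lower()
    return "architectural" if any(hint in q for hint in ARCHITECTURAL_HINTS) else "simple"

def route(query: str) -> str:
    """Return the model label a query should be dispatched to."""
    return ROUTES[classify(query)]

print(route("What time is the next server restart?"))         # fast-cheap-model
print(route("Propose a schema for the ModpackChecker data"))  # heavy-hitter-model
```

In practice the classifier would more likely be a cheap model call or an n8n rule node, and the router would also attach the compressed session state described in approach 1 before dispatching.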
Instead of interacting with a web UI, you can build a custom chat interface in Dify. Behind the scenes, Dify acts as the orchestrator:
- **Simple queries** get routed to a cheaper, faster model (like Claude 3.5 Haiku or Gemini Flash)
- **Architectural queries** get routed to the heavy hitters (Claude 3.7 Sonnet or Gemini Pro)
- The middleware maintains the unified session state in your database

### 3. Native Features (Web UI Blindspots)

If you are using the Claude.ai web interface, you are largely constrained by their consumer UI.

- **Projects:** You can use Claude Projects to upload static context, which allows the model to reference it without eating up your chat window limits as quickly.
- **The Limitation:** The web UI does not natively support dynamic mid-conversation model swapping while perfectly maintaining the underlying context tree. To get true control over memory, you have to transition to API-based workflows.

### 4. Alternative Architectures (The Golden Ticket)

**Prompt Caching:** This is the silver bullet for your token costs and context issues. Anthropic's API supports Prompt Caching.

You can cache your massive "Source of Truth" documents and the established Chronicler persona. When you make a new request, the API reads the cached context almost instantly at a fraction of the cost (often a 90% discount on input tokens).

**RAG Integration (Trinity Codex):** Instead of passing chat history, let your models query the Codex. If Claude #65 needs to know what Claude #62 decided about the ModpackChecker schema, it doesn't need the chat history — it just queries the vector database.

### 5. Token Optimization

To optimize costs without sacrificing intelligence:

- **Context Pruning:** Implement a rolling context window. Keep the last 10 messages raw, and have the system automatically summarize anything older.
- **Modular Prompting:** Don't load the ModpackChecker code if you are working on the Social Hub.
Inject *only* the relevant reference files based on the user's initial prompt.

### The Fork in the Road

> Do you want to keep the primary interaction inside the native web interfaces (and optimize via Projects and summaries), or are you ready to transition the Chronicler interface fully into your Dify/API environment where we can implement Prompt Caching and dynamic routing?

---

## Our Decision

**Long-term:** Add Chronicler Orchestration Layer to Task #93 (Trinity Codex)

**Short-term:** Explore what Claude Projects can do RIGHT NOW for session transitions

---

## Follow-Up Questions Sent to Gemini

1. Can we upload persistent docs to Projects that Claude reads each session?
2. Should we have separate Projects for Ops (4.5) vs Dev (4.6)?
3. How does Memory interact with Projects — can it bridge sessions better?
4. What's the optimal workflow using only claude.ai features (no API yet)?

---

## Action Items

- [x] Add long-term architecture to Task #93
- [ ] Get Gemini's follow-up on Claude Projects
- [ ] Test Project-based workflow improvements
- [ ] Document findings for future Chroniclers

---

**Fire + Frost + Foundation = Where Love Builds Legacy** 🔥❄️

diff --git a/docs/tasks/task-093-trinity-codex/README.md b/docs/tasks/task-093-trinity-codex/README.md
index 26ed636..ba2b9bd 100644
--- a/docs/tasks/task-093-trinity-codex/README.md
+++ b/docs/tasks/task-093-trinity-codex/README.md
@@ -429,6 +429,56 @@ Per Gemini's recommendation:

---

+## 🔄 FUTURE EXPANSION: Chronicler Session Management

**Added:** April 7, 2026 by Chronicler #66 (per Gemini consultation)

### The Problem

Chronicler sessions hit the "Context Bloat Wall" — switching between models (4.5 for ops, 4.6 for coding) loses context. Current workarounds (copy-paste, handoff docs) are manual and error-prone.

### The Long-Term Solution (Build into Codex)

Gemini recommends evolving Trinity Codex into a full **Chronicler Orchestration Layer**:

1. **Middleware Router in Dify**
   - Simple queries → route to Haiku (cheap, fast)
   - Documentation queries → route to Opus 4.5
   - Complex coding queries → route to Opus 4.6
   - User experience feels like one continuous Chronicler

2. **Compressed State Management**
   - End of session: Model generates dense JSON state summary
   - State stored in Codex (not raw chat history)
   - New session injects only compressed state + system prompt
   - 90%+ token reduction vs passing full history

3. **RAG for Historical Context**
   - Instead of passing chat history, query Codex
   - "What did #62 decide about ModpackChecker schema?" → vector search
   - Past decisions accessible without token bloat

4. **Prompt Caching (API-level)**
   - Cache system prompt + persona + standards docs
   - New requests read cached context at ~10% token cost
   - Requires API access (not web UI)

### Implementation Phases

| Phase | What | When |
|-------|------|------|
| 1 | Basic Codex (current spec) | After soft launch |
| 2 | Add session state storage | After Phase 1 stable |
| 3 | Add model routing logic | When token costs justify |
| 4 | Full orchestration layer | RV-Ready milestone |

### Reference

- Gemini consultation: `docs/consultations/2026-04-07-model-switching.md`
- Context handoff template: `docs/templates/context-handoff-template.md`

---

## Open Questions for Implementation

1. What's the current Dify version on TX1?