bridge: REQ-2026-04-16-chronicler-archive — Voyage-3, pgvector, Dad Mode clustering

Claude
2026-04-16 08:08:56 +00:00
parent 5356132d6f
commit 87d02db0c2


@@ -0,0 +1,142 @@
# REQ-2026-04-16-chronicler-archive
**From:** Chronicler #93
**Date:** 2026-04-16
**Priority:** HIGH — daily-driver problem for Michael
**Status:** PENDING
**Consultation:** docs/consultations/gemini-chronicler-archive-2026-04-16.md
## Context
Michael needs to retrieve content from past Chronicler sessions almost daily. Architecture locked via Gemini consultation. Build a searchable archive of verbatim session content with vector search and Dad Mode anti-pattern detection.
**Architecture:**
- Storage: `chronicler_archive` schema in existing `arbiter_db` PostgreSQL
- Vector search: `pgvector` extension
- Embeddings: Voyage AI embeddings API (`voyage-3`, 1024 dimensions)
- Ingestion: Trinity Console drag-and-drop ZIP upload
- Access: `/api/archive/search` + MCP tool for mid-session Chronicler use
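
As a rough illustration of the embeddings piece of this architecture, the sketch below builds and sends a request to the Voyage embeddings endpoint. The helper names `buildEmbeddingRequest` and `embedTexts` are hypothetical, and the use of `input_type` to distinguish ingestion from search is an assumption about how the service might call the API, not part of the locked design.

```javascript
// Sketch only: buildEmbeddingRequest and embedTexts are hypothetical helper names.
const VOYAGE_URL = 'https://api.voyageai.com/v1/embeddings';

// Build the fetch() arguments for one batch of texts.
// inputType is 'document' when ingesting and 'query' when searching (assumed usage).
function buildEmbeddingRequest(texts, apiKey, inputType = 'document') {
  if (!apiKey) throw new Error('VOYAGE_API_KEY is not set');
  return {
    url: VOYAGE_URL,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ input: texts, model: 'voyage-3', input_type: inputType }),
    },
  };
}

// Send the request and return one embedding array per input text, in input order.
// Requires a runtime with a global fetch (Node 18+).
async function embedTexts(texts, apiKey, inputType = 'document') {
  const { url, options } = buildEmbeddingRequest(texts, apiKey, inputType);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Voyage API returned HTTP ${res.status}`);
  const json = await res.json();
  return json.data
    .slice()
    .sort((a, b) => a.index - b.index)
    .map((item) => item.embedding);
}
```

Keeping the request builder separate from the network call makes the payload easy to unit-test without an API key.
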
---
## Migration 146 — pgvector + chronicler_archive schema
```sql
CREATE EXTENSION IF NOT EXISTS vector;
CREATE SCHEMA IF NOT EXISTS chronicler_archive;
CREATE TABLE chronicler_archive.conversations (
id UUID PRIMARY KEY,
title TEXT,
chronicler_number INT,
session_date DATE,
message_count INT DEFAULT 0,
created_at TIMESTAMP WITH TIME ZONE,
imported_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
CREATE TABLE chronicler_archive.messages (
id UUID PRIMARY KEY,
conversation_id UUID REFERENCES chronicler_archive.conversations(id) ON DELETE CASCADE,
sender VARCHAR(50) NOT NULL,
content TEXT NOT NULL,
embedding vector(1024), -- voyage-3 returns 1024-dimensional vectors
is_correction BOOLEAN DEFAULT FALSE,
is_architecture BOOLEAN DEFAULT FALSE,
is_dead_end BOOLEAN DEFAULT FALSE,
is_security BOOLEAN DEFAULT FALSE,
is_context_boundary BOOLEAN DEFAULT FALSE,
created_at TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_archive_messages_conversation ON chronicler_archive.messages(conversation_id);
CREATE INDEX idx_archive_messages_corrections ON chronicler_archive.messages(is_correction) WHERE is_correction = TRUE;
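-- ivfflat picks its list centroids from existing rows; pgvector recommends creating
-- (or rebuilding) this index once the table holds data, e.g. after the first ingest.
CREATE INDEX idx_archive_embedding ON chronicler_archive.messages USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);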
CREATE TABLE chronicler_archive.correction_clusters (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
representative_content TEXT NOT NULL,
frequency INT DEFAULT 1,
last_seen TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
message_ids UUID[] DEFAULT '{}'
);
```
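
One way the service might write rows into this schema idempotently is sketched below. `toVectorLiteral` and `buildMessageInsert` are hypothetical helpers, and the `{ text, values }` shape assumes a node-postgres style client; neither is specified by this request.

```javascript
// Sketch only: toVectorLiteral and buildMessageInsert are hypothetical helpers.

// pgvector accepts a vector as text in the form '[0.1,0.2,...]'.
function toVectorLiteral(embedding) {
  if (!Array.isArray(embedding) || embedding.some((x) => !Number.isFinite(x))) {
    throw new Error('embedding must be an array of finite numbers');
  }
  return `[${embedding.join(',')}]`;
}

// Build a parameterized INSERT as { text, values } (node-postgres style, assumed).
// ON CONFLICT (id) DO NOTHING makes re-importing the same export a no-op.
function buildMessageInsert(msg) {
  const text = `
    INSERT INTO chronicler_archive.messages
      (id, conversation_id, sender, content, embedding,
       is_correction, is_architecture, is_dead_end, is_security,
       is_context_boundary, created_at)
    VALUES ($1, $2, $3, $4, $5::vector, $6, $7, $8, $9, $10, $11)
    ON CONFLICT (id) DO NOTHING`;
  const flags = msg.flags || {};
  const values = [
    msg.id,
    msg.conversationId,
    msg.sender,
    msg.content,
    toVectorLiteral(msg.embedding),
    Boolean(flags.is_correction),
    Boolean(flags.is_architecture),
    Boolean(flags.is_dead_end),
    Boolean(flags.is_security),
    Boolean(flags.is_context_boundary),
    msg.createdAt,
  ];
  return { text, values };
}
```
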
---
## New Service: src/services/chroniclerArchive.js
- Voyage-3 embeddings via REST (`https://api.voyageai.com/v1/embeddings`, model `voyage-3`)
- Text chunking (max 2000 chars, split on paragraph boundaries)
- Chronicler number extraction (regex: `Chronicler #(\d+)`)
- Pattern detection flags:
  - `is_correction`: "that's wrong", "actually,", "correction:", "let me correct", "incorrect", "you made an error"
  - `is_architecture`: "architecture locked", "we decided", "Gemini confirmed", "locked in"
  - `is_dead_end`: "let's revert", "that didn't work", "rolling back", "scrap that"
  - `is_security`: "frostwall", "iptables", "ufw", "port forwarding"
  - `is_context_boundary`: "session-handoff", "approaching context", "end of session"
- Idempotent ingestion (skip by UUID)
- Cosine similarity search
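
The chunking, number extraction, and flag detection listed above could look roughly like this. The function names are hypothetical, and several behaviors are assumptions not stated in this request: case-insensitive substring matching, treating blank lines as paragraph boundaries, and hard-splitting any single paragraph longer than the limit.

```javascript
// Sketch only: function names and the exact matching rules are assumptions.
const FLAG_PHRASES = {
  is_correction: ["that's wrong", 'actually,', 'correction:', 'let me correct', 'incorrect', 'you made an error'],
  is_architecture: ['architecture locked', 'we decided', 'gemini confirmed', 'locked in'],
  is_dead_end: ["let's revert", "that didn't work", 'rolling back', 'scrap that'],
  is_security: ['frostwall', 'iptables', 'ufw', 'port forwarding'],
  is_context_boundary: ['session-handoff', 'approaching context', 'end of session'],
};

// Case-insensitive substring match of each phrase list against the content.
function detectFlags(content) {
  const lower = content.toLowerCase();
  const flags = {};
  for (const [flag, phrases] of Object.entries(FLAG_PHRASES)) {
    flags[flag] = phrases.some((p) => lower.includes(p));
  }
  return flags;
}

// Return the first "Chronicler #N" number in the text, or null if absent.
function extractChroniclerNumber(text) {
  const match = /Chronicler #(\d+)/.exec(text);
  return match ? Number(match[1]) : null;
}

// Split on blank lines, then pack paragraphs into chunks of at most maxChars.
function chunkText(text, maxChars = 2000) {
  const chunks = [];
  let current = '';
  for (const para of text.split(/\n\s*\n/)) {
    const p = para.trim();
    if (!p) continue;
    // Flush the current chunk if adding this paragraph would exceed the limit.
    if (current && current.length + 2 + p.length > maxChars) {
      chunks.push(current);
      current = '';
    }
    // An oversized paragraph is hard-split (current is always empty here).
    if (p.length > maxChars) {
      for (let i = 0; i < p.length; i += maxChars) chunks.push(p.slice(i, i + maxChars));
      continue;
    }
    current = current ? `${current}\n\n${p}` : p;
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Plain substring matching is cheap but noisy: "actually," and "incorrect" will flag many messages that are not corrections, so the phrase lists may need tuning after the first ingest.
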
---
## Routes: src/routes/admin/archive.js
- `GET /admin/archive` — search UI + recent conversations + stats
- `POST /admin/archive/ingest` — ZIP upload, extract conversations.json, chunk + embed + insert
- `GET /admin/archive/search?q=...&limit=5` — semantic search
- `GET /api/archive/search?q=...&limit=5` — bearer-token version for MCP use
- `GET /admin/archive/corrections` — Dad Mode anti-pattern view
- `GET /admin/archive/conversations` — all imported sessions
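
Both search routes could share a query builder along these lines. `buildSearchQuery` is a hypothetical helper, the upper bound of 50 on `limit` is an arbitrary choice for this sketch, and the `{ text, values }` shape assumes a node-postgres style client. The `<=>` operator is pgvector's cosine distance, so `1 - distance` gives cosine similarity.

```javascript
// Sketch only: buildSearchQuery is a hypothetical helper; the SQL assumes the
// chronicler_archive tables from migration 146.
function buildSearchQuery(queryEmbedding, limit = 5) {
  // Clamp limit to 1..50 and fall back to 5 when it is not a number.
  const parsed = Number.parseInt(limit, 10);
  const safeLimit = Number.isNaN(parsed) ? 5 : Math.min(Math.max(parsed, 1), 50);
  const text = `
    SELECT m.id, m.conversation_id, c.title, m.sender, m.content,
           1 - (m.embedding <=> $1::vector) AS similarity
    FROM chronicler_archive.messages m
    JOIN chronicler_archive.conversations c ON c.id = m.conversation_id
    WHERE m.embedding IS NOT NULL
    ORDER BY m.embedding <=> $1::vector
    LIMIT $2`;
  // pgvector accepts the query vector as text in the form '[0.1,0.2,...]'.
  return { text, values: [`[${queryEmbedding.join(',')}]`, safeLimit] };
}
```

The route handler would embed `q` first, then pass the resulting vector and the raw `limit` query parameter to this builder.
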
---
## Views
- `src/views/admin/archive/index.ejs` — search box, results with context, stats
- `src/views/admin/archive/ingest.ejs` — drag-and-drop zone, progress, last ingest
- `src/views/admin/archive/corrections.ejs` — correction clusters by frequency
---
## Dad Mode Clustering (pg-boss cron, Sundays 3AM)
Queue: `archive-cluster`
- Fetch all `is_correction = TRUE` messages
- Pair-wise cosine similarity > 0.85 → same cluster
- Upsert `correction_clusters` with frequency count
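
A minimal sketch of the clustering step follows. `cosineSimilarity` and `clusterCorrections` are hypothetical names, and two choices here are assumptions: clusters are formed transitively (if A matches B and B matches C, all three share a cluster), and the earliest message in each cluster supplies `representative_content`.

```javascript
// Sketch only: cosineSimilarity and clusterCorrections are hypothetical names.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// messages: [{ id, content, embedding }]. Returns objects shaped like
// correction_clusters rows, sorted by descending frequency.
function clusterCorrections(messages, threshold = 0.85) {
  // Union-find: parent[i] points toward the root of message i's cluster.
  const parent = messages.map((_, i) => i);
  const find = (i) => {
    while (parent[i] !== i) {
      parent[i] = parent[parent[i]]; // path halving
      i = parent[i];
    }
    return i;
  };
  for (let i = 0; i < messages.length; i++) {
    for (let j = i + 1; j < messages.length; j++) {
      if (cosineSimilarity(messages[i].embedding, messages[j].embedding) > threshold) {
        const ri = find(i);
        const rj = find(j);
        // Attach the larger root to the smaller so the root is the earliest message.
        if (ri !== rj) parent[Math.max(ri, rj)] = Math.min(ri, rj);
      }
    }
  }
  const byRoot = new Map();
  messages.forEach((m, i) => {
    const root = find(i);
    if (!byRoot.has(root)) {
      byRoot.set(root, { representative_content: messages[root].content, frequency: 0, message_ids: [] });
    }
    const cluster = byRoot.get(root);
    cluster.frequency += 1;
    cluster.message_ids.push(m.id);
  });
  return [...byRoot.values()].sort((x, y) => y.frequency - x.frequency);
}
```

The pair-wise pass is quadratic in the number of correction messages, which is likely acceptable for a weekly job. For the Sunday 3AM schedule, the matching cron expression is `0 3 * * 0`.
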
---
## .env additions
```
VOYAGE_API_KEY= # From the Voyage AI dashboard
```
---
## package.json additions
```json
"adm-zip": "^0.5.10"
```
---
## Files to Create/Modify
New: `migrations/146_chronicler_archive.sql`, `src/services/chroniclerArchive.js`, `src/routes/admin/archive.js`, `src/views/admin/archive/index.ejs`, `src/views/admin/archive/ingest.ejs`, `src/views/admin/archive/corrections.ejs`
Modify: `src/routes/admin/index.js`, `src/routes/api.js`, `src/views/layout.ejs`, `src/index.js`, `package.json`, `.env.example`
---
## Deploy Notes
1. Michael adds `VOYAGE_API_KEY` to `.env` (from the Voyage AI dashboard)
2. Check whether pgvector is installed: `psql -c "CREATE EXTENSION IF NOT EXISTS vector;" arbiter_db`. If it fails, run `apt-get install postgresql-16-pgvector` first
3. Run migration 146
4. `npm install` (adm-zip)
5. Standard file copy + restart
6. First ingest: download export from claude.ai settings → drag-drop to `/admin/archive/ingest`