bridge: REQ-2026-04-16-chronicler-archive — Voyage-3, pgvector, Dad Mode clustering
142
docs/code-bridge/archive/REQ-2026-04-16-chronicler-archive.md
Normal file
@@ -0,0 +1,142 @@

# REQ-2026-04-16-chronicler-archive

**From:** Chronicler #93
**Date:** 2026-04-16
**Priority:** HIGH (daily-driver problem for Michael)
**Status:** PENDING
**Consultation:** docs/consultations/gemini-chronicler-archive-2026-04-16.md

## Context

Michael needs to retrieve content from past Chronicler sessions almost daily. The architecture was locked via the Gemini consultation: build a searchable archive of verbatim session content with vector search and Dad Mode anti-pattern detection.

**Architecture:**
- Storage: `chronicler_archive` schema in the existing `arbiter_db` PostgreSQL database
- Vector search: `pgvector` extension
- Embeddings: Voyage AI API (`voyage-3`, which returns 1024-dimensional vectors)
- Ingestion: Trinity Console drag-and-drop ZIP upload
- Access: `/api/archive/search` + MCP tool for mid-session Chronicler use

---

## Migration 146: pgvector + chronicler_archive schema

```sql
CREATE EXTENSION IF NOT EXISTS vector;
CREATE SCHEMA IF NOT EXISTS chronicler_archive;

CREATE TABLE chronicler_archive.conversations (
    id UUID PRIMARY KEY,
    title TEXT,
    chronicler_number INT,
    session_date DATE,
    message_count INT DEFAULT 0,
    created_at TIMESTAMP WITH TIME ZONE,
    imported_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE TABLE chronicler_archive.messages (
    id UUID PRIMARY KEY,
    conversation_id UUID REFERENCES chronicler_archive.conversations(id) ON DELETE CASCADE,
    sender VARCHAR(50) NOT NULL,
    content TEXT NOT NULL,
    -- voyage-3 returns 1024-dimensional embeddings; the column size must match
    embedding vector(1024),
    is_correction BOOLEAN DEFAULT FALSE,
    is_architecture BOOLEAN DEFAULT FALSE,
    is_dead_end BOOLEAN DEFAULT FALSE,
    is_security BOOLEAN DEFAULT FALSE,
    is_context_boundary BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_archive_messages_conversation ON chronicler_archive.messages(conversation_id);
CREATE INDEX idx_archive_messages_corrections ON chronicler_archive.messages(is_correction) WHERE is_correction = TRUE;
-- pgvector's docs advise building ivfflat indexes once the table has data;
-- consider a REINDEX after the first large ingest
CREATE INDEX idx_archive_embedding ON chronicler_archive.messages USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

CREATE TABLE chronicler_archive.correction_clusters (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    representative_content TEXT NOT NULL,
    frequency INT DEFAULT 1,
    last_seen TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    message_ids UUID[] DEFAULT '{}'
);
```

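
Because `conversations.id` is the primary key, the "skip by UUID" behaviour this request asks for could lean on `ON CONFLICT (id) DO NOTHING`. The sketch below is one possible shape, not the required implementation: `insertConversationOnce`, the `conv` field names, and the `db` object (assumed to expose a node-postgres style `query(text, params)` resolving to `{ rowCount }`) are all hypothetical. It returns `true` only when a new row was written, so a caller could skip embedding work for sessions that were already imported.

```javascript
// Sketch only: insert a conversation unless its UUID is already archived.
// `db` is assumed to behave like a node-postgres Pool (an assumption).
const INSERT_CONVERSATION = `
  INSERT INTO chronicler_archive.conversations
    (id, title, chronicler_number, session_date, message_count, created_at)
  VALUES ($1, $2, $3, $4, $5, $6)
  ON CONFLICT (id) DO NOTHING`;

async function insertConversationOnce(db, conv) {
  const result = await db.query(INSERT_CONVERSATION, [
    conv.id,
    conv.title,
    conv.chroniclerNumber,
    conv.sessionDate,
    conv.messageCount,
    conv.createdAt,
  ]);
  // rowCount is 0 when the UUID already existed and the insert was skipped.
  return result.rowCount === 1;
}
```
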
---

## New Service: src/services/chroniclerArchive.js

- Voyage-3 embeddings via REST (`https://api.voyageai.com/v1/embeddings`, model `voyage-3`)
- Text chunking (max 2000 chars, split on paragraph boundaries)
- Chronicler number extraction (regex: `Chronicler #(\d+)`)
- Pattern detection flags:
  - `is_correction`: "that's wrong", "actually,", "correction:", "let me correct", "incorrect", "you made an error"
  - `is_architecture`: "architecture locked", "we decided", "Gemini confirmed", "locked in"
  - `is_dead_end`: "let's revert", "that didn't work", "rolling back", "scrap that"
  - `is_security`: "frostwall", "iptables", "ufw", "port forwarding"
  - `is_context_boundary`: "session-handoff", "approaching context", "end of session"
- Idempotent ingestion (skip by UUID)
- Cosine similarity search

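
The text-processing bullets above could be sketched roughly as follows. The function names (`chunkText`, `extractChroniclerNumber`, `detectFlags`) are hypothetical, and two behaviours are assumptions rather than part of the request: phrase matching is done as a case-insensitive substring test, and a single paragraph longer than the limit is hard-split at the character limit. The phrase lists themselves are copied from the list above.

```javascript
// Sketch only: helper names and matching rules are assumptions.
const MAX_CHUNK_CHARS = 2000;

const FLAG_PHRASES = {
  is_correction: ["that's wrong", 'actually,', 'correction:', 'let me correct', 'incorrect', 'you made an error'],
  is_architecture: ['architecture locked', 'we decided', 'gemini confirmed', 'locked in'],
  is_dead_end: ["let's revert", "that didn't work", 'rolling back', 'scrap that'],
  is_security: ['frostwall', 'iptables', 'ufw', 'port forwarding'],
  is_context_boundary: ['session-handoff', 'approaching context', 'end of session'],
};

// Split on blank lines, then greedily pack paragraphs into chunks of at most
// maxChars. A paragraph longer than maxChars is hard-split (an assumption).
function chunkText(text, maxChars = MAX_CHUNK_CHARS) {
  const chunks = [];
  let current = '';
  for (const paragraph of text.split(/\n\s*\n/)) {
    const para = paragraph.trim();
    if (!para) continue;
    const candidate = current ? `${current}\n\n${para}` : para;
    if (candidate.length <= maxChars) {
      current = candidate;
      continue;
    }
    if (current) chunks.push(current);
    let rest = para;
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current = rest;
  }
  if (current) chunks.push(current);
  return chunks;
}

// Returns the first "Chronicler #N" number found, or null if there is none.
function extractChroniclerNumber(text) {
  const match = /Chronicler #(\d+)/.exec(text);
  return match ? Number.parseInt(match[1], 10) : null;
}

// Returns one boolean per flag column, true if any phrase occurs in the text.
function detectFlags(content) {
  const haystack = content.toLowerCase();
  const flags = {};
  for (const [flag, phrases] of Object.entries(FLAG_PHRASES)) {
    flags[flag] = phrases.some((phrase) => haystack.includes(phrase));
  }
  return flags;
}
```
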
---

## Routes: src/routes/admin/archive.js

- `GET /admin/archive`: search UI + recent conversations + stats
- `POST /admin/archive/ingest`: ZIP upload, extract conversations.json, chunk + embed + insert
- `GET /admin/archive/search?q=...&limit=5`: semantic search
- `GET /api/archive/search?q=...&limit=5`: bearer-token version for MCP use
- `GET /admin/archive/corrections`: Dad Mode anti-pattern view
- `GET /admin/archive/conversations`: all imported sessions

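
Both search routes above could share one query helper. The following is a minimal sketch under stated assumptions: `searchArchive` and `toVectorLiteral` are hypothetical names, `db` is assumed to expose a node-postgres style `query(text, params)`, the query text has already been embedded into `queryEmbedding`, and the upper bound of 50 on `limit` is an arbitrary guard that is not in the request. It relies on pgvector's `<=>` cosine-distance operator, so similarity is reported as `1 - distance`.

```javascript
// Sketch only: names, the db interface, and the limit cap are assumptions.

// pgvector accepts a vector as a text literal such as '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding) {
  return `[${embedding.join(',')}]`;
}

// `<=>` is pgvector's cosine distance; ordering by it returns nearest first.
const SEARCH_SQL = `
  SELECT m.id, m.conversation_id, m.sender, m.content,
         1 - (m.embedding <=> $1::vector) AS similarity
  FROM chronicler_archive.messages m
  WHERE m.embedding IS NOT NULL
  ORDER BY m.embedding <=> $1::vector
  LIMIT $2`;

async function searchArchive(db, queryEmbedding, limit = 5) {
  // Query-string values arrive as strings; fall back to 5 and clamp to 1..50.
  const safeLimit = Math.min(Math.max(Number.parseInt(limit, 10) || 5, 1), 50);
  const { rows } = await db.query(SEARCH_SQL, [toVectorLiteral(queryEmbedding), safeLimit]);
  return rows;
}
```
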
---

## Views

- `src/views/admin/archive/index.ejs`: search box, results with context, stats
- `src/views/admin/archive/ingest.ejs`: drag-and-drop zone, progress, last ingest
- `src/views/admin/archive/corrections.ejs`: correction clusters by frequency

---

## Dad Mode Clustering (pg-boss cron, Sundays 3AM)

Queue: `archive-cluster`
- Fetch all `is_correction = TRUE` messages
- Pair-wise cosine similarity > 0.85 → same cluster
- Upsert `correction_clusters` with frequency count

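
The clustering step above might look like the sketch below once the correction messages and their embeddings are loaded into memory. Several things here are assumptions: the names `cosineSimilarity` and `clusterCorrections` are hypothetical, pairs are merged transitively with union-find (so A~B and B~C puts all three together, which the request does not spell out), and the first member of each cluster is used as `representative_content`. The all-pairs loop is O(n²), which is only reasonable while the number of flagged corrections stays small. For the schedule in the heading, Sundays at 3AM corresponds to the cron expression `0 3 * * 0`.

```javascript
// Sketch only: names, transitive merging, and the representative choice are assumptions.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// messages: [{ id, content, embedding: number[] }]
// Returns rows shaped like correction_clusters, most frequent first.
function clusterCorrections(messages, threshold = 0.85) {
  const parent = messages.map((_, i) => i);
  const find = (i) => {
    while (parent[i] !== i) {
      parent[i] = parent[parent[i]]; // path halving
      i = parent[i];
    }
    return i;
  };
  for (let i = 0; i < messages.length; i += 1) {
    for (let j = i + 1; j < messages.length; j += 1) {
      if (cosineSimilarity(messages[i].embedding, messages[j].embedding) > threshold) {
        parent[find(j)] = find(i);
      }
    }
  }
  const groups = new Map();
  messages.forEach((msg, i) => {
    const root = find(i);
    if (!groups.has(root)) groups.set(root, []);
    groups.get(root).push(msg);
  });
  return [...groups.values()]
    .map((members) => ({
      representative_content: members[0].content,
      frequency: members.length,
      message_ids: members.map((m) => m.id),
    }))
    .sort((x, y) => y.frequency - x.frequency);
}
```
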
---

## .env additions

```
VOYAGE_API_KEY= # From the Voyage AI dashboard
```

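
As a rough illustration of how the service might consume this key, the sketch below posts a batch of texts to the Voyage embeddings endpoint named earlier in this request. `embedTexts` and its options are hypothetical; the code assumes Node 18+ (global `fetch`) and passes `input_type: 'document'` for ingestion, where a search path would presumably pass `'query'`. Error handling is deliberately minimal, and retries or rate limiting are left out.

```javascript
// Sketch only: function name, options, and error handling are assumptions.
const VOYAGE_URL = 'https://api.voyageai.com/v1/embeddings';

async function embedTexts(texts, options = {}) {
  const {
    apiKey = process.env.VOYAGE_API_KEY,
    inputType = 'document',
    fetchImpl = fetch, // injectable so the call can be stubbed in tests
  } = options;
  if (!apiKey) throw new Error('VOYAGE_API_KEY is not set');

  const response = await fetchImpl(VOYAGE_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ input: texts, model: 'voyage-3', input_type: inputType }),
  });
  if (!response.ok) {
    throw new Error(`Voyage API request failed with HTTP ${response.status}`);
  }

  const payload = await response.json();
  // Each item carries the index of its input; sort so output order matches `texts`.
  return payload.data
    .slice()
    .sort((a, b) => a.index - b.index)
    .map((item) => item.embedding);
}
```
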
---

## package.json additions

```json
"adm-zip": "^0.5.10"
```

---

## Files to Create/Modify

New: `migrations/146_chronicler_archive.sql`, `src/services/chroniclerArchive.js`, `src/routes/admin/archive.js`, `src/views/admin/archive/index.ejs`, `src/views/admin/archive/ingest.ejs`, `src/views/admin/archive/corrections.ejs`

Modify: `src/routes/admin/index.js`, `src/routes/api.js`, `src/views/layout.ejs`, `src/index.js`, `package.json`, `.env.example`

---

## Deploy Notes

1. Michael adds `VOYAGE_API_KEY` to `.env` (from the Voyage AI dashboard)
2. Check that pgvector is installed: `psql -c "CREATE EXTENSION IF NOT EXISTS vector;" arbiter_db`. If that fails, run `apt-get install postgresql-16-pgvector` first
3. Run migration 146
4. `npm install` (adm-zip)
5. Standard file copy + restart
6. First ingest: download the export from claude.ai settings, then drag-and-drop it onto `/admin/archive/ingest`