feat(engineering,ra-qm): add secrets-vault-manager, sql-database-assistant, gcp-cloud-architect, soc2-compliance

secrets-vault-manager (403-line SKILL.md, 3 scripts, 3 references):
- HashiCorp Vault, AWS SM, Azure KV, GCP SM integration
- Secret rotation, dynamic secrets, audit logging, emergency procedures

sql-database-assistant (457-line SKILL.md, 3 scripts, 3 references):
- Query optimization, migration generation, schema exploration
- Multi-DB support (PostgreSQL, MySQL, SQLite, SQL Server)
- ORM patterns (Prisma, Drizzle, TypeORM, SQLAlchemy)

gcp-cloud-architect (418-line SKILL.md, 3 scripts, 3 references):
- 6-step workflow mirroring aws-solution-architect for GCP
- Cloud Run, GKE, BigQuery, Cloud Functions, cost optimization
- Completes cloud trifecta (AWS + Azure + GCP)

soc2-compliance (417-line SKILL.md, 3 scripts, 3 references):
- SOC 2 Type I & II preparation, Trust Service Criteria mapping
- Control matrix generation, evidence tracking, gap analysis
- First SOC 2 skill in ra-qm-team (joins GDPR, ISO 27001, ISO 13485)

All 12 scripts pass --help. Docs generated, mkdocs.yml nav updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author: Reza Rezvani
Date: 2026-03-25 14:05:11 +01:00
Commit: 87f3a007c9 (parent 7a2189fa21)
36 changed files with 13450 additions and 6 deletions

# Query Optimization Guide
How to read EXPLAIN plans, choose the right index types, understand query plan operators, and configure connection pooling.
---
## Reading EXPLAIN Plans
### PostgreSQL — EXPLAIN ANALYZE
```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT * FROM orders WHERE status = 'paid' ORDER BY created_at DESC LIMIT 20;
```
**Sample output:**
```
Limit  (cost=0.43..12.87 rows=20 width=128) (actual time=0.052..0.089 rows=20 loops=1)
  ->  Index Scan Backward using idx_orders_status_created on orders  (cost=0.43..4521.33 rows=7284 width=128) (actual time=0.051..0.085 rows=20 loops=1)
        Index Cond: (status = 'paid')
        Buffers: shared hit=4
Planning Time: 0.156 ms
Execution Time: 0.112 ms
```
**Key fields to check:**
| Field | What it tells you |
|-------|-------------------|
| `cost` | Estimated startup..total cost (arbitrary units) |
| `rows` | Estimated row count at that node |
| `actual time` | Real wall-clock time in milliseconds |
| `actual rows` | Real row count — compare against estimate |
| `Buffers: shared hit` | Pages read from cache (good) |
| `Buffers: shared read` | Pages read from disk (slow) |
| `loops` | How many times the node executed |
**Red flags:**
- `Seq Scan` on a large table with a WHERE clause — missing index
- `actual rows` >> `rows` (estimated) — stale statistics, run `ANALYZE`
- `Nested Loop` with high loop count — consider hash join or add index
- `Sort` with `external merge` — not enough `work_mem`, spilling to disk
- `Buffers: shared read` much higher than `shared hit` — cold cache or table too large for memory
### MySQL — EXPLAIN FORMAT=JSON
```sql
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE status = 'paid' ORDER BY created_at DESC LIMIT 20;
```
**Key fields:**
- `query_block.select_id` — identifies subqueries
- `table.access_type` — `ALL` (full scan), `ref` (index lookup), `range`, `index`, `const`
- `table.rows_examined_per_scan` — how many rows the engine reads
- `table.using_index` — covering index (no table lookup needed)
- `table.attached_condition` — the WHERE filter applied
**Access types ranked (best to worst):**
`system` > `const` > `eq_ref` > `ref` > `range` > `index` > `ALL`
---
## Index Types
### B-tree (default)
The workhorse index. Supports equality, range, prefix, and ORDER BY operations.
**Best for:** `=`, `<`, `>`, `<=`, `>=`, `BETWEEN`, `LIKE 'prefix%'`, `ORDER BY`, `MIN()`, `MAX()`
```sql
CREATE INDEX idx_orders_created ON orders (created_at);
```
**Composite B-tree:** Column order matters. The index is useful for queries that filter on a leftmost prefix of the indexed columns.
```sql
-- This index serves: WHERE status = ... AND created_at > ...
-- Also serves: WHERE status = ...
-- Does NOT serve: WHERE created_at > ... (without status)
CREATE INDEX idx_orders_status_created ON orders (status, created_at);
```
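The leftmost-prefix rule is easy to verify empirically. Here is a runnable sketch using SQLite via Python's `sqlite3` (table and data invented; SQLite's planner wording differs from PostgreSQL's, but the search-vs-scan distinction is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT, total REAL)")
conn.execute("CREATE INDEX idx_orders_status_created ON orders (status, created_at)")

def plan(query: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row describes the access path
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))

# Filters on the leftmost column: the composite index is used
p1 = plan("SELECT * FROM orders WHERE status = 'paid' AND created_at > '2025-01-01'")
# Skips the leftmost column: falls back to a full table scan
p2 = plan("SELECT * FROM orders WHERE created_at > '2025-01-01'")

print(p1)  # SEARCH ... USING INDEX idx_orders_status_created ...
print(p2)  # SCAN ...
```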
### Hash
Equality-only lookups. Faster than B-tree for exact matches but no range support.
**Best for:** `=` lookups on high-cardinality columns
```sql
-- PostgreSQL
CREATE INDEX idx_sessions_token ON sessions USING hash (token);
```
**Limitations:** No range queries, no ORDER BY, not WAL-logged before PostgreSQL 10.
### GIN (Generalized Inverted Index)
For multi-valued data: arrays, JSONB, full-text search vectors.
```sql
-- JSONB containment
CREATE INDEX idx_products_tags ON products USING gin (tags);
-- Query: SELECT * FROM products WHERE tags @> '["sale"]';
-- Full-text search
CREATE INDEX idx_articles_search ON articles USING gin (to_tsvector('english', title || ' ' || body));
```
### GiST (Generalized Search Tree)
For geometric, range, and proximity data.
```sql
-- Range type (e.g., date ranges)
CREATE INDEX idx_bookings_period ON bookings USING gist (during);
-- Query: SELECT * FROM bookings WHERE during && '[2025-01-01, 2025-01-31]';
-- PostGIS geometry
CREATE INDEX idx_locations_geom ON locations USING gist (geom);
```
### BRIN (Block Range INdex)
Tiny index for naturally ordered data (e.g., time-series append-only tables).
```sql
CREATE INDEX idx_events_created ON events USING brin (created_at);
```
**Best for:** Large tables where the indexed column correlates with physical row order. Much smaller than B-tree but less precise.
### Partial Index
Index only rows matching a condition. Smaller and faster for targeted queries.
```sql
-- Only index active users (skip millions of inactive)
CREATE INDEX idx_users_active_email ON users (email) WHERE status = 'active';
```
### Covering Index (INCLUDE)
Store extra columns in the index to avoid table lookups (index-only scans).
```sql
-- PostgreSQL 11+
CREATE INDEX idx_orders_status ON orders (status) INCLUDE (total, created_at);
-- Query can be answered entirely from the index:
-- SELECT total, created_at FROM orders WHERE status = 'paid';
```
### Expression Index
Index the result of a function or expression.
```sql
CREATE INDEX idx_users_lower_email ON users (LOWER(email));
-- Query: SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
```
---
## Query Plan Operators
### Scan operators
| Operator | Description | Performance |
|----------|-------------|-------------|
| **Seq Scan** | Full table scan, reads every row | Slow on large tables |
| **Index Scan** | B-tree lookup + table fetch | Fast for selective queries |
| **Index Only Scan** | Reads only the index (covering) | Fastest for covered queries |
| **Bitmap Index Scan** | Builds a bitmap of matching pages | Good for medium selectivity |
| **Bitmap Heap Scan** | Fetches pages identified by bitmap | Pairs with bitmap index scan |
### Join operators
| Operator | Description | Best when |
|----------|-------------|-----------|
| **Nested Loop** | For each outer row, scan inner | Small outer set, indexed inner |
| **Hash Join** | Build hash table on inner, probe with outer | Medium-large sets, no index |
| **Merge Join** | Merge two sorted inputs | Both inputs already sorted |
### Other operators
| Operator | Description |
|----------|-------------|
| **Sort** | Sorts rows (may spill to disk if work_mem exceeded) |
| **Hash Aggregate** | GROUP BY using hash table |
| **Group Aggregate** | GROUP BY on pre-sorted input |
| **Limit** | Stops after N rows |
| **Materialize** | Caches subquery results in memory |
| **Gather / Gather Merge** | Collects results from parallel workers |
---
## Connection Pooling
### Why pool connections?
Each database connection consumes memory (5-10 MB in PostgreSQL). Without pooling:
- Application creates a new connection per request (slow: TCP + TLS + auth)
- Under load, connection count spikes past `max_connections`
- Database OOM or connection refused errors
### PgBouncer (PostgreSQL)
The standard external connection pooler for PostgreSQL.
**Modes:**
- **Session** — connection assigned for entire client session (safest, least efficient)
- **Transaction** — connection returned to pool after each transaction (recommended)
- **Statement** — connection returned after each statement (cannot use transactions)
```ini
# pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 300
```
**Sizing formula:**
```
default_pool_size = num_cpu_cores * 2 + effective_spindle_count
```
For SSDs, start with `num_cpu_cores * 2`; a pool of 4-16 connections is typically optimal.
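As a worked example, the heuristic can be expressed as a small helper (the function name is illustrative, not part of any pooler's API):

```python
def suggested_pool_size(cpu_cores: int, spindle_count: int = 0) -> int:
    """Classic PostgreSQL sizing heuristic: cores * 2 + effective spindles.

    For SSD-backed hosts the spindle term is usually taken as 0 or 1.
    """
    return cpu_cores * 2 + spindle_count

print(suggested_pool_size(8))     # 16 for an 8-core host on SSDs
print(suggested_pool_size(4, 2))  # 10 for 4 cores plus a 2-spindle array
```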
### ProxySQL (MySQL)
```ini
mysql_servers = ({ address="127.0.0.1", port=3306, hostgroup=0, max_connections=100 })
mysql_query_rules = ({ rule_id=1, match_pattern="^SELECT.*FOR UPDATE", destination_hostgroup=0 })
```
### Application-Level Pooling
Most ORMs and drivers include built-in pooling:
| Platform | Pool Configuration |
|----------|--------------------|
| **node-postgres** | `new Pool({ max: 20, idleTimeoutMillis: 30000 })` |
| **SQLAlchemy** | `create_engine(url, pool_size=20, max_overflow=5)` |
| **HikariCP (Java)** | `maximumPoolSize=20, minimumIdle=5, idleTimeout=300000` |
| **Prisma** | `connection_limit=20` in connection string |
### Pool Sizing Guidelines
| Metric | Guideline |
|--------|-----------|
| **Minimum** | Number of always-active background workers |
| **Maximum** | 2-4x CPU cores for OLTP; lower for OLAP |
| **Idle timeout** | 30-300 seconds (reclaim unused connections) |
| **Connection timeout** | 3-10 seconds (fail fast under pressure) |
| **Queue size** | 2-5x pool max (buffer bursts before rejecting) |
**Warning:** More connections does not mean better performance. Beyond the optimal point (usually 20-50), contention on locks, CPU, and I/O causes throughput to decrease.
---
## Statistics and Maintenance
### PostgreSQL
```sql
-- Update statistics for the query planner
ANALYZE orders;
ANALYZE; -- All tables
-- Check table bloat and dead tuples
SELECT relname, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables ORDER BY n_dead_tup DESC;
-- Identify unused indexes
SELECT indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexrelname NOT LIKE '%pkey%'
ORDER BY pg_relation_size(indexrelid) DESC;
```
### MySQL
```sql
-- Update statistics
ANALYZE TABLE orders;
-- Check index usage
SELECT * FROM sys.schema_unused_indexes;
SELECT * FROM sys.schema_redundant_indexes;
-- Identify long-running queries
SELECT * FROM information_schema.processlist WHERE time > 10;
```
---
## Performance Checklist
Before deploying any query to production:
1. Run `EXPLAIN ANALYZE` and verify no unexpected sequential scans
2. Check that estimated rows are within 10x of actual rows
3. Verify index usage on all WHERE, JOIN, and ORDER BY columns
4. Ensure LIMIT is present for user-facing list queries
5. Confirm parameterized queries (no string concatenation)
6. Test with production-like data volume (not just 10 rows)
7. Monitor query time in application metrics after deployment
8. Set up slow query log alerting (> 100ms for OLTP, > 5s for reports)
---
## Quick Reference: When to Use Which Index
| Query Pattern | Index Type |
|--------------|-----------|
| `WHERE col = value` | B-tree or Hash |
| `WHERE col > value` | B-tree |
| `WHERE col LIKE 'prefix%'` | B-tree |
| `WHERE col LIKE '%substring%'` | GIN (full-text) or trigram |
| `WHERE jsonb_col @> '{...}'` | GIN |
| `WHERE array_col && ARRAY[...]` | GIN |
| `WHERE range_col && '[a,b]'` | GiST |
| `WHERE ST_DWithin(geom, ...)` | GiST |
| `WHERE col = value` (append-only) | BRIN |
| `WHERE col = value AND status = 'active'` | Partial B-tree |
| `SELECT a, b WHERE c = value` | Covering (INCLUDE) |

View File

@@ -0,0 +1,451 @@
# ORM Patterns Reference
Side-by-side comparison of Prisma, Drizzle, TypeORM, and SQLAlchemy patterns for common database operations.
---
## Schema Definition
### Prisma (schema.prisma)
```prisma
model User {
id Int @id @default(autoincrement())
email String @unique
name String?
role Role @default(USER)
posts Post[]
profile Profile?
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
@@index([email])
@@map("users")
}
model Post {
id Int @id @default(autoincrement())
title String
body String?
published Boolean @default(false)
author User @relation(fields: [authorId], references: [id], onDelete: Cascade)
authorId Int
tags Tag[]
createdAt DateTime @default(now())
@@index([authorId])
@@index([published, createdAt])
@@map("posts")
}
enum Role {
USER
ADMIN
MODERATOR
}
```
### Drizzle (schema.ts)
```typescript
import { pgTable, serial, varchar, text, boolean, timestamp, integer, pgEnum, index } from 'drizzle-orm/pg-core';
export const roleEnum = pgEnum('role', ['USER', 'ADMIN', 'MODERATOR']);
export const users = pgTable('users', {
id: serial('id').primaryKey(),
email: varchar('email', { length: 255 }).notNull().unique(),
name: varchar('name', { length: 255 }),
role: roleEnum('role').default('USER').notNull(),
createdAt: timestamp('created_at').defaultNow().notNull(),
updatedAt: timestamp('updated_at').defaultNow().notNull(),
});
export const posts = pgTable('posts', {
id: serial('id').primaryKey(),
title: varchar('title', { length: 255 }).notNull(),
body: text('body'),
published: boolean('published').default(false).notNull(),
authorId: integer('author_id').notNull().references(() => users.id, { onDelete: 'cascade' }),
createdAt: timestamp('created_at').defaultNow().notNull(),
}, (table) => ({
authorIdx: index('idx_posts_author').on(table.authorId),
publishedIdx: index('idx_posts_published').on(table.published, table.createdAt),
}));
```
### TypeORM (entities)
```typescript
import { Entity, PrimaryGeneratedColumn, Column, ManyToOne, OneToMany, CreateDateColumn, UpdateDateColumn, Index } from 'typeorm';
export enum Role { USER = 'USER', ADMIN = 'ADMIN', MODERATOR = 'MODERATOR' }
@Entity('users')
export class User {
@PrimaryGeneratedColumn()
id: number;
@Column({ unique: true })
@Index()
email: string;
@Column({ nullable: true })
name: string;
@Column({ type: 'enum', enum: Role, default: Role.USER })
role: Role;
@OneToMany(() => Post, post => post.author)
posts: Post[];
@CreateDateColumn()
createdAt: Date;
@UpdateDateColumn()
updatedAt: Date;
}
@Entity('posts')
@Index(['published', 'createdAt'])
export class Post {
@PrimaryGeneratedColumn()
id: number;
@Column()
title: string;
@Column({ nullable: true, type: 'text' })
body: string;
@Column({ default: false })
published: boolean;
@ManyToOne(() => User, user => user.posts, { onDelete: 'CASCADE' })
author: User;
@Column()
authorId: number;
@CreateDateColumn()
createdAt: Date;
}
```
### SQLAlchemy (models.py)
```python
import enum
from datetime import datetime
from sqlalchemy import Column, Integer, String, Text, Boolean, DateTime, Enum, ForeignKey, Index
from sqlalchemy.orm import relationship, DeclarativeBase
class Base(DeclarativeBase):
pass
class Role(enum.Enum):
USER = "USER"
ADMIN = "ADMIN"
MODERATOR = "MODERATOR"
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True, autoincrement=True)
email = Column(String(255), unique=True, nullable=False, index=True)
name = Column(String(255), nullable=True)
role = Column(Enum(Role), default=Role.USER, nullable=False)
posts = relationship('Post', back_populates='author', cascade='all, delete-orphan')
created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow, nullable=False)
class Post(Base):
__tablename__ = 'posts'
__table_args__ = (
Index('idx_posts_published', 'published', 'created_at'),
)
id = Column(Integer, primary_key=True, autoincrement=True)
title = Column(String(255), nullable=False)
body = Column(Text, nullable=True)
published = Column(Boolean, default=False, nullable=False)
author_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), nullable=False, index=True)
author = relationship('User', back_populates='posts')
created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
```
---
## CRUD Operations
### Create
| ORM | Pattern |
|-----|---------|
| **Prisma** | `await prisma.user.create({ data: { email, name } })` |
| **Drizzle** | `await db.insert(users).values({ email, name }).returning()` |
| **TypeORM** | `await userRepo.save(userRepo.create({ email, name }))` |
| **SQLAlchemy** | `session.add(User(email=email, name=name)); session.commit()` |
### Read (with filter)
| ORM | Pattern |
|-----|---------|
| **Prisma** | `await prisma.user.findMany({ where: { role: 'ADMIN' }, orderBy: { createdAt: 'desc' } })` |
| **Drizzle** | `await db.select().from(users).where(eq(users.role, 'ADMIN')).orderBy(desc(users.createdAt))` |
| **TypeORM** | `await userRepo.find({ where: { role: Role.ADMIN }, order: { createdAt: 'DESC' } })` |
| **SQLAlchemy** | `session.query(User).filter(User.role == Role.ADMIN).order_by(User.created_at.desc()).all()` |
### Update
| ORM | Pattern |
|-----|---------|
| **Prisma** | `await prisma.user.update({ where: { id }, data: { name } })` |
| **Drizzle** | `await db.update(users).set({ name }).where(eq(users.id, id))` |
| **TypeORM** | `await userRepo.update(id, { name })` |
| **SQLAlchemy** | `session.query(User).filter(User.id == id).update({User.name: name}); session.commit()` |
### Delete
| ORM | Pattern |
|-----|---------|
| **Prisma** | `await prisma.user.delete({ where: { id } })` |
| **Drizzle** | `await db.delete(users).where(eq(users.id, id))` |
| **TypeORM** | `await userRepo.delete(id)` |
| **SQLAlchemy** | `session.query(User).filter(User.id == id).delete(); session.commit()` |
---
## Relations and Eager Loading
### Prisma — include / select
```typescript
// Eager load posts with user
const user = await prisma.user.findUnique({
where: { id: 1 },
include: { posts: { where: { published: true }, orderBy: { createdAt: 'desc' } } },
});
// Nested create
await prisma.user.create({
data: {
email: 'new@example.com',
posts: { create: [{ title: 'First post' }] },
},
});
```
### Drizzle — relational queries
```typescript
const result = await db.query.users.findFirst({
where: eq(users.id, 1),
with: { posts: { where: eq(posts.published, true), orderBy: [desc(posts.createdAt)] } },
});
```
### TypeORM — relations / query builder
```typescript
// FindOptions
const user = await userRepo.findOne({ where: { id: 1 }, relations: ['posts'] });
// QueryBuilder for complex joins
const result = await userRepo.createQueryBuilder('u')
.leftJoinAndSelect('u.posts', 'p', 'p.published = :pub', { pub: true })
.where('u.id = :id', { id: 1 })
.getOne();
```
### SQLAlchemy — joinedload / selectinload
```python
from sqlalchemy.orm import joinedload, selectinload
# Eager load in one JOIN query
user = session.query(User).options(joinedload(User.posts)).filter(User.id == 1).first()
# Eager load in a separate IN query (better for collections)
users = session.query(User).options(selectinload(User.posts)).all()
```
---
## Raw SQL Escape Hatches
Every ORM should provide a way to execute raw SQL for complex queries:
| ORM | Pattern |
|-----|---------|
| **Prisma** | `` prisma.$queryRaw`SELECT * FROM users WHERE id = ${id}` `` |
| **Drizzle** | `` db.execute(sql`SELECT * FROM users WHERE id = ${id}`) `` |
| **TypeORM** | `dataSource.query('SELECT * FROM users WHERE id = $1', [id])` |
| **SQLAlchemy** | `session.execute(text('SELECT * FROM users WHERE id = :id'), {'id': id})` |
Always use parameterized queries in raw SQL to prevent injection.
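A minimal demonstration with Python's `sqlite3` (table and values invented) of why binding matters: the driver sends the value separately from the SQL text, so a hostile string can never change the query's shape:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("alice@example.com",))

# Safe: the value is bound as data, not spliced into the SQL string
user_id = 1
row = conn.execute("SELECT email FROM users WHERE id = ?", (user_id,)).fetchone()
print(row[0])  # alice@example.com

# A classic injection payload stays inert data and matches nothing
evil = "1 OR 1=1"
rows = conn.execute("SELECT email FROM users WHERE id = ?", (evil,)).fetchall()
print(rows)  # the string '1 OR 1=1' never equals an integer id
```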
---
## Transaction Patterns
### Prisma
```typescript
await prisma.$transaction(async (tx) => {
const user = await tx.user.create({ data: { email } });
await tx.post.create({ data: { title: 'Welcome', authorId: user.id } });
});
```
### Drizzle
```typescript
await db.transaction(async (tx) => {
const [user] = await tx.insert(users).values({ email }).returning();
await tx.insert(posts).values({ title: 'Welcome', authorId: user.id });
});
```
### TypeORM
```typescript
await dataSource.transaction(async (manager) => {
const user = await manager.save(User, { email });
await manager.save(Post, { title: 'Welcome', authorId: user.id });
});
```
### SQLAlchemy
```python
with Session() as session:
try:
user = User(email=email)
session.add(user)
session.flush() # Get user.id without committing
session.add(Post(title='Welcome', author_id=user.id))
session.commit()
except Exception:
session.rollback()
raise
```
---
## Migration Workflows
### Prisma
```bash
# Generate migration from schema changes
npx prisma migrate dev --name add_posts_table
# Apply in production
npx prisma migrate deploy
# Reset database (dev only)
npx prisma migrate reset
# Generate client after schema change
npx prisma generate
```
**Files:** `prisma/migrations/<timestamp>_<name>/migration.sql`
### Drizzle
```bash
# Generate migration SQL from schema diff (older drizzle-kit: generate:pg)
npx drizzle-kit generate
# Push schema directly (dev only, no migration files)
npx drizzle-kit push
# Apply migrations
npx drizzle-kit migrate
```
**Files:** `drizzle/<timestamp>_<name>.sql`
### TypeORM
```bash
# Auto-generate migration from entity changes (TypeORM 0.3+ takes a path, not -n)
npx typeorm migration:generate src/migrations/AddPostsTable -d data-source.ts
# Create empty migration
npx typeorm migration:create src/migrations/CustomMigration
# Run pending migrations
npx typeorm migration:run -d data-source.ts
# Revert last migration
npx typeorm migration:revert -d data-source.ts
```
**Files:** `src/migrations/<timestamp>-<Name>.ts`
### SQLAlchemy (Alembic)
```bash
# Initialize Alembic
alembic init alembic
# Auto-generate migration from model changes
alembic revision --autogenerate -m "add posts table"
# Apply all pending
alembic upgrade head
# Revert one step
alembic downgrade -1
# Show current state
alembic current
```
**Files:** `alembic/versions/<hash>_<slug>.py`
---
## N+1 Prevention Cheat Sheet
| ORM | Lazy (N+1 risk) | Eager (fixed) |
|-----|-----------------|---------------|
| **Prisma** | Per-row queries inside a loop | `include: { posts: true }` |
| **Drizzle** | Separate queries | `with: { posts: true }` |
| **TypeORM** | `@ManyToOne(() => ..., { lazy: true })` | `relations: ['posts']` or `leftJoinAndSelect` |
| **SQLAlchemy** | Default `lazy='select'` | `joinedload()` or `selectinload()` |
**Rule of thumb:** If you access a relation inside a loop, you have an N+1 problem. Always load relations before the loop.
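What the eager column does under the hood can be sketched with plain `sqlite3` (schema and rows invented): count the queries the lazy pattern issues versus the single batched `IN` query an ORM generates for `selectinload`/`with`/`include`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'ada'), (2, 'grace'), (3, 'joan');
    INSERT INTO posts VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

# N+1: one query for the users, then one more per user inside the loop
queries = 0
posts_by_user = {}
for (uid,) in conn.execute("SELECT id FROM users"):
    queries += 1
    posts_by_user[uid] = conn.execute(
        "SELECT title FROM posts WHERE author_id = ?", (uid,)).fetchall()
print(queries)  # one extra query per user

# Fixed: load every relation in a single IN query before the loop
batched = conn.execute(
    "SELECT author_id, title FROM posts WHERE author_id IN (1, 2, 3)").fetchall()
# one query, regardless of how many users there are
```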
---
## Connection Pooling
### Prisma
```
# In .env or connection string
DATABASE_URL="postgresql://user:pass@host/db?connection_limit=20&pool_timeout=10"
```
### Drizzle (with node-postgres)
```typescript
import { Pool } from 'pg';
import { drizzle } from 'drizzle-orm/node-postgres';
const pool = new Pool({ max: 20, idleTimeoutMillis: 30000, connectionTimeoutMillis: 5000 });
const db = drizzle(pool);
```
### TypeORM
```typescript
const dataSource = new DataSource({
type: 'postgres',
extra: { max: 20, idleTimeoutMillis: 30000 },
});
```
### SQLAlchemy
```python
from sqlalchemy import create_engine
engine = create_engine('postgresql://user:pass@host/db', pool_size=20, max_overflow=5, pool_timeout=30)
```
---
## Best Practices Summary
1. **Always use migrations** — never modify production schemas by hand
2. **Eager load relations** — prevent N+1 in every list/collection query
3. **Use transactions** — group related writes to maintain consistency
4. **Parameterize raw SQL** — never concatenate user input into queries
5. **Connection pooling** — configure pool size matching your workload
6. **Index foreign keys** — ORMs often skip this; add manually if needed
7. **Review generated SQL** — enable query logging in development to catch inefficiencies
8. **Type-safe queries** — leverage TypeScript/Python typing for compile-time checks
9. **Separate read/write models** — use views or read replicas for heavy reporting queries
10. **Test migrations both ways** — always verify that down migrations actually reverse up migrations

# SQL Query Patterns Reference
Common query patterns for everyday database operations. All examples use PostgreSQL syntax with dialect notes where they differ.
---
## JOIN Patterns
### INNER JOIN — matching rows in both tables
```sql
SELECT u.name, o.id AS order_id, o.total
FROM users u
INNER JOIN orders o ON o.user_id = u.id
WHERE o.status = 'paid';
```
### LEFT JOIN — all rows from left, matching from right
```sql
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;
```
Returns users even if they have zero orders.
### Self JOIN — comparing rows within the same table
```sql
-- Find employees who earn more than their manager
SELECT e.name AS employee, m.name AS manager, e.salary, m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;
```
### CROSS JOIN — every combination (cartesian product)
```sql
-- Generate a calendar grid
SELECT d.date, s.shift_name
FROM dates d
CROSS JOIN shifts s;
```
Use intentionally. Accidental cartesian joins are a performance killer.
### LATERAL JOIN (PostgreSQL) — correlated subquery as a table
```sql
-- Top 3 orders per user
SELECT u.name, top_orders.*
FROM users u
CROSS JOIN LATERAL (
SELECT id, total FROM orders
WHERE user_id = u.id
ORDER BY total DESC LIMIT 3
) top_orders;
```
MySQL 8.0.14+ also supports `LATERAL`; on older versions, emulate it with a `ROW_NUMBER()` subquery.
---
## Common Table Expressions (CTEs)
### Basic CTE — readable subquery
```sql
WITH active_users AS (
SELECT id, name, email
FROM users
WHERE last_login > CURRENT_DATE - INTERVAL '30 days'
)
SELECT au.name, COUNT(o.id) AS recent_orders
FROM active_users au
JOIN orders o ON o.user_id = au.id
GROUP BY au.name;
```
### Multiple CTEs — chaining transformations
```sql
WITH monthly_revenue AS (
SELECT DATE_TRUNC('month', created_at) AS month, SUM(total) AS revenue
FROM orders WHERE status = 'paid'
GROUP BY 1
),
growth AS (
SELECT month, revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
ROUND((revenue - LAG(revenue) OVER (ORDER BY month)) / LAG(revenue) OVER (ORDER BY month) * 100, 1) AS growth_pct
FROM monthly_revenue
)
SELECT * FROM growth ORDER BY month;
```
### Recursive CTE — hierarchical data
```sql
-- Organization tree
WITH RECURSIVE org_tree AS (
-- Base case: top-level managers
SELECT id, name, manager_id, 0 AS depth
FROM employees WHERE manager_id IS NULL
UNION ALL
-- Recursive case: subordinates
SELECT e.id, e.name, e.manager_id, ot.depth + 1
FROM employees e
JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT * FROM org_tree ORDER BY depth, name;
```
### Recursive CTE — path traversal
```sql
-- Category breadcrumb
WITH RECURSIVE breadcrumb AS (
SELECT id, name, parent_id, name::TEXT AS path
FROM categories WHERE id = 42
UNION ALL
SELECT c.id, c.name, c.parent_id, c.name || ' > ' || b.path
FROM categories c
JOIN breadcrumb b ON c.id = b.parent_id
)
SELECT path FROM breadcrumb WHERE parent_id IS NULL;
```
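The organization-tree CTE above runs unchanged on SQLite (`WITH RECURSIVE` is supported since 3.8.3), which makes it easy to test; the sample employees are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'ceo', NULL), (2, 'vp', 1), (3, 'eng', 2), (4, 'eng2', 2);
""")

rows = conn.execute("""
    WITH RECURSIVE org_tree AS (
        SELECT id, name, manager_id, 0 AS depth
        FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, e.manager_id, ot.depth + 1
        FROM employees e JOIN org_tree ot ON e.manager_id = ot.id
    )
    SELECT name, depth FROM org_tree ORDER BY depth, name
""").fetchall()
print(rows)  # each employee paired with its depth in the tree
```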
---
## Window Functions
### ROW_NUMBER — assign unique rank per partition
```sql
SELECT *, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
FROM employees;
```
### RANK and DENSE_RANK — handle ties
```sql
-- RANK: 1, 2, 2, 4 (skips after tie)
-- DENSE_RANK: 1, 2, 2, 3 (no skip)
SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
```
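A quick check of the tie-handling difference, runnable with Python's bundled SQLite (window functions require SQLite 3.25+; the rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INTEGER);
    INSERT INTO employees VALUES ('a', 300), ('b', 200), ('c', 200), ('d', 100);
""")

rows = conn.execute("""
    SELECT name,
           RANK() OVER (ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk
    FROM employees
    ORDER BY salary DESC, name
""").fetchall()
# RANK skips to 4 after the tie at 2; DENSE_RANK continues with 3
print(rows)
```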
### Running total and moving average
```sql
SELECT date, amount,
SUM(amount) OVER (ORDER BY date) AS running_total,
AVG(amount) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg_7d
FROM daily_revenue;
```
### LAG / LEAD — access adjacent rows
```sql
SELECT date, revenue,
LAG(revenue, 1) OVER (ORDER BY date) AS prev_day,
revenue - LAG(revenue, 1) OVER (ORDER BY date) AS day_over_day_change
FROM daily_revenue;
```
### NTILE — divide into buckets
```sql
-- Split customers into quartiles by total spend
SELECT customer_id, total_spend,
NTILE(4) OVER (ORDER BY total_spend DESC) AS spend_quartile
FROM customer_summary;
```
### FIRST_VALUE / LAST_VALUE
```sql
SELECT department_id, name, salary,
  FIRST_VALUE(name) OVER (PARTITION BY department_id ORDER BY salary DESC) AS highest_paid,
  -- LAST_VALUE needs an explicit frame: the default frame ends at the current row
  LAST_VALUE(name) OVER (PARTITION BY department_id ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lowest_paid
FROM employees;
```
---
## Subquery Patterns
### EXISTS — correlated existence check
```sql
-- Users who have placed at least one order
SELECT u.* FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
```
### NOT EXISTS — safer than NOT IN for NULLs
```sql
-- Users who have never ordered
SELECT u.* FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
```
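The NULL trap is easy to reproduce (`sqlite3`, invented rows): a single orphaned NULL in the subquery silently empties the `NOT IN` result, while `NOT EXISTS` keeps working:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1), (2), (3);
    -- one order has a NULL user_id
    INSERT INTO orders VALUES (1, 1), (2, NULL);
""")

# NOT IN: the NULL makes every comparison UNKNOWN, so nothing qualifies
not_in = conn.execute(
    "SELECT id FROM users WHERE id NOT IN (SELECT user_id FROM orders)").fetchall()
print(not_in)  # silently empty

# NOT EXISTS: the NULL row simply never matches, so the query behaves as intended
not_exists = conn.execute("""
    SELECT id FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id)
""").fetchall()
print(not_exists)  # the users who truly have no orders
```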
### Scalar subquery — single value
```sql
SELECT name, salary,
salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
```
### Derived table — subquery in FROM
```sql
SELECT dept, avg_salary
FROM (
SELECT department_id AS dept, AVG(salary) AS avg_salary
FROM employees GROUP BY department_id
) dept_avg
WHERE avg_salary > 100000;
```
---
## Aggregation Patterns
### GROUP BY with HAVING
```sql
-- Departments with more than 10 employees
SELECT department_id, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 10;
```
### GROUPING SETS — multiple grouping levels
```sql
SELECT region, product_category, SUM(revenue)
FROM sales
GROUP BY GROUPING SETS (
(region, product_category),
(region),
(product_category),
()
);
```
### ROLLUP — hierarchical subtotals
```sql
SELECT region, city, SUM(revenue)
FROM sales
GROUP BY ROLLUP (region, city);
-- Produces: (region, city), (region), ()
```
### CUBE — all combinations
```sql
SELECT region, product, SUM(revenue)
FROM sales
GROUP BY CUBE (region, product);
```
### FILTER clause (PostgreSQL) — conditional aggregation
```sql
SELECT
COUNT(*) AS total,
COUNT(*) FILTER (WHERE status = 'paid') AS paid,
COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled,
SUM(total) FILTER (WHERE status = 'paid') AS paid_revenue
FROM orders;
```
MySQL/SQL Server equivalent: `SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END)`.
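The portable CASE form runs anywhere, including SQLite (invented orders table; recent SQLite versions also accept the FILTER syntax):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL);
    INSERT INTO orders VALUES
        (1, 'paid', 10.0), (2, 'paid', 20.0), (3, 'cancelled', 5.0);
""")

total, paid, cancelled, paid_revenue = conn.execute("""
    SELECT COUNT(*),
           SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END),
           SUM(CASE WHEN status = 'cancelled' THEN 1 ELSE 0 END),
           SUM(CASE WHEN status = 'paid' THEN total ELSE 0 END)
    FROM orders
""").fetchone()
print(total, paid, cancelled, paid_revenue)  # 3 2 1 30.0
```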
---
## UPSERT Patterns
### PostgreSQL — ON CONFLICT
```sql
INSERT INTO user_settings (user_id, key, value, updated_at)
VALUES (1, 'theme', 'dark', NOW())
ON CONFLICT (user_id, key)
DO UPDATE SET value = EXCLUDED.value, updated_at = EXCLUDED.updated_at;
```
### MySQL — ON DUPLICATE KEY
```sql
INSERT INTO user_settings (user_id, key_name, value, updated_at)
VALUES (1, 'theme', 'dark', NOW())
ON DUPLICATE KEY UPDATE value = VALUES(value), updated_at = VALUES(updated_at);
```
### SQL Server — MERGE
```sql
MERGE INTO user_settings AS target
USING (VALUES (1, 'theme', 'dark')) AS source (user_id, key_name, value)
ON target.user_id = source.user_id AND target.key_name = source.key_name
WHEN MATCHED THEN UPDATE SET value = source.value, updated_at = GETDATE()
WHEN NOT MATCHED THEN INSERT (user_id, key_name, value, updated_at)
VALUES (source.user_id, source.key_name, source.value, GETDATE());
```
---
## JSON Operations
### PostgreSQL JSONB
```sql
-- Extract field
SELECT data->>'name' AS name FROM products WHERE data->>'category' = 'electronics';
-- Array contains
SELECT * FROM products WHERE data->'tags' ? 'sale';
-- Update nested field
UPDATE products SET data = jsonb_set(data, '{price}', '29.99') WHERE id = 1;
-- Aggregate into JSON array
SELECT jsonb_agg(jsonb_build_object('id', id, 'name', name)) FROM users;
```
### MySQL JSON
```sql
-- Extract field
SELECT JSON_EXTRACT(data, '$.name') AS name FROM products;
-- Shorthand: SELECT data->>"$.name"
-- Search in array
SELECT * FROM products WHERE JSON_CONTAINS(data->"$.tags", '"sale"');
-- Update
UPDATE products SET data = JSON_SET(data, '$.price', 29.99) WHERE id = 1;
```
---
## Pagination Patterns
### Offset pagination (simple but slow for deep pages)
```sql
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 40;
```
### Keyset pagination (fast, requires ordered unique column)
```sql
-- Page after the last seen id
SELECT * FROM products WHERE id > :last_seen_id ORDER BY id LIMIT 20;
```
### Keyset with composite sort
```sql
SELECT * FROM products
WHERE (created_at, id) < (:last_created_at, :last_id)
ORDER BY created_at DESC, id DESC
LIMIT 20;
```
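A keyset pagination loop over the simple pattern above, sketched with `sqlite3` (table and page size invented). Note the cursor is the last id of the previous page, never an offset:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products (name) VALUES (?)",
                 [(f"p{i}",) for i in range(1, 8)])

def fetch_page(last_seen_id: int, page_size: int = 3):
    return conn.execute(
        "SELECT id, name FROM products WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size)).fetchall()

pages, cursor = [], 0
while True:
    page = fetch_page(cursor)
    if not page:
        break
    pages.append([r[0] for r in page])
    cursor = page[-1][0]  # keyset cursor: the last id seen
print(pages)  # [[1, 2, 3], [4, 5, 6], [7]]
```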
---
## Bulk Operations
### Batch INSERT
```sql
INSERT INTO events (type, payload, created_at) VALUES
('click', '{"page": "/home"}', NOW()),
('view', '{"page": "/pricing"}', NOW()),
('click', '{"page": "/signup"}', NOW());
```
### Batch UPDATE with VALUES
```sql
UPDATE products AS p SET price = v.price
FROM (VALUES (1, 29.99), (2, 49.99), (3, 9.99)) AS v(id, price)
WHERE p.id = v.id;
```
### DELETE with subquery
```sql
DELETE FROM sessions
WHERE user_id IN (SELECT id FROM users WHERE deleted_at IS NOT NULL);
```
### COPY (PostgreSQL bulk load)
```sql
COPY products (name, price, category) FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);
```
---
## Utility Patterns
### Generate series (PostgreSQL)
```sql
-- Fill date gaps
SELECT d::date FROM generate_series('2025-01-01'::date, '2025-12-31', '1 day') d;
```
### Deduplicate rows
```sql
DELETE FROM events a USING events b
WHERE a.id > b.id AND a.user_id = b.user_id AND a.event_type = b.event_type
AND a.created_at = b.created_at;
```
### Pivot (manual)
```sql
SELECT user_id,
SUM(CASE WHEN month = 1 THEN revenue END) AS jan,
SUM(CASE WHEN month = 2 THEN revenue END) AS feb,
SUM(CASE WHEN month = 3 THEN revenue END) AS mar
FROM monthly_revenue
GROUP BY user_id;
```
### Conditional INSERT (skip if exists)
```sql
INSERT INTO tags (name) SELECT 'new-tag'
WHERE NOT EXISTS (SELECT 1 FROM tags WHERE name = 'new-tag');
```