feat(engineering,ra-qm): add secrets-vault-manager, sql-database-assistant, gcp-cloud-architect, soc2-compliance

secrets-vault-manager (403-line SKILL.md, 3 scripts, 3 references):
- HashiCorp Vault, AWS SM, Azure KV, GCP SM integration
- Secret rotation, dynamic secrets, audit logging, emergency procedures

sql-database-assistant (457-line SKILL.md, 3 scripts, 3 references):
- Query optimization, migration generation, schema exploration
- Multi-DB support (PostgreSQL, MySQL, SQLite, SQL Server)
- ORM patterns (Prisma, Drizzle, TypeORM, SQLAlchemy)

gcp-cloud-architect (418-line SKILL.md, 3 scripts, 3 references):
- 6-step workflow mirroring aws-solution-architect for GCP
- Cloud Run, GKE, BigQuery, Cloud Functions, cost optimization
- Completes cloud trifecta (AWS + Azure + GCP)

soc2-compliance (417-line SKILL.md, 3 scripts, 3 references):
- SOC 2 Type I & II preparation, Trust Service Criteria mapping
- Control matrix generation, evidence tracking, gap analysis
- First SOC 2 skill in ra-qm-team (joins GDPR, ISO 27001, ISO 13485)

All 12 scripts pass --help. Docs generated, mkdocs.yml nav updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Reza Rezvani
2026-03-25 14:05:11 +01:00
parent 7a2189fa21
commit 87f3a007c9
36 changed files with 13450 additions and 6 deletions


@@ -0,0 +1,457 @@
---
name: "sql-database-assistant"
description: "Use when the user asks to write SQL queries, optimize database performance, generate migrations, explore database schemas, or work with ORMs like Prisma, Drizzle, TypeORM, or SQLAlchemy."
---
# SQL Database Assistant - POWERFUL Tier Skill
## Overview
The operational companion to database design. While **database-designer** focuses on schema architecture and **database-schema-designer** handles ERD modeling, this skill covers the day-to-day: writing queries, optimizing performance, generating migrations, and bridging the gap between application code and database engines.
### Core Capabilities
- **Natural Language to SQL** — translate requirements into correct, performant queries
- **Schema Exploration** — introspect live databases across PostgreSQL, MySQL, SQLite, SQL Server
- **Query Optimization** — EXPLAIN analysis, index recommendations, N+1 detection, rewrite patterns
- **Migration Generation** — up/down scripts, zero-downtime strategies, rollback plans
- **ORM Integration** — Prisma, Drizzle, TypeORM, SQLAlchemy patterns and escape hatches
- **Multi-Database Support** — dialect-aware SQL with compatibility guidance
### Tools
| Script | Purpose |
|--------|---------|
| `scripts/query_optimizer.py` | Static analysis of SQL queries for performance issues |
| `scripts/migration_generator.py` | Generate migration file templates from change descriptions |
| `scripts/schema_explorer.py` | Generate schema documentation from introspection queries |
---
## Natural Language to SQL
### Translation Patterns
When converting requirements to SQL, follow this sequence:
1. **Identify entities** — map nouns to tables
2. **Identify relationships** — map verbs to JOINs or subqueries
3. **Identify filters** — map adjectives/conditions to WHERE clauses
4. **Identify aggregations** — map "total", "average", "count" to GROUP BY
5. **Identify ordering** — map "top", "latest", "highest" to ORDER BY + LIMIT
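As a worked example, the five-step mapping for "show total order amount per customer, highest first" can be traced with Python's built-in `sqlite3` (the `orders` schema here is hypothetical):

```python
import sqlite3

# Step 1 (entities): "orders" -> orders table.
# Step 4 (aggregation): "total" -> SUM + GROUP BY.
# Step 5 (ordering): "highest first" -> ORDER BY ... DESC.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders (customer, amount) VALUES
        ('alice', 30), ('bob', 10), ('alice', 20);
""")
rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('alice', 50.0), ('bob', 10.0)]
```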
### Common Query Templates
**Top-N per group (window function)**
```sql
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rn
FROM employees
) ranked WHERE rn <= 3;
```
**Running totals**
```sql
SELECT date, amount,
SUM(amount) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM transactions;
```
**Gap detection**
```sql
SELECT curr.id, curr.seq_num, prev.seq_num AS prev_seq
FROM records curr
LEFT JOIN records prev ON prev.seq_num = curr.seq_num - 1
WHERE prev.id IS NULL AND curr.seq_num > 1;
```
**UPSERT (PostgreSQL)**
```sql
INSERT INTO settings (key, value, updated_at)
VALUES ('theme', 'dark', NOW())
ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value, updated_at = EXCLUDED.updated_at;
```
**UPSERT (MySQL)**
```sql
INSERT INTO settings (key_name, value, updated_at)
VALUES ('theme', 'dark', NOW())
ON DUPLICATE KEY UPDATE value = VALUES(value), updated_at = VALUES(updated_at);
```
> See references/query_patterns.md for JOINs, CTEs, window functions, JSON operations, and more.
---
## Schema Exploration
### Introspection Queries
**PostgreSQL — list tables and columns**
```sql
SELECT table_name, column_name, data_type, is_nullable, column_default
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
```
**PostgreSQL — foreign keys**
```sql
SELECT tc.table_name, kcu.column_name,
ccu.table_name AS foreign_table, ccu.column_name AS foreign_column
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu ON tc.constraint_name = ccu.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY';
```
**MySQL — table sizes**
```sql
SELECT table_name, table_rows,
ROUND(data_length / 1024 / 1024, 2) AS data_mb,
ROUND(index_length / 1024 / 1024, 2) AS index_mb
FROM information_schema.tables
WHERE table_schema = DATABASE()
ORDER BY data_length DESC;
```
**SQLite — schema dump**
```sql
SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name;
```
**SQL Server — columns with types**
```sql
SELECT t.name AS table_name, c.name AS column_name,
ty.name AS data_type, c.max_length, c.is_nullable
FROM sys.columns c
JOIN sys.tables t ON c.object_id = t.object_id
JOIN sys.types ty ON c.user_type_id = ty.user_type_id
ORDER BY t.name, c.column_id;
```
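The same introspection can be driven programmatically. A minimal sketch using Python's built-in `sqlite3` and its `PRAGMA` interface (table names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE posts (id INTEGER PRIMARY KEY,
                        user_id INTEGER REFERENCES users(id));
""")
# Tables, as in the sqlite_master query above
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
# Columns: PRAGMA table_info returns (cid, name, type, notnull, default, pk)
cols = [(r[1], r[2], bool(r[3])) for r in conn.execute("PRAGMA table_info(users)")]
# Foreign keys: each row starts (id, seq, referenced_table, from_col, to_col, ...)
fks = conn.execute("PRAGMA foreign_key_list(posts)").fetchall()
print(tables)  # ['posts', 'users']
```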
### Generating Documentation from Schema
Use `scripts/schema_explorer.py` to produce markdown or JSON documentation:
```bash
python scripts/schema_explorer.py --dialect postgres --tables all --format md
python scripts/schema_explorer.py --dialect mysql --tables users,orders --format json
```
---
## Query Optimization
### EXPLAIN Analysis Workflow
1. **Run EXPLAIN ANALYZE** (PostgreSQL) or **EXPLAIN FORMAT=JSON** (MySQL)
2. **Identify the costliest node** — Seq Scan on large tables, Nested Loop with high row estimates
3. **Check for missing indexes** — sequential scans on filtered columns
4. **Look for estimation errors** — planned vs actual rows divergence signals stale statistics
5. **Evaluate JOIN order** — ensure the smallest result set drives the join
### Index Recommendation Checklist
- Columns in WHERE clauses with high selectivity
- Columns in JOIN conditions (foreign keys)
- Columns in ORDER BY when combined with LIMIT
- Composite indexes matching multi-column WHERE predicates (most selective column first)
- Partial indexes for queries with constant filters (e.g., `WHERE status = 'active'`)
- Covering indexes to avoid table lookups for read-heavy queries
### Query Rewriting Patterns
| Anti-Pattern | Rewrite |
|-------------|---------|
| `SELECT * FROM orders` | `SELECT id, status, total FROM orders` (explicit columns) |
| `WHERE YEAR(created_at) = 2025` | `WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'` (sargable) |
| Correlated subquery in SELECT | LEFT JOIN with aggregation |
| `NOT IN (SELECT ...)` with NULLs | `NOT EXISTS (SELECT 1 ...)` |
| `UNION` (dedup) when not needed | `UNION ALL` |
| `LIKE '%search%'` | Full-text search index (GIN/FULLTEXT) |
| `ORDER BY RAND()` | Application-side random sampling or `TABLESAMPLE` |
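The sargability rewrite in the table above can be verified empirically with `EXPLAIN QUERY PLAN`, shown here via Python's built-in `sqlite3` (using `strftime` as the SQLite analogue of `YEAR()`; the schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, total REAL);
    CREATE INDEX idx_orders_created ON orders (created_at);
""")

def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN output is the plan detail.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]

# Non-sargable: wrapping the indexed column in a function forces a full scan.
p_fn = plan("SELECT total FROM orders WHERE strftime('%Y', created_at) = '2025'")
# Sargable: a plain range predicate on the bare column can use the index.
p_range = plan("SELECT total FROM orders "
               "WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'")
print(p_fn)     # a SCAN node
print(p_range)  # a SEARCH node using idx_orders_created
```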
### N+1 Detection
**Symptoms:**
- Application loop that executes one query per parent row
- ORM lazy-loading related entities inside a loop
- Query log shows hundreds of identical SELECT patterns with different IDs
**Fixes:**
- Use eager loading (`include` in Prisma, `joinedload` in SQLAlchemy)
- Batch queries with `WHERE id IN (...)`
- Use DataLoader pattern for GraphQL resolvers
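A minimal before/after sketch of the batching fix, using Python's built-in `sqlite3` (schema is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")
user_ids = [row[0] for row in conn.execute("SELECT id FROM users")]

# N+1 anti-pattern: one query per user inside the loop.
# for uid in user_ids:
#     conn.execute("SELECT title FROM posts WHERE user_id = ?", (uid,))

# Fix: a single batched query, grouped in application code.
placeholders = ",".join("?" * len(user_ids))
posts_by_user = {}
for uid, title in conn.execute(
        f"SELECT user_id, title FROM posts WHERE user_id IN ({placeholders})",
        user_ids):
    posts_by_user.setdefault(uid, []).append(title)
print(posts_by_user)  # {1: ['a', 'b'], 2: ['c']}
```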
### Static Analysis Tool
```bash
python scripts/query_optimizer.py --query "SELECT * FROM orders WHERE status = 'pending'" --dialect postgres
python scripts/query_optimizer.py --query queries.sql --dialect mysql --json
```
> See references/optimization_guide.md for EXPLAIN plan reading, index types, and connection pooling.
---
## Migration Generation
### Zero-Downtime Migration Patterns
**Adding a column (safe)**
```sql
-- Up
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Down
ALTER TABLE users DROP COLUMN phone;
```
**Renaming a column (expand-contract)**
```sql
-- Step 1: Add new column
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
-- Step 2: Backfill
UPDATE users SET full_name = name;
-- Step 3: Deploy app reading both columns
-- Step 4: Deploy app writing only new column
-- Step 5: Drop old column
ALTER TABLE users DROP COLUMN name;
```
**Adding a NOT NULL column (safe sequence)**
```sql
-- Step 1: Add nullable
ALTER TABLE orders ADD COLUMN region VARCHAR(50);
-- Step 2: Backfill with default
UPDATE orders SET region = 'unknown' WHERE region IS NULL;
-- Step 3: Add constraint
ALTER TABLE orders ALTER COLUMN region SET NOT NULL;
ALTER TABLE orders ALTER COLUMN region SET DEFAULT 'unknown';
```
**Index creation (non-blocking, PostgreSQL)**
```sql
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);
```
### Data Backfill Strategies
- **Batch updates** — process in chunks of 1000-10000 rows to avoid lock contention
- **Background jobs** — run backfills asynchronously with progress tracking
- **Dual-write** — write to old and new columns during transition period
- **Validation queries** — verify row counts and data integrity after each batch
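The batch-update strategy above, sketched with Python's built-in `sqlite3` (batch size shrunk for the demo; the `orders`/`region` schema is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany("INSERT INTO orders (region) VALUES (?)", [(None,)] * 25)

BATCH = 10  # use 1000-10000 in production
backfilled = 0
while True:
    # Touch at most BATCH rows per transaction to keep lock windows short.
    cur = conn.execute(
        "UPDATE orders SET region = 'unknown' WHERE id IN "
        "(SELECT id FROM orders WHERE region IS NULL LIMIT ?)", (BATCH,))
    conn.commit()
    if cur.rowcount == 0:
        break
    backfilled += cur.rowcount

# Validation query: verify nothing was missed.
remaining = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE region IS NULL").fetchone()[0]
print(backfilled, remaining)  # 25 0
```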
### Rollback Strategies
Every migration must have a reversible down script. For irreversible changes:
1. **Backup before execution** — `pg_dump` the affected tables
2. **Feature flags** — application can switch between old/new schema reads
3. **Shadow tables** — keep a copy of the original table during migration window
### Migration Generator Tool
```bash
python scripts/migration_generator.py --change "add email_verified boolean to users" --dialect postgres --format sql
python scripts/migration_generator.py --change "rename column name to full_name in customers" --dialect mysql --format alembic --json
```
---
## Multi-Database Support
### Dialect Differences
| Feature | PostgreSQL | MySQL | SQLite | SQL Server |
|---------|-----------|-------|--------|------------|
| UPSERT | `ON CONFLICT DO UPDATE` | `ON DUPLICATE KEY UPDATE` | `ON CONFLICT DO UPDATE` | `MERGE` |
| Boolean | Native `BOOLEAN` | `TINYINT(1)` | `INTEGER` | `BIT` |
| Auto-increment | `SERIAL` / `GENERATED` | `AUTO_INCREMENT` | `INTEGER PRIMARY KEY` | `IDENTITY` |
| JSON | `JSONB` (indexed) | `JSON` | Text (ext) | `NVARCHAR(MAX)` |
| Array | Native `ARRAY` | Not supported | Not supported | Not supported |
| CTE (recursive) | Full support | 8.0+ | 3.8.3+ | Full support |
| Window functions | Full support | 8.0+ | 3.25.0+ | Full support |
| Full-text search | `tsvector` + GIN | `FULLTEXT` index | FTS5 extension | Full-text catalog |
| LIMIT/OFFSET | `LIMIT n OFFSET m` | `LIMIT n OFFSET m` | `LIMIT n OFFSET m` | `OFFSET m ROWS FETCH NEXT n ROWS ONLY` |
### Compatibility Tips
- **Always use parameterized queries** — prevents SQL injection across all dialects
- **Avoid dialect-specific functions in shared code** — wrap in adapter layer
- **Test migrations on target engine** — `information_schema` varies between engines
- **Use ISO date format** — `'YYYY-MM-DD'` works everywhere
- **Quote identifiers** — use double quotes (SQL standard) or backticks (MySQL)
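A quick demonstration of why parameterization is non-negotiable, using Python's built-in `sqlite3` (placeholder syntax varies by driver: `?` here, `%s` in psycopg, named binds elsewhere):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

evil = "' OR '1'='1"
# Bound as a parameter, the input is inert data: no rows match.
safe = conn.execute("SELECT * FROM users WHERE email = ?", (evil,)).fetchall()
# Concatenated into the SQL string, it rewrites the WHERE clause.
unsafe = conn.execute(f"SELECT * FROM users WHERE email = '{evil}'").fetchall()
print(len(safe), len(unsafe))  # 0 1
```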
---
## ORM Patterns
### Prisma
**Schema definition**
```prisma
model User {
id Int @id @default(autoincrement())
email String @unique
name String?
posts Post[]
createdAt DateTime @default(now())
}
model Post {
id Int @id @default(autoincrement())
title String
author User @relation(fields: [authorId], references: [id])
authorId Int
}
```
**Migrations**: `npx prisma migrate dev --name add_user_email`
**Query API**: `prisma.user.findMany({ where: { email: { contains: '@' } }, include: { posts: true } })`
**Raw SQL escape hatch**: ``prisma.$queryRaw`SELECT * FROM users WHERE id = ${userId}` ``
### Drizzle
**Schema-first definition**
```typescript
export const users = pgTable('users', {
id: serial('id').primaryKey(),
email: varchar('email', { length: 255 }).notNull().unique(),
name: text('name'),
createdAt: timestamp('created_at').defaultNow(),
});
```
**Query builder**: `db.select().from(users).where(eq(users.email, email))`
**Migrations**: `npx drizzle-kit generate:pg` then `npx drizzle-kit push:pg`
### TypeORM
**Entity decorators**
```typescript
@Entity()
export class User {
@PrimaryGeneratedColumn()
id: number;
@Column({ unique: true })
email: string;
@OneToMany(() => Post, post => post.author)
posts: Post[];
}
```
**Repository pattern**: `userRepo.find({ where: { email }, relations: ['posts'] })`
**Migrations**: `npx typeorm migration:generate -n AddUserEmail`
### SQLAlchemy
**Declarative models**
```python
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
email = Column(String(255), unique=True, nullable=False)
name = Column(String(255))
posts = relationship('Post', back_populates='author')
```
**Session management**: Always use `with Session() as session:` context manager
**Alembic migrations**: `alembic revision --autogenerate -m "add user email"`
> See references/orm_patterns.md for side-by-side comparisons and migration workflows per ORM.
---
## Data Integrity
### Constraint Strategy
- **Primary keys** — every table must have one; prefer surrogate keys (serial/UUID)
- **Foreign keys** — enforce referential integrity; define ON DELETE behavior explicitly
- **UNIQUE constraints** — for business-level uniqueness (email, slug, API key)
- **CHECK constraints** — validate ranges, enums, and business rules at the DB level
- **NOT NULL** — default to NOT NULL; make nullable only when genuinely optional
### Transaction Isolation Levels
| Level | Dirty Read | Non-Repeatable Read | Phantom Read | Use Case |
|-------|-----------|-------------------|-------------|----------|
| READ UNCOMMITTED | Yes | Yes | Yes | Never recommended |
| READ COMMITTED | No | Yes | Yes | Default for PostgreSQL, general OLTP |
| REPEATABLE READ | No | No | Yes (InnoDB: No) | Financial calculations |
| SERIALIZABLE | No | No | No | Critical consistency (billing, inventory) |
### Deadlock Prevention
1. **Consistent lock ordering** — always acquire locks in the same table/row order
2. **Short transactions** — minimize time between first lock and commit
3. **Advisory locks** — use `pg_advisory_lock()` for application-level coordination
4. **Retry logic** — catch deadlock errors and retry with exponential backoff
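Point 4 can be sketched as a small retry wrapper (Python; `sqlite3.OperationalError` stands in here for whatever deadlock error class your driver raises):

```python
import random
import sqlite3
import time

def run_with_retry(txn, max_attempts=5, base_delay=0.05):
    """Run a transactional callable, retrying on lock/deadlock errors
    with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return txn()
        except sqlite3.OperationalError:  # e.g. "database is locked"
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.01))

# Simulate a transaction that deadlocks twice before succeeding.
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise sqlite3.OperationalError("database is locked")
    return "committed"

result = run_with_retry(flaky_txn)
print(result)  # committed
```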
---
## Backup & Restore
### PostgreSQL
```bash
# Full backup
pg_dump -Fc --no-owner dbname > backup.dump
# Restore
pg_restore -d dbname --clean --no-owner backup.dump
# Point-in-time recovery: configure WAL archiving + restore_command
```
### MySQL
```bash
# Full backup
mysqldump --single-transaction --routines --triggers dbname > backup.sql
# Restore
mysql dbname < backup.sql
# Binary log for PITR: mysqlbinlog --start-datetime="2025-01-01 00:00:00" binlog.000001
```
### SQLite
```bash
# Backup (safe with concurrent reads)
sqlite3 dbname ".backup backup.db"
```
### Backup Best Practices
- **Automate** — cron or systemd timer, never manual-only
- **Test restores** — untested backups are not backups
- **Offsite copies** — S3, GCS, or separate region
- **Retention policy** — daily for 7 days, weekly for 4 weeks, monthly for 12 months
- **Monitor backup size and duration** — sudden changes signal issues
---
## Anti-Patterns
| Anti-Pattern | Problem | Fix |
|-------------|---------|-----|
| `SELECT *` | Transfers unnecessary data, breaks on schema changes | Explicit column list |
| Missing indexes on FK columns | Slow JOINs and cascading deletes | Add indexes on all foreign keys |
| N+1 queries | 1 + N round trips to database | Eager loading or batch queries |
| Implicit type coercion | `WHERE id = '123'` prevents index use | Match types in predicates |
| No connection pooling | Exhausts connections under load | PgBouncer, ProxySQL, or ORM pool |
| Unbounded queries | No LIMIT risks returning millions of rows | Always paginate |
| Storing money as FLOAT | Rounding errors | Use `DECIMAL(19,4)` or integer cents |
| God tables | One table with 50+ columns | Normalize or use vertical partitioning |
| Soft deletes everywhere | Complicates every query with `WHERE deleted_at IS NULL` | Archive tables or event sourcing |
| Raw string concatenation | SQL injection | Parameterized queries always |
---
## Cross-References
| Skill | Relationship |
|-------|-------------|
| **database-designer** | Schema architecture, normalization analysis, ERD generation |
| **database-schema-designer** | Visual ERD modeling, relationship mapping |
| **migration-architect** | Complex multi-step migration orchestration |
| **api-design-reviewer** | Ensuring API endpoints align with query patterns |
| **observability-platform** | Query performance monitoring, slow query alerts |


@@ -0,0 +1,330 @@
# Query Optimization Guide
How to read EXPLAIN plans, choose the right index types, understand query plan operators, and configure connection pooling.
---
## Reading EXPLAIN Plans
### PostgreSQL — EXPLAIN ANALYZE
```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT * FROM orders WHERE status = 'paid' ORDER BY created_at DESC LIMIT 20;
```
**Sample output:**
```
Limit (cost=0.43..12.87 rows=20 width=128) (actual time=0.052..0.089 rows=20 loops=1)
-> Index Scan Backward using idx_orders_status_created on orders (cost=0.43..4521.33 rows=7284 width=128) (actual time=0.051..0.085 rows=20 loops=1)
Index Cond: (status = 'paid')
Buffers: shared hit=4
Planning Time: 0.156 ms
Execution Time: 0.112 ms
```
**Key fields to check:**
| Field | What it tells you |
|-------|-------------------|
| `cost` | Estimated startup..total cost (arbitrary units) |
| `rows` | Estimated row count at that node |
| `actual time` | Real wall-clock time in milliseconds |
| `actual rows` | Real row count — compare against estimate |
| `Buffers: shared hit` | Pages read from cache (good) |
| `Buffers: shared read` | Pages read from disk (slow) |
| `loops` | How many times the node executed |
**Red flags:**
- `Seq Scan` on a large table with a WHERE clause — missing index
- `actual rows` >> `rows` (estimated) — stale statistics, run `ANALYZE`
- `Nested Loop` with high loop count — consider hash join or add index
- `Sort` with `external merge` — not enough `work_mem`, spilling to disk
- `Buffers: shared read` much higher than `shared hit` — cold cache or table too large for memory
### MySQL — EXPLAIN FORMAT=JSON
```sql
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE status = 'paid' ORDER BY created_at DESC LIMIT 20;
```
**Key fields:**
- `query_block.select_id` — identifies subqueries
- `table.access_type` — `ALL` (full scan), `ref` (index lookup), `range`, `index`, `const`
- `table.rows_examined_per_scan` — how many rows the engine reads
- `table.using_index` — covering index (no table lookup needed)
- `table.attached_condition` — the WHERE filter applied
**Access types ranked (best to worst):**
`system` > `const` > `eq_ref` > `ref` > `range` > `index` > `ALL`
---
## Index Types
### B-tree (default)
The workhorse index. Supports equality, range, prefix, and ORDER BY operations.
**Best for:** `=`, `<`, `>`, `<=`, `>=`, `BETWEEN`, `LIKE 'prefix%'`, `ORDER BY`, `MIN()`, `MAX()`
```sql
CREATE INDEX idx_orders_created ON orders (created_at);
```
**Composite B-tree:** Column order matters. The index is useful for queries that filter on a leftmost prefix of the indexed columns.
```sql
-- This index serves: WHERE status = ... AND created_at > ...
-- Also serves: WHERE status = ...
-- Does NOT serve: WHERE created_at > ... (without status)
CREATE INDEX idx_orders_status_created ON orders (status, created_at);
```
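The leftmost-prefix rule can be checked empirically. A sketch using `EXPLAIN QUERY PLAN` via Python's built-in `sqlite3`, which follows the same B-tree prefix rule (schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT,
                         created_at TEXT, total REAL);
    CREATE INDEX idx_orders_status_created ON orders (status, created_at);
""")

def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN is the human-readable detail.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]

# Leftmost column alone: index is usable.
p1 = plan("SELECT total FROM orders WHERE status = 'paid'")
# Leftmost prefix plus range on the second column: index is usable.
p2 = plan("SELECT total FROM orders "
          "WHERE status = 'paid' AND created_at > '2025-01-01'")
# Second column alone skips the leftmost prefix: full scan.
p3 = plan("SELECT total FROM orders WHERE created_at > '2025-01-01'")
```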
### Hash
Equality-only lookups. Faster than B-tree for exact matches but no range support.
**Best for:** `=` lookups on high-cardinality columns
```sql
-- PostgreSQL
CREATE INDEX idx_sessions_token ON sessions USING hash (token);
```
**Limitations:** No range queries, no ORDER BY, not WAL-logged before PostgreSQL 10.
### GIN (Generalized Inverted Index)
For multi-valued data: arrays, JSONB, full-text search vectors.
```sql
-- JSONB containment
CREATE INDEX idx_products_tags ON products USING gin (tags);
-- Query: SELECT * FROM products WHERE tags @> '["sale"]';
-- Full-text search
CREATE INDEX idx_articles_search ON articles USING gin (to_tsvector('english', title || ' ' || body));
```
### GiST (Generalized Search Tree)
For geometric, range, and proximity data.
```sql
-- Range type (e.g., date ranges)
CREATE INDEX idx_bookings_period ON bookings USING gist (during);
-- Query: SELECT * FROM bookings WHERE during && '[2025-01-01, 2025-01-31]';
-- PostGIS geometry
CREATE INDEX idx_locations_geom ON locations USING gist (geom);
```
### BRIN (Block Range INdex)
Tiny index for naturally ordered data (e.g., time-series append-only tables).
```sql
CREATE INDEX idx_events_created ON events USING brin (created_at);
```
**Best for:** Large tables where the indexed column correlates with physical row order. Much smaller than B-tree but less precise.
### Partial Index
Index only rows matching a condition. Smaller and faster for targeted queries.
```sql
-- Only index active users (skip millions of inactive)
CREATE INDEX idx_users_active_email ON users (email) WHERE status = 'active';
```
### Covering Index (INCLUDE)
Store extra columns in the index to avoid table lookups (index-only scans).
```sql
-- PostgreSQL 11+
CREATE INDEX idx_orders_status ON orders (status) INCLUDE (total, created_at);
-- Query can be answered entirely from the index:
-- SELECT total, created_at FROM orders WHERE status = 'paid';
```
### Expression Index
Index the result of a function or expression.
```sql
CREATE INDEX idx_users_lower_email ON users (LOWER(email));
-- Query: SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
```
---
## Query Plan Operators
### Scan operators
| Operator | Description | Performance |
|----------|-------------|-------------|
| **Seq Scan** | Full table scan, reads every row | Slow on large tables |
| **Index Scan** | B-tree lookup + table fetch | Fast for selective queries |
| **Index Only Scan** | Reads only the index (covering) | Fastest for covered queries |
| **Bitmap Index Scan** | Builds a bitmap of matching pages | Good for medium selectivity |
| **Bitmap Heap Scan** | Fetches pages identified by bitmap | Pairs with bitmap index scan |
### Join operators
| Operator | Description | Best when |
|----------|-------------|-----------|
| **Nested Loop** | For each outer row, scan inner | Small outer set, indexed inner |
| **Hash Join** | Build hash table on inner, probe with outer | Medium-large sets, no index |
| **Merge Join** | Merge two sorted inputs | Both inputs already sorted |
### Other operators
| Operator | Description |
|----------|-------------|
| **Sort** | Sorts rows (may spill to disk if work_mem exceeded) |
| **Hash Aggregate** | GROUP BY using hash table |
| **Group Aggregate** | GROUP BY on pre-sorted input |
| **Limit** | Stops after N rows |
| **Materialize** | Caches subquery results in memory |
| **Gather / Gather Merge** | Collects results from parallel workers |
---
## Connection Pooling
### Why pool connections?
Each database connection consumes memory (5-10 MB in PostgreSQL). Without pooling:
- Application creates a new connection per request (slow: TCP + TLS + auth)
- Under load, connection count spikes past `max_connections`
- Database OOM or connection refused errors
### PgBouncer (PostgreSQL)
The standard external connection pooler for PostgreSQL.
**Modes:**
- **Session** — connection assigned for entire client session (safest, least efficient)
- **Transaction** — connection returned to pool after each transaction (recommended)
- **Statement** — connection returned after each statement (cannot use transactions)
```ini
# pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 300
```
**Sizing formula:**
```
default_pool_size = num_cpu_cores * 2 + effective_spindle_count
```
For SSDs, start with `num_cpu_cores * 2` (typically 4-16 connections is optimal).
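To make the checkout/checkin mechanics concrete, here is a toy fixed-size pool in Python (illustration only — real applications should rely on PgBouncer or their driver's built-in pool):

```python
import queue
import sqlite3

class SimplePool:
    """Fixed-size pool: pre-opens N connections, blocks when exhausted."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    def acquire(self, timeout=5):
        # Fail fast under pressure instead of opening unbounded connections;
        # raises queue.Empty when no connection frees up within the timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)

pool = SimplePool(lambda: sqlite3.connect(":memory:"), size=2)
c1, c2 = pool.acquire(), pool.acquire()   # pool now exhausted
pool.release(c1)                          # return one connection
c3 = pool.acquire()                       # reuses c1 instead of reconnecting
```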
### ProxySQL (MySQL)
```ini
mysql_servers = ({ address="127.0.0.1", port=3306, hostgroup=0, max_connections=100 })
mysql_query_rules = ({ rule_id=1, match_pattern="^SELECT.*FOR UPDATE", destination_hostgroup=0 })
```
### Application-Level Pooling
Most ORMs and drivers include built-in pooling:
| Platform | Pool Configuration |
|----------|--------------------|
| **node-postgres** | `new Pool({ max: 20, idleTimeoutMillis: 30000 })` |
| **SQLAlchemy** | `create_engine(url, pool_size=20, max_overflow=5)` |
| **HikariCP (Java)** | `maximumPoolSize=20, minimumIdle=5, idleTimeout=300000` |
| **Prisma** | `connection_limit=20` in connection string |
### Pool Sizing Guidelines
| Metric | Guideline |
|--------|-----------|
| **Minimum** | Number of always-active background workers |
| **Maximum** | 2-4x CPU cores for OLTP; lower for OLAP |
| **Idle timeout** | 30-300 seconds (reclaim unused connections) |
| **Connection timeout** | 3-10 seconds (fail fast under pressure) |
| **Queue size** | 2-5x pool max (buffer bursts before rejecting) |
**Warning:** More connections does not mean better performance. Beyond the optimal point (usually 20-50), contention on locks, CPU, and I/O causes throughput to decrease.
---
## Statistics and Maintenance
### PostgreSQL
```sql
-- Update statistics for the query planner
ANALYZE orders;
ANALYZE; -- All tables
-- Check table bloat and dead tuples
SELECT relname, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables ORDER BY n_dead_tup DESC;
-- Identify unused indexes
SELECT indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexrelname NOT LIKE '%pkey%'
ORDER BY pg_relation_size(indexrelid) DESC;
```
### MySQL
```sql
-- Update statistics
ANALYZE TABLE orders;
-- Check index usage
SELECT * FROM sys.schema_unused_indexes;
SELECT * FROM sys.schema_redundant_indexes;
-- Identify long-running queries
SELECT * FROM information_schema.processlist WHERE time > 10;
```
---
## Performance Checklist
Before deploying any query to production:
1. Run `EXPLAIN ANALYZE` and verify no unexpected sequential scans
2. Check that estimated rows are within 10x of actual rows
3. Verify index usage on all WHERE, JOIN, and ORDER BY columns
4. Ensure LIMIT is present for user-facing list queries
5. Confirm parameterized queries (no string concatenation)
6. Test with production-like data volume (not just 10 rows)
7. Monitor query time in application metrics after deployment
8. Set up slow query log alerting (> 100ms for OLTP, > 5s for reports)
---
## Quick Reference: When to Use Which Index
| Query Pattern | Index Type |
|--------------|-----------|
| `WHERE col = value` | B-tree or Hash |
| `WHERE col > value` | B-tree |
| `WHERE col LIKE 'prefix%'` | B-tree |
| `WHERE col LIKE '%substring%'` | GIN (full-text) or trigram |
| `WHERE jsonb_col @> '{...}'` | GIN |
| `WHERE array_col && ARRAY[...]` | GIN |
| `WHERE range_col && '[a,b]'` | GiST |
| `WHERE ST_DWithin(geom, ...)` | GiST |
| `WHERE col = value` (append-only) | BRIN |
| `WHERE col = value AND status = 'active'` | Partial B-tree |
| `SELECT a, b WHERE c = value` | Covering (INCLUDE) |


@@ -0,0 +1,451 @@
# ORM Patterns Reference
Side-by-side comparison of Prisma, Drizzle, TypeORM, and SQLAlchemy patterns for common database operations.
---
## Schema Definition
### Prisma (schema.prisma)
```prisma
model User {
id Int @id @default(autoincrement())
email String @unique
name String?
role Role @default(USER)
posts Post[]
profile Profile?
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
@@index([email])
@@map("users")
}
model Post {
id Int @id @default(autoincrement())
title String
body String?
published Boolean @default(false)
author User @relation(fields: [authorId], references: [id], onDelete: Cascade)
authorId Int
tags Tag[]
createdAt DateTime @default(now())
@@index([authorId])
@@index([published, createdAt])
@@map("posts")
}
enum Role {
USER
ADMIN
MODERATOR
}
```
### Drizzle (schema.ts)
```typescript
import { pgTable, serial, varchar, text, boolean, timestamp, integer, pgEnum } from 'drizzle-orm/pg-core';
export const roleEnum = pgEnum('role', ['USER', 'ADMIN', 'MODERATOR']);
export const users = pgTable('users', {
id: serial('id').primaryKey(),
email: varchar('email', { length: 255 }).notNull().unique(),
name: varchar('name', { length: 255 }),
role: roleEnum('role').default('USER').notNull(),
createdAt: timestamp('created_at').defaultNow().notNull(),
updatedAt: timestamp('updated_at').defaultNow().notNull(),
});
export const posts = pgTable('posts', {
id: serial('id').primaryKey(),
title: varchar('title', { length: 255 }).notNull(),
body: text('body'),
published: boolean('published').default(false).notNull(),
authorId: integer('author_id').notNull().references(() => users.id, { onDelete: 'cascade' }),
createdAt: timestamp('created_at').defaultNow().notNull(),
}, (table) => ({
authorIdx: index('idx_posts_author').on(table.authorId),
publishedIdx: index('idx_posts_published').on(table.published, table.createdAt),
}));
```
### TypeORM (entities)
```typescript
import { Entity, PrimaryGeneratedColumn, Column, ManyToOne, OneToMany, CreateDateColumn, UpdateDateColumn, Index } from 'typeorm';
export enum Role { USER = 'USER', ADMIN = 'ADMIN', MODERATOR = 'MODERATOR' }
@Entity('users')
export class User {
@PrimaryGeneratedColumn()
id: number;
@Column({ unique: true })
@Index()
email: string;
@Column({ nullable: true })
name: string;
@Column({ type: 'enum', enum: Role, default: Role.USER })
role: Role;
@OneToMany(() => Post, post => post.author)
posts: Post[];
@CreateDateColumn()
createdAt: Date;
@UpdateDateColumn()
updatedAt: Date;
}
@Entity('posts')
@Index(['published', 'createdAt'])
export class Post {
@PrimaryGeneratedColumn()
id: number;
@Column()
title: string;
@Column({ nullable: true, type: 'text' })
body: string;
@Column({ default: false })
published: boolean;
@ManyToOne(() => User, user => user.posts, { onDelete: 'CASCADE' })
author: User;
@Column()
authorId: number;
@CreateDateColumn()
createdAt: Date;
}
```
### SQLAlchemy (models.py)
```python
import enum
from datetime import datetime
from sqlalchemy import Column, Integer, String, Text, Boolean, DateTime, Enum, ForeignKey, Index
from sqlalchemy.orm import relationship, DeclarativeBase
class Base(DeclarativeBase):
pass
class Role(enum.Enum):
USER = "USER"
ADMIN = "ADMIN"
MODERATOR = "MODERATOR"
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True, autoincrement=True)
email = Column(String(255), unique=True, nullable=False, index=True)
name = Column(String(255), nullable=True)
role = Column(Enum(Role), default=Role.USER, nullable=False)
posts = relationship('Post', back_populates='author', cascade='all, delete-orphan')
created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow, nullable=False)
class Post(Base):
__tablename__ = 'posts'
__table_args__ = (
Index('idx_posts_published', 'published', 'created_at'),
)
id = Column(Integer, primary_key=True, autoincrement=True)
title = Column(String(255), nullable=False)
body = Column(Text, nullable=True)
published = Column(Boolean, default=False, nullable=False)
author_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), nullable=False, index=True)
author = relationship('User', back_populates='posts')
created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
```
---
## CRUD Operations
### Create
| ORM | Pattern |
|-----|---------|
| **Prisma** | `await prisma.user.create({ data: { email, name } })` |
| **Drizzle** | `await db.insert(users).values({ email, name }).returning()` |
| **TypeORM** | `await userRepo.save(userRepo.create({ email, name }))` |
| **SQLAlchemy** | `session.add(User(email=email, name=name)); session.commit()` |
### Read (with filter)
| ORM | Pattern |
|-----|---------|
| **Prisma** | `await prisma.user.findMany({ where: { role: 'ADMIN' }, orderBy: { createdAt: 'desc' } })` |
| **Drizzle** | `await db.select().from(users).where(eq(users.role, 'ADMIN')).orderBy(desc(users.createdAt))` |
| **TypeORM** | `await userRepo.find({ where: { role: Role.ADMIN }, order: { createdAt: 'DESC' } })` |
| **SQLAlchemy** | `session.query(User).filter(User.role == Role.ADMIN).order_by(User.created_at.desc()).all()` |
### Update
| ORM | Pattern |
|-----|---------|
| **Prisma** | `await prisma.user.update({ where: { id }, data: { name } })` |
| **Drizzle** | `await db.update(users).set({ name }).where(eq(users.id, id))` |
| **TypeORM** | `await userRepo.update(id, { name })` |
| **SQLAlchemy** | `session.query(User).filter(User.id == id).update({User.name: name}); session.commit()` |
### Delete
| ORM | Pattern |
|-----|---------|
| **Prisma** | `await prisma.user.delete({ where: { id } })` |
| **Drizzle** | `await db.delete(users).where(eq(users.id, id))` |
| **TypeORM** | `await userRepo.delete(id)` |
| **SQLAlchemy** | `session.query(User).filter(User.id == id).delete(); session.commit()` |
---
## Relations and Eager Loading
### Prisma — include / select
```typescript
// Eager load posts with user
const user = await prisma.user.findUnique({
where: { id: 1 },
include: { posts: { where: { published: true }, orderBy: { createdAt: 'desc' } } },
});
// Nested create
await prisma.user.create({
data: {
email: 'new@example.com',
posts: { create: [{ title: 'First post' }] },
},
});
```
### Drizzle — relational queries
```typescript
const result = await db.query.users.findFirst({
where: eq(users.id, 1),
with: { posts: { where: eq(posts.published, true), orderBy: [desc(posts.createdAt)] } },
});
```
### TypeORM — relations / query builder
```typescript
// FindOptions
const user = await userRepo.findOne({ where: { id: 1 }, relations: ['posts'] });
// QueryBuilder for complex joins
const result = await userRepo.createQueryBuilder('u')
.leftJoinAndSelect('u.posts', 'p', 'p.published = :pub', { pub: true })
.where('u.id = :id', { id: 1 })
.getOne();
```
### SQLAlchemy — joinedload / selectinload
```python
from sqlalchemy.orm import joinedload, selectinload
# Eager load in one JOIN query
user = session.query(User).options(joinedload(User.posts)).filter(User.id == 1).first()
# Eager load in a separate IN query (better for collections)
users = session.query(User).options(selectinload(User.posts)).all()
```
---
## Raw SQL Escape Hatches
Every ORM provides an escape hatch for executing raw SQL when the query builder falls short:
| ORM | Pattern |
|-----|---------|
| **Prisma** | `` prisma.$queryRaw`SELECT * FROM users WHERE id = ${id}` `` |
| **Drizzle** | `` db.execute(sql`SELECT * FROM users WHERE id = ${id}`) `` |
| **TypeORM** | `dataSource.query('SELECT * FROM users WHERE id = $1', [id])` |
| **SQLAlchemy** | `session.execute(text('SELECT * FROM users WHERE id = :id'), {'id': id})` |
Always use parameterized queries in raw SQL to prevent injection.
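To make the injection risk concrete, here is a minimal sketch using Python's stdlib `sqlite3` (the table and payload are invented for the demo); the same placeholder discipline applies to every driver listed above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

# A classic payload: closes the string literal and appends a tautology
user_input = "' OR '1'='1"

# Parameterized: the driver passes the value out-of-band, so the quotes
# inside it are plain data and match nothing
safe_count = len(conn.execute(
    "SELECT * FROM users WHERE email = ?", (user_input,)
).fetchall())

# Concatenated: the payload rewrites the WHERE clause and matches every row
unsafe_count = len(conn.execute(
    "SELECT * FROM users WHERE email = '" + user_input + "'"
).fetchall())
```

`safe_count` is 0 while `unsafe_count` is 1: the concatenated form returned rows the caller never asked for.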
---
## Transaction Patterns
### Prisma
```typescript
await prisma.$transaction(async (tx) => {
const user = await tx.user.create({ data: { email } });
await tx.post.create({ data: { title: 'Welcome', authorId: user.id } });
});
```
### Drizzle
```typescript
await db.transaction(async (tx) => {
const [user] = await tx.insert(users).values({ email }).returning();
await tx.insert(posts).values({ title: 'Welcome', authorId: user.id });
});
```
### TypeORM
```typescript
await dataSource.transaction(async (manager) => {
const user = await manager.save(User, { email });
await manager.save(Post, { title: 'Welcome', authorId: user.id });
});
```
### SQLAlchemy
```python
with Session() as session:
try:
user = User(email=email)
session.add(user)
session.flush() # Get user.id without committing
session.add(Post(title='Welcome', author_id=user.id))
session.commit()
except Exception:
session.rollback()
raise
```
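The four snippets above share one shape: group the writes, commit on success, roll back on any error. A bare DB-API sketch with Python's stdlib `sqlite3` (invented `accounts` table) shows the rollback guarantee the ORMs build on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")
conn.commit()

try:
    # First write of the transfer
    conn.execute("UPDATE accounts SET balance = balance - 150 WHERE id = 1")
    (bal,) = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()
    if bal < 0:
        raise ValueError("insufficient funds")
    conn.execute("UPDATE accounts SET balance = balance + 150 WHERE id = 2")
    conn.commit()
except ValueError:
    # Both writes are undone together
    conn.rollback()

(balance_1,) = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()
```

After the rollback, `balance_1` is back to 100; neither half of the failed transfer is visible.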
---
## Migration Workflows
### Prisma
```bash
# Generate migration from schema changes
npx prisma migrate dev --name add_posts_table
# Apply in production
npx prisma migrate deploy
# Reset database (dev only)
npx prisma migrate reset
# Generate client after schema change
npx prisma generate
```
**Files:** `prisma/migrations/<timestamp>_<name>/migration.sql`
### Drizzle
```bash
# Generate migration SQL from schema diff
npx drizzle-kit generate:pg
# Push schema directly (dev only, no migration files)
npx drizzle-kit push:pg
# Apply migrations
npx drizzle-kit migrate
```
**Files:** `drizzle/<timestamp>_<name>.sql`
### TypeORM
```bash
# Auto-generate migration from entity changes
npx typeorm migration:generate src/migrations/AddPostsTable -d data-source.ts
# Create empty migration
npx typeorm migration:create src/migrations/CustomMigration
# Run pending migrations
npx typeorm migration:run -d data-source.ts
# Revert last migration
npx typeorm migration:revert -d data-source.ts
```
**Files:** `src/migrations/<timestamp>-<Name>.ts`
### SQLAlchemy (Alembic)
```bash
# Initialize Alembic
alembic init alembic
# Auto-generate migration from model changes
alembic revision --autogenerate -m "add posts table"
# Apply all pending
alembic upgrade head
# Revert one step
alembic downgrade -1
# Show current state
alembic current
```
**Files:** `alembic/versions/<hash>_<slug>.py`
---
## N+1 Prevention Cheat Sheet
| ORM | Lazy (N+1 risk) | Eager (fixed) |
|-----|-----------------|---------------|
| **Prisma** | Separate queries per row (no `include`) | `include: { posts: true }` |
| **Drizzle** | Separate queries | `with: { posts: true }` |
| **TypeORM** | `@ManyToOne(() => ..., { lazy: true })` | `relations: ['posts']` or `leftJoinAndSelect` |
| **SQLAlchemy** | Default `lazy='select'` | `joinedload()` or `selectinload()` |
**Rule of thumb:** If you access a relation inside a loop, you have an N+1 problem. Always load relations before the loop.
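To see the rule of thumb in numbers, this sketch uses Python's stdlib `sqlite3` and its `set_trace_callback` hook to count statements (schema and data are invented): the loop issues one query per user, while the JOIN issues one in total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'ada'), (2, 'bob'), (3, 'cid');
    INSERT INTO posts VALUES (1, 1, 'p1'), (2, 2, 'p2'), (3, 3, 'p3');
""")

statements = []
conn.set_trace_callback(statements.append)  # record every executed statement

# N+1: one query for the users, then one per user inside the loop
users = conn.execute("SELECT id, name FROM users").fetchall()
for uid, _name in users:
    conn.execute("SELECT title FROM posts WHERE user_id = ?", (uid,)).fetchall()
n_plus_one = len(statements)  # 1 + 3

statements.clear()
# Eager: a single JOIN fetches users and posts together
conn.execute(
    "SELECT u.name, p.title FROM users u LEFT JOIN posts p ON p.user_id = u.id"
).fetchall()
eager = len(statements)
```

With 3 users the lazy loop already runs 4 statements; at 10,000 users it runs 10,001, while the eager version still runs one.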
---
## Connection Pooling
### Prisma
```
# In .env or connection string
DATABASE_URL="postgresql://user:pass@host/db?connection_limit=20&pool_timeout=10"
```
### Drizzle (with node-postgres)
```typescript
import { Pool } from 'pg';
const pool = new Pool({ max: 20, idleTimeoutMillis: 30000, connectionTimeoutMillis: 5000 });
const db = drizzle(pool);
```
### TypeORM
```typescript
const dataSource = new DataSource({
type: 'postgres',
extra: { max: 20, idleTimeoutMillis: 30000 },
});
```
### SQLAlchemy
```python
from sqlalchemy import create_engine
engine = create_engine('postgresql://user:pass@host/db', pool_size=20, max_overflow=5, pool_timeout=30)
```
---
## Best Practices Summary
1. **Always use migrations** — never modify production schemas by hand
2. **Eager load relations** — prevent N+1 in every list/collection query
3. **Use transactions** — group related writes to maintain consistency
4. **Parameterize raw SQL** — never concatenate user input into queries
5. **Connection pooling** — configure pool size matching your workload
6. **Index foreign keys** — ORMs often skip this; add manually if needed
7. **Review generated SQL** — enable query logging in development to catch inefficiencies
8. **Type-safe queries** — leverage TypeScript/Python typing for compile-time checks
9. **Separate read/write models** — use views or read replicas for heavy reporting queries
10. **Test migrations both ways** — always verify that down migrations actually reverse up migrations


@@ -0,0 +1,406 @@
# SQL Query Patterns Reference
Common query patterns for everyday database operations. All examples use PostgreSQL syntax with dialect notes where they differ.
---
## JOIN Patterns
### INNER JOIN — matching rows in both tables
```sql
SELECT u.name, o.id AS order_id, o.total
FROM users u
INNER JOIN orders o ON o.user_id = u.id
WHERE o.status = 'paid';
```
### LEFT JOIN — all rows from left, matching from right
```sql
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;
```
Returns users even if they have zero orders.
### Self JOIN — comparing rows within the same table
```sql
-- Find employees who earn more than their manager
SELECT e.name AS employee, m.name AS manager, e.salary, m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;
```
### CROSS JOIN — every combination (cartesian product)
```sql
-- Generate a calendar grid
SELECT d.date, s.shift_name
FROM dates d
CROSS JOIN shifts s;
```
Use intentionally. Accidental cartesian joins are a performance killer.
### LATERAL JOIN (PostgreSQL) — correlated subquery as a table
```sql
-- Top 3 orders per user
SELECT u.name, top_orders.*
FROM users u
CROSS JOIN LATERAL (
SELECT id, total FROM orders
WHERE user_id = u.id
ORDER BY total DESC LIMIT 3
) top_orders;
```
MySQL 8.0.14+ also supports `LATERAL` derived tables; on older versions, emulate with `ROW_NUMBER()` in a subquery.
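As a portable illustration of the same top-N-per-group result, here is a `ROW_NUMBER()` version run through Python's stdlib `sqlite3` (window functions need SQLite 3.25+; the table and data are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO orders VALUES
        (1, 1, 50), (2, 1, 90), (3, 1, 20), (4, 1, 70),
        (5, 2, 10), (6, 2, 30);
""")

# Rank each user's orders by total, then keep the top 2 per user
rows = conn.execute("""
    SELECT user_id, id, total FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY user_id ORDER BY total DESC
        ) AS rn
        FROM orders
    ) AS ranked
    WHERE rn <= 2
    ORDER BY user_id, total DESC
""").fetchall()
```

`rows` contains the two largest orders for each user, equivalent to what the `LATERAL` form returns.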
---
## Common Table Expressions (CTEs)
### Basic CTE — readable subquery
```sql
WITH active_users AS (
SELECT id, name, email
FROM users
WHERE last_login > CURRENT_DATE - INTERVAL '30 days'
)
SELECT au.name, COUNT(o.id) AS recent_orders
FROM active_users au
JOIN orders o ON o.user_id = au.id
GROUP BY au.name;
```
### Multiple CTEs — chaining transformations
```sql
WITH monthly_revenue AS (
SELECT DATE_TRUNC('month', created_at) AS month, SUM(total) AS revenue
FROM orders WHERE status = 'paid'
GROUP BY 1
),
growth AS (
SELECT month, revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
ROUND((revenue - LAG(revenue) OVER (ORDER BY month)) / LAG(revenue) OVER (ORDER BY month) * 100, 1) AS growth_pct
FROM monthly_revenue
)
SELECT * FROM growth ORDER BY month;
```
### Recursive CTE — hierarchical data
```sql
-- Organization tree
WITH RECURSIVE org_tree AS (
-- Base case: top-level managers
SELECT id, name, manager_id, 0 AS depth
FROM employees WHERE manager_id IS NULL
UNION ALL
-- Recursive case: subordinates
SELECT e.id, e.name, e.manager_id, ot.depth + 1
FROM employees e
JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT * FROM org_tree ORDER BY depth, name;
```
### Recursive CTE — path traversal
```sql
-- Category breadcrumb
WITH RECURSIVE breadcrumb AS (
SELECT id, name, parent_id, name::TEXT AS path
FROM categories WHERE id = 42
UNION ALL
SELECT c.id, c.name, c.parent_id, c.name || ' > ' || b.path
FROM categories c
JOIN breadcrumb b ON c.id = b.parent_id
)
SELECT path FROM breadcrumb WHERE parent_id IS NULL;
```
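The org-tree CTE above runs unchanged on SQLite, so it can be smoke-tested from Python's stdlib `sqlite3` (three invented employees):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'alice', NULL),
        (2, 'bob',   1),
        (3, 'carol', 2);
""")

rows = conn.execute("""
    WITH RECURSIVE org_tree AS (
        -- Base case: top-level managers
        SELECT id, name, manager_id, 0 AS depth
        FROM employees WHERE manager_id IS NULL
        UNION ALL
        -- Recursive case: subordinates, one level deeper each pass
        SELECT e.id, e.name, e.manager_id, ot.depth + 1
        FROM employees e
        JOIN org_tree ot ON e.manager_id = ot.id
    )
    SELECT name, depth FROM org_tree ORDER BY depth
""").fetchall()
```

Each row carries its depth in the hierarchy: alice at 0, bob at 1, carol at 2.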
---
## Window Functions
### ROW_NUMBER — assign unique rank per partition
```sql
SELECT *, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
FROM employees;
```
### RANK and DENSE_RANK — handle ties
```sql
-- RANK: 1, 2, 2, 4 (skips after tie)
-- DENSE_RANK: 1, 2, 2, 3 (no skip)
SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
```
### Running total and moving average
```sql
SELECT date, amount,
SUM(amount) OVER (ORDER BY date) AS running_total,
AVG(amount) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg_7d
FROM daily_revenue;
```
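A quick check of the running total with Python's stdlib `sqlite3` (window functions require SQLite 3.25+, bundled with CPython 3.8+; the revenue figures are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_revenue (date TEXT PRIMARY KEY, amount INTEGER);
    INSERT INTO daily_revenue VALUES
        ('2025-01-01', 100), ('2025-01-02', 50), ('2025-01-03', 200);
""")

# Cumulative sum: each row's window covers all rows up to and including it
rows = conn.execute("""
    SELECT date, amount,
           SUM(amount) OVER (ORDER BY date) AS running_total
    FROM daily_revenue
""").fetchall()
running = [r[2] for r in rows]
```

`running` accumulates day by day: 100, then 150, then 350.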
### LAG / LEAD — access adjacent rows
```sql
SELECT date, revenue,
LAG(revenue, 1) OVER (ORDER BY date) AS prev_day,
revenue - LAG(revenue, 1) OVER (ORDER BY date) AS day_over_day_change
FROM daily_revenue;
```
### NTILE — divide into buckets
```sql
-- Split customers into quartiles by total spend
SELECT customer_id, total_spend,
NTILE(4) OVER (ORDER BY total_spend DESC) AS spend_quartile
FROM customer_summary;
```
### FIRST_VALUE / LAST_VALUE
```sql
SELECT department_id, name, salary,
FIRST_VALUE(name) OVER (PARTITION BY department_id ORDER BY salary DESC) AS highest_paid
FROM employees;
```
---
## Subquery Patterns
### EXISTS — correlated existence check
```sql
-- Users who have placed at least one order
SELECT u.* FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
```
### NOT EXISTS — safer than NOT IN for NULLs
```sql
-- Users who have never ordered
SELECT u.* FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
```
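The NULL trap is easy to reproduce. In this `sqlite3` sketch (invented data, one order with a NULL `user_id`), `NOT IN` silently returns nothing while `NOT EXISTS` finds the user who never ordered:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1), (2);
    INSERT INTO orders VALUES (10, 1), (11, NULL);
""")

# NOT IN: the NULL in the subquery makes every comparison UNKNOWN,
# so the predicate is never true and no rows come back
not_in = conn.execute(
    "SELECT id FROM users WHERE id NOT IN (SELECT user_id FROM orders)"
).fetchall()

# NOT EXISTS: the NULL row simply never matches, so user 2 is found
not_exists = conn.execute(
    "SELECT id FROM users WHERE NOT EXISTS "
    "(SELECT 1 FROM orders o WHERE o.user_id = users.id)"
).fetchall()
```

`not_in` is empty even though user 2 has no orders; `not_exists` correctly returns user 2.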
### Scalar subquery — single value
```sql
SELECT name, salary,
salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
```
### Derived table — subquery in FROM
```sql
SELECT dept, avg_salary
FROM (
SELECT department_id AS dept, AVG(salary) AS avg_salary
FROM employees GROUP BY department_id
) dept_avg
WHERE avg_salary > 100000;
```
---
## Aggregation Patterns
### GROUP BY with HAVING
```sql
-- Departments with more than 10 employees
SELECT department_id, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 10;
```
### GROUPING SETS — multiple grouping levels
```sql
SELECT region, product_category, SUM(revenue)
FROM sales
GROUP BY GROUPING SETS (
(region, product_category),
(region),
(product_category),
()
);
```
### ROLLUP — hierarchical subtotals
```sql
SELECT region, city, SUM(revenue)
FROM sales
GROUP BY ROLLUP (region, city);
-- Produces: (region, city), (region), ()
```
### CUBE — all combinations
```sql
SELECT region, product, SUM(revenue)
FROM sales
GROUP BY CUBE (region, product);
```
### FILTER clause (PostgreSQL) — conditional aggregation
```sql
SELECT
COUNT(*) AS total,
COUNT(*) FILTER (WHERE status = 'paid') AS paid,
COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled,
SUM(total) FILTER (WHERE status = 'paid') AS paid_revenue
FROM orders;
```
MySQL/SQL Server equivalent: `SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END)`.
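The portable `CASE` form can be sanity-checked with Python's stdlib `sqlite3` (invented orders; SQLite 3.30+ also supports `FILTER`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL);
    INSERT INTO orders VALUES
        (1, 'paid', 10), (2, 'paid', 20), (3, 'cancelled', 5);
""")

# One pass over the table computes the overall count, the paid count,
# and the paid revenue via conditional aggregation
total, paid, paid_revenue = conn.execute("""
    SELECT COUNT(*),
           SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END),
           SUM(CASE WHEN status = 'paid' THEN total ELSE 0 END)
    FROM orders
""").fetchone()
```

Three orders total, two paid, 30.0 in paid revenue, all from a single scan.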
---
## UPSERT Patterns
### PostgreSQL — ON CONFLICT
```sql
INSERT INTO user_settings (user_id, key, value, updated_at)
VALUES (1, 'theme', 'dark', NOW())
ON CONFLICT (user_id, key)
DO UPDATE SET value = EXCLUDED.value, updated_at = EXCLUDED.updated_at;
```
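SQLite (3.24+) adopted the same `ON CONFLICT ... DO UPDATE` syntax, which makes the pattern easy to verify from Python's stdlib `sqlite3` (invented settings table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_settings (
        user_id INTEGER, key TEXT, value TEXT,
        PRIMARY KEY (user_id, key)
    )
""")

upsert = """
    INSERT INTO user_settings (user_id, key, value) VALUES (?, ?, ?)
    ON CONFLICT (user_id, key) DO UPDATE SET value = excluded.value
"""
conn.execute(upsert, (1, "theme", "light"))
conn.execute(upsert, (1, "theme", "dark"))  # second call updates in place

rows = conn.execute("SELECT user_id, key, value FROM user_settings").fetchall()
```

After both calls there is still a single row, holding the last value written.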
### MySQL — ON DUPLICATE KEY
```sql
INSERT INTO user_settings (user_id, key_name, value, updated_at)
VALUES (1, 'theme', 'dark', NOW())
ON DUPLICATE KEY UPDATE value = VALUES(value), updated_at = VALUES(updated_at);
-- MySQL 8.0.20+ deprecates VALUES() in this position; prefer a row alias:
-- INSERT ... VALUES (...) AS new ON DUPLICATE KEY UPDATE value = new.value;
```
### SQL Server — MERGE
```sql
MERGE INTO user_settings AS target
USING (VALUES (1, 'theme', 'dark')) AS source (user_id, key_name, value)
ON target.user_id = source.user_id AND target.key_name = source.key_name
WHEN MATCHED THEN UPDATE SET value = source.value, updated_at = GETDATE()
WHEN NOT MATCHED THEN INSERT (user_id, key_name, value, updated_at)
VALUES (source.user_id, source.key_name, source.value, GETDATE());
```
---
## JSON Operations
### PostgreSQL JSONB
```sql
-- Extract field
SELECT data->>'name' AS name FROM products WHERE data->>'category' = 'electronics';
-- Array contains
SELECT * FROM products WHERE data->'tags' ? 'sale';
-- Update nested field
UPDATE products SET data = jsonb_set(data, '{price}', '29.99') WHERE id = 1;
-- Aggregate into JSON array
SELECT jsonb_agg(jsonb_build_object('id', id, 'name', name)) FROM users;
```
### MySQL JSON
```sql
-- Extract field
SELECT JSON_EXTRACT(data, '$.name') AS name FROM products;
-- Shorthand: SELECT data->>'$.name'
-- Search in array
SELECT * FROM products WHERE JSON_CONTAINS(data->"$.tags", '"sale"');
-- Update
UPDATE products SET data = JSON_SET(data, '$.price', 29.99) WHERE id = 1;
```
---
## Pagination Patterns
### Offset pagination (simple but slow for deep pages)
```sql
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 40;
```
### Keyset pagination (fast, requires ordered unique column)
```sql
-- Page after the last seen id
SELECT * FROM products WHERE id > :last_seen_id ORDER BY id LIMIT 20;
```
### Keyset with composite sort
```sql
SELECT * FROM products
WHERE (created_at, id) < (:last_created_at, :last_id)
ORDER BY created_at DESC, id DESC
LIMIT 20;
```
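A minimal keyset walk with Python's stdlib `sqlite3` (seven invented products, page size 3): each page starts strictly after the last id seen, so no offset scan is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (id, name) VALUES (?, ?)",
    [(i, f"product-{i}") for i in range(1, 8)],
)

def page(last_seen_id, size=3):
    # The index on id lets this seek directly to the page start
    return conn.execute(
        "SELECT id FROM products WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, size),
    ).fetchall()

page1 = [r[0] for r in page(0)]
page2 = [r[0] for r in page(page1[-1])]  # resume after the last id seen
```

`page1` holds ids 1-3 and `page2` holds 4-6; deep pages cost the same as the first one.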
---
## Bulk Operations
### Batch INSERT
```sql
INSERT INTO events (type, payload, created_at) VALUES
('click', '{"page": "/home"}', NOW()),
('view', '{"page": "/pricing"}', NOW()),
('click', '{"page": "/signup"}', NOW());
```
### Batch UPDATE with VALUES
```sql
UPDATE products AS p SET price = v.price
FROM (VALUES (1, 29.99), (2, 49.99), (3, 9.99)) AS v(id, price)
WHERE p.id = v.id;
```
### DELETE with subquery
```sql
DELETE FROM sessions
WHERE user_id IN (SELECT id FROM users WHERE deleted_at IS NOT NULL);
```
### COPY (PostgreSQL bulk load)
```sql
COPY products (name, price, category) FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);
```
---
## Utility Patterns
### Generate series (PostgreSQL)
```sql
-- Fill date gaps
SELECT d::date FROM generate_series('2025-01-01'::date, '2025-12-31', '1 day') d;
```
### Deduplicate rows
```sql
DELETE FROM events a USING events b
WHERE a.id > b.id AND a.user_id = b.user_id AND a.event_type = b.event_type
AND a.created_at = b.created_at;
```
### Pivot (manual)
```sql
SELECT user_id,
SUM(CASE WHEN month = 1 THEN revenue END) AS jan,
SUM(CASE WHEN month = 2 THEN revenue END) AS feb,
SUM(CASE WHEN month = 3 THEN revenue END) AS mar
FROM monthly_revenue
GROUP BY user_id;
```
### Conditional INSERT (skip if exists)
```sql
INSERT INTO tags (name) SELECT 'new-tag'
WHERE NOT EXISTS (SELECT 1 FROM tags WHERE name = 'new-tag');
```


@@ -0,0 +1,442 @@
#!/usr/bin/env python3
"""
Migration Generator
Generates database migration file templates (up/down) from natural-language
schema change descriptions.
Supported operations:
- Add column, drop column, rename column
- Add table, drop table, rename table
- Add index, drop index
- Add constraint, drop constraint
- Change column type
Usage:
python migration_generator.py --change "add email_verified boolean to users" --dialect postgres
python migration_generator.py --change "rename column name to full_name in customers" --format alembic
python migration_generator.py --change "add index on orders(status, created_at)" --output 001_add_index.sql
python migration_generator.py --change "create table reviews with id, user_id, rating, body" --json
"""
import argparse
import json
import re
import sys
import textwrap
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import List, Optional, Tuple
@dataclass
class Migration:
"""A generated migration with up and down scripts."""
description: str
dialect: str
format: str
up: str
down: str
warnings: List[str]
def to_dict(self):
return asdict(self)
# ---------------------------------------------------------------------------
# Change parsers — extract structured intent from natural language
# ---------------------------------------------------------------------------
def parse_add_column(desc: str) -> Optional[dict]:
"""Parse: add <column> <type> to <table>"""
m = re.match(
r'add\s+(?:column\s+)?(\w+)\s+(\w[\w(),.]*)\s+(?:to|on)\s+(\w+)',
desc, re.IGNORECASE,
)
if m:
return {"op": "add_column", "column": m.group(1), "type": m.group(2), "table": m.group(3)}
return None
def parse_drop_column(desc: str) -> Optional[dict]:
"""Parse: drop/remove <column> from <table>"""
m = re.match(
r'(?:drop|remove)\s+(?:column\s+)?(\w+)\s+from\s+(\w+)',
desc, re.IGNORECASE,
)
if m:
return {"op": "drop_column", "column": m.group(1), "table": m.group(2)}
return None
def parse_rename_column(desc: str) -> Optional[dict]:
"""Parse: rename column <old> to <new> in <table>"""
m = re.match(
r'rename\s+column\s+(\w+)\s+to\s+(\w+)\s+in\s+(\w+)',
desc, re.IGNORECASE,
)
if m:
return {"op": "rename_column", "old": m.group(1), "new": m.group(2), "table": m.group(3)}
return None
def parse_add_table(desc: str) -> Optional[dict]:
"""Parse: create table <name> with <col1>, <col2>, ..."""
m = re.match(
r'create\s+table\s+(\w+)\s+with\s+(.+)',
desc, re.IGNORECASE,
)
if m:
cols = [c.strip() for c in m.group(2).split(",")]
return {"op": "add_table", "table": m.group(1), "columns": cols}
return None
def parse_drop_table(desc: str) -> Optional[dict]:
"""Parse: drop table <name>"""
m = re.match(r'drop\s+table\s+(\w+)', desc, re.IGNORECASE)
if m:
return {"op": "drop_table", "table": m.group(1)}
return None
def parse_add_index(desc: str) -> Optional[dict]:
"""Parse: add index on <table>(<col1>, <col2>)"""
m = re.match(
r'add\s+(?:unique\s+)?index\s+(?:on\s+)?(\w+)\s*\(([^)]+)\)',
desc, re.IGNORECASE,
)
if m:
unique = "unique" in desc.lower()
cols = [c.strip() for c in m.group(2).split(",")]
return {"op": "add_index", "table": m.group(1), "columns": cols, "unique": unique}
return None
def parse_change_type(desc: str) -> Optional[dict]:
"""Parse: change <column> type to <type> in <table>"""
m = re.match(
r'change\s+(?:column\s+)?(\w+)\s+type\s+to\s+(\w[\w(),.]*)\s+in\s+(\w+)',
desc, re.IGNORECASE,
)
if m:
return {"op": "change_type", "column": m.group(1), "new_type": m.group(2), "table": m.group(3)}
return None
PARSERS = [
parse_add_column,
parse_drop_column,
parse_rename_column,
parse_add_table,
parse_drop_table,
parse_add_index,
parse_change_type,
]
def parse_change(desc: str) -> Optional[dict]:
for parser in PARSERS:
result = parser(desc)
if result:
return result
return None
# ---------------------------------------------------------------------------
# SQL generators per dialect
# ---------------------------------------------------------------------------
TYPE_MAP = {
"boolean": {"postgres": "BOOLEAN", "mysql": "TINYINT(1)", "sqlite": "INTEGER", "sqlserver": "BIT"},
"text": {"postgres": "TEXT", "mysql": "TEXT", "sqlite": "TEXT", "sqlserver": "NVARCHAR(MAX)"},
"integer": {"postgres": "INTEGER", "mysql": "INT", "sqlite": "INTEGER", "sqlserver": "INT"},
"int": {"postgres": "INTEGER", "mysql": "INT", "sqlite": "INTEGER", "sqlserver": "INT"},
"serial": {"postgres": "SERIAL", "mysql": "INT AUTO_INCREMENT", "sqlite": "INTEGER", "sqlserver": "INT IDENTITY(1,1)"},
"varchar": {"postgres": "VARCHAR(255)", "mysql": "VARCHAR(255)", "sqlite": "TEXT", "sqlserver": "NVARCHAR(255)"},
"timestamp": {"postgres": "TIMESTAMP", "mysql": "DATETIME", "sqlite": "TEXT", "sqlserver": "DATETIME2"},
"uuid": {"postgres": "UUID", "mysql": "CHAR(36)", "sqlite": "TEXT", "sqlserver": "UNIQUEIDENTIFIER"},
"json": {"postgres": "JSONB", "mysql": "JSON", "sqlite": "TEXT", "sqlserver": "NVARCHAR(MAX)"},
"decimal": {"postgres": "DECIMAL(19,4)", "mysql": "DECIMAL(19,4)", "sqlite": "REAL", "sqlserver": "DECIMAL(19,4)"},
"float": {"postgres": "DOUBLE PRECISION", "mysql": "DOUBLE", "sqlite": "REAL", "sqlserver": "FLOAT"},
}
def map_type(type_name: str, dialect: str) -> str:
"""Map a generic type name to a dialect-specific type."""
key = type_name.lower().rstrip("()")
if key in TYPE_MAP and dialect in TYPE_MAP[key]:
return TYPE_MAP[key][dialect]
return type_name.upper()
def gen_add_column(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
col_type = map_type(change["type"], dialect)
table = change["table"]
col = change["column"]
up = f"ALTER TABLE {table} ADD COLUMN {col} {col_type};"
down = f"ALTER TABLE {table} DROP COLUMN {col};"
return up, down, []
def gen_drop_column(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
table = change["table"]
col = change["column"]
up = f"ALTER TABLE {table} DROP COLUMN {col};"
down = f"-- WARNING: Cannot fully reverse DROP COLUMN. Provide the original type.\nALTER TABLE {table} ADD COLUMN {col} TEXT;"
return up, down, ["Down migration uses TEXT as placeholder. Replace with the original column type."]
def gen_rename_column(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
table = change["table"]
old, new = change["old"], change["new"]
warnings = []
if dialect == "postgres":
up = f"ALTER TABLE {table} RENAME COLUMN {old} TO {new};"
down = f"ALTER TABLE {table} RENAME COLUMN {new} TO {old};"
elif dialect == "mysql":
up = f"ALTER TABLE {table} RENAME COLUMN {old} TO {new};"
down = f"ALTER TABLE {table} RENAME COLUMN {new} TO {old};"
elif dialect == "sqlite":
up = f"ALTER TABLE {table} RENAME COLUMN {old} TO {new};"
down = f"ALTER TABLE {table} RENAME COLUMN {new} TO {old};"
warnings.append("SQLite RENAME COLUMN requires version 3.25.0+.")
elif dialect == "sqlserver":
up = f"EXEC sp_rename '{table}.{old}', '{new}', 'COLUMN';"
down = f"EXEC sp_rename '{table}.{new}', '{old}', 'COLUMN';"
else:
up = f"ALTER TABLE {table} RENAME COLUMN {old} TO {new};"
down = f"ALTER TABLE {table} RENAME COLUMN {new} TO {old};"
return up, down, warnings
def gen_add_table(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
table = change["table"]
cols = change["columns"]
col_defs = []
has_id = False
for col in cols:
col = col.strip()
if col.lower() == "id":
has_id = True
if dialect == "postgres":
col_defs.append(" id SERIAL PRIMARY KEY")
elif dialect == "mysql":
col_defs.append(" id INT AUTO_INCREMENT PRIMARY KEY")
elif dialect == "sqlite":
col_defs.append(" id INTEGER PRIMARY KEY AUTOINCREMENT")
elif dialect == "sqlserver":
col_defs.append(" id INT IDENTITY(1,1) PRIMARY KEY")
else:
# Check if type is specified (e.g., "rating int")
parts = col.split()
if len(parts) >= 2:
col_defs.append(f" {parts[0]} {map_type(parts[1], dialect)}")
else:
col_defs.append(f" {col} TEXT")
cols_sql = ",\n".join(col_defs)
up = f"CREATE TABLE {table} (\n{cols_sql}\n);"
down = f"DROP TABLE {table};"
warnings = []
if not has_id:
warnings.append("Table has no explicit primary key. Consider adding an 'id' column.")
return up, down, warnings
def gen_drop_table(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
table = change["table"]
up = f"DROP TABLE {table};"
down = f"-- WARNING: Cannot reverse DROP TABLE without original DDL.\nCREATE TABLE {table} (id INTEGER PRIMARY KEY);"
return up, down, ["Down migration is a placeholder. Replace with the original CREATE TABLE statement."]
def gen_add_index(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
table = change["table"]
cols = change["columns"]
unique = "UNIQUE " if change.get("unique") else ""
idx_name = f"idx_{table}_{'_'.join(cols)}"
if dialect == "postgres":
up = f"CREATE {unique}INDEX CONCURRENTLY {idx_name} ON {table} ({', '.join(cols)});"
else:
up = f"CREATE {unique}INDEX {idx_name} ON {table} ({', '.join(cols)});"
down = f"DROP INDEX {idx_name};" if dialect != "mysql" else f"DROP INDEX {idx_name} ON {table};"
warnings = []
if dialect == "postgres":
warnings.append("CONCURRENTLY cannot run inside a transaction. Run outside migration transaction.")
return up, down, warnings
def gen_change_type(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
table = change["table"]
col = change["column"]
new_type = map_type(change["new_type"], dialect)
warnings = ["Down migration uses TEXT as placeholder. Replace with the original column type."]
if dialect == "postgres":
up = f"ALTER TABLE {table} ALTER COLUMN {col} TYPE {new_type};"
down = f"ALTER TABLE {table} ALTER COLUMN {col} TYPE TEXT;"
elif dialect == "mysql":
up = f"ALTER TABLE {table} MODIFY COLUMN {col} {new_type};"
down = f"ALTER TABLE {table} MODIFY COLUMN {col} TEXT;"
elif dialect == "sqlserver":
up = f"ALTER TABLE {table} ALTER COLUMN {col} {new_type};"
down = f"ALTER TABLE {table} ALTER COLUMN {col} NVARCHAR(MAX);"
else:
up = f"-- SQLite does not support ALTER COLUMN. Recreate the table."
down = f"-- SQLite does not support ALTER COLUMN. Recreate the table."
warnings.append("SQLite requires table recreation for type changes.")
return up, down, warnings
GENERATORS = {
"add_column": gen_add_column,
"drop_column": gen_drop_column,
"rename_column": gen_rename_column,
"add_table": gen_add_table,
"drop_table": gen_drop_table,
"add_index": gen_add_index,
"change_type": gen_change_type,
}
# ---------------------------------------------------------------------------
# Format wrappers
# ---------------------------------------------------------------------------
def wrap_sql(up: str, down: str, description: str) -> Tuple[str, str]:
"""Wrap as plain SQL migration files."""
header = f"-- Migration: {description}\n-- Generated: {datetime.now().isoformat()}\n\n"
return header + "-- Up\n" + up, header + "-- Down\n" + down
def wrap_prisma(up: str, down: str, description: str) -> Tuple[str, str]:
"""Format as Prisma migration SQL (Prisma uses raw SQL in migration.sql)."""
header = f"-- Migration: {description}\n-- Format: Prisma (migration.sql)\n\n"
return header + up, header + "-- Rollback\n" + down
def wrap_alembic(up: str, down: str, description: str) -> Tuple[str, str]:
"""Format as Alembic Python migration."""
slug = re.sub(r'\W+', '_', description.lower())[:40]
revision = datetime.now().strftime("%Y%m%d%H%M")
template = textwrap.dedent(f'''\
"""
{description}
Revision ID: {revision}
"""
from alembic import op
import sqlalchemy as sa
revision = '{revision}'
down_revision = None # Set to previous revision
def upgrade():
op.execute("""
{textwrap.indent(up, " ")}
""")
def downgrade():
op.execute("""
{textwrap.indent(down, " ")}
""")
''')
return template, ""
FORMATTERS = {
"sql": wrap_sql,
"prisma": wrap_prisma,
"alembic": wrap_alembic,
}
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def main():
parser = argparse.ArgumentParser(
description="Generate database migration templates from change descriptions.",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Supported change descriptions:
"add email_verified boolean to users"
"drop column legacy_flag from accounts"
"rename column name to full_name in customers"
"create table reviews with id, user_id, rating int, body text"
"drop table temp_imports"
"add index on orders(status, created_at)"
"add unique index on users(email)"
"change email type to varchar in users"
Examples:
%(prog)s --change "add phone varchar to users" --dialect postgres
%(prog)s --change "create table reviews with id, user_id, rating int, body" --format prisma
%(prog)s --change "add index on orders(status)" --output migrations/001.sql --json
""",
)
parser.add_argument("--change", required=True, help="Natural-language description of the schema change")
parser.add_argument("--dialect", choices=["postgres", "mysql", "sqlite", "sqlserver"],
default="postgres", help="Target database dialect (default: postgres)")
parser.add_argument("--format", choices=["sql", "prisma", "alembic"], default="sql",
dest="fmt", help="Output format (default: sql)")
parser.add_argument("--output", help="Write migration to file instead of stdout")
parser.add_argument("--json", action="store_true", dest="json_output", help="Output as JSON")
args = parser.parse_args()
change = parse_change(args.change)
if not change:
print(f"Error: Could not parse change description: '{args.change}'", file=sys.stderr)
print("Run with --help to see supported patterns.", file=sys.stderr)
sys.exit(1)
gen_fn = GENERATORS.get(change["op"])
if not gen_fn:
print(f"Error: No generator for operation '{change['op']}'", file=sys.stderr)
sys.exit(1)
up, down, warnings = gen_fn(change, args.dialect)
fmt_fn = FORMATTERS[args.fmt]
up_formatted, down_formatted = fmt_fn(up, down, args.change)
migration = Migration(
description=args.change,
dialect=args.dialect,
format=args.fmt,
up=up_formatted,
down=down_formatted,
warnings=warnings,
)
if args.json_output:
print(json.dumps(migration.to_dict(), indent=2))
else:
if args.output:
with open(args.output, "w") as f:
f.write(migration.up)
print(f"Migration written to {args.output}")
if migration.down:
down_path = args.output.replace(".sql", "_down.sql")
with open(down_path, "w") as f:
f.write(migration.down)
print(f"Rollback written to {down_path}")
else:
print(migration.up)
if migration.down:
print("\n" + "=" * 40 + " ROLLBACK " + "=" * 40 + "\n")
print(migration.down)
if warnings:
print("\nWarnings:")
for w in warnings:
print(f" - {w}")
if __name__ == "__main__":
main()


@@ -0,0 +1,348 @@
#!/usr/bin/env python3
"""
SQL Query Optimizer — Static Analysis
Analyzes SQL queries for common performance issues:
- SELECT * usage
- Missing WHERE clauses on UPDATE/DELETE
- Cartesian joins (missing JOIN conditions)
- Subqueries in SELECT list
- Missing LIMIT on unbounded SELECTs
- Function calls on indexed columns (non-sargable)
- LIKE with leading wildcard
- ORDER BY RAND()
- UNION instead of UNION ALL
- NOT IN with subquery (NULL-unsafe)
Usage:
python query_optimizer.py --query "SELECT * FROM users"
python query_optimizer.py --query queries.sql --dialect postgres
python query_optimizer.py --query "SELECT * FROM orders" --json
"""
import argparse
import json
import os
import re
import sys
from dataclasses import dataclass, asdict
from typing import List, Optional
@dataclass
class Issue:
"""A single optimization issue found in a query."""
severity: str # critical, warning, info
rule: str
message: str
suggestion: str
line: Optional[int] = None
@dataclass
class QueryAnalysis:
"""Analysis result for one SQL query."""
query: str
issues: List[Issue]
score: int # 0-100, higher is better
def to_dict(self):
return {
"query": self.query[:200] + ("..." if len(self.query) > 200 else ""),
"issues": [asdict(i) for i in self.issues],
"issue_count": len(self.issues),
"score": self.score,
}
# ---------------------------------------------------------------------------
# Rule checkers
# ---------------------------------------------------------------------------
def check_select_star(sql: str) -> Optional[Issue]:
"""Detect SELECT * usage."""
if re.search(r'\bSELECT\s+\*(?:\s|,|$)', sql, re.IGNORECASE):
return Issue(
severity="warning",
rule="select-star",
message="SELECT * transfers unnecessary data and breaks on schema changes.",
suggestion="List only the columns you need: SELECT col1, col2, ...",
)
return None
def check_missing_where(sql: str) -> Optional[Issue]:
"""Detect UPDATE/DELETE without WHERE."""
upper = sql.upper().strip()
for keyword in ("UPDATE", "DELETE"):
if upper.startswith(keyword) and "WHERE" not in upper:
return Issue(
severity="critical",
rule="missing-where",
message=f"{keyword} without WHERE affects every row in the table.",
suggestion=f"Add a WHERE clause to restrict the {keyword} scope.",
)
return None
def check_cartesian_join(sql: str) -> Optional[Issue]:
"""Detect comma-separated tables without explicit JOIN or WHERE join condition."""
upper = sql.upper()
if "SELECT" not in upper:
return None
from_match = re.search(r'\bFROM\s+(.+?)(?:\bWHERE\b|\bGROUP\b|\bORDER\b|\bLIMIT\b|\bHAVING\b|;|$)',
sql, re.IGNORECASE | re.DOTALL)
if not from_match:
return None
from_clause = from_match.group(1)
# Skip if explicit JOINs are used
if re.search(r'\bJOIN\b', from_clause, re.IGNORECASE):
return None
# Count comma-separated tables
tables = [t.strip() for t in from_clause.split(",") if t.strip()]
if len(tables) > 1 and "WHERE" not in upper:
return Issue(
severity="critical",
rule="cartesian-join",
message="Multiple tables in FROM without JOIN or WHERE creates a cartesian product.",
suggestion="Use explicit JOIN syntax with ON conditions.",
)
return None
def check_subquery_in_select(sql: str) -> Optional[Issue]:
"""Detect correlated subqueries in SELECT list."""
select_match = re.search(r'\bSELECT\b(.+?)\bFROM\b', sql, re.IGNORECASE | re.DOTALL)
if select_match:
select_clause = select_match.group(1)
if re.search(r'\(\s*SELECT\b', select_clause, re.IGNORECASE):
return Issue(
severity="warning",
rule="subquery-in-select",
message="Subquery in SELECT list executes once per row (correlated subquery).",
suggestion="Rewrite as a LEFT JOIN with aggregation.",
)
return None
def check_missing_limit(sql: str) -> Optional[Issue]:
"""Detect unbounded SELECT without LIMIT."""
upper = sql.upper().strip()
if not upper.startswith("SELECT"):
return None
# Skip aggregate-only queries: they return a single row regardless
if re.search(r'\b(COUNT|SUM|AVG|MIN|MAX)\s*\(', upper) and "GROUP BY" not in upper:
return None
if "LIMIT" not in upper and "FETCH" not in upper and "TOP " not in upper:
return Issue(
severity="info",
rule="missing-limit",
message="SELECT without LIMIT may return unbounded rows.",
suggestion="Add LIMIT to prevent returning excessive data.",
)
return None
def check_function_on_column(sql: str) -> Optional[Issue]:
"""Detect function calls on columns in WHERE (non-sargable)."""
where_match = re.search(r'\bWHERE\b(.+?)(?:\bGROUP\b|\bORDER\b|\bLIMIT\b|\bHAVING\b|;|$)',
sql, re.IGNORECASE | re.DOTALL)
if not where_match:
return None
where_clause = where_match.group(1)
non_sargable = re.search(
r'\b(YEAR|MONTH|DAY|DATE|UPPER|LOWER|TRIM|CAST|COALESCE|IFNULL|NVL)\s*\(',
where_clause, re.IGNORECASE
)
if non_sargable:
func = non_sargable.group(1).upper()
return Issue(
severity="warning",
rule="non-sargable",
message=f"Function {func}() on column in WHERE prevents index usage.",
suggestion="Rewrite to compare the raw column against transformed constants.",
)
return None
def check_leading_wildcard(sql: str) -> Optional[Issue]:
"""Detect LIKE '%...' patterns."""
if re.search(r"LIKE\s+'%", sql, re.IGNORECASE):
return Issue(
severity="warning",
rule="leading-wildcard",
message="LIKE with leading wildcard prevents index usage.",
suggestion="Use full-text search (GIN index, FULLTEXT, FTS5) for substring matching.",
)
return None
def check_order_by_rand(sql: str) -> Optional[Issue]:
"""Detect ORDER BY RAND() / RANDOM()."""
if re.search(r'ORDER\s+BY\s+(RAND|RANDOM)\s*\(\)', sql, re.IGNORECASE):
return Issue(
severity="warning",
rule="order-by-rand",
message="ORDER BY RAND() scans and sorts the entire table.",
suggestion="Use application-side random sampling or TABLESAMPLE.",
)
return None
def check_union_vs_union_all(sql: str) -> Optional[Issue]:
"""Detect UNION without ALL (unnecessary dedup)."""
if re.search(r'\bUNION\b(?!\s+ALL\b)', sql, re.IGNORECASE):
return Issue(
severity="info",
rule="union-without-all",
message="UNION performs deduplication sort; use UNION ALL if duplicates are acceptable.",
suggestion="Replace UNION with UNION ALL unless you specifically need deduplication.",
)
return None
def check_not_in_subquery(sql: str) -> Optional[Issue]:
"""Detect NOT IN (SELECT ...) which is NULL-unsafe."""
if re.search(r'\bNOT\s+IN\s*\(\s*SELECT\b', sql, re.IGNORECASE):
return Issue(
severity="warning",
rule="not-in-subquery",
message="NOT IN with subquery returns no rows if any subquery result is NULL.",
suggestion="Use NOT EXISTS (SELECT 1 ...) instead.",
)
return None
ALL_CHECKS = [
check_select_star,
check_missing_where,
check_cartesian_join,
check_subquery_in_select,
check_missing_limit,
check_function_on_column,
check_leading_wildcard,
check_order_by_rand,
check_union_vs_union_all,
check_not_in_subquery,
]
# ---------------------------------------------------------------------------
# Analysis engine
# ---------------------------------------------------------------------------
def analyze_query(sql: str, dialect: str = "postgres") -> QueryAnalysis:
"""Run all checks against a single SQL query."""
issues: List[Issue] = []
for check_fn in ALL_CHECKS:
issue = check_fn(sql)
if issue:
issues.append(issue)
# Score: start at 100, deduct per severity
score = 100
for issue in issues:
if issue.severity == "critical":
score -= 25
elif issue.severity == "warning":
score -= 10
else:
score -= 5
score = max(0, score)
return QueryAnalysis(query=sql.strip(), issues=issues, score=score)
def split_queries(text: str) -> List[str]:
"""Split SQL text into individual statements."""
queries = []
for stmt in text.split(";"):
stmt = stmt.strip()
if stmt and len(stmt) > 5:
queries.append(stmt + ";")
return queries
# ---------------------------------------------------------------------------
# Output formatting
# ---------------------------------------------------------------------------
SEVERITY_ICONS = {"critical": "[CRITICAL]", "warning": "[WARNING]", "info": "[INFO]"}
def format_text(analyses: List[QueryAnalysis]) -> str:
"""Format analysis results as human-readable text."""
lines = []
for i, analysis in enumerate(analyses, 1):
lines.append(f"{'='*60}")
lines.append(f"Query {i} (Score: {analysis.score}/100)")
lines.append(f" {analysis.query[:120]}{'...' if len(analysis.query) > 120 else ''}")
lines.append("")
if not analysis.issues:
lines.append(" No issues detected.")
for issue in analysis.issues:
icon = SEVERITY_ICONS.get(issue.severity, "")
lines.append(f" {icon} {issue.rule}: {issue.message}")
lines.append(f" -> {issue.suggestion}")
lines.append("")
return "\n".join(lines)
def format_json(analyses: List[QueryAnalysis]) -> str:
"""Format analysis results as JSON."""
return json.dumps(
{"analyses": [a.to_dict() for a in analyses], "total_queries": len(analyses)},
indent=2,
)
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def main():
parser = argparse.ArgumentParser(
description="Analyze SQL queries for common performance issues.",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
%(prog)s --query "SELECT * FROM users"
%(prog)s --query queries.sql --dialect mysql
%(prog)s --query "DELETE FROM orders" --json
""",
)
parser.add_argument(
"--query", required=True,
help="SQL query string or path to a .sql file",
)
parser.add_argument(
"--dialect", choices=["postgres", "mysql", "sqlite", "sqlserver"],
default="postgres", help="SQL dialect (default: postgres)",
)
parser.add_argument(
"--json", action="store_true", dest="json_output",
help="Output results as JSON",
)
args = parser.parse_args()
# Determine if query is a file path or inline SQL
sql_text = args.query
if os.path.isfile(args.query):
with open(args.query, "r") as f:
sql_text = f.read()
queries = split_queries(sql_text)
if not queries:
# Treat the whole input as a single query
queries = [sql_text.strip()]
analyses = [analyze_query(q, args.dialect) for q in queries]
if args.json_output:
print(format_json(analyses))
else:
print(format_text(analyses))
if __name__ == "__main__":
main()
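The `not-in-subquery` rule is worth seeing live. A self-contained stdlib `sqlite3` snippet demonstrating why the check flags it: one NULL in the subquery result makes `NOT IN` return nothing, while the suggested `NOT EXISTS` rewrite behaves as intended:

```python
import sqlite3

# One order row has a NULL user_id, mimicking an optional FK.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (user_id INTEGER);
    INSERT INTO users (id) VALUES (1), (2), (3);
    INSERT INTO orders (user_id) VALUES (1), (NULL);
""")
not_in = conn.execute(
    "SELECT id FROM users WHERE id NOT IN (SELECT user_id FROM orders)"
).fetchall()
not_exists = conn.execute(
    "SELECT id FROM users WHERE NOT EXISTS "
    "(SELECT 1 FROM orders o WHERE o.user_id = users.id)"
).fetchall()
print(not_in)      # [] -- the NULL makes every NOT IN comparison unknown
print(not_exists)  # [(2,), (3,)]
```

Users 2 and 3 have no orders, yet `NOT IN` excludes them too, which is exactly the silent failure the rule warns about.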


@@ -0,0 +1,315 @@
#!/usr/bin/env python3
"""
Schema Explorer
Generates schema documentation from database introspection queries.
Outputs the introspection SQL and sample documentation templates
for PostgreSQL, MySQL, SQLite, and SQL Server.
Since this tool runs without a live database connection, it generates:
1. The introspection queries you need to run
2. Documentation templates from the results
3. Sample schema docs for common table patterns
Usage:
python schema_explorer.py --dialect postgres --tables all --format md
python schema_explorer.py --dialect mysql --tables users,orders --format json
python schema_explorer.py --dialect sqlite --tables all --json
"""
import argparse
import json
import sys
import textwrap
from dataclasses import dataclass, asdict
from typing import List, Optional, Dict
# ---------------------------------------------------------------------------
# Introspection query templates per dialect
# ---------------------------------------------------------------------------
INTROSPECTION_QUERIES: Dict[str, Dict[str, str]] = {
"postgres": {
"tables": textwrap.dedent("""\
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public' AND table_type = 'BASE TABLE'
ORDER BY table_name;"""),
"columns": textwrap.dedent("""\
SELECT table_name, column_name, data_type, character_maximum_length,
is_nullable, column_default
FROM information_schema.columns
WHERE table_schema = 'public' {table_filter}
ORDER BY table_name, ordinal_position;"""),
"primary_keys": textwrap.dedent("""\
SELECT tc.table_name, kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
WHERE tc.constraint_type = 'PRIMARY KEY' AND tc.table_schema = 'public'
ORDER BY tc.table_name;"""),
"foreign_keys": textwrap.dedent("""\
SELECT tc.table_name, kcu.column_name,
ccu.table_name AS foreign_table, ccu.column_name AS foreign_column
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
ON tc.constraint_name = ccu.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
ORDER BY tc.table_name;"""),
"indexes": textwrap.dedent("""\
SELECT schemaname, tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY tablename, indexname;"""),
"table_sizes": textwrap.dedent("""\
SELECT relname AS table_name,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
pg_size_pretty(pg_relation_size(relid)) AS data_size,
pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS index_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;"""),
},
"mysql": {
"tables": textwrap.dedent("""\
SELECT table_name
FROM information_schema.tables
WHERE table_schema = DATABASE() AND table_type = 'BASE TABLE'
ORDER BY table_name;"""),
"columns": textwrap.dedent("""\
SELECT table_name, column_name, column_type, is_nullable,
column_default, column_key, extra
FROM information_schema.columns
WHERE table_schema = DATABASE() {table_filter}
ORDER BY table_name, ordinal_position;"""),
"foreign_keys": textwrap.dedent("""\
SELECT table_name, column_name, referenced_table_name, referenced_column_name
FROM information_schema.key_column_usage
WHERE table_schema = DATABASE() AND referenced_table_name IS NOT NULL
ORDER BY table_name;"""),
"indexes": textwrap.dedent("""\
SELECT table_name, index_name, non_unique, column_name, seq_in_index
FROM information_schema.statistics
WHERE table_schema = DATABASE()
ORDER BY table_name, index_name, seq_in_index;"""),
"table_sizes": textwrap.dedent("""\
SELECT table_name, table_rows,
ROUND(data_length / 1024 / 1024, 2) AS data_mb,
ROUND(index_length / 1024 / 1024, 2) AS index_mb
FROM information_schema.tables
WHERE table_schema = DATABASE()
ORDER BY data_length DESC;"""),
},
"sqlite": {
"tables": textwrap.dedent("""\
SELECT name FROM sqlite_master
WHERE type = 'table' AND name NOT LIKE 'sqlite_%'
ORDER BY name;"""),
"columns": textwrap.dedent("""\
-- Run for each table:
PRAGMA table_info({table_name});"""),
"foreign_keys": textwrap.dedent("""\
-- Run for each table:
PRAGMA foreign_key_list({table_name});"""),
"indexes": textwrap.dedent("""\
SELECT name, tbl_name, sql FROM sqlite_master
WHERE type = 'index'
ORDER BY tbl_name, name;"""),
"schema_dump": textwrap.dedent("""\
SELECT name, sql FROM sqlite_master
WHERE type = 'table'
ORDER BY name;"""),
},
"sqlserver": {
"tables": textwrap.dedent("""\
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
ORDER BY TABLE_NAME;"""),
"columns": textwrap.dedent("""\
SELECT t.name AS table_name, c.name AS column_name,
ty.name AS data_type, c.max_length, c.precision, c.scale,
c.is_nullable, dc.definition AS default_value
FROM sys.columns c
JOIN sys.tables t ON c.object_id = t.object_id
JOIN sys.types ty ON c.user_type_id = ty.user_type_id
LEFT JOIN sys.default_constraints dc ON c.default_object_id = dc.object_id
{table_filter}
ORDER BY t.name, c.column_id;"""),
"foreign_keys": textwrap.dedent("""\
SELECT fk.name AS fk_name,
tp.name AS parent_table, cp.name AS parent_column,
tr.name AS referenced_table, cr.name AS referenced_column
FROM sys.foreign_keys fk
JOIN sys.foreign_key_columns fkc ON fk.object_id = fkc.constraint_object_id
JOIN sys.tables tp ON fkc.parent_object_id = tp.object_id
JOIN sys.columns cp ON fkc.parent_object_id = cp.object_id AND fkc.parent_column_id = cp.column_id
JOIN sys.tables tr ON fkc.referenced_object_id = tr.object_id
JOIN sys.columns cr ON fkc.referenced_object_id = cr.object_id AND fkc.referenced_column_id = cr.column_id
ORDER BY tp.name;"""),
"indexes": textwrap.dedent("""\
SELECT t.name AS table_name, i.name AS index_name,
i.type_desc, i.is_unique, c.name AS column_name,
ic.key_ordinal
FROM sys.indexes i
JOIN sys.index_columns ic ON i.object_id = ic.object_id AND i.index_id = ic.index_id
JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
JOIN sys.tables t ON i.object_id = t.object_id
WHERE i.name IS NOT NULL
ORDER BY t.name, i.name, ic.key_ordinal;"""),
},
}
# ---------------------------------------------------------------------------
# Documentation generators
# ---------------------------------------------------------------------------
SAMPLE_TABLES = {
"users": {
"columns": [
{"name": "id", "type": "SERIAL / INT", "nullable": "NO", "default": "auto", "notes": "Primary key"},
{"name": "email", "type": "VARCHAR(255)", "nullable": "NO", "default": "-", "notes": "Unique, indexed"},
{"name": "name", "type": "VARCHAR(255)", "nullable": "YES", "default": "NULL", "notes": "Display name"},
{"name": "password_hash", "type": "VARCHAR(255)", "nullable": "NO", "default": "-", "notes": "bcrypt hash"},
{"name": "created_at", "type": "TIMESTAMP", "nullable": "NO", "default": "NOW()", "notes": ""},
{"name": "updated_at", "type": "TIMESTAMP", "nullable": "NO", "default": "NOW()", "notes": ""},
],
"indexes": ["PRIMARY KEY (id)", "UNIQUE INDEX (email)"],
"foreign_keys": [],
},
"orders": {
"columns": [
{"name": "id", "type": "SERIAL / INT", "nullable": "NO", "default": "auto", "notes": "Primary key"},
{"name": "user_id", "type": "INTEGER", "nullable": "NO", "default": "-", "notes": "FK -> users.id"},
{"name": "status", "type": "VARCHAR(50)", "nullable": "NO", "default": "'pending'", "notes": "pending/paid/shipped/cancelled"},
{"name": "total", "type": "DECIMAL(19,4)", "nullable": "NO", "default": "0", "notes": "Order total in cents"},
{"name": "created_at", "type": "TIMESTAMP", "nullable": "NO", "default": "NOW()", "notes": ""},
],
"indexes": ["PRIMARY KEY (id)", "INDEX (user_id)", "INDEX (status, created_at)"],
"foreign_keys": ["user_id -> users.id ON DELETE CASCADE"],
},
}
def generate_md(dialect: str, tables: List[str]) -> str:
"""Generate markdown schema documentation."""
lines = [f"# Database Schema Documentation ({dialect.upper()})\n"]
lines.append("Generated by sql-database-assistant schema_explorer.\n")
# Introspection queries section
lines.append("## Introspection Queries\n")
lines.append("Run these queries against your database to extract schema information:\n")
queries = INTROSPECTION_QUERIES.get(dialect, {})
for qname, qsql in queries.items():
table_filter = ""
if "all" not in tables:
tlist = ", ".join(f"'{t}'" for t in tables)
# SQL Server's sys.columns template has no WHERE clause and aliases the
# table name as t.name, so it needs a full WHERE instead of an AND.
table_filter = f"WHERE t.name IN ({tlist})" if dialect == "sqlserver" else f"AND table_name IN ({tlist})"
qsql = qsql.replace("{table_filter}", table_filter)
qsql = qsql.replace("{table_name}", tables[0] if tables and tables[0] != "all" else "TABLE_NAME")
lines.append(f"### {qname.replace('_', ' ').title()}\n")
lines.append(f"```sql\n{qsql}\n```\n")
# Sample documentation
lines.append("## Sample Table Documentation\n")
lines.append("Below is an example of the documentation format produced from query results:\n")
show_tables = tables if "all" not in tables else list(SAMPLE_TABLES.keys())
for tname in show_tables:
sample = SAMPLE_TABLES.get(tname)
if not sample:
lines.append(f"### {tname}\n")
lines.append("_No sample data available. Run introspection queries above._\n")
continue
lines.append(f"### {tname}\n")
lines.append("| Column | Type | Nullable | Default | Notes |")
lines.append("|--------|------|----------|---------|-------|")
for col in sample["columns"]:
lines.append(f"| {col['name']} | {col['type']} | {col['nullable']} | {col['default']} | {col['notes']} |")
lines.append("")
if sample["indexes"]:
lines.append("**Indexes:** " + ", ".join(sample["indexes"]))
if sample["foreign_keys"]:
lines.append("**Foreign Keys:** " + ", ".join(sample["foreign_keys"]))
lines.append("")
return "\n".join(lines)
def generate_json_output(dialect: str, tables: List[str]) -> dict:
"""Generate JSON schema documentation."""
queries = INTROSPECTION_QUERIES.get(dialect, {})
processed = {}
for qname, qsql in queries.items():
table_filter = ""
if "all" not in tables:
tlist = ", ".join(f"'{t}'" for t in tables)
# SQL Server's sys.columns template has no WHERE clause (see generate_md).
table_filter = f"WHERE t.name IN ({tlist})" if dialect == "sqlserver" else f"AND table_name IN ({tlist})"
processed[qname] = qsql.replace("{table_filter}", table_filter).replace(
"{table_name}", tables[0] if tables and tables[0] != "all" else "TABLE_NAME"
)
show_tables = tables if "all" not in tables else list(SAMPLE_TABLES.keys())
sample_docs = {}
for tname in show_tables:
sample = SAMPLE_TABLES.get(tname)
if sample:
sample_docs[tname] = sample
return {
"dialect": dialect,
"requested_tables": tables,
"introspection_queries": processed,
"sample_documentation": sample_docs,
"instructions": "Run the introspection queries against your database, then use the results to populate documentation in the sample format shown.",
}
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def main():
parser = argparse.ArgumentParser(
description="Generate schema documentation from database introspection.",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
%(prog)s --dialect postgres --tables all --format md
%(prog)s --dialect mysql --tables users,orders --format json
%(prog)s --dialect sqlite --tables all --json
""",
)
parser.add_argument(
"--dialect", required=True, choices=["postgres", "mysql", "sqlite", "sqlserver"],
help="Target database dialect",
)
parser.add_argument(
"--tables", default="all",
help="Comma-separated table names or 'all' (default: all)",
)
parser.add_argument(
"--format", choices=["md", "json"], default="md", dest="fmt",
help="Output format (default: md)",
)
parser.add_argument(
"--json", action="store_true", dest="json_output",
help="Output as JSON (overrides --format)",
)
args = parser.parse_args()
tables = [t.strip() for t in args.tables.split(",")]
if args.json_output or args.fmt == "json":
result = generate_json_output(args.dialect, tables)
print(json.dumps(result, indent=2))
else:
print(generate_md(args.dialect, tables))
if __name__ == "__main__":
main()
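As a quick sanity check of the SQLite introspection path, the `PRAGMA table_info` query this script emits can be run directly against an in-memory database; each returned row is a `(cid, name, type, notnull, dflt_value, pk)` tuple:

```python
import sqlite3

# Build a tiny schema and introspect it the way the generated docs suggest.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
rows = conn.execute("PRAGMA table_info(users)").fetchall()
# Keep (name, declared type, not-null flag, primary-key flag) per column.
cols = [(r[1], r[2], bool(r[3]), bool(r[5])) for r in rows]
print(cols)  # [('id', 'INTEGER', False, True), ('email', 'TEXT', True, False)]
```

Note that SQLite reports `notnull = 0` for an `INTEGER PRIMARY KEY` column unless `NOT NULL` is declared explicitly, which is why the docs template tracks nullability and key status as separate fields.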