feat(engineering,ra-qm): add secrets-vault-manager, sql-database-assistant, gcp-cloud-architect, soc2-compliance
secrets-vault-manager (403-line SKILL.md, 3 scripts, 3 references):
- HashiCorp Vault, AWS SM, Azure KV, GCP SM integration
- Secret rotation, dynamic secrets, audit logging, emergency procedures

sql-database-assistant (457-line SKILL.md, 3 scripts, 3 references):
- Query optimization, migration generation, schema exploration
- Multi-DB support (PostgreSQL, MySQL, SQLite, SQL Server)
- ORM patterns (Prisma, Drizzle, TypeORM, SQLAlchemy)

gcp-cloud-architect (418-line SKILL.md, 3 scripts, 3 references):
- 6-step workflow mirroring aws-solution-architect for GCP
- Cloud Run, GKE, BigQuery, Cloud Functions, cost optimization
- Completes cloud trifecta (AWS + Azure + GCP)

soc2-compliance (417-line SKILL.md, 3 scripts, 3 references):
- SOC 2 Type I & II preparation, Trust Service Criteria mapping
- Control matrix generation, evidence tracking, gap analysis
- First SOC 2 skill in ra-qm-team (joins GDPR, ISO 27001, ISO 13485)

All 12 scripts pass --help. Docs generated, mkdocs.yml nav updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
engineering/sql-database-assistant/SKILL.md (new file, 457 lines)
@@ -0,0 +1,457 @@
---
name: "sql-database-assistant"
description: "Use when the user asks to write SQL queries, optimize database performance, generate migrations, explore database schemas, or work with ORMs like Prisma, Drizzle, TypeORM, or SQLAlchemy."
---

# SQL Database Assistant - POWERFUL Tier Skill

## Overview

The operational companion to database design. While **database-designer** focuses on schema architecture and **database-schema-designer** handles ERD modeling, this skill covers the day-to-day: writing queries, optimizing performance, generating migrations, and bridging the gap between application code and database engines.

### Core Capabilities
- **Natural Language to SQL** — translate requirements into correct, performant queries
- **Schema Exploration** — introspect live databases across PostgreSQL, MySQL, SQLite, SQL Server
- **Query Optimization** — EXPLAIN analysis, index recommendations, N+1 detection, rewrite patterns
- **Migration Generation** — up/down scripts, zero-downtime strategies, rollback plans
- **ORM Integration** — Prisma, Drizzle, TypeORM, SQLAlchemy patterns and escape hatches
- **Multi-Database Support** — dialect-aware SQL with compatibility guidance

### Tools

| Script | Purpose |
|--------|---------|
| `scripts/query_optimizer.py` | Static analysis of SQL queries for performance issues |
| `scripts/migration_generator.py` | Generate migration file templates from change descriptions |
| `scripts/schema_explorer.py` | Generate schema documentation from introspection queries |

---

## Natural Language to SQL

### Translation Patterns

When converting requirements to SQL, follow this sequence:

1. **Identify entities** — map nouns to tables
2. **Identify relationships** — map verbs to JOINs or subqueries
3. **Identify filters** — map adjectives/conditions to WHERE clauses
4. **Identify aggregations** — map "total", "average", "count" to GROUP BY
5. **Identify ordering** — map "top", "latest", "highest" to ORDER BY + LIMIT
### Common Query Templates

**Top-N per group (window function)**
```sql
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rn
  FROM employees
) ranked WHERE rn <= 3;
```

**Running totals**
```sql
SELECT date, amount,
  SUM(amount) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM transactions;
```

**Gap detection**
```sql
SELECT curr.id, curr.seq_num, prev.seq_num AS prev_seq
FROM records curr
LEFT JOIN records prev ON prev.seq_num = curr.seq_num - 1
WHERE prev.id IS NULL AND curr.seq_num > 1;
```

**UPSERT (PostgreSQL)**
```sql
INSERT INTO settings (key, value, updated_at)
VALUES ('theme', 'dark', NOW())
ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value, updated_at = EXCLUDED.updated_at;
```

**UPSERT (MySQL)**
```sql
INSERT INTO settings (key_name, value, updated_at)
VALUES ('theme', 'dark', NOW())
ON DUPLICATE KEY UPDATE value = VALUES(value), updated_at = VALUES(updated_at);
```

> See references/query_patterns.md for JOINs, CTEs, window functions, JSON operations, and more.

---
## Schema Exploration

### Introspection Queries

**PostgreSQL — list tables and columns**
```sql
SELECT table_name, column_name, data_type, is_nullable, column_default
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
```

**PostgreSQL — foreign keys**
```sql
SELECT tc.table_name, kcu.column_name,
  ccu.table_name AS foreign_table, ccu.column_name AS foreign_column
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu ON tc.constraint_name = ccu.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY';
```

**MySQL — table sizes**
```sql
SELECT table_name, table_rows,
  ROUND(data_length / 1024 / 1024, 2) AS data_mb,
  ROUND(index_length / 1024 / 1024, 2) AS index_mb
FROM information_schema.tables
WHERE table_schema = DATABASE()
ORDER BY data_length DESC;
```

**SQLite — schema dump**
```sql
SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name;
```

**SQL Server — columns with types**
```sql
SELECT t.name AS table_name, c.name AS column_name,
  ty.name AS data_type, c.max_length, c.is_nullable
FROM sys.columns c
JOIN sys.tables t ON c.object_id = t.object_id
JOIN sys.types ty ON c.user_type_id = ty.user_type_id
ORDER BY t.name, c.column_id;
```

### Generating Documentation from Schema

Use `scripts/schema_explorer.py` to produce markdown or JSON documentation:

```bash
python scripts/schema_explorer.py --dialect postgres --tables all --format md
python scripts/schema_explorer.py --dialect mysql --tables users,orders --format json --json
```

---
## Query Optimization

### EXPLAIN Analysis Workflow

1. **Run EXPLAIN ANALYZE** (PostgreSQL) or **EXPLAIN FORMAT=JSON** (MySQL)
2. **Identify the costliest node** — Seq Scan on large tables, Nested Loop with high row estimates
3. **Check for missing indexes** — sequential scans on filtered columns
4. **Look for estimation errors** — planned vs actual rows divergence signals stale statistics
5. **Evaluate JOIN order** — ensure the smallest result set drives the join
### Index Recommendation Checklist

- Columns in WHERE clauses with high selectivity
- Columns in JOIN conditions (foreign keys)
- Columns in ORDER BY when combined with LIMIT
- Composite indexes matching multi-column WHERE predicates (most selective column first)
- Partial indexes for queries with constant filters (e.g., `WHERE status = 'active'`)
- Covering indexes to avoid table lookups for read-heavy queries

### Query Rewriting Patterns

| Anti-Pattern | Rewrite |
|-------------|---------|
| `SELECT * FROM orders` | `SELECT id, status, total FROM orders` (explicit columns) |
| `WHERE YEAR(created_at) = 2025` | `WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'` (sargable) |
| Correlated subquery in SELECT | LEFT JOIN with aggregation |
| `NOT IN (SELECT ...)` with NULLs | `NOT EXISTS (SELECT 1 ...)` |
| `UNION` (dedup) when not needed | `UNION ALL` |
| `LIKE '%search%'` | Full-text search index (GIN/FULLTEXT) |
| `ORDER BY RAND()` | Application-side random sampling or `TABLESAMPLE` |

### N+1 Detection

**Symptoms:**
- Application loop that executes one query per parent row
- ORM lazy-loading related entities inside a loop
- Query log shows hundreds of identical SELECT patterns with different IDs

**Fixes:**
- Use eager loading (`include` in Prisma, `joinedload` in SQLAlchemy)
- Batch queries with `WHERE id IN (...)`
- Use DataLoader pattern for GraphQL resolvers
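The batching fix can be shown without an ORM; a minimal sketch using sqlite3 with hypothetical `users`/`posts` tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO posts VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

user_ids = [row[0] for row in conn.execute("SELECT id FROM users")]

# N+1 anti-pattern: one query per user (1 + N round trips).
posts_n_plus_1 = {
    uid: conn.execute("SELECT title FROM posts WHERE user_id = ?", (uid,)).fetchall()
    for uid in user_ids
}

# Batched fix: a single WHERE user_id IN (...) query, grouped in application code.
placeholders = ",".join("?" * len(user_ids))
rows = conn.execute(
    f"SELECT user_id, title FROM posts WHERE user_id IN ({placeholders})", user_ids
).fetchall()
posts_batched: dict[int, list[str]] = {uid: [] for uid in user_ids}
for uid, title in rows:
    posts_batched[uid].append(title)

print(posts_batched)  # {1: ['a', 'b'], 2: ['c']}
```

Both produce the same grouping, but the batched version issues one query regardless of how many users there are.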
### Static Analysis Tool

```bash
python scripts/query_optimizer.py --query "SELECT * FROM orders WHERE status = 'pending'" --dialect postgres
python scripts/query_optimizer.py --query queries.sql --dialect mysql --json
```

> See references/optimization_guide.md for EXPLAIN plan reading, index types, and connection pooling.

---
## Migration Generation

### Zero-Downtime Migration Patterns

**Adding a column (safe)**
```sql
-- Up
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Down
ALTER TABLE users DROP COLUMN phone;
```

**Renaming a column (expand-contract)**
```sql
-- Step 1: Add new column
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
-- Step 2: Backfill
UPDATE users SET full_name = name;
-- Step 3: Deploy app reading both columns
-- Step 4: Deploy app writing only new column
-- Step 5: Drop old column
ALTER TABLE users DROP COLUMN name;
```

**Adding a NOT NULL column (safe sequence)**
```sql
-- Step 1: Add nullable
ALTER TABLE orders ADD COLUMN region VARCHAR(50);
-- Step 2: Backfill with default
UPDATE orders SET region = 'unknown' WHERE region IS NULL;
-- Step 3: Add constraint
ALTER TABLE orders ALTER COLUMN region SET NOT NULL;
ALTER TABLE orders ALTER COLUMN region SET DEFAULT 'unknown';
```

**Index creation (non-blocking, PostgreSQL)**
```sql
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);
```

### Data Backfill Strategies

- **Batch updates** — process in chunks of 1000-10000 rows to avoid lock contention
- **Background jobs** — run backfills asynchronously with progress tracking
- **Dual-write** — write to old and new columns during transition period
- **Validation queries** — verify row counts and data integrity after each batch
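A minimal sketch of the batch-update strategy, using sqlite3 and the hypothetical `region` backfill from above; each chunk runs in its own transaction so locks are held only briefly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany("INSERT INTO orders (region) VALUES (NULL)", [()] * 25)

BATCH_SIZE = 10  # production values are typically 1000-10000

def backfill_region(conn: sqlite3.Connection) -> int:
    """Backfill NULL regions in batches; returns total rows updated."""
    total = 0
    while True:
        with conn:  # one transaction (and one commit) per batch
            cur = conn.execute(
                "UPDATE orders SET region = 'unknown' "
                "WHERE id IN (SELECT id FROM orders WHERE region IS NULL LIMIT ?)",
                (BATCH_SIZE,),
            )
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total

updated = backfill_region(conn)
remaining = conn.execute("SELECT COUNT(*) FROM orders WHERE region IS NULL").fetchone()[0]
print(updated, remaining)  # 25 0
```

The trailing `SELECT COUNT(*)` is the validation query from the last bullet: after the loop, no NULL rows should remain.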
### Rollback Strategies

Every migration must have a reversible down script. For irreversible changes:

1. **Backup before execution** — `pg_dump` the affected tables
2. **Feature flags** — application can switch between old/new schema reads
3. **Shadow tables** — keep a copy of the original table during migration window

### Migration Generator Tool

```bash
python scripts/migration_generator.py --change "add email_verified boolean to users" --dialect postgres --format sql
python scripts/migration_generator.py --change "rename column name to full_name in customers" --dialect mysql --format alembic --json
```

---

## Multi-Database Support

### Dialect Differences

| Feature | PostgreSQL | MySQL | SQLite | SQL Server |
|---------|-----------|-------|--------|------------|
| UPSERT | `ON CONFLICT DO UPDATE` | `ON DUPLICATE KEY UPDATE` | `ON CONFLICT DO UPDATE` | `MERGE` |
| Boolean | Native `BOOLEAN` | `TINYINT(1)` | `INTEGER` | `BIT` |
| Auto-increment | `SERIAL` / `GENERATED` | `AUTO_INCREMENT` | `INTEGER PRIMARY KEY` | `IDENTITY` |
| JSON | `JSONB` (indexed) | `JSON` | Text (ext) | `NVARCHAR(MAX)` |
| Array | Native `ARRAY` | Not supported | Not supported | Not supported |
| CTE (recursive) | Full support | 8.0+ | 3.8.3+ | Full support |
| Window functions | Full support | 8.0+ | 3.25.0+ | Full support |
| Full-text search | `tsvector` + GIN | `FULLTEXT` index | FTS5 extension | Full-text catalog |
| LIMIT/OFFSET | `LIMIT n OFFSET m` | `LIMIT n OFFSET m` | `LIMIT n OFFSET m` | `OFFSET m ROWS FETCH NEXT n ROWS ONLY` |

### Compatibility Tips

- **Always use parameterized queries** — prevents SQL injection across all dialects
- **Avoid dialect-specific functions in shared code** — wrap in adapter layer
- **Test migrations on target engine** — `information_schema` varies between engines
- **Use ISO date format** — `'YYYY-MM-DD'` works everywhere
- **Quote identifiers** — use double quotes (SQL standard) or backticks (MySQL)
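The first tip in action; a sketch with sqlite3 (qmark placeholders; other drivers use `%s` or `$1`, but the principle is identical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

# Attacker-controlled input that subverts naive string concatenation.
user_input = "' OR '1'='1"

# UNSAFE: f-string interpolation lets the input rewrite the query.
unsafe = conn.execute(
    f"SELECT COUNT(*) FROM users WHERE email = '{user_input}'"
).fetchone()[0]

# SAFE: the driver sends the value separately from the SQL text.
safe = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email = ?", (user_input,)
).fetchone()[0]

print(unsafe)  # 1: the injection matched every row
print(safe)    # 0: the literal string matches nothing
```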
## ORM Patterns

### Prisma

**Schema definition**
```prisma
model User {
  id        Int      @id @default(autoincrement())
  email     String   @unique
  name      String?
  posts     Post[]
  createdAt DateTime @default(now())
}

model Post {
  id       Int    @id @default(autoincrement())
  title    String
  author   User   @relation(fields: [authorId], references: [id])
  authorId Int
}
```

**Migrations**: `npx prisma migrate dev --name add_user_email`
**Query API**: `prisma.user.findMany({ where: { email: { contains: '@' } }, include: { posts: true } })`
**Raw SQL escape hatch**: `prisma.$queryRaw\`SELECT * FROM users WHERE id = ${userId}\``

### Drizzle

**Schema-first definition**
```typescript
export const users = pgTable('users', {
  id: serial('id').primaryKey(),
  email: varchar('email', { length: 255 }).notNull().unique(),
  name: text('name'),
  createdAt: timestamp('created_at').defaultNow(),
});
```

**Query builder**: `db.select().from(users).where(eq(users.email, email))`
**Migrations**: `npx drizzle-kit generate:pg` then `npx drizzle-kit push:pg`

### TypeORM

**Entity decorators**
```typescript
@Entity()
export class User {
  @PrimaryGeneratedColumn()
  id: number;

  @Column({ unique: true })
  email: string;

  @OneToMany(() => Post, post => post.author)
  posts: Post[];
}
```

**Repository pattern**: `userRepo.find({ where: { email }, relations: ['posts'] })`
**Migrations**: `npx typeorm migration:generate -n AddUserEmail`

### SQLAlchemy

**Declarative models**
```python
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    email = Column(String(255), unique=True, nullable=False)
    name = Column(String(255))
    posts = relationship('Post', back_populates='author')
```

**Session management**: Always use `with Session() as session:` context manager
**Alembic migrations**: `alembic revision --autogenerate -m "add user email"`

> See references/orm_patterns.md for side-by-side comparisons and migration workflows per ORM.

---
## Data Integrity

### Constraint Strategy

- **Primary keys** — every table must have one; prefer surrogate keys (serial/UUID)
- **Foreign keys** — enforce referential integrity; define ON DELETE behavior explicitly
- **UNIQUE constraints** — for business-level uniqueness (email, slug, API key)
- **CHECK constraints** — validate ranges, enums, and business rules at the DB level
- **NOT NULL** — default to NOT NULL; make nullable only when genuinely optional

### Transaction Isolation Levels

| Level | Dirty Read | Non-Repeatable Read | Phantom Read | Use Case |
|-------|-----------|-------------------|-------------|----------|
| READ UNCOMMITTED | Yes | Yes | Yes | Never recommended |
| READ COMMITTED | No | Yes | Yes | Default for PostgreSQL, general OLTP |
| REPEATABLE READ | No | No | Yes (InnoDB: No) | Financial calculations |
| SERIALIZABLE | No | No | No | Critical consistency (billing, inventory) |

### Deadlock Prevention

1. **Consistent lock ordering** — always acquire locks in the same table/row order
2. **Short transactions** — minimize time between first lock and commit
3. **Advisory locks** — use `pg_advisory_lock()` for application-level coordination
4. **Retry logic** — catch deadlock errors and retry with exponential backoff
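Point 4 as a sketch; the exception class and transaction callable here are stand-ins (in practice you would catch the driver's deadlock error, e.g. psycopg's `errors.DeadlockDetected` or MySQL error 1213):

```python
import random
import time

class DeadlockError(Exception):
    """Stand-in for the driver-specific deadlock exception."""

def run_with_retry(txn, max_attempts: int = 5, base_delay: float = 0.05):
    """Run a transaction callable, retrying deadlocks with jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return txn()
        except DeadlockError:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter avoids retry stampedes.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            time.sleep(delay)

# Demo: fail twice with a deadlock, then succeed on the third attempt.
attempts = {"n": 0}
def flaky_transaction():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise DeadlockError("deadlock detected")
    return "committed"

result = run_with_retry(flaky_transaction)
print(result, attempts["n"])  # committed 3
```

The retried callable must re-run the whole transaction from the top, since the database rolled back the deadlocked one.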
---

## Backup & Restore

### PostgreSQL
```bash
# Full backup
pg_dump -Fc --no-owner dbname > backup.dump
# Restore
pg_restore -d dbname --clean --no-owner backup.dump
# Point-in-time recovery: configure WAL archiving + restore_command
```

### MySQL
```bash
# Full backup
mysqldump --single-transaction --routines --triggers dbname > backup.sql
# Restore
mysql dbname < backup.sql
# Binary log for PITR: mysqlbinlog --start-datetime="2025-01-01 00:00:00" binlog.000001
```

### SQLite
```bash
# Backup (safe with concurrent reads)
sqlite3 dbname ".backup backup.db"
```

### Backup Best Practices
- **Automate** — cron or systemd timer, never manual-only
- **Test restores** — untested backups are not backups
- **Offsite copies** — S3, GCS, or separate region
- **Retention policy** — daily for 7 days, weekly for 4 weeks, monthly for 12 months
- **Monitor backup size and duration** — sudden changes signal issues

---

## Anti-Patterns

| Anti-Pattern | Problem | Fix |
|-------------|---------|-----|
| `SELECT *` | Transfers unnecessary data, breaks on schema changes | Explicit column list |
| Missing indexes on FK columns | Slow JOINs and cascading deletes | Add indexes on all foreign keys |
| N+1 queries | 1 + N round trips to database | Eager loading or batch queries |
| Implicit type coercion | `WHERE id = '123'` prevents index use | Match types in predicates |
| No connection pooling | Exhausts connections under load | PgBouncer, ProxySQL, or ORM pool |
| Unbounded queries | No LIMIT risks returning millions of rows | Always paginate |
| Storing money as FLOAT | Rounding errors | Use `DECIMAL(19,4)` or integer cents |
| God tables | One table with 50+ columns | Normalize or use vertical partitioning |
| Soft deletes everywhere | Complicates every query with `WHERE deleted_at IS NULL` | Archive tables or event sourcing |
| Raw string concatenation | SQL injection | Parameterized queries always |

---

## Cross-References

| Skill | Relationship |
|-------|-------------|
| **database-designer** | Schema architecture, normalization analysis, ERD generation |
| **database-schema-designer** | Visual ERD modeling, relationship mapping |
| **migration-architect** | Complex multi-step migration orchestration |
| **api-design-reviewer** | Ensuring API endpoints align with query patterns |
| **observability-platform** | Query performance monitoring, slow query alerts |
engineering/sql-database-assistant/references/optimization_guide.md (new file)
@@ -0,0 +1,330 @@
# Query Optimization Guide

How to read EXPLAIN plans, choose the right index types, understand query plan operators, and configure connection pooling.

---

## Reading EXPLAIN Plans

### PostgreSQL — EXPLAIN ANALYZE

```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT * FROM orders WHERE status = 'paid' ORDER BY created_at DESC LIMIT 20;
```

**Sample output:**
```
Limit (cost=0.43..12.87 rows=20 width=128) (actual time=0.052..0.089 rows=20 loops=1)
  -> Index Scan Backward using idx_orders_status_created on orders (cost=0.43..4521.33 rows=7284 width=128) (actual time=0.051..0.085 rows=20 loops=1)
       Index Cond: (status = 'paid')
       Buffers: shared hit=4
Planning Time: 0.156 ms
Execution Time: 0.112 ms
```

**Key fields to check:**

| Field | What it tells you |
|-------|-------------------|
| `cost` | Estimated startup..total cost (arbitrary units) |
| `rows` | Estimated row count at that node |
| `actual time` | Real wall-clock time in milliseconds |
| `actual rows` | Real row count — compare against estimate |
| `Buffers: shared hit` | Pages read from cache (good) |
| `Buffers: shared read` | Pages read from disk (slow) |
| `loops` | How many times the node executed |

**Red flags:**
- `Seq Scan` on a large table with a WHERE clause — missing index
- `actual rows` >> `rows` (estimated) — stale statistics, run `ANALYZE`
- `Nested Loop` with high loop count — consider hash join or add index
- `Sort` with `external merge` — not enough `work_mem`, spilling to disk
- `Buffers: shared read` much higher than `shared hit` — cold cache or table too large for memory

### MySQL — EXPLAIN FORMAT=JSON

```sql
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE status = 'paid' ORDER BY created_at DESC LIMIT 20;
```

**Key fields:**
- `query_block.select_id` — identifies subqueries
- `table.access_type` — `ALL` (full scan), `ref` (index lookup), `range`, `index`, `const`
- `table.rows_examined_per_scan` — how many rows the engine reads
- `table.using_index` — covering index (no table lookup needed)
- `table.attached_condition` — the WHERE filter applied

**Access types ranked (best to worst):**
`system` > `const` > `eq_ref` > `ref` > `range` > `index` > `ALL`

---
## Index Types

### B-tree (default)

The workhorse index. Supports equality, range, prefix, and ORDER BY operations.

**Best for:** `=`, `<`, `>`, `<=`, `>=`, `BETWEEN`, `LIKE 'prefix%'`, `ORDER BY`, `MIN()`, `MAX()`

```sql
CREATE INDEX idx_orders_created ON orders (created_at);
```

**Composite B-tree:** Column order matters. The index is useful for queries that filter on a leftmost prefix of the indexed columns.

```sql
-- This index serves: WHERE status = ... AND created_at > ...
-- Also serves: WHERE status = ...
-- Does NOT serve: WHERE created_at > ... (without status)
CREATE INDEX idx_orders_status_created ON orders (status, created_at);
```

### Hash

Equality-only lookups. Faster than B-tree for exact matches but no range support.

**Best for:** `=` lookups on high-cardinality columns

```sql
-- PostgreSQL
CREATE INDEX idx_sessions_token ON sessions USING hash (token);
```

**Limitations:** No range queries, no ORDER BY, not WAL-logged before PostgreSQL 10.

### GIN (Generalized Inverted Index)

For multi-valued data: arrays, JSONB, full-text search vectors.

```sql
-- JSONB containment
CREATE INDEX idx_products_tags ON products USING gin (tags);
-- Query: SELECT * FROM products WHERE tags @> '["sale"]';

-- Full-text search
CREATE INDEX idx_articles_search ON articles USING gin (to_tsvector('english', title || ' ' || body));
```

### GiST (Generalized Search Tree)

For geometric, range, and proximity data.

```sql
-- Range type (e.g., date ranges)
CREATE INDEX idx_bookings_period ON bookings USING gist (during);
-- Query: SELECT * FROM bookings WHERE during && '[2025-01-01, 2025-01-31]'::daterange;

-- PostGIS geometry
CREATE INDEX idx_locations_geom ON locations USING gist (geom);
```

### BRIN (Block Range INdex)

Tiny index for naturally ordered data (e.g., time-series append-only tables).

```sql
CREATE INDEX idx_events_created ON events USING brin (created_at);
```

**Best for:** Large tables where the indexed column correlates with physical row order. Much smaller than B-tree but less precise.

### Partial Index

Index only rows matching a condition. Smaller and faster for targeted queries.

```sql
-- Only index active users (skip millions of inactive)
CREATE INDEX idx_users_active_email ON users (email) WHERE status = 'active';
```

### Covering Index (INCLUDE)

Store extra columns in the index to avoid table lookups (index-only scans).

```sql
-- PostgreSQL 11+
CREATE INDEX idx_orders_status ON orders (status) INCLUDE (total, created_at);
-- Query can be answered entirely from the index:
-- SELECT total, created_at FROM orders WHERE status = 'paid';
```

### Expression Index

Index the result of a function or expression.

```sql
CREATE INDEX idx_users_lower_email ON users (LOWER(email));
-- Query: SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
```

---
## Query Plan Operators

### Scan operators

| Operator | Description | Performance |
|----------|-------------|-------------|
| **Seq Scan** | Full table scan, reads every row | Slow on large tables |
| **Index Scan** | B-tree lookup + table fetch | Fast for selective queries |
| **Index Only Scan** | Reads only the index (covering) | Fastest for covered queries |
| **Bitmap Index Scan** | Builds a bitmap of matching pages | Good for medium selectivity |
| **Bitmap Heap Scan** | Fetches pages identified by bitmap | Pairs with bitmap index scan |

### Join operators

| Operator | Description | Best when |
|----------|-------------|-----------|
| **Nested Loop** | For each outer row, scan inner | Small outer set, indexed inner |
| **Hash Join** | Build hash table on inner, probe with outer | Medium-large sets, no index |
| **Merge Join** | Merge two sorted inputs | Both inputs already sorted |

### Other operators

| Operator | Description |
|----------|-------------|
| **Sort** | Sorts rows (may spill to disk if work_mem exceeded) |
| **Hash Aggregate** | GROUP BY using hash table |
| **Group Aggregate** | GROUP BY on pre-sorted input |
| **Limit** | Stops after N rows |
| **Materialize** | Caches subquery results in memory |
| **Gather / Gather Merge** | Collects results from parallel workers |

---
## Connection Pooling

### Why pool connections?

Each database connection consumes memory (5-10 MB in PostgreSQL). Without pooling:
- Application creates a new connection per request (slow: TCP + TLS + auth)
- Under load, connection count spikes past `max_connections`
- Database OOM or connection refused errors

### PgBouncer (PostgreSQL)

The standard external connection pooler for PostgreSQL.

**Modes:**
- **Session** — connection assigned for entire client session (safest, least efficient)
- **Transaction** — connection returned to pool after each transaction (recommended)
- **Statement** — connection returned after each statement (cannot use transactions)

```ini
# pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 300
```

**Sizing formula:**
```
default_pool_size = num_cpu_cores * 2 + effective_spindle_count
```
For SSDs, start with `num_cpu_cores * 2` (typically 4-16 connections is optimal).
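The sizing formula as a tiny helper (the SSD case simply treats the spindle count as zero):

```python
def pool_size(num_cpu_cores: int, effective_spindle_count: int = 0) -> int:
    """Starting default_pool_size per the formula above; tune from here."""
    return num_cpu_cores * 2 + effective_spindle_count

print(pool_size(4))     # 8  (SSD: spindle count treated as 0)
print(pool_size(8, 4))  # 20 (spinning disks contribute spindles)
```

Treat the result as a starting point, not a target; measure throughput before raising it.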
|
||||
|
||||
### ProxySQL (MySQL)
|
||||
|
||||
```ini
|
||||
mysql_servers = ({ address="127.0.0.1", port=3306, hostgroup=0, max_connections=100 })
|
||||
mysql_query_rules = ({ rule_id=1, match_pattern="^SELECT.*FOR UPDATE", destination_hostgroup=0 })
|
||||
```
|
||||
|
||||
### Application-Level Pooling
|
||||
|
||||
Most ORMs and drivers include built-in pooling:
|
||||
|
||||
| Platform | Pool Configuration |
|
||||
|----------|--------------------|
|
||||
| **node-postgres** | `new Pool({ max: 20, idleTimeoutMillis: 30000 })` |
|
||||
| **SQLAlchemy** | `create_engine(url, pool_size=20, max_overflow=5)` |
|
||||
| **HikariCP (Java)** | `maximumPoolSize=20, minimumIdle=5, idleTimeout=300000` |
|
||||
| **Prisma** | `connection_limit=20` in connection string |

### Pool Sizing Guidelines

| Metric | Guideline |
|--------|-----------|
| **Minimum** | Number of always-active background workers |
| **Maximum** | 2-4x CPU cores for OLTP; lower for OLAP |
| **Idle timeout** | 30-300 seconds (reclaim unused connections) |
| **Connection timeout** | 3-10 seconds (fail fast under pressure) |
| **Queue size** | 2-5x pool max (buffer bursts before rejecting) |

**Warning:** More connections does not mean better performance. Beyond the optimal point (usually 20-50), contention on locks, CPU, and I/O causes throughput to decrease.

---

## Statistics and Maintenance

### PostgreSQL
```sql
-- Update statistics for the query planner
ANALYZE orders;
ANALYZE; -- All tables

-- Check table bloat and dead tuples
SELECT relname, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables ORDER BY n_dead_tup DESC;

-- Identify unused indexes
SELECT indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexrelname NOT LIKE '%pkey%'
ORDER BY pg_relation_size(indexrelid) DESC;
```

### MySQL
```sql
-- Update statistics
ANALYZE TABLE orders;

-- Check index usage
SELECT * FROM sys.schema_unused_indexes;
SELECT * FROM sys.schema_redundant_indexes;

-- Identify long-running queries
SELECT * FROM information_schema.processlist WHERE time > 10;
```

---

## Performance Checklist

Before deploying any query to production:

1. Run `EXPLAIN ANALYZE` and verify no unexpected sequential scans
2. Check that estimated rows are within 10x of actual rows
3. Verify index usage on all WHERE, JOIN, and ORDER BY columns
4. Ensure LIMIT is present for user-facing list queries
5. Confirm parameterized queries (no string concatenation)
6. Test with production-like data volume (not just 10 rows)
7. Monitor query time in application metrics after deployment
8. Set up slow query log alerting (> 100ms for OLTP, > 5s for reports)
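Step 1 of the checklist can be automated even in a test script. A minimal sketch using SQLite's `EXPLAIN QUERY PLAN` as a stand-in for Postgres's `EXPLAIN` (the table and index names are illustrative; the exact plan wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")

def plan(sql):
    """Return the query plan's detail text as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(r[-1] for r in rows)

# Without an index the status filter is a full table scan
before = plan("SELECT * FROM orders WHERE status = 'paid'")

conn.execute("CREATE INDEX idx_orders_status ON orders(status)")
after = plan("SELECT * FROM orders WHERE status = 'paid'")
# before mentions a SCAN; after mentions the index being used
```

The same assertion style works in CI to catch a query regressing to a table scan.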

---

## Quick Reference: When to Use Which Index

| Query Pattern | Index Type |
|--------------|-----------|
| `WHERE col = value` | B-tree or Hash |
| `WHERE col > value` | B-tree |
| `WHERE col LIKE 'prefix%'` | B-tree |
| `WHERE col LIKE '%substring%'` | GIN (full-text) or trigram |
| `WHERE jsonb_col @> '{...}'` | GIN |
| `WHERE array_col && ARRAY[...]` | GIN |
| `WHERE range_col && '[a,b]'` | GiST |
| `WHERE ST_DWithin(geom, ...)` | GiST |
| `WHERE col = value` (append-only) | BRIN |
| `WHERE col = value AND status = 'active'` | Partial B-tree |
| `SELECT a, b WHERE c = value` | Covering (INCLUDE) |
451
engineering/sql-database-assistant/references/orm_patterns.md
Normal file
@@ -0,0 +1,451 @@

# ORM Patterns Reference

Side-by-side comparison of Prisma, Drizzle, TypeORM, and SQLAlchemy patterns for common database operations.

---

## Schema Definition

### Prisma (schema.prisma)
```prisma
model User {
  id        Int      @id @default(autoincrement())
  email     String   @unique
  name      String?
  role      Role     @default(USER)
  posts     Post[]
  profile   Profile?
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt

  @@index([email])
  @@map("users")
}

model Post {
  id        Int      @id @default(autoincrement())
  title     String
  body      String?
  published Boolean  @default(false)
  author    User     @relation(fields: [authorId], references: [id], onDelete: Cascade)
  authorId  Int
  tags      Tag[]
  createdAt DateTime @default(now())

  @@index([authorId])
  @@index([published, createdAt])
  @@map("posts")
}

enum Role {
  USER
  ADMIN
  MODERATOR
}
```

### Drizzle (schema.ts)
```typescript
import { pgTable, serial, varchar, text, boolean, timestamp, integer, pgEnum, index } from 'drizzle-orm/pg-core';

export const roleEnum = pgEnum('role', ['USER', 'ADMIN', 'MODERATOR']);

export const users = pgTable('users', {
  id: serial('id').primaryKey(),
  email: varchar('email', { length: 255 }).notNull().unique(),
  name: varchar('name', { length: 255 }),
  role: roleEnum('role').default('USER').notNull(),
  createdAt: timestamp('created_at').defaultNow().notNull(),
  updatedAt: timestamp('updated_at').defaultNow().notNull(),
});

export const posts = pgTable('posts', {
  id: serial('id').primaryKey(),
  title: varchar('title', { length: 255 }).notNull(),
  body: text('body'),
  published: boolean('published').default(false).notNull(),
  authorId: integer('author_id').notNull().references(() => users.id, { onDelete: 'cascade' }),
  createdAt: timestamp('created_at').defaultNow().notNull(),
}, (table) => ({
  authorIdx: index('idx_posts_author').on(table.authorId),
  publishedIdx: index('idx_posts_published').on(table.published, table.createdAt),
}));
```

### TypeORM (entities)
```typescript
import { Entity, PrimaryGeneratedColumn, Column, ManyToOne, OneToMany, CreateDateColumn, UpdateDateColumn, Index } from 'typeorm';

export enum Role { USER = 'USER', ADMIN = 'ADMIN', MODERATOR = 'MODERATOR' }

@Entity('users')
export class User {
  @PrimaryGeneratedColumn()
  id: number;

  @Column({ unique: true })
  @Index()
  email: string;

  @Column({ nullable: true })
  name: string;

  @Column({ type: 'enum', enum: Role, default: Role.USER })
  role: Role;

  @OneToMany(() => Post, post => post.author)
  posts: Post[];

  @CreateDateColumn()
  createdAt: Date;

  @UpdateDateColumn()
  updatedAt: Date;
}

@Entity('posts')
@Index(['published', 'createdAt'])
export class Post {
  @PrimaryGeneratedColumn()
  id: number;

  @Column()
  title: string;

  @Column({ nullable: true, type: 'text' })
  body: string;

  @Column({ default: false })
  published: boolean;

  @ManyToOne(() => User, user => user.posts, { onDelete: 'CASCADE' })
  author: User;

  @Column()
  authorId: number;

  @CreateDateColumn()
  createdAt: Date;
}
```

### SQLAlchemy (models.py)
```python
import enum
from datetime import datetime
from sqlalchemy import Column, Integer, String, Text, Boolean, DateTime, Enum, ForeignKey, Index
from sqlalchemy.orm import relationship, DeclarativeBase

class Base(DeclarativeBase):
    pass

class Role(enum.Enum):
    USER = "USER"
    ADMIN = "ADMIN"
    MODERATOR = "MODERATOR"

class User(Base):
    __tablename__ = 'users'

    id = Column(Integer, primary_key=True, autoincrement=True)
    email = Column(String(255), unique=True, nullable=False, index=True)
    name = Column(String(255), nullable=True)
    role = Column(Enum(Role), default=Role.USER, nullable=False)
    posts = relationship('Post', back_populates='author', cascade='all, delete-orphan')
    created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow, nullable=False)

class Post(Base):
    __tablename__ = 'posts'
    __table_args__ = (
        Index('idx_posts_published', 'published', 'created_at'),
    )

    id = Column(Integer, primary_key=True, autoincrement=True)
    title = Column(String(255), nullable=False)
    body = Column(Text, nullable=True)
    published = Column(Boolean, default=False, nullable=False)
    author_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), nullable=False, index=True)
    author = relationship('User', back_populates='posts')
    created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
```

---

## CRUD Operations

### Create

| ORM | Pattern |
|-----|---------|
| **Prisma** | `await prisma.user.create({ data: { email, name } })` |
| **Drizzle** | `await db.insert(users).values({ email, name }).returning()` |
| **TypeORM** | `await userRepo.save(userRepo.create({ email, name }))` |
| **SQLAlchemy** | `session.add(User(email=email, name=name)); session.commit()` |

### Read (with filter)

| ORM | Pattern |
|-----|---------|
| **Prisma** | `await prisma.user.findMany({ where: { role: 'ADMIN' }, orderBy: { createdAt: 'desc' } })` |
| **Drizzle** | `await db.select().from(users).where(eq(users.role, 'ADMIN')).orderBy(desc(users.createdAt))` |
| **TypeORM** | `await userRepo.find({ where: { role: Role.ADMIN }, order: { createdAt: 'DESC' } })` |
| **SQLAlchemy** | `session.query(User).filter(User.role == Role.ADMIN).order_by(User.created_at.desc()).all()` |

### Update

| ORM | Pattern |
|-----|---------|
| **Prisma** | `await prisma.user.update({ where: { id }, data: { name } })` |
| **Drizzle** | `await db.update(users).set({ name }).where(eq(users.id, id))` |
| **TypeORM** | `await userRepo.update(id, { name })` |
| **SQLAlchemy** | `session.query(User).filter(User.id == id).update({User.name: name}); session.commit()` |

### Delete

| ORM | Pattern |
|-----|---------|
| **Prisma** | `await prisma.user.delete({ where: { id } })` |
| **Drizzle** | `await db.delete(users).where(eq(users.id, id))` |
| **TypeORM** | `await userRepo.delete(id)` |
| **SQLAlchemy** | `session.query(User).filter(User.id == id).delete(); session.commit()` |

---

## Relations and Eager Loading

### Prisma — include / select
```typescript
// Eager load posts with user
const user = await prisma.user.findUnique({
  where: { id: 1 },
  include: { posts: { where: { published: true }, orderBy: { createdAt: 'desc' } } },
});

// Nested create
await prisma.user.create({
  data: {
    email: 'new@example.com',
    posts: { create: [{ title: 'First post' }] },
  },
});
```

### Drizzle — relational queries
```typescript
const result = await db.query.users.findFirst({
  where: eq(users.id, 1),
  with: { posts: { where: eq(posts.published, true), orderBy: [desc(posts.createdAt)] } },
});
```

### TypeORM — relations / query builder
```typescript
// FindOptions
const user = await userRepo.findOne({ where: { id: 1 }, relations: ['posts'] });

// QueryBuilder for complex joins
const result = await userRepo.createQueryBuilder('u')
  .leftJoinAndSelect('u.posts', 'p', 'p.published = :pub', { pub: true })
  .where('u.id = :id', { id: 1 })
  .getOne();
```

### SQLAlchemy — joinedload / selectinload
```python
from sqlalchemy.orm import joinedload, selectinload

# Eager load in one JOIN query
user = session.query(User).options(joinedload(User.posts)).filter(User.id == 1).first()

# Eager load in a separate IN query (better for collections)
users = session.query(User).options(selectinload(User.posts)).all()
```

---

## Raw SQL Escape Hatches

Every ORM provides a way to execute raw SQL for complex queries:

| ORM | Pattern |
|-----|---------|
| **Prisma** | `` prisma.$queryRaw`SELECT * FROM users WHERE id = ${id}` `` |
| **Drizzle** | `` db.execute(sql`SELECT * FROM users WHERE id = ${id}`) `` |
| **TypeORM** | `dataSource.query('SELECT * FROM users WHERE id = $1', [id])` |
| **SQLAlchemy** | `session.execute(text('SELECT * FROM users WHERE id = :id'), {'id': id})` |

Always use parameterized queries in raw SQL to prevent injection.

---

## Transaction Patterns

### Prisma
```typescript
await prisma.$transaction(async (tx) => {
  const user = await tx.user.create({ data: { email } });
  await tx.post.create({ data: { title: 'Welcome', authorId: user.id } });
});
```

### Drizzle
```typescript
await db.transaction(async (tx) => {
  const [user] = await tx.insert(users).values({ email }).returning();
  await tx.insert(posts).values({ title: 'Welcome', authorId: user.id });
});
```

### TypeORM
```typescript
await dataSource.transaction(async (manager) => {
  const user = await manager.save(User, { email });
  await manager.save(Post, { title: 'Welcome', authorId: user.id });
});
```

### SQLAlchemy
```python
with Session() as session:
    try:
        user = User(email=email)
        session.add(user)
        session.flush()  # Get user.id without committing
        session.add(Post(title='Welcome', author_id=user.id))
        session.commit()
    except Exception:
        session.rollback()
        raise
```

---

## Migration Workflows

### Prisma
```bash
# Generate migration from schema changes
npx prisma migrate dev --name add_posts_table

# Apply in production
npx prisma migrate deploy

# Reset database (dev only)
npx prisma migrate reset

# Generate client after schema change
npx prisma generate
```

**Files:** `prisma/migrations/<timestamp>_<name>/migration.sql`

### Drizzle
```bash
# Generate migration SQL from schema diff
npx drizzle-kit generate:pg

# Push schema directly (dev only, no migration files)
npx drizzle-kit push:pg

# Apply migrations
npx drizzle-kit migrate
```

**Files:** `drizzle/<timestamp>_<name>.sql`

### TypeORM
```bash
# Auto-generate migration from entity changes (TypeORM 0.3+ takes a path, not -n)
npx typeorm migration:generate src/migrations/AddPostsTable -d data-source.ts

# Create empty migration
npx typeorm migration:create src/migrations/CustomMigration

# Run pending migrations
npx typeorm migration:run -d data-source.ts

# Revert last migration
npx typeorm migration:revert -d data-source.ts
```

**Files:** `src/migrations/<timestamp>-<Name>.ts`

### SQLAlchemy (Alembic)
```bash
# Initialize Alembic
alembic init alembic

# Auto-generate migration from model changes
alembic revision --autogenerate -m "add posts table"

# Apply all pending
alembic upgrade head

# Revert one step
alembic downgrade -1

# Show current state
alembic current
```

**Files:** `alembic/versions/<hash>_<slug>.py`

---

## N+1 Prevention Cheat Sheet

| ORM | Lazy (N+1 risk) | Eager (fixed) |
|-----|-----------------|---------------|
| **Prisma** | Not accessing `include` | `include: { posts: true }` |
| **Drizzle** | Separate queries | `with: { posts: true }` |
| **TypeORM** | `@ManyToOne(() => ..., { lazy: true })` | `relations: ['posts']` or `leftJoinAndSelect` |
| **SQLAlchemy** | Default `lazy='select'` | `joinedload()` or `selectinload()` |

**Rule of thumb:** If you access a relation inside a loop, you have an N+1 problem. Always load relations before the loop.
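The rule of thumb can be made concrete without any ORM by counting statements. A minimal sqlite3 sketch (table names and data are illustrative): the lazy loop issues one query per user, the JOIN issues one total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'ada'), (2, 'bob'), (3, 'cyd');
    INSERT INTO posts VALUES (1, 1, 'a'), (2, 2, 'b'), (3, 3, 'c');
""")

queries = 0
def run(sql, args=()):
    global queries
    queries += 1
    return conn.execute(sql, args).fetchall()

# N+1: one query for the users, then one more per user for their posts
queries = 0
for (uid, _name) in run("SELECT id, name FROM users"):
    run("SELECT title FROM posts WHERE user_id = ?", (uid,))
n_plus_one = queries  # 1 + 3 users = 4 queries

# Eager: a single JOIN replaces the per-row queries
queries = 0
run("SELECT u.name, p.title FROM users u LEFT JOIN posts p ON p.user_id = u.id")
joined = queries  # 1 query
```

With 3 rows the difference is 4 queries vs 1; with 10,000 rows it is 10,001 vs 1.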

---

## Connection Pooling

### Prisma
```
# In .env or connection string
DATABASE_URL="postgresql://user:pass@host/db?connection_limit=20&pool_timeout=10"
```

### Drizzle (with node-postgres)
```typescript
import { Pool } from 'pg';
const pool = new Pool({ max: 20, idleTimeoutMillis: 30000, connectionTimeoutMillis: 5000 });
const db = drizzle(pool);
```

### TypeORM
```typescript
const dataSource = new DataSource({
  type: 'postgres',
  extra: { max: 20, idleTimeoutMillis: 30000 },
});
```

### SQLAlchemy
```python
from sqlalchemy import create_engine
engine = create_engine('postgresql://user:pass@host/db', pool_size=20, max_overflow=5, pool_timeout=30)
```

---

## Best Practices Summary

1. **Always use migrations** — never modify production schemas by hand
2. **Eager load relations** — prevent N+1 in every list/collection query
3. **Use transactions** — group related writes to maintain consistency
4. **Parameterize raw SQL** — never concatenate user input into queries
5. **Connection pooling** — configure pool size matching your workload
6. **Index foreign keys** — ORMs often skip this; add manually if needed
7. **Review generated SQL** — enable query logging in development to catch inefficiencies
8. **Type-safe queries** — leverage TypeScript/Python typing for compile-time checks
9. **Separate read/write models** — use views or read replicas for heavy reporting queries
10. **Test migrations both ways** — always verify that down migrations actually reverse up migrations
406
engineering/sql-database-assistant/references/query_patterns.md
Normal file
@@ -0,0 +1,406 @@

# SQL Query Patterns Reference

Common query patterns for everyday database operations. All examples use PostgreSQL syntax with dialect notes where they differ.

---

## JOIN Patterns

### INNER JOIN — matching rows in both tables
```sql
SELECT u.name, o.id AS order_id, o.total
FROM users u
INNER JOIN orders o ON o.user_id = u.id
WHERE o.status = 'paid';
```

### LEFT JOIN — all rows from left, matching from right
```sql
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;
```
Returns users even if they have zero orders.

### Self JOIN — comparing rows within the same table
```sql
-- Find employees who earn more than their manager
SELECT e.name AS employee, m.name AS manager, e.salary, m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;
```

### CROSS JOIN — every combination (cartesian product)
```sql
-- Generate a calendar grid
SELECT d.date, s.shift_name
FROM dates d
CROSS JOIN shifts s;
```
Use intentionally. Accidental cartesian joins are a performance killer.

### LATERAL JOIN (PostgreSQL) — correlated subquery as a table
```sql
-- Top 3 orders per user
SELECT u.name, top_orders.*
FROM users u
CROSS JOIN LATERAL (
  SELECT id, total FROM orders
  WHERE user_id = u.id
  ORDER BY total DESC LIMIT 3
) top_orders;
```
MySQL equivalent: use a subquery with `ROW_NUMBER()`.

---

## Common Table Expressions (CTEs)

### Basic CTE — readable subquery
```sql
WITH active_users AS (
  SELECT id, name, email
  FROM users
  WHERE last_login > CURRENT_DATE - INTERVAL '30 days'
)
SELECT au.name, COUNT(o.id) AS recent_orders
FROM active_users au
JOIN orders o ON o.user_id = au.id
GROUP BY au.name;
```

### Multiple CTEs — chaining transformations
```sql
WITH monthly_revenue AS (
  SELECT DATE_TRUNC('month', created_at) AS month, SUM(total) AS revenue
  FROM orders WHERE status = 'paid'
  GROUP BY 1
),
growth AS (
  SELECT month, revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
    ROUND((revenue - LAG(revenue) OVER (ORDER BY month)) / LAG(revenue) OVER (ORDER BY month) * 100, 1) AS growth_pct
  FROM monthly_revenue
)
SELECT * FROM growth ORDER BY month;
```

### Recursive CTE — hierarchical data
```sql
-- Organization tree
WITH RECURSIVE org_tree AS (
  -- Base case: top-level managers
  SELECT id, name, manager_id, 0 AS depth
  FROM employees WHERE manager_id IS NULL

  UNION ALL

  -- Recursive case: subordinates
  SELECT e.id, e.name, e.manager_id, ot.depth + 1
  FROM employees e
  JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT * FROM org_tree ORDER BY depth, name;
```

### Recursive CTE — path traversal
```sql
-- Category breadcrumb
WITH RECURSIVE breadcrumb AS (
  SELECT id, name, parent_id, name::TEXT AS path
  FROM categories WHERE id = 42

  UNION ALL

  SELECT c.id, c.name, c.parent_id, c.name || ' > ' || b.path
  FROM categories c
  JOIN breadcrumb b ON c.id = b.parent_id
)
SELECT path FROM breadcrumb WHERE parent_id IS NULL;
```
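The org-tree CTE above runs unchanged on SQLite, which makes it easy to test locally before pointing at Postgres. A minimal sketch with made-up employee data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Grace', NULL),
        (2, 'Ada',    1),
        (3, 'Alan',   1),
        (4, 'Edsger', 2);
""")

rows = conn.execute("""
    WITH RECURSIVE org_tree AS (
        -- Base case: top-level managers
        SELECT id, name, manager_id, 0 AS depth
        FROM employees WHERE manager_id IS NULL
        UNION ALL
        -- Recursive case: subordinates
        SELECT e.id, e.name, e.manager_id, ot.depth + 1
        FROM employees e
        JOIN org_tree ot ON e.manager_id = ot.id
    )
    SELECT name, depth FROM org_tree ORDER BY depth, name
""").fetchall()

# rows == [('Grace', 0), ('Ada', 1), ('Alan', 1), ('Edsger', 2)]
```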

---

## Window Functions

### ROW_NUMBER — assign unique rank per partition
```sql
SELECT *, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
FROM employees;
```

### RANK and DENSE_RANK — handle ties
```sql
-- RANK: 1, 2, 2, 4 (skips after tie)
-- DENSE_RANK: 1, 2, 2, 3 (no skip)
SELECT name, salary,
  RANK() OVER (ORDER BY salary DESC) AS rank,
  DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
```

### Running total and moving average
```sql
SELECT date, amount,
  SUM(amount) OVER (ORDER BY date) AS running_total,
  AVG(amount) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg_7d
FROM daily_revenue;
```

### LAG / LEAD — access adjacent rows
```sql
SELECT date, revenue,
  LAG(revenue, 1) OVER (ORDER BY date) AS prev_day,
  revenue - LAG(revenue, 1) OVER (ORDER BY date) AS day_over_day_change
FROM daily_revenue;
```

### NTILE — divide into buckets
```sql
-- Split customers into quartiles by total spend
SELECT customer_id, total_spend,
  NTILE(4) OVER (ORDER BY total_spend DESC) AS spend_quartile
FROM customer_summary;
```

### FIRST_VALUE / LAST_VALUE
```sql
SELECT department_id, name, salary,
  FIRST_VALUE(name) OVER (PARTITION BY department_id ORDER BY salary DESC) AS highest_paid
FROM employees;
```
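Window functions are also available in SQLite (3.25+), so the running-total pattern can be verified in a throwaway script. A minimal sketch with illustrative data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
conn.executescript("""
    CREATE TABLE daily_revenue (date TEXT, amount INTEGER);
    INSERT INTO daily_revenue VALUES
        ('2025-01-01', 100), ('2025-01-02', 50), ('2025-01-03', 25);
""")

rows = conn.execute("""
    SELECT date, amount,
        SUM(amount) OVER (ORDER BY date) AS running_total
    FROM daily_revenue
""").fetchall()

# rows == [('2025-01-01', 100, 100), ('2025-01-02', 50, 150), ('2025-01-03', 25, 175)]
```

The default frame for `SUM(...) OVER (ORDER BY ...)` is cumulative up to the current row, which is exactly the running total.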

---

## Subquery Patterns

### EXISTS — correlated existence check
```sql
-- Users who have placed at least one order
SELECT u.* FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
```

### NOT EXISTS — safer than NOT IN for NULLs
```sql
-- Users who have never ordered
SELECT u.* FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
```

### Scalar subquery — single value
```sql
SELECT name, salary,
  salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
```

### Derived table — subquery in FROM
```sql
SELECT dept, avg_salary
FROM (
  SELECT department_id AS dept, AVG(salary) AS avg_salary
  FROM employees GROUP BY department_id
) dept_avg
WHERE avg_salary > 100000;
```

---

## Aggregation Patterns

### GROUP BY with HAVING
```sql
-- Departments with more than 10 employees
SELECT department_id, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 10;
```

### GROUPING SETS — multiple grouping levels
```sql
SELECT region, product_category, SUM(revenue)
FROM sales
GROUP BY GROUPING SETS (
  (region, product_category),
  (region),
  (product_category),
  ()
);
```

### ROLLUP — hierarchical subtotals
```sql
SELECT region, city, SUM(revenue)
FROM sales
GROUP BY ROLLUP (region, city);
-- Produces: (region, city), (region), ()
```

### CUBE — all combinations
```sql
SELECT region, product, SUM(revenue)
FROM sales
GROUP BY CUBE (region, product);
```

### FILTER clause (PostgreSQL) — conditional aggregation
```sql
SELECT
  COUNT(*) AS total,
  COUNT(*) FILTER (WHERE status = 'paid') AS paid,
  COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled,
  SUM(total) FILTER (WHERE status = 'paid') AS paid_revenue
FROM orders;
```
MySQL/SQL Server equivalent: `SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END)`.
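The `FILTER` clause is also supported by SQLite (3.30+), so the conditional-aggregation pattern can be checked locally. A minimal sketch with illustrative data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # FILTER on aggregates needs SQLite >= 3.30
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total INTEGER);
    INSERT INTO orders VALUES (1, 'paid', 10), (2, 'cancelled', 20), (3, 'paid', 5);
""")

row = conn.execute("""
    SELECT
        COUNT(*) AS total,
        COUNT(*) FILTER (WHERE status = 'paid') AS paid,
        SUM(total) FILTER (WHERE status = 'paid') AS paid_revenue
    FROM orders
""").fetchone()

# row == (3, 2, 15): 3 orders overall, 2 paid, 15 in paid revenue
```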

---

## UPSERT Patterns

### PostgreSQL — ON CONFLICT
```sql
INSERT INTO user_settings (user_id, key, value, updated_at)
VALUES (1, 'theme', 'dark', NOW())
ON CONFLICT (user_id, key)
DO UPDATE SET value = EXCLUDED.value, updated_at = EXCLUDED.updated_at;
```

### MySQL — ON DUPLICATE KEY
```sql
INSERT INTO user_settings (user_id, key_name, value, updated_at)
VALUES (1, 'theme', 'dark', NOW())
ON DUPLICATE KEY UPDATE value = VALUES(value), updated_at = VALUES(updated_at);
```

### SQL Server — MERGE
```sql
MERGE INTO user_settings AS target
USING (VALUES (1, 'theme', 'dark')) AS source (user_id, key_name, value)
ON target.user_id = source.user_id AND target.key_name = source.key_name
WHEN MATCHED THEN UPDATE SET value = source.value, updated_at = GETDATE()
WHEN NOT MATCHED THEN INSERT (user_id, key_name, value, updated_at)
  VALUES (source.user_id, source.key_name, source.value, GETDATE());
```
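SQLite (3.24+) shares Postgres's `ON CONFLICT ... DO UPDATE` syntax, so the upsert can be demonstrated in-process; a minimal sketch with an illustrative settings table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # ON CONFLICT upsert needs SQLite >= 3.24
conn.execute("""
    CREATE TABLE user_settings (
        user_id INTEGER, key TEXT, value TEXT,
        PRIMARY KEY (user_id, key)
    )
""")

upsert = """
    INSERT INTO user_settings (user_id, key, value)
    VALUES (?, ?, ?)
    ON CONFLICT (user_id, key) DO UPDATE SET value = excluded.value
"""
conn.execute(upsert, (1, 'theme', 'light'))
conn.execute(upsert, (1, 'theme', 'dark'))  # second insert updates instead of failing

rows = conn.execute("SELECT user_id, key, value FROM user_settings").fetchall()
# rows == [(1, 'theme', 'dark')] — one row, last value wins
```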

---

## JSON Operations

### PostgreSQL JSONB
```sql
-- Extract field
SELECT data->>'name' AS name FROM products WHERE data->>'category' = 'electronics';

-- Array contains
SELECT * FROM products WHERE data->'tags' ? 'sale';

-- Update nested field
UPDATE products SET data = jsonb_set(data, '{price}', '29.99') WHERE id = 1;

-- Aggregate into JSON array
SELECT jsonb_agg(jsonb_build_object('id', id, 'name', name)) FROM users;
```

### MySQL JSON
```sql
-- Extract field
SELECT JSON_EXTRACT(data, '$.name') AS name FROM products;
-- Shorthand: SELECT data->>"$.name"

-- Search in array
SELECT * FROM products WHERE JSON_CONTAINS(data->"$.tags", '"sale"');

-- Update
UPDATE products SET data = JSON_SET(data, '$.price', 29.99) WHERE id = 1;
```

---

## Pagination Patterns

### Offset pagination (simple but slow for deep pages)
```sql
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 40;
```

### Keyset pagination (fast, requires ordered unique column)
```sql
-- Page after the last seen id
SELECT * FROM products WHERE id > :last_seen_id ORDER BY id LIMIT 20;
```

### Keyset with composite sort
```sql
SELECT * FROM products
WHERE (created_at, id) < (:last_created_at, :last_id)
ORDER BY created_at DESC, id DESC
LIMIT 20;
```
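The keyset pattern is the same in application code regardless of engine: the client remembers the last key it saw and passes it back. A minimal sqlite3 sketch (page size and table are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO products VALUES (?)", [(i,) for i in range(1, 8)])

def page(last_seen_id, size=3):
    """Fetch the next keyset page strictly after last_seen_id."""
    return [r[0] for r in conn.execute(
        "SELECT id FROM products WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, size),
    )]

first = page(0)           # [1, 2, 3]
second = page(first[-1])  # [4, 5, 6] — cursor is the last id of the previous page
```

Unlike OFFSET, each page costs the same no matter how deep you paginate, because the index seek starts at the cursor.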

---

## Bulk Operations

### Batch INSERT
```sql
INSERT INTO events (type, payload, created_at) VALUES
  ('click', '{"page": "/home"}', NOW()),
  ('view', '{"page": "/pricing"}', NOW()),
  ('click', '{"page": "/signup"}', NOW());
```

### Batch UPDATE with VALUES
```sql
UPDATE products AS p SET price = v.price
FROM (VALUES (1, 29.99), (2, 49.99), (3, 9.99)) AS v(id, price)
WHERE p.id = v.id;
```

### DELETE with subquery
```sql
DELETE FROM sessions
WHERE user_id IN (SELECT id FROM users WHERE deleted_at IS NOT NULL);
```

### COPY (PostgreSQL bulk load)
```sql
COPY products (name, price, category) FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);
```

---

## Utility Patterns

### Generate series (PostgreSQL)
```sql
-- Fill date gaps
SELECT d::date FROM generate_series('2025-01-01'::date, '2025-12-31', '1 day') d;
```

### Deduplicate rows
```sql
DELETE FROM events a USING events b
WHERE a.id > b.id AND a.user_id = b.user_id AND a.event_type = b.event_type
  AND a.created_at = b.created_at;
```

### Pivot (manual)
```sql
SELECT user_id,
  SUM(CASE WHEN month = 1 THEN revenue END) AS jan,
  SUM(CASE WHEN month = 2 THEN revenue END) AS feb,
  SUM(CASE WHEN month = 3 THEN revenue END) AS mar
FROM monthly_revenue
GROUP BY user_id;
```

### Conditional INSERT (skip if exists)
```sql
INSERT INTO tags (name) SELECT 'new-tag'
WHERE NOT EXISTS (SELECT 1 FROM tags WHERE name = 'new-tag');
```
@@ -0,0 +1,442 @@

#!/usr/bin/env python3
"""
Migration Generator

Generates database migration file templates (up/down) from natural-language
schema change descriptions.

Supported operations:
- Add column, drop column, rename column
- Add table, drop table, rename table
- Add index, drop index
- Add constraint, drop constraint
- Change column type

Usage:
    python migration_generator.py --change "add email_verified boolean to users" --dialect postgres
    python migration_generator.py --change "rename column name to full_name in customers" --format alembic
    python migration_generator.py --change "add index on orders(status, created_at)" --output 001_add_index.sql
    python migration_generator.py --change "create table reviews with id, user_id, rating, body" --json
"""

import argparse
import json
import os
import re
import sys
import textwrap
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Migration:
    """A generated migration with up and down scripts."""
    description: str
    dialect: str
    format: str
    up: str
    down: str
    warnings: List[str]

    def to_dict(self):
        return asdict(self)


# ---------------------------------------------------------------------------
# Change parsers — extract structured intent from natural language
# ---------------------------------------------------------------------------

def parse_add_column(desc: str) -> Optional[dict]:
    """Parse: add <column> <type> to <table>"""
    m = re.match(
        r'add\s+(?:column\s+)?(\w+)\s+(\w[\w(),.]*)\s+(?:to|on)\s+(\w+)',
        desc, re.IGNORECASE,
    )
    if m:
        return {"op": "add_column", "column": m.group(1), "type": m.group(2), "table": m.group(3)}
    return None


def parse_drop_column(desc: str) -> Optional[dict]:
    """Parse: drop/remove <column> from <table>"""
    m = re.match(
        r'(?:drop|remove)\s+(?:column\s+)?(\w+)\s+from\s+(\w+)',
        desc, re.IGNORECASE,
    )
    if m:
        return {"op": "drop_column", "column": m.group(1), "table": m.group(2)}
    return None


def parse_rename_column(desc: str) -> Optional[dict]:
    """Parse: rename column <old> to <new> in <table>"""
    m = re.match(
        r'rename\s+column\s+(\w+)\s+to\s+(\w+)\s+in\s+(\w+)',
        desc, re.IGNORECASE,
    )
    if m:
        return {"op": "rename_column", "old": m.group(1), "new": m.group(2), "table": m.group(3)}
    return None


def parse_add_table(desc: str) -> Optional[dict]:
    """Parse: create table <name> with <col1>, <col2>, ..."""
    m = re.match(
        r'create\s+table\s+(\w+)\s+with\s+(.+)',
        desc, re.IGNORECASE,
    )
    if m:
        cols = [c.strip() for c in m.group(2).split(",")]
        return {"op": "add_table", "table": m.group(1), "columns": cols}
    return None


def parse_drop_table(desc: str) -> Optional[dict]:
    """Parse: drop table <name>"""
    m = re.match(r'drop\s+table\s+(\w+)', desc, re.IGNORECASE)
    if m:
        return {"op": "drop_table", "table": m.group(1)}
    return None


def parse_add_index(desc: str) -> Optional[dict]:
    """Parse: add index on <table>(<col1>, <col2>)"""
    m = re.match(
        r'add\s+(?:unique\s+)?index\s+(?:on\s+)?(\w+)\s*\(([^)]+)\)',
        desc, re.IGNORECASE,
    )
    if m:
        unique = "unique" in desc.lower()
        cols = [c.strip() for c in m.group(2).split(",")]
        return {"op": "add_index", "table": m.group(1), "columns": cols, "unique": unique}
    return None


def parse_change_type(desc: str) -> Optional[dict]:
    """Parse: change <column> type to <type> in <table>"""
    m = re.match(
        r'change\s+(?:column\s+)?(\w+)\s+type\s+to\s+(\w[\w(),.]*)\s+in\s+(\w+)',
        desc, re.IGNORECASE,
    )
    if m:
        return {"op": "change_type", "column": m.group(1), "new_type": m.group(2), "table": m.group(3)}
    return None


PARSERS = [
    parse_add_column,
    parse_drop_column,
    parse_rename_column,
    parse_add_table,
    parse_drop_table,
    parse_add_index,
    parse_change_type,
]


def parse_change(desc: str) -> Optional[dict]:
    for parser in PARSERS:
        result = parser(desc)
        if result:
            return result
    return None


# ---------------------------------------------------------------------------
# SQL generators per dialect
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
TYPE_MAP = {
|
||||
"boolean": {"postgres": "BOOLEAN", "mysql": "TINYINT(1)", "sqlite": "INTEGER", "sqlserver": "BIT"},
|
||||
"text": {"postgres": "TEXT", "mysql": "TEXT", "sqlite": "TEXT", "sqlserver": "NVARCHAR(MAX)"},
|
||||
"integer": {"postgres": "INTEGER", "mysql": "INT", "sqlite": "INTEGER", "sqlserver": "INT"},
|
||||
"int": {"postgres": "INTEGER", "mysql": "INT", "sqlite": "INTEGER", "sqlserver": "INT"},
|
||||
"serial": {"postgres": "SERIAL", "mysql": "INT AUTO_INCREMENT", "sqlite": "INTEGER", "sqlserver": "INT IDENTITY(1,1)"},
|
||||
"varchar": {"postgres": "VARCHAR(255)", "mysql": "VARCHAR(255)", "sqlite": "TEXT", "sqlserver": "NVARCHAR(255)"},
|
||||
"timestamp": {"postgres": "TIMESTAMP", "mysql": "DATETIME", "sqlite": "TEXT", "sqlserver": "DATETIME2"},
|
||||
"uuid": {"postgres": "UUID", "mysql": "CHAR(36)", "sqlite": "TEXT", "sqlserver": "UNIQUEIDENTIFIER"},
|
||||
"json": {"postgres": "JSONB", "mysql": "JSON", "sqlite": "TEXT", "sqlserver": "NVARCHAR(MAX)"},
|
||||
"decimal": {"postgres": "DECIMAL(19,4)", "mysql": "DECIMAL(19,4)", "sqlite": "REAL", "sqlserver": "DECIMAL(19,4)"},
|
||||
"float": {"postgres": "DOUBLE PRECISION", "mysql": "DOUBLE", "sqlite": "REAL", "sqlserver": "FLOAT"},
|
||||
}
|
||||
|
||||
|
||||
def map_type(type_name: str, dialect: str) -> str:
|
||||
"""Map a generic type name to a dialect-specific type."""
|
||||
key = type_name.lower().rstrip("()")
|
||||
if key in TYPE_MAP and dialect in TYPE_MAP[key]:
|
||||
return TYPE_MAP[key][dialect]
|
||||
return type_name.upper()
|
||||
|
||||
|
||||
def gen_add_column(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
|
||||
col_type = map_type(change["type"], dialect)
|
||||
table = change["table"]
|
||||
col = change["column"]
|
||||
up = f"ALTER TABLE {table} ADD COLUMN {col} {col_type};"
|
||||
down = f"ALTER TABLE {table} DROP COLUMN {col};"
|
||||
return up, down, []
|
||||
|
||||
|
||||
def gen_drop_column(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
|
||||
table = change["table"]
|
||||
col = change["column"]
|
||||
up = f"ALTER TABLE {table} DROP COLUMN {col};"
|
||||
down = f"-- WARNING: Cannot fully reverse DROP COLUMN. Provide the original type.\nALTER TABLE {table} ADD COLUMN {col} TEXT;"
|
||||
return up, down, ["Down migration uses TEXT as placeholder. Replace with the original column type."]
|
||||
|
||||
|
||||
def gen_rename_column(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
|
||||
table = change["table"]
|
||||
old, new = change["old"], change["new"]
|
||||
warnings = []
|
||||
if dialect == "postgres":
|
||||
up = f"ALTER TABLE {table} RENAME COLUMN {old} TO {new};"
|
||||
down = f"ALTER TABLE {table} RENAME COLUMN {new} TO {old};"
|
||||
elif dialect == "mysql":
|
||||
up = f"ALTER TABLE {table} RENAME COLUMN {old} TO {new};"
|
||||
down = f"ALTER TABLE {table} RENAME COLUMN {new} TO {old};"
|
||||
elif dialect == "sqlite":
|
||||
up = f"ALTER TABLE {table} RENAME COLUMN {old} TO {new};"
|
||||
down = f"ALTER TABLE {table} RENAME COLUMN {new} TO {old};"
|
||||
warnings.append("SQLite RENAME COLUMN requires version 3.25.0+.")
|
||||
elif dialect == "sqlserver":
|
||||
up = f"EXEC sp_rename '{table}.{old}', '{new}', 'COLUMN';"
|
||||
down = f"EXEC sp_rename '{table}.{new}', '{old}', 'COLUMN';"
|
||||
else:
|
||||
up = f"ALTER TABLE {table} RENAME COLUMN {old} TO {new};"
|
||||
down = f"ALTER TABLE {table} RENAME COLUMN {new} TO {old};"
|
||||
return up, down, warnings
|
||||
|
||||
|
||||
def gen_add_table(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
|
||||
table = change["table"]
|
||||
cols = change["columns"]
|
||||
col_defs = []
|
||||
has_id = False
|
||||
for col in cols:
|
||||
col = col.strip()
|
||||
if col.lower() == "id":
|
||||
has_id = True
|
||||
if dialect == "postgres":
|
||||
col_defs.append(" id SERIAL PRIMARY KEY")
|
||||
elif dialect == "mysql":
|
||||
col_defs.append(" id INT AUTO_INCREMENT PRIMARY KEY")
|
||||
elif dialect == "sqlite":
|
||||
col_defs.append(" id INTEGER PRIMARY KEY AUTOINCREMENT")
|
||||
elif dialect == "sqlserver":
|
||||
col_defs.append(" id INT IDENTITY(1,1) PRIMARY KEY")
|
||||
else:
|
||||
# Check if type is specified (e.g., "rating int")
|
||||
parts = col.split()
|
||||
if len(parts) >= 2:
|
||||
col_defs.append(f" {parts[0]} {map_type(parts[1], dialect)}")
|
||||
else:
|
||||
col_defs.append(f" {col} TEXT")
|
||||
|
||||
cols_sql = ",\n".join(col_defs)
|
||||
up = f"CREATE TABLE {table} (\n{cols_sql}\n);"
|
||||
down = f"DROP TABLE {table};"
|
||||
warnings = []
|
||||
if not has_id:
|
||||
warnings.append("Table has no explicit primary key. Consider adding an 'id' column.")
|
||||
return up, down, warnings
|
||||
|
||||
|
||||
def gen_drop_table(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
|
||||
table = change["table"]
|
||||
up = f"DROP TABLE {table};"
|
||||
down = f"-- WARNING: Cannot reverse DROP TABLE without original DDL.\nCREATE TABLE {table} (id INTEGER PRIMARY KEY);"
|
||||
return up, down, ["Down migration is a placeholder. Replace with the original CREATE TABLE statement."]
|
||||
|
||||
|
||||
def gen_add_index(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
|
||||
table = change["table"]
|
||||
cols = change["columns"]
|
||||
unique = "UNIQUE " if change.get("unique") else ""
|
||||
idx_name = f"idx_{table}_{'_'.join(cols)}"
|
||||
if dialect == "postgres":
|
||||
up = f"CREATE {unique}INDEX CONCURRENTLY {idx_name} ON {table} ({', '.join(cols)});"
|
||||
else:
|
||||
up = f"CREATE {unique}INDEX {idx_name} ON {table} ({', '.join(cols)});"
|
||||
down = f"DROP INDEX {idx_name};" if dialect != "mysql" else f"DROP INDEX {idx_name} ON {table};"
|
||||
warnings = []
|
||||
if dialect == "postgres":
|
||||
warnings.append("CONCURRENTLY cannot run inside a transaction. Run outside migration transaction.")
|
||||
return up, down, warnings
|
||||
|
||||
|
||||
def gen_change_type(change: dict, dialect: str) -> Tuple[str, str, List[str]]:
|
||||
table = change["table"]
|
||||
col = change["column"]
|
||||
new_type = map_type(change["new_type"], dialect)
|
||||
warnings = ["Down migration uses TEXT as placeholder. Replace with the original column type."]
|
||||
if dialect == "postgres":
|
||||
up = f"ALTER TABLE {table} ALTER COLUMN {col} TYPE {new_type};"
|
||||
down = f"ALTER TABLE {table} ALTER COLUMN {col} TYPE TEXT;"
|
||||
elif dialect == "mysql":
|
||||
up = f"ALTER TABLE {table} MODIFY COLUMN {col} {new_type};"
|
||||
down = f"ALTER TABLE {table} MODIFY COLUMN {col} TEXT;"
|
||||
elif dialect == "sqlserver":
|
||||
up = f"ALTER TABLE {table} ALTER COLUMN {col} {new_type};"
|
||||
down = f"ALTER TABLE {table} ALTER COLUMN {col} NVARCHAR(MAX);"
|
||||
else:
|
||||
up = f"-- SQLite does not support ALTER COLUMN. Recreate the table."
|
||||
down = f"-- SQLite does not support ALTER COLUMN. Recreate the table."
|
||||
warnings.append("SQLite requires table recreation for type changes.")
|
||||
return up, down, warnings
|
||||
|
||||
|
||||
GENERATORS = {
|
||||
"add_column": gen_add_column,
|
||||
"drop_column": gen_drop_column,
|
||||
"rename_column": gen_rename_column,
|
||||
"add_table": gen_add_table,
|
||||
"drop_table": gen_drop_table,
|
||||
"add_index": gen_add_index,
|
||||
"change_type": gen_change_type,
|
||||
}
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Format wrappers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def wrap_sql(up: str, down: str, description: str) -> Tuple[str, str]:
|
||||
"""Wrap as plain SQL migration files."""
|
||||
timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
|
||||
header = f"-- Migration: {description}\n-- Generated: {datetime.now().isoformat()}\n\n"
|
||||
return header + "-- Up\n" + up, header + "-- Down\n" + down
|
||||
|
||||
|
||||
def wrap_prisma(up: str, down: str, description: str) -> Tuple[str, str]:
|
||||
"""Format as Prisma migration SQL (Prisma uses raw SQL in migration.sql)."""
|
||||
header = f"-- Migration: {description}\n-- Format: Prisma (migration.sql)\n\n"
|
||||
return header + up, header + "-- Rollback\n" + down
|
||||
|
||||
|
||||
def wrap_alembic(up: str, down: str, description: str) -> Tuple[str, str]:
|
||||
"""Format as Alembic Python migration."""
|
||||
slug = re.sub(r'\W+', '_', description.lower())[:40]
|
||||
revision = datetime.now().strftime("%Y%m%d%H%M")
|
||||
template = textwrap.dedent(f'''\
|
||||
"""
|
||||
{description}
|
||||
|
||||
Revision ID: {revision}
|
||||
"""
|
||||
from alembic import op
|
||||
import sqlalchemy as sa
|
||||
|
||||
revision = '{revision}'
|
||||
down_revision = None # Set to previous revision
|
||||
|
||||
|
||||
def upgrade():
|
||||
op.execute("""
|
||||
{textwrap.indent(up, " ")}
|
||||
""")
|
||||
|
||||
|
||||
def downgrade():
|
||||
op.execute("""
|
||||
{textwrap.indent(down, " ")}
|
||||
""")
|
||||
''')
|
||||
return template, ""
|
||||
|
||||
|
||||
FORMATTERS = {
|
||||
"sql": wrap_sql,
|
||||
"prisma": wrap_prisma,
|
||||
"alembic": wrap_alembic,
|
||||
}
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# CLI
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Generate database migration templates from change descriptions.",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
Supported change descriptions:
|
||||
"add email_verified boolean to users"
|
||||
"drop column legacy_flag from accounts"
|
||||
"rename column name to full_name in customers"
|
||||
"create table reviews with id, user_id, rating int, body text"
|
||||
"drop table temp_imports"
|
||||
"add index on orders(status, created_at)"
|
||||
"add unique index on users(email)"
|
||||
"change email type to varchar in users"
|
||||
|
||||
Examples:
|
||||
%(prog)s --change "add phone varchar to users" --dialect postgres
|
||||
%(prog)s --change "create table reviews with id, user_id, rating int, body" --format prisma
|
||||
%(prog)s --change "add index on orders(status)" --output migrations/001.sql --json
|
||||
""",
|
||||
)
|
||||
parser.add_argument("--change", required=True, help="Natural-language description of the schema change")
|
||||
parser.add_argument("--dialect", choices=["postgres", "mysql", "sqlite", "sqlserver"],
|
||||
default="postgres", help="Target database dialect (default: postgres)")
|
||||
parser.add_argument("--format", choices=["sql", "prisma", "alembic"], default="sql",
|
||||
dest="fmt", help="Output format (default: sql)")
|
||||
parser.add_argument("--output", help="Write migration to file instead of stdout")
|
||||
parser.add_argument("--json", action="store_true", dest="json_output", help="Output as JSON")
|
||||
args = parser.parse_args()
|
||||
|
||||
change = parse_change(args.change)
|
||||
if not change:
|
||||
print(f"Error: Could not parse change description: '{args.change}'", file=sys.stderr)
|
||||
print("Run with --help to see supported patterns.", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
gen_fn = GENERATORS.get(change["op"])
|
||||
if not gen_fn:
|
||||
print(f"Error: No generator for operation '{change['op']}'", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
up, down, warnings = gen_fn(change, args.dialect)
|
||||
|
||||
fmt_fn = FORMATTERS[args.fmt]
|
||||
up_formatted, down_formatted = fmt_fn(up, down, args.change)
|
||||
|
||||
migration = Migration(
|
||||
description=args.change,
|
||||
dialect=args.dialect,
|
||||
format=args.fmt,
|
||||
up=up_formatted,
|
||||
down=down_formatted,
|
||||
warnings=warnings,
|
||||
)
|
||||
|
||||
if args.json_output:
|
||||
print(json.dumps(migration.to_dict(), indent=2))
|
||||
else:
|
||||
if args.output:
|
||||
with open(args.output, "w") as f:
|
||||
f.write(migration.up)
|
||||
print(f"Migration written to {args.output}")
|
||||
if migration.down:
|
||||
down_path = args.output.replace(".sql", "_down.sql")
|
||||
with open(down_path, "w") as f:
|
||||
f.write(migration.down)
|
||||
print(f"Rollback written to {down_path}")
|
||||
else:
|
||||
print(migration.up)
|
||||
if migration.down:
|
||||
print("\n" + "=" * 40 + " ROLLBACK " + "=" * 40 + "\n")
|
||||
print(migration.down)
|
||||
|
||||
if warnings:
|
||||
print("\nWarnings:")
|
||||
for w in warnings:
|
||||
print(f" - {w}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
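Taken together, the parser and generator halves above reduce to a small pipeline: match a natural-language description with a regex, then interpolate the captured table, column, and type into dialect-specific DDL. A condensed, self-contained sketch of that flow (illustrative only; it mirrors the script's names but covers just the add-column path and two dialects):

```python
import re

# Tiny slice of the script's TYPE_MAP: generic type -> dialect type.
TYPE_MAP = {"varchar": {"postgres": "VARCHAR(255)", "mysql": "VARCHAR(255)"}}

def parse_add_column(desc):
    # "add <column> <type> to <table>" -> structured change dict.
    m = re.match(r'add\s+(?:column\s+)?(\w+)\s+(\w+)\s+(?:to|on)\s+(\w+)', desc, re.IGNORECASE)
    if not m:
        return None
    return {"column": m.group(1), "type": m.group(2), "table": m.group(3)}

def gen_add_column(change, dialect):
    # Unknown types fall back to their uppercased name, as in map_type().
    col_type = TYPE_MAP.get(change["type"].lower(), {}).get(dialect, change["type"].upper())
    up = f'ALTER TABLE {change["table"]} ADD COLUMN {change["column"]} {col_type};'
    down = f'ALTER TABLE {change["table"]} DROP COLUMN {change["column"]};'
    return up, down

up, down = gen_add_column(parse_add_column("add phone varchar to users"), "postgres")
print(up)    # ALTER TABLE users ADD COLUMN phone VARCHAR(255);
print(down)  # ALTER TABLE users DROP COLUMN phone;
```

Every other operation in the script follows the same contract: a parser returns a dict with an `"op"` key, and the matching generator returns `(up, down, warnings)`.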
engineering/sql-database-assistant/scripts/query_optimizer.py (new file, 348 lines)
@@ -0,0 +1,348 @@
#!/usr/bin/env python3
"""
SQL Query Optimizer — Static Analysis

Analyzes SQL queries for common performance issues:
- SELECT * usage
- Missing WHERE clauses on UPDATE/DELETE
- Cartesian joins (missing JOIN conditions)
- Subqueries in SELECT list
- Missing LIMIT on unbounded SELECTs
- Function calls on indexed columns (non-sargable)
- LIKE with leading wildcard
- ORDER BY RAND()
- UNION instead of UNION ALL
- NOT IN with subquery (NULL-unsafe)

Usage:
    python query_optimizer.py --query "SELECT * FROM users"
    python query_optimizer.py --query queries.sql --dialect postgres
    python query_optimizer.py --query "SELECT * FROM orders" --json
"""

import argparse
import json
import os
import re
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class Issue:
    """A single optimization issue found in a query."""
    severity: str  # critical, warning, info
    rule: str
    message: str
    suggestion: str
    line: Optional[int] = None


@dataclass
class QueryAnalysis:
    """Analysis result for one SQL query."""
    query: str
    issues: List[Issue]
    score: int  # 0-100, higher is better

    def to_dict(self):
        return {
            "query": self.query[:200] + ("..." if len(self.query) > 200 else ""),
            "issues": [asdict(i) for i in self.issues],
            "issue_count": len(self.issues),
            "score": self.score,
        }


# ---------------------------------------------------------------------------
# Rule checkers
# ---------------------------------------------------------------------------

def check_select_star(sql: str) -> Optional[Issue]:
    """Detect SELECT * usage."""
    if re.search(r'\bSELECT\s+\*(?:\s|$)', sql, re.IGNORECASE):
        return Issue(
            severity="warning",
            rule="select-star",
            message="SELECT * transfers unnecessary data and breaks on schema changes.",
            suggestion="List only the columns you need: SELECT col1, col2, ...",
        )
    return None


def check_missing_where(sql: str) -> Optional[Issue]:
    """Detect UPDATE/DELETE without WHERE."""
    upper = sql.upper().strip()
    for keyword in ("UPDATE", "DELETE"):
        if upper.startswith(keyword) and "WHERE" not in upper:
            return Issue(
                severity="critical",
                rule="missing-where",
                message=f"{keyword} without WHERE affects every row in the table.",
                suggestion=f"Add a WHERE clause to restrict the {keyword} scope.",
            )
    return None


def check_cartesian_join(sql: str) -> Optional[Issue]:
    """Detect comma-separated tables without explicit JOIN or WHERE join condition."""
    upper = sql.upper()
    if "SELECT" not in upper:
        return None
    from_match = re.search(r'\bFROM\s+(.+?)(?:\bWHERE\b|\bGROUP\b|\bORDER\b|\bLIMIT\b|\bHAVING\b|;|$)',
                           sql, re.IGNORECASE | re.DOTALL)
    if not from_match:
        return None
    from_clause = from_match.group(1)
    # Skip if explicit JOINs are used
    if re.search(r'\bJOIN\b', from_clause, re.IGNORECASE):
        return None
    # Count comma-separated tables
    tables = [t.strip() for t in from_clause.split(",") if t.strip()]
    if len(tables) > 1 and "WHERE" not in upper:
        return Issue(
            severity="critical",
            rule="cartesian-join",
            message="Multiple tables in FROM without JOIN or WHERE creates a cartesian product.",
            suggestion="Use explicit JOIN syntax with ON conditions.",
        )
    return None


def check_subquery_in_select(sql: str) -> Optional[Issue]:
    """Detect correlated subqueries in SELECT list."""
    select_match = re.search(r'\bSELECT\b(.+?)\bFROM\b', sql, re.IGNORECASE | re.DOTALL)
    if select_match:
        select_clause = select_match.group(1)
        if re.search(r'\(\s*SELECT\b', select_clause, re.IGNORECASE):
            return Issue(
                severity="warning",
                rule="subquery-in-select",
                message="Subquery in SELECT list executes once per row (correlated subquery).",
                suggestion="Rewrite as a LEFT JOIN with aggregation.",
            )
    return None


def check_missing_limit(sql: str) -> Optional[Issue]:
    """Detect unbounded SELECT without LIMIT."""
    upper = sql.upper().strip()
    if not upper.startswith("SELECT"):
        return None
    # Skip aggregate-only queries (a plain COUNT returns one row anyway)
    if re.search(r'\bCOUNT\s*\(', upper) and "GROUP BY" not in upper:
        return None
    if "LIMIT" not in upper and "FETCH" not in upper and "TOP " not in upper:
        return Issue(
            severity="info",
            rule="missing-limit",
            message="SELECT without LIMIT may return unbounded rows.",
            suggestion="Add LIMIT to prevent returning excessive data.",
        )
    return None


def check_function_on_column(sql: str) -> Optional[Issue]:
    """Detect function calls on columns in WHERE (non-sargable)."""
    where_match = re.search(r'\bWHERE\b(.+?)(?:\bGROUP\b|\bORDER\b|\bLIMIT\b|\bHAVING\b|;|$)',
                            sql, re.IGNORECASE | re.DOTALL)
    if not where_match:
        return None
    where_clause = where_match.group(1)
    non_sargable = re.search(
        r'\b(YEAR|MONTH|DAY|DATE|UPPER|LOWER|TRIM|CAST|COALESCE|IFNULL|NVL)\s*\(',
        where_clause, re.IGNORECASE
    )
    if non_sargable:
        func = non_sargable.group(1).upper()
        return Issue(
            severity="warning",
            rule="non-sargable",
            message=f"Function {func}() on column in WHERE prevents index usage.",
            suggestion="Rewrite to compare the raw column against transformed constants.",
        )
    return None


def check_leading_wildcard(sql: str) -> Optional[Issue]:
    """Detect LIKE '%...' patterns."""
    if re.search(r"LIKE\s+'%", sql, re.IGNORECASE):
        return Issue(
            severity="warning",
            rule="leading-wildcard",
            message="LIKE with leading wildcard prevents index usage.",
            suggestion="Use full-text search (GIN index, FULLTEXT, FTS5) for substring matching.",
        )
    return None


def check_order_by_rand(sql: str) -> Optional[Issue]:
    """Detect ORDER BY RAND() / RANDOM()."""
    if re.search(r'ORDER\s+BY\s+(RAND|RANDOM)\s*\(\)', sql, re.IGNORECASE):
        return Issue(
            severity="warning",
            rule="order-by-rand",
            message="ORDER BY RAND() scans and sorts the entire table.",
            suggestion="Use application-side random sampling or TABLESAMPLE.",
        )
    return None


def check_union_vs_union_all(sql: str) -> Optional[Issue]:
    """Detect UNION without ALL (unnecessary dedup)."""
    if re.search(r'\bUNION\b(?!\s+ALL\b)', sql, re.IGNORECASE):
        return Issue(
            severity="info",
            rule="union-without-all",
            message="UNION performs deduplication sort; use UNION ALL if duplicates are acceptable.",
            suggestion="Replace UNION with UNION ALL unless you specifically need deduplication.",
        )
    return None


def check_not_in_subquery(sql: str) -> Optional[Issue]:
    """Detect NOT IN (SELECT ...) which is NULL-unsafe."""
    if re.search(r'\bNOT\s+IN\s*\(\s*SELECT\b', sql, re.IGNORECASE):
        return Issue(
            severity="warning",
            rule="not-in-subquery",
            message="NOT IN with subquery returns no rows if any subquery result is NULL.",
            suggestion="Use NOT EXISTS (SELECT 1 ...) instead.",
        )
    return None


ALL_CHECKS = [
    check_select_star,
    check_missing_where,
    check_cartesian_join,
    check_subquery_in_select,
    check_missing_limit,
    check_function_on_column,
    check_leading_wildcard,
    check_order_by_rand,
    check_union_vs_union_all,
    check_not_in_subquery,
]


# ---------------------------------------------------------------------------
# Analysis engine
# ---------------------------------------------------------------------------

def analyze_query(sql: str, dialect: str = "postgres") -> QueryAnalysis:
    """Run all checks against a single SQL query.

    The dialect argument is currently unused; it is reserved for
    future dialect-specific checks.
    """
    issues: List[Issue] = []
    for check_fn in ALL_CHECKS:
        issue = check_fn(sql)
        if issue:
            issues.append(issue)

    # Score: start at 100, deduct per severity
    score = 100
    for issue in issues:
        if issue.severity == "critical":
            score -= 25
        elif issue.severity == "warning":
            score -= 10
        else:
            score -= 5
    score = max(0, score)

    return QueryAnalysis(query=sql.strip(), issues=issues, score=score)


def split_queries(text: str) -> List[str]:
    """Split SQL text into individual statements."""
    queries = []
    for stmt in text.split(";"):
        stmt = stmt.strip()
        if stmt and len(stmt) > 5:
            queries.append(stmt + ";")
    return queries


# ---------------------------------------------------------------------------
# Output formatting
# ---------------------------------------------------------------------------

SEVERITY_ICONS = {"critical": "[CRITICAL]", "warning": "[WARNING]", "info": "[INFO]"}


def format_text(analyses: List[QueryAnalysis]) -> str:
    """Format analysis results as human-readable text."""
    lines = []
    for i, analysis in enumerate(analyses, 1):
        lines.append("=" * 60)
        lines.append(f"Query {i} (Score: {analysis.score}/100)")
        lines.append(f"  {analysis.query[:120]}{'...' if len(analysis.query) > 120 else ''}")
        lines.append("")
        if not analysis.issues:
            lines.append("  No issues detected.")
        for issue in analysis.issues:
            icon = SEVERITY_ICONS.get(issue.severity, "")
            lines.append(f"  {icon} {issue.rule}: {issue.message}")
            lines.append(f"     -> {issue.suggestion}")
        lines.append("")
    return "\n".join(lines)


def format_json(analyses: List[QueryAnalysis]) -> str:
    """Format analysis results as JSON."""
    return json.dumps(
        {"analyses": [a.to_dict() for a in analyses], "total_queries": len(analyses)},
        indent=2,
    )


# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------

def main():
    parser = argparse.ArgumentParser(
        description="Analyze SQL queries for common performance issues.",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s --query "SELECT * FROM users"
  %(prog)s --query queries.sql --dialect mysql
  %(prog)s --query "DELETE FROM orders" --json
""",
    )
    parser.add_argument(
        "--query", required=True,
        help="SQL query string or path to a .sql file",
    )
    parser.add_argument(
        "--dialect", choices=["postgres", "mysql", "sqlite", "sqlserver"],
        default="postgres", help="SQL dialect (default: postgres)",
    )
    parser.add_argument(
        "--json", action="store_true", dest="json_output",
        help="Output results as JSON",
    )
    args = parser.parse_args()

    # Determine if query is a file path or inline SQL
    sql_text = args.query
    if os.path.isfile(args.query):
        with open(args.query, "r") as f:
            sql_text = f.read()

    queries = split_queries(sql_text)
    if not queries:
        # Treat the whole input as a single query
        queries = [sql_text.strip()]

    analyses = [analyze_query(q, args.dialect) for q in queries]

    if args.json_output:
        print(format_json(analyses))
    else:
        print(format_text(analyses))


if __name__ == "__main__":
    main()
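The optimizer above is a plain rule engine: each check inspects the SQL text and returns an issue or None, and the score starts at 100 and drops by a fixed penalty per severity. A minimal self-contained sketch of that scoring loop, assuming just two of the rules:

```python
import re

# Penalties mirror analyze_query(): critical -25, warning -10, info -5.
PENALTY = {"critical": 25, "warning": 10, "info": 5}

def check_missing_where(sql):
    upper = sql.upper().strip()
    for kw in ("UPDATE", "DELETE"):
        if upper.startswith(kw) and "WHERE" not in upper:
            return {"severity": "critical", "rule": "missing-where"}
    return None

def check_select_star(sql):
    if re.search(r'\bSELECT\s+\*(?:\s|$)', sql, re.IGNORECASE):
        return {"severity": "warning", "rule": "select-star"}
    return None

def score(sql, checks=(check_missing_where, check_select_star)):
    # Run every check, keep the hits, deduct their penalties from 100.
    issues = [i for i in (c(sql) for c in checks) if i]
    return max(0, 100 - sum(PENALTY[i["severity"]] for i in issues)), issues

print(score("DELETE FROM orders")[0])   # 75
print(score("SELECT * FROM users")[0])  # 90
```

Adding a rule to the real script is the same move: write a function with the `sql -> Optional[Issue]` signature and append it to `ALL_CHECKS`.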
engineering/sql-database-assistant/scripts/schema_explorer.py (new file, 315 lines)
@@ -0,0 +1,315 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Schema Explorer
|
||||
|
||||
Generates schema documentation from database introspection queries.
|
||||
Outputs the introspection SQL and sample documentation templates
|
||||
for PostgreSQL, MySQL, SQLite, and SQL Server.
|
||||
|
||||
Since this tool runs without a live database connection, it generates:
|
||||
1. The introspection queries you need to run
|
||||
2. Documentation templates from the results
|
||||
3. Sample schema docs for common table patterns
|
||||
|
||||
Usage:
|
||||
python schema_explorer.py --dialect postgres --tables all --format md
|
||||
python schema_explorer.py --dialect mysql --tables users,orders --format json
|
||||
python schema_explorer.py --dialect sqlite --tables all --json
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import sys
|
||||
import textwrap
|
||||
from dataclasses import dataclass, asdict
|
||||
from typing import List, Optional, Dict
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Introspection query templates per dialect
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
INTROSPECTION_QUERIES: Dict[str, Dict[str, str]] = {
|
||||
"postgres": {
|
||||
"tables": textwrap.dedent("""\
|
||||
SELECT table_name
|
||||
FROM information_schema.tables
|
||||
WHERE table_schema = 'public' AND table_type = 'BASE TABLE'
|
||||
ORDER BY table_name;"""),
|
||||
"columns": textwrap.dedent("""\
|
||||
SELECT table_name, column_name, data_type, character_maximum_length,
|
||||
is_nullable, column_default
|
||||
FROM information_schema.columns
|
||||
WHERE table_schema = 'public' {table_filter}
|
||||
ORDER BY table_name, ordinal_position;"""),
|
||||
"primary_keys": textwrap.dedent("""\
|
||||
SELECT tc.table_name, kcu.column_name
|
||||
FROM information_schema.table_constraints tc
|
||||
JOIN information_schema.key_column_usage kcu
|
||||
ON tc.constraint_name = kcu.constraint_name
|
||||
WHERE tc.constraint_type = 'PRIMARY KEY' AND tc.table_schema = 'public'
|
||||
ORDER BY tc.table_name;"""),
|
||||
"foreign_keys": textwrap.dedent("""\
|
||||
SELECT tc.table_name, kcu.column_name,
|
||||
ccu.table_name AS foreign_table, ccu.column_name AS foreign_column
|
||||
FROM information_schema.table_constraints tc
|
||||
JOIN information_schema.key_column_usage kcu
|
||||
ON tc.constraint_name = kcu.constraint_name
|
||||
JOIN information_schema.constraint_column_usage ccu
|
||||
ON tc.constraint_name = ccu.constraint_name
|
||||
WHERE tc.constraint_type = 'FOREIGN KEY'
|
||||
ORDER BY tc.table_name;"""),
|
||||
"indexes": textwrap.dedent("""\
|
||||
SELECT schemaname, tablename, indexname, indexdef
|
||||
FROM pg_indexes
|
||||
WHERE schemaname = 'public'
|
||||
ORDER BY tablename, indexname;"""),
|
||||
"table_sizes": textwrap.dedent("""\
|
||||
SELECT relname AS table_name,
|
||||
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
|
||||
pg_size_pretty(pg_relation_size(relid)) AS data_size,
|
||||
pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS index_size
|
||||
FROM pg_catalog.pg_statio_user_tables
|
||||
ORDER BY pg_total_relation_size(relid) DESC;"""),
|
||||
},
|
||||
    "mysql": {
        "tables": textwrap.dedent("""\
            SELECT table_name
            FROM information_schema.tables
            WHERE table_schema = DATABASE() AND table_type = 'BASE TABLE'
            ORDER BY table_name;"""),
        "columns": textwrap.dedent("""\
            SELECT table_name, column_name, column_type, is_nullable,
                   column_default, column_key, extra
            FROM information_schema.columns
            WHERE table_schema = DATABASE() {table_filter}
            ORDER BY table_name, ordinal_position;"""),
        "foreign_keys": textwrap.dedent("""\
            SELECT table_name, column_name, referenced_table_name, referenced_column_name
            FROM information_schema.key_column_usage
            WHERE table_schema = DATABASE() AND referenced_table_name IS NOT NULL
            ORDER BY table_name;"""),
        "indexes": textwrap.dedent("""\
            SELECT table_name, index_name, non_unique, column_name, seq_in_index
            FROM information_schema.statistics
            WHERE table_schema = DATABASE()
            ORDER BY table_name, index_name, seq_in_index;"""),
        "table_sizes": textwrap.dedent("""\
            SELECT table_name, table_rows,
                   ROUND(data_length / 1024 / 1024, 2) AS data_mb,
                   ROUND(index_length / 1024 / 1024, 2) AS index_mb
            FROM information_schema.tables
            WHERE table_schema = DATABASE()
            ORDER BY data_length DESC;"""),
    },
    "sqlite": {
        "tables": textwrap.dedent("""\
            SELECT name FROM sqlite_master
            WHERE type = 'table' AND name NOT LIKE 'sqlite_%'
            ORDER BY name;"""),
        "columns": textwrap.dedent("""\
            -- Run for each table:
            PRAGMA table_info({table_name});"""),
        "foreign_keys": textwrap.dedent("""\
            -- Run for each table:
            PRAGMA foreign_key_list({table_name});"""),
        "indexes": textwrap.dedent("""\
            SELECT name, tbl_name, sql FROM sqlite_master
            WHERE type = 'index'
            ORDER BY tbl_name, name;"""),
        "schema_dump": textwrap.dedent("""\
            SELECT name, sql FROM sqlite_master
            WHERE type = 'table'
            ORDER BY name;"""),
    },
    "sqlserver": {
        "tables": textwrap.dedent("""\
            SELECT TABLE_NAME
            FROM INFORMATION_SCHEMA.TABLES
            WHERE TABLE_TYPE = 'BASE TABLE'
            ORDER BY TABLE_NAME;"""),
        "columns": textwrap.dedent("""\
            SELECT t.name AS table_name, c.name AS column_name,
                   ty.name AS data_type, c.max_length, c.precision, c.scale,
                   c.is_nullable, dc.definition AS default_value
            FROM sys.columns c
            JOIN sys.tables t ON c.object_id = t.object_id
            JOIN sys.types ty ON c.user_type_id = ty.user_type_id
            LEFT JOIN sys.default_constraints dc ON c.default_object_id = dc.object_id
            {table_filter}
            ORDER BY t.name, c.column_id;"""),
        "foreign_keys": textwrap.dedent("""\
            SELECT fk.name AS fk_name,
                   tp.name AS parent_table, cp.name AS parent_column,
                   tr.name AS referenced_table, cr.name AS referenced_column
            FROM sys.foreign_keys fk
            JOIN sys.foreign_key_columns fkc ON fk.object_id = fkc.constraint_object_id
            JOIN sys.tables tp ON fkc.parent_object_id = tp.object_id
            JOIN sys.columns cp ON fkc.parent_object_id = cp.object_id AND fkc.parent_column_id = cp.column_id
            JOIN sys.tables tr ON fkc.referenced_object_id = tr.object_id
            JOIN sys.columns cr ON fkc.referenced_object_id = cr.object_id AND fkc.referenced_column_id = cr.column_id
            ORDER BY tp.name;"""),
        "indexes": textwrap.dedent("""\
            SELECT t.name AS table_name, i.name AS index_name,
                   i.type_desc, i.is_unique, c.name AS column_name,
                   ic.key_ordinal
            FROM sys.indexes i
            JOIN sys.index_columns ic ON i.object_id = ic.object_id AND i.index_id = ic.index_id
            JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
            JOIN sys.tables t ON i.object_id = t.object_id
            WHERE i.name IS NOT NULL
            ORDER BY t.name, i.name, ic.key_ordinal;"""),
    },
}

# ---------------------------------------------------------------------------
# Documentation generators
# ---------------------------------------------------------------------------

SAMPLE_TABLES = {
    "users": {
        "columns": [
            {"name": "id", "type": "SERIAL / INT", "nullable": "NO", "default": "auto", "notes": "Primary key"},
            {"name": "email", "type": "VARCHAR(255)", "nullable": "NO", "default": "-", "notes": "Unique, indexed"},
            {"name": "name", "type": "VARCHAR(255)", "nullable": "YES", "default": "NULL", "notes": "Display name"},
            {"name": "password_hash", "type": "VARCHAR(255)", "nullable": "NO", "default": "-", "notes": "bcrypt hash"},
            {"name": "created_at", "type": "TIMESTAMP", "nullable": "NO", "default": "NOW()", "notes": ""},
            {"name": "updated_at", "type": "TIMESTAMP", "nullable": "NO", "default": "NOW()", "notes": ""},
        ],
        "indexes": ["PRIMARY KEY (id)", "UNIQUE INDEX (email)"],
        "foreign_keys": [],
    },
    "orders": {
        "columns": [
            {"name": "id", "type": "SERIAL / INT", "nullable": "NO", "default": "auto", "notes": "Primary key"},
            {"name": "user_id", "type": "INTEGER", "nullable": "NO", "default": "-", "notes": "FK -> users.id"},
            {"name": "status", "type": "VARCHAR(50)", "nullable": "NO", "default": "'pending'", "notes": "pending/paid/shipped/cancelled"},
            {"name": "total", "type": "DECIMAL(19,4)", "nullable": "NO", "default": "0", "notes": "Order total in cents"},
            {"name": "created_at", "type": "TIMESTAMP", "nullable": "NO", "default": "NOW()", "notes": ""},
        ],
        "indexes": ["PRIMARY KEY (id)", "INDEX (user_id)", "INDEX (status, created_at)"],
        "foreign_keys": ["user_id -> users.id ON DELETE CASCADE"],
    },
}

def generate_md(dialect: str, tables: List[str]) -> str:
    """Generate markdown schema documentation."""
    lines = [f"# Database Schema Documentation ({dialect.upper()})\n"]
    lines.append("Generated by sql-database-assistant schema_explorer.\n")

    # Introspection queries section
    lines.append("## Introspection Queries\n")
    lines.append("Run these queries against your database to extract schema information:\n")
    queries = INTROSPECTION_QUERIES.get(dialect, {})
    for qname, qsql in queries.items():
        table_filter = ""
        if "all" not in tables:
            tlist = ", ".join(f"'{t}'" for t in tables)
            table_filter = f"AND table_name IN ({tlist})"
        qsql = qsql.replace("{table_filter}", table_filter)
        qsql = qsql.replace("{table_name}", tables[0] if tables and tables[0] != "all" else "TABLE_NAME")
        lines.append(f"### {qname.replace('_', ' ').title()}\n")
        lines.append(f"```sql\n{qsql}\n```\n")

    # Sample documentation
    lines.append("## Sample Table Documentation\n")
    lines.append("Below is an example of the documentation format produced from query results:\n")

    show_tables = tables if "all" not in tables else list(SAMPLE_TABLES.keys())
    for tname in show_tables:
        sample = SAMPLE_TABLES.get(tname)
        if not sample:
            lines.append(f"### {tname}\n")
            lines.append("_No sample data available. Run introspection queries above._\n")
            continue

        lines.append(f"### {tname}\n")
        lines.append("| Column | Type | Nullable | Default | Notes |")
        lines.append("|--------|------|----------|---------|-------|")
        for col in sample["columns"]:
            lines.append(f"| {col['name']} | {col['type']} | {col['nullable']} | {col['default']} | {col['notes']} |")
        lines.append("")
        if sample["indexes"]:
            lines.append("**Indexes:** " + ", ".join(sample["indexes"]))
        if sample["foreign_keys"]:
            lines.append("**Foreign Keys:** " + ", ".join(sample["foreign_keys"]))
        lines.append("")

    return "\n".join(lines)

def generate_json_output(dialect: str, tables: List[str]) -> dict:
    """Generate JSON schema documentation."""
    queries = INTROSPECTION_QUERIES.get(dialect, {})
    processed = {}
    for qname, qsql in queries.items():
        table_filter = ""
        if "all" not in tables:
            tlist = ", ".join(f"'{t}'" for t in tables)
            table_filter = f"AND table_name IN ({tlist})"
        processed[qname] = qsql.replace("{table_filter}", table_filter).replace(
            "{table_name}", tables[0] if tables and tables[0] != "all" else "TABLE_NAME"
        )

    show_tables = tables if "all" not in tables else list(SAMPLE_TABLES.keys())
    sample_docs = {}
    for tname in show_tables:
        sample = SAMPLE_TABLES.get(tname)
        if sample:
            sample_docs[tname] = sample

    return {
        "dialect": dialect,
        "requested_tables": tables,
        "introspection_queries": processed,
        "sample_documentation": sample_docs,
        "instructions": "Run the introspection queries against your database, then use the results to populate documentation in the sample format shown.",
    }

# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------

def main():
    parser = argparse.ArgumentParser(
        description="Generate schema documentation from database introspection.",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s --dialect postgres --tables all --format md
  %(prog)s --dialect mysql --tables users,orders --format json
  %(prog)s --dialect sqlite --tables all --json
""",
    )
    parser.add_argument(
        "--dialect", required=True, choices=["postgres", "mysql", "sqlite", "sqlserver"],
        help="Target database dialect",
    )
    parser.add_argument(
        "--tables", default="all",
        help="Comma-separated table names or 'all' (default: all)",
    )
    parser.add_argument(
        "--format", choices=["md", "json"], default="md", dest="fmt",
        help="Output format (default: md)",
    )
    parser.add_argument(
        "--json", action="store_true", dest="json_output",
        help="Output as JSON (overrides --format)",
    )
    args = parser.parse_args()

    tables = [t.strip() for t in args.tables.split(",")]

    if args.json_output or args.fmt == "json":
        result = generate_json_output(args.dialect, tables)
        print(json.dumps(result, indent=2))
    else:
        print(generate_md(args.dialect, tables))


if __name__ == "__main__":
    main()
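The placeholder substitution that `generate_md` and `generate_json_output` apply to each introspection query can be illustrated in isolation. A minimal standalone sketch, where the template is abbreviated from the MySQL columns query and the table names are purely illustrative:

```python
# Standalone sketch of the {table_filter} substitution performed by
# generate_md / generate_json_output. The template is trimmed from the
# MySQL "columns" query; "users" and "orders" are hypothetical names.
TEMPLATE = (
    "SELECT table_name, column_name\n"
    "FROM information_schema.columns\n"
    "WHERE table_schema = DATABASE() {table_filter}\n"
    "ORDER BY table_name, ordinal_position;"
)


def render(template: str, tables: list) -> str:
    """Inline an AND table_name IN (...) filter unless 'all' was requested."""
    table_filter = ""
    if "all" not in tables:
        tlist = ", ".join(f"'{t}'" for t in tables)
        table_filter = f"AND table_name IN ({tlist})"
    return template.replace("{table_filter}", table_filter)


print(render(TEMPLATE, ["users", "orders"]))
print(render(TEMPLATE, ["all"]))
```

Requesting specific tables yields `... AND table_name IN ('users', 'orders')`, while `all` leaves the WHERE clause unfiltered, matching how the CLI's `--tables` flag behaves.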