feat(engineering,ra-qm): add secrets-vault-manager, sql-database-assistant, gcp-cloud-architect, soc2-compliance

secrets-vault-manager (403-line SKILL.md, 3 scripts, 3 references):
- HashiCorp Vault, AWS SM, Azure KV, GCP SM integration
- Secret rotation, dynamic secrets, audit logging, emergency procedures

sql-database-assistant (457-line SKILL.md, 3 scripts, 3 references):
- Query optimization, migration generation, schema exploration
- Multi-DB support (PostgreSQL, MySQL, SQLite, SQL Server)
- ORM patterns (Prisma, Drizzle, TypeORM, SQLAlchemy)

gcp-cloud-architect (418-line SKILL.md, 3 scripts, 3 references):
- 6-step workflow mirroring aws-solution-architect for GCP
- Cloud Run, GKE, BigQuery, Cloud Functions, cost optimization
- Completes cloud trifecta (AWS + Azure + GCP)

soc2-compliance (417-line SKILL.md, 3 scripts, 3 references):
- SOC 2 Type I & II preparation, Trust Service Criteria mapping
- Control matrix generation, evidence tracking, gap analysis
- First SOC 2 skill in ra-qm-team (joins GDPR, ISO 27001, ISO 13485)

All 12 scripts pass --help. Docs generated, mkdocs.yml nav updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author: Reza Rezvani
Date: 2026-03-25 14:05:11 +01:00
Commit: 87f3a007c9 (parent 7a2189fa21)
36 changed files with 13450 additions and 6 deletions

# Query Optimization Guide
How to read EXPLAIN plans, choose the right index types, understand query plan operators, and configure connection pooling.
---
## Reading EXPLAIN Plans
### PostgreSQL — EXPLAIN ANALYZE
```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT * FROM orders WHERE status = 'paid' ORDER BY created_at DESC LIMIT 20;
```
**Sample output:**
```
Limit  (cost=0.43..12.87 rows=20 width=128) (actual time=0.052..0.089 rows=20 loops=1)
  ->  Index Scan Backward using idx_orders_status_created on orders  (cost=0.43..4521.33 rows=7284 width=128) (actual time=0.051..0.085 rows=20 loops=1)
        Index Cond: (status = 'paid')
        Buffers: shared hit=4
Planning Time: 0.156 ms
Execution Time: 0.112 ms
```
**Key fields to check:**
| Field | What it tells you |
|-------|-------------------|
| `cost` | Estimated startup..total cost (arbitrary units) |
| `rows` | Estimated row count at that node |
| `actual time` | Real wall-clock time in milliseconds |
| `actual rows` | Real row count — compare against estimate |
| `Buffers: shared hit` | Pages read from cache (good) |
| `Buffers: shared read` | Pages read from disk (slow) |
| `loops` | How many times the node executed |
**Red flags:**
- `Seq Scan` on a large table with a WHERE clause — missing index
- `actual rows` >> `rows` (estimated) — stale statistics, run `ANALYZE`
- `Nested Loop` with high loop count — consider hash join or add index
- `Sort` with `external merge` — not enough `work_mem`, spilling to disk
- `Buffers: shared read` much higher than `shared hit` — cold cache or table too large for memory
### MySQL — EXPLAIN FORMAT=JSON
```sql
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE status = 'paid' ORDER BY created_at DESC LIMIT 20;
```
**Key fields:**
- `query_block.select_id` — identifies subqueries
- `table.access_type` — `ALL` (full scan), `ref` (index lookup), `range`, `index`, `const`
- `table.rows_examined_per_scan` — how many rows the engine reads
- `table.using_index` — covering index (no table lookup needed)
- `table.attached_condition` — the WHERE filter applied
**Access types ranked (best to worst):**
`system` > `const` > `eq_ref` > `ref` > `range` > `index` > `ALL`
---
## Index Types
### B-tree (default)
The workhorse index. Supports equality, range, prefix, and ORDER BY operations.
**Best for:** `=`, `<`, `>`, `<=`, `>=`, `BETWEEN`, `LIKE 'prefix%'`, `ORDER BY`, `MIN()`, `MAX()`
```sql
CREATE INDEX idx_orders_created ON orders (created_at);
```
**Composite B-tree:** Column order matters. The index is useful for queries that filter on a leftmost prefix of the indexed columns.
```sql
-- This index serves: WHERE status = ... AND created_at > ...
-- Also serves: WHERE status = ...
-- Does NOT serve: WHERE created_at > ... (without status)
CREATE INDEX idx_orders_status_created ON orders (status, created_at);
```
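The leftmost-prefix rule is easy to verify empirically. Here is a runnable sketch using SQLite via Python's `sqlite3` (table and data invented; SQLite's planner wording differs from PostgreSQL's, but the search-vs-scan distinction is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT, total REAL)")
conn.execute("CREATE INDEX idx_orders_status_created ON orders (status, created_at)")

def plan(query: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row describes the access path
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))

# Filters on the leftmost column: the composite index is used
p1 = plan("SELECT * FROM orders WHERE status = 'paid' AND created_at > '2025-01-01'")
# Skips the leftmost column: falls back to a full table scan
p2 = plan("SELECT * FROM orders WHERE created_at > '2025-01-01'")

print(p1)  # SEARCH ... USING INDEX idx_orders_status_created ...
print(p2)  # SCAN ...
```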
### Hash
Equality-only lookups. Faster than B-tree for exact matches but no range support.
**Best for:** `=` lookups on high-cardinality columns
```sql
-- PostgreSQL
CREATE INDEX idx_sessions_token ON sessions USING hash (token);
```
**Limitations:** No range queries, no ORDER BY, not WAL-logged before PostgreSQL 10.
### GIN (Generalized Inverted Index)
For multi-valued data: arrays, JSONB, full-text search vectors.
```sql
-- JSONB containment
CREATE INDEX idx_products_tags ON products USING gin (tags);
-- Query: SELECT * FROM products WHERE tags @> '["sale"]';
-- Full-text search
CREATE INDEX idx_articles_search ON articles USING gin (to_tsvector('english', title || ' ' || body));
```
### GiST (Generalized Search Tree)
For geometric, range, and proximity data.
```sql
-- Range type (e.g., date ranges)
CREATE INDEX idx_bookings_period ON bookings USING gist (during);
-- Query: SELECT * FROM bookings WHERE during && '[2025-01-01, 2025-01-31]';
-- PostGIS geometry
CREATE INDEX idx_locations_geom ON locations USING gist (geom);
```
### BRIN (Block Range INdex)
Tiny index for naturally ordered data (e.g., time-series append-only tables).
```sql
CREATE INDEX idx_events_created ON events USING brin (created_at);
```
**Best for:** Large tables where the indexed column correlates with physical row order. Much smaller than B-tree but less precise.
### Partial Index
Index only rows matching a condition. Smaller and faster for targeted queries.
```sql
-- Only index active users (skip millions of inactive)
CREATE INDEX idx_users_active_email ON users (email) WHERE status = 'active';
```
### Covering Index (INCLUDE)
Store extra columns in the index to avoid table lookups (index-only scans).
```sql
-- PostgreSQL 11+
CREATE INDEX idx_orders_status ON orders (status) INCLUDE (total, created_at);
-- Query can be answered entirely from the index:
-- SELECT total, created_at FROM orders WHERE status = 'paid';
```
### Expression Index
Index the result of a function or expression.
```sql
CREATE INDEX idx_users_lower_email ON users (LOWER(email));
-- Query: SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
```
---
## Query Plan Operators
### Scan operators
| Operator | Description | Performance |
|----------|-------------|-------------|
| **Seq Scan** | Full table scan, reads every row | Slow on large tables |
| **Index Scan** | B-tree lookup + table fetch | Fast for selective queries |
| **Index Only Scan** | Reads only the index (covering) | Fastest for covered queries |
| **Bitmap Index Scan** | Builds a bitmap of matching pages | Good for medium selectivity |
| **Bitmap Heap Scan** | Fetches pages identified by bitmap | Pairs with bitmap index scan |
### Join operators
| Operator | Description | Best when |
|----------|-------------|-----------|
| **Nested Loop** | For each outer row, scan inner | Small outer set, indexed inner |
| **Hash Join** | Build hash table on inner, probe with outer | Medium-large sets, no index |
| **Merge Join** | Merge two sorted inputs | Both inputs already sorted |
### Other operators
| Operator | Description |
|----------|-------------|
| **Sort** | Sorts rows (may spill to disk if work_mem exceeded) |
| **Hash Aggregate** | GROUP BY using hash table |
| **Group Aggregate** | GROUP BY on pre-sorted input |
| **Limit** | Stops after N rows |
| **Materialize** | Caches subquery results in memory |
| **Gather / Gather Merge** | Collects results from parallel workers |
---
## Connection Pooling
### Why pool connections?
Each database connection consumes memory (5-10 MB in PostgreSQL). Without pooling:
- Application creates a new connection per request (slow: TCP + TLS + auth)
- Under load, connection count spikes past `max_connections`
- Database OOM or connection refused errors
### PgBouncer (PostgreSQL)
The standard external connection pooler for PostgreSQL.
**Modes:**
- **Session** — connection assigned for entire client session (safest, least efficient)
- **Transaction** — connection returned to pool after each transaction (recommended)
- **Statement** — connection returned after each statement (cannot use transactions)
```ini
# pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 300
```
**Sizing formula:**
```
default_pool_size = num_cpu_cores * 2 + effective_spindle_count
```
For SSDs, start with `num_cpu_cores * 2`; a pool of 4-16 connections is typically optimal.
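As a worked example, the heuristic can be expressed as a small helper (the function name is illustrative, not part of any pooler's API):

```python
def suggested_pool_size(cpu_cores: int, spindle_count: int = 0) -> int:
    """Classic PostgreSQL sizing heuristic: cores * 2 + effective spindles.

    For SSD-backed hosts the spindle term is usually taken as 0 or 1.
    """
    return cpu_cores * 2 + spindle_count

print(suggested_pool_size(8))     # 16 for an 8-core host on SSDs
print(suggested_pool_size(4, 2))  # 10 for 4 cores plus a 2-spindle array
```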
### ProxySQL (MySQL)
```ini
mysql_servers = ({ address="127.0.0.1", port=3306, hostgroup=0, max_connections=100 })
mysql_query_rules = ({ rule_id=1, match_pattern="^SELECT.*FOR UPDATE", destination_hostgroup=0 })
```
### Application-Level Pooling
Most ORMs and drivers include built-in pooling:
| Platform | Pool Configuration |
|----------|--------------------|
| **node-postgres** | `new Pool({ max: 20, idleTimeoutMillis: 30000 })` |
| **SQLAlchemy** | `create_engine(url, pool_size=20, max_overflow=5)` |
| **HikariCP (Java)** | `maximumPoolSize=20, minimumIdle=5, idleTimeout=300000` |
| **Prisma** | `connection_limit=20` in connection string |
### Pool Sizing Guidelines
| Metric | Guideline |
|--------|-----------|
| **Minimum** | Number of always-active background workers |
| **Maximum** | 2-4x CPU cores for OLTP; lower for OLAP |
| **Idle timeout** | 30-300 seconds (reclaim unused connections) |
| **Connection timeout** | 3-10 seconds (fail fast under pressure) |
| **Queue size** | 2-5x pool max (buffer bursts before rejecting) |
**Warning:** More connections does not mean better performance. Beyond the optimal point (usually 20-50), contention on locks, CPU, and I/O causes throughput to decrease.
---
## Statistics and Maintenance
### PostgreSQL
```sql
-- Update statistics for the query planner
ANALYZE orders;
ANALYZE; -- All tables
-- Check table bloat and dead tuples
SELECT relname, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables ORDER BY n_dead_tup DESC;
-- Identify unused indexes
SELECT indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexrelname NOT LIKE '%pkey%'
ORDER BY pg_relation_size(indexrelid) DESC;
```
### MySQL
```sql
-- Update statistics
ANALYZE TABLE orders;
-- Check index usage
SELECT * FROM sys.schema_unused_indexes;
SELECT * FROM sys.schema_redundant_indexes;
-- Identify long-running queries
SELECT * FROM information_schema.processlist WHERE time > 10;
```
---
## Performance Checklist
Before deploying any query to production:
1. Run `EXPLAIN ANALYZE` and verify no unexpected sequential scans
2. Check that estimated rows are within 10x of actual rows
3. Verify index usage on all WHERE, JOIN, and ORDER BY columns
4. Ensure LIMIT is present for user-facing list queries
5. Confirm parameterized queries (no string concatenation)
6. Test with production-like data volume (not just 10 rows)
7. Monitor query time in application metrics after deployment
8. Set up slow query log alerting (> 100ms for OLTP, > 5s for reports)
---
## Quick Reference: When to Use Which Index
| Query Pattern | Index Type |
|--------------|-----------|
| `WHERE col = value` | B-tree or Hash |
| `WHERE col > value` | B-tree |
| `WHERE col LIKE 'prefix%'` | B-tree |
| `WHERE col LIKE '%substring%'` | GIN (full-text) or trigram |
| `WHERE jsonb_col @> '{...}'` | GIN |
| `WHERE array_col && ARRAY[...]` | GIN |
| `WHERE range_col && '[a,b]'` | GiST |
| `WHERE ST_DWithin(geom, ...)` | GiST |
| `WHERE col = value` (append-only) | BRIN |
| `WHERE col = value AND status = 'active'` | Partial B-tree |
| `SELECT a, b WHERE c = value` | Covering (INCLUDE) |

View File

@@ -0,0 +1,451 @@
# ORM Patterns Reference
Side-by-side comparison of Prisma, Drizzle, TypeORM, and SQLAlchemy patterns for common database operations.
---
## Schema Definition
### Prisma (schema.prisma)
```prisma
model User {
id Int @id @default(autoincrement())
email String @unique
name String?
role Role @default(USER)
posts Post[]
profile Profile?
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
@@index([email])
@@map("users")
}
model Post {
id Int @id @default(autoincrement())
title String
body String?
published Boolean @default(false)
author User @relation(fields: [authorId], references: [id], onDelete: Cascade)
authorId Int
tags Tag[]
createdAt DateTime @default(now())
@@index([authorId])
@@index([published, createdAt])
@@map("posts")
}
enum Role {
USER
ADMIN
MODERATOR
}
```
### Drizzle (schema.ts)
```typescript
import { pgTable, serial, varchar, text, boolean, timestamp, integer, pgEnum, index } from 'drizzle-orm/pg-core';
export const roleEnum = pgEnum('role', ['USER', 'ADMIN', 'MODERATOR']);
export const users = pgTable('users', {
id: serial('id').primaryKey(),
email: varchar('email', { length: 255 }).notNull().unique(),
name: varchar('name', { length: 255 }),
role: roleEnum('role').default('USER').notNull(),
createdAt: timestamp('created_at').defaultNow().notNull(),
updatedAt: timestamp('updated_at').defaultNow().notNull(),
});
export const posts = pgTable('posts', {
id: serial('id').primaryKey(),
title: varchar('title', { length: 255 }).notNull(),
body: text('body'),
published: boolean('published').default(false).notNull(),
authorId: integer('author_id').notNull().references(() => users.id, { onDelete: 'cascade' }),
createdAt: timestamp('created_at').defaultNow().notNull(),
}, (table) => ({
authorIdx: index('idx_posts_author').on(table.authorId),
publishedIdx: index('idx_posts_published').on(table.published, table.createdAt),
}));
```
### TypeORM (entities)
```typescript
import { Entity, PrimaryGeneratedColumn, Column, ManyToOne, OneToMany, CreateDateColumn, UpdateDateColumn, Index } from 'typeorm';
export enum Role { USER = 'USER', ADMIN = 'ADMIN', MODERATOR = 'MODERATOR' }
@Entity('users')
export class User {
@PrimaryGeneratedColumn()
id: number;
@Column({ unique: true })
@Index()
email: string;
@Column({ nullable: true })
name: string;
@Column({ type: 'enum', enum: Role, default: Role.USER })
role: Role;
@OneToMany(() => Post, post => post.author)
posts: Post[];
@CreateDateColumn()
createdAt: Date;
@UpdateDateColumn()
updatedAt: Date;
}
@Entity('posts')
@Index(['published', 'createdAt'])
export class Post {
@PrimaryGeneratedColumn()
id: number;
@Column()
title: string;
@Column({ nullable: true, type: 'text' })
body: string;
@Column({ default: false })
published: boolean;
@ManyToOne(() => User, user => user.posts, { onDelete: 'CASCADE' })
author: User;
@Column()
authorId: number;
@CreateDateColumn()
createdAt: Date;
}
```
### SQLAlchemy (models.py)
```python
import enum
from datetime import datetime
from sqlalchemy import Column, Integer, String, Text, Boolean, DateTime, Enum, ForeignKey, Index
from sqlalchemy.orm import relationship, DeclarativeBase
class Base(DeclarativeBase):
pass
class Role(enum.Enum):
USER = "USER"
ADMIN = "ADMIN"
MODERATOR = "MODERATOR"
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True, autoincrement=True)
email = Column(String(255), unique=True, nullable=False, index=True)
name = Column(String(255), nullable=True)
role = Column(Enum(Role), default=Role.USER, nullable=False)
posts = relationship('Post', back_populates='author', cascade='all, delete-orphan')
created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow, nullable=False)
class Post(Base):
__tablename__ = 'posts'
__table_args__ = (
Index('idx_posts_published', 'published', 'created_at'),
)
id = Column(Integer, primary_key=True, autoincrement=True)
title = Column(String(255), nullable=False)
body = Column(Text, nullable=True)
published = Column(Boolean, default=False, nullable=False)
author_id = Column(Integer, ForeignKey('users.id', ondelete='CASCADE'), nullable=False, index=True)
author = relationship('User', back_populates='posts')
created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
```
---
## CRUD Operations
### Create
| ORM | Pattern |
|-----|---------|
| **Prisma** | `await prisma.user.create({ data: { email, name } })` |
| **Drizzle** | `await db.insert(users).values({ email, name }).returning()` |
| **TypeORM** | `await userRepo.save(userRepo.create({ email, name }))` |
| **SQLAlchemy** | `session.add(User(email=email, name=name)); session.commit()` |
### Read (with filter)
| ORM | Pattern |
|-----|---------|
| **Prisma** | `await prisma.user.findMany({ where: { role: 'ADMIN' }, orderBy: { createdAt: 'desc' } })` |
| **Drizzle** | `await db.select().from(users).where(eq(users.role, 'ADMIN')).orderBy(desc(users.createdAt))` |
| **TypeORM** | `await userRepo.find({ where: { role: Role.ADMIN }, order: { createdAt: 'DESC' } })` |
| **SQLAlchemy** | `session.query(User).filter(User.role == Role.ADMIN).order_by(User.created_at.desc()).all()` |
### Update
| ORM | Pattern |
|-----|---------|
| **Prisma** | `await prisma.user.update({ where: { id }, data: { name } })` |
| **Drizzle** | `await db.update(users).set({ name }).where(eq(users.id, id))` |
| **TypeORM** | `await userRepo.update(id, { name })` |
| **SQLAlchemy** | `session.query(User).filter(User.id == id).update({User.name: name}); session.commit()` |
### Delete
| ORM | Pattern |
|-----|---------|
| **Prisma** | `await prisma.user.delete({ where: { id } })` |
| **Drizzle** | `await db.delete(users).where(eq(users.id, id))` |
| **TypeORM** | `await userRepo.delete(id)` |
| **SQLAlchemy** | `session.query(User).filter(User.id == id).delete(); session.commit()` |
---
## Relations and Eager Loading
### Prisma — include / select
```typescript
// Eager load posts with user
const user = await prisma.user.findUnique({
where: { id: 1 },
include: { posts: { where: { published: true }, orderBy: { createdAt: 'desc' } } },
});
// Nested create
await prisma.user.create({
data: {
email: 'new@example.com',
posts: { create: [{ title: 'First post' }] },
},
});
```
### Drizzle — relational queries
```typescript
const result = await db.query.users.findFirst({
where: eq(users.id, 1),
with: { posts: { where: eq(posts.published, true), orderBy: [desc(posts.createdAt)] } },
});
```
### TypeORM — relations / query builder
```typescript
// FindOptions
const user = await userRepo.findOne({ where: { id: 1 }, relations: ['posts'] });
// QueryBuilder for complex joins
const result = await userRepo.createQueryBuilder('u')
.leftJoinAndSelect('u.posts', 'p', 'p.published = :pub', { pub: true })
.where('u.id = :id', { id: 1 })
.getOne();
```
### SQLAlchemy — joinedload / selectinload
```python
from sqlalchemy.orm import joinedload, selectinload
# Eager load in one JOIN query
user = session.query(User).options(joinedload(User.posts)).filter(User.id == 1).first()
# Eager load in a separate IN query (better for collections)
users = session.query(User).options(selectinload(User.posts)).all()
```
---
## Raw SQL Escape Hatches
Every ORM should provide a way to execute raw SQL for complex queries:
| ORM | Pattern |
|-----|---------|
| **Prisma** | `` prisma.$queryRaw`SELECT * FROM users WHERE id = ${id}` `` |
| **Drizzle** | `` db.execute(sql`SELECT * FROM users WHERE id = ${id}`) `` |
| **TypeORM** | `dataSource.query('SELECT * FROM users WHERE id = $1', [id])` |
| **SQLAlchemy** | `session.execute(text('SELECT * FROM users WHERE id = :id'), {'id': id})` |
Always use parameterized queries in raw SQL to prevent injection.
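A minimal demonstration with Python's `sqlite3` (table and values invented) of why binding matters: the driver sends the value separately from the SQL text, so a hostile string can never change the query's shape:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("alice@example.com",))

# Safe: the value is bound as data, not spliced into the SQL string
user_id = 1
row = conn.execute("SELECT email FROM users WHERE id = ?", (user_id,)).fetchone()
print(row[0])  # alice@example.com

# A classic injection payload stays inert data and matches nothing
evil = "1 OR 1=1"
rows = conn.execute("SELECT email FROM users WHERE id = ?", (evil,)).fetchall()
print(rows)  # the string '1 OR 1=1' never equals an integer id
```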
---
## Transaction Patterns
### Prisma
```typescript
await prisma.$transaction(async (tx) => {
const user = await tx.user.create({ data: { email } });
await tx.post.create({ data: { title: 'Welcome', authorId: user.id } });
});
```
### Drizzle
```typescript
await db.transaction(async (tx) => {
const [user] = await tx.insert(users).values({ email }).returning();
await tx.insert(posts).values({ title: 'Welcome', authorId: user.id });
});
```
### TypeORM
```typescript
await dataSource.transaction(async (manager) => {
const user = await manager.save(User, { email });
await manager.save(Post, { title: 'Welcome', authorId: user.id });
});
```
### SQLAlchemy
```python
with Session() as session:
try:
user = User(email=email)
session.add(user)
session.flush() # Get user.id without committing
session.add(Post(title='Welcome', author_id=user.id))
session.commit()
except Exception:
session.rollback()
raise
```
---
## Migration Workflows
### Prisma
```bash
# Generate migration from schema changes
npx prisma migrate dev --name add_posts_table
# Apply in production
npx prisma migrate deploy
# Reset database (dev only)
npx prisma migrate reset
# Generate client after schema change
npx prisma generate
```
**Files:** `prisma/migrations/<timestamp>_<name>/migration.sql`
### Drizzle
```bash
# Generate migration SQL from schema diff (older drizzle-kit: generate:pg)
npx drizzle-kit generate
# Push schema directly (dev only, no migration files)
npx drizzle-kit push
# Apply migrations
npx drizzle-kit migrate
```
**Files:** `drizzle/<timestamp>_<name>.sql`
### TypeORM
```bash
# Auto-generate migration from entity changes (TypeORM 0.3+ takes a path, not -n)
npx typeorm migration:generate src/migrations/AddPostsTable -d data-source.ts
# Create empty migration
npx typeorm migration:create src/migrations/CustomMigration
# Run pending migrations
npx typeorm migration:run -d data-source.ts
# Revert last migration
npx typeorm migration:revert -d data-source.ts
```
**Files:** `src/migrations/<timestamp>-<Name>.ts`
### SQLAlchemy (Alembic)
```bash
# Initialize Alembic
alembic init alembic
# Auto-generate migration from model changes
alembic revision --autogenerate -m "add posts table"
# Apply all pending
alembic upgrade head
# Revert one step
alembic downgrade -1
# Show current state
alembic current
```
**Files:** `alembic/versions/<hash>_<slug>.py`
---
## N+1 Prevention Cheat Sheet
| ORM | Lazy (N+1 risk) | Eager (fixed) |
|-----|-----------------|---------------|
| **Prisma** | Per-row queries inside a loop | `include: { posts: true }` |
| **Drizzle** | Separate queries | `with: { posts: true }` |
| **TypeORM** | `@ManyToOne(() => ..., { lazy: true })` | `relations: ['posts']` or `leftJoinAndSelect` |
| **SQLAlchemy** | Default `lazy='select'` | `joinedload()` or `selectinload()` |
**Rule of thumb:** If you access a relation inside a loop, you have an N+1 problem. Always load relations before the loop.
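What the eager column does under the hood can be sketched with plain `sqlite3` (schema and rows invented): count the queries the lazy pattern issues versus the single batched `IN` query an ORM generates for `selectinload`/`with`/`include`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'ada'), (2, 'grace'), (3, 'joan');
    INSERT INTO posts VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

# N+1: one query for the users, then one more per user inside the loop
queries = 0
posts_by_user = {}
for (uid,) in conn.execute("SELECT id FROM users"):
    queries += 1
    posts_by_user[uid] = conn.execute(
        "SELECT title FROM posts WHERE author_id = ?", (uid,)).fetchall()
print(queries)  # one extra query per user

# Fixed: load every relation in a single IN query before the loop
batched = conn.execute(
    "SELECT author_id, title FROM posts WHERE author_id IN (1, 2, 3)").fetchall()
# one query, regardless of how many users there are
```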
---
## Connection Pooling
### Prisma
```
# In .env or connection string
DATABASE_URL="postgresql://user:pass@host/db?connection_limit=20&pool_timeout=10"
```
### Drizzle (with node-postgres)
```typescript
import { Pool } from 'pg';
import { drizzle } from 'drizzle-orm/node-postgres';
const pool = new Pool({ max: 20, idleTimeoutMillis: 30000, connectionTimeoutMillis: 5000 });
const db = drizzle(pool);
```
### TypeORM
```typescript
const dataSource = new DataSource({
type: 'postgres',
extra: { max: 20, idleTimeoutMillis: 30000 },
});
```
### SQLAlchemy
```python
from sqlalchemy import create_engine
engine = create_engine('postgresql://user:pass@host/db', pool_size=20, max_overflow=5, pool_timeout=30)
```
---
## Best Practices Summary
1. **Always use migrations** — never modify production schemas by hand
2. **Eager load relations** — prevent N+1 in every list/collection query
3. **Use transactions** — group related writes to maintain consistency
4. **Parameterize raw SQL** — never concatenate user input into queries
5. **Connection pooling** — configure pool size matching your workload
6. **Index foreign keys** — ORMs often skip this; add manually if needed
7. **Review generated SQL** — enable query logging in development to catch inefficiencies
8. **Type-safe queries** — leverage TypeScript/Python typing for compile-time checks
9. **Separate read/write models** — use views or read replicas for heavy reporting queries
10. **Test migrations both ways** — always verify that down migrations actually reverse up migrations

# SQL Query Patterns Reference
Common query patterns for everyday database operations. All examples use PostgreSQL syntax with dialect notes where they differ.
---
## JOIN Patterns
### INNER JOIN — matching rows in both tables
```sql
SELECT u.name, o.id AS order_id, o.total
FROM users u
INNER JOIN orders o ON o.user_id = u.id
WHERE o.status = 'paid';
```
### LEFT JOIN — all rows from left, matching from right
```sql
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;
```
Returns users even if they have zero orders.
### Self JOIN — comparing rows within the same table
```sql
-- Find employees who earn more than their manager
SELECT e.name AS employee, m.name AS manager, e.salary, m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;
```
### CROSS JOIN — every combination (cartesian product)
```sql
-- Generate a calendar grid
SELECT d.date, s.shift_name
FROM dates d
CROSS JOIN shifts s;
```
Use intentionally. Accidental cartesian joins are a performance killer.
### LATERAL JOIN (PostgreSQL) — correlated subquery as a table
```sql
-- Top 3 orders per user
SELECT u.name, top_orders.*
FROM users u
CROSS JOIN LATERAL (
SELECT id, total FROM orders
WHERE user_id = u.id
ORDER BY total DESC LIMIT 3
) top_orders;
```
MySQL 8.0.14+ also supports `LATERAL`; on older versions, emulate it with a `ROW_NUMBER()` subquery.
---
## Common Table Expressions (CTEs)
### Basic CTE — readable subquery
```sql
WITH active_users AS (
SELECT id, name, email
FROM users
WHERE last_login > CURRENT_DATE - INTERVAL '30 days'
)
SELECT au.name, COUNT(o.id) AS recent_orders
FROM active_users au
JOIN orders o ON o.user_id = au.id
GROUP BY au.name;
```
### Multiple CTEs — chaining transformations
```sql
WITH monthly_revenue AS (
SELECT DATE_TRUNC('month', created_at) AS month, SUM(total) AS revenue
FROM orders WHERE status = 'paid'
GROUP BY 1
),
growth AS (
SELECT month, revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
ROUND((revenue - LAG(revenue) OVER (ORDER BY month)) / LAG(revenue) OVER (ORDER BY month) * 100, 1) AS growth_pct
FROM monthly_revenue
)
SELECT * FROM growth ORDER BY month;
```
### Recursive CTE — hierarchical data
```sql
-- Organization tree
WITH RECURSIVE org_tree AS (
-- Base case: top-level managers
SELECT id, name, manager_id, 0 AS depth
FROM employees WHERE manager_id IS NULL
UNION ALL
-- Recursive case: subordinates
SELECT e.id, e.name, e.manager_id, ot.depth + 1
FROM employees e
JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT * FROM org_tree ORDER BY depth, name;
```
### Recursive CTE — path traversal
```sql
-- Category breadcrumb
WITH RECURSIVE breadcrumb AS (
SELECT id, name, parent_id, name::TEXT AS path
FROM categories WHERE id = 42
UNION ALL
SELECT c.id, c.name, c.parent_id, c.name || ' > ' || b.path
FROM categories c
JOIN breadcrumb b ON c.id = b.parent_id
)
SELECT path FROM breadcrumb WHERE parent_id IS NULL;
```
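The organization-tree CTE above runs unchanged on SQLite (`WITH RECURSIVE` is supported since 3.8.3), which makes it easy to test; the sample employees are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'ceo', NULL), (2, 'vp', 1), (3, 'eng', 2), (4, 'eng2', 2);
""")

rows = conn.execute("""
    WITH RECURSIVE org_tree AS (
        SELECT id, name, manager_id, 0 AS depth
        FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, e.manager_id, ot.depth + 1
        FROM employees e JOIN org_tree ot ON e.manager_id = ot.id
    )
    SELECT name, depth FROM org_tree ORDER BY depth, name
""").fetchall()
print(rows)  # each employee paired with its depth in the tree
```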
---
## Window Functions
### ROW_NUMBER — assign unique rank per partition
```sql
SELECT *, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
FROM employees;
```
### RANK and DENSE_RANK — handle ties
```sql
-- RANK: 1, 2, 2, 4 (skips after tie)
-- DENSE_RANK: 1, 2, 2, 3 (no skip)
SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
```
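A quick check of the tie-handling difference, runnable with Python's bundled SQLite (window functions require SQLite 3.25+; the rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INTEGER);
    INSERT INTO employees VALUES ('a', 300), ('b', 200), ('c', 200), ('d', 100);
""")

rows = conn.execute("""
    SELECT name,
           RANK() OVER (ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk
    FROM employees
    ORDER BY salary DESC, name
""").fetchall()
# RANK skips to 4 after the tie at 2; DENSE_RANK continues with 3
print(rows)
```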
### Running total and moving average
```sql
SELECT date, amount,
SUM(amount) OVER (ORDER BY date) AS running_total,
AVG(amount) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg_7d
FROM daily_revenue;
```
### LAG / LEAD — access adjacent rows
```sql
SELECT date, revenue,
LAG(revenue, 1) OVER (ORDER BY date) AS prev_day,
revenue - LAG(revenue, 1) OVER (ORDER BY date) AS day_over_day_change
FROM daily_revenue;
```
### NTILE — divide into buckets
```sql
-- Split customers into quartiles by total spend
SELECT customer_id, total_spend,
NTILE(4) OVER (ORDER BY total_spend DESC) AS spend_quartile
FROM customer_summary;
```
### FIRST_VALUE / LAST_VALUE
```sql
SELECT department_id, name, salary,
  FIRST_VALUE(name) OVER (PARTITION BY department_id ORDER BY salary DESC) AS highest_paid,
  -- LAST_VALUE needs an explicit frame: the default frame ends at the current row
  LAST_VALUE(name) OVER (PARTITION BY department_id ORDER BY salary DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lowest_paid
FROM employees;
```
---
## Subquery Patterns
### EXISTS — correlated existence check
```sql
-- Users who have placed at least one order
SELECT u.* FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
```
### NOT EXISTS — safer than NOT IN for NULLs
```sql
-- Users who have never ordered
SELECT u.* FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
```
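The NULL trap is easy to reproduce (`sqlite3`, invented rows): a single orphaned NULL in the subquery silently empties the `NOT IN` result, while `NOT EXISTS` keeps working:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1), (2), (3);
    -- one order has a NULL user_id
    INSERT INTO orders VALUES (1, 1), (2, NULL);
""")

# NOT IN: the NULL makes every comparison UNKNOWN, so nothing qualifies
not_in = conn.execute(
    "SELECT id FROM users WHERE id NOT IN (SELECT user_id FROM orders)").fetchall()
print(not_in)  # silently empty

# NOT EXISTS: the NULL row simply never matches, so the query behaves as intended
not_exists = conn.execute("""
    SELECT id FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id)
""").fetchall()
print(not_exists)  # the users who truly have no orders
```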
### Scalar subquery — single value
```sql
SELECT name, salary,
salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
```
### Derived table — subquery in FROM
```sql
SELECT dept, avg_salary
FROM (
SELECT department_id AS dept, AVG(salary) AS avg_salary
FROM employees GROUP BY department_id
) dept_avg
WHERE avg_salary > 100000;
```
---
## Aggregation Patterns
### GROUP BY with HAVING
```sql
-- Departments with more than 10 employees
SELECT department_id, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 10;
```
### GROUPING SETS — multiple grouping levels
```sql
SELECT region, product_category, SUM(revenue)
FROM sales
GROUP BY GROUPING SETS (
(region, product_category),
(region),
(product_category),
()
);
```
### ROLLUP — hierarchical subtotals
```sql
SELECT region, city, SUM(revenue)
FROM sales
GROUP BY ROLLUP (region, city);
-- Produces: (region, city), (region), ()
```
### CUBE — all combinations
```sql
SELECT region, product, SUM(revenue)
FROM sales
GROUP BY CUBE (region, product);
```
### FILTER clause (PostgreSQL) — conditional aggregation
```sql
SELECT
COUNT(*) AS total,
COUNT(*) FILTER (WHERE status = 'paid') AS paid,
COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled,
SUM(total) FILTER (WHERE status = 'paid') AS paid_revenue
FROM orders;
```
MySQL/SQL Server equivalent: `SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END)`.
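The portable CASE form runs anywhere, including SQLite (invented orders table; recent SQLite versions also accept the FILTER syntax):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL);
    INSERT INTO orders VALUES
        (1, 'paid', 10.0), (2, 'paid', 20.0), (3, 'cancelled', 5.0);
""")

total, paid, cancelled, paid_revenue = conn.execute("""
    SELECT COUNT(*),
           SUM(CASE WHEN status = 'paid' THEN 1 ELSE 0 END),
           SUM(CASE WHEN status = 'cancelled' THEN 1 ELSE 0 END),
           SUM(CASE WHEN status = 'paid' THEN total ELSE 0 END)
    FROM orders
""").fetchone()
print(total, paid, cancelled, paid_revenue)  # 3 2 1 30.0
```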
---
## UPSERT Patterns
### PostgreSQL — ON CONFLICT
```sql
INSERT INTO user_settings (user_id, key, value, updated_at)
VALUES (1, 'theme', 'dark', NOW())
ON CONFLICT (user_id, key)
DO UPDATE SET value = EXCLUDED.value, updated_at = EXCLUDED.updated_at;
```
### MySQL — ON DUPLICATE KEY
```sql
INSERT INTO user_settings (user_id, key_name, value, updated_at)
VALUES (1, 'theme', 'dark', NOW())
ON DUPLICATE KEY UPDATE value = VALUES(value), updated_at = VALUES(updated_at);
```
### SQL Server — MERGE
```sql
MERGE INTO user_settings AS target
USING (VALUES (1, 'theme', 'dark')) AS source (user_id, key_name, value)
ON target.user_id = source.user_id AND target.key_name = source.key_name
WHEN MATCHED THEN UPDATE SET value = source.value, updated_at = GETDATE()
WHEN NOT MATCHED THEN INSERT (user_id, key_name, value, updated_at)
VALUES (source.user_id, source.key_name, source.value, GETDATE());
```
---
## JSON Operations
### PostgreSQL JSONB
```sql
-- Extract field
SELECT data->>'name' AS name FROM products WHERE data->>'category' = 'electronics';
-- Array contains
SELECT * FROM products WHERE data->'tags' ? 'sale';
-- Update nested field
UPDATE products SET data = jsonb_set(data, '{price}', '29.99') WHERE id = 1;
-- Aggregate into JSON array
SELECT jsonb_agg(jsonb_build_object('id', id, 'name', name)) FROM users;
```
### MySQL JSON
```sql
-- Extract field
SELECT JSON_EXTRACT(data, '$.name') AS name FROM products;
-- Shorthand: SELECT data->>"$.name"
-- Search in array
SELECT * FROM products WHERE JSON_CONTAINS(data->"$.tags", '"sale"');
-- Update
UPDATE products SET data = JSON_SET(data, '$.price', 29.99) WHERE id = 1;
```
---
## Pagination Patterns
### Offset pagination (simple but slow for deep pages)
```sql
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 40;
```
### Keyset pagination (fast, requires ordered unique column)
```sql
-- Page after the last seen id
SELECT * FROM products WHERE id > :last_seen_id ORDER BY id LIMIT 20;
```
### Keyset with composite sort
```sql
SELECT * FROM products
WHERE (created_at, id) < (:last_created_at, :last_id)
ORDER BY created_at DESC, id DESC
LIMIT 20;
```
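A keyset pagination loop over the simple pattern above, sketched with `sqlite3` (table and page size invented). Note the cursor is the last id of the previous page, never an offset:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products (name) VALUES (?)",
                 [(f"p{i}",) for i in range(1, 8)])

def fetch_page(last_seen_id: int, page_size: int = 3):
    return conn.execute(
        "SELECT id, name FROM products WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size)).fetchall()

pages, cursor = [], 0
while True:
    page = fetch_page(cursor)
    if not page:
        break
    pages.append([r[0] for r in page])
    cursor = page[-1][0]  # keyset cursor: the last id seen
print(pages)  # [[1, 2, 3], [4, 5, 6], [7]]
```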
---
## Bulk Operations
### Batch INSERT
```sql
INSERT INTO events (type, payload, created_at) VALUES
('click', '{"page": "/home"}', NOW()),
('view', '{"page": "/pricing"}', NOW()),
('click', '{"page": "/signup"}', NOW());
```
### Batch UPDATE with VALUES
```sql
UPDATE products AS p SET price = v.price
FROM (VALUES (1, 29.99), (2, 49.99), (3, 9.99)) AS v(id, price)
WHERE p.id = v.id;
```
### DELETE with subquery
```sql
DELETE FROM sessions
WHERE user_id IN (SELECT id FROM users WHERE deleted_at IS NOT NULL);
```
### COPY (PostgreSQL bulk load)
```sql
COPY products (name, price, category) FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);
```
---
## Utility Patterns
### Generate series (PostgreSQL)
```sql
-- Fill date gaps
SELECT d::date FROM generate_series('2025-01-01'::date, '2025-12-31', '1 day') d;
```
### Deduplicate rows
```sql
DELETE FROM events a USING events b
WHERE a.id > b.id AND a.user_id = b.user_id AND a.event_type = b.event_type
AND a.created_at = b.created_at;
```
### Pivot (manual)
```sql
SELECT user_id,
SUM(CASE WHEN month = 1 THEN revenue END) AS jan,
SUM(CASE WHEN month = 2 THEN revenue END) AS feb,
SUM(CASE WHEN month = 3 THEN revenue END) AS mar
FROM monthly_revenue
GROUP BY user_id;
```
### Conditional INSERT (skip if exists)
```sql
INSERT INTO tags (name) SELECT 'new-tag'
WHERE NOT EXISTS (SELECT 1 FROM tags WHERE name = 'new-tag');
```