# Database Optimization Guide
Practical strategies for PostgreSQL query optimization, indexing, and performance tuning.
## Guide Index
1. [Query Analysis with EXPLAIN](#1-query-analysis-with-explain)
2. [Indexing Strategies](#2-indexing-strategies)
3. [N+1 Query Problem](#3-n1-query-problem)
4. [Connection Pooling](#4-connection-pooling)
5. [Query Optimization Patterns](#5-query-optimization-patterns)
6. [Database Migrations](#6-database-migrations)
7. [Monitoring and Alerting](#7-monitoring-and-alerting)
---
## 1. Query Analysis with EXPLAIN
### Basic EXPLAIN Usage
```sql
-- Show query plan
EXPLAIN SELECT * FROM orders WHERE user_id = 123;
-- Show plan with actual execution times
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;
-- Show buffers and I/O statistics
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE user_id = 123;
```
### Reading EXPLAIN Output
```
QUERY PLAN
---------------------------------------------------------------------------
Index Scan using idx_orders_user_id on orders  (cost=0.43..8.45 rows=10 width=120)
  Index Cond: (user_id = 123)
  Buffers: shared hit=3
Planning Time: 0.152 ms
Execution Time: 0.089 ms
```
**Key metrics:**
- `cost`: Estimated cost in planner units (startup..total); 1.0 corresponds to one sequential page read
- `rows`: Estimated row count
- `width`: Average row size in bytes
- `actual time`: Real execution time (with ANALYZE)
- `Buffers: shared hit`: Pages read from cache
### Scan Types (Best to Worst)
| Scan Type | Description | Performance |
|-----------|-------------|-------------|
| Index Only Scan | Data from index alone | Best |
| Index Scan | Index lookup + heap fetch | Good |
| Bitmap Index Scan | Multiple index conditions | Good |
| Index Scan + Filter | Index + row filtering | Okay |
| Seq Scan (small table) | Full table scan | Okay |
| Seq Scan (large table) | Full table scan | Bad |
| Nested Loop (large) | O(n*m) join | Very Bad |
### Warning Signs
```sql
-- BAD: Sequential scan on large table
Seq Scan on orders  (cost=0.00..1854231.00 rows=50000000 width=120)
  Filter: (status = 'pending')
  Rows Removed by Filter: 49500000

-- BAD: Nested loop with high iterations
Nested Loop  (cost=0.43..2847593.20 rows=12500000 width=240)
  ->  Seq Scan on users  (cost=0.00..1250.00 rows=50000 width=120)
  ->  Index Scan on orders  (cost=0.43..45.73 rows=250 width=120)
        Index Cond: (orders.user_id = users.id)
```
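As a sketch of fixing the first plan above (index name illustrative): only about 1% of rows survive the `status = 'pending'` filter, so a partial index stays small and lets the planner skip the full scan.
```sql
-- Hypothetical fix: partial index covering only the rows the query needs
CREATE INDEX CONCURRENTLY idx_orders_pending
ON orders (created_at)
WHERE status = 'pending';

-- Re-running EXPLAIN should now show an Index Scan or Bitmap Heap Scan
-- on idx_orders_pending instead of the Seq Scan
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';
```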
---
## 2. Indexing Strategies
### Index Types
```sql
-- B-tree (default, most common)
CREATE INDEX idx_users_email ON users(email);
-- Hash (equality only, rarely better than B-tree)
CREATE INDEX idx_users_id_hash ON users USING hash(id);
-- GIN (arrays, JSONB, full-text search)
CREATE INDEX idx_products_tags ON products USING gin(tags);
CREATE INDEX idx_users_data ON users USING gin(metadata jsonb_path_ops);
-- GiST (geometric, range types, full-text)
CREATE INDEX idx_locations_point ON locations USING gist(coordinates);
```
### Composite Indexes
```sql
-- Column order matters: equality (=) columns first, then range/sort columns
CREATE INDEX idx_orders_user_status_date
ON orders(user_id, status, created_at DESC);
-- This index supports:
-- WHERE user_id = ?
-- WHERE user_id = ? AND status = ?
-- WHERE user_id = ? AND status = ? ORDER BY created_at DESC
-- WHERE user_id = ? ORDER BY created_at DESC
-- This index does NOT efficiently support:
-- WHERE status = ? (user_id not in query)
-- WHERE created_at > ? (leftmost column not in query)
```
### Partial Indexes
```sql
-- Index only active users (smaller, faster)
CREATE INDEX idx_users_active_email
ON users(email)
WHERE status = 'active';
-- Index only recent orders. Partial-index predicates must use IMMUTABLE
-- expressions, so CURRENT_DATE is not allowed here; use a constant cutoff
-- and recreate the index periodically
CREATE INDEX idx_orders_recent
ON orders(created_at DESC)
WHERE created_at > '2024-01-01';
-- Index only unprocessed items
CREATE INDEX idx_queue_pending
ON job_queue(priority DESC, created_at)
WHERE processed_at IS NULL;
```
### Covering Indexes (Index-Only Scans)
```sql
-- Include non-indexed columns to avoid heap lookup
CREATE INDEX idx_users_email_covering
ON users(email)
INCLUDE (name, created_at);
-- Query can be satisfied from index alone
SELECT name, created_at FROM users WHERE email = 'test@example.com';
-- Result: Index Only Scan
```
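Index-only scans also depend on the table's visibility map: if many pages changed since the last vacuum, PostgreSQL still visits the heap to check row visibility. `Heap Fetches: 0` in `EXPLAIN ANALYZE` output confirms a true index-only scan:
```sql
-- Refresh the visibility map so index-only scans can skip heap fetches
VACUUM users;

-- Verify: look for "Heap Fetches: 0" in the plan
EXPLAIN ANALYZE
SELECT name, created_at FROM users WHERE email = 'test@example.com';
```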
### Index Maintenance
```sql
-- Check index usage (least-used first)
SELECT
  schemaname,
  relname AS table,
  indexrelname AS index,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
-- Find unused indexes (candidates for removal)
SELECT
  indexrelid::regclass AS index,
  relid::regclass AS table,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelid NOT IN (SELECT conindid FROM pg_constraint);
-- Rebuild bloated indexes without blocking writes (PG 12+)
REINDEX INDEX CONCURRENTLY idx_orders_user_id;
```
---
## 3. N+1 Query Problem
### The Problem
```typescript
// BAD: N+1 queries
const users = await db.query('SELECT * FROM users LIMIT 100');
for (const user of users) {
  // This runs 100 times!
  const orders = await db.query(
    'SELECT * FROM orders WHERE user_id = $1',
    [user.id]
  );
  user.orders = orders;
}
// Total queries: 1 + 100 = 101
```
### Solution 1: JOIN
```typescript
// GOOD: Single query with JOIN
// Note: LIMIT applies to joined rows, not users, so limit users in a subquery
const usersWithOrders = await db.query(`
  SELECT u.*, o.id AS order_id, o.total, o.status
  FROM (SELECT * FROM users LIMIT 100) u
  LEFT JOIN orders o ON o.user_id = u.id
`);
// Total queries: 1 (rows still need regrouping per user in application code)
```
### Solution 2: Batch Loading (DataLoader pattern)
```typescript
// GOOD: Two queries with batch loading
const users = await db.query('SELECT * FROM users LIMIT 100');
const userIds = users.map(u => u.id);
const orders = await db.query(
  'SELECT * FROM orders WHERE user_id = ANY($1)',
  [userIds]
);

// Group orders by user_id (groupBy e.g. from lodash)
const ordersByUser = groupBy(orders, 'user_id');
users.forEach(user => {
  user.orders = ordersByUser[user.id] || [];
});
// Total queries: 2
```
### Solution 3: ORM Eager Loading
```typescript
// Prisma
const users = await prisma.user.findMany({
  take: 100,
  include: { orders: true }
});

// TypeORM
const users = await userRepository.find({
  take: 100,
  relations: ['orders']
});

// Sequelize
const users = await User.findAll({
  limit: 100,
  include: [{ model: Order }]
});
```
### Detecting N+1 in Production
```typescript
// Query-counting wrapper: reset queryCount per request (e.g. in request
// middleware), otherwise the count accumulates across requests
let queryCount = 0;
const originalQuery = db.query.bind(db);
db.query = async (...args) => {
  queryCount++;
  if (queryCount > 10) {
    console.warn(`High query count: ${queryCount} in single request`);
    console.trace();
  }
  return originalQuery(...args);
};
```
---
## 4. Connection Pooling
### Why Pooling Matters
```
Without pooling:
  Request → Create connection → Query → Close connection
  (50-100ms overhead per request)

With pooling:
  Request → Get connection from pool → Query → Return to pool
  (0-1ms overhead per request)
```
### pg-pool Configuration
```typescript
import { Pool } from 'pg';

const pool = new Pool({
  host: process.env.DB_HOST,
  port: 5432,
  database: process.env.DB_NAME,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,

  // Pool settings
  min: 5,                         // Minimum connections kept open
  max: 20,                        // Maximum connections
  idleTimeoutMillis: 30000,       // Close idle connections after 30s
  connectionTimeoutMillis: 5000,  // Fail if can't connect in 5s

  // Statement timeout (cancel long queries)
  statement_timeout: 30000,
});

// Health check
pool.on('error', (err, client) => {
  console.error('Unexpected pool error', err);
});
```
### Pool Sizing Formula
```
Optimal connections = (CPU cores * 2) + effective_spindle_count

For an SSD-backed server with 4 cores:
  connections = (4 * 2) + 1 = 9

Across multiple app servers:
  connections_per_server = total_connections / num_servers
```
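To sanity-check the result against the server, compare the configured limit with the connections actually in use:
```sql
-- Server-side connection limit
SHOW max_connections;

-- Connections currently held, per user and application
SELECT usename, application_name, count(*)
FROM pg_stat_activity
GROUP BY usename, application_name
ORDER BY count(*) DESC;
```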
### PgBouncer for High Scale
```ini
# pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
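# Transaction pooling reuses server connections between transactions, but
# breaks session state (session-level prepared statements, SET, LISTEN/NOTIFY)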
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
reserve_pool_size = 5
```
---
## 5. Query Optimization Patterns
### Pagination Optimization
```sql
-- BAD: OFFSET is slow for large values
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
-- Must scan 10,020 rows, discard 10,000
-- GOOD: Cursor-based pagination
SELECT * FROM orders
WHERE created_at < '2024-01-15T10:00:00Z'
ORDER BY created_at DESC
LIMIT 20;
-- Only scans 20 rows
```
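If `created_at` is not unique, rows sharing a timestamp can be skipped or repeated across pages. A common refinement (a sketch, assuming an `id` primary key as tiebreaker) is a composite cursor using a row-value comparison:
```sql
-- Composite cursor on (created_at, id) gives a stable, gap-free ordering
SELECT * FROM orders
WHERE (created_at, id) < ('2024-01-15T10:00:00Z', 987654)
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Backed by a matching index:
-- CREATE INDEX idx_orders_cursor ON orders (created_at DESC, id DESC);
```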
### Batch Updates
```sql
-- BAD: Individual updates
UPDATE orders SET status = 'shipped' WHERE id = 1;
UPDATE orders SET status = 'shipped' WHERE id = 2;
-- ...repeat 1000 times
-- GOOD: Batch update
UPDATE orders
SET status = 'shipped'
WHERE id = ANY(ARRAY[1, 2, 3, ...1000]);
-- GOOD: Update from values
UPDATE orders o
SET status = v.new_status
FROM (VALUES
  (1, 'shipped'),
  (2, 'delivered'),
  (3, 'cancelled')
) AS v(id, new_status)
WHERE o.id = v.id;
```
### Avoiding SELECT *
```sql
-- BAD: Fetches all columns including large text/blob
SELECT * FROM articles WHERE published = true;
-- GOOD: Only fetch needed columns
SELECT id, title, summary, author_id, published_at
FROM articles
WHERE published = true;
```
### Using EXISTS vs IN
```sql
-- For checking existence, EXISTS is often faster
-- BAD
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 1000);
-- GOOD (for large subquery results)
SELECT * FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.user_id = u.id AND o.total > 1000
);
```
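A related trap: `NOT IN` behaves surprisingly when the subquery can return NULL, while `NOT EXISTS` does not:
```sql
-- Returns zero rows if any orders.user_id is NULL
-- (x NOT IN (..., NULL) is never true)
SELECT * FROM users
WHERE id NOT IN (SELECT user_id FROM orders);

-- Safe, and typically planned as an efficient anti-join
SELECT * FROM users u
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
);
```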
### Materialized Views for Complex Aggregations
```sql
-- Create materialized view for expensive aggregations
CREATE MATERIALIZED VIEW daily_sales_summary AS
SELECT
  date_trunc('day', created_at) AS date,
  product_id,
  COUNT(*) AS order_count,
  SUM(quantity) AS total_quantity,
  SUM(total) AS total_revenue
FROM orders
GROUP BY date_trunc('day', created_at), product_id;

-- REFRESH ... CONCURRENTLY requires a UNIQUE index on the view
CREATE UNIQUE INDEX idx_daily_sales_date_product
ON daily_sales_summary(date, product_id);

-- Refresh periodically without blocking readers
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales_summary;
```
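If the pg_cron extension is available (an assumption; any external scheduler works just as well), the refresh can be scheduled in-database:
```sql
-- Refresh hourly via pg_cron (job name is illustrative)
SELECT cron.schedule(
  'refresh-daily-sales',
  '0 * * * *',
  'REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales_summary'
);
```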
---
## 6. Database Migrations
### Migration Best Practices
```sql
-- Always include rollback
-- migrations/20240115_001_add_user_status.sql
-- UP
ALTER TABLE users ADD COLUMN status VARCHAR(20) DEFAULT 'active';
CREATE INDEX CONCURRENTLY idx_users_status ON users(status);
-- DOWN (in separate file or comment)
DROP INDEX CONCURRENTLY IF EXISTS idx_users_status;
ALTER TABLE users DROP COLUMN IF EXISTS status;
```
### Safe Column Addition
```sql
-- SAFE: Add nullable column (no table rewrite)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- SAFE on PG 11+: non-volatile defaults (constants, now()) are stored as
-- metadata, so no table rewrite is needed
ALTER TABLE users ADD COLUMN created_at TIMESTAMP DEFAULT NOW();

-- UNSAFE before PG 11: any default forced a full table rewrite
-- ALTER TABLE users ADD COLUMN score INTEGER DEFAULT 0;

-- SAFE alternative on older versions: add, backfill, then set default
ALTER TABLE users ADD COLUMN score INTEGER;
UPDATE users SET score = 0 WHERE score IS NULL;
ALTER TABLE users ALTER COLUMN score SET DEFAULT 0;
ALTER TABLE users ALTER COLUMN score SET NOT NULL;
```
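On a large table, the single `UPDATE` above rewrites every row in one long transaction. A batched backfill keeps each transaction short (a sketch; the batch size is illustrative):
```sql
-- Backfill in small batches; repeat until UPDATE reports 0 rows
UPDATE users SET score = 0
WHERE id IN (
  SELECT id FROM users
  WHERE score IS NULL
  LIMIT 10000
);

-- Only then enforce the constraint:
-- ALTER TABLE users ALTER COLUMN score SET NOT NULL;
```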
### Safe Index Creation
```sql
-- UNSAFE: Locks table
CREATE INDEX idx_orders_user ON orders(user_id);
-- SAFE: Non-blocking
CREATE INDEX CONCURRENTLY idx_orders_user ON orders(user_id);
-- Note: CONCURRENTLY cannot run in a transaction
```
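If a `CREATE INDEX CONCURRENTLY` build fails partway (deadlock, duplicate key, cancelled statement), it leaves behind an `INVALID` index that still adds write overhead. Find and drop leftovers before retrying:
```sql
-- List invalid indexes left behind by failed CONCURRENTLY builds
SELECT indexrelid::regclass AS index
FROM pg_index
WHERE NOT indisvalid;

-- Then: DROP INDEX CONCURRENTLY <the_invalid_index>; and retry the build
```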
### Safe Column Removal
```sql
-- Step 1: Stop writing to column (application change)
-- Step 2: Wait for all deployments
-- Step 3: Drop column
ALTER TABLE users DROP COLUMN IF EXISTS legacy_field;
```
---
## 7. Monitoring and Alerting
### Key Metrics to Monitor
```sql
-- Active connections
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
-- Connection by state
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;
-- Long-running queries
SELECT
  pid,
  now() - query_start AS duration,
  query,
  state
FROM pg_stat_activity
WHERE (now() - query_start) > interval '5 minutes'
  AND state != 'idle';
-- Largest tables (total, heap, and index sizes)
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
  pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
  pg_size_pretty(pg_indexes_size(schemaname||'.'||tablename)) AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;
```
### pg_stat_statements for Query Analysis
```sql
-- Enable extension (also requires shared_preload_libraries = 'pg_stat_statements'
-- in postgresql.conf and a server restart)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find slowest queries
SELECT
  round(total_exec_time::numeric, 2) AS total_time_ms,
  calls,
  round(mean_exec_time::numeric, 2) AS avg_time_ms,
  round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percentage,
  query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Find most frequent queries
SELECT
  calls,
  round(total_exec_time::numeric, 2) AS total_time_ms,
  round(mean_exec_time::numeric, 2) AS avg_time_ms,
  query
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;
```
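After an optimization pass, resetting the counters makes before/after comparisons easier:
```sql
-- Clear accumulated statistics (requires superuser or an explicit grant
-- on pg_stat_statements_reset)
SELECT pg_stat_statements_reset();
```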
### Alert Thresholds
| Metric | Warning | Critical |
|--------|---------|----------|
| Connection usage | > 70% | > 90% |
| Query time P95 | > 500ms | > 2s |
| Replication lag | > 30s | > 5m |
| Disk usage | > 70% | > 85% |
| Cache hit ratio | < 95% | < 90% |
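The cache hit ratio in the table above can be computed from `pg_stat_database`:
```sql
-- Share of block reads served from shared_buffers (alert below ~0.95)
SELECT
  sum(blks_hit)::float / nullif(sum(blks_hit) + sum(blks_read), 0) AS cache_hit_ratio
FROM pg_stat_database;
```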
---
## Quick Reference: PostgreSQL Commands
```sql
-- Check table sizes
SELECT pg_size_pretty(pg_total_relation_size('orders'));
-- Check index sizes
SELECT pg_size_pretty(pg_indexes_size('orders'));
-- Kill a query
SELECT pg_cancel_backend(pid); -- Graceful
SELECT pg_terminate_backend(pid); -- Force
-- Check locks
SELECT * FROM pg_locks WHERE granted = false;
-- Vacuum analyze (update statistics)
VACUUM ANALYZE orders;
-- Check autovacuum status
SELECT * FROM pg_stat_user_tables WHERE relname = 'orders';
```