secrets-vault-manager (403-line SKILL.md, 3 scripts, 3 references): - HashiCorp Vault, AWS SM, Azure KV, GCP SM integration - Secret rotation, dynamic secrets, audit logging, emergency procedures sql-database-assistant (457-line SKILL.md, 3 scripts, 3 references): - Query optimization, migration generation, schema exploration - Multi-DB support (PostgreSQL, MySQL, SQLite, SQL Server) - ORM patterns (Prisma, Drizzle, TypeORM, SQLAlchemy) gcp-cloud-architect (418-line SKILL.md, 3 scripts, 3 references): - 6-step workflow mirroring aws-solution-architect for GCP - Cloud Run, GKE, BigQuery, Cloud Functions, cost optimization - Completes cloud trifecta (AWS + Azure + GCP) soc2-compliance (417-line SKILL.md, 3 scripts, 3 references): - SOC 2 Type I & II preparation, Trust Service Criteria mapping - Control matrix generation, evidence tracking, gap analysis - First SOC 2 skill in ra-qm-team (joins GDPR, ISO 27001, ISO 13485) All 12 scripts pass --help. Docs generated, mkdocs.yml nav updated. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
458 lines
16 KiB
Markdown
458 lines
16 KiB
Markdown
---
|
|
name: "sql-database-assistant"
|
|
description: "Use when the user asks to write SQL queries, optimize database performance, generate migrations, explore database schemas, or work with ORMs like Prisma, Drizzle, TypeORM, or SQLAlchemy."
|
|
---
|
|
|
|
# SQL Database Assistant - POWERFUL Tier Skill
|
|
|
|
## Overview
|
|
|
|
The operational companion to database design. While **database-designer** focuses on schema architecture and **database-schema-designer** handles ERD modeling, this skill covers the day-to-day: writing queries, optimizing performance, generating migrations, and bridging the gap between application code and database engines.
|
|
|
|
### Core Capabilities
|
|
|
|
- **Natural Language to SQL** — translate requirements into correct, performant queries
|
|
- **Schema Exploration** — introspect live databases across PostgreSQL, MySQL, SQLite, SQL Server
|
|
- **Query Optimization** — EXPLAIN analysis, index recommendations, N+1 detection, rewrite patterns
|
|
- **Migration Generation** — up/down scripts, zero-downtime strategies, rollback plans
|
|
- **ORM Integration** — Prisma, Drizzle, TypeORM, SQLAlchemy patterns and escape hatches
|
|
- **Multi-Database Support** — dialect-aware SQL with compatibility guidance
|
|
|
|
### Tools
|
|
|
|
| Script | Purpose |
|
|
|--------|---------|
|
|
| `scripts/query_optimizer.py` | Static analysis of SQL queries for performance issues |
|
|
| `scripts/migration_generator.py` | Generate migration file templates from change descriptions |
|
|
| `scripts/schema_explorer.py` | Generate schema documentation from introspection queries |
|
|
|
|
---
|
|
|
|
## Natural Language to SQL
|
|
|
|
### Translation Patterns
|
|
|
|
When converting requirements to SQL, follow this sequence:
|
|
|
|
1. **Identify entities** — map nouns to tables
|
|
2. **Identify relationships** — map verbs to JOINs or subqueries
|
|
3. **Identify filters** — map adjectives/conditions to WHERE clauses
|
|
4. **Identify aggregations** — map "total", "average", "count" to GROUP BY
|
|
5. **Identify ordering** — map "top", "latest", "highest" to ORDER BY + LIMIT
|
|
|
|
### Common Query Templates
|
|
|
|
**Top-N per group (window function)**
|
|
```sql
|
|
SELECT * FROM (
|
|
SELECT *, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rn
|
|
FROM employees
|
|
) ranked WHERE rn <= 3;
|
|
```
|
|
|
|
**Running totals**
|
|
```sql
|
|
SELECT date, amount,
|
|
SUM(amount) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
|
|
FROM transactions;
|
|
```
|
|
|
|
**Gap detection**
|
|
```sql
|
|
SELECT curr.id, curr.seq_num, prev.seq_num AS prev_seq
|
|
FROM records curr
|
|
LEFT JOIN records prev ON prev.seq_num = curr.seq_num - 1
|
|
WHERE prev.id IS NULL AND curr.seq_num > 1;
|
|
```
|
|
|
|
**UPSERT (PostgreSQL)**
|
|
```sql
|
|
INSERT INTO settings (key, value, updated_at)
|
|
VALUES ('theme', 'dark', NOW())
|
|
ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value, updated_at = EXCLUDED.updated_at;
|
|
```
|
|
|
|
**UPSERT (MySQL)**
|
|
```sql
|
|
INSERT INTO settings (key_name, value, updated_at)
|
|
VALUES ('theme', 'dark', NOW())
|
|
ON DUPLICATE KEY UPDATE value = VALUES(value), updated_at = VALUES(updated_at);
|
|
```
|
|
|
|
> See references/query_patterns.md for JOINs, CTEs, window functions, JSON operations, and more.
|
|
|
|
---
|
|
|
|
## Schema Exploration
|
|
|
|
### Introspection Queries
|
|
|
|
**PostgreSQL — list tables and columns**
|
|
```sql
|
|
SELECT table_name, column_name, data_type, is_nullable, column_default
|
|
FROM information_schema.columns
|
|
WHERE table_schema = 'public'
|
|
ORDER BY table_name, ordinal_position;
|
|
```
|
|
|
|
**PostgreSQL — foreign keys**
|
|
```sql
|
|
SELECT tc.table_name, kcu.column_name,
|
|
ccu.table_name AS foreign_table, ccu.column_name AS foreign_column
|
|
FROM information_schema.table_constraints tc
|
|
JOIN information_schema.key_column_usage kcu ON tc.constraint_name = kcu.constraint_name
|
|
JOIN information_schema.constraint_column_usage ccu ON tc.constraint_name = ccu.constraint_name
|
|
WHERE tc.constraint_type = 'FOREIGN KEY';
|
|
```
|
|
|
|
**MySQL — table sizes**
|
|
```sql
|
|
SELECT table_name, table_rows,
|
|
ROUND(data_length / 1024 / 1024, 2) AS data_mb,
|
|
ROUND(index_length / 1024 / 1024, 2) AS index_mb
|
|
FROM information_schema.tables
|
|
WHERE table_schema = DATABASE()
|
|
ORDER BY data_length DESC;
|
|
```
|
|
|
|
**SQLite — schema dump**
|
|
```sql
|
|
SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name;
|
|
```
|
|
|
|
**SQL Server — columns with types**
|
|
```sql
|
|
SELECT t.name AS table_name, c.name AS column_name,
|
|
ty.name AS data_type, c.max_length, c.is_nullable
|
|
FROM sys.columns c
|
|
JOIN sys.tables t ON c.object_id = t.object_id
|
|
JOIN sys.types ty ON c.user_type_id = ty.user_type_id
|
|
ORDER BY t.name, c.column_id;
|
|
```
|
|
|
|
### Generating Documentation from Schema
|
|
|
|
Use `scripts/schema_explorer.py` to produce markdown or JSON documentation:
|
|
|
|
```bash
|
|
python scripts/schema_explorer.py --dialect postgres --tables all --format md
|
|
python scripts/schema_explorer.py --dialect mysql --tables users,orders --format json --json
|
|
```
|
|
|
|
---
|
|
|
|
## Query Optimization
|
|
|
|
### EXPLAIN Analysis Workflow
|
|
|
|
1. **Run EXPLAIN ANALYZE** (PostgreSQL) or **EXPLAIN FORMAT=JSON** (MySQL)
|
|
2. **Identify the costliest node** — Seq Scan on large tables, Nested Loop with high row estimates
|
|
3. **Check for missing indexes** — sequential scans on filtered columns
|
|
4. **Look for estimation errors** — planned vs actual rows divergence signals stale statistics
|
|
5. **Evaluate JOIN order** — ensure the smallest result set drives the join
|
|
|
|
### Index Recommendation Checklist
|
|
|
|
- Columns in WHERE clauses with high selectivity
|
|
- Columns in JOIN conditions (foreign keys)
|
|
- Columns in ORDER BY when combined with LIMIT
|
|
- Composite indexes matching multi-column WHERE predicates (most selective column first)
|
|
- Partial indexes for queries with constant filters (e.g., `WHERE status = 'active'`)
|
|
- Covering indexes to avoid table lookups for read-heavy queries
|
|
|
|
### Query Rewriting Patterns
|
|
|
|
| Anti-Pattern | Rewrite |
|
|
|-------------|---------|
|
|
| `SELECT * FROM orders` | `SELECT id, status, total FROM orders` (explicit columns) |
|
|
| `WHERE YEAR(created_at) = 2025` | `WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'` (sargable) |
|
|
| Correlated subquery in SELECT | LEFT JOIN with aggregation |
|
|
| `NOT IN (SELECT ...)` with NULLs | `NOT EXISTS (SELECT 1 ...)` |
|
|
| `UNION` (dedup) when not needed | `UNION ALL` |
|
|
| `LIKE '%search%'` | Full-text search index (GIN/FULLTEXT) |
|
|
| `ORDER BY RAND()` | Application-side random sampling or `TABLESAMPLE` |
|
|
|
|
### N+1 Detection
|
|
|
|
**Symptoms:**
|
|
- Application loop that executes one query per parent row
|
|
- ORM lazy-loading related entities inside a loop
|
|
- Query log shows hundreds of identical SELECT patterns with different IDs
|
|
|
|
**Fixes:**
|
|
- Use eager loading (`include` in Prisma, `joinedload` in SQLAlchemy)
|
|
- Batch queries with `WHERE id IN (...)`
|
|
- Use DataLoader pattern for GraphQL resolvers
|
|
|
|
### Static Analysis Tool
|
|
|
|
```bash
|
|
python scripts/query_optimizer.py --query "SELECT * FROM orders WHERE status = 'pending'" --dialect postgres
|
|
python scripts/query_optimizer.py --query queries.sql --dialect mysql --json
|
|
```
|
|
|
|
> See references/optimization_guide.md for EXPLAIN plan reading, index types, and connection pooling.
|
|
|
|
---
|
|
|
|
## Migration Generation
|
|
|
|
### Zero-Downtime Migration Patterns
|
|
|
|
**Adding a column (safe)**
|
|
```sql
|
|
-- Up
|
|
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
|
|
|
|
-- Down
|
|
ALTER TABLE users DROP COLUMN phone;
|
|
```
|
|
|
|
**Renaming a column (expand-contract)**
|
|
```sql
|
|
-- Step 1: Add new column
|
|
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
|
|
-- Step 2: Backfill
|
|
UPDATE users SET full_name = name;
|
|
-- Step 3: Deploy app reading both columns
|
|
-- Step 4: Deploy app writing only new column
|
|
-- Step 5: Drop old column
|
|
ALTER TABLE users DROP COLUMN name;
|
|
```
|
|
|
|
**Adding a NOT NULL column (safe sequence)**
|
|
```sql
|
|
-- Step 1: Add nullable
|
|
ALTER TABLE orders ADD COLUMN region VARCHAR(50);
|
|
-- Step 2: Backfill with default
|
|
UPDATE orders SET region = 'unknown' WHERE region IS NULL;
|
|
-- Step 3: Add constraint
|
|
ALTER TABLE orders ALTER COLUMN region SET NOT NULL;
|
|
ALTER TABLE orders ALTER COLUMN region SET DEFAULT 'unknown';
|
|
```
|
|
|
|
**Index creation (non-blocking, PostgreSQL)**
|
|
```sql
|
|
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);
|
|
```
|
|
|
|
### Data Backfill Strategies
|
|
|
|
- **Batch updates** — process in chunks of 1000-10000 rows to avoid lock contention
|
|
- **Background jobs** — run backfills asynchronously with progress tracking
|
|
- **Dual-write** — write to old and new columns during transition period
|
|
- **Validation queries** — verify row counts and data integrity after each batch
|
|
|
|
### Rollback Strategies
|
|
|
|
Every migration must have a reversible down script. For irreversible changes:
|
|
|
|
1. **Backup before execution** — `pg_dump` the affected tables
|
|
2. **Feature flags** — application can switch between old/new schema reads
|
|
3. **Shadow tables** — keep a copy of the original table during migration window
|
|
|
|
### Migration Generator Tool
|
|
|
|
```bash
|
|
python scripts/migration_generator.py --change "add email_verified boolean to users" --dialect postgres --format sql
|
|
python scripts/migration_generator.py --change "rename column name to full_name in customers" --dialect mysql --format alembic --json
|
|
```
|
|
|
|
---
|
|
|
|
## Multi-Database Support
|
|
|
|
### Dialect Differences
|
|
|
|
| Feature | PostgreSQL | MySQL | SQLite | SQL Server |
|
|
|---------|-----------|-------|--------|------------|
|
|
| UPSERT | `ON CONFLICT DO UPDATE` | `ON DUPLICATE KEY UPDATE` | `ON CONFLICT DO UPDATE` | `MERGE` |
|
|
| Boolean | Native `BOOLEAN` | `TINYINT(1)` | `INTEGER` | `BIT` |
|
|
| Auto-increment | `SERIAL` / `GENERATED` | `AUTO_INCREMENT` | `INTEGER PRIMARY KEY` | `IDENTITY` |
|
|
| JSON | `JSONB` (indexed) | `JSON` | Text (ext) | `NVARCHAR(MAX)` |
|
|
| Array | Native `ARRAY` | Not supported | Not supported | Not supported |
|
|
| CTE (recursive) | Full support | 8.0+ | 3.8.3+ | Full support |
|
|
| Window functions | Full support | 8.0+ | 3.25.0+ | Full support |
|
|
| Full-text search | `tsvector` + GIN | `FULLTEXT` index | FTS5 extension | Full-text catalog |
|
|
| LIMIT/OFFSET | `LIMIT n OFFSET m` | `LIMIT n OFFSET m` | `LIMIT n OFFSET m` | `OFFSET m ROWS FETCH NEXT n ROWS ONLY` |
|
|
|
|
### Compatibility Tips
|
|
|
|
- **Always use parameterized queries** — prevents SQL injection across all dialects
|
|
- **Avoid dialect-specific functions in shared code** — wrap in adapter layer
|
|
- **Test migrations on target engine** — `information_schema` varies between engines
|
|
- **Use ISO date format** — `'YYYY-MM-DD'` works everywhere
|
|
- **Quote identifiers** — use double quotes (SQL standard) or backticks (MySQL)
|
|
|
|
---
|
|
|
|
## ORM Patterns
|
|
|
|
### Prisma
|
|
|
|
**Schema definition**
|
|
```prisma
|
|
model User {
|
|
id Int @id @default(autoincrement())
|
|
email String @unique
|
|
name String?
|
|
posts Post[]
|
|
createdAt DateTime @default(now())
|
|
}
|
|
|
|
model Post {
|
|
id Int @id @default(autoincrement())
|
|
title String
|
|
author User @relation(fields: [authorId], references: [id])
|
|
authorId Int
|
|
}
|
|
```
|
|
|
|
**Migrations**: `npx prisma migrate dev --name add_user_email`
|
|
**Query API**: `prisma.user.findMany({ where: { email: { contains: '@' } }, include: { posts: true } })`
|
|
**Raw SQL escape hatch**: `prisma.$queryRaw\`SELECT * FROM users WHERE id = ${userId}\``
|
|
|
|
### Drizzle
|
|
|
|
**Schema-first definition**
|
|
```typescript
|
|
export const users = pgTable('users', {
|
|
id: serial('id').primaryKey(),
|
|
email: varchar('email', { length: 255 }).notNull().unique(),
|
|
name: text('name'),
|
|
createdAt: timestamp('created_at').defaultNow(),
|
|
});
|
|
```
|
|
|
|
**Query builder**: `db.select().from(users).where(eq(users.email, email))`
|
|
**Migrations**: `npx drizzle-kit generate:pg` then `npx drizzle-kit push:pg`
|
|
|
|
### TypeORM
|
|
|
|
**Entity decorators**
|
|
```typescript
|
|
@Entity()
|
|
export class User {
|
|
@PrimaryGeneratedColumn()
|
|
id: number;
|
|
|
|
@Column({ unique: true })
|
|
email: string;
|
|
|
|
@OneToMany(() => Post, post => post.author)
|
|
posts: Post[];
|
|
}
|
|
```
|
|
|
|
**Repository pattern**: `userRepo.find({ where: { email }, relations: ['posts'] })`
|
|
**Migrations**: `npx typeorm migration:generate -n AddUserEmail`
|
|
|
|
### SQLAlchemy
|
|
|
|
**Declarative models**
|
|
```python
|
|
class User(Base):
|
|
__tablename__ = 'users'
|
|
id = Column(Integer, primary_key=True)
|
|
email = Column(String(255), unique=True, nullable=False)
|
|
name = Column(String(255))
|
|
posts = relationship('Post', back_populates='author')
|
|
```
|
|
|
|
**Session management**: Always use `with Session() as session:` context manager
|
|
**Alembic migrations**: `alembic revision --autogenerate -m "add user email"`
|
|
|
|
> See references/orm_patterns.md for side-by-side comparisons and migration workflows per ORM.
|
|
|
|
---
|
|
|
|
## Data Integrity
|
|
|
|
### Constraint Strategy
|
|
|
|
- **Primary keys** — every table must have one; prefer surrogate keys (serial/UUID)
|
|
- **Foreign keys** — enforce referential integrity; define ON DELETE behavior explicitly
|
|
- **UNIQUE constraints** — for business-level uniqueness (email, slug, API key)
|
|
- **CHECK constraints** — validate ranges, enums, and business rules at the DB level
|
|
- **NOT NULL** — default to NOT NULL; make nullable only when genuinely optional
|
|
|
|
### Transaction Isolation Levels
|
|
|
|
| Level | Dirty Read | Non-Repeatable Read | Phantom Read | Use Case |
|
|
|-------|-----------|-------------------|-------------|----------|
|
|
| READ UNCOMMITTED | Yes | Yes | Yes | Never recommended |
|
|
| READ COMMITTED | No | Yes | Yes | Default for PostgreSQL, general OLTP |
|
|
| REPEATABLE READ | No | No | Yes (InnoDB: No) | Financial calculations |
|
|
| SERIALIZABLE | No | No | No | Critical consistency (billing, inventory) |
|
|
|
|
### Deadlock Prevention
|
|
|
|
1. **Consistent lock ordering** — always acquire locks in the same table/row order
|
|
2. **Short transactions** — minimize time between first lock and commit
|
|
3. **Advisory locks** — use `pg_advisory_lock()` for application-level coordination
|
|
4. **Retry logic** — catch deadlock errors and retry with exponential backoff
|
|
|
|
---
|
|
|
|
## Backup & Restore
|
|
|
|
### PostgreSQL
|
|
```bash
|
|
# Full backup
|
|
pg_dump -Fc --no-owner dbname > backup.dump
|
|
# Restore
|
|
pg_restore -d dbname --clean --no-owner backup.dump
|
|
# Point-in-time recovery: configure WAL archiving + restore_command
|
|
```
|
|
|
|
### MySQL
|
|
```bash
|
|
# Full backup
|
|
mysqldump --single-transaction --routines --triggers dbname > backup.sql
|
|
# Restore
|
|
mysql dbname < backup.sql
|
|
# Binary log for PITR: mysqlbinlog --start-datetime="2025-01-01 00:00:00" binlog.000001
|
|
```
|
|
|
|
### SQLite
|
|
```bash
|
|
# Backup (safe with concurrent reads)
|
|
sqlite3 dbname ".backup backup.db"
|
|
```
|
|
|
|
### Backup Best Practices
|
|
- **Automate** — cron or systemd timer, never manual-only
|
|
- **Test restores** — untested backups are not backups
|
|
- **Offsite copies** — S3, GCS, or separate region
|
|
- **Retention policy** — daily for 7 days, weekly for 4 weeks, monthly for 12 months
|
|
- **Monitor backup size and duration** — sudden changes signal issues
|
|
|
|
---
|
|
|
|
## Anti-Patterns
|
|
|
|
| Anti-Pattern | Problem | Fix |
|
|
|-------------|---------|-----|
|
|
| `SELECT *` | Transfers unnecessary data, breaks on schema changes | Explicit column list |
|
|
| Missing indexes on FK columns | Slow JOINs and cascading deletes | Add indexes on all foreign keys |
|
|
| N+1 queries | 1 + N round trips to database | Eager loading or batch queries |
|
|
| Implicit type coercion | `WHERE id = '123'` prevents index use | Match types in predicates |
|
|
| No connection pooling | Exhausts connections under load | PgBouncer, ProxySQL, or ORM pool |
|
|
| Unbounded queries | No LIMIT risks returning millions of rows | Always paginate |
|
|
| Storing money as FLOAT | Rounding errors | Use `DECIMAL(19,4)` or integer cents |
|
|
| God tables | One table with 50+ columns | Normalize or use vertical partitioning |
|
|
| Soft deletes everywhere | Complicates every query with `WHERE deleted_at IS NULL` | Archive tables or event sourcing |
|
|
| Raw string concatenation | SQL injection | Parameterized queries always |
|
|
|
|
---
|
|
|
|
## Cross-References
|
|
|
|
| Skill | Relationship |
|
|
|-------|-------------|
|
|
| **database-designer** | Schema architecture, normalization analysis, ERD generation |
|
|
| **database-schema-designer** | Visual ERD modeling, relationship mapping |
|
|
| **migration-architect** | Complex multi-step migration orchestration |
|
|
| **api-design-reviewer** | Ensuring API endpoints align with query patterns |
|
|
| **observability-platform** | Query performance monitoring, slow query alerts |
|