feat(bundles): add editorial bundle plugins

This commit is contained in:
sickn33
2026-03-27 08:48:03 +01:00
parent 8eff08b706
commit dffac91d3b
1052 changed files with 212282 additions and 68 deletions

View File

@@ -0,0 +1,33 @@
{
"name": "antigravity-bundle-data-analytics",
"version": "8.10.0",
"description": "Install the \"Data & Analytics\" editorial skill bundle from Antigravity Awesome Skills.",
"author": {
"name": "sickn33 and contributors",
"url": "https://github.com/sickn33/antigravity-awesome-skills"
},
"homepage": "https://github.com/sickn33/antigravity-awesome-skills",
"repository": "https://github.com/sickn33/antigravity-awesome-skills",
"license": "MIT",
"keywords": [
"codex",
"skills",
"bundle",
"data-analytics",
"productivity"
],
"skills": "./skills/",
"interface": {
"displayName": "Data & Analytics",
"shortDescription": "Data & Analytics · 6 curated skills",
"longDescription": "For making sense of the numbers. Covers Analytics Tracking, Claude D3js Skill, and 4 more skills.",
"developerName": "sickn33 and contributors",
"category": "Data & Analytics",
"capabilities": [
"Interactive",
"Write"
],
"websiteURL": "https://github.com/sickn33/antigravity-awesome-skills",
"brandColor": "#111827"
}
}

View File

@@ -0,0 +1,238 @@
---
name: ab-test-setup
description: "Structured guide for setting up A/B tests with mandatory gates for hypothesis, metrics, and execution readiness."
risk: unknown
source: community
date_added: "2026-02-27"
---
# A/B Test Setup
## 1⃣ Purpose & Scope
Ensure every A/B test is **valid, rigorous, and safe** before a single line of code is written.
- Prevents "peeking"
- Enforces statistical power
- Blocks invalid hypotheses
---
## 2⃣ Pre-Requisites
You must have:
- A clear user problem
- Access to an analytics source
- Roughly estimated traffic volume
### Hypothesis Quality Checklist
A valid hypothesis includes:
- Observation or evidence
- Single, specific change
- Directional expectation
- Defined audience
- Measurable success criteria
---
### 3⃣ Hypothesis Lock (Hard Gate)
Before designing variants or metrics, you MUST:
- Present the **final hypothesis**
- Specify:
- Target audience
- Primary metric
- Expected direction of effect
- Minimum Detectable Effect (MDE)
Ask explicitly:
> “Is this the final hypothesis we are committing to for this test?”
**Do NOT proceed until confirmed.**
---
### 4⃣ Assumptions & Validity Check (Mandatory)
Explicitly list assumptions about:
- Traffic stability
- User independence
- Metric reliability
- Randomization quality
- External factors (seasonality, campaigns, releases)
If assumptions are weak or violated:
- Warn the user
- Recommend delaying or redesigning the test
---
### 5⃣ Test Type Selection
Choose the simplest valid test:
- **A/B Test** single change, two variants
- **A/B/n Test** multiple variants, higher traffic required
- **Multivariate Test (MVT)** interaction effects, very high traffic
- **Split URL Test** major structural changes
Default to **A/B** unless there is a clear reason otherwise.
---
### 6⃣ Metrics Definition
#### Primary Metric (Mandatory)
- Single metric used to evaluate success
- Directly tied to the hypothesis
- Pre-defined and frozen before launch
#### Secondary Metrics
- Provide context
- Explain _why_ results occurred
- Must not override the primary metric
#### Guardrail Metrics
- Metrics that must not degrade
- Used to prevent harmful wins
- Trigger test stop if significantly negative
---
### 7⃣ Sample Size & Duration
Define upfront:
- Baseline rate
- MDE
- Significance level (typically 95%)
- Statistical power (typically 80%)
Estimate:
- Required sample size per variant
- Expected test duration
**Do NOT proceed without a realistic sample size estimate.**
---
### 8⃣ Execution Readiness Gate (Hard Stop)
You may proceed to implementation **only if all are true**:
- Hypothesis is locked
- Primary metric is frozen
- Sample size is calculated
- Test duration is defined
- Guardrails are set
- Tracking is verified
If any item is missing, stop and resolve it.
---
## Running the Test
### During the Test
**DO:**
- Monitor technical health
- Document external factors
**DO NOT:**
- Stop early due to “good-looking” results
- Change variants mid-test
- Add new traffic sources
- Redefine success criteria
---
## Analyzing Results
### Analysis Discipline
When interpreting results:
- Do NOT generalize beyond the tested population
- Do NOT claim causality beyond the tested change
- Do NOT override guardrail failures
- Separate statistical significance from business judgment
### Interpretation Outcomes
| Result | Action |
| -------------------- | -------------------------------------- |
| Significant positive | Consider rollout |
| Significant negative | Reject variant, document learning |
| Inconclusive | Consider more traffic or bolder change |
| Guardrail failure | Do not ship, even if primary wins |
---
## Documentation & Learning
### Test Record (Mandatory)
Document:
- Hypothesis
- Variants
- Metrics
- Sample size vs achieved
- Results
- Decision
- Learnings
- Follow-up ideas
Store records in a shared, searchable location to avoid repeated failures.
---
## Refusal Conditions (Safety)
Refuse to proceed if:
- Baseline rate is unknown and cannot be estimated
- Traffic is insufficient to detect the MDE
- Primary metric is undefined
- Multiple variables are changed without proper design
- Hypothesis cannot be clearly stated
Explain why and recommend next steps.
---
## Key Principles (Non-Negotiable)
- One hypothesis per test
- One primary metric
- Commit before launch
- No peeking
- Learning over winning
- Statistical rigor first
---
## Final Reminder
A/B testing is not about proving ideas right.
It is about **learning the truth with confidence**.
If you feel tempted to rush, simplify, or “just try it” —
that is the signal to **slow down and re-check the design**.
## When to Use
This skill is applicable to execute the workflow or actions described in the overview.

View File

@@ -0,0 +1,405 @@
---
name: analytics-tracking
description: Design, audit, and improve analytics tracking systems that produce reliable, decision-ready data.
risk: unknown
source: community
date_added: '2026-02-27'
---
# Analytics Tracking & Measurement Strategy
You are an expert in **analytics implementation and measurement design**.
Your goal is to ensure tracking produces **trustworthy signals that directly support decisions** across marketing, product, and growth.
You do **not** track everything.
You do **not** optimize dashboards without fixing instrumentation.
You do **not** treat GA4 numbers as truth unless validated.
---
## Phase 0: Measurement Readiness & Signal Quality Index (Required)
Before adding or changing tracking, calculate the **Measurement Readiness & Signal Quality Index**.
### Purpose
This index answers:
> **Can this analytics setup produce reliable, decision-grade insights?**
It prevents:
* event sprawl
* vanity tracking
* misleading conversion data
* false confidence in broken analytics
---
## 🔢 Measurement Readiness & Signal Quality Index
### Total Score: **0100**
This is a **diagnostic score**, not a performance KPI.
---
### Scoring Categories & Weights
| Category | Weight |
| ----------------------------- | ------- |
| Decision Alignment | 25 |
| Event Model Clarity | 20 |
| Data Accuracy & Integrity | 20 |
| Conversion Definition Quality | 15 |
| Attribution & Context | 10 |
| Governance & Maintenance | 10 |
| **Total** | **100** |
---
### Category Definitions
#### 1. Decision Alignment (025)
* Clear business questions defined
* Each tracked event maps to a decision
* No events tracked “just in case”
---
#### 2. Event Model Clarity (020)
* Events represent **meaningful actions**
* Naming conventions are consistent
* Properties carry context, not noise
---
#### 3. Data Accuracy & Integrity (020)
* Events fire reliably
* No duplication or inflation
* Values are correct and complete
* Cross-browser and mobile validated
---
#### 4. Conversion Definition Quality (015)
* Conversions represent real success
* Conversion counting is intentional
* Funnel stages are distinguishable
---
#### 5. Attribution & Context (010)
* UTMs are consistent and complete
* Traffic source context is preserved
* Cross-domain / cross-device handled appropriately
---
#### 6. Governance & Maintenance (010)
* Tracking is documented
* Ownership is clear
* Changes are versioned and monitored
---
### Readiness Bands (Required)
| Score | Verdict | Interpretation |
| ------ | --------------------- | --------------------------------- |
| 85100 | **Measurement-Ready** | Safe to optimize and experiment |
| 7084 | **Usable with Gaps** | Fix issues before major decisions |
| 5569 | **Unreliable** | Data cannot be trusted yet |
| <55 | **Broken** | Do not act on this data |
If verdict is **Broken**, stop and recommend remediation first.
---
## Phase 1: Context & Decision Definition
(Proceed only after scoring)
### 1. Business Context
* What decisions will this data inform?
* Who uses the data (marketing, product, leadership)?
* What actions will be taken based on insights?
---
### 2. Current State
* Tools in use (GA4, GTM, Mixpanel, Amplitude, etc.)
* Existing events and conversions
* Known issues or distrust in data
---
### 3. Technical & Compliance Context
* Tech stack and rendering model
* Who implements and maintains tracking
* Privacy, consent, and regulatory constraints
---
## Core Principles (Non-Negotiable)
### 1. Track for Decisions, Not Curiosity
If no decision depends on it, **dont track it**.
---
### 2. Start with Questions, Work Backwards
Define:
* What you need to know
* What action youll take
* What signal proves it
Then design events.
---
### 3. Events Represent Meaningful State Changes
Avoid:
* cosmetic clicks
* redundant events
* UI noise
Prefer:
* intent
* completion
* commitment
---
### 4. Data Quality Beats Volume
Fewer accurate events > many unreliable ones.
---
## Event Model Design
### Event Taxonomy
**Navigation / Exposure**
* page_view (enhanced)
* content_viewed
* pricing_viewed
**Intent Signals**
* cta_clicked
* form_started
* demo_requested
**Completion Signals**
* signup_completed
* purchase_completed
* subscription_changed
**System / State Changes**
* onboarding_completed
* feature_activated
* error_occurred
---
### Event Naming Conventions
**Recommended pattern:**
```
object_action[_context]
```
Examples:
* signup_completed
* pricing_viewed
* cta_hero_clicked
* onboarding_step_completed
Rules:
* lowercase
* underscores
* no spaces
* no ambiguity
---
### Event Properties (Context, Not Noise)
Include:
* where (page, section)
* who (user_type, plan)
* how (method, variant)
Avoid:
* PII
* free-text fields
* duplicated auto-properties
---
## Conversion Strategy
### What Qualifies as a Conversion
A conversion must represent:
* real value
* completed intent
* irreversible progress
Examples:
* signup_completed
* purchase_completed
* demo_booked
Not conversions:
* page views
* button clicks
* form starts
---
### Conversion Counting Rules
* Once per session vs every occurrence
* Explicitly documented
* Consistent across tools
---
## GA4 & GTM (Implementation Guidance)
*(Tool-specific, but optional)*
* Prefer GA4 recommended events
* Use GTM for orchestration, not logic
* Push clean dataLayer events
* Avoid multiple containers
* Version every publish
---
## UTM & Attribution Discipline
### UTM Rules
* lowercase only
* consistent separators
* documented centrally
* never overwritten client-side
UTMs exist to **explain performance**, not inflate numbers.
---
## Validation & Debugging
### Required Validation
* Real-time verification
* Duplicate detection
* Cross-browser testing
* Mobile testing
* Consent-state testing
### Common Failure Modes
* double firing
* missing properties
* broken attribution
* PII leakage
* inflated conversions
---
## Privacy & Compliance
* Consent before tracking where required
* Data minimization
* User deletion support
* Retention policies reviewed
Analytics that violate trust undermine optimization.
---
## Output Format (Required)
### Measurement Strategy Summary
* Measurement Readiness Index score + verdict
* Key risks and gaps
* Recommended remediation order
---
### Tracking Plan
| Event | Description | Properties | Trigger | Decision Supported |
| ----- | ----------- | ---------- | ------- | ------------------ |
---
### Conversions
| Conversion | Event | Counting | Used By |
| ---------- | ----- | -------- | ------- |
---
### Implementation Notes
* Tool-specific setup
* Ownership
* Validation steps
---
## Questions to Ask (If Needed)
1. What decisions depend on this data?
2. Which metrics are currently trusted or distrusted?
3. Who owns analytics long term?
4. What compliance constraints apply?
5. What tools are already in place?
---
## Related Skills
* **page-cro** Uses this data for optimization
* **ab-test-setup** Requires clean conversions
* **seo-audit** Organic performance analysis
* **programmatic-seo** Scale requires reliable signals
---
## When to Use
This skill is applicable to execute the workflow or actions described in the overview.

View File

@@ -0,0 +1,823 @@
---
name: claude-d3js-skill
description: "This skill provides guidance for creating sophisticated, interactive data visualisations using d3.js."
risk: unknown
source: community
date_added: "2026-02-27"
---
# D3.js Visualisation
## Overview
This skill provides guidance for creating sophisticated, interactive data visualisations using d3.js. D3.js (Data-Driven Documents) excels at binding data to DOM elements and applying data-driven transformations to create custom, publication-quality visualisations with precise control over every visual element. The techniques work across any JavaScript environment, including vanilla JavaScript, React, Vue, Svelte, and other frameworks.
## When to use d3.js
**Use d3.js for:**
- Custom visualisations requiring unique visual encodings or layouts
- Interactive explorations with complex pan, zoom, or brush behaviours
- Network/graph visualisations (force-directed layouts, tree diagrams, hierarchies, chord diagrams)
- Geographic visualisations with custom projections
- Visualisations requiring smooth, choreographed transitions
- Publication-quality graphics with fine-grained styling control
- Novel chart types not available in standard libraries
**Consider alternatives for:**
- 3D visualisations - use Three.js instead
## Core workflow
### 1. Set up d3.js
Import d3 at the top of your script:
```javascript
import * as d3 from 'd3';
```
Or use the CDN version (7.x):
```html
<script src="https://d3js.org/d3.v7.min.js"></script>
```
All modules (scales, axes, shapes, transitions, etc.) are accessible through the `d3` namespace.
### 2. Choose the integration pattern
**Pattern A: Direct DOM manipulation (recommended for most cases)**
Use d3 to select DOM elements and manipulate them imperatively. This works in any JavaScript environment:
```javascript
function drawChart(data) {
if (!data || data.length === 0) return;
const svg = d3.select('#chart'); // Select by ID, class, or DOM element
// Clear previous content
svg.selectAll("*").remove();
// Set up dimensions
const width = 800;
const height = 400;
const margin = { top: 20, right: 30, bottom: 40, left: 50 };
// Create scales, axes, and draw visualisation
// ... d3 code here ...
}
// Call when data changes
drawChart(myData);
```
**Pattern B: Declarative rendering (for frameworks with templating)**
Use d3 for data calculations (scales, layouts) but render elements via your framework:
```javascript
function getChartElements(data) {
const xScale = d3.scaleLinear()
.domain([0, d3.max(data, d => d.value)])
.range([0, 400]);
return data.map((d, i) => ({
x: 50,
y: i * 30,
width: xScale(d.value),
height: 25
}));
}
// In React: {getChartElements(data).map((d, i) => <rect key={i} {...d} fill="steelblue" />)}
// In Vue: v-for directive over the returned array
// In vanilla JS: Create elements manually from the returned data
```
Use Pattern A for complex visualisations with transitions, interactions, or when leveraging d3's full capabilities. Use Pattern B for simpler visualisations or when your framework prefers declarative rendering.
### 3. Structure the visualisation code
Follow this standard structure in your drawing function:
```javascript
function drawVisualization(data) {
if (!data || data.length === 0) return;
const svg = d3.select('#chart'); // Or pass a selector/element
svg.selectAll("*").remove(); // Clear previous render
// 1. Define dimensions
const width = 800;
const height = 400;
const margin = { top: 20, right: 30, bottom: 40, left: 50 };
const innerWidth = width - margin.left - margin.right;
const innerHeight = height - margin.top - margin.bottom;
// 2. Create main group with margins
const g = svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`);
// 3. Create scales
const xScale = d3.scaleLinear()
.domain([0, d3.max(data, d => d.x)])
.range([0, innerWidth]);
const yScale = d3.scaleLinear()
.domain([0, d3.max(data, d => d.y)])
.range([innerHeight, 0]); // Note: inverted for SVG coordinates
// 4. Create and append axes
const xAxis = d3.axisBottom(xScale);
const yAxis = d3.axisLeft(yScale);
g.append("g")
.attr("transform", `translate(0,${innerHeight})`)
.call(xAxis);
g.append("g")
.call(yAxis);
// 5. Bind data and create visual elements
g.selectAll("circle")
.data(data)
.join("circle")
.attr("cx", d => xScale(d.x))
.attr("cy", d => yScale(d.y))
.attr("r", 5)
.attr("fill", "steelblue");
}
// Call when data changes
drawVisualization(myData);
```
### 4. Implement responsive sizing
Make visualisations responsive to container size:
```javascript
function setupResponsiveChart(containerId, data) {
const container = document.getElementById(containerId);
const svg = d3.select(`#${containerId}`).append('svg');
function updateChart() {
const { width, height } = container.getBoundingClientRect();
svg.attr('width', width).attr('height', height);
// Redraw visualisation with new dimensions
drawChart(data, svg, width, height);
}
// Update on initial load
updateChart();
// Update on window resize
window.addEventListener('resize', updateChart);
// Return cleanup function
return () => window.removeEventListener('resize', updateChart);
}
// Usage:
// const cleanup = setupResponsiveChart('chart-container', myData);
// cleanup(); // Call when component unmounts or element removed
```
Or use ResizeObserver for more direct container monitoring:
```javascript
function setupResponsiveChartWithObserver(svgElement, data) {
const observer = new ResizeObserver(() => {
const { width, height } = svgElement.getBoundingClientRect();
d3.select(svgElement)
.attr('width', width)
.attr('height', height);
// Redraw visualisation
drawChart(data, d3.select(svgElement), width, height);
});
observer.observe(svgElement.parentElement);
return () => observer.disconnect();
}
```
## Common visualisation patterns
### Bar chart
```javascript
function drawBarChart(data, svgElement) {
if (!data || data.length === 0) return;
const svg = d3.select(svgElement);
svg.selectAll("*").remove();
const width = 800;
const height = 400;
const margin = { top: 20, right: 30, bottom: 40, left: 50 };
const innerWidth = width - margin.left - margin.right;
const innerHeight = height - margin.top - margin.bottom;
const g = svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`);
const xScale = d3.scaleBand()
.domain(data.map(d => d.category))
.range([0, innerWidth])
.padding(0.1);
const yScale = d3.scaleLinear()
.domain([0, d3.max(data, d => d.value)])
.range([innerHeight, 0]);
g.append("g")
.attr("transform", `translate(0,${innerHeight})`)
.call(d3.axisBottom(xScale));
g.append("g")
.call(d3.axisLeft(yScale));
g.selectAll("rect")
.data(data)
.join("rect")
.attr("x", d => xScale(d.category))
.attr("y", d => yScale(d.value))
.attr("width", xScale.bandwidth())
.attr("height", d => innerHeight - yScale(d.value))
.attr("fill", "steelblue");
}
// Usage:
// drawBarChart(myData, document.getElementById('chart'));
```
### Line chart
```javascript
const line = d3.line()
.x(d => xScale(d.date))
.y(d => yScale(d.value))
.curve(d3.curveMonotoneX); // Smooth curve
g.append("path")
.datum(data)
.attr("fill", "none")
.attr("stroke", "steelblue")
.attr("stroke-width", 2)
.attr("d", line);
```
### Scatter plot
```javascript
g.selectAll("circle")
.data(data)
.join("circle")
.attr("cx", d => xScale(d.x))
.attr("cy", d => yScale(d.y))
.attr("r", d => sizeScale(d.size)) // Optional: size encoding
.attr("fill", d => colourScale(d.category)) // Optional: colour encoding
.attr("opacity", 0.7);
```
### Chord diagram
A chord diagram shows relationships between entities in a circular layout, with ribbons representing flows between them:
```javascript
function drawChordDiagram(data) {
// data format: array of objects with source, target, and value
// Example: [{ source: 'A', target: 'B', value: 10 }, ...]
if (!data || data.length === 0) return;
const svg = d3.select('#chart');
svg.selectAll("*").remove();
const width = 600;
const height = 600;
const innerRadius = Math.min(width, height) * 0.3;
const outerRadius = innerRadius + 30;
// Create matrix from data
const nodes = Array.from(new Set(data.flatMap(d => [d.source, d.target])));
const matrix = Array.from({ length: nodes.length }, () => Array(nodes.length).fill(0));
data.forEach(d => {
const i = nodes.indexOf(d.source);
const j = nodes.indexOf(d.target);
matrix[i][j] += d.value;
matrix[j][i] += d.value;
});
// Create chord layout
const chord = d3.chord()
.padAngle(0.05)
.sortSubgroups(d3.descending);
const arc = d3.arc()
.innerRadius(innerRadius)
.outerRadius(outerRadius);
const ribbon = d3.ribbon()
.source(d => d.source)
.target(d => d.target);
const colourScale = d3.scaleOrdinal(d3.schemeCategory10)
.domain(nodes);
const g = svg.append("g")
.attr("transform", `translate(${width / 2},${height / 2})`);
const chords = chord(matrix);
// Draw ribbons
g.append("g")
.attr("fill-opacity", 0.67)
.selectAll("path")
.data(chords)
.join("path")
.attr("d", ribbon)
.attr("fill", d => colourScale(nodes[d.source.index]))
.attr("stroke", d => d3.rgb(colourScale(nodes[d.source.index])).darker());
// Draw groups (arcs)
const group = g.append("g")
.selectAll("g")
.data(chords.groups)
.join("g");
group.append("path")
.attr("d", arc)
.attr("fill", d => colourScale(nodes[d.index]))
.attr("stroke", d => d3.rgb(colourScale(nodes[d.index])).darker());
// Add labels
group.append("text")
.each(d => { d.angle = (d.startAngle + d.endAngle) / 2; })
.attr("dy", "0.31em")
.attr("transform", d => `rotate(${(d.angle * 180 / Math.PI) - 90})translate(${outerRadius + 30})${d.angle > Math.PI ? "rotate(180)" : ""}`)
.attr("text-anchor", d => d.angle > Math.PI ? "end" : null)
.text((d, i) => nodes[i])
.style("font-size", "12px");
}
```
### Heatmap
A heatmap uses colour to encode values in a two-dimensional grid, useful for showing patterns across categories:
```javascript
function drawHeatmap(data) {
// data format: array of objects with row, column, and value
// Example: [{ row: 'A', column: 'X', value: 10 }, ...]
if (!data || data.length === 0) return;
const svg = d3.select('#chart');
svg.selectAll("*").remove();
const width = 800;
const height = 600;
const margin = { top: 100, right: 30, bottom: 30, left: 100 };
const innerWidth = width - margin.left - margin.right;
const innerHeight = height - margin.top - margin.bottom;
// Get unique rows and columns
const rows = Array.from(new Set(data.map(d => d.row)));
const columns = Array.from(new Set(data.map(d => d.column)));
const g = svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`);
// Create scales
const xScale = d3.scaleBand()
.domain(columns)
.range([0, innerWidth])
.padding(0.01);
const yScale = d3.scaleBand()
.domain(rows)
.range([0, innerHeight])
.padding(0.01);
// Colour scale for values
const colourScale = d3.scaleSequential(d3.interpolateYlOrRd)
.domain([0, d3.max(data, d => d.value)]);
// Draw rectangles
g.selectAll("rect")
.data(data)
.join("rect")
.attr("x", d => xScale(d.column))
.attr("y", d => yScale(d.row))
.attr("width", xScale.bandwidth())
.attr("height", yScale.bandwidth())
.attr("fill", d => colourScale(d.value));
// Add x-axis labels
svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`)
.selectAll("text")
.data(columns)
.join("text")
.attr("x", d => xScale(d) + xScale.bandwidth() / 2)
.attr("y", -10)
.attr("text-anchor", "middle")
.text(d => d)
.style("font-size", "12px");
// Add y-axis labels
svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`)
.selectAll("text")
.data(rows)
.join("text")
.attr("x", -10)
.attr("y", d => yScale(d) + yScale.bandwidth() / 2)
.attr("dy", "0.35em")
.attr("text-anchor", "end")
.text(d => d)
.style("font-size", "12px");
// Add colour legend
const legendWidth = 20;
const legendHeight = 200;
const legend = svg.append("g")
.attr("transform", `translate(${width - 60},${margin.top})`);
const legendScale = d3.scaleLinear()
.domain(colourScale.domain())
.range([legendHeight, 0]);
const legendAxis = d3.axisRight(legendScale)
.ticks(5);
// Draw colour gradient in legend
for (let i = 0; i < legendHeight; i++) {
legend.append("rect")
.attr("y", i)
.attr("width", legendWidth)
.attr("height", 1)
.attr("fill", colourScale(legendScale.invert(i)));
}
legend.append("g")
.attr("transform", `translate(${legendWidth},0)`)
.call(legendAxis);
}
```
### Pie chart
```javascript
const pie = d3.pie()
.value(d => d.value)
.sort(null);
const arc = d3.arc()
.innerRadius(0)
.outerRadius(Math.min(width, height) / 2 - 20);
const colourScale = d3.scaleOrdinal(d3.schemeCategory10);
const g = svg.append("g")
.attr("transform", `translate(${width / 2},${height / 2})`);
g.selectAll("path")
.data(pie(data))
.join("path")
.attr("d", arc)
.attr("fill", (d, i) => colourScale(i))
.attr("stroke", "white")
.attr("stroke-width", 2);
```
### Force-directed network
```javascript
const simulation = d3.forceSimulation(nodes)
.force("link", d3.forceLink(links).id(d => d.id).distance(100))
.force("charge", d3.forceManyBody().strength(-300))
.force("center", d3.forceCenter(width / 2, height / 2));
const link = g.selectAll("line")
.data(links)
.join("line")
.attr("stroke", "#999")
.attr("stroke-width", 1);
const node = g.selectAll("circle")
.data(nodes)
.join("circle")
.attr("r", 8)
.attr("fill", "steelblue")
.call(d3.drag()
.on("start", dragstarted)
.on("drag", dragged)
.on("end", dragended));
simulation.on("tick", () => {
link
.attr("x1", d => d.source.x)
.attr("y1", d => d.source.y)
.attr("x2", d => d.target.x)
.attr("y2", d => d.target.y);
node
.attr("cx", d => d.x)
.attr("cy", d => d.y);
});
function dragstarted(event) {
if (!event.active) simulation.alphaTarget(0.3).restart();
event.subject.fx = event.subject.x;
event.subject.fy = event.subject.y;
}
function dragged(event) {
event.subject.fx = event.x;
event.subject.fy = event.y;
}
function dragended(event) {
if (!event.active) simulation.alphaTarget(0);
event.subject.fx = null;
event.subject.fy = null;
}
```
## Adding interactivity
### Tooltips
```javascript
// Create tooltip div (outside SVG)
const tooltip = d3.select("body").append("div")
.attr("class", "tooltip")
.style("position", "absolute")
.style("visibility", "hidden")
.style("background-color", "white")
.style("border", "1px solid #ddd")
.style("padding", "10px")
.style("border-radius", "4px")
.style("pointer-events", "none");
// Add to elements
circles
.on("mouseover", function(event, d) {
d3.select(this).attr("opacity", 1);
tooltip
.style("visibility", "visible")
.html(`<strong>${d.label}</strong><br/>Value: ${d.value}`);
})
.on("mousemove", function(event) {
tooltip
.style("top", (event.pageY - 10) + "px")
.style("left", (event.pageX + 10) + "px");
})
.on("mouseout", function() {
d3.select(this).attr("opacity", 0.7);
tooltip.style("visibility", "hidden");
});
```
### Zoom and pan
```javascript
const zoom = d3.zoom()
.scaleExtent([0.5, 10])
.on("zoom", (event) => {
g.attr("transform", event.transform);
});
svg.call(zoom);
```
### Click interactions
```javascript
circles
.on("click", function(event, d) {
// Handle click (dispatch event, update app state, etc.)
console.log("Clicked:", d);
// Visual feedback
d3.selectAll("circle").attr("fill", "steelblue");
d3.select(this).attr("fill", "orange");
// Optional: dispatch custom event for your framework/app to listen to
// window.dispatchEvent(new CustomEvent('chartClick', { detail: d }));
});
```
## Transitions and animations
Add smooth transitions to visual changes:
```javascript
// Basic transition
circles
.transition()
.duration(750)
.attr("r", 10);
// Chained transitions
circles
.transition()
.duration(500)
.attr("fill", "orange")
.transition()
.duration(500)
.attr("r", 15);
// Staggered transitions
circles
.transition()
.delay((d, i) => i * 50)
.duration(500)
.attr("cy", d => yScale(d.value));
// Custom easing
circles
.transition()
.duration(1000)
.ease(d3.easeBounceOut)
.attr("r", 10);
```
## Scales reference
### Quantitative scales
```javascript
// Linear scale
const xScale = d3.scaleLinear()
.domain([0, 100])
.range([0, 500]);
// Log scale (for exponential data)
const logScale = d3.scaleLog()
.domain([1, 1000])
.range([0, 500]);
// Power scale
const powScale = d3.scalePow()
.exponent(2)
.domain([0, 100])
.range([0, 500]);
// Time scale
const timeScale = d3.scaleTime()
.domain([new Date(2020, 0, 1), new Date(2024, 0, 1)])
.range([0, 500]);
```
### Ordinal scales
```javascript
// Band scale (for bar charts)
const bandScale = d3.scaleBand()
.domain(['A', 'B', 'C', 'D'])
.range([0, 400])
.padding(0.1);
// Point scale (for line/scatter categories)
const pointScale = d3.scalePoint()
.domain(['A', 'B', 'C', 'D'])
.range([0, 400]);
// Ordinal scale (for colours)
const colourScale = d3.scaleOrdinal(d3.schemeCategory10);
```
### Sequential scales
```javascript
// Sequential colour scale
const colourScale = d3.scaleSequential(d3.interpolateBlues)
.domain([0, 100]);
// Diverging colour scale
const divScale = d3.scaleDiverging(d3.interpolateRdBu)
.domain([-10, 0, 10]);
```
## Best practices
### Data preparation
Always validate and prepare data before visualisation:
```javascript
// Filter invalid values
const cleanData = data.filter(d => d.value != null && !isNaN(d.value));
// Sort data if order matters
const sortedData = [...data].sort((a, b) => b.value - a.value);
// Parse dates
const parsedData = data.map(d => ({
...d,
date: d3.timeParse("%Y-%m-%d")(d.date)
}));
```
### Performance optimisation
For large datasets (>1000 elements):
```javascript
// Use canvas instead of SVG for many elements
// Use quadtree for collision detection
// Simplify paths with d3.line().curve(d3.curveStep)
// Implement virtual scrolling for large lists
// Use requestAnimationFrame for custom animations
```
### Accessibility
Make visualisations accessible:
```javascript
// Add ARIA labels
svg.attr("role", "img")
.attr("aria-label", "Bar chart showing quarterly revenue");
// Add title and description
svg.append("title").text("Quarterly Revenue 2024");
svg.append("desc").text("Bar chart showing revenue growth across four quarters");
// Ensure sufficient colour contrast
// Provide keyboard navigation for interactive elements
// Include data table alternative
```
### Styling
Use consistent, professional styling:
```javascript
// Define colour palettes upfront
const colours = {
primary: '#4A90E2',
secondary: '#7B68EE',
background: '#F5F7FA',
text: '#333333',
gridLines: '#E0E0E0'
};
// Apply consistent typography
svg.selectAll("text")
.style("font-family", "Inter, sans-serif")
.style("font-size", "12px");
// Use subtle grid lines
g.selectAll(".tick line")
.attr("stroke", colours.gridLines)
.attr("stroke-dasharray", "2,2");
```
## Common issues and solutions
**Issue**: Axes not appearing
- Ensure scales have valid domains (check for NaN values)
- Verify axis is appended to correct group
- Check transform translations are correct
**Issue**: Transitions not working
- Call `.transition()` before attribute changes
- Ensure elements have unique keys for proper data binding
- Check that useEffect dependencies include all changing data
**Issue**: Responsive sizing not working
- Use ResizeObserver or window resize listener
- Update dimensions in state to trigger re-render
- Ensure SVG has width/height attributes or viewBox
**Issue**: Performance problems
- Limit number of DOM elements (consider canvas for >1000 items)
- Debounce resize handlers
- Use `.join()` instead of separate enter/update/exit selections
- Avoid unnecessary re-renders by checking dependencies
## Resources
### references/
Contains detailed reference materials:
- `d3-patterns.md` - Comprehensive collection of visualisation patterns and code examples
- `scale-reference.md` - Complete guide to d3 scales with examples
- `colour-schemes.md` - D3 colour schemes and palette recommendations
### assets/
Contains boilerplate templates:
- `chart-template.js` - Starter template for basic chart
- `interactive-template.js` - Template with tooltips, zoom, and interactions
- `sample-data.json` - Example datasets for testing
These templates work with vanilla JavaScript, React, Vue, Svelte, or any other JavaScript environment. Adapt them as needed for your specific framework.
To use these resources, read the relevant files when detailed guidance is needed for specific visualisation types or patterns.

View File

@@ -0,0 +1,106 @@
import { useEffect, useRef, useState } from 'react';
import * as d3 from 'd3';
function BasicChart({ data }) {
const svgRef = useRef();
useEffect(() => {
if (!data || data.length === 0) return;
// Select SVG element
const svg = d3.select(svgRef.current);
svg.selectAll("*").remove(); // Clear previous content
// Define dimensions and margins
const width = 800;
const height = 400;
const margin = { top: 20, right: 30, bottom: 40, left: 50 };
const innerWidth = width - margin.left - margin.right;
const innerHeight = height - margin.top - margin.bottom;
// Create main group with margins
const g = svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`);
// Create scales
const xScale = d3.scaleBand()
.domain(data.map(d => d.label))
.range([0, innerWidth])
.padding(0.1);
const yScale = d3.scaleLinear()
.domain([0, d3.max(data, d => d.value)])
.range([innerHeight, 0])
.nice();
// Create and append axes
const xAxis = d3.axisBottom(xScale);
const yAxis = d3.axisLeft(yScale);
g.append("g")
.attr("class", "x-axis")
.attr("transform", `translate(0,${innerHeight})`)
.call(xAxis);
g.append("g")
.attr("class", "y-axis")
.call(yAxis);
// Bind data and create visual elements (bars in this example)
g.selectAll("rect")
.data(data)
.join("rect")
.attr("x", d => xScale(d.label))
.attr("y", d => yScale(d.value))
.attr("width", xScale.bandwidth())
.attr("height", d => innerHeight - yScale(d.value))
.attr("fill", "steelblue");
// Optional: Add axis labels
g.append("text")
.attr("class", "axis-label")
.attr("x", innerWidth / 2)
.attr("y", innerHeight + margin.bottom - 5)
.attr("text-anchor", "middle")
.text("Category");
g.append("text")
.attr("class", "axis-label")
.attr("transform", "rotate(-90)")
.attr("x", -innerHeight / 2)
.attr("y", -margin.left + 15)
.attr("text-anchor", "middle")
.text("Value");
}, [data]);
return (
<div className="chart-container">
<svg
ref={svgRef}
width="800"
height="400"
style={{ border: '1px solid #ddd' }}
/>
</div>
);
}
// Example usage
export default function App() {
const sampleData = [
{ label: 'A', value: 30 },
{ label: 'B', value: 80 },
{ label: 'C', value: 45 },
{ label: 'D', value: 60 },
{ label: 'E', value: 20 },
{ label: 'F', value: 90 }
];
return (
<div className="p-8">
<h1 className="text-2xl font-bold mb-4">Basic D3.js Chart</h1>
<BasicChart data={sampleData} />
</div>
);
}

View File

@@ -0,0 +1,227 @@
import { useEffect, useRef, useState } from 'react';
import * as d3 from 'd3';
function InteractiveChart({ data }) {
const svgRef = useRef();
const tooltipRef = useRef();
const [selectedPoint, setSelectedPoint] = useState(null);
useEffect(() => {
if (!data || data.length === 0) return;
const svg = d3.select(svgRef.current);
svg.selectAll("*").remove();
// Dimensions
const width = 800;
const height = 500;
const margin = { top: 20, right: 30, bottom: 40, left: 50 };
const innerWidth = width - margin.left - margin.right;
const innerHeight = height - margin.top - margin.bottom;
// Create main group
const g = svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`);
// Scales
const xScale = d3.scaleLinear()
.domain([0, d3.max(data, d => d.x)])
.range([0, innerWidth])
.nice();
const yScale = d3.scaleLinear()
.domain([0, d3.max(data, d => d.y)])
.range([innerHeight, 0])
.nice();
const sizeScale = d3.scaleSqrt()
.domain([0, d3.max(data, d => d.size || 10)])
.range([3, 20]);
const colourScale = d3.scaleOrdinal(d3.schemeCategory10);
// Add zoom behaviour
const zoom = d3.zoom()
.scaleExtent([0.5, 10])
.on("zoom", (event) => {
g.attr("transform", `translate(${margin.left + event.transform.x},${margin.top + event.transform.y}) scale(${event.transform.k})`);
});
svg.call(zoom);
// Axes
const xAxis = d3.axisBottom(xScale);
const yAxis = d3.axisLeft(yScale);
const xAxisGroup = g.append("g")
.attr("class", "x-axis")
.attr("transform", `translate(0,${innerHeight})`)
.call(xAxis);
const yAxisGroup = g.append("g")
.attr("class", "y-axis")
.call(yAxis);
// Grid lines
g.append("g")
.attr("class", "grid")
.attr("opacity", 0.1)
.call(d3.axisLeft(yScale)
.tickSize(-innerWidth)
.tickFormat(""));
g.append("g")
.attr("class", "grid")
.attr("opacity", 0.1)
.attr("transform", `translate(0,${innerHeight})`)
.call(d3.axisBottom(xScale)
.tickSize(-innerHeight)
.tickFormat(""));
// Tooltip
const tooltip = d3.select(tooltipRef.current);
// Data points
const circles = g.selectAll("circle")
.data(data)
.join("circle")
.attr("cx", d => xScale(d.x))
.attr("cy", d => yScale(d.y))
.attr("r", d => sizeScale(d.size || 10))
.attr("fill", d => colourScale(d.category || 'default'))
.attr("stroke", "#fff")
.attr("stroke-width", 2)
.attr("opacity", 0.7)
.style("cursor", "pointer");
// Hover interactions
circles
.on("mouseover", function(event, d) {
// Enlarge circle
d3.select(this)
.transition()
.duration(200)
.attr("opacity", 1)
.attr("stroke-width", 3);
// Show tooltip
tooltip
.style("display", "block")
.style("left", (event.pageX + 10) + "px")
.style("top", (event.pageY - 10) + "px")
.html(`
<strong>${d.label || 'Point'}</strong><br/>
X: ${d.x.toFixed(2)}<br/>
Y: ${d.y.toFixed(2)}<br/>
${d.category ? `Category: ${d.category}<br/>` : ''}
${d.size ? `Size: ${d.size.toFixed(2)}` : ''}
`);
})
.on("mousemove", function(event) {
tooltip
.style("left", (event.pageX + 10) + "px")
.style("top", (event.pageY - 10) + "px");
})
.on("mouseout", function() {
// Restore circle
d3.select(this)
.transition()
.duration(200)
.attr("opacity", 0.7)
.attr("stroke-width", 2);
// Hide tooltip
tooltip.style("display", "none");
})
.on("click", function(event, d) {
// Highlight selected point
circles.attr("stroke", "#fff").attr("stroke-width", 2);
d3.select(this)
.attr("stroke", "#000")
.attr("stroke-width", 3);
setSelectedPoint(d);
});
// Add transition on initial render
circles
.attr("r", 0)
.transition()
.duration(800)
.delay((d, i) => i * 20)
.attr("r", d => sizeScale(d.size || 10));
// Axis labels
g.append("text")
.attr("class", "axis-label")
.attr("x", innerWidth / 2)
.attr("y", innerHeight + margin.bottom - 5)
.attr("text-anchor", "middle")
.style("font-size", "14px")
.text("X Axis");
g.append("text")
.attr("class", "axis-label")
.attr("transform", "rotate(-90)")
.attr("x", -innerHeight / 2)
.attr("y", -margin.left + 15)
.attr("text-anchor", "middle")
.style("font-size", "14px")
.text("Y Axis");
}, [data]);
return (
<div className="relative">
<svg
ref={svgRef}
width="800"
height="500"
style={{ border: '1px solid #ddd', cursor: 'grab' }}
/>
<div
ref={tooltipRef}
style={{
position: 'absolute',
display: 'none',
padding: '10px',
background: 'white',
border: '1px solid #ddd',
borderRadius: '4px',
pointerEvents: 'none',
boxShadow: '0 2px 4px rgba(0,0,0,0.1)',
fontSize: '13px',
zIndex: 1000
}}
/>
{selectedPoint && (
<div className="mt-4 p-4 bg-blue-50 rounded border border-blue-200">
<h3 className="font-bold mb-2">Selected Point</h3>
<pre className="text-sm">{JSON.stringify(selectedPoint, null, 2)}</pre>
</div>
)}
</div>
);
}
// Example usage
export default function App() {
const sampleData = Array.from({ length: 50 }, (_, i) => ({
id: i,
label: `Point ${i + 1}`,
x: Math.random() * 100,
y: Math.random() * 100,
size: Math.random() * 30 + 5,
category: ['A', 'B', 'C', 'D'][Math.floor(Math.random() * 4)]
}));
return (
<div className="p-8">
<h1 className="text-2xl font-bold mb-2">Interactive D3.js Chart</h1>
<p className="text-gray-600 mb-4">
Hover over points for details. Click to select. Scroll to zoom. Drag to pan.
</p>
<InteractiveChart data={sampleData} />
</div>
);
}

View File

@@ -0,0 +1,115 @@
{
"timeSeries": [
{ "date": "2024-01-01", "value": 120, "category": "A" },
{ "date": "2024-02-01", "value": 135, "category": "A" },
{ "date": "2024-03-01", "value": 128, "category": "A" },
{ "date": "2024-04-01", "value": 145, "category": "A" },
{ "date": "2024-05-01", "value": 152, "category": "A" },
{ "date": "2024-06-01", "value": 168, "category": "A" },
{ "date": "2024-07-01", "value": 175, "category": "A" },
{ "date": "2024-08-01", "value": 182, "category": "A" },
{ "date": "2024-09-01", "value": 190, "category": "A" },
{ "date": "2024-10-01", "value": 185, "category": "A" },
{ "date": "2024-11-01", "value": 195, "category": "A" },
{ "date": "2024-12-01", "value": 210, "category": "A" }
],
"categorical": [
{ "label": "Product A", "value": 450, "category": "Electronics" },
{ "label": "Product B", "value": 320, "category": "Electronics" },
{ "label": "Product C", "value": 580, "category": "Clothing" },
{ "label": "Product D", "value": 290, "category": "Clothing" },
{ "label": "Product E", "value": 410, "category": "Food" },
{ "label": "Product F", "value": 370, "category": "Food" }
],
"scatterData": [
{ "x": 12, "y": 45, "size": 25, "category": "Group A", "label": "Point 1" },
{ "x": 25, "y": 62, "size": 35, "category": "Group A", "label": "Point 2" },
{ "x": 38, "y": 55, "size": 20, "category": "Group B", "label": "Point 3" },
{ "x": 45, "y": 78, "size": 40, "category": "Group B", "label": "Point 4" },
{ "x": 52, "y": 68, "size": 30, "category": "Group C", "label": "Point 5" },
{ "x": 65, "y": 85, "size": 45, "category": "Group C", "label": "Point 6" },
{ "x": 72, "y": 72, "size": 28, "category": "Group A", "label": "Point 7" },
{ "x": 85, "y": 92, "size": 50, "category": "Group B", "label": "Point 8" }
],
"hierarchical": {
"name": "Root",
"children": [
{
"name": "Category 1",
"children": [
{ "name": "Item 1.1", "value": 100 },
{ "name": "Item 1.2", "value": 150 },
{ "name": "Item 1.3", "value": 80 }
]
},
{
"name": "Category 2",
"children": [
{ "name": "Item 2.1", "value": 200 },
{ "name": "Item 2.2", "value": 120 },
{ "name": "Item 2.3", "value": 90 }
]
},
{
"name": "Category 3",
"children": [
{ "name": "Item 3.1", "value": 180 },
{ "name": "Item 3.2", "value": 140 }
]
}
]
},
"network": {
"nodes": [
{ "id": "A", "group": 1 },
{ "id": "B", "group": 1 },
{ "id": "C", "group": 1 },
{ "id": "D", "group": 2 },
{ "id": "E", "group": 2 },
{ "id": "F", "group": 3 },
{ "id": "G", "group": 3 },
{ "id": "H", "group": 3 }
],
"links": [
{ "source": "A", "target": "B", "value": 1 },
{ "source": "A", "target": "C", "value": 2 },
{ "source": "B", "target": "C", "value": 1 },
{ "source": "C", "target": "D", "value": 3 },
{ "source": "D", "target": "E", "value": 2 },
{ "source": "E", "target": "F", "value": 1 },
{ "source": "F", "target": "G", "value": 2 },
{ "source": "F", "target": "H", "value": 1 },
{ "source": "G", "target": "H", "value": 1 }
]
},
"stackedData": [
{ "group": "Q1", "seriesA": 30, "seriesB": 40, "seriesC": 25 },
{ "group": "Q2", "seriesA": 45, "seriesB": 35, "seriesC": 30 },
{ "group": "Q3", "seriesA": 40, "seriesB": 50, "seriesC": 35 },
{ "group": "Q4", "seriesA": 55, "seriesB": 45, "seriesC": 40 }
],
"geographicPoints": [
{ "city": "London", "latitude": 51.5074, "longitude": -0.1278, "value": 8900000 },
{ "city": "Paris", "latitude": 48.8566, "longitude": 2.3522, "value": 2140000 },
{ "city": "Berlin", "latitude": 52.5200, "longitude": 13.4050, "value": 3645000 },
{ "city": "Madrid", "latitude": 40.4168, "longitude": -3.7038, "value": 3223000 },
{ "city": "Rome", "latitude": 41.9028, "longitude": 12.4964, "value": 2873000 }
],
"divergingData": [
{ "category": "Item A", "value": -15 },
{ "category": "Item B", "value": 8 },
{ "category": "Item C", "value": -22 },
{ "category": "Item D", "value": 18 },
{ "category": "Item E", "value": -5 },
{ "category": "Item F", "value": 25 },
{ "category": "Item G", "value": -12 },
{ "category": "Item H", "value": 14 }
]
}

View File

@@ -0,0 +1,564 @@
# D3.js Colour Schemes and Palette Recommendations
Comprehensive guide to colour selection in data visualisation with d3.js.
## Built-in categorical colour schemes
### Category10 (default)
```javascript
d3.schemeCategory10
// ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
// '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
```
**Characteristics:**
- 10 distinct colours
- Good colour-blind accessibility
- Default choice for most categorical data
- Balanced saturation and brightness
**Use cases:** General purpose categorical encoding, legend items, multiple data series
### Tableau10
```javascript
d3.schemeTableau10
```
**Characteristics:**
- 10 colours optimised for data visualisation
- Professional appearance
- Excellent distinguishability
**Use cases:** Business dashboards, professional reports, presentations
### Accent
```javascript
d3.schemeAccent
// 8 colours with high saturation
```
**Characteristics:**
- Bright, vibrant colours
- High contrast
- Modern aesthetic
**Use cases:** Highlighting important categories, modern web applications
### Dark2
```javascript
d3.schemeDark2
// 8 darker, muted colours
```
**Characteristics:**
- Subdued palette
- Professional appearance
- Good for dark backgrounds
**Use cases:** Dark mode visualisations, professional contexts
### Paired
```javascript
d3.schemePaired
// 12 colours in pairs of similar hues
```
**Characteristics:**
- Pairs of light and dark variants
- Useful for nested categories
- 12 distinct colours
**Use cases:** Grouped bar charts, hierarchical categories, before/after comparisons
### Pastel1 & Pastel2
```javascript
d3.schemePastel1 // 9 colours
d3.schemePastel2 // 8 colours
```
**Characteristics:**
- Soft, low-saturation colours
- Gentle appearance
- Good for large areas
**Use cases:** Background colours, subtle categorisation, calming visualisations
### Set1, Set2, Set3
```javascript
d3.schemeSet1 // 9 colours - vivid
d3.schemeSet2 // 8 colours - muted
d3.schemeSet3 // 12 colours - pastel
```
**Characteristics:**
- Set1: High saturation, maximum distinction
- Set2: Professional, balanced
- Set3: Subtle, many categories
**Use cases:** Varied based on visual hierarchy needs
## Sequential colour schemes
Sequential schemes map continuous data from low to high values using a single hue or gradient.
### Single-hue sequential
**Blues:**
```javascript
d3.interpolateBlues
d3.schemeBlues[9] // 9-step discrete version
```
**Other single-hue options:**
- `d3.interpolateGreens` / `d3.schemeGreens`
- `d3.interpolateOranges` / `d3.schemeOranges`
- `d3.interpolatePurples` / `d3.schemePurples`
- `d3.interpolateReds` / `d3.schemeReds`
- `d3.interpolateGreys` / `d3.schemeGreys`
**Use cases:**
- Simple heat maps
- Choropleth maps
- Density plots
- Single-metric visualisations
### Multi-hue sequential
**Viridis (recommended):**
```javascript
d3.interpolateViridis
```
**Characteristics:**
- Perceptually uniform
- Colour-blind friendly
- Print-safe
- No visual dead zones
- Monotonically increasing perceived lightness
**Other perceptually-uniform options:**
- `d3.interpolatePlasma` - Purple to yellow
- `d3.interpolateInferno` - Black to white through red/orange
- `d3.interpolateMagma` - Black to white through purple
- `d3.interpolateCividis` - Colour-blind optimised
**Colour-blind accessible:**
```javascript
d3.interpolateTurbo // Rainbow-like but perceptually uniform
d3.interpolateCool // Cyan to magenta
d3.interpolateWarm // Orange to yellow
```
**Use cases:**
- Scientific visualisation
- Medical imaging
- Any high-precision data visualisation
- Accessible visualisations
### Traditional sequential
**Yellow-Orange-Red:**
```javascript
d3.interpolateYlOrRd
d3.schemeYlOrRd[9]
```
**Yellow-Green-Blue:**
```javascript
d3.interpolateYlGnBu
d3.schemeYlGnBu[9]
```
**Other multi-hue:**
- `d3.interpolateBuGn` - Blue to green
- `d3.interpolateBuPu` - Blue to purple
- `d3.interpolateGnBu` - Green to blue
- `d3.interpolateOrRd` - Orange to red
- `d3.interpolatePuBu` - Purple to blue
- `d3.interpolatePuBuGn` - Purple to blue-green
- `d3.interpolatePuRd` - Purple to red
- `d3.interpolateRdPu` - Red to purple
- `d3.interpolateYlGn` - Yellow to green
- `d3.interpolateYlOrBr` - Yellow to orange-brown
**Use cases:** Traditional data visualisation, familiar colour associations (temperature, vegetation, water)
## Diverging colour schemes
Diverging schemes highlight deviations from a central value using two distinct hues.
### Red-Blue (temperature)
```javascript
d3.interpolateRdBu
d3.schemeRdBu[11]
```
**Characteristics:**
- Intuitive temperature metaphor
- Strong contrast
- Clear positive/negative distinction
**Use cases:** Temperature, profit/loss, above/below average, correlation
### Red-Yellow-Blue
```javascript
d3.interpolateRdYlBu
d3.schemeRdYlBu[11]
```
**Characteristics:**
- Three-colour gradient
- Softer transition through yellow
- More visual steps
**Use cases:** When extreme values need emphasis and middle needs visibility
### Other diverging schemes
**Traffic light:**
```javascript
d3.interpolateRdYlGn // Red (bad) to green (good)
```
**Spectral (rainbow):**
```javascript
d3.interpolateSpectral // Full spectrum
```
**Other options:**
- `d3.interpolateBrBG` - Brown to blue-green
- `d3.interpolatePiYG` - Pink to yellow-green
- `d3.interpolatePRGn` - Purple to green
- `d3.interpolatePuOr` - Purple to orange
- `d3.interpolateRdGy` - Red to grey
**Use cases:** Choose based on semantic meaning and accessibility needs
## Colour-blind friendly palettes
### General guidelines
1. **Avoid red-green combinations** (most common colour blindness)
2. **Use blue-orange diverging** instead of red-green
3. **Add texture or patterns** as redundant encoding
4. **Test with simulation tools**
### Recommended colour-blind safe schemes
**Categorical:**
```javascript
// Okabe-Ito palette (colour-blind safe)
const okabePalette = [
'#E69F00', // Orange
'#56B4E9', // Sky blue
'#009E73', // Bluish green
'#F0E442', // Yellow
'#0072B2', // Blue
'#D55E00', // Vermillion
'#CC79A7', // Reddish purple
'#000000' // Black
];
const colourScale = d3.scaleOrdinal()
.domain(categories)
.range(okabePalette);
```
**Sequential:**
```javascript
// Use Viridis, Cividis, or Blues
d3.interpolateViridis // Best overall
d3.interpolateCividis // Optimised for CVD
d3.interpolateBlues // Simple, safe
```
**Diverging:**
```javascript
// Use blue-orange instead of red-green
d3.interpolateBrBG
d3.interpolatePuOr
```
## Custom colour palettes
### Creating custom sequential
```javascript
const customSequential = d3.scaleLinear()
.domain([0, 100])
.range(['#e8f4f8', '#006d9c']) // Light to dark blue
.interpolate(d3.interpolateLab); // Perceptually uniform
```
### Creating custom diverging
```javascript
const customDiverging = d3.scaleLinear()
.domain([0, 50, 100])
.range(['#ca0020', '#f7f7f7', '#0571b0']) // Red, grey, blue
.interpolate(d3.interpolateLab);
```
### Creating custom categorical
```javascript
// Brand colours
const brandPalette = [
'#FF6B6B', // Primary red
'#4ECDC4', // Secondary teal
'#45B7D1', // Tertiary blue
'#FFA07A', // Accent coral
'#98D8C8' // Accent mint
];
const colourScale = d3.scaleOrdinal()
.domain(categories)
.range(brandPalette);
```
## Semantic colour associations
### Universal colour meanings
**Red:**
- Danger, error, negative
- High temperature
- Debt, loss
**Green:**
- Success, positive
- Growth, vegetation
- Profit, gain
**Blue:**
- Trust, calm
- Water, cold
- Information, neutral
**Yellow/Orange:**
- Warning, caution
- Energy, warmth
- Attention
**Grey:**
- Neutral, inactive
- Missing data
- Background
### Context-specific palettes
**Financial:**
```javascript
const financialColours = {
profit: '#27ae60',
loss: '#e74c3c',
neutral: '#95a5a6',
highlight: '#3498db'
};
```
**Temperature:**
```javascript
const temperatureScale = d3.scaleSequential(d3.interpolateRdYlBu)
.domain([40, -10]); // Hot to cold (reversed)
```
**Traffic/Status:**
```javascript
const statusColours = {
success: '#27ae60',
warning: '#f39c12',
error: '#e74c3c',
info: '#3498db',
neutral: '#95a5a6'
};
```
## Accessibility best practices
### Contrast ratios
Ensure sufficient contrast between colours and backgrounds:
```javascript
// Good contrast example
const highContrast = {
background: '#ffffff',
text: '#2c3e50',
primary: '#3498db',
secondary: '#e74c3c'
};
```
**WCAG guidelines:**
- Normal text: 4.5:1 minimum
- Large text: 3:1 minimum
- UI components: 3:1 minimum
### Redundant encoding
Never rely solely on colour to convey information:
```javascript
// Add patterns or shapes
const symbols = ['circle', 'square', 'triangle', 'diamond'];
// Add text labels
// Use line styles (solid, dashed, dotted)
// Use size encoding
```
### Testing
Test visualisations for colour blindness:
- Chrome DevTools (Rendering > Emulate vision deficiencies)
- Colour Oracle (free desktop application)
- Coblis (online simulator)
## Professional colour recommendations
### Data journalism
```javascript
// Guardian style
const guardianPalette = [
'#005689', // Guardian blue
'#c70000', // Guardian red
'#7d0068', // Guardian pink
'#951c75', // Guardian purple
];
// FT style
const ftPalette = [
'#0f5499', // FT blue
'#990f3d', // FT red
'#593380', // FT purple
'#262a33', // FT black
];
```
### Academic/Scientific
```javascript
// Nature journal style
const naturePalette = [
'#0071b2', // Blue
'#d55e00', // Vermillion
'#009e73', // Green
'#f0e442', // Yellow
];
// Use Viridis for continuous data
const scientificScale = d3.scaleSequential(d3.interpolateViridis);
```
### Corporate/Business
```javascript
// Professional, conservative
const corporatePalette = [
'#003f5c', // Dark blue
'#58508d', // Purple
'#bc5090', // Magenta
'#ff6361', // Coral
'#ffa600' // Orange
];
```
## Dynamic colour selection
### Based on data range
```javascript
function selectColourScheme(data) {
const extent = d3.extent(data);
const hasNegative = extent[0] < 0;
const hasPositive = extent[1] > 0;
if (hasNegative && hasPositive) {
// Diverging: data crosses zero
return d3.scaleSequentialSymlog(d3.interpolateRdBu)
.domain([extent[0], 0, extent[1]]);
} else {
// Sequential: all positive or all negative
return d3.scaleSequential(d3.interpolateViridis)
.domain(extent);
}
}
```
### Based on category count
```javascript
function selectCategoricalScheme(categories) {
const n = categories.length;
if (n <= 10) {
return d3.scaleOrdinal(d3.schemeTableau10);
} else if (n <= 12) {
return d3.scaleOrdinal(d3.schemePaired);
} else {
// For many categories, use sequential with quantize
return d3.scaleQuantize()
.domain([0, n - 1])
.range(d3.quantize(d3.interpolateRainbow, n));
}
}
```
## Common colour mistakes to avoid
1. **Rainbow gradients for sequential data**
- Problem: Not perceptually uniform, hard to read
- Solution: Use Viridis, Blues, or other uniform schemes
2. **Red-green for diverging (colour blindness)**
- Problem: 8% of males can't distinguish
- Solution: Use blue-orange or purple-green
3. **Too many categorical colours**
- Problem: Hard to distinguish and remember
- Solution: Limit to 5-8 categories, use grouping
4. **Insufficient contrast**
- Problem: Poor readability
- Solution: Test contrast ratios, use darker colours on light backgrounds
5. **Culturally inconsistent colours**
- Problem: Confusing semantic meaning
- Solution: Research colour associations for target audience
6. **Inverted temperature scales**
- Problem: Counterintuitive (red = cold)
- Solution: Red/orange = hot, blue = cold
## Quick reference guide
**Need to show...**
- **Categories (≤10):** `d3.schemeCategory10` or `d3.schemeTableau10`
- **Categories (>10):** `d3.schemePaired` or group categories
- **Sequential (general):** `d3.interpolateViridis`
- **Sequential (scientific):** `d3.interpolateViridis` or `d3.interpolatePlasma`
- **Sequential (temperature):** `d3.interpolateRdYlBu` (inverted)
- **Diverging (zero):** `d3.interpolateRdBu` or `d3.interpolateBrBG`
- **Diverging (good/bad):** `d3.interpolateRdYlGn` (inverted)
- **Colour-blind safe (categorical):** Okabe-Ito palette (shown above)
- **Colour-blind safe (sequential):** `d3.interpolateCividis` or `d3.interpolateBlues`
- **Colour-blind safe (diverging):** `d3.interpolatePuOr` or `d3.interpolateBrBG`
**Always remember:**
1. Test for colour-blindness
2. Ensure sufficient contrast
3. Use semantic colours appropriately
4. Add redundant encoding (patterns, labels)
5. Keep it simple (fewer colours = clearer visualisation)

View File

@@ -0,0 +1,869 @@
# D3.js Visualisation Patterns
This reference provides detailed code patterns for common d3.js visualisation types.
## Hierarchical visualisations
### Tree diagram
```javascript
useEffect(() => {
if (!data) return;
const svg = d3.select(svgRef.current);
svg.selectAll("*").remove();
const width = 800;
const height = 600;
const tree = d3.tree().size([height - 100, width - 200]);
const root = d3.hierarchy(data);
tree(root);
const g = svg.append("g")
.attr("transform", "translate(100,50)");
// Links
g.selectAll("path")
.data(root.links())
.join("path")
.attr("d", d3.linkHorizontal()
.x(d => d.y)
.y(d => d.x))
.attr("fill", "none")
.attr("stroke", "#555")
.attr("stroke-width", 2);
// Nodes
const node = g.selectAll("g")
.data(root.descendants())
.join("g")
.attr("transform", d => `translate(${d.y},${d.x})`);
node.append("circle")
.attr("r", 6)
.attr("fill", d => d.children ? "#555" : "#999");
node.append("text")
.attr("dy", "0.31em")
.attr("x", d => d.children ? -8 : 8)
.attr("text-anchor", d => d.children ? "end" : "start")
.text(d => d.data.name)
.style("font-size", "12px");
}, [data]);
```
### Treemap
```javascript
useEffect(() => {
if (!data) return;
const svg = d3.select(svgRef.current);
svg.selectAll("*").remove();
const width = 800;
const height = 600;
const root = d3.hierarchy(data)
.sum(d => d.value)
.sort((a, b) => b.value - a.value);
d3.treemap()
.size([width, height])
.padding(2)
.round(true)(root);
const colourScale = d3.scaleOrdinal(d3.schemeCategory10);
const cell = svg.selectAll("g")
.data(root.leaves())
.join("g")
.attr("transform", d => `translate(${d.x0},${d.y0})`);
cell.append("rect")
.attr("width", d => d.x1 - d.x0)
.attr("height", d => d.y1 - d.y0)
.attr("fill", d => colourScale(d.parent.data.name))
.attr("stroke", "white")
.attr("stroke-width", 2);
cell.append("text")
.attr("x", 4)
.attr("y", 16)
.text(d => d.data.name)
.style("font-size", "12px")
.style("fill", "white");
}, [data]);
```
### Sunburst diagram
```javascript
useEffect(() => {
if (!data) return;
const svg = d3.select(svgRef.current);
svg.selectAll("*").remove();
const width = 600;
const height = 600;
const radius = Math.min(width, height) / 2;
const root = d3.hierarchy(data)
.sum(d => d.value)
.sort((a, b) => b.value - a.value);
const partition = d3.partition()
.size([2 * Math.PI, radius]);
partition(root);
const arc = d3.arc()
.startAngle(d => d.x0)
.endAngle(d => d.x1)
.innerRadius(d => d.y0)
.outerRadius(d => d.y1);
const colourScale = d3.scaleOrdinal(d3.schemeCategory10);
const g = svg.append("g")
.attr("transform", `translate(${width / 2},${height / 2})`);
g.selectAll("path")
.data(root.descendants())
.join("path")
.attr("d", arc)
.attr("fill", d => colourScale(d.depth))
.attr("stroke", "white")
.attr("stroke-width", 1);
}, [data]);
```
### Chord diagram
```javascript
function drawChordDiagram(data) {
// data format: array of objects with source, target, and value
// Example: [{ source: 'A', target: 'B', value: 10 }, ...]
if (!data || data.length === 0) return;
const svg = d3.select('#chart');
svg.selectAll("*").remove();
const width = 600;
const height = 600;
const innerRadius = Math.min(width, height) * 0.3;
const outerRadius = innerRadius + 30;
// Create matrix from data
const nodes = Array.from(new Set(data.flatMap(d => [d.source, d.target])));
const matrix = Array.from({ length: nodes.length }, () => Array(nodes.length).fill(0));
data.forEach(d => {
const i = nodes.indexOf(d.source);
const j = nodes.indexOf(d.target);
matrix[i][j] += d.value;
matrix[j][i] += d.value;
});
// Create chord layout
const chord = d3.chord()
.padAngle(0.05)
.sortSubgroups(d3.descending);
const arc = d3.arc()
.innerRadius(innerRadius)
.outerRadius(outerRadius);
const ribbon = d3.ribbon()
.source(d => d.source)
.target(d => d.target);
const colourScale = d3.scaleOrdinal(d3.schemeCategory10)
.domain(nodes);
const g = svg.append("g")
.attr("transform", `translate(${width / 2},${height / 2})`);
const chords = chord(matrix);
// Draw ribbons
g.append("g")
.attr("fill-opacity", 0.67)
.selectAll("path")
.data(chords)
.join("path")
.attr("d", ribbon)
.attr("fill", d => colourScale(nodes[d.source.index]))
.attr("stroke", d => d3.rgb(colourScale(nodes[d.source.index])).darker());
// Draw groups (arcs)
const group = g.append("g")
.selectAll("g")
.data(chords.groups)
.join("g");
group.append("path")
.attr("d", arc)
.attr("fill", d => colourScale(nodes[d.index]))
.attr("stroke", d => d3.rgb(colourScale(nodes[d.index])).darker());
// Add labels
group.append("text")
.each(d => { d.angle = (d.startAngle + d.endAngle) / 2; })
.attr("dy", "0.31em")
.attr("transform", d => `rotate(${(d.angle * 180 / Math.PI) - 90})translate(${outerRadius + 30})${d.angle > Math.PI ? "rotate(180)" : ""}`)
.attr("text-anchor", d => d.angle > Math.PI ? "end" : null)
.text((d, i) => nodes[i])
.style("font-size", "12px");
}
// Data format example:
// const data = [
// { source: 'Category A', target: 'Category B', value: 100 },
// { source: 'Category A', target: 'Category C', value: 50 },
// { source: 'Category B', target: 'Category C', value: 75 }
// ];
// drawChordDiagram(data);
```
## Advanced chart types
### Heatmap
```javascript
function drawHeatmap(data) {
// data format: array of objects with row, column, and value
// Example: [{ row: 'A', column: 'X', value: 10 }, ...]
if (!data || data.length === 0) return;
const svg = d3.select('#chart');
svg.selectAll("*").remove();
const width = 800;
const height = 600;
const margin = { top: 100, right: 30, bottom: 30, left: 100 };
const innerWidth = width - margin.left - margin.right;
const innerHeight = height - margin.top - margin.bottom;
// Get unique rows and columns
const rows = Array.from(new Set(data.map(d => d.row)));
const columns = Array.from(new Set(data.map(d => d.column)));
const g = svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`);
// Create scales
const xScale = d3.scaleBand()
.domain(columns)
.range([0, innerWidth])
.padding(0.01);
const yScale = d3.scaleBand()
.domain(rows)
.range([0, innerHeight])
.padding(0.01);
// Colour scale for values (sequential from light to dark red)
const colourScale = d3.scaleSequential(d3.interpolateYlOrRd)
.domain([0, d3.max(data, d => d.value)]);
// Draw rectangles
g.selectAll("rect")
.data(data)
.join("rect")
.attr("x", d => xScale(d.column))
.attr("y", d => yScale(d.row))
.attr("width", xScale.bandwidth())
.attr("height", yScale.bandwidth())
.attr("fill", d => colourScale(d.value));
// Add x-axis labels
svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`)
.selectAll("text")
.data(columns)
.join("text")
.attr("x", d => xScale(d) + xScale.bandwidth() / 2)
.attr("y", -10)
.attr("text-anchor", "middle")
.text(d => d)
.style("font-size", "12px");
// Add y-axis labels
svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`)
.selectAll("text")
.data(rows)
.join("text")
.attr("x", -10)
.attr("y", d => yScale(d) + yScale.bandwidth() / 2)
.attr("dy", "0.35em")
.attr("text-anchor", "end")
.text(d => d)
.style("font-size", "12px");
// Add colour legend
const legendWidth = 20;
const legendHeight = 200;
const legend = svg.append("g")
.attr("transform", `translate(${width - 60},${margin.top})`);
const legendScale = d3.scaleLinear()
.domain(colourScale.domain())
.range([legendHeight, 0]);
const legendAxis = d3.axisRight(legendScale).ticks(5);
// Draw colour gradient in legend
for (let i = 0; i < legendHeight; i++) {
legend.append("rect")
.attr("y", i)
.attr("width", legendWidth)
.attr("height", 1)
.attr("fill", colourScale(legendScale.invert(i)));
}
legend.append("g")
.attr("transform", `translate(${legendWidth},0)`)
.call(legendAxis);
}
// Data format example:
// const data = [
// { row: 'Monday', column: 'Morning', value: 42 },
// { row: 'Monday', column: 'Afternoon', value: 78 },
// { row: 'Tuesday', column: 'Morning', value: 65 },
// { row: 'Tuesday', column: 'Afternoon', value: 55 }
// ];
// drawHeatmap(data);
```
### Area chart with gradient
```javascript
useEffect(() => {
if (!data || data.length === 0) return;
const svg = d3.select(svgRef.current);
svg.selectAll("*").remove();
const width = 800;
const height = 400;
const margin = { top: 20, right: 30, bottom: 40, left: 50 };
const innerWidth = width - margin.left - margin.right;
const innerHeight = height - margin.top - margin.bottom;
// Define gradient
const defs = svg.append("defs");
const gradient = defs.append("linearGradient")
.attr("id", "areaGradient")
.attr("x1", "0%")
.attr("x2", "0%")
.attr("y1", "0%")
.attr("y2", "100%");
gradient.append("stop")
.attr("offset", "0%")
.attr("stop-color", "steelblue")
.attr("stop-opacity", 0.8);
gradient.append("stop")
.attr("offset", "100%")
.attr("stop-color", "steelblue")
.attr("stop-opacity", 0.1);
const g = svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`);
const xScale = d3.scaleTime()
.domain(d3.extent(data, d => d.date))
.range([0, innerWidth]);
const yScale = d3.scaleLinear()
.domain([0, d3.max(data, d => d.value)])
.range([innerHeight, 0]);
const area = d3.area()
.x(d => xScale(d.date))
.y0(innerHeight)
.y1(d => yScale(d.value))
.curve(d3.curveMonotoneX);
g.append("path")
.datum(data)
.attr("fill", "url(#areaGradient)")
.attr("d", area);
const line = d3.line()
.x(d => xScale(d.date))
.y(d => yScale(d.value))
.curve(d3.curveMonotoneX);
g.append("path")
.datum(data)
.attr("fill", "none")
.attr("stroke", "steelblue")
.attr("stroke-width", 2)
.attr("d", line);
g.append("g")
.attr("transform", `translate(0,${innerHeight})`)
.call(d3.axisBottom(xScale));
g.append("g")
.call(d3.axisLeft(yScale));
}, [data]);
```
### Stacked bar chart
```javascript
useEffect(() => {
if (!data || data.length === 0) return;
const svg = d3.select(svgRef.current);
svg.selectAll("*").remove();
const width = 800;
const height = 400;
const margin = { top: 20, right: 30, bottom: 40, left: 50 };
const innerWidth = width - margin.left - margin.right;
const innerHeight = height - margin.top - margin.bottom;
const g = svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`);
const categories = Object.keys(data[0]).filter(k => k !== 'group');
const stackedData = d3.stack().keys(categories)(data);
const xScale = d3.scaleBand()
.domain(data.map(d => d.group))
.range([0, innerWidth])
.padding(0.1);
const yScale = d3.scaleLinear()
.domain([0, d3.max(stackedData[stackedData.length - 1], d => d[1])])
.range([innerHeight, 0]);
const colourScale = d3.scaleOrdinal(d3.schemeCategory10);
g.selectAll("g")
.data(stackedData)
.join("g")
.attr("fill", (d, i) => colourScale(i))
.selectAll("rect")
.data(d => d)
.join("rect")
.attr("x", d => xScale(d.data.group))
.attr("y", d => yScale(d[1]))
.attr("height", d => yScale(d[0]) - yScale(d[1]))
.attr("width", xScale.bandwidth());
g.append("g")
.attr("transform", `translate(0,${innerHeight})`)
.call(d3.axisBottom(xScale));
g.append("g")
.call(d3.axisLeft(yScale));
}, [data]);
```
### Grouped bar chart
```javascript
useEffect(() => {
if (!data || data.length === 0) return;
const svg = d3.select(svgRef.current);
svg.selectAll("*").remove();
const width = 800;
const height = 400;
const margin = { top: 20, right: 30, bottom: 40, left: 50 };
const innerWidth = width - margin.left - margin.right;
const innerHeight = height - margin.top - margin.bottom;
const g = svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`);
const categories = Object.keys(data[0]).filter(k => k !== 'group');
const x0Scale = d3.scaleBand()
.domain(data.map(d => d.group))
.range([0, innerWidth])
.padding(0.1);
const x1Scale = d3.scaleBand()
.domain(categories)
.range([0, x0Scale.bandwidth()])
.padding(0.05);
const yScale = d3.scaleLinear()
.domain([0, d3.max(data, d => Math.max(...categories.map(c => d[c])))])
.range([innerHeight, 0]);
const colourScale = d3.scaleOrdinal(d3.schemeCategory10);
const group = g.selectAll("g")
.data(data)
.join("g")
.attr("transform", d => `translate(${x0Scale(d.group)},0)`);
group.selectAll("rect")
.data(d => categories.map(key => ({ key, value: d[key] })))
.join("rect")
.attr("x", d => x1Scale(d.key))
.attr("y", d => yScale(d.value))
.attr("width", x1Scale.bandwidth())
.attr("height", d => innerHeight - yScale(d.value))
.attr("fill", d => colourScale(d.key));
g.append("g")
.attr("transform", `translate(0,${innerHeight})`)
.call(d3.axisBottom(x0Scale));
g.append("g")
.call(d3.axisLeft(yScale));
}, [data]);
```
### Bubble chart
```javascript
useEffect(() => {
if (!data || data.length === 0) return;
const svg = d3.select(svgRef.current);
svg.selectAll("*").remove();
const width = 800;
const height = 600;
const margin = { top: 20, right: 30, bottom: 40, left: 50 };
const innerWidth = width - margin.left - margin.right;
const innerHeight = height - margin.top - margin.bottom;
const g = svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`);
const xScale = d3.scaleLinear()
.domain([0, d3.max(data, d => d.x)])
.range([0, innerWidth]);
const yScale = d3.scaleLinear()
.domain([0, d3.max(data, d => d.y)])
.range([innerHeight, 0]);
const sizeScale = d3.scaleSqrt()
.domain([0, d3.max(data, d => d.size)])
.range([0, 50]);
const colourScale = d3.scaleOrdinal(d3.schemeCategory10);
g.selectAll("circle")
.data(data)
.join("circle")
.attr("cx", d => xScale(d.x))
.attr("cy", d => yScale(d.y))
.attr("r", d => sizeScale(d.size))
.attr("fill", d => colourScale(d.category))
.attr("opacity", 0.6)
.attr("stroke", "white")
.attr("stroke-width", 2);
g.append("g")
.attr("transform", `translate(0,${innerHeight})`)
.call(d3.axisBottom(xScale));
g.append("g")
.call(d3.axisLeft(yScale));
}, [data]);
```
## Geographic visualisations
### Basic map with points
```javascript
useEffect(() => {
if (!geoData || !pointData) return;
const svg = d3.select(svgRef.current);
svg.selectAll("*").remove();
const width = 800;
const height = 600;
const projection = d3.geoMercator()
.fitSize([width, height], geoData);
const pathGenerator = d3.geoPath().projection(projection);
// Draw map
svg.selectAll("path")
.data(geoData.features)
.join("path")
.attr("d", pathGenerator)
.attr("fill", "#e0e0e0")
.attr("stroke", "#999")
.attr("stroke-width", 0.5);
// Draw points
svg.selectAll("circle")
.data(pointData)
.join("circle")
.attr("cx", d => projection([d.longitude, d.latitude])[0])
.attr("cy", d => projection([d.longitude, d.latitude])[1])
.attr("r", 5)
.attr("fill", "steelblue")
.attr("opacity", 0.7);
}, [geoData, pointData]);
```
### Choropleth map
```javascript
useEffect(() => {
if (!geoData || !valueData) return;
const svg = d3.select(svgRef.current);
svg.selectAll("*").remove();
const width = 800;
const height = 600;
const projection = d3.geoMercator()
.fitSize([width, height], geoData);
const pathGenerator = d3.geoPath().projection(projection);
// Create value lookup
const valueLookup = new Map(valueData.map(d => [d.id, d.value]));
// Colour scale
const colourScale = d3.scaleSequential(d3.interpolateBlues)
.domain([0, d3.max(valueData, d => d.value)]);
svg.selectAll("path")
.data(geoData.features)
.join("path")
.attr("d", pathGenerator)
.attr("fill", d => {
const value = valueLookup.get(d.id);
return value ? colourScale(value) : "#e0e0e0";
})
.attr("stroke", "#999")
.attr("stroke-width", 0.5);
}, [geoData, valueData]);
```
## Advanced interactions
### Brush and zoom
```javascript
useEffect(() => {
if (!data || data.length === 0) return;
const svg = d3.select(svgRef.current);
svg.selectAll("*").remove();
const width = 800;
const height = 400;
const margin = { top: 20, right: 30, bottom: 40, left: 50 };
const innerWidth = width - margin.left - margin.right;
const innerHeight = height - margin.top - margin.bottom;
const xScale = d3.scaleLinear()
.domain([0, d3.max(data, d => d.x)])
.range([0, innerWidth]);
const yScale = d3.scaleLinear()
.domain([0, d3.max(data, d => d.y)])
.range([innerHeight, 0]);
const g = svg.append("g")
.attr("transform", `translate(${margin.left},${margin.top})`);
const circles = g.selectAll("circle")
.data(data)
.join("circle")
.attr("cx", d => xScale(d.x))
.attr("cy", d => yScale(d.y))
.attr("r", 5)
.attr("fill", "steelblue");
// Add brush
const brush = d3.brush()
.extent([[0, 0], [innerWidth, innerHeight]])
.on("start brush", (event) => {
if (!event.selection) return;
const [[x0, y0], [x1, y1]] = event.selection;
circles.attr("fill", d => {
const cx = xScale(d.x);
const cy = yScale(d.y);
return (cx >= x0 && cx <= x1 && cy >= y0 && cy <= y1)
? "orange"
: "steelblue";
});
});
g.append("g")
.attr("class", "brush")
.call(brush);
}, [data]);
```
### Linked brushing between charts
```javascript
function LinkedCharts({ data }) {
const [selectedPoints, setSelectedPoints] = useState(new Set());
const svg1Ref = useRef();
const svg2Ref = useRef();
useEffect(() => {
// Chart 1: Scatter plot
const svg1 = d3.select(svg1Ref.current);
svg1.selectAll("*").remove();
// ... create first chart ...
const circles1 = svg1.selectAll("circle")
.data(data)
.join("circle")
.attr("fill", d => selectedPoints.has(d.id) ? "orange" : "steelblue");
// Chart 2: Bar chart
const svg2 = d3.select(svg2Ref.current);
svg2.selectAll("*").remove();
// ... create second chart ...
const bars = svg2.selectAll("rect")
.data(data)
.join("rect")
.attr("fill", d => selectedPoints.has(d.id) ? "orange" : "steelblue");
// Add brush to first chart
const brush = d3.brush()
.on("start brush end", (event) => {
if (!event.selection) {
setSelectedPoints(new Set());
return;
}
const [[x0, y0], [x1, y1]] = event.selection;
const selected = new Set();
data.forEach(d => {
const x = xScale(d.x);
const y = yScale(d.y);
if (x >= x0 && x <= x1 && y >= y0 && y <= y1) {
selected.add(d.id);
}
});
setSelectedPoints(selected);
});
svg1.append("g").call(brush);
}, [data, selectedPoints]);
return (
<div>
<svg ref={svg1Ref} width="400" height="300" />
<svg ref={svg2Ref} width="400" height="300" />
</div>
);
}
```
## Animation patterns
### Enter, update, exit with transitions
```javascript
useEffect(() => {
if (!data || data.length === 0) return;
const svg = d3.select(svgRef.current);
const circles = svg.selectAll("circle")
.data(data, d => d.id); // Key function for object constancy
// EXIT: Remove old elements
circles.exit()
.transition()
.duration(500)
.attr("r", 0)
.remove();
// UPDATE: Modify existing elements
circles
.transition()
.duration(500)
.attr("cx", d => xScale(d.x))
.attr("cy", d => yScale(d.y))
.attr("fill", "steelblue");
// ENTER: Add new elements
circles.enter()
.append("circle")
.attr("cx", d => xScale(d.x))
.attr("cy", d => yScale(d.y))
.attr("r", 0)
.attr("fill", "steelblue")
.transition()
.duration(500)
.attr("r", 5);
}, [data]);
```
### Path morphing
```javascript
useEffect(() => {
if (!data1 || !data2) return;
const svg = d3.select(svgRef.current);
const line = d3.line()
.x(d => xScale(d.x))
.y(d => yScale(d.y))
.curve(d3.curveMonotoneX);
const path = svg.select("path");
// Morph from data1 to data2
path
.datum(data1)
.attr("d", line)
.transition()
.duration(1000)
.attrTween("d", function() {
const previous = d3.select(this).attr("d");
const current = line(data2);
return d3.interpolatePath(previous, current);
});
}, [data1, data2]);
```

View File

@@ -0,0 +1,509 @@
# D3.js Scale Reference
Comprehensive guide to all d3 scale types with examples and use cases.
## Continuous scales
### Linear scale
Maps continuous input domain to continuous output range with linear interpolation.
```javascript
const scale = d3.scaleLinear()
.domain([0, 100])
.range([0, 500]);
scale(50); // Returns 250
scale(0); // Returns 0
scale(100); // Returns 500
// Invert scale (get input from output)
scale.invert(250); // Returns 50
```
**Use cases:**
- Most common scale for quantitative data
- Axes, bar lengths, position encoding
- Temperature, prices, counts, measurements
**Methods:**
- `.domain([min, max])` - Set input domain
- `.range([min, max])` - Set output range
- `.invert(value)` - Get domain value from range value
- `.clamp(true)` - Restrict output to range bounds
- `.nice()` - Extend domain to nice round values
### Power scale
Maps continuous input to continuous output with exponential transformation.
```javascript
const sqrtScale = d3.scalePow()
.exponent(0.5) // Square root
.domain([0, 100])
.range([0, 500]);
const squareScale = d3.scalePow()
.exponent(2) // Square
.domain([0, 100])
.range([0, 500]);
// Shorthand for square root
const sqrtScale2 = d3.scaleSqrt()
.domain([0, 100])
.range([0, 500]);
```
**Use cases:**
- Perceptual scaling (human perception is non-linear)
- Area encoding (use square root to map values to circle radii)
- Emphasising differences in small or large values
### Logarithmic scale
Maps continuous input to continuous output with logarithmic transformation.
```javascript
const logScale = d3.scaleLog()
.domain([1, 1000]) // Must be positive
.range([0, 500]);
logScale(1); // Returns 0
logScale(10); // Returns ~167
logScale(100); // Returns ~333
logScale(1000); // Returns 500
```
**Use cases:**
- Data spanning multiple orders of magnitude
- Population, GDP, wealth distributions
- Logarithmic axes
- Exponential growth visualisations
**Important:** Domain values must be strictly positive (>0).
### Time scale
Specialised linear scale for temporal data.
```javascript
const timeScale = d3.scaleTime()
.domain([new Date(2020, 0, 1), new Date(2024, 0, 1)])
.range([0, 800]);
timeScale(new Date(2022, 0, 1)); // Returns 400
// Invert to get date
timeScale.invert(400); // Returns Date object for mid-2022
```
**Use cases:**
- Time series visualisations
- Timeline axes
- Temporal animations
- Date-based interactions
**Methods:**
- `.nice()` - Extend domain to nice time intervals
- `.ticks(count)` - Generate nicely-spaced tick values
- All linear scale methods apply
### Quantize scale
Maps continuous input to discrete output buckets.
```javascript
const quantizeScale = d3.scaleQuantize()
.domain([0, 100])
.range(['low', 'medium', 'high']);
quantizeScale(25); // Returns 'low'
quantizeScale(50); // Returns 'medium'
quantizeScale(75); // Returns 'high'
// Get the threshold values
quantizeScale.thresholds(); // Returns [33.33, 66.67]
```
**Use cases:**
- Binning continuous data
- Heat map colours
- Risk categories (low/medium/high)
- Age groups, income brackets
### Quantile scale
Maps continuous input to discrete output based on quantiles.
```javascript
const quantileScale = d3.scaleQuantile()
.domain([3, 6, 7, 8, 8, 10, 13, 15, 16, 20, 24]) // Sample data
.range(['low', 'medium', 'high']);
quantileScale(8); // Returns based on quantile position
quantileScale.quantiles(); // Returns quantile thresholds
```
**Use cases:**
- Equal-size groups regardless of distribution
- Percentile-based categorisation
- Handling skewed distributions
### Threshold scale
Maps continuous input to discrete output with custom thresholds.
```javascript
const thresholdScale = d3.scaleThreshold()
.domain([0, 10, 20])
.range(['freezing', 'cold', 'warm', 'hot']);
thresholdScale(-5); // Returns 'freezing'
thresholdScale(5); // Returns 'cold'
thresholdScale(15); // Returns 'warm'
thresholdScale(25); // Returns 'hot'
```
**Use cases:**
- Custom breakpoints
- Grade boundaries (A, B, C, D, F)
- Temperature categories
- Air quality indices
## Sequential scales
### Sequential colour scale
Maps continuous input to continuous colour gradient.
```javascript
const colourScale = d3.scaleSequential(d3.interpolateBlues)
.domain([0, 100]);
colourScale(0); // Returns lightest blue
colourScale(50); // Returns mid blue
colourScale(100); // Returns darkest blue
```
**Available interpolators:**
**Single hue:**
- `d3.interpolateBlues`, `d3.interpolateGreens`, `d3.interpolateReds`
- `d3.interpolateOranges`, `d3.interpolatePurples`, `d3.interpolateGreys`
**Multi-hue:**
- `d3.interpolateViridis`, `d3.interpolateInferno`, `d3.interpolateMagma`
- `d3.interpolatePlasma`, `d3.interpolateWarm`, `d3.interpolateCool`
- `d3.interpolateCubehelixDefault`, `d3.interpolateTurbo`
**Use cases:**
- Heat maps, choropleth maps
- Continuous data visualisation
- Temperature, elevation, density
### Diverging colour scale
Maps continuous input to diverging colour gradient with a midpoint.
```javascript
const divergingScale = d3.scaleDiverging(d3.interpolateRdBu)
.domain([-10, 0, 10]);
divergingScale(-10); // Returns red
divergingScale(0); // Returns white/neutral
divergingScale(10); // Returns blue
```
**Available interpolators:**
- `d3.interpolateRdBu` - Red to blue
- `d3.interpolateRdYlBu` - Red, yellow, blue
- `d3.interpolateRdYlGn` - Red, yellow, green
- `d3.interpolatePiYG` - Pink, yellow, green
- `d3.interpolateBrBG` - Brown, blue-green
- `d3.interpolatePRGn` - Purple, green
- `d3.interpolatePuOr` - Purple, orange
- `d3.interpolateRdGy` - Red, grey
- `d3.interpolateSpectral` - Rainbow spectrum
**Use cases:**
- Data with meaningful midpoint (zero, average, neutral)
- Positive/negative values
- Above/below comparisons
- Correlation matrices
### Sequential quantile scale
Combines sequential colour with quantile mapping.
```javascript
const sequentialQuantileScale = d3.scaleSequentialQuantile(d3.interpolateBlues)
.domain([3, 6, 7, 8, 8, 10, 13, 15, 16, 20, 24]);
// Maps based on quantile position
```
**Use cases:**
- Perceptually uniform binning
- Handling outliers
- Skewed distributions
## Ordinal scales
### Band scale
Maps discrete input to continuous bands (rectangles) with optional padding.
```javascript
const bandScale = d3.scaleBand()
.domain(['A', 'B', 'C', 'D'])
.range([0, 400])
.padding(0.1);
bandScale('A'); // Returns start position (e.g., 0)
bandScale('B'); // Returns start position (e.g., 110)
bandScale.bandwidth(); // Returns width of each band (e.g., 95)
bandScale.step(); // Returns total step including padding
bandScale.paddingInner(); // Returns inner padding (between bands)
bandScale.paddingOuter(); // Returns outer padding (at edges)
```
**Use cases:**
- Bar charts (most common use case)
- Grouped elements
- Categorical axes
- Heat map cells
**Padding options:**
- `.padding(value)` - Sets both inner and outer padding (0-1)
- `.paddingInner(value)` - Padding between bands (0-1)
- `.paddingOuter(value)` - Padding at edges (0-1)
- `.align(value)` - Alignment of bands (0-1, default 0.5)
### Point scale
Maps discrete input to continuous points (no width).
```javascript
const pointScale = d3.scalePoint()
.domain(['A', 'B', 'C', 'D'])
.range([0, 400])
.padding(0.5);
pointScale('A'); // Returns position (e.g., 50)
pointScale('B'); // Returns position (e.g., 150)
pointScale('C'); // Returns position (e.g., 250)
pointScale('D'); // Returns position (e.g., 350)
pointScale.step(); // Returns distance between points
```
**Use cases:**
- Line chart categorical x-axis
- Scatter plot with categorical axis
- Node positions in network graphs
- Any point positioning for categories
### Ordinal colour scale
Maps discrete input to discrete output (colours, shapes, etc.).
```javascript
const colourScale = d3.scaleOrdinal(d3.schemeCategory10);
colourScale('apples'); // Returns first colour
colourScale('oranges'); // Returns second colour
colourScale('apples'); // Returns same first colour (consistent)
// Custom range
const customScale = d3.scaleOrdinal()
.domain(['cat1', 'cat2', 'cat3'])
.range(['#FF6B6B', '#4ECDC4', '#45B7D1']);
```
**Built-in colour schemes:**
**Categorical:**
- `d3.schemeCategory10` - 10 colours
- `d3.schemeAccent` - 8 colours
- `d3.schemeDark2` - 8 colours
- `d3.schemePaired` - 12 colours
- `d3.schemePastel1` - 9 colours
- `d3.schemePastel2` - 8 colours
- `d3.schemeSet1` - 9 colours
- `d3.schemeSet2` - 8 colours
- `d3.schemeSet3` - 12 colours
- `d3.schemeTableau10` - 10 colours
**Use cases:**
- Category colours
- Legend items
- Multi-series charts
- Network node types
## Scale utilities
### Nice domain
Extend domain to nice round values.
```javascript
const scale = d3.scaleLinear()
.domain([0.201, 0.996])
.nice();
scale.domain(); // Returns [0.2, 1.0]
// With count (approximate tick count)
const scale2 = d3.scaleLinear()
.domain([0.201, 0.996])
.nice(5);
```
### Clamping
Restrict output to range bounds.
```javascript
const scale = d3.scaleLinear()
.domain([0, 100])
.range([0, 500])
.clamp(true);
scale(-10); // Returns 0 (clamped)
scale(150); // Returns 500 (clamped)
```
### Copy scales
Create independent copies.
```javascript
const scale1 = d3.scaleLinear()
.domain([0, 100])
.range([0, 500]);
const scale2 = scale1.copy();
// scale2 is independent of scale1
```
### Tick generation
Generate nice tick values for axes.
```javascript
const scale = d3.scaleLinear()
.domain([0, 100])
.range([0, 500]);
scale.ticks(10); // Generate ~10 ticks
scale.tickFormat(10); // Get format function for ticks
scale.tickFormat(10, ".2f"); // Custom format (2 decimal places)
// Time scale ticks
const timeScale = d3.scaleTime()
.domain([new Date(2020, 0, 1), new Date(2024, 0, 1)]);
timeScale.ticks(d3.timeYear); // Yearly ticks
timeScale.ticks(d3.timeMonth, 3); // Every 3 months
timeScale.tickFormat(5, "%Y-%m"); // Format as year-month
```
## Colour spaces and interpolation
### RGB interpolation
```javascript
const scale = d3.scaleLinear()
.domain([0, 100])
.range(["blue", "red"]);
// Default: RGB interpolation
```
### HSL interpolation
```javascript
const scale = d3.scaleLinear()
.domain([0, 100])
.range(["blue", "red"])
.interpolate(d3.interpolateHsl);
// Smoother colour transitions
```
### Lab interpolation
```javascript
const scale = d3.scaleLinear()
.domain([0, 100])
.range(["blue", "red"])
.interpolate(d3.interpolateLab);
// Perceptually uniform
```
### HCL interpolation
```javascript
const scale = d3.scaleLinear()
.domain([0, 100])
.range(["blue", "red"])
.interpolate(d3.interpolateHcl);
// Perceptually uniform with hue
```
## Common patterns
### Diverging scale with custom midpoint
```javascript
const scale = d3.scaleLinear()
.domain([min, midpoint, max])
.range(["red", "white", "blue"])
.interpolate(d3.interpolateHcl);
```
### Multi-stop gradient scale
```javascript
const scale = d3.scaleLinear()
.domain([0, 25, 50, 75, 100])
.range(["#d53e4f", "#fc8d59", "#fee08b", "#e6f598", "#66c2a5"]);
```
### Radius scale for circles (perceptual)
```javascript
const radiusScale = d3.scaleSqrt()
.domain([0, d3.max(data, d => d.value)])
.range([0, 50]);
// Use with circles
circle.attr("r", d => radiusScale(d.value));
```
### Adaptive scale based on data range
```javascript
function createAdaptiveScale(data) {
const extent = d3.extent(data);
const range = extent[1] - extent[0];
// Use log scale if data spans >2 orders of magnitude
if (extent[1] / extent[0] > 100) {
return d3.scaleLog()
.domain(extent)
.range([0, width]);
}
// Otherwise use linear
return d3.scaleLinear()
.domain(extent)
.range([0, width]);
}
```
### Colour scale with explicit categories
```javascript
const colourScale = d3.scaleOrdinal()
.domain(['Low Risk', 'Medium Risk', 'High Risk'])
.range(['#2ecc71', '#f39c12', '#e74c3c'])
.unknown('#95a5a6'); // Fallback for unknown values
```

View File

@@ -0,0 +1,263 @@
---
name: database-architect
description: Expert database architect specializing in data layer design from scratch, technology selection, schema modeling, and scalable database architectures.
risk: unknown
source: community
date_added: '2026-02-27'
---
You are a database architect specializing in designing scalable, performant, and maintainable data layers from the ground up.
## Use this skill when
- Selecting database technologies or storage patterns
- Designing schemas, partitions, or replication strategies
- Planning migrations or re-architecting data layers
## Do not use this skill when
- You only need query tuning
- You need application-level feature design only
- You cannot modify the data model or infrastructure
## Instructions
1. Capture data domain, access patterns, and scale targets.
2. Choose the database model and architecture pattern.
3. Design schemas, indexes, and lifecycle policies.
4. Plan migration, backup, and rollout strategies.
## Safety
- Avoid destructive changes without backups and rollbacks.
- Validate migration plans in staging before production.
## Purpose
Expert database architect with comprehensive knowledge of data modeling, technology selection, and scalable database design. Masters both greenfield architecture and re-architecture of existing systems. Specializes in choosing the right database technology, designing optimal schemas, planning migrations, and building performance-first data architectures that scale with application growth.
## Core Philosophy
Design the data layer right from the start to avoid costly rework. Focus on choosing the right technology, modeling data correctly, and planning for scale from day one. Build architectures that are both performant today and adaptable for tomorrow's requirements.
## Capabilities
### Technology Selection & Evaluation
- **Relational databases**: PostgreSQL, MySQL, MariaDB, SQL Server, Oracle
- **NoSQL databases**: MongoDB, DynamoDB, Cassandra, CouchDB, Redis, Couchbase
- **Time-series databases**: TimescaleDB, InfluxDB, ClickHouse, QuestDB
- **NewSQL databases**: CockroachDB, TiDB, Google Spanner, YugabyteDB
- **Graph databases**: Neo4j, Amazon Neptune, ArangoDB
- **Search engines**: Elasticsearch, OpenSearch, Meilisearch, Typesense
- **Document stores**: MongoDB, Firestore, RavenDB, DocumentDB
- **Key-value stores**: Redis, DynamoDB, etcd, Memcached
- **Wide-column stores**: Cassandra, HBase, ScyllaDB, Bigtable
- **Multi-model databases**: ArangoDB, OrientDB, FaunaDB, CosmosDB
- **Decision frameworks**: Consistency vs availability trade-offs, CAP theorem implications
- **Technology assessment**: Performance characteristics, operational complexity, cost implications
- **Hybrid architectures**: Polyglot persistence, multi-database strategies, data synchronization
### Data Modeling & Schema Design
- **Conceptual modeling**: Entity-relationship diagrams, domain modeling, business requirement mapping
- **Logical modeling**: Normalization (1NF-5NF), denormalization strategies, dimensional modeling
- **Physical modeling**: Storage optimization, data type selection, partitioning strategies
- **Relational design**: Table relationships, foreign keys, constraints, referential integrity
- **NoSQL design patterns**: Document embedding vs referencing, data duplication strategies
- **Schema evolution**: Versioning strategies, backward/forward compatibility, migration patterns
- **Data integrity**: Constraints, triggers, check constraints, application-level validation
- **Temporal data**: Slowly changing dimensions, event sourcing, audit trails, time-travel queries
- **Hierarchical data**: Adjacency lists, nested sets, materialized paths, closure tables
- **JSON/semi-structured**: JSONB indexes, schema-on-read vs schema-on-write
- **Multi-tenancy**: Shared schema, database per tenant, schema per tenant trade-offs
- **Data archival**: Historical data strategies, cold storage, compliance requirements
### Normalization vs Denormalization
- **Normalization benefits**: Data consistency, update efficiency, storage optimization
- **Denormalization strategies**: Read performance optimization, reduced JOIN complexity
- **Trade-off analysis**: Write vs read patterns, consistency requirements, query complexity
- **Hybrid approaches**: Selective denormalization, materialized views, derived columns
- **OLTP vs OLAP**: Transaction processing vs analytical workload optimization
- **Aggregate patterns**: Pre-computed aggregations, incremental updates, refresh strategies
- **Dimensional modeling**: Star schema, snowflake schema, fact and dimension tables
### Indexing Strategy & Design
- **Index types**: B-tree, Hash, GiST, GIN, BRIN, bitmap, spatial indexes
- **Composite indexes**: Column ordering, covering indexes, index-only scans
- **Partial indexes**: Filtered indexes, conditional indexing, storage optimization
- **Full-text search**: Text search indexes, ranking strategies, language-specific optimization
- **JSON indexing**: JSONB GIN indexes, expression indexes, path-based indexes
- **Unique constraints**: Primary keys, unique indexes, compound uniqueness
- **Index planning**: Query pattern analysis, index selectivity, cardinality considerations
- **Index maintenance**: Bloat management, statistics updates, rebuild strategies
- **Cloud-specific**: Aurora indexing, Azure SQL intelligent indexing, managed index recommendations
- **NoSQL indexing**: MongoDB compound indexes, DynamoDB secondary indexes (GSI/LSI)
### Query Design & Optimization
- **Query patterns**: Read-heavy, write-heavy, analytical, transactional patterns
- **JOIN strategies**: INNER, LEFT, RIGHT, FULL joins, cross joins, semi/anti joins
- **Subquery optimization**: Correlated subqueries, derived tables, CTEs, materialization
- **Window functions**: Ranking, running totals, moving averages, partition-based analysis
- **Aggregation patterns**: GROUP BY optimization, HAVING clauses, cube/rollup operations
- **Query hints**: Optimizer hints, index hints, join hints (when appropriate)
- **Prepared statements**: Parameterized queries, plan caching, SQL injection prevention
- **Batch operations**: Bulk inserts, batch updates, upsert patterns, merge operations
### Caching Architecture
- **Cache layers**: Application cache, query cache, object cache, result cache
- **Cache technologies**: Redis, Memcached, Varnish, application-level caching
- **Cache strategies**: Cache-aside, write-through, write-behind, refresh-ahead
- **Cache invalidation**: TTL strategies, event-driven invalidation, cache stampede prevention
- **Distributed caching**: Redis Cluster, cache partitioning, cache consistency
- **Materialized views**: Database-level caching, incremental refresh, full refresh strategies
- **CDN integration**: Edge caching, API response caching, static asset caching
- **Cache warming**: Preloading strategies, background refresh, predictive caching
### Scalability & Performance Design
- **Vertical scaling**: Resource optimization, instance sizing, performance tuning
- **Horizontal scaling**: Read replicas, load balancing, connection pooling
- **Partitioning strategies**: Range, hash, list, composite partitioning
- **Sharding design**: Shard key selection, resharding strategies, cross-shard queries
- **Replication patterns**: Master-slave, master-master, multi-region replication
- **Consistency models**: Strong consistency, eventual consistency, causal consistency
- **Connection pooling**: Pool sizing, connection lifecycle, timeout configuration
- **Load distribution**: Read/write splitting, geographic distribution, workload isolation
- **Storage optimization**: Compression, columnar storage, tiered storage
- **Capacity planning**: Growth projections, resource forecasting, performance baselines
### Migration Planning & Strategy
- **Migration approaches**: Big bang, trickle, parallel run, strangler pattern
- **Zero-downtime migrations**: Online schema changes, rolling deployments, blue-green databases
- **Data migration**: ETL pipelines, data validation, consistency checks, rollback procedures
- **Schema versioning**: Migration tools (Flyway, Liquibase, Alembic, Prisma), version control
- **Rollback planning**: Backup strategies, data snapshots, recovery procedures
- **Cross-database migration**: SQL to NoSQL, database engine switching, cloud migration
- **Large table migrations**: Chunked migrations, incremental approaches, downtime minimization
- **Testing strategies**: Migration testing, data integrity validation, performance testing
- **Cutover planning**: Timing, coordination, rollback triggers, success criteria
### Transaction Design & Consistency
- **ACID properties**: Atomicity, consistency, isolation, durability requirements
- **Isolation levels**: Read uncommitted, read committed, repeatable read, serializable
- **Transaction patterns**: Unit of work, optimistic locking, pessimistic locking
- **Distributed transactions**: Two-phase commit, saga patterns, compensating transactions
- **Eventual consistency**: BASE properties, conflict resolution, version vectors
- **Concurrency control**: Lock management, deadlock prevention, timeout strategies
- **Idempotency**: Idempotent operations, retry safety, deduplication strategies
- **Event sourcing**: Event store design, event replay, snapshot strategies
### Security & Compliance
- **Access control**: Role-based access (RBAC), row-level security, column-level security
- **Encryption**: At-rest encryption, in-transit encryption, key management
- **Data masking**: Dynamic data masking, anonymization, pseudonymization
- **Audit logging**: Change tracking, access logging, compliance reporting
- **Compliance patterns**: GDPR, HIPAA, PCI-DSS, SOC2 compliance architecture
- **Data retention**: Retention policies, automated cleanup, legal holds
- **Sensitive data**: PII handling, tokenization, secure storage patterns
- **Backup security**: Encrypted backups, secure storage, access controls
### Cloud Database Architecture
- **AWS databases**: RDS, Aurora, DynamoDB, DocumentDB, Neptune, Timestream
- **Azure databases**: SQL Database, Cosmos DB, Database for PostgreSQL/MySQL, Synapse
- **GCP databases**: Cloud SQL, Cloud Spanner, Firestore, Bigtable, BigQuery
- **Serverless databases**: Aurora Serverless, Azure SQL Serverless, FaunaDB
- **Database-as-a-Service**: Managed benefits, operational overhead reduction, cost implications
- **Cloud-native features**: Auto-scaling, automated backups, point-in-time recovery
- **Multi-region design**: Global distribution, cross-region replication, latency optimization
- **Hybrid cloud**: On-premises integration, private cloud, data sovereignty
### ORM & Framework Integration
- **ORM selection**: Django ORM, SQLAlchemy, Prisma, TypeORM, Entity Framework, ActiveRecord
- **Schema-first vs Code-first**: Migration generation, type safety, developer experience
- **Migration tools**: Prisma Migrate, Alembic, Flyway, Liquibase, Laravel Migrations
- **Query builders**: Type-safe queries, dynamic query construction, performance implications
- **Connection management**: Pooling configuration, transaction handling, session management
- **Performance patterns**: Eager loading, lazy loading, batch fetching, N+1 prevention
- **Type safety**: Schema validation, runtime checks, compile-time safety
### Monitoring & Observability
- **Performance metrics**: Query latency, throughput, connection counts, cache hit rates
- **Monitoring tools**: CloudWatch, DataDog, New Relic, Prometheus, Grafana
- **Query analysis**: Slow query logs, execution plans, query profiling
- **Capacity monitoring**: Storage growth, CPU/memory utilization, I/O patterns
- **Alert strategies**: Threshold-based alerts, anomaly detection, SLA monitoring
- **Performance baselines**: Historical trends, regression detection, capacity planning
### Disaster Recovery & High Availability
- **Backup strategies**: Full, incremental, differential backups, backup rotation
- **Point-in-time recovery**: Transaction log backups, continuous archiving, recovery procedures
- **High availability**: Active-passive, active-active, automatic failover
- **RPO/RTO planning**: Recovery point objectives, recovery time objectives, testing procedures
- **Multi-region**: Geographic distribution, disaster recovery regions, failover automation
- **Data durability**: Replication factor, synchronous vs asynchronous replication
## Behavioral Traits
- Starts with understanding business requirements and access patterns before choosing technology
- Designs for both current needs and anticipated future scale
- Recommends schemas and architecture (doesn't modify files unless explicitly requested)
- Plans migrations thoroughly (doesn't execute unless explicitly requested)
- Generates ERD diagrams only when requested
- Considers operational complexity alongside performance requirements
- Values simplicity and maintainability over premature optimization
- Documents architectural decisions with clear rationale and trade-offs
- Designs with failure modes and edge cases in mind
- Balances normalization principles with real-world performance needs
- Considers the entire application architecture when designing data layer
- Emphasizes testability and migration safety in design decisions
## Workflow Position
- **Before**: backend-architect (data layer informs API design)
- **Complements**: database-admin (operations), database-optimizer (performance tuning), performance-engineer (system-wide optimization)
- **Enables**: Backend services can be built on solid data foundation
## Knowledge Base
- Relational database theory and normalization principles
- NoSQL database patterns and consistency models
- Time-series and analytical database optimization
- Cloud database services and their specific features
- Migration strategies and zero-downtime deployment patterns
- ORM frameworks and code-first vs database-first approaches
- Scalability patterns and distributed system design
- Security and compliance requirements for data systems
- Modern development workflows and CI/CD integration
## Response Approach
1. **Understand requirements**: Business domain, access patterns, scale expectations, consistency needs
2. **Recommend technology**: Database selection with clear rationale and trade-offs
3. **Design schema**: Conceptual, logical, and physical models with normalization considerations
4. **Plan indexing**: Index strategy based on query patterns and access frequency
5. **Design caching**: Multi-tier caching architecture for performance optimization
6. **Plan scalability**: Partitioning, sharding, replication strategies for growth
7. **Migration strategy**: Version-controlled, zero-downtime migration approach (recommend only)
8. **Document decisions**: Clear rationale, trade-offs, alternatives considered
9. **Generate diagrams**: ERD diagrams when requested using Mermaid
10. **Consider integration**: ORM selection, framework compatibility, developer experience
## Example Interactions
- "Design a database schema for a multi-tenant SaaS e-commerce platform"
- "Help me choose between PostgreSQL and MongoDB for a real-time analytics dashboard"
- "Create a migration strategy to move from MySQL to PostgreSQL with zero downtime"
- "Design a time-series database architecture for IoT sensor data at 1M events/second"
- "Re-architect our monolithic database into a microservices data architecture"
- "Plan a sharding strategy for a social media platform expecting 100M users"
- "Design a CQRS event-sourced architecture for an order management system"
- "Create an ERD for a healthcare appointment booking system" (generates Mermaid diagram)
- "Optimize schema design for a read-heavy content management system"
- "Design a multi-region database architecture with strong consistency guarantees"
- "Plan migration from denormalized NoSQL to normalized relational schema"
- "Create a database architecture for GDPR-compliant user data storage"
## Key Distinctions
- **vs database-optimizer**: Focuses on architecture and design (greenfield/re-architecture) rather than tuning existing systems
- **vs database-admin**: Focuses on design decisions rather than operations and maintenance
- **vs backend-architect**: Focuses specifically on data layer architecture before backend services are designed
- **vs performance-engineer**: Focuses on data architecture design rather than system-wide performance optimization
## Output Examples
When designing architecture, provide:
- Technology recommendation with selection rationale
- Schema design with tables/collections, relationships, constraints
- Index strategy with specific indexes and rationale
- Caching architecture with layers and invalidation strategy
- Migration plan with phases and rollback procedures
- Scaling strategy with growth projections
- ERD diagrams (when requested) using Mermaid syntax
- Code examples for ORM integration and migration scripts
- Monitoring and alerting recommendations
- Documentation of trade-offs and alternative approaches considered

View File

@@ -0,0 +1,119 @@
# Postgres Best Practices - Contributor Guide
This repository contains Postgres performance optimization rules optimized for
AI agents and LLMs.
## Quick Start
```bash
# Install dependencies
cd packages/postgres-best-practices-build
npm install
# Validate existing rules
npm run validate
# Build AGENTS.md
npm run build
```
## Creating a New Rule
1. **Choose a section prefix** based on the category:
- `query-` Query Performance (CRITICAL)
- `conn-` Connection Management (CRITICAL)
- `security-` Security & RLS (CRITICAL)
- `schema-` Schema Design (HIGH)
- `lock-` Concurrency & Locking (MEDIUM-HIGH)
- `data-` Data Access Patterns (MEDIUM)
- `monitor-` Monitoring & Diagnostics (LOW-MEDIUM)
- `advanced-` Advanced Features (LOW)
2. **Copy the template**:
```bash
cp rules/_template.md rules/query-your-rule-name.md
```
3. **Fill in the content** following the template structure
4. **Validate and build**:
```bash
npm run validate
npm run build
```
5. **Review** the generated `AGENTS.md`
## Repository Structure
```
skills/postgres-best-practices/
├── SKILL.md # Agent-facing skill manifest
├── AGENTS.md # [GENERATED] Compiled rules document
├── README.md # This file
├── metadata.json # Version and metadata
└── rules/
├── _template.md # Rule template
├── _sections.md # Section definitions
├── _contributing.md # Writing guidelines
└── *.md # Individual rules
packages/postgres-best-practices-build/
├── src/ # Build system source
├── package.json # NPM scripts
└── test-cases.json # [GENERATED] Test artifacts
```
## Rule File Structure
See `rules/_template.md` for the complete template. Key elements:
````markdown
---
title: Clear, Action-Oriented Title
impact: CRITICAL|HIGH|MEDIUM-HIGH|MEDIUM|LOW-MEDIUM|LOW
impactDescription: Quantified benefit (e.g., "10-100x faster")
tags: relevant, keywords
---
## [Title]
[1-2 sentence explanation]
**Incorrect (description):**
```sql
-- Comment explaining what's wrong
[Bad SQL example]
```
````
**Correct (description):**
```sql
-- Comment explaining why this is better
[Good SQL example]
```
```
## Writing Guidelines
See `rules/_contributing.md` for detailed guidelines. Key principles:
1. **Show concrete transformations** - "Change X to Y", not abstract advice
2. **Error-first structure** - Show the problem before the solution
3. **Quantify impact** - Include specific metrics (10x faster, 50% smaller)
4. **Self-contained examples** - Complete, runnable SQL
5. **Semantic naming** - Use meaningful names (users, email), not (table1, col1)
## Impact Levels
| Level | Improvement | Examples |
|-------|-------------|----------|
| CRITICAL | 10-100x | Missing indexes, connection exhaustion |
| HIGH | 5-20x | Wrong index types, poor partitioning |
| MEDIUM-HIGH | 2-5x | N+1 queries, RLS optimization |
| MEDIUM | 1.5-3x | Redundant indexes, stale statistics |
| LOW-MEDIUM | 1.2-2x | VACUUM tuning, config tweaks |
| LOW | Incremental | Advanced patterns, edge cases |
```

View File

@@ -0,0 +1,58 @@
---
name: postgres-best-practices
description: "Postgres performance optimization and best practices from Supabase. Use this skill when writing, reviewing, or optimizing Postgres queries, schema designs, or database configurations."
risk: unknown
source: community
date_added: "2026-02-27"
---
# Supabase Postgres Best Practices
Comprehensive performance optimization guide for Postgres, maintained by Supabase. Contains rules across 8 categories, prioritized by impact to guide automated query optimization and schema design.
## When to Use
Reference these guidelines when:
- Writing SQL queries or designing schemas
- Implementing indexes or query optimization
- Reviewing database performance issues
- Configuring connection pooling or scaling
- Optimizing for Postgres-specific features
- Working with Row-Level Security (RLS)
## Rule Categories by Priority
| Priority | Category | Impact | Prefix |
|----------|----------|--------|--------|
| 1 | Query Performance | CRITICAL | `query-` |
| 2 | Connection Management | CRITICAL | `conn-` |
| 3 | Security & RLS | CRITICAL | `security-` |
| 4 | Schema Design | HIGH | `schema-` |
| 5 | Concurrency & Locking | MEDIUM-HIGH | `lock-` |
| 6 | Data Access Patterns | MEDIUM | `data-` |
| 7 | Monitoring & Diagnostics | LOW-MEDIUM | `monitor-` |
| 8 | Advanced Features | LOW | `advanced-` |
## How to Use
Read individual rule files for detailed explanations and SQL examples:
```
rules/query-missing-indexes.md
rules/schema-partial-indexes.md
rules/_sections.md
```
Each rule file contains:
- Brief explanation of why it matters
- Incorrect SQL example with explanation
- Correct SQL example with explanation
- Optional EXPLAIN output or metrics
- Additional context and references
- Supabase-specific notes (when applicable)
## Full Compiled Document
For the complete guide with all rules expanded: `AGENTS.md`
## When to Use
This skill is applicable to execute the workflow or actions described in the overview.

View File

@@ -0,0 +1,13 @@
{
"version": "1.0.0",
"organization": "Supabase",
"date": "January 2026",
"abstract": "Comprehensive Postgres performance optimization guide for developers using Supabase and Postgres. Contains performance rules across 8 categories, prioritized by impact from critical (query performance, connection management) to incremental (advanced features). Each rule includes detailed explanations, incorrect vs. correct SQL examples, query plan analysis, and specific performance metrics to guide automated optimization and code generation.",
"references": [
"https://www.postgresql.org/docs/current/",
"https://supabase.com/docs",
"https://wiki.postgresql.org/wiki/Performance_Optimization",
"https://supabase.com/docs/guides/database/overview",
"https://supabase.com/docs/guides/auth/row-level-security"
]
}

View File

@@ -0,0 +1,171 @@
# Writing Guidelines for Postgres Rules
This document provides guidelines for creating effective Postgres best
practice rules that work well with AI agents and LLMs.
## Key Principles
### 1. Concrete Transformation Patterns
Show exact SQL rewrites. Avoid philosophical advice.
**Good:** "Use `WHERE id = ANY(ARRAY[...])` instead of
`WHERE id IN (SELECT ...)`" **Bad:** "Design good schemas"
### 2. Error-First Structure
Always show the problematic pattern first, then the solution. This trains agents
to recognize anti-patterns.
```markdown
**Incorrect (sequential queries):** [bad example]
**Correct (batched query):** [good example]
```
### 3. Quantified Impact
Include specific metrics. Helps agents prioritize fixes.
**Good:** "10x faster queries", "50% smaller index", "Eliminates N+1"
**Bad:** "Faster", "Better", "More efficient"
### 4. Self-Contained Examples
Examples should be complete and runnable (or close to it). Include `CREATE TABLE`
if context is needed.
```sql
-- Include table definition when needed for clarity
CREATE TABLE users (
id bigint PRIMARY KEY,
email text NOT NULL,
deleted_at timestamptz
);
-- Now show the index
CREATE INDEX users_active_email_idx ON users(email) WHERE deleted_at IS NULL;
```
### 5. Semantic Naming
Use meaningful table/column names. Names carry intent for LLMs.
**Good:** `users`, `email`, `created_at`, `is_active`
**Bad:** `table1`, `col1`, `field`, `flag`
---
## Code Example Standards
### SQL Formatting
```sql
-- Use lowercase keywords, clear formatting
CREATE INDEX CONCURRENTLY users_email_idx
ON users(email)
WHERE deleted_at IS NULL;
-- Not cramped or ALL CAPS
CREATE INDEX CONCURRENTLY USERS_EMAIL_IDX ON USERS(EMAIL) WHERE DELETED_AT IS NULL;
```
### Comments
- Explain _why_, not _what_
- Highlight performance implications
- Point out common pitfalls
### Language Tags
- `sql` - Standard SQL queries
- `plpgsql` - Stored procedures/functions
- `typescript` - Application code (when needed)
- `python` - Application code (when needed)
---
## When to Include Application Code
**Default: SQL Only**
Most rules should focus on pure SQL patterns. This keeps examples portable.
**Include Application Code When:**
- Connection pooling configuration
- Transaction management in application context
- ORM anti-patterns (N+1 in Prisma/TypeORM)
- Prepared statement usage
**Format for Mixed Examples:**
````markdown
**Incorrect (N+1 in application):**
```typescript
for (const user of users) {
const posts = await db.query("SELECT * FROM posts WHERE user_id = $1", [
user.id,
]);
}
```
````
**Correct (batch query):**
```typescript
const posts = await db.query("SELECT * FROM posts WHERE user_id = ANY($1)", [
userIds,
]);
```
---
## Impact Level Guidelines
| Level | Improvement | Use When |
|-------|-------------|----------|
| **CRITICAL** | 10-100x | Missing indexes, connection exhaustion, sequential scans on large tables |
| **HIGH** | 5-20x | Wrong index types, poor partitioning, missing covering indexes |
| **MEDIUM-HIGH** | 2-5x | N+1 queries, inefficient pagination, RLS optimization |
| **MEDIUM** | 1.5-3x | Redundant indexes, query plan instability |
| **LOW-MEDIUM** | 1.2-2x | VACUUM tuning, configuration tweaks |
| **LOW** | Incremental | Advanced patterns, edge cases |
---
## Reference Standards
**Primary Sources:**
- Official Postgres documentation
- Supabase documentation
- Postgres wiki
- Established blogs (2ndQuadrant, Crunchy Data)
**Format:**
```markdown
Reference:
[Postgres Indexes](https://www.postgresql.org/docs/current/indexes.html)
```
---
## Review Checklist
Before submitting a rule:
- [ ] Title is clear and action-oriented
- [ ] Impact level matches the performance gain
- [ ] impactDescription includes quantification
- [ ] Explanation is concise (1-2 sentences)
- [ ] Has at least 1 **Incorrect** SQL example
- [ ] Has at least 1 **Correct** SQL example
- [ ] SQL uses semantic naming
- [ ] Comments explain _why_, not _what_
- [ ] Trade-offs mentioned if applicable
- [ ] Reference links included
- [ ] `npm run validate` passes
- [ ] `npm run build` generates correct output

View File

@@ -0,0 +1,39 @@
# Section Definitions
This file defines the rule categories for Postgres best practices. Rules are automatically assigned to sections based on their filename prefix.
Take the examples below as pure demonstrative. Replace each section with the actual rule categories for Postgres best practices.
---
## 1. Query Performance (query)
**Impact:** CRITICAL
**Description:** Slow queries, missing indexes, inefficient query plans. The most common source of Postgres performance issues.
## 2. Connection Management (conn)
**Impact:** CRITICAL
**Description:** Connection pooling, limits, and serverless strategies. Critical for applications with high concurrency or serverless deployments.
## 3. Security & RLS (security)
**Impact:** CRITICAL
**Description:** Row-Level Security policies, privilege management, and authentication patterns.
## 4. Schema Design (schema)
**Impact:** HIGH
**Description:** Table design, index strategies, partitioning, and data type selection. Foundation for long-term performance.
## 5. Concurrency & Locking (lock)
**Impact:** MEDIUM-HIGH
**Description:** Transaction management, isolation levels, deadlock prevention, and lock contention patterns.
## 6. Data Access Patterns (data)
**Impact:** MEDIUM
**Description:** N+1 query elimination, batch operations, cursor-based pagination, and efficient data fetching.
## 7. Monitoring & Diagnostics (monitor)
**Impact:** LOW-MEDIUM
**Description:** Using pg_stat_statements, EXPLAIN ANALYZE, metrics collection, and performance diagnostics.
## 8. Advanced Features (advanced)
**Impact:** LOW
**Description:** Full-text search, JSONB optimization, PostGIS, extensions, and advanced Postgres features.

View File

@@ -0,0 +1,34 @@
---
title: Clear, Action-Oriented Title (e.g., "Use Partial Indexes for Filtered Queries")
impact: MEDIUM
impactDescription: 5-20x query speedup for filtered queries
tags: indexes, query-optimization, performance
---
## [Rule Title]
[1-2 sentence explanation of the problem and why it matters. Focus on performance impact.]
**Incorrect (describe the problem):**
```sql
-- Comment explaining what makes this slow/problematic
CREATE INDEX users_email_idx ON users(email);
SELECT * FROM users WHERE email = 'user@example.com' AND deleted_at IS NULL;
-- This scans deleted records unnecessarily
```
**Correct (describe the solution):**
```sql
-- Comment explaining why this is better
CREATE INDEX users_active_email_idx ON users(email) WHERE deleted_at IS NULL;
SELECT * FROM users WHERE email = 'user@example.com' AND deleted_at IS NULL;
-- Only indexes active users, 10x smaller index, faster queries
```
[Optional: Additional context, edge cases, or trade-offs]
Reference: [Postgres Docs](https://www.postgresql.org/docs/current/)

View File

@@ -0,0 +1,55 @@
---
title: Use tsvector for Full-Text Search
impact: MEDIUM
impactDescription: 100x faster than LIKE, with ranking support
tags: full-text-search, tsvector, gin, search
---
## Use tsvector for Full-Text Search
LIKE with wildcards can't use indexes. Full-text search with tsvector is orders of magnitude faster.
**Incorrect (LIKE pattern matching):**
```sql
-- Cannot use index, scans all rows
select * from articles where content like '%postgresql%';
-- Case-insensitive makes it worse
select * from articles where lower(content) like '%postgresql%';
```
**Correct (full-text search with tsvector):**
```sql
-- Add tsvector column and index
alter table articles add column search_vector tsvector
generated always as (to_tsvector('english', coalesce(title,'') || ' ' || coalesce(content,''))) stored;
create index articles_search_idx on articles using gin (search_vector);
-- Fast full-text search
select * from articles
where search_vector @@ to_tsquery('english', 'postgresql & performance');
-- With ranking
select *, ts_rank(search_vector, query) as rank
from articles, to_tsquery('english', 'postgresql') query
where search_vector @@ query
order by rank desc;
```
Search multiple terms:
```sql
-- AND: both terms required
to_tsquery('postgresql & performance')
-- OR: either term
to_tsquery('postgresql | mysql')
-- Prefix matching
to_tsquery('post:*')
```
Reference: [Full Text Search](https://supabase.com/docs/guides/database/full-text-search)

View File

@@ -0,0 +1,49 @@
---
title: Index JSONB Columns for Efficient Querying
impact: MEDIUM
impactDescription: 10-100x faster JSONB queries with proper indexing
tags: jsonb, gin, indexes, json
---
## Index JSONB Columns for Efficient Querying
JSONB queries without indexes scan the entire table. Use GIN indexes for containment queries.
**Incorrect (no index on JSONB):**
```sql
create table products (
id bigint primary key,
attributes jsonb
);
-- Full table scan for every query
select * from products where attributes @> '{"color": "red"}';
select * from products where attributes->>'brand' = 'Nike';
```
**Correct (GIN index for JSONB):**
```sql
-- GIN index for containment operators (@>, ?, ?&, ?|)
create index products_attrs_gin on products using gin (attributes);
-- Now containment queries use the index
select * from products where attributes @> '{"color": "red"}';
-- For specific key lookups, use expression index
create index products_brand_idx on products ((attributes->>'brand'));
select * from products where attributes->>'brand' = 'Nike';
```
Choose the right operator class:
```sql
-- jsonb_ops (default): supports all operators, larger index
create index idx1 on products using gin (attributes);
-- jsonb_path_ops: only @> operator, but 2-3x smaller index
create index idx2 on products using gin (attributes jsonb_path_ops);
```
Reference: [JSONB Indexes](https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING)

View File

@@ -0,0 +1,46 @@
---
title: Configure Idle Connection Timeouts
impact: HIGH
impactDescription: Reclaim 30-50% of connection slots from idle clients
tags: connections, timeout, idle, resource-management
---
## Configure Idle Connection Timeouts
Idle connections waste resources. Configure timeouts to automatically reclaim them.
**Incorrect (connections held indefinitely):**
```sql
-- No timeout configured
show idle_in_transaction_session_timeout; -- 0 (disabled)
-- Connections stay open forever, even when idle
select pid, state, state_change, query
from pg_stat_activity
where state = 'idle in transaction';
-- Shows transactions idle for hours, holding locks
```
**Correct (automatic cleanup of idle connections):**
```sql
-- Terminate connections idle in transaction after 30 seconds
alter system set idle_in_transaction_session_timeout = '30s';
-- Terminate completely idle connections after 10 minutes
alter system set idle_session_timeout = '10min';
-- Reload configuration
select pg_reload_conf();
```
For pooled connections, configure at the pooler level:
```ini
# pgbouncer.ini
server_idle_timeout = 60
client_idle_timeout = 300
```
Reference: [Connection Timeouts](https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-IDLE-IN-TRANSACTION-SESSION-TIMEOUT)

View File

@@ -0,0 +1,44 @@
---
title: Set Appropriate Connection Limits
impact: CRITICAL
impactDescription: Prevent database crashes and memory exhaustion
tags: connections, max-connections, limits, stability
---
## Set Appropriate Connection Limits
Too many connections exhaust memory and degrade performance. Set limits based on available resources.
**Incorrect (unlimited or excessive connections):**
```sql
-- Default max_connections = 100, but often increased blindly
show max_connections; -- 500 (way too high for 4GB RAM)
-- Each connection uses 1-3MB RAM
-- 500 connections * 2MB = 1GB just for connections!
-- Out of memory errors under load
```
**Correct (calculate based on resources):**
```sql
-- Formula: max_connections = (RAM in MB / 5MB per connection) - reserved
-- For 4GB RAM: (4096 / 5) - 10 = ~800 theoretical max
-- But practically, 100-200 is better for query performance
-- Recommended settings for 4GB RAM
alter system set max_connections = 100;
-- Also set work_mem appropriately
-- work_mem * max_connections should not exceed 25% of RAM
alter system set work_mem = '8MB'; -- 8MB * 100 = 800MB max
```
Monitor connection usage:
```sql
select count(*), state from pg_stat_activity group by state;
```
Reference: [Database Connections](https://supabase.com/docs/guides/platform/performance#connection-management)

View File

@@ -0,0 +1,41 @@
---
title: Use Connection Pooling for All Applications
impact: CRITICAL
impactDescription: Handle 10-100x more concurrent users
tags: connection-pooling, pgbouncer, performance, scalability
---
## Use Connection Pooling for All Applications
Postgres connections are expensive (1-3MB RAM each). Without pooling, applications exhaust connections under load.
**Incorrect (new connection per request):**
```sql
-- Each request creates a new connection
-- Application code: db.connect() per request
-- Result: 500 concurrent users = 500 connections = crashed database
-- Check current connections
select count(*) from pg_stat_activity; -- 487 connections!
```
**Correct (connection pooling):**
```sql
-- Use a pooler like PgBouncer between app and database
-- Application connects to pooler, pooler reuses a small pool to Postgres
-- Configure pool_size based on: (CPU cores * 2) + spindle_count
-- Example for 4 cores: pool_size = 10
-- Result: 500 concurrent users share 10 actual connections
select count(*) from pg_stat_activity; -- 10 connections
```
Pool modes:
- **Transaction mode**: connection returned after each transaction (best for most apps)
- **Session mode**: connection held for entire session (needed for prepared statements, temp tables)
Reference: [Connection Pooling](https://supabase.com/docs/guides/database/connecting-to-postgres#connection-pooler)

View File

@@ -0,0 +1,46 @@
---
title: Use Prepared Statements Correctly with Pooling
impact: HIGH
impactDescription: Avoid prepared statement conflicts in pooled environments
tags: prepared-statements, connection-pooling, transaction-mode
---
## Use Prepared Statements Correctly with Pooling
Prepared statements are tied to individual database connections. In transaction-mode pooling, connections are shared, causing conflicts.
**Incorrect (named prepared statements with transaction pooling):**
```sql
-- Named prepared statement
prepare get_user as select * from users where id = $1;
-- In transaction mode pooling, next request may get different connection
execute get_user(123);
-- ERROR: prepared statement "get_user" does not exist
```
**Correct (use unnamed statements or session mode):**
```sql
-- Option 1: Use unnamed prepared statements (most ORMs do this automatically)
-- The query is prepared and executed in a single protocol message
-- Option 2: Deallocate after use in transaction mode
prepare get_user as select * from users where id = $1;
execute get_user(123);
deallocate get_user;
-- Option 3: Use session mode pooling (port 5432 vs 6543)
-- Connection is held for entire session, prepared statements persist
```
Check your driver settings:
```sql
-- Many drivers use prepared statements by default
-- Node.js pg: { prepare: false } to disable
-- JDBC: prepareThreshold=0 to disable
```
Reference: [Prepared Statements with Pooling](https://supabase.com/docs/guides/database/connecting-to-postgres#connection-pool-modes)

View File

@@ -0,0 +1,54 @@
---
title: Batch INSERT Statements for Bulk Data
impact: MEDIUM
impactDescription: 10-50x faster bulk inserts
tags: batch, insert, bulk, performance, copy
---
## Batch INSERT Statements for Bulk Data
Individual INSERT statements have high overhead. Batch multiple rows in single statements or use COPY.
**Incorrect (individual inserts):**
```sql
-- Each insert is a separate transaction and round trip
insert into events (user_id, action) values (1, 'click');
insert into events (user_id, action) values (1, 'view');
insert into events (user_id, action) values (2, 'click');
-- ... 1000 more individual inserts
-- 1000 inserts = 1000 round trips = slow
```
**Correct (batch insert):**
```sql
-- Multiple rows in single statement
insert into events (user_id, action) values
(1, 'click'),
(1, 'view'),
(2, 'click'),
-- ... up to ~1000 rows per batch
(999, 'view');
-- One round trip for 1000 rows
```
For large imports, use COPY:
```sql
-- COPY is fastest for bulk loading
copy events (user_id, action, created_at)
from '/path/to/data.csv'
with (format csv, header true);
-- Or from stdin in application
copy events (user_id, action) from stdin with (format csv);
1,click
1,view
2,click
\.
```
Reference: [COPY](https://www.postgresql.org/docs/current/sql-copy.html)

View File

@@ -0,0 +1,53 @@
---
title: Eliminate N+1 Queries with Batch Loading
impact: MEDIUM-HIGH
impactDescription: 10-100x fewer database round trips
tags: n-plus-one, batch, performance, queries
---
## Eliminate N+1 Queries with Batch Loading
N+1 queries execute one query per item in a loop. Batch them into a single query using arrays or JOINs.
**Incorrect (N+1 queries):**
```sql
-- First query: get all users
select id from users where active = true; -- Returns 100 IDs
-- Then N queries, one per user
select * from orders where user_id = 1;
select * from orders where user_id = 2;
select * from orders where user_id = 3;
-- ... 97 more queries!
-- Total: 101 round trips to database
```
**Correct (single batch query):**
```sql
-- Collect IDs and query once with ANY
select * from orders where user_id = any(array[1, 2, 3, ...]);
-- Or use JOIN instead of loop
select u.id, u.name, o.*
from users u
left join orders o on o.user_id = u.id
where u.active = true;
-- Total: 1 round trip
```
Application pattern:
```sql
-- Instead of looping in application code:
-- for user in users: db.query("SELECT * FROM orders WHERE user_id = $1", user.id)
-- Pass array parameter:
select * from orders where user_id = any($1::bigint[]);
-- Application passes: [1, 2, 3, 4, 5, ...]
```
Reference: [N+1 Query Problem](https://supabase.com/docs/guides/database/query-optimization)

View File

@@ -0,0 +1,50 @@
---
title: Use Cursor-Based Pagination Instead of OFFSET
impact: MEDIUM-HIGH
impactDescription: Consistent O(1) performance regardless of page depth
tags: pagination, cursor, keyset, offset, performance
---
## Use Cursor-Based Pagination Instead of OFFSET
OFFSET-based pagination scans all skipped rows, getting slower on deeper pages. Cursor pagination is O(1).
**Incorrect (OFFSET pagination):**
```sql
-- Page 1: scans 20 rows
select * from products order by id limit 20 offset 0;
-- Page 100: scans 2000 rows to skip 1980
select * from products order by id limit 20 offset 1980;
-- Page 10000: scans 200,000 rows!
select * from products order by id limit 20 offset 199980;
```
**Correct (cursor/keyset pagination):**
```sql
-- Page 1: get first 20
select * from products order by id limit 20;
-- Application stores last_id = 20
-- Page 2: start after last ID
select * from products where id > 20 order by id limit 20;
-- Uses index, always fast regardless of page depth
-- Page 10000: same speed as page 1
select * from products where id > 199980 order by id limit 20;
```
For multi-column sorting:
```sql
-- Cursor must include all sort columns
select * from products
where (created_at, id) > ('2024-01-15 10:00:00', 12345)
order by created_at, id
limit 20;
```
Reference: [Pagination](https://supabase.com/docs/guides/database/pagination)

View File

@@ -0,0 +1,50 @@
---
title: Use UPSERT for Insert-or-Update Operations
impact: MEDIUM
impactDescription: Atomic operation, eliminates race conditions
tags: upsert, on-conflict, insert, update
---
## Use UPSERT for Insert-or-Update Operations
Using separate SELECT-then-INSERT/UPDATE creates race conditions. Use INSERT ... ON CONFLICT for atomic upserts.
**Incorrect (check-then-insert race condition):**
```sql
-- Race condition: two requests check simultaneously
select * from settings where user_id = 123 and key = 'theme';
-- Both find nothing
-- Both try to insert
insert into settings (user_id, key, value) values (123, 'theme', 'dark');
-- One succeeds, one fails with duplicate key error!
```
**Correct (atomic UPSERT):**
```sql
-- Single atomic operation
insert into settings (user_id, key, value)
values (123, 'theme', 'dark')
on conflict (user_id, key)
do update set value = excluded.value, updated_at = now();
-- Returns the inserted/updated row
insert into settings (user_id, key, value)
values (123, 'theme', 'dark')
on conflict (user_id, key)
do update set value = excluded.value
returning *;
```
Insert-or-ignore pattern:
```sql
-- Insert only if not exists (no update)
insert into page_views (page_id, user_id)
values (1, 123)
on conflict (page_id, user_id) do nothing;
```
Reference: [INSERT ON CONFLICT](https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT)

View File

@@ -0,0 +1,56 @@
---
title: Use Advisory Locks for Application-Level Locking
impact: MEDIUM
impactDescription: Efficient coordination without row-level lock overhead
tags: advisory-locks, coordination, application-locks
---
## Use Advisory Locks for Application-Level Locking
Advisory locks provide application-level coordination without requiring database rows to lock.
**Incorrect (creating rows just for locking):**
```sql
-- Creating dummy rows to lock on
create table resource_locks (
resource_name text primary key
);
insert into resource_locks values ('report_generator');
-- Lock by selecting the row
select * from resource_locks where resource_name = 'report_generator' for update;
```
**Correct (advisory locks):**
```sql
-- Session-level advisory lock (released on disconnect or unlock)
select pg_advisory_lock(hashtext('report_generator'));
-- ... do exclusive work ...
select pg_advisory_unlock(hashtext('report_generator'));
-- Transaction-level lock (released on commit/rollback)
begin;
select pg_advisory_xact_lock(hashtext('daily_report'));
-- ... do work ...
commit; -- Lock automatically released
```
Try-lock for non-blocking operations:
```sql
-- Returns immediately with true/false instead of waiting
select pg_try_advisory_lock(hashtext('resource_name'));
-- Use in application
if (acquired) {
-- Do work
select pg_advisory_unlock(hashtext('resource_name'));
} else {
-- Skip or retry later
}
```
Reference: [Advisory Locks](https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS)

View File

@@ -0,0 +1,68 @@
---
title: Prevent Deadlocks with Consistent Lock Ordering
impact: MEDIUM-HIGH
impactDescription: Eliminate deadlock errors, improve reliability
tags: deadlocks, locking, transactions, ordering
---
## Prevent Deadlocks with Consistent Lock Ordering
Deadlocks occur when transactions lock resources in different orders. Always
acquire locks in a consistent order.
**Incorrect (inconsistent lock ordering):**
```sql
-- Transaction A -- Transaction B
begin; begin;
update accounts update accounts
set balance = balance - 100 set balance = balance - 50
where id = 1; where id = 2; -- B locks row 2
update accounts update accounts
set balance = balance + 100 set balance = balance + 50
where id = 2; -- A waits for B where id = 1; -- B waits for A
-- DEADLOCK! Both waiting for each other
```
**Correct (lock rows in consistent order first):**
```sql
-- Explicitly acquire locks in ID order before updating
begin;
select * from accounts where id in (1, 2) order by id for update;
-- Now perform updates in any order - locks already held
update accounts set balance = balance - 100 where id = 1;
update accounts set balance = balance + 100 where id = 2;
commit;
```
Alternative: use a single statement to update atomically:
```sql
-- Single statement acquires all locks atomically
begin;
update accounts
set balance = balance + case id
when 1 then -100
when 2 then 100
end
where id in (1, 2);
commit;
```
Detect deadlocks in logs:
```sql
-- Check for recent deadlocks
select * from pg_stat_database where deadlocks > 0;
-- Enable deadlock logging
set log_lock_waits = on;
set deadlock_timeout = '1s';
```
Reference:
[Deadlocks](https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-DEADLOCKS)

View File

@@ -0,0 +1,50 @@
---
title: Keep Transactions Short to Reduce Lock Contention
impact: MEDIUM-HIGH
impactDescription: 3-5x throughput improvement, fewer deadlocks
tags: transactions, locking, contention, performance
---
## Keep Transactions Short to Reduce Lock Contention
Long-running transactions hold locks that block other queries. Keep transactions as short as possible.
**Incorrect (long transaction with external calls):**
```sql
begin;
select * from orders where id = 1 for update; -- Lock acquired
-- Application makes HTTP call to payment API (2-5 seconds)
-- Other queries on this row are blocked!
update orders set status = 'paid' where id = 1;
commit; -- Lock held for entire duration
```
**Correct (minimal transaction scope):**
```sql
-- Validate data and call APIs outside transaction
-- Application: response = await paymentAPI.charge(...)
-- Only hold lock for the actual update
begin;
update orders
set status = 'paid', payment_id = $1
where id = $2 and status = 'pending'
returning *;
commit; -- Lock held for milliseconds
```
Use `statement_timeout` to prevent runaway transactions:
```sql
-- Abort queries running longer than 30 seconds
set statement_timeout = '30s';
-- Or per-session
set local statement_timeout = '5s';
```
Reference: [Transaction Management](https://www.postgresql.org/docs/current/tutorial-transactions.html)

View File

@@ -0,0 +1,54 @@
---
title: Use SKIP LOCKED for Non-Blocking Queue Processing
impact: MEDIUM-HIGH
impactDescription: 10x throughput for worker queues
tags: skip-locked, queue, workers, concurrency
---
## Use SKIP LOCKED for Non-Blocking Queue Processing
When multiple workers process a queue, SKIP LOCKED allows workers to process different rows without waiting.
**Incorrect (workers block each other):**
```sql
-- Worker 1 and Worker 2 both try to get next job
begin;
select * from jobs where status = 'pending' order by created_at limit 1 for update;
-- Worker 2 waits for Worker 1's lock to release!
```
**Correct (SKIP LOCKED for parallel processing):**
```sql
-- Each worker skips locked rows and gets the next available
begin;
select * from jobs
where status = 'pending'
order by created_at
limit 1
for update skip locked;
-- Worker 1 gets job 1, Worker 2 gets job 2 (no waiting)
update jobs set status = 'processing' where id = $1;
commit;
```
Complete queue pattern:
```sql
-- Atomic claim-and-update in one statement
update jobs
set status = 'processing', worker_id = $1, started_at = now()
where id = (
select id from jobs
where status = 'pending'
order by created_at
limit 1
for update skip locked
)
returning *;
```
Reference: [SELECT FOR UPDATE SKIP LOCKED](https://www.postgresql.org/docs/current/sql-select.html#SQL-FOR-UPDATE-SHARE)

View File

@@ -0,0 +1,45 @@
---
title: Use EXPLAIN ANALYZE to Diagnose Slow Queries
impact: LOW-MEDIUM
impactDescription: Identify exact bottlenecks in query execution
tags: explain, analyze, diagnostics, query-plan
---
## Use EXPLAIN ANALYZE to Diagnose Slow Queries
EXPLAIN ANALYZE executes the query and shows actual timings, revealing the true performance bottlenecks.
**Incorrect (guessing at performance issues):**
```sql
-- Query is slow, but why?
select * from orders where customer_id = 123 and status = 'pending';
-- "It must be missing an index" - but which one?
```
**Correct (use EXPLAIN ANALYZE):**
```sql
explain (analyze, buffers, format text)
select * from orders where customer_id = 123 and status = 'pending';
-- Output reveals the issue:
-- Seq Scan on orders (cost=0.00..25000.00 rows=50 width=100) (actual time=0.015..450.123 rows=50 loops=1)
-- Filter: ((customer_id = 123) AND (status = 'pending'::text))
-- Rows Removed by Filter: 999950
-- Buffers: shared hit=5000 read=15000
-- Planning Time: 0.150 ms
-- Execution Time: 450.500 ms
```
Key things to look for:
```sql
-- Seq Scan on large tables = missing index
-- Rows Removed by Filter = poor selectivity or missing index
-- Buffers: read >> hit = data not cached, needs more memory
-- Nested Loop with high loops = consider different join strategy
-- Sort Method: external merge = work_mem too low
```
Reference: [EXPLAIN](https://supabase.com/docs/guides/database/inspect)

View File

@@ -0,0 +1,55 @@
---
title: Enable pg_stat_statements for Query Analysis
impact: LOW-MEDIUM
impactDescription: Identify top resource-consuming queries
tags: pg-stat-statements, monitoring, statistics, performance
---
## Enable pg_stat_statements for Query Analysis
pg_stat_statements tracks execution statistics for all queries, helping identify slow and frequent queries.
**Incorrect (no visibility into query patterns):**
```sql
-- Database is slow, but which queries are the problem?
-- No way to know without pg_stat_statements
```
**Correct (enable and query pg_stat_statements):**
```sql
-- Enable the extension
create extension if not exists pg_stat_statements;
-- Find slowest queries by total time
select
calls,
round(total_exec_time::numeric, 2) as total_time_ms,
round(mean_exec_time::numeric, 2) as mean_time_ms,
query
from pg_stat_statements
order by total_exec_time desc
limit 10;
-- Find most frequent queries
select calls, query
from pg_stat_statements
order by calls desc
limit 10;
-- Reset statistics after optimization
select pg_stat_statements_reset();
```
Key metrics to monitor:
```sql
-- Queries with high mean time (candidates for optimization)
select query, mean_exec_time, calls
from pg_stat_statements
where mean_exec_time > 100 -- > 100ms average
order by mean_exec_time desc;
```
Reference: [pg_stat_statements](https://supabase.com/docs/guides/database/extensions/pg_stat_statements)

View File

@@ -0,0 +1,55 @@
---
title: Maintain Table Statistics with VACUUM and ANALYZE
impact: MEDIUM
impactDescription: 2-10x better query plans with accurate statistics
tags: vacuum, analyze, statistics, maintenance, autovacuum
---
## Maintain Table Statistics with VACUUM and ANALYZE
Outdated statistics cause the query planner to make poor decisions. VACUUM reclaims space, ANALYZE updates statistics.
**Incorrect (stale statistics):**
```sql
-- Table has 1M rows but stats say 1000
-- Query planner chooses wrong strategy
explain select * from orders where status = 'pending';
-- Shows: Seq Scan (because stats show small table)
-- Actually: Index Scan would be much faster
```
**Correct (maintain fresh statistics):**
```sql
-- Manually analyze after large data changes
analyze orders;
-- Analyze specific columns used in WHERE clauses
analyze orders (status, created_at);
-- Check when tables were last analyzed
select
relname,
last_vacuum,
last_autovacuum,
last_analyze,
last_autoanalyze
from pg_stat_user_tables
order by last_analyze nulls first;
```
Autovacuum tuning for busy tables:
```sql
-- Increase frequency for high-churn tables
alter table orders set (
autovacuum_vacuum_scale_factor = 0.05, -- Vacuum at 5% dead tuples (default 20%)
autovacuum_analyze_scale_factor = 0.02 -- Analyze at 2% changes (default 10%)
);
-- Check autovacuum status
select * from pg_stat_progress_vacuum;
```
Reference: [VACUUM](https://supabase.com/docs/guides/database/database-size#vacuum-operations)

View File

@@ -0,0 +1,44 @@
---
title: Create Composite Indexes for Multi-Column Queries
impact: HIGH
impactDescription: 5-10x faster multi-column queries
tags: indexes, composite-index, multi-column, query-optimization
---
## Create Composite Indexes for Multi-Column Queries
When queries filter on multiple columns, a composite index is more efficient than separate single-column indexes.
**Incorrect (separate indexes require bitmap scan):**
```sql
-- Two separate indexes
create index orders_status_idx on orders (status);
create index orders_created_idx on orders (created_at);
-- Query must combine both indexes (slower)
select * from orders where status = 'pending' and created_at > '2024-01-01';
```
**Correct (composite index):**
```sql
-- Single composite index (leftmost column first for equality checks)
create index orders_status_created_idx on orders (status, created_at);
-- Query uses one efficient index scan
select * from orders where status = 'pending' and created_at > '2024-01-01';
```
**Column order matters** - place equality columns first, range columns last:
```sql
-- Good: status (=) before created_at (>)
create index idx on orders (status, created_at);
-- Works for: WHERE status = 'pending'
-- Works for: WHERE status = 'pending' AND created_at > '2024-01-01'
-- Does NOT work for: WHERE created_at > '2024-01-01' (leftmost prefix rule)
```
Reference: [Multicolumn Indexes](https://www.postgresql.org/docs/current/indexes-multicolumn.html)

View File

@@ -0,0 +1,40 @@
---
title: Use Covering Indexes to Avoid Table Lookups
impact: MEDIUM-HIGH
impactDescription: 2-5x faster queries by eliminating heap fetches
tags: indexes, covering-index, include, index-only-scan
---
## Use Covering Indexes to Avoid Table Lookups
Covering indexes include all columns needed by a query, enabling index-only scans that skip the table entirely.
**Incorrect (index scan + heap fetch):**
```sql
create index users_email_idx on users (email);
-- Must fetch name and created_at from table heap
select email, name, created_at from users where email = 'user@example.com';
```
**Correct (index-only scan with INCLUDE):**
```sql
-- Include non-searchable columns in the index
create index users_email_idx on users (email) include (name, created_at);
-- All columns served from index, no table access needed
select email, name, created_at from users where email = 'user@example.com';
```
Use INCLUDE for columns you SELECT but don't filter on:
```sql
-- Searching by status, but also need customer_id and total
create index orders_status_idx on orders (status) include (customer_id, total);
select status, customer_id, total from orders where status = 'shipped';
```
Reference: [Index-Only Scans](https://www.postgresql.org/docs/current/indexes-index-only-scans.html)

View File

@@ -0,0 +1,45 @@
---
title: Choose the Right Index Type for Your Data
impact: HIGH
impactDescription: 10-100x improvement with correct index type
tags: indexes, btree, gin, brin, hash, index-types
---
## Choose the Right Index Type for Your Data
Different index types excel at different query patterns. The default B-tree isn't always optimal.
**Incorrect (B-tree for JSONB containment):**
```sql
-- B-tree cannot optimize containment operators
create index products_attrs_idx on products (attributes);
select * from products where attributes @> '{"color": "red"}';
-- Full table scan - B-tree doesn't support @> operator
```
**Correct (GIN for JSONB):**
```sql
-- GIN supports @>, ?, ?&, ?| operators
create index products_attrs_idx on products using gin (attributes);
select * from products where attributes @> '{"color": "red"}';
```
Index type guide:
```sql
-- B-tree (default): =, <, >, BETWEEN, IN, IS NULL
create index users_created_idx on users (created_at);
-- GIN: arrays, JSONB, full-text search
create index posts_tags_idx on posts using gin (tags);
-- BRIN: large time-series tables (10-100x smaller)
create index events_time_idx on events using brin (created_at);
-- Hash: equality-only (slightly faster than B-tree for =)
create index sessions_token_idx on sessions using hash (token);
```
Reference: [Index Types](https://www.postgresql.org/docs/current/indexes-types.html)

View File

@@ -0,0 +1,43 @@
---
title: Add Indexes on WHERE and JOIN Columns
impact: CRITICAL
impactDescription: 100-1000x faster queries on large tables
tags: indexes, performance, sequential-scan, query-optimization
---
## Add Indexes on WHERE and JOIN Columns
Queries filtering or joining on unindexed columns cause full table scans, which become exponentially slower as tables grow.
**Incorrect (sequential scan on large table):**
```sql
-- No index on customer_id causes full table scan
select * from orders where customer_id = 123;
-- EXPLAIN shows: Seq Scan on orders (cost=0.00..25000.00 rows=100 width=85)
```
**Correct (index scan):**
```sql
-- Create index on frequently filtered column
create index orders_customer_id_idx on orders (customer_id);
select * from orders where customer_id = 123;
-- EXPLAIN shows: Index Scan using orders_customer_id_idx (cost=0.42..8.44 rows=100 width=85)
```
For JOIN columns, always index the foreign key side:
```sql
-- Index the referencing column
create index orders_customer_id_idx on orders (customer_id);
select c.name, o.total
from customers c
join orders o on o.customer_id = c.id;
```
Reference: [Query Optimization](https://supabase.com/docs/guides/database/query-optimization)

View File

@@ -0,0 +1,45 @@
---
title: Use Partial Indexes for Filtered Queries
impact: HIGH
impactDescription: 5-20x smaller indexes, faster writes and queries
tags: indexes, partial-index, query-optimization, storage
---
## Use Partial Indexes for Filtered Queries
Partial indexes only include rows matching a WHERE condition, making them smaller and faster when queries consistently filter on the same condition.
**Incorrect (full index includes irrelevant rows):**
```sql
-- Index includes all rows, even soft-deleted ones
create index users_email_idx on users (email);
-- Query always filters active users
select * from users where email = 'user@example.com' and deleted_at is null;
```
**Correct (partial index matches query filter):**
```sql
-- Index only includes active users
create index users_active_email_idx on users (email)
where deleted_at is null;
-- Query uses the smaller, faster index
select * from users where email = 'user@example.com' and deleted_at is null;
```
Common use cases for partial indexes:
```sql
-- Only pending orders (status rarely changes once completed)
create index orders_pending_idx on orders (created_at)
where status = 'pending';
-- Only non-null values
create index products_sku_idx on products (sku)
where sku is not null;
```
Reference: [Partial Indexes](https://www.postgresql.org/docs/current/indexes-partial.html)

View File

@@ -0,0 +1,46 @@
---
title: Choose Appropriate Data Types
impact: HIGH
impactDescription: 50% storage reduction, faster comparisons
tags: data-types, schema, storage, performance
---
## Choose Appropriate Data Types
Using the right data types reduces storage, improves query performance, and prevents bugs.
**Incorrect (wrong data types):**
```sql
create table users (
id int, -- Will overflow at 2.1 billion
email varchar(255), -- Unnecessary length limit
created_at timestamp, -- Missing timezone info
is_active varchar(5), -- String for boolean
price varchar(20) -- String for numeric
);
```
**Correct (appropriate data types):**
```sql
create table users (
id bigint generated always as identity primary key, -- 9 quintillion max
email text, -- No artificial limit, same performance as varchar
created_at timestamptz, -- Always store timezone-aware timestamps
is_active boolean default true, -- 1 byte vs variable string length
price numeric(10,2) -- Exact decimal arithmetic
);
```
Key guidelines:
```sql
-- IDs: use bigint, not int (future-proofing)
-- Strings: use text, not varchar(n) unless constraint needed
-- Time: use timestamptz, not timestamp
-- Money: use numeric, not float (precision matters)
-- Enums: use text with check constraint or create enum type
```
Reference: [Data Types](https://www.postgresql.org/docs/current/datatype.html)

View File

@@ -0,0 +1,59 @@
---
title: Index Foreign Key Columns
impact: HIGH
impactDescription: 10-100x faster JOINs and CASCADE operations
tags: foreign-key, indexes, joins, schema
---
## Index Foreign Key Columns
Postgres does not automatically index foreign key columns. Missing indexes cause slow JOINs and CASCADE operations.
**Incorrect (unindexed foreign key):**
```sql
create table orders (
id bigint generated always as identity primary key,
customer_id bigint references customers(id) on delete cascade,
total numeric(10,2)
);
-- No index on customer_id!
-- JOINs and ON DELETE CASCADE both require full table scan
select * from orders where customer_id = 123; -- Seq Scan
delete from customers where id = 123; -- Locks table, scans all orders
```
**Correct (indexed foreign key):**
```sql
create table orders (
id bigint generated always as identity primary key,
customer_id bigint references customers(id) on delete cascade,
total numeric(10,2)
);
-- Always index the FK column
create index orders_customer_id_idx on orders (customer_id);
-- Now JOINs and cascades are fast
select * from orders where customer_id = 123; -- Index Scan
delete from customers where id = 123; -- Uses index, fast cascade
```
Find missing FK indexes:
```sql
select
conrelid::regclass as table_name,
a.attname as fk_column
from pg_constraint c
join pg_attribute a on a.attrelid = c.conrelid and a.attnum = any(c.conkey)
where c.contype = 'f'
and not exists (
select 1 from pg_index i
where i.indrelid = c.conrelid and a.attnum = any(i.indkey)
);
```
Reference: [Foreign Keys](https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-FK)

View File

@@ -0,0 +1,55 @@
---
title: Use Lowercase Identifiers for Compatibility
impact: MEDIUM
impactDescription: Avoid case-sensitivity bugs with tools, ORMs, and AI assistants
tags: naming, identifiers, case-sensitivity, schema, conventions
---
## Use Lowercase Identifiers for Compatibility
PostgreSQL folds unquoted identifiers to lowercase. Quoted mixed-case identifiers require quotes forever and cause issues with tools, ORMs, and AI assistants that may not recognize them.
**Incorrect (mixed-case identifiers):**
```sql
-- Quoted identifiers preserve case but require quotes everywhere
CREATE TABLE "Users" (
"userId" bigint PRIMARY KEY,
"firstName" text,
"lastName" text
);
-- Must always quote or queries fail
SELECT "firstName" FROM "Users" WHERE "userId" = 1;
-- This fails - Users becomes users without quotes
SELECT firstName FROM Users;
-- ERROR: relation "users" does not exist
```
**Correct (lowercase snake_case):**
```sql
-- Unquoted lowercase identifiers are portable and tool-friendly
CREATE TABLE users (
user_id bigint PRIMARY KEY,
first_name text,
last_name text
);
-- Works without quotes, recognized by all tools
SELECT first_name FROM users WHERE user_id = 1;
```
Common sources of mixed-case identifiers:
```sql
-- ORMs often generate quoted camelCase - configure them to use snake_case
-- Migrations from other databases may preserve original casing
-- Some GUI tools quote identifiers by default - disable this
-- If stuck with mixed-case, create views as a compatibility layer
CREATE VIEW users AS SELECT "userId" AS user_id, "firstName" AS first_name FROM "Users";
```
Reference: [Identifiers and Key Words](https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS)

View File

@@ -0,0 +1,55 @@
---
title: Partition Large Tables for Better Performance
impact: MEDIUM-HIGH
impactDescription: 5-20x faster queries and maintenance on large tables
tags: partitioning, large-tables, time-series, performance
---
## Partition Large Tables for Better Performance
Partitioning splits a large table into smaller pieces, improving query performance and maintenance operations.
**Incorrect (single large table):**
```sql
create table events (
id bigint generated always as identity,
created_at timestamptz,
data jsonb
);
-- 500M rows, queries scan everything
select * from events where created_at > '2024-01-01'; -- Slow
vacuum events; -- Takes hours, locks table
```
**Correct (partitioned by time range):**
```sql
create table events (
id bigint generated always as identity,
created_at timestamptz not null,
data jsonb
) partition by range (created_at);
-- Create partitions for each month
create table events_2024_01 partition of events
for values from ('2024-01-01') to ('2024-02-01');
create table events_2024_02 partition of events
for values from ('2024-02-01') to ('2024-03-01');
-- Queries only scan relevant partitions
select * from events where created_at > '2024-01-15'; -- Only scans events_2024_01+
-- Drop old data instantly
drop table events_2023_01; -- Instant vs DELETE taking hours
```
When to partition:
- Tables > 100M rows
- Time-series data with date-based queries
- Need to efficiently drop old data
Reference: [Table Partitioning](https://www.postgresql.org/docs/current/ddl-partitioning.html)

View File

@@ -0,0 +1,61 @@
---
title: Select Optimal Primary Key Strategy
impact: HIGH
impactDescription: Better index locality, reduced fragmentation
tags: primary-key, identity, uuid, serial, schema
---
## Select Optimal Primary Key Strategy
Primary key choice affects insert performance, index size, and replication
efficiency.
**Incorrect (problematic PK choices):**
```sql
-- identity is the SQL-standard approach
create table users (
id serial primary key -- Works, but IDENTITY is recommended
);
-- Random UUIDs (v4) cause index fragmentation
create table orders (
id uuid default gen_random_uuid() primary key -- UUIDv4 = random = scattered inserts
);
```
**Correct (optimal PK strategies):**
```sql
-- Use IDENTITY for sequential IDs (SQL-standard, best for most cases)
create table users (
id bigint generated always as identity primary key
);
-- For distributed systems needing UUIDs, use UUIDv7 (time-ordered)
-- Requires pg_uuidv7 extension: create extension pg_uuidv7;
create table orders (
id uuid default uuid_generate_v7() primary key -- Time-ordered, no fragmentation
);
-- Alternative: time-prefixed IDs for sortable, distributed IDs (no extension needed)
create table events (
id text default concat(
to_char(now() at time zone 'utc', 'YYYYMMDDHH24MISSMS'),
gen_random_uuid()::text
) primary key
);
```
Guidelines:
- Single database: `bigint identity` (sequential, 8 bytes, SQL-standard)
- Distributed/exposed IDs: UUIDv7 (requires pg_uuidv7) or ULID (time-ordered, no
fragmentation)
- `serial` works but `identity` is SQL-standard and preferred for new
applications
- Avoid random UUIDs (v4) as primary keys on large tables (causes index
fragmentation)
Reference:
[Identity Columns](https://www.postgresql.org/docs/current/sql-createtable.html#SQL-CREATETABLE-PARMS-GENERATED-IDENTITY)

View File

@@ -0,0 +1,54 @@
---
title: Apply Principle of Least Privilege
impact: MEDIUM
impactDescription: Reduced attack surface, better audit trail
tags: privileges, security, roles, permissions
---
## Apply Principle of Least Privilege
Grant only the minimum permissions required. Never use superuser for application queries.
**Incorrect (overly broad permissions):**
```sql
-- Application uses superuser connection
-- Or grants ALL to application role
grant all privileges on all tables in schema public to app_user;
grant all privileges on all sequences in schema public to app_user;
-- Any SQL injection becomes catastrophic
-- drop table users; cascades to everything
```
**Correct (minimal, specific grants):**
```sql
-- Create role with no default privileges
create role app_readonly nologin;
-- Grant only SELECT on specific tables
grant usage on schema public to app_readonly;
grant select on public.products, public.categories to app_readonly;
-- Create role for writes with limited scope
create role app_writer nologin;
grant usage on schema public to app_writer;
grant select, insert, update on public.orders to app_writer;
grant usage on sequence orders_id_seq to app_writer;
-- No DELETE permission
-- Login role inherits from these
create role app_user login password 'xxx';
grant app_writer to app_user;
```
Revoke public defaults:
```sql
-- Revoke default public access
revoke all on schema public from public;
revoke all on all tables in schema public from public;
```
Reference: [Roles and Privileges](https://supabase.com/blog/postgres-roles-and-privileges)

View File

@@ -0,0 +1,50 @@
---
title: Enable Row Level Security for Multi-Tenant Data
impact: CRITICAL
impactDescription: Database-enforced tenant isolation, prevent data leaks
tags: rls, row-level-security, multi-tenant, security
---
## Enable Row Level Security for Multi-Tenant Data
Row Level Security (RLS) enforces data access at the database level, ensuring users only see their own data.
**Incorrect (application-level filtering only):**
```sql
-- Relying only on application to filter
select * from orders where user_id = $current_user_id;
-- Bug or bypass means all data is exposed!
select * from orders; -- Returns ALL orders
```
**Correct (database-enforced RLS):**
```sql
-- Enable RLS on the table
alter table orders enable row level security;
-- Create policy for users to see only their orders
create policy orders_user_policy on orders
for all
using (user_id = current_setting('app.current_user_id')::bigint);
-- Force RLS even for table owners
alter table orders force row level security;
-- Set user context and query
set app.current_user_id = '123';
select * from orders; -- Only returns orders for user 123
```
Policy for authenticated role:
```sql
create policy orders_user_policy on orders
for all
to authenticated
using (user_id = auth.uid());
```
Reference: [Row Level Security](https://supabase.com/docs/guides/database/postgres/row-level-security)

View File

@@ -0,0 +1,57 @@
---
title: Optimize RLS Policies for Performance
impact: HIGH
impactDescription: 5-10x faster RLS queries with proper patterns
tags: rls, performance, security, optimization
---
## Optimize RLS Policies for Performance
Poorly written RLS policies can cause severe performance issues. Use subqueries and indexes strategically.
**Incorrect (function called for every row):**
```sql
create policy orders_policy on orders
using (auth.uid() = user_id); -- auth.uid() called per row!
-- With 1M rows, auth.uid() is called 1M times
```
**Correct (wrap functions in SELECT):**
```sql
create policy orders_policy on orders
using ((select auth.uid()) = user_id); -- Called once, cached
-- 100x+ faster on large tables
```
Use security definer functions for complex checks:
```sql
-- Create helper function (runs as definer, bypasses RLS)
create or replace function is_team_member(team_id bigint)
returns boolean
language sql
security definer
set search_path = ''
as $$
select exists (
select 1 from public.team_members
where team_id = $1 and user_id = (select auth.uid())
);
$$;
-- Use in policy (indexed lookup, not per-row check)
create policy team_orders_policy on orders
using ((select is_team_member(team_id)));
```
Always add indexes on columns used in RLS policies:
```sql
create index orders_user_id_idx on orders (user_id);
```
Reference: [RLS Performance](https://supabase.com/docs/guides/database/postgres/row-level-security#rls-performance-recommendations)

View File

@@ -0,0 +1,171 @@
---
name: sql-pro
description: Master modern SQL with cloud-native databases, OLTP/OLAP optimization, and advanced query techniques. Expert in performance tuning, data modeling, and hybrid analytical systems.
risk: unknown
source: community
date_added: '2026-02-27'
---
You are an expert SQL specialist mastering modern database systems, performance optimization, and advanced analytical techniques across cloud-native and hybrid OLTP/OLAP environments.
## Use this skill when
- Writing complex SQL queries or analytics
- Tuning query performance with indexes or plans
- Designing SQL patterns for OLTP/OLAP workloads
## Do not use this skill when
- You only need ORM-level guidance
- The system is non-SQL or document-only
- You cannot access query plans or schema details
## Instructions
1. Define query goals, constraints, and expected outputs.
2. Inspect schema, statistics, and access paths.
3. Optimize queries and validate with EXPLAIN.
4. Verify correctness and performance under load.
## Safety
- Avoid heavy queries on production without safeguards.
- Use read replicas or limits for exploratory analysis.
## Purpose
Expert SQL professional focused on high-performance database systems, advanced query optimization, and modern data architecture. Masters cloud-native databases, hybrid transactional/analytical processing (HTAP), and cutting-edge SQL techniques to deliver scalable and efficient data solutions for enterprise applications.
## Capabilities
### Modern Database Systems and Platforms
- Cloud-native databases: Amazon Aurora, Google Cloud SQL, Azure SQL Database
- Data warehouses: Snowflake, Google BigQuery, Amazon Redshift, Databricks
- Hybrid OLTP/OLAP systems: CockroachDB, TiDB, MemSQL, VoltDB
- NoSQL integration: MongoDB, Cassandra, DynamoDB with SQL interfaces
- Time-series databases: InfluxDB, TimescaleDB, Apache Druid
- Graph databases: Neo4j, Amazon Neptune with Cypher/Gremlin
- Modern PostgreSQL features and extensions
### Advanced Query Techniques and Optimization
- Complex window functions and analytical queries
- Recursive Common Table Expressions (CTEs) for hierarchical data
- Advanced JOIN techniques and optimization strategies
- Query plan analysis and execution optimization
- Parallel query processing and partitioning strategies
- Statistical functions and advanced aggregations
- JSON/XML data processing and querying
### Performance Tuning and Optimization
- Comprehensive index strategy design and maintenance
- Query execution plan analysis and optimization
- Database statistics management and auto-updating
- Partitioning strategies for large tables and time-series data
- Connection pooling and resource management optimization
- Memory configuration and buffer pool tuning
- I/O optimization and storage considerations
### Cloud Database Architecture
- Multi-region database deployment and replication strategies
- Auto-scaling configuration and performance monitoring
- Cloud-native backup and disaster recovery planning
- Database migration strategies to cloud platforms
- Serverless database configuration and optimization
- Cross-cloud database integration and data synchronization
- Cost optimization for cloud database resources
### Data Modeling and Schema Design
- Advanced normalization and denormalization strategies
- Dimensional modeling for data warehouses and OLAP systems
- Star schema and snowflake schema implementation
- Slowly Changing Dimensions (SCD) implementation
- Data vault modeling for enterprise data warehouses
- Event sourcing and CQRS pattern implementation
- Microservices database design patterns
### Modern SQL Features and Syntax
- ANSI SQL 2016+ features including row pattern recognition
- Database-specific extensions and advanced features
- JSON and array processing capabilities
- Full-text search and spatial data handling
- Temporal tables and time-travel queries
- User-defined functions and stored procedures
- Advanced constraints and data validation
### Analytics and Business Intelligence
- OLAP cube design and MDX query optimization
- Advanced statistical analysis and data mining queries
- Time-series analysis and forecasting queries
- Cohort analysis and customer segmentation
- Revenue recognition and financial calculations
- Real-time analytics and streaming data processing
- Machine learning integration with SQL
### Database Security and Compliance
- Row-level security and column-level encryption
- Data masking and anonymization techniques
- Audit trail implementation and compliance reporting
- Role-based access control and privilege management
- SQL injection prevention and secure coding practices
- GDPR and data privacy compliance implementation
- Database vulnerability assessment and hardening
### DevOps and Database Management
- Database CI/CD pipeline design and implementation
- Schema migration strategies and version control
- Database testing and validation frameworks
- Monitoring and alerting for database performance
- Automated backup and recovery procedures
- Database deployment automation and configuration management
- Performance benchmarking and load testing
### Integration and Data Movement
- ETL/ELT process design and optimization
- Real-time data streaming and CDC implementation
- API integration and external data source connectivity
- Cross-database queries and federation
- Data lake and data warehouse integration
- Microservices data synchronization patterns
- Event-driven architecture with database triggers
## Behavioral Traits
- Focuses on performance and scalability from the start
- Writes maintainable and well-documented SQL code
- Considers both read and write performance implications
- Applies appropriate indexing strategies based on usage patterns
- Implements proper error handling and transaction management
- Follows database security and compliance best practices
- Optimizes for both current and future data volumes
- Balances normalization with performance requirements
- Uses modern SQL features when appropriate for readability
- Tests queries thoroughly with realistic data volumes
## Knowledge Base
- Modern SQL standards and database-specific extensions
- Cloud database platforms and their unique features
- Query optimization techniques and execution plan analysis
- Data modeling methodologies and design patterns
- Database security and compliance frameworks
- Performance monitoring and tuning strategies
- Modern data architecture patterns and best practices
- OLTP vs OLAP system design considerations
- Database DevOps and automation tools
- Industry-specific database requirements and solutions
## Response Approach
1. **Analyze requirements** and identify optimal database approach
2. **Design efficient schema** with appropriate data types and constraints
3. **Write optimized queries** using modern SQL techniques
4. **Implement proper indexing** based on usage patterns
5. **Test performance** with realistic data volumes
6. **Document assumptions** and provide maintenance guidelines
7. **Consider scalability** for future data growth
8. **Validate security** and compliance requirements
## Example Interactions
- "Optimize this complex analytical query for a billion-row table in Snowflake"
- "Design a database schema for a multi-tenant SaaS application with GDPR compliance"
- "Create a real-time dashboard query that updates every second with minimal latency"
- "Implement a data migration strategy from Oracle to cloud-native PostgreSQL"
- "Build a cohort analysis query to track customer retention over time"
- "Design an HTAP system that handles both transactions and analytics efficiently"
- "Create a time-series analysis query for IoT sensor data in TimescaleDB"
- "Optimize database performance for a high-traffic e-commerce platform"