Using the **performance-profiler** skill to give you a systematic plan. --- ## Systematic Profiling Plan for GET /api/users Your symptoms (200ms → 3s at 500 concurrent users) point to either database saturation, connection pool exhaustion, or N+1 queries. Here's the exact sequence to find out. ### Phase 1: Establish Baseline ```bash # Install tools npm install -g clinic autocannon k6 # Baseline load test — record P50/P95/P99 before touching anything k6 run --vus 500 --duration 60s -e BASE_URL=http://localhost:3000 - <<'EOF' import http from 'k6/http' import { check } from 'k6' export const options = { thresholds: { http_req_duration: ['p(95)<500', 'p(99)<1000'], }, } export default function() { const res = http.get(`${__ENV.BASE_URL}/api/users`) check(res, { 'status 200': (r) => r.status === 200 }) } EOF ``` Record the output: P50, P95, P99, RPS, error rate. ### Phase 2: Find the Bottleneck Layer Run these three checks in parallel — each eliminates a different layer: **2a. Database — is the query itself slow?** ```sql -- Enable pg_stat_statements if not already CREATE EXTENSION IF NOT EXISTS pg_stat_statements; -- Find the slow queries SELECT round(mean_exec_time::numeric, 2) AS mean_ms, calls, round(total_exec_time::numeric, 2) AS total_ms, left(query, 100) AS query FROM pg_stat_statements WHERE calls > 10 ORDER BY mean_exec_time DESC LIMIT 10; -- Then EXPLAIN the /api/users query specifically EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT * FROM users WHERE deleted_at IS NULL ORDER BY created_at DESC LIMIT 20; ``` Look for: `Seq Scan` on large tables (needs index), high `actual time`. **2b. N+1 detection — is Node making too many queries per request?** Add temporary query counting: ```javascript // Add to your DB client temporarily let queryCount = 0; pool.on('query', () => queryCount++); app.use((req, res, next) => { queryCount = 0; res.on('finish', () => { if (queryCount > 3) { console.warn(`⚠️ ${req.method} ${req.url}: ${queryCount} queries`); } }); next(); }); ``` If you see 20+ queries for one GET /api/users, you have an N+1. **2c. Connection pool — is it exhausted under load?** ```javascript // Log pool stats every 5s setInterval(() => { console.log({ total: pool.totalCount, idle: pool.idleCount, waiting: pool.waitingCount, // > 0 means pool exhaustion }); }, 5000); ``` If `waitingCount` climbs under load, increase `max` in your pool config (default is often just 10). ### Phase 3: CPU Flamegraph ```bash # Run under clinic flame while applying load autocannon -c 500 -d 30 http://localhost:3000/api/users & clinic flame -- node dist/server.js ``` This opens a flamegraph in your browser. Look for wide bars — those are where CPU time is spent. Common culprits: JSON serialization of massive result sets, synchronous operations blocking the event loop. ### Phase 4: Event Loop Blocking ```bash npm install blocked-at ``` ```javascript import blocked from 'blocked-at'; blocked((time, stack) => { console.warn(`Event loop blocked for ${time}ms`, stack[0]); }, { threshold: 50 }); ``` ### Phase 5: Fix and Re-measure Based on what Phase 2-4 reveals, the fix is usually one of: | Root Cause | Fix | |---|---| | Seq Scan | `CREATE INDEX idx_users_created ON users(created_at DESC) WHERE deleted_at IS NULL` | | N+1 queries | Replace loop with JOIN or batch query | | Pool exhaustion | Increase `max` to 50-100, add `connectionTimeoutMillis` | | No pagination | Add `LIMIT`/`OFFSET` or cursor pagination | | SELECT * | Select only needed columns | | No caching | Add `Cache-Control` headers or Redis for hot data | After fixing, re-run the exact same k6 baseline test and fill in the before/after table: ```markdown | Metric | Before | After | Delta | |--------|--------|-------|-------| | P50 | ?ms | ?ms | ?% | | P95 | ?ms | ?ms | ?% | | P99 | ?ms | ?ms | ?% | | RPS | ? | ? | ?% | ``` ### Most Likely Culprit Given the pattern (fine at low load, degrades at 500 concurrent), my top suspects in order: 1. **Connection pool exhaustion** — default pool of 10 connections serving 500 users 2. **N+1 queries** — amplifies the pool problem (20 queries/request × 500 users = 10,000 queries competing for 10 connections) 3. **Missing index** — Seq Scan gets worse as table grows Start with Phase 2a-2c. You'll likely find your answer there without needing the flamegraph.