# Experiment Playbook
## Experiment Types
### A/B Test
- Compare one control versus one variant.
- Best for high-confidence directional decisions.
### Multivariate Test
- Test combinations of multiple factors.
- Useful for detecting interaction effects, but requires more traffic than an A/B test.
### Holdout Test
- Keep a percentage of users unexposed to the intervention.
- Useful for measuring incremental lift over broader changes.
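
All three experiment types depend on stable assignment: a user must land in the same group on every visit. One common way to get this is hashing the user ID together with an experiment name. The following is a minimal sketch under that assumption; `assign_bucket`, the `experiment` salt, and the 5% holdout default are hypothetical names and values, not part of this playbook.

```python
import hashlib


def assign_bucket(user_id: str, experiment: str, holdout_pct: float = 0.05) -> str:
    """Deterministically map a user to 'holdout', 'control', or 'variant'."""
    # Salting with the experiment name keeps assignments independent across tests.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # First 8 hex characters -> integer in [0, 2**32), scaled to [0, 1).
    u = int(digest[:8], 16) / 2**32
    if u < holdout_pct:
        return "holdout"
    # Split the remaining traffic evenly between control and variant.
    midpoint = holdout_pct + (1.0 - holdout_pct) / 2
    return "control" if u < midpoint else "variant"
```

Because the bucket is a pure function of the inputs, it needs no stored state and can be recomputed at analysis time to audit exposure logs.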
## Metric Design
### Primary Metric
- One metric that decides ship/no-ship.
- Must align with user value and business objective.
### Guardrail Metrics
- Catch harm caused by optimizing the primary metric in isolation.
- Examples: error rate, latency, churn proxy, support contacts.
### Diagnostic Metrics
- Explain why a change happened.
- Do not use as a decision gate unless pre-specified.
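
Guardrails are easiest to enforce when each one has an explicit tolerance. The sketch below assumes every guardrail is a "lower is better" metric with a maximum tolerated relative increase over control; the `Guardrail` class, the metric names, and the tolerances are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Guardrail:
    name: str
    max_relative_increase: float  # e.g. 0.05 tolerates up to +5% versus control


def breached_guardrails(
    guardrails: List[Guardrail],
    control: Dict[str, float],
    variant: Dict[str, float],
) -> List[str]:
    """Return the names of guardrails whose relative increase exceeds tolerance."""
    breached = []
    for g in guardrails:
        base = control[g.name]
        if base == 0:
            # Relative change is undefined; treat any increase from zero as a breach.
            if variant[g.name] > 0:
                breached.append(g.name)
            continue
        relative_change = (variant[g.name] - base) / base
        if relative_change > g.max_relative_increase:
            breached.append(g.name)
    return breached
```

This compares point estimates only. In practice a breach decision would usually also account for sampling noise, for example by testing against the tolerance with a confidence interval.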
## Stopping Rules
Define before launch:
- Fixed sample size per group
- Minimum run duration (to capture weekday/weekend behavior)
- Guardrail breach thresholds (pause criteria)
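
The fixed sample size and minimum duration can be estimated before launch. The sketch below uses the standard normal-approximation formula for comparing two proportions, and rounds the run length up to whole weeks so every weekday is covered equally. The function names, the default `alpha` and `power`, and the whole-week rounding are assumptions for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_group(
    baseline: float, mde_abs: float, alpha: float = 0.05, power: float = 0.8
) -> int:
    """Users needed per group to detect an absolute lift of mde_abs (two-sided test)."""
    p1 = baseline
    p2 = baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs**2)


def run_days(n_per_group: int, daily_users: int, groups: int = 2) -> int:
    """Days needed to reach the sample size, rounded up to whole weeks."""
    days = math.ceil(n_per_group * groups / daily_users)
    return max(7, math.ceil(days / 7) * 7)
```

For example, `sample_size_per_group(0.10, 0.02)` asks how many users each group needs to detect a move from 10% to 12% conversion. Halving the detectable lift roughly quadruples the requirement.
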
Avoid:
- Continuous peeking with fixed-horizon inference
- Changing success metric mid-test
- Retroactive segmentation without multiple-comparison correction
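
When segments are examined after the fact, each extra comparison raises the chance of a false positive. One standard correction is the Holm step-down procedure, sketched below. The function name and the segment labels in the usage example are hypothetical, and other corrections (Bonferroni, Benjamini-Hochberg) may fit a given team better.

```python
from typing import Dict


def holm_significant(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Flag which segment results remain significant after Holm correction."""
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    result = {}
    still_rejecting = True
    for rank, (name, p) in enumerate(ordered):
        # The smallest p-value faces alpha/m, the next alpha/(m-1), and so on.
        threshold = alpha / (m - rank)
        if still_rejecting and p <= threshold:
            result[name] = True
        else:
            # Once one hypothesis fails, all larger p-values fail too.
            still_rejecting = False
            result[name] = False
    return result
```

With four segments at p = 0.004, 0.03, 0.04, and 0.20, only the first survives: 0.03 would pass an uncorrected 0.05 threshold but not the corrected one.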
## Novelty and Primacy Effects
- Novelty effect: short-term spike due to newness, not durable value.
- Primacy effect: users accustomed to the old experience respond poorly to a change at first, so early results understate durable value.
Mitigation:
- Run long enough for behavior stabilization.
- Check returning users and delayed cohorts separately.
- Re-run key tests when stakes are high.
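
One simple way to check whether behavior has stabilized is to compare relative lift period by period. The sketch below takes one `(control_rate, variant_rate)` pair per day and reports lift per week. It assumes roughly equal daily traffic, since it averages daily rates without weighting; the function name and the 7-day period are illustrative.

```python
from typing import List, Tuple


def lift_by_period(daily_rates: List[Tuple[float, float]], period_days: int = 7) -> List[float]:
    """Relative lift of variant over control for each consecutive period."""
    lifts = []
    for start in range(0, len(daily_rates), period_days):
        window = daily_rates[start:start + period_days]
        control = sum(c for c, _ in window) / len(window)
        variant = sum(v for _, v in window) / len(window)
        lifts.append((variant - control) / control)
    return lifts
```

A lift that shrinks from week to week suggests a novelty effect, while a lift that grows suggests a primacy effect. A flat series is evidence that the measured effect is durable.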
## Pre-Launch Checklist
- [ ] Hypothesis complete (If/Then/Because)
- [ ] Metric definitions frozen
- [ ] Instrumentation validated
- [ ] Randomization and assignment verified
- [ ] Sample size and duration approved
- [ ] Rollback plan documented
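
Verifying randomization usually includes a sample ratio mismatch (SRM) check: the observed group sizes should match the planned split. The sketch below runs a chi-square goodness-of-fit test for two groups using only the standard library (with one degree of freedom, the p-value reduces to `erfc(sqrt(chi2 / 2))`). The function name and the default 50/50 split are assumptions.

```python
import math


def srm_p_value(control_n: int, variant_n: int, expected_control_share: float = 0.5) -> float:
    """P-value for the hypothesis that observed counts match the planned split."""
    total = control_n + variant_n
    expected_control = total * expected_control_share
    expected_variant = total - expected_control
    chi2 = (
        (control_n - expected_control) ** 2 / expected_control
        + (variant_n - expected_variant) ** 2 / expected_variant
    )
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))
```

A very small p-value points to an assignment or logging bug rather than a treatment effect, so the experiment results should not be trusted until the cause is found. Teams often use a strict threshold here, such as 0.001, to avoid false alarms.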
## Post-Test Readout Template
1. Hypothesis and scope
2. Experiment setup and quality checks
3. Primary metric effect size + confidence interval
4. Guardrail status
5. Segment-level observations (pre-registered only)
6. Decision: ship, iterate, or reject
7. Follow-up experiments
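
For item 3 of the readout, a conversion-style primary metric can be reported as an absolute difference with a Wald confidence interval. This is a minimal sketch assuming a binary metric and samples large enough for the normal approximation; the function name and argument names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def diff_in_proportions(
    control_conversions: int,
    control_n: int,
    variant_conversions: int,
    variant_n: int,
    confidence: float = 0.95,
) -> Tuple[float, float, float]:
    """Return (difference, lower bound, upper bound) for variant minus control."""
    p_c = control_conversions / control_n
    p_v = variant_conversions / variant_n
    diff = p_v - p_c
    # Unpooled standard error of the difference between two proportions.
    se = math.sqrt(p_c * (1 - p_c) / control_n + p_v * (1 - p_v) / variant_n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, diff - z * se, diff + z * se
```

An interval that excludes zero supports a ship or reject decision; an interval that straddles zero is a reason to iterate or to run a larger follow-up.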