71 lines
1.9 KiB
Markdown
71 lines
1.9 KiB
Markdown
# Experiment Playbook
|
|
|
|
## Experiment Types
|
|
|
|
### A/B Test
|
|
- Compare one control versus one variant.
|
|
- Best for high-confidence directional decisions.
|
|
|
|
### Multivariate Test
|
|
- Test combinations of multiple factors.
|
|
- Useful for interaction effects, requires larger traffic.
|
|
|
|
### Holdout Test
|
|
- Keep a percentage unexposed to intervention.
|
|
- Useful for measuring incremental lift over broader changes.
|
|
|
|
## Metric Design
|
|
|
|
### Primary Metric
|
|
- One metric that decides ship/no-ship.
|
|
- Must align with user value and business objective.
|
|
|
|
### Guardrail Metrics
|
|
- Prevent local optimization damage.
|
|
- Examples: error rate, latency, churn proxy, support contacts.
|
|
|
|
### Diagnostic Metrics
|
|
- Explain why change happened.
|
|
- Do not use as decision gate unless pre-specified.
|
|
|
|
## Stopping Rules
|
|
|
|
Define before launch:
|
|
- Fixed sample size per group
|
|
- Minimum run duration (to capture weekday/weekend behavior)
|
|
- Guardrail breach thresholds (pause criteria)
|
|
|
|
Avoid:
|
|
- Continuous peeking with fixed-horizon inference
|
|
- Changing success metric mid-test
|
|
- Retroactive segmentation without correction
|
|
|
|
## Novelty and Primacy Effects
|
|
|
|
- Novelty effect: short-term spike due to newness, not durable value.
|
|
- Primacy effect: early exposure creates bias in user behavior.
|
|
|
|
Mitigation:
|
|
- Run long enough for behavior stabilization.
|
|
- Check returning users and delayed cohorts separately.
|
|
- Re-run key tests when stakes are high.
|
|
|
|
## Pre-Launch Checklist
|
|
|
|
- [ ] Hypothesis complete (If/Then/Because)
|
|
- [ ] Metric definitions frozen
|
|
- [ ] Instrumentation validated
|
|
- [ ] Randomization and assignment verified
|
|
- [ ] Sample size and duration approved
|
|
- [ ] Rollback plan documented
|
|
|
|
## Post-Test Readout Template
|
|
|
|
1. Hypothesis and scope
|
|
2. Experiment setup and quality checks
|
|
3. Primary metric effect size + confidence interval
|
|
4. Guardrail status
|
|
5. Segment-level observations (pre-registered only)
|
|
6. Decision: ship, iterate, or reject
|
|
7. Follow-up experiments
|