# Usability Testing Frameworks Reference for planning and conducting usability tests that produce actionable insights. --- ## Table of Contents - [Testing Methods Overview](#testing-methods-overview) - [Test Planning](#test-planning) - [Task Design](#task-design) - [Moderation Techniques](#moderation-techniques) - [Analysis Framework](#analysis-framework) - [Reporting Template](#reporting-template) --- ## Testing Methods Overview ### Method Selection Matrix | Method | When to Use | Participants | Time | Output | |--------|-------------|--------------|------|--------| | Moderated remote | Deep insights, complex flows | 5-8 | 45-60 min | Rich qualitative | | Unmoderated remote | Quick validation, simple tasks | 10-20 | 15-20 min | Quantitative + video | | In-person | Physical products, context matters | 5-10 | 60-90 min | Very rich qualitative | | Guerrilla | Quick feedback, public spaces | 3-5 | 5-10 min | Rapid insights | | A/B testing | Comparing two designs | 100+ | Varies | Statistical data | ### Participant Count Guidelines ``` ┌─────────────────────────────────────────────────────────────┐ │ FINDING USABILITY ISSUES │ ├─────────────────────────────────────────────────────────────┤ │ │ │ % Issues Found │ │ 100% ┤ ●────●────● │ │ 90% ┤ ●───── │ │ 80% ┤ ●───── │ │ 75% ┤ ●──── ← 5 users: 75-80% │ │ 50% ┤ ●──── │ │ 25% ┤ ●── │ │ 0% ┼────┬────┬────┬────┬────┬──── │ │ 1 2 3 4 5 6+ Users │ │ │ └─────────────────────────────────────────────────────────────┘ ``` **Nielsen's Rule:** 5 users find ~75-80% of usability issues | Goal | Participants | Reasoning | |------|--------------|-----------| | Find major issues | 5 | 80% coverage, diminishing returns | | Validate fix | 3 | Confirm specific issue resolved | | Compare designs | 8-10 per design | Need comparison data | | Quantitative metrics | 20+ | Statistical significance | --- ## Test Planning ### Research Questions Transform vague goals into testable questions: | Vague Goal | Testable Question | |------------|-------------------| | "Is it easy to use?" | "Can users complete checkout in under 3 minutes?" | | "Do users like it?" | "Will users choose Design A or B for this task?" | | "Does it make sense?" | "Can users find the settings without hints?" | ### Test Plan Template ``` PROJECT: _______________ DATE: _______________ RESEARCHER: _______________ RESEARCH QUESTIONS: 1. _______________ 2. _______________ 3. _______________ PARTICIPANTS: • Target: [Persona or user type] • Count: [Number] • Recruitment: [Source] • Incentive: [Amount/type] METHOD: • Type: [Moderated/Unmoderated/Remote/In-person] • Duration: [Minutes per session] • Environment: [Tool/Location] TASKS: 1. [Task description + success criteria] 2. [Task description + success criteria] 3. [Task description + success criteria] METRICS: • Completion rate (target: __%) • Time on task (target: __ min) • Error rate (target: __%) • Satisfaction (target: __/5) SCHEDULE: • Pilot: [Date] • Sessions: [Date range] • Analysis: [Date] • Report: [Date] ``` ### Pilot Testing **Always pilot before real sessions:** - Run 1-2 test sessions with team members - Check task clarity and timing - Test recording/screen sharing - Adjust based on pilot feedback **Pilot Checklist:** - [ ] Tasks understood without clarification - [ ] Session fits in time slot - [ ] Recording captures screen + audio - [ ] Post-test questions make sense --- ## Task Design ### Good vs. Bad Tasks | Bad Task | Why Bad | Good Task | |----------|---------|-----------| | "Find the settings" | Leading | "Change your notification preferences" | | "Use the dashboard" | Vague | "Find how many sales you made last month" | | "Click the blue button" | Prescriptive | "Submit your order" | | "Do you like this?" | Opinion-based | "Rate how easy it was (1-5)" | ### Task Construction Formula ``` SCENARIO + GOAL + SUCCESS CRITERIA Scenario: Context that makes task realistic Goal: What user needs to accomplish Success: How we know they succeeded Example: "Imagine you're planning a trip to Paris next month. [SCENARIO] Book a hotel for 3 nights in your budget. [GOAL] You've succeeded when you see the confirmation page. [SUCCESS]" ``` ### Task Types | Type | Purpose | Example | |------|---------|---------| | Exploration | First impressions | "Look around and tell me what you think this does" | | Specific | Core functionality | "Add item to cart and checkout" | | Comparison | Design validation | "Which of these two menus would you use to..." | | Stress | Edge cases | "What would you do if your payment failed?" | ### Task Difficulty Progression Start easy, increase difficulty: ``` Task 1: Warm-up (easy, builds confidence) Task 2: Core flow (main functionality) Task 3: Secondary flow (important but less common) Task 4: Edge case (stress test) Task 5: Free exploration (open-ended) ``` --- ## Moderation Techniques ### The Think-Aloud Protocol **Instruction Script:** "As you work through the tasks, please think out loud. Tell me what you're looking at, what you're thinking, and what you're trying to do. There are no wrong answers - we're testing the design, not you." **Prompts When Silent:** - "What are you thinking right now?" - "What do you expect to happen?" - "What are you looking for?" - "Tell me more about that" ### Handling Common Situations | Situation | What to Say | |-----------|-------------| | User asks for help | "What would you do if I weren't here?" | | User is stuck | "What are your options?" (wait 30 sec before hint) | | User apologizes | "You're doing great. We're testing the design." | | User goes off-task | "That's interesting. Let's come back to [task]." | | User criticizes | "Tell me more about that." (neutral, don't defend) | ### Non-Leading Question Techniques | Leading (Don't) | Neutral (Do) | |-----------------|--------------| | "Did you find that confusing?" | "How was that experience?" | | "The search is over here" | "What do you think you should do?" | | "Don't you think X is easier?" | "Which do you prefer and why?" | | "Did you notice the tooltip?" | "What happened there?" | ### Post-Task Questions After each task: 1. "How difficult was that?" (1-5 scale) 2. "What, if anything, was confusing?" 3. "What would you improve?" After all tasks: 1. "What stood out to you?" 2. "What was the best/worst part?" 3. "Would you use this? Why/why not?" --- ## Analysis Framework ### Severity Rating Scale | Severity | Definition | Criteria | |----------|------------|----------| | 4 - Critical | Prevents task completion | User cannot proceed | | 3 - Major | Significant difficulty | User struggles, considers giving up | | 2 - Minor | Causes hesitation | User recovers independently | | 1 - Cosmetic | Noticed but not problematic | User comments but unaffected | ### Issue Documentation Template ``` ISSUE ID: ___ SEVERITY: [1-4] FREQUENCY: [X/Y participants] TASK: [Which task] TIMESTAMP: [When in session] OBSERVATION: [What happened - factual description] USER QUOTE: "[Direct quote if available]" HYPOTHESIS: [Why this might be happening] RECOMMENDATION: [Proposed solution] AFFECTED PERSONA: [Which user types] ``` ### Pattern Recognition **Quantitative Signals:** - Task completion rate < 80% - Time on task > 2x expected - Error rate > 20% - Satisfaction < 3/5 **Qualitative Signals:** - Same confusion point across 3+ users - Repeated verbal frustration - Workaround attempts - Feature requests during task ### Analysis Matrix ``` ┌─────────────────┬───────────┬───────────┬───────────┐ │ Issue │ Frequency │ Severity │ Priority │ ├─────────────────┼───────────┼───────────┼───────────┤ │ Can't find X │ 4/5 │ Critical │ HIGH │ │ Confusing label │ 3/5 │ Major │ HIGH │ │ Slow loading │ 2/5 │ Minor │ MEDIUM │ │ Typo in text │ 1/5 │ Cosmetic │ LOW │ └─────────────────┴───────────┴───────────┴───────────┘ Priority = Frequency × Severity ``` --- ## Reporting Template ### Executive Summary ``` USABILITY TEST REPORT [Project Name] | [Date] OVERVIEW • Participants: [N] users matching [persona] • Method: [Type of test] • Tasks: [N] tasks covering [scope] KEY FINDINGS 1. [Most critical issue + impact] 2. [Second issue] 3. [Third issue] SUCCESS METRICS • Completion rate: [X]% (target: Y%) • Avg. time on task: [X] min (target: Y min) • Satisfaction: [X]/5 (target: Y/5) TOP RECOMMENDATIONS 1. [Highest priority fix] 2. [Second priority] 3. [Third priority] ``` ### Detailed Findings Section ``` FINDING 1: [Title] Severity: [Critical/Major/Minor/Cosmetic] Frequency: [X/Y participants] Affected Tasks: [List] What Happened: [Description of the problem] Evidence: • P1: "[Quote]" • P3: "[Quote]" • [Video timestamp if available] Impact: [How this affects users and business] Recommendation: [Proposed solution with rationale] Design Mockup: [Optional: before/after if applicable] ``` ### Metrics Dashboard ``` TASK PERFORMANCE SUMMARY Task 1: [Name] ├─ Completion: ████████░░ 80% ├─ Avg. Time: 2:15 (target: 2:00) ├─ Errors: 1.2 avg └─ Satisfaction: ★★★★☆ 4.2/5 Task 2: [Name] ├─ Completion: ██████░░░░ 60% ⚠️ ├─ Avg. Time: 4:30 (target: 3:00) ⚠️ ├─ Errors: 3.1 avg ⚠️ └─ Satisfaction: ★★★☆☆ 3.1/5 [Continue for all tasks] ``` --- ## Quick Reference ### Session Checklist **Before Session:** - [ ] Test plan finalized - [ ] Tasks written and piloted - [ ] Recording set up and tested - [ ] Consent form ready - [ ] Prototype/product accessible - [ ] Note-taking template ready **During Session:** - [ ] Consent obtained - [ ] Think-aloud explained - [ ] Recording started - [ ] Tasks presented one at a time - [ ] Post-task ratings collected - [ ] Debrief questions asked - [ ] Thanks and incentive **After Session:** - [ ] Notes organized - [ ] Recording saved - [ ] Initial impressions captured - [ ] Issues logged ### Common Metrics | Metric | Formula | Target | |--------|---------|--------| | Completion rate | Successful / Total × 100 | >80% | | Time on task | Average seconds | <2x expected | | Error rate | Errors / Attempts × 100 | <15% | | Task-level satisfaction | Average rating | >4/5 | | SUS score | Standard formula | >68 | | NPS | Promoters - Detractors | >0 | --- *See also: `journey-mapping-guide.md` for contextual research*