# 🔍 Incident Post-Mortem Template

**Incident ID:** [YYYY-MM-DD-###]
**Severity:** [Red Alert / Yellow Alert / Info]
**Date:** [Date of incident]
**Author:** [Name]
**Status:** [Draft / Under Review / Published]

---

## 📊 INCIDENT SUMMARY

**In plain language, what happened?**

[2-3 sentence summary that anyone can understand]

**Impact:**
- **Services Affected:** [List]
- **Users Impacted:** [Number/percentage]
- **Duration:** [X hours Y minutes]
- **Revenue Impact:** [Yes/No, details if yes]

---

## ⏱️ TIMELINE

**All times in Central Time (America/Chicago)**

| Time | Event | Action Taken | By Whom |
|------|-------|--------------|---------|
| HH:MM | [What happened] | [What was done] | [Who] |
| HH:MM | [Next event] | [Next action] | [Who] |
| HH:MM | [Next event] | [Next action] | [Who] |

**Example:**

| Time | Event | Action Taken | By Whom |
|------|-------|--------------|---------|
| 03:47 | ATM10 server crashed | Alert received in Discord | Automated |
| 03:52 | Investigated crash logs | SSH to NC1, checked logs | Michael |
| 04:05 | Root cause identified (OOM) | Increased RAM allocation | Michael |
| 04:12 | Server restarted | Restart via panel | Michael |
| 04:15 | Verified functionality | Test player connection | Michael |
| 04:20 | All clear | Posted update in Discord | Meg |

---

## 🔍 ROOT CAUSE ANALYSIS

### What was the root cause?

[Detailed technical explanation]

### Why did it happen?

[Contributing factors]

### Why didn't we catch it earlier?

[Monitoring gaps, if any]

---

## 🛡️ WHAT WENT WELL

**Things that worked as expected:**
- [ ] [Monitoring detected issue quickly]
- [ ] [Team responded within SLA]
- [ ] [Emergency protocols followed]
- [ ] [Communication was clear]
- [ ] [Recovery was successful]

[Expand on each point]

---

## 🚨 WHAT WENT WRONG

**Things that didn't work as expected:**
- [ ] [Issue that caused incident]
- [ ] [Monitoring didn't catch X]
- [ ] [Response was delayed because...]
- [ ] [Communication breakdown in...]

[Expand on each point]

---

## 🎯 ACTION ITEMS

**Immediate (Within 24 hours):**
- [ ] [Action 1] - Assigned to: [Person] - Due: [Date]
- [ ] [Action 2] - Assigned to: [Person] - Due: [Date]

**Short-term (Within 1 week):**
- [ ] [Action 1] - Assigned to: [Person] - Due: [Date]
- [ ] [Action 2] - Assigned to: [Person] - Due: [Date]

**Long-term (Within 1 month):**
- [ ] [Action 1] - Assigned to: [Person] - Due: [Date]
- [ ] [Action 2] - Assigned to: [Person] - Due: [Date]

---

## 📚 LESSONS LEARNED

**What did we learn?**
1. [Lesson 1]
2. [Lesson 2]
3. [Lesson 3]

**How will we prevent this from happening again?**
- [Prevention measure 1]
- [Prevention measure 2]
- [Prevention measure 3]

**What documentation needs to be updated?**
- [ ] [Document 1 - link]
- [ ] [Document 2 - link]
- [ ] [Procedure 3 - link]

---

## 💰 COST IMPACT

**Direct Costs:**
- Lost revenue: $[amount]
- Emergency support costs: $[amount]
- Overtime/after-hours work: [hours]

**Indirect Costs:**
- Player churn (estimated): [number]
- Reputation impact: [assessment]
- Time investment: [person-hours]

**Total Estimated Impact:** $[amount]

---

## 🔄 FOLLOW-UP

**30-Day Follow-Up:**
- [ ] Verify all action items completed
- [ ] Check if similar incidents occurred
- [ ] Measure effectiveness of changes

**90-Day Follow-Up:**
- [ ] Review long-term prevention measures
- [ ] Assess if incident type has recurred
- [ ] Update procedures based on experience

---

## 📎 SUPPORTING MATERIALS

**Logs:**
- Link to server logs: [path/link]
- Link to monitoring data: [path/link]
- Screenshots: [path/link]

**Communications:**
- Discord announcements: [links]
- Staff communications: [links]
- Player feedback: [links]

---

## ✅ APPROVAL & PUBLICATION

**Reviewed by:**
- [ ] Technical Lead: [Name] - [Date]
- [ ] Management: [Name] - [Date]

**Publication:**
- [ ] Internal (staff only)
- [ ] Public (redacted version)

**Published:** [Date]
**Location:** [docs/reference/post-mortems/YYYY-MM-DD-###.md]

---

**Fire + Frost + Foundation = Where Love Builds Legacy** 💙🔥❄️

---

**Template Version:** 1.0
**Last Updated:** 2026-02-17
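
When filling in the **Duration** field from the timeline, a small helper can take the first and last timestamps and produce the `[X hours Y minutes]` format. The following is a minimal sketch, not part of the template itself: the function name `incident_duration`, the `YYYY-MM-DD HH:MM` input format, and the example date are assumptions made for illustration. It treats inputs as Central Time (America/Chicago), as the timeline section specifies.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# The template states that all timeline times are Central Time.
CENTRAL = ZoneInfo("America/Chicago")


def incident_duration(start_local: str, end_local: str) -> str:
    """Return elapsed time between two Central Time timestamps.

    Inputs use the hypothetical format 'YYYY-MM-DD HH:MM'.
    Output follows the template's '[X hours Y minutes]' wording.
    """
    fmt = "%Y-%m-%d %H:%M"
    start = datetime.strptime(start_local, fmt).replace(tzinfo=CENTRAL)
    end = datetime.strptime(end_local, fmt).replace(tzinfo=CENTRAL)

    # Convert to UTC before subtracting so a daylight-saving
    # transition during the incident does not skew the result.
    elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    if elapsed.total_seconds() < 0:
        raise ValueError("end time is before start time")

    total_minutes = int(elapsed.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hours {minutes} minutes"


# Example timeline above: crash at 03:47, all clear at 04:20.
# The date is a placeholder; the example table does not give one.
print(incident_duration("2026-02-17 03:47", "2026-02-17 04:20"))
```

Passing full dates rather than bare `HH:MM` values lets the same helper cover incidents that run past midnight.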