firefrost-gaming/claude-skills-reference


Leo f6f50f5282 Fix CI workflows and installation documentation

- Replace non-existent anthropics/claude-code-action@v1 with direct bash steps in smart-sync.yml and pr-issue-auto-close.yml
- Add missing checkout steps to both workflows for WORKFLOW_KILLSWITCH access
- Fix Issue #189: Replace broken 'npx ai-agent-skills install' with working 'npx agent-skills-cli add' command
- Update README.md and INSTALLATION.md with correct Agent Skills CLI commands and repository links
- Verified: agent-skills-cli detects all 53 skills and works with 42+ AI agents

Fixes: Two GitHub Actions workflows that broke on PR #191 merge
Closes: #189

2026-02-16 11:30:18 +00:00

4.5 KiB


Incident Report: [INC-YYYY-NNNN] [Title]

Severity: SEV[1-4]
Status: [Active | Mitigated | Resolved]
Incident Commander: [Name]
Date: [YYYY-MM-DD]
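The header placeholders imply a fixed shape for the incident ID, severity, and status. A minimal sketch of a checker for those fields follows. The function name `validate_header` is hypothetical, and the assumption that both `YYYY` and `NNNN` are exactly four digits is inferred from the placeholder, not stated by the template.

```python
import re

# Assumed shapes, inferred from the placeholders in the report header.
INCIDENT_ID = re.compile(r"INC-\d{4}-\d{4}")  # assumption: 4-digit year, 4-digit sequence
SEVERITIES = {"SEV1", "SEV2", "SEV3", "SEV4"}
STATUSES = {"Active", "Mitigated", "Resolved"}


def validate_header(incident_id: str, severity: str, status: str) -> list[str]:
    """Return a list of problems; an empty list means the header looks valid."""
    problems = []
    if not INCIDENT_ID.fullmatch(incident_id):
        problems.append(f"bad incident id: {incident_id!r}")
    if severity not in SEVERITIES:
        problems.append(f"bad severity: {severity!r}")
    if status not in STATUSES:
        problems.append(f"bad status: {status!r}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer see every header issue in a single pass.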

Executive Summary

[2-3 sentence summary of the incident: what happened, impact scope, resolution status. Written for an executive audience: no jargon, focus on business impact.]

Impact Statement

Metric	Value
Duration	[X hours Y minutes]
Affected Users	[number or percentage]
Failed Transactions	[number]
Revenue Impact	$[amount]
Data Loss	[Yes/No — if yes, detail below]
SLA Impact	[X.XX% availability for period]
Affected Regions	[list regions]
Affected Services	[list services]
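The SLA Impact row asks for an availability percentage over a period. One common way to derive it is uptime divided by period length. The sketch below assumes the whole incident duration counts as downtime and that the SLA period is supplied by the caller; the template does not define either, and `availability_percent` is a hypothetical helper name.

```python
from datetime import timedelta


def availability_percent(downtime: timedelta, period: timedelta) -> float:
    """Availability over `period`, treating all of `downtime` as full outage (assumption)."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    if downtime < timedelta(0) or downtime > period:
        raise ValueError("downtime must be between zero and the period length")
    # Dividing two timedeltas yields a plain float ratio.
    return 100.0 * (1 - downtime / period)
```

For example, `round(availability_percent(timedelta(hours=2, minutes=15), timedelta(days=30)), 2)` gives `99.69`, which is the value that would go in the X.XX% slot for a 30-day period.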

Customer-Facing Impact

[Describe what customers experienced: error messages, degraded functionality, complete outage. Be specific about which user journeys were affected.]

Timeline

Time (UTC)	Phase	Event
HH:MM	Detection	[First alert or report]
HH:MM	Declaration	[Incident declared, channel created]
HH:MM	Investigation	[Key investigation findings]
HH:MM	Mitigation	[Mitigation action taken]
HH:MM	Resolution	[Permanent fix applied]
HH:MM	Closure	[Incident closed, monitoring confirmed stable]

Key Decision Points

[HH:MM] [Decision] — [Rationale and outcome]
[HH:MM] [Decision] — [Rationale and outcome]

Timeline Gaps

[Note any periods >15 minutes without logged events. These represent potential blind spots in the response.]
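The gap check described above can be automated once the timeline timestamps are available as `datetime` values. This is a small sketch, not part of the template: the function name `find_timeline_gaps` is hypothetical, and the 15-minute threshold is taken from the note above.

```python
from datetime import datetime, timedelta


def find_timeline_gaps(
    timestamps: list[datetime],
    threshold: timedelta = timedelta(minutes=15),
) -> list[tuple[datetime, datetime]]:
    """Return (previous, next) pairs of consecutive events more than `threshold` apart."""
    ordered = sorted(timestamps)
    # Pair each event with its successor and keep only the pairs that exceed the threshold.
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > threshold
    ]
```

Each returned pair marks the start and end of a stretch with no logged events. A gap of exactly 15 minutes is not flagged, matching the ">15 minutes" wording.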

Root Cause Analysis

Root Cause

[Clear, specific statement of the root cause. Not "human error" — describe the systemic failure.]

Contributing Factors

[Factor Category: Process/Tooling/Human/Environment] — [Description]
[Factor Category] — [Description]
[Factor Category] — [Description]

5-Whys Analysis

Why did the service degrade? → [Answer]

Why did [answer above] happen? → [Answer]

Why did [answer above] happen? → [Answer]

Why did [answer above] happen? → [Answer]

Why did [answer above] happen? → [Root systemic cause]

Response Metrics

Metric	Value	Target	Status
Time to Detect (feeds MTTD)	[X min]	<5 min	[Met/Missed]
Time to Declare	[X min]	<10 min	[Met/Missed]
Time to Mitigate	[X min]	<60 min (SEV1)	[Met/Missed]
Time to Resolve (feeds MTTR)	[X min]	<4 hr (SEV1)	[Met/Missed]
Postmortem Timeliness	[X hours]	<72 hr	[Met/Missed]
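The detect, declare, mitigate, and resolve rows can be filled in from timeline timestamps. The sketch below makes several assumptions the template does not spell out: detection, mitigation, and resolution are measured from the incident start, declaration is measured from detection, and the SEV1 targets from the table apply. All names in it are hypothetical.

```python
from datetime import datetime, timedelta

# Targets copied from the table above; mitigate and resolve targets are the SEV1 values.
SEV1_TARGETS = {
    "time_to_detect": timedelta(minutes=5),
    "time_to_declare": timedelta(minutes=10),
    "time_to_mitigate": timedelta(minutes=60),
    "time_to_resolve": timedelta(hours=4),
}


def response_metrics(
    started: datetime,
    detected: datetime,
    declared: datetime,
    mitigated: datetime,
    resolved: datetime,
) -> dict[str, tuple[timedelta, str]]:
    """Map each metric name to (measured duration, 'Met' or 'Missed')."""
    measured = {
        "time_to_detect": detected - started,    # assumption: measured from incident start
        "time_to_declare": declared - detected,  # assumption: measured from detection
        "time_to_mitigate": mitigated - started,
        "time_to_resolve": resolved - started,
    }
    # Targets are strict upper bounds ("<5 min"), so equality counts as Missed.
    return {
        name: (value, "Met" if value < SEV1_TARGETS[name] else "Missed")
        for name, value in measured.items()
    }
```

If your organization anchors these intervals differently (for example, mitigation measured from declaration), adjust the subtractions accordingly; the Met/Missed comparison stays the same.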

Action Items

#	Priority	Action	Owner	Deadline	Type	Status
1	P1	[Action description]	[owner]	[date]	Detection	Open
2	P1	[Action description]	[owner]	[date]	Prevention	Open
3	P2	[Action description]	[owner]	[date]	Prevention	Open
4	P2	[Action description]	[owner]	[date]	Process	Open

Action Item Types

Detection: Improve ability to detect this class of issue faster
Prevention: Prevent this class of issue from occurring
Mitigation: Reduce impact when this class of issue occurs
Process: Improve response process and coordination
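If action items are tracked in a script or tool rather than by hand, the columns and types above map naturally onto a small record type. This is an illustrative sketch only: the `ActionItem` class and `open_by_type` helper are hypothetical, and the field names simply mirror the table columns.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# The four action item types listed above.
ACTION_TYPES = {"Detection", "Prevention", "Mitigation", "Process"}


@dataclass
class ActionItem:
    number: int
    priority: str
    action: str
    owner: str
    deadline: date
    type: str
    status: str = "Open"

    def __post_init__(self) -> None:
        # Reject types that are not in the documented list.
        if self.type not in ACTION_TYPES:
            raise ValueError(f"unknown action item type: {self.type!r}")


def open_by_type(items: list[ActionItem]) -> Counter:
    """Count items whose status is still 'Open', grouped by type."""
    return Counter(item.type for item in items if item.status == "Open")
```

A count by type makes it easy to spot a postmortem whose follow-ups are all Prevention with nothing improving Detection, or the reverse.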

Lessons Learned

What Went Well

[Specific positive outcome from the response]
[Specific positive outcome]

What Didn't Go Well

[Specific area for improvement]
[Specific area for improvement]

Where We Got Lucky

[Things that could have made this worse but didn't]

Communication Log

Time (UTC)	Channel	Audience	Summary
HH:MM	Status Page	External	[Summary of update]
HH:MM	Slack #exec	Internal	[Summary of update]
HH:MM	Email	Customers	[Summary of notification]

Participants

Name	Role
[Name]	Incident Commander
[Name]	Operations Lead
[Name]	Communications Lead
[Name]	Subject Matter Expert

Appendix

Related Incidents

[INC-YYYY-NNNN] — [Brief description of related incident]

Reference Links

[Link to monitoring dashboard]
[Link to deployment logs]
[Link to incident channel archive]

This report follows the blameless postmortem principle. The goal is systemic improvement, not individual accountability. All contributing factors should trace to process, tooling, or environmental gaps that can be addressed with concrete action items.
