A comprehensive incident response framework providing structured tools for managing technology incidents from detection through resolution and post-incident review.

Overview

This skill implements incident response practices drawn from SRE and DevOps teams operating at scale. It provides:

Automated Severity Classification - Intelligent incident triage
Timeline Reconstruction - Transform scattered events into coherent narratives
Post-Incident Review Generation - Structured PIRs with RCA frameworks
Communication Templates - Pre-built stakeholder communication
Comprehensive Documentation - Reference guides for incident response

Quick Start

Classify an Incident

# From JSON file
python scripts/incident_classifier.py --input incident.json --format text

# From stdin text
echo "Database is down affecting all users" | python scripts/incident_classifier.py --format text

# Interactive mode
python scripts/incident_classifier.py --interactive

Reconstruct Timeline

# Analyze event timeline
python scripts/timeline_reconstructor.py --input events.json --format text

# With gap analysis
python scripts/timeline_reconstructor.py --input events.json --gap-analysis --format markdown

Generate PIR Document

# Basic PIR
python scripts/pir_generator.py --incident incident.json --format markdown

# Comprehensive PIR with timeline
python scripts/pir_generator.py --incident incident.json --timeline timeline.json --rca-method fishbone

Scripts

incident_classifier.py

Purpose: Analyzes incident descriptions and provides severity classification, team recommendations, and response templates.

Input: JSON object with incident details, or a plain text description
Output: JSON plus a human-readable classification report

Example Input:

{
  "description": "Database connection timeouts causing 500 errors",
  "service": "payment-api",
  "affected_users": "80%",
  "business_impact": "high"
}

Key Features:

SEV1-4 severity classification
Recommended response teams
Initial action prioritization
Communication templates
Response timelines
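To make the idea of severity classification concrete, here is a minimal sketch of a rule-based classifier over the example input fields. It is not the logic used by incident_classifier.py: the function names, the keyword list, and the percentage thresholds are all assumptions chosen for illustration.

```python
def parse_percent(value):
    """Convert a value such as "80%" or 80 to a float; return None if unparseable."""
    try:
        return float(str(value).strip().rstrip("%"))
    except ValueError:
        return None


def classify(incident):
    """Toy heuristic. Keywords and thresholds are illustrative assumptions,
    not the rules implemented by incident_classifier.py."""
    pct = parse_percent(incident.get("affected_users"))
    words = set(str(incident.get("description") or "").lower().split())

    # Outage wording or near-total user impact is treated as SEV1.
    if words & {"down", "outage"} or (pct is not None and pct >= 90):
        return "SEV1"
    # Majority impact or explicitly high business impact is treated as SEV2.
    if (pct is not None and pct >= 50) or incident.get("business_impact") == "high":
        return "SEV2"
    # A noticeable minority of users is treated as SEV3.
    if pct is not None and pct >= 5:
        return "SEV3"
    return "SEV4"


example = {
    "description": "Database connection timeouts causing 500 errors",
    "service": "payment-api",
    "affected_users": "80%",
    "business_impact": "high",
}
print(classify(example))
```

With these assumed thresholds the example input lands in SEV2, because 80% is below the 90% cutoff and the description contains no outage keyword. Real triage rules should follow references/incident_severity_matrix.md.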

timeline_reconstructor.py

Purpose: Reconstructs incident timelines from timestamped events, identifies phases, and performs gap analysis.

Input: JSON array of timestamped events
Output: Formatted timeline with phase analysis and metrics

Example Input:

[
  {
    "timestamp": "2024-01-01T12:00:00Z",
    "source": "monitoring",
    "message": "High error rate detected",
    "severity": "critical",
    "actor": "system"
  }
]

Key Features:

Phase detection (detection → triage → mitigation → resolution)
Duration analysis
Gap identification
Communication effectiveness analysis
Response metrics
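Gap identification can be pictured as sorting events by timestamp and flagging long silences between neighbours. The sketch below works on events shaped like the example input; the function names and the 15-minute threshold are assumptions, and the actual script may detect gaps differently.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts):
    """Parse timestamps of the form 2024-01-01T12:00:00Z as UTC datetimes."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def find_gaps(events, threshold=timedelta(minutes=15)):
    """Return (previous_ts, next_ts, delta) for each silence longer than threshold.

    The threshold default is an illustrative assumption.
    """
    ordered = sorted(events, key=lambda e: parse_ts(e["timestamp"]))
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        delta = parse_ts(curr["timestamp"]) - parse_ts(prev["timestamp"])
        if delta > threshold:
            gaps.append((prev["timestamp"], curr["timestamp"], delta))
    return gaps


events = [
    {"timestamp": "2024-01-01T12:40:00Z", "message": "Mitigation applied"},
    {"timestamp": "2024-01-01T12:00:00Z", "message": "High error rate detected"},
    {"timestamp": "2024-01-01T12:05:00Z", "message": "On-call acknowledged"},
]
for start, end, delta in find_gaps(events):
    print("No events between {} and {} ({})".format(start, end, delta))
```

Sorting first means the input does not need to arrive in chronological order, which is common when events come from several sources.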

pir_generator.py

Purpose: Generates comprehensive Post-Incident Review documents with multiple RCA frameworks.

Input: Incident data JSON, optional timeline data
Output: Structured PIR document with RCA analysis

Key Features:

Multiple RCA methods (5 Whys, Fishbone, Timeline, Bow Tie)
Automated action item generation
Lessons learned categorization
Follow-up planning
Completeness assessment
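As a rough picture of what a 5 Whys section contains, the sketch below renders a problem statement and a chain of answers as plain text. The function name and the example chain are hypothetical and only illustrate the structure; pir_generator.py produces its own format.

```python
def render_five_whys(problem, answers):
    """Render a 5 Whys chain as text; the last answer is the root cause candidate."""
    lines = ["Problem: " + problem]
    for i, answer in enumerate(answers, start=1):
        lines.append("Why {}? {}".format(i, answer))
    if answers:
        lines.append("Root cause candidate: " + answers[-1])
    return "\n".join(lines)


# Hypothetical chain, written only to show the shape of the output.
print(render_five_whys(
    "Payment API returned 500 errors",
    [
        "Database connections timed out",
        "The connection pool was exhausted",
        "Connections were not released after failed requests",
    ],
))
```

The method does not require exactly five steps; the chain stops when an answer is something the team can act on.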

Sample Data

The assets/ directory contains sample data files for testing:

sample_incident_classification.json - Database connection pool exhaustion incident
sample_timeline_events.json - Complete timeline with 21 events across phases
sample_incident_pir_data.json - Comprehensive incident data for PIR generation
simple_incident.json - Minimal incident for basic testing
simple_timeline_events.json - Simple 4-event timeline

Expected Outputs

The expected_outputs/ directory contains reference outputs showing what each script produces:

incident_classification_text_output.txt - Detailed classification report
timeline_reconstruction_text_output.txt - Complete timeline analysis
pir_markdown_output.md - Full PIR document
simple_incident_classification.txt - Basic classification example
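One way to use these reference files is as a regression check: run a script on the matching sample input and diff the result against the stored output. The helper below is a sketch using only the standard library; its name is an assumption, and it may report spurious differences if a script's output includes run-specific values such as generation timestamps.

```python
import difflib


def diff_against_expected(actual, expected):
    """Return unified-diff lines between two outputs; an empty list means they match.

    Trailing whitespace and leading/trailing blank lines are ignored.
    """
    def normalize(text):
        return [line.rstrip() for line in text.strip().splitlines()]

    return list(difflib.unified_diff(
        normalize(expected),
        normalize(actual),
        fromfile="expected",
        tofile="actual",
        lineterm="",
    ))


# Inline strings stand in for file contents to keep the example self-contained.
for line in diff_against_expected("Severity: SEV2\n", "Severity: SEV1\n"):
    print(line)
```

In practice the two arguments would be the captured stdout of a script and the contents of the corresponding file in expected_outputs/.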

Reference Documentation

references/incident_severity_matrix.md

Complete severity classification system with:

SEV1-4 definitions and criteria
Response requirements and timelines
Escalation paths
Communication requirements
Decision trees and examples

references/rca_frameworks_guide.md

Detailed guide for root cause analysis:

5 Whys methodology
Fishbone (Ishikawa) diagram analysis
Timeline analysis techniques
Bow Tie analysis for high-risk incidents
Framework selection guidelines

references/communication_templates.md

Standardized communication templates:

Severity-specific notification templates
Stakeholder-specific messaging
Escalation communications
Resolution notifications
Customer communication guidelines

Usage Patterns

End-to-End Incident Workflow

Initial Classification

echo "Payment API returning 500 errors for 70% of requests" | \
  python scripts/incident_classifier.py --format text

Timeline Reconstruction (after collecting events)

python scripts/timeline_reconstructor.py \
  --input events.json \
  --gap-analysis \
  --format json \
  --output timeline.json

PIR Generation (after incident resolution)

python scripts/pir_generator.py \
  --incident incident.json \
  --timeline timeline.json \
  --rca-method fishbone \
  --output pir.md

Integration Examples

CI/CD Pipeline Integration:

# Classify deployment issues
cat deployment_error.log | python scripts/incident_classifier.py --format json

Monitoring Integration:

# Process alert events
curl -s "monitoring-api/events" | python scripts/timeline_reconstructor.py --format text

Runbook Generation: Use classification output to automatically select appropriate runbooks and escalation procedures.
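A minimal sketch of runbook selection follows. It assumes the classifier's JSON output contains a top-level "severity" field, which this README does not confirm, and the runbook paths are hypothetical placeholders.

```python
import json

# Hypothetical runbook locations; replace with your organization's own.
RUNBOOKS = {
    "SEV1": "runbooks/major_outage.md",
    "SEV2": "runbooks/major_degradation.md",
    "SEV3": "runbooks/minor_impact.md",
    "SEV4": "runbooks/low_impact.md",
}
DEFAULT_RUNBOOK = "runbooks/general_triage.md"


def select_runbook(classification_json):
    """Pick a runbook from classifier JSON output.

    Assumes a top-level "severity" key (an assumption about the output schema).
    Falls back to the default runbook on malformed or unexpected input.
    """
    try:
        data = json.loads(classification_json)
    except ValueError:
        return DEFAULT_RUNBOOK
    if not isinstance(data, dict):
        return DEFAULT_RUNBOOK
    severity = str(data.get("severity", "")).upper()
    return RUNBOOKS.get(severity, DEFAULT_RUNBOOK)


print(select_runbook('{"severity": "SEV2"}'))
```

Check the real field names with `--format json` before wiring this into automation.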

Quality Standards

Zero External Dependencies - All scripts use only Python standard library
Dual Output Format - Both JSON (machine-readable) and text (human-readable)
Robust Input Handling - Graceful handling of missing or malformed data
Professional Defaults - Opinionated, battle-tested configurations
Comprehensive Testing - Sample data and expected outputs included
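Graceful input handling, for a tool that accepts either JSON or plain text, can be sketched as "try JSON first, fall back to treating the input as a description". The function below is an illustration under that assumption and is not taken from the scripts themselves.

```python
import json


def load_incident(raw):
    """Return an incident dict from raw input that may be JSON or free text.

    Anything that is not a JSON object is treated as a plain text description.
    """
    text = raw.strip()
    try:
        data = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return {"description": text}
    if isinstance(data, dict):
        return data
    return {"description": text}


print(load_incident('{"service": "payment-api", "affected_users": "80%"}'))
print(load_incident("Database is down affecting all users\n"))
```

Falling back instead of raising keeps piped usage working when the upstream command emits log lines rather than structured data.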

Technical Requirements

Python 3.6+
No external dependencies required
Works with standard Unix tools (pipes, redirection)
Cross-platform compatible

Severity Classification Reference

Severity	Description	Response Time	Update Frequency
SEV1	Complete outage	5 minutes	Every 15 minutes
SEV2	Major degradation	15 minutes	Every 30 minutes
SEV3	Minor impact	2 hours	At milestones
SEV4	Low impact	1-2 days	Weekly
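The same table can be expressed as a lookup structure, for example to compute the time by which a first response is due. This is a sketch: the names are assumptions, and the SEV4 entry uses the upper bound of the "1-2 days" range.

```python
from datetime import datetime, timedelta

# Values copied from the table above; SEV4 uses the upper bound of "1-2 days".
SEVERITY_POLICY = {
    "SEV1": {"description": "Complete outage",
             "response_time": timedelta(minutes=5),
             "update_frequency": "Every 15 minutes"},
    "SEV2": {"description": "Major degradation",
             "response_time": timedelta(minutes=15),
             "update_frequency": "Every 30 minutes"},
    "SEV3": {"description": "Minor impact",
             "response_time": timedelta(hours=2),
             "update_frequency": "At milestones"},
    "SEV4": {"description": "Low impact",
             "response_time": timedelta(days=2),
             "update_frequency": "Weekly"},
}


def response_deadline(severity, detected_at):
    """Return the latest time a first response is due for the given severity."""
    return detected_at + SEVERITY_POLICY[severity]["response_time"]


detected = datetime(2024, 1, 1, 12, 0, 0)
print(response_deadline("SEV1", detected))  # 2024-01-01 12:05:00
```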

Getting Help

Each script includes comprehensive help:

python scripts/incident_classifier.py --help
python scripts/timeline_reconstructor.py --help
python scripts/pir_generator.py --help

For methodology questions, refer to the reference documentation in the references/ directory.

Contributing

When adding new features:

Maintain zero external dependencies
Add comprehensive examples to assets/
Update expected outputs in expected_outputs/
Follow the established patterns for argument parsing and output formatting

License

This skill is part of the claude-skills repository. See the main repository LICENSE for details.