- Add SKILL.md with 300+ lines of incident response playbook
- Implement incident_classifier.py: severity classification and response recommendations
- Implement timeline_reconstructor.py: event timeline reconstruction with phase analysis
- Implement pir_generator.py: comprehensive PIR generation with multiple RCA frameworks
- Add reference documentation: severity matrix, RCA frameworks, communication templates
- Add sample data files and expected outputs for testing
- All scripts are standalone with zero external dependencies
- Dual output formats: JSON + human-readable text
- Professional, opinionated defaults based on SRE best practices

This POWERFUL-tier skill provides end-to-end incident response capabilities from detection through post-incident review.
14 lines
610 B
JSON
{
  "description": "Database connection timeouts causing 500 errors for payment processing API. Users unable to complete checkout. Error rate spiked from 0.1% to 45% starting at 14:30 UTC. Database monitoring shows connection pool exhaustion with 200/200 connections active.",
  "service": "payment-api",
  "affected_users": "80%",
  "business_impact": "high",
  "duration_minutes": 95,
  "metadata": {
    "error_rate": "45%",
    "connection_pool_utilization": "100%",
    "affected_regions": ["us-west", "us-east", "eu-west"],
    "detection_method": "monitoring_alert",
    "customer_escalations": 12
  }
}
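
The sample above can be consumed with the standard library alone, in line with the zero-dependency goal of the scripts. The following is a minimal sketch of that idea, not the logic of incident_classifier.py: the `parse_percent` and `classify` helpers, the `SEV1`/`SEV2`/`SEV3` labels, and every threshold are hypothetical and chosen only for illustration. It uses a subset of the sample's fields and emits both a JSON line and a human-readable line.

```python
import json

# Subset of the sample incident above, embedded so the sketch is self-contained.
SAMPLE_INCIDENT = """
{
  "service": "payment-api",
  "affected_users": "80%",
  "business_impact": "high",
  "duration_minutes": 95,
  "metadata": {
    "error_rate": "45%",
    "customer_escalations": 12
  }
}
"""


def parse_percent(value):
    """Convert a percentage string such as "45%" to the float 45.0."""
    return float(value.strip().rstrip("%"))


def classify(incident):
    """Return a severity label.

    The thresholds are hypothetical; they are NOT taken from
    incident_classifier.py or the skill's severity matrix.
    """
    affected = parse_percent(incident["affected_users"])
    error_rate = parse_percent(incident["metadata"]["error_rate"])
    if affected >= 50 or error_rate >= 25:
        return "SEV1"
    if affected >= 10 or error_rate >= 5:
        return "SEV2"
    return "SEV3"


incident = json.loads(SAMPLE_INCIDENT)
severity = classify(incident)

# Dual output: machine-readable JSON first, then a plain-text summary.
report = {"service": incident["service"], "severity": severity}
print(json.dumps(report))
print(f"{incident['service']}: {severity} ({incident['duration_minutes']} min)")
```

With the sample values, 80% of users affected already exceeds the illustrative 50% cutoff, so this sketch labels the incident `SEV1`. Note that percentage fields arrive as strings and have to be parsed before any comparison.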