Engineering Metrics & KPIs Guide
Metrics Framework
DORA Metrics (DevOps Research and Assessment)
1. Deployment Frequency
- Definition: How often code is deployed to production
- Target:
  - Elite: Multiple deploys per day
  - High: Once per week to once per month
  - Medium: Once per month to once every six months
  - Low: Less than once every six months
- Measurement: Deployments per day/week/month
- Improvement: Smaller batch sizes, feature flags, CI/CD
2. Lead Time for Changes
- Definition: Time from code commit to production
- Target:
  - Elite: Less than 1 hour
  - High: 1 day to 1 week
  - Medium: 1 week to 1 month
  - Low: More than 1 month
- Measurement: Median time from commit to deploy
- Improvement: Automation, parallel testing, smaller changes
3. Mean Time to Recovery (MTTR)
- Definition: Time to restore service after incident
- Target:
  - Elite: Less than 1 hour
  - High: Less than 1 day
  - Medium: 1 day to 1 week
  - Low: More than 1 week
- Measurement: Average incident resolution time
- Improvement: Monitoring, rollback capability, runbooks
4. Change Failure Rate
- Definition: Percentage of changes causing failures
- Target:
  - Elite: 0-15%
  - High: 16-30%
  - Medium/Low: >30%
- Measurement: Failed deploys / Total deploys
- Improvement: Testing, code review, gradual rollouts
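The four definitions above reduce to simple arithmetic over deployment and incident records. The following is a minimal sketch of that arithmetic, not a reference implementation: the `Deployment` and `Incident` record shapes, the function names, and the sample data are all assumptions made for illustration, and a real pipeline would pull these timestamps from CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median


@dataclass
class Deployment:  # hypothetical record shape
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when it reached production
    failed: bool            # True if the deploy caused a production failure


@dataclass
class Incident:  # hypothetical record shape
    started_at: datetime
    resolved_at: datetime


def dora_summary(deploys, incidents, period_days):
    """Compute the four DORA metrics for one reporting period."""
    lead_hours = [(d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deploys]
    recovery_minutes = [(i.resolved_at - i.started_at).total_seconds() / 60 for i in incidents]
    return {
        "deploys_per_day": len(deploys) / period_days,
        # Median, matching the "Measurement" line for lead time.
        "median_lead_time_hours": median(lead_hours) if lead_hours else None,
        # Mean, matching the "Measurement" line for MTTR.
        "mttr_minutes": mean(recovery_minutes) if recovery_minutes else None,
        # Failed deploys / total deploys.
        "change_failure_rate": (
            sum(d.failed for d in deploys) / len(deploys) if deploys else None
        ),
    }


def change_failure_tier(rate):
    """Map a change failure rate (0.0 to 1.0) onto the tiers listed above."""
    if rate <= 0.15:
        return "Elite"
    if rate <= 0.30:
        return "High"
    return "Medium/Low"


# Made-up sample data covering a two-day period.
start = datetime(2024, 1, 1, 9, 0)
deploys = [
    Deployment(start, start + timedelta(hours=2), failed=False),
    Deployment(start, start + timedelta(hours=4), failed=True),
    Deployment(start, start + timedelta(hours=6), failed=False),
    Deployment(start, start + timedelta(hours=8), failed=False),
]
incidents = [
    Incident(start, start + timedelta(minutes=30)),
    Incident(start, start + timedelta(minutes=90)),
]
summary = dora_summary(deploys, incidents, period_days=2)
print(summary)
print(change_failure_tier(summary["change_failure_rate"]))
```

With this sample data the median lead time is 5 hours, MTTR is 60 minutes, and one failure in four deploys gives a 25% change failure rate, which falls in the High tier.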
Engineering Productivity Metrics
Code Quality
| Metric | Formula | Target | Action if Off Target |
|---|---|---|---|
| Test Coverage | Covered Lines / Total Lines | >80% | Add unit tests |
| Code Review Coverage | Reviewed PRs / Total PRs | 100% | Enforce review policy |
| Technical Debt Ratio | Debt Remediation Time / Development Time | <10% | Dedicate debt sprints |
| Cyclomatic Complexity | Per function/method | <10 | Refactor complex code |
| Code Duplication | Duplicate Lines / Total | <5% | Extract common code |
Development Velocity
| Metric | Formula | Target | Action if Off Target |
|---|---|---|---|
| Sprint Velocity | Story Points / Sprint | Stable ±10% | Review estimation |
| Cycle Time | Start to Done Time | <5 days | Reduce WIP |
| PR Merge Time | Open to Merge | <24 hours | Smaller PRs |
| Build Time | Code to Artifact | <10 minutes | Optimize pipeline |
| Test Execution Time | Full Test Suite | <30 minutes | Parallelize tests |
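The "Stable ±10%" target for sprint velocity can be read in more than one way. The sketch below assumes one interpretation: every recent sprint should land within 10% of the average across those sprints. The function name and sample numbers are illustrative only.

```python
from statistics import mean


def velocity_is_stable(sprint_points, tolerance=0.10):
    """True if every sprint is within +/- tolerance of the average velocity."""
    average = mean(sprint_points)
    return all(abs(points - average) <= tolerance * average for points in sprint_points)


# Made-up story point totals for recent sprints.
print(velocity_is_stable([140, 142, 150, 138]))  # every sprint is close to the 142.5 average
print(velocity_is_stable([100, 140, 90]))        # 140 is well above the 110 average
```

A team could just as reasonably compare each sprint against a rolling average of the previous three; the point is to agree on the definition before treating a swing as a signal to review estimation.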
Team Health
| Metric | Formula | Target | Action if Off Target |
|---|---|---|---|
| On-call Incidents | Incidents / Week | <5 | Improve monitoring |
| Bug Escape Rate | Prod Bugs / Total Bugs Found | <5% | Improve testing |
| Unplanned Work | Unplanned / Total | <20% | Better planning |
| Meeting Time | Meetings / Total Time | <20% | Reduce meetings |
| Focus Time | Uninterrupted Hours | >4h/day | Block calendars |
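The target tables above pair each metric with a threshold and a follow-up action, which maps naturally onto a small lookup. This is a sketch under assumptions: the metric keys (such as `test_coverage_pct`) are made-up names, only a handful of rows are encoded, and the comparison direction is taken from the `>` or `<` in each Target cell.

```python
import operator
from typing import Callable, NamedTuple


class Target(NamedTuple):
    compare: Callable[[float, float], bool]  # how a measured value must relate to the threshold
    threshold: float
    action: str                              # text from the action column


# A few rows from the tables above; the key names are hypothetical.
TARGETS = {
    "test_coverage_pct": Target(operator.gt, 80, "Add unit tests"),
    "cyclomatic_complexity": Target(operator.lt, 10, "Refactor complex code"),
    "code_duplication_pct": Target(operator.lt, 5, "Extract common code"),
    "pr_merge_hours": Target(operator.lt, 24, "Smaller PRs"),
    "build_minutes": Target(operator.lt, 10, "Optimize pipeline"),
    "unplanned_work_pct": Target(operator.lt, 20, "Better planning"),
}


def recommended_actions(measured):
    """Return {metric: action} for every measured value that misses its target."""
    actions = {}
    for name, value in measured.items():
        target = TARGETS.get(name)
        if target is not None and not target.compare(value, target.threshold):
            actions[name] = target.action
    return actions


# Made-up measurements for one team.
measured = {
    "test_coverage_pct": 72,
    "build_minutes": 8,
    "unplanned_work_pct": 25,
    "cyclomatic_complexity": 10,
}
print(recommended_actions(measured))
```

Note that a value exactly on the threshold (complexity of 10 against a `<10` target) counts as a miss here, because the tables state strict inequalities.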
Business Impact Metrics
System Performance
| Metric | Description | Target | Business Impact |
|---|---|---|---|
| Uptime | System availability | 99.9%+ | Revenue protection |
| Page Load Time | Time to interactive | <3s | User retention |
| API Response Time | P95 latency | <200ms | User experience |
| Error Rate | Errors / Requests | <0.1% | Customer satisfaction |
| Throughput | Requests / Second | Per requirement | Scalability |
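Targets such as 99.9% uptime and P95 latency under 200ms are easier to reason about once converted into concrete numbers. The sketch below shows three such conversions. The function names are assumptions, the percentile uses the nearest-rank method (one of several common definitions), and the sample latencies are invented.

```python
def downtime_budget_minutes(uptime_pct, period_days=30):
    """Minutes of downtime allowed in the period at a given uptime target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_pct) / 100


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer form of ceil(pct * n / 100)
    return ordered[rank - 1]


def error_rate_pct(errors, requests):
    """Errors as a percentage of total requests."""
    return 100 * errors / requests


# 99.9% uptime over 30 days leaves roughly 43 minutes of downtime.
print(round(downtime_budget_minutes(99.9), 1))

# Made-up latency samples in milliseconds: 10, 20, ..., 200.
latencies_ms = list(range(10, 210, 10))
print(percentile_nearest_rank(latencies_ms, 95))

print(error_rate_pct(errors=12, requests=20_000))
```

Monitoring tools often interpolate between samples when computing percentiles, so their P95 may differ slightly from the nearest-rank value for small sample sizes.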
Product Delivery
| Metric | Description | Target | Business Impact |
|---|---|---|---|
| Feature Delivery Rate | Features / Quarter | Per roadmap | Market competitiveness |
| Time to Market | Idea to Production | <3 months | First mover advantage |
| Customer Defect Rate | Customer Bugs / Month | <10 | Customer satisfaction |
| Feature Adoption | Feature Users / Active Users | >50% | ROI validation |
| NPS from Engineering | Customer Score | >50 | Product quality |
Metrics Dashboards
Executive Dashboard (Weekly)
┌─────────────────────────────────────┐
│ EXECUTIVE METRICS │
├─────────────────────────────────────┤
│ Uptime: 99.97% ✓ │
│ Sprint Velocity: 142 pts ✓ │
│ Deployment Frequency: 3.2/day ✓ │
│ Lead Time: 4.2 hrs ✓ │
│ MTTR: 47 min ✓ │
│ Change Failure Rate: 8.3% ✓ │
│ │
│ Team Health: 8.2/10 │
│ Tech Debt Ratio: 12% ⚠ │
│ Feature Delivery: 85% ✓ │
└─────────────────────────────────────┘
Team Dashboard (Daily)
┌─────────────────────────────────────┐
│ TEAM METRICS │
├─────────────────────────────────────┤
│ Current Sprint: │
│ Completed: 65/100 pts (65%) │
│ In Progress: 20 pts │
│ Days Left: 3 │
│ │
│ PR Queue: 8 pending │
│ Build Status: ✓ Passing │
│ Test Coverage: 82.3% │
│ Open Incidents: 2 (P2, P3) │
│ │
│ On-call Load: 3 pages this week │
└─────────────────────────────────────┘
Individual Dashboard (Daily)
┌─────────────────────────────────────┐
│ DEVELOPER METRICS │
├─────────────────────────────────────┤
│ This Week: │
│ PRs Merged: 8 │
│ Code Reviews: 12 │
│ Commits: 23 │
│ Focus Time: 22.5 hrs │
│ │
│ Quality: │
│ Test Coverage: 87% │
│ Code Review Feedback: 95% ✓ │
│ Bug Introduction Rate: 0% │
└─────────────────────────────────────┘
Implementation Guide
Phase 1: Foundation (Month 1)
- Basic Metrics
  - Deployment frequency
  - Build success rate
  - Uptime/availability
  - Team velocity
- Tools Setup
  - CI/CD instrumentation
  - Basic monitoring
  - Time tracking
Phase 2: Quality (Month 2)
- Quality Metrics
  - Test coverage
  - Code review metrics
  - Bug rates
  - Technical debt
- Tool Integration
  - Static analysis
  - Test reporting
  - Code quality gates
Phase 3: Performance (Month 3)
- Performance Metrics
  - DORA metrics complete
  - System performance
  - API metrics
  - Database metrics
- Advanced Monitoring
  - APM tools
  - Distributed tracing
  - Custom dashboards
Phase 4: Optimization (Ongoing)
- Advanced Analytics
  - Predictive metrics
  - Trend analysis
  - Anomaly detection
  - Correlation analysis
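Anomaly detection and correlation analysis do not have to start with specialized tooling. As a rough sketch, assuming a simple z-score rule for anomalies and a plain Pearson coefficient for correlation (both swapped in here for illustration, with hypothetical function names and invented data), the standard library is enough to get a first signal:

```python
import math
from statistics import mean, stdev


def find_anomalies(values, z_threshold=2.0):
    """Return the indexes of values more than z_threshold standard deviations from the mean."""
    if len(values) < 2:
        return []
    average = mean(values)
    spread = stdev(values)
    if spread == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - average) / spread > z_threshold]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    x_mean, y_mean = mean(xs), mean(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_var = sum((x - x_mean) ** 2 for x in xs)
    y_var = sum((y - y_mean) ** 2 for y in ys)
    return covariance / math.sqrt(x_var * y_var)


# Made-up weekly MTTR readings in minutes; the last week is an outlier.
weekly_mttr = [45, 47, 44, 46, 45, 48, 120]
print(find_anomalies(weekly_mttr))

# Made-up series: deploys per week against incidents per week.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

A single large outlier inflates the standard deviation it is measured against, so a z-score rule like this one works best as a first pass; median-based methods are more robust for short, noisy series.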
Metric Anti-patterns
What NOT to Measure
❌ Lines of Code: Encourages bloat
❌ Hours Worked: Promotes presenteeism
❌ Individual Velocity: Creates competition
❌ Bug Count Without Context: Discourages risk-taking
❌ Commit Count: Encourages tiny commits
Goodhart's Law
"When a measure becomes a target, it ceases to be a good measure"
Examples:
- Optimizing test coverage → Writing meaningless tests
- Reducing bug count → Not reporting bugs
- Increasing velocity → Inflating estimates
- Reducing meeting time → Skipping important discussions
How to Avoid Gaming
- Use Multiple Metrics: No single metric tells the whole story
- Focus on Trends: Not absolute numbers
- Combine Leading and Lagging: Balance predictive and historical
- Regular Review: Adjust metrics that are being gamed
- Team Ownership: Let teams choose their metrics
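To make "focus on trends" concrete, one option is to report the slope of a metric over recent periods alongside its latest value. The sketch below fits a least-squares line through evenly spaced readings; the function name and the sample coverage figures are assumptions for illustration.

```python
def trend_slope(values):
    """Least-squares slope of values against their position (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


# Made-up weekly test coverage: every reading beats an 80% target,
# yet coverage is falling by half a point per week.
weekly_coverage = [82.0, 81.5, 81.0, 80.5]
print(trend_slope(weekly_coverage))
```

Here the absolute number looks healthy in every week, while the negative slope shows the metric heading toward its threshold.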
OKR Framework for Engineering
Company Level OKRs
Objective: Deliver exceptional product quality
Key Results:
- KR1: Achieve 99.95% uptime (from 99.9%)
- KR2: Reduce customer-reported bugs by 50%
- KR3: Increase deployment frequency to 10 deploys per day
Engineering OKRs
Objective: Build scalable, reliable infrastructure
Key Results:
- KR1: Migrate 80% of services to Kubernetes
- KR2: Reduce MTTR to <30 minutes
- KR3: Achieve 85% test coverage
Team OKRs
Objective: Improve developer productivity
Key Results:
- KR1: Reduce build time to <5 minutes
- KR2: Automate 90% of deployment process
- KR3: Reduce PR review time to <4 hours
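Key results like these are usually tracked as progress from a baseline toward a target. One common scoring approach, sketched here as an assumption rather than a prescribed method, is the fraction of the baseline-to-target distance covered so far, clamped between 0 and 1. Apart from the 99.9% uptime baseline stated above, every baseline and current value in the example is invented.

```python
def kr_progress(baseline, current, target):
    """Fraction of the way from baseline to target, clamped to the range 0..1.

    Works for targets above the baseline (uptime) and below it (bug counts).
    """
    if target == baseline:
        raise ValueError("target must differ from baseline")
    fraction = (current - baseline) / (target - baseline)
    return min(1.0, max(0.0, fraction))


def objective_score(key_results):
    """Average progress across (baseline, current, target) tuples."""
    return sum(kr_progress(*kr) for kr in key_results) / len(key_results)


# Company-level key results as (baseline, current, target).
company_krs = [
    (99.9, 99.93, 99.95),  # uptime %; baseline from KR1, current value is made up
    (40, 30, 20),          # customer-reported bugs; hypothetical baseline of 40, halved
    (3.2, 6.6, 10),        # deploys per day; hypothetical baseline and current value
]
for kr in company_krs:
    print(round(kr_progress(*kr), 2))
print(round(objective_score(company_krs), 2))
```

Clamping keeps a regression below the baseline from producing a negative score and an overshoot from masking a lagging key result in the average.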
Reporting Templates
Monthly Engineering Report
# Engineering Report - [Month Year]
## Executive Summary
- Key Achievement: [Highlight]
- Main Challenge: [Issue and resolution]
- Next Month Focus: [Priority]
## DORA Metrics
| Metric | This Month | Last Month | Target | Status |
|--------|------------|------------|--------|--------|
| Deploy Frequency | X/day | Y/day | Z/day | ✓/⚠/✗ |
| Lead Time | X hrs | Y hrs | <Z hrs | ✓/⚠/✗ |
| MTTR | X min | Y min | <Z min | ✓/⚠/✗ |
| Change Failure | X% | Y% | <Z% | ✓/⚠/✗ |
## Team Performance
- Velocity: X story points (Y% of plan)
- Sprint Completion: X%
- Unplanned Work: X%
## Quality Metrics
- Test Coverage: X% (Δ Y%)
- Customer Bugs: X (Δ Y)
- Code Review Coverage: X%
## Highlights
1. [Major feature or improvement]
2. [Technical achievement]
3. [Process improvement]
## Challenges & Solutions
1. Challenge: [Issue]
Solution: [Action taken]
## Next Month Priorities
1. [Priority 1]
2. [Priority 2]
3. [Priority 3]
Quarterly Business Review
# Engineering QBR - Q[X] [Year]
## Strategic Alignment
- Business Goal: [Goal]
- Engineering Contribution: [How engineering supported]
- Impact: [Measurable outcome]
## Quarterly Metrics
### Delivery
- Features Shipped: X of Y planned (Z%)
- Major Releases: [List]
- Technical Debt Reduced: X%
### Reliability
- Uptime: X%
- Incidents: X (Y critical, Z major)
- Customer Impact: [Description]
### Efficiency
- Cost per Transaction: $X (Δ Y%)
- Infrastructure Cost: $X (Δ Y%)
- Engineering Cost per Feature: $X
## Team Growth
- Headcount: Start: X → End: Y
- Attrition: X%
- Key Hires: [Roles]
## Innovation
- Patents Filed: X
- Open Source Contributions: X
- Hackathon Projects: X
## Lessons Learned
1. [What worked well]
2. [What didn't work]
3. [What we're changing]
## Next Quarter Focus
1. [Strategic Initiative 1]
2. [Strategic Initiative 2]
3. [Strategic Initiative 3]
Tool Recommendations
Metrics Collection
- Datadog: Comprehensive monitoring
- New Relic: Application performance
- Grafana + Prometheus: Open source stack
- CloudWatch: AWS native
Engineering Analytics
- LinearB: Developer productivity
- Code Climate Velocity: Engineering metrics
- Sleuth: DORA metrics
- Swarmia: Engineering insights
Project Tracking
- Jira: Issue tracking
- Linear: Modern issue tracking
- Azure DevOps: Microsoft ecosystem
- GitHub Projects: Integrated with code
Incident Management
- PagerDuty: On-call management
- Opsgenie: Incident response
- Statuspage: Status communication
- FireHydrant: Incident command
Success Indicators
Healthy Engineering Organization
✓ DORA metrics improving quarter-over-quarter
✓ Team satisfaction >8/10
✓ Attrition <10% annually
✓ On-time delivery >80%
✓ Technical debt <15% of capacity
✓ Innovation time >20%
Warning Signs
⚠️ Increasing MTTR trend
⚠️ Declining velocity
⚠️ Rising bug escape rate
⚠️ Increasing unplanned work
⚠️ Growing PR queue
⚠️ Decreasing test coverage
Crisis Indicators
🚨 Multiple production incidents per week
🚨 Team satisfaction <6/10
🚨 Attrition >20% annually
🚨 Technical debt >30% of capacity
🚨 No deployments for >1 week
🚨 Customer escalations increasing
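Several of the indicators above come with explicit numeric thresholds, so they can be checked mechanically. The sketch below covers the three that have both a healthy and a crisis threshold. It assumes a middle "watch" label for values that fall between the two, which is not a term from the lists above, and the metric key names are hypothetical.

```python
def classify(value, healthy, crisis):
    """Classify one indicator against its healthy and crisis thresholds.

    If healthy > crisis, higher values are better; otherwise lower is better.
    """
    if healthy > crisis:
        if value > healthy:
            return "healthy"
        if value < crisis:
            return "crisis"
    else:
        if value < healthy:
            return "healthy"
        if value > crisis:
            return "crisis"
    return "watch"  # assumed label for the band between the two thresholds


# (healthy threshold, crisis threshold) taken from the lists above.
THRESHOLDS = {
    "team_satisfaction": (8, 6),   # >8/10 healthy, <6/10 crisis
    "attrition_pct": (10, 20),     # <10% healthy, >20% crisis
    "tech_debt_pct": (15, 30),     # <15% healthy, >30% crisis
}

SEVERITY = ["healthy", "watch", "crisis"]


def org_health(measurements):
    """Return per-indicator statuses and the worst status overall."""
    statuses = {
        name: classify(value, *THRESHOLDS[name]) for name, value in measurements.items()
    }
    overall = max(statuses.values(), key=SEVERITY.index)
    return statuses, overall


# Made-up measurements for one organization.
statuses, overall = org_health(
    {"team_satisfaction": 8.2, "attrition_pct": 12, "tech_debt_pct": 35}
)
print(statuses)
print(overall)
```

Reporting the worst status as the overall result is a deliberate choice in this sketch: one crisis-level indicator should not be averaged away by healthy ones.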