claude-skills-reference/c-level-advisor/cto-advisor/references/engineering_metrics.md
Reza Rezvani 619f7be887 feat: add C-level advisor skills (CEO & CTO) and packaged skill archives
2025-10-19 06:06:54 +02:00


# Engineering Metrics & KPIs Guide
## Metrics Framework
### DORA Metrics (DevOps Research and Assessment)
#### 1. Deployment Frequency
- **Definition**: How often code is deployed to production
- **Target**:
  - Elite: Multiple deploys per day
  - High: Once per week to once per month
  - Medium: Once per month to once every six months
  - Low: Less than once every six months
- **Measurement**: Deployments per day/week/month
- **Improvement**: Smaller batch sizes, feature flags, CI/CD
#### 2. Lead Time for Changes
- **Definition**: Time from code commit to production
- **Target**:
  - Elite: Less than 1 hour
  - High: 1 day to 1 week
  - Medium: 1 week to 1 month
  - Low: More than 1 month
- **Measurement**: Median time from commit to deploy
- **Improvement**: Automation, parallel testing, smaller changes
#### 3. Mean Time to Recovery (MTTR)
- **Definition**: Time to restore service after incident
- **Target**:
  - Elite: Less than 1 hour
  - High: Less than 1 day
  - Medium: 1 day to 1 week
  - Low: More than 1 week
- **Measurement**: Average incident resolution time
- **Improvement**: Monitoring, rollback capability, runbooks
#### 4. Change Failure Rate
- **Definition**: Percentage of changes causing failures
- **Target**:
  - Elite: 0-15%
  - High: 16-30%
  - Medium/Low: >30%
- **Measurement**: Failed deploys / Total deploys
- **Improvement**: Testing, code review, gradual rollouts
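The four definitions above can be computed from two event logs: deployments and incidents. The following is a minimal sketch, assuming records shaped like the hypothetical `Deployment` and `Incident` classes below; the field names are illustrative and do not come from any particular CI/CD or incident tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median


@dataclass
class Deployment:
    committed_at: datetime  # hypothetical field: commit time of the change
    deployed_at: datetime   # hypothetical field: time it reached production
    failed: bool = False    # True if the deploy caused a production failure


@dataclass
class Incident:
    started_at: datetime
    resolved_at: datetime


def dora_summary(deploys, incidents, period_days):
    """Compute the four DORA metrics for one reporting period."""
    lead_hours = [(d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deploys]
    recovery_minutes = [(i.resolved_at - i.started_at).total_seconds() / 60 for i in incidents]
    return {
        "deploys_per_day": len(deploys) / period_days,
        # Median, matching the "Measurement" line for lead time.
        "median_lead_time_hours": median(lead_hours) if lead_hours else None,
        # Mean, matching "average incident resolution time".
        "mttr_minutes": mean(recovery_minutes) if recovery_minutes else None,
        # Failed deploys / total deploys.
        "change_failure_rate": sum(d.failed for d in deploys) / len(deploys) if deploys else None,
    }


# Illustrative data only.
t0 = datetime(2025, 1, 6, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0, t0 + timedelta(hours=4), failed=True),
    Deployment(t0, t0 + timedelta(hours=6)),
    Deployment(t0, t0 + timedelta(hours=8)),
]
incidents = [
    Incident(t0, t0 + timedelta(minutes=30)),
    Incident(t0, t0 + timedelta(minutes=60)),
]
summary = dora_summary(deploys, incidents, period_days=2)
print(summary)
```

Empty inputs return `None` rather than zero, so a period with no incidents is not reported as a zero-minute MTTR.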
### Engineering Productivity Metrics
#### Code Quality
| Metric | Formula | Target | Action if Off Target |
|--------|---------|--------|----------------------|
| Test Coverage | Covered Lines / Total Lines | >80% | Add unit tests |
| Code Review Coverage | Reviewed PRs / Total PRs | 100% | Enforce review policy |
| Technical Debt Ratio | Remediation Effort / Development Effort | <10% | Dedicate debt sprints |
| Cyclomatic Complexity | Per function/method | <10 | Refactor complex code |
| Code Duplication | Duplicate Lines / Total Lines | <5% | Extract common code |
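One way to act on the table above is a small quality gate that maps each missed target to its recommended action. This is a sketch only: the metric keys (`test_coverage`, `tech_debt_ratio`, and so on) are hypothetical names, and ratios are assumed to be expressed as fractions between 0 and 1.

```python
# (comparison, threshold, action) per metric; values mirror the Code Quality table.
# The dictionary keys are hypothetical names chosen for this sketch.
QUALITY_GATES = {
    "test_coverage": (">", 0.80, "Add unit tests"),
    "review_coverage": (">=", 1.00, "Enforce review policy"),
    "tech_debt_ratio": ("<", 0.10, "Dedicate debt sprints"),
    "max_cyclomatic_complexity": ("<", 10, "Refactor complex code"),
    "duplication": ("<", 0.05, "Extract common code"),
}

_CHECKS = {
    ">": lambda value, threshold: value > threshold,
    ">=": lambda value, threshold: value >= threshold,
    "<": lambda value, threshold: value < threshold,
}


def quality_actions(measured):
    """Return the recommended action for every measured metric that misses its target."""
    actions = {}
    for name, (op, threshold, action) in QUALITY_GATES.items():
        if name in measured and not _CHECKS[op](measured[name], threshold):
            actions[name] = action
    return actions


# Illustrative snapshot only.
snapshot = {
    "test_coverage": 0.823,
    "review_coverage": 0.97,
    "tech_debt_ratio": 0.12,
    "max_cyclomatic_complexity": 8,
    "duplication": 0.03,
}
print(quality_actions(snapshot))
```

Metrics missing from the snapshot are skipped rather than treated as failures, which keeps the gate usable while instrumentation is still being rolled out.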
#### Development Velocity
| Metric | Formula | Target | Action if Off Target |
|--------|---------|--------|-----------------|
| Sprint Velocity | Story Points / Sprint | Stable ±10% | Review estimation |
| Cycle Time | Start to Done Time | <5 days | Reduce WIP |
| PR Merge Time | Open to Merge | <24 hours | Smaller PRs |
| Build Time | Code to Artifact | <10 minutes | Optimize pipeline |
| Test Execution Time | Full Test Suite | <30 minutes | Parallelize tests |
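The "Stable ±10%" velocity target can be read in more than one way. The sketch below assumes one interpretation: the latest sprint is compared with the mean of the earlier sprints in the window. The function names are illustrative.

```python
from statistics import mean


def velocity_deviation(history):
    """Relative deviation of the latest sprint from the mean of the earlier sprints."""
    if len(history) < 2:
        raise ValueError("need at least two sprints")
    *previous, latest = history
    baseline = mean(previous)
    return (latest - baseline) / baseline


def is_velocity_stable(history, tolerance=0.10):
    """True when the latest sprint is within +/- tolerance of the baseline."""
    return abs(velocity_deviation(history)) <= tolerance


# Story points per sprint, oldest first (illustrative numbers).
print(is_velocity_stable([140, 138, 145, 142]))
print(is_velocity_stable([140, 138, 145, 110]))
```

A drop outside the band is a prompt to review estimation, as the table suggests, rather than a verdict on the team.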
#### Team Health
| Metric | Formula | Target | Action if Off Target |
|--------|---------|--------|----------------------|
| On-call Incidents | Incidents / Week | <5 | Improve monitoring |
| Bug Escape Rate | Prod Bugs / Total Bugs Found | <5% | Improve testing |
| Unplanned Work | Unplanned / Total | <20% | Better planning |
| Meeting Time | Meetings / Total Time | <20% | Reduce meetings |
| Focus Time | Uninterrupted Hours | >4h/day | Block calendars |
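Focus time is the least obvious of these to compute. A rough sketch, assuming a working day and meetings expressed as hours since midnight, and assuming that only gaps of at least two hours count as uninterrupted (the `min_block` default is an assumption, not a figure from this guide):

```python
def focus_hours(day_start, day_end, meetings, min_block=2.0):
    """Sum the meeting-free gaps of at least `min_block` hours in one working day.

    Times are hours since midnight; `meetings` is a list of (start, end) tuples.
    """
    total = 0.0
    cursor = day_start
    for start, end in sorted(meetings):
        gap = min(start, day_end) - cursor
        if gap >= min_block:
            total += gap
        # Overlapping meetings never move the cursor backwards.
        cursor = max(cursor, end)
    tail = day_end - cursor
    if tail >= min_block:
        total += tail
    return total


# 09:00-17:00 with a stand-up at 10:00 and a one-hour review at 13:00.
print(focus_hours(9, 17, [(10, 10.5), (13, 14)]))
```

With these inputs the hour before the stand-up is too short to count, so the day yields 2.5 + 3 focus hours.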
### Business Impact Metrics
#### System Performance
| Metric | Description | Target | Business Impact |
|--------|-------------|--------|-----------------|
| Uptime | System availability | 99.9%+ | Revenue protection |
| Page Load Time | Time to interactive | <3s | User retention |
| API Response Time | P95 latency | <200ms | User experience |
| Error Rate | Errors / Requests | <0.1% | Customer satisfaction |
| Throughput | Requests / Second | Per requirement | Scalability |
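Two of these targets are easier to reason about as concrete numbers: the downtime an availability target allows, and the P95 latency of a sample. A minimal sketch, assuming a 30-day period and the nearest-rank percentile definition (monitoring tools may interpolate differently):

```python
def allowed_downtime_minutes(availability, period_days=30):
    """Downtime budget implied by an availability target over a period."""
    return (1 - availability) * period_days * 24 * 60


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile; `pct` is an integer from 1 to 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


# A 99.9% target allows roughly 43.2 minutes of downtime per 30 days.
print(round(allowed_downtime_minutes(0.999), 1))

# Illustrative latency samples in milliseconds.
latencies_ms = [120, 95, 180, 210, 150, 130, 170, 90, 160, 140]
print(percentile_nearest_rank(latencies_ms, 95))
```

With only ten samples the nearest-rank P95 is the slowest request, which is why percentile targets are normally evaluated over much larger windows.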
#### Product Delivery
| Metric | Description | Target | Business Impact |
|--------|-------------|--------|-----------------|
| Feature Delivery Rate | Features / Quarter | Per roadmap | Market competitiveness |
| Time to Market | Idea to Production | <3 months | First mover advantage |
| Customer Defect Rate | Customer Bugs / Month | <10 | Customer satisfaction |
| Feature Adoption | Feature Users / Active Users | >50% | ROI validation |
| NPS from Engineering | Customer Score | >50 | Product quality |
## Metrics Dashboards
### Executive Dashboard (Weekly)
```
┌─────────────────────────────────────┐
│          EXECUTIVE METRICS          │
├─────────────────────────────────────┤
│ Uptime: 99.97%                    ✓ │
│ Sprint Velocity: 142 pts          ✓ │
│ Deployment Frequency: 3.2/day     ✓ │
│ Lead Time: 4.2 hrs                ✓ │
│ MTTR: 47 min                      ✓ │
│ Change Failure Rate: 8.3%         ✓ │
│                                     │
│ Team Health: 8.2/10                 │
│ Tech Debt Ratio: 12%              ⚠ │
│ Feature Delivery: 85%             ✓ │
└─────────────────────────────────────┘
```
### Team Dashboard (Daily)
```
┌─────────────────────────────────────┐
│            TEAM METRICS             │
├─────────────────────────────────────┤
│ Current Sprint:                     │
│   Completed: 65/100 pts (65%)       │
│   In Progress: 20 pts               │
│   Days Left: 3                      │
│                                     │
│ PR Queue: 8 pending                 │
│ Build Status: ✓ Passing             │
│ Test Coverage: 82.3%                │
│ Open Incidents: 2 (P2, P3)          │
│                                     │
│ On-call Load: 3 pages this week     │
└─────────────────────────────────────┘
```
### Individual Dashboard (Daily)
```
┌─────────────────────────────────────┐
│          DEVELOPER METRICS          │
├─────────────────────────────────────┤
│ This Week:                          │
│   PRs Merged: 8                     │
│   Code Reviews: 12                  │
│   Commits: 23                       │
│   Focus Time: 22.5 hrs              │
│                                     │
│ Quality:                            │
│   Test Coverage: 87%                │
│   Code Review Feedback: 95%       ✓ │
│   Bug Introduction Rate: 0%         │
└─────────────────────────────────────┘
```
## Implementation Guide
### Phase 1: Foundation (Month 1)
1. **Basic Metrics**
   - Deployment frequency
   - Build success rate
   - Uptime/availability
   - Team velocity
2. **Tools Setup**
   - CI/CD instrumentation
   - Basic monitoring
   - Time tracking
### Phase 2: Quality (Month 2)
1. **Quality Metrics**
   - Test coverage
   - Code review metrics
   - Bug rates
   - Technical debt
2. **Tool Integration**
   - Static analysis
   - Test reporting
   - Code quality gates
### Phase 3: Performance (Month 3)
1. **Performance Metrics**
   - DORA metrics complete
   - System performance
   - API metrics
   - Database metrics
2. **Advanced Monitoring**
   - APM tools
   - Distributed tracing
   - Custom dashboards
### Phase 4: Optimization (Ongoing)
1. **Advanced Analytics**
   - Predictive metrics
   - Trend analysis
   - Anomaly detection
   - Correlation analysis
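Anomaly detection does not need to start with a dedicated tool. A minimal sketch, assuming a daily metric series and a rolling z-score rule; the 7-point window and threshold of 3 standard deviations are illustrative defaults, not recommendations from this guide:

```python
from statistics import mean, stdev


def find_anomalies(values, window=7, threshold=3.0):
    """Return indexes whose value is more than `threshold` standard deviations
    away from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # A perfectly flat history has sigma == 0 and is skipped.
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Illustrative daily incident counts; the spike is at index 8.
daily_incidents = [10, 11, 9, 10, 12, 10, 11, 10, 30, 11]
print(find_anomalies(daily_incidents))
```

Note that a spike inflates the standard deviation of the windows that follow it, so the points right after an anomaly are less likely to be flagged.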
## Metric Anti-patterns
### What NOT to Measure
- **Lines of Code**: Encourages bloat
- **Hours Worked**: Promotes presenteeism
- **Individual Velocity**: Creates competition
- **Bug Count Without Context**: Discourages risk-taking
- **Commit Count**: Encourages tiny commits
### Goodhart's Law
"When a measure becomes a target, it ceases to be a good measure"
**Examples**:
- Optimizing test coverage → Writing meaningless tests
- Reducing bug count → Not reporting bugs
- Increasing velocity → Inflating estimates
- Reducing meeting time → Skipping important discussions
### How to Avoid Gaming
1. **Use Multiple Metrics**: No single metric tells the whole story
2. **Focus on Trends**: Not absolute numbers
3. **Combine Leading and Lagging**: Balance predictive and historical
4. **Regular Review**: Adjust metrics that are being gamed
5. **Team Ownership**: Let teams choose their metrics
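"Focus on Trends" can be made concrete with a least-squares slope over recent periods. The sketch below is one simple way to do it; the `flat_band` parameter and the three labels are assumptions introduced for illustration.

```python
def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return sxy / sxx


def trend(values, lower_is_better, flat_band=0.0):
    """Classify a series as 'improving', 'worsening', or 'flat'."""
    s = slope(values)
    if abs(s) <= flat_band:
        return "flat"
    improving = s < 0 if lower_is_better else s > 0
    return "improving" if improving else "worsening"


# Monthly MTTR in minutes: falling, so improving.
print(trend([60, 55, 50, 45], lower_is_better=True))
# Monthly test coverage in percent: falling, so worsening.
print(trend([82, 81, 79, 78], lower_is_better=False))
```

Setting `flat_band` above zero avoids labeling small period-to-period noise as a trend.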
## OKR Framework for Engineering
### Company Level OKRs
**Objective**: Deliver exceptional product quality
**Key Results**:
- KR1: Achieve 99.95% uptime (from 99.9%)
- KR2: Reduce customer-reported bugs by 50%
- KR3: Increase deployment frequency to 10 deploys/day
### Engineering OKRs
**Objective**: Build scalable, reliable infrastructure
**Key Results**:
- KR1: Migrate 80% of services to Kubernetes
- KR2: Reduce MTTR to <30 minutes
- KR3: Achieve 85% test coverage
### Team OKRs
**Objective**: Improve developer productivity
**Key Results**:
- KR1: Reduce build time to <5 minutes
- KR2: Automate 90% of deployment process
- KR3: Reduce PR review time to <4 hours
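Key results like these are easiest to track as a fraction of the distance from a baseline to the target. A minimal sketch; the 60-minute MTTR baseline in the example is hypothetical, since only the uptime KR states its starting point:

```python
def kr_progress(baseline, target, current):
    """Fraction of the distance from baseline to target covered so far, clamped to [0, 1].

    Works for both increasing targets (uptime) and decreasing targets (MTTR).
    """
    if target == baseline:
        raise ValueError("target must differ from baseline")
    fraction = (current - baseline) / (target - baseline)
    return max(0.0, min(1.0, fraction))


# Uptime KR: from 99.9% toward 99.95%, currently at 99.93%.
print(round(kr_progress(99.9, 99.95, 99.93), 2))
# MTTR KR: toward <30 minutes from a hypothetical 60-minute baseline, currently 45.
print(kr_progress(60, 30, 45))
```

Clamping keeps a regression below the baseline at 0 rather than reporting negative progress; drop the clamp if regressions should be visible in the score.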
## Reporting Templates
### Monthly Engineering Report
```markdown
# Engineering Report - [Month Year]
## Executive Summary
- Key Achievement: [Highlight]
- Main Challenge: [Issue and resolution]
- Next Month Focus: [Priority]
## DORA Metrics
| Metric | This Month | Last Month | Target | Status |
|--------|------------|------------|--------|--------|
| Deploy Frequency | X/day | Y/day | Z/day | ✓/⚠/✗ |
| Lead Time | X hrs | Y hrs | <Z hrs | ✓/⚠/✗ |
| MTTR | X min | Y min | <Z min | ✓/⚠/✗ |
| Change Failure | X% | Y% | <Z% | ✓/⚠/✗ |
## Team Performance
- Velocity: X story points (Y% of plan)
- Sprint Completion: X%
- Unplanned Work: X%
## Quality Metrics
- Test Coverage: X% (Δ Y%)
- Customer Bugs: X (Δ Y)
- Code Review Coverage: X%
## Highlights
1. [Major feature or improvement]
2. [Technical achievement]
3. [Process improvement]
## Challenges & Solutions
1. Challenge: [Issue]
Solution: [Action taken]
## Next Month Priorities
1. [Priority 1]
2. [Priority 2]
3. [Priority 3]
```
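The ✓/⚠/✗ status column in the DORA table can be filled in mechanically. The sketch below assumes a rule this guide does not define: ✓ when the target is met, ⚠ when the value is within 10% of the target, ✗ otherwise. Both function names and the `warn_margin` parameter are illustrative.

```python
def status(value, target, lower_is_better=True, warn_margin=0.10):
    """Map a metric value to ✓ (target met), ⚠ (within the margin), or ✗."""
    if lower_is_better:
        if value <= target:
            return "✓"
        if value <= target * (1 + warn_margin):
            return "⚠"
        return "✗"
    if value >= target:
        return "✓"
    if value >= target * (1 - warn_margin):
        return "⚠"
    return "✗"


def dora_row(name, this_month, last_month, target, unit, lower_is_better=True):
    """Render one row of the DORA metrics table in the monthly report."""
    mark = status(this_month, target, lower_is_better)
    comparison = "<" if lower_is_better else ">"
    return (
        f"| {name} | {this_month}{unit} | {last_month}{unit} "
        f"| {comparison}{target}{unit} | {mark} |"
    )


# Illustrative values only.
print(dora_row("MTTR", 47, 52, 60, " min"))
print(dora_row("Deploy Frequency", 3.2, 2.9, 3, "/day", lower_is_better=False))
```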
### Quarterly Business Review
```markdown
# Engineering QBR - Q[X] [Year]
## Strategic Alignment
- Business Goal: [Goal]
- Engineering Contribution: [How engineering supported]
- Impact: [Measurable outcome]
## Quarterly Metrics
### Delivery
- Features Shipped: X of Y planned (Z%)
- Major Releases: [List]
- Technical Debt Reduced: X%
### Reliability
- Uptime: X%
- Incidents: X (Y critical, Z major)
- Customer Impact: [Description]
### Efficiency
- Cost per Transaction: $X (Δ Y%)
- Infrastructure Cost: $X (Δ Y%)
- Engineering Cost per Feature: $X
## Team Growth
- Headcount: Start: X → End: Y
- Attrition: X%
- Key Hires: [Roles]
## Innovation
- Patents Filed: X
- Open Source Contributions: X
- Hackathon Projects: X
## Lessons Learned
1. [What worked well]
2. [What didn't work]
3. [What we're changing]
## Next Quarter Focus
1. [Strategic Initiative 1]
2. [Strategic Initiative 2]
3. [Strategic Initiative 3]
```
## Tool Recommendations
### Metrics Collection
- **DataDog**: Comprehensive monitoring
- **New Relic**: Application performance
- **Grafana + Prometheus**: Open source stack
- **CloudWatch**: AWS native
### Engineering Analytics
- **LinearB**: Developer productivity
- **Velocity**: Engineering metrics
- **Sleuth**: DORA metrics
- **Swarmia**: Engineering insights
### Project Tracking
- **Jira**: Issue tracking
- **Linear**: Modern issue tracking
- **Azure DevOps**: Microsoft ecosystem
- **GitHub Projects**: Integrated with code
### Incident Management
- **PagerDuty**: On-call management
- **Opsgenie**: Incident response
- **StatusPage**: Status communication
- **FireHydrant**: Incident command
## Success Indicators
### Healthy Engineering Organization
✓ DORA metrics improving quarter-over-quarter
✓ Team satisfaction >8/10
✓ Attrition <10% annually
✓ On-time delivery >80%
✓ Technical debt <15% of capacity
✓ Innovation time >20%
### Warning Signs
⚠️ Increasing MTTR trend
⚠️ Declining velocity
⚠️ Rising bug escape rate
⚠️ Increasing unplanned work
⚠️ Growing PR queue
⚠️ Decreasing test coverage
### Crisis Indicators
🚨 Multiple production incidents per week
🚨 Team satisfaction <6/10
🚨 Attrition >20%
🚨 Technical debt >30% of capacity
🚨 No deployments for >1 week
🚨 Customer escalations increasing
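
The crisis indicators with numeric thresholds can be checked automatically against a periodic snapshot. A minimal sketch, assuming hypothetical snapshot keys and reading "multiple" incidents as two or more; the escalation indicator is omitted because it depends on a trend rather than a single value.

```python
def crisis_flags(org):
    """Return the crisis indicators triggered by a snapshot of org-level numbers."""
    # (hypothetical snapshot key, trigger condition, indicator label)
    rules = [
        ("incidents_per_week", lambda v: v >= 2, "Multiple production incidents per week"),
        ("team_satisfaction", lambda v: v < 6, "Team satisfaction <6/10"),
        ("annual_attrition", lambda v: v > 0.20, "Attrition >20%"),
        ("tech_debt_share", lambda v: v > 0.30, "Technical debt >30%"),
        ("days_since_last_deploy", lambda v: v > 7, "No deployments for >1 week"),
    ]
    return [label for key, triggered, label in rules if key in org and triggered(org[key])]


# Illustrative snapshot only.
snapshot = {
    "incidents_per_week": 3,
    "team_satisfaction": 7.5,
    "annual_attrition": 0.22,
    "tech_debt_share": 0.12,
    "days_since_last_deploy": 2,
}
print(crisis_flags(snapshot))
```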