---
title: "Senior Devops"
description: "Senior Devops - Claude Code skill from the Engineering - Core domain."
---

# Senior Devops

**Domain:** Engineering - Core | **Skill:** `senior-devops` | **Source:** [`engineering-team/senior-devops/SKILL.md`](https://github.com/alirezarezvani/claude-skills/tree/main/engineering-team/senior-devops/SKILL.md)

---

A toolkit for senior DevOps work: CI/CD pipeline generation, Terraform scaffolding, and deployment orchestration with rollback.

## Quick Start

### Main Capabilities

This skill provides three core capabilities through automated scripts:

```bash
# Script 1: Pipeline Generator - scaffolds CI/CD pipelines for GitHub Actions or CircleCI
python scripts/pipeline_generator.py ./app --platform=github --stages=build,test,deploy

# Script 2: Terraform Scaffolder - generates and validates IaC modules for AWS/GCP/Azure
python scripts/terraform_scaffolder.py ./infra --provider=aws --module=ecs-service --verbose

# Script 3: Deployment Manager - orchestrates container deployments with rollback support
python scripts/deployment_manager.py deploy --env=production --image=app:1.2.3 --strategy=blue-green
```

## Core Capabilities

### 1. Pipeline Generator

Scaffolds CI/CD pipeline configurations for GitHub Actions or CircleCI, with stages for build, test, security scan, and deploy.

**Example (GitHub Actions workflow):**
```yaml
# .github/workflows/ci.yml
name: CI/CD Pipeline
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm run lint
      - run: npm test -- --coverage
      - name: Upload coverage
        uses: codecov/codecov-action@v4

  build-docker:
    needs: build-and-test
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write  # needed to push to ghcr.io with GITHUB_TOKEN
    steps:
      - uses: actions/checkout@v4
      - name: Log in to GHCR  # pushing fails without an authenticated registry session
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push image
        uses: docker/build-push-action@v5
        with:
          push: ${{ github.ref == 'refs/heads/main' }}
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}

  deploy:
    needs: build-docker
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      # Assumes AWS credentials are already available to this job
      - name: Deploy to ECS
        run: |
          aws ecs update-service \
            --cluster production \
            --service app-service \
            --force-new-deployment
```

**Usage:**
```bash
python scripts/pipeline_generator.py <project-path> --platform=github|circleci --stages=build,test,deploy
```

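How the generator maps `--stages` onto jobs is not documented on this page. As a rough sketch only, the following assumes each stage becomes one job that `needs` the previous one, mirroring the `needs:` chain in the example workflow. `build_jobs` and `KNOWN_STAGES` are hypothetical names used for illustration, not part of `pipeline_generator.py`.

```python
# Hypothetical sketch: not the actual pipeline_generator.py implementation.
import json

# Assumed set of stages, taken from the stage names used on this page.
KNOWN_STAGES = {"build", "test", "security", "deploy"}


def build_jobs(stages_arg: str) -> dict:
    """Turn a comma-separated --stages value into jobs chained by `needs`."""
    stages = [s.strip() for s in stages_arg.split(",") if s.strip()]
    unknown = [s for s in stages if s not in KNOWN_STAGES]
    if unknown:
        raise ValueError(f"unknown stages: {', '.join(unknown)}")

    jobs = {}
    previous = None
    for stage in stages:
        job = {"runs-on": "ubuntu-latest"}
        if previous is not None:
            job["needs"] = previous  # each job waits for the one before it
        jobs[stage] = job
        previous = stage
    return jobs


if __name__ == "__main__":
    print(json.dumps(build_jobs("build,test,deploy"), indent=2))
```

Chaining jobs this way keeps a failed earlier stage from triggering a later one, which is the same guarantee the `needs:` keys give in the workflow above.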
### 2. Terraform Scaffolder

Generates, validates, and plans Terraform modules. Enforces a consistent module structure and runs `terraform validate` and `terraform plan` before any apply.

**Example (AWS ECS service module):**
```hcl
# modules/ecs-service/main.tf
resource "aws_ecs_task_definition" "app" {
  family                   = var.service_name
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = var.cpu
  memory                   = var.memory
  # Fargate needs a task execution role for the awslogs driver;
  # var.execution_role_arn is an assumed input, not shown in the original module.
  execution_role_arn       = var.execution_role_arn

  container_definitions = jsonencode([{
    name      = var.service_name
    image     = var.container_image
    essential = true
    portMappings = [{
      containerPort = var.container_port
      protocol      = "tcp"
    }]
    environment = [for k, v in var.env_vars : { name = k, value = v }]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        awslogs-group         = "/ecs/${var.service_name}"
        awslogs-region        = var.aws_region
        awslogs-stream-prefix = "ecs"
      }
    }
  }])
}

resource "aws_ecs_service" "app" {
  name            = var.service_name
  cluster         = var.cluster_id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.app.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = var.service_name
    container_port   = var.container_port
  }
}
```

**Usage:**
```bash
python scripts/terraform_scaffolder.py <target-path> --provider=aws|gcp|azure --module=ecs-service|gke-deployment|aks-service [--verbose]
```

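The "validate and plan before any apply" rule can be pictured as a small gate. The sketch below is an illustration under stated assumptions, not the scaffolder's real code: `plan_gate` and `run_cli` are hypothetical names, and the command runner is injectable so the logic can be exercised without Terraform installed. It relies on the `-detailed-exitcode` flag of `terraform plan`, which exits with 0 for an empty diff, 1 for an error, and 2 for a non-empty diff.

```python
# Hypothetical sketch: not the actual terraform_scaffolder.py implementation.
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]  # takes argv, returns the exit code


def run_cli(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def plan_gate(workdir: str, run: Runner = run_cli) -> str:
    """Run validate, then plan. Returns 'invalid', 'error', 'no-changes' or 'changes'."""
    base = ["terraform", f"-chdir={workdir}"]
    if run(base + ["validate"]) != 0:
        return "invalid"  # never plan a configuration that fails validation
    # -detailed-exitcode: 0 = empty diff, 1 = error, 2 = non-empty diff
    code = run(base + ["plan", "-detailed-exitcode", "-out=tfplan"])
    return {0: "no-changes", 2: "changes"}.get(code, "error")
```

A caller would only go on to `terraform apply tfplan` when the result is `"changes"` and the plan has been reviewed.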
### 3. Deployment Manager

Orchestrates deployments with blue/green or rolling strategies, health-check gates, and automatic rollback on failure.

**Example (Kubernetes blue/green deployment, blue slot):**
```yaml
# k8s/deployment-blue.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
  labels:
    app: myapp
    slot: blue  # slot label distinguishes blue from green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      slot: blue
  template:
    metadata:
      labels:
        app: myapp
        slot: blue
    spec:
      containers:
        - name: app
          image: ghcr.io/org/app:1.2.3
          readinessProbe:  # gate: pod must pass before traffic switches
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
```

**Usage:**
```bash
python scripts/deployment_manager.py deploy \
  --env=staging|production \
  --image=app:1.2.3 \
  --strategy=blue-green|rolling \
  --health-check-url=https://app.example.com/healthz

python scripts/deployment_manager.py rollback --env=production --to-version=1.2.2
python scripts/deployment_manager.py --analyze --env=production  # audit current state
```

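The health-check gate with automatic rollback can be sketched as follows. This is a minimal illustration, not the internals of `deployment_manager.py`: `gated_deploy` and `http_ok` are hypothetical names, and the retry count and delay are assumed defaults. The deploy, rollback, and probe steps are passed in as callables so the control flow is visible on its own.

```python
# Hypothetical sketch: not the actual deployment_manager.py implementation.
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True when the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and socket timeouts are OSError subclasses
        return False


def gated_deploy(
    deploy: Callable[[], None],
    rollback: Callable[[], None],
    probe: Callable[[], bool],
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Deploy, poll the probe, and roll back if it never turns healthy."""
    deploy()
    for attempt in range(attempts):
        if probe():
            return True  # gate passed: keep the new version
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next probe
    rollback()  # gate failed: restore the previous version
    return False
```

In practice the probe could be `lambda: http_ok("https://app.example.com/healthz")`, matching the `--health-check-url` value used above.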
## Resources

- Pattern Reference: `references/cicd_pipeline_guide.md` (detailed CI/CD patterns, best practices, anti-patterns)
- Workflow Guide: `references/infrastructure_as_code.md` (IaC step-by-step processes, optimization, troubleshooting)
- Technical Guide: `references/deployment_strategies.md` (deployment strategy configs, security considerations, scalability)
- Tool Scripts: `scripts/` directory

## Development Workflow

### 1. Infrastructure Changes (Terraform)

```bash
# Scaffold or update module
python scripts/terraform_scaffolder.py ./infra --provider=aws --module=ecs-service --verbose

# Validate and plan; review the diff before applying
terraform -chdir=infra init
terraform -chdir=infra validate
terraform -chdir=infra plan -out=tfplan

# Apply only after plan review
terraform -chdir=infra apply tfplan

# Verify resources are healthy
aws ecs describe-services --cluster production --services app-service \
  --query 'services[0].{Status:status,Running:runningCount,Desired:desiredCount}'
```

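The final `describe-services` query prints a small object with `Status`, `Running`, and `Desired` keys. One possible way to turn that into a pass/fail check is sketched below, assuming the AWS CLI is emitting JSON output; `service_is_healthy` is a hypothetical helper, not part of this skill's scripts.

```python
# Hypothetical helper for interpreting the describe-services query output.
import json


def service_is_healthy(query_output: str) -> bool:
    """True when the service is ACTIVE and runs as many tasks as desired."""
    state = json.loads(query_output)
    if not isinstance(state, dict):
        return False  # e.g. `null` when the service was not found
    return state.get("Status") == "ACTIVE" and state.get("Running") == state.get("Desired")


if __name__ == "__main__":
    sample = '{"Status": "ACTIVE", "Running": 3, "Desired": 3}'
    print(service_is_healthy(sample))
```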
### 2. Application Deployment

```bash
# Generate or update pipeline config
python scripts/pipeline_generator.py . --platform=github --stages=build,test,security,deploy

# Build and tag image
TAG=$(git rev-parse --short HEAD)
docker build -t "ghcr.io/org/app:${TAG}" .
docker push "ghcr.io/org/app:${TAG}"

# Deploy with health-check gate
python scripts/deployment_manager.py deploy \
  --env=production \
  --image="app:${TAG}" \
  --strategy=blue-green \
  --health-check-url=https://app.example.com/healthz

# Verify pods are running
kubectl get pods -n production -l app=myapp
kubectl rollout status deployment/app-blue -n production

# Switch traffic after verification
kubectl patch service app-svc -n production \
  -p '{"spec":{"selector":{"slot":"blue"}}}'
```

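The traffic switch in the last step is just a change to the Service selector's `slot` label. As an illustration, the patch body for either slot could be built like this; `selector_patch` and `other_slot` are hypothetical helpers, and the two slot names are assumed from the `slot: blue` label used on this page.

```python
# Hypothetical helpers for building the blue/green selector patch.
import json

SLOTS = ("blue", "green")  # assumed slot names


def other_slot(slot: str) -> str:
    """Return the idle slot for a given live slot."""
    if slot not in SLOTS:
        raise ValueError(f"unknown slot: {slot}")
    return SLOTS[1 - SLOTS.index(slot)]


def selector_patch(slot: str) -> str:
    """Build the JSON patch body that points the Service at the given slot."""
    if slot not in SLOTS:
        raise ValueError(f"unknown slot: {slot}")
    return json.dumps({"spec": {"selector": {"slot": slot}}}, separators=(",", ":"))


if __name__ == "__main__":
    print(selector_patch("blue"))
```

Switching back is the same patch with `other_slot(...)`, which is what makes a blue/green rollback a single selector change.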
### 3. Rollback Procedure

```bash
# Immediate rollback via deployment manager
python scripts/deployment_manager.py rollback --env=production --to-version=1.2.2

# Or via kubectl (substitute the Deployment name in use, e.g. app-blue)
kubectl rollout undo deployment/app -n production
kubectl rollout status deployment/app -n production

# Verify rollback succeeded
kubectl get pods -n production -l app=myapp
curl -sf https://app.example.com/healthz || echo "ROLLBACK FAILED: escalate"
```

## Troubleshooting

See the troubleshooting section in `references/deployment_strategies.md`.