feat(skills): add research-summarizer and docker-development agent skills

research-summarizer (product-team/):
- Structured research summarization for papers, articles, reports
- Slash commands: /research:summarize, /research:compare, /research:cite
- Python tools: extract_citations.py (5 citation formats), format_summary.py (6 templates)
- References: summary-templates.md, citation-formats.md

docker-development (engineering/):
- Dockerfile optimization, compose orchestration, container security
- Slash commands: /docker:optimize, /docker:compose, /docker:security
- Python tools: dockerfile_analyzer.py (15 rules), compose_validator.py (best practices)
- References: dockerfile-best-practices.md, compose-patterns.md

Both skills include .claude-plugin/plugin.json and follow POWERFUL tier conventions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Commit bf1473b1be by Leo, 2026-03-15 22:47:16 +01:00 (parent 2b5260dbeb).
12 changed files with 2761 additions and 0 deletions

.claude-plugin/plugin.json

@@ -0,0 +1,13 @@
{
  "name": "docker-development",
  "description": "Docker and container development agent skill and plugin for Dockerfile optimization, docker-compose orchestration, multi-stage builds, and container security hardening. Covers build performance, layer caching, and production-ready container patterns.",
  "version": "1.0.0",
  "author": {
    "name": "Alireza Rezvani",
    "url": "https://alirezarezvani.com"
  },
  "homepage": "https://github.com/alirezarezvani/claude-skills/tree/main/engineering/docker-development",
  "repository": "https://github.com/alirezarezvani/claude-skills",
  "license": "MIT",
  "skills": "./"
}

SKILL.md

@@ -0,0 +1,366 @@
---
name: "docker-development"
description: "Docker and container development agent skill and plugin for Dockerfile optimization, docker-compose orchestration, multi-stage builds, and container security hardening. Use when: user wants to optimize a Dockerfile, create or improve docker-compose configurations, implement multi-stage builds, audit container security, reduce image size, or follow container best practices. Covers build performance, layer caching, secret management, and production-ready container patterns."
license: MIT
metadata:
  version: 1.0.0
  author: Alireza Rezvani
  category: engineering
  updated: 2026-03-16
---
# Docker Development

> Smaller images. Faster builds. Secure containers. No guesswork.

Opinionated Docker workflow that turns bloated Dockerfiles into production-grade containers. Covers optimization, multi-stage builds, compose orchestration, and security hardening.

Not a Docker tutorial — a set of concrete decisions about how to build containers that don't waste time, space, or attack surface.

---
## Slash Commands
| Command | What it does |
|---------|-------------|
| `/docker:optimize` | Analyze and optimize a Dockerfile for size, speed, and layer caching |
| `/docker:compose` | Generate or improve docker-compose.yml with best practices |
| `/docker:security` | Audit a Dockerfile or running container for security issues |
---
## When This Skill Activates
Recognize these patterns from the user:
- "Optimize this Dockerfile"
- "My Docker build is slow"
- "Create a docker-compose for this project"
- "Is this Dockerfile secure?"
- "Reduce my Docker image size"
- "Set up multi-stage builds"
- "Docker best practices for [language/framework]"
- Any request involving: Dockerfile, docker-compose, container, image size, build cache, Docker security
If the user has a Dockerfile or wants to containerize something → this skill applies.

---
## Workflow
### `/docker:optimize` — Dockerfile Optimization
1. **Analyze current state**
- Read the Dockerfile
- Identify base image and its size
- Count layers (each RUN/COPY/ADD = 1 layer)
- Check for common anti-patterns
2. **Apply optimization checklist**
```
BASE IMAGE
├── Use specific tags, never :latest in production
├── Prefer slim/alpine variants (alpine < debian-slim < full debian/ubuntu in size)
├── Pin digest for reproducibility in CI: image@sha256:...
└── Match base to runtime needs (don't use python:3.12 for a compiled binary)
LAYER OPTIMIZATION
├── Combine related RUN commands with && \
├── Order layers: least-changing first (deps before source code)
├── Clean package manager cache in the same RUN layer
├── Use .dockerignore to exclude unnecessary files
└── Separate build deps from runtime deps
BUILD CACHE
├── COPY dependency files before source code (package.json, requirements.txt, go.mod)
├── Install deps in a separate layer from code copy
├── Use BuildKit cache mounts: --mount=type=cache,target=/root/.cache
└── Avoid COPY . . before dependency installation
MULTI-STAGE BUILDS
├── Stage 1: build (full SDK, build tools, dev deps)
├── Stage 2: runtime (minimal base, only production artifacts)
├── COPY --from=builder only what's needed
└── Final image should have NO build tools, NO source code, NO dev deps
```
3. **Generate optimized Dockerfile**
- Apply all relevant optimizations
- Add inline comments explaining each decision
- Report estimated size reduction
4. **Validate**
```bash
python3 scripts/dockerfile_analyzer.py Dockerfile
```
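The layer count in step 1 can be sketched in a few lines of Python. This is a simplified illustration, not the shipped `dockerfile_analyzer.py` — `count_layers` is a hypothetical helper that ignores line continuations and multi-stage subtleties:

```python
import re

def count_layers(dockerfile: str) -> int:
    """Count filesystem layers: each RUN, COPY, or ADD instruction adds one."""
    count = 0
    for line in dockerfile.splitlines():
        if re.match(r"(RUN|COPY|ADD)\b", line.strip(), re.IGNORECASE):
            count += 1
    return count

example = """\
FROM python:3.12-slim
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
"""
print(count_layers(example))  # 3
```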
### `/docker:compose` — Docker Compose Configuration
1. **Identify services**
- Application (web, API, worker)
- Database (postgres, mysql, redis, mongo)
- Cache (redis, memcached)
- Queue (rabbitmq, kafka)
- Reverse proxy (nginx, traefik, caddy)
2. **Apply compose best practices**
```
SERVICES
├── Use depends_on with condition: service_healthy
├── Add healthchecks for every service
├── Set resource limits (mem_limit, cpus)
├── Use named volumes for persistent data
└── Pin image versions
NETWORKING
├── Create explicit networks (don't rely on default)
├── Separate frontend and backend networks
├── Only expose ports that need external access
└── Use internal: true for backend-only networks
ENVIRONMENT
├── Use env_file for secrets, not inline environment
├── Never commit .env files (add to .gitignore)
├── Use variable substitution: ${VAR:-default}
└── Document all required env vars
DEVELOPMENT vs PRODUCTION
├── Use compose profiles or override files
├── Dev: bind mounts for hot reload, debug ports exposed
├── Prod: named volumes, no debug ports, restart: unless-stopped
└── docker-compose.override.yml for dev-only config
```
3. **Generate compose file**
- Output docker-compose.yml with healthchecks, networks, volumes
- Generate .env.example with all required variables documented
- Add dev/prod profile annotations
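The "healthchecks for every service" rule above is easy to spot-check mechanically. A minimal sketch, assuming the compose file is already parsed into a dict (e.g. via PyYAML, not shown); `missing_healthchecks` is a hypothetical helper, not part of this skill's tooling:

```python
def missing_healthchecks(compose: dict) -> list[str]:
    """Return names of services that define no healthcheck."""
    services = compose.get("services", {})
    return [name for name, svc in services.items()
            if "healthcheck" not in (svc or {})]

compose = {
    "services": {
        "app": {"build": ".", "healthcheck": {"test": ["CMD", "true"]}},
        "db": {"image": "postgres:16-alpine"},
    }
}
print(missing_healthchecks(compose))  # ['db']
```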
### `/docker:security` — Container Security Audit
1. **Dockerfile audit**
| Check | Severity | Fix |
|-------|----------|-----|
| Running as root | Critical | Add `USER nonroot` after creating user |
| Using :latest tag | High | Pin to specific version |
| Secrets in ENV/ARG | Critical | Use BuildKit secrets: `--mount=type=secret` |
| COPY with broad glob | Medium | Use specific paths, add .dockerignore |
| Unnecessary EXPOSE | Low | Only expose ports the app uses |
| No HEALTHCHECK | Medium | Add HEALTHCHECK with appropriate interval |
| Privileged instructions | High | Avoid `--privileged`, drop capabilities |
| Package manager cache retained | Low | Clean in same RUN layer |
2. **Runtime security checks**
| Check | Severity | Fix |
|-------|----------|-----|
| Container running as root | Critical | Set user in Dockerfile or compose |
| Writable root filesystem | Medium | Use `read_only: true` in compose |
| All capabilities retained | High | Drop all, add only needed: `cap_drop: [ALL]` |
| No resource limits | Medium | Set `mem_limit` and `cpus` |
| Host network mode | High | Use bridge or custom network |
| Sensitive mounts | Critical | Never mount /etc, /var/run/docker.sock in prod |
| No log driver configured | Low | Set `logging:` with size limits |
3. **Generate security report**
```
SECURITY AUDIT — [Dockerfile/Image name]
Date: [timestamp]
CRITICAL: [count]
HIGH: [count]
MEDIUM: [count]
LOW: [count]
[Detailed findings with fix recommendations]
```
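Tallying findings into the severity counts used by the report header is a one-liner with `collections.Counter`; the findings list below is illustrative data, not real audit output:

```python
from collections import Counter

findings = [
    {"severity": "critical", "message": "Running as root"},
    {"severity": "medium", "message": "No HEALTHCHECK"},
    {"severity": "medium", "message": "COPY with broad glob"},
]
counts = Counter(f["severity"] for f in findings)
for level in ("critical", "high", "medium", "low"):
    # Counter returns 0 for missing keys
    print(f"{level.upper()}: {counts[level]}")
```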
---
## Tooling
### `scripts/dockerfile_analyzer.py`
CLI utility for static analysis of Dockerfiles.
**Features:**
- Layer count and optimization suggestions
- Base image analysis with size estimates
- Anti-pattern detection (15+ rules)
- Security issue flagging
- Multi-stage build detection and validation
- JSON and text output
**Usage:**
```bash
# Analyze a Dockerfile
python3 scripts/dockerfile_analyzer.py Dockerfile
# JSON output
python3 scripts/dockerfile_analyzer.py Dockerfile --output json
# Analyze with security focus
python3 scripts/dockerfile_analyzer.py Dockerfile --security
# Check a specific directory
python3 scripts/dockerfile_analyzer.py path/to/Dockerfile
```
### `scripts/compose_validator.py`
CLI utility for validating docker-compose files.
**Features:**
- Service dependency validation
- Healthcheck presence detection
- Network configuration analysis
- Volume mount validation
- Environment variable audit
- Port conflict detection
- Best practice scoring
**Usage:**
```bash
# Validate a compose file
python3 scripts/compose_validator.py docker-compose.yml
# JSON output
python3 scripts/compose_validator.py docker-compose.yml --output json
# Strict mode (fail on warnings)
python3 scripts/compose_validator.py docker-compose.yml --strict
```
---
## Multi-Stage Build Patterns
### Pattern 1: Compiled Language (Go, Rust, C++)
```dockerfile
# Build stage
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /app/server ./cmd/server
# Runtime stage
FROM gcr.io/distroless/static-debian12
COPY --from=builder /app/server /server
USER nonroot:nonroot
ENTRYPOINT ["/server"]
```
### Pattern 2: Node.js / TypeScript
```dockerfile
# Dependencies stage
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --production=false
# Build stage
FROM deps AS builder
COPY . .
RUN npm run build
# Runtime stage
FROM node:20-alpine
WORKDIR /app
RUN addgroup -g 1001 -S appgroup && adduser -S appuser -u 1001
COPY --from=builder /app/dist ./dist
COPY --from=deps /app/node_modules ./node_modules
COPY package.json ./
USER appuser
EXPOSE 3000
CMD ["node", "dist/index.js"]
```
### Pattern 3: Python
```dockerfile
# Build stage
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
# Runtime stage
FROM python:3.12-slim
WORKDIR /app
RUN groupadd -r appgroup && useradd -r -g appgroup appuser
COPY --from=builder /install /usr/local
COPY . .
USER appuser
EXPOSE 8000
CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```
---
## Base Image Decision Tree
```
Is it a compiled binary (Go, Rust, C)?
├── Yes → distroless/static or scratch
└── No
├── Need a shell for debugging?
│ ├── Yes → alpine variant (e.g., node:20-alpine)
│ └── No → distroless variant
├── Need glibc (not musl)?
│ ├── Yes → slim variant (e.g., python:3.12-slim)
│ └── No → alpine variant
└── Need specific OS packages?
├── Many → debian-slim
└── Few → alpine + apk add
```
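The tree can be encoded as a small function. This is a simplified sketch covering the first three questions only (the OS-packages branch is omitted), and `pick_base_image` is a hypothetical helper:

```python
def pick_base_image(compiled: bool, need_shell: bool, need_glibc: bool) -> str:
    """Encode the decision tree above, simplified to three questions."""
    if compiled:
        return "distroless/static or scratch"
    if not need_shell:
        return "distroless variant"
    return "slim variant" if need_glibc else "alpine variant"

print(pick_base_image(compiled=False, need_shell=True, need_glibc=True))
# slim variant
```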
---
## Proactive Triggers
Flag these without being asked:
- **Dockerfile uses :latest** → Suggest pinning to a specific version tag.
- **No .dockerignore** → Create one. At minimum: `.git`, `node_modules`, `__pycache__`, `.env`.
- **COPY . . before dependency install** → Cache bust. Reorder to install deps first.
- **Running as root** → Add USER instruction. No exceptions for production.
- **Secrets in ENV or ARG** → Use BuildKit secret mounts. Never bake secrets into layers.
- **Image over 1GB** → Multi-stage build required. No reason for a production image this large.
- **No healthcheck** → Add one. Orchestrators (Compose, K8s) need it for proper lifecycle management.
- **apt-get without cleanup in same layer** → `rm -rf /var/lib/apt/lists/*` in the same RUN.
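Several of these triggers reduce to plain pattern matches. A hedged sketch, assuming the Dockerfile is available as a string — the two regexes are illustrative, far simpler than the 15+ rules in `dockerfile_analyzer.py`:

```python
import re

TRIGGERS = {
    # FROM image pinned to :latest
    "latest_tag": re.compile(r"^FROM\s+\S+:latest\b", re.MULTILINE),
    # Secret-looking variable baked into ENV or ARG
    "secret_in_env": re.compile(
        r"^(?:ENV|ARG)\s+\w*(?:KEY|SECRET|TOKEN|PASSWORD)\w*=",
        re.MULTILINE | re.IGNORECASE,
    ),
}

dockerfile = "FROM node:latest\nENV API_KEY=abc123\n"
flags = [name for name, rx in TRIGGERS.items() if rx.search(dockerfile)]
print(flags)  # ['latest_tag', 'secret_in_env']
```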
---
## Installation
### One-liner (any tool)
```bash
git clone https://github.com/alirezarezvani/claude-skills.git
cp -r claude-skills/engineering/docker-development ~/.claude/skills/
```
### Multi-tool install
```bash
./scripts/convert.sh --skill docker-development --tool codex|gemini|cursor|windsurf|openclaw
```
### OpenClaw
```bash
clawhub install cs-docker-development
```
---
## Related Skills
- **senior-devops** — Broader DevOps scope (CI/CD, IaC, monitoring). Complementary — use docker-development for container-specific work, senior-devops for pipeline and infrastructure.
- **senior-security** — Application security. Complementary — docker-development covers container security, senior-security covers application-level threats.
- **autoresearch-agent** — Can optimize Docker build times or image sizes as measurable experiments.
- **ci-cd-pipeline-builder** — Pipeline construction. Complementary — docker-development builds the containers, ci-cd-pipeline-builder deploys them.

compose-patterns.md

@@ -0,0 +1,282 @@
# Docker Compose Patterns Reference
## Production-Ready Patterns
### Web App + Database + Cache
```yaml
services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "3000:3000"
    env_file:
      - .env
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 10s
    restart: unless-stopped
    networks:
      - frontend
      - backend
    mem_limit: 512m
    cpus: 1.0

  db:
    image: postgres:16-alpine
    volumes:
      - pgdata:/var/lib/postgresql/data
    env_file:
      - .env.db
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
    networks:
      - backend
    mem_limit: 256m

  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 64mb --maxmemory-policy allkeys-lru
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
    restart: unless-stopped
    networks:
      - backend
    mem_limit: 128m

volumes:
  pgdata:

networks:
  frontend:
  backend:
    internal: true
```
### Key Patterns
- **Healthchecks on every service** — enables depends_on with condition
- **Named volumes** — data persists across container recreation
- **Explicit networks** — backend is internal (no external access)
- **env_file** — secrets not in compose file
- **Resource limits** — prevent runaway containers
---
## Development Override Pattern
### docker-compose.yml (base — production-like)
```yaml
services:
  app:
    build: .
    ports:
      - "3000:3000"
    restart: unless-stopped
```
### docker-compose.override.yml (dev — auto-loaded)
```yaml
services:
  app:
    build:
      target: development
    volumes:
      - .:/app              # Bind mount for hot reload
      - /app/node_modules   # Preserve container node_modules
    environment:
      - NODE_ENV=development
      - DEBUG=true
    ports:
      - "9229:9229"         # Debug port
    restart: "no"
```
### Usage
```bash
# Development (auto-loads override)
docker compose up
# Production (skip override)
docker compose -f docker-compose.yml up -d
# Explicit profiles
docker compose --profile dev up
docker compose --profile prod up -d
```
---
## Network Isolation Pattern
```yaml
services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    networks:
      - frontend

  app:
    build: .
    networks:
      - frontend
      - backend

  db:
    image: postgres:16-alpine
    networks:
      - backend

  redis:
    image: redis:7-alpine
    networks:
      - backend

networks:
  frontend:
    # External traffic reaches nginx and app
  backend:
    internal: true
    # DB and Redis only reachable by app
```
### Why This Matters
- Database and cache are **not accessible from outside**
- Only nginx and app handle external traffic
- Lateral movement limited if one container is compromised
---
## Worker + Queue Pattern
```yaml
services:
  api:
    build:
      context: .
      target: runtime
    command: uvicorn main:app --host 0.0.0.0 --port 8000
    ports:
      - "8000:8000"
    depends_on:
      rabbitmq:
        condition: service_healthy

  worker:
    build:
      context: .
      target: runtime
    command: celery -A tasks worker --loglevel=info
    depends_on:
      rabbitmq:
        condition: service_healthy

  scheduler:
    build:
      context: .
      target: runtime
    command: celery -A tasks beat --loglevel=info
    depends_on:
      rabbitmq:
        condition: service_healthy

  rabbitmq:
    image: rabbitmq:3.13-management-alpine
    ports:
      - "15672:15672"  # Management UI (dev only)
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "check_running"]
      interval: 10s
      timeout: 5s
      retries: 5
```
---
## Logging Configuration
```yaml
services:
  app:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
        tag: "{{.Name}}/{{.ID}}"
```
### Why
- **max-size** prevents disk exhaustion
- **max-file** rotates logs automatically
- Default Docker logging has NO size limit — production servers can run out of disk
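With the options above, worst-case log disk usage per container is bounded by max-size times max-file. A quick sanity check:

```python
max_size_mb = 10  # max-size: "10m"
max_file = 3      # number of rotated files kept
worst_case_mb = max_size_mb * max_file
print(f"worst case: {worst_case_mb} MB of logs per container")  # 30 MB
```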
---
## Environment Variable Patterns
### .env.example (committed to repo)
```env
# Database
DATABASE_URL=postgres://user:password@db:5432/appname
POSTGRES_USER=user
POSTGRES_PASSWORD=changeme
POSTGRES_DB=appname
# Redis
REDIS_URL=redis://redis:6379/0
# Application
SECRET_KEY=changeme-generate-a-real-secret
NODE_ENV=production
LOG_LEVEL=info
# External Services (BYOK)
# SMTP_HOST=
# SMTP_PORT=587
# AWS_ACCESS_KEY_ID=
# AWS_SECRET_ACCESS_KEY=
```
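Keeping `.env.example` in sync with the real `.env` can be checked with a short script. A sketch using a hypothetical `env_keys` helper; commented-out BYOK entries are deliberately skipped:

```python
def env_keys(text: str) -> set[str]:
    """Collect variable names from a .env-style file, ignoring comments."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            keys.add(line.split("=", 1)[0])
    return keys

example = "SECRET_KEY=changeme\n# SMTP_HOST=\nLOG_LEVEL=info\n"
env = "SECRET_KEY=real\n"
# Variables documented in .env.example but missing from .env
print(sorted(env_keys(example) - env_keys(env)))  # ['LOG_LEVEL']
```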
### Variable Substitution in Compose
```yaml
services:
  app:
    image: myapp:${APP_VERSION:-latest}
    environment:
      - LOG_LEVEL=${LOG_LEVEL:-info}
      - PORT=${PORT:-3000}
```
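The `${VAR:-default}` fallback can be approximated in a few lines of Python, which makes the behavior concrete. This sketch handles only the `:-` operator (Compose also supports `-`, `:?`, and others):

```python
import re

def substitute(value: str, env: dict) -> str:
    """Sketch of compose-style ${VAR:-default} substitution."""
    def repl(m):
        var, default = m.group(1), m.group(2) or ""
        # ':-' falls back when the variable is unset or empty
        return env.get(var) or default
    return re.sub(r"\$\{(\w+)(?::-([^}]*))?\}", repl, value)

print(substitute("myapp:${APP_VERSION:-latest}", {}))          # myapp:latest
print(substitute("PORT=${PORT:-3000}", {"PORT": "8080"}))      # PORT=8080
```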
---
## Troubleshooting Checklist
| Symptom | Likely Cause | Fix |
|---------|-------------|-----|
| Container exits immediately | CMD/ENTRYPOINT crashes, missing env vars | Check logs: `docker compose logs service` |
| Port already in use | Another service or host process on same port | Change host port: `"3001:3000"` |
| Volume permissions denied | Container user doesn't own mounted path | Match UID/GID or use named volumes |
| Build cache not working | COPY . . invalidates cache early | Reorder: copy deps first, then source |
| depends_on doesn't wait | No healthcheck condition | Add `condition: service_healthy` |
| Container OOM killed | No memory limit or limit too low | Set appropriate `mem_limit` |
| Network connectivity issues | Wrong network or service name | Services communicate by service name within shared network |

dockerfile-best-practices.md

@@ -0,0 +1,235 @@
# Dockerfile Best Practices Reference
## Layer Optimization
### The Golden Rule
Every `RUN`, `COPY`, and `ADD` instruction creates a new layer. Fewer layers = smaller image.
### Combine Related Commands
```dockerfile
# Bad — 3 layers
RUN apt-get update
RUN apt-get install -y curl git
RUN rm -rf /var/lib/apt/lists/*
# Good — 1 layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl git && \
    rm -rf /var/lib/apt/lists/*
```
### Order Layers by Change Frequency
```dockerfile
# Least-changing layers first
COPY package.json package-lock.json ./ # Changes rarely
RUN npm ci # Changes when deps change
COPY . . # Changes every build
RUN npm run build # Changes every build
```
### Use .dockerignore
```
.git
node_modules
__pycache__
*.pyc
.env
.env.*
dist
build
*.log
.DS_Store
.vscode
.idea
coverage
.pytest_cache
```
---
## Base Image Selection
### Size Comparison (approximate)
| Base | Size | Use Case |
|------|------|----------|
| `scratch` | 0MB | Static binaries (Go, Rust) |
| `distroless/static` | 2MB | Static binaries with CA certs |
| `alpine` | 7MB | Minimal Linux, shell access |
| `distroless/base` | 20MB | Dynamic binaries (C/C++) |
| `debian-slim` | 80MB | When you need glibc + apt |
| `ubuntu` | 78MB | Full Ubuntu ecosystem |
| `python:3.12-slim` | 130MB | Python apps (production) |
| `node:20-alpine` | 130MB | Node.js apps |
| `golang:1.22` | 800MB | Go build stage only |
| `python:3.12` | 900MB | Never use in production |
| `node:20` | 1000MB | Never use in production |
### When to Use Alpine
- Small image size matters
- No dependency on glibc (musl works)
- Willing to handle occasional musl-related issues
- Not running Python with C extensions that need glibc
### When to Use Slim
- Need glibc compatibility
- Python with compiled C extensions (numpy, pandas)
- Fewer musl compatibility issues
- Still much smaller than full images
### When to Use Distroless
- Maximum security (no shell, no package manager)
- Compiled/static binaries
- Don't need debugging access inside container
- Production-only (not development)
---
## Multi-Stage Builds
### Why Multi-Stage
- Build tools and source code stay out of production image
- Final image contains only runtime artifacts
- Dramatically reduces image size and attack surface
### Naming Stages
```dockerfile
FROM golang:1.22 AS builder # Named stage
FROM alpine:3.19 AS runtime # Named stage
COPY --from=builder /app /app # Reference by name
```
### Selective Copy
```dockerfile
# Only copy the built artifact — nothing else
COPY --from=builder /app/server /server
COPY --from=builder /app/config.yaml /config.yaml
# Don't COPY --from=builder /app/ /app/ (copies source code too)
```
---
## Security Hardening
### Run as Non-Root
```dockerfile
# Create user
RUN groupadd -r appgroup && useradd -r -g appgroup -s /sbin/nologin appuser
# Set ownership
COPY --chown=appuser:appgroup . .
# Switch user (after all root-requiring operations)
USER appuser
```
### Secret Management
```dockerfile
# Bad — secret baked into layer
ENV API_KEY=sk-12345
# Good — BuildKit secret mount (never in layer)
RUN --mount=type=secret,id=api_key \
    export API_KEY=$(cat /run/secrets/api_key) && \
    ./configure --api-key=$API_KEY
```
Build with:
```bash
docker build --secret id=api_key,src=./api_key.txt .
```
### Read-Only Filesystem
```yaml
# docker-compose.yml
services:
  app:
    read_only: true
    tmpfs:
      - /tmp
      - /var/run
```
### Drop Capabilities
```yaml
services:
  app:
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE  # Only if binding to ports < 1024
```
---
## Build Performance
### BuildKit Cache Mounts
```dockerfile
# Cache pip downloads across builds
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements.txt
# Cache apt downloads
RUN --mount=type=cache,target=/var/cache/apt \
apt-get update && apt-get install -y curl
```
### Parallel Builds
```dockerfile
# These stages build in parallel when using BuildKit
FROM node:20-alpine AS frontend
COPY frontend/ .
RUN npm ci && npm run build
FROM golang:1.22 AS backend
COPY backend/ .
RUN go build -o server
FROM alpine:3.19
COPY --from=frontend /dist /static
COPY --from=backend /server /server
```
### Enable BuildKit
```bash
export DOCKER_BUILDKIT=1
docker build .
# Or in daemon.json
{ "features": { "buildkit": true } }
```
---
## Health Checks
### HTTP Service
```dockerfile
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1
```
### Without curl (using wget)
```dockerfile
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
CMD wget --no-verbose --tries=1 --spider http://localhost:8000/health || exit 1
```
### TCP Check
```dockerfile
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
CMD nc -z localhost 8000 || exit 1
```
### PostgreSQL
```dockerfile
HEALTHCHECK --interval=10s --timeout=5s --retries=5 \
CMD pg_isready -U postgres || exit 1
```
### Redis
```dockerfile
HEALTHCHECK --interval=10s --timeout=3s --retries=3 \
CMD redis-cli ping | grep PONG || exit 1
```

scripts/compose_validator.py

@@ -0,0 +1,390 @@
#!/usr/bin/env python3
"""
docker-development: Docker Compose Validator
Validate docker-compose.yml files for best practices, missing healthchecks,
network configuration, port conflicts, and security issues.
Usage:
python scripts/compose_validator.py docker-compose.yml
python scripts/compose_validator.py docker-compose.yml --output json
python scripts/compose_validator.py docker-compose.yml --strict
"""
import argparse
import json
import re
import sys
from pathlib import Path
# --- Demo Compose File ---
DEMO_COMPOSE = """
version: '3.8'
services:
  web:
    build: .
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgres://user:password@db:5432/app
      - SECRET_KEY=my-secret-key
    depends_on:
      - db
      - redis
  db:
    image: postgres:latest
    ports:
      - "5432:5432"
    environment:
      POSTGRES_PASSWORD: password123
    volumes:
      - ./data:/var/lib/postgresql/data
  redis:
    image: redis
    ports:
      - "6379:6379"
  worker:
    build: .
    command: python worker.py
    environment:
      - DATABASE_URL=postgres://user:password@db:5432/app
"""
def parse_yaml_simple(content):
    """Simple YAML-like parser for docker-compose files (stdlib only).

    Handles the subset of YAML used in typical docker-compose files:
    - Top-level keys
    - Service definitions
    - Lists (- items)
    - Key-value pairs
    - Nested indentation
    """
    result = {"services": {}, "volumes": {}, "networks": {}}
    current_section = None
    current_service = None
    current_key = None
    for line in content.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        indent = len(line) - len(line.lstrip())
        # Top-level keys
        if indent == 0 and ":" in stripped:
            key = stripped.split(":")[0].strip()
            if key == "services":
                current_section = "services"
            elif key == "volumes":
                current_section = "volumes"
            elif key == "networks":
                current_section = "networks"
            elif key == "version":
                val = stripped.split(":", 1)[1].strip().strip("'\"")
                result["version"] = val
            current_service = None
            current_key = None
            continue
        if current_section == "services":
            # Service name (indent level 2)
            if indent == 2 and ":" in stripped and not stripped.startswith("-"):
                key = stripped.split(":")[0].strip()
                val = stripped.split(":", 1)[1].strip() if ":" in stripped else ""
                if val and not val.startswith("{"):
                    # Simple key:value inside a service
                    if current_service and current_service in result["services"]:
                        result["services"][current_service][key] = val
                    else:
                        current_service = key
                        result["services"][current_service] = {}
                        current_key = None
                else:
                    current_service = key
                    result["services"][current_service] = {}
                    current_key = None
                continue
            if current_service and current_service in result["services"]:
                svc = result["services"][current_service]
                # Service-level keys (indent 4)
                if indent == 4 and ":" in stripped and not stripped.startswith("-"):
                    key = stripped.split(":")[0].strip()
                    val = stripped.split(":", 1)[1].strip()
                    current_key = key
                    if val:
                        svc[key] = val.strip("'\"")
                    else:
                        svc[key] = []
                    continue
                # List items (indent 6 or 8)
                if stripped.startswith("-") and current_key:
                    item = stripped[1:].strip().strip("'\"")
                    if current_key in svc:
                        if isinstance(svc[current_key], list):
                            svc[current_key].append(item)
                        else:
                            svc[current_key] = [svc[current_key], item]
                    else:
                        svc[current_key] = [item]
                    continue
                # Nested key:value under current_key (e.g., healthcheck test)
                if indent >= 6 and ":" in stripped and not stripped.startswith("-"):
                    key = stripped.split(":")[0].strip()
                    val = stripped.split(":", 1)[1].strip()
                    if current_key and current_key in svc:
                        if isinstance(svc[current_key], list):
                            svc[current_key] = {}
                        if isinstance(svc[current_key], dict):
                            svc[current_key][key] = val
    return result
def validate_compose(parsed, strict=False):
    """Run validation rules on parsed compose file."""
    findings = []
    services = parsed.get("services", {})
    # --- Version check ---
    version = parsed.get("version", "")
    if version:
        findings.append({
            "severity": "low",
            "category": "deprecation",
            "message": f"'version: {version}' is deprecated in Compose V2 — remove it",
            "service": "(top-level)",
        })
    # --- Per-service checks ---
    all_ports = []
    for name, svc in services.items():
        # Healthcheck
        if "healthcheck" not in svc:
            findings.append({
                "severity": "medium",
                "category": "reliability",
                "message": "No healthcheck defined — orchestrator can't detect unhealthy state",
                "service": name,
            })
        # Image tag
        image = svc.get("image", "")
        if image:
            if ":latest" in image:
                findings.append({
                    "severity": "high",
                    "category": "reproducibility",
                    "message": f"Using :latest tag on '{image}' — pin to specific version",
                    "service": name,
                })
            elif ":" not in image and "/" not in image:
                findings.append({
                    "severity": "high",
                    "category": "reproducibility",
                    "message": f"No tag on image '{image}' — defaults to :latest",
                    "service": name,
                })
        # Ports
        ports = svc.get("ports", [])
        if isinstance(ports, list):
            for p in ports:
                p_str = str(p)
                # Extract host port
                match = re.match(r"(\d+):\d+", p_str)
                if match:
                    host_port = match.group(1)
                    all_ports.append((host_port, name))
        # Environment secrets
        env = svc.get("environment", [])
        if isinstance(env, list):
            for e in env:
                e_str = str(e)
                if re.search(r"(?:PASSWORD|SECRET|TOKEN|KEY)=\S+", e_str, re.IGNORECASE):
                    if "env_file" not in svc:
                        findings.append({
                            "severity": "critical",
                            "category": "security",
                            "message": f"Inline secret in environment: {e_str[:40]}...",
                            "service": name,
                        })
        elif isinstance(env, dict):
            for k, v in env.items():
                if re.search(r"(?:PASSWORD|SECRET|TOKEN|KEY)", k, re.IGNORECASE) and v:
                    findings.append({
                        "severity": "critical",
                        "category": "security",
                        "message": f"Inline secret: {k}={str(v)[:20]}...",
                        "service": name,
                    })
        # depends_on without condition
        depends = svc.get("depends_on", [])
        if isinstance(depends, list) and depends:
            findings.append({
                "severity": "medium",
                "category": "reliability",
                "message": "depends_on without condition: service_healthy — race condition risk",
                "service": name,
            })
        # Bind mounts (./path style)
        volumes = svc.get("volumes", [])
        if isinstance(volumes, list):
            for v in volumes:
                v_str = str(v)
                if v_str.startswith("./") or v_str.startswith("/"):
                    if "/var/run/docker.sock" in v_str:
                        findings.append({
                            "severity": "critical",
                            "category": "security",
                            "message": "Docker socket mounted — container has host Docker access",
                            "service": name,
                        })
        # Restart policy
        if "restart" not in svc and "build" not in svc:
            findings.append({
                "severity": "low",
                "category": "reliability",
                "message": "No restart policy — container won't auto-restart on failure",
                "service": name,
            })
        # Resource limits
        if "mem_limit" not in svc and "deploy" not in svc:
            findings.append({
                "severity": "low" if not strict else "medium",
                "category": "resources",
                "message": "No memory limit — container can consume all host memory",
                "service": name,
            })
    # Port conflicts
    port_map = {}
    for port, svc_name in all_ports:
        if port in port_map:
            findings.append({
                "severity": "high",
                "category": "networking",
                "message": f"Port {port} conflict between '{port_map[port]}' and '{svc_name}'",
                "service": svc_name,
            })
        port_map[port] = svc_name
    # Network check
    if "networks" not in parsed or not parsed["networks"]:
        if len(services) > 1:
            findings.append({
                "severity": "low",
                "category": "networking",
                "message": "No explicit networks — all services share default bridge network",
                "service": "(top-level)",
            })
    # Sort by severity
    severity_order = {"critical": 0, "high": 1, "medium": 2, "low": 3}
    findings.sort(key=lambda f: severity_order.get(f["severity"], 4))
    return findings
def generate_report(content, output_format="text", strict=False):
    """Generate validation report."""
    parsed = parse_yaml_simple(content)
    findings = validate_compose(parsed, strict)
    services = parsed.get("services", {})
    # Score
    deductions = {"critical": 25, "high": 15, "medium": 5, "low": 2}
    score = max(0, 100 - sum(deductions.get(f["severity"], 0) for f in findings))
    counts = {
        "critical": sum(1 for f in findings if f["severity"] == "critical"),
        "high": sum(1 for f in findings if f["severity"] == "high"),
        "medium": sum(1 for f in findings if f["severity"] == "medium"),
        "low": sum(1 for f in findings if f["severity"] == "low"),
    }
    result = {
        "score": score,
        "services": list(services.keys()),
        "service_count": len(services),
        "findings": findings,
        "finding_counts": counts,
    }
    if output_format == "json":
        print(json.dumps(result, indent=2))
        return result
    # Text output
    print(f"\n{'=' * 60}")
    print(" Docker Compose Validation Report")
    print(f"{'=' * 60}")
    print(f" Score: {score}/100")
    print(f" Services: {', '.join(services.keys()) if services else 'none'}")
    print()
    print(f" Findings: {counts['critical']} critical | {counts['high']} high | {counts['medium']} medium | {counts['low']} low")
    print(f"{'-' * 60}")
    for f in findings:
        icon = {"critical": "!!!", "high": "!!", "medium": "!", "low": "~"}.get(f["severity"], "?")
        print(f"\n {icon} {f['severity'].upper()} [{f['category']}] — {f['service']}")
        print(f" {f['message']}")
    if not findings:
        print("\n No issues found. Compose file looks good.")
    print(f"\n{'=' * 60}\n")
    return result
def main():
    parser = argparse.ArgumentParser(
        description="docker-development: Docker Compose validator"
    )
    parser.add_argument("composefile", nargs="?", help="Path to docker-compose.yml (omit for demo)")
    parser.add_argument(
        "--output", "-o",
        choices=["text", "json"],
        default="text",
        help="Output format (default: text)",
    )
    parser.add_argument(
        "--strict",
        action="store_true",
        help="Strict mode — elevate warnings to higher severity",
    )
    args = parser.parse_args()
    if args.composefile:
        path = Path(args.composefile)
        if not path.exists():
            print(f"Error: File not found: {args.composefile}", file=sys.stderr)
            sys.exit(1)
        content = path.read_text(encoding="utf-8")
    else:
        print("No compose file provided. Running demo validation...\n")
        content = DEMO_COMPOSE
    generate_report(content, args.output, args.strict)


if __name__ == "__main__":
    main()

#!/usr/bin/env python3
"""
docker-development: Dockerfile Analyzer
Static analysis of Dockerfiles for optimization opportunities, anti-patterns,
and security issues. Reports layer count, base image analysis, and actionable
recommendations.
Usage:
python scripts/dockerfile_analyzer.py Dockerfile
python scripts/dockerfile_analyzer.py Dockerfile --output json
python scripts/dockerfile_analyzer.py Dockerfile --security
"""
import argparse
import json
import re
import sys
from pathlib import Path
# --- Analysis Rules ---
ANTI_PATTERNS = [
{
"id": "AP001",
"name": "latest_tag",
"severity": "high",
"pattern": r"^FROM\s+\S+:latest",
"message": "Using :latest tag — pin to a specific version for reproducibility",
"fix": "Use a specific tag like :3.12-slim or pin by digest",
},
{
"id": "AP002",
"name": "no_tag",
"severity": "high",
        "pattern": r"^FROM\s+(?!scratch\b)([a-z0-9][a-z0-9_./-]*)(?:\s+AS\s+\S+)?\s*$",
"message": "No tag specified on base image — defaults to :latest",
"fix": "Add a specific version tag",
},
{
"id": "AP003",
"name": "run_apt_no_clean",
"severity": "medium",
"pattern": r"^RUN\s+.*apt-get\s+install(?!.*rm\s+-rf\s+/var/lib/apt/lists)",
"message": "apt-get install without cleanup in same layer — bloats image",
"fix": "Add && rm -rf /var/lib/apt/lists/* in the same RUN instruction",
},
{
"id": "AP004",
"name": "run_apk_no_cache",
"severity": "medium",
        "pattern": r"^RUN\s+.*apk\s+add(?!.*--no-cache)",
"message": "apk add without --no-cache — retains package index",
"fix": "Use: apk add --no-cache <packages>",
},
{
"id": "AP005",
"name": "add_instead_of_copy",
"severity": "low",
"pattern": r"^ADD\s+(?!https?://)\S+",
"message": "Using ADD for local files — COPY is more explicit and predictable",
"fix": "Use COPY instead of ADD unless you need tar auto-extraction or URL fetching",
},
{
"id": "AP006",
"name": "multiple_cmd",
"severity": "medium",
"pattern": None, # Custom check
"message": "Multiple CMD instructions — only the last one takes effect",
"fix": "Keep exactly one CMD instruction",
},
{
"id": "AP007",
"name": "env_secrets",
"severity": "critical",
"pattern": r"^(?:ENV|ARG)\s+\S*(?:PASSWORD|SECRET|TOKEN|KEY|API_KEY)\s*=",
"message": "Secrets in ENV/ARG — baked into image layers and visible in history",
"fix": "Use BuildKit secrets: RUN --mount=type=secret,id=mytoken",
},
{
"id": "AP008",
"name": "broad_copy",
"severity": "medium",
"pattern": r"^COPY\s+\.\s+\.",
"message": "COPY . . copies everything — may include secrets, git history, node_modules",
"fix": "Use .dockerignore and copy specific directories, or copy after dependency install",
},
{
"id": "AP009",
"name": "no_user",
"severity": "critical",
"pattern": None, # Custom check
"message": "No USER instruction — container runs as root",
"fix": "Add USER nonroot or create a dedicated user",
},
{
"id": "AP010",
"name": "pip_no_cache",
"severity": "low",
        "pattern": r"^RUN\s+.*pip\s+install(?!.*--no-cache-dir)",
"message": "pip install without --no-cache-dir — retains pip cache in layer",
"fix": "Use: pip install --no-cache-dir -r requirements.txt",
},
{
"id": "AP011",
"name": "npm_install_dev",
"severity": "medium",
"pattern": r"^RUN\s+.*npm\s+install\s*$",
"message": "npm install includes devDependencies — use npm ci --omit=dev for production",
"fix": "Use: npm ci --omit=dev (or npm ci --production)",
},
{
"id": "AP012",
"name": "expose_all",
"severity": "low",
"pattern": r"^EXPOSE\s+\d+(?:\s+\d+){3,}",
"message": "Exposing many ports — only expose what the application actually needs",
"fix": "Remove unnecessary EXPOSE directives",
},
    {
        "id": "AP013",
        "name": "curl_wget_without_cleanup",
        "severity": "low",
        "pattern": r"^RUN\s+.*\b(?:curl|wget)\b(?!.*\brm\b)",
        "message": "Download without cleanup — downloaded archives may remain in layer",
        "fix": "Download, extract, and remove archive in the same RUN instruction",
    },
{
"id": "AP014",
"name": "no_healthcheck",
"severity": "medium",
"pattern": None, # Custom check
"message": "No HEALTHCHECK instruction — orchestrators can't determine container health",
"fix": "Add HEALTHCHECK CMD curl -f http://localhost:PORT/health || exit 1",
},
{
"id": "AP015",
"name": "shell_form_cmd",
"severity": "low",
"pattern": r'^(?:CMD|ENTRYPOINT)\s+(?!\[)["\']?\w',
"message": "Using shell form for CMD/ENTRYPOINT — exec form is preferred for signal handling",
"fix": 'Use exec form: CMD ["executable", "arg1", "arg2"]',
},
]
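To see how one of these regex rules behaves in isolation, AP003 can be exercised directly; `re.MULTILINE` mirrors how `run_pattern_checks` applies the patterns:

```python
import re

# AP003 from the table above: apt-get install without list cleanup in the
# same layer. The negative lookahead passes only when no cleanup follows.
pattern = r"^RUN\s+.*apt-get\s+install(?!.*rm\s+-rf\s+/var/lib/apt/lists)"

bad = "RUN apt-get update && apt-get install -y curl"
good = "RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*"

print(bool(re.search(pattern, bad, re.MULTILINE)))   # True
print(bool(re.search(pattern, good, re.MULTILINE)))  # False
```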
# Approximate base image sizes (MB)
BASE_IMAGE_SIZES = {
"scratch": 0,
"alpine": 7,
"distroless/static": 2,
"distroless/base": 20,
"distroless/cc": 25,
"debian-slim": 80,
"debian": 120,
"ubuntu": 78,
"python-slim": 130,
"python-alpine": 50,
"python": 900,
"node-alpine": 130,
"node-slim": 200,
"node": 1000,
"golang-alpine": 250,
"golang": 800,
"rust-slim": 750,
"rust": 1400,
"nginx-alpine": 40,
"nginx": 140,
}
# --- Demo Dockerfile ---
DEMO_DOCKERFILE = """FROM python:3.12
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
ENV SECRET_KEY=mysecretkey123
EXPOSE 8000 5432 6379
CMD python manage.py runserver 0.0.0.0:8000
"""
def parse_dockerfile(content):
"""Parse Dockerfile into structured instructions."""
instructions = []
current = ""
for line in content.splitlines():
stripped = line.strip()
if not stripped or stripped.startswith("#"):
continue
if stripped.endswith("\\"):
current += stripped[:-1] + " "
continue
current += stripped
# Parse instruction
match = re.match(r"^(\w+)\s+(.*)", current.strip())
if match:
instructions.append({
"instruction": match.group(1).upper(),
"args": match.group(2),
"raw": current.strip(),
})
current = ""
return instructions
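A quick standalone check of the backslash-continuation handling; the parser above is reproduced here so the snippet runs on its own:

```python
import re

def parse_dockerfile(content):
    """Same parser as above: joins backslash continuations into one instruction."""
    instructions = []
    current = ""
    for line in content.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.endswith("\\"):
            current += stripped[:-1] + " "
            continue
        current += stripped
        match = re.match(r"^(\w+)\s+(.*)", current.strip())
        if match:
            instructions.append({
                "instruction": match.group(1).upper(),
                "args": match.group(2),
                "raw": current.strip(),
            })
        current = ""
    return instructions

sample = "FROM python:3.12-slim\nRUN apt-get update && \\\n    apt-get install -y curl\n"
parsed = parse_dockerfile(sample)
print(len(parsed))               # 2 (the two RUN lines join into one instruction)
print(parsed[1]["instruction"])  # RUN
```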
def analyze_layers(instructions):
"""Count and classify layers."""
layer_instructions = {"FROM", "RUN", "COPY", "ADD"}
layers = [i for i in instructions if i["instruction"] in layer_instructions]
stages = [i for i in instructions if i["instruction"] == "FROM"]
return {
"total_layers": len(layers),
"stages": len(stages),
"is_multistage": len(stages) > 1,
"run_count": sum(1 for i in instructions if i["instruction"] == "RUN"),
"copy_count": sum(1 for i in instructions if i["instruction"] == "COPY"),
"add_count": sum(1 for i in instructions if i["instruction"] == "ADD"),
}
def analyze_base_image(instructions):
"""Analyze base image choice."""
from_instructions = [i for i in instructions if i["instruction"] == "FROM"]
if not from_instructions:
return {"image": "unknown", "tag": "unknown", "estimated_size_mb": 0}
last_from = from_instructions[-1]["args"].split()[0]
parts = last_from.split(":")
image = parts[0]
tag = parts[1] if len(parts) > 1 else "latest"
    # Estimate size: check suffix variants (e.g. python-slim) before the
    # bare image name, so python:3.12-slim maps to python-slim, not python.
    image_base = image.split("/")[-1]
    candidates = []
    if "alpine" in tag:
        candidates.append(f"{image_base}-alpine")
    if "slim" in tag:
        candidates.append(f"{image_base}-slim")
    if "distroless" in image:
        candidates.append(f"distroless/{image_base}")
    candidates.append(image_base)
    size = next((BASE_IMAGE_SIZES[c] for c in candidates if c in BASE_IMAGE_SIZES), 0)
return {
"image": image,
"tag": tag,
"estimated_size_mb": size,
"is_alpine": "alpine" in tag,
"is_slim": "slim" in tag,
"is_distroless": "distroless" in image,
}
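One way to make a size lookup prefer tag-variant entries (e.g. python-slim) over the bare image name is sketched below; `SIZES` is a trimmed stand-in for `BASE_IMAGE_SIZES`, and the numbers are rough estimates, not measurements:

```python
# Trimmed stand-in for BASE_IMAGE_SIZES (rough estimates in MB).
SIZES = {"python": 900, "python-slim": 130, "python-alpine": 50}

def estimate_size(image_base, tag):
    """Try tag-variant entries first, then the bare image name, else 0."""
    candidates = []
    if "alpine" in tag:
        candidates.append(f"{image_base}-alpine")
    if "slim" in tag:
        candidates.append(f"{image_base}-slim")
    candidates.append(image_base)
    return next((SIZES[c] for c in candidates if c in SIZES), 0)

print(estimate_size("python", "3.12-slim"))  # 130
print(estimate_size("python", "3.12"))       # 900
print(estimate_size("nginx", "1.27"))        # 0 (unknown image)
```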
def run_pattern_checks(content, instructions):
"""Run anti-pattern checks."""
findings = []
for rule in ANTI_PATTERNS:
if rule["pattern"] is not None:
for match in re.finditer(rule["pattern"], content, re.MULTILINE | re.IGNORECASE):
findings.append({
"id": rule["id"],
"severity": rule["severity"],
"message": rule["message"],
"fix": rule["fix"],
"line": match.group(0).strip()[:80],
})
# Custom checks
# AP006: Multiple CMD
cmd_count = sum(1 for i in instructions if i["instruction"] == "CMD")
if cmd_count > 1:
r = next(r for r in ANTI_PATTERNS if r["id"] == "AP006")
findings.append({
"id": r["id"], "severity": r["severity"],
"message": r["message"], "fix": r["fix"],
"line": f"{cmd_count} CMD instructions found",
})
# AP009: No USER
has_user = any(i["instruction"] == "USER" for i in instructions)
if not has_user and instructions:
r = next(r for r in ANTI_PATTERNS if r["id"] == "AP009")
findings.append({
"id": r["id"], "severity": r["severity"],
"message": r["message"], "fix": r["fix"],
"line": "(no USER instruction found)",
})
# AP014: No HEALTHCHECK
has_healthcheck = any(i["instruction"] == "HEALTHCHECK" for i in instructions)
if not has_healthcheck and instructions:
r = next(r for r in ANTI_PATTERNS if r["id"] == "AP014")
findings.append({
"id": r["id"], "severity": r["severity"],
"message": r["message"], "fix": r["fix"],
"line": "(no HEALTHCHECK instruction found)",
})
return findings
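The custom (non-regex) checks reduce to simple counts over the parsed instruction list. The multiple-CMD rule (AP006) in isolation, using a hand-built instruction list:

```python
# AP006 in isolation: more than one CMD means Docker ignores all but the
# last, so the rule fires whenever cmd_count > 1.
instructions = [
    {"instruction": "FROM"},
    {"instruction": "CMD"},
    {"instruction": "CMD"},
]
cmd_count = sum(1 for i in instructions if i["instruction"] == "CMD")
print(cmd_count > 1)  # True (the rule fires)
```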
def generate_report(content, output_format="text", security_focus=False):
"""Generate full analysis report."""
instructions = parse_dockerfile(content)
layers = analyze_layers(instructions)
base = analyze_base_image(instructions)
findings = run_pattern_checks(content, instructions)
if security_focus:
security_ids = {"AP007", "AP009", "AP008"}
security_severities = {"critical", "high"}
findings = [f for f in findings if f["id"] in security_ids or f["severity"] in security_severities]
    # Deduplicate repeated findings (same rule id flagged on the same line)
seen_ids = set()
unique_findings = []
for f in findings:
key = (f["id"], f["line"])
if key not in seen_ids:
seen_ids.add(key)
unique_findings.append(f)
findings = unique_findings
# Sort by severity
severity_order = {"critical": 0, "high": 1, "medium": 2, "low": 3}
findings.sort(key=lambda f: severity_order.get(f["severity"], 4))
# Score (100 minus deductions)
deductions = {"critical": 25, "high": 15, "medium": 5, "low": 2}
score = max(0, 100 - sum(deductions.get(f["severity"], 0) for f in findings))
result = {
"score": score,
"base_image": base,
"layers": layers,
"findings": findings,
"finding_counts": {
"critical": sum(1 for f in findings if f["severity"] == "critical"),
"high": sum(1 for f in findings if f["severity"] == "high"),
"medium": sum(1 for f in findings if f["severity"] == "medium"),
"low": sum(1 for f in findings if f["severity"] == "low"),
},
}
if output_format == "json":
print(json.dumps(result, indent=2))
return result
# Text output
print(f"\n{'=' * 60}")
print(f" Dockerfile Analysis Report")
print(f"{'=' * 60}")
print(f" Score: {score}/100")
print(f" Base: {base['image']}:{base['tag']} (~{base['estimated_size_mb']}MB)")
print(f" Layers: {layers['total_layers']} | Stages: {layers['stages']} | Multi-stage: {'Yes' if layers['is_multistage'] else 'No'}")
print(f" RUN: {layers['run_count']} | COPY: {layers['copy_count']} | ADD: {layers['add_count']}")
print()
counts = result["finding_counts"]
print(f" Findings: {counts['critical']} critical | {counts['high']} high | {counts['medium']} medium | {counts['low']} low")
    print(f"{'-' * 60}")
for f in findings:
icon = {"critical": "!!!", "high": "!!", "medium": "!", "low": "~"}.get(f["severity"], "?")
print(f"\n [{f['id']}] {icon} {f['severity'].upper()}")
print(f" {f['message']}")
print(f" Line: {f['line']}")
print(f" Fix: {f['fix']}")
if not findings:
print("\n No issues found. Dockerfile looks good.")
print(f"\n{'=' * 60}\n")
return result
def main():
parser = argparse.ArgumentParser(
description="docker-development: Dockerfile static analyzer"
)
parser.add_argument("dockerfile", nargs="?", help="Path to Dockerfile (omit for demo)")
parser.add_argument(
"--output", "-o",
choices=["text", "json"],
default="text",
help="Output format (default: text)",
)
parser.add_argument(
"--security",
action="store_true",
help="Security-focused analysis only",
)
args = parser.parse_args()
if args.dockerfile:
path = Path(args.dockerfile)
if not path.exists():
print(f"Error: File not found: {args.dockerfile}", file=sys.stderr)
sys.exit(1)
content = path.read_text(encoding="utf-8")
else:
print("No Dockerfile provided. Running demo analysis...\n")
content = DEMO_DOCKERFILE
generate_report(content, args.output, args.security)
if __name__ == "__main__":
main()