claude-skills-reference/engineering-team/azure-cloud-architect/references/architecture_patterns.md
Reza Rezvani 2056ba251f feat(engineering-team): add azure-cloud-architect, security-pen-testing; extend terraform-patterns
azure-cloud-architect (451-line SKILL.md, 3 scripts, 3 references):
- 6-step workflow mirroring aws-solution-architect for Azure
- Bicep/ARM templates, AKS, Functions, Cosmos DB, cost optimization
- architecture_designer.py, cost_optimizer.py, bicep_generator.py

security-pen-testing (850-line SKILL.md, 3 scripts, 3 references):
- OWASP Top 10 systematic audit, offensive security testing
- XSS/SQLi/SSRF/IDOR detection, secret scanning, API security
- vulnerability_scanner.py, dependency_auditor.py, pentest_report_generator.py
- Responsible disclosure workflow included

terraform-patterns extended (487 → 740 lines):
- Multi-cloud provider configuration
- OpenTofu compatibility notes
- Infracost integration for PR cost estimation
- Import existing infrastructure patterns
- Terragrunt DRY multi-environment patterns

Updated engineering-team plugin.json (26 → 28 skills).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 13:32:22 +01:00


# Azure Architecture Patterns
Reference guide for selecting the right Azure architecture pattern based on application requirements.

---
## Table of Contents
- [Pattern Selection Matrix](#pattern-selection-matrix)
- [Pattern 1: App Service Web Application](#pattern-1-app-service-web-application)
- [Pattern 2: Microservices on AKS](#pattern-2-microservices-on-aks)
- [Pattern 3: Serverless Event-Driven](#pattern-3-serverless-event-driven)
- [Pattern 4: Data Pipeline](#pattern-4-data-pipeline)
- [Pattern 5: Multi-Region Active-Active](#pattern-5-multi-region-active-active)
- [Well-Architected Framework Alignment](#well-architected-framework-alignment)
---
## Pattern Selection Matrix
| Pattern | Best For | Users | Monthly Cost | Complexity |
|---------|----------|-------|--------------|------------|
| App Service Web | MVPs, SaaS, APIs | <100K | $50-500 | Low |
| Microservices on AKS | Complex platforms, multi-team | Any | $500-5000 | High |
| Serverless Event-Driven | Event processing, webhooks, APIs | <1M | $20-500 | Low-Medium |
| Data Pipeline | Analytics, ETL, ML | Any | $200-3000 | Medium-High |
| Multi-Region Active-Active | Global apps, 99.99% uptime | >100K | 1.5-2x single-region cost | High |
---
## Pattern 1: App Service Web Application
### Architecture
```
          ┌──────────────┐
          │  Azure Front │
          │     Door     │
          │  (CDN + WAF) │
          └──────┬───────┘
                 │
          ┌──────▼───────┐    ┌───────────────────┐
          │  App Service ├───►│     Key Vault     │
          │ (Linux P1v3) │    │  (secrets, certs) │
          │   + Slots    │    └───────────────────┘
          └──┬───────┬───┘
             │       │
   ┌─────────▼──┐ ┌──▼────────┐
   │ Azure SQL  │ │   Blob    │
   │ Serverless │ │  Storage  │
   └────────────┘ └───────────┘
```
### Services
| Service | Purpose | Configuration |
|---------|---------|---------------|
| Azure Front Door | Global CDN, WAF, SSL | Standard or Premium tier, custom domain |
| App Service | Web application hosting | Linux P1v3 (production), B1 (dev) |
| Azure SQL Database | Relational database | Serverless GP_S_Gen5_2 with auto-pause |
| Blob Storage | Static assets, uploads | Hot tier with lifecycle policies |
| Key Vault | Secrets management | RBAC authorization, soft-delete enabled |
| Application Insights | Monitoring and APM | Workspace-based, connected to Log Analytics |
| Entra ID | Authentication | Easy Auth or MSAL library |
### Deployment Strategy
- **Deployment slots**: staging slot for zero-downtime deploys, swap to production after validation
- **Auto-scale**: CPU-based rules, 1-10 instances in production
- **Health checks**: `/health` endpoint monitored by App Service and Front Door
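The CPU-based auto-scale rule above amounts to a threshold comparison clamped to an instance range. The sketch below is a minimal Python model of one evaluation of a scale-out/scale-in rule pair; the 70%/30% thresholds and the `ScaleRule`/`next_instance_count` names are hypothetical, and real Azure Monitor autoscale also averages the metric over a time window and waits out a cooldown before acting again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleRule:
    """Hypothetical thresholds for a CPU-based rule pair; tune per workload."""
    min_instances: int = 1
    max_instances: int = 10
    scale_out_cpu: float = 70.0  # average CPU % above which one instance is added
    scale_in_cpu: float = 30.0   # average CPU % below which one instance is removed


def next_instance_count(current: int, avg_cpu: float, rule: ScaleRule = ScaleRule()) -> int:
    """Return the instance count after a single evaluation of the rule pair."""
    if avg_cpu > rule.scale_out_cpu:
        target = current + 1
    elif avg_cpu < rule.scale_in_cpu:
        target = current - 1
    else:
        target = current
    # Never leave the configured instance range.
    return max(rule.min_instances, min(rule.max_instances, target))


print(next_instance_count(3, 85.0))   # 4
print(next_instance_count(10, 95.0))  # 10 (already at the maximum)
```

Keeping a wide gap between the scale-out and scale-in thresholds is what prevents the count from flapping between two values.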
### Cost Estimate
| Component | Dev | Production |
|-----------|-----|-----------|
| App Service | $13 (B1) | $75 (P1v3) |
| Azure SQL | $5 (Basic) | $40-120 (Serverless GP) |
| Front Door | $0 (disabled) | $35-55 |
| Blob Storage | $1 | $5-15 |
| Key Vault | $0.03 | $1-5 |
| Application Insights | $0 (free tier) | $5-20 |
| **Total** | **~$19** | **~$160-290** |
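The totals are sums of independent low/high estimates. A small sketch that reproduces the production range from the table's own figures (the dictionary keys are just labels):

```python
# (low, high) monthly USD per component, taken from the Production column above.
PRODUCTION = {
    "App Service (P1v3)": (75, 75),
    "Azure SQL (Serverless GP)": (40, 120),
    "Front Door": (35, 55),
    "Blob Storage": (5, 15),
    "Key Vault": (1, 5),
    "Application Insights": (5, 20),
}


def total_range(components):
    """Sum independent (low, high) estimates into a single (low, high) range."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


print(total_range(PRODUCTION))  # (161, 290)
```

The exact sum is $161-290; the table rounds the low end to ~$160.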
---
## Pattern 2: Microservices on AKS
### Architecture
```
          ┌──────────────┐
          │  Azure Front │
          │     Door     │
          └──────┬───────┘
                 │
          ┌──────▼───────┐
          │   API Mgmt   │
          │  (gateway)   │
          └──────┬───────┘
                 │
    ┌────────────▼────────────┐
    │       AKS Cluster       │
    │  ┌───────┐  ┌───────┐   │
    │  │ svc-A │  │ svc-B │   │
    │  └───┬───┘  └───┬───┘   │
    │      │          │       │
    │  ┌───▼──────────▼────┐  │
    │  │    Service Bus    │  │
    │  │   (async msgs)    │  │
    │  └───────────────────┘  │
    └──────┬───────────┬──────┘
           │           │
 ┌─────────▼──┐ ┌──────▼────┐
 │ Cosmos DB  │ │    ACR    │
 │   (data)   │ │ (images)  │
 └────────────┘ └───────────┘
```
### Services
| Service | Purpose | Configuration |
|---------|---------|---------------|
| AKS | Container orchestration | 3 node pools: system (D2s_v5), app (D4s_v5), jobs (spot) |
| API Management | API gateway, rate limiting | Standard v2 or Consumption tier |
| Cosmos DB | Multi-model database | Session consistency, autoscale RU/s |
| Service Bus | Async messaging | Standard tier, topics for pub/sub |
| Container Registry | Docker image storage | Basic (dev), Standard (prod) |
| Key Vault | Secrets for pods | CSI driver + workload identity |
| Azure Monitor | Cluster and app observability | Container Insights + App Insights |
### AKS Best Practices
**Node Pools:**
- System pool: 2-3 nodes, D2s_v5, tainted (`CriticalAddonsOnly=true:NoSchedule`) so only system pods are scheduled there
- App pool: 2-10 nodes (cluster autoscaler), D4s_v5, for application workloads
- Jobs pool: spot instances, for batch processing and CI runners

**Networking:**
- Azure CNI for VNet-native pod networking
- Network policies (Azure or Calico) for pod-to-pod isolation
- Ingress via NGINX Ingress Controller or Application Gateway Ingress Controller (AGIC)

**Security:**
- Microsoft Entra Workload ID for pod-to-Azure service authentication (replaces the deprecated pod-managed identity)
- Azure Policy for Kubernetes (built on OPA Gatekeeper)
- Defender for Containers for runtime threat detection
- Private cluster for production (API server not exposed to the internet)

**Deployment:**
- Helm charts for application packaging
- Flux or ArgoCD for GitOps
- Horizontal Pod Autoscaler (HPA) + KEDA for event-driven scaling
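HPA's core scaling rule is `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, skipped when the ratio is within a tolerance (0.1 by default). KEDA feeds event-source metrics such as queue length into the same calculation. The Python below is a simplified model of that formula for illustration, not the controller's code; the min/max replica bounds and the example metric targets are hypothetical.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                         min_replicas: int = 2, max_replicas: int = 10,
                         tolerance: float = 0.1) -> int:
    """Simplified HPA calculation: scale proportionally to metric/target, then clamp."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: do nothing
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(hpa_desired_replicas(4, 90, 60))  # 6: CPU at 150% of target
print(hpa_desired_replicas(4, 63, 60))  # 4: within the 10% tolerance
print(hpa_desired_replicas(3, 40, 5))   # 10: 40 msgs/replica vs a target of 5, capped at max
```

Real HPA additionally applies a scale-down stabilization window and special handling for pods that are not yet ready, so treat this only as a way to reason about replica counts.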
### Cost Estimate
| Component | Dev | Production |
|-----------|-----|-----------|
| AKS nodes (system) | $60 (1x D2s_v5) | $180 (3x D2s_v5) |
| AKS nodes (app) | $120 (1x D4s_v5) | $360 (3x D4s_v5) |
| API Management | $0 (Consumption) | $175 (Standard v2) |
| Cosmos DB | $25 (serverless) | $100-400 (autoscale) |
| Service Bus | $10 | $10-50 |
| Container Registry | $5 | $20 |
| Monitoring | $0 | $50-100 |
| **Total** | **~$220** | **~$900-1300** |
---
## Pattern 3: Serverless Event-Driven
### Architecture
```
┌──────────┐  ┌──────────┐  ┌──────────┐
│   HTTP   │  │   Blob   │  │  Timer   │
│ Trigger  │  │ Trigger  │  │ Trigger  │
└────┬─────┘  └────┬─────┘  └────┬─────┘
     │             │             │
     └───┬─────────┴─────────┬───┘
         │                   │
  ┌──────▼───────┐    ┌──────▼───────┐
  │    Azure     │    │    Azure     │
  │  Functions   │    │  Functions   │
  │  (handlers)  │    │  (workers)   │
  └──┬─────────┬─┘    └────────┬─────┘
     │         │               │
┌────▼─────┐ ┌─▼──────────┐ ┌──▼─────────┐
│  Event   │ │  Service   │ │ Cosmos DB  │
│   Grid   │ │ Bus Queue  │ │   (data)   │
│ (fanout) │ │ (reliable) │ │            │
└──────────┘ └────────────┘ └────────────┘
```
### Services
| Service | Purpose | Configuration |
|---------|---------|---------------|
| Azure Functions | Event handlers, APIs | Consumption plan (dev), Premium (prod) |
| Event Grid | Event routing and fan-out | System + custom topics |
| Service Bus | Reliable messaging with DLQ | Basic (queues only) or Standard (queues + topics) |
| Cosmos DB | Low-latency data store | Serverless (dev), autoscale (prod) |
| Blob Storage | File processing triggers | Lifecycle policies |
| Application Insights | Function monitoring | Sampling at 5-10% for high volume |
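Service Bus dead-letters a message once its delivery count passes the entity's maximum delivery count (10 by default), which keeps a poison message from blocking a handler forever. The sketch below is an in-memory simulation of that peek-lock behavior for illustration only; `QueueSim` and `Message` are hypothetical names, not the `azure-servicebus` SDK, and in a Function app the Service Bus trigger performs the complete/abandon steps for you.

```python
from dataclasses import dataclass, field

MAX_DELIVERY_COUNT = 10  # Service Bus default for queues and subscriptions


@dataclass
class Message:
    body: str
    delivery_count: int = 0


@dataclass
class QueueSim:
    """In-memory stand-in for a queue in peek-lock mode (not the real SDK)."""
    active: list = field(default_factory=list)
    dead_letter: list = field(default_factory=list)

    def process(self, handler):
        """Deliver messages until each one completes or is dead-lettered."""
        while self.active:
            msg = self.active.pop(0)
            msg.delivery_count += 1
            try:
                handler(msg.body)  # returning normally == completing the message
            except Exception:
                if msg.delivery_count >= MAX_DELIVERY_COUNT:
                    self.dead_letter.append(msg)  # moved to the dead-letter queue
                else:
                    self.active.append(msg)       # abandoned: redelivered later


def handler(body):
    if body == "poison":
        raise ValueError("cannot parse")


q = QueueSim(active=[Message("ok"), Message("poison")])
q.process(handler)
print([m.body for m in q.dead_letter], q.dead_letter[0].delivery_count)  # ['poison'] 10
```

Messages in the DLQ stay there until something reads them, so pair this with an alert on dead-letter message count.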
### Durable Functions Patterns
Use Durable Functions for orchestration instead of building custom state machines:

| Pattern | Use Case | Example |
|---------|----------|---------|
| Function chaining | Sequential steps | Order: validate -> charge -> fulfill -> notify |
| Fan-out/fan-in | Parallel processing | Process all images in a batch, aggregate results |
| Async HTTP APIs | Long-running operations | Start job, poll for status, return result |
| Monitor | Periodic polling | Check external API until condition met |
| Human interaction | Approval workflows | Send approval email, wait for response with timeout |
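The two most common shapes in the table, function chaining and fan-out/fan-in, map directly onto sequential `await` and `asyncio.gather`. This plain-asyncio sketch shows only the control flow; it has none of Durable Functions' checkpointing and replay, and the activity functions are hypothetical stand-ins. In the Durable Functions Python SDK the same shapes are written as a generator orchestrator that yields `context.call_activity(...)` and `context.task_all(...)`.

```python
import asyncio


# Hypothetical activities; in Durable Functions each would be an activity function.
async def validate(order):
    return {**order, "valid": True}


async def charge(order):
    return {**order, "charged": order["amount"]}


async def fulfill(order):
    return {**order, "shipped": True}


async def notify(order):
    return f"order {order['id']} complete"


async def process_order(order):
    """Function chaining: each step consumes the previous step's output."""
    order = await validate(order)
    order = await charge(order)
    order = await fulfill(order)
    return await notify(order)


async def resize(image_name):
    await asyncio.sleep(0)  # placeholder for real per-image work
    return len(image_name)


async def process_batch(image_names):
    """Fan-out/fan-in: start every task, then aggregate once all have finished."""
    sizes = await asyncio.gather(*(resize(name) for name in image_names))
    return sum(sizes)


print(asyncio.run(process_order({"id": 42, "amount": 19.99})))  # order 42 complete
print(asyncio.run(process_batch(["a.png", "bb.png", "ccc.png"])))  # 18
```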
### Cost Estimate
| Component | Dev | Production |
|-----------|-----|-----------|
| Functions (Consumption) | $0 (1M executions/month free) | $5-30 |
| Event Grid | $0 | $0-5 |
| Service Bus | $0 (Basic) | $10-30 |
| Cosmos DB | ~$0 (serverless, low volume) | $25-150 |
| Blob Storage | $1 | $5-15 |
| Application Insights | $0 | $5-15 |
| **Total** | **~$1** | **~$50-245** |
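The Consumption plan's monthly free grant covers 1M executions and 400,000 GB-s of execution time, which is why dev usage typically costs nothing. A rough estimator, assuming the list prices below ($0.20 per million executions, $0.000016 per GB-s; verify against the current pricing page) and the plan's rule of rounding memory up to the nearest 128 MB. It ignores the per-execution minimums (100 ms, 128 MB), so very short functions are underestimated.

```python
import math

# Assumed Consumption plan list prices; check the Azure pricing page before relying on them.
PRICE_PER_MILLION_EXECUTIONS = 0.20
PRICE_PER_GB_SECOND = 0.000016
FREE_EXECUTIONS = 1_000_000
FREE_GB_SECONDS = 400_000


def billed_memory_gb(memory_mb: float) -> float:
    """Memory is rounded up to the nearest 128 MB before billing."""
    return math.ceil(memory_mb / 128) * 128 / 1024


def monthly_cost(executions: int, avg_duration_s: float, avg_memory_mb: float) -> float:
    """Estimated monthly USD after subtracting the free grant."""
    gb_seconds = executions * avg_duration_s * billed_memory_gb(avg_memory_mb)
    exec_cost = max(0, executions - FREE_EXECUTIONS) / 1_000_000 * PRICE_PER_MILLION_EXECUTIONS
    compute_cost = max(0.0, gb_seconds - FREE_GB_SECONDS) * PRICE_PER_GB_SECOND
    return round(exec_cost + compute_cost, 2)


print(monthly_cost(3_000_000, 1.0, 500))  # 18.0
print(monthly_cost(100_000, 0.2, 128))    # 0.0
```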
---
## Pattern 4: Data Pipeline
### Architecture
```
┌──────────┐    ┌──────────┐
│ IoT/Apps │    │  Batch   │
│ (events) │    │ (files)  │
└────┬─────┘    └────┬─────┘
     │               │
┌────▼─────┐    ┌────▼─────┐
│  Event   │    │   Data   │
│   Hubs   │    │ Factory  │
└────┬─────┘    └────┬─────┘
     │               │
     └───────┬───────┘
             │
    ┌────────▼────────┐
    │    Data Lake    │
    │  Storage Gen2   │
    │  (raw/curated)  │
    └────────┬────────┘
             │
    ┌────────▼────────┐
    │     Synapse     │
    │    Analytics    │
    │  (SQL + Spark)  │
    └────────┬────────┘
             │
    ┌────────▼────────┐
    │    Power BI     │
    │  (dashboards)   │
    └─────────────────┘
```
### Services
| Service | Purpose | Configuration |
|---------|---------|---------------|
| Event Hubs | Real-time event ingestion | Standard, 2-8 partitions |
| Data Factory | Batch ETL orchestration | Managed, 90+ connectors |
| Data Lake Storage Gen2 | Raw and curated data lake | HNS enabled, lifecycle policies |
| Synapse Analytics | SQL and Spark analytics | Serverless SQL pool (pay-per-query) |
| Azure Functions | Lightweight processing | Triggered by Event Hubs or Blob |
| Power BI | Business intelligence | Pro ($10/user/month) |
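Throughput units are the main Event Hubs Standard sizing lever: each TU allows up to 1 MB/s or 1,000 events/s of ingress and 2 MB/s or 4,096 events/s of egress, whichever limit is hit first. A small sizing helper under those limits (the function name and example rates are hypothetical; confirm current quotas in the Event Hubs documentation):

```python
import math

# Per-TU limits for Event Hubs Standard.
INGRESS_MB_PER_TU = 1.0
INGRESS_EVENTS_PER_TU = 1000
EGRESS_MB_PER_TU = 2.0
EGRESS_EVENTS_PER_TU = 4096


def required_tus(ingress_mb_s: float, ingress_events_s: float,
                 egress_mb_s: float = 0.0, egress_events_s: float = 0.0) -> int:
    """Smallest TU count whose limits cover every peak rate (the tightest limit wins)."""
    needed = max(
        ingress_mb_s / INGRESS_MB_PER_TU,
        ingress_events_s / INGRESS_EVENTS_PER_TU,
        egress_mb_s / EGRESS_MB_PER_TU,
        egress_events_s / EGRESS_EVENTS_PER_TU,
    )
    return max(1, math.ceil(needed))


print(required_tus(1.75, 3500))  # 4: the event-count limit binds before the MB/s limit
print(required_tus(0.2, 300))    # 1
```

At ~$22 per TU per month, the production range in the cost table corresponds to roughly 2-8 TUs.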
### Data Lake Organization
```
data-lake/
├── raw/ # Landing zone — immutable source data
│ ├── source-system-a/
│ │ └── YYYY/MM/DD/ # Date-partitioned
│ └── source-system-b/
├── curated/ # Cleaned, validated, business-ready
│ ├── dimension/
│ └── fact/
├── sandbox/ # Ad-hoc exploration
└── archive/ # Cold storage (lifecycle policy target)
```
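Writers and lifecycle rules both depend on the `YYYY/MM/DD` convention above. The sketch builds a landing path under `raw/` and mimics how a lifecycle rule's `daysAfterModificationGreaterThan` threshold picks an access tier. The 30/180-day thresholds and the function names are hypothetical; in a real lifecycle policy each tier change is a rule action scoped with a `prefixMatch` filter (for example on `archive/`).

```python
from datetime import date


def raw_path(source_system: str, day: date, filename: str) -> str:
    """Build the date-partitioned landing path used under raw/."""
    return f"raw/{source_system}/{day:%Y/%m/%d}/{filename}"


# Hypothetical thresholds: (days since last modification, target tier), largest first.
TIER_RULES = [(180, "Archive"), (30, "Cool")]


def target_tier(age_days: int) -> str:
    """Mimic lifecycle rules: the largest threshold the blob's age exceeds wins."""
    for threshold, tier in TIER_RULES:
        if age_days > threshold:  # lifecycle conditions are strictly "greater than"
            return tier
    return "Hot"


print(raw_path("source-system-a", date(2026, 3, 25), "orders.json"))
# raw/source-system-a/2026/03/25/orders.json
print(target_tier(45))  # Cool
```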
### Cost Estimate
| Component | Dev | Production |
|-----------|-----|-----------|
| Event Hubs (Standard) | $22 (1 TU) | $44-176 (2-8 TUs) |
| Data Factory | $0 (free tier) | $50-200 |
| Data Lake Storage | $5 | $20-80 |
| Synapse Serverless SQL | $5 | $50-300 |
| Azure Functions | $0 | $5-20 |
| Power BI Pro | $10/user | $10/user |
| **Total** | **~$42** | **~$180-800** |
---
## Pattern 5: Multi-Region Active-Active
### Architecture
```
          ┌──────────────┐
          │  Azure Front │
          │ Door (Global │
          │  LB + WAF)   │
          └──┬────────┬──┘
             │        │
  ┌──────────▼──┐  ┌──▼──────────┐
  │  Region 1   │  │  Region 2   │
  │  (East US)  │  │  (West EU)  │
  │             │  │             │
  │ App Service │  │ App Service │
  │   + SQL     │  │   + SQL     │
  │   + Redis   │  │   + Redis   │
  └──────┬──────┘  └──────┬──────┘
         │                │
  ┌──────▼────────────────▼──────┐
  │          Cosmos DB           │
  │    (multi-region writes)     │
  │     Session consistency      │
  └──────────────────────────────┘
```
### Multi-Region Design Decisions
| Decision | Recommendation | Rationale |
|----------|---------------|-----------|
| Global load balancer | Front Door Premium | Built-in WAF, CDN, health probes, fastest failover |
| Database replication | Cosmos DB multi-write or SQL failover groups | Cosmos for global writes, SQL for relational needs |
| Session state | Azure Cache for Redis (per region) | Local sessions, avoid cross-region latency |
| Static content | Front Door CDN | Edge-cached from a Blob Storage (or App Service) origin |
| DNS strategy | Front Door handles routing | No separate Traffic Manager needed |
| Failover | Automatic (Front Door health probes) | 10-30 second detection, automatic reroute |
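The 99.99% target in the selection matrix comes from composite availability math: components in series multiply, while redundant regions in parallel fail only when all of them fail. The availability figures below are placeholders, not quoted SLAs; substitute the current published values for each service.

```python
def serial(*availabilities: float) -> float:
    """Every component must be up, so availabilities multiply."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities: float) -> float:
    """At least one redundant copy must be up: 1 - P(all copies down)."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


# Placeholder availabilities (fractions), not published SLAs.
GLOBAL_LB = 0.9999
APP_TIER = 0.9995
DATA_TIER = 0.9999

region = serial(APP_TIER, DATA_TIER)
single_region = serial(GLOBAL_LB, region)
active_active = serial(GLOBAL_LB, parallel(region, region))
print(f"single region: {single_region:.4%}")
print(f"active-active: {active_active:.4%}")
```

With these placeholder numbers a single region lands near 99.93%, while two regions behind the global load balancer reach roughly 99.99%. The load balancer then caps the result, because it sits in series with everything else.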
### Azure SQL Failover Groups vs Cosmos DB Multi-Region
| Feature | SQL Failover Groups | Cosmos DB Multi-Region |
|---------|-------------------|----------------------|
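| Replication | Async (RPO ~5s) | Async across regions; synchronous only with Strong consistency (single write region) |
| Write region | Single primary | Multi-write capable |
| Failover | Automatic (after a grace period of at least 1 hour) or manual | Automatic (enable service-managed failover on single-write accounts) |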
| Consistency | Strong (single writer) | 5 levels (session recommended) |
| Cost | 2x compute (active-passive) | Per-region RU/s charge |
| Best for | Relational data, transactions | Document data, global low-latency |
### Cost Impact
Multi-region typically costs 1.5-2x single region:
- Compute: 2x (running in both regions)
- Database: 1.5-2x (replication, multi-write)
- Networking: Additional cross-region data transfer (~$0.02-0.05/GB)
- Front Door Premium: ~$330/month base fee, plus per-request and data transfer charges
---
## Well-Architected Framework Alignment
Every architecture pattern should address all five pillars of the Azure Well-Architected Framework.
### Reliability
- Deploy across Availability Zones (zone-redundant App Service, AKS, SQL)
- Enable health probes at every layer
- Implement retry policies with exponential backoff (Polly for .NET, tenacity for Python)
- Define RPO/RTO and test disaster recovery quarterly
- Use Azure Chaos Studio for fault injection testing
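Polly and tenacity both implement the same core loop: retry only transient errors, grow the delay exponentially, cap it, and add jitter so that many clients don't retry in lockstep. A stdlib-only sketch of that loop (the function name, defaults, and the choice of "full jitter" are illustrative assumptions; Azure SDK clients also ship their own configurable retry policies):

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=8.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(), retrying transient failures with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay ceiling doubles per attempt (0.5s, 1s, 2s, ...) up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))  # full jitter spreads out retry storms


calls = {"n": 0}


def flaky():
    """Fails twice with a transient error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"


delays = []
print(retry_with_backoff(flaky, sleep=delays.append), calls["n"], len(delays))  # ok 3 2
```

Errors outside `retry_on` (for example a `ValueError` from a bug) propagate immediately, which is the behavior you want for non-transient failures.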
### Security
- Entra ID for all human and service authentication
- Managed Identity for all Azure service-to-service communication
- Key Vault for secrets, certificates, and encryption keys — no secrets in code or config
- Private Endpoints for all PaaS services in production
- Microsoft Defender for Cloud for threat detection and compliance
### Cost Optimization
- Use serverless and consumption-based services where possible
- Auto-pause Azure SQL in dev/test (serverless tier)
- Spot VMs for fault-tolerant AKS node pools
- Reserved Instances for steady-state production workloads (about 35% savings on a 1-year term; exact discounts vary by VM series and region)
- Azure Advisor cost recommendations — review weekly
- Set budgets and alerts at subscription and resource group level
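Why reservations suit steady-state workloads: a reservation bills every hour of the term at the discounted rate, whereas pay-as-you-go bills only the hours a VM actually runs, so the reservation wins only above a utilization of `1 - discount`. With a 35% one-year discount, break-even is 65% utilization. The sketch ignores savings plans, instance-size flexibility, and exchanges; the $0.10/hour rate is hypothetical.

```python
def ri_break_even_utilization(discount: float) -> float:
    """Fraction of hours a VM must run for a reservation to beat pay-as-you-go."""
    return 1.0 - discount


def monthly_costs(payg_hourly: float, discount: float, hours_used: float,
                  hours_in_month: float = 730.0):
    """Return (pay-as-you-go cost, reserved cost) for one VM over one month."""
    payg = payg_hourly * hours_used                                # pay only for hours used
    reserved = payg_hourly * (1.0 - discount) * hours_in_month    # pay for every hour
    return payg, reserved


print(ri_break_even_utilization(0.35))            # 0.65
print(monthly_costs(0.10, 0.35, hours_used=730))  # always on: reservation is cheaper
print(monthly_costs(0.10, 0.35, hours_used=300))  # office hours only: PAYG is cheaper
```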
### Operational Excellence
- Bicep for all infrastructure (no manual portal deployments)
- GitOps for AKS (Flux or ArgoCD)
- Deployment slots or blue-green for zero-downtime deploys
- Centralized logging in Log Analytics with standardized KQL queries
- Azure DevOps or GitHub Actions for CI/CD with workload identity federation
### Performance Efficiency
- Application Insights for distributed tracing and performance profiling
- Azure Cache for Redis for session state and hot-path caching
- Front Door for edge caching and global acceleration
- Autoscale rules on compute (CPU, memory, HTTP queue length)
- Load testing with Azure Load Testing before production launch
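Hot-path caching with Azure Cache for Redis usually follows the cache-aside pattern: read the cache, fall back to the database on a miss, then write the result back with a TTL. To keep the example self-contained, a hypothetical in-memory `TTLCache` stands in for Redis; with redis-py the equivalent calls would be `r.get(key)` and `r.set(key, value, ex=ttl_seconds)`. The injectable clock exists only to make expiry testable.

```python
import time


class TTLCache:
    """Hypothetical single-process stand-in for Azure Cache for Redis."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired, like a Redis key set with EX
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)


def get_product(cache, product_id, load_from_db):
    """Cache-aside: try the cache; on a miss, load from the database and populate."""
    cached = cache.get(product_id)
    if cached is not None:
        return cached
    value = load_from_db(product_id)
    cache.set(product_id, value)
    return value
```

The TTL bounds how stale a cached value can get; writes that must be visible immediately should also delete or overwrite the cached key.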