claude-skills-reference/engineering-team/azure-cloud-architect/references/service_selection.md
Reza Rezvani 2056ba251f feat(engineering-team): add azure-cloud-architect, security-pen-testing; extend terraform-patterns
2026-03-25 13:32:22 +01:00

Azure Service Selection Guide

Quick reference for choosing the right Azure service based on workload requirements.



Compute Services

Decision Matrix

| Requirement | Recommended Service |
| --- | --- |
| Event-driven, short tasks (<10 min) | Azure Functions (Consumption) |
| Event-driven, longer tasks (<30 min) | Azure Functions (Premium) |
| Containerized apps, simple deployment | Azure Container Apps |
| Full Kubernetes control | AKS |
| Traditional web apps (PaaS) | App Service |
| GPU, HPC, custom OS | Virtual Machines |
| Batch processing | Azure Batch |
| Simple container from source | App Service (container) |

Azure Functions vs Container Apps vs AKS vs App Service

| Feature | Functions | Container Apps | AKS | App Service |
| --- | --- | --- | --- | --- |
| Scale to zero | Yes (Consumption) | Yes | No (min 1 node) | No |
| Kubernetes | No | Built on K8s (abstracted) | Full K8s | No |
| Cold start | 1-5s (Consumption) | 0-2s | N/A | N/A |
| Max execution time | 10 min (Consumption); 30 min default, configurable longer (Premium) | Unlimited | Unlimited | Unlimited |
| Languages | C#, JS, Python, Java, PowerShell; Go and Rust via custom handlers | Any container | Any container | .NET, Node, Python, Java, PHP, Ruby |
| Pricing model | Per-execution | Per vCPU-second | Per node | Per plan |
| Best for | Event handlers, APIs, scheduled jobs | Microservices, APIs | Complex platforms, multi-team | Web apps, APIs, mobile backends |
| Operational complexity | Low | Low-Medium | High | Low |
| Dapr integration | No | Built-in | Manual | No |
| KEDA autoscaling | No | Built-in | Manual install | No |

Opinionated recommendation:

  • Start with App Service for web apps and APIs — simplest operational model.
  • Use Container Apps for microservices — serverless containers without Kubernetes complexity.
  • Use AKS only when you need full Kubernetes API access (custom operators, service mesh, multi-cluster).
  • Use Functions for event-driven glue (queue processing, webhooks, scheduled jobs).
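
The recommendations above can be read as a priority-ordered decision procedure. The sketch below is one possible encoding, not an official selection tool: the `Workload` fields and the `recommend_compute` function are hypothetical names introduced for illustration, and the 10 and 30 minute thresholds are taken from the decision matrix.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload description; field names are illustrative."""
    event_driven: bool = False
    max_runtime_minutes: float = 0.0
    containerized: bool = False
    needs_kubernetes_api: bool = False
    needs_gpu_or_custom_os: bool = False


def recommend_compute(w: Workload) -> str:
    """Apply the guide's compute recommendations in priority order."""
    if w.needs_gpu_or_custom_os:
        return "Virtual Machines"
    if w.needs_kubernetes_api:
        return "AKS"
    # Thresholds follow the decision matrix, not hard platform limits.
    if w.event_driven and w.max_runtime_minutes < 10:
        return "Azure Functions (Consumption)"
    if w.event_driven and w.max_runtime_minutes < 30:
        return "Azure Functions (Premium)"
    if w.containerized:
        return "Azure Container Apps"
    # Default: simplest operational model for web apps and APIs.
    return "App Service"


if __name__ == "__main__":
    print(recommend_compute(Workload(event_driven=True, max_runtime_minutes=2)))
```

The order of the checks matters: hard requirements (GPU, Kubernetes API access) are tested first so that a convenience default never overrides them.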

VM Size Selection

| Workload | Series | Example | vCPUs | RAM | Use Case |
| --- | --- | --- | --- | --- | --- |
| General purpose | Dv5/Dsv5 | Standard_D4s_v5 | 4 | 16 GB | Web servers, small databases |
| Memory optimized | Ev5/Esv5 | Standard_E8s_v5 | 8 | 64 GB | Databases, caching, analytics |
| Compute optimized | Fv2/Fsv2 | Standard_F8s_v2 | 8 | 16 GB | Batch processing, ML inference |
| Storage optimized | Lsv3 | Standard_L8s_v3 | 8 | 64 GB | Data warehouses, large databases |
| GPU | NCv3/NDv4 | Standard_NC6s_v3 | 6 | 112 GB | ML training, rendering |

Prefer the newest generation available in each series (v5 or newer for the D and E series). Newer generations generally offer better price-performance than older ones.
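
One way to read the table is by memory per vCPU: the examples work out to 2 GB per vCPU for Fsv2, 4 for Dsv5, and 8 for Esv5. The sketch below turns that ratio into a rough first-pass heuristic. The function name `suggest_series` is hypothetical, and the heuristic deliberately ignores GPU and storage-optimized sizes, which are chosen for accelerators or local disks rather than for their memory ratio.

```python
def suggest_series(vcpus: int, ram_gb: float) -> str:
    """Suggest a VM series from the required memory-to-vCPU ratio.

    Heuristic only: ratios are derived from the example sizes in the table.
    """
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    ratio = ram_gb / vcpus
    if ratio <= 2:
        return "Fsv2"   # compute optimized, about 2 GB per vCPU
    if ratio <= 4:
        return "Dsv5"   # general purpose, about 4 GB per vCPU
    return "Esv5"       # memory optimized, about 8 GB per vCPU


if __name__ == "__main__":
    print(suggest_series(4, 16))
```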


Database Services

Decision Matrix

| Requirement | Recommended Service |
| --- | --- |
| Relational, SQL Server compatible | Azure SQL Database |
| Relational, PostgreSQL | Azure Database for PostgreSQL Flexible Server |
| Relational, MySQL | Azure Database for MySQL Flexible Server |
| Document / multi-model, global distribution | Cosmos DB |
| Key-value cache, sessions | Azure Cache for Redis |
| Time-series, IoT data | Azure Data Explorer (Kusto) |
| Full-text search | Azure AI Search (formerly Cognitive Search) |
| Graph database | Cosmos DB (Gremlin API) |

Cosmos DB vs Azure SQL vs PostgreSQL

| Feature | Cosmos DB | Azure SQL | PostgreSQL Flexible |
| --- | --- | --- | --- |
| Data model | Document, key-value, graph, table, column | Relational | Relational + JSON |
| Global distribution | Native multi-region writes | Geo-replication (async) | Read replicas |
| Consistency | 5 levels (strong to eventual) | Strong | Strong |
| Scaling | RU/s (auto or manual) | DTU or vCore | vCore |
| Serverless tier | Yes | Yes | No |
| Best for | Global apps, variable schema, low-latency reads | OLTP, complex queries, transactions | PostgreSQL ecosystem, extensions |
| Pricing model | Per RU/s + storage | Per DTU or per vCore | Per vCore |
| Managed backups | Continuous + point-in-time | Automatic + long-term retention | Automatic |

Opinionated recommendation:

  • Default to Azure SQL Serverless for most relational workloads — auto-pause saves money in dev/staging.
  • Use PostgreSQL Flexible when you need PostGIS, full-text search, or specific PostgreSQL extensions.
  • Use Cosmos DB only when you need global distribution, sub-10ms latency, or flexible schema.
  • Never use Cosmos DB for workloads that need complex joins or transactions across partitions.
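
A minimal sketch of the same rules as code, assuming the caller can answer four yes/no questions about the workload. The function and parameter names are hypothetical. Note that the joins and cross-partition transactions check runs before the Cosmos DB check, which is how the "never" rule above is enforced.

```python
def recommend_database(
    *,
    needs_postgres_extensions: bool = False,
    needs_joins_or_cross_partition_tx: bool = False,
    needs_global_distribution: bool = False,
    needs_flexible_schema: bool = False,
) -> str:
    """Apply the guide's database recommendations in priority order."""
    if needs_postgres_extensions:
        return "Azure Database for PostgreSQL Flexible Server"
    # Complex joins or cross-partition transactions rule out Cosmos DB,
    # even when global distribution would otherwise suggest it.
    if needs_joins_or_cross_partition_tx:
        return "Azure SQL Database (Serverless)"
    if needs_global_distribution or needs_flexible_schema:
        return "Cosmos DB"
    # Default for most relational workloads.
    return "Azure SQL Database (Serverless)"


if __name__ == "__main__":
    print(recommend_database(needs_global_distribution=True))
```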

Azure SQL Tier Selection

| Tier | Use Case | Compute | Cost Range |
| --- | --- | --- | --- |
| Basic / S0 | Dev/test, tiny workloads | 5 DTUs (Basic), 10 DTUs (S0) | $5-15/month |
| General Purpose (Serverless) | Variable workloads, dev/staging | 0.5-40 vCores (auto-pause) | $40-800/month |
| General Purpose (Provisioned) | Steady production workloads | 2-80 vCores | $150-3000/month |
| Business Critical | High IOPS, low latency, readable secondary | 2-128 vCores | $400-8000/month |
| Hyperscale | Large databases (>4 TB), instant scaling | 2-128 vCores | $200-5000/month |

Storage Services

Decision Matrix

| Requirement | Recommended Service |
| --- | --- |
| Unstructured data (files, images, backups) | Blob Storage |
| File shares (SMB/NFS) | Azure Files |
| High-performance file shares | Azure NetApp Files |
| Data Lake (analytics, big data) | Data Lake Storage Gen2 |
| Disk storage for VMs | Managed Disks |
| Queue-based messaging (simple) | Queue Storage |
| Table data (simple key-value) | Table Storage (or Cosmos DB Table API) |

Blob Storage Tiers

| Tier | Access Pattern | Cost (per GB/month) | Access Cost | Use Case |
| --- | --- | --- | --- | --- |
| Hot | Frequent access | $0.018 | Low | Active data, web content |
| Cool | Infrequent (30+ days) | $0.01 | Medium | Backups, older data |
| Cold | Rarely accessed (90+ days) | $0.0036 | Higher | Compliance archives |
| Archive | Almost never (180+ days) | $0.00099 | High (rehydrate required) | Long-term retention |

Always set lifecycle management policies. Rule of thumb: Hot for 30 days, Cool for 90 days, Cold or Archive after that.
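
The rule of thumb can be sketched as a lifecycle management policy document. The builder below is illustrative: `build_lifecycle_policy` and the rule name are hypothetical, and the dictionary follows the shape of the Azure Storage lifecycle policy JSON as best understood here, so verify the field names against the current schema before deploying. It also assumes "Cool for 90 days" means 90 days spent in Cool, which puts the final transition at day 120.

```python
import json


def build_lifecycle_policy(
    cool_after_days: int = 30,
    cool_duration_days: int = 90,
    final_tier: str = "Archive",
) -> dict:
    """Build a lifecycle policy dict for block blobs (sketch, not validated)."""
    if final_tier not in ("Cold", "Archive"):
        raise ValueError("final_tier must be 'Cold' or 'Archive'")
    final_action = "tierToCold" if final_tier == "Cold" else "tierToArchive"
    # Assumption: days in Cool are counted after the move out of Hot.
    final_after_days = cool_after_days + cool_duration_days
    return {
        "rules": [
            {
                "enabled": True,
                "name": "age-based-tiering",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blobTypes": ["blockBlob"]},
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {
                                "daysAfterModificationGreaterThan": cool_after_days
                            },
                            final_action: {
                                "daysAfterModificationGreaterThan": final_after_days
                            },
                        }
                    },
                },
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_lifecycle_policy(), indent=2))
```

Keeping the policy as generated JSON makes it straightforward to review in a pull request and to apply through whichever deployment tool the project already uses.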


Messaging and Events

Decision Matrix

| Requirement | Recommended Service |
| --- | --- |
| Pub/sub, event routing, reactive | Event Grid |
| Reliable message queues, transactions | Service Bus |
| High-throughput event streaming | Event Hubs |
| Simple task queues | Queue Storage |
| IoT device telemetry | IoT Hub |

Event Grid vs Service Bus vs Event Hubs

| Feature | Event Grid | Service Bus | Event Hubs |
| --- | --- | --- | --- |
| Pattern | Pub/Sub events | Message queue / topic | Event streaming |
| Delivery | At-least-once | At-least-once (peek-lock) | At-least-once (partitioned) |
| Ordering | No guarantee | FIFO (sessions) | Per partition |
| Max message size | 1 MB | 256 KB (Standard), 100 MB (Premium) | 1 MB (Standard, Premium), 20 MB (Dedicated) |
| Retention | 24 hours | 14 days (Standard) | 1-90 days |
| Throughput | Millions/sec | Thousands/sec | Millions/sec |
| Best for | Reactive events, webhooks | Business workflows, commands | Telemetry, logs, analytics |
| Dead letter | Yes | Yes | Via capture to storage |

Opinionated recommendation:

  • Event Grid for reactive, fan-out scenarios (blob uploaded, resource created, custom events).
  • Service Bus for reliable business messaging (orders, payments, workflows). Use topics for pub/sub, queues for point-to-point.
  • Event Hubs for high-volume telemetry, log aggregation, and streaming analytics.
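
As a rough sketch, the three recommendations plus the "simple task queues" row of the decision matrix can be combined into one lookup. The function and parameter names are hypothetical, and the check order (streaming volume first, then reliability, then fan-out) is an assumption about which requirement should win when several apply.

```python
def recommend_messaging(
    *,
    high_volume_stream: bool = False,
    needs_ordering_or_transactions: bool = False,
    fan_out_events: bool = False,
    multiple_subscribers: bool = False,
) -> str:
    """Pick a messaging service following the guide's recommendations."""
    if high_volume_stream:
        return "Event Hubs"
    if needs_ordering_or_transactions:
        # Topics for pub/sub, queues for point-to-point.
        kind = "topic" if multiple_subscribers else "queue"
        return f"Service Bus ({kind})"
    if fan_out_events:
        return "Event Grid"
    return "Queue Storage"


if __name__ == "__main__":
    print(recommend_messaging(needs_ordering_or_transactions=True))
```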

Networking

Decision Matrix

| Requirement | Recommended Service |
| --- | --- |
| Global HTTP load balancing + CDN + WAF | Azure Front Door |
| Regional Layer 7 load balancing + WAF | Application Gateway |
| Regional Layer 4 load balancing | Azure Load Balancer |
| DNS management | Azure DNS |
| DNS-based global traffic routing | Traffic Manager |
| Private connectivity to PaaS | Private Endpoints |
| Site-to-site VPN | VPN Gateway |
| Dedicated private connection | ExpressRoute |
| Outbound internet from VNet | NAT Gateway |
| DDoS protection | Azure DDoS Protection |

Front Door vs Application Gateway vs Load Balancer

| Feature | Front Door | Application Gateway | Load Balancer |
| --- | --- | --- | --- |
| Layer | 7 (HTTP/HTTPS) | 7 (HTTP/HTTPS) | 4 (TCP/UDP) |
| Scope | Global | Regional | Regional |
| WAF | Yes (Premium) | Yes (v2) | No |
| SSL termination | Yes | Yes | No |
| CDN | Built-in | No | No |
| Health probes | Yes | Yes | Yes |
| Best for | Global web apps, multi-region | Single-region web apps | TCP/UDP workloads, internal LB |

Security and Identity

Decision Matrix

| Requirement | Recommended Service |
| --- | --- |
| User authentication | Entra ID (Azure AD) |
| B2C customer identity | Entra External ID (Azure AD B2C) |
| Secrets, keys, certificates | Key Vault |
| Service-to-service auth | Managed Identity |
| Network access control | NSGs + Private Endpoints |
| Web application firewall | Front Door WAF or App Gateway WAF |
| Threat detection | Microsoft Defender for Cloud |
| Policy enforcement | Azure Policy |
| Privileged access management | Entra ID PIM |

Managed Identity Usage

| Scenario | Configuration |
| --- | --- |
| App Service accessing SQL | System-assigned MI + Azure SQL Entra auth |
| Functions accessing Key Vault | System-assigned MI + Key Vault RBAC |
| AKS pods accessing Cosmos DB | Workload Identity + Cosmos DB RBAC |
| VM accessing Storage | System-assigned MI + Storage RBAC |
| DevOps pipeline deploying | Workload Identity Federation (no secrets) |

Rule: Every Azure service that supports Managed Identity should use it. No connection strings with passwords, no service principal secrets in config.
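
One way to enforce this rule in review or CI is to scan application settings for values that look like embedded credentials. The sketch below uses only the standard library. The function name `find_embedded_secrets`, the sample setting names, and the patterns are illustrative assumptions, and the patterns are far from exhaustive.

```python
import re

# Illustrative patterns only; real connection strings vary by service.
_SECRET_PATTERNS = [
    re.compile(r"(?i)\b(password|pwd)\s*="),
    re.compile(r"(?i)\baccountkey\s*="),
    re.compile(r"(?i)\bsharedaccesskey\s*="),
    re.compile(r"(?i)client_?secret"),
]


def find_embedded_secrets(settings: dict[str, str]) -> list[str]:
    """Return sorted names of settings that appear to embed a credential."""
    flagged = []
    for name, value in settings.items():
        # Check the name too, so settings like AZURE_CLIENT_SECRET are caught.
        text = f"{name}={value}"
        if any(p.search(text) for p in _SECRET_PATTERNS):
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__":
    sample = {  # hypothetical settings
        "SqlConnection": "Server=tcp:example.database.windows.net;"
                         "Database=app;User ID=app;Password=hunter2;",
        "SqlConnectionMI": "Server=tcp:example.database.windows.net;"
                           "Database=app;"
                           "Authentication=Active Directory Managed Identity;",
    }
    print(find_embedded_secrets(sample))
```

A check like this only catches the obvious cases. It complements the rule rather than replacing a proper secret scanner.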


Monitoring and Observability

Decision Matrix

| Requirement | Recommended Service |
| --- | --- |
| Application performance monitoring | Application Insights |
| Log aggregation and queries | Log Analytics (KQL) |
| Metrics and alerts | Azure Monitor |
| Dashboards | Azure Dashboard or Grafana (managed) |
| Distributed tracing | Application Insights (OpenTelemetry) |
| Cost monitoring | Cost Management + Budgets |
| Security monitoring | Microsoft Defender for Cloud |
| Compliance monitoring | Azure Policy + Regulatory Compliance |

Every resource should have diagnostic settings sending logs and metrics to a Log Analytics workspace. Non-negotiable for production.
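
A minimal audit sketch for this requirement, assuming a resource inventory has already been exported into a list of dictionaries. The `name` and `log_analytics_workspace_id` keys are hypothetical and do not correspond to any specific Azure API response.

```python
def missing_diagnostics(resources: list[dict]) -> list[str]:
    """Return names of resources with no Log Analytics workspace configured."""
    return [
        r["name"]
        for r in resources
        if not r.get("log_analytics_workspace_id")
    ]


if __name__ == "__main__":
    inventory = [  # hypothetical inventory export
        {"name": "app-prod", "log_analytics_workspace_id": "workspace-1"},
        {"name": "sql-prod", "log_analytics_workspace_id": None},
        {"name": "kv-prod"},
    ]
    print(missing_diagnostics(inventory))
```

Failing a pipeline when this list is non-empty is one way to make the requirement enforceable rather than advisory.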