Files
claude-skills-reference/engineering-team/senior-architect/references/tech_decision_guide.md
Alireza Rezvani 94224f2201 feat(senior-architect): Complete skill overhaul per Issue #48 (#88)
Addresses SkillzWave feedback and Anthropic best practices:

SKILL.md (343 lines):
- Third-person description with trigger phrases
- Added Table of Contents for navigation
- Concrete tool descriptions with usage examples
- Decision workflows: Database, Architecture Pattern, Monolith vs Microservices
- Removed marketing fluff, added actionable content

References (rewritten with real content):
- architecture_patterns.md: 9 patterns with trade-offs, code examples
  (Monolith, Modular Monolith, Microservices, Event-Driven, CQRS,
  Event Sourcing, Hexagonal, Clean Architecture, API Gateway)
- system_design_workflows.md: 6 step-by-step workflows
  (System Design Interview, Capacity Planning, API Design,
  Database Schema, Scalability Assessment, Migration Planning)
- tech_decision_guide.md: 7 decision frameworks with matrices
  (Database, Cache, Message Queue, Auth, Frontend, Cloud, API)

Scripts (fully functional, standard library only):
- architecture_diagram_generator.py: Mermaid + PlantUML + ASCII output
  Scans project structure, detects components, relationships
- dependency_analyzer.py: npm/pip/go/cargo support
  Circular dependency detection, coupling score calculation
- project_architect.py: Pattern detection (7 patterns)
  Layer violation detection, code quality metrics

All scripts tested and working.

Closes #48

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 10:29:14 +01:00

413 lines
12 KiB
Markdown

# Technology Decision Guide
Decision frameworks and comparison matrices for common technology choices.
## Decision Frameworks Index
1. [Database Selection](#1-database-selection)
2. [Caching Strategy](#2-caching-strategy)
3. [Message Queue Selection](#3-message-queue-selection)
4. [Authentication Strategy](#4-authentication-strategy)
5. [Frontend Framework Selection](#5-frontend-framework-selection)
6. [Cloud Provider Selection](#6-cloud-provider-selection)
7. [API Style Selection](#7-api-style-selection)
---
## 1. Database Selection
### SQL vs NoSQL Decision Matrix
| Factor | Choose SQL | Choose NoSQL |
|--------|-----------|--------------|
| Data relationships | Complex, many-to-many | Simple, denormalized OK |
| Schema | Well-defined, stable | Evolving, flexible |
| Transactions | ACID required | Eventual consistency OK |
| Query patterns | Complex joins, aggregations | Key-value, document lookups |
| Scale | Vertical (some horizontal) | Horizontal first |
| Team expertise | Strong SQL skills | Document/KV experience |
### Database Type Selection
**Relational (SQL):**
| Database | Best For | Avoid When |
|----------|----------|------------|
| PostgreSQL | General purpose, JSON support, extensions | Simple key-value only |
| MySQL | Web applications, read-heavy | Complex queries, JSON-heavy |
| SQLite | Embedded, development, small apps | Concurrent writes, scale |
**Document (NoSQL):**
| Database | Best For | Avoid When |
|----------|----------|------------|
| MongoDB | Flexible schema, rapid iteration | Complex transactions |
| CouchDB | Offline-first, sync required | High throughput |
**Key-Value:**
| Database | Best For | Avoid When |
|----------|----------|------------|
| Redis | Caching, sessions, real-time | Persistence critical |
| DynamoDB | Serverless, auto-scaling | Complex queries |
**Wide-Column:**
| Database | Best For | Avoid When |
|----------|----------|------------|
| Cassandra | Write-heavy, time-series | Complex queries, small scale |
| ScyllaDB | Cassandra alternative, performance | Small datasets |
**Time-Series:**
| Database | Best For | Avoid When |
|----------|----------|------------|
| TimescaleDB | Time-series with SQL | Non-time-series data |
| InfluxDB | Metrics, monitoring | Relational queries |
**Search:**
| Database | Best For | Avoid When |
|----------|----------|------------|
| Elasticsearch | Full-text search, logs | Primary data store |
| Meilisearch | Simple search, fast setup | Complex analytics |
### Quick Decision Flow
```
Start
├─ Need ACID transactions? ──Yes──► PostgreSQL/MySQL
├─ Flexible schema needed? ──Yes──► MongoDB
├─ Write-heavy (>50K/sec)? ──Yes──► Cassandra/ScyllaDB
├─ Key-value access only? ──Yes──► Redis/DynamoDB
├─ Time-series data? ──Yes──► TimescaleDB/InfluxDB
├─ Full-text search? ──Yes──► Elasticsearch
└─ Default ──────────────────────► PostgreSQL
```
---
## 2. Caching Strategy
### Cache Type Selection
| Type | Use Case | Invalidation | Complexity |
|------|----------|--------------|------------|
| Read-through | Frequent reads, tolerance for stale | On write/TTL | Low |
| Write-through | Data consistency critical | Automatic | Medium |
| Write-behind | High write throughput | Async | High |
| Cache-aside | Fine-grained control | Application | Medium |
### Cache Technology Selection
| Technology | Best For | Limitations |
|------------|----------|-------------|
| Redis | General purpose, data structures | Memory cost |
| Memcached | Simple key-value, high throughput | No persistence |
| CDN (CloudFront, Fastly) | Static assets, edge caching | Dynamic content |
| Application cache | Per-instance, small data | Not distributed |
### Cache Patterns
**Cache-Aside (Lazy Loading):**
```
Read:
1. Check cache
2. If miss, read from DB
3. Store in cache
4. Return data
Write:
1. Write to DB
2. Invalidate cache
```
**Write-Through:**
```
Write:
1. Write to cache
2. Cache writes to DB
3. Return success
Read:
1. Read from cache (always hit)
```
**TTL Guidelines:**
| Data Type | Suggested TTL |
|-----------|---------------|
| User sessions | 24-48 hours |
| API responses | 1-5 minutes |
| Static content | 24 hours - 1 week |
| Database queries | 5-60 minutes |
| Feature flags | 1-5 minutes |
---
## 3. Message Queue Selection
### Queue Technology Comparison
| Feature | RabbitMQ | Kafka | SQS | Redis Streams |
|---------|----------|-------|-----|---------------|
| Throughput | Medium (10K/s) | Very High (100K+/s) | Medium | High |
| Ordering | Per-queue | Per-partition | FIFO optional | Per-stream |
| Durability | Configurable | Strong | Strong | Configurable |
| Replay | No | Yes | No | Yes |
| Complexity | Medium | High | Low | Low |
| Cost | Self-hosted | Self-hosted | Pay-per-use | Self-hosted |
### Decision Matrix
| Requirement | Recommendation |
|-------------|----------------|
| Simple task queue | SQS or Redis |
| Event streaming | Kafka |
| Complex routing | RabbitMQ |
| Log aggregation | Kafka |
| Serverless integration | SQS |
| Real-time analytics | Kafka |
| Request/reply pattern | RabbitMQ |
### When to Use Each
**RabbitMQ:**
- Complex routing logic (topic, fanout, headers)
- Request/reply patterns
- Priority queues
- Message acknowledgment critical
**Kafka:**
- Event sourcing
- High throughput requirements (>50K messages/sec)
- Message replay needed
- Stream processing
- Log aggregation
**SQS:**
- AWS-native applications
- Simple queue semantics
- Serverless architectures
- Don't want to manage infrastructure
**Redis Streams:**
- Already using Redis
- Moderate throughput
- Simple streaming needs
- Real-time features
---
## 4. Authentication Strategy
### Method Selection
| Method | Best For | Avoid When |
|--------|----------|------------|
| Session-based | Traditional web apps, server-rendered | Mobile apps, microservices |
| JWT | SPAs, mobile apps, microservices | Need immediate revocation |
| OAuth 2.0 | Third-party access, social login | Internal-only apps |
| API Keys | Server-to-server, simple auth | User authentication |
| mTLS | Service mesh, high security | Public APIs |
### JWT vs Sessions
| Factor | JWT | Sessions |
|--------|-----|----------|
| Scalability | Stateless, easy to scale | Requires session store |
| Revocation | Difficult (need blocklist) | Immediate |
| Payload | Can contain claims | Server-side only |
| Security | Token in client | Server-controlled |
| Mobile friendly | Yes | Requires cookies |
### OAuth 2.0 Flow Selection
| Flow | Use Case |
|------|----------|
| Authorization Code | Web apps with backend |
| Authorization Code + PKCE | SPAs, mobile apps |
| Client Credentials | Machine-to-machine |
| Device Code | Smart TVs, CLI tools |
**Avoid:** Implicit flow (deprecated), Resource Owner Password (legacy only)
### Token Lifetimes
| Token Type | Suggested Lifetime |
|------------|-------------------|
| Access token | 15-60 minutes |
| Refresh token | 7-30 days |
| API key | No expiry (rotate quarterly) |
| Session | 24 hours - 7 days |
---
## 5. Frontend Framework Selection
### Framework Comparison
| Factor | React | Vue | Angular | Svelte |
|--------|-------|-----|---------|--------|
| Learning curve | Medium | Low | High | Low |
| Ecosystem | Largest | Large | Complete | Growing |
| Performance | Good | Good | Good | Excellent |
| Bundle size | Medium | Small | Large | Smallest |
| TypeScript | Good | Good | Native | Good |
| Job market | Largest | Growing | Enterprise | Niche |
### Decision Matrix
| Requirement | Recommendation |
|-------------|----------------|
| Large team, enterprise | Angular |
| Startup, rapid iteration | React or Vue |
| Performance critical | Svelte or Solid |
| Existing React team | React |
| Progressive enhancement | Vue or Svelte |
| Component library needed | React (most options) |
### Meta-Framework Selection
| Framework | Best For |
|-----------|----------|
| Next.js (React) | Full-stack React, SSR/SSG |
| Nuxt (Vue) | Full-stack Vue, SSR/SSG |
| SvelteKit | Full-stack Svelte |
| Remix | Data-heavy React apps |
| Astro | Content sites, multi-framework |
### When to Use SSR vs SPA vs SSG
| Rendering | Use When |
|-----------|----------|
| SSR | SEO critical, dynamic content, auth-gated |
| SPA | Internal tools, highly interactive, no SEO |
| SSG | Content sites, blogs, documentation |
| ISR | Mix of static and dynamic |
---
## 6. Cloud Provider Selection
### Provider Comparison
| Factor | AWS | GCP | Azure |
|--------|-----|-----|-------|
| Market share | Largest | Growing | Enterprise strong |
| Service breadth | Most comprehensive | Strong ML/data | Best Microsoft integration |
| Pricing | Complex, volume discounts | Simpler, sustained use | EA discounts |
| Kubernetes | EKS | GKE (best managed) | AKS |
| Serverless | Lambda (mature) | Cloud Functions | Azure Functions |
| Database | RDS, DynamoDB | Cloud SQL, Spanner | SQL, Cosmos |
### Decision Factors
| If You Need | Consider |
|-------------|----------|
| Microsoft ecosystem | Azure |
| Best Kubernetes experience | GCP |
| Widest service selection | AWS |
| Machine learning focus | GCP or AWS |
| Government compliance | AWS GovCloud or Azure Gov |
| Startup credits | All offer programs |
### Multi-Cloud Considerations
**Go multi-cloud when:**
- Regulatory requirements mandate it
- Specific service (e.g., GCP BigQuery) is best-in-class
- Negotiating leverage with vendors
**Stay single-cloud when:**
- Team is small
- Want to minimize complexity
- Deep integration needed
### Service Mapping
| Need | AWS | GCP | Azure |
|------|-----|-----|-------|
| Compute | EC2 | Compute Engine | Virtual Machines |
| Containers | ECS, EKS | GKE, Cloud Run | AKS, Container Apps |
| Serverless | Lambda | Cloud Functions | Azure Functions |
| Object Storage | S3 | Cloud Storage | Blob Storage |
| SQL Database | RDS | Cloud SQL | Azure SQL |
| NoSQL | DynamoDB | Firestore | Cosmos DB |
| CDN | CloudFront | Cloud CDN | Azure CDN |
| DNS | Route 53 | Cloud DNS | Azure DNS |
---
## 7. API Style Selection
### REST vs GraphQL vs gRPC
| Factor | REST | GraphQL | gRPC |
|--------|------|---------|------|
| Use case | General purpose | Flexible queries | Microservices |
| Learning curve | Low | Medium | High |
| Over-fetching | Common | Solved | N/A |
| Caching | HTTP native | Complex | Custom |
| Browser support | Native | Native | Limited |
| Tooling | Mature | Growing | Strong |
| Performance | Good | Good | Excellent |
### Decision Matrix
| Requirement | Recommendation |
|-------------|----------------|
| Public API | REST |
| Mobile apps with varied needs | GraphQL |
| Microservices communication | gRPC |
| Real-time updates | GraphQL subscriptions or WebSocket |
| File uploads | REST |
| Internal services only | gRPC |
| Third-party developers | REST + OpenAPI |
### When to Choose Each
**Choose REST when:**
- Building public APIs
- Need HTTP caching
- Simple CRUD operations
- Team experienced with REST
**Choose GraphQL when:**
- Multiple clients with different data needs
- Rapid frontend iteration
- Complex, nested data relationships
- Want to reduce API calls
**Choose gRPC when:**
- Service-to-service communication
- Performance critical
- Streaming required
- Strong typing important
### API Versioning Strategies
| Strategy | Pros | Cons |
|----------|------|------|
| URL path (`/v1/`) | Clear, easy to implement | URL pollution |
| Query param (`?version=1`) | Flexible | Easy to miss |
| Header (`Accept-Version: 1`) | Clean URLs | Less discoverable |
| No versioning (evolve) | Simple | Breaking changes risky |
**Recommendation:** URL path versioning for public APIs, header versioning for internal.
---
## Quick Reference
| Decision | Default Choice | Alternative When |
|----------|----------------|------------------|
| Database | PostgreSQL | Scale/flexibility → MongoDB, DynamoDB |
| Cache | Redis | Simple needs → Memcached |
| Queue | SQS (AWS) / RabbitMQ | Event streaming → Kafka |
| Auth | JWT + Refresh | Traditional web → Sessions |
| Frontend | React + Next.js | Simplicity → Vue, Performance → Svelte |
| Cloud | AWS | Microsoft shop → Azure, ML-first → GCP |
| API | REST | Mobile flexibility → GraphQL, Internal → gRPC |