fix(skill): restructure aws-solution-architect for better organization (#61) (#114)

Complete restructure based on AI Agent Skills Benchmark feedback (original score: 66/100):

## Directory Reorganization
- Moved Python scripts to scripts/ directory
- Moved sample files to assets/ directory
- Created references/ directory with extracted content
- Removed HOW_TO_USE.md (integrated into SKILL.md)
- Removed __pycache__

## New Reference Files (3 files)
- architecture_patterns.md: 6 AWS patterns (serverless, microservices, three-tier, data processing, GraphQL, multi-region) with diagrams, cost breakdowns, pros/cons
- service_selection.md: Decision matrices for compute, database, storage, messaging, networking, security services with code examples
- best_practices.md: Serverless design, cost optimization, security hardening, scalability patterns, common pitfalls

## SKILL.md Rewrite
- Reduced from 345 lines to 307 lines (moved patterns to references/)
- Added trigger phrases to description ("design serverless architecture", "create CloudFormation templates", "optimize AWS costs")
- Structured around 6-step workflow instead of encyclopedia format
- Added Quick Start examples (MVP, Scaling, Cost Optimization, IaC)
- Removed marketing language ("Expert", "comprehensive")
- Consistent imperative voice throughout

## Structure Changes
- scripts/: architecture_designer.py, cost_optimizer.py, serverless_stack.py
- references/: architecture_patterns.md, service_selection.md, best_practices.md
- assets/: sample_input.json, expected_output.json

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 02:42:08 +01:00
parent c0989817bc
commit c7dc957823
13 changed files with 1930 additions and 626 deletions
--- a/engineering-team/aws-solution-architect/HOW_TO_USE.md
+++ b/engineering-team/aws-solution-architect/HOW_TO_USE.md
@@ -1,308 +0,0 @@
-# How to Use This Skill
-
-Hey Claude—I just added the "aws-solution-architect" skill. Can you design a scalable serverless architecture for my startup?
-
-## Example Invocations
-
-**Example 1: Serverless Web Application**
-```
-Hey Claude—I just added the "aws-solution-architect" skill. Can you design a serverless architecture for a SaaS platform with 10k users, including API, database, and authentication?
-```
-
-**Example 2: Microservices Architecture**
-```
-Hey Claude—I just added the "aws-solution-architect" skill. Can you design an event-driven microservices architecture using Lambda, EventBridge, and DynamoDB for an e-commerce platform?
-```
-
-**Example 3: Cost Optimization**
-```
-Hey Claude—I just added the "aws-solution-architect" skill. Can you analyze my current AWS setup and recommend ways to reduce costs by 30%? I'm currently spending $2000/month.
-```
-
-**Example 4: Infrastructure as Code**
-```
-Hey Claude—I just added the "aws-solution-architect" skill. Can you generate a CloudFormation template for a three-tier web application with auto-scaling and RDS?
-```
-
-**Example 5: Mobile Backend**
-```
-Hey Claude—I just added the "aws-solution-architect" skill. Can you design a scalable mobile backend using AppSync GraphQL, Cognito, and DynamoDB?
-```
-
-**Example 6: Data Pipeline**
-```
-Hey Claude—I just added the "aws-solution-architect" skill. Can you design a real-time data processing pipeline using Kinesis for analytics on IoT sensor data?
-```
-
-## What to Provide
-
-Depending on your needs, provide:
-
-### For Architecture Design:
- **Application type**: Web app, mobile backend, data pipeline, microservices, SaaS
- **Expected scale**: Number of users, requests per second, data volume
- **Budget**: Monthly AWS spend limit or constraints
- **Team context**: Team size, AWS experience level
- **Requirements**: Authentication, real-time features, compliance needs (GDPR, HIPAA)
- **Geographic scope**: Single region, multi-region, global
-
-### For Cost Optimization:
- **Current monthly spend**: Total AWS bill
- **Resource inventory**: List of EC2, RDS, S3, etc. resources
- **Utilization metrics**: CPU, memory, storage usage
- **Budget target**: Desired monthly spend or savings percentage
-
-### For Infrastructure as Code:
- **Template type**: CloudFormation, CDK (TypeScript/Python), or Terraform
- **Services needed**: Compute, database, storage, networking
- **Environment**: dev, staging, production configurations
-
-## What You'll Get
-
-Based on your request, you'll receive:
-
-### Architecture Designs:
- **Pattern recommendation** with service selection
- **Architecture diagram** description (visual representation)
- **Service configuration** details
- **Cost estimates** with monthly breakdown
- **Pros/cons** analysis
- **Scaling characteristics** and limitations
-
-### Infrastructure as Code:
- **CloudFormation templates** (YAML) - production-ready
- **AWS CDK stacks** (TypeScript) - modern, type-safe
- **Terraform configurations** (HCL) - multi-cloud compatible
- **Deployment instructions** and prerequisites
- **Security best practices** built-in
-
-### Cost Optimization:
- **Current spend analysis** by service
- **Specific recommendations** with savings potential
- **Priority actions** (high/medium/low)
- **Implementation checklist** with timelines
- **Long-term optimization** strategies
-
-### Best Practices:
- **Security hardening** checklist
- **Scalability patterns** and anti-patterns
- **Monitoring setup** recommendations
- **Disaster recovery** procedures
- **Compliance guidance** (GDPR, HIPAA, SOC 2)
-
-## Common Use Cases
-
-### 1. MVP/Startup Launch
-**Ask for:** "Serverless architecture for MVP with minimal costs"
-
-**You'll get:**
- Amplify or Lambda + API Gateway + DynamoDB stack
- Cognito authentication setup
- S3 + CloudFront for frontend
- Cost estimate: $20-100/month
- Fast deployment (1-3 days)
-
-### 2. Scaling Existing Application
-**Ask for:** "Migrate from single server to scalable AWS architecture"
-
-**You'll get:**
- Migration strategy (phased approach)
- Modern three-tier or containerized architecture
- Load balancing and auto-scaling configuration
- Database migration plan (DMS)
- Zero-downtime deployment strategy
-
-### 3. Cost Reduction
-**Ask for:** "Analyze and optimize my $5000/month AWS bill"
-
-**You'll get:**
- Service-by-service cost breakdown
- Right-sizing recommendations
- Savings Plans/Reserved Instance opportunities
- Storage lifecycle optimizations
- Estimated savings: 20-40%
-
-### 4. Compliance Requirements
-**Ask for:** "HIPAA-compliant architecture for healthcare application"
-
-**You'll get:**
- Compliant service selection (BAA-eligible only)
- Encryption configuration (at rest and in transit)
- Audit logging setup (CloudTrail, Config)
- Network isolation (VPC private subnets)
- Access control (IAM policies)
-
-### 5. Global Deployment
-**Ask for:** "Multi-region architecture for global users"
-
-**You'll get:**
- Route 53 geolocation routing
- DynamoDB Global Tables or Aurora Global
- CloudFront edge caching
- Disaster recovery and failover
- Cross-region cost considerations
-
-## Prerequisites
-
-### For Using Generated Templates:
-
-**AWS Account**:
- Active AWS account with appropriate permissions
- IAM user or role with admin access (for initial setup)
- Billing alerts enabled
-
-**Tools Required**:
-```bash
-# AWS CLI
-brew install awscli  # macOS
-aws configure
-
-# For CloudFormation
-# (AWS CLI includes CloudFormation)
-
-# For AWS CDK
-npm install -g aws-cdk
-cdk --version
-
-# For Terraform
-brew install terraform  # macOS
-terraform --version
-```
-
-**Knowledge**:
- Basic AWS concepts (VPC, IAM, EC2, S3)
- Command line proficiency
- Git for version control
-
-## Deployment Steps
-
-### CloudFormation:
-```bash
-# Validate template
-aws cloudformation validate-template --template-body file://template.yaml
-
-# Deploy stack
-aws cloudformation create-stack \
-  --stack-name my-app-stack \
-  --template-body file://template.yaml \
-  --parameters ParameterKey=Environment,ParameterValue=dev \
-  --capabilities CAPABILITY_IAM
-
-# Monitor deployment
-aws cloudformation describe-stacks --stack-name my-app-stack
-```
-
-### AWS CDK:
-```bash
-# Initialize project
-cdk init app --language=typescript
-
-# Install dependencies
-npm install
-
-# Deploy stack
-cdk deploy
-
-# View outputs
-cdk outputs
-```
-
-### Terraform:
-```bash
-# Initialize
-terraform init
-
-# Plan deployment
-terraform plan
-
-# Apply changes
-terraform apply
-
-# View outputs
-terraform output
-```
-
-## Best Practices Tips
-
-### 1. Start Small, Scale Gradually
- Begin with serverless to minimize costs
- Add managed services as you grow
- Avoid over-engineering for hypothetical scale
-
-### 2. Enable Monitoring from Day One
- Set up CloudWatch dashboards
- Configure alarms for critical metrics
- Enable AWS Cost Explorer
- Create budget alerts
-
-### 3. Infrastructure as Code Always
- Version control all infrastructure
- Use separate accounts for dev/staging/prod
- Implement CI/CD for infrastructure changes
- Document architecture decisions
-
-### 4. Security First
- Enable MFA on root and admin accounts
- Use IAM roles, never long-term credentials
- Encrypt everything (S3, RDS, EBS)
- Regular security audits (AWS Security Hub)
-
-### 5. Cost Management
- Tag all resources for cost allocation
- Review bills weekly
- Delete unused resources promptly
- Use Savings Plans for predictable workloads
-
-## Troubleshooting
-
-### Common Issues:
-
-**"Access Denied" errors:**
- Check IAM permissions for your user/role
- Ensure service-linked roles exist
- Verify resource policies (S3, KMS)
-
-**High costs unexpectedly:**
- Check for undeleted resources (EC2, RDS snapshots)
- Review NAT Gateway data transfer
- Check CloudWatch Logs retention
- Look for unauthorized usage
-
-**Deployment failures:**
- Validate templates before deploying
- Check service quotas (limits)
- Verify VPC/subnet configuration
- Review CloudFormation/Terraform error messages
-
-**Performance issues:**
- Enable CloudWatch metrics and X-Ray
- Check database connection pooling
- Review Lambda cold starts (use provisioned concurrency)
- Optimize database queries and indexes
-
-## Additional Resources
-
- **AWS Well-Architected Framework**: https://aws.amazon.com/architecture/well-architected/
- **AWS Architecture Center**: https://aws.amazon.com/architecture/
- **Serverless Land**: https://serverlessland.com/
- **AWS Pricing Calculator**: https://calculator.aws/
- **AWS Free Tier**: https://aws.amazon.com/free/
- **AWS Startups**: https://aws.amazon.com/startups/
-
-## Tips for Best Results
-
-1. **Be specific** about scale and budget constraints
-2. **Mention team experience** level with AWS
-3. **State compliance requirements** upfront (GDPR, HIPAA, etc.)
-4. **Describe current setup** if migrating from existing infrastructure
-5. **Ask for alternatives** if you need options to compare
-6. **Request explanations** for WHY certain services are recommended
-7. **Specify IaC preference** (CloudFormation, CDK, or Terraform)
-
-## Support
-
-For AWS-specific questions:
- AWS Support Plans (Developer, Business, Enterprise)
- AWS re:Post community forum
- AWS Documentation: https://docs.aws.amazon.com/
- AWS Training: https://aws.amazon.com/training/
--- a/engineering-team/aws-solution-architect/SKILL.md
+++ b/engineering-team/aws-solution-architect/SKILL.md
@@ -1,344 +1,306 @@
 ---
 name: aws-solution-architect
-description: Expert AWS solution architecture for startups focusing on serverless, scalable, and cost-effective cloud infrastructure with modern DevOps practices and infrastructure-as-code
+description: Design AWS architectures for startups using serverless patterns and IaC templates. Use when asked to design serverless architecture, create CloudFormation templates, optimize AWS costs, set up CI/CD pipelines, or migrate to AWS. Covers Lambda, API Gateway, DynamoDB, ECS, Aurora, and cost optimization.
 ---

-# AWS Solution Architect for Startups
+# AWS Solution Architect

-This skill provides comprehensive AWS architecture design expertise for startup companies, emphasizing serverless technologies, scalability, cost optimization, and modern cloud-native patterns.
+Design scalable, cost-effective AWS architectures for startups with infrastructure-as-code templates.

-## Capabilities
+---

- **Serverless Architecture Design**: Lambda, API Gateway, DynamoDB, EventBridge, Step Functions, AppSync
- **Infrastructure as Code**: CloudFormation, CDK (Cloud Development Kit), Terraform templates
- **Scalable Application Architecture**: Auto-scaling, load balancing, multi-region deployment
- **Data & Storage Solutions**: S3, RDS Aurora Serverless, DynamoDB, ElastiCache, Neptune
- **Event-Driven Architecture**: EventBridge, SNS, SQS, Kinesis, Lambda triggers
- **API Design**: API Gateway (REST & WebSocket), AppSync (GraphQL), rate limiting, authentication
- **Authentication & Authorization**: Cognito, IAM, fine-grained access control, federated identity
- **CI/CD Pipelines**: CodePipeline, CodeBuild, CodeDeploy, GitHub Actions integration
- **Monitoring & Observability**: CloudWatch, X-Ray, CloudTrail, alarms, dashboards
- **Cost Optimization**: Reserved instances, Savings Plans, right-sizing, budget alerts
- **Security Best Practices**: VPC design, security groups, WAF, Secrets Manager, encryption
- **Microservices Patterns**: Service mesh, API composition, saga patterns, CQRS
- **Container Orchestration**: ECS Fargate, EKS (Kubernetes), App Runner
- **Content Delivery**: CloudFront, edge locations, origin shield, caching strategies
- **Database Migration**: DMS, schema conversion, zero-downtime migrations
+## Table of Contents
+
+- [Trigger Terms](#trigger-terms)
+- [Workflow](#workflow)
+- [Tools](#tools)
+- [Quick Start](#quick-start)
+- [Input Requirements](#input-requirements)
+- [Output Formats](#output-formats)
+
+---
+
+## Trigger Terms
+
+Use this skill when you encounter:
+
+| Category | Terms |
+|----------|-------|
+| **Architecture Design** | serverless architecture, AWS architecture, cloud design, microservices, three-tier |
+| **IaC Generation** | CloudFormation, CDK, Terraform, infrastructure as code, deploy template |
+| **Serverless** | Lambda, API Gateway, DynamoDB, Step Functions, EventBridge, AppSync |
+| **Containers** | ECS, Fargate, EKS, container orchestration, Docker on AWS |
+| **Cost Optimization** | reduce AWS costs, optimize spending, right-sizing, Savings Plans |
+| **Database** | Aurora, RDS, DynamoDB design, database migration, data modeling |
+| **Security** | IAM policies, VPC design, encryption, Cognito, WAF |
+| **CI/CD** | CodePipeline, CodeBuild, CodeDeploy, GitHub Actions AWS |
+| **Monitoring** | CloudWatch, X-Ray, observability, alarms, dashboards |
+| **Migration** | migrate to AWS, lift and shift, replatform, DMS |
+
+---
+
+## Workflow
+
+### Step 1: Gather Requirements
+
+Collect application specifications:
+
+```
+- Application type (web app, mobile backend, data pipeline, SaaS)
+- Expected users and requests per second
+- Budget constraints (monthly spend limit)
+- Team size and AWS experience level
+- Compliance requirements (GDPR, HIPAA, SOC 2)
+- Availability requirements (SLA, RPO/RTO)
+```
+
+### Step 2: Design Architecture
+
+Run the architecture designer to get pattern recommendations:
+
+```bash
+python scripts/architecture_designer.py --input requirements.json
+```
+
+Select from recommended patterns:
+- **Serverless Web**: S3 + CloudFront + API Gateway + Lambda + DynamoDB
+- **Event-Driven Microservices**: EventBridge + Lambda + SQS + Step Functions
+- **Three-Tier**: ALB + ECS Fargate + Aurora + ElastiCache
+- **GraphQL Backend**: AppSync + Lambda + DynamoDB + Cognito
+
+See `references/architecture_patterns.md` for detailed pattern specifications.
+
+### Step 3: Generate IaC Templates
+
+Create infrastructure-as-code for the selected pattern:
+
+```bash
+# Serverless stack (CloudFormation)
+python scripts/serverless_stack.py --app-name my-app --region us-east-1
+
+# Output: CloudFormation YAML template ready to deploy
+```
+
+### Step 4: Review Costs
+
+Analyze estimated costs and optimization opportunities:
+
+```bash
+python scripts/cost_optimizer.py --resources current_setup.json --monthly-spend 2000
+```
+
+Output includes:
+- Monthly cost breakdown by service
+- Right-sizing recommendations
+- Savings Plans opportunities
+- Potential monthly savings
+
+### Step 5: Deploy
+
+Deploy the generated infrastructure:
+
+```bash
+# CloudFormation
+aws cloudformation create-stack \
+  --stack-name my-app-stack \
+  --template-body file://template.yaml \
+  --capabilities CAPABILITY_IAM
+
+# CDK
+cdk deploy
+
+# Terraform
+terraform init && terraform apply
+```
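Review note: the `create-stack` call in Step 5 can be assembled programmatically when the stack name or parameters vary per environment. The following is a minimal sketch, not part of this commit; the helper name `build_create_stack_cmd` is hypothetical, and it only builds the argument list (it does not call AWS). The flags themselves are the ones already used in this diff.

```python
import shlex


def build_create_stack_cmd(stack_name, template_path,
                           parameters=None, capabilities=("CAPABILITY_IAM",)):
    """Build the argv list for `aws cloudformation create-stack`.

    Hypothetical helper for illustration; nothing is executed here.
    """
    cmd = [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--template-body", f"file://{template_path}",
    ]
    if parameters:
        # Same ParameterKey=...,ParameterValue=... form as the removed HOW_TO_USE.md
        cmd.append("--parameters")
        cmd += [f"ParameterKey={k},ParameterValue={v}" for k, v in parameters.items()]
    if capabilities:
        cmd.append("--capabilities")
        cmd += list(capabilities)
    return cmd


if __name__ == "__main__":
    cmd = build_create_stack_cmd("my-app-stack", "template.yaml",
                                 parameters={"Environment": "dev"})
    # Print a shell-safe rendering for review before running it by hand
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps quoting out of the picture if the command is later passed to `subprocess.run`.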
+
+### Step 6: Validate
+
+Verify deployment and set up monitoring:
+
+```bash
+# Check stack status
+aws cloudformation describe-stacks --stack-name my-app-stack
+
+# Set up CloudWatch alarms
+aws cloudwatch put-metric-alarm --alarm-name high-errors ...
+```
+
+---
+
+## Tools
+
+### architecture_designer.py
+
+Generates architecture patterns based on requirements.
+
+```bash
+python scripts/architecture_designer.py --input requirements.json --output design.json
+```
+
+**Input:** JSON with app type, scale, budget, compliance needs
+**Output:** Recommended pattern, service stack, cost estimate, pros/cons
+
+### serverless_stack.py
+
+Creates serverless CloudFormation templates.
+
+```bash
+python scripts/serverless_stack.py --app-name my-app --region us-east-1
+```
+
+**Output:** Production-ready CloudFormation YAML with:
+- API Gateway + Lambda
+- DynamoDB table
+- Cognito user pool
+- IAM roles with least privilege
+- CloudWatch logging
+
+### cost_optimizer.py
+
+Analyzes costs and recommends optimizations.
+
+```bash
+python scripts/cost_optimizer.py --resources inventory.json --monthly-spend 5000
+```
+
+**Output:** Recommendations for:
+- Idle resource removal
+- Instance right-sizing
+- Reserved capacity purchases
+- Storage tier transitions
+- NAT Gateway alternatives
+
+---
+
+## Quick Start
+
+### MVP Architecture (< $100/month)
+
+```
+Ask: "Design a serverless MVP backend for a mobile app with 1000 users"
+
+Result:
+- Lambda + API Gateway for API
+- DynamoDB pay-per-request for data
+- Cognito for authentication
+- S3 + CloudFront for static assets
+- Estimated: $20-50/month
+```
+
+### Scaling Architecture ($500-2000/month)
+
+```
+Ask: "Design a scalable architecture for a SaaS platform with 50k users"
+
+Result:
+- ECS Fargate for containerized API
+- Aurora Serverless for relational data
+- ElastiCache for session caching
+- CloudFront for CDN
+- CodePipeline for CI/CD
+- Multi-AZ deployment
+```
+
+### Cost Optimization
+
+```
+Ask: "Optimize my AWS setup to reduce costs by 30%. Current spend: $3000/month"
+
+Provide: Current resource inventory (EC2, RDS, S3, etc.)
+
+Result:
+- Idle resource identification
+- Right-sizing recommendations
+- Savings Plans analysis
+- Storage lifecycle policies
+- Target savings: $900/month
+```
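Review note: the "Target savings: $900/month" figure in the Cost Optimization example follows directly from the numbers in the prompt (30% of $3000). A minimal sketch of that arithmetic, assuming a hypothetical helper `savings_target` that is not part of `scripts/cost_optimizer.py`:

```python
def savings_target(monthly_spend_usd, reduction_pct):
    """Return the monthly savings (USD) needed to hit a percentage reduction.

    Hypothetical helper for illustration only.
    """
    if monthly_spend_usd < 0:
        raise ValueError("monthly spend must be non-negative")
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction percentage must be between 0 and 100")
    return monthly_spend_usd * reduction_pct / 100


if __name__ == "__main__":
    # Values from the Quick Start example: $3000/month, 30% reduction
    print(savings_target(3000, 30))  # 900.0
```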
+
+### IaC Generation
+
+```
+Ask: "Generate CloudFormation for a three-tier web app with auto-scaling"
+
+Result:
+- VPC with public/private subnets
+- ALB with HTTPS
+- ECS Fargate with auto-scaling
+- Aurora with read replicas
+- Security groups and IAM roles
+```
+
+---

 ## Input Requirements

-Architecture design requires:
- **Application type**: Web app, mobile backend, data pipeline, microservices, SaaS platform
- **Traffic expectations**: Users/day, requests/second, geographic distribution
- **Data requirements**: Storage needs, database type, backup/retention policies
- **Budget constraints**: Monthly spend limits, cost optimization priorities
- **Team size & expertise**: Developer count, AWS experience level, DevOps maturity
- **Compliance needs**: GDPR, HIPAA, SOC 2, PCI-DSS, data residency
- **Availability requirements**: SLA targets, uptime goals, disaster recovery RPO/RTO
+Provide these details for architecture design:

-Formats accepted:
- Text description of application requirements
- JSON with structured architecture specifications
- Existing architecture diagrams or documentation
- Current AWS resource inventory (for optimization)
+| Requirement | Description | Example |
+|-------------|-------------|---------|
+| Application type | What you're building | SaaS platform, mobile backend |
+| Expected scale | Users, requests/sec | 10k users, 100 RPS |
+| Budget | Monthly AWS limit | $500/month max |
+| Team context | Size, AWS experience | 3 devs, intermediate |
+| Compliance | Regulatory needs | HIPAA, GDPR, SOC 2 |
+| Availability | Uptime requirements | 99.9% SLA, 1hr RPO |
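Review note: an SLA percentage such as the "99.9% SLA" in the Availability row is easier to reason about as a downtime budget. The sketch below is an illustration under stated assumptions (a 30-day month, and a hypothetical function name); it is not taken from the skill's scripts.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Convert an availability SLA into a downtime budget in minutes.

    Assumes a fixed-length period of `days` days (30 by default).
    """
    if not 0 <= sla_percent <= 100:
        raise ValueError("SLA must be a percentage between 0 and 100")
    total_minutes = days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


if __name__ == "__main__":
    # 99.9% over 30 days leaves about 43.2 minutes of downtime
    print(round(allowed_downtime_minutes(99.9), 1))
```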
+
+**JSON Format:**
+
+```json
+{
+  "application_type": "saas_platform",
+  "expected_users": 10000,
+  "requests_per_second": 100,
+  "budget_monthly_usd": 500,
+  "team_size": 3,
+  "aws_experience": "intermediate",
+  "compliance": ["SOC2"],
+  "availability_sla": "99.9%"
+}
+```
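Review note: the diff does not show which fields `scripts/architecture_designer.py` actually requires, so the following is only a sketch of how the JSON above could be sanity-checked before being passed via `--input`. The field names come from the example; the expected types and the function name `validate_requirements` are assumptions.

```python
import json

# Assumed schema: names are from the JSON example in SKILL.md,
# types are inferred from the sample values (not from the script itself).
REQUIRED_FIELDS = {
    "application_type": str,
    "expected_users": int,
    "requests_per_second": int,
    "budget_monthly_usd": (int, float),
    "team_size": int,
    "aws_experience": str,
    "compliance": list,
    "availability_sla": str,
}


def validate_requirements(data):
    """Return a list of problems; an empty list means the input looks well-formed."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly
        elif isinstance(data[field], bool) or not isinstance(data[field], expected):
            problems.append(f"wrong type for {field}: {type(data[field]).__name__}")
    return problems


if __name__ == "__main__":
    sample = json.loads("""
    {
      "application_type": "saas_platform",
      "expected_users": 10000,
      "requests_per_second": 100,
      "budget_monthly_usd": 500,
      "team_size": 3,
      "aws_experience": "intermediate",
      "compliance": ["SOC2"],
      "availability_sla": "99.9%"
    }
    """)
    print(validate_requirements(sample) or "requirements look well-formed")
```

Collecting every problem in one pass, rather than raising on the first, lets a user fix the whole file in a single edit.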
+
+---

 ## Output Formats

-Results include:
- **Architecture diagrams**: Visual representations using draw.io or Lucidchart format
- **CloudFormation/CDK templates**: Infrastructure as Code (IaC) ready to deploy
- **Terraform configurations**: Multi-cloud compatible infrastructure definitions
- **Cost estimates**: Detailed monthly cost breakdown with optimization suggestions
- **Security assessment**: Best practices checklist, compliance validation
- **Deployment guides**: Step-by-step implementation instructions
- **Runbooks**: Operational procedures, troubleshooting guides, disaster recovery plans
- **Migration strategies**: Phased migration plans, rollback procedures
+### Architecture Design

-## How to Use
+- Pattern recommendation with rationale
+- Service stack diagram (ASCII)
+- Configuration specifications
+- Monthly cost estimate
+- Scaling characteristics
+- Trade-offs and limitations

-"Design a serverless API backend for a mobile app with 100k users using Lambda and DynamoDB"
-"Create a cost-optimized architecture for a SaaS platform with multi-tenancy"
-"Generate CloudFormation template for a three-tier web application with auto-scaling"
-"Design event-driven microservices architecture using EventBridge and Step Functions"
-"Optimize my current AWS setup to reduce costs by 30%"
+### IaC Templates

-## Scripts
+- **CloudFormation YAML**: Production-ready SAM/CFN templates
+- **CDK TypeScript**: Type-safe infrastructure code
+- **Terraform HCL**: Multi-cloud compatible configs

- `architecture_designer.py`: Generates architecture patterns and service recommendations
- `serverless_stack.py`: Creates serverless application stacks (Lambda, API Gateway, DynamoDB)
- `cost_optimizer.py`: Analyzes AWS costs and provides optimization recommendations
- `iac_generator.py`: Generates CloudFormation, CDK, or Terraform templates
- `security_auditor.py`: AWS security best practices validation and compliance checks
+### Cost Analysis

-## Architecture Patterns
+- Current spend breakdown
+- Optimization recommendations with savings
+- Priority action list (high/medium/low)
+- Implementation checklist

-### 1. Serverless Web Application
-**Use Case**: SaaS platforms, mobile backends, low-traffic websites
+---

-**Stack**:
- **Frontend**: S3 + CloudFront (static hosting)
- **API**: API Gateway + Lambda
- **Database**: DynamoDB or Aurora Serverless
- **Auth**: Cognito
- **CI/CD**: Amplify or CodePipeline
+## Reference Documentation

-**Benefits**: Zero server management, pay-per-use, auto-scaling, low operational overhead
+| Document | Contents |
+|----------|----------|
+| `references/architecture_patterns.md` | 6 patterns: serverless, microservices, three-tier, data processing, GraphQL, multi-region |
+| `references/service_selection.md` | Decision matrices for compute, database, storage, messaging |
+| `references/best_practices.md` | Serverless design, cost optimization, security hardening, scalability |

-**Cost**: $50-500/month for small to medium traffic
-
-### 2. Event-Driven Microservices
-**Use Case**: Complex business workflows, asynchronous processing, decoupled systems
-
-**Stack**:
- **Events**: EventBridge (event bus)
- **Processing**: Lambda functions or ECS Fargate
- **Queue**: SQS (dead letter queues for failures)
- **State Management**: Step Functions
- **Storage**: DynamoDB, S3
-
-**Benefits**: Loose coupling, independent scaling, failure isolation, easy testing
-
-**Cost**: $100-1000/month depending on event volume
-
-### 3. Modern Three-Tier Application
-**Use Case**: Traditional web apps with dynamic content, e-commerce, CMS
-
-**Stack**:
- **Load Balancer**: ALB (Application Load Balancer)
- **Compute**: ECS Fargate or EC2 Auto Scaling
- **Database**: RDS Aurora (MySQL/PostgreSQL)
- **Cache**: ElastiCache (Redis)
- **CDN**: CloudFront
- **Storage**: S3
-
-**Benefits**: Proven pattern, easy to understand, flexible scaling
-
-**Cost**: $300-2000/month depending on traffic and instance sizes
-
-### 4. Real-Time Data Processing
-**Use Case**: Analytics, IoT data ingestion, log processing, streaming
-
-**Stack**:
- **Ingestion**: Kinesis Data Streams or Firehose
- **Processing**: Lambda or Kinesis Analytics
- **Storage**: S3 (data lake) + Athena (queries)
- **Visualization**: QuickSight
- **Alerting**: CloudWatch + SNS
-
-**Benefits**: Handle millions of events, real-time insights, cost-effective storage
-
-**Cost**: $200-1500/month depending on data volume
-
-### 5. GraphQL API Backend
-**Use Case**: Mobile apps, single-page applications, flexible data queries
-
-**Stack**:
- **API**: AppSync (managed GraphQL)
- **Resolvers**: Lambda or direct DynamoDB integration
- **Database**: DynamoDB
- **Real-time**: AppSync subscriptions (WebSocket)
- **Auth**: Cognito or API keys
-
-**Benefits**: Single endpoint, reduce over/under-fetching, real-time subscriptions
-
-**Cost**: $50-400/month for moderate usage
-
-### 6. Multi-Region High Availability
-**Use Case**: Global applications, disaster recovery, compliance requirements
-
-**Stack**:
- **DNS**: Route 53 (geolocation routing)
- **CDN**: CloudFront with multiple origins
- **Compute**: Multi-region Lambda or ECS
- **Database**: DynamoDB Global Tables or Aurora Global Database
- **Replication**: S3 cross-region replication
-
-**Benefits**: Low latency globally, disaster recovery, data sovereignty
-
-**Cost**: 1.5-2x single region costs
-
-## Best Practices
-
-### Serverless Design Principles
-1. **Stateless functions** - Store state in DynamoDB, S3, or ElastiCache
-2. **Idempotency** - Handle retries gracefully, use unique request IDs
-3. **Cold start optimization** - Use provisioned concurrency for critical paths, optimize package size
-4. **Timeout management** - Set appropriate timeouts, use Step Functions for long processes
-5. **Error handling** - Implement retry logic, dead letter queues, exponential backoff
-
-### Cost Optimization
-1. **Right-sizing** - Start small, monitor metrics, scale based on actual usage
-2. **Reserved capacity** - Use Savings Plans or Reserved Instances for predictable workloads
-3. **S3 lifecycle policies** - Transition to cheaper storage tiers (IA, Glacier)
-4. **Lambda memory optimization** - Test different memory settings for cost/performance balance
-5. **CloudWatch log retention** - Set appropriate retention periods (7-30 days for most)
-6. **NAT Gateway alternatives** - Use VPC endpoints, consider single NAT in dev environments
-
-### Security Hardening
-1. **Principle of least privilege** - IAM roles with minimal permissions
-2. **Encryption everywhere** - At rest (KMS) and in transit (TLS/SSL)
-3. **Network isolation** - Private subnets, security groups, NACLs
-4. **Secrets management** - Use Secrets Manager or Parameter Store, never hardcode
-5. **API protection** - WAF rules, rate limiting, API keys, OAuth2
-6. **Audit logging** - CloudTrail for API calls, VPC Flow Logs for network traffic
-
-### Scalability Design
-1. **Horizontal over vertical** - Scale out with more small instances vs. larger instances
-2. **Database sharding** - Partition data by tenant, geography, or time
-3. **Read replicas** - Offload read traffic from primary database
-4. **Caching layers** - CloudFront (edge), ElastiCache (application), DAX (DynamoDB)
-5. **Async processing** - Use queues (SQS) for non-critical operations
-6. **Auto-scaling policies** - Target tracking (CPU, requests) vs. step scaling
-
-### DevOps & Reliability
-1. **Infrastructure as Code** - Version control, peer review, automated testing
-2. **Blue/Green deployments** - Zero-downtime releases, instant rollback
-3. **Canary releases** - Test new versions with small traffic percentage
-4. **Health checks** - Application-level health endpoints, graceful degradation
-5. **Chaos engineering** - Test failure scenarios, validate recovery procedures
-6. **Monitoring & alerting** - Set up CloudWatch alarms for critical metrics
-
-## Service Selection Guide
-
-### Compute
- **Lambda**: Event-driven, short-duration tasks (<15 min), variable traffic
- **Fargate**: Containerized apps, long-running processes, predictable traffic
- **EC2**: Custom configurations, GPU/FPGA needs, Windows apps
- **App Runner**: Simple container deployment from source code
-
-### Database
- **DynamoDB**: Key-value, document store, serverless, single-digit ms latency
- **Aurora Serverless**: Relational DB, variable workloads, auto-scaling
- **Aurora Standard**: High-performance relational, predictable traffic
- **RDS**: Traditional databases (MySQL, PostgreSQL, MariaDB, SQL Server)
- **DocumentDB**: MongoDB-compatible, document store
- **Neptune**: Graph database for connected data
- **Timestream**: Time-series data, IoT metrics
-
-### Storage
- **S3 Standard**: Frequent access, low latency
- **S3 Intelligent-Tiering**: Automatic cost optimization
- **S3 IA (Infrequent Access)**: Backups, archives (30-day minimum)
- **S3 Glacier**: Long-term archives, compliance
- **EFS**: Network file system, shared storage across instances
- **EBS**: Block storage for EC2, high IOPS
-
-### Messaging & Events
- **EventBridge**: Event bus, loosely coupled microservices
- **SNS**: Pub/sub, fan-out notifications
- **SQS**: Message queuing, decoupling, buffering
- **Kinesis**: Real-time streaming data, analytics
- **MQ**: Managed message brokers (RabbitMQ, ActiveMQ)
-
-### API & Integration
- **API Gateway**: REST APIs, WebSocket, throttling, caching
- **AppSync**: GraphQL APIs, real-time subscriptions
- **AppFlow**: SaaS integration (Salesforce, Slack, etc.)
- **Step Functions**: Workflow orchestration, state machines
-
-## Startup-Specific Considerations
-
-### MVP (Minimum Viable Product) Architecture
-**Goal**: Launch fast, minimal infrastructure
-
-**Recommended**:
- Amplify (full-stack deployment)
- Lambda + API Gateway + DynamoDB
- Cognito for auth
- CloudFront + S3 for frontend
-
-**Cost**: $20-100/month
-**Setup time**: 1-3 days
-
-### Growth Stage (Scaling to 10k-100k users)
-**Goal**: Handle growth, maintain cost efficiency
-
-**Add**:
- ElastiCache for caching
- Aurora Serverless for complex queries
- CloudWatch dashboards and alarms
- CI/CD pipeline (CodePipeline)
- Multi-AZ deployment
-
-**Cost**: $500-2000/month
-**Migration time**: 1-2 weeks
-
-### Scale-Up (100k+ users, Series A+)
-**Goal**: Reliability, observability, global reach
-
-**Add**:
- Multi-region deployment
- DynamoDB Global Tables
- Advanced monitoring (X-Ray, third-party APM)
- WAF and Shield for DDoS protection
- Dedicated support plan
- Reserved instances/Savings Plans
-
-**Cost**: $3000-10000/month
-**Migration time**: 1-3 months
-
-## Common Pitfalls to Avoid
-
-### Technical Debt
- **Over-engineering early** - Don't build for 10M users when you have 100
- **Under-monitoring** - Set up basic monitoring from day one
- **Ignoring costs** - Enable Cost Explorer and billing alerts immediately
- **Single region dependency** - Plan for multi-region from start
-
-### Security Mistakes
- **Public S3 buckets** - Use bucket policies, block public access
- **Overly permissive IAM** - Avoid "*" permissions, use specific resources
- **Hardcoded credentials** - Use IAM roles, Secrets Manager
- **Unencrypted data** - Enable encryption by default
-
-### Performance Issues
- **No caching** - Add CloudFront, ElastiCache early
- **Inefficient queries** - Use indexes, avoid scans in DynamoDB
- **Large Lambda packages** - Use layers, minimize dependencies
- **N+1 queries** - Implement DataLoader pattern, batch operations
-
-### Cost Surprises
- **Undeleted resources** - Tag everything, review regularly
- **Data transfer costs** - Keep traffic within same AZ/region when possible
- **NAT Gateway charges** - Use VPC endpoints for AWS services
- **CloudWatch Logs accumulation** - Set retention policies
-
-## Compliance & Governance
-
-### Data Residency
- Use specific regions (eu-west-1 for GDPR)
- Enable S3 bucket replication restrictions
- Configure Route 53 geolocation routing
-
-### HIPAA Compliance
- Use BAA-eligible services only
- Enable encryption at rest and in transit
- Implement audit logging (CloudTrail)
- Configure VPC with private subnets
-
-### SOC 2 / ISO 27001
- Enable AWS Config for compliance rules
- Use AWS Audit Manager
- Implement least privilege access
- Regular security assessments
+---

 ## Limitations

- **Lambda limitations**: 15-minute execution limit, 10GB memory max, cold start latency
- **API Gateway limits**: 29-second timeout, 10MB payload size
- **DynamoDB limits**: 400KB item size, eventually consistent reads by default
- **Regional availability**: Not all services available in all regions
- **Vendor lock-in**: Some serverless services are AWS-specific (consider abstraction layers)
- **Learning curve**: Requires AWS expertise, DevOps knowledge
- **Debugging complexity**: Distributed systems harder to troubleshoot than monoliths
-
-## Helpful Resources
-
- **AWS Well-Architected Framework**: https://aws.amazon.com/architecture/well-architected/
- **AWS Architecture Center**: https://aws.amazon.com/architecture/
- **Serverless Land**: https://serverlessland.com/
- **AWS Pricing Calculator**: https://calculator.aws/
- **AWS Cost Explorer**: Track and analyze spending
- **AWS Trusted Advisor**: Automated best practice checks
- **CloudFormation Templates**: https://github.com/awslabs/aws-cloudformation-templates
- **AWS CDK Examples**: https://github.com/aws-samples/aws-cdk-examples
+- Lambda: 15-minute maximum execution time, 10GB maximum memory
+- API Gateway: 29-second default integration timeout, 10MB payload limit
+- DynamoDB: 400KB maximum item size, reads are eventually consistent by default
+- Regional availability varies by service
+- Some serverless services are AWS-specific (vendor lock-in)
--- a/engineering-team/aws-solution-architect/__pycache__/architecture_designer.cpython-313.pyc
+++ b/engineering-team/aws-solution-architect/__pycache__/architecture_designer.cpython-313.pyc
--- a/engineering-team/aws-solution-architect/__pycache__/cost_optimizer.cpython-313.pyc
+++ b/engineering-team/aws-solution-architect/__pycache__/cost_optimizer.cpython-313.pyc
--- a/engineering-team/aws-solution-architect/__pycache__/serverless_stack.cpython-313.pyc
+++ b/engineering-team/aws-solution-architect/__pycache__/serverless_stack.cpython-313.pyc
--- a/engineering-team/aws-solution-architect/assets/expected_output.json
+++ b/engineering-team/aws-solution-architect/assets/expected_output.json
--- a/engineering-team/aws-solution-architect/assets/sample_input.json
+++ b/engineering-team/aws-solution-architect/assets/sample_input.json
--- a/engineering-team/aws-solution-architect/references/architecture_patterns.md
+++ b/engineering-team/aws-solution-architect/references/architecture_patterns.md
@@ -0,0 +1,535 @@
+# AWS Architecture Patterns for Startups
+
+Reference guide for selecting the right AWS architecture pattern based on application requirements.
+
+---
+
+## Table of Contents
+
+- [Pattern Selection Matrix](#pattern-selection-matrix)
+- [Pattern 1: Serverless Web Application](#pattern-1-serverless-web-application)
+- [Pattern 2: Event-Driven Microservices](#pattern-2-event-driven-microservices)
+- [Pattern 3: Modern Three-Tier Application](#pattern-3-modern-three-tier-application)
+- [Pattern 4: Real-Time Data Processing](#pattern-4-real-time-data-processing)
+- [Pattern 5: GraphQL API Backend](#pattern-5-graphql-api-backend)
+- [Pattern 6: Multi-Region High Availability](#pattern-6-multi-region-high-availability)
+
+---
+
+## Pattern Selection Matrix
+
+| Pattern | Best For | Users | Monthly Cost | Complexity |
+|---------|----------|-------|--------------|------------|
+| Serverless Web | MVP, SaaS, mobile backend | <50K | $50-500 | Low |
+| Event-Driven Microservices | Complex workflows, async processing | Any | $100-1000 | Medium |
+| Three-Tier | Traditional web, e-commerce | 10K-500K | $300-2000 | Medium |
+| Real-Time Data | Analytics, IoT, streaming | Any | $200-1500 | High |
+| GraphQL Backend | Mobile apps, SPAs | <100K | $50-400 | Medium |
+| Multi-Region HA | Global apps, DR requirements | >100K | 1.5-2x single-region cost | High |
+
+---
+
+## Pattern 1: Serverless Web Application
+
+### Use Case
+SaaS platforms, mobile backends, low-traffic websites, MVPs
+
+### Architecture Diagram
+
+```
+┌─────────────┐     ┌─────────────┐     ┌─────────────┐
+│  CloudFront │────▶│     S3      │     │   Cognito   │
+│    (CDN)    │     │  (Static)   │     │   (Auth)    │
+└─────────────┘     └─────────────┘     └──────┬──────┘
+                                               │
+┌─────────────┐     ┌─────────────┐     ┌──────▼──────┐
+│   Route 53  │────▶│ API Gateway │────▶│   Lambda    │
+│    (DNS)    │     │   (REST)    │     │ (Functions) │
+└─────────────┘     └─────────────┘     └──────┬──────┘
+                                               │
+                                        ┌──────▼──────┐
+                                        │  DynamoDB   │
+                                        │ (Database)  │
+                                        └─────────────┘
+```
+
+### Service Stack
+
+| Layer | Service | Configuration |
+|-------|---------|---------------|
+| Frontend | S3 + CloudFront | Static hosting with HTTPS |
+| API | API Gateway + Lambda | REST endpoints with throttling |
+| Database | DynamoDB | Pay-per-request billing |
+| Auth | Cognito | User pools with MFA support |
+| CI/CD | Amplify or CodePipeline | Automated deployments |
+
+### CloudFormation Template
+
+```yaml
+AWSTemplateFormatVersion: '2010-09-09'
+Transform: AWS::Serverless-2016-10-31
+
+Resources:
+  # API Function
+  ApiFunction:
+    Type: AWS::Serverless::Function
+    Properties:
+      Runtime: nodejs18.x
+      Handler: index.handler
+      MemorySize: 512
+      Timeout: 10
+      Events:
+        Api:
+          Type: Api
+          Properties:
+            Path: /{proxy+}
+            Method: ANY
+
+  # DynamoDB Table
+  DataTable:
+    Type: AWS::DynamoDB::Table
+    Properties:
+      BillingMode: PAY_PER_REQUEST
+      AttributeDefinitions:
+        - AttributeName: PK
+          AttributeType: S
+        - AttributeName: SK
+          AttributeType: S
+      KeySchema:
+        - AttributeName: PK
+          KeyType: HASH
+        - AttributeName: SK
+          KeyType: RANGE
+```
+
+### Cost Breakdown (10K users)
+
+| Service | Monthly Cost |
+|---------|-------------|
+| Lambda | $5-20 |
+| API Gateway | $10-30 |
+| DynamoDB | $10-50 |
+| CloudFront | $5-15 |
+| S3 | $1-5 |
+| Cognito | $0-50 |
+| **Total** | **$31-170** |
+
+### Pros and Cons
+
+**Pros:**
+- Zero server management
+- Pay only for what you use
+- Auto-scaling built-in
+- Low operational overhead
+
+**Cons:**
+- Cold start latency (100-500ms)
+- 15-minute Lambda execution limit
+- Vendor lock-in
+
+---
+
+## Pattern 2: Event-Driven Microservices
+
+### Use Case
+Complex business workflows, asynchronous processing, decoupled systems
+
+### Architecture Diagram
+
+```
+┌─────────────┐     ┌─────────────┐     ┌─────────────┐
+│   Service   │────▶│ EventBridge │────▶│   Service   │
+│      A      │     │  (Event Bus)│     │      B      │
+└─────────────┘     └──────┬──────┘     └─────────────┘
+                           │
+                    ┌──────▼──────┐
+                    │     SQS     │
+                    │   (Queue)   │
+                    └──────┬──────┘
+                           │
+┌─────────────┐     ┌──────▼──────┐     ┌─────────────┐
+│    Step     │◀────│   Lambda    │────▶│  DynamoDB   │
+│  Functions  │     │ (Processor) │     │  (Storage)  │
+└─────────────┘     └─────────────┘     └─────────────┘
+```
+
+### Service Stack
+
+| Layer | Service | Purpose |
+|-------|---------|---------|
+| Events | EventBridge | Central event bus |
+| Processing | Lambda or ECS Fargate | Event handlers |
+| Queue | SQS | Dead letter queue for failures |
+| Orchestration | Step Functions | Complex workflow state |
+| Storage | DynamoDB, S3 | Persistent data |
+
+### Event Schema Example
+
+```json
+{
+  "source": "orders.service",
+  "detail-type": "OrderCreated",
+  "detail": {
+    "orderId": "ord-12345",
+    "customerId": "cust-67890",
+    "items": [...],
+    "total": 99.99,
+    "timestamp": "2024-01-15T10:30:00Z"
+  }
+}
+```
+
+### Cost Breakdown
+
+| Service | Monthly Cost |
+|---------|-------------|
+| EventBridge | $1-10 |
+| Lambda | $20-100 |
+| SQS | $5-20 |
+| Step Functions | $25-100 |
+| DynamoDB | $20-100 |
+| **Total** | **$71-330** |
+
+### Pros and Cons
+
+**Pros:**
+- Loose coupling between services
+- Independent scaling per service
+- Failure isolation
+- Easy to test individually
+
+**Cons:**
+- Distributed system complexity
+- Eventual consistency
+- Harder to debug
+
+---
+
+## Pattern 3: Modern Three-Tier Application
+
+### Use Case
+Traditional web apps, e-commerce, CMS, applications with complex queries
+
+### Architecture Diagram
+
+```
+┌─────────────┐     ┌─────────────┐
+│  CloudFront │────▶│     ALB     │
+│    (CDN)    │     │ (Load Bal.) │
+└─────────────┘     └──────┬──────┘
+                           │
+                    ┌──────▼──────┐
+                    │ ECS Fargate │
+                    │ (Auto-scale)│
+                    └──────┬──────┘
+                           │
+        ┌──────────────────┼──────────────────┐
+        │                  │                  │
+ ┌──────▼──────┐    ┌──────▼──────┐    ┌──────▼──────┐
+ │   Aurora    │    │ ElastiCache │    │     S3      │
+ │ (Database)  │    │   (Redis)   │    │  (Storage)  │
+ └─────────────┘    └─────────────┘    └─────────────┘
+```
+
+### Service Stack
+
+| Layer | Service | Configuration |
+|-------|---------|---------------|
+| CDN | CloudFront | Edge caching, HTTPS |
+| Load Balancer | ALB | Path-based routing, health checks |
+| Compute | ECS Fargate | Container auto-scaling |
+| Database | Aurora MySQL/PostgreSQL | Multi-AZ, auto-scaling |
+| Cache | ElastiCache Redis | Session, query caching |
+| Storage | S3 | Static assets, uploads |
+
+### Terraform Example
+
+```hcl
+# ECS Service with Auto-scaling
+resource "aws_ecs_service" "app" {
+  name            = "app-service"
+  cluster         = aws_ecs_cluster.main.id
+  task_definition = aws_ecs_task_definition.app.arn
+  desired_count   = 2
+
+  capacity_provider_strategy {
+    capacity_provider = "FARGATE"
+    weight            = 100
+  }
+
+  load_balancer {
+    target_group_arn = aws_lb_target_group.app.arn
+    container_name   = "app"
+    container_port   = 3000
+  }
+}
+
+# Auto-scaling Policy
+resource "aws_appautoscaling_target" "app" {
+  max_capacity       = 10
+  min_capacity       = 2
+  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.app.name}"
+  scalable_dimension = "ecs:service:DesiredCount"
+  service_namespace  = "ecs"
+}
+```
+
+### Cost Breakdown (50K users)
+
+| Service | Monthly Cost |
+|---------|-------------|
+| ECS Fargate (2 tasks) | $100-200 |
+| ALB | $25-50 |
+| Aurora | $100-300 |
+| ElastiCache | $50-100 |
+| CloudFront | $20-50 |
+| **Total** | **$295-700** |
+
+---
+
+## Pattern 4: Real-Time Data Processing
+
+### Use Case
+Analytics, IoT data ingestion, log processing, streaming data
+
+### Architecture Diagram
+
+```
+┌─────────────┐     ┌─────────────┐     ┌─────────────┐
+│  IoT Core   │────▶│   Kinesis   │────▶│   Lambda    │
+│  (Devices)  │     │  (Stream)   │     │ (Process)   │
+└─────────────┘     └─────────────┘     └──────┬──────┘
+                                               │
+┌─────────────┐     ┌─────────────┐     ┌──────▼──────┐
+│  QuickSight │◀────│   Athena    │◀────│     S3      │
+│   (Viz)     │     │  (Query)    │     │ (Data Lake) │
+└─────────────┘     └─────────────┘     └─────────────┘
+                                               │
+                                        ┌──────▼──────┐
+                                        │  CloudWatch │
+                                        │  (Alerts)   │
+                                        └─────────────┘
+```
+
+### Service Stack
+
+| Layer | Service | Purpose |
+|-------|---------|---------|
+| Ingestion | Kinesis Data Streams | Real-time data capture |
+| Processing | Lambda or Kinesis Analytics | Transform and analyze |
+| Storage | S3 (data lake) | Long-term storage |
+| Query | Athena | SQL queries on S3 |
+| Visualization | QuickSight | Dashboards and reports |
+| Alerting | CloudWatch + SNS | Threshold-based alerts |
+
+### Kinesis Producer Example
+
+```python
+import boto3
+import json
+
+kinesis = boto3.client('kinesis')
+
+def send_event(stream_name, data, partition_key):
+    response = kinesis.put_record(
+        StreamName=stream_name,
+        Data=json.dumps(data),
+        PartitionKey=partition_key
+    )
+    return response['SequenceNumber']
+
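+# Send a sensor reading; using the sensor ID as the partition key
+# keeps each sensor's records ordered within a single shard
+send_event(
+    'sensor-stream',
+    {'sensor_id': 'temp-01', 'value': 23.5, 'unit': 'celsius'},
+    'temp-01'
+)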
+```
+
+### Cost Breakdown
+
+| Service | Monthly Cost |
+|---------|-------------|
+| Kinesis (1 shard) | $15-30 |
+| Lambda | $10-50 |
+| S3 | $5-50 |
+| Athena | $5-25 |
+| QuickSight | $24+ |
+| **Total** | **$59-179** |
+
+---
+
+## Pattern 5: GraphQL API Backend
+
+### Use Case
+Mobile apps, single-page applications, flexible data queries
+
+### Architecture Diagram
+
+```
+┌─────────────┐     ┌─────────────┐     ┌─────────────┐
+│  Mobile App │────▶│   AppSync   │────▶│   Lambda    │
+│   or SPA    │     │  (GraphQL)  │     │ (Resolvers) │
+└─────────────┘     └──────┬──────┘     └─────────────┘
+                           │
+                    ┌──────▼──────┐
+                    │  DynamoDB   │
+                    │  (Direct)   │
+                    └──────┬──────┘
+                           │
+                    ┌──────▼──────┐
+                    │   Cognito   │
+                    │   (Auth)    │
+                    └─────────────┘
+```
+
+### AppSync Schema Example
+
+```graphql
+type Query {
+  getUser(id: ID!): User
+  listPosts(limit: Int, nextToken: String): PostConnection
+}
+
+type Mutation {
+  createPost(input: CreatePostInput!): Post
+  updatePost(input: UpdatePostInput!): Post
+}
+
+type Subscription {
+  onCreatePost: Post @aws_subscribe(mutations: ["createPost"])
+}
+
+type User {
+  id: ID!
+  email: String!
+  posts: [Post]
+}
+
+type Post {
+  id: ID!
+  title: String!
+  content: String!
+  author: User!
+  createdAt: AWSDateTime!
+}
+```
+
+### Cost Breakdown
+
+| Service | Monthly Cost |
+|---------|-------------|
+| AppSync | $4-40 |
+| Lambda | $5-30 |
+| DynamoDB | $10-50 |
+| Cognito | $0-50 |
+| **Total** | **$19-170** |
+
+---
+
+## Pattern 6: Multi-Region High Availability
+
+### Use Case
+Global applications, disaster recovery, data sovereignty compliance
+
+### Architecture Diagram
+
+```
+                    ┌─────────────┐
+                    │  Route 53   │
+                    │(Geo routing)│
+                    └──────┬──────┘
+                           │
+          ┌────────────────┼────────────────┐
+          │                                 │
+   ┌──────▼──────┐                   ┌──────▼──────┐
+   │ us-east-1   │                   │ eu-west-1   │
+   │ CloudFront  │                   │ CloudFront  │
+   └──────┬──────┘                   └──────┬──────┘
+          │                                 │
+   ┌──────▼──────┐                   ┌──────▼──────┐
+   │ ECS/Lambda  │                   │ ECS/Lambda  │
+   └──────┬──────┘                   └──────┬──────┘
+          │                                 │
+   ┌──────▼──────┐◀── Replication ──▶┌──────▼──────┐
+   │  DynamoDB   │                   │  DynamoDB   │
+   │Global Table │                   │Global Table │
+   └─────────────┘                   └─────────────┘
+```
+
+### Service Stack
+
+| Component | Service | Configuration |
+|-----------|---------|---------------|
+| DNS | Route 53 | Geolocation or latency routing |
+| CDN | CloudFront | Multiple origins per region |
+| Compute | Lambda or ECS | Deployed in each region |
+| Database | DynamoDB Global Tables | Automatic replication |
+| Storage | S3 CRR | Cross-region replication |
+
+### Route 53 Failover Policy
+
+```yaml
+# Primary record
+HealthCheck:
+  Type: AWS::Route53::HealthCheck
+  Properties:
+    HealthCheckConfig:
+      Port: 443
+      Type: HTTPS
+      ResourcePath: /health
+      FullyQualifiedDomainName: api-us-east-1.example.com
+
+RecordSetPrimary:
+  Type: AWS::Route53::RecordSet
+  Properties:
+    Name: api.example.com
+    Type: A
+    SetIdentifier: primary
+    Failover: PRIMARY
+    HealthCheckId: !Ref HealthCheck
+    AliasTarget:
+      DNSName: !GetAtt USEast1ALB.DNSName
+      HostedZoneId: !GetAtt USEast1ALB.CanonicalHostedZoneID
+```
+
+### Cost Considerations
+
+| Factor | Impact |
+|--------|--------|
+| Compute | 2x (each region) |
+| Database | 25% premium for global tables |
+| Data Transfer | Cross-region replication costs |
+| Route 53 | Health checks + geo queries |
+| **Total** | **1.5-2x single region** |
+
+---
+
+## Pattern Comparison Summary
+
+### Latency
+
+| Pattern | Typical Latency |
+|---------|-----------------|
+| Serverless | 50-200ms (cold: 500ms+) |
+| Three-Tier | 20-100ms |
+| GraphQL | 30-150ms |
+| Multi-Region | <50ms (regional) |
+
+### Scaling Characteristics
+
+| Pattern | Scale Limit | Scale Speed |
+|---------|-------------|-------------|
+| Serverless | 1000 concurrent executions per account per Region (default quota) | Instant |
+| Three-Tier | Instance limits | Minutes |
+| Event-Driven | Unlimited | Instant |
+| Multi-Region | Regional limits | Instant |
+
+### Operational Complexity
+
+| Pattern | Setup | Maintenance | Debugging |
+|---------|-------|-------------|-----------|
+| Serverless | Low | Low | Medium |
+| Three-Tier | Medium | Medium | Low |
+| Event-Driven | High | Medium | High |
+| Multi-Region | High | High | High |
--- a/engineering-team/aws-solution-architect/references/best_practices.md
+++ b/engineering-team/aws-solution-architect/references/best_practices.md
@@ -0,0 +1,631 @@
+# AWS Best Practices for Startups
+
+Production-ready practices for serverless, cost optimization, security, and operational excellence.
+
+---
+
+## Table of Contents
+
+- [Serverless Best Practices](#serverless-best-practices)
+- [Cost Optimization](#cost-optimization)
+- [Security Hardening](#security-hardening)
+- [Scalability Patterns](#scalability-patterns)
+- [DevOps and Reliability](#devops-and-reliability)
+- [Common Pitfalls](#common-pitfalls)
+
+---
+
+## Serverless Best Practices
+
+### Lambda Function Design
+
+#### 1. Keep Functions Stateless
+
+Store state externally in DynamoDB, S3, or ElastiCache.
+
+```python
+# BAD: Function-level state
+cache = {}
+
+def handler(event, context):
+    if event['key'] in cache:
+        return cache[event['key']]
+    # ...
+
+# GOOD: External state
+import boto3
+dynamodb = boto3.resource('dynamodb')
+table = dynamodb.Table('cache')
+
+def handler(event, context):
+    response = table.get_item(Key={'pk': event['key']})
+    if 'Item' in response:
+        return response['Item']['value']
+    # ...
+```
+
+#### 2. Implement Idempotency
+
+Handle retries gracefully with unique request IDs.
+
+```python
+import boto3
+import hashlib
+import time
+
+dynamodb = boto3.resource('dynamodb')
+idempotency_table = dynamodb.Table('idempotency')
+
+def handler(event, context):
+    # Generate idempotency key
+    idempotency_key = hashlib.sha256(
+        f"{event['orderId']}-{event['action']}".encode()
+    ).hexdigest()
+
+    # Check if already processed
+    try:
+        response = idempotency_table.get_item(Key={'pk': idempotency_key})
+        if 'Item' in response:
+            return response['Item']['result']
+    except Exception:
+        pass
+
+    # Process request
+    result = process_order(event)
+
+    # Store result for idempotency
+    idempotency_table.put_item(
+        Item={
+            'pk': idempotency_key,
+            'result': result,
+            'ttl': int(time.time()) + 86400  # 24h TTL
+        }
+    )
+
+    return result
+```
+
+#### 3. Optimize Cold Starts
+
+```python
+# Initialize outside handler (reused across invocations)
+import boto3
+from aws_xray_sdk.core import patch_all
+
+# SDK initialization happens once
+dynamodb = boto3.resource('dynamodb')
+table = dynamodb.Table('my-table')
+patch_all()
+
+def handler(event, context):
+    # Handler code uses pre-initialized resources
+    return table.get_item(Key={'pk': event['id']})
+```
+
+**Cold Start Reduction Techniques:**
+- Use provisioned concurrency for critical paths
+- Minimize package size (use layers for dependencies)
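+- Prefer runtimes with low startup overhead (Python, Node.js) over JVM-based or .NET runtimes
+- Attach functions to a VPC only when necessary (the former multi-second VPC cold-start penalty was largely removed by Lambda's 2019 networking changes, but VPC access still requires NAT or VPC endpoints for outbound calls)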
+
+#### 4. Set Appropriate Timeouts
+
+```yaml
+# Lambda configuration
+Functions:
+  ApiHandler:
+    Timeout: 10  # Shorter for synchronous APIs
+    MemorySize: 512
+
+  BackgroundProcessor:
+    Timeout: 300  # Longer for async processing
+    MemorySize: 1024
+```
+
+**Timeout Guidelines:**
+- API handlers: 10-30 seconds
+- Event processors: 60-300 seconds
+- Use Step Functions for >15 minute workflows
+
+---
+
+## Cost Optimization
+
+### 1. Right-Sizing Strategy
+
+```bash
+# Check EC2 utilization
+aws cloudwatch get-metric-statistics \
+  --namespace AWS/EC2 \
+  --metric-name CPUUtilization \
+  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
+  --start-time $(date -d '7 days ago' -u +"%Y-%m-%dT%H:%M:%SZ") \
+  --end-time $(date -u +"%Y-%m-%dT%H:%M:%SZ") \
+  --period 3600 \
+  --statistics Average
+```
+
+**Right-Sizing Rules:**
+- <10% CPU average: Downsize instance
+- >80% CPU average: Consider upgrade or horizontal scaling
+- Review every month for the first 6 months
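
As a rough illustration, the rules above can be applied to a list of CPU averages, such as the `Average` datapoints returned by the CloudWatch command. The function name and the sample values are hypothetical; only the 10% and 80% thresholds come from the rules.

```python
from statistics import mean


def rightsizing_recommendation(cpu_averages):
    """Map average CPU utilization (percent) to a right-sizing action."""
    if not cpu_averages:
        return "no-data"
    avg = mean(cpu_averages)
    if avg < 10:
        return "downsize"
    if avg > 80:
        return "upgrade-or-scale-out"
    return "keep"


# Hypothetical hourly averages for three instances
print(rightsizing_recommendation([4.2, 6.1, 3.8]))
print(rightsizing_recommendation([85.0, 91.5, 88.2]))
print(rightsizing_recommendation([35.0, 42.0, 51.0]))
```

A single weekly average can hide short peaks, so it may be worth checking the `Maximum` statistic as well before downsizing.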
+
+### 2. Savings Plans and Reserved Instances
+
+| Commitment | Savings | Best For |
+|------------|---------|----------|
+| No Upfront, 1-year | 20-30% | Unknown future |
+| Partial Upfront, 1-year | 30-40% | Moderate confidence |
+| All Upfront, 3-year | 50-60% | Stable workloads |
+
+```bash
+# Check Savings Plans recommendations
+aws ce get-savings-plans-purchase-recommendation \
+  --savings-plans-type COMPUTE_SP \
+  --term-in-years ONE_YEAR \
+  --payment-option NO_UPFRONT \
+  --lookback-period-in-days THIRTY_DAYS
+```
+
+### 3. S3 Lifecycle Policies
+
+```json
+{
+  "Rules": [
+    {
+      "ID": "Transition to cheaper storage",
+      "Status": "Enabled",
+      "Filter": {
+        "Prefix": "logs/"
+      },
+      "Transitions": [
+        { "Days": 30, "StorageClass": "STANDARD_IA" },
+        { "Days": 90, "StorageClass": "GLACIER" }
+      ],
+      "Expiration": { "Days": 365 }
+    }
+  ]
+}
+```
+
+### 4. Lambda Memory Optimization
+
+Test different memory settings to find optimal cost/performance.
+
+```python
+# Use AWS Lambda Power Tuning
+# https://github.com/alexcasalboni/aws-lambda-power-tuning
+
+# Example results (per-invocation compute cost at $0.0000166667 per GB-second):
+# 128 MB: 2000ms, $0.0000042
+# 512 MB: 500ms, $0.0000042
+# 1024 MB: 300ms, $0.0000050
+
+# Optimal: 512 MB (same cost, 4x faster)
+```
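
Lambda bills compute as memory multiplied by duration (GB-seconds), which is why a faster run at a higher memory setting can cost the same. A minimal sketch of that arithmetic, using the memory and duration pairs from the example results; the helper name `gb_seconds` is hypothetical and per-request charges are ignored:

```python
def gb_seconds(memory_mb, duration_ms):
    """Billable compute for one invocation, in GB-seconds."""
    return (memory_mb / 1024) * (duration_ms / 1000)


# Memory/duration pairs taken from the example results
for memory_mb, duration_ms in [(128, 2000), (512, 500), (1024, 300)]:
    print(memory_mb, "MB:", gb_seconds(memory_mb, duration_ms), "GB-s")
```

The 128 MB and 512 MB runs consume the same GB-seconds, so the faster 512 MB setting is the better choice at equal cost.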
+
+### 5. NAT Gateway Alternatives
+
+```
+NAT Gateway: $0.045/hour + $0.045/GB = ~$32/month + data
+
+Alternatives:
+1. VPC Endpoints: $0.01/hour = ~$7.30/month (for AWS services)
+2. NAT Instance: t3.nano = ~$3.80/month (limited throughput)
+3. No NAT: Use VPC endpoints + Lambda outside VPC
+```
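
The monthly figures above follow from the hourly rates. A small sketch of the calculation, assuming 730 hours per month and a hypothetical 100 GB of traffic through the NAT Gateway; the rates are the ones quoted above, and per-GB charges on interface endpoints are left out:

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def monthly_cost(hourly_rate, gb_processed=0.0, per_gb_rate=0.0):
    """Fixed hourly charge plus optional per-GB data processing charge."""
    return hourly_rate * HOURS_PER_MONTH + gb_processed * per_gb_rate


nat_gateway = monthly_cost(0.045, gb_processed=100, per_gb_rate=0.045)
interface_endpoint = monthly_cost(0.01)

print(f"NAT Gateway (100 GB processed): ${nat_gateway:.2f}")
print(f"Interface endpoint:             ${interface_endpoint:.2f}")
```

Interface endpoints are billed per endpoint per Availability Zone, so the comparison shifts as the number of services and AZs grows.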
+
+### 6. CloudWatch Log Retention
+
+```yaml
+# Set retention policies to avoid unbounded growth
+LogGroup:
+  Type: AWS::Logs::LogGroup
+  Properties:
+    LogGroupName: /aws/lambda/my-function
+    RetentionInDays: 14  # 7, 14, 30, 60, 90, etc.
+```
+
+**Retention Guidelines:**
+- Development: 7 days
+- Production non-critical: 30 days
+- Production critical: 90 days
+- Compliance requirements: As specified
+
+---
+
+## Security Hardening
+
+### 1. IAM Least Privilege
+
+```json
+// BAD: Overly permissive
+{
+  "Effect": "Allow",
+  "Action": "dynamodb:*",
+  "Resource": "*"
+}
+
+// GOOD: Specific actions and resources
+{
+  "Effect": "Allow",
+  "Action": [
+    "dynamodb:GetItem",
+    "dynamodb:PutItem",
+    "dynamodb:Query"
+  ],
+  "Resource": [
+    "arn:aws:dynamodb:us-east-1:123456789012:table/users",
+    "arn:aws:dynamodb:us-east-1:123456789012:table/users/index/*"
+  ]
+}
+```
+
+### 2. Encryption Configuration
+
+```yaml
+# Enable encryption everywhere
+Resources:
+  # DynamoDB
+  Table:
+    Type: AWS::DynamoDB::Table
+    Properties:
+      SSESpecification:
+        SSEEnabled: true
+        SSEType: KMS
+        KMSMasterKeyId: !Ref EncryptionKey
+
+  # S3
+  Bucket:
+    Type: AWS::S3::Bucket
+    Properties:
+      BucketEncryption:
+        ServerSideEncryptionConfiguration:
+          - ServerSideEncryptionByDefault:
+              SSEAlgorithm: aws:kms
+              KMSMasterKeyID: !Ref EncryptionKey
+
+  # RDS
+  Database:
+    Type: AWS::RDS::DBInstance
+    Properties:
+      StorageEncrypted: true
+      KmsKeyId: !Ref EncryptionKey
+```
+
+### 3. Network Isolation
+
+```yaml
+# Private subnets with VPC endpoints
+Resources:
+  PrivateSubnet:
+    Type: AWS::EC2::Subnet
+    Properties:
+      MapPublicIpOnLaunch: false
+
+  # DynamoDB Gateway Endpoint (free)
+  DynamoDBEndpoint:
+    Type: AWS::EC2::VPCEndpoint
+    Properties:
+      VpcId: !Ref VPC
+      ServiceName: !Sub com.amazonaws.${AWS::Region}.dynamodb
+      VpcEndpointType: Gateway
+      RouteTableIds:
+        - !Ref PrivateRouteTable
+
+  # Secrets Manager Interface Endpoint
+  SecretsEndpoint:
+    Type: AWS::EC2::VPCEndpoint
+    Properties:
+      VpcId: !Ref VPC
+      ServiceName: !Sub com.amazonaws.${AWS::Region}.secretsmanager
+      VpcEndpointType: Interface
+      PrivateDnsEnabled: true
+```
+
+### 4. Secrets Management
+
+```python
+# Never hardcode secrets
+import boto3
+import json
+
+def get_secret(secret_name):
+    client = boto3.client('secretsmanager')
+    response = client.get_secret_value(SecretId=secret_name)
+    return json.loads(response['SecretString'])
+
+# Usage
+db_creds = get_secret('prod/database/credentials')
+connection = connect(
+    host=db_creds['host'],
+    user=db_creds['username'],
+    password=db_creds['password']
+)
+```
+
+### 5. API Protection
+
+```yaml
+# WAF + API Gateway
+WebACL:
+  Type: AWS::WAFv2::WebACL
+  Properties:
+    DefaultAction:
+      Allow: {}
+    Rules:
+      - Name: RateLimit
+        Priority: 1
+        Action:
+          Block: {}
+        Statement:
+          RateBasedStatement:
+            Limit: 2000
+            AggregateKeyType: IP
+        VisibilityConfig:
+          SampledRequestsEnabled: true
+          CloudWatchMetricsEnabled: true
+          MetricName: RateLimitRule
+
+      - Name: AWSManagedRulesCommonRuleSet
+        Priority: 2
+        OverrideAction:
+          None: {}
+        Statement:
+          ManagedRuleGroupStatement:
+            VendorName: AWS
+            Name: AWSManagedRulesCommonRuleSet
+```
+
+### 6. Audit Logging
+
+```yaml
+# Enable CloudTrail for all API calls
+CloudTrail:
+  Type: AWS::CloudTrail::Trail
+  Properties:
+    IsMultiRegionTrail: true
+    IsLogging: true
+    S3BucketName: !Ref AuditLogsBucket
+    IncludeGlobalServiceEvents: true
+    EnableLogFileValidation: true
+    EventSelectors:
+      - ReadWriteType: All
+        IncludeManagementEvents: true
+```
+
+---
+
+## Scalability Patterns
+
+### 1. Horizontal vs Vertical Scaling
+
+```
+Horizontal (preferred):
+- Add more Lambda concurrent executions
+- Add more Fargate tasks
+- Add more DynamoDB capacity
+
+Vertical (when necessary):
+- Increase Lambda memory
+- Upgrade RDS instance
+- Larger EC2 instances
+```
+
+### 2. Database Sharding
+
+```python
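+import hashlib
+
+NUM_SHARDS = 8  # example value
+
+# Partition by tenant ID using a stable hash. Python's built-in hash()
+# is randomized per process for strings, so it cannot be used for routing.
+def get_table_for_tenant(tenant_id):
+    digest = hashlib.sha256(str(tenant_id).encode()).hexdigest()
+    shard = int(digest, 16) % NUM_SHARDS
+    return f"data-shard-{shard}"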
+
+# Or use DynamoDB single-table design with partition keys
+def get_partition_key(tenant_id, entity_type, entity_id):
+    return f"TENANT#{tenant_id}#{entity_type}#{entity_id}"
+```
+
+### 3. Caching Layers
+
+```
+Edge (CloudFront):     Global, static content, TTL: hours-days
+Application (Redis):   Regional, session/query cache, TTL: minutes-hours
+Database (DAX):        DynamoDB-specific, TTL: minutes
+```
+
+```python
+# ElastiCache Redis caching pattern
+import redis
+import json
+
+cache = redis.Redis(host='cache.abc123.cache.amazonaws.com', port=6379)
+
+def get_user(user_id):
+    # Check cache first
+    cached = cache.get(f"user:{user_id}")
+    if cached:
+        return json.loads(cached)
+
+    # Fetch from database
+    user = db.get_user(user_id)
+
+    # Cache for 5 minutes
+    cache.setex(f"user:{user_id}", 300, json.dumps(user))
+
+    return user
+```
+
+### 4. Auto-Scaling Configuration
+
+```yaml
+# ECS Service Auto-scaling
+AutoScalingTarget:
+  Type: AWS::ApplicationAutoScaling::ScalableTarget
+  Properties:
+    MaxCapacity: 10
+    MinCapacity: 2
+    ResourceId: !Sub service/${Cluster}/${Service.Name}
+    ScalableDimension: ecs:service:DesiredCount
+    ServiceNamespace: ecs
+
+ScalingPolicy:
+  Type: AWS::ApplicationAutoScaling::ScalingPolicy
+  Properties:
+    PolicyName: cpu-target-tracking
+    ScalingTargetId: !Ref AutoScalingTarget
+    PolicyType: TargetTrackingScaling
+    TargetTrackingScalingPolicyConfiguration:
+      PredefinedMetricSpecification:
+        PredefinedMetricType: ECSServiceAverageCPUUtilization
+      TargetValue: 70
+      ScaleInCooldown: 300
+      ScaleOutCooldown: 60
+```
+
+---
+
+## DevOps and Reliability
+
+### 1. Infrastructure as Code
+
+```bash
+# Version control all infrastructure
+git init
+git add .
+git commit -m "Initial infrastructure setup"
+
+# Use separate stacks per environment
+cdk deploy --context environment=dev
+cdk deploy --context environment=staging
+cdk deploy --context environment=production
+```
+
+### 2. Blue/Green Deployments
+
+```yaml
+# CodeDeploy Blue/Green for ECS
+DeploymentGroup:
+  Type: AWS::CodeDeploy::DeploymentGroup
+  Properties:
+    DeploymentConfigName: CodeDeployDefault.ECSAllAtOnce
+    DeploymentStyle:
+      DeploymentType: BLUE_GREEN
+      DeploymentOption: WITH_TRAFFIC_CONTROL
+    BlueGreenDeploymentConfiguration:
+      DeploymentReadyOption:
+        ActionOnTimeout: CONTINUE_DEPLOYMENT
+        WaitTimeInMinutes: 0
+      TerminateBlueInstancesOnDeploymentSuccess:
+        Action: TERMINATE
+        TerminationWaitTimeInMinutes: 5
+```
+
+### 3. Health Checks
+
+```python
+# Application health endpoint
+from flask import Flask, jsonify
+import boto3
+
+app = Flask(__name__)
+
+@app.route('/health')
+def health():
+    checks = {
+        'database': check_database(),
+        'cache': check_cache(),
+        'external_api': check_external_api()
+    }
+
+    status = 'healthy' if all(checks.values()) else 'unhealthy'
+    code = 200 if status == 'healthy' else 503
+
+    return jsonify({'status': status, 'checks': checks}), code
+
+def check_database():
+    try:
+        # Quick connectivity test
+        db.execute('SELECT 1')
+        return True
+    except Exception:
+        return False
+```
+
+### 4. Monitoring Setup
+
+```yaml
+# CloudWatch Dashboard
+Dashboard:
+  Type: AWS::CloudWatch::Dashboard
+  Properties:
+    DashboardName: production-overview
+    DashboardBody: |
+      {
+        "widgets": [
+          {
+            "type": "metric",
+            "properties": {
+              "metrics": [
+                ["AWS/Lambda", "Invocations", "FunctionName", "api-handler"],
+                [".", "Errors", ".", "."],
+                [".", "Duration", ".", ".", {"stat": "p99"}]
+              ],
+              "period": 60,
+              "title": "Lambda Metrics"
+            }
+          }
+        ]
+      }
+
+# Critical Alarms
+ErrorAlarm:
+  Type: AWS::CloudWatch::Alarm
+  Properties:
+    AlarmName: high-error-rate
+    MetricName: Errors
+    Namespace: AWS/Lambda
+    Statistic: Sum
+    Period: 60
+    EvaluationPeriods: 3
+    Threshold: 10
+    ComparisonOperator: GreaterThanThreshold
+    AlarmActions:
+      - !Ref AlertTopic
+```
+
+---
+
+## Common Pitfalls
+
+### Technical Debt
+
+| Pitfall | Solution |
+|---------|----------|
+| Over-engineering early | Start simple, scale when needed |
+| Under-monitoring | Set up CloudWatch from day one |
+| Ignoring costs | Enable Cost Explorer and billing alerts |
+| Single region only | Plan for multi-region from start |
+
+### Security Mistakes
+
+| Mistake | Prevention |
+|---------|------------|
+| Public S3 buckets | Block public access, use bucket policies |
+| Overly permissive IAM | Avoid wildcard (`*`) actions and resources; scope policies to specific ARNs |
+| Hardcoded credentials | Use Secrets Manager, IAM roles |
+| Unencrypted data | Enable encryption by default |
+
+### Performance Issues
+
+| Issue | Solution |
+|-------|----------|
+| No caching | Add CloudFront, ElastiCache early |
+| Inefficient queries | Use indexes, avoid DynamoDB scans |
+| Large Lambda packages | Use layers, minimize dependencies |
+| N+1 queries | Implement DataLoader, batch operations |
+
+### Cost Surprises
+
+| Surprise | Prevention |
+|----------|------------|
+| Undeleted resources | Tag everything, review weekly |
+| Data transfer costs | Keep traffic in same AZ/region |
+| NAT Gateway charges | Use VPC endpoints for AWS services |
+| Log accumulation | Set CloudWatch retention policies |
--- a/engineering-team/aws-solution-architect/references/service_selection.md
+++ b/engineering-team/aws-solution-architect/references/service_selection.md
@@ -0,0 +1,484 @@
+# AWS Service Selection Guide
+
+Quick reference for choosing the right AWS service based on requirements.
+
+---
+
+## Table of Contents
+
+- [Compute Services](#compute-services)
+- [Database Services](#database-services)
+- [Storage Services](#storage-services)
+- [Messaging and Events](#messaging-and-events)
+- [API and Integration](#api-and-integration)
+- [Networking](#networking)
+- [Security and Identity](#security-and-identity)
+
+---
+
+## Compute Services
+
+### Decision Matrix
+
+| Requirement | Recommended Service |
+|-------------|---------------------|
+| Event-driven, short tasks (<15 min) | Lambda |
+| Containerized apps, predictable traffic | ECS Fargate |
+| Custom configs, GPU/FPGA | EC2 |
+| Simple container from source | App Runner |
+| Kubernetes workloads | EKS |
+| Batch processing | AWS Batch |
+
+### Lambda
+
+**Best for:** Event-driven functions, API backends, scheduled tasks
+
+```
+Limits:
+- Execution: 15 minutes max
+- Memory: 128 MB - 10 GB
+- Package: 50 MB zipped (250 MB unzipped), 10 GB (container image)
+- Concurrency: 1000 default (soft limit)
+
+Pricing: $0.20 per 1M requests + compute time
+```
+
+**Use when:**
+- Variable/unpredictable traffic
+- Pay-per-use is important
+- No server management desired
+- Short-duration operations
+
+**Avoid when:**
+- Long-running processes (>15 min)
+- Strict low-latency requirements (<50 ms) where cold starts are unacceptable
+- Heavy compute (consider Fargate)
+
+### ECS Fargate
+
+**Best for:** Containerized applications, microservices
+
+```
+Limits:
+- vCPU: 0.25 - 16
+- Memory: 0.5 GB - 120 GB
+- Storage: 20 GB - 200 GB ephemeral
+
+Pricing: Per vCPU-hour + GB-hour
+```
+
+**Use when:**
+- Containerized applications
+- Predictable traffic patterns
+- Long-running processes
+- Need more control than Lambda
+
+### EC2
+
+**Best for:** Custom configurations, specialized hardware
+
+```
+Instance Types:
+- General: t3, m6i
+- Compute: c6i
+- Memory: r6i
+- GPU: p4d, g5
+- Storage: i3, d3
+```
+
+**Use when:**
+- Need GPU/FPGA
+- Windows applications
+- Specific instance configurations
+- Reserved capacity makes sense
+
+---
+
+## Database Services
+
+### Decision Matrix
+
+| Data Type | Query Pattern | Scale | Recommended |
+|-----------|--------------|-------|-------------|
+| Key-value | Simple lookups | Any | DynamoDB |
+| Document | Flexible queries | <1TB | DocumentDB |
+| Relational | Complex joins | Variable | Aurora Serverless |
+| Relational | High volume | Fixed | Aurora Standard |
+| Time-series | Time-based | Any | Timestream |
+| Graph | Relationships | Any | Neptune |
+
+### DynamoDB
+
+**Best for:** Key-value and document data, serverless applications
+
+```
+Limits:
+- Item size: 400 KB max
+- Partition key: 2048 bytes
+- Sort key: 1024 bytes
+- GSI: 20 per table
+
+Pricing:
+- On-demand: $0.625 per million writes, $0.125 per million reads (us-east-1, after the November 2024 price cut)
+- Provisioned: Per RCU/WCU
+```
+
+**Data Modeling Example:**
+
+```
+# Single-table design for e-commerce
+PK                  SK                  Attributes
+USER#123            PROFILE             {name, email, ...}
+USER#123            ORDER#456           {total, status, ...}
+USER#123            ORDER#456#ITEM#1    {product, qty, ...}
+PRODUCT#789         METADATA            {name, price, ...}
+```
+
+### Aurora
+
+**Best for:** Relational data with complex queries
+
+| Edition | Use Case | Scaling |
+|---------|----------|---------|
+| Aurora Serverless v2 | Variable workloads | 0-256 ACUs, auto (range depends on engine version) |
+| Aurora Standard | Predictable workloads | Instance-based |
+| Aurora Global | Multi-region | Cross-region replication |
+
+```
+Limits:
+- Storage: 128 TB max
+- Replicas: 15 read replicas
+- Connections: Instance-dependent
+
+Pricing:
+- Serverless: $0.12 per ACU-hour
+- Standard: Instance + storage + I/O
+```
+
+### Comparison: DynamoDB vs Aurora
+
+| Factor | DynamoDB | Aurora |
+|--------|----------|--------|
+| Query flexibility | Limited (key-based) | Full SQL |
+| Scaling | Near-instant, virtually unlimited | Minutes, up to instance limits |
+| Consistency | Eventual or strong reads; ACID transactions available | ACID |
+| Cost model | Per-request | Per-hour |
+| Operational | Zero management | Some management |
+
+---
+
+## Storage Services
+
+### S3 Storage Classes
+
+| Class | Access Pattern | Retrieval | Cost (GB/mo) |
+|-------|---------------|-----------|--------------|
+| Standard | Frequent | Instant | $0.023 |
+| Intelligent-Tiering | Unknown | Instant | $0.023 + monitoring |
+| Standard-IA | Infrequent (30+ days) | Instant | $0.0125 |
+| One Zone-IA | Infrequent, single AZ | Instant | $0.01 |
+| Glacier Instant | Archive, instant access | Instant | $0.004 |
+| Glacier Flexible | Archive | Minutes-hours | $0.0036 |
+| Glacier Deep Archive | Long-term archive | 12-48 hours | $0.00099 |
+
+### Lifecycle Policy Example
+
+```json
+{
+  "Rules": [
+    {
+      "ID": "Archive old data", "Status": "Enabled",
+      "Filter": {"Prefix": ""},
+      "Transitions": [
+        {
+          "Days": 30,
+          "StorageClass": "STANDARD_IA"
+        },
+        {
+          "Days": 90,
+          "StorageClass": "GLACIER"
+        },
+        {
+          "Days": 365,
+          "StorageClass": "DEEP_ARCHIVE"
+        }
+      ],
+      "Expiration": {
+        "Days": 2555
+      }
+    }
+  ]
+}
+```
+
+### Block and File Storage
+
+| Service | Use Case | Access |
+|---------|----------|--------|
+| EBS | EC2 block storage | Single instance |
+| EFS | Shared file system | Multiple instances |
+| FSx for Lustre | HPC workloads | High throughput |
+| FSx for Windows | Windows apps | SMB protocol |
+
+---
+
+## Messaging and Events
+
+### Decision Matrix
+
+| Pattern | Service | Use Case |
+|---------|---------|----------|
+| Event routing | EventBridge | Microservices, SaaS integration |
+| Pub/sub | SNS | Fan-out notifications |
+| Queue | SQS | Decoupling, buffering |
+| Streaming | Kinesis | Real-time analytics |
+| Message broker | Amazon MQ | Legacy migrations |
+
+### EventBridge
+
+**Best for:** Event-driven architectures, SaaS integration
+
+```python
+# EventBridge rule pattern (serialize with json.dumps for put_rule's EventPattern)
+event_pattern = {
+    "source": ["orders.service"],
+    "detail-type": ["OrderCreated"],
+    "detail": {
+        "total": [{"numeric": [">=", 100]}]
+    }
+}
+```
+
+### SQS
+
+**Best for:** Decoupling services, handling load spikes
+
+| Feature | Standard | FIFO |
+|---------|----------|------|
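+| Throughput | Nearly unlimited | 300 API calls/sec (3,000 msg/sec with batching); higher in high-throughput mode |
+| Ordering | Best effort | Guaranteed (per message group) |
+| Delivery | At least once | Exactly-once processing |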
+| Deduplication | No | Yes |
+
+```python
+# SQS with dead letter queue
+import boto3
+
+sqs = boto3.client('sqs')
+
+def process_with_dlq(queue_url, dlq_url, max_retries=3):
+    response = sqs.receive_message(
+        QueueUrl=queue_url,
+        MaxNumberOfMessages=10,
+        WaitTimeSeconds=20,
+        AttributeNames=['ApproximateReceiveCount']
+    )
+
+    for message in response.get('Messages', []):
+        receive_count = int(message['Attributes']['ApproximateReceiveCount'])
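+        # On failure below max_retries, the message returns after the visibility timeout
+        try:
+            process(message)  # placeholder for your message handler
+            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message['ReceiptHandle'])
+        except Exception:
+            if receive_count >= max_retries: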
+                sqs.send_message(QueueUrl=dlq_url, MessageBody=message['Body'])
+                sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message['ReceiptHandle'])
+```
+
+### Kinesis
+
+**Best for:** Real-time streaming data, analytics
+
+| Service | Use Case |
+|---------|----------|
+| Data Streams | Custom processing |
+| Data Firehose | Direct to S3/Redshift |
+| Managed Service for Apache Flink (formerly Data Analytics) | Stream processing with Flink |
+| Video Streams | Video ingestion |
+
+---
+
+## API and Integration
+
+### API Gateway vs AppSync
+
+| Factor | API Gateway | AppSync |
+|--------|-------------|---------|
+| Protocol | REST, WebSocket | GraphQL |
+| Real-time | WebSocket setup | Built-in subscriptions |
+| Caching | Response caching | Field-level caching |
+| Integration | Lambda, HTTP, AWS | Lambda, DynamoDB, HTTP |
+| Pricing | Per request | Per request + data |
+
+### API Gateway Configuration
+
+```yaml
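+# Throttling and caching (fragment: the stage also needs RestApiId, a deployment, and CacheClusterEnabled: true for caching to take effect)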
+Resources:
+  ApiGateway:
+    Type: AWS::ApiGateway::RestApi
+    Properties:
+      Name: my-api
+
+  ApiStage:
+    Type: AWS::ApiGateway::Stage
+    Properties:
+      StageName: prod
+      MethodSettings:
+        - HttpMethod: "*"
+          ResourcePath: "/*"
+          ThrottlingBurstLimit: 500
+          ThrottlingRateLimit: 1000
+          CachingEnabled: true
+          CacheTtlInSeconds: 300
+```
+
+### Step Functions
+
+**Best for:** Workflow orchestration, long-running processes
+
+```json
+{
+  "StartAt": "ProcessOrder",
+  "States": {
+    "ProcessOrder": {
+      "Type": "Task",
+      "Resource": "arn:aws:lambda:...:processOrder",
+      "Next": "CheckInventory"
+    },
+    "CheckInventory": {
+      "Type": "Choice",
+      "Choices": [
+        {
+          "Variable": "$.inStock",
+          "BooleanEquals": true,
+          "Next": "ShipOrder"
+        }
+      ],
+      "Default": "BackOrder"
+    },
+    "ShipOrder": {
+      "Type": "Task",
+      "Resource": "arn:aws:lambda:...:shipOrder",
+      "End": true
+    },
+    "BackOrder": {
+      "Type": "Task",
+      "Resource": "arn:aws:lambda:...:backOrder",
+      "End": true
+    }
+  }
+}
+```
+
+---
+
+## Networking
+
+### VPC Components
+
+| Component | Purpose |
+|-----------|---------|
+| VPC | Isolated network |
+| Subnet | Network segment (public/private) |
+| Internet Gateway | Public internet access |
+| NAT Gateway | Private subnet outbound |
+| VPC Endpoint | Private AWS service access |
+| Transit Gateway | VPC interconnection |
+
+### VPC Design Pattern
+
+```
+VPC: 10.0.0.0/16
+
+Public Subnets (AZ a, b, c):
+  10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/24
+  - ALB, NAT Gateway, Bastion
+
+Private Subnets (AZ a, b, c):
+  10.0.11.0/24, 10.0.12.0/24, 10.0.13.0/24
+  - Application servers, Lambda
+
+Database Subnets (AZ a, b, c):
+  10.0.21.0/24, 10.0.22.0/24, 10.0.23.0/24
+  - RDS, ElastiCache
+```
+
+### VPC Endpoints (Cost Savings)
+
+```yaml
+# Interface endpoint for Secrets Manager
+SecretsManagerEndpoint:
+  Type: AWS::EC2::VPCEndpoint
+  Properties:
+    VpcId: !Ref VPC
+    ServiceName: !Sub com.amazonaws.${AWS::Region}.secretsmanager
+    VpcEndpointType: Interface
+    SubnetIds: !Ref PrivateSubnets
+    SecurityGroupIds:
+      - !Ref EndpointSecurityGroup
+```
+
+---
+
+## Security and Identity
+
+### IAM Best Practices
+
+```json
+// Least privilege policy example
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Effect": "Allow",
+      "Action": [
+        "dynamodb:GetItem",
+        "dynamodb:PutItem",
+        "dynamodb:Query"
+      ],
+      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/users",
+      "Condition": {
+        "ForAllValues:StringEquals": {
+          "dynamodb:LeadingKeys": ["${aws:userid}"]
+        }
+      }
+    }
+  ]
+}
+```
+
+### Secrets Manager vs Parameter Store
+
+| Factor | Secrets Manager | Parameter Store |
+|--------|-----------------|-----------------|
+| Auto-rotation | Built-in | Manual |
+| Cross-account | Yes | Limited |
+| Pricing | $0.40/secret/month | Free (standard) |
+| Use case | Credentials, API keys | Config, low-churn secrets (SecureString) |
+
+### Cognito Configuration
+
+```yaml
+UserPool:
+  Type: AWS::Cognito::UserPool
+  Properties:
+    UserPoolName: my-app-users
+    AutoVerifiedAttributes:
+      - email
+    MfaConfiguration: OPTIONAL
+    EnabledMfas:
+      - SOFTWARE_TOKEN_MFA
+    Policies:
+      PasswordPolicy:
+        MinimumLength: 12
+        RequireLowercase: true
+        RequireUppercase: true
+        RequireNumbers: true
+        RequireSymbols: true
+    AccountRecoverySetting:
+      RecoveryMechanisms:
+        - Name: verified_email
+          Priority: 1
+```
--- a/engineering-team/aws-solution-architect/scripts/architecture_designer.py
+++ b/engineering-team/aws-solution-architect/scripts/architecture_designer.py
--- a/engineering-team/aws-solution-architect/scripts/cost_optimizer.py
+++ b/engineering-team/aws-solution-architect/scripts/cost_optimizer.py
--- a/engineering-team/aws-solution-architect/scripts/serverless_stack.py
+++ b/engineering-team/aws-solution-architect/scripts/serverless_stack.py