diff --git a/.codex/skills-index.json b/.codex/skills-index.json
index 533ee3c..009e375 100644
--- a/.codex/skills-index.json
+++ b/.codex/skills-index.json
@@ -21,7 +21,7 @@
       "name": "aws-solution-architect",
       "source": "../../engineering-team/aws-solution-architect",
       "category": "engineering",
-      "description": "Expert AWS solution architecture for startups focusing on serverless, scalable, and cost-effective cloud infrastructure with modern DevOps practices and infrastructure-as-code"
+      "description": "Design AWS architectures for startups using serverless patterns and IaC templates. Use when asked to design serverless architecture, create CloudFormation templates, optimize AWS costs, set up CI/CD pipelines, or migrate to AWS. Covers Lambda, API Gateway, DynamoDB, ECS, Aurora, and cost optimization."
     },
     {
       "name": "code-reviewer",
diff --git a/engineering-team/aws-solution-architect/HOW_TO_USE.md b/engineering-team/aws-solution-architect/HOW_TO_USE.md
deleted file mode 100644
index 59dbb9f..0000000
--- a/engineering-team/aws-solution-architect/HOW_TO_USE.md
+++ /dev/null
@@ -1,308 +0,0 @@
-# How to Use This Skill
-
-Hey Claude—I just added the "aws-solution-architect" skill. Can you design a scalable serverless architecture for my startup?
-
-## Example Invocations
-
-**Example 1: Serverless Web Application**
-```
-Hey Claude—I just added the "aws-solution-architect" skill. Can you design a serverless architecture for a SaaS platform with 10k users, including API, database, and authentication?
-```
-
-**Example 2: Microservices Architecture**
-```
-Hey Claude—I just added the "aws-solution-architect" skill. Can you design an event-driven microservices architecture using Lambda, EventBridge, and DynamoDB for an e-commerce platform?
-```
-
-**Example 3: Cost Optimization**
-```
-Hey Claude—I just added the "aws-solution-architect" skill. Can you analyze my current AWS setup and recommend ways to reduce costs by 30%? I'm currently spending $2000/month.
-```
-
-**Example 4: Infrastructure as Code**
-```
-Hey Claude—I just added the "aws-solution-architect" skill. Can you generate a CloudFormation template for a three-tier web application with auto-scaling and RDS?
-```
-
-**Example 5: Mobile Backend**
-```
-Hey Claude—I just added the "aws-solution-architect" skill. Can you design a scalable mobile backend using AppSync GraphQL, Cognito, and DynamoDB?
-```
-
-**Example 6: Data Pipeline**
-```
-Hey Claude—I just added the "aws-solution-architect" skill. Can you design a real-time data processing pipeline using Kinesis for analytics on IoT sensor data?
-```
-
-## What to Provide
-
-Depending on your needs, provide:
-
-### For Architecture Design:
-- **Application type**: Web app, mobile backend, data pipeline, microservices, SaaS
-- **Expected scale**: Number of users, requests per second, data volume
-- **Budget**: Monthly AWS spend limit or constraints
-- **Team context**: Team size, AWS experience level
-- **Requirements**: Authentication, real-time features, compliance needs (GDPR, HIPAA)
-- **Geographic scope**: Single region, multi-region, global
-
-### For Cost Optimization:
-- **Current monthly spend**: Total AWS bill
-- **Resource inventory**: List of EC2, RDS, S3, etc. resources
-- **Utilization metrics**: CPU, memory, storage usage
-- **Budget target**: Desired monthly spend or savings percentage
-
-### For Infrastructure as Code:
-- **Template type**: CloudFormation, CDK (TypeScript/Python), or Terraform
-- **Services needed**: Compute, database, storage, networking
-- **Environment**: dev, staging, production configurations
-
-## What You'll Get
-
-Based on your request, you'll receive:
-
-### Architecture Designs:
-- **Pattern recommendation** with service selection
-- **Architecture diagram** description (visual representation)
-- **Service configuration** details
-- **Cost estimates** with monthly breakdown
-- **Pros/cons** analysis
-- **Scaling characteristics** and limitations
-
-### Infrastructure as Code:
-- **CloudFormation templates** (YAML) - production-ready
-- **AWS CDK stacks** (TypeScript) - modern, type-safe
-- **Terraform configurations** (HCL) - multi-cloud compatible
-- **Deployment instructions** and prerequisites
-- **Security best practices** built-in
-
-### Cost Optimization:
-- **Current spend analysis** by service
-- **Specific recommendations** with savings potential
-- **Priority actions** (high/medium/low)
-- **Implementation checklist** with timelines
-- **Long-term optimization** strategies
-
-### Best Practices:
-- **Security hardening** checklist
-- **Scalability patterns** and anti-patterns
-- **Monitoring setup** recommendations
-- **Disaster recovery** procedures
-- **Compliance guidance** (GDPR, HIPAA, SOC 2)
-
-## Common Use Cases
-
-### 1. MVP/Startup Launch
-**Ask for:** "Serverless architecture for MVP with minimal costs"
-
-**You'll get:**
-- Amplify or Lambda + API Gateway + DynamoDB stack
-- Cognito authentication setup
-- S3 + CloudFront for frontend
-- Cost estimate: $20-100/month
-- Fast deployment (1-3 days)
-
-### 2. Scaling Existing Application
-**Ask for:** "Migrate from single server to scalable AWS architecture"
-
-**You'll get:**
-- Migration strategy (phased approach)
-- Modern three-tier or containerized architecture
-- Load balancing and auto-scaling configuration
-- Database migration plan (DMS)
-- Zero-downtime deployment strategy
-
-### 3. Cost Reduction
-**Ask for:** "Analyze and optimize my $5000/month AWS bill"
-
-**You'll get:**
-- Service-by-service cost breakdown
-- Right-sizing recommendations
-- Savings Plans/Reserved Instance opportunities
-- Storage lifecycle optimizations
-- Estimated savings: 20-40%
-
-### 4. Compliance Requirements
-**Ask for:** "HIPAA-compliant architecture for healthcare application"
-
-**You'll get:**
-- Compliant service selection (BAA-eligible only)
-- Encryption configuration (at rest and in transit)
-- Audit logging setup (CloudTrail, Config)
-- Network isolation (VPC private subnets)
-- Access control (IAM policies)
-
-### 5. Global Deployment
-**Ask for:** "Multi-region architecture for global users"
-
-**You'll get:**
-- Route 53 geolocation routing
-- DynamoDB Global Tables or Aurora Global
-- CloudFront edge caching
-- Disaster recovery and failover
-- Cross-region cost considerations
-
-## Prerequisites
-
-### For Using Generated Templates:
-
-**AWS Account**:
-- Active AWS account with appropriate permissions
-- IAM user or role with admin access (for initial setup)
-- Billing alerts enabled
-
-**Tools Required**:
-```bash
-# AWS CLI
-brew install awscli  # macOS
-aws configure
-
-# For CloudFormation
-# (AWS CLI includes CloudFormation)
-
-# For AWS CDK
-npm install -g aws-cdk
-cdk --version
-
-# For Terraform
-brew install terraform  # macOS
-terraform --version
-```
-
-**Knowledge**:
-- Basic AWS concepts (VPC, IAM, EC2, S3)
-- Command line proficiency
-- Git for version control
-
-## Deployment Steps
-
-### CloudFormation:
-```bash
-# Validate template
-aws cloudformation validate-template --template-body file://template.yaml
-
-# Deploy stack
-aws cloudformation create-stack \
-  --stack-name my-app-stack \
-  --template-body file://template.yaml \
-  --parameters ParameterKey=Environment,ParameterValue=dev \
-  --capabilities CAPABILITY_IAM
-
-# Monitor deployment
-aws cloudformation describe-stacks --stack-name my-app-stack
-```
-
-### AWS CDK:
-```bash
-# Initialize project
-cdk init app --language=typescript
-
-# Install dependencies
-npm install
-
-# Deploy stack
-cdk deploy
-
-# View outputs
-cdk outputs
-```
-
-### Terraform:
-```bash
-# Initialize
-terraform init
-
-# Plan deployment
-terraform plan
-
-# Apply changes
-terraform apply
-
-# View outputs
-terraform output
-```
-
-## Best Practices Tips
-
-### 1. Start Small, Scale Gradually
-- Begin with serverless to minimize costs
-- Add managed services as you grow
-- Avoid over-engineering for hypothetical scale
-
-### 2. Enable Monitoring from Day One
-- Set up CloudWatch dashboards
-- Configure alarms for critical metrics
-- Enable AWS Cost Explorer
-- Create budget alerts
-
-### 3. Infrastructure as Code Always
-- Version control all infrastructure
-- Use separate accounts for dev/staging/prod
-- Implement CI/CD for infrastructure changes
-- Document architecture decisions
-
-### 4. Security First
-- Enable MFA on root and admin accounts
-- Use IAM roles, never long-term credentials
-- Encrypt everything (S3, RDS, EBS)
-- Regular security audits (AWS Security Hub)
-
-### 5. Cost Management
-- Tag all resources for cost allocation
-- Review bills weekly
-- Delete unused resources promptly
-- Use Savings Plans for predictable workloads
-
-## Troubleshooting
-
-### Common Issues:
-
-**"Access Denied" errors:**
-- Check IAM permissions for your user/role
-- Ensure service-linked roles exist
-- Verify resource policies (S3, KMS)
-
-**High costs unexpectedly:**
-- Check for undeleted resources (EC2, RDS snapshots)
-- Review NAT Gateway data transfer
-- Check CloudWatch Logs retention
-- Look for unauthorized usage
-
-**Deployment failures:**
-- Validate templates before deploying
-- Check service quotas (limits)
-- Verify VPC/subnet configuration
-- Review CloudFormation/Terraform error messages
-
-**Performance issues:**
-- Enable CloudWatch metrics and X-Ray
-- Check database connection pooling
-- Review Lambda cold starts (use provisioned concurrency)
-- Optimize database queries and indexes
-
-## Additional Resources
-
-- **AWS Well-Architected Framework**: https://aws.amazon.com/architecture/well-architected/
-- **AWS Architecture Center**: https://aws.amazon.com/architecture/
-- **Serverless Land**: https://serverlessland.com/
-- **AWS Pricing Calculator**: https://calculator.aws/
-- **AWS Free Tier**: https://aws.amazon.com/free/
-- **AWS Startups**: https://aws.amazon.com/startups/
-
-## Tips for Best Results
-
-1. **Be specific** about scale and budget constraints
-2. **Mention team experience** level with AWS
-3. **State compliance requirements** upfront (GDPR, HIPAA, etc.)
-4. **Describe current setup** if migrating from existing infrastructure
-5. **Ask for alternatives** if you need options to compare
-6. **Request explanations** for WHY certain services are recommended
-7. **Specify IaC preference** (CloudFormation, CDK, or Terraform)
-
-## Support
-
-For AWS-specific questions:
-- AWS Support Plans (Developer, Business, Enterprise)
-- AWS re:Post community forum
-- AWS Documentation: https://docs.aws.amazon.com/
-- AWS Training: https://aws.amazon.com/training/
diff --git a/engineering-team/aws-solution-architect/SKILL.md b/engineering-team/aws-solution-architect/SKILL.md
index d4b3933..1fc1953 100644
--- a/engineering-team/aws-solution-architect/SKILL.md
+++ b/engineering-team/aws-solution-architect/SKILL.md
@@ -1,344 +1,306 @@
 ---
 name: aws-solution-architect
-description: Expert AWS solution architecture for startups focusing on serverless, scalable, and cost-effective cloud infrastructure with modern DevOps practices and infrastructure-as-code
+description: Design AWS architectures for startups using serverless patterns and IaC templates. Use when asked to design serverless architecture, create CloudFormation templates, optimize AWS costs, set up CI/CD pipelines, or migrate to AWS. Covers Lambda, API Gateway, DynamoDB, ECS, Aurora, and cost optimization.
 ---
 
-# AWS Solution Architect for Startups
+# AWS Solution Architect
 
-This skill provides comprehensive AWS architecture design expertise for startup companies, emphasizing serverless technologies, scalability, cost optimization, and modern cloud-native patterns.
+Design scalable, cost-effective AWS architectures for startups with infrastructure-as-code templates.
-## Capabilities
+---
 
-- **Serverless Architecture Design**: Lambda, API Gateway, DynamoDB, EventBridge, Step Functions, AppSync
-- **Infrastructure as Code**: CloudFormation, CDK (Cloud Development Kit), Terraform templates
-- **Scalable Application Architecture**: Auto-scaling, load balancing, multi-region deployment
-- **Data & Storage Solutions**: S3, RDS Aurora Serverless, DynamoDB, ElastiCache, Neptune
-- **Event-Driven Architecture**: EventBridge, SNS, SQS, Kinesis, Lambda triggers
-- **API Design**: API Gateway (REST & WebSocket), AppSync (GraphQL), rate limiting, authentication
-- **Authentication & Authorization**: Cognito, IAM, fine-grained access control, federated identity
-- **CI/CD Pipelines**: CodePipeline, CodeBuild, CodeDeploy, GitHub Actions integration
-- **Monitoring & Observability**: CloudWatch, X-Ray, CloudTrail, alarms, dashboards
-- **Cost Optimization**: Reserved instances, Savings Plans, right-sizing, budget alerts
-- **Security Best Practices**: VPC design, security groups, WAF, Secrets Manager, encryption
-- **Microservices Patterns**: Service mesh, API composition, saga patterns, CQRS
-- **Container Orchestration**: ECS Fargate, EKS (Kubernetes), App Runner
-- **Content Delivery**: CloudFront, edge locations, origin shield, caching strategies
-- **Database Migration**: DMS, schema conversion, zero-downtime migrations
+## Table of Contents
+
+- [Trigger Terms](#trigger-terms)
+- [Workflow](#workflow)
+- [Tools](#tools)
+- [Quick Start](#quick-start)
+- [Input Requirements](#input-requirements)
+- [Output Formats](#output-formats)
+
+---
+
+## Trigger Terms
+
+Use this skill when you encounter:
+
+| Category | Terms |
+|----------|-------|
+| **Architecture Design** | serverless architecture, AWS architecture, cloud design, microservices, three-tier |
+| **IaC Generation** | CloudFormation, CDK, Terraform, infrastructure as code, deploy template |
+| **Serverless** | Lambda, API Gateway, DynamoDB, Step Functions, EventBridge, AppSync |
+| **Containers** | ECS, Fargate, EKS, container orchestration, Docker on AWS |
+| **Cost Optimization** | reduce AWS costs, optimize spending, right-sizing, Savings Plans |
+| **Database** | Aurora, RDS, DynamoDB design, database migration, data modeling |
+| **Security** | IAM policies, VPC design, encryption, Cognito, WAF |
+| **CI/CD** | CodePipeline, CodeBuild, CodeDeploy, GitHub Actions AWS |
+| **Monitoring** | CloudWatch, X-Ray, observability, alarms, dashboards |
+| **Migration** | migrate to AWS, lift and shift, replatform, DMS |
+
+---
+
+## Workflow
+
+### Step 1: Gather Requirements
+
+Collect application specifications:
+
+```
+- Application type (web app, mobile backend, data pipeline, SaaS)
+- Expected users and requests per second
+- Budget constraints (monthly spend limit)
+- Team size and AWS experience level
+- Compliance requirements (GDPR, HIPAA, SOC 2)
+- Availability requirements (SLA, RPO/RTO)
+```
+
+### Step 2: Design Architecture
+
+Run the architecture designer to get pattern recommendations:
+
+```bash
+python scripts/architecture_designer.py --input requirements.json
+```
+
+Select from recommended patterns:
+- **Serverless Web**: S3 + CloudFront + API Gateway + Lambda + DynamoDB
+- **Event-Driven Microservices**: EventBridge + Lambda + SQS + Step Functions
+- **Three-Tier**: ALB + ECS Fargate + Aurora + ElastiCache
+- **GraphQL Backend**: AppSync + Lambda + DynamoDB + Cognito
+
+See `references/architecture_patterns.md` for detailed pattern specifications.
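Reviewer sketch (not part of the patch): Step 2 passes a `requirements.json` file to `architecture_designer.py`. The snippet below is one possible way to assemble and sanity-check that file before running the script. The field names are taken from the JSON format shown under Input Requirements in this same `SKILL.md`; the helper names `validate_requirements` and `write_requirements` are hypothetical, and whether the script itself performs any such validation is an assumption.

```python
import json
from pathlib import Path

# Field names follow the "JSON Format" example in SKILL.md.
# The expected types are inferred from that example, not from the script.
REQUIRED_FIELDS = {
    "application_type": str,
    "expected_users": int,
    "requests_per_second": int,
    "budget_monthly_usd": int,
    "team_size": int,
    "aws_experience": str,
    "compliance": list,
    "availability_sla": str,
}


def validate_requirements(req: dict) -> list[str]:
    """Return a list of problems; an empty list means the dict looks usable."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in req:
            problems.append(f"missing field: {field}")
        elif not isinstance(req[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems


def write_requirements(req: dict, path: str = "requirements.json") -> Path:
    """Validate the dict, then write it as JSON for the --input flag."""
    problems = validate_requirements(req)
    if problems:
        raise ValueError("; ".join(problems))
    out = Path(path)
    out.write_text(json.dumps(req, indent=2))
    return out


# Same values as the JSON example in SKILL.md.
example = {
    "application_type": "saas_platform",
    "expected_users": 10000,
    "requests_per_second": 100,
    "budget_monthly_usd": 500,
    "team_size": 3,
    "aws_experience": "intermediate",
    "compliance": ["SOC2"],
    "availability_sla": "99.9%",
}
```

Calling `write_requirements(example)` would produce a file suitable for the `--input requirements.json` argument shown in Step 2, assuming the script reads exactly these keys.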
+
+### Step 3: Generate IaC Templates
+
+Create infrastructure-as-code for the selected pattern:
+
+```bash
+# Serverless stack (CloudFormation)
+python scripts/serverless_stack.py --app-name my-app --region us-east-1
+
+# Output: CloudFormation YAML template ready to deploy
+```
+
+### Step 4: Review Costs
+
+Analyze estimated costs and optimization opportunities:
+
+```bash
+python scripts/cost_optimizer.py --resources current_setup.json --monthly-spend 2000
+```
+
+Output includes:
+- Monthly cost breakdown by service
+- Right-sizing recommendations
+- Savings Plans opportunities
+- Potential monthly savings
+
+### Step 5: Deploy
+
+Deploy the generated infrastructure:
+
+```bash
+# CloudFormation
+aws cloudformation create-stack \
+  --stack-name my-app-stack \
+  --template-body file://template.yaml \
+  --capabilities CAPABILITY_IAM
+
+# CDK
+cdk deploy
+
+# Terraform
+terraform init && terraform apply
+```
+
+### Step 6: Validate
+
+Verify deployment and set up monitoring:
+
+```bash
+# Check stack status
+aws cloudformation describe-stacks --stack-name my-app-stack
+
+# Set up CloudWatch alarms
+aws cloudwatch put-metric-alarm --alarm-name high-errors ...
+```
+
+---
+
+## Tools
+
+### architecture_designer.py
+
+Generates architecture patterns based on requirements.
+
+```bash
+python scripts/architecture_designer.py --input requirements.json --output design.json
+```
+
+**Input:** JSON with app type, scale, budget, compliance needs
+**Output:** Recommended pattern, service stack, cost estimate, pros/cons
+
+### serverless_stack.py
+
+Creates serverless CloudFormation templates.
+
+```bash
+python scripts/serverless_stack.py --app-name my-app --region us-east-1
+```
+
+**Output:** Production-ready CloudFormation YAML with:
+- API Gateway + Lambda
+- DynamoDB table
+- Cognito user pool
+- IAM roles with least privilege
+- CloudWatch logging
+
+### cost_optimizer.py
+
+Analyzes costs and recommends optimizations.
+
+```bash
+python scripts/cost_optimizer.py --resources inventory.json --monthly-spend 5000
+```
+
+**Output:** Recommendations for:
+- Idle resource removal
+- Instance right-sizing
+- Reserved capacity purchases
+- Storage tier transitions
+- NAT Gateway alternatives
+
+---
+
+## Quick Start
+
+### MVP Architecture (< $100/month)
+
+```
+Ask: "Design a serverless MVP backend for a mobile app with 1000 users"
+
+Result:
+- Lambda + API Gateway for API
+- DynamoDB pay-per-request for data
+- Cognito for authentication
+- S3 + CloudFront for static assets
+- Estimated: $20-50/month
+```
+
+### Scaling Architecture ($500-2000/month)
+
+```
+Ask: "Design a scalable architecture for a SaaS platform with 50k users"
+
+Result:
+- ECS Fargate for containerized API
+- Aurora Serverless for relational data
+- ElastiCache for session caching
+- CloudFront for CDN
+- CodePipeline for CI/CD
+- Multi-AZ deployment
+```
+
+### Cost Optimization
+
+```
+Ask: "Optimize my AWS setup to reduce costs by 30%. Current spend: $3000/month"
+
+Provide: Current resource inventory (EC2, RDS, S3, etc.)
+
+Result:
+- Idle resource identification
+- Right-sizing recommendations
+- Savings Plans analysis
+- Storage lifecycle policies
+- Target savings: $900/month
+```
+
+### IaC Generation
+
+```
+Ask: "Generate CloudFormation for a three-tier web app with auto-scaling"
+
+Result:
+- VPC with public/private subnets
+- ALB with HTTPS
+- ECS Fargate with auto-scaling
+- Aurora with read replicas
+- Security groups and IAM roles
+```
+
+---
 
 ## Input Requirements
 
-Architecture design requires:
-- **Application type**: Web app, mobile backend, data pipeline, microservices, SaaS platform
-- **Traffic expectations**: Users/day, requests/second, geographic distribution
-- **Data requirements**: Storage needs, database type, backup/retention policies
-- **Budget constraints**: Monthly spend limits, cost optimization priorities
-- **Team size & expertise**: Developer count, AWS experience level, DevOps maturity
-- **Compliance needs**: GDPR, HIPAA, SOC 2, PCI-DSS, data residency
-- **Availability requirements**: SLA targets, uptime goals, disaster recovery RPO/RTO
+Provide these details for architecture design:
 
-Formats accepted:
-- Text description of application requirements
-- JSON with structured architecture specifications
-- Existing architecture diagrams or documentation
-- Current AWS resource inventory (for optimization)
+| Requirement | Description | Example |
+|-------------|-------------|---------|
+| Application type | What you're building | SaaS platform, mobile backend |
+| Expected scale | Users, requests/sec | 10k users, 100 RPS |
+| Budget | Monthly AWS limit | $500/month max |
+| Team context | Size, AWS experience | 3 devs, intermediate |
+| Compliance | Regulatory needs | HIPAA, GDPR, SOC 2 |
+| Availability | Uptime requirements | 99.9% SLA, 1hr RPO |
+
+**JSON Format:**
+
+```json
+{
+  "application_type": "saas_platform",
+  "expected_users": 10000,
+  "requests_per_second": 100,
+  "budget_monthly_usd": 500,
+  "team_size": 3,
+  "aws_experience": "intermediate",
+  "compliance": ["SOC2"],
+  "availability_sla": "99.9%"
+}
+```
+
+---
 
 ## Output Formats
 
-Results include:
-- **Architecture diagrams**: Visual representations using draw.io or Lucidchart format
-- **CloudFormation/CDK templates**: Infrastructure as Code (IaC) ready to deploy
-- **Terraform configurations**: Multi-cloud compatible infrastructure definitions
-- **Cost estimates**: Detailed monthly cost breakdown with optimization suggestions
-- **Security assessment**: Best practices checklist, compliance validation
-- **Deployment guides**: Step-by-step implementation instructions
-- **Runbooks**: Operational procedures, troubleshooting guides, disaster recovery plans
-- **Migration strategies**: Phased migration plans, rollback procedures
+### Architecture Design
 
-## How to Use
+- Pattern recommendation with rationale
+- Service stack diagram (ASCII)
+- Configuration specifications
+- Monthly cost estimate
+- Scaling characteristics
+- Trade-offs and limitations
 
-"Design a serverless API backend for a mobile app with 100k users using Lambda and DynamoDB"
-"Create a cost-optimized architecture for a SaaS platform with multi-tenancy"
-"Generate CloudFormation template for a three-tier web application with auto-scaling"
-"Design event-driven microservices architecture using EventBridge and Step Functions"
-"Optimize my current AWS setup to reduce costs by 30%"
+### IaC Templates
 
-## Scripts
+- **CloudFormation YAML**: Production-ready SAM/CFN templates
+- **CDK TypeScript**: Type-safe infrastructure code
+- **Terraform HCL**: Multi-cloud compatible configs
 
-- `architecture_designer.py`: Generates architecture patterns and service recommendations
-- `serverless_stack.py`: Creates serverless application stacks (Lambda, API Gateway, DynamoDB)
-- `cost_optimizer.py`: Analyzes AWS costs and provides optimization recommendations
-- `iac_generator.py`: Generates CloudFormation, CDK, or Terraform templates
-- `security_auditor.py`: AWS security best practices validation and compliance checks
+### Cost Analysis
 
-## Architecture Patterns
+- Current spend breakdown
+- Optimization recommendations with savings
+- Priority action list (high/medium/low)
+- Implementation checklist
 
-### 1. Serverless Web Application
-**Use Case**: SaaS platforms, mobile backends, low-traffic websites
+---
 
-**Stack**:
-- **Frontend**: S3 + CloudFront (static hosting)
-- **API**: API Gateway + Lambda
-- **Database**: DynamoDB or Aurora Serverless
-- **Auth**: Cognito
-- **CI/CD**: Amplify or CodePipeline
+## Reference Documentation
 
-**Benefits**: Zero server management, pay-per-use, auto-scaling, low operational overhead
+| Document | Contents |
+|----------|----------|
+| `references/architecture_patterns.md` | 6 patterns: serverless, microservices, three-tier, data processing, GraphQL, multi-region |
+| `references/service_selection.md` | Decision matrices for compute, database, storage, messaging |
+| `references/best_practices.md` | Serverless design, cost optimization, security hardening, scalability |
 
-**Cost**: $50-500/month for small to medium traffic
-
-### 2. Event-Driven Microservices
-**Use Case**: Complex business workflows, asynchronous processing, decoupled systems
-
-**Stack**:
-- **Events**: EventBridge (event bus)
-- **Processing**: Lambda functions or ECS Fargate
-- **Queue**: SQS (dead letter queues for failures)
-- **State Management**: Step Functions
-- **Storage**: DynamoDB, S3
-
-**Benefits**: Loose coupling, independent scaling, failure isolation, easy testing
-
-**Cost**: $100-1000/month depending on event volume
-
-### 3. Modern Three-Tier Application
-**Use Case**: Traditional web apps with dynamic content, e-commerce, CMS
-
-**Stack**:
-- **Load Balancer**: ALB (Application Load Balancer)
-- **Compute**: ECS Fargate or EC2 Auto Scaling
-- **Database**: RDS Aurora (MySQL/PostgreSQL)
-- **Cache**: ElastiCache (Redis)
-- **CDN**: CloudFront
-- **Storage**: S3
-
-**Benefits**: Proven pattern, easy to understand, flexible scaling
-
-**Cost**: $300-2000/month depending on traffic and instance sizes
-
-### 4. Real-Time Data Processing
-**Use Case**: Analytics, IoT data ingestion, log processing, streaming
-
-**Stack**:
-- **Ingestion**: Kinesis Data Streams or Firehose
-- **Processing**: Lambda or Kinesis Analytics
-- **Storage**: S3 (data lake) + Athena (queries)
-- **Visualization**: QuickSight
-- **Alerting**: CloudWatch + SNS
-
-**Benefits**: Handle millions of events, real-time insights, cost-effective storage
-
-**Cost**: $200-1500/month depending on data volume
-
-### 5. GraphQL API Backend
-**Use Case**: Mobile apps, single-page applications, flexible data queries
-
-**Stack**:
-- **API**: AppSync (managed GraphQL)
-- **Resolvers**: Lambda or direct DynamoDB integration
-- **Database**: DynamoDB
-- **Real-time**: AppSync subscriptions (WebSocket)
-- **Auth**: Cognito or API keys
-
-**Benefits**: Single endpoint, reduce over/under-fetching, real-time subscriptions
-
-**Cost**: $50-400/month for moderate usage
-
-### 6. Multi-Region High Availability
-**Use Case**: Global applications, disaster recovery, compliance requirements
-
-**Stack**:
-- **DNS**: Route 53 (geolocation routing)
-- **CDN**: CloudFront with multiple origins
-- **Compute**: Multi-region Lambda or ECS
-- **Database**: DynamoDB Global Tables or Aurora Global Database
-- **Replication**: S3 cross-region replication
-
-**Benefits**: Low latency globally, disaster recovery, data sovereignty
-
-**Cost**: 1.5-2x single region costs
-
-## Best Practices
-
-### Serverless Design Principles
-1. **Stateless functions** - Store state in DynamoDB, S3, or ElastiCache
-2. **Idempotency** - Handle retries gracefully, use unique request IDs
-3. **Cold start optimization** - Use provisioned concurrency for critical paths, optimize package size
-4. **Timeout management** - Set appropriate timeouts, use Step Functions for long processes
-5. **Error handling** - Implement retry logic, dead letter queues, exponential backoff
-
-### Cost Optimization
-1. **Right-sizing** - Start small, monitor metrics, scale based on actual usage
-2. **Reserved capacity** - Use Savings Plans or Reserved Instances for predictable workloads
-3. **S3 lifecycle policies** - Transition to cheaper storage tiers (IA, Glacier)
-4. **Lambda memory optimization** - Test different memory settings for cost/performance balance
-5. **CloudWatch log retention** - Set appropriate retention periods (7-30 days for most)
-6. **NAT Gateway alternatives** - Use VPC endpoints, consider single NAT in dev environments
-
-### Security Hardening
-1. **Principle of least privilege** - IAM roles with minimal permissions
-2. **Encryption everywhere** - At rest (KMS) and in transit (TLS/SSL)
-3. **Network isolation** - Private subnets, security groups, NACLs
-4. **Secrets management** - Use Secrets Manager or Parameter Store, never hardcode
-5. **API protection** - WAF rules, rate limiting, API keys, OAuth2
-6. **Audit logging** - CloudTrail for API calls, VPC Flow Logs for network traffic
-
-### Scalability Design
-1. **Horizontal over vertical** - Scale out with more small instances vs. larger instances
-2. **Database sharding** - Partition data by tenant, geography, or time
-3. **Read replicas** - Offload read traffic from primary database
-4. **Caching layers** - CloudFront (edge), ElastiCache (application), DAX (DynamoDB)
-5. **Async processing** - Use queues (SQS) for non-critical operations
-6. **Auto-scaling policies** - Target tracking (CPU, requests) vs. step scaling
-
-### DevOps & Reliability
-1. **Infrastructure as Code** - Version control, peer review, automated testing
-2. **Blue/Green deployments** - Zero-downtime releases, instant rollback
-3. **Canary releases** - Test new versions with small traffic percentage
-4. **Health checks** - Application-level health endpoints, graceful degradation
-5. **Chaos engineering** - Test failure scenarios, validate recovery procedures
-6. **Monitoring & alerting** - Set up CloudWatch alarms for critical metrics
-
-## Service Selection Guide
-
-### Compute
-- **Lambda**: Event-driven, short-duration tasks (<15 min), variable traffic
-- **Fargate**: Containerized apps, long-running processes, predictable traffic
-- **EC2**: Custom configurations, GPU/FPGA needs, Windows apps
-- **App Runner**: Simple container deployment from source code
-
-### Database
-- **DynamoDB**: Key-value, document store, serverless, single-digit ms latency
-- **Aurora Serverless**: Relational DB, variable workloads, auto-scaling
-- **Aurora Standard**: High-performance relational, predictable traffic
-- **RDS**: Traditional databases (MySQL, PostgreSQL, MariaDB, SQL Server)
-- **DocumentDB**: MongoDB-compatible, document store
-- **Neptune**: Graph database for connected data
-- **Timestream**: Time-series data, IoT metrics
-
-### Storage
-- **S3 Standard**: Frequent access, low latency
-- **S3 Intelligent-Tiering**: Automatic cost optimization
-- **S3 IA (Infrequent Access)**: Backups, archives (30-day minimum)
-- **S3 Glacier**: Long-term archives, compliance
-- **EFS**: Network file system, shared storage across instances
-- **EBS**: Block storage for EC2, high IOPS
-
-### Messaging & Events
-- **EventBridge**: Event bus, loosely coupled microservices
-- **SNS**: Pub/sub, fan-out notifications
-- **SQS**: Message queuing, decoupling, buffering
-- **Kinesis**: Real-time streaming data, analytics
-- **MQ**: Managed message brokers (RabbitMQ, ActiveMQ)
-
-### API & Integration
-- **API Gateway**: REST APIs, WebSocket, throttling, caching
-- **AppSync**: GraphQL APIs, real-time subscriptions
-- **AppFlow**: SaaS integration (Salesforce, Slack, etc.)
-- **Step Functions**: Workflow orchestration, state machines
-
-## Startup-Specific Considerations
-
-### MVP (Minimum Viable Product) Architecture
-**Goal**: Launch fast, minimal infrastructure
-
-**Recommended**:
-- Amplify (full-stack deployment)
-- Lambda + API Gateway + DynamoDB
-- Cognito for auth
-- CloudFront + S3 for frontend
-
-**Cost**: $20-100/month
-**Setup time**: 1-3 days
-
-### Growth Stage (Scaling to 10k-100k users)
-**Goal**: Handle growth, maintain cost efficiency
-
-**Add**:
-- ElastiCache for caching
-- Aurora Serverless for complex queries
-- CloudWatch dashboards and alarms
-- CI/CD pipeline (CodePipeline)
-- Multi-AZ deployment
-
-**Cost**: $500-2000/month
-**Migration time**: 1-2 weeks
-
-### Scale-Up (100k+ users, Series A+)
-**Goal**: Reliability, observability, global reach
-
-**Add**:
-- Multi-region deployment
-- DynamoDB Global Tables
-- Advanced monitoring (X-Ray, third-party APM)
-- WAF and Shield for DDoS protection
-- Dedicated support plan
-- Reserved instances/Savings Plans
-
-**Cost**: $3000-10000/month
-**Migration time**: 1-3 months
-
-## Common Pitfalls to Avoid
-
-### Technical Debt
-- **Over-engineering early** - Don't build for 10M users when you have 100
-- **Under-monitoring** - Set up basic monitoring from day one
-- **Ignoring costs** - Enable Cost Explorer and billing alerts immediately
-- **Single region dependency** - Plan for multi-region from start
-
-### Security Mistakes
-- **Public S3 buckets** - Use bucket policies, block public access
-- **Overly permissive IAM** - Avoid "*" permissions, use specific resources
-- **Hardcoded credentials** - Use IAM roles, Secrets Manager
-- **Unencrypted data** - Enable encryption by default
-
-### Performance Issues
-- **No caching** - Add CloudFront, ElastiCache early
-- **Inefficient queries** - Use indexes, avoid scans in DynamoDB
-- **Large Lambda packages** - Use layers, minimize dependencies
-- **N+1 queries** - Implement DataLoader pattern, batch operations
-
-### Cost Surprises
-- **Undeleted resources** - Tag everything, review regularly
-- **Data transfer costs** - Keep traffic within same AZ/region when possible
-- **NAT Gateway charges** - Use VPC endpoints for AWS services
-- **CloudWatch Logs accumulation** - Set retention policies
-
-## Compliance & Governance
-
-### Data Residency
-- Use specific regions (eu-west-1 for GDPR)
-- Enable S3 bucket replication restrictions
-- Configure Route 53 geolocation routing
-
-### HIPAA Compliance
-- Use BAA-eligible services only
-- Enable encryption at rest and in transit
-- Implement audit logging (CloudTrail)
-- Configure VPC with private subnets
-
-### SOC 2 / ISO 27001
-- Enable AWS Config for compliance rules
-- Use AWS Audit Manager
-- Implement least privilege access
-- Regular security assessments
+---
 
 ## Limitations
 
-- **Lambda limitations**: 15-minute execution limit, 10GB memory max, cold start latency
-- **API Gateway limits**: 29-second timeout, 10MB payload size
-- **DynamoDB limits**: 400KB item size, eventually consistent reads by default
-- **Regional availability**: Not all services available in all regions
-- **Vendor lock-in**: Some serverless services are AWS-specific (consider abstraction layers)
-- **Learning curve**: Requires AWS expertise, DevOps knowledge
-- **Debugging complexity**: Distributed systems harder to troubleshoot than monoliths
-
-## Helpful Resources
-
-- **AWS Well-Architected Framework**: https://aws.amazon.com/architecture/well-architected/
-- **AWS Architecture Center**: https://aws.amazon.com/architecture/
-- **Serverless Land**: https://serverlessland.com/
-- **AWS Pricing Calculator**: https://calculator.aws/
-- **AWS Cost Explorer**: Track and analyze spending
-- **AWS Trusted Advisor**: Automated best practice checks
-- **CloudFormation Templates**: https://github.com/awslabs/aws-cloudformation-templates
-- **AWS CDK Examples**: https://github.com/aws-samples/aws-cdk-examples
+- Lambda: 15-minute execution, 10GB memory max
+- API Gateway: 29-second timeout, 10MB payload
+- DynamoDB: 400KB item size, eventually consistent by default
+- Regional availability varies by service
+- Some services have AWS-specific lock-in
diff --git a/engineering-team/aws-solution-architect/__pycache__/architecture_designer.cpython-313.pyc b/engineering-team/aws-solution-architect/__pycache__/architecture_designer.cpython-313.pyc
deleted file mode 100644
index 3e95ea1..0000000
Binary files a/engineering-team/aws-solution-architect/__pycache__/architecture_designer.cpython-313.pyc and /dev/null differ
diff --git a/engineering-team/aws-solution-architect/__pycache__/cost_optimizer.cpython-313.pyc b/engineering-team/aws-solution-architect/__pycache__/cost_optimizer.cpython-313.pyc
deleted file mode 100644
index a1f331b..0000000
Binary files a/engineering-team/aws-solution-architect/__pycache__/cost_optimizer.cpython-313.pyc and /dev/null differ
diff --git a/engineering-team/aws-solution-architect/__pycache__/serverless_stack.cpython-313.pyc b/engineering-team/aws-solution-architect/__pycache__/serverless_stack.cpython-313.pyc
deleted file mode 100644
index fd662cb..0000000
Binary files a/engineering-team/aws-solution-architect/__pycache__/serverless_stack.cpython-313.pyc and /dev/null differ
diff --git a/engineering-team/aws-solution-architect/expected_output.json b/engineering-team/aws-solution-architect/assets/expected_output.json
similarity index 100%
rename from engineering-team/aws-solution-architect/expected_output.json
rename to engineering-team/aws-solution-architect/assets/expected_output.json
diff --git a/engineering-team/aws-solution-architect/sample_input.json b/engineering-team/aws-solution-architect/assets/sample_input.json
similarity index 100%
rename from engineering-team/aws-solution-architect/sample_input.json
rename to engineering-team/aws-solution-architect/assets/sample_input.json
diff --git a/engineering-team/aws-solution-architect/references/architecture_patterns.md b/engineering-team/aws-solution-architect/references/architecture_patterns.md new file mode 100644 index 0000000..028a70a --- /dev/null +++ b/engineering-team/aws-solution-architect/references/architecture_patterns.md @@ -0,0 +1,535 @@ +# AWS Architecture Patterns for Startups + +Reference guide for selecting the right AWS architecture pattern based on application requirements. + +--- + +## Table of Contents + +- [Pattern Selection Matrix](#pattern-selection-matrix) +- [Pattern 1: Serverless Web Application](#pattern-1-serverless-web-application) +- [Pattern 2: Event-Driven Microservices](#pattern-2-event-driven-microservices) +- [Pattern 3: Modern Three-Tier Application](#pattern-3-modern-three-tier-application) +- [Pattern 4: Real-Time Data Processing](#pattern-4-real-time-data-processing) +- [Pattern 5: GraphQL API Backend](#pattern-5-graphql-api-backend) +- [Pattern 6: Multi-Region High Availability](#pattern-6-multi-region-high-availability) + +--- + +## Pattern Selection Matrix + +| Pattern | Best For | Users | Monthly Cost | Complexity | +|---------|----------|-------|--------------|------------| +| Serverless Web | MVP, SaaS, mobile backend | <50K | $50-500 | Low | +| Event-Driven Microservices | Complex workflows, async processing | Any | $100-1000 | Medium | +| Three-Tier | Traditional web, e-commerce | 10K-500K | $300-2000 | Medium | +| Real-Time Data | Analytics, IoT, streaming | Any | $200-1500 | High | +| GraphQL Backend | Mobile apps, SPAs | <100K | $50-400 | Medium | +| Multi-Region HA | Global apps, DR requirements | >100K | 1.5-2x single | High | + +--- + +## Pattern 1: Serverless Web Application + +### Use Case +SaaS platforms, mobile backends, low-traffic websites, MVPs + +### Architecture Diagram + +``` +┌─────────────┐ ┌─────────────┐ ┌─────────────┐ +│ CloudFront │────▶│ S3 │ │ Cognito │ +│ (CDN) │ │ (Static) │ │ (Auth) │ +└─────────────┘ └─────────────┘ 
└──────┬──────┘ + │ +┌─────────────┐ ┌─────────────┐ ┌──────▼──────┐ +│ Route 53 │────▶│ API Gateway │────▶│ Lambda │ +│ (DNS) │ │ (REST) │ │ (Functions) │ +└─────────────┘ └─────────────┘ └──────┬──────┘ + │ + ┌──────▼──────┐ + │ DynamoDB │ + │ (Database) │ + └─────────────┘ +``` + +### Service Stack + +| Layer | Service | Configuration | +|-------|---------|---------------| +| Frontend | S3 + CloudFront | Static hosting with HTTPS | +| API | API Gateway + Lambda | REST endpoints with throttling | +| Database | DynamoDB | Pay-per-request billing | +| Auth | Cognito | User pools with MFA support | +| CI/CD | Amplify or CodePipeline | Automated deployments | + +### CloudFormation Template + +```yaml +AWSTemplateFormatVersion: '2010-09-09' +Transform: AWS::Serverless-2016-10-31 + +Resources: + # API Function + ApiFunction: + Type: AWS::Serverless::Function + Properties: + Runtime: nodejs18.x + Handler: index.handler + MemorySize: 512 + Timeout: 10 + Events: + Api: + Type: Api + Properties: + Path: /{proxy+} + Method: ANY + + # DynamoDB Table + DataTable: + Type: AWS::DynamoDB::Table + Properties: + BillingMode: PAY_PER_REQUEST + AttributeDefinitions: + - AttributeName: PK + AttributeType: S + - AttributeName: SK + AttributeType: S + KeySchema: + - AttributeName: PK + KeyType: HASH + - AttributeName: SK + KeyType: RANGE +``` + +### Cost Breakdown (10K users) + +| Service | Monthly Cost | +|---------|-------------| +| Lambda | $5-20 | +| API Gateway | $10-30 | +| DynamoDB | $10-50 | +| CloudFront | $5-15 | +| S3 | $1-5 | +| Cognito | $0-50 | +| **Total** | **$31-170** | + +### Pros and Cons + +**Pros:** +- Zero server management +- Pay only for what you use +- Auto-scaling built-in +- Low operational overhead + +**Cons:** +- Cold start latency (100-500ms) +- 15-minute Lambda execution limit +- Vendor lock-in + +--- + +## Pattern 2: Event-Driven Microservices + +### Use Case +Complex business workflows, asynchronous processing, decoupled systems + +### Architecture 
Diagram + +``` +┌─────────────┐ ┌─────────────┐ ┌─────────────┐ +│ Service │────▶│ EventBridge │────▶│ Service │ +│ A │ │ (Event Bus)│ │ B │ +└─────────────┘ └──────┬──────┘ └─────────────┘ + │ + ┌──────▼──────┐ + │ SQS │ + │ (Queue) │ + └──────┬──────┘ + │ +┌─────────────┐ ┌──────▼──────┐ ┌─────────────┐ +│ Step │◀────│ Lambda │────▶│ DynamoDB │ +│ Functions │ │ (Processor) │ │ (Storage) │ +└─────────────┘ └─────────────┘ └─────────────┘ +``` + +### Service Stack + +| Layer | Service | Purpose | +|-------|---------|---------| +| Events | EventBridge | Central event bus | +| Processing | Lambda or ECS Fargate | Event handlers | +| Queue | SQS | Dead letter queue for failures | +| Orchestration | Step Functions | Complex workflow state | +| Storage | DynamoDB, S3 | Persistent data | + +### Event Schema Example + +```json +{ + "source": "orders.service", + "detail-type": "OrderCreated", + "detail": { + "orderId": "ord-12345", + "customerId": "cust-67890", + "items": [...], + "total": 99.99, + "timestamp": "2024-01-15T10:30:00Z" + } +} +``` + +### Cost Breakdown + +| Service | Monthly Cost | +|---------|-------------| +| EventBridge | $1-10 | +| Lambda | $20-100 | +| SQS | $5-20 | +| Step Functions | $25-100 | +| DynamoDB | $20-100 | +| **Total** | **$71-330** | + +### Pros and Cons + +**Pros:** +- Loose coupling between services +- Independent scaling per service +- Failure isolation +- Easy to test individually + +**Cons:** +- Distributed system complexity +- Eventual consistency +- Harder to debug + +--- + +## Pattern 3: Modern Three-Tier Application + +### Use Case +Traditional web apps, e-commerce, CMS, applications with complex queries + +### Architecture Diagram + +``` +┌─────────────┐ ┌─────────────┐ +│ CloudFront │────▶│ ALB │ +│ (CDN) │ │ (Load Bal.) 
│ +└─────────────┘ └──────┬──────┘ + │ + ┌──────▼──────┐ + │ ECS Fargate │ + │ (Auto-scale)│ + └──────┬──────┘ + │ + ┌──────────────────┼──────────────────┐ + │ │ │ + ┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐ + │ Aurora │ │ ElastiCache │ │ S3 │ + │ (Database) │ │ (Redis) │ │ (Storage) │ + └─────────────┘ └─────────────┘ └─────────────┘ +``` + +### Service Stack + +| Layer | Service | Configuration | +|-------|---------|---------------| +| CDN | CloudFront | Edge caching, HTTPS | +| Load Balancer | ALB | Path-based routing, health checks | +| Compute | ECS Fargate | Container auto-scaling | +| Database | Aurora MySQL/PostgreSQL | Multi-AZ, auto-scaling | +| Cache | ElastiCache Redis | Session, query caching | +| Storage | S3 | Static assets, uploads | + +### Terraform Example + +```hcl +# ECS Service with Auto-scaling +resource "aws_ecs_service" "app" { + name = "app-service" + cluster = aws_ecs_cluster.main.id + task_definition = aws_ecs_task_definition.app.arn + desired_count = 2 + + capacity_provider_strategy { + capacity_provider = "FARGATE" + weight = 100 + } + + load_balancer { + target_group_arn = aws_lb_target_group.app.arn + container_name = "app" + container_port = 3000 + } +} + +# Auto-scaling Policy +resource "aws_appautoscaling_target" "app" { + max_capacity = 10 + min_capacity = 2 + resource_id = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.app.name}" + scalable_dimension = "ecs:service:DesiredCount" + service_namespace = "ecs" +} +``` + +### Cost Breakdown (50K users) + +| Service | Monthly Cost | +|---------|-------------| +| ECS Fargate (2 tasks) | $100-200 | +| ALB | $25-50 | +| Aurora | $100-300 | +| ElastiCache | $50-100 | +| CloudFront | $20-50 | +| **Total** | **$295-700** | + +--- + +## Pattern 4: Real-Time Data Processing + +### Use Case +Analytics, IoT data ingestion, log processing, streaming data + +### Architecture Diagram + +``` +┌─────────────┐ ┌─────────────┐ ┌─────────────┐ +│ IoT Core │────▶│ Kinesis │────▶│ Lambda │ 
+│ (Devices) │ │ (Stream) │ │ (Process) │ +└─────────────┘ └─────────────┘ └──────┬──────┘ + │ +┌─────────────┐ ┌─────────────┐ ┌──────▼──────┐ +│ QuickSight │◀────│ Athena │◀────│ S3 │ +│ (Viz) │ │ (Query) │ │ (Data Lake) │ +└─────────────┘ └─────────────┘ └─────────────┘ + │ + ┌──────▼──────┐ + │ CloudWatch │ + │ (Alerts) │ + └─────────────┘ +``` + +### Service Stack + +| Layer | Service | Purpose | +|-------|---------|---------| +| Ingestion | Kinesis Data Streams | Real-time data capture | +| Processing | Lambda or Kinesis Analytics | Transform and analyze | +| Storage | S3 (data lake) | Long-term storage | +| Query | Athena | SQL queries on S3 | +| Visualization | QuickSight | Dashboards and reports | +| Alerting | CloudWatch + SNS | Threshold-based alerts | + +### Kinesis Producer Example + +```python +import boto3 +import json + +kinesis = boto3.client('kinesis') + +def send_event(stream_name, data, partition_key): + response = kinesis.put_record( + StreamName=stream_name, + Data=json.dumps(data), + PartitionKey=partition_key + ) + return response['SequenceNumber'] + +# Send sensor reading +send_event( + 'sensor-stream', + {'sensor_id': 'temp-01', 'value': 23.5, 'unit': 'celsius'}, + 'sensor-01' +) +``` + +### Cost Breakdown + +| Service | Monthly Cost | +|---------|-------------| +| Kinesis (1 shard) | $15-30 | +| Lambda | $10-50 | +| S3 | $5-50 | +| Athena | $5-25 | +| QuickSight | $24+ | +| **Total** | **$59-179** | + +--- + +## Pattern 5: GraphQL API Backend + +### Use Case +Mobile apps, single-page applications, flexible data queries + +### Architecture Diagram + +``` +┌─────────────┐ ┌─────────────┐ ┌─────────────┐ +│ Mobile App │────▶│ AppSync │────▶│ Lambda │ +│ or SPA │ │ (GraphQL) │ │ (Resolvers) │ +└─────────────┘ └──────┬──────┘ └─────────────┘ + │ + ┌──────▼──────┐ + │ DynamoDB │ + │ (Direct) │ + └──────┬──────┘ + │ + ┌──────▼──────┐ + │ Cognito │ + │ (Auth) │ + └─────────────┘ +``` + +### AppSync Schema Example + +```graphql +type Query { + 
getUser(id: ID!): User + listPosts(limit: Int, nextToken: String): PostConnection +} + +type Mutation { + createPost(input: CreatePostInput!): Post + updatePost(input: UpdatePostInput!): Post +} + +type Subscription { + onCreatePost: Post @aws_subscribe(mutations: ["createPost"]) +} + +type User { + id: ID! + email: String! + posts: [Post] +} + +type Post { + id: ID! + title: String! + content: String! + author: User! + createdAt: AWSDateTime! +} +``` + +### Cost Breakdown + +| Service | Monthly Cost | +|---------|-------------| +| AppSync | $4-40 | +| Lambda | $5-30 | +| DynamoDB | $10-50 | +| Cognito | $0-50 | +| **Total** | **$19-170** | + +--- + +## Pattern 6: Multi-Region High Availability + +### Use Case +Global applications, disaster recovery, data sovereignty compliance + +### Architecture Diagram + +``` + ┌─────────────┐ + │ Route 53 │ + │(Geo routing)│ + └──────┬──────┘ + │ + ┌────────────────┼────────────────┐ + │ │ + ┌──────▼──────┐ ┌──────▼──────┐ + │ us-east-1 │ │ eu-west-1 │ + │ CloudFront │ │ CloudFront │ + └──────┬──────┘ └──────┬──────┘ + │ │ + ┌──────▼──────┐ ┌──────▼──────┐ + │ ECS/Lambda │ │ ECS/Lambda │ + └──────┬──────┘ └──────┬──────┘ + │ │ + ┌──────▼──────┐◀── Replication ──▶┌──────▼──────┐ + │ DynamoDB │ │ DynamoDB │ + │Global Table │ │Global Table │ + └─────────────┘ └─────────────┘ +``` + +### Service Stack + +| Component | Service | Configuration | +|-----------|---------|---------------| +| DNS | Route 53 | Geolocation or latency routing | +| CDN | CloudFront | Multiple origins per region | +| Compute | Lambda or ECS | Deployed in each region | +| Database | DynamoDB Global Tables | Automatic replication | +| Storage | S3 CRR | Cross-region replication | + +### Route 53 Failover Policy + +```yaml +# Primary record +HealthCheck: + Type: AWS::Route53::HealthCheck + Properties: + HealthCheckConfig: + Port: 443 + Type: HTTPS + ResourcePath: /health + FullyQualifiedDomainName: api-us-east-1.example.com + +RecordSetPrimary: + Type: 
AWS::Route53::RecordSet + Properties: + Name: api.example.com + Type: A + SetIdentifier: primary + Failover: PRIMARY + HealthCheckId: !Ref HealthCheck + AliasTarget: + DNSName: !GetAtt USEast1ALB.DNSName + HostedZoneId: !GetAtt USEast1ALB.CanonicalHostedZoneID +``` + +### Cost Considerations + +| Factor | Impact | +|--------|--------| +| Compute | 2x (each region) | +| Database | 25% premium for global tables | +| Data Transfer | Cross-region replication costs | +| Route 53 | Health checks + geo queries | +| **Total** | **1.5-2x single region** | + +--- + +## Pattern Comparison Summary + +### Latency + +| Pattern | Typical Latency | +|---------|-----------------| +| Serverless | 50-200ms (cold: 500ms+) | +| Three-Tier | 20-100ms | +| GraphQL | 30-150ms | +| Multi-Region | <50ms (regional) | + +### Scaling Characteristics + +| Pattern | Scale Limit | Scale Speed | +|---------|-------------|-------------| +| Serverless | 1000 concurrent executions per account per Region (default, soft limit) | Instant | +| Three-Tier | Instance limits | Minutes | +| Event-Driven | Unlimited | Instant | +| Multi-Region | Regional limits | Instant | + +### Operational Complexity + +| Pattern | Setup | Maintenance | Debugging | +|---------|-------|-------------|-----------| +| Serverless | Low | Low | Medium | +| Three-Tier | Medium | Medium | Low | +| Event-Driven | High | Medium | High | +| Multi-Region | High | High | High | diff --git a/engineering-team/aws-solution-architect/references/best_practices.md b/engineering-team/aws-solution-architect/references/best_practices.md new file mode 100644 index 0000000..85925a0 --- /dev/null +++ b/engineering-team/aws-solution-architect/references/best_practices.md @@ -0,0 +1,631 @@ +# AWS Best Practices for Startups + +Production-ready practices for serverless, cost optimization, security, and operational excellence. 
+ +--- + +## Table of Contents + +- [Serverless Best Practices](#serverless-best-practices) +- [Cost Optimization](#cost-optimization) +- [Security Hardening](#security-hardening) +- [Scalability Patterns](#scalability-patterns) +- [DevOps and Reliability](#devops-and-reliability) +- [Common Pitfalls](#common-pitfalls) + +--- + +## Serverless Best Practices + +### Lambda Function Design + +#### 1. Keep Functions Stateless + +Store state externally in DynamoDB, S3, or ElastiCache. + +```python +# BAD: Function-level state +cache = {} + +def handler(event, context): + if event['key'] in cache: + return cache[event['key']] + # ... + +# GOOD: External state +import boto3 +dynamodb = boto3.resource('dynamodb') +table = dynamodb.Table('cache') + +def handler(event, context): + response = table.get_item(Key={'pk': event['key']}) + if 'Item' in response: + return response['Item']['value'] + # ... +``` + +#### 2. Implement Idempotency + +Handle retries gracefully with unique request IDs. + +```python +import boto3 +import hashlib +import time + +dynamodb = boto3.resource('dynamodb') +idempotency_table = dynamodb.Table('idempotency') + +def handler(event, context): + # Generate idempotency key + idempotency_key = hashlib.sha256( + f"{event['orderId']}-{event['action']}".encode() + ).hexdigest() + + # Check if already processed + try: + response = idempotency_table.get_item(Key={'pk': idempotency_key}) + if 'Item' in response: + return response['Item']['result'] + except Exception: + pass + + # Process request + result = process_order(event) + + # Store result for idempotency + idempotency_table.put_item( + Item={ + 'pk': idempotency_key, + 'result': result, + 'ttl': int(time.time()) + 86400 # 24h TTL + } + ) + + return result +``` + +#### 3. 
Optimize Cold Starts + +```python +# Initialize outside handler (reused across invocations) +import boto3 +from aws_xray_sdk.core import patch_all + +# SDK initialization happens once +dynamodb = boto3.resource('dynamodb') +table = dynamodb.Table('my-table') +patch_all() + +def handler(event, context): + # Handler code uses pre-initialized resources + return table.get_item(Key={'pk': event['id']}) +``` + +**Cold Start Reduction Techniques:** +- Use provisioned concurrency for critical paths +- Minimize package size (use layers for dependencies) +- Prefer runtimes with fast startup (Python, Node.js) over heavier ones such as Java or .NET +- Avoid VPC unless necessary (adds networking setup; the old multi-second VPC cold start penalty no longer applies) + +#### 4. Set Appropriate Timeouts + +```yaml +# Lambda configuration +Functions: + ApiHandler: + Timeout: 10 # Shorter for synchronous APIs + MemorySize: 512 + + BackgroundProcessor: + Timeout: 300 # Longer for async processing + MemorySize: 1024 +``` + +**Timeout Guidelines:** +- API handlers: 10-29 seconds (API Gateway times out at 29 seconds) +- Event processors: 60-300 seconds +- Use Step Functions for >15 minute workflows + +--- + +## Cost Optimization + +### 1. Right-Sizing Strategy + +```bash +# Check EC2 utilization +aws cloudwatch get-metric-statistics \ + --namespace AWS/EC2 \ + --metric-name CPUUtilization \ + --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \ + --start-time $(date -d '7 days ago' -u +"%Y-%m-%dT%H:%M:%SZ") \ + --end-time $(date -u +"%Y-%m-%dT%H:%M:%SZ") \ + --period 3600 \ + --statistics Average +``` + +**Right-Sizing Rules:** +- <10% CPU average: Downsize instance +- >80% CPU average: Consider upgrade or horizontal scaling +- Review every month for the first 6 months + +### 2. 
Savings Plans and Reserved Instances + +| Commitment | Savings | Best For | +|------------|---------|----------| +| No Upfront, 1-year | 20-30% | Unknown future | +| Partial Upfront, 1-year | 30-40% | Moderate confidence | +| All Upfront, 3-year | 50-60% | Stable workloads | + +```bash +# Check Savings Plans recommendations (Cost Explorer commands live under `aws ce`) +aws ce get-savings-plans-purchase-recommendation \ + --savings-plans-type COMPUTE_SP \ + --term-in-years ONE_YEAR \ + --payment-option NO_UPFRONT \ + --lookback-period-in-days THIRTY_DAYS +``` + +### 3. S3 Lifecycle Policies + +```json +{ + "Rules": [ + { + "ID": "Transition to cheaper storage", + "Status": "Enabled", + "Filter": { + "Prefix": "logs/" + }, + "Transitions": [ + { "Days": 30, "StorageClass": "STANDARD_IA" }, + { "Days": 90, "StorageClass": "GLACIER" } + ], + "Expiration": { "Days": 365 } + } + ] +} +``` + +### 4. Lambda Memory Optimization + +Test different memory settings to find the optimal cost/performance balance. + +```python +# Use AWS Lambda Power Tuning +# https://github.com/alexcasalboni/aws-lambda-power-tuning + +# Example results: +# 128 MB: 2000ms, $0.000042 +# 512 MB: 500ms, $0.000042 +# 1024 MB: 300ms, $0.000050 + +# Optimal: 512 MB (same cost, 4x faster) +``` + +### 5. NAT Gateway Alternatives + +``` +NAT Gateway: $0.045/hour (~$32/month) + $0.045/GB processed + +Alternatives: +1. VPC Endpoints: $0.01/hour = ~$7.30/month (for AWS services) +2. NAT Instance: t3.nano = ~$3.80/month (limited throughput) +3. No NAT: Use VPC endpoints + Lambda outside VPC +``` + +### 6. CloudWatch Log Retention + +```yaml +# Set retention policies to avoid unbounded growth +LogGroup: + Type: AWS::Logs::LogGroup + Properties: + LogGroupName: /aws/lambda/my-function + RetentionInDays: 14 # 7, 14, 30, 60, 90, etc. 
+``` + +**Retention Guidelines:** +- Development: 7 days +- Production non-critical: 30 days +- Production critical: 90 days +- Compliance requirements: As specified + +--- + +## Security Hardening + +### 1. 
IAM Least Privilege + +```json +// BAD: Overly permissive +{ + "Effect": "Allow", + "Action": "dynamodb:*", + "Resource": "*" +} + +// GOOD: Specific actions and resources +{ + "Effect": "Allow", + "Action": [ + "dynamodb:GetItem", + "dynamodb:PutItem", + "dynamodb:Query" + ], + "Resource": [ + "arn:aws:dynamodb:us-east-1:123456789:table/users", + "arn:aws:dynamodb:us-east-1:123456789:table/users/index/*" + ] +} +``` + +### 2. Encryption Configuration + +```yaml +# Enable encryption everywhere +Resources: + # DynamoDB + Table: + Type: AWS::DynamoDB::Table + Properties: + SSESpecification: + SSEEnabled: true + SSEType: KMS + KMSMasterKeyId: !Ref EncryptionKey + + # S3 + Bucket: + Type: AWS::S3::Bucket + Properties: + BucketEncryption: + ServerSideEncryptionConfiguration: + - ServerSideEncryptionByDefault: + SSEAlgorithm: aws:kms + KMSMasterKeyID: !Ref EncryptionKey + + # RDS + Database: + Type: AWS::RDS::DBInstance + Properties: + StorageEncrypted: true + KmsKeyId: !Ref EncryptionKey +``` + +### 3. Network Isolation + +```yaml +# Private subnets with VPC endpoints +Resources: + PrivateSubnet: + Type: AWS::EC2::Subnet + Properties: + MapPublicIpOnLaunch: false + + # DynamoDB Gateway Endpoint (free) + DynamoDBEndpoint: + Type: AWS::EC2::VPCEndpoint + Properties: + VpcId: !Ref VPC + ServiceName: !Sub com.amazonaws.${AWS::Region}.dynamodb + VpcEndpointType: Gateway + RouteTableIds: + - !Ref PrivateRouteTable + + # Secrets Manager Interface Endpoint + SecretsEndpoint: + Type: AWS::EC2::VPCEndpoint + Properties: + VpcId: !Ref VPC + ServiceName: !Sub com.amazonaws.${AWS::Region}.secretsmanager + VpcEndpointType: Interface + PrivateDnsEnabled: true +``` + +### 4. 
Secrets Management + +```python +# Never hardcode secrets +import boto3 +import json + +def get_secret(secret_name): + client = boto3.client('secretsmanager') + response = client.get_secret_value(SecretId=secret_name) + return json.loads(response['SecretString']) + +# Usage +db_creds = get_secret('prod/database/credentials') +connection = connect( + host=db_creds['host'], + user=db_creds['username'], + password=db_creds['password'] +) +``` + +### 5. API Protection + +```yaml +# WAF + API Gateway +WebACL: + Type: AWS::WAFv2::WebACL + Properties: + DefaultAction: + Allow: {} + Rules: + - Name: RateLimit + Priority: 1 + Action: + Block: {} + Statement: + RateBasedStatement: + Limit: 2000 + AggregateKeyType: IP + VisibilityConfig: + SampledRequestsEnabled: true + CloudWatchMetricsEnabled: true + MetricName: RateLimitRule + + - Name: AWSManagedRulesCommonRuleSet + Priority: 2 + OverrideAction: + None: {} + Statement: + ManagedRuleGroupStatement: + VendorName: AWS + Name: AWSManagedRulesCommonRuleSet +``` + +### 6. Audit Logging + +```yaml +# Enable CloudTrail for all API calls +CloudTrail: + Type: AWS::CloudTrail::Trail + Properties: + IsMultiRegionTrail: true + IsLogging: true + S3BucketName: !Ref AuditLogsBucket + IncludeGlobalServiceEvents: true + EnableLogFileValidation: true + EventSelectors: + - ReadWriteType: All + IncludeManagementEvents: true +``` + +--- + +## Scalability Patterns + +### 1. Horizontal vs Vertical Scaling + +``` +Horizontal (preferred): +- Add more Lambda concurrent executions +- Add more Fargate tasks +- Add more DynamoDB capacity + +Vertical (when necessary): +- Increase Lambda memory +- Upgrade RDS instance +- Larger EC2 instances +``` + +### 2. 
Database Sharding + +```python +# Partition by tenant ID +# zlib.crc32 is stable across processes; built-in hash() is randomized per process for str +import zlib + +def get_table_for_tenant(tenant_id): + shard = zlib.crc32(str(tenant_id).encode()) % NUM_SHARDS + return f"data-shard-{shard}" + +# Or use DynamoDB single-table design with partition keys +def get_partition_key(tenant_id, entity_type, entity_id): + return f"TENANT#{tenant_id}#{entity_type}#{entity_id}" +``` + +### 3. Caching Layers + +``` +Edge (CloudFront): Global, static content, TTL: hours-days +Application (Redis): Regional, session/query cache, TTL: minutes-hours +Database (DAX): DynamoDB-specific, TTL: minutes +``` + +```python +# ElastiCache Redis caching pattern +import redis +import json + +cache = redis.Redis(host='cache.abc123.cache.amazonaws.com', port=6379) + +def get_user(user_id): + # Check cache first + cached = cache.get(f"user:{user_id}") + if cached: + return json.loads(cached) + + # Fetch from database + user = db.get_user(user_id) + + # Cache for 5 minutes + cache.setex(f"user:{user_id}", 300, json.dumps(user)) + + return user +``` + +### 4. Auto-Scaling Configuration + +```yaml +# ECS Service Auto-scaling +AutoScalingTarget: + Type: AWS::ApplicationAutoScaling::ScalableTarget + Properties: + MaxCapacity: 10 + MinCapacity: 2 + ResourceId: !Sub service/${Cluster}/${Service.Name} + ScalableDimension: ecs:service:DesiredCount + ServiceNamespace: ecs + +ScalingPolicy: + Type: AWS::ApplicationAutoScaling::ScalingPolicy + Properties: + PolicyType: TargetTrackingScaling + TargetTrackingScalingPolicyConfiguration: + PredefinedMetricSpecification: + PredefinedMetricType: ECSServiceAverageCPUUtilization + TargetValue: 70 + ScaleInCooldown: 300 + ScaleOutCooldown: 60 +``` + +--- + +## DevOps and Reliability + +### 1. Infrastructure as Code + +```bash +# Version control all infrastructure +git init +git add . 
+git commit -m "Initial infrastructure setup" + +# Use separate stacks per environment +cdk deploy --context environment=dev +cdk deploy --context environment=staging +cdk deploy --context environment=production +``` + +### 2. Blue/Green Deployments + +```yaml +# CodeDeploy Blue/Green for ECS +DeploymentGroup: + Type: AWS::CodeDeploy::DeploymentGroup + Properties: + DeploymentConfigName: CodeDeployDefault.ECSAllAtOnce + DeploymentStyle: + DeploymentType: BLUE_GREEN + DeploymentOption: WITH_TRAFFIC_CONTROL + BlueGreenDeploymentConfiguration: + DeploymentReadyOption: + ActionOnTimeout: CONTINUE_DEPLOYMENT + WaitTimeInMinutes: 0 + TerminateBlueInstancesOnDeploymentSuccess: + Action: TERMINATE + TerminationWaitTimeInMinutes: 5 +``` + +### 3. Health Checks + +```python +# Application health endpoint +from flask import Flask, jsonify +import boto3 + +app = Flask(__name__) + +@app.route('/health') +def health(): + checks = { + 'database': check_database(), + 'cache': check_cache(), + 'external_api': check_external_api() + } + + status = 'healthy' if all(checks.values()) else 'unhealthy' + code = 200 if status == 'healthy' else 503 + + return jsonify({'status': status, 'checks': checks}), code + +def check_database(): + try: + # Quick connectivity test + db.execute('SELECT 1') + return True + except Exception: + return False +``` + +### 4. 
Monitoring Setup + +```yaml +# CloudWatch Dashboard +Dashboard: + Type: AWS::CloudWatch::Dashboard + Properties: + DashboardName: production-overview + DashboardBody: | + { + "widgets": [ + { + "type": "metric", + "properties": { + "metrics": [ + ["AWS/Lambda", "Invocations", "FunctionName", "api-handler"], + [".", "Errors", ".", "."], + [".", "Duration", ".", ".", {"stat": "p99"}] + ], + "period": 60, + "title": "Lambda Metrics" + } + } + ] + } + +# Critical Alarms +ErrorAlarm: + Type: AWS::CloudWatch::Alarm + Properties: + AlarmName: high-error-rate + MetricName: Errors + Namespace: AWS/Lambda + Statistic: Sum + Period: 60 + EvaluationPeriods: 3 + Threshold: 10 + ComparisonOperator: GreaterThanThreshold + AlarmActions: + - !Ref AlertTopic +``` + +--- + +## Common Pitfalls + +### Technical Debt + +| Pitfall | Solution | +|---------|----------| +| Over-engineering early | Start simple, scale when needed | +| Under-monitoring | Set up CloudWatch from day one | +| Ignoring costs | Enable Cost Explorer and billing alerts | +| Single region only | Plan for multi-region from start | + +### Security Mistakes + +| Mistake | Prevention | +|---------|------------| +| Public S3 buckets | Block public access, use bucket policies | +| Overly permissive IAM | Never use "*", specify resources | +| Hardcoded credentials | Use Secrets Manager, IAM roles | +| Unencrypted data | Enable encryption by default | + +### Performance Issues + +| Issue | Solution | +|-------|----------| +| No caching | Add CloudFront, ElastiCache early | +| Inefficient queries | Use indexes, avoid DynamoDB scans | +| Large Lambda packages | Use layers, minimize dependencies | +| N+1 queries | Implement DataLoader, batch operations | + +### Cost Surprises + +| Surprise | Prevention | +|----------|------------| +| Undeleted resources | Tag everything, review weekly | +| Data transfer costs | Keep traffic in same AZ/region | +| NAT Gateway charges | Use VPC endpoints for AWS services | +| Log accumulation | 
Set CloudWatch retention policies | diff --git a/engineering-team/aws-solution-architect/references/service_selection.md b/engineering-team/aws-solution-architect/references/service_selection.md new file mode 100644 index 0000000..a81bed2 --- /dev/null +++ b/engineering-team/aws-solution-architect/references/service_selection.md @@ -0,0 +1,484 @@ +# AWS Service Selection Guide + +Quick reference for choosing the right AWS service based on requirements. + +--- + +## Table of Contents + +- [Compute Services](#compute-services) +- [Database Services](#database-services) +- [Storage Services](#storage-services) +- [Messaging and Events](#messaging-and-events) +- [API and Integration](#api-and-integration) +- [Networking](#networking) +- [Security and Identity](#security-and-identity) + +--- + +## Compute Services + +### Decision Matrix + +| Requirement | Recommended Service | +|-------------|---------------------| +| Event-driven, short tasks (<15 min) | Lambda | +| Containerized apps, predictable traffic | ECS Fargate | +| Custom configs, GPU/FPGA | EC2 | +| Simple container from source | App Runner | +| Kubernetes workloads | EKS | +| Batch processing | AWS Batch | + +### Lambda + +**Best for:** Event-driven functions, API backends, scheduled tasks + +``` +Limits: +- Execution: 15 minutes max +- Memory: 128 MB - 10 GB +- Package: 50 MB (zip), 10 GB (container) +- Concurrency: 1000 default (soft limit) + +Pricing: $0.20 per 1M requests + compute time +``` + +**Use when:** +- Variable/unpredictable traffic +- Pay-per-use is important +- No server management desired +- Short-duration operations + +**Avoid when:** +- Long-running processes (>15 min) +- Low-latency requirements (<50ms) +- Heavy compute (consider Fargate) + +### ECS Fargate + +**Best for:** Containerized applications, microservices + +``` +Limits: +- vCPU: 0.25 - 16 +- Memory: 0.5 GB - 120 GB +- Storage: 20 GB - 200 GB ephemeral + +Pricing: Per vCPU-hour + GB-hour +``` + +**Use when:** +- Containerized 
applications +- Predictable traffic patterns +- Long-running processes +- Need more control than Lambda + +### EC2 + +**Best for:** Custom configurations, specialized hardware + +``` +Instance Types: +- General: t3, m6i +- Compute: c6i +- Memory: r6i +- GPU: p4d, g5 +- Storage: i3, d3 +``` + +**Use when:** +- Need GPU/FPGA +- Windows applications +- Specific instance configurations +- Reserved capacity makes sense + +--- + +## Database Services + +### Decision Matrix + +| Data Type | Query Pattern | Scale | Recommended | +|-----------|--------------|-------|-------------| +| Key-value | Simple lookups | Any | DynamoDB | +| Document | Flexible queries | <1TB | DocumentDB | +| Relational | Complex joins | Variable | Aurora Serverless | +| Relational | High volume | Fixed | Aurora Standard | +| Time-series | Time-based | Any | Timestream | +| Graph | Relationships | Any | Neptune | + +### DynamoDB + +**Best for:** Key-value and document data, serverless applications + +``` +Limits: +- Item size: 400 KB max +- Partition key: 2048 bytes +- Sort key: 1024 bytes +- GSI: 20 per table + +Pricing: +- On-demand: $1.25 per million writes, $0.25 per million reads +- Provisioned: Per RCU/WCU +``` + +**Data Modeling Example:** + +``` +# Single-table design for e-commerce +PK SK Attributes +USER#123 PROFILE {name, email, ...} +USER#123 ORDER#456 {total, status, ...} +USER#123 ORDER#456#ITEM#1 {product, qty, ...} +PRODUCT#789 METADATA {name, price, ...} +``` + +### Aurora + +**Best for:** Relational data with complex queries + +| Edition | Use Case | Scaling | +|---------|----------|---------| +| Aurora Serverless v2 | Variable workloads | 0.5-128 ACUs, auto | +| Aurora Standard | Predictable workloads | Instance-based | +| Aurora Global | Multi-region | Cross-region replication | + +``` +Limits: +- Storage: 128 TB max +- Replicas: 15 read replicas +- Connections: Instance-dependent + +Pricing: +- Serverless: $0.12 per ACU-hour +- Standard: Instance + storage + I/O +``` + +### 
Comparison: DynamoDB vs Aurora
+
+| Factor | DynamoDB | Aurora |
+|--------|----------|--------|
+| Query flexibility | Limited (key-based) | Full SQL |
+| Scaling | Instant, unlimited | Minutes, up to limits |
+| Consistency | Eventually/Strong | ACID |
+| Cost model | Per-request | Per-hour |
+| Operational | Zero management | Some management |
+
+---
+
+## Storage Services
+
+### S3 Storage Classes
+
+| Class | Access Pattern | Retrieval | Cost (GB/mo) |
+|-------|---------------|-----------|--------------|
+| Standard | Frequent | Instant | $0.023 |
+| Intelligent-Tiering | Unknown | Instant | $0.023 + monitoring |
+| Standard-IA | Infrequent (30+ days) | Instant | $0.0125 |
+| One Zone-IA | Infrequent, single AZ | Instant | $0.01 |
+| Glacier Instant | Archive, instant access | Instant | $0.004 |
+| Glacier Flexible | Archive | Minutes-hours | $0.0036 |
+| Glacier Deep Archive | Long-term archive | 12-48 hours | $0.00099 |
+
+### Lifecycle Policy Example
+
+```json
+{
+  "Rules": [
+    {
+      "ID": "Archive old data",
+      "Status": "Enabled",
+      "Transitions": [
+        {
+          "Days": 30,
+          "StorageClass": "STANDARD_IA"
+        },
+        {
+          "Days": 90,
+          "StorageClass": "GLACIER"
+        },
+        {
+          "Days": 365,
+          "StorageClass": "DEEP_ARCHIVE"
+        }
+      ],
+      "Expiration": {
+        "Days": 2555
+      }
+    }
+  ]
+}
+```
+
+### Block and File Storage
+
+| Service | Use Case | Access |
+|---------|----------|--------|
+| EBS | EC2 block storage | Single instance |
+| EFS | Shared file system | Multiple instances |
+| FSx for Lustre | HPC workloads | High throughput |
+| FSx for Windows | Windows apps | SMB protocol |
+
+---
+
+## Messaging and Events
+
+### Decision Matrix
+
+| Pattern | Service | Use Case |
+|---------|---------|----------|
+| Event routing | EventBridge | Microservices, SaaS integration |
+| Pub/sub | SNS | Fan-out notifications |
+| Queue | SQS | Decoupling, buffering |
+| Streaming | Kinesis | Real-time analytics |
+| Message broker | Amazon MQ | Legacy migrations |
+
+### EventBridge
+
+**Best for:** Event-driven architectures, SaaS integration
+
+```python
+# EventBridge rule pattern
+{
+    "source": ["orders.service"],
+    "detail-type": ["OrderCreated"],
+    "detail": {
+        "total": [{"numeric": [">=", 100]}]
+    }
+}
+```
+
+### SQS
+
+**Best for:** Decoupling services, handling load spikes
+
+| Feature | Standard | FIFO |
+|---------|----------|------|
+| Throughput | Unlimited | 3000 msg/sec (with batching) |
+| Ordering | Best effort | Guaranteed |
+| Delivery | At least once | Exactly once |
+| Deduplication | No | Yes |
+
+```python
+# SQS with dead letter queue
+import boto3
+
+sqs = boto3.client('sqs')
+
+def process(message):
+    # Placeholder handler (an assumption, not part of the original); replace with real logic
+    print(message['Body'])
+
+def process_with_dlq(queue_url, dlq_url, max_retries=3):
+    # Long-poll for up to 10 messages and request each message's receive count
+    response = sqs.receive_message(
+        QueueUrl=queue_url,
+        MaxNumberOfMessages=10,
+        WaitTimeSeconds=20,
+        AttributeNames=['ApproximateReceiveCount']
+    )
+
+    for message in response.get('Messages', []):
+        receive_count = int(message['Attributes']['ApproximateReceiveCount'])
+
+        try:
+            process(message)
+            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message['ReceiptHandle'])
+        except Exception:
+            # Below max_retries the message is left in the queue and becomes
+            # visible again after the visibility timeout
+            if receive_count >= max_retries:
+                sqs.send_message(QueueUrl=dlq_url, MessageBody=message['Body'])
+                sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message['ReceiptHandle'])
+```
+
+### Kinesis
+
+**Best for:** Real-time streaming data, analytics
+
+| Service | Use Case |
+|---------|----------|
+| Data Streams | Custom processing |
+| Data Firehose | Direct to S3/Redshift |
+| Data Analytics | SQL on streams |
+| Video Streams | Video ingestion |
+
+---
+
+## API and Integration
+
+### API Gateway vs AppSync
+
+| Factor | API Gateway | AppSync |
+|--------|-------------|---------|
+| Protocol | REST, WebSocket | GraphQL |
+| Real-time | WebSocket setup | Built-in subscriptions |
+| Caching | Response caching | Field-level caching |
+| Integration | Lambda, HTTP, AWS | Lambda, DynamoDB, HTTP |
+| Pricing | Per request | Per request + data |
+
+### API Gateway Configuration
+
+```yaml
+#
Throttling and caching
+Resources:
+  ApiGateway:
+    Type: AWS::ApiGateway::RestApi
+    Properties:
+      Name: my-api
+
+  ApiStage:
+    Type: AWS::ApiGateway::Stage
+    Properties:
+      RestApiId: !Ref ApiGateway  # required: ties the stage to the API above
+      StageName: prod
+      CacheClusterEnabled: true  # needed for CachingEnabled below to take effect
+      MethodSettings:
+        - HttpMethod: "*"
+          ResourcePath: "/*"
+          ThrottlingBurstLimit: 500
+          ThrottlingRateLimit: 1000
+          CachingEnabled: true
+          CacheTtlInSeconds: 300
+```
+
+### Step Functions
+
+**Best for:** Workflow orchestration, long-running processes
+
+```json
+{
+  "StartAt": "ProcessOrder",
+  "States": {
+    "ProcessOrder": {
+      "Type": "Task",
+      "Resource": "arn:aws:lambda:...:processOrder",
+      "Next": "CheckInventory"
+    },
+    "CheckInventory": {
+      "Type": "Choice",
+      "Choices": [
+        {
+          "Variable": "$.inStock",
+          "BooleanEquals": true,
+          "Next": "ShipOrder"
+        }
+      ],
+      "Default": "BackOrder"
+    },
+    "ShipOrder": {
+      "Type": "Task",
+      "Resource": "arn:aws:lambda:...:shipOrder",
+      "End": true
+    },
+    "BackOrder": {
+      "Type": "Task",
+      "Resource": "arn:aws:lambda:...:backOrder",
+      "End": true
+    }
+  }
+}
+```
+
+---
+
+## Networking
+
+### VPC Components
+
+| Component | Purpose |
+|-----------|---------|
+| VPC | Isolated network |
+| Subnet | Network segment (public/private) |
+| Internet Gateway | Public internet access |
+| NAT Gateway | Private subnet outbound |
+| VPC Endpoint | Private AWS service access |
+| Transit Gateway | VPC interconnection |
+
+### VPC Design Pattern
+
+```
+VPC: 10.0.0.0/16
+
+Public Subnets (AZ a, b, c):
+  10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/24
+  - ALB, NAT Gateway, Bastion
+
+Private Subnets (AZ a, b, c):
+  10.0.11.0/24, 10.0.12.0/24, 10.0.13.0/24
+  - Application servers, Lambda
+
+Database Subnets (AZ a, b, c):
+  10.0.21.0/24, 10.0.22.0/24, 10.0.23.0/24
+  - RDS, ElastiCache
+```
+
+### VPC Endpoints (Cost Savings)
+
+```yaml
+# Interface endpoint for Secrets Manager
+SecretsManagerEndpoint:
+  Type: AWS::EC2::VPCEndpoint
+  Properties:
+    VpcId: !Ref VPC
+    ServiceName: !Sub com.amazonaws.${AWS::Region}.secretsmanager
+    VpcEndpointType: Interface
+
SubnetIds: !Ref PrivateSubnets
+    SecurityGroupIds:
+      - !Ref EndpointSecurityGroup
+```
+
+---
+
+## Security and Identity
+
+### IAM Best Practices
+
+Least-privilege policy example:
+
+```json
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Effect": "Allow",
+      "Action": [
+        "dynamodb:GetItem",
+        "dynamodb:PutItem",
+        "dynamodb:Query"
+      ],
+      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/users",
+      "Condition": {
+        "ForAllValues:StringEquals": {
+          "dynamodb:LeadingKeys": ["${aws:userid}"]
+        }
+      }
+    }
+  ]
+}
+```
+
+### Secrets Manager vs Parameter Store
+
+| Factor | Secrets Manager | Parameter Store |
+|--------|-----------------|-----------------|
+| Auto-rotation | Built-in | Manual |
+| Cross-account | Yes | Limited |
+| Pricing | $0.40/secret/month | Free (standard) |
+| Use case | Credentials, API keys | Config, non-secrets |
+
+### Cognito Configuration
+
+```yaml
+UserPool:
+  Type: AWS::Cognito::UserPool
+  Properties:
+    UserPoolName: my-app-users
+    AutoVerifiedAttributes:
+      - email
+    MfaConfiguration: OPTIONAL
+    EnabledMfas:
+      - SOFTWARE_TOKEN_MFA
+    Policies:
+      PasswordPolicy:
+        MinimumLength: 12
+        RequireLowercase: true
+        RequireUppercase: true
+        RequireNumbers: true
+        RequireSymbols: true
+    AccountRecoverySetting:
+      RecoveryMechanisms:
+        - Name: verified_email
+          Priority: 1
+```
diff --git a/engineering-team/aws-solution-architect/architecture_designer.py b/engineering-team/aws-solution-architect/scripts/architecture_designer.py
similarity index 100%
rename from engineering-team/aws-solution-architect/architecture_designer.py
rename to engineering-team/aws-solution-architect/scripts/architecture_designer.py
diff --git a/engineering-team/aws-solution-architect/cost_optimizer.py b/engineering-team/aws-solution-architect/scripts/cost_optimizer.py
similarity index 100%
rename from engineering-team/aws-solution-architect/cost_optimizer.py
rename to engineering-team/aws-solution-architect/scripts/cost_optimizer.py
diff --git
a/engineering-team/aws-solution-architect/serverless_stack.py b/engineering-team/aws-solution-architect/scripts/serverless_stack.py
similarity index 100%
rename from engineering-team/aws-solution-architect/serverless_stack.py
rename to engineering-team/aws-solution-architect/scripts/serverless_stack.py