Complete restructure based on AI Agent Skills Benchmark feedback (original score: 66/100):
## Directory Reorganization
- Moved Python scripts to scripts/ directory
- Moved sample files to assets/ directory
- Created references/ directory with extracted content
- Removed HOW_TO_USE.md (integrated into SKILL.md)
- Removed __pycache__
## New Reference Files (3 files)
- architecture_patterns.md: 6 AWS patterns (serverless, microservices, three-tier,
data processing, GraphQL, multi-region) with diagrams, cost breakdowns, pros/cons
- service_selection.md: Decision matrices for compute, database, storage, messaging,
networking, security services with code examples
- best_practices.md: Serverless design, cost optimization, security hardening,
scalability patterns, common pitfalls
## SKILL.md Rewrite
- Reduced from 345 lines to 307 lines (moved patterns to references/)
- Added trigger phrases to description ("design serverless architecture",
"create CloudFormation templates", "optimize AWS costs")
- Structured around 6-step workflow instead of encyclopedia format
- Added Quick Start examples (MVP, Scaling, Cost Optimization, IaC)
- Removed marketing language ("Expert", "comprehensive")
- Consistent imperative voice throughout
## Structure Changes
- scripts/: architecture_designer.py, cost_optimizer.py, serverless_stack.py
- references/: architecture_patterns.md, service_selection.md, best_practices.md
- assets/: sample_input.json, expected_output.json
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
@@ -1,308 +0,0 @@
|
||||
# How to Use This Skill

Hey Claude, I just added the "aws-solution-architect" skill. Can you design a scalable serverless architecture for my startup?
|
||||
|
||||
## Example Invocations
|
||||
|
||||
**Example 1: Serverless Web Application**
```
Hey Claude, I just added the "aws-solution-architect" skill. Can you design a serverless architecture for a SaaS platform with 10k users, including API, database, and authentication?
```

**Example 2: Microservices Architecture**
```
Hey Claude, I just added the "aws-solution-architect" skill. Can you design an event-driven microservices architecture using Lambda, EventBridge, and DynamoDB for an e-commerce platform?
```

**Example 3: Cost Optimization**
```
Hey Claude, I just added the "aws-solution-architect" skill. Can you analyze my current AWS setup and recommend ways to reduce costs by 30%? I'm currently spending $2000/month.
```

**Example 4: Infrastructure as Code**
```
Hey Claude, I just added the "aws-solution-architect" skill. Can you generate a CloudFormation template for a three-tier web application with auto-scaling and RDS?
```

**Example 5: Mobile Backend**
```
Hey Claude, I just added the "aws-solution-architect" skill. Can you design a scalable mobile backend using AppSync GraphQL, Cognito, and DynamoDB?
```

**Example 6: Data Pipeline**
```
Hey Claude, I just added the "aws-solution-architect" skill. Can you design a real-time data processing pipeline using Kinesis for analytics on IoT sensor data?
```
|
||||
|
||||
## What to Provide
|
||||
|
||||
Depending on your needs, provide:
|
||||
|
||||
### For Architecture Design:
|
||||
- **Application type**: Web app, mobile backend, data pipeline, microservices, SaaS
|
||||
- **Expected scale**: Number of users, requests per second, data volume
|
||||
- **Budget**: Monthly AWS spend limit or constraints
|
||||
- **Team context**: Team size, AWS experience level
|
||||
- **Requirements**: Authentication, real-time features, compliance needs (GDPR, HIPAA)
|
||||
- **Geographic scope**: Single region, multi-region, global
|
||||
|
||||
### For Cost Optimization:
|
||||
- **Current monthly spend**: Total AWS bill
|
||||
- **Resource inventory**: List of EC2, RDS, S3, etc. resources
|
||||
- **Utilization metrics**: CPU, memory, storage usage
|
||||
- **Budget target**: Desired monthly spend or savings percentage
|
||||
|
||||
### For Infrastructure as Code:
|
||||
- **Template type**: CloudFormation, CDK (TypeScript/Python), or Terraform
|
||||
- **Services needed**: Compute, database, storage, networking
|
||||
- **Environment**: dev, staging, production configurations
|
||||
|
||||
## What You'll Get
|
||||
|
||||
Based on your request, you'll receive:
|
||||
|
||||
### Architecture Designs:
|
||||
- **Pattern recommendation** with service selection
|
||||
- **Architecture diagram** description (visual representation)
|
||||
- **Service configuration** details
|
||||
- **Cost estimates** with monthly breakdown
|
||||
- **Pros/cons** analysis
|
||||
- **Scaling characteristics** and limitations
|
||||
|
||||
### Infrastructure as Code:
|
||||
- **CloudFormation templates** (YAML) - production-ready
|
||||
- **AWS CDK stacks** (TypeScript) - modern, type-safe
|
||||
- **Terraform configurations** (HCL) - multi-cloud compatible
|
||||
- **Deployment instructions** and prerequisites
|
||||
- **Security best practices** built-in
|
||||
|
||||
### Cost Optimization:
|
||||
- **Current spend analysis** by service
|
||||
- **Specific recommendations** with savings potential
|
||||
- **Priority actions** (high/medium/low)
|
||||
- **Implementation checklist** with timelines
|
||||
- **Long-term optimization** strategies
|
||||
|
||||
### Best Practices:
|
||||
- **Security hardening** checklist
|
||||
- **Scalability patterns** and anti-patterns
|
||||
- **Monitoring setup** recommendations
|
||||
- **Disaster recovery** procedures
|
||||
- **Compliance guidance** (GDPR, HIPAA, SOC 2)
|
||||
|
||||
## Common Use Cases
|
||||
|
||||
### 1. MVP/Startup Launch
|
||||
**Ask for:** "Serverless architecture for MVP with minimal costs"
|
||||
|
||||
**You'll get:**
|
||||
- Amplify or Lambda + API Gateway + DynamoDB stack
|
||||
- Cognito authentication setup
|
||||
- S3 + CloudFront for frontend
|
||||
- Cost estimate: $20-100/month
|
||||
- Fast deployment (1-3 days)
|
||||
|
||||
### 2. Scaling Existing Application
|
||||
**Ask for:** "Migrate from single server to scalable AWS architecture"
|
||||
|
||||
**You'll get:**
|
||||
- Migration strategy (phased approach)
|
||||
- Modern three-tier or containerized architecture
|
||||
- Load balancing and auto-scaling configuration
|
||||
- Database migration plan (DMS)
|
||||
- Zero-downtime deployment strategy
|
||||
|
||||
### 3. Cost Reduction
|
||||
**Ask for:** "Analyze and optimize my $5000/month AWS bill"
|
||||
|
||||
**You'll get:**
|
||||
- Service-by-service cost breakdown
|
||||
- Right-sizing recommendations
|
||||
- Savings Plans/Reserved Instance opportunities
|
||||
- Storage lifecycle optimizations
|
||||
- Estimated savings: 20-40%
|
||||
|
||||
### 4. Compliance Requirements
|
||||
**Ask for:** "HIPAA-compliant architecture for healthcare application"
|
||||
|
||||
**You'll get:**
|
||||
- Compliant service selection (BAA-eligible only)
|
||||
- Encryption configuration (at rest and in transit)
|
||||
- Audit logging setup (CloudTrail, Config)
|
||||
- Network isolation (VPC private subnets)
|
||||
- Access control (IAM policies)
|
||||
|
||||
### 5. Global Deployment
|
||||
**Ask for:** "Multi-region architecture for global users"
|
||||
|
||||
**You'll get:**
|
||||
- Route 53 geolocation routing
|
||||
- DynamoDB Global Tables or Aurora Global
|
||||
- CloudFront edge caching
|
||||
- Disaster recovery and failover
|
||||
- Cross-region cost considerations
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### For Using Generated Templates:
|
||||
|
||||
**AWS Account**:
|
||||
- Active AWS account with appropriate permissions
|
||||
- IAM user or role with admin access (for initial setup)
|
||||
- Billing alerts enabled
|
||||
|
||||
**Tools Required**:
|
||||
```bash
# AWS CLI
brew install awscli  # macOS
aws configure

# For CloudFormation
# (AWS CLI includes CloudFormation)

# For AWS CDK
npm install -g aws-cdk
cdk --version

# For Terraform
brew install terraform  # macOS
terraform --version
```
|
||||
|
||||
**Knowledge**:
|
||||
- Basic AWS concepts (VPC, IAM, EC2, S3)
|
||||
- Command line proficiency
|
||||
- Git for version control
|
||||
|
||||
## Deployment Steps
|
||||
|
||||
### CloudFormation:
|
||||
```bash
# Validate template
aws cloudformation validate-template --template-body file://template.yaml

# Deploy stack
aws cloudformation create-stack \
  --stack-name my-app-stack \
  --template-body file://template.yaml \
  --parameters ParameterKey=Environment,ParameterValue=dev \
  --capabilities CAPABILITY_IAM

# Monitor deployment
aws cloudformation describe-stacks --stack-name my-app-stack
```
|
||||
|
||||
### AWS CDK:
|
||||
```bash
# Initialize project
cdk init app --language=typescript

# Install dependencies
npm install

# Deploy stack
cdk deploy

# View outputs
# (the CDK CLI has no `outputs` subcommand; stack outputs are printed at the
# end of `cdk deploy`, or can be written to a file)
cdk deploy --outputs-file outputs.json
```
|
||||
|
||||
### Terraform:
|
||||
```bash
# Initialize
terraform init

# Plan deployment
terraform plan

# Apply changes
terraform apply

# View outputs
terraform output
```
|
||||
|
||||
## Best Practices Tips
|
||||
|
||||
### 1. Start Small, Scale Gradually
|
||||
- Begin with serverless to minimize costs
|
||||
- Add managed services as you grow
|
||||
- Avoid over-engineering for hypothetical scale
|
||||
|
||||
### 2. Enable Monitoring from Day One
|
||||
- Set up CloudWatch dashboards
|
||||
- Configure alarms for critical metrics
|
||||
- Enable AWS Cost Explorer
|
||||
- Create budget alerts
|
||||
|
||||
### 3. Infrastructure as Code Always
|
||||
- Version control all infrastructure
|
||||
- Use separate accounts for dev/staging/prod
|
||||
- Implement CI/CD for infrastructure changes
|
||||
- Document architecture decisions
|
||||
|
||||
### 4. Security First
|
||||
- Enable MFA on root and admin accounts
|
||||
- Use IAM roles, never long-term credentials
|
||||
- Encrypt everything (S3, RDS, EBS)
|
||||
- Regular security audits (AWS Security Hub)
|
||||
|
||||
### 5. Cost Management
|
||||
- Tag all resources for cost allocation
|
||||
- Review bills weekly
|
||||
- Delete unused resources promptly
|
||||
- Use Savings Plans for predictable workloads
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues:
|
||||
|
||||
**"Access Denied" errors:**
|
||||
- Check IAM permissions for your user/role
|
||||
- Ensure service-linked roles exist
|
||||
- Verify resource policies (S3, KMS)
|
||||
|
||||
**Unexpectedly high costs:**
|
||||
- Check for undeleted resources (EC2, RDS snapshots)
|
||||
- Review NAT Gateway data transfer
|
||||
- Check CloudWatch Logs retention
|
||||
- Look for unauthorized usage
|
||||
|
||||
**Deployment failures:**
|
||||
- Validate templates before deploying
|
||||
- Check service quotas (limits)
|
||||
- Verify VPC/subnet configuration
|
||||
- Review CloudFormation/Terraform error messages
|
||||
|
||||
**Performance issues:**
|
||||
- Enable CloudWatch metrics and X-Ray
|
||||
- Check database connection pooling
|
||||
- Review Lambda cold starts (use provisioned concurrency)
|
||||
- Optimize database queries and indexes
|
||||
|
||||
## Additional Resources
|
||||
|
||||
- **AWS Well-Architected Framework**: https://aws.amazon.com/architecture/well-architected/
|
||||
- **AWS Architecture Center**: https://aws.amazon.com/architecture/
|
||||
- **Serverless Land**: https://serverlessland.com/
|
||||
- **AWS Pricing Calculator**: https://calculator.aws/
|
||||
- **AWS Free Tier**: https://aws.amazon.com/free/
|
||||
- **AWS Startups**: https://aws.amazon.com/startups/
|
||||
|
||||
## Tips for Best Results
|
||||
|
||||
1. **Be specific** about scale and budget constraints
|
||||
2. **Mention team experience** level with AWS
|
||||
3. **State compliance requirements** upfront (GDPR, HIPAA, etc.)
|
||||
4. **Describe current setup** if migrating from existing infrastructure
|
||||
5. **Ask for alternatives** if you need options to compare
|
||||
6. **Request explanations** for WHY certain services are recommended
|
||||
7. **Specify IaC preference** (CloudFormation, CDK, or Terraform)
|
||||
|
||||
## Support
|
||||
|
||||
For AWS-specific questions:
|
||||
- AWS Support Plans (Developer, Business, Enterprise)
|
||||
- AWS re:Post community forum
|
||||
- AWS Documentation: https://docs.aws.amazon.com/
|
||||
- AWS Training: https://aws.amazon.com/training/
|
||||
@@ -1,344 +1,306 @@
|
||||
---
|
||||
name: aws-solution-architect
|
||||
description: Expert AWS solution architecture for startups focusing on serverless, scalable, and cost-effective cloud infrastructure with modern DevOps practices and infrastructure-as-code
|
||||
description: Design AWS architectures for startups using serverless patterns and IaC templates. Use when asked to design serverless architecture, create CloudFormation templates, optimize AWS costs, set up CI/CD pipelines, or migrate to AWS. Covers Lambda, API Gateway, DynamoDB, ECS, Aurora, and cost optimization.
|
||||
---
|
||||
|
||||
# AWS Solution Architect for Startups
|
||||
# AWS Solution Architect
|
||||
|
||||
This skill provides comprehensive AWS architecture design expertise for startup companies, emphasizing serverless technologies, scalability, cost optimization, and modern cloud-native patterns.
|
||||
Design scalable, cost-effective AWS architectures for startups with infrastructure-as-code templates.
|
||||
|
||||
## Capabilities
|
||||
---
|
||||
|
||||
- **Serverless Architecture Design**: Lambda, API Gateway, DynamoDB, EventBridge, Step Functions, AppSync
|
||||
- **Infrastructure as Code**: CloudFormation, CDK (Cloud Development Kit), Terraform templates
|
||||
- **Scalable Application Architecture**: Auto-scaling, load balancing, multi-region deployment
|
||||
- **Data & Storage Solutions**: S3, RDS Aurora Serverless, DynamoDB, ElastiCache, Neptune
|
||||
- **Event-Driven Architecture**: EventBridge, SNS, SQS, Kinesis, Lambda triggers
|
||||
- **API Design**: API Gateway (REST & WebSocket), AppSync (GraphQL), rate limiting, authentication
|
||||
- **Authentication & Authorization**: Cognito, IAM, fine-grained access control, federated identity
|
||||
- **CI/CD Pipelines**: CodePipeline, CodeBuild, CodeDeploy, GitHub Actions integration
|
||||
- **Monitoring & Observability**: CloudWatch, X-Ray, CloudTrail, alarms, dashboards
|
||||
- **Cost Optimization**: Reserved instances, Savings Plans, right-sizing, budget alerts
|
||||
- **Security Best Practices**: VPC design, security groups, WAF, Secrets Manager, encryption
|
||||
- **Microservices Patterns**: Service mesh, API composition, saga patterns, CQRS
|
||||
- **Container Orchestration**: ECS Fargate, EKS (Kubernetes), App Runner
|
||||
- **Content Delivery**: CloudFront, edge locations, origin shield, caching strategies
|
||||
- **Database Migration**: DMS, schema conversion, zero-downtime migrations
|
||||
## Table of Contents
|
||||
|
||||
- [Trigger Terms](#trigger-terms)
|
||||
- [Workflow](#workflow)
|
||||
- [Tools](#tools)
|
||||
- [Quick Start](#quick-start)
|
||||
- [Input Requirements](#input-requirements)
|
||||
- [Output Formats](#output-formats)
|
||||
|
||||
---
|
||||
|
||||
## Trigger Terms
|
||||
|
||||
Use this skill when you encounter:
|
||||
|
||||
| Category | Terms |
|----------|-------|
| **Architecture Design** | serverless architecture, AWS architecture, cloud design, microservices, three-tier |
| **IaC Generation** | CloudFormation, CDK, Terraform, infrastructure as code, deploy template |
| **Serverless** | Lambda, API Gateway, DynamoDB, Step Functions, EventBridge, AppSync |
| **Containers** | ECS, Fargate, EKS, container orchestration, Docker on AWS |
| **Cost Optimization** | reduce AWS costs, optimize spending, right-sizing, Savings Plans |
| **Database** | Aurora, RDS, DynamoDB design, database migration, data modeling |
| **Security** | IAM policies, VPC design, encryption, Cognito, WAF |
| **CI/CD** | CodePipeline, CodeBuild, CodeDeploy, GitHub Actions AWS |
| **Monitoring** | CloudWatch, X-Ray, observability, alarms, dashboards |
| **Migration** | migrate to AWS, lift and shift, replatform, DMS |
|
||||
|
||||
---
|
||||
|
||||
## Workflow
|
||||
|
||||
### Step 1: Gather Requirements
|
||||
|
||||
Collect application specifications:
|
||||
|
||||
```
- Application type (web app, mobile backend, data pipeline, SaaS)
- Expected users and requests per second
- Budget constraints (monthly spend limit)
- Team size and AWS experience level
- Compliance requirements (GDPR, HIPAA, SOC 2)
- Availability requirements (SLA, RPO/RTO)
```
|
||||
|
||||
### Step 2: Design Architecture
|
||||
|
||||
Run the architecture designer to get pattern recommendations:
|
||||
|
||||
```bash
python scripts/architecture_designer.py --input requirements.json
```
|
||||
|
||||
Select from recommended patterns:
|
||||
- **Serverless Web**: S3 + CloudFront + API Gateway + Lambda + DynamoDB
|
||||
- **Event-Driven Microservices**: EventBridge + Lambda + SQS + Step Functions
|
||||
- **Three-Tier**: ALB + ECS Fargate + Aurora + ElastiCache
|
||||
- **GraphQL Backend**: AppSync + Lambda + DynamoDB + Cognito
|
||||
|
||||
See `references/architecture_patterns.md` for detailed pattern specifications.
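
As a rough illustration of how requirements might map onto the four patterns listed in this step, here is a minimal Python sketch. The function `recommend_pattern`, its parameters, and the ordering of the rules are assumptions made for this example; they are not taken from `scripts/architecture_designer.py`, whose actual logic is not shown here.

```python
# Hypothetical rule-of-thumb mapping from requirements to the pattern names
# listed in Step 2. This is NOT the logic of scripts/architecture_designer.py.

PATTERNS = {
    "serverless_web": "S3 + CloudFront + API Gateway + Lambda + DynamoDB",
    "event_driven": "EventBridge + Lambda + SQS + Step Functions",
    "three_tier": "ALB + ECS Fargate + Aurora + ElastiCache",
    "graphql": "AppSync + Lambda + DynamoDB + Cognito",
}


def recommend_pattern(app_type: str,
                      needs_graphql: bool = False,
                      async_workflows: bool = False,
                      needs_relational_db: bool = False) -> str:
    """Return a key of PATTERNS using simple, assumed heuristics."""
    if needs_graphql:
        return "graphql"
    if async_workflows or app_type == "microservices":
        return "event_driven"
    if needs_relational_db:
        return "three_tier"
    # Default to the cheapest pattern to operate at low traffic.
    return "serverless_web"


if __name__ == "__main__":
    key = recommend_pattern("mobile_backend", needs_graphql=True)
    print(key, "->", PATTERNS[key])
```

Treat the result as a starting point for discussion; the trade-offs for each pattern still need to be weighed against budget and team experience.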
|
||||
|
||||
### Step 3: Generate IaC Templates
|
||||
|
||||
Create infrastructure-as-code for the selected pattern:
|
||||
|
||||
```bash
# Serverless stack (CloudFormation)
python scripts/serverless_stack.py --app-name my-app --region us-east-1

# Output: CloudFormation YAML template ready to deploy
```
|
||||
|
||||
### Step 4: Review Costs
|
||||
|
||||
Analyze estimated costs and optimization opportunities:
|
||||
|
||||
```bash
python scripts/cost_optimizer.py --resources current_setup.json --monthly-spend 2000
```
|
||||
|
||||
Output includes:
|
||||
- Monthly cost breakdown by service
|
||||
- Right-sizing recommendations
|
||||
- Savings Plans opportunities
|
||||
- Potential monthly savings
|
||||
|
||||
### Step 5: Deploy
|
||||
|
||||
Deploy the generated infrastructure:
|
||||
|
||||
```bash
# CloudFormation
aws cloudformation create-stack \
  --stack-name my-app-stack \
  --template-body file://template.yaml \
  --capabilities CAPABILITY_IAM

# CDK
cdk deploy

# Terraform
terraform init && terraform apply
```
|
||||
|
||||
### Step 6: Validate
|
||||
|
||||
Verify deployment and set up monitoring:
|
||||
|
||||
```bash
# Check stack status
aws cloudformation describe-stacks --stack-name my-app-stack

# Set up CloudWatch alarms
aws cloudwatch put-metric-alarm --alarm-name high-errors ...
```
|
||||
|
||||
---
|
||||
|
||||
## Tools
|
||||
|
||||
### architecture_designer.py
|
||||
|
||||
Generates architecture patterns based on requirements.
|
||||
|
||||
```bash
python scripts/architecture_designer.py --input requirements.json --output design.json
```
|
||||
|
||||
**Input:** JSON with app type, scale, budget, compliance needs
|
||||
**Output:** Recommended pattern, service stack, cost estimate, pros/cons
|
||||
|
||||
### serverless_stack.py
|
||||
|
||||
Creates serverless CloudFormation templates.
|
||||
|
||||
```bash
python scripts/serverless_stack.py --app-name my-app --region us-east-1
```
|
||||
|
||||
**Output:** Production-ready CloudFormation YAML with:
|
||||
- API Gateway + Lambda
|
||||
- DynamoDB table
|
||||
- Cognito user pool
|
||||
- IAM roles with least privilege
|
||||
- CloudWatch logging
|
||||
|
||||
### cost_optimizer.py
|
||||
|
||||
Analyzes costs and recommends optimizations.
|
||||
|
||||
```bash
python scripts/cost_optimizer.py --resources inventory.json --monthly-spend 5000
```
|
||||
|
||||
**Output:** Recommendations for:
|
||||
- Idle resource removal
|
||||
- Instance right-sizing
|
||||
- Reserved capacity purchases
|
||||
- Storage tier transitions
|
||||
- NAT Gateway alternatives
|
||||
|
||||
---
|
||||
|
||||
## Quick Start
|
||||
|
||||
### MVP Architecture (< $100/month)
|
||||
|
||||
```
Ask: "Design a serverless MVP backend for a mobile app with 1000 users"

Result:
- Lambda + API Gateway for API
- DynamoDB pay-per-request for data
- Cognito for authentication
- S3 + CloudFront for static assets
- Estimated: $20-50/month
```
|
||||
|
||||
### Scaling Architecture ($500-2000/month)
|
||||
|
||||
```
Ask: "Design a scalable architecture for a SaaS platform with 50k users"

Result:
- ECS Fargate for containerized API
- Aurora Serverless for relational data
- ElastiCache for session caching
- CloudFront for CDN
- CodePipeline for CI/CD
- Multi-AZ deployment
```
|
||||
|
||||
### Cost Optimization
|
||||
|
||||
```
Ask: "Optimize my AWS setup to reduce costs by 30%. Current spend: $3000/month"

Provide: Current resource inventory (EC2, RDS, S3, etc.)

Result:
- Idle resource identification
- Right-sizing recommendations
- Savings Plans analysis
- Storage lifecycle policies
- Target savings: $900/month
```
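
The $900/month target in this example is simply 30% of the $3000 monthly spend. A minimal Python sketch of that arithmetic follows; the helper name `target_savings` is made up for illustration and is not part of `scripts/cost_optimizer.py`.

```python
def target_savings(monthly_spend_usd: float, reduction_pct: float) -> float:
    """Monthly savings needed to reach a percentage reduction (illustrative helper)."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    return monthly_spend_usd * reduction_pct / 100


if __name__ == "__main__":
    spend = 3000
    savings = target_savings(spend, 30)
    # Savings target and the resulting monthly bill if the target is met.
    print(f"Target savings: ${savings:.0f}/month, new spend: ${spend - savings:.0f}/month")
```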
|
||||
|
||||
### IaC Generation
|
||||
|
||||
```
Ask: "Generate CloudFormation for a three-tier web app with auto-scaling"

Result:
- VPC with public/private subnets
- ALB with HTTPS
- ECS Fargate with auto-scaling
- Aurora with read replicas
- Security groups and IAM roles
```
|
||||
|
||||
---
|
||||
|
||||
## Input Requirements
|
||||
|
||||
Architecture design requires:
|
||||
- **Application type**: Web app, mobile backend, data pipeline, microservices, SaaS platform
|
||||
- **Traffic expectations**: Users/day, requests/second, geographic distribution
|
||||
- **Data requirements**: Storage needs, database type, backup/retention policies
|
||||
- **Budget constraints**: Monthly spend limits, cost optimization priorities
|
||||
- **Team size & expertise**: Developer count, AWS experience level, DevOps maturity
|
||||
- **Compliance needs**: GDPR, HIPAA, SOC 2, PCI-DSS, data residency
|
||||
- **Availability requirements**: SLA targets, uptime goals, disaster recovery RPO/RTO
|
||||
Provide these details for architecture design:
|
||||
|
||||
Formats accepted:
|
||||
- Text description of application requirements
|
||||
- JSON with structured architecture specifications
|
||||
- Existing architecture diagrams or documentation
|
||||
- Current AWS resource inventory (for optimization)
|
||||
| Requirement | Description | Example |
|-------------|-------------|---------|
| Application type | What you're building | SaaS platform, mobile backend |
| Expected scale | Users, requests/sec | 10k users, 100 RPS |
| Budget | Monthly AWS limit | $500/month max |
| Team context | Size, AWS experience | 3 devs, intermediate |
| Compliance | Regulatory needs | HIPAA, GDPR, SOC 2 |
| Availability | Uptime requirements | 99.9% SLA, 1hr RPO |
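
To make an availability target such as the 99.9% SLA in the table concrete, it can help to convert it into an allowed-downtime budget. The sketch below does this in Python; the function name and the 30-day month are assumptions chosen for the example.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` at the given SLA percentage."""
    if not 0 < sla_percent <= 100:
        raise ValueError("sla_percent must be in (0, 100]")
    total_minutes = days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


if __name__ == "__main__":
    # A 99.9% SLA over a 30-day month leaves roughly 43 minutes of downtime.
    print(round(allowed_downtime_minutes(99.9), 1))
```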
|
||||
|
||||
**JSON Format:**
|
||||
|
||||
```json
{
  "application_type": "saas_platform",
  "expected_users": 10000,
  "requests_per_second": 100,
  "budget_monthly_usd": 500,
  "team_size": 3,
  "aws_experience": "intermediate",
  "compliance": ["SOC2"],
  "availability_sla": "99.9%"
}
```
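
Before passing a requirements file to the scripts, it may be worth checking that it has the fields shown in the JSON example. The following Python sketch is one way to do that; treating every field as required, and the type expected for each, are assumptions inferred from the example rather than a documented schema.

```python
import json

# Field names come from the JSON example; which fields are mandatory and
# their types are assumptions for this sketch.
REQUIRED_FIELDS = {
    "application_type": str,
    "expected_users": int,
    "requests_per_second": int,
    "budget_monthly_usd": (int, float),
    "team_size": int,
    "aws_experience": str,
    "compliance": list,
    "availability_sla": str,
}

SAMPLE = """{
  "application_type": "saas_platform",
  "expected_users": 10000,
  "requests_per_second": 100,
  "budget_monthly_usd": 500,
  "team_size": 3,
  "aws_experience": "intermediate",
  "compliance": ["SOC2"],
  "availability_sla": "99.9%"
}"""


def validate_requirements(raw: str) -> dict:
    """Parse a requirements JSON string and raise ValueError listing any problems."""
    data = json.loads(raw)
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            problems.append(f"wrong type for {field}")
    if problems:
        raise ValueError("; ".join(problems))
    return data


if __name__ == "__main__":
    print(validate_requirements(SAMPLE)["application_type"])
```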
|
||||
|
||||
---
|
||||
|
||||
## Output Formats
|
||||
|
||||
Results include:
|
||||
- **Architecture diagrams**: Visual representations using draw.io or Lucidchart format
|
||||
- **CloudFormation/CDK templates**: Infrastructure as Code (IaC) ready to deploy
|
||||
- **Terraform configurations**: Multi-cloud compatible infrastructure definitions
|
||||
- **Cost estimates**: Detailed monthly cost breakdown with optimization suggestions
|
||||
- **Security assessment**: Best practices checklist, compliance validation
|
||||
- **Deployment guides**: Step-by-step implementation instructions
|
||||
- **Runbooks**: Operational procedures, troubleshooting guides, disaster recovery plans
|
||||
- **Migration strategies**: Phased migration plans, rollback procedures
|
||||
### Architecture Design
|
||||
|
||||
## How to Use
|
||||
- Pattern recommendation with rationale
|
||||
- Service stack diagram (ASCII)
|
||||
- Configuration specifications
|
||||
- Monthly cost estimate
|
||||
- Scaling characteristics
|
||||
- Trade-offs and limitations
|
||||
|
||||
"Design a serverless API backend for a mobile app with 100k users using Lambda and DynamoDB"
|
||||
"Create a cost-optimized architecture for a SaaS platform with multi-tenancy"
|
||||
"Generate CloudFormation template for a three-tier web application with auto-scaling"
|
||||
"Design event-driven microservices architecture using EventBridge and Step Functions"
|
||||
"Optimize my current AWS setup to reduce costs by 30%"
|
||||
### IaC Templates
|
||||
|
||||
## Scripts
|
||||
- **CloudFormation YAML**: Production-ready SAM/CFN templates
|
||||
- **CDK TypeScript**: Type-safe infrastructure code
|
||||
- **Terraform HCL**: Multi-cloud compatible configs
|
||||
|
||||
- `architecture_designer.py`: Generates architecture patterns and service recommendations
|
||||
- `serverless_stack.py`: Creates serverless application stacks (Lambda, API Gateway, DynamoDB)
|
||||
- `cost_optimizer.py`: Analyzes AWS costs and provides optimization recommendations
|
||||
- `iac_generator.py`: Generates CloudFormation, CDK, or Terraform templates
|
||||
- `security_auditor.py`: AWS security best practices validation and compliance checks
|
||||
### Cost Analysis
|
||||
|
||||
## Architecture Patterns
|
||||
- Current spend breakdown
|
||||
- Optimization recommendations with savings
|
||||
- Priority action list (high/medium/low)
|
||||
- Implementation checklist
|
||||
|
||||
### 1. Serverless Web Application
|
||||
**Use Case**: SaaS platforms, mobile backends, low-traffic websites
|
||||
---
|
||||
|
||||
**Stack**:
|
||||
- **Frontend**: S3 + CloudFront (static hosting)
|
||||
- **API**: API Gateway + Lambda
|
||||
- **Database**: DynamoDB or Aurora Serverless
|
||||
- **Auth**: Cognito
|
||||
- **CI/CD**: Amplify or CodePipeline
|
||||
## Reference Documentation
|
||||
|
||||
**Benefits**: Zero server management, pay-per-use, auto-scaling, low operational overhead
|
||||
| Document | Contents |
|----------|----------|
| `references/architecture_patterns.md` | 6 patterns: serverless, microservices, three-tier, data processing, GraphQL, multi-region |
| `references/service_selection.md` | Decision matrices for compute, database, storage, messaging |
| `references/best_practices.md` | Serverless design, cost optimization, security hardening, scalability |
|
||||
|
||||
**Cost**: $50-500/month for small to medium traffic
|
||||
|
||||
### 2. Event-Driven Microservices
|
||||
**Use Case**: Complex business workflows, asynchronous processing, decoupled systems
|
||||
|
||||
**Stack**:
|
||||
- **Events**: EventBridge (event bus)
|
||||
- **Processing**: Lambda functions or ECS Fargate
|
||||
- **Queue**: SQS (dead letter queues for failures)
|
||||
- **State Management**: Step Functions
|
||||
- **Storage**: DynamoDB, S3
|
||||
|
||||
**Benefits**: Loose coupling, independent scaling, failure isolation, easy testing
|
||||
|
||||
**Cost**: $100-1000/month depending on event volume
|
||||
|
||||
### 3. Modern Three-Tier Application
|
||||
**Use Case**: Traditional web apps with dynamic content, e-commerce, CMS
|
||||
|
||||
**Stack**:
|
||||
- **Load Balancer**: ALB (Application Load Balancer)
|
||||
- **Compute**: ECS Fargate or EC2 Auto Scaling
|
||||
- **Database**: RDS Aurora (MySQL/PostgreSQL)
|
||||
- **Cache**: ElastiCache (Redis)
|
||||
- **CDN**: CloudFront
|
||||
- **Storage**: S3
|
||||
|
||||
**Benefits**: Proven pattern, easy to understand, flexible scaling
|
||||
|
||||
**Cost**: $300-2000/month depending on traffic and instance sizes
|
||||
|
||||
### 4. Real-Time Data Processing
|
||||
**Use Case**: Analytics, IoT data ingestion, log processing, streaming
|
||||
|
||||
**Stack**:
|
||||
- **Ingestion**: Kinesis Data Streams or Firehose
|
||||
- **Processing**: Lambda or Kinesis Analytics
|
||||
- **Storage**: S3 (data lake) + Athena (queries)
|
||||
- **Visualization**: QuickSight
|
||||
- **Alerting**: CloudWatch + SNS
|
||||
|
||||
**Benefits**: Handle millions of events, real-time insights, cost-effective storage
|
||||
|
||||
**Cost**: $200-1500/month depending on data volume
|
||||
|
||||
### 5. GraphQL API Backend
|
||||
**Use Case**: Mobile apps, single-page applications, flexible data queries
|
||||
|
||||
**Stack**:
|
||||
- **API**: AppSync (managed GraphQL)
|
||||
- **Resolvers**: Lambda or direct DynamoDB integration
|
||||
- **Database**: DynamoDB
|
||||
- **Real-time**: AppSync subscriptions (WebSocket)
|
||||
- **Auth**: Cognito or API keys
|
||||
|
||||
**Benefits**: Single endpoint, reduce over/under-fetching, real-time subscriptions
|
||||
|
||||
**Cost**: $50-400/month for moderate usage
|
||||
|
||||
### 6. Multi-Region High Availability
|
||||
**Use Case**: Global applications, disaster recovery, compliance requirements
|
||||
|
||||
**Stack**:
|
||||
- **DNS**: Route 53 (geolocation routing)
|
||||
- **CDN**: CloudFront with multiple origins
|
||||
- **Compute**: Multi-region Lambda or ECS
|
||||
- **Database**: DynamoDB Global Tables or Aurora Global Database
|
||||
- **Replication**: S3 cross-region replication
|
||||
|
||||
**Benefits**: Low latency globally, disaster recovery, data sovereignty
|
||||
|
||||
**Cost**: 1.5-2x single region costs
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Serverless Design Principles
|
||||
1. **Stateless functions** - Store state in DynamoDB, S3, or ElastiCache
|
||||
2. **Idempotency** - Handle retries gracefully, use unique request IDs
|
||||
3. **Cold start optimization** - Use provisioned concurrency for critical paths, optimize package size
|
||||
4. **Timeout management** - Set appropriate timeouts, use Step Functions for long processes
|
||||
5. **Error handling** - Implement retry logic, dead letter queues, exponential backoff
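
The idempotency and error-handling principles above can be sketched in a few lines of Python. This is an illustration only: the helper names are made up, the in-memory dict stands in for a durable store such as DynamoDB, and the retry uses exponential backoff with full jitter as one common variant.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1,
                       max_delay=5.0, sleep=time.sleep):
    """Call `operation` until it succeeds, backing off exponentially between tries."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # give up; a dead letter queue would capture this in practice
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))  # full jitter spreads out retries


def handle_once(request_id, handler, processed):
    """Run `handler` only the first time `request_id` is seen.

    `processed` is a dict used here as a stand-in for a durable store.
    """
    if request_id in processed:
        return processed[request_id]
    result = handler()
    processed[request_id] = result
    return result


if __name__ == "__main__":
    store = {}
    print(handle_once("req-1", lambda: "charged", store))
    print(handle_once("req-1", lambda: "charged twice", store))  # returns cached result
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.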
|
||||
|
||||
### Cost Optimization
|
||||
1. **Right-sizing** - Start small, monitor metrics, scale based on actual usage
|
||||
2. **Reserved capacity** - Use Savings Plans or Reserved Instances for predictable workloads
|
||||
3. **S3 lifecycle policies** - Transition to cheaper storage tiers (IA, Glacier)
|
||||
4. **Lambda memory optimization** - Test different memory settings for cost/performance balance
|
||||
5. **CloudWatch log retention** - Set appropriate retention periods (7-30 days for most)
|
||||
6. **NAT Gateway alternatives** - Use VPC endpoints, consider single NAT in dev environments
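
As an example of the S3 lifecycle item above, the Python sketch below builds a lifecycle configuration that moves objects to Standard-IA and then Glacier. The function name, rule ID, and day counts are assumptions for illustration; the dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration` (`LifecycleConfiguration` argument), but no AWS call is made here.

```python
def lifecycle_configuration(prefix: str = "", ia_days: int = 30,
                            glacier_days: int = 90) -> dict:
    """Build an S3 lifecycle configuration dict (illustrative values)."""
    if ia_days < 30:
        # S3 requires objects to be at least 30 days old before a Standard-IA transition.
        raise ValueError("ia_days must be at least 30")
    if glacier_days <= ia_days:
        raise ValueError("glacier_days must be greater than ia_days")
    return {
        "Rules": [
            {
                "ID": "tier-down-old-objects",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    import json
    print(json.dumps(lifecycle_configuration(prefix="logs/"), indent=2))
```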
|
||||
|
||||
### Security Hardening
|
||||
1. **Principle of least privilege** - IAM roles with minimal permissions
|
||||
2. **Encryption everywhere** - At rest (KMS) and in transit (TLS/SSL)
|
||||
3. **Network isolation** - Private subnets, security groups, NACLs
|
||||
4. **Secrets management** - Use Secrets Manager or Parameter Store, never hardcode
|
||||
5. **API protection** - WAF rules, rate limiting, API keys, OAuth2
|
||||
6. **Audit logging** - CloudTrail for API calls, VPC Flow Logs for network traffic
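
To illustrate the least-privilege item above, here is a small Python sketch that builds an IAM policy document scoped to a single DynamoDB table. The function name, the chosen actions, and the example ARN (with a placeholder account ID) are assumptions for illustration; adjust them to what the workload actually needs.

```python
import json


def dynamodb_table_policy(table_arn: str, read_only: bool = False) -> dict:
    """Return an IAM policy document limited to one table and a few actions."""
    actions = ["dynamodb:GetItem", "dynamodb:Query"]
    if not read_only:
        actions += ["dynamodb:PutItem", "dynamodb:UpdateItem"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                "Resource": table_arn,  # a specific table, not "*"
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    arn = "arn:aws:dynamodb:us-east-1:123456789012:table/my-app-table"
    print(json.dumps(dynamodb_table_policy(arn, read_only=True), indent=2))
```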
|
||||
|
||||
### Scalability Design
|
||||
1. **Horizontal over vertical** - Scale out with more small instances vs. larger instances
|
||||
2. **Database sharding** - Partition data by tenant, geography, or time
|
||||
3. **Read replicas** - Offload read traffic from primary database
|
||||
4. **Caching layers** - CloudFront (edge), ElastiCache (application), DAX (DynamoDB)
|
||||
5. **Async processing** - Use queues (SQS) for non-critical operations
|
||||
6. **Auto-scaling policies** - Target tracking (CPU, requests) vs. step scaling
|
||||
|
||||
### DevOps & Reliability
|
||||
1. **Infrastructure as Code** - Version control, peer review, automated testing
|
||||
2. **Blue/Green deployments** - Zero-downtime releases, instant rollback
|
||||
3. **Canary releases** - Test new versions with small traffic percentage
|
||||
4. **Health checks** - Application-level health endpoints, graceful degradation
|
||||
5. **Chaos engineering** - Test failure scenarios, validate recovery procedures
|
||||
6. **Monitoring & alerting** - Set up CloudWatch alarms for critical metrics
|
||||
|
||||
## Service Selection Guide

### Compute
- **Lambda**: Event-driven, short-duration tasks (<15 min), variable traffic
- **Fargate**: Containerized apps, long-running processes, predictable traffic
- **EC2**: Custom configurations, GPU/FPGA needs, Windows apps
- **App Runner**: Simple container deployment from source code

### Database
- **DynamoDB**: Key-value, document store, serverless, single-digit ms latency
- **Aurora Serverless**: Relational DB, variable workloads, auto-scaling
- **Aurora Standard**: High-performance relational, predictable traffic
- **RDS**: Traditional databases (MySQL, PostgreSQL, MariaDB, SQL Server)
- **DocumentDB**: MongoDB-compatible, document store
- **Neptune**: Graph database for connected data
- **Timestream**: Time-series data, IoT metrics

### Storage
- **S3 Standard**: Frequent access, low latency
- **S3 Intelligent-Tiering**: Automatic cost optimization
- **S3 IA (Infrequent Access)**: Backups, archives (30-day minimum)
- **S3 Glacier**: Long-term archives, compliance
- **EFS**: Network file system, shared storage across instances
- **EBS**: Block storage for EC2, high IOPS

### Messaging & Events
- **EventBridge**: Event bus, loosely coupled microservices
- **SNS**: Pub/sub, fan-out notifications
- **SQS**: Message queuing, decoupling, buffering
- **Kinesis**: Real-time streaming data, analytics
- **MQ**: Managed message brokers (RabbitMQ, ActiveMQ)

### API & Integration
- **API Gateway**: REST APIs, WebSocket, throttling, caching
- **AppSync**: GraphQL APIs, real-time subscriptions
- **AppFlow**: SaaS integration (Salesforce, Slack, etc.)
- **Step Functions**: Workflow orchestration, state machines

## Startup-Specific Considerations

### MVP (Minimum Viable Product) Architecture
**Goal**: Launch fast, minimal infrastructure

**Recommended**:
- Amplify (full-stack deployment)
- Lambda + API Gateway + DynamoDB
- Cognito for auth
- CloudFront + S3 for frontend

**Cost**: $20-100/month
**Setup time**: 1-3 days

### Growth Stage (Scaling to 10k-100k users)
**Goal**: Handle growth, maintain cost efficiency

**Add**:
- ElastiCache for caching
- Aurora Serverless for complex queries
- CloudWatch dashboards and alarms
- CI/CD pipeline (CodePipeline)
- Multi-AZ deployment

**Cost**: $500-2000/month
**Migration time**: 1-2 weeks

### Scale-Up (100k+ users, Series A+)
**Goal**: Reliability, observability, global reach

**Add**:
- Multi-region deployment
- DynamoDB Global Tables
- Advanced monitoring (X-Ray, third-party APM)
- WAF and Shield for DDoS protection
- Dedicated support plan
- Reserved instances/Savings Plans

**Cost**: $3000-10000/month
**Migration time**: 1-3 months

## Common Pitfalls to Avoid

### Technical Debt
- **Over-engineering early** - Don't build for 10M users when you have 100
- **Under-monitoring** - Set up basic monitoring from day one
- **Ignoring costs** - Enable Cost Explorer and billing alerts immediately
- **Single region dependency** - Plan for multi-region from start

### Security Mistakes
- **Public S3 buckets** - Use bucket policies, block public access
- **Overly permissive IAM** - Avoid "*" permissions, use specific resources
- **Hardcoded credentials** - Use IAM roles, Secrets Manager
- **Unencrypted data** - Enable encryption by default

### Performance Issues
- **No caching** - Add CloudFront, ElastiCache early
- **Inefficient queries** - Use indexes, avoid scans in DynamoDB
- **Large Lambda packages** - Use layers, minimize dependencies
- **N+1 queries** - Implement DataLoader pattern, batch operations

### Cost Surprises
- **Undeleted resources** - Tag everything, review regularly
- **Data transfer costs** - Keep traffic within same AZ/region when possible
- **NAT Gateway charges** - Use VPC endpoints for AWS services
- **CloudWatch Logs accumulation** - Set retention policies

## Compliance & Governance

### Data Residency
- Use specific regions (eu-west-1 for GDPR)
- Enable S3 bucket replication restrictions
- Configure Route 53 geolocation routing

### HIPAA Compliance
- Use BAA-eligible services only
- Enable encryption at rest and in transit
- Implement audit logging (CloudTrail)
- Configure VPC with private subnets

### SOC 2 / ISO 27001
- Enable AWS Config for compliance rules
- Use AWS Audit Manager
- Implement least privilege access
- Regular security assessments

---

## Limitations

- **Lambda limitations**: 15-minute execution limit, 10GB memory max, cold start latency
- **API Gateway limits**: 29-second timeout, 10MB payload size
- **DynamoDB limits**: 400KB item size, eventually consistent reads by default
- **Regional availability**: Not all services available in all regions
- **Vendor lock-in**: Some serverless services are AWS-specific (consider abstraction layers)
- **Learning curve**: Requires AWS expertise, DevOps knowledge
- **Debugging complexity**: Distributed systems harder to troubleshoot than monoliths

## Helpful Resources

- **AWS Well-Architected Framework**: https://aws.amazon.com/architecture/well-architected/
- **AWS Architecture Center**: https://aws.amazon.com/architecture/
- **Serverless Land**: https://serverlessland.com/
- **AWS Pricing Calculator**: https://calculator.aws/
- **AWS Cost Explorer**: Track and analyze spending
- **AWS Trusted Advisor**: Automated best practice checks
- **CloudFormation Templates**: https://github.com/awslabs/aws-cloudformation-templates
- **AWS CDK Examples**: https://github.com/aws-samples/aws-cdk-examples
- Lambda: 15-minute execution, 10GB memory max
- API Gateway: 29-second timeout, 10MB payload
- DynamoDB: 400KB item size, eventually consistent by default
- Regional availability varies by service
- Some services have AWS-specific lock-in

Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,535 @@
# AWS Architecture Patterns for Startups

Reference guide for selecting the right AWS architecture pattern based on application requirements.

---

## Table of Contents

- [Pattern Selection Matrix](#pattern-selection-matrix)
- [Pattern 1: Serverless Web Application](#pattern-1-serverless-web-application)
- [Pattern 2: Event-Driven Microservices](#pattern-2-event-driven-microservices)
- [Pattern 3: Modern Three-Tier Application](#pattern-3-modern-three-tier-application)
- [Pattern 4: Real-Time Data Processing](#pattern-4-real-time-data-processing)
- [Pattern 5: GraphQL API Backend](#pattern-5-graphql-api-backend)
- [Pattern 6: Multi-Region High Availability](#pattern-6-multi-region-high-availability)

---

## Pattern Selection Matrix

| Pattern | Best For | Users | Monthly Cost | Complexity |
|---------|----------|-------|--------------|------------|
| Serverless Web | MVP, SaaS, mobile backend | <50K | $50-500 | Low |
| Event-Driven Microservices | Complex workflows, async processing | Any | $100-1000 | Medium |
| Three-Tier | Traditional web, e-commerce | 10K-500K | $300-2000 | Medium |
| Real-Time Data | Analytics, IoT, streaming | Any | $200-1500 | High |
| GraphQL Backend | Mobile apps, SPAs | <100K | $50-400 | Medium |
| Multi-Region HA | Global apps, DR requirements | >100K | 1.5-2x single | High |

---

## Pattern 1: Serverless Web Application

### Use Case
SaaS platforms, mobile backends, low-traffic websites, MVPs

### Architecture Diagram

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ CloudFront  │────▶│     S3      │     │   Cognito   │
│    (CDN)    │     │  (Static)   │     │   (Auth)    │
└─────────────┘     └─────────────┘     └──────┬──────┘
                                               │
┌─────────────┐     ┌─────────────┐     ┌──────▼──────┐
│  Route 53   │────▶│ API Gateway │────▶│   Lambda    │
│    (DNS)    │     │   (REST)    │     │ (Functions) │
└─────────────┘     └─────────────┘     └──────┬──────┘
                                               │
                                        ┌──────▼──────┐
                                        │  DynamoDB   │
                                        │ (Database)  │
                                        └─────────────┘
```

### Service Stack

| Layer | Service | Configuration |
|-------|---------|---------------|
| Frontend | S3 + CloudFront | Static hosting with HTTPS |
| API | API Gateway + Lambda | REST endpoints with throttling |
| Database | DynamoDB | Pay-per-request billing |
| Auth | Cognito | User pools with MFA support |
| CI/CD | Amplify or CodePipeline | Automated deployments |

### CloudFormation Template

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  # API Function
  ApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: nodejs18.x
      Handler: index.handler
      MemorySize: 512
      Timeout: 10
      Events:
        Api:
          Type: Api
          Properties:
            Path: /{proxy+}
            Method: ANY

  # DynamoDB Table
  DataTable:
    Type: AWS::DynamoDB::Table
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: PK
          AttributeType: S
        - AttributeName: SK
          AttributeType: S
      KeySchema:
        - AttributeName: PK
          KeyType: HASH
        - AttributeName: SK
          KeyType: RANGE
```

### Cost Breakdown (10K users)

| Service | Monthly Cost |
|---------|-------------|
| Lambda | $5-20 |
| API Gateway | $10-30 |
| DynamoDB | $10-50 |
| CloudFront | $5-15 |
| S3 | $1-5 |
| Cognito | $0-50 |
| **Total** | **$31-170** |

### Pros and Cons

**Pros:**
- Zero server management
- Pay only for what you use
- Auto-scaling built-in
- Low operational overhead

**Cons:**
- Cold start latency (100-500ms)
- 15-minute Lambda execution limit
- Vendor lock-in

---

## Pattern 2: Event-Driven Microservices

### Use Case
Complex business workflows, asynchronous processing, decoupled systems

### Architecture Diagram

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Service   │────▶│ EventBridge │────▶│   Service   │
│      A      │     │ (Event Bus) │     │      B      │
└─────────────┘     └──────┬──────┘     └─────────────┘
                           │
                    ┌──────▼──────┐
                    │     SQS     │
                    │   (Queue)   │
                    └──────┬──────┘
                           │
┌─────────────┐     ┌──────▼──────┐     ┌─────────────┐
│    Step     │◀────│   Lambda    │────▶│  DynamoDB   │
│  Functions  │     │ (Processor) │     │  (Storage)  │
└─────────────┘     └─────────────┘     └─────────────┘
```

### Service Stack

| Layer | Service | Purpose |
|-------|---------|---------|
| Events | EventBridge | Central event bus |
| Processing | Lambda or ECS Fargate | Event handlers |
| Queue | SQS | Dead letter queue for failures |
| Orchestration | Step Functions | Complex workflow state |
| Storage | DynamoDB, S3 | Persistent data |

### Event Schema Example

```json
{
  "source": "orders.service",
  "detail-type": "OrderCreated",
  "detail": {
    "orderId": "ord-12345",
    "customerId": "cust-67890",
    "items": [...],
    "total": 99.99,
    "timestamp": "2024-01-15T10:30:00Z"
  }
}
```

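
As a rough illustration of how a producer might assemble the event above, the sketch below builds one entry in the shape that the EventBridge `PutEvents` API accepts (`Source`, `DetailType`, `Detail` as a JSON string, `EventBusName`). The helper name `build_order_created_entry`, the required-field list, and the `default` bus name are assumptions for this example, not part of the original reference.

```python
import json

# Fields this sketch assumes every OrderCreated event must carry.
REQUIRED_DETAIL_FIELDS = ("orderId", "customerId", "total", "timestamp")


def build_order_created_entry(detail, bus_name="default"):
    """Validate `detail` and wrap it as a single PutEvents entry."""
    missing = [name for name in REQUIRED_DETAIL_FIELDS if name not in detail]
    if missing:
        raise ValueError(f"missing detail fields: {missing}")
    return {
        "Source": "orders.service",
        "DetailType": "OrderCreated",
        # PutEvents expects the detail payload serialized as a JSON string.
        "Detail": json.dumps(detail),
        "EventBusName": bus_name,
    }
```

The returned dictionary could then be passed as one element of `Entries` in a boto3 `events` client `put_events` call.
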
### Cost Breakdown

| Service | Monthly Cost |
|---------|-------------|
| EventBridge | $1-10 |
| Lambda | $20-100 |
| SQS | $5-20 |
| Step Functions | $25-100 |
| DynamoDB | $20-100 |
| **Total** | **$71-330** |

### Pros and Cons

**Pros:**
- Loose coupling between services
- Independent scaling per service
- Failure isolation
- Easy to test individually

**Cons:**
- Distributed system complexity
- Eventual consistency
- Harder to debug

---

## Pattern 3: Modern Three-Tier Application

### Use Case
Traditional web apps, e-commerce, CMS, applications with complex queries

### Architecture Diagram

```
┌─────────────┐     ┌─────────────┐
│ CloudFront  │────▶│     ALB     │
│    (CDN)    │     │ (Load Bal.) │
└─────────────┘     └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │ ECS Fargate │
                    │ (Auto-scale)│
                    └──────┬──────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
 ┌──────▼──────┐    ┌──────▼──────┐    ┌──────▼──────┐
 │   Aurora    │    │ ElastiCache │    │     S3      │
 │ (Database)  │    │   (Redis)   │    │  (Storage)  │
 └─────────────┘    └─────────────┘    └─────────────┘
```

### Service Stack

| Layer | Service | Configuration |
|-------|---------|---------------|
| CDN | CloudFront | Edge caching, HTTPS |
| Load Balancer | ALB | Path-based routing, health checks |
| Compute | ECS Fargate | Container auto-scaling |
| Database | Aurora MySQL/PostgreSQL | Multi-AZ, auto-scaling |
| Cache | ElastiCache Redis | Session, query caching |
| Storage | S3 | Static assets, uploads |

### Terraform Example

```hcl
# ECS Service with Auto-scaling
resource "aws_ecs_service" "app" {
  name            = "app-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2

  capacity_provider_strategy {
    capacity_provider = "FARGATE"
    weight            = 100
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "app"
    container_port   = 3000
  }
}

# Auto-scaling target (scaling rules are defined separately
# in an aws_appautoscaling_policy that references this target)
resource "aws_appautoscaling_target" "app" {
  max_capacity       = 10
  min_capacity       = 2
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}
```

### Cost Breakdown (50K users)

| Service | Monthly Cost |
|---------|-------------|
| ECS Fargate (2 tasks) | $100-200 |
| ALB | $25-50 |
| Aurora | $100-300 |
| ElastiCache | $50-100 |
| CloudFront | $20-50 |
| **Total** | **$295-700** |

---

## Pattern 4: Real-Time Data Processing

### Use Case
Analytics, IoT data ingestion, log processing, streaming data

### Architecture Diagram

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  IoT Core   │────▶│   Kinesis   │────▶│   Lambda    │
│  (Devices)  │     │  (Stream)   │     │  (Process)  │
└─────────────┘     └─────────────┘     └──────┬──────┘
                                               │
┌─────────────┐     ┌─────────────┐     ┌──────▼──────┐
│ QuickSight  │◀────│   Athena    │◀────│     S3      │
│    (Viz)    │     │   (Query)   │     │ (Data Lake) │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
                                        ┌──────▼──────┐
                                        │ CloudWatch  │
                                        │  (Alerts)   │
                                        └─────────────┘
```

### Service Stack

| Layer | Service | Purpose |
|-------|---------|---------|
| Ingestion | Kinesis Data Streams | Real-time data capture |
| Processing | Lambda or Kinesis Analytics | Transform and analyze |
| Storage | S3 (data lake) | Long-term storage |
| Query | Athena | SQL queries on S3 |
| Visualization | QuickSight | Dashboards and reports |
| Alerting | CloudWatch + SNS | Threshold-based alerts |

### Kinesis Producer Example

```python
import boto3
import json

kinesis = boto3.client('kinesis')

def send_event(stream_name, data, partition_key):
    response = kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(data),
        PartitionKey=partition_key
    )
    return response['SequenceNumber']

# Send sensor reading
send_event(
    'sensor-stream',
    {'sensor_id': 'temp-01', 'value': 23.5, 'unit': 'celsius'},
    'sensor-01'
)
```

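
The partition key in the producer above decides which shard receives a record: Kinesis hashes the key with MD5 into a 128-bit integer and matches it against each shard's hash key range. The sketch below imitates that mapping under the assumption that the shards split the hash space evenly, which holds for a freshly created stream but not necessarily after resharding. `shard_for_key` is a hypothetical helper, not a Kinesis API.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit values


def shard_for_key(partition_key, shard_count):
    """Return the index of the shard an evenly split stream would pick."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    # The last shard absorbs any remainder of the hash space.
    return min(hashed // range_size, shard_count - 1)
```

Because the mapping is deterministic, records that share a partition key land on the same shard and keep their relative order, while a key with very high volume can turn that shard into a hot spot.
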
### Cost Breakdown

| Service | Monthly Cost |
|---------|-------------|
| Kinesis (1 shard) | $15-30 |
| Lambda | $10-50 |
| S3 | $5-50 |
| Athena | $5-25 |
| QuickSight | $24+ |
| **Total** | **$59-179** |

---

## Pattern 5: GraphQL API Backend

### Use Case
Mobile apps, single-page applications, flexible data queries

### Architecture Diagram

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Mobile App  │────▶│   AppSync   │────▶│   Lambda    │
│   or SPA    │     │  (GraphQL)  │     │ (Resolvers) │
└─────────────┘     └──────┬──────┘     └─────────────┘
                           │
                    ┌──────▼──────┐
                    │  DynamoDB   │
                    │  (Direct)   │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │   Cognito   │
                    │   (Auth)    │
                    └─────────────┘
```

### AppSync Schema Example

```graphql
type Query {
  getUser(id: ID!): User
  listPosts(limit: Int, nextToken: String): PostConnection
}

type Mutation {
  createPost(input: CreatePostInput!): Post
  updatePost(input: UpdatePostInput!): Post
}

type Subscription {
  onCreatePost: Post @aws_subscribe(mutations: ["createPost"])
}

type User {
  id: ID!
  email: String!
  posts: [Post]
}

type Post {
  id: ID!
  title: String!
  content: String!
  author: User!
  createdAt: AWSDateTime!
}
```

### Cost Breakdown

| Service | Monthly Cost |
|---------|-------------|
| AppSync | $4-40 |
| Lambda | $5-30 |
| DynamoDB | $10-50 |
| Cognito | $0-50 |
| **Total** | **$19-170** |

---

## Pattern 6: Multi-Region High Availability

### Use Case
Global applications, disaster recovery, data sovereignty compliance

### Architecture Diagram

```
                 ┌─────────────┐
                 │  Route 53   │
                 │(Geo routing)│
                 └──────┬──────┘
                        │
       ┌────────────────┼────────────────┐
       │                                 │
┌──────▼──────┐                   ┌──────▼──────┐
│  us-east-1  │                   │  eu-west-1  │
│ CloudFront  │                   │ CloudFront  │
└──────┬──────┘                   └──────┬──────┘
       │                                 │
┌──────▼──────┐                   ┌──────▼──────┐
│ ECS/Lambda  │                   │ ECS/Lambda  │
└──────┬──────┘                   └──────┬──────┘
       │                                 │
┌──────▼──────┐◀── Replication ──▶┌──────▼──────┐
│  DynamoDB   │                   │  DynamoDB   │
│Global Table │                   │Global Table │
└─────────────┘                   └─────────────┘
```

### Service Stack

| Component | Service | Configuration |
|-----------|---------|---------------|
| DNS | Route 53 | Geolocation or latency routing |
| CDN | CloudFront | Multiple origins per region |
| Compute | Lambda or ECS | Deployed in each region |
| Database | DynamoDB Global Tables | Automatic replication |
| Storage | S3 CRR | Cross-region replication |

### Route 53 Failover Policy

```yaml
# Primary record
HealthCheck:
  Type: AWS::Route53::HealthCheck
  Properties:
    HealthCheckConfig:
      Port: 443
      Type: HTTPS
      ResourcePath: /health
      FullyQualifiedDomainName: api-us-east-1.example.com

RecordSetPrimary:
  Type: AWS::Route53::RecordSet
  Properties:
    Name: api.example.com
    Type: A
    SetIdentifier: primary
    Failover: PRIMARY
    HealthCheckId: !Ref HealthCheck
    AliasTarget:
      DNSName: !GetAtt USEast1ALB.DNSName
      HostedZoneId: !GetAtt USEast1ALB.CanonicalHostedZoneID
```

### Cost Considerations

| Factor | Impact |
|--------|--------|
| Compute | 2x (each region) |
| Database | 25% premium for global tables |
| Data Transfer | Cross-region replication costs |
| Route 53 | Health checks + geo queries |
| **Total** | **1.5-2x single region** |

---

## Pattern Comparison Summary

### Latency

| Pattern | Typical Latency |
|---------|-----------------|
| Serverless | 50-200ms (cold: 500ms+) |
| Three-Tier | 20-100ms |
| GraphQL | 30-150ms |
| Multi-Region | <50ms (regional) |

### Scaling Characteristics

| Pattern | Scale Limit | Scale Speed |
|---------|-------------|-------------|
| Serverless | 1,000 concurrent executions per Region (default account quota, can be raised) | Instant |
| Three-Tier | Instance limits | Minutes |
| Event-Driven | Unlimited | Instant |
| Multi-Region | Regional limits | Instant |

### Operational Complexity

| Pattern | Setup | Maintenance | Debugging |
|---------|-------|-------------|-----------|
| Serverless | Low | Low | Medium |
| Three-Tier | Medium | Medium | Low |
| Event-Driven | High | Medium | High |
| Multi-Region | High | High | High |
@@ -0,0 +1,631 @@
# AWS Best Practices for Startups

Production-ready practices for serverless, cost optimization, security, and operational excellence.

---

## Table of Contents

- [Serverless Best Practices](#serverless-best-practices)
- [Cost Optimization](#cost-optimization)
- [Security Hardening](#security-hardening)
- [Scalability Patterns](#scalability-patterns)
- [DevOps and Reliability](#devops-and-reliability)
- [Common Pitfalls](#common-pitfalls)

---

## Serverless Best Practices

### Lambda Function Design

#### 1. Keep Functions Stateless

Store state externally in DynamoDB, S3, or ElastiCache.

```python
# BAD: Function-level state
cache = {}

def handler(event, context):
    if event['key'] in cache:
        return cache[event['key']]
    # ...

# GOOD: External state
import boto3
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('cache')

def handler(event, context):
    response = table.get_item(Key={'pk': event['key']})
    if 'Item' in response:
        return response['Item']['value']
    # ...
```

#### 2. Implement Idempotency

Handle retries gracefully with unique request IDs.

```python
import boto3
import hashlib
import time

dynamodb = boto3.resource('dynamodb')
idempotency_table = dynamodb.Table('idempotency')

def handler(event, context):
    # Generate idempotency key
    idempotency_key = hashlib.sha256(
        f"{event['orderId']}-{event['action']}".encode()
    ).hexdigest()

    # Check if already processed
    try:
        response = idempotency_table.get_item(Key={'pk': idempotency_key})
        if 'Item' in response:
            return response['Item']['result']
    except Exception:
        pass

    # Process request (process_order is the application's own logic, not shown)
    result = process_order(event)

    # Store result for idempotency
    idempotency_table.put_item(
        Item={
            'pk': idempotency_key,
            'result': result,
            'ttl': int(time.time()) + 86400  # 24h TTL
        }
    )

    return result
```

#### 3. Optimize Cold Starts

```python
# Initialize outside handler (reused across invocations)
import boto3
from aws_xray_sdk.core import patch_all

# SDK initialization happens once
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('my-table')
patch_all()

def handler(event, context):
    # Handler code uses pre-initialized resources
    return table.get_item(Key={'pk': event['id']})
```

**Cold Start Reduction Techniques:**
- Use provisioned concurrency for critical paths
- Minimize package size (use layers for dependencies)
- Prefer runtimes with fast startup (Python, Node.js) over JVM or .NET runtimes, which typically initialize more slowly
- Attach functions to a VPC only when necessary (this used to add several seconds to cold starts; since the 2019 VPC networking changes the penalty is much smaller)

#### 4. Set Appropriate Timeouts

```yaml
# Lambda configuration
Functions:
  ApiHandler:
    Timeout: 10  # Shorter for synchronous APIs
    MemorySize: 512

  BackgroundProcessor:
    Timeout: 300  # Longer for async processing
    MemorySize: 1024
```

**Timeout Guidelines:**
- API handlers: 10-30 seconds
- Event processors: 60-300 seconds
- Use Step Functions for >15 minute workflows

---

## Cost Optimization

### 1. Right-Sizing Strategy

```bash
# Check EC2 utilization
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time $(date -d '7 days ago' -u +"%Y-%m-%dT%H:%M:%SZ") \
  --end-time $(date -u +"%Y-%m-%dT%H:%M:%SZ") \
  --period 3600 \
  --statistics Average
```

**Right-Sizing Rules:**
- <10% CPU average: Downsize instance
- >80% CPU average: Consider upgrade or horizontal scaling
- Review every month for the first 6 months

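
The right-sizing rules above translate directly into a small decision function. This is a sketch under the stated thresholds (10% and 80% average CPU); `right_sizing_action` and its return labels are hypothetical names, and the samples are assumed to be hourly `Average` datapoints such as those returned by the CloudWatch query.

```python
def right_sizing_action(cpu_samples, low=10.0, high=80.0):
    """Classify an instance from its average CPU utilization (percent)."""
    if not cpu_samples:
        raise ValueError("need at least one CPU sample")
    average = sum(cpu_samples) / len(cpu_samples)
    if average < low:
        return "downsize"
    if average > high:
        return "upgrade-or-scale-out"
    return "keep"
```

Averages hide short spikes, so it may be worth checking peak utilization as well before downsizing.
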
### 2. Savings Plans and Reserved Instances

| Commitment | Savings | Best For |
|------------|---------|----------|
| No Upfront, 1-year | 20-30% | Unknown future |
| Partial Upfront, 1-year | 30-40% | Moderate confidence |
| All Upfront, 3-year | 50-60% | Stable workloads |

```bash
# Check Savings Plans recommendations
# (the Cost Explorer service is named "ce" in the AWS CLI)
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --term-in-years ONE_YEAR \
  --payment-option NO_UPFRONT \
  --lookback-period-in-days THIRTY_DAYS
```

### 3. S3 Lifecycle Policies

```json
{
  "Rules": [
    {
      "ID": "Transition to cheaper storage",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "logs/"
      },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```

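
To make the effect of the lifecycle rule above concrete, the sketch below computes which storage class an object under `logs/` would be in at a given age. It is a simplified model: `storage_class_for_age` is a hypothetical helper, the defaults mirror the rule's 30/90/365-day settings, and it ignores the fact that S3 applies lifecycle actions asynchronously rather than at the exact day boundary.

```python
def storage_class_for_age(
    age_days,
    transitions=((30, "STANDARD_IA"), (90, "GLACIER")),
    expiration_days=365,
):
    """Return the expected storage class, or None once the object expires."""
    if age_days >= expiration_days:
        return None
    current = "STANDARD"
    # Apply transitions in ascending day order; the last one reached wins.
    for days, storage_class in sorted(transitions):
        if age_days >= days:
            current = storage_class
    return current
```
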
### 4. Lambda Memory Optimization

Test different memory settings to find optimal cost/performance.

```python
# Use AWS Lambda Power Tuning
# https://github.com/alexcasalboni/aws-lambda-power-tuning

# Example results (compute cost per invocation):
# 128 MB: 2000ms, $0.0000042
# 512 MB: 500ms, $0.0000042
# 1024 MB: 300ms, $0.0000050

# Optimal: 512 MB (same cost, 4x faster)
```

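
The trade-off above follows from how Lambda bills compute: memory (in GB) multiplied by duration (in seconds), times a per-GB-second price. The sketch below shows that arithmetic. The price constant is an assumed x86 on-demand rate that should be checked against current pricing for your Region, the per-request charge is left out, and `invocation_compute_cost` is a hypothetical helper name.

```python
# Assumed price per GB-second; verify against current AWS Lambda pricing.
PRICE_PER_GB_SECOND = 0.0000166667


def invocation_compute_cost(memory_mb, duration_ms, price=PRICE_PER_GB_SECOND):
    """Compute cost of one invocation, excluding the per-request charge."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * price


if __name__ == "__main__":
    for memory_mb, duration_ms in [(128, 2000), (512, 500), (1024, 300)]:
        cost = invocation_compute_cost(memory_mb, duration_ms)
        print(f"{memory_mb} MB, {duration_ms} ms: ${cost:.7f}")
```

When quadrupling memory cuts duration to a quarter, the GB-seconds consumed stay the same, which is why a larger setting can be faster at no extra compute cost.
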
### 5. NAT Gateway Alternatives

```
NAT Gateway: $0.045/hour + $0.045/GB = ~$32/month + data

Alternatives:
1. VPC Endpoints: $0.01/hour = ~$7.30/month (for AWS services)
2. NAT Instance: t3.nano = ~$3.80/month (limited throughput)
3. No NAT: Use VPC endpoints + Lambda outside VPC
```

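
The monthly figures above come from multiplying the hourly rate by the hours in a month and adding any per-GB processing charge. A minimal sketch of that arithmetic follows, assuming 730 hours per month and the rates quoted in the comparison; `monthly_cost` is a hypothetical helper, and actual rates vary by Region.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours


def monthly_cost(hourly_rate, gb_processed=0.0, per_gb_rate=0.0):
    """Fixed hourly charge for a month plus data processing charges."""
    return hourly_rate * HOURS_PER_MONTH + gb_processed * per_gb_rate


if __name__ == "__main__":
    # NAT Gateway with 100 GB processed, then a single interface endpoint.
    print(f"NAT Gateway:  ${monthly_cost(0.045, 100, 0.045):.2f}")
    print(f"VPC endpoint: ${monthly_cost(0.01):.2f}")
```

Interface endpoints are billed per endpoint per Availability Zone, so the comparison shifts as the number of endpoints grows.
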
### 6. CloudWatch Log Retention

```yaml
# Set retention policies to avoid unbounded growth
LogGroup:
  Type: AWS::Logs::LogGroup
  Properties:
    LogGroupName: /aws/lambda/my-function
    RetentionInDays: 14  # 7, 14, 30, 60, 90, etc.
```

**Retention Guidelines:**
- Development: 7 days
- Production non-critical: 30 days
- Production critical: 90 days
- Compliance requirements: As specified

---

## Security Hardening

### 1. IAM Least Privilege

```json
// BAD: Overly permissive
{
  "Effect": "Allow",
  "Action": "dynamodb:*",
  "Resource": "*"
}

// GOOD: Specific actions and resources
{
  "Effect": "Allow",
  "Action": [
    "dynamodb:GetItem",
    "dynamodb:PutItem",
    "dynamodb:Query"
  ],
  "Resource": [
    "arn:aws:dynamodb:us-east-1:123456789:table/users",
    "arn:aws:dynamodb:us-east-1:123456789:table/users/index/*"
  ]
}
```

### 2. Encryption Configuration

```yaml
# Enable encryption everywhere
Resources:
  # DynamoDB
  Table:
    Type: AWS::DynamoDB::Table
    Properties:
      SSESpecification:
        SSEEnabled: true
        SSEType: KMS
        KMSMasterKeyId: !Ref EncryptionKey

  # S3
  Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: aws:kms
              KMSMasterKeyID: !Ref EncryptionKey

  # RDS
  Database:
    Type: AWS::RDS::DBInstance
    Properties:
      StorageEncrypted: true
      KmsKeyId: !Ref EncryptionKey
```

### 3. Network Isolation

```yaml
# Private subnets with VPC endpoints
Resources:
  PrivateSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      MapPublicIpOnLaunch: false

  # DynamoDB Gateway Endpoint (free)
  DynamoDBEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcId: !Ref VPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.dynamodb
      VpcEndpointType: Gateway
      RouteTableIds:
        - !Ref PrivateRouteTable

  # Secrets Manager Interface Endpoint
  SecretsEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcId: !Ref VPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.secretsmanager
      VpcEndpointType: Interface
      PrivateDnsEnabled: true
```

### 4. Secrets Management
|
||||
|
||||
```python
# Never hardcode secrets
import boto3
import json

def get_secret(secret_name):
    """Fetch a JSON secret from Secrets Manager and return it as a dict."""
    client = boto3.client('secretsmanager')
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response['SecretString'])

# Usage (`connect` stands in for your database driver's connect function)
db_creds = get_secret('prod/database/credentials')
connection = connect(
    host=db_creds['host'],
    user=db_creds['username'],
    password=db_creds['password']
)
```
|
||||
|
||||
### 5. API Protection
|
||||
|
||||
```yaml
# WAF + API Gateway
WebACL:
  Type: AWS::WAFv2::WebACL
  Properties:
    Scope: REGIONAL  # Required; REGIONAL covers API Gateway
    DefaultAction:
      Allow: {}
    VisibilityConfig:  # Required at the web ACL level
      SampledRequestsEnabled: true
      CloudWatchMetricsEnabled: true
      MetricName: WebACL
    Rules:
      - Name: RateLimit
        Priority: 1
        Action:
          Block: {}
        Statement:
          RateBasedStatement:
            Limit: 2000  # Requests per IP in the default 5-minute window
            AggregateKeyType: IP
        VisibilityConfig:
          SampledRequestsEnabled: true
          CloudWatchMetricsEnabled: true
          MetricName: RateLimitRule

      - Name: AWSManagedRulesCommonRuleSet
        Priority: 2
        OverrideAction:
          None: {}
        Statement:
          ManagedRuleGroupStatement:
            VendorName: AWS
            Name: AWSManagedRulesCommonRuleSet
        VisibilityConfig:  # Required on every rule
          SampledRequestsEnabled: true
          CloudWatchMetricsEnabled: true
          MetricName: CommonRuleSet
```
|
||||
|
||||
### 6. Audit Logging
|
||||
|
||||
```yaml
# Enable CloudTrail for all API calls
CloudTrail:
  Type: AWS::CloudTrail::Trail
  Properties:
    IsMultiRegionTrail: true
    IsLogging: true
    S3BucketName: !Ref AuditLogsBucket
    IncludeGlobalServiceEvents: true
    EnableLogFileValidation: true
    EventSelectors:
      - ReadWriteType: All
        IncludeManagementEvents: true
```
|
||||
|
||||
---
|
||||
|
||||
## Scalability Patterns
|
||||
|
||||
### 1. Horizontal vs Vertical Scaling
|
||||
|
||||
```
|
||||
Horizontal (preferred):
|
||||
- Add more Lambda concurrent executions
|
||||
- Add more Fargate tasks
|
||||
- Add more DynamoDB capacity
|
||||
|
||||
Vertical (when necessary):
|
||||
- Increase Lambda memory
|
||||
- Upgrade RDS instance
|
||||
- Larger EC2 instances
|
||||
```
|
||||
|
||||
### 2. Database Sharding
|
||||
|
||||
```python
import hashlib

NUM_SHARDS = 8  # Example value; size to your tenant count

# Partition by tenant ID
def get_table_for_tenant(tenant_id):
    # Use a stable hash: built-in hash() is randomized per process for strings
    digest = hashlib.sha256(str(tenant_id).encode()).hexdigest()
    shard = int(digest, 16) % NUM_SHARDS
    return f"data-shard-{shard}"

# Or use DynamoDB single-table design with partition keys
def get_partition_key(tenant_id, entity_type, entity_id):
    return f"TENANT#{tenant_id}#{entity_type}#{entity_id}"
```
|
||||
|
||||
### 3. Caching Layers
|
||||
|
||||
```
|
||||
Edge (CloudFront): Global, static content, TTL: hours-days
|
||||
Application (Redis): Regional, session/query cache, TTL: minutes-hours
|
||||
Database (DAX): DynamoDB-specific, TTL: minutes
|
||||
```
|
||||
|
||||
```python
# ElastiCache Redis caching pattern (cache-aside)
import redis
import json

cache = redis.Redis(host='cache.abc123.cache.amazonaws.com', port=6379)

def get_user(user_id):
    # Check cache first
    cached = cache.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)

    # Fetch from database (`db` is your application's data-access object)
    user = db.get_user(user_id)

    # Cache for 5 minutes
    cache.setex(f"user:{user_id}", 300, json.dumps(user))

    return user
```
|
||||
|
||||
### 4. Auto-Scaling Configuration
|
||||
|
||||
```yaml
# ECS Service Auto-scaling
AutoScalingTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    MaxCapacity: 10
    MinCapacity: 2
    ResourceId: !Sub service/${Cluster}/${Service.Name}
    ScalableDimension: ecs:service:DesiredCount
    ServiceNamespace: ecs

ScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: cpu-target-tracking  # Required
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref AutoScalingTarget  # Links the policy to the target
    TargetTrackingScalingPolicyConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: ECSServiceAverageCPUUtilization
      TargetValue: 70
      ScaleInCooldown: 300
      ScaleOutCooldown: 60
```
|
||||
|
||||
---
|
||||
|
||||
## DevOps and Reliability
|
||||
|
||||
### 1. Infrastructure as Code
|
||||
|
||||
```bash
|
||||
# Version control all infrastructure
|
||||
git init
|
||||
git add .
|
||||
git commit -m "Initial infrastructure setup"
|
||||
|
||||
# Use separate stacks per environment
|
||||
cdk deploy --context environment=dev
|
||||
cdk deploy --context environment=staging
|
||||
cdk deploy --context environment=production
|
||||
```
|
||||
|
||||
### 2. Blue/Green Deployments
|
||||
|
||||
```yaml
# CodeDeploy Blue/Green for ECS
DeploymentGroup:
  Type: AWS::CodeDeploy::DeploymentGroup
  Properties:
    DeploymentConfigName: CodeDeployDefault.ECSAllAtOnce
    DeploymentStyle:
      DeploymentType: BLUE_GREEN
      DeploymentOption: WITH_TRAFFIC_CONTROL
    BlueGreenDeploymentConfiguration:
      DeploymentReadyOption:
        ActionOnTimeout: CONTINUE_DEPLOYMENT
        WaitTimeInMinutes: 0
      TerminateBlueInstancesOnDeploymentSuccess:
        Action: TERMINATE
        TerminationWaitTimeInMinutes: 5
```
|
||||
|
||||
### 3. Health Checks
|
||||
|
||||
```python
# Application health endpoint
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/health')
def health():
    checks = {
        'database': check_database(),
        'cache': check_cache(),
        'external_api': check_external_api()
    }

    status = 'healthy' if all(checks.values()) else 'unhealthy'
    code = 200 if status == 'healthy' else 503

    return jsonify({'status': status, 'checks': checks}), code

def check_database():
    try:
        # Quick connectivity test (`db` is your application's database handle)
        db.execute('SELECT 1')
        return True
    except Exception:
        return False

# check_cache() and check_external_api() follow the same try/except pattern
```
|
||||
|
||||
### 4. Monitoring Setup
|
||||
|
||||
```yaml
# CloudWatch Dashboard
Dashboard:
  Type: AWS::CloudWatch::Dashboard
  Properties:
    DashboardName: production-overview
    DashboardBody: |
      {
        "widgets": [
          {
            "type": "metric",
            "properties": {
              "metrics": [
                ["AWS/Lambda", "Invocations", "FunctionName", "api-handler"],
                [".", "Errors", ".", "."],
                [".", "Duration", ".", ".", {"stat": "p99"}]
              ],
              "period": 60,
              "title": "Lambda Metrics"
            }
          }
        ]
      }

# Critical Alarms
ErrorAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: high-error-rate
    MetricName: Errors
    Namespace: AWS/Lambda
    Statistic: Sum
    Period: 60
    EvaluationPeriods: 3
    Threshold: 10
    ComparisonOperator: GreaterThanThreshold
    AlarmActions:
      - !Ref AlertTopic
```
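
To make the alarm settings concrete, the sketch below evaluates a series of per-minute error sums the way `EvaluationPeriods: 3`, `Threshold: 10`, and `GreaterThanThreshold` suggest. It is a simplification under stated assumptions: `alarm_state` is a hypothetical function, and it ignores CloudWatch features such as `DatapointsToAlarm` and missing-data handling.

```python
def alarm_state(datapoints, threshold=10, evaluation_periods=3):
    """Return the alarm state for a list of per-period metric values.

    The alarm fires only when each of the most recent `evaluation_periods`
    values is strictly greater than the threshold.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"

print(alarm_state([2, 11, 12, 15]))   # three consecutive breaches
print(alarm_state([11, 12, 10]))      # 10 is not greater than 10
```

A single noisy minute does not trip the alarm under these settings; three consecutive breaching periods are needed.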
|
||||
|
||||
---
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
### Technical Debt
|
||||
|
||||
| Pitfall | Solution |
|---------|----------|
| Over-engineering early | Start simple, scale when needed |
| Under-monitoring | Set up CloudWatch from day one |
| Ignoring costs | Enable Cost Explorer and billing alerts |
| Single region only | Plan for multi-region from start |
|
||||
|
||||
### Security Mistakes
|
||||
|
||||
| Mistake | Prevention |
|---------|------------|
| Public S3 buckets | Block public access, use bucket policies |
| Overly permissive IAM | Never use "*", specify resources |
| Hardcoded credentials | Use Secrets Manager, IAM roles |
| Unencrypted data | Enable encryption by default |
|
||||
|
||||
### Performance Issues
|
||||
|
||||
| Issue | Solution |
|-------|----------|
| No caching | Add CloudFront, ElastiCache early |
| Inefficient queries | Use indexes, avoid DynamoDB scans |
| Large Lambda packages | Use layers, minimize dependencies |
| N+1 queries | Implement DataLoader, batch operations |
|
||||
|
||||
### Cost Surprises
|
||||
|
||||
| Surprise | Prevention |
|----------|------------|
| Undeleted resources | Tag everything, review weekly |
| Data transfer costs | Keep traffic in same AZ/region |
| NAT Gateway charges | Use VPC endpoints for AWS services |
| Log accumulation | Set CloudWatch retention policies |
|
||||
@@ -0,0 +1,484 @@
|
||||
# AWS Service Selection Guide
|
||||
|
||||
Quick reference for choosing the right AWS service based on requirements.
|
||||
|
||||
---
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Compute Services](#compute-services)
|
||||
- [Database Services](#database-services)
|
||||
- [Storage Services](#storage-services)
|
||||
- [Messaging and Events](#messaging-and-events)
|
||||
- [API and Integration](#api-and-integration)
|
||||
- [Networking](#networking)
|
||||
- [Security and Identity](#security-and-identity)
|
||||
|
||||
---
|
||||
|
||||
## Compute Services
|
||||
|
||||
### Decision Matrix
|
||||
|
||||
| Requirement | Recommended Service |
|-------------|---------------------|
| Event-driven, short tasks (<15 min) | Lambda |
| Containerized apps, predictable traffic | ECS Fargate |
| Custom configs, GPU/FPGA | EC2 |
| Simple container from source | App Runner |
| Kubernetes workloads | EKS |
| Batch processing | AWS Batch |
|
||||
|
||||
### Lambda
|
||||
|
||||
**Best for:** Event-driven functions, API backends, scheduled tasks
|
||||
|
||||
```
Limits:
- Execution: 15 minutes max
- Memory: 128 MB - 10 GB
- Package: 50 MB zipped (250 MB unzipped), 10 GB (container image)
- Concurrency: 1000 default (soft limit)

Pricing: $0.20 per 1M requests + compute time
```
|
||||
|
||||
**Use when:**
|
||||
- Variable/unpredictable traffic
|
||||
- Pay-per-use is important
|
||||
- No server management desired
|
||||
- Short-duration operations
|
||||
|
||||
**Avoid when:**
|
||||
- Long-running processes (>15 min)
|
||||
- Low-latency requirements (<50ms)
|
||||
- Heavy compute (consider Fargate)
|
||||
|
||||
### ECS Fargate
|
||||
|
||||
**Best for:** Containerized applications, microservices
|
||||
|
||||
```
|
||||
Limits:
|
||||
- vCPU: 0.25 - 16
|
||||
- Memory: 0.5 GB - 120 GB
|
||||
- Storage: 20 GB - 200 GB ephemeral
|
||||
|
||||
Pricing: Per vCPU-hour + GB-hour
|
||||
```
|
||||
|
||||
**Use when:**
|
||||
- Containerized applications
|
||||
- Predictable traffic patterns
|
||||
- Long-running processes
|
||||
- Need more control than Lambda
|
||||
|
||||
### EC2
|
||||
|
||||
**Best for:** Custom configurations, specialized hardware
|
||||
|
||||
```
|
||||
Instance Types:
|
||||
- General: t3, m6i
|
||||
- Compute: c6i
|
||||
- Memory: r6i
|
||||
- GPU: p4d, g5
|
||||
- Storage: i3, d3
|
||||
```
|
||||
|
||||
**Use when:**
|
||||
- Need GPU/FPGA
|
||||
- Windows applications
|
||||
- Specific instance configurations
|
||||
- Reserved capacity makes sense
|
||||
|
||||
---
|
||||
|
||||
## Database Services
|
||||
|
||||
### Decision Matrix
|
||||
|
||||
| Data Type | Query Pattern | Scale | Recommended |
|-----------|--------------|-------|-------------|
| Key-value | Simple lookups | Any | DynamoDB |
| Document | Flexible queries | <1TB | DocumentDB |
| Relational | Complex joins | Variable | Aurora Serverless |
| Relational | High volume | Fixed | Aurora Standard |
| Time-series | Time-based | Any | Timestream |
| Graph | Relationships | Any | Neptune |
|
||||
|
||||
### DynamoDB
|
||||
|
||||
**Best for:** Key-value and document data, serverless applications
|
||||
|
||||
```
|
||||
Limits:
|
||||
- Item size: 400 KB max
|
||||
- Partition key: 2048 bytes
|
||||
- Sort key: 1024 bytes
|
||||
- GSI: 20 per table
|
||||
|
||||
Pricing:
|
||||
- On-demand: $1.25 per million writes, $0.25 per million reads
|
||||
- Provisioned: Per RCU/WCU
|
||||
```
|
||||
|
||||
**Data Modeling Example:**
|
||||
|
||||
```
# Single-table design for e-commerce
PK            SK                  Attributes
USER#123      PROFILE             {name, email, ...}
USER#123      ORDER#456           {total, status, ...}
USER#123      ORDER#456#ITEM#1    {product, qty, ...}
PRODUCT#789   METADATA            {name, price, ...}
```
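
The key layout above can be generated by a few small builder functions, which keeps the `USER#`/`ORDER#`/`ITEM#` prefixes in one place. This is an illustrative sketch; the function names are hypothetical and the returned dicts only describe key values, they do not call DynamoDB.

```python
def user_profile_key(user_id):
    """Key for a user's profile item."""
    return {"PK": f"USER#{user_id}", "SK": "PROFILE"}

def order_key(user_id, order_id):
    """Key for an order, stored in the owning user's partition."""
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{order_id}"}

def order_item_key(user_id, order_id, item_no):
    """Key for a line item; it sorts directly after its parent order."""
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{order_id}#ITEM#{item_no}"}

def product_key(product_id):
    """Key for a product's metadata item."""
    return {"PK": f"PRODUCT#{product_id}", "SK": "METADATA"}

def user_orders_prefix(user_id):
    """Partition key and sort-key prefix that select all orders and items."""
    return {"PK": f"USER#{user_id}", "SK_prefix": "ORDER#"}

print(order_item_key("123", "456", 1))
print(user_orders_prefix("123"))
```

Because orders and their line items share a partition and a sort-key prefix, a single Query with a `begins_with` condition on the sort key can return an order history without a join.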
|
||||
|
||||
### Aurora
|
||||
|
||||
**Best for:** Relational data with complex queries
|
||||
|
||||
| Edition | Use Case | Scaling |
|---------|----------|---------|
| Aurora Serverless v2 | Variable workloads | 0.5-128 ACUs, auto |
| Aurora Standard | Predictable workloads | Instance-based |
| Aurora Global | Multi-region | Cross-region replication |
|
||||
|
||||
```
|
||||
Limits:
|
||||
- Storage: 128 TB max
|
||||
- Replicas: 15 read replicas
|
||||
- Connections: Instance-dependent
|
||||
|
||||
Pricing:
|
||||
- Serverless: $0.12 per ACU-hour
|
||||
- Standard: Instance + storage + I/O
|
||||
```
|
||||
|
||||
### Comparison: DynamoDB vs Aurora
|
||||
|
||||
| Factor | DynamoDB | Aurora |
|--------|----------|--------|
| Query flexibility | Limited (key-based) | Full SQL |
| Scaling | Near-instant, virtually unlimited | Minutes, up to instance limits |
| Consistency | Eventual or strong reads | ACID |
| Cost model | Per-request | Per-hour |
| Operational | Zero management | Some management |
|
||||
|
||||
---
|
||||
|
||||
## Storage Services
|
||||
|
||||
### S3 Storage Classes
|
||||
|
||||
| Class | Access Pattern | Retrieval | Cost (GB/mo) |
|-------|---------------|-----------|--------------|
| Standard | Frequent | Instant | $0.023 |
| Intelligent-Tiering | Unknown | Instant | $0.023 + monitoring |
| Standard-IA | Infrequent (30+ days) | Instant | $0.0125 |
| One Zone-IA | Infrequent, single AZ | Instant | $0.01 |
| Glacier Instant | Archive, instant access | Instant | $0.004 |
| Glacier Flexible | Archive | Minutes-hours | $0.0036 |
| Glacier Deep Archive | Long-term archive | 12-48 hours | $0.00099 |
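
As a rough worked example, the per-GB prices in the table can be turned into a monthly storage estimate. The sketch below assumes the table's prices are current for your region, and it deliberately leaves out request, retrieval, minimum-duration, and Intelligent-Tiering monitoring charges. The dictionary keys are illustrative labels chosen for this example.

```python
# USD per GB-month, copied from the table above (storage only)
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "ONEZONE_IA": 0.01,
    "GLACIER_INSTANT": 0.004,
    "GLACIER_FLEXIBLE": 0.0036,
    "DEEP_ARCHIVE": 0.00099,
}

def monthly_storage_cost(storage_class, size_gb):
    """Estimate the monthly storage charge in USD, rounded to cents."""
    return round(PRICE_PER_GB_MONTH[storage_class] * size_gb, 2)

for storage_class in PRICE_PER_GB_MONTH:
    print(storage_class, monthly_storage_cost(storage_class, 1000))
```

For 1,000 GB this puts Standard at about $23 a month and Deep Archive at about $1, which is why lifecycle transitions for cold data tend to pay off.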
|
||||
|
||||
### Lifecycle Policy Example
|
||||
|
||||
```json
{
  "Rules": [
    {
      "ID": "Archive old data",
      "Status": "Enabled",
      "Filter": {
        "Prefix": ""
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "Expiration": {
        "Days": 2555
      }
    }
  ]
}
```
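
To see what the policy above means for a given object, the sketch below maps an object's age to the storage class the rule would place it in. The `storage_class_at` function is hypothetical, and it models only the rule's intent: S3 applies lifecycle actions asynchronously, so real transitions can lag the configured day counts.

```python
# (minimum age in days, storage class) pairs from the lifecycle rule above
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER"), (365, "DEEP_ARCHIVE")]
EXPIRATION_DAYS = 2555  # roughly seven years

def storage_class_at(age_days):
    """Return the storage class for an object of this age, or None if expired."""
    if age_days >= EXPIRATION_DAYS:
        return None
    current = "STANDARD"
    for min_age, storage_class in TRANSITIONS:
        if age_days >= min_age:
            current = storage_class
    return current

for age in (10, 45, 100, 400, 3000):
    print(age, storage_class_at(age))
```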
|
||||
|
||||
### Block and File Storage
|
||||
|
||||
| Service | Use Case | Access |
|---------|----------|--------|
| EBS | EC2 block storage | Single instance |
| EFS | Shared file system | Multiple instances |
| FSx for Lustre | HPC workloads | High throughput |
| FSx for Windows | Windows apps | SMB protocol |
|
||||
|
||||
---
|
||||
|
||||
## Messaging and Events
|
||||
|
||||
### Decision Matrix
|
||||
|
||||
| Pattern | Service | Use Case |
|---------|---------|----------|
| Event routing | EventBridge | Microservices, SaaS integration |
| Pub/sub | SNS | Fan-out notifications |
| Queue | SQS | Decoupling, buffering |
| Streaming | Kinesis | Real-time analytics |
| Message broker | Amazon MQ | Legacy migrations |
|
||||
|
||||
### EventBridge
|
||||
|
||||
**Best for:** Event-driven architectures, SaaS integration
|
||||
|
||||
```python
# EventBridge rule pattern
{
    "source": ["orders.service"],
    "detail-type": ["OrderCreated"],
    "detail": {
        "total": [{"numeric": [">=", 100]}]
    }
}
```
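
The matching semantics of the pattern above can be sketched locally, which is handy for unit-testing rules before deploying them. This is a simplified illustration, not EventBridge's implementation: the `matches` function is hypothetical and covers only exact-value lists, nested objects, and `numeric` comparisons (no `prefix`, `exists`, `anything-but`, or array-valued event fields).

```python
import operator

# Comparison operators allowed inside a {"numeric": [...]} matcher
NUMERIC_OPS = {"=": operator.eq, ">": operator.gt, ">=": operator.ge,
               "<": operator.lt, "<=": operator.le}

def value_matches(value, candidates):
    """Return True if value satisfies at least one candidate in the list."""
    for candidate in candidates:
        if isinstance(candidate, dict) and "numeric" in candidate:
            spec = candidate["numeric"]  # e.g. [">=", 100] or [">", 0, "<=", 5]
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number and all(NUMERIC_OPS[op](value, bound)
                                 for op, bound in zip(spec[0::2], spec[1::2])):
                return True
        elif value == candidate:
            return True
    return False

def matches(pattern, event):
    """Return True if every field named in the pattern matches the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event object
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif not value_matches(event[key], expected):
            return False
    return True

PATTERN = {
    "source": ["orders.service"],
    "detail-type": ["OrderCreated"],
    "detail": {"total": [{"numeric": [">=", 100]}]},
}

def order_event(total):
    """Build a minimal example event (fields are illustrative)."""
    return {"source": "orders.service", "detail-type": "OrderCreated",
            "detail": {"total": total}}

print(matches(PATTERN, order_event(150)))
print(matches(PATTERN, order_event(50)))
```

Fields present in the event but absent from the pattern are ignored, so only orders of 100 or more from `orders.service` reach the rule's targets.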
|
||||
|
||||
### SQS
|
||||
|
||||
**Best for:** Decoupling services, handling load spikes
|
||||
|
||||
| Feature | Standard | FIFO |
|---------|----------|------|
| Throughput | Nearly unlimited | 300 msg/sec (3,000 with batching) |
| Ordering | Best effort | Guaranteed |
| Delivery | At least once | Exactly-once processing |
| Deduplication | No | Yes |
|
||||
|
||||
```python
# SQS with dead letter queue
import boto3

sqs = boto3.client('sqs')

def process_with_dlq(queue_url, dlq_url, max_retries=3):
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # Long polling
        AttributeNames=['ApproximateReceiveCount']
    )

    for message in response.get('Messages', []):
        receive_count = int(message['Attributes']['ApproximateReceiveCount'])

        try:
            process(message)  # `process` is your application's message handler
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message['ReceiptHandle'])
        except Exception:
            # Below max_retries the message stays on the queue and is
            # redelivered after its visibility timeout expires
            if receive_count >= max_retries:
                sqs.send_message(QueueUrl=dlq_url, MessageBody=message['Body'])
                sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message['ReceiptHandle'])
```
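
An alternative to moving failed messages by hand is to let SQS do it through a redrive policy on the source queue. The sketch below only builds the queue attribute; `redrive_policy` is a hypothetical helper and the ARN is a placeholder.

```python
import json

def redrive_policy(dlq_arn, max_receive_count=3):
    """Build the RedrivePolicy attribute for a source queue.

    After max_receive_count failed receives, SQS moves the message to the
    dead letter queue identified by dlq_arn.
    """
    return {
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receive_count),
        })
    }

# Placeholder ARN for illustration only
attributes = redrive_policy("arn:aws:sqs:us-east-1:123456789012:orders-dlq")
print(attributes)

# With boto3, the attribute could then be applied with:
#   sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=attributes)
```

With a redrive policy in place, the consumer only needs to delete messages it processed successfully; retries and dead-lettering happen on the service side.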
|
||||
|
||||
### Kinesis
|
||||
|
||||
**Best for:** Real-time streaming data, analytics
|
||||
|
||||
| Service | Use Case |
|---------|----------|
| Data Streams | Custom processing |
| Data Firehose | Direct to S3/Redshift |
| Data Analytics | SQL on streams |
| Video Streams | Video ingestion |
|
||||
|
||||
---
|
||||
|
||||
## API and Integration
|
||||
|
||||
### API Gateway vs AppSync
|
||||
|
||||
| Factor | API Gateway | AppSync |
|--------|-------------|---------|
| Protocol | REST, WebSocket | GraphQL |
| Real-time | WebSocket setup | Built-in subscriptions |
| Caching | Response caching | Field-level caching |
| Integration | Lambda, HTTP, AWS | Lambda, DynamoDB, HTTP |
| Pricing | Per request | Per request + data |
|
||||
|
||||
### API Gateway Configuration
|
||||
|
||||
```yaml
# Throttling and caching
Resources:
  ApiGateway:
    Type: AWS::ApiGateway::RestApi
    Properties:
      Name: my-api

  ApiStage:
    Type: AWS::ApiGateway::Stage
    Properties:
      RestApiId: !Ref ApiGateway  # Required
      StageName: prod
      CacheClusterEnabled: true  # Needed for CachingEnabled to take effect
      MethodSettings:
        - HttpMethod: "*"
          ResourcePath: "/*"
          ThrottlingBurstLimit: 500
          ThrottlingRateLimit: 1000
          CachingEnabled: true
          CacheTtlInSeconds: 300
```
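
API Gateway describes its throttling as a token bucket, where the rate limit is the steady-state refill rate in requests per second and the burst limit is the bucket capacity. The class below is a minimal sketch of that idea with the values from the template. `TokenBucket` is a hypothetical name, and the clock is passed in explicitly so the behaviour is deterministic.

```python
class TokenBucket:
    """Simplified token bucket: `rate` tokens per second, capacity `burst`."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = now

    def allow(self, now):
        """Return True and consume a token if a request at time `now` is allowed."""
        elapsed = now - self.last
        # Refill for the elapsed time, never exceeding the burst capacity
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1000, burst=500)
accepted = sum(bucket.allow(0.0) for _ in range(600))
print(accepted)            # requests accepted from an instantaneous spike of 600
print(bucket.allow(0.01))  # 10 ms later the bucket has refilled by about 10 tokens
```

In this model, requests beyond the burst capacity are rejected until the bucket refills; API Gateway returns `429 Too Many Requests` in that situation.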
|
||||
|
||||
### Step Functions
|
||||
|
||||
**Best for:** Workflow orchestration, long-running processes
|
||||
|
||||
```json
{
  "StartAt": "ProcessOrder",
  "States": {
    "ProcessOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:processOrder",
      "Next": "CheckInventory"
    },
    "CheckInventory": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.inStock",
          "BooleanEquals": true,
          "Next": "ShipOrder"
        }
      ],
      "Default": "BackOrder"
    },
    "ShipOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:shipOrder",
      "End": true
    },
    "BackOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:backOrder",
      "End": true
    }
  }
}
```
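
The control flow of the state machine above can be traced with a tiny local interpreter. This is a sketch for reasoning about the workflow, not a Step Functions emulator: `run_workflow` and the handler functions are hypothetical stand-ins for the Lambda tasks, and only `Task` states and `BooleanEquals` choices on top-level fields are supported.

```python
DEFINITION = {
    "StartAt": "ProcessOrder",
    "States": {
        "ProcessOrder": {"Type": "Task", "Next": "CheckInventory"},
        "CheckInventory": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.inStock", "BooleanEquals": True, "Next": "ShipOrder"}
            ],
            "Default": "BackOrder",
        },
        "ShipOrder": {"Type": "Task", "End": True},
        "BackOrder": {"Type": "Task", "End": True},
    },
}

def run_workflow(definition, data, handlers):
    """Walk the state machine and return the names of the states visited."""
    visited = []
    name = definition["StartAt"]
    while True:
        state = definition["States"][name]
        visited.append(name)
        if state["Type"] == "Task":
            # Each handler receives the current data and returns the new data
            data = handlers[name](data)
            if state.get("End"):
                return visited
            name = state["Next"]
        elif state["Type"] == "Choice":
            name = state["Default"]
            for choice in state["Choices"]:
                field = choice["Variable"][2:]  # "$.inStock" -> "inStock"
                if data.get(field) == choice["BooleanEquals"]:
                    name = choice["Next"]
                    break
        else:
            raise ValueError(f"Unsupported state type: {state['Type']}")

# Stand-in handlers that pass the data through unchanged
HANDLERS = {name: (lambda data: data) for name in ("ProcessOrder", "ShipOrder", "BackOrder")}

print(run_workflow(DEFINITION, {"inStock": True}, HANDLERS))
print(run_workflow(DEFINITION, {"inStock": False}, HANDLERS))
```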
|
||||
|
||||
---
|
||||
|
||||
## Networking
|
||||
|
||||
### VPC Components
|
||||
|
||||
| Component | Purpose |
|-----------|---------|
| VPC | Isolated network |
| Subnet | Network segment (public/private) |
| Internet Gateway | Public internet access |
| NAT Gateway | Private subnet outbound |
| VPC Endpoint | Private AWS service access |
| Transit Gateway | VPC interconnection |
|
||||
|
||||
### VPC Design Pattern
|
||||
|
||||
```
VPC: 10.0.0.0/16

Public Subnets (AZ a, b, c):
  10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/24
  - ALB, NAT Gateway, Bastion

Private Subnets (AZ a, b, c):
  10.0.11.0/24, 10.0.12.0/24, 10.0.13.0/24
  - Application servers, Lambda

Database Subnets (AZ a, b, c):
  10.0.21.0/24, 10.0.22.0/24, 10.0.23.0/24
  - RDS, ElastiCache
```
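
A layout like this can be sanity-checked with Python's standard `ipaddress` module before it goes into a template. The sketch below assumes the CIDR ranges shown above; `validate_layout` and `usable_hosts` are hypothetical helper names.

```python
import ipaddress
from itertools import combinations

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = {
    "public": ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"],
    "private": ["10.0.11.0/24", "10.0.12.0/24", "10.0.13.0/24"],
    "database": ["10.0.21.0/24", "10.0.22.0/24", "10.0.23.0/24"],
}

def validate_layout(vpc, tiers):
    """Return a list of problems: subnets outside the VPC or overlapping."""
    nets = [ipaddress.ip_network(cidr) for cidrs in tiers.values() for cidr in cidrs]
    errors = [f"{net} is outside {vpc}" for net in nets if not net.subnet_of(vpc)]
    errors += [f"{a} overlaps {b}" for a, b in combinations(nets, 2) if a.overlaps(b)]
    return errors

def usable_hosts(cidr):
    """Addresses available in a subnet; AWS reserves five per subnet."""
    return ipaddress.ip_network(cidr).num_addresses - 5

print(validate_layout(VPC_CIDR, SUBNETS))
print(usable_hosts("10.0.1.0/24"))
```

Each /24 leaves 251 usable addresses, and the /16 leaves ample room to add further tiers or Availability Zones later.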
|
||||
|
||||
### VPC Endpoints (Cost Savings)
|
||||
|
||||
```yaml
# Interface endpoint for Secrets Manager
SecretsManagerEndpoint:
  Type: AWS::EC2::VPCEndpoint
  Properties:
    VpcId: !Ref VPC
    ServiceName: !Sub com.amazonaws.${AWS::Region}.secretsmanager
    VpcEndpointType: Interface
    SubnetIds: !Ref PrivateSubnets
    SecurityGroupIds:
      - !Ref EndpointSecurityGroup
```
|
||||
|
||||
---
|
||||
|
||||
## Security and Identity
|
||||
|
||||
### IAM Best Practices
|
||||
|
||||
```json
// Least privilege policy example
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:Query"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/users",
      "Condition": {
        "ForAllValues:StringEquals": {
          "dynamodb:LeadingKeys": ["${aws:userid}"]
        }
      }
    }
  ]
}
```
|
||||
|
||||
### Secrets Manager vs Parameter Store
|
||||
|
||||
| Factor | Secrets Manager | Parameter Store |
|--------|-----------------|-----------------|
| Auto-rotation | Built-in | Manual |
| Cross-account | Yes | Limited |
| Pricing | $0.40/secret/month | Free (standard) |
| Use case | Credentials, API keys | Config, non-secrets |
|
||||
|
||||
### Cognito Configuration
|
||||
|
||||
```yaml
UserPool:
  Type: AWS::Cognito::UserPool
  Properties:
    UserPoolName: my-app-users
    AutoVerifiedAttributes:
      - email
    MfaConfiguration: OPTIONAL
    EnabledMfas:
      - SOFTWARE_TOKEN_MFA
    Policies:
      PasswordPolicy:
        MinimumLength: 12
        RequireLowercase: true
        RequireUppercase: true
        RequireNumbers: true
        RequireSymbols: true
    AccountRecoverySetting:
      RecoveryMechanisms:
        - Name: verified_email
          Priority: 1
```
|
||||