# Migration Patterns Catalog

## Overview

This catalog describes proven migration patterns, their use cases, implementation guidelines, and best practices. Most patterns include a code or configuration example, and the catalog closes with anti-patterns, a selection matrix, success metrics, and lessons learned from real-world implementations.

## Database Migration Patterns

### 1. Expand-Contract Pattern

**Use Case:** Schema evolution with zero downtime
**Complexity:** Medium
**Risk Level:** Low-Medium

#### Description

The Expand-Contract pattern allows schema changes without downtime by following a three-phase approach:

1. **Expand:** Add new schema elements alongside existing ones
2. **Migrate:** Dual-write to both old and new schema during transition
3. **Contract:** Remove old schema elements after validation

#### Implementation Steps

```sql
-- Phase 1: Expand
ALTER TABLE users ADD COLUMN email_new VARCHAR(255);
-- PostgreSQL syntax; CONCURRENTLY cannot run inside a transaction block
CREATE INDEX CONCURRENTLY idx_users_email_new ON users(email_new);

-- Phase 2: Migrate (Application Code)
-- Write to both columns during transition period
INSERT INTO users (name, email, email_new) VALUES (?, ?, ?);

-- Backfill existing data (on large tables, consider running this in batches)
UPDATE users SET email_new = email WHERE email_new IS NULL;

-- Phase 3: Contract (after validation)
ALTER TABLE users DROP COLUMN email;
ALTER TABLE users RENAME COLUMN email_new TO email;
```

#### Pros and Cons

**Pros:**
- Zero downtime deployments
- Safe rollback at any point before the Contract phase
- Gradual transition with validation

**Cons:**
- Increased storage during transition
- More complex application logic
- Extended migration timeline

### 2. Parallel Schema Pattern

**Use Case:** Major database restructuring
**Complexity:** High
**Risk Level:** Medium

#### Description

Run the new and old schemas in parallel, using feature flags to gradually route traffic to the new schema while keeping the ability to roll back quickly.
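
The flag-based routing and quick rollback described above can be sketched with a small in-memory flag store. This is a minimal sketch, not a real feature-flag product: `InMemoryFeatureFlags` and `set_percentage` are hypothetical names, and only the `is_enabled(flag, user_id)` call shape is taken from the router examples in this catalog.

```python
import hashlib


class InMemoryFeatureFlags:
    """Hypothetical flag store: maps a flag name to a rollout percentage (0-100)."""

    def __init__(self):
        self._percentages = {}

    def set_percentage(self, flag, percentage):
        if not 0 <= percentage <= 100:
            raise ValueError("percentage must be between 0 and 100")
        self._percentages[flag] = percentage

    def is_enabled(self, flag, user_id):
        # Hash flag and user together so each user lands in a stable bucket
        # (0-99) that is independent of their bucket for other flags.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        # Unknown flags default to 0 percent, i.e. disabled for everyone.
        return bucket < self._percentages.get(flag, 0)


flags = InMemoryFeatureFlags()
flags.set_percentage("new_schema", 25)
on_new_schema = [uid for uid in range(1000) if flags.is_enabled("new_schema", uid)]

# Rollback: setting the percentage to 0 sends every user back to the old schema.
flags.set_percentage("new_schema", 0)
after_rollback = [uid for uid in range(1000) if flags.is_enabled("new_schema", uid)]
```

Because the bucket depends only on the flag name and user ID, raising the percentage adds users to the new schema without moving anyone who was already on it.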
#### Implementation Example

```python
# OldDatabaseConnection, NewDatabaseConnection, transform_data and
# handle_dual_write_failure are application-specific and not shown here.
class DatabaseRouter:
    def __init__(self, feature_flag_service):
        self.feature_flags = feature_flag_service
        self.old_db = OldDatabaseConnection()
        self.new_db = NewDatabaseConnection()

    def route_query(self, user_id, query_type):
        if self.feature_flags.is_enabled("new_schema", user_id):
            return self.new_db.execute(query_type)
        else:
            return self.old_db.execute(query_type)

    def dual_write(self, data):
        # Write to both databases for consistency
        success_old = self.old_db.write(data)
        success_new = self.new_db.write(transform_data(data))

        if not (success_old and success_new):
            # Handle partial failures
            self.handle_dual_write_failure(data, success_old, success_new)

        # Report whether both writes succeeded
        return success_old and success_new
```

#### Best Practices

- Implement data consistency checks between schemas
- Use circuit breakers for automatic failover
- Monitor performance impact of dual writes
- Plan for data reconciliation processes

### 3. Event Sourcing Migration

**Use Case:** Migrating systems with complex business logic
**Complexity:** High
**Risk Level:** Medium-High

#### Description

Capture all changes as events during migration, so that they can be replayed and used for reconciliation.
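
Replay and reconciliation can be sketched as follows. This is a minimal in-memory sketch: the field names mirror the columns of the `migration_events` table in this section, while the `Event` class, the `created` / `updated` / `deleted` event types, and the dict-based target store are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    aggregate_id: str
    event_version: int
    event_type: str  # hypothetical types: "created", "updated", "deleted"
    event_data: dict


def replay(events):
    """Rebuild the current state of every aggregate from its events."""
    state = {}
    # Apply events per aggregate in version order so the result does not
    # depend on the order in which the events were read.
    for event in sorted(events, key=lambda e: (e.aggregate_id, e.event_version)):
        if event.event_type == "deleted":
            state.pop(event.aggregate_id, None)
        else:
            # "created" and "updated" both merge their fields into the record.
            state.setdefault(event.aggregate_id, {}).update(event.event_data)
    return state


def reconcile(events, target_store):
    """Return the aggregate ids whose replayed state differs from the target."""
    expected = replay(events)
    ids = set(expected) | set(target_store)
    return sorted(i for i in ids if expected.get(i) != target_store.get(i))


events = [
    Event("user-1", 2, "updated", {"email": "new@example.com"}),
    Event("user-1", 1, "created", {"name": "Ada", "email": "old@example.com"}),
    Event("user-2", 1, "created", {"name": "Bob"}),
    Event("user-2", 2, "deleted", {}),
]
# The new store missed the email update for user-1.
new_store = {"user-1": {"name": "Ada", "email": "old@example.com"}}
drifted = reconcile(events, new_store)  # ["user-1"]
```

Sorting by `(aggregate_id, event_version)` is what makes the replay repeatable: running it twice over the same events yields the same state.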
#### Event Store Schema

```sql
CREATE TABLE migration_events (
    event_id UUID PRIMARY KEY,
    aggregate_id UUID NOT NULL,
    event_type VARCHAR(100) NOT NULL,
    event_data JSONB NOT NULL,
    event_version INTEGER NOT NULL,
    occurred_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    processed_at TIMESTAMP WITH TIME ZONE
);
```

#### Migration Event Handler

```python
from datetime import datetime, timezone


# MigrationEvent, schedule_retry, get_events_since and apply_event are
# application-specific and not shown here.
class MigrationEventHandler:
    def __init__(self, old_store, new_store):
        self.old_store = old_store
        self.new_store = new_store
        self.event_log = []

    def handle_update(self, entity_id, old_data, new_data):
        # Log the change as an event (timezone-aware, matching the
        # TIMESTAMP WITH TIME ZONE columns of the event store)
        event = MigrationEvent(
            entity_id=entity_id,
            event_type="entity_migrated",
            old_data=old_data,
            new_data=new_data,
            timestamp=datetime.now(timezone.utc)
        )

        self.event_log.append(event)

        # Apply to new store
        success = self.new_store.update(entity_id, new_data)

        if not success:
            # Mark for retry
            event.status = "failed"
            self.schedule_retry(event)

        return success

    def replay_events(self, from_timestamp=None):
        """Replay events for reconciliation"""
        events = self.get_events_since(from_timestamp)
        for event in events:
            self.apply_event(event)
```

## Service Migration Patterns

### 1. Strangler Fig Pattern

**Use Case:** Legacy system replacement
**Complexity:** Medium-High
**Risk Level:** Medium

#### Description

Gradually replace legacy functionality by intercepting calls and routing them to new services, eventually "strangling" the legacy system.
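
One way to picture the interception step is a routing table that grows as endpoints are migrated. This is a minimal sketch assuming routing by URL path prefix; `StranglerRouter` and its method names are hypothetical, and a real deployment would usually do this in a gateway or proxy rather than in application code.

```python
class StranglerRouter:
    """Routes by path prefix; prefixes are handed to the new service one at a time."""

    def __init__(self):
        self._migrated_prefixes = set()

    def migrate(self, prefix):
        self._migrated_prefixes.add(prefix)

    def rollback(self, prefix):
        self._migrated_prefixes.discard(prefix)

    def backend_for(self, path):
        # A path goes to the new service only if it equals a migrated prefix
        # or sits below it; "/users/profiles" is not under "/users/profile".
        for prefix in self._migrated_prefixes:
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                return "new"
        return "legacy"


router = StranglerRouter()
router.migrate("/users/profile")

profile_backend = router.backend_for("/users/profile/42")  # "new"
orders_backend = router.backend_for("/orders/1")           # "legacy"

# Rolling back a single prefix returns only that slice of traffic to legacy.
router.rollback("/users/profile")
after_rollback = router.backend_for("/users/profile/42")   # "legacy"
```

The legacy system is fully "strangled" once every prefix it serves has been migrated and no request falls through to the `"legacy"` default.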
#### Implementation Architecture

```yaml
# API Gateway Configuration
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: user-service-migration
spec:
  hosts:
  - user-service  # assumed host name; a VirtualService requires `hosts`
  http:
  - match:
    - headers:
        migration-flag:
          exact: "new"
    route:
    - destination:
        host: user-service-v2
  - route:
    - destination:
        host: user-service-v1
```

#### Strangler Proxy Implementation

```python
# LegacyUserService, NewUserService, FeatureFlagService and the
# handle_with_* methods are application-specific and not shown here.
class StranglerProxy:
    def __init__(self):
        self.legacy_service = LegacyUserService()
        self.new_service = NewUserService()
        self.feature_flags = FeatureFlagService()

    def handle_request(self, request):
        route = self.determine_route(request)

        if route == "new":
            return self.handle_with_new_service(request)
        elif route == "both":
            return self.handle_with_both_services(request)
        else:
            return self.handle_with_legacy_service(request)

    def determine_route(self, request):
        user_id = request.get('user_id')

        if self.feature_flags.is_enabled("new_user_service", user_id):
            if self.feature_flags.is_enabled("dual_write", user_id):
                return "both"
            else:
                return "new"
        else:
            return "legacy"
```

### 2. Parallel Run Pattern

**Use Case:** Risk mitigation for critical services
**Complexity:** Medium
**Risk Level:** Low-Medium

#### Description

Run both old and new services simultaneously, comparing outputs to validate correctness before switching traffic.
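
The output comparison can be sketched as a field-by-field diff that skips values expected to differ between runs. This is a minimal sketch assuming responses are flat dicts; the `compare(primary, candidate)` call shape follows the Parallel Run code in this catalog, while the ignored field names and the shape of the returned report are assumptions.

```python
MISSING = "<missing>"  # placeholder for a field present on only one side


class ResponseComparator:
    """Compares two dict responses, skipping fields expected to differ."""

    def __init__(self, ignored_fields=("request_id", "timestamp")):
        self.ignored_fields = frozenset(ignored_fields)

    def compare(self, primary, candidate):
        # Look at every field that appears on either side, minus ignored ones.
        keys = (primary.keys() | candidate.keys()) - self.ignored_fields
        mismatches = {
            key: (primary.get(key, MISSING), candidate.get(key, MISSING))
            for key in sorted(keys)
            if primary.get(key, MISSING) != candidate.get(key, MISSING)
        }
        return {"match": not mismatches, "mismatches": mismatches}


comparator = ResponseComparator()
report = comparator.compare(
    {"user_id": 7, "total": 100, "timestamp": "2024-01-01T00:00:00Z"},
    {"user_id": 7, "total": 101, "timestamp": "2024-01-01T00:00:02Z"},
)
# report == {"match": False, "mismatches": {"total": (100, 101)}}
```

Ignoring volatile fields such as timestamps keeps the mismatch rate meaningful; otherwise nearly every comparison would be reported as a difference.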
#### Implementation

```python
import asyncio


# PrimaryService, CandidateService, ResponseComparator and MetricsCollector
# are application-specific and not shown here.
class ParallelRunManager:
    def __init__(self):
        self.primary_service = PrimaryService()
        self.candidate_service = CandidateService()
        self.comparator = ResponseComparator()
        self.metrics = MetricsCollector()

    async def parallel_execute(self, request):
        # Execute both services concurrently
        primary_task = asyncio.create_task(
            self.primary_service.process(request)
        )
        candidate_task = asyncio.create_task(
            self.candidate_service.process(request)
        )

        # Always wait for primary
        primary_result = await primary_task

        try:
            # Wait for candidate with timeout (wait_for cancels it on timeout)
            candidate_result = await asyncio.wait_for(
                candidate_task, timeout=5.0
            )

            # Compare results
            comparison = self.comparator.compare(
                primary_result, candidate_result
            )

            # Record metrics
            self.metrics.record_comparison(comparison)

        except asyncio.TimeoutError:
            self.metrics.record_timeout("candidate")
        except Exception as e:
            self.metrics.record_error("candidate", str(e))

        # Always return primary result
        return primary_result
```

### 3. Blue-Green Deployment Pattern

**Use Case:** Zero-downtime service updates
**Complexity:** Low-Medium
**Risk Level:** Low

#### Description

Maintain two identical production environments (blue and green), switching traffic between them for deployments.
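
The switch itself can be sketched as a small state machine that only cuts over when the idle environment is healthy. This is a minimal sketch: `BlueGreenSwitch` and the `is_healthy` callback are hypothetical, and the patch body simply mirrors the `app` / `version` labels used in this section's Kubernetes manifests without calling any cluster API.

```python
class BlueGreenSwitch:
    """Tracks which environment is live and builds the selector change."""

    def __init__(self, active="blue"):
        self.active = active

    @property
    def idle(self):
        return "green" if self.active == "blue" else "blue"

    def selector_patch(self, color):
        # Body for updating the Service selector to point at `color`.
        return {"spec": {"selector": {"app": "myapp", "version": color}}}

    def switch(self, is_healthy):
        """Cut over to the idle environment if it passes the health check."""
        target = self.idle
        if not is_healthy(target):
            return None  # keep serving from the currently active environment
        self.active = target
        return self.selector_patch(target)


switcher = BlueGreenSwitch(active="blue")

# Green is healthy: traffic moves to green.
patch = switcher.switch(lambda color: True)

# Blue (now idle) fails its check: the switch is refused and green stays live.
refused = switcher.switch(lambda color: False)
```

Rolling back is the same operation in the other direction, which is why the previous environment is usually kept running until the new one has proven stable.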
#### Kubernetes Implementation

```yaml
# Blue Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
  labels:
    version: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: app
        image: myapp:v1.0.0
---
# Green Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
  labels:
    version: green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
      - name: app
        image: myapp:v2.0.0
---
# Service (switches between blue and green)
apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  selector:
    app: myapp
    version: blue # Change to green for deployment
  ports:
  - port: 80
    targetPort: 8080
```

## Infrastructure Migration Patterns

### 1. Lift and Shift Pattern

**Use Case:** Quick cloud migration with minimal changes
**Complexity:** Low-Medium
**Risk Level:** Low

#### Description

Migrate applications to cloud infrastructure with minimal or no code changes, focusing on infrastructure compatibility.

#### Migration Checklist

```yaml
Pre-Migration Assessment:
  - inventory_current_infrastructure:
      - servers_and_specifications
      - network_configuration
      - storage_requirements
      - security_configurations
  - identify_dependencies:
      - database_connections
      - external_service_integrations
      - file_system_dependencies
  - assess_compatibility:
      - operating_system_versions
      - runtime_dependencies
      - license_requirements

Migration Execution:
  - provision_target_infrastructure:
      - compute_instances
      - storage_volumes
      - network_configuration
      - security_groups
  - migrate_data:
      - database_backup_restore
      - file_system_replication
      - configuration_files
  - update_configurations:
      - connection_strings
      - environment_variables
      - dns_records
  - validate_functionality:
      - application_health_checks
      - end_to_end_testing
      - performance_validation
```

### 2. Hybrid Cloud Migration

**Use Case:** Gradual cloud adoption with on-premises integration
**Complexity:** High
**Risk Level:** Medium-High

#### Description

Maintain some components on-premises while migrating others to cloud, requiring secure connectivity and data synchronization.

#### Network Architecture

```hcl
# Terraform configuration for hybrid connectivity
variable "on_premises_public_ip" {
  type        = string
  description = "Public IP address of the on-premises VPN device"
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
}

resource "aws_vpn_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "hybrid-vpn-gateway"
  }
}

resource "aws_customer_gateway" "main" {
  bgp_asn    = 65000
  ip_address = var.on_premises_public_ip
  type       = "ipsec.1"

  tags = {
    Name = "on-premises-gateway"
  }
}

# With static routing, the on-premises CIDR ranges still need to be added
# as routes on this connection (not shown here).
resource "aws_vpn_connection" "main" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.main.id
  type                = "ipsec.1"
  static_routes_only  = true
}
```

#### Data Synchronization Pattern

```python
# OnPremiseDatabase, CloudDatabase, SyncLogManager, apply_change_to_on_prem,
# detect_conflicts and resolve_conflict are application-specific and not
# shown here.
class HybridDataSync:
    def __init__(self):
        self.on_prem_db = OnPremiseDatabase()
        self.cloud_db = CloudDatabase()
        self.sync_log = SyncLogManager()

    async def bidirectional_sync(self):
        """Synchronize data between on-premises and cloud"""
        # Get last sync timestamp
        last_sync = self.sync_log.get_last_sync_time()

        # Read both change sets before applying anything, so changes written
        # by this run are not read back as new changes from the other side
        on_prem_changes = list(self.on_prem_db.get_changes_since(last_sync))
        cloud_changes = list(self.cloud_db.get_changes_since(last_sync))

        # Sync on-prem changes to cloud
        for change in on_prem_changes:
            await self.apply_change_to_cloud(change)

        # Sync cloud changes to on-prem
        for change in cloud_changes:
            await self.apply_change_to_on_prem(change)

        # Handle conflicts (records changed on both sides since the last sync).
        # resolve_conflict must write the winning version to both sides,
        # because the loops above have already cross-applied both changes.
        conflicts = self.detect_conflicts(on_prem_changes, cloud_changes)
        for conflict in conflicts:
            await self.resolve_conflict(conflict)

        # Update sync timestamp
        self.sync_log.record_sync_completion()

    async def apply_change_to_cloud(self, change):
        """Apply on-premises change to cloud database"""
        try:
            if change.operation == "INSERT":
                await self.cloud_db.insert(change.table, change.data)
            elif change.operation == "UPDATE":
                await self.cloud_db.update(change.table, change.key, change.data)
            elif change.operation == "DELETE":
                await self.cloud_db.delete(change.table, change.key)

            self.sync_log.record_success(change.id, "cloud")

        except Exception as e:
            self.sync_log.record_failure(change.id, "cloud", str(e))
            raise
```

### 3. Multi-Cloud Migration

**Use Case:** Avoiding vendor lock-in or meeting regulatory requirements
**Complexity:** Very High
**Risk Level:** High

#### Description

Distribute workloads across multiple cloud providers for resilience, compliance, or cost optimization.

#### Service Mesh Configuration

```yaml
# Istio configuration for multi-cloud service mesh
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: aws-service
spec:
  hosts:
  - aws-service.company.com
  ports:
  - number: 443
    name: https
    protocol: HTTPS
  location: MESH_EXTERNAL
  resolution: DNS
---
# gcp-service.company.com needs an equivalent ServiceEntry (not shown)
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: multi-cloud-routing
spec:
  hosts:
  - user-service
  http:
  - match:
    - headers:
        region:
          exact: "us-east"
    route:
    - destination:
        host: aws-service.company.com
      weight: 100
  - match:
    - headers:
        region:
          exact: "eu-west"
    route:
    - destination:
        host: gcp-service.company.com
      weight: 100
  - route: # Default routing
    - destination:
        host: user-service
        subset: local # requires a DestinationRule defining this subset (not shown)
      weight: 80
    - destination:
        host: aws-service.company.com
      weight: 20
```

## Feature Flag Patterns

### 1. Progressive Rollout Pattern

**Use Case:** Gradual feature deployment with risk mitigation

**Implementation:**

```python
import hashlib
import time


class ProgressiveRollout:
    def __init__(self, feature_name):
        self.feature_name = feature_name
        self.rollout_percentage = 0
        self.user_buckets = {}

    def is_enabled_for_user(self, user_id):
        # Consistent user bucketing (MD5 is used for distribution, not security)
        user_hash = hashlib.md5(f"{self.feature_name}:{user_id}".encode()).hexdigest()
        bucket = int(user_hash, 16) % 100

        return bucket < self.rollout_percentage

    def increase_rollout(self, target_percentage, step_size=10):
        """Gradually increase rollout percentage"""
        while self.rollout_percentage < target_percentage:
            self.rollout_percentage = min(
                self.rollout_percentage + step_size,
                target_percentage
            )

            # Monitor metrics before next increase
            yield self.rollout_percentage
            time.sleep(300)  # Wait 5 minutes between increases
```

### 2. Circuit Breaker Pattern

**Use Case:** Automatic fallback during migration issues

```python
import time


class MigrationCircuitBreaker:
    # new_service and legacy_service are assumed to expose process(request);
    # the original sketch referenced them without defining them.
    def __init__(self, new_service, legacy_service, failure_threshold=5, timeout=60):
        self.new_service = new_service
        self.legacy_service = legacy_service
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.last_failure_time = None
        self.state = 'CLOSED'  # CLOSED, OPEN, HALF_OPEN

    def call_new_service(self, request):
        if self.state == 'OPEN':
            if self.should_attempt_reset():
                self.state = 'HALF_OPEN'
            else:
                return self.fallback_to_legacy(request)

        try:
            response = self.new_service.process(request)
            self.on_success()
            return response
        except Exception:
            self.on_failure()
            return self.fallback_to_legacy(request)

    def fallback_to_legacy(self, request):
        return self.legacy_service.process(request)

    def on_success(self):
        self.failure_count = 0
        self.state = 'CLOSED'

    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()

        if self.failure_count >= self.failure_threshold:
            self.state = 'OPEN'

    def should_attempt_reset(self):
        return (time.time() - self.last_failure_time) >= self.timeout
```

## Migration Anti-Patterns

### 1. Big Bang Migration (Anti-Pattern)

**Why to Avoid:**
- High risk of complete system failure
- Difficult to roll back
- Extended downtime
- All-or-nothing deployment

**Better Alternative:** Use incremental migration patterns like Strangler Fig or Parallel Run.

### 2. No Rollback Plan (Anti-Pattern)

**Why to Avoid:**
- Cannot recover from failures
- Increases business risk
- Panic-driven decisions during issues

**Better Alternative:** Always implement and test rollback procedures before migration.

### 3. Insufficient Testing (Anti-Pattern)

**Why to Avoid:**
- Unknown compatibility issues
- Performance degradation
- Data corruption risks

**Better Alternative:** Implement comprehensive testing at each migration phase.

## Pattern Selection Matrix

| Migration Type | Complexity | Downtime Tolerance | Recommended Pattern |
|----------------|------------|--------------------|---------------------|
| Schema Change | Low | Zero | Expand-Contract |
| Schema Change | High | Zero | Parallel Schema |
| Service Replace | Medium | Zero | Strangler Fig |
| Service Update | Low | Zero | Blue-Green |
| Data Migration | High | Some | Event Sourcing |
| Infrastructure | Low | Some | Lift and Shift |
| Infrastructure | High | Zero | Hybrid Cloud |

## Success Metrics

### Technical Metrics
- Migration completion rate
- System availability during migration
- Performance impact (response time, throughput)
- Error rate changes
- Rollback execution time

### Business Metrics
- Customer impact score
- Revenue protection
- Time to value realization
- Stakeholder satisfaction

### Operational Metrics
- Team efficiency
- Knowledge transfer effectiveness
- Post-migration support requirements
- Documentation completeness

## Lessons Learned

### Common Pitfalls
1. **Underestimating data dependencies** - Always map all data relationships
2. **Insufficient monitoring** - Implement comprehensive observability before migration
3. **Poor communication** - Keep all stakeholders informed throughout the process
4. **Rushed timelines** - Allow adequate time for testing and validation
5. **Ignoring performance impact** - Benchmark before and after migration

### Best Practices
1. **Start with low-risk migrations** - Build confidence and experience
2. **Automate everything possible** - Reduce human error and increase repeatability
3. **Test rollback procedures** - Ensure you can recover from any failure
4. **Monitor continuously** - Use real-time dashboards and alerting
5. **Document everything** - Create comprehensive runbooks and documentation

This catalog serves as a reference for selecting appropriate migration patterns based on specific requirements, risk tolerance, and technical constraints.
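
As a starting point for that selection, the Pattern Selection Matrix can be encoded as a lookup table. This is a minimal sketch: the entries are copied from the matrix in this catalog, while `PATTERN_MATRIX` and `recommend_pattern` are hypothetical names, and combinations the matrix does not list deliberately return `None` rather than a guess.

```python
# (migration type, complexity, downtime tolerance) -> recommended pattern
PATTERN_MATRIX = {
    ("Schema Change", "Low", "Zero"): "Expand-Contract",
    ("Schema Change", "High", "Zero"): "Parallel Schema",
    ("Service Replace", "Medium", "Zero"): "Strangler Fig",
    ("Service Update", "Low", "Zero"): "Blue-Green",
    ("Data Migration", "High", "Some"): "Event Sourcing",
    ("Infrastructure", "Low", "Some"): "Lift and Shift",
    ("Infrastructure", "High", "Zero"): "Hybrid Cloud",
}


def recommend_pattern(migration_type, complexity, downtime_tolerance):
    """Return the matrix recommendation, or None if the case is not listed."""
    return PATTERN_MATRIX.get((migration_type, complexity, downtime_tolerance))


schema_choice = recommend_pattern("Schema Change", "Low", "Zero")
unlisted_choice = recommend_pattern("Schema Change", "Medium", "Some")
```

A `None` result signals that the case needs a judgment call based on risk tolerance and technical constraints, which the matrix alone does not capture.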