diff --git a/docs/tasks/world-backup-automation/deployment-plan.md b/docs/tasks/world-backup-automation/deployment-plan.md new file mode 100644 index 0000000..ff3ede3 --- /dev/null +++ b/docs/tasks/world-backup-automation/deployment-plan.md @@ -0,0 +1,621 @@ +# World Backup Automation - Deployment Plan + +**Status:** Planning Complete, Ready to Implement +**Priority:** Tier 3 - Disaster Recovery +**Time Estimate:** 1-2 hours implementation +**Last Updated:** 2026-02-17 + +--- + +## Overview + +Automated backup system for all 11 Minecraft server worlds. Backups run daily under a tiered retention policy (7 daily, 4 weekly, 12 monthly) and are stored off-server for disaster recovery. + +**The Problem:** +- No automated backups = risk of data loss +- Manual backups = inconsistent, time-consuming +- No off-server storage = vulnerable to hardware failure +- No retention policy = storage bloat or premature deletion + +**The Solution:** +- Automated Python script with Pterodactyl SFTP integration +- Daily backups at off-peak hours (integrate with restart system) +- Smart retention: 7 daily, 4 weekly, 12 monthly backups +- Off-server storage (NextCloud or S3-compatible) +- Compression (tar.gz) to save space +- Restoration procedures documented +- Discord notifications + +--- + +## Architecture + +``` +Backup Schedule (Daily 3:30 AM) + ↓ +Python Backup Script + ↓ +Pterodactyl SFTP → Download world files + ↓ +Compress (tar.gz) + ↓ +Upload to NextCloud/S3 + ↓ +Apply Retention Policy (delete old backups) + ↓ +Discord Notification (success/failure) +``` + +**Timing Integration with Restarts:** +- **3:30 AM** - Backups start (before restart cycle) +- **4:00 AM** - Staggered restarts start (after backups complete) + +This ensures fresh backups before any server restarts.
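The "Apply Retention Policy" step in the flow above is the most algorithmic part of the plan, so a minimal sketch may help. It assumes each backup is identified by its calendar date, that "weekly" means a Sunday backup and "monthly" means a 1st-of-month backup (as the Features section states), and that a backup is kept if any tier wants it. The names `RETENTION` and `backups_to_keep` are hypothetical and are not taken from the actual `world-backup.py`.

```python
from datetime import date, timedelta

# Counts come from the plan (7 daily, 4 weekly, 12 monthly).
# The dict layout and function name are illustrative assumptions.
RETENTION = {"daily": 7, "weekly": 4, "monthly": 12}


def backups_to_keep(backup_dates, retention=RETENTION):
    """Return the set of backup dates that the tiered policy would retain."""
    newest_first = sorted(set(backup_dates), reverse=True)
    # Daily tier: the most recent N backups, whatever day they fall on.
    daily = newest_first[: retention["daily"]]
    # Weekly tier: the most recent N Sunday backups (date.weekday() == 6).
    weekly = [d for d in newest_first if d.weekday() == 6][: retention["weekly"]]
    # Monthly tier: the most recent N backups taken on the 1st of a month.
    monthly = [d for d in newest_first if d.day == 1][: retention["monthly"]]
    # A backup survives if at least one tier keeps it.
    return set(daily) | set(weekly) | set(monthly)


if __name__ == "__main__":
    today = date(2026, 2, 17)
    history = [today - timedelta(days=n) for n in range(400)]
    keep = backups_to_keep(history)
    for d in sorted(keep, reverse=True):
        print(d.isoformat(), d.strftime("%a"))
    print(f"kept {len(keep)} of {len(history)} backups")
```

Everything not in the returned set is a deletion candidate. Computing the keep-set first and deleting second keeps the destructive step separate, so a dry run can log what would be removed before anything is deleted from remote storage.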
+ +--- + +## Features + +### Core Features + +**✅ Automated Daily Backups** +- Runs automatically via cron +- All 11 servers backed up +- World files only (not plugins/configs - those are in git) +- Compressed to save space (~80% reduction) + +**✅ Intelligent Retention Policy** +- **Daily:** Keep last 7 days +- **Weekly:** Keep last 4 weeks (Sunday backups) +- **Monthly:** Keep last 12 months (1st of month backups) +- Automatic cleanup of old backups +- Prevents storage bloat + +**✅ Off-Server Storage** +- Stored on NextCloud (downloads.firefrostgaming.com) +- Alternative: S3-compatible storage (Backblaze B2, Wasabi, etc.) +- Protects against node hardware failure +- Accessible from anywhere + +**✅ Discord Notifications** +- Backup started notification +- Per-server completion +- Final summary (success/failed/size) +- Error alerts + +**✅ Restoration Procedures** +- Documented step-by-step restoration +- Test restoration regularly +- Recovery time objective: < 1 hour + +--- + +## Server World Paths + +**TX1 Dallas servers:** +- Reclamation: `/var/lib/pterodactyl/volumes/{uuid}/world/` +- Stoneblock 4: `/var/lib/pterodactyl/volumes/{uuid}/world/` +- Society Sunlit Valley: `/var/lib/pterodactyl/volumes/{uuid}/world/` +- Vanilla 1.21.11: `/var/lib/pterodactyl/volumes/{uuid}/world/` +- All The Mons: `/var/lib/pterodactyl/volumes/{uuid}/world/` + +**NC1 Charlotte servers:** +- The Ember Project: `/var/lib/pterodactyl/volumes/{uuid}/world/` +- Minecolonies Create & Conquer: `/var/lib/pterodactyl/volumes/{uuid}/world/` +- All The Mods 10: `/var/lib/pterodactyl/volumes/{uuid}/world/` +- Homestead: `/var/lib/pterodactyl/volumes/{uuid}/world/` +- EMC Subterra Tech: `/var/lib/pterodactyl/volumes/{uuid}/world/` + +**Note:** Some modpacks use different world folder names (e.g., `dimensions/` for custom dimensions) + +--- + +## Storage Requirements + +### Per-Server Estimates + +**Compressed backup sizes (approximate):** +- Vanilla: ~50 MB +- Modded (light): ~200 MB +- Modded 
(heavy): ~500 MB +- ATM10: ~1 GB (largest) + +**Total daily backup:** ~4-5 GB (all 11 servers compressed) + +### Retention Storage + +**Daily (7 days):** 7 × 5 GB = 35 GB +**Weekly (4 weeks):** 4 × 5 GB = 20 GB +**Monthly (12 months):** 12 × 5 GB = 60 GB + +**Total storage needed:** ~115 GB (with compression) + +**Recommended:** 200 GB storage allocation (room to grow) + +--- + +## Implementation + +### Script Location + +**File:** `/opt/automation/world-backup.py` +**Config:** `/opt/automation/backup-config.json` +**Logs:** `/var/log/world-backup.log` +**Local staging:** `/opt/automation/backup-staging/` +**Remote storage:** `downloads.firefrostgaming.com:/backups/worlds/` + +### Configuration File (backup-config.json) + +```json +{ + "pterodactyl": { + "url": "https://panel.firefrostgaming.com", + "api_key": "PTERODACTYL_API_KEY_HERE", + "sftp_host": "us.tx1.firefrostgaming.com", + "sftp_port": 2022 + }, + "nextcloud": { + "webdav_url": "https://downloads.firefrostgaming.com/remote.php/dav/files/admin/", + "username": "admin", + "password": "NEXTCLOUD_PASSWORD_HERE", + "backup_path": "backups/worlds/" + }, + "discord": { + "webhook_url": "DISCORD_WEBHOOK_URL_HERE", + "notifications_enabled": true + }, + "backup_settings": { + "staging_dir": "/opt/automation/backup-staging", + "compression": "gzip", + "compression_level": 6, + "retention": { + "daily": 7, + "weekly": 4, + "monthly": 12 + } + }, + "servers": [ + { + "name": "Vanilla 1.21.11", + "uuid": "3bed1bda-f648-4630-801a-fe9f2e3d3f27", + "world_path": "world", + "node": "TX1" + }, + { + "name": "All The Mons", + "uuid": "668a5220-7e72-4379-9165-bdbb84bc9806", + "world_path": "world", + "node": "TX1" + }, + { + "name": "Stoneblock 4", + "uuid": "a0efbfe8-4b97-4a90-869d-ffe6d3072bd5", + "world_path": "world", + "node": "TX1" + }, + { + "name": "Society: Sunlit Valley", + "uuid": "9310d0a6-62a6-4fe6-82c4-eb483dc68876", + "world_path": "world", + "node": "TX1" + }, + { + "name": "Reclamation", + "uuid": 
"1eb33479-a6bc-4e8f-b64d-d1e4bfa0a8b4", + "world_path": "world", + "node": "TX1" + }, + { + "name": "The Ember Project", + "uuid": "124f9060-58a7-457a-b2cf-b4024fce2951", + "world_path": "world", + "node": "NC1" + }, + { + "name": "Minecolonies: Create and Conquer", + "uuid": "a14201d2-83b2-44e6-ae48-e6c4cbc56f24", + "world_path": "world", + "node": "NC1" + }, + { + "name": "All The Mods 10", + "uuid": "82e63949-8fbf-4a44-b32a-53324e8492bf", + "world_path": "world", + "node": "NC1" + }, + { + "name": "Homestead", + "uuid": "2f85d4ef-aa49-4dd6-b448-beb3fca1db12", + "world_path": "world", + "node": "NC1" + }, + { + "name": "EMC Subterra Tech", + "uuid": "09a95f38-9f8c-404a-9557-3a7c44258223", + "world_path": "world", + "node": "NC1" + } + ] +} +``` + +### Main Script Overview + +**See `deployments/world-backup/world-backup.py` for complete script** + +**Key functions:** +- `download_world_via_sftp(server)` - Download world files from Pterodactyl +- `compress_backup(server)` - Create tar.gz archive +- `upload_to_nextcloud(backup_file)` - Upload to remote storage +- `apply_retention_policy()` - Delete old backups per policy +- `discord_notify(message)` - Send notifications +- `main()` - Orchestrate backup process + +--- + +## Deployment Steps + +### Phase 1: Prerequisites (15 min) + +- [ ] NextCloud admin access (or S3 credentials) +- [ ] Pterodactyl API key with SFTP access +- [ ] Discord webhook for #server-status +- [ ] Command Center SSH access +- [ ] 200 GB storage available on NextCloud/S3 + +### Phase 2: Setup Storage (15 min) + +**Option A: NextCloud (Recommended)** + +```bash +# Create backup directory in NextCloud +# Via web interface: Create folder "backups/worlds/" + +# Or via WebDAV: +curl -X MKCOL -u admin:PASSWORD \ + https://downloads.firefrostgaming.com/remote.php/dav/files/admin/backups/ + +curl -X MKCOL -u admin:PASSWORD \ + https://downloads.firefrostgaming.com/remote.php/dav/files/admin/backups/worlds/ +``` + +**Option B: S3-Compatible 
(Alternative)** + +- Configure AWS S3, Backblaze B2, or Wasabi bucket +- Create bucket: `firefrost-world-backups` +- Generate access key/secret +- Update script to use boto3/s3cmd instead of WebDAV + +### Phase 3: Install Script (20 min) + +```bash +# On Command Center +mkdir -p /opt/automation/backup-staging +cd /opt/automation + +# Create config +nano backup-config.json +# Paste config, update credentials + +# Create script +nano world-backup.py +# Paste script + +# Make executable +chmod +x world-backup.py + +# Install dependencies +pip3 install requests paramiko --break-system-packages + +# Create log file +touch /var/log/world-backup.log +chmod 644 /var/log/world-backup.log +``` + +### Phase 4: Test Backup (30 min) + +```bash +# Test with ONE server first (Vanilla) +# Edit config to include only Vanilla server temporarily + +# Run manually +python3 /opt/automation/world-backup.py + +# Verify: +# - World downloaded from Pterodactyl +# - Compressed successfully +# - Uploaded to NextCloud +# - Discord notification received +# - Backup appears in NextCloud + +# Check backup file +# Download and extract to verify integrity +``` + +### Phase 5: Schedule with Cron (10 min) + +```bash +# Edit crontab +crontab -e + +# Add daily backup at 3:30 AM (before restart cycle) +30 3 * * * /usr/bin/python3 /opt/automation/world-backup.py >> /var/log/world-backup.log 2>&1 +``` + +**Why 3:30 AM?** +- Low player activity +- Before 4:00 AM restart cycle +- Ensures fresh backup before restarts +- Backups complete before restarts begin + +### Phase 6: Test Restoration (20 min) + +**CRITICAL: Test restoration on a TEST server BEFORE you need it!** + +```bash +# Download backup from NextCloud +# Extract tar.gz +# Stop test server +# Replace world folder +# Start server +# Verify world loads correctly +``` + +--- + +## Retention Policy Logic + +**How it works:** + +1. **Daily backups:** Every backup is initially a "daily" backup +2. 
**Weekly promotion:** Sunday backups are kept as "weekly" (in addition to daily) +3. **Monthly promotion:** 1st of month backups are kept as "monthly" (in addition to any daily or weekly status) + +**Example retention:** +``` +Daily (7 days): +- Feb 17 (today) +- Feb 16 +- Feb 15 +- Feb 14 +- Feb 13 +- Feb 12 +- Feb 11 + +Weekly (4 weeks) - Sundays only: +- Feb 15 (Sunday) +- Feb 8 (Sunday) +- Feb 1 (Sunday) +- Jan 25 (Sunday) + +Monthly (12 months) - 1st of month: +- Feb 1 +- Jan 1 +- Dec 1, 2025 +- Nov 1, 2025 +... (back 12 months) +``` + +**Old backups deleted:** +- Daily backups older than 7 days (unless promoted to weekly/monthly) +- Weekly backups older than 4 weeks (unless promoted to monthly) +- Monthly backups older than 12 months + +--- + +## Restoration Procedures + +### Standard Restoration (World Corruption) + +**Scenario:** Server world corrupted, need to restore from backup + +**Steps:** + +1. **Stop the affected server** (via Pterodactyl panel) + +2. **Download latest backup from NextCloud** +```bash +# Find latest backup +ls backups/worlds/[server-name]/ + +# Download +wget https://downloads.firefrostgaming.com/backups/worlds/[server-name]/[backup-file].tar.gz +``` + +3. **Connect to server via SFTP** +```bash +sftp -P 2022 admin@us.tx1.firefrostgaming.com +# (or NC1 host for NC1 servers) +``` + +4. **Back up current world (just in case)** +```bash +cd /var/lib/pterodactyl/volumes/[uuid]/ +mv world world-corrupted-backup-$(date +%Y%m%d) +``` + +5. **Extract backup** +```bash +tar -xzf [backup-file].tar.gz +``` + +6. **Verify world folder exists and has proper permissions** + +7. **Start server** (via Pterodactyl panel) + +8. **Test in-game** - Join and verify world loaded correctly + +9. **Monitor logs** for any errors + +**Estimated recovery time:** 15-30 minutes + +### Disaster Recovery (Node Failure) + +**Scenario:** Entire TX1 or NC1 node failed, need to restore all servers + +**Steps:** + +1. **Provision new node** (or repair existing) + +2.
**Reinstall Pterodactyl Wings** + +3. **Download ALL backups for that node** + +4. **Restore each server world** (use parallel restoration if possible) + +5. **Update DNS/IP mappings** if IPs changed + +6. **Test each server** + +**Estimated recovery time:** 2-4 hours (depends on number of servers and backup size) + +--- + +## Discord Notifications + +### Notification Examples + +**Backup Started:** +``` +💾 **World Backup Started** +Servers: 11 +Estimated duration: ~30 minutes +``` + +**Per-Server:** +``` +✅ **Vanilla 1.21.11** backed up +Size: 47 MB compressed +Duration: 2m 15s +``` + +**Completion:** +``` +✅ **Backup Cycle Complete** +Successful: 11/11 +Failed: 0 +Total size: 4.2 GB compressed +Duration: 28 minutes + +Next backup: Tomorrow 3:30 AM +``` + +**Error:** +``` +❌ **Backup Failed: ATM10** +Reason: SFTP connection timeout +Attempt: 3/3 +Action: Manual backup required +``` + +--- + +## Monitoring & Maintenance + +### Daily + +- Check Discord for backup completion notification +- Verify no failed backups + +### Weekly + +- Review backup logs for errors +- Check NextCloud storage usage +- Verify retention policy working + +### Monthly + +- Test restoration on one random server +- Review backup sizes (growing unexpectedly?) 
+- Verify off-server storage accessible +- Check that monthly backups are being promoted correctly + +### Quarterly + +- Full disaster recovery drill (restore all servers to test environment) +- Review and update restoration procedures +- Verify backup script still compatible with Pterodactyl updates + +--- + +## Troubleshooting + +### SFTP connection fails + +**Check:** +- Pterodactyl SFTP service running on node +- Firewall allows port 2022 +- API key has SFTP permissions +- Correct hostname (us.tx1 vs us.nc1) + +### Backup file corrupt + +**Verify:** +- Download completed successfully (check file size) +- Compression succeeded (try manual tar -xzf test) +- Upload to NextCloud succeeded +- NextCloud not corrupting on upload + +### Retention policy not deleting old backups + +**Debug:** +- Check backup filenames follow expected format +- Verify date parsing logic in script +- Check NextCloud permissions (can delete?) +- Review logs for deletion attempts + +### NextCloud storage full + +**Actions:** +- Check retention policy running +- Manually delete very old backups if needed +- Consider upgrading NextCloud storage +- Enable NextCloud auto-delete for old files + +--- + +## Advanced Features (Phase 2) + +**Incremental backups:** +- Only backup changed files +- Significant storage savings +- Faster backup times +- More complex restoration + +**Encrypted backups:** +- Encrypt before upload +- GPG or age encryption +- Protects against NextCloud compromise +- Key management required + +**Multi-region storage:** +- Upload to multiple locations +- S3 + NextCloud redundancy +- Geographic disaster protection +- Higher costs + +**Real-time backups:** +- Continuous backup on world save +- Near-zero data loss +- Much higher storage usage +- More complex implementation + +--- + +## Related Tasks + +- **Staggered Server Restart System** - Runs after backups complete +- **Netdata Deployment** - Monitor backup job duration +- **NextCloud Setup** - Storage backend +- **Discord 
Reorganization** - #server-status for notifications + +--- + +**Fire + Frost + Foundation = Where Love Builds Legacy** 💙🔥❄️ + +--- + +**Document Status:** COMPLETE +**Ready for Implementation:** When SSH access available (1-2 hours) +**Dependencies:** NextCloud or S3, Pterodactyl API, Command Center access +**Storage Needed:** 200 GB recommended