docs: Complete World Backup Automation deployment plan

Created comprehensive automated backup system for all 11 Minecraft servers:

Deployment Plan (500+ lines):
- Architecture and timing (integrates with restart system)
- Daily backups at 3:30 AM (before 4 AM restart cycle)
- Intelligent retention policy (7 daily, 4 weekly, 12 monthly)
- Off-server storage via NextCloud WebDAV or S3
- Complete storage requirements calculation (~115 GB needed)
- Per-server world paths and configuration
- 6-phase deployment guide
- Discord notifications throughout process

Features:
- Automated daily backups compressed to tar.gz (~80% size reduction)
- Total daily backup: ~4-5 GB (all 11 servers)
- Smart retention prevents storage bloat
- Disaster recovery procedures documented
- Standard restoration: 15-30 minutes
- Full node recovery: 2-4 hours

Restoration Procedures:
- Step-by-step world corruption recovery
- Disaster recovery for full node failure
- Regular restoration testing schedule
- Recovery time objectives defined

Troubleshooting:
- SFTP connection issues
- Backup corruption detection
- Retention policy debugging
- Storage management

Advanced features roadmap: incremental backups, encryption, multi-region storage, real-time backups.

Ready to deploy when SSH access available (1-2 hours setup).

Task: World Backup Automation (Tier 3)
FFG-STD-002 compliant
621
docs/tasks/world-backup-automation/deployment-plan.md
Normal file
@@ -0,0 +1,621 @@
# World Backup Automation - Deployment Plan

**Status:** Planning Complete, Ready to Implement
**Priority:** Tier 3 - Disaster Recovery
**Time Estimate:** 1-2 hours implementation
**Last Updated:** 2026-02-17

---

## Overview

Automated backup system for all 11 Minecraft server worlds. Scheduled daily backups with an intelligent retention policy (7 daily, 4 weekly, 12 monthly) and off-server storage for disaster recovery.

**The Problem:**
- No automated backups = risk of data loss
- Manual backups = inconsistent, time-consuming
- No off-server storage = vulnerable to hardware failure
- No retention policy = storage bloat or premature deletion

**The Solution:**
- Automated Python script with Pterodactyl SFTP integration
- Daily backups at off-peak hours (integrated with the restart system)
- Smart retention: 7 daily, 4 weekly, 12 monthly backups
- Off-server storage (NextCloud or S3-compatible)
- Compression (tar.gz) to save space
- Restoration procedures documented
- Discord notifications

---

## Architecture

```
Backup Schedule (Daily 3:30 AM)
    ↓
Python Backup Script
    ↓
Pterodactyl SFTP → Download world files
    ↓
Compress (tar.gz)
    ↓
Upload to NextCloud/S3
    ↓
Apply Retention Policy (delete old backups)
    ↓
Discord Notification (success/failure)
```
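
The flow above can be sketched as a small orchestration loop. This is a minimal sketch, not the actual `world-backup.py`: the function name `run_backup_cycle`, its callable parameters, and the choice to skip retention when every server failed are all assumptions made for illustration.

```python
def run_backup_cycle(servers, download, compress, upload, apply_retention, notify):
    """Run download -> compress -> upload per server, then retention and a summary.

    All five callables are hypothetical stand-ins for the real pipeline stages.
    """
    succeeded, failed = [], []
    notify(f"World Backup Started\nServers: {len(servers)}")
    for server in servers:
        try:
            world_dir = download(server)
            archive = compress(server, world_dir)
            upload(server, archive)
            succeeded.append(server["name"])
        except Exception as exc:
            # One failing server must not abort the rest of the cycle.
            failed.append(server["name"])
            notify(f"Backup Failed: {server['name']}\nReason: {exc}")
    # Assumption: only prune old backups when at least one new backup landed.
    if succeeded:
        apply_retention()
    notify(
        "Backup Cycle Complete\n"
        f"Successful: {len(succeeded)}/{len(servers)}\n"
        f"Failed: {len(failed)}"
    )
    return succeeded, failed
```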

**Timing Integration with Restarts:**
- **3:30 AM** - Backups start (before restart cycle)
- **4:00 AM** - Staggered restarts start (after backups complete)

This ensures fresh backups before any server restarts.

---

## Features

### Core Features

**✅ Automated Daily Backups**
- Runs automatically via cron
- All 11 servers backed up
- World files only (not plugins/configs - those are in git)
- Compressed to save space (~80% reduction)

**✅ Intelligent Retention Policy**
- **Daily:** Keep last 7 days
- **Weekly:** Keep last 4 weeks (Sunday backups)
- **Monthly:** Keep last 12 months (1st of month backups)
- Automatic cleanup of old backups
- Prevents storage bloat

**✅ Off-Server Storage**
- Stored on NextCloud (downloads.firefrostgaming.com)
- Alternative: S3-compatible storage (Backblaze B2, Wasabi, etc.)
- Protects against node hardware failure
- Accessible from anywhere

**✅ Discord Notifications**
- Backup started notification
- Per-server completion
- Final summary (success/failed/size)
- Error alerts

**✅ Restoration Procedures**
- Documented step-by-step restoration
- Test restoration regularly
- Recovery time objective: < 1 hour for a single-world restore

---

## Server World Paths

**TX1 Dallas servers:**
- Reclamation: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- Stoneblock 4: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- Society: Sunlit Valley: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- Vanilla 1.21.11: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- All The Mons: `/var/lib/pterodactyl/volumes/{uuid}/world/`

**NC1 Charlotte servers:**
- The Ember Project: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- Minecolonies: Create and Conquer: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- All The Mods 10: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- Homestead: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- EMC Subterra Tech: `/var/lib/pterodactyl/volumes/{uuid}/world/`

**Note:** Some modpacks use different world folder names (e.g., `dimensions/` for custom dimensions).

---

## Storage Requirements

### Per-Server Estimates

**Compressed backup sizes (approximate):**
- Vanilla: ~50 MB
- Modded (light): ~200 MB
- Modded (heavy): ~500 MB
- ATM10: ~1 GB (largest)

**Total daily backup:** ~4-5 GB (all 11 servers compressed)

### Retention Storage

**Daily (7 days):** 7 × 5 GB = 35 GB
**Weekly (4 weeks):** 4 × 5 GB = 20 GB
**Monthly (12 months):** 12 × 5 GB = 60 GB

**Total storage needed:** ~115 GB (with compression)
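
The retention arithmetic above can be reproduced with a few lines of Python. This is a sketch under two assumptions: every retained snapshot is a full copy at the 5 GB upper bound of the daily estimate, and a backup that counts as both daily and weekly (or monthly) is counted once per tier, so the result is a worst case. The names `DAILY_BACKUP_GB` and `retention_storage_gb` are illustrative only.

```python
# Assumption: each retained snapshot is a full ~5 GB archive set.
DAILY_BACKUP_GB = 5
RETENTION = {"daily": 7, "weekly": 4, "monthly": 12}


def retention_storage_gb(per_backup_gb, retention):
    """Return (per-tier GB, total GB) for a full-copy retention scheme."""
    per_tier = {tier: count * per_backup_gb for tier, count in retention.items()}
    return per_tier, sum(per_tier.values())


per_tier, total_gb = retention_storage_gb(DAILY_BACKUP_GB, RETENTION)
print(per_tier)  # {'daily': 35, 'weekly': 20, 'monthly': 60}
print(total_gb)  # 115
```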

**Recommended:** 200 GB storage allocation (room to grow)

---

## Implementation

### Script Location

**File:** `/opt/automation/world-backup.py`
**Config:** `/opt/automation/backup-config.json`
**Logs:** `/var/log/world-backup.log`
**Local staging:** `/opt/automation/backup-staging/`
**Remote storage:** `downloads.firefrostgaming.com:/backups/worlds/`

### Configuration File (backup-config.json)

```json
{
  "pterodactyl": {
    "url": "https://panel.firefrostgaming.com",
    "api_key": "PTERODACTYL_API_KEY_HERE",
    "sftp_host": "us.tx1.firefrostgaming.com",
    "sftp_port": 2022
  },
  "nextcloud": {
    "webdav_url": "https://downloads.firefrostgaming.com/remote.php/dav/files/admin/",
    "username": "admin",
    "password": "NEXTCLOUD_PASSWORD_HERE",
    "backup_path": "backups/worlds/"
  },
  "discord": {
    "webhook_url": "DISCORD_WEBHOOK_URL_HERE",
    "notifications_enabled": true
  },
  "backup_settings": {
    "staging_dir": "/opt/automation/backup-staging",
    "compression": "gzip",
    "compression_level": 6,
    "retention": {
      "daily": 7,
      "weekly": 4,
      "monthly": 12
    }
  },
  "servers": [
    {
      "name": "Vanilla 1.21.11",
      "uuid": "3bed1bda-f648-4630-801a-fe9f2e3d3f27",
      "world_path": "world",
      "node": "TX1"
    },
    {
      "name": "All The Mons",
      "uuid": "668a5220-7e72-4379-9165-bdbb84bc9806",
      "world_path": "world",
      "node": "TX1"
    },
    {
      "name": "Stoneblock 4",
      "uuid": "a0efbfe8-4b97-4a90-869d-ffe6d3072bd5",
      "world_path": "world",
      "node": "TX1"
    },
    {
      "name": "Society: Sunlit Valley",
      "uuid": "9310d0a6-62a6-4fe6-82c4-eb483dc68876",
      "world_path": "world",
      "node": "TX1"
    },
    {
      "name": "Reclamation",
      "uuid": "1eb33479-a6bc-4e8f-b64d-d1e4bfa0a8b4",
      "world_path": "world",
      "node": "TX1"
    },
    {
      "name": "The Ember Project",
      "uuid": "124f9060-58a7-457a-b2cf-b4024fce2951",
      "world_path": "world",
      "node": "NC1"
    },
    {
      "name": "Minecolonies: Create and Conquer",
      "uuid": "a14201d2-83b2-44e6-ae48-e6c4cbc56f24",
      "world_path": "world",
      "node": "NC1"
    },
    {
      "name": "All The Mods 10",
      "uuid": "82e63949-8fbf-4a44-b32a-53324e8492bf",
      "world_path": "world",
      "node": "NC1"
    },
    {
      "name": "Homestead",
      "uuid": "2f85d4ef-aa49-4dd6-b448-beb3fca1db12",
      "world_path": "world",
      "node": "NC1"
    },
    {
      "name": "EMC Subterra Tech",
      "uuid": "09a95f38-9f8c-404a-9557-3a7c44258223",
      "world_path": "world",
      "node": "NC1"
    }
  ]
}
```
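
Before the first run it may help to check the config for missing sections and leftover placeholders. The following is a minimal sketch, assuming the key names shown in the JSON above; `validate_config` and `load_config` are hypothetical helper names and are not confirmed to exist in the actual script.

```python
import json

REQUIRED_SECTIONS = ("pterodactyl", "nextcloud", "discord", "backup_settings", "servers")
REQUIRED_SERVER_KEYS = ("name", "uuid", "world_path", "node")


def validate_config(config):
    """Return a list of problems; an empty list means the config looks usable."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if section not in config:
            problems.append(f"missing section: {section}")
    for i, server in enumerate(config.get("servers", [])):
        for key in REQUIRED_SERVER_KEYS:
            if key not in server:
                problems.append(f"servers[{i}] missing key: {key}")
    # Placeholders such as PTERODACTYL_API_KEY_HERE all end in "_HERE".
    for section in ("pterodactyl", "nextcloud", "discord"):
        for key, value in config.get(section, {}).items():
            if isinstance(value, str) and value.endswith("_HERE"):
                problems.append(f"{section}.{key} still holds a placeholder")
    return problems


def load_config(path="/opt/automation/backup-config.json"):
    """Load the JSON config and fail fast if validation finds problems."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    problems = validate_config(config)
    if problems:
        raise ValueError("invalid backup config: " + "; ".join(problems))
    return config
```

Failing fast here avoids a 3:30 AM run that downloads every world and only then discovers the NextCloud password was never filled in.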

### Main Script Overview

**See `deployments/world-backup/world-backup.py` for complete script**

**Key functions:**
- `download_world_via_sftp(server)` - Download world files from Pterodactyl
- `compress_backup(server)` - Create tar.gz archive
- `upload_to_nextcloud(backup_file)` - Upload to remote storage
- `apply_retention_policy()` - Delete old backups per policy
- `discord_notify(message)` - Send notifications
- `main()` - Orchestrate backup process
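
As an illustration of the compression step, here is a minimal sketch using only the standard library. It is not the contents of `world-backup.py`: the helper names `archive_name` and `compress_world`, their signatures, and the `<slug>_<YYYY-MM-DD>.tar.gz` filename format are assumptions. The default level of 6 mirrors `compression_level` in the config.

```python
import re
import tarfile
from datetime import date
from pathlib import Path


def archive_name(server_name, day):
    """Build a filesystem-safe archive name (assumed naming convention)."""
    slug = re.sub(r"[^a-z0-9]+", "-", server_name.lower()).strip("-")
    return f"{slug}_{day.isoformat()}.tar.gz"


def compress_world(world_dir, staging_dir, server_name, day=None, level=6):
    """Pack world_dir into a gzip-compressed tarball inside staging_dir."""
    world_dir = Path(world_dir)
    staging_dir = Path(staging_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)
    target = staging_dir / archive_name(server_name, day or date.today())
    with tarfile.open(target, "w:gz", compresslevel=level) as tar:
        # Store paths relative to the world folder so extraction recreates "world/".
        tar.add(world_dir, arcname=world_dir.name)
    return target
```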

---

## Deployment Steps

### Phase 1: Prerequisites (15 min)

- [ ] NextCloud admin access (or S3 credentials)
- [ ] Pterodactyl API key with SFTP access
- [ ] Discord webhook for #server-status
- [ ] Command Center SSH access
- [ ] 200 GB storage available on NextCloud/S3

### Phase 2: Setup Storage (15 min)

**Option A: NextCloud (Recommended)**

```bash
# Create backup directory in NextCloud
# Via web interface: Create folder "backups/worlds/"

# Or via WebDAV:
curl -X MKCOL -u admin:PASSWORD \
  https://downloads.firefrostgaming.com/remote.php/dav/files/admin/backups/

curl -X MKCOL -u admin:PASSWORD \
  https://downloads.firefrostgaming.com/remote.php/dav/files/admin/backups/worlds/
```
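
The same WebDAV endpoint accepts file uploads via HTTP `PUT`. Below is a minimal standard-library sketch of how an upload could look; the function names are hypothetical, and the plan's actual script installs `requests`, so it may do this differently. The file is streamed with an explicit `Content-Length` so a 1 GB archive is not read into memory.

```python
import base64
import os
import urllib.request
from urllib.parse import quote


def webdav_target(webdav_url, backup_path, filename):
    """Join the WebDAV base URL, backup folder and filename into one URL."""
    return webdav_url.rstrip("/") + "/" + quote(backup_path.strip("/")) + "/" + quote(filename)


def build_put_request(url, data, username, password, length=None):
    """Create a PUT request with HTTP Basic auth; data may be bytes or a file object."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req = urllib.request.Request(url, data=data, method="PUT")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/gzip")
    if length is not None:
        req.add_header("Content-Length", str(length))
    return req


def upload_file(local_path, url, username, password, timeout=300):
    """Stream local_path to the WebDAV URL and return the HTTP status code."""
    with open(local_path, "rb") as fh:
        req = build_put_request(url, fh, username, password, os.path.getsize(local_path))
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
```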

**Option B: S3-Compatible (Alternative)**

- Configure AWS S3, Backblaze B2, or Wasabi bucket
- Create bucket: `firefrost-world-backups`
- Generate access key/secret
- Update script to use boto3/s3cmd instead of WebDAV

### Phase 3: Install Script (20 min)

```bash
# On Command Center
mkdir -p /opt/automation/backup-staging
cd /opt/automation

# Create config
nano backup-config.json
# Paste config, update credentials

# Create script
nano world-backup.py
# Paste script

# Make executable
chmod +x world-backup.py

# Install dependencies
pip3 install requests paramiko --break-system-packages

# Create log file
touch /var/log/world-backup.log
chmod 644 /var/log/world-backup.log
```

### Phase 4: Test Backup (30 min)

```bash
# Test with ONE server first (Vanilla)
# Edit config to include only Vanilla server temporarily

# Run manually
python3 /opt/automation/world-backup.py

# Verify:
# - World downloaded from Pterodactyl
# - Compressed successfully
# - Uploaded to NextCloud
# - Discord notification received
# - Backup appears in NextCloud

# Check backup file
# Download and extract to verify integrity
```
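
The integrity check in the last step can also be scripted. This is a minimal sketch, assuming the backups are ordinary `tar.gz` archives; `verify_archive` is a hypothetical helper name. Reading every member to the end forces the gzip layer to check its CRC, so truncated or corrupted archives raise an error instead of passing silently.

```python
import tarfile
import zlib


def verify_archive(path):
    """Return (ok, file_count) after reading every file in the archive."""
    try:
        count = 0
        with tarfile.open(path, "r:gz") as tar:
            for member in tar:
                if member.isfile():
                    fh = tar.extractfile(member)
                    # Read in 1 MiB chunks so large region files stay out of memory.
                    while fh.read(1024 * 1024):
                        pass
                    count += 1
        return True, count
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        return False, 0
```

A file count of zero on an archive that opens cleanly is also worth treating as a failure, since it suggests the wrong folder was backed up.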

### Phase 5: Schedule with Cron (10 min)

```bash
# Edit crontab
crontab -e

# Add daily backup at 3:30 AM (before restart cycle)
30 3 * * * /usr/bin/python3 /opt/automation/world-backup.py >> /var/log/world-backup.log 2>&1
```

**Why 3:30 AM?**
- Low player activity
- Runs before the 4:00 AM restart cycle, so every restart is preceded by a fresh backup
- Leaves a 30-minute window, which matches the estimated ~30-minute backup duration

### Phase 6: Test Restoration (20 min)

**CRITICAL: Test restoration on a TEST server BEFORE you need it!**

```bash
# Download backup from NextCloud
# Extract tar.gz
# Stop test server
# Replace world folder
# Start server
# Verify world loads correctly
```

---

## Retention Policy Logic

**How it works:**

1. **Daily backups:** Every backup is initially a "daily" backup
2. **Weekly promotion:** Sunday backups are kept as "weekly" (in addition to daily)
3. **Monthly promotion:** 1st of month backups are kept as "monthly" (in addition to daily, and to weekly when the 1st falls on a Sunday)

**Example retention:**
```
Daily (7 days):
- Feb 17 (today)
- Feb 16
- Feb 15
- Feb 14
- Feb 13
- Feb 12
- Feb 11

Weekly (4 weeks) - Sundays only:
- Feb 15 (Sunday, also within the daily window)
- Feb 8 (Sunday)
- Feb 1 (Sunday, also the monthly backup)
- Jan 25 (Sunday)

Monthly (12 months) - 1st of month:
- Feb 1
- Jan 1
- Dec 1, 2025
- Nov 1, 2025
... (back 12 months)
```

**Old backups deleted:**
- Daily backups older than 7 days (unless promoted to weekly/monthly)
- Weekly backups older than 4 weeks (unless promoted to monthly)
- Monthly backups older than 12 months
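
The promotion rules above can be expressed as a pure function over backup dates. This is a minimal sketch, not the script's actual `apply_retention_policy()`: the name `backups_to_keep` is hypothetical, and it assumes "last 4 weeks" means the four most recent Sunday backups and "last 12 months" the twelve most recent 1st-of-month backups.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, daily=7, weekly=4, monthly=12):
    """Return the set of backup dates the policy retains; the rest can be deleted."""
    dates = sorted(set(backup_dates), reverse=True)  # newest first
    # Daily tier: anything from the last `daily` days, including today.
    keep = {d for d in dates if 0 <= (today - d).days < daily}
    # Weekly tier: the most recent Sunday backups (weekday() == 6 is Sunday).
    keep.update([d for d in dates if d.weekday() == 6][:weekly])
    # Monthly tier: the most recent 1st-of-month backups.
    keep.update([d for d in dates if d.day == 1][:monthly])
    return keep


# Usage: one backup per day from 2025-01-01 through 2026-02-17.
today = date(2026, 2, 17)
all_dates = [date(2025, 1, 1) + timedelta(days=i) for i in range((today - date(2025, 1, 1)).days + 1)]
kept = backups_to_keep(all_dates, today)
to_delete = sorted(set(all_dates) - kept)
```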

---

## Restoration Procedures

### Standard Restoration (World Corruption)

**Scenario:** Server world corrupted, need to restore from backup

**Steps:**

1. **Stop the affected server** (via Pterodactyl panel)

2. **Download latest backup from NextCloud**
   ```bash
   # Find latest backup (WebDAV directory listing)
   curl -u admin:PASSWORD -X PROPFIND -H "Depth: 1" \
     https://downloads.firefrostgaming.com/remote.php/dav/files/admin/backups/worlds/[server-name]/

   # Download
   curl -u admin:PASSWORD -O \
     https://downloads.firefrostgaming.com/remote.php/dav/files/admin/backups/worlds/[server-name]/[backup-file].tar.gz
   ```

3. **Connect to server via SFTP**
   ```bash
   sftp -P 2022 admin@us.tx1.firefrostgaming.com
   # (or NC1 host for NC1 servers)
   ```

4. **Backup current world (just in case)**
   ```bash
   # Run in an SSH shell on the node; the sftp prompt has no mv or $(date) support
   cd /var/lib/pterodactyl/volumes/[uuid]/
   mv world world-corrupted-backup-$(date +%Y%m%d)
   ```

5. **Extract backup**
   ```bash
   tar -xzf [backup-file].tar.gz
   ```

6. **Verify world folder exists and has proper permissions**

7. **Start server** (via Pterodactyl panel)

8. **Test in-game** - Join and verify world loaded correctly

9. **Monitor logs** for any errors

**Estimated recovery time:** 15-30 minutes
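
When the extraction step is scripted rather than run by hand, it is worth refusing archives whose entries would land outside the server volume. The sketch below is illustrative only: `is_within` and `extract_backup` are hypothetical names, and the check covers entry names only, not symlinks stored inside the archive.

```python
import os
import tarfile


def is_within(dest, member_name):
    """True if member_name, extracted under dest, stays inside dest."""
    dest = os.path.realpath(dest)
    target = os.path.realpath(os.path.join(dest, member_name))
    return os.path.commonpath([dest, target]) == dest


def extract_backup(archive_path, dest):
    """Extract a world backup into dest after rejecting escaping entry names."""
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        escaping = [m.name for m in members if not is_within(dest, m.name)]
        if escaping:
            raise ValueError(f"archive entries escape {dest}: {escaping}")
        tar.extractall(dest, members=members)
```

On recent Python versions, `tarfile` extraction filters offer a stricter built-in alternative to this manual check.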

### Disaster Recovery (Node Failure)

**Scenario:** Entire TX1 or NC1 node failed, need to restore all servers

**Steps:**

1. **Provision new node** (or repair existing)

2. **Reinstall Pterodactyl Wings**

3. **Download ALL backups for that node**

4. **Restore each server world** (use parallel restoration if possible)

5. **Update DNS/IP mappings** if IPs changed

6. **Test each server**

**Estimated recovery time:** 2-4 hours (depends on number of servers and backup size)

---

## Discord Notifications

### Notification Examples

**Backup Started:**
```
💾 **World Backup Started**
Servers: 11
Estimated duration: ~30 minutes
```

**Per-Server:**
```
✅ **Vanilla 1.21.11** backed up
Size: 47 MB compressed
Duration: 2m 15s
```

**Completion:**
```
✅ **Backup Cycle Complete**
Successful: 11/11
Failed: 0
Total size: 4.2 GB compressed
Duration: 28 minutes

Next backup: Tomorrow 3:30 AM
```

**Error:**
```
❌ **Backup Failed: ATM10**
Reason: SFTP connection timeout
Attempt: 3/3
Action: Manual backup required
```
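
Messages like these can be posted to a Discord webhook as a JSON body with a `content` field, which Discord limits to 2000 characters. The following is a minimal standard-library sketch; the function names and the `User-Agent` value are assumptions, and the real `discord_notify(message)` may be implemented differently.

```python
import json
import urllib.request

DISCORD_CONTENT_LIMIT = 2000  # maximum length of the "content" field


def build_payload(message):
    """Wrap a message in the webhook JSON shape, truncating to the limit."""
    return {"content": message[:DISCORD_CONTENT_LIMIT]}


def format_summary(successful, failed, total_gb, minutes):
    """Render the completion message in the style of the examples above."""
    icon = "✅" if failed == 0 else "⚠️"
    return (
        f"{icon} **Backup Cycle Complete**\n"
        f"Successful: {successful}/{successful + failed}\n"
        f"Failed: {failed}\n"
        f"Total size: {total_gb:.1f} GB compressed\n"
        f"Duration: {minutes} minutes"
    )


def discord_notify(webhook_url, message, timeout=10):
    """POST the message to the webhook and return the HTTP status code."""
    body = json.dumps(build_payload(message)).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=body,
        method="POST",
        # An explicit User-Agent is set because the urllib default may be rejected.
        headers={"Content-Type": "application/json", "User-Agent": "world-backup/1.0"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```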

---

## Monitoring & Maintenance

### Daily

- Check Discord for backup completion notification
- Verify no failed backups

### Weekly

- Review backup logs for errors
- Check NextCloud storage usage
- Verify retention policy working

### Monthly

- Test restoration on one random server
- Review backup sizes (growing unexpectedly?)
- Verify off-server storage accessible
- Check that monthly backups are being promoted correctly

### Quarterly

- Full disaster recovery drill (restore all servers to test environment)
- Review and update restoration procedures
- Verify backup script still compatible with Pterodactyl updates

---

## Troubleshooting

### SFTP connection fails

**Check:**
- Pterodactyl SFTP service running on node
- Firewall allows port 2022
- API key has SFTP permissions
- Correct hostname (us.tx1 vs us.nc1)
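
Transient SFTP timeouts are usually worth retrying before raising an alert, in line with the "Attempt: 3/3" error notification. This is a minimal sketch of retry with exponential backoff; `with_retries` is a hypothetical helper, and the caught exception types are an assumption that would need extending (for example with `paramiko.SSHException`) in the real script.

```python
import time


def with_retries(operation, attempts=3, base_delay=5.0, sleep=time.sleep):
    """Call operation(); on a connection error wait 5s, 10s, ... and try again."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except (OSError, TimeoutError) as exc:
            last_error = exc
            if attempt < attempts:
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"failed after {attempts} attempts") from last_error
```

The `sleep` parameter exists so tests can pass a stub instead of actually waiting.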

### Backup file corrupt

**Verify:**
- Download completed successfully (check file size)
- Compression succeeded (try manual tar -xzf test)
- Upload to NextCloud succeeded
- NextCloud not corrupting on upload

### Retention policy not deleting old backups

**Debug:**
- Check backup filenames follow expected format
- Verify date parsing logic in script
- Check NextCloud permissions (can delete?)
- Review logs for deletion attempts

### NextCloud storage full

**Actions:**
- Check retention policy running
- Manually delete very old backups if needed
- Consider upgrading NextCloud storage
- Enable NextCloud auto-delete for old files

---

## Advanced Features (Phase 2)

**Incremental backups:**
- Only backup changed files
- Significant storage savings
- Faster backup times
- More complex restoration

**Encrypted backups:**
- Encrypt before upload
- GPG or age encryption
- Protects against NextCloud compromise
- Key management required

**Multi-region storage:**
- Upload to multiple locations
- S3 + NextCloud redundancy
- Geographic disaster protection
- Higher costs

**Real-time backups:**
- Continuous backup on world save
- Near-zero data loss
- Much higher storage usage
- More complex implementation

---

## Related Tasks

- **Staggered Server Restart System** - Runs after backups complete
- **Netdata Deployment** - Monitor backup job duration
- **NextCloud Setup** - Storage backend
- **Discord Reorganization** - #server-status for notifications

---

**Fire + Frost + Foundation = Where Love Builds Legacy** 💙🔥❄️

---

**Document Status:** COMPLETE
**Ready for Implementation:** When SSH access available (1-2 hours)
**Dependencies:** NextCloud or S3, Pterodactyl API, Command Center access
**Storage Needed:** 200 GB recommended