docs: Complete World Backup Automation deployment plan

Created comprehensive automated backup system for all 11 Minecraft servers:

Deployment Plan (500+ lines):
- Architecture and timing (integrates with restart system)
- Daily backups at 3:30 AM (before 4 AM restart cycle)
- Intelligent retention policy (7 daily, 4 weekly, 12 monthly)
- Off-server storage via NextCloud WebDAV or S3
- Complete storage requirements calculation (~115 GB needed)
- Per-server world paths and configuration
- 6-phase deployment guide
- Discord notifications throughout process

Features:
- Automated daily backups compressed to tar.gz (~80% size reduction)
- Total daily backup: ~4-5 GB (all 11 servers)
- Smart retention prevents storage bloat
- Disaster recovery procedures documented
- Standard restoration: 15-30 minutes
- Full node recovery: 2-4 hours

Restoration Procedures:
- Step-by-step world corruption recovery
- Disaster recovery for full node failure
- Regular restoration testing schedule
- Recovery time objectives defined

Troubleshooting:
- SFTP connection issues
- Backup corruption detection
- Retention policy debugging
- Storage management

Advanced features roadmap: incremental backups, encryption,
multi-region storage, real-time backups.

Ready to deploy when SSH access available (1-2 hours setup).

Task: World Backup Automation (Tier 3)
FFG-STD-002 compliant
Claude
2026-02-17 22:51:54 +00:00
parent 16ccf9ba7d
commit 8f4e25e903

@@ -0,0 +1,621 @@
# World Backup Automation - Deployment Plan
**Status:** Planning Complete, Ready to Implement
**Priority:** Tier 3 - Disaster Recovery
**Time Estimate:** 1-2 hours implementation
**Last Updated:** 2026-02-17
---
## Overview
Automated backup system for all 11 Minecraft server worlds. Scheduled daily backups with intelligent retention policy (7 daily, 4 weekly, 12 monthly) and off-server storage for disaster recovery.
**The Problem:**
- No automated backups = risk of data loss
- Manual backups = inconsistent, time-consuming
- No off-server storage = vulnerable to hardware failure
- No retention policy = storage bloat or premature deletion
**The Solution:**
- Automated Python script with Pterodactyl SFTP integration
- Daily backups at off-peak hours (integrate with restart system)
- Smart retention: 7 daily, 4 weekly, 12 monthly backups
- Off-server storage (NextCloud or S3-compatible)
- Compression (tar.gz) to save space
- Restoration procedures documented
- Discord notifications
---
## Architecture
```
Backup Schedule (Daily 3:30 AM)
        ↓
Python Backup Script
        ↓
Pterodactyl SFTP → Download world files
        ↓
Compress (tar.gz)
        ↓
Upload to NextCloud/S3
        ↓
Apply Retention Policy (delete old backups)
        ↓
Discord Notification (success/failure)
```
**Timing Integration with Restarts:**
- **3:30 AM** - Backups start (before restart cycle)
- **4:00 AM** - Staggered restarts start (after backups complete)
This ensures fresh backups before any server restarts.
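The flow above can be sketched as a per-server loop in which one failing server does not abort the rest of the cycle. This is a minimal sketch, not the actual script: `run_backup_cycle` and the `Step` signature are hypothetical names, and each step (download, compress, upload) is assumed to be a callable that raises on failure.

```python
from typing import Callable, Dict, List

# Hypothetical step signature: takes one server entry, raises on failure.
Step = Callable[[dict], None]


def run_backup_cycle(servers: List[dict], steps: List[Step]) -> Dict[str, List[str]]:
    """Run every step for each server and collect per-server results."""
    results: Dict[str, List[str]] = {"successful": [], "failed": []}
    for server in servers:
        try:
            for step in steps:
                step(server)
            results["successful"].append(server["name"])
        except Exception as exc:
            # Record the failure and continue with the next server.
            results["failed"].append(f'{server["name"]}: {exc}')
    return results
```

The returned lists map directly onto the "Successful" and "Failed" counts in the final Discord summary.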
---
## Features
### Core Features
**✅ Automated Daily Backups**
- Runs automatically via cron
- All 11 servers backed up
- World files only (not plugins/configs - those are in git)
- Compressed to save space (~80% reduction)
**✅ Intelligent Retention Policy**
- **Daily:** Keep last 7 days
- **Weekly:** Keep last 4 weeks (Sunday backups)
- **Monthly:** Keep last 12 months (1st of month backups)
- Automatic cleanup of old backups
- Prevents storage bloat
**✅ Off-Server Storage**
- Stored on NextCloud (downloads.firefrostgaming.com)
- Alternative: S3-compatible storage (Backblaze B2, Wasabi, etc.)
- Protects against node hardware failure
- Accessible from anywhere
**✅ Discord Notifications**
- Backup started notification
- Per-server completion
- Final summary (success/failed/size)
- Error alerts
**✅ Restoration Procedures**
- Documented step-by-step restoration
- Test restoration regularly
- Recovery time objective: < 1 hour for a single-server restoration (full node recovery: 2-4 hours)
---
## Server World Paths
**TX1 Dallas servers:**
- Reclamation: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- Stoneblock 4: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- Society Sunlit Valley: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- Vanilla 1.21.11: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- All The Mons: `/var/lib/pterodactyl/volumes/{uuid}/world/`
**NC1 Charlotte servers:**
- The Ember Project: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- Minecolonies Create & Conquer: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- All The Mods 10: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- Homestead: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- EMC Subterra Tech: `/var/lib/pterodactyl/volumes/{uuid}/world/`
**Note:** Some modpacks use different world folder names (e.g., `dimensions/` for custom dimensions)
---
## Storage Requirements
### Per-Server Estimates
**Compressed backup sizes (approximate):**
- Vanilla: ~50 MB
- Modded (light): ~200 MB
- Modded (heavy): ~500 MB
- ATM10: ~1 GB (largest)
**Total daily backup:** ~4-5 GB (all 11 servers compressed)
### Retention Storage
**Daily (7 days):** 7 × 5 GB = 35 GB
**Weekly (4 weeks):** 4 × 5 GB = 20 GB
**Monthly (12 months):** 12 × 5 GB = 60 GB
**Total storage needed:** ~115 GB (with compression)
**Recommended:** 200 GB storage allocation (room to grow)
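The estimate above is simply the daily backup size multiplied by the number of retained copies. A small sketch of that arithmetic follows; `retention_storage_gb` is a hypothetical helper. It assumes every retained backup is stored as a separate full copy, so it is an upper bound if promoted backups are shared between tiers.

```python
def retention_storage_gb(daily_gb: float, daily: int = 7, weekly: int = 4, monthly: int = 12) -> float:
    """Upper-bound storage estimate: one full copy per retained backup."""
    return daily_gb * (daily + weekly + monthly)


print(retention_storage_gb(5))  # 5 GB x (7 + 4 + 12) copies = 115
```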
---
## Implementation
### Script Location
**File:** `/opt/automation/world-backup.py`
**Config:** `/opt/automation/backup-config.json`
**Logs:** `/var/log/world-backup.log`
**Local staging:** `/opt/automation/backup-staging/`
**Remote storage:** `downloads.firefrostgaming.com:/backups/worlds/`
### Configuration File (backup-config.json)
```json
{
"pterodactyl": {
"url": "https://panel.firefrostgaming.com",
"api_key": "PTERODACTYL_API_KEY_HERE",
"sftp_host": "us.tx1.firefrostgaming.com",
"sftp_port": 2022
},
"nextcloud": {
"webdav_url": "https://downloads.firefrostgaming.com/remote.php/dav/files/admin/",
"username": "admin",
"password": "NEXTCLOUD_PASSWORD_HERE",
"backup_path": "backups/worlds/"
},
"discord": {
"webhook_url": "DISCORD_WEBHOOK_URL_HERE",
"notifications_enabled": true
},
"backup_settings": {
"staging_dir": "/opt/automation/backup-staging",
"compression": "gzip",
"compression_level": 6,
"retention": {
"daily": 7,
"weekly": 4,
"monthly": 12
}
},
"servers": [
{
"name": "Vanilla 1.21.11",
"uuid": "3bed1bda-f648-4630-801a-fe9f2e3d3f27",
"world_path": "world",
"node": "TX1"
},
{
"name": "All The Mons",
"uuid": "668a5220-7e72-4379-9165-bdbb84bc9806",
"world_path": "world",
"node": "TX1"
},
{
"name": "Stoneblock 4",
"uuid": "a0efbfe8-4b97-4a90-869d-ffe6d3072bd5",
"world_path": "world",
"node": "TX1"
},
{
"name": "Society: Sunlit Valley",
"uuid": "9310d0a6-62a6-4fe6-82c4-eb483dc68876",
"world_path": "world",
"node": "TX1"
},
{
"name": "Reclamation",
"uuid": "1eb33479-a6bc-4e8f-b64d-d1e4bfa0a8b4",
"world_path": "world",
"node": "TX1"
},
{
"name": "The Ember Project",
"uuid": "124f9060-58a7-457a-b2cf-b4024fce2951",
"world_path": "world",
"node": "NC1"
},
{
"name": "Minecolonies: Create and Conquer",
"uuid": "a14201d2-83b2-44e6-ae48-e6c4cbc56f24",
"world_path": "world",
"node": "NC1"
},
{
"name": "All The Mods 10",
"uuid": "82e63949-8fbf-4a44-b32a-53324e8492bf",
"world_path": "world",
"node": "NC1"
},
{
"name": "Homestead",
"uuid": "2f85d4ef-aa49-4dd6-b448-beb3fca1db12",
"world_path": "world",
"node": "NC1"
},
{
"name": "EMC Subterra Tech",
"uuid": "09a95f38-9f8c-404a-9557-3a7c44258223",
"world_path": "world",
"node": "NC1"
}
]
}
```
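Because the config ships with `..._HERE` placeholders, it may be worth failing fast when one is left in place. The sketch below is an assumption about how the script could load and check the file; `validate_config`, `find_placeholders`, and `load_config` are hypothetical names, and the required sections and server fields are taken from the JSON above.

```python
import json
from pathlib import Path
from typing import List

REQUIRED_SECTIONS = ("pterodactyl", "nextcloud", "discord", "backup_settings", "servers")


def find_placeholders(node, prefix: str = "") -> List[str]:
    """Return dotted paths of string values that still end in '_HERE'."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            found += find_placeholders(value, f"{prefix}{key}.")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found += find_placeholders(value, f"{prefix}{index}.")
    elif isinstance(node, str) and node.endswith("_HERE"):
        found.append(prefix.rstrip("."))
    return found


def validate_config(config: dict) -> List[str]:
    """Return a list of human-readable problems (empty when the config looks usable)."""
    problems = [f"missing section: {name}" for name in REQUIRED_SECTIONS if name not in config]
    problems += [f"placeholder not replaced: {path}" for path in find_placeholders(config)]
    for server in config.get("servers", []):
        for field in ("name", "uuid", "world_path", "node"):
            if field not in server:
                problems.append(f"server entry missing '{field}': {server.get('name', '?')}")
    return problems


def load_config(path: str) -> dict:
    """Read the JSON file and raise ValueError if validation finds problems."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    problems = validate_config(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config
```

Calling `load_config("/opt/automation/backup-config.json")` at the top of `main()` would stop a run before any SFTP or WebDAV traffic when credentials have not been filled in.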
### Main Script Overview
**See `deployments/world-backup/world-backup.py` for complete script**
**Key functions:**
- `download_world_via_sftp(server)` - Download world files from Pterodactyl
- `compress_backup(server)` - Create tar.gz archive
- `upload_to_nextcloud(backup_file)` - Upload to remote storage
- `apply_retention_policy()` - Delete old backups per policy
- `discord_notify(message)` - Send notifications
- `main()` - Orchestrate backup process
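As an illustration of the compression step, here is a minimal sketch using the standard-library `tarfile` module. It is not the real `compress_backup(server)`; the name `compress_world` and its path-based signature are assumptions, and the default level mirrors `compression_level: 6` from the config.

```python
import tarfile
from pathlib import Path


def compress_world(world_dir: str, archive_path: str, compression_level: int = 6) -> int:
    """Pack a downloaded world folder into a .tar.gz and return the archive size in bytes."""
    world = Path(world_dir)
    if not world.is_dir():
        raise FileNotFoundError(f"World folder not found: {world}")
    # arcname keeps only the folder name (e.g. "world/") inside the archive,
    # so extraction recreates that folder instead of the full staging path.
    with tarfile.open(archive_path, "w:gz", compresslevel=compression_level) as tar:
        tar.add(str(world), arcname=world.name)
    return Path(archive_path).stat().st_size
```

Storing the folder under its own name means a plain `tar -xzf` in the server's volume directory restores `world/` in place.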
---
## Deployment Steps
### Phase 1: Prerequisites (15 min)
- [ ] NextCloud admin access (or S3 credentials)
- [ ] Pterodactyl API key, plus panel credentials with SFTP access to each server (SFTP logs in with the panel account, not the API key)
- [ ] Discord webhook for #server-status
- [ ] Command Center SSH access
- [ ] 200 GB storage available on NextCloud/S3
### Phase 2: Setup Storage (15 min)
**Option A: NextCloud (Recommended)**
```bash
# Create backup directory in NextCloud
# Via web interface: Create folder "backups/worlds/"
# Or via WebDAV:
curl -X MKCOL -u admin:PASSWORD \
https://downloads.firefrostgaming.com/remote.php/dav/files/admin/backups/
curl -X MKCOL -u admin:PASSWORD \
https://downloads.firefrostgaming.com/remote.php/dav/files/admin/backups/worlds/
```
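For Option A, the script's upload step amounts to a WebDAV `PUT` against the same base URL. The sketch below only builds the request, using the standard library rather than `requests`; `build_upload_request` is a hypothetical helper, and the per-server folder layout and the archive filename in the usage note are assumptions.

```python
import base64
import urllib.request
from urllib.parse import quote


def build_upload_request(webdav_url: str, backup_path: str, server_name: str,
                         filename: str, username: str, password: str,
                         data: bytes) -> urllib.request.Request:
    """Build an authenticated WebDAV PUT request for one backup archive."""
    parts = [backup_path.strip("/"), server_name, filename]
    # Percent-encode each part so names like "Society: Sunlit Valley" form a valid URL.
    remote_path = "/".join(quote(part) for part in parts)
    url = f"{webdav_url.rstrip('/')}/{remote_path}"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=data, method="PUT")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Content-Type", "application/gzip")
    return request
```

The request would be sent with `urllib.request.urlopen(request)`. The per-server folder has to exist first (one more `MKCOL`, as above), and for archives of several hundred MB a streaming upload is preferable to holding the whole file in memory.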
**Option B: S3-Compatible (Alternative)**
- Configure AWS S3, Backblaze B2, or Wasabi bucket
- Create bucket: `firefrost-world-backups`
- Generate access key/secret
- Update script to use boto3/s3cmd instead of WebDAV
### Phase 3: Install Script (20 min)
```bash
# On Command Center
mkdir -p /opt/automation/backup-staging
cd /opt/automation
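# Create config
nano backup-config.json
# Paste config, update credentials
# Restrict access: the file holds the API key, NextCloud password, and webhook URL
chmod 600 backup-config.json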
# Create script
nano world-backup.py
# Paste script
# Make executable
chmod +x world-backup.py
# Install dependencies
pip3 install requests paramiko --break-system-packages
# Create log file
touch /var/log/world-backup.log
chmod 644 /var/log/world-backup.log
```
### Phase 4: Test Backup (30 min)
```bash
# Test with ONE server first (Vanilla)
# Edit config to include only Vanilla server temporarily
# Run manually
python3 /opt/automation/world-backup.py
# Verify:
# - World downloaded from Pterodactyl
# - Compressed successfully
# - Uploaded to NextCloud
# - Discord notification received
# - Backup appears in NextCloud
# Check backup file
# Download and extract to verify integrity
```
### Phase 5: Schedule with Cron (10 min)
```bash
# Edit crontab
crontab -e
# Add daily backup at 3:30 AM (before restart cycle)
30 3 * * * /usr/bin/python3 /opt/automation/world-backup.py >> /var/log/world-backup.log 2>&1
```
**Why 3:30 AM?**
- Low player activity
- Before 4:00 AM restart cycle
- Ensures fresh backup before restarts
- Backups complete before restarts begin
### Phase 6: Test Restoration (20 min)
**CRITICAL: Test restoration on a TEST server BEFORE you need it!**
```bash
# Download backup from NextCloud
# Extract tar.gz
# Stop test server
# Replace world folder
# Start server
# Verify world loads correctly
```
---
## Retention Policy Logic
**How it works:**
1. **Daily backups:** Every backup is initially a "daily" backup
2. **Weekly promotion:** Sunday backups are kept as "weekly" (in addition to daily)
3. **Monthly promotion:** 1st of month backups are kept as "monthly" (in addition to weekly)
**Example retention:**
```
Daily (7 days):
- Feb 17 (today)
- Feb 16
- Feb 15
- Feb 14
- Feb 13
- Feb 12
- Feb 11
Weekly (4 weeks) - Sundays only:
- Feb 15 (Sunday, also within the daily window)
- Feb 8 (Sunday)
- Feb 1 (Sunday, also the monthly backup)
- Jan 25 (Sunday)
Monthly (12 months) - 1st of month:
- Feb 1
- Jan 1
- Dec 1, 2025
- Nov 1, 2025
... (back 12 months)
```
**Old backups deleted:**
- Daily backups older than 7 days (unless promoted to weekly/monthly)
- Weekly backups older than 4 weeks (unless promoted to monthly)
- Monthly backups older than 12 months
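The rules above can be expressed as a function that returns the set of backup dates to keep; everything else is eligible for deletion. This is a sketch under stated assumptions, not the script's `apply_retention_policy()`: the name `backups_to_keep` is hypothetical, "daily" is read as the last 7 calendar days including today, and "weekly" and "monthly" are read as the most recent 4 Sunday and 12 first-of-month backups that exist.

```python
from datetime import date
from typing import Iterable, Set


def backups_to_keep(backup_dates: Iterable[date], today: date,
                    daily: int = 7, weekly: int = 4, monthly: int = 12) -> Set[date]:
    """Return the backup dates that the retention policy keeps."""
    # Newest first, ignoring duplicates and anything dated in the future.
    dates = sorted({d for d in backup_dates if d <= today}, reverse=True)
    keep: Set[date] = set()
    # Daily: every backup from the last `daily` calendar days.
    keep.update(d for d in dates if (today - d).days < daily)
    # Weekly: the most recent `weekly` Sunday backups (weekday() == 6 is Sunday).
    keep.update([d for d in dates if d.weekday() == 6][:weekly])
    # Monthly: the most recent `monthly` first-of-month backups.
    keep.update([d for d in dates if d.day == 1][:monthly])
    return keep
```

A date kept by more than one rule appears once in the set, so a Sunday that falls on the 1st counts as both the weekly and the monthly copy.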
---
## Restoration Procedures
### Standard Restoration (World Corruption)
**Scenario:** Server world corrupted, need to restore from backup
**Steps:**
1. **Stop the affected server** (via Pterodactyl panel)
2. **Download latest backup from NextCloud**
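```bash
# Find latest backup (WebDAV directory listing)
curl -u admin:PASSWORD -X PROPFIND -H "Depth: 1" \
https://downloads.firefrostgaming.com/remote.php/dav/files/admin/backups/worlds/[server-name]/
# Download
curl -u admin:PASSWORD -O \
https://downloads.firefrostgaming.com/remote.php/dav/files/admin/backups/worlds/[server-name]/[backup-file].tar.gz
```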
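3. **Upload the backup archive to the server via SFTP**
```bash
sftp -P 2022 admin@us.tx1.firefrostgaming.com
# (or NC1 host for NC1 servers)
# Pterodactyl SFTP usernames normally take the form <panel-user>.<server-short-id>
# Inside the sftp session:
put [backup-file].tar.gz
```
4. **Backup current world (just in case)**
```bash
# Run in a shell on the node (SSH), not inside the sftp session
cd /var/lib/pterodactyl/volumes/[uuid]/
mv world world-corrupted-backup-$(date +%Y%m%d)
```
5. **Extract backup**
```bash
# Same node shell and directory; assumes the archive contains a top-level world/ folder
tar -xzf [backup-file].tar.gz
```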
6. **Verify world folder exists and has proper permissions**
7. **Start server** (via Pterodactyl panel)
8. **Test in-game** - Join and verify world loaded correctly
9. **Monitor logs** for any errors
**Estimated recovery time:** 15-30 minutes
### Disaster Recovery (Node Failure)
**Scenario:** Entire TX1 or NC1 node failed, need to restore all servers
**Steps:**
1. **Provision new node** (or repair existing)
2. **Reinstall Pterodactyl Wings**
3. **Download ALL backups for that node**
4. **Restore each server world** (use parallel restoration if possible)
5. **Update DNS/IP mappings** if IPs changed
6. **Test each server**
**Estimated recovery time:** 2-4 hours (depends on number of servers and backup size)
---
## Discord Notifications
### Notification Examples
**Backup Started:**
```
💾 **World Backup Started**
Servers: 11
Estimated duration: ~30 minutes
```
**Per-Server:**
```
✅ **Vanilla 1.21.11** backed up
Size: 47 MB compressed
Duration: 2m 15s
```
**Completion:**
```
✅ **Backup Cycle Complete**
Successful: 11/11
Failed: 0
Total size: 4.2 GB compressed
Duration: 28 minutes
Next backup: Tomorrow 3:30 AM
```
**Error:**
```
❌ **Backup Failed: ATM10**
Reason: SFTP connection timeout
Attempt: 3/3
Action: Manual backup required
```
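The completion message above can be produced from the cycle's counters and posted through the webhook. This is a minimal sketch: `format_summary` is a hypothetical helper, and this `discord_notify` takes the webhook URL as an explicit argument, unlike the `discord_notify(message)` listed for the script. It relies on Discord webhooks accepting a JSON body with a `content` field; the explicit `User-Agent` value is an arbitrary assumption, added because requests with the default urllib agent may be rejected.

```python
import json
import urllib.request


def format_summary(successful: int, failed: int, total_bytes: int, minutes: int) -> str:
    """Build the end-of-cycle message in the format shown above."""
    icon = "✅" if failed == 0 else "❌"
    size_gb = total_bytes / 1024 ** 3
    return (
        f"{icon} **Backup Cycle Complete**\n"
        f"Successful: {successful}/{successful + failed}\n"
        f"Failed: {failed}\n"
        f"Total size: {size_gb:.1f} GB compressed\n"
        f"Duration: {minutes} minutes"
    )


def discord_notify(webhook_url: str, message: str, timeout: int = 10) -> int:
    """POST the message to a Discord webhook and return the HTTP status code."""
    payload = json.dumps({"content": message}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json", "User-Agent": "world-backup/1.0"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Wrapping the `discord_notify` call in its own `try`/`except` keeps a Discord outage from marking an otherwise successful backup as failed.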
---
## Monitoring & Maintenance
### Daily
- Check Discord for backup completion notification
- Verify no failed backups
### Weekly
- Review backup logs for errors
- Check NextCloud storage usage
- Verify retention policy working
### Monthly
- Test restoration on one random server
- Review backup sizes (growing unexpectedly?)
- Verify off-server storage accessible
- Check that monthly backups are being promoted correctly
### Quarterly
- Full disaster recovery drill (restore all servers to test environment)
- Review and update restoration procedures
- Verify backup script still compatible with Pterodactyl updates
---
## Troubleshooting
### SFTP connection fails
**Check:**
- Pterodactyl SFTP service running on node
- Firewall allows port 2022
- Account used for SFTP has file access to the server (SFTP logs in with panel credentials, not the API key)
- Correct hostname (us.tx1 vs us.nc1)
### Backup file corrupt
**Verify:**
- Download completed successfully (check file size)
- Compression succeeded (try manual tar -xzf test)
- Upload to NextCloud succeeded
- NextCloud not corrupting on upload
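The checks above can be automated: reading every member of the archive detects truncation or a damaged gzip stream, and comparing a checksum taken before upload with one taken after download shows whether the copy in NextCloud matches. A minimal sketch with hypothetical helper names:

```python
import hashlib
import tarfile


def verify_archive(path: str) -> int:
    """Read every file in a .tar.gz; raises if the archive is damaged.

    Returns the number of regular files that were read successfully.
    """
    count = 0
    with tarfile.open(path, "r:gz") as tar:
        for member in tar:
            if member.isfile():
                handle = tar.extractfile(member)
                # Read in 1 MiB chunks so large region files do not fill memory.
                while handle.read(1024 * 1024):
                    pass
                count += 1
    return count


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running `verify_archive` in staging before upload, and comparing `sha256_of` on a downloaded copy during the monthly restoration test, separates compression problems from upload problems.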
### Retention policy not deleting old backups
**Debug:**
- Check backup filenames follow expected format
- Verify date parsing logic in script
- Check NextCloud permissions (can delete?)
- Review logs for deletion attempts
### NextCloud storage full
**Actions:**
- Check retention policy running
- Manually delete very old backups if needed
- Consider upgrading NextCloud storage
- Enable NextCloud auto-delete for old files
---
## Advanced Features (Phase 2)
**Incremental backups:**
- Only backup changed files
- Significant storage savings
- Faster backup times
- More complex restoration
**Encrypted backups:**
- Encrypt before upload
- GPG or age encryption
- Protects against NextCloud compromise
- Key management required
**Multi-region storage:**
- Upload to multiple locations
- S3 + NextCloud redundancy
- Geographic disaster protection
- Higher costs
**Real-time backups:**
- Continuous backup on world save
- Near-zero data loss
- Much higher storage usage
- More complex implementation
---
## Related Tasks
- **Staggered Server Restart System** - Runs after backups complete
- **Netdata Deployment** - Monitor backup job duration
- **NextCloud Setup** - Storage backend
- **Discord Reorganization** - #server-status for notifications
---
**Fire + Frost + Foundation = Where Love Builds Legacy** 💙🔥❄️
---
**Document Status:** COMPLETE
**Ready for Implementation:** When SSH access available (1-2 hours)
**Dependencies:** NextCloud or S3, Pterodactyl API, Command Center access
**Storage Needed:** 200 GB recommended