firefrost-operations-manual/docs/tasks/world-backup-automation/deployment-plan.md
Claude 8f4e25e903 docs: Complete World Backup Automation deployment plan
Created comprehensive automated backup system for all 11 Minecraft servers:

Deployment Plan (500+ lines):
- Architecture and timing (integrates with restart system)
- Daily backups at 3:30 AM (before 4 AM restart cycle)
- Intelligent retention policy (7 daily, 4 weekly, 12 monthly)
- Off-server storage via NextCloud WebDAV or S3
- Complete storage requirements calculation (~115 GB needed)
- Per-server world paths and configuration
- 6-phase deployment guide
- Discord notifications throughout process

Features:
- Automated daily backups compressed to tar.gz (~80% size reduction)
- Total daily backup: ~4-5 GB (all 11 servers)
- Smart retention prevents storage bloat
- Disaster recovery procedures documented
- Standard restoration: 15-30 minutes
- Full node recovery: 2-4 hours

Restoration Procedures:
- Step-by-step world corruption recovery
- Disaster recovery for full node failure
- Regular restoration testing schedule
- Recovery time objectives defined

Troubleshooting:
- SFTP connection issues
- Backup corruption detection
- Retention policy debugging
- Storage management

Advanced features roadmap: incremental backups, encryption,
multi-region storage, real-time backups.

Ready to deploy when SSH access available (1-2 hours setup).

Task: World Backup Automation (Tier 3)
FFG-STD-002 compliant
2026-02-17 22:51:54 +00:00

# World Backup Automation - Deployment Plan
**Status:** Planning Complete, Ready to Implement
**Priority:** Tier 3 - Disaster Recovery
**Time Estimate:** 1-2 hours implementation
**Last Updated:** 2026-02-17
---
## Overview
Automated backup system for all 11 Minecraft server worlds. Scheduled daily backups with intelligent retention policy (7 daily, 4 weekly, 12 monthly) and off-server storage for disaster recovery.
**The Problem:**
- No automated backups = risk of data loss
- Manual backups = inconsistent, time-consuming
- No off-server storage = vulnerable to hardware failure
- No retention policy = storage bloat or premature deletion
**The Solution:**
- Automated Python script with Pterodactyl SFTP integration
- Daily backups at off-peak hours (integrate with restart system)
- Smart retention: 7 daily, 4 weekly, 12 monthly backups
- Off-server storage (NextCloud or S3-compatible)
- Compression (tar.gz) to save space
- Restoration procedures documented
- Discord notifications
---
## Architecture
```
Backup Schedule (Daily 3:30 AM)
        ↓
Python Backup Script
        ↓
Pterodactyl SFTP → Download world files
        ↓
Compress (tar.gz)
        ↓
Upload to NextCloud/S3
        ↓
Apply Retention Policy (delete old backups)
        ↓
Discord Notification (success/failure)
```
**Timing Integration with Restarts:**
- **3:30 AM** - Backups start (before restart cycle)
- **4:00 AM** - Staggered restarts start (after backups complete)
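This ensures fresh backups before any server restarts, provided the backup cycle (estimated at ~30 minutes) finishes inside the 30-minute window before 4:00 AM.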
---
## Features
### Core Features
**✅ Automated Daily Backups**
- Runs automatically via cron
- All 11 servers backed up
- World files only (not plugins/configs - those are in git)
- Compressed to save space (~80% reduction)
**✅ Intelligent Retention Policy**
- **Daily:** Keep last 7 days
- **Weekly:** Keep last 4 weeks (Sunday backups)
- **Monthly:** Keep last 12 months (1st of month backups)
- Automatic cleanup of old backups
- Prevents storage bloat
**✅ Off-Server Storage**
- Stored on NextCloud (downloads.firefrostgaming.com)
- Alternative: S3-compatible storage (Backblaze B2, Wasabi, etc.)
- Protects against node hardware failure
- Accessible from anywhere
**✅ Discord Notifications**
- Backup started notification
- Per-server completion
- Final summary (success/failed/size)
- Error alerts
**✅ Restoration Procedures**
- Documented step-by-step restoration
- Test restoration regularly
- Recovery time objective: < 1 hour for a single-server restoration (full node recovery: 2-4 hours)
---
## Server World Paths
**TX1 Dallas servers:**
- Reclamation: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- Stoneblock 4: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- Society Sunlit Valley: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- Vanilla 1.21.11: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- All The Mons: `/var/lib/pterodactyl/volumes/{uuid}/world/`
**NC1 Charlotte servers:**
- The Ember Project: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- Minecolonies Create & Conquer: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- All The Mods 10: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- Homestead: `/var/lib/pterodactyl/volumes/{uuid}/world/`
- EMC Subterra Tech: `/var/lib/pterodactyl/volumes/{uuid}/world/`
**Note:** Some modpacks use different world folder names (e.g., `dimensions/` for custom dimensions)
---
## Storage Requirements
### Per-Server Estimates
**Compressed backup sizes (approximate):**
- Vanilla: ~50 MB
- Modded (light): ~200 MB
- Modded (heavy): ~500 MB
- ATM10: ~1 GB (largest)
**Total daily backup:** ~4-5 GB (all 11 servers compressed)
### Retention Storage
**Daily (7 days):** 7 × 5 GB = 35 GB
**Weekly (4 weeks):** 4 × 5 GB = 20 GB
**Monthly (12 months):** 12 × 5 GB = 60 GB
**Total storage needed:** ~115 GB (with compression)
**Recommended:** 200 GB storage allocation (room to grow)
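
The totals above come from multiplying each tier's backup count by the ~5 GB upper estimate for one daily cycle. A small sketch of that arithmetic (the names are illustrative and not taken from the backup script):

```python
# Upper estimate for one full backup cycle across all servers (GB).
DAILY_BACKUP_GB = 5

# Number of backups kept per retention tier.
RETENTION = {"daily": 7, "weekly": 4, "monthly": 12}


def storage_needed_gb(retention, per_backup_gb):
    """Return total storage, assuming every retained backup is a full copy."""
    return sum(count * per_backup_gb for count in retention.values())


total_gb = storage_needed_gb(RETENTION, DAILY_BACKUP_GB)
print(f"Total: {total_gb} GB")  # 35 + 20 + 60 = 115
```

This treats the tiers as disjoint, so it is an upper bound: a Sunday backup that is still inside the 7-day window is counted in both the daily and weekly tiers.
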
---
## Implementation
### Script Location
**File:** `/opt/automation/world-backup.py`
**Config:** `/opt/automation/backup-config.json`
**Logs:** `/var/log/world-backup.log`
**Local staging:** `/opt/automation/backup-staging/`
**Remote storage:** `downloads.firefrostgaming.com:/backups/worlds/`
### Configuration File (backup-config.json)
```json
{
"pterodactyl": {
"url": "https://panel.firefrostgaming.com",
"api_key": "PTERODACTYL_API_KEY_HERE",
"sftp_host": "us.tx1.firefrostgaming.com",
"sftp_port": 2022
},
"nextcloud": {
"webdav_url": "https://downloads.firefrostgaming.com/remote.php/dav/files/admin/",
"username": "admin",
"password": "NEXTCLOUD_PASSWORD_HERE",
"backup_path": "backups/worlds/"
},
"discord": {
"webhook_url": "DISCORD_WEBHOOK_URL_HERE",
"notifications_enabled": true
},
"backup_settings": {
"staging_dir": "/opt/automation/backup-staging",
"compression": "gzip",
"compression_level": 6,
"retention": {
"daily": 7,
"weekly": 4,
"monthly": 12
}
},
"servers": [
{
"name": "Vanilla 1.21.11",
"uuid": "3bed1bda-f648-4630-801a-fe9f2e3d3f27",
"world_path": "world",
"node": "TX1"
},
{
"name": "All The Mons",
"uuid": "668a5220-7e72-4379-9165-bdbb84bc9806",
"world_path": "world",
"node": "TX1"
},
{
"name": "Stoneblock 4",
"uuid": "a0efbfe8-4b97-4a90-869d-ffe6d3072bd5",
"world_path": "world",
"node": "TX1"
},
{
"name": "Society: Sunlit Valley",
"uuid": "9310d0a6-62a6-4fe6-82c4-eb483dc68876",
"world_path": "world",
"node": "TX1"
},
{
"name": "Reclamation",
"uuid": "1eb33479-a6bc-4e8f-b64d-d1e4bfa0a8b4",
"world_path": "world",
"node": "TX1"
},
{
"name": "The Ember Project",
"uuid": "124f9060-58a7-457a-b2cf-b4024fce2951",
"world_path": "world",
"node": "NC1"
},
{
"name": "Minecolonies: Create and Conquer",
"uuid": "a14201d2-83b2-44e6-ae48-e6c4cbc56f24",
"world_path": "world",
"node": "NC1"
},
{
"name": "All The Mods 10",
"uuid": "82e63949-8fbf-4a44-b32a-53324e8492bf",
"world_path": "world",
"node": "NC1"
},
{
"name": "Homestead",
"uuid": "2f85d4ef-aa49-4dd6-b448-beb3fca1db12",
"world_path": "world",
"node": "NC1"
},
{
"name": "EMC Subterra Tech",
"uuid": "09a95f38-9f8c-404a-9557-3a7c44258223",
"world_path": "world",
"node": "NC1"
}
]
}
```
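
Because the config ships with `*_HERE` placeholders, the script could refuse to run until they are replaced. The following is a minimal sketch of such a check, not the actual loader in `world-backup.py`; `validate_config`, `find_placeholders` and `load_config` are hypothetical helper names.

```python
import json

REQUIRED_SECTIONS = ("pterodactyl", "nextcloud", "discord", "backup_settings", "servers")
SERVER_FIELDS = ("name", "uuid", "world_path", "node")


def validate_config(config):
    """Raise ValueError if a top-level section or a server field is missing."""
    missing = [key for key in REQUIRED_SECTIONS if key not in config]
    if missing:
        raise ValueError(f"Missing config sections: {', '.join(missing)}")
    for server in config["servers"]:
        for field in SERVER_FIELDS:
            if field not in server:
                raise ValueError(f"Server entry missing '{field}': {server}")
    return config


def find_placeholders(value, path=""):
    """Return dotted paths of string values that still end in _HERE."""
    found = []
    if isinstance(value, dict):
        for key, child in value.items():
            found += find_placeholders(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found += find_placeholders(child, f"{path}[{index}]")
    elif isinstance(value, str) and value.endswith("_HERE"):
        found.append(path)
    return found


def load_config(path):
    """Read backup-config.json and validate its structure."""
    with open(path, encoding="utf-8") as handle:
        return validate_config(json.load(handle))
```

Calling `find_placeholders(load_config("/opt/automation/backup-config.json"))` before the first run would list any credentials that were not filled in.
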
### Main Script Overview
**See `deployments/world-backup/world-backup.py` for complete script**
**Key functions:**
- `download_world_via_sftp(server)` - Download world files from Pterodactyl
- `compress_backup(server)` - Create tar.gz archive
- `upload_to_nextcloud(backup_file)` - Upload to remote storage
- `apply_retention_policy()` - Delete old backups per policy
- `discord_notify(message)` - Send notifications
- `main()` - Orchestrate backup process
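
As an illustration of the compression step, here is a minimal sketch using the standard-library `tarfile` module. It is not the script's actual `compress_backup(server)`: the signature, the `<server>_<YYYY-MM-DD>.tar.gz` naming scheme and the name sanitising are assumptions made for this example.

```python
import tarfile
from datetime import date
from pathlib import Path


def compress_backup(world_dir, staging_dir, server_name, level=6, today=None):
    """Pack world_dir into <staging_dir>/<server>_<YYYY-MM-DD>.tar.gz."""
    today = today or date.today()
    world_dir = Path(world_dir)
    staging_dir = Path(staging_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical naming scheme; the real script may use a different one.
    safe_name = server_name.lower().replace(" ", "-").replace(":", "")
    archive = staging_dir / f"{safe_name}_{today.isoformat()}.tar.gz"

    # compresslevel mirrors "compression_level": 6 in backup-config.json.
    with tarfile.open(archive, "w:gz", compresslevel=level) as tar:
        # arcname keeps only the folder name, so extraction recreates world/.
        tar.add(world_dir, arcname=world_dir.name)
    return archive
```

Storing the folder under its bare name (rather than the full staging path) means a later `tar -xzf` inside the server volume puts the world back where it came from.
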
---
## Deployment Steps
### Phase 1: Prerequisites (15 min)
- [ ] NextCloud admin access (or S3 credentials)
- [ ] Pterodactyl API key, plus panel account credentials for SFTP (Wings SFTP logs in with the panel username and password, not the API key)
- [ ] Discord webhook for #server-status
- [ ] Command Center SSH access
- [ ] 200 GB storage available on NextCloud/S3
### Phase 2: Setup Storage (15 min)
**Option A: NextCloud (Recommended)**
```bash
# Create backup directory in NextCloud
# Via web interface: Create folder "backups/worlds/"
# Or via WebDAV:
curl -X MKCOL -u admin:PASSWORD \
https://downloads.firefrostgaming.com/remote.php/dav/files/admin/backups/
curl -X MKCOL -u admin:PASSWORD \
https://downloads.firefrostgaming.com/remote.php/dav/files/admin/backups/worlds/
```
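
The same WebDAV endpoint accepts uploads via HTTP `PUT`. The sketch below uses only the standard library (`urllib`) instead of the `requests` package the plan installs, and it is not the script's actual `upload_to_nextcloud`; `build_upload_request` and `upload` are hypothetical names.

```python
import base64
import urllib.request
from urllib.parse import quote


def build_upload_request(webdav_url, backup_path, filename, username, password, data):
    """Build (but do not send) a WebDAV PUT request for one backup file."""
    remote_path = quote(backup_path.strip("/") + "/" + filename)
    url = webdav_url.rstrip("/") + "/" + remote_path
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/gzip",
        },
    )


def upload(request, timeout=300):
    """Send the request; urlopen raises HTTPError on 4xx/5xx responses."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Here `data` is the archive content as bytes, which is fine for a small test file; for multi-gigabyte archives a streaming upload would be preferable to reading the whole file into memory.
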
**Option B: S3-Compatible (Alternative)**
- Configure AWS S3, Backblaze B2, or Wasabi bucket
- Create bucket: `firefrost-world-backups`
- Generate access key/secret
- Update script to use boto3/s3cmd instead of WebDAV
### Phase 3: Install Script (20 min)
```bash
# On Command Center
mkdir -p /opt/automation/backup-staging
cd /opt/automation
# Create config
nano backup-config.json
# Paste config, update credentials
# Create script
nano world-backup.py
# Paste script
# Make executable
chmod +x world-backup.py
# Install dependencies
pip3 install requests paramiko --break-system-packages
# Create log file
touch /var/log/world-backup.log
chmod 644 /var/log/world-backup.log
```
### Phase 4: Test Backup (30 min)
```bash
# Test with ONE server first (Vanilla)
# Edit config to include only Vanilla server temporarily
# Run manually
python3 /opt/automation/world-backup.py
# Verify:
# - World downloaded from Pterodactyl
# - Compressed successfully
# - Uploaded to NextCloud
# - Discord notification received
# - Backup appears in NextCloud
# Check backup file
# Download and extract to verify integrity
```
### Phase 5: Schedule with Cron (10 min)
```bash
# Edit crontab
crontab -e
# Add daily backup at 3:30 AM (before restart cycle)
30 3 * * * /usr/bin/python3 /opt/automation/world-backup.py >> /var/log/world-backup.log 2>&1
```
**Why 3:30 AM?**
- Low player activity
- Before 4:00 AM restart cycle
- Ensures fresh backup before restarts
- Backups complete before restarts begin
### Phase 6: Test Restoration (20 min)
**CRITICAL: Test restoration on a TEST server BEFORE you need it!**
```bash
# Download backup from NextCloud
# Extract tar.gz
# Stop test server
# Replace world folder
# Start server
# Verify world loads correctly
```
---
## Retention Policy Logic
**How it works:**
1. **Daily backups:** Every backup is initially a "daily" backup
2. **Weekly promotion:** Sunday backups are kept as "weekly" (in addition to daily)
3. **Monthly promotion:** 1st of month backups are kept as "monthly" (in addition to weekly)
**Example retention:**
```
Daily (7 days):
- Feb 17 (today)
- Feb 16
- Feb 15
- Feb 14
- Feb 13
- Feb 12
- Feb 11
Weekly (4 weeks) - Sundays only:
- Feb 15 (Sunday)
- Feb 8 (Sunday)
- Feb 1 (Sunday)
- Jan 25 (Sunday)
Monthly (12 months) - 1st of month:
- Feb 1
- Jan 1
- Dec 1, 2025
- Nov 1, 2025
... (back 12 months)
```
**Old backups deleted:**
- Daily backups older than 7 days (unless promoted to weekly/monthly)
- Weekly backups older than 4 weeks (unless promoted to monthly)
- Monthly backups older than 12 months
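
The promotion rules can be sketched as a function that, given the dates of all existing backups, returns the ones to keep. This is one possible reading of the policy (most recent 4 Sunday backups, most recent 12 first-of-month backups), not the script's actual `apply_retention_policy()`; `backups_to_keep` is a hypothetical name.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, daily=7, weekly=4, monthly=12):
    """Return the set of backup dates a 7/4/12 policy would retain."""
    dates = sorted(set(backup_dates), reverse=True)  # newest first

    # Daily tier: today plus the previous (daily - 1) days.
    keep = {d for d in dates if (today - d).days < daily}

    # Weekly tier: weekday() == 6 is Sunday; keep the newest `weekly` of them.
    sundays = [d for d in dates if d.weekday() == 6]
    keep.update(sundays[:weekly])

    # Monthly tier: keep the newest `monthly` first-of-month backups.
    firsts = [d for d in dates if d.day == 1]
    keep.update(firsts[:monthly])
    return keep


# Example: 60 days of daily backups ending on the plan's last-updated date.
today = date(2026, 2, 17)
history = [today - timedelta(days=n) for n in range(60)]
kept = backups_to_keep(history, today)
print(sorted(kept))
```

Everything in `history` that is not in `kept` would be a deletion candidate. Computing the keep-set first and deleting the remainder makes a bug in the date logic fail safe: a backup is only removed if no tier claims it.
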
---
## Restoration Procedures
### Standard Restoration (World Corruption)
**Scenario:** Server world corrupted, need to restore from backup
**Steps:**
1. **Stop the affected server** (via Pterodactyl panel)
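2. **Download latest backup from NextCloud**
```bash
# List available backups over WebDAV (same endpoint as in backup-config.json)
curl -X PROPFIND -H "Depth: 1" -u admin:PASSWORD \
https://downloads.firefrostgaming.com/remote.php/dav/files/admin/backups/worlds/[server-name]/
# Download
curl -u admin:PASSWORD -O \
https://downloads.firefrostgaming.com/remote.php/dav/files/admin/backups/worlds/[server-name]/[backup-file].tar.gz
```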
3. **Connect to server via SFTP**
```bash
sftp -P 2022 admin@us.tx1.firefrostgaming.com
# (or NC1 host for NC1 servers)
```
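4. **Back up current world (just in case)**
```bash
# Run in a shell on the node: $(date ...) is not expanded inside an sftp session
cd /var/lib/pterodactyl/volumes/[uuid]/
mv world world-corrupted-backup-$(date +%Y%m%d)
```
5. **Extract backup**
```bash
# Run from the volume directory; assumes the archive holds world/ at its top level
tar -xzf [backup-file].tar.gz
```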
6. **Verify world folder exists and has proper permissions**
7. **Start server** (via Pterodactyl panel)
8. **Test in-game** - Join and verify world loaded correctly
9. **Monitor logs** for any errors
**Estimated recovery time:** 15-30 minutes
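
Steps 4 and 5 can also be scripted. The sketch below sets the current world aside and extracts the backup in its place; it assumes the archive contains the world folder at its top level, and `restore_world` is a hypothetical helper rather than part of the backup script.

```python
import tarfile
from datetime import date
from pathlib import Path


def restore_world(archive, volume_dir, world_name="world", today=None):
    """Set the current world aside, then extract the backup into volume_dir."""
    today = today or date.today()
    volume_dir = Path(volume_dir)
    current = volume_dir / world_name

    if current.exists():
        # Mirrors step 4: keep the corrupted world instead of deleting it.
        current.rename(volume_dir / f"{world_name}-corrupted-backup-{today:%Y%m%d}")

    root = volume_dir.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        # Refuse archives whose members would land outside volume_dir.
        for member in tar.getmembers():
            target = (volume_dir / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        # Use the stricter "data" extraction filter where Python provides it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(volume_dir, **extra)
    return current
```

File ownership is not handled here, so step 6 (verifying permissions) still applies after the extraction.
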
### Disaster Recovery (Node Failure)
**Scenario:** Entire TX1 or NC1 node failed, need to restore all servers
**Steps:**
1. **Provision new node** (or repair existing)
2. **Reinstall Pterodactyl Wings**
3. **Download ALL backups for that node**
4. **Restore each server world** (use parallel restoration if possible)
5. **Update DNS/IP mappings** if IPs changed
6. **Test each server**
**Estimated recovery time:** 2-4 hours (depends on number of servers and backup size)
---
## Discord Notifications
### Notification Examples
**Backup Started:**
```
💾 **World Backup Started**
Servers: 11
Estimated duration: ~30 minutes
```
**Per-Server:**
```
✅ **Vanilla 1.21.11** backed up
Size: 47 MB compressed
Duration: 2m 15s
```
**Completion:**
```
✅ **Backup Cycle Complete**
Successful: 11/11
Failed: 0
Total size: 4.2 GB compressed
Duration: 28 minutes
Next backup: Tomorrow 3:30 AM
```
**Error:**
```
❌ **Backup Failed: ATM10**
Reason: SFTP connection timeout
Attempt: 3/3
Action: Manual backup required
```
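
A rough sketch of how the completion message could be built and posted. Discord webhooks accept a JSON body with a `content` field; everything else here is an assumption for illustration: the function signatures, the `results` mapping, the `User-Agent` string, and the use of `urllib` instead of `requests`.

```python
import json
import urllib.request


def format_summary(results, duration_minutes):
    """Build the completion message from {server_name: size_mb or None}."""
    ok = {name: size for name, size in results.items() if size is not None}
    failed = [name for name, size in results.items() if size is None]
    icon = "✅" if not failed else "⚠️"
    lines = [
        f"{icon} **Backup Cycle Complete**",
        f"Successful: {len(ok)}/{len(results)}",
        f"Failed: {len(failed)}",
        f"Total size: {sum(ok.values()) / 1024:.1f} GB compressed",
        f"Duration: {duration_minutes} minutes",
    ]
    if failed:
        lines.append("Failed servers: " + ", ".join(failed))
    return "\n".join(lines)


def discord_notify(webhook_url, message, timeout=10):
    """POST a plain-text message to a Discord webhook."""
    body = json.dumps({"content": message}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "User-Agent": "world-backup/1.0",  # illustrative value
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A failed server is represented as `None`, so the summary lists it by name instead of silently dropping it from the totals.
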
---
## Monitoring & Maintenance
### Daily
- Check Discord for backup completion notification
- Verify no failed backups
### Weekly
- Review backup logs for errors
- Check NextCloud storage usage
- Verify retention policy working
### Monthly
- Test restoration on one random server
- Review backup sizes (growing unexpectedly?)
- Verify off-server storage accessible
- Check that monthly backups are being promoted correctly
### Quarterly
- Full disaster recovery drill (restore all servers to test environment)
- Review and update restoration procedures
- Verify backup script still compatible with Pterodactyl updates
---
## Troubleshooting
### SFTP connection fails
**Check:**
- Pterodactyl SFTP service running on node
- Firewall allows port 2022
- API key has SFTP permissions
- Correct hostname (us.tx1 vs us.nc1)
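
Transient SFTP timeouts are usually worth retrying before a server is reported as failed. A minimal retry-with-backoff sketch follows; `with_retries` is a hypothetical helper, the delays are illustrative, and only `OSError` (which includes socket timeouts) is caught here. A real script using paramiko would probably also catch `paramiko.SSHException`.

```python
import time


def with_retries(operation, attempts=3, base_delay=5.0, sleep=time.sleep):
    """Call operation() up to `attempts` times, doubling the delay each time."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as error:
            last_error = error
            if attempt < attempts:
                # 5 s, 10 s, 20 s, ... between attempts.
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"Failed after {attempts} attempts: {last_error}") from last_error
```

Usage would look like `with_retries(lambda: download_world_via_sftp(server))`, with the final `RuntimeError` turned into the Discord error alert.
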
### Backup file corrupt
**Verify:**
- Download completed successfully (check file size)
- Compression succeeded (try manual tar -xzf test)
- Upload to NextCloud succeeded
- NextCloud not corrupting on upload
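
These checks can be automated by verifying each archive before upload and again after a test download. The sketch below decompresses the whole file (gzip validates its CRC-32 at the end of the stream) and then walks the tar headers; `verify_archive` is a hypothetical helper, not part of the existing script.

```python
import gzip
import tarfile
import zlib


def verify_archive(path, chunk_size=1024 * 1024):
    """Return (ok, detail) after fully reading a .tar.gz archive."""
    try:
        # Pass 1: decompress everything so truncation or CRC errors surface.
        with gzip.open(path, "rb") as stream:
            while stream.read(chunk_size):
                pass
        # Pass 2: walk the tar headers and count regular files.
        with tarfile.open(path, "r:gz") as tar:
            files = sum(1 for member in tar if member.isfile())
    except (OSError, EOFError, zlib.error, tarfile.TarError) as error:
        return False, f"{type(error).__name__}: {error}"
    return True, f"{files} files"
```

A passing check only shows that the archive is readable; it does not prove that the world loads, so the restoration test in Phase 6 is still needed.
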
### Retention policy not deleting old backups
**Debug:**
- Check backup filenames follow expected format
- Verify date parsing logic in script
- Check NextCloud permissions (can delete?)
- Review logs for deletion attempts
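
When debugging, it helps to run the filename parsing in isolation. The sketch below assumes a hypothetical `<server>_<YYYY-MM-DD>.tar.gz` naming scheme; the real script's format may differ, in which case the pattern needs adjusting.

```python
import re
from datetime import date

# Assumed naming scheme: <server>_<YYYY-MM-DD>.tar.gz
BACKUP_NAME = re.compile(r"^(?P<server>.+)_(?P<date>\d{4}-\d{2}-\d{2})\.tar\.gz$")


def parse_backup_name(filename):
    """Return (server, date), or None if the name does not match the scheme."""
    match = BACKUP_NAME.match(filename)
    if not match:
        return None
    try:
        return match["server"], date.fromisoformat(match["date"])
    except ValueError:
        # Well-formed pattern but impossible date, e.g. 2026-02-30.
        return None


def unparseable(filenames):
    """List the names a retention pass would be unable to date."""
    return [name for name in filenames if parse_backup_name(name) is None]
```

Running `unparseable()` over a directory listing shows which files the retention logic cannot date, which is a common reason old backups are never deleted.
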
### NextCloud storage full
**Actions:**
- Check retention policy running
- Manually delete very old backups if needed
- Consider upgrading NextCloud storage
- Enable NextCloud auto-delete for old files
---
## Advanced Features (Phase 2)
**Incremental backups:**
- Only back up changed files
- Significant storage savings
- Faster backup times
- More complex restoration
**Encrypted backups:**
- Encrypt before upload
- GPG or age encryption
- Protects against NextCloud compromise
- Key management required
**Multi-region storage:**
- Upload to multiple locations
- S3 + NextCloud redundancy
- Geographic disaster protection
- Higher costs
**Real-time backups:**
- Continuous backup on world save
- Near-zero data loss
- Much higher storage usage
- More complex implementation
---
## Related Tasks
- **Staggered Server Restart System** - Runs after backups complete
- **Netdata Deployment** - Monitor backup job duration
- **NextCloud Setup** - Storage backend
- **Discord Reorganization** - #server-status for notifications
---
**Fire + Frost + Foundation = Where Love Builds Legacy** 💙🔥❄️
---
**Document Status:** COMPLETE
**Ready for Implementation:** When SSH access available (1-2 hours)
**Dependencies:** NextCloud or S3, Pterodactyl API, Command Center access
**Storage Needed:** 200 GB recommended