Frostwall Protocol - Troubleshooting Guide
Purpose: Diagnose and resolve common Frostwall issues
Last Updated: 2026-02-17
Status: Ready for use
Quick Diagnostics Checklist
When something's not working, run through this checklist first:
- Are GRE tunnels up? (ip tunnel show)
- Can you ping tunnel endpoints? (ping 10.0.1.2, ping 10.0.2.2)
- Is UFW blocking necessary traffic? (ufw status verbose)
- Are NAT rules present? (iptables -t nat -L -n -v)
- Is IP forwarding enabled? (cat /proc/sys/net/ipv4/ip_forward)
- Is the self-healing monitor running? (crontab -l)
- Did the server reboot recently? (tunnels may need manual restart)
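The checklist can also be run in one pass. The following is a minimal sketch, not part of the Frostwall scripts: run_check and frostwall_quick_check are hypothetical helper names, and the interface name gre-tx1 and the frostwall-monitor cron pattern are taken from examples elsewhere in this guide. It only covers checks with a clear pass/fail answer; UFW rules and reboot history still need a human look.

```shell
#!/usr/bin/env bash
# Hypothetical read-only helper; it changes nothing on the system.

# Run a command quietly and report PASS or FAIL with a label.
run_check() {
    local label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS $label"
    else
        echo "FAIL $label"
    fi
}

# Pass/fail versions of the checklist items above (assumed names and IPs).
frostwall_quick_check() {
    run_check "GRE tunnel interface exists" ip link show gre-tx1
    run_check "TX1 tunnel endpoint answers" ping -c 2 -W 2 10.0.1.2
    run_check "NC1 tunnel endpoint answers" ping -c 2 -W 2 10.0.2.2
    run_check "NAT rules present" sh -c 'iptables -t nat -S PREROUTING | grep -q DNAT'
    run_check "IP forwarding enabled" grep -qx 1 /proc/sys/net/ipv4/ip_forward
    run_check "monitor scheduled in cron" sh -c 'crontab -l | grep -q frostwall-monitor'
}

# Run as root on the Command Center:
#   frostwall_quick_check
```

Any FAIL line points at the matching problem section below.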
Problem: Tunnel Won't Come Up
Symptoms
- ip tunnel show shows no tunnel interface
- Cannot create tunnel with ip tunnel add
- Error: "RTNETLINK answers: File exists" or similar
Diagnosis
Step 1: Check if GRE module is loaded
lsmod | grep gre
Expected output:
ip_gre 28672 0
gre 16384 1 ip_gre
If not loaded:
modprobe ip_gre
Step 2: Check if tunnel already exists
ip tunnel show
If tunnel exists but is down:
ip link set gre-tx1 up # or gre-nc1, gre-hub as appropriate
Step 3: Verify remote endpoint is reachable
ping 38.68.14.26 # TX1 physical IP
ping 216.239.104.130 # NC1 physical IP
If physical IPs aren't reachable, the GRE tunnel can't form.
Solution
Delete and recreate tunnel:
# If tunnel exists
ip link set gre-tx1 down
ip tunnel del gre-tx1
# Recreate
ip tunnel add gre-tx1 mode gre remote 38.68.14.26 local 63.143.34.217 ttl 255
ip addr add 10.0.1.1/30 dev gre-tx1
ip link set gre-tx1 up
# Test
ping 10.0.1.2
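The delete-and-recreate sequence above can be wrapped so it is safe to run whether or not the tunnel currently exists. This is a sketch under assumptions: run and recreate_tunnel are hypothetical helper names, and DRY_RUN is an invented switch that defaults to printing the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Hypothetical helper. DRY_RUN defaults to 1, so commands are printed, not run.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# recreate_tunnel NAME REMOTE_IP LOCAL_IP TUNNEL_ADDR
recreate_tunnel() {
    local name=$1 remote=$2 local_ip=$3 addr=$4
    # Delete only when the interface exists, so a missing tunnel is not an error.
    if ip link show "$name" >/dev/null 2>&1; then
        run ip link set "$name" down
        run ip tunnel del "$name"
    fi
    run ip tunnel add "$name" mode gre remote "$remote" local "$local_ip" ttl 255
    run ip addr add "$addr" dev "$name"
    run ip link set "$name" up
}

# Preview:  recreate_tunnel gre-tx1 38.68.14.26 63.143.34.217 10.0.1.1/30
# Apply:    DRY_RUN=0 recreate_tunnel gre-tx1 38.68.14.26 63.143.34.217 10.0.1.1/30
```

Previewing first makes it easy to confirm the remote and local IPs before anything is changed.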
Problem: Can't Ping Tunnel IP
Symptoms
- Tunnel shows as "UP" in ip link show
- ping 10.0.1.2 times out or fails
- No response from remote tunnel endpoint
Diagnosis
Step 1: Verify tunnel interface is actually up
ip link show gre-tx1
Look for: state UP
Step 2: Check if UFW is blocking GRE
ufw status verbose | grep -i gre
Expected:
Anywhere ALLOW Anywhere # allow GRE
47 ALLOW Anywhere
Step 3: Check routing table
ip route show
You should see routes for tunnel IPs:
10.0.1.0/30 dev gre-tx1 proto kernel scope link src 10.0.1.1
Step 4: Check if remote server has tunnel up
# SSH to remote server
ssh root@38.68.14.26
# Check tunnel
ip link show gre-hub
ping 10.0.1.1
Solution
On both Command Center and remote node:
# Restart both ends of the tunnel
# Command Center:
ip link set gre-tx1 down
sleep 2
ip link set gre-tx1 up
# Remote (TX1):
ip link set gre-hub down
sleep 2
ip link set gre-hub up
# Test from both sides
ping 10.0.1.2 # From Command Center
ping 10.0.1.1 # From TX1
If UFW is blocking:
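# On Command Center (UFW's full rule syntax needs a from/to clause, so allow each remote endpoint)
ufw allow from 38.68.14.26 proto gre
ufw allow from 216.239.104.130 proto gre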
# On TX1/NC1
ufw allow from 63.143.34.217 proto gre
Problem: Port Forwarding Not Working
Symptoms
- Players can't connect to game servers
- telnet 63.143.34.217 25565 times out or refuses
- Tunnel is up and pingable, but game traffic doesn't flow
Diagnosis
Step 1: Check if NAT rules exist
iptables -t nat -L PREROUTING -n -v
Expected output (example for port 25565):
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:25565 to:10.0.1.2:25565
Step 2: Check FORWARD chain
iptables -L FORWARD -n -v
Expected:
ACCEPT tcp -- 0.0.0.0/0 10.0.1.2 tcp dpt:25565
Step 3: Verify IP forwarding is enabled
cat /proc/sys/net/ipv4/ip_forward
Expected: 1
Step 4: Test from Command Center itself
# From Command Center, test connection to tunnel IP
telnet 10.0.1.2 25565
If this works, but external connections don't, it's a NAT issue.
Step 5: Check if game server is actually listening
# SSH to TX1
ssh root@38.68.14.26
# Check if Minecraft is listening
netstat -tuln | grep 25565
Expected:
tcp6 0 0 :::25565 :::* LISTEN
Solution
Add missing NAT rules:
# On Command Center
iptables -t nat -A PREROUTING -p tcp --dport 25565 -j DNAT --to-destination 10.0.1.2:25565
iptables -t nat -A PREROUTING -p udp --dport 25565 -j DNAT --to-destination 10.0.1.2:25565
iptables -A FORWARD -p tcp -d 10.0.1.2 --dport 25565 -j ACCEPT
iptables -A FORWARD -p udp -d 10.0.1.2 --dport 25565 -j ACCEPT
# Add masquerading if not present
iptables -t nat -A POSTROUTING -o gre-tx1 -j MASQUERADE
# Save rules
iptables-save > /etc/iptables/rules.v4
Enable IP forwarding if disabled:
echo 1 > /proc/sys/net/ipv4/ip_forward
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
sysctl -p
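Running the iptables -A commands above more than once appends duplicate rules. One way to avoid that is to test for the rule with iptables -C before appending. This is a sketch, not the deployment scripts: ensure_rule and forward_port are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: append an iptables rule only if it is not already present.

# ensure_rule TABLE CHAIN RULE-SPEC...
ensure_rule() {
    local table=$1 chain=$2
    shift 2
    # -C exits non-zero when no matching rule exists.
    if ! iptables -t "$table" -C "$chain" "$@" 2>/dev/null; then
        iptables -t "$table" -A "$chain" "$@"
    fi
}

# forward_port PROTO PORT TUNNEL_IP
# Adds the DNAT rule and the matching FORWARD accept for one port.
forward_port() {
    local proto=$1 port=$2 dest=$3
    ensure_rule nat PREROUTING -p "$proto" --dport "$port" \
        -j DNAT --to-destination "$dest:$port"
    ensure_rule filter FORWARD -p "$proto" -d "$dest" --dport "$port" -j ACCEPT
}

# Example (as root on the Command Center):
#   forward_port tcp 25565 10.0.1.2
#   forward_port udp 25565 10.0.1.2
#   ensure_rule nat POSTROUTING -o gre-tx1 -j MASQUERADE
```

Because each call checks first, the same lines can go into a persistence script and run on every boot without the rule list growing.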
Problem: Tunnel Works But Breaks After Reboot
Symptoms
- Tunnel works fine until server reboots
- After reboot, tunnel doesn't come back up
- Must manually recreate tunnel every time
Diagnosis
Check if persistence script exists:
ls -la /etc/network/if-up.d/frostwall-*
Check if it's executable:
ls -la /etc/network/if-up.d/frostwall-tunnels
Should show: -rwxr-xr-x
Check if script is being called on boot:
# Check recent boot logs
journalctl -b | grep frostwall
Solution
Create or fix persistence script:
See deployment-plan.md Phase 1.3 for full scripts.
Make sure it's executable:
chmod +x /etc/network/if-up.d/frostwall-tunnels
Test the script manually:
# Bring tunnel down first
ip link set gre-tx1 down
# Run the script
/etc/network/if-up.d/frostwall-tunnels
# Check if tunnel came back up
ip tunnel show
ping 10.0.1.2
Alternative: Use systemd service
If if-up.d hooks don't work, create a systemd service:
/etc/systemd/system/frostwall-tunnels.service:
[Unit]
Description=Frostwall GRE Tunnels
After=network.target
[Service]
Type=oneshot
ExecStart=/etc/network/if-up.d/frostwall-tunnels
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
Then reload systemd and enable the service:
systemctl daemon-reload
systemctl enable frostwall-tunnels
systemctl start frostwall-tunnels
Problem: Self-Healing Monitor Not Running
Symptoms
- Tunnels go down and don't auto-recover
- No entries in /var/log/frostwall-monitor.log
- Cron job not running
Diagnosis
Check if cron job is scheduled:
crontab -l | grep frostwall
Expected:
*/5 * * * * /usr/local/bin/frostwall-monitor.sh
Check if script exists:
ls -la /usr/local/bin/frostwall-monitor.sh
Check if it's executable:
test -x /usr/local/bin/frostwall-monitor.sh && echo "executable" || echo "NOT executable"
Run script manually to test:
/usr/local/bin/frostwall-monitor.sh
Check logs:
cat /var/log/frostwall-monitor.log
Solution
Add cron job if missing:
crontab -e
Add:
*/5 * * * * /usr/local/bin/frostwall-monitor.sh
Fix script permissions:
chmod +x /usr/local/bin/frostwall-monitor.sh
Create log file if it doesn't exist:
touch /var/log/frostwall-monitor.log
chmod 644 /var/log/frostwall-monitor.log
Test the monitor:
# Bring down a tunnel manually
ip link set gre-tx1 down
# Wait 5 minutes
sleep 300
# Check if it auto-recovered
ping 10.0.1.2
# Check logs
tail /var/log/frostwall-monitor.log
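The real frostwall-monitor.sh is not reproduced in this guide. For orientation while debugging it, here is a rough sketch of what one self-healing pass could look like. Everything in it is an assumption: the function names log, peer_alive and heal_tunnel are hypothetical, and the real script may check and recover tunnels differently.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; NOT the real frostwall-monitor.sh.
LOG_FILE="${LOG_FILE:-/var/log/frostwall-monitor.log}"

log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOG_FILE"
}

# True when the peer answers at least one of three pings.
peer_alive() {
    ping -c 3 -W 2 "$1" >/dev/null 2>&1
}

# heal_tunnel INTERFACE PEER_TUNNEL_IP
# Returns 0 if the tunnel is healthy or was recovered, 1 if it is still down.
heal_tunnel() {
    local name=$1 peer=$2
    peer_alive "$peer" && return 0
    log "$name: $peer unreachable, restarting interface"
    ip link set "$name" down
    sleep 2
    ip link set "$name" up
    sleep 2
    if peer_alive "$peer"; then
        log "$name: recovered"
    else
        log "$name: still down after restart"
        return 1
    fi
}

# A cron-driven script would end with calls such as:
#   heal_tunnel gre-tx1 10.0.1.2
#   heal_tunnel gre-nc1 10.0.2.2
```

If the log stays empty while a tunnel is down, the script is most likely not being started by cron at all, which points back to the crontab and permission checks above.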
Problem: High Latency or Packet Loss Through Tunnel
Symptoms
- Players experience lag
- ping 10.0.1.2 shows high latency or packet loss
- Game connections are unstable
Diagnosis
Test latency:
# From Command Center
ping -c 100 10.0.1.2 | tail -5
Look for:
- Average latency (should be <2ms for TX1, ~30-40ms for NC1)
- Packet loss (should be 0%)
Test MTU size:
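# Try different payload sizes (payload = MTU - 28: 20-byte IP header + 8-byte ICMP header)
ping -M do -s 1448 10.0.1.2 # Fills the default GRE tunnel MTU of 1476 (1500 minus 24 bytes of GRE overhead)
ping -M do -s 1372 10.0.1.2 # Fits a 1400-byte MTU
If the larger packet fails but the smaller one succeeds, MTU is the issue.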
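The arithmetic behind the MTU test is easy to get wrong, so here it is as a small sketch. It assumes plain GRE over IPv4 with no key or checksum options (24 bytes of overhead) and the Linux ping -M do flag; gre_tunnel_mtu, icmp_payload_for_mtu and probe_max_payload are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for MTU troubleshooting.

# Tunnel MTU = link MTU - 24 (20-byte outer IP header + 4-byte GRE header).
gre_tunnel_mtu() {
    echo $(( $1 - 24 ))
}

# Largest ping -s value for an MTU = MTU - 28 (20-byte IP + 8-byte ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# probe_max_payload HOST [LOW] [HIGH]
# Binary-search the largest unfragmented payload that gets a reply.
# Assumes LOW itself works; if nothing passes, LOW is still returned.
probe_max_payload() {
    local host=$1 lo=${2:-1200} hi=${3:-1472} mid
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))
        if ping -M do -c 1 -W 1 -s "$mid" "$host" >/dev/null 2>&1; then
            lo=$mid
        else
            hi=$(( mid - 1 ))
        fi
    done
    echo "$lo"
}

# Example:
#   gre_tunnel_mtu 1500          # tunnel MTU on a standard Ethernet link
#   probe_max_payload 10.0.1.2   # add 28 to the result to get the usable MTU
```

Adding 28 to the probed payload gives the largest MTU the path carries, which is the value to compare against the tunnel interface MTU.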
Check CPU load:
top
High CPU on either end of tunnel could cause performance issues.
Check bandwidth:
# Install iperf3 if not present
apt install iperf3
# On TX1:
iperf3 -s
# On Command Center:
iperf3 -c 10.0.1.2
Solution
Adjust MTU if needed:
# On tunnel interface
ip link set gre-tx1 mtu 1400
Add to persistence script:
# In /etc/network/if-up.d/frostwall-tunnels
ip link set gre-tx1 mtu 1400
If packet loss persists:
- Check physical network between nodes
- Contact datacenter if persistent issues
- Verify no other services saturating bandwidth
Problem: UFW Blocking Legitimate Traffic
Symptoms
- Can't SSH to server
- Specific ports not working despite NAT rules
- Connection refused or timeout
Diagnosis
Check UFW status:
ufw status verbose
Check UFW logs:
tail -100 /var/log/ufw.log
Look for BLOCK entries for the port/IP you're trying to reach.
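Scanning the log by eye gets slow on a busy server. The sketch below summarizes blocked packets for one destination port by source IP. It assumes the standard UFW log line layout ([UFW BLOCK] followed by SRC= and DPT= fields) and the default log path; ufw_blocks_for_port is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count UFW BLOCK entries per source IP for one port.

# ufw_blocks_for_port PORT [LOGFILE]
ufw_blocks_for_port() {
    local port=$1 logfile=${2:-/var/log/ufw.log}
    grep -F '[UFW BLOCK]' "$logfile" \
        | grep -E "DPT=${port}( |\$)" \
        | grep -oE 'SRC=[^ ]+' \
        | sort | uniq -c | sort -rn
}

# Example:
#   ufw_blocks_for_port 25565
#   ufw_blocks_for_port 22 /var/log/ufw.log
```

If your own management IP or a tunnel IP appears in the output, the matching allow rule is missing or sits below a deny rule.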
Test with UFW temporarily disabled:
ufw disable
# Try connection
# Re-enable immediately
ufw enable
⚠️ WARNING: Only disable UFW for brief testing, re-enable immediately.
Solution
Add specific rule for your management IP:
ufw allow from MANAGEMENT_IP to any port 22 proto tcp
Allow traffic on tunnel interfaces:
ufw allow in on gre-tx1
ufw allow in on gre-nc1
ufw allow in on gre-hub # On TX1/NC1
Check rule order:
ufw status numbered
Rules are processed in order - make sure allow rules come before deny rules.
Delete and re-add rules if needed:
# Delete rule by number
ufw delete 5
# Re-add in correct order
ufw insert 1 allow from MANAGEMENT_IP to any port 22
Emergency Recovery Procedures
Complete Tunnel Failure
If all troubleshooting fails, rebuild from scratch:
# On Command Center
ip link set gre-tx1 down
ip link set gre-nc1 down
ip tunnel del gre-tx1
ip tunnel del gre-nc1
# On TX1
ip link set gre-hub down
ip tunnel del gre-hub
# On NC1
ip link set gre-hub down
ip tunnel del gre-hub
# Then follow deployment-plan.md Phase 1 to rebuild
Lost SSH Access
If locked out due to UFW misconfiguration:
- Access server via provider's console (IPMI, VNC, etc.)
- Log in as root
- Disable UFW: ufw disable
- Fix rules, re-enable carefully
- Test SSH before closing console session
Complete Frostwall Removal (Rollback)
If you need to remove Frostwall entirely:
# Stop monitoring
crontab -e # Remove frostwall-monitor line
# Remove tunnels
ip link set gre-tx1 down
ip link set gre-nc1 down
ip tunnel del gre-tx1
ip tunnel del gre-nc1
# Remove NAT rules (WARNING: this flushes every rule in the nat and filter tables, not only Frostwall's)
iptables -t nat -F
iptables -F
# Restore previous UFW rules
ufw --force reset
# Re-add basic rules
# Remove persistence scripts
rm /etc/network/if-up.d/frostwall-*
rm /usr/local/bin/frostwall-monitor.sh
# Update DNS to point directly to server IPs
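If other services on the Command Center have their own iptables rules, flushing whole tables during rollback removes those too. A more selective approach is to turn the matching lines of iptables -S into delete commands and review them before running anything. This is a sketch: deletion_commands is a hypothetical helper, and matching on the tunnel IP assumes Frostwall rules are the only ones that mention it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert "iptables -S" output into delete commands.

# deletion_commands TABLE PATTERN   (reads "iptables -t TABLE -S" output on stdin)
# Prints one "iptables -t TABLE -D ..." line per appended rule containing PATTERN.
deletion_commands() {
    local table=$1 pattern=$2
    grep -F -- "$pattern" | sed -n "s/^-A /iptables -t $table -D /p"
}

# Review first:
#   iptables -t nat -S | deletion_commands nat 10.0.1.2
#   iptables -S        | deletion_commands filter 10.0.1.2
# Then, once the list looks right, pipe the same output to sh.
```

Policy lines (-P) and rules that do not mention the pattern are left untouched.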
Common Error Messages
"RTNETLINK answers: File exists"
Meaning: Tunnel with that name already exists
Solution:
ip tunnel del gre-tx1 # Delete existing
# Then recreate
"RTNETLINK answers: Network is unreachable"
Meaning: Can't reach remote endpoint
Solution:
- Verify remote IP is correct
- Check if physical network to remote is up
- Ping remote physical IP
"GRE: DF set but fragmentation needed"
Meaning: MTU mismatch, packet too large
Solution:
ip link set gre-tx1 mtu 1400
"Operation not permitted"
Meaning: Not running as root or module not loaded
Solution:
sudo su # Become root
modprobe ip_gre # Load module
Monitoring and Health Checks
Daily health check commands:
# Check all tunnels are up
ip tunnel show
# Ping all tunnel endpoints
ping -c 4 10.0.1.2
ping -c 4 10.0.2.2
# Check monitor log
tail -20 /var/log/frostwall-monitor.log
# Verify NAT rules
iptables -t nat -L -n -v | head -20
Set up alerts (optional):
# Add to monitor script to send email on failure
# Requires mail configured
echo "Tunnel failure detected" | mail -s "ALERT: Frostwall Tunnel Down" admin@firefrostgaming.com
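To decide when an alert should fire, the packet loss figure from ping can be extracted and compared against a threshold. This is a sketch assuming the usual Linux ping summary line ("N% packet loss"); tunnel_loss_percent and check_tunnel_loss are hypothetical helper names, and the mail command is the one shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a loss-based tunnel alert.

# tunnel_loss_percent HOST [COUNT]
# Prints the packet loss percentage reported by ping (empty if ping printed no summary).
tunnel_loss_percent() {
    ping -c "${2:-4}" -W 2 "$1" 2>/dev/null \
        | grep -oE '[0-9.]+% packet loss' \
        | cut -d'%' -f1
}

# check_tunnel_loss NAME HOST [MAX_LOSS_PERCENT]
# Returns 1 and prints a message when loss exceeds the threshold (default 0).
check_tunnel_loss() {
    local name=$1 host=$2 max_loss=${3:-0} loss
    loss=$(tunnel_loss_percent "$host")
    loss=${loss:-100}            # no summary line: treat the tunnel as down
    # Integer comparison; any fractional part of the loss figure is truncated.
    if [ "${loss%.*}" -gt "$max_loss" ]; then
        echo "$name: ${loss}% packet loss to $host"
        return 1
    fi
}

# Example for the monitor script (requires mail configured):
#   check_tunnel_loss gre-tx1 10.0.1.2 \
#       || echo "Tunnel failure detected" | mail -s "ALERT: Frostwall Tunnel Down" admin@firefrostgaming.com
```

A threshold above 0 avoids alerting on a single dropped packet.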
Getting Help
If none of these troubleshooting steps resolve your issue:
- Gather diagnostics:
ip tunnel show > /tmp/frostwall-diag.txt
ip addr show >> /tmp/frostwall-diag.txt
ip route show >> /tmp/frostwall-diag.txt
iptables -t nat -L -n -v >> /tmp/frostwall-diag.txt
ufw status verbose >> /tmp/frostwall-diag.txt
tail -100 /var/log/frostwall-monitor.log >> /tmp/frostwall-diag.txt
- Document symptoms:
  - What were you trying to do?
  - What happened instead?
  - When did it start?
  - What changed recently?
- Check documentation:
  - Review deployment-plan.md
  - Review ip-hierarchy.md
- Ask The Chronicler (future Claude session) with full diagnostics
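The diagnostics file is easier to read when each command's output is labeled and a failing command does not stop the collection. This is a sketch of one way to do that; section and collect_frostwall_diag are hypothetical helper names, and the output path matches the one used in the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gather labeled Frostwall diagnostics into one file.

# Print a header, then the command's output (stdout and stderr).
section() {
    echo "== $* =="
    "$@" 2>&1 || echo "(command failed or not available)"
    echo
}

# collect_frostwall_diag [OUTPUT_FILE]
collect_frostwall_diag() {
    local out=${1:-/tmp/frostwall-diag.txt}
    {
        section date
        section ip tunnel show
        section ip addr show
        section ip route show
        section iptables -t nat -L -n -v
        section ufw status verbose
        section tail -100 /var/log/frostwall-monitor.log
    } > "$out"
    echo "$out"
}

# Run as root:
#   collect_frostwall_diag
```

The function prints the path of the file it wrote, so it can be attached or pasted along with the symptom notes.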
Fire + Frost + Foundation = Where Love Builds Legacy 💙🔥❄️
Document Status: TROUBLESHOOTING GUIDE
Update When: New issues discovered, solutions found, error messages encountered