# Frostwall Protocol - Troubleshooting Guide

**Purpose:** Diagnose and resolve common Frostwall issues
**Last Updated:** 2026-02-17
**Status:** Ready for use

---

## Quick Diagnostics Checklist

When something's not working, run through this checklist first:

- [ ] Are GRE tunnels up? (`ip tunnel show`)
- [ ] Can you ping tunnel endpoints? (`ping 10.0.1.2`, `ping 10.0.2.2`)
- [ ] Is UFW blocking necessary traffic? (`ufw status verbose`)
- [ ] Are NAT rules present? (`iptables -t nat -L -n -v`)
- [ ] Is IP forwarding enabled? (`cat /proc/sys/net/ipv4/ip_forward`)
- [ ] Is the self-healing monitor running? (`crontab -l`)
- [ ] Did the server reboot recently? (tunnels may need manual restart)

---

## Problem: Tunnel Won't Come Up

### Symptoms

- `ip tunnel show` shows no tunnel interface
- Cannot create tunnel with `ip tunnel add`
- Error: "RTNETLINK answers: File exists" or similar

### Diagnosis

**Step 1: Check if GRE module is loaded**

```bash
lsmod | grep gre
```

**Expected output:**

```
ip_gre                 28672  0
gre                    16384  1 ip_gre
```

**If not loaded:**

```bash
modprobe ip_gre
```

**Step 2: Check if tunnel already exists**

```bash
ip tunnel show
```

**If tunnel exists but is down:**

```bash
ip link set gre-tx1 up  # or gre-nc1, gre-hub as appropriate
```

**Step 3: Verify remote endpoint is reachable**

```bash
ping 38.68.14.26      # TX1 physical IP
ping 216.239.104.130  # NC1 physical IP
```

If physical IPs aren't reachable, the GRE tunnel can't form.

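The three diagnosis steps above can be run in one pass. The following is a minimal sketch and not part of the Frostwall tooling: the function names (`gre_module_loaded`, `tunnel_exists`, `diagnose_tunnel`) are hypothetical, and the example values are the TX1 tunnel name and physical IP used in this guide. The two parsing helpers read command output from stdin, so they can be checked without root.

```bash
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not part of Frostwall.

# Succeeds if `lsmod` output on stdin lists the ip_gre module.
gre_module_loaded() {
    grep -q '^ip_gre[[:space:]]'
}

# Succeeds if `ip tunnel show` output on stdin contains tunnel $1.
tunnel_exists() {
    grep -q "^$1:"
}

# Runs Steps 1-3 for one tunnel: $1 = tunnel name, $2 = remote physical IP.
diagnose_tunnel() {
    local name="$1" remote="$2"

    if lsmod | gre_module_loaded; then
        echo "ip_gre: loaded"
    else
        echo "ip_gre: NOT loaded (run: modprobe ip_gre)"
    fi

    if ip tunnel show | tunnel_exists "$name"; then
        echo "$name: exists"
    else
        echo "$name: missing"
    fi

    if ping -c 3 -W 2 "$remote" >/dev/null 2>&1; then
        echo "$remote: reachable"
    else
        echo "$remote: NOT reachable"
    fi
}
```

Run it as root on Command Center, for example `diagnose_tunnel gre-tx1 38.68.14.26`. Any line reporting a problem points back to the matching step above.
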
### Solution

**Delete and recreate tunnel:**

```bash
# If tunnel exists
ip link set gre-tx1 down
ip tunnel del gre-tx1

# Recreate
ip tunnel add gre-tx1 mode gre remote 38.68.14.26 local 63.143.34.217 ttl 255
ip addr add 10.0.1.1/30 dev gre-tx1
ip link set gre-tx1 up

# Test
ping 10.0.1.2
```

---

## Problem: Can't Ping Tunnel IP

### Symptoms

- Tunnel shows as "UP" in `ip link show`
- `ping 10.0.1.2` times out or fails
- No response from remote tunnel endpoint

### Diagnosis

**Step 1: Verify tunnel interface is actually up**

```bash
ip link show gre-tx1
```

Look for: `state UP`

**Step 2: Check if UFW is blocking GRE**

```bash
ufw status verbose | grep -i gre
```

**Expected:**

```
Anywhere                   ALLOW       Anywhere                   # allow GRE
47                         ALLOW       Anywhere
```

**Step 3: Check routing table**

```bash
ip route show
```

You should see routes for tunnel IPs:

```
10.0.1.0/30 dev gre-tx1 proto kernel scope link src 10.0.1.1
```

**Step 4: Check if remote server has tunnel up**

```bash
# SSH to remote server
ssh root@38.68.14.26

# Check tunnel
ip link show gre-hub
ping 10.0.1.1
```

### Solution

**On both Command Center and remote node:**

```bash
# Restart both ends of the tunnel

# Command Center:
ip link set gre-tx1 down
sleep 2
ip link set gre-tx1 up

# Remote (TX1):
ip link set gre-hub down
sleep 2
ip link set gre-hub up

# Test from both sides
ping 10.0.1.2  # From Command Center
ping 10.0.1.1  # From TX1
```

**If UFW is blocking:**

```bash
# On Command Center
ufw allow proto gre

# On TX1/NC1
ufw allow from 63.143.34.217 proto gre
```

---

## Problem: Port Forwarding Not Working

### Symptoms

- Players can't connect to game servers
- `telnet 63.143.34.217 25565` times out or refuses
- Tunnel is up and pingable, but game traffic doesn't flow

### Diagnosis

**Step 1: Check if NAT rules exist**

```bash
iptables -t nat -L PREROUTING -n -v
```

**Expected output (example for port 25565):**

```
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:25565 to:10.0.1.2:25565
```

**Step 2: Check FORWARD chain**

```bash
iptables -L FORWARD -n -v
```

**Expected:**

```
ACCEPT     tcp  --  0.0.0.0/0            10.0.1.2             tcp dpt:25565
```

**Step 3: Verify IP forwarding is enabled**

```bash
cat /proc/sys/net/ipv4/ip_forward
```

**Expected:** `1`

**Step 4: Test from Command Center itself**

```bash
# From Command Center, test connection to tunnel IP
telnet 10.0.1.2 25565
```

If this works, but external connections don't, it's a NAT issue.

**Step 5: Check if game server is actually listening**

```bash
# SSH to TX1
ssh root@38.68.14.26

# Check if Minecraft is listening
netstat -tuln | grep 25565
```

**Expected:**

```
tcp6       0      0 :::25565                :::*                    LISTEN
```

### Solution

**Add missing NAT rules:**

```bash
# On Command Center
iptables -t nat -A PREROUTING -p tcp --dport 25565 -j DNAT --to-destination 10.0.1.2:25565
iptables -t nat -A PREROUTING -p udp --dport 25565 -j DNAT --to-destination 10.0.1.2:25565
iptables -A FORWARD -p tcp -d 10.0.1.2 --dport 25565 -j ACCEPT
iptables -A FORWARD -p udp -d 10.0.1.2 --dport 25565 -j ACCEPT

# Add masquerading if not present
iptables -t nat -A POSTROUTING -o gre-tx1 -j MASQUERADE

# Save rules
iptables-save > /etc/iptables/rules.v4
```

**Enable IP forwarding if disabled:**

```bash
echo 1 > /proc/sys/net/ipv4/ip_forward
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
sysctl -p
```

---

## Problem: Tunnel Works But Breaks After Reboot

### Symptoms

- Tunnel works fine until server reboots
- After reboot, tunnel doesn't come back up
- Must manually recreate tunnel every time

### Diagnosis

**Check if persistence script exists:**

```bash
ls -la /etc/network/if-up.d/frostwall-*
```

**Check if it's executable:**

```bash
ls -la /etc/network/if-up.d/frostwall-tunnels
```

Should show: `-rwxr-xr-x`

**Check if script is being called on boot:**

```bash
# Check recent boot logs
journalctl -b | grep frostwall
```

### Solution

**Create or fix persistence script:**

See deployment-plan.md Phase 1.3 for full scripts.

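For orientation only, here is a minimal sketch of what an idempotent persistence hook could look like. It is not the script from deployment-plan.md Phase 1.3, which remains the authoritative version. The `ensure_tunnel` and `run` helpers and the `DRY_RUN` variable are hypothetical names introduced for illustration; the commands themselves are the ones used in the "Delete and recreate tunnel" solution earlier in this guide.

```bash
#!/usr/bin/env bash
# Sketch only, NOT the deployment-plan.md script. ensure_tunnel, run and
# DRY_RUN are hypothetical names introduced for illustration.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create and bring up a GRE tunnel unless the interface already exists.
# $1 = tunnel name, $2 = remote physical IP, $3 = local physical IP,
# $4 = tunnel address with prefix length
ensure_tunnel() {
    local name="$1" remote="$2" local_ip="$3" addr="$4"

    if ip link show "$name" >/dev/null 2>&1; then
        # Interface already present: skip creation (this avoids
        # "RTNETLINK answers: File exists") and just bring it up.
        run ip link set "$name" up
        return
    fi

    run ip tunnel add "$name" mode gre remote "$remote" local "$local_ip" ttl 255
    run ip addr add "$addr" dev "$name"
    run ip link set "$name" up
}
```

On Command Center the call would be `ensure_tunnel gre-tx1 38.68.14.26 63.143.34.217 10.0.1.1/30`. Prefixing it with `DRY_RUN=1` prints the commands instead of running them, which is a safe way to review the hook before a reboot.
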
**Make sure it's executable:**

```bash
chmod +x /etc/network/if-up.d/frostwall-tunnels
```

**Test the script manually:**

```bash
# Bring tunnel down first
ip link set gre-tx1 down

# Run the script
/etc/network/if-up.d/frostwall-tunnels

# Check if tunnel came back up
ip tunnel show
ping 10.0.1.2
```

**Alternative: Use systemd service**

If if-up.d hooks don't work, create a systemd service:

`/etc/systemd/system/frostwall-tunnels.service`:

```
[Unit]
Description=Frostwall GRE Tunnels
After=network.target

[Service]
Type=oneshot
ExecStart=/etc/network/if-up.d/frostwall-tunnels
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

```bash
systemctl daemon-reload
systemctl enable frostwall-tunnels
systemctl start frostwall-tunnels
```

---

## Problem: Self-Healing Monitor Not Running

### Symptoms

- Tunnels go down and don't auto-recover
- No entries in `/var/log/frostwall-monitor.log`
- Cron job not running

### Diagnosis

**Check if cron job is scheduled:**

```bash
crontab -l | grep frostwall
```

**Expected:**

```
*/5 * * * * /usr/local/bin/frostwall-monitor.sh
```

**Check if script exists:**

```bash
ls -la /usr/local/bin/frostwall-monitor.sh
```

**Check if it's executable:**

```bash
# Check only; the permission fix is under Solution
test -x /usr/local/bin/frostwall-monitor.sh && echo "executable" || echo "NOT executable"
```

**Run script manually to test:**

```bash
/usr/local/bin/frostwall-monitor.sh
```

**Check logs:**

```bash
cat /var/log/frostwall-monitor.log
```

### Solution

**Add cron job if missing:**

```bash
crontab -e
```

Add:

```
*/5 * * * * /usr/local/bin/frostwall-monitor.sh
```

**Fix script permissions:**

```bash
chmod +x /usr/local/bin/frostwall-monitor.sh
```

**Create log file if it doesn't exist:**

```bash
touch /var/log/frostwall-monitor.log
chmod 644 /var/log/frostwall-monitor.log
```

**Test the monitor:**

```bash
# Bring down a tunnel manually
ip link set gre-tx1 down

# Wait 5 minutes
sleep 300

# Check if it auto-recovered
ping 10.0.1.2

# Check logs
tail /var/log/frostwall-monitor.log
```

---

## Problem: High Latency or Packet Loss Through Tunnel

### Symptoms

- Players experience lag
- `ping 10.0.1.2` shows high latency or packet loss
- Game connections are unstable

### Diagnosis

**Test latency:**

```bash
# From Command Center
ping -c 100 10.0.1.2 | tail -5
```

Look for:

- Average latency (should be <2ms for TX1, ~30-40ms for NC1)
- Packet loss (should be 0%)

**Test MTU size:**

```bash
# Try different sizes; -s sets the ICMP payload, and IP + ICMP headers add 28 bytes
ping -M do -s 1472 10.0.1.2  # 1472 + 28 = 1500, standard Ethernet MTU
ping -M do -s 1450 10.0.1.2  # 1450 + 28 = 1478, lower MTU
```

If larger packets fail but smaller succeed, MTU is the issue.

**Check CPU load:**

```bash
top
```

High CPU on either end of tunnel could cause performance issues.

**Check bandwidth:**

```bash
# Install iperf3 if not present
apt install iperf3

# On TX1:
iperf3 -s

# On Command Center:
iperf3 -c 10.0.1.2
```

### Solution

**Adjust MTU if needed:**

```bash
# On tunnel interface
ip link set gre-tx1 mtu 1400
```

**Add to persistence script:**

```bash
# In /etc/network/if-up.d/frostwall-tunnels
ip link set gre-tx1 mtu 1400
```

**If packet loss persists:**

- Check physical network between nodes
- Contact datacenter if persistent issues
- Verify no other services saturating bandwidth

---

## Problem: UFW Blocking Legitimate Traffic

### Symptoms

- Can't SSH to server
- Specific ports not working despite NAT rules
- Connection refused or timeout

### Diagnosis

**Check UFW status:**

```bash
ufw status verbose
```

**Check UFW logs:**

```bash
tail -100 /var/log/ufw.log
```

Look for BLOCK entries for the port/IP you're trying to reach.

**Test with UFW temporarily disabled:**

```bash
ufw disable
# Try connection
# Re-enable immediately
ufw enable
```

**⚠️ WARNING:** Only disable UFW for brief testing, and re-enable it immediately.

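When scanning `/var/log/ufw.log` for BLOCK entries, filtering by destination port saves time. The following is a minimal sketch: the function name `ufw_blocked_sources` is hypothetical, and it assumes the usual kernel log format in which blocked packets are tagged `[UFW BLOCK]` and carry `SRC=` and `DPT=` fields.

```bash
#!/usr/bin/env bash
# Sketch only: ufw_blocked_sources is a hypothetical helper name.
# Reads UFW log lines on stdin and prints, for destination port $1,
# each blocked source IP followed by its hit count (most frequent first).
ufw_blocked_sources() {
    local port="$1"
    grep -F '[UFW BLOCK]' \
        | grep -E "DPT=${port}( |\$)" \
        | grep -oE 'SRC=[^ ]+' \
        | cut -d= -f2 \
        | sort | uniq -c | sort -rn \
        | awk '{print $2, $1}'
}
```

For example, `ufw_blocked_sources 25565 < /var/log/ufw.log` lists the addresses being blocked on the Minecraft port. If your management IP or a player's IP shows up, the rules under Solution are the place to look.
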
### Solution

**Add specific rule for your management IP:**

```bash
ufw allow from MANAGEMENT_IP to any port 22 proto tcp
```

**Allow traffic on tunnel interfaces:**

```bash
ufw allow in on gre-tx1
ufw allow in on gre-nc1
ufw allow in on gre-hub  # On TX1/NC1
```

**Check rule order:**

```bash
ufw status numbered
```

Rules are processed in order - make sure allow rules come before deny rules.

**Delete and re-add rules if needed:**

```bash
# Delete rule by number
ufw delete 5

# Re-add in correct order
ufw insert 1 allow from MANAGEMENT_IP to any port 22
```

---

## Emergency Recovery Procedures

### Complete Tunnel Failure

**If all troubleshooting fails, rebuild from scratch:**

```bash
# On Command Center
ip link set gre-tx1 down
ip link set gre-nc1 down
ip tunnel del gre-tx1
ip tunnel del gre-nc1

# On TX1
ip link set gre-hub down
ip tunnel del gre-hub

# On NC1
ip link set gre-hub down
ip tunnel del gre-hub

# Then follow deployment-plan.md Phase 1 to rebuild
```

### Lost SSH Access

**If locked out due to UFW misconfiguration:**

1. Access server via provider's console (IPMI, VNC, etc.)
2. Log in as root
3. Disable UFW: `ufw disable`
4. Fix rules, re-enable carefully
5. Test SSH before closing console session

### Complete Frostwall Removal (Rollback)

**If you need to remove Frostwall entirely:**

```bash
# Stop monitoring
crontab -e  # Remove frostwall-monitor line

# Remove tunnels
ip link set gre-tx1 down
ip link set gre-nc1 down
ip tunnel del gre-tx1
ip tunnel del gre-nc1

# Remove NAT rules
# Note: -F flushes ALL rules in the nat and filter tables, not only Frostwall's
iptables -t nat -F
iptables -F

# Restore previous UFW rules
ufw --force reset
# Re-add basic rules

# Remove persistence scripts
rm /etc/network/if-up.d/frostwall-*
rm /usr/local/bin/frostwall-monitor.sh

# Update DNS to point directly to server IPs
```

---

## Common Error Messages

### "RTNETLINK answers: File exists"

**Meaning:** Tunnel with that name already exists

**Solution:**

```bash
ip tunnel del gre-tx1  # Delete existing
# Then recreate
```

### "RTNETLINK answers: Network is unreachable"

**Meaning:** Can't reach remote endpoint

**Solution:**

- Verify remote IP is correct
- Check if physical network to remote is up
- Ping remote physical IP

### "GRE: DF set but fragmentation needed"

**Meaning:** MTU mismatch, packet too large

**Solution:**

```bash
ip link set gre-tx1 mtu 1400
```

### "Operation not permitted"

**Meaning:** Not running as root or module not loaded

**Solution:**

```bash
sudo su          # Become root
modprobe ip_gre  # Load module
```

---

## Monitoring and Health Checks

**Daily health check commands:**

```bash
# Check all tunnels are up
ip tunnel show

# Ping all tunnel endpoints
ping -c 4 10.0.1.2
ping -c 4 10.0.2.2

# Check monitor log
tail -20 /var/log/frostwall-monitor.log

# Verify NAT rules
iptables -t nat -L -n -v | head -20
```

**Set up alerts (optional):**

```bash
# Add to monitor script to send email on failure
# Requires mail configured
echo "Tunnel failure detected" | mail -s "ALERT: Frostwall Tunnel Down" admin@firefrostgaming.com
```

---

## Getting Help

If none of these troubleshooting steps resolve your issue:

1. **Gather diagnostics:**

   ```bash
   ip tunnel show > /tmp/frostwall-diag.txt
   ip addr show >> /tmp/frostwall-diag.txt
   ip route show >> /tmp/frostwall-diag.txt
   iptables -t nat -L -n -v >> /tmp/frostwall-diag.txt
   ufw status verbose >> /tmp/frostwall-diag.txt
   tail -100 /var/log/frostwall-monitor.log >> /tmp/frostwall-diag.txt
   ```

2. **Document symptoms:**
   - What were you trying to do?
   - What happened instead?
   - When did it start?
   - What changed recently?

3. **Check documentation:**
   - Review deployment-plan.md
   - Review ip-hierarchy.md

4. **Ask The Chronicler** (future Claude session) with full diagnostics

---

**Fire + Frost + Foundation = Where Love Builds Legacy** 💙🔥❄️

---

**Document Status:** TROUBLESHOOTING GUIDE
**Update When:** New issues discovered, solutions found, error messages encountered