Claude 2bd96ee8c7 docs: Complete Frostwall Protocol deployment documentation
Created comprehensive documentation for Frostwall Protocol rebuild:

deployment-plan.md (500+ lines):
- Complete 7-phase implementation guide
- GRE tunnel configuration for Command Center ↔ TX1/NC1
- Iron Wall UFW firewall rules
- NAT/port forwarding setup
- Self-healing tunnel monitoring with auto-recovery
- DNS configuration
- Testing and verification procedures
- Rollback plan
- Performance considerations

ip-hierarchy.md (400+ lines):
- Three-tier IP architecture explained
- Complete service mapping table (all 11 game servers)
- GRE tunnel IP addressing
- Traffic flow diagrams
- DNS configuration reference
- Security summary
- Quick command reference

troubleshooting.md (450+ lines):
- Quick diagnostics checklist
- Common problems with step-by-step solutions:
  - Tunnel won't come up
  - Can't ping tunnel IP
  - Port forwarding not working
  - Tunnel breaks after reboot
  - Self-healing monitor issues
  - High latency/packet loss
  - UFW blocking traffic
- Emergency recovery procedures
- Common error messages decoded
- Health check commands

This documentation enables rebuilding the Frostwall Protocol from scratch
with proper IP hierarchy, DDoS protection, and self-healing capabilities.

Unblocks: Mailcow deployment, AI stack, all Tier 2+ infrastructure

Task: Frostwall Protocol (Tier 1, Critical)
FFG-STD-002 compliant
2026-02-17 15:01:35 +00:00


Frostwall Protocol - Troubleshooting Guide

Purpose: Diagnose and resolve common Frostwall issues
Last Updated: 2026-02-17
Status: Ready for use


Quick Diagnostics Checklist

When something's not working, run through this checklist first:

  • Are GRE tunnels up? (ip tunnel show)
  • Can you ping tunnel endpoints? (ping 10.0.1.2, ping 10.0.2.2)
  • Is UFW blocking necessary traffic? (ufw status verbose)
  • Are NAT rules present? (iptables -t nat -L -n -v)
  • Is IP forwarding enabled? (cat /proc/sys/net/ipv4/ip_forward)
  • Is the self-healing monitor scheduled? (crontab -l)
  • Did the server reboot recently? (tunnels may need manual restart)
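The checklist above can be scripted so that no item gets skipped under pressure. This is a minimal sketch, not one of the existing Frostwall scripts. It assumes it runs as root on Command Center, using the interface names (`gre-tx1`, `gre-nc1`) and tunnel IPs from this guide. `run_check` and `frostwall_quick_check` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original Frostwall scripts): run the
# checklist items on Command Center and print PASS/FAIL for each one.

# run_check LABEL COMMAND [ARGS...]: PASS if COMMAND exits 0, FAIL otherwise.
run_check() {
  local label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label"
    return 1
  fi
}

# Runs every item, even after a failure; returns non-zero if any item failed.
frostwall_quick_check() {
  local failed=0
  run_check "gre-tx1 interface exists"       ip link show gre-tx1 || failed=1
  run_check "gre-nc1 interface exists"       ip link show gre-nc1 || failed=1
  run_check "TX1 peer 10.0.1.2 answers ping" ping -c 2 -W 2 10.0.1.2 || failed=1
  run_check "NC1 peer 10.0.2.2 answers ping" ping -c 2 -W 2 10.0.2.2 || failed=1
  run_check "UFW is active"      sh -c 'ufw status | grep -q "Status: active"' || failed=1
  run_check "DNAT rules present" sh -c 'iptables -t nat -S PREROUTING | grep -q DNAT' || failed=1
  run_check "IP forwarding enabled" grep -qx 1 /proc/sys/net/ipv4/ip_forward || failed=1
  run_check "monitor scheduled in cron" sh -c 'crontab -l | grep -q frostwall-monitor' || failed=1
  return "$failed"
}
```

Source the file as root and call `frostwall_quick_check`. A non-zero exit status means at least one item failed, and the FAIL lines point to the relevant section below. The reboot question is left out because it needs a human judgment call.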

Problem: Tunnel Won't Come Up

Symptoms

  • ip tunnel show shows no tunnel interface
  • Cannot create tunnel with ip tunnel add
  • Error: "RTNETLINK answers: File exists" or similar

Diagnosis

Step 1: Check if GRE module is loaded

lsmod | grep gre

Expected output (module sizes and use counts vary by kernel):

ip_gre                 28672  0
gre                    16384  1 ip_gre

If not loaded:

modprobe ip_gre

Step 2: Check if tunnel already exists

ip tunnel show

If tunnel exists but is down:

ip link set gre-tx1 up  # or gre-nc1, gre-hub as appropriate

Step 3: Verify remote endpoint is reachable

ping 38.68.14.26  # TX1 physical IP
ping 216.239.104.130  # NC1 physical IP

If physical IPs aren't reachable, the GRE tunnel can't form.

Solution

Delete and recreate tunnel:

# If tunnel exists
ip link set gre-tx1 down
ip tunnel del gre-tx1

# Recreate
ip tunnel add gre-tx1 mode gre remote 38.68.14.26 local 63.143.34.217 ttl 255
ip addr add 10.0.1.1/30 dev gre-tx1
ip link set gre-tx1 up

# Test
ping 10.0.1.2
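The delete-and-recreate steps can be wrapped so they are safe to re-run whether or not the tunnel currently exists. This is a sketch under stated assumptions: it runs as root, `recreate_gre` and `fw_run` are hypothetical names, and `DRY_RUN=1` only prints the commands so you can review them before applying.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete (if present) and recreate one GRE tunnel end.
# Usage: recreate_gre NAME REMOTE_IP LOCAL_IP TUNNEL_ADDR/PREFIX
# DRY_RUN=1 prints the commands instead of executing them.

# fw_run CMD...: execute CMD, or just print it when DRY_RUN=1.
fw_run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

recreate_gre() {
  local name="$1" remote="$2" local_ip="$3" addr="$4"
  # Remove the old interface only if it exists, so the add below can't hit "File exists".
  if ip link show "$name" >/dev/null 2>&1; then
    fw_run ip link set "$name" down
    fw_run ip tunnel del "$name"
  fi
  fw_run ip tunnel add "$name" mode gre remote "$remote" local "$local_ip" ttl 255 || return 1
  fw_run ip addr add "$addr" dev "$name"
  fw_run ip link set "$name" up
}
```

On Command Center: `recreate_gre gre-tx1 38.68.14.26 63.143.34.217 10.0.1.1/30`. The matching call on TX1 would be `recreate_gre gre-hub 63.143.34.217 38.68.14.26 10.0.1.2/30`.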

Problem: Can't Ping Tunnel IP

Symptoms

  • Tunnel shows as "UP" in ip link show
  • ping 10.0.1.2 times out or fails
  • No response from remote tunnel endpoint

Diagnosis

Step 1: Verify tunnel interface is actually up

ip link show gre-tx1

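Look for UP in the flags, e.g. <POINTOPOINT,NOARP,UP,LOWER_UP>. GRE interfaces normally report state UNKNOWN rather than state UP because they have no carrier to detect; that is expected, not a fault.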
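A scripted version of this check is more robust if it looks for the `UP` flag inside the angle brackets instead of relying on the `state` field. Below is a minimal sketch that reads one line of `ip -o link show` output on stdin. `link_flags_up` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the flag list <...> in one line of
# `ip -o link show dev IFACE` output (stdin) contains UP.
link_flags_up() {
  local line flags
  IFS= read -r line || [[ -n $line ]] || return 1   # accept a last line without newline
  [[ $line == *"<"*">"* ]] || return 1              # no flag list found
  flags=${line#*<}                                  # drop everything up to the first '<'
  flags=${flags%%>*}                                # drop everything from the first '>'
  [[ ",$flags," == *",UP,"* ]]                      # exact flag match (LOWER_UP doesn't count)
}
```

Usage: `ip -o link show dev gre-tx1 | link_flags_up && echo up`.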

Step 2: Check if UFW is blocking GRE

ufw status verbose | grep -i gre

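Expected (exact formatting varies by ufw version; you are looking for a rule that allows GRE, IP protocol 47):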

Anywhere                   ALLOW       Anywhere                  # allow GRE
47                         ALLOW       Anywhere

Step 3: Check routing table

ip route show

You should see routes for tunnel IPs:

10.0.1.0/30 dev gre-tx1 proto kernel scope link src 10.0.1.1

Step 4: Check if remote server has tunnel up

# SSH to remote server
ssh root@38.68.14.26

# Check tunnel
ip link show gre-hub
ping 10.0.1.1

Solution

On both Command Center and remote node:

# Restart both ends of the tunnel
# Command Center:
ip link set gre-tx1 down
sleep 2
ip link set gre-tx1 up

# Remote (TX1):
ip link set gre-hub down
sleep 2
ip link set gre-hub up

# Test from both sides
ping 10.0.1.2  # From Command Center
ping 10.0.1.1  # From TX1

If UFW is blocking:

# On Command Center (allow GRE only from the two node IPs)
ufw allow from 38.68.14.26 proto gre
ufw allow from 216.239.104.130 proto gre

# On TX1/NC1
ufw allow from 63.143.34.217 proto gre

Problem: Port Forwarding Not Working

Symptoms

  • Players can't connect to game servers
  • telnet 63.143.34.217 25565 times out or refuses
  • Tunnel is up and pingable, but game traffic doesn't flow

Diagnosis

Step 1: Check if NAT rules exist

iptables -t nat -L PREROUTING -n -v

Expected output (example for port 25565):

Chain PREROUTING (policy ACCEPT)
target     prot opt source      destination
DNAT       tcp  --  0.0.0.0/0   0.0.0.0/0    tcp dpt:25565 to:10.0.1.2:25565

Step 2: Check FORWARD chain

iptables -L FORWARD -n -v

Expected:

ACCEPT     tcp  --  0.0.0.0/0   10.0.1.2     tcp dpt:25565

Step 3: Verify IP forwarding is enabled

cat /proc/sys/net/ipv4/ip_forward

Expected: 1

Step 4: Test from Command Center itself

# From Command Center, test connection to tunnel IP
telnet 10.0.1.2 25565

If this works but external connections don't, the problem is in Command Center's NAT or forwarding rules.
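`telnet` is often missing on minimal servers. Bash can open a TCP connection itself through its `/dev/tcp/HOST/PORT` redirection, which gives a quick, scriptable port test. This sketch assumes bash was built with network redirections (most distribution builds are) and that coreutils `timeout` is available. `tcp_port_open` is a hypothetical name. It only tests TCP, not the UDP rules.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens within
# SECONDS (default 3). Uses bash's /dev/tcp redirection, so no telnet/nc needed.
tcp_port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  local numeric='^[0-9]+$'
  if [[ ! $port =~ $numeric ]]; then
    echo "bad port: $port" >&2
    return 2
  fi
  # A child bash opens fd 3 on /dev/tcp/HOST/PORT; it exits 0 only if the connect succeeds.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

Run `tcp_port_open 10.0.1.2 25565 && echo open || echo closed or filtered` from Command Center against the tunnel IP, then from an outside machine against 63.143.34.217. If only the second test fails, the problem is on the NAT side rather than the game server.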

Step 5: Check if game server is actually listening

# SSH to TX1
ssh root@38.68.14.26

# Check if Minecraft is listening
netstat -tuln | grep 25565

Expected:

tcp6       0      0 :::25565                :::*                    LISTEN

Solution

Add missing NAT rules:

# On Command Center
iptables -t nat -A PREROUTING -p tcp --dport 25565 -j DNAT --to-destination 10.0.1.2:25565
iptables -t nat -A PREROUTING -p udp --dport 25565 -j DNAT --to-destination 10.0.1.2:25565
iptables -A FORWARD -p tcp -d 10.0.1.2 --dport 25565 -j ACCEPT
iptables -A FORWARD -p udp -d 10.0.1.2 --dport 25565 -j ACCEPT

# Add masquerading if not present
iptables -t nat -A POSTROUTING -o gre-tx1 -j MASQUERADE

# Save rules (restored at boot if the iptables-persistent package is installed)
iptables-save > /etc/iptables/rules.v4
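Appending with `-A` every time the fix is applied stacks duplicate rules. The following is a sketch of an idempotent version. It assumes iptables 1.4.11 or newer, which provides `-C` to check whether an identical rule already exists. `add_rule` and `forward_game_port` are hypothetical names, and `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from deployment-plan.md): add NAT/forwarding
# rules only if an identical rule is not already present.
# Requires root and iptables >= 1.4.11 for the -C (check) option.

# add_rule TABLE CHAIN RULESPEC...
add_rule() {
  local table="$1" chain="$2"; shift 2
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "iptables -t $table -A $chain $*"        # print only, no -C check
  elif ! iptables -t "$table" -C "$chain" "$@" 2>/dev/null; then
    iptables -t "$table" -A "$chain" "$@"         # append only when missing
  fi
}

# forward_game_port PORT DEST_IP: DNAT + FORWARD rules for both TCP and UDP.
forward_game_port() {
  local port="$1" dest="$2" proto
  for proto in tcp udp; do
    add_rule nat PREROUTING -p "$proto" --dport "$port" -j DNAT --to-destination "$dest:$port"
    add_rule filter FORWARD -p "$proto" -d "$dest" --dport "$port" -j ACCEPT
  done
}
```

Usage: run `forward_game_port 25565 10.0.1.2` for each port listed in ip-hierarchy.md. Then run `add_rule nat POSTROUTING -o gre-tx1 -j MASQUERADE` and the `iptables-save` step.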

Enable IP forwarding if disabled:

echo 1 > /proc/sys/net/ipv4/ip_forward
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
sysctl -p
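The `>>` append above adds another copy of the line to /etc/sysctl.conf every time it is run. Below is a small sketch that appends only when the exact line is missing, plus a check for the live kernel setting. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the IP-forwarding fix.

# ensure_line LINE FILE: append LINE to FILE only if that exact line is absent.
ensure_line() {
  local line="$1" file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >>"$file"
}

# ip_forward_enabled [FILE]: succeed if FILE contains 1
# (defaults to the live kernel setting).
ip_forward_enabled() {
  local file="${1:-/proc/sys/net/ipv4/ip_forward}" value
  [[ -r $file ]] || return 1
  value=$(<"$file")            # command substitution strips the trailing newline
  [[ $value == 1 ]]
}
```

Usage: `ensure_line 'net.ipv4.ip_forward=1' /etc/sysctl.conf && sysctl -p`, then `ip_forward_enabled && echo "forwarding on"`.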

Problem: Tunnel Works But Breaks After Reboot

Symptoms

  • Tunnel works fine until server reboots
  • After reboot, tunnel doesn't come back up
  • Must manually recreate tunnel every time

Diagnosis

Check if persistence script exists:

ls -la /etc/network/if-up.d/frostwall-*

Check if it's executable:

ls -la /etc/network/if-up.d/frostwall-tunnels

Should show: -rwxr-xr-x

Check if script is being called on boot:

# Check recent boot logs
journalctl -b | grep frostwall
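Two common reasons an `/etc/network/if-up.d/` hook never runs are easy to miss. First, `run-parts`, which ifupdown uses to call these hooks, silently skips files whose names contain anything other than letters, digits, underscores and hyphens, so a file named `frostwall-tunnels.sh` would be ignored. Second, a file without execute permission or with a syntax error fails without obvious output. Below is a sketch of a checker. `check_hook` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report common reasons an if-up.d hook never runs.
# check_hook FILE -> prints "ok: FILE" or one line per problem; non-zero on problems.
check_hook() {
  local f="$1" name problems=0
  local valid_name='^[A-Za-z0-9_-]+$'   # the only names run-parts will execute
  [[ -e $f ]] || { echo "missing: $f"; return 1; }
  name=$(basename -- "$f")
  if [[ ! $name =~ $valid_name ]]; then
    echo "bad name (run-parts skips it): $name"; problems=1
  fi
  if [[ ! -x $f ]]; then
    echo "not executable: $f"; problems=1
  fi
  if ! head -n 1 -- "$f" | grep -q '^#!'; then
    echo "no #! line: $f"; problems=1
  fi
  if ! bash -n -- "$f" 2>/dev/null; then   # parse only, nothing is executed
    echo "syntax error (bash -n): $f"; problems=1
  fi
  (( problems == 0 )) && echo "ok: $f"
  return "$problems"
}
```

Usage: `check_hook /etc/network/if-up.d/frostwall-tunnels`.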

Solution

Create or fix persistence script:

See deployment-plan.md Phase 1.3 for full scripts.

Make sure it's executable:

chmod +x /etc/network/if-up.d/frostwall-tunnels

Test the script manually:

# Bring tunnel down first
ip link set gre-tx1 down

# Run the script
/etc/network/if-up.d/frostwall-tunnels

# Check if tunnel came back up
ip tunnel show
ping 10.0.1.2

Alternative: Use systemd service

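If if-up.d hooks don't work (for example, on systems where netplan/systemd-networkd manages networking instead of ifupdown, so these hooks are never run), create a systemd service: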

/etc/systemd/system/frostwall-tunnels.service:

[Unit]
Description=Frostwall GRE Tunnels
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
ExecStart=/etc/network/if-up.d/frostwall-tunnels
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

Then reload systemd and enable the service:

systemctl daemon-reload
systemctl enable frostwall-tunnels
systemctl start frostwall-tunnels

Problem: Self-Healing Monitor Not Running

Symptoms

  • Tunnels go down and don't auto-recover
  • No entries in /var/log/frostwall-monitor.log
  • Cron job not running

Diagnosis

Check if cron job is scheduled:

crontab -l | grep frostwall

Expected:

*/5 * * * * /usr/local/bin/frostwall-monitor.sh

Check that the script exists and is executable (the permissions should include x, e.g. -rwxr-xr-x):

ls -la /usr/local/bin/frostwall-monitor.sh

Run script manually to test:

/usr/local/bin/frostwall-monitor.sh

Check logs:

cat /var/log/frostwall-monitor.log

Solution

Add cron job if missing:

crontab -e

Add:

*/5 * * * * /usr/local/bin/frostwall-monitor.sh

Fix script permissions:

chmod +x /usr/local/bin/frostwall-monitor.sh

Create log file if it doesn't exist:

touch /var/log/frostwall-monitor.log
chmod 644 /var/log/frostwall-monitor.log

Test the monitor:

# Bring down a tunnel manually
ip link set gre-tx1 down

# Wait for the next cron run (up to 5 minutes with */5)
sleep 300

# Check if it auto-recovered
ping 10.0.1.2

# Check logs
tail /var/log/frostwall-monitor.log
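For reference while reading the real script, here is a minimal sketch of what a single self-healing check can look like. It is not the deployed `frostwall-monitor.sh` (see deployment-plan.md for that). It assumes it runs as root and logs to the same file as the monitor. `heal_tunnel` is a hypothetical name, and `PING`, `LOG` and `DRY_RUN` are overridable so the logic can be tested without touching a real interface.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a single self-healing check. This is NOT the
# deployed frostwall-monitor.sh; it only illustrates the probe/restart/log flow.
# Overridable: PING (probe command), LOG (log file), DRY_RUN=1 (print, don't act).
heal_tunnel() {
  local iface="$1" peer="$2"
  local ping_cmd="${PING:-ping}" log="${LOG:-/var/log/frostwall-monitor.log}"

  # Healthy tunnel: nothing to do, nothing logged.
  if "$ping_cmd" -c 3 -W 2 "$peer" >/dev/null 2>&1; then
    return 0
  fi

  printf '%s %s: peer %s unreachable, restarting interface\n' \
    "$(date '+%F %T')" "$iface" "$peer" >>"$log"

  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "ip link set $iface down && sleep 2 && ip link set $iface up"
  else
    ip link set "$iface" down
    sleep 2
    ip link set "$iface" up
  fi

  # Probe again and record the outcome.
  if "$ping_cmd" -c 3 -W 2 "$peer" >/dev/null 2>&1; then
    printf '%s %s: recovered\n' "$(date '+%F %T')" "$iface" >>"$log"
  else
    printf '%s %s: still down after restart\n' "$(date '+%F %T')" "$iface" >>"$log"
    return 1
  fi
}
```

Usage on Command Center: `heal_tunnel gre-tx1 10.0.1.2; heal_tunnel gre-nc1 10.0.2.2`. If a run could ever take longer than the cron interval, util-linux `flock` keeps runs from overlapping, e.g. `*/5 * * * * flock -n /run/frostwall-monitor.lock /usr/local/bin/frostwall-monitor.sh`.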

Problem: High Latency or Packet Loss Through Tunnel

Symptoms

  • Players experience lag
  • ping 10.0.1.2 shows high latency or packet loss
  • Game connections are unstable

Diagnosis

Test latency:

# From Command Center
ping -c 100 10.0.1.2 | tail -5

Look for:

  • Average latency (should be <2ms for TX1, ~30-40ms for NC1)
  • Packet loss (should be 0%)
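Both numbers can be pulled out of the ping summary automatically, which is handy for the daily health check. The sketch below assumes the iputils `ping` summary format (`N% packet loss` and `rtt min/avg/max/mdev = ...`). The function names are hypothetical, and the thresholds in the usage line come from the targets above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: summarize `ping` output read on stdin.

# ping_summary: prints "loss=<percent> avg_ms=<ms>" ("n/a" if a value is missing).
ping_summary() {
  awk '
    match($0, /[0-9.]+% packet loss/) {
      loss = substr($0, RSTART, RLENGTH)
      sub(/% packet loss/, "", loss)
    }
    /^(rtt|round-trip) / {
      split($4, t, "/")        # min/avg/max[/mdev]
      avg = t[2]
    }
    END {
      if (loss == "") loss = "n/a"
      if (avg == "") avg = "n/a"
      printf "loss=%s avg_ms=%s\n", loss, avg
    }'
}

# link_quality_ok MAX_AVG_MS: succeed only if loss is 0% and average RTT <= MAX_AVG_MS.
link_quality_ok() {
  local max_avg="$1" summary loss avg
  summary=$(ping_summary)
  loss=${summary#loss=}; loss=${loss%% *}
  avg=${summary##*avg_ms=}
  [[ $loss == 0 && $avg != n/a ]] || return 1
  awk -v a="$avg" -v m="$max_avg" 'BEGIN { exit !(a + 0 <= m + 0) }'   # float compare
}
```

Usage: `ping -c 100 10.0.1.2 | link_quality_ok 2` for TX1, or `ping -c 100 10.0.2.2 | link_quality_ok 40` for NC1.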

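Test MTU size:

# Physical path: 1472 = 1500 (standard Ethernet MTU) - 28 bytes of IP + ICMP headers
ping -c 3 -M do -s 1472 38.68.14.26
# Through the tunnel: GRE adds 24 bytes, so the default tunnel MTU is 1476
# and the largest unfragmented payload is 1476 - 28 = 1448
ping -c 3 -M do -s 1448 10.0.1.2

A 1472-byte payload can never pass through the tunnel unfragmented, so don't use it against the tunnel IP. If the physical-path test fails, the provider's path MTU is below 1500 and the tunnel MTU must be lowered to match. If the tunnel test fails but smaller payloads succeed, MTU is the issue.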
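Instead of trying sizes by hand, a binary search can find the largest payload that passes with the Don't Fragment bit set. The sketch below uses hypothetical function names, and `PING` can be overridden for testing. The byte counts in the comments (28 bytes of IPv4 + ICMP headers, 24 bytes of GRE-over-IPv4 overhead without keys or checksums) are the standard values.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for MTU troubleshooting.

# find_max_payload HOST [LO] [HI]: largest ICMP payload in [LO, HI] that passes
# with DF set (ping -M do); prints 0 if none does. Path MTU = payload + 28.
find_max_payload() {
  local host="$1" lo="${2:-1000}" hi="${3:-1472}" mid best=0
  local ping_cmd="${PING:-ping}"
  while (( lo <= hi )); do
    mid=$(( (lo + hi) / 2 ))
    if "$ping_cmd" -c 1 -W 1 -M do -s "$mid" "$host" >/dev/null 2>&1; then
      best=$mid; lo=$(( mid + 1 ))   # fits: try larger
    else
      hi=$(( mid - 1 ))              # too big (or lost): try smaller
    fi
  done
  echo "$best"
}

# gre_mtu_for_path PATH_MTU: GRE over IPv4 adds a 20-byte outer IP header
# plus a 4-byte basic GRE header.
gre_mtu_for_path() {
  echo $(( $1 - 24 ))
}
```

`find_max_payload 38.68.14.26` measures the physical path. Add 28 to the result to get the path MTU, then pass that value to `gre_mtu_for_path`. A clean 1500-byte path gives 1476, the Linux default for GRE devices. Each failed probe waits up to a second, so a full search takes roughly ten seconds.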

Check CPU load:

top

High CPU on either end of tunnel could cause performance issues.

Check bandwidth:

# Install iperf3 if not present
apt install iperf3

# On TX1:
iperf3 -s

# On Command Center:
iperf3 -c 10.0.1.2

Solution

Adjust MTU if needed (use the same value on both ends; on TX1/NC1 the interface is gre-hub):

# On tunnel interface
ip link set gre-tx1 mtu 1400

Add to persistence script:

# In /etc/network/if-up.d/frostwall-tunnels
ip link set gre-tx1 mtu 1400

If packet loss persists:

  • Check physical network between nodes
  • Contact the datacenter if problems persist
  • Verify no other services saturating bandwidth

Problem: UFW Blocking Legitimate Traffic

Symptoms

  • Can't SSH to server
  • Specific ports not working despite NAT rules
  • Connection refused or timeout

Diagnosis

Check UFW status:

ufw status verbose

Check UFW logs:

tail -100 /var/log/ufw.log

Look for [UFW BLOCK] entries for the port or IP you're trying to reach.
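On a busy server the log is long, so a per-port summary of blocked packets makes the culprit stand out. The sketch below assumes the standard kernel log format that UFW writes (`[UFW BLOCK] ... PROTO=TCP ... DPT=25565`). `ufw_block_summary` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count [UFW BLOCK] entries per protocol/destination port.
# Reads the files given as arguments, or stdin if none; most-blocked first.
ufw_block_summary() {
  awk '
    /\[UFW BLOCK\]/ {
      proto = "?"; dpt = "-"            # ICMP entries have no DPT field
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^PROTO=/)    proto = substr($i, 7)
        else if ($i ~ /^DPT=/) dpt = substr($i, 5)
      }
      count[proto "/" dpt]++
    }
    END { for (k in count) printf "%d %s\n", count[k], k }
  ' "$@" | sort -rn
}
```

Usage: `ufw_block_summary /var/log/ufw.log | head` lists the most-blocked protocol/port pairs first.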

Test with UFW temporarily disabled:

ufw disable
# Try connection
# Re-enable immediately
ufw enable

⚠️ WARNING: Only disable UFW for brief testing, re-enable immediately.

Solution

Add specific rule for your management IP:

ufw allow from MANAGEMENT_IP to any port 22 proto tcp

Allow traffic on tunnel interfaces:

ufw allow in on gre-tx1
ufw allow in on gre-nc1
ufw allow in on gre-hub  # On TX1/NC1

Check rule order:

ufw status numbered

Rules are processed in order - make sure allow rules come before deny rules.

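Delete and re-add rules if needed (rule numbers shift after each deletion, so re-run ufw status numbered before deleting another):

# Delete rule by number
ufw delete 5

# Re-add in correct order
ufw insert 1 allow from MANAGEMENT_IP to any port 22 proto tcp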

Emergency Recovery Procedures

Complete Tunnel Failure

If all troubleshooting fails, rebuild from scratch:

# On Command Center
ip link set gre-tx1 down
ip link set gre-nc1 down
ip tunnel del gre-tx1
ip tunnel del gre-nc1

# On TX1
ip link set gre-hub down
ip tunnel del gre-hub

# On NC1
ip link set gre-hub down
ip tunnel del gre-hub

# Then follow deployment-plan.md Phase 1 to rebuild

Lost SSH Access

If locked out due to UFW misconfiguration:

  1. Access server via provider's console (IPMI, VNC, etc.)
  2. Log in as root
  3. Disable UFW: ufw disable
  4. Fix rules, re-enable carefully
  5. Test SSH before closing console session

Complete Frostwall Removal (Rollback)

If you need to remove Frostwall entirely:

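# Stop monitoring
crontab -e  # Remove frostwall-monitor line

# Remove tunnels
ip link set gre-tx1 down
ip link set gre-nc1 down
ip tunnel del gre-tx1
ip tunnel del gre-nc1
# (On TX1 and NC1, remove gre-hub the same way)

# Reset UFW BEFORE flushing iptables. Flushing the filter table while UFW is
# active leaves UFW's DROP default policy in place with no ACCEPT rules,
# which cuts off SSH. Run this from the provider console if possible.
ufw --force reset
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT

# Remove NAT and forwarding rules
iptables -t nat -F
iptables -F

# Persist the cleared ruleset so the old rules aren't restored at boot
iptables-save > /etc/iptables/rules.v4

# Re-add basic UFW rules (SSH first), then: ufw enable

# Remove persistence scripts
rm /etc/network/if-up.d/frostwall-*
rm /usr/local/bin/frostwall-monitor.sh
# If the systemd alternative was used:
systemctl disable --now frostwall-tunnels
rm /etc/systemd/system/frostwall-tunnels.service

# Update DNS to point directly to server IPs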

Common Error Messages

"RTNETLINK answers: File exists"

Meaning: Tunnel with that name already exists

Solution:

ip tunnel del gre-tx1  # Delete existing
# Then recreate

Meaning: Can't reach remote endpoint

Solution:

  • Verify remote IP is correct
  • Check if physical network to remote is up
  • Ping remote physical IP

"GRE: DF set but fragmentation needed"

Meaning: MTU mismatch, packet too large

Solution:

ip link set gre-tx1 mtu 1400

"Operation not permitted"

Meaning: Not running as root or module not loaded

Solution:

sudo su  # Become root
modprobe ip_gre  # Load module

Monitoring and Health Checks

Daily health check commands:

# Check all tunnels are up
ip tunnel show

# Ping all tunnel endpoints
ping -c 4 10.0.1.2
ping -c 4 10.0.2.2

# Check monitor log
tail -20 /var/log/frostwall-monitor.log

# Verify NAT rules
iptables -t nat -L -n -v | head -20

Set up alerts (optional):

# Add to monitor script to send email on failure
# Requires mail configured
echo "Tunnel failure detected" | mail -s "ALERT: Frostwall Tunnel Down" admin@firefrostgaming.com

Getting Help

If none of these troubleshooting steps resolve your issue:

  1. Gather diagnostics:

    ip tunnel show > /tmp/frostwall-diag.txt
    ip addr show >> /tmp/frostwall-diag.txt
    ip route show >> /tmp/frostwall-diag.txt
    iptables -t nat -L -n -v >> /tmp/frostwall-diag.txt
    ufw status verbose >> /tmp/frostwall-diag.txt
    tail -100 /var/log/frostwall-monitor.log >> /tmp/frostwall-diag.txt
    
  2. Document symptoms:

    • What were you trying to do?
    • What happened instead?
    • When did it start?
    • What changed recently?
  3. Check documentation:

    • Review deployment-plan.md
    • Review ip-hierarchy.md
  4. Ask The Chronicler (future Claude session) with full diagnostics
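The diagnostic commands in step 1 can be collected in one pass, with a header before each section, stderr included, and a note whenever a command fails. This makes the report easier for someone else to read. Below is a sketch. `diag_section` and `collect_frostwall_diag` are hypothetical names, and the extra sections (date, hostname, FORWARD chain, ip_forward) are additions to the original list.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: labelled diagnostics report for Frostwall issues.

# diag_section LABEL COMMAND [ARGS...]: append one section to the file in $DIAG_OUT.
diag_section() {
  local label="$1"; shift
  local rc
  {
    printf '===== %s =====\n' "$label"
    "$@" 2>&1                                  # capture errors too
    rc=$?
    if (( rc != 0 )); then printf '(exit status %d)\n' "$rc"; fi
    echo
  } >>"$DIAG_OUT"
}

# collect_frostwall_diag [OUTFILE]: write a fresh report (default /tmp/frostwall-diag.txt).
collect_frostwall_diag() {
  local DIAG_OUT="${1:-/tmp/frostwall-diag.txt}"   # visible to diag_section (bash dynamic scope)
  : >"$DIAG_OUT"                                    # start with an empty file
  diag_section "date (UTC)" date -u
  diag_section "hostname" hostname
  diag_section "ip tunnel show" ip tunnel show
  diag_section "ip addr show" ip addr show
  diag_section "ip route show" ip route show
  diag_section "iptables -t nat -L -n -v" iptables -t nat -L -n -v
  diag_section "iptables -L FORWARD -n -v" iptables -L FORWARD -n -v
  diag_section "ufw status verbose" ufw status verbose
  diag_section "net.ipv4.ip_forward" cat /proc/sys/net/ipv4/ip_forward
  diag_section "frostwall-monitor.log (last 100 lines)" tail -n 100 /var/log/frostwall-monitor.log
  echo "Diagnostics written to $DIAG_OUT"
}
```

Run `collect_frostwall_diag` as root on each affected node and attach the resulting files.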


Fire + Frost + Foundation = Where Love Builds Legacy 💙🔥❄️


Document Status: TROUBLESHOOTING GUIDE
Update When: New issues discovered, solutions found, error messages encountered