# Frostwall Protocol - Troubleshooting Guide

**Purpose:** Diagnose and resolve common Frostwall issues  
**Last Updated:** 2026-02-17  
**Status:** Ready for use

---

## Quick Diagnostics Checklist

When something's not working, run through this checklist first:

- [ ] Are GRE tunnels up? (`ip tunnel show`)
- [ ] Can you ping tunnel endpoints? (`ping 10.0.1.2`, `ping 10.0.2.2`)
- [ ] Is UFW blocking necessary traffic? (`ufw status verbose`)
- [ ] Are NAT rules present? (`iptables -t nat -L -n -v`)
- [ ] Is IP forwarding enabled? (`cat /proc/sys/net/ipv4/ip_forward`)
- [ ] Is the self-healing monitor running? (`crontab -l`)
- [ ] Did the server reboot recently? (tunnels may need manual restart)

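
The checklist above can be scripted so that every item is tried even when an earlier one fails. This is a minimal sketch, not part of the Frostwall scripts: `run_check` and `frostwall_quick_check` are hypothetical names, and the interface names and tunnel IPs are the ones used elsewhere in this guide. Run `frostwall_quick_check` as root on the Command Center.

```bash
#!/bin/sh
# Sketch only: run_check and frostwall_quick_check are hypothetical helpers.

# Run a command quietly and print PASS or FAIL next to a label.
run_check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "[PASS] $label"
    else
        echo "[FAIL] $label"
        return 1
    fi
}

# Walk the checklist; returns non-zero if any check failed.
frostwall_quick_check() {
    failures=0
    run_check "gre-tx1 exists" ip link show gre-tx1 || failures=$((failures + 1))
    run_check "gre-nc1 exists" ip link show gre-nc1 || failures=$((failures + 1))
    run_check "TX1 tunnel IP answers" ping -c 1 -W 2 10.0.1.2 || failures=$((failures + 1))
    run_check "NC1 tunnel IP answers" ping -c 1 -W 2 10.0.2.2 || failures=$((failures + 1))
    run_check "DNAT rules present" \
        sh -c 'iptables -t nat -S PREROUTING | grep -q DNAT' || failures=$((failures + 1))
    run_check "IP forwarding enabled" \
        sh -c '[ "$(cat /proc/sys/net/ipv4/ip_forward)" = 1 ]' || failures=$((failures + 1))
    run_check "monitor in crontab" \
        sh -c 'crontab -l | grep -q frostwall-monitor' || failures=$((failures + 1))
    [ "$failures" -eq 0 ]
}
```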
---

## Problem: Tunnel Won't Come Up

### Symptoms
- `ip tunnel show` shows no tunnel interface
- Cannot create tunnel with `ip tunnel add`
- Error: "RTNETLINK answers: File exists" or similar

### Diagnosis

**Step 1: Check if GRE module is loaded**
```bash
lsmod | grep gre
```

**Expected output (sizes vary by kernel):**
```
ip_gre                 28672  0
gre                    16384  1 ip_gre
```

**If not loaded:**
```bash
modprobe ip_gre
```

**Step 2: Check if tunnel already exists**
```bash
ip tunnel show
```

**If tunnel exists but is down:**
```bash
ip link set gre-tx1 up # or gre-nc1, gre-hub as appropriate
```

**Step 3: Verify remote endpoint is reachable**
```bash
ping -c 4 38.68.14.26 # TX1 physical IP
ping -c 4 216.239.104.130 # NC1 physical IP
```

If the physical IPs aren't reachable, GRE traffic can't reach the remote end either, so fix the underlying network first.

### Solution

**Delete and recreate tunnel:**
```bash
# If tunnel exists
ip link set gre-tx1 down
ip tunnel del gre-tx1

# Recreate
ip tunnel add gre-tx1 mode gre remote 38.68.14.26 local 63.143.34.217 ttl 255
ip addr add 10.0.1.1/30 dev gre-tx1
ip link set gre-tx1 up

# Test
ping -c 4 10.0.1.2
```

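
The delete-and-recreate sequence above can be wrapped so it is safe to run whether or not the tunnel already exists. This is a sketch under assumptions: `recreate_tunnel` is a hypothetical helper (it is not the script from deployment-plan.md), and it only uses the `ip` subcommands already shown in this section.

```bash
#!/bin/sh
# Sketch only: recreate_tunnel is a hypothetical helper.
# Usage: recreate_tunnel NAME REMOTE_IP LOCAL_IP TUNNEL_CIDR
recreate_tunnel() {
    name=$1
    remote=$2
    local_ip=$3
    cidr=$4

    # Remove a stale tunnel first so "File exists" cannot occur.
    if ip link show "$name" >/dev/null 2>&1; then
        ip link set "$name" down
        ip tunnel del "$name"
    fi

    # Stop at the first failing step so errors are not masked.
    ip tunnel add "$name" mode gre remote "$remote" local "$local_ip" ttl 255 &&
        ip addr add "$cidr" dev "$name" &&
        ip link set "$name" up
}

# Example (as root, on the Command Center):
#   recreate_tunnel gre-tx1 38.68.14.26 63.143.34.217 10.0.1.1/30
```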
---

## Problem: Can't Ping Tunnel IP

### Symptoms
- Tunnel shows as "UP" in `ip link show`
- `ping 10.0.1.2` times out or fails
- No response from remote tunnel endpoint

### Diagnosis

**Step 1: Verify tunnel interface is actually up**
```bash
ip link show gre-tx1
```

Look for `UP` and `LOWER_UP` inside the angle-bracket flags. GRE interfaces commonly report `state UNKNOWN` even when they are working, so don't rely on `state UP` alone.

**Step 2: Check if UFW is blocking GRE**
```bash
ufw status verbose | grep -i gre
```

**Expected:** at least one ALLOW rule for GRE (IP protocol 47). The exact formatting depends on how the rule was added, for example:
```
Anywhere ALLOW Anywhere # allow GRE
47 ALLOW Anywhere
```

**Step 3: Check routing table**
```bash
ip route show
```

You should see routes for tunnel IPs:
```
10.0.1.0/30 dev gre-tx1 proto kernel scope link src 10.0.1.1
```

**Step 4: Check if remote server has tunnel up**
```bash
# SSH to remote server
ssh root@38.68.14.26

# Check tunnel
ip link show gre-hub
ping -c 4 10.0.1.1
```

### Solution

**On both Command Center and remote node:**

```bash
# Restart both ends of the tunnel
# Command Center:
ip link set gre-tx1 down
sleep 2
ip link set gre-tx1 up

# Remote (TX1):
ip link set gre-hub down
sleep 2
ip link set gre-hub up

# Test from both sides
ping -c 4 10.0.1.2 # From Command Center
ping -c 4 10.0.1.1 # From TX1
```

**If UFW is blocking:**
```bash
# On Command Center (one rule per remote node; UFW's full syntax needs a from/to clause)
ufw allow proto gre from 38.68.14.26
ufw allow proto gre from 216.239.104.130

# On TX1/NC1
ufw allow proto gre from 63.143.34.217
```

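
The interface check in Step 1 can be automated by reading the flag list rather than the `state` field. This is a small sketch, assuming the usual `ip link show` layout where the flags appear between `<` and `>` on the first line; `link_flags_up` is a hypothetical helper name.

```bash
#!/bin/sh
# Sketch only: link_flags_up is a hypothetical helper.
# Reads `ip link show IFACE` output on stdin and succeeds when the flag
# list contains UP as its own item (LOWER_UP alone does not count).
link_flags_up() {
    sed -n '1s/^[^<]*<\([^>]*\)>.*$/\1/p' | tr ',' '\n' | grep -qx 'UP'
}

# Example:
#   ip link show gre-tx1 | link_flags_up && echo "gre-tx1 is administratively up"
```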
---

## Problem: Port Forwarding Not Working

### Symptoms
- Players can't connect to game servers
- `telnet 63.143.34.217 25565` times out or refuses
- Tunnel is up and pingable, but game traffic doesn't flow

### Diagnosis

**Step 1: Check if NAT rules exist**
```bash
iptables -t nat -L PREROUTING -n -v
```

**Expected output (example for port 25565; `-v` adds packet and byte counter columns):**
```
Chain PREROUTING (policy ACCEPT)
target     prot opt source       destination
DNAT       tcp  --  0.0.0.0/0    0.0.0.0/0    tcp dpt:25565 to:10.0.1.2:25565
```

**Step 2: Check FORWARD chain**
```bash
iptables -L FORWARD -n -v
```

**Expected:**
```
ACCEPT     tcp  --  0.0.0.0/0    10.0.1.2     tcp dpt:25565
```

**Step 3: Verify IP forwarding is enabled**
```bash
cat /proc/sys/net/ipv4/ip_forward
```

**Expected:** `1`

**Step 4: Test from Command Center itself**
```bash
# From Command Center, test connection to tunnel IP
telnet 10.0.1.2 25565
```

If this works but external connections don't, the problem is in the NAT or forwarding rules on the Command Center.

**Step 5: Check if game server is actually listening**
```bash
# SSH to TX1
ssh root@38.68.14.26

# Check if Minecraft is listening (use `ss -tuln` if netstat is not installed)
netstat -tuln | grep 25565
```

**Expected:**
```
tcp6 0 0 :::25565 :::* LISTEN
```

### Solution

**Add missing NAT rules:**
```bash
# On Command Center
# Note: -A appends without checking for duplicates, so only add rules missing in Steps 1-2
iptables -t nat -A PREROUTING -p tcp --dport 25565 -j DNAT --to-destination 10.0.1.2:25565
iptables -t nat -A PREROUTING -p udp --dport 25565 -j DNAT --to-destination 10.0.1.2:25565
iptables -A FORWARD -p tcp -d 10.0.1.2 --dport 25565 -j ACCEPT
iptables -A FORWARD -p udp -d 10.0.1.2 --dport 25565 -j ACCEPT

# Add masquerading if not present
iptables -t nat -A POSTROUTING -o gre-tx1 -j MASQUERADE

# Save rules (/etc/iptables/rules.v4 is the file read by the iptables-persistent package)
iptables-save > /etc/iptables/rules.v4
```

**Enable IP forwarding if disabled:**
```bash
echo 1 > /proc/sys/net/ipv4/ip_forward
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
sysctl -p
```

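
Because `iptables -A` appends a rule even when an identical one is already present, re-running the solution above can leave duplicates. One way to avoid that is to test with `iptables -C` (check) first. The sketch below assumes that approach; `ensure_rule` is a hypothetical helper, not something the Frostwall scripts are known to provide.

```bash
#!/bin/sh
# Sketch only: ensure_rule is a hypothetical helper.
# Usage: ensure_rule TABLE CHAIN RULE-SPEC...
ensure_rule() {
    table=$1
    chain=$2
    shift 2
    # -C exits 0 when an identical rule already exists in the chain.
    if iptables -t "$table" -C "$chain" "$@" 2>/dev/null; then
        echo "exists: $table/$chain $*"
    else
        iptables -t "$table" -A "$chain" "$@" && echo "added: $table/$chain $*"
    fi
}

# Example (as root, on the Command Center):
#   ensure_rule nat PREROUTING -p tcp --dport 25565 -j DNAT --to-destination 10.0.1.2:25565
#   ensure_rule filter FORWARD -p tcp -d 10.0.1.2 --dport 25565 -j ACCEPT
#   ensure_rule nat POSTROUTING -o gre-tx1 -j MASQUERADE
```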
---

## Problem: Tunnel Works But Breaks After Reboot

### Symptoms
- Tunnel works fine until server reboots
- After reboot, tunnel doesn't come back up
- Must manually recreate tunnel every time

### Diagnosis

**Check if persistence script exists:**
```bash
ls -la /etc/network/if-up.d/frostwall-*
```

**Check if it's executable:**
```bash
ls -la /etc/network/if-up.d/frostwall-tunnels
```

Should show: `-rwxr-xr-x`

**Check if script is being called on boot:**
```bash
# Check recent boot logs
journalctl -b | grep frostwall
```

### Solution

**Create or fix persistence script:**

See deployment-plan.md Phase 1.3 for full scripts.

**Make sure it's executable:**
```bash
chmod +x /etc/network/if-up.d/frostwall-tunnels
```

**Test the script manually:**
```bash
# Bring tunnel down first
ip link set gre-tx1 down

# Run the script
/etc/network/if-up.d/frostwall-tunnels

# Check if tunnel came back up
ip tunnel show
ping -c 4 10.0.1.2
```

**Alternative: Use systemd service**

If the if-up.d hooks don't run (for example, on systems that don't use ifupdown), create a systemd service. `network-online.target` delays the unit until the network is configured; plain `network.target` does not guarantee that.

`/etc/systemd/system/frostwall-tunnels.service`:
```
[Unit]
Description=Frostwall GRE Tunnels
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
ExecStart=/etc/network/if-up.d/frostwall-tunnels
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

```bash
systemctl daemon-reload
systemctl enable frostwall-tunnels
systemctl start frostwall-tunnels
```

---

## Problem: Self-Healing Monitor Not Running

### Symptoms
- Tunnels go down and don't auto-recover
- No entries in `/var/log/frostwall-monitor.log`
- Cron job not running

### Diagnosis

**Check if cron job is scheduled:**
```bash
crontab -l | grep frostwall
```

**Expected:**
```
*/5 * * * * /usr/local/bin/frostwall-monitor.sh
```

**Check if script exists:**
```bash
ls -la /usr/local/bin/frostwall-monitor.sh
```

**Check if it's executable:**
```bash
test -x /usr/local/bin/frostwall-monitor.sh && echo "executable" || echo "NOT executable"
```

**Run script manually to test:**
```bash
/usr/local/bin/frostwall-monitor.sh
```

**Check logs:**
```bash
cat /var/log/frostwall-monitor.log
```

### Solution

**Add cron job if missing:**
```bash
crontab -e
```

Add:
```
*/5 * * * * /usr/local/bin/frostwall-monitor.sh
```

**Fix script permissions:**
```bash
chmod +x /usr/local/bin/frostwall-monitor.sh
```

**Create log file if it doesn't exist:**
```bash
touch /var/log/frostwall-monitor.log
chmod 644 /var/log/frostwall-monitor.log
```

**Test the monitor:**
```bash
# Bring down a tunnel manually
ip link set gre-tx1 down

# Wait for the next cron run (up to 5 minutes)
sleep 300

# Check if it auto-recovered
ping -c 4 10.0.1.2

# Check logs
tail /var/log/frostwall-monitor.log
```

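
For orientation, the kind of logic a self-healing monitor performs can be sketched as follows. This is an illustrative stand-in, not the actual `/usr/local/bin/frostwall-monitor.sh` from deployment-plan.md: the function names, log wording, and retry behaviour are all assumptions. It pings the far tunnel IP, and on failure bounces the interface once and records the outcome in the log.

```bash
#!/bin/sh
# Sketch only: a hypothetical stand-in for the real monitor script.
LOG_FILE=${LOG_FILE:-/var/log/frostwall-monitor.log}

# Append a timestamped line to the monitor log.
log_msg() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> "$LOG_FILE"
}

# check_tunnel IFACE PEER_IP
# Returns 0 if the peer answers (possibly after one restart), 1 otherwise.
check_tunnel() {
    iface=$1
    peer=$2
    if ping -c 3 -W 2 "$peer" >/dev/null 2>&1; then
        return 0
    fi
    log_msg "$iface: $peer unreachable, restarting interface"
    ip link set "$iface" down
    sleep 2
    ip link set "$iface" up
    if ping -c 3 -W 2 "$peer" >/dev/null 2>&1; then
        log_msg "$iface: recovered"
    else
        log_msg "$iface: still down after restart"
        return 1
    fi
}

# A cron-driven script would end with calls such as:
#   check_tunnel gre-tx1 10.0.1.2
#   check_tunnel gre-nc1 10.0.2.2
```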
---

## Problem: High Latency or Packet Loss Through Tunnel

### Symptoms
- Players experience lag
- `ping 10.0.1.2` shows high latency or packet loss
- Game connections are unstable

### Diagnosis

**Test latency:**
```bash
# From Command Center
ping -c 100 10.0.1.2 | tail -5
```

Look for:
- Average latency (should be <2ms for TX1, ~30-40ms for NC1)
- Packet loss (should be 0%)

**Test MTU size:**
```bash
# -M do forbids fragmentation; -s is the ICMP payload, 28 bytes less than the packet size
# Sizes assume a 1500-byte underlying link, which leaves 1476 bytes inside GRE (24 bytes of overhead)
ping -c 4 -M do -s 1448 10.0.1.2 # 1476-byte packet: fills the default GRE tunnel MTU
ping -c 4 -M do -s 1372 10.0.1.2 # 1400-byte packet: fits the reduced MTU used below
```

If the larger packets fail but the smaller ones succeed, MTU is the issue. A payload of 1472 (a full 1500-byte Ethernet packet) is expected to fail through the tunnel and does not indicate a fault.

**Check CPU load:**
```bash
top
```

High CPU on either end of tunnel could cause performance issues.

**Check bandwidth:**
```bash
# Install iperf3 if not present
apt install iperf3

# On TX1:
iperf3 -s

# On Command Center:
iperf3 -c 10.0.1.2
```

### Solution

**Adjust MTU if needed:**
```bash
# On tunnel interface
ip link set gre-tx1 mtu 1400
```

**Add to persistence script:**
```bash
# In /etc/network/if-up.d/frostwall-tunnels
ip link set gre-tx1 mtu 1400
```

**If packet loss persists:**
- Check physical network between nodes
- Contact datacenter if persistent issues
- Verify no other services saturating bandwidth

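
The MTU arithmetic behind the `ping -M do -s` tests can be written out explicitly. This is a sketch under stated assumptions: plain GRE over IPv4 with no key or checksum options (20-byte outer IP header plus 4-byte GRE header), and a 28-byte IP + ICMP header on each ping. The function names are hypothetical.

```bash
#!/bin/sh
# Sketch only: hypothetical helpers for the MTU arithmetic.

# Tunnel MTU for a given underlying link MTU (20-byte outer IP + 4-byte GRE).
gre_tunnel_mtu() {
    echo $(($1 - 24))
}

# Largest `ping -s` payload that fits a given MTU (20-byte IP + 8-byte ICMP).
ping_payload_for_mtu() {
    echo $(($1 - 28))
}

# Step down from START to FLOOR in 4-byte steps and print the first payload
# size that gets through PEER without fragmentation.
largest_unfragmented_payload() {
    peer=$1
    size=$2
    floor=$3
    while [ "$size" -ge "$floor" ]; do
        if ping -c 1 -W 2 -M do -s "$size" "$peer" >/dev/null 2>&1; then
            echo "$size"
            return 0
        fi
        size=$((size - 4))
    done
    return 1
}

# Example: largest_unfragmented_payload 10.0.1.2 1448 1200
# Add 28 to the printed value to get a usable tunnel MTU.
```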
---

## Problem: UFW Blocking Legitimate Traffic

### Symptoms
- Can't SSH to server
- Specific ports not working despite NAT rules
- Connection refused or timeout

### Diagnosis

**Check UFW status:**
```bash
ufw status verbose
```

Note the `Default:` line: it lists the incoming, outgoing, and routed policies, and port-forwarded game traffic falls under the routed policy.

**Check UFW logs:**
```bash
tail -100 /var/log/ufw.log
```

Look for BLOCK entries for the port/IP you're trying to reach.

**Test with UFW temporarily disabled:**
```bash
ufw disable
# Try connection
# Re-enable immediately
ufw enable
```

**⚠️ WARNING:** Only disable UFW for brief testing, and re-enable it immediately.

### Solution

**Add specific rule for your management IP:**
```bash
ufw allow from MANAGEMENT_IP to any port 22 proto tcp
```

**Allow traffic on tunnel interfaces:**
```bash
ufw allow in on gre-tx1
ufw allow in on gre-nc1
ufw allow in on gre-hub # On TX1/NC1
```

**Check rule order:**
```bash
ufw status numbered
```

Rules are processed in order, so make sure allow rules come before deny rules.

**Delete and re-add rules if needed:**
```bash
# Delete rule by number (5 is an example; take the number from `ufw status numbered`)
ufw delete 5

# Re-add in correct order
ufw insert 1 allow from MANAGEMENT_IP to any port 22 proto tcp
```

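
Scanning `/var/log/ufw.log` by eye is slow on a busy server. The sketch below summarises which source addresses were blocked on a given destination port. It assumes the standard kernel log format UFW uses (`[UFW BLOCK]` followed by `SRC=` and `DPT=` fields); `ufw_blocked_sources` is a hypothetical helper name.

```bash
#!/bin/sh
# Sketch only: ufw_blocked_sources is a hypothetical helper.
# Usage: ufw_blocked_sources PORT [LOGFILE]
# Prints "COUNT SOURCE_IP" lines, most frequent first.
ufw_blocked_sources() {
    port=$1
    logfile=${2:-/var/log/ufw.log}
    grep -F '[UFW BLOCK]' "$logfile" |
        grep -E "(^| )DPT=$port( |\$)" |
        sed -n 's/.* SRC=\([^ ]*\).*/\1/p' |
        sort | uniq -c | sort -rn
}

# Example: ufw_blocked_sources 25565
```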
---

## Emergency Recovery Procedures

### Complete Tunnel Failure

**If all troubleshooting fails, rebuild from scratch:**

```bash
# On Command Center
ip link set gre-tx1 down
ip link set gre-nc1 down
ip tunnel del gre-tx1
ip tunnel del gre-nc1

# On TX1
ip link set gre-hub down
ip tunnel del gre-hub

# On NC1
ip link set gre-hub down
ip tunnel del gre-hub

# Then follow deployment-plan.md Phase 1 to rebuild
```

### Lost SSH Access

**If locked out due to UFW misconfiguration:**

1. Access server via provider's console (IPMI, VNC, etc.)
2. Log in as root
3. Disable UFW: `ufw disable`
4. Fix rules, re-enable carefully
5. Test SSH before closing console session

### Complete Frostwall Removal (Rollback)

**If you need to remove Frostwall entirely:**

```bash
# Stop monitoring
crontab -e # Remove frostwall-monitor line

# Remove tunnels
ip link set gre-tx1 down
ip link set gre-nc1 down
ip tunnel del gre-tx1
ip tunnel del gre-nc1

# Remove NAT rules
# WARNING: these flush ALL rules in the nat and filter tables, not only Frostwall's.
# If a chain's default policy is DROP, flushing can cut off SSH: check with
# `iptables -L -n | grep policy` and keep provider console access available.
iptables -t nat -F
iptables -F

# Restore previous UFW rules (reset also disables UFW)
ufw --force reset
# Re-add basic rules (at minimum SSH) before re-enabling UFW

# Remove persistence scripts
rm /etc/network/if-up.d/frostwall-*
rm /usr/local/bin/frostwall-monitor.sh

# Update DNS to point directly to server IPs
```

---

## Common Error Messages

### "RTNETLINK answers: File exists"

**Meaning:** Tunnel with that name already exists

**Solution:**
```bash
ip tunnel del gre-tx1 # Delete existing
# Then recreate
```

### "RTNETLINK answers: Network is unreachable"

**Meaning:** Can't reach remote endpoint

**Solution:**
- Verify remote IP is correct
- Check if physical network to remote is up
- Ping remote physical IP

### "GRE: DF set but fragmentation needed"

**Meaning:** MTU mismatch, packet too large

**Solution:**
```bash
ip link set gre-tx1 mtu 1400
```

### "Operation not permitted"

**Meaning:** Not running as root or module not loaded

**Solution:**
```bash
sudo -i # Become root
modprobe ip_gre # Load module
```

---

## Monitoring and Health Checks

**Daily health check commands:**
```bash
# Check all tunnels are up
ip tunnel show

# Ping all tunnel endpoints
ping -c 4 10.0.1.2
ping -c 4 10.0.2.2

# Check monitor log
tail -20 /var/log/frostwall-monitor.log

# Verify NAT rules
iptables -t nat -L -n -v | head -20
```

**Set up alerts (optional):**
```bash
# Add to monitor script to send email on failure
# Requires mail configured
echo "Tunnel failure detected" | mail -s "ALERT: Frostwall Tunnel Down" admin@firefrostgaming.com
```

---

## Getting Help

If none of these troubleshooting steps resolve your issue:

1. **Gather diagnostics:**
   ```bash
   ip tunnel show > /tmp/frostwall-diag.txt
   ip addr show >> /tmp/frostwall-diag.txt
   ip route show >> /tmp/frostwall-diag.txt
   iptables -t nat -L -n -v >> /tmp/frostwall-diag.txt
   ufw status verbose >> /tmp/frostwall-diag.txt
   tail -100 /var/log/frostwall-monitor.log >> /tmp/frostwall-diag.txt
   ```

2. **Document symptoms:**
   - What were you trying to do?
   - What happened instead?
   - When did it start?
   - What changed recently?

3. **Check documentation:**
   - Review deployment-plan.md
   - Review ip-hierarchy.md

4. **Ask The Chronicler** (future Claude session) with full diagnostics

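
The diagnostics from step 1 are easier to read when each command's output is labelled and errors are captured too. This is a minimal sketch of one way to do that; `diag_section` and `collect_frostwall_diag` are hypothetical helper names, and the command list simply mirrors step 1.

```bash
#!/bin/sh
# Sketch only: diag_section and collect_frostwall_diag are hypothetical helpers.

# Append one labelled command's output (stdout and stderr) to a report file.
diag_section() {
    report=$1
    shift
    {
        echo "===== $* ====="
        "$@" 2>&1 || echo "(command exited with status $?)"
        echo
    } >> "$report"
}

# Collect the same diagnostics as step 1 into a single labelled report.
collect_frostwall_diag() {
    out=${1:-/tmp/frostwall-diag.txt}
    : > "$out"
    diag_section "$out" ip tunnel show
    diag_section "$out" ip addr show
    diag_section "$out" ip route show
    diag_section "$out" iptables -t nat -L -n -v
    diag_section "$out" ufw status verbose
    diag_section "$out" tail -100 /var/log/frostwall-monitor.log
    echo "Diagnostics written to $out"
}
```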
---

**Fire + Frost + Foundation = Where Love Builds Legacy** 💙🔥❄️

---

**Document Status:** TROUBLESHOOTING GUIDE  
**Update When:** New issues discovered, solutions found, error messages encountered