firefrost-operations-manual/docs/phase0-dismantling.md
Michael Krause dd03bf2a52 Fix documentation gaps from comprehensive audit
NEW: phase0-dismantling.md, mkdocs-deployment.md
UPDATED: architecture-decisions.md, pterodactyl-extensions-plan.md, INDEX.md

Audit: February 9, 2026 - All 5 gaps fixed
2026-02-09 12:58:57 -06:00


Phase 0: Infrastructure Dismantling & Vanilla Reset

Date: February 7, 2026
Status: COMPLETE
Purpose: Document what was removed and why during the Phase 0 vanilla reset


Executive Summary

On February 7, 2026, we dismantled the "Frostwall Protocol v1.0", a complex GRE tunnel architecture that was causing more problems than it solved. This document preserves the technical details for future reference and explains the strategic decision to rebuild from a "vanilla baseline."


What Was Dismantled

Command Center (63.143.34.217)

GRE Tunnels Removed:

  • gre-nc1 - Tunnel to NC1 Charlotte (192.168.20.1/30)
  • gre-tx1 - Tunnel to TX1 Dallas (192.168.10.1/30)

Processes Killed:

  • 68 leaked tunnel-related processes
  • master_restore.sh background processes
  • reboot_audit.sh background processes

Cron Jobs Disabled:

  • master_restore.sh - Auto-restore tunnel configuration
  • reboot_audit.sh - Tunnel health monitoring
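
Disabling these jobs amounts to commenting out the matching crontab entries. A minimal sketch of that step follows, assuming the jobs lived in an ordinary crontab; the schedules and the `/root/...` paths in the sample are hypothetical and are not the entries that were actually disabled.

```python
# Script names come from the list above; everything else here is illustrative.
TUNNEL_SCRIPTS = ("master_restore.sh", "reboot_audit.sh")


def disable_tunnel_jobs(crontab_text: str, scripts=TUNNEL_SCRIPTS) -> str:
    """Comment out active crontab lines that mention any of the given scripts."""
    out = []
    for line in crontab_text.splitlines():
        stripped = line.lstrip()
        is_active = bool(stripped) and not stripped.startswith("#")
        if is_active and any(name in stripped for name in scripts):
            out.append("# DISABLED (Phase 0): " + line)
        else:
            out.append(line)  # unrelated jobs and existing comments pass through
    return "\n".join(out)


# Hypothetical crontab contents, for demonstration only.
sample = """\
*/5 * * * * /root/master_restore.sh
@reboot /root/reboot_audit.sh
0 3 * * * /usr/bin/backup.sh
"""
print(disable_tunnel_jobs(sample))
```

Commenting lines out rather than deleting them keeps the old schedule visible for later reference.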

iptables Rules Cleaned:

  • All GRE-related NAT rules
  • All tunnel routing rules
  • Reset to default firewall policy
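
To confirm that a cleanup like this is complete, a saved ruleset can be scanned for leftovers. The sketch below assumes the rules are available as `iptables-save`-style text and flags any rule that matches the GRE protocol or names one of the tunnel interfaces. The sample ruleset is invented for illustration and does not reproduce the rules that were removed.

```python
TUNNEL_IFACES = ("gre-nc1", "gre-tx1")  # Command Center interface names from this document


def find_gre_rules(ruleset_text: str, ifaces=TUNNEL_IFACES) -> list[str]:
    """Return rule lines that match GRE traffic or reference a tunnel interface."""
    hits = []
    for line in ruleset_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "-A":
            continue  # skip table names, chain policies, COMMIT, and comments
        matches_proto = any(
            tokens[i] in ("-p", "--protocol") and tokens[i + 1] in ("gre", "47")
            for i in range(len(tokens) - 1)
        )
        matches_iface = any(tok in ifaces for tok in tokens)
        if matches_proto or matches_iface:
            hits.append(line)
    return hits


# Hypothetical ruleset, for demonstration only.
sample = """\
*nat
:POSTROUTING ACCEPT [0:0]
-A POSTROUTING -o gre-nc1 -j MASQUERADE
-A POSTROUTING -o eth0 -j MASQUERADE
COMMIT
*filter
:INPUT ACCEPT [0:0]
-A INPUT -p gre -j ACCEPT
-A INPUT -p tcp --dport 22 -j ACCEPT
COMMIT
"""
for rule in find_gre_rules(sample):
    print(rule)
```

An empty result after the reset would suggest no GRE-related rules remain in the scanned tables.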

NC1 Charlotte (216.239.104.130)

GRE Tunnel Removed:

  • gre-cc - Tunnel to Command Center
  • Tunnel IP: 192.168.20.2/30
  • Peer: 63.143.34.217

TX1 Dallas (38.68.14.26)

GRE Tunnel Removed:

  • gre-tx1 - Tunnel to Command Center
  • Tunnel IP: 192.168.10.2/30
  • Secondary IP on tunnel: 38.68.14.188/32 (Billing Portal routing)
  • Peer: 63.143.34.217
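
Each tunnel used a /30 point-to-point subnet, which has exactly two usable host addresses, one per end. As a sanity check on the addresses recorded above, the following sketch uses Python's standard `ipaddress` module to confirm that both ends of each tunnel fall in the same /30. The `TUNNELS` mapping and the helper name are illustrative.

```python
import ipaddress

# Tunnel addresses as recorded in this document: (Command Center side, remote side).
TUNNELS = {
    "NC1 Charlotte": ("192.168.20.1/30", "192.168.20.2/30"),
    "TX1 Dallas": ("192.168.10.1/30", "192.168.10.2/30"),
}


def same_p2p_subnet(local: str, remote: str) -> bool:
    """True if both addresses are distinct usable hosts of the same network."""
    a = ipaddress.ip_interface(local)
    b = ipaddress.ip_interface(remote)
    usable = set(a.network.hosts())  # excludes network and broadcast addresses
    return a.network == b.network and a.ip != b.ip and {a.ip, b.ip} <= usable


for site, (cc_side, remote_side) in TUNNELS.items():
    network = ipaddress.ip_interface(cc_side).network
    print(site, network, same_p2p_subnet(cc_side, remote_side))
```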

Why It Was Removed

Problem 1: CosmicGuard Double-Encapsulation

The original Charlotte node was behind CosmicGuard DDoS protection, which automatically creates GRE tunnels. Running our tunnel over their tunnel created double encapsulation and MTU issues.
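
The MTU cost of stacking tunnels can be estimated with simple arithmetic. The sketch below assumes a 1500-byte path MTU, an IPv4 outer header without options (20 bytes), and a base GRE header with no key, checksum, or sequence fields (4 bytes). The actual overhead of CosmicGuard's tunnel is not documented here, so treat the numbers as a lower bound on the loss.

```python
IPV4_HEADER = 20  # bytes, outer IPv4 header without options (assumption)
GRE_HEADER = 4    # bytes, base GRE header without optional fields (assumption)


def inner_mtu(link_mtu: int, gre_layers: int) -> int:
    """Largest inner packet that fits after the given number of GRE encapsulations."""
    return link_mtu - gre_layers * (IPV4_HEADER + GRE_HEADER)


for layers in (0, 1, 2):
    print(f"{layers} GRE layer(s): inner MTU {inner_mtu(1500, layers)} bytes")
```

Under these assumptions one layer leaves 1476 bytes and two layers leave 1452, so any inner packet sized for a single tunnel would need fragmentation or be dropped once the second encapsulation is applied.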

Problem 2: Protocol 47 Blocking

The upstream carrier was black-holing IP protocol 47 (GRE) on the 38.x IP ranges, which required a migration to the 216.239.104.x range.
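
GRE is identified by the protocol field of the IP header (value 47), not by a TCP or UDP port, which is why port-based reachability checks would not reveal this kind of filtering. The sketch below builds a bare IPv4 header and reads that field back. The helper names are illustrative, the addresses are the Command Center and TX1 addresses from this document, and the header is for demonstration only (its checksum is left at zero).

```python
import socket
import struct

GRE_PROTOCOL = 47  # IANA-assigned IP protocol number for GRE


def build_ipv4_header(src: str, dst: str, protocol: int) -> bytes:
    """Build a bare 20-byte IPv4 header for demonstration (checksum not computed)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,      # version 4, header length 5 words
        0,         # type of service
        20,        # total length (header only)
        0,         # identification
        0,         # flags and fragment offset
        64,        # TTL
        protocol,  # protocol field, byte offset 9
        0,         # header checksum (left at zero)
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )


def ip_protocol(packet: bytes) -> int:
    """Return the protocol field of an IPv4 packet."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return packet[9]


header = build_ipv4_header("63.143.34.217", "38.68.14.26", GRE_PROTOCOL)
print("is GRE:", ip_protocol(header) == GRE_PROTOCOL)
```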

Problem 3: Complexity vs. Benefit

The tunnels brought constant connectivity issues, difficult troubleshooting, 68+ leaked processes, and midnight emergencies. These costs outweighed the benefit the architecture provided.

Problem 4: Maintenance Burden

With Michael's health and family planning goals, midnight pages were unsustainable.


The Decision: Vanilla Reset

Philosophy: "Start from a clean baseline and rebuild properly."

Future Plan (Phase 1):

  • Design simplified DDoS protection
  • Cloudflare Spectrum or simplified GRE (decision pending)
  • Focus on reliability over complexity

Lessons Learned

  1. Complexity has a cost - Every added layer is a potential failure point
  2. Health matters - Infrastructure should support life, not consume it
  3. Document before dismantling - This document preserves institutional knowledge
  4. Vanilla baseline enables iteration - Easier to build correctly from scratch
  5. Provider relationships matter - Breezehost's Jon Beard was crucial

Revision History

Version | Date       | Changes
1.0     | 2026-02-09 | Initial documentation (retroactive)