firefrost-operations-manual/docs/phase0-dismantling.md
Michael Krause dd03bf2a52 Fix documentation gaps from comprehensive audit
NEW: phase0-dismantling.md, mkdocs-deployment.md
UPDATED: architecture-decisions.md, pterodactyl-extensions-plan.md, INDEX.md

Audit: February 9, 2026 - All 5 gaps fixed
2026-02-09 12:58:57 -06:00

# Phase 0: Infrastructure Dismantling & Vanilla Reset
**Date:** February 7, 2026
**Status:** COMPLETE
**Purpose:** Document what was removed and why during the Phase 0 vanilla reset

---
## Executive Summary
On February 7, 2026, we dismantled "Frostwall Protocol v1.0", a complex GRE tunnel architecture that was causing more problems than it solved. This document preserves the technical details for future reference and explains the strategic decision to rebuild from a "vanilla baseline."

---
## What Was Dismantled
### Command Center (63.143.34.217)
**GRE Tunnels Removed:**
- gre-nc1 - Tunnel to NC1 Charlotte (192.168.20.1/30)
- gre-tx1 - Tunnel to TX1 Dallas (192.168.10.1/30)

**Processes Killed:**
- 68 leaked tunnel-related processes
- master_restore.sh background processes
- reboot_audit.sh background processes

**Cron Jobs Disabled:**
- master_restore.sh - Auto-restore tunnel configuration
- reboot_audit.sh - Tunnel health monitoring

**iptables Rules Cleaned:**
- All GRE-related NAT rules
- All tunnel routing rules
- Reset to default firewall policy

### NC1 Charlotte (216.239.104.130)
**GRE Tunnel Removed:**
- gre-cc - Tunnel to Command Center
- Tunnel IP: 192.168.20.2/30
- Peer: 63.143.34.217
### TX1 Dallas (38.68.14.26)
**GRE Tunnel Removed:**
- gre-tx1 - Tunnel to Command Center
- Tunnel IP: 192.168.10.2/30
- Secondary IP on tunnel: 38.68.14.188/32 (Billing Portal routing)
- Peer: 63.143.34.217
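The /30 addresses recorded above pair each Command Center tunnel address with exactly one remote address. As a minimal sketch (the helper names are hypothetical and not part of any Firefrost tooling), Python's standard `ipaddress` module can confirm that each recorded pair shares a point-to-point subnet, which may help when checking this inventory against old configuration backups:

```python
import ipaddress

# Tunnel endpoint pairs as recorded in this document:
# (Command Center side, remote side).
TUNNELS = {
    "gre-nc1": ("192.168.20.1/30", "192.168.20.2/30"),
    "gre-tx1": ("192.168.10.1/30", "192.168.10.2/30"),
}


def same_p2p_subnet(local, remote):
    """Return True if both addresses sit in the same network and differ."""
    a = ipaddress.ip_interface(local)
    b = ipaddress.ip_interface(remote)
    return a.network == b.network and a.ip != b.ip


def usable_hosts(cidr):
    """List the usable host addresses of the network containing cidr."""
    net = ipaddress.ip_interface(cidr).network
    return [str(host) for host in net.hosts()]


if __name__ == "__main__":
    for name, (local, remote) in TUNNELS.items():
        print(name, same_p2p_subnet(local, remote), usable_hosts(local))
```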
---
## Why It Was Removed
### Problem 1: CosmicGuard Double-Encapsulation
The original Charlotte node was behind CosmicGuard DDoS protection, which automatically creates GRE tunnels. Running our tunnel over their tunnel created double encapsulation and MTU issues.
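The MTU impact of stacking tunnels can be estimated with simple arithmetic. The sketch below assumes a 1500-byte Ethernet path and basic GRE over IPv4 (a 20-byte outer IPv4 header plus a 4-byte GRE header, with no key, checksum, or sequence options). CosmicGuard's actual encapsulation overhead is not recorded here, so treat the numbers as illustrative:

```python
ETHERNET_MTU = 1500          # assumed path MTU in bytes
GRE_OVERHEAD = 20 + 4        # outer IPv4 header + basic GRE header
IPV4_TCP_HEADERS = 20 + 20   # inner IPv4 + TCP headers, no options


def effective_mtu(link_mtu, gre_layers):
    """MTU left for the inner packet after nesting gre_layers GRE tunnels."""
    return link_mtu - gre_layers * GRE_OVERHEAD


def tcp_mss(mtu):
    """Largest TCP payload that fits in one unfragmented packet."""
    return mtu - IPV4_TCP_HEADERS


if __name__ == "__main__":
    for layers in (0, 1, 2):
        mtu = effective_mtu(ETHERNET_MTU, layers)
        print(f"{layers} GRE layer(s): MTU {mtu}, TCP MSS {tcp_mss(mtu)}")
```

Under these assumptions a single tunnel leaves 1476 bytes and a tunnel inside a tunnel leaves 1452, so packets sized for the inner tunnel alone would be fragmented or dropped at the outer one.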
### Problem 2: Protocol 47 Blocking
The upstream carrier was black-holing IP protocol 47 (GRE) on the 38.x IP ranges. Working around this required migrating to the 216.239.104.x range.
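GRE is identified by the protocol field of the IPv4 header (byte offset 9), not by a TCP or UDP port, which is why a carrier can drop it while ordinary traffic on the same addresses keeps working. The following is a minimal sketch of that field; the helper names are hypothetical, the header checksum is left at zero, and the addresses are the Command Center and TX1 IPs from this document used purely as sample values:

```python
import socket
import struct

GRE_PROTOCOL = 47  # IANA-assigned IP protocol number for GRE
TCP_PROTOCOL = 6


def build_ipv4_header(src, dst, protocol, payload_len=0):
    """Pack a 20-byte IPv4 header (checksum left as 0 for illustration)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                   # version 4, header length 5 words
        0,                      # DSCP/ECN
        20 + payload_len,       # total length
        0,                      # identification
        0,                      # flags + fragment offset
        64,                     # TTL
        protocol,               # protocol field a filter would match on
        0,                      # header checksum (not computed here)
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )


def is_gre(header):
    """Return True if the IPv4 header carries GRE (protocol 47)."""
    return header[9] == GRE_PROTOCOL


if __name__ == "__main__":
    gre = build_ipv4_header("63.143.34.217", "38.68.14.26", GRE_PROTOCOL)
    tcp = build_ipv4_header("63.143.34.217", "38.68.14.26", TCP_PROTOCOL)
    print(is_gre(gre), is_gre(tcp))
```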
### Problem 3: Complexity vs. Benefit
The architecture produced constant connectivity issues, difficult troubleshooting, 68+ leaked processes, and midnight emergencies.
### Problem 4: Maintenance Burden
Given Michael's health and family planning goals, midnight pages were unsustainable.

---
## The Decision: Vanilla Reset
**Philosophy:** "Start from a clean baseline and rebuild properly."

**Future Plan (Phase 1):**
- Design simplified DDoS protection
- Cloudflare Spectrum or simplified GRE (decision pending)
- Focus on reliability over complexity
---
## Lessons Learned
1. Complexity has a cost - Every added layer is a potential failure point
2. Health matters - Infrastructure should support life, not consume it
3. Document before dismantling - This document preserves institutional knowledge
4. Vanilla baseline enables iteration - Easier to build correctly from scratch
5. Provider relationships matter - Breezehost's Jon Beard was crucial
---
## Revision History
| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-02-09 | Initial documentation (retroactive) |