NEW: phase0-dismantling.md, mkdocs-deployment.md UPDATED: architecture-decisions.md, pterodactyl-extensions-plan.md, INDEX.md Audit: February 9, 2026 - All 5 gaps fixed
# Phase 0: Infrastructure Dismantling & Vanilla Reset

**Date:** February 7, 2026
**Status:** COMPLETE
**Purpose:** Document what was removed and why during the Phase 0 vanilla reset

---

## Executive Summary

On February 7, 2026, we dismantled the "Frostwall Protocol v1.0" - a complex GRE tunnel architecture that was causing more problems than it solved. This document preserves the technical details for future reference and explains the strategic decision to rebuild from a "vanilla baseline."

---

## What Was Dismantled

### Command Center (63.143.34.217)

**GRE Tunnels Removed:**
- gre-nc1 - Tunnel to NC1 Charlotte (192.168.20.1/30)
- gre-tx1 - Tunnel to TX1 Dallas (192.168.10.1/30)

**Processes Killed:**
- 68 leaked tunnel-related processes
- master_restore.sh background processes
- reboot_audit.sh background processes

**Cron Jobs Disabled:**
- master_restore.sh - Auto-restore tunnel configuration
- reboot_audit.sh - Tunnel health monitoring

**iptables Rules Cleaned:**
- All GRE-related NAT rules
- All tunnel routing rules
- Reset to default firewall policy

### NC1 Charlotte (216.239.104.130)

**GRE Tunnel Removed:**
- gre-cc - Tunnel to Command Center
- Tunnel IP: 192.168.20.2/30
- Peer: 63.143.34.217

### TX1 Dallas (38.68.14.26)

**GRE Tunnel Removed:**
- gre-tx1 - Tunnel to Command Center
- Tunnel IP: 192.168.10.2/30
- Secondary IP on tunnel: 38.68.14.188/32 (Billing Portal routing)
- Peer: 63.143.34.217
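
For future reference, the tunnel inventory above can be captured as data and sanity-checked. The sketch below is a minimal illustration using Python's standard `ipaddress` module. The `TunnelEnd` record, the `same_subnet` helper, and the link keys `cc-nc1` and `cc-tx1` are hypothetical names introduced here and were not part of the original tooling. It checks that both ends of each link sit in the same /30, which leaves exactly two usable addresses per link.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class TunnelEnd:
    """One side of a point-to-point GRE link (hypothetical record type)."""
    host: str       # site name as used in this document
    public_ip: str  # public address of the host
    tunnel_ip: str  # inner tunnel address with prefix length


# Addresses copied from the inventory above; the dictionary keys are made up.
LINKS = {
    "cc-nc1": (
        TunnelEnd("Command Center", "63.143.34.217", "192.168.20.1/30"),
        TunnelEnd("NC1 Charlotte", "216.239.104.130", "192.168.20.2/30"),
    ),
    "cc-tx1": (
        TunnelEnd("Command Center", "63.143.34.217", "192.168.10.1/30"),
        TunnelEnd("TX1 Dallas", "38.68.14.26", "192.168.10.2/30"),
    ),
}


def same_subnet(a: TunnelEnd, b: TunnelEnd) -> bool:
    """Return True when both inner addresses belong to the same network."""
    net_a = ipaddress.ip_interface(a.tunnel_ip).network
    net_b = ipaddress.ip_interface(b.tunnel_ip).network
    return net_a == net_b


for name, (local, remote) in LINKS.items():
    network = ipaddress.ip_interface(local.tunnel_ip).network
    usable = [str(h) for h in network.hosts()]
    print(name, network, usable, same_subnet(local, remote))
```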

---

## Why It Was Removed

### Problem 1: CosmicGuard Double-Encapsulation
The original Charlotte node was behind CosmicGuard DDoS protection, which automatically creates GRE tunnels. Running our tunnel over their tunnel created double encapsulation and MTU issues.
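
To make the MTU problem concrete, the arithmetic can be sketched as follows. This assumes a 1500-byte path MTU and basic GRE over IPv4 with no optional key, checksum, or sequence fields, so each layer costs a 20-byte outer IPv4 header plus a 4-byte GRE header. The actual CosmicGuard and Frostwall tunnel settings are not recorded in this document, so the numbers are illustrative only.

```python
OUTER_IPV4_HEADER = 20  # bytes, IPv4 header without options
BASIC_GRE_HEADER = 4    # bytes, GRE header without key/checksum/sequence
PATH_MTU = 1500         # bytes, assumed Ethernet path MTU


def inner_mtu(path_mtu: int, gre_layers: int) -> int:
    """Largest inner packet that fits without fragmentation."""
    overhead = gre_layers * (OUTER_IPV4_HEADER + BASIC_GRE_HEADER)
    return path_mtu - overhead


def tcp_mss(mtu: int) -> int:
    """TCP payload per segment for IPv4 without IP or TCP options."""
    return mtu - 20 - 20


for layers in (0, 1, 2):
    mtu = inner_mtu(PATH_MTU, layers)
    print(f"{layers} GRE layer(s): MTU {mtu}, TCP MSS {tcp_mss(mtu)}")
```

Under these assumptions a single tunnel leaves 1476 bytes for the inner packet and a tunnel inside a tunnel leaves 1452. An inner interface still sized for one layer would then emit packets that must be fragmented or dropped.
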

### Problem 2: Protocol 47 Blocking
The upstream carrier was black-holing IP protocol 47 (GRE) on the 38.x IP ranges, which forced a migration to the 216.239.104.x range.
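
For readers unfamiliar with the term, "Protocol 47" refers to the protocol field of the IPv4 header, where the value 47 identifies GRE. The sketch below builds a minimal header by hand and reads that field back. The `build_ipv4_header` and `ip_protocol` helpers are hypothetical names, and the packet is synthetic rather than a capture from the incident. GRE has no ports, so a filter that drops on this field discards all tunnel traffic while TCP and UDP to the same hosts can keep working.

```python
import socket
import struct

GRE_PROTOCOL = 47  # IANA-assigned IP protocol number for GRE


def build_ipv4_header(src: str, dst: str, protocol: int) -> bytes:
    """Pack a 20-byte IPv4 header (checksum left at zero for brevity)."""
    version_ihl = (4 << 4) | 5  # IPv4, header length of five 32-bit words
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, 20,   # version/IHL, DSCP/ECN, total length
        0, 0,                 # identification, flags/fragment offset
        64, protocol, 0,      # TTL, protocol, header checksum
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )


def ip_protocol(header: bytes) -> int:
    """Return the protocol field, stored at byte offset 9."""
    return header[9]


# Synthetic example using the TX1 Dallas and Command Center addresses.
header = build_ipv4_header("38.68.14.26", "63.143.34.217", GRE_PROTOCOL)
print(ip_protocol(header) == GRE_PROTOCOL)
```
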

### Problem 3: Complexity vs. Benefit
The tunnels brought constant connectivity issues, difficult troubleshooting, 68+ leaked processes, and midnight emergencies.

### Problem 4: Maintenance Burden
With Michael's health and family planning goals, midnight pages were unsustainable.

---

## The Decision: Vanilla Reset

**Philosophy:** "Start from a clean baseline and rebuild properly."

**Future Plan (Phase 1):**
- Design simplified DDoS protection
- Cloudflare Spectrum or simplified GRE (decision pending)
- Focus on reliability over complexity

---

## Lessons Learned

1. Complexity has a cost - Every added layer is a potential failure point
2. Health matters - Infrastructure should support life, not consume it
3. Document before dismantling - This document preserves institutional knowledge
4. Vanilla baseline enables iteration - Easier to build correctly from scratch
5. Provider relationships matter - Breezehost's Jon Beard was crucial

---

## Revision History

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-02-09 | Initial documentation (retroactive) |