firefrost-operations-manual/docs/architecture-decisions.md
Michael Krause 600c67b3aa Add architecture decisions log
Finalized infrastructure role assignments:
- Command Center: Pure Frostwall Gateway
- TX1 Dallas: Management Hub (Phase 0.5 target)
- NC1 Charlotte: Pure game workload
- Ghost VPS: Marketing + flex capacity (keeping)
- ATM10 Java heap fix documented

Decision made 2026-02-08 after full capacity analysis.
2026-02-08 16:07:26 -06:00


# Architecture Decisions Log
## Decision Record: Infrastructure Role Assignments
**Date:** February 8, 2026

**Status:** APPROVED - Implementation in Progress

---
### Context
Phase 0 (Vanilla Reset) is complete. We now need to decide where the Phase 0.5 management services will be deployed:
- Option 1: Command Center (original plan)
- Option 2: TX1 Dallas (consolidate onto an underutilized dedicated server)
- Option 3: New VPS (additional cost)

A full infrastructure audit shows:
- 4 VPS (modest specs)
- 2 Dedicated Servers (32 vCPU, 256GB RAM each; massively underutilized)
---
### Decision: TX1 Dallas = Management Hub
**Command Center (63.143.34.217 + /29):**
- **Role:** Pure Frostwall Gateway ONLY
- **Purpose:** Game traffic proxy (Cloudflare → Command Center → GRE → Hidden servers)
- **No management services** - Optimized for bandwidth/forwarding
- **Specs:** 2 vCPU, 4GB RAM, 10TB bandwidth @ 1Gbps

**TX1 Dallas (38.68.14.26):**
- **Role:** Management Services Hub + Game Servers
- **Current:** 6 game servers, 1% RAM usage (2.9GB/256GB)
- **Phase 0.5 Deploys HERE:**
  - Gitea (migrate from Command Center)
  - Uptime Kuma (status.firefrostgaming.com)
  - BookStack (docs.firefrostgaming.com)
  - Netdata (analytics.firefrostgaming.com)
  - Vaultwarden (vault.firefrostgaming.com)
- **Specs:** 32 vCPU, 256GB RAM, 1TB disk
- **Impact:** +3GB RAM, +2.5 vCPU (negligible on 256GB system)
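
The TX1 impact estimate can be checked one resource at a time. The following minimal sketch uses only the figures in this section: a 32 vCPU / 256GB host with roughly +3GB RAM and +2.5 vCPU for management. The variable names are illustrative. The RAM share comes out near 1%. The vCPU share is several times larger, so any headroom claim depends on which resource is the binding constraint.

```python
# TX1 host totals and the estimated Phase 0.5 management footprint (from the TX1 section).
TX1_RAM_GB, TX1_VCPU = 256, 32
MGMT_RAM_GB, MGMT_VCPU = 3, 2.5

# Fraction of each TX1 resource the management services would consume.
ram_share = MGMT_RAM_GB / TX1_RAM_GB  # 0.01171875
cpu_share = MGMT_VCPU / TX1_VCPU      # 0.078125

print(f"RAM: {ram_share:.1%}, vCPU: {cpu_share:.1%}")  # RAM: 1.2%, vCPU: 7.8%
```
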

**NC1 Charlotte (216.239.104.130):**
- **Role:** Pure game server workload
- **Current:** 9 game servers, 7% RAM usage (18GB/256GB)
- **No management services** - All resources for games

**Ghost VPS (64.50.188.14):**
- **Role:** Marketing + Flex Capacity
- **Decision:** KEEP (not cancel)
- **Purpose:** Ghost CMS + future testing/staging/emergency failover
- **Cost:** $10/month justified for operational flexibility
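
The role split is easier to audit if the target state (after Phase 0.5, with Gitea already migrated to TX1) is written down as data and checked. The sketch below assumes a layout for that data. The `ROLES` dict and the `hosts_running_management` helper are hypothetical; nothing like them exists in this repo. The check asserts the core invariant of this decision: only TX1 carries management services.

```python
# Hypothetical encoding of the target role assignments (post-Phase 0.5).
# The layout is illustrative, not an existing config file in this repo.
MANAGEMENT_SERVICES = {"Gitea", "Uptime Kuma", "BookStack", "Netdata", "Vaultwarden"}

ROLES = {
    "Command Center": {"ip": "63.143.34.217", "role": "Frostwall Gateway", "services": set()},
    "TX1 Dallas": {"ip": "38.68.14.26", "role": "Management Hub + Game Servers",
                   "services": set(MANAGEMENT_SERVICES)},
    "NC1 Charlotte": {"ip": "216.239.104.130", "role": "Game Servers", "services": set()},
    "Ghost VPS": {"ip": "64.50.188.14", "role": "Marketing + Flex Capacity",
                  "services": {"Ghost CMS"}},
}


def hosts_running_management(roles):
    """Return the sorted names of hosts with at least one management service assigned."""
    return sorted(
        host for host, info in roles.items() if info["services"] & MANAGEMENT_SERVICES
    )


if __name__ == "__main__":
    # Under this decision, TX1 should be the only management host.
    print(hosts_running_management(ROLES))  # ['TX1 Dallas']
```
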
---
### Rationale
**Why TX1 for Management:**
- ✅ Already paying $160/month for the two dedicated servers (zero additional cost)
- ✅ Massively underutilized (1% RAM, light CPU)
- ✅ Management services are tiny relative to TX1's capacity (~3GB RAM vs 256GB available)
- ✅ Clean separation: Gateway = Command Center, Operations = TX1
- ✅ Professional architecture without new VPS cost

**Why Keep Command Center Pure:**
- Frostwall Gateway is public-facing (different security posture)
- Mixing management services with the game proxy creates a larger, more complex attack surface
- 2 vCPU is sufficient for proxying, but not for a dual role

**Why Keep Ghost VPS:**
- Marketing site isolation (public-facing)
- Emergency failover capacity
- Future expansion flexibility
- $10/month = cheap insurance
---
### Impact Analysis
**Capacity:**
- NC1: Can handle 20-25 game instances (currently 9)
- TX1: Can handle 18-23 game instances after management (currently 6)
- Total: 38-48 instances possible (currently 15, roughly 31-39% utilization)

**Cost:**
- Current: $207/month
- After: $207/month (no change)
- Savings from optimization: $0 (but better architecture)

**Performance:**
- Management services: ~1% of TX1 RAM (3GB/256GB) and ~8% of its vCPU (2.5/32)
- Command Center: Freed up for pure Frostwall focus
- NC1: Unchanged, dedicated to games
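
The utilization figure depends on which end of the capacity range it is measured against. The sketch below uses only the per-server ranges and current instance counts from this Impact Analysis. The helper name `utilization_bounds` is illustrative. Dividing by the low-end total (38) gives the upper utilization bound. Dividing by the high-end total (48) gives the lower bound.

```python
# Game-instance capacity ranges and current counts, from the Impact Analysis.
CAPACITY = {"NC1": (20, 25), "TX1": (18, 23)}  # TX1 range is already net of management services
CURRENT = {"NC1": 9, "TX1": 6}


def utilization_bounds(capacity, current):
    """Return (total_low, total_high, util_vs_low, util_vs_high)."""
    total_low = sum(low for low, _ in capacity.values())
    total_high = sum(high for _, high in capacity.values())
    in_use = sum(current.values())
    # Dividing by the smaller capacity estimate yields the higher utilization.
    return total_low, total_high, in_use / total_low, in_use / total_high


total_low, total_high, util_vs_low, util_vs_high = utilization_bounds(CAPACITY, CURRENT)
print(f"{total_low}-{total_high} slots, {util_vs_high:.0%}-{util_vs_low:.0%} in use")
# prints: 38-48 slots, 31%-39% in use
```
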
---
### Related Fixes
**ATM10 Performance Issue (NC1):**
- **Problem:** Server ticks falling behind
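- **Root Cause:** `-Xms128M -XX:MaxRAMPercentage=95.0`: a 128MB initial heap that has to grow repeatedly under load, with a ceiling of 95% of available memory that leaves almost no headroom for the JVM's non-heap memory
- **Fix:** Changed to `-Xms12G -Xmx12G` (fixed 12GB heap; initial size equals maximum, so the heap never resizes)
- **Date:** 2026-02-08 15:52
- **Expected Result:** Consistent GC, no heap-resizing pauses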
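
This class of misconfiguration could be caught before it reaches a server by linting the startup flags. The following is a minimal sketch that assumes the flags are available as a single string. `heap_warnings`, `to_bytes`, and the 80% ceiling are hypothetical; they are not part of any panel or JVM tooling. The sketch flags two conditions: an unpinned heap (`-Xms` differing from `-Xmx`), and a `MaxRAMPercentage` high enough to crowd out non-heap memory. It compares sizes in bytes, so `12G` and `12288M` count as equal.

```python
import re

# Hypothetical lint for the heap problems described above. The 80% ceiling is an
# illustrative assumption, not a documented JVM or Minecraft recommendation.
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(size):
    """Convert a JVM size string such as '12G' or '128M' to bytes."""
    m = re.fullmatch(r"(\d+)([KkMmGgTt]?)", size)
    if m is None:
        raise ValueError(f"unrecognized size: {size}")
    return int(m.group(1)) * UNITS[m.group(2).upper()]


def heap_warnings(jvm_args, max_ram_pct_ceiling=80.0):
    """Return human-readable warnings about heap sizing in a JVM argument string."""
    xms = re.search(r"-Xms(\d+[KkMmGgTt]?)\b", jvm_args)
    xmx = re.search(r"-Xmx(\d+[KkMmGgTt]?)\b", jvm_args)
    pct = re.search(r"-XX:MaxRAMPercentage=(\d+(?:\.\d+)?)", jvm_args)
    warnings = []

    if xmx is None and pct is None:
        warnings.append("no explicit heap ceiling (-Xmx or MaxRAMPercentage)")
    # An explicit -Xmx takes precedence over MaxRAMPercentage, so only check the
    # percentage when -Xmx is absent.
    if xmx is None and pct is not None and float(pct.group(1)) > max_ram_pct_ceiling:
        warnings.append(f"MaxRAMPercentage={pct.group(1)} leaves little non-heap headroom")
    if xms is None:
        warnings.append("no -Xms: the JVM picks a starting heap and may resize it")
    elif xmx is None or to_bytes(xms.group(1)) != to_bytes(xmx.group(1)):
        warnings.append("-Xms is not pinned to -Xmx, so the heap can resize at runtime")
    return warnings


if __name__ == "__main__":
    print(heap_warnings("-Xms128M -XX:MaxRAMPercentage=95.0"))  # two warnings
    print(heap_warnings("-Xms12G -Xmx12G"))                      # []
```
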
---
### Next Steps
1. ✅ ATM10 fix applied (testing in progress)
2. ⏳ Deploy Uptime Kuma on TX1 (Phase 0.5 Service 2/5)
3. ⏳ Deploy BookStack on TX1 (Phase 0.5 Service 3/5)
4. ⏳ Migrate Gitea: Command Center → TX1
5. ⏳ Deploy Netdata, Vaultwarden on TX1
6. ⏳ Future: Rebuild Frostwall GRE tunnels (post-Phase 0.5)
---
**End of Decision Record**