firefrost-operations-manual/docs/architecture-decisions.md
Michael Krause 600c67b3aa Add architecture decisions log
Finalized infrastructure role assignments:
- Command Center: Pure Frostwall Gateway
- TX1 Dallas: Management Hub (Phase 0.5 target)
- NC1 Charlotte: Pure game workload
- Ghost VPS: Marketing + flex capacity (keeping)
- ATM10 Java heap fix documented

Decision made 2026-02-08 after full capacity analysis.
2026-02-08 16:07:26 -06:00


Architecture Decisions Log

Decision Record: Infrastructure Role Assignments

Date: February 8, 2026
Status: APPROVED - Implementation in Progress


Context

Phase 0 Vanilla Reset is complete. We need to decide where the Phase 0.5 management services will be deployed:

  • Option 1: Command Center (original plan)
  • Option 2: TX1 Dallas (consolidation on underutilized dedi)
  • Option 3: New VPS (additional cost)

After a full infrastructure audit, we have:

  • 4 VPS (modest specs)
  • 2 Dedicated Servers (32 vCPU, 256GB RAM each - massively underutilized)

Decision: TX1 Dallas = Management Hub

Command Center (63.143.34.217 + /29):

  • Role: Pure Frostwall Gateway ONLY
  • Purpose: Game traffic proxy (Cloudflare → Command Center → GRE → Hidden servers)
  • No management services - Optimized for bandwidth/forwarding
  • Specs: 2 vCPU, 4GB RAM, 10TB bandwidth @ 1Gbps

TX1 Dallas (38.68.14.26):

  • Role: Management Services Hub + Game Servers
  • Current: 6 game servers, 1% RAM usage (2.9GB/256GB)
  • Phase 0.5 Deploys HERE:
    • Gitea (migrate from Command Center)
    • Uptime Kuma (status.firefrostgaming.com)
    • BookStack (docs.firefrostgaming.com)
    • Netdata (analytics.firefrostgaming.com)
    • Vaultwarden (vault.firefrostgaming.com)
  • Specs: 32 vCPU, 256GB RAM, 1TB disk
  • Impact: +3GB RAM, +2.5 vCPU (negligible on 256GB system)

NC1 Charlotte (216.239.104.130):

  • Role: Pure game server workload
  • Current: 9 game servers, 7% RAM usage (18GB/256GB)
  • No management services - All resources for games

Ghost VPS (64.50.188.14):

  • Role: Marketing + Flex Capacity
  • Decision: KEEP (not cancel)
  • Purpose: Ghost CMS + future testing/staging/emergency failover
  • Cost: $10/month justified for operational flexibility
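
The role assignments above can be written down as a small inventory and checked mechanically. The following is a minimal sketch, not the project's actual tooling: the `Host` class, the `misplaced_management` helper, the host slugs, and the service labels `frostwall-gateway` and `ghost-cms` are all hypothetical names chosen for illustration. It describes the target state after the Gitea migration and omits game servers for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    ip: str
    role: str
    services: list[str] = field(default_factory=list)


# Phase 0.5 management services named in this decision record.
MANAGEMENT = {"gitea", "uptime-kuma", "bookstack", "netdata", "vaultwarden"}

# Target state; host slugs and non-management service labels are illustrative.
HOSTS = [
    Host("command-center", "63.143.34.217", "gateway", ["frostwall-gateway"]),
    Host("tx1-dallas", "38.68.14.26", "management+games", sorted(MANAGEMENT)),
    Host("nc1-charlotte", "216.239.104.130", "games"),
    Host("ghost-vps", "64.50.188.14", "marketing+flex", ["ghost-cms"]),
]


def misplaced_management(hosts, hub="tx1-dallas"):
    """Return (host, service) pairs where a management service runs off the hub."""
    return [
        (h.name, s)
        for h in hosts
        if h.name != hub
        for s in h.services
        if s in MANAGEMENT
    ]


print(misplaced_management(HOSTS))
```

An empty list means the inventory matches the decision: every management service sits on TX1, and the gateway and NC1 carry none.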

Rationale

Why TX1 for Management:

  • Already paying $160/month for the dedicated servers (zero additional cost)
  • Massively underutilized (1% RAM, light CPU)
  • Management services tiny compared to game servers (~3GB vs 256GB available)
  • Clean separation: Gateway = Command Center, Operations = TX1
  • Professional architecture without new VPS cost

Why Keep Command Center Pure:

  • Frostwall Gateway is public-facing (different security posture)
  • Mixing management + game proxy creates a more complex attack surface
  • 2 vCPU sufficient for proxy, not ideal for dual-role

Why Keep Ghost VPS:

  • Marketing site isolation (public-facing)
  • Emergency failover capacity
  • Future expansion flexibility
  • $10/month = cheap insurance

Impact Analysis

Capacity:

  • NC1: Can handle 20-25 game instances (currently 9)
  • TX1: Can handle 18-23 game instances after management (currently 6)
  • Total: 38-48 instances possible (currently 15 = roughly 31-39% utilization)
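
The fleet totals follow directly from the per-host estimates. As a rough sketch (the dictionary layout and the `fleet_utilization` helper are assumptions made for illustration; only the numbers come from the list above), the arithmetic can be reproduced like this:

```python
# Estimated instance capacity per host as (low, high), from the Capacity list.
CAPACITY = {"NC1": (20, 25), "TX1": (18, 23)}
# Game instances currently running on each host.
CURRENT = {"NC1": 9, "TX1": 6}


def fleet_utilization(capacity, current):
    """Return running count, capacity range, and best/worst-case utilization."""
    total_low = sum(low for low, _ in capacity.values())
    total_high = sum(high for _, high in capacity.values())
    running = sum(current.values())
    # Best case divides by the high estimate, worst case by the low one.
    return running, total_low, total_high, running / total_high, running / total_low


running, low, high, best, worst = fleet_utilization(CAPACITY, CURRENT)
print(f"{running} of {low}-{high} instances: {best:.0%}-{worst:.0%} utilized")
```

Reporting utilization as a range keeps the uncertainty of the capacity estimate visible instead of collapsing it into a single figure.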

Cost:

  • Current: $207/month
  • After: $207/month (no change)
  • Savings from optimization: $0 (but better architecture)

Performance:

  • Management services: about 1% of TX1 RAM (3GB of 256GB) and about 8% of its vCPU (2.5 of 32)
  • Command Center: Freed up for pure Frostwall focus
  • NC1: Unchanged, dedicated to games

ATM10 Performance Issue (NC1):

  • Problem: Server ticks falling behind
  • Root Cause: -Xms128M -XX:MaxRAMPercentage=95.0 (heap starts at 128MB and can grow to 95% of the memory the JVM sees, forcing repeated resizing)
  • Fix: Changed to -Xms12G -Xmx12G (fixed heap)
  • Date: 2026-02-08 15:52
  • Expected result: Consistent GC, no heap-resizing pauses (testing in progress)
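
To catch the same misconfiguration on other servers, the startup flags can be checked for a fixed heap. This is a minimal sketch under stated assumptions: `heap_bounds` and `is_fixed_heap` are hypothetical helpers, the parser only understands plain `-Xms`/`-Xmx` values with an optional k/m/g suffix, and it does not read flags from a running JVM.

```python
import re

# Matches -Xms<size> or -Xmx<size> with an optional k/m/g unit suffix.
_SIZE = re.compile(r"^-Xm([sx])(\d+)([kKmMgG]?)$")
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def heap_bounds(jvm_args):
    """Return (xms_bytes, xmx_bytes); a value is None when the flag is absent."""
    bounds = {"s": None, "x": None}
    for arg in jvm_args:
        match = _SIZE.match(arg)
        if match:
            kind, number, unit = match.groups()
            bounds[kind] = int(number) * _UNITS[unit.lower()]
    return bounds["s"], bounds["x"]


def is_fixed_heap(jvm_args):
    """True when -Xms and -Xmx are both set and equal, so the heap never resizes."""
    xms, xmx = heap_bounds(jvm_args)
    return xms is not None and xms == xmx


before = ["-Xms128M", "-XX:MaxRAMPercentage=95.0"]  # original ATM10 flags
after = ["-Xms12G", "-Xmx12G"]                      # flags after the fix
print(is_fixed_heap(before), is_fixed_heap(after))
```

The original flags fail the check because no explicit `-Xmx` is set, while the fixed configuration passes because both bounds are 12G.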

Next Steps

  1. ATM10 fix applied (testing in progress)
  2. Deploy Uptime Kuma on TX1 (Phase 0.5 Service 2/5)
  3. Deploy BookStack on TX1 (Phase 0.5 Service 3/5)
  4. Migrate Gitea: Command Center → TX1
  5. Deploy Netdata, Vaultwarden on TX1
  6. Future: Rebuild Frostwall GRE tunnels (post-Phase 0.5)

End of Decision Record