From 3c8a068b1dfe602278c791799f67fcb8d1475a91 Mon Sep 17 00:00:00 2001
From: Claude
Date: Tue, 17 Feb 2026 22:58:31 +0000
Subject: [PATCH] docs: Add comprehensive Netdata deployment guide

Created complete deployment guide for Netdata monitoring (400+ lines):

Deployment Strategy:
- Install on all 4 infrastructure servers
- Command Center, TX1, NC1, Ghost VPS
- Quick one-line install per server
- Total deployment time: 30 minutes

Configuration:
- UFW firewall rules (management IP only)
- Parent-child streaming (unified dashboard)
- Custom alert configuration (CPU/RAM/Disk)
- Discord webhook integration
- Health monitoring

Features:
- Real-time performance monitoring
- Beautiful web dashboards on port 19999
- Zero configuration required
- Lightweight (< 3% CPU, ~100 MB RAM)
- Auto-detects all services and metrics

Monitoring Targets:
- CPU, RAM, Disk, Network metrics
- Java heap usage (Minecraft servers)
- Service-specific monitoring
- Alert thresholds configurable

Advanced Features:
- Netdata Cloud integration (centralized)
- Custom dashboards
- Mobile app access
- Longer data retention

Troubleshooting guide included for common issues.
Ready to deploy when SSH access available.

Task: Netdata Deployment (Tier 2)
FFG-STD-002 compliant
---
 .../netdata-deployment/deployment-guide.md | 503 ++++++++++++++++++
 1 file changed, 503 insertions(+)
 create mode 100644 docs/tasks/netdata-deployment/deployment-guide.md

diff --git a/docs/tasks/netdata-deployment/deployment-guide.md b/docs/tasks/netdata-deployment/deployment-guide.md
new file mode 100644
index 0000000..5034d45
--- /dev/null
+++ b/docs/tasks/netdata-deployment/deployment-guide.md
@@ -0,0 +1,503 @@
+# Netdata Deployment - Complete Guide
+
+**Status:** Ready to Deploy
+**Priority:** Tier 2 - Infrastructure Monitoring
+**Time Estimate:** 30 minutes (all servers)
+**Last Updated:** 2026-02-17
+
+---
+
+## Overview
+
+Deploy Netdata real-time monitoring across all Firefrost infrastructure. It provides beautiful dashboards for CPU, RAM, disk, network, and application metrics with zero configuration required.
+
+**What is Netdata?**
+- Real-time performance monitoring
+- Beautiful web dashboards
+- Zero configuration needed
+- Extremely lightweight (< 3% CPU, ~100 MB RAM)
+- Open source and free
+
+---
+
+## Deployment Targets
+
+**All 4 infrastructure servers:**
+
+1. **Command Center** (63.143.34.217) - Dallas hub
+   - Services: Gitea, Uptime Kuma, Code-Server, Automation
+   - Dashboard: `http://63.143.34.217:19999`
+
+2. **TX1** (38.68.14.26) - Dallas game servers
+   - Services: 5 Minecraft servers + FoundryVTT
+   - Dashboard: `http://38.68.14.26:19999`
+
+3. **NC1** (216.239.104.130) - Charlotte game servers
+   - Services: 6 Minecraft servers + Hytale
+   - Dashboard: `http://216.239.104.130:19999`
+
+4. **Ghost VPS** (64.50.188.14) - Chicago staff services
+   - Services: MkDocs, Wiki.js (x2), NextCloud
+   - Dashboard: `http://64.50.188.14:19999`
+
+---
+
+## Installation (Per Server)
+
+### One-Line Install
+
+**On each server:**
+
+```bash
+# Install Netdata
+bash <(curl -Ss https://my-netdata.io/kickstart.sh)
+
+# The installer will:
+# - Auto-detect your OS
+# - Install dependencies
+# - Install Netdata (native package or static build; source build as fallback)
+# - Start the service
+# - Listen on port 19999 (firewall rules are not changed)
+```
+
+**Installation takes:** 2-5 minutes per server
+
+---
+
+## Step-by-Step Deployment
+
+### Phase 1: Install on Command Center (10 min)
+
+```bash
+# SSH to Command Center
+ssh root@63.143.34.217
+
+# Run installer
+bash <(curl -Ss https://my-netdata.io/kickstart.sh)
+
+# Wait for installation to complete
+# Answer prompts (usually just press Enter for defaults)
+
+# Verify installation
+systemctl status netdata
+
+# Should show: active (running)
+
+# Test dashboard
+curl http://localhost:19999
+
+# Should return HTML
+```
+
+**Open in browser:** `http://63.143.34.217:19999`
+
+You should see the Netdata dashboard!
+
+---
+
+### Phase 2: Install on TX1 (5 min)
+
+```bash
+# SSH to TX1
+ssh root@38.68.14.26
+
+# Run installer
+bash <(curl -Ss https://my-netdata.io/kickstart.sh)
+
+# Verify
+systemctl status netdata
+
+# Test
+curl http://localhost:19999
+```
+
+**Open in browser:** `http://38.68.14.26:19999`
+
+---
+
+### Phase 3: Install on NC1 (5 min)
+
+```bash
+# SSH to NC1
+ssh root@216.239.104.130
+
+# Run installer
+bash <(curl -Ss https://my-netdata.io/kickstart.sh)
+
+# Verify
+systemctl status netdata
+
+# Test
+curl http://localhost:19999
+```
+
+**Open in browser:** `http://216.239.104.130:19999`
+
+---
+
+### Phase 4: Install on Ghost VPS (5 min)
+
+```bash
+# SSH to Ghost
+ssh root@64.50.188.14
+
+# Run installer
+bash <(curl -Ss https://my-netdata.io/kickstart.sh)
+
+# Verify
+systemctl status netdata
+
+# Test
+curl http://localhost:19999
+```
+
+**Open in browser:** `http://64.50.188.14:19999`
+
+---
+
+## Post-Installation Configuration
+
+### 1. Configure UFW Firewall
+
+**On each server:**
+
+```bash
+# Allow Netdata port from Michael's management IP only
+ufw allow from MICHAEL_MANAGEMENT_IP to any port 19999 proto tcp
+
+# Verify
+ufw status | grep 19999
+```
+
+**Security note:** Netdata dashboards contain sensitive server information. Only allow access from trusted IPs.
+
+---
+
+### 2. Set Up Parent-Child Streaming (Optional)
+
+**Benefit:** View all servers from one dashboard (Command Center). The parent's firewall must also allow port 19999 from each child's IP.
+
+**On Command Center (parent):**
+
+```bash
+# Edit config
+nano /etc/netdata/stream.conf
+
+# Add (replace the placeholder key with your own, e.g. from uuidgen):
+[11111111-2222-3333-4444-555555555555]
+    enabled = yes
+    default history = 3600
+    default memory mode = save
+    health enabled by default = yes
+```
+
+**On TX1, NC1, Ghost (children):**
+
+```bash
+# Edit config
+nano /etc/netdata/stream.conf
+
+# Add:
+[stream]
+    enabled = yes
+    destination = 63.143.34.217:19999
+    api key = 11111111-2222-3333-4444-555555555555
+
+# Restart netdata
+systemctl restart netdata
+```
+
+**Result:** After Netdata is restarted on the parent as well, all server metrics are visible on the Command Center dashboard
+
+---
+
+### 3. Configure Alerts
+
+**Edit alert config:**
+
+```bash
+nano /etc/netdata/health.d/custom.conf
+```
+
+**Example alerts:**
+
+```conf
+# Alert when CPU usage > 80% for 5 minutes
+alarm: cpu_usage
+    on: system.cpu
+    calc: $user + $system
+    every: 1m
+    warn: $this > 80
+    crit: $this > 95
+    delay: up 5m down 15m
+    info: CPU usage is too high
+
+# Alert when RAM usage > 90% (cache and buffers count as reclaimable)
+alarm: ram_usage
+    on: system.ram
+    calc: $used * 100 / ($used + $cached + $free + $buffers)
+    every: 1m
+    warn: $this > 90
+    crit: $this > 95
+    delay: up 5m down 15m
+    info: RAM usage is too high
+
+# Alert when free disk space < 20% (template: applies to every disk.space chart)
+template: disk_space
+    on: disk.space
+    calc: $avail * 100 / ($avail + $used)
+    every: 1m
+    warn: $this < 20
+    crit: $this < 10
+    delay: up 5m down 15m
+    info: Disk space is running low
+```
+
+**Reload config:**
+
+```bash
+killall -USR2 netdata
+```
+
+---
+
+### 4. Discord Integration (Optional)
+
+**Set up Discord webhook for alerts:**
+
+```bash
+# Edit alarm notification config
+nano /etc/netdata/health_alarm_notify.conf
+
+# Find Discord section and configure:
+SEND_DISCORD="YES"
+DISCORD_WEBHOOK_URL="YOUR_DISCORD_WEBHOOK_URL_HERE"
+DEFAULT_RECIPIENT_DISCORD="network-alerts"
+```
+
+**Test alert:**
+
+```bash
+# Trigger test alert
+/usr/libexec/netdata/plugins.d/alarm-notify.sh test
+```
+
+Check Discord for test notification.
+
+---
+
+## Dashboard Access
+
+### Quick Access Links
+
+**Save these bookmarks:**
+
+- Command Center: http://63.143.34.217:19999
+- TX1: http://38.68.14.26:19999
+- NC1: http://216.239.104.130:19999
+- Ghost: http://64.50.188.14:19999
+
+**Unified View (if streaming configured):**
+- All servers: http://63.143.34.217:19999 → View nodes
+
+---
+
+### Key Metrics to Monitor
+
+**CPU:**
+- User % (application load)
+- System % (kernel load)
+- IOWait % (disk bottleneck indicator)
+
+**RAM:**
+- Used vs Available
+- Cache (should be high, that's good!)
+- Swap usage (should be low)
+
+**Disk:**
+- Disk space remaining
+- Read/write speeds
+- IOPS
+
+**Network:**
+- Bandwidth usage
+- Packet drops
+- Connection count
+
+**Minecraft Servers (TX1/NC1):**
+- Java heap usage
+- GC activity
+- Thread count
+
+---
+
+## Maintenance
+
+### Daily
+
+- Quick glance at dashboards (bookmark all 4)
+- Check for any red alerts
+
+### Weekly
+
+- Review CPU/RAM trends
+- Check disk space projections
+- Verify alerts working
+
+### Monthly
+
+- Review historical data
+- Adjust alert thresholds if needed
+- Update Netdata if new version available
+
+---
+
+## Updates
+
+**Update manually:**
+
+```bash
+# On each server
+/usr/libexec/netdata/netdata-updater.sh
+```
+
+**Or auto-update (recommended):**
+
+The kickstart installer enables automatic updates by default: Netdata checks for a new version daily and installs it.
+
+---
+
+## Troubleshooting
+
+### Dashboard won't load
+
+**Check service:**
+```bash
+systemctl status netdata
+```
+
+**Restart if needed:**
+```bash
+systemctl restart netdata
+```
+
+**Check firewall:**
+```bash
+ufw status | grep 19999
+telnet localhost 19999
+```
+
+---
+
+### High CPU usage from Netdata
+
+Netdata should use < 3% CPU normally.
+
+**Disable unused plugins to reduce load:**
+```bash
+# Disable some plugins if needed
+nano /etc/netdata/netdata.conf
+
+# Under [plugins], disable unused (then restart netdata):
+python.d = no
+node.d = no
+```
+
+---
+
+### Streaming not working
+
+**Verify:**
+- Parent (Command Center) has stream.conf with API key
+- Children have correct parent IP
+- Port 19999 accessible from children to parent
+- API keys match exactly
+
+**Debug:**
+```bash
+# On child
+tail -f /var/log/netdata/error.log | grep stream
+```
+
+---
+
+### Alerts not sending to Discord
+
+**Check:**
+- Discord webhook URL correct
+- `SEND_DISCORD="YES"` set
+- Test alert sent successfully
+
+**Debug:**
+```bash
+/usr/libexec/netdata/plugins.d/alarm-notify.sh test debug
+```
+
+---
+
+## Advanced Features (Optional)
+
+### Netdata Cloud (Free)
+
+**Benefits:**
+- Centralized dashboard for all servers
+- Mobile app
+- Longer data retention
+- Collaboration features
+
+**Setup:**
+
+1. Go to https://app.netdata.cloud
+2. Create free account
+3. Claim nodes:
+
+```bash
+# On each server
+netdata-claim.sh -token=YOUR_TOKEN -rooms=YOUR_ROOM -url=https://app.netdata.cloud
+```
+
+---
+
+### Custom Dashboards
+
+Create custom dashboards with specific metrics:
+
+1. Open Netdata dashboard
+2. Click "Create Dashboard"
+3. Add charts
+4. Save and share URL
+
+---
+
+## Success Criteria Checklist
+
+- [ ] Netdata installed on Command Center
+- [ ] Netdata installed on TX1
+- [ ] Netdata installed on NC1
+- [ ] Netdata installed on Ghost VPS
+- [ ] All dashboards accessible via browser
+- [ ] UFW rules configured (management IP only)
+- [ ] Alerts configured for CPU/RAM/Disk
+- [ ] (Optional) Discord integration working
+- [ ] (Optional) Parent-child streaming configured
+- [ ] Dashboards bookmarked for quick access
+
+---
+
+## Related Tasks
+
+- **Staggered Server Restart System** - Monitor impact on resources
+- **World Backup Automation** - Monitor backup job duration
+- **Command Center Security** - Part of monitoring infrastructure
+- **Frostwall Protocol** - Monitor tunnel performance
+
+---
+
+**Fire + Frost + Foundation = Where Love Builds Legacy** 💙🔥❄️
+
+---
+
+**Document Status:** COMPLETE
+**Ready for Deployment:** When SSH access available (30 minutes total)
+**Dependencies:** SSH access to all 4 servers, management IP whitelisted
+**Port Required:** 19999 (restricted to the management IP by UFW)
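
The disk-space alert in the guide's "Configure Alerts" section computes the percentage of free space as `avail * 100 / (avail + used)`, warns below 20, and goes critical below 10. As a rough sanity check on those thresholds, the arithmetic could be sketched in shell as follows. The function name `disk_status` is hypothetical and not part of Netdata; the sketch uses integer math, so results are rounded down.

```shell
#!/bin/sh
# Hypothetical helper (not a Netdata command): mirrors the guide's disk alert
# formula and thresholds using integer arithmetic.
disk_status() {
    avail=$1   # available space, any unit
    used=$2    # used space, same unit as avail
    total=$((avail + used))
    if [ "$total" -le 0 ]; then
        echo "unknown"
        return 1
    fi
    pct=$((avail * 100 / total))   # integer percentage of free space
    if [ "$pct" -lt 10 ]; then
        echo "crit $pct"
    elif [ "$pct" -lt 20 ]; then
        echo "warn $pct"
    else
        echo "ok $pct"
    fi
}

disk_status 150 850   # 15% free, between the warn and crit thresholds
disk_status 50 950    # 5% free, below the crit threshold
disk_status 400 600   # 40% free, no alert
```

Feeding it values taken from `df` for a real mount point gives a quick way to see which state the alert would be in before relying on the Discord notification.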