# Netdata Deployment - Complete Guide

**Status:** Ready to Deploy
**Priority:** Tier 2 - Infrastructure Monitoring
**Time Estimate:** 30 minutes (all servers)
**Last Updated:** 2026-02-17

---

## Overview

Deploy Netdata real-time monitoring across all Firefrost infrastructure. Netdata provides dashboards for CPU, RAM, disk, network, and application metrics, and collects standard system metrics without manual configuration.

**What is Netdata?**
- Real-time performance monitoring
- Beautiful web dashboards
- Zero configuration needed
- Extremely lightweight (< 3% CPU, ~100 MB RAM)
- Open source and free

---

## Deployment Targets

**All 4 infrastructure servers:**

1. **Command Center** (63.143.34.217) - Dallas hub
   - Services: Gitea, Uptime Kuma, Code-Server, Automation
   - Dashboard: `http://63.143.34.217:19999`

2. **TX1** (38.68.14.26) - Dallas game servers
   - Services: 5 Minecraft servers + FoundryVTT
   - Dashboard: `http://38.68.14.26:19999`

3. **NC1** (216.239.104.130) - Charlotte game servers
   - Services: 6 Minecraft servers + Hytale
   - Dashboard: `http://216.239.104.130:19999`

4. **Ghost VPS** (64.50.188.14) - Chicago staff services
   - Services: MkDocs, Wiki.js (x2), NextCloud
   - Dashboard: `http://64.50.188.14:19999`

---

## Installation (Per Server)

### One-Line Install

**On each server:**

```bash
# Install Netdata
bash <(curl -Ss https://my-netdata.io/kickstart.sh)

# The installer will:
# - Auto-detect your OS
# - Install dependencies
# - Install Netdata (a prebuilt package where one is available, otherwise built from source)
# - Start the service
# - Leave Netdata listening on port 19999 (it does not change firewall rules)
```
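
Because the same installer runs on every server, the per-server steps can be sketched as one loop. The following is a minimal dry-run sketch rather than part of the documented procedure: it only prints the commands so they can be reviewed first. `HOSTS`, `INSTALLER_URL`, and `print_install_plan` are illustrative names, the addresses are the ones listed under Deployment Targets, and it assumes root SSH access with bash as the remote login shell.

```bash
#!/usr/bin/env bash
# Dry run: print the install command for each server instead of executing it.
# HOSTS and print_install_plan are hypothetical names used for illustration.
HOSTS="63.143.34.217 38.68.14.26 216.239.104.130 64.50.188.14"
INSTALLER_URL="https://my-netdata.io/kickstart.sh"

print_install_plan() {
    for host in $HOSTS; do
        # The single quotes in the printed command keep <(...) from being
        # expanded locally; it is evaluated by the remote shell instead.
        printf "ssh root@%s 'bash <(curl -Ss %s)'\n" "$host" "$INSTALLER_URL"
    done
}

print_install_plan
```
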
**Installation takes:** 2-5 minutes per server

---

## Step-by-Step Deployment

### Phase 1: Install on Command Center (10 min)

```bash
# SSH to Command Center
ssh root@63.143.34.217

# Run installer
bash <(curl -Ss https://my-netdata.io/kickstart.sh)

# Wait for installation to complete
# Answer prompts (usually just press Enter for defaults)

# Verify installation
systemctl status netdata

# Should show: active (running)

# Test dashboard
curl http://localhost:19999

# Should return HTML
```

**Open in browser:** `http://63.143.34.217:19999`

You should see the Netdata dashboard. If UFW is already active on the server, the page will only load after the firewall rule in Post-Installation Configuration has been added.

---

### Phase 2: Install on TX1 (5 min)

```bash
# SSH to TX1
ssh root@38.68.14.26

# Run installer
bash <(curl -Ss https://my-netdata.io/kickstart.sh)

# Verify
systemctl status netdata

# Test
curl http://localhost:19999
```

**Open in browser:** `http://38.68.14.26:19999`

---

### Phase 3: Install on NC1 (5 min)

```bash
# SSH to NC1
ssh root@216.239.104.130

# Run installer
bash <(curl -Ss https://my-netdata.io/kickstart.sh)

# Verify
systemctl status netdata

# Test
curl http://localhost:19999
```

**Open in browser:** `http://216.239.104.130:19999`

---

### Phase 4: Install on Ghost VPS (5 min)

```bash
# SSH to Ghost
ssh root@64.50.188.14

# Run installer
bash <(curl -Ss https://my-netdata.io/kickstart.sh)

# Verify
systemctl status netdata

# Test
curl http://localhost:19999
```

**Open in browser:** `http://64.50.188.14:19999`

---

## Post-Installation Configuration

### 1. Configure UFW Firewall

**On each server:**

```bash
# Allow Netdata port from Michael's management IP only
ufw allow from MICHAEL_MANAGEMENT_IP to any port 19999 proto tcp

# Verify
ufw status | grep 19999
```
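
Since `MICHAEL_MANAGEMENT_IP` is a placeholder, it is easy to paste the rule without substituting a real address. As a hedged sketch, the helpers below check that the value looks like an IPv4 address and then print the rule instead of running it. `is_ipv4` and `ufw_rule_for` are hypothetical names, and `203.0.113.10` is a documentation-range address standing in for the real management IP.

```bash
# Print (not run) the UFW rule for one source address.
# is_ipv4 and ufw_rule_for are hypothetical helper names for illustration.
is_ipv4() {
    # Accept four dot-separated numbers, each in the range 0-255.
    echo "$1" | awk -F. '
        NF != 4 { exit 1 }
        { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }
    '
}

ufw_rule_for() {
    if ! is_ipv4 "$1"; then
        echo "refusing to build a rule: '$1' is not an IPv4 address" >&2
        return 1
    fi
    echo "ufw allow from $1 to any port 19999 proto tcp"
}

# 203.0.113.10 is a stand-in value, not the real management IP.
ufw_rule_for 203.0.113.10
```
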
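**Security note:** Netdata dashboards contain sensitive server information. Only allow access from trusted IPs. The rule above restricts access only when UFW is enabled with a default-deny incoming policy; `ufw status verbose` shows both.

---
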
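### 2. Set Up Parent-Child Streaming (Optional)

**Benefit:** View all servers from one dashboard (Command Center)

**Before you start:** The API key below is a placeholder. Replace it with a UUID of your own (for example, the output of `uuidgen`) and use the same value on the parent and on every child. Because the UFW rule from step 1 only admits the management IP, also allow each child IP (38.68.14.26, 216.239.104.130, 64.50.188.14) to reach port 19999 on the parent.

**On Command Center (parent):**

```bash
# Edit config
nano /etc/netdata/stream.conf

# Add these lines to the file (they are config, not shell commands):
[11111111-2222-3333-4444-555555555555]
enabled = yes
default history = 3600
default memory mode = save
health enabled = yes

# Restart netdata
systemctl restart netdata
```

**On TX1, NC1, Ghost (children):**

```bash
# Edit config
nano /etc/netdata/stream.conf

# Add these lines to the file (they are config, not shell commands):
[stream]
enabled = yes
destination = 63.143.34.217:19999
api key = 11111111-2222-3333-4444-555555555555

# Restart netdata
systemctl restart netdata
```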
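
The parent and child fragments have to carry exactly the same key, so generating both from one value avoids copy mistakes. The sketch below is an illustration under assumptions: `parent_fragment` and `child_fragment` are hypothetical helpers, the settings simply mirror the ones shown above, and the key comes from `uuidgen` with the Linux kernel's `/proc/sys/kernel/random/uuid` as a fallback. It prints the fragments and does not edit `stream.conf`.

```bash
# Print matching stream.conf fragments for the parent and the children.
# parent_fragment and child_fragment are hypothetical helper names.
parent_fragment() {
    # $1 = API key (a UUID); this section goes in the parent's stream.conf
    printf '[%s]\n    enabled = yes\n    default history = 3600\n    default memory mode = save\n    health enabled = yes\n' "$1"
}

child_fragment() {
    # $1 = parent address, $2 = API key; goes in each child's stream.conf
    printf '[stream]\n    enabled = yes\n    destination = %s:19999\n    api key = %s\n' "$1" "$2"
}

# Generate a fresh key; fall back to the kernel's UUID source if uuidgen is missing.
API_KEY="$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)"

echo "# --- parent (Command Center) ---"
parent_fragment "$API_KEY"
echo "# --- children (TX1, NC1, Ghost) ---"
child_fragment 63.143.34.217 "$API_KEY"
```
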
**Result:** All server metrics are visible on the Command Center dashboard.

---

### 3. Configure Alerts

**Edit alert config:**

```bash
nano /etc/netdata/health.d/custom.conf
```

**Example alerts:**

```yaml
# Alert when CPU usage > 80% (notifications are delayed 5 minutes to skip short spikes)
alarm: cpu_usage
on: system.cpu
calc: $user + $system
every: 1m
warn: $this > 80
crit: $this > 95
delay: up 5m down 15m
info: CPU usage is too high

# Alert when RAM usage > 90% (cache and buffers count toward the total, not toward "used")
alarm: ram_usage
on: system.ram
calc: $used * 100 / ($used + $cached + $free + $buffers)
every: 1m
warn: $this > 90
crit: $this > 95
delay: up 5m down 15m
info: RAM usage is too high

# Alert when free disk space < 20% (template: applies to every chart in the disk.space context)
template: disk_space
on: disk.space
calc: $avail * 100 / ($avail + $used)
every: 1m
warn: $this < 20
crit: $this < 10
delay: up 5m down 15m
info: Disk space is running low
```

**Reload config:**

```bash
killall -USR2 netdata
# or: netdatacli reload-health
```
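
To make the `warn` and `crit` thresholds concrete, here is the disk alert's arithmetic worked outside Netdata. It is an illustration only: `disk_free_pct` and `classify_free` are hypothetical helpers, the numbers are made up, and Netdata's own evaluation (including the `delay` behaviour) is not reproduced.

```bash
# Mirror the disk alert's calc: avail * 100 / (avail + used), integer math.
disk_free_pct() {
    # $1 = available GiB, $2 = used GiB
    echo $(( $1 * 100 / ($1 + $2) ))
}

# Apply the same thresholds as the example: warn below 20, crit below 10.
classify_free() {
    if [ "$1" -lt 10 ]; then
        echo CRITICAL
    elif [ "$1" -lt 20 ]; then
        echo WARNING
    else
        echo CLEAR
    fi
}

pct="$(disk_free_pct 15 85)"   # 15 GiB free on a 100 GiB volume
echo "free: ${pct}% -> $(classify_free "$pct")"
```
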
---

### 4. Discord Integration (Optional)

**Set up Discord webhook for alerts:**

```bash
# Edit alarm notification config
nano /etc/netdata/health_alarm_notify.conf

# Find Discord section and configure:
SEND_DISCORD="YES"
DISCORD_WEBHOOK_URL="YOUR_DISCORD_WEBHOOK_URL_HERE"
DEFAULT_RECIPIENT_DISCORD="network-alerts"
```

**Test alert:**

```bash
# Trigger test alert
# (if Netdata was installed as a static build, the script is likely under /opt/netdata instead)
/usr/libexec/netdata/plugins.d/alarm-notify.sh test
```
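
If the test alert does not arrive, it helps to know whether the webhook itself works before digging into Netdata. A Discord webhook accepts a JSON body with a `content` field, so a message can be posted directly with `curl`. The sketch below only builds and prints that body; `discord_payload` is a hypothetical helper that escapes backslashes and double quotes and nothing else, and the commented `curl` call uses the same placeholder URL as the config above.

```bash
# Build the JSON body Discord expects for a simple webhook message.
# discord_payload is a hypothetical helper; it escapes backslashes and quotes only.
discord_payload() {
    escaped="$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
    printf '{"content": "%s"}\n' "$escaped"
}

discord_payload "Netdata webhook test from $(uname -n)"

# To actually send it, substitute the real webhook URL for the placeholder:
# curl -sS -H "Content-Type: application/json" \
#      -d "$(discord_payload 'Netdata webhook test')" \
#      "YOUR_DISCORD_WEBHOOK_URL_HERE"
```
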
Check Discord for test notification.

---

## Dashboard Access

### Quick Access Links

**Save these bookmarks:**

- Command Center: http://63.143.34.217:19999
- TX1: http://38.68.14.26:19999
- NC1: http://216.239.104.130:19999
- Ghost: http://64.50.188.14:19999

**Unified View (if streaming configured):**
- All servers: http://63.143.34.217:19999 → View nodes

---

### Key Metrics to Monitor

**CPU:**
- User % (application load)
- System % (kernel load)
- IOWait % (disk bottleneck indicator)

**RAM:**
- Used vs Available
- Cache (should be high, that's good!)
- Swap usage (should be low)

**Disk:**
- Disk space remaining
- Read/write speeds
- IOPS

**Network:**
- Bandwidth usage
- Packet drops
- Connection count

**Minecraft Servers (TX1/NC1):**
- Java heap usage
- GC activity
- Thread count

---

## Maintenance

### Daily

- Quick glance at dashboards (bookmark all 4)
- Check for any red alerts

### Weekly

- Review CPU/RAM trends
- Check disk space projections
- Verify alerts are working

### Monthly

- Review historical data
- Adjust alert thresholds if needed
- Update Netdata if a new version is available

---

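## Updates

**Check for updates:**

```bash
# On each server
# The updater is usually not on PATH; this path assumes a standard (non-static) install
/usr/libexec/netdata/netdata-updater.sh
```

**Or auto-update (recommended):**

The kickstart installer enables automatic updates by default, so Netdata checks for a new version daily and installs it without manual steps.

---
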
## Troubleshooting

### Dashboard won't load

**Check service:**
```bash
systemctl status netdata
```

**Restart if needed:**
```bash
systemctl restart netdata
```

**Check firewall:**
```bash
ufw status | grep 19999
telnet localhost 19999
```
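
When more than one dashboard is affected, probing all four agents from the management machine shows quickly whether the problem is one server or the network path. This is a sketch under assumptions: `info_url` and `probe` are hypothetical helpers, and `/api/v1/info` is the agent's info endpoint, assumed to be available on the installed version. Running the block only prints the URLs; call `probe <ip>` to make a real request.

```bash
# Probe each Netdata agent's info endpoint to see which dashboards respond.
# info_url and probe are hypothetical helper names used for illustration.
info_url() {
    echo "http://$1:19999/api/v1/info"
}

probe() {
    # -f: fail on HTTP errors, -sS: quiet but show errors, -m 5: 5 second limit
    if curl -fsS -m 5 -o /dev/null "$(info_url "$1")"; then
        echo "$1 OK"
    else
        echo "$1 UNREACHABLE"
    fi
}

# Print the URLs that would be probed; run `probe <ip>` from the management IP.
for host in 63.143.34.217 38.68.14.26 216.239.104.130 64.50.188.14; do
    info_url "$host"
done
```
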
---

### High CPU usage from Netdata

Netdata should use < 3% CPU normally.

**If usage stays higher, disable plugins you do not need:**
```bash
# Edit the main config
nano /etc/netdata/netdata.conf

# Under [plugins], disable unused collectors (config lines, not shell commands):
python.d = no
node.d = no

# Restart to apply
systemctl restart netdata
```

---

### Streaming not working

**Verify:**
- Parent (Command Center) has stream.conf with API key
- Children have correct parent IP
- Port 19999 accessible from children to parent
- API keys match exactly

**Debug:**
```bash
# On child
tail -f /var/log/netdata/error.log | grep stream
```

---

### Alerts not sending to Discord

**Check:**
- Discord webhook URL correct
- `SEND_DISCORD="YES"` set
- Test alert sent successfully

**Debug:**
```bash
/usr/libexec/netdata/plugins.d/alarm-notify.sh test debug
```

---

## Advanced Features (Optional)

### Netdata Cloud (Free)

**Benefits:**
- Centralized dashboard for all servers
- Mobile app
- Longer data retention
- Collaboration features

**Setup:**

1. Go to https://app.netdata.cloud
2. Create free account
3. Claim nodes:

```bash
# On each server
netdata-claim.sh -token=YOUR_TOKEN -rooms=YOUR_ROOM -url=https://app.netdata.cloud
```

---

### Custom Dashboards

Create custom dashboards with specific metrics:

1. Open Netdata dashboard
2. Click "Create Dashboard"
3. Add charts
4. Save and share URL

---

## Success Criteria Checklist

- [ ] Netdata installed on Command Center
- [ ] Netdata installed on TX1
- [ ] Netdata installed on NC1
- [ ] Netdata installed on Ghost VPS
- [ ] All dashboards accessible via browser
- [ ] UFW rules configured (management IP only)
- [ ] Alerts configured for CPU/RAM/Disk
- [ ] (Optional) Discord integration working
- [ ] (Optional) Parent-child streaming configured
- [ ] Dashboards bookmarked for quick access

---

## Related Tasks

- **Staggered Server Restart System** - Monitor impact on resources
- **World Backup Automation** - Monitor backup job duration
- **Command Center Security** - Part of monitoring infrastructure
- **Frostwall Protocol** - Monitor tunnel performance

---

**Fire + Frost + Foundation = Where Love Builds Legacy** 💙🔥❄️

---

**Document Status:** COMPLETE
**Ready for Deployment:** When SSH access available (30 minutes total)
**Dependencies:** SSH access to all 4 servers, management IP whitelisted
**Port Required:** 19999 (restricted by UFW to the management IP)