name: server-management
description: Server management principles and decision-making. Process management, monitoring strategy, and scaling decisions. Teaches thinking, not commands.
risk: unknown
source: community
date_added: 2026-02-27

Server Management
Server management principles for production operations.
Learn to THINK, not memorize commands.
1. Process Management Principles
Tool Selection

| Scenario | Tool |
|----------|------|
| Node.js app | PM2 (clustering, reload) |
| Any app | systemd (Linux native) |
| Containers | Docker/Podman |
| Orchestration | Kubernetes, Docker Swarm |

Process Management Goals

| Goal | What It Means |
|------|---------------|
| Restart on crash | Auto-recovery |
| Zero-downtime reload | No service interruption |
| Clustering | Use all CPU cores |
| Persistence | Survive server reboot |

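
To make the "restart on crash" goal concrete, here is a minimal sketch of what a supervisor does internally. It is an illustration, not a replacement for PM2 or systemd; the function name `supervise` and its parameters are hypothetical, and the `run` and `sleep` hooks exist only so the loop can be exercised without spawning real processes.

```python
import subprocess
import time
from typing import Callable, Sequence


def supervise(
    command: Sequence[str],
    max_restarts: int = 5,
    base_delay: float = 1.0,
    run: Callable[[Sequence[str]], int] = lambda cmd: subprocess.call(cmd),
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `command` and restart it after every non-zero exit.

    Returns the number of restarts performed once the process exits
    cleanly (code 0). Raises RuntimeError when `max_restarts` is used up.
    """
    restarts = 0
    while True:
        exit_code = run(command)
        if exit_code == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(
                f"gave up after {restarts} restarts (last exit code {exit_code})"
            )
        # Exponential backoff so a crash loop does not hammer the host.
        sleep(base_delay * (2 ** restarts))
        restarts += 1
```

The restart cap and backoff are the design points worth noticing: unlimited instant restarts hide a crash loop instead of surfacing it. Production supervisors expose similar settings, so the same thinking applies when configuring them.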
2. Monitoring Principles
What to Monitor

| Category | Key Metrics |
|----------|-------------|
| Availability | Uptime, health checks |
| Performance | Response time, throughput |
| Errors | Error rate, types |
| Resources | CPU, memory, disk |

Alert Severity Strategy

| Level | Response |
|-------|----------|
| Critical | Immediate action |
| Warning | Investigate soon |
| Info | Review daily |

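
One way to apply the three severity levels is to map a metric reading onto them with two thresholds. The sketch below assumes a "higher is worse" metric such as CPU or disk usage percent; the names `Severity` and `classify` and the example thresholds are hypothetical.

```python
from enum import Enum


class Severity(Enum):
    INFO = "info"          # review daily
    WARNING = "warning"    # investigate soon
    CRITICAL = "critical"  # immediate action


def classify(value: float, warning_at: float, critical_at: float) -> Severity:
    """Map a higher-is-worse metric reading onto a severity level."""
    if critical_at < warning_at:
        raise ValueError("critical_at must not be lower than warning_at")
    if value >= critical_at:
        return Severity.CRITICAL
    if value >= warning_at:
        return Severity.WARNING
    return Severity.INFO
```

For example, `classify(85, warning_at=80, critical_at=90)` yields `Severity.WARNING`. Keeping the thresholds as explicit arguments makes it easy to review whether each alert really deserves the response its level demands.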
Monitoring Tool Selection

| Need | Options |
|------|---------|
| Simple/Free | PM2 metrics, htop |
| Full observability | Grafana, Datadog |
| Error tracking | Sentry |
| Uptime | UptimeRobot, Pingdom |

3. Log Management Principles
Log Strategy

| Log Type | Purpose |
|----------|---------|
| Application logs | Debug, audit |
| Access logs | Traffic analysis |
| Error logs | Issue detection |

Log Principles
Rotate logs to prevent disk fill
Structured logging (JSON) for parsing
Appropriate levels (error/warn/info/debug)
No sensitive data in logs
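
The principles above can be combined in a few lines using Python's standard `logging` module. This is a sketch under assumptions: the `SENSITIVE_KEYS` set, the `context` attribute, the logger name `app`, and the rotation sizes are all illustrative choices, not values from this document.

```python
import json
import logging
import logging.handlers

# Hypothetical deny-list; extend it to match the fields your app handles.
SENSITIVE_KEYS = {"password", "token", "authorization", "secret"}


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, redacting sensitive context keys."""

    def format(self, record: logging.LogRecord) -> str:
        context = getattr(record, "context", {})
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "context": {
                key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
                for key, value in context.items()
            },
        }
        return json.dumps(payload)


def build_logger(path: str) -> logging.Logger:
    """Attach a size-rotated JSON file handler to the 'app' logger."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=10_000_000, backupCount=5  # assumed limits
    )
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

A call such as `logger.info("login", extra={"context": {"user_id": 42, "token": "abc"}})` would then write the user id but replace the token with `[REDACTED]`. A key deny-list is only a safety net; the more reliable habit is not passing secrets to the logger in the first place.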
4. Scaling Decisions
When to Scale

| Symptom | Solution |
|---------|----------|
| High CPU | Add instances (horizontal) |
| High memory | Increase RAM or fix leak |
| Slow response | Profile first, then scale |
| Traffic spikes | Auto-scaling |

Scaling Strategy

| Type | When to Use |
|------|-------------|
| Vertical | Quick fix, single instance |
| Horizontal | Sustainable, distributed |
| Auto | Variable traffic |

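
For the auto-scaling case, the core decision is usually a proportional rule: scale the instance count by the ratio of observed load to target load, then clamp it. The sketch below follows that idea, which is similar to the rule documented for the Kubernetes Horizontal Pod Autoscaler; the function name, parameters, and default bounds are hypothetical.

```python
import math


def desired_instances(
    current: int,
    observed_cpu: float,
    target_cpu: float,
    minimum: int = 1,
    maximum: int = 10,
) -> int:
    """Return the instance count that would bring average CPU near the target.

    `observed_cpu` and `target_cpu` use the same unit (for example percent).
    """
    if current < 1 or target_cpu <= 0:
        raise ValueError("current must be >= 1 and target_cpu must be > 0")
    wanted = math.ceil(current * observed_cpu / target_cpu)
    # Clamp so a metrics glitch cannot scale to zero or to a runaway bill.
    return max(minimum, min(maximum, wanted))
```

With 4 instances at 90% CPU and a 60% target, the rule asks for 6 instances. The minimum and maximum bounds matter as much as the formula, since they cap the damage when the input metric is wrong.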
5. Health Check Principles
What Constitutes Healthy

| Check | Meaning |
|-------|---------|
| HTTP 200 | Service responding |
| Database connected | Data accessible |
| Dependencies OK | External services reachable |
| Resources OK | CPU/memory not exhausted |

Health Check Implementation
Simple: Just return 200
Deep: Check all dependencies
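Choose based on what the load balancer needs: a simple check suits frequent liveness probes, while a deep check suits deciding whether an instance should receive traffic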
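
The difference between the two styles can be sketched as two small functions that return a status code and a body. The names `shallow_health` and `deep_health`, the `checks` mapping, and the choice of 503 for an unhealthy result are assumptions for illustration; wiring them into a web framework is left out.

```python
from typing import Callable, Dict, Tuple


def shallow_health() -> Tuple[int, dict]:
    """Simple check: the process is up and able to answer."""
    return 200, {"status": "ok"}


def deep_health(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Deep check: run every dependency probe and report each result."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A probe that raises counts as a failed dependency.
            results[name] = False
    healthy = all(results.values())
    status = "ok" if healthy else "degraded"
    return (200 if healthy else 503), {"status": status, "checks": results}
```

A caller might pass `{"database": ping_db, "payments_api": ping_payments}`, where both probes are hypothetical functions returning `True` on success. Deep checks cost more per call and can take an instance out of rotation when a shared dependency blips, which is why the choice depends on who consumes the endpoint.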
6. Security Principles

| Area | Principle |
|------|-----------|
| Access | SSH keys only, no passwords |
| Firewall | Only needed ports open |
| Updates | Regular security patches |
| Secrets | Environment vars, not files |
| Audit | Log access and changes |

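
As an illustration of the secrets principle, an application can read its secrets from the environment at startup and fail fast when one is missing. This is a minimal sketch; `require_secret`, `MissingSecretError`, and the variable name `DB_PASSWORD` used below are hypothetical.

```python
import os
from typing import Mapping


class MissingSecretError(RuntimeError):
    """Raised at startup when a required secret is absent."""


def require_secret(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the secret stored in the environment variable `name`.

    The error message contains only the variable name, never a value,
    so it is safe to log.
    """
    value = environ.get(name, "")
    if not value:
        raise MissingSecretError(f"required secret {name} is not set")
    return value
```

Calling `require_secret("DB_PASSWORD")` during startup turns a misconfigured deployment into an immediate, obvious failure instead of a confusing error on the first database query.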
7. Troubleshooting Priority
When something's wrong:
1. Check if running (process status)
2. Check logs (error messages)
3. Check resources (disk, memory, CPU)
4. Check network (ports, DNS)
5. Check dependencies (database, APIs)
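
The ordering above can be encoded as a list of named probes that stops at the first failure, so attention goes to the earliest broken layer. The sketch below uses only the standard library; the helper names, the 10% free-disk threshold, and the timeout are assumptions, and only the resource and network probes are shown.

```python
import shutil
import socket
from typing import Callable, List, Optional, Tuple


def disk_ok(path: str = "/", min_free_fraction: float = 0.10) -> bool:
    """Resources: is at least `min_free_fraction` of the disk free?"""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Network: can a TCP connection be opened to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def dns_resolves(hostname: str) -> bool:
    """Network: does the hostname resolve to at least one address?"""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def first_failure(steps: List[Tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Run probes in priority order; return the first failing name, or None."""
    for name, check in steps:
        if not check():
            return name
    return None
```

A run might look like `first_failure([("resources", disk_ok), ("network", lambda: port_open("127.0.0.1", 8080))])`, where the port is a placeholder. Stopping at the first failure is deliberate: a full disk often explains the errors seen further down the list.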
8. Anti-Patterns

| ❌ Don't | ✅ Do |
|----------|-------|
| Run as root | Use non-root user |
| Ignore logs | Set up log rotation |
| Skip monitoring | Monitor from day one |
| Manual restarts | Auto-restart config |
| No backups | Regular backup schedule |

Remember: A well-managed server is boring. That's the goal.
When to Use
Use this skill when carrying out the workflow or actions described in the overview.