Files
antigravity-skills-reference/skills/server-management/SKILL.md
sck_0 aa71e76eb9 chore: release 6.5.0 - Community & Experience
- Add date_added to all 950+ skills for complete tracking
- Update version to 6.5.0 in package.json and README
- Regenerate all indexes and catalog
- Sync all generated files

Features from merged PR #150:
- Stars/Upvotes system for community-driven discovery
- Auto-update mechanism via START_APP.bat
- Interactive Prompt Builder
- Date tracking badges
- Smart auto-categorization

All skills validated and indexed.

Made-with: Cursor
2026-02-27 09:19:41 +01:00

167 lines
3.8 KiB
Markdown

---
name: server-management
description: "Server management principles and decision-making. Process management, monitoring strategy, and scaling decisions. Teaches thinking, not commands."
risk: unknown
source: community
date_added: "2026-02-27"
---
# Server Management
> Server management principles for production operations.
> **Learn to THINK, not memorize commands.**
---
## 1. Process Management Principles
### Tool Selection
| Scenario | Tool |
|----------|------|
| **Node.js app** | PM2 (clustering, reload) |
| **Any app** | systemd (Linux native) |
| **Containers** | Docker/Podman |
| **Orchestration** | Kubernetes, Docker Swarm |
### Process Management Goals
| Goal | What It Means |
|------|---------------|
| **Restart on crash** | Auto-recovery |
| **Zero-downtime reload** | No service interruption |
| **Clustering** | Use all CPU cores |
| **Persistence** | Survive server reboot |
---
## 2. Monitoring Principles
### What to Monitor
| Category | Key Metrics |
|----------|-------------|
| **Availability** | Uptime, health checks |
| **Performance** | Response time, throughput |
| **Errors** | Error rate, types |
| **Resources** | CPU, memory, disk |
### Alert Severity Strategy
| Level | Response |
|-------|----------|
| **Critical** | Immediate action |
| **Warning** | Investigate soon |
| **Info** | Review daily |
### Monitoring Tool Selection
| Need | Options |
|------|---------|
| Simple/Free | PM2 metrics, htop |
| Full observability | Grafana, Datadog |
| Error tracking | Sentry |
| Uptime | UptimeRobot, Pingdom |
---
## 3. Log Management Principles
### Log Strategy
| Log Type | Purpose |
|----------|---------|
| **Application logs** | Debug, audit |
| **Access logs** | Traffic analysis |
| **Error logs** | Issue detection |
### Log Principles
1. **Rotate logs** to prevent disk fill
2. **Structured logging** (JSON) for parsing
3. **Appropriate levels** (error/warn/info/debug)
4. **No sensitive data** in logs
---
## 4. Scaling Decisions
### When to Scale
| Symptom | Solution |
|---------|----------|
| High CPU | Add instances (horizontal) |
| High memory | Increase RAM or fix leak |
| Slow response | Profile first, then scale |
| Traffic spikes | Auto-scaling |
### Scaling Strategy
| Type | When to Use |
|------|-------------|
| **Vertical** | Quick fix, single instance |
| **Horizontal** | Sustainable, distributed |
| **Auto** | Variable traffic |
---
## 5. Health Check Principles
### What Constitutes Healthy
| Check | Meaning |
|-------|---------|
| **HTTP 200** | Service responding |
| **Database connected** | Data accessible |
| **Dependencies OK** | External services reachable |
| **Resources OK** | CPU/memory not exhausted |
### Health Check Implementation
- Simple: Just return 200
- Deep: Check all dependencies
- Choose based on load balancer needs
---
## 6. Security Principles
| Area | Principle |
|------|-----------|
| **Access** | SSH keys only, no passwords |
| **Firewall** | Only needed ports open |
| **Updates** | Regular security patches |
| **Secrets** | Environment vars, not files |
| **Audit** | Log access and changes |
---
## 7. Troubleshooting Priority
When something's wrong:
1. **Check if running** (process status)
2. **Check logs** (error messages)
3. **Check resources** (disk, memory, CPU)
4. **Check network** (ports, DNS)
5. **Check dependencies** (database, APIs)
---
## 8. Anti-Patterns
| ❌ Don't | ✅ Do |
|----------|-------|
| Run as root | Use non-root user |
| Ignore logs | Set up log rotation |
| Skip monitoring | Monitor from day one |
| Manual restarts | Auto-restart config |
| No backups | Regular backup schedule |
---
> **Remember:** A well-managed server is boring. That's the goal.
## When to Use
This skill is applicable to execute the workflow or actions described in the overview.