revert: restore DOCKER_GUIDE.md and KUBERNETES_GUIDE.md

These files were incorrectly deleted — they have distinct content from
the *_DEPLOYMENT.md files (different structure, different focus, different
examples) and are not duplicates.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
yusyus · 2026-02-18 22:24:34 +03:00
commit 66c823107e (parent 0cbe151c40)
2 changed files with 1532 additions and 0 deletions

docs/DOCKER_GUIDE.md (new file, 575 lines)
# Docker Deployment Guide
Complete guide for deploying Skill Seekers using Docker and Docker Compose.
## Quick Start
### 1. Prerequisites
- Docker 20.10+ installed
- Docker Compose 2.0+ installed
- 2GB+ available RAM
- 5GB+ available disk space
```bash
# Check Docker installation
docker --version
docker-compose --version
```
### 2. Clone Repository
```bash
git clone https://github.com/your-org/skill-seekers.git
cd skill-seekers
```
### 3. Configure Environment
```bash
# Copy environment template
cp .env.example .env
# Edit .env with your API keys
nano .env # or your preferred editor
```
**Minimum Required:**
- `ANTHROPIC_API_KEY` - For AI enhancement features
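A minimal `.env` might contain just the required key (values below are placeholders):

```
# .env (do not commit this file)
ANTHROPIC_API_KEY=sk-ant-your-key

# Optional providers
# GOOGLE_API_KEY=
# OPENAI_API_KEY=
# GITHUB_TOKEN=
```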
### 4. Start Services
```bash
# Start all services (CLI + MCP server + vector DBs)
docker-compose up -d
# Or start specific services
docker-compose up -d mcp-server weaviate
```
### 5. Verify Deployment
```bash
# Check service status
docker-compose ps
# Test CLI
docker-compose run skill-seekers skill-seekers --version
# Test MCP server
curl http://localhost:8765/health
```
---
## Available Images
### 1. skill-seekers (CLI)
**Purpose:** Main CLI application for documentation scraping and skill generation
**Usage:**
```bash
# Run CLI command
docker run --rm \
-v $(pwd)/output:/output \
-e ANTHROPIC_API_KEY=your-key \
skill-seekers skill-seekers scrape --config /configs/react.json
# Interactive shell
docker run -it --rm skill-seekers bash
```
**Image Size:** ~400MB
**Platforms:** linux/amd64, linux/arm64
### 2. skill-seekers-mcp (MCP Server)
**Purpose:** MCP server with 25 tools for AI assistants
**Usage:**
```bash
# HTTP mode (default)
docker run -d -p 8765:8765 \
-e ANTHROPIC_API_KEY=your-key \
skill-seekers-mcp
# Stdio mode
docker run -it \
-e ANTHROPIC_API_KEY=your-key \
skill-seekers-mcp \
python -m skill_seekers.mcp.server_fastmcp --transport stdio
```
**Image Size:** ~450MB
**Platforms:** linux/amd64, linux/arm64
**Health Check:** http://localhost:8765/health
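In a deploy script, the health endpoint can be polled until the server is ready. A minimal sketch in Python (standard library only; it assumes the endpoint returns a JSON body with a `status` field, e.g. `{"status": "ok"}`):

```python
# Poll the MCP server's /health endpoint until it reports ready.
import json
import time
import urllib.request

def wait_for_health(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Return True once `url` answers with {"status": "ok"}, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if json.load(resp).get("status") == "ok":
                    return True
        except (OSError, ValueError):
            pass  # server not up yet, connection refused, or non-JSON body; retry
        time.sleep(interval)
    return False

# Example: wait_for_health("http://localhost:8765/health")
```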
---
## Docker Compose Services
### Service Architecture
```
┌─────────────────────┐
│ skill-seekers │ CLI Application
└─────────────────────┘
┌─────────────────────┐
│ mcp-server │ MCP Server (25 tools)
│ Port: 8765 │
└─────────────────────┘
┌─────────────────────┐
│ weaviate │ Vector DB (hybrid search)
│ Port: 8080 │
└─────────────────────┘
┌─────────────────────┐
│ qdrant │ Vector DB (native filtering)
│ Ports: 6333/6334 │
└─────────────────────┘
┌─────────────────────┐
│ chroma │ Vector DB (local-first)
│ Port: 8000 │
└─────────────────────┘
```
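For orientation, a stripped-down `docker-compose.yml` wiring the MCP server to one vector database might look like the sketch below (image names and options here are illustrative assumptions; the repository's actual compose file is authoritative):

```yaml
services:
  mcp-server:
    image: skill-seekers-mcp
    ports:
      - "8765:8765"
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    depends_on:
      - weaviate
  weaviate:
    image: semitechnologies/weaviate:latest
    ports:
      - "8080:8080"
    volumes:
      - weaviate-data:/var/lib/weaviate
volumes:
  weaviate-data:
```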
### Service Commands
```bash
# Start all services
docker-compose up -d
# Start specific services
docker-compose up -d mcp-server weaviate
# Stop all services
docker-compose down
# View logs
docker-compose logs -f mcp-server
# Restart service
docker-compose restart mcp-server
# Scale service (if supported)
docker-compose up -d --scale mcp-server=3
```
---
## Common Use Cases
### Use Case 1: Scrape Documentation
```bash
# Create skill from React documentation
docker-compose run skill-seekers \
skill-seekers scrape --config /configs/react.json
# Output will be in ./output/react/
```
### Use Case 2: Export to Vector Databases
```bash
# Export React skill to all vector databases
docker-compose run skill-seekers bash -c "
skill-seekers scrape --config /configs/react.json &&
python -c '
import sys
from pathlib import Path
sys.path.insert(0, \"/app/src\")
from skill_seekers.cli.adaptors import get_adaptor
for target in [\"weaviate\", \"chroma\", \"faiss\", \"qdrant\"]:
adaptor = get_adaptor(target)
adaptor.package(Path(\"/output/react\"), Path(\"/output\"))
print(f\"✅ Exported to {target}\")
'
"
```
### Use Case 3: Run Quality Analysis
```bash
# Generate quality report for a skill
docker-compose run skill-seekers bash -c "
python3 <<'EOF'
import sys
from pathlib import Path
sys.path.insert(0, '/app/src')
from skill_seekers.cli.quality_metrics import QualityAnalyzer
analyzer = QualityAnalyzer(Path('/output/react'))
report = analyzer.generate_report()
print(analyzer.format_report(report))
EOF
"
```
### Use Case 4: MCP Server Integration
```bash
# Start MCP server
docker-compose up -d mcp-server
```

Then add the server to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "skill-seekers": {
      "url": "http://localhost:8765/sse"
    }
  }
}
```
---
## Volume Management
### Default Volumes
| Volume | Path | Purpose |
|--------|------|---------|
| `./data` | `/data` | Persistent data (cache, logs) |
| `./configs` | `/configs` | Configuration files (read-only) |
| `./output` | `/output` | Generated skills and exports |
| `weaviate-data` | N/A | Weaviate database storage |
| `qdrant-data` | N/A | Qdrant database storage |
| `chroma-data` | N/A | Chroma database storage |
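The bind mounts above map into the container roughly like this (illustrative excerpt; the repository's `docker-compose.yml` is authoritative):

```yaml
services:
  skill-seekers:
    volumes:
      - ./data:/data           # persistent cache and logs
      - ./configs:/configs:ro  # configuration files, read-only
      - ./output:/output       # generated skills and exports
```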
### Backup Volumes
```bash
# Backup vector database data
docker run --rm -v skill-seekers_weaviate-data:/data -v $(pwd):/backup \
alpine tar czf /backup/weaviate-backup.tar.gz -C /data .
# Restore from backup
docker run --rm -v skill-seekers_weaviate-data:/data -v $(pwd):/backup \
alpine tar xzf /backup/weaviate-backup.tar.gz -C /data
```
### Clean Up Volumes
```bash
# Remove all volumes (WARNING: deletes all data)
docker-compose down -v
# Remove specific volume
docker volume rm skill-seekers_weaviate-data
```
---
## Environment Variables
### Required Variables
| Variable | Description | Example |
|----------|-------------|---------|
| `ANTHROPIC_API_KEY` | Claude AI API key | `sk-ant-...` |
### Optional Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `GOOGLE_API_KEY` | Gemini API key | - |
| `OPENAI_API_KEY` | OpenAI API key | - |
| `GITHUB_TOKEN` | GitHub API token | - |
| `MCP_TRANSPORT` | MCP transport mode | `http` |
| `MCP_PORT` | MCP server port | `8765` |
### Setting Variables
**Option 1: .env file (recommended)**
```bash
cp .env.example .env
# Edit .env with your keys
```
**Option 2: Export in shell**
```bash
export ANTHROPIC_API_KEY=sk-ant-your-key
docker-compose up -d
```
**Option 3: Inline**
```bash
ANTHROPIC_API_KEY=sk-ant-your-key docker-compose up -d
```
---
## Building Images Locally
### Build CLI Image
```bash
docker build -t skill-seekers:local -f Dockerfile .
```
### Build MCP Server Image
```bash
docker build -t skill-seekers-mcp:local -f Dockerfile.mcp .
```
### Build with Custom Base Image
```bash
# Use slim base (smaller)
docker build -t skill-seekers:slim \
--build-arg BASE_IMAGE=python:3.12-slim \
-f Dockerfile .
# Use alpine base (smallest)
docker build -t skill-seekers:alpine \
--build-arg BASE_IMAGE=python:3.12-alpine \
-f Dockerfile .
```
---
## Troubleshooting
### Issue: MCP Server Won't Start
**Symptoms:**
- Container exits immediately
- Health check fails
**Solutions:**
```bash
# Check logs
docker-compose logs mcp-server
# Verify port is available
lsof -i :8765
# Test MCP package installation
docker-compose run mcp-server python -c "import mcp; print('OK')"
```
### Issue: Permission Denied
**Symptoms:**
- Cannot write to /output
- Cannot access /configs
**Solutions:**
```bash
# Loosen permissions (quick fix; 777 is world-writable, so local dev only)
chmod -R 777 data/ output/
# Or use specific user ID
docker-compose run -u $(id -u):$(id -g) skill-seekers ...
```
### Issue: Out of Memory
**Symptoms:**
- Container killed
- OOMKilled in `docker-compose ps`
**Solutions:**
Increase the container memory limit in `docker-compose.yml`:

```yaml
services:
  skill-seekers:
    mem_limit: 4g
    memswap_limit: 4g
```

Or use streaming mode for large documentation sets:

```bash
docker-compose run skill-seekers \
  skill-seekers scrape --config /configs/react.json --streaming
```
### Issue: Vector Database Connection Failed
**Symptoms:**
- Cannot connect to Weaviate/Qdrant/Chroma
- Connection refused errors
**Solutions:**
```bash
# Check if services are running
docker-compose ps
# Test connectivity
docker-compose exec skill-seekers curl http://weaviate:8080
docker-compose exec skill-seekers curl http://qdrant:6333
docker-compose exec skill-seekers curl http://chroma:8000
# Restart services
docker-compose restart weaviate qdrant chroma
```
### Issue: Slow Performance
**Symptoms:**
- Long scraping times
- Slow container startup
**Solutions:**
```bash
# Use smaller image
docker pull skill-seekers:slim

# Enable BuildKit cache
export DOCKER_BUILDKIT=1
docker build -t skill-seekers:local .
```

To allocate more CPU, set `cpus` on the service in `docker-compose.yml` (`--cpu-shares` is a `docker run` flag and is not accepted by `docker-compose up`):

```yaml
services:
  skill-seekers:
    cpus: "2.0"
```
---
## Production Deployment
### Security Hardening
1. **Use secrets management**
```bash
# Docker secrets (Swarm mode)
echo "sk-ant-your-key" | docker secret create anthropic_key -
# Kubernetes secrets
kubectl create secret generic skill-seekers-secrets \
--from-literal=anthropic-api-key=sk-ant-your-key
```
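In Swarm mode, the secret created above can then be mounted into the service; the container reads it from `/run/secrets/anthropic_key` instead of an environment variable (illustrative compose excerpt):

```yaml
services:
  mcp-server:
    image: skill-seekers-mcp
    secrets:
      - anthropic_key
secrets:
  anthropic_key:
    external: true
```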
2. **Run as non-root**
```dockerfile
# Already configured in Dockerfile; the skillseeker user has UID 1000
USER skillseeker
```
3. **Read-only filesystems**
```yaml
# docker-compose.yml
services:
mcp-server:
read_only: true
tmpfs:
- /tmp
```
4. **Resource limits**
```yaml
services:
mcp-server:
deploy:
resources:
limits:
cpus: '2.0'
memory: 2G
reservations:
cpus: '0.5'
memory: 512M
```
### Monitoring
1. **Health checks**
```bash
# Check all services
docker-compose ps
# Detailed health status
docker inspect --format='{{.State.Health.Status}}' skill-seekers-mcp
```
2. **Logs**
```bash
# Stream logs
docker-compose logs -f --tail=100
# Export logs
docker-compose logs > skill-seekers-logs.txt
```
3. **Metrics**
```bash
# Resource usage
docker stats
# Container inspect
docker-compose exec mcp-server ps aux
docker-compose exec mcp-server df -h
```
### Scaling
1. **Horizontal scaling**
```bash
# Scale MCP servers
docker-compose up -d --scale mcp-server=3
# Use load balancer
# Add nginx/haproxy in docker-compose.yml
```
2. **Vertical scaling**
```yaml
# Increase resources
services:
mcp-server:
deploy:
resources:
limits:
cpus: '4.0'
memory: 8G
```
---
## Best Practices
### 1. Use Multi-Stage Builds
✅ Already implemented in Dockerfile
- Builder stage for dependencies
- Runtime stage for production
### 2. Minimize Image Size
- Use slim base images
- Clean up apt cache
- Remove unnecessary files via .dockerignore
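A typical `.dockerignore` for a Python project like this might exclude the following (illustrative sketch; tune to the actual repository layout):

```
.git
__pycache__/
*.pyc
.venv/
data/
output/
tests/
```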
### 3. Security
- Run as non-root user (UID 1000)
- Use secrets for sensitive data
- Keep images updated
### 4. Persistence
- Use named volumes for databases
- Mount ./output for generated skills
- Regular backups of vector DB data
### 5. Monitoring
- Enable health checks
- Stream logs to external service
- Monitor resource usage
---
## Additional Resources
- [Docker Documentation](https://docs.docker.com/)
- [Docker Compose Reference](https://docs.docker.com/compose/compose-file/)
- [Skill Seekers Documentation](https://skillseekersweb.com/)
- [MCP Server Setup](docs/MCP_SETUP.md)
- [Vector Database Integration](docs/strategy/WEEK2_COMPLETE.md)
---
**Last Updated:** February 7, 2026
**Docker Version:** 20.10+
**Compose Version:** 2.0+

docs/KUBERNETES_GUIDE.md (new file, 957 lines)
# Kubernetes Deployment Guide
Complete guide for deploying Skill Seekers to Kubernetes using Helm charts.
## Table of Contents
- [Prerequisites](#prerequisites)
- [Quick Start](#quick-start)
- [Installation Methods](#installation-methods)
- [Configuration](#configuration)
- [Accessing Services](#accessing-services)
- [Scaling](#scaling)
- [Persistence](#persistence)
- [Vector Databases](#vector-databases)
- [Security](#security)
- [Monitoring](#monitoring)
- [Troubleshooting](#troubleshooting)
- [Production Best Practices](#production-best-practices)
## Prerequisites
### Required
- Kubernetes cluster (1.23+)
- Helm 3.8+
- kubectl configured for your cluster
- 20GB+ available storage (for persistence)
### Recommended
- Ingress controller (nginx, traefik)
- cert-manager (for TLS certificates)
- Prometheus operator (for monitoring)
- Persistent storage provisioner
### Cluster Resource Requirements
**Minimum (Development):**
- 2 CPU cores
- 8GB RAM
- 20GB storage
**Recommended (Production):**
- 8+ CPU cores
- 32GB+ RAM
- 200GB+ storage (persistent volumes)
## Quick Start
### 1. Add Helm Repository (if published)
```bash
# Add Helm repo
helm repo add skill-seekers https://yourusername.github.io/skill-seekers
helm repo update
# Install with default values
helm install my-skill-seekers skill-seekers/skill-seekers \
--create-namespace \
--namespace skill-seekers
```
### 2. Install from Local Chart
```bash
# Clone repository
git clone https://github.com/yourusername/skill-seekers.git
cd skill-seekers
# Install chart
helm install my-skill-seekers ./helm/skill-seekers \
--create-namespace \
--namespace skill-seekers
```
### 3. Quick Test
```bash
# Port-forward MCP server
kubectl port-forward -n skill-seekers svc/my-skill-seekers-mcp 8765:8765
# Test health endpoint
curl http://localhost:8765/health
# Expected response: {"status": "ok"}
```
## Installation Methods
### Method 1: Minimal Installation (Testing)
Smallest deployment for testing - no persistence, no vector databases.
```bash
helm install my-skill-seekers ./helm/skill-seekers \
--namespace skill-seekers \
--create-namespace \
--set persistence.enabled=false \
--set vectorDatabases.weaviate.enabled=false \
--set vectorDatabases.qdrant.enabled=false \
--set vectorDatabases.chroma.enabled=false \
--set mcpServer.replicaCount=1 \
--set mcpServer.autoscaling.enabled=false
```
### Method 2: Development Installation
Moderate resources with persistence for local development.
```bash
helm install my-skill-seekers ./helm/skill-seekers \
--namespace skill-seekers \
--create-namespace \
--set persistence.data.size=5Gi \
--set persistence.output.size=10Gi \
--set vectorDatabases.weaviate.persistence.size=20Gi \
--set mcpServer.replicaCount=1 \
--set secrets.anthropicApiKey="sk-ant-..."
```
### Method 3: Production Installation
Full production deployment with autoscaling, persistence, and all vector databases.
```bash
helm install my-skill-seekers ./helm/skill-seekers \
--namespace skill-seekers \
--create-namespace \
--values production-values.yaml
```
**production-values.yaml:**
```yaml
global:
environment: production
mcpServer:
enabled: true
replicaCount: 3
autoscaling:
enabled: true
minReplicas: 3
maxReplicas: 20
targetCPUUtilizationPercentage: 70
resources:
limits:
cpu: 2000m
memory: 4Gi
requests:
cpu: 500m
memory: 1Gi
persistence:
data:
size: 20Gi
storageClass: "fast-ssd"
output:
size: 50Gi
storageClass: "fast-ssd"
vectorDatabases:
weaviate:
enabled: true
persistence:
size: 100Gi
storageClass: "fast-ssd"
qdrant:
enabled: true
persistence:
size: 100Gi
storageClass: "fast-ssd"
chroma:
enabled: true
persistence:
size: 50Gi
storageClass: "fast-ssd"
ingress:
enabled: true
className: nginx
annotations:
cert-manager.io/cluster-issuer: "letsencrypt-prod"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
hosts:
- host: skill-seekers.example.com
paths:
- path: /mcp
pathType: Prefix
backend:
service:
name: mcp
port: 8765
tls:
- secretName: skill-seekers-tls
hosts:
- skill-seekers.example.com
secrets:
anthropicApiKey: "sk-ant-..."
googleApiKey: ""
openaiApiKey: ""
githubToken: ""
```
### Method 4: Custom Values Installation
```bash
# Create custom values
cat > my-values.yaml <<EOF
mcpServer:
replicaCount: 2
resources:
requests:
cpu: 1000m
memory: 2Gi
secrets:
anthropicApiKey: "sk-ant-..."
EOF
# Install with custom values
helm install my-skill-seekers ./helm/skill-seekers \
--namespace skill-seekers \
--create-namespace \
--values my-values.yaml
```
## Configuration
### API Keys and Secrets
**Option 1: Via Helm values (NOT recommended for production)**
```bash
helm install my-skill-seekers ./helm/skill-seekers \
--set secrets.anthropicApiKey="sk-ant-..." \
--set secrets.githubToken="ghp_..."
```
**Option 2: Create Secret first (Recommended)**
```bash
# Create secret
kubectl create secret generic skill-seekers-secrets \
--from-literal=ANTHROPIC_API_KEY="sk-ant-..." \
--from-literal=GITHUB_TOKEN="ghp_..." \
--namespace skill-seekers
# Reference in values
# (Chart already uses the secret name pattern)
helm install my-skill-seekers ./helm/skill-seekers \
--namespace skill-seekers
```
**Option 3: External Secrets Operator**
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: skill-seekers-secrets
namespace: skill-seekers
spec:
secretStoreRef:
name: aws-secrets-manager
kind: SecretStore
target:
name: skill-seekers-secrets
data:
- secretKey: ANTHROPIC_API_KEY
remoteRef:
key: skill-seekers/anthropic-api-key
```
### Environment Variables
Customize via ConfigMap values:
```yaml
env:
MCP_TRANSPORT: "http"
MCP_PORT: "8765"
PYTHONUNBUFFERED: "1"
CUSTOM_VAR: "value"
```
### Resource Limits
**Development:**
```yaml
mcpServer:
resources:
limits:
cpu: 1000m
memory: 2Gi
requests:
cpu: 250m
memory: 512Mi
```
**Production:**
```yaml
mcpServer:
resources:
limits:
cpu: 4000m
memory: 8Gi
requests:
cpu: 1000m
memory: 2Gi
```
## Accessing Services
### Port Forwarding (Development)
```bash
# MCP Server
kubectl port-forward -n skill-seekers svc/my-skill-seekers-mcp 8765:8765
# Weaviate
kubectl port-forward -n skill-seekers svc/my-skill-seekers-weaviate 8080:8080
# Qdrant
kubectl port-forward -n skill-seekers svc/my-skill-seekers-qdrant 6333:6333
# Chroma
kubectl port-forward -n skill-seekers svc/my-skill-seekers-chroma 8000:8000
```
### Via LoadBalancer
```yaml
mcpServer:
service:
type: LoadBalancer
```
Get external IP:
```bash
kubectl get svc -n skill-seekers my-skill-seekers-mcp
```
### Via Ingress (Production)
```yaml
ingress:
enabled: true
className: nginx
hosts:
- host: skill-seekers.example.com
paths:
- path: /mcp
pathType: Prefix
backend:
service:
name: mcp
port: 8765
```
Access at: `https://skill-seekers.example.com/mcp`
## Scaling
### Manual Scaling
```bash
# Scale MCP server
kubectl scale deployment -n skill-seekers my-skill-seekers-mcp --replicas=5
# Scale Weaviate
kubectl scale deployment -n skill-seekers my-skill-seekers-weaviate --replicas=3
```
### Horizontal Pod Autoscaler
Enabled by default for MCP server:
```yaml
mcpServer:
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 10
targetCPUUtilizationPercentage: 70
targetMemoryUtilizationPercentage: 80
```
Monitor HPA:
```bash
kubectl get hpa -n skill-seekers
kubectl describe hpa -n skill-seekers my-skill-seekers-mcp
```
### Vertical Scaling
Update resource requests/limits:
```bash
helm upgrade my-skill-seekers ./helm/skill-seekers \
--namespace skill-seekers \
--set mcpServer.resources.requests.cpu=2000m \
--set mcpServer.resources.requests.memory=4Gi \
--reuse-values
```
## Persistence
### Storage Classes
Specify storage class for different workloads:
```yaml
persistence:
data:
storageClass: "fast-ssd" # Frequently accessed
output:
storageClass: "standard" # Archive storage
configs:
storageClass: "fast-ssd" # Configuration files
```
### PVC Management
```bash
# List PVCs
kubectl get pvc -n skill-seekers
# Expand PVC (if storage class supports it)
kubectl patch pvc my-skill-seekers-data \
-n skill-seekers \
-p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'
# View PVC details
kubectl describe pvc -n skill-seekers my-skill-seekers-data
```
### Backup and Restore
**Backup:**
```bash
# Using Velero
velero backup create skill-seekers-backup \
--include-namespaces skill-seekers
# Manual backup (example with data PVC)
kubectl exec -n skill-seekers deployment/my-skill-seekers-mcp -- \
tar czf - /data | \
cat > skill-seekers-data-backup.tar.gz
```
**Restore:**
```bash
# Using Velero
velero restore create --from-backup skill-seekers-backup
# Manual restore
kubectl exec -i -n skill-seekers deployment/my-skill-seekers-mcp -- \
tar xzf - -C /data < skill-seekers-data-backup.tar.gz
```
## Vector Databases
### Weaviate
**Access:**
```bash
kubectl port-forward -n skill-seekers svc/my-skill-seekers-weaviate 8080:8080
```
**Query:**
```bash
curl http://localhost:8080/v1/schema
```
### Qdrant
**Access:**
```bash
# HTTP API
kubectl port-forward -n skill-seekers svc/my-skill-seekers-qdrant 6333:6333
# gRPC
kubectl port-forward -n skill-seekers svc/my-skill-seekers-qdrant 6334:6334
```
**Query:**
```bash
curl http://localhost:6333/collections
```
### Chroma
**Access:**
```bash
kubectl port-forward -n skill-seekers svc/my-skill-seekers-chroma 8000:8000
```
**Query:**
```bash
curl http://localhost:8000/api/v1/collections
```
### Disable Vector Databases
To disable individual vector databases:
```yaml
vectorDatabases:
weaviate:
enabled: false
qdrant:
enabled: false
chroma:
enabled: false
```
## Security
### Pod Security Context
Runs as non-root user (UID 1000):
```yaml
podSecurityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
securityContext:
capabilities:
drop:
- ALL
readOnlyRootFilesystem: false
allowPrivilegeEscalation: false
```
### Network Policies
Create network policies for isolation:
```yaml
networkPolicy:
enabled: true
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: ingress-nginx
egress:
- to:
- namespaceSelector: {}
```
### RBAC
Enable RBAC with minimal permissions:
```yaml
rbac:
create: true
rules:
- apiGroups: [""]
resources: ["configmaps", "secrets"]
verbs: ["get", "list"]
```
### Secrets Management
**Best Practices:**
1. Never commit secrets to git
2. Use external secret managers (AWS Secrets Manager, HashiCorp Vault)
3. Enable encryption at rest in Kubernetes
4. Rotate secrets regularly
**Example with Sealed Secrets:**
```bash
# Create sealed secret
kubectl create secret generic skill-seekers-secrets \
--from-literal=ANTHROPIC_API_KEY="sk-ant-..." \
--dry-run=client -o yaml | \
kubeseal -o yaml > sealed-secret.yaml
# Apply sealed secret
kubectl apply -f sealed-secret.yaml -n skill-seekers
```
## Monitoring
### Pod Metrics
```bash
# View pod status
kubectl get pods -n skill-seekers
# View pod metrics (requires metrics-server)
kubectl top pods -n skill-seekers
# View pod logs
kubectl logs -n skill-seekers -l app.kubernetes.io/component=mcp-server --tail=100 -f
```
### Prometheus Integration
Enable ServiceMonitor (requires Prometheus Operator):
```yaml
serviceMonitor:
enabled: true
interval: 30s
scrapeTimeout: 10s
labels:
prometheus: kube-prometheus
```
### Grafana Dashboards
Import dashboard JSON from `helm/skill-seekers/dashboards/`.
### Health Checks
MCP server has built-in health checks:
```yaml
livenessProbe:
httpGet:
path: /health
port: 8765
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 8765
initialDelaySeconds: 10
periodSeconds: 5
```
Test manually:
```bash
kubectl exec -n skill-seekers deployment/my-skill-seekers-mcp -- \
curl http://localhost:8765/health
```
## Troubleshooting
### Pods Not Starting
```bash
# Check pod status
kubectl get pods -n skill-seekers
# View events
kubectl get events -n skill-seekers --sort-by='.lastTimestamp'
# Describe pod
kubectl describe pod -n skill-seekers <pod-name>
# Check logs
kubectl logs -n skill-seekers <pod-name>
```
### Common Issues
**Issue: ImagePullBackOff**
```bash
# Check image pull secrets
kubectl get secrets -n skill-seekers
# Verify image exists
docker pull <image-name>
```
**Issue: CrashLoopBackOff**
```bash
# View recent logs
kubectl logs -n skill-seekers <pod-name> --previous
# Check environment variables
kubectl exec -n skill-seekers <pod-name> -- env
```
**Issue: PVC Pending**
```bash
# Check storage class
kubectl get storageclass
# View PVC events
kubectl describe pvc -n skill-seekers <pvc-name>
# Check if provisioner is running
kubectl get pods -n kube-system | grep provisioner
```
**Issue: API Key Not Working**
```bash
# Verify secret exists
kubectl get secret -n skill-seekers my-skill-seekers
# Check secret contents (values are base64-encoded)
kubectl get secret -n skill-seekers my-skill-seekers -o yaml

# Decode a single key
kubectl get secret -n skill-seekers my-skill-seekers \
  -o jsonpath='{.data.ANTHROPIC_API_KEY}' | base64 -d
# Test API key manually
kubectl exec -n skill-seekers deployment/my-skill-seekers-mcp -- \
env | grep ANTHROPIC
```
### Debug Container
Run debug container in same namespace:
```bash
kubectl run debug -n skill-seekers --rm -it \
--image=nicolaka/netshoot \
--restart=Never -- bash
# Inside debug container:
# Test MCP server connectivity
curl http://my-skill-seekers-mcp:8765/health
# Test vector database connectivity
curl http://my-skill-seekers-weaviate:8080/v1/.well-known/ready
```
## Production Best Practices
### 1. Resource Planning
**Capacity Planning:**
- MCP Server: 500m CPU + 1Gi RAM per 10 concurrent requests
- Vector DBs: 2GB RAM + 10GB storage per 100K documents
- Reserve 30% overhead for spikes
**Example Production Setup:**
```yaml
mcpServer:
replicaCount: 5 # Handle 50 concurrent requests
resources:
requests:
cpu: 2500m
memory: 5Gi
autoscaling:
minReplicas: 5
maxReplicas: 20
```
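The sizing heuristics above can be sketched as a quick back-of-envelope calculator (the per-10-requests figures and 30% headroom are the stated planning assumptions, not measured values):

```python
# Back-of-envelope MCP capacity estimate from the heuristics above:
# 500m CPU + 1Gi RAM per 10 concurrent requests, plus 30% headroom.
import math

def mcp_capacity(concurrent_requests: int, headroom: float = 0.30) -> dict:
    units = math.ceil(concurrent_requests / 10)  # one unit serves ~10 requests
    return {
        "replicas": units,
        "total_cpu_m": math.ceil(units * 500 * (1 + headroom)),
        "total_memory_gi": math.ceil(units * 1 * (1 + headroom)),
    }

# Sizing for 50 concurrent requests
print(mcp_capacity(50))
```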
### 2. High Availability
**Anti-Affinity Rules:**
```yaml
mcpServer:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app.kubernetes.io/component
operator: In
values:
- mcp-server
topologyKey: kubernetes.io/hostname
```
**Multiple Replicas:**
- MCP Server: 3+ replicas across different nodes
- Vector DBs: 2+ replicas with replication
### 3. Monitoring and Alerting
**Key Metrics to Monitor:**
- Pod restart count (> 5 per hour = critical)
- Memory usage (> 90% = warning)
- CPU throttling (> 50% = investigate)
- Request latency (p95 > 1s = warning)
- Error rate (> 1% = critical)
**Prometheus Alerts:**
```yaml
- alert: HighPodRestarts
expr: rate(kube_pod_container_status_restarts_total{namespace="skill-seekers"}[15m]) > 0.1
for: 5m
labels:
severity: warning
```
### 4. Backup Strategy
**Automated Backups:**
```yaml
# CronJob for daily backups
apiVersion: batch/v1
kind: CronJob
metadata:
name: skill-seekers-backup
spec:
schedule: "0 2 * * *" # 2 AM daily
jobTemplate:
spec:
template:
spec:
containers:
- name: backup
image: skill-seekers:latest
command:
- /bin/sh
- -c
- tar czf /backup/data-$(date +%Y%m%d).tar.gz /data
```
### 5. Security Hardening
**Security Checklist:**
- [ ] Enable Pod Security Standards
- [ ] Use Network Policies
- [ ] Enable RBAC with least privilege
- [ ] Rotate secrets every 90 days
- [ ] Scan images for vulnerabilities
- [ ] Enable audit logging
- [ ] Use private container registry
- [ ] Enable encryption at rest
### 6. Cost Optimization
**Strategies:**
- Use spot/preemptible instances for non-critical workloads
- Enable cluster autoscaler
- Right-size resource requests
- Use storage tiering (hot/warm/cold)
- Schedule downscaling during off-hours
**Example Cost Optimization:**
```yaml
# Development environment: downscale at night
# Create CronJob to scale down replicas
apiVersion: batch/v1
kind: CronJob
metadata:
name: downscale-dev
spec:
schedule: "0 20 * * *" # 8 PM
jobTemplate:
spec:
template:
spec:
serviceAccountName: scaler
containers:
- name: kubectl
image: bitnami/kubectl
command:
- kubectl
- scale
- deployment
- my-skill-seekers-mcp
- --replicas=1
```
### 7. Update Strategy
**Rolling Updates:**
```yaml
mcpServer:
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
```
**Update Process:**
```bash
# 1. Test in staging
helm upgrade my-skill-seekers ./helm/skill-seekers \
--namespace skill-seekers-staging \
--values staging-values.yaml
# 2. Run smoke tests
./scripts/smoke-test.sh
# 3. Deploy to production
helm upgrade my-skill-seekers ./helm/skill-seekers \
--namespace skill-seekers \
--values production-values.yaml
# 4. Monitor for 15 minutes
kubectl rollout status deployment -n skill-seekers my-skill-seekers-mcp
# 5. Rollback if issues
helm rollback my-skill-seekers -n skill-seekers
```
## Upgrade Guide
### Minor Version Upgrade
```bash
# Fetch latest chart
helm repo update
# Upgrade with existing values
helm upgrade my-skill-seekers skill-seekers/skill-seekers \
--namespace skill-seekers \
--reuse-values
```
### Major Version Upgrade
```bash
# Backup current values
helm get values my-skill-seekers -n skill-seekers > backup-values.yaml
# Review CHANGELOG for breaking changes
curl https://raw.githubusercontent.com/yourusername/skill-seekers/main/CHANGELOG.md
# Upgrade with migration steps
helm upgrade my-skill-seekers skill-seekers/skill-seekers \
--namespace skill-seekers \
--values backup-values.yaml \
--force # Only if schema changed
```
## Uninstallation
### Full Cleanup
```bash
# Delete Helm release
helm uninstall my-skill-seekers -n skill-seekers
# Delete PVCs (if you want to remove data)
kubectl delete pvc -n skill-seekers --all
# Delete namespace
kubectl delete namespace skill-seekers
```
### Keep Data
```bash
# Delete release but keep PVCs
helm uninstall my-skill-seekers -n skill-seekers
# PVCs remain for later use
kubectl get pvc -n skill-seekers
```
## Additional Resources
- [Helm Documentation](https://helm.sh/docs/)
- [Kubernetes Documentation](https://kubernetes.io/docs/)
- [Skill Seekers GitHub](https://github.com/yourusername/skill-seekers)
- [Issue Tracker](https://github.com/yourusername/skill-seekers/issues)
---
**Need Help?**
- GitHub Issues: https://github.com/yourusername/skill-seekers/issues
- Documentation: https://skillseekersweb.com
- Community: [Link to Discord/Slack]