
Docker Deployment Guide

Complete guide for deploying Skill Seekers using Docker and Docker Compose.

Quick Start

1. Prerequisites

  • Docker 20.10+ installed
  • Docker Compose 2.0+ installed
  • 2GB+ available RAM
  • 5GB+ available disk space

# Check Docker installation
docker --version
docker compose version     # Compose v2 plugin
docker-compose --version   # standalone binary, used by the commands in this guide
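
The minimum versions above can also be checked from a script. The following is a minimal sketch, not part of the project: `version_ge` is a hypothetical helper name, and it assumes GNU `sort -V` is available on the host.

```shell
# Return success when version $1 is at least version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Possible usage against the minimums listed above:
#   docker_ver="$(docker version --format '{{.Server.Version}}')"
#   version_ge "$docker_ver" 20.10 || echo "Docker 20.10+ required" >&2
#   compose_ver="$(docker compose version --short)"
#   version_ge "$compose_ver" 2.0 || echo "Compose 2.0+ required" >&2
```

Version sort is used rather than a plain string comparison because 20.9 would otherwise compare as newer than 20.10.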

2. Clone Repository

git clone https://github.com/your-org/skill-seekers.git
cd skill-seekers

3. Configure Environment

# Copy environment template
cp .env.example .env

# Edit .env with your API keys
nano .env  # or your preferred editor

Minimum Required:

  • ANTHROPIC_API_KEY - For AI enhancement features
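
Before starting the services, it can help to confirm that .env really sets the required key. This is a rough sketch under the assumption that .env uses plain KEY=value lines; `env_has_key` is a hypothetical helper, not a Skill Seekers command.

```shell
# Succeed when FILE defines KEY with a non-empty value (KEY=value).
# Commented-out lines, empty assignments (KEY= or KEY="") and a
# missing file all count as "not set".
env_has_key() {
  local file="$1" key="$2"
  grep -Eqs "^${key}=[\"']?[^\"'[:space:]]" "$file"
}

# Possible usage:
#   env_has_key .env ANTHROPIC_API_KEY || echo "Set ANTHROPIC_API_KEY in .env" >&2
```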

4. Start Services

# Start all services (CLI + MCP server + vector DBs)
docker-compose up -d

# Or start specific services
docker-compose up -d mcp-server weaviate

5. Verify Deployment

# Check service status
docker-compose ps

# Test CLI
docker-compose run skill-seekers skill-seekers --version

# Test MCP server
curl http://localhost:8765/health
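
Right after `docker-compose up -d` the MCP server may still be starting, so a single curl can fail even when the deployment is fine. One way to handle this is a small retry loop. The sketch below uses a hypothetical `retry` helper; the attempt count and delay are arbitrary choices.

```shell
# Run a command until it succeeds, up to ATTEMPTS times, sleeping DELAY
# seconds between tries. Returns 0 on the first success, 1 if every try fails.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Possible usage: wait up to ~30 seconds for the health endpoint.
#   retry 15 2 curl -fsS http://localhost:8765/health
```

The `-f` flag makes curl return a non-zero status on HTTP errors, which is what lets the loop tell "not ready" from "ready".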

Available Images

1. skill-seekers (CLI)

Purpose: Main CLI application for documentation scraping and skill generation

Usage:

# Run CLI command
docker run --rm \
  -v "$(pwd)/output:/output" \
  -e ANTHROPIC_API_KEY=your-key \
  skill-seekers skill-seekers scrape --config /configs/react.json

# Interactive shell
docker run -it --rm skill-seekers bash

Image Size: ~400MB
Platforms: linux/amd64, linux/arm64

2. skill-seekers-mcp (MCP Server)

Purpose: MCP server with 25 tools for AI assistants

Usage:

# HTTP mode (default)
docker run -d -p 8765:8765 \
  -e ANTHROPIC_API_KEY=your-key \
  skill-seekers-mcp

# Stdio mode (-i keeps stdin open; a TTY is omitted because it can garble the protocol stream)
docker run -i --rm \
  -e ANTHROPIC_API_KEY=your-key \
  skill-seekers-mcp \
  python -m skill_seekers.mcp.server_fastmcp --transport stdio

Image Size: ~450MB
Platforms: linux/amd64, linux/arm64
Health Check: http://localhost:8765/health


Docker Compose Services

Service Architecture

┌─────────────────────┐
│   skill-seekers     │  CLI Application
└─────────────────────┘

┌─────────────────────┐
│    mcp-server       │  MCP Server (25 tools)
│    Port: 8765       │
└─────────────────────┘

┌─────────────────────┐
│     weaviate        │  Vector DB (hybrid search)
│    Port: 8080       │
└─────────────────────┘

┌─────────────────────┐
│      qdrant         │  Vector DB (native filtering)
│    Ports: 6333/6334 │
└─────────────────────┘

┌─────────────────────┐
│      chroma         │  Vector DB (local-first)
│    Port: 8000       │
└─────────────────────┘

Service Commands

# Start all services
docker-compose up -d

# Start specific services
docker-compose up -d mcp-server weaviate

# Stop all services
docker-compose down

# View logs
docker-compose logs -f mcp-server

# Restart service
docker-compose restart mcp-server

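# Scale service (if mcp-server publishes a fixed host port such as 8765,
# the extra replicas cannot bind it; publish a port range or use a load balancer)
docker-compose up -d --scale mcp-server=3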

Common Use Cases

Use Case 1: Scrape Documentation

# Create skill from React documentation
docker-compose run skill-seekers \
  skill-seekers scrape --config /configs/react.json

# Output will be in ./output/react/

Use Case 2: Export to Vector Databases

# Export React skill to all vector databases
docker-compose run skill-seekers bash -c "
  skill-seekers scrape --config /configs/react.json &&
  python <<'EOF'
import sys
from pathlib import Path
sys.path.insert(0, '/app/src')
from skill_seekers.cli.adaptors import get_adaptor

# Package the scraped skill once per vector database target
for target in ['weaviate', 'chroma', 'faiss', 'qdrant']:
    adaptor = get_adaptor(target)
    adaptor.package(Path('/output/react'), Path('/output'))
    print(f'✅ Exported to {target}')
EOF
"

Use Case 3: Run Quality Analysis

# Generate quality report for a skill
docker-compose run skill-seekers bash -c "
  python3 <<'EOF'
import sys
from pathlib import Path
sys.path.insert(0, '/app/src')
from skill_seekers.cli.quality_metrics import QualityAnalyzer

analyzer = QualityAnalyzer(Path('/output/react'))
report = analyzer.generate_report()
print(analyzer.format_report(report))
EOF
"

Use Case 4: MCP Server Integration

# Start MCP server
docker-compose up -d mcp-server

# Configure Claude Desktop
# Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS path):
{
  "mcpServers": {
    "skill-seekers": {
      "url": "http://localhost:8765/sse"
    }
  }
}

Volume Management

Default Volumes

| Volume | Path | Purpose |
|---|---|---|
| ./data | /data | Persistent data (cache, logs) |
| ./configs | /configs | Configuration files (read-only) |
| ./output | /output | Generated skills and exports |
| weaviate-data | N/A | Weaviate database storage |
| qdrant-data | N/A | Qdrant database storage |
| chroma-data | N/A | Chroma database storage |

Backup Volumes

# Backup vector database data
docker run --rm -v skill-seekers_weaviate-data:/data -v "$(pwd):/backup" \
  alpine tar czf /backup/weaviate-backup.tar.gz -C /data .

# Restore from backup (stop the weaviate service first so its files are not in use)
docker run --rm -v skill-seekers_weaviate-data:/data -v "$(pwd):/backup" \
  alpine tar xzf /backup/weaviate-backup.tar.gz -C /data
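
The same pattern extends to all three database volumes. The sketch below is one possible way to do it: `backup_name` and `backup_all` are hypothetical helpers, the date-stamped file name is an added convention, and the `skill-seekers_` volume prefix assumes the Compose project name used in the commands above.

```shell
# Build a dated archive name, e.g. weaviate-backup-20260207.tar.gz.
# The second argument overrides the date stamp (default: today).
backup_name() {
  local service="$1" stamp="${2:-$(date +%Y%m%d)}"
  printf '%s-backup-%s.tar.gz\n' "$service" "$stamp"
}

# Archive the named volume of each vector database into DEST
# (default: current directory). The volume is mounted read-only.
backup_all() {
  local dest="${1:-$(pwd)}" service
  for service in weaviate qdrant chroma; do
    docker run --rm \
      -v "skill-seekers_${service}-data:/data:ro" \
      -v "${dest}:/backup" \
      alpine tar czf "/backup/$(backup_name "$service")" -C /data .
  done
}

# Possible usage:
#   docker-compose stop weaviate qdrant chroma
#   backup_all "$(pwd)/backups"
#   docker-compose start weaviate qdrant chroma
```

Stopping the databases first is a cautious choice: it avoids archiving files while they are being written.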

Clean Up Volumes

# Remove all volumes (WARNING: deletes all data)
docker-compose down -v

# Remove specific volume
docker volume rm skill-seekers_weaviate-data

Environment Variables

Required Variables

| Variable | Description | Example |
|---|---|---|
| ANTHROPIC_API_KEY | Claude AI API key | sk-ant-... |

Optional Variables

| Variable | Description | Default |
|---|---|---|
| GOOGLE_API_KEY | Gemini API key | - |
| OPENAI_API_KEY | OpenAI API key | - |
| GITHUB_TOKEN | GitHub API token | - |
| MCP_TRANSPORT | MCP transport mode | http |
| MCP_PORT | MCP server port | 8765 |

Setting Variables

Option 1: .env file (recommended)

cp .env.example .env
# Edit .env with your keys

Option 2: Export in shell

export ANTHROPIC_API_KEY=sk-ant-your-key
docker-compose up -d

Option 3: Inline

ANTHROPIC_API_KEY=sk-ant-your-key docker-compose up -d

Building Images Locally

Build CLI Image

docker build -t skill-seekers:local -f Dockerfile .

Build MCP Server Image

docker build -t skill-seekers-mcp:local -f Dockerfile.mcp .

Build with Custom Base Image

# Use slim base (smaller)
docker build -t skill-seekers:slim \
  --build-arg BASE_IMAGE=python:3.12-slim \
  -f Dockerfile .

# Use alpine base (smallest)
docker build -t skill-seekers:alpine \
  --build-arg BASE_IMAGE=python:3.12-alpine \
  -f Dockerfile .

Troubleshooting

Issue: MCP Server Won't Start

Symptoms:

  • Container exits immediately
  • Health check fails

Solutions:

# Check logs
docker-compose logs mcp-server

# Verify port is available
lsof -i :8765

# Test MCP package installation
docker-compose run mcp-server python -c "import mcp; print('OK')"

Issue: Permission Denied

Symptoms:

  • Cannot write to /output
  • Cannot access /configs

Solutions:

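# Fix ownership: the container runs as UID 1000
sudo chown -R 1000 data/ output/

# Fallback if you cannot change ownership (world-writable; avoid on shared hosts)
chmod -R 777 data/ output/

# Or use specific user ID
docker-compose run -u "$(id -u):$(id -g)" skill-seekers ...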
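
To tell whether ownership is the actual cause, compare the owner of the mounted directories with the container user (UID 1000, per the Dockerfile). A minimal sketch with hypothetical helper names:

```shell
# Print the numeric owner UID of a path.
# Tries GNU stat first, then falls back to the BSD/macOS syntax.
owner_uid() {
  stat -c %u "$1" 2>/dev/null || stat -f %u "$1"
}

# Succeed when DIR is owned by UID (default 1000, the container user).
owned_by() {
  local dir="$1" uid="${2:-1000}"
  [ "$(owner_uid "$dir")" = "$uid" ]
}

# Possible usage:
#   for d in data output; do
#     owned_by "$d" || echo "$d is owned by UID $(owner_uid "$d"), not 1000" >&2
#   done
```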

Issue: Out of Memory

Symptoms:

  • Container killed
  • Exit code 137 in docker-compose ps (OOMKilled: true in docker inspect)

Solutions:

# Increase Docker memory limit
# Edit docker-compose.yml, add:
services:
  skill-seekers:
    mem_limit: 4g
    memswap_limit: 4g

# Or use streaming for large docs
docker-compose run skill-seekers \
  skill-seekers scrape --config /configs/react.json --streaming
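
A killed container usually reports exit code 137. Exit codes above 128 mean the process was ended by a signal, and 137 is 128 + 9 (SIGKILL), which is what the kernel OOM killer sends. The helper below is an illustrative sketch; `describe_exit` is a hypothetical name.

```shell
# Describe a container exit code. Codes above 128 mean
# "terminated by signal (code - 128)"; 137 is 128 + 9 (SIGKILL).
describe_exit() {
  local code="$1"
  if [ "$code" -eq 0 ]; then
    echo "success"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "exited with error $code"
  fi
}

# Possible usage (replace <container> with a name from docker-compose ps):
#   docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' <container>
#   describe_exit 137
```

SIGKILL alone does not prove an out-of-memory kill, so the OOMKilled field from `docker inspect` is the more direct confirmation.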

Issue: Vector Database Connection Failed

Symptoms:

  • Cannot connect to Weaviate/Qdrant/Chroma
  • Connection refused errors

Solutions:

# Check if services are running
docker-compose ps

# Test connectivity
docker-compose exec skill-seekers curl http://weaviate:8080
docker-compose exec skill-seekers curl http://qdrant:6333
docker-compose exec skill-seekers curl http://chroma:8000

# Restart services
docker-compose restart weaviate qdrant chroma

Issue: Slow Performance

Symptoms:

  • Long scraping times
  • Slow container startup

Solutions:

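# Use smaller image (build the slim variant locally; see Build with Custom Base Image)
docker build -t skill-seekers:slim --build-arg BASE_IMAGE=python:3.12-slim -f Dockerfile .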

# Enable BuildKit cache
export DOCKER_BUILDKIT=1
docker build -t skill-seekers:local .

# Increase CPU allocation: docker-compose up has no --cpu-shares flag,
# so set the CPU weight in docker-compose.yml instead
services:
  skill-seekers:
    cpu_shares: 2048

Production Deployment

Security Hardening

  1. Use secrets management
# Docker secrets (Swarm mode); printf avoids storing a trailing newline in the secret
printf '%s' "sk-ant-your-key" | docker secret create anthropic_key -

# Kubernetes secrets
kubectl create secret generic skill-seekers-secrets \
  --from-literal=anthropic-api-key=sk-ant-your-key
  2. Run as non-root
# Already configured in Dockerfile (UID 1000)
USER skillseeker
  3. Read-only filesystems
# docker-compose.yml
services:
  mcp-server:
    read_only: true
    tmpfs:
      - /tmp
  4. Resource limits
# docker-compose.yml
services:
  mcp-server:
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 2G
        reservations:
          cpus: '0.5'
          memory: 512M
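
Docker reads memory suffixes such as 2G and 512M as binary multiples (1G = 1024 × 1024 × 1024 bytes), and `docker inspect` reports the applied limit in bytes. The sketch below converts between the two so a configured limit can be compared with what a container actually received. `to_bytes` is a hypothetical helper and handles whole numbers only.

```shell
# Convert a Docker-style memory size (e.g. 512M, 2G) to bytes,
# using binary multiples. Accepts b, k/kb, m/mb, g/gb in any case.
to_bytes() {
  local value="$1" number unit
  number="${value%%[!0-9]*}"
  unit="$(printf '%s' "${value#"$number"}" | tr '[:upper:]' '[:lower:]')"
  case "$unit" in
    ''|b) echo "$number" ;;
    k|kb) echo $((number * 1024)) ;;
    m|mb) echo $((number * 1024 * 1024)) ;;
    g|gb) echo $((number * 1024 * 1024 * 1024)) ;;
    *)    echo "unknown unit: $unit" >&2; return 1 ;;
  esac
}

# Possible usage (replace <container> with a name from docker-compose ps):
#   applied="$(docker inspect --format '{{.HostConfig.Memory}}' <container>)"
#   [ "$applied" = "$(to_bytes 2G)" ] || echo "limit not applied as configured" >&2
```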

Monitoring

  1. Health checks
# Check all services
docker-compose ps

# Detailed health status
docker inspect --format='{{.State.Health.Status}}' skill-seekers-mcp
  2. Logs
# Stream logs
docker-compose logs -f --tail=100

# Export logs
docker-compose logs > skill-seekers-logs.txt
  3. Metrics
# Resource usage
docker stats

# Container inspect
docker-compose exec mcp-server ps aux
docker-compose exec mcp-server df -h

Scaling

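  1. Horizontal scaling
# Scale MCP servers (replicas cannot share one fixed host port such as 8765;
# publish a port range or reach them through a load balancer)
docker-compose up -d --scale mcp-server=3

# Use load balancer
# Add nginx/haproxy in docker-compose.yml
  2. Vertical scaling
# Increase resources (docker-compose.yml)
services:
  mcp-server:
    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 8G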

Best Practices

1. Use Multi-Stage Builds

Already implemented in Dockerfile

  • Builder stage for dependencies
  • Runtime stage for production

2. Minimize Image Size

  • Use slim base images
  • Clean up apt cache
  • Remove unnecessary files via .dockerignore

3. Security

  • Run as non-root user (UID 1000)
  • Use secrets for sensitive data
  • Keep images updated

4. Persistence

  • Use named volumes for databases
  • Mount ./output for generated skills
  • Regular backups of vector DB data

5. Monitoring

  • Enable health checks
  • Stream logs to external service
  • Monitor resource usage

Last Updated: February 7, 2026
Docker Version: 20.10+
Compose Version: 2.0+