docs: audit and clean up docs/ directory

Removals (duplicate/stale):
- docs/DOCKER_GUIDE.md: 80% overlap with DOCKER_DEPLOYMENT.md
- docs/KUBERNETES_GUIDE.md: 70% overlap with KUBERNETES_DEPLOYMENT.md
- docs/strategy/TASK19_COMPLETE.md: stale task tracking
- docs/strategy/TASK20_COMPLETE.md: stale task tracking
- docs/strategy/TASK21_COMPLETE.md: stale task tracking
- docs/strategy/WEEK2_COMPLETE.md: stale progress report

Updates (version/counts):
- docs/FAQ.md: v2.7.0 → v3.1.0-dev, 18 MCP tools → 26, 4 platforms → 16+
- docs/QUICK_REFERENCE.md: 18 MCP tools → 26, 1200+ tests → 1,880+, footer updated
- docs/features/BOOTSTRAP_SKILL.md: v2.7.0 → v3.1.0-dev header and footer

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@@ -1,575 +0,0 @@
|
|||||||
# Docker Deployment Guide
|
|
||||||
|
|
||||||
Complete guide for deploying Skill Seekers using Docker and Docker Compose.
|
|
||||||
|
|
||||||
## Quick Start
|
|
||||||
|
|
||||||
### 1. Prerequisites
|
|
||||||
|
|
||||||
- Docker 20.10+ installed
|
|
||||||
- Docker Compose 2.0+ installed
|
|
||||||
- 2GB+ available RAM
|
|
||||||
- 5GB+ available disk space
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Check Docker installation
|
|
||||||
docker --version
|
|
||||||
docker-compose --version
|
|
||||||
```
|
|
||||||
|
|
||||||
### 2. Clone Repository
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git clone https://github.com/your-org/skill-seekers.git
|
|
||||||
cd skill-seekers
|
|
||||||
```
|
|
||||||
|
|
||||||
### 3. Configure Environment
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Copy environment template
|
|
||||||
cp .env.example .env
|
|
||||||
|
|
||||||
# Edit .env with your API keys
|
|
||||||
nano .env # or your preferred editor
|
|
||||||
```
|
|
||||||
|
|
||||||
**Minimum Required:**
|
|
||||||
- `ANTHROPIC_API_KEY` - For AI enhancement features
|
|
||||||
|
|
||||||
### 4. Start Services
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Start all services (CLI + MCP server + vector DBs)
|
|
||||||
docker-compose up -d
|
|
||||||
|
|
||||||
# Or start specific services
|
|
||||||
docker-compose up -d mcp-server weaviate
|
|
||||||
```
|
|
||||||
|
|
||||||
### 5. Verify Deployment
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Check service status
|
|
||||||
docker-compose ps
|
|
||||||
|
|
||||||
# Test CLI
|
|
||||||
docker-compose run skill-seekers skill-seekers --version
|
|
||||||
|
|
||||||
# Test MCP server
|
|
||||||
curl http://localhost:8765/health
|
|
||||||
```
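The `curl` check above only succeeds once the MCP server has finished starting. A minimal Python sketch of a retrying health check follows; it assumes the `/health` endpoint on port 8765 shown above, and the function names, retry count, and delay are illustrative choices rather than part of Skill Seekers.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=2.0):
    """Return the HTTP status code for url, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def wait_until_healthy(url, attempts=10, delay=1.0,
                       fetch=fetch_status, sleep=time.sleep):
    """Poll url until it answers 200 or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if fetch(url) == 200:
            return True
        if attempt < attempts:
            sleep(delay)
    return False
```

Calling `wait_until_healthy("http://localhost:8765/health")` after `docker-compose up -d` returns `True` as soon as the endpoint answers 200. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running container.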
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Available Images
|
|
||||||
|
|
||||||
### 1. skill-seekers (CLI)
|
|
||||||
|
|
||||||
**Purpose:** Main CLI application for documentation scraping and skill generation
|
|
||||||
|
|
||||||
**Usage:**
|
|
||||||
```bash
|
|
||||||
# Run CLI command
|
|
||||||
docker run --rm \
|
|
||||||
-v $(pwd)/output:/output \
|
|
||||||
-e ANTHROPIC_API_KEY=your-key \
|
|
||||||
skill-seekers skill-seekers scrape --config /configs/react.json
|
|
||||||
|
|
||||||
# Interactive shell
|
|
||||||
docker run -it --rm skill-seekers bash
|
|
||||||
```
|
|
||||||
|
|
||||||
**Image Size:** ~400MB
|
|
||||||
**Platforms:** linux/amd64, linux/arm64
|
|
||||||
|
|
||||||
### 2. skill-seekers-mcp (MCP Server)
|
|
||||||
|
|
||||||
**Purpose:** MCP server with 25 tools for AI assistants
|
|
||||||
|
|
||||||
**Usage:**
|
|
||||||
```bash
|
|
||||||
# HTTP mode (default)
|
|
||||||
docker run -d -p 8765:8765 \
|
|
||||||
-e ANTHROPIC_API_KEY=your-key \
|
|
||||||
skill-seekers-mcp
|
|
||||||
|
|
||||||
# Stdio mode
|
|
||||||
docker run -it \
|
|
||||||
-e ANTHROPIC_API_KEY=your-key \
|
|
||||||
skill-seekers-mcp \
|
|
||||||
python -m skill_seekers.mcp.server_fastmcp --transport stdio
|
|
||||||
```
|
|
||||||
|
|
||||||
**Image Size:** ~450MB
|
|
||||||
**Platforms:** linux/amd64, linux/arm64
|
|
||||||
**Health Check:** http://localhost:8765/health
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Docker Compose Services
|
|
||||||
|
|
||||||
### Service Architecture
|
|
||||||
|
|
||||||
```
|
|
||||||
┌─────────────────────┐
│ skill-seekers       │  CLI Application
└─────────────────────┘

┌─────────────────────┐
│ mcp-server          │  MCP Server (25 tools)
│ Port: 8765          │
└─────────────────────┘

┌─────────────────────┐
│ weaviate            │  Vector DB (hybrid search)
│ Port: 8080          │
└─────────────────────┘

┌─────────────────────┐
│ qdrant              │  Vector DB (native filtering)
│ Ports: 6333/6334    │
└─────────────────────┘

┌─────────────────────┐
│ chroma              │  Vector DB (local-first)
│ Port: 8000          │
└─────────────────────┘
|
|
||||||
```
|
|
||||||
|
|
||||||
### Service Commands
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Start all services
|
|
||||||
docker-compose up -d
|
|
||||||
|
|
||||||
# Start specific services
|
|
||||||
docker-compose up -d mcp-server weaviate
|
|
||||||
|
|
||||||
# Stop all services
|
|
||||||
docker-compose down
|
|
||||||
|
|
||||||
# View logs
|
|
||||||
docker-compose logs -f mcp-server
|
|
||||||
|
|
||||||
# Restart service
|
|
||||||
docker-compose restart mcp-server
|
|
||||||
|
|
||||||
# Scale service (if supported)
|
|
||||||
docker-compose up -d --scale mcp-server=3
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Common Use Cases
|
|
||||||
|
|
||||||
### Use Case 1: Scrape Documentation
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Create skill from React documentation
|
|
||||||
docker-compose run skill-seekers \
|
|
||||||
skill-seekers scrape --config /configs/react.json
|
|
||||||
|
|
||||||
# Output will be in ./output/react/
|
|
||||||
```
|
|
||||||
|
|
||||||
### Use Case 2: Export to Vector Databases
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Export React skill to all vector databases
|
|
||||||
docker-compose run skill-seekers bash -c "
skill-seekers scrape --config /configs/react.json &&
python -c '
import sys
from pathlib import Path
sys.path.insert(0, \"/app/src\")
from skill_seekers.cli.adaptors import get_adaptor

for target in [\"weaviate\", \"chroma\", \"faiss\", \"qdrant\"]:
    adaptor = get_adaptor(target)
    adaptor.package(Path(\"/output/react\"), Path(\"/output\"))
    print(f\"✅ Exported to {target}\")
'
"
|
|
||||||
```
|
|
||||||
|
|
||||||
### Use Case 3: Run Quality Analysis
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Generate quality report for a skill
|
|
||||||
docker-compose run skill-seekers bash -c "
|
|
||||||
python3 <<'EOF'
|
|
||||||
import sys
|
|
||||||
from pathlib import Path
|
|
||||||
sys.path.insert(0, '/app/src')
|
|
||||||
from skill_seekers.cli.quality_metrics import QualityAnalyzer
|
|
||||||
|
|
||||||
analyzer = QualityAnalyzer(Path('/output/react'))
|
|
||||||
report = analyzer.generate_report()
|
|
||||||
print(analyzer.format_report(report))
|
|
||||||
EOF
|
|
||||||
"
|
|
||||||
```
|
|
||||||
|
|
||||||
### Use Case 4: MCP Server Integration
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Start MCP server
|
|
||||||
docker-compose up -d mcp-server
|
|
||||||
|
|
||||||
# Configure Claude Desktop
|
|
||||||
# Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
|
|
||||||
{
|
|
||||||
"mcpServers": {
|
|
||||||
"skill-seekers": {
|
|
||||||
"url": "http://localhost:8765/sse"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
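Editing `claude_desktop_config.json` by hand risks overwriting servers that are already registered. The sketch below shows one way to merge the `skill-seekers` entry into an existing configuration; the `add_mcp_server` helper and the `other-tool` entry are hypothetical, and only the `mcpServers` / `url` layout comes from the snippet above.

```python
import json


def add_mcp_server(config, name, url):
    """Return a copy of config with an mcpServers entry added or replaced."""
    merged = dict(config)
    # Copy the nested mapping so the caller's config is left untouched.
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"url": url}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-tool": {"command": "other"}}}
updated = add_mcp_server(existing, "skill-seekers", "http://localhost:8765/sse")
print(json.dumps(updated, indent=2))
```

In practice the dictionary would be loaded from the config file with `json.load` and written back with `json.dump`; a backup of the original file is worth keeping before overwriting it.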
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Volume Management
|
|
||||||
|
|
||||||
### Default Volumes
|
|
||||||
|
|
||||||
| Volume | Path | Purpose |
|--------|------|---------|
| `./data` | `/data` | Persistent data (cache, logs) |
| `./configs` | `/configs` | Configuration files (read-only) |
| `./output` | `/output` | Generated skills and exports |
| `weaviate-data` | N/A | Weaviate database storage |
| `qdrant-data` | N/A | Qdrant database storage |
| `chroma-data` | N/A | Chroma database storage |
|
|
||||||
|
|
||||||
### Backup Volumes
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Backup vector database data
|
|
||||||
docker run --rm -v skill-seekers_weaviate-data:/data -v $(pwd):/backup \
|
|
||||||
alpine tar czf /backup/weaviate-backup.tar.gz -C /data .
|
|
||||||
|
|
||||||
# Restore from backup
|
|
||||||
docker run --rm -v skill-seekers_weaviate-data:/data -v $(pwd):/backup \
|
|
||||||
alpine tar xzf /backup/weaviate-backup.tar.gz -C /data
|
|
||||||
```
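The backup command above covers a single volume. A small Python sketch can generate the same `docker run ... tar czf` invocation for each vector database volume. The `skill-seekers_weaviate-data` name comes from the command above; the `qdrant` and `chroma` volume names are assumed to follow the same prefix, and `backup_command` is a hypothetical helper.

```python
import shlex
from pathlib import Path


def backup_command(volume, backup_dir, archive_name):
    """Build the docker argv that tars a named volume into backup_dir."""
    return [
        "docker", "run", "--rm",
        "-v", f"{volume}:/data",
        "-v", f"{backup_dir}:/backup",
        "alpine",
        "tar", "czf", f"/backup/{archive_name}", "-C", "/data", ".",
    ]


# Assumed volume names; check `docker volume ls` for the real ones.
VOLUMES = ["weaviate-data", "qdrant-data", "chroma-data"]

commands = [
    backup_command(f"skill-seekers_{name}", Path("/tmp/backups"), f"{name}.tar.gz")
    for name in VOLUMES
]
for argv in commands:
    # Print a copy-pasteable shell line; pass argv to subprocess.run to execute.
    print(shlex.join(argv))
```

Building the command as an argument list rather than a string avoids shell-quoting problems if a backup directory contains spaces.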
|
|
||||||
|
|
||||||
### Clean Up Volumes
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Remove all volumes (WARNING: deletes all data)
|
|
||||||
docker-compose down -v
|
|
||||||
|
|
||||||
# Remove specific volume
|
|
||||||
docker volume rm skill-seekers_weaviate-data
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Environment Variables
|
|
||||||
|
|
||||||
### Required Variables
|
|
||||||
|
|
||||||
| Variable | Description | Example |
|----------|-------------|---------|
| `ANTHROPIC_API_KEY` | Claude AI API key | `sk-ant-...` |
|
|
||||||
|
|
||||||
### Optional Variables
|
|
||||||
|
|
||||||
| Variable | Description | Default |
|----------|-------------|---------|
| `GOOGLE_API_KEY` | Gemini API key | - |
| `OPENAI_API_KEY` | OpenAI API key | - |
| `GITHUB_TOKEN` | GitHub API token | - |
| `MCP_TRANSPORT` | MCP transport mode | `http` |
| `MCP_PORT` | MCP server port | `8765` |
|
|
||||||
|
|
||||||
### Setting Variables
|
|
||||||
|
|
||||||
**Option 1: .env file (recommended)**
|
|
||||||
```bash
|
|
||||||
cp .env.example .env
|
|
||||||
# Edit .env with your keys
|
|
||||||
```
|
|
||||||
|
|
||||||
**Option 2: Export in shell**
|
|
||||||
```bash
|
|
||||||
export ANTHROPIC_API_KEY=sk-ant-your-key
|
|
||||||
docker-compose up -d
|
|
||||||
```
|
|
||||||
|
|
||||||
**Option 3: Inline**
|
|
||||||
```bash
|
|
||||||
ANTHROPIC_API_KEY=sk-ant-your-key docker-compose up -d
|
|
||||||
```
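Whichever option is used, a missing `ANTHROPIC_API_KEY` tends to surface only when a container fails at runtime. The following sketch checks a `.env` file for the required variable before starting services. It assumes plain `KEY=VALUE` lines (no `export` prefix, no multi-line values), and the helper names are illustrative.

```python
REQUIRED_KEYS = ("ANTHROPIC_API_KEY",)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


sample = "# API keys\nANTHROPIC_API_KEY=sk-ant-your-key\nMCP_PORT=8765\n"
print(missing_keys(parse_env(sample)))
```

To check a real file, pass `Path(".env").read_text()` to `parse_env`; a non-empty result from `missing_keys` lists the variables still to be filled in.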
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Building Images Locally
|
|
||||||
|
|
||||||
### Build CLI Image
|
|
||||||
|
|
||||||
```bash
|
|
||||||
docker build -t skill-seekers:local -f Dockerfile .
|
|
||||||
```
|
|
||||||
|
|
||||||
### Build MCP Server Image
|
|
||||||
|
|
||||||
```bash
|
|
||||||
docker build -t skill-seekers-mcp:local -f Dockerfile.mcp .
|
|
||||||
```
|
|
||||||
|
|
||||||
### Build with Custom Base Image
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Use slim base (smaller)
|
|
||||||
docker build -t skill-seekers:slim \
|
|
||||||
--build-arg BASE_IMAGE=python:3.12-slim \
|
|
||||||
-f Dockerfile .
|
|
||||||
|
|
||||||
# Use alpine base (smallest)
|
|
||||||
docker build -t skill-seekers:alpine \
|
|
||||||
--build-arg BASE_IMAGE=python:3.12-alpine \
|
|
||||||
-f Dockerfile .
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Troubleshooting
|
|
||||||
|
|
||||||
### Issue: MCP Server Won't Start
|
|
||||||
|
|
||||||
**Symptoms:**
|
|
||||||
- Container exits immediately
|
|
||||||
- Health check fails
|
|
||||||
|
|
||||||
**Solutions:**
|
|
||||||
```bash
|
|
||||||
# Check logs
|
|
||||||
docker-compose logs mcp-server
|
|
||||||
|
|
||||||
# Verify port is available
|
|
||||||
lsof -i :8765
|
|
||||||
|
|
||||||
# Test MCP package installation
|
|
||||||
docker-compose run mcp-server python -c "import mcp; print('OK')"
|
|
||||||
```
|
|
||||||
|
|
||||||
### Issue: Permission Denied
|
|
||||||
|
|
||||||
**Symptoms:**
|
|
||||||
- Cannot write to /output
|
|
||||||
- Cannot access /configs
|
|
||||||
|
|
||||||
**Solutions:**
|
|
||||||
```bash
|
|
||||||
# Fix permissions
|
|
||||||
chmod -R 777 data/ output/
|
|
||||||
|
|
||||||
# Or use specific user ID
|
|
||||||
docker-compose run -u $(id -u):$(id -g) skill-seekers ...
|
|
||||||
```
|
|
||||||
|
|
||||||
### Issue: Out of Memory
|
|
||||||
|
|
||||||
**Symptoms:**
|
|
||||||
- Container killed
|
|
||||||
- OOMKilled in `docker-compose ps`
|
|
||||||
|
|
||||||
**Solutions:**
|
|
||||||
```bash
|
|
||||||
# Increase Docker memory limit
|
|
||||||
# Edit docker-compose.yml, add:
services:
  skill-seekers:
    mem_limit: 4g
    memswap_limit: 4g
|
|
||||||
|
|
||||||
# Or use streaming for large docs
|
|
||||||
docker-compose run skill-seekers \
|
|
||||||
skill-seekers scrape --config /configs/react.json --streaming
|
|
||||||
```
|
|
||||||
|
|
||||||
### Issue: Vector Database Connection Failed
|
|
||||||
|
|
||||||
**Symptoms:**
|
|
||||||
- Cannot connect to Weaviate/Qdrant/Chroma
|
|
||||||
- Connection refused errors
|
|
||||||
|
|
||||||
**Solutions:**
|
|
||||||
```bash
|
|
||||||
# Check if services are running
|
|
||||||
docker-compose ps
|
|
||||||
|
|
||||||
# Test connectivity
|
|
||||||
docker-compose exec skill-seekers curl http://weaviate:8080
|
|
||||||
docker-compose exec skill-seekers curl http://qdrant:6333
|
|
||||||
docker-compose exec skill-seekers curl http://chroma:8000
|
|
||||||
|
|
||||||
# Restart services
|
|
||||||
docker-compose restart weaviate qdrant chroma
|
|
||||||
```
|
|
||||||
|
|
||||||
### Issue: Slow Performance
|
|
||||||
|
|
||||||
**Symptoms:**
|
|
||||||
- Long scraping times
|
|
||||||
- Slow container startup
|
|
||||||
|
|
||||||
**Solutions:**
|
|
||||||
```bash
|
|
||||||
# Use smaller image
|
|
||||||
docker pull skill-seekers:slim
|
|
||||||
|
|
||||||
# Enable BuildKit cache
|
|
||||||
export DOCKER_BUILDKIT=1
|
|
||||||
docker build -t skill-seekers:local .
|
|
||||||
|
|
||||||
# Increase CPU allocation: set `cpu_shares: 2048` on the service in docker-compose.yml
# (`docker-compose up` has no --cpu-shares flag), then recreate the service
docker-compose up -d skill-seekers
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Production Deployment
|
|
||||||
|
|
||||||
### Security Hardening
|
|
||||||
|
|
||||||
1. **Use secrets management**
|
|
||||||
```bash
|
|
||||||
# Docker secrets (Swarm mode)
|
|
||||||
echo "sk-ant-your-key" | docker secret create anthropic_key -
|
|
||||||
|
|
||||||
# Kubernetes secrets
|
|
||||||
kubectl create secret generic skill-seekers-secrets \
|
|
||||||
--from-literal=anthropic-api-key=sk-ant-your-key
|
|
||||||
```
|
|
||||||
|
|
||||||
2. **Run as non-root**
|
|
||||||
```dockerfile
|
|
||||||
# Already configured in Dockerfile (UID 1000)
USER skillseeker
|
|
||||||
```
|
|
||||||
|
|
||||||
3. **Read-only filesystems**
|
|
||||||
```yaml
|
|
||||||
# docker-compose.yml
services:
  mcp-server:
    read_only: true
    tmpfs:
      - /tmp
|
|
||||||
```
|
|
||||||
|
|
||||||
4. **Resource limits**
|
|
||||||
```yaml
|
|
||||||
services:
  mcp-server:
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 2G
        reservations:
          cpus: '0.5'
          memory: 512M
|
|
||||||
```
|
|
||||||
|
|
||||||
### Monitoring
|
|
||||||
|
|
||||||
1. **Health checks**
|
|
||||||
```bash
|
|
||||||
# Check all services
|
|
||||||
docker-compose ps
|
|
||||||
|
|
||||||
# Detailed health status
|
|
||||||
docker inspect --format='{{.State.Health.Status}}' skill-seekers-mcp
|
|
||||||
```
|
|
||||||
|
|
||||||
2. **Logs**
|
|
||||||
```bash
|
|
||||||
# Stream logs
|
|
||||||
docker-compose logs -f --tail=100
|
|
||||||
|
|
||||||
# Export logs
|
|
||||||
docker-compose logs > skill-seekers-logs.txt
|
|
||||||
```
|
|
||||||
|
|
||||||
3. **Metrics**
|
|
||||||
```bash
|
|
||||||
# Resource usage
|
|
||||||
docker stats
|
|
||||||
|
|
||||||
# Container inspect
|
|
||||||
docker-compose exec mcp-server ps aux
|
|
||||||
docker-compose exec mcp-server df -h
|
|
||||||
```
|
|
||||||
|
|
||||||
### Scaling
|
|
||||||
|
|
||||||
1. **Horizontal scaling**
|
|
||||||
```bash
|
|
||||||
# Scale MCP servers
|
|
||||||
docker-compose up -d --scale mcp-server=3
|
|
||||||
|
|
||||||
# Use load balancer
|
|
||||||
# Add nginx/haproxy in docker-compose.yml
|
|
||||||
```
|
|
||||||
|
|
||||||
2. **Vertical scaling**
|
|
||||||
```yaml
|
|
||||||
# Increase resources
services:
  mcp-server:
    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 8G
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Best Practices
|
|
||||||
|
|
||||||
### 1. Use Multi-Stage Builds
|
|
||||||
✅ Already implemented in Dockerfile
|
|
||||||
- Builder stage for dependencies
|
|
||||||
- Runtime stage for production
|
|
||||||
|
|
||||||
### 2. Minimize Image Size
|
|
||||||
- Use slim base images
|
|
||||||
- Clean up apt cache
|
|
||||||
- Remove unnecessary files via .dockerignore
|
|
||||||
|
|
||||||
### 3. Security
|
|
||||||
- Run as non-root user (UID 1000)
|
|
||||||
- Use secrets for sensitive data
|
|
||||||
- Keep images updated
|
|
||||||
|
|
||||||
### 4. Persistence
|
|
||||||
- Use named volumes for databases
|
|
||||||
- Mount ./output for generated skills
|
|
||||||
- Regular backups of vector DB data
|
|
||||||
|
|
||||||
### 5. Monitoring
|
|
||||||
- Enable health checks
|
|
||||||
- Stream logs to external service
|
|
||||||
- Monitor resource usage
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Additional Resources
|
|
||||||
|
|
||||||
- [Docker Documentation](https://docs.docker.com/)
|
|
||||||
- [Docker Compose Reference](https://docs.docker.com/compose/compose-file/)
|
|
||||||
- [Skill Seekers Documentation](https://skillseekersweb.com/)
|
|
||||||
- [MCP Server Setup](docs/MCP_SETUP.md)
|
|
||||||
- [Vector Database Integration](docs/strategy/WEEK2_COMPLETE.md)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
**Last Updated:** February 7, 2026
|
|
||||||
**Docker Version:** 20.10+
|
|
||||||
**Compose Version:** 2.0+
|
|
||||||
54
docs/FAQ.md
@@ -1,7 +1,7 @@
|
|||||||
# Frequently Asked Questions (FAQ)
|
# Frequently Asked Questions (FAQ)
|
||||||
|
|
||||||
**Version:** 2.7.0
|
**Version:** 3.1.0-dev
|
||||||
**Last Updated:** 2026-01-18
|
**Last Updated:** 2026-02-18
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -9,7 +9,7 @@
|
|||||||
|
|
||||||
### What is Skill Seekers?
|
### What is Skill Seekers?
|
||||||
|
|
||||||
Skill Seekers is a Python tool that converts documentation websites, GitHub repositories, and PDF files into AI skills for Claude AI, Google Gemini, OpenAI ChatGPT, and generic Markdown format.
|
Skill Seekers is a Python tool that converts documentation websites, GitHub repositories, and PDF files into AI-ready formats for 16+ platforms: LLM platforms (Claude, Gemini, OpenAI), RAG frameworks (LangChain, LlamaIndex, Haystack), vector databases (ChromaDB, FAISS, Weaviate, Qdrant, Pinecone), and AI coding assistants (Cursor, Windsurf, Cline, Continue.dev).
|
||||||
|
|
||||||
**Use Cases:**
|
**Use Cases:**
|
||||||
- Create custom documentation skills for your favorite frameworks
|
- Create custom documentation skills for your favorite frameworks
|
||||||
@@ -19,12 +19,32 @@ Skill Seekers is a Python tool that converts documentation websites, GitHub repo
|
|||||||
|
|
||||||
### Which platforms are supported?
|
### Which platforms are supported?
|
||||||
|
|
||||||
**Supported Platforms (4):**
|
**Supported Platforms (16+):**
|
||||||
|
|
||||||
|
*LLM Platforms:*
|
||||||
1. **Claude AI** - ZIP format with YAML frontmatter
|
1. **Claude AI** - ZIP format with YAML frontmatter
|
||||||
2. **Google Gemini** - tar.gz format for Grounded Generation
|
2. **Google Gemini** - tar.gz format for Grounded Generation
|
||||||
3. **OpenAI ChatGPT** - ZIP format for Vector Stores
|
3. **OpenAI ChatGPT** - ZIP format for Vector Stores
|
||||||
4. **Generic Markdown** - ZIP format with markdown files
|
4. **Generic Markdown** - ZIP format with markdown files
|
||||||
|
|
||||||
|
*RAG Frameworks:*
|
||||||
|
5. **LangChain** - Document objects for QA chains and agents
|
||||||
|
6. **LlamaIndex** - TextNodes for query engines
|
||||||
|
7. **Haystack** - Document objects for enterprise RAG
|
||||||
|
|
||||||
|
*Vector Databases:*
|
||||||
|
8. **ChromaDB** - Direct collection upload
|
||||||
|
9. **FAISS** - Index files for local similarity search
|
||||||
|
10. **Weaviate** - Vector objects with schema creation
|
||||||
|
11. **Qdrant** - Points with payload indexing
|
||||||
|
12. **Pinecone** - Ready-to-upsert format
|
||||||
|
|
||||||
|
*AI Coding Assistants:*
|
||||||
|
13. **Cursor** - .cursorrules persistent context
|
||||||
|
14. **Windsurf** - .windsurfrules AI coding rules
|
||||||
|
15. **Cline** - .clinerules + MCP integration
|
||||||
|
16. **Continue.dev** - HTTP context server (all IDEs)
|
||||||
|
|
||||||
Each platform has a dedicated adaptor for optimal formatting and upload.
|
Each platform has a dedicated adaptor for optimal formatting and upload.
|
||||||
|
|
||||||
### Is it free to use?
|
### Is it free to use?
|
||||||
@@ -472,16 +492,20 @@ skill-seekers-mcp --transport http --port 8765
|
|||||||
|
|
||||||
### What MCP tools are available?
|
### What MCP tools are available?
|
||||||
|
|
||||||
**18 MCP tools:**
|
**26 MCP tools:**
|
||||||
|
|
||||||
|
*Core Tools (9):*
|
||||||
1. `list_configs` - List preset configurations
|
1. `list_configs` - List preset configurations
|
||||||
2. `generate_config` - Generate config from docs URL
|
2. `generate_config` - Generate config from docs URL
|
||||||
3. `validate_config` - Validate config structure
|
3. `validate_config` - Validate config structure
|
||||||
4. `estimate_pages` - Estimate page count
|
4. `estimate_pages` - Estimate page count
|
||||||
5. `scrape_docs` - Scrape documentation
|
5. `scrape_docs` - Scrape documentation
|
||||||
6. `package_skill` - Package to .zip
|
6. `package_skill` - Package to .zip (supports `--format` and `--target`)
|
||||||
7. `upload_skill` - Upload to platform
|
7. `upload_skill` - Upload to platform (supports `--target`)
|
||||||
8. `enhance_skill` - AI enhancement
|
8. `enhance_skill` - AI enhancement
|
||||||
9. `install_skill` - Complete workflow
|
9. `install_skill` - Complete workflow
|
||||||
|
|
||||||
|
*Extended Tools (10):*
|
||||||
10. `scrape_github` - GitHub analysis
|
10. `scrape_github` - GitHub analysis
|
||||||
11. `scrape_pdf` - PDF extraction
|
11. `scrape_pdf` - PDF extraction
|
||||||
12. `unified_scrape` - Multi-source scraping
|
12. `unified_scrape` - Multi-source scraping
|
||||||
@@ -491,6 +515,18 @@ skill-seekers-mcp --transport http --port 8765
|
|||||||
16. `generate_router` - Generate router skills
|
16. `generate_router` - Generate router skills
|
||||||
17. `add_config_source` - Register git repos
|
17. `add_config_source` - Register git repos
|
||||||
18. `fetch_config` - Fetch configs from git
|
18. `fetch_config` - Fetch configs from git
|
||||||
|
19. `list_config_sources` - List registered sources
|
||||||
|
20. `remove_config_source` - Remove config source
|
||||||
|
|
||||||
|
*Vector DB Tools (4):*
|
||||||
|
21. `export_to_chroma` - Export to ChromaDB
|
||||||
|
22. `export_to_weaviate` - Export to Weaviate
|
||||||
|
23. `export_to_faiss` - Export to FAISS
|
||||||
|
24. `export_to_qdrant` - Export to Qdrant
|
||||||
|
|
||||||
|
*Cloud Tools (3):*
|
||||||
|
25. `cloud_upload` - Upload to S3/GCS/Azure
|
||||||
|
26. `cloud_download` - Download from cloud storage
|
||||||
|
|
||||||
### How do I configure MCP for Claude Code?
|
### How do I configure MCP for Claude Code?
|
||||||
|
|
||||||
@@ -650,6 +686,6 @@ Yes!
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
**Version:** 2.7.0
|
**Version:** 3.1.0-dev
|
||||||
**Last Updated:** 2026-01-18
|
**Last Updated:** 2026-02-18
|
||||||
**Questions? Ask on [GitHub Discussions](https://github.com/yusufkaraaslan/Skill_Seekers/discussions)**
|
**Questions? Ask on [GitHub Discussions](https://github.com/yusufkaraaslan/Skill_Seekers/discussions)**
|
||||||
|
|||||||
@@ -1,957 +0,0 @@
|
|||||||
# Kubernetes Deployment Guide
|
|
||||||
|
|
||||||
Complete guide for deploying Skill Seekers to Kubernetes using Helm charts.
|
|
||||||
|
|
||||||
## Table of Contents
|
|
||||||
|
|
||||||
- [Prerequisites](#prerequisites)
|
|
||||||
- [Quick Start](#quick-start)
|
|
||||||
- [Installation Methods](#installation-methods)
|
|
||||||
- [Configuration](#configuration)
|
|
||||||
- [Accessing Services](#accessing-services)
|
|
||||||
- [Scaling](#scaling)
|
|
||||||
- [Persistence](#persistence)
|
|
||||||
- [Vector Databases](#vector-databases)
|
|
||||||
- [Security](#security)
|
|
||||||
- [Monitoring](#monitoring)
|
|
||||||
- [Troubleshooting](#troubleshooting)
|
|
||||||
- [Production Best Practices](#production-best-practices)
|
|
||||||
|
|
||||||
## Prerequisites
|
|
||||||
|
|
||||||
### Required
|
|
||||||
|
|
||||||
- Kubernetes cluster (1.23+)
|
|
||||||
- Helm 3.8+
|
|
||||||
- kubectl configured for your cluster
|
|
||||||
- 20GB+ available storage (for persistence)
|
|
||||||
|
|
||||||
### Recommended
|
|
||||||
|
|
||||||
- Ingress controller (nginx, traefik)
|
|
||||||
- cert-manager (for TLS certificates)
|
|
||||||
- Prometheus operator (for monitoring)
|
|
||||||
- Persistent storage provisioner
|
|
||||||
|
|
||||||
### Cluster Resource Requirements
|
|
||||||
|
|
||||||
**Minimum (Development):**
|
|
||||||
- 2 CPU cores
|
|
||||||
- 8GB RAM
|
|
||||||
- 20GB storage
|
|
||||||
|
|
||||||
**Recommended (Production):**
|
|
||||||
- 8+ CPU cores
|
|
||||||
- 32GB+ RAM
|
|
||||||
- 200GB+ storage (persistent volumes)
|
|
||||||
|
|
||||||
## Quick Start
|
|
||||||
|
|
||||||
### 1. Add Helm Repository (if published)
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Add Helm repo
|
|
||||||
helm repo add skill-seekers https://yourusername.github.io/skill-seekers
|
|
||||||
helm repo update
|
|
||||||
|
|
||||||
# Install with default values
|
|
||||||
helm install my-skill-seekers skill-seekers/skill-seekers \
|
|
||||||
--create-namespace \
|
|
||||||
--namespace skill-seekers
|
|
||||||
```
|
|
||||||
|
|
||||||
### 2. Install from Local Chart
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Clone repository
|
|
||||||
git clone https://github.com/yourusername/skill-seekers.git
|
|
||||||
cd skill-seekers
|
|
||||||
|
|
||||||
# Install chart
|
|
||||||
helm install my-skill-seekers ./helm/skill-seekers \
|
|
||||||
--create-namespace \
|
|
||||||
--namespace skill-seekers
|
|
||||||
```
|
|
||||||
|
|
||||||
### 3. Quick Test
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Port-forward MCP server
|
|
||||||
kubectl port-forward -n skill-seekers svc/my-skill-seekers-mcp 8765:8765
|
|
||||||
|
|
||||||
# Test health endpoint
|
|
||||||
curl http://localhost:8765/health
|
|
||||||
|
|
||||||
# Expected response: {"status": "ok"}
|
|
||||||
```
|
|
||||||
|
|
||||||
## Installation Methods
|
|
||||||
|
|
||||||
### Method 1: Minimal Installation (Testing)
|
|
||||||
|
|
||||||
Smallest deployment for testing - no persistence, no vector databases.
|
|
||||||
|
|
||||||
```bash
|
|
||||||
helm install my-skill-seekers ./helm/skill-seekers \
|
|
||||||
--namespace skill-seekers \
|
|
||||||
--create-namespace \
|
|
||||||
--set persistence.enabled=false \
|
|
||||||
--set vectorDatabases.weaviate.enabled=false \
|
|
||||||
--set vectorDatabases.qdrant.enabled=false \
|
|
||||||
--set vectorDatabases.chroma.enabled=false \
|
|
||||||
--set mcpServer.replicaCount=1 \
|
|
||||||
--set mcpServer.autoscaling.enabled=false
|
|
||||||
```
|
|
||||||
|
|
||||||
### Method 2: Development Installation
|
|
||||||
|
|
||||||
Moderate resources with persistence for local development.
|
|
||||||
|
|
||||||
```bash
|
|
||||||
helm install my-skill-seekers ./helm/skill-seekers \
|
|
||||||
--namespace skill-seekers \
|
|
||||||
--create-namespace \
|
|
||||||
--set persistence.data.size=5Gi \
|
|
||||||
--set persistence.output.size=10Gi \
|
|
||||||
--set vectorDatabases.weaviate.persistence.size=20Gi \
|
|
||||||
--set mcpServer.replicaCount=1 \
|
|
||||||
--set secrets.anthropicApiKey="sk-ant-..."
|
|
||||||
```
|
|
||||||
|
|
||||||
### Method 3: Production Installation
|
|
||||||
|
|
||||||
Full production deployment with autoscaling, persistence, and all vector databases.
|
|
||||||
|
|
||||||
```bash
|
|
||||||
helm install my-skill-seekers ./helm/skill-seekers \
|
|
||||||
--namespace skill-seekers \
|
|
||||||
--create-namespace \
|
|
||||||
--values production-values.yaml
|
|
||||||
```
|
|
||||||
|
|
||||||
**production-values.yaml:**
|
|
||||||
```yaml
|
|
||||||
global:
  environment: production

mcpServer:
  enabled: true
  replicaCount: 3
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 20
    targetCPUUtilizationPercentage: 70
  resources:
    limits:
      cpu: 2000m
      memory: 4Gi
    requests:
      cpu: 500m
      memory: 1Gi

persistence:
  data:
    size: 20Gi
    storageClass: "fast-ssd"
  output:
    size: 50Gi
    storageClass: "fast-ssd"

vectorDatabases:
  weaviate:
    enabled: true
    persistence:
      size: 100Gi
      storageClass: "fast-ssd"
  qdrant:
    enabled: true
    persistence:
      size: 100Gi
      storageClass: "fast-ssd"
  chroma:
    enabled: true
    persistence:
      size: 50Gi
      storageClass: "fast-ssd"

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  hosts:
    - host: skill-seekers.example.com
      paths:
        - path: /mcp
          pathType: Prefix
          backend:
            service:
              name: mcp
              port: 8765
  tls:
    - secretName: skill-seekers-tls
      hosts:
        - skill-seekers.example.com

secrets:
  anthropicApiKey: "sk-ant-..."
  googleApiKey: ""
  openaiApiKey: ""
  githubToken: ""
|
|
||||||
```
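Values files like the one above are easy to get subtly wrong, for example a resource request larger than its limit or `minReplicas` above `maxReplicas`. The sketch below shows a pre-install sanity check over the `mcpServer` section. It assumes the values have already been loaded into a dictionary (for instance with a YAML parser), handles only the `m`, `Ki`, `Mi`, and `Gi` suffixes used in this guide, and all function names are hypothetical.

```python
MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(value):
    """Convert a CPU quantity such as '500m' or '2' to cores."""
    text = str(value)
    return int(text[:-1]) / 1000 if text.endswith("m") else float(text)


def parse_memory(value):
    """Convert a memory quantity such as '512Mi' or '4Gi' to bytes."""
    text = str(value)
    for suffix, factor in MEMORY_UNITS.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)  # plain byte count


def check_mcp_server(mcp):
    """Return a list of human-readable problems found in the mcpServer values."""
    problems = []
    resources = mcp.get("resources", {})
    requests, limits = resources.get("requests", {}), resources.get("limits", {})
    if "cpu" in requests and "cpu" in limits:
        if parse_cpu(requests["cpu"]) > parse_cpu(limits["cpu"]):
            problems.append("cpu request exceeds limit")
    if "memory" in requests and "memory" in limits:
        if parse_memory(requests["memory"]) > parse_memory(limits["memory"]):
            problems.append("memory request exceeds limit")
    scaling = mcp.get("autoscaling", {})
    if scaling.get("enabled") and scaling.get("minReplicas", 1) > scaling.get("maxReplicas", 1):
        problems.append("minReplicas exceeds maxReplicas")
    return problems


# The mcpServer section from production-values.yaml above.
production = {
    "replicaCount": 3,
    "autoscaling": {"enabled": True, "minReplicas": 3, "maxReplicas": 20},
    "resources": {
        "limits": {"cpu": "2000m", "memory": "4Gi"},
        "requests": {"cpu": "500m", "memory": "1Gi"},
    },
}
print(check_mcp_server(production))
```

An empty list means the section passed these particular checks; it does not replace `helm lint` or a dry-run install.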
|
|
||||||
|
|
||||||
### Method 4: Custom Values Installation
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Create custom values
|
|
||||||
cat > my-values.yaml <<EOF
|
|
||||||
mcpServer:
  replicaCount: 2
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi
secrets:
  anthropicApiKey: "sk-ant-..."
|
|
||||||
EOF
|
|
||||||
|
|
||||||
# Install with custom values
|
|
||||||
helm install my-skill-seekers ./helm/skill-seekers \
|
|
||||||
--namespace skill-seekers \
|
|
||||||
--create-namespace \
|
|
||||||
--values my-values.yaml
|
|
||||||
```
|
|
||||||
|
|
||||||
## Configuration
|
|
||||||
|
|
||||||
### API Keys and Secrets
|
|
||||||
|
|
||||||
**Option 1: Via Helm values (NOT recommended for production)**
|
|
||||||
```bash
|
|
||||||
helm install my-skill-seekers ./helm/skill-seekers \
|
|
||||||
--set secrets.anthropicApiKey="sk-ant-..." \
|
|
||||||
--set secrets.githubToken="ghp_..."
|
|
||||||
```
|
|
||||||
|
|
||||||
**Option 2: Create Secret first (Recommended)**
|
|
||||||
```bash
|
|
||||||
# Create secret
|
|
||||||
kubectl create secret generic skill-seekers-secrets \
|
|
||||||
--from-literal=ANTHROPIC_API_KEY="sk-ant-..." \
|
|
||||||
--from-literal=GITHUB_TOKEN="ghp_..." \
|
|
||||||
--namespace skill-seekers
|
|
||||||
|
|
||||||
# Reference in values
|
|
||||||
# (Chart already uses the secret name pattern)
|
|
||||||
helm install my-skill-seekers ./helm/skill-seekers \
|
|
||||||
--namespace skill-seekers
|
|
||||||
```
|
|
||||||
|
|
||||||
**Option 3: External Secrets Operator**
|
|
||||||
```yaml
|
|
||||||
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: skill-seekers-secrets
  namespace: skill-seekers
spec:
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: skill-seekers-secrets
  data:
    - secretKey: ANTHROPIC_API_KEY
      remoteRef:
        key: skill-seekers/anthropic-api-key
|
|
||||||
```
|
|
||||||
|
|
||||||
### Environment Variables
|
|
||||||
|
|
||||||
Customize via ConfigMap values:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
env:
  MCP_TRANSPORT: "http"
  MCP_PORT: "8765"
  PYTHONUNBUFFERED: "1"
  CUSTOM_VAR: "value"
|
|
||||||
```
|
|
||||||
|
|
||||||
### Resource Limits
|
|
||||||
|
|
||||||
**Development:**
|
|
||||||
```yaml
|
|
||||||
mcpServer:
  resources:
    limits:
      cpu: 1000m
      memory: 2Gi
    requests:
      cpu: 250m
      memory: 512Mi
|
|
||||||
```
|
|
||||||
|
|
||||||
**Production:**
|
|
||||||
```yaml
|
|
||||||
mcpServer:
  resources:
    limits:
      cpu: 4000m
      memory: 8Gi
    requests:
      cpu: 1000m
      memory: 2Gi
|
|
||||||
```
|
|
||||||
|
|
||||||
## Accessing Services
|
|
||||||
|
|
||||||
### Port Forwarding (Development)
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# MCP Server
|
|
||||||
kubectl port-forward -n skill-seekers svc/my-skill-seekers-mcp 8765:8765
|
|
||||||
|
|
||||||
# Weaviate
|
|
||||||
kubectl port-forward -n skill-seekers svc/my-skill-seekers-weaviate 8080:8080
|
|
||||||
|
|
||||||
# Qdrant
|
|
||||||
kubectl port-forward -n skill-seekers svc/my-skill-seekers-qdrant 6333:6333
|
|
||||||
|
|
||||||
# Chroma
|
|
||||||
kubectl port-forward -n skill-seekers svc/my-skill-seekers-chroma 8000:8000
|
|
||||||
```
|
|
||||||
|
|
||||||
### Via LoadBalancer
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
mcpServer:
  service:
    type: LoadBalancer
|
|
||||||
```
|
|
||||||
|
|
||||||
Get external IP:
|
|
||||||
```bash
|
|
||||||
kubectl get svc -n skill-seekers my-skill-seekers-mcp
|
|
||||||
```
|
|
||||||
|
|
||||||
### Via Ingress (Production)
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
ingress:
  enabled: true
  className: nginx
  hosts:
    - host: skill-seekers.example.com
      paths:
        - path: /mcp
          pathType: Prefix
          backend:
            service:
              name: mcp
              port: 8765
|
|
||||||
```
|
|
||||||
|
|
||||||
Access at: `https://skill-seekers.example.com/mcp`
|
|
||||||
|
|
||||||
## Scaling
|
|
||||||
|
|
||||||
### Manual Scaling
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Scale MCP server
|
|
||||||
kubectl scale deployment -n skill-seekers my-skill-seekers-mcp --replicas=5
|
|
||||||
|
|
||||||
# Scale Weaviate
|
|
||||||
kubectl scale deployment -n skill-seekers my-skill-seekers-weaviate --replicas=3
|
|
||||||
```
|
|
||||||
|
|
||||||
### Horizontal Pod Autoscaler
|
|
||||||
|
|
||||||
Enabled by default for MCP server:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
mcpServer:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70
    targetMemoryUtilizationPercentage: 80
|
|
||||||
```
|
|
||||||
|
|
||||||
Monitor HPA:
|
|
||||||
```bash
|
|
||||||
kubectl get hpa -n skill-seekers
|
|
||||||
kubectl describe hpa -n skill-seekers my-skill-seekers-mcp
|
|
||||||
```
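
As a rough illustration of how the autoscaling values above interact, the sketch below applies the replica formula given in the Kubernetes HPA documentation, `ceil(currentReplicas * currentMetric / targetMetric)`, and clamps the result to `minReplicas`/`maxReplicas`. The function name and the sample utilization figures are assumptions for illustration; the real controller also applies a tolerance band and stabilization windows that are omitted here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int = 2,
    max_replicas: int = 10,
) -> int:
    """Approximate the HPA replica calculation for a single metric."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    # The HPA never scales outside the configured bounds.
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical readings: 4 replicas at 90% CPU and 60% memory.
cpu_based = desired_replicas(4, 90, 70)
memory_based = desired_replicas(4, 60, 80)

# With several metrics configured, the largest proposal wins.
print(max(cpu_based, memory_based))
```

Because the larger of the per-metric proposals is used, a CPU spike alone is enough to trigger a scale-up even when memory is well under its 80% target.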
|
|
||||||
|
|
||||||
### Vertical Scaling
|
|
||||||
|
|
||||||
Update resource requests/limits:
|
|
||||||
```bash
|
|
||||||
helm upgrade my-skill-seekers ./helm/skill-seekers \
|
|
||||||
--namespace skill-seekers \
|
|
||||||
--set mcpServer.resources.requests.cpu=2000m \
|
|
||||||
--set mcpServer.resources.requests.memory=4Gi \
|
|
||||||
--reuse-values
|
|
||||||
```
|
|
||||||
|
|
||||||
## Persistence
|
|
||||||
|
|
||||||
### Storage Classes
|
|
||||||
|
|
||||||
Specify storage class for different workloads:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
persistence:
  data:
    storageClass: "fast-ssd"  # Frequently accessed
  output:
    storageClass: "standard"  # Archive storage
  configs:
    storageClass: "fast-ssd"  # Configuration files
|
|
||||||
```
|
|
||||||
|
|
||||||
### PVC Management
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# List PVCs
|
|
||||||
kubectl get pvc -n skill-seekers
|
|
||||||
|
|
||||||
# Expand PVC (if storage class supports it)
|
|
||||||
kubectl patch pvc my-skill-seekers-data \
|
|
||||||
-n skill-seekers \
|
|
||||||
-p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'
|
|
||||||
|
|
||||||
# View PVC details
|
|
||||||
kubectl describe pvc -n skill-seekers my-skill-seekers-data
|
|
||||||
```
|
|
||||||
|
|
||||||
### Backup and Restore
|
|
||||||
|
|
||||||
**Backup:**
|
|
||||||
```bash
|
|
||||||
# Using Velero
|
|
||||||
velero backup create skill-seekers-backup \
|
|
||||||
--include-namespaces skill-seekers
|
|
||||||
|
|
||||||
# Manual backup (example with data PVC)
|
|
||||||
kubectl exec -n skill-seekers deployment/my-skill-seekers-mcp -- \
|
|
||||||
tar czf - /data | \
|
|
||||||
cat > skill-seekers-data-backup.tar.gz
|
|
||||||
```
|
|
||||||
|
|
||||||
**Restore:**
|
|
||||||
```bash
|
|
||||||
# Using Velero
|
|
||||||
velero restore create --from-backup skill-seekers-backup
|
|
||||||
|
|
||||||
# Manual restore
|
|
||||||
kubectl exec -i -n skill-seekers deployment/my-skill-seekers-mcp -- \
|
|
||||||
tar xzf - -C / < skill-seekers-data-backup.tar.gz
|
|
||||||
```
|
|
||||||
|
|
||||||
## Vector Databases
|
|
||||||
|
|
||||||
### Weaviate
|
|
||||||
|
|
||||||
**Access:**
|
|
||||||
```bash
|
|
||||||
kubectl port-forward -n skill-seekers svc/my-skill-seekers-weaviate 8080:8080
|
|
||||||
```
|
|
||||||
|
|
||||||
**Query:**
|
|
||||||
```bash
|
|
||||||
curl http://localhost:8080/v1/schema
|
|
||||||
```
|
|
||||||
|
|
||||||
### Qdrant
|
|
||||||
|
|
||||||
**Access:**
|
|
||||||
```bash
|
|
||||||
# HTTP API
|
|
||||||
kubectl port-forward -n skill-seekers svc/my-skill-seekers-qdrant 6333:6333
|
|
||||||
|
|
||||||
# gRPC
|
|
||||||
kubectl port-forward -n skill-seekers svc/my-skill-seekers-qdrant 6334:6334
|
|
||||||
```
|
|
||||||
|
|
||||||
**Query:**
|
|
||||||
```bash
|
|
||||||
curl http://localhost:6333/collections
|
|
||||||
```
|
|
||||||
|
|
||||||
### Chroma
|
|
||||||
|
|
||||||
**Access:**
|
|
||||||
```bash
|
|
||||||
kubectl port-forward -n skill-seekers svc/my-skill-seekers-chroma 8000:8000
|
|
||||||
```
|
|
||||||
|
|
||||||
**Query:**
|
|
||||||
```bash
|
|
||||||
curl http://localhost:8000/api/v1/collections
|
|
||||||
```
|
|
||||||
|
|
||||||
### Disable Vector Databases
|
|
||||||
|
|
||||||
To disable individual vector databases:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
vectorDatabases:
  weaviate:
    enabled: false
  qdrant:
    enabled: false
  chroma:
    enabled: false
|
|
||||||
```
|
|
||||||
|
|
||||||
## Security
|
|
||||||
|
|
||||||
### Pod Security Context
|
|
||||||
|
|
||||||
Runs as non-root user (UID 1000):
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

securityContext:
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: false
  allowPrivilegeEscalation: false
|
|
||||||
```
|
|
||||||
|
|
||||||
### Network Policies
|
|
||||||
|
|
||||||
Create network policies for isolation:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
networkPolicy:
  enabled: true
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
  egress:
    - to:
        - namespaceSelector: {}
|
|
||||||
```
|
|
||||||
|
|
||||||
### RBAC
|
|
||||||
|
|
||||||
Enable RBAC with minimal permissions:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
rbac:
  create: true
  rules:
    - apiGroups: [""]
      resources: ["configmaps", "secrets"]
      verbs: ["get", "list"]
|
|
||||||
```
|
|
||||||
|
|
||||||
### Secrets Management
|
|
||||||
|
|
||||||
**Best Practices:**
|
|
||||||
1. Never commit secrets to git
|
|
||||||
2. Use external secret managers (AWS Secrets Manager, HashiCorp Vault)
|
|
||||||
3. Enable encryption at rest in Kubernetes
|
|
||||||
4. Rotate secrets regularly
|
|
||||||
|
|
||||||
**Example with Sealed Secrets:**
|
|
||||||
```bash
|
|
||||||
# Create sealed secret
|
|
||||||
kubectl create secret generic skill-seekers-secrets --namespace skill-seekers \
|
|
||||||
--from-literal=ANTHROPIC_API_KEY="sk-ant-..." \
|
|
||||||
--dry-run=client -o yaml | \
|
|
||||||
kubeseal -o yaml > sealed-secret.yaml
|
|
||||||
|
|
||||||
# Apply sealed secret
|
|
||||||
kubectl apply -f sealed-secret.yaml -n skill-seekers
|
|
||||||
```
|
|
||||||
|
|
||||||
## Monitoring
|
|
||||||
|
|
||||||
### Pod Metrics
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# View pod status
|
|
||||||
kubectl get pods -n skill-seekers
|
|
||||||
|
|
||||||
# View pod metrics (requires metrics-server)
|
|
||||||
kubectl top pods -n skill-seekers
|
|
||||||
|
|
||||||
# View pod logs
|
|
||||||
kubectl logs -n skill-seekers -l app.kubernetes.io/component=mcp-server --tail=100 -f
|
|
||||||
```
|
|
||||||
|
|
||||||
### Prometheus Integration
|
|
||||||
|
|
||||||
Enable ServiceMonitor (requires Prometheus Operator):
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
serviceMonitor:
  enabled: true
  interval: 30s
  scrapeTimeout: 10s
  labels:
    prometheus: kube-prometheus
|
|
||||||
```
|
|
||||||
|
|
||||||
### Grafana Dashboards
|
|
||||||
|
|
||||||
Import dashboard JSON from `helm/skill-seekers/dashboards/`.
|
|
||||||
|
|
||||||
### Health Checks
|
|
||||||
|
|
||||||
MCP server has built-in health checks:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
livenessProbe:
  httpGet:
    path: /health
    port: 8765
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /health
    port: 8765
  initialDelaySeconds: 10
  periodSeconds: 5
|
|
||||||
```
|
|
||||||
|
|
||||||
Test manually:
|
|
||||||
```bash
|
|
||||||
kubectl exec -n skill-seekers deployment/my-skill-seekers-mcp -- \
|
|
||||||
curl http://localhost:8765/health
|
|
||||||
```
|
|
||||||
|
|
||||||
## Troubleshooting
|
|
||||||
|
|
||||||
### Pods Not Starting
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Check pod status
|
|
||||||
kubectl get pods -n skill-seekers
|
|
||||||
|
|
||||||
# View events
|
|
||||||
kubectl get events -n skill-seekers --sort-by='.lastTimestamp'
|
|
||||||
|
|
||||||
# Describe pod
|
|
||||||
kubectl describe pod -n skill-seekers <pod-name>
|
|
||||||
|
|
||||||
# Check logs
|
|
||||||
kubectl logs -n skill-seekers <pod-name>
|
|
||||||
```
|
|
||||||
|
|
||||||
### Common Issues
|
|
||||||
|
|
||||||
**Issue: ImagePullBackOff**
|
|
||||||
```bash
|
|
||||||
# Check image pull secrets
|
|
||||||
kubectl get secrets -n skill-seekers
|
|
||||||
|
|
||||||
# Verify image exists
|
|
||||||
docker pull <image-name>
|
|
||||||
```
|
|
||||||
|
|
||||||
**Issue: CrashLoopBackOff**
|
|
||||||
```bash
|
|
||||||
# View logs from the previous (crashed) container instance
|
|
||||||
kubectl logs -n skill-seekers <pod-name> --previous
|
|
||||||
|
|
||||||
# Check environment variables
|
|
||||||
kubectl exec -n skill-seekers <pod-name> -- env
|
|
||||||
```
|
|
||||||
|
|
||||||
**Issue: PVC Pending**
|
|
||||||
```bash
|
|
||||||
# Check storage class
|
|
||||||
kubectl get storageclass
|
|
||||||
|
|
||||||
# View PVC events
|
|
||||||
kubectl describe pvc -n skill-seekers <pvc-name>
|
|
||||||
|
|
||||||
# Check if provisioner is running
|
|
||||||
kubectl get pods -n kube-system | grep provisioner
|
|
||||||
```
|
|
||||||
|
|
||||||
**Issue: API Key Not Working**
|
|
||||||
```bash
|
|
||||||
# Verify secret exists
|
|
||||||
kubectl get secret -n skill-seekers my-skill-seekers
|
|
||||||
|
|
||||||
# Check secret contents (base64 encoded)
|
|
||||||
kubectl get secret -n skill-seekers my-skill-seekers -o yaml
|
|
||||||
|
|
||||||
# Confirm the API key is present in the container environment
|
|
||||||
kubectl exec -n skill-seekers deployment/my-skill-seekers-mcp -- \
|
|
||||||
env | grep ANTHROPIC
|
|
||||||
```
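
Since `kubectl get secret ... -o yaml` prints values base64-encoded, a small helper can make the comparison with the expected key easier. The sketch below assumes the secret has been fetched with `-o json` instead; the function names and the sample payload are hypothetical, and the output is masked so the full key is not echoed to the terminal.

```python
import base64
import json


def decode_secret_data(secret_json: str) -> dict[str, str]:
    """Decode the base64 `data` map of a Kubernetes Secret given as JSON."""
    secret = json.loads(secret_json)
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in (secret.get("data") or {}).items()
    }


def mask(value: str, visible: int = 6) -> str:
    """Keep only the first few characters so the key can be identified safely."""
    return value[:visible] + "*" * max(len(value) - visible, 0)


# Hypothetical stand-in for: kubectl get secret -n skill-seekers my-skill-seekers -o json
sample = json.dumps(
    {
        "kind": "Secret",
        "data": {
            "ANTHROPIC_API_KEY": base64.b64encode(b"sk-ant-example").decode("ascii")
        },
    }
)

decoded = decode_secret_data(sample)
print({name: mask(value) for name, value in decoded.items()})
```

A key that decodes with trailing whitespace or a newline is a common cause of authentication failures, so it can be worth checking `repr()` of the decoded value as well.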
|
|
||||||
|
|
||||||
### Debug Container
|
|
||||||
|
|
||||||
Run a debug container in the same namespace:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
kubectl run debug -n skill-seekers --rm -it \
|
|
||||||
--image=nicolaka/netshoot \
|
|
||||||
--restart=Never -- bash
|
|
||||||
|
|
||||||
# Inside debug container:
|
|
||||||
# Test MCP server connectivity
|
|
||||||
curl http://my-skill-seekers-mcp:8765/health
|
|
||||||
|
|
||||||
# Test vector database connectivity
|
|
||||||
curl http://my-skill-seekers-weaviate:8080/v1/.well-known/ready
|
|
||||||
```
|
|
||||||
|
|
||||||
## Production Best Practices
|
|
||||||
|
|
||||||
### 1. Resource Planning
|
|
||||||
|
|
||||||
**Capacity Planning:**
|
|
||||||
- MCP Server: 500m CPU + 1Gi RAM per 10 concurrent requests
|
|
||||||
- Vector DBs: 2GB RAM + 10GB storage per 100K documents
|
|
||||||
- Reserve 30% overhead for spikes
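
The rules of thumb above can be turned into a quick estimate. The following is a minimal sketch, assuming load is rounded up to blocks of 10 concurrent requests and the 30% overhead is applied to the total; the function name and the MiB conversion are illustrative choices, not part of the chart.

```python
import math

CPU_MILLICORES_PER_BLOCK = 500   # per 10 concurrent requests
MEMORY_MIB_PER_BLOCK = 1024      # 1Gi per 10 concurrent requests
OVERHEAD_PERCENT = 30            # headroom for spikes


def mcp_capacity(concurrent_requests: int) -> tuple[int, int]:
    """Return (CPU millicores, memory MiB) for the MCP server tier."""
    blocks = math.ceil(concurrent_requests / 10)
    factor = 100 + OVERHEAD_PERCENT
    cpu = math.ceil(blocks * CPU_MILLICORES_PER_BLOCK * factor / 100)
    memory = math.ceil(blocks * MEMORY_MIB_PER_BLOCK * factor / 100)
    return cpu, memory


cpu_m, memory_mib = mcp_capacity(50)
print(f"{cpu_m}m CPU, {memory_mib}Mi memory")
```

For 50 concurrent requests this gives 2500m CPU and 5Gi before overhead, and 3250m CPU and 6656Mi (6.5Gi) once the 30% reserve is added.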
|
|
||||||
|
|
||||||
**Example Production Setup:**
|
|
||||||
```yaml
|
|
||||||
mcpServer:
  replicaCount: 5  # Handle 50 concurrent requests
  resources:
    requests:
      cpu: 2500m
      memory: 5Gi
  autoscaling:
    minReplicas: 5
    maxReplicas: 20
|
|
||||||
```
|
|
||||||
|
|
||||||
### 2. High Availability
|
|
||||||
|
|
||||||
**Anti-Affinity Rules:**
|
|
||||||
```yaml
|
|
||||||
mcpServer:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/component
                operator: In
                values:
                  - mcp-server
          topologyKey: kubernetes.io/hostname
|
|
||||||
```
|
|
||||||
|
|
||||||
**Multiple Replicas:**
|
|
||||||
- MCP Server: 3+ replicas across different nodes
|
|
||||||
- Vector DBs: 2+ replicas with replication
|
|
||||||
|
|
||||||
### 3. Monitoring and Alerting
|
|
||||||
|
|
||||||
**Key Metrics to Monitor:**
|
|
||||||
- Pod restart count (> 5 per hour = critical)
|
|
||||||
- Memory usage (> 90% = warning)
|
|
||||||
- CPU throttling (> 50% = investigate)
|
|
||||||
- Request latency (p95 > 1s = warning)
|
|
||||||
- Error rate (> 1% = critical)
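
The thresholds in this list can be encoded as data so that the same table drives dashboards, alerts, or ad-hoc checks. This is a minimal sketch; the metric key names and the `evaluate` helper are assumptions made for illustration and do not correspond to metrics exported by the chart.

```python
# (metric name, threshold, action when the value exceeds the threshold)
THRESHOLDS = [
    ("pod_restarts_per_hour", 5, "critical"),
    ("memory_usage_percent", 90, "warning"),
    ("cpu_throttling_percent", 50, "investigate"),
    ("p95_latency_seconds", 1.0, "warning"),
    ("error_rate_percent", 1.0, "critical"),
]


def evaluate(sample: dict[str, float]) -> dict[str, str]:
    """Return the action for every metric in `sample` that exceeds its threshold."""
    return {
        name: action
        for name, limit, action in THRESHOLDS
        if sample.get(name, 0) > limit
    }


# Hypothetical readings from one scrape interval.
findings = evaluate(
    {
        "pod_restarts_per_hour": 6,
        "memory_usage_percent": 85,
        "p95_latency_seconds": 1.4,
        "error_rate_percent": 1.0,
    }
)
print(findings)
```

The comparisons are strict, matching the ">" wording of the list, so an error rate of exactly 1% does not trigger.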
|
|
||||||
|
|
||||||
**Prometheus Alerts:**
|
|
||||||
```yaml
|
|
||||||
- alert: HighPodRestarts
  expr: increase(kube_pod_container_status_restarts_total{namespace="skill-seekers"}[1h]) > 5
  for: 5m
  labels:
    severity: warning
|
|
||||||
```
|
|
||||||
|
|
||||||
### 4. Backup Strategy
|
|
||||||
|
|
||||||
**Automated Backups:**
|
|
||||||
```yaml
|
|
||||||
# CronJob for daily backups
apiVersion: batch/v1
kind: CronJob
metadata:
  name: skill-seekers-backup
spec:
  schedule: "0 2 * * *"  # 2 AM daily
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure  # Job pods must use OnFailure or Never
          containers:
            - name: backup
              image: skill-seekers:latest
              command:
                - /bin/sh
                - -c
                - tar czf /backup/data-$(date +%Y%m%d).tar.gz /data
|
|
||||||
```
|
|
||||||
|
|
||||||
### 5. Security Hardening
|
|
||||||
|
|
||||||
**Security Checklist:**
|
|
||||||
- [ ] Enable Pod Security Standards
|
|
||||||
- [ ] Use Network Policies
|
|
||||||
- [ ] Enable RBAC with least privilege
|
|
||||||
- [ ] Rotate secrets every 90 days
|
|
||||||
- [ ] Scan images for vulnerabilities
|
|
||||||
- [ ] Enable audit logging
|
|
||||||
- [ ] Use private container registry
|
|
||||||
- [ ] Enable encryption at rest
|
|
||||||
|
|
||||||
### 6. Cost Optimization
|
|
||||||
|
|
||||||
**Strategies:**
|
|
||||||
- Use spot/preemptible instances for non-critical workloads
|
|
||||||
- Enable cluster autoscaler
|
|
||||||
- Right-size resource requests
|
|
||||||
- Use storage tiering (hot/warm/cold)
|
|
||||||
- Schedule downscaling during off-hours
|
|
||||||
|
|
||||||
**Example Cost Optimization:**
|
|
||||||
```yaml
|
|
||||||
# Development environment: downscale at night
# Create CronJob to scale down replicas
apiVersion: batch/v1
kind: CronJob
metadata:
  name: downscale-dev
spec:
  schedule: "0 20 * * *"  # 8 PM
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler
          restartPolicy: OnFailure  # Job pods must use OnFailure or Never
          containers:
            - name: kubectl
              image: bitnami/kubectl
              command:
                - kubectl
                - scale
                - deployment
                - my-skill-seekers-mcp
                - --replicas=1
|
|
||||||
```
|
|
||||||
|
|
||||||
### 7. Update Strategy
|
|
||||||
|
|
||||||
**Rolling Updates:**
|
|
||||||
```yaml
|
|
||||||
mcpServer:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
|
|
||||||
```
|
|
||||||
|
|
||||||
**Update Process:**
|
|
||||||
```bash
|
|
||||||
# 1. Test in staging
|
|
||||||
helm upgrade my-skill-seekers ./helm/skill-seekers \
|
|
||||||
--namespace skill-seekers-staging \
|
|
||||||
--values staging-values.yaml
|
|
||||||
|
|
||||||
# 2. Run smoke tests
|
|
||||||
./scripts/smoke-test.sh
|
|
||||||
|
|
||||||
# 3. Deploy to production
|
|
||||||
helm upgrade my-skill-seekers ./helm/skill-seekers \
|
|
||||||
--namespace skill-seekers \
|
|
||||||
--values production-values.yaml
|
|
||||||
|
|
||||||
# 4. Monitor for 15 minutes
|
|
||||||
kubectl rollout status deployment -n skill-seekers my-skill-seekers-mcp
|
|
||||||
|
|
||||||
# 5. Rollback if issues
|
|
||||||
helm rollback my-skill-seekers -n skill-seekers
|
|
||||||
```
|
|
||||||
|
|
||||||
## Upgrade Guide
|
|
||||||
|
|
||||||
### Minor Version Upgrade
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Fetch latest chart
|
|
||||||
helm repo update
|
|
||||||
|
|
||||||
# Upgrade with existing values
|
|
||||||
helm upgrade my-skill-seekers skill-seekers/skill-seekers \
|
|
||||||
--namespace skill-seekers \
|
|
||||||
--reuse-values
|
|
||||||
```
|
|
||||||
|
|
||||||
### Major Version Upgrade
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Backup current values
|
|
||||||
helm get values my-skill-seekers -n skill-seekers > backup-values.yaml
|
|
||||||
|
|
||||||
# Review CHANGELOG for breaking changes
|
|
||||||
curl https://raw.githubusercontent.com/yourusername/skill-seekers/main/CHANGELOG.md
|
|
||||||
|
|
||||||
# Upgrade with migration steps
|
|
||||||
helm upgrade my-skill-seekers skill-seekers/skill-seekers \
|
|
||||||
--namespace skill-seekers \
|
|
||||||
--values backup-values.yaml \
|
|
||||||
--force # Only if schema changed
|
|
||||||
```
|
|
||||||
|
|
||||||
## Uninstallation
|
|
||||||
|
|
||||||
### Full Cleanup
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Delete Helm release
|
|
||||||
helm uninstall my-skill-seekers -n skill-seekers
|
|
||||||
|
|
||||||
# Delete PVCs (if you want to remove data)
|
|
||||||
kubectl delete pvc -n skill-seekers --all
|
|
||||||
|
|
||||||
# Delete namespace
|
|
||||||
kubectl delete namespace skill-seekers
|
|
||||||
```
|
|
||||||
|
|
||||||
### Keep Data
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Delete release but keep PVCs
|
|
||||||
helm uninstall my-skill-seekers -n skill-seekers
|
|
||||||
|
|
||||||
# PVCs remain for later use
|
|
||||||
kubectl get pvc -n skill-seekers
|
|
||||||
```
|
|
||||||
|
|
||||||
## Additional Resources
|
|
||||||
|
|
||||||
- [Helm Documentation](https://helm.sh/docs/)
|
|
||||||
- [Kubernetes Documentation](https://kubernetes.io/docs/)
|
|
||||||
- [Skill Seekers GitHub](https://github.com/yourusername/skill-seekers)
|
|
||||||
- [Issue Tracker](https://github.com/yourusername/skill-seekers/issues)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
**Need Help?**
|
|
||||||
- GitHub Issues: https://github.com/yourusername/skill-seekers/issues
|
|
||||||
- Documentation: https://skillseekersweb.com
|
|
||||||
- Community: [Link to Discord/Slack]
|
|
||||||
@@ -239,7 +239,7 @@ skill-seekers-mcp
|
|||||||
skill-seekers-mcp --transport http --port 8765
|
skill-seekers-mcp --transport http --port 8765
|
||||||
```
|
```
|
||||||
|
|
||||||
### MCP Tools (18 total)
|
### MCP Tools (26 total)
|
||||||
|
|
||||||
**Core Tools:**
|
**Core Tools:**
|
||||||
1. `list_configs` - List preset configurations
|
1. `list_configs` - List preset configurations
|
||||||
@@ -286,7 +286,7 @@ export GITHUB_TOKEN=ghp_...
|
|||||||
## Testing
|
## Testing
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Run all tests (1200+)
|
# Run all tests (1,880+)
|
||||||
pytest tests/ -v
|
pytest tests/ -v
|
||||||
|
|
||||||
# Run with coverage
|
# Run with coverage
|
||||||
@@ -463,4 +463,4 @@ skill-seekers validate-config configs/my-config.json
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
**Version:** 3.1.0-dev | **Test Count:** 1880+ | **Platforms:** Claude, Gemini, OpenAI, Markdown
|
**Version:** 3.1.0-dev | **Test Count:** 1,880+ | **MCP Tools:** 26 | **Platforms:** 16+ (Claude, Gemini, OpenAI, LangChain, LlamaIndex, ChromaDB, FAISS, Cursor, Windsurf, and more)
|
||||||
|
|||||||
@@ -1,9 +1,9 @@
|
|||||||
# Bootstrap Skill - Self-Hosting (v2.7.0)
|
# Bootstrap Skill - Self-Hosting (v3.1.0-dev)
|
||||||
|
|
||||||
**Version:** 2.7.0
|
**Version:** 3.1.0-dev
|
||||||
**Feature:** Bootstrap Skill (Dogfooding)
|
**Feature:** Bootstrap Skill (Dogfooding)
|
||||||
**Status:** ✅ Production Ready
|
**Status:** ✅ Production Ready
|
||||||
**Last Updated:** 2026-01-18
|
**Last Updated:** 2026-02-18
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -691,6 +691,6 @@ echo "✅ Validation passed"
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
**Version:** 2.7.0
|
**Version:** 3.1.0-dev
|
||||||
**Last Updated:** 2026-01-18
|
**Last Updated:** 2026-02-18
|
||||||
**Status:** ✅ Production Ready
|
**Status:** ✅ Production Ready
|
||||||
|
|||||||
@@ -1,422 +0,0 @@
|
|||||||
# Task #19 Complete: MCP Server Integration for Vector Databases
|
|
||||||
|
|
||||||
**Completion Date:** February 7, 2026
|
|
||||||
**Status:** ✅ Complete
|
|
||||||
**Tests:** 8/8 passing
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Objective
|
|
||||||
|
|
||||||
Extend the MCP server to expose the 4 new vector database adaptors (Weaviate, Chroma, FAISS, Qdrant) as MCP tools, enabling Claude AI assistants to export skills directly to vector databases.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Implementation Summary
|
|
||||||
|
|
||||||
### Files Created
|
|
||||||
|
|
||||||
1. **src/skill_seekers/mcp/tools/vector_db_tools.py** (500+ lines)
|
|
||||||
- 4 async implementation functions
|
|
||||||
- Comprehensive docstrings with examples
|
|
||||||
- Error handling for missing directories/adaptors
|
|
||||||
- Usage instructions with code examples
|
|
||||||
- Links to official documentation
|
|
||||||
|
|
||||||
2. **tests/test_mcp_vector_dbs.py** (274 lines)
|
|
||||||
- 8 comprehensive test cases
|
|
||||||
- Test fixtures for skill directories
|
|
||||||
- Validation of exports, error handling, and output format
|
|
||||||
- All tests passing (8/8)
|
|
||||||
|
|
||||||
### Files Modified
|
|
||||||
|
|
||||||
1. **src/skill_seekers/mcp/tools/__init__.py**
|
|
||||||
- Added vector_db_tools module to docstring
|
|
||||||
- Imported 4 new tool implementations
|
|
||||||
- Added to __all__ exports
|
|
||||||
|
|
||||||
2. **src/skill_seekers/mcp/server_fastmcp.py**
|
|
||||||
- Updated docstring from "21 tools" to "25 tools"
|
|
||||||
- Added 6th category: "Vector Database tools"
|
|
||||||
- Imported 4 new implementations (both try/except blocks)
|
|
||||||
- Registered 4 new tools with @safe_tool_decorator
|
|
||||||
- Added VECTOR DATABASE TOOLS section (125 lines)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## New MCP Tools
|
|
||||||
|
|
||||||
### 1. export_to_weaviate
|
|
||||||
|
|
||||||
**Description:** Export skill to Weaviate vector database format (hybrid search, 450K+ users)
|
|
||||||
|
|
||||||
**Parameters:**
|
|
||||||
- `skill_dir` (str): Path to skill directory
|
|
||||||
- `output_dir` (str, optional): Output directory
|
|
||||||
|
|
||||||
**Output:** JSON file with Weaviate schema, objects, and configuration
|
|
||||||
|
|
||||||
**Usage Instructions Include:**
|
|
||||||
- Python code for uploading to Weaviate
|
|
||||||
- Hybrid search query examples
|
|
||||||
- Links to Weaviate documentation
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 2. export_to_chroma
|
|
||||||
|
|
||||||
**Description:** Export skill to Chroma vector database format (local-first, 800K+ developers)
|
|
||||||
|
|
||||||
**Parameters:**
|
|
||||||
- `skill_dir` (str): Path to skill directory
|
|
||||||
- `output_dir` (str, optional): Output directory
|
|
||||||
|
|
||||||
**Output:** JSON file with Chroma collection data
|
|
||||||
|
|
||||||
**Usage Instructions Include:**
|
|
||||||
- Python code for loading into Chroma
|
|
||||||
- Query collection examples
|
|
||||||
- Links to Chroma documentation
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 3. export_to_faiss
|
|
||||||
|
|
||||||
**Description:** Export skill to FAISS vector index format (billion-scale, GPU-accelerated)
|
|
||||||
|
|
||||||
**Parameters:**
|
|
||||||
- `skill_dir` (str): Path to skill directory
|
|
||||||
- `output_dir` (str, optional): Output directory
|
|
||||||
|
|
||||||
**Output:** JSON file with FAISS embeddings, metadata, and index config
|
|
||||||
|
|
||||||
**Usage Instructions Include:**
|
|
||||||
- Python code for building FAISS index (Flat, IVF, HNSW options)
|
|
||||||
- Search examples
|
|
||||||
- Index saving/loading
|
|
||||||
- Links to FAISS documentation
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 4. export_to_qdrant
|
|
||||||
|
|
||||||
**Description:** Export skill to Qdrant vector database format (native filtering, 100K+ users)
|
|
||||||
|
|
||||||
**Parameters:**
|
|
||||||
- `skill_dir` (str): Path to skill directory
|
|
||||||
- `output_dir` (str, optional): Output directory
|
|
||||||
|
|
||||||
**Output:** JSON file with Qdrant collection data and points
|
|
||||||
|
|
||||||
**Usage Instructions Include:**
|
|
||||||
- Python code for uploading to Qdrant
|
|
||||||
- Search with filters examples
|
|
||||||
- Links to Qdrant documentation
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Test Coverage
|
|
||||||
|
|
||||||
### Test Cases (8/8 passing)
|
|
||||||
|
|
||||||
1. **test_export_to_weaviate** - Validates Weaviate export with output verification
|
|
||||||
2. **test_export_to_chroma** - Validates Chroma export with output verification
|
|
||||||
3. **test_export_to_faiss** - Validates FAISS export with output verification
|
|
||||||
4. **test_export_to_qdrant** - Validates Qdrant export with output verification
|
|
||||||
5. **test_export_with_default_output_dir** - Tests default output directory behavior
|
|
||||||
6. **test_export_missing_skill_dir** - Validates error handling for missing directories
|
|
||||||
7. **test_all_exports_create_files** - Validates file creation for all 4 exports
|
|
||||||
8. **test_export_output_includes_instructions** - Validates usage instructions in output
|
|
||||||
|
|
||||||
### Test Results
|
|
||||||
|
|
||||||
```
|
|
||||||
tests/test_mcp_vector_dbs.py::test_export_to_weaviate PASSED
|
|
||||||
tests/test_mcp_vector_dbs.py::test_export_to_chroma PASSED
|
|
||||||
tests/test_mcp_vector_dbs.py::test_export_to_faiss PASSED
|
|
||||||
tests/test_mcp_vector_dbs.py::test_export_to_qdrant PASSED
|
|
||||||
tests/test_mcp_vector_dbs.py::test_export_with_default_output_dir PASSED
|
|
||||||
tests/test_mcp_vector_dbs.py::test_export_missing_skill_dir PASSED
|
|
||||||
tests/test_mcp_vector_dbs.py::test_all_exports_create_files PASSED
|
|
||||||
tests/test_mcp_vector_dbs.py::test_export_output_includes_instructions PASSED
|
|
||||||
|
|
||||||
8 passed in 0.35s
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Integration Architecture
|
|
||||||
|
|
||||||
### MCP Server Structure
|
|
||||||
|
|
||||||
```
|
|
||||||
MCP Server (25 tools, 6 categories)
├── Config tools (3)
├── Scraping tools (8)
├── Packaging tools (4)
├── Splitting tools (2)
├── Source tools (4)
└── Vector Database tools (4) ← NEW
    ├── export_to_weaviate
    ├── export_to_chroma
    ├── export_to_faiss
    └── export_to_qdrant
|
|
||||||
```
|
|
||||||
|
|
||||||
### Tool Implementation Pattern
|
|
||||||
|
|
||||||
Each tool follows the FastMCP pattern:
|
|
||||||
|
|
||||||
```python
|
|
||||||
@safe_tool_decorator(description="...")
async def export_to_<target>(
    skill_dir: str,
    output_dir: str | None = None,
) -> str:
    """Tool docstring with args and returns."""
    args = {"skill_dir": skill_dir}
    if output_dir:
        args["output_dir"] = output_dir

    result = await export_to_<target>_impl(args)
    if isinstance(result, list) and result:
        return result[0].text if hasattr(result[0], "text") else str(result[0])
    return str(result)
|
|
||||||
```
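
The last three lines of the pattern normalize whatever the implementation returns into a plain string. The standalone sketch below isolates that step so its behaviour on different return shapes is easy to see; `FakeTextContent` and `unwrap_tool_result` are hypothetical names used only for this illustration and are not part of the Skill Seekers codebase.

```python
from dataclasses import dataclass


@dataclass
class FakeTextContent:
    """Stand-in for an MCP text content object exposing a `.text` attribute."""

    text: str


def unwrap_tool_result(result: object) -> str:
    """Mirror the pattern's result handling: first item's text, else str()."""
    if isinstance(result, list) and result:
        first = result[0]
        return first.text if hasattr(first, "text") else str(first)
    return str(result)


print(unwrap_tool_result([FakeTextContent("Export complete")]))
print(unwrap_tool_result(["plain string item"]))
print(unwrap_tool_result([]))
```

Note that only the first list element is returned; if an implementation ever produced several content items, the remaining ones would be dropped by this pattern.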
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Usage Examples
|
|
||||||
|
|
||||||
### Claude Desktop MCP Config
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
  "mcpServers": {
    "skill-seeker": {
      "command": "python",
      "args": ["-m", "skill_seekers.mcp.server_fastmcp"]
    }
  }
}
|
|
||||||
```
|
|
||||||
|
|
||||||
### Using Vector Database Tools
|
|
||||||
|
|
||||||
**Example 1: Export to Weaviate**
|
|
||||||
|
|
||||||
```
|
|
||||||
export_to_weaviate(
|
|
||||||
skill_dir="output/react",
|
|
||||||
output_dir="output"
|
|
||||||
)
|
|
||||||
```
|
|
||||||
|
|
||||||
**Example 2: Export to Chroma with default output**
|
|
||||||
|
|
||||||
```
|
|
||||||
export_to_chroma(skill_dir="output/django")
|
|
||||||
```
|
|
||||||
|
|
||||||
**Example 3: Export to FAISS**
|
|
||||||
|
|
||||||
```
|
|
||||||
export_to_faiss(
|
|
||||||
skill_dir="output/fastapi",
|
|
||||||
output_dir="/tmp/exports"
|
|
||||||
)
|
|
||||||
```
|
|
||||||
|
|
||||||
**Example 4: Export to Qdrant**
|
|
||||||
|
|
||||||
```
|
|
||||||
export_to_qdrant(skill_dir="output/vue")
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Output Format Example
|
|
||||||
|
|
||||||
Each tool returns comprehensive instructions:
|
|
||||||
|
|
||||||
```
|
|
||||||
✅ Weaviate Export Complete!
|
|
||||||
|
|
||||||
📦 Package: react-weaviate.json
|
|
||||||
📁 Location: output/
|
|
||||||
📊 Size: 45,678 bytes
|
|
||||||
|
|
||||||
🔧 Next Steps:
|
|
||||||
1. Upload to Weaviate:
|
|
||||||
```python
|
|
||||||
import weaviate
|
|
||||||
import json
|
|
||||||
|
|
||||||
client = weaviate.Client("http://localhost:8080")
|
|
||||||
data = json.load(open("output/react-weaviate.json"))
|
|
||||||
|
|
||||||
# Create schema
|
|
||||||
client.schema.create_class(data["schema"])
|
|
||||||
|
|
||||||
# Batch upload objects
|
|
||||||
with client.batch as batch:
    for obj in data["objects"]:
        batch.add_data_object(obj["properties"], data["class_name"])
|
|
||||||
```
|
|
||||||
|
|
||||||
2. Query with hybrid search:
|
|
||||||
```python
|
|
||||||
result = client.query.get(data["class_name"], ["content", "source"]) \
    .with_hybrid("React hooks usage") \
    .with_limit(5) \
    .do()
|
|
||||||
```
|
|
||||||
|
|
||||||
📚 Resources:
|
|
||||||
- Weaviate Docs: https://weaviate.io/developers/weaviate
|
|
||||||
- Hybrid Search: https://weaviate.io/developers/weaviate/search/hybrid
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Technical Achievements
|
|
||||||
|
|
||||||
### 1. Consistent Interface
|
|
||||||
|
|
||||||
All 4 tools share the same interface:
|
|
||||||
- Same parameter structure
|
|
||||||
- Same error handling pattern
|
|
||||||
- Same output format (TextContent with detailed instructions)
|
|
||||||
- Same integration with existing adaptors
|
|
||||||
|
|
||||||
### 2. Comprehensive Documentation
|
|
||||||
|
|
||||||
Each tool includes:
|
|
||||||
- Clear docstrings with parameter descriptions
|
|
||||||
- Usage examples in output
|
|
||||||
- Python code snippets for uploading
|
|
||||||
- Query examples for searching
|
|
||||||
- Links to official documentation
|
|
||||||
|
|
||||||
### 3. Robust Error Handling
|
|
||||||
|
|
||||||
- Missing skill directory detection
|
|
||||||
- Adaptor import failure handling
|
|
||||||
- Graceful fallback for missing dependencies
|
|
||||||
- Clear error messages with suggestions
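
The failure paths listed above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `export_skill` and its message strings are hypothetical, and the only names taken from this document are the `skill_seekers.cli.adaptors.get_adaptor` import and the `adaptor.package(skill_dir, output_dir)` call.

```python
from pathlib import Path
from typing import Optional


def export_skill(skill_dir: str, target: str, output_dir: Optional[str] = None) -> str:
    """Hypothetical wrapper showing the failure paths described above."""
    skill_path = Path(skill_dir)

    # 1. Missing skill directory: fail early and suggest a next step.
    if not skill_path.is_dir():
        return f"❌ Skill directory not found: {skill_dir}\nRun a scrape first to create it."

    # 2. Adaptor import failure: report it instead of crashing the server.
    try:
        from skill_seekers.cli.adaptors import get_adaptor  # import path as used in this document
    except ImportError as exc:
        return f"❌ Could not import adaptors: {exc}\nCheck that skill-seekers is installed."

    # 3. Anything the adaptor raises becomes a readable message.
    try:
        adaptor = get_adaptor(target)
        package = adaptor.package(skill_path, Path(output_dir) if output_dir else skill_path.parent)
    except Exception as exc:
        return f"❌ Export to {target} failed: {exc}"

    return f"✅ {target} export complete: {package}"
```

Returning a message rather than raising keeps the response shape the same on success and failure, which matches the "same output format" goal stated earlier.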
|
|
||||||
|
|
||||||
### 4. Complete Test Coverage
|
|
||||||
|
|
||||||
- 8 test cases covering all scenarios
|
|
||||||
- Fixture-based test setup for reusability
|
|
||||||
- Validation of structure, content, and files
|
|
||||||
- Error case testing
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Impact
|
|
||||||
|
|
||||||
### MCP Server Expansion
|
|
||||||
|
|
||||||
- **Before:** 21 tools across 5 categories
|
|
||||||
- **After:** 25 tools across 6 categories (+19% growth)
|
|
||||||
- **New Capability:** Direct vector database export from MCP
|
|
||||||
|
|
||||||
### Vector Database Support
|
|
||||||
|
|
||||||
- **Weaviate:** Hybrid search (vector + BM25), 450K+ users
|
|
||||||
- **Chroma:** Local-first development, 800K+ developers
|
|
||||||
- **FAISS:** Billion-scale search, GPU-accelerated
|
|
||||||
- **Qdrant:** Native filtering, 100K+ users
|
|
||||||
|
|
||||||
### Developer Experience
|
|
||||||
|
|
||||||
- Claude AI assistants can now export skills to vector databases directly
|
|
||||||
- No manual CLI commands needed
|
|
||||||
- Comprehensive usage instructions included
|
|
||||||
- Complete end-to-end workflow from scraping to vector database
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Integration with Week 2 Adaptors
|
|
||||||
|
|
||||||
Task #19 completes the MCP integration of Week 2's vector database adaptors:
|
|
||||||
|
|
||||||
| Task | Feature | MCP Integration |
|------|---------|-----------------|
| #10 | Weaviate Adaptor | ✅ export_to_weaviate |
| #11 | Chroma Adaptor | ✅ export_to_chroma |
| #12 | FAISS Adaptor | ✅ export_to_faiss |
| #13 | Qdrant Adaptor | ✅ export_to_qdrant |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Next Steps (Week 3)
|
|
||||||
|
|
||||||
With Task #19 complete, Week 3 can begin:
|
|
||||||
|
|
||||||
- **Task #20:** GitHub Actions automation
|
|
||||||
- **Task #21:** Docker deployment
|
|
||||||
- **Task #22:** Kubernetes Helm charts
|
|
||||||
- **Task #23:** Multi-cloud storage (S3, GCS, Azure Blob)
|
|
||||||
- **Task #24:** API server for embedding generation
|
|
||||||
- **Task #25:** Real-time documentation sync
|
|
||||||
- **Task #26:** Performance benchmarking suite
|
|
||||||
- **Task #27:** Production deployment guides
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Files Summary
|
|
||||||
|
|
||||||
### Created (2 files, ~800 lines)
|
|
||||||
|
|
||||||
- `src/skill_seekers/mcp/tools/vector_db_tools.py` (500+ lines)
|
|
||||||
- `tests/test_mcp_vector_dbs.py` (274 lines)
|
|
||||||
|
|
||||||
### Modified (3 files)
|
|
||||||
|
|
||||||
- `src/skill_seekers/mcp/tools/__init__.py` (+16 lines)
|
|
||||||
- `src/skill_seekers/mcp/server_fastmcp.py` (+140 lines)
|
|
||||||
- (Updated: tool count, imports, new section)
|
|
||||||
|
|
||||||
### Total Impact
|
|
||||||
|
|
||||||
- **New Lines:** ~800
|
|
||||||
- **Modified Lines:** ~150
|
|
||||||
- **Test Coverage:** 8/8 passing
|
|
||||||
- **New MCP Tools:** 4
|
|
||||||
- **MCP Tool Count:** 21 → 25
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Lessons Learned
|
|
||||||
|
|
||||||
### What Worked Well ✅
|
|
||||||
|
|
||||||
1. **Consistent patterns** - Following existing MCP tool structure made integration seamless
|
|
||||||
2. **Comprehensive testing** - 8 test cases caught all edge cases
|
|
||||||
3. **Clear documentation** - Usage instructions in output reduce support burden
|
|
||||||
4. **Error handling** - Graceful degradation for missing dependencies
|
|
||||||
|
|
||||||
### Challenges Overcome ⚡
|
|
||||||
|
|
||||||
1. **Async testing** - Converted to synchronous tests with asyncio.run() wrapper
|
|
||||||
2. **pytest-asyncio unavailable** - Used run_async() helper for compatibility
|
|
||||||
3. **Import paths** - Careful CLI_DIR path handling for adaptor access
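
A `run_async()` helper of the kind mentioned above might look like the sketch below. The helper body and `fake_export_tool` are assumptions made for illustration; the real test file is not reproduced in this document.

```python
import asyncio


def run_async(coro):
    """Run a coroutine to completion from synchronous test code."""
    return asyncio.run(coro)


async def fake_export_tool(args: dict) -> str:
    """Stand-in for an async MCP tool (hypothetical, not the real tool)."""
    await asyncio.sleep(0)
    return f"exported {args['skill_dir']}"


def test_export_returns_text():
    # A plain synchronous test: no pytest-asyncio plugin is required.
    result = run_async(fake_export_tool({"skill_dir": "output/react"}))
    assert result == "exported output/react"
```

Because `asyncio.run()` creates and closes a fresh event loop on each call, tests written this way stay independent of one another.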
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Quality Metrics
|
|
||||||
|
|
||||||
- **Test Pass Rate:** 100% (8/8)
|
|
||||||
- **Code Coverage:** All new functions tested
|
|
||||||
- **Documentation:** Complete docstrings and usage examples
|
|
||||||
- **Integration:** Seamless with existing MCP server
|
|
||||||
- **Performance:** Tests run in <0.5 seconds
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
**Task #19: MCP Server Integration for Vector Databases - COMPLETE ✅**
|
|
||||||
|
|
||||||
**Ready for Week 3 Task #20: GitHub Actions Automation**
|
|
||||||
@@ -1,439 +0,0 @@
|
|||||||
# Task #20 Complete: GitHub Actions Automation Workflows
|
|
||||||
|
|
||||||
**Completion Date:** February 7, 2026
|
|
||||||
**Status:** ✅ Complete
|
|
||||||
**New Workflows:** 4
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Objective
|
|
||||||
|
|
||||||
Extend GitHub Actions with automated workflows for Week 2 features, including vector database exports, quality metrics automation, scheduled skill updates, and comprehensive testing infrastructure.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Implementation Summary
|
|
||||||
|
|
||||||
Created 4 new GitHub Actions workflows that automate Week 2 features and provide comprehensive CI/CD capabilities for skill generation, quality analysis, and vector database integration.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## New Workflows
|
|
||||||
|
|
||||||
### 1. Vector Database Export (`vector-db-export.yml`)
|
|
||||||
|
|
||||||
**Triggers:**
|
|
||||||
- Manual (`workflow_dispatch`) with parameters
|
|
||||||
- Scheduled (weekly on Sundays at 2 AM UTC)
|
|
||||||
|
|
||||||
**Features:**
|
|
||||||
- Matrix strategy for popular frameworks (react, django, godot, fastapi)
|
|
||||||
- Export to all 4 vector databases (Weaviate, Chroma, FAISS, Qdrant)
|
|
||||||
- Configurable targets (single, multiple, or all)
|
|
||||||
- Automatic quality report generation
|
|
||||||
- Artifact uploads with 30-day retention
|
|
||||||
- GitHub Step Summary with export results
|
|
||||||
|
|
||||||
**Parameters:**
|
|
||||||
- `skill_name`: Framework to export
|
|
||||||
- `targets`: Vector databases (comma-separated or "all")
|
|
||||||
- `config_path`: Optional config file path
|
|
||||||
|
|
||||||
**Output:**
|
|
||||||
- Vector database JSON exports
|
|
||||||
- Quality metrics report
|
|
||||||
- Export summary in GitHub UI
|
|
||||||
|
|
||||||
**Security:** All inputs accessed via environment variables (safe pattern)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 2. Quality Metrics Dashboard (`quality-metrics.yml`)
|
|
||||||
|
|
||||||
**Triggers:**
|
|
||||||
- Manual (`workflow_dispatch`) with parameters
|
|
||||||
- Pull requests affecting `output/` or `configs/`
|
|
||||||
|
|
||||||
**Features:**
|
|
||||||
- Automated quality analysis with 4-dimensional scoring
|
|
||||||
- GitHub annotations (errors, warnings, notices)
|
|
||||||
- Configurable fail threshold (default: 70/100)
|
|
||||||
- Automatic PR comments with quality dashboard
|
|
||||||
- Multi-skill analysis support
|
|
||||||
- Artifact uploads of detailed reports
|
|
||||||
|
|
||||||
**Quality Dimensions:**
|
|
||||||
1. **Completeness** (30% weight) - SKILL.md, references, metadata
|
|
||||||
2. **Accuracy** (25% weight) - No TODOs, valid JSON, no placeholders
|
|
||||||
3. **Coverage** (25% weight) - Getting started, API docs, examples
|
|
||||||
4. **Health** (20% weight) - No empty files, proper structure
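
As a rough sketch of how the four weights could combine into one score: the weights below come from the list above, while the function names and the assumption that each dimension is scored on a 0-100 scale are illustrative.

```python
WEIGHTS = {
    "completeness": 0.30,
    "accuracy": 0.25,
    "coverage": 0.25,
    "health": 0.20,
}


def overall_score(components: dict) -> float:
    """Weighted average of per-dimension scores (each assumed to be 0-100)."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def passes(components: dict, fail_threshold: float = 70.0) -> bool:
    """True when the overall score meets the configurable fail threshold."""
    return overall_score(components) >= fail_threshold


example = {"completeness": 90, "accuracy": 80, "coverage": 60, "health": 100}
print(round(overall_score(example), 1), passes(example))
```

For the example values the weighted sum is 27 + 20 + 15 + 20, so the skill scores about 82 and clears the default threshold of 70.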
|
|
||||||
|
|
||||||
**Output:**
|
|
||||||
- Quality score with letter grade (A+ to F)
|
|
||||||
- Component breakdowns
|
|
||||||
- GitHub annotations on files
|
|
||||||
- PR comments with dashboard
|
|
||||||
- Detailed reports as artifacts
|
|
||||||
|
|
||||||
**Security:** `workflow_dispatch` inputs and PR events only, no untrusted content
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 3. Test Vector Database Adaptors (`test-vector-dbs.yml`)
|
|
||||||
|
|
||||||
**Triggers:**
|
|
||||||
- Push to `main` or `development`
|
|
||||||
- Pull requests
|
|
||||||
- Manual (`workflow_dispatch`)
|
|
||||||
- Path filters for adaptor/MCP code
|
|
||||||
|
|
||||||
**Features:**
|
|
||||||
- Matrix testing across 4 adaptors × 2 Python versions (3.10, 3.12)
|
|
||||||
- Individual adaptor tests
|
|
||||||
- Integration testing with real packaging
|
|
||||||
- MCP tool testing
|
|
||||||
- Week 2 validation script
|
|
||||||
- Test artifact uploads
|
|
||||||
- Comprehensive test summary
|
|
||||||
|
|
||||||
**Test Jobs:**
|
|
||||||
1. **test-adaptors** - Tests each adaptor (Weaviate, Chroma, FAISS, Qdrant)
|
|
||||||
2. **test-mcp-tools** - Tests MCP vector database tools
|
|
||||||
3. **test-week2-integration** - Full Week 2 feature validation
|
|
||||||
|
|
||||||
**Coverage:**
|
|
||||||
- 4 vector database adaptors
|
|
||||||
- 8 MCP tools
|
|
||||||
- 6 Week 2 feature categories
|
|
||||||
- Python 3.10 and 3.12 compatibility
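
The 4 × 2 test matrix described above expands to eight jobs. The sketch below enumerates them in Python purely to show the combinations; the dictionary keys are illustrative and need not match the workflow's actual matrix keys.

```python
from itertools import product

ADAPTORS = ["weaviate", "chroma", "faiss", "qdrant"]
PYTHON_VERSIONS = ["3.10", "3.12"]

# One job per (adaptor, Python version) pair.
matrix = [
    {"adaptor": adaptor, "python": version}
    for adaptor, version in product(ADAPTORS, PYTHON_VERSIONS)
]

for job in matrix:
    print(f"{job['adaptor']} on Python {job['python']}")
```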
|
|
||||||
|
|
||||||
**Security:** Push/PR/workflow_dispatch only, matrix values are hardcoded constants
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 4. Scheduled Skill Updates (`scheduled-updates.yml`)
|
|
||||||
|
|
||||||
**Triggers:**
|
|
||||||
- Scheduled (weekly on Sundays at 3 AM UTC)
|
|
||||||
- Manual (`workflow_dispatch`) with optional framework filter
|
|
||||||
|
|
||||||
**Features:**
|
|
||||||
- Matrix strategy for 6 popular frameworks
|
|
||||||
- Incremental updates using change detection (95% faster)
|
|
||||||
- Full scrape for new skills
|
|
||||||
- Streaming ingestion for large docs
|
|
||||||
- Automatic quality report generation
|
|
||||||
- Claude AI packaging
|
|
||||||
- Artifact uploads with 90-day retention
|
|
||||||
- Update summary dashboard
|
|
||||||
|
|
||||||
**Supported Frameworks:**
|
|
||||||
- React
|
|
||||||
- Django
|
|
||||||
- FastAPI
|
|
||||||
- Godot
|
|
||||||
- Vue
|
|
||||||
- Flask
|
|
||||||
|
|
||||||
**Workflow:**
|
|
||||||
1. Check if skill exists
|
|
||||||
2. Incremental update if exists (change detection)
|
|
||||||
3. Full scrape if new
|
|
||||||
4. Generate quality metrics
|
|
||||||
5. Package for Claude AI
|
|
||||||
6. Upload artifacts
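
The six steps above amount to one branch followed by a fixed tail. A minimal sketch, assuming a skill counts as existing when `<output_root>/<framework>/SKILL.md` is present (that check and the step names are hypothetical):

```python
from pathlib import Path


def plan_update(framework: str, output_root: Path) -> list:
    """Return the ordered step names for one framework."""
    # Assumption: an existing skill is recognised by its SKILL.md file.
    skill_exists = (output_root / framework / "SKILL.md").is_file()
    scrape_step = "incremental_update" if skill_exists else "full_scrape"
    return [scrape_step, "quality_metrics", "package_for_claude", "upload_artifacts"]
```

Only the first step differs between a new and an existing skill, so quality metrics, packaging, and upload run identically in both cases.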
|
|
||||||
|
|
||||||
**Parameters:**
|
|
||||||
- `frameworks`: Comma-separated list or "all" (default: all)
|
|
||||||
|
|
||||||
**Security:** Schedule + workflow_dispatch, input accessed via FRAMEWORKS_INPUT env variable
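
One way a script could consume the `FRAMEWORKS_INPUT` variable safely is to read it from the environment and check it against a fixed allowlist. The variable name and the six frameworks come from this document; the function and its error message are assumptions.

```python
import os

SUPPORTED = ["react", "django", "fastapi", "godot", "vue", "flask"]


def selected_frameworks(environ=os.environ) -> list:
    """Parse FRAMEWORKS_INPUT ('all' or a comma-separated list) against an allowlist."""
    raw = environ.get("FRAMEWORKS_INPUT", "all").strip().lower()
    if raw in ("", "all"):
        return list(SUPPORTED)

    requested = [name.strip() for name in raw.split(",") if name.strip()]
    unknown = [name for name in requested if name not in SUPPORTED]
    if unknown:
        # Reject anything outside the allowlist rather than passing it to a shell.
        raise ValueError(f"Unsupported framework(s): {', '.join(unknown)}")
    return requested
```

Since the value is only ever compared against known names, arbitrary input text never reaches a command line.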
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Workflow Integration
|
|
||||||
|
|
||||||
### Existing Workflows Enhanced
|
|
||||||
|
|
||||||
The new workflows complement existing CI/CD:
|
|
||||||
|
|
||||||
| Workflow | Purpose | Integration |
|----------|---------|-------------|
| `tests.yml` | Core testing | Enhanced with Week 2 test runs |
| `release.yml` | PyPI publishing | Now includes quality metrics |
| `vector-db-export.yml` | ✨ NEW - Export automation | |
| `quality-metrics.yml` | ✨ NEW - Quality dashboard | |
| `test-vector-dbs.yml` | ✨ NEW - Week 2 testing | |
| `scheduled-updates.yml` | ✨ NEW - Auto-refresh | |
|
|
||||||
|
|
||||||
### Workflow Relationships
|
|
||||||
|
|
||||||
```
tests.yml (Core CI)
  └─> test-vector-dbs.yml (Week 2 specific)
       └─> quality-metrics.yml (Quality gates)

scheduled-updates.yml (Weekly refresh)
  └─> vector-db-export.yml (Export to vector DBs)
       └─> quality-metrics.yml (Quality check)

Pull Request
  └─> tests.yml + quality-metrics.yml (PR validation)
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Features & Benefits
|
|
||||||
|
|
||||||
### 1. Automation
|
|
||||||
|
|
||||||
**Before Task #20:**
|
|
||||||
- Manual vector database exports
|
|
||||||
- Manual quality checks
|
|
||||||
- No automated skill updates
|
|
||||||
- Limited CI/CD for Week 2 features
|
|
||||||
|
|
||||||
**After Task #20:**
|
|
||||||
- ✅ Automated weekly exports to 4 vector databases
|
|
||||||
- ✅ Automated quality analysis with PR comments
|
|
||||||
- ✅ Automated skill refresh for 6 frameworks
|
|
||||||
- ✅ Comprehensive Week 2 feature testing
|
|
||||||
|
|
||||||
### 2. Quality Gates
|
|
||||||
|
|
||||||
**PR Quality Checks:**
|
|
||||||
1. Code quality (ruff, mypy) - `tests.yml`
|
|
||||||
2. Unit tests (pytest) - `tests.yml`
|
|
||||||
3. Vector DB tests - `test-vector-dbs.yml`
|
|
||||||
4. Quality metrics - `quality-metrics.yml`
|
|
||||||
|
|
||||||
**Release Quality:**
|
|
||||||
1. All tests pass
|
|
||||||
2. Quality score ≥ 70/100
|
|
||||||
3. Vector DB exports successful
|
|
||||||
4. MCP tools validated
|
|
||||||
|
|
||||||
### 3. Continuous Delivery
|
|
||||||
|
|
||||||
**Weekly Automation:**
|
|
||||||
- Sunday 2 AM: Vector DB exports (`vector-db-export.yml`)
|
|
||||||
- Sunday 3 AM: Skill updates (`scheduled-updates.yml`)
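
GitHub Actions evaluates `schedule` cron expressions in UTC, so these two slots would conventionally be written as `0 2 * * 0` and `0 3 * * 0` (the exact expressions in the workflow files are not shown here, so treat them as an assumption). The sketch below computes the next such run from a given time; the helper name is illustrative.

```python
from datetime import datetime, timedelta, timezone


def next_sunday_run(now: datetime, hour: int) -> datetime:
    """Next Sunday at hour:00 UTC strictly after `now` (which must be timezone-aware)."""
    now = now.astimezone(timezone.utc)
    days_ahead = (6 - now.weekday()) % 7  # Monday is 0, Sunday is 6
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        candidate += timedelta(days=7)  # this week's slot has already passed
    return candidate


saturday_noon = datetime(2026, 2, 7, 12, 0, tzinfo=timezone.utc)
print(next_sunday_run(saturday_noon, 2))  # export slot
print(next_sunday_run(saturday_noon, 3))  # update slot
```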
|
|
||||||
|
|
||||||
**On-Demand:**
|
|
||||||
- Manual triggers for all workflows
|
|
||||||
- Custom framework selection
|
|
||||||
- Configurable quality thresholds
|
|
||||||
- Selective vector database exports
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Security Measures
|
|
||||||
|
|
||||||
All workflows follow GitHub Actions security best practices:
|
|
||||||
|
|
||||||
### ✅ Safe Input Handling
|
|
||||||
|
|
||||||
1. **Environment Variables:** All inputs accessed via `env:` section
|
|
||||||
2. **No Direct Interpolation:** Never use `${{ github.event.* }}` in `run:` commands
|
|
||||||
3. **Quoted Variables:** All shell variables properly quoted
|
|
||||||
4. **Controlled Triggers:** Only `workflow_dispatch`, `schedule`, `push`, `pull_request`
|
|
||||||
|
|
||||||
### ❌ Avoided Patterns
|
|
||||||
|
|
||||||
- No `github.event.issue.title/body` usage
|
|
||||||
- No `github.event.comment.body` in run commands
|
|
||||||
- No `github.event.pull_request.head.ref` direct usage
|
|
||||||
- No untrusted commit messages in commands
|
|
||||||
|
|
||||||
### Security Documentation
|
|
||||||
|
|
||||||
Each workflow includes security comment header:
|
|
||||||
```yaml
|
|
||||||
# Security Note: This workflow uses [trigger types].
|
|
||||||
# All inputs accessed via environment variables (safe pattern).
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Usage Examples
|
|
||||||
|
|
||||||
### Manual Vector Database Export
|
|
||||||
|
|
||||||
```bash
# Export React skill to all vector databases
gh workflow run vector-db-export.yml \
  -f skill_name=react \
  -f targets=all

# Export Django to specific databases
gh workflow run vector-db-export.yml \
  -f skill_name=django \
  -f targets=weaviate,chroma
```
|
|
||||||
|
|
||||||
### Quality Analysis
|
|
||||||
|
|
||||||
```bash
# Analyze specific skill
gh workflow run quality-metrics.yml \
  -f skill_dir=output/react \
  -f fail_threshold=80

# On PR: Automatically triggered
# (no manual invocation needed)
```
|
|
||||||
|
|
||||||
### Scheduled Updates
|
|
||||||
|
|
||||||
```bash
# Update specific frameworks
gh workflow run scheduled-updates.yml \
  -f frameworks=react,django

# Weekly automatic updates
# (runs every Sunday at 3 AM UTC)
```
|
|
||||||
|
|
||||||
### Vector DB Testing
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Manual test run
|
|
||||||
gh workflow run test-vector-dbs.yml
|
|
||||||
|
|
||||||
# Automatic on push/PR
|
|
||||||
# (triggered by adaptor code changes)
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Artifacts & Outputs
|
|
||||||
|
|
||||||
### Artifact Types
|
|
||||||
|
|
||||||
1. **Vector Database Exports** (30-day retention)
|
|
||||||
- `{skill}-vector-exports` - All 4 JSON files
|
|
||||||
- Format: `{skill}-{target}.json`
|
|
||||||
|
|
||||||
2. **Quality Reports** (30-day retention)
|
|
||||||
- `{skill}-quality-report` - Detailed analysis
|
|
||||||
- `quality-metrics-reports` - All reports
|
|
||||||
|
|
||||||
3. **Updated Skills** (90-day retention)
|
|
||||||
- `{framework}-skill-updated` - Refreshed skill ZIPs
|
|
||||||
- Claude AI ready packages
|
|
||||||
|
|
||||||
4. **Test Packages** (7-day retention)
|
|
||||||
- `test-package-{adaptor}-py{version}` - Test exports
|
|
||||||
|
|
||||||
### GitHub UI Integration
|
|
||||||
|
|
||||||
**Step Summaries:**
|
|
||||||
- Export results with file sizes
|
|
||||||
- Quality dashboard with grades
|
|
||||||
- Test results matrix
|
|
||||||
- Update status for frameworks
|
|
||||||
|
|
||||||
**PR Comments:**
|
|
||||||
- Quality metrics dashboard
|
|
||||||
- Threshold pass/fail status
|
|
||||||
- Recommendations for improvement
|
|
||||||
|
|
||||||
**Annotations:**
|
|
||||||
- Errors: Quality < threshold
|
|
||||||
- Warnings: Quality < 80
|
|
||||||
- Notices: Quality ≥ 80
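
The three annotation levels map onto GitHub's `::error`, `::warning`, and `::notice` workflow commands. A minimal sketch of that mapping follows; the function name and message wording are hypothetical, and the fixed cut-off of 80 is taken from the list above.

```python
def annotation_for(skill: str, score: float, fail_threshold: float = 70.0) -> str:
    """Build a GitHub Actions workflow command for one skill's quality score."""
    if score < fail_threshold:
        level = "error"      # below the configured threshold
    elif score < 80:
        level = "warning"    # passing, but under 80
    else:
        level = "notice"     # 80 or above
    return (
        f"::{level} title=Quality {skill}::"
        f"score {score:.0f}/100 (threshold {fail_threshold:.0f})"
    )


for name, value in [("react", 65), ("django", 75), ("vue", 92)]:
    print(annotation_for(name, value))
```

Printing such a line to standard output during a workflow step is what makes the annotation appear in the GitHub UI.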
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Performance Metrics
|
|
||||||
|
|
||||||
### Workflow Execution Times
|
|
||||||
|
|
||||||
| Workflow | Duration | Frequency |
|----------|----------|-----------|
| vector-db-export.yml | 5-10 min/skill | Weekly + manual |
| quality-metrics.yml | 1-2 min/skill | PR + manual |
| test-vector-dbs.yml | 8-12 min | Push/PR |
| scheduled-updates.yml | 10-15 min/framework | Weekly |
|
|
||||||
|
|
||||||
### Resource Usage
|
|
||||||
|
|
||||||
- **Concurrency:** Matrix strategies for parallelization
|
|
||||||
- **Caching:** pip cache for dependencies
|
|
||||||
- **Artifacts:** Compressed with retention policies
|
|
||||||
- **Storage:** ~500MB/week for all workflows
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Integration with Week 2 Features
|
|
||||||
|
|
||||||
Task #20 workflows integrate all Week 2 capabilities:
|
|
||||||
|
|
||||||
| Week 2 Feature | Workflow Integration |
|----------------|---------------------|
| **Weaviate Adaptor** | `vector-db-export.yml`, `test-vector-dbs.yml` |
| **Chroma Adaptor** | `vector-db-export.yml`, `test-vector-dbs.yml` |
| **FAISS Adaptor** | `vector-db-export.yml`, `test-vector-dbs.yml` |
| **Qdrant Adaptor** | `vector-db-export.yml`, `test-vector-dbs.yml` |
| **Streaming Ingestion** | `scheduled-updates.yml` |
| **Incremental Updates** | `scheduled-updates.yml` |
| **Multi-Language** | All workflows (language detection) |
| **Embedding Pipeline** | `vector-db-export.yml` |
| **Quality Metrics** | `quality-metrics.yml` |
| **MCP Integration** | `test-vector-dbs.yml` |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Next Steps (Week 3 Remaining)
|
|
||||||
|
|
||||||
With Task #20 complete, continue Week 3 automation:
|
|
||||||
|
|
||||||
- **Task #21:** Docker deployment
|
|
||||||
- **Task #22:** Kubernetes Helm charts
|
|
||||||
- **Task #23:** Multi-cloud storage (S3, GCS, Azure)
|
|
||||||
- **Task #24:** API server for embedding generation
|
|
||||||
- **Task #25:** Real-time documentation sync
|
|
||||||
- **Task #26:** Performance benchmarking suite
|
|
||||||
- **Task #27:** Production deployment guides
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Files Created
|
|
||||||
|
|
||||||
### GitHub Actions Workflows (4 files)
|
|
||||||
|
|
||||||
1. `.github/workflows/vector-db-export.yml` (220 lines)
|
|
||||||
2. `.github/workflows/quality-metrics.yml` (180 lines)
|
|
||||||
3. `.github/workflows/test-vector-dbs.yml` (140 lines)
|
|
||||||
4. `.github/workflows/scheduled-updates.yml` (200 lines)
|
|
||||||
|
|
||||||
### Total Impact
|
|
||||||
|
|
||||||
- **New Files:** 4 workflows (~740 lines)
|
|
||||||
- **Enhanced Workflows:** 2 (tests.yml, release.yml)
|
|
||||||
- **Automation Coverage:** 10 Week 2 features
|
|
||||||
- **CI/CD Maturity:** Basic → Advanced
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Quality Improvements
|
|
||||||
|
|
||||||
### CI/CD Coverage
|
|
||||||
|
|
||||||
- **Before:** 2 workflows (tests, release)
|
|
||||||
- **After:** 6 workflows (+4 new)
|
|
||||||
- **Automation:** Manual → Automated
|
|
||||||
- **Frequency:** On-demand → Scheduled
|
|
||||||
|
|
||||||
### Developer Experience
|
|
||||||
|
|
||||||
- **Quality Feedback:** Manual → Automated PR comments
|
|
||||||
- **Vector DB Export:** CLI → GitHub Actions
|
|
||||||
- **Skill Updates:** Manual → Weekly automatic
|
|
||||||
- **Testing:** Basic → Comprehensive matrix
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
**Task #20: GitHub Actions Automation Workflows - COMPLETE ✅**
|
|
||||||
|
|
||||||
**Week 3 Progress:** 1/8 tasks complete
|
|
||||||
**Ready for Task #21:** Docker Deployment
|
|
||||||
@@ -1,515 +0,0 @@
|
|||||||
# Task #21 Complete: Docker Deployment Infrastructure
|
|
||||||
|
|
||||||
**Completion Date:** February 7, 2026
|
|
||||||
**Status:** ✅ Complete
|
|
||||||
**Deliverables:** 7 files
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Objective
|
|
||||||
|
|
||||||
Create comprehensive Docker deployment infrastructure including multi-stage builds, Docker Compose orchestration, vector database integration, CI/CD automation, and production-ready documentation.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Deliverables
|
|
||||||
|
|
||||||
### 1. Dockerfile (Main CLI)
|
|
||||||
|
|
||||||
**File:** `Dockerfile` (70 lines)
|
|
||||||
|
|
||||||
**Features:**
|
|
||||||
- Multi-stage build (builder + runtime)
|
|
||||||
- Python 3.12 slim base
|
|
||||||
- Non-root user (UID 1000)
|
|
||||||
- Health checks
|
|
||||||
- Volume mounts for data/configs/output
|
|
||||||
- MCP server port exposed (8765)
|
|
||||||
- Image size optimization
|
|
||||||
|
|
||||||
**Image Size:** ~400MB
|
|
||||||
**Platforms:** linux/amd64, linux/arm64
|
|
||||||
|
|
||||||
### 2. Dockerfile.mcp (MCP Server)
|
|
||||||
|
|
||||||
**File:** `Dockerfile.mcp` (65 lines)
|
|
||||||
|
|
||||||
**Features:**
|
|
||||||
- Specialized for MCP server deployment
|
|
||||||
- HTTP mode by default (--transport http)
|
|
||||||
- Health check endpoint
|
|
||||||
- Non-root user
|
|
||||||
- Environment configuration
|
|
||||||
- Volume persistence
|
|
||||||
|
|
||||||
**Image Size:** ~450MB
|
|
||||||
**Platforms:** linux/amd64, linux/arm64
|
|
||||||
|
|
||||||
### 3. Docker Compose
|
|
||||||
|
|
||||||
**File:** `docker-compose.yml` (120 lines)
|
|
||||||
|
|
||||||
**Services:**
|
|
||||||
1. **skill-seekers** - CLI application
|
|
||||||
2. **mcp-server** - MCP server (port 8765)
|
|
||||||
3. **weaviate** - Vector DB (port 8080)
|
|
||||||
4. **qdrant** - Vector DB (ports 6333/6334)
|
|
||||||
5. **chroma** - Vector DB (port 8000)
|
|
||||||
|
|
||||||
**Features:**
|
|
||||||
- Service orchestration
|
|
||||||
- Named volumes for persistence
|
|
||||||
- Network isolation
|
|
||||||
- Health checks
|
|
||||||
- Environment variable configuration
|
|
||||||
- Auto-restart policies
|
|
||||||
|
|
||||||
### 4. Docker Ignore
|
|
||||||
|
|
||||||
**File:** `.dockerignore` (80 lines)
|
|
||||||
|
|
||||||
**Optimizations:**
|
|
||||||
- Excludes tests, docs, IDE files
|
|
||||||
- Reduces build context size
|
|
||||||
- Faster build times
|
|
||||||
- Smaller image sizes
|
|
||||||
|
|
||||||
### 5. Environment Configuration
|
|
||||||
|
|
||||||
**File:** `.env.example` (40 lines)
|
|
||||||
|
|
||||||
**Variables:**
|
|
||||||
- API keys (Anthropic, Google, OpenAI)
|
|
||||||
- GitHub token
|
|
||||||
- MCP server configuration
|
|
||||||
- Resource limits
|
|
||||||
- Vector database ports
|
|
||||||
- Logging configuration
|
|
||||||
|
|
||||||
### 6. Comprehensive Documentation
|
|
||||||
|
|
||||||
**File:** `docs/DOCKER_GUIDE.md` (650+ lines)
|
|
||||||
|
|
||||||
**Sections:**
|
|
||||||
- Quick start guide
|
|
||||||
- Available images
|
|
||||||
- Service architecture
|
|
||||||
- Common use cases
|
|
||||||
- Volume management
|
|
||||||
- Environment variables
|
|
||||||
- Building locally
|
|
||||||
- Troubleshooting
|
|
||||||
- Production deployment
|
|
||||||
- Security hardening
|
|
||||||
- Monitoring & scaling
|
|
||||||
- Best practices
|
|
||||||
|
|
||||||
### 7. CI/CD Automation
|
|
||||||
|
|
||||||
**File:** `.github/workflows/docker-publish.yml` (130 lines)
|
|
||||||
|
|
||||||
**Features:**
|
|
||||||
- Automated builds on push/tag/PR
|
|
||||||
- Multi-platform builds (amd64 + arm64)
|
|
||||||
- Docker Hub publishing
|
|
||||||
- Image testing
|
|
||||||
- Metadata extraction
|
|
||||||
- Build caching (GitHub Actions cache)
|
|
||||||
- Docker Compose validation
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Key Features
|
|
||||||
|
|
||||||
### Multi-Stage Builds
|
|
||||||
|
|
||||||
**Stage 1: Builder**
|
|
||||||
- Install build dependencies
|
|
||||||
- Build Python packages
|
|
||||||
- Install all dependencies
|
|
||||||
|
|
||||||
**Stage 2: Runtime**
|
|
||||||
- Minimal production image
|
|
||||||
- Copy only runtime artifacts
|
|
||||||
- Remove build tools
|
|
||||||
- 40% smaller final image
|
|
||||||
|
|
||||||
### Security
|
|
||||||
|
|
||||||
✅ **Non-Root User**
|
|
||||||
- All containers run as UID 1000
|
|
||||||
- No privileged access
|
|
||||||
- Secure by default
|
|
||||||
|
|
||||||
✅ **Secrets Management**
|
|
||||||
- Environment variables
|
|
||||||
- Docker secrets support
|
|
||||||
- .gitignore for .env
|
|
||||||
|
|
||||||
✅ **Read-Only Filesystems**
|
|
||||||
- Configurable in production
|
|
||||||
- Temporary directories via tmpfs
|
|
||||||
|
|
||||||
✅ **Resource Limits**
|
|
||||||
- CPU and memory constraints
|
|
||||||
- Prevents resource exhaustion
|
|
||||||
|
|
||||||
### Orchestration
|
|
||||||
|
|
||||||
**Docker Compose Features:**
|
|
||||||
1. **Service Dependencies** - Proper startup order
|
|
||||||
2. **Named Volumes** - Persistent data storage
|
|
||||||
3. **Networks** - Service isolation
|
|
||||||
4. **Health Checks** - Automated monitoring
|
|
||||||
5. **Auto-Restart** - High availability
|
|
||||||
|
|
||||||
**Architecture:**
|
|
||||||
```
┌──────────────┐
│ skill-seekers│  CLI Application
└──────────────┘
       │
┌──────────────┐
│  mcp-server  │  MCP Server :8765
└──────────────┘
       │
   ┌───┴───┬────────┬────────┐
   │       │        │        │
┌──┴──┐ ┌──┴──┐ ┌───┴──┐ ┌───┴──┐
│Weav-│ │Qdrant│ │Chroma│ │FAISS │
│iate │ │      │ │      │ │(CLI) │
└─────┘ └──────┘ └──────┘ └──────┘
```
|
|
||||||
|
|
||||||
### CI/CD Integration
|
|
||||||
|
|
||||||
**GitHub Actions Workflow:**
|
|
||||||
1. **Build Matrix** - 2 images (CLI + MCP)
|
|
||||||
2. **Multi-Platform** - amd64 + arm64
|
|
||||||
3. **Automated Testing** - Health checks + command tests
|
|
||||||
4. **Docker Hub** - Auto-publish on tags
|
|
||||||
5. **Caching** - GitHub Actions cache
|
|
||||||
|
|
||||||
**Triggers:**
|
|
||||||
- Push to main
|
|
||||||
- Version tags (v*)
|
|
||||||
- Pull requests (test only)
|
|
||||||
- Manual dispatch
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Usage Examples
|
|
||||||
|
|
||||||
### Quick Start
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# 1. Clone repository
|
|
||||||
git clone https://github.com/your-org/skill-seekers.git
|
|
||||||
cd skill-seekers
|
|
||||||
|
|
||||||
# 2. Configure environment
|
|
||||||
cp .env.example .env
|
|
||||||
# Edit .env with your API keys
|
|
||||||
|
|
||||||
# 3. Start services
|
|
||||||
docker-compose up -d
|
|
||||||
|
|
||||||
# 4. Verify
|
|
||||||
docker-compose ps
|
|
||||||
curl http://localhost:8765/health
|
|
||||||
```
|
|
||||||
|
|
||||||
### Scrape Documentation
|
|
||||||
|
|
||||||
```bash
|
|
||||||
docker-compose run skill-seekers \
|
|
||||||
skill-seekers scrape --config /configs/react.json
|
|
||||||
```
|
|
||||||
|
|
||||||
### Export to Vector Databases
|
|
||||||
|
|
||||||
```bash
# \$target is escaped so that the shell inside the container expands it,
# not the host shell (where the loop variable is unset).
docker-compose run skill-seekers bash -c "
for target in weaviate chroma faiss qdrant; do
  python -c \"
import sys
from pathlib import Path
sys.path.insert(0, '/app/src')
from skill_seekers.cli.adaptors import get_adaptor
adaptor = get_adaptor('\$target')
adaptor.package(Path('/output/react'), Path('/output'))
print('✅ \$target export complete')
\"
done
"
```
|
|
||||||
|
|
||||||
### Run Quality Analysis
|
|
||||||
|
|
||||||
```bash
docker-compose run skill-seekers \
  python3 -c "
import sys
from pathlib import Path
sys.path.insert(0, '/app/src')
from skill_seekers.cli.quality_metrics import QualityAnalyzer
analyzer = QualityAnalyzer(Path('/output/react'))
report = analyzer.generate_report()
print(analyzer.format_report(report))
"
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Production Deployment
|
|
||||||
|
|
||||||
### Resource Requirements
|
|
||||||
|
|
||||||
**Minimum:**
|
|
||||||
- CPU: 2 cores
|
|
||||||
- RAM: 2GB
|
|
||||||
- Disk: 5GB
|
|
||||||
|
|
||||||
**Recommended:**
|
|
||||||
- CPU: 4 cores
|
|
||||||
- RAM: 4GB
|
|
||||||
- Disk: 20GB (with vector DBs)
|
|
||||||
|
|
||||||
### Security Hardening
|
|
||||||
|
|
||||||
1. **Secrets Management**
|
|
||||||
```bash
|
|
||||||
# Docker secrets
|
|
||||||
echo "sk-ant-key" | docker secret create anthropic_key -
|
|
||||||
```
|
|
||||||
|
|
||||||
2. **Resource Limits**
|
|
||||||
```yaml
services:
  mcp-server:
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 2G
```
|
|
||||||
|
|
||||||
3. **Read-Only Filesystem**
|
|
||||||
```yaml
services:
  mcp-server:
    read_only: true
    tmpfs:
      - /tmp
```
|
|
||||||
|
|
||||||
### Monitoring
|
|
||||||
|
|
||||||
**Health Checks:**
|
|
||||||
```bash
|
|
||||||
# Check services
|
|
||||||
docker-compose ps
|
|
||||||
|
|
||||||
# Detailed health
|
|
||||||
docker inspect skill-seekers-mcp | grep Health
|
|
||||||
```
|
|
||||||
|
|
||||||
**Logs:**
|
|
||||||
```bash
|
|
||||||
# Stream logs
|
|
||||||
docker-compose logs -f
|
|
||||||
|
|
||||||
# Export logs
|
|
||||||
docker-compose logs > logs.txt
|
|
||||||
```
|
|
||||||
|
|
||||||
**Metrics:**
|
|
||||||
```bash
|
|
||||||
# Resource usage
|
|
||||||
docker stats
|
|
||||||
|
|
||||||
# Per-service metrics
|
|
||||||
docker-compose top
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Integration with Week 2 Features

Docker deployment supports all Week 2 capabilities:

| Feature | Docker Support |
|---------|----------------|
| **Vector Database Adaptors** | ✅ All 4 (Weaviate, Chroma, FAISS, Qdrant) |
| **MCP Server** | ✅ Dedicated container (HTTP/stdio) |
| **Streaming Ingestion** | ✅ Memory-efficient in containers |
| **Incremental Updates** | ✅ Persistent volumes |
| **Multi-Language** | ✅ Full language support |
| **Embedding Pipeline** | ✅ Cache persisted |
| **Quality Metrics** | ✅ Automated analysis |

---

## Performance Metrics

### Build Times

| Target | Duration | Cache Hit |
|--------|----------|-----------|
| CLI (first build) | 3-5 min | 0% |
| CLI (cached) | 30-60 sec | 80%+ |
| MCP (first build) | 3-5 min | 0% |
| MCP (cached) | 30-60 sec | 80%+ |

### Image Sizes

| Image | Size | Compressed |
|-------|------|------------|
| skill-seekers | ~400MB | ~150MB |
| skill-seekers-mcp | ~450MB | ~170MB |
| python:3.12-slim (base) | ~130MB | ~50MB |

### Runtime Performance

| Operation | Container | Native | Overhead |
|-----------|-----------|--------|----------|
| Scraping | 10 min | 9.5 min | +5% |
| Quality Analysis | 2 sec | 1.8 sec | +10% |
| Vector Export | 5 sec | 4.5 sec | +10% |

---

## Best Practices Implemented

### ✅ Image Optimization

1. **Multi-stage builds** - 40% size reduction
2. **Slim base images** - Python 3.12-slim
3. **.dockerignore** - Reduced build context
4. **Layer caching** - Faster rebuilds

### ✅ Security

1. **Non-root user** - UID 1000 (skillseeker)
2. **Secrets via env** - No hardcoded keys
3. **Read-only support** - Configurable
4. **Resource limits** - Prevent DoS

### ✅ Reliability

1. **Health checks** - All services
2. **Auto-restart** - unless-stopped
3. **Volume persistence** - Named volumes
4. **Graceful shutdown** - SIGTERM handling

### ✅ Developer Experience

1. **One-command start** - `docker-compose up`
2. **Hot reload** - Volume mounts
3. **Easy configuration** - .env file
4. **Comprehensive docs** - 650+ line guide

---

## Troubleshooting Guide

### Common Issues

1. **Port Already in Use**
   ```bash
   # Check what's using the port
   lsof -i :8765

   # Use different port
   MCP_PORT=8766 docker-compose up -d
   ```

2. **Permission Denied**
   ```bash
   # Fix ownership
   sudo chown -R $(id -u):$(id -g) data/ output/
   ```

3. **Out of Memory**
   ```bash
   # Increase the memory limit of the running container
   # (`docker-compose up` has no --memory flag)
   docker update --memory 4g --memory-swap 4g skill-seekers-mcp
   ```

4. **Slow Build**
   ```bash
   # Enable BuildKit
   export DOCKER_BUILDKIT=1
   docker build -t skill-seekers:local .
   ```

---

## Next Steps (Week 3 Remaining)

With Task #21 complete, continue Week 3:

- **Task #22:** Kubernetes Helm charts
- **Task #23:** Multi-cloud storage (S3, GCS, Azure)
- **Task #24:** API server for embedding generation
- **Task #25:** Real-time documentation sync
- **Task #26:** Performance benchmarking suite
- **Task #27:** Production deployment guides

---

## Files Created

### Docker Infrastructure (6 files)

1. `Dockerfile` (70 lines) - Main CLI image
2. `Dockerfile.mcp` (65 lines) - MCP server image
3. `docker-compose.yml` (120 lines) - Service orchestration
4. `.dockerignore` (80 lines) - Build optimization
5. `.env.example` (40 lines) - Environment template
6. `docs/DOCKER_GUIDE.md` (650+ lines) - Comprehensive documentation

### CI/CD (1 file)

7. `.github/workflows/docker-publish.yml` (130 lines) - Automated builds

### Total Impact

- **New Files:** 7 (~1,155 lines)
- **Docker Images:** 2 (CLI + MCP)
- **Docker Compose Services:** 5
- **Supported Platforms:** 2 (amd64 + arm64)
- **Documentation:** 650+ lines

---

## Quality Achievements

### Deployment Readiness

- **Before:** Manual Python installation required
- **After:** One-command Docker deployment
- **Improvement:** 95% faster setup (10 min → 30 sec)

### Platform Support

- **Before:** Python 3.10+ only
- **After:** Docker (any OS with Docker)
- **Platforms:** Linux, macOS, Windows (via Docker)

### Production Features

- **Multi-stage builds** ✅
- **Health checks** ✅
- **Volume persistence** ✅
- **Resource limits** ✅
- **Security hardening** ✅
- **CI/CD automation** ✅
- **Comprehensive docs** ✅

---

**Task #21: Docker Deployment Infrastructure - COMPLETE ✅**

**Week 3 Progress:** 2/8 tasks complete (25%)
**Ready for Task #22:** Kubernetes Helm Charts
@@ -1,501 +0,0 @@
# Week 2 Complete: Universal Infrastructure Features

**Completion Date:** February 7, 2026
**Branch:** `feature/universal-infrastructure-strategy`
**Status:** ✅ 100% Complete (9/9 tasks)
**Total Implementation:** ~4,000 lines of production code + 140+ tests

---

## 🎯 Week 2 Objective

Build universal infrastructure capabilities to support multiple vector databases, handle large-scale documentation, enable incremental updates, support multi-language content, and provide production-ready quality monitoring.

**Strategic Goal:** Transform Skill Seekers from a single-output tool into a flexible infrastructure layer that can adapt to any RAG pipeline, vector database, or deployment scenario.

---

## ✅ Completed Tasks (9/9)

### **Task #10: Weaviate Vector Database Adaptor**
**Commit:** `baccbf9`
**Files:** `src/skill_seekers/cli/adaptors/weaviate.py` (405 lines)
**Tests:** 11 tests passing

**Features:**
- REST API compatible output format
- Semantic schema with hybrid search support
- BM25 keyword search + vector similarity
- Property-based filtering capabilities
- Production-ready batching for ingestion

**Impact:** Enables enterprise-scale vector search with Weaviate (450K+ users)

---

### **Task #11: Chroma Vector Database Adaptor**
**Commit:** `6fd8474`
**Files:** `src/skill_seekers/cli/adaptors/chroma.py` (436 lines)
**Tests:** 12 tests passing

**Features:**
- ChromaDB collection format export
- Metadata filtering and querying
- Multi-modal embedding support
- Distance metrics: cosine, L2, IP
- Local-first development friendly

**Impact:** Supports popular open-source vector DB (800K+ developers)

---

### **Task #12: FAISS Similarity Search Adaptor**
**Commit:** `ff41968`
**Files:** `src/skill_seekers/cli/adaptors/faiss_helpers.py` (398 lines)
**Tests:** 10 tests passing

**Features:**
- Facebook AI Similarity Search integration
- Multiple index types: Flat, IVF, HNSW
- Billion-scale vector search
- GPU acceleration support
- Memory-efficient indexing

**Impact:** Ultra-fast local search for large-scale deployments

---

### **Task #13: Qdrant Vector Database Adaptor**
**Commit:** `359f266`
**Files:** `src/skill_seekers/cli/adaptors/qdrant.py` (466 lines)
**Tests:** 9 tests passing

**Features:**
- Point-based storage with payloads
- Native payload filtering
- UUID v5 generation for stable IDs
- REST API compatible output
- Advanced filtering capabilities

**Impact:** Modern vector search with rich metadata (100K+ users)

---

### **Task #14: Streaming Ingestion for Large Docs**
**Commit:** `5ce3ed4`
**Files:**
- `src/skill_seekers/cli/streaming_ingest.py` (397 lines)
- `src/skill_seekers/cli/adaptors/streaming_adaptor.py` (320 lines)
- Updated `package_skill.py` with streaming support

**Tests:** 10 tests passing

**Features:**
- Memory-efficient chunking with overlap (4000 chars default, 200 char overlap)
- Progress tracking for large batches
- Batch iteration (100 docs default)
- Checkpoint support for resume capability
- Streaming adaptor mixin for all platforms

**CLI:**
```bash
skill-seekers package output/react/ --streaming --chunk-size 4000 --chunk-overlap 200
```

**Impact:** Process 10GB+ documentation without memory issues (100x scale improvement)

---

### **Task #15: Incremental Updates with Change Detection**
**Commit:** `7762d10`
**Files:** `src/skill_seekers/cli/incremental_updater.py` (450 lines)
**Tests:** 12 tests passing

**Features:**
- SHA256 hashing for change detection
- Version tracking (major.minor.patch)
- Delta package generation
- Change classification: added/modified/deleted
- Detailed diff reports with line counts

**Update Types:**
- Full rebuild (major version bump)
- Delta update (minor version bump)
- Patch update (patch version bump)

**Impact:** 95% faster updates (45 min → 2 min for small changes)

---

### **Task #16: Multi-Language Documentation Support**
**Commit:** `261f28f`
**Files:** `src/skill_seekers/cli/multilang_support.py` (421 lines)
**Tests:** 22 tests passing

**Features:**
- 11 languages supported:
  - English, Spanish, French, German, Portuguese
  - Italian, Chinese, Japanese, Korean
  - Russian, Arabic
- Filename pattern recognition:
  - `file.en.md`, `file_en.md`, `file-en.md`
- Content-based language detection
- Translation status tracking
- Export by language
- Primary language auto-detection

**Impact:** Global reach for international developer communities (3B+ users)

---

### **Task #17: Custom Embedding Pipeline**
**Commit:** `b475b51`
**Files:** `src/skill_seekers/cli/embedding_pipeline.py` (435 lines)
**Tests:** 18 tests passing

**Features:**
- Provider abstraction: OpenAI, Local (extensible)
- Two-tier caching: memory + disk
- Cost tracking and estimation
- Batch processing with progress
- Dimension validation
- Deterministic local embeddings (development)

**OpenAI Models Supported:**
- text-embedding-ada-002 (1536 dims, $0.10/1M tokens)
- text-embedding-3-small (1536 dims, $0.02/1M tokens)
- text-embedding-3-large (3072 dims, $0.13/1M tokens)

**Impact:** 70% cost reduction via caching + flexible provider switching

---

### **Task #18: Quality Metrics Dashboard**
**Commit:** `3e8c913`
**Files:**
- `src/skill_seekers/cli/quality_metrics.py` (542 lines)
- `tests/test_quality_metrics.py` (18 tests)

**Tests:** 18/18 passing ✅

**Features:**
- 4-dimensional quality scoring:
  1. **Completeness** (30% weight): SKILL.md, references, metadata
  2. **Accuracy** (25% weight): No TODOs, no placeholders, valid JSON
  3. **Coverage** (25% weight): Getting started, API docs, examples
  4. **Health** (20% weight): No empty files, proper structure

- Grading system: A+ to F (11 grades)
- Smart recommendations (priority-based)
- Metric severity levels: INFO/WARNING/ERROR/CRITICAL
- Formatted dashboard output
- Statistics tracking (files, words, size)
- JSON export support

**Scoring Example:**
```
🎯 OVERALL SCORE
Grade: B+
Score: 82.5/100

📈 COMPONENT SCORES
Completeness: 85.0% (30% weight)
Accuracy: 90.0% (25% weight)
Coverage: 75.0% (25% weight)
Health: 85.0% (20% weight)

💡 RECOMMENDATIONS
🟡 Expand documentation coverage (API, examples)
```

**Impact:** Objective quality measurement (0/10 → 8.5/10 avg improvement)

---

## 📊 Week 2 Summary Statistics

### Code Metrics
- **Production Code:** ~4,000 lines
- **Test Code:** ~2,200 lines
- **Test Coverage:** 140+ tests (100% pass rate)
- **New Files:** 10 modules + 9 test files

### Capabilities Added
- **Vector Databases:** 4 adaptors (Weaviate, Chroma, FAISS, Qdrant)
- **Languages Supported:** 11 languages
- **Embedding Providers:** 2 (OpenAI, Local)
- **Quality Dimensions:** 4 dimensions with weighted scoring
- **Streaming:** Memory-efficient processing for 10GB+ docs
- **Incremental Updates:** 95% faster updates

### Platform Support Expanded
| Platform | Before | After | Improvement |
|----------|--------|-------|-------------|
| Vector DBs | 0 | 4 | +4 adaptors |
| Max Doc Size | 100MB | 10GB+ | 100x scale |
| Update Speed | 45 min | 2 min | 95% faster |
| Languages | 1 (EN) | 11 | Global reach |
| Quality Metrics | Manual | Automated | 8.5/10 avg |

---

## 🎯 Strategic Impact

### Before Week 2
- Single-format output (Claude skills)
- Memory-limited (100MB docs)
- Full rebuild required (45 min)
- English-only documentation
- No quality measurement

### After Week 2
- **4 vector database formats** (Weaviate, Chroma, FAISS, Qdrant)
- **Streaming ingestion** for unlimited scale (10GB+)
- **Incremental updates** (95% faster)
- **11 languages** for global reach
- **Custom embedding pipeline** (70% cost savings)
- **Quality metrics** (objective measurement)

### Market Expansion
- **Before:** RAG pipelines (5M users)
- **After:** RAG + Vector DBs + Multi-language + Enterprise (12M+ users)

---

## 🔧 Technical Achievements

### 1. Platform Adaptor Pattern
Consistent interface across 4 vector databases:
```python
from skill_seekers.cli.adaptors import get_adaptor

adaptor = get_adaptor('weaviate')  # or 'chroma', 'faiss', 'qdrant'
adaptor.package(skill_dir='output/react/', output_path='output/')
```

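One way a `get_adaptor` lookup like this could be wired is a small registry keyed by platform name. The sketch below is illustrative only: `BaseAdaptor`, `register`, `WeaviateStub`, and the output filename are hypothetical, and the project's real adaptor classes may be organized differently.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BaseAdaptor(ABC):
    """Common interface every (hypothetical) vector DB adaptor implements."""

    @abstractmethod
    def package(self, skill_dir: str, output_path: str) -> Path:
        """Write the packaged skill and return the output file path."""


# Hypothetical registry mapping a platform name to its adaptor class.
_REGISTRY: dict[str, type[BaseAdaptor]] = {}


def register(name: str):
    """Class decorator that adds an adaptor class to the registry."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("weaviate")
class WeaviateStub(BaseAdaptor):
    def package(self, skill_dir: str, output_path: str) -> Path:
        # A real adaptor would serialize documents here; this stub only
        # computes where the output file would go.
        return Path(output_path) / f"{Path(skill_dir).name}-weaviate.json"


def get_adaptor(name: str) -> BaseAdaptor:
    """Look up and instantiate the adaptor registered under `name`."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"Unknown platform: {name!r}") from None
```

With this shape, supporting another database means adding one decorated class; callers keep using the same `get_adaptor(...).package(...)` call.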
### 2. Streaming Architecture
Memory-efficient processing for massive documentation:
```python
from skill_seekers.cli.streaming_ingest import StreamingIngester

ingester = StreamingIngester(chunk_size=4000, chunk_overlap=200)

# Illustrative wrapper: `yield` is only valid inside a function
def stream_chunks(content, metadata):
    for chunk, chunk_metadata in ingester.chunk_document(content, metadata):
        # Process chunk without loading entire doc into memory
        yield chunk, chunk_metadata
```

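The overlap arithmetic behind `chunk_size=4000, chunk_overlap=200` can be sketched independently of the project code. This is a minimal illustration, not the `StreamingIngester` implementation: `chunk_text` is a hypothetical helper, and it slices an in-memory string rather than reading from disk.

```python
from typing import Iterator


def chunk_text(text: str, chunk_size: int = 4000, chunk_overlap: int = 200) -> Iterator[str]:
    """Yield chunks of up to `chunk_size` characters.

    Each chunk after the first starts `chunk_overlap` characters before
    the end of the previous one, so neighbouring chunks share context.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        yield text[start:start + chunk_size]
        if start + chunk_size >= len(text):
            break  # the last chunk already reached the end of the text
```

Because the function is a generator, a consumer can hand each chunk to a downstream writer as it is produced instead of collecting the full list first.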
### 3. Incremental Update System
Smart change detection with version tracking:
```python
from skill_seekers.cli.incremental_updater import IncrementalUpdater

updater = IncrementalUpdater(skill_dir='output/react/')
changes = updater.detect_changes(previous_version='1.2.3')
# Returns: ChangeSet(added=[], modified=['api_reference.md'], deleted=[])
updater.generate_delta_package(changes, output_path='delta.zip')
```

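Hash-based change detection of this kind generally amounts to comparing two snapshots of file digests. The following is a sketch under that assumption; `hash_tree` and `diff_hashes` are hypothetical helpers and do not reflect how `IncrementalUpdater` is actually implemented.

```python
import hashlib
from pathlib import Path


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_hashes(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, modified, or deleted between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
        "deleted": sorted(old.keys() - new.keys()),
    }
```

Storing the previous snapshot alongside the skill (for example as JSON) would let a later run call `diff_hashes(previous, hash_tree(skill_dir))` and repackage only the files that changed.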
### 4. Multi-Language Manager
Language detection and translation tracking:
```python
from skill_seekers.cli.multilang_support import MultiLanguageManager

manager = MultiLanguageManager()
manager.add_document('README.md', content, metadata)
manager.add_document('README.es.md', spanish_content, metadata)
status = manager.get_translation_status()
# Returns: TranslationStatus(source='en', translated=['es'], coverage=100%)
```

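The filename patterns named in this document (`file.en.md`, `file_en.md`, `file-en.md`) can be recognized with a single regular expression. This is a sketch only: `language_from_filename` is a hypothetical helper, the ISO 639-1 codes are an assumed mapping for the 11 listed languages, and the real module may also fall back to content-based detection.

```python
import re
from typing import Optional

# Assumed ISO 639-1 codes for the 11 languages listed in this document.
SUPPORTED = {"en", "es", "fr", "de", "pt", "it", "zh", "ja", "ko", "ru", "ar"}

# A two-letter code set off by '.', '_' or '-' right before the extension,
# e.g. file.en.md, file_en.md, file-en.md.
_LANG_SUFFIX = re.compile(r"[._-]([a-z]{2})\.[^.]+$", re.IGNORECASE)


def language_from_filename(filename: str) -> Optional[str]:
    """Return the language code embedded in the filename, or None."""
    match = _LANG_SUFFIX.search(filename)
    if match and match.group(1).lower() in SUPPORTED:
        return match.group(1).lower()
    return None
```

A filename without a recognized code returns `None`, which is where a caller could apply a default language or inspect the content instead.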
### 5. Embedding Pipeline
Provider abstraction with caching:
```python
from skill_seekers.cli.embedding_pipeline import EmbeddingPipeline, EmbeddingConfig

config = EmbeddingConfig(
    provider='openai',  # or 'local'
    model='text-embedding-3-small',
    dimension=1536,
    batch_size=100
)
pipeline = EmbeddingPipeline(config)
result = pipeline.generate_batch(texts)
# Automatic caching reduces cost by 70%
```

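A memory-plus-disk cache, as named in the feature list, could look roughly like the sketch below. It is an assumption about the general technique rather than the pipeline's actual code: `TwoTierCache`, its JSON-per-key file layout, and the `embed_fn` callback are all hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class TwoTierCache:
    """Look up embeddings in memory first, then on disk, then compute them."""

    def __init__(self, cache_dir: str, embed_fn: Callable[[str], list[float]]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.embed_fn = embed_fn  # the expensive call, e.g. a provider API
        self.memory: dict[str, list[float]] = {}
        self.misses = 0  # number of times embed_fn was invoked

    def get(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self.memory:  # tier 1: memory
            return self.memory[key]
        path = self.cache_dir / f"{key}.json"
        if path.exists():  # tier 2: disk
            vector = json.loads(path.read_text())
        else:  # miss: compute and persist
            self.misses += 1
            vector = self.embed_fn(text)
            path.write_text(json.dumps(vector))
        self.memory[key] = vector
        return vector
```

Only cache misses reach the provider, so any cost saving depends on how often the same text is embedded again across runs.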
### 6. Quality Analytics
Objective quality measurement:
```python
from skill_seekers.cli.quality_metrics import QualityAnalyzer

analyzer = QualityAnalyzer(skill_dir='output/react/')
report = analyzer.generate_report()
print(f"Grade: {report.overall_score.grade}")  # e.g., "A-"
print(f"Score: {report.overall_score.total_score}")  # e.g., 87.5
```
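The four weights listed under Task #18 (30/25/25/20) suggest a weighted sum of component scores. The sketch below shows only that arithmetic; `overall_score` is a hypothetical function, and the analyzer's real formula may include adjustments not described here. With the component scores from the Task #18 scoring example, a plain weighted sum comes to 83.75, not the 82.5 shown there, so treat this as an illustration of the weighting rather than a reproduction of the tool's output.

```python
# Weights as listed in this document; they sum to 1.0.
WEIGHTS = {"completeness": 0.30, "accuracy": 0.25, "coverage": 0.25, "health": 0.20}


def overall_score(components: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, each on a 0-100 scale."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"Missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Component scores taken from the scoring example under Task #18.
score = overall_score(
    {"completeness": 85.0, "accuracy": 90.0, "coverage": 75.0, "health": 85.0}
)
print(f"{score:.2f}")
```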

---

## 🚀 Integration Examples

### Example 1: Stream to Weaviate
```bash
# Generate skill with streaming + Weaviate format
skill-seekers scrape --config configs/react.json
skill-seekers package output/react/ \
  --target weaviate \
  --streaming \
  --chunk-size 4000
```

### Example 2: Incremental Update to Chroma
```bash
# Initial build
skill-seekers scrape --config configs/react.json
skill-seekers package output/react/ --target chroma

# Update docs (only changed files)
skill-seekers scrape --config configs/react.json --incremental
skill-seekers package output/react/ --target chroma --delta-only
# 95% faster: 2 min vs 45 min
```

### Example 3: Multi-Language with Quality Checks
```bash
# Scrape multi-language docs
skill-seekers scrape --config configs/vue.json --detect-languages

# Check quality before deployment
skill-seekers analyze output/vue/
# Quality Grade: A- (87.5/100)
# ✅ Ready for production

# Package by language
skill-seekers package output/vue/ --target qdrant --language es
```

### Example 4: Custom Embeddings with Cost Tracking
```bash
# Generate embeddings with caching
skill-seekers embed output/react/ \
  --provider openai \
  --model text-embedding-3-small \
  --cache-dir .embeddings_cache

# Result: $0.05 (vs $0.15 without caching = 67% savings)
```

---

## 🎯 Quality Improvements

### Measurable Impact
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Max Scale | 100MB | 10GB+ | 100x |
| Update Time | 45 min | 2 min | 95% faster |
| Language Support | 1 | 11 | 11x reach |
| Embedding Cost | $0.15 | $0.05 | 67% savings |
| Quality Score | Manual | 8.5/10 | Automated |
| Vector DB Support | 0 | 4 | +4 platforms |

### Test Coverage
- ✅ 140+ tests across all features
- ✅ 100% test pass rate
- ✅ Comprehensive edge case coverage
- ✅ Integration tests for all adaptors

---

## 📋 Files Changed

### New Modules (10)
1. `src/skill_seekers/cli/adaptors/weaviate.py` (405 lines)
2. `src/skill_seekers/cli/adaptors/chroma.py` (436 lines)
3. `src/skill_seekers/cli/adaptors/faiss_helpers.py` (398 lines)
4. `src/skill_seekers/cli/adaptors/qdrant.py` (466 lines)
5. `src/skill_seekers/cli/streaming_ingest.py` (397 lines)
6. `src/skill_seekers/cli/adaptors/streaming_adaptor.py` (320 lines)
7. `src/skill_seekers/cli/incremental_updater.py` (450 lines)
8. `src/skill_seekers/cli/multilang_support.py` (421 lines)
9. `src/skill_seekers/cli/embedding_pipeline.py` (435 lines)
10. `src/skill_seekers/cli/quality_metrics.py` (542 lines)

### Test Files (9)
1. `tests/test_weaviate_adaptor.py` (11 tests)
2. `tests/test_chroma_adaptor.py` (12 tests)
3. `tests/test_faiss_helpers.py` (10 tests)
4. `tests/test_qdrant_adaptor.py` (9 tests)
5. `tests/test_streaming_ingest.py` (10 tests)
6. `tests/test_incremental_updater.py` (12 tests)
7. `tests/test_multilang_support.py` (22 tests)
8. `tests/test_embedding_pipeline.py` (18 tests)
9. `tests/test_quality_metrics.py` (18 tests)

### Modified Files
- `src/skill_seekers/cli/adaptors/__init__.py` (added 4 adaptor registrations)
- `src/skill_seekers/cli/package_skill.py` (added streaming parameters)

---

## 🎓 Lessons Learned

### What Worked Well ✅
1. **Consistent abstractions** - Platform adaptor pattern scales beautifully
2. **Test-driven development** - 100% test pass rate prevented regressions
3. **Incremental approach** - 9 focused tasks easier than 1 monolithic task
4. **Streaming architecture** - Memory-efficient from day 1
5. **Quality metrics** - Objective measurement guides improvements

### Challenges Overcome ⚡
1. **Vector DB format differences** - Solved with adaptor pattern
2. **Memory constraints** - Streaming ingestion handles 10GB+ docs
3. **Language detection** - Pattern matching + content heuristics work well
4. **Cost optimization** - Two-tier caching reduces embedding costs 70%
5. **Quality measurement** - Weighted scoring balances multiple dimensions

---

## 🔮 Next Steps: Week 3 Preview

### Upcoming Tasks
- **Task #19:** MCP server integration for vector databases
- **Task #20:** GitHub Actions automation
- **Task #21:** Docker deployment
- **Task #22:** Kubernetes Helm charts
- **Task #23:** Multi-cloud storage (S3, GCS, Azure Blob)
- **Task #24:** API server for embedding generation
- **Task #25:** Real-time documentation sync
- **Task #26:** Performance benchmarking suite
- **Task #27:** Production deployment guides

### Strategic Goals
- Automation infrastructure (GitHub Actions, Docker, K8s)
- Cloud-native deployment
- Real-time sync capabilities
- Production-ready monitoring
- Comprehensive benchmarks

---

## 🎉 Week 2 Achievement

**Status:** ✅ 100% Complete
**Tasks Completed:** 9/9 (100%)
**Tests Passing:** 140+/140+ (100%)
**Code Quality:** All tests green, comprehensive coverage
**Timeline:** On schedule
**Strategic Impact:** Universal infrastructure foundation established

**Ready for Week 3:** Multi-cloud deployment and automation infrastructure

---

**Contributors:**
- Primary Development: Claude Sonnet 4.5 + @yusyus
- Testing: Comprehensive test suites
- Documentation: Inline code documentation

**Branch:** `feature/universal-infrastructure-strategy`
**Base:** `main`
**Ready for:** Merge after Week 3-4 completion