# MLOps Production Patterns
Production ML infrastructure patterns for model deployment, monitoring, and lifecycle management.
## Table of Contents
- Model Deployment Pipeline
- Feature Store Architecture
- Model Monitoring
- A/B Testing Infrastructure
- Automated Retraining
## Model Deployment Pipeline

### Deployment Workflow

1. Export the trained model to a standardized format (ONNX, TorchScript, SavedModel); see the export sketch after this list
2. Package the model with its dependencies in a Docker container
3. Deploy to a staging environment
4. Run integration tests against staging
5. Deploy a canary (5% of traffic) to production
6. Monitor latency and error rates for 1 hour
7. Promote to full production if metrics pass

**Validation:** p95 latency < 100 ms, error rate < 0.1%
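A minimal sketch of the export in step 1 for a PyTorch model. The toy model, feature width, and tensor names are illustrative placeholders rather than part of any specific pipeline.

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the trained model; in practice, load your own.
model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1))
model.eval()

example_input = torch.randn(1, 32)  # one sample with 32 input features (illustrative)

torch.onnx.export(
    model,
    example_input,
    "model.onnx",
    input_names=["features"],
    output_names=["prediction"],
    dynamic_axes={"features": {0: "batch"}, "prediction": {0: "batch"}},  # allow variable batch size
)
```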
### Container Structure

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install system and Python dependencies (curl is needed for the HEALTHCHECK below)
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy model artifacts and serving code
COPY model/ /app/model/
COPY src/ /app/src/

# Health check endpoint
HEALTHCHECK CMD curl -f http://localhost:8080/health || exit 1

EXPOSE 8080
CMD ["uvicorn", "src.server:app", "--host", "0.0.0.0", "--port", "8080"]
```
### Model Serving Options
| Option | Latency | Throughput | Use Case |
|---|---|---|---|
| FastAPI + Uvicorn | Low | Medium | REST APIs, small models |
| Triton Inference Server | Very Low | Very High | GPU inference, batching |
| TensorFlow Serving | Low | High | TensorFlow models |
| TorchServe | Low | High | PyTorch models |
| Ray Serve | Medium | High | Complex pipelines, multi-model |
### Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-serving
spec:
  replicas: 3
  selector:
    matchLabels:
      app: model-serving
  template:
    metadata:
      labels:
        app: model-serving  # must match the selector above
    spec:
      containers:
        - name: model
          image: model:v1.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "2Gi"
              cpu: "1"
            limits:
              memory: "4Gi"
              cpu: "2"
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
```
## Feature Store Architecture

### Feature Store Components
| Component | Purpose | Tools |
|---|---|---|
| Offline Store | Training data, batch features | BigQuery, Snowflake, S3 |
| Online Store | Low-latency serving | Redis, DynamoDB, Feast |
| Feature Registry | Metadata, lineage | Feast, Tecton, Hopsworks |
| Transformation | Feature engineering | Spark, Flink, dbt |
### Feature Pipeline Workflow

1. Define the feature schema in the registry
2. Implement transformation logic (SQL or Python)
3. Backfill historical features to the offline store
4. Schedule incremental updates
5. Materialize to the online store for serving (see the materialization sketch after the feature definition example)
6. Monitor feature freshness and quality

**Validation:** feature values within expected ranges, no nulls in required fields
### Feature Definition Example

```python
from datetime import timedelta

# Note: this uses the older Feast API (Feature/ValueType); recent Feast
# releases use Field and the types in feast.types instead.
from feast import Entity, Feature, FeatureView, FileSource, ValueType

user = Entity(name="user_id", value_type=ValueType.INT64)

user_features = FeatureView(
    name="user_features",
    entities=["user_id"],
    ttl=timedelta(days=1),
    features=[
        Feature(name="purchase_count_30d", dtype=ValueType.INT64),
        Feature(name="avg_order_value", dtype=ValueType.FLOAT),
        Feature(name="days_since_last_purchase", dtype=ValueType.INT64),
    ],
    online=True,
    source=FileSource(path="data/user_features.parquet"),
)
```
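A sketch of the materialization and online-lookup steps of the feature pipeline, using the same (older) Feast API as the definition above. The repo path and entity ID are placeholders.

```python
from datetime import datetime

from feast import FeatureStore

store = FeatureStore(repo_path=".")  # placeholder: path to the feature repo

# Materialize recent feature values from the offline store into the online store
store.materialize_incremental(end_date=datetime.utcnow())

# Low-latency lookup at serving time
features = store.get_online_features(
    features=[
        "user_features:purchase_count_30d",
        "user_features:avg_order_value",
    ],
    entity_rows=[{"user_id": 1001}],  # placeholder entity ID
).to_dict()
print(features)
```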
## Model Monitoring

### Monitoring Dimensions
| Dimension | Metrics | Alert Threshold |
|---|---|---|
| Latency | p50, p95, p99 | p95 > 100ms |
| Throughput | requests/sec | < 80% baseline |
| Errors | error rate, 5xx count | > 0.1% |
| Data Drift | PSI, KS statistic | PSI > 0.2 |
| Model Drift | accuracy, AUC decay | > 5% drop |
### Data Drift Detection

```python
import numpy as np
from scipy.stats import ks_2samp


def detect_drift(reference: np.ndarray, current: np.ndarray, threshold: float = 0.05) -> dict:
    """Detect distribution drift using the Kolmogorov-Smirnov test."""
    statistic, p_value = ks_2samp(reference, current)
    drift_detected = p_value < threshold
    return {
        "drift_detected": drift_detected,
        "ks_statistic": statistic,
        "p_value": p_value,
        "threshold": threshold,
    }
```
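The monitoring table above and the retraining triggers below both rely on PSI (Population Stability Index), which is not shown. A sketch of computing it by binning the reference distribution and comparing bin proportions; the bin count and epsilon guard are conventional choices, not values from this document.

```python
import numpy as np


def population_stability_index(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI between a reference and a current sample (PSI > 0.2 is a common drift alert level)."""
    # Bin edges come from the reference distribution (quantiles keep bins evenly populated)
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf

    ref_counts, _ = np.histogram(reference, bins=edges)
    cur_counts, _ = np.histogram(current, bins=edges)

    eps = 1e-6  # guard against empty bins (division by zero / log of zero)
    ref_pct = np.maximum(ref_counts / ref_counts.sum(), eps)
    cur_pct = np.maximum(cur_counts / cur_counts.sum(), eps)

    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))
```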
### Monitoring Dashboard Metrics

**Infrastructure:**

- Request latency (p50, p95, p99)
- Requests per second
- Error rate by type
- CPU/memory utilization
- GPU utilization (if applicable)

**Model Performance:**

- Prediction distribution
- Feature value distributions
- Model output confidence
- Ground truth vs. predictions (when available)
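A sketch of exposing the infrastructure metrics above from a Python serving process with the `prometheus_client` library; the metric names, port, and wrapper function are illustrative.

```python
import time

from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric definitions for the infrastructure dashboard
REQUEST_LATENCY = Histogram("request_latency_seconds", "Inference request latency in seconds")
REQUEST_ERRORS = Counter("request_errors_total", "Inference errors by type", ["error_type"])


def predict_with_metrics(predict_fn, features):
    """Wrap any predict callable with latency and error accounting."""
    start = time.perf_counter()
    try:
        return predict_fn(features)
    except Exception as exc:
        REQUEST_ERRORS.labels(error_type=type(exc).__name__).inc()
        raise
    finally:
        REQUEST_LATENCY.observe(time.perf_counter() - start)


# Expose /metrics on port 9090 for Prometheus to scrape (call once at process startup)
start_http_server(9090)
```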
## A/B Testing Infrastructure

### Experiment Workflow

1. Define the experiment hypothesis and success metrics
2. Calculate the required sample size for statistical power (see the sample-size sketch after this list)
3. Configure the traffic split (control vs. treatment)
4. Deploy the treatment model alongside the control
5. Route traffic based on a user/session hash
6. Collect metrics for both variants
7. Run a statistical significance test

**Validation:** p-value < 0.05, minimum sample size reached
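A sketch of the sample-size step, using the standard two-proportion formula under a normal approximation; the baseline rate and minimum detectable lift are placeholder values.

```python
import math

from scipy.stats import norm


def required_sample_size(p_control: float, p_treatment: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-variant sample size for a two-sided test of two proportions (normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = abs(p_treatment - p_control)
    return math.ceil(variance * (z_alpha + z_beta) ** 2 / effect ** 2)


# Placeholder numbers: 5% baseline conversion rate, minimum detectable lift to 5.5%
print(required_sample_size(0.05, 0.055))  # roughly 31,000 users per variant
```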
### Traffic Splitting

```python
import hashlib


def get_variant(user_id: str, experiment: str, control_pct: float = 0.5) -> str:
    """Deterministic traffic splitting based on user ID."""
    hash_input = f"{user_id}:{experiment}"
    hash_value = int(hashlib.md5(hash_input.encode()).hexdigest(), 16)
    bucket = (hash_value % 100) / 100.0
    return "control" if bucket < control_pct else "treatment"
```
### Metrics Collection
| Metric Type | Examples | Collection Method |
|---|---|---|
| Primary | Conversion rate, revenue | Event logging |
| Secondary | Latency, engagement | Request logs |
| Guardrail | Error rate, crashes | Monitoring system |
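For the significance-test step of the experiment workflow, a sketch of a two-proportion z-test on the primary conversion metric using `statsmodels`; the counts are placeholder values.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Placeholder counts: conversions and exposed users per variant [control, treatment]
conversions = np.array([1650, 1785])
users = np.array([31000, 31000])

z_stat, p_value = proportions_ztest(count=conversions, nobs=users)
print(f"z={z_stat:.2f}, p={p_value:.4f}, significant={p_value < 0.05}")
```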
## Automated Retraining

### Retraining Triggers
| Trigger | Detection Method | Action |
|---|---|---|
| Scheduled | Cron (weekly/monthly) | Full retrain |
| Performance drop | Accuracy < threshold | Immediate retrain |
| Data drift | PSI > 0.2 | Evaluate, then retrain |
| New data volume | X new samples | Incremental update |
### Retraining Pipeline

1. Trigger detection (schedule, drift, performance)
2. Fetch the latest training data from the feature store
3. Run the training job with the hyperparameter config
4. Evaluate the model on a holdout set
5. Compare against the production model
6. If improved: register the new model version
7. Deploy to staging for validation
8. Promote to production via canary

**Validation:** the new model outperforms the baseline on key metrics
### MLflow Model Registry Integration

```python
import mlflow


def register_model(model, metrics: dict, model_name: str):
    """Register a trained model with the MLflow model registry."""
    with mlflow.start_run():
        # Log evaluation metrics
        for name, value in metrics.items():
            mlflow.log_metric(name, value)

        # Log the model artifact (sklearn flavor shown; use the flavor matching your framework)
        mlflow.sklearn.log_model(model, "model")

        # Register the logged model in the model registry
        model_uri = f"runs:/{mlflow.active_run().info.run_id}/model"
        mlflow.register_model(model_uri, model_name)
```
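A sketch of the comparison-and-promotion step of the retraining pipeline, built on the registry integration above. The metric, improvement threshold, and stage names are illustrative; note that stage transitions are deprecated in favor of registered-model aliases in recent MLflow releases.

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()


def promote_if_better(model_name: str, candidate_version: str,
                      candidate_auc: float, production_auc: float,
                      min_improvement: float = 0.0) -> bool:
    """Promote the candidate model version to Production if it beats the current model."""
    if candidate_auc <= production_auc + min_improvement:
        return False
    # Stage-based promotion (older MLflow API); newer releases prefer model aliases
    client.transition_model_version_stage(
        name=model_name,
        version=candidate_version,
        stage="Production",
        archive_existing_versions=True,
    )
    return True
```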