Object Detection Optimization

Comprehensive guide to optimizing object detection models for accuracy and inference speed.

Table of Contents

  • Non-Maximum Suppression
  • Anchor Design and Optimization
  • Loss Functions
  • Training Strategies
  • Data Augmentation
  • Model Optimization Techniques
  • Hyperparameter Tuning
  • Detection-Specific Tips
  • Resources

Non-Maximum Suppression

NMS removes redundant overlapping detections to produce final predictions.

Standard NMS

Basic algorithm:

  1. Sort boxes by confidence score
  2. Select the highest-confidence box and keep it
  3. Remove remaining boxes whose IoU with the kept box exceeds the threshold
  4. Repeat until no boxes remain

import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """
    boxes: (N, 4) numpy array in format [x1, y1, x2, y2]
    scores: (N,) numpy array
    Assumes a compute_iou(box, boxes) helper returning the IoU of one
    box against an array of boxes.
    """
    order = scores.argsort()[::-1]  # indices, highest score first
    keep = []

    while len(order) > 0:
        i = order[0]
        keep.append(i)

        if len(order) == 1:
            break

        # IoU of the kept box with all remaining boxes
        ious = compute_iou(boxes[i], boxes[order[1:]])

        # Keep only boxes with IoU <= threshold
        mask = ious <= iou_threshold
        order = order[1:][mask]

    return keep

Parameters:

  • iou_threshold: 0.5-0.7 typical (lower = more suppression)
  • score_threshold: 0.25-0.5 (filter low-confidence first)
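
The nms sketch above calls a compute_iou helper that is not defined in this guide; a minimal vectorized NumPy version (one box against an array of boxes, all as [x1, y1, x2, y2]) might look like:

```python
import numpy as np

def compute_iou(box, boxes):
    """IoU of one box against an (N, 4) array of boxes."""
    # Intersection rectangle
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)

    # Union = sum of areas - intersection
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-7)
```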

Soft-NMS

Reduces scores instead of removing boxes entirely.

Formula:

score = score * exp(-IoU^2 / sigma)

Benefits:

  • Better for overlapping objects
  • +1-2% mAP improvement
  • Slightly slower than hard NMS

def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian-penalty Soft-NMS (decays scores instead of deleting boxes)."""
    scores = scores.copy()  # avoid mutating the caller's array
    order = scores.argsort()[::-1]
    keep = []

    while len(order) > 0:
        i = order[0]
        keep.append(i)

        if len(order) == 1:
            break

        ious = compute_iou(boxes[i], boxes[order[1:]])

        # Gaussian penalty: score *= exp(-IoU^2 / sigma)
        weights = np.exp(-ious**2 / sigma)
        scores[order[1:]] *= weights

        # Drop boxes that fell below the score threshold, then re-sort
        mask = scores[order[1:]] > score_threshold
        order = order[1:][mask]
        order = order[scores[order].argsort()[::-1]]

    return keep

DIoU-NMS

Uses Distance-IoU instead of standard IoU.

Formula:

DIoU = IoU - (d^2 / c^2)

Where:

  • d = center distance between boxes
  • c = diagonal of smallest enclosing box

Benefits:

  • Better for occluded objects
  • Penalizes distant boxes less
  • Works well with DIoU loss
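
The greedy loop from standard NMS carries over directly; only the suppression metric changes. A hedged NumPy sketch (function names are ours, not from a library):

```python
import numpy as np

def pairwise_diou(box, boxes):
    """DIoU of one box against an (N, 4) array of boxes ([x1, y1, x2, y2])."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = inter / (area + areas - inter + 1e-7)

    # squared center distance d^2
    d_sq = ((box[0] + box[2]) / 2 - (boxes[:, 0] + boxes[:, 2]) / 2) ** 2 \
         + ((box[1] + box[3]) / 2 - (boxes[:, 1] + boxes[:, 3]) / 2) ** 2
    # squared diagonal c^2 of the smallest enclosing box
    c_sq = (np.maximum(box[2], boxes[:, 2]) - np.minimum(box[0], boxes[:, 0])) ** 2 \
         + (np.maximum(box[3], boxes[:, 3]) - np.minimum(box[1], boxes[:, 1])) ** 2
    return iou - d_sq / (c_sq + 1e-7)

def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS, but suppress on DIoU instead of plain IoU."""
    order = scores.argsort()[::-1]
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(int(i))
        if len(order) == 1:
            break
        d = pairwise_diou(boxes[i], boxes[order[1:]])
        order = order[1:][d <= threshold]
    return keep
```

Because distant boxes get a negative distance penalty, they survive thresholds that plain IoU-NMS would also pass, while heavily overlapping, nearby boxes are still suppressed.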

Batched NMS

NMS per class (prevents cross-class suppression).

import torch
import torchvision

def batched_nms(boxes, scores, classes, iou_threshold):
    """Per-class NMS: offset each class's boxes into a disjoint
    coordinate range so different classes can never suppress each other."""
    max_coordinate = boxes.max()
    offsets = classes.to(boxes) * (max_coordinate + 1)
    boxes_for_nms = boxes + offsets[:, None]

    keep = torchvision.ops.nms(boxes_for_nms, scores, iou_threshold)
    return keep

NMS-Free Detection (DETR-style)

Transformer-based detectors eliminate NMS.

How DETR avoids NMS:

  • Object queries are learned embeddings
  • Bipartite matching in training
  • Each query outputs exactly one detection
  • Set-based loss enforces uniqueness

Benefits:

  • End-to-end differentiable
  • No hand-crafted post-processing
  • Better for complex scenes

Anchor Design and Optimization

Anchor-Based Detection

Traditional detectors use predefined anchor boxes.

Anchor parameters:

  • Scales: [32, 64, 128, 256, 512] pixels
  • Ratios: [0.5, 1.0, 2.0] (height/width)
  • Stride: Feature map stride (8, 16, 32)

Anchor assignment:

  • Positive: IoU > 0.7 with ground truth
  • Negative: IoU < 0.3 with all ground truths
  • Ignored: 0.3 < IoU < 0.7
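
The scale/ratio parameterization above expands into concrete anchor (w, h) pairs, one per scale-ratio combination. A small sketch, assuming ratio = height/width and area held at scale squared:

```python
import numpy as np

def generate_anchors(scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """(w, h) for each scale/ratio pair; area stays at scale**2."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / np.sqrt(r)   # widen when ratio < 1
            h = s * np.sqrt(r)   # so that h / w == r and w * h == s**2
            anchors.append((w, h))
    return np.array(anchors)
```

These base shapes are then tiled over the feature map at every stride step to produce the full anchor grid.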

K-Means Anchor Clustering

Optimize anchors for your dataset.

import numpy as np
from sklearn.cluster import KMeans

def optimize_anchors(annotations, num_anchors=9, image_size=640):
    """
    annotations: list of (width, height) for each bounding box
    """
    # Normalize to input size
    boxes = np.array(annotations)
    boxes = boxes / boxes.max() * image_size

    # K-means clustering
    kmeans = KMeans(n_clusters=num_anchors, random_state=42)
    kmeans.fit(boxes)

    # Get anchor sizes
    anchors = kmeans.cluster_centers_

    # Sort by area
    areas = anchors[:, 0] * anchors[:, 1]
    anchors = anchors[np.argsort(areas)]

    # Calculate mean IoU with ground truth
    mean_iou = calculate_anchor_fit(boxes, anchors)
    print(f"Optimized anchors (mean IoU: {mean_iou:.3f}):")
    print(anchors.astype(int))

    return anchors

def calculate_anchor_fit(boxes, anchors):
    """Calculate how well anchors fit the boxes"""
    ious = []
    for box in boxes:
        box_area = box[0] * box[1]
        anchor_areas = anchors[:, 0] * anchors[:, 1]
        intersections = np.minimum(box[0], anchors[:, 0]) * \
                       np.minimum(box[1], anchors[:, 1])
        unions = box_area + anchor_areas - intersections
        max_iou = (intersections / unions).max()
        ious.append(max_iou)
    return np.mean(ious)

Anchor-Free Detection

Modern detectors predict boxes without anchors.

FCOS-style (center-based):

  • Predict (l, t, r, b) distances from center
  • Centerness score for quality
  • Multi-scale assignment

YOLO v8 style:

  • Predict (x, y, w, h) directly
  • Task-aligned assigner
  • Distribution focal loss for regression

Benefits of anchor-free:

  • No hyperparameter tuning for anchors
  • Simpler architecture
  • Better generalization

Anchor Assignment Strategies

ATSS (Adaptive Training Sample Selection):

  1. For each GT, select k closest anchors per level
  2. Calculate IoU for selected anchors
  3. IoU threshold = mean + std of IoUs
  4. Assign positives where IoU > threshold
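
Step 3's adaptive threshold is simply the mean plus the standard deviation of the candidate IoUs; a toy sketch (candidate values are illustrative):

```python
import numpy as np

def atss_threshold(ious):
    """ATSS adaptive positive threshold: mean + std of candidate IoUs."""
    return ious.mean() + ious.std()

# IoUs of the k closest candidate anchors for one ground-truth box
ious = np.array([0.1, 0.2, 0.6, 0.7])
thr = atss_threshold(ious)        # ~0.655 here
positives = ious > thr            # only clearly better-than-average anchors
```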

TAL (Task-Aligned Assigner - YOLO v8):

score = cls_score^alpha * IoU^beta

Where alpha=0.5, beta=6.0 (weights classification and localization)
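
As a quick illustration of the alignment metric (the function name is ours, not YOLO v8's): with beta much larger than alpha, the score collapses unless localization quality is high, so well-classified but poorly-localized anchors are de-prioritized.

```python
def task_alignment(cls_score, iou, alpha=0.5, beta=6.0):
    """Task-aligned quality: high only when classification AND
    localization are both good (beta >> alpha emphasizes IoU)."""
    return (cls_score ** alpha) * (iou ** beta)
```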


Loss Functions

Classification Losses

Cross-Entropy Loss

Standard multi-class classification:

loss = -log(p_correct_class)

Focal Loss

Handles class imbalance by down-weighting easy examples.

import torch
import torch.nn.functional as F

def focal_loss(pred, target, gamma=2.0, alpha=0.25):
    """
    pred: (N, num_classes) raw logits
    target: (N,) ground-truth class indices
    """
    ce_loss = F.cross_entropy(pred, target, reduction='none')
    pt = torch.exp(-ce_loss)  # probability of the correct class

    # Focal term: (1 - pt)^gamma down-weights easy examples
    focal_term = (1 - pt) ** gamma

    # alpha is used as a constant foreground weight here; in the binary
    # foreground/background setting, weight positives by alpha and
    # negatives by (1 - alpha)
    loss = alpha * focal_term * ce_loss
    return loss.mean()

Hyperparameters:

  • gamma: 2.0 typical, higher = more focus on hard examples
  • alpha: 0.25 for foreground class weight

Quality Focal Loss (QFL)

Combines classification with IoU quality.

import torch
import torch.nn.functional as F

def quality_focal_loss(pred, target, beta=2.0):
    """
    pred: sigmoid probabilities in [0, 1]
    target: IoU quality values in [0, 1] instead of binary labels
    """
    ce = F.binary_cross_entropy(pred, target, reduction='none')
    focal_weight = torch.abs(pred - target) ** beta
    loss = focal_weight * ce
    return loss.mean()

Regression Losses

Smooth L1 Loss

import torch

def smooth_l1_loss(pred, target, beta=1.0):
    diff = torch.abs(pred - target)
    # Quadratic near zero, linear for large errors
    loss = torch.where(
        diff < beta,
        0.5 * diff ** 2 / beta,
        diff - 0.5 * beta
    )
    return loss.mean()

IoU-Based Losses

IoU Loss:

L_IoU = 1 - IoU

GIoU (Generalized IoU):

GIoU = IoU - (C - U) / C
L_GIoU = 1 - GIoU

Where C = area of smallest enclosing box, U = union area.

DIoU (Distance IoU):

DIoU = IoU - d^2 / c^2
L_DIoU = 1 - DIoU

Where d = center distance, c = diagonal of enclosing box.

CIoU (Complete IoU):

CIoU = IoU - d^2 / c^2 - alpha*v
v = (4/pi^2) * (arctan(w_gt/h_gt) - arctan(w/h))^2
alpha = v / (1 - IoU + v)
L_CIoU = 1 - CIoU

Comparison:

Loss  | Handles          | Best For
------|------------------|-------------------
L1/L2 | Basic regression | Simple tasks
IoU   | Overlap          | Standard detection
GIoU  | Non-overlapping  | Distant boxes
DIoU  | Center distance  | Faster convergence
CIoU  | Aspect ratio     | Best accuracy


import math
import torch

def ciou_loss(pred_boxes, target_boxes):
    """
    pred_boxes, target_boxes: (N, 4) as [x1, y1, x2, y2]
    """
    # Intersection and union (inlined instead of external helpers)
    inter_x1 = torch.max(pred_boxes[:, 0], target_boxes[:, 0])
    inter_y1 = torch.max(pred_boxes[:, 1], target_boxes[:, 1])
    inter_x2 = torch.min(pred_boxes[:, 2], target_boxes[:, 2])
    inter_y2 = torch.min(pred_boxes[:, 3], target_boxes[:, 3])
    inter = (inter_x2 - inter_x1).clamp(min=0) * (inter_y2 - inter_y1).clamp(min=0)

    pred_w = pred_boxes[:, 2] - pred_boxes[:, 0]
    pred_h = pred_boxes[:, 3] - pred_boxes[:, 1]
    target_w = target_boxes[:, 2] - target_boxes[:, 0]
    target_h = target_boxes[:, 3] - target_boxes[:, 1]
    union = pred_w * pred_h + target_w * target_h - inter
    iou = inter / (union + 1e-7)

    # Enclosing box diagonal
    enclose_x1 = torch.min(pred_boxes[:, 0], target_boxes[:, 0])
    enclose_y1 = torch.min(pred_boxes[:, 1], target_boxes[:, 1])
    enclose_x2 = torch.max(pred_boxes[:, 2], target_boxes[:, 2])
    enclose_y2 = torch.max(pred_boxes[:, 3], target_boxes[:, 3])
    c_sq = (enclose_x2 - enclose_x1)**2 + (enclose_y2 - enclose_y1)**2

    # Center distance
    pred_cx = (pred_boxes[:, 0] + pred_boxes[:, 2]) / 2
    pred_cy = (pred_boxes[:, 1] + pred_boxes[:, 3]) / 2
    target_cx = (target_boxes[:, 0] + target_boxes[:, 2]) / 2
    target_cy = (target_boxes[:, 1] + target_boxes[:, 3]) / 2
    d_sq = (pred_cx - target_cx)**2 + (pred_cy - target_cy)**2

    # Aspect ratio consistency term
    v = (4 / math.pi**2) * (
        torch.atan(target_w / (target_h + 1e-7)) - torch.atan(pred_w / (pred_h + 1e-7))
    )**2
    alpha_term = v / (1 - iou + v + 1e-7)

    ciou = iou - d_sq / (c_sq + 1e-7) - alpha_term * v
    return (1 - ciou).mean()

Distribution Focal Loss (DFL)

Used in YOLO v8 for regression.

Concept:

  • Predict distribution over discrete positions
  • Each regression target is a soft label
  • Allows uncertainty estimation

import torch
import torch.nn.functional as F

def dfl_loss(pred_dist, target, reg_max=16):
    """
    pred_dist: (N, reg_max) logits over discrete bin positions
    target: (N,) continuous target values in [0, reg_max - 1]
    """
    # Split each continuous target between its two neighboring bins
    target_left = target.floor().long()
    target_right = (target_left + 1).clamp(max=reg_max - 1)
    weight_right = target - target_left.float()
    weight_left = 1 - weight_right

    # Cross-entropy against both bins, weighted by proximity
    loss_left = F.cross_entropy(pred_dist, target_left, reduction='none')
    loss_right = F.cross_entropy(pred_dist, target_right, reduction='none')

    loss = weight_left * loss_left + weight_right * loss_right
    return loss.mean()

Training Strategies

Learning Rate Schedules

Warmup:

# Linear warmup for first N epochs
if epoch < warmup_epochs:
    lr = base_lr * (epoch + 1) / warmup_epochs

Cosine Annealing:

lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * epoch / total_epochs))

Step Decay:

# Reduce by factor at milestones
lr = base_lr * (0.1 ** (milestones_passed))

Recommended schedule for detection:

import torch
from torch.optim import SGD

optimizer = SGD(model.parameters(), lr=0.01, momentum=0.937, weight_decay=0.0005)

scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer,
    T_max=total_epochs,
    eta_min=0.0001
)

# With warmup
warmup_scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer,
    start_factor=0.1,
    total_iters=warmup_epochs
)

scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer,
    schedulers=[warmup_scheduler, scheduler],
    milestones=[warmup_epochs]
)

Exponential Moving Average (EMA)

Smooths model weights for better stability.

class EMA:
    def __init__(self, model, decay=0.9999):
        self.model = model
        self.decay = decay
        self.shadow = {}
        for name, param in model.named_parameters():
            if param.requires_grad:
                self.shadow[name] = param.data.clone()

    def update(self):
        for name, param in self.model.named_parameters():
            if param.requires_grad:
                self.shadow[name] = (
                    self.decay * self.shadow[name] +
                    (1 - self.decay) * param.data
                )

    def apply_shadow(self):
        for name, param in self.model.named_parameters():
            if param.requires_grad:
                param.data.copy_(self.shadow[name])

Usage:

  • Update EMA after each training step
  • Use EMA weights for validation/inference
  • Decay: 0.9999 typical (higher = slower update)
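
The effect of the decay constant is easy to see on a single scalar weight: after the raw value jumps, the shadow moves toward it geometrically at rate (1 - decay), so the old value's contribution after n steps is decay**n.

```python
# EMA update rule on one scalar: shadow = decay * shadow + (1 - decay) * w
decay = 0.9
shadow, weight = 1.0, 0.0   # the raw weight suddenly drops to 0
for _ in range(10):
    shadow = decay * shadow + (1 - decay) * weight
# shadow has decayed to decay**10 of its old value; with decay=0.9999
# the same convergence would take thousands of steps
```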

Multi-Scale Training

Train with varying input sizes.

# Random size each batch
sizes = [480, 512, 544, 576, 608, 640, 672, 704, 736, 768]
input_size = random.choice(sizes)

# Resize batch to selected size
images = F.interpolate(images, size=input_size, mode='bilinear')

Benefits:

  • Better scale invariance
  • +1-2% mAP improvement
  • Slower training (variable input sizes)

Gradient Accumulation

Simulate larger batch sizes.

accumulation_steps = 4
optimizer.zero_grad()

for i, (images, targets) in enumerate(dataloader):
    loss = model(images, targets) / accumulation_steps
    loss.backward()

    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

Mixed Precision Training

Use FP16 for speed and memory.

from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

for images, targets in dataloader:
    optimizer.zero_grad()

    with autocast():
        loss = model(images, targets)

    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

Benefits:

  • 2-3x faster training
  • 50% memory reduction
  • Minimal accuracy loss

Data Augmentation

Geometric Augmentations

import albumentations as A

geometric = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=15, p=0.3),
    A.RandomScale(scale_limit=0.2, p=0.5),
    A.Affine(translate_percent={'x': (-0.1, 0.1), 'y': (-0.1, 0.1)}, p=0.3),
], bbox_params=A.BboxParams(format='coco', label_fields=['class_labels']))

Color Augmentations

color = A.Compose([
    A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
    A.HueSaturationValue(hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20, p=0.5),
    A.CLAHE(clip_limit=2.0, p=0.1),
    A.GaussianBlur(blur_limit=3, p=0.1),
    A.GaussNoise(var_limit=(10, 50), p=0.1),
])

Mosaic Augmentation

Combines 4 images into one (YOLO-style).

import random

import cv2
import numpy as np

def mosaic_augmentation(images, labels, input_size=640):
    """
    images: list of 4 images (numpy arrays)
    labels: list of 4 label arrays
    Assumes a user-supplied transform_bbox helper that rescales and
    shifts a box into the mosaic's coordinate frame.
    """
    result_image = np.zeros((input_size, input_size, 3), dtype=np.uint8)
    result_labels = []

    # Random center point
    cx = int(random.uniform(input_size * 0.25, input_size * 0.75))
    cy = int(random.uniform(input_size * 0.25, input_size * 0.75))

    positions = [
        (0, 0, cx, cy),           # top-left
        (cx, 0, input_size, cy),  # top-right
        (0, cy, cx, input_size),  # bottom-left
        (cx, cy, input_size, input_size),  # bottom-right
    ]

    for i, (x1, y1, x2, y2) in enumerate(positions):
        img = images[i]
        h, w = y2 - y1, x2 - x1

        # Resize and place
        img_resized = cv2.resize(img, (w, h))
        result_image[y1:y2, x1:x2] = img_resized

        # Transform labels
        for label in labels[i]:
            # Scale and shift bounding boxes
            new_label = transform_bbox(label, img.shape, (h, w), (x1, y1))
            result_labels.append(new_label)

    return result_image, result_labels

MixUp

Blends two images and labels.

import numpy as np

def mixup(image1, labels1, image2, labels2, alpha=0.5):
    """
    alpha: mixing ratio (0.5 = equal blend); often sampled from a
    Beta distribution per batch in practice
    """
    # Blend images
    mixed_image = (alpha * image1 + (1 - alpha) * image2).astype(np.uint8)

    # Blend labels with soft weights
    labels1_weighted = [(box, cls, alpha) for box, cls in labels1]
    labels2_weighted = [(box, cls, 1-alpha) for box, cls in labels2]

    mixed_labels = labels1_weighted + labels2_weighted
    return mixed_image, mixed_labels

Copy-Paste Augmentation

Paste objects from one image to another.

import random

def copy_paste(background, bg_labels, src_crops, src_labels, src_masks):
    """
    Paste segmented object crops onto a background image.
    src_crops[i] and src_masks[i] must be the same size (one object each);
    transform_bbox is a user-supplied helper that shifts a box by an offset.
    """
    result = background.copy()

    for crop, mask, label in zip(src_crops, src_masks, src_labels):
        # Random position for the pasted object
        x_offset = random.randint(0, background.shape[1] - mask.shape[1])
        y_offset = random.randint(0, background.shape[0] - mask.shape[0])

        # Paste only where the binary mask is set
        region = result[y_offset:y_offset + mask.shape[0],
                        x_offset:x_offset + mask.shape[1]]
        region[mask > 0] = crop[mask > 0]

        # Shift the object's box by the paste offset
        new_box = transform_bbox(label, x_offset, y_offset)
        bg_labels.append(new_box)

    return result, bg_labels

Cutout / Random Erasing

Randomly erase patches.

import random

def cutout(image, num_holes=8, max_h_size=32, max_w_size=32):
    h, w = image.shape[:2]
    result = image.copy()

    for _ in range(num_holes):
        y = random.randint(0, h)
        x = random.randint(0, w)
        h_size = random.randint(1, max_h_size)
        w_size = random.randint(1, max_w_size)

        y1, y2 = max(0, y - h_size // 2), min(h, y + h_size // 2)
        x1, x2 = max(0, x - w_size // 2), min(w, x + w_size // 2)

        result[y1:y2, x1:x2] = 0  # or random color

    return result

Model Optimization Techniques

Pruning

Remove unimportant weights.

Magnitude Pruning:

import torch.nn.utils.prune as prune

# Prune 30% of weights with smallest magnitude
for name, module in model.named_modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name='weight', amount=0.3)

Structured Pruning (channels):

# Prune entire channels
prune.ln_structured(module, name='weight', amount=0.3, n=2, dim=0)
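
Note that both calls above only attach a reparameterization (the effective weight becomes weight_orig * weight_mask); to bake the sparsity back into a plain parameter, call prune.remove afterwards. A sketch:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(8, 8, 3)
prune.l1_unstructured(conv, name='weight', amount=0.3)
# conv now carries weight_orig + weight_mask buffers; make the pruning
# permanent so the module has an ordinary (sparse) weight again
prune.remove(conv, 'weight')
sparsity = float((conv.weight == 0).float().mean())  # ~0.3
```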

Knowledge Distillation

Train smaller model with larger teacher.

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.7):
    """
    Combine soft targets from teacher with hard labels
    """
    # Soft targets
    soft_student = F.log_softmax(student_logits / temperature, dim=1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction='batchmean')
    soft_loss *= temperature ** 2  # Scale by T^2

    # Hard targets
    hard_loss = F.cross_entropy(student_logits, labels)

    # Combined loss
    return alpha * soft_loss + (1 - alpha) * hard_loss

Quantization

Reduce precision for faster inference.

Post-Training Quantization:

import torch.quantization

# Prepare model (eval mode for post-training quantization)
model.eval()
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model, inplace=True)

# Calibrate with representative data
with torch.no_grad():
    for images in calibration_loader:
        model(images)

# Convert to quantized model
torch.quantization.convert(model, inplace=True)

Quantization-Aware Training:

# Insert fake quantization during training
model.train()
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
model_prepared = torch.quantization.prepare_qat(model)

# Train with fake quantization
for epoch in range(num_epochs):
    train(model_prepared)

# Convert to quantized
model_quantized = torch.quantization.convert(model_prepared)

Hyperparameter Tuning

Key Hyperparameters

Parameter            | Range        | Default | Impact
---------------------|--------------|---------|------------------
Learning rate        | 1e-4 to 1e-1 | 0.01    | Critical
Batch size           | 4 to 64      | 16      | Memory/speed
Weight decay         | 1e-5 to 1e-3 | 5e-4    | Regularization
Momentum             | 0.9 to 0.99  | 0.937   | Optimization
Warmup epochs        | 1 to 10      | 3       | Stability
IoU threshold (NMS)  | 0.4 to 0.7   | 0.5     | Recall/precision
Confidence threshold | 0.1 to 0.5   | 0.25    | Detection count
Image size           | 320 to 1280  | 640     | Accuracy/speed

Tuning Strategy

  1. Baseline: Use default hyperparameters
  2. Learning rate: Grid search [1e-3, 5e-3, 1e-2, 5e-2]
  3. Batch size: Maximum that fits in memory
  4. Augmentation: Start minimal, add progressively
  5. Epochs: Train until validation loss plateaus
  6. NMS threshold: Tune on validation set

Automated Hyperparameter Optimization

import optuna

def objective(trial):
    lr = trial.suggest_float('lr', 1e-4, 1e-1, log=True)
    weight_decay = trial.suggest_float('weight_decay', 1e-5, 1e-3, log=True)
    mosaic_prob = trial.suggest_float('mosaic_prob', 0.0, 1.0)

    model = create_model()
    train_model(model, lr=lr, weight_decay=weight_decay, mosaic_prob=mosaic_prob)
    mAP = test_model(model)

    return mAP

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

print(f"Best params: {study.best_params}")
print(f"Best mAP: {study.best_value}")

Detection-Specific Tips

Small Object Detection

  1. Higher resolution: 1280px instead of 640px
  2. SAHI (Slicing): Inference on overlapping tiles
  3. More FPN levels: P2 level (1/4 scale)
  4. Anchor adjustment: Smaller anchors for small objects
  5. Copy-paste augmentation: Increase small object frequency
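
Slicing (tip 2) amounts to running the detector on overlapping crops, shifting each crop's boxes back by its offset, and merging with NMS. A sketch of the tiling step only (tile size and overlap are illustrative defaults; images smaller than the tile would need padding):

```python
def make_tiles(img_w, img_h, tile=640, overlap=0.2):
    """Top-left corners of overlapping tiles covering the image."""
    stride = int(tile * (1 - overlap))
    xs = list(range(0, max(img_w - tile, 0) + 1, stride))
    ys = list(range(0, max(img_h - tile, 0) + 1, stride))
    # make sure the right and bottom edges are covered
    if xs[-1] + tile < img_w:
        xs.append(img_w - tile)
    if ys[-1] + tile < img_h:
        ys.append(img_h - tile)
    return [(x, y) for y in ys for x in xs]
```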

Handling Class Imbalance

  1. Focal loss: gamma=2.0, alpha=0.25
  2. Over-sampling: Repeat rare class images
  3. Class weights: Inverse frequency weighting
  4. Copy-paste: Augment rare classes
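
Over-sampling (tip 2) is often implemented by giving each image a sampling weight driven by its rarest class, e.g. to feed PyTorch's WeightedRandomSampler; a hedged sketch (function name is ours):

```python
import numpy as np

def image_sample_weights(image_class_lists):
    """Inverse-frequency sampling weight per image, from the set of
    classes each image contains."""
    counts = {}
    for classes in image_class_lists:
        for c in set(classes):
            counts[c] = counts.get(c, 0) + 1

    n = len(image_class_lists)
    weights = []
    for classes in image_class_lists:
        if not classes:
            weights.append(1.0)  # background-only images
            continue
        # weight by the rarest class the image contains
        weights.append(max(n / counts[c] for c in set(classes)))
    return np.array(weights)
```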

Improving Localization

  1. CIoU loss: Includes aspect ratio term
  2. Cascade detection: Progressive refinement
  3. Higher IoU threshold: 0.6-0.7 for positive samples
  4. Deformable convolutions: Learn spatial offsets

Reducing False Positives

  1. Higher confidence threshold: 0.4-0.5
  2. More negative samples: Hard negative mining
  3. Background class weight: Increase penalty
  4. Ensemble: Multiple model voting

Resources