# Object Detection Optimization
Comprehensive guide to optimizing object detection models for accuracy and inference speed.

## Table of Contents

- [Non-Maximum Suppression](#non-maximum-suppression)
- [Anchor Design and Optimization](#anchor-design-and-optimization)
- [Loss Functions](#loss-functions)
- [Training Strategies](#training-strategies)
- [Data Augmentation](#data-augmentation)
- [Model Optimization Techniques](#model-optimization-techniques)
- [Hyperparameter Tuning](#hyperparameter-tuning)
- [Detection-Specific Tips](#detection-specific-tips)

---
## Non-Maximum Suppression

NMS removes redundant overlapping detections to produce the final predictions.

### Standard NMS

Basic algorithm:

1. Sort boxes by confidence score
2. Select the highest-confidence box
3. Remove remaining boxes with IoU > threshold
4. Repeat until no boxes remain

```python
import numpy as np

def compute_iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-7)

def nms(boxes, scores, iou_threshold=0.5):
    """
    boxes: (N, 4) in format [x1, y1, x2, y2]
    scores: (N,)
    """
    order = scores.argsort()[::-1]
    keep = []

    while len(order) > 0:
        i = order[0]
        keep.append(i)

        if len(order) == 1:
            break

        # Calculate IoU with remaining boxes
        ious = compute_iou(boxes[i], boxes[order[1:]])

        # Keep boxes with IoU <= threshold
        mask = ious <= iou_threshold
        order = order[1:][mask]

    return keep
```

**Parameters:**
- `iou_threshold`: 0.5-0.7 typical (lower = more suppression)
- `score_threshold`: 0.25-0.5 (filter low-confidence boxes first)
### Soft-NMS

Reduces the scores of overlapping boxes instead of removing them entirely.

**Formula:**
```
score = score * exp(-IoU^2 / sigma)
```

**Benefits:**
- Better for overlapping objects
- Typically +1-2% mAP
- Slightly slower than hard NMS

```python
def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian-penalty soft-NMS; boxes (N, 4), scores (N,)."""
    scores = scores.copy()  # avoid mutating the caller's scores
    order = scores.argsort()[::-1]
    keep = []

    while len(order) > 0:
        i = order[0]
        keep.append(i)

        if len(order) == 1:
            break

        ious = compute_iou(boxes[i], boxes[order[1:]])

        # Gaussian penalty on overlapping boxes
        weights = np.exp(-ious**2 / sigma)
        scores[order[1:]] *= weights

        # Drop boxes whose score fell below the threshold, re-sort the rest
        mask = scores[order[1:]] > score_threshold
        order = order[1:][mask]
        order = order[scores[order].argsort()[::-1]]

    return keep
```
### DIoU-NMS

Uses Distance-IoU instead of standard IoU in the suppression test.

**Formula:**
```
DIoU = IoU - (d^2 / c^2)
```

Where:
- d = distance between box centers
- c = diagonal of the smallest enclosing box

**Benefits:**
- Better for occluded objects
- Penalizes distant boxes less
- Works well with DIoU loss
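Applied to the standard NMS loop, a minimal NumPy sketch (the `diou`/`diou_nms` names and the threshold default are illustrative):

```python
import numpy as np

def diou(box, boxes):
    """DIoU between one box and an (N, 4) array of boxes, all [x1, y1, x2, y2]."""
    # Plain IoU
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = inter / (area + areas - inter + 1e-7)
    # Squared center distance d^2
    d2 = ((box[0] + box[2]) / 2 - (boxes[:, 0] + boxes[:, 2]) / 2) ** 2 + \
         ((box[1] + box[3]) / 2 - (boxes[:, 1] + boxes[:, 3]) / 2) ** 2
    # Squared diagonal c^2 of the smallest enclosing box
    c2 = (np.maximum(box[2], boxes[:, 2]) - np.minimum(box[0], boxes[:, 0])) ** 2 + \
         (np.maximum(box[3], boxes[:, 3]) - np.minimum(box[1], boxes[:, 1])) ** 2
    return iou - d2 / (c2 + 1e-7)

def diou_nms(boxes, scores, threshold=0.5):
    """Standard NMS loop, but suppress on DIoU instead of IoU."""
    order = scores.argsort()[::-1]
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(i)
        order = order[1:][diou(boxes[i], boxes[order[1:]]) <= threshold]
    return keep
```

Because DIoU subtracts the normalized center distance, two boxes with the same overlap but well-separated centers are less likely to suppress each other.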
### Batched NMS

Runs NMS per class, preventing cross-class suppression.

```python
import torchvision

def batched_nms(boxes, scores, classes, iou_threshold):
    """Per-class NMS: boxes (N, 4), scores (N,), classes (N,) integer labels."""
    # Offset boxes by class ID so boxes of different classes never overlap
    max_coordinate = boxes.max()
    offsets = classes * (max_coordinate + 1)
    boxes_for_nms = boxes + offsets[:, None]

    keep = torchvision.ops.nms(boxes_for_nms, scores, iou_threshold)
    return keep
```

(`torchvision.ops.batched_nms` implements the same offset trick directly.)
### NMS-Free Detection (DETR-style)

Transformer-based detectors eliminate NMS entirely.

**How DETR avoids NMS:**
- Object queries are learned embeddings
- Bipartite matching assigns predictions to ground truths during training
- Each query outputs exactly one detection
- The set-based loss enforces uniqueness

**Benefits:**
- End-to-end differentiable
- No hand-crafted post-processing
- Better for complex scenes
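The bipartite matching step can be sketched with SciPy's Hungarian solver. This is a simplified cost (classification probability plus box L1 distance); the function name and the 1.0/5.0 weights are illustrative, not DETR's exact settings:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_queries_to_targets(pred_boxes, pred_probs, gt_boxes, gt_classes):
    """One-to-one matching of N queries to M ground truths (N >= M).

    pred_boxes: (N, 4), pred_probs: (N, num_classes),
    gt_boxes: (M, 4), gt_classes: (M,) integer labels.
    """
    cls_cost = -pred_probs[:, gt_classes]                            # (N, M)
    box_cost = np.abs(pred_boxes[:, None] - gt_boxes[None]).sum(-1)  # (N, M)
    cost = 1.0 * cls_cost + 5.0 * box_cost

    # Hungarian algorithm: minimum-cost one-to-one assignment
    query_idx, gt_idx = linear_sum_assignment(cost)
    return list(zip(query_idx, gt_idx))
```

Unmatched queries are trained against a "no object" class, which is what lets each query emit at most one detection without post-processing.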
---

## Anchor Design and Optimization

### Anchor-Based Detection

Traditional detectors use predefined anchor boxes.

**Anchor parameters:**
- Scales: [32, 64, 128, 256, 512] pixels
- Ratios: [0.5, 1.0, 2.0] (height/width)
- Stride: feature map stride (8, 16, 32)

**Anchor assignment:**
- Positive: IoU > 0.7 with a ground truth
- Negative: IoU < 0.3 with all ground truths
- Ignored: 0.3 < IoU < 0.7
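The assignment rule above can be sketched as a labeling function over a precomputed anchor-to-GT IoU matrix (`assign_anchors` is a hypothetical helper, not a library API):

```python
import numpy as np

def assign_anchors(iou_matrix, pos_thr=0.7, neg_thr=0.3):
    """Label anchors from an (num_anchors, num_gt) IoU matrix.

    Returns 1 for positive, 0 for negative, -1 for ignored anchors.
    """
    max_iou = iou_matrix.max(axis=1)
    labels = np.full(len(iou_matrix), -1)   # default: ignored
    labels[max_iou < neg_thr] = 0           # background
    labels[max_iou > pos_thr] = 1           # foreground
    # Common extra rule: each GT's best anchor is positive regardless of IoU,
    # so no ground truth goes unmatched
    labels[iou_matrix.argmax(axis=0)] = 1
    return labels
```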
### K-Means Anchor Clustering

Optimize anchor sizes for your dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

def optimize_anchors(annotations, num_anchors=9, image_size=640):
    """
    annotations: list of (width, height) for each bounding box
    """
    # Normalize to input size
    boxes = np.array(annotations, dtype=float)
    boxes = boxes / boxes.max() * image_size

    # K-means clustering in (width, height) space
    kmeans = KMeans(n_clusters=num_anchors, random_state=42)
    kmeans.fit(boxes)

    # Cluster centers are the anchor sizes; sort by area
    anchors = kmeans.cluster_centers_
    areas = anchors[:, 0] * anchors[:, 1]
    anchors = anchors[np.argsort(areas)]

    # Report how well the anchors cover the boxes
    mean_iou = calculate_anchor_fit(boxes, anchors)
    print(f"Optimized anchors (mean IoU: {mean_iou:.3f}):")
    print(anchors.astype(int))

    return anchors

def calculate_anchor_fit(boxes, anchors):
    """Mean best-anchor IoU over all boxes (boxes and anchors as (w, h)
    pairs, compared as if co-centered)."""
    ious = []
    for box in boxes:
        box_area = box[0] * box[1]
        anchor_areas = anchors[:, 0] * anchors[:, 1]
        intersections = (np.minimum(box[0], anchors[:, 0]) *
                         np.minimum(box[1], anchors[:, 1]))
        unions = box_area + anchor_areas - intersections
        max_iou = (intersections / unions).max()
        ious.append(max_iou)
    return np.mean(ious)
```
### Anchor-Free Detection

Modern detectors predict boxes without anchors.

**FCOS-style (center-based):**
- Predict (l, t, r, b) distances from the center point
- Centerness score estimates localization quality
- Multi-scale assignment by object size

**YOLOv8-style:**
- Predict (x, y, w, h) directly
- Task-aligned assigner
- Distribution focal loss for regression

**Benefits of anchor-free:**
- No anchor hyperparameters to tune
- Simpler architecture
- Better generalization
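Decoding FCOS-style predictions is simple enough to sketch directly; `decode_ltrb` is an illustrative helper, while `centerness` follows the FCOS definition sqrt(min(l,r)/max(l,r) * min(t,b)/max(t,b)):

```python
import numpy as np

def decode_ltrb(points, ltrb):
    """points: (N, 2) feature-map centers in image coords; ltrb: (N, 4) distances.

    Returns (N, 4) boxes as [x1, y1, x2, y2].
    """
    x, y = points[:, 0], points[:, 1]
    l, t, r, b = ltrb[:, 0], ltrb[:, 1], ltrb[:, 2], ltrb[:, 3]
    return np.stack([x - l, y - t, x + r, y + b], axis=1)

def centerness(ltrb):
    """FCOS centerness target: 1.0 at the object center, decaying outward."""
    l, t, r, b = ltrb[:, 0], ltrb[:, 1], ltrb[:, 2], ltrb[:, 3]
    return np.sqrt((np.minimum(l, r) / np.maximum(l, r)) *
                   (np.minimum(t, b) / np.maximum(t, b)))
```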
### Anchor Assignment Strategies

**ATSS (Adaptive Training Sample Selection):**
1. For each GT, select the k closest anchors per pyramid level
2. Compute IoU for the selected anchors
3. Set the IoU threshold to mean + std of those IoUs
4. Assign positives where IoU > threshold

**TAL (Task-Aligned Assigner, YOLOv8):**
```
score = cls_score^alpha * IoU^beta
```

Where alpha=0.5 and beta=6.0 balance classification quality against localization quality.
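The TAL metric and its top-k candidate selection can be sketched in a few lines (function names are illustrative; real assigners add in-GT-box filtering and tie-breaking):

```python
import numpy as np

def task_alignment_metric(cls_scores, ious, alpha=0.5, beta=6.0):
    """t = s^alpha * IoU^beta over an (num_anchors, num_gt) grid."""
    return (cls_scores ** alpha) * (ious ** beta)

def select_topk_candidates(metric, k=2):
    """Indices of the k anchors with the highest alignment per GT column."""
    return np.argsort(-metric, axis=0)[:k]
```

With beta much larger than alpha, a well-localized anchor beats a confidently classified but poorly localized one.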
---

## Loss Functions

### Classification Losses

#### Cross-Entropy Loss

Standard multi-class classification:
```python
loss = -log(p_correct_class)
```

#### Focal Loss

Handles class imbalance by down-weighting easy examples.

```python
import torch
import torch.nn.functional as F

def focal_loss(pred, target, gamma=2.0, alpha=0.25):
    """
    pred: (N, num_classes) raw logits
    target: (N,) ground truth class indices
    """
    ce_loss = F.cross_entropy(pred, target, reduction='none')
    pt = torch.exp(-ce_loss)  # probability of the correct class

    # Focal term: (1 - pt)^gamma down-weights easy examples
    focal_term = (1 - pt) ** gamma

    # With integer class targets, alpha acts as a scalar weight; the
    # per-class form alpha_t = alpha*y + (1-alpha)*(1-y) applies to
    # binary (sigmoid) targets as in RetinaNet
    loss = alpha * focal_term * ce_loss
    return loss.mean()
```

**Hyperparameters:**
- `gamma`: 2.0 typical; higher = more focus on hard examples
- `alpha`: 0.25 foreground class weight
#### Quality Focal Loss (QFL)

Combines classification with IoU quality.

```python
def quality_focal_loss(pred, target, beta=2.0):
    """
    pred: (N,) predicted probabilities
    target: (N,) IoU values in [0, 1] instead of binary labels
    """
    ce = F.binary_cross_entropy(pred, target, reduction='none')
    focal_weight = torch.abs(pred - target) ** beta
    loss = focal_weight * ce
    return loss.mean()
```
### Regression Losses

#### Smooth L1 Loss

```python
def smooth_l1_loss(pred, target, beta=1.0):
    diff = torch.abs(pred - target)
    loss = torch.where(
        diff < beta,
        0.5 * diff ** 2 / beta,
        diff - 0.5 * beta
    )
    return loss.mean()
```
#### IoU-Based Losses

**IoU Loss:**
```
L_IoU = 1 - IoU
```

**GIoU (Generalized IoU):**
```
GIoU = IoU - (C - U) / C
L_GIoU = 1 - GIoU
```

Where C = area of the smallest enclosing box, U = union area.

**DIoU (Distance IoU):**
```
DIoU = IoU - d^2 / c^2
L_DIoU = 1 - DIoU
```

Where d = center distance, c = diagonal of the enclosing box.

**CIoU (Complete IoU):**
```
CIoU = IoU - d^2 / c^2 - alpha*v
v = (4/pi^2) * (arctan(w_gt/h_gt) - arctan(w/h))^2
alpha = v / (1 - IoU + v)
L_CIoU = 1 - CIoU
```

**Comparison:**

| Loss | Handles | Best For |
|------|---------|----------|
| L1/L2 | Basic regression | Simple tasks |
| IoU | Overlap | Standard detection |
| GIoU | Non-overlapping boxes | Distant boxes |
| DIoU | Center distance | Faster convergence |
| CIoU | Aspect ratio | Best accuracy |
```python
import math
import torch

def ciou_loss(pred_boxes, target_boxes):
    """
    pred_boxes, target_boxes: (N, 4) as [x1, y1, x2, y2]
    """
    # Intersection and union
    inter_x1 = torch.max(pred_boxes[:, 0], target_boxes[:, 0])
    inter_y1 = torch.max(pred_boxes[:, 1], target_boxes[:, 1])
    inter_x2 = torch.min(pred_boxes[:, 2], target_boxes[:, 2])
    inter_y2 = torch.min(pred_boxes[:, 3], target_boxes[:, 3])
    inter = ((inter_x2 - inter_x1).clamp(min=0) *
             (inter_y2 - inter_y1).clamp(min=0))
    pred_area = ((pred_boxes[:, 2] - pred_boxes[:, 0]) *
                 (pred_boxes[:, 3] - pred_boxes[:, 1]))
    target_area = ((target_boxes[:, 2] - target_boxes[:, 0]) *
                   (target_boxes[:, 3] - target_boxes[:, 1]))
    union = pred_area + target_area - inter
    iou = inter / (union + 1e-7)

    # Enclosing box diagonal c^2
    enclose_x1 = torch.min(pred_boxes[:, 0], target_boxes[:, 0])
    enclose_y1 = torch.min(pred_boxes[:, 1], target_boxes[:, 1])
    enclose_x2 = torch.max(pred_boxes[:, 2], target_boxes[:, 2])
    enclose_y2 = torch.max(pred_boxes[:, 3], target_boxes[:, 3])
    c_sq = (enclose_x2 - enclose_x1)**2 + (enclose_y2 - enclose_y1)**2

    # Center distance d^2
    pred_cx = (pred_boxes[:, 0] + pred_boxes[:, 2]) / 2
    pred_cy = (pred_boxes[:, 1] + pred_boxes[:, 3]) / 2
    target_cx = (target_boxes[:, 0] + target_boxes[:, 2]) / 2
    target_cy = (target_boxes[:, 1] + target_boxes[:, 3]) / 2
    d_sq = (pred_cx - target_cx)**2 + (pred_cy - target_cy)**2

    # Aspect ratio consistency term v and its weight alpha
    pred_w = pred_boxes[:, 2] - pred_boxes[:, 0]
    pred_h = pred_boxes[:, 3] - pred_boxes[:, 1]
    target_w = target_boxes[:, 2] - target_boxes[:, 0]
    target_h = target_boxes[:, 3] - target_boxes[:, 1]

    v = (4 / math.pi**2) * (
        torch.atan(target_w / (target_h + 1e-7)) -
        torch.atan(pred_w / (pred_h + 1e-7))
    )**2
    alpha = v / (1 - iou + v + 1e-7)

    ciou = iou - d_sq / (c_sq + 1e-7) - alpha * v
    return 1 - ciou
```
### Distribution Focal Loss (DFL)

Used in YOLOv8 for box regression.

**Concept:**
- Predict a distribution over discrete positions instead of a single value
- Each continuous regression target becomes a soft label over its two neighboring bins
- Allows uncertainty estimation

```python
def dfl_loss(pred_dist, target, reg_max=16):
    """
    pred_dist: (N, reg_max) predicted distribution logits
    target: (N,) continuous target values in [0, reg_max - 1]
    """
    # Split each continuous target across its two neighboring bins
    target_left = target.floor().long()
    target_right = target_left + 1
    weight_right = target - target_left.float()
    weight_left = 1 - weight_right

    # Cross-entropy against both bins, weighted to reproduce the soft label
    loss_left = F.cross_entropy(pred_dist, target_left, reduction='none')
    loss_right = F.cross_entropy(pred_dist, target_right.clamp(max=reg_max - 1),
                                 reduction='none')

    loss = weight_left * loss_left + weight_right * loss_right
    return loss.mean()
```
---

## Training Strategies

### Learning Rate Schedules

**Warmup:**
```python
# Linear warmup for the first N epochs
if epoch < warmup_epochs:
    lr = base_lr * (epoch + 1) / warmup_epochs
```

**Cosine Annealing:**
```python
lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * epoch / total_epochs))
```

**Step Decay:**
```python
# Reduce by a factor at each milestone
lr = base_lr * (0.1 ** milestones_passed)
```
**Recommended schedule for detection:**
```python
import torch
from torch.optim import SGD

optimizer = SGD(model.parameters(), lr=0.01, momentum=0.937, weight_decay=0.0005)

cosine_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer,
    T_max=total_epochs,
    eta_min=0.0001
)

# Linear warmup, then cosine annealing
warmup_scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer,
    start_factor=0.1,
    total_iters=warmup_epochs
)

scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer,
    schedulers=[warmup_scheduler, cosine_scheduler],
    milestones=[warmup_epochs]
)
```
### Exponential Moving Average (EMA)

Smooths model weights for better stability.

```python
class EMA:
    def __init__(self, model, decay=0.9999):
        self.model = model
        self.decay = decay
        self.shadow = {}
        for name, param in model.named_parameters():
            if param.requires_grad:
                self.shadow[name] = param.data.clone()

    def update(self):
        for name, param in self.model.named_parameters():
            if param.requires_grad:
                self.shadow[name] = (
                    self.decay * self.shadow[name] +
                    (1 - self.decay) * param.data
                )

    def apply_shadow(self):
        for name, param in self.model.named_parameters():
            if param.requires_grad:
                param.data.copy_(self.shadow[name])
```

**Usage:**
- Update the EMA after each training step
- Use the EMA weights for validation/inference
- Decay: 0.9999 typical (higher = slower update)
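To see the update rule in action, a tiny standalone numeric sketch (plain Python, no model):

```python
# The EMA update on a single scalar "weight": the shadow value approaches
# the live weight geometrically, at a rate set by `decay`.
decay = 0.9  # small decay so the effect is visible in a few steps; 0.9999 in practice
shadow, param = 0.0, 1.0

for step in range(3):  # stand-ins for three training steps
    shadow = decay * shadow + (1 - decay) * param

# After n steps with a constant param, shadow = 1 - decay**n,
# i.e. the shadow closes the gap to param by a factor of `decay` each step.
```

With decay=0.9999 the shadow averages over roughly the last 1/(1-decay) = 10,000 steps, which is why EMA weights are noticeably smoother than the raw weights.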
### Multi-Scale Training

Train with varying input sizes.

```python
import random
import torch.nn.functional as F

# Random size each batch
sizes = [480, 512, 544, 576, 608, 640, 672, 704, 736, 768]
input_size = random.choice(sizes)

# Resize the batch to the selected size
images = F.interpolate(images, size=input_size, mode='bilinear')
```

**Benefits:**
- Better scale invariance
- Typically +1-2% mAP
- Slower training (variable input sizes reduce batching efficiency)
### Gradient Accumulation

Simulate larger batch sizes on limited memory.

```python
accumulation_steps = 4
optimizer.zero_grad()

for i, (images, targets) in enumerate(dataloader):
    loss = model(images, targets) / accumulation_steps
    loss.backward()

    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```
### Mixed Precision Training

Use FP16 for speed and memory savings.

```python
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

for images, targets in dataloader:
    optimizer.zero_grad()

    with autocast():
        loss = model(images, targets)

    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

**Benefits:**
- 2-3x faster training on tensor-core GPUs
- ~50% memory reduction
- Minimal accuracy loss

---
## Data Augmentation

### Geometric Augmentations

```python
import albumentations as A

geometric = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=15, p=0.3),
    A.RandomScale(scale_limit=0.2, p=0.5),
    A.Affine(translate_percent={'x': (-0.1, 0.1), 'y': (-0.1, 0.1)}, p=0.3),
], bbox_params=A.BboxParams(format='coco', label_fields=['class_labels']))
```

### Color Augmentations

```python
color = A.Compose([
    A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
    A.HueSaturationValue(hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20, p=0.5),
    A.CLAHE(clip_limit=2.0, p=0.1),
    A.GaussianBlur(blur_limit=3, p=0.1),
    A.GaussNoise(var_limit=(10, 50), p=0.1),
])
```
### Mosaic Augmentation

Combines 4 images into one (YOLO-style).

```python
import random
import cv2
import numpy as np

def mosaic_augmentation(images, labels, input_size=640):
    """
    images: list of 4 images
    labels: list of 4 label arrays
    """
    result_image = np.zeros((input_size, input_size, 3), dtype=np.uint8)
    result_labels = []

    # Random center point
    cx = int(random.uniform(input_size * 0.25, input_size * 0.75))
    cy = int(random.uniform(input_size * 0.25, input_size * 0.75))

    positions = [
        (0, 0, cx, cy),                    # top-left
        (cx, 0, input_size, cy),           # top-right
        (0, cy, cx, input_size),           # bottom-left
        (cx, cy, input_size, input_size),  # bottom-right
    ]

    for i, (x1, y1, x2, y2) in enumerate(positions):
        img = images[i]
        h, w = y2 - y1, x2 - x1

        # Resize and place
        img_resized = cv2.resize(img, (w, h))
        result_image[y1:y2, x1:x2] = img_resized

        # Transform labels (transform_bbox: a helper that scales boxes
        # from the original image size to (h, w), then shifts by the
        # tile offset (x1, y1))
        for label in labels[i]:
            new_label = transform_bbox(label, img.shape, (h, w), (x1, y1))
            result_labels.append(new_label)

    return result_image, result_labels
```
### MixUp

Blends two images and their labels.

```python
def mixup(image1, labels1, image2, labels2, alpha=0.5):
    """
    alpha: mixing ratio (0.5 = equal blend)
    labels: lists of (box, cls) pairs
    """
    # Blend images
    mixed_image = (alpha * image1 + (1 - alpha) * image2).astype(np.uint8)

    # Keep all boxes, attaching the blend weight as a soft label
    labels1_weighted = [(box, cls, alpha) for box, cls in labels1]
    labels2_weighted = [(box, cls, 1 - alpha) for box, cls in labels2]

    mixed_labels = labels1_weighted + labels2_weighted
    return mixed_image, mixed_labels
```
### Copy-Paste Augmentation

Paste segmented objects from one image into another.

```python
def copy_paste(background, bg_labels, source, src_labels, src_masks):
    """
    Paste segmented objects onto the background (sketch).
    Assumes each mask in src_masks is a binary mask over a `source` crop
    of the same size, paired with its label.
    """
    result = background.copy()

    for mask, label in zip(src_masks, src_labels):
        # Random position
        x_offset = random.randint(0, background.shape[1] - mask.shape[1])
        y_offset = random.randint(0, background.shape[0] - mask.shape[0])

        # Paste with the mask
        region = result[y_offset:y_offset + mask.shape[0],
                        x_offset:x_offset + mask.shape[1]]
        region[mask > 0] = source[mask > 0]

        # Add the new label, shifted to the paste position
        # (transform_bbox: helper that offsets the box coordinates)
        new_box = transform_bbox(label, x_offset, y_offset)
        bg_labels.append(new_box)

    return result, bg_labels
```
### Cutout / Random Erasing

Randomly erase rectangular patches.

```python
import random

def cutout(image, num_holes=8, max_h_size=32, max_w_size=32):
    h, w = image.shape[:2]
    result = image.copy()

    for _ in range(num_holes):
        y = random.randint(0, h)
        x = random.randint(0, w)
        h_size = random.randint(1, max_h_size)
        w_size = random.randint(1, max_w_size)

        y1, y2 = max(0, y - h_size // 2), min(h, y + h_size // 2)
        x1, x2 = max(0, x - w_size // 2), min(w, x + w_size // 2)

        result[y1:y2, x1:x2] = 0  # or a random color

    return result
```

---
## Model Optimization Techniques

### Pruning

Remove unimportant weights.

**Magnitude Pruning:**
```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Prune 30% of weights with the smallest L1 magnitude
for name, module in model.named_modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name='weight', amount=0.3)
```

**Structured Pruning (channels):**
```python
# Prune entire output channels by L2 norm
prune.ln_structured(module, name='weight', amount=0.3, n=2, dim=0)
```
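`torch.nn.utils.prune` works by reparametrization: the module gains `weight_orig` and `weight_mask`, and `weight` becomes their product; `prune.remove` bakes the mask in permanently. A small standalone check of that lifecycle:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(3, 8, 3)
prune.l1_unstructured(conv, name="weight", amount=0.3)

# While the reparametrization is active, ~30% of entries are exactly zero
sparsity = (conv.weight == 0).float().mean().item()

# Make the pruning permanent: drops weight_orig/weight_mask, keeps the zeros
prune.remove(conv, "weight")
```

Note that magnitude pruning alone does not speed up dense inference; the zeros pay off only with sparse kernels or when structured pruning actually shrinks tensor shapes.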
### Knowledge Distillation

Train a smaller student model to mimic a larger teacher.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.7):
    """
    Combine soft targets from the teacher with hard labels.
    """
    # Soft targets
    soft_student = F.log_softmax(student_logits / temperature, dim=1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction='batchmean')
    soft_loss *= temperature ** 2  # scale gradients back up by T^2

    # Hard targets
    hard_loss = F.cross_entropy(student_logits, labels)

    # Combined loss
    return alpha * soft_loss + (1 - alpha) * hard_loss
```
### Quantization

Reduce precision for faster inference.

**Post-Training Quantization:**
```python
import torch.quantization

# Prepare model
model.eval()
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
torch.quantization.prepare(model, inplace=True)

# Calibrate with representative data
with torch.no_grad():
    for images in calibration_loader:
        model(images)

# Convert to a quantized model
torch.quantization.convert(model, inplace=True)
```

**Quantization-Aware Training:**
```python
# Insert fake quantization during training
model.train()
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
model_prepared = torch.quantization.prepare_qat(model)

# Train with fake quantization
for epoch in range(num_epochs):
    train(model_prepared)

# Convert to quantized
model_prepared.eval()
model_quantized = torch.quantization.convert(model_prepared)
```

---
## Hyperparameter Tuning

### Key Hyperparameters

| Parameter | Range | Default | Impact |
|-----------|-------|---------|--------|
| Learning rate | 1e-4 to 1e-1 | 0.01 | Critical |
| Batch size | 4 to 64 | 16 | Memory/speed |
| Weight decay | 1e-5 to 1e-3 | 5e-4 | Regularization |
| Momentum | 0.9 to 0.99 | 0.937 | Optimization |
| Warmup epochs | 1 to 10 | 3 | Stability |
| IoU threshold (NMS) | 0.4 to 0.7 | 0.5 | Recall/precision |
| Confidence threshold | 0.1 to 0.5 | 0.25 | Detection count |
| Image size | 320 to 1280 | 640 | Accuracy/speed |
### Tuning Strategy

1. **Baseline**: Start from the default hyperparameters
2. **Learning rate**: Grid search [1e-3, 5e-3, 1e-2, 5e-2]
3. **Batch size**: The largest that fits in memory
4. **Augmentation**: Start minimal, add progressively
5. **Epochs**: Train until validation loss plateaus
6. **NMS thresholds**: Tune on the validation set
### Automated Hyperparameter Optimization

```python
import optuna

def objective(trial):
    lr = trial.suggest_float('lr', 1e-4, 1e-1, log=True)
    weight_decay = trial.suggest_float('weight_decay', 1e-5, 1e-3, log=True)
    mosaic_prob = trial.suggest_float('mosaic_prob', 0.0, 1.0)

    model = create_model()
    train_model(model, lr=lr, weight_decay=weight_decay, mosaic_prob=mosaic_prob)
    mAP = test_model(model)

    return mAP

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

print(f"Best params: {study.best_params}")
print(f"Best mAP: {study.best_value}")
```

---
## Detection-Specific Tips

### Small Object Detection

1. **Higher resolution**: 1280px instead of 640px
2. **SAHI (slicing)**: Run inference on overlapping tiles
3. **More FPN levels**: Add the P2 level (1/4 scale)
4. **Anchor adjustment**: Smaller anchors for small objects
5. **Copy-paste augmentation**: Increase small-object frequency
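The slicing idea can be sketched as a tile-coordinate generator (an illustrative helper, not the SAHI API); each tile is run through the detector, and the resulting boxes are shifted back into image coordinates and merged with NMS:

```python
def tile_coords(img_w, img_h, tile=640, overlap=0.2):
    """Top-left/bottom-right coords of overlapping tiles covering the image.

    Assumes the image is at least tile-sized in each dimension.
    """
    stride = int(tile * (1 - overlap))
    xs = list(range(0, max(img_w - tile, 0) + 1, stride))
    ys = list(range(0, max(img_h - tile, 0) + 1, stride))
    # Make sure the right/bottom edges are covered
    if xs[-1] + tile < img_w:
        xs.append(img_w - tile)
    if ys[-1] + tile < img_h:
        ys.append(img_h - tile)
    return [(x, y, x + tile, y + tile) for y in ys for x in xs]
```

Small objects that cover only a few pixels at full resolution occupy many more pixels inside a tile, which is why slicing helps even without retraining.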
### Handling Class Imbalance

1. **Focal loss**: gamma=2.0, alpha=0.25
2. **Over-sampling**: Repeat images containing rare classes
3. **Class weights**: Inverse-frequency weighting
4. **Copy-paste**: Augment rare classes
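Inverse-frequency weighting can be sketched with the common `n_samples / (n_classes * count)` heuristic (the same formula scikit-learn uses for "balanced" class weights); `inverse_frequency_weights` is an illustrative name:

```python
import numpy as np

def inverse_frequency_weights(class_counts):
    """Per-class weights: n_samples / (n_classes * count_c).

    Rare classes get proportionally larger weights.
    """
    counts = np.asarray(class_counts, dtype=float)
    return counts.sum() / (len(counts) * counts)
```

The result can be passed as the `weight` argument of `nn.CrossEntropyLoss` so that misclassifying a rare class costs more than misclassifying a common one.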
### Improving Localization

1. **CIoU loss**: Includes an aspect-ratio term
2. **Cascade detection**: Progressive refinement with increasing IoU thresholds
3. **Higher IoU threshold**: 0.6-0.7 for positive samples
4. **Deformable convolutions**: Learn spatial sampling offsets

### Reducing False Positives

1. **Higher confidence threshold**: 0.4-0.5
2. **More negative samples**: Hard negative mining
3. **Background class weight**: Increase the penalty for background errors
4. **Ensemble**: Voting across multiple models

---
## Resources

- [MMDetection training configs](https://github.com/open-mmlab/mmdetection/tree/main/configs)
- [Ultralytics training tips](https://docs.ultralytics.com/guides/hyperparameter-tuning/)
- [Albumentations detection](https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/)
- [Focal Loss paper](https://arxiv.org/abs/1708.02002)
- [CIoU paper](https://arxiv.org/abs/2005.03572)