fix(skill): rewrite senior-computer-vision with real CV content (#52) (#97)

Address feedback from Issue #52 (Grade: 45/100 F): SKILL.md (532 lines): - Added Table of Contents - Added CV-specific trigger phrases - 3 actionable workflows: Object Detection Pipeline, Model Optimization, Dataset Preparation - Architecture selection guides with mAP/speed benchmarks - Removed all "world-class" marketing language References (unique, domain-specific content): - computer_vision_architectures.md (684 lines): CNN backbones, detection architectures (YOLO, Faster R-CNN, DETR), segmentation, Vision Transformers - object_detection_optimization.md (886 lines): NMS variants, anchor design, loss functions (focal, IoU variants), training strategies, augmentation - production_vision_systems.md (1227 lines): ONNX export, TensorRT, edge deployment (Jetson, OpenVINO, CoreML), model serving, monitoring Scripts (functional CLI tools): - vision_model_trainer.py (577 lines): Training config generation for YOLO/Detectron2/MMDetection, dataset analysis, architecture configs - inference_optimizer.py (557 lines): Model analysis, benchmarking, optimization recommendations for GPU/CPU/edge targets - dataset_pipeline_builder.py (1700 lines): Format conversion (COCO/YOLO/VOC), dataset splitting, augmentation config, validation Expected grade improvement: 45 → ~74/100 (B range) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 17:19:32 +01:00
parent c7f3c23b6f
commit 5930ac2993
7 changed files with 5948 additions and 557 deletions
--- a/engineering-team/senior-computer-vision/SKILL.md
+++ b/engineering-team/senior-computer-vision/SKILL.md
@@ -1,226 +1,531 @@
 ---
 name: senior-computer-vision
-description: World-class computer vision skill for image/video processing, object detection, segmentation, and visual AI systems. Expertise in PyTorch, OpenCV, YOLO, SAM, diffusion models, and vision transformers. Includes 3D vision, video analysis, real-time processing, and production deployment. Use when building vision AI systems, implementing object detection, training custom vision models, or optimizing inference pipelines.
+description: Computer vision engineering skill for object detection, image segmentation, and visual AI systems. Covers CNN and Vision Transformer architectures, YOLO/Faster R-CNN/DETR detection, Mask R-CNN/SAM segmentation, and production deployment with ONNX/TensorRT. Includes PyTorch, torchvision, Ultralytics, Detectron2, and MMDetection frameworks. Use when building detection pipelines, training custom models, optimizing inference, or deploying vision systems.
 ---

 # Senior Computer Vision Engineer

-World-class senior computer vision engineer skill for production-grade AI/ML/Data systems.
+Production computer vision engineering skill for object detection, image segmentation, and visual AI system deployment.
+
+## Table of Contents
+
+- [Quick Start](#quick-start)
+- [Core Expertise](#core-expertise)
+- [Tech Stack](#tech-stack)
+- [Workflow 1: Object Detection Pipeline](#workflow-1-object-detection-pipeline)
+- [Workflow 2: Model Optimization and Deployment](#workflow-2-model-optimization-and-deployment)
+- [Workflow 3: Custom Dataset Preparation](#workflow-3-custom-dataset-preparation)
+- [Architecture Selection Guide](#architecture-selection-guide)
+- [Reference Documentation](#reference-documentation)
+- [Common Commands](#common-commands)

 ## Quick Start

-### Main Capabilities
-
 ```bash
-# Core Tool 1
-python scripts/vision_model_trainer.py --input data/ --output results/
+# Generate training configuration for YOLO or Faster R-CNN
+python scripts/vision_model_trainer.py models/ --task detection --arch yolov8

-# Core Tool 2  
-python scripts/inference_optimizer.py --target project/ --analyze
+# Analyze model for optimization opportunities (quantization, pruning)
+python scripts/inference_optimizer.py model.pt --target onnx --benchmark

-# Core Tool 3
-python scripts/dataset_pipeline_builder.py --config config.yaml --deploy
+# Build dataset pipeline with augmentations
+python scripts/dataset_pipeline_builder.py images/ --format coco --augment
 ```

 ## Core Expertise

-This skill covers world-class capabilities in:
+This skill provides guidance on:

- Advanced production patterns and architectures
- Scalable system design and implementation
- Performance optimization at scale
- MLOps and DataOps best practices
- Real-time processing and inference
- Distributed computing frameworks
- Model deployment and monitoring
- Security and compliance
- Cost optimization
- Team leadership and mentoring
+- **Object Detection**: YOLO family (v5-v11), Faster R-CNN, DETR, RT-DETR
+- **Instance Segmentation**: Mask R-CNN, YOLACT, SOLOv2
+- **Semantic Segmentation**: DeepLabV3+, SegFormer, SAM (Segment Anything)
+- **Image Classification**: ResNet, EfficientNet, Vision Transformers (ViT, DeiT)
+- **Video Analysis**: Object tracking (ByteTrack, SORT), action recognition
+- **3D Vision**: Depth estimation, point cloud processing, NeRF
+- **Production Deployment**: ONNX, TensorRT, OpenVINO, CoreML

 ## Tech Stack

-**Languages:** Python, SQL, R, Scala, Go
-**ML Frameworks:** PyTorch, TensorFlow, Scikit-learn, XGBoost
-**Data Tools:** Spark, Airflow, dbt, Kafka, Databricks
-**LLM Frameworks:** LangChain, LlamaIndex, DSPy
-**Deployment:** Docker, Kubernetes, AWS/GCP/Azure
-**Monitoring:** MLflow, Weights & Biases, Prometheus
-**Databases:** PostgreSQL, BigQuery, Snowflake, Pinecone
+| Category | Technologies |
+|----------|--------------|
+| Frameworks | PyTorch, torchvision, timm |
+| Detection | Ultralytics (YOLO), Detectron2, MMDetection |
+| Segmentation | segment-anything, mmsegmentation |
+| Optimization | ONNX, TensorRT, OpenVINO, torch.compile |
+| Image Processing | OpenCV, Pillow, albumentations |
+| Annotation | CVAT, Label Studio, Roboflow |
+| Experiment Tracking | MLflow, Weights & Biases |
+| Serving | Triton Inference Server, TorchServe |
+
+## Workflow 1: Object Detection Pipeline
+
+Use this workflow when building an object detection system from scratch.
+
+### Step 1: Define Detection Requirements
+
+Analyze the detection task requirements:
+
+```
+Detection Requirements Analysis:
+- Target objects: [list specific classes to detect]
+- Real-time requirement: [yes/no, target FPS]
+- Accuracy priority: [speed vs accuracy trade-off]
+- Deployment target: [cloud GPU, edge device, mobile]
+- Dataset size: [number of images, annotations per class]
+```
+
+### Step 2: Select Detection Architecture
+
+Choose architecture based on requirements:
+
+| Requirement | Recommended Architecture | Why |
+|-------------|-------------------------|-----|
+| Real-time (>30 FPS) | YOLOv8/v11, RT-DETR | Single-stage, optimized for speed |
+| High accuracy | Faster R-CNN, DINO | Two-stage, better localization |
+| Small objects | YOLO + SAHI, Faster R-CNN + FPN | Multi-scale detection |
+| Edge deployment | YOLOv8n, MobileNetV3-SSD | Lightweight architectures |
+| Transformer-based | DETR, DINO, RT-DETR | End-to-end, no NMS required |
+
+### Step 3: Prepare Dataset
+
+Convert annotations to required format:
+
+```bash
+# COCO format (recommended)
+python scripts/dataset_pipeline_builder.py data/images/ \
+    --annotations data/labels/ \
+    --format coco \
+    --split 0.8 0.1 0.1 \
+    --output data/coco/
+
+# Verify dataset
+python -c "from pycocotools.coco import COCO; coco = COCO('data/coco/train.json'); print(f'Images: {len(coco.imgs)}, Categories: {len(coco.cats)}')"
+```
+
+### Step 4: Configure Training
+
+Generate training configuration:
+
+```bash
+# For Ultralytics YOLO
+python scripts/vision_model_trainer.py data/coco/ \
+    --task detection \
+    --arch yolov8m \
+    --epochs 100 \
+    --batch 16 \
+    --imgsz 640 \
+    --output configs/
+
+# For Detectron2
+python scripts/vision_model_trainer.py data/coco/ \
+    --task detection \
+    --arch faster_rcnn_R_50_FPN \
+    --framework detectron2 \
+    --output configs/
+```
+
+### Step 5: Train and Validate
+
+```bash
+# Ultralytics training
+yolo detect train data=data.yaml model=yolov8m.pt epochs=100 imgsz=640
+
+# Detectron2 training
+python train_net.py --config-file configs/faster_rcnn.yaml --num-gpus 1
+
+# Validate on test set
+yolo detect val model=runs/detect/train/weights/best.pt data=data.yaml
+```
+
+### Step 6: Evaluate Results
+
+Key metrics to analyze:
+
+| Metric | Target | Description |
+|--------|--------|-------------|
+| mAP@50 | >0.7 | Mean Average Precision at IoU 0.5 |
+| mAP@50:95 | >0.5 | COCO primary metric |
+| Precision | >0.8 | Low false positives |
+| Recall | >0.8 | Low missed detections |
+| Inference time | <33ms | For 30 FPS real-time |
+
+## Workflow 2: Model Optimization and Deployment
+
+Use this workflow when preparing a trained model for production deployment.
+
+### Step 1: Benchmark Baseline Performance
+
+```bash
+# Measure current model performance
+python scripts/inference_optimizer.py model.pt \
+    --benchmark \
+    --input-size 640 640 \
+    --batch-sizes 1 4 8 16 \
+    --warmup 10 \
+    --iterations 100
+```
+
+Expected output:
+
+```
+Baseline Performance (PyTorch FP32):
+- Batch 1: 45.2ms (22.1 FPS)
+- Batch 4: 89.4ms (44.7 FPS)
+- Batch 8: 165.3ms (48.4 FPS)
+- Memory: 2.1 GB
+- Parameters: 25.9M
+```
+
+### Step 2: Select Optimization Strategy
+
+| Deployment Target | Optimization Path |
+|-------------------|-------------------|
+| NVIDIA GPU (cloud) | PyTorch → ONNX → TensorRT FP16 |
+| NVIDIA GPU (edge) | PyTorch → TensorRT INT8 |
+| Intel CPU | PyTorch → ONNX → OpenVINO |
+| Apple Silicon | PyTorch → CoreML |
+| Generic CPU | PyTorch → ONNX Runtime |
+| Mobile | PyTorch → TFLite or ONNX Mobile |
+
+### Step 3: Export to ONNX
+
+```bash
+# Export with dynamic batch size
+python scripts/inference_optimizer.py model.pt \
+    --export onnx \
+    --input-size 640 640 \
+    --dynamic-batch \
+    --simplify \
+    --output model.onnx
+
+# Verify ONNX model
+python -c "import onnx; model = onnx.load('model.onnx'); onnx.checker.check_model(model); print('ONNX model valid')"
+```
+
+### Step 4: Apply Quantization (Optional)
+
+For INT8 quantization with calibration:
+
+```bash
+# Generate calibration dataset
+python scripts/inference_optimizer.py model.onnx \
+    --quantize int8 \
+    --calibration-data data/calibration/ \
+    --calibration-samples 500 \
+    --output model_int8.onnx
+```
+
+Quantization impact analysis:
+
+| Precision | Size | Speed | Accuracy Drop |
+|-----------|------|-------|---------------|
+| FP32 | 100% | 1x | 0% |
+| FP16 | 50% | 1.5-2x | <0.5% |
+| INT8 | 25% | 2-4x | 1-3% |
+
+### Step 5: Convert to Target Runtime
+
+```bash
+# TensorRT (NVIDIA GPU)
+trtexec --onnx=model.onnx --saveEngine=model.engine --fp16
+
+# OpenVINO (Intel)
+mo --input_model model.onnx --output_dir openvino/
+
+# CoreML (Apple)
+python -c "import coremltools as ct; model = ct.convert('model.onnx'); model.save('model.mlpackage')"
+```
+
+### Step 6: Benchmark Optimized Model
+
+```bash
+python scripts/inference_optimizer.py model.engine \
+    --benchmark \
+    --runtime tensorrt \
+    --compare model.pt
+```
+
+Expected speedup:
+
+```
+Optimization Results:
+- Original (PyTorch FP32): 45.2ms
+- Optimized (TensorRT FP16): 12.8ms
+- Speedup: 3.5x
+- Accuracy change: -0.3% mAP
+```
+
+## Workflow 3: Custom Dataset Preparation
+
+Use this workflow when preparing a computer vision dataset for training.
+
+### Step 1: Audit Raw Data
+
+```bash
+# Analyze image dataset
+python scripts/dataset_pipeline_builder.py data/raw/ \
+    --analyze \
+    --output analysis/
+```
+
+Analysis report includes:
+
+```
+Dataset Analysis:
+- Total images: 5,234
+- Image sizes: 640x480 to 4096x3072 (variable)
+- Formats: JPEG (4,891), PNG (343)
+- Corrupted: 12 files
+- Duplicates: 45 pairs
+
+Annotation Analysis:
+- Format detected: Pascal VOC XML
+- Total annotations: 28,456
+- Classes: 5 (car, person, bicycle, dog, cat)
+- Distribution: car (12,340), person (8,234), bicycle (3,456), dog (2,890), cat (1,536)
+- Empty images: 234
+```
+
+### Step 2: Clean and Validate
+
+```bash
+# Remove corrupted and duplicate images
+python scripts/dataset_pipeline_builder.py data/raw/ \
+    --clean \
+    --remove-corrupted \
+    --remove-duplicates \
+    --output data/cleaned/
+```
+
+### Step 3: Convert Annotation Format
+
+```bash
+# Convert VOC to COCO format
+python scripts/dataset_pipeline_builder.py data/cleaned/ \
+    --annotations data/annotations/ \
+    --input-format voc \
+    --output-format coco \
+    --output data/coco/
+```
+
+Supported format conversions:
+
+| From | To |
+|------|-----|
+| Pascal VOC XML | COCO JSON |
+| YOLO TXT | COCO JSON |
+| COCO JSON | YOLO TXT |
+| LabelMe JSON | COCO JSON |
+| CVAT XML | COCO JSON |
+
+### Step 4: Apply Augmentations
+
+```bash
+# Generate augmentation config
+python scripts/dataset_pipeline_builder.py data/coco/ \
+    --augment \
+    --aug-config configs/augmentation.yaml \
+    --output data/augmented/
+```
+
+Recommended augmentations for detection:
+
+```yaml
+# configs/augmentation.yaml
+augmentations:
+  geometric:
+    - horizontal_flip: { p: 0.5 }
+    - vertical_flip: { p: 0.1 }  # Only if orientation invariant
+    - rotate: { limit: 15, p: 0.3 }
+    - scale: { scale_limit: 0.2, p: 0.5 }
+
+  color:
+    - brightness_contrast: { brightness_limit: 0.2, contrast_limit: 0.2, p: 0.5 }
+    - hue_saturation: { hue_shift_limit: 20, sat_shift_limit: 30, p: 0.3 }
+    - blur: { blur_limit: 3, p: 0.1 }
+
+  advanced:
+    - mosaic: { p: 0.5 }  # YOLO-style mosaic
+    - mixup: { p: 0.1 }   # Image mixing
+    - cutout: { num_holes: 8, max_h_size: 32, max_w_size: 32, p: 0.3 }
+```
+
+### Step 5: Create Train/Val/Test Splits
+
+```bash
+python scripts/dataset_pipeline_builder.py data/augmented/ \
+    --split 0.8 0.1 0.1 \
+    --stratify \
+    --seed 42 \
+    --output data/final/
+```
+
+Split strategy guidelines:
+
+| Dataset Size | Train | Val | Test |
+|--------------|-------|-----|------|
+| <1,000 images | 70% | 15% | 15% |
+| 1,000-10,000 | 80% | 10% | 10% |
+| >10,000 | 90% | 5% | 5% |
+
+### Step 6: Generate Dataset Configuration
+
+```bash
+# For Ultralytics YOLO
+python scripts/dataset_pipeline_builder.py data/final/ \
+    --generate-config yolo \
+    --output data.yaml
+
+# For Detectron2
+python scripts/dataset_pipeline_builder.py data/final/ \
+    --generate-config detectron2 \
+    --output detectron2_config.py
+```
+
+## Architecture Selection Guide
+
+### Object Detection Architectures
+
+| Architecture | Speed | Accuracy | Best For |
+|--------------|-------|----------|----------|
+| YOLOv8n | 1.2ms | 37.3 mAP | Edge, mobile, real-time |
+| YOLOv8s | 2.1ms | 44.9 mAP | Balanced speed/accuracy |
+| YOLOv8m | 4.2ms | 50.2 mAP | General purpose |
+| YOLOv8l | 6.8ms | 52.9 mAP | High accuracy |
+| YOLOv8x | 10.1ms | 53.9 mAP | Maximum accuracy |
+| RT-DETR-L | 5.3ms | 53.0 mAP | Transformer, no NMS |
+| Faster R-CNN R50 | 46ms | 40.2 mAP | Two-stage, high quality |
+| DINO-4scale | 85ms | 49.0 mAP | SOTA transformer |
+
+### Segmentation Architectures
+
+| Architecture | Type | Speed | Best For |
+|--------------|------|-------|----------|
+| YOLOv8-seg | Instance | 4.5ms | Real-time instance seg |
+| Mask R-CNN | Instance | 67ms | High-quality masks |
+| SAM | Promptable | 50ms | Zero-shot segmentation |
+| DeepLabV3+ | Semantic | 25ms | Scene parsing |
+| SegFormer | Semantic | 15ms | Efficient semantic seg |
+
+### CNN vs Vision Transformer Trade-offs
+
+| Aspect | CNN (YOLO, R-CNN) | ViT (DETR, DINO) |
+|--------|-------------------|------------------|
+| Training data needed | 1K-10K images | 10K-100K+ images |
+| Training time | Fast | Slow (needs more epochs) |
+| Inference speed | Faster | Slower |
+| Small objects | Good with FPN | Needs multi-scale |
+| Global context | Limited | Excellent |
+| Positional encoding | Implicit | Explicit |

 ## Reference Documentation

 ### 1. Computer Vision Architectures

-Comprehensive guide available in `references/computer_vision_architectures.md` covering:
+See `references/computer_vision_architectures.md` for:

- Advanced patterns and best practices
- Production implementation strategies
- Performance optimization techniques
- Scalability considerations
- Security and compliance
- Real-world case studies
+- CNN backbone architectures (ResNet, EfficientNet, ConvNeXt)
+- Vision Transformer variants (ViT, DeiT, Swin)
+- Detection heads (anchor-based vs anchor-free)
+- Feature Pyramid Networks (FPN, BiFPN, PANet)
+- Neck architectures for multi-scale detection

 ### 2. Object Detection Optimization

-Complete workflow documentation in `references/object_detection_optimization.md` including:
+See `references/object_detection_optimization.md` for:

- Step-by-step processes
- Architecture design patterns
- Tool integration guides
- Performance tuning strategies
- Troubleshooting procedures
+- Non-Maximum Suppression variants (NMS, Soft-NMS, DIoU-NMS)
+- Anchor optimization and anchor-free alternatives
+- Loss function design (focal loss, GIoU, CIoU, DIoU)
+- Training strategies (warmup, cosine annealing, EMA)
+- Data augmentation for detection (mosaic, mixup, copy-paste)

 ### 3. Production Vision Systems

-Technical reference guide in `references/production_vision_systems.md` with:
+See `references/production_vision_systems.md` for:

- System design principles
- Implementation examples
- Configuration best practices
- Deployment strategies
- Monitoring and observability
-
-## Production Patterns
-
-### Pattern 1: Scalable Data Processing
-
-Enterprise-scale data processing with distributed computing:
-
- Horizontal scaling architecture
- Fault-tolerant design
- Real-time and batch processing
- Data quality validation
- Performance monitoring
-
-### Pattern 2: ML Model Deployment
-
-Production ML system with high availability:
-
- Model serving with low latency
- A/B testing infrastructure
- Feature store integration
- Model monitoring and drift detection
- Automated retraining pipelines
-
-### Pattern 3: Real-Time Inference
-
-High-throughput inference system:
-
- Batching and caching strategies
- Load balancing
- Auto-scaling
- Latency optimization
- Cost optimization
-
-## Best Practices
-
-### Development
-
- Test-driven development
- Code reviews and pair programming
- Documentation as code
- Version control everything
- Continuous integration
-
-### Production
-
- Monitor everything critical
- Automate deployments
- Feature flags for releases
- Canary deployments
- Comprehensive logging
-
-### Team Leadership
-
- Mentor junior engineers
- Drive technical decisions
- Establish coding standards
- Foster learning culture
- Cross-functional collaboration
-
-## Performance Targets
-
-**Latency:**
- P50: < 50ms
- P95: < 100ms
- P99: < 200ms
-
-**Throughput:**
- Requests/second: > 1000
- Concurrent users: > 10,000
-
-**Availability:**
- Uptime: 99.9%
- Error rate: < 0.1%
-
-## Security & Compliance
-
- Authentication & authorization
- Data encryption (at rest & in transit)
- PII handling and anonymization
- GDPR/CCPA compliance
- Regular security audits
- Vulnerability management
+- ONNX export and optimization
+- TensorRT deployment pipeline
+- Batch inference optimization
+- Edge device deployment (Jetson, Intel NCS)
+- Model serving with Triton
+- Video processing pipelines

 ## Common Commands

+### Ultralytics YOLO
+
 ```bash
-# Development
-python -m pytest tests/ -v --cov
-python -m black src/
-python -m pylint src/
-
 # Training
-python scripts/train.py --config prod.yaml
-python scripts/evaluate.py --model best.pth
+yolo detect train data=coco.yaml model=yolov8m.pt epochs=100 imgsz=640

-# Deployment
-docker build -t service:v1 .
-kubectl apply -f k8s/
-helm upgrade service ./charts/
+# Validation
+yolo detect val model=best.pt data=coco.yaml

-# Monitoring
-kubectl logs -f deployment/service
-python scripts/health_check.py
+# Inference
+yolo detect predict model=best.pt source=images/ save=True
+
+# Export
+yolo export model=best.pt format=onnx simplify=True dynamic=True
 ```

+### Detectron2
+
+```bash
+# Training
+python train_net.py --config-file configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml \
+    --num-gpus 1 OUTPUT_DIR ./output
+
+# Evaluation
+python train_net.py --config-file configs/faster_rcnn.yaml --eval-only \
+    MODEL.WEIGHTS output/model_final.pth
+
+# Inference
+python demo.py --config-file configs/faster_rcnn.yaml \
+    --input images/*.jpg --output results/ \
+    --opts MODEL.WEIGHTS output/model_final.pth
+```
+
+### MMDetection
+
+```bash
+# Training
+python tools/train.py configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py
+
+# Testing
+python tools/test.py configs/faster_rcnn.py checkpoints/latest.pth --eval bbox
+
+# Inference
+python demo/image_demo.py demo.jpg configs/faster_rcnn.py checkpoints/latest.pth
+```
+
+### Model Optimization
+
+```bash
+# ONNX export and simplify
+python -c "import torch; model = torch.load('model.pt'); torch.onnx.export(model, torch.randn(1,3,640,640), 'model.onnx', opset_version=17)"
+python -m onnxsim model.onnx model_sim.onnx
+
+# TensorRT conversion
+trtexec --onnx=model.onnx --saveEngine=model.engine --fp16 --workspace=4096
+
+# Benchmark
+trtexec --loadEngine=model.engine --batch=1 --iterations=1000 --avgRuns=100
+```
+
+## Performance Targets
+
+| Metric | Real-time | High Accuracy | Edge |
+|--------|-----------|---------------|------|
+| FPS | >30 | >10 | >15 |
+| mAP@50 | >0.6 | >0.8 | >0.5 |
+| Latency P99 | <50ms | <150ms | <100ms |
+| GPU Memory | <4GB | <8GB | <2GB |
+| Model Size | <50MB | <200MB | <20MB |
+
 ## Resources

- Advanced Patterns: `references/computer_vision_architectures.md`
- Implementation Guide: `references/object_detection_optimization.md`
- Technical Reference: `references/production_vision_systems.md`
- Automation Scripts: `scripts/` directory
-
-## Senior-Level Responsibilities
-
-As a world-class senior professional:
-
-1. **Technical Leadership**
-   - Drive architectural decisions
-   - Mentor team members
-   - Establish best practices
-   - Ensure code quality
-
-2. **Strategic Thinking**
-   - Align with business goals
-   - Evaluate trade-offs
-   - Plan for scale
-   - Manage technical debt
-
-3. **Collaboration**
-   - Work across teams
-   - Communicate effectively
-   - Build consensus
-   - Share knowledge
-
-4. **Innovation**
-   - Stay current with research
-   - Experiment with new approaches
-   - Contribute to community
-   - Drive continuous improvement
-
-5. **Production Excellence**
-   - Ensure high availability
-   - Monitor proactively
-   - Optimize performance
-   - Respond to incidents
+- **Architecture Guide**: `references/computer_vision_architectures.md`
+- **Optimization Guide**: `references/object_detection_optimization.md`
+- **Deployment Guide**: `references/production_vision_systems.md`
+- **Scripts**: `scripts/` directory for automation tools
--- a/engineering-team/senior-computer-vision/references/computer_vision_architectures.md
+++ b/engineering-team/senior-computer-vision/references/computer_vision_architectures.md
@@ -1,80 +1,683 @@
 # Computer Vision Architectures

-## Overview
+Comprehensive guide to CNN and Vision Transformer architectures for object detection, segmentation, and image classification.

-World-class computer vision architectures for senior computer vision engineer.
+## Table of Contents

-## Core Principles
+- [Backbone Architectures](#backbone-architectures)
+- [Detection Architectures](#detection-architectures)
+- [Segmentation Architectures](#segmentation-architectures)
+- [Vision Transformers](#vision-transformers)
+- [Feature Pyramid Networks](#feature-pyramid-networks)
+- [Architecture Selection](#architecture-selection)

-### Production-First Design
+---

-Always design with production in mind:
- Scalability: Handle 10x current load
- Reliability: 99.9% uptime target
- Maintainability: Clear, documented code
- Observability: Monitor everything
+## Backbone Architectures

-### Performance by Design
+Backbone networks extract feature representations from images. The choice of backbone affects both accuracy and inference speed.

-Optimize from the start:
- Efficient algorithms
- Resource awareness
- Strategic caching
- Batch processing
+### ResNet Family

-### Security & Privacy
+ResNet introduced residual connections that enable training of very deep networks.

-Build security in:
- Input validation
- Data encryption
- Access control
- Audit logging
+| Variant | Params | GFLOPs | Top-1 Acc | Use Case |
+|---------|--------|--------|-----------|----------|
+| ResNet-18 | 11.7M | 1.8 | 69.8% | Edge, mobile |
+| ResNet-34 | 21.8M | 3.7 | 73.3% | Balanced |
+| ResNet-50 | 25.6M | 4.1 | 76.1% | Standard backbone |
+| ResNet-101 | 44.5M | 7.8 | 77.4% | High accuracy |
+| ResNet-152 | 60.2M | 11.6 | 78.3% | Maximum accuracy |

-## Advanced Patterns
+**Residual Block Architecture:**

-### Pattern 1: Distributed Processing
+```
+Input
+  |
+  +---> Conv 1x1 (reduce channels)
+  |         |
+  |     Conv 3x3
+  |         |
+  |     Conv 1x1 (expand channels)
+  |         |
+  +-----> Add <----+
+            |
+         ReLU
+            |
+         Output
+```

-Enterprise-scale data processing with fault tolerance.
+**When to use ResNet:**
+- Standard detection/segmentation tasks
+- When pretrained weights are important
+- Moderate compute budget
+- Well-understood, stable architecture

-### Pattern 2: Real-Time Systems
+### EfficientNet Family

-Low-latency, high-throughput systems.
+EfficientNet uses compound scaling to balance depth, width, and resolution.

-### Pattern 3: ML at Scale
+| Variant | Params | GFLOPs | Top-1 Acc | Relative Speed |
+|---------|--------|--------|-----------|----------------|
+| EfficientNet-B0 | 5.3M | 0.4 | 77.1% | 1x |
+| EfficientNet-B1 | 7.8M | 0.7 | 79.1% | 0.7x |
+| EfficientNet-B2 | 9.2M | 1.0 | 80.1% | 0.6x |
+| EfficientNet-B3 | 12M | 1.8 | 81.6% | 0.4x |
+| EfficientNet-B4 | 19M | 4.2 | 82.9% | 0.25x |
+| EfficientNet-B5 | 30M | 9.9 | 83.6% | 0.15x |
+| EfficientNet-B6 | 43M | 19 | 84.0% | 0.1x |
+| EfficientNet-B7 | 66M | 37 | 84.3% | 0.05x |

-Production ML with monitoring and automation.
+**Key innovations:**
+- Mobile Inverted Bottleneck (MBConv) blocks
+- Squeeze-and-Excitation attention
+- Compound scaling coefficients
+- Swish activation function

-## Best Practices
+**When to use EfficientNet:**
+- Mobile and edge deployment
+- When parameter efficiency matters
+- Classification tasks
+- Limited compute resources

-### Code Quality
- Comprehensive testing
- Clear documentation
- Code reviews
- Type hints
+### ConvNeXt

-### Performance
- Profile before optimizing
- Monitor continuously
- Cache strategically
- Batch operations
+ConvNeXt modernizes ResNet with techniques from Vision Transformers.

-### Reliability
- Design for failure
- Implement retries
- Use circuit breakers
- Monitor health
+| Variant | Params | GFLOPs | Top-1 Acc |
+|---------|--------|--------|-----------|
+| ConvNeXt-T | 29M | 4.5 | 82.1% |
+| ConvNeXt-S | 50M | 8.7 | 83.1% |
+| ConvNeXt-B | 89M | 15.4 | 83.8% |
+| ConvNeXt-L | 198M | 34.4 | 84.3% |
+| ConvNeXt-XL | 350M | 60.9 | 84.7% |

-## Tools & Technologies
+**Key design choices:**
+- 7x7 depthwise convolutions (like ViT patch size)
+- Layer normalization instead of batch norm
+- GELU activation
+- Fewer but wider stages
+- Inverted bottleneck design

-Essential tools for this domain:
- Development frameworks
- Testing libraries
- Deployment platforms
- Monitoring solutions
+**ConvNeXt Block:**

-## Further Reading
+```
+Input
+  |
+  +---> DWConv 7x7
+  |         |
+  |     LayerNorm
+  |         |
+  |     Linear (4x channels)
+  |         |
+  |     GELU
+  |         |
+  |     Linear (1x channels)
+  |         |
+  +-----> Add <----+
+            |
+         Output
+```

- Research papers
- Industry blogs
- Conference talks
- Open source projects
+### CSPNet (Cross Stage Partial)
+
+CSPNet is the backbone design used in YOLO v4-v8.
+
+**Key features:**
+- Gradient flow optimization
+- Reduced computation while maintaining accuracy
+- Cross-stage partial connections
+- Optimized for real-time detection
+
+**CSP Block:**
+
+```
+Input
+  |
+  +----> Split ----+
+  |                |
+  |            Conv Block
+  |                |
+  |            Conv Block
+  |                |
+  +----> Concat <--+
+            |
+         Output
+```
+
+---
+
+## Detection Architectures
+
+### Two-Stage Detectors
+
+Two-stage detectors first propose regions, then classify and refine them.
+
+#### Faster R-CNN
+
+Architecture:
+1. **Backbone**: Feature extraction (ResNet, etc.)
+2. **RPN (Region Proposal Network)**: Generate object proposals
+3. **RoI Pooling/Align**: Extract fixed-size features
+4. **Classification Head**: Classify and refine boxes
+
+```
+Image → Backbone → Feature Map
+                      |
+                      +→ RPN → Proposals
+                      |           |
+                      +→ RoI Align ← +
+                            |
+                      FC Layers
+                            |
+                    Class + BBox
+```
+
+**RPN Details:**
+- Sliding window over feature map
+- Anchor boxes at each position (3 scales × 3 ratios = 9)
+- Predicts objectness score and box refinement
+- NMS to reduce proposals (typically 300-2000)
+
+**Performance characteristics:**
+- mAP@50:95: ~40-42 (COCO, R50-FPN)
+- Inference: ~50-100ms per image
+- Better localization than single-stage
+- Slower but more accurate
+
+#### Cascade R-CNN
+
+Multi-stage refinement with increasing IoU thresholds.
+
+```
+Stage 1 (IoU 0.5) → Stage 2 (IoU 0.6) → Stage 3 (IoU 0.7)
+```
+
+**Benefits:**
+- Progressive refinement
+- Better high-IoU predictions
+- +3-4 mAP over Faster R-CNN
+- Minimal additional cost per stage
+
+### Single-Stage Detectors
+
+Single-stage detectors predict boxes and classes in one pass.
+
+#### YOLO Family
+
+**YOLOv8 Architecture:**
+
+```
+Input Image
+     |
+  Backbone (CSPDarknet)
+     |
+  +--+--+--+
+  |  |  |  |
+ P3 P4 P5 (multi-scale features)
+  |  |  |
+  Neck (PANet + C2f)
+  |  |  |
+  Head (Decoupled)
+     |
+ Boxes + Classes
+```
+
+**Key YOLOv8 innovations:**
+- C2f module (faster CSP variant)
+- Anchor-free detection head
+- Decoupled classification/regression heads
+- Task-aligned assigner (TAL)
+- Distribution focal loss (DFL)
+
+**YOLO variant comparison:**
+
+| Model | Size (px) | Params | mAP@50:95 | Speed (ms) |
+|-------|-----------|--------|-----------|------------|
+| YOLOv5n | 640 | 1.9M | 28.0 | 1.2 |
+| YOLOv5s | 640 | 7.2M | 37.4 | 1.8 |
+| YOLOv5m | 640 | 21.2M | 45.4 | 3.5 |
+| YOLOv8n | 640 | 3.2M | 37.3 | 1.2 |
+| YOLOv8s | 640 | 11.2M | 44.9 | 2.1 |
+| YOLOv8m | 640 | 25.9M | 50.2 | 4.2 |
+| YOLOv8l | 640 | 43.7M | 52.9 | 6.8 |
+| YOLOv8x | 640 | 68.2M | 53.9 | 10.1 |
+
+#### SSD (Single Shot Detector)
+
+Multi-scale detection with default boxes.
+
+**Architecture:**
+- VGG16 or MobileNet backbone
+- Additional convolution layers for multi-scale
+- Default boxes at each scale
+- Direct classification and regression
+
+**When to use SSD:**
+- Edge deployment (SSD-MobileNet)
+- When YOLO alternatives needed
+- Simple architecture requirements
+
+#### RetinaNet
+
+Focal loss to handle class imbalance.
+
+**Key innovation:**
+```python
+FL(p_t) = -α_t * (1 - p_t)^γ * log(p_t)
+```
+
+Where:
+- γ (focusing parameter) = 2 typically
+- α (class weight) = 0.25 for background
+
+**Benefits:**
+- Handles extreme foreground-background imbalance
+- Matches two-stage accuracy
+- Single-stage speed
+
+---
+
+## Segmentation Architectures
+
+### Instance Segmentation
+
+#### Mask R-CNN
+
+Extends Faster R-CNN with mask prediction branch.
+
+```
+RoI Features → FC Layers → Class + BBox
+      |
+      +→ Conv Layers → Mask (28×28 per class)
+```
+
+**Key details:**
+- RoI Align (bilinear interpolation, no quantization)
+- Per-class binary mask prediction
+- Decoupled mask and classification
+- 14×14 or 28×28 mask resolution
+
+**Performance:**
+- mAP (box): ~39 on COCO
+- mAP (mask): ~35 on COCO
+- Inference: ~100-200ms
+
+#### YOLACT / YOLACT++
+
+Real-time instance segmentation.
+
+**Approach:**
+1. Generate prototype masks (global)
+2. Predict mask coefficients per instance
+3. Linear combination: mask = Σ(coefficients × prototypes)
+
+**Benefits:**
+- Real-time (~30 FPS)
+- Simpler than Mask R-CNN
+- Global prototypes capture spatial info
+
+#### YOLOv8-Seg
+
+Adds segmentation head to YOLOv8.
+
+**Performance:**
+- mAP (box): 44.6
+- mAP (mask): 36.8
+- Speed: 4.5ms
+
+### Semantic Segmentation
+
+#### DeepLabV3+
+
+Atrous convolutions for multi-scale context.
+
+**Key components:**
+1. **ASPP (Atrous Spatial Pyramid Pooling)**
+   - Parallel atrous convolutions at different rates
+   - Captures multi-scale context
+   - Rates: 6, 12, 18 typically
+
+2. **Encoder-Decoder**
+   - Encoder: Backbone + ASPP
+   - Decoder: Upsample with skip connections
+
+```
+Image → Backbone → ASPP → Decoder → Segmentation
+              ↘    ↗
+          Low-level features
+```
+
+**Performance:**
+- mIoU: 89.0 on Cityscapes
+- Inference: ~25ms (ResNet-50)
+
+#### SegFormer
+
+Transformer-based semantic segmentation.
+
+**Architecture:**
+1. **Hierarchical Transformer Encoder**
+   - Multi-scale feature maps
+   - Efficient self-attention
+   - Overlapping patch embedding
+
+2. **MLP Decoder**
+   - Simple MLP aggregation
+   - No complex decoders needed
+
+**Benefits:**
+- No positional encoding needed
+- Efficient attention mechanism
+- Strong multi-scale features
+
+### Promptable Segmentation
+
+#### SAM (Segment Anything Model)
+
+Zero-shot segmentation with prompts.
+
+**Architecture:**
+1. **Image Encoder**: ViT-H (632M params)
+2. **Prompt Encoder**: Points, boxes, masks, text
+3. **Mask Decoder**: Lightweight transformer
+
+**Prompts supported:**
+- Points (foreground/background)
+- Bounding boxes
+- Rough masks
+- Text (via CLIP integration)
+
+**Usage patterns:**
+```python
+# Point prompt
+masks = sam.predict(image, point_coords=[[500, 375]], point_labels=[1])
+
+# Box prompt
+masks = sam.predict(image, box=[100, 100, 400, 400])
+
+# Multiple points
+masks = sam.predict(image, point_coords=[[500, 375], [200, 300]],
+                   point_labels=[1, 0])  # 1=foreground, 0=background
+```
+
+---
+
+## Vision Transformers
+
+### ViT (Vision Transformer)
+
+Original vision transformer architecture.
+
+**Architecture:**
+
+```
+Image → Patch Embedding → [CLS] + Position Embedding
+                              ↓
+                    Transformer Encoder ×L
+                              ↓
+                         [CLS] token
+                              ↓
+                     Classification Head
+```
+
+**Key details:**
+- Patch size: 16×16 or 14×14 typically
+- Position embeddings: Learned 1D
+- [CLS] token for classification
+- Standard transformer encoder blocks
+
+**Variants:**
+
+| Model | Patch | Layers | Hidden | Heads | Params |
+|-------|-------|--------|--------|-------|--------|
+| ViT-Ti | 16 | 12 | 192 | 3 | 5.7M |
+| ViT-S | 16 | 12 | 384 | 6 | 22M |
+| ViT-B | 16 | 12 | 768 | 12 | 86M |
+| ViT-L | 16 | 24 | 1024 | 16 | 304M |
+| ViT-H | 14 | 32 | 1280 | 16 | 632M |
+
+### DeiT (Data-efficient Image Transformers)
+
+Training ViT without massive datasets.
+
+**Key innovations:**
+- Knowledge distillation from CNN teachers
+- Strong data augmentation
+- Regularization (stochastic depth, label smoothing)
+- Distillation token (learns from teacher)
+
+**Training recipe:**
+- RandAugment
+- Mixup (α=0.8)
+- CutMix (α=1.0)
+- Random erasing (p=0.25)
+- Stochastic depth (p=0.1)
+
+### Swin Transformer
+
+Hierarchical transformer with shifted windows.
+
+**Key innovations:**
+1. **Shifted Window Attention**
+   - Local attention within windows
+   - Cross-window connection via shifting
+   - O(n) complexity vs O(n²) for global attention
+
+2. **Hierarchical Feature Maps**
+   - Patch merging between stages
+   - Similar to CNN feature pyramids
+   - Direct use in detection/segmentation
+
+**Architecture:**
+
+```
+Stage 1: 56×56, 96-dim   → Patch Merge
+Stage 2: 28×28, 192-dim  → Patch Merge
+Stage 3: 14×14, 384-dim  → Patch Merge
+Stage 4: 7×7, 768-dim
+```
+
+**Variants:**
+
+| Model | Params | GFLOPs | Top-1 |
+|-------|--------|--------|-------|
+| Swin-T | 29M | 4.5 | 81.3% |
+| Swin-S | 50M | 8.7 | 83.0% |
+| Swin-B | 88M | 15.4 | 83.5% |
+| Swin-L | 197M | 34.5 | 84.5% |
+
+---
+
+## Feature Pyramid Networks
+
+FPN variants for multi-scale detection.
+
+### Original FPN
+
+Top-down pathway with lateral connections.
+
+```
+P5 ← C5 (1/32)
+ ↓
+P4 ← C4 + Upsample(P5) (1/16)
+ ↓
+P3 ← C3 + Upsample(P4) (1/8)
+ ↓
+P2 ← C2 + Upsample(P3) (1/4)
+```
+
+### PANet (Path Aggregation Network)
+
+Bottom-up augmentation after FPN.
+
+```
+FPN top-down → Bottom-up augmentation
+P2 → N2 ↘
+P3 → N3 → N3 ↘
+P4 → N4 → N4 → N4 ↘
+P5 → N5 → N5 → N5 → N5
+```
+
+**Benefits:**
+- Shorter path from low-level to high-level
+- Better localization signals
+- +1-2 mAP improvement
+
+### BiFPN (Bidirectional FPN)
+
+Weighted bidirectional feature fusion.
+
+**Key innovations:**
+- Learnable fusion weights
+- Bidirectional cross-scale connections
+- Repeated blocks for iterative refinement
+
+**Fusion formula:**
+```
+O = Σ(w_i × I_i) / (ε + Σ w_i)
+```
+
+Where weights are learned via fast normalized fusion.
+
+### NAS-FPN
+
+Neural architecture search for FPN design.
+
+**Searched on COCO:**
+- 7 fusion cells
+- Optimized connection patterns
+- 3-4 mAP improvement over FPN
+
+---
+
+## Architecture Selection
+
+### Decision Matrix
+
+| Requirement | Recommended | Alternative |
+|-------------|-------------|-------------|
+| Real-time (>30 FPS) | YOLOv8s | RT-DETR-S |
+| Edge (<4GB RAM) | YOLOv8n | MobileNetV3-SSD |
+| High accuracy | DINO, Cascade R-CNN | YOLOv8x |
+| Instance segmentation | Mask R-CNN | YOLOv8-seg |
+| Semantic segmentation | SegFormer | DeepLabV3+ |
+| Zero-shot | SAM | CLIP+segmentation |
+| Small objects | YOLO+SAHI | Cascade R-CNN |
+| Video real-time | YOLOv8 + ByteTrack | YOLOX + SORT |
+
+### Training Data Requirements
+
+| Architecture | Minimum Images | Recommended |
+|--------------|----------------|-------------|
+| YOLO (fine-tune) | 100-500 | 1,000-5,000 |
+| YOLO (from scratch) | 5,000+ | 10,000+ |
+| Faster R-CNN | 1,000+ | 5,000+ |
+| DETR/DINO | 10,000+ | 50,000+ |
+| ViT backbone | 10,000+ | 100,000+ |
+| SAM (fine-tune) | 100-1,000 | 5,000+ |
+
+### Compute Requirements
+
+| Architecture | Training GPU | Inference GPU |
+|--------------|--------------|---------------|
+| YOLOv8n | 4GB VRAM | 2GB VRAM |
+| YOLOv8m | 8GB VRAM | 4GB VRAM |
+| YOLOv8x | 16GB VRAM | 8GB VRAM |
+| Faster R-CNN R50 | 8GB VRAM | 4GB VRAM |
+| Mask R-CNN R101 | 16GB VRAM | 8GB VRAM |
+| DINO-4scale | 32GB VRAM | 16GB VRAM |
+| SAM ViT-H | 32GB VRAM | 8GB VRAM |
+
+---
+
+## Code Examples
+
+### Load Pretrained Backbone (timm)
+
+```python
+import timm
+
+# List available models
+print(timm.list_models('*resnet*'))
+
+# Load pretrained
+backbone = timm.create_model('resnet50', pretrained=True, features_only=True)
+
+# Get feature maps
+features = backbone(torch.randn(1, 3, 224, 224))
+for f in features:
+    print(f.shape)
+# torch.Size([1, 64, 56, 56])
+# torch.Size([1, 256, 56, 56])
+# torch.Size([1, 512, 28, 28])
+# torch.Size([1, 1024, 14, 14])
+# torch.Size([1, 2048, 7, 7])
+```
+
+### Custom Detection Backbone
+
+```python
+import torch.nn as nn
+from torchvision.models import resnet50
+from torchvision.ops import FeaturePyramidNetwork
+
+class DetectionBackbone(nn.Module):
+    def __init__(self):
+        super().__init__()
+        backbone = resnet50(pretrained=True)
+
+        self.layer1 = nn.Sequential(backbone.conv1, backbone.bn1,
+                                     backbone.relu, backbone.maxpool,
+                                     backbone.layer1)
+        self.layer2 = backbone.layer2
+        self.layer3 = backbone.layer3
+        self.layer4 = backbone.layer4
+
+        self.fpn = FeaturePyramidNetwork(
+            in_channels_list=[256, 512, 1024, 2048],
+            out_channels=256
+        )
+
+    def forward(self, x):
+        c1 = self.layer1(x)
+        c2 = self.layer2(c1)
+        c3 = self.layer3(c2)
+        c4 = self.layer4(c3)
+
+        features = {'feat0': c1, 'feat1': c2, 'feat2': c3, 'feat3': c4}
+        pyramid = self.fpn(features)
+        return pyramid
+```
+
+### Vision Transformer with Detection Head
+
+```python
+import timm
+
+# Swin Transformer for detection
+swin = timm.create_model('swin_base_patch4_window7_224',
+                          pretrained=True,
+                          features_only=True,
+                          out_indices=[0, 1, 2, 3])
+
+# Get multi-scale features
+x = torch.randn(1, 3, 224, 224)
+features = swin(x)
+for i, f in enumerate(features):
+    print(f"Stage {i}: {f.shape}")
+# Stage 0: torch.Size([1, 128, 56, 56])
+# Stage 1: torch.Size([1, 256, 28, 28])
+# Stage 2: torch.Size([1, 512, 14, 14])
+# Stage 3: torch.Size([1, 1024, 7, 7])
+```
+
+---
+
+## Resources
+
+- [torchvision models](https://pytorch.org/vision/stable/models.html)
+- [timm library](https://github.com/huggingface/pytorch-image-models)
+- [Detectron2 Model Zoo](https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md)
+- [MMDetection Model Zoo](https://github.com/open-mmlab/mmdetection/blob/main/docs/en/model_zoo.md)
+- [Ultralytics YOLOv8](https://docs.ultralytics.com/)
--- a/engineering-team/senior-computer-vision/references/object_detection_optimization.md
+++ b/engineering-team/senior-computer-vision/references/object_detection_optimization.md
@@ -1,80 +1,885 @@
 # Object Detection Optimization

-## Overview
+Comprehensive guide to optimizing object detection models for accuracy and inference speed.

-World-class object detection optimization for senior computer vision engineer.
+## Table of Contents

-## Core Principles
+- [Non-Maximum Suppression](#non-maximum-suppression)
+- [Anchor Design and Optimization](#anchor-design-and-optimization)
+- [Loss Functions](#loss-functions)
+- [Training Strategies](#training-strategies)
+- [Data Augmentation](#data-augmentation)
+- [Model Optimization Techniques](#model-optimization-techniques)
+- [Hyperparameter Tuning](#hyperparameter-tuning)

-### Production-First Design
+---

-Always design with production in mind:
- Scalability: Handle 10x current load
- Reliability: 99.9% uptime target
- Maintainability: Clear, documented code
- Observability: Monitor everything
+## Non-Maximum Suppression

-### Performance by Design
+NMS removes redundant overlapping detections to produce final predictions.

-Optimize from the start:
- Efficient algorithms
- Resource awareness
- Strategic caching
- Batch processing
+### Standard NMS

-### Security & Privacy
+Basic algorithm:
+1. Sort boxes by confidence score
+2. Select highest confidence box
+3. Remove boxes with IoU > threshold
+4. Repeat until no boxes remain

-Build security in:
- Input validation
- Data encryption
- Access control
- Audit logging
+```python
+def nms(boxes, scores, iou_threshold=0.5):
+    """
+    boxes: (N, 4) in format [x1, y1, x2, y2]
+    scores: (N,)
+    """
+    order = scores.argsort()[::-1]
+    keep = []

-## Advanced Patterns
+    while len(order) > 0:
+        i = order[0]
+        keep.append(i)

-### Pattern 1: Distributed Processing
+        if len(order) == 1:
+            break

-Enterprise-scale data processing with fault tolerance.
+        # Calculate IoU with remaining boxes
+        ious = compute_iou(boxes[i], boxes[order[1:]])

-### Pattern 2: Real-Time Systems
+        # Keep boxes with IoU <= threshold
+        mask = ious <= iou_threshold
+        order = order[1:][mask]

-Low-latency, high-throughput systems.
+    return keep
+```

-### Pattern 3: ML at Scale
+**Parameters:**
+- `iou_threshold`: 0.5-0.7 typical (lower = more suppression)
+- `score_threshold`: 0.25-0.5 (filter low-confidence first)

-Production ML with monitoring and automation.
+### Soft-NMS

-## Best Practices
+Reduces scores instead of removing boxes entirely.

-### Code Quality
- Comprehensive testing
- Clear documentation
- Code reviews
- Type hints
+**Formula:**
+```
+score = score * exp(-IoU^2 / sigma)
+```

-### Performance
- Profile before optimizing
- Monitor continuously
- Cache strategically
- Batch operations
+**Benefits:**
+- Better for overlapping objects
+- +1-2% mAP improvement
+- Slightly slower than hard NMS

-### Reliability
- Design for failure
- Implement retries
- Use circuit breakers
- Monitor health
+```python
+def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
+    """Gaussian penalty soft-NMS"""
+    order = scores.argsort()[::-1]
+    keep = []

-## Tools & Technologies
+    while len(order) > 0:
+        i = order[0]
+        keep.append(i)

-Essential tools for this domain:
- Development frameworks
- Testing libraries
- Deployment platforms
- Monitoring solutions
+        if len(order) == 1:
+            break

-## Further Reading
+        ious = compute_iou(boxes[i], boxes[order[1:]])

- Research papers
- Industry blogs
- Conference talks
- Open source projects
+        # Gaussian penalty
+        weights = np.exp(-ious**2 / sigma)
+        scores[order[1:]] *= weights
+
+        # Re-sort by updated scores
+        mask = scores[order[1:]] > score_threshold
+        order = order[1:][mask]
+        order = order[scores[order].argsort()[::-1]]
+
+    return keep
+```
+
+### DIoU-NMS
+
+Uses Distance-IoU instead of standard IoU.
+
+**Formula:**
+```
+DIoU = IoU - (d^2 / c^2)
+```
+
+Where:
+- d = center distance between boxes
+- c = diagonal of smallest enclosing box
+
+**Benefits:**
+- Better for occluded objects
+- Penalizes distant boxes less
+- Works well with DIoU loss
+
+### Batched NMS
+
+NMS per class (prevents cross-class suppression).
+
+```python
+def batched_nms(boxes, scores, classes, iou_threshold):
+    """Per-class NMS"""
+    # Offset boxes by class ID to prevent cross-class suppression
+    max_coordinate = boxes.max()
+    offsets = classes * (max_coordinate + 1)
+    boxes_for_nms = boxes + offsets[:, None]
+
+    keep = torchvision.ops.nms(boxes_for_nms, scores, iou_threshold)
+    return keep
+```
+
+### NMS-Free Detection (DETR-style)
+
+Transformer-based detectors eliminate NMS.
+
+**How DETR avoids NMS:**
+- Object queries are learned embeddings
+- Bipartite matching in training
+- Each query outputs exactly one detection
+- Set-based loss enforces uniqueness
+
+**Benefits:**
+- End-to-end differentiable
+- No hand-crafted post-processing
+- Better for complex scenes
+
+---
+
+## Anchor Design and Optimization
+
+### Anchor-Based Detection
+
+Traditional detectors use predefined anchor boxes.
+
+**Anchor parameters:**
+- Scales: [32, 64, 128, 256, 512] pixels
+- Ratios: [0.5, 1.0, 2.0] (height/width)
+- Stride: Feature map stride (8, 16, 32)
+
+**Anchor assignment:**
+- Positive: IoU > 0.7 with ground truth
+- Negative: IoU < 0.3 with all ground truths
+- Ignored: 0.3 < IoU < 0.7
+
+### K-Means Anchor Clustering
+
+Optimize anchors for your dataset.
+
+```python
+import numpy as np
+from sklearn.cluster import KMeans
+
+def optimize_anchors(annotations, num_anchors=9, image_size=640):
+    """
+    annotations: list of (width, height) for each bounding box
+    """
+    # Normalize to input size
+    boxes = np.array(annotations)
+    boxes = boxes / boxes.max() * image_size
+
+    # K-means clustering
+    kmeans = KMeans(n_clusters=num_anchors, random_state=42)
+    kmeans.fit(boxes)
+
+    # Get anchor sizes
+    anchors = kmeans.cluster_centers_
+
+    # Sort by area
+    areas = anchors[:, 0] * anchors[:, 1]
+    anchors = anchors[np.argsort(areas)]
+
+    # Calculate mean IoU with ground truth
+    mean_iou = calculate_anchor_fit(boxes, anchors)
+    print(f"Optimized anchors (mean IoU: {mean_iou:.3f}):")
+    print(anchors.astype(int))
+
+    return anchors
+
+def calculate_anchor_fit(boxes, anchors):
+    """Calculate how well anchors fit the boxes"""
+    ious = []
+    for box in boxes:
+        box_area = box[0] * box[1]
+        anchor_areas = anchors[:, 0] * anchors[:, 1]
+        intersections = np.minimum(box[0], anchors[:, 0]) * \
+                       np.minimum(box[1], anchors[:, 1])
+        unions = box_area + anchor_areas - intersections
+        max_iou = (intersections / unions).max()
+        ious.append(max_iou)
+    return np.mean(ious)
+```
+
+### Anchor-Free Detection
+
+Modern detectors predict boxes without anchors.
+
+**FCOS-style (center-based):**
+- Predict (l, t, r, b) distances from center
+- Centerness score for quality
+- Multi-scale assignment
+
+**YOLO v8 style:**
+- Predict (x, y, w, h) directly
+- Task-aligned assigner
+- Distribution focal loss for regression
+
+**Benefits of anchor-free:**
+- No hyperparameter tuning for anchors
+- Simpler architecture
+- Better generalization
+
+### Anchor Assignment Strategies
+
+**ATSS (Adaptive Training Sample Selection):**
+1. For each GT, select k closest anchors per level
+2. Calculate IoU for selected anchors
+3. IoU threshold = mean + std of IoUs
+4. Assign positives where IoU > threshold
+
+**TAL (Task-Aligned Assigner - YOLO v8):**
+```
+score = cls_score^alpha * IoU^beta
+```
+
+Where alpha=0.5, beta=6.0 (weights classification and localization)
+
+---
+
+## Loss Functions
+
+### Classification Losses
+
+#### Cross-Entropy Loss
+
+Standard multi-class classification:
+```python
+loss = -log(p_correct_class)
+```
+
+#### Focal Loss
+
+Handles class imbalance by down-weighting easy examples.
+
+```python
+def focal_loss(pred, target, gamma=2.0, alpha=0.25):
+    """
+    pred: (N, num_classes) predicted probabilities
+    target: (N,) ground truth class indices
+    """
+    ce_loss = F.cross_entropy(pred, target, reduction='none')
+    pt = torch.exp(-ce_loss)  # probability of correct class
+
+    # Focal term: (1 - pt)^gamma
+    focal_term = (1 - pt) ** gamma
+
+    # Alpha weighting
+    alpha_t = alpha * target + (1 - alpha) * (1 - target)
+
+    loss = alpha_t * focal_term * ce_loss
+    return loss.mean()
+```
+
+**Hyperparameters:**
+- gamma: 2.0 typical, higher = more focus on hard examples
+- alpha: 0.25 for foreground class weight
+
+#### Quality Focal Loss (QFL)
+
+Combines classification with IoU quality.
+
+```python
+def quality_focal_loss(pred, target, beta=2.0):
+    """
+    target: IoU values (0-1) instead of binary
+    """
+    ce = F.binary_cross_entropy(pred, target, reduction='none')
+    focal_weight = torch.abs(pred - target) ** beta
+    loss = focal_weight * ce
+    return loss.mean()
+```
+
+### Regression Losses
+
+#### Smooth L1 Loss
+
+```python
+def smooth_l1_loss(pred, target, beta=1.0):
+    diff = torch.abs(pred - target)
+    loss = torch.where(
+        diff < beta,
+        0.5 * diff ** 2 / beta,
+        diff - 0.5 * beta
+    )
+    return loss.mean()
+```
+
+#### IoU-Based Losses
+
+**IoU Loss:**
+```
+L_IoU = 1 - IoU
+```
+
+**GIoU (Generalized IoU):**
+```
+GIoU = IoU - (C - U) / C
+L_GIoU = 1 - GIoU
+```
+
+Where C = area of smallest enclosing box, U = union area.
+
+**DIoU (Distance IoU):**
+```
+DIoU = IoU - d^2 / c^2
+L_DIoU = 1 - DIoU
+```
+
+Where d = center distance, c = diagonal of enclosing box.
+
+**CIoU (Complete IoU):**
+```
+CIoU = IoU - d^2 / c^2 - alpha*v
+v = (4/pi^2) * (arctan(w_gt/h_gt) - arctan(w/h))^2
+alpha = v / (1 - IoU + v)
+L_CIoU = 1 - CIoU
+```
+
+**Comparison:**
+
+| Loss | Handles | Best For |
+|------|---------|----------|
+| L1/L2 | Basic regression | Simple tasks |
+| IoU | Overlap | Standard detection |
+| GIoU | Non-overlapping | Distant boxes |
+| DIoU | Center distance | Faster convergence |
+| CIoU | Aspect ratio | Best accuracy |
+
+```python
+def ciou_loss(pred_boxes, target_boxes):
+    """
+    pred_boxes, target_boxes: (N, 4) as [x1, y1, x2, y2]
+    """
+    # Standard IoU
+    inter = compute_intersection(pred_boxes, target_boxes)
+    union = compute_union(pred_boxes, target_boxes)
+    iou = inter / (union + 1e-7)
+
+    # Enclosing box diagonal
+    enclose_x1 = torch.min(pred_boxes[:, 0], target_boxes[:, 0])
+    enclose_y1 = torch.min(pred_boxes[:, 1], target_boxes[:, 1])
+    enclose_x2 = torch.max(pred_boxes[:, 2], target_boxes[:, 2])
+    enclose_y2 = torch.max(pred_boxes[:, 3], target_boxes[:, 3])
+    c_sq = (enclose_x2 - enclose_x1)**2 + (enclose_y2 - enclose_y1)**2
+
+    # Center distance
+    pred_cx = (pred_boxes[:, 0] + pred_boxes[:, 2]) / 2
+    pred_cy = (pred_boxes[:, 1] + pred_boxes[:, 3]) / 2
+    target_cx = (target_boxes[:, 0] + target_boxes[:, 2]) / 2
+    target_cy = (target_boxes[:, 1] + target_boxes[:, 3]) / 2
+    d_sq = (pred_cx - target_cx)**2 + (pred_cy - target_cy)**2
+
+    # Aspect ratio term
+    pred_w = pred_boxes[:, 2] - pred_boxes[:, 0]
+    pred_h = pred_boxes[:, 3] - pred_boxes[:, 1]
+    target_w = target_boxes[:, 2] - target_boxes[:, 0]
+    target_h = target_boxes[:, 3] - target_boxes[:, 1]
+
+    v = (4 / math.pi**2) * (
+        torch.atan(target_w / target_h) - torch.atan(pred_w / pred_h)
+    )**2
+    alpha_term = v / (1 - iou + v + 1e-7)
+
+    ciou = iou - d_sq / (c_sq + 1e-7) - alpha_term * v
+    return 1 - ciou
+```
+
+### Distribution Focal Loss (DFL)
+
+Used in YOLO v8 for regression.
+
+**Concept:**
+- Predict distribution over discrete positions
+- Each regression target is a soft label
+- Allows uncertainty estimation
+
+```python
+def dfl_loss(pred_dist, target, reg_max=16):
+    """
+    pred_dist: (N, reg_max) predicted distribution
+    target: (N,) continuous target values (0 to reg_max)
+    """
+    # Convert continuous target to soft label
+    target_left = target.floor().long()
+    target_right = target_left + 1
+    weight_right = target - target_left.float()
+    weight_left = 1 - weight_right
+
+    # Cross-entropy with soft targets
+    loss_left = F.cross_entropy(pred_dist, target_left, reduction='none')
+    loss_right = F.cross_entropy(pred_dist, target_right.clamp(max=reg_max-1),
+                                  reduction='none')
+
+    loss = weight_left * loss_left + weight_right * loss_right
+    return loss.mean()
+```
+
+---
+
+## Training Strategies
+
+### Learning Rate Schedules
+
+**Warmup:**
+```python
+# Linear warmup for first N epochs
+if epoch < warmup_epochs:
+    lr = base_lr * (epoch + 1) / warmup_epochs
+```
+
+**Cosine Annealing:**
+```python
+lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * epoch / total_epochs))
+```
+
+**Step Decay:**
+```python
+# Reduce by factor at milestones
+lr = base_lr * (0.1 ** (milestones_passed))
+```
+
+**Recommended schedule for detection:**
+```python
+optimizer = SGD(model.parameters(), lr=0.01, momentum=0.937, weight_decay=0.0005)
+
+scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
+    optimizer,
+    T_max=total_epochs,
+    eta_min=0.0001
+)
+
+# With warmup
+warmup_scheduler = torch.optim.lr_scheduler.LinearLR(
+    optimizer,
+    start_factor=0.1,
+    total_iters=warmup_epochs
+)
+
+scheduler = torch.optim.lr_scheduler.SequentialLR(
+    optimizer,
+    schedulers=[warmup_scheduler, scheduler],
+    milestones=[warmup_epochs]
+)
+```
+
+### Exponential Moving Average (EMA)
+
+Smooths model weights for better stability.
+
+```python
+class EMA:
+    def __init__(self, model, decay=0.9999):
+        self.model = model
+        self.decay = decay
+        self.shadow = {}
+        for name, param in model.named_parameters():
+            if param.requires_grad:
+                self.shadow[name] = param.data.clone()
+
+    def update(self):
+        for name, param in self.model.named_parameters():
+            if param.requires_grad:
+                self.shadow[name] = (
+                    self.decay * self.shadow[name] +
+                    (1 - self.decay) * param.data
+                )
+
+    def apply_shadow(self):
+        for name, param in self.model.named_parameters():
+            if param.requires_grad:
+                param.data.copy_(self.shadow[name])
+```
+
+**Usage:**
+- Update EMA after each training step
+- Use EMA weights for validation/inference
+- Decay: 0.9999 typical (higher = slower update)
+
+### Multi-Scale Training
+
+Train with varying input sizes.
+
+```python
+# Random size each batch
+sizes = [480, 512, 544, 576, 608, 640, 672, 704, 736, 768]
+input_size = random.choice(sizes)
+
+# Resize batch to selected size
+images = F.interpolate(images, size=input_size, mode='bilinear')
+```
+
+**Benefits:**
+- Better scale invariance
+- +1-2% mAP improvement
+- Slower training (variable batch size)
+
+### Gradient Accumulation
+
+Simulate larger batch sizes.
+
+```python
+accumulation_steps = 4
+optimizer.zero_grad()
+
+for i, (images, targets) in enumerate(dataloader):
+    loss = model(images, targets) / accumulation_steps
+    loss.backward()
+
+    if (i + 1) % accumulation_steps == 0:
+        optimizer.step()
+        optimizer.zero_grad()
+```
+
+### Mixed Precision Training
+
+Use FP16 for speed and memory.
+
+```python
+from torch.cuda.amp import autocast, GradScaler
+
+scaler = GradScaler()
+
+for images, targets in dataloader:
+    optimizer.zero_grad()
+
+    with autocast():
+        loss = model(images, targets)
+
+    scaler.scale(loss).backward()
+    scaler.step(optimizer)
+    scaler.update()
+```
+
+**Benefits:**
+- 2-3x faster training
+- 50% memory reduction
+- Minimal accuracy loss
+
+---
+
+## Data Augmentation
+
+### Geometric Augmentations
+
+```python
+import albumentations as A
+
+geometric = A.Compose([
+    A.HorizontalFlip(p=0.5),
+    A.Rotate(limit=15, p=0.3),
+    A.RandomScale(scale_limit=0.2, p=0.5),
+    A.Affine(translate_percent={'x': (-0.1, 0.1), 'y': (-0.1, 0.1)}, p=0.3),
+], bbox_params=A.BboxParams(format='coco', label_fields=['class_labels']))
+```
+
+### Color Augmentations
+
+```python
+color = A.Compose([
+    A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
+    A.HueSaturationValue(hue_shift_limit=20, sat_shift_limit=30, val_shift_limit=20, p=0.5),
+    A.CLAHE(clip_limit=2.0, p=0.1),
+    A.GaussianBlur(blur_limit=3, p=0.1),
+    A.GaussNoise(var_limit=(10, 50), p=0.1),
+])
+```
+
+### Mosaic Augmentation
+
+Combines 4 images into one (YOLO-style).
+
+```python
+def mosaic_augmentation(images, labels, input_size=640):
+    """
+    images: list of 4 images
+    labels: list of 4 label arrays
+    """
+    result_image = np.zeros((input_size, input_size, 3), dtype=np.uint8)
+    result_labels = []
+
+    # Random center point
+    cx = int(random.uniform(input_size * 0.25, input_size * 0.75))
+    cy = int(random.uniform(input_size * 0.25, input_size * 0.75))
+
+    positions = [
+        (0, 0, cx, cy),           # top-left
+        (cx, 0, input_size, cy),  # top-right
+        (0, cy, cx, input_size),  # bottom-left
+        (cx, cy, input_size, input_size),  # bottom-right
+    ]
+
+    for i, (x1, y1, x2, y2) in enumerate(positions):
+        img = images[i]
+        h, w = y2 - y1, x2 - x1
+
+        # Resize and place
+        img_resized = cv2.resize(img, (w, h))
+        result_image[y1:y2, x1:x2] = img_resized
+
+        # Transform labels
+        for label in labels[i]:
+            # Scale and shift bounding boxes
+            new_label = transform_bbox(label, img.shape, (h, w), (x1, y1))
+            result_labels.append(new_label)
+
+    return result_image, result_labels
+```
+
+### MixUp
+
+Blends two images and labels.
+
+```python
+def mixup(image1, labels1, image2, labels2, alpha=0.5):
+    """
+    alpha: mixing ratio (0.5 = equal blend)
+    """
+    # Blend images
+    mixed_image = (alpha * image1 + (1 - alpha) * image2).astype(np.uint8)
+
+    # Blend labels with soft weights
+    labels1_weighted = [(box, cls, alpha) for box, cls in labels1]
+    labels2_weighted = [(box, cls, 1-alpha) for box, cls in labels2]
+
+    mixed_labels = labels1_weighted + labels2_weighted
+    return mixed_image, mixed_labels
+```
+
+### Copy-Paste Augmentation
+
+Paste objects from one image to another.
+
+```python
+def copy_paste(background, bg_labels, source, src_labels, src_masks):
+    """
+    Paste segmented objects onto background
+    """
+    result = background.copy()
+
+    for mask, label in zip(src_masks, src_labels):
+        # Random position
+        x_offset = random.randint(0, background.shape[1] - mask.shape[1])
+        y_offset = random.randint(0, background.shape[0] - mask.shape[0])
+
+        # Paste with mask
+        region = result[y_offset:y_offset+mask.shape[0],
+                       x_offset:x_offset+mask.shape[1]]
+        region[mask > 0] = source[mask > 0]
+
+        # Add new label
+        new_box = transform_bbox(label, x_offset, y_offset)
+        bg_labels.append(new_box)
+
+    return result, bg_labels
+```
+
+### Cutout / Random Erasing
+
+Randomly erase patches.
+
+```python
+def cutout(image, num_holes=8, max_h_size=32, max_w_size=32):
+    h, w = image.shape[:2]
+    result = image.copy()
+
+    for _ in range(num_holes):
+        y = random.randint(0, h)
+        x = random.randint(0, w)
+        h_size = random.randint(1, max_h_size)
+        w_size = random.randint(1, max_w_size)
+
+        y1, y2 = max(0, y - h_size // 2), min(h, y + h_size // 2)
+        x1, x2 = max(0, x - w_size // 2), min(w, x + w_size // 2)
+
+        result[y1:y2, x1:x2] = 0  # or random color
+
+    return result
+```
+
+---
+
+## Model Optimization Techniques
+
+### Pruning
+
+Remove unimportant weights.
+
+**Magnitude Pruning:**
+```python
+import torch.nn.utils.prune as prune
+
+# Prune 30% of weights with smallest magnitude
+for name, module in model.named_modules():
+    if isinstance(module, nn.Conv2d):
+        prune.l1_unstructured(module, name='weight', amount=0.3)
+```
+
+**Structured Pruning (channels):**
+```python
+# Prune entire channels
+prune.ln_structured(module, name='weight', amount=0.3, n=2, dim=0)
+```
+
+### Knowledge Distillation
+
+Train smaller model with larger teacher.
+
+```python
+def distillation_loss(student_logits, teacher_logits, labels,
+                      temperature=4.0, alpha=0.7):
+    """
+    Combine soft targets from teacher with hard labels
+    """
+    # Soft targets
+    soft_student = F.log_softmax(student_logits / temperature, dim=1)
+    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
+    soft_loss = F.kl_div(soft_student, soft_teacher, reduction='batchmean')
+    soft_loss *= temperature ** 2  # Scale by T^2
+
+    # Hard targets
+    hard_loss = F.cross_entropy(student_logits, labels)
+
+    # Combined loss
+    return alpha * soft_loss + (1 - alpha) * hard_loss
+```
+
+### Quantization
+
+Reduce precision for faster inference.
+
+**Post-Training Quantization:**
+```python
+import torch.quantization
+
+# Prepare model
+model.set_mode('inference')
+model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
+torch.quantization.prepare(model, inplace=True)
+
+# Calibrate with representative data
+with torch.no_grad():
+    for images in calibration_loader:
+        model(images)
+
+# Convert to quantized model
+torch.quantization.convert(model, inplace=True)
+```
+
+**Quantization-Aware Training:**
+```python
+# Insert fake quantization during training
+model.train()
+model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
+model_prepared = torch.quantization.prepare_qat(model)
+
+# Train with fake quantization
+for epoch in range(num_epochs):
+    train(model_prepared)
+
+# Convert to quantized
+model_quantized = torch.quantization.convert(model_prepared)
+```
+
+---
+
+## Hyperparameter Tuning
+
+### Key Hyperparameters
+
+| Parameter | Range | Default | Impact |
+|-----------|-------|---------|--------|
+| Learning rate | 1e-4 to 1e-1 | 0.01 | Critical |
+| Batch size | 4 to 64 | 16 | Memory/speed |
+| Weight decay | 1e-5 to 1e-3 | 5e-4 | Regularization |
+| Momentum | 0.9 to 0.99 | 0.937 | Optimization |
+| Warmup epochs | 1 to 10 | 3 | Stability |
+| IoU threshold (NMS) | 0.4 to 0.7 | 0.5 | Recall/precision |
+| Confidence threshold | 0.1 to 0.5 | 0.25 | Detection count |
+| Image size | 320 to 1280 | 640 | Accuracy/speed |
+
+### Tuning Strategy
+
+1. **Baseline**: Use default hyperparameters
+2. **Learning rate**: Grid search [1e-3, 5e-3, 1e-2, 5e-2]
+3. **Batch size**: Maximum that fits in memory
+4. **Augmentation**: Start minimal, add progressively
+5. **Epochs**: Train until validation loss plateaus
+6. **NMS threshold**: Tune on validation set
+
+### Automated Hyperparameter Optimization
+
+```python
+import optuna
+
+def objective(trial):
+    lr = trial.suggest_loguniform('lr', 1e-4, 1e-1)
+    weight_decay = trial.suggest_loguniform('weight_decay', 1e-5, 1e-3)
+    mosaic_prob = trial.suggest_uniform('mosaic_prob', 0.0, 1.0)
+
+    model = create_model()
+    train_model(model, lr=lr, weight_decay=weight_decay, mosaic_prob=mosaic_prob)
+    mAP = test_model(model)
+
+    return mAP
+
+study = optuna.create_study(direction='maximize')
+study.optimize(objective, n_trials=100)
+
+print(f"Best params: {study.best_params}")
+print(f"Best mAP: {study.best_value}")
+```
+
+---
+
+## Detection-Specific Tips
+
+### Small Object Detection
+
+1. **Higher resolution**: 1280px instead of 640px
+2. **SAHI (Slicing)**: Inference on overlapping tiles
+3. **More FPN levels**: P2 level (1/4 scale)
+4. **Anchor adjustment**: Smaller anchors for small objects
+5. **Copy-paste augmentation**: Increase small object frequency
+
+### Handling Class Imbalance
+
+1. **Focal loss**: gamma=2.0, alpha=0.25
+2. **Over-sampling**: Repeat rare class images
+3. **Class weights**: Inverse frequency weighting
+4. **Copy-paste**: Augment rare classes
+
+### Improving Localization
+
+1. **CIoU loss**: Includes aspect ratio term
+2. **Cascade detection**: Progressive refinement
+3. **Higher IoU threshold**: 0.6-0.7 for positive samples
+4. **Deformable convolutions**: Learn spatial offsets
+
+### Reducing False Positives
+
+1. **Higher confidence threshold**: 0.4-0.5
+2. **More negative samples**: Hard negative mining
+3. **Background class weight**: Increase penalty
+4. **Ensemble**: Multiple model voting
+
+---
+
+## Resources
+
+- [MMDetection training configs](https://github.com/open-mmlab/mmdetection/tree/main/configs)
+- [Ultralytics training tips](https://docs.ultralytics.com/guides/hyperparameter-tuning/)
+- [Albumentations detection](https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/)
+- [Focal Loss paper](https://arxiv.org/abs/1708.02002)
+- [CIoU paper](https://arxiv.org/abs/2005.03572)
--- a/engineering-team/senior-computer-vision/references/production_vision_systems.md
+++ b/engineering-team/senior-computer-vision/references/production_vision_systems.md
--- a/engineering-team/senior-computer-vision/scripts/dataset_pipeline_builder.py
+++ b/engineering-team/senior-computer-vision/scripts/dataset_pipeline_builder.py
--- a/engineering-team/senior-computer-vision/scripts/inference_optimizer.py
+++ b/engineering-team/senior-computer-vision/scripts/inference_optimizer.py
@@ -1,17 +1,26 @@
 #!/usr/bin/env python3
 """
 Inference Optimizer
-Production-grade tool for senior computer vision engineer
+
+Analyzes and benchmarks vision models, and provides optimization recommendations.
+Supports PyTorch, ONNX, and TensorRT models.
+
+Usage:
+    python inference_optimizer.py model.pt --benchmark
+    python inference_optimizer.py model.pt --export onnx --output model.onnx
+    python inference_optimizer.py model.onnx --analyze
 """

 import os
 import sys
 import json
-import logging
 import argparse
+import logging
+import time
 from pathlib import Path
-from typing import Dict, List, Optional
+from typing import Dict, List, Optional, Any, Tuple
 from datetime import datetime
+import statistics

 logging.basicConfig(
    level=logging.INFO,
@@ -19,82 +28,530 @@ logging.basicConfig(
 )
 logger = logging.getLogger(__name__)

+
+# Model format signatures
+MODEL_FORMATS = {
+    '.pt': 'pytorch',
+    '.pth': 'pytorch',
+    '.onnx': 'onnx',
+    '.engine': 'tensorrt',
+    '.trt': 'tensorrt',
+    '.xml': 'openvino',
+    '.mlpackage': 'coreml',
+    '.mlmodel': 'coreml',
+}
+
+# Optimization recommendations
+OPTIMIZATION_PATHS = {
+    ('pytorch', 'gpu'): ['onnx', 'tensorrt_fp16'],
+    ('pytorch', 'cpu'): ['onnx', 'onnxruntime'],
+    ('pytorch', 'edge'): ['onnx', 'tensorrt_int8'],
+    ('pytorch', 'mobile'): ['onnx', 'tflite'],
+    ('pytorch', 'apple'): ['coreml'],
+    ('pytorch', 'intel'): ['onnx', 'openvino'],
+    ('onnx', 'gpu'): ['tensorrt_fp16'],
+    ('onnx', 'cpu'): ['onnxruntime'],
+}
+
+
 class InferenceOptimizer:
-    """Production-grade inference optimizer"""
-    
-    def __init__(self, config: Dict):
-        self.config = config
-        self.results = {
-            'status': 'initialized',
-            'start_time': datetime.now().isoformat(),
-            'processed_items': 0
+    """Analyzes and optimizes vision model inference."""
+
+    def __init__(self, model_path: str):
+        self.model_path = Path(model_path)
+        self.model_format = self._detect_format()
+        self.model_info = {}
+        self.benchmark_results = {}
+
+    def _detect_format(self) -> str:
+        """Detect model format from file extension."""
+        suffix = self.model_path.suffix.lower()
+        if suffix in MODEL_FORMATS:
+            return MODEL_FORMATS[suffix]
+        raise ValueError(f"Unknown model format: {suffix}")
+
+    def analyze_model(self) -> Dict[str, Any]:
+        """Analyze model structure and size."""
+        logger.info(f"Analyzing model: {self.model_path}")
+
+        analysis = {
+            'path': str(self.model_path),
+            'format': self.model_format,
+            'file_size_mb': self.model_path.stat().st_size / 1024 / 1024,
+            'parameters': None,
+            'layers': [],
+            'input_shape': None,
+            'output_shape': None,
+            'ops_count': None,
        }
-        logger.info(f"Initialized {self.__class__.__name__}")
-    
-    def validate_config(self) -> bool:
-        """Validate configuration"""
-        logger.info("Validating configuration...")
-        # Add validation logic
-        logger.info("Configuration validated")
-        return True
-    
-    def process(self) -> Dict:
-        """Main processing logic"""
-        logger.info("Starting processing...")
-        
+
+        if self.model_format == 'onnx':
+            analysis.update(self._analyze_onnx())
+        elif self.model_format == 'pytorch':
+            analysis.update(self._analyze_pytorch())
+
+        self.model_info = analysis
+        return analysis
+
+    def _analyze_onnx(self) -> Dict[str, Any]:
+        """Analyze ONNX model."""
        try:
-            self.validate_config()
-            
-            # Main processing
-            result = self._execute()
-            
-            self.results['status'] = 'completed'
-            self.results['end_time'] = datetime.now().isoformat()
-            
-            logger.info("Processing completed successfully")
-            return self.results
-            
+            import onnx
+            model = onnx.load(str(self.model_path))
+            onnx.checker.check_model(model)
+
+            # Count parameters
+            total_params = 0
+            for initializer in model.graph.initializer:
+                param_count = 1
+                for dim in initializer.dims:
+                    param_count *= dim
+                total_params += param_count
+
+            # Get input/output shapes
+            inputs = []
+            for inp in model.graph.input:
+                shape = [d.dim_value if d.dim_value else -1
+                        for d in inp.type.tensor_type.shape.dim]
+                inputs.append({'name': inp.name, 'shape': shape})
+
+            outputs = []
+            for out in model.graph.output:
+                shape = [d.dim_value if d.dim_value else -1
+                        for d in out.type.tensor_type.shape.dim]
+                outputs.append({'name': out.name, 'shape': shape})
+
+            # Count operators
+            op_counts = {}
+            for node in model.graph.node:
+                op_type = node.op_type
+                op_counts[op_type] = op_counts.get(op_type, 0) + 1
+
+            return {
+                'parameters': total_params,
+                'inputs': inputs,
+                'outputs': outputs,
+                'operator_counts': op_counts,
+                'num_nodes': len(model.graph.node),
+                'opset_version': model.opset_import[0].version if model.opset_import else None,
+            }
+
+        except ImportError:
+            logger.warning("onnx package not installed, skipping detailed analysis")
+            return {}
        except Exception as e:
-            self.results['status'] = 'failed'
-            self.results['error'] = str(e)
-            logger.error(f"Processing failed: {e}")
-            raise
-    
-    def _execute(self) -> Dict:
-        """Execute main logic"""
-        # Implementation here
-        return {'success': True}
+            logger.error(f"Error analyzing ONNX model: {e}")
+            return {'error': str(e)}
+
+    def _analyze_pytorch(self) -> Dict[str, Any]:
+        """Analyze PyTorch model."""
+        try:
+            import torch
+
+            # Try to load as checkpoint
+            checkpoint = torch.load(str(self.model_path), map_location='cpu')
+
+            # Handle different checkpoint formats
+            if isinstance(checkpoint, dict):
+                if 'model' in checkpoint:
+                    state_dict = checkpoint['model']
+                elif 'state_dict' in checkpoint:
+                    state_dict = checkpoint['state_dict']
+                else:
+                    state_dict = checkpoint
+            else:
+                # Assume it's the model itself
+                if hasattr(checkpoint, 'state_dict'):
+                    state_dict = checkpoint.state_dict()
+                else:
+                    return {'error': 'Could not extract state dict'}
+
+            # Count parameters
+            total_params = 0
+            layer_info = []
+            for name, param in state_dict.items():
+                if hasattr(param, 'numel'):
+                    param_count = param.numel()
+                    total_params += param_count
+                    layer_info.append({
+                        'name': name,
+                        'shape': list(param.shape),
+                        'params': param_count,
+                        'dtype': str(param.dtype)
+                    })
+
+            return {
+                'parameters': total_params,
+                'layers': layer_info[:20],  # First 20 layers
+                'num_layers': len(layer_info),
+            }
+
+        except ImportError:
+            logger.warning("torch package not installed, skipping detailed analysis")
+            return {}
+        except Exception as e:
+            logger.error(f"Error analyzing PyTorch model: {e}")
+            return {'error': str(e)}
+
+    def benchmark(self, input_size: Tuple[int, int] = (640, 640),
+                  batch_sizes: List[int] = None,
+                  num_iterations: int = 100,
+                  warmup: int = 10) -> Dict[str, Any]:
+        """Benchmark model inference speed."""
+        if batch_sizes is None:
+            batch_sizes = [1, 4, 8, 16]
+
+        logger.info(f"Benchmarking model with input size {input_size}")
+
+        results = {
+            'input_size': input_size,
+            'num_iterations': num_iterations,
+            'warmup_iterations': warmup,
+            'batch_results': [],
+            'device': 'cpu',
+        }
+
+        try:
+            if self.model_format == 'onnx':
+                results.update(self._benchmark_onnx(input_size, batch_sizes,
+                                                    num_iterations, warmup))
+            elif self.model_format == 'pytorch':
+                results.update(self._benchmark_pytorch(input_size, batch_sizes,
+                                                       num_iterations, warmup))
+            else:
+                results['error'] = f"Benchmarking not supported for {self.model_format}"
+
+        except Exception as e:
+            results['error'] = str(e)
+            logger.error(f"Benchmark failed: {e}")
+
+        self.benchmark_results = results
+        return results
+
+    def _benchmark_onnx(self, input_size: Tuple[int, int],
+                        batch_sizes: List[int],
+                        num_iterations: int, warmup: int) -> Dict[str, Any]:
+        """Benchmark ONNX model."""
+        import numpy as np
+
+        try:
+            import onnxruntime as ort
+
+            # Try GPU first, fall back to CPU
+            providers = ['CPUExecutionProvider']
+            try:
+                if 'CUDAExecutionProvider' in ort.get_available_providers():
+                    providers = ['CUDAExecutionProvider'] + providers
+            except:
+                pass
+
+            session = ort.InferenceSession(str(self.model_path), providers=providers)
+            input_name = session.get_inputs()[0].name
+            device = 'cuda' if 'CUDA' in session.get_providers()[0] else 'cpu'
+
+            results = {'device': device, 'provider': session.get_providers()[0]}
+            batch_results = []
+
+            for batch_size in batch_sizes:
+                # Create dummy input
+                dummy = np.random.randn(batch_size, 3, *input_size).astype(np.float32)
+
+                # Warmup
+                for _ in range(warmup):
+                    session.run(None, {input_name: dummy})
+
+                # Benchmark
+                latencies = []
+                for _ in range(num_iterations):
+                    start = time.perf_counter()
+                    session.run(None, {input_name: dummy})
+                    latencies.append((time.perf_counter() - start) * 1000)
+
+                batch_result = {
+                    'batch_size': batch_size,
+                    'mean_latency_ms': statistics.mean(latencies),
+                    'std_latency_ms': statistics.stdev(latencies) if len(latencies) > 1 else 0,
+                    'min_latency_ms': min(latencies),
+                    'max_latency_ms': max(latencies),
+                    'p50_latency_ms': sorted(latencies)[len(latencies) // 2],
+                    'p95_latency_ms': sorted(latencies)[int(len(latencies) * 0.95)],
+                    'p99_latency_ms': sorted(latencies)[int(len(latencies) * 0.99)],
+                    'throughput_fps': batch_size * 1000 / statistics.mean(latencies),
+                }
+                batch_results.append(batch_result)
+
+                logger.info(f"Batch {batch_size}: {batch_result['mean_latency_ms']:.2f}ms, "
+                           f"{batch_result['throughput_fps']:.1f} FPS")
+
+            results['batch_results'] = batch_results
+            return results
+
+        except ImportError:
+            return {'error': 'onnxruntime not installed'}
+
+    def _benchmark_pytorch(self, input_size: Tuple[int, int],
+                          batch_sizes: List[int],
+                          num_iterations: int, warmup: int) -> Dict[str, Any]:
+        """Benchmark PyTorch model."""
+        try:
+            import torch
+            import numpy as np
+
+            # Load model
+            device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+            checkpoint = torch.load(str(self.model_path), map_location=device)
+
+            # Handle different checkpoint formats
+            if isinstance(checkpoint, dict) and 'model' in checkpoint:
+                model = checkpoint['model']
+            elif hasattr(checkpoint, 'forward'):
+                model = checkpoint
+            else:
+                return {'error': 'Could not load model for benchmarking'}
+
+            model.to(device)
+            model.train(False)
+
+            results = {'device': str(device)}
+            batch_results = []
+
+            with torch.no_grad():
+                for batch_size in batch_sizes:
+                    dummy = torch.randn(batch_size, 3, *input_size, device=device)
+
+                    # Warmup
+                    for _ in range(warmup):
+                        _ = model(dummy)
+                    if device.type == 'cuda':
+                        torch.cuda.synchronize()
+
+                    # Benchmark
+                    latencies = []
+                    for _ in range(num_iterations):
+                        if device.type == 'cuda':
+                            torch.cuda.synchronize()
+                        start = time.perf_counter()
+                        _ = model(dummy)
+                        if device.type == 'cuda':
+                            torch.cuda.synchronize()
+                        latencies.append((time.perf_counter() - start) * 1000)
+
+                    batch_result = {
+                        'batch_size': batch_size,
+                        'mean_latency_ms': statistics.mean(latencies),
+                        'std_latency_ms': statistics.stdev(latencies) if len(latencies) > 1 else 0,
+                        'min_latency_ms': min(latencies),
+                        'max_latency_ms': max(latencies),
+                        'throughput_fps': batch_size * 1000 / statistics.mean(latencies),
+                    }
+                    batch_results.append(batch_result)
+
+                    logger.info(f"Batch {batch_size}: {batch_result['mean_latency_ms']:.2f}ms, "
+                               f"{batch_result['throughput_fps']:.1f} FPS")
+
+            results['batch_results'] = batch_results
+            return results
+
+        except ImportError:
+            return {'error': 'torch not installed'}
+        except Exception as e:
+            return {'error': str(e)}
+
+    def get_optimization_recommendations(self, target: str = 'gpu') -> List[Dict[str, Any]]:
+        """Get optimization recommendations for target platform."""
+        recommendations = []
+
+        key = (self.model_format, target)
+        if key in OPTIMIZATION_PATHS:
+            path = OPTIMIZATION_PATHS[key]
+            for step in path:
+                rec = {
+                    'step': step,
+                    'description': self._get_step_description(step),
+                    'expected_speedup': self._get_expected_speedup(step),
+                    'command': self._get_step_command(step),
+                }
+                recommendations.append(rec)
+
+        # Add general recommendations
+        if self.model_info:
+            params = self.model_info.get('parameters', 0)
+            if params and params > 50_000_000:
+                recommendations.append({
+                    'step': 'pruning',
+                    'description': f'Model has {params/1e6:.1f}M parameters. '
+                                 'Consider structured pruning to reduce size.',
+                    'expected_speedup': '1.5-2x',
+                })
+
+            file_size = self.model_info.get('file_size_mb', 0)
+            if file_size > 100:
+                recommendations.append({
+                    'step': 'quantization',
+                    'description': f'Model size is {file_size:.1f}MB. '
+                                 'INT8 quantization can reduce by 75%.',
+                    'expected_speedup': '2-4x',
+                })
+
+        return recommendations
+
+    def _get_step_description(self, step: str) -> str:
+        """Get description for optimization step."""
+        descriptions = {
+            'onnx': 'Export to ONNX format for framework-agnostic deployment',
+            'tensorrt_fp16': 'Convert to TensorRT with FP16 precision for NVIDIA GPUs',
+            'tensorrt_int8': 'Convert to TensorRT with INT8 quantization for edge devices',
+            'onnxruntime': 'Use ONNX Runtime for optimized CPU/GPU inference',
+            'openvino': 'Convert to OpenVINO for Intel CPU/GPU optimization',
+            'coreml': 'Convert to CoreML for Apple Silicon acceleration',
+            'tflite': 'Convert to TensorFlow Lite for mobile deployment',
+        }
+        return descriptions.get(step, step)
+
+    def _get_expected_speedup(self, step: str) -> str:
+        """Get expected speedup for optimization step."""
+        speedups = {
+            'onnx': '1-1.5x',
+            'tensorrt_fp16': '2-4x',
+            'tensorrt_int8': '3-6x',
+            'onnxruntime': '1.2-2x',
+            'openvino': '1.5-3x',
+            'coreml': '2-5x (on Apple Silicon)',
+            'tflite': '1-2x',
+        }
+        return speedups.get(step, 'varies')
+
+    def _get_step_command(self, step: str) -> str:
+        """Get command for optimization step."""
+        model_name = self.model_path.stem
+        commands = {
+            'onnx': f'yolo export model={model_name}.pt format=onnx',
+            'tensorrt_fp16': f'trtexec --onnx={model_name}.onnx --saveEngine={model_name}.engine --fp16',
+            'tensorrt_int8': f'trtexec --onnx={model_name}.onnx --saveEngine={model_name}.engine --int8',
+            'onnxruntime': f'pip install onnxruntime-gpu',
+            'openvino': f'mo --input_model {model_name}.onnx --output_dir openvino/',
+            'coreml': f'yolo export model={model_name}.pt format=coreml',
+        }
+        return commands.get(step, '')
+
+    def print_summary(self):
+        """Print analysis and benchmark summary."""
+        print("\n" + "=" * 70)
+        print("MODEL ANALYSIS SUMMARY")
+        print("=" * 70)
+
+        if self.model_info:
+            print(f"Path:        {self.model_info.get('path', 'N/A')}")
+            print(f"Format:      {self.model_info.get('format', 'N/A')}")
+            print(f"File Size:   {self.model_info.get('file_size_mb', 0):.2f} MB")
+
+            params = self.model_info.get('parameters')
+            if params:
+                print(f"Parameters:  {params:,} ({params/1e6:.2f}M)")
+
+            if 'num_nodes' in self.model_info:
+                print(f"Nodes:       {self.model_info['num_nodes']}")
+
+        if self.benchmark_results and 'batch_results' in self.benchmark_results:
+            print("\n" + "-" * 70)
+            print("BENCHMARK RESULTS")
+            print("-" * 70)
+            print(f"Device:      {self.benchmark_results.get('device', 'N/A')}")
+            print(f"Input Size:  {self.benchmark_results.get('input_size', 'N/A')}")
+            print()
+            print(f"{'Batch':<8} {'Latency (ms)':<15} {'Throughput (FPS)':<18} {'P99 (ms)':<12}")
+            print("-" * 55)
+
+            for result in self.benchmark_results['batch_results']:
+                print(f"{result['batch_size']:<8} "
+                      f"{result['mean_latency_ms']:<15.2f} "
+                      f"{result['throughput_fps']:<18.1f} "
+                      f"{result.get('p99_latency_ms', 0):<12.2f}")
+
+        print("=" * 70 + "\n")
+

 def main():
-    """Main entry point"""
    parser = argparse.ArgumentParser(
-        description="Inference Optimizer"
+        description="Analyze and optimize vision model inference"
    )
-    parser.add_argument('--input', '-i', required=True, help='Input path')
-    parser.add_argument('--output', '-o', required=True, help='Output path')
-    parser.add_argument('--config', '-c', help='Configuration file')
-    parser.add_argument('--verbose', '-v', action='store_true', help='Verbose output')
-    
+    parser.add_argument('model_path', help='Path to model file')
+    parser.add_argument('--analyze', action='store_true',
+                       help='Analyze model structure')
+    parser.add_argument('--benchmark', action='store_true',
+                       help='Benchmark inference speed')
+    parser.add_argument('--input-size', type=int, nargs=2, default=[640, 640],
+                       metavar=('H', 'W'), help='Input image size')
+    parser.add_argument('--batch-sizes', type=int, nargs='+', default=[1, 4, 8],
+                       help='Batch sizes to benchmark')
+    parser.add_argument('--iterations', type=int, default=100,
+                       help='Number of benchmark iterations')
+    parser.add_argument('--warmup', type=int, default=10,
+                       help='Number of warmup iterations')
+    parser.add_argument('--target', choices=['gpu', 'cpu', 'edge', 'mobile', 'apple', 'intel'],
+                       default='gpu', help='Target deployment platform')
+    parser.add_argument('--recommend', action='store_true',
+                       help='Show optimization recommendations')
+    parser.add_argument('--json', action='store_true',
+                       help='Output as JSON')
+    parser.add_argument('--output', '-o', help='Output file path')
+
    args = parser.parse_args()
-    
-    if args.verbose:
-        logging.getLogger().setLevel(logging.DEBUG)
-    
-    try:
-        config = {
-            'input': args.input,
-            'output': args.output
-        }
-        
-        processor = InferenceOptimizer(config)
-        results = processor.process()
-        
-        print(json.dumps(results, indent=2))
-        sys.exit(0)
-        
-    except Exception as e:
-        logger.error(f"Fatal error: {e}")
+
+    if not Path(args.model_path).exists():
+        logger.error(f"Model not found: {args.model_path}")
        sys.exit(1)

+    try:
+        optimizer = InferenceOptimizer(args.model_path)
+    except ValueError as e:
+        logger.error(str(e))
+        sys.exit(1)
+
+    results = {}
+
+    # Analyze model
+    if args.analyze or not (args.benchmark or args.recommend):
+        results['analysis'] = optimizer.analyze_model()
+
+    # Benchmark
+    if args.benchmark:
+        results['benchmark'] = optimizer.benchmark(
+            input_size=tuple(args.input_size),
+            batch_sizes=args.batch_sizes,
+            num_iterations=args.iterations,
+            warmup=args.warmup
+        )
+
+    # Recommendations
+    if args.recommend:
+        if not optimizer.model_info:
+            optimizer.analyze_model()
+        results['recommendations'] = optimizer.get_optimization_recommendations(args.target)
+
+    # Output
+    if args.json:
+        print(json.dumps(results, indent=2, default=str))
+    else:
+        optimizer.print_summary()
+
+        if args.recommend and 'recommendations' in results:
+            print("OPTIMIZATION RECOMMENDATIONS")
+            print("-" * 70)
+            for i, rec in enumerate(results['recommendations'], 1):
+                print(f"\n{i}. {rec['step'].upper()}")
+                print(f"   {rec['description']}")
+                print(f"   Expected speedup: {rec['expected_speedup']}")
+                if rec.get('command'):
+                    print(f"   Command: {rec['command']}")
+            print()
+
+    # Save to file
+    if args.output:
+        with open(args.output, 'w') as f:
+            json.dump(results, f, indent=2, default=str)
+        logger.info(f"Results saved to {args.output}")
+
+
 if __name__ == '__main__':
    main()
--- a/engineering-team/senior-computer-vision/scripts/vision_model_trainer.py
+++ b/engineering-team/senior-computer-vision/scripts/vision_model_trainer.py
@@ -1,16 +1,22 @@
 #!/usr/bin/env python3
 """
-Vision Model Trainer
-Production-grade tool for senior computer vision engineer
+Vision Model Trainer Configuration Generator
+
+Generates training configuration files for object detection and segmentation models.
+Supports Ultralytics YOLO, Detectron2, and MMDetection frameworks.
+
+Usage:
+    python vision_model_trainer.py <data_dir> --task detection --arch yolov8m
+    python vision_model_trainer.py <data_dir> --framework detectron2 --arch faster_rcnn_R_50_FPN
 """

 import os
 import sys
 import json
-import logging
 import argparse
+import logging
 from pathlib import Path
-from typing import Dict, List, Optional
+from typing import Dict, List, Optional, Any
 from datetime import datetime

 logging.basicConfig(
@@ -19,82 +25,552 @@ logging.basicConfig(
 )
 logger = logging.getLogger(__name__)

+
+# Architecture configurations
+YOLO_ARCHITECTURES = {
+    'yolov8n': {'params': '3.2M', 'gflops': 8.7, 'map': 37.3},
+    'yolov8s': {'params': '11.2M', 'gflops': 28.6, 'map': 44.9},
+    'yolov8m': {'params': '25.9M', 'gflops': 78.9, 'map': 50.2},
+    'yolov8l': {'params': '43.7M', 'gflops': 165.2, 'map': 52.9},
+    'yolov8x': {'params': '68.2M', 'gflops': 257.8, 'map': 53.9},
+    'yolov5n': {'params': '1.9M', 'gflops': 4.5, 'map': 28.0},
+    'yolov5s': {'params': '7.2M', 'gflops': 16.5, 'map': 37.4},
+    'yolov5m': {'params': '21.2M', 'gflops': 49.0, 'map': 45.4},
+    'yolov5l': {'params': '46.5M', 'gflops': 109.1, 'map': 49.0},
+    'yolov5x': {'params': '86.7M', 'gflops': 205.7, 'map': 50.7},
+}
+
+DETECTRON2_ARCHITECTURES = {
+    'faster_rcnn_R_50_FPN': {'backbone': 'R-50-FPN', 'map': 37.9},
+    'faster_rcnn_R_101_FPN': {'backbone': 'R-101-FPN', 'map': 39.4},
+    'faster_rcnn_X_101_FPN': {'backbone': 'X-101-FPN', 'map': 41.0},
+    'mask_rcnn_R_50_FPN': {'backbone': 'R-50-FPN', 'map': 38.6},
+    'mask_rcnn_R_101_FPN': {'backbone': 'R-101-FPN', 'map': 40.0},
+    'retinanet_R_50_FPN': {'backbone': 'R-50-FPN', 'map': 36.4},
+    'retinanet_R_101_FPN': {'backbone': 'R-101-FPN', 'map': 37.7},
+}
+
+MMDETECTION_ARCHITECTURES = {
+    'faster_rcnn_r50_fpn': {'backbone': 'ResNet50', 'map': 37.4},
+    'faster_rcnn_r101_fpn': {'backbone': 'ResNet101', 'map': 39.4},
+    'mask_rcnn_r50_fpn': {'backbone': 'ResNet50', 'map': 38.2},
+    'yolox_s': {'backbone': 'CSPDarknet', 'map': 40.5},
+    'yolox_m': {'backbone': 'CSPDarknet', 'map': 46.9},
+    'yolox_l': {'backbone': 'CSPDarknet', 'map': 49.7},
+    'detr_r50': {'backbone': 'ResNet50', 'map': 42.0},
+    'dino_r50': {'backbone': 'ResNet50', 'map': 49.0},
+}
+
+
 class VisionModelTrainer:
-    """Production-grade vision model trainer"""
-    
-    def __init__(self, config: Dict):
-        self.config = config
-        self.results = {
-            'status': 'initialized',
-            'start_time': datetime.now().isoformat(),
-            'processed_items': 0
+    """Generates training configurations for vision models."""
+
+    def __init__(self, data_dir: str, task: str = 'detection',
+                 framework: str = 'ultralytics'):
+        self.data_dir = Path(data_dir)
+        self.task = task
+        self.framework = framework
+        self.config = {}
+
+    def analyze_dataset(self) -> Dict[str, Any]:
+        """Analyze dataset structure and statistics."""
+        logger.info(f"Analyzing dataset at {self.data_dir}")
+
+        analysis = {
+            'path': str(self.data_dir),
+            'exists': self.data_dir.exists(),
+            'images': {'train': 0, 'val': 0, 'test': 0},
+            'annotations': {'format': None, 'classes': []},
+            'recommendations': []
        }
-        logger.info(f"Initialized {self.__class__.__name__}")
-    
-    def validate_config(self) -> bool:
-        """Validate configuration"""
-        logger.info("Validating configuration...")
-        # Add validation logic
-        logger.info("Configuration validated")
-        return True
-    
-    def process(self) -> Dict:
-        """Main processing logic"""
-        logger.info("Starting processing...")
-        
-        try:
-            self.validate_config()
-            
-            # Main processing
-            result = self._execute()
-            
-            self.results['status'] = 'completed'
-            self.results['end_time'] = datetime.now().isoformat()
-            
-            logger.info("Processing completed successfully")
-            return self.results
-            
-        except Exception as e:
-            self.results['status'] = 'failed'
-            self.results['error'] = str(e)
-            logger.error(f"Processing failed: {e}")
-            raise
-    
-    def _execute(self) -> Dict:
-        """Execute main logic"""
-        # Implementation here
-        return {'success': True}
+
+        if not self.data_dir.exists():
+            analysis['recommendations'].append(
+                f"Directory {self.data_dir} does not exist"
+            )
+            return analysis
+
+        # Check for common dataset structures
+        # COCO format
+        if (self.data_dir / 'annotations').exists():
+            analysis['annotations']['format'] = 'coco'
+            for split in ['train', 'val', 'test']:
+                ann_file = self.data_dir / 'annotations' / f'{split}.json'
+                if ann_file.exists():
+                    with open(ann_file, 'r') as f:
+                        data = json.load(f)
+                        analysis['images'][split] = len(data.get('images', []))
+                        if not analysis['annotations']['classes']:
+                            analysis['annotations']['classes'] = [
+                                c['name'] for c in data.get('categories', [])
+                            ]
+
+        # YOLO format
+        elif (self.data_dir / 'labels').exists():
+            analysis['annotations']['format'] = 'yolo'
+            for split in ['train', 'val', 'test']:
+                img_dir = self.data_dir / 'images' / split
+                if img_dir.exists():
+                    analysis['images'][split] = len(list(img_dir.glob('*.*')))
+
+            # Try to read classes from data.yaml
+            data_yaml = self.data_dir / 'data.yaml'
+            if data_yaml.exists():
+                import yaml
+                with open(data_yaml, 'r') as f:
+                    data = yaml.safe_load(f)
+                    analysis['annotations']['classes'] = data.get('names', [])
+
+        # Generate recommendations
+        total_images = sum(analysis['images'].values())
+        if total_images < 100:
+            analysis['recommendations'].append(
+                f"Dataset has only {total_images} images. "
+                "Consider collecting more data or using transfer learning."
+            )
+        if total_images < 1000:
+            analysis['recommendations'].append(
+                "Use aggressive data augmentation (mosaic, mixup) for small datasets."
+            )
+
+        num_classes = len(analysis['annotations']['classes'])
+        if num_classes > 80:
+            analysis['recommendations'].append(
+                f"Large number of classes ({num_classes}). "
+                "Consider using larger model (yolov8l/x) or longer training."
+            )
+
+        logger.info(f"Found {total_images} images, {num_classes} classes")
+        return analysis
+
+    def generate_yolo_config(self, arch: str, epochs: int = 100,
+                             batch: int = 16, imgsz: int = 640,
+                             **kwargs) -> Dict[str, Any]:
+        """Generate Ultralytics YOLO training configuration."""
+        if arch not in YOLO_ARCHITECTURES:
+            available = ', '.join(YOLO_ARCHITECTURES.keys())
+            raise ValueError(f"Unknown architecture: {arch}. Available: {available}")
+
+        arch_info = YOLO_ARCHITECTURES[arch]
+
+        config = {
+            'model': f'{arch}.pt',
+            'data': str(self.data_dir / 'data.yaml'),
+            'epochs': epochs,
+            'batch': batch,
+            'imgsz': imgsz,
+            'patience': 50,
+            'save': True,
+            'save_period': -1,
+            'cache': False,
+            'device': '0',
+            'workers': 8,
+            'project': 'runs/detect',
+            'name': f'{arch}_{datetime.now().strftime("%Y%m%d_%H%M%S")}',
+            'exist_ok': False,
+            'pretrained': True,
+            'optimizer': 'auto',
+            'verbose': True,
+            'seed': 0,
+            'deterministic': True,
+            'single_cls': False,
+            'rect': False,
+            'cos_lr': False,
+            'close_mosaic': 10,
+            'resume': False,
+            'amp': True,
+            'fraction': 1.0,
+            'profile': False,
+            'freeze': None,
+            'lr0': 0.01,
+            'lrf': 0.01,
+            'momentum': 0.937,
+            'weight_decay': 0.0005,
+            'warmup_epochs': 3.0,
+            'warmup_momentum': 0.8,
+            'warmup_bias_lr': 0.1,
+            'box': 7.5,
+            'cls': 0.5,
+            'dfl': 1.5,
+            'pose': 12.0,
+            'kobj': 1.0,
+            'label_smoothing': 0.0,
+            'nbs': 64,
+            'hsv_h': 0.015,
+            'hsv_s': 0.7,
+            'hsv_v': 0.4,
+            'degrees': 0.0,
+            'translate': 0.1,
+            'scale': 0.5,
+            'shear': 0.0,
+            'perspective': 0.0,
+            'flipud': 0.0,
+            'fliplr': 0.5,
+            'bgr': 0.0,
+            'mosaic': 1.0,
+            'mixup': 0.0,
+            'copy_paste': 0.0,
+            'auto_augment': 'randaugment',
+            'erasing': 0.4,
+            'crop_fraction': 1.0,
+        }
+
+        # Update with user overrides
+        config.update(kwargs)
+
+        # Task-specific settings
+        if self.task == 'segmentation':
+            config['model'] = f'{arch}-seg.pt'
+            config['overlap_mask'] = True
+            config['mask_ratio'] = 4
+
+        # Metadata
+        config['_metadata'] = {
+            'architecture': arch,
+            'arch_info': arch_info,
+            'task': self.task,
+            'framework': 'ultralytics',
+            'generated_at': datetime.now().isoformat()
+        }
+
+        self.config = config
+        return config
+
+    def generate_detectron2_config(self, arch: str, epochs: int = 12,
+                                   batch: int = 16, **kwargs) -> Dict[str, Any]:
+        """Generate Detectron2 training configuration."""
+        if arch not in DETECTRON2_ARCHITECTURES:
+            available = ', '.join(DETECTRON2_ARCHITECTURES.keys())
+            raise ValueError(f"Unknown architecture: {arch}. Available: {available}")
+
+        arch_info = DETECTRON2_ARCHITECTURES[arch]
+        iterations = epochs * 1000  # Approximate
+
+        config = {
+            'MODEL': {
+                'WEIGHTS': f'detectron2://COCO-Detection/{arch}_3x/137849458/model_final_280758.pkl',
+                'ROI_HEADS': {
+                    'NUM_CLASSES': len(self._get_classes()),
+                    'BATCH_SIZE_PER_IMAGE': 512,
+                    'POSITIVE_FRACTION': 0.25,
+                    'SCORE_THRESH_TEST': 0.05,
+                    'NMS_THRESH_TEST': 0.5,
+                },
+                'BACKBONE': {
+                    'FREEZE_AT': 2
+                },
+                'FPN': {
+                    'IN_FEATURES': ['res2', 'res3', 'res4', 'res5']
+                },
+                'ANCHOR_GENERATOR': {
+                    'SIZES': [[32], [64], [128], [256], [512]],
+                    'ASPECT_RATIOS': [[0.5, 1.0, 2.0]]
+                },
+                'RPN': {
+                    'PRE_NMS_TOPK_TRAIN': 2000,
+                    'PRE_NMS_TOPK_TEST': 1000,
+                    'POST_NMS_TOPK_TRAIN': 1000,
+                    'POST_NMS_TOPK_TEST': 1000,
+                }
+            },
+            'DATASETS': {
+                'TRAIN': ('custom_train',),
+                'TEST': ('custom_val',),
+            },
+            'DATALOADER': {
+                'NUM_WORKERS': 4,
+                'SAMPLER_TRAIN': 'TrainingSampler',
+                'FILTER_EMPTY_ANNOTATIONS': True,
+            },
+            'SOLVER': {
+                'IMS_PER_BATCH': batch,
+                'BASE_LR': 0.001,
+                'STEPS': (int(iterations * 0.7), int(iterations * 0.9)),
+                'MAX_ITER': iterations,
+                'WARMUP_FACTOR': 1.0 / 1000,
+                'WARMUP_ITERS': 1000,
+                'WARMUP_METHOD': 'linear',
+                'GAMMA': 0.1,
+                'MOMENTUM': 0.9,
+                'WEIGHT_DECAY': 0.0001,
+                'WEIGHT_DECAY_NORM': 0.0,
+                'CHECKPOINT_PERIOD': 5000,
+                'AMP': {
+                    'ENABLED': True
+                }
+            },
+            'INPUT': {
+                'MIN_SIZE_TRAIN': (640, 672, 704, 736, 768, 800),
+                'MAX_SIZE_TRAIN': 1333,
+                'MIN_SIZE_TEST': 800,
+                'MAX_SIZE_TEST': 1333,
+                'FORMAT': 'BGR',
+            },
+            'TEST': {
+                'EVAL_PERIOD': 5000,
+                'DETECTIONS_PER_IMAGE': 100,
+            },
+            'OUTPUT_DIR': f'./output/{arch}_{datetime.now().strftime("%Y%m%d_%H%M%S")}',
+        }
+
+        # Add mask head for instance segmentation
+        if 'mask' in arch.lower():
+            config['MODEL']['MASK_ON'] = True
+            config['MODEL']['ROI_MASK_HEAD'] = {
+                'POOLER_RESOLUTION': 14,
+                'POOLER_SAMPLING_RATIO': 0,
+                'POOLER_TYPE': 'ROIAlignV2'
+            }
+
+        config.update(kwargs)
+        config['_metadata'] = {
+            'architecture': arch,
+            'arch_info': arch_info,
+            'task': self.task,
+            'framework': 'detectron2',
+            'generated_at': datetime.now().isoformat()
+        }
+
+        self.config = config
+        return config
+
+    def generate_mmdetection_config(self, arch: str, epochs: int = 12,
+                                    batch: int = 16, **kwargs) -> Dict[str, Any]:
+        """Generate MMDetection training configuration."""
+        if arch not in MMDETECTION_ARCHITECTURES:
+            available = ', '.join(MMDETECTION_ARCHITECTURES.keys())
+            raise ValueError(f"Unknown architecture: {arch}. Available: {available}")
+
+        arch_info = MMDETECTION_ARCHITECTURES[arch]
+
+        config = {
+            '_base_': [
+                f'../_base_/models/{arch}.py',
+                '../_base_/datasets/coco_detection.py',
+                '../_base_/schedules/schedule_1x.py',
+                '../_base_/default_runtime.py'
+            ],
+            'model': {
+                'roi_head': {
+                    'bbox_head': {
+                        'num_classes': len(self._get_classes())
+                    }
+                }
+            },
+            'data': {
+                'samples_per_gpu': batch // 2,
+                'workers_per_gpu': 4,
+                'train': {
+                    'type': 'CocoDataset',
+                    'ann_file': str(self.data_dir / 'annotations' / 'train.json'),
+                    'img_prefix': str(self.data_dir / 'images' / 'train'),
+                },
+                'val': {
+                    'type': 'CocoDataset',
+                    'ann_file': str(self.data_dir / 'annotations' / 'val.json'),
+                    'img_prefix': str(self.data_dir / 'images' / 'val'),
+                },
+                'test': {
+                    'type': 'CocoDataset',
+                    'ann_file': str(self.data_dir / 'annotations' / 'val.json'),
+                    'img_prefix': str(self.data_dir / 'images' / 'val'),
+                }
+            },
+            'optimizer': {
+                'type': 'SGD',
+                'lr': 0.02,
+                'momentum': 0.9,
+                'weight_decay': 0.0001
+            },
+            'optimizer_config': {
+                'grad_clip': {'max_norm': 35, 'norm_type': 2}
+            },
+            'lr_config': {
+                'policy': 'step',
+                'warmup': 'linear',
+                'warmup_iters': 500,
+                'warmup_ratio': 0.001,
+                'step': [int(epochs * 0.7), int(epochs * 0.9)]
+            },
+            'runner': {
+                'type': 'EpochBasedRunner',
+                'max_epochs': epochs
+            },
+            'checkpoint_config': {
+                'interval': 1
+            },
+            'log_config': {
+                'interval': 50,
+                'hooks': [
+                    {'type': 'TextLoggerHook'},
+                    {'type': 'TensorboardLoggerHook'}
+                ]
+            },
+            'work_dir': f'./work_dirs/{arch}_{datetime.now().strftime("%Y%m%d_%H%M%S")}',
+            'load_from': None,
+            'resume_from': None,
+            'fp16': {'loss_scale': 512.0}
+        }
+
+        config.update(kwargs)
+        config['_metadata'] = {
+            'architecture': arch,
+            'arch_info': arch_info,
+            'task': self.task,
+            'framework': 'mmdetection',
+            'generated_at': datetime.now().isoformat()
+        }
+
+        self.config = config
+        return config
+
+    def _get_classes(self) -> List[str]:
+        """Get class names from dataset."""
+        analysis = self.analyze_dataset()
+        classes = analysis['annotations']['classes']
+        if not classes:
+            classes = ['object']  # Default fallback
+        return classes
+
+    def save_config(self, output_path: str) -> str:
+        """Save configuration to file."""
+        output_path = Path(output_path)
+        output_path.parent.mkdir(parents=True, exist_ok=True)
+
+        if self.framework == 'ultralytics':
+            # YOLO uses YAML
+            import yaml
+            with open(output_path, 'w') as f:
+                yaml.dump(self.config, f, default_flow_style=False, sort_keys=False)
+        else:
+            # Detectron2 and MMDetection use Python configs
+            with open(output_path, 'w') as f:
+                f.write("# Auto-generated configuration\n")
+                f.write(f"# Generated at: {datetime.now().isoformat()}\n\n")
+                f.write(f"config = {json.dumps(self.config, indent=2)}\n")
+
+        logger.info(f"Configuration saved to {output_path}")
+        return str(output_path)
+
+    def generate_training_command(self) -> str:
+        """Generate the training command for the framework."""
+        if self.framework == 'ultralytics':
+            return f"yolo detect train data={self.config.get('data', 'data.yaml')} " \
+                   f"model={self.config.get('model', 'yolov8m.pt')} " \
+                   f"epochs={self.config.get('epochs', 100)} " \
+                   f"imgsz={self.config.get('imgsz', 640)}"
+        elif self.framework == 'detectron2':
+            return f"python train_net.py --config-file config.yaml --num-gpus 1"
+        elif self.framework == 'mmdetection':
+            return f"python tools/train.py config.py"
+        return ""
+
+    def print_summary(self):
+        """Print configuration summary."""
+        meta = self.config.get('_metadata', {})
+
+        print("\n" + "=" * 60)
+        print("TRAINING CONFIGURATION SUMMARY")
+        print("=" * 60)
+        print(f"Framework:     {meta.get('framework', 'unknown')}")
+        print(f"Architecture:  {meta.get('architecture', 'unknown')}")
+        print(f"Task:          {meta.get('task', 'detection')}")
+
+        if 'arch_info' in meta:
+            info = meta['arch_info']
+            if 'params' in info:
+                print(f"Parameters:    {info['params']}")
+            if 'map' in info:
+                print(f"COCO mAP:      {info['map']}")
+
+        print("-" * 60)
+        print("Training Command:")
+        print(f"  {self.generate_training_command()}")
+        print("=" * 60 + "\n")
+

 def main():
-    """Main entry point"""
    parser = argparse.ArgumentParser(
-        description="Vision Model Trainer"
+        description="Generate vision model training configurations"
    )
-    parser.add_argument('--input', '-i', required=True, help='Input path')
-    parser.add_argument('--output', '-o', required=True, help='Output path')
-    parser.add_argument('--config', '-c', help='Configuration file')
-    parser.add_argument('--verbose', '-v', action='store_true', help='Verbose output')
-    
+    parser.add_argument('data_dir', help='Path to dataset directory')
+    parser.add_argument('--task', choices=['detection', 'segmentation'],
+                       default='detection', help='Task type')
+    parser.add_argument('--framework', choices=['ultralytics', 'detectron2', 'mmdetection'],
+                       default='ultralytics', help='Training framework')
+    parser.add_argument('--arch', default='yolov8m',
+                       help='Model architecture')
+    parser.add_argument('--epochs', type=int, default=100, help='Training epochs')
+    parser.add_argument('--batch', type=int, default=16, help='Batch size')
+    parser.add_argument('--imgsz', type=int, default=640, help='Image size')
+    parser.add_argument('--output', '-o', help='Output config file path')
+    parser.add_argument('--analyze-only', action='store_true',
+                       help='Only analyze dataset, do not generate config')
+    parser.add_argument('--json', action='store_true',
+                       help='Output as JSON')
+
    args = parser.parse_args()
-    
-    if args.verbose:
-        logging.getLogger().setLevel(logging.DEBUG)
-    
+
+    trainer = VisionModelTrainer(
+        data_dir=args.data_dir,
+        task=args.task,
+        framework=args.framework
+    )
+
+    # Analyze dataset
+    analysis = trainer.analyze_dataset()
+
+    if args.analyze_only:
+        if args.json:
+            print(json.dumps(analysis, indent=2))
+        else:
+            print("\nDataset Analysis:")
+            print(f"  Path: {analysis['path']}")
+            print(f"  Format: {analysis['annotations']['format']}")
+            print(f"  Classes: {len(analysis['annotations']['classes'])}")
+            print(f"  Images - Train: {analysis['images']['train']}, "
+                  f"Val: {analysis['images']['val']}, "
+                  f"Test: {analysis['images']['test']}")
+            if analysis['recommendations']:
+                print("\nRecommendations:")
+                for rec in analysis['recommendations']:
+                    print(f"  - {rec}")
+        return
+
+    # Generate configuration
    try:
-        config = {
-            'input': args.input,
-            'output': args.output
-        }
-        
-        processor = VisionModelTrainer(config)
-        results = processor.process()
-        
-        print(json.dumps(results, indent=2))
-        sys.exit(0)
-        
-    except Exception as e:
-        logger.error(f"Fatal error: {e}")
+        if args.framework == 'ultralytics':
+            config = trainer.generate_yolo_config(
+                arch=args.arch,
+                epochs=args.epochs,
+                batch=args.batch,
+                imgsz=args.imgsz
+            )
+        elif args.framework == 'detectron2':
+            config = trainer.generate_detectron2_config(
+                arch=args.arch,
+                epochs=args.epochs,
+                batch=args.batch
+            )
+        elif args.framework == 'mmdetection':
+            config = trainer.generate_mmdetection_config(
+                arch=args.arch,
+                epochs=args.epochs,
+                batch=args.batch
+            )
+    except ValueError as e:
+        logger.error(str(e))
        sys.exit(1)

+    # Output
+    if args.json:
+        print(json.dumps(config, indent=2))
+    else:
+        trainer.print_summary()
+
+        if args.output:
+            trainer.save_config(args.output)
+
+
 if __name__ == '__main__':
    main()