antigravity-skills-reference/skills/hugging-face-model-trainer/references/training_patterns.md
sickn33 bdcfbb9625 feat(hugging-face): Add official ecosystem skills
Import the official Hugging Face ecosystem skills and sync the existing local coverage with upstream metadata and assets.

Regenerate the canonical catalog, plugin mirrors, docs, and release notes after the maintainer merge batch so main stays in sync.

Fixes #417
2026-03-29 18:31:46 +02:00


Common Training Patterns

This guide provides common training patterns and use cases for TRL on Hugging Face Jobs.

Multi-GPU Training

TRL and Accelerate distribute training across multiple GPUs automatically, so the single-GPU script runs unchanged:

hf_jobs("uv", {
    "script": """
# Your training script here (same as single GPU)
# No changes needed - Accelerate detects multiple GPUs
""",
    "flavor": "a10g-largex2",  # 2x A10G GPUs
    "timeout": "4h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})

Tips for multi-GPU:

  • No code changes needed
  • Set per_device_train_batch_size per GPU, not as the total across GPUs
  • Effective batch size = per_device_train_batch_size × num_gpus × gradient_accumulation_steps
  • Monitor GPU utilization to ensure both GPUs are being used
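
The effective-batch-size formula in the tips above can be checked with a few lines of Python. This is a minimal sketch: `effective_batch_size` is a hypothetical helper written for illustration, not part of TRL or Accelerate.

```python
def effective_batch_size(per_device_train_batch_size: int,
                         num_gpus: int,
                         gradient_accumulation_steps: int = 1) -> int:
    """Number of samples that contribute to a single optimizer step."""
    return per_device_train_batch_size * num_gpus * gradient_accumulation_steps


# Example: 2 GPUs (as on a10g-largex2), 4 samples per GPU, 8 accumulation steps.
print(effective_batch_size(4, 2, 8))  # 64
```

When moving a script from one GPU to two, halving gradient_accumulation_steps keeps the effective batch size unchanged.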

DPO Training (Preference Learning)

Train with preference data for alignment:

hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["trl>=0.12.0", "trackio"]
# ///

from datasets import load_dataset
from trl import DPOTrainer, DPOConfig
import trackio

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

# Create train/eval split
dataset_split = dataset.train_test_split(test_size=0.1, seed=42)

config = DPOConfig(
    output_dir="dpo-model",
    push_to_hub=True,
    hub_model_id="username/dpo-model",
    num_train_epochs=1,
    beta=0.1,  # KL penalty coefficient
    eval_strategy="steps",
    eval_steps=50,
    report_to="trackio",
    run_name="baseline_run",  # Use a meaningful run name
    # max_length=1024,  # Default - only set if you need different sequence length
)

trainer = DPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # Use instruct model as base
    train_dataset=dataset_split["train"],
    eval_dataset=dataset_split["test"],  # IMPORTANT: Provide eval_dataset when eval_strategy is enabled
    args=config,
)

trainer.train()
trainer.push_to_hub()
trackio.finish()
""",
    "flavor": "a10g-large",
    "timeout": "3h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})

For DPO documentation: Use hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer")
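
To make the role of beta concrete, the sketch below computes the standard sigmoid DPO loss for a single preference pair in plain Python. It is an illustration under stated assumptions, not TRL's implementation: `dpo_loss` and its argument names are hypothetical, the inputs are summed sequence log-probabilities, and DPOTrainer works on batches and supports other loss types.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one (chosen, rejected) pair of log-probabilities."""
    # How much more the policy likes each response than the reference model does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # beta scales the margin between the two log-ratios.
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) is equal to log(1 + exp(-x)).
    return math.log1p(math.exp(-logits))


# Policy identical to the reference: the loss equals log(2).
start = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy now prefers the chosen response more than the reference does: lower loss.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(start > improved)  # True
```

A larger beta makes the same log-ratio margin count for more, which penalizes drifting from the reference model more sharply.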

GRPO Training (Online RL)

Group Relative Policy Optimization for online reinforcement learning:

hf_jobs("uv", {
    "script": "https://raw.githubusercontent.com/huggingface/trl/main/examples/scripts/grpo.py",
    "script_args": [
        "--model_name_or_path", "Qwen/Qwen2.5-0.5B-Instruct",
        "--dataset_name", "trl-lib/math_shepherd",
        "--output_dir", "grpo-model",
        "--push_to_hub",
        "--hub_model_id", "username/grpo-model"
    ],
    "flavor": "a10g-large",
    "timeout": "4h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})

For GRPO documentation: Use hf_doc_fetch("https://huggingface.co/docs/trl/grpo_trainer")
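
The "group relative" part of GRPO can be sketched in a few lines: several completions are sampled for the same prompt, and each completion's reward is normalized against the rest of its group. The helper below is a simplified illustration. `group_relative_advantages` is a hypothetical name, and the choice of population standard deviation and the `eps` value are assumptions; TRL's exact normalization may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt; two were judged correct (reward 1.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # positive for the rewarded completions, negative for the others
```

If every completion in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal.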

Trackio Configuration

Use sensible defaults for trackio setup. See references/trackio_guide.md for complete documentation including grouping runs for experiments.

Basic Pattern

import trackio

trackio.init(
    project="my-training",
    run_name="baseline-run",      # Descriptive name the user will recognize
    space_id="username/trackio",  # Default space: {username}/trackio
    config={
        # Keep config minimal - hyperparameters and model/dataset info only
        "model": "Qwen/Qwen2.5-0.5B",
        "dataset": "trl-lib/Capybara",
        "learning_rate": 2e-5,
    }
)

# Your training code...

trackio.finish()

Grouping for Experiments (Optional)

When the user wants to compare related runs, use the group parameter:

# Hyperparameter sweep (one trackio.init call per run)
trackio.init(project="hyperparam-sweep", run_name="lr-0.001", group="lr_0.001")
trackio.init(project="hyperparam-sweep", run_name="lr-0.01", group="lr_0.01")
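
For a larger sweep, it may help to generate the init arguments programmatically so that run names and groups stay consistent. The sketch below only builds the keyword arguments and does not call trackio. `sweep_runs` is a hypothetical helper, and the seed dimension is an assumption added to show several runs sharing one group.

```python
from itertools import product


def sweep_runs(project: str, learning_rates: list[float], seeds: list[int]) -> list[dict]:
    """Build one kwargs dict per run; runs with the same learning rate share a group."""
    runs = []
    for lr, seed in product(learning_rates, seeds):
        runs.append({
            "project": project,
            "run_name": f"lr-{lr}-seed-{seed}",
            "group": f"lr_{lr}",
            "config": {"learning_rate": lr, "seed": seed},
        })
    return runs


runs = sweep_runs("hyperparam-sweep", [0.001, 0.01], [0, 1])
for kwargs in runs:
    print(kwargs["run_name"], "->", kwargs["group"])
```

Each dict could then be passed to trackio.init(**kwargs) inside the job that performs that run.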

Pattern Selection Guide

| Use Case | Pattern | Hardware | Time |
|---|---|---|---|
| SFT training | scripts/train_sft_example.py | a10g-large | 2-6 hours |
| Large dataset (>10K) | Multi-GPU | a10g-largex2 | 4-12 hours |
| Preference learning | DPO Training | a10g-large | 2-4 hours |
| Online RL | GRPO Training | a10g-large | 3-6 hours |

Critical: Evaluation Dataset Requirements

⚠️ IMPORTANT: If you set eval_strategy="steps" or eval_strategy="epoch", you MUST provide an eval_dataset to the trainer, or the training will hang.

CORRECT - With eval dataset:

from trl import SFTConfig, SFTTrainer

# `dataset` is the training dataset loaded earlier in the script
dataset_split = dataset.train_test_split(test_size=0.1, seed=42)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset_split["train"],
    eval_dataset=dataset_split["test"],  # ← MUST provide when eval_strategy is enabled
    args=SFTConfig(eval_strategy="steps"),  # plus your other config options
)

WRONG - Will hang:

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset,
    # NO eval_dataset but eval_strategy="steps" ← WILL HANG
    args=SFTConfig(eval_strategy="steps"),  # plus your other config options
)

Option: Disable evaluation if no eval dataset

config = SFTConfig(
    eval_strategy="no",  # ← Explicitly disable evaluation
    # ... other config
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset,
    # No eval_dataset needed
    args=config,
)
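
One way to catch this mistake before submitting a job is a small pre-flight check run just before the trainer is constructed. This is a minimal sketch: `check_eval_setup` is a hypothetical helper, not a TRL function, and it encodes only the rule stated above.

```python
def check_eval_setup(eval_strategy: str, eval_dataset) -> None:
    """Raise early if evaluation is enabled but no eval dataset was supplied."""
    if eval_strategy != "no" and eval_dataset is None:
        raise ValueError(
            f'eval_strategy="{eval_strategy}" requires an eval_dataset; '
            'pass one to the trainer or set eval_strategy="no".'
        )


# Passes: evaluation is disabled, so no eval dataset is needed.
check_eval_setup("no", None)

# Passes: evaluation is enabled and a (placeholder) eval dataset is supplied.
check_eval_setup("steps", ["example record"])
```

Failing fast on the client side is cheaper than discovering the problem after the job has started and consumed part of its timeout.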

Best Practices

  1. Use train/eval splits - Create evaluation split for monitoring progress
  2. Enable Trackio - Monitor progress in real-time
  3. Add 20-30% buffer to timeout - Account for loading/saving overhead
  4. Test with TRL official scripts first - Use maintained examples before custom code
  5. Always provide eval_dataset - Required whenever eval_strategy is enabled; otherwise set eval_strategy="no"
  6. Use multi-GPU for large models - 7B+ models benefit significantly
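
The timeout buffer from item 3 is simple arithmetic, sketched below. `padded_timeout` is a hypothetical helper. Rounding up to whole hours is an assumption chosen to match the "4h" style of timeout strings used in this guide.

```python
import math


def padded_timeout(estimated_hours: float, buffer: float = 0.3) -> str:
    """Add a fractional buffer to a runtime estimate and round up to whole hours."""
    return f"{math.ceil(estimated_hours * (1 + buffer))}h"


# A run estimated at 3 hours with a 30% buffer needs 3.9 hours, rounded up to 4.
print(padded_timeout(3.0))  # 4h
```

The returned string can be used as the "timeout" value in the hf_jobs configuration.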

See Also

  • scripts/train_sft_example.py - Complete SFT template with Trackio and eval split
  • scripts/train_dpo_example.py - Complete DPO template
  • scripts/train_grpo_example.py - Complete GRPO template
  • references/hardware_guide.md - Detailed hardware specifications
  • references/training_methods.md - Overview of all TRL training methods
  • references/troubleshooting.md - Common issues and solutions