feat: Add validation loss tracking, early stopping, and checkpoint cleanup #2633
NotNANtoN wants to merge 6 commits into huggingface:main
Conversation
This PR adds the ability to track validation loss during training.

Features:
- `validation_fraction` config option to split episodes into train/val sets
- Validation loss computed using inference (`select_action`) for model-agnostic metrics
- L1 and L2 loss metrics logged to wandb under the `val/` prefix
- Early stopping based on validation loss or eval success rate
- `keep_last_n_checkpoints` option to automatically clean up old checkpoints

The validation uses a separate dataset copy without augmentations for clean evaluation. Using `select_action` for inference-based validation makes it policy-agnostic. Backward compatible: defaults maintain existing behavior (no validation split).

Config options:
- `validation_fraction`: 0.0-1.0 (default 0.0, no validation)
- `early_stopping.enable`: bool (default False)
- `early_stopping.patience_steps`: int (default 10000)
- `early_stopping.monitor`: 'val_loss' or 'eval_success'
- `keep_last_n_checkpoints`: int (default 0, keep all)
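As an illustration of the `keep_last_n_checkpoints` cleanup and the symlink-safety mentioned in this PR, here is a minimal sketch. This is not the PR's actual implementation: the `prune_checkpoints` name, the flat `checkpoints/` directory layout, and the assumption that checkpoint directories sort chronologically by name are all assumptions made for the example.

```python
import shutil
from pathlib import Path

def prune_checkpoints(checkpoints_dir: Path, keep_last_n: int) -> list[Path]:
    """Delete all but the newest `keep_last_n` checkpoint directories.

    The `last` symlink is resolved first so that its target is never
    removed, even if it points at an old checkpoint. keep_last_n <= 0
    keeps everything (mirroring the PR's default of 0 = keep all).
    """
    if keep_last_n <= 0:
        return []
    last_link = checkpoints_dir / "last"
    protected = last_link.resolve() if last_link.is_symlink() else None
    # Assumes directory names (e.g. zero-padded step counts) sort chronologically.
    ckpts = sorted(p for p in checkpoints_dir.iterdir()
                   if p.is_dir() and not p.is_symlink())
    to_delete = [p for p in ckpts[:-keep_last_n] if p.resolve() != protected]
    for p in to_delete:
        shutil.rmtree(p)
    return to_delete
```

For example, with checkpoints `000100`, `000200`, `000300` and `last -> 000100`, pruning with `keep_last_n=1` removes only `000200`: `000300` is kept as the newest, and `000100` survives because the symlink target is protected.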
Thanks for the contribution! Can you solve the conflicts?

Thanks! Conflicts are solved now.

Hello @NotNANtoN, thanks again for this contribution. Would it be too much to ask to split this PR into smaller ones? There are a lot of different -unrelated- things happening in this one, which makes the review extensive. I would suggest opening a PR with only the early stopping, for example; once that one is merged we can proceed with the 2 other features added here 😄

Thanks for the feedback! Happy to split this up. Just to confirm: since early stopping requires a validation loss to monitor, should I keep validation tracking + early stopping together in one PR, and split out checkpoint cleanup (`keep_last_n_checkpoints` + logic) as a separate PR? Or did you have a different split in mind?
Summary
Adds a robust validation pipeline to `lerobot-train`. This allows monitoring generalization during training using a separate subset of episodes, enabling early stopping and automatic disk-space management via checkpoint cleanup.

Features
- `validation_fraction` config option to split dataset episodes into train/val sets.
- `early_stopping.shuffle_episodes` (default: `True`) to ensure the validation set is a representative cross-section of the whole dataset, avoiding bias from sequential data collection.
- Validation loss is computed via the policy's `select_action` method. This provides L1 and L2 losses that are comparable across different architectures.
- Early stopping: training halts when the monitored metric (`val_loss` or `eval_success`) stops improving after a set patience period.
- `keep_last_n_checkpoints` automatically prunes old checkpoints to save disk space. The logic is robust, resolving the `last` symlink to ensure the target of the symlink is never deleted.

Design Decisions

- `select_action` vs `forward`: We use `select_action` for validation to provide a "real-world" metric. This captures the model's performance as it would behave in deployment, including any inference-time processing or ensembling.
- `batch_size=1` for validation is a proactive design choice. While not all current policies use internal state, many inference implementations (including those with temporal queuing or history tracking) are designed for single-stream execution. This ensures the validation framework is compatible with the widest range of policies.

Config Options
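The options and defaults described in this PR could be mirrored by a config structure along the following lines. This is a hypothetical sketch for illustration only: the class names `EarlyStoppingConfig` and `ValidationConfig` are invented here, and lerobot's actual config dataclasses may be organized differently.

```python
from dataclasses import dataclass, field

@dataclass
class EarlyStoppingConfig:
    # Disabled by default, preserving existing training behavior.
    enable: bool = False
    # Steps without improvement before training halts.
    patience_steps: int = 10000
    # Metric to watch: "val_loss" or "eval_success".
    monitor: str = "val_loss"

@dataclass
class ValidationConfig:
    # Fraction of episodes held out for validation; 0.0 disables the split.
    validation_fraction: float = 0.0
    # 0 keeps all checkpoints (the backward-compatible default).
    keep_last_n_checkpoints: int = 0
    early_stopping: EarlyStoppingConfig = field(default_factory=EarlyStoppingConfig)
```

With these defaults, instantiating `ValidationConfig()` leaves training unchanged: no validation split, no early stopping, and all checkpoints retained.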
Testing
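The patience-based early stopping described above can be exercised in isolation with a small tracker like the following. This is a hypothetical sketch of the behavior, not the PR's code; the `EarlyStopper` name and its API are assumptions.

```python
class EarlyStopper:
    """Signal a stop when the monitored metric stops improving for `patience_steps`."""

    def __init__(self, patience_steps: int, mode: str = "min"):
        self.patience_steps = patience_steps
        # "min" for a loss like val_loss, "max" for a rate like eval_success.
        self.mode = mode
        self.best = None
        self.steps_since_improvement = 0

    def update(self, value: float, step_delta: int = 1) -> bool:
        """Record a new metric value; return True when training should stop."""
        improved = (
            self.best is None
            or (self.mode == "min" and value < self.best)
            or (self.mode == "max" and value > self.best)
        )
        if improved:
            self.best = value
            self.steps_since_improvement = 0
        else:
            self.steps_since_improvement += step_delta
        return self.steps_since_improvement >= self.patience_steps
```

For example, with `patience_steps=2` in "min" mode, two consecutive validation evaluations without a new best loss are enough to trigger the stop signal.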