
feat: add val_at_end for all algorithms#1863

Merged
yuki-97 merged 3 commits into main from tk/val_at_end
Feb 4, 2026

Conversation


terrykong commented Feb 2, 2026

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

closes #1415

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this
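The author left the usage snippet blank; a minimal configuration sketch of the new flag follows. This is illustrative only — the key nesting mirrors the example YAMLs touched by this PR (e.g. examples/configs/sft.yaml), and the surrounding values are assumptions, not the shipped defaults:

```yaml
# Illustrative excerpt: enable validation on the final training step.
sft:
  val_period: 10      # run validation every 10 steps (0 disables periodic validation)
  val_at_start: false # skip the pre-training validation pass
  val_at_end: true    # also validate on the last step so the final checkpoint
                      # records metrics usable by get_best_checkpoint_path()
```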

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features

    • Added val_at_end configuration parameter across training algorithms, enabling validation execution at the end of training runs.
  • Bug Fixes

    • Improved checkpoint selection logic to gracefully handle cases where metrics are missing from checkpoints, with appropriate warnings and fallback to latest checkpoint.
  • Tests

    • Added comprehensive unit tests for checkpoint selection behavior, including edge cases with missing metrics.

Signed-off-by: Terry Kong <terryk@nvidia.com>

coderabbitai bot commented Feb 2, 2026

📝 Walkthrough


This PR introduces a new val_at_end configuration flag across multiple training algorithms (Distillation, DPO, GRPO, RM, SFT) to enable validation runs at the final training step. Corresponding configuration files, algorithm implementations, and tests are updated to support this feature, alongside improvements to checkpoint selection logic for handling missing metrics.
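The gating logic described above can be sketched as a small predicate. This is a hedged sketch under stated assumptions — `should_validate` is a hypothetical name, not the actual NeMo-RL implementation, which inlines this check in each algorithm's training loop:

```python
def should_validate(step: int, max_steps: int, val_period: int, val_at_end: bool) -> bool:
    """Decide whether to run validation at this 1-indexed training step.

    Combines the pre-existing periodic trigger (val_period) with the
    end-of-training trigger (val_at_end) this PR adds.
    """
    periodic = val_period > 0 and step % val_period == 0
    final = val_at_end and step == max_steps
    return periodic or final
```

With `val_period=4` and `max_steps=10`, periodic validation fires at steps 4 and 8; without `val_at_end`, the final checkpoint at step 10 carries no validation metrics — which is the situation described in the linked issue.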

Changes

Cohort / File(s) → Summary

  • Configuration Files - Validation Flag
    Files: examples/configs/distillation_math.yaml, examples/configs/dpo.yaml, examples/configs/grpo_math_1B.yaml, examples/configs/grpo_math_1B_megatron.yaml, examples/configs/rm.yaml, examples/configs/sft.yaml, examples/configs/sft_openmathinstruct2.yaml, examples/configs/sft_openmathinstruct2_megatron.yaml, examples/configs/vlm_grpo_3B.yaml, examples/configs/vlm_grpo_3B_megatron.yaml, examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml, research/template_project/configs/grpo_math_1B.yaml
    Summary: Added the val_at_end: false configuration flag across all training-algorithm example configs to enable end-of-training validation control.
  • Algorithm Implementations - Validation Feature
    Files: nemo_rl/algorithms/distillation.py, nemo_rl/algorithms/dpo.py, nemo_rl/algorithms/grpo.py, nemo_rl/algorithms/rm.py, nemo_rl/algorithms/sft.py
    Summary: Added a val_at_end: bool field to the respective config TypedDicts; extended validation dataset loading and training loops to trigger validation on the final training step when the flag is enabled, alongside the existing periodic validation.
  • Checkpoint Utility
    Files: nemo_rl/utils/checkpoint.py
    Summary: Enhanced get_best_checkpoint_path to filter out checkpoints missing the target metric, emit warnings for missing metrics, and fall back to the latest checkpoint when no checkpoint contains the metric.
  • Test Updates - Configuration
    Files: tests/functional/test_converter_roundtrip.py, tests/unit/algorithms/test_distillation.py, tests/unit/algorithms/test_dpo.py, tests/unit/algorithms/test_grpo.py, tests/unit/algorithms/test_rm.py, tests/unit/algorithms/test_sft.py
    Summary: Updated test fixtures and configuration dictionaries to include the val_at_end: False flag in master_config blocks.
  • Test Updates - Checkpoint Logic
    Files: tests/unit/utils/test_checkpoint.py
    Summary: Added four new unit tests covering checkpoint selection with missing metrics, filtering behavior, and higher_is_better semantics.
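The checkpoint-selection behavior summarized above can be illustrated with a self-contained sketch. Treat this as an approximation under assumptions: `pick_best_checkpoint` is a hypothetical stand-alone function, while the real logic lives in the checkpoint manager's `get_best_checkpoint_path` in nemo_rl/utils/checkpoint.py:

```python
import warnings


def pick_best_checkpoint(checkpoints, metric_name, higher_is_better=True):
    """Pick the checkpoint with the best value of `metric_name`.

    `checkpoints` is a list of {"step": int, "metrics": dict} records,
    loosely mirroring the per-checkpoint training_info.json contents.
    Checkpoints missing the metric are ignored with a warning; if none
    carry it, the latest checkpoint is returned as a fallback.
    """
    if not checkpoints:
        return None
    with_metric = [c for c in checkpoints if metric_name in c["metrics"]]
    ignored = [c["step"] for c in checkpoints if metric_name not in c["metrics"]]
    if ignored and with_metric:
        warnings.warn(
            f"Ignoring checkpoint(s) at step(s) {ignored} without metric "
            f"'{metric_name}'.",
            stacklevel=2,
        )
    if not with_metric:
        warnings.warn(
            f"No checkpoints contain metric '{metric_name}'; returning the "
            f"latest checkpoint.",
            stacklevel=2,
        )
        return max(checkpoints, key=lambda c: c["step"])
    best = max if higher_is_better else min
    return best(with_metric, key=lambda c: c["metrics"][metric_name])
```

The fallback matters because, before this PR, a final checkpoint saved after the last periodic validation had no validation metric at all, so metric-based selection could fail outright.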

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

CI:L1

Suggested reviewers

  • yuki-97
  • samodi-nv
🚥 Pre-merge checks: ✅ 2 passed, ❌ 2 warnings

❌ Failed checks (2 warnings)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 66.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Results For Major Changes ⚠️ Warning — The PR adds the new val_at_end feature across multiple algorithms, qualifying as a major change that requires documented test results in the PR description. Resolution: update the PR description to document test results, the test scenarios covered, and verification that existing validation behavior has no regressions.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title 'feat: add val_at_end for all algorithms' directly and clearly describes the main change: adding a new val_at_end configuration flag across all training algorithms.


coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nemo_rl/algorithms/sft.py (1)

391-475: ⚠️ Potential issue | 🟠 Major

Fail fast if val_at_end is enabled without a validation dataloader.

Right now val_at_end can be true while val_dataloader is None, which silently skips validation and defeats the feature (no final val metrics). Add a guard when any validation flag is enabled.

🛡️ Suggested guard
     val_period = sft_config["val_period"]
     val_at_start = sft_config["val_at_start"]
     val_at_end = sft_config["val_at_end"]
     max_num_epochs = sft_config["max_num_epochs"]
+
+    if (val_period > 0 or val_at_start or val_at_end) and val_dataloader is None:
+        raise AssertionError(
+            "Validation is enabled but no validation dataset/dataloader was provided."
+        )
🤖 Fix all issues with AI agents
In `@nemo_rl/algorithms/distillation.py`:
- Around line 83-85: Update the inline doc for DistillationConfig.val_at_end to
explicitly state the type (bool), allowed values (True/False), and the
recommended default (True) along with the short rationale ("run validation on
final training step so final checkpoint has validation metrics for
get_best_checkpoint_path()"); then update the exemplar YAMLs under
examples/configs/*.yaml to include val_at_end: true so the default is reflected
in examples. Reference: DistillationConfig and the val_at_end key when making
the edits.

In `@nemo_rl/algorithms/grpo.py`:
- Around line 128-130: The docstring for the GRPOConfig TypedDict is missing the
valid types/values and recommended default for the new key val_at_end; update
the inline comment next to val_at_end (in GRPOConfig) to state that it is a
boolean (True/False), explain its purpose (run validation on the final training
step so final checkpoint contains validation metrics used by
get_best_checkpoint_path()), and declare the recommended default (e.g., False or
True as decided); then update the exemplar YAMLs under examples/configs/*.yaml
to include val_at_end with that same default value so examples reflect the
documented default.

In `@nemo_rl/algorithms/sft.py`:
- Around line 73-75: The SFTConfig key val_at_end lacks a documented type/valid
values and a recommended default; update the SFTConfig TypedDict entry for
val_at_end to state it's a boolean (valid values: True/False), describe its
purpose briefly, and specify the recommended default (set default: False unless
your workflows require final-step validation), and then add val_at_end: false to
the exemplar YAMLs under examples/configs/*.yaml so examples reflect the
declared default; edit the inline comment next to val_at_end and the exemplar
YAMLs accordingly (reference symbol: val_at_end in SFTConfig).
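What the requested inline documentation could look like, sketched on a trimmed-down config. Only the validation-related fields are shown, and `SFTConfigSketch` is a hypothetical name — the real SFTConfig TypedDict in nemo_rl/algorithms/sft.py has many more keys:

```python
from typing import TypedDict


class SFTConfigSketch(TypedDict):
    """Illustrative subset of the SFT config; not the actual SFTConfig."""

    val_period: int     # int >= 0; run validation every N steps (0 disables)
    val_at_start: bool  # True/False; run validation once before training starts
    val_at_end: bool    # True/False; run validation on the final training step so
                        # the final checkpoint has metrics usable by
                        # get_best_checkpoint_path() (default in the example
                        # YAMLs, per this PR: False)
```

A TypedDict carries no runtime defaults, so the documented default only takes effect through the exemplar YAMLs — which is why the review asks for both edits together.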
🧹 Nitpick comments (3)
nemo_rl/utils/checkpoint.py (1)

251-255: Add stacklevel=2 to warnings.warn calls.

Both warnings.warn calls (lines 251 and 258) should include stacklevel=2 so the warning points to the caller rather than this internal method.

🔧 Proposed fix
         warnings.warn(
             f"Ignoring {ignored_count} checkpoint(s) at step(s) {ignored_steps} that do not have "
             f"metric '{self.metric_name}'. Consider enabling val_at_end or adjusting val_period "
-            f"to align with max_steps."
+            f"to align with max_steps.",
+            stacklevel=2,
         )

And similarly for line 258:

         warnings.warn(
             f"No checkpoints contain metric '{self.metric_name}'. Returning latest checkpoint. "
-            f"Consider enabling val_at_end or adjusting val_period to align with max_steps."
+            f"Consider enabling val_at_end or adjusting val_period to align with max_steps.",
+            stacklevel=2,
         )
tests/unit/utils/test_checkpoint.py (2)

352-355: Consider removing unused checkpoint_dir fixture parameter.

The checkpoint_dir fixture is not directly used in this test. If it's needed to ensure the directory exists for checkpoint_manager, the dependency is already handled through the fixture chain.

🔧 Proposed fix
-def test_get_best_checkpoint_path_no_checkpoints(checkpoint_manager, checkpoint_dir):
+def test_get_best_checkpoint_path_no_checkpoints(checkpoint_manager):
     """Test that get_best_checkpoint_path returns None when no checkpoints exist."""
     result = checkpoint_manager.get_best_checkpoint_path()
     assert result is None

379-381: Consider adding strict=True to zip() calls.

While the lists are hardcoded and known to be the same length, adding strict=True is a good defensive practice that catches mismatches during test maintenance.

🔧 Proposed fix for all three occurrences

Line 379:

-    for step, training_info in zip(steps, training_infos):
+    for step, training_info in zip(steps, training_infos, strict=True):

Line 421:

-    for step, training_info in zip(steps, training_infos):
+    for step, training_info in zip(steps, training_infos, strict=True):

Line 458:

-    for step, acc in zip(steps, accuracies):
+    for step, acc in zip(steps, accuracies, strict=True):

@yuki-97 yuki-97 added the CI:L1 Run doctests, unit tests, and functional tests label Feb 3, 2026
@yuki-97 yuki-97 enabled auto-merge (squash) February 3, 2026 06:39
@yuki-97 yuki-97 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 3, 2026
@yuki-97 yuki-97 merged commit 2bb4611 into main Feb 4, 2026
55 of 58 checks passed
@yuki-97 yuki-97 deleted the tk/val_at_end branch February 4, 2026 09:24
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 12, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 9, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>

Labels

CI:L1 Run doctests, unit tests, and functional tests


Development

Successfully merging this pull request may close these issues.

Last step checkpoint save for GRPO creates a checkpoint that doesn't have val_reward in training_info.json
