
feat: add val_at_end for all algorithms#1863

Merged
yuki-97 merged 3 commits into main from tk/val_at_end
Feb 4, 2026

Conversation


terrykong commented Feb 2, 2026

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

closes #1415

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this
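The author left the usage snippet blank; a minimal configuration sketch of the new flag follows. This is illustrative only — the key nesting mirrors the example YAMLs touched by this PR (e.g. examples/configs/sft.yaml), and the surrounding values are assumptions, not the shipped defaults:

```yaml
# Illustrative excerpt: enable validation on the final training step.
sft:
  val_period: 10      # run validation every 10 steps (0 disables periodic validation)
  val_at_start: false # skip the pre-training validation pass
  val_at_end: true    # also validate on the last step so the final checkpoint
                      # records metrics usable by get_best_checkpoint_path()
```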

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features

    • Added val_at_end configuration parameter across training algorithms, enabling validation execution at the end of training runs.
  • Bug Fixes

    • Improved checkpoint selection logic to gracefully handle cases where metrics are missing from checkpoints, with appropriate warnings and fallback to latest checkpoint.
  • Tests

    • Added comprehensive unit tests for checkpoint selection behavior, including edge cases with missing metrics.

Signed-off-by: Terry Kong <terryk@nvidia.com>

coderabbitai bot commented Feb 2, 2026

📝 Walkthrough


This PR introduces a new val_at_end configuration flag across multiple training algorithms (Distillation, DPO, GRPO, RM, SFT) to enable validation runs at the final training step. Corresponding configuration files, algorithm implementations, and tests are updated to support this feature, alongside improvements to checkpoint selection logic for handling missing metrics.
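The gating logic described above can be sketched as a small predicate. This is a hedged sketch under stated assumptions — `should_validate` is a hypothetical name, not the actual NeMo-RL implementation, which inlines this check in each algorithm's training loop:

```python
def should_validate(step: int, max_steps: int, val_period: int, val_at_end: bool) -> bool:
    """Decide whether to run validation at this 1-indexed training step.

    Combines the pre-existing periodic trigger (val_period) with the
    end-of-training trigger (val_at_end) this PR adds.
    """
    periodic = val_period > 0 and step % val_period == 0
    final = val_at_end and step == max_steps
    return periodic or final
```

With `val_period=4` and `max_steps=10`, periodic validation fires at steps 4 and 8; without `val_at_end`, the final checkpoint at step 10 carries no validation metrics — which is the situation described in the linked issue.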

Changes

Cohort / File(s) → Summary

  • Configuration Files - Validation Flag
    Files: examples/configs/distillation_math.yaml, examples/configs/dpo.yaml, examples/configs/grpo_math_1B.yaml, examples/configs/grpo_math_1B_megatron.yaml, examples/configs/rm.yaml, examples/configs/sft.yaml, examples/configs/sft_openmathinstruct2.yaml, examples/configs/sft_openmathinstruct2_megatron.yaml, examples/configs/vlm_grpo_3B.yaml, examples/configs/vlm_grpo_3B_megatron.yaml, examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml, research/template_project/configs/grpo_math_1B.yaml
    Summary: Added the val_at_end: false configuration flag across all training-algorithm example configs to enable end-of-training validation control.
  • Algorithm Implementations - Validation Feature
    Files: nemo_rl/algorithms/distillation.py, nemo_rl/algorithms/dpo.py, nemo_rl/algorithms/grpo.py, nemo_rl/algorithms/rm.py, nemo_rl/algorithms/sft.py
    Summary: Added a val_at_end: bool field to the respective config TypedDicts; extended validation dataset loading and training loops to trigger validation on the final training step when the flag is enabled, alongside the existing periodic validation.
  • Checkpoint Utility
    Files: nemo_rl/utils/checkpoint.py
    Summary: Enhanced get_best_checkpoint_path to filter out checkpoints missing the target metric, emit warnings for missing metrics, and fall back to the latest checkpoint when no checkpoint contains the metric.
  • Test Updates - Configuration
    Files: tests/functional/test_converter_roundtrip.py, tests/unit/algorithms/test_distillation.py, tests/unit/algorithms/test_dpo.py, tests/unit/algorithms/test_grpo.py, tests/unit/algorithms/test_rm.py, tests/unit/algorithms/test_sft.py
    Summary: Updated test fixtures and configuration dictionaries to include the val_at_end: False flag in master_config blocks.
  • Test Updates - Checkpoint Logic
    Files: tests/unit/utils/test_checkpoint.py
    Summary: Added four new unit tests covering checkpoint selection with missing metrics, filtering behavior, and higher_is_better semantics.
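The checkpoint-selection behavior summarized above can be illustrated with a self-contained sketch. Treat this as an approximation under assumptions: `pick_best_checkpoint` is a hypothetical stand-alone function, while the real logic lives in the checkpoint manager's `get_best_checkpoint_path` in nemo_rl/utils/checkpoint.py:

```python
import warnings


def pick_best_checkpoint(checkpoints, metric_name, higher_is_better=True):
    """Pick the checkpoint with the best value of `metric_name`.

    `checkpoints` is a list of {"step": int, "metrics": dict} records,
    loosely mirroring the per-checkpoint training_info.json contents.
    Checkpoints missing the metric are ignored with a warning; if none
    carry it, the latest checkpoint is returned as a fallback.
    """
    if not checkpoints:
        return None
    with_metric = [c for c in checkpoints if metric_name in c["metrics"]]
    ignored = [c["step"] for c in checkpoints if metric_name not in c["metrics"]]
    if ignored and with_metric:
        warnings.warn(
            f"Ignoring checkpoint(s) at step(s) {ignored} without metric "
            f"'{metric_name}'.",
            stacklevel=2,
        )
    if not with_metric:
        warnings.warn(
            f"No checkpoints contain metric '{metric_name}'; returning the "
            f"latest checkpoint.",
            stacklevel=2,
        )
        return max(checkpoints, key=lambda c: c["step"])
    best = max if higher_is_better else min
    return best(with_metric, key=lambda c: c["metrics"][metric_name])
```

The fallback matters because, before this PR, a final checkpoint saved after the last periodic validation had no validation metric at all, so metric-based selection could fail outright.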

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

CI:L1

Suggested reviewers

  • yuki-97
  • samodi-nv
🚥 Pre-merge checks: ✅ 2 passed, ❌ 2 warnings

❌ Failed checks (2 warnings)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 66.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Results For Major Changes ⚠️ Warning — The PR adds the new val_at_end feature across multiple algorithms, qualifying as a major change that requires documented test results in the PR description. Resolution: update the PR description to document test results, the test scenarios covered, and verification that existing validation behavior has no regressions.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title 'feat: add val_at_end for all algorithms' directly and clearly describes the main change: adding a new val_at_end configuration flag across all training algorithms.


coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nemo_rl/algorithms/sft.py (1)

391-475: ⚠️ Potential issue | 🟠 Major

Fail fast if val_at_end is enabled without a validation dataloader.

Right now val_at_end can be true while val_dataloader is None, which silently skips validation and defeats the feature (no final val metrics). Add a guard when any validation flag is enabled.

🛡️ Suggested guard
     val_period = sft_config["val_period"]
     val_at_start = sft_config["val_at_start"]
     val_at_end = sft_config["val_at_end"]
     max_num_epochs = sft_config["max_num_epochs"]
+
+    if (val_period > 0 or val_at_start or val_at_end) and val_dataloader is None:
+        raise AssertionError(
+            "Validation is enabled but no validation dataset/dataloader was provided."
+        )
🤖 Fix all issues with AI agents
In `@nemo_rl/algorithms/distillation.py`:
- Around line 83-85: Update the inline doc for DistillationConfig.val_at_end to
explicitly state the type (bool), allowed values (True/False), and the
recommended default (True) along with the short rationale ("run validation on
final training step so final checkpoint has validation metrics for
get_best_checkpoint_path()"); then update the exemplar YAMLs under
examples/configs/*.yaml to include val_at_end: true so the default is reflected
in examples. Reference: DistillationConfig and the val_at_end key when making
the edits.

In `@nemo_rl/algorithms/grpo.py`:
- Around line 128-130: The docstring for the GRPOConfig TypedDict is missing the
valid types/values and recommended default for the new key val_at_end; update
the inline comment next to val_at_end (in GRPOConfig) to state that it is a
boolean (True/False), explain its purpose (run validation on the final training
step so final checkpoint contains validation metrics used by
get_best_checkpoint_path()), and declare the recommended default (e.g., False or
True as decided); then update the exemplar YAMLs under examples/configs/*.yaml
to include val_at_end with that same default value so examples reflect the
documented default.

In `@nemo_rl/algorithms/sft.py`:
- Around line 73-75: The SFTConfig key val_at_end lacks a documented type/valid
values and a recommended default; update the SFTConfig TypedDict entry for
val_at_end to state it's a boolean (valid values: True/False), describe its
purpose briefly, and specify the recommended default (set default: False unless
your workflows require final-step validation), and then add val_at_end: false to
the exemplar YAMLs under examples/configs/*.yaml so examples reflect the
declared default; edit the inline comment next to val_at_end and the exemplar
YAMLs accordingly (reference symbol: val_at_end in SFTConfig).
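What the requested inline documentation could look like, sketched on a trimmed-down config. Only the validation-related fields are shown, and `SFTConfigSketch` is a hypothetical name — the real SFTConfig TypedDict in nemo_rl/algorithms/sft.py has many more keys:

```python
from typing import TypedDict


class SFTConfigSketch(TypedDict):
    """Illustrative subset of the SFT config; not the actual SFTConfig."""

    val_period: int     # int >= 0; run validation every N steps (0 disables)
    val_at_start: bool  # True/False; run validation once before training starts
    val_at_end: bool    # True/False; run validation on the final training step so
                        # the final checkpoint has metrics usable by
                        # get_best_checkpoint_path() (default in the example
                        # YAMLs, per this PR: False)
```

A TypedDict carries no runtime defaults, so the documented default only takes effect through the exemplar YAMLs — which is why the review asks for both edits together.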
🧹 Nitpick comments (3)
nemo_rl/utils/checkpoint.py (1)

251-255: Add stacklevel=2 to warnings.warn calls.

Both warnings.warn calls (lines 251 and 258) should include stacklevel=2 so the warning points to the caller rather than this internal method.

🔧 Proposed fix
         warnings.warn(
             f"Ignoring {ignored_count} checkpoint(s) at step(s) {ignored_steps} that do not have "
             f"metric '{self.metric_name}'. Consider enabling val_at_end or adjusting val_period "
-            f"to align with max_steps."
+            f"to align with max_steps.",
+            stacklevel=2,
         )

And similarly for line 258:

         warnings.warn(
             f"No checkpoints contain metric '{self.metric_name}'. Returning latest checkpoint. "
-            f"Consider enabling val_at_end or adjusting val_period to align with max_steps."
+            f"Consider enabling val_at_end or adjusting val_period to align with max_steps.",
+            stacklevel=2,
         )
tests/unit/utils/test_checkpoint.py (2)

352-355: Consider removing unused checkpoint_dir fixture parameter.

The checkpoint_dir fixture is not directly used in this test. If it's needed to ensure the directory exists for checkpoint_manager, the dependency is already handled through the fixture chain.

🔧 Proposed fix
-def test_get_best_checkpoint_path_no_checkpoints(checkpoint_manager, checkpoint_dir):
+def test_get_best_checkpoint_path_no_checkpoints(checkpoint_manager):
     """Test that get_best_checkpoint_path returns None when no checkpoints exist."""
     result = checkpoint_manager.get_best_checkpoint_path()
     assert result is None

379-381: Consider adding strict=True to zip() calls.

While the lists are hardcoded and known to be the same length, adding strict=True is a good defensive practice that catches mismatches during test maintenance.

🔧 Proposed fix for all three occurrences

Line 379:

-    for step, training_info in zip(steps, training_infos):
+    for step, training_info in zip(steps, training_infos, strict=True):

Line 421:

-    for step, training_info in zip(steps, training_infos):
+    for step, training_info in zip(steps, training_infos, strict=True):

Line 458:

-    for step, acc in zip(steps, accuracies):
+    for step, acc in zip(steps, accuracies, strict=True):

@yuki-97 yuki-97 added the CI:L1 Run doctests, unit tests, and functional tests label Feb 3, 2026
@yuki-97 yuki-97 enabled auto-merge (squash) February 3, 2026 06:39
@yuki-97 yuki-97 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 3, 2026
@yuki-97 yuki-97 merged commit 2bb4611 into main Feb 4, 2026
55 of 58 checks passed
@yuki-97 yuki-97 deleted the tk/val_at_end branch February 4, 2026 09:24
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 12, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 9, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>

Labels

CI:L1 Run doctests, unit tests, and functional tests


Development

Successfully merging this pull request may close these issues.

Last step checkpoint save for GRPO creates a checkpoint that doesn't have val_reward in training_info.json
