chore: fix grpo functional test metric by RayenTian · Pull Request #1643 · NVIDIA-NeMo/RL

RayenTian · 2025-12-16T11:26:30Z

What does this PR do ?

Reduce the functional test acceptance threshold for train/gen_kl_error from 1.05 to 0.001 to catch subtle regressions in generation logprob/KL alignment early.

Related Context

Link: #1506 (comment)
gen_kl_error should actually be pretty low, usually we set it to be <1e-3.

Summary by CodeRabbit

Tests
- Updated validation thresholds for training metrics to enforce stricter quality standards during testing procedures.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Signed-off-by: ruit <ruit@nvidia.com>

coderabbitai · 2025-12-16T11:34:58Z

📝 Walkthrough

Walkthrough

A test threshold expectation is adjusted in the GRPO test suite, reducing the maximum tolerance for the train/gen_kl_error metric from 1.05 to 0.001, enforcing stricter validation criteria during test execution.

Changes

Cohort / File(s)	Summary
Test threshold adjustment `tests/functional/grpo.sh`, `tests/check_metrics.py`	Reduced max(`data["train/gen_kl_error"]`) threshold from 1.05 to 0.001 for stricter metric validation

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Simple numeric threshold modification with no logic or control flow changes
Verify that the new threshold is intentional and achievable under test conditions

Suggested labels

CI:L1

Suggested reviewers

yuki-97
joyang-nv

Pre-merge checks and finishing touches

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'chore: fix grpo functional test metric' directly describes the main change: adjusting the functional test acceptance threshold for a GRPO metric.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Test Results For Major Changes	✅ Passed	Changes are minor test threshold adjustments in functional test suite, not production code modifications. No new features, breaking changes, or significant refactoring introduced.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch ruit/fix_grpo_functional_test

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Signed-off-by: ruit <ruit@nvidia.com>

Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>

Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

Signed-off-by: ruit <ruit@nvidia.com>

fix grpo functional test metric

2bf1c23

Signed-off-by: ruit <ruit@nvidia.com>

RayenTian requested a review from yuki-97 December 16, 2025 11:26

RayenTian marked this pull request as ready for review December 16, 2025 11:31

RayenTian requested a review from a team as a code owner December 16, 2025 11:31

RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Dec 16, 2025

RayenTian temporarily deployed to nemo-ci December 16, 2025 11:39 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci December 16, 2025 11:43 — with GitHub Actions Inactive

yuki-97 requested a review from terrykong December 16, 2025 11:46

yuki-97 approved these changes Dec 16, 2025

View reviewed changes

RayenTian temporarily deployed to nemo-ci December 16, 2025 13:21 — with GitHub Actions Inactive

terrykong approved these changes Dec 16, 2025

View reviewed changes

terrykong merged commit 52cebdf into main Dec 16, 2025
48 of 49 checks passed

terrykong deleted the ruit/fix_grpo_functional_test branch December 16, 2025 18:42

DeL-TaiseiOzaki pushed a commit to DeL-TaiseiOzaki/RL that referenced this pull request Jan 8, 2026

chore: fix grpo functional test metric (NVIDIA-NeMo#1643)

1856ed1

Signed-off-by: ruit <ruit@nvidia.com>

parthmannan pushed a commit to parthmannan/RL that referenced this pull request Jan 15, 2026

chore: fix grpo functional test metric (NVIDIA-NeMo#1643)

1143d7a

Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>

yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 12, 2026

chore: fix grpo functional test metric (NVIDIA-NeMo#1643)

4dc303a

Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026

chore: fix grpo functional test metric (NVIDIA-NeMo#1643)

91421ec

Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

seonjinn pushed a commit that referenced this pull request Mar 8, 2026

chore: fix grpo functional test metric (#1643)

eab710b

Signed-off-by: ruit <ruit@nvidia.com>

seonjinn pushed a commit that referenced this pull request Mar 8, 2026

chore: fix grpo functional test metric (#1643)

42a00b5

Signed-off-by: ruit <ruit@nvidia.com>

seonjinn pushed a commit that referenced this pull request Mar 9, 2026

chore: fix grpo functional test metric (#1643)

1fd6999

Signed-off-by: ruit <ruit@nvidia.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: fix grpo functional test metric#1643

chore: fix grpo functional test metric#1643
terrykong merged 1 commit intomainfrom
ruit/fix_grpo_functional_test

RayenTian commented Dec 16, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Dec 16, 2025 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

RayenTian commented Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do ?

Related Context

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

RayenTian commented Dec 16, 2025 •

edited

Loading

coderabbitai bot commented Dec 16, 2025 •

edited

Loading