Skip to content

fix: Fix adv estimator configs#1994

Merged
terrykong merged 1 commit intomainfrom
yifu/fix_configs
Feb 21, 2026
Merged

fix: Fix adv estimator configs#1994
terrykong merged 1 commit intomainfrom
yifu/fix_configs

Conversation

@yfw
Copy link
Contributor

@yfw yfw commented Feb 20, 2026

What does this PR do ?

Previously there were cases where the grpo configs and adv_estimator configs did not match.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • Chores
    • Externalized GRPO training configuration parameters (normalize_rewards and use_leave_one_out_baseline) from hardcoded values to template variables across example and template configurations, enabling flexible runtime configuration management.

Previously there were cases where the grpo configs and adv_estimator
configs did not match

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
@yfw yfw requested review from a team and terrykong as code owners February 20, 2026 19:26
@yfw yfw changed the title Fix adv estimator configs fix: Fix adv estimator configs Feb 20, 2026
@yfw yfw added the CI:L1 Run doctests, unit tests, and functional tests label Feb 20, 2026
@yfw yfw removed the request for review from terrykong February 20, 2026 19:27
@yfw yfw requested review from hijkzzz and terrykong February 20, 2026 19:27
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 20, 2026

No actionable comments were generated in the recent review. 🎉


📝 Walkthrough

Walkthrough

The PR replaces hardcoded boolean values in the adv_estimator configuration across five YAML configuration files with template variable references, allowing these values to be resolved at runtime from the grpo configuration namespace instead of being statically defined.

Changes

Cohort / File(s) Summary
GRPO Configuration Updates
examples/configs/grpo_math_1B.yaml, examples/configs/vlm_grpo_3B.yaml, examples/configs/vlm_grpo_3B_megatron.yaml, examples/nemo_gym/grpo_workplace_assistant_nemotron_nano_v2_9b.yaml, research/template_project/configs/grpo_math_1B.yaml
Replaced hardcoded boolean values in adv_estimator with template variable references: normalize_rewards: true${grpo.normalize_rewards} and use_leave_one_out_baseline: false${grpo.use_leave_one_out_baseline}.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

CI:L1

Suggested reviewers

  • parthchadha
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Test Results For Major Changes ⚠️ Warning PR modifies algorithm hyperparameters in GRPO configs (normalize_rewards, use_leave_one_out_baseline) without test results documentation. Include test results demonstrating no training regressions; run benchmarks on affected configs and document loss curves, convergence metrics.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: Fix adv estimator configs' directly matches the PR's core objective to fix adv estimator configuration mismatches, clearly summarizing the main change.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch yifu/fix_configs

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@terrykong terrykong enabled auto-merge (squash) February 20, 2026 20:19
@terrykong terrykong merged commit 8630deb into main Feb 21, 2026
43 of 46 checks passed
@terrykong terrykong deleted the yifu/fix_configs branch February 21, 2026 03:02
sharonyu-115 pushed a commit to sharonyu-115/NeMo-RL that referenced this pull request Feb 28, 2026
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
seonjinn pushed a commit that referenced this pull request Mar 9, 2026
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:L1 Run doctests, unit tests, and functional tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants