Fix #447: norm_adv_by_std_in_grpo should be from rllm.algorithm.norm_... #471
Merged
kylemontgomery1 merged 1 commit into rllm-org:main on Apr 4, 2026
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator

Thanks for catching this!
Closes #447
Summary
`norm_adv_by_std_in_grpo` is defined under `rllm.algorithm` in config files, but was mistakenly being read from `rllm.stepwise_advantage` in two places. This caused the flag to always fall back to the default value (`True`) rather than respecting the user's configured value.
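For illustration (not code from this repo), a minimal sketch of the failure mode, assuming an `OmegaConf` config shaped like the `rllm.algorithm` / `rllm.stepwise_advantage` layout described above:

```python
from omegaconf import OmegaConf

# Minimal config mirroring the layout described above: the user disables the
# flag under rllm.algorithm, and rllm.stepwise_advantage does not define it.
config = OmegaConf.create({
    "rllm": {
        "algorithm": {"norm_adv_by_std_in_grpo": False},
        "stepwise_advantage": {},
    }
})

# Buggy lookup (pre-fix): the key is absent from stepwise_advantage,
# so this silently falls back to the default True.
buggy = config.rllm.stepwise_advantage.get("norm_adv_by_std_in_grpo", True)

# Corrected lookup: reads the user's actual setting from rllm.algorithm.
fixed = config.rllm.algorithm.get("norm_adv_by_std_in_grpo", True)

print(buggy, fixed)  # True False -> the user's False was being ignored
```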
Type of change

Bug fix.

What changed
- `rllm/experimental/common/config.py` (`AlgorithmConfig.from_config`): changed `config.rllm.stepwise_advantage.get("norm_adv_by_std_in_grpo", True)` to `config.rllm.algorithm.get("norm_adv_by_std_in_grpo", True)`
- `rllm/experimental/unified_trainer.py` (`UnifiedTrainer`): applied the same fix, changing `self.rllm_config.stepwise_advantage.get("norm_adv_by_std_in_grpo", True)` to `self.rllm_config.algorithm.get("norm_adv_by_std_in_grpo", True)`
- Added `tests/unified_trainer/test_algorithm_config.py` with two tests that verify `AlgorithmConfig.from_config` correctly reads `norm_adv_by_std_in_grpo` from `rllm.algorithm` (with the key intentionally absent from `stepwise_advantage` to catch any regression)

Validation
- `pre-commit run --all-files`
- `pytest tests/unified_trainer/test_algorithm_config.py`

Validation details:
The tests in `test_algorithm_config.py` cover both `True` and `False` values for `norm_adv_by_std_in_grpo`, using a minimal `OmegaConf` config that omits the key from `rllm.stepwise_advantage` to confirm the correct lookup path.
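For reference, a hypothetical pytest-style sketch of the kind of check described above; the exact `AlgorithmConfig.from_config` signature, the attribute it exposes, and the minimal config fields it requires are assumptions, not code copied from this PR:

```python
import pytest
from omegaconf import OmegaConf

# Hypothetical sketch of the regression check; the real tests live in
# tests/unified_trainer/test_algorithm_config.py and may construct the
# config and call from_config differently.
from rllm.experimental.common.config import AlgorithmConfig


@pytest.mark.parametrize("flag", [True, False])
def test_norm_adv_by_std_in_grpo_read_from_algorithm(flag):
    # The key is set only under rllm.algorithm and deliberately omitted
    # from rllm.stepwise_advantage, so a lookup against the wrong section
    # would fall back to the default and miss flag=False.
    config = OmegaConf.create({
        "rllm": {
            "algorithm": {"norm_adv_by_std_in_grpo": flag},
            "stepwise_advantage": {},
        }
    })

    algo_config = AlgorithmConfig.from_config(config)  # assumed signature
    assert algo_config.norm_adv_by_std_in_grpo == flag  # assumed attribute
```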
Breaking changes / migration notes

Docs / examples
Related issues / PRs
Screenshots / logs
N/A
This PR was created with AI assistance (Claude). The changes were reviewed by quality gates and a critic model before submission.