Skip to content

Fix #447: norm_adv_by_std_in_grpo should be from rllm.algorithm.norm_...#471

Merged
kylemontgomery1 merged 1 commit intorllm-org:mainfrom
JiwaniZakir:fix/447-norm-adv-by-std-in-grpo-should-be-from-r
Apr 4, 2026
Merged

Fix #447: norm_adv_by_std_in_grpo should be from rllm.algorithm.norm_...#471
kylemontgomery1 merged 1 commit intorllm-org:mainfrom
JiwaniZakir:fix/447-norm-adv-by-std-in-grpo-should-be-from-r

Conversation

@JiwaniZakir
Copy link
Copy Markdown
Contributor

Closes #447

Summary

norm_adv_by_std_in_grpo is defined under rllm.algorithm in config files, but was mistakenly being read from rllm.stepwise_advantage in two places. This caused the flag to always fall back to the default value (True) rather than respecting the user's configured value.

Type of change

  • Feature
  • Fix
  • Docs
  • Refactor
  • Example / Project
  • Infra / CI

What changed

  • In rllm/experimental/common/config.py (AlgorithmConfig.from_config): changed config.rllm.stepwise_advantage.get("norm_adv_by_std_in_grpo", True) to config.rllm.algorithm.get("norm_adv_by_std_in_grpo", True)
  • In rllm/experimental/unified_trainer.py (UnifiedTrainer): applied the same fix, changing self.rllm_config.stepwise_advantage.get("norm_adv_by_std_in_grpo", True) to self.rllm_config.algorithm.get("norm_adv_by_std_in_grpo", True)
  • Added tests/unified_trainer/test_algorithm_config.py with two tests that verify AlgorithmConfig.from_config correctly reads norm_adv_by_std_in_grpo from rllm.algorithm (with the key intentionally absent from stepwise_advantage to catch any regression)

Validation

  • pre-commit run --all-files
  • Targeted tests: pytest tests/unified_trainer/test_algorithm_config.py
  • Manual validation performed
  • Not run (reason below)

Validation details:

  • New tests in test_algorithm_config.py cover both True and False values for norm_adv_by_std_in_grpo, using a minimal OmegaConf config that omits the key from rllm.stepwise_advantage to confirm the correct lookup path.

Breaking changes / migration notes

  • None

Docs / examples

  • Not needed

Related issues / PRs

Screenshots / logs

N/A


This PR was created with AI assistance (Claude). The changes were reviewed by quality gates and a critic model before submission.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kylemontgomery1
Copy link
Copy Markdown
Collaborator

Thanks for catching this!

@kylemontgomery1 kylemontgomery1 merged commit 61a5145 into rllm-org:main Apr 4, 2026
0 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

norm_adv_by_std_in_grpo should be from rllm.algorithm.norm_adv_by_std_in_grpo?

2 participants