Describe the bug
When running the Nemotron HelpSteer3 GRPO recipe (examples/configs/recipes/llm/grpo-helpsteer3-llama-3.3-nemotron-super-49b-v1.5-4n8g-fsdp2tp8.yaml) on the HelpSteer3 dataset, log_prob_error is extremely high. This happens both before and after PR #1506.
