Conversation
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Force-pushed from c619e19 to c58334f.
parthchadha approved these changes (Mar 21, 2025)
parthchadha reviewed (Mar 21, 2025)
terrykong approved these changes (Mar 21, 2025)
parthchadha pushed a commit that referenced this pull request (Mar 21, 2025)
Signed-off-by: Sahil Jain <sahilj@nvidia.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: Parth Chadha <pchadha@nvidia.com>
KiddoZhu pushed a commit that referenced this pull request (May 6, 2025)
Signed-off-by: Sahil Jain <sahilj@nvidia.com> Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Investigating precision weirdness...
For SFT, it looks like we need to run in fp32: the training curves look worse without it.
In GRPO, the behavior is peculiar:
- With fp32, convergence is better, but the run eventually diverges. Logprob errors and KL are high, which could be due to a bad refit or to vLLM's bf16 inference being a poor approximation of the "real" fp32 probabilities.
- With either bf16 or mixed precision (via FSDP mixed precision), convergence is worse, but training is stable and logprob errors stay stable.
(Plot legend: green = fp32, pink = bf16, blue = bf16-mixed.)
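As a concrete diagnostic for the logprob/KL gap above, here is a minimal sketch (not this PR's code; `trainer_logprobs` and `vllm_logprobs` are assumed to be per-token log-probabilities of the sampled tokens, shape `[num_tokens]`):

```python
import torch

def logprob_error_and_kl(trainer_logprobs: torch.Tensor,
                         vllm_logprobs: torch.Tensor) -> tuple[float, float]:
    # Mean absolute per-token logprob error between the fp32 trainer
    # policy and vLLM's bf16 evaluation of the "same" policy.
    err = (trainer_logprobs - vllm_logprobs).abs().mean().item()
    # Tokens were sampled from vLLM, so estimate KL(vllm || trainer)
    # with the k3 single-sample estimator E[r - 1 - log r],
    # where r = p_trainer / p_vllm at the sampled tokens.
    log_ratio = trainer_logprobs - vllm_logprobs
    kl = (log_ratio.exp() - 1.0 - log_ratio).mean().item()
    return err, kl
```

If refit and inference precision were perfect, both numbers would be near zero; a high, growing value matches the divergence seen in the fp32 runs.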
I'm not convinced mixed precision as implemented here is doing anything at all. Needs further investigation.
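One way to investigate is to confirm the bf16 compute path is actually live. A minimal sketch using the public FSDP API (assumes an initialized process group and an fp32 `base_model`; the hooked module path is hypothetical, not this PR's code):

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

bf16_policy = MixedPrecision(
    param_dtype=torch.bfloat16,   # parameters cast to bf16 for forward/backward
    reduce_dtype=torch.bfloat16,  # gradients all-reduced in bf16
    buffer_dtype=torch.bfloat16,
)
model = FSDP(base_model, mixed_precision=bf16_policy)

def check_dtype(module, inputs, output):
    # If mixed precision is doing anything, forward activations should be
    # bf16 even though the sharded "master" params remain fp32.
    assert output.dtype == torch.bfloat16, f"got {output.dtype}"

# Hypothetical module path; pick any real submodule of your model.
some_layer = dict(model.named_modules())["_fsdp_wrapped_module.lm_head"]
some_layer.register_forward_hook(check_dtype)
```

If the assertion fires with `torch.float32`, the mixed-precision policy is not being applied on the forward path.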
Until then, this PR makes precision configurable and sets the defaults to float32 for SFT and bfloat16 for GRPO.
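Roughly the shape of the change, as a sketch (the keys and names here are illustrative, not necessarily this PR's actual config):

```python
import torch

PRECISION_DTYPES = {
    "float32": torch.float32,
    "bfloat16": torch.bfloat16,
    "bfloat16-mixed": torch.bfloat16,  # fp32 master weights, bf16 compute
}

# Per-algorithm defaults matching the findings above.
DEFAULT_PRECISION = {
    "sft": "float32",    # fp32 needed for good SFT curves
    "grpo": "bfloat16",  # bf16 is stable for GRPO; fp32 eventually diverges
}

def resolve_precision(algo: str, precision: str | None = None) -> torch.dtype:
    # An explicit `precision` setting overrides the per-algorithm default.
    return PRECISION_DTYPES[precision or DEFAULT_PRECISION[algo]]
```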