feat: Configurable precision #19

Merged
terrykong merged 3 commits into main from sahilj/prec on Mar 21, 2025
Conversation

@SahilJain314
Contributor

Investigating precision weirdness...

It looks like for SFT, we need to run fp32 (curves look worse without it).

In GRPO, we have peculiar behavior:
With fp32, convergence is better but training eventually diverges; logprob errors and KL are high (possibly due to a bad refit, or to vLLM's bf16 inference being a poor approximation of the 'real' fp32 probabilities). With either bf16 or mixed precision (via FSDP mixed precision), convergence is worse, but training is stable and logprob errors stay stable.

[Two convergence plots. Green: fp32, pink: bf16, blue: bf16-mixed.]

I'm not convinced mixed precision as implemented here is doing anything at all. Needs further investigation.

Until then, this PR makes precision configurable and sets the default precision to float32 for SFT and bfloat16 for GRPO.
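As a rough sketch of what "configurable precision with per-algorithm defaults" could look like (the names `PRECISION_DEFAULTS` and `resolve_precision` below are illustrative, not this PR's actual API):

```python
from typing import Optional

# Hypothetical per-algorithm defaults matching the behavior described above:
# fp32 for SFT (bf16 curves look worse), bf16 for GRPO (fp32 eventually diverges).
PRECISION_DEFAULTS = {
    "sft": "float32",
    "grpo": "bfloat16",
}

def resolve_precision(algo: str, override: Optional[str] = None) -> str:
    """Return the configured precision, falling back to the per-algorithm default."""
    return override or PRECISION_DEFAULTS[algo]
```

A config-level override (e.g. forcing fp32 for a GRPO run) would then take precedence over the default.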

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 21, 2025
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
@github-actions github-actions bot removed the documentation Improvements or additions to documentation label Mar 21, 2025
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
@terrykong terrykong merged commit 9515e47 into main Mar 21, 2025
5 checks passed
@terrykong terrykong deleted the sahilj/prec branch March 21, 2025 18:33
parthchadha pushed a commit that referenced this pull request Mar 21, 2025
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
KiddoZhu pushed a commit that referenced this pull request May 6, 2025
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>