Add check for bf16 in deepspeed inference #16973

Merged: 9 commits, Mar 21, 2023

@colehawkins colehawkins commented Mar 6, 2023

What does this PR do?

Fixes #16298 (comment) by checking for both fp16 and bf16 in the DeepSpeed config before initializing inference.

Previously there was no check for the bf16 dtype, so a model could be initialized with the wrong dtype when calling trainer.test and/or trainer.validate, both of which execute this code path.
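The check described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the helper names `build_inference_config` and `inference_dtype_name` are hypothetical, and the only assumption about the config is the usual DeepSpeed layout in which the `fp16` and `bf16` sections each carry an `enabled` flag.

```python
from typing import Any, Dict


def build_inference_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: copy every enabled precision section
    ("fp16" and "bf16") from the full config into the inference config."""
    inference_config: Dict[str, Any] = {}
    # Checking both keys is the point of the fix: looking only at "fp16"
    # would silently drop a bf16 setting.
    for key in ("fp16", "bf16"):
        section = config.get(key)
        if isinstance(section, dict) and section.get("enabled", False):
            inference_config[key] = dict(section)
    return inference_config


def inference_dtype_name(config: Dict[str, Any]) -> str:
    """Hypothetical helper: name the dtype the model should be initialized with."""
    enabled = build_inference_config(config)
    if "fp16" in enabled:
        return "float16"
    if "bf16" in enabled:
        return "bfloat16"
    # Neither reduced-precision section is enabled, so fall back to full precision.
    return "float32"


if __name__ == "__main__":
    bf16_config = {"bf16": {"enabled": True}, "zero_optimization": {"stage": 3}}
    print(build_inference_config(bf16_config))
    print(inference_dtype_name(bf16_config))
```

With a check that only looked at `fp16`, the `bf16_config` above would fall through to the full-precision default, which is the kind of dtype mismatch the description reports for trainer.test and trainer.validate.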

github-actions bot added the pl (Generic label for PyTorch Lightning package) label Mar 6, 2023

@awaelchli awaelchli left a comment

awaelchli added this to the v1.9.x milestone Mar 6, 2023
awaelchli added the bug (Something isn't working) label Mar 6, 2023
carmocca added the community (This PR is from the community) label Mar 7, 2023
awaelchli self-assigned this Mar 21, 2023
mergify bot added the ready (PRs ready to be merged) label Mar 21, 2023
awaelchli enabled auto-merge (squash) March 21, 2023 15:20
mergify bot added the has conflicts label and removed the ready (PRs ready to be merged) label Mar 21, 2023
mergify bot added the ready (PRs ready to be merged) label and removed the has conflicts and ready (PRs ready to be merged) labels Mar 21, 2023
awaelchli merged commit c271d4c into Lightning-AI:master Mar 21, 2023
carmocca added a commit that referenced this pull request Mar 30, 2023
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Cole Hawkins <colehawk>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: awaelchli <[email protected]>
lantiga pushed a commit that referenced this pull request Mar 30, 2023
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Cole Hawkins <colehawk>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: awaelchli <[email protected]>
Borda pushed a commit that referenced this pull request Mar 31, 2023
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Cole Hawkins <colehawk>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: awaelchli <[email protected]>
(cherry picked from commit c271d4c)
lantiga pushed a commit that referenced this pull request Apr 3, 2023
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Cole Hawkins <colehawk>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: awaelchli <[email protected]>
(cherry picked from commit c271d4c)
Labels
bug (Something isn't working), community (This PR is from the community), pl (Generic label for PyTorch Lightning package), ready (PRs ready to be merged), strategy: deepspeed
Development

Successfully merging this pull request may close these issues.

trainer.test() gives an error when using deepspeed and bf16.
4 participants