Conversation

sgugger (Collaborator) commented Dec 5, 2022

What does this PR do?

As reported in #20390, the dtype of the weights after from_pretrained is used to load a checkpoint is inconsistent between device_map=None and a set device_map:

  • device_map=None (which uses nn.Module.load_state_dict) keeps the dtype of the model, even if the checkpoint is in a different dtype (so loading a float16 checkpoint in a float32 model gives a float32 model)
  • a set device_map (which manually sets the parameters) changes the dtype of the model to the dtype of the checkpoint (so loading a float16 checkpoint in a float32 model gives a float16 model).

This PR addresses the inconsistency so both loading paths behave the same way.
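Below is a minimal PyTorch sketch (not the actual transformers loading code) of why the two paths end up with different dtypes, and of the kind of cast that makes them consistent; the tiny model here is made up for illustration:

```python
import torch
import torch.nn as nn

# A float16 checkpoint for a tiny float32 model.
state_dict = {k: v.half() for k, v in nn.Linear(4, 4).state_dict().items()}

# Path 1: device_map=None uses nn.Module.load_state_dict, which copies the
# checkpoint values into the existing parameters and therefore keeps the
# model's dtype (float32 here).
model_a = nn.Linear(4, 4)
model_a.load_state_dict(state_dict)
print(model_a.weight.dtype)  # torch.float32

# Path 2: a set device_map assigns new parameters built from the checkpoint
# tensors, so the model takes the checkpoint's dtype (float16 here).
model_b = nn.Linear(4, 4)
for name, value in state_dict.items():
    setattr(model_b, name, nn.Parameter(value))
print(model_b.weight.dtype)  # torch.float16

# The idea of the fix (a sketch, not the PR diff): cast the checkpoint value
# to the dtype of the parameter it replaces, so path 2 matches path 1.
model_c = nn.Linear(4, 4)
for name, value in state_dict.items():
    old_param = getattr(model_c, name)
    setattr(model_c, name, nn.Parameter(value.to(old_param.dtype)))
print(model_c.weight.dtype)  # torch.float32
```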

HuggingFaceDocBuilderDev commented Dec 5, 2022

The documentation is not available anymore as the PR was closed or merged.

younesbelkada (Contributor) left a comment

Thanks a lot for fixing this and making model loading consistent between device_map=auto (or any) and device_map=None!
Just wondering if you need a special safety check for safetensors (bear in mind that I am not very knowledgeable about safetensors - what happens if old_params.dtype == torch.float16 and is_safetensors==True)?

sgugger (Collaborator, Author) commented Dec 5, 2022

There is no safetensors-specific handling at this stage: is_safetensors only means the checkpoint comes from safetensors, but the state dict is a plain dictionary of name to parameter in this case as well.
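For reference, a minimal sketch (not the transformers code) of why the format makes no difference at this point: a safetensors file is read back as an ordinary name-to-tensor dictionary, so the dtype handling above applies to both formats. The file name is made up for illustration.

```python
import torch
from safetensors.torch import save_file, load_file

# Write and reload a tiny float16 checkpoint in the safetensors format.
save_file({"weight": torch.ones(4, 4, dtype=torch.float16)}, "checkpoint.safetensors")
state_dict = load_file("checkpoint.safetensors")
print(type(state_dict), state_dict["weight"].dtype)  # <class 'dict'> torch.float16
```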

LysandreJik (Member) left a comment

LGTM
