Loading pretrained model after `fuse_lora()` and `save_pretrained()` results in an error

### Describe the bug

I am not able to cache a model to be re-loaded later after fusing a LoRA into it. My hope was that I could fuse the LoRA into the base model, producing a new model that can be saved and loaded as needed.
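
For context, the standard LoRA formulation adds a scaled low-rank update to a frozen weight. Fusing should therefore leave an ordinary dense layer with the original shape, which is why a fused model seemed like something that could be written out as a normal checkpoint. The minimal plain-PyTorch sketch below shows that equivalence on a single linear layer. The sizes, the names `W`, `A`, `B` and `scale`, and the `W + scale * B @ A` form are assumptions for illustration, not the diffusers implementation.

```python
import torch

torch.manual_seed(0)

d_out, d_in, rank = 8, 16, 4
scale = 1.0  # plays the role of lora_scale in the reproduction

W = torch.randn(d_out, d_in)  # frozen base weight
A = torch.randn(rank, d_in)   # LoRA down-projection
B = torch.randn(d_out, rank)  # LoRA up-projection
x = torch.randn(3, d_in)      # a small batch of inputs

# Unfused: base output plus the scaled low-rank update, computed separately.
unfused = x @ W.T + scale * (x @ A.T @ B.T)

# Fused: fold the update into a single dense weight, W' = W + scale * (B @ A).
W_fused = W + scale * (B @ A)
fused = x @ W_fused.T

# The fused weight has the same shape as the base weight and gives the same outputs.
print(W_fused.shape == W.shape, torch.allclose(unfused, fused, atol=1e-4))
```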

### Reproduction

```python
from diffusers import DiffusionPipeline
import torch

pipe_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(pipe_id, torch_dtype=torch.float16).to("cuda")

# Load the LoRA adapter and fuse it into the base weights.
pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy")

pipe.fuse_lora(lora_scale=1.0)

# First generation, using the in-memory fused pipeline.
prompt = "toy_face of a hacker with a hoodie"
image = pipe(prompt, num_inference_steps=30, generator=torch.manual_seed(0)).images[0]

image.save("output.png")

# Save the fused pipeline to disk...
pipe.save_pretrained("../pretrained")

# ...and reload it. This call raises the ValueError shown under Logs.
pipe = DiffusionPipeline.from_pretrained("../pretrained", torch_dtype=torch.float16).to("cuda")
image = pipe(prompt, num_inference_steps=30, generator=torch.manual_seed(0)).images[0]

image.save("output2.png")
```
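
One way to narrow this down is to look at which tensor names actually ended up in the saved UNet file and compare them with the names the loader expects. The helpers below are a hypothetical sketch: `load_safetensors_keys`, `key_diff` and `suspicious_keys` are illustrative names, and the `"lora"` / `"base_layer"` markers are only a guess at how adapter-related or still-wrapped layers might be named in the saved file.

```python
from typing import Iterable


def load_safetensors_keys(path: str) -> list[str]:
    """Return the tensor names stored in a .safetensors file (requires safetensors and torch)."""
    # Imported lazily so the pure-Python helpers below work without safetensors installed.
    from safetensors import safe_open

    with safe_open(path, framework="pt") as f:
        return list(f.keys())


def key_diff(expected: Iterable[str], saved: Iterable[str]) -> tuple[list[str], list[str]]:
    """Return (missing, unexpected): names expected but not saved, and saved but not expected."""
    expected_set, saved_set = set(expected), set(saved)
    return sorted(expected_set - saved_set), sorted(saved_set - expected_set)


def suspicious_keys(keys: Iterable[str], markers: tuple[str, ...] = ("lora", "base_layer")) -> list[str]:
    """Return keys containing any marker; the default markers are an assumption about adapter naming."""
    return sorted(k for k in keys if any(m in k for m in markers))
```

For example, `keys = load_safetensors_keys("../pretrained/unet/diffusion_pytorch_model.safetensors")` followed by `suspicious_keys(keys)[:10]` would list the first few names that look adapter-related. That file name is an assumption and may differ if the weights were sharded or saved in another format. A reference list for `key_diff` could come from the original base model's UNet checkpoint.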

### Logs

```shell
python build_model_from_loras_example.py 
model_index.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 609/609 [00:00<00:00, 7.04MB/s]
2024-01-15 22:43:01.649074: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-01-15 22:43:01.691497: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX512F AVX512_VNNI, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.25.2
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
scheduler/scheduler_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 479/479 [00:00<00:00, 5.40MB/s]
tokenizer/tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 737/737 [00:00<00:00, 10.2MB/s]
tokenizer/special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 472/472 [00:00<00:00, 6.73MB/s]
text_encoder_2/config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 575/575 [00:00<00:00, 8.04MB/s]
text_encoder/config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 565/565 [00:00<00:00, 8.01MB/s]
tokenizer/merges.txt: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 4.33MB/s]
tokenizer_2/tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 725/725 [00:00<00:00, 8.29MB/s]
tokenizer_2/special_tokens_map.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 460/460 [00:00<00:00, 4.87MB/s]
unet/config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.68k/1.68k [00:00<00:00, 18.1MB/s]
tokenizer/vocab.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.06M/1.06M [00:00<00:00, 5.95MB/s]
tokenizer_2/merges.txt: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 4.34MB/s]
vae/config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 642/642 [00:00<00:00, 6.90MB/s]
tokenizer_2/vocab.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.06M/1.06M [00:00<00:00, 5.90MB/s]
model.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 492M/492M [00:01<00:00, 389MB/s]
diffusion_pytorch_model.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 335M/335M [00:01<00:00, 322MB/s]
diffusion_pytorch_model.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 335M/335M [00:01<00:00, 192MB/s]
model.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.78G/2.78G [00:10<00:00, 272MB/s]
diffusion_pytorch_model.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10.3G/10.3G [00:54<00:00, 189MB/s]
Fetching 19 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:56<00:00,  2.95s/it]
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00,  4.27it/s]
toy_face_sdxl.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 171M/171M [00:01<00:00, 138MB/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:09<00:00,  3.14it/s]
Loading pipeline components...:  29%|███████████████████████████████████████▍                                                                                                  | 2/7 [00:00<00:01,  3.19it/s]
Traceback (most recent call last):
  File "/home/ubuntu/replicate-fun/build_model_from_loras_example.py", line 18, in <module>
    pipe2 = DiffusionPipeline.from_pretrained("../pretrained", torch_dtype=torch.float16).to("cuda")
  File "/home/ubuntu/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 1271, in from_pretrained
    loaded_sub_model = load_sub_model(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 525, in load_sub_model
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 805, in from_pretrained
    raise ValueError(
ValueError: Cannot load <class 'diffusers.models.unet_2d_condition.UNet2DConditionModel'> from ../pretrained/unet because the following keys are missing: 
 down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_out.0.bias, down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_out.0.weight, up_blocks.0.attentions.0.transformer_blocks.1.ff.net.0.proj.bias,
[really long list of keys],
up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_k.weight, up_blocks.0.attentions.1.transformer_blocks.1.ff.net.0.proj.weight. 
 Please make sure to pass `low_cpu_mem_usage=False` and `device_map=None` if you want to randomly initialize those weights or else make sure your checkpoint file is correct.
```


### System Info

- `diffusers` version: 0.25.0
- Platform: Linux-6.2.0-37-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- PyTorch version (GPU?): 2.0.1 (True)
- Huggingface_hub version: 0.20.2
- Transformers version: 4.36.2
- Accelerate version: 0.26.1
- xFormers version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: not sure

### Who can help?

@sayakpaul @patrickvonplaten I think you two are the right people to tag. Is this something that should be possible?

Note: If I pass `low_cpu_mem_usage=False` and `device_map=None` as suggested, the second output image is just noise.
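
The error message says those two arguments make the loader randomly initialize the missing weights, which would be consistent with a noise output. The plain-PyTorch analogy below (not the diffusers loading code path) shows the effect on one layer: a checkpoint that stores the weight under a different, hypothetical name (`base_layer.weight`) loads without error when `strict=False`, but the layer keeps its random weight.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A "trained" layer whose parameters we want to restore.
trained = nn.Linear(4, 4)

# Hypothetical checkpoint: the weight was saved under a different name,
# so the key the layer expects ("weight") is absent.
checkpoint = {
    "base_layer.weight": trained.weight.detach().clone(),
    "bias": trained.bias.detach().clone(),
}

restored = nn.Linear(4, 4)  # starts from a fresh random initialization
result = restored.load_state_dict(checkpoint, strict=False)

print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)
# The bias is restored, but the weight is still the random initialization.
print("bias restored:", torch.equal(restored.bias, trained.bias))
print("weight restored:", torch.equal(restored.weight, trained.weight))
```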

Loading pretrained model after `fuse_lora()` and `save_pretrained()` results in an error #6602