
[Lora] Speed up lora loading #4994

Merged
merged 13 commits into main from speed_up_loading
Sep 12, 2023

Conversation

patrickvonplaten
Contributor

@patrickvonplaten patrickvonplaten commented Sep 12, 2023

This PR refactors LoRA loading a bit (removing some boilerplate code) and speeds up the loading process by being smarter about device and dtype placement and by adding low_cpu_mem_usage support.

The following should be sped up by at least a factor of 2.

import time

import torch
from diffusers import DiffusionPipeline
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file


pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
pipe.to("cuda")

# pipe.load_lora_weights("stabilityai/stable-diffusion-xl-base-1.0", weight_name="sd_xl_offset_example-lora_1.0.safetensors", low_cpu_mem_usage=True)
# file = hf_hub_download("TheLastBen/Papercut_SDXL", filename="papercut.safetensors")
file = hf_hub_download("hf-internal-testing/sdxl-0.9-daiton-lora", filename="daiton-xl-lora-test.safetensors")
state_dict = load_file(file)
# Move the LoRA tensors to the target device and dtype up front so that
# load_lora_weights does not have to cast or copy them again.
state_dict = {k: v.to(device="cuda", dtype=torch.float16) for k, v in state_dict.items() if torch.is_tensor(v)}

# Time only the LoRA loading step itself.
start_time = time.time()
pipe.load_lora_weights(state_dict, low_cpu_mem_usage=True)
print(time.time() - start_time)
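
For context, the low_cpu_mem_usage path avoids materializing randomly initialized weights that would immediately be overwritten by the checkpoint. A minimal sketch of the idea, using a plain torch.nn.Linear as a stand-in for a LoRA layer (hypothetical illustration, not the actual diffusers implementation):

import torch

# 1. Construct the module on the "meta" device: no memory is allocated and no
#    random initialization runs, so construction is essentially free.
with torch.device("meta"):
    lora_layer = torch.nn.Linear(1024, 1024, bias=False)

# 2. Materialize the parameter directly from a checkpoint tensor that is
#    already on the target device/dtype, skipping the usual
#    CPU-init-then-copy round trip.
checkpoint_weight = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)  # stand-in for a loaded tensor
lora_layer.weight = torch.nn.Parameter(checkpoint_weight)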

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Sep 12, 2023

The documentation is not available anymore as the PR was closed or merged.

@patrickvonplaten patrickvonplaten changed the title speed up lora loading [Lora] Speed up lora loading Sep 12, 2023
@patrickvonplaten patrickvonplaten merged commit 37cb819 into main Sep 12, 2023
@patrickvonplaten patrickvonplaten deleted the speed_up_loading branch September 12, 2023 15:51
@patrickvonplaten
Contributor Author

Testing scripts here:

1.) Pure loading time:

import time

import torch
from diffusers import DiffusionPipeline
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file


pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
pipe.to("cuda")

# pipe.load_lora_weights("stabilityai/stable-diffusion-xl-base-1.0", weight_name="sd_xl_offset_example-lora_1.0.safetensors", low_cpu_mem_usage=True)
# file = hf_hub_download("TheLastBen/Papercut_SDXL", filename="papercut.safetensors")
file = hf_hub_download("hf-internal-testing/sdxl-0.9-daiton-lora", filename="daiton-xl-lora-test.safetensors")
state_dict = load_file(file)
state_dict = {k: v.to(device="cuda", dtype=torch.float16) for k, v in state_dict.items() if torch.is_tensor(v)}

# Time only the LoRA loading step itself.
start_time = time.time()
pipe.load_lora_weights(state_dict, low_cpu_mem_usage=True)
print(time.time() - start_time)

2.) LoRA fusing/unfusing:

import time

import torch
from diffusers import DiffusionPipeline
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file


pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
pipe.to("cuda")

# pipe.load_lora_weights("stabilityai/stable-diffusion-xl-base-1.0", weight_name="sd_xl_offset_example-lora_1.0.safetensors", low_cpu_mem_usage=True)
# file = hf_hub_download("TheLastBen/Papercut_SDXL", filename="papercut.safetensors")
file = hf_hub_download("hf-internal-testing/sdxl-0.9-daiton-lora", filename="daiton-xl-lora-test.safetensors")
state_dict = load_file(file)
state_dict = {k: v.to(device="cuda", dtype=torch.float16) for k, v in state_dict.items() if torch.is_tensor(v)}
pipe.load_lora_weights(state_dict, low_cpu_mem_usage=True)

# Time fusing the LoRA weights into the base model...
start_time = time.time()
pipe.fuse_lora()
print(time.time() - start_time)
# ...and unfusing them again.
start_time = time.time()
pipe.unfuse_lora()
print(time.time() - start_time)
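
For intuition, fusing merges the low-rank update directly into the base weight so inference pays no extra matmul, and unfusing subtracts the same delta to restore the original weight (up to floating-point rounding). A minimal sketch with hypothetical helpers, not the actual diffusers implementation:

import torch

def fuse(base_weight, lora_down, lora_up, scale=1.0):
    # W_fused = W + scale * (up @ down), with down of shape (r, in) and up of shape (out, r)
    return base_weight + scale * (lora_up @ lora_down)

def unfuse(fused_weight, lora_down, lora_up, scale=1.0):
    # Subtracting the same low-rank delta recovers the original weight.
    return fused_weight - scale * (lora_up @ lora_down)

W = torch.randn(8, 8)
down, up = torch.randn(4, 8), torch.randn(8, 4)
fused = fuse(W, down, up, scale=0.5)
assert torch.allclose(unfuse(fused, down, up, scale=0.5), W, atol=1e-5)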

yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
* speed up lora loading

* Apply suggestions from code review

* up

* up

* Fix more

* Correct more

* Apply suggestions from code review

* up

* Fix more

* Fix more -

* up

* up
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024