Skip to content

Conversation

@woshiyyya
Copy link
Contributor

@woshiyyya woshiyyya commented Jan 25, 2024

What does this PR do?

In my stable diffusion training workload, I am adding noise to the input image latents at each training step. From some analysis on the flamegraph, it seems that the self.alpha_cumprod.to operation in DDPMScheduler.add_noise takes a lot of time and becomes a bottleneck.

This PR moves the tensor to the sample's device at the first time, then the .to operations in the following add_noise calls will be noop. The flamegraph after this change indicates that add_noise calls take much less time.

This might not be the most elegant solution, but it did reduce a huge overhead.

Before:
Screenshot 2024-01-23 at 4 30 05 PM

After:
Screenshot 2024-01-24 at 4 05 00 PM

Fixes # (issue)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@woshiyyya
Copy link
Contributor Author

cc @dhruvrnaik @yiyixuxu @sayakpaul

@sayakpaul sayakpaul requested a review from yiyixuxu January 25, 2024 03:38
Comment on lines 506 to 509
# Move the self.alphas_cumprod to device to avoid redundant CPU to GPU data movement
# for the subsequent add_noise calls
self.alphas_cumprod = self.alphas_cumprod.to(device=original_samples.device)
alphas_cumprod = self.alphas_cumprod.to(dtype=original_samples.dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am personally okay with this.

@woshiyyya
Copy link
Contributor Author

woshiyyya commented Jan 30, 2024

Hey diffusers team, any update on this 🙂?

@sayakpaul
Copy link
Member

Be a little patient as @yiyixuxu gets to this. But we will, for sure.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Collaborator

@yiyixuxu yiyixuxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, thanks!
can we fix the test? will merge once tests pass :)

Signed-off-by: woshiyyya <[email protected]>
@woshiyyya
Copy link
Contributor Author

woshiyyya commented Jan 30, 2024

Ok let me try to fix them:) Do you know how to trigger the test again?

@yiyixuxu
Copy link
Collaborator

@woshiyyya need to run make fix-copies

Signed-off-by: woshiyyya <[email protected]>
@yiyixuxu yiyixuxu merged commit 0db766b into huggingface:main Jan 30, 2024
dg845 pushed a commit to dg845/diffusers that referenced this pull request Feb 2, 2024
…a movement. (huggingface#6704)

* load cumprod tensor to device

Signed-off-by: woshiyyya <[email protected]>

* fixing ci

Signed-off-by: woshiyyya <[email protected]>

* make fix-copies

Signed-off-by: woshiyyya <[email protected]>

---------

Signed-off-by: woshiyyya <[email protected]>
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
…a movement. (huggingface#6704)

* load cumprod tensor to device

Signed-off-by: woshiyyya <[email protected]>

* fixing ci

Signed-off-by: woshiyyya <[email protected]>

* make fix-copies

Signed-off-by: woshiyyya <[email protected]>

---------

Signed-off-by: woshiyyya <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants