[DDPMScheduler] Load alpha_cumprod to device to avoid redundant data movement.
#6704
Signed-off-by: woshiyyya <[email protected]>
# Move the self.alphas_cumprod to device to avoid redundant CPU to GPU data movement
# for the subsequent add_noise calls
self.alphas_cumprod = self.alphas_cumprod.to(device=original_samples.device)
alphas_cumprod = self.alphas_cumprod.to(dtype=original_samples.dtype)
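A minimal, self-contained sketch of the pattern in the diff above, assuming PyTorch is installed. `CachedNoiseAdder` is a hypothetical stand-in, not the diffusers `DDPMScheduler`: it models only a linear beta schedule (illustrative defaults) and the closed-form forward-noising step x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. The device move is cached on `self`; the dtype cast is not, so for non-float32 latents (e.g. fp16) it still allocates a tensor on every call, but that cast runs on the device.

```python
import torch


class CachedNoiseAdder:
    """Hypothetical, stripped-down stand-in for DDPMScheduler.add_noise."""

    def __init__(self, num_train_timesteps: int = 1000,
                 beta_start: float = 1e-4, beta_end: float = 0.02):
        betas = torch.linspace(beta_start, beta_end, num_train_timesteps,
                               dtype=torch.float32)
        # The schedule is created on the CPU.
        self.alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

    def add_noise(self, original_samples: torch.Tensor, noise: torch.Tensor,
                  timesteps: torch.Tensor) -> torch.Tensor:
        # Cache the schedule on the samples' device. Once it lives there,
        # Tensor.to returns the same tensor, so later calls copy nothing.
        self.alphas_cumprod = self.alphas_cumprod.to(device=original_samples.device)
        # Not cached: if the dtypes differ this allocates, but on-device.
        alphas_cumprod = self.alphas_cumprod.to(dtype=original_samples.dtype)
        timesteps = timesteps.to(original_samples.device)

        sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5
        sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5
        # Reshape (batch,) -> (batch, 1, 1, ...) to broadcast over each sample.
        shape = (-1,) + (1,) * (original_samples.dim() - 1)
        sqrt_alpha_prod = sqrt_alpha_prod.reshape(shape)
        sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.reshape(shape)
        return sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise


# Usage: one simulated training step.
device = "cuda" if torch.cuda.is_available() else "cpu"
scheduler = CachedNoiseAdder()
latents = torch.randn(4, 4, 8, 8, device=device)
noise = torch.randn_like(latents)
t = torch.randint(0, 1000, (4,), device=device)
noisy = scheduler.add_noise(latents, noise, t)
print(noisy.shape)  # torch.Size([4, 4, 8, 8])
```

The trade-off of this design is that the scheduler object now keeps a device copy of a small 1-D tensor alive between calls in exchange for skipping a host-to-device transfer on every step.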
I am personally okay with this.
Hey diffusers team, any update on this 🙂?
Be a little patient as @yiyixuxu gets to this. But we will, for sure.
yiyixuxu left a comment:
great, thanks!
can we fix the test? will merge once tests pass :)
Signed-off-by: woshiyyya <[email protected]>
Ok let me try to fix them:) Do you know how to trigger the test again?
@woshiyyya need to run
Signed-off-by: woshiyyya <[email protected]>
…a movement. (huggingface#6704)
* load cumprod tensor to device
Signed-off-by: woshiyyya <[email protected]>
* fixing ci
Signed-off-by: woshiyyya <[email protected]>
* make fix-copies
Signed-off-by: woshiyyya <[email protected]>
---------
Signed-off-by: woshiyyya <[email protected]>
What does this PR do?
In my Stable Diffusion training workload, I add noise to the input image latents at every training step. Profiling with a flamegraph suggests that the `self.alphas_cumprod.to` operation in `DDPMScheduler.add_noise` takes a lot of time and becomes a bottleneck.
This PR moves the tensor to the sample's device the first time `add_noise` is called, so the device transfer in every subsequent `add_noise` call is a no-op. The flamegraph after this change shows that `add_noise` calls take much less time.
This might not be the most elegant solution, but it removes a large overhead.
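To illustrate why caching removes the overhead without needing a GPU, here is a toy model in plain Python with no PyTorch dependency. `TransferLog`, `FakeTensor`, `ToyScheduler`, and `count_copies` are all hypothetical; `FakeTensor.to` only imitates the documented PyTorch rule that `Tensor.to` returns the tensor itself when nothing needs to change, and increments a counter where a real CPU-to-GPU copy would occur.

```python
from dataclasses import dataclass


@dataclass
class TransferLog:
    """Counts simulated host-to-device copies."""
    copies: int = 0


@dataclass
class FakeTensor:
    """Hypothetical tensor that models only Tensor.to's device check."""
    data: list
    device: str
    log: TransferLog

    def to(self, device: str) -> "FakeTensor":
        if device == self.device:
            # Already on the target device: return self, no copy.
            return self
        # Stand-in for a real CPU -> GPU transfer.
        self.log.copies += 1
        return FakeTensor(list(self.data), device, self.log)


class ToyScheduler:
    def __init__(self, log: TransferLog, cache_on_device: bool):
        # Placeholder schedule values; the schedule starts on the CPU.
        self.alphas_cumprod = FakeTensor([0.9999, 0.9997, 0.9994], "cpu", log)
        self.cache_on_device = cache_on_device

    def add_noise(self, sample_device: str) -> FakeTensor:
        if self.cache_on_device:
            # After the PR: keep the moved tensor on self.
            self.alphas_cumprod = self.alphas_cumprod.to(sample_device)
            return self.alphas_cumprod
        # Before the PR: the CPU tensor is copied again on every call.
        return self.alphas_cumprod.to(sample_device)


def count_copies(cache_on_device: bool, steps: int) -> int:
    """Run `steps` simulated add_noise calls and return the copy count."""
    log = TransferLog()
    scheduler = ToyScheduler(log, cache_on_device)
    for _ in range(steps):
        scheduler.add_noise("cuda")
    return log.copies


print(count_copies(cache_on_device=False, steps=1000))  # 1000
print(count_copies(cache_on_device=True, steps=1000))   # 1
```

The copy count drops from one per training step to one in total; the real savings depend on how expensive each synchronous transfer is relative to the rest of the step.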
Before: [flamegraph screenshot]
After: [flamegraph screenshot]