Conversation

@rockerBOO
Contributor

Fixes #2102

The issue is in this function:

def get_noise_noisy_latents_and_timesteps(
    args, noise_scheduler, latents: torch.FloatTensor
) -> Tuple[torch.FloatTensor, torch.FloatTensor, torch.IntTensor]:
    # Sample noise that we'll add to the latents
    noise = torch.randn_like(latents, device=latents.device)
    if args.noise_offset:
        if args.noise_offset_random_strength:
            noise_offset = torch.rand(1, device=latents.device) * args.noise_offset
        else:
            noise_offset = args.noise_offset
        noise = custom_train_functions.apply_noise_offset(latents, noise, noise_offset, args.adaptive_noise_scale)
    if args.multires_noise_iterations:
        noise = custom_train_functions.pyramid_noise_like(
            noise, latents.device, args.multires_noise_iterations, args.multires_noise_discount
        )

    # Sample a random timestep for each image
    b_size = latents.shape[0]
    min_timestep = 0 if args.min_timestep is None else args.min_timestep
    max_timestep = noise_scheduler.config.num_train_timesteps if args.max_timestep is None else args.max_timestep

    timesteps = get_timesteps(min_timestep, max_timestep, b_size, latents.device)

    # Add noise to the latents according to the noise magnitude at each timestep
    # (this is the forward diffusion process)
    if args.ip_noise_gamma:
        if args.ip_noise_gamma_random_strength:
            strength = torch.rand(1, device=latents.device) * args.ip_noise_gamma
        else:
            strength = args.ip_noise_gamma
        noisy_latents = noise_scheduler.add_noise(latents, noise + strength * torch.randn_like(latents), timesteps)
    else:
        noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

    return noise, noisy_latents, timesteps

noise_scheduler.add_noise() moves alphas_cumprod to the GPU.

You can see how this works in the DDIMScheduler:

https://github.com/huggingface/diffusers/blob/a4df8dbc40e170ff828f8d8f79c2c861c9f1748d/src/diffusers/schedulers/scheduling_ddim.py#L474-L498

So this issue affects SD and SDXL models specifically. We can move alphas_cumprod back to the CPU afterwards.
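A minimal sketch of the behavior and the workaround. The `ToyScheduler` below is a hypothetical stand-in (not the diffusers implementation) that mimics how recent diffusers `add_noise` casts `alphas_cumprod` to the latents' device as a side effect; the fix is simply to move it back to the CPU after the call:

```python
import torch

# Hypothetical stand-in for a diffusers scheduler, only to illustrate the
# device side effect discussed above.
class ToyScheduler:
    def __init__(self, num_train_timesteps: int = 1000):
        betas = torch.linspace(1e-4, 2e-2, num_train_timesteps)
        self.alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

    def add_noise(self, latents, noise, timesteps):
        # Like recent diffusers versions, cast alphas_cumprod to the
        # latents' device (this is the side effect the PR works around).
        self.alphas_cumprod = self.alphas_cumprod.to(device=latents.device)
        sqrt_alpha = self.alphas_cumprod[timesteps] ** 0.5
        sqrt_one_minus = (1.0 - self.alphas_cumprod[timesteps]) ** 0.5
        while sqrt_alpha.dim() < latents.dim():
            sqrt_alpha = sqrt_alpha.unsqueeze(-1)
            sqrt_one_minus = sqrt_one_minus.unsqueeze(-1)
        return sqrt_alpha * latents + sqrt_one_minus * noise

def add_noise_then_restore_cpu(scheduler, latents, noise, timesteps):
    noisy = scheduler.add_noise(latents, noise, timesteps)
    # The workaround: move alphas_cumprod back to the CPU afterwards, so
    # later CPU-side lookups against it keep working.
    scheduler.alphas_cumprod = scheduler.alphas_cumprod.to("cpu")
    return noisy

scheduler = ToyScheduler()
latents = torch.randn(2, 4, 8, 8)
noise = torch.randn_like(latents)
timesteps = torch.randint(0, 1000, (2,))
noisy = add_noise_then_restore_cpu(scheduler, latents, noise, timesteps)
```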

@kohya-ss
Owner

Thank you, this makes sense! I will merge this tomorrow.

@rockerBOO
Contributor Author

Maybe we should add a note there explaining why we are doing that, for future reference.

@kohya-ss
Owner

I looked into why this hadn't been a problem until now, and it turns out Diffusers code has changed: huggingface/diffusers#6704

The size of alphas_cumprod is 1,000, so I don't think the overhead is that large.

@kohya-ss kohya-ss merged commit d53a532 into kohya-ss:sd3 Jul 17, 2025
2 checks passed
@rockerBOO
Contributor Author

rockerBOO commented Jul 18, 2025

I think that PR highlights that we might want to move the timesteps to alphas_cumprod's device, or just use that device, in our huber lookup. We currently move the timesteps to the CPU there, and if those transfers have a big enough impact to show up in profiles, it might make sense to change that.

We can simply move the timesteps to the noise_scheduler.alphas_cumprod device to match.
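A sketch of that alternative. The function and the per-timestep schedule here are illustrative (not the actual sd-scripts huber code): the point is only that the timesteps are moved to `alphas_cumprod`'s device before indexing, instead of moving `alphas_cumprod` around:

```python
import torch

# Hypothetical per-timestep lookup: move the timesteps to alphas_cumprod's
# device so the indexing works regardless of where alphas_cumprod lives.
def huber_c_lookup(alphas_cumprod: torch.Tensor, timesteps: torch.Tensor,
                   huber_c: float = 0.1) -> torch.Tensor:
    timesteps = timesteps.to(alphas_cumprod.device)  # match devices for indexing
    # Illustrative schedule: scale huber_c by the noise level at each timestep.
    return huber_c * (1.0 - alphas_cumprod[timesteps])

alphas_cumprod = torch.cumprod(1.0 - torch.linspace(1e-4, 2e-2, 1000), dim=0)
timesteps = torch.randint(0, 1000, (4,))
c = huber_c_lookup(alphas_cumprod, timesteps)
```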
