Conversation

@rockerBOO
Contributor

Fixes #2102

The issue is in this function:

def get_noise_noisy_latents_and_timesteps(
    args, noise_scheduler, latents: torch.FloatTensor
) -> Tuple[torch.FloatTensor, torch.FloatTensor, torch.IntTensor]:
    # Sample noise that we'll add to the latents
    noise = torch.randn_like(latents, device=latents.device)
    if args.noise_offset:
        if args.noise_offset_random_strength:
            noise_offset = torch.rand(1, device=latents.device) * args.noise_offset
        else:
            noise_offset = args.noise_offset
        noise = custom_train_functions.apply_noise_offset(latents, noise, noise_offset, args.adaptive_noise_scale)
    if args.multires_noise_iterations:
        noise = custom_train_functions.pyramid_noise_like(
            noise, latents.device, args.multires_noise_iterations, args.multires_noise_discount
        )

    # Sample a random timestep for each image
    b_size = latents.shape[0]
    min_timestep = 0 if args.min_timestep is None else args.min_timestep
    max_timestep = noise_scheduler.config.num_train_timesteps if args.max_timestep is None else args.max_timestep

    timesteps = get_timesteps(min_timestep, max_timestep, b_size, latents.device)

    # Add noise to the latents according to the noise magnitude at each timestep
    # (this is the forward diffusion process)
    if args.ip_noise_gamma:
        if args.ip_noise_gamma_random_strength:
            strength = torch.rand(1, device=latents.device) * args.ip_noise_gamma
        else:
            strength = args.ip_noise_gamma
        noisy_latents = noise_scheduler.add_noise(latents, noise + strength * torch.randn_like(latents), timesteps)
    else:
        noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

    return noise, noisy_latents, timesteps

noise_scheduler.add_noise() moves alphas_cumprod to the GPU.

You can see how this works in the DDIMScheduler:

https://github.com/huggingface/diffusers/blob/a4df8dbc40e170ff828f8d8f79c2c861c9f1748d/src/diffusers/schedulers/scheduling_ddim.py#L474-L498

So this issue affects SD and SDXL models specifically. We can move alphas_cumprod back to the CPU afterwards.
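A minimal sketch of the behavior and the workaround. The `ToyScheduler` below is a hypothetical stand-in (not the diffusers implementation) that mimics how recent diffusers `add_noise` casts `alphas_cumprod` to the latents' device as a side effect; the fix is simply to move it back to the CPU after the call:

```python
import torch

# Hypothetical stand-in for a diffusers scheduler, only to illustrate the
# device side effect discussed above.
class ToyScheduler:
    def __init__(self, num_train_timesteps: int = 1000):
        betas = torch.linspace(1e-4, 2e-2, num_train_timesteps)
        self.alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

    def add_noise(self, latents, noise, timesteps):
        # Like recent diffusers versions, cast alphas_cumprod to the
        # latents' device (this is the side effect the PR works around).
        self.alphas_cumprod = self.alphas_cumprod.to(device=latents.device)
        sqrt_alpha = self.alphas_cumprod[timesteps] ** 0.5
        sqrt_one_minus = (1.0 - self.alphas_cumprod[timesteps]) ** 0.5
        while sqrt_alpha.dim() < latents.dim():
            sqrt_alpha = sqrt_alpha.unsqueeze(-1)
            sqrt_one_minus = sqrt_one_minus.unsqueeze(-1)
        return sqrt_alpha * latents + sqrt_one_minus * noise

def add_noise_then_restore_cpu(scheduler, latents, noise, timesteps):
    noisy = scheduler.add_noise(latents, noise, timesteps)
    # The workaround: move alphas_cumprod back to the CPU afterwards, so
    # later CPU-side lookups against it keep working.
    scheduler.alphas_cumprod = scheduler.alphas_cumprod.to("cpu")
    return noisy

scheduler = ToyScheduler()
latents = torch.randn(2, 4, 8, 8)
noise = torch.randn_like(latents)
timesteps = torch.randint(0, 1000, (2,))
noisy = add_noise_then_restore_cpu(scheduler, latents, noise, timesteps)
```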

@kohya-ss
Owner

Thank you, this makes sense! I will merge this tomorrow.

@rockerBOO
Contributor Author

Maybe we should add a note there explaining why we are doing that, for future reference.

@kohya-ss
Owner

I looked into why this hadn't been a problem until now, and it turns out Diffusers code has changed: huggingface/diffusers#6704

The size of alphas_cumprod is 1,000, so I don't think the overhead is that large.

@kohya-ss kohya-ss merged commit d53a532 into kohya-ss:sd3 Jul 17, 2025
2 checks passed
@rockerBOO
Contributor Author

rockerBOO commented Jul 18, 2025

I think that PR highlights that we might want to move the timesteps to alphas_cumprod's device, or just use that device, in our huber lookup. We currently move the timesteps to the CPU there, and if those transfers have a big enough impact to show up in profiles, it might make sense to change that.

We can simply move the timesteps to the noise_scheduler.alphas_cumprod device to match.
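A sketch of that alternative. The function and the per-timestep schedule here are illustrative (not the actual sd-scripts huber code): the point is only that the timesteps are moved to `alphas_cumprod`'s device before indexing, instead of moving `alphas_cumprod` around:

```python
import torch

# Hypothetical per-timestep lookup: move the timesteps to alphas_cumprod's
# device so the indexing works regardless of where alphas_cumprod lives.
def huber_c_lookup(alphas_cumprod: torch.Tensor, timesteps: torch.Tensor,
                   huber_c: float = 0.1) -> torch.Tensor:
    timesteps = timesteps.to(alphas_cumprod.device)  # match devices for indexing
    # Illustrative schedule: scale huber_c by the noise level at each timestep.
    return huber_c * (1.0 - alphas_cumprod[timesteps])

alphas_cumprod = torch.cumprod(1.0 - torch.linspace(1e-4, 2e-2, 1000), dim=0)
timesteps = torch.randint(0, 1000, (4,))
c = huber_c_lookup(alphas_cumprod, timesteps)
```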
