
Conversation

@Slickytail
Contributor

The loss-weighting schemes for SD3 training were not implemented correctly, which made all of them non-functional. I went ahead and implemented the lognorm and cosmap schemes by simply evaluating each scheme's density at the sampled timesteps (a rough sketch of those weights is included below). Potentially, a better approach would be to sample the timesteps according to that density in the first place.

The Mode scheme is much harder to implement -- there's a reason that they didn't include an explicit form for the density in the paper (I couldn't find one...), so I put in an error message if you try to use it for now.

@sayakpaul @kashif
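
For reference, here is a minimal sketch of density-based weights computed from the sampled sigmas, following the logit-normal and cosine-map densities from the SD3 paper; the function names and default mean/std are illustrative, not the exact code in this PR:

```python
import math

import torch


def lognorm_weight(sigmas: torch.Tensor, m: float = 0.0, s: float = 1.0) -> torch.Tensor:
    # Logit-normal density evaluated at the sampled sigmas:
    # pi(t) = 1 / (s * sqrt(2 * pi)) * 1 / (t * (1 - t)) * exp(-(logit(t) - m)^2 / (2 * s^2))
    logit = torch.log(sigmas / (1.0 - sigmas))
    return (
        (1.0 / (s * math.sqrt(2.0 * math.pi)))
        * (1.0 / (sigmas * (1.0 - sigmas)))
        * torch.exp(-((logit - m) ** 2) / (2.0 * s**2))
    )


def cosmap_weight(sigmas: torch.Tensor) -> torch.Tensor:
    # Cosine-map density: pi(t) = 2 / (pi * (1 - 2t + 2t^2))
    return 2.0 / (math.pi * (1.0 - 2.0 * sigmas + 2.0 * sigmas**2))
```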

@kashif
Contributor

kashif commented Jun 13, 2024

thanks @Slickytail checking

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@bghira
Contributor

bghira commented Jun 13, 2024

previous code

[screenshot of the previous loss-weighting code]

new code

[screenshot of the new loss-weighting code]

@kashif
Contributor

kashif commented Jun 13, 2024

@Slickytail can you kindly run `make style` in the root directory of diffusers?

@chenbaiyujason

I get the following error:
timesteps = noise_scheduler_copy.timesteps[indices].to(device=model_input.device)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

@sayakpaul
Member

@kashif I will review the changes a bit later but could you test the scripts and see if they are running without errors?

@chenbaiyujason

This fix avoids the error. Original line:
indices = (u * noise_scheduler_copy.config.num_train_timesteps).long()
Fixed:
indices = (u * noise_scheduler_copy.config.num_train_timesteps).long().to(device="cpu")

@xiaohu2015
Contributor

A question about the loss: why do we first convert the model prediction to x_0 to compute the loss here?

@Slickytail
Contributor Author

Hi @kashif, it looks like you committed the necessary formatting/style changes, so I'm assuming `make style` isn't needed from me anymore. Is there anything else that needs a fix here?

@kashif
Contributor

kashif commented Jun 14, 2024

@Slickytail yes, let's keep all the `u` tensors on the CPU; since the indices are on the CPU side, there is no need to set the device to the GPU.
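
A minimal sketch of that suggestion, with `bsz`, `noise_scheduler_copy`, and `model_input` assumed from the training script: sample `u` on the CPU, index the scheduler's timesteps on the CPU, and only move the gathered timesteps to the model's device.

```python
# sample on the CPU (uniform here; the lognorm / cosmap samplers work the same way)
u = torch.rand(size=(bsz,), device="cpu")
indices = (u * noise_scheduler_copy.config.num_train_timesteps).long()
# scheduler.timesteps lives on the CPU, so CPU indices avoid the device-mismatch error
timesteps = noise_scheduler_copy.timesteps[indices].to(device=model_input.device)
```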

@sayakpaul
Member

sayakpaul left a comment

Thanks a lot for these!

@asomoza has confirmed that these are working nicely.

In a follow-up PR, I will add a reference to your PR in the comments to honor your contributions :)

Will also make it a little utility function and move it to training_utils.py (a rough sketch of what that could look like is included below).

Appreciate your help here!
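
A rough sketch of what such a utility could look like; the name compute_loss_weighting_for_sd3 and its signature are assumptions here, not necessarily the merged API:

```python
import math

import torch


def compute_loss_weighting_for_sd3(weighting_scheme: str, sigmas: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: per-sample loss weight for the chosen scheme.
    if weighting_scheme == "cosmap":
        return 2.0 / (math.pi * (1.0 - 2.0 * sigmas + 2.0 * sigmas**2))
    if weighting_scheme == "lognorm":
        # Standard logit-normal density (mean 0, std 1) at the sampled sigmas.
        logit = torch.log(sigmas / (1.0 - sigmas))
        return torch.exp(-0.5 * logit**2) / (math.sqrt(2.0 * math.pi) * sigmas * (1.0 - sigmas))
    if weighting_scheme == "mode":
        # No closed-form density is given in the paper, mirroring the error added in this PR.
        raise NotImplementedError("The 'mode' weighting scheme has no closed-form density.")
    # Fallback: uniform weighting.
    return torch.ones_like(sigmas)
```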

@sayakpaul
Member

Okay, I can confirm that the failing example tests occur only with the latest version of datasets; with version 2.19.1, they don't appear. So I will merge the PR. Requesting @lhoestq and @albertvillanova to help.

@albertvillanova
Member

Thanks for the ping, @sayakpaul.

In the latest datasets release, we had to introduce a breaking change for security reasons: huggingface/transformers#31406

Datasets with a Python loading script now require passing `trust_remote_code=True` to be used.
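
For example, a minimal usage sketch; the dataset name below is a placeholder, not a specific dataset from the failing tests:

```python
from datasets import load_dataset

# placeholder repo id; any dataset backed by a Python loading script needs the opt-in flag
dataset = load_dataset("some-user/some-script-based-dataset", trust_remote_code=True)
```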

@lhoestq
Member

lhoestq commented Jun 17, 2024

sayakpaul added a commit that referenced this pull request Dec 23, 2024
* Add lognorm and cosmap weighting

* Implement mode sampling

* Update examples/dreambooth/train_dreambooth_lora_sd3.py

* Update examples/dreambooth/train_dreambooth_lora_sd3.py

* Update examples/dreambooth/train_dreambooth_sd3.py

* Update examples/dreambooth/train_dreambooth_sd3.py

* Update examples/dreambooth/train_dreambooth_sd3.py

* Update examples/dreambooth/train_dreambooth_lora_sd3.py

* Update examples/dreambooth/train_dreambooth_sd3.py

* Update examples/dreambooth/train_dreambooth_sd3.py

* Update examples/dreambooth/train_dreambooth_lora_sd3.py

* keep timestamp sampling fully on cpu

---------

Co-authored-by: Kashif Rasul <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>