Some confusion about VSD loss implementation #21

Open
zzzzzuber opened this issue Jul 19, 2024 · 14 comments

Comments

@zzzzzuber

Hi, thanks for your wonderful work!
I'm a little confused about the implementation of the VSD loss.
I followed your paper and also read "ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation".
I thought the VSD loss is a pixel-wise gradient from the network to the input LQ image, so it would be a pixel-wise calculation between the pretrained regularizer's output and the fine-tuned regularizer's output. However, LPIPS and MSE losses are scalars, so I'm really confused about how the VSD loss is implemented and how it is combined with the data loss.
Hope for your reply!
P.S.: the picture below is ProlificDreamer's implementation.

[Image: ProlificDreamer's VSD implementation]

@xhuang0904

I am also confused; it would be great if the author could release the training code.

@xhuang0904

Have you reproduced the VSD loss yet?

@theEricMa

Thanks for your interest in our work. Although VSD produces a two-dimensional gradient, you still need to convert this gradient into a scalar for back-propagation. That's what the SpecifyGradient function does. This conversion makes the VSD loss compatible with LPIPS and MSE.
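For reference, a minimal sketch of the SpecifyGradient trick, following the pattern popularized by DreamFusion-style codebases (names and shapes here are illustrative, not necessarily OSEDiff's exact code). The forward pass returns a dummy scalar; the backward pass injects the precomputed gradient:

```python
import torch

class SpecifyGradient(torch.autograd.Function):
    """Wrap a precomputed gradient into a scalar 'loss' for back-propagation."""

    @staticmethod
    def forward(ctx, input_tensor, gt_grad):
        ctx.save_for_backward(gt_grad)
        # Dummy single-element loss; its value is meaningless, only backward matters.
        return torch.zeros(1, device=input_tensor.device, dtype=input_tensor.dtype)

    @staticmethod
    def backward(ctx, grad_scale):
        (gt_grad,) = ctx.saved_tensors
        # Route the precomputed (e.g. VSD) gradient back to input_tensor.
        return gt_grad * grad_scale, None

# Usage sketch: `grad` stands in for the detached VSD gradient.
latents = torch.randn(1, 4, 8, 8, requires_grad=True)
grad = torch.randn_like(latents)
loss_vsd = SpecifyGradient.apply(latents, grad)
loss_vsd.backward()
```

Because the returned loss is a single-element tensor, it can simply be summed with scalar losses such as LPIPS and MSE before calling `backward()`.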

@zzzzzuber
Author

> Thanks for your interest in our work. Although VSD produces a two-dimensional gradient, you still need to convert this gradient into a scalar for back-propagation. That's what the SpecifyGradient function does. This conversion makes the VSD loss compatible with LPIPS and MSE.

Thanks for your reply!
Now I know how to reproduce the VSD loss in my model.
By the way, out of curiosity, can I regard the VSD loss gradient as the derivative of a weighted MSE loss? Could I replace the VSD loss with a weighted MSE loss?
Thanks again for your kind help!

@zzzzzuber
Author

> Have you reproduced the VSD loss yet?

I will try it again, haha!

@theEricMa

theEricMa commented Jul 29, 2024

> Thanks for your reply! Now I know how to reproduce the VSD loss in my model. By the way, out of curiosity, can I regard the VSD loss gradient as the derivative of a weighted MSE loss? Could I replace the VSD loss with a weighted MSE loss? Thanks again for your kind help!

That's a great question. As discussed in HiFA, the SDS loss is a weighted sum of MSE losses between the generated images and their denoised versions produced by the diffusion model. Likewise, VSD turns out to be a weighted sum of MSEs between the denoised images from the pre-trained diffusion model and those from the fine-tuned model.

@zzzzzuber
Author

> That's a great question. As discussed in HiFA, the SDS loss is a weighted sum of MSE losses between the generated images and their denoised versions produced by the diffusion model. Likewise, VSD turns out to be a weighted sum of MSEs between the denoised images from the pre-trained diffusion model and those from the fine-tuned model.

I see! But if the VSD loss can be seen as a weighted sum of MSEs between the denoised images from the pretrained and fine-tuned models, why not use an MSE loss directly? Developing customized gradient backpropagation is not simple (at least for me) 😂, and using an MSE loss directly would be easier.

@theEricMa

> I see! But if the VSD loss can be seen as a weighted sum of MSEs between the denoised images from the pretrained and fine-tuned models, why not use an MSE loss directly? Developing customized gradient backpropagation is not simple (at least for me) 😂, and using an MSE loss directly would be easier.

Following the conventional route, computing the gradient would require differentiating through SD's U-Net, which significantly increases GPU memory usage. This trick was proposed by DreamFusion for computing the SDS loss and has been adopted by all subsequent works.
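As an aside, many later SDS/VSD codebases avoid a custom autograd function entirely by reparameterizing the detached gradient as the target of an ordinary half-MSE; this yields exactly the same gradient without backpropagating through the U-Net (a sketch with illustrative names, not necessarily this repo's code):

```python
import torch
import torch.nn.functional as F

latents = torch.randn(1, 4, 8, 8, requires_grad=True)
# Precomputed, detached VSD gradient, e.g. w * (eps_pretrained - eps_finetuned).
grad = torch.randn_like(latents)

# d/d(latents) of 0.5 * ||latents - target||^2 is (latents - target) = grad,
# because `target` is detached from the graph.
target = (latents - grad).detach()
loss = 0.5 * F.mse_loss(latents, target, reduction="sum")
loss.backward()
# latents.grad now equals grad (up to numerical precision).
```

No gradient ever flows into the diffusion U-Net here, since `grad` is computed under `no_grad` / detached before being folded into the target.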

@xhuang0904

Hi, can you explain a bit more about the VSD loss?

First, the grad term of the VSD loss in ProlificDreamer looks like:

grad = w * (noise_pred - noise_pred_q)

From my understanding, in the OSEDiff case it is:

grad = w * (noise_pred_pretrained_regularizer - noise_pred_finetuned_regularizer)

Is that right?

Second, did you just follow the w(t) from ProlificDreamer?

w = (1 - self.alphas[t])

Thanks a lot!

@zzzzzuber
Author

> Following the conventional route, computing the gradient would require differentiating through SD's U-Net, which significantly increases GPU memory usage. This trick was proposed by DreamFusion for computing the SDS loss and has been adopted by all subsequent works.

OK, I see. Many thanks for your kind help!

@zzzzzuber
Author

> Hi, can you explain a bit more about the VSD loss?
>
> First, the grad term of the VSD loss in ProlificDreamer looks like:
>
> grad = w * (noise_pred - noise_pred_q)
>
> From my understanding, in the OSEDiff case it is:
>
> grad = w * (noise_pred_pretrained_regularizer - noise_pred_finetuned_regularizer)
>
> Is that right?
>
> Second, did you just follow the w(t) from ProlificDreamer?
>
> w = (1 - self.alphas[t])
>
> Thanks a lot!

I implement the VSD loss the same way.
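Putting the pieces from this thread together, a minimal end-to-end sketch of one VSD step (all names illustrative; the real regularizers, scheduler, and timestep range may differ — note that `self.alphas` in the ProlificDreamer-style snippet above usually refers to the cumulative product of alphas):

```python
import torch
import torch.nn.functional as F

# Illustrative DDPM-style schedule; real code takes this from the scheduler.
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

latents = torch.randn(1, 4, 8, 8, requires_grad=True)  # generator output
t = torch.randint(20, 980, (1,))                        # random timestep

# Add noise; noisy_latents would be fed to both regularizers.
noise = torch.randn_like(latents)
sqrt_ac = alphas_cumprod[t].sqrt().view(-1, 1, 1, 1)
sqrt_1m_ac = (1.0 - alphas_cumprod[t]).sqrt().view(-1, 1, 1, 1)
noisy_latents = sqrt_ac * latents + sqrt_1m_ac * noise

# Stand-ins for the two regularizers' noise predictions (no grad through them).
with torch.no_grad():
    noise_pred_pretrained = torch.randn_like(latents)
    noise_pred_finetuned = torch.randn_like(latents)

w = (1.0 - alphas_cumprod[t]).view(-1, 1, 1, 1)  # w(t) as in ProlificDreamer
grad = w * (noise_pred_pretrained - noise_pred_finetuned)

# Detach-target reformulation: equivalent to SpecifyGradient.
target = (latents - grad).detach()
loss_vsd = 0.5 * F.mse_loss(latents, target, reduction="sum")
loss_vsd.backward()
```

The resulting `loss_vsd` is a scalar and can be summed with the data losses (MSE, LPIPS) for a single `backward()` call.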

@Yangkai-Wei

@zzzzzuber
Have you implemented this training process? I implemented the VSD loss the same way as ProlificDreamer, but after thousands of steps the pseudo loss always ends up in the tens of thousands, and then the images become NaN.

@beyondbatman-master

> @zzzzzuber Have you implemented this training process? I implemented the VSD loss the same way as ProlificDreamer, but after thousands of steps the pseudo loss always ends up in the tens of thousands, and then the images become NaN.

We have met the same problem. Did you solve it?

@zzzzzero

> @zzzzzuber Have you implemented this training process? I implemented the VSD loss the same way as ProlificDreamer, but after thousands of steps the pseudo loss always ends up in the tens of thousands, and then the images become NaN.

> We have met the same problem. Did you solve it?

I use the official training code and meet the same problem. Did anyone solve it?

6 participants