Some confusion about VSD loss implementation #21

Open
zzzzzuber opened this issue Jul 19, 2024 · 14 comments

Comments

@zzzzzuber

Hi, thanks for your wonderful work!
I'm a little confused about the implementation of the VSD loss.
I followed your paper and also read "ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation".
I thought the VSD loss is a pixel-wise gradient from the network to the input LQ image, so it would be a pixel-wise calculation between the pretrained regularizer's output and the fine-tuned regularizer's output. However, LPIPS and MSE losses are scalars, so I'm really confused about how the VSD loss is implemented and how it is combined with the data loss.
Hope for your reply!
P.S.: the picture below is ProlificDreamer's implementation.

[Image: ProlificDreamer's VSD implementation]

@xhuang0904

I am also confused; it would be great if the author could release the training code.

@xhuang0904

Have you reproduced the VSD loss yet?

@theEricMa

Thanks for your interest in our work. Although VSD produces a two-dimensional gradient, you still need to convert this gradient into a scalar for back-propagation. That's what the SpecifyGradient function does. This conversion makes the VSD loss compatible with LPIPS and MSE.
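For reference, a minimal sketch of the SpecifyGradient trick, following the pattern popularized by DreamFusion-style codebases (names and shapes here are illustrative, not necessarily OSEDiff's exact code). The forward pass returns a dummy scalar; the backward pass injects the precomputed gradient:

```python
import torch

class SpecifyGradient(torch.autograd.Function):
    """Wrap a precomputed gradient into a scalar 'loss' for back-propagation."""

    @staticmethod
    def forward(ctx, input_tensor, gt_grad):
        ctx.save_for_backward(gt_grad)
        # Dummy single-element loss; its value is meaningless, only backward matters.
        return torch.zeros(1, device=input_tensor.device, dtype=input_tensor.dtype)

    @staticmethod
    def backward(ctx, grad_scale):
        (gt_grad,) = ctx.saved_tensors
        # Route the precomputed (e.g. VSD) gradient back to input_tensor.
        return gt_grad * grad_scale, None

# Usage sketch: `grad` stands in for the detached VSD gradient.
latents = torch.randn(1, 4, 8, 8, requires_grad=True)
grad = torch.randn_like(latents)
loss_vsd = SpecifyGradient.apply(latents, grad)
loss_vsd.backward()
```

Because the returned loss is a single-element tensor, it can simply be summed with scalar losses such as LPIPS and MSE before calling `backward()`.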

@zzzzzuber
Author

> Thanks for your interest in our work. Although VSD produces a two-dimensional gradient, you still need to convert this gradient into a scalar for back-propagation. That's what the SpecifyGradient function does. This conversion makes the VSD loss compatible with LPIPS and MSE.

Thanks for your reply!
Now I know how to reproduce the VSD loss in my model.
By the way, out of curiosity, can I regard the VSD loss gradient as the derivative of a weighted MSE loss? Could I replace the VSD loss with a weighted MSE loss?
Thanks again for your kind help!

@zzzzzuber
Author

> Have you reproduced the VSD loss yet?

I will try it again, haha!

@theEricMa

theEricMa commented Jul 29, 2024

> Thanks for your reply! Now I know how to reproduce the VSD loss in my model. By the way, out of curiosity, can I regard the VSD loss gradient as the derivative of a weighted MSE loss? Could I replace the VSD loss with a weighted MSE loss? Thanks again for your kind help!

That's a great question. As discussed in HiFA, the SDS loss is a weighted sum of MSE losses between the generated images and their denoised versions produced by the diffusion model. Likewise, VSD turns out to be a weighted sum of MSEs between the denoised images from the pre-trained diffusion model and those from the fine-tuned model.

@zzzzzuber
Author

> That's a great question. As discussed in HiFA, the SDS loss is a weighted sum of MSE losses between the generated images and their denoised versions produced by the diffusion model. Likewise, VSD turns out to be a weighted sum of MSEs between the denoised images from the pre-trained diffusion model and those from the fine-tuned model.

I see! But if the VSD loss can be seen as a weighted sum of MSEs between the denoised images from the pretrained and fine-tuned models, why not use an MSE loss directly? Developing customized gradient backpropagation is not simple (at least for me) 😂, and using an MSE loss directly would be easier.

@theEricMa

> I see! But if the VSD loss can be seen as a weighted sum of MSEs between the denoised images from the pretrained and fine-tuned models, why not use an MSE loss directly? Developing customized gradient backpropagation is not simple (at least for me) 😂, and using an MSE loss directly would be easier.

Following the conventional route, computing the gradient would require differentiating through SD's U-Net, which significantly increases GPU memory usage. This trick was proposed by DreamFusion for computing the SDS loss and has been adopted by all subsequent works.
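As an aside, many later SDS/VSD codebases avoid a custom autograd function entirely by reparameterizing the detached gradient as the target of an ordinary half-MSE; this yields exactly the same gradient without backpropagating through the U-Net (a sketch with illustrative names, not necessarily this repo's code):

```python
import torch
import torch.nn.functional as F

latents = torch.randn(1, 4, 8, 8, requires_grad=True)
# Precomputed, detached VSD gradient, e.g. w * (eps_pretrained - eps_finetuned).
grad = torch.randn_like(latents)

# d/d(latents) of 0.5 * ||latents - target||^2 is (latents - target) = grad,
# because `target` is detached from the graph.
target = (latents - grad).detach()
loss = 0.5 * F.mse_loss(latents, target, reduction="sum")
loss.backward()
# latents.grad now equals grad (up to numerical precision).
```

No gradient ever flows into the diffusion U-Net here, since `grad` is computed under `no_grad` / detached before being folded into the target.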

@xhuang0904

Hi, can you explain a bit more about the VSD loss?

First, the grad term of the VSD loss in ProlificDreamer looks like:

grad = w * (noise_pred - noise_pred_q)

From my understanding, in the OSEDiff case it is:

grad = w * (noise_pred_pretrained_regularizer - noise_pred_finetuned_regularizer)

Is that right?

Second, did you just follow the w(t) from ProlificDreamer?

w = (1 - self.alphas[t])

Thanks a lot!

@zzzzzuber
Author

> Following the conventional route, computing the gradient would require differentiating through SD's U-Net, which significantly increases GPU memory usage. This trick was proposed by DreamFusion for computing the SDS loss and has been adopted by all subsequent works.

OK, I see. Many thanks for your kind help!

@zzzzzuber
Author

> Hi, can you explain a bit more about the VSD loss?
>
> First, the grad term of the VSD loss in ProlificDreamer looks like:
>
> grad = w * (noise_pred - noise_pred_q)
>
> From my understanding, in the OSEDiff case it is:
>
> grad = w * (noise_pred_pretrained_regularizer - noise_pred_finetuned_regularizer)
>
> Is that right?
>
> Second, did you just follow the w(t) from ProlificDreamer?
>
> w = (1 - self.alphas[t])
>
> Thanks a lot!

I implement the VSD loss the same way.
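Putting the pieces from this thread together, a minimal end-to-end sketch of one VSD step (all names illustrative; the real regularizers, scheduler, and timestep range may differ — note that `self.alphas` in the ProlificDreamer-style snippet above usually refers to the cumulative product of alphas):

```python
import torch
import torch.nn.functional as F

# Illustrative DDPM-style schedule; real code takes this from the scheduler.
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

latents = torch.randn(1, 4, 8, 8, requires_grad=True)  # generator output
t = torch.randint(20, 980, (1,))                        # random timestep

# Add noise; noisy_latents would be fed to both regularizers.
noise = torch.randn_like(latents)
sqrt_ac = alphas_cumprod[t].sqrt().view(-1, 1, 1, 1)
sqrt_1m_ac = (1.0 - alphas_cumprod[t]).sqrt().view(-1, 1, 1, 1)
noisy_latents = sqrt_ac * latents + sqrt_1m_ac * noise

# Stand-ins for the two regularizers' noise predictions (no grad through them).
with torch.no_grad():
    noise_pred_pretrained = torch.randn_like(latents)
    noise_pred_finetuned = torch.randn_like(latents)

w = (1.0 - alphas_cumprod[t]).view(-1, 1, 1, 1)  # w(t) as in ProlificDreamer
grad = w * (noise_pred_pretrained - noise_pred_finetuned)

# Detach-target reformulation: equivalent to SpecifyGradient.
target = (latents - grad).detach()
loss_vsd = 0.5 * F.mse_loss(latents, target, reduction="sum")
loss_vsd.backward()
```

The resulting `loss_vsd` is a scalar and can be summed with the data losses (MSE, LPIPS) for a single `backward()` call.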

@Yangkai-Wei

@zzzzzuber
Have you implemented this training process? I implemented the VSD loss the same way as ProlificDreamer, but after thousands of steps the pseudo loss always ends up in the tens of thousands, and then the images become NaN.

@beyondbatman-master

> @zzzzzuber Have you implemented this training process? I implemented the VSD loss the same way as ProlificDreamer, but after thousands of steps the pseudo loss always ends up in the tens of thousands, and then the images become NaN.

We have met the same problem. Did you solve it?

@zzzzzero

> @zzzzzuber Have you implemented this training process? I implemented the VSD loss the same way as ProlificDreamer, but after thousands of steps the pseudo loss always ends up in the tens of thousands, and then the images become NaN.

> We have met the same problem. Did you solve it?

I use the official training code and meet the same problem. Did anyone solve it?

6 participants