
About the implementation of the method #10

Open
ChenhLiwnl opened this issue Jul 1, 2024 · 1 comment
@ChenhLiwnl

Hello!
I've also emailed you the same question, but you seem to have missed it.
I've read your paper, One-Step Effective Diffusion Network for Real-World Image Super-Resolution, found it very interesting, and tried to reproduce it by following the paper.
However, I found the pseudocode provided in the appendix (Algorithm 1) a little confusing.
Based on my understanding of the paper, the E_\phi and E_\theta in line 2 should be E_\phi' and E_\phi respectively, since E_\phi is the pretrained model and we shouldn't re-initialize it.
Similarly, E_\theta and E_\theta' in line 13 should be E_\phi and E_\phi', and E_\theta' in line 14 should be E_\phi', consistent with the symbols used in Eq. 7.
Could you tell me whether my understanding is correct? Thank you!
Also, I have another question. Is the frozen regularizer used in the VSD loss exactly the pretrained model, i.e., SD2.1? And is the trainable regularizer initialized from the pretrained model with LoRA? If so, I would expect the VSD loss to be almost 0 at the beginning of training.
I am not sure whether my understanding is correct; please correct me if not.

@theEricMa

Hi, thanks for your interest in our work and for your question. I'm the third author of the paper; let me address your questions.

  1. Thank you for pointing that out. In line 13, E_\theta and E_\theta' should indeed be E_\phi and E_\phi', and in line 14, E_\theta' should be E_\phi'. We will correct this in the next version.

  2. Yes, the frozen regularizer used in the VSD loss is indeed the pre-trained model, specifically SD2.1-base, and the trainable regularizer is initialized from the pre-trained model with LoRA. However, the gradient is not zero at the start of training. Following the official VSD implementation, the classifier-free guidance (CFG) scale for the pre-trained regularizer is set greater than 1, typically to 7.5, as in text-to-image generation, while the CFG scale for the fine-tuned regularizer is set to 1. This asymmetry makes the VSD loss effective even at the beginning of training. Our experiments show that setting the CFG scale to 7.5 for both regularizers does not yield results as good as following VSD's implementation.
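To illustrate point 2, here is a toy NumPy sketch (not the authors' code; all names and values are illustrative) of why the asymmetric CFG scales keep the VSD gradient nonzero at initialization, even though the LoRA regularizer starts out identical to the pre-trained one:

```python
import numpy as np

def cfg_pred(eps_cond, eps_uncond, scale):
    # Classifier-free guidance: extrapolate from the unconditional
    # prediction toward the conditional one by the guidance scale.
    return eps_uncond + scale * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
eps_cond = rng.normal(size=4)    # stand-in for a text-conditional noise prediction
eps_uncond = rng.normal(size=4)  # stand-in for the unconditional prediction

# At initialization the LoRA regularizer equals the pre-trained model,
# so both networks emit the same raw predictions. The VSD gradient is
# (pretrained CFG prediction) - (LoRA CFG prediction); with CFG scales
# 7.5 vs 1.0 it reduces to 6.5 * (eps_cond - eps_uncond), which is nonzero.
vsd_grad = cfg_pred(eps_cond, eps_uncond, 7.5) - cfg_pred(eps_cond, eps_uncond, 1.0)
```

With equal CFG scales on both regularizers, the same difference would collapse to zero at initialization, which matches the observation that the asymmetric setting is what makes the loss informative early in training.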

I hope this clarifies your questions!
