
About the implementation of the method #10

Open
ChenhLiwnl opened this issue Jul 1, 2024 · 1 comment
@ChenhLiwnl

Hello!
I've also emailed you the same question, but you seem to have missed it.
I've read your paper, One-Step Effective Diffusion Network for Real-World Image Super-Resolution, found it very interesting, and tried to reproduce it by following the paper.
However, I found the pseudocode provided in the appendix (Algorithm 1) a little confusing.
Based on my understanding of the paper, the E_\phi and E_\theta in line 2 should be E_\phi' and E_\phi respectively, since E_\phi is the pretrained model and we shouldn't re-initialize it.
Similarly, E_\theta and E_\theta' in line 13 should be E_\phi and E_\phi', and E_\theta' in line 14 should be E_\phi', consistent with the symbols used in Eq. 7.
Could you tell me whether my understanding is correct? Thank you!
Also, I have another question. Is the frozen regularizer used in the VSD loss exactly the pretrained model, i.e., SD2.1? And is the trainable regularizer initialized from the pretrained model with LoRA? If so, I would expect the VSD loss to be almost 0 at the beginning of training.
I am not sure whether my understanding is correct; please correct me if not.

@theEricMa

Hi, thanks for your interest in our work and for your question. I'm the third author of the paper; let me address your questions.

  1. Thank you for pointing that out. In line 13, E_\theta and E_\theta' should indeed be E_\phi and E_\phi', and in line 14, E_\theta' should be E_\phi'. We will correct this in the next version.

  2. Yes, the frozen regularizer used in the VSD loss is indeed the pre-trained model, specifically SD2.1-base, and the trainable regularizer is initialized from the pre-trained model with LoRA. However, the gradient is not zero at the start of training. Following the official VSD implementation, the classifier-free guidance (CFG) scale for the pre-trained regularizer is set greater than 1, typically to 7.5, as in text-to-image generation, while the CFG scale for the fine-tuned regularizer is set to 1. This asymmetry makes the VSD loss effective even at the beginning of training. Our experiments show that setting the CFG scale to 7.5 for both regularizers does not yield results as good as following VSD's implementation.
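To illustrate point 2, here is a toy NumPy sketch (not the authors' code; all names and values are illustrative) of why the asymmetric CFG scales keep the VSD gradient nonzero at initialization, even though the LoRA regularizer starts out identical to the pre-trained one:

```python
import numpy as np

def cfg_pred(eps_cond, eps_uncond, scale):
    # Classifier-free guidance: extrapolate from the unconditional
    # prediction toward the conditional one by the guidance scale.
    return eps_uncond + scale * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
eps_cond = rng.normal(size=4)    # stand-in for a text-conditional noise prediction
eps_uncond = rng.normal(size=4)  # stand-in for the unconditional prediction

# At initialization the LoRA regularizer equals the pre-trained model,
# so both networks emit the same raw predictions. The VSD gradient is
# (pretrained CFG prediction) - (LoRA CFG prediction); with CFG scales
# 7.5 vs 1.0 it reduces to 6.5 * (eps_cond - eps_uncond), which is nonzero.
vsd_grad = cfg_pred(eps_cond, eps_uncond, 7.5) - cfg_pred(eps_cond, eps_uncond, 1.0)
```

With equal CFG scales on both regularizers, the same difference would collapse to zero at initialization, which matches the observation that the asymmetric setting is what makes the loss informative early in training.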

I hope this clarifies your questions!
