Hello!
I also emailed you the same question, but it seems you missed it.
I've read your paper One-Step Effective Diffusion Network for Real-World Image Super-Resolution; I found it very interesting and tried to reproduce it following the paper.
However, I found the pseudo-code provided in the appendix (Algorithm 1) a bit confusing.
Based on my understanding of the paper, the E_\phi and E_\theta in line 2 should be E_\phi' and E_\phi respectively, since E_\phi is the pretrained model and we shouldn't re-initialize it.
Similarly, E_\theta and E_\theta' in line 13 should be E_\phi and E_\phi', and E_\theta' in line 14 should be E_\phi', which is consistent with the symbols used in Eq. 7.
Could you confirm whether this understanding is correct? Thank you!
Also, I have another question: is the frozen regularizer used in the VSD loss exactly the pretrained model, i.e., SD2.1? And is the trainable regularizer initialized from the pretrained model with LoRA? If so, I'd expect the VSD loss to be almost 0 at the beginning of training.
I am not sure if my understanding is correct, please correct me.
Hi, thanks for your interest in our work and your question. I'm the third author of the paper, let me address your questions.
Thank you for pointing that out. In line 13, E_\theta and E_\theta' should indeed be E_\phi and E_\phi', and in line 14, E_\theta' should be E_\phi'. We will correct this in the next version.
Yes, the frozen regularizer used in the VSD loss is indeed the pre-trained model, specifically SD2.1-base, and the trainable regularizer is initialized from the pre-trained model with LoRA. However, the gradient is not zero at the start of training. Following the official VSD implementation, the classifier-free guidance (cfg) scale for the pre-trained regularizer is set greater than 1, typically 7.5, as in text-to-image generation, while the cfg scale for the fine-tuned regularizer is set to 1. This difference makes the VSD loss effective even at the beginning of training. In our experiments, setting cfg to 7.5 for both the pre-trained and fine-tuned regularizers did not yield results as good as following VSD's implementation.
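To make the cfg argument concrete, here is a minimal numerical sketch (not the authors' implementation; the array shapes and names are illustrative assumptions). Even when the LoRA regularizer starts as an exact copy of the pre-trained model, the two produce different outputs because the pre-trained branch applies cfg = 7.5 while the LoRA branch uses cfg = 1, so the score-difference term that drives the VSD gradient is nonzero at step 0:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the noise predictions of the (shared) initial network on a
# noisy latent z_t: conditional (with the text prompt) and unconditional.
eps_cond = rng.standard_normal((4, 8, 8))
eps_uncond = rng.standard_normal((4, 8, 8))

def guided(eps_c, eps_u, cfg):
    """Classifier-free guidance: eps_u + cfg * (eps_c - eps_u)."""
    return eps_u + cfg * (eps_c - eps_u)

# Frozen pre-trained regularizer: cfg = 7.5, as in text-to-image generation.
eps_pretrained = guided(eps_cond, eps_uncond, cfg=7.5)

# LoRA-fine-tuned regularizer: cfg = 1.0. At initialization it produces the
# same conditional prediction eps_cond as the pre-trained model.
eps_lora = guided(eps_cond, eps_uncond, cfg=1.0)  # == eps_cond

# Score-difference term of the VSD gradient.
# Algebraically this equals 6.5 * (eps_cond - eps_uncond), hence nonzero.
vsd_grad = eps_pretrained - eps_lora

print(np.abs(vsd_grad).mean() > 0)  # prints True
```

With cfg = 7.5 on both branches instead, the two guided predictions would cancel exactly at initialization and the gradient would vanish, which matches the intuition in the original question.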