An error when using a different noise schedule #15

Closed
KevinWang676 opened this issue May 29, 2023 · 15 comments

@KevinWang676

Hi, I got an error, "ValueError: only one element tensors can be converted to Python scalars", when I tried to use a different noise schedule for $\xi$. I want $\gamma_{0}, \cdots, \gamma_{T}$ to take different values as $t$ increases, but it seems that I can't simply introduce a new parameter `new_noise` that depends on $t$. The error is shown below. Could you help me resolve this issue? Thank you!

[screenshot of the error traceback]

@forever208 (Owner)

Print out the dimensions of each variable; you will then figure it out.

@KevinWang676 (Author)

Thank you. Also, I wonder if there is any specific reason why you chose $\gamma_t$ to be 0.1 or 0.15, which is a relatively small number. Does the choice of $\gamma_t$ come from your experiments or from some theoretical result? Thanks!

@forever208 (Owner)

@KevinWang676 From our experiments; it is written in our paper.

@KevinWang676 (Author)

Thanks. In the paper you mentioned that to select a constant $\gamma$ you "search on a small range of values". Is it possible that you missed some "big" values of $\gamma$, say $\gamma > 1$, that may also lead to good results?

@forever208 (Owner)

@KevinWang676 Too strong a regularization would break down the original noise prediction.

@KevinWang676 (Author)

Thanks! I wonder if the code `new_noise = noise + gamma * th.randn_like(noise)` is essentially the same as `noise = torch.randn_like(latents) + 0.1 * torch.randn(latents.shape[0], latents.shape[1], 1, 1)` proposed in the Diffusion With Offset Noise blog post.
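For comparison, here is a small sketch of the two constructions (NumPy as a stand-in for the torch calls above; shapes and `gamma` are illustrative, not the repos' actual code). The key structural difference: DDPM-IP's extra term is an independent sample per pixel, while offset noise draws one extra sample per (batch, channel) and broadcasts it over the spatial dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
B, C, H, W = 4, 3, 8, 8   # illustrative batch/channel/spatial sizes
gamma = 0.1

noise = rng.standard_normal((B, C, H, W))

# DDPM-IP style: the perturbation is i.i.d. across every pixel.
new_noise = noise + gamma * rng.standard_normal((B, C, H, W))

# Offset-noise style: one extra sample per (batch, channel), broadcast
# over H and W, so every pixel in a channel shares the same offset.
offset = gamma * rng.standard_normal((B, C, 1, 1))
offset_noise = noise + offset

# Both have per-pixel variance 1 + gamma^2, but offset noise is
# spatially correlated within each channel; DDPM-IP's term is not.
```

So the two snippets match in per-pixel marginal statistics but differ in spatial correlation structure.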

@forever208 (Owner)

refer to this

@KevinWang676 (Author)

That makes sense, thank you. Also, in your paper you mentioned that $\gamma = 0.1$ is the best value for the CIFAR-10 dataset, but in your repo you used $\gamma = 0.15$ for CIFAR-10. Is there any reason for doing so? Thanks.

@forever208 (Owner)

forever208 commented May 31, 2023

For most datasets, we find gamma = 0.1 is a good option using the ADM code. For CIFAR-10, gamma = 0.15 actually works better than gamma = 0.1 in my recent experiments. Overall, you can try gamma between 0.1 and 0.15 to find the optimal value for your own dataset.
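As a rough sketch of where a chosen `gamma` enters the perturbed forward process (NumPy stand-ins with illustrative shapes and `alpha_bar_t`; not the repo's actual code):

```python
import numpy as np

rng = np.random.default_rng(1)
gamma = 0.1            # try values in the 0.1-0.15 range, per the discussion above
alpha_bar_t = 0.5      # illustrative cumulative alpha at step t

x0 = rng.standard_normal((2, 3, 8, 8))   # stand-in for clean images
eps = rng.standard_normal(x0.shape)      # noise the model is trained to predict
xi = rng.standard_normal(x0.shape)       # extra input-perturbation noise

# Perturbed network input y_t; only the input changes, the loss is
# still taken against eps.
y_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1 - alpha_bar_t) * (eps + gamma * xi)
```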

@KevinWang676 (Author)

KevinWang676 commented Jun 1, 2023

Got it, thank you! Could you explain to me why $\mathbf{y}_t = \sqrt{\bar{\alpha}_t}\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,(\boldsymbol{\epsilon} + \gamma_t \boldsymbol{\xi})$ would lead to better results than $\mathbf{y}_t = \sqrt{\bar{\alpha}_t}\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\sqrt{1+\gamma^{2}}\,\boldsymbol{\epsilon}^{\prime}$? I'm a little confused about it since they actually have the same distribution.

@KevinWang676 (Author)

KevinWang676 commented Jun 1, 2023

I think the reason is that $\boldsymbol{\epsilon}^{\prime}$ is multiplied by an extra factor $\sqrt{1+\gamma^{2}}$, which is greater than $1$, and this makes the prediction less accurate. Am I right? Thanks.
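A quick Monte-Carlo check of the premise: both forms are sums of independent zero-mean Gaussians, so each has marginal distribution $\mathcal{N}(0,\, 1+\gamma^2)$ (sample size and `gamma` here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
gamma, n = 0.1, 1_000_000

# eps + gamma * xi: sum of two independent Gaussians.
a = rng.standard_normal(n) + gamma * rng.standard_normal(n)

# sqrt(1 + gamma^2) * eps': a single rescaled Gaussian.
b = np.sqrt(1 + gamma**2) * rng.standard_normal(n)

# Both empirical variances should be close to 1 + gamma^2.
assert abs(a.var() - (1 + gamma**2)) < 1e-2
assert abs(b.var() - (1 + gamma**2)) < 1e-2
```

This confirms the distributions match; the difference between DDPM-IP and DDPM-y must therefore lie elsewhere (as the reply below notes, in the training target).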

@forever208 (Owner)

> Got it, thank you! Could you explain to me why $\mathbf{y}_t = \sqrt{\bar{\alpha}_t}\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,(\boldsymbol{\epsilon} + \gamma_t \boldsymbol{\xi})$ would lead to better results than $\mathbf{y}_t = \sqrt{\bar{\alpha}_t}\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\sqrt{1+\gamma^{2}}\,\boldsymbol{\epsilon}^{\prime}$? I'm a little confused about it since they actually have the same distribution.

DDPM-IP and DDPM-y share the same input $\mathbf{y}_t$, but they have different training targets.

@KevinWang676 (Author)

KevinWang676 commented Jun 13, 2023

Thanks! But I wonder how to determine which term is the training target, because in the expression $\mathbf{y}_t = \sqrt{\bar{\alpha}_t}\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,(\boldsymbol{\epsilon} + \gamma_t \boldsymbol{\xi})$ the term $\boldsymbol{\epsilon}$ seems to have the same contribution as $\boldsymbol{\xi}$. How can we know $\boldsymbol{\epsilon}$ is actually the training target rather than $\boldsymbol{\xi}$? Thank you.

@forever208 (Owner)

> Thanks! But I wonder how to determine which term is the training target, because in the expression $\mathbf{y}_t = \sqrt{\bar{\alpha}_t}\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,(\boldsymbol{\epsilon} + \gamma_t \boldsymbol{\xi})$ the term $\boldsymbol{\epsilon}$ seems to have the same contribution as $\boldsymbol{\xi}$. How can we know $\boldsymbol{\epsilon}$ is actually the training target rather than $\boldsymbol{\xi}$? Thank you.

The training target is determined by your loss function.
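A minimal sketch of that point (NumPy stand-ins; `model_out` is a hypothetical placeholder for the network output, not the repo's code). The two parameterizations can share the same input, yet regress different targets depending only on what the loss compares against:

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays."""
    return float(((a - b) ** 2).mean())

rng = np.random.default_rng(3)
gamma = 0.1
eps = rng.standard_normal((2, 3, 8, 8))       # the "clean" noise term
xi = rng.standard_normal(eps.shape)           # the perturbation term
model_out = rng.standard_normal(eps.shape)    # stand-in for the network output

# DDPM-IP: the input is perturbed by gamma * xi, but the loss still
# regresses eps alone -- so eps is the training target and xi acts
# purely as input noise.
loss_ip = mse(model_out, eps)

# DDPM-y: regressing the full injected noise would instead make
# (eps + gamma * xi) the training target.
loss_y = mse(model_out, eps + gamma * xi)
```

So even though $\boldsymbol{\epsilon}$ and $\boldsymbol{\xi}$ enter $\mathbf{y}_t$ symmetrically, only the term placed in the loss becomes the target.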

@KevinWang676 (Author)

Got it, thank you.
