
What about upsampling? #28

Closed
chanshing opened this issue Jan 30, 2020 · 12 comments

@chanshing

Thank you for your awesome work. I have a question:
What if I want to perform upsampling instead of downsampling? As I understand it, aliasing is a problem only when downsampling. But I came across the paper "A Style-Based Generator Architecture for Generative Adversarial Networks" (StyleGAN), where they also blur during upsampling, citing your work. There, the blur is applied after upsampling (instead of before, as in downsampling). Could you comment on that?

I intend to apply your technique in a VAE or GAN, and I would like to know whether I should include the blur in the decoder/generator.

@richzhang
Contributor

Thanks for the question! Yes, when upsampling, blurring should be applied after. When downsampling, it should be applied before. In both cases, it is applied at the higher resolution.

To fully understand why, I recommend taking a signal processing course and seeing what these operations are actually doing in the Fourier domain.
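To make the ordering concrete, here is a minimal PyTorch sketch of both cases (the function names and the 3x3 Tri-3 blur choice are illustrative, not code from this repo):

```python
import torch
import torch.nn.functional as F

def tri3_kernel(channels):
    # Normalized 2-D Tri-3 ([1, 2, 1]) blur, replicated per channel for a
    # depthwise (groups=channels) convolution; shape (channels, 1, 3, 3)
    k = torch.tensor([1., 2., 1.])
    k2 = k[:, None] * k[None, :]
    return (k2 / k2.sum()).repeat(channels, 1, 1, 1)

def blur_then_downsample(x, stride=2):
    # Downsampling: blur BEFORE subsampling; both happen at the higher
    # resolution, fused here into a single strided convolution
    c = x.shape[1]
    return F.conv2d(x, tri3_kernel(c), stride=stride, padding=1, groups=c)

def upsample_then_blur(x, stride=2):
    # Upsampling: blur AFTER zero-insertion; conv_transpose2d fuses the two.
    # The stride**2 gain compensates for the zeros inserted between samples
    c = x.shape[1]
    w = tri3_kernel(c) * stride ** 2
    return F.conv_transpose2d(x, w, stride=stride, padding=1,
                              output_padding=1, groups=c)
```

With stride=2, a delta input comes back as the [0.5, 1, 0.5] tent centered on the corresponding output pixel, i.e. bilinear-style upsampling.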

@chanshing
Author

I see... The more I think about it, the more it makes sense. I will follow your recommendation. Thanks for the quick reply.

@chanshing chanshing reopened this Jan 31, 2020
@chanshing
Author

chanshing commented Jan 31, 2020

May I ask one more question? I read your previous reply related to this,

#8 (comment)

Thanks for the question!

Yes, upsample with ConvTranspose2d with groups=in_channels, and then use [1, 1] or [1, 2, 1] as your weights.

For TensorFlow, perhaps you can refer to the StyleGan2 code/paper, which does smoothed upsampling as well.

I want to check whether I understand the PyTorch implementation correctly. For downsampling we do:
Conv(stride=1), ReLU, Conv(stride=k, fixed blur weight)

For upsampling we do:
Conv(stride=1), ReLU, ConvTransposed(stride k, fixed blur weight)

Is this correct?

@richzhang
Contributor

Yes, that's correct. For upsampling, it's not relevant what happens before the convtranspose layer. Just make sure you blur after upsampling. So I would phrase it as:

Whatever, ConvTransposed(stride k, fixed blur weight)
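As a module, the quoted recipe (fixed weights, groups=in_channels) might be sketched like this; the class name and the 3x3 Tri-3 kernel are my assumptions, not code from the repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurUpsample(nn.Module):
    """Upsample by `stride` with a fixed Tri-3 blur (no learned weights)."""
    def __init__(self, channels, stride=2):
        super().__init__()
        self.stride = stride
        self.channels = channels
        k = torch.tensor([1., 2., 1.])
        k2 = k[:, None] * k[None, :]
        k2 = k2 / k2.sum() * stride ** 2  # gain compensates for zero-insertion
        # A buffer is saved/moved with the model but never trained
        self.register_buffer('weight', k2.repeat(channels, 1, 1, 1))

    def forward(self, x):
        # groups=channels applies the same single-channel blur to each channel
        return F.conv_transpose2d(x, self.weight, stride=self.stride,
                                  padding=1, output_padding=1,
                                  groups=self.channels)
```

Registering the kernel as a buffer (rather than a parameter) is what keeps the blur weights fixed during training.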

@chanshing
Author

chanshing commented Feb 4, 2020

@richzhang Thank you for clarifying. Please if you don't mind I have two more questions:

  1. In practice, would you first upsample then conv+relu, or first conv+relu then upsample? In other words, ConvTransposed(stride=k, fixed blur weight), Conv(stride=1), ReLU or Conv(stride=1), ReLU, ConvTransposed(stride=k, fixed blur weight)? Many papers seem to go for the first option, perhaps influenced by https://distill.pub/2016/deconv-checkerboard/

  2. You mention in the paper

Note that applying the Rect-2 and Tri-3 filters
while upsampling correspond to “nearest” and “bilinear”
upsampling, respectively.

I think this is true only for strides of 2 (an upsampling factor of 2). This is mostly the case anyway, so you probably omitted it, but I just want to confirm, since others might use larger factors/strides.

@richzhang
Contributor

richzhang commented Feb 4, 2020

Do Whatever, ConvTransposed(stride k, fixed blur weight), Whatever. As long as there is blurring directly after upsampling, you are antialiasing.

Yes, you're right about the stride.
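The factor-2 caveat is easy to check in 1-D: a transposed convolution with the Rect-2 ([1, 1]) kernel reproduces nearest-neighbor upsampling at stride 2 but leaves gaps at stride 3 (a quick illustrative sketch, not from the repo):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[1., 2., 3.]]])   # (batch, channels, width)
rect2 = torch.tensor([[[1., 1.]]])   # Rect-2 kernel, shape (1, 1, 2)

# Stride 2: zero-insert then blur with [1, 1] == repeat each sample == 'nearest'
up2 = F.conv_transpose1d(x, rect2, stride=2)
nearest = F.interpolate(x, scale_factor=2, mode='nearest')
print(up2)                          # tensor([[[1., 1., 2., 2., 3., 3.]]])
print(torch.equal(up2, nearest))    # True

# Stride 3: the 2-tap kernel no longer covers every output position
up3 = F.conv_transpose1d(x, rect2, stride=3)
print(up3)                          # tensor([[[1., 1., 0., 2., 2., 0., 3., 3.]]])
```

At stride 3 the zeros between samples survive, so a Rect-2 blur is no longer equivalent to nearest-neighbor upsampling; you would need a wider kernel.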

@HamsterHuey

HamsterHuey commented Mar 27, 2020

@richzhang - I was trying to implement upsampling with ConvTranspose2d and fixed weights, and it appears that PyTorch's nn.functional.upsample uses the [1., 3., 3., 1.] kernel for 'bilinear' upsampling rather than the Tri [1., 2., 1.] kernel you mention in the paper (and above).

Here is a simple reproducible code block that seems to confirm this for me:

import torch
import numpy as np

# Set up an image of zeros with a 2x2 block of ones in the center
a = np.zeros((6, 6))
a[2:4, 2:4] = 1
a_t = torch.from_numpy(a[None, None, :, :])

# Set up the filter: outer product of the 1-D [1, 3, 3, 1] kernel
filt_1d = np.array([1., 3., 3., 1.])
filt_2d = filt_1d[None, :] * filt_1d[:, None]

# Transposed-convolution upsampling (gain of 4 compensates for stride 2)
filt = 4 * torch.from_numpy(filt_2d[None, None, :, :] / filt_2d.sum())
conv = torch.nn.functional.conv_transpose2d(a_t, filt, stride=2, padding=1)
conv_np = conv.numpy().squeeze()

# PyTorch built-in upsampling (F.upsample is a deprecated alias of F.interpolate)
upsample = torch.nn.functional.upsample(a_t, scale_factor=2, mode='bilinear')

# Check that they match
upsample_np = upsample.numpy().squeeze()
np.testing.assert_allclose(upsample_np, conv_np)  # passes: the outputs are identical

Am I doing something wrong here?

@richzhang
Contributor

Yes, you started with a 2x2 block (a[2:4, 2:4] = 1), so you effectively convolved a [1 1] with a [1 2 1] to get [1 3 3 1]. Use a delta function, a[2, 2] = 1, instead.

@HamsterHuey

Thanks for the quick reply! I am treating a_t as a rough mockup of the input image to be upsampled. But even if I change the code above to set a[2, 2] = 1 instead, the results only match with the [1 3 3 1] filter, not [1 2 1].

@richzhang
Contributor

richzhang commented Apr 8, 2020

Thanks, got it. I'm not sure the PyTorch implementation is correct. If the signal is properly upsampled, you should be able to recover the original input by simple subsampling: upsample[:,:,::2,::2].

Here's the operation in scipy, which is consistent with the [1, 2, 1] filter.

import numpy as np
import scipy.interpolate

a = np.zeros((5,5))
a[2,2] = 1

# Evaluate the linear interpolant on a half-step grid (2x upsampling)
sample = np.arange(0,5,.5)
print(scipy.interpolate.interp2d(np.arange(5), np.arange(5), a, kind='linear')(sample, sample))

Output

[[0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.25 0.5  0.25 0.   0.   0.   0.  ]
 [0.   0.   0.   0.5  1.   0.5  0.   0.   0.   0.  ]
 [0.   0.   0.   0.25 0.5  0.25 0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]
 [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.  ]]

@richzhang
Contributor

richzhang commented Apr 8, 2020

Okay, I understand now.

[0 1 0 0] should be upsampled to [0 0.5 1.0 0.5 0 0 0 0]. PyTorch then shifts over by a half index and re-interpolates to [0 .25 .75 .75 .25 0 0 0].

I wouldn't have done that half-index shift.
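A 1-D sketch of the two behaviors (assuming align_corners=False, PyTorch's default half-pixel convention):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[0., 1., 0., 0.]]])

# PyTorch bilinear: half-pixel offset -> the delta is smeared and shifted
torch_up = F.interpolate(x, scale_factor=2, mode='linear', align_corners=False)
print(torch_up)   # [0, 0.25, 0.75, 0.75, 0.25, 0, 0, 0]

# Unshifted upsampling: zero-insert then blur with the [0.5, 1, 0.5] tent
tri = torch.tensor([[[0.5, 1.0, 0.5]]])
conv_up = F.conv_transpose1d(x, tri, stride=2, padding=1, output_padding=1)
print(conv_up)    # [0, 0.5, 1.0, 0.5, 0, 0, 0, 0]

# Subsampling-recovery check: only the unshifted version returns the input
print(torch.equal(conv_up[:, :, ::2], x))   # True
print(torch.equal(torch_up[:, :, ::2], x))  # False
```

So the transposed-convolution version passes the subsampling check described above, while the half-index shift in PyTorch's interpolation breaks it.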

@HamsterHuey

Thanks @richzhang. It does appear that PyTorch is doing something odd with its upsampling. By the way, I really enjoyed reading the paper and the experiments you laid out in it 👍
