
Pre-Activation ResNet (ResNet-V2) implementation error leads to NaN loss #137

Open
htwang14 opened this issue Jun 22, 2021 · 1 comment

htwang14 commented Jun 22, 2021

Hi Kuang,

I think a batch normalization layer is missing after the last residual block and before the pooling/fully-connected layer in the current pre-activation ResNet implementation.

Something like out = F.relu(self.bn(out), inplace=True) should be inserted there, where self.bn = nn.BatchNorm2d(512) is defined in the __init__ function, as done in other public implementations such as this one.
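For concreteness, here is a minimal self-contained sketch of the proposed change, assuming the block/layer layout of the repository's preact_resnet.py (class and method names beyond the two lines quoted above are assumptions based on that layout); the added lines are marked "proposed fix":

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PreActBlock(nn.Module):
    '''Pre-activation version of the BasicBlock (BN -> ReLU -> conv).'''
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(PreActBlock, self).__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        if stride != 1 or in_planes != self.expansion * planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion * planes,
                          kernel_size=1, stride=stride, bias=False))

    def forward(self, x):
        out = F.relu(self.bn1(x))
        shortcut = self.shortcut(out) if hasattr(self, 'shortcut') else x
        out = self.conv1(out)
        out = self.conv2(F.relu(self.bn2(out)))
        return out + shortcut


class PreActResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(PreActResNet, self).__init__()
        self.in_planes = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1,
                               padding=1, bias=False)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        # Proposed fix: final pre-activation BN before pooling / the classifier
        # (equivalent to nn.BatchNorm2d(512) for the basic block, expansion == 1).
        self.bn = nn.BatchNorm2d(512 * block.expansion)
        self.linear = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for s in strides:
            layers.append(block(self.in_planes, planes, s))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        # Proposed fix: BN + ReLU after the last residual block, so its output
        # is normalized before average pooling and the fully connected layer.
        out = F.relu(self.bn(out), inplace=True)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        return self.linear(out)


def PreActResNet18(num_classes=10):
    return PreActResNet(PreActBlock, [2, 2, 2, 2], num_classes)


if __name__ == '__main__':
    y = PreActResNet18()(torch.randn(2, 3, 32, 32))
    print(y.shape)  # torch.Size([2, 10])
```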

The current pre-activation ResNet cannot converge on CIFAR-10. I tried the SGD optimizer with an initial learning rate of 0.1 on PreActResNet18, and the loss goes to NaN within the first few iterations.
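A minimal sketch of the reported setup, for reproduction (only SGD with learning rate 0.1 is stated in this issue; momentum, weight decay, batch size, and the normalization constants are assumptions, and PreActResNet18 refers to the sketch above):

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True)

net = PreActResNet18()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)

for i, (inputs, targets) in enumerate(trainloader):
    optimizer.zero_grad()
    loss = criterion(net(inputs), targets)
    loss.backward()
    optimizer.step()
    print(i, loss.item())  # without the fix, this reportedly becomes NaN early on
    if i == 20:
        break
```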

Fixing the code in the proposed way leads to good convergence.

:-)

Thanks,
Haotao

moyix added a commit to moyix/pytorch-cifar that referenced this issue Dec 8, 2021
@Zhidong-Gao

It may be due to the fact that there are two consecutive batch normalization layers.

You already have a batch normalization layer (self.bn1) after self.conv1, while in self.layer1, self._make_layer() creates another batch normalization layer right after self.bn1. Two consecutive batch normalization layers easily lead to NaN.

Removing self.bn1 should make it work properly. :-)
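A minimal sketch of this alternative, assuming a variant of the network whose __init__ defines self.bn1 = nn.BatchNorm2d(64) and whose forward applies it right after self.conv1 (attribute names follow the comment above; the rest of the forward pass is assumed to match the sketch earlier in this thread):

```python
import torch.nn.functional as F

# Sketch of the variant's forward() after removing the stem BN. Previously it did
#     out = F.relu(self.bn1(self.conv1(x)))   # stem BN/ReLU, immediately followed
#     out = self.layer1(out)                  # by the first block's own bn1
# which stacks two batch normalizations back to back.
def forward(self, x):
    out = self.conv1(x)      # no stem BN/ReLU; the first block's pre-activation
    out = self.layer1(out)   # BN now normalizes the stem output
    out = self.layer2(out)
    out = self.layer3(out)
    out = self.layer4(out)
    out = F.avg_pool2d(out, 4)
    out = out.view(out.size(0), -1)
    return self.linear(out)
```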
