
Pre-Activation ResNet (ResNet-V2) implementation error leads to NaN loss #137

Open
htwang14 opened this issue Jun 22, 2021 · 1 comment

htwang14 commented Jun 22, 2021

Hi Kuang,

I think a batch normalization layer is missing after the last residual block and before the pooling/fully-connected layer in the current pre-activation ResNet implementation.

Something like out = F.relu(self.bn(out), inplace=True) should be inserted there, where self.bn = nn.BatchNorm2d(512) is defined in the __init__ function, as done in other public implementations such as this one.
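For concreteness, here is a minimal self-contained sketch of the proposed change, assuming the block/layer layout of the repository's preact_resnet.py (class and method names beyond the two lines quoted above are assumptions based on that layout); the added lines are marked "proposed fix":

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PreActBlock(nn.Module):
    '''Pre-activation version of the BasicBlock (BN -> ReLU -> conv).'''
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(PreActBlock, self).__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        if stride != 1 or in_planes != self.expansion * planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion * planes,
                          kernel_size=1, stride=stride, bias=False))

    def forward(self, x):
        out = F.relu(self.bn1(x))
        shortcut = self.shortcut(out) if hasattr(self, 'shortcut') else x
        out = self.conv1(out)
        out = self.conv2(F.relu(self.bn2(out)))
        return out + shortcut


class PreActResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(PreActResNet, self).__init__()
        self.in_planes = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1,
                               padding=1, bias=False)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        # Proposed fix: final pre-activation BN before pooling / the classifier
        # (equivalent to nn.BatchNorm2d(512) for the basic block, expansion == 1).
        self.bn = nn.BatchNorm2d(512 * block.expansion)
        self.linear = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for s in strides:
            layers.append(block(self.in_planes, planes, s))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        # Proposed fix: BN + ReLU after the last residual block, so its output
        # is normalized before average pooling and the fully connected layer.
        out = F.relu(self.bn(out), inplace=True)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        return self.linear(out)


def PreActResNet18(num_classes=10):
    return PreActResNet(PreActBlock, [2, 2, 2, 2], num_classes)


if __name__ == '__main__':
    y = PreActResNet18()(torch.randn(2, 3, 32, 32))
    print(y.shape)  # torch.Size([2, 10])
```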

The current pre-activation ResNet cannot converge on CIFAR-10. I tried the SGD optimizer with an initial learning rate of 0.1 on PreActResNet18, and the loss goes to NaN within the first few iterations.
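A minimal sketch of the reported setup, for reproduction (only SGD with learning rate 0.1 is stated in this issue; momentum, weight decay, batch size, and the normalization constants are assumptions, and PreActResNet18 refers to the sketch above):

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True)

net = PreActResNet18()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)

for i, (inputs, targets) in enumerate(trainloader):
    optimizer.zero_grad()
    loss = criterion(net(inputs), targets)
    loss.backward()
    optimizer.step()
    print(i, loss.item())  # without the fix, this reportedly becomes NaN early on
    if i == 20:
        break
```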

Fixing the code in the proposed way leads to good convergence.

:-)

Thanks,
Haotao

moyix added a commit to moyix/pytorch-cifar that referenced this issue Dec 8, 2021
@Zhidong-Gao

It may be due to the fact that there are two consecutive batch normalization layers.

You already have a batch normalization layer (self.bn1) after self.conv1, while in self.layer1, self._make_layer() creates another batch normalization layer right after self.bn1. Two consecutive batch normalization layers easily lead to NaN.

Removing self.bn1 should make it work properly. :-)
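A minimal sketch of this alternative, assuming a variant of the network whose __init__ defines self.bn1 = nn.BatchNorm2d(64) and whose forward applies it right after self.conv1 (attribute names follow the comment above; the rest of the forward pass is assumed to match the sketch earlier in this thread):

```python
import torch.nn.functional as F

# Sketch of the variant's forward() after removing the stem BN. Previously it did
#     out = F.relu(self.bn1(self.conv1(x)))   # stem BN/ReLU, immediately followed
#     out = self.layer1(out)                  # by the first block's own bn1
# which stacks two batch normalizations back to back.
def forward(self, x):
    out = self.conv1(x)      # no stem BN/ReLU; the first block's pre-activation
    out = self.layer1(out)   # BN now normalizes the stem output
    out = self.layer2(out)
    out = self.layer3(out)
    out = self.layer4(out)
    out = F.avg_pool2d(out, 4)
    out = out.view(out.size(0), -1)
    return self.linear(out)
```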
