
resnet26 is worse than resnet20? #1

Closed
ZHAIXINGZHAIYUE opened this issue Dec 20, 2019 · 6 comments
Labels: question (Further information is requested)

Comments

@ZHAIXINGZHAIYUE
Using Imagenette: resnet26 is worse than resnet20?

@akshaykulkarni07 (Member)

Which particular experiment are you talking about?

@ZHAIXINGZHAIYUE (Author)

Results using Imagenette:

| Student Model | Validation Accuracy without Teacher (%) | Validation Accuracy with Simultaneous Training (%) | Validation Accuracy with Stagewise Training (%) | Difference between Teacher and Student (for stagewise) (%) |
| --- | --- | --- | --- | --- |
| ResNet10 | 91.8 | 92.2 | 97.4 | 1.8 |
| ResNet14 | 91.2 | 93.2 | 98.8 | 0.4 |
| ResNet18 | 91.4 | 92.4 | 98.8 | 0.4 |
| ResNet20 | 91.6 | 92.4 | 98.8 | 0.4 |
| ResNet26 | 90.6 | 91.8 | 99.0 | 0.2 |

@akshaykulkarni07 (Member) commented Dec 20, 2019

We trained for 100 epochs using the Adam optimizer with LR 1e-4. In that setting, we find ResNet26 marginally worse than ResNet20. Possible reasons:

  • ResNet26 has more parameters, so it tends to overfit when trained with the same amount of data as ResNet20. A similar argument could be made for ResNet20 vs. ResNet18/14, but we don't observe the same drop there. This may be because fewer parameters also imply less learning capacity, so ResNet18/20 may sit in a sort of 'sweet spot' between too few and too many parameters.
  • The first point becomes more apparent when we see that the stagewise training results improve with the number of parameters. In stagewise training there are fewer parameters to optimize in each stage, which improves the overall training because all parameters are not trained together (see the sketch after this list).
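To make the stagewise point concrete, here is a minimal PyTorch sketch of training one stage at a time. The toy two-stage model, the `train_stage` helper, and the MSE feature-matching loss against teacher activations are illustrative assumptions, not this repo's exact code:

```python
import torch
import torch.nn as nn

# Toy two-stage student; stands in for the ResNet blocks
# (shapes and structure are illustrative, not the repo's classes).
student = nn.Sequential(
    nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU()),
    nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU()),
)

def train_stage(model, stage_idx, loader, epochs=100):
    """Optimize only one stage's parameters; everything else stays frozen."""
    for i, stage in enumerate(model):
        for p in stage.parameters():
            p.requires_grad = (i == stage_idx)
    # Only the current stage's parameters are handed to the optimizer,
    # matching the setup described above (Adam, LR 1e-4, 100 epochs).
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=1e-4)
    for _ in range(epochs):
        for x, teacher_feat in loader:  # teacher_feat: matching teacher activations
            optimizer.zero_grad()
            feat = x
            for stage in model[: stage_idx + 1]:  # forward up to the current stage
                feat = stage(feat)
            loss = nn.functional.mse_loss(feat, teacher_feat)
            loss.backward()
            optimizer.step()
```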

akshaykulkarni07 added the question label Dec 20, 2019
@ZHAIXINGZHAIYUE (Author)

@akshaykvnit Thank you very much.

@ZHAIXINGZHAIYUE (Author)

@akshaykvnit Should I fix the running mean and variance in the BN layers of the first stage when I train the second stage?

@akshaykulkarni07 (Member)

Yes, ideally we should freeze all parameters that are not in the stage currently being trained, including the BN parameters. According to this answer, setting requires_grad = False will do the job (as we have done).
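A minimal sketch of that freezing, assuming PyTorch `nn.Module` stages (the `freeze_stage` helper is illustrative). One caveat worth noting: `requires_grad = False` freezes BN's learnable affine parameters, but the running mean/variance are buffers that are still updated during forward passes in train mode, so calling `.eval()` on the frozen modules is also needed if those statistics should stay fixed:

```python
import torch.nn as nn

def freeze_stage(stage: nn.Module) -> None:
    """Freeze every parameter of a stage, including BN weight/bias."""
    for p in stage.parameters():
        p.requires_grad = False
    # requires_grad=False stops gradient updates, but BN running
    # mean/var are buffers refreshed in train mode; eval() fixes them.
    stage.eval()

# Illustrative usage: freeze stage 1 while stage 2 trains.
first_stage = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16))
freeze_stage(first_stage)
```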
