ResNet18 performs much better than expected! #149
Comments
Hello. TL;DR: the longer you train, the higher the reported number, but that number is a validation result, not a test result.

This code has a big issue, which I already raised: it uses the test set as the validation set. What you get is the highest validation accuracy, not a test accuracy. Since there is always some randomness in validation results, the longer you train, the higher the chance that this randomness produces a higher peak. That peak does not reflect what you would get on a never-seen test set, and any scientific publication using this methodology should be immediately retracted. The right way to train the network is to hold out some samples from the training set as a validation set, and to use the test set only once, at the very end of training, instead of using it inside the optimization loop.
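A minimal sketch of the protocol being suggested, holding out part of the CIFAR-10 training set for validation and touching the test set only once (the 45k/5k split, batch sizes, and data path are illustrative, not taken from this repo):

```python
import torch
from torch.utils.data import random_split, DataLoader
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # augmentations omitted for brevity

# 45k/5k split of the official 50k training images; the 10k test set stays untouched.
full_train = datasets.CIFAR10('./data', train=True, download=True, transform=transform)
train_set, val_set = random_split(
    full_train, [45000, 5000], generator=torch.Generator().manual_seed(42))
test_set = datasets.CIFAR10('./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=256)
test_loader = DataLoader(test_set, batch_size=256)

# During training: checkpoint / early-stop on val_loader accuracy only.
# After training is finished: evaluate on test_loader exactly once and report that.
```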
In a different setup, using a different implementation of the same model (ResNet-18 in its CIFAR configuration) and different optimization code, I also reach 95.4–95.5% without using the test set during evaluation (I don't have a script to share, since this is part of a bigger benchmark, but the gist of it is here). I therefore concur that the gap is not due to early stopping on the test set or to retaining the best test accuracy. That said, this specific setup (in particular the weight-decay and learning-rate values) might have been tuned on the test set, which would be problematic, but that is another topic. As for why there is an improvement over the numbers reported in the repo, my guess is that new training strategies were introduced along with newer models, and 93% is what you would get with the old strategy.
This is not the original ResNet18 network. That's why the accuracy is so high: #136 (comment)
@skrbnv the differences pointed out in that comment are just the differences between the CIFAR-10 and the ImageNet versions of the architecture. If you use a ResNet with a strided first conv and the initial MaxPooling, you will not obtain accuracies above 91% (I tried). The main reason is that you lose too much spatial information at the very beginning of the network.
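For reference, a sketch of the two stems being compared, assuming the standard torchvision-style ImageNet stem (only the stem differs; the residual blocks are unchanged):

```python
import torch.nn as nn

# CIFAR-style stem (the kind this repo uses): keeps the full 32x32 resolution.
cifar_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

# ImageNet-style stem: a 32x32 input is reduced to 8x8 before the first
# residual block, discarding most of the spatial information.
imagenet_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)
```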
I ran this repo's ResNet18 on 4 GPUs with the latest PyTorch version and got 95.67% instead of the 93.02% reported in the README table.
I'm wondering whether anything improved in PyTorch that could explain this major jump?
My requirements list:

```
numpy==1.22.3
torch==1.11.0+cu115
torchvision==0.12.0+cu115
-f https://download.pytorch.org/whl/torch_stable.html
```