
The test set is being used as validation set #145

Open
seuretm opened this issue Jan 24, 2022 · 6 comments

Comments

@seuretm commented Jan 24, 2022

The network is evaluated on the test set at every epoch, and whenever the result improves, the network is saved (a form of early stopping). This is what a validation set should be used for; since CIFAR-10 does not ship with a validation set, a subset of the training data can be held out for this purpose. The goal of the test set is to measure how well a network performs on unseen data; here, however, the test set is used to optimize the network's results.

The test set must be used only once, at the end of training. This training procedure is flawed, and the reported results are therefore unfortunately all invalid.
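
To make this concrete, here is a minimal sketch of how a proper split could look (assuming PyTorch and torchvision; the 45,000/5,000 split is an arbitrary choice for illustration):

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()

# Hold out 5,000 of the 50,000 CIFAR-10 training images as a
# validation set; the official test set stays untouched until
# the very end of training.
full_train = datasets.CIFAR10(root='./data', train=True,
                              download=True, transform=transform)
train_set, val_set = random_split(
    full_train, [45000, 5000],
    generator=torch.Generator().manual_seed(42))  # reproducible split

test_set = datasets.CIFAR10(root='./data', train=False,
                            download=True, transform=transform)
```

Early stopping and checkpoint selection would then be driven by `val_set`, and `test_set` would be evaluated a single time after training is complete.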

@zjysteven

Agreed. Even more concerning, many papers now report their performance as the best result obtained on the test set during training.

@wihn2021

But the net is set to eval mode before being tested, and back to train mode before training, using `net.train()` and `net.eval()`.
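
For reference, a small self-contained sketch of what these two calls actually change (standard PyTorch behavior; the toy model here is only for illustration):

```python
import torch
import torch.nn as nn

# Toy model containing the two layer types affected by the mode switch.
net = nn.Sequential(nn.Linear(10, 10), nn.BatchNorm1d(10), nn.Dropout(0.5))

net.train()  # dropout active; batchnorm updates its running statistics
_ = net(torch.randn(4, 10))

net.eval()   # dropout disabled; batchnorm uses stored running statistics
with torch.no_grad():  # gradients are not needed for evaluation
    _ = net(torch.randn(4, 10))
```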

@zjysteven

It is not about batchnorm statistics... It is just that evaluating on the test set to select the best model (e.g., the best checkpoint and hyperparameters) goes against the basic practice/assumptions of machine learning and is not realistic. In the real world, there is no way to obtain the test samples before the model is deployed.
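
For what it's worth, here is a minimal sketch of the correct protocol (the `train_one_epoch` helper, `num_epochs`, and the data loaders are hypothetical stand-ins; `val_loader` would be built from a held-out validation split, never from the test set):

```python
import copy
import torch

def evaluate(net, loader, device='cpu'):
    """Return the classification accuracy of `net` on `loader`."""
    net.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            pred = net(x.to(device)).argmax(dim=1)
            correct += (pred == y.to(device)).sum().item()
            total += y.size(0)
    return correct / total

# Model selection touches only the validation loader...
best_val_acc, best_state = 0.0, None
for epoch in range(num_epochs):          # num_epochs: hypothetical
    train_one_epoch(net, train_loader)   # hypothetical training helper
    val_acc = evaluate(net, val_loader)
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        best_state = copy.deepcopy(net.state_dict())

# ...and the test set is evaluated exactly once, at the very end.
net.load_state_dict(best_state)
print('final test accuracy:', evaluate(net, test_loader))
```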

@wihn2021

🤔 Right. ✌️

@melhzy commented Mar 24, 2022

Yes, it is a big issue. The test set is being used as the validation set, which means the models trained in this framework are tuned to patterns from the test set as well as the training set. Overall, this causes overfitting to the test set.

@yolunghiu commented Oct 11, 2022 via email
