FMoW dataset and results variance #63
teasgen announced in Announcements
Replies: 1 comment
-
Hi, I'm using the DataComp evaluation, and it seems that the FMoW dataset dramatically increases variance. Its main metric is worst-region accuracy. There are 5 regions; 4 of them have more than 700 samples each, but one has only 4 images. That means a single image's prediction can change the FMoW metric from 0 to 0.25, which shifts the 38-dataset suite average by 0.25/38 ≈ 0.0066, i.e. about 0.66 accuracy points. For instance, an average accuracy of 70.0 and an average accuracy of 69.4 may differ by the answer on one picture!
Because it's impossible to improve the dataset, I suggest simply removing this region from the predictions (a sketch of both the sensitivity and the proposed fix is below).
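To make the arithmetic concrete, here is a minimal sketch in plain Python (not the DataComp eval code). The sizes and accuracies of the four large regions are made up; only the 4-image region reflects the observation above.

```python
# Minimal sketch, NOT the DataComp eval code. The four large regions'
# sizes/accuracies are assumed; only the 4-image region is from the
# observation above.

def worst_region_accuracy(correct_by_region):
    """Minimum per-region accuracy across all regions."""
    return min(sum(flags) / len(flags) for flags in correct_by_region.values())

regions = {
    "region_a": [True] * 500 + [False] * 250,  # 750 samples (assumed)
    "region_b": [True] * 520 + [False] * 230,  # 750 samples (assumed)
    "region_c": [True] * 490 + [False] * 260,  # 750 samples (assumed)
    "region_d": [True] * 510 + [False] * 240,  # 750 samples (assumed)
    "tiny":     [False] * 4,                   # the 4-image region
}

before = worst_region_accuracy(regions)  # 0.00: "tiny" is the worst region
regions["tiny"][0] = True                # flip a single prediction
after = worst_region_accuracy(regions)   # 0.25: one image moved the metric by 0.25

print(f"worst-region acc: {before:.2f} -> {after:.2f}")
print(f"shift in the 38-dataset suite average: {(after - before) / 38:.4f}")  # ~0.0066

# Proposed fix: drop the 4-image region before computing the metric
# (the size threshold of 100 is arbitrary, just for illustration).
filtered = {name: flags for name, flags in regions.items() if len(flags) >= 100}
print(f"worst-region acc without the tiny region: {worst_region_accuracy(filtered):.2f}")
```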
-
Hi @teasgen, thanks for the comment. I agree that some datasets in the evaluation suite are a bit noisy. We have some analysis on this in Appendix N, in the "Clean subset" section. We didn't find substantial differences in trends when using a cleaner subset of the datasets, but we did not exclude FMoW for that. This may be something we want to revisit; I'm tagging others here to see if they have thoughts. @sagadre @afang-story @yaircarmon @Vaishaal @ludwigschmidt
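One quick robustness check along the lines of that "Clean subset" analysis could be to recompute the suite average with and without FMoW and see whether the per-model numbers (and their ranking) move. A minimal sketch with made-up scores, not real results:

```python
# Minimal sketch with hypothetical per-dataset scores (not real results):
# does excluding FMoW change the suite average or the model ranking?

scores = {
    "model_a": {"imagenet": 0.70, "fmow": 0.10, "cars": 0.80},
    "model_b": {"imagenet": 0.68, "fmow": 0.35, "cars": 0.79},
}

def suite_average(per_dataset, exclude=()):
    """Average accuracy over all datasets not listed in `exclude`."""
    kept = [acc for name, acc in per_dataset.items() if name not in exclude]
    return sum(kept) / len(kept)

for model, per_dataset in scores.items():
    full = suite_average(per_dataset)
    without = suite_average(per_dataset, exclude={"fmow"})
    print(f"{model}: full={full:.3f}, without FMoW={without:.3f}")
```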