FMoW dataset and results variance #63
teasgen announced in Announcements
Replies: 1 comment
-
Hi, I'm using the DataComp evaluation, and it seems that the FMoW dataset dramatically increases variance. Its main metric is worst-region accuracy. There are 5 regions; 4 of them have more than 700 samples each, but one has only 4 images. That means a single image's prediction can change the FMoW metric from 0 to 0.25, which shifts the 38-dataset suite average by 0.25/38 ≈ 0.0066, i.e. about 0.66 accuracy points. For instance, an average accuracy of 70.0 and an average accuracy of 69.4 may differ by the answer on one picture!
Because it's impossible to improve the dataset, I suggest simply removing this region from the predictions (a sketch of both the sensitivity and the proposed fix is below).
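To make the arithmetic concrete, here is a minimal sketch in plain Python (not the DataComp eval code). The sizes and accuracies of the four large regions are made up; only the 4-image region reflects the observation above.

```python
# Minimal sketch, NOT the DataComp eval code. The four large regions'
# sizes/accuracies are assumed; only the 4-image region is from the
# observation above.

def worst_region_accuracy(correct_by_region):
    """Minimum per-region accuracy across all regions."""
    return min(sum(flags) / len(flags) for flags in correct_by_region.values())

regions = {
    "region_a": [True] * 500 + [False] * 250,  # 750 samples (assumed)
    "region_b": [True] * 520 + [False] * 230,  # 750 samples (assumed)
    "region_c": [True] * 490 + [False] * 260,  # 750 samples (assumed)
    "region_d": [True] * 510 + [False] * 240,  # 750 samples (assumed)
    "tiny":     [False] * 4,                   # the 4-image region
}

before = worst_region_accuracy(regions)  # 0.00: "tiny" is the worst region
regions["tiny"][0] = True                # flip a single prediction
after = worst_region_accuracy(regions)   # 0.25: one image moved the metric by 0.25

print(f"worst-region acc: {before:.2f} -> {after:.2f}")
print(f"shift in the 38-dataset suite average: {(after - before) / 38:.4f}")  # ~0.0066

# Proposed fix: drop the 4-image region before computing the metric
# (the size threshold of 100 is arbitrary, just for illustration).
filtered = {name: flags for name, flags in regions.items() if len(flags) >= 100}
print(f"worst-region acc without the tiny region: {worst_region_accuracy(filtered):.2f}")
```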
-
Hi @teasgen, thanks for the comment. I agree that some datasets in the evaluation suite are a bit noisy. We have some analysis on this in Appendix N, in the "Clean subset" section. We didn't find substantial differences in trends when using a cleaner subset of the datasets, but we did not exclude FMoW for that. This may be something we want to revisit; I'm tagging others here to see if they have thoughts. @sagadre @afang-story @yaircarmon @Vaishaal @ludwigschmidt
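One quick robustness check along the lines of that "Clean subset" analysis could be to recompute the suite average with and without FMoW and see whether the per-model numbers (and their ranking) move. A minimal sketch with made-up scores, not real results:

```python
# Minimal sketch with hypothetical per-dataset scores (not real results):
# does excluding FMoW change the suite average or the model ranking?

scores = {
    "model_a": {"imagenet": 0.70, "fmow": 0.10, "cars": 0.80},
    "model_b": {"imagenet": 0.68, "fmow": 0.35, "cars": 0.79},
}

def suite_average(per_dataset, exclude=()):
    """Average accuracy over all datasets not listed in `exclude`."""
    kept = [acc for name, acc in per_dataset.items() if name not in exclude]
    return sum(kept) / len(kept)

for model, per_dataset in scores.items():
    full = suite_average(per_dataset)
    without = suite_average(per_dataset, exclude={"fmow"})
    print(f"{model}: full={full:.3f}, without FMoW={without:.3f}")
```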