False sampling of data #41

hbellafkir · 2020-09-01T10:50:17Z

Hi,

I just found out that all images in the query list also appear in the database list, which does not allow for a fair validation.

Thanks

vinnik-dmitry07 · 2023-04-13T22:07:01Z

I second this.

# Count the entries shared between each pair of split files, per dataset.
prefix = 'D:/Downloads/HashNet-master/HashNet-master/pytorch/data/'
for dataset in ['imagenet', 'coco', 'nuswide_81']:
    with open(prefix + f'{dataset}/train.txt', 'r') as f:
        train = set(f.read().splitlines())
    with open(prefix + f'{dataset}/test.txt', 'r') as f:
        test = set(f.read().splitlines())
    with open(prefix + f'{dataset}/database.txt', 'r') as f:
        database = set(f.read().splitlines())
    # Printed in this order: train/database, test/database, test/train.
    print(dataset, len(train.intersection(database)))
    print(dataset, len(test.intersection(database)))
    print(dataset, len(test.intersection(train)))

imagenet 13000
imagenet 0
imagenet 0
coco 0
coco 5000
coco 0
nuswide_81 10000
nuswide_81 0
nuswide_81 0

At test time, test.txt is used as the query set and database.txt as the retrieval set. The two should not intersect, but for COCO they do: 5000 entries of test.txt also appear in database.txt.
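One possible workaround, sketched below, is to drop every database entry that also appears in the query list before evaluating. This is only an illustration, not something the repository provides: the helper name `remove_query_overlap` and the toy entries are hypothetical, and the "image path followed by label bits" line format is an assumption. Entries are compared as whole lines, the same way the check above compares them.

```python
def remove_query_overlap(query_lines, database_lines):
    """Return the database entries that do not also appear in the query list.

    Lines are compared as whole strings, and the original database order
    is preserved.
    """
    query = set(query_lines)
    return [line for line in database_lines if line not in query]


# Toy entries in an assumed "<image path> <label bits>" format, not real data.
query = ["img/a.jpg 1 0", "img/b.jpg 0 1"]
database = ["img/a.jpg 1 0", "img/c.jpg 1 1", "img/b.jpg 0 1", "img/d.jpg 0 0"]

filtered = remove_query_overlap(query, database)
print(filtered)
```

Filtering shrinks the retrieval set, so numbers computed this way may not be directly comparable to results obtained with the original database.txt.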
