False sampling of data #41

Open
hbellafkir opened this issue Sep 1, 2020 · 1 comment

Comments

@hbellafkir

Hi,

I just found out that all images in the query list are also in the database list, which is not allowed for a fair validation.

Thanks

@vinnik-dmitry07

I second this.

# Count how many entries the train/test/database lists share for each dataset.
prefix = 'D:/Downloads/HashNet-master/HashNet-master/pytorch/data/'
for dataset in ['imagenet', 'coco', 'nuswide_81']:
    # Read each list file as a set of unique lines.
    with open(prefix + f'{dataset}/train.txt', 'r') as f:
        train = set(f.read().splitlines())
    with open(prefix + f'{dataset}/test.txt', 'r') as f:
        test = set(f.read().splitlines())
    with open(prefix + f'{dataset}/database.txt', 'r') as f:
        database = set(f.read().splitlines())
    print(dataset, len(train.intersection(database)))  # train vs. database
    print(dataset, len(test.intersection(database)))   # test (query) vs. database
    print(dataset, len(test.intersection(train)))      # test vs. train

Output (for each dataset: train/database, test/database, and test/train overlap):

imagenet 13000
imagenet 0
imagenet 0
coco 0
coco 5000
coco 0
nuswide_81 10000
nuswide_81 0
nuswide_81 0

At test time we use test.txt as the query set and database.txt as the retrieval set, so the two must not intersect. That holds for imagenet and nuswide_81, but not for COCO, where the two lists share 5000 entries.
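
One possible workaround, until the lists are fixed upstream, is to filter the query images out of the database list before evaluating. The sketch below is only an illustration and not part of the repository: the function names and the output file name `database_disjoint.txt` are hypothetical, and it assumes each line starts with the image path followed by whitespace-separated labels. Numbers obtained this way would probably not be directly comparable with results computed on the original lists.

```python
from pathlib import Path


def image_key(line: str) -> str:
    """Return the image path, assumed to be the first whitespace-separated token."""
    return line.split(maxsplit=1)[0]


def remove_queries_from_database(query_lines, database_lines):
    """Return the database lines whose image does not appear in the query list."""
    query_keys = {image_key(line) for line in query_lines if line.strip()}
    return [
        line
        for line in database_lines
        if line.strip() and image_key(line) not in query_keys
    ]


def write_disjoint_database(data_dir, dataset):
    """Write database_disjoint.txt (hypothetical name) next to the original lists.

    Returns the number of database entries that were kept.
    """
    root = Path(data_dir) / dataset
    query = (root / "test.txt").read_text().splitlines()
    database = (root / "database.txt").read_text().splitlines()
    kept = remove_queries_from_database(query, database)
    (root / "database_disjoint.txt").write_text("\n".join(kept) + "\n")
    return len(kept)
```

Calling `write_disjoint_database(prefix, 'coco')` with the `prefix` from the check above would leave the original `database.txt` untouched and write the filtered list alongside it. Matching on the image path rather than on the whole line also catches entries whose label columns differ between the two files.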
