Implementation of the work "Augmented Multi-Modality Fusion for Generalized Zero-Shot Sketch-based Visual Retrieval".
A new generalized zero-shot sketch-based image retrieval evaluation protocol constructed on the DomainNet dataset.
"Sketch-like" domains: Sketch(Sk), Quickdraw(Qu).
"Photo-like" domains: Real(Re), Painting(Pa), Infograph(In), Clipart(Cl).
DomainNet contains 345 categories in total. The 45 classes that never appear in ImageNet are chosen as unseen, and the remaining 300 are seen.
Train and test sets follow the official DomainNet splits.
Training stage: only the training-split data of seen categories from the "sketch-like" domains are available for training; all data (both train and test splits) of seen categories from the "photo-like" domains build up the retrieval gallery.
Test stage: both the training- and test-split data of unseen categories from the "sketch-like" domains are evaluated as queries. The retrieval gallery is built from the "photo-like" domains: all unseen-category data (both train and test splits), together with an equal number of randomly selected seen-category samples, so as to avoid the influence of data-distribution imbalance.
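The test-stage gallery construction described above can be sketched as follows. This is an illustrative sketch, not the repository's actual loader: `gallery_samples` is assumed to be a list of `(path, label)` pairs from the "photo-like" domain, and the function name is made up here.

```python
import random

def build_test_gallery(gallery_samples, unseen_classes, seed=0):
    """Build the test-stage retrieval gallery: all unseen-category samples
    (train + test splits) plus an equal number of randomly drawn
    seen-category samples, to avoid class-distribution imbalance.

    `gallery_samples`: list of (path, label) pairs (illustrative format).
    `unseen_classes`: set of unseen category labels.
    """
    unseen = [s for s in gallery_samples if s[1] in unseen_classes]
    seen = [s for s in gallery_samples if s[1] not in unseen_classes]
    # Sample as many seen items as there are unseen items (capped by
    # the number of seen items available), with a fixed seed for
    # reproducibility.
    rng = random.Random(seed)
    seen_subset = rng.sample(seen, min(len(unseen), len(seen)))
    return unseen + seen_subset
```

With this balancing, seen and unseen categories contribute the same number of gallery samples, so the harmonic-mean comparison between them is not skewed by gallery size.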
21 categories not present in ImageNet are treated as unseen, and the remaining 104 classes are used for training. For the generalized experiments, we follow the settings of previous works.
30 categories never present in ImageNet are treated as unseen, and the remaining 220 classes are used for training. For the generalized experiments, we follow the settings of previous works.
We adopt the word2vec model pre-trained on the Google News dataset (~100 billion words): link.
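Category names can be mapped to semantic embeddings with the pre-trained word vectors. The sketch below is illustrative: `word_vectors` stands in for the loaded Google News model (e.g. gensim `KeyedVectors`) and is shown here as a plain dict; the averaging of multi-word names is one common convention, not necessarily the exact one used in this repository.

```python
import numpy as np

def class_embedding(class_name, word_vectors, dim=300):
    """Look up a word-vector embedding for a category name.

    Multi-word names (e.g. "sea_turtle") are split on underscores and
    averaged over their tokens; tokens missing from the vocabulary are
    skipped, and a zero vector is returned if nothing matches.
    """
    tokens = class_name.replace("_", " ").split()
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return np.zeros(dim, dtype=np.float32)
    return np.mean(vecs, axis=0)
```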
PyTorch 1.8.1
Numpy 1.19.5
scikit-learn
python main.py
Evaluation metrics: for seen (S) and unseen (U) categories, we report mAP@all, Prec@100, mAP@200, and Prec@200. To evaluate the model's generalization ability, we also report the harmonic mean of the seen and unseen results: HM = 2 * S * U / (S + U).
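The metrics above can be sketched as follows. This is a minimal reference sketch, not the repository's evaluation code; in particular, the AP@k denominator here (number of relevant items within the top k) is one common variant.

```python
import numpy as np

def precision_at_k(relevant, k):
    """Prec@k: fraction of the top-k retrieved items that are relevant.
    `relevant` is a binary relevance list ordered by retrieval rank."""
    return float(np.mean(relevant[:k]))

def average_precision_at_k(relevant, k):
    """AP@k for one query; mAP@k is the mean of AP@k over all queries."""
    rel = np.asarray(relevant[:k], dtype=float)
    if rel.sum() == 0:
        return 0.0
    # Precision at each rank, counted only at the relevant positions.
    cum_prec = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((cum_prec * rel).sum() / rel.sum())

def harmonic_mean(seen_score, unseen_score):
    """HM = 2 * S * U / (S + U) over seen (S) and unseen (U) scores."""
    if seen_score + unseen_score == 0:
        return 0.0
    return 2 * seen_score * unseen_score / (seen_score + unseen_score)
```

mAP@all corresponds to evaluating AP over the full ranked gallery rather than a truncated list.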
If you find this work interesting, please cite:
If you have any questions about this work, feel free to contact us.