This repository is based on the PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners (He et al., 2021).
The system trains an MAE model on style classification, genre classification, and triplet learning tasks simultaneously.
For each data point
where the relevance measure
- WikiArt
  - Note: We used only images that have both style and genre labels.
- MultitaskPainting100k
| Loss Function | WikiArt Style P@1 | WikiArt Style P@5 | WikiArt Style P@10 | WikiArt Genre P@1 | WikiArt Genre P@5 | WikiArt Genre P@10 | MultitaskPainting100k Style P@1 | MultitaskPainting100k Style P@5 | MultitaskPainting100k Style P@10 | MultitaskPainting100k Genre P@1 | MultitaskPainting100k Genre P@5 | MultitaskPainting100k Genre P@10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 69.11 | 67.86 | 67.48 | 64.56 | 59.95 | 57.26 | 62.89 | 59.58 | 58.19 | 57.25 | 52.81 | 50.43 |
| | 69.71 | 68.48 | 68.02 | 77.20 | 75.80 | 75.18 | 63.04 | 60.25 | 59.12 | 65.17 | 62.65 | 61.42 |
| | 41.71 | 36.79 | 34.30 | 77.53 | 77.10 | 77.18 | 40.63 | 34.61 | 31.83 | 67.36 | 66.38 | 65.93 |
| | 54.82 | 51.18 | 49.33 | 79.34 | 79.02 | 78.99 | 45.96 | 40.73 | 38.21 | 68.56 | 67.62 | 67.31 |
| | 69.09 | 66.81 | 65.67 | 78.66 | 77.41 | 76.70 | 61.79 | 57.18 | 54.88 | 66.92 | 63.99 | 62.61 |
| | 69.21 | 67.35 | 66.49 | 79.83 | 79.07 | 78.77 | 61.78 | 58.10 | 56.15 | 69.17 | 67.29 | 66.49 |
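
The table reports P@1, P@5, and P@10 for retrieval by style and by genre. As a rough illustration of how such a metric is commonly computed, the sketch below assumes P@k is the fraction of the top-k retrieved images that share the query's label, averaged over all queries and then scaled to a percentage. The function names `precision_at_k` and `mean_precision_at_k` are hypothetical and are not taken from this repository; the evaluation code used here may differ.

```python
from typing import Hashable, Sequence


def precision_at_k(query_label: Hashable,
                   retrieved_labels: Sequence[Hashable],
                   k: int) -> float:
    """Fraction of the top-k retrieved items whose label matches the query.

    Assumption: `retrieved_labels` is already sorted by decreasing similarity
    to the query, and a retrieved item counts as relevant only when its label
    equals the query's label.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = list(retrieved_labels)[:k]
    hits = sum(1 for label in top_k if label == query_label)
    # Divide by k (not by len(top_k)) so short result lists are penalized.
    return hits / k


def mean_precision_at_k(query_labels: Sequence[Hashable],
                        ranked_labels: Sequence[Sequence[Hashable]],
                        k: int) -> float:
    """Average P@k over all queries, returned as a value in [0, 1]."""
    if len(query_labels) != len(ranked_labels):
        raise ValueError("one ranked list is required per query")
    if not query_labels:
        return 0.0
    scores = [precision_at_k(q, r, k)
              for q, r in zip(query_labels, ranked_labels)]
    return sum(scores) / len(scores)


# Hypothetical example: one query painting labeled "Impressionism".
ranked = ["Impressionism", "Cubism", "Impressionism", "Realism", "Impressionism"]
p_at_1 = precision_at_k("Impressionism", ranked, k=1)
p_at_5 = precision_at_k("Impressionism", ranked, k=5)
```

In this example the top result matches the query, so P@1 is 1.0, and three of the top five match, so P@5 is 0.6. Multiplying the averaged value by 100 gives percentages on the same scale as the table.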