| Model | Token Grid | Top-1 Acc. (%) | Config |
|---|---|---|---|
| FastChannelVim-S/16.ckpt | 14² × 8 | 73.6 | FastChannelVim-S/16.yaml |
| FastChannelVim-S/16 - Maxpool.ckpt | 14² × 8 | 72.9 | FastChannelVim-S/16 - Maxpool.yaml |
| ChannelVim-S/16.ckpt | 14² × 8 | 73.5 | ChannelVim-S/16.yaml |
| FastChannelVim-S/8.ckpt | 28² × 8 | 83.1 | FastChannelVim-S/8.yaml |
| FastChannelVim-S/8 - Maxpool.ckpt | 28² × 8 | 85.0 | FastChannelVim-S/8 - Maxpool.yaml |
| ChannelVim-S/8.ckpt | 28² × 8 | 83.0 | ChannelVim-S/8.yaml |
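The Token Grid column can be read as (patches per channel) × (channels). As a rough sanity check, the sketch below reproduces the grid sizes assuming square 224 × 224 input crops, non-overlapping square patches, and 8 input channels, each patchified separately. The input resolution and the helper name `token_grid` are assumptions for illustration, not values or functions taken from this repository.

```python
def token_grid(image_size: int, patch_size: int, num_channels: int) -> tuple[int, int]:
    """Return (patches per channel, number of channels).

    Hypothetical helper: assumes square images, non-overlapping square
    patches, and that every channel is tokenized into its own patch grid.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size  # patches along one image side
    return side * side, num_channels


# Assumed 224 x 224 crops with 8 channels, for the /16 and /8 models.
for patch in (16, 8):
    per_channel, channels = token_grid(224, patch, 8)
    print(f"patch {patch}: {per_channel} x {channels} = {per_channel * channels} tokens")
```

Under these assumptions, halving the patch size from 16 to 8 quadruples the sequence length (196 to 784 patches per channel), which is the main cost difference between the /16 and /8 rows.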
Notes:
- For reproducibility, keep the overall (global) batch size at 256 across all GPUs/nodes.
- The preprocessed JUMP-CP data used in this paper was previously released with "Contextual Vision Transformers for Robust Representation Learning" (insitro/ContextViT).
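Keeping the global batch size fixed means the per-GPU batch size has to shrink or grow with the number of devices. A minimal sketch of that arithmetic follows, assuming plain data-parallel training where global batch = per-GPU batch × GPUs per node × nodes × gradient-accumulation steps. The function and parameter names are hypothetical; this README does not show which config keys the training scripts actually use.

```python
def per_device_batch_size(
    global_batch_size: int,
    num_nodes: int,
    gpus_per_node: int,
    accumulation_steps: int = 1,
) -> int:
    """Per-GPU batch size that keeps the global batch size fixed.

    Hypothetical helper assuming data-parallel training, where
    global = per_device * num_nodes * gpus_per_node * accumulation_steps.
    """
    divisor = num_nodes * gpus_per_node * accumulation_steps
    if global_batch_size % divisor != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible by {divisor}"
        )
    return global_batch_size // divisor


# Example: 2 nodes with 4 GPUs each need 32 samples per GPU to reach 256.
print(per_device_batch_size(256, num_nodes=2, gpus_per_node=4))
```

Raising an error on a non-divisible setup, rather than rounding, avoids silently training with a different effective batch size than the reported one.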