This list covers the most popular methods in Masked Image Modeling (MIM). If anything is missing, please submit a request to add it. (Note: the dates shown are those of each paper's first arXiv submission, but the links provided may point to the latest version.)
We also encourage you to cite our work, SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders.
Date | Method | Conference | Title | Code |
---|---|---|---|---|
2022-04-06 | MIMDet | arXiv 2022 | Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection | MIMDet |

Date | Method | Conference | Title | Code |
---|---|---|---|---|
2021-11-29 | Point-BERT | CVPR 2022 | Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling | Point-BERT |
2022-03-28 | Point-MAE | ECCV 2022 | Masked Autoencoders for Point Cloud Self-supervised Learning | Point-MAE |
2022-05-28 | Point-M2AE | NeurIPS 2022 | Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training | Point-M2AE |
2022-12-13 | I2P-MAE | CVPR 2023 | Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders | I2P-MAE |
2024-04-01 | NeRF-MAE | ECCV 2024 | NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields | NeRF-MAE |

Date | Method | Conference | Title | Code |
---|---|---|---|---|
2022-02-08 | MaskGIT | arXiv 2022 | MaskGIT: Masked Generative Image Transformer | None |

Date | Method | Conference | Title | Code |
---|---|---|---|---|
2022-12-02 | MIC | CVPR 2023 | MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation | None |

Date | Method | Conference | Title | Code |
---|---|---|---|---|
2021-12-02 | BEVT | arXiv 2021 | BEVT: BERT Pretraining of Video Transformers | BEVT |
2022-03-23 | VideoMAE | NeurIPS 2022 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | VideoMAE |
2022-05-18 | MAE_ST | NeurIPS 2022 | Masked Autoencoders As Spatiotemporal Learners | MAE_ST |
2023-03-29 | VideoMAE V2 | CVPR 2023 | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | None |

Date | Method | Conference | Title | Code |
---|---|---|---|---|
2022-04-04 | MultiMAE | arXiv 2022 | MultiMAE: Multi-modal Multi-task Masked Autoencoders | MultiMAE |
2022-05-27 | M3AE | arXiv 2022 | Multimodal Masked Autoencoders Learn Transferable Representations | None |
2022-08-03 | xxx | arXiv 2022 | Masked Vision and Language Modeling for Multi-modal Representation Learning | None |
2022-12-01 | FLIP | arXiv 2022 | Scaling Language-Image Pre-training via Masking | None |

Date | Method | Conference | Title | Code |
---|---|---|---|---|
2022-03-10 | MedMAE | arXiv 2022 | Self Pre-training with Masked Autoencoders for Medical Image Analysis | None |

Date | Method | Conference | Title |
---|---|---|---|
2022-08-08 | RelaxMIM | arXiv 2022 | Understanding Masked Image Modeling via Learning Occlusion Invariant Feature |

Date | Conference | Title |
---|---|---|
2022-07-30 | arXiv 2022 | A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond |
2023-12-31 | arXiv 2023 | Masked Modeling for Self-supervised Representation Learning on Vision and Beyond |