awesome-MIM

This list collects the most popular methods in Masked Image Modeling (MIM). If anything is missing, please feel free to submit a request to add it. (Note: dates correspond to each paper's first arXiv submission; the links may point to later versions.)

Additionally, we encourage you to cite our work, SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders.

Backbone models.

| Date | Method | Conference | Title | Code |
|---|---|---|---|---|
| 2020-xx-xx (maybe 2019) | iGPT | ICML 2020 | Generative Pretraining from Pixels | iGPT |
| 2020-10-22 | ViT | ICLR 2021 (Oral) | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ViT |
| 2021-04-08 | SiT | Arxiv 2021 | SiT: Self-supervised vIsion Transformer | None |
| 2021-06-10 | MST | NeurIPS 2021 | MST: Masked Self-Supervised Transformer for Visual Representation | None |
| 2021-06-14 | BEiT | ICLR 2022 (Oral) | BEiT: BERT Pre-Training of Image Transformers | BEiT |
| 2021-11-11 | MAE | Arxiv 2021 | Masked Autoencoders Are Scalable Vision Learners | MAE |
| 2021-11-15 | iBoT | ICLR 2022 | iBOT: Image BERT Pre-Training with Online Tokenizer | iBoT |
| 2021-11-18 | SimMIM | Arxiv 2021 | SimMIM: A Simple Framework for Masked Image Modeling | SimMIM |
| 2021-11-24 | PeCo | Arxiv 2021 | PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers | None |
| 2021-11-30 | MC-SSL0.0 | Arxiv 2021 | MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning | None |
| 2021-12-16 | MaskFeat | Arxiv 2021 | Masked Feature Prediction for Self-Supervised Visual Pre-Training | None |
| 2021-12-20 | SplitMask | Arxiv 2021 | Are Large-scale Datasets Necessary for Self-Supervised Pre-training? | None |
| 2022-01-31 | ADIOS | Arxiv 2022 | Adversarial Masking for Self-Supervised Learning | None |
| 2022-02-07 | CAE | Arxiv 2022 | Context Autoencoder for Self-Supervised Representation Learning | CAE |
| 2022-02-07 | CIM | Arxiv 2022 | Corrupted Image Modeling for Self-Supervised Visual Pre-Training | None |
| 2022-03-10 | MVP | Arxiv 2022 | MVP: Multimodality-guided Visual Pre-training | None |
| 2022-03-23 | AttMask | ECCV 2022 | What to Hide from Your Students: Attention-Guided Masked Image Modeling | AttMask |
| 2022-03-29 | mc-BEiT | Arxiv 2022 | mc-BEiT: Multi-choice Discretization for Image BERT Pre-training | None |
| 2022-04-18 | Ge2-AE | Arxiv 2022 | The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training | None |
| 2022-05-08 | MCMAE | NeurIPS 2022 | MCMAE: Masked Convolution Meets Masked Autoencoders | MCMAE |
| 2022-05-20 | UM-MAE | Arxiv 2022 | Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality | UM-MAE |
| 2022-05-26 | GreenMIM | Arxiv 2022 | Green Hierarchical Vision Transformer for Masked Image Modeling | GreenMIM |
| 2022-05-26 | MixMIM | Arxiv 2022 | MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning | Code is Opening |
| 2022-05-28 | SupMAE | Arxiv 2022 | SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners | SupMAE |
| 2022-05-30 | HiViT | Arxiv 2022 | HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling | None |
| 2022-06-01 | LoMaR | Arxiv 2022 | Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction | LoMaR |
| 2022-06-22 | SemMAE | NeurIPS 2022 | SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders | SemMAE |
| 2022-08-11 | MILAN | Arxiv 2022 | MILAN: Masked Image Pretraining on Language Assisted Representation | MILAN |
| 2022-11-14 | EVA | Arxiv 2022 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | EVA |
| 2022-11-28 | AMT | AAAI 2023 | Good helper is around you: Attention-driven Masked Image Modeling | AMT |
| 2023-01-03 | TinyMIM | CVPR 2023 | TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models | TinyMIM |
| 2023-03-04 | PixMIM | Arxiv 2023 | PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling | PixMIM |
| 2023-03-09 | LocalMIM | CVPR 2023 | Masked Image Modeling with Local Multi-Scale Reconstruction | LocalMIM |
| 2023-03-12 | AutoMAE | Arxiv 2023 | Improving Masked Autoencoders by Learning Where to Mask | AutoMAE |
| 2023-03-15 | DeepMIM | Arxiv 2023 | DeepMIM: Deep Supervision for Masked Image Modeling | DeepMIM |
| 2023-04-25 | Img2Vec | Arxiv 2023 | Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders | None |
| 2023-12-30 | DTM | Arxiv 2023 | Masked Image Modeling via Dynamic Token Morphing | None |
| 2024-11-24 | PR-MIM | Arxiv 2024 | PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image Modeling | None |

Others:

Object detection.

| Date | Method | Conference | Title | Code |
|---|---|---|---|---|
| 2022-04-06 | MIMDet | Arxiv 2022 | Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection | MIMDet |

3D.

| Date | Method | Conference | Title | Code |
|---|---|---|---|---|
| 2021-11-29 | Point-BERT | CVPR 2022 | Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling | Point-BERT |
| 2022-03-28 | Point-MAE | ECCV 2022 | Masked Autoencoders for Point Cloud Self-supervised Learning | Point-MAE |
| 2022-05-28 | Point-M2AE | NeurIPS 2022 | Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training | Point-M2AE |
| 2022-12-13 | I2P-MAE | CVPR 2023 | Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders | I2P-MAE |
| 2024-04-01 | NeRF-MAE | ECCV 2024 | NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields | NeRF-MAE |

Image generation.

| Date | Method | Conference | Title | Code |
|---|---|---|---|---|
| 2022-02-08 | MaskGIT | Arxiv 2022 | MaskGIT: Masked Generative Image Transformer | None |

Unsupervised Domain Adaptation.

| Date | Method | Conference | Title | Code |
|---|---|---|---|---|
| 2023-06-18 | MIC | CVPR 2023 | MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation | None |

Video.

| Date | Method | Conference | Title | Code |
|---|---|---|---|---|
| 2021-12-02 | BEVT | Arxiv 2021 | BEVT: BERT Pretraining of Video Transformers | BEVT |
| 2022-03-23 | VideoMAE | NeurIPS 2022 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | VideoMAE |
| 2022-05-18 | MAE_ST | NeurIPS 2022 | Masked Autoencoders As Spatiotemporal Learners | MAE_ST |
| 2023-03-29 | VideoMAE v2 | CVPR 2023 | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | None |

Multi-modal.

| Date | Method | Conference | Title | Code |
|---|---|---|---|---|
| 2022-04-04 | MultiMAE | Arxiv 2022 | MultiMAE: Multi-modal Multi-task Masked Autoencoders | MultiMAE |
| 2022-05-27 | M3AE | Arxiv 2022 | Multimodal Masked Autoencoders Learn Transferable Representations | None |
| 2022-08-03 | xxx | Arxiv 2022 | Masked Vision and Language Modeling for Multi-modal Representation Learning | None |
| 2022-12-01 | FLIP | Arxiv 2022 | Scaling Language-Image Pre-training via Masking | None |

Medical.

| Date | Method | Conference | Title | Code |
|---|---|---|---|---|
| 2022-03-10 | MedMAE | Arxiv 2022 | Self Pre-training with Masked Autoencoders for Medical Image Analysis | None |

Analysis.

| Date | Method | Conference | Title |
|---|---|---|---|
| 2022-08-08 | RelaxMIM | Arxiv 2022 | Understanding Masked Image Modeling via Learning Occlusion Invariant Feature |

Survey.

| Date | Conference | Title |
|---|---|---|
| 2022-07-30 | Arxiv 2022 | A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond |
| 2023-12-31 | Arxiv 2023 | Masked Modeling for Self-supervised Representation Learning on Vision and Beyond |