Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning
The official PyTorch implementation of Point-CMAE.
Bin Ren 1,2, Guofeng Mei 3, Danda Pani Paudel 4,5, Weijie Wang 2,3, Yawei Li 4, Mengyuan Liu 6, Rita Cucchiara 7, Luc Van Gool 4,5, and Nicu Sebe 2
1 University of Pisa, Italy
2 University of Trento, Italy
3 Fondazione Bruno Kessler, Italy
4 ETH Zürich, Switzerland
5 INSAIT, Sofia University, Bulgaria
6 Peking University, China
7 University of Modena and Reggio Emilia, Italy
- 📌 09/24/2024: We are organizing the code; it will be released after the ICCV submission.
- 🎉 09/20/2024: Our paper is accepted by the 17th Asian Conference on Computer Vision (ACCV 2024)!
- 📌 07/18/2024: Repository is created. Our code will be made publicly available upon acceptance.
Abstract
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones. However, in 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant. This raises the question: Can we take the best of both worlds? To answer this question, we first empirically validate that integrating MAE-based point cloud pre-training with the standard contrastive learning paradigm, even with meticulous design, can lead to a decrease in performance. To address this limitation, we reintroduce CL into the MAE-based point cloud pre-training paradigm by leveraging the inherent contrastive properties of MAE. Specifically, rather than relying on extensive data augmentation as commonly used in the image domain, we randomly mask the input tokens twice to generate contrastive input pairs. Subsequently, a weight-sharing encoder and two identically structured decoders are utilized to perform masked token reconstruction. Additionally, we propose that, for an input token masked by both masks simultaneously, the reconstructed features should be as similar as possible. This naturally establishes an explicit contrastive constraint within the generative MAE-based pre-training paradigm, resulting in our proposed Point-CMAE. Consequently, Point-CMAE effectively enhances the representation quality and transfer performance compared to its MAE counterpart. Experimental evaluations across various downstream applications, including classification, part segmentation, and few-shot learning, demonstrate the efficacy of our framework in surpassing state-of-the-art techniques under standard ViTs and single-modal settings. The source code and trained models are available: TBD.
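For readers who prefer code, below is a minimal, self-contained PyTorch sketch of the idea described in the abstract: the input tokens are masked twice, a weight-sharing encoder and two identically structured decoders reconstruct each masked view, and the reconstructed features of tokens masked in both views are pulled together. All module and variable names (ToyEncoder, ToyDecoder, random_mask, etc.) are illustrative placeholders, not the actual Point-CMAE implementation.

# Illustrative sketch of the dual-masking contrastive constraint (not the repo code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    def __init__(self, dim=384):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True)
    def forward(self, tokens):          # tokens: (B, N_visible, dim)
        return self.layer(tokens)

class ToyDecoder(nn.Module):
    def __init__(self, dim=384):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True)
    def forward(self, tokens):          # tokens: (B, N, dim)
        return self.layer(tokens)

def random_mask(num_tokens, ratio=0.6):
    # Boolean mask of shape (num_tokens,), True = masked.
    num_masked = int(num_tokens * ratio)
    perm = torch.randperm(num_tokens)
    mask = torch.zeros(num_tokens, dtype=torch.bool)
    mask[perm[:num_masked]] = True
    return mask

B, N, D = 2, 64, 384
tokens = torch.randn(B, N, D)                        # point patch embeddings
encoder = ToyEncoder(D)                              # weight-sharing encoder
decoder_a, decoder_b = ToyDecoder(D), ToyDecoder(D)  # two identical decoders
mask_token = nn.Parameter(torch.zeros(1, 1, D))      # learnable mask token

def reconstruct(mask, decoder):
    visible = tokens[:, ~mask]                       # encode visible tokens only
    latent = encoder(visible)
    full = mask_token.expand(B, N, D).clone()        # mask tokens at masked positions
    full[:, ~mask] = latent
    return decoder(full)

mask_a = random_mask(N)                              # first random mask
mask_b = random_mask(N)                              # second random mask
recon_a = reconstruct(mask_a, decoder_a)
recon_b = reconstruct(mask_b, decoder_b)

# Contrastive constraint: tokens masked in BOTH views should be reconstructed
# to similar features (a simple cosine-similarity loss as a stand-in here).
both = mask_a & mask_b
cl_loss = (1 - F.cosine_similarity(recon_a[:, both], recon_b[:, both], dim=-1)).mean()

In the full method this constraint complements the usual MAE reconstruction objective (the repository installs pytorch3d for its Chamfer distance loss); the sketch omits the reconstruction head and positional embeddings for brevity.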
TBD
TBD
- PyTorch >= 1.7.0
- python >= 3.7
- CUDA >= 9.0
- GCC >= 4.9
- torchvision
# Create the virtual environment via micromamba or anaconda:
micromamba/conda create -n points python=3.8 -y
# Install PyTorch 1.11.0 + CUDA 11.3
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
# Install Other libs
pip install -r requirements.txt
# Install pytorch3d from wheels (We use the chamfer distance loss within pytorch3d)
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu113_pyt1110/download.html
bash install.sh
# Or install pytorch3d from source:
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
# Install PointNet++
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
# Install GPU kNN
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
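After installation, a quick sanity check such as the following can confirm that the GPU extensions were built correctly. This is an illustrative snippet (not part of the repository) and assumes a CUDA-capable GPU is available.

# Sanity check of the installed GPU extensions (illustrative only).
import torch
from pytorch3d.loss import chamfer_distance          # Chamfer distance loss
from pointnet2_ops import pointnet2_utils            # furthest point sampling
from knn_cuda import KNN                             # GPU kNN

assert torch.cuda.is_available(), "A CUDA-capable GPU is required."

xyz = torch.rand(2, 1024, 3, device="cuda")          # two toy point clouds

# Chamfer distance between a cloud and a noisy copy of itself
loss, _ = chamfer_distance(xyz, xyz + 0.01 * torch.randn_like(xyz))
print("chamfer:", loss.item())

# Furthest point sampling down to 128 points (returns indices)
idx = pointnet2_utils.furthest_point_sample(xyz, 128)
print("fps indices:", idx.shape)

# 16-nearest neighbours of each point
knn = KNN(k=16, transpose_mode=True)
dist, nn_idx = knn(xyz, xyz)
print("knn indices:", nn_idx.shape)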
For the ModelNet40, ScanObjectNN, and ShapeNetPart datasets, we pre-train Point-CMAE on ShapeNet and then fine-tune on each of these datasets.
Details of the datasets used can be found in DATASET.md.
To pre-train the Point-CMAE models on ShapeNet, simply run:
python main.py --config cfgs/pretrain_shapenet.yaml \
--exp_name pretrain_shapenet \
[--val_freq 10]
We fine-tune Point-CMAE on the following downstream tasks: classification on ModelNet40, few-shot learning on ModelNet40, transfer learning on ScanObjectNN, and part segmentation on ShapeNetPart.
To finetune a pre-trained Point-CMAE model on ModelNet40, simply run:
python main.py \
--config cfgs/finetune_modelnet.yaml \
--finetune_model \
--ckpts <path> \
--exp_name <name>
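If you are unsure which file to pass via --ckpts, a standard PyTorch checkpoint can be inspected as below. This is a generic snippet; the path and the key layout inside the checkpoint are assumptions and may differ in this repository.

# Inspect a pre-trained checkpoint before fine-tuning (generic PyTorch snippet).
import torch

ckpt = torch.load("path/to/pretrain_shapenet.pth", map_location="cpu")  # placeholder path
if isinstance(ckpt, dict):
    print("top-level keys:", list(ckpt.keys()))
    # Print a few weight shapes from whichever entry looks like a state dict.
    for key, value in ckpt.items():
        if isinstance(value, dict) and value:
            sample = list(value.items())[:3]
            if all(isinstance(v, torch.Tensor) for _, v in sample):
                print(f"'{key}' holds weights, e.g.:",
                      [(n, tuple(t.shape)) for n, t in sample])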
To evaluate a model finetuned on ModelNet40, simply run:
bash ./scripts/test.sh <GPU_IDS> \
--config cfgs/finetune_modelnet.yaml \
--ckpts <path> \
--exp_name <name>
We follow the few-shot setting of previous work.
First, generate your own few-shot learning split or use the same split as ours (see DATASET.md).
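If you generate your own splits, the sampling logic is essentially N-way, K-shot selection. The sketch below is illustrative: the label source, the way/shot/query counts, and the output format are assumptions; see DATASET.md for the exact format used here.

# Minimal N-way K-shot split generator (illustrative only).
import random
from collections import defaultdict

def make_few_shot_split(labels, n_way=10, k_shot=10, n_query=20, seed=0):
    # labels: list of integer class labels, indexed by sample id.
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    chosen_classes = rng.sample(sorted(by_class), n_way)
    support, query = [], []
    for lab in chosen_classes:
        picked = rng.sample(by_class[lab], k_shot + n_query)
        support += [(i, lab) for i in picked[:k_shot]]
        query += [(i, lab) for i in picked[k_shot:]]
    return support, query

# Example with dummy labels: 40 classes, 50 samples each
dummy_labels = [c for c in range(40) for _ in range(50)]
support, query = make_few_shot_split(dummy_labels, n_way=5, k_shot=10)
print(len(support), len(query))   # 50 support and 100 query samples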
To finetune a pre-trained Point-CMAE model on ScanObjectNN, simply run:
python main.py \
--config cfgs/finetune_scanobject_hardest.yaml \
--finetune_model \
--ckpts <path> \
--exp_name <name>
To evaluate a model on ScanObjectNN, simply run:
bash ./scripts/test_scan.sh <GPU_IDS> \
--config cfgs/finetune_scanobject_hardest.yaml \
--ckpts <path> \
--exp_name <name>
TBD
If you find our work helpful, please consider citing the following paper and/or ⭐ the repo.
@inproceedings{ren2024bringing,
title={Bringing masked autoencoders explicit contrastive properties for point cloud self-supervised learning},
author={Ren, Bin and Mei, Guofeng and Paudel, Danda Pani and Wang, Weijie and Li, Yawei and Liu, Mengyuan and Cucchiara, Rita and Van Gool, Luc and Sebe, Nicu},
booktitle={Proceedings of the Asian Conference on Computer Vision},
pages={2034--2052},
year={2024}
}
Our code is built upon the codebase of Point-MAE.