To install the environment requirements needed for SceneVerse, you can run the installation commands below:
$ conda create -n sceneverse python=3.9
$ conda activate sceneverse
$ pip install -r requirements.txt
Meanwhile, SceneVerse depends on an efficient implementation of PointNet2 located in `modules/third_party/pointnet2`. Remember to install it with:
$ cd modules/third_party/pointnet2
$ python setup.py install
$ cd ../../..  # back to the project root
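Building the PointNet2 CUDA extension requires a PyTorch installation that can see your CUDA toolkit. A quick sanity check before running `setup.py` (plain PyTorch calls, nothing SceneVerse-specific):
# Verify PyTorch and CUDA are available before compiling the extension
$ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"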
We provide all experiment configurations in `configs/final`; you can find the experiment setting in the comment at the top of each experiment file. To correctly use the configuration files, you need to change the following fields in the configuration file so paths load correctly:
- `base_dir`: save path for model checkpoints, configurations, and logs.
- `logger.entity`: we use W&B for logging experiments; change it to your corresponding account.
- `data.{DATASET}_familiy_base`: path to `{Dataset}`-related data.
- `model.vision.args.path`: path to the pre-trained object encoder (PointNet++).
- `model.vision.args.lang_path`: deprecated, but basically text embeddings of the 607 classes in ScanNet.
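For orientation, here is a minimal sketch of how these fields might look inside a config file. The nesting follows the dotted keys above; all paths, the W&B entity, and the `scannet` dataset name are placeholders, not values from the repository:
# Hypothetical config excerpt -- replace every path/value with your own
base_dir: /path/to/experiment/outputs          # checkpoints, configurations, and logs
logger:
  entity: your-wandb-account                   # your W&B account
data:
  scannet_familiy_base: /path/to/scannet       # one `{DATASET}_familiy_base` entry per dataset
model:
  vision:
    args:
      path: /path/to/pointnet2_encoder.pth     # pre-trained object encoder (PointNet++)
      lang_path: /path/to/scannet_text_emb.pt  # deprecated, see note above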
You can walk through `configs/final/all_pretrain.yaml` and compare it with the other files to see how we controlled the data and objectives used in training.
This codebase leverages the Huggingface Accelerate package and the Facebook Submitit package for efficient model training on multi-node clusters. We provide a launcher file `launch.py` which supports three ways of launching experiments:
# Launching using submitit on a SLURM cluster (e.g. 10 hour 1 node 4 GPU experiment with config file $CONFIG)
$ python launch.py --mode submitit --time 10 --qos $QOS --partition $PARTITION --mem_per_gpu 80 \
--gpu_per_node 4 --config $CONFIG note=$NOTE name=$EXP_NAME
# Launching using accelerator with a multi-gpu instance
$ python launch.py --mode accelerate --gpu_per_node 4 --num_nodes 1 --config $CONFIG note=$NOTE name=$EXP_NAME
Basically, `launch.py` sets up the process(es) to run the main entry point `run.py` under multi-GPU settings. You can directly overwrite configurations in the configuration file `$CONFIG` by setting property fields with `=` after all command-line arguments (e.g., `name=$EXP_NAME`, `solver.epochs=400`, `dataloader.batchsize=4`).
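As a concrete illustration (the override values here are arbitrary examples, and `configs/final/all_pretrain.yaml` is the pre-training config mentioned above), a single-node launch with inline overrides could look like:
# Accelerate launch with inline config overrides (values are illustrative)
$ python launch.py --mode accelerate --gpu_per_node 4 --num_nodes 1 --config configs/final/all_pretrain.yaml \
    name=$EXP_NAME solver.epochs=400 dataloader.batchsize=4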
For testing and inference, remember to set up the testing data correctly under each configuration file and switch the `mode` field in the configuration to `test` (i.e., `mode=test`).
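For example, reusing the accelerate launch from above, the `mode=test` override is the only change needed to run evaluation instead of training:
# Run evaluation by overriding the mode field
$ python launch.py --mode accelerate --gpu_per_node 4 --num_nodes 1 --config $CONFIG name=$EXP_NAME mode=test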
If you want to debug your code without the additional job launcher, you can also directly run `run.py`. As an example, you can run the file for debugging with:
# Single card direct run for debugging purposes
$ python run.py --config-path ${PROJ_PATH}/configs/final/ --config-name ${EXP_CONFIG_NAME}.yaml \
num_gpu=1 hydra.run.dir=. hydra.output_subdir=null hydra/job_logging=disabled hydra/hydra_logging=disabled \
debug.flag=True debug.debug_size=1 dataloader.batchsize=2 debug.hard_debug=True name=Debug_test
We provide all available checkpoints under the same data directory, in a folder named `Checkpoints`. Here we provide detailed descriptions of the checkpoints in the table below:
| Setting | Description | Corresponding Experiment | Checkpoint based on experiment setting |
|---|---|---|---|
| pre-trained | GPS model pre-trained on SceneVerse | 3D-VL grounding (Tab. 2) | Model |
| scratch | GPS model trained on datasets from scratch | 3D-VL grounding (Tab. 2), SceneVerse-val (Tab. 3) | ScanRefer, Sr3D, Nr3D, SceneVerse-val |
| fine-tuned | GPS model fine-tuned on datasets with grounding heads | 3D-VL grounding (Tab. 2) | ScanRefer, Sr3D, Nr3D |
| zero-shot | GPS model trained on SceneVerse without data from ScanNet and MultiScan | Zero-shot Transfer (Tab. 3) | Model |
| zero-shot text | GPS | Zero-shot Transfer (Tab. 3) | ScanNet, SceneVerse-val |
| text-ablation | Ablations on the type of language used during pre-training | Ablation on Text (Tab. 7) | Template only, Template+LLM |
| scene-ablation | Ablations on the use of synthetic scenes during pre-training | Ablation on Scene (Tab. 8) | Real only, S3D only, ProcTHOR only |
| model-ablation | Ablations on the use of losses during pre-training | Ablation on Model Design (Tab. 9) | Refer only, Refer+Obj-lvl, w/o Scene-lvl |
To properly use the pre-trained checkpoints, you can use the `pretrain_ckpt_path` key in the configs:
# Directly testing the checkpoint
$ python launch.py --mode submitit --qos $QOS --partition $PARTITION --mem_per_gpu 80 \
--gpu_per_node 4 --config $CONFIG note=$NOTE name=$EXP_NAME mode=test \
pretrain_ckpt_path=$PRETRAIN_CKPT
# Fine-tuning with pre-trained checkpoint
$ python launch.py --mode submitit --qos $QOS --partition $PARTITION --mem_per_gpu 80 \
--gpu_per_node 4 --config $CONFIG note=$NOTE name=$EXP_NAME \
pretrain_ckpt_path=$PRETRAIN_CKPT
For fine-tuning the pre-trained checkpoint on datasets, you can use the fine-tuning config files provided under `configs/final/finetune`.
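Putting the pieces together, a fine-tuning launch might look like the following; the config filename `scanrefer.yaml` is a hypothetical example, so substitute whichever file under `configs/final/finetune` matches your target dataset:
# Fine-tuning on a downstream dataset from a pre-trained checkpoint (config name is hypothetical)
$ python launch.py --mode submitit --qos $QOS --partition $PARTITION --mem_per_gpu 80 \
    --gpu_per_node 4 --config configs/final/finetune/scanrefer.yaml name=$EXP_NAME \
    pretrain_ckpt_path=$PRETRAIN_CKPT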