[ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities

rbler1234/ScanReason


Empowering 3D Visual Grounding with Reasoning Capabilities

ECCV 2024
Chenming Zhu, Tai Wang, Wenwei Zhang, Kai Chen, Xihui Liu*
The University of Hong Kong, Shanghai AI Laboratory

arXiv

📦 Benchmark and Model

Benchmark Overview

ScanReason is the first comprehensive and hierarchical 3D reasoning grounding benchmark. We define five types of questions according to the kind of reasoning they require. Spatial reasoning and function reasoning require a fundamental understanding of the 3D physical world: function reasoning focuses on the objects themselves, while spatial reasoning focuses on the spatial relationships between objects in a 3D scene. Logistic reasoning, emotional reasoning, and safety reasoning are high-level reasoning skills built on these two fundamental abilities to address user-centric, real-world applications.
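As an illustration only, the two-tier hierarchy of question types can be written down as a small lookup table. The names `REASONING_TIER` and `types_in_tier` and the tier labels `fundamental` / `high-level` are hypothetical choices for this sketch; they are not the label strings used in the released annotation files. The sketch assumes Bash 4 or later for associative arrays.

```shell
#!/usr/bin/env bash
# Hypothetical lookup table (NOT the dataset's label format): maps each of the
# five ScanReason question types to the reasoning tier described in the README.
# Requires Bash >= 4 for associative arrays.
declare -A REASONING_TIER=(
  ["spatial"]="fundamental"
  ["function"]="fundamental"
  ["logistic"]="high-level"
  ["emotional"]="high-level"
  ["safety"]="high-level"
)

# types_in_tier TIER: print every question type in TIER, one per line, sorted.
types_in_tier() {
  local tier="$1" qtype
  for qtype in "${!REASONING_TIER[@]}"; do
    if [[ "${REASONING_TIER[$qtype]}" == "$tier" ]]; then
      echo "$qtype"
    fi
  done | sort
}
```

For example, `types_in_tier fundamental` prints `function` and `spatial`, the two abilities the high-level types build on.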

Model Overview

🔥 News

  • [2024-10-10] We release a preliminary version of the ScanReason validation benchmark. Download here. The corresponding 3D bounding box annotations can be obtained from EmbodiedScan via the object IDs.
  • [2024-10-01] We release the training and inference code of ReGround3D.
  • [2024-07-02] We release the ScanReason paper.

Getting Started

1. Installation

  • We use at least 4 A100 GPUs for training and inference.

  • We tested the code in the following environment:

    • CUDA 11.8
    • Python 3.9
    • PyTorch 2.1.0
  • Clone our repository and create a conda environment:

    git clone https://github.com/ZCMax/ScanReason.git
    cd ScanReason
    conda create -n scanreason python=3.9
    conda activate scanreason
    pip install -r requirements.txt
  • Follow the EmbodiedScan Installation Doc to install EmbodiedScan and its dependencies.

  • Compile PointNet2:

    cd pointnet2
    python setup.py install --user
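Before building, it can help to confirm that the active environment matches the tested versions listed above. This is a minimal sketch, not part of the repository: the function names (`major_minor`, `check_tested_version`, `check_env`) are hypothetical, only the major.minor part of each version is compared, and the CUDA check reads the CUDA version PyTorch was built against (`torch.version.cuda`), which can differ from the system toolkit reported by `nvcc`.

```shell
#!/usr/bin/env bash
# Hypothetical environment check against the versions listed in the README.

# major_minor VERSION: keep only the first two dot-separated fields,
# e.g. "2.1.0+cu118" -> "2.1".
major_minor() {
  echo "$1" | cut -d. -f1,2
}

# check_tested_version NAME DETECTED TESTED: succeed if DETECTED shares
# major.minor with TESTED; otherwise print a warning and fail.
check_tested_version() {
  local name="$1" have="$2" tested="$3"
  if [[ -z "$have" ]]; then
    echo "warn: could not detect ${name} (tested with ${tested})" >&2
    return 1
  fi
  if [[ "$(major_minor "$have")" != "$(major_minor "$tested")" ]]; then
    echo "warn: ${name} ${have} differs from tested ${tested}" >&2
    return 1
  fi
  echo "ok: ${name} ${have}"
}

# check_env: query the active python interpreter and compare it against the
# tested Python 3.9 / PyTorch 2.1.0 / CUDA 11.8 combination.
check_env() {
  local py torch_ver torch_cuda rc=0
  py="$(python -c 'import platform; print(platform.python_version())' 2>/dev/null)"
  torch_ver="$(python -c 'import torch; print(torch.__version__)' 2>/dev/null)"
  torch_cuda="$(python -c 'import torch; print(torch.version.cuda)' 2>/dev/null)"
  check_tested_version "Python" "$py" "3.9" || rc=1
  check_tested_version "PyTorch" "$torch_ver" "2.1.0" || rc=1
  check_tested_version "CUDA (PyTorch build)" "$torch_cuda" "11.8" || rc=1
  return "$rc"
}
```

With the `scanreason` environment active, source the file and run `check_env`; it returns non-zero if any detected version differs from the tested one, and still reports all three so one run shows every mismatch.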
    

2. Data Preparation

  1. Follow the EmbodiedScan Data Preparation Doc to download the raw scan (RGB-D) datasets, then set VIDEO_FOLDER in train_ds.sh to the raw data path.

  2. Download the text annotations from Google Drive, set JSON_FOLDER in train_ds.sh to the annotation path, and set INFO_FILE to the path of the info file included in the annotations.
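Instead of editing `train_ds.sh` by hand, the three paths can be set from the command line. This is a hypothetical helper, not part of the repository. It assumes each variable is assigned at the start of its own line in the form `NAME=value`, that GNU `sed` is available (`sed -i` with no suffix argument), and that the new value contains no `|`, `&`, or `\` characters, since these are special in a `sed` replacement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set_script_var FILE NAME VALUE
# Rewrites the line "NAME=..." in FILE to NAME="VALUE".
# Assumes GNU sed, and a VALUE free of '|', '&' and '\'.
set_script_var() {
  local file="$1" name="$2" value="$3"
  # Refuse to guess if the variable is not assigned at the start of a line.
  if ! grep -q "^${name}=" "$file"; then
    echo "error: no '${name}=' line found in ${file}" >&2
    return 1
  fi
  # '|' is used as the sed delimiter so slashes in paths need no escaping.
  sed -i "s|^${name}=.*|${name}=\"${value}\"|" "$file"
}
```

For example, with a placeholder path: `set_script_var scripts/train_ds.sh VIDEO_FOLDER /path/to/raw/scans`, and likewise for `JSON_FOLDER` and `INFO_FILE`. Failing loudly when the line is missing avoids silently appending a variable the script never reads.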

3. Training ReGround3D

We provide a Slurm training script for 4 A100 GPUs:

./scripts/train_ds.sh
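Since training assumes 4 A100 GPUs, a quick pre-flight check can catch a misconfigured allocation before a long job starts. This is a minimal sketch with hypothetical function names (`visible_gpu_count`, `require_gpus`): it counts the entries in `CUDA_VISIBLE_DEVICES` when that variable is set, and otherwise falls back to the number of lines printed by `nvidia-smi -L`. It only counts devices; it does not check that they are A100s.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: require_gpus N fails if fewer than N GPUs
# are visible to this process. It counts devices only, not their model.

# visible_gpu_count: number of GPUs this process can see.
visible_gpu_count() {
  if [[ -n "${CUDA_VISIBLE_DEVICES+set}" ]]; then
    # Set (possibly empty): count its comma-separated entries.
    local -a ids=()
    IFS=',' read -r -a ids <<< "$CUDA_VISIBLE_DEVICES"
    echo "${#ids[@]}"
  elif command -v nvidia-smi >/dev/null 2>&1; then
    # Unset: nvidia-smi -L prints one line per GPU.
    nvidia-smi -L | wc -l
  else
    echo 0
  fi
}

# require_gpus N: succeed only if at least N GPUs are visible.
require_gpus() {
  local need="$1" have
  have="$(visible_gpu_count)"
  if (( have < need )); then
    echo "error: need ${need} GPUs, found ${have}" >&2
    return 1
  fi
  echo "ok: ${have} GPUs visible"
}
```

For example, `require_gpus 4` can be run first on a node where the GPUs are actually visible (such as inside an interactive allocation); on a Slurm login node the count is not meaningful.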

4. Evaluating ReGround3D

After training, run

./scripts/convert_zero_to_fp32.sh

to convert the DeepSpeed ZeRO checkpoint into a pytorch_model.bin file, then run

./scripts/merge_lora_weights.sh

to merge the LoRA weights and obtain the final checkpoint under ReGround3D-7B.

Finally, run

./scripts/eval_ds.sh

to obtain the grounding results.
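The three evaluation steps must run in order, and a later step is meaningless if an earlier one fails. Below is a minimal wrapper sketch; `run_eval_pipeline` is a hypothetical name, not a script shipped with the repository, and it assumes the three scripts take no extra arguments (as in the commands above) and are run from the repository root.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the post-training steps in order and stop at the
# first failure, so a failed conversion never feeds the merge or eval steps.
run_eval_pipeline() {
  local scripts_dir="${1:-./scripts}" step
  for step in convert_zero_to_fp32.sh merge_lora_weights.sh eval_ds.sh; do
    echo ">> ${step}"
    # Invoked via bash so the scripts do not need the execute bit.
    if ! bash "${scripts_dir}/${step}"; then
      echo "error: ${step} failed; skipping the remaining steps" >&2
      return 1
    fi
  done
}
```

Usage: `run_eval_pipeline ./scripts`. Checking each exit status explicitly, rather than relying on `set -e`, keeps the wrapper safe to source into an interactive shell.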

📝 TODO List

  • First Release.
  • Release ReGround3D code.
  • Release ScanReason datasets and benchmark.

📄 License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

👏 Acknowledgements

This repo benefits from LISA, EmbodiedScan, 3D-LLM, and LLaVA.
