ECCV 2024
Chenming Zhu, Tai Wang, Wenwei Zhang, Kai Chen, Xihui Liu*

The University of Hong Kong · Shanghai AI Laboratory
- [2023-10-10] We release a preliminary version of the ScanReason validation benchmark. Download it here. The corresponding 3D bounding box annotations can be obtained through the object IDs from EmbodiedScan.
- [2023-10-01] We release the training and inference code of ReGround3D.
- [2023-07-02] We release the ScanReason paper.
1. Installation
- We use at least 4 A100 GPUs for training and inference.
- We test the code under the following environment (a quick sanity check is sketched at the end of this section):
  - CUDA 11.8
  - Python 3.9
  - PyTorch 2.1.0
- Clone our repository and create a conda environment:

      git clone https://github.com/ZCMax/ScanReason.git
      conda create -n scanreason python=3.9
      conda activate scanreason
      pip install -r requirements.txt
- Follow the EmbodiedScan Installation Doc to install the EmbodiedScan series.
- Compile Pointnet2:

      cd pointnet2
      python setup.py install --user
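Before moving on to data preparation, it can be useful to confirm that the environment matches the versions listed above and that the GPUs are visible. The following quick check is not part of the repository; it is only a small sanity-check sketch.

```bash
# Quick environment sanity check (not part of the repository).
python -c "import sys; print('Python', sys.version.split()[0])"   # expect 3.9.x
python -c "import torch; print('PyTorch', torch.__version__, '| CUDA', torch.version.cuda, '| GPUs', torch.cuda.device_count())"   # expect 2.1.0 / 11.8
nvidia-smi --query-gpu=name,memory.total --format=csv             # at least 4 A100s are used for training and inference
```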
2. Data Preparation
- Follow the EmbodiedScan Data Preparation Doc to download the raw scan (RGB-D) datasets, and modify `VIDEO_FOLDER` in `train_ds.sh` to point to the raw data path.
- Download the text annotations from Google Drive, modify `JSON_FOLDER` in `train_ds.sh` to point to the annotations path, and modify the `INFO_FILE` data path, which is included in the annotations. A sketch of these edits is given below.
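For concreteness, the path edits described above could look like the following inside `train_ds.sh`. This is a hypothetical excerpt: the variable names come from the instructions above, while every path is a placeholder to be replaced with your own locations.

```bash
# Hypothetical excerpt of scripts/train_ds.sh -- all paths are placeholders.
VIDEO_FOLDER=/path/to/embodiedscan_raw_rgbd_data               # raw scan (RGB-D) data from EmbodiedScan
JSON_FOLDER=/path/to/scanreason_text_annotations               # downloaded text annotations
INFO_FILE=/path/to/scanreason_text_annotations/info_file.pkl   # info file shipped with the annotations
```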
3. Training ReGround3D
We provide a Slurm training script for 4 A100 GPUs:

    ./scripts/train_ds.sh
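If your cluster requires jobs to be submitted through `sbatch` rather than launched directly, a thin wrapper such as the one below can be used. This is a minimal sketch under assumptions: the partition, resource flags, and job name are placeholders and not part of the repository, and `train_ds.sh` may already handle the Slurm allocation itself, in which case running it directly is enough.

```bash
# Hypothetical sbatch wrapper around the provided training script.
# All resource values are placeholders; adapt them to your cluster and
# check scripts/train_ds.sh for how it launches the actual training command.
sbatch --job-name=reground3d \
       --partition=YOUR_PARTITION \
       --gres=gpu:4 \
       --cpus-per-task=16 \
       --output=logs/reground3d_%j.log \
       ./scripts/train_ds.sh
```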
4. Evaluating ReGround3D
After training, run `./scripts/convert_zero_to_fp32.sh` to convert the weights to a `pytorch_model.bin` file, then use `./scripts/merge_lora_weights.sh` to merge the LoRA weights and obtain the final checkpoint under `ReGround3D-7B`. Finally, run `./scripts/eval_ds.sh` to obtain the grounding results.
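Putting the three steps together, the post-training pipeline looks like the following when run from the repository root. Any arguments the individual scripts expect (for example, checkpoint paths) are omitted here, so check each script before running it.

```bash
# Post-training pipeline, run from the repository root.
./scripts/convert_zero_to_fp32.sh   # gather the DeepSpeed ZeRO shards into pytorch_model.bin
./scripts/merge_lora_weights.sh     # merge the LoRA weights to get the final ReGround3D-7B checkpoint
./scripts/eval_ds.sh                # run grounding evaluation on the benchmark
```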
- First Release.
- Release ReGround3D code.
- Release ScanReason datasets and benchmark.
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This repo benefits from LISA, EmbodiedScan, 3D-LLM, and LLaVA.