
Exploring the Potential of Encoder-free Architectures in 3D LMMs

Official repository for the paper "Exploring the Potential of Encoder-free Architectures in 3D LMMs".

[πŸ“– Paper] [πŸ€— HF Checkpoints for stage1]

🏠 About

(Figure: Solution Teaser)
We introduce ENEL, an Encoder-free 3D Large Language Model capable of overcoming the challenges posed by encoder-based architectures, including the inability to adapt to varying point cloud resolutions and the failure of encoder-extracted point features to meet the semantic needs of Large Language Models. Building upon PointLLM, we conduct a comprehensive investigation into how the LLM can assume the role of the 3D encoder. Based on the PointLLM dataset, our 7B model is evaluated across three benchmark tasks: generative 3D object classification, 3D object captioning, and 3D VQA, with assessments performed using GPT-4 scoring and traditional metrics.

πŸ”₯ News

  • [2025-02-13] We release the training code for the pre-training stage, the corresponding checkpoints, and the evaluation code.
  • [2025-02-13] We release the paper of ENEL.

πŸ“‹ Contents

πŸ’¬ Dialogue Examples

Dialogue 1

πŸ” Overview

Model

The encoder-free 3D LMM directly utilizes a token embedding module to convert point cloud data into discrete point tokens, which are then concatenated with text tokens to serve as input to the LLM. To assume the role of the encoder, the LLM is guided to extract high-level semantic features of the point clouds and acquire multi-level knowledge from both global and local perspectives.
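For intuition, here is a minimal, hypothetical PyTorch sketch of this idea (names and sizes are illustrative, not the repository's actual module): a small token embedding module maps raw point patches to LLM-dimension tokens, which are concatenated with text token embeddings before being fed to the LLM.

# Hypothetical sketch of the encoder-free input path (all names and sizes are illustrative).
import torch
import torch.nn as nn

class PointTokenEmbedding(nn.Module):
    """Maps raw point patches directly to LLM-dimension tokens (no pre-trained 3D encoder)."""
    def __init__(self, points_per_patch: int = 32, point_dim: int = 6, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(points_per_patch * point_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, points_per_patch, point_dim)
        b, n, p, d = patches.shape
        return self.proj(patches.reshape(b, n, p * d))  # (batch, num_patches, llm_dim)

# Point tokens are simply concatenated with text token embeddings along the sequence axis.
embed = PointTokenEmbedding()
point_tokens = embed(torch.randn(1, 128, 32, 6))           # (1, 128, 4096)
text_tokens = torch.randn(1, 64, 4096)                     # embeddings from the LLM's own embedding layer
llm_input = torch.cat([point_tokens, text_tokens], dim=1)  # (1, 192, 4096), fed to the LLM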

Experiment Results

Please refer to our paper for more results.

Reminder of the Model Zoo

After downloading the checkpoints from https://huggingface.co/IvanTang/ENEL/tree/main, adapt them to your local paths by modifying the _name_or_path attribute in config.json and the special_tokens_map_file attribute in tokenizer_config.json (a sketch follows below).
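A small Python sketch for making these edits; the checkpoint directory and the special-tokens file name below are placeholders you should replace with your actual paths.

# Point the downloaded checkpoint at your local paths (the paths below are placeholders).
import json

ckpt_dir = "/path/to/ENEL_checkpoint"  # where you downloaded the Hugging Face files

with open(f"{ckpt_dir}/config.json") as f:
    config = json.load(f)
config["_name_or_path"] = ckpt_dir
with open(f"{ckpt_dir}/config.json", "w") as f:
    json.dump(config, f, indent=2)

with open(f"{ckpt_dir}/tokenizer_config.json") as f:
    tok_config = json.load(f)
tok_config["special_tokens_map_file"] = f"{ckpt_dir}/special_tokens_map.json"  # assumed file name
with open(f"{ckpt_dir}/tokenizer_config.json", "w") as f:
    json.dump(tok_config, f, indent=2)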

πŸ“¦ Training and Evaluation

Installation

To start:

  1. Clone this repository.
git clone https://github.com/Ivan-Tang-3D/ENEL.git
cd ENEL
  2. Install packages
conda create -n ENEL python=3.10 -y
conda activate ENEL
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

# * for training
pip install ninja
pip install flash-attn

# * for chamfer_dist
git clone https://github.com/Pang-Yatian/Point-MAE.git
cd Point-MAE/extensions/chamfer_dist
python setup.py install --user

Data Preparation

Objaverse Training Data

  1. Download the two compressed files of 660K Objaverse colored point clouds here. They require about 77GB of storage space.
  2. Run the following command to merge the two files into one and uncompress it. This produces a folder named 8192_npy containing 660K point cloud files named {Objaverse_ID}_8192.npy. Each file is a numpy array of shape (8192, 6), where the first three columns are xyz coordinates and the last three are rgb values in the [0, 1] range (a quick inspection sketch follows these steps).
cat Objaverse_660K_8192_npy_split_a* > Objaverse_660K_8192_npy.tar.gz
tar -xvf Objaverse_660K_8192_npy.tar.gz
  3. In the ENEL folder, create a data folder and add a soft link to the uncompressed folder inside it.
cd ENEL
mkdir data
ln -s /path/to/8192_npy data/objaverse_data
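To verify the data is laid out as described above, a quick inspection sketch (the object ID in the file name is a placeholder):

# Quick sanity check of one point cloud file (the Objaverse ID below is a placeholder).
import numpy as np

pc = np.load("data/objaverse_data/{Objaverse_ID}_8192.npy")  # replace with a real file name
assert pc.shape == (8192, 6)

xyz = pc[:, :3]   # point coordinates
rgb = pc[:, 3:]   # colors, expected in the [0, 1] range
print(xyz.min(axis=0), xyz.max(axis=0))
print(rgb.min(), rgb.max())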

Instruction-Following Data

  1. In the ENEL/data folder, create a directory named anno_data.
  2. Our instruction-following data, including both the simple-description and complex instructions, can be downloaded here. If you have difficulty downloading the data (e.g. network issue), please email the authors.
  • The simple-description data has 660K samples and the complex instructions have 70K samples.
  • Both training data are based on the Objaverse dataset.
  • The complex instructions are generated with GPT-4.
  3. Put the data files in the anno_data directory. The directory should look like this:
ENEL/data/anno_data
β”œβ”€β”€ PointLLM_brief_description_660K_filtered.json
β”œβ”€β”€ PointLLM_brief_description_660K.json
└── PointLLM_complex_instruction_70K.json
  4. Note that PointLLM_brief_description_660K_filtered.json is derived from PointLLM_brief_description_660K.json by removing the 3,000 objects we reserved as the validation set (a quick sanity-check sketch follows this list).
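A quick sanity check of the downloaded instruction files, assuming each file is a JSON list of samples (which should match the sample counts stated above):

# Count samples in the instruction-following files (assumes each file is a JSON list).
import json

for name in [
    "PointLLM_brief_description_660K.json",
    "PointLLM_brief_description_660K_filtered.json",
    "PointLLM_complex_instruction_70K.json",
]:
    with open(f"data/anno_data/{name}") as f:
        samples = json.load(f)
    print(f"{name}: {len(samples)} samples")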

Evaluation Data

  1. Download the reference ground truth PointLLM_brief_description_val_200_GT.json we use for the benchmarks on the Objaverse dataset here, and put it in ENEL/data/anno_data.

Training

Download the Initial LLM Weight

  1. In ENEL folder, create a directory named checkpoints.
  2. Download the pre-trained LLM weights PointLLM_7B_v1.1_init and put them in the checkpoints directory.

Start Training

  1. For stage-1 training, simply run:
cd ENEL
scripts/ENEL_train_stage1.sh

Evaluation

Inferencing

  1. Run the following commands to generate the model outputs.
  2. Use different commands for inference on different benchmarks:
MODEL_NAME=      # path to the trained or downloaded model checkpoint
LOG_SUFFIX=      # suffix appended to the log file names below
LOG_DIR="/ENEL/new_eval_logs"
LOG_EDIR="/ENEL/new_eval_logs"   # directory where the inference logs are written

export PYTHONPATH="/ENEL:$PYTHONPATH"

# Object captioning on Objaverse
CUDA_VISIBLE_DEVICES=1 python pointllm/eval/eval_objaverse.py --model_name $MODEL_NAME --task_type captioning --prompt_index 2 > $LOG_EDIR/try_obj_${LOG_SUFFIX}.log 2>&1 &

# Open Vocabulary Classification on Objaverse
CUDA_VISIBLE_DEVICES=2 python pointllm/eval/eval_objaverse.py  --model_name $MODEL_NAME --task_type classification --prompt_index 0 > $LOG_EDIR/try_objcls_${LOG_SUFFIX}.log 2>&1 &
  3. Please check the default command-line arguments of these two scripts; you can specify different prompts, data paths, and other parameters.
  4. After inference, the results will be saved in {model_name}/evaluation as a dict with the following format (a minimal reading sketch follows this example):
{
  "prompt": "",
  "results": [
    {
      "object_id": "",
      "ground_truth": "", 
      "model_output": "",
      "label_name": "" # only for classification on modelnet40
    }
  ]
}
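A minimal sketch for reading these outputs; the exact output file name under {model_name}/evaluation depends on the task and prompt, so the path below is a placeholder.

# Inspect a saved inference result file (the path is a placeholder).
import json

with open("/path/to/model_name/evaluation/results.json") as f:
    data = json.load(f)

print("Prompt used:", data["prompt"])
for entry in data["results"][:3]:
    print(entry["object_id"], "->", entry["model_output"][:80])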

ChatGPT/GPT-4 Evaluation

  1. Get your OpenAI API key at https://platform.openai.com/api-keys.
  2. Set the OpenAI API key on line 40 of https://github.com/Ivan-Tang-3D/ENEL/blob/main/pointllm/eval/utils.py.
  3. Run the following commands to evaluate the model outputs in parallel with ChatGPT/GPT-4 (the full evaluation costs approximately $1.5 to $2.2 USD).
export PYTHONPATH="/ENEL:$PYTHONPATH"

# Open Vocabulary Classification on Objaverse
python pointllm/eval/evaluator.py --results_path /path/to/model_output --model_type gpt-4-0613 --eval_type open-free-form-classification --parallel --num_workers 15

# Object captioning on Objaverse
python pointllm/eval/evaluator.py --results_path /path/to/model_output --model_type gpt-4-0613 --eval_type object-captioning --parallel --num_workers 15
  4. The evaluation script supports interruption and resumption. You can interrupt the evaluation process at any time with Ctrl+C; the temporary results will be saved. If an error occurs during evaluation, the script also saves the current state. To resume from where it left off, simply run the same command again.
  5. The evaluation results will be saved in {model_name}/evaluation as another dict. Some of the metrics are explained as follows (a sketch for printing them follows this list):
"average_score": The GPT-evaluated captioning score we report in our paper.
"accuracy": The classification accuracy we report in our paper, including random choices made by ChatGPT when model outputs are vague or ambiguous and ChatGPT outputs "INVALID".
"clean_accuracy": The classification accuracy after removing those "INVALID" outputs.
"total_predictions": The number of predictions.
"correct_predictions": The number of correct predictions.
"invalid_responses": The number of "INVALID" outputs by ChatGPT.

# Some other statistics for calling OpenAI API
"prompt_tokens": The total number of tokens of the prompts for ChatGPT/GPT-4.
"completion_tokens": The total number of tokens of the completion results from ChatGPT/GPT-4.
"GPT_cost": The API cost of the whole evaluation process, in US Dollars πŸ’΅.

Traditional Metric Evaluation

  1. For the object captioning task, run the following command to evaluate model outputs with traditional metrics, including BLEU, ROUGE, METEOR, Sentence-BERT, and SimCSE (an illustrative similarity sketch follows the command).
export PYTHONPATH="/ENEL:$PYTHONPATH"

CUDA_VISIBLE_DEVICES=0 python pointllm/eval/traditional_evaluator.py --results_path /path/to/model_captioning_output
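For reference, the embedding-based metrics in this list score a caption by the cosine similarity between sentence embeddings of the model output and the ground truth. Below is a standalone sketch using the sentence-transformers library; it is an illustration of the idea, not the repository's traditional_evaluator.py, and the model name and captions are arbitrary.

# Illustrative Sentence-BERT-style similarity between a model caption and the ground truth.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model works here

prediction = "A small wooden chair with four legs and a curved backrest."
reference = "A brown wooden chair."

emb_pred, emb_ref = model.encode([prediction, reference], convert_to_tensor=True)
score = util.cos_sim(emb_pred, emb_ref).item()
print(f"cosine similarity: {score:.3f}")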

πŸ“ TODO List

  • Add training code for stage 1 with checkpoints.
  • Add evaluation & inference code.
  • Add training code for stage 2.

πŸ”— Citation

If you find our work and this codebase helpful, please consider starring this repo 🌟 and citing:

@misc{tang2025exploringpotentialencoderfreearchitectures,
      title={Exploring the Potential of Encoder-free Architectures in 3D LMMs}, 
      author={Yiwen Tang and Zoey Guo and Zhuhao Wang and Ray Zhang and Qizhi Chen and Junli Liu and Delin Qu and Zhigang Wang and Dong Wang and Xuelong Li and Bin Zhao},
      year={2025},
      eprint={2502.09620},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.09620}, 
}

πŸ“„ License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

πŸ‘ Acknowledgements
