Zichen Wen1,2, Yifeng Gao1, Shaobo Wang1, Junyuan Zhang2, Qintong Zhang2,4, Weijia Li3,2, Conghui He2✉, Linfeng Zhang1✉
1Shanghai Jiao Tong University, 2Shanghai AI Laboratory,
3Sun Yat-sen University, 4Peking University
- [2025.02.22] 🤗 We release our latest work DART, a plug-and-play, training-free token reduction method that seamlessly integrates with efficient attention operators. Code is available!
- [2025.03.18] 🤗 We have released the implementation of DART for Qwen2-VL, and now you can easily evaluate it using lmms-eval!
- [2025.03.19] 🤗 The implementation and evaluation scripts for LLaVA-Next are now available!
TLDR: We propose DART (Duplication-Aware Reduction of Tokens), a training-free method that prunes vision tokens based on duplication rather than importance, achieving an 88.9% token reduction and a 1.99× speed-up while maintaining performance and remaining compatible with efficient attention operators.
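For intuition, the sketch below shows one way duplication-based pruning can be realized in PyTorch. It is a minimal, hypothetical illustration, not the repository's implementation: the function name `dart_prune`, the uniform pivot selection, and the cosine-similarity duplication score are assumptions made for this example; see the paper for the actual criterion. Because token selection depends only on the token embeddings themselves (no attention maps are needed), such pruning stays compatible with fused kernels like FlashAttention.

```python
import torch
import torch.nn.functional as F

def dart_prune(vision_tokens: torch.Tensor,
               keep_ratio: float = 0.222,
               num_pivots: int = 4) -> torch.Tensor:
    """Hypothetical sketch: keep the vision tokens least duplicated
    with a small set of pivot tokens.

    vision_tokens: (N, D) float tensor of vision-token embeddings.
    Returns the sorted indices of the tokens to keep.
    """
    n = vision_tokens.size(0)
    n_keep = max(num_pivots, int(n * keep_ratio))

    # Illustrative pivot choice: uniformly spaced tokens (the paper's
    # pivot-selection strategy may differ).
    pivot_idx = torch.linspace(0, n - 1, steps=num_pivots).long()

    # Duplication score: a token's highest cosine similarity to any pivot.
    tokens = F.normalize(vision_tokens, dim=-1)
    pivots = tokens[pivot_idx]
    duplication = (tokens @ pivots.T).max(dim=-1).values  # shape (N,)

    # Always retain the pivots, then keep the least duplicated tokens.
    duplication[pivot_idx] = -float("inf")
    keep_idx = duplication.topk(n_keep, largest=False).indices
    return keep_idx.sort().values

# Example: pruning 576 LLaVA-1.5-style vision tokens down to ~128
# (keep_ratio = 0.222 mirrors the 0.778 reduction ratio passed to the
# evaluation scripts below).
tokens = torch.randn(576, 1024)
kept = tokens[dart_prune(tokens)]
```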
- Clone this repository.
```bash
git clone https://github.com/ZichenWen1/DART
cd DART
```
- Environment Setup and Preparation
```bash
conda create -n DART python=3.10 -y
conda activate DART
pip install -e .
pip install flash-attn --no-build-isolation
```
- Download Multimodal Benchmark
Please follow the detailed instructions in LLaVA-Evaluation.
- Environment Setup for Qwen2-VL
```bash
conda create -n DART_Qwen2VL python=3.10 -y
conda activate DART_Qwen2VL
cd Qwen2-VL/transformers && pip install -e .
pip install accelerate qwen-vl-utils[decord]
pip install flash-attn --no-build-isolation
cd lmms-eval && pip install -e .
```
- Evaluation on LLaVA
```bash
bash scripts/v1_5/eval/[Benchmark].sh [Reduction_Ratio] [Max_Num_Truncation]
```
For example:
```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/textvqa.sh 0.778 128
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/pope.sh 0.778 128
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mme.sh 0.778 128
```
- Evaluation on Qwen2-VL
```bash
cd Qwen2-VL
bash eval_scripts/lmms_eval.sh True [Reduction_Ratio]
```
This project is released under the Apache 2.0 license.
If our findings help your research, please consider citing our paper in your publications.
```bibtex
@article{wen2025stop,
  title={Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More},
  author={Wen, Zichen and Gao, Yifeng and Wang, Shaobo and Zhang, Junyuan and Zhang, Qintong and Li, Weijia and He, Conghui and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2502.11494},
  year={2025}
}
```
We extend our gratitude to the open-source efforts of LLaVA, Qwen2-VL, and lmms-eval.
For any questions about our paper or code, please email [email protected].