Q-VLM: Post-training Quantization for Large Vision-Language Models
Changyuan Wang, Ziwei Wang, Xiuwei Xu, Yansong Tang, Jie Zhou, Jiwen Lu
An efficient and accurate memory-saving method for W4A4 quantization of large multi-modal models. [Paper]
- Clone this repository and navigate to the QVLM folder
git clone https://github.com/ChangyuanWang17/QVLM.git
cd QVLM
- Install Package
conda create -n QVLM python=3.10 -y
conda activate QVLM
pip install --upgrade pip # enable PEP 660 support
pip install -e .
- Install additional packages for Q-VLM
pip uninstall bitsandbytes
cd custom_bitsandbytes
python setup.py install
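Optionally, before running any experiments, you can confirm that the rebuilt bitsandbytes imports cleanly and that a CUDA device is visible to PyTorch; this is just a quick sanity check.

```Shell
# Optional sanity check: the custom bitsandbytes build should import without errors
# and at least one CUDA GPU should be visible to PyTorch.
python -c "import bitsandbytes, torch; print('CUDA available:', torch.cuda.is_available())"
```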
The following experiments were performed on a GeForce RTX 3090 with 24GB of memory.
sh scripts/generate_sqa_response.sh
sh scripts/evaluate_sqa_response.sh
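If the machine has more than one GPU, you can pin the single-GPU run to a specific device with CUDA_VISIBLE_DEVICES; the device index below is only an example.

```Shell
# Pin the single-GPU ScienceQA run to one device (index 0 is an arbitrary choice).
CUDA_VISIBLE_DEVICES=0 sh scripts/generate_sqa_response.sh
CUDA_VISIBLE_DEVICES=0 sh scripts/evaluate_sqa_response.sh
```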
Generate and evaluate with multiple GPUs
sh scripts/generate_sqa_response_multi.sh
sh scripts/evaluate_sqa_response_multi.sh
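For reference, multi-GPU generation in LLaVA-style codebases is typically data-parallel: the ScienceQA question file is split into chunks, one process per GPU answers its own chunk, and the per-chunk answer files are merged before evaluation. The sketch below illustrates that pattern; the module name, flags, and file paths are taken from upstream LLaVA and are assumptions here, so the provided scripts remain the authoritative reference.

```Shell
# Rough sketch of chunked, per-GPU answer generation (paths and flags are assumed
# from upstream LLaVA's ScienceQA evaluation; adapt them to this repository's scripts).
CHUNKS=4
for IDX in $(seq 0 $((CHUNKS-1))); do
    CUDA_VISIBLE_DEVICES=$IDX python -m llava.eval.model_vqa_science \
        --model-path ./checkpoints/llava-v1.3-7b-sqa \
        --question-file ./playground/data/eval/scienceqa/llava_test_CQM-A.json \
        --image-folder ./playground/data/eval/scienceqa/images/test \
        --answers-file ./answers/sqa-chunk${IDX}.jsonl \
        --num-chunks $CHUNKS \
        --chunk-idx $IDX \
        --single-pred-prompt &
done
wait
```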
Please check out the Model Zoo for all public LLaVA checkpoints and instructions on how to use the weights. Thanks to LLaVA (https://github.com/haotian-liu/LLaVA) for the amazing open-source model!
We also uploaded the LLaVA-v1.3-7B model fine-tuned on the ScienceQA dataset to test the effect of quantization.
Please check out the documentation here.
We thank the authors of the following works for open-sourcing their excellent code.