Lai Wei, Zihao Jiang, Weiran Huang, Lichao Sun.
Shanghai Jiao Tong University & Lehigh University
- We introduce InstructionGPT-4, which is fine-tuned on a small dataset of only 200 examples, approximately 6% of the instruction-following data in the alignment dataset used for MiniGPT-4.
- We first propose several metrics to assess the quality of multimodal instruction data. Based on these metrics, we present an effective and trainable data selector that automatically identifies and filters out low-quality vision-language data. With this method, InstructionGPT-4 outperforms the original MiniGPT-4 on various evaluations.
- Our findings demonstrate that a smaller amount of high-quality instruction-tuning data is sufficient to enable multimodal large language models to generate better output.
git clone https://github.com/Vision-CAIR/MiniGPT-4.git
Follow this doc to prepare the environment and download the cc_sbu_align dataset.
You also need to replace '/path/to' with your own path.
Fill in your OpenAI API key here.
Generate GPT Score:
cd cc_sbu_align_test
python generate_gpt_score.py
python convert_score.py
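`generate_gpt_score.py` asks GPT to rate each sample, and `convert_score.py` presumably turns the raw replies into numeric scores. The prompt and reply format are not described here, so the following is only a minimal sketch that assumes each reply contains a line such as `Score: 8`; the regular expression and the names `parse_gpt_score` and `convert_scores` are hypothetical and not taken from the repository.

```python
import re
from typing import Dict, Optional

# Assumed reply format (hypothetical): the rating appears as "Score: <number>".
_SCORE_RE = re.compile(r"score\s*[:=]\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_gpt_score(reply: str) -> Optional[float]:
    """Return the last score found in a GPT reply, or None if there is none."""
    matches = _SCORE_RE.findall(reply)
    return float(matches[-1]) if matches else None


def convert_scores(replies: Dict[str, str]) -> Dict[str, float]:
    """Map sample id -> numeric score, skipping replies without a parsable score."""
    scores: Dict[str, float] = {}
    for sample_id, reply in replies.items():
        score = parse_gpt_score(reply)
        if score is not None:
            scores[sample_id] = score
    return scores


if __name__ == "__main__":
    demo = {"2": "The description is accurate.\nScore: 8", "5": "No rating given."}
    print(convert_scores(demo))  # {'2': 8.0}
```

Taking the last match is a design choice for replies that restate the scale (e.g. "score from 1 to 10") before giving the actual rating.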
Generate CLIP Score, Reward Score and Length Score:
python full_score.py
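Two of these indicators can be illustrated without any model weights. The sketch below assumes the image and text embeddings have already been computed (for example by a CLIP encoder) and treats the CLIP score as their cosine similarity; it also assumes a length score that simply counts whitespace-separated tokens in the response. Both definitions and the function names are illustrative assumptions, not the exact formulas in `full_score.py`. The reward score needs a separate reward model and is not shown.

```python
import math
from typing import Sequence


def clip_style_score(image_emb: Sequence[float], text_emb: Sequence[float]) -> float:
    """Cosine similarity between precomputed image and text embeddings."""
    if len(image_emb) != len(text_emb):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_img = math.sqrt(sum(a * a for a in image_emb))
    norm_txt = math.sqrt(sum(b * b for b in text_emb))
    if norm_img == 0.0 or norm_txt == 0.0:
        return 0.0  # similarity is undefined for a zero vector; treat as no alignment
    return dot / (norm_img * norm_txt)


def length_score(response: str) -> int:
    """Assumed length score: number of whitespace-separated tokens in the response."""
    return len(response.split())


if __name__ == "__main__":
    print(round(clip_style_score([1.0, 0.0], [1.0, 1.0]), 4))  # 0.7071
    print(length_score("A dog runs across a grassy field."))  # 7
```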
cd selector
python utils_image.py
python utils_text.py
python get_all_image_features.py
python get_all_text_features.py
cd cluster/kmeans++
python kmeans_pp.py
python average.py
python image_show.py
python split.py
python image2cap.py
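The k-means++ step groups samples by their features so that each resulting subset can be used to fine-tune a separate model. Below is a plain-Python sketch of k-means++ seeding followed by Lloyd iterations, plus a helper that splits sample indices by cluster. It assumes the per-sample features are already available as lists of floats; the number of clusters, the iteration limit, and all function names are hypothetical and not taken from `kmeans_pp.py`.

```python
import random
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points: List[Vector], k: int, rng: random.Random) -> List[List[float]]:
    """k-means++ seeding: sample each new center with probability proportional to D(x)^2."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(_sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0.0:  # every point already coincides with a center
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers


def kmeans(points: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """k-means++ initialisation followed by Lloyd iterations; returns one label per point."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels: Optional[List[int]] = None
    for _ in range(iters):
        # Assign each point to its nearest center.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels if labels is not None else [0] * len(points)


def split_into_subsets(labels: Sequence[int]) -> Dict[int, List[int]]:
    """Group sample indices by cluster label."""
    subsets: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        subsets.setdefault(lab, []).append(idx)
    return subsets


if __name__ == "__main__":
    feats = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(split_into_subsets(kmeans(feats, k=2)))
```

Each index list returned by `split_into_subsets` would then correspond to one training subset.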
Use each subset to fine-tune a new MiniGPT-4 model and follow this doc for evaluation. We choose GQA, IconQA, OKVQA and ScienceQA as the validation sets.
cd selector
bash train.sh
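`train.sh` trains the data selector. As a deliberately simplified stand-in, the sketch below averages per-sample indicator vectors into one vector per subset and fits a linear regressor from those vectors to each subset's validation score (for instance, the mean accuracy over GQA, IconQA, OKVQA and ScienceQA) using gradient descent on mean squared error. The linear model, the hyperparameters, and the function names are assumptions for illustration; the selector in this repository may use a different architecture and target.

```python
from typing import List, Sequence, Tuple


def subset_features(sample_feats: Sequence[Sequence[float]]) -> List[float]:
    """Average per-sample indicator vectors (e.g. GPT, CLIP, reward, length scores)."""
    n = len(sample_feats)
    return [sum(col) / n for col in zip(*sample_feats)]


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted validation score for one subset feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_selector(
    X: Sequence[Sequence[float]], y: Sequence[float], lr: float = 0.05, epochs: int = 5000
) -> Tuple[List[float], float]:
    """Fit y ~ w.x + b with full-batch gradient descent on mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += 2.0 * err * xi[j] / n
            grad_b += 2.0 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    # Toy data only: one feature vector per subset and a synthetic target.
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
    y = [0.5, 2.5, -0.5, 1.5, 3.5]  # generated from 2*x0 - x1 + 0.5
    w, b = train_linear_selector(X, y)
    print([round(v, 3) for v in w], round(b, 3))  # [2.0, -1.0] 0.5
```

Once trained, `predict` can score individual samples (or candidate subsets) so that low-quality data can be ranked and filtered out.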
Perform clustering to ensure diversity.
cd cluster/spectral
python spectral_clustering.py
python image_show.py
Select the final dataset for InstructionGPT-4:
cd selector
bash eval_clus.sh
Here, 200 multimodal instruction samples are selected.
InstructionGPT-4 is obtained by fine-tuning the pre-trained MiniGPT-4 on these 200 selected samples.
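The final selection combines the selector's quality predictions with the spectral clusters so that the chosen samples are both high quality and diverse. One simple way to do this, assumed here for illustration and not necessarily what `eval_clus.sh` does, is to sort each cluster by predicted score and pick samples round-robin across clusters until the budget is reached. `select_diverse` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def select_diverse(scores: Sequence[float], clusters: Sequence[int], budget: int) -> List[int]:
    """Pick up to `budget` indices, taking the best remaining sample of each cluster in turn."""
    by_cluster: Dict[int, List[int]] = defaultdict(list)
    for idx, cluster in enumerate(clusters):
        by_cluster[cluster].append(idx)
    # One queue per cluster, ordered by descending predicted quality.
    queues = [
        sorted(members, key=lambda i: scores[i], reverse=True)
        for _, members in sorted(by_cluster.items())
    ]
    selected: List[int] = []
    rank = 0
    while len(selected) < budget and any(rank < len(q) for q in queues):
        for q in queues:
            if rank < len(q) and len(selected) < budget:
                selected.append(q[rank])
        rank += 1
    return selected


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.8, 0.7, 0.3]
    clusters = [0, 0, 1, 1, 2]
    print(select_diverse(scores, clusters, budget=4))  # [0, 2, 4, 1]
```

In this sketch, calling it with `budget=200` over the full dataset would yield the indices of the final training set; round-robin guarantees every cluster contributes before any cluster contributes twice.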
- MiniGPT-4: The model architecture of InstructionGPT-4 follows MiniGPT-4.
- LIMA: Our work is inspired by LIMA (Less Is More for Alignment).
- TinyGPT-V: Great work on MLLMs.
If you're using InstructionGPT-4 in your research or applications, please cite using this BibTeX:
@article{wei2023instructiongpt,
title={InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4},
author={Wei, Lai and Jiang, Zihao and Huang, Weiran and Sun, Lichao},
journal={arXiv preprint arXiv:2308.12067},
year={2023}
}