[📖 Paper] [🤗 APASI-Model] [🤗 SI-Dataset]
This is the official implementation of our paper: Mitigating Hallucinations in Large Vision-Language Models by Self-Injecting Hallucinations.
In this work, we propose Autonomous Preference Alignment via Self-Injection (APASI). Unlike previous methods that rely on annotations from humans or external AI models, APASI leverages the target LVLM itself to self-inject hallucinations into a generated response, creating a pair of responses with differing preference levels for DPO-based preference alignment.
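As a rough illustration of the idea (not the repo's exact pipeline; `lvlm`, `self_inject_hallucinations`, and the injection ratio below are placeholders/assumptions), a preference pair is built roughly as follows:

```python
# Sketch only: hypothetical helpers illustrating how a DPO preference pair
# can be formed by self-injecting hallucinations into the model's own response.
def build_preference_pair(lvlm, image, prompt, inject_ratio=0.2):
    # The target LVLM first describes the image; this is the preferred response.
    chosen = lvlm.generate(image, prompt)
    # The same LVLM then rewrites its response, injecting hallucinated objects
    # or attributes, which yields the dis-preferred response.
    rejected = lvlm.self_inject_hallucinations(chosen, ratio=inject_ratio)
    # The (chosen, rejected) pair is what DPO aligns the model on.
    return {"image": image, "prompt": prompt, "chosen": chosen, "rejected": rejected}
```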
Our work has been accepted to EMNLP 2025.
We present the preference dataset, SI-Dataset, which is constructed using only the target LVLM. Specifically, SI-23k is derived from the images and descriptive responses in the detail-23k subset of LLaVA's instruction tuning data. The scaled-up SI-130k is constructed by adding unannotated images from the Visual Genome (VG) dataset.
We release the LoRA adaptation weights of APASI based on LLaVA-v1.5-7B. The models are trained with SI-23k and SI-130k and are named APASI-Base-7B and APASI-Scaled, respectively.
To use the model, please follow the code in LLaVA's official repo.
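For example, once the environment below is set up, the released adapter can be loaded with LLaVA's model builder (the local paths are placeholders):

```python
# Sketch: load the APASI LoRA adapter on top of LLaVA-v1.5-7B with LLaVA's
# builder. Requires the LLaVA codebase installed as in the steps below.
from llava.model.builder import load_pretrained_model

lora_path = "path/to/APASI-Base-7B"      # placeholder: downloaded LoRA weights
base_path = "liuhaotian/llava-v1.5-7b"   # base LVLM

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=lora_path,
    model_base=base_path,
    # the name must contain "lora" so LLaVA merges the adapter onto the base
    model_name="llava-v1.5-7b-lora",
)
```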
- Clone this repo
git clone https://github.com/davidluciolu/APASI.git
cd APASI
- Create a conda environment and install the dependencies
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
pip install nltk
- Download the images from the COCO and VG datasets and put them into
./playground/data
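A quick way to check that the image folders are in place (the sub-folder names below follow LLaVA's usual data layout and are an assumption here):

```python
from pathlib import Path

# Assumed layout, following LLaVA's data conventions:
#   playground/data/coco/train2017/  - COCO images
#   playground/data/vg/VG_100K/      - Visual Genome images, part 1
#   playground/data/vg/VG_100K_2/    - Visual Genome images, part 2
root = Path("playground/data")
for sub in ("coco/train2017", "vg/VG_100K", "vg/VG_100K_2"):
    path = root / sub
    print(f"{path}: {'found' if path.is_dir() else 'missing'}")
```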
- Download the SI-Dataset and make a directory under
./playground/data/neg_data
cd ./playground/data/neg_data/
mkdir detail_23k_llava-v1.5-7b_gen_llava-v1.5-7b_lvis_guide_replace_0.2_1_skip1_num1
cd detail_23k_llava-v1.5-7b_gen_llava-v1.5-7b_lvis_guide_replace_0.2_1_skip1_num1
mv [downloaded SI-23k parquet] detail_23k_llava-v1.5-7b_gen_llava-v1.5-7b_iter_0.parquet
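A quick sanity check that the parquet file is readable from the expected location (no assumption is made about its columns):

```python
import pandas as pd

# Path assembled from the steps above.
path = ("playground/data/neg_data/"
        "detail_23k_llava-v1.5-7b_gen_llava-v1.5-7b_lvis_guide_replace_0.2_1_skip1_num1/"
        "detail_23k_llava-v1.5-7b_gen_llava-v1.5-7b_iter_0.parquet")
df = pd.read_parquet(path)
print(f"{len(df)} rows")
print(df.columns.tolist())
```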
- If you want to prepare the SI data from LLaVA detail-23k yourself, run the script:
bash scripts/rl/make_data.sh
- Run the scripts:
# just run dpo training
bash scripts/rl/dpo.sh
# make data + dpo
bash scripts/rl/all_in_one.sh
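For reference, the training optimizes the standard DPO objective over the (chosen, rejected) pairs. A minimal, generic sketch is below (not a copy of the repo's implementation; beta=0.1 is only a common default):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss over summed log-probs of the chosen/rejected responses."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```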
- Prepare the evaluation benchmark data following the instructions in LLaVA. For Object-Hal (CHAIR) evaluation, we use the 500 sampled images following OPERA (see the sketch after the command below).
- Run the script:
bash scripts/rl/eval_all.sh
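For context, CHAIR counts hallucinated object mentions against each image's ground-truth objects. A minimal sketch of the two scores (the actual evaluation follows OPERA's code):

```python
def chair_scores(mentioned_per_caption, gt_objects_per_image):
    """CHAIR_i = hallucinated object mentions / all object mentions,
    CHAIR_s = captions containing a hallucination / all captions."""
    mentions = hallucinated = hallucinated_caps = 0
    for mentioned, gt in zip(mentioned_per_caption, gt_objects_per_image):
        bad = [obj for obj in mentioned if obj not in gt]
        mentions += len(mentioned)
        hallucinated += len(bad)
        hallucinated_caps += bool(bad)
    chair_i = hallucinated / max(mentions, 1)
    chair_s = hallucinated_caps / max(len(mentioned_per_caption), 1)
    return chair_i, chair_s
```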
- LLaVA: The baseline model for this work.
- RLHF-V: We follow the DPO training code in this repo.
- OPERA: We follow the Object-Hal evaluation in this repo.
If you find our model/code/data/paper helpful, please consider citing our paper 📝 and giving us a star ⭐️!
@misc{lu2025mitigatinghallucinationslargevisionlanguage,
title={Mitigating Hallucinations in Large Vision-Language Models by Self-Injecting Hallucinations},
author={Yifan Lu and Ziqi Zhang and Chunfeng Yuan and Jun Gao and Congxuan Zhang and Xiaojuan Qi and Bing Li and Weiming Hu},
year={2025},
eprint={2509.11287},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.11287},
}