
Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation

The official repository for Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation.

Figure: The overall framework of ATVC.

Requirements

  • Python 3.8
  • matplotlib == 3.1.1
  • numpy == 1.19.4
  • pandas == 0.25.1
  • scikit_learn == 0.21.3
  • torch == 1.8.0

Installation

We provide an environment file, environment.yml, containing the required dependencies. Clone the repo and run the following command in the root of this directory:

conda env create -f environment.yml
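After the environment is created, activate it before running any of the scripts below. The environment name used here (`atvc`) is an assumption; use whatever name is set in the `name:` field of environment.yml:

```shell
# Activate the newly created environment.
# NOTE: "atvc" is an assumed name -- check the "name:" field in environment.yml.
conda activate atvc
```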

Dataset

Please refer to DOWNLOAD.md for dataset preparation.

Pretrained Models

Please refer to pretrained-models to download the released models.

Train

Training commands

  • To train the first stage:
bash dist_train_vae.sh ${DATA_NAME} ${NODES} ${GPUS}
  • To train the second stage:
bash dist_train_atvc.sh ${VAE_PATH} ${DATA_NAME} ${NODES} ${GPUS}

Arguments

  • ${VAE_PATH}: path to the pretrained VAE model.
  • ${DATA_NAME}: dataset to train on, e.g. CLEVR-ATVC or Fruit-ATVC.
  • ${NODES}: number of nodes.
  • ${GPUS}: number of GPUs per node.
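For example, a full training run on CLEVR-ATVC with one node and eight GPUs might look like the following (the node/GPU counts and the checkpoint path are illustrative assumptions, not values shipped with the repo):

```shell
# Stage one: train the VAE on CLEVR-ATVC (1 node, 8 GPUs -- illustrative values)
bash dist_train_vae.sh CLEVR-ATVC 1 8

# Stage two: train ATVC on top of the stage-one checkpoint
# ("checkpoints/vae.pt" is an assumed example path)
bash dist_train_atvc.sh checkpoints/vae.pt CLEVR-ATVC 1 8
```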

Test

Testing commands

  • To test the image reconstruction ability of the first-stage model:
bash gen_vae.sh ${GPU} ${VAE_PATH} ${IMAGE_PATH}
  • To test the final ATVC model:
bash gen_atvc.sh ${GPU} ${ATVC_PATH} ${TEXT_QUERY} ${IMAGE_PATH}

Arguments

  • ${GPU}: ID of a single GPU, e.g. 0.
  • ${VAE_PATH}: path to the pretrained VAE model.
  • ${IMAGE_PATH}: path to the input image for reconstruction, e.g. input.png.
  • ${ATVC_PATH}: path to the pretrained ATVC model.
  • ${TEXT_QUERY}: text-based query, e.g. "Please put the small blue cube on top of the small yellow cylinder."
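Putting the arguments together, a test run on GPU 0 might look like the following (the checkpoint and image paths are assumed examples):

```shell
# Reconstruct a single image with the stage-one VAE on GPU 0
# ("checkpoints/vae.pt" and "input.png" are assumed example paths)
bash gen_vae.sh 0 checkpoints/vae.pt input.png

# Query the final ATVC model; it may carry out the instruction or reject it
bash gen_atvc.sh 0 checkpoints/atvc.pt "Please put the small blue cube on top of the small yellow cylinder." input.png
```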

License

ATVC is released under the Apache 2.0 license.

Citation

If you find this code useful for your research, please cite our paper:

@article{zhang2023accountable,
  title={Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation},
  author={Zhang, Zhiwei and Liu, Yuliang},
  journal={arXiv preprint arXiv:2303.05983},
  year={2023}
}

Acknowledgement

Our code builds on DALLE-pytorch and CLIP. We would like to thank everyone who helped label text-image pairs and participated in the human evaluation experiments. We hope our explorations and findings offer valuable insights into the accountability of textual-visual generative models.

Contact

This project is developed by Zhiwei Zhang (@zzw-zwzhang) and Yuliang Liu (@Yuliang-Liu).
