Official PyTorch implementation of our paper "GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis" (CVPR 2023) by Ming Tao, Bing-Kun Bao, Hao Tang, and Changsheng Xu.
Requirements:
- Python 3.9
- PyTorch 1.9
- At least one 24 GB RTX 3090 GPU (for training)
- A CPU alone is sufficient (for sampling)
GALIP is a small, fast generative model that can generate multiple images per second, even on a CPU.
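To check the sampling-speed claim on your own hardware, a minimal timing harness like the one below can be used. It is a sketch that assumes a hypothetical callable `sample_fn(n)` which generates `n` images with an already-loaded GALIP generator; the repository's own sampling code (sample.ipynb) is not reproduced here.

```python
import time
from typing import Any, Callable


def images_per_second(sample_fn: Callable[[int], Any], batch_size: int,
                      n_batches: int = 5, warmup: int = 1) -> float:
    """Time repeated calls to `sample_fn(batch_size)` and return images/second.

    `sample_fn` is a hypothetical wrapper around the GALIP generator that
    produces `batch_size` images per call.
    """
    # Warm-up calls are excluded from timing (first calls are often slower).
    for _ in range(warmup):
        sample_fn(batch_size)
    start = time.perf_counter()
    for _ in range(n_batches):
        sample_fn(batch_size)
    elapsed = time.perf_counter() - start
    return n_batches * batch_size / elapsed


if __name__ == "__main__":
    # Dummy sampler that only sleeps, so the harness runs without a model.
    def fake_sampler(n: int) -> list:
        time.sleep(0.01)
        return [None] * n

    print(f"{images_per_second(fake_sampler, batch_size=8):.1f} images/s")
```

Timing several batches after a warm-up call gives a steadier estimate than timing a single call.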
Clone this repository and install the dependencies:
git clone https://github.com/tobran/GALIP
pip install -r requirements.txt
Install CLIP (https://github.com/openai/CLIP), for example:
pip install git+https://github.com/openai/CLIP.git
Prepare the data:
- Download the preprocessed metadata for birds and coco and extract it to data/
- Download the birds image data and extract it to data/birds/
- Download the coco2014 dataset and extract the images to data/coco/images/
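Before launching training, a quick check that the directories listed above exist can save a failed run. This is a minimal sketch, assuming data/ sits at the repository root (the parent of code/); `missing_data_dirs` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Directories named in the data-preparation steps, relative to the repo root
# (assumption: data/ lives at the repository root, next to code/).
EXPECTED_DIRS = ["data", "data/birds", "data/coco/images"]


def missing_data_dirs(repo_root: str) -> list[str]:
    """Return the expected data directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

Run it from the repository root; it only checks that the directories exist, not that their contents are complete.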
To train, enter the code directory:
cd GALIP/code/
- For the birds dataset:
bash scripts/train.sh ./cfg/bird.yml
- For the COCO dataset:
bash scripts/train.sh ./cfg/coco.yml
If your training process is interrupted unexpectedly, set state_epoch, log_dir, and pretrained_model_path in train.sh to resume training.
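If you resume runs often, the edit to train.sh can be scripted. The sketch below assumes each variable is assigned as `name=value` at the start of its own line in scripts/train.sh, which may not match the actual script; `set_shell_vars` is a hypothetical helper, and editing the file by hand works just as well.

```python
import re


def set_shell_vars(script: str, **values: str) -> str:
    """Return `script` with each `name=value` assignment line replaced.

    Hypothetical helper: assumes every variable is assigned at the start of its
    own line (e.g. `state_epoch=1`). Raises KeyError if a name is not found, so
    a typo cannot silently leave the old value in place.
    """
    for name, value in values.items():
        pattern = re.compile(rf"^{re.escape(name)}=.*$", re.MULTILINE)
        new_line = f"{name}={value}"
        # A function replacement avoids re's backslash handling in `value`.
        script, count = pattern.subn(lambda _m: new_line, script)
        if count == 0:
            raise KeyError(f"no assignment for {name!r} found")
    return script


if __name__ == "__main__":
    # Illustrative script text only; the real scripts/train.sh may differ.
    demo = "state_epoch=1\nlog_dir='new'\n"
    print(set_shell_vars(demo, state_epoch="42", log_dir="'my_run'"))
```

To apply it to the real file, read scripts/train.sh, pass the text through `set_shell_vars` with your own values for state_epoch, log_dir, and pretrained_model_path, and write the result back.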
Our code supports automatic FID evaluation during training; the results are stored as TensorBoard logs under ./logs. You can change how often evaluation runs by setting test_interval in the YAML config file (e.g. ./cfg/bird.yml or ./cfg/coco.yml).
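For reference, FID is the Fréchet distance between Gaussians fitted to Inception-v3 features of real and generated images. The sketch below implements only that final distance with NumPy; it is not the repository's evaluation code, feature extraction is omitted, and other implementations compute the matrix square root differently (e.g. with scipy.linalg.sqrtm).

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2):

        ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})
    """
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    diff = mu1 - mu2
    # Tr((sigma1 sigma2)^{1/2}) is the sum of square roots of the eigenvalues
    # of sigma1 @ sigma2; for PSD inputs these are real and non-negative, so
    # .real and clipping only remove round-off noise.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2).real
    tr_covmean = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * tr_covmean)
```

Given feature arrays `real` and `fake` of shape (N, D), the inputs would be `real.mean(axis=0)`, `np.cov(real, rowvar=False)` and the same two statistics for `fake`. Lower is better, matching the FID↓ column in the results table.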
To monitor training with TensorBoard (from the repository root):
- For the birds dataset:
tensorboard --logdir=./code/logs/bird/train --port 8166
- For the COCO dataset:
tensorboard --logdir=./code/logs/coco/train --port 8177
Pretrained models:
- GALIP for COCO: download it and save it to ./code/saved_models/pretrained/
- GALIP for CC12M: download it and save it to ./code/saved_models/pretrained/
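To evaluate, enter the code directory:
cd GALIP/code/
Then set pretrained_model in scripts/test.sh to the checkpoint you want to evaluate and run: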
- For the birds dataset:
bash scripts/test.sh ./cfg/bird.yml
- For the COCO dataset:
bash scripts/test.sh ./cfg/coco.yml
- For the CC12M model (zero-shot evaluation on COCO):
bash scripts/test.sh ./cfg/coco.yml
The released COCO model outperforms the paper version on both FID and CLIP score; the CC12M zero-shot FID is unchanged.
| Model | COCO FID ↓ | COCO CLIP score (CS) ↑ | CC12M zero-shot FID on COCO (ZFID) ↓ |
|---|---|---|---|
| GALIP (paper) | 5.85 | 0.3338 | 12.54 |
| GALIP (released) | 5.01 | 0.3379 | 12.54 |
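The CS column is a CLIP-based text-image alignment score. Assuming it is the mean cosine similarity between the CLIP embedding of each generated image and that of its caption (some works instead rescale by 100 or clip negatives at zero, and the CLIP backbone used here is not stated), the final averaging step can be sketched as follows; computing the embeddings themselves is omitted.

```python
import numpy as np


def clip_score(image_emb, text_emb) -> float:
    """Mean cosine similarity between paired image and text embeddings.

    Row i of `image_emb` and row i of `text_emb` are assumed to be CLIP
    embeddings of a generated image and the caption it was generated from.
    """
    img = np.asarray(image_emb, dtype=float)
    txt = np.asarray(text_emb, dtype=float)
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    # Row-wise dot product of unit vectors = cosine similarity per pair.
    return float(np.mean(np.sum(img * txt, axis=1)))
```

Higher is better, which is why CS is marked ↑ while the FID columns are marked ↓.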
- The sample.ipynb notebook can be used to generate images with a pretrained model.
If you find GALIP useful in your research, please consider citing our paper:
@inproceedings{tao2023galip,
title={GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis},
author={Tao, Ming and Bao, Bing-Kun and Tang, Hao and Xu, Changsheng},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={14214--14223},
year={2023}
}
The code is released for academic research use only. For commercial use, please contact Ming Tao (陶明).