- DiffBlender successfully synthesizes complex combinations of input modalities, enabling flexible manipulation of conditions and customized generation aligned with user preferences.
- We designed its structure to extend intuitively to additional modalities, while a partial update of hypernetworks keeps the training cost low.
- Project page is open: link
- DiffBlender model: code & checkpoint
- Release inference code
- Release training code & pipeline
- Gradio UI
Install the necessary packages with:
$ pip install -r requirements.txt
Download the DiffBlender model checkpoint from this Hugging Face model, and place it under ./diffblender_checkpoints/.
Also, prepare the base Stable Diffusion (SD) model from this link (we used CompVis/sd-v1-4.ckpt).
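For example, the checkpoint directory can be set up as below. This is only a sketch: {HF_REPO_ID} and {CKPT_NAME} are placeholders rather than the official names, and the Hugging Face CLI is just one way to fetch the files.
$ mkdir -p ./diffblender_checkpoints
$ pip install -U "huggingface_hub[cli]"
$ huggingface-cli download {HF_REPO_ID} {CKPT_NAME}.pth --local-dir ./diffblender_checkpoints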
$ python inference.py --ckpt_path=./diffblender_checkpoints/{CKPT_NAME}.pth \
                      --official_ckpt_path=/path/to/sd-v1-4.ckpt \
                      --save_name={SAVE_NAME}
Results will be saved under ./inference/{SAVE_NAME}/, in the format of {conditions + generated image}.
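For instance, after the run above, the saved {conditions + generated image} panels can be listed with:
$ ls ./inference/{SAVE_NAME}/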
@article{kim2023diffblender,
title={DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models},
author={Kim, Sungnyun and Lee, Junsoo and Hong, Kibeom and Kim, Daesik and Ahn, Namhyuk},
journal={arXiv preprint arXiv:2305.15194},
year={2023}
}