Skip to content

Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models

Notifications You must be signed in to change notification settings

Zeqiang-Lai/Mini-DALLE3

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

54 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

minidalle3

Technical ReportProject pageDemo (Temporarily Unavailable)

minidalle3.mp4

teaser4

An experimental attempt to obtain the interactive and interleave text-to-image and text-to-text experience of DALL•E 3 and ChatGPT.

Try Yourself 🤗

  • Download the checkpoint and save it as following
checkpoints
   - models
   - sdxl_models
  • run the following commands, and you will get a gradio-based web demo.
export OPENAI_API_KEY="your key"
python -m minidalle3.web 
  • To use other LLM rather than ChatGPT, such as baichuan.
python -m minidalle3.llm.baichuan
export OPENAI_API_BASE="http://0.0.0.0:10039/v1"
python -m minidalle3.web

chatglm, baichuan, internlm are tested. llama have not supported yet. qwen is not tested.

TODO

  • Support generating image interleaved in the conversations.
  • Support generating multiple images at once.
  • Support selecting image.
  • Support refinement.
  • Support prompt refinement/variation.
  • Instruct tuned LLM/SD.

Citation

If you find this repo helpful, please consider citing us.

@misc{minidalle3,
    author={Lai, Zeqiang and Zhu, Xizhou and Dai, Jifeng and Qiao, Yu and Wang, Wenhai},
    title={Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models},
    year={2023},
    url={https://github.com/Zeqiang-Lai/Mini-DALLE3},
}

Acknowledgement

IP-AdapterStable Diffusion XL

Visitors