A prompt-model interaction dataset for generative recommendation.

MAPS-research/GEMRec

GEMRec

Update: GEMRec has been accepted to WSDM 2024 (Demo Track).

This is the official GitHub repository for the paper GEMRec: Towards Generative Model Recommendation. We release the promptset and the code for generating the GEMRec-18K dataset, along with links to the corresponding dataset and space on HuggingFace.

Dataset Intro

GEMRec-18K is a prompt-model interaction dataset with 18K images generated by 200 publicly available generative models paired with a diverse set of 90 textual prompts. We randomly sampled a subset of 197 models from the full set of models on Civitai (all fine-tuned from Stable Diffusion) according to the popularity distribution (i.e., download counts), and added 3 original Stable Diffusion checkpoints (v1.4, v1.5, v2.1) from HuggingFace. All model checkpoints have been converted to the Diffusers format.

The textual prompts were drawn from three sources: 60 prompts were sampled from Parti Prompts; 10 prompts were sampled from Civitai by popularity; and 10 prompts were handcrafted following the prompting guide from DreamStudio, then extended to 20 by creating a shortened and simplified version of each following the tips from Midjourney. The prompts were classified into 12 categories: abstract, animal, architecture, art, artifact, food, illustration, people, produce & plant, scenery, vehicle, and world knowledge.
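As a quick sanity check on the numbers above, the dataset size follows from the model and prompt counts. The snippet below is only an illustrative sketch: the variable names are ours, and it assumes one image per (model, prompt) pair, which is consistent with the stated 18K total.

```python
# Model side: 197 models sampled from Civitai plus 3 original
# Stable Diffusion checkpoints (v1.4, v1.5, v2.1).
n_models = 197 + 3

# Prompt side: 60 from Parti Prompts, 10 from Civitai, and 10 handcrafted
# prompts that each also have a shortened, simplified version.
n_prompts = 60 + 10 + 2 * 10

# Assumption: exactly one image per (model, prompt) pair.
n_images = n_models * n_prompts

print(n_models, n_prompts, n_images)  # prints: 200 90 18000
```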

HuggingFace Links

Dataset

Space

  • GEMRec-Gallery: Our demo app for GEMRec, which helps users identify useful models from a large corpus.

Files & Directories

Key existing files & directories

  • everything/get_models.py: Fetches all model metadata from Civitai via its API. The files are stored as everything/models/{model_id}_{latest_modelVersion_id}.json.
  • download_and_generate.py: Download models and generate images (see more details below).
  • evaluate_and_upload.py: Compute evaluation metrics for the images and upload them.
  • roster.csv: The metadata for the 200 model checkpoints fetched from Civitai.
  • promptsets/: The prompts for image generation. We used v6 for GEMRec-18K.
  • utils/: Miscellaneous scripts for generating the GEMRec-18K dataset.
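The metadata naming pattern listed above, everything/models/{model_id}_{latest_modelVersion_id}.json, can be built and parsed with a couple of small helpers. This is a minimal sketch and not code from the repository: the function names are hypothetical, the IDs in the usage lines are made-up placeholders, and it assumes both IDs are integers.

```python
from pathlib import Path


def metadata_path(model_id, version_id, root="everything/models"):
    """Build the metadata path for a model and its latest version (hypothetical helper)."""
    return Path(root) / f"{model_id}_{version_id}.json"


def parse_metadata_name(path):
    """Recover (model_id, version_id) from a metadata file name.

    Assumption: both IDs are integers separated by a single underscore.
    """
    model_id, version_id = Path(path).stem.split("_", 1)
    return int(model_id), int(version_id)


# Placeholder IDs, for illustration only.
example = metadata_path(1234, 5678)
print(example.as_posix())
print(parse_metadata_name(example))
```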

Directories to be created by the scripts

  • meta/: Metadata for the checkpoints.
  • output/: Converted models in Diffusers format.
  • download/: Checkpoint cache downloaded from Civitai.
  • generated/train/: Images generated with all downloaded models using a given promptset.

Usage

Basic: Replicate our GEMRec-18K dataset

# Clone the repo
git clone https://github.com/MAPS-research/GEMRec.git && cd GEMRec

# Download models and generate images using the default settings
#  - promptset: ./promptsets/promptset_v6.csv
#  - model roster: ./roster.csv
python download_and_generate.py --sd --lr

Advanced: Build your own dataset

Civitai to Diffusers conversion

See more details under utils/civitai2diffusers/.

Fetch latest models from Civitai

Download models from Civitai or read local cache. Generate images with the given promptset.

python download_and_generate.py --fn

Sample a subset of models from Civitai

Note that the sampled model subset has the same popularity distribution as the full set.

# Download the metadata of all models from Civitai
cd everything && python get_models.py

# Plot histogram for old subset & find bin sizes
cd ../utils/popularity && python hist.py

# Distribute all models into the bins
python grouping.py

# Sample a subset of candidate models
python sampling.py

# Generate images with the sampled model subset
cd ../.. && python download_and_generate.py -p all

# Evaluate the generated images
python evaluate_and_upload.py
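The idea behind the binning and sampling steps above, drawing a subset whose popularity distribution matches the full set, can be illustrated with proportional stratified sampling. The sketch below is not the logic of hist.py, grouping.py, or sampling.py. It assumes power-of-ten bins over download counts, and the function name and arguments are hypothetical.

```python
import math
import random
from collections import defaultdict


def stratified_sample(download_counts, k, seed=0):
    """Sample roughly k model IDs so that each popularity bin keeps its share.

    download_counts maps model ID -> download count (hypothetical input format).
    """
    rng = random.Random(seed)

    # Assumption: bins are orders of magnitude of the download count.
    bins = defaultdict(list)
    for model_id, downloads in download_counts.items():
        bins[int(math.log10(downloads + 1))].append(model_id)

    total = len(download_counts)
    sample = []
    for key in sorted(bins):
        members = sorted(bins[key])
        # Each bin contributes in proportion to its size in the full set.
        quota = round(k * len(members) / total)
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample
```

Because each quota is rounded independently, the result can contain slightly more or fewer than k models. A real pipeline would likely need to reconcile the remainder.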

Acknowledgement

This work is supported in part by Shanghai Frontiers Science Center of Artificial Intelligence and Deep Learning at NYU Shanghai, STCSM 23YF1430300, and NYU HPC resources.

Citation

If you find our work helpful, please consider citing it as follows:

@article{guo2023gemrec,
  title={GEMRec: Towards Generative Model Recommendation},
  author={Guo, Yuanhe and Liu, Haoming and Wen, Hongyi},
  journal={arXiv preprint arXiv:2308.02205},
  year={2023}
}
