LoMOE

This is the official PyTorch implementation of the ACMMM 2024 paper "LoMOE: Localized Multi-Object Editing via Multi-Diffusion". All published data is available on our project page.


Requirements

This code was tested with python=3.9, pytorch=2.0.1, and torchvision=0.15.2. Please follow the official PyTorch installation instructions to install the PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.

Create a conda environment with the following dependencies:

conda create -n lomoe python=3.9
conda activate lomoe
conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install accelerate==0.20.3 diffusers==0.12.1 einops==0.7.0 ipython transformers==4.26.1 salesforce-lavis==1.0.2
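
After activating the environment, it is worth confirming that the installed packages match the pins above and that CUDA is visible. The `meets_pin` helper below is a small illustration written for this README, not part of the LoMOE repository:

```python
# Sanity-check helper for the version pins above.
# `parse_version` / `meets_pin` are illustrative, not repository code.

def parse_version(v: str) -> tuple:
    """Turn '2.0.1' (or a local tag like '2.0.1+cu117') into (2, 0, 1)."""
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

def meets_pin(installed: str, pinned: str) -> bool:
    """True if the installed version is at least the pinned one."""
    return parse_version(installed) >= parse_version(pinned)

# Example use inside the activated `lomoe` environment:
#   import torch, torchvision
#   assert meets_pin(torch.__version__, "2.0.1")
#   assert meets_pin(torchvision.__version__, "0.15.2")
#   assert torch.cuda.is_available()  # CUDA support is strongly recommended
```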

Getting Started

Usage

Start by downloading the SOE and MOE datasets from our project page to ./benchmark/data.

To generate the prompt, inverted latent, and store intermediate latents for an image, first run the inversion script located at ./lomoe/invert/inversion.py. Then, to apply edits, use ./lomoe/edit/main.py. A sample image and corresponding masks for single and multi-object edit operations are provided in ./lomoe/sample/.

Inversion

The invert/inversion.py script takes the following arguments:

  • --input_image : Path to the input image.
  • --results_folder : Path where the generated prompt, inverted latent, and intermediate latents are stored.

CUDA_VISIBLE_DEVICES=0 python invert/inversion.py  \
        --input_image "sample/single/init_image.jpg" \
        --results_folder "invert/output/single"
CUDA_VISIBLE_DEVICES=0 python invert/inversion.py  \
        --input_image "sample/multi/init_image.png" \
        --results_folder "invert/output/multi"
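
Judging from the paths consumed by the edit commands below, inversion.py writes three artifacts per image under results_folder. The helper below is an assumption based on those example paths, not repository code, but it is convenient for wiring the two steps together:

```python
from pathlib import Path

def inversion_outputs(results_folder: str, image_path: str) -> dict:
    """Locate the inversion artifacts for one image, assuming the layout
    implied by the example commands in this README:
      <results_folder>/prompt/<stem>.txt      -- generated prompt
      <results_folder>/inversion/<stem>.pt    -- inverted latent
      <results_folder>/latentlist/<stem>.pt   -- intermediate latents
    """
    stem = Path(image_path).stem
    root = Path(results_folder)
    return {
        "bg_prompt": root / "prompt" / f"{stem}.txt",
        "latent": root / "inversion" / f"{stem}.pt",
        "latent_list": root / "latentlist" / f"{stem}.pt",
    }

outs = inversion_outputs("invert/output/single", "sample/single/init_image.jpg")
print(outs["latent"].as_posix())  # invert/output/single/inversion/init_image.pt
```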

Edit

The edit/main.py script takes the following arguments:

  • --mask_paths : Path(s) to the object mask(s); pass one path per object.
  • --num_fgmasks : Number of foreground masks (defaults to 1).
  • --bg_prompt : Path to the background prompt (we use the prompt generated by inversion.py).
  • --bg_negative : Path to the background negative prompt (we use the prompt generated by inversion.py).
  • --fg_prompts : Edit prompt(s) corresponding to the masks, one per mask.
  • --fg_negative : The foreground negative prompt(s) (we use "artifacts, blurry, smooth texture, bad quality, distortions, unrealistic, distorted image").
  • --W : Output image width.
  • --H : Output image height.
  • --seed : The seed used to initialize random number generators (defaults to 0).
  • --sd_version : The Stable Diffusion version to use (use the same as in inversion.py).
  • --steps : The number of diffusion timesteps (use the same as in inversion.py).
  • --ca_coef : Cross-attention preservation loss coefficient (defaults to 1.0).
  • --seg_coef : Background loss coefficient (defaults to 1.75).
  • --bootstrapping : Value of the bootstrapping parameter (defaults to 20).
  • --latent : Path to the inverted latent produced by inversion.py.
  • --latent_list : Path to the intermediate latent list produced by inversion.py.
  • --rec_path : Path to save the reconstructed input image.
  • --edit_path : Path to save the edited image.
  • --save_path : Path to save the merged reconstructed and edited image.

CUDA_VISIBLE_DEVICES=0 python edit/main.py \
  --mask_paths "sample/single/mask_1.jpg" \
  --bg_prompt "invert/output/single/prompt/init_image.txt" \
  --bg_negative "invert/output/single/prompt/init_image.txt" \
  --fg_negative "artifacts, blurry, smooth texture, bad quality, distortions, unrealistic, distorted image" \
  --H 512 \
  --W 512 \
  --bootstrapping 20 \
  --latent 'invert/output/single/inversion/init_image.pt' \
  --latent_list 'invert/output/single/latentlist/init_image.pt' \
  --rec_path 'results/single/1_reconstruction.png' \
  --edit_path 'results/single/2_edit.png' \
  --fg_prompts "a red dog collar" \
  --seed 1234 \
  --save_path 'results/single/3_merged.png'
CUDA_VISIBLE_DEVICES=0 python edit/main.py \
  --mask_paths "sample/multi/mask_1.png" "sample/multi/mask_2.png" \
  --bg_prompt "invert/output/multi/prompt/init_image.txt" \
  --bg_negative "invert/output/multi/prompt/init_image.txt" \
  --fg_negative "artifacts, blurry, smooth texture, bad quality, distortions, unrealistic, distorted image" "artifacts, blurry, smooth texture, bad quality, distortions, unrealistic, distorted image" \
  --H 512 \
  --W 512 \
  --bootstrapping 20 \
  --latent 'invert/output/multi/inversion/init_image.pt' \
  --latent_list 'invert/output/multi/latentlist/init_image.pt' \
  --rec_path 'results/multi/1_reconstruction.png' \
  --edit_path 'results/multi/2_edit.png' \
  --fg_prompts "a crochet bird" "an origami bird" \
  --num_fgmasks 2 \
  --seed 1234 \
  --save_path 'results/multi/3_merged.png'
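
The invocations above get long, so it can help to assemble them from a small driver script. The sketch below uses only the flags documented above; `build_edit_cmd` and the output-file naming scheme are illustrative conveniences, not part of the repository:

```python
import subprocess

# The foreground negative prompt used throughout this README.
FG_NEG = ("artifacts, blurry, smooth texture, bad quality, "
          "distortions, unrealistic, distorted image")

def build_edit_cmd(masks, fg_prompts, prompt_file, latent, latent_list,
                   out_prefix, seed=1234, size=512):
    """Assemble an edit/main.py argument list like the examples above."""
    assert len(masks) == len(fg_prompts), "one edit prompt per mask"
    return ["python", "edit/main.py",
            "--mask_paths", *masks,
            "--bg_prompt", prompt_file,
            "--bg_negative", prompt_file,
            "--fg_negative", *([FG_NEG] * len(masks)),
            "--fg_prompts", *fg_prompts,
            "--num_fgmasks", str(len(masks)),
            "--H", str(size), "--W", str(size),
            "--seed", str(seed),
            "--latent", latent,
            "--latent_list", latent_list,
            "--rec_path", f"{out_prefix}/1_reconstruction.png",
            "--edit_path", f"{out_prefix}/2_edit.png",
            "--save_path", f"{out_prefix}/3_merged.png"]

cmd = build_edit_cmd(["sample/multi/mask_1.png", "sample/multi/mask_2.png"],
                     ["a crochet bird", "an origami bird"],
                     "invert/output/multi/prompt/init_image.txt",
                     "invert/output/multi/inversion/init_image.pt",
                     "invert/output/multi/latentlist/init_image.pt",
                     "results/multi")
# subprocess.run(cmd, check=True)  # uncomment to actually run the edit
```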

Results

Metrics

To compute the classical and neural metrics, use compute_metrics.py in ./benchmark/metrics/{SOE,MOE}. These include the SRC and TGT CLIP scores, BG LPIPS, BG PSNR, BG MSE, BG SSIM, and the Structural Distance. The compute_aesthetic.py script in the same folders computes the aesthetic metrics, including HPS, IR, and Aesthetic Score; it requires additional dependencies, namely HPSv2 and ImageReward.

NOTE: The compute_metrics.py and compute_aesthetic.py scripts expect a folder containing edits for all images in the dataset. Please modify the code to run them on a smaller subset or single images.

CUDA_VISIBLE_DEVICES=0 python compute_metrics.py --folder_name PATH_TO_SAVED_EDITS
CUDA_VISIBLE_DEVICES=0 python compute_aesthetic.py --folder_name PATH_TO_SAVED_EDITS
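
For intuition, the BG (background) metrics compare the source and edited images only outside the edit masks. A minimal pure-Python sketch of BG MSE and BG PSNR is shown below; the repository scripts use their own implementations, so treat this only as a definition of the quantities:

```python
import math

def bg_mse(src, edit, mask):
    """Mean squared error over background pixels (mask value 0).
    src/edit/mask are flat, equal-length lists; intensities in [0, 255]."""
    pairs = [(s, e) for s, e, m in zip(src, edit, mask) if m == 0]
    return sum((s - e) ** 2 for s, e in pairs) / len(pairs)

def bg_psnr(src, edit, mask, peak=255.0):
    """PSNR = 10 * log10(peak^2 / MSE); infinite if the backgrounds match."""
    mse = bg_mse(src, edit, mask)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```

A faithful edit changes pixels only inside the mask, so BG MSE stays at 0 and BG PSNR is unbounded; background leakage lowers BG PSNR.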

Citation

If you use LoMOE or find this work useful for your research, please use the following BibTeX entry.

@InProceedings{Chakrabarty_2024_ACMMM,
  author    = {Chakrabarty$^*$, Goirik and Chandrasekar$^*$, Aditya and Hebbalaguppe, Ramya and Prathosh, AP},
  title     = {LoMOE: Localized Multi-Object Editing via Multi-Diffusion},
  booktitle = {ACM Multimedia 2024},
  month     = {October},
  year      = {2024}
}
