LoMOE

This is the official PyTorch implementation of the ACM MM 2024 paper "LoMOE: Localized Multi-Object Editing via Multi-Diffusion". All published data is available on our project page.


Requirements

This code was tested with python=3.9, pytorch=2.0.1, and torchvision=0.15.2. Please follow the official PyTorch instructions to install the PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.

Create a conda environment with the following dependencies:

conda create -n lomoe python=3.9
conda activate lomoe
conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install accelerate==0.20.3 diffusers==0.12.1 einops==0.7.0 ipython transformers==4.26.1 salesforce-lavis==1.0.2
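
After installation, an optional sanity check (a minimal sketch; the expected versions match the pins above) can confirm that the environment imports cleanly and that PyTorch sees the GPU:

# check_env.py -- optional sanity check for the lomoe environment
import torch
import torchvision
import diffusers
import transformers

print("torch:", torch.__version__)                # expected: 2.0.1
print("torchvision:", torchvision.__version__)    # expected: 0.15.2
print("diffusers:", diffusers.__version__)        # expected: 0.12.1
print("transformers:", transformers.__version__)  # expected: 4.26.1
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))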

Getting Started

Usage

Start by downloading the SOE and MOE datasets from our project page to ./benchmark/data.

To generate the prompt, compute the inverted latent, and store the intermediate latents for an image, first run the inversion script at ./lomoe/invert/inversion.py. Then apply edits with ./lomoe/edit/main.py. A sample image and corresponding masks for single- and multi-object edit operations are provided in ./lomoe/sample/. The two steps can also be chained from a driver script, as in the sketch below.
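
A minimal, hypothetical driver for the single-object sample (illustrative only; it reuses the sample paths and flags documented in the Inversion and Edit sections below, and omits flags that have defaults):

# run_single.py -- hypothetical driver chaining inversion and editing
import subprocess

# step 1: invert the sample image (produces prompt, inverted latent, latent list)
subprocess.run([
    "python", "invert/inversion.py",
    "--input_image", "sample/single/init_image.jpg",
    "--results_folder", "invert/output/single",
], check=True)

# step 2: edit the masked region using the inversion outputs
subprocess.run([
    "python", "edit/main.py",
    "--mask_paths", "sample/single/mask_1.jpg",
    "--bg_prompt", "invert/output/single/prompt/init_image.txt",
    "--bg_negative", "invert/output/single/prompt/init_image.txt",
    "--fg_prompts", "a red dog collar",
    "--fg_negative", "artifacts, blurry, smooth texture, bad quality, "
                     "distortions, unrealistic, distorted image",
    "--latent", "invert/output/single/inversion/init_image.pt",
    "--latent_list", "invert/output/single/latentlist/init_image.pt",
    "--rec_path", "results/single/1_reconstruction.png",
    "--edit_path", "results/single/2_edit.png",
    "--save_path", "results/single/3_merged.png",
], check=True)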

Inversion

The invert/inversion.py script takes the following arguments:

  • --input_image : Path to the input image.
  • --results_folder : Path to store the prompt and the inverted and intermediate latents.

# single-object sample
CUDA_VISIBLE_DEVICES=0 python invert/inversion.py \
        --input_image "sample/single/init_image.jpg" \
        --results_folder "invert/output/single"

# multi-object sample
CUDA_VISIBLE_DEVICES=0 python invert/inversion.py \
        --input_image "sample/multi/init_image.png" \
        --results_folder "invert/output/multi"

Edit

The edit/main.py script takes the following arguments:

  • --mask_paths : Path(s) to the object mask(s).
  • --num_fgmasks : Number of foreground masks (defaults to 1).
  • --bg_prompt : Path to the background prompt (we use the prompt generated by inversion.py).
  • --bg_negative : Path to the background negative prompt (we use the prompt generated by inversion.py).
  • --fg_prompts : Edit prompts corresponding to the masks.
  • --fg_negative : Foreground negative prompt(s). (We use "artifacts, blurry, smooth texture, bad quality, distortions, unrealistic, distorted image".)
  • --W : Output image width.
  • --H : Output image height.
  • --seed : Seed to initialize random number generators (defaults to 0).
  • --sd_version : The Stable Diffusion version to use (use the same as in inversion.py).
  • --steps : Number of diffusion timesteps (use the same as in inversion.py).
  • --ca_coef : Cross-attention preservation loss coefficient (defaults to 1.0).
  • --seg_coef : Background loss coefficient (defaults to 1.75).
  • --bootstrapping : Value of the bootstrapping parameter (defaults to 20).
  • --latent : Path to the inverted latent produced by inversion.py.
  • --latent_list : Path to the latent list produced by inversion.py.
  • --rec_path : Path to save the reconstructed input image.
  • --edit_path : Path to save the edited image.
  • --save_path : Path to save the merged reconstructed and edited image.

# single-object edit
CUDA_VISIBLE_DEVICES=0 python edit/main.py \
  --mask_paths "sample/single/mask_1.jpg" \
  --bg_prompt "invert/output/single/prompt/init_image.txt" \
  --bg_negative "invert/output/single/prompt/init_image.txt" \
  --fg_negative "artifacts, blurry, smooth texture, bad quality, distortions, unrealistic, distorted image" \
  --H 512 \
  --W 512 \
  --bootstrapping 20 \
  --latent 'invert/output/single/inversion/init_image.pt' \
  --latent_list 'invert/output/single/latentlist/init_image.pt' \
  --rec_path 'results/single/1_reconstruction.png' \
  --edit_path 'results/single/2_edit.png' \
  --fg_prompts "a red dog collar" \
  --seed 1234 \
  --save_path 'results/single/3_merged.png'

# multi-object edit
CUDA_VISIBLE_DEVICES=0 python edit/main.py \
  --mask_paths "sample/multi/mask_1.png" "sample/multi/mask_2.png" \
  --bg_prompt "invert/output/multi/prompt/init_image.txt" \
  --bg_negative "invert/output/multi/prompt/init_image.txt" \
  --fg_negative "artifacts, blurry, smooth texture, bad quality, distortions, unrealistic, distorted image" "artifacts, blurry, smooth texture, bad quality, distortions, unrealistic, distorted image" \
  --H 512 \
  --W 512 \
  --bootstrapping 20 \
  --latent 'invert/output/multi/inversion/init_image.pt' \
  --latent_list 'invert/output/multi/latentlist/init_image.pt' \
  --rec_path 'results/multi/1_reconstruction.png' \
  --edit_path 'results/multi/2_edit.png' \
  --fg_prompts "a crochet bird" "an origami bird" \
  --num_fgmasks 2 \
  --seed 1234 \
  --save_path 'results/multi/3_merged.png'
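
To edit your own images, each --mask_paths entry is an image marking one object region. A minimal, hypothetical sketch for producing such a mask (the white-foreground-on-black convention and the 512×512 resolution are assumptions; compare against the sample masks in ./lomoe/sample/ to confirm the expected format):

import numpy as np
from PIL import Image

# hypothetical helper: rasterize a rectangular object region into a binary mask
def save_box_mask(out_path, size=(512, 512), box=(150, 200, 350, 400)):
    w, h = size
    mask = np.zeros((h, w), dtype=np.uint8)  # black background
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = 255                 # white = region to edit (assumed convention)
    Image.fromarray(mask).save(out_path)

save_box_mask("mask_1.png")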

Results

Qualitative single- and multi-object editing results are shown on our project page.

Metrics

To compute the classical and neural metrics, use compute_metrics.py in ./benchmark/metrics/{SOE/MOE}. These include the SRC and TGT CLIP scores, BG LPIPS, BG PSNR, BG MSE, BG SSIM, and the Structural Distance. The compute_aesthetic.py script in ./benchmark/metrics/{SOE/MOE} computes the aesthetic metrics, including HPS, IR, and Aesthetic Score; it requires additional dependencies, namely HPSv2 and ImageReward.

NOTE: The compute_metrics.py and compute_aesthetic.py scripts expect a folder containing edits for all images in the dataset. Please modify the code if you want to run them on a smaller subset or on single images.

CUDA_VISIBLE_DEVICES=0 python compute_metrics.py --folder_name PATH_TO_SAVED_EDITS
CUDA_VISIBLE_DEVICES=0 python compute_aesthetic.py --folder_name PATH_TO_SAVED_EDITS
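
For intuition about what the background metrics measure, the sketch below computes a background-restricted MSE and PSNR between a source image and its edit (a minimal illustration, not the benchmark code; it assumes white-foreground masks and 8-bit RGB inputs of matching size):

import numpy as np
from PIL import Image

def bg_mse_psnr(src_path, edit_path, mask_path):
    src = np.asarray(Image.open(src_path).convert("RGB"), dtype=np.float64)
    edit = np.asarray(Image.open(edit_path).convert("RGB"), dtype=np.float64)
    obj = np.asarray(Image.open(mask_path).convert("L")) > 127  # True = edited object
    bg = ~obj                                                   # background pixels only
    mse = float(np.mean((src[bg] - edit[bg]) ** 2))
    psnr = 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float("inf")
    return mse, psnr

print(bg_mse_psnr("sample/single/init_image.jpg",
                  "results/single/2_edit.png",
                  "sample/single/mask_1.jpg"))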

Citation

If you use LoMOE or find this work useful for your research, please use the following BibTeX entry.

@InProceedings{Chakrabarty_2024_ACMMM,
  author    = {Chakrabarty$^*$, Goirik and Chandrasekar$^*$, Aditya and Hebbalaguppe, Ramya and Prathosh, AP},
  title     = {LoMOE: Localized Multi-Object Editing via Multi-Diffusion},
  booktitle = {ACM Multimedia 2024},
  month     = {October},
  year      = {2024}
}