LoMOE

This is the official PyTorch implementation of the ACM MM 2024 paper "LoMOE: Localized Multi-Object Editing via Multi-Diffusion". All published data is available on our project page.


Requirements

This code was tested with python=3.9, pytorch=2.0.1, and torchvision=0.15.2. Please follow the official PyTorch instructions to install the PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.

Create a conda environment with the following dependencies:

conda create -n lomoe python=3.9
conda activate lomoe
conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install accelerate==0.20.3 diffusers==0.12.1 einops==0.7.0 ipython transformers==4.26.1 salesforce-lavis==1.0.2
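
After installation, an optional sanity check (a minimal sketch; the expected versions match the pins above) can confirm that the environment imports cleanly and that PyTorch sees the GPU:

# check_env.py -- optional sanity check for the lomoe environment
import torch
import torchvision
import diffusers
import transformers

print("torch:", torch.__version__)                # expected: 2.0.1
print("torchvision:", torchvision.__version__)    # expected: 0.15.2
print("diffusers:", diffusers.__version__)        # expected: 0.12.1
print("transformers:", transformers.__version__)  # expected: 4.26.1
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))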

Getting Started

Usage

Start by downloading the SOE and MOE datasets from our project page to ./benchmark/data.

To generate the prompt, compute the inverted latent, and store the intermediate latents for an image, first run the inversion script at ./lomoe/invert/inversion.py. Then apply edits with ./lomoe/edit/main.py. A sample image and corresponding masks for single- and multi-object edit operations are provided in ./lomoe/sample/. The two steps can also be chained from a driver script, as in the sketch below.
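
A minimal, hypothetical driver for the single-object sample (illustrative only; it reuses the sample paths and flags documented in the Inversion and Edit sections below, and omits flags that have defaults):

# run_single.py -- hypothetical driver chaining inversion and editing
import subprocess

# step 1: invert the sample image (produces prompt, inverted latent, latent list)
subprocess.run([
    "python", "invert/inversion.py",
    "--input_image", "sample/single/init_image.jpg",
    "--results_folder", "invert/output/single",
], check=True)

# step 2: edit the masked region using the inversion outputs
subprocess.run([
    "python", "edit/main.py",
    "--mask_paths", "sample/single/mask_1.jpg",
    "--bg_prompt", "invert/output/single/prompt/init_image.txt",
    "--bg_negative", "invert/output/single/prompt/init_image.txt",
    "--fg_prompts", "a red dog collar",
    "--fg_negative", "artifacts, blurry, smooth texture, bad quality, "
                     "distortions, unrealistic, distorted image",
    "--latent", "invert/output/single/inversion/init_image.pt",
    "--latent_list", "invert/output/single/latentlist/init_image.pt",
    "--rec_path", "results/single/1_reconstruction.png",
    "--edit_path", "results/single/2_edit.png",
    "--save_path", "results/single/3_merged.png",
], check=True)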

Inversion

The invert/inversion.py script takes the following arguments:

  • --input_image : Path to the input image.
  • --results_folder : Path to store the prompt and the inverted and intermediate latents.

# single-object sample
CUDA_VISIBLE_DEVICES=0 python invert/inversion.py \
        --input_image "sample/single/init_image.jpg" \
        --results_folder "invert/output/single"

# multi-object sample
CUDA_VISIBLE_DEVICES=0 python invert/inversion.py \
        --input_image "sample/multi/init_image.png" \
        --results_folder "invert/output/multi"

Edit

The edit/main.py script takes the following arguments:

  • --mask_paths : Path(s) to the object mask(s).
  • --num_fgmasks : Number of foreground masks (defaults to 1).
  • --bg_prompt : Path to the background prompt (we use the prompt generated by inversion.py).
  • --bg_negative : Path to the background negative prompt (we use the prompt generated by inversion.py).
  • --fg_prompts : Edit prompts corresponding to the masks.
  • --fg_negative : Foreground negative prompt(s). (We use "artifacts, blurry, smooth texture, bad quality, distortions, unrealistic, distorted image".)
  • --W : Output image width.
  • --H : Output image height.
  • --seed : Seed to initialize random number generators (defaults to 0).
  • --sd_version : The Stable Diffusion version to use (use the same as in inversion.py).
  • --steps : Number of diffusion timesteps (use the same as in inversion.py).
  • --ca_coef : Cross-attention preservation loss coefficient (defaults to 1.0).
  • --seg_coef : Background loss coefficient (defaults to 1.75).
  • --bootstrapping : Value of the bootstrapping parameter (defaults to 20).
  • --latent : Path to the inverted latent produced by inversion.py.
  • --latent_list : Path to the latent list produced by inversion.py.
  • --rec_path : Path to save the reconstructed input image.
  • --edit_path : Path to save the edited image.
  • --save_path : Path to save the merged reconstructed and edited image.

# single-object edit
CUDA_VISIBLE_DEVICES=0 python edit/main.py \
  --mask_paths "sample/single/mask_1.jpg" \
  --bg_prompt "invert/output/single/prompt/init_image.txt" \
  --bg_negative "invert/output/single/prompt/init_image.txt" \
  --fg_negative "artifacts, blurry, smooth texture, bad quality, distortions, unrealistic, distorted image" \
  --H 512 \
  --W 512 \
  --bootstrapping 20 \
  --latent 'invert/output/single/inversion/init_image.pt' \
  --latent_list 'invert/output/single/latentlist/init_image.pt' \
  --rec_path 'results/single/1_reconstruction.png' \
  --edit_path 'results/single/2_edit.png' \
  --fg_prompts "a red dog collar" \
  --seed 1234 \
  --save_path 'results/single/3_merged.png'

# multi-object edit
CUDA_VISIBLE_DEVICES=0 python edit/main.py \
  --mask_paths "sample/multi/mask_1.png" "sample/multi/mask_2.png" \
  --bg_prompt "invert/output/multi/prompt/init_image.txt" \
  --bg_negative "invert/output/multi/prompt/init_image.txt" \
  --fg_negative "artifacts, blurry, smooth texture, bad quality, distortions, unrealistic, distorted image" "artifacts, blurry, smooth texture, bad quality, distortions, unrealistic, distorted image" \
  --H 512 \
  --W 512 \
  --bootstrapping 20 \
  --latent 'invert/output/multi/inversion/init_image.pt' \
  --latent_list 'invert/output/multi/latentlist/init_image.pt' \
  --rec_path 'results/multi/1_reconstruction.png' \
  --edit_path 'results/multi/2_edit.png' \
  --fg_prompts "a crochet bird" "an origami bird" \
  --num_fgmasks 2 \
  --seed 1234 \
  --save_path 'results/multi/3_merged.png'
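
To edit your own images, each --mask_paths entry is an image marking one object region. A minimal, hypothetical sketch for producing such a mask (the white-foreground-on-black convention and the 512×512 resolution are assumptions; compare against the sample masks in ./lomoe/sample/ to confirm the expected format):

import numpy as np
from PIL import Image

# hypothetical helper: rasterize a rectangular object region into a binary mask
def save_box_mask(out_path, size=(512, 512), box=(150, 200, 350, 400)):
    w, h = size
    mask = np.zeros((h, w), dtype=np.uint8)  # black background
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = 255                 # white = region to edit (assumed convention)
    Image.fromarray(mask).save(out_path)

save_box_mask("mask_1.png")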

Results

Qualitative single- and multi-object editing results are shown on our project page.

Metrics

To compute the classical and neural metrics, use compute_metrics.py in ./benchmark/metrics/{SOE/MOE}. These include the SRC and TGT CLIP scores, BG LPIPS, BG PSNR, BG MSE, BG SSIM, and the Structural Distance. The compute_aesthetic.py script in ./benchmark/metrics/{SOE/MOE} computes the aesthetic metrics, including HPS, IR, and Aesthetic Score; it requires additional dependencies, namely HPSv2 and ImageReward.

NOTE: The compute_metrics.py and compute_aesthetic.py scripts expect a folder containing edits for all images in the dataset. Please modify the code if you want to run them on a smaller subset or on single images.

CUDA_VISIBLE_DEVICES=0 python compute_metrics.py --folder_name PATH_TO_SAVED_EDITS
CUDA_VISIBLE_DEVICES=0 python compute_aesthetic.py --folder_name PATH_TO_SAVED_EDITS
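
For intuition about what the background metrics measure, the sketch below computes a background-restricted MSE and PSNR between a source image and its edit (a minimal illustration, not the benchmark code; it assumes white-foreground masks and 8-bit RGB inputs of matching size):

import numpy as np
from PIL import Image

def bg_mse_psnr(src_path, edit_path, mask_path):
    src = np.asarray(Image.open(src_path).convert("RGB"), dtype=np.float64)
    edit = np.asarray(Image.open(edit_path).convert("RGB"), dtype=np.float64)
    obj = np.asarray(Image.open(mask_path).convert("L")) > 127  # True = edited object
    bg = ~obj                                                   # background pixels only
    mse = float(np.mean((src[bg] - edit[bg]) ** 2))
    psnr = 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float("inf")
    return mse, psnr

print(bg_mse_psnr("sample/single/init_image.jpg",
                  "results/single/2_edit.png",
                  "sample/single/mask_1.jpg"))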

Citation

If you use LoMOE or find this work useful for your research, please use the following BibTeX entry.

@InProceedings{Chakrabarty_2024_ACMMM,
  author    = {Chakrabarty$^*$, Goirik and Chandrasekar$^*$, Aditya and Hebbalaguppe, Ramya and Prathosh, AP},
  title     = {LoMOE: Localized Multi-Object Editing via Multi-Diffusion},
  booktitle = {ACM Multimedia 2024},
  month     = {October},
  year      = {2024}
}