layout	title	description	show_buttons	repository_url
default	EDITVAL	Benchmarking Text-Guided Image Editing Methods	true	https://github.com/deep-ml-research/editval_code

Overview

EDITVAL is a standardized benchmark for evaluating text-guided image editing methods across diverse edit types, validated through a large-scale human study.

EDITVAL consists of the following distinct components:

A seed dataset D consisting of carefully selected images from MS-COCO. These are the real images that the different editing methods are asked to edit.
An attribute list A, which specifies the dimensions along which edits are made to the images in D.
An evaluation template and procedure for a human study on the edited images.
An automated evaluation procedure that checks the quality of edits using pre-trained vision-language models for a subset of the attributes in A.

The attribute list A for ~100 images from MS-COCO can be downloaded from here. The format of the JSON file is as follows:

{
  "class_name" : {
    "image_id": { # image ids from MS-COCO
      "edit_attribute" : {
        "from" : ["initial state of attribute"],
        "to" : ["target states of attribute", ...]}}}
}
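
As a rough illustration of how this nested layout could be traversed, the Python sketch below flattens it into one record per class, image, attribute, and target state. The `Edit` type, the `iter_edits` function, and the sample values are hypothetical and are not part of the EditVal release; the sketch also assumes that `from` holds at most one initial state.

```python
import json
from typing import Iterator, NamedTuple


class Edit(NamedTuple):
    class_name: str
    image_id: str
    attribute: str
    source: str  # first entry of the "from" list ("" if the list is empty)
    target: str  # one entry of the "to" list


def iter_edits(attributes: dict) -> Iterator[Edit]:
    """Yield one Edit per target state in the nested attribute dictionary."""
    for class_name, images in attributes.items():
        for image_id, edit_attributes in images.items():
            for attribute, states in edit_attributes.items():
                source = states["from"][0] if states["from"] else ""
                for target in states["to"]:
                    yield Edit(class_name, image_id, attribute, source, target)


# Hypothetical sample in the documented format (not real EditVal data).
sample = json.loads("""
{
  "bench": {
    "000000000001": {
      "object_addition": {"from": [""], "to": ["ball", "cup"]},
      "color": {"from": ["brown"], "to": ["red"]}
    }
  }
}
""")

edits = list(iter_edits(sample))
for edit in edits:
    print(edit)
```

Flattening the file this way gives each edit operation its own record, which is convenient when building prompts or tallying operations per attribute.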

The complete list of edit attributes for evaluation currently is:

Object Addition: adding an object to the image.
Object Replacement: replacing an existing object in the image with another object.
Size: changing the size of an object.
Position Replacement: changing the position of an object in the image (e.g., left, center, right).
Positional Addition: adding an object in a specific position in the image.
Alter Parts: modifying the details of an object.
Background: changing the background of the image.
Texture: changing the texture of an object (e.g., wooden table, polka dot cat).
Color: changing the color of an object.
Shape: changing the shape of an object (e.g., circle-shaped stop sign).
Action: changing the action that the main object is performing (e.g., dog running).
Viewpoint: changing the viewpoint from which the image is taken (e.g., photo of a dog from above).

More Details on EditVal Dataset and Pipeline

The EditVal benchmark contains 648 unique image-edit operations for 19 classes selected from MS-COCO, spanning a variety of real-world edits. Edit operations range from simple attribute categories, such as adding or replacing an object, to more complex ones, such as changing an action, changing the camera viewpoint, or replacing the position of an existing object.

MTurk Human Study

The template to run an MTurk study to evaluate the quality of the image editing methods is provided here.

Together with the template, an input CSV file must be provided for the MTurk study. Each row of the CSV file represents one edit instance and contains these four inputs:

url_org: URL of the original image.
url_edit: URL of the edited image.
prompt: the prompt used to edit the image.
class_name: name of the main object in the image.
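
The four inputs above could be assembled into a CSV file along the following lines. This is a minimal sketch using Python's standard `csv` module: the `write_mturk_csv` helper and the `example.com` URLs are hypothetical, and it assumes the file starts with a header row whose column names match the four input names.

```python
import csv
import io

FIELDNAMES = ["url_org", "url_edit", "prompt", "class_name"]


def write_mturk_csv(rows, handle):
    """Write one CSV row per edit instance, rejecting rows with empty fields."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        missing = [name for name in FIELDNAMES if not row.get(name)]
        if missing:
            raise ValueError(f"row is missing fields: {missing}")
        writer.writerow({name: row[name] for name in FIELDNAMES})


# Placeholder URLs; real rows would point at hosted original and edited images.
rows = [
    {
        "url_org": "https://example.com/original/apple.jpg",
        "url_edit": "https://example.com/edited/apple_to_orange.jpg",
        "prompt": "Change apple to orange",
        "class_name": "apple",
    }
]

buffer = io.StringIO()
write_mturk_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")` as `handle`.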

An example of an input CSV file can be seen here. Below is an example of how the MTurk study looks to the workers.

The right image is supposed to apply the prompt "Change apple to orange" to the left image.

How well is the edit from the given prompt applied?

Not applied Minorly applied Adequately applied Perfectly applied

How well are the other properties (other than what the edit is targeting) of the main object (apple) preserved in the right image?

Object is completely changed Some parts are preserved Most parts are preserved Other properties of the object are perfectly preserved

How well are the other properties (other than what the edit is targeting) of the main object (apple) preserved in the right image?

Completely changed Some parts are preserved Most parts are preserved Perfectly preserved

Leaderboards

The numbers below for the human study are calculated only on the first question of the template, which does not consider the changes to the rest of the image. This has been done in order to keep the results comparable to our automatic evaluation framework. For each instance in the human study, a score of 1.0 is given if the edit is Adequately applied or Perfectly applied, and a score of 0.0 otherwise.
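
The scoring rule described above can be sketched in Python as follows. The sketch assumes each answer to the first question is encoded as an integer from 0 (Not applied) to 3 (Perfectly applied), in the order the options are shown. The per-edit-type mean is also an assumption about how the leaderboard columns are aggregated, and the function names and sample responses are hypothetical.

```python
from collections import defaultdict

# Assumed encoding of the first question's options, in display order:
# 0 = Not applied, 1 = Minorly applied, 2 = Adequately applied, 3 = Perfectly applied
POSITIVE_THRESHOLD = 2


def binarize(rating: int) -> float:
    """Return 1.0 for Adequately or Perfectly applied, 0.0 otherwise."""
    return 1.0 if rating >= POSITIVE_THRESHOLD else 0.0


def mean_score_per_edit_type(responses):
    """Average the binarized scores of (edit_type, rating) pairs per edit type."""
    scores = defaultdict(list)
    for edit_type, rating in responses:
        scores[edit_type].append(binarize(rating))
    return {edit_type: sum(vals) / len(vals) for edit_type, vals in scores.items()}


# Made-up responses for illustration only; these are not study results.
responses = [
    ("Object Addition", 3),
    ("Object Addition", 1),
    ("Size", 2),
]

summary = mean_score_per_edit_type(responses)
print(summary)
```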

Human Study

Method	Object Addition	Object Replacement	Position Replacement	Positional Addition	Size	Alter Parts	Average

Automatic Evaluation

Method	Object Addition	Object Replacement	Position Replacement	Positional Addition	Size	Alter Parts	Average

Contact Us

Contact us at xxx@gmail.com if you wish to add your method to the leaderboards.
