
About baseline #2
Open
haohaodw opened this issue Aug 20, 2024 · 2 comments

Comments

@haohaodw

Nice work! I would like to ask a question about LURE. LURE needs to mask objects during inference and then correct them. However, POPE and MME are discriminative tasks that use yes/no answers. How do you test the performance of LURE on these two datasets?

@Hyperwjf
Collaborator

Thanks for your interest! In our experiments, we have observed that the responses from the four LVLMs to POPE questions follow the format "Yes/No, there is/isn't {object} ..." This format allows LURE to mask the object. For instance, the responses of mPLUG-Owl to some POPE questions are listed below:

[Screenshot issue_1: mPLUG-Owl responses to POPE questions]

The responses of LLaVA-1.5 to some POPE questions are listed below:

[Screenshot issue_2: LLaVA-1.5 responses to POPE questions]
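
A minimal sketch of what this masking step could look like on such responses (this is not the repository's actual code; the `MASK_TOKEN` placeholder and the regex-based matching are assumptions for illustration):

```python
import re

# Hypothetical placeholder token used to mask a possibly hallucinated object
# before the revisor rewrites the response (an assumption for illustration).
MASK_TOKEN = "[IDK]"

def mask_object(response: str, obj: str) -> str:
    """Mask every mention of `obj` in a POPE-style response such as
    'Yes, there is a dog in the image.' so the revisor can correct it."""
    pattern = re.compile(rf"\b{re.escape(obj)}\b", flags=re.IGNORECASE)
    return pattern.sub(MASK_TOKEN, response)

if __name__ == "__main__":
    resp = "Yes, there is a dog in the image."
    print(mask_object(resp, "dog"))
    # -> "Yes, there is a [IDK] in the image."
```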

@haohaodw
Author

However, POPE accuracy is computed from yes/no answers. So how do you judge whether LURE's revised response is correct?
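
For reference, POPE accuracy is typically computed by matching the leading yes/no of each answer against the ground-truth label; a minimal sketch of that evaluation is below (the `answer`/`label` field names are assumptions for illustration):

```python
def pope_accuracy(examples):
    """Compute POPE accuracy from the leading yes/no token of each answer.
    `examples` is a list of dicts with (assumed) keys 'answer' and 'label',
    where 'label' is the ground-truth 'yes' or 'no'."""
    correct = 0
    for ex in examples:
        first_word = ex["answer"].strip().lower().split()[0].rstrip(",.")
        pred = "yes" if first_word == "yes" else "no"
        correct += int(pred == ex["label"].lower())
    return correct / len(examples)

if __name__ == "__main__":
    data = [
        {"answer": "Yes, there is a dog in the image.", "label": "yes"},
        {"answer": "No, there isn't a surfboard in the image.", "label": "no"},
    ]
    print(pope_accuracy(data))  # -> 1.0
```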
