[Question] mismatch between bbox and image in RefCOCO #16

WeitaiKang · 2024-10-14T02:34:07Z

Hi authors,

Thanks for your great work!

However, for the Visual Grounding evaluation (RefCOCO/+/g), I find that the coordinates of your normalized bboxes do not match the image as processed by LLaVA1.5.

Specifically, your code normalizes each bbox by the original image size. However, the image then goes through LetterBoxPad and is resized to 336px. Therefore, the normalized bbox coordinates don't match the pixel coordinates of the image that LLaVA actually receives.
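To make the mismatch concrete, here is a minimal sketch of the two normalizations. It assumes the padding is centered and pads the shorter side up to a square before the resize to 336px; I have not verified this against the preprocessing code in either repository. The function names are hypothetical and are not taken from either codebase.

```python
def normalize_bbox_original(bbox, width, height):
    """Normalize (x1, y1, x2, y2) pixel coordinates by the original image size."""
    x1, y1, x2, y2 = bbox
    return (x1 / width, y1 / height, x2 / width, y2 / height)


def normalize_bbox_letterboxed(bbox, width, height):
    """Normalize a bbox relative to the square, padded image.

    Assumption: the image is padded symmetrically to a square whose side is
    max(width, height). The later resize to 336px scales both axes equally,
    so it does not change the normalized coordinates.
    """
    side = max(width, height)
    pad_x = (side - width) / 2   # horizontal padding added on the left
    pad_y = (side - height) / 2  # vertical padding added on the top
    x1, y1, x2, y2 = bbox
    return (
        (x1 + pad_x) / side,
        (y1 + pad_y) / side,
        (x2 + pad_x) / side,
        (y2 + pad_y) / side,
    )


# Example: a 640x480 image with a box covering its central region.
box = (160, 120, 480, 360)
print(normalize_bbox_original(box, 640, 480))
print(normalize_bbox_letterboxed(box, 640, 480))
```

For a non-square image the two results differ along the padded axis, which is the mismatch I am describing. For a square image they are identical.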

Isn't this a problem? Is this the same way LLaVA generates its training data?

WeitaiKang · 2024-10-14T03:01:44Z

According to an issue in the LLaVA codebase, the bboxes in their training data account for the LetterBoxPad and the 336px resize. Therefore, I think your preparation of the ground-truth bboxes might not be correct. What do you think?
