[Question] mismatch between bbox and image in RefCOCO #16

Open
WeitaiKang opened this issue Oct 14, 2024 · 1 comment

Comments


WeitaiKang commented Oct 14, 2024

Hi authors,

Thanks for your great work!

However, for the visual grounding evaluation (RefCOCO/+/g), I find that the coordinates of your normalized bboxes do not match the image as processed by LLaVA-1.5.

Specifically, your code normalizes each bbox by the original image size. However, the image then goes through LetterBoxPad and is resized to 336px. Therefore, the normalized bbox coordinates don't match the pixel coordinates of the image that LLaVA actually receives.

Isn't this a problem? Is this the same way LLaVA generates its training data?
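
To make the mismatch concrete, here is a minimal sketch comparing the two normalizations. It assumes the letterbox step pads the shorter side symmetrically to a square (centered, with an integer offset) before the resize. That padding behavior is an assumption for illustration, and both function names are hypothetical, not taken from this repo or from LLaVA. Since resizing a square to 336px scales both axes equally, it does not change normalized coordinates, so only the padding matters here.

```python
def normalize_bbox_original(bbox, width, height):
    """Normalize a pixel bbox (x1, y1, x2, y2) by the original image size."""
    x1, y1, x2, y2 = bbox
    return (x1 / width, y1 / height, x2 / width, y2 / height)


def normalize_bbox_letterboxed(bbox, width, height):
    """Normalize a pixel bbox relative to the square, padded image.

    Assumption: the image is padded to side = max(width, height) with the
    original content centered, so the bbox shifts by the padding offset.
    """
    side = max(width, height)
    dx = (side - width) // 2   # horizontal padding added on the left
    dy = (side - height) // 2  # vertical padding added on the top
    x1, y1, x2, y2 = bbox
    return ((x1 + dx) / side, (y1 + dy) / side,
            (x2 + dx) / side, (y2 + dy) / side)


# Example: a 640x480 image with a centered box.
bbox = (160, 120, 480, 360)
print(normalize_bbox_original(bbox, 640, 480))     # (0.25, 0.25, 0.75, 0.75)
print(normalize_bbox_letterboxed(bbox, 640, 480))  # (0.25, 0.3125, 0.75, 0.6875)
```

For a non-square image, the two results differ along the padded axis (here the y coordinates), which is the mismatch described above. For a square image they coincide.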

@WeitaiKang
Author

According to an issue in the LLaVA codebase, the bboxes in their training data take the LetterBoxPad and the 336px resize into account. Therefore, I think your preparation of the ground-truth bboxes might not be correct. What do you think?
