
Conversation

@Fxycst1213 commented Mar 17, 2025

Motivation

1. Based on the #3351 branch, I implemented the InternVL model with the language part from Qwen.
2. When evaluating the model on the MMMU dataset, we ran into issues with multi-image inputs: the generated prompt was incomplete. We compared against VLMEvalKit, with the following results.
VLMEvalKit: [screenshot of the generated prompt]
bench_hf: [screenshot of the generated prompt]
Additionally, different models require different prompt construction. For example, InternVL uses "Image-i" labels as separators for multi-image inputs. When evaluating the InternVL model with bench_hf, I found that its prompt differs from the one generated by VLMEvalKit. The correct prompt format should be as follows:
[screenshot of the expected InternVL prompt]
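For illustration, here is a minimal sketch of what that multi-image prompt construction looks like. The helper name and the `<image>` placeholder token are my assumptions for this example, not the actual sglang or VLMEvalKit code:

```python
# Minimal sketch: assemble an InternVL-style multi-image prompt.
# The function name and the "<image>" placeholder token are assumptions
# for illustration, not the actual sglang/VLMEvalKit implementation.
def build_internvl_prompt(question: str, num_images: int) -> str:
    # InternVL distinguishes multiple images by labeling each one
    # with an "Image-i:" prefix before the question text.
    parts = [f"Image-{i}: <image>" for i in range(1, num_images + 1)]
    parts.append(question)
    return "\n".join(parts)

print(build_internvl_prompt("Which option matches the figure?", 2))
# Image-1: <image>
# Image-2: <image>
# Which option matches the figure?
```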
The Qwen2-VL-7B model also scored lower than its test results on VLMEvalKit.

Modifications

Added the InternVL model and tested it. After modifying the MMMU evaluation script, we evaluated InternVL-38B and Qwen2-VL-7B, and updated the results in the README.

InternVL-38B

VLMEvalKit: [screenshot of MMMU results]

bench_hf: [screenshot of MMMU results]

bench_sglang: [screenshot of MMMU results]

Qwen2-VL-7B

VLMEvalKit: [screenshot of MMMU results]

bench_hf: [screenshot of MMMU results]

bench_sglang: [screenshot of MMMU results]


@Fxycst1213 changed the title from "Internvl" to "Internvl and MMMU dataset evaluation" on Mar 17, 2025
@tonylt commented Mar 18, 2025

@Fxycst1213 Well done! Maybe you can also update the Qwen2-VL and InternVL MMMU accuracy results in #4456.

@Fxycst1213 (Author) commented

@zhaochenyang20 I have fixed some bugs. Could you help me test it again?

@zhaochenyang20 (Collaborator) commented

> @zhaochenyang20 I have fixed some bugs. Could you help me test it again?

Reran. Thanks! If you need an immediate rerun, ping me on WeChat.

@github-actions (Contributor) commented

This pull request has been automatically closed due to inactivity. Please feel free to reopen it if needed.
