InternVL and MMMU dataset evaluation #4509
Conversation
@Fxycst1213 Well done. Maybe you can update the Qwen2-VL and InternVL MMMU accuracy results in #4456.

@zhaochenyang20 I have fixed some bugs. Could you help me test it again?

Reran. Thanks! If you need an immediate rerun, ping me on WeChat.

This pull request has been automatically closed due to inactivity. Please feel free to reopen it if needed.
Motivation
1. Based on the #3351 branch, I implemented the InternVL model with the language part from Qwen (a rough sketch of the composition follows).
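The sketch below is illustrative only and is not the PR's actual code: with assumed class, attribute, and dimension names, it shows how an InternVL-style model can reuse Qwen2 as its language backbone by projecting vision features into the LLM embedding space and splicing them in at the `<image>` placeholder positions.

```python
# Illustrative sketch only -- not the PR's real implementation. Class,
# attribute, and dimension choices here are assumptions for exposition.
import torch
from torch import nn
from transformers import Qwen2Config, Qwen2ForCausalLM


class InternVLWithQwen2(nn.Module):
    def __init__(self, vision_tower: nn.Module, vision_dim: int, llm_config: Qwen2Config):
        super().__init__()
        self.vision_tower = vision_tower                    # e.g. an InternViT encoder
        self.language_model = Qwen2ForCausalLM(llm_config)  # the Qwen language part
        # Project vision features into the LLM's hidden size.
        self.projector = nn.Linear(vision_dim, llm_config.hidden_size)

    def forward(self, pixel_values: torch.Tensor, input_ids: torch.Tensor, image_token_id: int):
        vision_feats = self.vision_tower(pixel_values)      # (n_img, n_patch, vision_dim)
        vision_embeds = self.projector(vision_feats)

        # Embed the text, then overwrite the <image> placeholder positions
        # with the projected image embeddings (token counts must match).
        inputs_embeds = self.language_model.get_input_embeddings()(input_ids)
        image_mask = input_ids == image_token_id
        inputs_embeds[image_mask] = vision_embeds.reshape(
            -1, vision_embeds.size(-1)
        ).to(inputs_embeds.dtype)

        return self.language_model(inputs_embeds=inputs_embeds)
```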



2. When evaluating the model on the MMMU dataset, we encountered issues with multi-image inputs: the generated prompt was incomplete. We compared against VLMEvalKit, with the results as follows.
VLMEvalKit: (generated prompt screenshot)
bench_hf: (generated prompt screenshot)
Additionally, different models require different prompt modifications. For example, InternVL requires "Image-i" as the separator for multi-image inputs. When evaluating InternVL with bench_hf, I found that its prompt differs from the one generated by VLMEvalKit. The correct prompt format should be as follows:
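The original screenshot of the exact prompt is not preserved; as a hedged reconstruction of the convention described above (the helper name and question text are made up), the separator-based format looks like this:

```python
def build_internvl_prompt(question: str, num_images: int) -> str:
    """Illustrative helper: label each image placeholder with "Image-i:",
    the multi-image separator InternVL expects."""
    image_lines = [f"Image-{i + 1}: <image>" for i in range(num_images)]
    return "\n".join(image_lines) + "\n" + question


# For two images this produces:
#   Image-1: <image>
#   Image-2: <image>
#   Which option matches the figure?
print(build_internvl_prompt("Which option matches the figure?", 2))
```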
The Qwen2-VL-7B model scored lower than the results measured with VLMEvalKit.
Modifications
Added the InternVL model and tested it. After modifying the MMMU evaluation script, we evaluated InternVL-38B and Qwen2-VL-7B, and updated the results in the README.
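As a quick smoke test of the new integration, one could run something like the sketch below. It is written under assumptions: the model path, image file names, and sampling parameters are placeholders, and the single-prompt return shape of `Engine.generate` may differ across sglang versions.

```python
import sglang as sgl

if __name__ == "__main__":
    # Load an InternVL checkpoint through sglang's offline Engine API
    # (model path is a placeholder, not necessarily the one tested in this PR).
    engine = sgl.Engine(model_path="OpenGVLab/InternVL2_5-38B")

    # Multi-image MMMU-style prompt using the "Image-i" separators.
    prompt = "Image-1: <image>\nImage-2: <image>\nWhich option is correct?"
    out = engine.generate(
        prompt=prompt,
        image_data=["img1.png", "img2.png"],  # placeholder image files
        sampling_params={"temperature": 0.0, "max_new_tokens": 64},
    )
    print(out["text"])  # assumes a single-prompt call returns a dict
    engine.shutdown()
```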
InternVL-38B
- VLMEvalKit: (results screenshot)
- bench_hf: (results screenshot)
- bench_sglang: (results screenshot)

Qwen2-VL-7B
- VLMEvalKit: (results screenshot)
- bench_hf: (results screenshot)
- bench_sglang: (results screenshot)
Checklist