[Misc] Support MMMU accuracy benchmark by tanruixiang · Pull Request #23034 · vllm-project/vllm

tanruixiang · 2025-08-16T19:16:43Z

Purpose

related #23033

Test Plan

Test Result

(Optional) Documentation Update

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results.
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

github-actions · 2025-08-16T19:16:54Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request adds support for the MMMU accuracy benchmark, including scripts for both HuggingFace and vLLM, along with data and evaluation utilities. My review focuses on two high-severity issues in the data processing logic: a security vulnerability due to the use of eval(), and a correctness bug related to missing image placeholders in prompts for multimodal models. Addressing these will improve the security and correctness of the benchmark.
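The two issues named in the review can be illustrated with a minimal sketch. This is not code from the PR. It assumes the answer options arrive as a string-encoded Python list and that questions reference images with tokens of the form `<image N>`. The function names, the regular expression, and the default `<image>` placeholder are hypothetical; the placeholder a given model expects depends on its chat template.

```python
import ast
import re

# Assumed format of image references inside a question, e.g. "<image 1>".
IMAGE_TOKEN = re.compile(r"<image\s+(\d+)>")


def parse_options(raw: str) -> list[str]:
    """Parse a string-encoded list of answer options without executing code.

    ast.literal_eval accepts only Python literals, so input containing
    function calls or attribute access raises instead of running.
    """
    value = ast.literal_eval(raw)
    if not isinstance(value, list):
        raise ValueError(f"expected a list of options, got {type(value).__name__}")
    return [str(item) for item in value]


def insert_image_placeholders(
    question: str, placeholder: str = "<image>"
) -> tuple[str, list[int]]:
    """Swap each "<image N>" reference for a model-specific placeholder.

    Returns the rewritten prompt and the referenced image indices in the
    order they appear, so the caller can attach the images in that order.
    """
    indices = [int(match.group(1)) for match in IMAGE_TOKEN.finditer(question)]
    prompt = IMAGE_TOKEN.sub(placeholder, question)
    return prompt, indices
```

For example, `parse_options("['A', 'B']")` yields a plain list of strings, while a payload such as `"__import__('os').getcwd()"` is rejected with a `ValueError` rather than executed, which is the behavior that distinguishes `ast.literal_eval` from `eval()`.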

benchmarks/mmmu/data_utils.py

Signed-off-by: tanruixiang <tanruixiang0104@gmail.com>

DarkLight1337 · 2025-10-15T03:06:08Z

Sorry for the delay. To run it in CI, perhaps you can refer to #21810

tanruixiang · 2025-10-17T07:18:49Z

Sorry for the delay. To run it in CI, perhaps you can refer to #21810

Thank you. I'll take care of it in the next few days.

github-actions · 2026-01-17T02:16:49Z

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

mergify · 2026-01-17T02:17:31Z

Hi @tanruixiang, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

mergify bot added the performance (Performance-related issues) label Aug 16, 2025

tanruixiang mentioned this pull request Aug 16, 2025

[RFC]: Support mmmu benchmark #23033

Closed

4 tasks

gemini-code-assist bot reviewed Aug 16, 2025

benchmarks/mmmu/data_utils.py: two review threads, both marked outdated

[Misc] Support MMMU accuracy benchmark

1cf31ec

Signed-off-by: tanruixiang <tanruixiang0104@gmail.com>

tanruixiang force-pushed the support_mmmu branch from 3deef1d to 1cf31ec on August 16, 2025 19:23

DarkLight1337 added this to Multi-modality Core Aug 17, 2025

DarkLight1337 moved this to In Progress in Multi-modality Core Aug 17, 2025

DarkLight1337 removed this from Multi-modality Core Aug 17, 2025

tanruixiang added 3 commits August 17, 2025 21:30

Merge branch 'main' into support_mmmu

a6f3787

chore: tidy code

1f4d70a

Signed-off-by: tanruixiang <tanruixiang0104@gmail.com>

chore: fix security issue

e40435a

Signed-off-by: tanruixiang <tanruixiang0104@gmail.com>

tanruixiang marked this pull request as ready for review August 17, 2025 15:14

Merge branch 'main' into support_mmmu

77cb09c

github-actions bot added the stale (Over 90 days of inactivity) label Jan 17, 2026

github-actions bot added the unstale (Received activity after being labelled stale) label and removed the stale (Over 90 days of inactivity) label Jan 19, 2026

tanruixiang wants to merge 5 commits into vllm-project:main from tanruixiang:support_mmmu

2 participants
