[CI] Generalize gsm8k test args and add Qwen3-Next MTP B200 test by mgoin · Pull Request #30723 · vllm-project/vllm

mgoin · 2025-12-15T21:43:57Z

Purpose

Expands the gsm8k test so that any server args can be stored in the config, going beyond just TP, and adds a Qwen3-Next MTP B200 test with EP=2.

Test Plan

Test Result

Signed-off-by: mgoin <mgoin64@gmail.com>

chatgpt-codex-connector · 2025-12-15T21:44:06Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

gemini-code-assist

Code Review

This pull request generalizes the gsm8k test arguments by moving them into configuration files and adds a new test for Qwen3-Next. The changes are mostly good, but there's a critical issue in how server arguments are parsed, which will cause tests with complex arguments to fail. I've provided a suggestion to fix this.

gemini-code-assist · 2025-12-15T21:49:55Z

tests/evals/gsm8k/test_gsm8k_correctness.py

-    ]
+    # Parse server arguments from config
+    server_args_str = eval_config.get("server_args", "")
+    server_args = server_args_str.split() if server_args_str else []


Using str.split() is not robust for parsing command-line arguments, especially when arguments themselves contain spaces (e.g., quoted JSON strings). This will cause issues with the new Qwen3-Next-80B-A3B-NVFP4-EP2.yaml config, which contains a JSON string for --speculative-config.

Please use shlex.split() for correct, shell-like parsing. You'll also need to add import shlex at the top of the file.

Suggested change

-    server_args = server_args_str.split() if server_args_str else []
+    server_args = shlex.split(server_args_str)
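To illustrate the review comment, here is a minimal standalone sketch (not the test file itself) that parses the same argument string both ways. The string is copied from the Qwen3-Next config in this PR; the helper `is_valid_json` is a hypothetical name introduced only for this illustration.

```python
import json
import shlex

# Multi-line value mirroring server_args in the Qwen3-Next config from this PR.
server_args_str = """
  --max-model-len 4096
  --tensor-parallel-size 2
  --enable-expert-parallel
  --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":1}'
"""

# str.split() only splits on whitespace and leaves the shell quotes in place.
naive = server_args_str.split()

# shlex.split() applies shell-like quoting rules and strips the single quotes.
robust = shlex.split(server_args_str)


def is_valid_json(text: str) -> bool:
    """Hypothetical helper: report whether a token parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print("naive :", naive[-1], "->", is_valid_json(naive[-1]))
print("robust:", robust[-1], "->", is_valid_json(robust[-1]))
```

With `str.split()` the last token still carries its surrounding single quotes, so it is not valid JSON when passed on as an argument. If the JSON contained spaces, it would also be broken into several tokens. `shlex.split("")` returns an empty list, so the `if server_args_str else []` guard is not needed.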

Signed-off-by: mgoin <mgoin64@gmail.com>

xinli-sw · 2025-12-16T16:14:26Z

cc @vadiklyutiy

vadiklyutiy · 2025-12-16T16:39:35Z

tests/evals/gsm8k/configs/Llama-3-8B-Instruct-nonuniform-CT.yaml

 num_questions: 1319
 num_fewshot: 5
-max_model_len: 4096
+server_args: "--enforce-eager --max-model-len 4096"


@mgoin why did you add --enforce-eager here and in the other models?

--enforce-eager was always applied by default in the pytest server init. I just removed that default so we always define it explicitly in the config. It greatly speeds up CI because compilation is skipped. We can have separate tests for full testing.

Sorry, I missed that --enforce-eager was enabled before
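As a rough sketch of the behavior described in this thread, the snippet below builds the server argument list purely from a config, so that `--enforce-eager` is present only when the config asks for it. The function name `build_server_args` and the plain dicts standing in for parsed YAML are assumptions made for illustration; the real test code may be organized differently.

```python
import shlex


def build_server_args(eval_config: dict) -> list[str]:
    """Hypothetical helper: take server args only from the config.

    Nothing is appended implicitly, so flags such as --enforce-eager
    must be spelled out in each config's server_args string.
    """
    return shlex.split(eval_config.get("server_args", ""))


# Stand-in for the parsed Llama-3-8B-Instruct-nonuniform-CT.yaml shown above.
llama_config = {
    "num_questions": 1319,
    "num_fewshot": 5,
    "server_args": "--enforce-eager --max-model-len 4096",
}

# Hypothetical config that omits --enforce-eager so compilation is exercised.
compiled_config = {"server_args": "--max-model-len 4096"}

eager_args = build_server_args(llama_config)
compiled_args = build_server_args(compiled_config)

print(eager_args)
print(compiled_args)
```

Keeping the flag in the config rather than in the test harness makes the trade-off visible per model: most configs can skip compilation for faster CI, while a dedicated config can leave the flag out for fuller coverage.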

vadiklyutiy · 2025-12-16T16:40:36Z

tests/evals/gsm8k/configs/Qwen3-Next-80B-A3B-NVFP4-EP2.yaml

+  --max-model-len 4096
+  --tensor-parallel-size 2
+  --enable-expert-parallel
+  --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":1}'


I'd propose setting "num_speculative_tokens" to 2 or 3.

As I increase the number of speculated tokens, accuracy drops. If I set it to 2, the accuracy is effectively 0, so it isn't a valid test.

## Qwen3-Next-80B-A3B-Instruct-NVFP4
vllm serve nm-testing/Qwen3-Next-80B-A3B-Instruct-NVFP4
Accuracy: 0.861
Total latency: 115.691 s

## Qwen3-Next-80B-A3B-Instruct-NVFP4 w/ MTP tokens=1
vllm serve nm-testing/Qwen3-Next-80B-A3B-Instruct-NVFP4 --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":1}'
Accuracy: 0.779
Total latency: 135.127 s

## Qwen3-Next-80B-A3B-Instruct-NVFP4 w/ MTP tokens=2
vllm serve nm-testing/Qwen3-Next-80B-A3B-Instruct-NVFP4 --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
Accuracy: 0.002
Total latency: 60.576 s

Hm, in my opinion the test is valid in this case, but vLLM's behavior is invalid.
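For reference, the numbers reported in this thread can be compared with a few lines of arithmetic. The figures are copied from the runs above. The tolerance value is purely hypothetical and is not taken from the gsm8k test, which may define its pass criterion differently.

```python
# Accuracy and latency copied from the three runs reported in this thread.
runs = {
    "baseline": {"accuracy": 0.861, "latency_s": 115.691},
    "mtp_tokens_1": {"accuracy": 0.779, "latency_s": 135.127},
    "mtp_tokens_2": {"accuracy": 0.002, "latency_s": 60.576},
}

# Hypothetical tolerance, chosen only to illustrate a threshold check.
HYPOTHETICAL_TOLERANCE = 0.1


def accuracy_drop(name: str) -> float:
    """Absolute accuracy lost relative to the baseline run."""
    return runs["baseline"]["accuracy"] - runs[name]["accuracy"]


def latency_ratio(name: str) -> float:
    """Total latency relative to the baseline run (1.0 means equal)."""
    return runs[name]["latency_s"] / runs["baseline"]["latency_s"]


for name in ("mtp_tokens_1", "mtp_tokens_2"):
    within = accuracy_drop(name) <= HYPOTHETICAL_TOLERANCE
    print(
        f"{name}: drop={accuracy_drop(name):.3f}, "
        f"latency x{latency_ratio(name):.2f}, within tolerance={within}"
    )
```

Under this assumed tolerance, the tokens=1 run would pass and the tokens=2 run would fail, which matches both positions in the thread: the config is only usable as a passing CI test with one speculative token, and a test with two tokens would flag the accuracy collapse.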

vadiklyutiy

Looks good to me.

mgoin added 2 commits December 15, 2025 16:41

Generalize gsm8k test args and add Qwen3-Next MTP B200 test

9cbdb5c

Signed-off-by: mgoin <mgoin64@gmail.com>

Update name

fb65d89

Signed-off-by: mgoin <mgoin64@gmail.com>

mgoin requested review from pavanimajety, robertgshaw2-redhat, tlrmchlsmth and yewentao256 as code owners December 15, 2025 21:43

mergify bot added the ci/build and qwen labels Dec 15, 2025

gemini-code-assist bot reviewed Dec 15, 2025

mgoin added the ready and nvidia labels Dec 15, 2025

github-project-automation bot added this to NVIDIA Dec 15, 2025

Fix the config

4524bf6

Signed-off-by: mgoin <mgoin64@gmail.com>

vadiklyutiy reviewed Dec 16, 2025

vadiklyutiy approved these changes Dec 16, 2025

tlrmchlsmth approved these changes Dec 16, 2025

github-project-automation bot moved this to Ready in NVIDIA Dec 16, 2025

tlrmchlsmth merged commit 10ee1c6 into vllm-project:main Dec 16, 2025
55 checks passed

github-project-automation bot moved this from Ready to Done in NVIDIA Dec 16, 2025

divakar-amd mentioned this pull request Dec 17, 2025

[AMD][CI] fix lm eval ci arg #30911

Merged

tlrmchlsmth merged 3 commits into vllm-project:main from neuralmagic:extend-gsm8k-test-add-qwen-next-mtp
