[CI] Generalize gsm8k test args and add Qwen3-Next MTP B200 test#30723

Merged
tlrmchlsmth merged 3 commits into vllm-project:main from neuralmagic:extend-gsm8k-test-add-qwen-next-mtp
Dec 16, 2025

Conversation

@mgoin
Member

@mgoin mgoin commented Dec 15, 2025

Purpose

Expands the gsm8k test so that arbitrary server args can be saved in the config (going beyond just TP), and adds a Qwen3-Next MTP B200 test with EP=2.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: mgoin <mgoin64@gmail.com>

@mergify mergify bot added the ci/build and qwen (Related to Qwen models) labels Dec 15, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request generalizes the gsm8k test arguments by moving them into configuration files and adds a new test for Qwen3-Next. The changes are mostly good, but there's a critical issue in how server arguments are parsed, which will cause tests with complex arguments to fail. I've provided a suggestion to fix this.

# Parse server arguments from config
server_args_str = eval_config.get("server_args", "")
server_args = server_args_str.split() if server_args_str else []
Contributor


critical

Using str.split() is not robust for parsing command-line arguments, especially when arguments themselves contain spaces (e.g., quoted JSON strings). This will cause issues with the new Qwen3-Next-80B-A3B-NVFP4-EP2.yaml config, which contains a JSON string for --speculative-config.

Please use shlex.split() for correct, shell-like parsing. You'll also need to add import shlex at the top of the file.

Suggested change
server_args = server_args_str.split() if server_args_str else []
server_args = shlex.split(server_args_str)
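
To illustrate the failure mode, here is a standalone sketch (not the test code itself; the argument string mirrors the one in the new Qwen3-Next config):

```python
import shlex

# A server_args string like the one in the new config, with a single-quoted JSON value
server_args_str = (
    "--enforce-eager "
    "--speculative-config '{\"method\":\"qwen3_next_mtp\",\"num_speculative_tokens\":1}'"
)

# Naive whitespace split keeps the surrounding single quotes, so the server
# would receive a value that is not valid JSON
naive = server_args_str.split()
print(naive[-1][0])  # ' (the stray quote character)

# shlex.split applies shell-like quoting rules and strips the quotes
parsed = shlex.split(server_args_str)
print(parsed[-1])  # {"method":"qwen3_next_mtp","num_speculative_tokens":1}
```

With spaces inside the JSON (e.g. a pretty-printed value), str.split() would additionally break the value into multiple tokens, while shlex.split() keeps it as a single argument.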

@mgoin mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) and nvidia labels Dec 15, 2025
Signed-off-by: mgoin <mgoin64@gmail.com>
@xinli-sw
Contributor

cc @vadiklyutiy

num_questions: 1319
num_fewshot: 5
max_model_len: 4096
server_args: "--enforce-eager --max-model-len 4096"
Collaborator


@mgoin why did you add --enforce-eager here and in other models?

Member Author


--enforce-eager was always enabled by default in the pytest server init; I just removed the default so we always define it in the config. This greatly speeds up CI by skipping compilation. We can have separate tests for full testing.

Collaborator


Sorry, I missed that --enforce-eager was enabled before

--max-model-len 4096
--tensor-parallel-size 2
--enable-expert-parallel
--speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":1}'
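
Since --speculative-config takes a JSON string, here is a minimal round-trip sketch of how such a value survives shell-style splitting and JSON parsing (illustrative only; this is not vLLM's actual argument-parsing code):

```python
import json
import shlex

# The flag exactly as written in the server args above
arg = "--speculative-config '{\"method\":\"qwen3_next_mtp\",\"num_speculative_tokens\":1}'"

# Shell-style splitting yields the flag and its JSON payload as single tokens
flag, value = shlex.split(arg)
config = json.loads(value)
print(config["method"])                  # qwen3_next_mtp
print(config["num_speculative_tokens"])  # 1
```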
Collaborator


I'd propose using "num_speculative_tokens" equal to 2 or 3.

Member Author


As I increase the number of speculated tokens, accuracy drops. If I set it to 2, the accuracy is 0, so it isn't a valid test:

## Qwen3-Next-80B-A3B-Instruct-NVFP4
vllm serve nm-testing/Qwen3-Next-80B-A3B-Instruct-NVFP4
Accuracy: 0.861
Total latency: 115.691 s

## Qwen3-Next-80B-A3B-Instruct-NVFP4 w/ MTP tokens=1
vllm serve nm-testing/Qwen3-Next-80B-A3B-Instruct-NVFP4 --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":1}'
Accuracy: 0.779
Total latency: 135.127 s

## Qwen3-Next-80B-A3B-Instruct-NVFP4 w/ MTP tokens=2
vllm serve nm-testing/Qwen3-Next-80B-A3B-Instruct-NVFP4 --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
Accuracy: 0.002
Total latency: 60.576 s
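
Putting the three runs above side by side (illustrative arithmetic on the reported numbers only):

```python
# Reported gsm8k results for nm-testing/Qwen3-Next-80B-A3B-Instruct-NVFP4
runs = {
    "baseline": {"accuracy": 0.861, "latency_s": 115.691},
    "mtp_1":    {"accuracy": 0.779, "latency_s": 135.127},
    "mtp_2":    {"accuracy": 0.002, "latency_s": 60.576},
}

base = runs["baseline"]["latency_s"]
for name, r in runs.items():
    # Latency relative to the non-MTP baseline (>1.0 means slower)
    print(name, round(r["latency_s"] / base, 3), r["accuracy"])
```

In this run, tokens=1 is about 17% slower than the baseline as well as less accurate, and tokens=2's accuracy of 0.002 makes its lower latency meaningless for this test.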

Collaborator


Hm, in my opinion the test is valid in this case, but the vLLM behavior is invalid.

Collaborator

@vadiklyutiy vadiklyutiy left a comment


Looks good to me

@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Dec 16, 2025
@tlrmchlsmth tlrmchlsmth merged commit 10ee1c6 into vllm-project:main Dec 16, 2025
55 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Dec 16, 2025
NickLucche pushed a commit to NickLucche/vllm that referenced this pull request Dec 17, 2025
Majid-Taheri pushed a commit to Majid-Taheri/vllm that referenced this pull request Dec 23, 2025
…m-project#30723)

Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Ubuntu <mjtaheri68@gmail.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…m-project#30723)

Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

Labels

ci/build · nvidia · qwen (Related to Qwen models) · ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done


4 participants