[CI] Generalize gsm8k test args and add Qwen3-Next MTP B200 test#30723
Conversation
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
There was a problem hiding this comment.
Code Review
This pull request generalizes the gsm8k test arguments by moving them into configuration files and adds a new test for Qwen3-Next. The changes are mostly good, but there's a critical issue in how server arguments are parsed, which will cause tests with complex arguments to fail. I've provided a suggestion to fix this.
| ] | ||
| # Parse server arguments from config | ||
| server_args_str = eval_config.get("server_args", "") | ||
| server_args = server_args_str.split() if server_args_str else [] |
There was a problem hiding this comment.
Using str.split() is not robust for parsing command-line arguments, especially when arguments themselves contain spaces (e.g., quoted JSON strings). This will cause issues with the new Qwen3-Next-80B-A3B-NVFP4-EP2.yaml config, which contains a JSON string for --speculative-config.
Please use shlex.split() for correct, shell-like parsing. You'll also need to add import shlex at the top of the file.
| server_args = server_args_str.split() if server_args_str else [] | |
| server_args = shlex.split(server_args_str) |
Signed-off-by: mgoin <mgoin64@gmail.com>
|
cc @vadiklyutiy |
| num_questions: 1319 | ||
| num_fewshot: 5 | ||
| max_model_len: 4096 No newline at end of file | ||
| server_args: "--enforce-eager --max-model-len 4096" |
There was a problem hiding this comment.
@mgoin why did you add --enforce-eager here and in another models?
There was a problem hiding this comment.
--enforce-eager was always enforced by default in the pytest server init, I just removed the default so we always define it in the config. This greatly helps speedup the CI time due to skipping compilation. We can have separate tests for full testing
There was a problem hiding this comment.
Sorry, I missed that --enforce-eager was enabled before
| --max-model-len 4096 | ||
| --tensor-parallel-size 2 | ||
| --enable-expert-parallel | ||
| --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":1}' |
There was a problem hiding this comment.
I'd propose to use "num_speculative_tokens" equal to 2 or 3
There was a problem hiding this comment.
As I increase the number of speculated tokens, accuracy drops. If I set it to 2 the accuracy is 0 so it isn't a valid test
## Qwen3-Next-80B-A3B-Instruct-NVFP4
vllm serve nm-testing/Qwen3-Next-80B-A3B-Instruct-NVFP4
Accuracy: 0.861
Total latency: 115.691 s
## Qwen3-Next-80B-A3B-Instruct-NVFP4 w/ MTP tokens=1
vllm serve nm-testing/Qwen3-Next-80B-A3B-Instruct-NVFP4 --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":1}'
Accuracy: 0.779
Total latency: 135.127 s
## Qwen3-Next-80B-A3B-Instruct-NVFP4 w/ MTP tokens=2
vllm serve nm-testing/Qwen3-Next-80B-A3B-Instruct-NVFP4 --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
Accuracy: 0.002
Total latency: 60.576 s
There was a problem hiding this comment.
hm, In my opinion in this case, the test is valid but vllm behavior is invalid.
…m-project#30723) Signed-off-by: mgoin <mgoin64@gmail.com>
…m-project#30723) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Ubuntu <mjtaheri68@gmail.com>
…m-project#30723) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
…m-project#30723) Signed-off-by: mgoin <mgoin64@gmail.com>
Purpose
Expands the gsm8k test so we can save any server args into the config to go beyond just TP, and adds a Qwen3-Next MTP B200 with EP=2
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.