[CI] Generalize gsm8k test args and add Qwen3-Next MTP B200 test #30723
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| model_name: "nm-testing/Qwen3-Next-80B-A3B-Instruct-NVFP4" | ||
| accuracy_threshold: 0.75 | ||
| num_questions: 1319 | ||
| num_fewshot: 5 | ||
| server_args: >- | ||
| --enforce-eager | ||
| --max-model-len 4096 | ||
| --tensor-parallel-size 2 | ||
| --enable-expert-parallel | ||
| --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":1}' | ||
|
Collaborator
I'd propose to use
Member
Author
As I increase the number of speculated tokens, accuracy drops. If I set it to 2, the accuracy is 0, so it isn't a valid test.
Collaborator
Hm, in my opinion the test is valid in this case, but the vLLM behavior is invalid. |
||
| env: | ||
| VLLM_USE_FLASHINFER_MOE_FP4: "1" | ||
@mgoin why did you add `--enforce-eager` here and in other models?

`--enforce-eager` was always enforced by default in the pytest server init; I just removed the default so we always define it in the config. This greatly helps speed up CI time by skipping compilation. We can have separate tests for full testing.

Sorry, I missed that `--enforce-eager` was enabled before.
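
For readers following the thread, here is a minimal sketch of how a config like the one in this diff could be turned into server arguments and checked against its threshold. It is not the PR's actual harness code: the `config` dict is a hand-written stand-in for the parsed YAML, and `build_server_argv` and `passes_threshold` are hypothetical helper names used only for illustration.

```python
import json
import shlex

# Hand-written stand-in for the parsed YAML config from the diff (assumption:
# the folded `>-` scalar arrives as one space-separated string).
config = {
    "model_name": "nm-testing/Qwen3-Next-80B-A3B-Instruct-NVFP4",
    "accuracy_threshold": 0.75,
    "server_args": (
        "--enforce-eager --max-model-len 4096 --tensor-parallel-size 2 "
        "--enable-expert-parallel --speculative-config "
        "'{\"method\":\"qwen3_next_mtp\",\"num_speculative_tokens\":1}'"
    ),
    "env": {"VLLM_USE_FLASHINFER_MOE_FP4": "1"},
}


def build_server_argv(cfg):
    """Hypothetical helper: split the server_args string into an argv list.

    shlex.split honors the single quotes, so the JSON value of
    --speculative-config stays one argument.
    """
    return shlex.split(cfg["server_args"])


def passes_threshold(measured_accuracy, cfg):
    """Hypothetical helper: compare a measured score to the configured floor."""
    return measured_accuracy >= cfg["accuracy_threshold"]


argv = build_server_argv(config)
spec_config = json.loads(argv[-1])  # the quoted JSON survives as valid JSON
```

Because `--enforce-eager` is now part of `server_args` rather than a hidden default, it shows up explicitly in `argv`. A measured accuracy of 0, as reported above for two speculative tokens, would fail `passes_threshold` against the 0.75 floor.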