[ROCm][CI] Disable Async Scheduling For Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy Test #32275
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Code Review
This pull request disables asynchronous scheduling for the Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy test on AMD CI. This is a temporary workaround to resolve a hang that occurs when async scheduling is used with speculative decoding for this model. The change is targeted and effectively unblocks the CI pipeline. My review identifies one high-severity issue. While the workaround is correct, it highlights a contradiction between the code's behavior and its documentation regarding the automatic disabling of async scheduling with speculative decoding. It's important to address this discrepancy to maintain code and documentation quality.
    --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":1}' \
    --trust-remote-code \
    --max-model-len 2048 \
    --no-async-scheduling \
While this flag is a good temporary fix for the CI, its necessity points to a deeper issue. The documentation for SchedulerConfig.async_scheduling in vllm/config/scheduler.py states that it should be automatically disabled when speculative decoding is used. This PR is required because async scheduling is not being automatically disabled, which causes a hang.
This discrepancy indicates that either the documentation is outdated or there is a bug in the logic that should automatically disable this feature. This should be addressed to prevent confusion and future issues. Please consider creating a follow-up issue to either update the documentation or fix the auto-disabling logic.
> This discrepancy indicates that either the documentation is outdated or there is a bug in the logic that should automatically disable this feature.

The documentation is outdated.
Please make the flag conditional on the ROCm platform as well. We might run this test on other platforms, and I don't want async scheduling to be silently disabled there.
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
@simon-mo Done.
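The platform gating requested above could be sketched roughly as follows. This is not the change that was actually committed. It is a minimal illustration that assumes a ROCm host can be recognized by `rocm-smi` being on `PATH`. The helper names `detect_platform` and `async_sched_flag` and the variable `ASYNC_SCHED_FLAG` are hypothetical; only `--no-async-scheduling` comes from the diff.

```shell
# Sketch only: gate the workaround flag on the platform.
# Assumption: a ROCm host is identified by `rocm-smi` being on PATH.
detect_platform() {
  if command -v rocm-smi >/dev/null 2>&1; then
    echo "rocm"
  else
    echo "other"
  fi
}

# Print the extra scheduler flag for the given platform name.
# Prints nothing for non-ROCm platforms, so async scheduling keeps its default there.
async_sched_flag() {
  if [ "$1" = "rocm" ]; then
    echo "--no-async-scheduling"
  fi
}

# Hypothetical variable the launch command would append to its argument list.
ASYNC_SCHED_FLAG="$(async_sched_flag "$(detect_platform)")"
echo "extra flag: ${ASYNC_SCHED_FLAG:-<none>}"
```

In a launch script, `${ASYNC_SCHED_FLAG}` would be appended unquoted to the server arguments so that an empty value contributes no argument on non-ROCm platforms.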
…TP Async EPLB Accuracy Test (vllm-project#32275) Signed-off-by: Micah Williamson <micah.williamson@amd.com>
…TP Async EPLB Accuracy Test (vllm-project#32275) Signed-off-by: Micah Williamson <micah.williamson@amd.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
#31998 enabled async scheduling by default with speculative decoding. This exposed a bug in the Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy test, which currently runs only on AMD CI.
The test hangs after evaluating 140 prompts: https://buildkite.com/vllm/amd-ci/builds/2766/summary?sid=019bb772-7097-4d58-a3c6-a282068589ed
Here we disable async scheduling again to unblock CI while we investigate the issue. This should only impact AMD CI.