[ROCm][CI] Disable Async Scheduling For Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy Test #32275

Merged
tjtanaa merged 3 commits into vllm-project:main from ROCm:micah/qwen-mtp-async-scheduling
Jan 14, 2026

Conversation

@micah-wil
Contributor

@micah-wil micah-wil commented Jan 13, 2026

#31998 enabled async scheduling by default with spec decoding. This exposed a bug in the Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy test, which currently runs only on AMD CI. The test hangs: https://buildkite.com/vllm/amd-ci/builds/2766/summary?sid=019bb772-7097-4d58-a3c6-a282068589ed

The hang occurs after evaluating 140 prompts:

Evaluating:  11%|█         | 140/1319 [01:10<01:02, 18.73it/s](EngineCore_DP0 pid=606) INFO 01-13 13:52:55 [shm_broadcast.py:542] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation, weight/kv cache quantization).

Here we disable async scheduling again to unblock CI while we investigate the issue. This should only impact AMD CI.

Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
@mergify mergify bot added the ci/build, qwen (Related to Qwen models), and rocm (Related to AMD ROCm) labels Jan 13, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request disables asynchronous scheduling for the Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy test on AMD CI. This is a temporary workaround to resolve a hang that occurs when async scheduling is used with speculative decoding for this model. The change is targeted and effectively unblocks the CI pipeline. My review identifies one high-severity issue. While the workaround is correct, it highlights a contradiction between the code's behavior and its documentation regarding the automatic disabling of async scheduling with speculative decoding. It's important to address this discrepancy to maintain code and documentation quality.

--speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":1}' \
--trust-remote-code \
--max-model-len 2048 \
--no-async-scheduling \

high

While this flag is a good temporary fix for the CI, its necessity points to a deeper issue. The documentation for SchedulerConfig.async_scheduling in vllm/config/scheduler.py states that it should be automatically disabled when speculative decoding is used. This PR is required because async scheduling is not being automatically disabled, which causes a hang.

This discrepancy indicates that either the documentation is outdated or there is a bug in the logic that should automatically disable this feature. This should be addressed to prevent confusion and future issues. Please consider creating a follow-up issue to either update the documentation or fix the auto-disabling logic.

Contributor Author

"This discrepancy indicates that either the documentation is outdated or there is a bug in the logic that should automatically disable this feature."

The documentation is outdated.

@simon-mo
Collaborator

Please make the flag conditional on the ROCm platform as well. We might run this test on other platforms, and I don't want async scheduling to be silently disabled there.

Signed-off-by: Micah Williamson <micah.williamson@amd.com>
@micah-wil
Contributor Author

@simon-mo Done
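
The request above, to pass `--no-async-scheduling` only on ROCm, could be sketched roughly as follows. This is an illustrative sketch, not the merged change: the `is_rocm` and `build_extra_args` helpers, the `EXTRA_ARGS` variable, and the detection heuristic (checking whether `rocm-smi` is on `PATH`) are all assumptions made for this example. Only the `--no-async-scheduling` flag comes from the diff in this PR.

```shell
# Hypothetical helper: treat the host as ROCm if rocm-smi is on PATH.
# This heuristic is an assumption, not taken from the PR.
is_rocm() {
  command -v rocm-smi >/dev/null 2>&1
}

# Hypothetical helper: print the extra server flags for this platform.
# On ROCm it prints --no-async-scheduling; elsewhere it prints nothing,
# so async scheduling keeps its default behavior on other platforms.
build_extra_args() {
  if is_rocm; then
    printf '%s' "--no-async-scheduling"
  fi
}

EXTRA_ARGS="$(build_extra_args)"
echo "extra server args: ${EXTRA_ARGS:-<none>}"
```

The resulting `EXTRA_ARGS` value would then be appended to the server command alongside the existing flags, so the workaround stays scoped to AMD CI.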

Collaborator

@tjtanaa tjtanaa left a comment

LGTM

@tjtanaa tjtanaa added the ready (ONLY add when PR is ready to merge/full CI is needed) label Jan 14, 2026
@tjtanaa tjtanaa merged commit 6fa6e7e into vllm-project:main Jan 14, 2026
18 of 19 checks passed