[Bugfix] Fix ROCm crash in qwen3_next multi-stream events (#36795) #37427
…ct#36795) Signed-off-by: JartX <sagformas@epdcenter.es>
@xyang16 @DarkLight1337 @tjtanaa check it please :)
Code Review
This pull request addresses a crash on ROCm devices in the qwen3_next model. The issue stems from CUDA events being initialized only on CUDA platforms (using is_cuda()), while the auxiliary stream used for parallel execution is created on all CUDA-like platforms, including ROCm. This discrepancy leads to None events being passed to operations that expect valid event objects on ROCm, causing an AttributeError. The fix correctly aligns the event creation logic with the stream creation logic by using is_cuda_alike(), ensuring that events are properly initialized on ROCm.
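The guard mismatch described above can be reproduced without a GPU. The following is a minimal sketch, not the actual vLLM code: FakePlatform, FakeEvent, make_aux_stream, make_events, and run_parallel are hypothetical stand-ins, and only the is_cuda() / is_cuda_alike() distinction is taken from the PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakePlatform:
    """Hypothetical stand-in for a platform object (not vLLM's real class)."""
    name: str

    def is_cuda(self) -> bool:
        # True only on NVIDIA CUDA.
        return self.name == "cuda"

    def is_cuda_alike(self) -> bool:
        # True on CUDA and on CUDA-like platforms such as ROCm.
        return self.name in ("cuda", "rocm")


class FakeEvent:
    """Hypothetical stand-in for a device event exposing record()."""

    def __init__(self) -> None:
        self.recorded = False

    def record(self) -> None:
        self.recorded = True


def make_aux_stream(platform: FakePlatform) -> Optional[str]:
    # The auxiliary stream is created on every CUDA-like platform.
    return "aux-stream" if platform.is_cuda_alike() else None


def make_events(platform: FakePlatform, fixed: bool) -> List[Optional[FakeEvent]]:
    # Before the fix the guard was is_cuda(); after it, is_cuda_alike().
    guard = platform.is_cuda_alike() if fixed else platform.is_cuda()
    return [FakeEvent(), FakeEvent()] if guard else [None, None]


def run_parallel(platform: FakePlatform, fixed: bool) -> str:
    aux_stream = make_aux_stream(platform)
    events = make_events(platform, fixed)
    if aux_stream is None:
        # No aux stream: fall back to sequential execution, events unused.
        return "sequential"
    # With an aux stream present, the events are assumed to be valid.
    events[0].record()
    return "parallel"
```

In this sketch, run_parallel(FakePlatform("rocm"), fixed=False) raises AttributeError because the stream exists while the events are None; with fixed=True both guards agree and the parallel path runs.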
Thanks for the fix! Sorry to break ROCm.
@xyang16 No problem, that's what we're here for :) |
yewentao256 left a comment
LGTM, thanks for the work!
…ct#36795) (vllm-project#37427) Signed-off-by: JartX <sagformas@epdcenter.es> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…ct#36795) (vllm-project#37427) Signed-off-by: JartX <sagformas@epdcenter.es>
…ct#36795) (vllm-project#37427) Signed-off-by: JartX <sagformas@epdcenter.es> Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
PR #36795 introduced maybe_execute_in_parallel for qwen3_next but gated CUDA event creation on is_cuda(), while the aux stream is gated on is_cuda_alike(). On ROCm the aux stream exists but the events are None, causing AttributeError: 'NoneType' object has no attribute 'record'.
Fix: change the event guard from is_cuda() to is_cuda_alike() so that it matches the stream check.
Error: