[Bugfix][ROCm] Fixing the skinny gemm dispatch logic from #32831#33366
gshtras merged 3 commits into vllm-project:main
Conversation
…o work Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Code Review
This pull request correctly refactors the skinny GEMM dispatch logic by removing the unconditional enforcement of contiguous inputs and instead adding checks at the call sites. For the FP8 kernel wvSplitKQ, checking only the activation tensor's contiguity is correct, as the kernel can handle non-contiguous weights. However, the unquantized kernels (wvSplitK, wvSplitKrc, LLMM1) still appear to require both inputs to be contiguous. The current changes only check the activation tensor, which could lead to incorrect results if non-contiguous weights are passed. I've added comments to vllm/model_executor/layers/utils.py suggesting weight-contiguity checks for these unquantized kernels to prevent this potential issue.
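The call-site guards described above can be sketched as follows. This is a minimal, hypothetical illustration: the kernel names (wvSplitKQ, wvSplitK, wvSplitKrc, LLMM1) come from the review, but the function names and the `StubTensor` stand-in are illustrative only; the real checks operate on `torch.Tensor.is_contiguous()` inside vLLM's dispatch code.

```python
class StubTensor:
    """Stand-in for torch.Tensor exposing only is_contiguous()."""

    def __init__(self, contiguous: bool):
        self._contiguous = contiguous

    def is_contiguous(self) -> bool:
        return self._contiguous


def can_dispatch_wv_split_kq(x, weight) -> bool:
    # FP8 path: wvSplitKQ can handle non-contiguous weights,
    # so only the activation tensor's layout matters.
    return x.is_contiguous()


def can_dispatch_unquantized(x, weight) -> bool:
    # wvSplitK / wvSplitKrc / LLMM1 appear to need BOTH inputs
    # contiguous, per the review suggestion.
    return x.is_contiguous() and weight.is_contiguous()


act = StubTensor(contiguous=True)
w_noncontig = StubTensor(contiguous=False)
print(can_dispatch_wv_split_kq(act, w_noncontig))   # True
print(can_dispatch_unquantized(act, w_noncontig))   # False
```

When a guard fails, the code would fall back to the regular GEMM path rather than forcing a `.contiguous()` copy, which matches the fallback behavior discussed later in this thread.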
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
@gshtras Excuse me for asking, but could this PR break inference on gfx1x by removing part of the contiguous logic? There have been several instances where removing contiguous logic caused the model, specifically the ViT, to produce nothing but hallucinations. Thank you for your time.
No, the condition to not dispatch these kernels on Radeon is unchanged.
@gshtras many thanks for the answer
@gshtras The tests that were breaking were from …, and I tested them and found no failures. I also tested reverting …. Could you also revert that PR in this one as well?
I also helped test with the command used in PR #32831; the performance looks good. Falling back is also better than just trying to cast to contiguous for this model.
@gshtras can you try rebasing again? I have retried those tests and they still failed. |
…t#32831 (vllm-project#33366) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: Pai <416932041@qq.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Fixing 2 issues from the previous PR