fix(rocm): Enable non-gated MoE (is_act_and_mul=False) support on ROCm #32244
tjtanaa merged 1 commit into vllm-project:main
Code Review
This pull request correctly enables non-gated MoE support on ROCm. The changes are well-targeted and logical. You've correctly updated the platform check from is_cuda() to is_cuda_alike() to include ROCm. Additionally, you've properly disabled the AITER kernel for non-gated MoE, as it only supports gated activations, allowing the system to fall back to the Triton implementation which handles this case. The changes appear correct and align with the stated purpose. I have no further comments.
This pull request has merge conflicts that must be resolved before it can be merged.
Hi @rabi, the pre-commit checks have failed. Please run:
    uv pip install pre-commit
    pre-commit install
    pre-commit run --all-files
Then, commit the changes and push to your branch.
Models like NemotronH use non-gated MoE with activations like relu2_no_mul. Previously, this was blocked on ROCm because the platform check only allowed CUDA.
- Updates platform check from is_cuda() to is_cuda_alike() to allow ROCm
- Disables AITER kernel for non-gated MoE since AITER only supports gated activations (silu/gelu)
- Falls back to Triton implementation which properly handles non-gated activations via apply_moe_activation()
Signed-off-by: rabi <ramishra@redhat.com>
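The selection logic described in this commit message can be sketched roughly as follows. This is an illustration only, not the actual vLLM code: the Platform class, select_moe_backend(), and the aiter_enabled flag are hypothetical stand-ins, and only the method names is_cuda() and is_cuda_alike() and the flag is_act_and_mul come from the PR text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical stand-in for a platform object; not vLLM's own class."""
    name: str  # "cuda", "rocm", or "cpu"

    def is_cuda(self) -> bool:
        return self.name == "cuda"

    def is_cuda_alike(self) -> bool:
        # Assumption for this sketch: "CUDA-alike" covers CUDA and ROCm.
        return self.name in ("cuda", "rocm")


def select_moe_backend(platform: Platform, is_act_and_mul: bool,
                       aiter_enabled: bool) -> str:
    """Pick a fused-MoE backend name (hypothetical helper)."""
    # Before the change the check was platform.is_cuda(), which rejected ROCm.
    if not is_act_and_mul and not platform.is_cuda_alike():
        raise NotImplementedError(
            "non-gated MoE is only supported on CUDA-alike platforms")

    # AITER is only used for gated activations, so it is skipped
    # whenever is_act_and_mul is False.
    if platform.name == "rocm" and aiter_enabled and is_act_and_mul:
        return "aiter"

    # Everything else falls back to the Triton path.
    return "triton"
```

With this sketch, a non-gated model on ROCm selects "triton" even when AITER is enabled, while a gated model on the same platform still selects "aiter".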
Purpose
Models like NemotronH use non-gated MoE with activations like relu2_no_mul. Previously, this was blocked on ROCm because the platform check only allowed CUDA.
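For readers unfamiliar with the distinction, here is a minimal pure-Python sketch of a gated activation next to a non-gated one. It assumes that relu2 denotes squared ReLU and that the _no_mul suffix means no gating multiply is applied; the function bodies are illustrative and are not the vLLM kernels.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def silu_and_mul(row: list[float]) -> list[float]:
    """Gated activation: the first half is the gate, the second half is the
    up projection. The output has half the input width."""
    d = len(row) // 2
    return [silu(g) * u for g, u in zip(row[:d], row[d:])]


def relu2_no_mul(row: list[float]) -> list[float]:
    """Non-gated activation (assumed to be squared ReLU): applied
    elementwise, so the output keeps the input width."""
    return [max(v, 0.0) ** 2 for v in row]
```

The practical consequence is the shape difference: a gated kernel expects an input of width 2d and produces d outputs, while the non-gated path maps d inputs to d outputs, which is why a kernel written only for gated activations cannot be reused unchanged.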
Test Plan
Tested on AMD MI210 GPU with NemotronH model.
Test Result
Model loads and serves successfully.