[ROCm][Bugfix] Fix accuracy issue on fmoe when VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS enabled#31523
Conversation
Signed-off-by: ganyi <ygan@amd.com>
Code Review
This pull request addresses a critical accuracy issue on ROCm for MoE models when VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS is enabled. The fix involves enabling the grouped_topk custom op to ensure correct kernel dispatch. While the approach is correct, the implementation could be more robust. I've suggested an improvement to handle cases where a user might have explicitly disabled this custom op, which would otherwise cause the fix to fail silently.
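The suggested hardening could look something like this minimal sketch. It assumes vLLM's convention that entries in `compilation_config.custom_ops` are op names prefixed with `+` (enable) or `-` (disable); the helper name `ensure_grouped_topk` and the warning text are hypothetical, not vLLM's actual code:

```python
import logging

logger = logging.getLogger(__name__)


def ensure_grouped_topk(custom_ops: list[str]) -> list[str]:
    """Force-enable grouped_topk unless the user explicitly opted out.

    Hypothetical helper: "+op" enables and "-op" disables a custom op,
    mirroring the assumed custom_ops list convention.
    """
    if "-grouped_topk" in custom_ops:
        # User explicitly disabled it; don't silently override, but warn
        # so the accuracy risk is visible instead of failing silently.
        logger.warning(
            "grouped_topk custom op is disabled; AITER fused shared "
            "experts may produce incorrect results."
        )
    elif "+grouped_topk" not in custom_ops:
        custom_ops.append("+grouped_topk")
    return custom_ops
```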
💡 Codex Review
Here are some automated review suggestions for this pull request.
Signed-off-by: ganyi <ygan@amd.com>
@ganyi1996ppo can you check whether, when aiter fused moe is enabled, we are using … This is the reason why I am very reluctant to change everything to custom ops. The default behaviour is now controlled through … @ProExpertProg @MengqingCao @xinyu-intel Can we ask developers who are migrating ops to custom ops to write unit tests for the custom op classes on all platforms in their PRs? Regular unit tests still pass with the wrong code path (…).
@tjtanaa We might still fall into the native path when fused shared experts is not enabled, but what you said is exactly what I thought. We should add …
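A platform-dispatch unit test of the kind being suggested here might be sketched as follows. This is a toy stand-in, not vLLM's actual `CustomOp` machinery; all names are hypothetical. The point is that the test pins the *chosen forward path*, so a regression that silently falls back to the native implementation is caught even when both paths compute the same result:

```python
class GroupedTopK:
    """Toy stand-in for a CustomOp-style class that picks a backend
    implementation at construction time (hypothetical, for illustration)."""

    def __init__(self, use_aiter: bool):
        # Dispatch once, the way a custom op would select its forward impl.
        self._forward = self.forward_hip if use_aiter else self.forward_native

    def forward_native(self, x):
        return ("native", x)

    def forward_hip(self, x):
        return ("aiter", x)

    def __call__(self, x):
        return self._forward(x)


def test_dispatch_path():
    # Assert on the path taken, not just the numerical output.
    assert GroupedTopK(use_aiter=True)(1)[0] == "aiter"
    assert GroupedTopK(use_aiter=False)(1)[0] == "native"
```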
Signed-off-by: ganyi <ygan@amd.com>
@tjtanaa take another look please, I just made …
vllm/platforms/rocm.py
Outdated
```python
compilation_config = vllm_config.compilation_config
parallel_config = vllm_config.parallel_config
is_eager_execution = compilation_config == CUDAGraphMode.NONE
use_aiter = rocm_aiter_ops.is_enabled()
```
Let's call `rocm_aiter_ops.is_fused_moe_enabled()` directly here, as described in the docstring of `_aiter_ops` (https://github.com/vllm-project/vllm/blob/51085c2aebe7df2c53dbe9a44d89bc3ee761793f/vllm/_aiter_ops.py#L772C8-L772C42).
Top-k routing functions are part of the MoE-related ops.
vllm/platforms/rocm.py
Outdated
```python
use_aiter_rms_norm = rocm_aiter_ops.is_rmsnorm_enabled()
use_aiter_fp8_linear = rocm_aiter_ops.is_linear_fp8_enabled()
use_aiter_shared_expert = (
    rocm_aiter_ops.is_fused_moe_enabled()
```
We don't need the `and` because `rocm_aiter_ops.is_fusion_moe_shared_experts_enabled()` already checks for both, as shown at Line 887 in 51085c2.
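The layering being suggested — where `is_fusion_moe_shared_experts_enabled()` already subsumes the fused-MoE check, so call sites need no extra `and` — could be sketched like this. The environment-variable names match the flags discussed in this PR, but the implementations and defaults here are simplified assumptions, not vLLM's actual code:

```python
import os


def _env_flag(name: str) -> bool:
    # Assumed convention: flags are "0"/"1" environment variables.
    return os.environ.get(name, "0") == "1"


def is_fused_moe_enabled() -> bool:
    # Assumed: AITER globally enabled AND the MoE flag enabled.
    return _env_flag("VLLM_ROCM_USE_AITER") and _env_flag("VLLM_ROCM_USE_AITER_MOE")


def is_fusion_moe_shared_experts_enabled() -> bool:
    # Already includes the fused-MoE check, so callers don't need
    # an extra "and is_fused_moe_enabled()".
    return is_fused_moe_enabled() and _env_flag(
        "VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS"
    )
```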
Signed-off-by: ganyi <ygan@amd.com>
…USION_SHARED_EXPERTS` enabled (vllm-project#31523) Signed-off-by: ganyi <ygan@amd.com>
Purpose
This PR fixes the accuracy issue introduced in #31221, which caused all models with an MoE structure to have a serious accuracy regression when shared experts fusion is enabled. This PR adds `+grouped_topk` into `custom_ops` so the dispatch actually happens.
Test Plan
Test deepseek-r1 on gsm8k
Test Result
Current vLLM main branch:
With this fix: