[Misc][ROCm] Restrict Aiter moe to specific models. #16435
Conversation
SageMoore left a comment:
Looks good. Just one question.
```python
                     f"checkpoint: {weights_not_loaded}")
    ...
    with set_current_vllm_config(vllm_config):
        _process_weights_after_loading(model, model_config, target_device)
```
I don't fully appreciate the implications of setting the config around _process_weights_after_loading. Could you explain a bit why it's necessary?
The current_vllm_config value is not set when _process_weights_after_loading is called. Currently, this value is set only when creating the model class, so we do not have that information when deciding whether to use the AITER MoE kernels.
We definitely have other options:
- Add a model config parameter to the _process_weights_after_loading and is_rocm_aiter_moe_enabled functions. We might have to add more cases here.
- Add a private member holding the model config to the fused_moe layer class and set the value when creating the layer class.
Both require changing the interface. If you are concerned that exposing the current vllm config during the execution of _process_weights_after_loading could cause potential issues, I prefer the second option, since we also call is_rocm_aiter_moe_enabled during model execution.
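For illustration, a rough sketch of the second option with hypothetical names (vLLM's actual fused_moe layer class is more involved):

```python
# Hypothetical sketch only; the class and attribute names are not vLLM's API.
_AITER_MOE_ARCHS = {"MixtralForCausalLM", "DeepseekV2ForCausalLM"}  # illustrative list


class FusedMoELayerSketch:
    def __init__(self, architecture_name: str):
        # Option 2: record the model architecture when the layer is created,
        # i.e. while the vllm config is still available, so later steps
        # (weight processing, kernel dispatch) never need a global lookup.
        self._architecture_name = architecture_name

    def use_rocm_aiter_moe(self) -> bool:
        # Consult the stored value instead of get_current_vllm_config().
        return self._architecture_name in _AITER_MOE_ARCHS
```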
tlrmchlsmth left a comment:
Why restrict the AITER kernels to specific models? Is this a temporary performance hack or do you plan to maintain this list of models long term?
```python
model_cls_name = get_architecture_class_name(
    get_current_vllm_config().model_config)
```
It looks like is_rocm_aiter_moe_enabled is called during the actual forward pass, when (IIUC) it's not valid to call get_current_vllm_config(), as it's not set (e.g. in dispatch_fused_experts_func). That should be resolved before landing, otherwise users will get spammed with warnings.
That's why I added a cache decorator; it will use the cached value during the actual forward pass.
I see. But then is_rocm_aiter_moe_enabled would not be valid when instantiating a second LLM
Oh, yeah. You are right.
Maybe we should just set the current vllm config during the forward pass in the model runner. @youkaichao WDYT?
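For illustration, a rough sketch of that suggestion, assuming vLLM's set_current_vllm_config context manager (the import path and the runner shown here are assumptions, not the actual model runner):

```python
from vllm.config import set_current_vllm_config  # assumed import path


class ModelRunnerSketch:
    """Hypothetical runner fragment, not vLLM's actual model runner."""

    def __init__(self, vllm_config, model):
        self.vllm_config = vllm_config
        self.model = model

    def execute_model(self, *args, **kwargs):
        # Enter the config context for the duration of the forward pass so
        # helpers like get_current_vllm_config() remain valid inside it.
        with set_current_vllm_config(self.vllm_config):
            return self.model(*args, **kwargs)
```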
We plan to maintain this list long term: the AITER assembly MoE kernels are manually tuned per model's characteristics. We would like to enable them only for models that have been fully vetted, to avoid performance pitfalls.
@SageMoore Thanks to @tlrmchlsmth for pointing out that the current way of making this change is not valid when creating a second LLM.
What if we did something similar to FpLinearOp, where we initialize an object that dispatches at init time instead of at forward time? I think we're trying to move all kernels in that direction anyway.
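For illustration, a rough sketch of the init-time dispatch idea; the class and the two kernel functions below are hypothetical stand-ins, not vLLM's API:

```python
from typing import Callable


def _aiter_fused_experts(*args, **kwargs):
    # Placeholder for the ROCm AITER MoE kernel path.
    raise NotImplementedError


def _default_fused_experts(*args, **kwargs):
    # Placeholder for the default fused-experts path.
    raise NotImplementedError


class FusedExpertsOp:
    """Dispatches once at construction time (analogous to the FpLinearOp pattern)."""

    def __init__(self, use_aiter_moe: bool):
        # The decision is made here, while the config is known, instead of on
        # every forward call.
        self._fn: Callable = (_aiter_fused_experts
                              if use_aiter_moe else _default_fused_experts)

    def __call__(self, *args, **kwargs):
        # The forward path just invokes the pre-selected implementation.
        return self._fn(*args, **kwargs)
```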
@charlifu @ProExpertProg @tlrmchlsmth I think the issue addressed by this PR is being resolved in PR #16727. The AITER MoE kernel selection logic has been cleaned up, and it is now determined based on the arguments passed into the …
@tjtanaa Thank you for the information. Will close this.
This PR adds a model restriction to the AITER MoE kernels. On ROCm, only Mixtral and DeepSeek models will use the AITER MoE kernels.
- Add the model check to is_rocm_aiter_moe_enabled().
- Set the vllm_config global value for the _process_weights_after_loading function.
- Use get_current_vllm_config to obtain the model name the first time is_rocm_aiter_moe_enabled is invoked.
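For illustration, a minimal sketch of the check described above, pieced together from the diff fragments in the review threads; the allow-list contents and the import path are assumptions, and the real function would presumably also consult the existing ROCm/AITER environment toggles:

```python
from functools import cache

from vllm.config import get_current_vllm_config  # assumed import path

# Illustrative allow-list; the PR targets Mixtral and DeepSeek models.
_ROCM_AITER_MOE_ARCHS = {"MixtralForCausalLM", "DeepseekV2ForCausalLM"}


@cache
def is_rocm_aiter_moe_enabled() -> bool:
    # get_architecture_class_name is the helper shown in the review diff above;
    # its import is omitted here. The lookup runs once, while the vllm config
    # is set around _process_weights_after_loading, and the cached result is
    # reused by calls made during the forward pass.
    model_cls_name = get_architecture_class_name(
        get_current_vllm_config().model_config)
    return model_cls_name in _ROCM_AITER_MOE_ARCHS
```

Caching the result is what keeps forward-pass calls from needing the config context, and it is also the source of the second-LLM staleness issue discussed in the review thread above.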