[Misc][ROCm] Restrict Aiter moe to specific models. #16435
Changes from all commits: 7e5a148, bac735b, 796b644, da29428, 226e697
```diff
@@ -468,7 +468,9 @@ def load_model(self, vllm_config: VllmConfig) -> nn.Module:
                 "Following weights were not initialized from "
                 f"checkpoint: {weights_not_loaded}")

-        _process_weights_after_loading(model, model_config, target_device)
+        with set_current_vllm_config(vllm_config):
+            _process_weights_after_loading(model, model_config,
+                                           target_device)

         return model.eval()
```
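For context, here is a minimal sketch of the mechanism the diff leans on, assuming the `set_current_vllm_config` / `get_current_vllm_config` pair from `vllm.config`: the context manager installs the config as ambient state, and helpers called underneath it (such as `is_rocm_aiter_moe_enabled`) can read it back. The helper body below is illustrative only, not the PR's actual implementation, and the architecture check is a hypothetical example of "restricting to specific models".

```python
from vllm.config import (VllmConfig, get_current_vllm_config,
                         set_current_vllm_config)


def is_rocm_aiter_moe_enabled_sketch() -> bool:
    # Reads the ambient config. If no set_current_vllm_config() context is
    # active, vLLM logs a warning and falls back to a default VllmConfig;
    # that fallback is the warning spam discussed further down this thread.
    model_config = get_current_vllm_config().model_config
    # Hypothetical gate: enable AITER MoE only for known-good architectures.
    return model_config is not None and "MixtralForCausalLM" in (
        getattr(model_config.hf_config, "architectures", None) or [])


with set_current_vllm_config(VllmConfig()):
    # Inside the block the helper can see the config; outside, it would warn.
    enabled = is_rocm_aiter_moe_enabled_sketch()
```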
**Contributor:** I don't fully appreciate the implications of setting the config around `_process_weights_after_loading`.

**Contributor (Author):** We definitely have other options. Both require changing the interface. If you are concerned that exposing the current vllm config during the execution of `_process_weights_after_loading` …
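To make the trade-off concrete, here is a hedged sketch of the kind of interface change alluded to above. The extra `vllm_config` parameter is hypothetical, not vLLM's actual signature.

```python
from vllm.config import VllmConfig


# Hypothetical variant of the helper: the config is threaded through
# explicitly, so no ambient state is needed, but every caller and callee
# that touches the config has to change its signature.
def _process_weights_after_loading(model, model_config, target_device,
                                   vllm_config: VllmConfig):
    ...
```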
**Contributor:** It looks like `is_rocm_aiter_moe_enabled` is called during the actual forward pass, when (IIUC) it's not valid to call `get_current_vllm_config()`, as it's not set (e.g. in `dispatch_fused_experts_func`). That should be resolved before landing, otherwise users will get spammed with warnings.
**Contributor (Author):** That's why I added a `cache` decorator: it will use the cached value during the actual forward pass.
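A minimal sketch of that caching behavior, assuming the decorator is `functools.cache` (the thread does not show which one the PR uses): the first call runs inside the `set_current_vllm_config` block during weight loading, so the config lookup succeeds once, and later forward-pass calls reuse the memoized result without touching the ambient config.

```python
from functools import cache

from vllm.config import get_current_vllm_config


@cache
def is_rocm_aiter_moe_enabled_cached() -> bool:
    # Evaluated only on the first call; subsequent calls (e.g. from
    # dispatch_fused_experts_func on the hot path) return the cached bool
    # without consulting the ambient config again.
    config = get_current_vllm_config()
    ...  # hypothetical env/model checks on `config` elided
    return False
```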
**Contributor:** I see. But then `is_rocm_aiter_moe_enabled` would not be valid when instantiating a second `LLM`.
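The failure mode is easy to reproduce with `functools.cache` alone; the dict below is just a stand-in for the ambient vllm config, not vLLM code.

```python
from functools import cache

_ambient = {"use_aiter_moe": True}  # stand-in for the current vllm config


@cache
def enabled() -> bool:
    return _ambient["use_aiter_moe"]


print(enabled())                   # True: memoized under the first "LLM"
_ambient["use_aiter_moe"] = False  # a second "LLM" installs a new config
print(enabled())                   # still True: the stale cached value wins
```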
**Contributor (Author):** Oh, yeah. You are right.
**Contributor:** Maybe we should just set the current vllm config during the forward pass in the model runner. @youkaichao WDYT?
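A hedged sketch of that suggestion follows; the class and method names are assumptions for illustration, not vLLM's actual model-runner interface.

```python
from vllm.config import set_current_vllm_config


class ModelRunnerSketch:
    """Hypothetical runner that re-enters the config context every step."""

    def __init__(self, vllm_config, model):
        self.vllm_config = vllm_config
        self.model = model

    def execute_model(self, model_input):
        # Ambient-config readers such as is_rocm_aiter_moe_enabled() stay
        # valid for the whole forward pass, and each LLM instance brings its
        # own config, which avoids the stale-cache problem above.
        with set_current_vllm_config(self.vllm_config):
            return self.model(**model_input)
```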