Revert "[Bugfix][MoE] Unpad routed output before shared expert add [Fixes #35949]" (#40794) #40853
base: main
```diff
@@ -550,14 +550,10 @@ def forward(
             hidden_states
         )

-        # Record before `_maybe_pad_hidden_states` pads activations to match
-        # `moe_config.hidden_dim`, e.g. after `align_trtllm_fp4_moe_hidden_dim_for_fi`
-        routed_hidden_dim = hidden_states.shape[-1]
         hidden_states, og_hidden_dim = self._maybe_pad_hidden_states(
             shared_experts_input,
             hidden_states,
         )
-        hidden_dim_was_padded = hidden_states.shape[-1] > routed_hidden_dim

         result = self._forward_entry(
             hidden_states,
@@ -577,8 +573,6 @@ def forward(

         # Extract outputs from result
         shared_output, fused_output = _unpack(result)

-        if hidden_dim_was_padded:
-            fused_output = fused_output[..., :routed_hidden_dim]
         # If combine kernel already reduced fused, reduce shared to match.
         # See note above re: the two all-reduce points.
```
Contributor
Reverting the padding detection logic re-introduces a bug where `fused_output` and `shared_output` have mismatched dimensions when padding is applied (Fixes #35949). Instead of a full revert, these variables should be retained to allow for proper unpadding before the outputs are combined.
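The shape mismatch the reviewer describes can be sketched in isolation. This is a minimal illustration, not vLLM's actual code: it uses NumPy stand-ins for the shared-expert and routed-expert outputs, and the padded width is an assumed value standing in for a kernel-alignment requirement such as the FP4 MoE case.

```python
import numpy as np

# Assumed sizes for illustration only.
routed_hidden_dim = 6   # the model's true hidden size
padded_hidden_dim = 8   # hidden size after kernel alignment padding

hidden_states = np.random.randn(2, routed_hidden_dim)

# Stand-in for the shared-expert path: it operates on the
# unpadded activations, so its output has the true hidden size.
shared_output = hidden_states * 2.0

# Stand-in for the routed (fused MoE) path: it operates on the
# padded activations, so its output keeps the padded hidden size.
pad = padded_hidden_dim - hidden_states.shape[-1]
fused_output = np.pad(hidden_states, ((0, 0), (0, pad))) * 3.0

# Without this slice, fused_output + shared_output would fail with a
# broadcasting error, since the last dimensions are 8 and 6.
hidden_dim_was_padded = fused_output.shape[-1] > routed_hidden_dim
if hidden_dim_was_padded:
    fused_output = fused_output[..., :routed_hidden_dim]

out = fused_output + shared_output
```

This is why the reviewer suggests retaining `routed_hidden_dim` and `hidden_dim_was_padded` rather than reverting them: the unpad step is what makes the final add well-formed whenever padding was applied.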