[debugging CI Do Not Merge] #3070
Closed
12 commits
1c88890 B12x MoE (bkryu)
69ff093 Workspace allocation speedup with functional API (bkryu)
aba8593 Address review comments (bkryu)
6d738bf Second round of review comments (bkryu)
cc05a8a Undo unnecessary deletion of comments (bkryu)
7a4c90e Guard CUDA 13 (bkryu)
ab91095 Rename blackwell_gefore to blackwell_sm12x (bkryu)
fd3c7c6 Drop x_bf16 and unify to x (bkryu)
ee71590 Address comments (bkryu)
68ad822 CUDA 12 SM100/103 fix (bkryu)
c191ca5 Revert "CUDA 12 SM100/103 fix" (bkryu)
f75ea36 Testing (bkryu)
**Reject EP-style configs on SM120/SM121.**

This path now runs the SM12x backend, but it still accepts `local_num_experts != num_experts` and a non-zero `local_expert_offset`. SM120/SM121 does not support local-expert remapping, so those arguments can produce unsupported benchmark cases or route into experts that are not present in the locally-created weight tensors.

Suggested guard:

```diff
 use_functional = getattr(args, "use_functional_api", False)
 # SM120 passes bf16 as x (kernel fuses quantization); SM100 passes FP4.
 sm_major_bm = torch.cuda.get_device_capability(device)[0]
+if sm_major_bm == 12 and (
+    local_num_experts != num_experts or local_expert_offset != 0
+):
+    raise ValueError(
+        "cute_dsl_fp4_block_scale_moe on SM120/SM121 does not support "
+        "local expert sharding; use local_num_experts=num_experts and "
+        "local_expert_offset=0."
+    )
 x_input = tensors["x_bf16"] if sm_major_bm == 12 else tensors["x"]
```

Based on learnings: Expert Parallelism (EP) is unsupported on SM120, and the SM120 dispatch paths intentionally do not forward `local_expert_offset` because kernel-side remapping is missing.
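For illustration, the validation logic in the suggested guard can be sketched as a standalone helper. This is a hypothetical sketch, not code from the PR: the function name `validate_moe_config` is invented, and it takes the SM major version as a plain argument so no CUDA device or torch import is needed.

```python
def validate_moe_config(sm_major: int, num_experts: int,
                        local_num_experts: int, local_expert_offset: int) -> None:
    """Reject expert-parallel (EP) sharding on SM12x, which lacks
    kernel-side local-expert remapping."""
    if sm_major == 12 and (
        local_num_experts != num_experts or local_expert_offset != 0
    ):
        raise ValueError(
            "SM120/SM121 does not support local expert sharding; "
            "use local_num_experts=num_experts and local_expert_offset=0."
        )

# Full expert set on SM120 passes; a sharded config is rejected.
validate_moe_config(sm_major=12, num_experts=8,
                    local_num_experts=8, local_expert_offset=0)
try:
    validate_moe_config(sm_major=12, num_experts=8,
                        local_num_experts=4, local_expert_offset=4)
except ValueError as e:
    print("rejected:", e)
```

On SM100 (sm_major 10) the same sharded arguments would pass through unchanged, matching the suggestion's intent of only tightening the SM12x path.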