Feature: Support non-gated activation in cutlass fused MoE nvfp4 #2011
yzh119 merged 2 commits into flashinfer-ai:main from
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
Summary of Changes: This pull request enhances the fused Mixture-of-Experts (MoE) implementation by introducing support for non-gated activation functions.
Walkthrough: Added an activation type parameter (Swiglu/Relu2) to the MoE quantization paths.
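To make the gated/non-gated distinction concrete, here is a minimal pure-Python sketch of a per-expert reference forward pass in the spirit of the test's per-expert `act()` selection. The `ActivationType` enum, the helper names, and the `[gate | up]` ordering of the fc1 output are assumptions for illustration; they are not FlashInfer's API or the test's actual code.

```python
import math
from enum import Enum
from typing import List

Matrix = List[List[float]]


class ActivationType(Enum):
    # Hypothetical stand-in for the activation_type parameter.
    Swiglu = "swiglu"  # gated: consumes 2n fc1 outputs, produces n
    Relu2 = "relu2"    # non-gated: consumes n fc1 outputs, produces n


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def act(h: List[float], activation_type: ActivationType) -> List[float]:
    if activation_type is ActivationType.Swiglu:
        if len(h) % 2:
            raise ValueError("gated activation expects an even-length fc1 output")
        n = len(h) // 2
        gate, up = h[:n], h[n:]  # assumed layout: [gate | up]
        # SiLU(gate) * up, where SiLU(g) = g * sigmoid(g)
        return [g * _sigmoid(g) * u for g, u in zip(gate, up)]
    if activation_type is ActivationType.Relu2:
        # Squared ReLU: max(x, 0) ** 2, no gating.
        return [max(v, 0.0) ** 2 for v in h]
    raise ValueError(f"unsupported activation: {activation_type}")


def matvec(w: Matrix, x: List[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def expert_forward(x: List[float], w1: Matrix, w2: Matrix,
                   activation_type: ActivationType) -> List[float]:
    h = matvec(w1, x)            # fc1: 2n rows if gated, n rows otherwise
    a = act(h, activation_type)  # always n values
    if any(len(row) != len(a) for row in w2):
        raise ValueError("fc2 weight width must equal n")
    return matvec(w2, a)         # fc2
```

With Swiglu, fc1 must emit 2n values because the activation consumes a gate half and an up half; with Relu2 it emits only n. That difference is why a weight-shape check written only for gated activations rejects valid non-gated weights.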
Sequence Diagram
sequenceDiagram
participant Test as Test Suite
participant RefImpl as Reference Impl
participant CutlassBinding as Cutlass Binding
participant QuantPath as Quant Path (NVFP4)
Test->>Test: Parameterize activation_type<br/>(Swiglu / Relu2)
Test->>RefImpl: Call torch_moe_nvfp4<br/>with activation_type
RefImpl->>RefImpl: Select per-expert act():<br/>Swiglu or Relu2
RefImpl->>RefImpl: Compute MoE output
Test->>CutlassBinding: Call fused_moe<br/>with activation_type
CutlassBinding->>CutlassBinding: getQuantParams()<br/>receives base_activation_type
CutlassBinding->>QuantPath: Route to conditional<br/>weight validation
alt isGatedActivation(base_activation_type)
QuantPath->>QuantPath: Gated constraint:<br/>fc1_weight_block.size(2)<br/>aligned to 2n
else Non-gated
QuantPath->>QuantPath: Non-gated constraint:<br/>fc1_weight_block.size(2)<br/>aligned to n
end
QuantPath-->>CutlassBinding: Return QuantParams
CutlassBinding-->>Test: MoE result
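The `alt`/`else` branch in the diagram can be sketched in Python as follows. The real check lives in C++ in `flashinfer_cutlass_fused_moe_sm100_binding.cu`; the names below are hypothetical, and the sketch simplifies "aligned to 2n / n" to exact equality, so any padding or alignment rule the binding applies is not modeled.

```python
from enum import Enum


class ActivationType(Enum):
    # Hypothetical Python stand-in for the activation enum used by the binding.
    Swiglu = "swiglu"
    Relu2 = "relu2"


# Assumption: Swiglu is the only gated activation in this sketch.
_GATED = frozenset({ActivationType.Swiglu})


def is_gated_activation(activation_type: ActivationType) -> bool:
    """Mirror of the isGatedActivation() branch in the diagram."""
    return activation_type in _GATED


def expected_fc1_extent(inter_size: int, activation_type: ActivationType) -> int:
    # Gated activations need gate and up projections (2n); non-gated need n.
    return (2 if is_gated_activation(activation_type) else 1) * inter_size


def check_fc1_extent(actual: int, inter_size: int,
                     activation_type: ActivationType) -> None:
    # Simplification: "aligned to" is treated as exact equality here.
    expected = expected_fc1_extent(inter_size, activation_type)
    if actual != expected:
        raise ValueError(
            f"fc1 extent {actual} does not match {expected} for {activation_type.name}"
        )
```

Computing a single multiplier instead of repeating the check in two branches is one possible way to address the duplicated code the Gemini review mentions; it is not necessarily the refactor that was actually suggested.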
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Review details: configuration used: CodeRabbit UI; review profile: CHILL; plan: Pro.
Files selected for processing (2)
Code graph analysis (2): tests/moe/test_trtllm_cutlass_fused_moe.py (1); csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu (1)
Ruff (0.14.2): tests/moe/test_trtllm_cutlass_fused_moe.py, line 165: Avoid specifying long messages outside the exception class (TRY003).
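TRY003 is usually resolved by moving the message into a dedicated exception class. The page does not show what line 165 raises; assuming it is something like `raise ValueError(f"Unsupported activation type: {name}")` in the reference activation selection, a sketch of the fix looks like this. `UnsupportedActivationError` and `select_activation` are hypothetical names.

```python
class UnsupportedActivationError(ValueError):
    """Raised when the reference implementation gets an activation it does not support."""

    def __init__(self, activation_type: object) -> None:
        # The message is built inside the exception class, which is what TRY003 asks for.
        super().__init__(f"Unsupported activation type: {activation_type}")
        self.activation_type = activation_type


def select_activation(name: str) -> str:
    # Hypothetical dispatch; the test's actual code at line 165 may differ.
    # Before (flagged by TRY003):
    #     raise ValueError(f"Unsupported activation type: {name}")
    supported = {"swiglu", "relu2"}
    if name not in supported:
        raise UnsupportedActivationError(name)
    return name
```

Subclassing `ValueError` keeps any existing `except ValueError` handlers working.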
Force-pushed from fbf29fe to e19f167.
Code Review
This pull request enables non-gated activations like ReLU2 for nvfp4 in the CUTLASS fused MoE kernels. This is achieved by adjusting the weight shape checks based on the activation type. The changes are accompanied by updates to the test suite, including parameterizing an existing test for different activation functions and adding a new test for the relu2 path.
My review focuses on improving code maintainability and test correctness. I've suggested refactoring a piece of duplicated code in the C++ bindings to make it more concise. More importantly, I've pointed out that the new test file lacks assertions to verify the correctness of the computation, which is a critical omission for a test.
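The missing piece the review points to is a comparison between the kernel output and the reference. A minimal sketch in plain Python, using the same rule `torch.testing.assert_close` applies (|actual - expected| <= atol + rtol * |expected|). `assert_allclose` is a hypothetical helper; in the real PyTorch-based test one would more likely call `torch.testing.assert_close` directly, with tolerances chosen for NVFP4 precision rather than the illustrative values below.

```python
import math
from typing import Sequence


def assert_allclose(actual: Sequence[float], expected: Sequence[float],
                    rtol: float, atol: float) -> None:
    """Fail if any |actual - expected| exceeds atol + rtol * |expected|."""
    if len(actual) != len(expected):
        raise AssertionError(f"length mismatch: {len(actual)} vs {len(expected)}")
    for i, (a, e) in enumerate(zip(actual, expected)):
        # Reject NaN/inf outright so a broken kernel cannot slip through.
        if not math.isfinite(a):
            raise AssertionError(f"non-finite output at index {i}: {a}")
        if abs(a - e) > atol + rtol * abs(e):
            raise AssertionError(f"mismatch at index {i}: got {a}, expected {e}")
```

Usage in a test would look like `assert_allclose(kernel_out, reference_out, rtol=..., atol=...)`, called once per activation type being parameterized.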
/bot run
[SUCCESS] Pipeline #37646195: 13/17 passed
…shinfer-ai#2011)

📌 Description
This PR removes an assertion in the cutlass fused moe bindings to enable non-gated activations in nvfp4.
It also adds a test for this path with relu2 activation.

🔍 Related Issues
N/A

🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks
[v] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
[v] I have installed the hooks with `pre-commit install`.
[v] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.
If you are unsure about how to set up `pre-commit`, see the pre-commit documentation (https://pre-commit.com/).

🧪 Tests
[v] Tests have been added or updated as needed.
[v] All tests are passing (`unittest`, etc.).

Reviewer Notes
N/A

Summary by CodeRabbit
New Features
- Enhanced quantized Mixture of Experts models to support configurable activation types (Swiglu and ReLU2) in the NVFP4 quantization path.
- Improved parameter handling to correctly adapt weight shapes and quantization settings based on the selected activation type.

Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>