feat: Support Fused MoE non gated Relu2 NVFP4 & FP8 and support Nemotron, fixed #2462
Merged
Commits
31 commits
fda3ec0
Using ActivationType instead of GatedActType, added compiled kernels,…
amitz-nv 9e7d911
Add actType and eltwiseActType to 'no kernel found' message, move is_…
amitz-nv ce704d1
Update remaining GatedActType uses to ActivationType, remove GatedAct…
amitz-nv 954eaeb
Use ActivationType in benchmarks, add missing activation_type argument
amitz-nv 158b8fd
Minor fixes
amitz-nv 3a6b9f4
Fix activation_type default value to Swiglu on trtllm_fp4_block_scale…
amitz-nv 02a2502
Minor improvement
amitz-nv 7b17ca1
Support non-gated activation in NVFP4 block scale MoE
amitz-nv 5d0fa51
Rename useShuffledMatrixA to useShuffledMatrix (remove the 'A' suffix)
amitz-nv b74a7f1
Add FP4_NVFP4_NVFP4 parameterization to test_llama4_routing, update t…
amitz-nv 52e0828
Increase supported topK and num experts in deepseek routing for nemotron
amitz-nv 23348b2
Commit more files for increase supported topK and num experts in deep…
amitz-nv 62d0489
Fix formatting
amitz-nv 6c0409a
Change TODO to comment
amitz-nv ea95fb0
Change default activation_type to Swiglu
amitz-nv b7bbb7f
Restore intermediate size factor of 2 for gated activation in getWork…
amitz-nv 8204fb5
Formatting fixes
amitz-nv 3e0b77c
Treat SwigluBias as gated activation
amitz-nv cbf66c5
Fix use of ActivationType enum in CLI
amitz-nv e370ab2
Fix activation-type command line argument handling in benchmarks
amitz-nv c114994
Fix choices of activation-type command line argument handling in benc…
amitz-nv 1b0b5f7
GEMM (non batched) still has mUseShuffledMatrixA member (with 'A' suf…
amitz-nv bf88c7b
Update bench_trtllm_gen_fused_moe_autotuner.py to support more activa…
amitz-nv f5ac485
Revert activation_Type check in bench_trtllm_gen_fused_moe_autotuner.…
amitz-nv 370579c
Include activation type in results in benchmarks/routings/moe.py
amitz-nv f7c2df5
Remove bad num experts check in csrc/trtllm_fused_moe_routing_deepsee…
amitz-nv 9b69e42
Skip test cases with unnecessary parameterization combinations
amitz-nv b4edcfe
Fix ignoring compatible_activation_types in test when it's not defined
amitz-nv d5887ab
Fix data.mTopK value check in deepseek routing according to the relev…
amitz-nv 67c0ea4
Add topK<=numExperts check to deepseek routing
amitz-nv df1ae03
Minor fix of passing activation_type in test_trtllm_gen_fused_moe.py
amitz-nv
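Several of the commits above turn on the difference between gated activations (Swiglu, SwigluBias) and the non-gated Relu2, including the factor of 2 on the intermediate size for gated activations. The sketch below is an illustration of that distinction only, not code from this PR: the enum is a hypothetical stand-in that reuses the names from the commit messages, and `isGatedActivation`, `fc1OutputWidth`, `relu2`, and `swiglu` are assumed helper names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical enum: only the member names come from the commit messages.
enum class ActivationType { Swiglu, SwigluBias, Relu2 };

// Gated activations combine a "gate" half and an "up" half of the first GEMM output.
inline bool isGatedActivation(ActivationType t) {
  return t == ActivationType::Swiglu || t == ActivationType::SwigluBias;
}

// Assumed helper: a gated activation needs twice the intermediate size out of the
// first GEMM, because both halves are produced before being combined.
inline int64_t fc1OutputWidth(ActivationType t, int64_t intermediateSize) {
  return isGatedActivation(t) ? 2 * intermediateSize : intermediateSize;
}

// Non-gated, element-wise: squared ReLU.
inline float relu2(float x) {
  float const r = x > 0.f ? x : 0.f;
  return r * r;
}

// Gated: SiLU(gate) * up.
inline float swiglu(float gate, float up) {
  return gate / (1.f + std::exp(-gate)) * up;
}
```

Under these assumptions, a Relu2 layer with intermediate size 1024 has a first-GEMM output width of 1024, while a Swiglu layer of the same size has 2048.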
The comment "For simplicity pass set scaleAct to scaleGateC" suggests this might be a temporary solution. While this might work for the current set of activation functions (e.g., if `Relu2` doesn't use `mPtrScaleAct`), it could lead to latent bugs if new element-wise activations are added that require a specific `scaleAct` value different from `scaleGateC`.

To improve clarity and prevent future issues, consider passing `scaleAct` as a separate parameter to the `run` function and setting `mPtrScaleAct` accordingly. If `scaleGateC` is indeed the correct value for all cases, a more detailed comment explaining why would be beneficial.
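One way to read this suggestion is sketched below. It is a minimal illustration, not the PR's actual code: `KernelParams`, `mPtrScaleGate`, and `makeParams` are hypothetical names, and only `mPtrScaleAct`, `scaleAct`, and `scaleGateC` come from the review comment. The idea is that `scaleAct` becomes its own argument, and falling back to `scaleGateC` is an explicit, documented default instead of an implicit aliasing.

```cpp
#include <cassert>

// Hypothetical parameter block; the real struct in the PR has many more members.
struct KernelParams {
  float const* mPtrScaleGate = nullptr;  // assumed member name for the gate scale
  float const* mPtrScaleAct = nullptr;   // member named in the review comment
};

// Assumed helper standing in for the parameter setup done inside run().
// scaleAct is passed separately; nullptr means "reuse scaleGateC", which keeps the
// current behaviour but makes the fallback visible at the call site.
inline KernelParams makeParams(float const* scaleGateC, float const* scaleAct = nullptr) {
  KernelParams params;
  params.mPtrScaleGate = scaleGateC;
  params.mPtrScaleAct = (scaleAct != nullptr) ? scaleAct : scaleGateC;
  return params;
}
```

With this shape, existing callers that pass only `scaleGateC` are unaffected, and a future element-wise activation that needs its own scale can supply `scaleAct` without touching the fallback path.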