
feat: Support Fused MoE non-gated Relu2 NVFP4 & FP8 and support Nemotron, fixed #2462

Merged
yzh119 merged 31 commits into flashinfer-ai:main from amitz-nv:fused-moe-non-gated-fp8
Feb 4, 2026
Conversation

@amitz-nv
Contributor

@amitz-nv amitz-nv commented Feb 2, 2026

📌 Description

NOTE: This is the fixed version of #2304 that was merged and reverted.

  • Replaced the problematic condition in DeepSeek routing that required NumExperts >= MaxSupportedTopExperts with the check topK <= numExperts.
    • DeepSeek R1 works with the new condition (tested with vLLM).
  • Removed irrelevant test cases.
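A minimal sketch of the revised check described above (the function and parameter names are hypothetical; the real condition lives in the CUDA routing launcher):

```python
def validate_deepseek_routing(top_k: int, num_experts: int,
                              max_top_experts: int = 22) -> None:
    # Old check (removed): num_experts >= max_top_experts, which rejected
    # small models even when top_k was well within limits.
    # New check: never request more experts than the model actually has.
    if top_k > num_experts:
        raise ValueError(f"topK ({top_k}) must be <= numExperts ({num_experts})")
    if top_k > max_top_experts:
        raise ValueError(f"topK ({top_k}) exceeds supported max ({max_top_experts})")
```

Under the old condition a model with, say, 8 experts would have been rejected outright; the new condition only constrains the relationship between top-K and the actual expert count.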

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Refactor

    • Replaced old gated-activation API with a unified ActivationType enum (many activation kinds supported).
    • Propagated activation_type across MoE workflows and kernels.
  • New Features

    • Added CLI option --activation-type to select activation kind for MoE benchmarks.
  • Bug Fixes

    • Enforced activation compatibility and validation for FP8/FP4 paths.
  • Tests

    • Updated and expanded tests to cover new activation types and compatibility scenarios.
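The unified enum the summary describes could look roughly like this in Python (member names are taken from the PR discussion; the numeric values and the exact member set are assumptions):

```python
from enum import Enum

class ActivationType(Enum):
    # Gated activations: GEMM1 output is split into a gate half and a value half.
    Swiglu = 0
    Geglu = 1
    # Element-wise (non-gated) activations: applied directly to the GEMM1 output.
    Relu2 = 2      # squared ReLU, i.e. relu(x) ** 2
    Identity = 3

def is_gated_activation(act: ActivationType) -> bool:
    # Gated kinds consume 2x the intermediate size (gate + value).
    return act in (ActivationType.Swiglu, ActivationType.Geglu)
```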

@coderabbitai

coderabbitai bot commented Feb 2, 2026

📝 Walkthrough

Walkthrough

Replaces GatedActType with a broader ActivationType across Python benchmarks/CLI, test utilities, C++ runners/launchers, headers, and CUDA kernels; threads activation_type through benchmark/autotuner/runner/kernel call paths; adds eltwise activation support and extends DeepSeek top-experts/top-K sizing and launch macros.

Changes

Cohort / File(s) — Summary

  • Public API exports (flashinfer/__init__.py, flashinfer/fused_moe/__init__.py)
    Removed GatedActType from and added ActivationType to public imports/exports.
  • Python MoE & benchmarks (benchmarks/bench_trtllm_gen_fused_moe_autotuner.py, benchmarks/routines/flashinfer_benchmark_utils.py, benchmarks/routines/moe.py, flashinfer/fused_moe/core.py)
    Added an activation_type parameter and a --activation-type CLI option (using the new enum_type argparse helper); threaded activation_type through the FP8/FP4 autotuner and benchmark flows; replaced gated_act references with ActivationType propagation.
  • Tests & test utilities (tests/moe/*, tests/moe/utils.py, tests/moe/test_dpsk_fused_moe_fp8.py, tests/moe/test_trtllm_gen_routed_fused_moe.py)
    Replaced gated_act_type with activation_type; added is_gated_activation and NON_GATED_ACTIVATION_SUPPORTED_QUANT_MODES; updated skip_checks, test parametrizations, and expected routing/quant compatibility checks.
  • C++ MoE runner / permute/GEMM (include/flashinfer/trtllm/fused_moe/runner.h, csrc/trtllm_fused_moe_runner.cu, csrc/trtllm_batched_gemm_runner.cu)
    Introduced the ActivationType enum and mapping helpers (activation → gated/eltwise); added mActType and updated constructors/getOptions/signatures to accept ActivationType; adjusted workspace/intermediateSizeFactor and enforced eltwiseActType compatibility in the batched GEMM runner.
  • C++ launchers & kernel launch (csrc/trtllm_fused_moe_kernel_launcher.cu)
    Replaced GatedActType with ActivationType across launcher implementations; updated init/getValidConfigs signatures and members to use ActivationType; unified SwiGlu → Swiglu naming.
  • Routing / DeepSeek kernel macros & sizing (include/flashinfer/trtllm/fused_moe/DevKernel.h, include/flashinfer/trtllm/fused_moe/RoutingKernel.h, csrc/trtllm_fused_moe_routing_deepseek.cu)
    Added a MaxNumTopExperts template parameter and constants; expanded top-K buffer sizing and sentinel handling; updated the LAUNCH_ROUTING_DEEPSEEK macros to accept/pass numTopExperts; adjusted DeepSeek top-K limits and group/top-K checks.
  • BatchedGemm options / eltwise (include/flashinfer/trtllm/batched_gemm/KernelRunner.h)
    Added an EltwiseActType enum and eltwiseActType field to TrtllmGenBatchedGemmRunnerOptions; renamed useShuffledMatrixA → useShuffledMatrix.
  • Misc / core changes (flashinfer/fused_moe/core.py, headers/constructors across the repo)
    Removed the GatedActType enum, migrated constructors/signatures to activation_type, added serialize/isGated helpers, and forwarded activation_type into C++/CUDA backends and tactic-selection flows.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI/Bench
    participant Py as Python MoE Routines
    participant Runner as C++ MoE Runner
    participant Kernel as CUDA Kernel

    CLI->>Py: parse --activation-type (enum)
    CLI->>Py: invoke benchmark/autotuner with activation_type
    Py->>Runner: call MoE entry (activation_type.value, numTopExperts, eltwiseActType)
    Runner->>Runner: map activation_type → gated/eltwise, compute intermediateSizeFactor
    Runner->>Kernel: launch kernel with activation_type and top-K params
    Kernel-->>Runner: return timings/results
    Runner-->>Py: return results (include activation_type.name)
    Py-->>CLI: print/store results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

v0.6.2

Suggested reviewers

  • joker-eph
  • aleozlx
  • cyx-6
  • djmmoss
  • IwakuraRein
  • yzh119

Poem

🐇 I hopped from SwiGlu to ActivationType so spry,
I threaded my value from CLI up to the sky,
Top-K grew wider, kernels learned my name,
A rabbit rewired launchers — what a game! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — docstring coverage is 30.09%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check ✅ Passed — the title accurately describes the main changes: adding non-gated Relu2 activation support in NVFP4/FP8, introducing the ActivationType enum, adding Nemotron support, and indicating this is a fixed version.
  • Description check ✅ Passed — the description covers all main points: element-wise activation support, the ActivationType replacement, Nemotron support, the naming change, and fixes from the previous reverted PR. Pre-commit completed and test status addressed.






@gemini-code-assist

Summary of Changes

Hello @amitz-nv, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request significantly expands the capabilities of fused Mixture-of-Experts (MoE) operations by introducing a more flexible activation function system. It adds support for the Relu2 activation, crucial for certain model architectures, and integrates compatibility for Nemotron models within the DeepSeek routing mechanism. The changes also include important fixes to DeepSeek routing logic and general code cleanup, ensuring broader applicability and robustness of the MoE implementation.

Highlights

  • Activation Type Refactoring: The GatedActType enum has been replaced with a more comprehensive ActivationType enum across both Python and C++ codebases. This new enum supports a wider range of activation functions, including Relu2 and Identity, alongside existing Swiglu and Geglu.
  • Relu2 Activation Support: Introduced support for the Relu2 (squared ReLU) element-wise activation function within fused Mixture-of-Experts (MoE) operations. This is specifically enabled for NVFP4 and FP8PerTensor quantization modes.
  • Nemotron Model Support: DeepSeek routing has been enhanced to support Nemotron models. This includes increasing the maximum supported top_k value from 8 to 22 and adjusting the expert count limits to accommodate models with up to 512 experts.
  • DeepSeek Routing Fixes: A problematic condition (NumExperts >= MaxSupportedTopExperts) that previously affected DeepSeek R1 routing has been removed, improving compatibility, especially with VLLM. Irrelevant test cases related to this condition were also removed.
  • Codebase Modernization: The useShuffledMatrixA parameter has been renamed to useShuffledMatrix for improved clarity and consistency throughout the code.





@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant feature enhancements and fixes. It adds support for the non-gated Relu2 activation in Fused MoE for NVFP4 and FP8, and support for Nemotron models in DeepSeek routing. A key refactoring is the replacement of GatedActType with a more generic ActivationType enum, which has been consistently applied across both Python and C++ codebases. The changes also include important fixes, such as removing a restrictive check on the number of experts and correcting a bug in the routing kernel. The test suite has been updated to cover these new features, ensuring robustness. Overall, the changes are well-implemented and improve the capabilities and correctness of the library.

Comment on lines +226 to +227
// For simplicity pass set scaleAct to scaleGateC
gemmData.mInputBuffers.mPtrScaleAct = scaleGateC;

medium

The comment "For simplicity pass set scaleAct to scaleGateC" suggests this might be a temporary solution. While this might work for the current set of activation functions (e.g., if Relu2 doesn't use mPtrScaleAct), it could lead to latent bugs if new element-wise activations are added that require a specific scaleAct value different from scaleGateC.

To improve clarity and prevent future issues, consider passing scaleAct as a separate parameter to the run function and setting mPtrScaleAct accordingly. If scaleGateC is indeed the correct value for all cases, a more detailed comment explaining why would be beneficial.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
csrc/trtllm_fused_moe_routing_deepseek.cu (2)

196-216: ⚠️ Potential issue | 🟡 Minor

Inconsistent sentinel values used for invalid expert indices.

Line 200 uses MaxSupportedExpertCount - 1 (511) as the sentinel for invalid entries, while Line 215 uses KernelParams::MaxNumExperts - 1 for the same purpose. This inconsistency could cause subtle issues during the final top-K reduction if the sentinel values have different sort ordering relative to valid indices.

Consider using a consistent sentinel value across both code paths.

🔧 Suggested fix for consistency
          } else {
            intermidiateScore[ii] = invalidScoreFloat;
-           intermidiateExpert[ii] = KernelParams::MaxNumExperts - 1;
+           intermidiateExpert[ii] = MaxSupportedExpertCount - 1;
          }

518-541: ⚠️ Potential issue | 🔴 Critical

Add validation to enforce topK ≤ 8 for non-Nemotron expert counts, or extend conditional dispatch to all branches.

The LAUNCH_ROUTING_DEEPSEEK macro instantiates non-Nemotron expert configurations (MaxNumExpertsUnit, Deepseek, KimiK2) with MaxNumTopExperts=8, but the global validation at line 560 only enforces mTopK <= 22 without checking expert count. This creates a buffer overflow risk: if a non-Nemotron model passes mTopK > 8 at runtime, the kernel's topScores[MaxNumTopExperts] and topExperts[MaxNumTopExperts] arrays (line 125-126) would overflow when the kernel attempts to write results at indices beyond 7, while kernel logic like lines 191-196 assumes MaxNumTopExperts >= mTopK.

Either:

  1. Add a runtime check preventing non-Nemotron models from requesting mTopK > 8, or
  2. Extend the conditional topK dispatch (lines 531-537) to other expert branches as well.
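Option 1 can be sketched as a runtime guard; every name here is hypothetical, and the expert-count boundary separating the two compiled branches is purely an assumption for illustration:

```python
def check_deepseek_top_k(num_experts: int, top_k: int) -> None:
    # Sketch: branches compiled with MaxNumTopExperts = 8 must reject
    # top_k > 8 before launch, instead of relying only on the global
    # top_k <= 22 check. NEMOTRON_EXPERT_THRESHOLD is an assumed boundary.
    NEMOTRON_EXPERT_THRESHOLD = 384
    max_top_k = 22 if num_experts > NEMOTRON_EXPERT_THRESHOLD else 8
    if top_k > max_top_k:
        raise ValueError(
            f"top_k={top_k} exceeds MaxNumTopExperts={max_top_k} "
            f"for num_experts={num_experts}"
        )
```

The point is only that the guard must be branch-aware: a single global `top_k <= 22` check cannot protect the `topScores`/`topExperts` arrays in branches compiled with the smaller bound.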
benchmarks/routines/moe.py (1)

1230-1265: ⚠️ Potential issue | 🟡 Minor

Add validation that activation_type is Swiglu for the FP8 block-scale MoE benchmark.

The run_fp8_block_moe function doesn't validate the activation type, unlike the autotuner, which explicitly rejects non-Swiglu activations. To prevent silent errors and maintain consistency with bench_trtllm_gen_fused_moe_autotuner.py, add a check that raises an error if args.activation_type != ActivationType.Swiglu.
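A hedged sketch of the suggested early validation (assuming a Python-level ActivationType enum; the helper name and the minimal enum here are invented for illustration):

```python
from enum import Enum

class ActivationType(Enum):
    Swiglu = 0
    Relu2 = 2

def check_fp8_block_scale_activation(activation_type: ActivationType) -> None:
    # Fail fast at the benchmark entry point, mirroring the autotuner's
    # explicit rejection of non-Swiglu activations for this quant mode,
    # instead of failing deep inside the kernel path.
    if activation_type != ActivationType.Swiglu:
        raise ValueError(
            f"FP8 block-scale MoE supports only Swiglu, "
            f"got {activation_type.name}"
        )
```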

tests/moe/test_trtllm_gen_fused_moe.py (1)

2163-2170: ⚠️ Potential issue | 🟠 Major

Inconsistent activation_type passing: using .value in some places but not others.

At line 2167, args.activation_type.value is passed to moe_args_dequant, but at line 2286 in run_moe_reference_mxint4, args.activation_type is passed directly (without .value). Similarly, lines 2204 and 2235 use .value.

Looking at the moe_args_dequant constructor and run_moe_dequant function, the activation_type is used in a dictionary lookup at line 1956-1960:

activation_type_to_func = {
    ActivationType.Swiglu: F.silu,
    ActivationType.Geglu: F.gelu,
    ActivationType.Relu2: lambda x: F.relu(x) ** 2,
}
activation_func = activation_type_to_func[activation_type]

This expects ActivationType enum values, not integers. Using .value will cause a KeyError since the dict keys are enum members, not integers.
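The failure mode can be demonstrated in isolation. This assumes ActivationType is a plain enum.Enum; note that with IntEnum the integer key would coincidentally hash and compare equal to the member, and the lookup would succeed. The stand-in lambdas replace the torch functions from the real table:

```python
from enum import Enum

class ActivationType(Enum):
    Swiglu = 0
    Relu2 = 2

# Stand-in callables; the real table maps to F.silu / F.gelu / relu(x)**2.
activation_type_to_func = {
    ActivationType.Swiglu: lambda x: x,
    ActivationType.Relu2: lambda x: max(x, 0.0) ** 2,
}

# Enum-member keys work as expected:
assert activation_type_to_func[ActivationType.Relu2](3.0) == 9.0

# Passing .value (the raw int) does not: plain Enum members neither hash
# nor compare equal to their values, so the dict lookup raises KeyError.
try:
    activation_type_to_func[ActivationType.Relu2.value]
except KeyError:
    pass  # 2 is not a dict key; only the enum members are
```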

🐛 Proposed fix: Remove `.value` from activation_type arguments
--- a/tests/moe/test_trtllm_gen_fused_moe.py
+++ b/tests/moe/test_trtllm_gen_fused_moe.py
@@ -2164,7 +2164,7 @@ def run_moe_reference_dsfp8(args):
         gemm2_weights_dequant,
         args.permute_info,
         args.use_routing_scales_on_input,
-        args.activation_type.value,
+        args.activation_type,
     )

     return run_moe_dequant(args_dequant, QuantMode.FP8_BLOCK_SCALE), args_dequant
@@ -2201,7 +2201,7 @@ def run_moe_reference_per_tensor_scale_fp8(args):
         gemm2_weights_dequant,
         args.permute_info,
         args.use_routing_scales_on_input,
-        args.activation_type.value,
+        args.activation_type,
     )

     return run_moe_dequant(args_dequant, QuantMode.FP8_PER_TENSOR), args_dequant
@@ -2232,7 +2232,7 @@ def run_moe_reference_bf16(args):
         gemm2_weights_dequant,
         args.permute_info,
         args.use_routing_scales_on_input,
-        args.activation_type.value,
+        args.activation_type,
     )

     return run_moe_dequant(args_dequant, QuantMode.BF16), args_dequant
🤖 Fix all issues with AI agents
In `@csrc/trtllm_fused_moe_runner.cu`:
- Around line 196-222: The function activationTypeToGatedActType is missing a
case for ActivationType::SwigluBias which causes the default-check to fire at
runtime; update activationTypeToGatedActType to handle
ActivationType::SwigluBias (returning the appropriate gated enum, e.g.,
ActType::SwiGlu) alongside ActivationType::Swiglu so that
isGatedActivation-consistent callers (like getOptions) no longer hit the
FLASHINFER_CHECK default branch.

In `@tests/moe/utils.py`:
- Around line 90-94: Update the three routing configuration dicts named kimi_k2,
DSv3, and DSLite in test_dpsk_fused_moe_fp8.py to include the required key
"compatible_activation_types" (a list of activation names). Add a
compatible_activation_types entry that includes the activation(s) used by the
tests (for example ["gelu", "relu", "swish"] or the specific activation(s) the
test suite iterates over) so pytest.skip is not triggered by an empty default.
🧹 Nitpick comments (5)
csrc/trtllm_batched_gemm_runner.cu (1)

226-227: Clarify the scaleAct assignment.

The comment indicates this is a simplification. Consider adding a more detailed comment explaining when scaleAct should differ from scaleGateC, or whether this is the correct semantic for all element-wise activation types.

include/flashinfer/trtllm/fused_moe/runner.h (1)

171-172: Consider throwing an error for invalid activation types.

The TODO comment indicates this should throw an error. Currently, returning "InvalidActivationType" silently accepts invalid values.

Proposed fix
     default:
-      return "InvalidActivationType";  // TODO throw error
+      FLASHINFER_CHECK(false, "Invalid activation type: ", static_cast<int64_t>(activationType));
+      return "InvalidActivationType";  // Unreachable
csrc/trtllm_fused_moe_kernel_launcher.cu (1)

419-424: Consider exposing activation_type in BF16 MoE API.

The comment "not exposed in api for now" suggests this is intentional. If non-gated activations (like Relu2) should eventually be supported for BF16 MoE, this would need to be parameterized.

tests/moe/utils.py (1)

105-108: Threshold change from >= 512 to > 512.

This change allows 512 experts with intermediate_size > 512 to run, where previously they were skipped. Verify this is intentional and that these configurations are supported.

tests/moe/test_dpsk_fused_moe_fp8.py (1)

590-591: Redundant CUDA availability check.

Per repository conventions, tests assume CUDA is available and do not require torch.cuda.is_available() guards. This check is redundant but not harmful.

Comment on lines +196 to +222
static inline ActType activationTypeToGatedActType(ActivationType actType) {
switch (actType) {
case ActivationType::Swiglu:
return ActType::SwiGlu;
case ActivationType::Geglu:
return ActType::GeGlu;
default:
FLASHINFER_CHECK(false, "Unsupported gated activation type ",
serializeActivationType(actType), " of enum ",
static_cast<int64_t>(actType));
}
return ActType::SwiGlu;
}

static inline EltwiseActType activationTypeToEltwiseActType(ActivationType actType) {
switch (actType) {
case ActivationType::Relu2:
return EltwiseActType::Relu2;
case ActivationType::Identity:
return EltwiseActType::None;
default:
FLASHINFER_CHECK(false, "Unsupported eltwise activation type ",
serializeActivationType(actType), " of enum ",
static_cast<int64_t>(actType));
}
return EltwiseActType::None;
}

⚠️ Potential issue | 🟡 Minor


Add SwigluBias case to activationTypeToGatedActType function.

The isGatedActivation helper correctly identifies SwigluBias as a gated activation, but activationTypeToGatedActType lacks a case for it, causing a runtime check failure when SwigluBias is used in line 234 of the getOptions function.

Proposed fix
 static inline ActType activationTypeToGatedActType(ActivationType actType) {
   switch (actType) {
     case ActivationType::Swiglu:
       return ActType::SwiGlu;
     case ActivationType::Geglu:
       return ActType::GeGlu;
+    case ActivationType::SwigluBias:
+      return ActType::SwiGlu;
     default:
       FLASHINFER_CHECK(false, "Unsupported gated activation type ",
                        serializeActivationType(actType), " of enum ",
                        static_cast<int64_t>(actType));
   }
   return ActType::SwiGlu;
 }

Comment on lines +90 to +94
compatible_activation_types = routing_config.get("compatible_activation_types", [])
if activation_type not in compatible_activation_types:
pytest.skip(
f"Incompatible: activation_type={activation_type} not in compatible_activation_types ({compatible_activation_types})"
)

⚠️ Potential issue | 🔴 Critical


test_dpsk_fused_moe_fp8.py routing configs require compatible_activation_types field

The routing configuration dictionaries in test_dpsk_fused_moe_fp8.py (lines 508–545) are missing the compatible_activation_types field. Without this field, all tests using these configurations will be unexpectedly skipped when compatible_activation_types defaults to an empty list. Update the three routing config definitions (kimi_k2, DSv3, DSLite) to include this required field.

🤖 Prompt for AI Agents
In `@tests/moe/utils.py` around lines 90 - 94: update the three routing
configuration dicts named kimi_k2, DSv3, and DSLite in
test_dpsk_fused_moe_fp8.py to include the required key
"compatible_activation_types" (a list of activation names). Add a
compatible_activation_types entry listing the activation(s) the tests
exercise (for example ["gelu", "relu", "swish"], or the specific
activation(s) the test suite iterates over) so pytest.skip is not
triggered by the empty default.
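As an illustration, here is a hedged sketch of what such a routing config entry and the corresponding skip check might look like. The field names and values are hypothetical, not copied from test_dpsk_fused_moe_fp8.py; only the `compatible_activation_types` key comes from the review comment above.

```python
# Hypothetical routing config entry carrying the missing key. All other
# fields are illustrative placeholders, not the real DSLite values.
DSLITE_ROUTING_CONFIG = {
    "num_experts": 64,
    "top_k": 6,
    # Without this list, a default of [] would make the skip helper
    # skip every test case that uses this config.
    "compatible_activation_types": ["Swiglu", "Relu2"],
}


def is_activation_compatible(routing_config: dict, activation: str) -> bool:
    # Mirrors the skip condition described above: a test runs only when
    # the requested activation appears in the config's compatibility list.
    return activation in routing_config.get("compatible_activation_types", [])
```

With this shape, `is_activation_compatible(DSLITE_ROUTING_CONFIG, "Relu2")` is true, while a config missing the key is incompatible with everything, which is exactly the unintended mass-skip the review flags.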

Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
…ant launched max value

Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
@yzh119
Collaborator

yzh119 commented Feb 2, 2026

@flashinfer-bot run

@yzh119
Collaborator

yzh119 commented Feb 2, 2026

/bot run

@flashinfer-bot
Collaborator

GitLab MR !288 has been created, and the CI pipeline #43117758 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[CANCELING] Pipeline #43117758: canceled

@yzh119
Collaborator

yzh119 commented Feb 3, 2026

@flashinfer-bot run

@yzh119
Collaborator

yzh119 commented Feb 3, 2026

/bot run

@flashinfer-bot
Collaborator

GitLab MR !288 has been updated with latest changes, and the CI pipeline #43168329 is currently running. I'll report back once the pipeline job completes.

Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
@amitz-nv amitz-nv force-pushed the fused-moe-non-gated-fp8 branch from fa81b33 to df1ae03 Compare February 3, 2026 12:13
@flashinfer-bot
Collaborator

[FAILED] Pipeline #43168329: 10/20 passed

@nv-yunzheq
Collaborator

LGTM.

@yongwww
Member

yongwww commented Feb 4, 2026

Collaborator

@aleozlx aleozlx left a comment


lgtm

@yzh119 yzh119 merged commit e284274 into flashinfer-ai:main Feb 4, 2026
28 of 34 checks passed
raayandhar pushed a commit to raayandhar/flashinfer that referenced this pull request Feb 5, 2026
…ron, fixed (flashinfer-ai#2462)


## 📌 Description

- Support element-wise activation (relu^2) in fused MoE for NVFP4 and
FP8PerTensor.
- Use new ActivationType enum class instead of GatedActType.
- Support Nemotron in deepseek routing as in
NVIDIA/TensorRT-LLM#9792
- Remove 'A' suffix from UseShuffledMatrixA.

NOTE: This is the fixed version of
flashinfer-ai#2304 that was merged
and reverted.
- Replaced the problematic condition in deepseek routing that required
`NumExperts >= MaxSupportedTopExperts` with `topK<=numExperts`
  - DeepSeek R1 works with it (tested with VLLM).
- Removed irrelevant test cases.



Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
- 4: Geglu
- 5: SwigluBias
- 6: Relu2
- 7: Identity
Collaborator

@IwakuraRein IwakuraRein Feb 5, 2026


In csrc/trtllm_fused_moe_runner.cu, the functions activationTypeToGatedActType and activationTypeToEltwiseActType restrict the activation functions to [Swiglu, Geglu, Relu2, Identity]. The comment needs to be updated accordingly.

Additionally, I doubt that Geglu is actually supported for per-tensor FP8; it seems the corresponding cubins are not generated.

Contributor Author


You're right, the docstring here should probably reflect what's supported instead of detailing the entire enum.
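For illustration only, the restriction described in this thread can be sketched as an enum plus an allow-set. This is a Python sketch under assumptions: the real code is C++ in csrc/trtllm_fused_moe_runner.cu, only values 4-7 appear in the quoted excerpt, and the numeric value assumed for Swiglu is a guess.

```python
from enum import Enum


class ActivationType(Enum):
    # Only entries 4-7 are quoted in the excerpt above; Swiglu's value
    # is an assumed placeholder and may not match the real enum.
    Swiglu = 3
    Geglu = 4
    SwigluBias = 5
    Relu2 = 6
    Identity = 7


# Per the review comment, the gated/element-wise conversion helpers
# accept only these four activation kinds.
_SUPPORTED = {
    ActivationType.Swiglu,
    ActivationType.Geglu,
    ActivationType.Relu2,
    ActivationType.Identity,
}


def check_activation(act: ActivationType) -> None:
    """Raise if the fused-MoE runner cannot convert this activation."""
    if act not in _SUPPORTED:
        raise ValueError(f"unsupported activation for fused MoE: {act.name}")
```

Under this model, SwigluBias would be rejected even though it exists in the enum, which matches the point that the docstring should list what the helpers actually accept rather than the full enum.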
