feat(moe): Add is_act_and_mul=False support for Triton MoE kernels#31645

Merged
tjtanaa merged 2 commits into vllm-project:main from rabi:is_act_and_mul
Jan 8, 2026
Conversation

Contributor

@rabi rabi commented Jan 3, 2026

Purpose

Add support for non-fused activations (relu2_no_mul, silu_no_mul, gelu_no_mul) in Triton MoE kernels for models like Nemotron-H that use non-SwiGLU activations.

  • Add is_act_and_mul flag to FusedMoEQuantConfig
  • Implement non-fused activations in modular_kernel.py
  • Fix workspace sizes in TritonExperts for is_act_and_mul=False
  • Enable on ROCm when AITER is disabled
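The workspace-size fix follows from a shape difference between the two activation modes. A fused act-and-mul (SwiGLU-style) splits the intermediate tensor in half and gates one half with the other, so its output has half the hidden size; a non-fused activation applies elementwise to the full tensor. The pure-Python sketch below is illustrative only: the real vLLM implementations are Triton kernels, and these function names merely mirror the activation names rather than the actual API.

```python
import math

def silu_and_mul(x):
    # Fused act-and-mul (is_act_and_mul=True): split the intermediate
    # vector in half, SiLU-gate the first half, multiply by the second.
    # The output has HALF the hidden size of the input.
    d = len(x) // 2
    gate, up = x[:d], x[d:]
    return [(g / (1.0 + math.exp(-g))) * u for g, u in zip(gate, up)]

def relu2_no_mul(x):
    # Non-fused activation (is_act_and_mul=False): elementwise squared
    # ReLU over the FULL vector, so the output keeps the input size --
    # this size difference is why TritonExperts needs different
    # workspace allocations in the two modes.
    return [max(v, 0.0) ** 2 for v in x]

h = [1.0, -2.0, 0.5, 3.0]
print(len(silu_and_mul(h)))  # 2
print(len(relu2_no_mul(h)))  # 4
```

silu_no_mul and gelu_no_mul behave analogously to relu2_no_mul: elementwise, with no halving of the hidden dimension.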

Test Plan

Add test_triton_moe_no_act_mul.py for CUDA and ROCm

Test Result

Tests pass locally; they will also be validated in CI.

@mergify mergify bot added the rocm Related to AMD ROCm label Jan 3, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enables MoE models with is_act_and_mul=False to run on ROCm by leveraging Triton kernels. The changes are well-structured, introducing is_act_and_mul to FusedMoEQuantConfig, updating workspace sizing calculations, and adding support for non-fused activations. The inclusion of a new test file for ROCm is a great addition for ensuring correctness. I have one suggestion to enhance the performance of the new activation function implementations by minimizing intermediate tensor allocations.

@rabi rabi force-pushed the is_act_and_mul branch 3 times, most recently from 373609e to e586bef Compare January 5, 2026 08:11
@rabi rabi changed the title feat(rocm): Support is_act_and_mul=False MoE with Triton feat(moe): Add is_act_and_mul=False support for Triton MoE kernels Jan 6, 2026

mergify bot commented Jan 6, 2026

Hi @rabi, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Add support for non-fused activations (relu2_no_mul, silu_no_mul, gelu_no_mul)
in Triton MoE kernels for models like Nemotron-H that use non-SwiGLU activations.

- Add is_act_and_mul flag to FusedMoEQuantConfig
- Implement non-fused activations in modular_kernel.py
- Fix workspace sizes in TritonExperts for is_act_and_mul=False
- Enable on ROCm when AITER is disabled
- Add test_triton_moe_no_act_mul.py for CUDA and ROCm

Signed-off-by: rabi <ramishra@redhat.com>
@tjtanaa tjtanaa added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 6, 2026
Collaborator

@tjtanaa tjtanaa left a comment


LGTM

@tjtanaa tjtanaa merged commit 25eef3d into vllm-project:main Jan 8, 2026
52 checks passed
@mgoin
Member

mgoin commented Jan 8, 2026

@tjtanaa @rabi @danielafrimi can we actually revert this PR and land #31528 instead? I feel that this fix, adding is_act_and_mul to the quant config, and the activation implementations are not as nice as the refactor in the other PR.

yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…llm-project#31645)

Signed-off-by: rabi <ramishra@redhat.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Labels

ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm

3 participants