feat(moe): Add is_act_and_mul=False support for Triton MoE kernels#31645
tjtanaa merged 2 commits into vllm-project:main
Code Review
This pull request enables MoE models with is_act_and_mul=False to run on ROCm by leveraging Triton kernels. The changes are well-structured, introducing is_act_and_mul to FusedMoEQuantConfig, updating workspace sizing calculations, and adding support for non-fused activations. The inclusion of a new test file for ROCm is a great addition for ensuring correctness. I have one suggestion to enhance the performance of the new activation function implementations by minimizing intermediate tensor allocations.
Hi @rabi, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then commit the changes and push to your branch.
Add support for non-fused activations (relu2_no_mul, silu_no_mul, gelu_no_mul) in Triton MoE kernels for models like Nemotron-H that use non-SwiGLU activations.
- Add is_act_and_mul flag to FusedMoEQuantConfig
- Implement non-fused activations in modular_kernel.py
- Fix workspace sizes in TritonExperts for is_act_and_mul=False
- Enable on ROCm when AITER is disabled
- Add test_triton_moe_no_act_mul.py for CUDA and ROCm
Signed-off-by: rabi <ramishra@redhat.com>
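The workspace-size fix in the commit message can be illustrated with a small sketch. It assumes that a fused activation-and-multiply step halves the last dimension of the first GEMM's output, while a non-fused activation keeps it unchanged. The helper names and the shape layout below are hypothetical and are not the actual TritonExperts API.

```python
def activation_out_dim(n: int, is_act_and_mul: bool) -> int:
    """Width of the activation output for a first-GEMM output of width n.

    Hypothetical helper: with a fused act-and-mul (e.g. SwiGLU) the gate and
    up halves are combined, so the width is n // 2. Without the multiply,
    the activation is elementwise and the width stays n.
    """
    if is_act_and_mul:
        if n % 2 != 0:
            raise ValueError("fused act-and-mul needs an even width")
        return n // 2
    return n


def workspace_shapes(m: int, top_k: int, n: int, k: int,
                     is_act_and_mul: bool) -> dict[str, tuple[int, int, int]]:
    """Illustrative buffer shapes for m tokens routed to top_k experts.

    The dictionary keys are assumptions made for this sketch only.
    """
    act_dim = activation_out_dim(n, is_act_and_mul)
    return {
        "gemm1_out": (m, top_k, n),      # output of the first expert GEMM
        "act_out": (m, top_k, act_dim),  # input to the second expert GEMM
        "gemm2_out": (m, top_k, k),      # output back in the hidden size k
    }
```

Sizing the activation buffer as `n // 2` when `is_act_and_mul=False` would leave it too small by a factor of two, which is the kind of mismatch the flag is meant to prevent.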
@tjtanaa @rabi @danielafrimi Can we actually revert this PR and land #31528 instead? I feel this fix adding …
…llm-project#31645) Signed-off-by: rabi <ramishra@redhat.com>
…llm-project#31645) Signed-off-by: rabi <ramishra@redhat.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Purpose
Add support for non-fused activations (relu2_no_mul, silu_no_mul, gelu_no_mul) in Triton MoE kernels for models like Nemotron-H that use non-SwiGLU activations.
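As a rough illustration of the difference between fused and non-fused activations, the sketch below uses plain Python lists in place of tensors. It assumes the usual definitions: SiLU is x * sigmoid(x), relu2 is the square of ReLU, GELU is the exact erf form, and the fused variant multiplies the activated first half of a row by its second half. The function names are illustrative and do not reproduce the code in modular_kernel.py.

```python
import math


def silu(x: float) -> float:
    # x * sigmoid(x)
    return x / (1.0 + math.exp(-x))


def relu2(x: float) -> float:
    # squared ReLU
    r = max(x, 0.0)
    return r * r


def gelu(x: float) -> float:
    # exact (erf-based) GELU
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu_and_mul(row: list[float]) -> list[float]:
    """Fused SwiGLU-style step: input width 2*d, output width d."""
    d = len(row) // 2
    gate, up = row[:d], row[d:]
    return [silu(g) * u for g, u in zip(gate, up)]


def apply_no_mul(act, row: list[float]) -> list[float]:
    """Non-fused step: elementwise activation, width is preserved."""
    return [act(v) for v in row]
```

For a row of width 4, `silu_and_mul` returns 2 values while `apply_no_mul(relu2, row)` returns 4, which is why the downstream buffers have to be sized differently in the two modes.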
Test Plan
Add test_triton_moe_no_act_mul.py for CUDA and ROCm
Test Result
Tests pass in a local environment and will also be run in CI.