@ChristinaZ
Collaborator

Refactor the topk part for the routing kernels in the MoE TrtLLMGen backend

Description

This is the first in a series of pull requests (PRs) that refactor the routing kernels in the MoE TrtLLMGen backend.
This PR moves the topK parallelization logic out of cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingKernel.cu and into a new CUDA header file: cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingKernelTopK.cuh.

It also adjusts the namespace layout to make the upcoming refactoring steps easier.
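
For readers unfamiliar with the routing step, the following is a minimal host-side sketch of what topK selection over per-expert scores computes. It is not the code from RoutingKernelTopK.cuh: the namespace `routing_sketch::topk`, the function `selectTopK`, and the tie-breaking rule (lower expert index wins) are assumptions made for illustration, and the sketch uses `std::partial_sort` on the CPU rather than any GPU parallelization.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical namespace, chosen for illustration only; it does not
// reflect the namespace layout introduced by this PR.
namespace routing_sketch
{
namespace topk
{

// Returns the k highest (score, expertIdx) pairs in descending score order.
// Assumption: ties are broken in favor of the lower expert index.
// If k exceeds the number of experts, all experts are returned.
template <typename T>
std::vector<std::pair<T, int>> selectTopK(std::vector<T> const& scores, int k)
{
    assert(k >= 0);

    // Pair each score with its expert index so the index survives sorting.
    std::vector<std::pair<T, int>> items;
    items.reserve(scores.size());
    for (int i = 0; i < static_cast<int>(scores.size()); ++i)
    {
        items.emplace_back(scores[i], i);
    }

    k = std::min(k, static_cast<int>(items.size()));

    // Higher score first; on equal scores, lower expert index first.
    auto better = [](std::pair<T, int> const& a, std::pair<T, int> const& b)
    { return a.first > b.first || (a.first == b.first && a.second < b.second); };

    // Only the first k positions need to be in sorted order.
    std::partial_sort(items.begin(), items.begin() + k, items.end(), better);
    items.resize(k);
    return items;
}

} // namespace topk
} // namespace routing_sketch
```

Keeping a helper like this as a header-only template in its own namespace is one way such logic can be shared between several kernels, which is the kind of reuse that moving code into a separate header makes possible.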

Test Coverage

cd cpp/build
make -j$(nproc) google-tests
./tests/unit_tests/kernels/routingKernelsTest

pytest -k test_moe_fp4 tests/unittest/_torch/thop/test_moe.py

@ChristinaZ
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #10205 [ run ] triggered by Bot

@ChristinaZ ChristinaZ force-pushed the refactor_topK_in_routing_trtllmgen branch from 9e0bf1e to 224e42f Compare June 28, 2025 16:31
@ChristinaZ
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #10209 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #10205 [ run ] completed with state ABORTED
/LLM/main/L0_MergeRequest_PR pipeline #7534 completed with status: 'FAILURE'

@tensorrt-cicd
Collaborator

PR_Github #10209 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7538 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

Collaborator

@MatthiasKohl MatthiasKohl left a comment

LGTM, just added one minor comment/note

@ChristinaZ ChristinaZ force-pushed the refactor_topK_in_routing_trtllmgen branch from 224e42f to 68ddd58 Compare July 6, 2025 09:42
@ChristinaZ
Collaborator Author

/bot run

Signed-off-by: Christina Zhang <[email protected]>
@ChristinaZ ChristinaZ force-pushed the refactor_topK_in_routing_trtllmgen branch from 68ddd58 to 70dcb88 Compare July 6, 2025 10:10
@ChristinaZ
Collaborator Author

/bot kill

@tensorrt-cicd
Collaborator

PR_Github #11055 [ kill ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11055 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit 70dcb88

@ChristinaZ
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #11096 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11096 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8203 completed with status: 'SUCCESS'

@byshiue byshiue merged commit 12d8c7d into NVIDIA:main Jul 7, 2025
3 checks passed
zhou-yuxin pushed a commit to zhou-yuxin/TensorRT-LLM that referenced this pull request Jul 15, 2025