Refactor the topk parallelization part for the routing kernels #5567
/bot run
PR_Github #10205 [ run ] triggered by Bot
9e0bf1e to 224e42f
/bot run
PR_Github #10209 [ run ] triggered by Bot
PR_Github #10205 [ run ] completed with state
PR_Github #10209 [ run ] completed with state
LGTM, just added one minor comment/note
224e42f to 68ddd58
/bot run
Signed-off-by: Christina Zhang <[email protected]>
68ddd58 to 70dcb88
/bot kill
PR_Github #11055 [ kill ] triggered by Bot
PR_Github #11055 [ kill ] completed with state
/bot run
PR_Github #11096 [ run ] triggered by Bot
PR_Github #11096 [ run ] completed with state
…A#5567) Signed-off-by: Christina Zhang <[email protected]> Signed-off-by: Yuxin <[email protected]>
Refactor the topk part for the routing kernels in the MoE TrtLLMGen backend
Description
This is the first pull request (PR) in a series refactoring the routing kernels in the MoE TrtLLMGen backend.
As a first step, I moved the topK parallelization logic from cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingKernel.cu to a new CUDA header file: cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingKernelTopK.cuh.
I also adjusted the namespace configuration to make future refactoring easier.
Test Coverage