feat: large-scale EP(part 1: Add MNNVL MoE A2A support) #3504
/bot run
PR_Github #2049 [ run ] triggered by Bot
PR_Github #2049 [ run ] completed with state
/bot run
PR_Github #2113 [ run ] triggered by Bot
PR_Github #2113 [ run ] completed with state
/bot run
24a44a8 to 163b322
/bot run
PR_Github #2385 [ run ] triggered by Bot
PR_Github #2385 [ run ] completed with state
/bot run
PR_Github #2408 [ run ] triggered by Bot
PR_Github #2408 [ run ] completed with state
163b322 to bfda091
/bot run
PR_Github #2768 [ run ] triggered by Bot
PR_Github #2768 [ run ] completed with state
/bot run --add-multi-gpu-test --disable-fail-fast
PR_Github #2773 [ run ] triggered by Bot
PR_Github #2773 [ run ] completed with state
/bot run --disable-fail-fast
bfda091 to 89ec4a5
b928430 to b81aae4
b81aae4 to dd0c525
/bot run
PR_Github #3227 [ run ] triggered by Bot
PR_Github #3227 [ run ] completed with state
/bot run
PR_Github #3269 [ run ] triggered by Bot
PR_Github #3269 [ run ] completed with state
Signed-off-by: Dongxu Yang <[email protected]>
dd0c525 to 2070dca
/bot run
PR_Github #3341 [ run ] triggered by Bot
PR_Github #3341 [ run ] completed with state
struct GroupSharedBuffer
{
    int groupIndiceBuffer[GROUP_MAX_INDICE_COUNT];
Note that it should be either `Index` or `Indices` (with an "s"). `Index` is the singular and `Indices` is the plural.
public:
    static constexpr int GROUP_COUNT_PER_BLOCK = 8;
    static_assert(GROUP_COUNT_PER_BLOCK <= 8, "GROUP_COUNT_PER_BLOCK must be less than or equal to 8");
    static constexpr int WARP_PER_GROUP = 2;
It should be WARPS_PER_GROUP or WARP_COUNT_PER_GROUP.
TLLM_CHECK_WITH_INFO(
    blockCountPerChannel <= smCount, "GPU should support at lease one channel, usableSmCount=%d", smCount);
int perferredChannel = smCount / 2 / blockCountPerChannel; // use half SMs for communication
int channelCount = std::max(perferredChannel, 1); // at lease one channel
In the comment, it must be "at least".
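For readers following the hunk, the channel-count arithmetic can be tried on the host with a small sketch. This is not the PR's code: `computeChannelCount` is a hypothetical name, `smCount` is passed in as a parameter instead of being queried from the device, and a standard exception stands in for `TLLM_CHECK_WITH_INFO`.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>

// Hypothetical stand-alone version of the logic in the hunk above.
// smCount is an argument here (an assumption made for testability); the PR
// obtains it from the device instead.
inline int computeChannelCount(int blockCountPerChannel, int smCount)
{
    assert(blockCountPerChannel > 0);
    if (blockCountPerChannel > smCount)
    {
        // Stand-in for the TLLM_CHECK_WITH_INFO call in the original code.
        throw std::invalid_argument("GPU should support at least one channel");
    }
    // Use roughly half of the SMs for communication.
    int const preferredChannelCount = smCount / 2 / blockCountPerChannel;
    // Integer division can round down to zero, so keep at least one channel.
    return std::max(preferredChannelCount, 1);
}
```

With 132 SMs and 4 blocks per channel this gives 16 channels; when one channel needs more than half of the SMs, the `std::max` clamp keeps a single channel.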
static int computeMoeCommChannelCount(int epSize)
{
    int smCount = getMaxUsableSmCount();
    int blockCountPerChannel = (epSize + GROUP_COUNT_PER_BLOCK - 1) / GROUP_COUNT_PER_BLOCK;
Isn't there a function in TRT-LLM to compute this? Something like `divUp` or `ceilDiv` is a common name.
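To illustrate the suggestion, a minimal sketch of such a helper is shown below. The name `ceilDiv` and its signature are placeholders chosen for this example, not a confirmed TRT-LLM API; `GROUP_COUNT_PER_BLOCK = 8` is taken from the diff under review, and `blockCountPerChannel` is a hypothetical wrapper.

```cpp
#include <cassert>

// Value taken from the diff under review.
constexpr int GROUP_COUNT_PER_BLOCK = 8;

// Hypothetical helper (placeholder name and signature): rounds the quotient
// up. Intended for non-negative numerators and positive denominators.
template <typename T>
constexpr T ceilDiv(T numerator, T denominator)
{
    return (numerator + denominator - 1) / denominator;
}

// Number of blocks needed so that every EP rank gets one group.
constexpr int blockCountPerChannel(int epSize)
{
    return ceilDiv(epSize, GROUP_COUNT_PER_BLOCK);
}

// Compile-time checks of the rounding behavior.
static_assert(blockCountPerChannel(8) == 1, "8 ranks fit in one block");
static_assert(blockCountPerChannel(9) == 2, "9 ranks need a second block");
```

Naming the operation makes the intent of the expression visible at the call site and avoids repeating the `(a + b - 1) / b` pattern by hand.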
Add MNNVL MoE AllToAll support for large-scale expert parallelism.