Merged

25 commits
4409358
separate sendCommMeta and recvCommMeta
dongxuy04 Aug 5, 2025
8cb3fb6
add ut for local send recv
dongxuy04 Aug 7, 2025
2aa6c2c
add ut for local send recv with FIFO
dongxuy04 Aug 9, 2025
63b5833
add index mapping for sender
dongxuy04 Aug 11, 2025
217b1c2
feat: develop alltoall kernel based on SingleChannelCommunicator
WeiHaocheng Aug 12, 2025
9be64da
separate workspace channelCount and runChannelCount
dongxuy04 Aug 12, 2025
8fa1e03
feat: change alltoall API and unittest
WeiHaocheng Aug 12, 2025
c6ca92f
fix a2a opt and pass ut
dongxuy04 Aug 13, 2025
f298871
fuse moe prepare counter and expert static and apply fuse moe to torc…
WeiHaocheng Aug 13, 2025
71ba4e5
clean some code
WeiHaocheng Aug 15, 2025
39c29cc
modify fused_moe_wide_ep.py and clean code
WeiHaocheng Aug 15, 2025
3925da1
get e2e run
dongxuy04 Aug 13, 2025
a66d88d
refactor lamport into protocol for LL128
dongxuy04 Aug 14, 2025
1663c42
add LL128 proto instead of Lamport
dongxuy04 Aug 14, 2025
ebd2bc9
fix alltoall refactor perf
dongxuy04 Aug 16, 2025
547a5aa
fix refactor issues
dongxuy04 Aug 17, 2025
b5a12a7
fix format
dongxuy04 Aug 18, 2025
79002a8
fix compile and skip tests cleanup
dongxuy04 Aug 18, 2025
e77dd4a
address review issues 1
dongxuy04 Aug 19, 2025
0afc096
clean some pytest
dongxuy04 Aug 19, 2025
ac6c188
fix mtp with online eplb
dongxuy04 Aug 19, 2025
8dbecbf
refactor moe_comm to create output tensor inside
dongxuy04 Aug 19, 2025
e60693a
remove unused Lamport Proto and clean debug print code
dongxuy04 Aug 20, 2025
5bc9622
unwaive deepseek online eplb tests for GB200
dongxuy04 Aug 20, 2025
b507f72
remove fake impl for inplace op
dongxuy04 Aug 20, 2025
1,372 changes: 1,372 additions & 0 deletions cpp/tensorrt_llm/kernels/fusedMoeCommKernels.cu

Large diffs are not rendered by default.

562 changes: 562 additions & 0 deletions cpp/tensorrt_llm/kernels/fusedMoeCommKernels.h

Large diffs are not rendered by default.

804 changes: 0 additions & 804 deletions cpp/tensorrt_llm/kernels/moeCommKernels.cu

This file was deleted.

268 changes: 0 additions & 268 deletions cpp/tensorrt_llm/kernels/moeCommKernels.h

This file was deleted.

47 changes: 47 additions & 0 deletions cpp/tensorrt_llm/kernels/moeCommKernelsCommon.h
@@ -0,0 +1,47 @@
/*
* Copyright (c) 2019-2025, NVIDIA CORPORATION. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once

#include <stdint.h>

namespace tensorrt_llm
{
namespace kernels
{

// 256-byte alignment qualifier usable from both NVCC device passes and host-only builds.
#ifdef __CUDACC__
#define ALIGN_256 __align__(256)
#else
#define ALIGN_256 alignas(256)
#endif

// Full-warp constants for CUDA warp-level primitives (e.g. __shfl_sync).
constexpr int WARP_SIZE = 32;
constexpr uint32_t WARP_MASK = 0xffffffff;

struct MoeEpWorldInfo
{
    int epSize; // number of ranks in the expert-parallel (EP) group
    int epRank; // index of this rank within the EP group
};

struct MoeExpertParallelInfo
{
    int expertCount = -1; // total number of experts (default: not set)
    int topK = 1;         // number of experts selected per token
};

} // namespace kernels
} // namespace tensorrt_llm