Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
start_time = time.perf_counter()

for _ in range(num_iters):
    topk_weights, topk_indices = grouped_topk(
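For context on what the call above computes: grouped top-k routing (as used by DeepSeek-style MoE gates) first keeps only the best `topk_group` expert groups, then picks the top `topk` experts inside those groups. The sketch below is a minimal pure-Python version for a single token. It assumes softmax scoring, a group score equal to the group's best expert, and optional renormalization. It is neither the XPU kernel in this PR nor `grouped_topk_native`, and the name `grouped_topk_reference` and its parameters are hypothetical.

```python
import math
from typing import List, Tuple


def grouped_topk_reference(
    gating_logits: List[float],
    num_expert_group: int,
    topk_group: int,
    topk: int,
    renormalize: bool = True,
) -> Tuple[List[float], List[int]]:
    """Hypothetical single-token grouped top-k; returns (weights, expert ids)."""
    num_experts = len(gating_logits)
    assert num_experts % num_expert_group == 0, "experts must split evenly into groups"
    group_size = num_experts // num_expert_group

    # Numerically stable softmax over all experts.
    m = max(gating_logits)
    exps = [math.exp(x - m) for x in gating_logits]
    total = sum(exps)
    scores = [e / total for e in exps]

    # Assumption: each group is scored by its single best expert.
    group_scores = [
        max(scores[g * group_size:(g + 1) * group_size])
        for g in range(num_expert_group)
    ]
    # Keep only the topk_group highest-scoring groups.
    kept_groups = sorted(
        range(num_expert_group), key=lambda g: group_scores[g], reverse=True
    )[:topk_group]

    # Only experts inside a kept group are eligible for selection.
    candidates = [
        e for g in kept_groups for e in range(g * group_size, (g + 1) * group_size)
    ]
    topk_ids = sorted(candidates, key=lambda e: scores[e], reverse=True)[:topk]
    topk_weights = [scores[e] for e in topk_ids]

    if renormalize:
        s = sum(topk_weights)
        topk_weights = [w / s for w in topk_weights]
    return topk_weights, topk_ids
```

With logits `[0.0, 5.0, 1.0, 0.0, 4.0, 0.0, 3.0, 3.5]`, 4 groups of 2, `topk_group=2` and `topk=3`, only groups 0 and 2 survive, so the result is experts `[1, 4, 0]`; a flat top-3 (one group) would instead return `[1, 4, 7]`. That masking step is what distinguishes grouped top-k from a plain `topk`.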
Can we benchmark grouped_topk_native, grouped_topk_native with @torch.compile, and grouped_topk against each other?
Benchmarks for grouped_topk_native and grouped_topk_native with @torch.compile have been added.
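A fair comparison of the three variants needs two things the bare timing loop does not show: warm-up iterations (so `@torch.compile` tracing and any kernel JIT are not counted) and a device synchronization before reading the clock (device kernels launch asynchronously). The helper below is a hypothetical sketch of such a harness, not the PR's benchmark script; `bench` and its `sync` parameter are illustrative names.

```python
import time
from typing import Callable


def bench(
    fn: Callable[[], object],
    num_iters: int = 100,
    warmup: int = 10,
    sync: Callable[[], None] = lambda: None,
) -> float:
    """Hypothetical helper: average wall-clock seconds per call of fn."""
    # Warm-up absorbs one-time costs (e.g. torch.compile tracing, kernel JIT)
    # so they do not inflate the per-iteration average.
    for _ in range(warmup):
        fn()
    # Drain any queued device work before starting the clock.
    sync()
    start_time = time.perf_counter()
    for _ in range(num_iters):
        fn()
    # Kernels launch asynchronously; wait for them before stopping the clock.
    sync()
    return (time.perf_counter() - start_time) / num_iters
```

On a PyTorch build with XPU support, one would pass `sync=torch.xpu.synchronize` and wrap each variant in a zero-argument lambda; the default no-op `sync` is only appropriate for CPU-bound code.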
Suggest following upstream's structure and putting MoE-related kernels under
Done.
@jikunshang Considering the later upstreaming, do you think we should follow vLLM and separate the MoE kernels into another .so/module?
Agreed, let's move this kernel into _moe.so since it is an MoE-related kernel.
@dbyoung18 @jikunshang
I had some discussion with @dbyoung18 about the library name; we noticed that ROCm adds some non-CUDA kernels in
Since the name affects all ops, not only grouped_topk, maybe changing the .so name should be done in another PR?
I just refactored cmake for
The PR for moe_sum looks the same as what I did in this JIRA.
1. I just noticed your latest modifications. I made the same change yesterday in parallel with you. The main parts of _moe_C are common to both of our versions. I think our main philosophy here is to align with the community as much as possible to reduce the effort of upstreaming later.
vllm-project/vllm#23274: vLLM recently added a fused group_topk. Please take a look at what we missed, thanks!
OK, I will write a new kernel based on vLLM's fused group_topk and compare it with what we use now.
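When comparing a new kernel against the current one, exact tensor equality is usually too strict: two correct top-k kernels may return the same experts in a different order, and weights may differ in the last few bits. A hypothetical per-token check (names and tolerance are assumptions) could match experts as a set and compare each expert's weight within a tolerance. Near-ties between expert scores can still legitimately pick different experts, so failures on tied inputs need a closer look rather than an automatic reject.

```python
from typing import Dict, Sequence


def routing_matches(
    weights_a: Sequence[float],
    ids_a: Sequence[int],
    weights_b: Sequence[float],
    ids_b: Sequence[int],
    atol: float = 1e-5,
) -> bool:
    """Hypothetical check that two kernels routed one token the same way."""
    if len(ids_a) != len(ids_b):
        return False
    # Map each selected expert to its weight so ordering differences
    # between kernels (sorted vs. unsorted top-k output) are ignored.
    a: Dict[int, float] = dict(zip(ids_a, weights_a))
    b: Dict[int, float] = dict(zip(ids_b, weights_b))
    if a.keys() != b.keys():
        return False
    return all(abs(a[e] - b[e]) <= atol for e in a)
```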
Can we close this?
add xpu grouped topk kernel