
add xpu op grouped topk #10

Closed
mayuyuace wants to merge 22 commits into vllm-project:main from mayuyuace:qiming/add_xpu_op_grouped_topk

Conversation

@mayuyuace (Collaborator)

add xpu grouped topk kernel

Copilot AI review requested due to automatic review settings August 8, 2025 07:52


mayuyuace and others added 6 commits August 8, 2025 15:57
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Comment thread on benchmark/benchmark_grouped_topk.py (outdated excerpt):

    start_time = time.perf_counter()

    for _ in range(num_iters):
        topk_weights, topk_indices = grouped_topk(
Collaborator:

Can we do some benchmarking comparing grouped_topk_native, grouped_topk_native with @torch.compile, and grouped_topk?

Collaborator (Author):

grouped_topk_native and grouped_topk_native with @torch.compile have been added.
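For reference, a minimal sketch of the three-way comparison discussed above. This is illustrative only: grouped_topk_native below is a hand-written stand-in (group experts, keep the best groups by max score, then top-k among the survivors), not the repo's implementation, and the real numbers come from benchmark/benchmark_grouped_topk.py.

```python
import time

import torch


def grouped_topk_native(scores: torch.Tensor, num_groups: int,
                        topk_groups: int, topk: int):
    # Stand-in reference: experts are split evenly into groups; keep the
    # best topk_groups groups (by max expert score), mask the rest, then
    # take the overall top-k among the surviving experts.
    num_tokens, num_experts = scores.shape
    group_scores = scores.view(num_tokens, num_groups, -1).max(dim=-1).values
    group_idx = torch.topk(group_scores, k=topk_groups, dim=-1).indices
    group_mask = torch.zeros_like(group_scores)
    group_mask.scatter_(1, group_idx, 1.0)
    score_mask = (group_mask.unsqueeze(-1)
                  .expand(num_tokens, num_groups, num_experts // num_groups)
                  .reshape(num_tokens, num_experts))
    masked = scores.masked_fill(score_mask == 0, float("-inf"))
    return torch.topk(masked, k=topk, dim=-1)


def bench(fn, *args, num_iters=20):
    fn(*args)  # warmup; also triggers compilation for compiled variants
    start = time.perf_counter()
    for _ in range(num_iters):
        fn(*args)
    return (time.perf_counter() - start) / num_iters


scores = torch.randn(64, 32)
t_native = bench(grouped_topk_native, scores, 8, 4, 6)
try:
    t_compiled = bench(torch.compile(grouped_topk_native), scores, 8, 4, 6)
except Exception:  # torch.compile may be unavailable in some environments
    t_compiled = float("nan")
print(f"native: {t_native * 1e6:.1f} us/iter, "
      f"compiled: {t_compiled * 1e6:.1f} us/iter")
```

The XPU grouped_topk kernel would slot into the same harness: pass the kernel as the callable and move the input tensors to the XPU device first.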

@dbyoung18 (Collaborator)

Suggest following upstream's structure: put MoE-related kernels under vllm-xpu-kernels/csrc/xpu/moe.
Reference: https://github.com/vllm-project/vllm/tree/main/csrc/moe

@mayuyuace (Collaborator, Author)

Done.

@dbyoung18 (Collaborator)

@jikunshang Considering later upstreaming, do you think we should follow vLLM and separate the MoE kernels into their own .so/module?
https://github.com/vllm-project/vllm/blob/main/setup.py#L324-L325
https://github.com/vllm-project/vllm/blob/main/vllm/_custom_ops.py#L16-L25

@jikunshang (Collaborator)

Agree, let's move this kernel into _moe.so since it is MoE-related.

@mayuyuace (Collaborator, Author)

@dbyoung18 @jikunshang
I have added _moe_C.abi3.so and the MoE directories under benchmark & tests.

@jikunshang (Collaborator)

I had some discussion with @dbyoung18 about the library name. We noticed that ROCm adds some non-CUDA kernels in _rocm_C (see link); maybe we should add this non-CUDA kernel to an _xpu_C extension. @dbyoung18 also has a similar kernel; he will try adding the _xpu_C extension first.

@mayuyuace (Collaborator, Author)

mayuyuace commented Aug 19, 2025

Since the name affects all ops, not only grouped_topk, maybe changing the .so name should be done in another PR?

@dbyoung18 (Collaborator)

I just refactored the CMake for _moe_C and submitted a related kernel, moe_sum; please take it as a reference.

@mayuyuace (Collaborator, Author)

The PR for moe_sum looks like the same thing I did in this JIRA.

@dbyoung18 (Collaborator)

1. Just noticed your latest modifications. I made the change yesterday in parallel with you; the main _moe_C parts are common between ours.
2. @jikunshang Since grouped_topk is part of MoE, I would prefer to put it under _moe_C, not _xpu_C. What's your opinion?
3. For UT, I think registering all kernels in vllm-xpu-kernels/tests/register_ops.py is enough, as it's designed to replace _ipex_ops.py; for CUDA, MoE-related kernels are also registered in _custom_ops.py. For individual unit tests, it's fine to separate them by category as in vllm/tests/kernels.

I think our main philosophy here is to align with the community as much as possible, to reduce the effort of upstreaming later.
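As a concrete illustration of the register_ops.py approach discussed in point 3 (the namespace, op name, and implementation below are hypothetical stand-ins, not the repo's actual code), a Python-level op registration with torch.library looks roughly like:

```python
import torch

# Hypothetical namespace/op; in the repo, the real kernels come from the
# compiled _moe_C extension, with register_ops.py exposing them to tests.
lib = torch.library.Library("demo_moe", "DEF")
lib.define("moe_sum(Tensor input) -> Tensor")


@torch.library.impl(lib, "moe_sum", "CompositeExplicitAutograd")
def moe_sum_impl(input: torch.Tensor) -> torch.Tensor:
    # Stand-in semantics: sum over the (assumed) expert dimension.
    return input.sum(dim=1)


# Callers (tests, benchmarks) go through the dispatcher-qualified name,
# so swapping in a compiled XPU kernel needs no call-site changes.
out = torch.ops.demo_moe.moe_sum(torch.ones(2, 3, 4))
```

The benefit of routing everything through one registration point is exactly the alignment argument above: call sites stay identical whether the op is backed by a Python reference, a CUDA kernel, or the XPU extension.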

@jikunshang (Collaborator)

vllm-project/vllm#23274 — vLLM added a fused grouped_topk recently. Please take a look at what we missed, thanks!

@mayuyuace (Collaborator, Author)

OK, I will write a new kernel based on vLLM's fused grouped_topk and compare it with what we use now.
I will create a new PR if the vLLM-derived kernel performs better.

@mayuyuace mayuyuace mentioned this pull request Sep 2, 2025
@jikunshang (Collaborator)

Can we close this?

@mayuyuace mayuyuace closed this Sep 12, 2025
@mayuyuace mayuyuace deleted the qiming/add_xpu_op_grouped_topk branch November 12, 2025 07:25