[Kernel] add custom op GmmSwigluQuantWeightNzTensorList by ChenxiQ · Pull Request #3804 · vllm-project/vllm-ascend

ChenxiQ · 2025-10-27T13:49:40Z

What this PR does / why we need it?

This PR introduces support for adding custom CANN aclnn ops to vllm-ascend, allowing users to define and use their own custom operators.

Key changes include:

Building and installing custom ops into the vllm-ascend-specified directory
Binding the aclnn op interface to the torch.ops._C_ascend module
Enabling invocation of these ops within vllm-ascend

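This PR includes a sample custom op: aclnnGroupedMatmulSwigluQuantWeightNzTensorList, which is adapted from the CANN operator aclnnGroupedMatmulSwigluQuantWeightNZ (https://www.hiascend.com/document/detail/zh/canncommercial/83RC1/API/aolapi/context/aclnnGroupedMatmulSwigluQuantWeightNZ.md).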
Its input parameters weight and weight_scale now accept list[torch.Tensor] (i.e., at::TensorList).
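
To make the list-typed interface concrete, here is a pure-Python reference sketch of what a grouped matmul followed by SwiGLU and per-token int8 quantization might compute when `weight` and `weight_scale` are passed as one entry per expert. Everything in it is an assumption made for illustration and not the documented semantics of the aclnn op: `group_list` is taken to hold cumulative token counts, SiLU is applied to the first half of the output columns and multiplied by the second half, quantization is symmetric with scale `max(abs) / 127`, and the NZ memory layout is ignored. The function name `gmm_swiglu_quant_reference` is hypothetical.

```python
import math


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def gmm_swiglu_quant_reference(x, weight, weight_scale, x_scale, group_list):
    """Reference sketch (assumed semantics, plain Python lists).

    x:            M rows of K int8 values
    weight:       list of E matrices, each K x N (one per expert)
    weight_scale: list of E vectors, each of length N
    x_scale:      M per-token scales
    group_list:   cumulative token counts per expert (assumption)
    Returns (quantized rows of length N // 2, per-token output scales).
    """
    out, out_scale = [], []
    start = 0
    for e, end in enumerate(group_list):
        w, ws = weight[e], weight_scale[e]
        n = len(ws)
        half = n // 2
        for row in range(start, end):
            # Integer matmul for this token against its expert's weight.
            acc = [sum(x[row][k] * w[k][j] for k in range(len(w))) for j in range(n)]
            # Dequantize with per-channel and per-token scales.
            deq = [acc[j] * ws[j] * x_scale[row] for j in range(n)]
            # SwiGLU: SiLU(first half) * second half.
            act = [silu(deq[j]) * deq[half + j] for j in range(half)]
            # Symmetric per-token quantization to int8.
            amax = max((abs(v) for v in act), default=0.0)
            scale = amax / 127.0 if amax > 0 else 1.0
            out.append([max(-128, min(127, round(v / scale))) for v in act])
            out_scale.append(scale)
        start = end
    return out, out_scale


# Two tokens, two experts, K = 2, N = 2 (so one output column per token).
x = [[1, 0], [0, 1]]
weight = [[[1, 2], [3, 4]], [[1, 1], [-1, 1]]]
weight_scale = [[1.0, 1.0], [1.0, 1.0]]
q, s = gmm_swiglu_quant_reference(x, weight, weight_scale, [1.0, 1.0], [1, 2])
```

Passing the weights as a list means each expert's tensor can live in its own allocation, which is the property the list-typed parameters add over a single stacked tensor.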

Does this PR introduce any user-facing change?

No.

How was this patch tested?

vLLM version: v0.11.2
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

github-actions · 2025-10-27T13:50:57Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

github-actions · 2025-10-27T13:57:40Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

SlightwindSec · 2025-11-11T12:26:54Z

+# ======================================================================================================================
+
+########################################################################################################################
+# 环境检查


Please use English for all comments in the code.

SlightwindSec · 2025-11-11T13:21:00Z

+    at::Tensor output_scale = at::zeros({m}, x.options().dtype(at::kFloat));
+    at::Tensor output_offset = at::zeros({m}, x.options().dtype(at::kFloat));
+
+    EXEC_NPU_CMD(


I noticed this PR introduces the EXEC_NPU_CMD macro and its dependencies, which adds a significant amount of code (~10k+ lines). Could we instead follow the pattern from PR #3226 (add mla_preprocess kernel) for registering the custom op? This should help keep the adaptation layer much smaller.

Looking ahead, if we want to use helper macros like this from op-plugin within vllm-ascend, the best long-term path would be to promote torch_npu to expose these interfaces publicly. That way, we can call them directly instead of vendoring all the dependency code.

SlightwindSec · 2025-11-11T13:33:26Z

+bash build.sh -n grouped_matmul_swiglu_quant -c ascend910b --disable-check-compatible
+
+# install custom ops
+./output/CANN-custom_ops--linux.x86_64.run


I see the new custom operator is compiled into a separate .run installer. This approach seems to complicate the build process and will likely cause problems for users who want to build a Python wheel (.whl) package, as the operator won't be included.

Suggestion: Could we compile this operator directly into the main vllm-ascend shared library?

This would simplify the build, fix the packaging issue, and align with how other operators are handled (e.g., in PR #3226).

ChenxiQ · 2025-11-13

This PR takes #3532 as an example, introducing a new path to integrate custom ops into vllm-ascend. The ops in both PRs follow the standard invocation of aclnn ops, which currently differs from all other custom ops in vllm-ascend.

The standard two-step invocation of aclnn ops requires calling aclnnXXXGetWorkspaceSize and then aclnnXXX; this was originally implemented in op-plugin. The current custom ops in vllm-ascend appear to be either ATB ops or to avoid this by invoking ops directly.
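
The two-step pattern described above can be modelled with a toy Python sketch. This is not the aclnn C API: `mock_add_get_workspace_size`, `mock_add`, and `Executor` are hypothetical stand-ins that only mirror the shape of the flow, in which a first call validates the inputs and reports how much scratch memory is needed, and a second call runs the op with a caller-allocated workspace.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Executor:
    """Stand-in for the opaque executor handle produced by phase one."""
    run: Callable[[bytearray], List[float]]


def mock_add_get_workspace_size(a, b):
    """Phase 1: validate inputs, report scratch size, capture the plan."""
    if len(a) != len(b):
        raise ValueError("shape mismatch")
    workspace_size = 8 * len(a)  # hypothetical scratch requirement in bytes

    def run(workspace):
        # The op may only run once enough workspace has been provided.
        if len(workspace) < workspace_size:
            raise ValueError("workspace too small")
        return [p + q for p, q in zip(a, b)]

    return workspace_size, Executor(run)


def mock_add(workspace, workspace_size, executor):
    """Phase 2: execute the captured plan using the caller's workspace."""
    if len(workspace) < workspace_size:
        raise ValueError("workspace too small")
    return executor.run(workspace)


def call_two_step(a, b):
    """What a helper macro would hide: size query, allocation, launch."""
    size, executor = mock_add_get_workspace_size(a, b)
    workspace = bytearray(size)  # caller owns the allocation
    return mock_add(workspace, size, executor)


result = call_two_step([1.0, 2.0], [3.0, 4.0])
```

Wrapping both phases in one helper, as `call_two_step` does here, is the convenience a macro such as EXEC_NPU_CMD provides; the review discussion is about where that helper code should live.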

github-actions · 2025-11-27T13:58:31Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.


wangxiyuan · 2025-11-20T06:29:08Z

+ROOT_DIR=$1
+SOC_VERSION=$2
+
+case "$SOC_VERSION" in


SOC_VERSION: Enum("310", "910b", "910c", "950")
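
The comment reads as a suggestion to restrict SOC_VERSION to a fixed set of values. A minimal sketch of that validation in Python follows, assuming the four values listed in the comment are the complete set; `SocVersion` and `parse_soc_version` are hypothetical names, and the real build script does this in shell with a `case` statement.

```python
from enum import Enum


class SocVersion(str, Enum):
    """Allowed SOC_VERSION values, taken from the review comment."""
    V310 = "310"
    V910B = "910b"
    V910C = "910c"
    V950 = "950"


def parse_soc_version(raw: str) -> SocVersion:
    """Normalize and validate a SOC_VERSION string, failing loudly otherwise."""
    value = raw.strip().lower()
    try:
        return SocVersion(value)
    except ValueError:
        allowed = ", ".join(v.value for v in SocVersion)
        raise ValueError(
            f"unsupported SOC_VERSION {raw!r}; expected one of: {allowed}"
        ) from None


soc = parse_soc_version("910B")
```

Rejecting unknown values up front keeps a typo in the second script argument from silently selecting a default build target.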

…ist operator into dynamic EPLB (#4216)

### What this PR does / why we need it?

Integrate grouped_matmul_swiglu_quant_weight_nz_tensor_list into dynamic EPLB to support list-type parameters.
This PR also modifies the model-loading logic in the dynamic-EPLB scenario.
The operator is based on this PR: #3804

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

```
vllm serve /home/weight/DeepSeek-V3.1_w8a8mix_mtp \
  --max_num_seqs 8 \
  --max-model-len 8192 \
  --max-num-batched-tokens 16384 \
  --tensor-parallel-size 8 \
  --data-parallel-size 2 \
  --enable-expert-parallel \
  --served-model-name ds_r1 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --no-enable-prefix-caching \
  --port 8999 \
  --quantization "ascend" \
  --gpu-memory-utilization 0.85 \
  --trust-remote-code \
  --compilation_config '{"cudagraph_capture_sizes":[1,2,4,8,16,32]}' \
  --additional-config='{"dynamic_eplb":true, "num_iterations_eplb_update":100, "num_wait_worker_iterations":100}'
```

input & output: 2k / 2k

This PR:
<img width="1318" height="695" alt="fusion" src="https://github.com/user-attachments/assets/f8657813-0c02-42f4-8396-d99e730f48cd" />

Baseline:
<img width="1323" height="690" alt="baseline" src="https://github.com/user-attachments/assets/e1323a78-af26-4523-820c-e20e5642a38e" />

- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

---------

Signed-off-by: 白永斌 <baiyongbin3@h-partners.com>
Signed-off-by: 欧派果奶我还要 <845473182@qq.com>
Co-authored-by: 白永斌 <baiyongbin3@h-partners.com>


github-actions Bot added the merge-conflicts and module:tests labels on Oct 27, 2025

ChenxiQ force-pushed the br_gmm_swiglu_quant_tensor_list branch from 371d194 to 552768b on October 28, 2025 11:38

ChenxiQ force-pushed the br_gmm_swiglu_quant_tensor_list branch from e45ed90 to e937a53 on November 11, 2025 03:21

github-actions Bot removed the merge-conflicts label on Nov 11, 2025

SlightwindSec reviewed on Nov 11, 2025

ChenxiQ force-pushed the br_gmm_swiglu_quant_tensor_list branch 2 times, most recently from 9489122 to 88bea2a on November 14, 2025 08:13

845473182 mentioned this pull request on Nov 17, 2025

[EPLB][Ops] Integerate grouped_matmul_swiglu_quant_weight_nz_tensor_list operator into dynamic EPLB #4216

Merged

ChenxiQ force-pushed the br_gmm_swiglu_quant_tensor_list branch 11 times, most recently from 9aa862c to 9d7c602 on November 20, 2025 12:34

github-actions Bot added the module:core label on Nov 20, 2025

ChenxiQ force-pushed the br_gmm_swiglu_quant_tensor_list branch 6 times, most recently from d38807b to 43c6df9 on November 21, 2025 02:52

github-actions Bot added the ci/build label on Nov 21, 2025

ChenxiQ force-pushed the br_gmm_swiglu_quant_tensor_list branch from 331376a to f2975ad on November 27, 2025 02:21

github-actions Bot added the merge-conflicts label on Nov 27, 2025

ChenxiQ added 5 commits November 27, 2025 22:03

add custom op GmmSwigluQuantWeightNZTensorList

8e1db25

Signed-off-by: QianChenxi <chenxi.qian.cq@outlook.com>

minimal build script

9b9dd79

Signed-off-by: QianChenxi <chenxi.qian.cq@outlook.com>

add doc

da6bc72

Signed-off-by: QianChenxi <chenxi.qian.cq@outlook.com>

rename op

3c1be1f

Signed-off-by: QianChenxi <chenxi.qian.cq@outlook.com>

adapt new soc_version env

49203ae

Signed-off-by: QianChenxi <chenxi.qian.cq@outlook.com>

ChenxiQ force-pushed the br_gmm_swiglu_quant_tensor_list branch from f2975ad to 523c2e0 on November 27, 2025 14:06

github-actions Bot removed the merge-conflicts label on Nov 27, 2025

ChenxiQ force-pushed the br_gmm_swiglu_quant_tensor_list branch from 523c2e0 to d91595a on November 27, 2025 14:15

adapt new torch aclnn adapter

b5015a8

Signed-off-by: QianChenxi <chenxi.qian.cq@outlook.com>

ChenxiQ force-pushed the br_gmm_swiglu_quant_tensor_list branch from d91595a to b5015a8 on November 27, 2025 15:28

zzzzwwjj approved these changes on Nov 28, 2025

wangxiyuan approved these changes on Nov 28, 2025

wangxiyuan merged commit 554f16a into vllm-project:main on Nov 28, 2025
22 checks passed

Angazenn added a commit to Angazenn/vllm-ascend that referenced this pull request Nov 30, 2025

Revert "[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (vllm…

5684329

…-project#3804)" This reverts commit 554f16a.

wangxiyuan merged 6 commits into vllm-project:main from ChenxiQ:br_gmm_swiglu_quant_tensor_list
