
[Op] DeepSeekV3.2 support bmm_transpose operator#4631

Merged
wangxiyuan merged 3 commits into vllm-project:main from rjg-lyh:pr_bmmtrans on Dec 8, 2025

Conversation

@ZYang6263 (Collaborator) commented Dec 2, 2025

What this PR does / why we need it?

Add bmm_transpose operator support for DeepSeekV3.2.

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: ZYang6263 <zy626375@gmail.com>
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a new bmm_transpose operator to replace torch_npu.npu_transpose_batchmatmul in _v_up_proj. While the Python-side change is straightforward, my review of the C++ implementation for the new operator (torch.ops._C_ascend.batch_matmul_transpose) revealed critical issues related to out-of-bounds memory access and multi-device safety. These issues, detailed in the line comment, could lead to runtime errors and must be addressed before this change is merged.

    res = torch.empty((b, self.num_heads, self.v_head_dim),
                      dtype=x.dtype,
                      device=x.device)
    torch.ops._C_ascend.batch_matmul_transpose(x, self.W_UV, res)
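For reference, the semantics the fused operator is expected to provide (a per-head batched matmul whose result is written out in token-major, transposed layout, as `torch_npu.npu_transpose_batchmatmul` did before) can be sketched in plain NumPy. The shapes below are illustrative assumptions about `_v_up_proj`, not the kernel's actual tiling or layout:

```python
import numpy as np

def batch_matmul_transpose_ref(x, w):
    """Reference semantics (assumption): batched matmul over the head
    dimension, followed by a transpose into (tokens, heads, v_head_dim).

    x: (num_heads, num_tokens, kv_lora_rank)
    w: (num_heads, kv_lora_rank, v_head_dim)
    returns: (num_tokens, num_heads, v_head_dim)
    """
    # np.matmul broadcasts over the leading (head) batch dimension.
    out = np.matmul(x, w)            # (num_heads, num_tokens, v_head_dim)
    return out.transpose(1, 0, 2)    # the kernel fuses this transpose into the write

# Illustrative shapes only.
heads, tokens, rank, v_dim = 4, 3, 8, 5
x = np.random.randn(heads, tokens, rank).astype(np.float32)
w = np.random.randn(heads, rank, v_dim).astype(np.float32)
res = batch_matmul_transpose_ref(x, w)
assert res.shape == (tokens, heads, v_dim)
```

Fusing the transpose into the matmul's output write is what saves the extra memory pass that a separate `bmm` + `transpose` would cost.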

critical

The C++ implementation of the batch_matmul_transpose operator has a couple of critical issues that could lead to runtime errors.

  1. Potential for out-of-bounds access: In csrc/batch_matmul_transpose/op_host/batch_matmul_transpose.h, the tiling data for different batch sizes is cached in a static array global_tiling_data of size MAX_CAPTURE_NUM (1024). The index into this array, batchIdx, is derived from the number of tokens (opShape.m). If the number of tokens exceeds 1024, which is common during prefill, batchIdx falls outside the array, and the bounds check below fails with a runtime error.

    // csrc/batch_matmul_transpose/op_host/batch_matmul_transpose.h:104
    int32_t batchIdx = opShape.m - 1;
    ...
    if (batchIdx >= 0 && batchIdx < MAX_CAPTURE_NUM) { // MAX_CAPTURE_NUM is 1024
        ...
    } else {
        TORCH_CHECK(false, "batchIdx is out of range: ", batchIdx);
    }
  2. Not safe for multi-device execution: The global_tiling_data is a static variable, meaning it is initialized only once; its device is fixed to the device of the input tensor on the first call. In a multi-device environment where workers on different NPUs call this operator, every call not on that initial device will hit a device mismatch error.

    // csrc/batch_matmul_transpose/op_host/batch_matmul_transpose.h:106
    static auto global_tiling_data = at::empty(
        {tilingSize * MAX_CAPTURE_NUM}, at::TensorOptions().dtype(at::kByte).device(tensor_a.options().device()));

    A potential solution for the multi-device issue is to use a thread-safe map from device index to the tiling data tensor, for example using std::map with a std::mutex.

Given these issues, the underlying implementation of this operator needs to be revised before it can be safely used.
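The reviewer's suggested fix for the multi-device issue (in the real C++ code, a std::map from device index to tiling tensor guarded by a std::mutex) can be sketched in Python. The names and the bytearray factory here are illustrative stand-ins for at::empty(..., device=...), not the actual implementation:

```python
import threading

class PerDeviceTilingCache:
    """Sketch of the suggested fix: one tiling buffer per device,
    lazily created under a lock, so workers on different devices
    never share (or race on) a single static buffer.
    """
    def __init__(self, make_buffer):
        self._make_buffer = make_buffer   # factory: device_index -> buffer
        self._buffers = {}                # device_index -> buffer
        self._lock = threading.Lock()

    def get(self, device_index):
        # Double-checked lookup: fast path without the lock, then
        # re-check under the lock before creating the buffer.
        buf = self._buffers.get(device_index)
        if buf is None:
            with self._lock:
                buf = self._buffers.get(device_index)
                if buf is None:
                    buf = self._make_buffer(device_index)
                    self._buffers[device_index] = buf
        return buf

# Illustrative factory standing in for a per-device tiling tensor.
cache = PerDeviceTilingCache(lambda dev: bytearray(1024))
assert cache.get(0) is cache.get(0)       # cached: same buffer per device
assert cache.get(0) is not cache.get(1)   # distinct buffers across devices
```

In the C++ version, the map lookup and insertion would both happen under a std::lock_guard<std::mutex>, which also sidesteps the thread-safety caveats of function-local statics being mutated after initialization.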

@ZYang6263 ZYang6263 changed the title [Op] Support bmm_transpose operator [Op] DeepSeekV3.2 support bmm_transpose operator Dec 2, 2025
github-actions bot commented Dec 2, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@hust17yixuan (Contributor) commented:

These ops may have problems on higher CANN versions; consider merging later.

github-actions bot commented Dec 6, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: ZYang6263 <50876451+ZYang6263@users.noreply.github.com>
@wangxiyuan wangxiyuan merged commit a433f32 into vllm-project:main Dec 8, 2025
16 checks passed
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 9, 2025
### What this PR does / why we need it?
DeepSeekV3.2 support bmm_transpose operator.

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

Signed-off-by: ZYang6263 <zy626375@gmail.com>
Signed-off-by: ZYang6263 <50876451+ZYang6263@users.noreply.github.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: tanqingshan (A) <50050625@china.huawei.com>
weijinqian0 pushed a commit to weijinqian0/vllm-ascend that referenced this pull request Dec 9, 2025
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 10, 2025
Mercykid-bash pushed a commit to Mercykid-bash/vllm-ascend that referenced this pull request Dec 10, 2025
4 participants