
Conversation

@crashbussy
Contributor

@crashbussy crashbussy commented May 19, 2025

PR Category

Execute Infrastructure

PR Types

New features

Description

Support 0-size tensors in paddle.Tensor.matmul.

The change history is as follows:

Searching the PaddleAPITest report/0size_tensor logs for paddle.Tensor.matmul turned up an [accuracy error], which suggests the forward pass is wrong. Note the message (shapes (0, 100, 1, 40), (0, 100, 40) mismatch), i.e. a shape mismatch:
```
2025-03-05 15:27:11.945992 test begin: paddle.Tensor.matmul(Tensor([0, 100, 1],"float64"), Tensor([0, 1, 40],"float64"), )

[accuracy error] paddle.Tensor.matmul(Tensor([0, 100, 1],"float64"), Tensor([0, 1, 40],"float64"), )

Not equal to tolerance rtol=0.01, atol=0.01

(shapes (0, 100, 1, 40), (0, 100, 40) mismatch)
 x: array([], shape=(0, 100, 1, 40), dtype=float64)
 y: array([], shape=(0, 100, 40), dtype=float64)
```
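The expected shape for this case can be cross-checked against NumPy, which applies the same batched-matmul broadcasting rules even when a dimension is 0 (a quick sanity check, not part of the fix itself):

```python
import numpy as np

# The failing case from the log: the batch dims (0,) and (0,) broadcast
# to (0,), and the matrix dims (100, 1) @ (1, 40) give (100, 40).
x = np.zeros((0, 100, 1), dtype=np.float64)
y = np.zeros((0, 1, 40), dtype=np.float64)
out = np.matmul(x, y)
print(out.shape)  # (0, 100, 40) -- the reference shape Paddle should produce
```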
Forward fix:
a. Searched the Paddle codebase for `def matmul`; the core implementation dispatches to `_C_ops.matmul`.
b. Searched for that op in paddle/phi/ops/yaml; its InferMeta function is MatmulInferMeta:

```yaml
- op : matmul
  args : (Tensor x, Tensor y)
  output : Tensor(out)
  infer_meta :
    func : MatmulInferMeta
    param : [x, y, false, false]
```

c. Searched the code for MatmulInferMeta and checked whether its dims (shape) inference is correct; for matmul it is, so no change was needed there.
d. Searched paddle/phi/kernels for matmul to find every kernel implementing it. Five files are involved:

  • paddle/phi/kernels/cpu/matmul_kernel.cc
  • paddle/phi/kernels/gpu/matmul_kernel.cu
  • paddle/phi/kernels/impl/matmul_kernel_impl.h
  • paddle/phi/kernels/matmul_kernel.h
  • paddle/phi/kernels/gpu/weight_only_linear_grad_kernel.cu
The .cc and .cu files both include the first two .h files as headers, so only the .h files need changing. Since matmul_kernel.h and matmul_kernel_impl.h must not (and cannot) repeat the definition, only matmul_kernel_impl.h was modified.

Note that the following files only include paddle/phi/kernels/matmul_kernel.h as a header and do not contain a MatmulKernel identifier, so they were left alone:

  • paddle/phi/kernels/impl/svdvals_grad_kernel_impl.h
  • paddle/phi/kernels/impl/eigvalsh_grad_kernel_impl.h
  • paddle/phi/kernels/impl/lstsq_kernel_impl.h
  • paddle/phi/kernels/impl/eigh_grad_kernel_impl.h
  • paddle/phi/kernels/impl/qr_grad_kernel_impl.h
  • paddle/phi/kernels/sparse/gpu/fused_attention_kernel.cu

In paddle/phi/kernels/impl/matmul_kernel_impl.h, the original (buggy) code was removed and the following was added to complete the fix:
```cpp
if (x.numel() == 0 || y.numel() == 0) {
  auto x_dims = x.dims();
  auto y_dims = y.dims();

  // Fold the transpose flags into the local copies of the dims, so the
  // batch/m/n extraction below can treat both inputs as untransposed.
  // (x_dims and y_dims are non-const copies, so no const_cast is needed.)
  if (transpose_x && x_dims.size() >= 2) {
    std::swap(x_dims[x_dims.size() - 1], x_dims[x_dims.size() - 2]);
  }
  if (transpose_y && y_dims.size() >= 2) {
    std::swap(y_dims[y_dims.size() - 1], y_dims[y_dims.size() - 2]);
  }

  std::vector<int64_t> x_batch_dims(x_dims.data(),
                                    x_dims.data() + x_dims.size() - 2);
  std::vector<int64_t> y_batch_dims(y_dims.data(),
                                    y_dims.data() + y_dims.size() - 2);

  std::vector<int64_t> bcast_dims;
  if (!funcs::BroadcastTwoVec(x_batch_dims, y_batch_dims, &bcast_dims)) {
    PADDLE_THROW(phi::errors::InvalidArgument(
        "Failed to broadcast input batch dimensions."));
  }

  std::vector<int64_t> out_shape(bcast_dims.begin(), bcast_dims.end());

  // The transposes were already applied above, so m and n are simply the
  // last two matrix dimensions (checking transpose_x/transpose_y again
  // here would transpose twice).
  int64_t m = x_dims[x_dims.size() - 2];
  int64_t n = y_dims[y_dims.size() - 1];

  out_shape.push_back(m);
  out_shape.push_back(n);

  out->Resize(make_ddim(out_shape));
  ctx.template Alloc<T>(out);
  return;
}
```
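The shape logic this patch implements (fold in the transposes, broadcast the batch dims, then append m and n) can be sketched in Python. `broadcast_batch` here is a hypothetical stand-in for `funcs::BroadcastTwoVec`, written with NumPy-style rules:

```python
def broadcast_batch(a, b):
    # NumPy-style broadcasting of two batch-dim lists (hypothetical
    # stand-in for funcs::BroadcastTwoVec in the C++ patch).
    a = [1] * (len(b) - len(a)) + list(a)
    b = [1] * (len(a) - len(b)) + list(b)
    out = []
    for da, db in zip(a, b):
        if da != db and da != 1 and db != 1:
            raise ValueError("batch dims do not broadcast")
        out.append(db if da == 1 else da)
    return out

def matmul_out_shape(x_dims, y_dims, trans_x=False, trans_y=False):
    # Fold the transposes into local copies, then broadcast the batch
    # dims and append the matrix dims m and n.
    x, y = list(x_dims), list(y_dims)
    if trans_x:
        x[-1], x[-2] = x[-2], x[-1]
    if trans_y:
        y[-1], y[-2] = y[-2], y[-1]
    return broadcast_batch(x[:-2], y[:-2]) + [x[-2], y[-1]]

print(matmul_out_shape((0, 100, 1), (1, 1, 40)))  # [0, 100, 40]
```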

Adding unit tests:

Added a unit test for 0-size tensor inputs in test/legacy_test/test_matmul_op.py:
```python
import paddle

# All test cases.
# Format: (x_shape, y_shape, expected_out_shape, dtype)
test_cases = [
    ((0, 100, 1), (0, 1, 40), (0, 100, 40), "float64"),
    ((0, 100, 1), (0, 1, 4), (0, 100, 4), "float64"),
    ((0, 100, 1), (1, 1, 40), (0, 100, 40), "float64"),
    ((0, 100, 1), (1, 1, 4), (0, 100, 4), "float64"),
    ((0, 12, 197, 197), (0, 12, 197, 64), (0, 12, 197, 64), "float16"),
    ((0, 12, 197, 197), (0, 12, 197, 64), (0, 12, 197, 64), "float32"),
    ((1, 0, 1), (1, 1, 40), (1, 0, 40), "float64"),
    ((1, 0, 1), (1, 1, 4), (1, 0, 4), "float64"),
    ((1, 100, 1), (0, 1, 40), (0, 100, 40), "float64"),
    ((1, 100, 1), (0, 1, 4), (0, 100, 4), "float64"),
    ((1, 100, 1), (1, 1, 0), (1, 100, 0), "float64"),
    ((112, 0, 197, 197), (112, 0, 197, 64), (112, 0, 197, 64), "float16"),
    ((112, 0, 197, 197), (112, 0, 197, 64), (112, 0, 197, 64), "float32"),
    ((112, 12, 0, 197), (112, 12, 197, 64), (112, 12, 0, 64), "float16"),
    ((112, 12, 0, 197), (112, 12, 197, 64), (112, 12, 0, 64), "float32"),
    ((112, 12, 197, 197), (112, 12, 197, 0), (112, 12, 197, 0), "float16"),
    ((112, 12, 197, 197), (112, 12, 197, 0), (112, 12, 197, 0), "float32"),
]

def run_all_matmul_tests():
    for idx, (x_shape, y_shape, expected_shape, dtype) in enumerate(test_cases):
        print(f"\nTest {idx + 1} begin: paddle.Tensor.matmul("
              f"Tensor({x_shape}), Tensor({y_shape}), dtype={dtype})")
        try:
            x = paddle.zeros(x_shape, dtype=dtype)
            y = paddle.zeros(y_shape, dtype=dtype)
            result = x.matmul(y)

            # Paddle's .shape is a list, so convert before comparing with
            # the expected tuple (a bare list != tuple check always fails).
            if tuple(result.shape) != expected_shape:
                raise AssertionError(
                    f"[accuracy error] paddle.Tensor.matmul("
                    f"Tensor({x_shape}), Tensor({y_shape}), dtype={dtype})\n\n"
                    f"Not equal to tolerance rtol=0.01, atol=0.01\n\n"
                    f"(shapes {result.shape}, {expected_shape} mismatch)\n"
                    f" x: array([], shape={x.shape}, dtype={dtype})\n"
                    f" y: array([], shape={y.shape}, dtype={dtype})"
                )
            print(f"[PASS] Shape is correct: {result.shape}")
        except Exception as e:
            print(e)

if __name__ == "__main__":
    run_all_matmul_tests()
```

Note: the original code had a bug. The line

```cpp
std::vector<std::int64_t> out_dims(x_dims.size() - 1 + y_dims.size() - 1);
```

is correct for 2-D matmul but wrong for batched (multi-dimensional) matmul. The correct approach is: take the broadcast result for the batch dimensions and perform the matrix multiply over the last two dimensions, so the output rank equals the broadcast batch rank plus 2 (for the last two dims).
The rewrite supports input tensors of any rank (>= 2).

Below is the simplest example test:
```python
import paddle

# Test data
x = paddle.randn([0, 100, 1])  # shape: [0, 100, 1]
y = paddle.randn([1, 1, 4])    # shape: [1, 1, 4]

result = x.matmul(y)
print(result.shape)  # prints [0, 100, 4] instead of the previous [0, 100, 1, 4]
```
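The same example agrees with NumPy's batched-matmul broadcasting, which serves as an independent cross-check of the expected shape:

```python
import numpy as np

# Batch dims (0,) and (1,) broadcast to (0,); matrix dims (100, 1) @ (1, 4)
# give (100, 4), so the result is empty with shape (0, 100, 4).
x = np.random.randn(0, 100, 1)
y = np.random.randn(1, 1, 4)
print(np.matmul(x, y).shape)  # (0, 100, 4)
```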

This gives a more robust MatmulKernel implementation that supports broadcasting rules and preserves empty dimensions (e.g. batch_size = 0), suitable for batched matmul over multi-dimensional tensors.
Add unit tests for 0-size tensor inputs to matmul
@paddle-bot

paddle-bot bot commented May 19, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label May 19, 2025
@luotao1 luotao1 added the HappyOpenSource Pro 进阶版快乐开源活动,更具挑战性的任务 label May 19, 2025
@crashbussy
Contributor Author

@wanghuancoder, when you have a moment, could you please review this PR? Thanks very much for your review!

@luotao1
Contributor

luotao1 commented May 19, 2025

@DanielSun11 please take a look.

@luotao1
Contributor

luotao1 commented May 19, 2025

@crashbussy you can check the CI logs; the build is not even passing yet.

@crashbussy
Contributor Author

@luotao1 I keep falling behind the develop branch; its constant updates break my build. I'll sync again tonight when fewer people are around and retry.

@crashbussy crashbussy closed this May 19, 2025
@crashbussy crashbussy reopened this May 19, 2025
@crashbussy
Contributor Author

Strange. I don't know why the build fails. The CI output is confusing: it reports two changed files, but neither is one of the two files I modified. This is the first time I've seen this and I'm not sure how it happened.
[CI screenshots]

@crashbussy
Contributor Author

@luotao1 help please. I don't know why the build failed; the compile errors are in files I never touched, and I can't see how they relate to my changes.

@luotao1
Contributor

luotao1 commented May 20, 2025

Does it compile locally for you?
[screenshot of the CI log]
The error here is that matmul_kernel_impl.h failed to compile.

Review comment on this diff context:

```cpp
}

template <typename T, typename Context>
void MatmulWithFlattenKernelImpl(const Context& dev_ctx,
```

Was the MatmulWithFlattenKernelImpl function deleted? It is used at line 2216.

@fangfangssj
Contributor

fangfangssj commented May 20, 2025

The Tensor normalization phase-2 activity includes tests for 0-size Tensors, and I happened to look at the matmul problems; the fix in PR #70238 still has some issues.
The symbolic inference of matmul's output shape should live in MatmulInferMeta under paddle/phi/infermeta/binary.cc, not in the corresponding kernel. The 0-size check in MatmulInferMeta (shown in the screenshot below) should be removed, and MatmulInferMeta plus the corresponding symbolic-inference code should be changed so the output shape can be computed normally.
[screenshot of the 0-size check in MatmulInferMeta]
After that, the kernel only needs a check on out: if out->numel() is 0, allocate the memory and return.
The backward op of matmul also needs matching 0-size support, with the same logic as the forward pass.
See #72637 for the detailed approach. Personally I think this is one of the harder APIs.
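The division of labor described above, shape inference in InferMeta and an early return in the kernel, can be sketched in Python pseudocode. All names here are illustrative stand-ins, not Paddle's actual API, and the broadcast is simplified to equal-rank inputs:

```python
from math import prod

def matmul_infer_meta(x_shape, y_shape):
    # All output-shape inference lives here (mirroring MatmulInferMeta in
    # paddle/phi/infermeta/binary.cc), including 0-size dims. Simplified:
    # assumes equal batch ranks and broadcastable dims.
    batch = [0 if min(a, b) == 0 else max(a, b)
             for a, b in zip(x_shape[:-2], y_shape[:-2])]
    return batch + [x_shape[-2], y_shape[-1]]

def matmul_kernel(x, y, out_shape):
    # The kernel itself only needs an early return once the output
    # is known to be 0-size: allocate the (empty) output and return.
    if prod(out_shape) == 0:
        return []  # stands in for Alloc + return in the real kernel
    ...  # normal compute path

print(matmul_infer_meta([0, 100, 1], [0, 1, 40]))  # [0, 100, 40]
```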

@crashbussy
Contributor Author

I revised it once more and the build still fails. Thanks to @fangfangssj for the detailed explanation. This change is indeed harder, and I'm not yet confident I can find the right solution, so I'll close this PR and go work on some unit-test-adding tasks instead.

@crashbussy crashbussy closed this May 20, 2025