[0-size Tensor No.318] Add 0-size Tensor support for paddle.Tensor.matmul API. #72798
PR Category
Execute Infrastructure
PR Types
New features
Description
Add 0-size Tensor support for paddle.Tensor.matmul.
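The changes were made as follows:
Searching the PaddleAPITest report/0size_tensor directory for paddle.Tensor.matmul error logs turned up an [accuracy error], which suggests the forward pass is at fault. Note the message (shapes (0, 100, 1, 40), (0, 100, 40) mismatch): the output shape does not match the reference shape.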
`2025-03-05 15:27:11.945992 test begin: paddle.Tensor.matmul(Tensor([0, 100, 1],"float64"), Tensor([0, 1, 40],"float64"), )
[accuracy error] paddle.Tensor.matmul(Tensor([0, 100, 1],"float64"), Tensor([0, 1, 40],"float64"), )
Not equal to tolerance rtol=0.01, atol=0.01
(shapes (0, 100, 1, 40), (0, 100, 40) mismatch)
x: array([], shape=(0, 100, 1, 40), dtype=float64)
y: array([], shape=(0, 100, 40), dtype=float64)`
Forward fix:
a. Searching the Paddle code for def matmul shows that the core implementation calls matmul from _C_ops.
b. Searching paddle/phi/ops/yaml for the _C_ops matmul shows that matmul uses the InferMeta function MatmulInferMeta:
`- op: matmul
  args : (Tensor x, Tensor y)
  output : Tensor(out)
  infer_meta :
    func : MatmulInferMeta
    param: [x, y, false, false]`
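c. Search the code for MatmulInferMeta and check whether its dims (shape) inference is correct. For matmul the inference is correct, so no change is needed there.
d. Search paddle/phi/kernels for matmul to find every kernel that implements it. Five files involve matmul.
The .cc and .cu files both include the two .h files as headers, so only the .h files need to change. The definition does not need to be (and must not be) duplicated across matmul_kernel.h and matmul_kernel_impl.h, so only matmul_kernel_impl.h was modified.
Note: the remaining files only include paddle/phi/kernels/matmul_kernel.h as a header and do not contain the string matmulkernel, so they are not considered.
In paddle/phi/kernels/impl/matmul_kernel_impl.h, the original code was deleted because it was wrong, and the following code was added to complete the fix: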
`if (x.numel() == 0 || y.numel() == 0) {
  auto x_dims = x.dims();
  auto y_dims = y.dims();
}`
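To make the guard's trigger condition concrete, here is a minimal pure-Python sketch. It is not the kernel code, and the helper names `numel` and `hits_zero_size_guard` are hypothetical. It also flags one case worth keeping in mind: when only the contracted dimension is zero (for example shapes (2, 0) and (0, 3)), both inputs are empty but the mathematical result is a non-empty (2, 3) matrix of zeros, so an early-return branch would still have to fill the output. Whether the branch in this PR does so is not shown in the snippet above.

```python
from math import prod


def numel(shape):
    """Number of elements in a tensor of the given shape (1 for a scalar)."""
    return prod(shape)


def hits_zero_size_guard(x_shape, y_shape):
    """Mirror of `x.numel() == 0 || y.numel() == 0` on plain shape tuples."""
    return numel(x_shape) == 0 or numel(y_shape) == 0


# Shapes from the failing log entry: the batch dimension is 0.
print(hits_zero_size_guard((0, 100, 1), (0, 1, 40)))  # True
# Ordinary non-empty inputs do not take the early-return branch.
print(hits_zero_size_guard((2, 3), (3, 4)))  # False
# Only the contracted dimension is 0: the inputs are empty,
# yet the (2, 3) output still holds 6 elements.
print(hits_zero_size_guard((2, 0), (0, 3)), numel((2, 3)))  # True 6
```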
Add unit tests:
In test/legacy_test/test_matmul_op.py, add unit tests with 0-size tensor inputs:
`import paddle

# All test cases
# Format: (x_shape, y_shape, expected_out_shape, dtype)
test_cases = [
    ((0, 100, 1), (0, 1, 40), (0, 100, 40), "float64"),
    ((0, 100, 1), (0, 1, 4), (0, 100, 4), "float64"),
    ((0, 100, 1), (1, 1, 40), (0, 100, 40), "float64"),
    ((0, 100, 1), (1, 1, 4), (0, 100, 4), "float64"),
    ((0, 12, 197, 197), (0, 12, 197, 64), (0, 12, 197, 64), "float16"),
    ((0, 12, 197, 197), (0, 12, 197, 64), (0, 12, 197, 64), "float32"),
    ((1, 0, 1), (1, 1, 40), (1, 0, 40), "float64"),
    ((1, 0, 1), (1, 1, 4), (1, 0, 4), "float64"),
    ((1, 100, 1), (0, 1, 40), (0, 100, 40), "float64"),
    ((1, 100, 1), (0, 1, 4), (0, 100, 4), "float64"),
    ((1, 100, 1), (1, 1, 0), (1, 100, 0), "float64"),
    ((112, 0, 197, 197), (112, 0, 197, 64), (112, 0, 197, 64), "float16"),
    ((112, 0, 197, 197), (112, 0, 197, 64), (112, 0, 197, 64), "float32"),
    ((112, 12, 0, 197), (112, 12, 197, 64), (112, 12, 0, 64), "float16"),
    ((112, 12, 0, 197), (112, 12, 197, 64), (112, 12, 0, 64), "float32"),
    ((112, 12, 197, 197), (112, 12, 197, 0), (112, 12, 197, 0), "float16"),
    ((112, 12, 197, 197), (112, 12, 197, 0), (112, 12, 197, 0), "float32"),
]


def run_all_matmul_tests():
    for idx, (x_shape, y_shape, expected_shape, dtype) in enumerate(test_cases):
        print(f"\nTest {idx+1} begin: paddle.Tensor.matmul(Tensor({x_shape}), Tensor({y_shape}), dtype={dtype})")
        try:
            x = paddle.zeros(x_shape, dtype=dtype)
            y = paddle.zeros(y_shape, dtype=dtype)
            result = x.matmul(y)
            # Assumed check: the output shape must match the expected shape.
            assert tuple(result.shape) == expected_shape, f"got {result.shape}"
        except Exception as e:
            print(f"Test {idx+1} failed: {e}")


if __name__ == "__main__":
    run_all_matmul_tests()`
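Note: the original code contained an error.
std::vector<std::int64_t> out_dims(x_dims.size() - 1 + y_dims.size() - 1); is correct for 2-D matmul, but it gives the wrong output rank for multi-dimensional (batched) matmul. For two 3-D inputs it yields rank 4, which is exactly the (0, 100, 1, 40) shape seen in the error log.
The correct approach is to take the broadcast result for the batch dimensions and to matrix-multiply the last two dimensions. The output rank is therefore the broadcast batch rank plus 2 (the last two dimensions).
The rewritten code supports input tensors of any rank >= 2.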
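The shape rule described above can be sketched in plain Python. This is an illustration of the rule, not the C++ code from the PR. The function names `matmul_out_shape` and `old_rank` are hypothetical, and the sketch assumes no transpose flags and inputs of rank >= 2.

```python
def matmul_out_shape(x_shape, y_shape):
    """Output shape of a batched matmul for inputs of rank >= 2.

    Batch dimensions (all but the last two) are broadcast; the last two
    dimensions follow the usual (M, K) @ (K, N) -> (M, N) rule.
    """
    if len(x_shape) < 2 or len(y_shape) < 2:
        raise ValueError("this sketch only covers rank >= 2 inputs")
    if x_shape[-1] != y_shape[-2]:
        raise ValueError("contracted dimensions do not match")

    x_batch, y_batch = tuple(x_shape[:-2]), tuple(y_shape[:-2])
    rank = max(len(x_batch), len(y_batch))
    # Left-pad the shorter batch shape with 1s so both have the same rank.
    x_batch = (1,) * (rank - len(x_batch)) + x_batch
    y_batch = (1,) * (rank - len(y_batch)) + y_batch

    out_batch = []
    for a, b in zip(x_batch, y_batch):
        if a != b and a != 1 and b != 1:
            raise ValueError(f"batch dims {a} and {b} cannot be broadcast")
        # A size-1 dim stretches to the other size, including to 0.
        out_batch.append(b if a == 1 else a)
    return tuple(out_batch) + (x_shape[-2], y_shape[-1])


def old_rank(x_shape, y_shape):
    """Rank produced by the original x_dims.size() - 1 + y_dims.size() - 1."""
    return len(x_shape) - 1 + len(y_shape) - 1


x_shape, y_shape = (0, 100, 1), (1, 1, 4)
print(matmul_out_shape(x_shape, y_shape))  # (0, 100, 4)
print(old_rank(x_shape, y_shape))          # 4, one dimension too many
print(old_rank((2, 3), (3, 4)))            # 2, correct only in the 2-D case
```

Note that a batch dimension of size 1 broadcasts against a size-0 dimension to give 0, which is why (1, 100, 1) with (0, 1, 40) yields (0, 100, 40).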
A minimal example test follows:
`import paddle

# Test data
x = paddle.randn([0, 100, 1])  # shape: [0, 100, 1]
y = paddle.randn([1, 1, 4])  # shape: [1, 1, 4]
result = x.matmul(y)
print(result.shape)  # Output: [0, 100, 4], not the previous [0, 100, 1, 4]`