[0-size Tensor No.318] Add 0-size Tensor support for paddle.Tensor.matmul API. #72798

crashbussy · 2025-05-19T18:38:05Z

PR Category

Execute Infrastructure

PR Types

New features

Description

Add 0-size Tensor support for paddle.Tensor.matmul.

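The modification process is as follows:

Searching the paddle.Tensor.matmul error logs in PaddleAPITest report/0size_tensor turns up an [accuracy error], which suggests the forward pass is at fault. Note the message (shapes (0, 100, 1, 40), (0, 100, 40) mismatch): the output shape is wrong.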
```
2025-03-05 15:27:11.945992 test begin: paddle.Tensor.matmul(Tensor([0, 100, 1],"float64"), Tensor([0, 1, 40],"float64"), )

[accuracy error] paddle.Tensor.matmul(Tensor([0, 100, 1],"float64"), Tensor([0, 1, 40],"float64"), )

Not equal to tolerance rtol=0.01, atol=0.01

(shapes (0, 100, 1, 40), (0, 100, 40) mismatch)
 x: array([], shape=(0, 100, 1, 40), dtype=float64)
 y: array([], shape=(0, 100, 40), dtype=float64)
```

Forward fix:

a. Searching the Paddle code for def matmul shows that the core implementation of matmul calls the _C_ops matmul.
b. Searching paddle/phi/ops/yaml for the _C_ops matmul shows that it uses the InferMeta function MatmulInferMeta:

```
- op: matmul
  args : (Tensor x, Tensor y)
  output : Tensor(out)
  infer_meta :
    func : MatmulInferMeta
    param: [x, y, false, false]
```

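c. Search the code for MatmulInferMeta and check whether its dims (shape) inference is correct (for matmul the inference is correct, so no change is needed).
d. Search paddle/phi/kernels for matmul to find every kernel that implements it. Five files involve matmul: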

paddle/phi/kernels/cpu/matmul_kernel.cc
paddle/phi/kernels/gpu/matmul_kernel.cu
paddle/phi/kernels/impl/matmul_kernel_impl.h
paddle/phi/kernels/matmul_kernel.h
paddle/phi/kernels/gpu/weight_only_linear_grad_kernel.cu
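The .cc and .cu files both include the two .h files above, so only the .h files need to change. matmul_kernel.h and matmul_kernel_impl.h must not define the same thing twice, so only matmul_kernel_impl.h was modified.

Note: the files below only include paddle/phi/kernels/matmul_kernel.h as a header and do not contain the matmulkernel identifier, so they are not considered.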

paddle/phi/kernels/impl/svdvals_grad_kernel_impl.h
paddle/phi/kernels/impl/eigvalsh_grad_kernel_impl.h
paddle/phi/kernels/impl/lstsq_kernel_impl.h
paddle/phi/kernels/impl/eigh_grad_kernel_impl.h
paddle/phi/kernels/impl/qr_grad_kernel_impl.h
paddle/phi/kernels/sparse/gpu/fused_attention_kernel.cu

In paddle/phi/kernels/impl/matmul_kernel_impl.h, the original code was deleted (it was wrong) and the following code was added to complete the fix:
```cpp
if (x.numel() == 0 || y.numel() == 0) {
  // Work on local copies so the input dims are left untouched.
  auto x_dims = x.dims();
  auto y_dims = y.dims();

  // Apply the requested transposes to the last two dims up front.
  if (transpose_x && x_dims.size() >= 2) {
    std::swap(x_dims[x_dims.size() - 1], x_dims[x_dims.size() - 2]);
  }
  if (transpose_y && y_dims.size() >= 2) {
    std::swap(y_dims[y_dims.size() - 1], y_dims[y_dims.size() - 2]);
  }

  // Batch dims are everything except the last two (assumes rank >= 2).
  std::vector<int64_t> x_batch_dims(x_dims.data(), x_dims.data() + x_dims.size() - 2);
  std::vector<int64_t> y_batch_dims(y_dims.data(), y_dims.data() + y_dims.size() - 2);

  std::vector<int64_t> bcast_dims;
  if (!funcs::BroadcastTwoVec(x_batch_dims, y_batch_dims, &bcast_dims)) {
    PADDLE_THROW(phi::errors::InvalidArgument(
        "Failed to broadcast input batch dimensions."));
  }

  std::vector<int64_t> out_shape(bcast_dims.begin(), bcast_dims.end());

  // The dims were already swapped above, so m and n sit in the same
  // positions whether or not a transpose was requested.
  int64_t m = x_dims[x_dims.size() - 2];
  int64_t n = y_dims[y_dims.size() - 1];

  out_shape.push_back(m);
  out_shape.push_back(n);

  DDim out_dims = make_ddim(out_shape);
  out->Resize(out_dims);
  ctx.template Alloc<T>(out);
  return;
}
```
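The shape logic of this early-return branch can be mirrored in a few lines of Python for quick experimentation. This is a minimal sketch, not Paddle code: `zero_size_matmul_shape` is a hypothetical helper, it assumes both inputs have rank >= 2, it uses NumPy's `np.broadcast_shapes` as a stand-in for the batch broadcast done in the kernel, and its inner-dimension check is an extra that the branch itself does not perform.

```python
import numpy as np


def zero_size_matmul_shape(x_shape, y_shape, transpose_x=False, transpose_y=False):
    """Hypothetical helper: output shape of a batched matmul (rank >= 2 inputs)."""
    x_shape, y_shape = list(x_shape), list(y_shape)
    if len(x_shape) < 2 or len(y_shape) < 2:
        raise ValueError("this sketch only covers inputs of rank >= 2")

    # Apply the transposes to the last two dims up front.
    if transpose_x:
        x_shape[-1], x_shape[-2] = x_shape[-2], x_shape[-1]
    if transpose_y:
        y_shape[-1], y_shape[-2] = y_shape[-2], y_shape[-1]

    # Extra sanity check added for this sketch only.
    if x_shape[-1] != y_shape[-2]:
        raise ValueError("inner dimensions do not match")

    # Batch dims follow ordinary broadcasting; a 0 broadcasts against a 1.
    batch = np.broadcast_shapes(tuple(x_shape[:-2]), tuple(y_shape[:-2]))
    return tuple(batch) + (x_shape[-2], y_shape[-1])


print(zero_size_matmul_shape((0, 100, 1), (0, 1, 40)))  # (0, 100, 40)
print(zero_size_matmul_shape((1, 100, 1), (0, 1, 40)))  # (0, 100, 40)
```

Because the transposes are applied before anything else, the row count is always read from the second-to-last dim of x and the column count from the last dim of y.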

Add unit tests:

Add unit tests with 0-size tensor inputs to test/legacy_test/test_matmul_op.py:
```python
import paddle

# All test cases
# Format: (x_shape, y_shape, expected_out_shape, dtype)
test_cases = [
    ((0, 100, 1), (0, 1, 40), (0, 100, 40), "float64"),
    ((0, 100, 1), (0, 1, 4), (0, 100, 4), "float64"),
    ((0, 100, 1), (1, 1, 40), (0, 100, 40), "float64"),
    ((0, 100, 1), (1, 1, 4), (0, 100, 4), "float64"),
    ((0, 12, 197, 197), (0, 12, 197, 64), (0, 12, 197, 64), "float16"),
    ((0, 12, 197, 197), (0, 12, 197, 64), (0, 12, 197, 64), "float32"),
    ((1, 0, 1), (1, 1, 40), (1, 0, 40), "float64"),
    ((1, 0, 1), (1, 1, 4), (1, 0, 4), "float64"),
    ((1, 100, 1), (0, 1, 40), (0, 100, 40), "float64"),
    ((1, 100, 1), (0, 1, 4), (0, 100, 4), "float64"),
    ((1, 100, 1), (1, 1, 0), (1, 100, 0), "float64"),
    ((112, 0, 197, 197), (112, 0, 197, 64), (112, 0, 197, 64), "float16"),
    ((112, 0, 197, 197), (112, 0, 197, 64), (112, 0, 197, 64), "float32"),
    ((112, 12, 0, 197), (112, 12, 197, 64), (112, 12, 0, 64), "float16"),
    ((112, 12, 0, 197), (112, 12, 197, 64), (112, 12, 0, 64), "float32"),
    ((112, 12, 197, 197), (112, 12, 197, 0), (112, 12, 197, 0), "float16"),
    ((112, 12, 197, 197), (112, 12, 197, 0), (112, 12, 197, 0), "float32"),
]


def run_all_matmul_tests():
    for idx, (x_shape, y_shape, expected_shape, dtype) in enumerate(test_cases):
        print(f"\nTest {idx+1} begin: paddle.Tensor.matmul(Tensor({x_shape}), Tensor({y_shape}), dtype={dtype})")
        try:
            x = paddle.zeros(x_shape, dtype=dtype)
            y = paddle.zeros(y_shape, dtype=dtype)
            result = x.matmul(y)

            # Tensor.shape is a list, so convert it before comparing with the expected tuple.
            if tuple(result.shape) != expected_shape:
                raise AssertionError(
                    f"[accuracy error] paddle.Tensor.matmul(Tensor({x_shape}), Tensor({y_shape}), dtype={dtype})\n\n"
                    f"Not equal to tolerance rtol=0.01, atol=0.01\n\n"
                    f"(shapes {result.shape}, {expected_shape} mismatch)\n"
                    f" x: array([], shape={x.shape}, dtype={dtype})\n"
                    f" y: array([], shape={y.shape}, dtype={dtype})"
                )
            else:
                print(f"[PASS] Shape is correct: {result.shape}")
        except Exception as e:
            print(f"{e}")


if __name__ == "__main__":
    run_all_matmul_tests()
```
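The error log earlier in this description prints NumPy-style arrays, so the expected shapes in the list can be sanity-checked with `np.matmul` alone, without a Paddle build. A minimal sketch, assuming NumPy is an acceptable reference here; it covers only a few of the small float64 cases, and `small_cases` and `numpy_reference_shapes` are names introduced for illustration.

```python
import numpy as np

# (x_shape, y_shape, expected_out_shape), copied from the test list above.
small_cases = [
    ((0, 100, 1), (0, 1, 40), (0, 100, 40)),
    ((0, 100, 1), (1, 1, 4), (0, 100, 4)),
    ((1, 0, 1), (1, 1, 40), (1, 0, 40)),
    ((1, 100, 1), (0, 1, 4), (0, 100, 4)),
    ((1, 100, 1), (1, 1, 0), (1, 100, 0)),
]


def numpy_reference_shapes(cases):
    """Return the shape np.matmul produces for each (x_shape, y_shape) pair."""
    shapes = []
    for x_shape, y_shape, _ in cases:
        out = np.matmul(np.zeros(x_shape), np.zeros(y_shape))
        shapes.append(out.shape)
    return shapes


for (x_shape, y_shape, expected), got in zip(small_cases, numpy_reference_shapes(small_cases)):
    status = "ok" if got == expected else "MISMATCH"
    print(f"{x_shape} @ {y_shape} -> {got} [{status}]")
```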

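Note: the original code contained an error:
std::vector<std::int64_t> out_dims(x_dims.size() - 1 + y_dims.size() - 1);
This is correct for 2-D matmul, but for multi-dimensional (batched) matmul it produces the wrong rank.
The correct approach is: take the broadcast result for the batch dimensions, then perform matrix multiplication on the last two dimensions. The output rank should therefore equal the broadcast batch rank + 2 (the last two dimensions).
The rewritten code supports input tensors of any rank (>= 2).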
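To see where the two approaches diverge, it helps to compare the rank each one produces. A minimal sketch; `old_rank` and `batched_rank` are hypothetical names, and `old_rank` only restates the arithmetic of the line quoted above.

```python
def old_rank(x_rank, y_rank):
    # Arithmetic from the original code: x_dims.size() - 1 + y_dims.size() - 1.
    return x_rank - 1 + y_rank - 1


def batched_rank(x_rank, y_rank):
    # Broadcast batch rank (the longer of the two batch ranks) plus the two matrix dims.
    return max(x_rank - 2, y_rank - 2) + 2


for x_rank, y_rank in [(2, 2), (3, 3), (4, 4)]:
    print(x_rank, y_rank, old_rank(x_rank, y_rank), batched_rank(x_rank, y_rank))
```

For the failing case both inputs have rank 3, so the old formula yields rank 4, matching the (0, 100, 1, 40) shape in the error log, while the broadcast rule yields rank 3.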

Below is a minimal example test:

```python
import paddle

# Test data
x = paddle.randn([0, 100, 1])  # shape: [0, 100, 1]
y = paddle.randn([1, 1, 4])  # shape: [1, 1, 4]

result = x.matmul(y)
print(result.shape)  # Output: [0, 100, 4], rather than the original [0, 100, 1, 4]
```

paddle-bot · 2025-05-19T18:38:09Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

crashbussy · 2025-05-19T18:51:13Z

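This is my second PR for the same task. The first PR ([0-size Tensor No.318] Add 0-size Tensor support for paddle.Tensor.matmul API. #72780) produced confusing errors in the CI logs, so I wanted to try again. Nothing in the content has changed; this is simply a resubmission.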

crashbussy added 2 commits May 20, 2025 02:34

Update matmul_kernel_impl.h

6a34638

Create test_matmul_v3_op.py

60866b5

paddle-bot bot added the contributor External developers label May 19, 2025

crashbussy added 3 commits May 20, 2025 03:08

Update matmul_kernel_impl.h

11ba28a

Update matmul_kernel_impl.h

8bf2416

Update matmul_kernel_impl.h

d42efcf

crashbussy closed this May 19, 2025
