
Conversation

@crashbussy
Contributor

@crashbussy crashbussy commented May 19, 2025

PR Category

Execute Infrastructure

PR Types

New features

Description

Support 0-size tensors in paddle.Tensor.matmul.

The change history is as follows:

Searching the PaddleAPITest report/0size_tensor logs for paddle.Tensor.matmul turned up an [accuracy error], which suggests the forward pass is wrong. Note the message (shapes (0, 100, 1, 40), (0, 100, 40) mismatch), i.e. a shape mismatch:
```
2025-03-05 15:27:11.945992 test begin: paddle.Tensor.matmul(Tensor([0, 100, 1],"float64"), Tensor([0, 1, 40],"float64"), )

[accuracy error] paddle.Tensor.matmul(Tensor([0, 100, 1],"float64"), Tensor([0, 1, 40],"float64"), )

Not equal to tolerance rtol=0.01, atol=0.01

(shapes (0, 100, 1, 40), (0, 100, 40) mismatch)
 x: array([], shape=(0, 100, 1, 40), dtype=float64)
 y: array([], shape=(0, 100, 40), dtype=float64)
```
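The expected shape for this case can be cross-checked against NumPy, which applies the same batched-matmul broadcasting rules even when a dimension is 0 (a quick sanity check, not part of the fix itself):

```python
import numpy as np

# The failing case from the log: the batch dims (0,) and (0,) broadcast
# to (0,), and the matrix dims (100, 1) @ (1, 40) give (100, 40).
x = np.zeros((0, 100, 1), dtype=np.float64)
y = np.zeros((0, 1, 40), dtype=np.float64)
out = np.matmul(x, y)
print(out.shape)  # (0, 100, 40) -- the reference shape Paddle should produce
```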
Forward fix:
a. Searched the Paddle codebase for `def matmul`; the core implementation dispatches to `_C_ops.matmul`.
b. Searched for that op in paddle/phi/ops/yaml; its InferMeta function is MatmulInferMeta:

```yaml
- op : matmul
  args : (Tensor x, Tensor y)
  output : Tensor(out)
  infer_meta :
    func : MatmulInferMeta
    param : [x, y, false, false]
```

c. Searched the code for MatmulInferMeta and checked whether its dims (shape) inference is correct; for matmul it is, so no change was needed there.
d. Searched paddle/phi/kernels for matmul to find every kernel implementing it. Five files are involved:

  • paddle/phi/kernels/cpu/matmul_kernel.cc
  • paddle/phi/kernels/gpu/matmul_kernel.cu
  • paddle/phi/kernels/impl/matmul_kernel_impl.h
  • paddle/phi/kernels/matmul_kernel.h
  • paddle/phi/kernels/gpu/weight_only_linear_grad_kernel.cu
The .cc and .cu files both include the first two .h files as headers, so only the .h files need changing. Since matmul_kernel.h and matmul_kernel_impl.h must not (and cannot) repeat the definition, only matmul_kernel_impl.h was modified.

Note that the following files only include paddle/phi/kernels/matmul_kernel.h as a header and do not contain a MatmulKernel identifier, so they were left alone:

  • paddle/phi/kernels/impl/svdvals_grad_kernel_impl.h
  • paddle/phi/kernels/impl/eigvalsh_grad_kernel_impl.h
  • paddle/phi/kernels/impl/lstsq_kernel_impl.h
  • paddle/phi/kernels/impl/eigh_grad_kernel_impl.h
  • paddle/phi/kernels/impl/qr_grad_kernel_impl.h
  • paddle/phi/kernels/sparse/gpu/fused_attention_kernel.cu

In paddle/phi/kernels/impl/matmul_kernel_impl.h, the original (buggy) code was removed and the following was added to complete the fix:
```cpp
if (x.numel() == 0 || y.numel() == 0) {
  auto x_dims = x.dims();
  auto y_dims = y.dims();

  // Fold the transpose flags into the local copies of the dims, so the
  // batch/m/n extraction below can treat both inputs as untransposed.
  // (x_dims and y_dims are non-const copies, so no const_cast is needed.)
  if (transpose_x && x_dims.size() >= 2) {
    std::swap(x_dims[x_dims.size() - 1], x_dims[x_dims.size() - 2]);
  }
  if (transpose_y && y_dims.size() >= 2) {
    std::swap(y_dims[y_dims.size() - 1], y_dims[y_dims.size() - 2]);
  }

  std::vector<int64_t> x_batch_dims(x_dims.data(),
                                    x_dims.data() + x_dims.size() - 2);
  std::vector<int64_t> y_batch_dims(y_dims.data(),
                                    y_dims.data() + y_dims.size() - 2);

  std::vector<int64_t> bcast_dims;
  if (!funcs::BroadcastTwoVec(x_batch_dims, y_batch_dims, &bcast_dims)) {
    PADDLE_THROW(phi::errors::InvalidArgument(
        "Failed to broadcast input batch dimensions."));
  }

  std::vector<int64_t> out_shape(bcast_dims.begin(), bcast_dims.end());

  // The transposes were already applied above, so m and n are simply the
  // last two matrix dimensions (checking transpose_x/transpose_y again
  // here would transpose twice).
  int64_t m = x_dims[x_dims.size() - 2];
  int64_t n = y_dims[y_dims.size() - 1];

  out_shape.push_back(m);
  out_shape.push_back(n);

  out->Resize(make_ddim(out_shape));
  ctx.template Alloc<T>(out);
  return;
}
```
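The shape logic this patch implements (fold in the transposes, broadcast the batch dims, then append m and n) can be sketched in Python. `broadcast_batch` here is a hypothetical stand-in for `funcs::BroadcastTwoVec`, written with NumPy-style rules:

```python
def broadcast_batch(a, b):
    # NumPy-style broadcasting of two batch-dim lists (hypothetical
    # stand-in for funcs::BroadcastTwoVec in the C++ patch).
    a = [1] * (len(b) - len(a)) + list(a)
    b = [1] * (len(a) - len(b)) + list(b)
    out = []
    for da, db in zip(a, b):
        if da != db and da != 1 and db != 1:
            raise ValueError("batch dims do not broadcast")
        out.append(db if da == 1 else da)
    return out

def matmul_out_shape(x_dims, y_dims, trans_x=False, trans_y=False):
    # Fold the transposes into local copies, then broadcast the batch
    # dims and append the matrix dims m and n.
    x, y = list(x_dims), list(y_dims)
    if trans_x:
        x[-1], x[-2] = x[-2], x[-1]
    if trans_y:
        y[-1], y[-2] = y[-2], y[-1]
    return broadcast_batch(x[:-2], y[:-2]) + [x[-2], y[-1]]

print(matmul_out_shape((0, 100, 1), (1, 1, 40)))  # [0, 100, 40]
```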

Adding unit tests:

Added a unit test for 0-size tensor inputs in test/legacy_test/test_matmul_op.py:
```python
import paddle

# All test cases.
# Format: (x_shape, y_shape, expected_out_shape, dtype)
test_cases = [
    ((0, 100, 1), (0, 1, 40), (0, 100, 40), "float64"),
    ((0, 100, 1), (0, 1, 4), (0, 100, 4), "float64"),
    ((0, 100, 1), (1, 1, 40), (0, 100, 40), "float64"),
    ((0, 100, 1), (1, 1, 4), (0, 100, 4), "float64"),
    ((0, 12, 197, 197), (0, 12, 197, 64), (0, 12, 197, 64), "float16"),
    ((0, 12, 197, 197), (0, 12, 197, 64), (0, 12, 197, 64), "float32"),
    ((1, 0, 1), (1, 1, 40), (1, 0, 40), "float64"),
    ((1, 0, 1), (1, 1, 4), (1, 0, 4), "float64"),
    ((1, 100, 1), (0, 1, 40), (0, 100, 40), "float64"),
    ((1, 100, 1), (0, 1, 4), (0, 100, 4), "float64"),
    ((1, 100, 1), (1, 1, 0), (1, 100, 0), "float64"),
    ((112, 0, 197, 197), (112, 0, 197, 64), (112, 0, 197, 64), "float16"),
    ((112, 0, 197, 197), (112, 0, 197, 64), (112, 0, 197, 64), "float32"),
    ((112, 12, 0, 197), (112, 12, 197, 64), (112, 12, 0, 64), "float16"),
    ((112, 12, 0, 197), (112, 12, 197, 64), (112, 12, 0, 64), "float32"),
    ((112, 12, 197, 197), (112, 12, 197, 0), (112, 12, 197, 0), "float16"),
    ((112, 12, 197, 197), (112, 12, 197, 0), (112, 12, 197, 0), "float32"),
]

def run_all_matmul_tests():
    for idx, (x_shape, y_shape, expected_shape, dtype) in enumerate(test_cases):
        print(f"\nTest {idx + 1} begin: paddle.Tensor.matmul("
              f"Tensor({x_shape}), Tensor({y_shape}), dtype={dtype})")
        try:
            x = paddle.zeros(x_shape, dtype=dtype)
            y = paddle.zeros(y_shape, dtype=dtype)
            result = x.matmul(y)

            # Paddle's .shape is a list, so convert before comparing with
            # the expected tuple (a bare list != tuple check always fails).
            if tuple(result.shape) != expected_shape:
                raise AssertionError(
                    f"[accuracy error] paddle.Tensor.matmul("
                    f"Tensor({x_shape}), Tensor({y_shape}), dtype={dtype})\n\n"
                    f"Not equal to tolerance rtol=0.01, atol=0.01\n\n"
                    f"(shapes {result.shape}, {expected_shape} mismatch)\n"
                    f" x: array([], shape={x.shape}, dtype={dtype})\n"
                    f" y: array([], shape={y.shape}, dtype={dtype})"
                )
            print(f"[PASS] Shape is correct: {result.shape}")
        except Exception as e:
            print(e)

if __name__ == "__main__":
    run_all_matmul_tests()
```

Note: the original code had a bug. The line

```cpp
std::vector<std::int64_t> out_dims(x_dims.size() - 1 + y_dims.size() - 1);
```

is correct for 2-D matmul but wrong for batched (multi-dimensional) matmul. The correct approach is: take the broadcast result for the batch dimensions and perform the matrix multiply over the last two dimensions, so the output rank equals the broadcast batch rank plus 2 (for the last two dims).
The rewrite supports input tensors of any rank (>= 2).

Below is the simplest example test:
```python
import paddle

# Test data
x = paddle.randn([0, 100, 1])  # shape: [0, 100, 1]
y = paddle.randn([1, 1, 4])    # shape: [1, 1, 4]

result = x.matmul(y)
print(result.shape)  # prints [0, 100, 4] instead of the previous [0, 100, 1, 4]
```
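The same example agrees with NumPy's batched-matmul broadcasting, which serves as an independent cross-check of the expected shape:

```python
import numpy as np

# Batch dims (0,) and (1,) broadcast to (0,); matrix dims (100, 1) @ (1, 4)
# give (100, 4), so the result is empty with shape (0, 100, 4).
x = np.random.randn(0, 100, 1)
y = np.random.randn(1, 1, 4)
print(np.matmul(x, y).shape)  # (0, 100, 4)
```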

This gives a more robust MatmulKernel implementation that supports broadcasting rules and preserves empty dimensions (e.g. batch_size = 0), suitable for batched matmul over multi-dimensional tensors.
Add unit tests for 0-size tensor inputs to matmul
@paddle-bot

paddle-bot bot commented May 19, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label May 19, 2025
@luotao1 luotao1 added the HappyOpenSource Pro 进阶版快乐开源活动,更具挑战性的任务 label May 19, 2025
@crashbussy
Contributor Author

@wanghuancoder, when you have a moment, could you please review this PR? Thanks very much for your review!

@luotao1
Contributor

luotao1 commented May 19, 2025

@DanielSun11 please take a look.

@luotao1
Contributor

luotao1 commented May 19, 2025

@crashbussy you can check the CI logs; the build is not even passing yet.

@crashbussy
Contributor Author

@luotao1 I keep falling behind the develop branch; its constant updates break my build. I'll sync again tonight when fewer people are around and retry.

@crashbussy crashbussy closed this May 19, 2025
@crashbussy crashbussy reopened this May 19, 2025
@crashbussy
Contributor Author

Strange. I don't know why the build fails. The CI output is confusing: it reports two changed files, but neither is one of the two files I modified. This is the first time I've seen this and I'm not sure how it happened.
[CI screenshots]

@crashbussy
Contributor Author

@luotao1 help please. I don't know why the build failed; the compile errors are in files I never touched, and I can't see how they relate to my changes.

@luotao1
Contributor

luotao1 commented May 20, 2025

Does it compile locally for you?
[screenshot of the CI log]
The error here is that matmul_kernel_impl.h failed to compile.

Review comment on this diff context:

```cpp
}

template <typename T, typename Context>
void MatmulWithFlattenKernelImpl(const Context& dev_ctx,
```

Was the MatmulWithFlattenKernelImpl function deleted? It is used at line 2216.

@fangfangssj
Contributor

fangfangssj commented May 20, 2025

The Tensor normalization phase-2 activity includes tests for 0-size Tensors, and I happened to look at the matmul problems; the fix in PR #70238 still has some issues.
The symbolic inference of matmul's output shape should live in MatmulInferMeta under paddle/phi/infermeta/binary.cc, not in the corresponding kernel. The 0-size check in MatmulInferMeta (shown in the screenshot below) should be removed, and MatmulInferMeta plus the corresponding symbolic-inference code should be changed so the output shape can be computed normally.
[screenshot of the 0-size check in MatmulInferMeta]
After that, the kernel only needs a check on out: if out->numel() is 0, allocate the memory and return.
The backward op of matmul also needs matching 0-size support, with the same logic as the forward pass.
See #72637 for the detailed approach. Personally I think this is one of the harder APIs.
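The division of labor described above, shape inference in InferMeta and an early return in the kernel, can be sketched in Python pseudocode. All names here are illustrative stand-ins, not Paddle's actual API, and the broadcast is simplified to equal-rank inputs:

```python
from math import prod

def matmul_infer_meta(x_shape, y_shape):
    # All output-shape inference lives here (mirroring MatmulInferMeta in
    # paddle/phi/infermeta/binary.cc), including 0-size dims. Simplified:
    # assumes equal batch ranks and broadcastable dims.
    batch = [0 if min(a, b) == 0 else max(a, b)
             for a, b in zip(x_shape[:-2], y_shape[:-2])]
    return batch + [x_shape[-2], y_shape[-1]]

def matmul_kernel(x, y, out_shape):
    # The kernel itself only needs an early return once the output
    # is known to be 0-size: allocate the (empty) output and return.
    if prod(out_shape) == 0:
        return []  # stands in for Alloc + return in the real kernel
    ...  # normal compute path

print(matmul_infer_meta([0, 100, 1], [0, 1, 40]))  # [0, 100, 40]
```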

@crashbussy
Contributor Author

I revised it once more and the build still fails. Thanks to @fangfangssj for the detailed explanation. This change is indeed harder, and I'm not yet confident I can find the right solution, so I'll close this PR and go work on some unit-test-adding tasks instead.

@crashbussy crashbussy closed this May 20, 2025