@xiaohajiayou xiaohajiayou commented Sep 7, 2025

PR Category

Operator Mechanism

PR Types

Bug Fixes

Description

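Problem description

  • In 5_accuracy/accuracy_gpu_error.txt, none of the test cases for No.17 paddle.nn.functional.conv2d can be reproduced; the issue may already have been fixed by someone else.
  • In 5_accuracy/accuracy_cpu_error.txt, most test cases for No.18 paddle.nn.functional.conv2d_transpose likewise cannot be reproduced, but a few cases that use data_format="NHWC" with padding > 0 still fail.
  • This PR fixes the gradient computation error of the conv2d_transpose API under the NHWC data format. When data_format="NHWC" and padding > 0, the backward gradients are shifted by a wrong row offset, so they land in the wrong positions.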

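Root cause

The problem is in the im2col_sh1sw1dh1dw1ph1pw1 function in paddle/phi/kernels/funcs/im2col_cfo_cpu.h. The NHWC branches use an incorrect index computation:

// Incorrect code
im_data[(((oh - plh > 0 ? oh - plh : 0) + kh) * im_width + ...) * im_channels + ic]

When oh - plh < 0 (the padding region), the expression (oh - plh > 0 ? oh - plh : 0) wrongly maps the negative index to 0, so gradients propagate to the wrong positions.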
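The effect of the clamp can be seen in isolation with a small Python sketch. It is not the Paddle kernel: the function names clamped_row and checked_row and the sizes are hypothetical, chosen only to mirror the row expression quoted above.

```python
def clamped_row(oh, kh, plh):
    """Row index in the style of the old NHWC branch: clamp first, then add kh."""
    base = oh - plh if oh - plh > 0 else 0
    return base + kh


def checked_row(oh, kh, plh, im_height):
    """Row index with a bounds check; None marks a padding position."""
    im_row = oh - plh + kh
    return im_row if 0 <= im_row < im_height else None


# Hypothetical sizes for illustration only: 4 input rows, top padding of 1.
im_height, plh = 4, 1
for kh in range(3):
    # For the first output row (oh = 0) the two forms disagree on every kh.
    print(kh, clamped_row(0, kh, plh), checked_row(0, kh, plh, im_height))
```

For oh = 0 the clamped form reads rows 0, 1, 2 where the checked form yields padding, 0, 1, which is a one-row shift. Once oh >= plh the two forms agree, which would explain why only padded configurations are affected.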

Fix: compute the correct index directly and bounds-check it:

// Fixed code
int im_row = oh - plh + kh;
int im_col = kw - plw + kow;
if (im_row >= 0 && im_row < im_height && im_col >= 0 && im_col < im_width) {
    dst_data[...] = im_data[(im_row * im_width + im_col) * im_channels + ic];
} else {
    dst_data[...] = static_cast<T>(0);
}
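As a reference for the bounds-checked approach, here is a minimal pure-Python im2col over an H x W x C nested list with stride 1 and dilation 1. It is a sketch under stated assumptions: the function im2col_nhwc and its output layout (one row per output position, values in kh, kw, channel order) are hypothetical and do not reproduce Paddle's buffer layout.

```python
def im2col_nhwc(im, kh_size, kw_size, pad):
    """Stride-1, dilation-1 im2col over an H x W x C nested list.

    Returns one row per output position; each row lists the patch values in
    (kh, kw, channel) order. This layout is an assumption for illustration.
    """
    h, w, c = len(im), len(im[0]), len(im[0][0])
    out_h = h + 2 * pad - kh_size + 1
    out_w = w + 2 * pad - kw_size + 1
    cols = []
    for oh in range(out_h):
        for ow in range(out_w):
            patch = []
            for kh in range(kh_size):
                for kw in range(kw_size):
                    # Compute the true input coordinates, then bounds-check.
                    im_row = oh - pad + kh
                    im_col = ow - pad + kw
                    inside = 0 <= im_row < h and 0 <= im_col < w
                    for ic in range(c):
                        # Padding positions contribute zero.
                        patch.append(im[im_row][im_col][ic] if inside else 0)
            cols.append(patch)
    return cols


# 2 x 2 single-channel image, 3 x 3 kernel, padding 1.
image = [[[1], [2]], [[3], [4]]]
cols = im2col_nhwc(image, 3, 3, 1)
print(cols[0])  # patch centred on the top-left pixel
```

The patch for the top-left output position has a full row and column of zeros from padding, with the real pixels in their unshifted places.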

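Details of the change:
1. File modified: paddle/phi/kernels/funcs/im2col_cfo_cpu.h
2. Location: the 4 NHWC branches in the im2col_sh1sw1dh1dw1ph1pw1 function
3. Change: replace the incorrect ternary-operator index computation with a direct computation plus a bounds check
4. Scope: only affects the backward pass of conv2d_transpose in NHWC format on CPU
5. New tests: add the TestWithSAMEPad_NHWC and TestWithSAMEPadGroups_NHWC test cases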

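Unit test and PaddleAPITest regression results

Unit test results:

  • Running python test_conv2d_transpose_op.py: 194 of 195 tests pass (the assertion self.assertRaises(AssertionError, error_0_filter_number) fails, but this appears unrelated to this issue)
  • Core test TestWithGroups_NHWC: ✅ passed
  • New test TestWithSAMEPad_NHWC: ✅ passed
  • New test TestWithSAMEPadGroups_NHWC: ✅ passed
  • All NHWC-related tests pass, with no functional regressions

PaddleAPITest regression results:

  • Test file: all conv2d_transpose cases in 5_accuracy/accuracy_cpu_error.txt
  • Before the fix: configurations with NHWC + padding > 0 fail
  • After the fix: all test configurations pass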

paddle-bot bot commented Sep 7, 2025

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

CLAassistant commented Sep 7, 2025

CLA assistant check
All committers have signed the CLA.

@paddle-bot paddle-bot bot added the contributor External developers label Sep 7, 2025
@xiaohajiayou xiaohajiayou changed the title [Accuracy diff No.114] Fix accuracy diff for conv2d_transpose API with NHWC format 【Hackathon 9th No.17、18】 Fix accuracy diff for conv2d_transpose API with NHWC format Sep 7, 2025
…format

Fix gradient calculation error in conv2d_transpose when using NHWC format
with padding > 0. The issue was in im2col_cfo_cpu.h where incorrect index
calculation caused gradients to be shifted to wrong positions.

Key changes:
- Replace incorrect ternary operator index calculation with direct
  calculation and boundary checking in NHWC branches
- Add TestWithSAMEPad_NHWC and TestWithSAMEPadGroups_NHWC test cases
- Ensure gradients match PyTorch reference implementation
- Fix code formatting to meet clang-format requirements
@xiaohajiayou xiaohajiayou force-pushed the fix-conv2d-transpose-nhwc-gradient branch from 7d47d08 to 49c8e68 Compare September 8, 2025 14:19
@xiaohajiayou

/re-run all-failed

@lshpku lshpku merged commit 254b277 into PaddlePaddle:develop Sep 10, 2025
103 of 107 checks passed

luotao1 commented Sep 10, 2025

hi, @xiaohajiayou

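  • Thank you very much for your contribution to PaddlePaddle. We run an organization called PFCC, the PaddlePaddle open-source contributors' club, which only developers who have had code merged into PaddlePaddle can join. The club holds a regular meeting every two weeks (attendance is optional) and occasionally organizes offline meetups. For details, see the profile page at https://github.com/luotao1.
  • If you are interested in PFCC, please send an email to [email protected] and we will invite you to join.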

luotao1 commented Sep 12, 2025
