@WanRui37 WanRui37 commented Sep 22, 2025

PR Category

Execute Infrastructure

PR Types

Bug fixes

Description

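  1. Added the missing public_python_api
  2. Added the missing prim_type
  3. Removed the unnecessary mean_all
  4. Removed the unnecessary mean_wrapper and the reduce classes
  5. Fixed the grad error for the complex64 and complex128 types, which was caused by nan values in the input
  6. Fixed the grad error for the float64 type, which was caused by nan values in the input
  • Some follow-up optimization is still needed: the code should be simplified through inheritance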
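
Items 5 and 6 attribute the grad errors to nan values in the test input. The PR does not show the failing check itself, so the following is only a generic NumPy sketch, assuming a finite-difference style gradient check. The helper numeric_grad_of_mean and the sample arrays are hypothetical and are not taken from Paddle's OpTest. It illustrates how a single nan makes every numeric gradient entry nan, so a comparison against the analytic gradient 1/n cannot pass.

```python
import numpy as np


def numeric_grad_of_mean(x, delta=1e-3):
    """Central-difference estimate of d(mean(x))/dx, one element at a time."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        plus, minus = x.copy(), x.copy()
        plus.flat[i] += delta
        minus.flat[i] -= delta
        grad.flat[i] = (plus.mean() - minus.mean()) / (2 * delta)
    return grad


x_clean = np.array([1.0, 2.0, 3.0, 4.0])
x_nan = np.array([1.0, np.nan, 3.0, 4.0])  # one nan in the input

# Analytic gradient of mean over n elements is 1/n for every element.
analytic = np.full(4, 1.0 / 4)

clean_ok = np.allclose(numeric_grad_of_mean(x_clean), analytic)
nan_ok = np.allclose(numeric_grad_of_mean(x_nan), analytic)

print("clean input matches analytic gradient:", clean_ok)
print("nan input matches analytic gradient:", nan_ok)
```

With a nan present, every perturbed mean is itself nan, which is why keeping nan out of the generated test inputs is a plausible way to make such a check meaningful.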

paddle-bot bot commented Sep 22, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Sep 22, 2025
@luotao1 luotao1 added the HappyOpenSource 快乐开源活动issue与PR label Sep 23, 2025
self.outputs = {'Out': out_np}


class TestMeanOp_Int32ZeroSize(OpTest):
Member

Could you explain why these cases are being removed?

Contributor Author

As I recall, grad is not supported for int types. I will keep refining this in follow-up changes.


def test_checkout_grad(self):
self.check_grad(['X'], 'Out', check_pir=True, check_prim_pir=True)
place = core.CUDAPlace(0)
Member

Prefer get_device_place from op_test to obtain the place, so that the unit test can support more hardware without affecting correctness on GPU.

Contributor Author

@WanRui37 WanRui37 Sep 23, 2025

OK, mentor.

or not core.is_float16_supported(get_device_place()),
"core is not compiled with CUDA",
)
class TestReduceMeanOp(OpTest):
Member

Do not remove the ReduceMean-related unit tests.

Contributor Author

@WanRui37 WanRui37 Sep 23, 2025

OK, mentor. I will finish fixing this in follow-up commits.

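WanRui37 commented Sep 24, 2025

Hello, R&D mentor,

  1. Why TestMeanOp_Int32ZeroSize, the int-type ZeroSize case, was removed:

    • The mean_all kernel has no int-type support, so the test fails with the following error:
    NotFoundError: The kernel (mean) with key (GPU, Undefined(AnyLayout), int64) is not found and GPU kernel cannot fallback to CPU one
    
    • Even after adding int support, the results still do not match:
    x: array(-9223372036854775808, dtype=int64)
    y: array(nan)
    

    nan is a special floating-point value. An int type can only substitute its minimum value for it, so the two results cannot be compared this way. That is why I removed the case.

  2. place = core.CUDAPlace(0) has been replaced with get_device_place.

  3. The Reduce tests have been added back, with check_prim adjusted.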

@YqGe585
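
The mismatch described in item 1 can be reproduced in outline with NumPy alone. This is a minimal sketch, not the actual OpTest code: empty_mean_reference and can_represent_nan are hypothetical helper names. It shows that the NumPy reference for the mean of a zero-size int64 array is a float64 nan, which no integer dtype can represent.

```python
import warnings

import numpy as np


def empty_mean_reference(dtype):
    """NumPy reference value for the mean of a zero-size array of `dtype`."""
    x = np.zeros((0,), dtype=dtype)
    with warnings.catch_warnings():
        # NumPy emits RuntimeWarnings for the mean of an empty slice;
        # silence them so the demo output stays readable.
        warnings.simplefilter("ignore", RuntimeWarning)
        return np.mean(x)


def can_represent_nan(dtype):
    """True only for floating-point and complex dtypes."""
    return bool(np.issubdtype(np.dtype(dtype), np.inexact))


ref = empty_mean_reference(np.int64)
print("reference value:", ref, "dtype:", ref.dtype)
print("int64 can hold nan:", can_represent_nan(np.int64))
print("float64 can hold nan:", can_represent_nan(np.float64))
print("int64 minimum:", np.iinfo(np.int64).min)
```

The value -9223372036854775808 in the log above equals np.iinfo(np.int64).min, which is consistent with the explanation that an int result can only carry its minimum value where the reference holds nan.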

luotao1 commented Sep 24, 2025

> Hello, teaching assistant

The correct title is "R&D mentor" 😄

@WanRui37
Contributor Author

Hello, R&D mentor. I got the title wrong, sorry about that.

YqGe585 commented Sep 24, 2025

Understood. In that case the int zero-size case can be removed. Please leave the data types in the kernel unchanged.

@WanRui37
Contributor Author

OK, mentor. Once CI passes, I will remove the data types I added.

@WanRui37
Contributor Author

Hello, R&D mentor. Neither of the following two CI failures is related to mean. Is it enough for me to just remove the int data types from the kernel?

- CI / Linux-DCU / Test (pull_request)
```
test_no_grad (Failed)
========================================
There are failed tests, which have been executed re-run,but success rate is less than 50%:
Summary Failed Tests... 
========================================
The following tests FAILED: 
                1121 - test_cdist (Timeout)
                497 - test_standalone_cross_step_overlap (Timeout)
Error: Process completed with exit code 8.
```
- CI-Build / Slice / Slice test (pull_request)
```
slice测试失败, 存在性能下降case, 失败case性能变化: {'Setitem - forward - Scalar - Tuple of Integers - float16 - paddle': -0.3087517129091814}
Update successful
Traceback (most recent call last):
File "/paddle/PaddleTest/framework/slice_benchmark/run.py", line 224, in <module>
    test.ci_test()
File "/paddle/PaddleTest/framework/slice_benchmark/run.py", line 164, in ci_test
    raise Exception("slice测试失败")
Exception: slice测试失败
```

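YqGe585 commented Sep 25, 2025

Yes. After removing the types, push a new commit to re-trigger CI. The failures may be caused by random factors and are probably unrelated to your changes. If CI still fails afterwards, you can comment /re-run all-failed to re-run the failed CI pipelines. If it fails again, you will need to check which change caused it.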

@WanRui37
Contributor Author

Thanks, mentor. I have pushed a new commit.

@YqGe585 YqGe585 left a comment

LGTM

@luotao1 luotao1 merged commit 7fb1efb into PaddlePaddle:develop Sep 26, 2025
70 of 74 checks passed