
Conversation

@scyyh11
Contributor

@scyyh11 scyyh11 commented Oct 19, 2025

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

In commit deed9d3, two new test functions, test_check_output_2 and test_check_grad_2, were added to the Softplus unit tests.
These two functions run both the CPU and the CUDA test logic inside the GPU unit-test file:

self.check_output_with_place(paddle.CPUPlace(), check_pir=True, check_pir_onednn=True)
if core.is_compiled_with_cuda():
    self.check_output_with_place(core.CUDAPlace(0), check_pir=True, check_pir_onednn=True)

and:

self.check_grad_with_place(paddle.CPUPlace(), ['X'], 'Out', check_pir=True, check_pir_onednn=True)
if core.is_compiled_with_cuda():
    self.check_grad_with_place(core.CUDAPlace(0), ['X'], 'Out', check_pir=True, check_pir_onednn=True)

However, Softplus has no FP16/Complex kernels registered on CPU, so static-graph or PIR tests fail with:

UnimplementedError: There are no kernels which are registered in the softplus operator.

Moreover, these two _2 methods exactly duplicate the logic of the existing test_check_output and test_check_grad: they add no test coverage, only redundant runs and extra failure risk. Since this file contains GPU unit tests, CPU-related tests should not appear in it, hence this revert.
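The intended pattern for a GPU-only test file can be sketched as follows. This is a minimal stand-alone illustration, not Paddle's actual OpTest harness; `cuda_available` here is a hypothetical stand-in for `core.is_compiled_with_cuda()`, hardcoded so the sketch runs anywhere:

```python
import unittest

def cuda_available():
    # Stand-in for paddle.base.core.is_compiled_with_cuda(); hardcoded so
    # this sketch runs without Paddle installed.
    return False

class TestSoftplusFP16(unittest.TestCase):
    # A dtype whose kernel exists only on GPU is skipped outright when CUDA
    # is unavailable -- it never falls back to a CPUPlace.
    @unittest.skipUnless(cuda_available(), "FP16 Softplus kernel is GPU-only")
    def test_check_output(self):
        # Real test would call:
        # self.check_output_with_place(core.CUDAPlace(0), check_pir=True)
        pass

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSoftplusFP16)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(f"skipped={len(result.skipped)}")
```

With no CUDA available, the test is reported as skipped rather than raising UnimplementedError on an unregistered CPU kernel.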

The tolerance of TestSigmoid_Complex64 is also increased, to avoid the following error:

AssertionError: 0.0065593612 not less than or equal to 0.006
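The failing assertion compares a maximum relative error against a bound. A simplified sketch of that comparison (not the exact OpTest code; the relaxed bound of 0.007 below is an assumed illustrative value, since the PR only says the tolerance was increased):

```python
import numpy as np

def max_relative_error(numeric, analytic):
    # Elementwise |analytic - numeric| / max(|numeric|, 1), in the spirit of
    # OpTest's gradient comparison (simplified sketch, not the exact code).
    numeric = np.asarray(numeric, dtype=np.float64)
    analytic = np.asarray(analytic, dtype=np.float64)
    abs_err = np.abs(analytic - numeric)
    denom = np.maximum(np.abs(numeric), 1.0)
    return float((abs_err / denom).max())

numeric = np.array([1.000, 0.500, -0.250])    # numeric (finite-difference) grad
analytic = np.array([1.006, 0.503, -0.2515])  # analytic grad
err = max_relative_error(numeric, analytic)

# With complex64 inputs the numeric gradient is noisier, so a strict 0.006
# bound can fail intermittently (e.g. 0.0065593612); a slightly looser
# bound absorbs that noise.
assert err <= 0.007
```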

@luotao1 @YqGe585

@paddle-bot

paddle-bot bot commented Oct 19, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Oct 19, 2025
@scyyh11
Contributor Author

scyyh11 commented Oct 19, 2025

/re-run all-failed

@YqGe585
Contributor

YqGe585 commented Oct 20, 2025

This PR needs review from @zhengshengning

Contributor

@YqGe585 YqGe585 left a comment


LGTM

@scyyh11
Contributor Author

scyyh11 commented Oct 21, 2025

/re-run all-failed

@luotao1 luotao1 self-assigned this Oct 22, 2025
@luotao1 luotao1 merged commit 3e31bf4 into PaddlePaddle:develop Oct 22, 2025
170 of 190 checks passed
@scyyh11 scyyh11 deleted the fix/test_activation_op branch October 22, 2025 09:39
zrr1999 added a commit to zrr1999/Paddle that referenced this pull request Oct 29, 2025
…els/impl (#4)

* fix custom device save error (PaddlePaddle#75961)

* fix blas for custom device (PaddlePaddle#75969)

* Revert "Revert "Disable NVIDIA_TF32_OVERRIDE by default for better precision.…" (PaddlePaddle#75972)

This reverts commit 945ea69.

* [Compat] Define the macro `CHECK` only when it is not already defined (PaddlePaddle#75963)

* [DLPack] Implement dtype and device exchange protocol (PaddlePaddle#75973)

* [CppExtension] Support `os.PathLike` in `CppExtension`/`CUDAExtension` and expose `IS_WINDOWS` to `paddle.utils.cpp_extension` (PaddlePaddle#75976)

* Support md5 checksum for API output tensor (PaddlePaddle#75835)

* support md5 checksum

* fix build

* fix build

* fix build

* fix build

* dump the md5 check sum to file

* fix err

* add switch and full support md5

* add flags to control precision and refine test

* rm useless commit

* add ut

* add ut

* fix shape=int for size_args_decorator (PaddlePaddle#75983)

* fix typo disable_loggling -> disable_logging (PaddlePaddle#75978)

* fix typo disable_loggling -> disable_logging

* fix

* fix

* fix _get_arch_info (PaddlePaddle#75921)

* clean some IS_TRT_VERSION_GE(5130) (PaddlePaddle#75946)

* clean some IS_TRT_VERSION_GE(8000) (PaddlePaddle#75944)

* clean some IS_TRT_VERSION_LT(8000) (PaddlePaddle#75919)

* clean get_cuda_version < 8100 (PaddlePaddle#75895)

* clean get_cuda_version < 8100

* fix

* clean get_cuda_version() < 11020 - part (PaddlePaddle#75618)

* clean get_cuda_version() < 11020 in test_variable_length_memory_efficient_attention.py (PaddlePaddle#75600)

* clean get_cuda_version() < 11020 in test_variable_length_memory_efficient_attention.py

* fix

* clean IS_TRT_VERSION_LT(8000) in tensorrt plugin (PaddlePaddle#75920)

* fix test_dynamic_engine (PaddlePaddle#75943)

* [Bug Fix] Fix missing header include in activation_offloader.h (PaddlePaddle#75936)

* revert_mkl_num_threads (PaddlePaddle#75985)

* [Bug Fix] Improve error handling and compatibility in TensorRT engine tests (PaddlePaddle#75948)

- In test_tensorrt_engine_instruction.cc, the direct use of TensorRT's `FullyConnected` layer is replaced with a hand-built Shuffle → Constant → MatrixMultiply → ElementWise → Shuffle subnetwork that implements a biased fully connected layer equivalently. This works around limitations of the legacy FC layer in TensorRT and gives clearer control over dynamic shapes and the inference flow.
- Each step now carries a more specific `PADDLE_ENFORCE_NOT_NULL` error message, e.g. hinting at why the reshape, constant, matrix-multiply, or elementwise-add stage may have failed, so problems can be located quickly when engine building fails.
- To cover the `ICudaEngine` API change after TensorRT 8.6, a new `IS_TRT_VERSION_GE(8600)` branch checks `getNbIOTensors()` on newer versions and `getNbBindings()` on older ones, keeping the test valid across TensorRT versions.
- In the dynamic-shape test, the error message on Shuffle failure is made more precise, explicitly pointing at runtime shape-binding issues.
- The plugin test likewise improves the messages for plugin-creation and layer-insertion failures and adds the same TensorRT version-compatibility check, improving diagnosability when debugging custom plugins.
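The equivalence the rewritten test relies on, a biased FC expressed as MatrixMultiply plus ElementWise(SUM), can be checked numerically. This is an illustrative numpy sketch; shapes and layout are simplified compared to TensorRT's CHW tensors:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)  # batch of flattened inputs
W = rng.standard_normal((8, 3)).astype(np.float32)  # kernel
b = rng.standard_normal((3,)).astype(np.float32)    # bias

# Legacy "FullyConnected" layer semantics: y = x @ W + b ...
fc = x @ W + b

# ... vs the MatrixMultiply + ElementWise(SUM) replacement described above.
mm_sum = np.add(np.matmul(x, W), b)

assert np.allclose(fc, mm_sum)
```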

* 4th-batch-68: incorrect gradient computation (PaddlePaddle#75787)

* 1013

* 1015

* 1015

* 1015

* 1015

* 1015

* 1016

* 1016

* 1017

* Revert test_activation_op.py to fix bug caused by commit deed9d3 (PaddlePaddle#75937)

* Revert test_activation_op.py to fix bug caused by commit deed9d3

* fix: Update max_relative_error in TestSigmoid_Complex64 to improve gradient checking accuracy

* 4th-batch-19: incorrect code invocation (PaddlePaddle#75759)

* 1012

* 1014

* 1014

* 1016

* 1016

* 1017

* 1017

* 1018

* 1018

* 4th-batch-17: code limited in multi-device scenarios (follow-up fix) (PaddlePaddle#75959)

* 1012

* 1012

* 1020

* 【UnitTestFix No.3】fix test_conv3d_transpose_op.py (PaddlePaddle#75945)

* [Bug Fix] add missing header include in ir_context.h (PaddlePaddle#75927)

* add tensorrt 10 support int64 (PaddlePaddle#75951)

* add tensorrt 10 support int64

* fix

* [Compat] Try import `tvm_ffi` when enable torch proxy (PaddlePaddle#75991)

* clean pip3.8 in Dockerfile.develop.npu (PaddlePaddle#75893)

* clean pip3.8 in Dockerfile.develop.npu

* fix

* fix

* fix masked_fill_grad value_grad bug (PaddlePaddle#75988)

* 4th-batch-20: unused variables in code (PaddlePaddle#75761)

* 1012

* 1014

* 1014

* 1016

* 1016

* 1017

* 1017

* 1018

* 1018

* use op_test.get_cuda_version (PaddlePaddle#75994)

* merge ifdef PADDLE_WITH_CUDA in build_strategy.cc (PaddlePaddle#75962)

* [Cherry-pick] Optimize FlashMask v3 performance (PaddlePaddle#75737) (PaddlePaddle#75984)

* Optimize FlashMask v3 performance (PaddlePaddle#75737)

* tune bwd tile size

* tune bwd tile size for seqlen <= 8192

* fix cuda 700 cause by incorrect bwd tile size

* set scheduler_needs_semaphore to true

* update fa submodule

* update fa submodule

* update fa submodule

* update fa submodule

* fix codestyle

* Revert "fix codestyle"

This reverts commit e14a08e.

* fix mistach tile size in phi, and refine bwd interface

* refine

* refine

* fix codestyle

* [Stride] Disable Split Stride Kernel (PaddlePaddle#75987)

* [Stride] Disable Split Stride Kernel

* refine

* [Bug Fix] Fix NaN/Inf check to support float16, bfloat16, and complex types (PaddlePaddle#75935)

- In nan_inf_utils_detail.h, `TensorCheckerVisitor::apply` is split into several template overloads: integer types are still skipped outright; standard floating-point types keep the original check; dedicated branches are added for `phi::dtype::float16`, `phi::dtype::bfloat16`, and complex types; and a clear `VLOG` message is printed for any other unsupported type. This brings half precision, bfloat16, and other types that `std::is_floating_point` cannot identify into the NaN/Inf check.
- The new includes `<typeinfo>`, `float16.h`, and `bfloat16.h` support the type aliases and `typeid` output used in the new branches.
- The check logic previously scattered through `apply` is extracted into a private `do_check`, and the `DeviceContext` pointer is changed to `const Context*`, reducing duplication while guaranteeing the context cannot be modified by accident.
- The new "skipping unsupported type" log helps debugging: custom or uncovered data types are reported with their concrete type name in VLOG, making future extension easier.
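The dispatch idea can be illustrated with a small numpy analogue. This is a sketch only: numpy has no bfloat16, and the real implementation is the C++ template-overload mechanism described above:

```python
import numpy as np

def has_nan_or_inf(arr):
    # Works uniformly for float16/float32/float64 and complex dtypes:
    # for complex inputs, np.isfinite is False if either the real or the
    # imaginary part is non-finite. Integer arrays are skipped, matching
    # the kernel-side behavior described above. (numpy has no bfloat16,
    # so that dtype is not covered by this sketch.)
    arr = np.asarray(arr)
    if np.issubdtype(arr.dtype, np.integer):
        return False
    return not bool(np.isfinite(arr).all())

assert has_nan_or_inf(np.array([1.0, np.nan], dtype=np.float16))
assert has_nan_or_inf(np.array([1.0 + 1j * np.inf], dtype=np.complex64))
assert not has_nan_or_inf(np.array([1, 2, 3], dtype=np.int32))
```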

* [Stride] Optimizing H2D Copy by TensorIterator and OpenMP (PaddlePaddle#75192)

* cpu init

* v1

* final

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* [Precision Depth Alignment] implement torch compatible max_pool2d grad kernel (PaddlePaddle#75965)

* add torch_compatible_pool_grad

* add test

* update

* rename flag

* fix to_tensor bug (PaddlePaddle#76000)

* [CINN] Fix bug of infer_symbol_shape for crop op (PaddlePaddle#75992)

* fix bug of infer_symbol_shape for crop op

* fix unittest

* 【CUDA Kernel No.93】fix psroi_pool_grad_kernel operator (PaddlePaddle#75938)

* fix psroi_pool_grad_kernel.cu

* fix psroi_pool_grad_kernel.cu header include order

* fix win32 rms_norm. (PaddlePaddle#76007)

* Update check_approval.sh (PaddlePaddle#76012)

* Update check_approval.sh

* Update check_approval.sh

* [Fix] log sigmoid complex (PaddlePaddle#75953)

* feature: Add specialized LogSigmoidFunctor and CudaLogSigmoidFunctor for complex numbers

This commit introduces specialized implementations of LogSigmoidFunctor and CudaLogSigmoidFunctor to handle complex number inputs. The new implementations utilize direct formulas for improved accuracy and stability in calculations involving complex types.

* refactor: Optimize LogSigmoidFunctor and CudaLogSigmoidFunctor for complex types by caching exp(-x) to reduce redundant computations. This change enhances performance while maintaining accuracy in calculations.

* refactor: modified the formula in LogSigmoidFunctor to make it numerical stable
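The stability issue this last commit addresses can be seen with a real-valued analogue. This is a numpy sketch of the general technique; the actual change targets the complex-typed functors in C++:

```python
import numpy as np

def log_sigmoid_naive(x):
    # log(1 / (1 + exp(-x))): for large negative x, exp(-x) overflows to
    # inf and the result collapses to -inf.
    with np.errstate(over="ignore", divide="ignore"):
        return np.log(1.0 / (1.0 + np.exp(-x)))

def log_sigmoid_stable(x):
    # Stable form: log_sigmoid(x) = -softplus(-x)
    #            = min(x, 0) - log1p(exp(-|x|))
    # exp never sees a positive argument, so nothing overflows.
    return np.minimum(x, 0.0) - np.log1p(np.exp(-np.abs(x)))

x = np.array([-1000.0, -10.0, 0.0, 10.0])
naive = log_sigmoid_naive(x)    # -inf at x = -1000 (exp overflow)
stable = log_sigmoid_stable(x)  # -1000.0 there, as expected
```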

* [PHI] Flash Attention V3 128B aligned chunking load/store (PaddlePaddle#76003)

* [PHI] Flash Attention V3 128B aligned chunking load/store

* Update flashattn version

* [Slice] Fix big tensor (PaddlePaddle#76004)

* fix python version in ci/utils.sh (PaddlePaddle#75997)

* clean pip3.8 in Dockerfile.develop.dtk (PaddlePaddle#75738)

* fix repeat IS_TRT_VERSION_GE (PaddlePaddle#75975)

* clean IS_TRT_VERSION_GE(5000)  (PaddlePaddle#75990)

* clean IS_TRT_VERSION_GE(5000)

* ci

* Initial plan

* Fix int32 overflow in elementwise_grad_kernel_impl.h

Co-authored-by: zrr1999 <[email protected]>

* Fix int32 overflow in accuracy_check and isclose kernel impl

Co-authored-by: zrr1999 <[email protected]>

* Fix int32 overflow in renorm, unstack, kldiv, and svdvals_grad impl

Co-authored-by: zrr1999 <[email protected]>

* Fix int32 overflow in gumbel_softmax and kldiv_loss impl

Co-authored-by: zrr1999 <[email protected]>

* Fix int32 overflow in lrn and frame kernel impl

Co-authored-by: zrr1999 <[email protected]>

* Fix function signatures in lrn_kernel_impl to match int64_t parameters

Co-authored-by: zrr1999 <[email protected]>

* Add validation checks for large tensor support in LRN kernels

Co-authored-by: zrr1999 <[email protected]>

* Fix int32 overflow in stft and fold/unfold kernel impl

Co-authored-by: zrr1999 <[email protected]>

* Fix int32 overflow in lstm, lstsq, qr_grad, and spectral_norm_grad impl

Co-authored-by: zrr1999 <[email protected]>

* Fix int32 overflow in warpctc, warprnnt, gru_unit and spectral_norm impl

Co-authored-by: zrr1999 <[email protected]>

* Fix int32 overflow in svd_grad and conv kernel impl

Co-authored-by: zrr1999 <[email protected]>
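The common failure mode behind this batch of int32-overflow fixes can be reproduced in a few lines. This is a numpy sketch of the index arithmetic only; the actual fixes widen C++ loop and index variables to `int64_t`:

```python
import numpy as np

# Flat-index arithmetic for a large "tensor": with 32-bit indices the
# element-count product exceeds INT32_MAX (2147483647) and wraps around,
# while 64-bit arithmetic is exact.
rows, cols = np.int32(70000), np.int32(70000)

with np.errstate(over="ignore"):
    numel32 = rows * cols                  # wraps: 4.9e9 mod 2**32
numel64 = np.int64(rows) * np.int64(cols)  # exact: 4_900_000_000

print(int(numel32), int(numel64))
```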

---------

Co-authored-by: Yuqiang Ge <[email protected]>
Co-authored-by: Zhaowu Pan <[email protected]>
Co-authored-by: co63oc <[email protected]>
Co-authored-by: Nyakku Shigure <[email protected]>
Co-authored-by: SUN Dong <[email protected]>
Co-authored-by: HydrogenSulfate <[email protected]>
Co-authored-by: Runming Xie <[email protected]>
Co-authored-by: zhengshengning <[email protected]>
Co-authored-by: fanhaoxuee <[email protected]>
Co-authored-by: Bvicii <[email protected]>
Co-authored-by: Chen Zhiyang <[email protected]>
Co-authored-by: umiswing <[email protected]>
Co-authored-by: Eddie-Wang <[email protected]>
Co-authored-by: Zhan Rongrui <[email protected]>
Co-authored-by: wanghuancoder <[email protected]>
Co-authored-by: zyfncg <[email protected]>
Co-authored-by: xxiu1 <[email protected]>
Co-authored-by: Tao Luo <[email protected]>
Co-authored-by: Qianyue He <[email protected]>
Co-authored-by: copilot-swe-agent[bot] <[email protected]>