clean some IS_TRT_VERSION_LT(8000) #75919
Merged
Conversation
Your PR was submitted successfully. Thank you for contributing to the open source project!
luotao1 (Contributor) approved these changes on Oct 22, 2025 and left a comment:
LGTM from @yuanlehome
zrr1999 added a commit to zrr1999/Paddle that referenced this pull request on Oct 29, 2025:
…els/impl (#4)

* fix custom device save error (PaddlePaddle#75961)
* fix blas for custom device (PaddlePaddle#75969)
* Revert "Revert "Disable NVIDIA_TF32_OVERRIDE by default for better precision.…" (PaddlePaddle#75972). This reverts commit 945ea69.
* [Compat] Define the macro `CHECK` only when it is not already defined (PaddlePaddle#75963)
* [DLPack] Implement dtype and device exchange protocol (PaddlePaddle#75973)
* [CppExtension] Support `os.PathLike` in `CppExtension`/`CUDAExtension` and expose `IS_WINDOWS` to `paddle.utils.cpp_extension` (PaddlePaddle#75976)
* Support md5 checksum for API output tensor (PaddlePaddle#75835): support md5 checksum, dump the checksum to file, add a switch and flags to control precision, refine tests, add unit tests, plus build fixes
* fix shape=int for size_args_decorator (PaddlePaddle#75983)
* fix typo disable_loggling -> disable_logging (PaddlePaddle#75978)
* fix _get_arch_info (PaddlePaddle#75921)
* clean some IS_TRT_VERSION_GE(5130) (PaddlePaddle#75946)
* clean some IS_TRT_VERSION_GE(8000) (PaddlePaddle#75944)
* clean some IS_TRT_VERSION_LT(8000) (PaddlePaddle#75919)
* clean get_cuda_version < 8100 (PaddlePaddle#75895)
* clean get_cuda_version() < 11020 - part (PaddlePaddle#75618)
* clean get_cuda_version() < 11020 in test_variable_length_memory_efficient_attention.py (PaddlePaddle#75600)
* clean IS_TRT_VERSION_LT(8000) in tensorrt plugin (PaddlePaddle#75920)
* fix test_dynamic_engine (PaddlePaddle#75943)
* [Bug Fix] Fix missing header include in activation_offloader.h (PaddlePaddle#75936)
* revert_mkl_num_threads (PaddlePaddle#75985)
* [Bug Fix] Improve error handling and compatibility in TensorRT engine tests (PaddlePaddle#75948); a sketch of the rebuilt sub-network appears after this commit message:
  - In test_tensorrt_engine_instruction.cc, the test previously used TensorRT's `FullyConnected` layer directly; it now builds a Shuffle → Constant → MatrixMultiply → ElementWise → Shuffle sub-network by hand that implements the bias-added fully connected layer equivalently. This works around the limitations of the legacy FC layer in TensorRT and gives clearer control over dynamic shapes and the inference flow.
  - Each step now raises a more specific `PADDLE_ENFORCE_NOT_NULL` message, pointing out why the reshape, constant, matrix-multiply, or addition layer may have failed, so problems can be located quickly when engine building fails.
  - To follow the `ICudaEngine` API change in TensorRT 8.6, a new `IS_TRT_VERSION_GE(8600)` branch checks `getNbIOTensors()` on newer versions and `getNbBindings()` on older ones, so the test validates correctly across TensorRT versions.
  - The dynamic-shape test reports Shuffle failures more precisely, making clear that the problem lies in runtime shape binding.
  - The plugin test likewise improves the messages for plugin creation and layer insertion failures and adds the same TensorRT version compatibility check, making custom plugins easier to diagnose.
* 4th-batch-68: incorrect gradient computation in code (PaddlePaddle#75787)
* Revert test_activation_op.py to fix bug caused by commit deed9d3 (PaddlePaddle#75937); also update max_relative_error in TestSigmoid_Complex64 to improve gradient checking accuracy
* 4th-batch-19: incorrect code invocation (PaddlePaddle#75759)
* 4th-batch-17: code limits multi-device scenarios (follow-up fix) (PaddlePaddle#75959)
* [UnitTestFix No.3] fix test_conv3d_transpose_op.py (PaddlePaddle#75945)
* [Bug Fix] add missing header include in ir_context.h (PaddlePaddle#75927)
* add tensorrt 10 support int64 (PaddlePaddle#75951)
* [Compat] Try import `tvm_ffi` when enable torch proxy (PaddlePaddle#75991)
* clean pip3.8 in Dockerfile.develop.npu (PaddlePaddle#75893)
* fix masked_fill_grad value_grad bug (PaddlePaddle#75988)
* 4th-batch-20: unused variables in code (PaddlePaddle#75761)
* use op_test.get_cuda_version (PaddlePaddle#75994)
* merge ifdef PADDLE_WITH_CUDA in build_strategy.cc (PaddlePaddle#75962)
* [Cherry-pick] Optimize FlashMask v3 performance (PaddlePaddle#75737) (PaddlePaddle#75984): tune the bwd tile size (including for seqlen <= 8192), fix CUDA error 700 caused by an incorrect bwd tile size, set scheduler_needs_semaphore to true, update the fa submodule, fix a mismatched tile size in phi, and refine the bwd interface
* [Stride] Disable Split Stride Kernel (PaddlePaddle#75987)
* [Bug Fix] Fix NaN/Inf check to support float16, bfloat16, and complex types (PaddlePaddle#75935); see the dispatch sketch after this commit message:
  - In nan_inf_utils_detail.h, `TensorCheckerVisitor::apply` is split into several template overloads: integral types are still skipped; standard floating-point types keep the original check; dedicated branches are added for `phi::dtype::float16`, `phi::dtype::bfloat16`, and complex types; other unsupported types print an explicit `VLOG`. Half-precision and bfloat16 values, which `std::is_floating_point` could not identify before, are now covered by the NaN/Inf check.
  - The new `<typeinfo>`, `float16.h`, and `bfloat16.h` includes support the type aliases and `typeid` output used in these branches.
  - The check logic previously spread through `apply` is extracted into a private `do_check`, and the `DeviceContext` pointer becomes `const Context*`, reducing duplication while ensuring the context cannot be modified by mistake.
  - The new "skipping unsupported type" log helps debugging: custom or uncovered data types are reported by name in VLOG, which makes extension easier.
* [Stride] Optimizing H2D Copy by TensorIterator and OpenMP (PaddlePaddle#75192)
* [Precision Depth Alignment] implement torch compatible max_pool2d grad kernel (PaddlePaddle#75965): add torch_compatible_pool_grad, add test, rename flag
* fix to_tensor bug (PaddlePaddle#76000)
* [CINN] Fix bug of infer_symbol_shape for crop op (PaddlePaddle#75992)
* [CUDA Kernel No.93] fix the psroi_pool_grad_kernel operator (PaddlePaddle#75938): fix psroi_pool_grad_kernel.cu and its header include order
* fix win32 rms_norm. (PaddlePaddle#76007)
* Update check_approval.sh (PaddlePaddle#76012)
* [Fix] log sigmoid complex (PaddlePaddle#75953): add specialized LogSigmoidFunctor and CudaLogSigmoidFunctor for complex inputs using direct formulas for improved accuracy and stability, cache exp(-x) to reduce redundant computation, and adjust the formula for numerical stability
* [PHI] Flash Attention V3 128B aligned chunking load/store (PaddlePaddle#76003); also update the flashattn version
* [Slice] Fix big tensor (PaddlePaddle#76004)
* fix python version in ci/utils.sh (PaddlePaddle#75997)
* clean pip3.8 in Dockerfile.develop.dtk (PaddlePaddle#75738)
* fix repeat IS_TRT_VERSION_GE (PaddlePaddle#75975)
* clean IS_TRT_VERSION_GE(5000) (PaddlePaddle#75990)
* Initial plan
* Fix int32 overflow in elementwise_grad_kernel_impl.h
* Fix int32 overflow in accuracy_check and isclose kernel impl
* Fix int32 overflow in renorm, unstack, kldiv, and svdvals_grad impl
* Fix int32 overflow in gumbel_softmax and kldiv_loss impl
* Fix int32 overflow in lrn and frame kernel impl
* Fix function signatures in lrn_kernel_impl to match int64_t parameters
* Add validation checks for large tensor support in LRN kernels
* Fix int32 overflow in stft and fold/unfold kernel impl
* Fix int32 overflow in lstm, lstsq, qr_grad, and spectral_norm_grad impl
* Fix int32 overflow in warpctc, warprnnt, gru_unit and spectral_norm impl
* Fix int32 overflow in svd_grad and conv kernel impl
  (each of these overflow commits was co-authored by zrr1999; they all follow the same int -> int64_t pattern, shown in a sketch after this commit message)

---------

Co-authored-by: Yuqiang Ge <[email protected]>
Co-authored-by: Zhaowu Pan <[email protected]>
Co-authored-by: co63oc <[email protected]>
Co-authored-by: Nyakku Shigure <[email protected]>
Co-authored-by: SUN Dong <[email protected]>
Co-authored-by: HydrogenSulfate <[email protected]>
Co-authored-by: Runming Xie <[email protected]>
Co-authored-by: zhengshengning <[email protected]>
Co-authored-by: fanhaoxuee <[email protected]>
Co-authored-by: Bvicii <[email protected]>
Co-authored-by: Chen Zhiyang <[email protected]>
Co-authored-by: umiswing <[email protected]>
Co-authored-by: Eddie-Wang <[email protected]>
Co-authored-by: Zhan Rongrui <[email protected]>
Co-authored-by: wanghuancoder <[email protected]>
Co-authored-by: zyfncg <[email protected]>
Co-authored-by: xxiu1 <[email protected]>
Co-authored-by: Tao Luo <[email protected]>
Co-authored-by: Qianyue He <[email protected]>
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
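The PaddlePaddle#75948 item above rebuilds the test network without the removed FullyConnected layer and branches on the engine-introspection API by TensorRT version. Below is a minimal, hedged sketch of that pattern; `network`, `engine`, `input`, `w`, `b`, `k`, `n`, and `out_dims` are assumed names, not the actual identifiers in test_tensorrt_engine_instruction.cc.

```cpp
// Sketch only: an FC-style matmul-plus-bias built from explicit layers,
// replacing the legacy IFullyConnectedLayer.
// Assumed context: nvinfer1::INetworkDefinition* network, ICudaEngine* engine,
// nvinfer1::ITensor* input, nvinfer1::Weights w /*K*N values*/, b /*N values*/,
// int k, n (FC input/output widths), nvinfer1::Dims out_dims.
nvinfer1::IShuffleLayer* reshape_in = network->addShuffle(*input);
reshape_in->setReshapeDimensions(nvinfer1::Dims2{-1, k});  // flatten to [M, K]

nvinfer1::IConstantLayer* weight =
    network->addConstant(nvinfer1::Dims2{k, n}, w);        // weight matrix [K, N]
nvinfer1::IMatrixMultiplyLayer* matmul = network->addMatrixMultiply(
    *reshape_in->getOutput(0), nvinfer1::MatrixOperation::kNONE,
    *weight->getOutput(0), nvinfer1::MatrixOperation::kNONE);

nvinfer1::IConstantLayer* bias = network->addConstant(nvinfer1::Dims2{1, n}, b);
nvinfer1::IElementWiseLayer* with_bias = network->addElementWise(
    *matmul->getOutput(0), *bias->getOutput(0),
    nvinfer1::ElementWiseOperation::kSUM);                  // add the bias row

nvinfer1::IShuffleLayer* reshape_out =
    network->addShuffle(*with_bias->getOutput(0));
reshape_out->setReshapeDimensions(out_dims);                // restore the layout

// TensorRT 8.6 replaced binding-based introspection with named IO tensors,
// hence the version branch mentioned in the commit message.
#if IS_TRT_VERSION_GE(8600)
const int num_io = engine->getNbIOTensors();
#else
const int num_io = engine->getNbBindings();
#endif
```

In the actual test, each add* call above would be wrapped in the `PADDLE_ENFORCE_NOT_NULL` checks the item describes, so a null layer pointer reports exactly which construction step failed.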
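The PaddlePaddle#75935 item splits the NaN/Inf checker into per-dtype overloads. The standalone sketch below mirrors only that dispatch idea; it is not TensorCheckerVisitor, and the handling of float16/bfloat16 (widening to float before testing) is an assumption rather than something stated in the patch.

```cpp
// Hedged sketch of per-dtype NaN/Inf dispatch: integers are skipped, standard
// floats are checked, complex values check both parts, everything else is
// only logged (the real code adds dedicated float16/bfloat16 branches).
#include <cmath>
#include <complex>
#include <cstdio>
#include <type_traits>
#include <typeinfo>

template <typename T>
struct is_std_complex : std::false_type {};
template <typename T>
struct is_std_complex<std::complex<T>> : std::true_type {};

template <typename T>
void CheckNanInf(const T* data, size_t n, const char* name) {
  if constexpr (std::is_integral_v<T>) {
    return;  // integers cannot hold NaN/Inf; skip, as in the original logic
  } else if constexpr (std::is_floating_point_v<T>) {
    for (size_t i = 0; i < n; ++i) {
      if (!std::isfinite(data[i])) {
        std::printf("NaN/Inf found in %s\n", name);
        return;
      }
    }
  } else if constexpr (is_std_complex<T>::value) {
    for (size_t i = 0; i < n; ++i) {
      if (!std::isfinite(data[i].real()) || !std::isfinite(data[i].imag())) {
        std::printf("NaN/Inf found in %s\n", name);
        return;
      }
    }
  } else {
    // The Paddle patch adds dedicated branches for phi::dtype::float16 and
    // bfloat16 here (assumed to widen to float) and VLOGs unsupported types.
    std::printf("skip NaN/Inf check for unsupported type %s\n",
                typeid(T).name());
  }
}
```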
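The run of "Fix int32 overflow in ... kernel impl" commits applies one recurring change: index and extent arithmetic inside kernel loops moves from int to int64_t so tensors with more than 2^31 - 1 elements index correctly. A tiny illustration of the pattern (hypothetical code, not taken from those kernels):

```cpp
#include <cstdint>

// Before the fix, `int` row/col arithmetic wraps around once rows * cols
// exceeds INT_MAX; with 64-bit indices the offset stays in range.
void AddBias(float* out, const float* bias, int64_t rows, int64_t cols) {
  for (int64_t r = 0; r < rows; ++r) {
    for (int64_t c = 0; c < cols; ++c) {
      out[r * cols + c] += bias[c];  // r * cols computed in 64-bit
    }
  }
}
```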
PR Category: User Experience
PR Types: Others
Description: clean some IS_TRT_VERSION_LT(8000)
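For context, IS_TRT_VERSION_LT(8000) guards code paths that only run on TensorRT older than 8.0; presumably because those versions are no longer supported, the guarded branches are dead and the guard itself can be dropped. A hypothetical before/after illustrating the kind of cleanup this PR performs (the function names are placeholders, not taken from the PR's diff):

```cpp
// Before: both branches are compiled, but the first can never be taken on a
// supported TensorRT build (>= 8.0).
#if IS_TRT_VERSION_LT(8000)
  ConvertWithLegacyTrt7Api(layer);   // dead code on TensorRT >= 8.0
#else
  ConvertWithTrt8Api(layer);
#endif

// After: the guard and the pre-8.0 branch are removed.
ConvertWithTrt8Api(layer);
```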