CudnnNormConvolution is no longer supported on NVIDIA Hopper GPUs #48203

Merged
merged 3 commits into from
Nov 22, 2022
8 changes: 8 additions & 0 deletions paddle/fluid/operators/fused/cudnn_norm_conv.cu.h
@@ -45,6 +45,14 @@ struct NormConvolutionArgs {
int stride,
int dilation,
int group) {
+ PADDLE_ENFORCE_LT(
Contributor:

A quick question: to further optimize ResNet50 performance, several fused operators were implemented with the cudnnFusedOpsPlan_t interfaces. The code lives in cudnn_bn_stats_finalize.cu.h, cudnn_norm_conv.cu.h, and cudnn_scale_bias_add_relu.cu.h, with the shared classes in cudnn_fusion_helper.h. Is cudnn_norm_conv.cu.h the only one that is no longer supported?

Contributor Author:

So far on H100, only the tests related to cudnn_norm_conv.cu.h have been seen to fail; the others show no problems for now.

+ ctx.GetComputeCapability(),
+ 90,
+ phi::errors::PreconditionNotMet(
+ "Expect compute capability to be less than 90, but got %d. "
+ "CUDNN FusedOps is no longer available on H100 and later "
+ "devices.",
+ ctx.GetComputeCapability()));
PADDLE_ENFORCE_EQ(
input_shape.size(),
4U,
8 changes: 4 additions & 4 deletions paddle/fluid/operators/fused/cudnn_norm_conv_test.cc
@@ -439,7 +439,7 @@ TEST(CudnnNormConvFp16, K1S1) {
phi::GPUContext *ctx = static_cast<phi::GPUContext *>(
platform::DeviceContextPool::Instance().Get(platform::CUDAPlace(0)));

- if (ctx->GetComputeCapability() < 70) {
+ if (ctx->GetComputeCapability() < 70 || ctx->GetComputeCapability() >= 90) {
ASSERT_THROW(test.CheckForward(1e-3, true),
paddle::platform::EnforceNotMet);
ASSERT_THROW(test.CheckBackward(1e-3, true),
@@ -469,7 +469,7 @@ TEST(CudnnNormConvFp16, K3S1) {
phi::GPUContext *ctx = static_cast<phi::GPUContext *>(
platform::DeviceContextPool::Instance().Get(platform::CUDAPlace(0)));

- if (ctx->GetComputeCapability() < 70) {
+ if (ctx->GetComputeCapability() < 70 || ctx->GetComputeCapability() >= 90) {
ASSERT_THROW(test.CheckForward(1e-3, true),
paddle::platform::EnforceNotMet);
ASSERT_THROW(test.CheckBackward(1e-3, true),
@@ -499,7 +499,7 @@ TEST(CudnnNormConvFp16, K1S1O4) {
phi::GPUContext *ctx = static_cast<phi::GPUContext *>(
platform::DeviceContextPool::Instance().Get(platform::CUDAPlace(0)));

- if (ctx->GetComputeCapability() < 70) {
+ if (ctx->GetComputeCapability() < 70 || ctx->GetComputeCapability() >= 90) {
ASSERT_THROW(test.CheckForward(1e-3, true),
paddle::platform::EnforceNotMet);
ASSERT_THROW(test.CheckBackward(1e-3, true),
@@ -529,7 +529,7 @@ TEST(CudnnNormConvFp16, K1S2O4) {
phi::GPUContext *ctx = static_cast<phi::GPUContext *>(
platform::DeviceContextPool::Instance().Get(platform::CUDAPlace(0)));

- if (ctx->GetComputeCapability() <= 70) {
+ if (ctx->GetComputeCapability() <= 70 || ctx->GetComputeCapability() >= 90) {
ASSERT_THROW(test.CheckForward(1e-3, true),
paddle::platform::EnforceNotMet);
ASSERT_THROW(test.CheckBackward(1e-3), paddle::platform::EnforceNotMet);
@@ -25,9 +25,10 @@
@unittest.skipIf(
not paddle.is_compiled_with_cuda()
or paddle.get_cudnn_version() < 8000
- or paddle.device.cuda.get_device_capability()[0] < 7,
+ or paddle.device.cuda.get_device_capability()[0] < 7
+ or paddle.device.cuda.get_device_capability()[0] >= 9,
"only support with cuda and cudnn version is at least 8.0 "
- "and device's compute capability is at least 7.0",
+ "and device's compute capability is at least 7.0 and less than 9.0",
)
class TestFuseResNetUnit(unittest.TestCase):
def test_fuse_resenet_unit(self):
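The support window enforced by these hunks can be summarized in a small standalone sketch. This is not Paddle code: the helper names `encode_capability` and `fused_norm_conv_supported` are hypothetical, and the sketch assumes the integer compared against 70 and 90 in the C++ checks is encoded as major * 10 + minor.

```python
def encode_capability(major: int, minor: int) -> int:
    """Assumed encoding of compute capability, e.g. (7, 0) -> 70, (9, 0) -> 90."""
    return major * 10 + minor


def fused_norm_conv_supported(capability: int) -> bool:
    """Hypothetical helper mirroring the guards in this change.

    The fused op is treated as available from 70 up to, but not
    including, 90 (Hopper and later are rejected).
    """
    return 70 <= capability < 90


if __name__ == "__main__":
    # Print the decision for a few representative devices.
    for major, minor in [(6, 1), (7, 0), (8, 0), (8, 6), (9, 0)]:
        cap = encode_capability(major, minor)
        print(major, minor, fused_norm_conv_supported(cap))
```

Note that the K1S2O4 test uses `<= 70` for its lower bound, so that configuration is also expected to throw at exactly 70; the sketch models only the general range used by the other tests and by the Python skip condition.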