
ResnetUnitOp implemented by cuDNN fused op (backend code) #35557

Merged: 21 commits, Sep 22, 2021

Conversation

@ZzSean (Contributor) commented on Sep 7, 2021

PR types

New features

PR changes

OPs

Describe

Implement resnet_unit_op with cuDNN's fused op interface; this PR contains the backend code.
Because the conv computation uses the half type, which can represent only about 3 decimal digits, the unit test uses a tolerance of 1e-3.
The new unit test is run in CI-Py3; the result is shown below.
[CI result screenshot]
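For reference, a minimal standalone sketch (not part of this PR's code) of the precision reasoning behind the 1e-3 tolerance:

```cpp
// Illustrative sketch only: fp16 has a 10-bit mantissa, so its machine epsilon
// is 2^-10 ~= 9.77e-4. Values near 1.0 therefore carry roughly three reliable
// decimal digits, which motivates an absolute tolerance of 1e-3 when comparing
// half-precision conv results against a float baseline.
#include <cmath>
#include <cstdio>

int main() {
  const double fp16_epsilon = std::ldexp(1.0, -10);  // 2^-10
  std::printf("fp16 machine epsilon: %.6e\n", fp16_epsilon);
  return 0;
}
```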

@paddle-bot-old (bot) commented on Sep 7, 2021

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

Resolved review threads (outdated):
- paddle/fluid/operators/CMakeLists.txt
- paddle/fluid/operators/cudnn_norm_conv_test.cu (3 threads)

// get paddle conv2d op results as baseline
template <typename T>
void GetConv2DOp(const std::vector<T> &x, const std::vector<T> &w,
Contributor:

This function is not meant to obtain a conv2d op; it is meant to obtain the result computed by the conv2d op. The function name should correctly reflect what the function does.

@ZzSean (Contributor, Author) commented on Sep 22, 2021:

Renamed to Conv2DForwardCompute.

Resolved review thread (outdated): paddle/fluid/operators/fused/cudnn_norm_conv.cu.h
platform::FilterDescriptor filter_desc_;
platform::TensorDescriptor out_desc_;
platform::TensorDescriptor out_stats_desc_;
platform::ConvolutionDescriptor conv_desc_;
Contributor:

I noticed there is also an existing cudnn_helper.h, which is referenced more widely and already provides ScopedTensorDescriptor, ScopedFilterDescriptor, and ScopedConvolutionDescriptor. A follow-up PR could check whether those interfaces are usable here.

Contributor Author (@ZzSean):

I looked at that file. Those interfaces carry more restrictions and do not cover all the conv cases, so let's keep using the interfaces in cudnn_desc.h for now.

Contributor:

cudnn_desc.h and cudnn_helper.h duplicate each other's functionality, and I would prefer to keep only one of them. Please consider this in a follow-up; any functionality you need can be added to cudnn_helper.h.

Resolved review thread (outdated): paddle/fluid/operators/fused/cudnn_norm_conv.cu.h
@@ -0,0 +1,95 @@
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Contributor:

This file is not used in this PR; do not add it in this PR.

Contributor Author (@ZzSean):

Deleted.


void Forward(const platform::CUDADeviceContext &ctx, T *input_ptr,
T *filter_ptr, T *output_ptr, float *sum_ptr,
float *sum_of_squares_ptr) {
Contributor:

I would prefer passing Tensors rather than raw pointers.

Contributor Author (@ZzSean):

The final resnet_unit_op.cu combines three OPs, so if every call passed Tensors there would be a lot of duplicated code; with pointers, a buffer is defined once and can be reused repeatedly.
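For illustration, a minimal standalone sketch (generic C++, not the PR's code) of the buffer-reuse pattern described above, where buffers defined once are handed through several stages:

```cpp
// Illustrative sketch only: three stages share caller-owned buffers, so the
// buffers are defined once and reused instead of being re-wrapped per stage.
#include <vector>

struct Stage {
  void Forward(const float *in, float *out, int n) const {
    for (int i = 0; i < n; ++i) out[i] = in[i] + 1.0f;  // stand-in computation
  }
};

int main() {
  const int n = 4;
  std::vector<float> buf_a(n, 0.0f), buf_b(n, 0.0f);
  Stage conv, bn, relu;
  conv.Forward(buf_a.data(), buf_b.data(), n);
  bn.Forward(buf_b.data(), buf_a.data(), n);   // same buffers, no re-wrapping
  relu.Forward(buf_a.data(), buf_b.data(), n);
  return 0;
}
```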

Contributor:

I don't quite follow; let's revisit this in a later PR.

Xreki previously approved these changes on Sep 22, 2021

@Xreki (Contributor) left a comment:

LGTM. Also, have you confirmed which CI runs this unit test?


void Forward(const platform::CUDADeviceContext &ctx, T *input_ptr,
T *filter_ptr, T *output_ptr, float *sum_ptr,
float *sum_of_squares_ptr) {
Contributor:

I don't quite follow; let's revisit this in a later PR.


#if CUDNN_VERSION >= 8000
template <typename T>
class CudnnNormConvolutionOp {
Contributor:

This class does not correspond to an actual Paddle OP, right? So I would not recommend including Op in the class name.

platform::FilterDescriptor filter_desc_;
platform::TensorDescriptor out_desc_;
platform::TensorDescriptor out_stats_desc_;
platform::ConvolutionDescriptor conv_desc_;
Contributor:

cudnn_desc.h and cudnn_helper.h duplicate each other's functionality, and I would prefer to keep only one of them. Please consider this in a follow-up; any functionality you need can be added to cudnn_helper.h.

kernel_size_ = 1;
stride_ = 1;
pad_ = 0;
}
Contributor:

Is this default constructor really needed?

output_channels_ = output_channels;
kernel_size_ = kernel_size;
stride_ = stride;
pad_ = (kernel_size_ - 1) / 2;
Contributor:

Are you sure pad should be computed this way? Is only this configuration supported?

Contributor Author (@ZzSean):

Only kernel_size = 1 or 3 is supported, and the input and output h and w stay the same, so pad does not need to be passed in from outside; computing it internally like this is sufficient. The ResNet-50 network definition computes it the same way.
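A minimal standalone check (illustrative, not the PR's code; it assumes stride 1, as implied by the unchanged h and w) that pad = (kernel_size - 1) / 2 preserves the spatial size for kernel sizes 1 and 3:

```cpp
// Sketch: with pad = (kernel_size - 1) / 2 and stride 1,
// out = (in + 2 * pad - kernel_size) / stride + 1 should equal in.
#include <cassert>

int main() {
  for (int in = 1; in <= 16; ++in) {
    for (int k : {1, 3}) {
      int pad = (k - 1) / 2;
      int out = (in + 2 * pad - k) / 1 + 1;
      assert(out == in);  // spatial size preserved
    }
  }
  return 0;
}
```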

float *sum_of_squares_ptr = sum_of_squares_.mutable_data<float>(place_);

std::shared_ptr<op::CudnnNormConvolutionOp<T>> conv_op(
new op::CudnnNormConvolutionOp<T>());
Contributor:

You can simply use op::CudnnNormConvolutionOp<T> conv_op; here.

ctx_->Wait();
}

void Run() {
Contributor:

I would prefer dev_ctx to be passed in as a parameter.
