
ResnetUnitOp implemented by cuDNN fused op (backend code) #35557

Merged: 21 commits, Sep 22, 2021

Conversation

@ZzSean (Contributor) commented on Sep 7, 2021

PR types

New features

PR changes

OPs

Describe

Implement resnet_unit_op with cuDNN's fused op interface; this PR contains the backend code.
Because the conv computation uses the half type, which can represent only about 3 decimal digits, the unit test uses a tolerance of 1e-3.
The new unit test is run in CI-Py3; the result is shown below.
[CI result screenshot]
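For reference, a minimal standalone sketch (not part of this PR's code) of the precision reasoning behind the 1e-3 tolerance:

```cpp
// Illustrative sketch only: fp16 has a 10-bit mantissa, so its machine epsilon
// is 2^-10 ~= 9.77e-4. Values near 1.0 therefore carry roughly three reliable
// decimal digits, which motivates an absolute tolerance of 1e-3 when comparing
// half-precision conv results against a float baseline.
#include <cmath>
#include <cstdio>

int main() {
  const double fp16_epsilon = std::ldexp(1.0, -10);  // 2^-10
  std::printf("fp16 machine epsilon: %.6e\n", fp16_epsilon);
  return 0;
}
```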

@paddle-bot-old (bot) commented on Sep 7, 2021

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

Resolved review threads (outdated):
- paddle/fluid/operators/CMakeLists.txt
- paddle/fluid/operators/cudnn_norm_conv_test.cu (3 threads)

// get paddle conv2d op results as baseline
template <typename T>
void GetConv2DOp(const std::vector<T> &x, const std::vector<T> &w,
Contributor:

This function is not meant to obtain a conv2d op; it is meant to obtain the result computed by the conv2d op. The function name should correctly reflect what the function does.

@ZzSean (Contributor, Author) commented on Sep 22, 2021:

Renamed to Conv2DForwardCompute.

Resolved review thread (outdated): paddle/fluid/operators/fused/cudnn_norm_conv.cu.h
platform::FilterDescriptor filter_desc_;
platform::TensorDescriptor out_desc_;
platform::TensorDescriptor out_stats_desc_;
platform::ConvolutionDescriptor conv_desc_;
Contributor:

I noticed there is also an existing cudnn_helper.h, which is referenced more widely and already provides ScopedTensorDescriptor, ScopedFilterDescriptor, and ScopedConvolutionDescriptor. A follow-up PR could check whether those interfaces are usable here.

Contributor Author (@ZzSean):

I looked at that file. Those interfaces carry more restrictions and do not cover all the conv cases, so let's keep using the interfaces in cudnn_desc.h for now.

Contributor:

cudnn_desc.h and cudnn_helper.h duplicate each other's functionality, and I would prefer to keep only one of them. Please consider this in a follow-up; any functionality you need can be added to cudnn_helper.h.

Resolved review thread (outdated): paddle/fluid/operators/fused/cudnn_norm_conv.cu.h
@@ -0,0 +1,95 @@
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Contributor:

This file is not used in this PR; do not add it in this PR.

Contributor Author (@ZzSean):

Deleted.


void Forward(const platform::CUDADeviceContext &ctx, T *input_ptr,
T *filter_ptr, T *output_ptr, float *sum_ptr,
float *sum_of_squares_ptr) {
Contributor:

I would prefer passing Tensors rather than raw pointers.

Contributor Author (@ZzSean):

The final resnet_unit_op.cu combines three OPs, so if every call passed Tensors there would be a lot of duplicated code; with pointers, a buffer is defined once and can be reused repeatedly.
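For illustration, a minimal standalone sketch (generic C++, not the PR's code) of the buffer-reuse pattern described above, where buffers defined once are handed through several stages:

```cpp
// Illustrative sketch only: three stages share caller-owned buffers, so the
// buffers are defined once and reused instead of being re-wrapped per stage.
#include <vector>

struct Stage {
  void Forward(const float *in, float *out, int n) const {
    for (int i = 0; i < n; ++i) out[i] = in[i] + 1.0f;  // stand-in computation
  }
};

int main() {
  const int n = 4;
  std::vector<float> buf_a(n, 0.0f), buf_b(n, 0.0f);
  Stage conv, bn, relu;
  conv.Forward(buf_a.data(), buf_b.data(), n);
  bn.Forward(buf_b.data(), buf_a.data(), n);   // same buffers, no re-wrapping
  relu.Forward(buf_a.data(), buf_b.data(), n);
  return 0;
}
```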

Contributor:

I don't quite follow; let's revisit this in a later PR.

Xreki previously approved these changes on Sep 22, 2021

@Xreki (Contributor) left a comment:

LGTM. Also, have you confirmed which CI runs this unit test?


void Forward(const platform::CUDADeviceContext &ctx, T *input_ptr,
T *filter_ptr, T *output_ptr, float *sum_ptr,
float *sum_of_squares_ptr) {
Contributor:

I don't quite follow; let's revisit this in a later PR.


#if CUDNN_VERSION >= 8000
template <typename T>
class CudnnNormConvolutionOp {
Contributor:

This class does not correspond to an actual Paddle OP, right? So I would not recommend including Op in the class name.

platform::FilterDescriptor filter_desc_;
platform::TensorDescriptor out_desc_;
platform::TensorDescriptor out_stats_desc_;
platform::ConvolutionDescriptor conv_desc_;
Contributor:

cudnn_desc.h and cudnn_helper.h duplicate each other's functionality, and I would prefer to keep only one of them. Please consider this in a follow-up; any functionality you need can be added to cudnn_helper.h.

kernel_size_ = 1;
stride_ = 1;
pad_ = 0;
}
Contributor:

Is this default constructor really needed?

output_channels_ = output_channels;
kernel_size_ = kernel_size;
stride_ = stride;
pad_ = (kernel_size_ - 1) / 2;
Contributor:

Are you sure pad should be computed this way? Is only this configuration supported?

Contributor Author (@ZzSean):

Only kernel_size = 1 or 3 is supported, and the input and output h and w stay the same, so pad does not need to be passed in from outside; computing it internally like this is sufficient. The ResNet-50 network definition computes it the same way.
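A minimal standalone check (illustrative, not the PR's code; it assumes stride 1, as implied by the unchanged h and w) that pad = (kernel_size - 1) / 2 preserves the spatial size for kernel sizes 1 and 3:

```cpp
// Sketch: with pad = (kernel_size - 1) / 2 and stride 1,
// out = (in + 2 * pad - kernel_size) / stride + 1 should equal in.
#include <cassert>

int main() {
  for (int in = 1; in <= 16; ++in) {
    for (int k : {1, 3}) {
      int pad = (k - 1) / 2;
      int out = (in + 2 * pad - k) / 1 + 1;
      assert(out == in);  // spatial size preserved
    }
  }
  return 0;
}
```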

float *sum_of_squares_ptr = sum_of_squares_.mutable_data<float>(place_);

std::shared_ptr<op::CudnnNormConvolutionOp<T>> conv_op(
new op::CudnnNormConvolutionOp<T>());
Contributor:

You can simply use op::CudnnNormConvolutionOp<T> conv_op; here.

ctx_->Wait();
}

void Run() {
Contributor:

I would prefer dev_ctx to be passed in as a parameter.
