Implement the grad and enhance the cache of norm_convolution fusion ops. #36168

Merged · 8 commits · Sep 29, 2021

Conversation

@Xreki (Contributor) commented Sep 27, 2021

PR types

Performance optimization

PR changes

OPs

Describe

  1. Improve the implementation of the cudnnFusedOpsPlan_t-based CudnnNormConvolution fused computation, including:
    • Implement CudnnNormConvolutionGrad, based on the initial version provided by @JamesLim-sy
    • Implement a CudnnFusionOpCache that caches the generated CudnnFusionOp objects, avoiding the large CPU overhead of calling cudnnMakeFusedOpsPlan on every invocation (a sketch of this idea follows this list)
  2. Adjust the cudnn_norm_convolution unit tests:
    • Improve the structure and reusability of the test code
    • Add forward checks on the sum and sum_of_square results
    • Add checks on the backward computation results
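The CudnnFusionOpCache mentioned above is the heart of the optimization: cudnnMakeFusedOpsPlan is expensive on the CPU, so planning should happen once per unique configuration rather than on every call. The following is a minimal C++ sketch of that idea, not Paddle's actual implementation; the string key, the locking strategy, and the CudnnFusionOp placeholder are all assumptions.

#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Placeholder for the class that owns cudnnFusedOpsPlan_t and friends.
struct CudnnFusionOp {};

class CudnnFusionOpCache {
 public:
  static CudnnFusionOpCache &Instance() {
    static CudnnFusionOpCache instance;
    return instance;
  }

  // Returns the op cached under `key`; on a miss, runs `create_and_plan`
  // (the place where cudnnMakeFusedOpsPlan would be called) exactly once.
  template <typename Factory>
  std::shared_ptr<CudnnFusionOp> GetOrCreate(const std::string &key,
                                             Factory &&create_and_plan) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto iter = map_.find(key);
    if (iter == map_.end()) {
      iter = map_.emplace(key, create_and_plan()).first;
    }
    return iter->second;
  }

 private:
  std::mutex mutex_;
  std::unordered_map<std::string, std::shared_ptr<CudnnFusionOp>> map_;
};

In practice the key would have to encode everything the plan depends on (input shape, data type, layout, convolution parameters), so that each distinct configuration gets its own plan.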

@paddle-bot-old commented:

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

// Workspace-size query in the backward pass (enclosing call reconstructed):
PADDLE_ENFORCE_CUDA_SUCCESS(
    platform::dynload::cudnnGetConvolutionBackwardDataWorkspaceSize(
        handle, args_.filter_desc.desc(), args_.out_desc.desc(),
        args_.conv_desc.desc(), args_.in_desc.desc(), dgrad_algo_,
        &workspace_size));
return RoundUp(workspace_size, 512);
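The RoundUp(workspace_size, 512) pads the queried size to a 512-byte boundary, presumably so that workspace allocations stay well aligned for the GPU allocator.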
A contributor commented on this code:

If this were also cached, we could take the maximum of the dweight_workspace_size obtained from wgrad_op->GetWorkspaceSizeInBytes(ctx.cudnn_handle()) and the dgrad_workspace_size obtained here, and the overhead here could be eliminated as well.

@Xreki (Contributor, Author) replied:

This overhead is not large, and the object being cached is different: the fused approach has to cache the entire CudnnFusionOp, mainly in order to cache the FusedOpsPlan. We can verify later whether this part actually affects performance, and add further caching if it does (the fragment below sketches the suggestion).
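For reference, the reviewer's suggestion amounts to a fragment like the one below. The names dgrad_op, wgrad_op, and ctx come from the discussion; everything else is assumed context, not code from this PR.

// Hypothetical fragment: query both workspace sizes up front (these are
// the values that could be cached), then allocate a single workspace
// large enough for both the dgrad and wgrad paths.
size_t dgrad_workspace_size =
    dgrad_op->GetWorkspaceSizeInBytes(ctx.cudnn_handle());
size_t dweight_workspace_size =
    wgrad_op->GetWorkspaceSizeInBytes(ctx.cudnn_handle());
size_t workspace_size =
    std::max(dgrad_workspace_size, dweight_workspace_size);  // <algorithm>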

@JamesLim-sy (Contributor) left a review comment:

Brilliant work!
