optimize elementwise_mul_grad using new interfaces #37728

Merged — 34 commits, Jan 5, 2022

Conversation

@AshburnLee (Contributor) commented on Nov 30, 2021

PR types

Performance optimization

PR changes

OPs

Describe

Function

This PR optimizes the backward computation of elementwise_mul using the new interfaces.

opbenchmark results

[Screenshot: opbenchmark results, 2021-12-30 14:53:57]

On the 3 configurations where CI-opbenchmark performance regressed relative to develop: after the first optimization using the reduce interface, these 3 configurations were about 15% slower than dev; after adapting the multi-output code optimization, they matched dev, while the remaining configurations either exceeded dev (1.85x–12.16x) or matched it. Overall, the result is no worse than dev or competing implementations.

@paddle-bot-old commented:

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

std::vector<int> reduce_dims = GetReduceDim(x->dims(), out->dims(), axis);
gpuStream_t stream = ctx.cuda_device_context().stream();

framework::Tensor wayto_dx;
Contributor:

The variable name should be changed.

Contributor (Author):

Done

wayto_dx.Resize(dout->dims());
default_elementwise_mul<DeviceContext, T>(ctx, dout, y, &wayto_dx);

const framework::Tensor* const_to_dx =
Contributor:

This line can be merged with the one below.

Contributor (Author):

Done

@@ -48,6 +49,17 @@ template <typename T>
struct MulFunctor {
inline HOSTDEVICE T operator()(const T& a, const T& b) const { return a * b; }
};

template <typename T>
struct MulFunctor<paddle::platform::complex<T>> {
Contributor:

This function can be deleted; when the type is complex, you can simply construct y_conj(y.real, -y.imag) from the original y(y.real, y.imag) and pass it into the multiplication.

Contributor (Author):

Placing this function here is indeed inappropriate; it conflicts with the semantics of MulFunctor. Fixed.
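The reviewer's suggestion can be sketched in plain C++ as follows. std::complex stands in for paddle::platform::complex<T>, and the helper names are illustrative only, not Paddle's API.

```cpp
#include <cassert>
#include <complex>

// Sketch of the suggestion: instead of a dedicated complex MulFunctor
// specialization, build y_conj = (y.real, -y.imag) from the original y
// and pass it to the ordinary multiply.
template <typename T>
std::complex<T> Conj(const std::complex<T>& y) {
  return std::complex<T>(y.real(), -y.imag());  // flip the imaginary part
}

// dx = dout * conj(y), as the complex elementwise_mul backward requires.
template <typename T>
std::complex<T> MulGradWithConj(const std::complex<T>& dout,
                                const std::complex<T>& y) {
  return dout * Conj(y);
}
```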

inline HOSTDEVICE T operator()(const T& a, const T& b) const { return a * b; }
};
template <typename T>
struct MulDxDyFunctor<paddle::platform::complex<T>> {
Contributor (Author):

@AshburnLee, Dec 10, 2021

MulGradDY takes 4 parameters. Using it would require changing the caller to ElemwiseGradCompute, which ultimately invokes the pre-optimization kernel and carries a different meaning: what is needed here is ElemwiseCompute, not ElemwiseGradCompute. So I don't think MulGradDY can be reused here, which is why I provided MulDxDyFunctor.

inline HOSTDEVICE T operator()(const T& a, const T& b) const { return a * b; }
};
template <typename T>
struct MulDxDyFunctor<paddle::platform::complex<T>> {
Contributor:

The name should be changed, and the functor extracted into the shared elementwise_functor.h file.
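The functor-plus-complex-specialization pattern under discussion might look roughly like this. It is a sketch using std::complex rather than paddle::platform::complex<T>, and MulGradFunctor is a placeholder name, not the name finally adopted.

```cpp
#include <cassert>
#include <complex>

// Generic multiply functor for real types.
template <typename T>
struct MulGradFunctor {
  inline T operator()(const T& x, const T& y) const { return x * y; }
};

// Specialization for complex types: conjugate the second operand,
// as elementwise_mul's backward pass requires for complex inputs.
template <typename T>
struct MulGradFunctor<std::complex<T>> {
  inline std::complex<T> operator()(const std::complex<T>& x,
                                    const std::complex<T>& y) const {
    std::complex<T> y_conj(y.real(), -y.imag());
    return x * y_conj;
  }
};
```

Placing such a functor in a shared header (as the review suggests) lets both the forward and backward kernels instantiate it without duplicating the complex-conjugate logic.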

std::vector<int> reduce_dims = GetReduceDim(x->dims(), out->dims(), axis);
gpuStream_t stream = ctx.cuda_device_context().stream();

framework::Tensor dx_tmp;
Contributor:

Suggest changing the naming.

Contributor (Author):

done. Renamed to dx_origin_dims, meaning the dx result before the reduce.

template <typename DeviceContext, typename T>
typename std::enable_if<
std::is_same<DeviceContext, platform::CPUDeviceContext>::value>::type
default_elementwise_mul_grad(const framework::ExecutionContext& ctx,
Contributor:

default_elementwise_mul_grad and elementwise_mul_grad contain duplicated code.

Contributor (Author):

done.

@@ -114,6 +116,73 @@ __global__ void SimpleElemwiseMulGradCUDAKernel<plat::complex<double>>(
}
}

template <typename T>
struct MulDxDyFunctor {
inline HOSTDEVICE T operator()(const T& a, const T& b) const { return a * b; }
Contributor:

The parameters a, b here are inconsistent with the parameters x, y below.

Contributor (Author):

done

}
}
}
*/
Contributor:

Delete the invalid comment.

Contributor (Author):

done.

template <typename DeviceContext, typename T>
typename std::enable_if<
std::is_same<DeviceContext, platform::CUDADeviceContext>::value>::type
default_elementwise_mul_grad(const framework::ExecutionContext& ctx,
Contributor:

Update this part of the code according to Zjq9409's latest merged PR.

Contributor (Author):

done.

@@ -113,6 +114,181 @@ __global__ void SimpleElemwiseMulGradCUDAKernel<plat::complex<double>>(
}
Contributor:

The SimpleElemwiseMulGradCUDAKernel function can be deleted.

Contributor (Author):

done

Contributor:

@JamesLim-sy left a comment:

I agree with this PR; if the other reviewers also agree with it, it can be merged.

Contributor:

@Zjq9409 left a comment:

#include "paddle/fluid/operators/elementwise/elementwise_op_broadcast.cu.h"
#include "paddle/fluid/platform/complex.h"
#include "paddle/fluid/platform/float16.h"

The above header files can be removed from elementwise_mul_op.cu; this can be done in the next PR.

@JamesLim-sy JamesLim-sy merged commit 36a102f into PaddlePaddle:develop Jan 5, 2022
@AshburnLee AshburnLee deleted the elem_mul_grad branch January 5, 2022 14:03