Added reshape+transpose+matmul_v2 fuse pass #36759
Conversation
Thanks for your contribution!
Thank you very much for your contribution!
paddle/fluid/operators/matmul_op.cc
Outdated
- framework::DDim GetDimForInput(const framework::InferShapeContext &ctx,
-                                std::string input_name) {
+ static framework::DDim GetDimForInput(const framework::InferShapeContext &ctx,
+                                       std::string input_name) {
Suggested change:
- std::string input_name) {
+ const std::string& input_name) {
That might not be performance-critical, but it may still save us some time.
AFAIK GetDimForInput is always called with a const char*.
I think the string is constructed in place from the char pointer, so there is no copying of a string, only passing a char*. If so, it's not necessarily slower. Still, I will use a char argument instead and create a string from it explicitly inside the function; that will better show the intent. I will also validate that it is exactly either 'X' or 'Y'.
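A minimal sketch of what such a char-based signature could look like. The `ShapeContext` struct below is a hypothetical stand-in for illustration only; the real function takes `framework::InferShapeContext` and returns `framework::DDim`:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for framework::InferShapeContext, for illustration.
struct ShapeContext {
  std::map<std::string, std::vector<int64_t>> dims;
  std::vector<int64_t> GetInputDim(const std::string& name) const {
    return dims.at(name);
  }
};

// Char-based variant discussed above: the caller passes 'X' or 'Y', the
// string is built explicitly inside, and the argument is validated first.
static std::vector<int64_t> GetDimForInput(const ShapeContext& ctx,
                                           char input_name) {
  assert(input_name == 'X' || input_name == 'Y');
  return ctx.GetInputDim(std::string(1, input_name));
}
```

This makes the two legal call sites visible in the signature itself instead of relying on callers to pass the right string.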
That sounds nice, thank you!
@@ -19,6 +19,36 @@
namespace paddle {
namespace operators {

static framework::DDim GetDimForInput(const framework::InferShapeContext& ctx,
                                      std::string input_name) {
Same as above
same reply as above
using paddle::framework::DDim;

static DDim GetDimForInput(const ExecutionContext& ctx,
                           std::string input_name) {
Same as above
same reply as above
}

std::vector<int64_t> GetInputStrides(const ExecutionContext& ctx,
                                     std::string input_name) {
Same as above
I was actually using input_name as a read/write variable, so I liked having a copy constructed from the const char*.
I will change it to a char instead, as mentioned in the comment above.
if (!trans_x) {
  x_strides.insert(x_strides.end(), {M * K, K, 1});
if (!strides_x.empty()) {
  x_strides = strides_x;
Could you please change the names of these variables? x_strides and strides_x sound like exactly the same thing.
Ok
if (!trans_y) {
  y_strides.insert(y_strides.end(), {N * K, N, 1});
if (!strides_y.empty()) {
  y_strides = strides_y;
Same as with x_strides and strides_x
ok
TestReshapeTransposeMatMulOp3DYFloat):
    def set_op_type_and_transpose_y_name(self):
        self.op_type = "matmul_v2"
        self.transpose_y_name = "trans_y"
Why only transpose_y is tested?
Maybe it is because the only use case, in BERT-like models, had transposition of y, so the fuse was prepared only for that.
return new_x;
}

std::vector<int64_t> GetInputStrides(const ExecutionContext& ctx,
Suggested change:
- std::vector<int64_t> GetInputStrides(const ExecutionContext& ctx,
+ static std::vector<int64_t> GetInputStrides(const ExecutionContext& ctx,
Since matmul and matmul_v2 share these functions, maybe it would be nice to put just the signature in "matmul_mkldnn_op.h", so we won't need two copies of exactly the same function. What do you think about that?
I can move some function declarations there, but some function signatures are different; those I won't move unless we figure out something else.
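A sketch of the declaration/definition split being suggested, with a hypothetical simplified signature and a dummy body rather than the actual Paddle one. The header carries only the declaration, and exactly one .cc file carries the body, which avoids the duplicate-definition (ODR) errors that appear when a non-inline body placed in a header is included from several translation units. Shown in a single file for brevity:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// In a real split, this declaration would live in matmul_mkldnn_op.h ...
std::vector<int64_t> GetInputStrides(const std::string& input_name);

// ... and this single definition in exactly one .cc file (e.g.
// matmul_mkldnn_op.cc), so every other translation unit that includes the
// header sees only the declaration. The body here is a dummy for
// illustration only.
std::vector<int64_t> GetInputStrides(const std::string& input_name) {
  return input_name == "X" ? std::vector<int64_t>{6, 3, 1}
                           : std::vector<int64_t>{2, 1};
}
```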
Great job, I only have one question.
return new_x;
}

static framework::DDim GetDimForInput(const framework::InferShapeContext& ctx,
This is a similar question to the one Jakub drew attention to. Couldn't we move this function to matmul_mkldnn_op.h for the mkldnn implementation? I can see that matmul_mkldnn_op.cc and matmul_v2_mkldnn_op.cc use different contexts, const ExecutionContext& ctx and const framework::InferShapeContext& ctx. Do you know why they are different?
Actually, this function has the same logic in matmul_op.cc, matmul_v2_op.cc, matmul_mkldnn_op.cc, and matmul_v2_mkldnn_op.cc, which gives us four copies of this code. So maybe at least for mkldnn we can deduplicate it.
I tried moving the functions to a header, but it doesn't build on CI, which reports duplicate definitions and other problems. I cannot reproduce it on my machine; maybe once I have cuda/cudnn installed and configured I could try to reproduce the error. Otherwise, testing by pushing to a PR is not a good way to debug this. The include patterns for matmul, matmul_v2, and their mkldnn counterparts are hard to understand, and some of the same functions also appear in the matmul_xpu code. I couldn't figure it out in the time I had, so I would suggest attempting the refactoring later in a separate PR, depending on priorities. I think we could rethink the namespaces in those files.
As for the contexts: I needed to use the same function in a place where I only had access to a different context variable, so I made it this way.
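One hypothetical way to fold the four copies into a single definition despite the two different context types would be a small template, sketched here with stand-in context structs (the real contexts are framework::InferShapeContext and ExecutionContext, and the real return types differ as well):

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-ins for the two context types; both expose GetInputDim, which is
// all the shared helper needs.
struct InferShapeCtx {
  std::map<std::string, std::vector<int64_t>> dims;
  std::vector<int64_t> GetInputDim(const std::string& n) const {
    return dims.at(n);
  }
};
struct ExecCtx {
  std::map<std::string, std::vector<int64_t>> dims;
  std::vector<int64_t> GetInputDim(const std::string& n) const {
    return dims.at(n);
  }
};

// One template definition instead of four copies: any context type with a
// GetInputDim member works. Being a template, it can live in a header
// without violating the one-definition rule.
template <typename Ctx>
std::vector<int64_t> GetDimForInput(const Ctx& ctx, const std::string& name) {
  return ctx.GetInputDim(name);
}
```

Whether this works in practice depends on the real context interfaces actually matching closely enough, which the thread suggests is not guaranteed.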
@sfraczek Hi, more CIs failed, do you know why?
Hi, yes, I have a reproduction thanks to Joanna and I'm working on it. The problem is with WITH_UNITY_BUILD.
LGTM. Thank you very much!
@sfraczek CheckPRTemplate passed. I will try to ask for approval now.
This PR will need two approvals.
Measured on an fp32 model by running cpu_infer.py from #36962 with the passes commented out.
Built with RelWithDebInfo. From the log of fuses:
LGTM, really good work :)
LGTM
Sorry to inform you that fbcf847's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
LGTM
38d3971
@baoachun Please send more models with broadcasting. Thanks.
@sfraczek Baidu requires this PR to be split.
Sorry to inform you that 38d3971's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
Opened new version: #37847
PR types
Performance optimization
PR changes
Others
Describe
This is a reshape+transpose+matmul_v2 fuse pass based on the previous identical fuse for matmul_v1 (#23754). This fuse will speed up BERT-like models.