
Added reshape+transpose+matmul_v2 fuse pass #36759

Closed

Conversation

@sfraczek (Contributor) commented Oct 26, 2021

PR types

Performance optimization

PR changes

Others

Describe

This is a reshape+transpose+matmul_v2 fuse pass, based on the previous identical fuse for matmul_v1 (#23754). This fuse will speed up BERT-like models.
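For illustration, a minimal sketch of the rewrite this pass performs. The attribute names follow the convention of the existing matmul_v1 pass and the fuse log later in this thread; treat them as illustrative rather than exact:

    // Subgraph before the pass (three ops, two intermediate tensors):
    //   tmp1 = reshape2(X, shape)
    //   tmp2 = transpose2(tmp1, axis)
    //   Out  = matmul_v2(tmp2, Y)
    //
    // Subgraph after the pass (one op; the reshape/transpose are folded
    // into attributes that the oneDNN kernel applies via input strides):
    //   Out = matmul_v2(X, Y)   with fused_reshape_X = shape,
    //                                fused_transpose_X = axis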

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@jakpiase jakpiase (Contributor) left a comment


Thank you very much for your contribution!

-framework::DDim GetDimForInput(const framework::InferShapeContext &ctx,
-                               std::string input_name) {
+static framework::DDim GetDimForInput(const framework::InferShapeContext &ctx,
+                                      std::string input_name) {
Contributor

Suggested change:
-  std::string input_name) {
+  const std::string& input_name) {

That might not be performance-critical, but it still may save us some time

Contributor Author

AFAIK GetDimForInput is always called with a const char*, so the std::string is constructed in place from the char pointer; nothing is copied beyond that construction, so pass-by-value is not necessarily slower here. Still, I will use a char argument instead and construct the string from it explicitly inside the function, which shows the intent better. I will also validate that the argument is exactly 'X' or 'Y'.
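A minimal sketch of that idea (hypothetical; the real signature, validation macro and body may differ, and <cassert>/<string> are assumed to be included):

    static framework::DDim GetDimForInput(const framework::InferShapeContext& ctx,
                                          char input_letter) {
      // Only matmul's two inputs are legal here; a plain assert stands in
      // for Paddle's enforce macros in this sketch.
      assert(input_letter == 'X' || input_letter == 'Y');
      // Build the std::string explicitly so the char -> string conversion
      // is visible, then query the shape as before.
      std::string input_name(1, input_letter);
      return ctx.GetInputDim(input_name);
    }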

Contributor

That sounds nice, thank you!

@@ -19,6 +19,36 @@
 namespace paddle {
 namespace operators {

+static framework::DDim GetDimForInput(const framework::InferShapeContext& ctx,
+                                      std::string input_name) {
Contributor

Same as above

Contributor Author

Same reply as above.

using paddle::framework::DDim;

static DDim GetDimForInput(const ExecutionContext& ctx,
                           std::string input_name) {
Contributor

Same as above

Contributor Author

Same reply as above.

}

std::vector<int64_t> GetInputStrides(const ExecutionContext& ctx,
                                     std::string input_name) {
Contributor

Same as above

Contributor Author

I was actually using input_name as a read/write variable, so I liked having an object copy-constructed from the const char*. I will change it to a char instead, as mentioned in the comment above.

-    if (!trans_x) {
-      x_strides.insert(x_strides.end(), {M * K, K, 1});
+    if (!strides_x.empty()) {
+      x_strides = strides_x;
Contributor

Could you please change the names of these variables? x_strides and strides_x sound like exactly the same thing.

Contributor Author

Ok
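For illustration, one possible renaming (hypothetical names, and one possible shape of the logic; the actual control flow in the PR may differ):

    // fused_x_strides: strides produced by the folded reshape+transpose
    // (empty when nothing was fused); x_strides: what is handed to oneDNN.
    if (!fused_x_strides.empty()) {
      x_strides = fused_x_strides;
    } else if (!trans_x) {
      x_strides.insert(x_strides.end(), {M * K, K, 1});
    }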

-    if (!trans_y) {
-      y_strides.insert(y_strides.end(), {N * K, N, 1});
+    if (!strides_y.empty()) {
+      y_strides = strides_y;
Contributor

Same as with x_strides and strides_x

Contributor Author

Ok

TestReshapeTransposeMatMulOp3DYFloat):
    def set_op_type_and_transpose_y_name(self):
        self.op_type = "matmul_v2"
        self.transpose_y_name = "trans_y"
Contributor

Why is only transpose_y tested?

Contributor Author

Probably because the only use case in BERT-like models transposes y, so the test was prepared for that case only.

  return new_x;
}

std::vector<int64_t> GetInputStrides(const ExecutionContext& ctx,
Contributor

Suggested change:
-std::vector<int64_t> GetInputStrides(const ExecutionContext& ctx,
+static std::vector<int64_t> GetInputStrides(const ExecutionContext& ctx,

Contributor

Since matmul and matmul_v2 share these functions, maybe it would be nice to put just the signatures in "matmul_mkldnn_op.h", so we won't need two copies of exactly the same function. What do you think about that?

Contributor Author

I can move some function declarations there, but some function signatures differ; those I won't move unless we figure out something else.
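A sketch of the shared-declaration idea for the signatures that do match (hypothetical header content, not the actual file):

    // matmul_mkldnn_op.h: declare the shared helper once and keep its single
    // definition in one .cc file, so both matmul and matmul_v2 kernels link
    // against the same copy.
    std::vector<int64_t> GetInputStrides(const ExecutionContext& ctx,
                                         const std::string& input_name);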

@wozna wozna (Contributor) left a comment

Great job, I only have one question.

  return new_x;
}

static framework::DDim GetDimForInput(const framework::InferShapeContext& ctx,
Contributor

This is similar to the question Jakub drew attention to. Couldn't we move this function to matmul_mkldnn_op.h for the mkldnn implementations? I can see that matmul_mkldnn_op.cc and matmul_v2_mkldnn_op.cc use different contexts, const ExecutionContext& ctx and const framework::InferShapeContext& ctx; do you know why they differ?

Actually, this function has the same logic in matmul_op.cc, matmul_v2_op.cc, matmul_mkldnn_op.cc and matmul_v2_mkldnn_op.cc, which gives us four copies of this code. So maybe at least for mkldnn we can deduplicate it.

Contributor Author

I tried moving the functions to a header, but it doesn't build on CI, which complains about duplicate definitions and other problems. I cannot reproduce it on my machine; maybe once I get CUDA/cuDNN installed and configured I could reproduce the error. Otherwise, pushing to a PR to test is not a good way to debug this. The include patterns for matmul, matmul_v2 and their mkldnn counterparts are hard to follow, and some of the same functions also appear in the matmul XPU code. I couldn't figure it out in the time I had, so I would suggest attempting the refactoring later in a separate PR, depending on priorities. I think we could also rethink the namespaces in those files.

They have different contexts because I needed the same function in a place where I only had access to a different context variable, so I made it this way.
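For context, a generic sketch of why defining (rather than declaring) a function in a shared header breaks the build, and two standard ways out; nothing here is Paddle-specific:

    // shared.h, included from two .cc files:
    //   int Helper() { return 42; }   // link-time error: multiple definitions
    //
    // Fix 1: mark the definition inline so the linker merges the copies.
    inline int Helper() { return 42; }
    // Fix 2: declare here and define exactly once in a single .cc file.
    int HelperDefinedInOneCcFile();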

@lidanqing-intel (Contributor)

@sfraczek Hi, more CIs failed, do you know why?

@lidanqing-intel lidanqing-intel changed the title Added reshape+transpose+matmul_v2 fuse pass [WIP] Added reshape+transpose+matmul_v2 fuse pass Nov 2, 2021
@sfraczek (Contributor Author) commented Nov 2, 2021

> @sfraczek Hi, more CIs failed, do you know why?

Hi, yes, I have a reproduction thanks to Joanna and I'm working on it. The problem is with WITH_UNITY_BUILD.
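For context, a unity build compiles several .cc files as one translation unit, so even file-local helpers with the same name collide; a generic sketch of the failure mode (not the actual Paddle sources):

    // With WITH_UNITY_BUILD, a.cc and b.cc are effectively #included into a
    // single generated file, so both definitions land in one translation unit:
    //   a.cc:  static void Helper() { /* ... */ }
    //   b.cc:  static void Helper() { /* ... */ }   // redefinition error
    // Typical remedies: give the helpers distinct names or namespaces, or
    // exclude one of the files from the unity group in unity_build_rule.cmake.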

@lidanqing-intel lidanqing-intel (Contributor) left a comment

LGTM. Thank you very much!

@lidanqing-intel (Contributor)

@sfraczek CheckPRTemplate passed. I will try to ask for approval now.

@lidanqing-intel (Contributor) commented Nov 9, 2021

This PR will need two approvals, because it exceeds 20 files and adds new attributes in the native op.

2021-11-05 22:08:10 (3.12 MB/s) - saved "bk.txt" [5/5])
2021-11-05 22:08:17 ****************
2021-11-05 22:08:17 0. You must have Dianhai approval for change 20+ files or add than 1000+ lines of content.
2021-11-05 22:08:17 1. You must have one RD (Avin0323(Recommend) or zhouwei25 or wanghuancoder or luotao1) approval for modifying unity_build_rule.cmake which the rules of Unity Build.
2021-11-05 22:08:17 There are 2 approved errors.
2021-11-05 22:08:17 ****************

@sfraczek (Contributor Author)

Measured on the fp32 model by running cpu_infer.py from #36962 with these passes commented out:

      // "map_matmul_v2_to_mul_pass",              //
      // "map_matmul_v2_to_matmul_pass",           //
      // "map_matmul_to_mul_pass",                 //

Built with RelWithDebInfo, on an i9 desktop machine:
after (this branch): acc: 0.5567, time: 41.98263645172119
before (develop): acc: 0.5567, time: 42.94174814224243

From the fuse log:
I1110 10:33:40.420223 28072 fuse_pass_base.cc:57] --- detected 6 subgraphs
--- Fused 6 ReshapeTransposeMatmul patterns for matmul_v2 Op with reshape's xshape with transpose's xshape

@lidanqing-intel lidanqing-intel changed the title [WIP] Added reshape+transpose+matmul_v2 fuse pass Added reshape+transpose+matmul_v2 fuse pass Nov 10, 2021
@lidanqing-intel (Contributor) commented Nov 10, 2021

@baoachun Hi, can this PR be approved and merged? It implements the matmul_v2 fuse and has already been tested on the real BERT model provided in #36962, with a performance improvement and no impact on accuracy. The PR exceeds 20 files because many files are involved, but both the unit tests and the real model have been verified, so it can indeed be merged.

jakpiase previously approved these changes Nov 10, 2021
@jakpiase jakpiase (Contributor) left a comment

LGTM, really good work :)

@jczaja jczaja self-requested a review November 10, 2021 14:07
jczaja previously approved these changes Nov 10, 2021
@jczaja jczaja (Contributor) left a comment

LGTM

@paddle-bot-old

Sorry to inform you that fbcf847's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@lidanqing-intel lidanqing-intel (Contributor) left a comment

LGTM

@lidanqing-intel (Contributor)

@baoachun Please send more models with broadcasting. Thanks

@lidanqing-intel (Contributor)

@sfraczek Baidu requires this PR to be split.

@paddle-bot-old

Sorry to inform you that 38d3971's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@sfraczek (Contributor Author) commented Dec 3, 2021

Opened a new version: #37847

@sfraczek sfraczek closed this Dec 3, 2021