OneDNN md-in-tensor refactoring part 5: Memory descriptor enabled for elementwises, reductions and expand_v2 ops #43036
Conversation
Sorry, after our repeated discussions, your PR has not yet met the merge criteria. Please read the PaddlePaddle native operator development specification; you may submit a new PR. We are closing this PR for now. Thank you for your contribution.
Reopened due to license/cla being stuck
LGTM
@tsocha please review this PR
platform::Place cpu_place, const Tensor* x,
Tensor* out, float scale_x, float scale_y,
const std::vector<int64_t>& extended_x_dims)
[CONCERN]
The types are almost the same, but the order has changed.
I hope that our tests will detect potential problems with it.
Was this change of order necessary?
It was not necessary, but for me the order input, output is more intuitive than output, input. I have checked all the tests for reduce_grad and expand_v2 and they all work correctly after my changes.
paddle/fluid/platform/mkldnn_reuse.h (Outdated)
dnnl::primitive_attr attrs;
attrs.set_scales(DNNL_ARG_SRC_0, 0, {scale_x});
attrs.set_scales(DNNL_ARG_SRC_1, 0, {scale_y});
[OPINION]
The old name was good.
const Tensor* y, std::vector<int64_t> y_tz,
const dnnl::primitive_attr& attr = NULL)
const Tensor* out, std::vector<int64_t> out_tz,
const dnnl::primitive_attr& attrs = NULL)
[COMMENT]
If dnnl::primitive_attr can hold multiple attributes, its name should be changed in oneDNN; but if it can't, then our parameter should keep the old name.
It can hold scales, post-ops and zero-points, so it definitely supports multiple attributes, but it probably won't be renamed inside oneDNN because of backward compatibility.
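For reference, a minimal standalone sketch of how a single dnnl::primitive_attr object can bundle scales, post-ops and zero points at once, using the same oneDNN 2.x C++ API calls as the snippet above; the scale, algorithm and zero-point values are illustrative only.

#include "oneapi/dnnl/dnnl.hpp"

dnnl::primitive_attr MakeAttrsSketch() {
  dnnl::primitive_attr attrs;
  // Per-argument scales for a binary primitive (the same call used in this PR).
  attrs.set_scales(DNNL_ARG_SRC_0, 0, {0.5f});
  attrs.set_scales(DNNL_ARG_SRC_1, 0, {2.0f});
  // A post-op chain attached to the same attribute object.
  dnnl::post_ops ops;
  ops.append_eltwise(1.0f, dnnl::algorithm::eltwise_relu, 0.0f, 0.0f);
  attrs.set_post_ops(ops);
  // A zero point for the destination, as used by quantized kernels.
  attrs.set_zero_points(DNNL_ARG_DST, 0, {0});
  return attrs;
}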
std::vector<int64_t> output_dims(phi::vectorize(input->dims()));
std::vector<int64_t> output_dims = phi::vectorize(input->dims());
[QUESTION]
Was this change necessary?
No, it will produce the exact same assembly. I will revert it then.
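As a small illustration of that point (plain STL, with Paddle's phi::vectorize replaced by a hypothetical make_dims helper): with a temporary on the right-hand side, direct-initialization and copy-initialization of the vector construct it from the same returned value, so neither spelling does extra work.

#include <cstdint>
#include <vector>

// Stand-in for phi::vectorize(input->dims()); returns a temporary vector.
std::vector<int64_t> make_dims() { return {2, 3, 4}; }

int main() {
  // Direct-initialization from the temporary...
  std::vector<int64_t> a(make_dims());
  // ...and copy-initialization from the same kind of temporary.
  std::vector<int64_t> b = make_dims();
  // Since C++17 both elide the temporary; before that, both move.
  return a == b ? 0 : 1;
}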
for (size_t i = 0; i < reduce_dims.size(); ++i) {
  // handle negative dims, f.e. -1 means last dimension
[CONCERN]
I think "last dimension" is not the best description. In (1, 3, -1, 512, 512), is the last dimension -1 or 512?
You're right, I'll change it to "rightmost"
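For context, a minimal sketch (not the actual kernel code; the helper name is hypothetical) of the usual normalization of negative reduce axes, where -1 maps to the rightmost dimension:

#include <cstdint>
#include <vector>

// Map negative axes to their non-negative counterparts, e.g. for a
// 5-D tensor, axis -1 becomes 4 (the rightmost dimension).
std::vector<int64_t> NormalizeAxes(std::vector<int64_t> axes, int64_t rank) {
  for (auto& axis : axes) {
    if (axis < 0) axis += rank;
  }
  return axes;
}

// Usage: NormalizeAxes({-1, 0}, 5) yields {4, 0}.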
const auto* x = ctx.Input<LoDTensor>("X"); | ||
auto* out = ctx.Output<Tensor>("Out"); | ||
|
||
auto reduce_dims = ctx.Attr<std::vector<int>>("dim"); | ||
bool reduce_all = ctx.Attr<bool>("reduce_all"); | ||
bool keep_dim = ctx.Attr<bool>("keep_dim"); | ||
|
||
auto output_dims = | ||
CalculateReducedDims(input, output, reduce_dims, reduce_all, keep_dim); | ||
auto input_dims = phi::vectorize(input->dims()); | ||
auto x_tz = phi::vectorize(x->dims()); | ||
auto out_tz = | ||
CalculateReducedDims(x, out, reduce_dims, reduce_all, keep_dim); |
[OPINION]
I see why you changed names here, but input > x IMO.
I have changed them to unify all of our ops, since all new ops follow this pattern, i.e. inputs are described as 'X' and 'Y' and the output is described as 'Out'. Some legacy operators have different names because of backwards compatibility, but I thought that unifying them to get rid of the additional noise would be nice.
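As a rough, hypothetical sketch only (the real CalculateReducedDims in Paddle may differ in details), the reduced dims such a helper typically computes keep the input rank and set the reduced axes to 1, matching oneDNN's requirement that src and dst reduction descriptors have the same number of dimensions:

#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the reduced-dims calculation.
std::vector<int64_t> ReducedDimsSketch(const std::vector<int64_t>& x_tz,
                                       const std::vector<int64_t>& reduce_dims,
                                       bool reduce_all) {
  std::vector<int64_t> out_tz = x_tz;
  if (reduce_all) {
    std::fill(out_tz.begin(), out_tz.end(), 1);  // every axis is reduced
    return out_tz;
  }
  for (auto dim : reduce_dims) {
    if (dim < 0) dim += static_cast<int64_t>(x_tz.size());  // negative axes
    out_tz[dim] = 1;  // reduced axes collapse to 1 for the oneDNN descriptor
  }
  return out_tz;
}

// e.g. ReducedDimsSketch({2, 3, 4}, {1}, false) yields {2, 1, 4}.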
auto reorder_dst_memory_p = reorder_handler.AcquireDstMemory(
    output, input->mem_desc(), ctx.GetPlace());
// reuse same mem desc since it is a simple copy
Suggested change:
// reuse same mem desc since it is a simple copy
// reuse mem desc since it is a simple copy
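To illustrate why reusing one memory descriptor is fine for a plain copy, here is a minimal sketch using the raw oneDNN API (not the Paddle ReorderMKLDNNHandler; dims and values are illustrative): when src and dst share the same descriptor, the reorder degenerates to a straight copy.

#include <vector>
#include "oneapi/dnnl/dnnl.hpp"

void CopyViaReorder() {
  dnnl::engine eng(dnnl::engine::kind::cpu, 0);
  dnnl::stream strm(eng);

  // One descriptor describes both sides: same dims, dtype and layout,
  // so the reorder performs a plain element-wise copy.
  dnnl::memory::desc md({2, 3, 4}, dnnl::memory::data_type::f32,
                        dnnl::memory::format_tag::abc);
  std::vector<float> src_buf(24, 1.0f), dst_buf(24, 0.0f);
  dnnl::memory src(md, eng, src_buf.data());
  dnnl::memory dst(md, eng, dst_buf.data());

  dnnl::reorder(src, dst).execute(strm, src, dst);
  strm.wait();
}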
@@ -124,25 +126,18 @@ class ReduceGradMKLDNNKernel : public framework::OpKernel<T> {
bool keep_dim = ctx.Attr<bool>("keep_dim");
bool reduce_all = ctx.Attr<bool>("reduce_all");
auto dims = ctx.Attr<std::vector<int>>("dim");
const auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
auto* dout = ctx.Input<Tensor>(framework::GradVarName("Out"));
[CONCERN]
Do you plan to change this pointer?
Nope, nice catch, that was a mistake, thanks
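For clarity, a tiny generic C++ illustration (not Paddle code; the struct is a hypothetical stand-in) of what the dropped const would have given up: with a pointer-to-const the kernel can only read the gradient input, which is exactly the intent here.

struct Tensor { float value = 0.f; };  // hypothetical stand-in

void Kernel(const Tensor* dout, Tensor* dx) {
  // dout->value = 1.f;     // would not compile: the input is read-only here
  dx->value = dout->value;  // fine: the output is meant to be written
}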
auto dout_tz = CalculateReducedDims(dx, dout, dims, reduce_all, keep_dim);
auto dx_tz = phi::vectorize(dx->dims());
[CONCERN]
The logic changed here. The input has been swapped with the output; was that intentional?
What do tz and d mean in this context?
Input and Output were incorrectly swapped here earlier. The reduce kernel was one of the first ops that I made and there were some inconsistencies and bugs inside it, so that change is intentional.
d means that it is a grad tensor, so dout is the input and dx is the output.
I have no idea what tz means, but it is commonly used as a synonym for dims inside PaddlePaddle, oneDNN and other frameworks.
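A concrete, illustrative example of that naming for a reduction over axis 1 with keep_dim: the grad kernel consumes dout with the reduced shape and produces dx with the original input shape.

#include <cstdint>
#include <vector>

// Forward: x {2, 3, 4} --reduce over axis 1--> out {2, 1, 4}.
// Backward: dout (grad of out) is the input, dx (grad of x) is the output;
// "tz" is used throughout these kernels as shorthand for the dims vector.
std::vector<int64_t> dout_tz = {2, 1, 4};
std::vector<int64_t> dx_tz = {2, 3, 4};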
LGTM
… elementwises, reductions and expand_v2 ops (PaddlePaddle#43036)
* enabled md in elementwises, reductions and expand_v2
* CI fix for invalid numpy copy
* fixed formatting
* CI rerun
* changes after review
PR types: Others
PR changes: OPs
Describe: Memory descriptor enabled for elementwises, reductions and expand_v2 ops