Add Conv Transpose BF16 #30877

wozna · 2021-02-03T16:41:47Z

PR types

Others

PR changes

OPs

Describe

This PR:

changes the conv_transpose op MKLDNN kernel to use MKLDNNHandlerT
adds BF16 support for the conv_transpose op

paddle-bot-old · 2021-02-03T16:41:56Z

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

jczaja · 2021-02-04T10:01:52Z

paddle/fluid/operators/mkldnn/conv_transpose_mkldnn_op.cc

+      : platform::MKLDNNHandlerT<T, mkldnn::deconvolution_forward>(
+            dev_ctx, mkldnn_engine, cpu_place,
+            platform::CreateKey(dev_ctx, framework::vectorize(input->dims()),
+                                unique_name)) {
    const bool is_test = ctx.Attr<bool>("is_test");


In general I like this PR very much. The only thing missing is that inside ConvTransposeMKLDNNHandler you should call the isCached() method so that the MD is not created again. Please look at other ops implemented with MKLDNNHandlerT, such as pool and softmax.
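The caching pattern requested above can be pictured with a minimal standalone sketch. It does not use the real MKLDNNHandlerT API: FakePrimitiveDesc, BlobMap, SketchHandler, CreateDescriptor, and RunOnce are hypothetical names, and only the isCached() idea (skip descriptor creation when an object with the same key already exists) is taken from the comment.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a primitive descriptor that is costly to build.
struct FakePrimitiveDesc {
  int stride;
};

// Hypothetical stand-in for the per-device storage of cached objects.
using BlobMap =
    std::unordered_map<std::string, std::shared_ptr<FakePrimitiveDesc>>;

class SketchHandler {
 public:
  SketchHandler(BlobMap& blobs, std::string key)
      : blobs_(blobs), key_(std::move(key)) {
    // Pick up a descriptor stored by an earlier handler with the same key.
    auto it = blobs_.find(key_);
    if (it != blobs_.end()) pd_ = it->second;
  }

  bool isCached() const { return pd_ != nullptr; }

  // Builds and stores the descriptor; meant to be skipped when isCached().
  void CreateDescriptor(int stride) {
    assert(!isCached());
    pd_ = std::make_shared<FakePrimitiveDesc>(FakePrimitiveDesc{stride});
    blobs_[key_] = pd_;
  }

 private:
  BlobMap& blobs_;
  std::string key_;
  std::shared_ptr<FakePrimitiveDesc> pd_;
};

// Returns 1 if the descriptor had to be built, 0 if it was reused.
int RunOnce(BlobMap& blobs, const std::string& key) {
  SketchHandler handler(blobs, key);
  if (handler.isCached()) return 0;
  handler.CreateDescriptor(/*stride=*/2);
  return 1;
}
```

With one shared BlobMap, the first RunOnce call for a key builds the descriptor and later calls with the same key reuse it.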

arogowie-intel

In general there is a huge overlap with the oneDNN conv kernel (not surprisingly), so it would be good to have this common part in one place. But such refactoring is better left for another PR.

arogowie-intel · 2021-02-04T18:11:04Z

paddle/fluid/platform/mkldnn_reuse.h

@@ -253,6 +255,11 @@ class MKLDNNHandlerT {
        std::static_pointer_cast<dnnl::memory>(dev_ctx_.GetBlob(target_key));

    if (target_memory_p == nullptr) {
+      if (custom_func) {


If I understand correctly, even after this custom reorder the condition user_md != target_md may still be true, which would result in a second reorder. Is that intentional?

wozna · 2021-02-08

Yes, it is intentional. The custom reorder function is used to set an appropriate reorder for data_format. However, user_md and target_md may still differ in data type. In the case of bf16, the original weights (user_md) are float, while the computation (target_md) needs them in bf16, and they are converted in this second reorder.

arogowie-intel · 2021-02-08

OK, then if the additional thing you want to do is a data type conversion, I'd suggest creating a helper function for this task, explicitly named something close to ConvertMemDataType. With it, the intention of the control flow would be clear: first reorder the data, then convert the memory data type. This AcquireMemoryWithReorder function is already very complicated; its control flow is very hard to understand and, I suppose, hard to debug, not to mention its maintenance and testing. IMHO this function is doing too many things.

jczaja · 2021-02-10

@arogowie-intel The custom function was introduced because there was no NCWH format enum in oneDNN, so we needed to do the reorder ourselves. This is outdated: after we implemented the custom reorder, the relevant enum was added. So the actual task is to remove the custom reorder, and there is an issue for that in our tracker.
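The data type conversion discussed in this thread (float weights on the user side, bf16 on the target side) can be illustrated with a minimal standalone sketch. It is not how the oneDNN reorder is implemented. The names FloatToBf16, Bf16ToFloat, and ConvertWeightsToBf16 are hypothetical, and round-to-nearest-even is an assumed rounding mode. The sketch relies only on the fact that bfloat16 keeps the upper 16 bits of an IEEE-754 float32.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Converts one float32 value to bfloat16 bits (hypothetical helper).
// Assumption: round-to-nearest-even; NaN handling is omitted for brevity.
inline std::uint16_t FloatToBf16(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  // Add 0x7FFF plus the lowest kept bit so that ties round to even.
  const std::uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<std::uint16_t>((bits + rounding_bias) >> 16);
}

// Expands bfloat16 bits back to float32 by zero-filling the low 16 bits.
inline float Bf16ToFloat(std::uint16_t value) {
  const std::uint32_t bits = static_cast<std::uint32_t>(value) << 16;
  float out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

// Converts a whole weight buffer; the element order (layout) is untouched,
// only the data type changes.
inline std::vector<std::uint16_t> ConvertWeightsToBf16(
    const std::vector<float>& weights) {
  std::vector<std::uint16_t> out;
  out.reserve(weights.size());
  for (float w : weights) out.push_back(FloatToBf16(w));
  return out;
}
```

Keeping a conversion like this separate from the layout reorder matches the two-step control flow suggested in the thread: first reorder the data, then convert the data type.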

arogowie-intel · 2021-02-05T09:24:15Z

paddle/fluid/operators/mkldnn/conv_transpose_mkldnn_op.cc

    auto fwd_prop_kind = is_test ? mkldnn::prop_kind::forward_inference
                                 : mkldnn::prop_kind::forward_training;


There is already a check at the beginning forcing the is_test attribute to be true. So is this ternary operator needed?

wozna · 2021-02-08

We will be changing the operators anyway to support training, so I think it's worth leaving it.

arogowie-intel · 2021-02-05T10:39:36Z

paddle/fluid/operators/mkldnn/conv_transpose_mkldnn_op.cc

+          weights_tz, platform::MKLDNNGetDataType<K>(),
+          (g == 1) ? filter->format() : MKLDNNMemoryFormat::goihw);
+
+      // Custom Reorder from IOHW to OIHW


Doesn't oneDNN support such a reorder?

wozna · 2021-02-08

This is also related to group convolution, so we have to specify that format explicitly.

jczaja · 2021-02-10

@arogowie-intel This code reflects the difference between oneDNN and PaddlePaddle in how groups are implemented. In oneDNN, groups are another dimension: without groups the weights have shape OIHW, and when there is more than one group the shape becomes GOIHW (5-dimensional). In PaddlePaddle, both OIHW and GOIHW are expressed as 4D data: the weights of the second group are simply glued (concatenated) to the end of the weights of the first group. That is why, when groups are present, we cannot rely on the format stored in the tensor and need to change from OIHW to GOIHW.
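The relationship between the 4D and 5D views described above can be sketched in a few lines. This is only an illustration under the assumption that groups are concatenated along the leading dimension of a row-major buffer. ExpandGroupDim and FlatOffset are hypothetical helpers, not functions from PaddlePaddle or oneDNN.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Turns 4D dims [G*O, I, H, W] into 5D dims [G, O, I, H, W].
// Assumption: groups are concatenated along the leading dimension.
std::vector<std::int64_t> ExpandGroupDim(const std::vector<std::int64_t>& dims,
                                         std::int64_t groups) {
  assert(!dims.empty() && groups >= 1 && dims[0] % groups == 0);
  if (groups == 1) return dims;  // no extra dimension without groups
  std::vector<std::int64_t> out;
  out.reserve(dims.size() + 1);
  out.push_back(groups);
  out.push_back(dims[0] / groups);
  out.insert(out.end(), dims.begin() + 1, dims.end());
  return out;
}

// Row-major offset of a multi-dimensional index.
std::int64_t FlatOffset(const std::vector<std::int64_t>& dims,
                        const std::vector<std::int64_t>& index) {
  assert(dims.size() == index.size());
  std::int64_t offset = 0;
  for (std::size_t k = 0; k < dims.size(); ++k) {
    offset = offset * dims[k] + index[k];
  }
  return offset;
}
```

For example, with two groups the 4D index (5, 2, 1, 0) in dims {8, 3, 3, 3} and the 5D index (1, 1, 2, 1, 0) in dims {2, 4, 3, 3, 3} address the same element. Under this assumption the buffer itself does not change, only its description, which is why the format has to be stated explicitly.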

python/paddle/fluid/tests/unittests/mkldnn/test_conv2d_transpose_bf16_mkldnn_op.py

lidanqing-intel · 2021-02-10T11:07:04Z

@wozna

2021-02-10 02:15:11 ****************
2021-02-10 02:15:11 0. Unittest is not allowed to be disabled.
2021-02-10 02:15:11 You must have one RD (kolinwei(Recommend), or luotao1) approval for the usage of @unittest.skip or @unittest.skipIf.
2021-02-10 02:15:11 +@unittest.skipIf(not core.supports_bfloat16(),
2021-02-10 02:15:11 1. The error message you wrote in PADDLE_ENFORCE{_**} or PADDLE_THROW does not meet our error message writing specification. Possible errors include 1. the error message is empty / 2. the error message is too short / 3. the error type is not specified. Please read the specification [ https://github.com/PaddlePaddle/Paddle/wiki/Paddle-Error-Message-Writing-Specification ], then refine the error message. If it is a mismatch, please request chenwhql (Recommend), luotao1 or lanxianghit review and approve.
2021-02-10 02:15:11 The PADDLE_ENFORCE{_**} or PADDLE_THROW entries that do not meet the specification are as follows:
2021-02-10 02:15:11 PADDLE_ENFORCE_NE(input->format(), MKLDNNMemoryFormat::undef, + "Got wrong format for Input tensor.")); 
2021-02-10 02:15:11 2. Developers are not allowed to set the check_dygraph field directly, which is set to True by default. If you need to change the check_dygraph field, you must have one RD (phlrain (Recommend), fuyinno4 (Recommend for kunlun) or lanxianghit) review and approve. 
2021-02-10 02:15:11 The code that do not meet the specification are as follows:
2021-02-10 02:15:11  python/paddle/fluid/tests/unittests/mkldnn/test_conv2d_transpose_bf16_mkldnn_op.py : 
2021-02-10 02:15:11 +        self.check_output(check_dygraph=(self.use_mkldnn == False)) 
2021-02-10 02:15:11 There are 3 approved errors.
2021-02-10 02:15:11 ****************

jczaja

LGTM

arogowie-intel · 2021-02-12T12:44:46Z

Looks good. Just a general note on auto: it deduces a plain (non-reference) type even when it is initialized from a reference, which entails a copy. Please use auto& or const auto& whenever possible to avoid those copies.
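The note on auto can be checked with a small standalone sketch. Tracked, Holder, CopiesWithAuto, and CopiesWithConstAutoRef are hypothetical names introduced only to count copy-constructor calls. Nothing here comes from the PR code.

```cpp
#include <type_traits>

// Counts how many times the copy constructor runs.
struct Tracked {
  static int copies;
  Tracked() = default;
  Tracked(const Tracked&) { ++copies; }
};
int Tracked::copies = 0;

// Returns its member by const reference, like many tensor accessors do.
struct Holder {
  Tracked value;
  const Tracked& Get() const { return value; }
};

// Plain auto drops the reference and const, so the object is copied.
int CopiesWithAuto(const Holder& holder) {
  const int before = Tracked::copies;
  auto copy = holder.Get();
  static_assert(std::is_same<decltype(copy), Tracked>::value,
                "auto deduces the plain type");
  (void)copy;
  return Tracked::copies - before;
}

// const auto& binds to the existing object, so nothing is copied.
int CopiesWithConstAutoRef(const Holder& holder) {
  const int before = Tracked::copies;
  const auto& ref = holder.Get();
  static_assert(std::is_same<decltype(ref), const Tracked&>::value,
                "const auto& keeps the reference");
  (void)ref;
  return Tracked::copies - before;
}
```

For a cheap type the difference is negligible, but for objects that own buffers (for example a vector of dimensions) the extra copy costs an allocation on every kernel invocation.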

Add conv transpose BF16

e468746

wozna added BF16 Intel labels Feb 3, 2021

Share function GetWeightsTz

f73339b

jczaja reviewed Feb 4, 2021

arogowie-intel reviewed Feb 5, 2021

wozna added 2 commits February 8, 2021 16:18

Adjust to review and fix op compatibility

f1b1a11

Add bias to unique handler name

50ff881

Remove errors related to paddle enforce

4fa62a2

jczaja previously approved these changes Feb 11, 2021

wozna dismissed jczaja’s stale review via 2a8a832 February 16, 2021 10:45

Add conv2d_transpose to bf16 list and kernel refator

2a8a832

jczaja approved these changes Feb 17, 2021

luotao1 approved these changes Feb 18, 2021

luotao1 merged commit caf9d39 into PaddlePaddle:develop Feb 18, 2021

luotao1 mentioned this pull request Feb 25, 2021

Enable BF16 on Paddle Parameter Server Distributed Training #30560

Closed

wozna deleted the bf16_conv_trans branch February 24, 2023 16:08
