
Fall back when sparse arrays are passed to MKLDNN-enabled operators #11664

Merged: 24 commits into apache:master on Aug 24, 2018

Conversation

luobao-intel (Contributor)

Description

Currently, MKLDNN-enabled operators such as convolution and pooling can't handle sparse arrays correctly: the storage inference for these operators doesn't return the right dispatch mode. See #11448.
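
For illustration, here is a minimal sketch of the storage-inference shape this fix moves toward: dense CPU inputs take the MKLDNN path, anything else falls back. Helper names such as storage_type_assign follow MXNet conventions, but treat the snippet as an assumption rather than the exact patch.

```cpp
#include <algorithm>
#include <vector>

// Sketch only (MXNet-style; assumes mxnet internal headers are available):
// infer storage types for an MKLDNN-enabled operator.
static bool ConvStorageTypeSketch(const nnvm::NodeAttrs& attrs,
                                  const int dev_mask,
                                  DispatchMode* dispatch_mode,
                                  std::vector<int>* in_attrs,
                                  std::vector<int>* out_attrs) {
  // MKLDNN kernels only understand dense (default-storage) arrays.
  const bool all_dense =
      std::all_of(in_attrs->begin(), in_attrs->end(),
                  [](int s) { return s == kDefaultStorage; });
  if (dev_mask == mshadow::cpu::kDevMask && all_dense) {
    // Dense CPU inputs: dispatch to the MKLDNN FComputeEx path.
    return storage_type_assign(out_attrs, kDefaultStorage,
                               dispatch_mode, DispatchMode::kFComputeEx);
  }
  // Sparse inputs (or a non-CPU device): fall back, so the arrays are
  // converted to dense and the default implementation runs.
  return storage_type_assign(out_attrs, kDefaultStorage,
                             dispatch_mode, DispatchMode::kFComputeFallback);
}
```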

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, the expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Activation
  • Convolution
  • Deconvolution
  • LRN
  • Pooling
  • Softmax

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@pengzhao-intel @zheng-da @xinyu-intel

@@ -240,5 +240,135 @@ def check_batchnorm_training(stype):
for stype in stypes:
check_batchnorm_training(stype)

@with_seed()

@marcoabreu (Contributor) commented on Jul 12, 2018:

I don't think these tests are MKLDNN-specific; they should rather be put in test_operators or test_operators_sparse.
@eric-haibin-lin @haojin2

Contributor:

As we discussed earlier, we should still have such a test in place to exercise the fallback logic with USE_MKLDNN=1. Depending on the logic in InferStorage, we may need an extra test in test_sparse_operator/test_operator.

Contributor:

Right, but these tests can then be placed in the regular operator tests, where the fallback will be implicitly tested. I don't see any MKLDNN-specific code that would justify excluding them from running against the native MXNet or CUDA backends.

I had the same discussion in another PR from shufan (on mobile ATM). Would you mind joining in over there?

Contributor:

I definitely would like to; would you mind providing a pointer to that PR?

Contributor:

Thanks

@marcoabreu (Contributor):

Hello @luobao-intel, welcome to MXNet and thanks for your contribution!

@zheng-da (Contributor) commented on Jul 12, 2018:

This is the final fix for the fallback problem (function calls).
@luobao-intel (Contributor, Author):

The final commit deletes the incorrect test case I added earlier, and it follows the unified code structure of batchnorm. @zheng-da

}
if (dev_mask == mshadow::cpu::kDevMask && SupportMKLDNNAct(param))
return ElemwiseStorageType<1, 1, false, false, false>(
attrs, dev_mask, dispatch_mode, in_attrs, out_attrs);

Contributor:

If it's CPU and MKLDNN activation is supported, is dispatch_mode kFComputeEx?

@luobao-intel (Contributor, Author):

Thanks for reviewing again. @zheng-da

if (param.act_type != activation::kReLU) {
CHECK_EQ(in_attrs->size(), 3U);
ret = ElemwiseStorageType<3, 1, false, false, false>(
attrs, dev_mask, dispatch_mode, in_attrs, out_attrs);

Contributor:

I still have the same question here: can dispatch_mode be kFComputeEx? ElemwiseStorageType only uses kFComputeEx for sparse storage types.
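
For reference, the gist of what ElemwiseStorageType-style inference does (a simplified sketch based on my reading of MXNet's operator_common.h, not the exact template):

```cpp
// Sketch: the ElemwiseStorageType family only selects the kFComputeEx
// dispatch mode when a sparse storage type is involved; all-dense
// inputs stay on the plain kFCompute path.
bool has_sparse = false;
for (const int stype : *in_attrs)
  if (stype != kDefaultStorage) has_sparse = true;
*dispatch_mode = has_sparse ? DispatchMode::kFComputeEx
                            : DispatchMode::kFCompute;
```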

@zheng-da (Contributor):

Please benchmark the performance with this modification to make sure there is no performance regression.

@ZhennanQin (Contributor):

@marcoabreu, if cuDNN and MKLDNN are both used, the behavior is unchanged. This PR only covers MKLDNN-only scenarios.

@ZhennanQin (Contributor) commented on Aug 1, 2018:

@marcoabreu I did some investigation and found that Activation doesn't support FComputeEx&lt;GPU&gt;, so FInferStorageType should be used for MKLDNN only, just like the other ops. I refactored the code accordingly; please review. Thanks.
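
The "MKLDNN only" wiring described above typically amounts to guarding the attribute registration, along these lines (a sketch in MXNet's registration style; the exact lines in the patch may differ):

```cpp
#if MXNET_USE_MKLDNN == 1
// Only MKLDNN builds register the custom storage-type inference;
// other builds (including CUDA) keep the default dense inference.
NNVM_REGISTER_OP(Activation)
.set_attr<FInferStorageType>("FInferStorageType", ActivationStorageType);
#endif
```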

@sandeep-krishnamurthy added the Sparse, MKLDNN, Backend, and pr-awaiting-review labels on Aug 8, 2018.
@pengzhao-intel (Contributor):

@luobao-intel please help add the performance data.

@zheng-da @marcoabreu please take another look and merge if there are no further comments.

@luobao-intel (Contributor, Author):

The training and inference performance comparison used to verify that the fallback change causes no regression is shown below. The data was collected on commit 3ea67a7.
The training comparison:

| Topo | batchsize | 6148 (fallback) | 6148 (origin) |
| --- | --- | --- | --- |
| alexnet | 1 | 30.594143 | 31.317253 |
| alexnet | 2 | 41.044092 | 41.132786 |
| alexnet | 4 | 80.769897 | 81.800255 |
| alexnet | 8 | 133.454369 | 127.187418 |
| alexnet | 16 | 224.20456 | 221.552398 |
| alexnet | 32 | 302.540959 | 305.953074 |
| alexnet | 64 | 383.023917 | 330.386532 |
| alexnet | 128 | 447.751747 | 435.501388 |
| alexnet | 256 | 477.884483 | 474.643253 |
| vgg-16 | 1 | 9.848415 | 9.811869 |
| vgg-16 | 2 | 12.403094 | 12.305232 |
| vgg-16 | 4 | 17.652585 | 17.385885 |
| vgg-16 | 8 | 22.634356 | 20.478328 |
| vgg-16 | 16 | 27.60285 | 27.545312 |
| vgg-16 | 32 | 31.050534 | 30.916987 |
| vgg-16 | 64 | 33.113762 | 32.895624 |
| vgg-16 | 128 | 34.623087 | 34.604778 |
| vgg-16 | 256 | 35.412343 | 35.241626 |
| vgg-19 | 1 | 8.483369 | 8.561665 |
| vgg-19 | 2 | 10.670349 | 10.639146 |
| vgg-19 | 4 | 15.133063 | 15.205443 |
| vgg-19 | 8 | 19.301709 | 19.264988 |
| vgg-19 | 16 | 22.8069 | 23.004417 |
| vgg-19 | 32 | 25.658372 | 25.787008 |
| vgg-19 | 64 | 27.210528 | 27.148756 |
| vgg-19 | 128 | 28.396534 | 28.622843 |
| vgg-19 | 256 | 28.990824 | 28.944935 |
| inception-bn | 1 | 5.173929 | 5.113685 |
| inception-bn | 2 | 9.713608 | 9.612722 |
| inception-bn | 4 | 18.292555 | 17.625253 |
| inception-bn | 8 | 32.262617 | 32.123224 |
| inception-bn | 16 | 51.378436 | 50.281799 |
| inception-bn | 32 | 71.756796 | 77.19132 |
| inception-bn | 64 | 99.396415 | 101.789003 |
| inception-bn | 128 | 122.943234 | 125.025602 |
| inception-bn | 256 | 140.761579 | 140.559997 |
| inception-v3 | 1 | 3.647243 | 3.701843 |
| inception-v3 | 2 | 6.893006 | 7.003765 |
| inception-v3 | 4 | 12.292691 | 12.322165 |
| inception-v3 | 8 | 20.018688 | 20.248172 |
| inception-v3 | 16 | 29.733195 | 29.502392 |
| inception-v3 | 32 | 39.961187 | 39.015816 |
| inception-v3 | 64 | 46.334718 | 47.595459 |
| inception-v3 | 128 | 53.897898 | 54.346451 |
| inception-v3 | 256 | 58.247023 | 58.307715 |
| resnet-50 | 1 | 5.828596 | 5.758312 |
| resnet-50 | 2 | 9.999834 | 9.925494 |
| resnet-50 | 4 | 17.197613 | 16.971942 |
| resnet-50 | 8 | 27.336548 | 26.727552 |
| resnet-50 | 16 | 39.047838 | 37.635901 |
| resnet-50 | 32 | 51.535267 | 50.304822 |
| resnet-50 | 64 | 60.800547 | 59.694137 |
| resnet-50 | 128 | 68.016266 | 66.886088 |
| resnet-50 | 256 | 73.783432 | 72.118631 |
| resnet-152 | 1 | 2.073641 | 2.020319 |
| resnet-152 | 2 | 3.745274 | 3.577454 |
| resnet-152 | 4 | 6.488948 | 6.380499 |
| resnet-152 | 8 | 10.482311 | 10.37387 |
| resnet-152 | 16 | 15.354603 | 14.47856 |
| resnet-152 | 32 | 19.90136 | 19.666869 |
| resnet-152 | 64 | 25.205104 | 24.286143 |
| resnet-152 | 128 | 28.272118 | 28.342906 |
| resnet-152 | 256 | 30.751341 | 29.938136 |

The inference comparison:

| Topo | batchsize | 6148 (fallback) | 6148 (origin) |
| --- | --- | --- | --- |
| alexnet | 1 | 230.609201 | 272.974964 |
| alexnet | 2 | 459.347717 | 418.977067 |
| alexnet | 4 | 562.738327 | 500.14998 |
| alexnet | 8 | 757.427326 | 683.123778 |
| alexnet | 16 | 995.188787 | 1010.835509 |
| alexnet | 32 | 1236.91575 | 1220.31215 |
| alexnet | 64 | 1386.149118 | 1310.73248 |
| alexnet | 128 | 1492.969058 | 1059.182116 |
| alexnet | 256 | 1516.787892 | 1502.325897 |
| vgg-16 | 1 | 64.936414 | 62.079648 |
| vgg-16 | 2 | 84.860327 | 82.680198 |
| vgg-16 | 4 | 88.850584 | 89.668909 |
| vgg-16 | 8 | 98.573954 | 98.142158 |
| vgg-16 | 16 | 114.243305 | 114.564623 |
| vgg-16 | 32 | 117.061494 | 116.955362 |
| vgg-16 | 64 | 116.544261 | 119.016377 |
| vgg-16 | 128 | 117.818935 | 119.549992 |
| vgg-16 | 256 | 120.201841 | 119.661409 |
| vgg-19 | 1 | 56.805177 | 54.132415 |
| vgg-19 | 2 | 71.964449 | 70.584329 |
| vgg-19 | 4 | 77.130441 | 75.984336 |
| vgg-19 | 8 | 84.741283 | 83.674499 |
| vgg-19 | 16 | 94.627382 | 94.114408 |
| vgg-19 | 32 | 96.506882 | 95.726291 |
| vgg-19 | 64 | 95.06602 | 96.789072 |
| vgg-19 | 128 | 97.67915 | 96.973608 |
| vgg-19 | 256 | 97.465934 | 96.771837 |
| inception-bn | 1 | 103.438399 | 103.472951 |
| inception-bn | 2 | 166.834318 | 161.873053 |
| inception-bn | 4 | 255.293207 | 250.687353 |
| inception-bn | 8 | 340.244836 | 333.390418 |
| inception-bn | 16 | 410.579713 | 406.384 |
| inception-bn | 32 | 452.503148 | 447.816195 |
| inception-bn | 64 | 490.925741 | 467.791765 |
| inception-bn | 128 | 493.556792 | 497.755161 |
| inception-bn | 256 | 504.554601 | 483.210319 |
| inception-v3 | 1 | 51.675768 | 51.394543 |
| inception-v3 | 2 | 84.020412 | 83.020656 |
| inception-v3 | 4 | 117.211207 | 118.202891 |
| inception-v3 | 8 | 145.685021 | 144.755562 |
| inception-v3 | 16 | 164.314623 | 161.190974 |
| inception-v3 | 32 | 181.601478 | 177.47979 |
| inception-v3 | 64 | 187.68821 | 183.350203 |
| inception-v3 | 128 | 187.519656 | 186.671198 |
| inception-v3 | 256 | 190.363415 | 185.131233 |
| resnet-50 | 1 | 74.352261 | 71.926143 |
| resnet-50 | 2 | 105.118596 | 100.971862 |
| resnet-50 | 4 | 136.682507 | 129.251114 |
| resnet-50 | 8 | 160.903148 | 150.033564 |
| resnet-50 | 16 | 186.371277 | 184.590927 |
| resnet-50 | 32 | 210.353603 | 197.494244 |
| resnet-50 | 64 | 220.03369 | 205.50499 |
| resnet-50 | 128 | 222.752326 | 209.56644 |
| resnet-50 | 256 | 220.268782 | 202.993866 |
| resnet-152 | 1 | 31.555538 | 30.415254 |
| resnet-152 | 2 | 44.313238 | 41.896732 |
| resnet-152 | 4 | 56.402015 | 53.602173 |
| resnet-152 | 8 | 66.893727 | 61.240171 |
| resnet-152 | 16 | 79.20712 | 73.083458 |
| resnet-152 | 32 | 83.782056 | 82.280132 |
| resnet-152 | 64 | 93.311579 | 85.641401 |
| resnet-152 | 128 | 93.520913 | 87.348002 |
| resnet-152 | 256 | 89.06415 | 83.397316 |

@zheng-da (Contributor):

Thanks for providing the performance results.

@pengzhao-intel (Contributor):

@marcoabreu any other comments? Could you help merge?
It's a usability improvement, and I think it should go into 1.3.

NDArray data = inputs_[i];
inputs.emplace_back(data.shape(), ctx, false, data.dtype());
if (data.IsMKLDNNData() && data.IsView())
data = data.Reorder2Default();

Contributor:

Why do you remove the code here? I think the original code is correct.
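
For context, what the two calls in question do (as I understand MXNet's NDArray API; a sketch, not the patch itself):

```cpp
// A view of an array held in an MKLDNN-internal layout cannot be used
// like an ordinary slice, so the original code reorders it back to the
// default (plain dense) layout before it is consumed.
if (data.IsMKLDNNData() &&  // payload stored in an MKLDNN-specific layout
    data.IsView())          // and this NDArray is a view/slice of it
  data = data.Reorder2Default();  // copy back to the default dense layout
```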

if (is_excluded.get(attrs.op, false)) {
LOG(WARNING) << attrs.op->name << " not checked. TExcludeMKLDNNDebug flag present";
return;
}

Contributor:

Why is this removed?

Contributor (Author):

Sorry, I didn't mean to remove this code; it was removed by a bad merge. I'll fix it.

@zheng-da (Contributor):

@azai91 could you please review the code?

@piiswrong piiswrong merged commit 15e43c0 into apache:master Aug 24, 2018
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018
…pache#11664)

* softmax_fallbach

* Fallback Amend
This is the final rectify for fallback problem(functions call)

* Lint amend

* test_try

* Patch for test fail

* Pooling amend

* Delete non_rectified_operation_test

* fallback_normal

* Fixed_dispatch

* activation-amend

* activation second

* activation backward

* activate_try

* activation_debug

* Act change.

* test_random

* mkldnn choice

* format_modify

* rebase
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request Sep 19, 2018
…pache#11664)
