LRN: caching OP and pass workspace from FW to BW #15
Conversation
This commit may add some overhead from managing an NDArray for each fallback.
Conflicts: src/operator/nn/mkldnn/mkldnn_batch_norm-inl.h
2. Add memory into the signature; 3. Try to split BatchNorm into a .h file and a .cc file. Will finish this after the backward code is refactored.
Caching primitive for BatchNorm forward computation
Add primitive caching for Pooling forward computation
OP primitive cache: use memory as signature for MKLDNN storage type
src/operator/nn/lrn.cc
@@ -42,6 +50,22 @@ static bool LRNShape(const nnvm::NodeAttrs& attrs,
   out_shape->clear();
   out_shape->push_back(dshape);
   out_shape->push_back(dshape);
 #if MXNET_USE_MKLDNN == 1
   // Create LRN primitive for getting the workspace size
   CHECK_EQ(dshape.ndim(), 4U);
Does MXNet LRN always run on 4D arrays? We should provide full compatibility with the original MXNet operator.
In general, MKL-DNN is specific to deep learning, so the default input tensor is 4D for all OPs. Currently, the 4D shape is fully supported by MKL-DNN, and 2D can also work for some OPs.
http://01org.github.io/mkl-dnn/group__c__api__lrn.html
A 2D shape here is not very computation-intensive, so using MKL-DNN for it would introduce extra overhead.
I suggest we only enable the 4D calculation for MKL-DNN, along the lines of the sketch below.
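A minimal sketch of what such a gate could look like; SupportMKLDNNLRN is an illustrative helper name (not necessarily what this PR uses), and the fp32 restriction mirrors the current hard-coded type:

```cpp
// Gate the MKL-DNN LRN path on 4D (NCHW) fp32 input; anything else falls back
// to the stock MXNet CPU implementation.
static inline bool SupportMKLDNNLRN(const NDArray &input) {
  return input.shape().ndim() == 4 && input.dtype() == mshadow::kFloat32;
}
```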
                            static_cast<int>(dshape[1]),
                            static_cast<int>(dshape[2]),
                            static_cast<int>(dshape[3])};
   auto src_md = memory::desc({ src_tz_ }, memory::data_type::f32,
You use f32 here. Will it be a problem when MKL-DNN supports more types?
f64 can also work for MKL-DNN. The problem is that LRNShape can't get the data type information in the symbolic stage. In the short term, I could allocate memory for both fp32 and fp64 here and select one at runtime, but the unused one would be wasted. In the long term, an InferShape function should provide the data type and more information even in the symbolic stage. A possible runtime selection is sketched below.
What's your opinion?
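A minimal sketch of dispatching on the real dtype once it is visible at primitive-creation time (assumes `using namespace mkldnn` as in the surrounding file; GetMKLDNNDataType and the nchw format choice are illustrative assumptions):

```cpp
// Map an MXNet dtype to an MKL-DNN data type instead of hard-coding f32.
static memory::data_type GetMKLDNNDataType(int dtype) {
  switch (dtype) {
    case mshadow::kFloat32:
      return memory::data_type::f32;
    // Further cases can be added here once MKL-DNN exposes more types.
    default:
      LOG(FATAL) << "Unsupported data type for MKL-DNN LRN";
      return memory::data_type::data_undef;
  }
}

// Then, instead of hard-coding f32 in the descriptor:
auto src_md = memory::desc({ src_tz_ }, GetMKLDNNDataType(dtype),
                           memory::format::nchw);
```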
src/operator/nn/lrn.cc
   int n_out = 2;
 #endif
   out_type->clear();
   for (int i = 0; i < n_out; ++i) out_type->push_back(dtype);
Is the dtype for the workspace correct?
I just checked the type of the workspace and it's FP32, but the better way is to query the data type through the MKL-DNN API:
xxx.get_primitive_desc().desc().data.data_type
Will update the code; a sketch of the query is below.
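A small sketch of that query against the MKL-DNN 0.x C++ API (fwd_pd is the LRN forward primitive descriptor from the surrounding code):

```cpp
// Query the workspace data type from the primitive descriptor rather than
// assuming FP32.
auto ws_pd = fwd_pd.workspace_primitive_desc();
auto ws_type = static_cast<memory::data_type>(ws_pd.desc().data.data_type);
CHECK_EQ(ws_type, memory::data_type::f32);  // currently FP32 in practice
```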
   if (this->is_train) {
     if (workspace == nullptr) {
       this->ws_mem.reset(new mkldnn::memory(fwd_pd.workspace_primitive_desc()));
Why do you need to create one when workspace is null? If it's null, shouldn't that mean a workspace isn't required?
No. Two points:
- The workspace is mandatory for LRN training, so we should always have a workspace.
- If it's null, it means we are using this class in a truly stateless way; we don't pass any dependency information between FW and BW or across time steps.
Both paths are sketched below.
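A sketch of the two paths described above (member and variable names are illustrative and follow the diff, not necessarily the final code):

```cpp
if (this->is_train) {
  if (workspace == nullptr) {
    // Stateless path: no workspace NDArray is passed in, so allocate a private
    // one internally; the LRN primitive still needs it to execute.
    this->ws_mem.reset(new mkldnn::memory(fwd_pd.workspace_primitive_desc()));
  } else {
    // Stateful path: bind the caller-provided workspace so backward can read
    // exactly what forward produced.
    this->ws_mem.reset(new mkldnn::memory(fwd_pd.workspace_primitive_desc(),
                                          workspace->data().dptr_));
  }
}
```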
 static MKLDNNLRNFwd &GetLRNFwd(const LRNParam& param,
                                const OpContext &ctx,
                                const NDArray &in_data,
                                const NDArray *workspace) {
indent
Sure.
116b4f0 to b566556 (compare)
Thanks for your comments and good suggestions.
Given the current interface, I think the best solution for the workspace is to use stateful compute. The nnvm interface provides such an option.
@zheng-da I agree with you about the stateful OP.
I will use the new code and re-submit; a rough sketch of the stateful registration is below.
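For reference, a rough sketch of the stateful-compute direction using the nnvm registration attributes. FCreateOpState and FStatefulComputeEx are existing MXNet attribute names, but LRNOpState, CreateLRNState, and LRNStatefulComputeExCPU are hypothetical names for illustration, not the final implementation:

```cpp
// The op state object would own the cached MKL-DNN primitives and the
// workspace memory, so forward can hand the workspace to backward without
// exposing it as an extra output.
static OpStatePtr CreateLRNState(const nnvm::NodeAttrs &attrs,
                                 Context ctx,
                                 const std::vector<TShape> &in_shapes,
                                 const std::vector<int> &in_types) {
  const LRNParam &param = nnvm::get<LRNParam>(attrs.parsed);
  return OpStatePtr::Create<LRNOpState>(param);
}

NNVM_REGISTER_OP(LRN)
.set_attr<FCreateOpState>("FCreateOpState", CreateLRNState)
.set_attr<FStatefulComputeEx>("FStatefulComputeEx<cpu>", LRNStatefulComputeExCPU);
```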
* update tests. * fix shape/dtype/storage inference. * fix.
* Added tutorial for FIT API
* Added tests for Fit API tutorial
* Updated index.md for the new tutorial to show up
* Addressed PR feedback
* Addressed PR feedback
* Removed spurious comment for Py2 and Py3 compatibility
* Address PR feedback
* Addressed PR feedback
* Fixed typo
* Added example to showcase custom event handler
* Fixed imports as estimator moved to contrib package
* Added a side note to inform about estimator reference being updated by the handlers
* Corrected typo
* update tutorial
* address comments
* new line
* fix import
* fix cached graph
* fix import
* address comments
* fix doc gen
* add softmax
* add to website index
* fix doc string
* Fix doc gen (zheng-da#12)
* fix warining
* fix test
* fix
* fix
* fix print
* fix test (zheng-da#13)
* fix warning (zheng-da#14)
* fix href (zheng-da#15)
Description
Refine the code structure of LRN, caching the OP primitive and memory.
Performance improves from 200 img/sec to 230 img/sec for BS=1 on BDW 2699 v4.
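An illustrative sketch of the primitive-caching pattern this describes: a thread-local map keyed by a signature built from the LRN parameters and the input, so the cached MKLDNNLRNFwd (primitive plus workspace) is reused across iterations. MKLDNNLRNSignature, OpHash, and the MKLDNNLRNFwd constructor follow the naming style of the MKL-DNN integration but are assumptions here, not the exact classes in this PR:

```cpp
#include <unordered_map>

static MKLDNNLRNFwd &GetLRNFwd(const LRNParam &param,
                               const OpContext &ctx,
                               const NDArray &in_data,
                               const NDArray *workspace) {
  // One cache per thread; the key captures everything that changes the
  // primitive (parameters, train/inference mode, input shape/memory).
  static thread_local std::unordered_map<MKLDNNLRNSignature, MKLDNNLRNFwd,
                                         OpHash> lrn_fwds;
  MKLDNNLRNSignature key(param);
  key.AddSign(ctx.is_train);
  key.AddSign(in_data);

  auto it = lrn_fwds.find(key);
  if (it == lrn_fwds.end()) {
    // Cache miss: build the forward primitive once and keep it for reuse.
    MKLDNNLRNFwd fwd(param, ctx.is_train, in_data);
    it = lrn_fwds.insert(std::make_pair(key, fwd)).first;
  }
  // The workspace pointer is then bound to the cached fwd object before
  // execution, as in the diff shown earlier in the conversation.
  return it->second;
}
```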
Checklist
Essentials
- make lint
Changes
Comments