Add NHWC layout support to Pooling (cuDNN only) #13362
Conversation
@mxnet-label-bot add [pr-awaiting-review]
7df5b10 to 55ebfb7 (Compare)
Thanks for the contribution, guys. FYI: I'm keeping an eye on this one and will do my best to get it past CI. We had a few CI issues that this PR looks to have been caught up in. If you have a chance, rebasing might help.
I took care of it :) I actually just wanted to show up as a contributor of this PR :P Just kidding, I hope everything is fine now.
Thanks for the contribution @DickJC123 and @ptrendx !
@ptrendx can you please add more details in the description explaining what this PR does and the motivation behind it (why it is being done)? Also, if you can add any required background information, it would be helpful.
@@ -73,15 +73,17 @@ class CuDNNPoolingOp {
     CUDNN_CALL(cudnnDestroyPoolingDescriptor(pooling_desc_));
   }

-  void Forward(const OpContext &ctx, const TBlob &in_data,
+  // Return boolean saying whether pooling configuration is supported.
+  bool Forward(const OpContext &ctx, const TBlob &in_data,
Is this boolean return necessary? I don't find it in other operators. What happens if it returns false?
   }

-  void Backward(const OpContext &ctx, const TBlob &out_grad,
+  // Return boolean saying whether pooling configuration is supported
+  bool Backward(const OpContext &ctx, const TBlob &out_grad,
Same here.
   }

  private:
-  inline void Init(mshadow::Stream<gpu> *s, const TBlob &in_data,
+  // Return boolean saying whether pooling configuration is supported
+  inline bool Init(mshadow::Stream<gpu> *s, const TBlob &in_data,
Same here
    int window_width = param_.global_pool ? dshape_nchw[3] : param_.kernel[1];
    // CuDNN v7.1.4 backprop kernel doesn't support window sizes 9 and above.
#if CUDNN_VERSION == 7104
    is_supported = window_height <= 8 && window_width <= 8;
Could you please add a reference to this?
Can we move this check earlier so this function can have an early exit if not supported?
Reference to this - do you mean the cuDNN 7.1.4 bug? Not sure where I should put it; it's mentioned here, in the fixed issues: https://docs.nvidia.com/deeplearning/sdk/cudnn-release-notes/rel_721.html#rel_721
Yes, if it does not violate any NVIDIA license rules, it would be great to add the link here as a comment.
Ok, will do.
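For reference, a minimal sketch of what that comment could look like next to the version guard (the comment wording is only a suggestion; the surrounding variable names follow the diff above):

// cuDNN 7.1.4's pooling backprop kernel doesn't support window sizes 9 and above;
// the issue is listed among the fixed issues in the cuDNN 7.2.1 release notes:
// https://docs.nvidia.com/deeplearning/sdk/cudnn-release-notes/rel_721.html#rel_721
#if CUDNN_VERSION == 7104
    is_supported = window_height <= 8 && window_width <= 8;
#endif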
Many thanks for your contribution. I added some comments. Feel free to ping me if you have any questions.
-          data.shape_[1],
-          data.shape_[2],
-          data.shape_[3]));
+          dshape_nchw[0],
Maybe define enum constants N, C, H, W instead of 0, 1, 2, 3 to improve readability?
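A minimal C++ sketch of the kind of named indices being suggested (the enum name and identifiers are illustrative, not part of the PR):

// Hypothetical named axes for an NCHW-ordered shape, replacing the bare
// literals 0, 1, 2, 3 used when indexing dshape_nchw.
enum NCHWAxis { kAxisN = 0, kAxisC = 1, kAxisH = 2, kAxisW = 3 };

// e.g. dshape_nchw[kAxisN], dshape_nchw[kAxisC], dshape_nchw[kAxisH], dshape_nchw[kAxisW]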
    int window_width = param_.global_pool ? dshape_nchw[3] : param_.kernel[1];
    // CuDNN v7.1.4 backprop kernel doesn't support window sizes 9 and above.
#if CUDNN_VERSION == 7104
    is_supported = window_height <= 8 && window_width <= 8;
Can we move this check earlier so this function can have an early exit if not supported?
Adding @sandeep-krishnamurthy for review.
Have you ever tried to pass NHWC input into MKL-DNN pooling? It seems it's not well handled here.
The new parameter
Do we have any other operator exposing layout information to the frontend user? Does NDArray know its own layout information?
@@ -673,7 +673,7 @@ def __init__(self, channels, kernel_size, strides=(1, 1, 1), padding=(0, 0, 0),
 class _Pooling(HybridBlock):
     """Abstract class for different pooling layers."""
     def __init__(self, pool_size, strides, padding, ceil_mode, global_pool,
-                 pool_type, count_include_pad=None, **kwargs):
+                 pool_type, layout, count_include_pad=None, **kwargs):
Do we need to expose this option to the user? Is there any use case?
I'm not sure I understand - that option was already exposed to users, e.g. see here: https://mxnet.incubator.apache.org/api/python/gluon/nn.html#mxnet.gluon.nn.MaxPool1D
It just was not passed up to the base class and to C++ because pooling supported only one layout.
@@ -53,6 +53,7 @@ struct PoolingParam : public dmlc::Parameter<PoolingParam> {
   bool cudnn_off;
   dmlc::optional<int> p_value;
   dmlc::optional<bool> count_include_pad;
+  dmlc::optional<int> layout;
Why add this option only to pooling?
Convolutions already have it (the layout parameter), and BatchNorm has a notion of axis as well.
@ptrendx It would be great if you could add to the PR description why we need this option. Thanks!
@TaoLv
@apeforest @Vikas89 Not sure I understand the question. Are you asking why we need NHWC support for pooling?
@apeforest @Vikas89 Not sure I understand the question. Are you asking why we need NHWC support for pooling?

@ptrendx I want to make sure that the PR description is descriptive enough. For your case: why we need NHWC support for pooling, what benefits we get with this change, and what testing was done, with numbers if any.
I think a good PR description will help the developer community understand the PR, and will also help when referring to the PR in the future.
I would request you to please spend some time writing a good PR description explaining what was done, why it was done (motivation), and what testing was done, if any. That would be very helpful to community developers.
Thank you for this contribution.
python/mxnet/gluon/nn/conv_layers.py (Outdated)
@@ -738,12 +741,13 @@ class MaxPool1D(_Pooling):
     """
     def __init__(self, pool_size=2, strides=None, padding=0, layout='NCW',
                  ceil_mode=False, **kwargs):
-        assert layout == 'NCW', "Only supports 'NCW' layout for now"
+        assert layout in ('NCW', 'NWC'),\
+            "Only NCW and NWC layouts are valid for 1D"
nit: 1D Pooling?
Same comment across 2D and 3D
Will do.
                          req[0], outputs[0])) {
        return;
      }
      break;
Should we handle the else case here?
No - if Backward returns false, the program will exit the switch statement and proceed to the non-cuDNN implementation.
@with_seed()
@assert_raises_cudnn_not_satisfied(min_version='7.0.1')
Should it rather skip this test if the cuDNN version is < 7.0.1?
Hmm, skipping this test if the cuDNN version is not satisfied was more or less my understanding of what this decorator does. Is that wrong? What should I do here instead?
This should be fine. The decorator checks if an exception is raised when cudnn version is less than 7.0.1 and you run the test. For cudnn versions greater than 7.0.1 it should just work like other tests.
@Vikas89 I took a stab at modifying the PR description to include the motivation and the testing that we did.
@TaoLv, providing details and motivation for the PR (partially duplicating the info just added by @ptrendx): In MXNet, layout is not something that is stored with the NDArray. Some operators, like pointwise ones, don't even care about the layout and will produce the same output regardless of it. Other operators, like Convolution, BatchNorm and Pooling, need to be told the layout.

Convolution supports a limited number of layouts via the 'layout' parameter, e.g. layout='NHWC'. BatchNorm doesn't need to know everything about the layout, just which dimension is the 'C' dimension; for this, the BatchNorm op accepts the axis parameter, e.g. axis=3 for NHWC batchnorm. Prior to this PR, the Pooling operator in MXNet did not have a parameter to specify the layout, forcing a transposition always to NCHW. We have two goals with this PR:
@apeforest Re: the bool return of Forward(), Backward(), etc. As you know, MXNet has been moving to a more imperative style of processing. Before that, all operators were subclasses of Operator, which defined Forward() and Backward() as returning void. Now, operators like Pooling are their own class and can define Forward() and Backward() as they desire.

In addition, with every operator we face the chore of selecting the best implementation (cudnn, cuda, mkldnn, etc.). Rather than code the selection logic in a central place, I recommend each implementation be asked (in a favored order) whether it supports the operation, given its parameters. With the cudnn convolution operator back in the "pre-imperative era", I added a Supports(..., param, ...) method. Sadly, this tended to repeat a lot of the checks and analysis done by the actual Forward() and Backward() calls. Rather than adding a repetitive Supports() method to Pooling, I realized how much cleaner it was to just have Forward() or Backward() return a flag indicating whether the function could be performed. This allows the implementation selector to cleanly fall back to another implementation.
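To illustrate the fallback pattern described above, here is a minimal, self-contained C++ sketch; the class and function names are invented stand-ins, not the actual MXNet code, and the warning message follows a later review suggestion:

#include <iostream>

// Stand-in for the cuDNN-backed implementation: Forward() returns false when
// the requested configuration is unsupported, mirroring the bool return type
// added in this PR.
struct CudnnPoolingStub {
  bool Forward(int window_size) {
    if (window_size > 8) return false;  // e.g. the cuDNN 7.1.4 window-size limit
    std::cout << "ran cuDNN pooling\n";
    return true;
  }
};

// Stand-in for the generic (non-cuDNN) implementation.
struct GenericPoolingStub {
  void Forward(int window_size) { std::cout << "ran generic pooling\n"; }
};

// The selector tries the preferred implementation first and falls back cleanly,
// instead of duplicating the support checks in a separate Supports() method.
void DispatchForward(int window_size) {
  CudnnPoolingStub cudnn;
  if (cudnn.Forward(window_size)) return;
  std::cout << "warning: configuration not supported by cuDNN, falling back\n";
  GenericPoolingStub().Forward(window_size);
}

int main() {
  DispatchForward(3);   // handled by the cuDNN path
  DispatchForward(12);  // falls back to the generic path
  return 0;
}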
@DickJC123 Good to know that. Could you explain a bit more about when and where this kind of transposition happens? With this new feature, what's the typical usage in a user model? Does the user need to specify the layout in each layer and each operator?
I love that approach to operator selection, Dick!
The forced transposition I referred to is currently done by the user in the Python model. With this new feature, the Transpose operators could be removed in favor of a Pooling with the new layout parameter. I'm a bit sorry that NDArrays don't carry the layout information. As a result, the layout has to travel along with the data ('out-of-band') and be passed into the operators along the way. On the other hand, having multi-input operators deal with inconsistent layouts would be a headache. Anyway, that's a discussion to be debated in its own thread.
Sure. It's a problem of MXNet, not this PR. Do you mind documenting somewhere what the output layout looks like? Then users can know what they need to do before sending the output to the next layer.
@DickJC123 Thanks for your detailed explanation of the operator selection logic. I like the way you used a boolean return type for the Forward() and Backward() methods to choose the right operator implementation. Although it might be an elegant way, this method is not yet well received in the developer community, and the other operators do not currently implement it this way. I am afraid that using it alone in this operator without a clear guideline may cause more confusion for prospective developers. Can you still follow the traditional way of declaring Forward() and Backward(), and maybe propose this in a separate thread to refactor Forward() and Backward() with a boolean return type?
Tested this PR on CPU w/ and w/o MKL-DNN and got an error message as below.
w/ MKL-DNN:
I would prefer to leave the booleans in. If there is no harm in taking that approach, I'd rather go down that path and use this operator implementation as an example for others rather than waiting for a saint who's going to refactor everything - because that won't happen. Dick's designs have often greatly improved the architecture or usability of MXNet, so I really appreciate these things and definitely would not like to see these improvements blocked because some people can't adapt that fast. This project has had some sub-par design decisions that mainly involved choosing preprocessor statements instead of proper class design and abstraction. This PR is a good step away from that, and I think everybody should support this path. I feel strongly about having this new method in as long as it doesn't have any drawbacks. If there are any, I'm happy to revisit my decision.

P.S. I'm not a fan of carrying the data layout outside the data structure, but that's also something we can address in the future with a proper class design for NDArray and operators, by allowing operators to define their favorite layout and by providing automatic converters on the engine side. If the layout doesn't match (as is usually the problem for cuDNN and MKL-DNN), it could then be automatically converted into the appropriate format. This PR is a step in the right direction, but it can't solve all problems at once. We should appreciate these improvements, and I'm sure Dick - or somebody else - will come up with a good design and then provide a reference implementation.
Thanks for the test, Tao! Can you check why that was not caught by our CI? What are we missing?
It seems there is no negative test for this feature. Only GPUs with certain versions of cuDNN will be tested.
Great catch! Do you mind adding a test that runs on both CPU and GPU? I'm still having trouble understanding why we are exposing the backend (whether it's cuDNN or MKL-DNN) to the frontend and making conditional tests. The tests should work with any backend (except if there is no valid implementation available). In an ideal world, we would just call the operator and the MKL-DNN/cuDNN magic would be done by the backend. I think this new structure is a good step towards that. Can we already add a generalized test that works on both CPU and GPU at this stage to catch Tao's error?
@marcoabreu The harm of leaving this boolean return type in this PR is that it will cause confusion to developers and also leave unclear messages to MXNet users. E.g. when it returns false, what happens in the Python frontend - is there any clear message? If this is meant to signal the frontend that this operator implementation is not supported on a particular platform, we need a consistent log message (in fact, we already achieve this via a different way in https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/mkldnn/mkldnn_pooling.cc#L53). Falling back to the conventional
@@ -92,6 +93,16 @@ struct PoolingParam : public dmlc::Parameter<PoolingParam> {
               "calculation. For example, with a 5*5 kernel on a 3*3 corner of a image,"
               "the sum of the 9 valid elements will be divided by 25 if this is set to true,"
               "or it will be divided by 9 if this is set to false. Defaults to true.");
+
+    DMLC_DECLARE_FIELD(layout)
Where is the enum for NWC defined? I see it is used in multiple places in other files like pooling.cc.
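Those constants come from mshadow's layout flags (the diff uses them as mshadow::kNCW, mshadow::kNCHW, mshadow::kNHWC, and so on). Roughly, and only as a paraphrase rather than the verbatim header, the definition has this shape:

namespace mshadow {
// Layout flags for 1-D, 2-D and 3-D tensors; the authoritative definition
// (including the actual enumerator values) lives in the mshadow headers.
enum LayoutFlag {
  kNCHW, kNHWC,    // 2-D layouts
  kNCW, kNWC,      // 1-D layouts
  kNCDHW, kNDHWC   // 3-D layouts
};
}  // namespace mshadow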
                         param_.global_pool ? 1 : param_.stride[0],
                         param_.global_pool ? 1 : param_.stride[1]));
#endif
    } else {
      CHECK(param_.layout.value() == mshadow::kNCDHW ||
            param_.layout.value() == mshadow::kNDHWC) << "Need 3D layout";
      CHECK(param_.layout.value() == mshadow::kNCDHW) << "Only the NCDHW layout is supported.";
If only the NCDHW layout is supported, it seems to me that there is no need to have the above "Need 3D layout" check.
Besides that, I am also a little bit confused here. In conv_layers.py, it is mentioned that "Dimension ordering of data and weight. Only supports 'NCDHW' and 'NDHWC' (only with cuDNN) layouts for now." for all 3D pooling classes. Why do we only support NCDHW here? This is the CuDNNPoolingOp class, right? Am I missing something?
Since the reason for the implementation is to fall back cleanly, and since this is a helper class for cuDNN pooling, it looks fine. @apeforest has a good point about the error message though. When we fall back to the non-cuDNN implementation, can it provide an error message with LOG(WARNING) explaining why it fell back when is_supported is false?
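A hedged sketch of what that could look like at the fallback point (the message wording is only a suggestion, not code from this PR; op, ctx, in_data, req and out_data refer to the surrounding dispatch context):

    if (!op.Forward(ctx, in_data, req, out_data)) {
      // Tell the user why the cuDNN path was not taken before dispatching
      // to the non-cuDNN implementation.
      LOG(WARNING) << "cuDNN pooling does not support this configuration "
                   << "(layout/kernel size/cuDNN version); falling back to "
                   << "the default MXNet implementation.";
    }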
#if CUDNN_MAJOR >= 5
    nan_prop_ = CUDNN_NOT_PROPAGATE_NAN;
#endif
    if (param_.kernel.ndim() == 2) {
      // 2d conv
      CHECK(param_.layout.value() == mshadow::kNCHW ||
            param_.layout.value() == mshadow::kNHWC) << "Need 2D layout";
nit: can we mention, "Need 2D layout: NCHW or NHWC"
                         param_.global_pool ? 1 : param_.stride[0],
                         param_.global_pool ? 1 : param_.stride[1]));
#endif
    } else {
      CHECK(param_.layout.value() == mshadow::kNCDHW ||
            param_.layout.value() == mshadow::kNDHWC) << "Need 3D layout";
Is this needed?
@@ -149,6 +162,9 @@ class PoolingOp {
   void Forward(const OpContext& ctx, const TBlob& in_data,
                const OpReqType& req, const TBlob& out_data) {
     using namespace mshadow;
+    CHECK(param_.layout.value() == kNCW ||
+          param_.layout.value() == kNCHW ||
+          param_.layout.value() == kNCDHW) << "Need CuDNN for layout support";
Can we also output the layout:
"Need CuDNN for layout support" << param_.layout.value()
@@ -198,6 +214,9 @@ class PoolingOp {
                 const TBlob& in_data, const TBlob& out_data,
                 const OpReqType& req, const TBlob& in_grad) {
     using namespace mshadow;
+    CHECK(param_.layout.value() == kNCW ||
+          param_.layout.value() == kNCHW ||
+          param_.layout.value() == kNCDHW) << "Need CuDNN for layout support";
Can we also output the layout:
"Need CuDNN for layout support" << param_.layout.value()
@with_seed()
@assert_raises_cudnn_not_satisfied(min_version='7.0.1')
This should be fine. The decorator checks if an exception is raised when cudnn version is less than 7.0.1 and you run the test. For cudnn versions greater than 7.0.1 it should just work like other tests.
@apeforest I expect that this PR is one of multiple. In the end, there will be a dispatcher between the operators and the initial call to the operators. That dispatcher will evaluate the capabilities of every registered operator and then return a unified message. The frontends shouldn't even know about the fact that there may be multiple operators. The way it was done in MKL-DNN is also not consistent. This evaluation has to come from a central component, and the further processing as well. In this particular example you gave, the process would then fail to execute an operator although there might be other implementations available that actually support the requested operation. I think you have valid points regarding what's going to happen with the return value and at which point this logic is finally handled. Dick might be happy to answer this.
Although @DickJC123 already worked on the CPU implementation - I personally think it's totally fine to merge it if the PR only contains cuDNN support for NHWC and throws a meaningful error message if the CPU is used. As a community we can incrementally add more implementations.
Closing in favor of PR #13749.
Description
This PR adds NHWC layout support to the Pooling operator. Work mostly done by @DickJC123.
Training using TensorCores is more efficient with the NHWC data layout. The Convolution and BatchNorm operators in MXNet already support NHWC, while the Pooling operator currently accepts only the NCHW format. This PR introduces NHWC layout support for Pooling when using cuDNN.
This PR is part of upstreaming the improvements available in NVIDIA's NGC MXNet container and was tested as part of it by training multiple convnets to expected accuracy. We also introduced additional tests checking consistency between the NHWC and NCHW versions of pooling.