This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Add NHWC layout support to Pooling (cuDNN only) #13362

Closed
wants to merge 6 commits

Conversation

@ptrendx (Member) commented Nov 21, 2018

Description

This PR adds NHWC layout support to the Pooling operator. The work was mostly done by @DickJC123.

Training with Tensor Cores is more efficient when using the NHWC data layout. The Convolution and BatchNorm operators in MXNet already support NHWC, while the Pooling operator currently accepts only the NCHW format. This PR introduces NHWC layout support for Pooling when cuDNN is used.

This PR is part of upstreaming the improvements available in NVIDIA's NGC MXNet container and, as part of that container, was used to train multiple convnets to the expected accuracy. We also added tests checking consistency between the NHWC and NCHW versions of pooling.
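For illustration only (not the PR's actual test code; the shapes, kernel, and tolerances below are hypothetical, and NHWC pooling requires a cuDNN-enabled GPU build), such a consistency check could look like:

```python
import mxnet as mx
import numpy as np

# Hypothetical input; NHWC pooling is cuDNN-only, so run on a GPU context.
x_nchw = mx.nd.random.uniform(shape=(1, 3, 8, 8), ctx=mx.gpu(0))
x_nhwc = mx.nd.transpose(x_nchw, axes=(0, 2, 3, 1))

out_nchw = mx.nd.Pooling(data=x_nchw, kernel=(2, 2), stride=(2, 2), pool_type='max')
out_nhwc = mx.nd.Pooling(data=x_nhwc, kernel=(2, 2), stride=(2, 2), pool_type='max',
                         layout='NHWC')

# The NHWC result, transposed back to NCHW, should match the NCHW result.
np.testing.assert_allclose(out_nchw.asnumpy(),
                           mx.nd.transpose(out_nhwc, axes=(0, 3, 1, 2)).asnumpy(),
                           rtol=1e-5, atol=1e-5)
```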

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, the expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Comments

  • Support for channel-last layouts is only available when cuDNN is enabled (noted in the updated documentation)

@stu1130 (Contributor) commented Nov 21, 2018

@mxnet-label-bot add [pr-awaiting-review]
Thanks for your contribution @ptrendx

@marcoabreu added the pr-awaiting-review label Nov 21, 2018
@KellenSunderland (Contributor)

Thanks for the contribution, guys. FYI: I'm keeping an eye on this one and will do my best to get it past CI. We had a few CI issues that this PR appears to have been caught up in. If you have a chance, rebasing might help.

@marcoabreu (Contributor)

I took care of it :)

I actually just wanted to show up as a contributor on this PR :P Just kidding, I hope everything is fine now.

@lupesko (Contributor) commented Dec 5, 2018

Thanks for the contribution @DickJC123 and @ptrendx !
Adding @apeforest @yuxihu and @anirudh2290 to help review.

@Vikas-kum (Contributor)

@ptrendx can you please add more details to the description explaining what this PR does and the motivation behind it (why it is needed)? Also, any required background information you can add would be helpful.

@@ -73,15 +73,17 @@ class CuDNNPoolingOp {
CUDNN_CALL(cudnnDestroyPoolingDescriptor(pooling_desc_));
}

-  void Forward(const OpContext &ctx, const TBlob &in_data,
+  // Return boolean saying whether pooling configuration is supported.
+  bool Forward(const OpContext &ctx, const TBlob &in_data,
Contributor

Is this boolean return necessary? I don't see it in other operators. What happens if it returns false?

}

-  void Backward(const OpContext &ctx, const TBlob &out_grad,
+  // Return boolean saying whether pooling configuration is supported
+  bool Backward(const OpContext &ctx, const TBlob &out_grad,
Contributor

Same here.

}

private:
-  inline void Init(mshadow::Stream<gpu> *s, const TBlob &in_data,
+  // Return boolean saying whether pooling configuration is supported
+  inline bool Init(mshadow::Stream<gpu> *s, const TBlob &in_data,
Contributor

Same here

int window_width = param_.global_pool ? dshape_nchw[3] : param_.kernel[1];
// CuDNN v7.1.4 backprop kernel doesn't support window sizes 9 and above.
#if CUDNN_VERSION == 7104
is_supported = window_height <= 8 && window_width <= 8;
Contributor

Could you please add a reference to this?

Contributor

Can we move this check earlier so this function can have an early exit if not supported?

Member Author

Reference to this - do you mean the cuDNN 7.1.4 bug? I'm not sure where I should put it; it's mentioned here, in the fixed issues: https://docs.nvidia.com/deeplearning/sdk/cudnn-release-notes/rel_721.html#rel_721

Contributor

Yes, if it does not violate any NVIDIA license rules, it would be great to add the link here as a comment.

Member Author

Ok, will do.

@apeforest (Contributor) left a comment

Many thanks for your contribution. I added some comments. Feel free to ping me if you have any questions.

data.shape_[1],
data.shape_[2],
data.shape_[3]));
dshape_nchw[0],
Contributor

Maybe define enum constant N, C, H, W instead of 0, 1, 2, 3 to improve readability?


@apeforest (Contributor)

Adding @sandeep-krishnamurthy for review.

@TaoLv (Member) commented Dec 5, 2018

Have you ever tried passing NHWC input into MKL-DNN pooling? It seems it's not well handled here.

@TaoLv (Member) commented Dec 5, 2018

The new layout parameter will also be passed to quantized_pooling: https://github.com/apache/incubator-mxnet/blob/master/src/operator/quantization/quantized_pooling.cc

@TaoLv (Member) commented Dec 5, 2018

Do we have any other operators exposing layout information to the front-end user? Does an NDArray know its own layout?

@@ -673,7 +673,7 @@ def __init__(self, channels, kernel_size, strides=(1, 1, 1), padding=(0, 0, 0),
class _Pooling(HybridBlock):
"""Abstract class for different pooling layers."""
def __init__(self, pool_size, strides, padding, ceil_mode, global_pool,
-                 pool_type, count_include_pad=None, **kwargs):
+                 pool_type, layout, count_include_pad=None, **kwargs):
Contributor

Do we need to expose this option to the user? Is there any use case?

Member Author

I'm not sure I understand - that option was already exposed to users, e.g. see here: https://mxnet.incubator.apache.org/api/python/gluon/nn.html#mxnet.gluon.nn.MaxPool1D
It just was not passed up to the base class and to C++ because pooling supported only one layout.

@@ -53,6 +53,7 @@ struct PoolingParam : public dmlc::Parameter<PoolingParam> {
bool cudnn_off;
dmlc::optional<int> p_value;
dmlc::optional<bool> count_include_pad;
dmlc::optional<int> layout;
Contributor

Why is this option only being added to pooling?

Member Author

Convolutions already have it (the layout parameter), and BatchNorm has the notion of an axis as well.

@apeforest (Contributor)

@ptrendx It would be great if you can add in the PR description why we need this option. Thanks!

@ptrendx (Member Author) commented Dec 5, 2018

@TaoLv
Re MKL-DNN: per the title of this PR, this enables NHWC support only for the cuDNN path. I don't know whether MKL-DNN supports NHWC pooling.
Re quantized pooling: that file seems to assume the NCHW layout. What would you like me to do with it?
Re the layout parameter: yes, other operators expose layout information (in fact, pooling already does as well) - see convolutions (e.g. here https://mxnet.incubator.apache.org/api/python/gluon/nn.html#mxnet.gluon.nn.Conv2D) and BatchNorm (the axis parameter). An NDArray does not know its layout, and for many use cases it would not make sense. That is why layout is exposed as an operator parameter.
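For illustration, a minimal Gluon sketch of how layout travels as a per-layer argument rather than with the NDArray (hypothetical shapes and layer sizes; Conv2D's layout and BatchNorm's axis arguments already exist, while the pooling layout argument is the one this PR adds, cuDNN only):

```python
import mxnet as mx
from mxnet.gluon import nn

# The NDArray itself carries no layout; each layer is told explicitly.
net = nn.HybridSequential()
net.add(nn.Conv2D(channels=16, kernel_size=3, layout='NHWC'),  # existing layout parameter
        nn.BatchNorm(axis=3),                                  # channel axis for NHWC data
        nn.MaxPool2D(pool_size=2, layout='NHWC'))              # new with this PR, cuDNN only
net.initialize(ctx=mx.gpu(0))

x = mx.nd.random.uniform(shape=(1, 28, 28, 3), ctx=mx.gpu(0))  # NHWC input
y = net(x)
```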

@ptrendx (Member Author) commented Dec 5, 2018

@apeforest @Vikas89 Not sure I understand the question. Are you asking why do we need NHWC support for pooling?

@Vikas-kum (Contributor) left a comment

@apeforest @Vikas89 Not sure I understand the question. Are you asking why do we need NHWC support for pooling?

@ptrendx I want to make sure that the PR description is descriptive enough. For your case: why do we need NHWC support for pooling? What are the benefits we get with this change, and what testing was done (with numbers, if any)?
I think a good PR description helps the developer community understand the PR, and also helps when referring to the PR in the future.
Please spend some time writing a good PR description explaining what was done, why it was done (motivation), and what testing was done, if any. That would be very helpful to community developers.

@sandeep-krishnamurthy (Contributor) left a comment

Thank you for this contribution.

@@ -738,12 +741,13 @@ class MaxPool1D(_Pooling):
"""
def __init__(self, pool_size=2, strides=None, padding=0, layout='NCW',
ceil_mode=False, **kwargs):
-        assert layout == 'NCW', "Only supports 'NCW' layout for now"
+        assert layout in ('NCW', 'NWC'),\
+            "Only NCW and NWC layouts are valid for 1D"
Contributor

nit: 1D Pooling?
Same comment across 2D and 3D

Member Author

Will do.

req[0], outputs[0])) {
return;
}
break;
Contributor

Should we handle the else case here?

Member Author

No - if Backward returns false, the program will exit the switch statement and proceed to the non-cuDNN implementation.



@with_seed()
@assert_raises_cudnn_not_satisfied(min_version='7.0.1')
Contributor

Should it rather skip this test if the cuDNN version is < 7.0.1?

Member Author

Hmm, skipping this test if the cuDNN version is not satisfied was more or less my understanding of what this decorator does. Is that wrong? What should I do here instead?

Member

This should be fine. The decorator checks that an exception is raised when the cuDNN version is less than 7.0.1 and you run the test. For cuDNN versions of 7.0.1 or greater, it should just work like other tests.
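For reference, a rough self-contained sketch of the behavior described above (the installed_version argument is a hypothetical stand-in; the real decorator lives in MXNet's test utilities and queries the build's actual cuDNN version):

```python
import functools

def assert_raises_cudnn_not_satisfied(min_version, installed_version='7.0.5'):
    """Sketch: below min_version the wrapped test is expected to raise; otherwise it runs normally."""
    def as_tuple(v):
        return tuple(int(p) for p in v.split('.'))

    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            if as_tuple(installed_version) < as_tuple(min_version):
                try:
                    test_fn(*args, **kwargs)
                except Exception:
                    return  # expected failure on an older cuDNN
                raise AssertionError('expected a failure with cuDNN < ' + min_version)
            return test_fn(*args, **kwargs)  # recent enough cuDNN: behaves like any other test
        return wrapper
    return decorator
```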

@ptrendx (Member Author) commented Dec 5, 2018

@Vikas89 I took a stab at modifying the PR description to include the motivation and the testing that we did.

@DickJC123 (Contributor)

@TaoLv, providing details and motivation for the PR (partially duplicating the info just added by @ptrendx):

In MXNet, layout is not something that is stored with the NDArray. Some operators, like pointwise ones, don't even care about the layout and will produce the same output regardless of it. Other operators, like Convolution, BatchNorm and Pooling, need to be told the layout. Convolution supports a limited number of layouts via the 'layout' parameter, e.g. layout='NHWC'. BatchNorm doesn't need to know everything about the layout, just which dimension is the 'C' dimension. For this, the BatchNorm op accepts the axis parameter, e.g. axis=3 for NHWC batchnorm. Prior to this PR, in MXNet, the Pooling operator did not have a parameter to specify the layout, forcing a transposition always to NCHW.

We have two goals with this PR:
- Create a way to inform Pooling of the layout, in the style of the Convolution 'layout' parameter, thereby allowing direct use of the arbitrary-layout Pooling support offered by cudnn, and
- Enable MXNet to support an end-to-end processing of NHWC-layout data (i.e. no transposes), which is particularly efficient in mixed-precision on Volta Tensor Cores.

@DickJC123 (Contributor)

@apeforest Re: bool return of Forward(), Backward(), etc.

As you know, MXNet has been moving to a more imperative style of processing. Before that, all operators were subclasses of Operator, which defined Forward() and Backward() as returning void. Now, operators like Pooling are their own class and can define Forward() and Backward() as they desire. In addition, with every operator we face the chore of selecting the best implementation (cuDNN, CUDA, MKL-DNN, etc.). Rather than code the selection logic in a central place, I recommend each implementation be asked (in a favored order) whether it supports the operation, given its parameters. With the cuDNN convolution operator, back in the "pre-imperative era", I added a Supports(..., param, ...) method. Sadly, this tended to repeat a lot of the checks and analysis done by the actual Forward() and Backward() calls. Rather than adding a repetitive Supports() method to Pooling, I realized how much cleaner it was to just have Forward() or Backward() return a flag indicating whether the function could be performed. This allows the implementation selector to cleanly fall back to another implementation.
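A minimal sketch of that selection pattern, written in Python for brevity (the actual change is in the C++ operator code; all names below are illustrative only):

```python
class CudnnPooling:
    def forward(self, data, layout):
        # Return False when this configuration isn't supported, instead of
        # duplicating the same checks in a separate Supports() method.
        if layout not in ('NCHW', 'NHWC'):
            return False
        # ... run the cuDNN kernel here ...
        return True

class NativePooling:
    def forward(self, data, layout):
        if layout != 'NCHW':
            return False
        # ... run the native implementation here ...
        return True

def pooling_forward(data, layout, impls=(CudnnPooling(), NativePooling())):
    # Ask each implementation, in a favored order, whether it can perform the call.
    for impl in impls:
        if impl.forward(data, layout):
            return
    raise RuntimeError('no pooling implementation supports layout ' + layout)
```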

@TaoLv (Member) commented Dec 6, 2018

Prior to this PR, in MXNet, the Pooling operator did not have a parameter to specify the layout, forcing a transposition always to NCHW.

@DickJC123 Good to know that. Could you explain a bit more about when and where this kind of transposition happens?

With this new feature, what's the typical usage in a user model? Does the user need to specify the layout in each layer and each operator?

@marcoabreu (Contributor)

I love that approach to operator selection, Dick!

@DickJC123 (Contributor)

The forced transposition I referred to is currently done by the user in the Python model. With this new feature, the transpose operators could be removed in favor of a Pooling with the new layout parameter.

I'm a bit sorry that NDArrays don't carry layout information. As a result, the layout has to travel along with the data ('out-of-band') and be passed into the operators along the way. On the other hand, having multi-input operators deal with inconsistent layouts would be a headache. Anyway, that's a discussion to be had in its own thread.
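For example, a sketch of the pattern being replaced (hypothetical shapes; the layout='NHWC' argument on Pooling is what this PR adds, cuDNN only):

```python
import mxnet as mx

x_nhwc = mx.nd.random.uniform(shape=(1, 8, 8, 3), ctx=mx.gpu(0))

# Before: NHWC data had to be transposed to NCHW around the pooling call.
x_nchw = mx.nd.transpose(x_nhwc, axes=(0, 3, 1, 2))
y_nchw = mx.nd.Pooling(data=x_nchw, kernel=(2, 2), stride=(2, 2), pool_type='max')
y_before = mx.nd.transpose(y_nchw, axes=(0, 2, 3, 1))

# After: the layout parameter lets cuDNN pool the NHWC data directly.
y_after = mx.nd.Pooling(data=x_nhwc, kernel=(2, 2), stride=(2, 2),
                        pool_type='max', layout='NHWC')
```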

@TaoLv (Member) commented Dec 6, 2018

Sure. It's a problem with MXNet, not this PR. Do you mind documenting somewhere what the output layout looks like? Then users will know what they need to do before sending the output to the next layer.

@apeforest (Contributor) left a comment

@DickJC123 Thanks for your detailed explanation of the operator selection logic. I like the way you used a boolean return type for the Forward() and Backward() methods to choose the right operator implementation. Although it might be an elegant approach, it is not yet well established in the developer community, and none of the other operators are currently implemented this way. I am afraid that using it in this operator alone, without a clear guideline, may cause confusion for prospective developers. Can you still follow the traditional way of declaring the Forward() and Backward() functions, and perhaps propose refactoring Forward() and Backward() to a boolean return type in a separate thread?

@TaoLv (Member) commented Dec 6, 2018

Tested this PR on CPU with and without MKL-DNN and got the error messages below.
w/o MKL-DNN:

Traceback (most recent call last):
  File "test_pool_nhwc.py", line 28, in <module>
    test_pooling_nhwc()
  File "test_pool_nhwc.py", line 21, in test_pooling_nhwc
    out_nhwc = mx.nd.Pooling(data=inp_nhwc, kernel=kernel, pool_type='max', global_pool=False, stride=stride, pad=pad, layout='NHWC')
  File "<string>", line 132, in Pooling
  File "/home/lvtao/Workspace/mxnet/python/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/home/lvtao/Workspace/mxnet/python/mxnet/base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [12:35:52] src/operator/nn/./pooling-inl.h:167: Check failed: param_.layout.value() == kNCW || param_.layout.value() == kNCHW || param_.layout.value() == kNCDHW Need CuDNN for layout support

w/ MKL-DNN:

terminate called after throwing an instance of 'dmlc::Error'
  what():  [12:28:11] src/engine/./threaded_engine.h:380: std::exception
A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 10 entries:
[bt] (0) /home/lvtao/Workspace/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace()+0x42) [0x7f547889afb6]
[bt] (1) /home/lvtao/Workspace/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1b) [0x7f547889b233]
[bt] (2) /home/lvtao/Workspace/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x66e) [0x7f547b7ad24c]
[bt] (3) /home/lvtao/Workspace/mxnet/python/mxnet/../../lib/libmxnet.so(void mxnet::engine::ThreadedEnginePerDevice::CPUWorker<(dmlc::ConcurrentQueueType)0>(mxnet::Context, mxnet::engine::ThreadedEnginePerDevice::ThreadWorkerBlock<(dmlc::ConcurrentQueueType)0>*, std::shared_ptr<dmlc::ManualEvent> const&)+0x8b) [0x7f547b7bfe9d]
[bt] (4) /home/lvtao/Workspace/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}::operator()(dmlc::ManualEvent) const+0x33) [0x7f547b7be22b]
[bt] (5) /home/lvtao/Workspace/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>)+0x4a) [0x7f547b7c2751]
[bt] (6) /home/lvtao/Workspace/mxnet/python/mxnet/../../lib/libmxnet.so(std::function<void (std::shared_ptr<dmlc::ManualEvent>)>::operator()(std::shared_ptr<dmlc::ManualEvent>) const+0x5c) [0x7f547b7b6142]
[bt] (7) /home/lvtao/Workspace/mxnet/python/mxnet/../../lib/libmxnet.so(void std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)>::_M_invoke<0ul>(std::_Index_tuple<0ul>)+0x56) [0x7f547b7b6036]
[bt] (8) /home/lvtao/Workspace/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)>::operator()()+0x1b) [0x7f547b7b5ee1]
[bt] (9) /home/lvtao/Workspace/mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x1c) [0x7f547b7b5e30]


Aborted (core dumped)

@marcoabreu (Contributor) commented Dec 6, 2018

I would prefer to leave the booleans in. If there is no harm in taking that approach, I'd rather go down that path and use this operator implementation as an example for others rather than waiting for a saint who's going to refactor everything - because that won't happen.

Dick's designs have often greatly improved the architecture and usability of MXNet, so I really appreciate these things and definitely would not like to see these improvements blocked because some people can't adapt that fast.

This project has had some sub-par design decisions, mainly involving preprocessor statements instead of proper class design and abstraction. This PR is a good step in the right direction, and I think everybody should support this path.

I feel strongly about keeping this new method in as long as it doesn't have any drawbacks. If there are any, I'm happy to revisit my position.

P.S. I'm not a fan of carrying the data layout outside the data structure, but that's also something we can address in the future with a proper class design for NDArray and operators, by allowing operators to declare their preferred layout and adding automatic converters on the engine side. If the layout doesn't match (as is usually the problem for cuDNN and MKL-DNN), it could then be automatically converted into the appropriate format. This PR is a step in the right direction, but it can't solve all problems at once. We should appreciate these improvements, and I'm sure Dick - or somebody else - will come up with a good design and then provide a reference implementation.

@marcoabreu (Contributor)

Thanks for the test, Tao! Can you check why that was not caught by our CI? What are we missing?

@TaoLv (Member) commented Dec 6, 2018

It seems there is no negative test for this feature. Only GPUs with certain versions of cuDNN are tested.

@marcoabreu (Contributor) commented Dec 6, 2018

Great catch! Do you mind adding a test that runs on both CPU and GPU?

I'm still having trouble understanding why we are exposing the backend (whether it's cuDNN or MKL-DNN) to the frontend and making tests conditional. The tests should work with any backend (unless there is no valid implementation available). In an ideal world, we would just call the operator and the MKL-DNN/cuDNN magic would be handled by the backend. I think this new structure is a good step towards that.
I mean, I get why we have these conditions, but can't we have some type of fallback?

Can we already add, at this stage, a generalized test that works on both CPU and GPU to catch Tao's error?

@apeforest (Contributor)

@marcoabreu The harm in leaving this boolean return type in this PR is that it will cause confusion for developers and also leave unclear messages for MXNet users. E.g. when it returns false, what happens in the Python frontend - is there any clear message? If this is meant to signal to the frontend that this operator implementation is not supported on a particular platform, we need a consistent log message (in fact, we already achieve this in a different way in https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/mkldnn/mkldnn_pooling.cc#L53). Falling back to the conventional void return type does not seem to affect any functionality of this PR. I would like this PR to focus mainly on the new Pooling feature; we can have a separate PR to change the void return type to boolean and invite more discussion there. I am open to any other suggestions and am adding @Vikas89 @yuxihu for their comments.

@@ -92,6 +93,16 @@ struct PoolingParam : public dmlc::Parameter<PoolingParam> {
"calculation. For example, with a 5*5 kernel on a 3*3 corner of a image,"
"the sum of the 9 valid elements will be divided by 25 if this is set to true,"
"or it will be divided by 9 if this is set to false. Defaults to true.");

DMLC_DECLARE_FIELD(layout)
Member

Where is the enum for NWC defined? I see it is used in multiple places in other files like pooling.cc.

param_.global_pool ? 1 : param_.stride[0],
param_.global_pool ? 1 : param_.stride[1]));
#endif
} else {
CHECK(param_.layout.value() == mshadow::kNCDHW ||
param_.layout.value() == mshadow::kNDHWC) << "Need 3D layout";
CHECK(param_.layout.value() == mshadow::kNCDHW) << "Only the NCDHW layout is supported.";
Member

If only the NCDHW layout is supported, it seems to me that there is no need for the above "Need 3D layout" check.

Besides that, I am also a little bit confused here. In conv_layers.py, it is mentioned for all 3D pooling classes that "Dimension ordering of data and weight. Only supports 'NCDHW' and 'NDHWC' (only with cuDNN) layouts for now." Why do we only support NCDHW here? This is the CuDNNPoolingOp class, right? Am I missing something?

@anirudh2290 (Member) left a comment

Since the reason for the implementation is to fall back cleanly, and since this is a helper class for cuDNN pooling, it looks fine. @apeforest has a good point about the error message though. When we fall back to the non-cuDNN implementation, can it provide a message with LOG(WARNING) explaining why it fell back when is_supported is false?

#if CUDNN_MAJOR >= 5
nan_prop_ = CUDNN_NOT_PROPAGATE_NAN;
#endif
if (param_.kernel.ndim() == 2) {
// 2d conv
CHECK(param_.layout.value() == mshadow::kNCHW ||
param_.layout.value() == mshadow::kNHWC) << "Need 2D layout";
Member

nit: can we mention, "Need 2D layout: NCHW or NHWC"

param_.global_pool ? 1 : param_.stride[0],
param_.global_pool ? 1 : param_.stride[1]));
#endif
} else {
CHECK(param_.layout.value() == mshadow::kNCDHW ||
param_.layout.value() == mshadow::kNDHWC) << "Need 3D layout";
Member

Is this needed?

@@ -149,6 +162,9 @@ class PoolingOp {
void Forward(const OpContext& ctx, const TBlob& in_data,
const OpReqType& req, const TBlob& out_data) {
using namespace mshadow;
CHECK(param_.layout.value() == kNCW ||
param_.layout.value() == kNCHW ||
param_.layout.value() == kNCDHW) << "Need CuDNN for layout support";
Member

can we also output the layout :
"Need CuDNN for layout support" << param_.layout.value()

@@ -198,6 +214,9 @@ class PoolingOp {
const TBlob& in_data, const TBlob& out_data,
const OpReqType& req, const TBlob& in_grad) {
using namespace mshadow;
CHECK(param_.layout.value() == kNCW ||
param_.layout.value() == kNCHW ||
param_.layout.value() == kNCDHW) << "Need CuDNN for layout support";
Member

can we also output the layout :
"Need CuDNN for layout support" << param_.layout.value()




@nswamy added the Operator label Dec 7, 2018
@marcoabreu (Contributor)

@apeforest I expect that this PR is one of multiple. In the end, there will be a dispatcher between the initial call and the operator implementations. That dispatcher will evaluate the capabilities of every registered implementation and then return a unified message. The frontends shouldn't even need to know that there may be multiple implementations.

The way it was done in MKL-DNN is also not consistent. This evaluation has to come from a central component, and so does the further processing. In the particular example you gave, the process would fail to execute an operator even though there might be other implementations available that actually support the requested operation.

I think you have valid points regarding what happens with the return value and at which point this logic is finally handled. Dick might be happy to answer this.

@eric-haibin-lin (Member)

Although @DickJC123 has already worked on the CPU implementation, I personally think it's totally fine to merge this if the PR only contains cuDNN support for NHWC and throws a meaningful error message when the CPU path is used. As a community we can incrementally add more implementations.

@ptrendx (Member Author) commented Dec 31, 2018

Closing in favor of PR #13749

@ptrendx closed this Dec 31, 2018
@ptrendx deleted the pr_nhwc_pooling branch January 3, 2019