Add NHWC layout support to Pooling (cuDNN only) #13362
Conversation
@mxnet-label-bot add [pr-awaiting-review]
7df5b10 to 55ebfb7 (Compare)
Thanks for the contribution, guys. FYI: I'm keeping an eye on this one and will do my best to get it past CI. We had a few CI issues that this PR looks to have been caught up in. If you have a chance, rebasing might help.
I took care of it :) I actually just wanted to show up as a contributor of this PR :P Just kidding, I hope everything is fine now.
Thanks for the contribution @DickJC123 and @ptrendx !
@ptrendx can you please add more details in the description explaining what this PR does and the motivation behind it (why it is being done)? Also, if you can add any required background information, it would be helpful.
@@ -73,15 +73,17 @@ class CuDNNPoolingOp {
     CUDNN_CALL(cudnnDestroyPoolingDescriptor(pooling_desc_));
   }

-  void Forward(const OpContext &ctx, const TBlob &in_data,
+  // Return boolean saying whether pooling configuration is supported.
+  bool Forward(const OpContext &ctx, const TBlob &in_data,
Is this boolean return necessary? I don't find it in other operators. What happens if it returns false?
   }

-  void Backward(const OpContext &ctx, const TBlob &out_grad,
+  // Return boolean saying whether pooling configuration is supported
+  bool Backward(const OpContext &ctx, const TBlob &out_grad,
Same here.
   }

  private:
-  inline void Init(mshadow::Stream<gpu> *s, const TBlob &in_data,
+  // Return boolean saying whether pooling configuration is supported
+  inline bool Init(mshadow::Stream<gpu> *s, const TBlob &in_data,
Same here
    int window_width = param_.global_pool ? dshape_nchw[3] : param_.kernel[1];
    // CuDNN v7.1.4 backprop kernel doesn't support window sizes 9 and above.
#if CUDNN_VERSION == 7104
    is_supported = window_height <= 8 && window_width <= 8;
Could you please add a reference to this?
Can we move this check earlier so this function can have an early exit if not supported?
Reference to this - do you mean the cuDNN 7.1.4 bug? Not sure where I should put it; it's mentioned here, in the fixed issues: https://docs.nvidia.com/deeplearning/sdk/cudnn-release-notes/rel_721.html#rel_721
Yes, if it does not violate any NVIDIA license rules, it would be great to add the link here as a comment.
Ok, will do.
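For reference, a minimal sketch of what that comment could look like next to the version guard (the comment wording is only a suggestion; the surrounding variable names follow the diff above):

// cuDNN 7.1.4's pooling backprop kernel doesn't support window sizes 9 and above;
// the issue is listed among the fixed issues in the cuDNN 7.2.1 release notes:
// https://docs.nvidia.com/deeplearning/sdk/cudnn-release-notes/rel_721.html#rel_721
#if CUDNN_VERSION == 7104
    is_supported = window_height <= 8 && window_width <= 8;
#endif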
Many thanks for your contribution. I added some comments. Feel free to ping me if you have any questions.
-          data.shape_[1],
-          data.shape_[2],
-          data.shape_[3]));
+          dshape_nchw[0],
Maybe define enum constants N, C, H, W instead of 0, 1, 2, 3 to improve readability?
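A minimal C++ sketch of the kind of named indices being suggested (the enum name and identifiers are illustrative, not part of the PR):

// Hypothetical named axes for an NCHW-ordered shape, replacing the bare
// literals 0, 1, 2, 3 used when indexing dshape_nchw.
enum NCHWAxis { kAxisN = 0, kAxisC = 1, kAxisH = 2, kAxisW = 3 };

// e.g. dshape_nchw[kAxisN], dshape_nchw[kAxisC], dshape_nchw[kAxisH], dshape_nchw[kAxisW]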
    int window_width = param_.global_pool ? dshape_nchw[3] : param_.kernel[1];
    // CuDNN v7.1.4 backprop kernel doesn't support window sizes 9 and above.
#if CUDNN_VERSION == 7104
    is_supported = window_height <= 8 && window_width <= 8;
Can we move this check earlier so this function can have an early exit if not supported?
Adding @sandeep-krishnamurthy for review.
Have you ever tried to pass NHWC input into MKL-DNN pooling? It seems it's not well handled here.
The new parameter
Do we have any other operator exposing layout information to the frontend user? Does NDArray know its own layout information?
@@ -673,7 +673,7 @@ def __init__(self, channels, kernel_size, strides=(1, 1, 1), padding=(0, 0, 0),
 class _Pooling(HybridBlock):
     """Abstract class for different pooling layers."""
     def __init__(self, pool_size, strides, padding, ceil_mode, global_pool,
-                 pool_type, count_include_pad=None, **kwargs):
+                 pool_type, layout, count_include_pad=None, **kwargs):
Do we need to expose this option to the user? Is there any use case?
I'm not sure I understand - that option was already exposed to users, e.g. see here: https://mxnet.incubator.apache.org/api/python/gluon/nn.html#mxnet.gluon.nn.MaxPool1D
It just was not passed up to the base class and to C++ because pooling supported only one layout.
@@ -53,6 +53,7 @@ struct PoolingParam : public dmlc::Parameter<PoolingParam> {
   bool cudnn_off;
   dmlc::optional<int> p_value;
   dmlc::optional<bool> count_include_pad;
+  dmlc::optional<int> layout;
Why add this option only to pooling?
Convolutions already have it (the layout parameter), and BatchNorm has a notion of axis as well.
@ptrendx It would be great if you could add to the PR description why we need this option. Thanks!
@TaoLv
@apeforest @Vikas89 Not sure I understand the question. Are you asking why we need NHWC support for pooling?
@apeforest @Vikas89 Not sure I understand the question. Are you asking why we need NHWC support for pooling?

@ptrendx I want to make sure that the PR description is descriptive enough. For your case: why we need NHWC support for pooling, what benefits we get with this change, and what testing was done, with numbers if any.
I think a good PR description will help the developer community understand the PR, and will also help when referring to the PR in the future.
I would request you to please spend some time writing a good PR description explaining what was done, why it was done (motivation), and what testing was done, if any. That would be very helpful to community developers.
Thank you for this contribution.
python/mxnet/gluon/nn/conv_layers.py (Outdated)
@@ -738,12 +741,13 @@ class MaxPool1D(_Pooling):
     """
     def __init__(self, pool_size=2, strides=None, padding=0, layout='NCW',
                  ceil_mode=False, **kwargs):
-        assert layout == 'NCW', "Only supports 'NCW' layout for now"
+        assert layout in ('NCW', 'NWC'),\
+            "Only NCW and NWC layouts are valid for 1D"
nit: 1D Pooling?
Same comment across 2D and 3D
Will do.
                          req[0], outputs[0])) {
        return;
      }
      break;
Should we handle the else case here?
No - if Backward returns false, the program will exit the switch statement and proceed to the non-cuDNN implementation.
@with_seed()
@assert_raises_cudnn_not_satisfied(min_version='7.0.1')
Should it rather skip this test if the cuDNN version is < 7.0.1?
Hmm, skipping this test if the cuDNN version is not satisfied was more or less my understanding of what this decorator does. Is that wrong? What should I do here instead?
This should be fine. The decorator checks if an exception is raised when cudnn version is less than 7.0.1 and you run the test. For cudnn versions greater than 7.0.1 it should just work like other tests.
@Vikas89 I took a stab at modifying the PR description to include the motivation and the testing that we did.
@TaoLv, providing details and motivation for the PR (partially duplicating the info just added by @ptrendx): In MXNet, layout is not something that is stored with the NDArray. Some operators, like pointwise ones, don't even care about the layout and will produce the same output regardless of it. Other operators, like Convolution, BatchNorm and Pooling, need to be told the layout.

Convolution supports a limited number of layouts via the 'layout' parameter, e.g. layout='NHWC'. BatchNorm doesn't need to know everything about the layout, just which dimension is the 'C' dimension; for this, the BatchNorm op accepts the axis parameter, e.g. axis=3 for NHWC batchnorm. Prior to this PR, the Pooling operator in MXNet did not have a parameter to specify the layout, forcing a transposition always to NCHW. We have two goals with this PR:
@apeforest Re: the bool return of Forward(), Backward(), etc. As you know, MXNet has been moving to a more imperative style of processing. Before that, all operators were subclasses of Operator, which defined Forward() and Backward() as returning void. Now, operators like Pooling are their own class and can define Forward() and Backward() as they desire.

In addition, with every operator we face the chore of selecting the best implementation (cudnn, cuda, mkldnn, etc.). Rather than code the selection logic in a central place, I recommend each implementation be asked (in a favored order) whether it supports the operation, given its parameters. With the cudnn convolution operator back in the "pre-imperative era", I added a Supports(..., param, ...) method. Sadly, this tended to repeat a lot of the checks and analysis done by the actual Forward() and Backward() calls. Rather than adding a repetitive Supports() method to Pooling, I realized how much cleaner it was to just have Forward() or Backward() return a flag indicating whether the function could be performed. This allows the implementation selector to cleanly fall back to another implementation.
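To illustrate the fallback pattern described above, here is a minimal, self-contained C++ sketch; the class and function names are invented stand-ins, not the actual MXNet code, and the warning message follows a later review suggestion:

#include <iostream>

// Stand-in for the cuDNN-backed implementation: Forward() returns false when
// the requested configuration is unsupported, mirroring the bool return type
// added in this PR.
struct CudnnPoolingStub {
  bool Forward(int window_size) {
    if (window_size > 8) return false;  // e.g. the cuDNN 7.1.4 window-size limit
    std::cout << "ran cuDNN pooling\n";
    return true;
  }
};

// Stand-in for the generic (non-cuDNN) implementation.
struct GenericPoolingStub {
  void Forward(int window_size) { std::cout << "ran generic pooling\n"; }
};

// The selector tries the preferred implementation first and falls back cleanly,
// instead of duplicating the support checks in a separate Supports() method.
void DispatchForward(int window_size) {
  CudnnPoolingStub cudnn;
  if (cudnn.Forward(window_size)) return;
  std::cout << "warning: configuration not supported by cuDNN, falling back\n";
  GenericPoolingStub().Forward(window_size);
}

int main() {
  DispatchForward(3);   // handled by the cuDNN path
  DispatchForward(12);  // falls back to the generic path
  return 0;
}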
@DickJC123 Good to know that. Could you explain a bit more about when and where this kind of transposition happens? With this new feature, what's the typical usage in a user model? Does the user need to specify the layout in each layer and each operator?
I love that approach to operator selection, Dick!
The forced transposition I referred to is currently done by the user in the Python model. With this new feature, the Transpose operators could be removed in favor of a Pooling with the new layout parameter. I'm a bit sorry that NDArrays don't carry the layout information. As a result, the layout has to travel along with the data ('out-of-band') and be passed into the operators along the way. On the other hand, having multi-input operators deal with inconsistent layouts would be a headache. Anyway, that's a discussion to be debated in its own thread.
Sure. It's a problem of MXNet, not this PR. Do you mind documenting somewhere what the output layout looks like? Then users can know what they need to do before sending the output to the next layer.
@DickJC123 Thanks for your detailed explanation of the operator selection logic. I like the way you used a boolean return type for the Forward() and Backward() methods to choose the right operator implementation. Although it might be an elegant way, this method is not yet well received in the developer community, and the other operators do not currently implement it this way. I am afraid that using it alone in this operator without a clear guideline may cause more confusion for prospective developers. Can you still follow the traditional way of declaring Forward() and Backward(), and maybe propose this in a separate thread to refactor Forward() and Backward() with a boolean return type?
Tested this PR on CPU w/ and w/o MKL-DNN and got an error message as below.
w/ MKL-DNN:
I would prefer to leave the booleans in. If there is no harm in taking that approach, I'd rather go down that path and use this operator implementation as an example for others rather than waiting for a saint who's going to refactor everything - because that won't happen. Dick's designs have often greatly improved the architecture or usability of MXNet, so I really appreciate these things and definitely would not like to see these improvements blocked because some people can't adapt that fast. This project has had some sub-par design decisions that mainly involved choosing preprocessor statements instead of proper class design and abstraction. This PR is a good step away from that, and I think everybody should support this path. I feel strongly about having this new method in as long as it doesn't have any drawbacks. If there are any, I'm happy to revisit my decision.

P.S. I'm not a fan of carrying the data layout outside the data structure, but that's also something we can address in the future with a proper class design for NDArray and operators, by allowing operators to define their favorite layout and by providing automatic converters on the engine side. If the layout doesn't match (as is usually the problem for cuDNN and MKL-DNN), it could then be automatically converted into the appropriate format. This PR is a step in the right direction, but it can't solve all problems at once. We should appreciate these improvements, and I'm sure Dick - or somebody else - will come up with a good design and then provide a reference implementation.
Thanks for the test, Tao! Can you check why that was not caught by our CI? What are we missing?
It seems there is no negative test for this feature. Only GPUs with certain versions of cuDNN will be tested.
Great catch! Do you mind adding a test that runs on both CPU and GPU? I'm still having trouble understanding why we are exposing the backend (whether it's cuDNN or MKL-DNN) to the frontend and making conditional tests. The tests should work with any backend (except if there is no valid implementation available). In an ideal world, we would just call the operator and the MKL-DNN/cuDNN magic would be done by the backend. I think this new structure is a good step towards that. Can we already add a generalized test that works on both CPU and GPU at this stage to catch Tao's error?
@marcoabreu The harm of leaving this boolean return type in this PR is that it will cause confusion to developers and also leave unclear messages to MXNet users. E.g. when it returns false, what happens in the Python frontend - is there any clear message? If this is meant to signal the frontend that this operator implementation is not supported on a particular platform, we need a consistent log message (in fact, we already achieve this via a different way in https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/mkldnn/mkldnn_pooling.cc#L53). Falling back to the conventional
@@ -92,6 +93,16 @@ struct PoolingParam : public dmlc::Parameter<PoolingParam> {
               "calculation. For example, with a 5*5 kernel on a 3*3 corner of a image,"
               "the sum of the 9 valid elements will be divided by 25 if this is set to true,"
               "or it will be divided by 9 if this is set to false. Defaults to true.");
+
+    DMLC_DECLARE_FIELD(layout)
Where is the enum for NWC defined? I see it is used in multiple places in other files like pooling.cc.
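Those constants come from mshadow's layout flags (the diff uses them as mshadow::kNCW, mshadow::kNCHW, mshadow::kNHWC, and so on). Roughly, and only as a paraphrase rather than the verbatim header, the definition has this shape:

namespace mshadow {
// Layout flags for 1-D, 2-D and 3-D tensors; the authoritative definition
// (including the actual enumerator values) lives in the mshadow headers.
enum LayoutFlag {
  kNCHW, kNHWC,    // 2-D layouts
  kNCW, kNWC,      // 1-D layouts
  kNCDHW, kNDHWC   // 3-D layouts
};
}  // namespace mshadow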
                         param_.global_pool ? 1 : param_.stride[0],
                         param_.global_pool ? 1 : param_.stride[1]));
#endif
    } else {
      CHECK(param_.layout.value() == mshadow::kNCDHW ||
            param_.layout.value() == mshadow::kNDHWC) << "Need 3D layout";
      CHECK(param_.layout.value() == mshadow::kNCDHW) << "Only the NCDHW layout is supported.";
If only the NCDHW layout is supported, it seems to me that there is no need to have the above "Need 3D layout" check.
Besides that, I am also a little bit confused here. In conv_layers.py, it is mentioned that "Dimension ordering of data and weight. Only supports 'NCDHW' and 'NDHWC' (only with cuDNN) layouts for now." for all 3D pooling classes. Why do we only support NCDHW here? This is the CuDNNPoolingOp class, right? Am I missing something?
Since the reason for the implementation is to fall back cleanly, and since this is a helper class for cuDNN pooling, it looks fine. @apeforest has a good point about the error message though. When we fall back to the non-cuDNN implementation, can it provide an error message with LOG(WARNING) explaining why it fell back when is_supported is false?
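A hedged sketch of what that could look like at the fallback point (the message wording is only a suggestion, not code from this PR; op, ctx, in_data, req and out_data refer to the surrounding dispatch context):

    if (!op.Forward(ctx, in_data, req, out_data)) {
      // Tell the user why the cuDNN path was not taken before dispatching
      // to the non-cuDNN implementation.
      LOG(WARNING) << "cuDNN pooling does not support this configuration "
                   << "(layout/kernel size/cuDNN version); falling back to "
                   << "the default MXNet implementation.";
    }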
#if CUDNN_MAJOR >= 5
    nan_prop_ = CUDNN_NOT_PROPAGATE_NAN;
#endif
    if (param_.kernel.ndim() == 2) {
      // 2d conv
      CHECK(param_.layout.value() == mshadow::kNCHW ||
            param_.layout.value() == mshadow::kNHWC) << "Need 2D layout";
nit: can we mention, "Need 2D layout: NCHW or NHWC"
                         param_.global_pool ? 1 : param_.stride[0],
                         param_.global_pool ? 1 : param_.stride[1]));
#endif
    } else {
      CHECK(param_.layout.value() == mshadow::kNCDHW ||
            param_.layout.value() == mshadow::kNDHWC) << "Need 3D layout";
Is this needed?
@@ -149,6 +162,9 @@ class PoolingOp {
   void Forward(const OpContext& ctx, const TBlob& in_data,
                const OpReqType& req, const TBlob& out_data) {
     using namespace mshadow;
+    CHECK(param_.layout.value() == kNCW ||
+          param_.layout.value() == kNCHW ||
+          param_.layout.value() == kNCDHW) << "Need CuDNN for layout support";
Can we also output the layout:
"Need CuDNN for layout support" << param_.layout.value()
@@ -198,6 +214,9 @@ class PoolingOp {
                 const TBlob& in_data, const TBlob& out_data,
                 const OpReqType& req, const TBlob& in_grad) {
     using namespace mshadow;
+    CHECK(param_.layout.value() == kNCW ||
+          param_.layout.value() == kNCHW ||
+          param_.layout.value() == kNCDHW) << "Need CuDNN for layout support";
Can we also output the layout:
"Need CuDNN for layout support" << param_.layout.value()
@with_seed()
@assert_raises_cudnn_not_satisfied(min_version='7.0.1')
This should be fine. The decorator checks if an exception is raised when cudnn version is less than 7.0.1 and you run the test. For cudnn versions greater than 7.0.1 it should just work like other tests.
@apeforest I expect that this PR is one of multiple. In the end, there will be a dispatcher between the operators and the initial call to the operators. That dispatcher will evaluate the capabilities of every registered operator and then return a unified message. The frontends shouldn't even know about the fact that there may be multiple operators. The way it was done in MKL-DNN is also not consistent. This evaluation has to come from a central component, and the further processing as well. In this particular example you gave, the process would then fail to execute an operator although there might be other implementations available that actually support the requested operation. I think you have valid points regarding what's going to happen with the return value and at which point this logic is finally handled. Dick might be happy to answer this.
Although @DickJC123 already worked on the CPU implementation - I personally think it's totally fine to merge it if the PR only contains cuDNN support for NHWC and throws a meaningful error message if the CPU is used. As a community we can incrementally add more implementations.
Closing in favor of PR #13749.
Description
This PR adds NHWC layout support to the Pooling operator. Work mostly done by @DickJC123.
Training using TensorCores is more efficient with the NHWC data layout. The Convolution and BatchNorm operators in MXNet already support NHWC, while the Pooling operator currently accepts only the NCHW format. This PR introduces NHWC layout support for Pooling when using cuDNN.
This PR is part of upstreaming the improvements available in NVIDIA's NGC MXNet container and was tested as part of it by training multiple convnets to expected accuracy. We also introduced additional tests checking consistency between the NHWC and NCHW versions of pooling.