This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Add ability to query cuDNN BatchNorm min. epsilon. Allow ONNX importer to use cuDNN BN if chosen eps >= cuDNN min. eps. #11380

Merged (1 commit, Jul 11, 2018)

Conversation

@mkolod (Contributor) commented Jun 24, 2018

Description

The ONNX importer currently always disables cuDNN BatchNorm when importing models, because if the epsilon chosen in the model is less than the cuDNN minimum (CUDNN_BN_MIN_EPSILON), the cuDNN call will fail. However, cuDNN BatchNorm need not always be disabled during import. This PR adds the ability to query the value of CUDNN_BN_MIN_EPSILON, and disables cuDNN BatchNorm during ONNX import only if the value from the ONNX model is less than the cuDNN minimum. The ability to query the cuDNN minimum can also help in other scenarios, such as verifying whether the epsilon in a model is too small for numeric stability (the cuDNN value is quite reasonable as a minimum).
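As a minimal sketch of the gating logic described above (the helper name is illustrative, not the PR's exact code; the queried minimum is assumed to be NaN when cuDNN is unavailable, as in the PR's C API):

```python
import math

def choose_cudnn_off(model_epsilon, cudnn_min_epsilon):
    """Disable cuDNN BatchNorm (cudnn_off=1) only when cuDNN is unavailable
    (queried minimum is NaN) or the model's epsilon is below the cuDNN
    minimum; otherwise leave cuDNN BN enabled (cudnn_off=0)."""
    if math.isnan(cudnn_min_epsilon):  # cuDNN not installed
        return 1
    return 0 if model_epsilon >= cudnn_min_epsilon else 1

print(choose_cudnn_off(1e-4, 1e-5))          # 0: cuDNN BN stays enabled
print(choose_cudnn_off(1e-6, 1e-5))          # 1: epsilon too small for cuDNN
print(choose_cudnn_off(1e-4, float("nan")))  # 1: cuDNN not installed
```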

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Code is well-documented:
  • This is a tiny change so only the call to get the value of CUDNN_BN_MIN_EPSILON (mx.base.get_cudnn_epsilon) needs documentation
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with it

Changes

  • mx.base.get_cudnn_epsilon() implementation (python/mxnet/base.py)
  • test of get_cudnn_epsilon() at tests/python/gpu/test_cudnn_eps.py
  • adaptation of the ONNX importer to only disable cuDNN BatchNorm if cuDNN is not available, or the epsilon from the ONNX model is less than CUDNN_BN_MIN_EPSILON (python/mxnet/contrib/onnx/_import/op_translations.py)

@mkolod mkolod requested a review from szha as a code owner June 24, 2018 05:17
@mkolod mkolod changed the title Add ability to query cuDNN BatchNorm min. epsilon. Allow ONNX importer to use cuDNN BN if chosen eps >= cuDNN min. aps. Add ability to query cuDNN BatchNorm min. epsilon. Allow ONNX importer to use cuDNN BN if chosen eps >= cuDNN min. eps. Jun 24, 2018
@mkolod mkolod force-pushed the cudnn_epsilon_query branch 2 times, most recently from 1978538 to bef6003 Compare June 25, 2018 16:31
@mkolod (Contributor, Author) commented Jun 25, 2018

@Roshrini ^^ in case you're interested.

@Roshrini (Member):

Thanks @mkolod for making this change. This will definitely be helpful.
@anirudhacharya @spidydev

@mkolod mkolod force-pushed the cudnn_epsilon_query branch 4 times, most recently from 663cd7e to a840509 Compare June 25, 2018 23:42
@mkolod (Contributor, Author) commented Jun 26, 2018

@marcoabreu It seems like all tests pass on all platforms, except for Windows-GPU, which is failing on all tests with "CUDA: unspecified launch failure." It seems like the issue may be with the Windows runner.

@@ -209,7 +211,9 @@ def batch_norm(attrs, inputs, proto_obj):
'is_test': 'fix_gamma'})
new_attrs = translation_utils._remove_attributes(new_attrs,
['spatial', 'consumed_inputs'])
new_attrs = translation_utils._add_extra_attributes(new_attrs, {'cudnn_off': 1})
cudnn_eps = get_cudnn_epsilon()
cudnn_off = 0 if not math.isnan(cudnn_eps) and attrs['epsilon'] >= cudnn_eps else 1
A Member commented on the diff:
if attrs does not contain 'epsilon', then it should be defaulted to 1e-5. (reference)

@mkolod (Contributor, Author) replied:

Good point!

@mkolod (Contributor, Author) replied:

@anirudhacharya I just updated the PR. Let me know if you need anything else.

@mkolod (Contributor, Author) commented Jun 27, 2018

@szha @marcoabreu Can this PR's CI be re-run once the Caffe CI issue is resolved? Thanks!

@marcoabreu (Contributor):

Sure thing! I have retriggered. It might be possible that it didn't purge the git cache (thus merging onto a commit with the broken Caffe test). If that's the case, please rebase.

@mkolod (Contributor, Author) commented Jun 28, 2018

@szha @marcoabreu I rebased because of the git caching issue with the Caffe test that you mentioned. It looks like this will require a new review because of that.

@mkolod (Contributor, Author) commented Jun 28, 2018

@marcoabreu The Python3-MKLDNN-GPU build is failing with "CUDA: unspecified launch failure." Every other build seems to have succeeded, along with the tests. What do we do now? I reported a similar problem before for the same PR (see here) but that time it was the Windows-GPU. As you can see, Python2-GPU, Python2-MKLDNN-GPU, Python2-Quantize-GPU, Python2-GPU-Win, Perl-GPU, Cpp-MKLDNN-GPU, Cpp-GPU, R-GPU and Scala-GPU, as well as all CPU builds and unit test runs, were OK. It seems like a random failure to me. What now?

@szha (Member) commented Jun 28, 2018

I'm not sure about adding this API. This doesn't seem like a scalable way of managing these constants but rather like a one-off case.

@marcoabreu (Contributor):

Sorry Marek, we identified the issue at #11395 and disabled the problematic test at #11440. Please rebase and everything should be going smooth again.

@mkolod (Contributor, Author) commented Jun 28, 2018

@szha Makes sense. In that case, I'd propose the following. How about I remove the Python function to query that cuDNN constant and inline that call in the ONNX importer, so it's not exposed as an API if it's indeed a one-off? That said, even in that case, I'd need the actual C function because that constant is not a static const float that can be accessed, and it's not a cuDNN query function either. It's a macro, so it needs the C query function. The use in Python could be inlined though. I hope that we agree on the issue that it would be useful not to disable cuDNN BN for ONNX imports if the epsilon is big enough that it is equal to, or exceeds, the cuDNN minimum. cuDNN BN is faster for both fp32 and fp16.

@mkolod (Contributor, Author) commented Jun 28, 2018

@marcoabreu No problem, I'm trying to address @szha's latest concern and will have to update the PR anyway. Thank you for your help.

@piiswrong (Contributor):

I think adding a C API just for this is way overkill.

@mkolod (Contributor, Author) commented Jun 28, 2018

@piiswrong Do you have suggestions as to what to do instead? CUDNN_BN_MIN_EPSILON is a macro, not a constant that can be called from ctypes, so it requires a C wrapper to extract the macro's value at compile time. If adding the C API is overkill, then I see 2 options:

(1) We can hard-code the value to 1e-5 in Python, and not check what cuDNN provides. In that case, we have no C API, but we risk getting the wrong value in case cuDNN changes the constant. It's extremely unlikely that cuDNN will ever change it, but it's not a guarantee.

(2) Ignore the feature altogether. This results in ONNX-imported models running more slowly than MXNet checkpointed models. To the extent that there are any numerical differences, or differences in memory use, the ONNX experience will differ from the MXNet checkpoint experience in more ways than just lower performance.

What should I do next?
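The constraint described above (a preprocessor macro can't be read through ctypes) can be illustrated with a sketch of how the Python side might wrap the PR's proposed C function. The library path and error handling here are assumptions for illustration, not the PR's actual implementation:

```python
import ctypes
import math

def get_cudnn_epsilon(lib_path="libmxnet.so"):
    # CUDNN_BN_MIN_EPSILON is a preprocessor macro in cudnn.h, so its value
    # cannot be read directly via ctypes; a C function compiled against
    # cudnn.h (the PR's MXGetCudnnBnEpsilon) must expand it at compile time
    # and hand the value back at runtime.
    try:
        lib = ctypes.CDLL(lib_path)
        fn = lib.MXGetCudnnBnEpsilon
    except (OSError, AttributeError):
        return float("nan")  # library or symbol missing -> NaN, as in the PR
    fn.argtypes = [ctypes.POINTER(ctypes.c_float)]
    fn.restype = ctypes.c_int
    result = ctypes.c_float()
    if fn(ctypes.byref(result)) != 0:
        return float("nan")  # nonzero status signals failure
    return float(result.value)

# Without libmxnet present this degrades gracefully to NaN:
print(math.isnan(get_cudnn_epsilon("nonexistent_lib.so")))  # True
```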

@szha (Member) previously requested changes Jun 28, 2018:
@mkolod please work with @piiswrong and find a more suitable solution. Thanks.

@mkolod (Contributor, Author) commented Jun 28, 2018

@szha Will do. @piiswrong, please see my previous comment where I proposed 2 solutions. Please either choose among them, or suggest something else. Thank you.

@piiswrong (Contributor):

I think the constant 1e-5 is good enough.

* \brief Query cuDNN for minimum permissible epsilon for BatchNorm. If not installed, return NaN.
* \param result the minimum epsilon provided by cuDNN
*/
MXNET_DLL int MXGetCudnnBnEpsilon(float* result);
@larroy (Contributor) commented:
should it be renamed to MXGetCudnnBNEpsilon for consistency?

@mkolod (Contributor, Author) replied:

@larroy Makes sense, but given what @piiswrong said, I'll get rid of the C call altogether and check against the 1e-5 magic value instead of adding the API that expands the macro from the particular version of cudnn.h.
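A minimal sketch of the hard-coded alternative settled on here (the function name is hypothetical; the 1e-5 default when the ONNX node omits epsilon follows the earlier review note):

```python
CUDNN_BN_MIN_EPSILON = 1e-5  # hard-coded stand-in for the cudnn.h macro

def batch_norm_cudnn_off(attrs):
    """Return cudnn_off=1 only when the model's epsilon is below the
    (hard-coded) cuDNN minimum; no C API query is needed."""
    # ONNX BatchNormalization defaults epsilon to 1e-5 when omitted.
    eps = attrs.get("epsilon", 1e-5)
    return 0 if eps >= CUDNN_BN_MIN_EPSILON else 1

print(batch_norm_cudnn_off({}))                 # 0: default eps meets minimum
print(batch_norm_cudnn_off({"epsilon": 1e-6}))  # 1: below cuDNN minimum
```

The trade-off, as noted above, is that if a future cuDNN release ever changed the macro, this hard-coded value would silently drift out of sync.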

@mkolod (Contributor, Author) commented Jul 3, 2018

@szha I addressed @piiswrong's request from here. If you agree, can you approve this change? Thanks!

@szha (Member) commented Jul 3, 2018

@mkolod did you intend to remove the C API?

@mkolod (Contributor, Author) commented Jul 3, 2018

@szha Please check the commit now.

@szha dismissed their stale review July 3, 2018 19:26 ("problem corrected")

@mkolod (Contributor, Author) commented Jul 3, 2018

@marcoabreu Note that all tests succeed except for MKLDNN-GPU, which gives:

test_operator_gpu.test_sparse_hybrid_block_grad ... [19:36:39] src/operator/tensor/./.././../common/../operator/mxnet_op.h:576: Check failed: (err) == (cudaSuccess) Name: mxnet_generic_kernel ErrStr:an illegal memory access was encountered

No other GPU build reports such an issue. 48 tests fail this way on that particular runner, but they don't fail for any other GPU build. Seems like that runner is having issues. Note that my code only pertains to the ONNX importer (see here), and these tests aren't even related to the ONNX importer.

@mkolod (Contributor, Author) commented Jul 6, 2018

@marcoabreu ping

@marcoabreu (Contributor) commented Jul 6, 2018

Hi Marek, I'm currently on vacation. @KellenSunderland could you follow up please? If you can't solve the problem, please don't hesitate to ping me again and I'll check what I can do :)

@mkolod (Contributor, Author) commented Jul 6, 2018

Thanks @marcoabreu. @KellenSunderland, could you check this out?

@KellenSunderland (Contributor) commented Jul 6, 2018 via email

@mkolod (Contributor, Author) commented Jul 10, 2018

@larroy Ping.

@larroy (Contributor) commented Jul 10, 2018

@mkolod please rebase and push; the gluon test is disabled now.

@mkolod (Contributor, Author) commented Jul 10, 2018

@larroy It looks like the CI is happy now.

@szha szha merged commit d814733 into apache:master Jul 11, 2018
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018

8 participants