This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
In src/operator/rnn-inl.h, the calculation of the mathPrec passed to cudnnSetRNNDescriptor_v6() needs review (the variable is dtype_with_fallback_). The current logic is:
The use of __CUDA_ARCH__ in determining host-side code behavior is not appropriate: the macro is undefined when compiling host code, and the preprocessor evaluates an undefined identifier in an #if expression as 0 in all cases. Thus the intent of the code (to ensure pseudo-fp16 handling of RNNs for Maxwell GPUs even after the API change of cuDNN 7.5) is being applied to all architectures.
My recommendation would be to adopt pseudo-fp16 for all architectures anyway. A possible fix:
Technically, since cuDNN prior to v7.5 ignores the mathPrec argument of cudnnSetRNNDescriptor_v6(), the CUDNN_VERSION >= 7500 term could be dropped with no impact.
While we're fixing this, I think we should consider turning on Tensor Cores by default, or at least giving the user a mechanism to do so.
To Reproduce
To verify my claim about __CUDA_ARCH__, I inserted the following code in rnn-inl.h just before the definition of dtype_with_fallback_:
#ifdef __CUDA_ARCH__
LOG(INFO) << "************* __CUDA_ARCH__ = " << __CUDA_ARCH__;
#else
LOG(INFO) << "************* __CUDA_ARCH__ is not defined";
#endif
$ nosetests --verbose -s tests/python/gpu/test_operator_gpu.py:test_rnn
...
[10:53:00] src/operator/./rnn-inl.h:1388: ************* __CUDA_ARCH__ is not defined
[10:53:00] src/operator/./rnn-inl.h:1388: ************* __CUDA_ARCH__ is not defined
@DickJC123 do you think we can close the issue?
My thought is that there are two things in this issue: the first was addressed, but the second, enabling Tensor Cores by default, has not been tackled yet, so I kept it open. But feel free to close it and create a separate issue for enabling Tensor Cores if you want.
@stu1130 @ptrendx