Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

cuDNN RNN dtype_with_fallback_ calc needs update #16670

Closed
DickJC123 opened this issue Oct 29, 2019 · 2 comments
Closed

cuDNN RNN dtype_with_fallback_ calc needs update #16670

DickJC123 opened this issue Oct 29, 2019 · 2 comments
Labels

Comments

@DickJC123
Copy link
Contributor

DickJC123 commented Oct 29, 2019

Description

In src/operator/rnn-inl.h, the calculation of the mathPrec passed to cudnnSetRNNDescriptor_v6() needs review (the variable is dtype_with_fallback_). The current logic is:

#if __CUDA_ARCH__ < 530 && CUDNN_VERSION >= 7500
      if (dtype_ == CUDNN_DATA_HALF) {
        dtype_with_fallback_ = CUDNN_DATA_FLOAT;
      } else {
        dtype_with_fallback_ = dtype_;
      }
#else
        dtype_with_fallback_ = dtype_;
#endif

The use of __CUDA_ARCH__ in determining host-side code behavior is not appropriate (the variable would be undefined in this case and interpreted as 0 in all cases). Thus the intent of the code (to ensure pseudo-fp16 handling of RNNs for Maxwell GPUs even after the API change of cuDNN 7.5) is being applied to all architectures.

My recommendation would be to adopt pseudo-fp16 for all architectures anyway. A possible fix:

    cudnnDataType_t dtype_with_fallback_ =
        (CUDNN_VERSION >= 7500 && dtype_ == CUDNN_DATA_HALF) ? CUDNN_DATA_FLOAT
                                                             : dtype_; 

Technically, since cuDNN prior to v7.5 ignores the use mathPrec in cudnnSetRNNDescriptor_v6(), the CUDNN_VERSION >= 7500 term could be dropped with no impact.

While we're fixing this, I think we should consider turning on Tensor Cores by default, or at least giving the user a mechanism to do so.

To Reproduce

To verify my claim about __CUDA_ARCH__, I inserted the following code in rnn-inl.h just before the definition of dtype_with_fallback_:

#ifdef __CUDA_ARCH__
      LOG(INFO) << "************* __CUDA_ARCH__ = " << __CUDA_ARCH__;
#else
      LOG(INFO) << "************* __CUDA_ARCH__ is not defined";
#endif
$ nosetests --verbose -s tests/python/gpu/test_operator_gpu.py:test_rnn
...
[10:53:00] src/operator/./rnn-inl.h:1388: ************* __CUDA_ARCH__ is not defined
[10:53:00] src/operator/./rnn-inl.h:1388: ************* __CUDA_ARCH__ is not defined

@stu1130 @ptrendx

@stu1130
Copy link
Contributor

stu1130 commented Oct 29, 2019

Thanks for the suggestion @DickJC123
I created a quick fix #16671

@stu1130
Copy link
Contributor

stu1130 commented Nov 4, 2019

@DickJC123 do you think we can close the issue?
My though is that there are two things in the issue, first one was addressed but the the second one which is enabling the Tensor Cores by default is not tackled yet so I kept it open. But feel free to close it and maybe create enabling Tensor Cores issue if you want.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

2 participants