When setting MXNET_CUDA_TENSOR_OP_MATH_ALLOW_CONVERSION, no speedup observed #14684
Hey, this is the MXNet Label Bot.
Auto-tuning is overwriting the math mode at convolution tuning time. That is probably the right thing to do when implementing Tensor Cores, but it prevents the conversion math type from ever being used. We'll have to think about the long-term fix, but for now I've commented out the math type reset locally, and I'm trying to verify that this cuDNN feature provides a significant speedup before moving forward. With that change, the new cuDNN API logging output looks a bit happier.
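To make the failure mode concrete, here is a minimal sketch of a tuning pass that clobbers the requested math type. This is only an illustration of my reading of the behavior, not MXNet's actual code; the `TuneForward` and `CUDNN_CALL` names are invented:

```cpp
#include <cudnn.h>
#include <cstdio>
#include <cstdlib>

// Abort on any cuDNN error so the sketch stays short.
#define CUDNN_CALL(expr)                                             \
  do {                                                               \
    cudnnStatus_t s = (expr);                                        \
    if (s != CUDNN_STATUS_SUCCESS) {                                 \
      std::fprintf(stderr, "cuDNN: %s\n", cudnnGetErrorString(s));   \
      std::exit(1);                                                  \
    }                                                                \
  } while (0)

// Hypothetical tuning pass illustrating how auto-tuning can clobber the
// math type previously requested on the convolution descriptor.
void TuneForward(cudnnHandle_t handle,
                 cudnnTensorDescriptor_t x_desc,
                 cudnnFilterDescriptor_t w_desc,
                 cudnnConvolutionDescriptor_t conv_desc,
                 cudnnTensorDescriptor_t y_desc) {
  // 1. Operator setup asks for the conversion math type, which is what
  //    MXNET_CUDA_TENSOR_OP_MATH_ALLOW_CONVERSION is meant to enable.
  CUDNN_CALL(cudnnSetConvolutionMathType(
      conv_desc, CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION));

  // 2. Tuning queries the fastest algorithm for these descriptors.
  cudnnConvolutionFwdAlgoPerf_t perf;
  int returned = 0;
  CUDNN_CALL(cudnnGetConvolutionForwardAlgorithm_v7(
      handle, x_desc, w_desc, conv_desc, y_desc, 1, &returned, &perf));

  // 3. The tuner then re-applies the math type reported for the winning
  //    algorithm. perf.mathType comes back as CUDNN_DEFAULT_MATH or
  //    CUDNN_TENSOR_OP_MATH, so the ALLOW_CONVERSION request from step 1
  //    is silently replaced, matching the logging observed in this issue.
  CUDNN_CALL(cudnnSetConvolutionMathType(conv_desc, perf.mathType));
}
```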
nvprof summary when autotuning is manually disabled via the env var (Tensor Cores enabled here):
nvprof summary with some local code changes to autotuning to allow selection of the cuDNN mixed math mode (Tensor Cores also enabled):
nvprof summary without the local changes (Tensor Cores not enabled):
So the summary is that we need a few code changes to actually enable this feature. I'll have to think a little about what the easiest change is and then open a PR. I'm also a little concerned that the non-Tensor-Core cuDNN convolution implementation seems faster. A rough sketch of one possible change is below.
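One possible shape for the change, purely as a sketch under the assumption that the reset happens after algorithm selection. The `ApplyTunedMathType` helper and the direct `getenv` check are invented here (MXNet reads env vars through its own helpers), and the eventual PR may look quite different:

```cpp
#include <cudnn.h>
#include <cstdlib>

// Hypothetical helper: after algorithm selection, honor the user's
// conversion request instead of the tuner's reported math type.
inline void ApplyTunedMathType(cudnnConvolutionDescriptor_t conv_desc,
                               cudnnMathType_t tuned) {
  const char* flag =
      std::getenv("MXNET_CUDA_TENSOR_OP_MATH_ALLOW_CONVERSION");
  // Preserve the conversion request rather than downgrading it; this
  // mirrors the effect of locally disabling the math type reset.
  if (flag != nullptr && flag[0] == '1') {
    tuned = CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION;
  }
  cudnnSetConvolutionMathType(conv_desc, tuned);
}
```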
@mxnet-label-bot add [CUDA]
Doing a quick debugging session on why I don't see any speedup when enabling MXNET_CUDA_TENSOR_OP_MATH_ALLOW_CONVERSION.
Some interesting observations so far:
Enabling cuDNN API logging and observing forward calls, I see repeated examples of these calls using the default math type. While debugging, I have verified that we do set the correct math type during cuDNN convolution setup here: https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/cudnn/cudnn_convolution-inl.h#L588 (a minimal read-back sketch is included below).
However, when running a quick inference sample from the resnet50v2 model zoo, I still see many forward calls logged with the default math type.
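For anyone reproducing this, here is the kind of read-back check that can confirm what the setup path stored on the descriptor. The `DumpConvMathType` helper is hypothetical, not part of MXNet:

```cpp
#include <cudnn.h>
#include <cstdio>

// Print the math type currently stored on a convolution descriptor, to
// confirm what setup wrote before auto-tuning touches it.
void DumpConvMathType(cudnnConvolutionDescriptor_t conv_desc) {
  cudnnMathType_t mt;
  if (cudnnGetConvolutionMathType(conv_desc, &mt) != CUDNN_STATUS_SUCCESS) {
    std::puts("cudnnGetConvolutionMathType failed");
    return;
  }
  const char* name =
      mt == CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION
          ? "CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION"
          : mt == CUDNN_TENSOR_OP_MATH ? "CUDNN_TENSOR_OP_MATH"
                                       : "CUDNN_DEFAULT_MATH";
  std::printf("conv descriptor math type: %s\n", name);
}
```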