
Batch_dot does not support FP16 well #11796

Closed
szhengac opened this issue Jul 18, 2018 · 9 comments

Comments

@szhengac
Contributor

batch_dot does not support FP16 well and can make training slower than using FP32. This was tested with the Transformer model in GluonNLP. The feature has already been added in NVIDIA's MXNet, so I think it would be good to enable it in master.

szha closed this as completed Jul 18, 2018
szha reopened this Jul 18, 2018
@szha
Member

szha commented Jul 18, 2018

Oops, wrong button. Relevant links: https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/dot-inl.h#L1347-L1364 and https://github.com/dmlc/mshadow/blob/master/mshadow/dot_engine-inl.h#L528-L539. For float the strided batched GEMM is used, while the half_t type falls back to regular GEMM calls. Instead, the strided batched GEMM in cuBLAS, which supports half_t, could be used: https://docs.nvidia.com/cuda/cublas/#cublas-lt-t-gt-gemmbatched
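
For reference, a minimal sketch of what such a call could look like, using cuBLAS' cublasHgemmStridedBatched with contiguously laid-out batches. This is an illustration, not the actual mshadow/MXNet code; the helper name and layout assumptions are hypothetical.

// Hypothetical helper: batched fp16 GEMM via cuBLAS' strided batched API,
// which accepts __half operands directly (instead of looping over regular gemm).
#include <cublas_v2.h>
#include <cuda_fp16.h>

// Computes C[i] = A[i] * B[i] for i in [0, batch). cuBLAS is column-major,
// so m/n/k and the leading dimensions follow that convention here.
cublasStatus_t BatchedHalfGemm(cublasHandle_t handle,
                               const __half* A, const __half* B, __half* C,
                               int m, int n, int k, int batch) {
  const __half alpha = __float2half(1.0f);
  const __half beta = __float2half(0.0f);
  return cublasHgemmStridedBatched(
      handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
      &alpha,
      A, m, static_cast<long long>(m) * k,   // lda, strideA
      B, k, static_cast<long long>(k) * n,   // ldb, strideB
      &beta,
      C, m, static_cast<long long>(m) * n,   // ldc, strideC
      batch);
}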

@ptrendx
Member

ptrendx commented Jul 18, 2018

@DickJC123

@eric-haibin-lin
Member

I'm adding cublas strided gemm calls in mshadow dmlc/mshadow#353

eric-haibin-lin self-assigned this Aug 4, 2018
@eric-haibin-lin
Member

merged

@sbodenstein
Contributor

@szha: can we reopen this? For some reason, the fix from dmlc/mshadow#353 was reverted by this commit by @eric-haibin-lin.

This code, run on MXNet 1.3.0 (latest EC2 Deep Learning AMI):

import mxnet as mx
import time

a = mx.nd.ones((100, 100, 100), ctx=mx.gpu(), dtype='float16')
b = mx.nd.ones((100, 100, 100), ctx=mx.gpu(), dtype='float16')

# Warm-up so the timed loop below excludes one-off initialization costs.
for i in range(10):
    c = mx.nd.batch_dot(a, b)
mx.nd.waitall()

# Time 500 batched matrix multiplications; waitall() blocks until the
# asynchronous GPU work has actually finished.
begin = time.time()
for i in range(500):
    c = mx.nd.batch_dot(a, b)
mx.nd.waitall()
end = time.time()
print(end - begin)

takes 0.9 s on a V100 (versus 0.0318 s when using float32 instead, roughly a 30x slowdown!)

We want to train Transformers with Tensor Cores, but there is no way of doing this in MXNet at the moment (linalg_gemm and linalg_gemm2 unfortunately don't support float16 either, despite it seemingly being implemented here).

What is the plan for exposing any form of GEMM to users with Real16 and TensorCore support?

@szhengac

@eric-haibin-lin
Member

Sorry about the revert. I found that it is better to implement the fp16 ops in MXNet itself rather than in mshadow, since MXNet has built-in functionality to detect and enable Tensor Cores. I can make a PR in maybe two or three days. @sbodenstein, are you using symbol or Gluon to train the Transformer?
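
As an illustration of the "detect/enable Tensor Cores" part, a hypothetical sketch assuming the CUDA 9/10-era cuBLAS math-mode API; this is not the actual MXNet code.

// Hypothetical sketch: opt a cuBLAS handle into Tensor Core math before
// issuing fp16 GEMMs; subsequent Hgemm/GemmEx calls may then pick
// Tensor Core kernels when shapes and data types allow it.
#include <cublas_v2.h>

cublasStatus_t EnableTensorCores(cublasHandle_t handle) {
  return cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);
}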

@sbodenstein
Contributor

@eric-haibin-lin: we are using symbol to train the Transformer. It would be great to re-enable this as soon as possible.

Is there any reason not to expose float16 support for linalg_gemm and linalg_gemm2 on GPU as well?

@sbodenstein
Contributor

@eric-haibin-lin: any updates about this?

@eric-haibin-lin
Member

Added in #13716
