Description

Currently, the inner gradient accumulation method in `Embedding` and `take` is not based on safe accumulation, which means that we will lose precision in the fp16 case. Here's an example that amplifies the issue:
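A minimal sketch of the kind of example meant here (the exact shapes, sizes, and values are assumptions, not the original snippet): every row looks up the same embedding index, so the weight gradient is a sum of 10000 ones, which a float16 accumulator cannot grow past 2048.

```python
from mxnet import autograd, nd

n = 10000
# All n rows look up index 0, so the gradient w.r.t. weight[0]
# is a sum of n ones, accumulated inside the Embedding backward kernel.
data = nd.zeros((n,), dtype='float32')
weight = nd.ones((1, 16), dtype='float16')
weight.attach_grad()

with autograd.record():
    out = nd.Embedding(data, weight, input_dim=1, output_dim=16, dtype='float16')
out.backward()

# Exact result is 10000.0; a float16 accumulator stalls at 2048.0,
# where adding 1.0 no longer changes the running sum (the ulp there is 2).
print(weight.grad)
```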
Output:

Also, the same happens for `take`:
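A matching sketch for `take` (again, sizes are assumptions), whose backward pass scatter-adds output rows into the gradient of the source array with the same fp16 accumulation:

```python
from mxnet import autograd, nd

n = 10000
indices = nd.zeros((n,), dtype='float32')   # every row selects row 0
table = nd.ones((1, 16), dtype='float16')
table.attach_grad()

with autograd.record():
    out = nd.take(table, indices, axis=0)
out.backward()

# Same saturation as above: 2048.0 instead of the exact 10000.0.
print(table.grad)
```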
Output:

The simplest fix is to revise the kernel with safe accumulation, i.e. cast float16 to float32 before accumulating. Also, I suggest that we turn on MXNET_SAFE_ACCUMULATION for the float16 type in 1.7 (changing the default behavior) so that float16 is accumulated via float32.

I think we should use the following approach for summing up a sequence of float16 numbers:
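The concrete approach isn't spelled out above; as a sketch of what safe accumulation means in this context (a NumPy model of the behavior, not the actual kernel code), the accumulator is widened to float32 and only the final result is rounded back to float16:

```python
import numpy as np

def naive_sum_fp16(values):
    # Models the current behavior: accumulate directly in float16.
    acc = np.float16(0.0)
    for v in values:
        acc = np.float16(acc + v)
    return acc

def safe_sum_fp16(values):
    # Safe accumulation: widen each addend to float32, accumulate,
    # and round back to float16 only once at the end.
    acc = np.float32(0.0)
    for v in values:
        acc += np.float32(v)
    return np.float16(acc)

ones = np.ones(10000, dtype=np.float16)
print(naive_sum_fp16(ones))   # 2048.0 -- stalls once ulp(acc) exceeds 1
print(safe_sum_fp16(ones))    # 10000.0 -- exact after a single final rounding
```

Widening costs one conversion per addend, but the float32 running sum stays exact far beyond the 2048 limit of a float16 accumulator, so only the final rounding can lose precision.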