This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
While fixing an issue described here, I applied the changes from master and found that the change from #14616 follows the same pattern as the softmax change in #14098 and increases the loss, and hence the validation perplexity, from 126.66 to 135. Without this change, and with the softmax fix from #14759 applied, validation perplexity comes back to 126.66.
Backward incompatibility is a concern. What about adding an env var like MXNET_ENFORCE_SAFE_ACCUMULATION=1 to trigger safe accumulation with higher precision on existing ops?
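To make the proposal concrete, here is a minimal NumPy sketch of what gating the accumulation type on an environment variable could look like. It is not MXNet code: `accumulation_dtype` and `sum_with_policy` are hypothetical helpers, the variable name is taken from the suggestion above, and the widening table (FP16 to FP32, FP32 to FP64) is an assumption based on this thread.

```python
import os
import numpy as np

# Assumed widening rules: FP16 accumulates in FP32, FP32 in FP64.
_WIDEN = {
    np.dtype(np.float16): np.dtype(np.float32),
    np.dtype(np.float32): np.dtype(np.float64),
}

def accumulation_dtype(dtype, env=os.environ):
    """Hypothetical helper: choose the dtype used while accumulating."""
    dtype = np.dtype(dtype)
    if env.get("MXNET_ENFORCE_SAFE_ACCUMULATION", "0") != "1":
        # Variable unset: keep accumulating in the input dtype.
        return dtype
    return _WIDEN.get(dtype, dtype)

def sum_with_policy(x, env=os.environ):
    """Reduce in the chosen accumulation dtype, then cast back to x.dtype."""
    acc = accumulation_dtype(x.dtype, env)
    return np.sum(x, dtype=acc).astype(x.dtype)

x = np.full(4096, 0.1, dtype=np.float16)
default_sum = sum_with_policy(x, env={})
safe_sum = sum_with_policy(x, env={"MXNET_ENFORCE_SAFE_ACCUMULATION": "1"})
```

With a switch like this, existing scripts keep their current numerics unless the user opts in, which is the backward-compatibility point raised above.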
If we have to do this for many operators, it makes sense to control it through an environment variable like you are suggesting. How many operators need this change?
What I am puzzled about is why the loss goes up when the accumulation precision is increased.
Can we avoid adding another env variable? We can accumulate in higher precision only when dtype is set, and otherwise use the default behavior from before this change was added. Also, I understand FP16 inputs with accumulation in FP32 for softmax, but does this also apply to FP32 inputs with accumulation in FP64? Will there be no accuracy loss?
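As an illustration of the dtype-driven alternative, here is a rough NumPy sketch in which higher precision is used only when the caller passes `dtype` explicitly. This is not the MXNet operator: the function below is a hypothetical stand-in, and returning the result in the requested dtype is an assumption made for the example.

```python
import numpy as np

def softmax(x, axis=-1, dtype=None):
    """Illustrative softmax; widens precision only when dtype is given."""
    # dtype=None: compute entirely in the input dtype (the default path).
    compute = x.dtype if dtype is None else np.dtype(dtype)
    z = x.astype(compute)
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

x = np.array([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]], dtype=np.float16)
y_default = softmax(x)                  # stays in float16
y_wide = softmax(x, dtype=np.float32)   # computed and returned in float32
```

The trade-off compared with an environment variable is that each call site has to opt in, but no global switch is needed and the default numerics stay untouched.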
Details on how to test are provided in #14722
@haojin2 @eric-haibin-lin