This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

upcasting accumulation type in norm increases loss/Perplexity #14760

Closed
nswamy opened this issue Apr 22, 2019 · 5 comments

Comments

@nswamy
Member

nswamy commented Apr 22, 2019

While fixing an issue described here, I applied the changes from master and found that the change from #14616 follows the same pattern as the softmax change in #14098 and increases the loss, and hence the perplexity, from 126.66 to 135. Without this change, and with the fix to softmax in #14759, validation perplexity comes back to 126.66.

Details on how to test are provided in #14722

@haojin2 @eric-haibin-lin

@eric-haibin-lin
Member

eric-haibin-lin commented Apr 22, 2019

Backward incompatibility is a concern. What about adding an env var like MXNET_ENFORCE_SAFE_ACCUMULATION=1 to trigger safe accumulation with higher precision on existing ops?
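For anyone following along, here is what safe accumulation buys in the fp16 case. This is a minimal NumPy sketch, not the MXNet kernel: keeping the running sum of fp16 values in an fp16 accumulator stalls once the sum grows large enough that the fp16 spacing exceeds twice the increment, while upcasting the accumulator to fp32 keeps the reduction accurate.

```python
import numpy as np

x = np.full(10_000, np.float16(0.01))  # 10k copies of ~0.01 in fp16; true sum ~100

# Naive: the running sum stays in fp16. Once the sum reaches 32, the fp16
# spacing there (0.03125) is more than twice the increment, so every
# further addition rounds away and the sum stalls far below 100.
acc16 = np.float16(0.0)
for v in x:
    acc16 = np.float16(acc16 + v)

# Safe accumulation: upcast the accumulator to fp32, cast back at the end.
acc32 = np.float32(0.0)
for v in x:
    acc32 += np.float32(v)
result = np.float16(acc32)

print(acc16, result)  # the fp16 accumulator stalls; the fp32 one lands near 100
```

The output dtype is unchanged (still fp16); only the intermediate accumulator is widened, which is why this can be toggled per-op without changing operator signatures.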

@nswamy
Member Author

nswamy commented Apr 22, 2019

If we have to do this for many operators, it makes sense to control it through an environment variable as you suggest. How many operators need this change?

What I am puzzled about is why the loss goes up when the accumulation precision is increased.

@eric-haibin-lin
Member

We need this for softmax, log_softmax, norm and layernorm (coming soon).

@szha @ptrendx what do you guys think is the best way to handle this?

@anirudh2290
Member

Can we avoid adding another env variable? We could accumulate in higher precision only when dtype is set, and otherwise keep the default behavior from before this change. Also, I understand FP16 inputs with accumulation in FP32 for softmax, but does this also apply to FP32 inputs with accumulation in FP64? Will there be no accuracy loss?
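On the FP32-with-FP64-accumulation question: sequential fp32 accumulation does drift measurably on long reductions, so upcasting the accumulator to fp64 changes the result by making it more accurate, not less. A small NumPy sketch with illustrative numbers (not an MXNet benchmark):

```python
import numpy as np

n = 1_000_000
v = np.float32(0.1)  # exact stored value is ~0.100000001490116

# fp32 accumulator: each add rounds to 24-bit precision, and over 1e6
# adds the rounding error grows to a visible fraction of the total.
acc32 = np.float32(0.0)
for _ in range(n):
    acc32 = np.float32(acc32 + v)

# fp64 reference: n * v computed in double precision, where rounding
# error is negligible at this length.
acc64 = n * float(v)

print(float(acc32), acc64)  # the fp32 sum drifts away from ~100000
```

So for fp32 inputs the question is less "will fp64 accumulation lose accuracy" and more whether the (better) fp64 result is worth breaking bitwise reproducibility with the old behavior.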

@eric-haibin-lin
Member

Should be fixed in #14830
