upcasting accumulation type in norm increases loss/Perplexity #14760

nswamy · 2019-04-22T07:01:30Z

While fixing an issue described here, I applied the changes from master and found that the change from #14616 follows the same pattern as the softmax change in #14098 and increases the loss, and hence the validation perplexity, from 126.66 to 135. Without this change, and with the softmax fix from #14759, validation perplexity comes back to 126.66.

Details on how to test are provided in #14722

@haojin2 @eric-haibin-lin

eric-haibin-lin · 2019-04-22T16:59:24Z

Backward incompatibility is a concern. What about adding an env var like MXNET_ENFORCE_SAFE_ACCUMULATION=1 to trigger safe accumulation with higher precision on existing ops?
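For illustration, the opt-in behaviour proposed above could be gated roughly as follows. This is a minimal sketch and not MXNet code: the variable name is the one proposed in this thread, while `WIDER_ACC_TYPE`, `safe_accumulation_enabled`, `accumulation_dtype`, and the rule that only the value "1" enables the flag are hypothetical.

```python
import os

# Hypothetical mapping from an input dtype to a wider accumulation dtype.
# The float16 -> float32 and float32 -> float64 pairs are the ones discussed
# in this thread; the table itself is an assumption for the sketch.
WIDER_ACC_TYPE = {
    "float16": "float32",
    "float32": "float64",
    "float64": "float64",
}


def safe_accumulation_enabled(environ=os.environ) -> bool:
    """Return True only when the proposed variable is set to "1".

    Assumption: an unset variable or any other value keeps the old behaviour,
    so existing training scripts are unaffected unless they opt in.
    """
    return environ.get("MXNET_ENFORCE_SAFE_ACCUMULATION", "0") == "1"


def accumulation_dtype(input_dtype: str, environ=os.environ) -> str:
    """Pick the dtype used for the running sum inside a reduction."""
    if safe_accumulation_enabled(environ):
        # Unknown dtypes fall back to accumulating in the input dtype.
        return WIDER_ACC_TYPE.get(input_dtype, input_dtype)
    return input_dtype
```

Passing the environment as a parameter keeps the helper easy to test; with an empty mapping `accumulation_dtype("float16", {})` stays at `"float16"`, which is the backward-compatible default this proposal is after.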

nswamy · 2019-04-22T17:15:23Z

If we have to do this for many operators, it makes sense to control it through an environment variable like you are suggesting. How many operators need this change?

What I am puzzled about is why the loss goes up when the accumulation precision is increased.

eric-haibin-lin · 2019-04-22T17:54:00Z

We need this for softmax, log_softmax, norm and layernorm (coming soon).

@szha @ptrendx what do you guys think is the best way to handle this?

anirudh2290 · 2019-04-23T00:38:56Z

Can we avoid adding another env variable? We could accumulate in higher precision only when dtype is set, and otherwise keep the default behavior from before this change was added. Also, I understand FP16 inputs with accumulation in FP32 for softmax, but does this also apply to FP32 inputs with accumulation in FP64? Will there be no accuracy loss?
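To make the FP16 case concrete, here is a small sketch of how the accumulation type alone can change the result of a norm. It does not reproduce the perplexity regression reported in this issue and does not use MXNet; it only emulates FP16 rounding with the standard library's half-precision `struct` format. The helper names and the test vector are made up for the example.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def l2_norm_fp16_accum(values):
    """L2 norm with the running sum of squares rounded to FP16 at every step."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v * v))
    return to_fp16(math.sqrt(acc))


def l2_norm_wide_accum(values):
    """L2 norm with FP16 inputs summed in a wider type and cast back once."""
    acc = 0.0  # Python floats are double precision
    for v in values:
        acc += v * v
    return to_fp16(math.sqrt(acc))


# Hypothetical input: 10,000 copies of 0.01, stored as FP16.
values = [to_fp16(0.01)] * 10000

narrow = l2_norm_fp16_accum(values)
wide = l2_norm_wide_accum(values)
print("FP16 accumulation:", narrow)
print("wide accumulation:", wide)
```

The exact norm of this vector is close to 1. The FP16 running sum stops growing once each addend falls below half of its rounding step, so the narrow result comes out well short of the wide one. A model whose weights were tuned against one of these behaviours sees different operator outputs under the other, which is one reason to make the switch opt-in.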

eric-haibin-lin · 2019-05-22T18:54:21Z

Should be fixed in #14830

nswamy added the Bug and Operator labels on Apr 22, 2019

anirudh2290 mentioned this issue on Apr 30, 2019 in #14830, "Use env var to enforce safe accumulation in ReduceAxesCompute" (merged)

eric-haibin-lin closed this as completed May 22, 2019
