
Make the default matmul precision float32 even on TPUs #7010

Open
shoyer opened this issue Jun 17, 2021 · 15 comments
Labels: enhancement (New feature or request)
@shoyer
Member

shoyer commented Jun 17, 2021

Follow-up from #2161:

The default low precision is a bit of a footgun, at least when doing anything that isn't implementing a neural net layer. In my opinion, it would be much safer to use "highest" precision by default (which isn't that much slower) on float32 data. Neural net libraries, of course, can default to lower precision, so this really only affects users who directly use NumPy APIs or the @ infix operator.
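To make the footgun concrete, here is a small pure-Python sketch (not JAX code; bfloat16 rounding is emulated by truncating float32 bit patterns, and real hardware rounds rather than truncates) showing how far a dot product over bfloat16-rounded inputs can drift from the full-precision result:

```python
import struct

def to_bfloat16(x):
    # Emulate bfloat16 by keeping only the top 16 bits of the
    # float32 representation (truncation, for illustration).
    b = struct.pack('>f', x)
    return struct.unpack('>f', b[:2] + b'\x00\x00')[0]

# Dot product with bfloat16-rounded inputs vs. plain float arithmetic.
xs = [1.0 + i * 1e-3 for i in range(256)]
ys = [1.0 - i * 1e-3 for i in range(256)]

exact = sum(a * b for a, b in zip(xs, ys))
low = sum(to_bfloat16(a) * to_bfloat16(b) for a, b in zip(xs, ys))
rel_err = abs(exact - low) / abs(exact)
print(rel_err)  # several decimal digits lost just from rounding the inputs
```

The inputs here differ only in their low-order digits, which is exactly the regime (small perturbations around 1.0) where bfloat16's 7 mantissa bits discard most of the information.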

@shoyer shoyer added the enhancement New feature or request label Jun 17, 2021
@j-towns
Contributor

j-towns commented Jun 21, 2021

I would argue this is also a footgun for some neural net use cases. Both times that I've carefully re-implemented a model in JAX, I've found that performance is worse than expected with the default precision, and it took me some time to realise that precision was the cause.

As a (temporary) improvement on the current situation, we could at least add some information on this issue to point 5 of the ‘Current Gotchas’ in the readme.

@shoyer
Member Author

shoyer commented Jun 22, 2021

#6143 added a config flag, so this should in principle be easier to change now.
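A sketch of what that flag enables, assuming the flag name `jax_default_matmul_precision` from #6143 and the `jax.lax.Precision` enum (on CPU and GPU these settings are no-ops):

```python
import jax
import jax.numpy as jnp

# Process-wide default via the config flag added in #6143:
jax.config.update('jax_default_matmul_precision', 'highest')

# Per-operation override, which takes precedence over the flag:
x = jnp.ones((8, 8), dtype=jnp.float32)
y = jnp.ones((8, 8), dtype=jnp.float32)
z = jnp.dot(x, y, precision=jax.lax.Precision.HIGHEST)
print(z.shape)  # prints (8, 8)
```

Because the flag is read at trace time, flipping the default no longer requires touching every call site, which is what makes changing the TPU default feasible now.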

@j-towns
Contributor

j-towns commented Jul 16, 2021

As @juliuskunze just pointed out to me, this is the only difference in semantics (that we can think of) between the GPU, CPU and TPU backends. 'Backend transparency' is a valuable property in Julius's and my opinion, and given how close JAX is to achieving it (if this really is the only case where semantics differ significantly), it's surely worth changing the TPU default for all matmul-like ops (including conv) to closely approximate GPU and CPU behaviour. I understand this will hurt 'default' speed, but I think we can mitigate that by making the availability of the bfloat16 option clear.

@jonbarron
Contributor

My team shot itself in the foot last week for the ~fourth time due to matmul defaulting to bfloat16. This issue continues to be my biggest/only grievance with JAX.

@mattjj
Member

mattjj commented Sep 8, 2021

Just got another +1 for this issue. Another foot lost.

copybara-service bot pushed a commit that referenced this issue Sep 8, 2021
On CPU and GPU, this change has no effect.

On TPU, this PR changes the default matmul algorithm from a fast, low-quality algorithm to a slower, high-precision algorithm that uses multiple passes. Many users have reported the low-quality-by-default behavior to be a footgun, especially when performing non-neural network computations.

The old behavior can be restored either by passing an explicit Precision option to operators such as `dot`, or by changing the default precision, e.g. `jax.config.update('jax_default_matmul_precision', 'fastest')`.

#7010

PiperOrigin-RevId: 395549544
@patrickvonplaten

+1 for this issue from the Transformers team for one of the most popular architectures for speech recognition (Wav2Vec2) - see: huggingface/transformers#15754

@inoryy
Contributor

inoryy commented May 12, 2022

+1 to setting default to f32.

Having had the pleasure of shooting both my feet, with both performance gotchas and numerics gotchas, I'd wholeheartedly prefer debugging the performance ones, as they're much easier to spot.

@ppwwyyxx
Contributor

ppwwyyxx commented Jun 29, 2022

Similar discussion in PyTorch: pytorch/pytorch#67384. PyTorch once enabled TF32 by default for a few ops, and then had to revert the decision due to similar complaints. Enabling bfloat16 by default is presumably even worse.
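The "even worse" claim is easy to quantify: TF32 keeps 10 explicit mantissa bits while bfloat16 keeps only 7. A pure-Python sketch, emulating both formats by truncating float32 bit patterns (real hardware rounds rather than truncates):

```python
import struct

def truncate_mantissa(x, keep_bits):
    # Zero out the low (23 - keep_bits) mantissa bits of a float32 value.
    (bits,) = struct.unpack('>I', struct.pack('>f', x))
    mask = 0xFFFFFFFF & ~((1 << (23 - keep_bits)) - 1)
    (y,) = struct.unpack('>f', struct.pack('>I', bits & mask))
    return y

x = 1.0 / 3.0
tf32 = truncate_mantissa(x, 10)  # TF32: 10 explicit mantissa bits
bf16 = truncate_mantissa(x, 7)   # bfloat16: 7 explicit mantissa bits
print(abs(x - tf32), abs(x - bf16))  # the bfloat16 error is larger
```

Three fewer mantissa bits means roughly 8x the rounding error per input, before any accumulation effects in the matmul itself.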

@jakeh-gc
Contributor

jakeh-gc commented Nov 15, 2022

If anyone here is collecting feet 🦶🔫, I lost one to something related to this too. #12008 (comment)

@nouiz
Collaborator

nouiz commented Nov 15, 2022

What about printing a warning/info message once per process when the low precision is used by default? Printing too much by default, as TF does, isn't great, but I think this one is worth it. We should also allow users to suppress that warning.

@DavidNorman

It does seem a poor user-experience choice for the system to do the wrong thing by default, forcing developers to debug its unexpected behaviour, rather than doing the right thing by default and giving developers the opportunity to feel good about optimising performance by selecting lower-precision maths, a choice they will find easy to undo should the system perform badly due to the limited precision.

@ayaka14732
Member

It does seem a poor user-experience choice for the system to do the wrong thing by default, forcing developers to debug its unexpected behaviour, rather than doing the right thing by default and giving developers the opportunity to feel good about optimising performance by selecting lower-precision maths, a choice they will find easy to undo should the system perform badly due to the limited precision.

+1 for this. My benchmarks show that models run in low precision do not always match the quality of their high-precision counterparts, so the correct behaviour (i.e. high precision) should be the default.

@nouiz
Collaborator

nouiz commented Nov 23, 2022

My personal view is that this is more complicated than that. Performance, when we're talking about a >2x speed-up, is a feature; a smallish speed-up like 1.Nx is less of one. Here, the DL and non-DL communities need different features.

When software targets, or is dominantly used by, one community, you just pick that community's favourite default. But JAX has a fair number of non-DL users who need different features than DL users do, so the choice is more complicated. If we make the switch, other users will complain about the big slowdown.

Raising awareness of this issue would be a good first step that I guess everyone would agree on, e.g. adding it to the FAQ, the Sharp Bits, and the GPU docs. If people know about it, they can adjust more easily. Are you interested in contributing this?

@jaschau

jaschau commented Jan 15, 2023

Solving neural ODEs with diffrax is also affected by the unexpected default choice of TensorFloat32; see patrick-kidger/diffrax#213.

@kvablack

+1, another foot lost (kvablack/ddpo-pytorch#3 (comment)), this time in the form of forcing me to update a quarter of the results for a paper I had already submitted and released. I do think using high precision by default and letting users opt in to better performance is much easier to debug.
