
Matrix multiplication precision API #2161

Closed
shoyer opened this issue Feb 4, 2020 · 8 comments · Fixed by #6143
Comments

@shoyer
Member

shoyer commented Feb 4, 2020

Operations that do matrix multiplication in JAX accept a precision argument that controls the numerical precision used when they are executed on TPUs.

The current API seems non-ideal to me:

  1. You have to pass an enum value (e.g., np.dot(x, y, precision=lax.Precision.HIGHEST)). This is a little cumbersome and inconsistent with most NumPy/SciPy APIs, which use strings (e.g., np.dot(x, y, precision='highest')).
  2. The current names for precision levels ("highest", "high" and "default") are not very descriptive. In my ideal world we would use some direct indication of the corresponding precision (e.g., bfloat16 multiplication with float32 accumulation), but at the very least can we switch "default" to "low"?
  3. The default low precision is a bit of a footgun, at least when doing anything that isn't implementing a neural net layer. In my opinion, it would be much safer to default to "highest" precision (which isn't that much slower) on float32 data. Neural net libraries, of course, can default to lower precision, so this really only affects users who directly use NumPy APIs or the @ infix operator.
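The string-vs-enum point above can be sketched as a small normalization helper. This is an illustrative mock, not JAX's actual implementation: the enum mirrors lax.Precision, and canonicalize_precision is a hypothetical name for the kind of shim that would let both spellings coexist.

```python
import enum

class Precision(enum.Enum):
    """Mock of the lax.Precision enum described above."""
    DEFAULT = 0
    HIGH = 1
    HIGHEST = 2

def canonicalize_precision(precision):
    """Hypothetical helper: accept an enum member, a string like
    'highest', or None (meaning "unspecified")."""
    if precision is None or isinstance(precision, Precision):
        return precision
    if isinstance(precision, str):
        return Precision[precision.upper()]  # 'highest' -> Precision.HIGHEST
    raise TypeError(f"unsupported precision: {precision!r}")
```

With a shim like this, np.dot(x, y, precision='highest') and np.dot(x, y, precision=lax.Precision.HIGHEST) would resolve to the same setting.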
@shoyer
Member Author

shoyer commented Apr 17, 2020

On TPUs, "HIGH" precision corresponds to 3 passes of bfloat16 and "HIGHEST" precision corresponds to 6 passes, which is effectively full float32 (dropping denormals), as explained in this Intel paper.

With that in mind, and considering that ideally we would retain some flexibility for alternative matmul optimizations that might appear on other platforms, what more descriptive naming scheme makes sense for the values of the precision argument?
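The multi-pass scheme above can be demonstrated in pure NumPy. This is a sketch, not the TPU implementation: bfloat16 rounding is simulated by truncating the low 16 bits of each float32, and the 3-pass product splits each operand into a bfloat16 "head" and "tail" and accumulates three bfloat16-input matmuls in float32 (dropping the tiny tail-times-tail term).

```python
import numpy as np

def to_bfloat16(x):
    """Simulate bfloat16 rounding by truncating the low 16 bits
    of the float32 bit pattern (keeps 8 significand bits)."""
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

def matmul_3pass(x, y):
    """3-pass bfloat16 matmul: split each operand into head + tail,
    do three bfloat16-input products, accumulate in float32."""
    xh, yh = to_bfloat16(x), to_bfloat16(y)
    xl, yl = to_bfloat16(x - xh), to_bfloat16(y - yh)
    return xh @ yh + xh @ yl + xl @ yh  # xl @ yl is negligible and skipped

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64)).astype(np.float32)
y = rng.standard_normal((64, 64)).astype(np.float32)
reference = x @ y  # plain float32 matmul as the baseline
err_1pass = np.abs(to_bfloat16(x) @ to_bfloat16(y) - reference).max()
err_3pass = np.abs(matmul_3pass(x, y) - reference).max()
```

The 3-pass error is orders of magnitude below the single-pass bfloat16 error, which is why three passes recover roughly 16 significand bits.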

Some ideas:

  1. 'low', 'high', 'highest': description of precision level
  2. 'fastest', 'fast', 'slow': description of speed
  3. 'fastest', 'fast', 'accurate': mixed description, using only positive words
  4. 'bfloat16', 'float24', 'float32': rough precision of the underlying arithmetic (but what is "float24"??)

I think I lean towards option 3?

@shoyer
Member Author

shoyer commented Apr 17, 2020

Notes from offline discussion:

  • This is really a "minimum precision" configuration, so perhaps a name like min_precision would be more appropriate.
  • The other way to configure matmul precision (maybe more obvious) is by explicitly setting dtype. XLA will use lowest precision on bfloat16 data regardless of the precision option.
  • We want an API that also can support new matmul precision options as they arise on different platforms (GPU, CPU, etc), e.g., precision={'tpu': 'bfloat16', 'gpu': 'float16'}.
  • Another option would be to specify precision numerically, e.g., precision=1e-2 or precision=1e-6. But this mixes together precision in the significand and the exponent, which misses important nuances like bfloat16 vs float16.

Given that we want to support platform specific options, descriptive names seem like the best bet.

The main remaining concern is what to call "3 pass bfloat16" precision on TPUs, which approximates roughly 16 bits of precision for the significand. "intermediate" precision would be OK for TPUs, but seems very vague in general. Maybe bfloat24 or bfloat16_3x would be appropriate? (We could also support bfloat16_6x as a more precise description of float32.)
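The platform-keyed spec from the notes above (e.g., precision={'tpu': 'bfloat16', 'gpu': 'float16'}) could be resolved by a small lookup helper. Everything here is hypothetical API design, not shipped JAX: the function name, the 'default' key, and the fallback value are all assumptions for illustration.

```python
def resolve_precision(precision, platform):
    """Hypothetical resolver for a platform-keyed precision spec.

    A plain string applies to every platform; a dict is looked up by
    platform name, falling back to an optional 'default' entry."""
    if isinstance(precision, dict):
        return precision.get(platform, precision.get("default"))
    return precision
```

A call site on TPU would then see resolve_precision({'tpu': 'bfloat16_3x', 'gpu': 'tensorfloat32'}, 'tpu') == 'bfloat16_3x', while unlisted platforms get the spec's default.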

@shoyer
Member Author

shoyer commented May 16, 2020

“3 pass bfloat16” is coincidentally very close to (perhaps slightly higher than?) the precision of Nvidia’s new “tensorfloat32”, so that could also be a good name for this intermediate precision on TPUs.

@hawkinsp
Member

Users have also requested a way to set a more "global" default precision.

One possible mechanism to do this is via a scope, e.g.:

with jax.precision("highest"):
  ...

I would suggest that it should override only operations with default precision.
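The scoped override could look roughly like the following pure-Python sketch. The names (precision_scope, effective_precision) and the thread-unsafe global are illustrative assumptions; the one design point it encodes is the rule above, that the scope only fills in call sites that left precision unspecified.

```python
import contextlib

_default_precision = None  # None means "no global override active"

@contextlib.contextmanager
def precision_scope(value):
    """Hypothetical scoped default precision, restored on exit."""
    global _default_precision
    prev, _default_precision = _default_precision, value
    try:
        yield
    finally:
        _default_precision = prev

def effective_precision(call_site_precision):
    """Explicit per-call precision always wins; the scope only
    applies to calls that passed precision=None."""
    if call_site_precision is not None:
        return call_site_precision
    return _default_precision
```

So inside with precision_scope("highest"): a bare np.dot-style call (precision=None) would run at "highest", while a call that explicitly asked for a lower precision would keep it.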

@shoyer
Member Author

shoyer commented Aug 19, 2020

I would suggest that it should override only operations with default precision.

I assume you mean only for precision=None, rather than the confusingly named precision=lax.Precision.DEFAULT (aka bfloat16)?

@hawkinsp
Member

Yes, I meant None, not what we are currently calling DEFAULT.

@marksandler2
Contributor

I just spent several days chasing a numerical stability issue until I pinpointed it to the matmul precision. Using "low" precision by default seems like a very questionable design decision outside of optimized pipelines with carefully measured speed/quality trade-offs.

@hawkinsp
Member

hawkinsp commented May 8, 2023

@marksandler2 The issue is that there are at least two communities of people using JAX:

  • machine learning researchers/practitioners, who mostly expect speed over precision
  • scientific users, who expect precision over speed.

I'm not sure there's any default setting that makes both groups happy.

You didn't say if you are using TPU or GPU, but on TPU, at least, one can argue that if you didn't want fast lower-precision matmuls most of the time then using a TPU is an odd choice. It's more complicated on GPU.
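The footgun the thread describes is easy to reproduce without a TPU by simulating bfloat16 rounding in NumPy. This toy example (my construction, not from the thread) shows a dot product whose answer survives float32 but is silently zeroed at bfloat16 input precision, because bfloat16's 8 significand bits cannot represent 1 + 2**-12.

```python
import numpy as np

def to_bfloat16(x):
    """Simulate bfloat16 rounding by truncating the low 16 bits
    of the float32 bit pattern."""
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

a = np.float32(1.0 + 2.0**-12)        # exact in float32, not in bfloat16
x = np.array([a, -1.0], np.float32)
w = np.array([1.0, 1.0], np.float32)

exact = float(x @ w)                  # (1 + 2**-12) - 1 = 2**-12 in float32
low = float(to_bfloat16(x) @ to_bfloat16(w))  # bfloat16 rounds a to 1.0 -> 0.0
```

The cancellation that float32 handles exactly returns 0.0 at bfloat16 precision: exactly the kind of silent error that is hard to chase down when the default is low precision.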
