
Converting from half precision to bfloat16 in Pearson correlation coefficient causes numerical errors #1922

Closed
ziw-liu opened this issue Jul 14, 2023 · 3 comments · Fixed by #1926
Labels: bug / fix (Something isn't working), help wanted (Extra attention is needed)

Comments

@ziw-liu

ziw-liu commented Jul 14, 2023

🐛 Bug

The lack of half-precision op support on CPUs led to the decision to convert to bfloat16 when calculating the Pearson correlation coefficient (#1813).

However, this leads to numerical errors when the values are (meaningfully) small.

To Reproduce

>>> import torch
>>> a = torch.rand((32,), dtype=torch.half)
>>> b = torch.rand((32,), dtype=torch.half)
>>> from torchmetrics.functional import pearson_corrcoef
>>> pearson_corrcoef(a, b)
tensor(0.2240)
>>> pearson_corrcoef(a * 1e-1, b * 1e-1)
tensor(0.2251)
>>> pearson_corrcoef(a * 1e-2, b * 1e-2)
tensor(0.2213)
>>> pearson_corrcoef(a * 1e-3, b * 1e-3)
tensor(0.)
>>> pearson_corrcoef(a * 1e-4, b * 1e-4)
tensor(nan)

This generally makes sense, since float16 has only 5 exponent bits and some range is lost in the intermediate operations. However, it is not obvious during debugging, since 0.001 is not that small...
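
For reference (an aside, not part of the original report), float16's limits can be inspected directly, which shows how little headroom the 1e-3 to 1e-4 scale leaves:

>>> import torch
>>> torch.finfo(torch.half).tiny  # smallest normal float16 value
6.103515625e-05
>>> torch.finfo(torch.half).eps   # relative precision, roughly 3 decimal digits
0.0009765625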

Expected behavior

A warning when the dynamic range is problematic, similar to what SciPy does.

Or there may be some way to preserve more precision?
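
A possible workaround at the call site (a sketch, not a TorchMetrics feature) is to upcast the inputs to float32 so the intermediate products stay within range; since correlation is scale-invariant, the result should match the unscaled case:

>>> pearson_corrcoef(a.float() * 1e-4, b.float() * 1e-4)  # computed in float32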

Environment

  • TorchMetrics version: 1.0.1 (PyPI)
  • Python & PyTorch version: 3.10 / 2.0.1
  • Other relevant information (OS): tested on Linux_x64 and macOS_arm64
@github-actions

Hi! Thanks for your contribution, great first issue!

@SkafteNicki
Member

Hi @ziw-liu, thanks for raising this issue :]
I looked into it, and it seems that from the next version of PyTorch (v2.1.0) all operations involved in calculating the Pearson correlation coefficient will have native support for half precision, so at that point we could remove the bfloat16 conversion.

That said, the conversion is not really the underlying problem here. For values in the 1e-3 to 1e-4 range you are also simply running into the limits of what half precision can represent. In particular, it is the final computation that is the problem:

corrcoef = (corr_xy / (var_x * var_y).sqrt()).squeeze()

Essentially, the variances of the inputs are so small that their product runs out of precision:
[screenshot of the intermediate computation omitted]
The calculation only works in half precision when the individual variances are larger than roughly 1e-3 to 1e-4.
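
A minimal sketch of that failure mode, using illustrative variance values rather than the actual intermediates from the metric:

>>> import torch
>>> var_x = torch.tensor(1e-4, dtype=torch.half)  # representable on its own
>>> var_y = torch.tensor(1e-4, dtype=torch.half)
>>> var_x * var_y  # ~1e-8 falls below float16's subnormal floor (~6e-8)
tensor(0., dtype=torch.float16)

With the denominator (var_x * var_y).sqrt() collapsing to zero, the final division produces inf, or nan if corr_xy has underflowed as well.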

@ziw-liu
Author

ziw-liu commented Jul 17, 2023

@SkafteNicki Thanks for looking into this!

The best way to handle small-variance input would then seem to be a UserWarning?
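
A hypothetical sketch of such a check (the helper name and eps-based threshold are assumptions, not necessarily what #1926 ended up implementing):

import warnings
import torch

def _warn_on_small_variance(var_x: torch.Tensor, var_y: torch.Tensor) -> None:
    # Hypothetical helper: warn when the variances approach the precision
    # limit of the working dtype, similar in spirit to SciPy's
    # near-constant-input warning. The eps-based threshold is an assumption.
    threshold = torch.finfo(var_x.dtype).eps
    if (var_x < threshold).any() or (var_y < threshold).any():
        warnings.warn(
            "The variance of the inputs is close to the precision limit of "
            f"{var_x.dtype}; the Pearson correlation coefficient may be "
            "inaccurate. Consider casting the inputs to float32 or float64.",
            UserWarning,
        )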
