
Converting from half precision to bfloat16 in Pearson correlation coefficient causes numerical errors #1922

Closed
ziw-liu opened this issue Jul 14, 2023 · 3 comments · Fixed by #1926
Labels: bug / fix (Something isn't working), help wanted (Extra attention is needed)

Comments

@ziw-liu

ziw-liu commented Jul 14, 2023

🐛 Bug

The lack of half-precision op support on CPUs led to the decision to convert to bfloat16 when calculating the Pearson correlation coefficient (#1813).

However, this leads to numerical errors when the values are (meaningfully) small.

To Reproduce

>>> import torch
>>> a = torch.rand((32,), dtype=torch.half)
>>> b = torch.rand((32,), dtype=torch.half)
>>> from torchmetrics.functional import pearson_corrcoef
>>> pearson_corrcoef(a, b)
tensor(0.2240)
>>> pearson_corrcoef(a * 1e-1, b * 1e-1)
tensor(0.2251)
>>> pearson_corrcoef(a * 1e-2, b * 1e-2)
tensor(0.2213)
>>> pearson_corrcoef(a * 1e-3, b * 1e-3)
tensor(0.)
>>> pearson_corrcoef(a * 1e-4, b * 1e-4)
tensor(nan)

This generally makes sense, since float16 has only 5 exponent bits and some range is lost in the intermediate operations. However, it is not obvious during debugging, since 0.001 is not that small...
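
For reference (an aside, not part of the original report), float16's limits can be inspected directly, which shows how little headroom the 1e-3 to 1e-4 scale leaves:

>>> import torch
>>> torch.finfo(torch.half).tiny  # smallest normal float16 value
6.103515625e-05
>>> torch.finfo(torch.half).eps   # relative precision, roughly 3 decimal digits
0.0009765625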

Expected behavior

A warning when the dynamic range is problematic, similar to what SciPy does.

Or there may be some way to preserve more precision?
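
A possible workaround at the call site (a sketch, not a TorchMetrics feature) is to upcast the inputs to float32 so the intermediate products stay within range; since correlation is scale-invariant, the result should match the unscaled case:

>>> pearson_corrcoef(a.float() * 1e-4, b.float() * 1e-4)  # computed in float32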

Environment

  • TorchMetrics version: 1.0.1 (PyPI)
  • Python & PyTorch version: 3.10 / 2.0.1
  • Other relevant information (OS): tested on Linux_x64 and macOS_arm64
@github-actions

Hi! Thanks for your contribution, great first issue!

@SkafteNicki
Member

Hi @ziw-liu, thanks for raising this issue :]
I looked into it, and it seems that from the next version of PyTorch (v2.1.0) all operations involved in calculating the Pearson correlation coefficient will have native support for half precision, so at that point we could remove the bfloat16 conversion.

That said, the conversion is not really the underlying problem here. For values in the 1e-3 to 1e-4 range you are also simply running into the limits of what half precision can represent. In particular, it is the final computation that is the problem:

corrcoef = (corr_xy / (var_x * var_y).sqrt()).squeeze()

Essentially, the variances of the inputs are so small that their product runs out of precision:
[screenshot of the intermediate computation omitted]
The calculation only works in half precision when the individual variances are larger than roughly 1e-3 to 1e-4.
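
A minimal sketch of that failure mode, using illustrative variance values rather than the actual intermediates from the metric:

>>> import torch
>>> var_x = torch.tensor(1e-4, dtype=torch.half)  # representable on its own
>>> var_y = torch.tensor(1e-4, dtype=torch.half)
>>> var_x * var_y  # ~1e-8 falls below float16's subnormal floor (~6e-8)
tensor(0., dtype=torch.float16)

With the denominator (var_x * var_y).sqrt() collapsing to zero, the final division produces inf, or nan if corr_xy has underflowed as well.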

@ziw-liu
Author

ziw-liu commented Jul 17, 2023

@SkafteNicki Thanks for looking into this!

The best way to handle small-variance input would then seem to be a UserWarning?
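
A hypothetical sketch of such a check (the helper name and eps-based threshold are assumptions, not necessarily what #1926 ended up implementing):

import warnings
import torch

def _warn_on_small_variance(var_x: torch.Tensor, var_y: torch.Tensor) -> None:
    # Hypothetical helper: warn when the variances approach the precision
    # limit of the working dtype, similar in spirit to SciPy's
    # near-constant-input warning. The eps-based threshold is an assumption.
    threshold = torch.finfo(var_x.dtype).eps
    if (var_x < threshold).any() or (var_y < threshold).any():
        warnings.warn(
            "The variance of the inputs is close to the precision limit of "
            f"{var_x.dtype}; the Pearson correlation coefficient may be "
            "inaccurate. Consider casting the inputs to float32 or float64.",
            UserWarning,
        )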
