Computing Accuracy on 16-bit softmaxed values causes ValueError #245

l-salewski · 2021-05-12T14:00:37Z

🐛 Bug

Computing an accuracy on on-the-fly softmaxed values raises the ValueError "Probabilities in preds must sum up to 1 across the C dimension", even though the tensor does sum up to 1 in the correct dimension.

I upgraded pytorch-lightning from 1.1.8 to 1.2.10 to 1.3.1 and thus had to manually insert F.softmax(logits) to make the accuracies work again (see this and this). No big deal in itself. But I am getting said ValueError immediately on 1.3.1 and after a couple of dozen steps on 1.2.10. I have checked that I am setting the correct dim in F.softmax(logits, dim=1).

I suspect that the root cause is half precision (which I always use, as my model is large), causing torch.isclose(...) to wrongly trigger this check:

if not torch.isclose(preds.sum(dim=1), torch.ones_like(preds.sum(dim=1))).all():

After removing the .all(), it turns out that isclose returns False for only some of the values.
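To illustrate why half precision could trip this check, here is a minimal sketch. It assumes the row sums are actually stored in float16, which the report does not confirm. Under that assumption, a sum that is not exactly 1.0 is off by at least 2**-11, which is far outside the default isclose bound of atol + rtol * 1.0. The variable names are only for illustration.

```python
import torch

# Machine epsilon of float16: the gap between 1.0 and the next larger value.
fp16_eps = torch.finfo(torch.float16).eps  # 2**-10

# The largest float16 value below 1.0 sits 2**-11 away from it.
nearest_below_one = torch.tensor(1.0 - 2.0**-11, dtype=torch.float16)
deviation = abs(float(nearest_below_one) - 1.0)

# Default torch.isclose bound for a target of 1.0: atol + rtol * |1.0|
default_bound = 1e-8 + 1e-5 * 1.0

# Compare in float32 so the check itself adds no further float16 rounding.
passes_default = torch.isclose(
    nearest_below_one.float(), torch.tensor(1.0)
).item()

print(fp16_eps, deviation, default_bound, passes_default)
```

So any float16 sum that rounds to a neighbour of 1.0 instead of 1.0 itself would fail with the default tolerances, which would match isclose returning False for only some rows.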

Workaround

I tried adjusting the rtol of isclose to 1e-04 instead of the default 1e-05, and that works fine. A hacky solution, I guess; ideally this setting would be derived from the possible value range of 16-bit numbers.
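A rough sketch of what a dtype-derived tolerance could look like is below. The helper probs_sum_to_one and its scale parameter are hypothetical and not part of the library; the factor 4.0 is an arbitrary safety margin, not a tuned value. The idea is to take the machine epsilon from torch.finfo for the input dtype and to accumulate the sum in float32.

```python
import torch


def probs_sum_to_one(preds: torch.Tensor, dim: int = 1, scale: float = 4.0) -> bool:
    """Hypothetical helper: check row sums with a dtype-aware tolerance."""
    # Tolerance grows with the machine epsilon of the input dtype
    # (about 9.8e-4 for float16, about 1.2e-7 for float32).
    tol = scale * torch.finfo(preds.dtype).eps
    # Accumulate in float32 so the sum adds no extra float16 rounding.
    row_sums = preds.float().sum(dim=dim)
    return bool(((row_sums - 1.0).abs() <= tol).all())


torch.manual_seed(0)
logits = torch.randn(8, 10)

# Softmax computed in float32, then stored as float16.
half_probs = torch.softmax(logits, dim=1).half()

print(probs_sum_to_one(half_probs))        # valid probabilities
print(probs_sum_to_one(half_probs * 0.5))  # rows sum to roughly 0.5
```

For float32 inputs with many classes, the scale would probably need to be larger, since accumulation error grows with the number of summed elements.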

Expected behavior

No ValueError should be raised, as the values sum up to 1.

Environment

PyTorch Version (e.g., 1.0): 1.8.1
OS (e.g., Linux): Ubuntu
How you installed PyTorch: conda
Python version: 3.9.4
CUDA/cuDNN version: 11
GPU models and configuration: v100

github-actions · 2021-05-12T14:01:18Z

Hi! Thanks for your contribution! Great first issue!

SkafteNicki · 2021-05-13T08:18:20Z

Hi l-salewski,
Thanks for raising this issue. Yesterday PR #200 was merged, which removes the check that the preds tensor needs to sum to 1, so that users can also input unnormalized model output (logits). The issue therefore seems to be solved automatically by that. To get these changes, you can install from master (pip install git+https://github.com/PytorchLightning/metrics.git@master).
Going to close the issue; feel free to reopen if the issue persists :]

l-salewski · 2021-05-13T14:18:17Z

Hi SkafteNicki, thanks for the swift response and the even faster PR!

l-salewski added the "bug / fix" and "help wanted" labels May 12, 2021

SkafteNicki closed this as completed May 13, 2021
