
for BinaryCalibrationError the n_bins needs to be 1 less than actual to match manual calculation #1907

Closed
mshergad opened this issue Jul 11, 2023 · 7 comments · Fixed by #1919
Labels: bug / fix (Something isn't working), help wanted (Extra attention is needed)
Milestone: v1.1.0


mshergad commented Jul 11, 2023

🐛 Bug

Given this example:

from torch import tensor
from torchmetrics.classification import BinaryCalibrationError
preds = tensor([0.25, 0.25, 0.55, 0.75, 0.75])
target = tensor([0, 0, 1, 1, 1])
metric = BinaryCalibrationError(n_bins=2, norm='l1')
metric(preds, target)

The number of bins is 2, but it seems it should be 3, since there are three distinct probability values. I'm not sure if this is a documentation error or a bug.
[Screenshot: output of the snippet above]
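
For context, this is a sketch of the manual calculation being compared against (assuming the standard expected-calibration-error formula, ECE = sum over bins of (n_b / N) * |accuracy_b - confidence_b|, with the confidence of the predicted class taken as max(p, 1 - p); a reconstruction, not necessarily the exact torchmetrics internals):

import torch
from torch import tensor

def manual_ece(preds, target, n_bins):
    # Confidence of the predicted class and whether that prediction is correct.
    confidences = torch.where(preds >= 0.5, preds, 1 - preds)
    accuracies = torch.where(preds >= 0.5, target, 1 - target).to(preds.dtype)
    # n_bins + 1 boundaries define n_bins equal-width bins over [0, 1].
    edges = torch.linspace(0, 1, n_bins + 1, dtype=preds.dtype)
    ece = torch.zeros((), dtype=preds.dtype)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            weight = in_bin.to(preds.dtype).mean()  # fraction of samples in this bin
            ece += weight * (accuracies[in_bin].mean() - confidences[in_bin].mean()).abs()
    return ece

preds = tensor([0.25, 0.25, 0.55, 0.75, 0.75])
target = tensor([0, 0, 1, 1, 1])
print(manual_ece(preds, target, n_bins=2))  # roughly 0.29 under this convention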


mshergad added the bug / fix (Something isn't working) and help wanted (Extra attention is needed) labels on Jul 11, 2023
github-actions commented

Hi! Thanks for your contribution, great first issue!

SkafteNicki added this to the v1.1.0 milestone on Jul 12, 2023
SkafteNicki (Member) commented

Hi @mshergad, thanks for raising this issue.
I'm not completely sure what the problem is here. The n_bins is just a freely chosen parameter that determines how precise your approximation of the calibration error is, so for the sake of the example it does not really matter what value I set it to.
Regarding the line you are referring to in the code, that is the normal way to define the bin boundaries. If I have n_bin=2 and I then evaluate

torch.linspace(0,1,n_bin)

I get tensor([0., 1.]), which is not correct; this gives me only one bin, from 0 to 1. Adding 1 to the number of steps:

torch.linspace(0,1,n_bin+1)

I get tensor([0.0000, 0.5000, 1.0000]), which correctly gives the two bins I would expect: one from 0 to 0.5 and one from 0.5 to 1.
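
To make the convention concrete, here is a small sketch showing that n_bins + 1 boundaries define n_bins bins (torch.bucketize is used purely for illustration, not necessarily what torchmetrics calls internally):

import torch

n_bins = 2
# n_bins + 1 boundaries -> n_bins equal-width bins: [0, 0.5) and [0.5, 1].
boundaries = torch.linspace(0, 1, n_bins + 1)
print(boundaries)  # tensor([0.0000, 0.5000, 1.0000])

# Assign each confidence to a 0-based bin index using the inner boundaries.
confidences = torch.tensor([0.25, 0.55, 0.75])
print(torch.bucketize(confidences, boundaries[1:-1], right=True))  # tensor([0, 1, 1])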

SkafteNicki added the working as intended label and removed the bug / fix (Something isn't working) label on Jul 12, 2023
mshergad (Author) commented

Thanks for the quick response. Here is the example I tried:

preds = tensor([0.9000, 0.9000, 0.9000, 0.9000, 0.9000, 0.8000, 0.8000, 0.0100, 0.3300, 0.3400, 0.9900, 0.6100], dtype=torch.float64)
target = tensor([1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0])

For n_bins of 100 and 99, the computed values differ:

[Screenshot: metric output for n_bins=100 vs n_bins=99]

If I split 0-1 into 100 bins, the right boundary of each bin increases in increments of 0.01.
For the 100-bin case, using scikit-learn and a manual calculation of per-bin accuracy and the weighted average, I get the same value as the 99-bin result, not the 100-bin result.
[Screenshot: manual calculation matching the 99-bin result]

SkafteNicki added the bug / fix (Something isn't working) label and removed the working as intended label on Jul 13, 2023
SkafteNicki (Member) commented

Thanks for the example. I can definitely see now that this is a corner case we are not covering right now.
I tried removing the +1 from the code and running our unit tests, and everything there starts to fail, so I do not think that is the solution.
Internally we compare against
https://github.com/EFS-OpenSource/calibration-framework
which does work for the example you have provided:

from torchmetrics.functional.classification import binary_calibration_error
from netcal.metrics import ECE
import torch
from torch import tensor
preds =  tensor([0.9000, 0.9000, 0.9000, 0.9000, 0.9000, 0.8000, 0.8000, 0.0100, 0.3300, 0.3400, 0.9900, 0.6100], dtype=torch.float64)
target = tensor([1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0])

print(binary_calibration_error(preds, target, n_bins=99, norm="l1"))  # tensor(0.4733, dtype=torch.float64)
print(binary_calibration_error(preds, target, n_bins=100, norm="l1"))  # tensor(0.4183, dtype=torch.float64)

print(ECE(99).measure(preds.numpy(), target.numpy()))  # 0.4733333333333334
print(ECE(100).measure(preds.numpy(), target.numpy()))  # 0.4733333333333333

They also have a +1 in their code for the bin boundaries:
https://github.com/EFS-OpenSource/calibration-framework/blob/45bebd15c873ae399348b8148eb2ea5c89254d27/netcal/metrics/Miscalibration.py#L379
To me this means that we are doing something wrong later in the code, and that is what causes this example to fail.

SkafteNicki (Member) commented

Okay, after further investigation it seems to just be a matter of precision, i.e. float vs. double.
NumPy/scikit-learn use double by default, whereas torch uses float. In this case the fix is fairly simple:

bin_boundaries = torch.linspace(0, 1, bin_boundaries + 1, dtype=torch.float, device=confidences.device)

Instead of dtype=torch.float it should be dtype=confidences.dtype.
The fix is implemented in PR #1919, with the example from this issue added as a unit test.
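
As a rough illustration of why the dtype matters (a sketch only; exact values depend on how linspace rounds in each precision), bin edges built in float32 versus float64 need not agree, so a double-precision confidence such as 0.33 can land on different sides of the nominal 0.33 boundary:

import torch

conf = torch.tensor(0.33, dtype=torch.float64)
edges_f32 = torch.linspace(0, 1, 100 + 1, dtype=torch.float32)
edges_f64 = torch.linspace(0, 1, 100 + 1, dtype=torch.float64)

# The nominal 0.33 boundary is not the same number in the two precisions.
print(edges_f32[33].item(), edges_f64[33].item())
# Whether conf falls at or below that boundary can therefore differ by dtype.
print((conf <= edges_f32[33].to(torch.float64)).item(), (conf <= edges_f64[33]).item())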

mshergad (Author) commented

That's great to hear that you've found the issue! I wasn't able to grok how a precision error caused this change, but I really appreciate your quick responses and actions. Thanks!
