
for BinaryCalibrationError the n_bins needs to be 1 less than actual to match manual calculation #1907

Closed
mshergad opened this issue Jul 11, 2023 · 7 comments · Fixed by #1919
Labels: bug / fix (Something isn't working), help wanted (Extra attention is needed)
Milestone: v1.1.0


mshergad commented Jul 11, 2023

🐛 Bug

Given this example:

from torch import tensor
from torchmetrics.classification import BinaryCalibrationError
preds = tensor([0.25, 0.25, 0.55, 0.75, 0.75])
target = tensor([0, 0, 1, 1, 1])
metric = BinaryCalibrationError(n_bins=2, norm='l1')
metric(preds, target)

The number of bins is 2, but it seems it should be 3, since there are three distinct probability values. I'm not sure if this is a documentation error or a bug.
[Screenshot: output of the snippet above]
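
For context, this is a sketch of the manual calculation being compared against (assuming the standard expected-calibration-error formula, ECE = sum over bins of (n_b / N) * |accuracy_b - confidence_b|, with the confidence of the predicted class taken as max(p, 1 - p); a reconstruction, not necessarily the exact torchmetrics internals):

import torch
from torch import tensor

def manual_ece(preds, target, n_bins):
    # Confidence of the predicted class and whether that prediction is correct.
    confidences = torch.where(preds >= 0.5, preds, 1 - preds)
    accuracies = torch.where(preds >= 0.5, target, 1 - target).to(preds.dtype)
    # n_bins + 1 boundaries define n_bins equal-width bins over [0, 1].
    edges = torch.linspace(0, 1, n_bins + 1, dtype=preds.dtype)
    ece = torch.zeros((), dtype=preds.dtype)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            weight = in_bin.to(preds.dtype).mean()  # fraction of samples in this bin
            ece += weight * (accuracies[in_bin].mean() - confidences[in_bin].mean()).abs()
    return ece

preds = tensor([0.25, 0.25, 0.55, 0.75, 0.75])
target = tensor([0, 0, 1, 1, 1])
print(manual_ece(preds, target, n_bins=2))  # roughly 0.29 under this convention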


mshergad added the bug / fix (Something isn't working) and help wanted (Extra attention is needed) labels on Jul 11, 2023
github-actions commented

Hi! Thanks for your contribution, great first issue!

SkafteNicki added this to the v1.1.0 milestone on Jul 12, 2023
SkafteNicki (Member) commented

Hi @mshergad, thanks for raising this issue.
I'm not completely sure what the problem is here. The n_bins is just a freely chosen parameter that determines how precise your approximation of the calibration error is, so for the sake of the example it does not really matter what value I set it to.
Regarding the line you are referring to in the code, that is the normal way to define the bin boundaries. If I have n_bin=2 and I then evaluate

torch.linspace(0,1,n_bin)

I get tensor([0., 1.]), which is not correct; this gives me only one bin, from 0 to 1. Adding 1 to the number of steps:

torch.linspace(0,1,n_bin+1)

I get tensor([0.0000, 0.5000, 1.0000]), which correctly gives the two bins I would expect: one from 0 to 0.5 and one from 0.5 to 1.
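
To make the convention concrete, here is a small sketch showing that n_bins + 1 boundaries define n_bins bins (torch.bucketize is used purely for illustration, not necessarily what torchmetrics calls internally):

import torch

n_bins = 2
# n_bins + 1 boundaries -> n_bins equal-width bins: [0, 0.5) and [0.5, 1].
boundaries = torch.linspace(0, 1, n_bins + 1)
print(boundaries)  # tensor([0.0000, 0.5000, 1.0000])

# Assign each confidence to a 0-based bin index using the inner boundaries.
confidences = torch.tensor([0.25, 0.55, 0.75])
print(torch.bucketize(confidences, boundaries[1:-1], right=True))  # tensor([0, 1, 1])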

SkafteNicki added the working as intended label and removed the bug / fix (Something isn't working) label on Jul 12, 2023
mshergad (Author) commented

Thanks for the quick response. Here is the example I tried:

preds = tensor([0.9000, 0.9000, 0.9000, 0.9000, 0.9000, 0.8000, 0.8000, 0.0100, 0.3300, 0.3400, 0.9900, 0.6100], dtype=torch.float64)
target = tensor([1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0])

For n_bins of 100 and 99, the computed values differ:

[Screenshot: metric output for n_bins=100 vs n_bins=99]

If I split 0-1 into 100 bins, the right boundary of each bin increases in increments of 0.01.
For the 100-bin case, using scikit-learn and a manual calculation of per-bin accuracy and the weighted average, I get the same value as the 99-bin result, not the 100-bin result.
[Screenshot: manual calculation matching the 99-bin result]

SkafteNicki added the bug / fix (Something isn't working) label and removed the working as intended label on Jul 13, 2023
SkafteNicki (Member) commented

Thanks for the example. I can definitely see now that this is a corner case we are not covering right now.
I tried removing the +1 from the code and running our unit tests, and everything there starts to fail, so I do not think that is the solution.
Internally we compare against
https://github.com/EFS-OpenSource/calibration-framework
which does work for the example you have provided:

from torchmetrics.functional.classification import binary_calibration_error
from netcal.metrics import ECE
import torch
from torch import tensor
preds =  tensor([0.9000, 0.9000, 0.9000, 0.9000, 0.9000, 0.8000, 0.8000, 0.0100, 0.3300, 0.3400, 0.9900, 0.6100], dtype=torch.float64)
target = tensor([1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0])

print(binary_calibration_error(preds, target, n_bins=99, norm="l1"))  # tensor(0.4733, dtype=torch.float64)
print(binary_calibration_error(preds, target, n_bins=100, norm="l1"))  # tensor(0.4183, dtype=torch.float64)

print(ECE(99).measure(preds.numpy(), target.numpy()))  # 0.4733333333333334
print(ECE(100).measure(preds.numpy(), target.numpy()))  # 0.4733333333333333

They also have a +1 in their code for the bin boundaries:
https://github.com/EFS-OpenSource/calibration-framework/blob/45bebd15c873ae399348b8148eb2ea5c89254d27/netcal/metrics/Miscalibration.py#L379
To me this means that we are doing something wrong later in the code, and that is what causes this example to fail.

SkafteNicki (Member) commented

Okay, after further investigation it seems to just be a matter of precision, i.e. float vs. double.
NumPy/scikit-learn use double by default, whereas torch uses float. In this case the fix is fairly simple:

bin_boundaries = torch.linspace(0, 1, bin_boundaries + 1, dtype=torch.float, device=confidences.device)

Instead of dtype=torch.float it should be dtype=confidences.dtype.
The fix is implemented in PR #1919, with the example from this issue added as a unit test.
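
As a rough illustration of why the dtype matters (a sketch only; exact values depend on how linspace rounds in each precision), bin edges built in float32 versus float64 need not agree, so a double-precision confidence such as 0.33 can land on different sides of the nominal 0.33 boundary:

import torch

conf = torch.tensor(0.33, dtype=torch.float64)
edges_f32 = torch.linspace(0, 1, 100 + 1, dtype=torch.float32)
edges_f64 = torch.linspace(0, 1, 100 + 1, dtype=torch.float64)

# The nominal 0.33 boundary is not the same number in the two precisions.
print(edges_f32[33].item(), edges_f64[33].item())
# Whether conf falls at or below that boundary can therefore differ by dtype.
print((conf <= edges_f32[33].to(torch.float64)).item(), (conf <= edges_f64[33]).item())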

mshergad (Author) commented

That's great to hear that you've found the issue! I wasn't able to grok how a precision error caused this change, but I really appreciate your quick responses and actions. Thanks!
