Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Classification: option to disable input formatting [wip] #1676

Open
wants to merge 46 commits into
base: master
Choose a base branch
from

Conversation

SkafteNicki
Copy link
Member

@SkafteNicki SkafteNicki commented Mar 31, 2023

What does this PR do?

Fixes #1526
Fixes #1604
Fixes #1989
Fixes #2195
Fixes #2329

Before submitting
  • Was this discussed/agreed via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • accuracy.py
  • auroc.py
  • average_precision.py
  • calibration_error.py
  • cohen_kappa.py
  • confusion_matrix.py
  • dice.py
  • exact_match.py
  • f_beta.py
  • group_fairness.py
  • hamming.py
  • hinge.py
  • jaccard.py
  • matthews_corrcoef.py
  • precision_fixed_recall.py
  • precision_recall.py
  • precision_recall_curve.py
  • ranking.py
  • recall_fixed_precision.py
  • roc.py
  • specificity.py
  • specificity_sensitivity.py
  • stat_scores.py
PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@Borda
Copy link
Member

Borda commented Apr 17, 2023

@SkafteNicki how is it going here? 🐰

@SkafteNicki SkafteNicki modified the milestones: v1.0.0, future Jun 2, 2023
@samet-akcay
Copy link

Hi @SkafteNicki, do you guys have any update on this one please?

@Borda
Copy link
Member

Borda commented Aug 3, 2023

do you guys have any update on this one please?

We need to address failing checks

@Borda
Copy link
Member

Borda commented Aug 7, 2023

@SkafteNicki how is it going here? 🐰

@SkafteNicki SkafteNicki modified the milestones: future, v1.0.x Aug 9, 2023
@@ -329,6 +349,16 @@ def binary_precision_recall_curve(
Specifies a target value that is ignored and does not contribute to the metric calculation
validate_args: bool indicating if input arguments and tensors should be validated for correctness.
Set to ``False`` for faster computations.
input_format: str or bool specifying the format of the input preds tensor. Can be one of:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this looks good to me...
cc: @Lightning-AI/core-metrics @awaelchli

Copy link

codecov bot commented Jan 9, 2024

Codecov Report

Merging #1676 (a7d719b) into master (4c999b8) will decrease coverage by 0%.
Report is 14 commits behind head on master.
The diff coverage is 36%.

Additional details and impacted files
@@          Coverage Diff           @@
##           master   #1676   +/-   ##
======================================
- Coverage      69%     69%   -0%     
======================================
  Files         307     307           
  Lines       17352   17406   +54     
======================================
+ Hits        11961   11992   +31     
- Misses       5391    5414   +23     

@mergify mergify bot removed the has conflicts label Jan 12, 2024
@idc9
Copy link

idc9 commented Feb 3, 2024

Thanks for working on the bug in #1604! I suspect this bug is silently affecting people currently using torchmetrics.

It looks like the current solution is a new argument input_formatand gives the user the option to specify if they are providing probabilities, logits, etc (see below documentation). This almost fixes the problem, but this argument defaults to input_format='auto' which will cause exactly the bug discussed in #1604 (i.e. if your logits happen to all be in [0, 1] this will incorrectly fail to convert them to probabilities).

I see two options

  1. Change the default to input_format='logits' (or probs though I think logits is the most common input). This default options is explicit about behavior that is a but subtle and will prevent the bug from happening by defaults

  2. Get rid of the auto option all together. The auto option will always be buggy if the user inputs logits.

I personally see no reason to include the auto option and would vote for option 2.

input_format: str specifying the format of the input preds tensor. Can be one of:

    - ``'auto'``: automatically detect the format based on the values in the tensor. If all values
        are in the [0,1] range, we consider the tensor to be probabilities and only thresholds the values.
        If all values are non-float we consider the tensor to be labels and does nothing. Else we consider the
        tensor to be logits and will apply sigmoid to the tensor and threshold the values.
    - ``'probs'``: preds tensor contains values in the [0,1] range and is considered to be probabilities. Only
        thresholding will be applied to the tensor and values will be checked to be in [0,1] range.
    - ``'logits'``: preds tensor contains values outside the [0,1] range and is considered to be logits. We
        will apply sigmoid to the tensor and threshold the values before calculating the metric.
    - ``'labels'``: preds tensor contains integer values and is considered to be labels. No formatting will be
        applied to preds tensor.
    - ``'none'``: will disable all input formatting. This is the fastest option but also the least safe.

@SkafteNicki
Copy link
Member Author

@idc9 thanks for giving your opinions on this issue.
I already laid out on slack some of my opinions, but just for transparency here are the two main reasons for having auto as the default:

  1. To keep everything backwards compatible
  2. To say "The auto option will always be buggy if the user inputs logits." is simply wrong. For reasonably sized input, any model should really output logit values outside the [0,1] range which after sigmoid transformation corresponds to probabilities in the [0.5, 0.73] range. If a model is only outputting values in the 0.5-0.73 probability range, this will mean it is never going to predict the negative class, and it is never confident about the positive class. Both seems highly unlikely for any well trained model.

@NoahAtKintsugi
Copy link

@SkafteNicki In cases where GPU memory limits batch sizes to be very small, there will be no "reasonably-sized input", so "auto" will be buggy. What about not having "auto", but rather having only "probs" and "logits", with "probs" being the default and verifying that its input is in [0, 1]? This will be backwards compatible, except raise an error in the case that the old version was doing the wrong thing. The error message can suggest the user might mean input_format = "logits".

@idc9
Copy link

idc9 commented Feb 6, 2024

Sorry I should have been more careful with my language! Let me rephrase “The auto option will always be buggy if the user inputs logits” to “The ‘auto’ option leaves open the possibility for the bug to happen when the user inputs logits”.

Since the auto option leaves open the possibility of this bug occurring I suspect we don’t we want to preserve backwards compatibility. This is a subtle/unexpected issue many users won’t immediately realize it can be an issue. Given it’s straightforward for the user to specify logits/prob, I’m curious if there is any case where ‘auto’ is useful (i.e. should it just be removed)?

As @SkafteNicki pointed out the logits bug will only occur if all of the predicted probabilities lie in [0.5, 0.73]. It’s true this is likely not the usual scenario but it probably does come up. A few examples where it may be more common include: small batches, early phases of model training, difficult applications where hold-out probabilities are (sadly) closeish to 0.5, unbalanced classes, situations where your label prediction rule thresholds the probability at some value different from 0.5.

@SkafteNicki
Copy link
Member Author

@NoahAtKintsugi and @idc9 Alright, I am willing to give me on this, but we are then going to break backwards compatibility, which I do not take as a light issue. If we do this change we need to do it right. You are probably right that it is better to break stuff now and then fix this issue that may be a potential hidden problem for some users.

@Borda and @justusschock please give some opinions on this. Do we

  • continue with input_format="auto" as the default keeping everything backwards compatibility
  • set input_format="auto" for v1.4 (next release), include user warning that it will be changing to input_format="logits"/"probs" (which one we choose does not matter to me) in v1.5
  • something else...

@mergify mergify bot removed the has conflicts label Feb 15, 2024
@Borda
Copy link
Member

Borda commented Feb 15, 2024

  • set input_format="auto" for v1.4 (next release), include user warning that it will be changing to input_format="logits"/"probs" (which one we choose does not matter to me) in v1.5

I think due to stability, we may add a warning but being having a hard switch from 2.0
cc: @lantiga

@idc9
Copy link

idc9 commented Feb 20, 2024

Defaulting to logits might make the most sense. I suspect is that it's more common for users to default to outputting logits than outputting probabilities (e.g. https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html, https://github.com/huggingface/pytorch-image-models/blob/main/train.py, https://github.com/pytorch/examples/blob/main/imagenet/main.py) probably since this avoids a small bit of extra computation.

@Borda
Copy link
Member

Borda commented Mar 28, 2024

@SkafteNicki, how is this one going? :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants