
fix nDCG can not be called with negative relevance targets #378

Merged
merged 26 commits into from
Jul 28, 2021

Conversation

paul-grundmann
Contributor

@paul-grundmann paul-grundmann commented Jul 16, 2021

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs? (No update needed)
  • Did you write any new necessary tests?

What does this PR do?

Fixes #377

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Of course I had fun fixing this issue :)

@Borda Borda changed the title Should fix issue #377 - fix nDCG can not be called with negative relevance targets Jul 16, 2021
Member

@Borda Borda left a comment


Can we also add a test to cover this behaviour, as nothing was failing until now?

@Borda Borda added the bug / fix Something isn't working label Jul 16, 2021
@codecov

codecov bot commented Jul 16, 2021

Codecov Report

Merging #378 (84d7be6) into master (e00e3ab) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master     #378   +/-   ##
=======================================
  Coverage   96.44%   96.45%           
=======================================
  Files         120      120           
  Lines        3801     3804    +3     
=======================================
+ Hits         3666     3669    +3     
  Misses        135      135           
Flag Coverage Δ
Linux 76.49% <100.00%> (+0.04%) ⬆️
Windows 76.49% <100.00%> (+0.04%) ⬆️
cpu 95.71% <100.00%> (-0.69%) ⬇️
gpu 96.37% <100.00%> (+<0.01%) ⬆️
macOS 95.71% <100.00%> (-0.69%) ⬇️
pytest 96.45% <100.00%> (+<0.01%) ⬆️
python3.6 95.58% <100.00%> (+<0.01%) ⬆️
python3.8 95.66% <100.00%> (-0.74%) ⬇️
python3.9 ?
torch1.3.1 95.58% <100.00%> (+<0.01%) ⬆️
torch1.4.0 95.66% <100.00%> (+<0.01%) ⬆️
torch1.9.0 ?

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
torchmetrics/functional/retrieval/ndcg.py 100.00% <100.00%> (ø)
torchmetrics/utilities/checks.py 92.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e00e3ab...84d7be6.

@SkafteNicki
Member

@paul-grundmann could you try changing the target line to `target=torch.randint(low=-2, high=4...` to also check for negative values?
https://github.com/PyTorchLightning/metrics/blob/a4d2a7a81ba482f28edd30bd76abc3cffbaca548/tests/retrieval/inputs.py#L35-L39
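Such a change to the input fixture might look like this (a sketch; the actual constants in `tests/retrieval/inputs.py` may differ, and the shape names here are assumptions):

```python
import torch

# Assumed shapes; the real fixture uses constants defined by the test suite
NUM_BATCHES, BATCH_SIZE = 4, 32

# Drawing from [-2, 4) exercises negative relevance targets as well
target = torch.randint(low=-2, high=4, size=(NUM_BATCHES, BATCH_SIZE))
```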

mergify bot and others added 3 commits July 16, 2021 21:43
@Borda
Member

Borda commented Jul 20, 2021

@paul-grundmann mind checking the failing tests and #378 (comment)?

@paul-grundmann
Contributor Author

It seems that the random generation of inputs can lead to invalid inputs for the calculation of the DCG (e.g. a single target with 0 relevance). This leads to a division by zero:

return _dcg(sorted_target) / _dcg(ideal_target) # _dcg(ideal_target) == 0

Should I add a test in the nDCG calculation if the ideal DCG is zero and then return 0.0 as a result?
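Such a guard might look like this (a minimal sketch; `_dcg` here is a simplified stand-in for the real helper, using a plain linear-gain DCG, and `retrieval_normalized_dcg` is written without the input validation the library performs):

```python
import torch

def _dcg(target: torch.Tensor) -> torch.Tensor:
    # DCG of an already-sorted relevance vector: sum(rel_i / log2(i + 2))
    denom = torch.log2(torch.arange(target.shape[-1], dtype=torch.float) + 2.0)
    return (target.to(torch.float) / denom).sum(dim=-1)

def retrieval_normalized_dcg(preds: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    sorted_target = target[torch.argsort(preds, descending=True)]
    ideal_target = torch.sort(target, descending=True).values
    ideal_dcg = _dcg(ideal_target)
    if ideal_dcg == 0:
        # No relevant documents at all: define nDCG as 0 instead of dividing by zero
        return torch.tensor(0.0)
    return _dcg(sorted_target) / ideal_dcg
```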

@SkafteNicki
Member

It seems to me that maybe the nDCG should be changed to account for division by zero. Looking at sklearn's implementation:
https://github.com/scikit-learn/scikit-learn/blob/2beed55847ee70d363bdbfe14ee4401438fba057/sklearn/metrics/_ranking.py#L1458-L1466
they set it to 0 when the denominator is 0. @lucadiliello any opinion here?

@lucadiliello
Contributor

> It seems that the random generation of inputs can lead to invalid inputs for the calculation of the DCG (e.g. a single target with 0 relevance). This leads to a division by zero:
>
> return _dcg(sorted_target) / _dcg(ideal_target) # _dcg(ideal_target) == 0
>
> Should I add a test in the nDCG calculation if the ideal DCG is zero and then return 0.0 as a result?

Yes, I think the functional implementation should behave like the sklearn version, so it should return 0. However, now that you are also allowing negative targets, I think this line should be changed.

@mergify mergify bot added ready and removed ready labels Jul 26, 2021
paul-grundmann and others added 3 commits July 26, 2021 19:27
- Used the scikit-learn implementation of nDCG
- Removed the test for non-binary targets in test_ndcg.py and replaced the default parameters in the error test with a custom one that does not check for binary targets
- Set the low value of _input_retrieval_scores_non_binary_target to -1 to reduce the test failure rate
@paul-grundmann
Contributor Author

Ok, I had some time to play around, since the recent code change introduced a lot more failing tests.
On the one hand, it seems that scikit-learn also does not handle the edge case of the DCG being zero. Typically this happens for k=1 with a relevance target of 0 at index 0, which of course produces failing tests.

On the other hand, there is the current architecture of the tests: in the helpers.py script there are tests for the errors we test against. In the case of nDCG, _errors_test_functional_metric_parameters_default contains a check for binary values, so I added a new parameter definition which is mostly identical to the default parameters but without the check for binary targets.
I am not so fluent with pytest, so I don't know whether there is a more elegant solution for that.

I think the first problem is the more serious one, because the tests can actually fail randomly. With relevance targets of only -1 instead of -2 I had a lot of successful test runs, but still one or two failing at some point. Maybe the inputs need to be generated differently, or at least with a check that the targets for k=1 contain at least one relevant sample.
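One way to avoid those random failures would be to resample degenerate targets, so every generated batch contains at least one positive relevance (a sketch; the helper name is an assumption, not part of the test suite):

```python
import torch

def sample_targets_with_relevant_doc(size, low=-1, high=4):
    # Hypothetical helper: redraw until at least one target is positive,
    # so the ideal DCG cannot be zero even for k=1
    target = torch.randint(low=low, high=high, size=size)
    while not (target > 0).any():
        target = torch.randint(low=low, high=high, size=size)
    return target
```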

@mergify mergify bot added the ready label Jul 26, 2021
- removed unused imports in ndcg.py
Member

@Borda Borda left a comment


LGTM 🐰

@mergify mergify bot added ready and removed ready labels Jul 28, 2021
@pep8speaks

pep8speaks commented Jul 28, 2021

Hello @paul-grundmann! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-07-28 16:07:39 UTC

@SkafteNicki SkafteNicki enabled auto-merge (squash) July 28, 2021 12:59
@Borda Borda disabled auto-merge July 28, 2021 13:35
@Borda Borda enabled auto-merge (squash) July 28, 2021 13:35
@mergify mergify bot removed the ready label Jul 28, 2021
@Borda Borda merged commit b1062c9 into Lightning-AI:master Jul 28, 2021
@mergify mergify bot added the ready label Jul 28, 2021
@paul-grundmann paul-grundmann deleted the patch-1 branch July 29, 2021 07:16
@Borda Borda added this to the v0.5 milestone Aug 18, 2021
Successfully merging this pull request may close these issues: Functional nDCG can not be called with negative relevance targets (#377).