
Conversation

@zucchini-nlp (Member)

What does this PR do?

Fixes #29551. Strictly speaking, there was nothing to fix: as @gante already explained, the logits and scores for contrastive decoding are identical in that setting because no logits processors were applied.

This PR changes all in-place operations on scores to out-of-place ones, so that the logits and scores are actually different when logits processors are used. Most of this is carried over from the PR for compile compatibility, where we also had to get rid of in-place operations.
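
For illustration, a minimal sketch of the out-of-place pattern (the class name ExampleTemperatureWarper is made up for this example; the real processors live in transformers' logits_process module):

```python
import torch


class ExampleTemperatureWarper:
    """Hypothetical processor illustrating the out-of-place pattern."""

    def __init__(self, temperature: float):
        self.temperature = temperature

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # Out-of-place: the division allocates a new tensor, so the raw logits
        # the caller keeps a reference to are left untouched.
        scores_processed = scores / self.temperature
        return scores_processed
```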

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@gante

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


@gante gante left a comment


Looking good 👍

Missing: a test that confirms logits == scores when no processors are used in generate, and logits != scores otherwise.
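
A minimal sketch of what such a test could look like (the tiny checkpoint and the exact generate flags are assumptions for illustration, not the test that ended up in the PR):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "hf-internal-testing/tiny-random-gpt2"  # any small causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
inputs = tokenizer("Hello", return_tensors="pt")

# No logits processors: raw logits and scores should be identical.
out = model.generate(
    **inputs, max_new_tokens=3, do_sample=False,
    return_dict_in_generate=True, output_logits=True, output_scores=True,
)
assert all(torch.allclose(l, s) for l, s in zip(out.logits, out.scores))

# With warpers applied (e.g. temperature during sampling), they should differ.
out = model.generate(
    **inputs, max_new_tokens=3, do_sample=True, temperature=0.5,
    return_dict_in_generate=True, output_logits=True, output_scores=True,
)
assert any(not torch.allclose(l, s) for l, s in zip(out.logits, out.scores))
```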


- scores.scatter_(1, input_ids, score)
+ scores = scores.scatter(1, input_ids, score)
  return scores
Contributor

nit: we should do return scores_processed or return scores in ALL processors, for the sake of keeping a consistent pattern


@gante gante left a comment


The changes look good 👍

One big question though: no doctests should have changed in the process. Do you know what is causing the change?

@gante gante requested a review from amyeroberts March 20, 2024 11:11

@amyeroberts amyeroberts left a comment


Very nice 🔥 Thanks for working on this!

Quite a few of the changes don't seem to be necessary, but I'm assuming it's for consistency of having scores_processed returned.

Comment on lines +290 to +291
scores_processed = scores / self.temperature
return scores_processed
Contributor

I find it surprising that this causes an in-place modification

In [8]: import torch

In [9]: x = torch.Tensor([1, 2, 3, 4])

In [10]: x
Out[10]: tensor([1., 2., 3., 4.])

In [11]: id(x)
Out[11]: 4339929136

In [12]: x /= 2

In [13]: id(x)
Out[13]: 4339929136

In [14]: x = x / 5

In [15]: id(x)
Out[15]: 10943842160

Member Author

No, it was not causing any modifications on scores itself. That naming is for consistency only :)
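
For completeness, a small illustrative snippet (not from the PR) of the distinction: on tensors, the augmented assignment mutates the storage that the caller's raw logits still point to, while the plain binary op allocates a new tensor:

```python
import torch

raw_logits = torch.tensor([1.0, 2.0, 3.0])
scores = raw_logits  # the generation loop keeps a reference to the raw logits

scores /= 2          # in-place: mutates the shared storage
print(raw_logits)    # tensor([0.5000, 1.0000, 1.5000]) -- raw logits changed too

scores = scores / 5  # out-of-place: binds `scores` to a brand-new tensor
print(raw_logits)    # tensor([0.5000, 1.0000, 1.5000]) -- unchanged from here on
print(scores)        # tensor([0.1000, 0.2000, 0.3000])
```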

  indices_to_remove = sorted_indices_to_remove.scatter(1, sorted_indices, sorted_indices_to_remove)
- scores = scores.masked_fill(indices_to_remove, self.filter_value)
- return scores
+ scores_processed = scores.masked_fill(indices_to_remove, self.filter_value)
Contributor

Same for all the masked_fill calls here

Member Author

yep, naming consistency purposes again
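
For reference, a small illustrative snippet (not from the PR): masked_fill is already out-of-place, unlike its in-place masked_fill_ counterpart, so these hunks only rename the result:

```python
import torch

scores = torch.tensor([[0.1, 0.2, 0.3]])
mask = torch.tensor([[False, True, False]])

# Out-of-place: returns a new tensor; `scores` itself is unchanged.
scores_processed = scores.masked_fill(mask, -float("inf"))
print(scores)            # tensor([[0.1000, 0.2000, 0.3000]])
print(scores_processed)  # tensor([[0.1000, -inf, 0.3000]])

# The in-place variant would overwrite `scores` directly:
# scores.masked_fill_(mask, -float("inf"))
```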

Comment on lines +1430 to +1432
scores_processed = torch.full_like(scores, -math.inf)
scores_processed[:, self.bos_token_id] = 0
return scores_processed
Contributor

A lot nicer :)

@gante gante merged commit fadb053 into huggingface:main Mar 21, 2024


Development

Successfully merging this pull request may close these issues.

  • Contrastive decoding "raw" logits and scores are identical (#29551)
