

@ArthurZucker
Collaborator

What does this PR do?

The past argument has been completely replaced with past_key_values, so this PR should fix any problems with kwargs being swallowed for older models during generation.
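To illustrate the failure mode, here is a minimal, hypothetical sketch in plain Python (not the actual transformers code). It assumes a legacy model helper that still names its cache argument past while the generation loop passes past_key_values. The cache then lands in **kwargs and is silently dropped. Both function names are illustrative assumptions.

```python
from typing import Any, Dict, Optional


def old_prepare_inputs(input_ids: list, past: Optional[tuple] = None, **kwargs: Any) -> Dict[str, Any]:
    # Hypothetical legacy signature: the cache is expected under the name `past`.
    # A `past_key_values=...` kwarg is absorbed by **kwargs and never forwarded.
    return {"input_ids": input_ids, "past": past}


def new_prepare_inputs(input_ids: list, past_key_values: Optional[tuple] = None, **kwargs: Any) -> Dict[str, Any]:
    # Hypothetical updated signature: the parameter name matches what the caller passes.
    return {"input_ids": input_ids, "past_key_values": past_key_values}


# A sketched generation loop passes the cache under the new name.
cache = ("layer0_key", "layer0_value")
old = old_prepare_inputs([1, 2, 3], past_key_values=cache)
new = new_prepare_inputs([1, 2, 3], past_key_values=cache)

print(old["past"])             # None: the cache was swallowed by **kwargs
print(new["past_key_values"])  # ('layer0_key', 'layer0_value')
```

No error is raised in the legacy case, which is why this kind of bug is easy to miss: generation still runs, just without reusing the cache.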
Related to #20347

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Dec 30, 2022

The documentation is not available anymore as the PR was closed or merged.

@ArthurZucker ArthurZucker marked this pull request as ready for review January 4, 2023 10:17
@ArthurZucker ArthurZucker requested a review from sgugger January 4, 2023 10:19
Collaborator

@sgugger sgugger left a comment


Thanks for working on this! There seems to be an issue with the TensorFlow tests. I would also like to have @gante's opinion on this before merging.

Contributor

@gante gante left a comment


LGTM 👍

The change in the failing TF test is weird, though; we should try to understand it before merging this PR 🤔

Comment on lines 1478 to 1487
    @staticmethod
    def _reorder_cache(past, beam_idx):
        reordered_past = ()
        for layer_past in past:
            # cached cross_attention states don't have to be reordered -> they are always the same
            reordered_past += (
                tuple(tf.gather(past_state, beam_idx, axis=0) for past_state in layer_past[:2]) + layer_past[2:],
            )
        return reordered_past
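For readers unfamiliar with the helper under review, here is a minimal pure-Python sketch of what it does. It is an illustration only: nested lists stand in for tf.Tensor objects, and reorder_cache and gather_axis0 are hypothetical names. For each layer, the self-attention key/value states (the first two entries) are gathered along the batch-times-beam axis using beam_idx, while the cached cross-attention states are passed through unchanged.

```python
from typing import Sequence, Tuple

# Each "state" is a list indexed by batch*beam position (axis 0 in the TF code).
LayerPast = Tuple[list, ...]


def gather_axis0(state: list, beam_idx: Sequence[int]) -> list:
    # Pure-Python stand-in for tf.gather(state, beam_idx, axis=0).
    return [state[i] for i in beam_idx]


def reorder_cache(past: Sequence[LayerPast], beam_idx: Sequence[int]) -> Tuple[LayerPast, ...]:
    reordered_past = ()
    for layer_past in past:
        # Self-attention key/value (first two entries) follow the selected beams;
        # cached cross-attention states (the rest) are identical across beams.
        reordered_past += (
            tuple(gather_axis0(s, beam_idx) for s in layer_past[:2]) + tuple(layer_past[2:]),
        )
    return reordered_past


# One layer, three beams: self-attention key/value plus one cross-attention state.
past = [(["k0", "k1", "k2"], ["v0", "v1", "v2"], ["x0", "x1", "x2"])]
print(reorder_cache(past, beam_idx=[2, 2, 0]))
# ((['k2', 'k2', 'k0'], ['v2', 'v2', 'v0'], ['x0', 'x1', 'x2']),)
```

Beam 2 was selected twice and beam 1 was dropped, so the self-attention entries are duplicated and reordered accordingly, while the cross-attention entry is left as-is.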

Contributor


This is no longer needed on the TF side :D (it was used in the code path that existed before the XLA transition).

Collaborator Author


Sure. I thought the failing test might come from this, but I was wrong 😉

@ArthurZucker
Collaborator Author

OK, the tests were failing because I had not pulled from main, where tf_utils now uses the generate_config. LGTM; the remaining failing test seems to be unrelated.

@sgugger
Collaborator

sgugger commented Jan 6, 2023

Yes, good to merge for me!



4 participants