Fix TF generation (especially for TFMarian) #20853
Conversation
Leaving a few comments to help the review.
```diff
  # 2. can the new beams still improve?
- best_running_score = running_scores[:, :1] / (max_length**length_penalty)
+ best_running_score = running_scores[:, :1] / tf.cast(cur_len, dtype=running_scores.dtype) ** length_penalty
```
In the current `main` branch, `max_length` is used instead of `cur_len`. However, our PyTorch generation's `BeamHypotheses` uses `cur_len`, see:

```python
cur_score = best_sum_logprobs / cur_len**self.length_penalty
```

When running the code snippet in the reported TFMarian issue (#18149), `max_length` is a constant of 512, whereas the PyTorch generation code runs with `cur_len`, which goes from 1 (or 2) to 5.

(However, this is not the root cause of the issue in #18149.)
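To make the difference concrete, here is a minimal standalone sketch (the log-probability values and lengths are hypothetical, chosen to match the scale in #18149) of how normalizing by a constant `max_length` instead of `cur_len` distorts the score:

```python
# Hypothetical running beam: sum of log-probs after cur_len = 5 generated tokens.
sum_logprobs = -4.0
length_penalty = 1.0
cur_len, max_length = 5, 512

score_with_cur_len = sum_logprobs / cur_len**length_penalty        # -0.8
score_with_max_length = sum_logprobs / max_length**length_penalty  # ~ -0.0078

# Dividing by the constant max_length (512 in #18149) pushes every running
# score close to 0, so the "can the new beams still improve?" bound stays
# over-optimistic and beam search keeps running for far too long.
print(score_with_cur_len, score_with_max_length)
```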
```diff
- still_open_beam = ~(tf.math.reduce_all(is_sent_finished) & early_stopping)
+ # still_open_beam = ~(tf.math.reduce_all(is_sent_finished) & early_stopping)
+ still_open_beam = ~(tf.math.reduce_all(is_sent_finished))

- return not_max_length_yet & (still_open_beam | improvement_still_possible)
+ _early_stopping = tf.constant(early_stopping > 0, dtype=tf.bool)
+ # return not_max_length_yet & (still_open_beam | improvement_still_possible)
+ return not_max_length_yet & (still_open_beam | (~_early_stopping & improvement_still_possible))
```
The method `beam_search_cond_fn` corresponds to `BeamHypotheses.is_done` in our PyTorch generation code (although the meaning is reversed: generation done vs. not done).
In the current `main` branch, the logic here is:

```python
still_open_beam = ~(tf.math.reduce_all(is_sent_finished) & early_stopping)
return not_max_length_yet & (still_open_beam | improvement_still_possible)
```
When `early_stopping` is `False`, `still_open_beam` will be `True` and the return value becomes `True` (if `not_max_length_yet` is `True`), i.e. it should continue the generation.

However, in `BeamHypotheses.is_done`, if `early_stopping` is `False` (and if `len(self) >= self.num_beams`), it will compare the scores:
From transformers/src/transformers/generation/beam_search.py, lines 895 to 897 at 3be028b:

```python
cur_score = best_sum_logprobs / cur_len**self.length_penalty
ret = self.worst_score >= cur_score
return ret
```
In the code snippet from "Inference for TFMarianMTModel (en to Romance language translation) is slow and inaccurate" (#18149), it returns `True` for `is_done` after 5 or 6 generation steps, i.e. it should **NOT** continue the generation.
The above suggests (see the sanity-check sketch below):
- The main issue in `TFMarian`'s super slow generation comes from the condition around `early_stopping`.
- With the changes in this PR, it can generate quickly, just as the PyTorch `Marian` does.
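As a sanity check, here is a standalone sketch (not the library code; the variable names mirror `beam_search_cond_fn`, and the tensor values are made up) showing how the old and new conditions disagree when `early_stopping=False`, all beams are finished, and no score improvement is possible:

```python
import tensorflow as tf

not_max_length_yet = tf.constant(True)
is_sent_finished = tf.constant([[True, True]])   # every beam has hit EOS
early_stopping = tf.constant(False)
improvement_still_possible = tf.constant(False)  # best possible running score <= worst finished score

# Old condition: with early_stopping=False, still_open_beam is always True,
# so the loop continues until max_length even though nothing can improve.
still_open_beam_old = ~(tf.math.reduce_all(is_sent_finished) & early_stopping)
old_continue = not_max_length_yet & (still_open_beam_old | improvement_still_possible)

# New condition: the score-based check is only skipped when early_stopping=True.
still_open_beam_new = ~tf.math.reduce_all(is_sent_finished)
new_continue = not_max_length_yet & (still_open_beam_new | (~early_stopping & improvement_still_possible))

print(old_continue.numpy(), new_continue.numpy())  # True False
```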
I ran the slow tests for bert, gpt2, bart, t5. One test needs to be fixed: `tests/models/bart/test_modeling_tf_bart.py::TFBartModelTest::test_xla_generate_slow`.
However, one thing I don't understand very well is this part in `BeamHypotheses.is_done`:

```python
if len(self) < self.num_beams:
```

vs. `tf.math.reduce_all(is_sent_finished)` and/or `not_max_length_yet` in `beam_search_cond_fn`. These don't seem to be 100% equivalent conditions. (But I didn't really go into the details around this part.)
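For reference, a condensed sketch of the PyTorch side, consistent with the `beam_search.py` lines quoted above (not a verbatim copy of the method):

```python
def is_done(self, best_sum_logprobs: float, cur_len: int) -> bool:
    # Guard: until num_beams finished hypotheses have been collected, we are
    # never done. The closest TF analogue is tf.math.reduce_all(is_sent_finished),
    # but that tracks finished *beams* in the batch, not collected *hypotheses*,
    # which is why the two conditions are not obviously equivalent.
    if len(self) < self.num_beams:
        return False
    if self.early_stopping:
        return True
    cur_score = best_sum_logprobs / cur_len**self.length_penalty
    return self.worst_score >= cur_score
```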
```diff
  # 3. is there still a beam that has not finished?
- still_open_beam = ~(tf.math.reduce_all(is_sent_finished) & early_stopping)
+ # still_open_beam = ~(tf.math.reduce_all(is_sent_finished) & early_stopping)
```
This line should be removed before merge
```diff
- return not_max_length_yet & (still_open_beam | improvement_still_possible)
+ _early_stopping = tf.constant(early_stopping > 0, dtype=tf.bool)
+ # return not_max_length_yet & (still_open_beam | improvement_still_possible)
```
This line should also be removed before merge.
Hey @ydshieh 👋 Thank you for opening this PR, it made me realize a detail that is wrong in both frameworks 👀 We know that `logprobs` is a negative value, and we want to maximize it in beam search (i.e. make it as close to 0 as possible). Since `logprobs` is always negative, and the final score is the sum of the logprobs, we can anticipate the best possible score and use it to end beam search early with no drawback. Well, it turns out that the method to compute the best possible score depends on the sign of `length_penalty`.

On top of this incomplete best-score computation on both ends, your PR made me realize that the stopping condition for TF also had a problem (after factoring in the correct length penalty computation, a few tests failed). I'm opening a PR to compare against this one with what I think is the correct solution to this bug 🐛
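A sketch of that dependence (my reading of the argument above; the helper name and the numbers are illustrative, and the actual fix lives in #20901):

```python
def best_possible_score(sum_logprobs: float, cur_len: int, max_length: int, length_penalty: float) -> float:
    """Upper bound on a running beam's final score, with score = sum_logprobs / length**length_penalty.

    sum_logprobs is <= 0 and only decreases as tokens are appended, so the most
    optimistic normalization length depends on the sign of length_penalty:
    - length_penalty > 0: dividing a negative number by a larger value raises it,
      so max_length is the most favorable length.
    - length_penalty <= 0: the current length is the most favorable one.
    """
    length = max_length if length_penalty > 0.0 else cur_len
    return sum_logprobs / length**length_penalty

# With a positive penalty, normalizing by max_length gives the higher (better) bound.
assert best_possible_score(-4.0, 5, 512, 1.0) > -4.0 / 5**1.0
```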
Closing in favor of #20901.
What does this PR do?
Fix TF generation (especially for the `TFMarian` generation issue in #18149).

Fixes #18149