
Conversation

@ydshieh (Collaborator) commented Dec 20, 2022

What does this PR do?

Fixes TF generation (especially the TFMarian generation issue reported in #18149).

Fix #18149
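
For reference, the affected call path looks roughly like the sketch below (the exact snippet lives in issue #18149; the checkpoint and inputs here are assumptions for illustration):

```python
# Hypothetical repro of the slow TFMarian generation path; the real
# snippet is in issue #18149. Checkpoint and inputs are assumptions.
from transformers import MarianTokenizer, TFMarianMTModel

model_name = "Helsinki-NLP/opus-mt-en-ROMANCE"  # assumed checkpoint
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = TFMarianMTModel.from_pretrained(model_name)

# Multilingual Marian checkpoints expect a target-language prefix token.
batch = tokenizer([">>fr<< Hello, how are you?"], return_tensors="tf", padding=True)
generated = model.generate(**batch)  # very slow on main before this fix
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```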

@HuggingFaceDocBuilderDev commented Dec 20, 2022

The documentation is not available anymore as the PR was closed or merged.

@ydshieh (Collaborator, Author) left a comment

Leaving a few comments to help the review.


```diff
  # 2. can the new beams still improve?
- best_running_score = running_scores[:, :1] / (max_length**length_penalty)
+ best_running_score = running_scores[:, :1] / tf.cast(cur_len, dtype=running_scores.dtype) ** length_penalty
```
@ydshieh (Collaborator, Author) commented Dec 21, 2022

In the current main branch, max_length is used instead of cur_len. However, our PyTorch generation's BeamHypotheses uses cur_len; see

cur_score = best_sum_logprobs / cur_len**self.length_penalty

When running the code snippet from the reported TFMarian issue (#18149), max_length is a constant 512, while the PyTorch generation code uses cur_len, which ranges from 1 (or 2) to 5.

(However, this is not the root cause of the issue in #18149)
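
To make the gap concrete, a small sketch with made-up numbers (not library code):

```python
# Illustrative numbers only: how the two normalizations diverge.
best_sum_logprobs = -6.0   # beam scores are sums of negative log-probs
length_penalty = 1.0
max_length = 512           # constant, as observed in the TF run
cur_len = 5                # what the PyTorch BeamHypotheses uses

tf_style = best_sum_logprobs / max_length**length_penalty   # ~ -0.0117
pt_style = best_sum_logprobs / cur_len**length_penalty      # -1.2
# Dividing by max_length squashes every score toward 0, so the
# "can a running beam still improve?" bound behaves very differently.
```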

Comment on lines +3076 to +3082
```diff
- still_open_beam = ~(tf.math.reduce_all(is_sent_finished) & early_stopping)
+ # still_open_beam = ~(tf.math.reduce_all(is_sent_finished) & early_stopping)
+ still_open_beam = ~(tf.math.reduce_all(is_sent_finished))

- return not_max_length_yet & (still_open_beam | improvement_still_possible)
+ _early_stopping = tf.constant(early_stopping > 0, dtype=tf.bool)
+ # return not_max_length_yet & (still_open_beam | improvement_still_possible)
+ return not_max_length_yet & (still_open_beam | (~_early_stopping & improvement_still_possible))
```
@ydshieh (Collaborator, Author) commented Dec 21, 2022

The method beam_search_cond_fn corresponds to BeamHypotheses.is_done in our PyTorch generation code (though the meaning is reversed: generation done vs. not done).

The above suggests:

  • The main cause of the extremely slow TFMarian generation is the condition around early_stopping.
  • With the changes in this PR, it generates quickly, just like the PyTorch Marian.

I ran the slow tests for bert, gpt2, bart, t5. One test needs to be fixed: tests/models/bart/test_modeling_tf_bart.py::TFBartModelTest::test_xla_generate_slow
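
For reference, a standalone paraphrase of the fixed condition (names mirror the PR's; this is a sketch, not the merged implementation):

```python
import tensorflow as tf

def beam_search_should_continue(not_max_length_yet, is_sent_finished,
                                improvement_still_possible, early_stopping):
    # Some beam has not emitted EOS yet.
    still_open_beam = ~tf.math.reduce_all(is_sent_finished)
    _early_stopping = tf.constant(early_stopping > 0, dtype=tf.bool)
    # With early stopping, halt as soon as every beam is finished;
    # otherwise keep going while a running beam could still improve.
    return not_max_length_yet & (still_open_beam | (~_early_stopping & improvement_still_possible))

# Example: one beam still running, so search continues even under early stopping.
cond = beam_search_should_continue(
    not_max_length_yet=tf.constant(True),
    is_sent_finished=tf.constant([True, False]),
    improvement_still_possible=tf.constant(False),
    early_stopping=True,
)  # -> tf.Tensor(True)
```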

@ydshieh (Collaborator, Author) commented Dec 21, 2022

However, one thing I don't understand very well is:

this check, len(self) < self.num_beams, in BeamHypotheses.is_done:

if len(self) < self.num_beams:

vs.

tf.math.reduce_all(is_sent_finished) and/or not_max_length_yet in beam_search_cond_fn.

These don't seem to be 100% equivalent conditions. (But I didn't really dig into the details of this part.)
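
For reference, a rough paraphrase of the PyTorch-side check being compared; only the two quoted lines above are from the thread, the rest is my reconstruction:

```python
# Assumed shape of BeamHypotheses.is_done; only the quoted lines are
# verbatim from the discussion, the rest is a reconstruction.
class BeamHypotheses:
    def __init__(self, num_beams, length_penalty, early_stopping):
        self.num_beams = num_beams
        self.length_penalty = length_penalty
        self.early_stopping = early_stopping
        self.beams = []            # finished (score, tokens) pairs
        self.worst_score = 1e9     # lowest score among kept hypotheses

    def __len__(self):
        return len(self.beams)

    def is_done(self, best_sum_logprobs, cur_len):
        if len(self) < self.num_beams:
            # Fewer finished hypotheses than beams: cannot be done yet.
            # (This guard has no one-to-one counterpart in beam_search_cond_fn.)
            return False
        if self.early_stopping:
            return True
        cur_score = best_sum_logprobs / cur_len**self.length_penalty
        # Done once even the best running beam cannot beat the worst kept one.
        return self.worst_score >= cur_score
```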

@ydshieh ydshieh requested a review from gante December 21, 2022 07:35
@ydshieh ydshieh marked this pull request as ready for review December 21, 2022 07:36

```diff
  # 3. is there still a beam that has not finished?
- still_open_beam = ~(tf.math.reduce_all(is_sent_finished) & early_stopping)
+ # still_open_beam = ~(tf.math.reduce_all(is_sent_finished) & early_stopping)
```
@ydshieh (Collaborator, Author) commented

This commented-out line should be removed before merge.

```diff
- return not_max_length_yet & (still_open_beam | improvement_still_possible)
+ _early_stopping = tf.constant(early_stopping > 0, dtype=tf.bool)
+ # return not_max_length_yet & (still_open_beam | improvement_still_possible)
```
@ydshieh (Collaborator, Author) commented

This line should also be removed before merge.

@gante (Contributor) commented Dec 26, 2022

Hey @ydshieh 👋

Thank you for opening this PR; it made me realize a detail that is wrong in both frameworks 👀

We know that logprobs is a negative value, and we want to maximize it in beam search (i.e. make it as close to 0 as possible). Since logprobs is always negative, and the final score is the sum of the logprobs, we can anticipate the best possible score and use it to end beam search with no drawback. Well, it turns out that the method to compute the best possible score depends on length_penalty, and we are not accounting for that!

  • Scenario 1, length_penalty > 0.0: as the sentence grows, the denominator grows as well. This means the score can get closer to 0 (i.e. higher) as the sentence grows, so longer sentences are promoted. In this case, the best possible score can be determined from the maximum sequence length (the TF implementation).
  • Scenario 2, length_penalty < 0.0: as the sentence grows, the denominator gets smaller. This means the score gets farther away from 0 (i.e. lower) as the sentence grows, so shorter sentences are promoted. In this case, the best possible score can be determined from the current sequence length (the PT implementation). See the sketch after this list.
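
A minimal sketch of the corrected bound, with a hypothetical helper name:

```python
# Sketch of the corrected best-possible-score bound (hypothetical helper).
# sum_logprobs is negative; score = sum_logprobs / length**length_penalty.
def best_possible_running_score(sum_logprobs, cur_len, max_length, length_penalty):
    if length_penalty > 0.0:
        # Longer sequences score higher: the bound assumes growth to max_length.
        length = max_length
    else:
        # Longer sequences score lower (or equal): cur_len is the best case.
        length = cur_len
    return sum_logprobs / length**length_penalty
```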

On top of this incomplete best score computation on both ends, your PR made me realize that the stopping condition for TF also had a problem (after factoring in the correct length penalty computation, a few tests failed).

I'm opening a PR to compare against this one with what I think is the correct solution to this bug 🐛

@ydshieh (Collaborator, Author) commented Jan 3, 2023

Closing in favor of #20901.


Successfully merging this pull request may close these issues.

Inference for TFMarianMTModel (en to Romance language translation) is slow and inaccurate (#18149)
