Fixed '--no-ignore_eos' not working correctly#1408
Closed
Jing1Ling wants to merge 1 commit into
I will close this PR because the issue has been fixed by the PR you mentioned.
What does this PR do?
Fixes # (issue)
Command used for testing:
Added the following code after line 480 of `run_generation.py` to observe the model's output.
Expected behavior: with `--no-ignore_eos`, once a sample generates an eos token, its subsequently generated tokens are overwritten with the pad token until all samples in the current batch have generated eos or hit another stopping condition.
Actual behavior: taking batch_size=2 as an example, when sample 0 and sample 1 each generate an eos token (note: at different steps), generation does not stop for the finished sample, and its subsequent tokens are not overwritten with pad tokens. Instead, generation continues until all samples generate eos at the same step or another stopping criterion fires. This causes nonsensical text to appear at the end of the output.
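The expected masking behavior can be sketched as follows. This is a minimal plain-Python illustration (not the actual `run_generation.py` code); the token ids, `EOS`/`PAD` values, and the `step_with_mask` helper are all hypothetical:

```python
# Illustrative sketch: once a sample emits EOS, its later tokens
# should be overwritten with PAD while other samples keep generating.
EOS, PAD = 2, 0  # hypothetical special-token ids

def step_with_mask(next_tokens, unfinished):
    # Overwrite tokens of already-finished samples with PAD,
    # then update the per-sample finished mask.
    out = [t if u else PAD for t, u in zip(next_tokens, unfinished)]
    unfinished = [u and t != EOS for t, u in zip(out, unfinished)]
    return out, unfinished

unfinished = [True, True]  # batch_size = 2
history = []
for step_tokens in [[5, 7], [EOS, 9], [6, EOS], [4, 8]]:
    out, unfinished = step_with_mask(step_tokens, unfinished)
    history.append(out)
print(history)  # [[5, 7], [2, 9], [0, 2], [0, 0]]
```

Note that after sample 0 emits eos at step 2, its remaining positions are padded even though sample 1 is still generating; this is the behavior the PR restores.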
Code Details
As shown here,
`unfinished_sequences` tracks whether each sample has finished generating. If the sample with `idx == x` has not finished, `unfinished_sequences[x]` is True; once it has finished, its subsequently generated tokens are overwritten with the pad token. This part works correctly. However, because the eos-related stopping criteria return a single bool value [see here], `unfinished_sequences` cannot be updated correctly [update code]. I changed the return value to a tensor, and the final output matched the expected behavior.
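The difference between the two return types can be modeled in a few lines. This is a hypothetical plain-Python reduction of the update logic (the real code operates on torch tensors); `scalar_criterion` and `tensor_criterion` are illustrative names, not the library's API:

```python
# Sketch: why a scalar stopping-criterion result cannot update a
# per-sample unfinished mask, while a per-sample result can.
EOS = 2  # hypothetical eos token id

def scalar_criterion(last_tokens):
    # Old behavior: one bool for the whole batch
    return all(t == EOS for t in last_tokens)

def tensor_criterion(last_tokens):
    # Changed behavior: one flag per sample
    return [t == EOS for t in last_tokens]

last_tokens = [EOS, 9]   # sample 0 finished, sample 1 still going
unfinished = [True, True]

# Scalar result: nothing flips, since not *all* samples emitted eos.
done = scalar_criterion(last_tokens)
print([u and not done for u in unfinished])          # [True, True]

# Per-sample result: sample 0 is correctly marked finished.
done_each = tensor_criterion(last_tokens)
print([u and not d for u, d in zip(unfinished, done_each)])  # [False, True]
```

With the scalar form, a finished sample is only recognized when every sample in the batch happens to emit eos at the same step, which matches the buggy behavior described above.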
Question
Comments in the code (comment1, comment2) indicate that a boolean value is returned in the static-shape case. Why is it necessary to return a single value under static shapes?
In addition, could we change the default value of `ignore_eos` to `False`, and set it to `True` only when benchmarking performance? In normal use it is more intuitive to stop after generating eos, and doing so also avoids unnecessary computation.
Before submitting