
Conversation

@LSinev
Contributor

@LSinev LSinev commented Nov 29, 2020

What does this PR do?

This PR proposes a new option, diverse_sequences, for cases where one wants genuinely different sequences to be generated (a conversational bot, for example). For greedy search, it starts the generated sequences from the top num_return_sequences tokens, used as the first token of each sequence. In sample mode, the num_return_sequences first tokens are drawn from a multinomial distribution.
The default diverse_sequences=False leaves generation behaving exactly as it did before this PR.
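
Roughly, the intended first-step behavior can be sketched as follows (illustration only, not the PR's diff; the helper name diversify_first_step and its arguments are assumptions mirroring what generate() tracks internally):

```python
import torch
import torch.nn.functional as F

def diversify_first_step(scores, input_ids, num_return_sequences, do_sample):
    # scores: (batch_size, vocab_size) logits for the first generation step
    if do_sample:
        # Draw num_return_sequences first tokens per prompt from a multinomial
        # distribution (without replacement, so they differ within each prompt)
        probs = F.softmax(scores, dim=-1)
        next_tokens = torch.multinomial(probs, num_samples=num_return_sequences)
    else:
        # Greedy: use the top num_return_sequences tokens as the first tokens
        next_tokens = torch.topk(scores, k=num_return_sequences, dim=-1).indices
    # Expand every prompt so each returned sequence continues from its own first token
    expanded = input_ids.repeat_interleave(num_return_sequences, dim=0)
    return torch.cat([expanded, next_tokens.reshape(-1, 1)], dim=-1)
```

In the PR itself the equivalent logic sits inside the generation loop (see the sample() excerpt discussed further down), not in a standalone helper.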

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

GPT2: @patrickvonplaten
Text Generation: @TevenLeScao
T5: @patrickvonplaten

patrickvonplaten and others added 5 commits November 28, 2020 19:50
* refactor

* further refactor

* fix the rest tomorrow

* save intermediate

* finish slow tokenizer

* make more tests pass

* finish refactor

* fix comment

* clean further

* fix name

* fix naming

* Update src/transformers/models/reformer/tokenization_reformer.py

* Apply suggestions from code review

* Apply suggestions from code review

* refactor

* fix init tokenizers

* refactor

* improve convert

* refactor

* correct convert slow tokenizer

* final fix for Pegasus Tok

* remove ipdb

* improve links
* Fix minor typos

* Additional typos

* Style fix

Co-authored-by: guyrosin <guyrosin@assist-561.cs.technion.ac.il>
* implement job skipping for doc-only PRs

* silent grep is crucial

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* let's add doc

* let's add code

* revert test commits

* restore

* Better name

* Better name

* Better name

* some more testing

* some more testing

* some more testing

* finish testing
probs = F.softmax(scores, dim=-1)
next_tokens = torch.multinomial(probs, num_samples=num_return_sequences).reshape(-1)
# Once we have next_tokens, we have to expand the per-sequence metadata to match
unfinished_sequences = unfinished_sequences.repeat_interleave(num_return_sequences, dim=0)

Contributor

Won't this quickly explode the number of current sequences?

Contributor Author

Its purpose is to explode the sequences. Instead of expanding them before calling sample and starting each sequence from scratch (where nothing stops the first token of one sequence from being equal to the first token of another), this ensures that the first tokens differ.

@patrickvonplaten
Contributor

Hey @LSinev,

thanks for your PR! Since the generate() refactor we are really trying not to add any use-case-specific code to the individual generate() methods anymore. Previously this quickly led to unmaintainable code (especially once lots of those if statements start piling up), so we can't merge the PR as it is in its current state. It would be ideal if we could just add a new LogitsPreprocessor class or a LogitsWarper class.

If this is not sufficient, we have to think a bit more about how to add this feature.
One thing I don't really understand is how greedy search with diverse_sequences=True is different from beam search with num_return_sequences > 1 -> it seems like the same thing to me... Also, could you add some links/pointers (paper, blog post, other codebase) that make use of this method?
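
For context, the extension point referred to here is the LogitsProcessor interface (the library exports the base class as LogitsProcessor). Below is a rough sketch of how first-token diversification could be expressed that way; the class name, constructor arguments, and masking strategy are my own assumptions, not the PR's implementation, and it presumes the input has already been expanded to batch_size * num_return_sequences rows of identical prompts:

```python
import torch
from transformers import LogitsProcessor

class DistinctFirstTokenLogitsProcessor(LogitsProcessor):
    """Hypothetical processor: at the first generation step, force the i-th
    copy of each prompt to start from the i-th highest-scoring token."""

    def __init__(self, prompt_length: int, num_return_sequences: int):
        self.prompt_length = prompt_length
        self.num_return_sequences = num_return_sequences

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        if input_ids.shape[-1] != self.prompt_length:
            return scores  # only act on the very first generated token
        n = self.num_return_sequences
        for start in range(0, scores.shape[0], n):
            group = scores[start : start + n]        # n copies of the same prompt
            top = torch.topk(group[0], k=n).indices  # n best candidate first tokens
            mask = torch.full_like(group, float("-inf"))
            mask[torch.arange(n), top] = group[torch.arange(n), top]
            scores[start : start + n] = mask
        return scores
```

Such a processor would then be appended to the LogitsProcessorList that generate() builds internally (or passed in explicitly, in versions of the library that expose a logits_processor argument).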

@LSinev
Contributor Author

LSinev commented Nov 30, 2020

> It would be ideal if we could just add a new LogitsPreprocessor class or a LogitsWarper class.

OK. I will check whether that is possible (though it may just move the if statements inside the class, since I have to detect the first-token step somehow).

> how greedy search with diverse_sequences=True is different from beam search with num_return_sequences > 1 -> it seems like the same thing to me...

Never thought about this. Will check.

> Also, could you add some links/pointers (paper, blog post, other codebase) that make use of this method?

Nothing openly available, as far as I know. Because of transformers' popularity, if such a possibility is not implemented, few developers will try these ideas. The main use case is additional ranking of the generated sequences: as it stands, nothing prevents the output from containing exactly the same sequences. It can also be combined with the probabilities of the final sequences from the second head of GPT2DoubleHeadsModel, for example (#5164).
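
A rough sketch of that ranking idea (illustration only: the multiple-choice head of the stock gpt2 checkpoint is untrained, so it would need fine-tuning before its scores mean anything, and the candidate strings here are placeholders):

```python
import torch
from transformers import GPT2Tokenizer, GPT2DoubleHeadsModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2DoubleHeadsModel.from_pretrained("gpt2").eval()

# Candidates would come from generate() with diverse first tokens
candidates = ["Hello! How can I help you today?", "Hi there, what do you need?"]
enc = tokenizer(candidates, return_tensors="pt", padding=True)
input_ids = enc["input_ids"].unsqueeze(0)                            # (1, num_choices, seq_len)
mc_token_ids = (enc["attention_mask"].sum(dim=-1) - 1).unsqueeze(0)  # last real token per choice

with torch.no_grad():
    out = model(input_ids=input_ids, mc_token_ids=mc_token_ids)

# Higher mc_logits -> preferred candidate according to the second head
best = candidates[out.mc_logits.squeeze(0).argmax().item()]
```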

@github-actions
Contributor

github-actions bot commented Mar 6, 2021

This issue has been automatically marked as stale and closed because it has not had recent activity. Thank you for your contributions.

If you think this still needs to be addressed please comment on this thread.

@github-actions github-actions bot closed this Mar 6, 2021
@LSinev LSinev deleted the feature/different-sequence branch May 1, 2021 23:24
