
Conversation

@LSinev
Contributor

@LSinev LSinev commented Nov 29, 2020

What does this PR do?

This PR proposes a new option, diverse_sequences, for cases where one wants genuinely different sequences to be generated (a conversational bot, for example). For greedy search, it starts the generated sequences from the top num_return_sequences tokens, used as the first token of each sequence. In sample mode, the num_return_sequences first tokens are drawn from a multinomial distribution.
The default diverse_sequences=False leaves generation behaving exactly as it did before this PR.
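
Roughly, the intended first-step behavior can be sketched as follows (illustration only, not the PR's diff; the helper name diversify_first_step and its arguments are assumptions mirroring what generate() tracks internally):

```python
import torch
import torch.nn.functional as F

def diversify_first_step(scores, input_ids, num_return_sequences, do_sample):
    # scores: (batch_size, vocab_size) logits for the first generation step
    if do_sample:
        # Draw num_return_sequences first tokens per prompt from a multinomial
        # distribution (without replacement, so they differ within each prompt)
        probs = F.softmax(scores, dim=-1)
        next_tokens = torch.multinomial(probs, num_samples=num_return_sequences)
    else:
        # Greedy: use the top num_return_sequences tokens as the first tokens
        next_tokens = torch.topk(scores, k=num_return_sequences, dim=-1).indices
    # Expand every prompt so each returned sequence continues from its own first token
    expanded = input_ids.repeat_interleave(num_return_sequences, dim=0)
    return torch.cat([expanded, next_tokens.reshape(-1, 1)], dim=-1)
```

In the PR itself the equivalent logic sits inside the generation loop (see the sample() excerpt discussed further down), not in a standalone helper.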

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

GPT2: @patrickvonplaten
Text Generation: @TevenLeScao
T5: @patrickvonplaten

patrickvonplaten and others added 5 commits November 28, 2020 19:50
* refactor

* further refactor

* fix the rest tomorrow

* save intermediate

* finish slow tokenizer

* make more tests pass

* finish refactor

* fix comment

* clean further

* fix name

* fix naming

* Update src/transformers/models/reformer/tokenization_reformer.py

* Apply suggestions from code review

* Apply suggestions from code review

* refactor

* fix init tokenizers

* refactor

* improve convert

* refactor

* correct convert slow tokenizer

* final fix for Pegasus Tok

* remove ipdb

* improve links
* Fix minor typos

* Additional typos

* Style fix

Co-authored-by: guyrosin <guyrosin@assist-561.cs.technion.ac.il>
* implement job skipping for doc-only PRs

* silent grep is crucial

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* let's add doc

* let's add code

* revert test commits

* restore

* Better name

* Better name

* Better name

* some more testing

* some more testing

* some more testing

* finish testing
probs = F.softmax(scores, dim=-1)
next_tokens = torch.multinomial(probs, num_samples=num_return_sequences).reshape(-1)
# Once we have next_tokens, we have to expand the per-sequence metadata to match
unfinished_sequences = unfinished_sequences.repeat_interleave(num_return_sequences, dim=0)

Contributor

Won't this quickly explode the number of current sequences?

Contributor Author

Its purpose is to explode the sequences. Instead of expanding them before calling sample and starting each sequence from scratch (where nothing stops the first token of one sequence from being equal to the first token of another), this ensures that the first tokens differ.

@patrickvonplaten
Contributor

Hey @LSinev,

thanks for your PR! Since the generate() refactor we are really trying not to add any use-case-specific code to the individual generate() methods anymore. Previously this quickly led to unmaintainable code (especially once lots of those if statements start piling up), so we can't merge the PR as it is in its current state. It would be ideal if we could just add a new LogitsPreprocessor class or a LogitsWarper class.

If this is not sufficient, we have to think a bit more about how to add this feature.
One thing I don't really understand is how greedy search with diverse_sequences=True is different from beam search with num_return_sequences > 1 -> it seems like the same thing to me... Also, could you add some links/pointers (paper, blog post, other codebase) that make use of this method?
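
For context, the extension point referred to here is the LogitsProcessor interface (the library exports the base class as LogitsProcessor). Below is a rough sketch of how first-token diversification could be expressed that way; the class name, constructor arguments, and masking strategy are my own assumptions, not the PR's implementation, and it presumes the input has already been expanded to batch_size * num_return_sequences rows of identical prompts:

```python
import torch
from transformers import LogitsProcessor

class DistinctFirstTokenLogitsProcessor(LogitsProcessor):
    """Hypothetical processor: at the first generation step, force the i-th
    copy of each prompt to start from the i-th highest-scoring token."""

    def __init__(self, prompt_length: int, num_return_sequences: int):
        self.prompt_length = prompt_length
        self.num_return_sequences = num_return_sequences

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        if input_ids.shape[-1] != self.prompt_length:
            return scores  # only act on the very first generated token
        n = self.num_return_sequences
        for start in range(0, scores.shape[0], n):
            group = scores[start : start + n]        # n copies of the same prompt
            top = torch.topk(group[0], k=n).indices  # n best candidate first tokens
            mask = torch.full_like(group, float("-inf"))
            mask[torch.arange(n), top] = group[torch.arange(n), top]
            scores[start : start + n] = mask
        return scores
```

Such a processor would then be appended to the LogitsProcessorList that generate() builds internally (or passed in explicitly, in versions of the library that expose a logits_processor argument).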

@LSinev
Contributor Author

LSinev commented Nov 30, 2020

> It would be ideal if we could just add a new LogitsPreprocessor class or a LogitsWarper class.

OK. I will check whether that is possible (though it may just move the if statements inside the class, since I have to detect the first-token step somehow).

> how greedy search with diverse_sequences=True is different from beam search with num_return_sequences > 1 -> it seems like the same thing to me...

Never thought about this. Will check.

> Also, could you add some links/pointers (paper, blog post, other codebase) that make use of this method?

Nothing openly available, as far as I know. Because of transformers' popularity, if such a possibility is not implemented, few developers will try these ideas. The main use case is additional ranking of the generated sequences: as it stands, nothing prevents the output from containing exactly the same sequences. It can also be combined with the probabilities of the final sequences from the second head of GPT2DoubleHeadsModel, for example (#5164).
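
A rough sketch of that ranking idea (illustration only: the multiple-choice head of the stock gpt2 checkpoint is untrained, so it would need fine-tuning before its scores mean anything, and the candidate strings here are placeholders):

```python
import torch
from transformers import GPT2Tokenizer, GPT2DoubleHeadsModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2DoubleHeadsModel.from_pretrained("gpt2").eval()

# Candidates would come from generate() with diverse first tokens
candidates = ["Hello! How can I help you today?", "Hi there, what do you need?"]
enc = tokenizer(candidates, return_tensors="pt", padding=True)
input_ids = enc["input_ids"].unsqueeze(0)                            # (1, num_choices, seq_len)
mc_token_ids = (enc["attention_mask"].sum(dim=-1) - 1).unsqueeze(0)  # last real token per choice

with torch.no_grad():
    out = model(input_ids=input_ids, mc_token_ids=mc_token_ids)

# Higher mc_logits -> preferred candidate according to the second head
best = candidates[out.mc_logits.squeeze(0).argmax().item()]
```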

@github-actions
Contributor

github-actions bot commented Mar 6, 2021

This issue has been automatically marked as stale and closed because it has not had recent activity. Thank you for your contributions.

If you think this still needs to be addressed please comment on this thread.

@github-actions github-actions bot closed this Mar 6, 2021
@LSinev LSinev deleted the feature/different-sequence branch May 1, 2021 23:24
