
Conversation

@jsrozner
Contributor

What does this PR do?

While testing various generation configurations, I was confused when beam_sample's outputs (i.e. num_beams > 1, do_sample=True) did not change across different settings of top_k, top_p, and temperature. I then realized that in my call to generate(), do_sample was still set to False, so the top_k, top_p, and temperature values I was cycling through had no effect.

Generate (and its helper methods) already contains some parameter compatibility checks. This PR adds a few more (a condensed sketch follows the list):

  • num_beams must be >= 1
  • when do_sample is False, warn the user if any of top_p, top_k, or temperature is not None (since they will have no effect)
  • when num_beams is 1 (no beam search), warn the user if either early_stopping or length_penalty is not None (since they will have no effect)
  • if an invalid combination of num_beams and do_sample is passed, raise a ValueError. Since we also add the check for num_beams < 1, this final ValueError can never actually be raised, but it prevents generate() from falling off the end of the if/else chain if the code above is ever altered.
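
A condensed sketch of the checks described above (illustrative only: the helper name check_generation_params is made up, and in the PR the checks live inline in generate() with slightly different messages):

```python
import logging

logger = logging.getLogger(__name__)


def check_generation_params(
    num_beams, do_sample, top_k=None, top_p=None, temperature=None, early_stopping=None, length_penalty=None
):
    if num_beams < 1:
        raise ValueError("`num_beams` must be >= 1.")
    if not do_sample and any(v is not None for v in (top_k, top_p, temperature)):
        # sampling-only parameters have no effect unless do_sample=True
        logger.warning("`do_sample` is False, so `top_k`, `top_p` and `temperature` will be ignored.")
    if num_beams == 1 and any(v is not None for v in (early_stopping, length_penalty)):
        # beam-search-only parameters have no effect when num_beams == 1
        logger.warning("`num_beams` is 1, so `early_stopping` and `length_penalty` will be ignored.")
```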

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
    No doc changes necessary
  • Did you write any new necessary tests?
    Ran existing tests; no new tests were needed since no new functionality was added, just warning messages.

Who can review?

Please tag fewer than 3 people.
Text Generation: @patrickvonplaten

patrickvonplaten and others added 20 commits November 28, 2020 19:50
* refactor

* further refactor

* fix the rest tomorrow

* save intermediate

* finish slow tokenizer

* make more tests pass

* finish refactor

* fix comment

* clean further

* fix name

* fix naming

* Update src/transformers/models/reformer/tokenization_reformer.py

* Apply suggestions from code review

* Apply suggestions from code review

* refactor

* fix init tokenizers

* refactor

* improve convert

* refactor

* correct convert slow tokenizer

* final fix for Pegasus Tok

* remove ipdb

* improve links
* Fix minor typos

* Additional typos

* Style fix

Co-authored-by: guyrosin <[email protected]>
* implement job skipping for doc-only PRs

* silent grep is crucial

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* let's add doc

* let's add code

* revert test commits

* restore

* Better name

* Better name

* Better name

* some more testing

* some more testing

* some more testing

* finish testing
* Migration guide from v3.x to v4.x

* Better wording

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <[email protected]>

* Sylvain's comments

* Better wording.

Co-authored-by: Sylvain Gugger <[email protected]>
* Add T5 Encoder class for feature extraction

* fix T5 encoder add_start_docstrings indent

* update init with T5 encoder

* update init with TFT5ModelEncoder

* remove TFT5ModelEncoder

* change T5ModelEncoder order in init

* add T5ModelEncoder to transformers init

* clean T5ModelEncoder

* update init with TFT5ModelEncoder

* add TFModelEncoder for Tensorflow

* update init with TFT5ModelEncoder

* Update src/transformers/models/t5/modeling_t5.py

change output from Seq2SeqModelOutput to BaseModelOutput

Co-authored-by: Patrick von Platen <[email protected]>

* remove encoder_outputs

1. remove encoder_outputs from the function call.
2. remove the encoder_outputs If statement.
3. remove isinstance from return_dict.

* Authorize missing decoder keys

* remove unnecessary input parameters

remove past_key_values and use_cache

* remove use_cache

remove use_cache from the forward method

* add docstring for T5 encoder

add docstring for T5 encoder with T5_ENCODER_INPUTS_DOCSTRING

* change return_dict to dot access

* add T5_ENCODER_INPUTS_DOCSTRING for TF T5

* change TFT5Encoder output type to BaseModelOutput

* remove unnecessary parameters for TFT5Encoder

* remove unnecessary if statement

* add import BaseModelOutput

* fix BaseModelOutput typo to TFBaseModelOutput

* update T5 doc with T5ModelEncoder

* add T5ModelEncoder to tests

* finish pytorch

* finish docs and mt5

* add mtf to init

* fix init

* remove n_positions

* finish PR

* Update src/transformers/models/mt5/modeling_mt5.py

Co-authored-by: Lysandre Debut <[email protected]>

* Update src/transformers/models/t5/modeling_t5.py

Co-authored-by: Lysandre Debut <[email protected]>

* Update src/transformers/models/t5/modeling_tf_t5.py

Co-authored-by: Lysandre Debut <[email protected]>

* Update src/transformers/models/mt5/modeling_tf_mt5.py

Co-authored-by: Lysandre Debut <[email protected]>

* make style

Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Lysandre Debut <[email protected]>
* Use model.from_pretrained for DataParallel also

When training on multiple GPUs, the code wraps a model with torch.nn.DataParallel. However if the model has custom from_pretrained logic, it does not get applied during load_best_model_at_end.

This commit uses the underlying model during load_best_model_at_end, and re-wraps the loaded model with DataParallel.

If you choose to reject this change, then could you please move this logic to a function, e.g. def load_best_model_checkpoint(best_model_checkpoint) or something, so that it can be overridden?

* Fix silly bug

* Address review comments

Thanks for the feedback. I made the change that you proposed, but I also think we should update L811 to check whether `self.model` is an instance of `PreTrainedModel`, otherwise we would still not get into that `if` section, right?
* Remove deprecated `evalutate_during_training`

* Update src/transformers/training_args_tf.py

Co-authored-by: Lysandre Debut <[email protected]>

Co-authored-by: Lysandre Debut <[email protected]>
* Slightly increase tolerance between pytorch and flax output

Signed-off-by: Morgan Funtowicz <[email protected]>

* test_multiple_sentences doesn't require torch

Signed-off-by: Morgan Funtowicz <[email protected]>

* Simplify parameterization on "jit" to use boolean rather than str

Signed-off-by: Morgan Funtowicz <[email protected]>

* Use `require_torch` on `test_multiple_sentences` because we pull the weight from the hub.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Rename "jit" parameter to "use_jit" for (hopefully) making it self-documenting.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Remove pytest.mark.parametrize which seems to fail in some circumstances

Signed-off-by: Morgan Funtowicz <[email protected]>

* Fix unused imports.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Fix style.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Give default parameters values for traced model.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Review comment: Change sentences to sequences

Signed-off-by: Morgan Funtowicz <[email protected]>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <[email protected]>
…ngface#8781)

* NerPipeline (TokenClassification) now outputs offsets of words

- Currently the offsets are missing, which forces the user to pattern-match
the "word" from their input, which is not always feasible.
For instance, if a sentence contains the same word twice, there
is no way to know which occurrence is which.
- This PR proposes to fix that by outputting two new keys in this
pipeline's outputs, "start" and "end", which correspond to the string
offsets of the word. That means we should always have the
invariant:

```python
input[entity["start"]: entity["end"]] == entity["entity_group"]
                                    # or entity["entity"] if not grouped
```

* Fixing doc style
* fix DP case on multi-gpu

* make executable

* test all 3 modes

* use the correct check for distributed

* dp doesn't need a special case

* restore original name

* cleanup
@jsrozner
Contributor Author

Question: the generate() documentation says that top_k and similar params will default to the values in the PretrainedConfig, but top_k, for example, is never read from the config. This differs from, e.g., num_beams and max_length, which are read from the config if they are not passed as params to the generate function.

And since, for example, T5PreTrainedModel never overrides the generate function, I don't see how the defaults in the config (for params like top_k) could actually end up being passed to the generate function?

is_beam_sample_gen_mode = (num_beams > 1) and do_sample is True

# warn if certain params will be unused
if not do_sample:
Contributor

I don't want to add those warnings as we would have to do this for all parameters. IMO, if a user is playing around with top_k, top_p and/or temperature, she/he should know that it only works with do_sample=True.

Contributor

I'm happy, however, to add a statement for each input param to the doc, e.g. under top_p: "Note that top_p is only effective if do_sample=True."

Contributor Author

Ideas:

  • What if we also do it for the other params? In that case, which other params would those be? I'm happy to add checks for them in this function, though that would mean this function's behavior differs from the rest of the library, since warnings are not generally issued for unused parameters. (I do think that's one of the weaknesses of a library this general: it's sometimes hard to find out why something isn't behaving as expected, because an arg gets dropped or is set unexpectedly in another location.)

  • A more involved idea would be to group the params into dicts for each type of generation (rough sketch after this comment).

But more generally I agree with you that the user should know that playing with top_p and top_k is only valid if do_sample is True. My problem was that in my wandb sweep code I had forgotten to set do_sample to True, so I was iterating through a ton of top_p, top_k settings but seeing no changes in output. At first I thought beam_sample was broken; it took me a while to figure out that setting do_sample to False was nullifying all the other params I was passing in. A warning would have been nice...
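
A purely hypothetical sketch of the second idea above (none of these names exist in the library; it only illustrates grouping the parameters by generation mode so unused groups can be reported in one place):

```python
import logging

logger = logging.getLogger(__name__)


def warn_on_unused_generation_params(do_sample, num_beams, **params):
    # Hypothetical helper: group parameters by the generation mode they belong to.
    sampling_only = {k: params.get(k) for k in ("top_k", "top_p", "temperature")}
    beam_only = {k: params.get(k) for k in ("early_stopping", "length_penalty")}

    if not do_sample:
        unused = [name for name, value in sampling_only.items() if value is not None]
        if unused:
            logger.warning(f"`do_sample` is False, so {unused} will be ignored.")
    if num_beams == 1:
        unused = [name for name, value in beam_only.items() if value is not None]
        if unused:
            logger.warning(f"`num_beams` is 1, so {unused} will be ignored.")
```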

Contributor

In my opinion, it's not worth cluttering the code with such warnings. Another problem is that this warning will not catch cases where people change top_k and top_p via their config, and that is also hard to check, since config.top_k, for example, is already set to 50 by default. I'd really prefer to just add those statements to the doc.
If we go down the road of adding warnings for every single parameter, it won't stay maintainable, e.g. we might merge this PR soon: #8627

It's super useful to add these statements to the doc, but not worth the if-else cascade for the warnings, IMO. There is already a lot of stuff happening under the hood in generate(), and if one wants to change the model's default generation config, he/she should be aware of exactly what is going on.
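
(Illustrative aside, not part of the PR: the config-default point above can be seen on any model config. The sampling-related attributes already have non-None defaults, so a not-None check cannot tell whether the user set them deliberately.)

```python
from transformers import GPT2Config

config = GPT2Config()
print(config.do_sample, config.top_k, config.top_p, config.temperature)
# False 50 1.0 1.0  -> library defaults, even if the user never touched these values
```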

if not do_sample:
    if top_k is not None or top_p is not None or temperature is not None:
        logger.warning("do_sample is False so top_k, top_p, temperature will be ignored")
if num_beams == 1:
Contributor

Same here

eos_token_id=eos_token_id,
**model_kwargs,
)
else:
Contributor

I don't think it's possible to arrive here if num_beams is set correctly

Contributor Author

Agreed - once we have the num_beams check this could never happen. (I mentioned that in the PR note!)

It just seems like good code style to avoid having the possibility of falling off the end if changes are ever made to the code above?

Contributor

Ok for me. Can we change the error message though, e.g.:

"No generation method was called. Make sure to correctly set the parameters num_beams and do_sample to call one of greedy_search, sample, beam_seach, or beam_sample."

raise ValueError("Make sure that `model_kwargs` include `encoder_outputs` of type `ModelOutput`.")

# determine generation mode
if num_beams < 1:
Contributor

I'm very much on board with this raise! Can we move it below line 464, though?

@jsrozner
Contributor Author

jsrozner commented Dec 1, 2020

Question: the generate() documentation says that top_k and similar params will default to the values in the PretrainedConfig, but top_k, for example, is never read from the config. This differs from, e.g., num_beams and max_length, which are read from the config if they are not passed as params to the generate function.

And since, for example, T5PreTrainedModel never overrides the generate function, I don't see how the defaults in the config (for params like top_k) could actually end up being passed to the generate function?

Regarding reading from config - am I missing something or do these never get checked?

@patrickvonplaten

@patrickvonplaten
Contributor

Question: the generate() documentation says that top_k and similar params will default to the values in the PretrainedConfig, but top_k, for example, is never read from the config. This differs from, e.g., num_beams and max_length, which are read from the config if they are not passed as params to the generate function.
And since, for example, T5PreTrainedModel never overrides the generate function, I don't see how the defaults in the config (for params like top_k) could actually end up being passed to the generate function?

Regarding reading from config - am I missing something or do these never get checked?

@patrickvonplaten

top_p = top_p if top_p is not None else self.config.top_p
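
(The quoted line shows that top_p does fall back to the config; it follows the same fall-back pattern as num_beams and max_length. A standalone sketch of that pattern, with a made-up helper name and config standing in for self.config:)

```python
from transformers import GPT2Config


def resolve_generation_params(config, top_k=None, top_p=None, temperature=None, num_beams=None):
    # Fall back to the config value whenever the caller did not pass the parameter.
    top_k = top_k if top_k is not None else config.top_k
    top_p = top_p if top_p is not None else config.top_p
    temperature = temperature if temperature is not None else config.temperature
    num_beams = num_beams if num_beams is not None else config.num_beams
    return top_k, top_p, temperature, num_beams


print(resolve_generation_params(GPT2Config()))  # (50, 1.0, 1.0, 1) -> config defaults
```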

@github-actions
Contributor

github-actions bot commented Mar 6, 2021

This issue has been automatically marked as stale and been closed because it has not had recent activity. Thank you for your contributions.

If you think this still needs to be addressed please comment on this thread.

@github-actions github-actions bot closed this Mar 6, 2021