
Conversation

@jsrozner
Contributor

What does this PR do?

While testing various generation configurations, I was confused when beam_sample's outputs (i.e. num_beams > 1, do_sample=True) did not change across different settings of top_k, top_p, and temperature. I then realized that in my call to generate(), do_sample was still set to False, so the top_k, top_p, and temperature values I was cycling through had no effect.

Generate (and its helper methods) already contains some parameter compatibility checks. This PR adds a few more (a condensed sketch follows the list):

  • num_beams must be >= 1
  • when do_sample is False, warn the user if any of top_p, top_k, or temperature is not None (since they will have no effect)
  • when num_beams is 1 (no beam search), warn the user if either early_stopping or length_penalty is not None (since they will have no effect)
  • if an invalid combination of num_beams and do_sample is passed, raise a ValueError. Since we also add the check for num_beams < 1, this final ValueError can never actually be raised, but it prevents generate() from falling off the end of the if/else chain if the code above is ever altered.
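
A condensed sketch of the checks described above (illustrative only: the helper name check_generation_params is made up, and in the PR the checks live inline in generate() with slightly different messages):

```python
import logging

logger = logging.getLogger(__name__)


def check_generation_params(
    num_beams, do_sample, top_k=None, top_p=None, temperature=None, early_stopping=None, length_penalty=None
):
    if num_beams < 1:
        raise ValueError("`num_beams` must be >= 1.")
    if not do_sample and any(v is not None for v in (top_k, top_p, temperature)):
        # sampling-only parameters have no effect unless do_sample=True
        logger.warning("`do_sample` is False, so `top_k`, `top_p` and `temperature` will be ignored.")
    if num_beams == 1 and any(v is not None for v in (early_stopping, length_penalty)):
        # beam-search-only parameters have no effect when num_beams == 1
        logger.warning("`num_beams` is 1, so `early_stopping` and `length_penalty` will be ignored.")
```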

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
    No doc changes necessary
  • Did you write any new necessary tests?
    Ran existing tests; no new tests were needed since no new functionality was added, just warning messages.

Who can review?

Please tag fewer than 3 people.
Text Generation: @patrickvonplaten

patrickvonplaten and others added 20 commits November 28, 2020 19:50
* refactor

* further refactor

* fix the rest tomorrow

* save intermediate

* finish slow tokenizer

* make more tests pass

* finish refactor

* fix comment

* clean further

* fix name

* fix naming

* Update src/transformers/models/reformer/tokenization_reformer.py

* Apply suggestions from code review

* Apply suggestions from code review

* refactor

* fix init tokenizers

* refactor

* improve convert

* refactor

* correct convert slow tokenizer

* final fix for Pegasus Tok

* remove ipdb

* improve links
* Fix minor typos

* Additional typos

* Style fix

Co-authored-by: guyrosin <[email protected]>
* implement job skipping for doc-only PRs

* silent grep is crucial

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* let's add doc

* let's add code

* revert test commits

* restore

* Better name

* Better name

* Better name

* some more testing

* some more testing

* some more testing

* finish testing
* Migration guide from v3.x to v4.x

* Better wording

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <[email protected]>

* Sylvain's comments

* Better wording.

Co-authored-by: Sylvain Gugger <[email protected]>
* Add T5 Encoder class for feature extraction

* fix T5 encoder add_start_docstrings indent

* update init with T5 encoder

* update init with TFT5ModelEncoder

* remove TFT5ModelEncoder

* change T5ModelEncoder order in init

* add T5ModelEncoder to transformers init

* clean T5ModelEncoder

* update init with TFT5ModelEncoder

* add TFModelEncoder for Tensorflow

* update init with TFT5ModelEncoder

* Update src/transformers/models/t5/modeling_t5.py

change output from Seq2SeqModelOutput to BaseModelOutput

Co-authored-by: Patrick von Platen <[email protected]>

* remove encoder_outputs

1. remove encoder_outputs from the function call.
2. remove the encoder_outputs If statement.
3. remove isinstance from return_dict.

* Authorize missing decoder keys

* remove unnecessary input parameters

remove past_key_values and use_cache

* remove use_cache

remove use_cache from the forward method

* add docstring for T5 encoder

add docstring for T5 encoder with T5_ENCODER_INPUTS_DOCSTRING

* change return_dict to dot access

* add T5_ENCODER_INPUTS_DOCSTRING for TF T5

* change TFT5Encoder output type to BaseModelOutput

* remove unnecessary parameters for TFT5Encoder

* remove unnecessary if statement

* add import BaseModelOutput

* fix BaseModelOutput typo to TFBaseModelOutput

* update T5 doc with T5ModelEncoder

* add T5ModelEncoder to tests

* finish pytorch

* finish docs and mt5

* add mtf to init

* fix init

* remove n_positions

* finish PR

* Update src/transformers/models/mt5/modeling_mt5.py

Co-authored-by: Lysandre Debut <[email protected]>

* Update src/transformers/models/t5/modeling_t5.py

Co-authored-by: Lysandre Debut <[email protected]>

* Update src/transformers/models/t5/modeling_tf_t5.py

Co-authored-by: Lysandre Debut <[email protected]>

* Update src/transformers/models/mt5/modeling_tf_mt5.py

Co-authored-by: Lysandre Debut <[email protected]>

* make style

Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Lysandre Debut <[email protected]>
* Use model.from_pretrained for DataParallel also

When training on multiple GPUs, the code wraps a model with torch.nn.DataParallel. However if the model has custom from_pretrained logic, it does not get applied during load_best_model_at_end.

This commit uses the underlying model during load_best_model_at_end, and re-wraps the loaded model with DataParallel.

If you choose to reject this change, then could you please move this logic to a function, e.g. def load_best_model_checkpoint(best_model_checkpoint) or something, so that it can be overridden?

* Fix silly bug

* Address review comments

Thanks for the feedback. I made the change that you proposed, but I also think we should update L811 to check whether `self.model` is an instance of `PreTrainedModel`, otherwise we would still not get into that `if` section, right?
* Remove deprecated `evalutate_during_training`

* Update src/transformers/training_args_tf.py

Co-authored-by: Lysandre Debut <[email protected]>

Co-authored-by: Lysandre Debut <[email protected]>
* Slightly increase tolerance between pytorch and flax output

Signed-off-by: Morgan Funtowicz <[email protected]>

* test_multiple_sentences doesn't require torch

Signed-off-by: Morgan Funtowicz <[email protected]>

* Simplify parameterization on "jit" to use boolean rather than str

Signed-off-by: Morgan Funtowicz <[email protected]>

* Use `require_torch` on `test_multiple_sentences` because we pull the weight from the hub.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Rename "jit" parameter to "use_jit" for (hopefully) making it self-documenting.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Remove pytest.mark.parametrize which seems to fail in some circumstances

Signed-off-by: Morgan Funtowicz <[email protected]>

* Fix unused imports.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Fix style.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Give default parameters values for traced model.

Signed-off-by: Morgan Funtowicz <[email protected]>

* Review comment: Change sentences to sequences

Signed-off-by: Morgan Funtowicz <[email protected]>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <[email protected]>
…ngface#8781)

* NerPipeline (TokenClassification) now outputs offsets of words

- Currently the offsets are missing, which forces the user to pattern-match
the "word" from their input, which is not always feasible.
For instance, if a sentence contains the same word twice, there
is no way to know which occurrence is which.
- This PR proposes to fix that by outputting two new keys in this
pipeline's outputs, "start" and "end", which correspond to the string
offsets of the word. That means we should always have the
invariant:

```python
input[entity["start"]: entity["end"]] == entity["entity_group"]
                                    # or entity["entity"] if not grouped
```

* Fixing doc style
* fix DP case on multi-gpu

* make executable

* test all 3 modes

* use the correct check for distributed

* dp doesn't need a special case

* restore original name

* cleanup
@jsrozner
Contributor Author

Question: the generate() documentation says that top_k and similar params will default to the values in the PretrainedConfig, but top_k, for example, is never read from the config. This differs from, e.g., num_beams and max_length, which are read from the config if they are not passed as params to the generate function.

And since, for example, T5PreTrainedModel never overrides the generate function, I don't see how the defaults in the config (for params like top_k) could actually end up being passed to the generate function?

is_beam_sample_gen_mode = (num_beams > 1) and do_sample is True

# warn if certain params will be unused
if not do_sample:
Contributor

I don't want to add those warnings as we would have to do this for all parameters. IMO, if a user is playing around with top_k, top_p and/or temperature, she/he should know that it only works with do_sample=True.

Contributor

I'm happy, however, to add a statement for each input param to the doc, e.g. under top_p: "Note that top_p is only effective if do_sample=True."

Contributor Author

Ideas:

  • What if we also do it for the other params? In that case, which other params would those be? I'm happy to add checks for them in this function, though that would mean this function's behavior differs from the rest of the library, since warnings are not generally issued for unused parameters. (I do think that's one of the weaknesses of a library this general: it's sometimes hard to find out why something isn't behaving as expected, because an arg gets dropped or is set unexpectedly in another location.)

  • A more involved idea would be to group the params into dicts for each type of generation (rough sketch after this comment).

But more generally I agree with you that the user should know that playing with top_p and top_k is only valid if do_sample is True. My problem was that in my wandb sweep code I had forgotten to set do_sample to True, so I was iterating through a ton of top_p, top_k settings but seeing no changes in output. At first I thought beam_sample was broken; it took me a while to figure out that setting do_sample to False was nullifying all the other params I was passing in. A warning would have been nice...
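
A purely hypothetical sketch of the second idea above (none of these names exist in the library; it only illustrates grouping the parameters by generation mode so unused groups can be reported in one place):

```python
import logging

logger = logging.getLogger(__name__)


def warn_on_unused_generation_params(do_sample, num_beams, **params):
    # Hypothetical helper: group parameters by the generation mode they belong to.
    sampling_only = {k: params.get(k) for k in ("top_k", "top_p", "temperature")}
    beam_only = {k: params.get(k) for k in ("early_stopping", "length_penalty")}

    if not do_sample:
        unused = [name for name, value in sampling_only.items() if value is not None]
        if unused:
            logger.warning(f"`do_sample` is False, so {unused} will be ignored.")
    if num_beams == 1:
        unused = [name for name, value in beam_only.items() if value is not None]
        if unused:
            logger.warning(f"`num_beams` is 1, so {unused} will be ignored.")
```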

Contributor

In my opinion, it's not worth cluttering the code with such warnings. Another problem is that this warning will not catch cases where people change top_k and top_p via their config, and that is also hard to check, since config.top_k, for example, is already set to 50 by default. I'd really prefer to just add those statements to the doc.
If we go down the road of adding warnings for every single parameter, it won't stay maintainable, e.g. we might merge this PR soon: #8627

It's super useful to add these statements to the doc, but not worth the if-else cascade for the warnings, IMO. There is already a lot of stuff happening under the hood in generate(), and if one wants to change the model's default generation config, he/she should be aware of exactly what is going on.
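
(Illustrative aside, not part of the PR: the config-default point above can be seen on any model config. The sampling-related attributes already have non-None defaults, so a not-None check cannot tell whether the user set them deliberately.)

```python
from transformers import GPT2Config

config = GPT2Config()
print(config.do_sample, config.top_k, config.top_p, config.temperature)
# False 50 1.0 1.0  -> library defaults, even if the user never touched these values
```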

if not do_sample:
    if top_k is not None or top_p is not None or temperature is not None:
        logger.warning("do_sample is False so top_k, top_p, temperature will be ignored")
if num_beams == 1:
Contributor

Same here

eos_token_id=eos_token_id,
**model_kwargs,
)
else:
Contributor

I don't think it's possible to arrive here if num_beams is set correctly

Contributor Author

Agreed - once we have the num_beams check this could never happen. (I mentioned that in the PR note!)

It just seems like good code style to avoid having the possibility of falling off the end if changes are ever made to the code above?

Contributor

Ok for me. Can we change the error message though, e.g.:

"No generation method was called. Make sure to correctly set the parameters num_beams and do_sample to call one of greedy_search, sample, beam_seach, or beam_sample."

raise ValueError("Make sure that `model_kwargs` include `encoder_outputs` of type `ModelOutput`.")

# determine generation mode
if num_beams < 1:
Contributor

I'm very much on board with this raise! Can we move it below line 464, though?

@jsrozner
Contributor Author

jsrozner commented Dec 1, 2020

Question: the generate() documentation says that top_k and similar params will default to the values in the PretrainedConfig, but top_k, for example, is never read from the config. This differs from, e.g., num_beams and max_length, which are read from the config if they are not passed as params to the generate function.

And since, for example, T5PreTrainedModel never overrides the generate function, I don't see how the defaults in the config (for params like top_k) could actually end up being passed to the generate function?

Regarding reading from config - am I missing something or do these never get checked?

@patrickvonplaten

@patrickvonplaten
Contributor

Question: the generate() documentation says that top_k and similar params will default to the values in the PretrainedConfig, but top_k, for example, is never read from the config. This differs from, e.g., num_beams and max_length, which are read from the config if they are not passed as params to the generate function.
And since, for example, T5PreTrainedModel never overrides the generate function, I don't see how the defaults in the config (for params like top_k) could actually end up being passed to the generate function?

Regarding reading from config - am I missing something or do these never get checked?

@patrickvonplaten

top_p = top_p if top_p is not None else self.config.top_p
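
(The quoted line shows that top_p does fall back to the config; it follows the same fall-back pattern as num_beams and max_length. A standalone sketch of that pattern, with a made-up helper name and config standing in for self.config:)

```python
from transformers import GPT2Config


def resolve_generation_params(config, top_k=None, top_p=None, temperature=None, num_beams=None):
    # Fall back to the config value whenever the caller did not pass the parameter.
    top_k = top_k if top_k is not None else config.top_k
    top_p = top_p if top_p is not None else config.top_p
    temperature = temperature if temperature is not None else config.temperature
    num_beams = num_beams if num_beams is not None else config.num_beams
    return top_k, top_p, temperature, num_beams


print(resolve_generation_params(GPT2Config()))  # (50, 1.0, 1.0, 1) -> config defaults
```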

@github-actions
Contributor

github-actions bot commented Mar 6, 2021

This issue has been automatically marked as stale and been closed because it has not had recent activity. Thank you for your contributions.

If you think this still needs to be addressed please comment on this thread.

@github-actions github-actions bot closed this Mar 6, 2021