Add warnings for incompatible generation parameters #8861
Conversation
```python
is_beam_sample_gen_mode = (num_beams > 1) and do_sample is True

# warn if certain params will be unused
if not do_sample:
```
I don't want to add those warnings as we would have to do this for all parameters. IMO, if a user is playing around with top_k, top_p and/or temperature, she/he should know that it only works with do_sample=True.
I'm happy, however, to add a statement for each input param to the doc, e.g. under top_p: "Note that top_p is only effective if do_sample=True."
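For example, the note could live under the top_p entry of the generate() docstring (a sketch with illustrative wording, not the final doc text):

```python
# Illustrative docstring excerpt for generate(); the exact wording here is hypothetical.
"""
    top_p (float, optional, defaults to 1.0):
        If set to a float < 1, only the most probable tokens with probabilities that add up
        to top_p or higher are kept for generation. Note that top_p is only effective if
        do_sample=True.
"""
```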
Ideas:

- What if we also do it for the other params? In that case, what would those other params be? I'm happy to add checks for them in this function, though that could mean this function behaves differently from the rest of the library, since warnings are not generally issued for unused parameters. I do think that's one of the weaknesses of a library this general: it's sometimes hard to find where something isn't behaving as expected, because an arg gets dropped or is set unexpectedly in another location.
- A more involved idea would be to group the params as dicts for each type of generation (roughly sketched below).
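(A rough sketch of what that grouping could look like; the dict names and helper below are hypothetical, not part of the existing generate() API:)

```python
# Hypothetical sketch of the "group params as dicts" idea: each generation mode gets its own
# bundle of kwargs, making it visible at the call site which settings that mode actually uses.
greedy_params = {"do_sample": False, "num_beams": 1}
sample_params = {"do_sample": True, "top_k": 50, "top_p": 0.9, "temperature": 0.7}
beam_params = {"do_sample": False, "num_beams": 4, "early_stopping": True, "length_penalty": 1.2}

def generation_kwargs(mode: str) -> dict:
    """Return the kwarg bundle for a named mode (illustrative only)."""
    return {"greedy": greedy_params, "sample": sample_params, "beam_search": beam_params}[mode]

# outputs = model.generate(input_ids, **generation_kwargs("sample"))
```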
But more generally, I agree with you that the user should know that playing with top_p and top_k is only valid if do_sample is True. My problem was that in my wandb sweep code I had forgotten to set do_sample to True, so I was iterating through a ton of top_p and top_k settings but seeing no changes in output. At first I thought beam_sample was broken. It took me a while to figure out that leaving do_sample set to False was nullifying all the other params I was passing in. A warning would have been nice...
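(For reference, a minimal repro of that pitfall; the model and prompt are arbitrary, and the point is that the sampling knobs are silently ignored:)

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
input_ids = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt").input_ids

# do_sample defaults to False, so top_p and temperature below are silently dropped and every
# iteration yields the identical beam-search output -- nothing flags the unused parameters.
for top_p in (0.5, 0.8, 0.95):
    output = model.generate(input_ids, num_beams=4, top_p=top_p, temperature=0.7)
    print(top_p, tokenizer.decode(output[0], skip_special_tokens=True))
```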
In my opinion, it's not worth cluttering the code with such warnings. Another problem is that this warning would not catch people changing top_k and top_p through their config, and that is also hard to check since config.top_k, for example, is already set to 50 by default. I'd really prefer to just add those statements to the doc.

If we go down the road of adding warnings for every single parameter, it won't stay maintainable, e.g. we might merge this PR soon: #8627

It's super useful to add these statements to the doc, but not worth the if-else cascade for the warnings IMO. There is already a lot happening under the hood in generate(), and anyone who wants to change the model's default generation config should be aware of exactly what is going on.
```python
if not do_sample:
    if top_k is not None or top_p is not None or temperature is not None:
        logger.warning("do_sample is False so top_k, top_p, temperature will be ignored")
if num_beams == 1:
```
Same here
```python
        eos_token_id=eos_token_id,
        **model_kwargs,
    )
else:
```
I don't think it's possible to arrive here if num_beams is set correctly
Agreed - once we have the num_beams check, this could never happen (I mentioned that in the PR note!). It just seems like good code style to avoid any possibility of falling off the end of the chain if changes are ever made to the code above.
Ok for me. Can we change the error message though, e.g.:

"No generation method was called. Make sure to correctly set the parameters num_beams and do_sample to call one of greedy_search, sample, beam_search, or beam_sample."
```python
    raise ValueError("Make sure that `model_kwargs` include `encoder_outputs` of type `ModelOutput`.")

# determine generation mode
if num_beams < 1:
```
I'm very much on board with this raise! Can we move it below line 464, though?
Regarding reading from config - am I missing something, or do these never get checked?
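(On the config point: generate() fills every argument left as None from the model config before any of these checks would run, roughly like the simplified helper below, so an is-not-None test cannot distinguish an explicit value from a config default:)

```python
def resolve(value, config_default):
    """Simplified stand-in for how generate() fills unspecified arguments from the model config."""
    return value if value is not None else config_default

# config.top_k defaults to 50, so whether the caller passes top_k=50 explicitly or passes nothing
# at all, the resolved value is 50 -- a `top_k is not None` check cannot tell the cases apart.
print(resolve(None, 50))  # 50: caller passed nothing
print(resolve(50, 50))    # 50: caller passed top_k=50
```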
This issue has been automatically marked as stale and been closed because it has not had recent activity. Thank you for your contributions. If you think this still needs to be addressed please comment on this thread.
What does this PR do?
While testing various generation configurations, I got confused when beam_sample's outputs (i.e. num_beams > 1, do_sample=True) failed to change with different settings of top_k, top_p, temperature. Then I realized that in my call to generate(), do_sample was still set to False even though I was cycling through various settings of top_k, top_p and temperature.

generate() (and its helper methods) already contains some parameter compatibility checks. This adds a few more checks:

- num_beams must be >= 1.
- If do_sample is not set, warn the user if any of top_p, top_k, temperature are not None (since they will have no effect).
- If num_beams is 1 (no beam search), warn the user if any of early_stopping, length_penalty are not None (since they will have no effect).
- If an incompatible combination of num_beams and do_sample is passed, raise a ValueError. Note that since we also add a check for num_beams < 1, this final ValueError will never be raised, but it prevents the generate function from falling off the end of the if/else chain if something is altered in the future.
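A compact sketch of the checks described above (simplified; the actual diff folds them into generate()'s existing flow):

```python
import logging

logger = logging.getLogger(__name__)

def check_generation_args(num_beams, do_sample, top_k, top_p, temperature, early_stopping, length_penalty):
    """Sketch of the added compatibility checks; not the exact code in the diff."""
    if num_beams < 1:
        raise ValueError(f"num_beams has to be >= 1, but is {num_beams}.")
    if not do_sample and any(v is not None for v in (top_k, top_p, temperature)):
        logger.warning("do_sample is False so top_k, top_p, temperature will be ignored")
    if num_beams == 1 and any(v is not None for v in (early_stopping, length_penalty)):
        logger.warning("num_beams is 1 so early_stopping, length_penalty will be ignored")
```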
Before submitting

- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings. No doc changes necessary.
- Did you write any new necessary tests? Ran tests, but no new functionality created, just warning messages.
Who can review?
Please tag fewer than 3 people.
Text Generation: @patrickvonplaten