[Whisper] Refactor whisper #21252
Conversation
```python
# priority: `generation_config` argument > `model.generation_config` (the default generation config)
if generation_config is None:
    # legacy: users may modify the model configuration to control generation -- update the generation
    # config model attribute accordingly, if it was created from the model config
    if self.generation_config._from_model_config:
        new_generation_config = GenerationConfig.from_model_config(self.config)
        if new_generation_config != self.generation_config:
            warnings.warn(
                "You have modified the pretrained model configuration to control generation. This is a"
                " deprecated strategy to control generation and will be removed soon, in a future version."
                " Please use a generation configuration file (see"
                " https://huggingface.co/docs/transformers/main_classes/text_generation)"
            )
            self.generation_config = new_generation_config
    generation_config = self.generation_config
```

I don't agree with this warning: the generation config can be different, but the rest of the model is the same.
This part is redundant with `super().generate()`. It would be good if `self.generation_config` were already created and only required an update at this point.
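A rough sketch of that suggestion, assuming the base class has already resolved `self.generation_config` so this method only applies overrides (the helper name and the stand-in model are hypothetical):

```python
from transformers import GenerationConfig

class FakeModel:
    # stand-in for a model whose base class already attached a generation config
    generation_config = GenerationConfig()

def resolve_generation_config(model, generation_config=None, **overrides):
    """Prefer the explicit argument; otherwise reuse the model's own config."""
    if generation_config is None:
        # no re-derivation from `model.config` here -- just reuse and update
        generation_config = model.generation_config
    for key, value in overrides.items():
        setattr(generation_config, key, value)
    return generation_config

config = resolve_generation_config(FakeModel(), max_new_tokens=128)
print(config.max_new_tokens)  # 128
```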
The documentation is not available anymore as the PR was closed or merged.
Narsil left a comment
I'll delay the full review but gave some early comments.
The code is indeed cleaner that way!
```diff
 # apply the `max_initial_timestamp` option
-if input_ids.shape[1] == self.begin_index and self.max_initial_timestamp_index is not None:
-    last_allowed = self.timestamp_begin + self.max_initial_timestamp_index
+if input_ids.shape[1] == self.begin_index and self.max_initial_timestamp_idx is not None:
```
nit: I'm under the impression Sylvain would favor `index` over `idx` (and I agree).
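For context, a hedged sketch of what the `max_initial_timestamp` option does at the first decoding step; the token ids and vocabulary size below are those of the multilingual Whisper tokenizer, used purely for illustration:

```python
import torch

timestamp_begin = 50364            # id of the first timestamp token <|0.00|>
max_initial_timestamp_index = 50   # 50 steps of 0.02 s = at most 1 s of offset
last_allowed = timestamp_begin + max_initial_timestamp_index

scores = torch.zeros(1, 51865)     # fake logits over the whole vocabulary
# at the first generated position, forbid timestamps later than `last_allowed`
scores[:, last_allowed + 1 :] = float("-inf")
```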
```python
out = {"tokens": tokens}
if stride is not None:
    out["stride"] = stride
if self.type == "seq2seq_whisper":
```
We needed to pop before generate. If there's no need to pop beforehand anymore, we can simplify by setting something like `out = {**out, **model_inputs}`, or something along those lines that includes only `stride`.
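A minimal sketch of that merge, with illustrative values (a real `model_inputs` would come from the feature extractor):

```python
tokens = [[50257, 50362]]  # placeholder generated token ids
model_inputs = {"input_features": "...", "stride": (3000, 0, 500)}

out = {"tokens": tokens}
# keep only the keys we actually want to forward, here just `stride`
out = {**out, **{k: v for k, v in model_inputs.items() if k == "stride"}}
print(out)  # {'tokens': [[50257, 50362]], 'stride': (3000, 0, 500)}
```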
"The language is automatically detected." In my experience, Whisper's language detection is very unreliable. Will it still be possible to specify the language?
Sure, let's make sure we still allow the language to be passed! Thanks for pointing this out.
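A hedged sketch of forcing the language instead of relying on detection, via `get_decoder_prompt_ids`; the dummy features stand in for real audio:

```python
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# dummy log-mel features standing in for real audio: (batch, n_mels, frames)
input_features = torch.zeros(1, 80, 3000)

# force French transcription instead of letting the model detect the language
forced_decoder_ids = processor.get_decoder_prompt_ids(language="french", task="transcribe")
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
```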
Once #21257 is merged, the tests here should also pass!
Narsil left a comment
This is super nice!
LGTM.
It does clean up quite nicely imo.
Pipeline tests need #21269 to be merged 😉
sgugger left a comment
LGTM apart from the doc. Thanks!
The two failing tests come from the latest modification of the multilingual tokenizer's config.
```python
forced_decoder_ids = []
...
if hasattr(generation_config, "is_multilingual") and generation_config.is_multilingual:
```
This is where we first introduced the generation config. Unless the task and language were passed as inputs, we'd default to speech transcription with language detection.
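A hedged sketch of that behavior: build `forced_decoder_ids` from the task and language, and leave the language slot free when none is given so the model detects it. The token-id maps below are illustrative stand-ins for the ones on the generation config:

```python
lang_to_id = {"<|fr|>": 50265}
task_to_id = {"transcribe": 50359, "translate": 50358}

def build_forced_decoder_ids(is_multilingual, language=None, task=None):
    forced_decoder_ids = []
    if is_multilingual:
        if language is not None:
            # force the language token at decoder position 1
            forced_decoder_ids.append((1, lang_to_id[language]))
        # position 1 stays unforced otherwise, triggering language detection
        forced_decoder_ids.append((2, task_to_id[task or "transcribe"]))
    return forced_decoder_ids

print(build_forced_decoder_ids(True, language="<|fr|>"))  # [(1, 50265), (2, 50359)]
```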
What does this PR do?
The goal of this PR is to allow users to do the following:
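A plausible sketch of the intended call, assuming the `openai/whisper-tiny` checkpoint and a local `audio.mp3` file:

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
# no Whisper-specific plumbing needed: the task defaults to transcription
# and the language is detected automatically from the audio
print(asr("audio.mp3")["text"])
```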
The language is automatically detected. This also simplifies the pipeline calls and adds a good example of `generation_config`'s intended usage.