
Conversation

@ArthurZucker
Collaborator

What does this PR do?

The goal of this PR is to allow users to do the following:

...
whisper_model.generate(audio, return_timestamps=True)
whisper_model.generate(audio, return_timestamps=True, task="transcribe")

The language is automatically detected. This also simplifies the pipeline calls and adds a good example of `generation_config`'s intended usage.

Comment on lines 1251 to 1267
# priority: `generation_config` argument > `model.generation_config` (the default generation config)
if generation_config is None:
    # legacy: users may modify the model configuration to control generation -- update the
    # generation config model attribute accordingly, if it was created from the model config
    if self.generation_config._from_model_config:
        new_generation_config = GenerationConfig.from_model_config(self.config)
        if new_generation_config != self.generation_config:
            warnings.warn(
                "You have modified the pretrained model configuration to control generation. This is a"
                " deprecated strategy to control generation and will be removed soon, in a future version."
                " Please use a generation configuration file (see"
                " https://huggingface.co/docs/transformers/main_classes/text_generation)"
            )
            self.generation_config = new_generation_config
    generation_config = self.generation_config

I don't agree with this warning: the generation config can be different but the rest of the model is the same.
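The resolution order described in that snippet can be illustrated with a small self-contained sketch. This is a toy stand-in, not the transformers implementation; the class and config names are made up:

```python
# Toy illustration of the priority rule:
# an explicit `generation_config` argument wins over the model-level default.
class ToyModel:
    def __init__(self, default_config):
        # stands in for `model.generation_config`
        self.generation_config = default_config

    def resolve_config(self, generation_config=None):
        # priority: `generation_config` argument > `model.generation_config`
        if generation_config is None:
            generation_config = self.generation_config
        return generation_config

model = ToyModel({"max_length": 448})
print(model.resolve_config())                    # falls back to the default
print(model.resolve_config({"max_length": 32}))  # the explicit argument wins
```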
Collaborator Author

This part is redundant with `super().generate()`. It would be good if `self.generation_config` were already created and only required an update at this point.
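A minimal sketch of that suggestion, assuming a simplified parent class (these names and signatures are hypothetical, not the actual transformers API): the subclass only updates the already-materialised generation config, then defers to the parent's `generate()`.

```python
# Hypothetical parent: resolves the config and runs generation.
class GenerationMixin:
    def generate(self, inputs, generation_config=None):
        config = generation_config or self.generation_config
        return {"inputs": inputs, "config": config}

class WhisperStyleModel(GenerationMixin):
    # the default config already exists on the class
    generation_config = {"return_timestamps": False}

    def generate(self, inputs, return_timestamps=False, **kwargs):
        # only update the existing config; the parent handles the rest
        config = {**self.generation_config, "return_timestamps": return_timestamps}
        return super().generate(inputs, generation_config=config, **kwargs)
```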

@ArthurZucker ArthurZucker requested a review from Narsil January 23, 2023 10:54
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jan 23, 2023

The documentation is not available anymore as the PR was closed or merged.

Contributor

@Narsil Narsil left a comment

I'll delay the full review but gave some early comments.

The code is indeed cleaner that way!

 # apply the `max_initial_timestamp` option
-if input_ids.shape[1] == self.begin_index and self.max_initial_timestamp_index is not None:
-    last_allowed = self.timestamp_begin + self.max_initial_timestamp_index
+if input_ids.shape[1] == self.begin_index and self.max_initial_timestamp_idx is not None:
Contributor

nit: I'm under the impression Sylvain would favor `index` over `idx` (and I agree).

out = {"tokens": tokens}
if stride is not None:
    out["stride"] = stride
if self.type == "seq2seq_whisper":
Contributor

We needed to pop before generate.

If there's no need to pop beforehand, we can simplify by setting something like:

out = {**out, **model_inputs}, or something along those lines that includes only stride.
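One hedged sketch of that simplification, with illustrative names rather than the actual pipeline code: merge only the keys we care about (here just `"stride"`) from the model inputs into the output dict, instead of popping them beforehand.

```python
# Illustrative sketch: pull only "stride" out of model_inputs and merge it
# into the output dict, leaving the other model inputs untouched.
def build_output(tokens, model_inputs):
    extras = {k: v for k, v in model_inputs.items() if k == "stride"}
    return {"tokens": tokens, **extras}

out = build_output([1, 2, 3], {"stride": (4000, 0, 200), "input_features": "..."})
# → {"tokens": [1, 2, 3], "stride": (4000, 0, 200)}
```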

@bjelkenhed

bjelkenhed commented Jan 23, 2023

"The language is automatically detected": in my experience, the language detection by Whisper is very unreliable. Will it still be possible to specify the language?

@ArthurZucker
Collaborator Author

Sure, let's make sure we still allow the language to be passed! Thanks for pointing this out.

@ArthurZucker ArthurZucker self-assigned this Jan 23, 2023
@ArthurZucker
Collaborator Author

Once #21257 is merged, the tests here should also pass!

Contributor

@Narsil Narsil left a comment

This is super nice!
LGTM.

It does clean up quite nicely, IMO.

@ArthurZucker
Collaborator Author

Pipeline tests need #21269 to be merged 😉

@ArthurZucker ArthurZucker requested a review from sgugger January 24, 2023 18:25
Collaborator

@sgugger sgugger left a comment

LGTM apart from the doc. Thanks!

@ArthurZucker
Collaborator Author

The two failing tests are from the latest modification of the multilingual tokenizer's config.


forced_decoder_ids = []

if hasattr(generation_config, "is_multilingual") and generation_config.is_multilingual:
Contributor

This is where we first introduced the generation config. Unless the task and language were passed as inputs, we'd default to speech transcription with language detection.
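A rough sketch of that default behaviour, with made-up token ids and a hypothetical helper (this is not the real Whisper vocabulary or API): when no language is passed, the language slot is left empty so detection kicks in, and the task defaults to transcription.

```python
# Hypothetical sketch: build `forced_decoder_ids` from explicit language/task
# arguments. Token ids below are illustrative, not real Whisper token ids.
LANG_TOKENS = {"en": 50259, "sv": 50273}
TASK_TOKENS = {"transcribe": 50359, "translate": 50358}

def build_forced_decoder_ids(language=None, task="transcribe"):
    forced_decoder_ids = []
    if language is not None:
        # position 1 holds the language token; omitting it triggers detection
        forced_decoder_ids.append((1, LANG_TOKENS[language]))
    # position 2 holds the task token; transcription is the default
    forced_decoder_ids.append((2, TASK_TOKENS[task]))
    return forced_decoder_ids
```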

@ArthurZucker ArthurZucker deleted the refactor-whisper branch January 30, 2024 09:26


Development

Successfully merging this pull request may close these issues.

[Whisper] ASR Pipeline with "return_timestamps=True" gives IndexError: index -1 is out of bounds for axis 0 with size 0

6 participants