[Whisper] Fix decoder ids methods #20599

sanchit-gandhi · 2022-12-05T17:59:07Z

What does this PR do?

The previous PR #20589 incorrectly returned a flat list of forced decoder ids:

from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
print(processor.get_decoder_prompt_ids(task="transcribe"))

Print Output:

[50257, 50358, 50362]

The correct format is a list of (position, token id) pairs, where the first element of each pair specifies the position of the forced token and the second the token id:

print(processor.get_decoder_prompt_ids(task="transcribe"))

Print Output:

[(1, 50257), (2, 50358), (3, 50362)]

(at position 1 we force token 50257, at 2 we force 50358, at 3 we force 50362)
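
To make the difference between the two formats concrete, here is a minimal standalone sketch of the conversion. The helper name `to_forced_decoder_ids` is hypothetical and only for illustration; it mirrors the `enumerate` pattern used in the tokenizer change, but it is not the library's API and does not require `transformers`.

```python
def to_forced_decoder_ids(prefix_tokens):
    """Pair each prefix token with its 1-based decoder position.

    Hypothetical helper for illustration only, not part of transformers.
    """
    return [(rank + 1, token) for rank, token in enumerate(prefix_tokens)]


# The flat list from the first example above
flat_ids = [50257, 50358, 50362]

forced_decoder_ids = to_forced_decoder_ids(flat_ids)
print(forced_decoder_ids)  # [(1, 50257), (2, 50358), (3, 50362)]
```

Positions start at 1 rather than 0 here, matching the corrected output shown above.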

The PR also implements a test, thus making sure that no such error can be made again 😅
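
As a rough idea of the kind of check that catches this mistake, the sketch below validates the shape of the returned value. It is not the test added in this PR: the function name `is_valid_forced_decoder_ids` is hypothetical, and the requirement that positions run consecutively from 1 is an assumption based on the example output above.

```python
def is_valid_forced_decoder_ids(ids):
    """Return True if `ids` is a list of (position, token_id) integer pairs.

    Hypothetical check for illustration. Assumes positions are consecutive
    and start at 1, as in the example output of this PR.
    """
    if not isinstance(ids, list):
        return False
    for expected_position, item in enumerate(ids, start=1):
        # Each entry must be a 2-element pair, not a bare token id
        if not (isinstance(item, (tuple, list)) and len(item) == 2):
            return False
        position, token = item
        if not (isinstance(position, int) and isinstance(token, int)):
            return False
        if position != expected_position:
            return False
    return True


# The corrected format passes, the flat list from the previous PR does not
print(is_valid_forced_decoder_ids([(1, 50257), (2, 50358), (3, 50362)]))
print(is_valid_forced_decoder_ids([50257, 50358, 50362]))
```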

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev · 2022-12-05T18:17:45Z

The documentation is not available anymore as the PR was closed or merged.

sgugger

Thanks for fixing!

ArthurZucker

Thanks for the quick fix

ArthurZucker · 2022-12-05T18:31:46Z

src/transformers/models/whisper/tokenization_whisper.py

+        self.set_prefix_tokens(task=task, language=language, predict_timestamps=not no_timestamps)
+        forced_decoder_ids = [(rank + 1, token) for rank, token in enumerate(self.prefix_tokens)]
+        return forced_decoder_ids


Nice catch! I should have realized when reviewing!

ArthurZucker · 2022-12-05T18:32:14Z

tests/models/whisper/test_processor_whisper.py

            msg="`processor` and `feature_extractor` model input names do not match",
        )
+
+    def test_get_decoder_prompt_ids(self):


* [Whisper] Fix decoder ids methods * enum property

[Whisper] Fix decoder ids methods

4bfbbbd

sanchit-gandhi requested a review from ArthurZucker December 5, 2022 18:00

enum property

416ce1e

bofenghuang mentioned this pull request Dec 5, 2022

Fix get_decoder_prompt_ids in whisper #20598

Closed

5 tasks

sanchit-gandhi requested a review from sgugger December 5, 2022 18:13

sgugger approved these changes Dec 5, 2022

ArthurZucker approved these changes Dec 5, 2022

sanchit-gandhi merged commit 74fb524 into huggingface:main Dec 5, 2022

mpierrau pushed a commit to mpierrau/transformers that referenced this pull request Dec 15, 2022

[Whisper] Fix decoder ids methods (huggingface#20599)

11c904d

* [Whisper] Fix decoder ids methods * enum property

sanchit-gandhi deleted the whisper-tok-fix branch June 25, 2023 09:53
