[ci-daily] Fix pipeline tests #21257

ArthurZucker · 2023-01-23T13:36:28Z

What does this PR do?

Should fix the automatic_speech_recognition_pipeline tests.
Also switches to a streaming dataset to speed up the tests, which seems like a good idea when only a single sample is needed.
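To illustrate why streaming helps when a test needs a single sample, here is a minimal sketch. The helper names `take_first` and `load_one_sample` are hypothetical, and the dataset identifier is an assumption rather than something stated in this PR. The point is only that a lazily iterated dataset lets the test stop after the first item instead of downloading and preparing everything.

```python
from itertools import islice


def take_first(dataset, n=1):
    """Return the first n samples of any iterable without consuming the rest."""
    return list(islice(iter(dataset), n))


def load_one_sample():
    """Hypothetical test helper: fetch one audio sample from a streamed dataset.

    Assumes the `datasets` library is installed and that the dataset name
    below is the one the tests use (an assumption, not taken from the PR).
    """
    from datasets import load_dataset  # imported lazily so the sketch loads without it

    ds = load_dataset(
        "hf-internal-testing/librispeech_asr_dummy",
        "clean",
        split="validation",
        streaming=True,  # returns an iterable dataset instead of downloading the full split
    )
    return take_first(ds, 1)[0]
```

With `streaming=True`, `load_dataset` returns an iterable dataset, so only the data needed to yield the first sample is fetched.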

HuggingFaceDocBuilderDev · 2023-01-23T13:56:08Z

The documentation is not available anymore as the PR was closed or merged.


ArthurZucker · 2023-01-23T17:43:32Z

src/transformers/pipelines/automatic_speech_recognition.py

-                word_offsets = []
+                offsets = []
                for word, (start_offset, end_offset) in chunk_offset:
-                    word_offsets.append({"word": word, "start_offset": start_offset, "end_offset": end_offset})
+                    offsets.append({"word": word, "start_offset": start_offset, "end_offset": end_offset})


Normalized the name of this variable.

ArthurZucker · 2023-01-23T17:44:06Z

src/transformers/pipelines/automatic_speech_recognition.py

            yield {"is_last": True, **processed, **extra}

-    def _forward(self, model_inputs, generate_kwargs=None):
+    def _forward(self, model_inputs, return_timestamps=False, generate_kwargs=None):


Adding this argument prevents the .pop from removing it for other processes.
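The general hazard can be sketched as follows. This is a simplified illustration, not the pipeline's actual code: `forward_popping` and `forward_explicit` are hypothetical stand-ins, and the idea that the parameters live in a dict shared across calls is an assumption made for the example.

```python
def forward_popping(model_inputs, params):
    # Mutates the caller's dict: after the first call the key is gone,
    # so later calls silently fall back to the default.
    return_timestamps = params.pop("return_timestamps", False)
    return {"n_inputs": len(model_inputs), "return_timestamps": return_timestamps}


def forward_explicit(model_inputs, return_timestamps=False, generate_kwargs=None):
    # The flag is a named argument, so nothing has to be popped from shared state.
    generate_kwargs = {} if generate_kwargs is None else generate_kwargs
    return {
        "n_inputs": len(model_inputs),
        "return_timestamps": return_timestamps,
        "generate_kwargs": generate_kwargs,
    }


shared = {"return_timestamps": True}
first = forward_popping([1], shared)    # sees the flag
second = forward_popping([2], shared)   # flag was already popped

params = {"return_timestamps": True}
third = forward_explicit([1], **params)
fourth = forward_explicit([2], **params)  # flag still present
```

Passing the flag as a named parameter keeps every call independent of what earlier calls did to the shared dict.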

ArthurZucker · 2023-01-23T17:44:37Z

src/transformers/pipelines/automatic_speech_recognition.py

-        consecutive = np.where(timestamp_tokens[:-1] & timestamp_tokens[1:])[0] + 1
-        last_timestamp = np.where(timestamp_tokens)[0][-1]
-        consecutive = np.append(consecutive, last_timestamp) if last_timestamp not in consecutive else consecutive
-        if seq_idx != 0:
+        if seq_idx != 0 and sum(timestamp_tokens) > 0:
+            consecutive = np.where(timestamp_tokens[:-1] & timestamp_tokens[1:])[0] + 1
+            last_timestamp = np.where(timestamp_tokens)[0][-1]
+            consecutive = np.append(consecutive, last_timestamp) if last_timestamp not in consecutive else consecutive


This makes sure that we don't throw an error if the model outputs no timestamps.
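A small standalone sketch of the failure this guards against, assuming `timestamp_tokens` is a boolean NumPy array as in the diff. The function name `consecutive_timestamp_positions` is hypothetical, and the `seq_idx` condition from the diff is left out to keep the example short.

```python
import numpy as np


def consecutive_timestamp_positions(timestamp_tokens):
    """Return indices closing a run of consecutive timestamp tokens, plus the last one."""
    timestamp_tokens = np.asarray(timestamp_tokens, dtype=bool)
    # Guard: with no timestamp token at all, np.where(...)[0] is empty and
    # indexing it with [-1] would raise an IndexError.
    if timestamp_tokens.sum() == 0:
        return np.array([], dtype=int)
    consecutive = np.where(timestamp_tokens[:-1] & timestamp_tokens[1:])[0] + 1
    last_timestamp = np.where(timestamp_tokens)[0][-1]
    if last_timestamp not in consecutive:
        consecutive = np.append(consecutive, last_timestamp)
    return consecutive
```

Without the guard, an all-`False` mask reaches `np.where(timestamp_tokens)[0][-1]`, which indexes an empty array.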

ArthurZucker · 2023-01-23T17:44:49Z

src/transformers/pipelines/automatic_speech_recognition.py

+        if "input_features" in processed:
+            processed_len = processed["input_features"].shape[-1]
+        elif "input_values" in processed:
+            processed_len = processed["input_values"].shape[-1]
+        if processed_len != chunk.shape[-1] and rescale:
+            ratio = processed_len / chunk_len


This was missing! Fixes the LM tests
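The hunk only shows the ratio being computed. One plausible use, sketched here as an assumption, is to rescale the chunk's stride from raw-sample units to the length of the processed features. The function `rescale_stride` and its return shape are hypothetical and not taken from the PR.

```python
def rescale_stride(chunk_len, stride_left, stride_right, processed_len, rescale=True):
    """Express a (length, left, right) stride in units of the processed input.

    If the feature extractor changed the sequence length (for example by
    producing frames instead of raw samples), scale the strides by the same ratio.
    """
    if rescale and processed_len != chunk_len:
        ratio = processed_len / chunk_len
        return (
            processed_len,
            int(round(stride_left * ratio)),
            int(round(stride_right * ratio)),
        )
    # Lengths already match, or rescaling is disabled: keep the sample units.
    return (chunk_len, stride_left, stride_right)


# A 48000-sample chunk with 8000-sample strides, processed down to 300 frames.
print(rescale_stride(48000, 8000, 8000, 300))
```

When `rescale` is `False`, or when the processed length equals the chunk length, the stride is returned unchanged.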

sgugger

LGTM if @Narsil agrees :-)

use streaming dataset

e256e3c

ArthurZucker changed the title ~~use streaming dataset~~ [ci-daily] Fix pipeline tests Jan 23, 2023

ArthurZucker added 3 commits January 23, 2023 15:45

Merge branch 'main' of https://github.com/huggingface/transformers in…

bb191b9

…to ci-daily

fix whisper's test

4c70ec0

add rescale argument to chunk_iter

7c47fcd

ArthurZucker marked this pull request as ready for review January 23, 2023 17:41

ArthurZucker commented Jan 23, 2023


ArthurZucker requested review from Narsil and sgugger January 23, 2023 17:45

ArthurZucker mentioned this pull request Jan 23, 2023

[Whisper] Refactor whisper #21252

Merged

sgugger approved these changes Jan 23, 2023


Narsil approved these changes Jan 23, 2023


ArthurZucker merged commit b80b221 into huggingface:main Jan 23, 2023
